[GitHub] [spark] huaxingao commented on a change in pull request #30154: [SPARK-32405][SQL] Apply table options while creating tables in JDBC Table Catalog
huaxingao commented on a change in pull request #30154: URL: https://github.com/apache/spark/pull/30154#discussion_r517154290
## File path: sql/core/src/main/scala/org/apache/spark/sql/jdbc/DerbyDialect.scala ##
@@ -50,4 +50,17 @@ private object DerbyDialect extends JdbcDialect {
   override def renameTable(oldTable: String, newTable: String): String = {
     s"RENAME TABLE $oldTable TO $newTable"
   }
+
+  // Derby currently doesn't support comment on table. Here is the ticket to add the support
+  // https://issues.apache.org/jira/browse/DERBY-7008
+  override def createTable(
+      table: String,
+      strSchema: String,
+      createTableOptions: String,
+      tableComment: String): Array[String] = {
+    if (!tableComment.isEmpty) {
+      logWarning("Cannot create JDBC table comment. The table comment will be ignored.")
Review comment: We don't append the comment to the end of the CREATE TABLE statement; it's a separate SQL statement. If creating the comment fails, it doesn't affect the table creation, so the comment can be treated as a special property. We append all the other properties to the end of the CREATE TABLE statement as createTableOptions.
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] otterc commented on pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode
otterc commented on pull request #30062: URL: https://github.com/apache/spark/pull/30062#issuecomment-721567210
I ran an end-to-end test. One of the bugs was:
```
Caused by: java.lang.IndexOutOfBoundsException
  at java.nio.ByteBuffer.wrap(ByteBuffer.java:375)
  at org.sparkproject.io.netty.buffer.UnpooledHeapByteBuf.nioBuffer(UnpooledHeapByteBuf.java:306)
  at org.apache.spark.network.protocol.Encoders$Bitmaps.encode(Encoders.java:65)
  at org.apache.spark.network.shuffle.RemoteBlockPushResolver$AppShufflePartitionInfo.writeChunkTracker(RemoteBlockPushResolver.java:871)
  at org.apache.spark.network.shuffle.RemoteBlockPushResolver$AppShufflePartitionInfo.updateChunkInfo(RemoteBlockPushResolver.java:845)
  at org.apache.spark.network.shuffle.RemoteBlockPushResolver$PushBlockStreamCallback.onComplete(RemoteBlockPushResolver.java:653)
  at org.apache.spark.network.server.TransportRequestHandler$3.onComplete(TransportRequestHandler.java:230)
```
Fix this by adding `buf.ensureWritable(encodedLength)` in `Encoders.Bitmaps.encode`.
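The top frame of that trace is `ByteBuffer.wrap`, which rejects an offset/length range that runs past the end of the backing array. The following is a minimal sketch of that failure mode using only `java.nio`. It is not the Spark or Netty code: `WrapBoundsSketch`, `canWrap` and `ensureCapacity` are hypothetical names, and `ensureCapacity` only stands in for the role that `ensureWritable` plays in the comment above.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class WrapBoundsSketch {
  // Hypothetical stand-in for growing a buffer before writing:
  // returns an array with room for `needed` bytes starting at `offset`.
  static byte[] ensureCapacity(byte[] backing, int offset, int needed) {
    int required = offset + needed;
    return backing.length >= required ? backing : Arrays.copyOf(backing, required);
  }

  // Returns true if wrapping the range [offset, offset + length) succeeds.
  static boolean canWrap(byte[] backing, int offset, int length) {
    try {
      ByteBuffer.wrap(backing, offset, length);
      return true;
    } catch (IndexOutOfBoundsException e) {
      // Same exception type as in the stack trace above.
      return false;
    }
  }

  public static void main(String[] args) {
    byte[] backing = new byte[8];
    int offset = 4;
    int encodedLength = 16; // more bytes than the array can hold from `offset`

    System.out.println("before growing: " + canWrap(backing, offset, encodedLength));
    byte[] grown = ensureCapacity(backing, offset, encodedLength);
    System.out.println("after growing:  " + canWrap(grown, offset, encodedLength));
  }
}
```

The point of the sketch is only the ordering: capacity is guaranteed first, and the region is exposed as an NIO buffer afterwards.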
[GitHub] [spark] SparkQA commented on pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode
SparkQA commented on pull request #30062: URL: https://github.com/apache/spark/pull/30062#issuecomment-721566228 **[Test build #130595 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130595/testReport)** for PR 30062 at commit [`7cf38c4`](https://github.com/apache/spark/commit/7cf38c4ad1fd3f62c46b0ff3f9b48490b281085c).
[GitHub] [spark] HeartSaVioR commented on a change in pull request #30203: [SPARK-33303][SQL] Deduplicate deterministic PythonUDF calls
HeartSaVioR commented on a change in pull request #30203: URL: https://github.com/apache/spark/pull/30203#discussion_r517144041
## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala ##
@@ -218,13 +218,22 @@ object ExtractPythonUDFs extends Rule[LogicalPlan] with PredicateHelper {
     }
   }

+  private def canonicalizeDeterministic(u: PythonUDF) = {
Review comment: I agree that is orthogonal and isn't a blocker for this PR. Should we take the discussion to the dev@ mailing list, or are we happy to file a new JIRA issue and raise a PR to change the default?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30242: [SPARK-33277][PYSPARK][SQL][FOLLOW-UP] Block TaskCompletion event until the thread ends.
AmplabJenkins removed a comment on pull request #30242: URL: https://github.com/apache/spark/pull/30242#issuecomment-721562547
[GitHub] [spark] SparkQA commented on pull request #30243: [SPARK-33335][SQL] Support `array_contains_array` func
SparkQA commented on pull request #30243: URL: https://github.com/apache/spark/pull/30243#issuecomment-721563148 **[Test build #130594 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130594/testReport)** for PR 30243 at commit [`613f1c3`](https://github.com/apache/spark/commit/613f1c35215f768979ed6efb24ee91376e4a3074).
[GitHub] [spark] AmplabJenkins commented on pull request #30242: [SPARK-33277][PYSPARK][SQL][FOLLOW-UP] Block TaskCompletion event until the thread ends.
AmplabJenkins commented on pull request #30242: URL: https://github.com/apache/spark/pull/30242#issuecomment-721562547
[GitHub] [spark] SparkQA commented on pull request #30242: [SPARK-33277][PYSPARK][SQL][FOLLOW-UP] Block TaskCompletion event until the thread ends.
SparkQA commented on pull request #30242: URL: https://github.com/apache/spark/pull/30242#issuecomment-721561669 **[Test build #130586 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130586/testReport)** for PR 30242 at commit [`6e5be90`](https://github.com/apache/spark/commit/6e5be90afb010ee03359adc0495f0bb0ada9a4f4).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #30242: [SPARK-33277][PYSPARK][SQL][FOLLOW-UP] Block TaskCompletion event until the thread ends.
SparkQA removed a comment on pull request #30242: URL: https://github.com/apache/spark/pull/30242#issuecomment-721486742 **[Test build #130586 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130586/testReport)** for PR 30242 at commit [`6e5be90`](https://github.com/apache/spark/commit/6e5be90afb010ee03359adc0495f0bb0ada9a4f4).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30245: [WIP][SPARK-33337][SQL] Support subexpression elimination in branches of conditional expressions
AmplabJenkins removed a comment on pull request #30245: URL: https://github.com/apache/spark/pull/30245#issuecomment-721561406
[GitHub] [spark] SparkQA commented on pull request #30245: [WIP][SPARK-33337][SQL] Support subexpression elimination in branches of conditional expressions
SparkQA commented on pull request #30245: URL: https://github.com/apache/spark/pull/30245#issuecomment-721561387 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35193/
[GitHub] [spark] AmplabJenkins commented on pull request #30245: [WIP][SPARK-33337][SQL] Support subexpression elimination in branches of conditional expressions
AmplabJenkins commented on pull request #30245: URL: https://github.com/apache/spark/pull/30245#issuecomment-721561406
[GitHub] [spark] SparkQA commented on pull request #30245: [WIP][SPARK-33337][SQL] Support subexpression elimination in branches of conditional expressions
SparkQA commented on pull request #30245: URL: https://github.com/apache/spark/pull/30245#issuecomment-721557561 **[Test build #130593 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130593/testReport)** for PR 30245 at commit [`db0cfcc`](https://github.com/apache/spark/commit/db0cfcc26a9ca2312afd34eda1e616670001218b).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30243: [SPARK-33335][SQL] Support `array_contains_array` func
AmplabJenkins removed a comment on pull request #30243: URL: https://github.com/apache/spark/pull/30243#issuecomment-721555499 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/130590/ Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #30243: [SPARK-33335][SQL] Support `array_contains_array` func
SparkQA removed a comment on pull request #30243: URL: https://github.com/apache/spark/pull/30243#issuecomment-721500664 **[Test build #130590 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130590/testReport)** for PR 30243 at commit [`789d2ec`](https://github.com/apache/spark/commit/789d2ec66fa7ea3218ff8908cd13f74276389f04).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30243: [SPARK-33335][SQL] Support `array_contains_array` func
AmplabJenkins removed a comment on pull request #30243: URL: https://github.com/apache/spark/pull/30243#issuecomment-721555493 Merged build finished. Test FAILed.
[GitHub] [spark] cloud-fan commented on a change in pull request #30154: [SPARK-32405][SQL] Apply table options while creating tables in JDBC Table Catalog
cloud-fan commented on a change in pull request #30154: URL: https://github.com/apache/spark/pull/30154#discussion_r517135283
## File path: sql/core/src/main/scala/org/apache/spark/sql/jdbc/DerbyDialect.scala ##
@@ -50,4 +50,17 @@ private object DerbyDialect extends JdbcDialect {
   override def renameTable(oldTable: String, newTable: String): String = {
     s"RENAME TABLE $oldTable TO $newTable"
   }
+
+  // Derby currently doesn't support comment on table. Here is the ticket to add the support
+  // https://issues.apache.org/jira/browse/DERBY-7008
+  override def createTable(
+      table: String,
+      strSchema: String,
+      createTableOptions: String,
+      tableComment: String): Array[String] = {
+    if (!tableComment.isEmpty) {
+      logWarning("Cannot create JDBC table comment. The table comment will be ignored.")
Review comment: So you are treating `comment` as a special table property that is OK to ignore. Is `comment` the only one?
[GitHub] [spark] cloud-fan commented on a change in pull request #30154: [SPARK-32405][SQL] Apply table options while creating tables in JDBC Table Catalog
cloud-fan commented on a change in pull request #30154: URL: https://github.com/apache/spark/pull/30154#discussion_r517135401
## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalog.scala ##
@@ -117,14 +117,33 @@ class JDBCTableCatalog extends TableCatalog with Logging {
     if (partitions.nonEmpty) {
       throw new UnsupportedOperationException("Cannot create JDBC table with partition")
     }
-    // TODO (SPARK-32405): Apply table options while creating tables in JDBC Table Catalog
+
+    var tableOptions = options.parameters + (JDBCOptions.JDBC_TABLE_NAME -> getTableName(ident))
+    var tableComment: String = ""
+    var tableProperties: String = ""
     if (!properties.isEmpty) {
-      logWarning("Cannot create JDBC table with properties, these properties will be " +
-        "ignored: " + properties.asScala.map { case (k, v) => s"$k=$v" }.mkString("[", ", ", "]"))
+      properties.asScala.map {
+        case (k, v) => k match {
+          case "comment" => tableComment = v
+          case "provider" | "owner" | "location" => // provider, owner and location can't be set.
+          case _ => tableProperties = tableProperties + " " + s"$k=$v"
+        }
+      }
     }
-    val writeOptions = new JdbcOptionsInWrite(
-      options.parameters + (JDBCOptions.JDBC_TABLE_NAME -> getTableName(ident)))
+    if (tableComment != "") {
+      tableOptions = tableOptions + (JDBCOptions.JDBC_TABLE_COMMENT -> tableComment)
+    }
+    if (tableProperties != "") {
+      // table property is set in JDBC_CREATE_TABLE_OPTIONS, which will be appended
+      // to CREATE TABLE statement.
+      // E.g., "CREATE TABLE t (name string) ENGINE=InnoDB DEFAULT CHARSET=utf8"
Review comment: is it a standard syntax to specify table properties in all DBs?
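The property handling in the diff above can be read as a three-way split: `comment` is pulled out, `provider`/`owner`/`location` are skipped, and everything else is concatenated as ` key=value`. The sketch below restates that split in plain Java so it can be run in isolation. It is an illustration under assumptions, not the PR's code: `TablePropertySketch` and `Split` are hypothetical names, and only the key names and the concatenation format are taken from the diff.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class TablePropertySketch {
  // Hypothetical holder for the two results of the split.
  static final class Split {
    final String comment;
    final String createTableOptions;

    Split(String comment, String createTableOptions) {
      this.comment = comment;
      this.createTableOptions = createTableOptions;
    }
  }

  static Split split(Map<String, String> properties) {
    String comment = "";
    StringBuilder options = new StringBuilder();
    for (Map.Entry<String, String> e : properties.entrySet()) {
      switch (e.getKey()) {
        case "comment":
          comment = e.getValue();
          break;
        case "provider":
        case "owner":
        case "location":
          // These cannot be set, so they are skipped.
          break;
        default:
          // Mirrors the diff: each property is appended as " key=value".
          options.append(' ').append(e.getKey()).append('=').append(e.getValue());
      }
    }
    return new Split(comment, options.toString());
  }

  public static void main(String[] args) {
    Map<String, String> props = new LinkedHashMap<>();
    props.put("comment", "test table");
    props.put("owner", "someone");
    props.put("ENGINE", "InnoDB");
    props.put("DEFAULT CHARSET", "utf8");

    Split s = split(props);
    System.out.println("comment: [" + s.comment + "]");
    System.out.println("options: [" + s.createTableOptions + "]");
  }
}
```

Because the options string is passed through verbatim, whether it is valid depends entirely on the target database, which is the concern the review comment raises.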
[GitHub] [spark] AmplabJenkins commented on pull request #30243: [SPARK-33335][SQL] Support `array_contains_array` func
AmplabJenkins commented on pull request #30243: URL: https://github.com/apache/spark/pull/30243#issuecomment-721555493
[GitHub] [spark] SparkQA commented on pull request #30243: [SPARK-33335][SQL] Support `array_contains_array` func
SparkQA commented on pull request #30243: URL: https://github.com/apache/spark/pull/30243#issuecomment-721555042 **[Test build #130590 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130590/testReport)** for PR 30243 at commit [`789d2ec`](https://github.com/apache/spark/commit/789d2ec66fa7ea3218ff8908cd13f74276389f04).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class ArrayContainsArray(left: Expression, right: Expression)`
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30245: [WIP][SPARK-33337][SQL] Support subexpression elimination in branches of conditional expressions
AmplabJenkins removed a comment on pull request #30245: URL: https://github.com/apache/spark/pull/30245#issuecomment-721553355 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/130592/ Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #30245: [WIP][SPARK-33337][SQL] Support subexpression elimination in branches of conditional expressions
SparkQA removed a comment on pull request #30245: URL: https://github.com/apache/spark/pull/30245#issuecomment-721537711 **[Test build #130592 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130592/testReport)** for PR 30245 at commit [`a8e0c22`](https://github.com/apache/spark/commit/a8e0c2251d3bf01fcaa29344f2357c8768792d49).
[GitHub] [spark] AmplabJenkins commented on pull request #30245: [WIP][SPARK-33337][SQL] Support subexpression elimination in branches of conditional expressions
AmplabJenkins commented on pull request #30245: URL: https://github.com/apache/spark/pull/30245#issuecomment-721553348
[GitHub] [spark] SparkQA commented on pull request #30245: [WIP][SPARK-33337][SQL] Support subexpression elimination in branches of conditional expressions
SparkQA commented on pull request #30245: URL: https://github.com/apache/spark/pull/30245#issuecomment-721553472 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35193/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30245: [WIP][SPARK-33337][SQL] Support subexpression elimination in branches of conditional expressions
AmplabJenkins removed a comment on pull request #30245: URL: https://github.com/apache/spark/pull/30245#issuecomment-721553348 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #30245: [WIP][SPARK-33337][SQL] Support subexpression elimination in branches of conditional expressions
SparkQA commented on pull request #30245: URL: https://github.com/apache/spark/pull/30245#issuecomment-721553230 **[Test build #130592 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130592/testReport)** for PR 30245 at commit [`a8e0c22`](https://github.com/apache/spark/commit/a8e0c2251d3bf01fcaa29344f2357c8768792d49).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] cloud-fan commented on a change in pull request #29837: [SPARK-32463][SQL][DOCS] Add "Type Conversion" section in "Supported Data Types" of SQL docs
cloud-fan commented on a change in pull request #29837: URL: https://github.com/apache/spark/pull/29837#discussion_r517131842
## File path: docs/sql-ref-datatypes.md ##
@@ -314,3 +314,206 @@ SELECT COUNT(*), c2 FROM test GROUP BY c2; |3| Infinity| +-+-+ ```
+
+### Type Conversion
+
+In general, an expression can contain different data types and type conversion is the transformation of some data types into others in order to resolve type mismatches.
+Spark supports both implicit conversions by type coercion and explicit conversions by explicit casting and store assignment casting.
+
+ Type Coercion in Operations between Different Types
+
+Type Coercion refers to the automatic or implicit conversion of values from one type to another when needed to resolve type mismatches.
+The following matrix shows the resulting type to which they are implicitly converted to resolve an expression involving different data types.
+
+**Numeric Expressions**:
+
+| |ByteType |ShortType |IntegerType |LongType |FloatType |DoubleType |DecimalType |
+|---|---|---|---|---|---|---|---|
+|**ByteType** |-- |ShortType |IntegerType |LongType |FloatType |DoubleType |DecimalType(3,0)<sup>1</sup> |
+|**ShortType** |ShortType |-- |IntegerType |LongType |FloatType |DoubleType |DecimalType(5,0)<sup>1</sup> |
+|**IntegerType** |IntegerType |IntegerType |-- |LongType |FloatType |DoubleType |DecimalType(10,0)<sup>1</sup> |
+|**LongType** |LongType |LongType |LongType |-- |FloatType |DoubleType |DecimalType(20,0)<sup>1</sup> |
+|**FloatType** |FloatType |FloatType |FloatType |FloatType |-- |DoubleType |DoubleType |
+|**DoubleType** |DoubleType |DoubleType |DoubleType |DoubleType |DoubleType |-- |DoubleType |
+|**DecimalType** |DecimalType |DecimalType |DecimalType |DecimalType |DoubleType<sup>2</sup> |DoubleType<sup>2</sup> |-- |
+
+**Note 1**: DecimalType(precision,scale)
+**Note 2**: In these cases DecimalType can lose precision, there is no common type for decimal and double because double's range is larger than decimal, and yet decimal is more precise than double so when we cast DecimalType into DoubleType it could lose precision.
+
+**StringType Behavior**
+* Arithmetic Expressions: When we have an arithmetic expression with one operand of type StringType, both operands will be implicitly casted to DoubleType.
+
+| |ByteType |ShortType |IntegerType |LongType |FloatType |DoubleType |
+|---|---|---|---|---|---|---|
+|**StringType** |DoubleType |DoubleType |DoubleType |DoubleType |DoubleType |DoubleType |
+
+* Comparison: When we have a comparison expression with an operand of type StringType, the operand StringType will be casted implicitly according to the following table.
+
+| |ByteType |ShortType |IntegerType |LongType |FloatType |DoubleType |DecimalType |DateType |TimestampType |
+|---|---|---|---|---|---|---|---|---|---|
+|**StringType** |ByteType |ShortType |IntegerType |LongType |FloatType |DoubleType |DoubleType |DateType<sup>1</sup> |TimestampType<sup>1</sup> |
+
+**Note 1**: If `spark.sql.legacy.typeCoercion.datetimeToString` is true, DateType and TimestampType will be casted to StringType
+
+* in, except, intersect, union, array: If the list of values has a StringType element, all the elements will be casted to StringType.
+
+* concat, concat_ws, array_join: All elements will be casted to StringType.
+
+* map_concat: If the list of key has a StringType element, all the keys will be casted to StringType. The same goes for the values.
+
+* if, when: If any of the results has StringType, all the results will be casted to StringType.
+
+**Time Expressions**:
+
+| |DateType |TimestampType |
+|---|---|---|
+|**DateType** |-- |TimestampType |
+|**TimestampType** |TimestampType |-- |
+
+
+**Possible implicit conversions**:
+
+| |ByteType |ShortType |IntegerType |LongType |FloatType |DoubleType |DecimalType |StringType |BinaryType |BooleanType |TimestampType |DateType |
+|---|---|---|---|---|---|---|---|---|---|---|---|---|
+|**ByteType** |--|X |X |X|X |X |X
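For the non-decimal part of the numeric matrix quoted above, every cell equals whichever operand type comes later in the order ByteType, ShortType, IntegerType, LongType, FloatType, DoubleType. The sketch below encodes only that observation. It is a hypothetical illustration, not Spark's type-coercion code: `NumericCoercionSketch`, `NumericType` and `wider` are made-up names, and DecimalType is deliberately left out because its result depends on precision and scale.

```java
final class NumericCoercionSketch {
  // Declaration order doubles as the widening order read off the quoted matrix.
  enum NumericType { BYTE, SHORT, INTEGER, LONG, FLOAT, DOUBLE }

  // Returns the operand type that appears later in the widening order.
  static NumericType wider(NumericType left, NumericType right) {
    return left.ordinal() >= right.ordinal() ? left : right;
  }

  public static void main(String[] args) {
    // Print the matrix row by row so it can be compared with the quoted table.
    for (NumericType row : NumericType.values()) {
      StringBuilder line = new StringBuilder(row.name()).append(':');
      for (NumericType col : NumericType.values()) {
        line.append(' ').append(row == col ? "--" : wider(row, col).name());
      }
      System.out.println(line);
    }
  }
}
```

A lookup by declaration order keeps the sketch short; a real implementation would also need the DecimalType rules and the string, date, and timestamp cases listed in the same diff.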
[GitHub] [spark] huaxingao commented on pull request #30154: [SPARK-32405][SQL] Apply table options while creating tables in JDBC Table Catalog
huaxingao commented on pull request #30154: URL: https://github.com/apache/spark/pull/30154#issuecomment-721550399
> Does it conflict with the dialect API to generate CREATE TABLE statement?
No, it doesn't conflict with the dialect API to generate CREATE TABLE statement. The table properties are appended to the end of the CREATE TABLE statement as `createTableOptions`.
```s"CREATE TABLE $table ($strSchema) $createTableOptions"```
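Putting the thread together: the options string is appended to the CREATE TABLE statement, while the comment, where the database supports one, goes into a separate statement. The following is a rough sketch of that shape in plain Java. The names `CreateTableSketch`, `createTable` and the `supportsComment` flag are assumptions for illustration, the `COMMENT ON TABLE ... IS '...'` form is the PostgreSQL/Oracle syntax rather than something every database accepts, and none of this is the dialect API itself.

```java
import java.util.ArrayList;
import java.util.List;

final class CreateTableSketch {
  // Builds the statements to run, in order. `supportsComment` is a hypothetical
  // flag standing in for a per-dialect decision.
  static List<String> createTable(
      String table,
      String strSchema,
      String createTableOptions,
      String tableComment,
      boolean supportsComment) {
    List<String> statements = new ArrayList<>();
    // Same template as quoted above; trim() drops the trailing space
    // that is left when no options are given.
    statements.add(("CREATE TABLE " + table + " (" + strSchema + ") " + createTableOptions).trim());
    if (!tableComment.isEmpty() && supportsComment) {
      // Separate statement: if it fails, the table has already been created.
      String escaped = tableComment.replace("'", "''");
      statements.add("COMMENT ON TABLE " + table + " IS '" + escaped + "'");
    }
    return statements;
  }

  public static void main(String[] args) {
    createTable("t", "name string", "ENGINE=InnoDB DEFAULT CHARSET=utf8", "it's a test", true)
        .forEach(System.out::println);
    // A dialect without comment support emits only the CREATE TABLE statement.
    createTable("t", "name string", "", "ignored comment", false)
        .forEach(System.out::println);
  }
}
```

Returning a list of statements, rather than one string, matches the `Array[String]` return type visible in the `DerbyDialect` diff earlier in the thread.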
[GitHub] [spark] dongjoon-hyun commented on pull request #30240: [SPARK-33333][BUILD][3.0] Upgrade Jetty to 9.4.28.v20200408
dongjoon-hyun commented on pull request #30240: URL: https://github.com/apache/spark/pull/30240#issuecomment-721548555 Thank you so much, @viirya !
[GitHub] [spark] cloud-fan closed pull request #30122: [SPARK-33214][TEST][HIVE] Stop HiveExternalCatalogVersionsSuite from using a hard-coded location to store localized Spark binaries.
cloud-fan closed pull request #30122: URL: https://github.com/apache/spark/pull/30122
[GitHub] [spark] cloud-fan closed pull request #30229: [SPARK-33321][SQL] Migrate ANALYZE TABLE commands to use UnresolvedTableOrView to resolve the identifier
cloud-fan closed pull request #30229: URL: https://github.com/apache/spark/pull/30229
[GitHub] [spark] cloud-fan commented on pull request #30122: [SPARK-33214][TEST][HIVE] Stop HiveExternalCatalogVersionsSuite from using a hard-coded location to store localized Spark binaries.
cloud-fan commented on pull request #30122: URL: https://github.com/apache/spark/pull/30122#issuecomment-721546906 thanks, merging to master!
[GitHub] [spark] cloud-fan commented on a change in pull request #30229: [SPARK-33321][SQL] Migrate ANALYZE TABLE commands to use UnresolvedTableOrView to resolve the identifier
cloud-fan commented on a change in pull request #30229: URL: https://github.com/apache/spark/pull/30229#discussion_r517127493 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ## @@ -3249,18 +3249,23 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging val tableName = visitMultipartIdentifier(ctx.multipartIdentifier()) if (ctx.ALL() != null) { checkPartitionSpec() - AnalyzeColumnStatement(tableName, None, allColumns = true) + AnalyzeColumn(UnresolvedTableOrView(tableName), None, allColumns = true) } else if (ctx.identifierSeq() == null) { val partitionSpec = if (ctx.partitionSpec != null) { visitPartitionSpec(ctx.partitionSpec) } else { Map.empty[String, Option[String]] } - AnalyzeTableStatement(tableName, partitionSpec, noScan = ctx.identifier != null) + AnalyzeTable( +UnresolvedTableOrView(tableName, allowTempView = false), +partitionSpec, +noScan = ctx.identifier != null) } else { checkPartitionSpec() - AnalyzeColumnStatement( -tableName, Option(visitIdentifierSeq(ctx.identifierSeq())), allColumns = false) + AnalyzeColumn( +UnresolvedTableOrView(tableName), Review comment: OK, let's keep the current behavior for now, and improve it later.
[GitHub] [spark] cloud-fan commented on pull request #30229: [SPARK-33321][SQL] Migrate ANALYZE TABLE commands to use UnresolvedTableOrView to resolve the identifier
cloud-fan commented on pull request #30229: URL: https://github.com/apache/spark/pull/30229#issuecomment-721546392 thanks, merging to master!
[GitHub] [spark] cloud-fan commented on pull request #30154: [SPARK-32405][SQL] Apply table options while creating tables in JDBC Table Catalog
cloud-fan commented on pull request #30154: URL: https://github.com/apache/spark/pull/30154#issuecomment-721545793 > ... send properties to underlying databases and let databases decide whether to fail the CREATE TABLE or not This works. Does it conflict with the dialect API that generates the CREATE TABLE statement?
[GitHub] [spark] cloud-fan commented on a change in pull request #29837: [SPARK-32463][SQL][DOCS] Add "Type Conversion" section in "Supported Data Types" of SQL docs
cloud-fan commented on a change in pull request #29837: URL: https://github.com/apache/spark/pull/29837#discussion_r517125806 ## File path: docs/sql-ref-datatypes.md ## @@ -314,3 +314,206 @@ SELECT COUNT(*), c2 FROM test GROUP BY c2; |3| Infinity| +-+-+ ``` + +### Type Conversion + +In general, an expression can contain different data types and type conversion is the transformation of some data types into others in order to resolve type mismatches. Review comment: ``` Type conversion turns the values of one data type to another data type. Spark needs to perform type conversions if users explicitly ask to do so via the CAST operator, or to resolve data type mismatch in operators, functions, and table writing implicitly. ```
[GitHub] [spark] huaxingao commented on pull request #30154: [SPARK-32405][SQL] Apply table options while creating tables in JDBC Table Catalog
huaxingao commented on pull request #30154: URL: https://github.com/apache/spark/pull/30154#issuecomment-721542287 > when users specify some table properties, JDBC V2 should fail if the underlying database can't support the properties. I agree JDBC V2 should fail if the underlying database can't support the properties. However, I feel it's hard to come up with a complete list of the supported properties for each of the databases. It is easy to have a complete list of the supported properties for MySQL because its CREATE TABLE syntax explicitly lists the table_options: ``` table_option: { AUTO_INCREMENT [=] value | AVG_ROW_LENGTH [=] value | [DEFAULT] CHARACTER SET [=] charset_name | CHECKSUM [=] {0 | 1} | [DEFAULT] COLLATE [=] collation_name | COMMENT [=] 'string' | COMPRESSION [=] {'ZLIB' | 'LZ4' | 'NONE'} | CONNECTION [=] 'connect_string' | {DATA | INDEX} DIRECTORY [=] 'absolute path to directory' | DELAY_KEY_WRITE [=] {0 | 1} | ENCRYPTION [=] {'Y' | 'N'} | ENGINE [=] engine_name | ENGINE_ATTRIBUTE [=] 'string' | INSERT_METHOD [=] { NO | FIRST | LAST } | KEY_BLOCK_SIZE [=] value | MAX_ROWS [=] value | MIN_ROWS [=] value | PACK_KEYS [=] {0 | 1 | DEFAULT} | PASSWORD [=] 'string' | ROW_FORMAT [=] {DEFAULT | DYNAMIC | FIXED | COMPRESSED | REDUNDANT | COMPACT} | SECONDARY_ENGINE_ATTRIBUTE [=] 'string' | STATS_AUTO_RECALC [=] {DEFAULT | 0 | 1} | STATS_PERSISTENT [=] {DEFAULT | 0 | 1} | STATS_SAMPLE_PAGES [=] value | TABLESPACE tablespace_name [STORAGE {DISK | MEMORY}] | UNION [=] (tbl_name[,tbl_name]...) } ``` but other databases don't have explicitly defined table_options. I am not sure which of the properties should be considered valid table properties. For example, for PostgreSQL, https://www.postgresql.org/docs/9.1/sql-createtable.html, should we treat `TABLESPACE tablespace` or `CONSTRAINT constraint_name` as valid table properties?
I feel it might be better to send properties to the underlying databases and let them decide whether to fail the CREATE TABLE or not, rather than trying to maintain a complete list of the supported properties on the Spark side.
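A minimal sketch of the pass-through approach discussed above, assuming the properties are rendered as `KEY=value` pairs appended to the CREATE TABLE statement and the table comment is issued as a separate statement. The object `CreateTableSketch`, its methods, and the `COMMENT ON TABLE` syntax are hypothetical illustrations, not Spark's `JdbcDialect` API, and the exact option syntax differs between databases.

```scala
// Hypothetical helper for illustration only; not part of Spark.
object CreateTableSketch {

  /** Renders properties as space-separated KEY=value pairs, sorted by key so the output is deterministic. */
  def renderOptions(props: Map[String, String]): String =
    props.toSeq.sortBy(_._1).map { case (k, v) => s"$k=$v" }.mkString(" ")

  /**
   * Returns the statements to execute in order: the CREATE TABLE statement with
   * all properties appended unvalidated, then an optional comment statement.
   * The database, not this code, decides whether an unknown property is an error.
   */
  def createTable(
      table: String,
      strSchema: String,
      props: Map[String, String],
      tableComment: String): Seq[String] = {
    val create = s"CREATE TABLE $table ($strSchema) ${renderOptions(props)}".trim
    // Assumed syntax: COMMENT ON TABLE is not supported by every database.
    val comment: Seq[String] =
      if (tableComment.isEmpty) Seq.empty
      else Seq(s"COMMENT ON TABLE $table IS '${tableComment.replace("'", "''")}'")
    create +: comment
  }
}
```

Keeping the comment as its own statement means a failure to set it does not have to abort table creation, while an unsupported property makes the CREATE TABLE itself fail on the database side.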
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30244: [WIP][SPARK-33282] Migrate from dead probot autolabeler to GitHub labeler action
AmplabJenkins removed a comment on pull request #30244: URL: https://github.com/apache/spark/pull/30244#issuecomment-721542255
[GitHub] [spark] AmplabJenkins commented on pull request #30244: [WIP][SPARK-33282] Migrate from dead probot autolabeler to GitHub labeler action
AmplabJenkins commented on pull request #30244: URL: https://github.com/apache/spark/pull/30244#issuecomment-721542255
[GitHub] [spark] SparkQA commented on pull request #30244: [WIP][SPARK-33282] Migrate from dead probot autolabeler to GitHub labeler action
SparkQA commented on pull request #30244: URL: https://github.com/apache/spark/pull/30244#issuecomment-721542243 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35192/
[GitHub] [spark] viirya closed pull request #30240: [SPARK-33333][BUILD][3.0] Upgrade Jetty to 9.4.28.v20200408
viirya closed pull request #30240: URL: https://github.com/apache/spark/pull/30240
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30242: [SPARK-33277][PYSPARK][SQL][FOLLOW-UP] Block TaskCompletion event until the thread ends.
AmplabJenkins removed a comment on pull request #30242: URL: https://github.com/apache/spark/pull/30242#issuecomment-721539621
[GitHub] [spark] AmplabJenkins commented on pull request #30242: [SPARK-33277][PYSPARK][SQL][FOLLOW-UP] Block TaskCompletion event until the thread ends.
AmplabJenkins commented on pull request #30242: URL: https://github.com/apache/spark/pull/30242#issuecomment-721539621
[GitHub] [spark] SparkQA commented on pull request #30242: [SPARK-33277][PYSPARK][SQL][FOLLOW-UP] Block TaskCompletion event until the thread ends.
SparkQA commented on pull request #30242: URL: https://github.com/apache/spark/pull/30242#issuecomment-721539609 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35191/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30242: [SPARK-33277][PYSPARK][SQL][FOLLOW-UP] Block TaskCompletion event until the thread ends.
AmplabJenkins removed a comment on pull request #30242: URL: https://github.com/apache/spark/pull/30242#issuecomment-721538733
[GitHub] [spark] AmplabJenkins commented on pull request #30242: [SPARK-33277][PYSPARK][SQL][FOLLOW-UP] Block TaskCompletion event until the thread ends.
AmplabJenkins commented on pull request #30242: URL: https://github.com/apache/spark/pull/30242#issuecomment-721538733
[GitHub] [spark] viirya commented on pull request #30240: [SPARK-33333][BUILD][3.0] Upgrade Jetty to 9.4.28.v20200408
viirya commented on pull request #30240: URL: https://github.com/apache/spark/pull/30240#issuecomment-721538642 Thanks. Merging to branch-3.0.
[GitHub] [spark] SparkQA removed a comment on pull request #30242: [SPARK-33277][PYSPARK][SQL][FOLLOW-UP] Block TaskCompletion event until the thread ends.
SparkQA removed a comment on pull request #30242: URL: https://github.com/apache/spark/pull/30242#issuecomment-721472000 **[Test build #130584 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130584/testReport)** for PR 30242 at commit [`2a0d2af`](https://github.com/apache/spark/commit/2a0d2afcb6456308dfa1105828ecd4d0f95a3a8e).
[GitHub] [spark] SparkQA commented on pull request #30242: [SPARK-33277][PYSPARK][SQL][FOLLOW-UP] Block TaskCompletion event until the thread ends.
SparkQA commented on pull request #30242: URL: https://github.com/apache/spark/pull/30242#issuecomment-721537988 **[Test build #130584 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130584/testReport)** for PR 30242 at commit [`2a0d2af`](https://github.com/apache/spark/commit/2a0d2afcb6456308dfa1105828ecd4d0f95a3a8e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #30245: [WIP][SPARK-33337][SQL] Support subexpression elimination in branches of conditional expressions
SparkQA commented on pull request #30245: URL: https://github.com/apache/spark/pull/30245#issuecomment-721537711 **[Test build #130592 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130592/testReport)** for PR 30245 at commit [`a8e0c22`](https://github.com/apache/spark/commit/a8e0c2251d3bf01fcaa29344f2357c8768792d49).
[GitHub] [spark] LuciferYang edited a comment on pull request #30234: [SPARK-33285][CORE][SQL] Fix deprecated compilation warnings of "Auto-application to () is deprecated" in Scala 2.13
LuciferYang edited a comment on pull request #30234: URL: https://github.com/apache/spark/pull/30234#issuecomment-721533492 > After this PR, do we have a way to prevent these compilation warnings? Let me take a look at this problem. Maybe we can change the warning level of the compiler or the check-style plugin by modifying some configuration.
[GitHub] [spark] viirya commented on a change in pull request #30245: [WIP][SPARK-33337][SQL] Support subexpression elimination in branches of conditional expressions
viirya commented on a change in pull request #30245: URL: https://github.com/apache/spark/pull/30245#discussion_r517120205 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/SubexpressionEliminationSuite.scala ## @@ -149,17 +149,43 @@ class SubexpressionEliminationSuite extends SparkFunSuite { assert(equivalence.getAllEquivalentExprs.count(_.size == 1) == 3) // add, two, explode } - test("Children of conditional expressions") { -val condition = And(Literal(true), Literal(false)) + test("Children of conditional expressions: If") { Review comment: I will add the tests for `CaseWhen` and `Coalesce`, but the idea is the same.
[GitHub] [spark] LuciferYang edited a comment on pull request #30234: [SPARK-33285][CORE][SQL] Fix deprecated compilation warnings of "Auto-application to () is deprecated" in Scala 2.13
LuciferYang edited a comment on pull request #30234: URL: https://github.com/apache/spark/pull/30234#issuecomment-721533492 > After this PR, do we have a way to prevent these compilation warnings? Let me take a look at this problem. Maybe we can change the warning level of the compiler or the check-style plugin.
[GitHub] [spark] viirya commented on a change in pull request #30245: [WIP][SPARK-33337][SQL] Support subexpression elimination in branches of conditional expressions
viirya commented on a change in pull request #30245: URL: https://github.com/apache/spark/pull/30245#discussion_r517120205 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/SubexpressionEliminationSuite.scala ## @@ -149,17 +149,43 @@ class SubexpressionEliminationSuite extends SparkFunSuite { assert(equivalence.getAllEquivalentExprs.count(_.size == 1) == 3) // add, two, explode } - test("Children of conditional expressions") { -val condition = And(Literal(true), Literal(false)) + test("Children of conditional expressions: If") { Review comment: I will add the tests for `CaseWhen` and `Coalesce`.
[GitHub] [spark] viirya opened a new pull request #30245: [WIP][SPARK-33337][SQL] Support subexpression elimination in branches of conditional expressions
viirya opened a new pull request #30245: URL: https://github.com/apache/spark/pull/30245 ### What changes were proposed in this pull request? Currently we skip subexpression elimination in branches of conditional expressions including `If`, `CaseWhen`, and `Coalesce`. In fact, we can do subexpression elimination for such branches if the subexpression is common across all branches. This patch proposes to support subexpression elimination in branches of conditional expressions. ### Why are the changes needed? We may miss subexpression elimination opportunities in branches of conditional expressions. This kind of subexpression is frequently seen. It may be written manually by users or come from the query optimizer. For example, project collapsing could embed expressions between two `Project`s and produce a conditional expression like: ``` CASE WHEN jsonToStruct(json).a = '1' THEN 1.0 WHEN jsonToStruct(json).a = '2' THEN 2.0 ... ELSE 1.2 END ``` If `jsonToStruct(json)` is an expensive expression, the duplication is not eliminated and time is wasted evaluating it repeatedly. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Unit test.
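The condition stated in the description above (a subexpression qualifies only if it is common across all branches) can be sketched on a toy expression tree. This is an illustrative sketch, not the implementation in the pull request: `BranchSubexprSketch`, `Col`, `Lit`, and `Call` are hypothetical names and do not correspond to Catalyst classes.

```scala
// Toy model for illustration only; these types are not Catalyst's.
object BranchSubexprSketch {
  sealed trait Expr { def children: Seq[Expr] }
  final case class Col(name: String) extends Expr { def children: Seq[Expr] = Nil }
  final case class Lit(value: String) extends Expr { def children: Seq[Expr] = Nil }
  final case class Call(fn: String, args: Seq[Expr]) extends Expr { def children: Seq[Expr] = args }

  /** All non-leaf subexpressions of `e`, including `e` itself when it has children. */
  def subexprs(e: Expr): Set[Expr] =
    (if (e.children.nonEmpty) Set[Expr](e) else Set.empty[Expr]) ++ e.children.flatMap(subexprs)

  /**
   * Subexpressions that appear in every branch. Whichever branch is taken at
   * runtime, each of these would be evaluated, so computing it once is safe.
   */
  def commonToAll(branches: Seq[Expr]): Set[Expr] =
    if (branches.isEmpty) Set.empty[Expr]
    else branches.map(subexprs).reduce(_ intersect _)
}
```

For the `CASE WHEN` example in the description, both conditions contain `jsonToStruct(json)` and the field access on it, so both show up in the intersection, while the two comparisons against `'1'` and `'2'` do not.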
[GitHub] [spark] ahshahid commented on pull request #30185: [SPARK-33152][SQL] This PR proposes a new logic to maintain & track constraints which solves the OOM or performance issues in query compilat
ahshahid commented on pull request #30185: URL: https://github.com/apache/spark/pull/30185#issuecomment-721536710 I will fix the Scala 2.13 issue.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30244: [WIP][SPARK-33282] Migrate from dead probot autolabeler to GitHub labeler action
AmplabJenkins removed a comment on pull request #30244: URL: https://github.com/apache/spark/pull/30244#issuecomment-721521860 Can one of the admins verify this patch?
[GitHub] [spark] SparkQA commented on pull request #30244: [WIP][SPARK-33282] Migrate from dead probot autolabeler to GitHub labeler action
SparkQA commented on pull request #30244: URL: https://github.com/apache/spark/pull/30244#issuecomment-721534664 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35192/
[GitHub] [spark] LuciferYang commented on pull request #30234: [SPARK-33285][CORE][SQL] Fix deprecated compilation warnings of "Auto-application to () is deprecated" in Scala 2.13
LuciferYang commented on pull request #30234: URL: https://github.com/apache/spark/pull/30234#issuecomment-721533492 > After this PR, do we have a way to prevent these compilation warnings? Let me take a look at it. Maybe we can change the warning level of the compiler or the check-style plugin.
[GitHub] [spark] AmplabJenkins commented on pull request #30166: [SPARK-33265][TEST] Rename classOf[Seq] to classOf[scala.collection.Seq] in PostgresIntegrationSuite for Scala 2.13
AmplabJenkins commented on pull request #30166: URL: https://github.com/apache/spark/pull/30166#issuecomment-721531572
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30166: [SPARK-33265][TEST] Rename classOf[Seq] to classOf[scala.collection.Seq] in PostgresIntegrationSuite for Scala 2.13
AmplabJenkins removed a comment on pull request #30166: URL: https://github.com/apache/spark/pull/30166#issuecomment-721531572
[GitHub] [spark] SparkQA commented on pull request #30242: [SPARK-33277][PYSPARK][SQL][FOLLOW-UP] Block TaskCompletion event until the thread ends.
SparkQA commented on pull request #30242: URL: https://github.com/apache/spark/pull/30242#issuecomment-721531130 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35191/
[GitHub] [spark] SparkQA commented on pull request #30166: [SPARK-33265][TEST] Rename classOf[Seq] to classOf[scala.collection.Seq] in PostgresIntegrationSuite for Scala 2.13
SparkQA commented on pull request #30166: URL: https://github.com/apache/spark/pull/30166#issuecomment-721530641 **[Test build #130587 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130587/testReport)** for PR 30166 at commit [`d0decd8`](https://github.com/apache/spark/commit/d0decd8994f67c0085ef2f71c03d4f04da4fe64c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #30166: [SPARK-33265][TEST] Rename classOf[Seq] to classOf[scala.collection.Seq] in PostgresIntegrationSuite for Scala 2.13
SparkQA removed a comment on pull request #30166: URL: https://github.com/apache/spark/pull/30166#issuecomment-721488609 **[Test build #130587 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130587/testReport)** for PR 30166 at commit [`d0decd8`](https://github.com/apache/spark/commit/d0decd8994f67c0085ef2f71c03d4f04da4fe64c).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30122: [SPARK-33214][TEST][HIVE] Stop HiveExternalCatalogVersionsSuite from using a hard-coded location to store localized Spark binar
AmplabJenkins removed a comment on pull request #30122: URL: https://github.com/apache/spark/pull/30122#issuecomment-721521859
[GitHub] [spark] AmplabJenkins commented on pull request #30122: [SPARK-33214][TEST][HIVE] Stop HiveExternalCatalogVersionsSuite from using a hard-coded location to store localized Spark binaries.
AmplabJenkins commented on pull request #30122: URL: https://github.com/apache/spark/pull/30122#issuecomment-721521859
[GitHub] [spark] AmplabJenkins commented on pull request #30244: [SPARK-33282] Migrate from dead probot autolabeler to GitHub labeler action
AmplabJenkins commented on pull request #30244: URL: https://github.com/apache/spark/pull/30244#issuecomment-721521860 Can one of the admins verify this patch?
[GitHub] [spark] HyukjinKwon commented on pull request #30244: [SPARK-33282] Migrate from dead probot autolabeler to GitHub labeler action
HyukjinKwon commented on pull request #30244: URL: https://github.com/apache/spark/pull/30244#issuecomment-721521439 ok to test
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30244: [SPARK-33282] Migrate from dead probot autolabeler to GitHub labeler action
AmplabJenkins removed a comment on pull request #30244: URL: https://github.com/apache/spark/pull/30244#issuecomment-721520106 Can one of the admins verify this patch?
[GitHub] [spark] SparkQA removed a comment on pull request #30122: [SPARK-33214][TEST][HIVE] Stop HiveExternalCatalogVersionsSuite from using a hard-coded location to store localized Spark binaries.
SparkQA removed a comment on pull request #30122: URL: https://github.com/apache/spark/pull/30122#issuecomment-721493878 **[Test build #130588 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130588/testReport)** for PR 30122 at commit [`c56cf1f`](https://github.com/apache/spark/commit/c56cf1f0a6fde354fc80a3db1def6e563a0fbe94).
[GitHub] [spark] SparkQA commented on pull request #30122: [SPARK-33214][TEST][HIVE] Stop HiveExternalCatalogVersionsSuite from using a hard-coded location to store localized Spark binaries.
SparkQA commented on pull request #30122: URL: https://github.com/apache/spark/pull/30122#issuecomment-721521320 **[Test build #130588 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130588/testReport)** for PR 30122 at commit [`c56cf1f`](https://github.com/apache/spark/commit/c56cf1f0a6fde354fc80a3db1def6e563a0fbe94).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] imback82 commented on a change in pull request #30229: [SPARK-33321][SQL] Migrate ANALYZE TABLE commands to use UnresolvedTableOrView to resolve the identifier
imback82 commented on a change in pull request #30229: URL: https://github.com/apache/spark/pull/30229#discussion_r517105993

## File path: sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala ##
@@ -2606,6 +2606,13 @@ class DataSourceV2SQLSuite
     }
   }
 
+  private def testNotSupportedV2Command(sqlCommand: String, sqlParams: String): Unit = {
+    val e = intercept[AnalysisException] {
+      sql(s"$sqlCommand $sqlParams")
+    }
+    assert(e.message.contains(s"$sqlCommand is not supported for v2 tables"))

Review comment: Yes, the plan is to move all the commands using `parseV1Table` in `ResolveSessionCatalog` to use the new framework and unify the message in the process.
[GitHub] [spark] AmplabJenkins commented on pull request #30244: [SPARK-33282] Migrate from dead probot autolabeler to GitHub labeler action
AmplabJenkins commented on pull request #30244: URL: https://github.com/apache/spark/pull/30244#issuecomment-721520106 Can one of the admins verify this patch?
[GitHub] [spark] kbendick opened a new pull request #30244: [SPARK-33282] Migrate from dead probot autolabeler to GitHub labeler action
kbendick opened a new pull request #30244: URL: https://github.com/apache/spark/pull/30244

### What changes were proposed in this pull request?

This PR removes the old Probot Autolabeler labeling configuration, as the probot autolabeler has been deprecated. I've updated the configs in Iceberg and in Avro, and we also need to update here.

This PR adds an additional workflow for labeling PRs and migrates the old probot config to the new format.

Unfortunately, because certain features have not been released upstream, we will not get the _exact_ behavior as before. I have documented where that is and what changes are needed, and in the associated ticket I've also discussed other options and why I think this is the best way to go.

A follow-up ticket is definitely needed to get the original behavior back in these few cases, but PRs have not been labeled for almost a month, so it's probably best to get it right 95% of the time and occasionally have some UI-related PRs labeled as `CORE` while the issue is resolved upstream and/or further investigated.

### Why are the changes needed?

The probot autolabeler is dead and will not be maintained going forward. This has been confirmed with github user [at]mithro in an issue in their repository.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

By merging into my personal fork and then running a large number of tests. Unfortunately, I've overwritten my fork with the apache repo in order to create a proper PR. However, I've also added the config for the same thing in the Iceberg repo as well as the Avro repo.

I can work on adding tests for this, but I'm pretty swamped this week and the next, so either somebody else would have to take the lead on that, we'd have to wait, or we can just inspect it, merge it, and then observe what happens as PRs come in. I personally vote for the last option, but that's probably because this is the third one of these I've done in the `apache` sphere and I admittedly won't have much time to work on it for another week or so. I'm happy to make tickets and to let anybody who would like to take the reins.

I've also discovered that we're likely not killing GitHub Actions runs (like large tests etc.) when users push to their PR. In order to save time / capacity on the runners, we should add an action in each workflow that cancels old runs when a `push` action occurs on a PR. This will likely make waiting for test runners much faster, especially if added to all of the workflows in the Apache account (as GitHub Actions API limits are set at the account level).

Admittedly, the fact that the "old" workflow runs weren't cancelled could be because I was working in a fork. But there are explicit actions that can be added to the start of workflows to cancel old PR workflow runs, and we don't have them configured, which indicates to me that this is likely the case in this repo (and in most `apache` repos as well).

Beam is the last Apache repo that still has the probot autolabeler in it; once it has migrated, we can have Gavin from Infra remove the permissions for the probot autolabeler entirely.
[GitHub] [spark] cloud-fan commented on pull request #30154: [SPARK-32405][SQL] Apply table options while creating tables in JDBC Table Catalog
cloud-fan commented on pull request #30154: URL: https://github.com/apache/spark/pull/30154#issuecomment-721519171 I think the principle is: when users specify some table properties, JDBC V2 should fail if the underlying database can't support the properties. I prefer a dialect API that generates the CREATE TABLE SQL statement, because `comment` is not the only table property. It's cumbersome to add one dialect API for each table property, like comment, charset, etc.
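To make the suggestion above concrete, here is a minimal, self-contained sketch of a single dialect hook that receives all table properties at once and fails on any it cannot express. Every name in it (`TableSqlDialect`, `supportedProperties`, `render`, the two example dialects) is hypothetical and chosen for illustration; it is not Spark's `JdbcDialect` API, and the `COMMENT ON TABLE` syntax is only an assumption about what a particular database might accept.

```scala
// Hypothetical sketch: these names are NOT part of Spark's JdbcDialect API.
trait TableSqlDialect {
  /** Table properties this dialect knows how to express in SQL. */
  def supportedProperties: Set[String]

  /** Returns the SQL statements to run, in order, to create the table.
    * Fails fast if any requested property is not supported. */
  def createTableSql(
      table: String,
      schema: String,
      properties: Map[String, String]): Seq[String] = {
    val unsupported = properties.keySet -- supportedProperties
    if (unsupported.nonEmpty) {
      throw new UnsupportedOperationException(
        s"Table properties not supported: ${unsupported.toSeq.sorted.mkString(", ")}")
    }
    render(table, schema, properties)
  }

  /** Dialect-specific rendering; only called with supported properties. */
  protected def render(
      table: String,
      schema: String,
      properties: Map[String, String]): Seq[String]
}

// Example dialect that emits the comment as a separate statement (assumed syntax).
object CommentingDialect extends TableSqlDialect {
  val supportedProperties: Set[String] = Set("comment")

  protected def render(
      table: String,
      schema: String,
      properties: Map[String, String]): Seq[String] = {
    val create = s"CREATE TABLE $table ($schema)"
    val comment = properties.get("comment").map { c =>
      // Double any single quotes so the literal stays well-formed.
      s"COMMENT ON TABLE $table IS '${c.replace("'", "''")}'"
    }
    create +: comment.toSeq
  }
}

// Example dialect for a database without table comments.
object NoCommentDialect extends TableSqlDialect {
  val supportedProperties: Set[String] = Set.empty[String]

  protected def render(
      table: String,
      schema: String,
      properties: Map[String, String]): Seq[String] =
    Seq(s"CREATE TABLE $table ($schema)")
}
```

With this shape, adding a new property such as a charset would mean extending one dialect's `supportedProperties` and `render`, rather than adding a new method to every dialect.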
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30243: [SPARK-33335][SQL] Support `array_contains_array` func
AmplabJenkins removed a comment on pull request #30243: URL: https://github.com/apache/spark/pull/30243#issuecomment-721517967 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/35190/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30243: [SPARK-33335][SQL] Support `array_contains_array` func
AmplabJenkins removed a comment on pull request #30243: URL: https://github.com/apache/spark/pull/30243#issuecomment-721517959 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #30242: [SPARK-33277][PYSPARK][SQL][FOLLOW-UP] Block TaskCompletion event until the thread ends.
SparkQA commented on pull request #30242: URL: https://github.com/apache/spark/pull/30242#issuecomment-721517981 **[Test build #130591 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130591/testReport)** for PR 30242 at commit [`997e1aa`](https://github.com/apache/spark/commit/997e1aac7548518b39bf0a46546cdc4813544a65).
[GitHub] [spark] AmplabJenkins commented on pull request #30243: [SPARK-33335][SQL] Support `array_contains_array` func
AmplabJenkins commented on pull request #30243: URL: https://github.com/apache/spark/pull/30243#issuecomment-721517959
[GitHub] [spark] SparkQA commented on pull request #30243: [SPARK-33335][SQL] Support `array_contains_array` func
SparkQA commented on pull request #30243: URL: https://github.com/apache/spark/pull/30243#issuecomment-721517941 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35190/
[GitHub] [spark] SparkQA removed a comment on pull request #30242: [SPARK-33277][PYSPARK][SQL][FOLLOW-UP] Block TaskCompletion event until the thread ends.
SparkQA removed a comment on pull request #30242: URL: https://github.com/apache/spark/pull/30242#issuecomment-721452935 **[Test build #130582 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130582/testReport)** for PR 30242 at commit [`3f96145`](https://github.com/apache/spark/commit/3f96145c5e867df1c1d68d841378b45b1c2e6842).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30242: [SPARK-33277][PYSPARK][SQL][FOLLOW-UP] Block TaskCompletion event until the thread ends.
AmplabJenkins removed a comment on pull request #30242: URL: https://github.com/apache/spark/pull/30242#issuecomment-721515943
[GitHub] [spark] cloud-fan commented on a change in pull request #30229: [SPARK-33321][SQL] Migrate ANALYZE TABLE commands to use UnresolvedTableOrView to resolve the identifier
cloud-fan commented on a change in pull request #30229: URL: https://github.com/apache/spark/pull/30229#discussion_r517101470

## File path: sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala ##
@@ -2606,6 +2606,13 @@ class DataSourceV2SQLSuite
     }
   }
 
+  private def testNotSupportedV2Command(sqlCommand: String, sqlParams: String): Unit = {
+    val e = intercept[AnalysisException] {
+      sql(s"$sqlCommand $sqlParams")
+    }
+    assert(e.message.contains(s"$sqlCommand is not supported for v2 tables"))

Review comment: Not related to this PR, but we should unify the error message, to either `is not supported for v2 tables` or `is only supported with v1 tables`.
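One way to picture the unification suggested above is a single helper that owns the wording, so every command reports the same text. This is only an illustrative sketch: `V2Errors` and `notSupportedForV2Tables` are hypothetical names, not existing Spark code, and the wording picked here is simply the first of the two options mentioned in the comment.

```scala
// Hypothetical helper, not part of Spark: one place that owns the message wording.
object V2Errors {
  /** Builds the message for a command that cannot run against a v2 table. */
  def notSupportedForV2Tables(command: String): String =
    s"$command is not supported for v2 tables."
}
```

A call such as `V2Errors.notSupportedForV2Tables("ANALYZE TABLE")` would then yield the same sentence wherever the check is made, and a test helper like `testNotSupportedV2Command` could assert against one fixed phrase.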
[GitHub] [spark] AmplabJenkins commented on pull request #30242: [SPARK-33277][PYSPARK][SQL][FOLLOW-UP] Block TaskCompletion event until the thread ends.
AmplabJenkins commented on pull request #30242: URL: https://github.com/apache/spark/pull/30242#issuecomment-721515943
[GitHub] [spark] SparkQA commented on pull request #30242: [SPARK-33277][PYSPARK][SQL][FOLLOW-UP] Block TaskCompletion event until the thread ends.
SparkQA commented on pull request #30242: URL: https://github.com/apache/spark/pull/30242#issuecomment-721515404 **[Test build #130582 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130582/testReport)** for PR 30242 at commit [`3f96145`](https://github.com/apache/spark/commit/3f96145c5e867df1c1d68d841378b45b1c2e6842).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] cloud-fan commented on a change in pull request #30229: [SPARK-33321][SQL] Migrate ANALYZE TABLE commands to use UnresolvedTableOrView to resolve the identifier
cloud-fan commented on a change in pull request #30229: URL: https://github.com/apache/spark/pull/30229#discussion_r517100450

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ##
@@ -3249,18 +3249,23 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
     val tableName = visitMultipartIdentifier(ctx.multipartIdentifier())
     if (ctx.ALL() != null) {
       checkPartitionSpec()
-      AnalyzeColumnStatement(tableName, None, allColumns = true)
+      AnalyzeColumn(UnresolvedTableOrView(tableName), None, allColumns = true)
     } else if (ctx.identifierSeq() == null) {
       val partitionSpec = if (ctx.partitionSpec != null) {
         visitPartitionSpec(ctx.partitionSpec)
       } else {
         Map.empty[String, Option[String]]
       }
-      AnalyzeTableStatement(tableName, partitionSpec, noScan = ctx.identifier != null)
+      AnalyzeTable(
+        UnresolvedTableOrView(tableName, allowTempView = false),
+        partitionSpec,
+        noScan = ctx.identifier != null)
     } else {
       checkPartitionSpec()
-      AnalyzeColumnStatement(
-        tableName, Option(visitIdentifierSeq(ctx.identifierSeq())), allColumns = false)
+      AnalyzeColumn(
+        UnresolvedTableOrView(tableName),

Review comment: `AnalyzeColumn` allows temp view but `AnalyzeTable` does not?
[GitHub] [spark] cloud-fan commented on a change in pull request #30203: [SPARK-33303][SQL] Deduplicate deterministic PythonUDF calls
cloud-fan commented on a change in pull request #30203: URL: https://github.com/apache/spark/pull/30203#discussion_r517099242

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala ##
@@ -218,13 +218,22 @@ object ExtractPythonUDFs extends Rule[LogicalPlan] with PredicateHelper {
     }
   }
 
+  private def canonicalizeDeterministic(u: PythonUDF) = {

Review comment: Changing the default is orthogonal to this PR IMO.
[GitHub] [spark] cloud-fan closed pull request #30230: [SPARK-33323][SQL] Add query resolved check before convert hive relation
cloud-fan closed pull request #30230: URL: https://github.com/apache/spark/pull/30230
[GitHub] [spark] cloud-fan commented on pull request #30230: [SPARK-33323][SQL] Add query resolved check before convert hive relation
cloud-fan commented on pull request #30230: URL: https://github.com/apache/spark/pull/30230#issuecomment-721512840 thanks, merging to master!
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30242: [SPARK-33277][PYSPARK][SQL][FOLLOW-UP] Block TaskCompletion event until the thread ends.
AmplabJenkins removed a comment on pull request #30242: URL: https://github.com/apache/spark/pull/30242#issuecomment-721510623 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/35189/ Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #30242: [SPARK-33277][PYSPARK][SQL][FOLLOW-UP] Block TaskCompletion event until the thread ends.
SparkQA commented on pull request #30242: URL: https://github.com/apache/spark/pull/30242#issuecomment-721510610 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35189/
[GitHub] [spark] AmplabJenkins commented on pull request #30242: [SPARK-33277][PYSPARK][SQL][FOLLOW-UP] Block TaskCompletion event until the thread ends.
AmplabJenkins commented on pull request #30242: URL: https://github.com/apache/spark/pull/30242#issuecomment-721510614
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30242: [SPARK-33277][PYSPARK][SQL][FOLLOW-UP] Block TaskCompletion event until the thread ends.
AmplabJenkins removed a comment on pull request #30242: URL: https://github.com/apache/spark/pull/30242#issuecomment-721510614 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30122: [SPARK-33214][TEST][HIVE] Stop HiveExternalCatalogVersionsSuite from using a hard-coded location to store localized Spark binar
AmplabJenkins removed a comment on pull request #30122: URL: https://github.com/apache/spark/pull/30122#issuecomment-721509935 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/35188/ Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #30243: [SPARK-33335][SQL] Support `array_contains_array` func
SparkQA commented on pull request #30243: URL: https://github.com/apache/spark/pull/30243#issuecomment-721510264 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35190/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30122: [SPARK-33214][TEST][HIVE] Stop HiveExternalCatalogVersionsSuite from using a hard-coded location to store localized Spark binar
AmplabJenkins removed a comment on pull request #30122: URL: https://github.com/apache/spark/pull/30122#issuecomment-721509927 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #30122: [SPARK-33214][TEST][HIVE] Stop HiveExternalCatalogVersionsSuite from using a hard-coded location to store localized Spark binaries.
SparkQA commented on pull request #30122: URL: https://github.com/apache/spark/pull/30122#issuecomment-721509917 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35188/
[GitHub] [spark] AmplabJenkins commented on pull request #30122: [SPARK-33214][TEST][HIVE] Stop HiveExternalCatalogVersionsSuite from using a hard-coded location to store localized Spark binaries.
AmplabJenkins commented on pull request #30122: URL: https://github.com/apache/spark/pull/30122#issuecomment-721509927
[GitHub] [spark] cloud-fan commented on a change in pull request #30229: [SPARK-33321][SQL] Migrate ANALYZE TABLE commands to use UnresolvedTableOrView to resolve the identifier
cloud-fan commented on a change in pull request #30229: URL: https://github.com/apache/spark/pull/30229#discussion_r517095433

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala ##
@@ -280,6 +280,9 @@ class DataSourceV2Strategy(session: SparkSession) extends Strategy with Predicat
     case r @ ShowTableProperties(rt: ResolvedTable, propertyKey) =>
       ShowTablePropertiesExec(r.output, rt.table, propertyKey) :: Nil
 
+    case AnalyzeTable(_: ResolvedTable, _, _) | AnalyzeColumn(_: ResolvedTable, _, _) =>
+      throw new AnalysisException("ANALYZE TABLE is not supported for v2 tables.")

Review comment: ah ok, then it's fine.
[GitHub] [spark] cloud-fan commented on a change in pull request #30222: [SPARK-33315][SQL] Simplify CaseWhen with EqualTo
cloud-fan commented on a change in pull request #30222: URL: https://github.com/apache/spark/pull/30222#discussion_r517095107

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ##
@@ -510,6 +510,10 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper {
         } else {
           e.copy(branches = branches.take(i).map(branch => (branch._1, elseValue)))
         }
+
+      case EqualTo(CaseWhen(branches, _), right)

Review comment: As an example, `(CASE WHEN a=1 THEN 1 ELSE b END) = 1` can be true if `a=1` or `b=1`.
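The example in the comment above can be checked with a small standalone model. This is a simplified sketch, not Catalyst code: it uses plain `Int` values and ignores SQL NULL semantics, and the object and method names are made up for illustration. It shows that the ELSE value takes part in the result, so the comparison holds when either `a = 1` or `b = 1`.

```scala
// Simplified model with plain Ints (no SQL NULLs); names are illustrative only.
object CaseWhenExample {
  /** Models (CASE WHEN a = 1 THEN 1 ELSE b END) = 1. */
  def caseWhenEquals1(a: Int, b: Int): Boolean = {
    val caseValue = if (a == 1) 1 else b
    caseValue == 1
  }

  /** Models the equivalent predicate a = 1 OR b = 1. */
  def disjunction(a: Int, b: Int): Boolean = a == 1 || b == 1

  // Compare both forms over a small grid of inputs.
  val inputs: Seq[(Int, Int)] = for (a <- 0 to 2; b <- 0 to 2) yield (a, b)
  val agree: Boolean = inputs.forall { case (a, b) =>
    caseWhenEquals1(a, b) == disjunction(a, b)
  }
}
```

In this model `caseWhenEquals1(0, 1)` is true purely because of the ELSE branch, which is the case a rewrite that looks only at the WHEN branches would miss.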