[GitHub] [spark] dongjinleekr commented on issue #25251: [MINOR] Trivial cleanups
dongjinleekr commented on issue #25251: [MINOR] Trivial cleanups URL: https://github.com/apache/spark/pull/25251#issuecomment-514886740 @maropu I will give it a try. Let me see. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #25068: [SPARK-28156][SQL][BACKPORT-2.4] Self-join should not miss cached view
AmplabJenkins removed a comment on issue #25068: [SPARK-28156][SQL][BACKPORT-2.4] Self-join should not miss cached view URL: https://github.com/apache/spark/pull/25068#issuecomment-514886374 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25068: [SPARK-28156][SQL][BACKPORT-2.4] Self-join should not miss cached view
AmplabJenkins removed a comment on issue #25068: [SPARK-28156][SQL][BACKPORT-2.4] Self-join should not miss cached view URL: https://github.com/apache/spark/pull/25068#issuecomment-514886379 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108138/
[GitHub] [spark] AmplabJenkins commented on issue #25068: [SPARK-28156][SQL][BACKPORT-2.4] Self-join should not miss cached view
AmplabJenkins commented on issue #25068: [SPARK-28156][SQL][BACKPORT-2.4] Self-join should not miss cached view URL: https://github.com/apache/spark/pull/25068#issuecomment-514886379 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108138/
[GitHub] [spark] SparkQA removed a comment on issue #25068: [SPARK-28156][SQL][BACKPORT-2.4] Self-join should not miss cached view
SparkQA removed a comment on issue #25068: [SPARK-28156][SQL][BACKPORT-2.4] Self-join should not miss cached view URL: https://github.com/apache/spark/pull/25068#issuecomment-514836024 **[Test build #108138 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108138/testReport)** for PR 25068 at commit [`474c16e`](https://github.com/apache/spark/commit/474c16ea5eae33da3f198c8f240a77ae8b1ce13e).
[GitHub] [spark] AmplabJenkins commented on issue #25068: [SPARK-28156][SQL][BACKPORT-2.4] Self-join should not miss cached view
AmplabJenkins commented on issue #25068: [SPARK-28156][SQL][BACKPORT-2.4] Self-join should not miss cached view URL: https://github.com/apache/spark/pull/25068#issuecomment-514886374 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #25068: [SPARK-28156][SQL][BACKPORT-2.4] Self-join should not miss cached view
SparkQA commented on issue #25068: [SPARK-28156][SQL][BACKPORT-2.4] Self-join should not miss cached view URL: https://github.com/apache/spark/pull/25068#issuecomment-514886133 **[Test build #108138 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108138/testReport)** for PR 25068 at commit [`474c16e`](https://github.com/apache/spark/commit/474c16ea5eae33da3f198c8f240a77ae8b1ce13e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24879: [SPARK-28042][K8S] Support using volume mount as local storage
AmplabJenkins removed a comment on issue #24879: [SPARK-28042][K8S] Support using volume mount as local storage URL: https://github.com/apache/spark/pull/24879#issuecomment-514885083 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13251/
[GitHub] [spark] AmplabJenkins removed a comment on issue #24879: [SPARK-28042][K8S] Support using volume mount as local storage
AmplabJenkins removed a comment on issue #24879: [SPARK-28042][K8S] Support using volume mount as local storage URL: https://github.com/apache/spark/pull/24879#issuecomment-514885080 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #24879: [SPARK-28042][K8S] Support using volume mount as local storage
SparkQA commented on issue #24879: [SPARK-28042][K8S] Support using volume mount as local storage URL: https://github.com/apache/spark/pull/24879#issuecomment-514885053 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/13251/
[GitHub] [spark] AmplabJenkins commented on issue #24879: [SPARK-28042][K8S] Support using volume mount as local storage
AmplabJenkins commented on issue #24879: [SPARK-28042][K8S] Support using volume mount as local storage URL: https://github.com/apache/spark/pull/24879#issuecomment-514885080 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24879: [SPARK-28042][K8S] Support using volume mount as local storage
AmplabJenkins commented on issue #24879: [SPARK-28042][K8S] Support using volume mount as local storage URL: https://github.com/apache/spark/pull/24879#issuecomment-514885083 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13251/
[GitHub] [spark] AmplabJenkins removed a comment on issue #25236: [SPARK-28487][k8s] More responsive dynamic allocation with K8S.
AmplabJenkins removed a comment on issue #25236: [SPARK-28487][k8s] More responsive dynamic allocation with K8S. URL: https://github.com/apache/spark/pull/25236#issuecomment-514883978 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13250/
[GitHub] [spark] AmplabJenkins removed a comment on issue #25236: [SPARK-28487][k8s] More responsive dynamic allocation with K8S.
AmplabJenkins removed a comment on issue #25236: [SPARK-28487][k8s] More responsive dynamic allocation with K8S. URL: https://github.com/apache/spark/pull/25236#issuecomment-514883973 Merged build finished. Test PASSed.
[GitHub] [spark] brkyvz commented on a change in pull request #24832: [SPARK-27845][SQL] DataSourceV2: InsertTable
brkyvz commented on a change in pull request #24832: [SPARK-27845][SQL] DataSourceV2: InsertTable URL: https://github.com/apache/spark/pull/24832#discussion_r307102960

File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala

```diff
@@ -284,9 +294,13 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
    */
   override def visitInsertIntoTable(
       ctx: InsertIntoTableContext): InsertTableParams = withOrigin(ctx) {
-    val tableIdent = visitTableIdentifier(ctx.tableIdentifier)
+    val tableIdent = visitMultipartIdentifier(ctx.multipartIdentifier)
     val partitionKeys = Option(ctx.partitionSpec).map(visitPartitionSpec).getOrElse(Map.empty)
+    if (ctx.EXISTS != null) {
```

Review comment: what's the point of adding this to the parser if we're not going to support it?
[GitHub] [spark] brkyvz commented on a change in pull request #24832: [SPARK-27845][SQL] DataSourceV2: InsertTable
brkyvz commented on a change in pull request #24832: [SPARK-27845][SQL] DataSourceV2: InsertTable URL: https://github.com/apache/spark/pull/24832#discussion_r307103799

File path: sql/core/src/test/scala/org/apache/spark/sql/sources/v2/TestInMemoryTableCatalog.scala

```diff
@@ -149,49 +161,89 @@ private class InMemoryTable(
   }

   override def newWriteBuilder(options: CaseInsensitiveStringMap): WriteBuilder = {
-    new WriteBuilder with SupportsTruncate {
-      private var shouldTruncate: Boolean = false
+    new WriteBuilder with SupportsTruncate with SupportsOverwrite with SupportsDynamicOverwrite {
+      private var writer: BatchWrite = Append

       override def truncate(): WriteBuilder = {
-        shouldTruncate = true
+        assert(writer == Append)
+        writer = TruncateAndAppend
+        this
+      }
+
+      override def overwrite(filters: Array[Filter]): WriteBuilder = {
+        assert(writer == Append)
+        writer = new Overwrite(filters)
         this
       }

-      override def buildForBatch(): BatchWrite = {
-        if (shouldTruncate) TruncateAndAppend else Append
+      override def overwriteDynamicPartitions(): WriteBuilder = {
+        assert(writer == Append)
+        writer = DynamicOverwrite
+        this
       }
+
+      override def buildForBatch(): BatchWrite = writer
     }
   }

-  private object TruncateAndAppend extends BatchWrite {
+  private abstract class TestBatchWrite extends BatchWrite {
     override def createBatchWriterFactory(): DataWriterFactory = {
       BufferedRowsWriterFactory
     }

-    override def commit(messages: Array[WriterCommitMessage]): Unit = {
-      replaceData(messages.map(_.asInstanceOf[BufferedRows]))
+    override def abort(messages: Array[WriterCommitMessage]): Unit = {
+    }
+  }

-    override def abort(messages: Array[WriterCommitMessage]): Unit = {
+  private object Append extends TestBatchWrite {
+    override def commit(messages: Array[WriterCommitMessage]): Unit = dataMap.synchronized {
+      withData(messages.map(_.asInstanceOf[BufferedRows]))
     }
   }

-  private object Append extends BatchWrite {
-    override def createBatchWriterFactory(): DataWriterFactory = {
-      BufferedRowsWriterFactory
+  private object DynamicOverwrite extends TestBatchWrite {
+    override def commit(messages: Array[WriterCommitMessage]): Unit = dataMap.synchronized {
+      val newData = messages.map(_.asInstanceOf[BufferedRows])
+      dataMap --= newData.flatMap(_.rows.map(getKey))
+      withData(newData)
     }
+  }

-    override def commit(messages: Array[WriterCommitMessage]): Unit = {
-      replaceData(data ++ messages.map(_.asInstanceOf[BufferedRows]))
+  private class Overwrite(filters: Array[Filter]) extends TestBatchWrite {
+    override def commit(messages: Array[WriterCommitMessage]): Unit = dataMap.synchronized {
+      val deleteKeys = dataMap.keys.filter { partValues =>
+        filters.exists {
+          case EqualTo(attr, value) =>
+            partFieldNames.zipWithIndex.find(_._1 == attr) match {
+              case Some((_, partIndex)) =>
+                value == partValues(partIndex)
+              case _ =>
+                throw new IllegalArgumentException(s"Unknown filter attribute: $attr")
+            }
+          case f @ _ =>
+            throw new IllegalArgumentException(s"Unsupported filter type: $f")
+        }
+      }
+      dataMap --= deleteKeys
+      withData(messages.map(_.asInstanceOf[BufferedRows]))
     }
+  }

-    override def abort(messages: Array[WriterCommitMessage]): Unit = {
+  private object TruncateAndAppend extends TestBatchWrite {
+    override def commit(messages: Array[WriterCommitMessage]): Unit = dataMap.synchronized {
+      dataMap = mutable.Map.empty
```

Review comment: @rdblue You forgot to address this?
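For context on the `Overwrite` commit being discussed: it deletes every partition key that satisfies the pushed-down `EqualTo` filters before writing the new rows. Below is a simplified sketch of that key-matching logic, with plain Scala collections standing in for the test catalog's `dataMap`; all names and data here are illustrative, not Spark's API.

```scala
// Illustrative sketch: drop partition keys that satisfy every EqualTo filter,
// mimicking the Overwrite commit in the in-memory test catalog.
case class EqualTo(attr: String, value: Any)

val partFieldNames = Seq("region", "day")
var dataMap = Map(
  Seq("us", "mon") -> "rows-a",
  Seq("us", "tue") -> "rows-b",
  Seq("eu", "mon") -> "rows-c")

def overwrite(filters: Seq[EqualTo]): Unit = {
  val deleteKeys = dataMap.keys.filter { partValues =>
    filters.forall { case EqualTo(attr, value) =>
      partFieldNames.indexOf(attr) match {
        case -1 => throw new IllegalArgumentException(s"Unknown filter attribute: $attr")
        case i  => value == partValues(i)
      }
    }
  }
  dataMap --= deleteKeys  // new rows would be written here afterwards
}

overwrite(Seq(EqualTo("region", "us")))
assert(dataMap.keySet == Set(Seq("eu", "mon")))
```

Using `forall` across the filters gives AND semantics over the pushed-down predicates, which matches how a static partition overwrite selects the partitions to replace.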
[GitHub] [spark] AmplabJenkins commented on issue #25236: [SPARK-28487][k8s] More responsive dynamic allocation with K8S.
AmplabJenkins commented on issue #25236: [SPARK-28487][k8s] More responsive dynamic allocation with K8S. URL: https://github.com/apache/spark/pull/25236#issuecomment-514883978 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13250/
[GitHub] [spark] AmplabJenkins commented on issue #25236: [SPARK-28487][k8s] More responsive dynamic allocation with K8S.
AmplabJenkins commented on issue #25236: [SPARK-28487][k8s] More responsive dynamic allocation with K8S. URL: https://github.com/apache/spark/pull/25236#issuecomment-514883973 Merged build finished. Test PASSed.
[GitHub] [spark] brkyvz commented on a change in pull request #24832: [SPARK-27845][SQL] DataSourceV2: InsertTable
brkyvz commented on a change in pull request #24832: [SPARK-27845][SQL] DataSourceV2: InsertTable URL: https://github.com/apache/spark/pull/24832#discussion_r307103888

File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceResolution.scala

```diff
@@ -23,17 +23,19 @@ import scala.collection.mutable
 import org.apache.spark.sql.{AnalysisException, SaveMode}
 import org.apache.spark.sql.catalog.v2.{CatalogPlugin, Identifier, LookupCatalog, TableCatalog}
-import org.apache.spark.sql.catalog.v2.expressions.Transform
+import org.apache.spark.sql.catalog.v2.expressions.{FieldReference, IdentityTransform, Transform}
```

Review comment: are any of the changes here needed?
[GitHub] [spark] brkyvz commented on a change in pull request #24832: [SPARK-27845][SQL] DataSourceV2: InsertTable
brkyvz commented on a change in pull request #24832: [SPARK-27845][SQL] DataSourceV2: InsertTable URL: https://github.com/apache/spark/pull/24832#discussion_r307103748

File path: sql/core/src/test/scala/org/apache/spark/sql/sources/v2/TestInMemoryTableCatalog.scala

```diff
@@ -157,43 +170,86 @@ class InMemoryTable(
   override def newWriteBuilder(options: CaseInsensitiveStringMap): WriteBuilder = {
     TestInMemoryTableCatalog.maybeSimulateFailedTableWrite(options)
-    new WriteBuilder with SupportsTruncate {
-      private var shouldTruncate: Boolean = false
+
+    new WriteBuilder with SupportsTruncate with SupportsOverwrite with SupportsDynamicOverwrite {
+      private var writer: BatchWrite = Append

       override def truncate(): WriteBuilder = {
-        shouldTruncate = true
+        assert(writer == Append)
+        writer = TruncateAndAppend
         this
       }

-      override def buildForBatch(): BatchWrite = {
-        if (shouldTruncate) TruncateAndAppend else Append
+      override def overwrite(filters: Array[Filter]): WriteBuilder = {
+        assert(writer == Append)
+        writer = new Overwrite(filters)
+        this
       }
+
+      override def overwriteDynamicPartitions(): WriteBuilder = {
+        assert(writer == Append)
+        writer = DynamicOverwrite
+        this
+      }
+
+      override def buildForBatch(): BatchWrite = writer
     }
   }

-  private object TruncateAndAppend extends BatchWrite {
+  private abstract class TestBatchWrite extends BatchWrite {
     override def createBatchWriterFactory(): DataWriterFactory = {
       BufferedRowsWriterFactory
     }

-    override def commit(messages: Array[WriterCommitMessage]): Unit = {
-      replaceData(messages.map(_.asInstanceOf[BufferedRows]))
+    override def abort(messages: Array[WriterCommitMessage]): Unit = {
+    }
+  }

-    override def abort(messages: Array[WriterCommitMessage]): Unit = {
+  private object Append extends TestBatchWrite {
+    override def commit(messages: Array[WriterCommitMessage]): Unit = dataMap.synchronized {
+      withData(messages.map(_.asInstanceOf[BufferedRows]))
     }
   }

-  private object Append extends BatchWrite {
-    override def createBatchWriterFactory(): DataWriterFactory = {
-      BufferedRowsWriterFactory
+  private object DynamicOverwrite extends TestBatchWrite {
+    override def commit(messages: Array[WriterCommitMessage]): Unit = dataMap.synchronized {
+      val newData = messages.map(_.asInstanceOf[BufferedRows])
+      dataMap --= newData.flatMap(_.rows.map(getKey))
+      withData(newData)
     }
+  }

-    override def commit(messages: Array[WriterCommitMessage]): Unit = {
-      replaceData(data ++ messages.map(_.asInstanceOf[BufferedRows]))
+  private class Overwrite(filters: Array[Filter]) extends TestBatchWrite {
+    override def commit(messages: Array[WriterCommitMessage]): Unit = dataMap.synchronized {
+      val deleteKeys = dataMap.keys.filter { partValues =>
+        filters.flatMap(splitAnd).forall {
+          case EqualTo(attr, value) =>
+            partFieldNames.zipWithIndex.find(_._1 == attr) match {
+              case Some((_, partIndex)) =>
+                value == partValues(partIndex)
+              case _ =>
+                throw new IllegalArgumentException(s"Unknown filter attribute: $attr")
+            }
+          case f @ _ =>
```

Review comment: nit, no need for `@ _`
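The `@ _` nit above is purely syntactic: in Scala, `case f @ _` binds `f` to a wildcard pattern that matches anything, which is exactly what a bare `case f` already does. A minimal, self-contained sketch (the `Filter` hierarchy here is a stand-in for illustration, not Spark's):

```scala
// Demo: `f @ _` and a bare variable pattern `f` have identical semantics.
sealed trait Filter
case class EqualTo(attr: String, value: Any) extends Filter
case class GreaterThan(attr: String, value: Any) extends Filter

def describeVerbose(filter: Filter): String = filter match {
  case EqualTo(a, v) => s"$a = $v"
  case f @ _         => s"unsupported: $f"  // binder attached to a wildcard
}

def describeTerse(filter: Filter): String = filter match {
  case EqualTo(a, v) => s"$a = $v"
  case f             => s"unsupported: $f"  // same behavior, shorter
}

assert(describeVerbose(EqualTo("a", 2)) == "a = 2")
assert(describeVerbose(GreaterThan("x", 1)) == describeTerse(GreaterThan("x", 1)))
```

The `x @ p` binder form earns its keep only when `p` is more specific than `_`, e.g. `case f @ EqualTo(_, _)` to keep the whole matched value while still constraining its shape.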
[GitHub] [spark] brkyvz commented on a change in pull request #24832: [SPARK-27845][SQL] DataSourceV2: InsertTable
brkyvz commented on a change in pull request #24832: [SPARK-27845][SQL] DataSourceV2: InsertTable URL: https://github.com/apache/spark/pull/24832#discussion_r307058102

File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4

```diff
@@ -294,8 +294,8 @@ query
     ;

 insertInto
-    : INSERT OVERWRITE TABLE tableIdentifier (partitionSpec (IF NOT EXISTS)?)?        #insertOverwriteTable
-    | INSERT INTO TABLE? tableIdentifier partitionSpec?                               #insertIntoTable
+    : INSERT OVERWRITE TABLE? multipartIdentifier (partitionSpec (IF NOT EXISTS)?)?   #insertOverwriteTable
+    | INSERT INTO TABLE? multipartIdentifier partitionSpec? (IF NOT EXISTS)?          #insertIntoTable
```

Review comment: do we need to wrap this with parentheses, `(partitionSpec (IF NOT EXISTS)?)?`, like the line above? Otherwise, what happens if there is an `IF NOT EXISTS` but no `partitionSpec`? If the table does not exist, wouldn't that be a CTAS?
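Reading the question above against the overwrite branch: keeping `IF NOT EXISTS` inside the parenthesized group ties it to a partition spec, so a bare `INSERT INTO t IF NOT EXISTS` would no longer parse. A sketch of that parenthesized alternative (purely an illustration of the reviewer's suggestion, not what the PR merged):

```
insertInto
    : INSERT OVERWRITE TABLE? multipartIdentifier (partitionSpec (IF NOT EXISTS)?)?   #insertOverwriteTable
    | INSERT INTO TABLE? multipartIdentifier (partitionSpec (IF NOT EXISTS)?)?        #insertIntoTable
    ;
```

With this shape, `IF NOT EXISTS` can only follow a `partitionSpec`, sidestepping the ambiguity with a table-level `IF NOT EXISTS` (which would read like CTAS semantics).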
[GitHub] [spark] SparkQA commented on issue #25236: [SPARK-28487][k8s] More responsive dynamic allocation with K8S.
SparkQA commented on issue #25236: [SPARK-28487][k8s] More responsive dynamic allocation with K8S. URL: https://github.com/apache/spark/pull/25236#issuecomment-514883941 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/13250/
[GitHub] [spark] SparkQA removed a comment on issue #24879: [SPARK-28042][K8S] Support using volume mount as local storage
SparkQA removed a comment on issue #24879: [SPARK-28042][K8S] Support using volume mount as local storage URL: https://github.com/apache/spark/pull/24879#issuecomment-514881351 **[Test build #108149 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108149/testReport)** for PR 24879 at commit [`6e5fcf6`](https://github.com/apache/spark/commit/6e5fcf64472cd25785673b3c9599cc59360d9381).
[GitHub] [spark] AmplabJenkins removed a comment on issue #24879: [SPARK-28042][K8S] Support using volume mount as local storage
AmplabJenkins removed a comment on issue #24879: [SPARK-28042][K8S] Support using volume mount as local storage URL: https://github.com/apache/spark/pull/24879#issuecomment-514883509 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108149/
[GitHub] [spark] AmplabJenkins removed a comment on issue #24879: [SPARK-28042][K8S] Support using volume mount as local storage
AmplabJenkins removed a comment on issue #24879: [SPARK-28042][K8S] Support using volume mount as local storage URL: https://github.com/apache/spark/pull/24879#issuecomment-514883504 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #24879: [SPARK-28042][K8S] Support using volume mount as local storage
SparkQA commented on issue #24879: [SPARK-28042][K8S] Support using volume mount as local storage URL: https://github.com/apache/spark/pull/24879#issuecomment-514883450 **[Test build #108149 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108149/testReport)** for PR 24879 at commit [`6e5fcf6`](https://github.com/apache/spark/commit/6e5fcf64472cd25785673b3c9599cc59360d9381). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on issue #24879: [SPARK-28042][K8S] Support using volume mount as local storage
AmplabJenkins commented on issue #24879: [SPARK-28042][K8S] Support using volume mount as local storage URL: https://github.com/apache/spark/pull/24879#issuecomment-514883509 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108149/
[GitHub] [spark] AmplabJenkins commented on issue #24879: [SPARK-28042][K8S] Support using volume mount as local storage
AmplabJenkins commented on issue #24879: [SPARK-28042][K8S] Support using volume mount as local storage URL: https://github.com/apache/spark/pull/24879#issuecomment-514883504 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #24879: [SPARK-28042][K8S] Support using volume mount as local storage
SparkQA commented on issue #24879: [SPARK-28042][K8S] Support using volume mount as local storage URL: https://github.com/apache/spark/pull/24879#issuecomment-514882742 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/13251/
[GitHub] [spark] maropu commented on a change in pull request #23531: [SPARK-24497][SQL] Support recursive SQL query
maropu commented on a change in pull request #23531: [SPARK-24497][SQL] Support recursive SQL query URL: https://github.com/apache/spark/pull/23531#discussion_r307102653
## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala
## @@ -228,6 +234,156 @@ case class FilterExec(condition: Expression, child: SparkPlan)
   override def outputPartitioning: Partitioning = child.outputPartitioning
 }
+
+/**
+ * Physical plan node for a recursive table that encapsulates the physical plans of the anchor
+ * terms, the logical plans of the recursive terms, and the maximum number of rows to return.
+ *
+ * Anchor terms are physical plans and they are used to initialize the query in the first run.
+ * Recursive terms are used to extend the result with new rows. They are logical plans and
+ * contain references to the result of the previous iteration or to the accumulated result so
+ * far. These references are updated with new statistics, compiled to physical plans and then
+ * updated to reflect the appropriate RDD before execution.
+ *
+ * The execution terminates once the anchor terms or the current iteration of the recursive
+ * terms return no rows, or the number of accumulated rows reaches the limit.
+ *
+ * During the execution of a recursive query the previously computed results are reused multiple
+ * times. To avoid massive recomputation of these pieces of the final result, they are cached.
+ *
+ * @param name the name of the recursive table
+ * @param anchorTerms these children are used for initializing the query
+ * @param recursiveTerms these children are used for extending the set of results with new rows
+ *                       based on the results of the previous iteration (or the anchor in the
+ *                       first iteration)
+ * @param limit the maximum number of rows to return
+ */
+case class RecursiveTableExec(
+    name: String,
+    anchorTerms: Seq[SparkPlan],
+    @transient val recursiveTerms: Seq[LogicalPlan],
+    limit: Option[Long]) extends SparkPlan {
+
+  override def children: Seq[SparkPlan] = anchorTerms
+
+  override def output: Seq[Attribute] = anchorTerms.head.output.map(_.withNullability(true))
+
+  override def simpleString(maxFields: Int): String =
+    s"RecursiveTable $name${limit.map(", " + _).getOrElse("")}"
+
+  override def innerChildren: Seq[QueryPlan[_]] = recursiveTerms ++ super.innerChildren
+
+  override protected def doExecute(): RDD[InternalRow] = {
+    val storageLevel = StorageLevel.fromString(conf.getConf(SQLConf.RECURSION_CACHE_STORAGE_LEVEL))
+
+    val prevIterationRDDs = ArrayBuffer.empty[RDD[InternalRow]]
+    var prevIterationCount = 0L
+
+    val anchorTermsIterator = anchorTerms.iterator
+    while (anchorTermsIterator.hasNext && limit.forall(_ > prevIterationCount)) {
+      val anchorTerm = anchorTermsIterator.next()
+
+      lazy val cumulatedResult = if (prevIterationRDDs.size > 1) {
+        sparkContext.union(prevIterationRDDs)
+      } else {
+        prevIterationRDDs.head
+      }
+
+      anchorTerm.foreach {
+        case rr: RecursiveReferenceExec if rr.name == name => rr.recursiveTable = cumulatedResult
+        case _ =>
+      }
+
+      val rdd = anchorTerm.execute().map(_.copy()).persist(storageLevel)
+      val count = rdd.count()
+      if (count > 0) {
+        prevIterationRDDs += rdd
+        prevIterationCount += count
+      }
+    }
+
+    val cumulatedRDDs = ArrayBuffer(prevIterationRDDs: _*)
+    var cumulatedCount = prevIterationCount
+    var level = 0
+    val levelLimit = conf.getConf(SQLConf.RECURSION_LEVEL_LIMIT)

Review comment: `conf.recursionLevelLimit`? Also, can you move this definition outside the loop?
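The review above concerns the fixpoint loop that drives recursive query execution: evaluate the anchor terms once to seed the result, then repeatedly evaluate the recursive term against the previous iteration's rows until an iteration produces no new rows, the row limit is hit, or the level limit is exceeded. A minimal Python sketch of that loop (all names are illustrative, not Spark's API; `anchor_terms` are callables producing the seed rows and `recursive_term` maps one iteration's rows to the next):

```python
def evaluate_recursive_table(anchor_terms, recursive_term, row_limit=None, level_limit=100):
    """Iteratively evaluate a recursive table to a fixpoint.

    anchor_terms: list of callables returning lists of rows (the seed).
    recursive_term: callable taking the previous iteration's rows and
    returning the next iteration's rows.  Illustrative sketch only.
    """
    result = []
    prev = []

    # Anchor phase: initialize the result, honoring the optional row limit.
    for anchor in anchor_terms:
        if row_limit is not None and len(result) >= row_limit:
            break
        rows = anchor()
        prev.extend(rows)
        result.extend(rows)

    # Recursive phase: extend until an iteration adds no rows, the row
    # limit is reached, or the level limit is exceeded.
    level = 0
    while prev and (row_limit is None or len(result) < row_limit):
        level += 1
        if level > level_limit:
            raise RuntimeError(f"recursion level limit {level_limit} exceeded")
        prev = recursive_term(prev)
        result.extend(prev)

    return result if row_limit is None else result[:row_limit]
```

For example, seeding with `[1]` and a recursive term that increments rows up to 5 yields `[1, 2, 3, 4, 5]`; the level limit guards against a recursive term that never converges, mirroring the `RECURSION_LEVEL_LIMIT` check discussed in the review.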
[GitHub] [spark] maropu commented on a change in pull request #23531: [SPARK-24497][SQL] Support recursive SQL query
maropu commented on a change in pull request #23531: [SPARK-24497][SQL] Support recursive SQL query URL: https://github.com/apache/spark/pull/23531#discussion_r307102482
## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala
## @@ -228,6 +234,156 @@ case class FilterExec(condition: Expression, child: SparkPlan)
   override def outputPartitioning: Partitioning = child.outputPartitioning
 }
+
+/**
+ * Physical plan node for a recursive table that encapsulates the physical plans of the anchor
+ * terms, the logical plans of the recursive terms, and the maximum number of rows to return.
+ *
+ * Anchor terms are physical plans and they are used to initialize the query in the first run.
+ * Recursive terms are used to extend the result with new rows. They are logical plans and
+ * contain references to the result of the previous iteration or to the accumulated result so
+ * far. These references are updated with new statistics, compiled to physical plans and then
+ * updated to reflect the appropriate RDD before execution.
+ *
+ * The execution terminates once the anchor terms or the current iteration of the recursive
+ * terms return no rows, or the number of accumulated rows reaches the limit.
+ *
+ * During the execution of a recursive query the previously computed results are reused multiple
+ * times. To avoid massive recomputation of these pieces of the final result, they are cached.
+ *
+ * @param name the name of the recursive table
+ * @param anchorTerms these children are used for initializing the query
+ * @param recursiveTerms these children are used for extending the set of results with new rows
+ *                       based on the results of the previous iteration (or the anchor in the
+ *                       first iteration)
+ * @param limit the maximum number of rows to return
+ */
+case class RecursiveTableExec(
+    name: String,
+    anchorTerms: Seq[SparkPlan],
+    @transient val recursiveTerms: Seq[LogicalPlan],
+    limit: Option[Long]) extends SparkPlan {
+
+  override def children: Seq[SparkPlan] = anchorTerms
+
+  override def output: Seq[Attribute] = anchorTerms.head.output.map(_.withNullability(true))
+
+  override def simpleString(maxFields: Int): String =
+    s"RecursiveTable $name${limit.map(", " + _).getOrElse("")}"
+
+  override def innerChildren: Seq[QueryPlan[_]] = recursiveTerms ++ super.innerChildren
+
+  override protected def doExecute(): RDD[InternalRow] = {
+    val storageLevel = StorageLevel.fromString(conf.getConf(SQLConf.RECURSION_CACHE_STORAGE_LEVEL))

Review comment: `conf.recursionCacheStorageLevel`?
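Both review suggestions (`conf.recursionLevelLimit`, `conf.recursionCacheStorageLevel`) ask for the same thing: wrap raw `conf.getConf(SQLConf.KEY)` lookups behind named accessors so call sites are shorter and the key plus its parsing live in one place. A small Python sketch of that typed-accessor pattern (class name, keys, and defaults are all illustrative, not Spark's actual configuration):

```python
class SQLConf:
    """Sketch of the typed-accessor pattern the reviewer suggests:
    call sites read conf.recursion_level_limit instead of repeating
    the raw key and the parsing logic.  Names are illustrative."""

    _defaults = {
        "spark.sql.recursion.cacheStorageLevel": "MEMORY_AND_DISK",
        "spark.sql.recursion.levelLimit": "100",
    }

    def __init__(self, overrides=None):
        # Merge user-supplied settings over the defaults.
        self._settings = {**self._defaults, **(overrides or {})}

    def get_conf(self, key):
        # Raw, string-valued lookup (the verbose form being replaced).
        return self._settings[key]

    @property
    def recursion_cache_storage_level(self):
        return self.get_conf("spark.sql.recursion.cacheStorageLevel")

    @property
    def recursion_level_limit(self):
        # Parsing to int happens once, here, not at every call site.
        return int(self.get_conf("spark.sql.recursion.levelLimit"))
```

With this in place, `doExecute` could read `conf.recursion_level_limit` directly, which is what the review comment is driving at.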
[GitHub] [spark] SparkQA commented on issue #25236: [SPARK-28487][k8s] More responsive dynamic allocation with K8S.
SparkQA commented on issue #25236: [SPARK-28487][k8s] More responsive dynamic allocation with K8S. URL: https://github.com/apache/spark/pull/25236#issuecomment-514881458 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/13250/
[GitHub] [spark] SparkQA commented on issue #24879: [SPARK-28042][K8S] Support using volume mount as local storage
SparkQA commented on issue #24879: [SPARK-28042][K8S] Support using volume mount as local storage URL: https://github.com/apache/spark/pull/24879#issuecomment-514881351 **[Test build #108149 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108149/testReport)** for PR 24879 at commit [`6e5fcf6`](https://github.com/apache/spark/commit/6e5fcf64472cd25785673b3c9599cc59360d9381).
[GitHub] [spark] maropu commented on issue #25251: [MINOR] Trivial cleanups
maropu commented on issue #25251: [MINOR] Trivial cleanups URL: https://github.com/apache/spark/pull/25251#issuecomment-514880267 At the same time, can you check for unused imports using IDE support?
[GitHub] [spark] SparkQA commented on issue #25236: [SPARK-28487][k8s] More responsive dynamic allocation with K8S.
SparkQA commented on issue #25236: [SPARK-28487][k8s] More responsive dynamic allocation with K8S. URL: https://github.com/apache/spark/pull/25236#issuecomment-514878393 **[Test build #108148 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108148/testReport)** for PR 25236 at commit [`fd0a1d2`](https://github.com/apache/spark/commit/fd0a1d2cb240e4a0f0f8e442e083dfa221cfce00).
[GitHub] [spark] dongjoon-hyun commented on issue #25236: [SPARK-28487][k8s] More responsive dynamic allocation with K8S.
dongjoon-hyun commented on issue #25236: [SPARK-28487][k8s] More responsive dynamic allocation with K8S. URL: https://github.com/apache/spark/pull/25236#issuecomment-514878133 Retest this please.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25242: [SPARK-28497][SQL] Disallow upcasting complex data types to string type
HyukjinKwon commented on a change in pull request #25242: [SPARK-28497][SQL] Disallow upcasting complex data types to string type URL: https://github.com/apache/spark/pull/25242#discussion_r307098867
## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/EncoderResolutionSuite.scala
## @@ -196,6 +196,43 @@ class EncoderResolutionSuite extends PlanTest {
     encoder.resolveAndBind(attrs)
   }
 
+  test("SPARK-28497: complex type is not compatible with string encoder schema") {
+    val encoder = ExpressionEncoder[String]
+
+    {
+      val attrs = Seq('a.struct('x.long))
+      assert(intercept[AnalysisException](encoder.resolveAndBind(attrs)).message ==
+        s"""
+          |Cannot up cast `a` from struct to string.
+          |The type path of the target object is:
+          |- root class: "java.lang.String"
+          |You can either add an explicit cast to the input data or choose a higher precision type
+        """.stripMargin.trim + " of the field in the target object")
+    }
+
+    {
+      val attrs = Seq('a.array(StringType))
+      assert(intercept[AnalysisException](encoder.resolveAndBind(attrs)).message ==
+        s"""
+          |Cannot up cast `a` from array to string.

Review comment: Oh, it was the same comment as https://github.com/apache/spark/pull/25242#discussion_r307064357
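The test above asserts that resolving a struct or array input column against a `String` encoder raises an `AnalysisException` rather than silently casting. The rule being tested can be sketched as a small upcast-compatibility check in Python (a simplified illustration, not Spark's actual `Cast.canUpCast`; the type tags and numeric ordering are assumptions for the sketch):

```python
# Illustrative type tags; Spark models these as DataType subclasses.
COMPLEX = {"struct", "array", "map"}
NUMERIC_ORDER = ["byte", "short", "int", "long", "float", "double"]

def can_up_cast(from_type: str, to_type: str) -> bool:
    """Return True if from_type can be safely up-cast to to_type.

    Mirrors the rule under test: complex types (struct/array/map)
    must NOT be up-castable to string, while atomic-to-string and
    numeric widenings remain allowed.  Simplified sketch only.
    """
    if from_type == to_type:
        return True
    # The SPARK-28497 rule: no complex type may up-cast to string.
    if from_type in COMPLEX:
        return False
    # Any remaining atomic type widens safely to string.
    if to_type == "string":
        return True
    # Numeric widening, e.g. int -> long, is allowed; narrowing is not.
    if from_type in NUMERIC_ORDER and to_type in NUMERIC_ORDER:
        return NUMERIC_ORDER.index(from_type) <= NUMERIC_ORDER.index(to_type)
    return False
```

Under this rule `can_up_cast("int", "string")` holds but `can_up_cast("struct", "string")` does not, which is exactly the failure the `EncoderResolutionSuite` test expects.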
[GitHub] [spark] hddong commented on issue #23952: [SPARK-26929][SQL]fix table owner use user instead of principal when create table through spark-sql or beeline
hddong commented on issue #23952: [SPARK-26929][SQL]fix table owner use user instead of principal when create table through spark-sql or beeline URL: https://github.com/apache/spark/pull/23952#issuecomment-514876679 ok to test
[GitHub] [spark] cloud-fan commented on a change in pull request #25242: [SPARK-28497][SQL] Disallow upcasting complex data types to string type
cloud-fan commented on a change in pull request #25242: [SPARK-28497][SQL] Disallow upcasting complex data types to string type URL: https://github.com/apache/spark/pull/25242#discussion_r307097063
## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/EncoderResolutionSuite.scala
## @@ -196,6 +196,43 @@ class EncoderResolutionSuite extends PlanTest {
     encoder.resolveAndBind(attrs)
   }
 
+  test("SPARK-28497: complex type is not compatible with string encoder schema") {
+    val encoder = ExpressionEncoder[String]
+
+    {
+      val attrs = Seq('a.struct('x.long))
+      assert(intercept[AnalysisException](encoder.resolveAndBind(attrs)).message ==
+        s"""

Review comment: +1
[GitHub] [spark] AmplabJenkins removed a comment on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API
AmplabJenkins removed a comment on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API URL: https://github.com/apache/spark/pull/25007#issuecomment-514875182 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108136/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API
AmplabJenkins removed a comment on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API URL: https://github.com/apache/spark/pull/25007#issuecomment-514875176 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API
AmplabJenkins commented on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API URL: https://github.com/apache/spark/pull/25007#issuecomment-514875176 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API
AmplabJenkins commented on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API URL: https://github.com/apache/spark/pull/25007#issuecomment-514875182 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108136/ Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API
SparkQA removed a comment on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API URL: https://github.com/apache/spark/pull/25007#issuecomment-514834222 **[Test build #108136 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108136/testReport)** for PR 25007 at commit [`9f17b9b`](https://github.com/apache/spark/commit/9f17b9bbf0d3d5677abf424c5b2d4d3b93dfc95a).
[GitHub] [spark] AmplabJenkins removed a comment on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API
AmplabJenkins removed a comment on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API URL: https://github.com/apache/spark/pull/25007#issuecomment-514875071 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108140/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API
AmplabJenkins removed a comment on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API URL: https://github.com/apache/spark/pull/25007#issuecomment-514875069 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API
AmplabJenkins commented on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API URL: https://github.com/apache/spark/pull/25007#issuecomment-514875071 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108140/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API
AmplabJenkins commented on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API URL: https://github.com/apache/spark/pull/25007#issuecomment-514875069 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API
SparkQA commented on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API URL: https://github.com/apache/spark/pull/25007#issuecomment-514874761 **[Test build #108136 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108136/testReport)** for PR 25007 at commit [`9f17b9b`](https://github.com/apache/spark/commit/9f17b9bbf0d3d5677abf424c5b2d4d3b93dfc95a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API
SparkQA removed a comment on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API URL: https://github.com/apache/spark/pull/25007#issuecomment-514837877 **[Test build #108140 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108140/testReport)** for PR 25007 at commit [`56fa450`](https://github.com/apache/spark/commit/56fa450b4d703c7bc34f338e2ed0cd21fc82a98c).
[GitHub] [spark] AmplabJenkins removed a comment on issue #25249: [SPARK-28237][SQL] Enforce Idempotence for Once batches in RuleExecutor
AmplabJenkins removed a comment on issue #25249: [SPARK-28237][SQL] Enforce Idempotence for Once batches in RuleExecutor URL: https://github.com/apache/spark/pull/25249#issuecomment-514874301 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108143/ Test FAILed.
[GitHub] [spark] SparkQA commented on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API
SparkQA commented on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API URL: https://github.com/apache/spark/pull/25007#issuecomment-514874681 **[Test build #108140 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108140/testReport)** for PR 25007 at commit [`56fa450`](https://github.com/apache/spark/commit/56fa450b4d703c7bc34f338e2ed0cd21fc82a98c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on issue #25249: [SPARK-28237][SQL] Enforce Idempotence for Once batches in RuleExecutor
AmplabJenkins commented on issue #25249: [SPARK-28237][SQL] Enforce Idempotence for Once batches in RuleExecutor URL: https://github.com/apache/spark/pull/25249#issuecomment-514874301 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108143/ Test FAILed.
[GitHub] [spark] SparkQA removed a comment on issue #25249: [SPARK-28237][SQL] Enforce Idempotence for Once batches in RuleExecutor
SparkQA removed a comment on issue #25249: [SPARK-28237][SQL] Enforce Idempotence for Once batches in RuleExecutor URL: https://github.com/apache/spark/pull/25249#issuecomment-514854663 **[Test build #108143 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108143/testReport)** for PR 25249 at commit [`10678ad`](https://github.com/apache/spark/commit/10678ad436e9c50499703df96fae64f782e8722f).
[GitHub] [spark] AmplabJenkins commented on issue #25249: [SPARK-28237][SQL] Enforce Idempotence for Once batches in RuleExecutor
AmplabJenkins commented on issue #25249: [SPARK-28237][SQL] Enforce Idempotence for Once batches in RuleExecutor URL: https://github.com/apache/spark/pull/25249#issuecomment-514874295 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25249: [SPARK-28237][SQL] Enforce Idempotence for Once batches in RuleExecutor
AmplabJenkins removed a comment on issue #25249: [SPARK-28237][SQL] Enforce Idempotence for Once batches in RuleExecutor URL: https://github.com/apache/spark/pull/25249#issuecomment-514874295 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on issue #25249: [SPARK-28237][SQL] Enforce Idempotence for Once batches in RuleExecutor
SparkQA commented on issue #25249: [SPARK-28237][SQL] Enforce Idempotence for Once batches in RuleExecutor URL: https://github.com/apache/spark/pull/25249#issuecomment-514874161 **[Test build #108143 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108143/testReport)** for PR 25249 at commit [`10678ad`](https://github.com/apache/spark/commit/10678ad436e9c50499703df96fae64f782e8722f). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API
AmplabJenkins removed a comment on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API URL: https://github.com/apache/spark/pull/25007#issuecomment-514873120 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API
AmplabJenkins removed a comment on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API URL: https://github.com/apache/spark/pull/25007#issuecomment-514873123 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108139/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API
AmplabJenkins commented on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API URL: https://github.com/apache/spark/pull/25007#issuecomment-514873123 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108139/
[GitHub] [spark] AmplabJenkins commented on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API
AmplabJenkins commented on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API URL: https://github.com/apache/spark/pull/25007#issuecomment-514873120 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API
SparkQA removed a comment on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API URL: https://github.com/apache/spark/pull/25007#issuecomment-514836023 **[Test build #108139 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108139/testReport)** for PR 25007 at commit [`e53a001`](https://github.com/apache/spark/commit/e53a001b30a15ebc06df6b62c5650ae6f3213477).
[GitHub] [spark] SparkQA commented on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API
SparkQA commented on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API URL: https://github.com/apache/spark/pull/25007#issuecomment-514872726 **[Test build #108139 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108139/testReport)** for PR 25007 at commit [`e53a001`](https://github.com/apache/spark/commit/e53a001b30a15ebc06df6b62c5650ae6f3213477). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on issue #25251: [MINOR] Trivial cleanups
SparkQA commented on issue #25251: [MINOR] Trivial cleanups URL: https://github.com/apache/spark/pull/25251#issuecomment-514872275 **[Test build #108147 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108147/testReport)** for PR 25251 at commit [`600444e`](https://github.com/apache/spark/commit/600444e9f7044c9972fb76599d99a985641e840d).
[GitHub] [spark] AmplabJenkins removed a comment on issue #25251: [MINOR] Trivial cleanups
AmplabJenkins removed a comment on issue #25251: [MINOR] Trivial cleanups URL: https://github.com/apache/spark/pull/25251#issuecomment-514871919 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25251: [MINOR] Trivial cleanups
AmplabJenkins removed a comment on issue #25251: [MINOR] Trivial cleanups URL: https://github.com/apache/spark/pull/25251#issuecomment-514871920 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13249/
[GitHub] [spark] AmplabJenkins commented on issue #25251: [MINOR] Trivial cleanups
AmplabJenkins commented on issue #25251: [MINOR] Trivial cleanups URL: https://github.com/apache/spark/pull/25251#issuecomment-514871919 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25251: [MINOR] Trivial cleanups
AmplabJenkins commented on issue #25251: [MINOR] Trivial cleanups URL: https://github.com/apache/spark/pull/25251#issuecomment-514871920 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13249/
[GitHub] [spark] AmplabJenkins removed a comment on issue #25251: [MINOR] Trivial cleanups
AmplabJenkins removed a comment on issue #25251: [MINOR] Trivial cleanups URL: https://github.com/apache/spark/pull/25251#issuecomment-514866728 Can one of the admins verify this patch?
[GitHub] [spark] HyukjinKwon removed a comment on issue #25251: [MINOR] Trivial cleanups
HyukjinKwon removed a comment on issue #25251: [MINOR] Trivial cleanups URL: https://github.com/apache/spark/pull/25251#issuecomment-514871649 add to whitelist
[GitHub] [spark] HyukjinKwon commented on issue #25251: [MINOR] Trivial cleanups
HyukjinKwon commented on issue #25251: [MINOR] Trivial cleanups URL: https://github.com/apache/spark/pull/25251#issuecomment-514871696 ok to test
[GitHub] [spark] HyukjinKwon commented on issue #25251: [MINOR] Trivial cleanups
HyukjinKwon commented on issue #25251: [MINOR] Trivial cleanups URL: https://github.com/apache/spark/pull/25251#issuecomment-514871649 add to whitelist
[GitHub] [spark] dongjoon-hyun closed pull request #25248: [SPARK-28152][SQL][2.4] Mapped ShortType to SMALLINT and FloatType to REAL for MsSqlServerDialect
dongjoon-hyun closed pull request #25248: [SPARK-28152][SQL][2.4] Mapped ShortType to SMALLINT and FloatType to REAL for MsSqlServerDialect URL: https://github.com/apache/spark/pull/25248
[GitHub] [spark] maropu commented on a change in pull request #23531: [SPARK-24497][SQL] Support recursive SQL query
maropu commented on a change in pull request #23531: [SPARK-24497][SQL] Support recursive SQL query URL: https://github.com/apache/spark/pull/23531#discussion_r307093683 ## File path: sql/core/src/test/resources/sql-tests/results/cte.sql.out ## @@ -328,16 +328,891 @@ struct -- !query 25 -DROP VIEW IF EXISTS t +WITH r(level) AS ( + VALUES (0) + UNION ALL + SELECT level + 1 FROM r WHERE level < 10 +) +SELECT * FROM r -- !query 25 schema struct<> -- !query 25 output - +org.apache.spark.sql.AnalysisException +Table or view not found: r; line 4 pos 24 -- !query 26 -DROP VIEW IF EXISTS t2 +WITH RECURSIVE r(level) AS ( + VALUES (0) + UNION ALL + SELECT level + 1 FROM r WHERE level < 10 +) +SELECT * FROM r -- !query 26 schema -struct<> +struct -- !query 26 output +0 +1 +10 +2 +3 +4 +5 +6 +7 +8 +9 + + +-- !query 27 +WITH RECURSIVE r(level) AS ( + VALUES (0) + UNION ALL + SELECT level + 1 FROM r +) +SELECT * FROM r +-- !query 27 schema +struct<> +-- !query 27 output +org.apache.spark.SparkException +Recursion level limit 100 reached but query has not exhausted, try increasing spark.sql.cte.recursion.level.limit + + +-- !query 28 +WITH RECURSIVE r(level) AS ( + VALUES (0) + UNION ALL + SELECT level + 1 FROM r +) +SELECT * FROM r LIMIT 10 +-- !query 28 schema +struct +-- !query 28 output +0 +1 +2 +3 +4 +5 +6 +7 +8 +9 + + +-- !query 29 +WITH RECURSIVE r(level) AS ( + VALUES (0) + UNION ALL + SELECT level + 1 FROM r +) +SELECT level, level FROM r LIMIT 10 +-- !query 29 schema +struct +-- !query 29 output +0 0 +1 1 +2 2 +3 3 +4 4 +5 5 +6 6 +7 7 +8 8 +9 9 + + +-- !query 30 +WITH RECURSIVE r(level) AS ( + VALUES (0) + UNION ALL + SELECT level + 1 FROM r +) +SELECT level, level FROM r ORDER BY level LIMIT 10 +-- !query 30 schema +struct<> +-- !query 30 output +org.apache.spark.SparkException +Recursion level limit 100 reached but query has not exhausted, try increasing spark.sql.cte.recursion.level.limit + + +-- !query 31 +WITH RECURSIVE r(c) AS ( + SELECT 'a' + UNION ALL + SELECT c 
|| ' b' FROM r WHERE LENGTH(c) < 10 +) +SELECT * FROM r +-- !query 31 schema +struct +-- !query 31 output +a +a b +a b b +a b b b +a b b b b +a b b b b b + + +-- !query 32 +WITH RECURSIVE r(level) AS ( + SELECT level + 1 FROM r WHERE level < 10 + UNION ALL + VALUES (0) +) +SELECT * FROM r +-- !query 32 schema +struct +-- !query 32 output +0 +1 +10 +2 +3 +4 +5 +6 +7 +8 +9 + + +-- !query 33 +WITH RECURSIVE r(level, data) AS ( + VALUES (0, 'A') + UNION ALL + VALUES (0, 'B') + UNION ALL + SELECT level + 1, data || 'C' FROM r WHERE level < 3 +) +SELECT * FROM r +-- !query 33 schema +struct +-- !query 33 output +0 A +0 B +1 AC +1 BC +2 ACC +2 BCC +3 ACCC +3 BCCC + + +-- !query 34 +WITH RECURSIVE r(level, data) AS ( + VALUES (0, 'A') + UNION ALL + SELECT level + 1, data || 'B' FROM r WHERE level < 2 + UNION ALL + SELECT level + 1, data || 'C' FROM r WHERE level < 3 +) +SELECT * FROM r +-- !query 34 schema +struct +-- !query 34 output +0 A +1 AB +1 AC +2 ABB +2 ABC +2 ACB +2 ACC +3 ABBC +3 ABCC +3 ACBC +3 ACCC + + +-- !query 35 +WITH RECURSIVE r(level, data) AS ( + VALUES (0, 'A') + UNION ALL + VALUES (0, 'B') + UNION ALL + SELECT level + 1, data || 'C' FROM r WHERE level < 2 + UNION ALL + SELECT level + 1, data || 'D' FROM r WHERE level < 3 +) +SELECT * FROM r +-- !query 35 schema +struct +-- !query 35 output +0 A +0 B +1 AC +1 AD +1 BC +1 BD +2 ACC +2 ACD +2 ADC +2 ADD +2 BCC +2 BCD +2 BDC +2 BDD +3 ACCD +3 ACDD +3 ADCD +3 ADDD +3 BCCD +3 BCDD +3 BDCD +3 BDDD + + +-- !query 36 +WITH RECURSIVE r(level) AS ( + SELECT level + 1 FROM r WHERE level < 3 +) +SELECT * FROM r +-- !query 36 schema +struct<> +-- !query 36 output +org.apache.spark.sql.AnalysisException +Recursive query r should contain UNION or UNION ALL statements only. 
This error can also be caused by ORDER BY or LIMIT keywords used on result of UNION or UNION ALL.; + + +-- !query 37 +WITH RECURSIVE r(level) AS ( + VALUES (0), (0) + UNION + SELECT (level + 1) % 10 FROM r +) +SELECT * FROM r +-- !query 37 schema +struct +-- !query 37 output +0 +1 +2 +3 +4 +5 +6 +7 +8 +9 + + +-- !query 38 +WITH RECURSIVE r(level) AS ( + VALUES (0) + INTERSECT + SELECT level + 1 FROM r WHERE level < 10 +) +SELECT * FROM r +-- !query 38 schema +struct<> +-- !query 38 output +org.apache.spark.sql.AnalysisException +Recursive query r should contain UNION or UNION ALL statements only. This error can also be caused by ORDER BY or LIMIT keywords used on result of UNION or UNION ALL.; + + +-- !query 39 +WITH RECURSIVE r(level) AS ( + VALUES (0) + UNION ALL + SELECT level + 1 FROM r WHERE (SELECT SUM(level) FROM r) < 10 +) +SELECT * FROM r +-- !query 39 schema +struct<> +-- !
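The `cte.sql.out` cases quoted above show how a recursive CTE is meant to evaluate: seed with the anchor rows, re-apply the recursive term to the previous iteration's rows, and stop when an iteration yields nothing or the level limit is hit. A minimal sketch of that fixpoint loop in plain Scala (an illustration only, not the PR's implementation; the `levelLimit` guard mirrors the `spark.sql.cte.recursion.level.limit` error message quoted above):

```scala
// Evaluate a recursive CTE such as
//   WITH RECURSIVE r(level) AS (
//     VALUES (0) UNION ALL SELECT level + 1 FROM r WHERE level < 10
//   ) SELECT * FROM r
// by iterating the recursive term to a fixpoint.
def evalRecursiveCte(
    anchor: Seq[Int],
    recursiveTerm: Seq[Int] => Seq[Int],
    levelLimit: Int = 100): Seq[Int] = {
  var result = anchor
  var current = anchor
  var level = 0
  while (current.nonEmpty) {
    level += 1
    if (level > levelLimit) {
      sys.error(s"Recursion level limit $levelLimit reached but query has not exhausted")
    }
    current = recursiveTerm(current) // the recursive term only sees the previous iteration
    result = result ++ current       // UNION ALL keeps every produced row
  }
  result
}

// Recursive term of the first query above: SELECT level + 1 FROM r WHERE level < 10
val rows = evalRecursiveCte(Seq(0), prev => prev.filter(_ < 10).map(_ + 1))
println(rows.sorted.mkString(" "))  // 0 1 2 3 4 5 6 7 8 9 10
```

Dropping the `WHERE level < 10` guard reproduces the "Recursion level limit 100 reached" failure in query 27, since the working set never becomes empty.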
[GitHub] [spark] maropu commented on a change in pull request #23531: [SPARK-24497][SQL] Support recursive SQL query
maropu commented on a change in pull request #23531: [SPARK-24497][SQL] Support recursive SQL query URL: https://github.com/apache/spark/pull/23531#discussion_r307093581 ## File path: sql/core/src/test/resources/sql-tests/results/cte.sql.out ## (quotes the same cte.sql.out diff hunk as the previous comment)
[GitHub] [spark] maropu commented on a change in pull request #23531: [SPARK-24497][SQL] Support recursive SQL query
maropu commented on a change in pull request #23531: [SPARK-24497][SQL] Support recursive SQL query URL: https://github.com/apache/spark/pull/23531#discussion_r307093321 ## File path: sql/core/src/test/resources/sql-tests/results/cte.sql.out ## (quotes the same cte.sql.out diff hunk as the first maropu comment above)
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25245: [SPARK-25382][SQL][PYSPARK] Remove ImageSchema.readImages in 3.0
HyukjinKwon commented on a change in pull request #25245: [SPARK-25382][SQL][PYSPARK] Remove ImageSchema.readImages in 3.0 URL: https://github.com/apache/spark/pull/25245#discussion_r307093160 ## File path: mllib/src/main/scala/org/apache/spark/ml/source/image/ImageFileFormat.scala ## @@ -98,3 +103,163 @@ private[image] class ImageFileFormat extends FileFormat with DataSourceRegister } } } + +object ImageFileFormat { + + val undefinedImageType = "Undefined" + + /** + * (Scala-specific) OpenCV type mapping supported + */ + val ocvTypes: Map[String, Int] = Map( +undefinedImageType -> -1, +"CV_8U" -> 0, "CV_8UC1" -> 0, "CV_8UC3" -> 16, "CV_8UC4" -> 24 + ) + + /** + * (Java-specific) OpenCV type mapping supported + */ + val javaOcvTypes: java.util.Map[String, Int] = ocvTypes.asJava + + /** + * Schema for the image column: Row(String, Int, Int, Int, Int, Array[Byte]) + */ + private[image] val columnSchema = StructType( +StructField("origin", StringType, true) :: +StructField("height", IntegerType, true) :: +StructField("width", IntegerType, true) :: +StructField("nChannels", IntegerType, true) :: +// OpenCV-compatible type: CV_8UC3 in most cases +StructField("mode", IntegerType, true) :: Review comment: That's not true in structured streaming. Shall we leave this unchanged here for now? It sounds orthogonal to this change.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25245: [SPARK-25382][SQL][PYSPARK] Remove ImageSchema.readImages in 3.0
HyukjinKwon commented on a change in pull request #25245: [SPARK-25382][SQL][PYSPARK] Remove ImageSchema.readImages in 3.0 URL: https://github.com/apache/spark/pull/25245#discussion_r307092949 ## File path: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala ## @@ -1,266 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one or more - * contributor license agreements. See the NOTICE file distributed with - * this work for additional information regarding copyright ownership. - * The ASF licenses this file to You under the Apache License, Version 2.0 - * (the "License"); you may not use this file except in compliance with - * the License. You may obtain a copy of the License at - * - *http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package org.apache.spark.ml.image - -import java.awt.Color -import java.awt.color.ColorSpace -import java.io.ByteArrayInputStream -import javax.imageio.ImageIO - -import scala.collection.JavaConverters._ - -import org.apache.spark.annotation.{Experimental, Since} -import org.apache.spark.input.PortableDataStream -import org.apache.spark.sql.{DataFrame, Row, SparkSession} -import org.apache.spark.sql.types._ - -/** - * :: Experimental :: - * Defines the image schema and methods to read and manipulate images. 
- */ -@Experimental -@Since("2.3.0") -object ImageSchema { - - val undefinedImageType = "Undefined" - - /** - * (Scala-specific) OpenCV type mapping supported - */ - val ocvTypes: Map[String, Int] = Map( -undefinedImageType -> -1, -"CV_8U" -> 0, "CV_8UC1" -> 0, "CV_8UC3" -> 16, "CV_8UC4" -> 24 - ) - - /** - * (Java-specific) OpenCV type mapping supported - */ - val javaOcvTypes: java.util.Map[String, Int] = ocvTypes.asJava - - /** - * Schema for the image column: Row(String, Int, Int, Int, Int, Array[Byte]) - */ - val columnSchema = StructType( -StructField("origin", StringType, true) :: -StructField("height", IntegerType, false) :: -StructField("width", IntegerType, false) :: -StructField("nChannels", IntegerType, false) :: -// OpenCV-compatible type: CV_8UC3 in most cases -StructField("mode", IntegerType, false) :: -// Bytes in OpenCV-compatible order: row-wise BGR in most cases -StructField("data", BinaryType, false) :: Nil) - - val imageFields: Array[String] = columnSchema.fieldNames - - /** - * DataFrame with a single column of images named "image" (nullable) - */ - val imageSchema = StructType(StructField("image", columnSchema, true) :: Nil) - - /** - * Gets the origin of the image - * - * @return The origin of the image - */ - def getOrigin(row: Row): String = row.getString(0) - - /** - * Gets the height of the image - * - * @return The height of the image - */ - def getHeight(row: Row): Int = row.getInt(1) - - /** - * Gets the width of the image - * - * @return The width of the image - */ - def getWidth(row: Row): Int = row.getInt(2) - - /** - * Gets the number of channels in the image - * - * @return The number of channels in the image - */ - def getNChannels(row: Row): Int = row.getInt(3) - - /** - * Gets the OpenCV representation as an int - * - * @return The OpenCV representation as an int - */ - def getMode(row: Row): Int = row.getInt(4) - - /** - * Gets the image data - * - * @return The image data - */ - def getData(row: Row): Array[Byte] = 
row.getAs[Array[Byte]](5) - - /** - * Default values for the invalid image - * - * @param origin Origin of the invalid image - * @return Row with the default values - */ - private[spark] def invalidImageRow(origin: String): Row = -Row(Row(origin, -1, -1, -1, ocvTypes(undefinedImageType), Array.ofDim[Byte](0))) - - /** - * Convert the compressed image (jpeg, png, etc.) into OpenCV - * representation and store it in DataFrame Row - * - * @param origin Arbitrary string that identifies the image - * @param bytes Image bytes (for example, jpeg) - * @return DataFrame Row or None (if the decompression fails) - */ - private[spark] def decode(origin: String, bytes: Array[Byte]): Option[Row] = { - -val img = try { - ImageIO.read(new ByteArrayInputStream(bytes)) -} catch { - // Catch runtime exception because `ImageIO` may throw unexcepted `RuntimeException`. - // But do not catch the declared `IOException` (regarded as FileSystem failure) - case _: RuntimeException => null -} - -if (img == null) { - None -} else { - val isGray = img.getColorModel.getColorSpace.getType == ColorSpace.TYPE_GRAY - val hasAlpha = img.getColorModel.hasA
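The deleted `ImageSchema.decode` quoted above is built on `javax.imageio`, which ships with the JDK, so its decode-or-`None` behavior can be exercised without Spark. A self-contained sketch (the `decodeDims` helper below is hypothetical, written here only to mirror the deleted logic, including the deliberate `RuntimeException` catch around `ImageIO.read`):

```scala
import java.awt.color.ColorSpace
import java.awt.image.BufferedImage
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import javax.imageio.ImageIO

// Parse compressed image bytes and return (height, width, nChannels),
// or None when ImageIO cannot decode them. ImageIO.read may throw an
// undeclared RuntimeException on malformed input, which the original
// code swallowed; it returns null when no reader matches the bytes.
def decodeDims(bytes: Array[Byte]): Option[(Int, Int, Int)] = {
  val img =
    try ImageIO.read(new ByteArrayInputStream(bytes))
    catch { case _: RuntimeException => null }
  Option(img).map { i =>
    val cm = i.getColorModel
    val nChannels =
      if (cm.hasAlpha) 4
      else if (cm.getColorSpace.getType == ColorSpace.TYPE_GRAY) 1
      else 3
    (i.getHeight, i.getWidth, nChannels)
  }
}

// Round-trip a tiny 4x3 RGB image through an in-memory PNG.
val out = new ByteArrayOutputStream()
ImageIO.write(new BufferedImage(4, 3, BufferedImage.TYPE_3BYTE_BGR), "png", out)
val dims = decodeDims(out.toByteArray)
println(dims)

// Garbage bytes decode to None rather than throwing.
val bad = decodeDims(Array[Byte](1, 2, 3))
println(bad)
```

This is the same contract the `image` file format relies on: undecodable files become the invalid-image row instead of failing the job.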
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25245: [SPARK-25382][SQL][PYSPARK] Remove ImageSchema.readImages in 3.0
HyukjinKwon commented on a change in pull request #25245: [SPARK-25382][SQL][PYSPARK] Remove ImageSchema.readImages in 3.0 URL: https://github.com/apache/spark/pull/25245#discussion_r307092864

## File path: mllib/src/main/scala/org/apache/spark/ml/source/image/ImageFileFormat.scala

```
@@ -98,3 +103,163 @@ private[image] class ImageFileFormat extends FileFormat with DataSourceRegister
     }
   }
 }
+
+object ImageFileFormat {
```

Review comment: `BinaryFileFormat` has to be private. The only modules where we don't use `private[sql]` or `private[spark]` are the execution and catalyst modules, because we explicitly state that those modules are private as of SPARK-16813 and SPARK-16964. I don't think we should keep non-API instances public.
[GitHub] [spark] maropu commented on a change in pull request #23531: [SPARK-24497][SQL] Support recursive SQL query
maropu commented on a change in pull request #23531: [SPARK-24497][SQL] Support recursive SQL query URL: https://github.com/apache/spark/pull/23531#discussion_r307092843 ## File path: sql/core/src/test/resources/sql-tests/results/cte.sql.out ## @@ -328,16 +328,891 @@ struct -- !query 25 -DROP VIEW IF EXISTS t +WITH r(level) AS ( + VALUES (0) + UNION ALL + SELECT level + 1 FROM r WHERE level < 10 +) +SELECT * FROM r -- !query 25 schema struct<> -- !query 25 output - +org.apache.spark.sql.AnalysisException +Table or view not found: r; line 4 pos 24 -- !query 26 -DROP VIEW IF EXISTS t2 +WITH RECURSIVE r(level) AS ( + VALUES (0) + UNION ALL + SELECT level + 1 FROM r WHERE level < 10 +) +SELECT * FROM r -- !query 26 schema -struct<> +struct -- !query 26 output +0 +1 +10 +2 +3 +4 +5 +6 +7 +8 +9 + + +-- !query 27 +WITH RECURSIVE r(level) AS ( + VALUES (0) + UNION ALL + SELECT level + 1 FROM r +) +SELECT * FROM r +-- !query 27 schema +struct<> +-- !query 27 output +org.apache.spark.SparkException +Recursion level limit 100 reached but query has not exhausted, try increasing spark.sql.cte.recursion.level.limit + + +-- !query 28 +WITH RECURSIVE r(level) AS ( + VALUES (0) + UNION ALL + SELECT level + 1 FROM r +) +SELECT * FROM r LIMIT 10 +-- !query 28 schema +struct +-- !query 28 output +0 +1 +2 +3 +4 +5 +6 +7 +8 +9 + + +-- !query 29 +WITH RECURSIVE r(level) AS ( + VALUES (0) + UNION ALL + SELECT level + 1 FROM r +) +SELECT level, level FROM r LIMIT 10 +-- !query 29 schema +struct +-- !query 29 output +0 0 +1 1 +2 2 +3 3 +4 4 +5 5 +6 6 +7 7 +8 8 +9 9 + + +-- !query 30 +WITH RECURSIVE r(level) AS ( + VALUES (0) + UNION ALL + SELECT level + 1 FROM r +) +SELECT level, level FROM r ORDER BY level LIMIT 10 +-- !query 30 schema +struct<> +-- !query 30 output +org.apache.spark.SparkException +Recursion level limit 100 reached but query has not exhausted, try increasing spark.sql.cte.recursion.level.limit + + +-- !query 31 +WITH RECURSIVE r(c) AS ( + SELECT 'a' + UNION ALL + SELECT c 
|| ' b' FROM r WHERE LENGTH(c) < 10 +) +SELECT * FROM r +-- !query 31 schema +struct +-- !query 31 output +a +a b +a b b +a b b b +a b b b b +a b b b b b + + +-- !query 32 +WITH RECURSIVE r(level) AS ( + SELECT level + 1 FROM r WHERE level < 10 + UNION ALL + VALUES (0) +) +SELECT * FROM r +-- !query 32 schema +struct +-- !query 32 output +0 +1 +10 +2 +3 +4 +5 +6 +7 +8 +9 + + +-- !query 33 +WITH RECURSIVE r(level, data) AS ( + VALUES (0, 'A') + UNION ALL + VALUES (0, 'B') + UNION ALL + SELECT level + 1, data || 'C' FROM r WHERE level < 3 +) +SELECT * FROM r +-- !query 33 schema +struct +-- !query 33 output +0 A +0 B +1 AC +1 BC +2 ACC +2 BCC +3 ACCC +3 BCCC + + +-- !query 34 +WITH RECURSIVE r(level, data) AS ( + VALUES (0, 'A') + UNION ALL + SELECT level + 1, data || 'B' FROM r WHERE level < 2 + UNION ALL + SELECT level + 1, data || 'C' FROM r WHERE level < 3 +) +SELECT * FROM r +-- !query 34 schema +struct +-- !query 34 output +0 A +1 AB +1 AC +2 ABB +2 ABC +2 ACB +2 ACC +3 ABBC +3 ABCC +3 ACBC +3 ACCC + + +-- !query 35 +WITH RECURSIVE r(level, data) AS ( + VALUES (0, 'A') + UNION ALL + VALUES (0, 'B') + UNION ALL + SELECT level + 1, data || 'C' FROM r WHERE level < 2 + UNION ALL + SELECT level + 1, data || 'D' FROM r WHERE level < 3 +) +SELECT * FROM r Review comment: ``` psql:with-recursive.sql:98: ERROR: recursive reference to query "r" must not appear within its non-recursive term LINE 6: SELECT level + 1, data || 'C' FROM r WHERE level < 2 ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #23531: [SPARK-24497][SQL] Support recursive SQL query
maropu commented on a change in pull request #23531: [SPARK-24497][SQL] Support recursive SQL query URL: https://github.com/apache/spark/pull/23531#discussion_r307092786 ## File path: sql/core/src/test/resources/sql-tests/results/cte.sql.out ## @@ -328,16 +328,891 @@ struct -- !query 25 -DROP VIEW IF EXISTS t +WITH r(level) AS ( + VALUES (0) + UNION ALL + SELECT level + 1 FROM r WHERE level < 10 +) +SELECT * FROM r -- !query 25 schema struct<> -- !query 25 output - +org.apache.spark.sql.AnalysisException +Table or view not found: r; line 4 pos 24 -- !query 26 -DROP VIEW IF EXISTS t2 +WITH RECURSIVE r(level) AS ( + VALUES (0) + UNION ALL + SELECT level + 1 FROM r WHERE level < 10 +) +SELECT * FROM r -- !query 26 schema -struct<> +struct -- !query 26 output +0 +1 +10 +2 +3 +4 +5 +6 +7 +8 +9 + + +-- !query 27 +WITH RECURSIVE r(level) AS ( + VALUES (0) + UNION ALL + SELECT level + 1 FROM r +) +SELECT * FROM r +-- !query 27 schema +struct<> +-- !query 27 output +org.apache.spark.SparkException +Recursion level limit 100 reached but query has not exhausted, try increasing spark.sql.cte.recursion.level.limit + + +-- !query 28 +WITH RECURSIVE r(level) AS ( + VALUES (0) + UNION ALL + SELECT level + 1 FROM r +) +SELECT * FROM r LIMIT 10 +-- !query 28 schema +struct +-- !query 28 output +0 +1 +2 +3 +4 +5 +6 +7 +8 +9 + + +-- !query 29 +WITH RECURSIVE r(level) AS ( + VALUES (0) + UNION ALL + SELECT level + 1 FROM r +) +SELECT level, level FROM r LIMIT 10 +-- !query 29 schema +struct +-- !query 29 output +0 0 +1 1 +2 2 +3 3 +4 4 +5 5 +6 6 +7 7 +8 8 +9 9 + + +-- !query 30 +WITH RECURSIVE r(level) AS ( + VALUES (0) + UNION ALL + SELECT level + 1 FROM r +) +SELECT level, level FROM r ORDER BY level LIMIT 10 +-- !query 30 schema +struct<> +-- !query 30 output +org.apache.spark.SparkException +Recursion level limit 100 reached but query has not exhausted, try increasing spark.sql.cte.recursion.level.limit + + +-- !query 31 +WITH RECURSIVE r(c) AS ( + SELECT 'a' + UNION ALL + SELECT c 
|| ' b' FROM r WHERE LENGTH(c) < 10 +) +SELECT * FROM r +-- !query 31 schema +struct +-- !query 31 output +a +a b +a b b +a b b b +a b b b b +a b b b b b + + +-- !query 32 +WITH RECURSIVE r(level) AS ( + SELECT level + 1 FROM r WHERE level < 10 + UNION ALL + VALUES (0) +) +SELECT * FROM r +-- !query 32 schema +struct +-- !query 32 output +0 +1 +10 +2 +3 +4 +5 +6 +7 +8 +9 + + +-- !query 33 +WITH RECURSIVE r(level, data) AS ( + VALUES (0, 'A') + UNION ALL + VALUES (0, 'B') + UNION ALL + SELECT level + 1, data || 'C' FROM r WHERE level < 3 +) +SELECT * FROM r +-- !query 33 schema +struct +-- !query 33 output +0 A +0 B +1 AC +1 BC +2 ACC +2 BCC +3 ACCC +3 BCCC + + +-- !query 34 +WITH RECURSIVE r(level, data) AS ( + VALUES (0, 'A') + UNION ALL + SELECT level + 1, data || 'B' FROM r WHERE level < 2 + UNION ALL + SELECT level + 1, data || 'C' FROM r WHERE level < 3 +) +SELECT * FROM r Review comment: ``` psql:with-recursive.sql:86: ERROR: recursive reference to query "r" must not appear within its non-recursive term LINE 4: SELECT level + 1, data || 'B' FROM r WHERE level < 2 ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #23531: [SPARK-24497][SQL] Support recursive SQL query
maropu commented on a change in pull request #23531: [SPARK-24497][SQL] Support recursive SQL query URL: https://github.com/apache/spark/pull/23531#discussion_r307092625 ## File path: sql/core/src/test/resources/sql-tests/results/cte.sql.out ## @@ -328,16 +328,891 @@ struct -- !query 25 -DROP VIEW IF EXISTS t +WITH r(level) AS ( + VALUES (0) + UNION ALL + SELECT level + 1 FROM r WHERE level < 10 +) +SELECT * FROM r -- !query 25 schema struct<> -- !query 25 output - +org.apache.spark.sql.AnalysisException +Table or view not found: r; line 4 pos 24 -- !query 26 -DROP VIEW IF EXISTS t2 +WITH RECURSIVE r(level) AS ( + VALUES (0) + UNION ALL + SELECT level + 1 FROM r WHERE level < 10 +) +SELECT * FROM r -- !query 26 schema -struct<> +struct -- !query 26 output +0 +1 +10 +2 +3 +4 +5 +6 +7 +8 +9 + + +-- !query 27 +WITH RECURSIVE r(level) AS ( + VALUES (0) + UNION ALL + SELECT level + 1 FROM r +) +SELECT * FROM r +-- !query 27 schema +struct<> +-- !query 27 output +org.apache.spark.SparkException +Recursion level limit 100 reached but query has not exhausted, try increasing spark.sql.cte.recursion.level.limit + + +-- !query 28 +WITH RECURSIVE r(level) AS ( + VALUES (0) + UNION ALL + SELECT level + 1 FROM r +) +SELECT * FROM r LIMIT 10 +-- !query 28 schema +struct +-- !query 28 output +0 +1 +2 +3 +4 +5 +6 +7 +8 +9 + + +-- !query 29 +WITH RECURSIVE r(level) AS ( + VALUES (0) + UNION ALL + SELECT level + 1 FROM r +) +SELECT level, level FROM r LIMIT 10 +-- !query 29 schema +struct +-- !query 29 output +0 0 +1 1 +2 2 +3 3 +4 4 +5 5 +6 6 +7 7 +8 8 +9 9 + + +-- !query 30 +WITH RECURSIVE r(level) AS ( + VALUES (0) + UNION ALL + SELECT level + 1 FROM r +) +SELECT level, level FROM r ORDER BY level LIMIT 10 +-- !query 30 schema +struct<> +-- !query 30 output +org.apache.spark.SparkException +Recursion level limit 100 reached but query has not exhausted, try increasing spark.sql.cte.recursion.level.limit + + +-- !query 31 +WITH RECURSIVE r(c) AS ( + SELECT 'a' + UNION ALL + SELECT c 
|| ' b' FROM r WHERE LENGTH(c) < 10 +) +SELECT * FROM r +-- !query 31 schema +struct +-- !query 31 output +a +a b +a b b +a b b b +a b b b b +a b b b b b + + +-- !query 32 +WITH RECURSIVE r(level) AS ( + SELECT level + 1 FROM r WHERE level < 10 + UNION ALL + VALUES (0) +) +SELECT * FROM r +-- !query 32 schema +struct +-- !query 32 output Review comment: pg cannot accept this query; ``` psql:with-recursive.sql:66: ERROR: recursive reference to query "r" must not appear within its non-recursive term LINE 2: SELECT level + 1 FROM r WHERE level < 10 ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
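The `UNION ALL` semantics these test cases exercise (an anchor term seeds the result, then the recursive term is re-applied to the rows produced in the previous step until it yields nothing, with a level limit standing in for `spark.sql.cte.recursion.level.limit`) can be sketched as a small fixed-point loop. The names here are illustrative, not Spark internals:

```python
def recursive_cte(anchor_rows, recursive_term, level_limit=100):
    """Evaluate WITH RECURSIVE r AS (anchor UNION ALL recursive) SELECT * FROM r.

    Starts from the anchor rows, then repeatedly applies the recursive term
    to the rows produced in the previous step, accumulating everything,
    until a step produces no rows or the level limit trips.
    """
    result = list(anchor_rows)
    working = list(anchor_rows)
    for _ in range(level_limit):
        working = recursive_term(working)
        if not working:
            return result
        result.extend(working)
    raise RuntimeError(
        "Recursion level limit %d reached but query has not exhausted" % level_limit)

# WITH RECURSIVE r(level) AS (VALUES (0) UNION ALL
#   SELECT level + 1 FROM r WHERE level < 10) SELECT * FROM r
levels = recursive_cte([0], lambda rows: [lv + 1 for lv in rows if lv < 10])
```

With the `level < 10` guard this yields 0 through 10, matching query 26 above; dropping the guard never produces an empty step, which is the behavior behind the "Recursion level limit 100 reached" error in queries 27 and 30.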
[GitHub] [spark] maropu commented on a change in pull request #23531: [SPARK-24497][SQL] Support recursive SQL query
maropu commented on a change in pull request #23531: [SPARK-24497][SQL] Support recursive SQL query URL: https://github.com/apache/spark/pull/23531#discussion_r307092141 ## File path: sql/core/src/test/resources/sql-tests/inputs/cte.sql ## @@ -155,6 +155,419 @@ SELECT ( ) ); +-- fails due to recursion isn't allowed with RECURSIVE keyword +WITH r(level) AS ( + VALUES (0) + UNION ALL + SELECT level + 1 FROM r WHERE level < 10 +) +SELECT * FROM r; + +-- very basic recursion +WITH RECURSIVE r(level) AS ( + VALUES (0) + UNION ALL + SELECT level + 1 FROM r WHERE level < 10 +) +SELECT * FROM r; Review comment: better to sort the output by using `ORDER BY`? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #23531: [SPARK-24497][SQL] Support recursive SQL query
maropu commented on a change in pull request #23531: [SPARK-24497][SQL] Support recursive SQL query URL: https://github.com/apache/spark/pull/23531#discussion_r307091967

## File path: sql/core/src/test/resources/sql-tests/results/cte.sql.out

```
@@ -328,16 +328,891 @@ struct
 -- !query 25
-DROP VIEW IF EXISTS t
+WITH r(level) AS (
+  VALUES (0)
+  UNION ALL
+  SELECT level + 1 FROM r WHERE level < 10
+)
+SELECT * FROM r
 -- !query 25 schema
 struct<>
 -- !query 25 output
-
+org.apache.spark.sql.AnalysisException
+Table or view not found: r; line 4 pos 24
```

Review comment: Can you make the error message clearer, as pg does?

```
postgres=# WITH r(level) AS (
postgres(#   VALUES (0)
postgres(#   UNION ALL
postgres(#   SELECT level + 1 FROM r WHERE level < 10
postgres(# )
postgres-# SELECT * FROM r;
ERROR:  relation "r" does not exist
LINE 4:   SELECT level + 1 FROM r WHERE level < 10
                                ^
DETAIL:  There is a WITH item named "r", but it cannot be referenced from this part of the query.
HINT:  Use WITH RECURSIVE, or re-order the WITH items to remove forward references.
```
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25245: [SPARK-25382][SQL][PYSPARK] Remove ImageSchema.readImages in 3.0
HyukjinKwon commented on a change in pull request #25245: [SPARK-25382][SQL][PYSPARK] Remove ImageSchema.readImages in 3.0 URL: https://github.com/apache/spark/pull/25245#discussion_r307091759 ## File path: python/pyspark/ml/image.py ## @@ -203,52 +205,16 @@ def toImage(self, array, origin=""): return _create_row(self.imageFields, [origin, height, width, nChannels, mode, data]) -def readImages(self, path, recursive=False, numPartitions=-1, - dropImageFailures=False, sampleRatio=1.0, seed=0): -""" -Reads the directory of images from the local or remote source. - -.. note:: If multiple jobs are run in parallel with different sampleRatio or recursive flag, -there may be a race condition where one job overwrites the hadoop configs of another. - -.. note:: If sample ratio is less than 1, sampling uses a PathFilter that is efficient but -potentially non-deterministic. - -.. note:: Deprecated in 2.4.0. Use `spark.read.format("image").load(path)` instead and -this `readImages` will be removed in 3.0.0. - -:param str path: Path to the image directory. -:param bool recursive: Recursive search flag. -:param int numPartitions: Number of DataFrame partitions. -:param bool dropImageFailures: Drop the files that are not valid images. -:param float sampleRatio: Fraction of the images loaded. -:param int seed: Random number seed. -:return: a :class:`DataFrame` with a single column of "images", - see ImageSchema for details. - ->>> df = ImageSchema.readImages('data/mllib/images/origin/kittens', recursive=True) ->>> df.count() -5 - -.. versionadded:: 2.3.0 -""" -warnings.warn("`ImageSchema.readImage` is deprecated. 
" + - "Use `spark.read.format(\"image\").load(path)` instead.", DeprecationWarning) -spark = SparkSession.builder.getOrCreate() -image_schema = spark._jvm.org.apache.spark.ml.image.ImageSchema -jsession = spark._jsparkSession -jresult = image_schema.readImages(path, jsession, recursive, numPartitions, - dropImageFailures, float(sampleRatio), seed) -return DataFrame(jresult, spark._wrapped) - -ImageSchema = _ImageSchema() +ImageUtils = _ImageUtils() Review comment: Are we going to expose those utils as APIs for PySpark specifically or not? If we're going to keep them, we should keep the original name `ImageSchema` for backward compatibility. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
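The migration path the deprecation warning above points at is the built-in image data source. A minimal wrapper might look like the following sketch; the function name and `drop_invalid` parameter are illustrative, while `dropInvalid` is the data source option that supersedes the old `dropImageFailures` flag:

```python
def load_images(spark, path, drop_invalid=False):
    """Replacement for the removed ImageSchema.readImages:
    spark.read.format("image").load(path) on an existing SparkSession."""
    reader = spark.read.format("image")
    if drop_invalid:
        # dropInvalid supersedes the old dropImageFailures parameter.
        reader = reader.option("dropInvalid", True)
    return reader.load(path)
```

Unlike `readImages`, this goes through the normal `DataFrameReader` path, so it does not mutate shared Hadoop configs and avoids the race condition the old docstring warned about.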
[GitHub] [spark] maropu commented on a change in pull request #23531: [SPARK-24497][SQL] Support recursive SQL query
maropu commented on a change in pull request #23531: [SPARK-24497][SQL] Support recursive SQL query URL: https://github.com/apache/spark/pull/23531#discussion_r307091599 ## File path: sql/core/src/test/resources/sql-tests/inputs/cte.sql ## @@ -155,6 +155,419 @@ SELECT ( ) ); +-- fails due to recursion isn't allowed with RECURSIVE keyword +WITH r(level) AS ( + VALUES (0) + UNION ALL + SELECT level + 1 FROM r WHERE level < 10 +) +SELECT * FROM r; + +-- very basic recursion +WITH RECURSIVE r(level) AS ( + VALUES (0) + UNION ALL + SELECT level + 1 FROM r WHERE level < 10 +) +SELECT * FROM r; + +-- unlimited recursion fails at spark.sql.cte.recursion.level.limits level +WITH RECURSIVE r(level) AS ( + VALUES (0) + UNION ALL + SELECT level + 1 FROM r +) +SELECT * FROM r; + +-- terminate recursion with LIMIT +WITH RECURSIVE r(level) AS ( + VALUES (0) + UNION ALL + SELECT level + 1 FROM r +) +SELECT * FROM r LIMIT 10; + +-- terminate projected recursion with LIMIT +WITH RECURSIVE r(level) AS ( + VALUES (0) + UNION ALL + SELECT level + 1 FROM r +) +SELECT level, level FROM r LIMIT 10; + +-- fails because using LIMIT to terminate recursion only works where Limit can be pushed through +-- recursion +WITH RECURSIVE r(level) AS ( + VALUES (0) + UNION ALL + SELECT level + 1 FROM r +) +SELECT level, level FROM r ORDER BY level LIMIT 10; + +-- using string column in recursion +WITH RECURSIVE r(c) AS ( + SELECT 'a' + UNION ALL + SELECT c || ' b' FROM r WHERE LENGTH(c) < 10 +) +SELECT * FROM r; + +-- recursion works regardless the order of anchor and recursive terms +WITH RECURSIVE r(level) AS ( + SELECT level + 1 FROM r WHERE level < 10 + UNION ALL + VALUES (0) +) +SELECT * FROM r; + +-- multiple anchor terms are supported +WITH RECURSIVE r(level, data) AS ( + VALUES (0, 'A') + UNION ALL + VALUES (0, 'B') + UNION ALL + SELECT level + 1, data || 'C' FROM r WHERE level < 3 +) +SELECT * FROM r; + +-- multiple recursive terms are supported +WITH RECURSIVE r(level, data) AS ( + VALUES 
(0, 'A') + UNION ALL + SELECT level + 1, data || 'B' FROM r WHERE level < 2 + UNION ALL + SELECT level + 1, data || 'C' FROM r WHERE level < 3 +) +SELECT * FROM r; + +-- multiple anchor and recursive terms are supported +WITH RECURSIVE r(level, data) AS ( + VALUES (0, 'A') + UNION ALL + VALUES (0, 'B') + UNION ALL + SELECT level + 1, data || 'C' FROM r WHERE level < 2 + UNION ALL + SELECT level + 1, data || 'D' FROM r WHERE level < 3 +) +SELECT * FROM r; + +-- recursion without an anchor term fails +WITH RECURSIVE r(level) AS ( + SELECT level + 1 FROM r WHERE level < 3 +) +SELECT * FROM r; + +-- UNION combinator supported to eliminate duplicates and stop recursion +WITH RECURSIVE r(level) AS ( + VALUES (0), (0) + UNION + SELECT (level + 1) % 10 FROM r +) +SELECT * FROM r; + +-- fails because a recursive query should contain UNION ALL or UNION combinator +WITH RECURSIVE r(level) AS ( + VALUES (0) + INTERSECT + SELECT level + 1 FROM r WHERE level < 10 +) +SELECT * FROM r; + +-- recursive reference is not allowed in a subquery +WITH RECURSIVE r(level) AS ( + VALUES (0) + UNION ALL + SELECT level + 1 FROM r WHERE (SELECT SUM(level) FROM r) < 10 +) +SELECT * FROM r; + +-- recursive reference can't be used multiple times in a recursive term +WITH RECURSIVE r(level, data) AS ( + VALUES (0, 'A') + UNION ALL + SELECT r1.level + 1, r1.data + FROM r AS r1 + JOIN r AS r2 ON r2.data = r1.data + WHERE r1.level < 10 +) +SELECT * FROM r; + +-- recursive reference is not allowed on right side of a left outer join +WITH RECURSIVE r(level, data) AS ( + VALUES (0, 'A') + UNION ALL + SELECT level + 1, r.data + FROM ( +SELECT 'B' AS data + ) AS o + LEFT JOIN r ON r.data = o.data +) +SELECT * FROM r; + +-- recursive reference is not allowed on left side of a right outer join +WITH RECURSIVE r(level, data) AS ( + VALUES (0, 'A') + UNION ALL + SELECT level + 1, r.data + FROM r + RIGHT JOIN ( +SELECT 'B' AS data + ) AS o ON o.data = r.data +) +SELECT * FROM r; + +-- aggregate is supported 
in the anchor term +WITH RECURSIVE r(level, data) AS ( + SELECT MAX(level) AS level, SUM(data) AS data FROM VALUES (0, 1), (0, 2) + UNION ALL + SELECT level + 1, data FROM r WHERE level < 10 +) +SELECT * FROM r ORDER BY level; + +-- recursive reference is not allowed in an aggregate in a recursive term +WITH RECURSIVE r(group, data) AS ( + VALUES (0, 1L) + UNION ALL + SELECT 1, SUM(data) FROM r WHERE data < 10 GROUP BY group +) +SELECT * FROM r; + +-- recursive reference is not allowed in an aggregate (made from project) in a recursive term +WITH RECURSIVE r(level) AS ( + VALUES (1L) + UNION ALL + SELECT SUM(level) FROM r WHERE level < 10 +) +SELECT * FROM r; + +-- aggregate is supported on a recursive table +WITH RECURSIVE r(level, data) AS ( + VALUES (0, 'A') + UNION ALL + SELECT level + 1, data FROM r WHERE level < 10 +) +SELECT COUNT(*) FROM r; + +-- recursive refe
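One case in the test file above is worth spelling out: with plain `UNION` (not `UNION ALL`), duplicates are dropped and only never-before-seen rows feed the next iteration, so recursion can terminate through deduplication alone. A Python sketch of that fixed point (illustrative, not Spark code):

```python
def recursive_cte_distinct(anchor_rows, recursive_term, level_limit=100):
    """Evaluate WITH RECURSIVE r AS (anchor UNION recursive) SELECT * FROM r.

    Rows already seen are discarded, and only the genuinely new rows are
    passed to the next application of the recursive term; when a step
    produces nothing new, the result set has reached its fixed point.
    """
    seen = set(anchor_rows)
    working = set(anchor_rows)
    for _ in range(level_limit):
        new_rows = set(recursive_term(working)) - seen
        if not new_rows:
            return sorted(seen)
        seen |= new_rows
        working = new_rows
    raise RuntimeError("recursion level limit %d reached" % level_limit)

# WITH RECURSIVE r(level) AS (VALUES (0), (0) UNION
#   SELECT (level + 1) % 10 FROM r) SELECT * FROM r
levels = recursive_cte_distinct([0, 0], lambda rows: [(lv + 1) % 10 for lv in rows])
```

The `% 10` term would loop forever under `UNION ALL`, but under `UNION` it stops once all ten residues have been seen, which is exactly the "UNION combinator supported to eliminate duplicates and stop recursion" case in the file.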
[GitHub] [spark] maropu commented on a change in pull request #23531: [SPARK-24497][SQL] Support recursive SQL query
maropu commented on a change in pull request #23531: [SPARK-24497][SQL] Support recursive SQL query URL: https://github.com/apache/spark/pull/23531#discussion_r307091452

## File path: sql/core/src/test/resources/sql-tests/results/cte.sql.out

```
@@ -1,5 +1,5 @@
 -- Automatically generated by SQLQueryTestSuite
--- Number of queries: 27
+-- Number of queries: 63
```

Review comment: I checked whether the queries in this file also work in PostgreSQL. Queries rewritten for PostgreSQL: https://gist.github.com/maropu/0f7ac12ef3f1b6c6262ecfda4be6be09 Output: https://gist.github.com/maropu/a3c12d50058157c6e697ba2a0d4b19dd
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25245: [SPARK-25382][SQL][PYSPARK] Remove ImageSchema.readImages in 3.0
HyukjinKwon commented on a change in pull request #25245: [SPARK-25382][SQL][PYSPARK] Remove ImageSchema.readImages in 3.0 URL: https://github.com/apache/spark/pull/25245#discussion_r307091483 ## File path: python/pyspark/ml/image.py ## @@ -16,11 +16,11 @@ # """ -.. attribute:: ImageSchema +.. attribute:: ImageUtils -An attribute of this module that contains the instance of :class:`_ImageSchema`. +An attribute of this module that contains the instance of :class:`_ImageUtils`. -.. autoclass:: _ImageSchema +.. autoclass:: _ImageUtils :members: """ Review comment: We should remove ``` pyspark.ml.image module .. automodule:: pyspark.ml.image :members: :undoc-members: :inherited-members: ``` at `spark/python/docs/pyspark.ml.rst` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25245: [SPARK-25382][SQL][PYSPARK] Remove ImageSchema.readImages in 3.0
HyukjinKwon commented on a change in pull request #25245: [SPARK-25382][SQL][PYSPARK] Remove ImageSchema.readImages in 3.0 URL: https://github.com/apache/spark/pull/25245#discussion_r307091121 ## File path: python/pyspark/ml/image.py ## @@ -16,11 +16,11 @@ # """ Review comment: Shall we remove this doc since this isn't an API anymore? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25245: [SPARK-25382][SQL][PYSPARK] Remove ImageSchema.readImages in 3.0
HyukjinKwon commented on a change in pull request #25245: [SPARK-25382][SQL][PYSPARK] Remove ImageSchema.readImages in 3.0 URL: https://github.com/apache/spark/pull/25245#discussion_r307091258 ## File path: python/pyspark/ml/image.py ## @@ -16,11 +16,11 @@ # """ Review comment: We should remove: ``` pyspark.ml.image module .. automodule:: pyspark.ml.image :members: :undoc-members: :inherited-members: ``` at `spark/python/docs/pyspark.ml.rst` as well. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] beliefer commented on a change in pull request #25074: [SPARK-27924][SQL] Support ANSI SQL Boolean-Predicate syntax
beliefer commented on a change in pull request #25074: [SPARK-27924][SQL] Support ANSI SQL Boolean-Predicate syntax URL: https://github.com/apache/spark/pull/25074#discussion_r305786347

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala

```
@@ -1243,6 +1244,12 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
         IsNotNull(e)
       case SqlBaseParser.NULL =>
         IsNull(e)
+      case SqlBaseParser.TRUE =>
+        invertIfNotDefined(BooleanTest(e, Some(true)))
+      case SqlBaseParser.FALSE =>
+        invertIfNotDefined(BooleanTest(e, Some(false)))
+      case SqlBaseParser.UNKNOWN =>
+        invertIfNotDefined(BooleanTest(e, None))
```

Review comment: Thanks for the reminder. But I don't know why we must use `IsNull` or `IsNotNull` here; I think the boolean predicate and the null predicate are two different syntaxes. If we do need to do this, I will change it following @maropu's suggestion.
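For reference, the ANSI boolean predicate under discussion (`x IS [NOT] TRUE | FALSE | UNKNOWN`) always returns a definite true or false, even when the operand is NULL; for a boolean operand, `IS UNKNOWN` coincides with `IS NULL`. A Python model with `None` standing in for UNKNOWN (illustrative, not the Catalyst implementation):

```python
def boolean_test(value, target):
    """Model of the ANSI <boolean test>: value IS target, where value and
    target are each True, False, or None (UNKNOWN). Unlike the '=' operator,
    the result is never NULL: a NULL operand simply fails to match
    TRUE or FALSE and matches UNKNOWN."""
    return value is target

def boolean_test_not(value, target):
    # value IS NOT target is the plain negation of value IS target
    # (still two-valued, never NULL).
    return not boolean_test(value, target)
```

This two-valued result is what distinguishes `x IS TRUE` from a bare `x = TRUE` comparison, which would yield NULL for a NULL operand.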
[GitHub] [spark] maropu commented on a change in pull request #23531: [SPARK-24497][SQL] Support recursive SQL query
maropu commented on a change in pull request #23531: [SPARK-24497][SQL] Support recursive SQL query
URL: https://github.com/apache/spark/pull/23531#discussion_r307091036

## File path: sql/core/src/test/resources/sql-tests/inputs/cte.sql ##

@@ -155,6 +155,419 @@ SELECT (
   )
 );
 
+-- fails because recursion isn't allowed without the RECURSIVE keyword
+WITH r(level) AS (
+  VALUES (0)
+  UNION ALL
+  SELECT level + 1 FROM r WHERE level < 10
+)
+SELECT * FROM r;
+
+-- very basic recursion
+WITH RECURSIVE r(level) AS (
+  VALUES (0)
+  UNION ALL
+  SELECT level + 1 FROM r WHERE level < 10
+)
+SELECT * FROM r;
+
+-- unlimited recursion fails at the spark.sql.cte.recursion.level.limit level
+WITH RECURSIVE r(level) AS (
+  VALUES (0)
+  UNION ALL
+  SELECT level + 1 FROM r
+)
+SELECT * FROM r;
+
+-- terminate recursion with LIMIT
+WITH RECURSIVE r(level) AS (
+  VALUES (0)
+  UNION ALL
+  SELECT level + 1 FROM r
+)
+SELECT * FROM r LIMIT 10;
+
+-- terminate projected recursion with LIMIT
+WITH RECURSIVE r(level) AS (
+  VALUES (0)
+  UNION ALL
+  SELECT level + 1 FROM r
+)
+SELECT level, level FROM r LIMIT 10;
+
+-- fails because using LIMIT to terminate recursion only works where Limit can be pushed through
+-- recursion
+WITH RECURSIVE r(level) AS (
+  VALUES (0)
+  UNION ALL
+  SELECT level + 1 FROM r
+)
+SELECT level, level FROM r ORDER BY level LIMIT 10;
+
+-- using string column in recursion
+WITH RECURSIVE r(c) AS (
+  SELECT 'a'
+  UNION ALL
+  SELECT c || ' b' FROM r WHERE LENGTH(c) < 10
+)
+SELECT * FROM r;
+
+-- recursion works regardless of the order of anchor and recursive terms
+WITH RECURSIVE r(level) AS (
+  SELECT level + 1 FROM r WHERE level < 10
+  UNION ALL
+  VALUES (0)
+)
+SELECT * FROM r;
+
+-- multiple anchor terms are supported
+WITH RECURSIVE r(level, data) AS (
+  VALUES (0, 'A')
+  UNION ALL
+  VALUES (0, 'B')
+  UNION ALL
+  SELECT level + 1, data || 'C' FROM r WHERE level < 3
+)
+SELECT * FROM r;
+
+-- multiple recursive terms are supported
+WITH RECURSIVE r(level, data) AS (
+  VALUES (0, 'A')
+  UNION ALL
+  SELECT level + 1, data || 'B' FROM r WHERE level < 2
+  UNION ALL
+  SELECT level + 1, data || 'C' FROM r WHERE level < 3
+)
+SELECT * FROM r;
+
+-- multiple anchor and recursive terms are supported
+WITH RECURSIVE r(level, data) AS (
+  VALUES (0, 'A')
+  UNION ALL
+  VALUES (0, 'B')
+  UNION ALL
+  SELECT level + 1, data || 'C' FROM r WHERE level < 2
+  UNION ALL
+  SELECT level + 1, data || 'D' FROM r WHERE level < 3
+)
+SELECT * FROM r;
+
+-- recursion without an anchor term fails
+WITH RECURSIVE r(level) AS (
+  SELECT level + 1 FROM r WHERE level < 3
+)
+SELECT * FROM r;
+
+-- UNION combinator supported to eliminate duplicates and stop recursion
+WITH RECURSIVE r(level) AS (
+  VALUES (0), (0)
+  UNION
+  SELECT (level + 1) % 10 FROM r
+)
+SELECT * FROM r;
+
+-- fails because a recursive query should contain UNION ALL or UNION combinator
+WITH RECURSIVE r(level) AS (
+  VALUES (0)
+  INTERSECT
+  SELECT level + 1 FROM r WHERE level < 10
+)
+SELECT * FROM r;
+
+-- recursive reference is not allowed in a subquery
+WITH RECURSIVE r(level) AS (
+  VALUES (0)
+  UNION ALL
+  SELECT level + 1 FROM r WHERE (SELECT SUM(level) FROM r) < 10
+)
+SELECT * FROM r;
+
+-- recursive reference can't be used multiple times in a recursive term
+WITH RECURSIVE r(level, data) AS (
+  VALUES (0, 'A')
+  UNION ALL
+  SELECT r1.level + 1, r1.data
+  FROM r AS r1
+  JOIN r AS r2 ON r2.data = r1.data
+  WHERE r1.level < 10
+)
+SELECT * FROM r;
+
+-- recursive reference is not allowed on right side of a left outer join
+WITH RECURSIVE r(level, data) AS (
+  VALUES (0, 'A')
+  UNION ALL
+  SELECT level + 1, r.data
+  FROM (
+    SELECT 'B' AS data
+  ) AS o
+  LEFT JOIN r ON r.data = o.data
+)
+SELECT * FROM r;
+
+-- recursive reference is not allowed on left side of a right outer join
+WITH RECURSIVE r(level, data) AS (
+  VALUES (0, 'A')
+  UNION ALL
+  SELECT level + 1, r.data
+  FROM r
+  RIGHT JOIN (
+    SELECT 'B' AS data
+  ) AS o ON o.data = r.data
+)
+SELECT * FROM r;
+
+-- aggregate is supported in the anchor term
+WITH RECURSIVE r(level, data) AS (
+  SELECT MAX(level) AS level, SUM(data) AS data FROM VALUES (0, 1), (0, 2)
+  UNION ALL
+  SELECT level + 1, data FROM r WHERE level < 10
+)
+SELECT * FROM r ORDER BY level;
+
+-- recursive reference is not allowed in an aggregate in a recursive term
+WITH RECURSIVE r(group, data) AS (
+  VALUES (0, 1L)
+  UNION ALL
+  SELECT 1, SUM(data) FROM r WHERE data < 10 GROUP BY group
+)
+SELECT * FROM r;
+
+-- recursive reference is not allowed in an aggregate (made from project) in a recursive term
+WITH RECURSIVE r(level) AS (
+  VALUES (1L)
+  UNION ALL
+  SELECT SUM(level) FROM r WHERE level < 10
+)
+SELECT * FROM r;
+
+-- aggregate is supported on a recursive table
+WITH RECURSIVE r(level, data) AS (
+  VALUES (0, 'A')
+  UNION ALL
+  SELECT level + 1, data FROM r WHERE level < 10
+)
+SELECT COUNT(*) FROM r;
+
+-- recursive refe
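The tests above exercise the usual evaluation model for a recursive CTE: run the anchor term once, then repeatedly apply the recursive term to the rows produced by the previous iteration until no new rows appear (or a level limit is hit). A minimal Python sketch of that semantics, using the basic `VALUES (0) UNION ALL SELECT level + 1 FROM r WHERE level < 10` test as the example; this is an illustrative model only, not Spark's actual implementation:

```python
# Illustrative fixpoint evaluation of a recursive CTE.
# anchor_rows: output of the anchor term(s).
# recursive_step: maps one row to the rows the recursive term derives from it.
# level_limit: mirrors a config like spark.sql.cte.recursion.level.limit.

def evaluate_recursive_cte(anchor_rows, recursive_step, level_limit=100):
    result = list(anchor_rows)
    frontier = list(anchor_rows)   # rows produced by the previous iteration
    iterations = 0
    while frontier:
        iterations += 1
        if iterations > level_limit:
            raise RuntimeError(f"recursion level limit {level_limit} exceeded")
        # Apply the recursive term only to the previous iteration's rows.
        frontier = [new for row in frontier for new in recursive_step(row)]
        result.extend(frontier)
    return result

# Anchor: VALUES (0); recursive term: SELECT level + 1 FROM r WHERE level < 10.
rows = evaluate_recursive_cte(
    [(0,)],
    lambda row: [(row[0] + 1,)] if row[0] < 10 else [],
)
# rows now holds (0,) through (10,): the WHERE clause stops the recursion.
```

Without the `WHERE` filter (the "unlimited recursion" test), the frontier never empties and the level limit is what terminates the query with an error.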
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25242: [SPARK-28497][SQL] Disallow upcasting complex data types to string type
HyukjinKwon commented on a change in pull request #25242: [SPARK-28497][SQL] Disallow upcasting complex data types to string type
URL: https://github.com/apache/spark/pull/25242#discussion_r307090548

## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/EncoderResolutionSuite.scala ##

@@ -196,6 +196,43 @@ class EncoderResolutionSuite extends PlanTest {
     encoder.resolveAndBind(attrs)
   }
 
+  test("SPARK-28497: complex type is not compatible with string encoder schema") {
+    val encoder = ExpressionEncoder[String]
+
+    {
+      val attrs = Seq('a.struct('x.long))
+      assert(intercept[AnalysisException](encoder.resolveAndBind(attrs)).message ==
+        s"""
+           |Cannot up cast `a` from struct to string.
+           |The type path of the target object is:
+           |- root class: "java.lang.String"
+           |You can either add an explicit cast to the input data or choose a higher precision type
+         """.stripMargin.trim + " of the field in the target object")
+    }
+
+    {
+      val attrs = Seq('a.array(StringType))
+      assert(intercept[AnalysisException](encoder.resolveAndBind(attrs)).message ==
+        s"""
+           |Cannot up cast `a` from array to string.

Review comment: It doesn't necessarily have to compare the whole message. We can check whether the message contains some keywords.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
With regards, Apache Git Services
- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
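The reviewer's suggestion is to assert that the error message *contains* key phrases rather than matching the full text, so the test doesn't break on cosmetic message changes. The PR's test is Scala/ScalaTest; here is a small language-neutral Python sketch of the same pattern, where `resolve_and_bind` is a hypothetical stand-in for the encoder call:

```python
# Hypothetical stand-in for encoder.resolveAndBind raising on complex types.
def resolve_and_bind(attr_type):
    if attr_type in ("struct", "array", "map"):
        raise ValueError(f"Cannot up cast `a` from {attr_type} to string.")

# Assert the raised message contains every keyword, instead of comparing
# the message against a fully spelled-out multi-line string.
def assert_message_contains(fn, *keywords):
    try:
        fn()
    except ValueError as e:
        message = str(e)
        missing = [k for k in keywords if k not in message]
        assert not missing, f"message {message!r} lacks {missing}"
    else:
        raise AssertionError("expected an exception")

# Robust keyword check: survives wording tweaks around the key phrases.
assert_message_contains(lambda: resolve_and_bind("array"),
                        "Cannot up cast", "array", "string")
```

The trade-off is the usual one: full-message equality catches every regression in the error text, while keyword containment keeps the test stable across harmless rewording.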
[GitHub] [spark] SparkQA commented on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API
SparkQA commented on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API
URL: https://github.com/apache/spark/pull/25007#issuecomment-514867350

**[Test build #108146 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108146/testReport)** for PR 25007 at commit [`b8b7b8d`](https://github.com/apache/spark/commit/b8b7b8d00418ccc735ec7bdcc10de5e71384e8bc).
[GitHub] [spark] maropu commented on a change in pull request #23531: [SPARK-24497][SQL] Support recursive SQL query
maropu commented on a change in pull request #23531: [SPARK-24497][SQL] Support recursive SQL query
URL: https://github.com/apache/spark/pull/23531#discussion_r307090403

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ##

@@ -1873,6 +1874,22 @@ object SQLConf {
     .booleanConf
     .createWithDefault(false)
 
+  val RECURSION_LEVEL_LIMIT = buildConf("spark.sql.cte.recursion.level.limit")
+    .internal()
+    .doc("Maximum level of recursion that is allowed while executing a recursive CTE definition. " +
+      "If a query does not get exhausted before reaching this limit, it fails.")
+    .intConf
+    .createWithDefault(100)

Review comment: Can you use "-1" for the unlimited case?
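Using a sentinel such as `-1` for "no limit", as the reviewer suggests, is a common convention for integer configs. A minimal sketch of the check, with a hypothetical helper name (the PR's actual Scala code may differ):

```python
# Hypothetical limit check: a negative limit means "unlimited",
# so the recursion is never cut off by the config.
UNLIMITED = -1

def limit_exceeded(current_level, limit):
    """True when a positive limit exists and the current level has passed it."""
    return limit != UNLIMITED and current_level > limit

assert limit_exceeded(101, 100)          # default limit of 100 is exceeded
assert not limit_exceeded(50, 100)       # still within the limit
assert not limit_exceeded(10_000, UNLIMITED)  # -1 disables the check
```

The alternative, keeping the limit strictly positive and requiring users to raise it, avoids accidentally unbounded queries but makes "run to completion, whatever it takes" impossible to express.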
[GitHub] [spark] AmplabJenkins removed a comment on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API
AmplabJenkins removed a comment on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API
URL: https://github.com/apache/spark/pull/25007#issuecomment-514866817

Test PASSed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13248/