[GitHub] spark pull request #16251: [SPARK-18826][SS]Add 'latestFirst' option to File...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/16251#discussion_r92558907

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala ---
```
@@ -1059,6 +1060,72 @@ class FileStreamSourceSuite extends FileStreamSourceTest {
     val str = Source.fromFile(getClass.getResource(s"/structured-streaming/$file").toURI).mkString
     SerializedOffset(str.trim)
   }
+
+  test("FileStreamSource - latestFirst") {
+    withTempDir { src =>
+      // Prepare two files: 1.txt, 2.txt, and make sure they have different modified times.
+      val f1 = stringToFile(new File(src, "1.txt"), "1")
+      val f2 = stringToFile(new File(src, "2.txt"), "2")
+      eventually(timeout(streamingTimeout)) {
+        f2.setLastModified(System.currentTimeMillis())
+        assert(f1.lastModified < f2.lastModified)
+      }
+
+      // Read oldest files first, so the first batch is "1", and the second batch is "2".
+      val fileStream = createFileStream(
+        "text",
+        src.getCanonicalPath,
+        options = Map("latestFirst" -> "false", "maxFilesPerTrigger" -> "1"))
+      val clock = new StreamManualClock()
+      testStream(fileStream)(
+        StartStream(trigger = ProcessingTime(10), triggerClock = clock),
+        AssertOnQuery { _ =>
```
--- End diff --

Why do you need to wait on the manual clock? CheckLastBatch will automatically wait for the batch to complete.

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16281: [SPARK-13127][SQL] Update Parquet to 1.9.0
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16281

Parquet is the default format of Spark, so it is pretty significant to Spark. Now that Parquet is becoming stable, it might be the right time to fork it; we are just fixing the bugs. @liancheng and @rdblue are Parquet committers. They might be the right people to judge the changes we made in the forked version.
[GitHub] spark pull request #16251: [SPARK-18826][SS]Add 'latestFirst' option to File...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/16251#discussion_r92558557

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala ---
```
@@ -1059,6 +1060,72 @@ class FileStreamSourceSuite extends FileStreamSourceTest {
     val str = Source.fromFile(getClass.getResource(s"/structured-streaming/$file").toURI).mkString
     SerializedOffset(str.trim)
   }
+
+  test("FileStreamSource - latestFirst") {
+    withTempDir { src =>
+      // Prepare two files: 1.txt, 2.txt, and make sure they have different modified times.
+      val f1 = stringToFile(new File(src, "1.txt"), "1")
+      val f2 = stringToFile(new File(src, "2.txt"), "2")
+      eventually(timeout(streamingTimeout)) {
```
--- End diff --

Why use `eventually`? Why not just set `f2.setLastModified(f1.lastModified + 1000)`?
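The deterministic alternative the reviewer is suggesting can be sketched outside of Spark's test harness. This is a hypothetical standalone snippet (the suite's `withTempDir`/`stringToFile` helpers are replaced with plain `java.nio` calls):

```scala
import java.io.File
import java.nio.file.Files

// Create two files in a temp dir, mirroring 1.txt / 2.txt in the test.
val dir = Files.createTempDirectory("latest-first").toFile
val f1 = new File(dir, "1.txt")
val f2 = new File(dir, "2.txt")
Files.write(f1.toPath, "1".getBytes("UTF-8"))
Files.write(f2.toPath, "2".getBytes("UTF-8"))

// Pin the ordering deterministically instead of polling with `eventually`:
// make f2 strictly newer than f1, so the ordering assertion can never flake.
f2.setLastModified(f1.lastModified + 1000)
assert(f1.lastModified < f2.lastModified)
```

The trade-off is that `setLastModified` granularity depends on the filesystem, but a one-second offset is coarse enough on common filesystems to keep the ordering stable.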
[GitHub] spark issue #16294: [WIP][SPARK-18669][SS][DOCS] Update Apache docs for Stru...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16294

**[Test build #70182 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70182/testReport)** for PR 16294 at commit [`ed8d9e0`](https://github.com/apache/spark/commit/ed8d9e0e40292979ff250ceab76ff966510f2597).
[GitHub] spark issue #16293: [SPARK-17119][Core]allow the history server to delete .i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16293

Can one of the admins verify this patch?
[GitHub] spark issue #16289: [SPARK-18870] Disallowed Distinct Aggregations on Stream...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16289

**[Test build #70183 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70183/testReport)** for PR 16289 at commit [`9cc8d2b`](https://github.com/apache/spark/commit/9cc8d2b65e38d8cf1395ce265e5ee08d6006a19c).
[GitHub] spark pull request #16294: [WIP][SPARK-18669][SS][DOCS] Update Apache docs f...
GitHub user tdas opened a pull request: https://github.com/apache/spark/pull/16294

[WIP][SPARK-18669][SS][DOCS] Update Apache docs for Structured Streaming regarding watermarking and status

## What changes were proposed in this pull request?

- Extended the Window operation section with code snippet and explanation of watermarking
- Extended the Output Mode section with a table showing the compatibility between query type and output mode
- Rewrote the Monitoring section with update jsons generated by

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tdas/spark SPARK-18669

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16294.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #16294

commit a31c861c31a977537aa3b4c86a2fd1ea1ee544a3
Author: Tathagata Das
Date: 2016-12-14T02:52:20Z

    Added watermarking

commit ed8d9e0e40292979ff250ceab76ff966510f2597
Author: Tathagata Das
Date: 2016-12-14T20:36:42Z

    Update metrics
[GitHub] spark pull request #16293: [SPARK-17119][Core]allow the history server to de...
GitHub user cnZach opened a pull request: https://github.com/apache/spark/pull/16293

[SPARK-17119][Core] Allow the history server to delete .inprogress files (configurable)

## What changes were proposed in this pull request?

The History Server (HS) currently only considers completed applications when deleting event logs from spark.history.fs.logDirectory (since SPARK-6879). This means that over time, .inprogress files (from failed jobs, jobs where the SparkContext is not closed, spark-shell exits, etc.) can accumulate and impact the HS. Instead of having to manually delete these files, this change adds a configurable feature to let users decide whether the .inprogress files should also be deleted after a period of time:

- spark.history.fs.cleaner.deleteInProgress.enabled
- spark.history.fs.cleaner.noProgressMaxAge

## How was this patch tested?

Verified with manual tests; unit tests added in FsHistoryProviderSuite.scala. I am not able to run ./dev/run-tests for the whole project on my laptop; it failed on SparkSinkSuite and network-related tests under org.apache.spark.network.* (all due to java.io.IOException: Failed to connect to /:62343).

    [info] SparkSinkSuite:
    [info] - Success with ack *** FAILED *** (1 minute)
    [info]   java.io.IOException: Error connecting to /0.0.0.0:62298
    [info]   at org.apache.avro.ipc.NettyTransceiver.getChannel(NettyTransceiver.java:261)

## doc

monitoring.md is also updated.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cnZach/spark SPARK-17119

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16293.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #16293

commit aa45caa42a7bc1b4a06e6634f9a40c4db6b83a89
Author: Yuexin Zhang
Date: 2016-12-15T06:19:11Z

    allow the history server to delete .inprogress files and make it configurable

commit f281d92a49e54f64f157f8d2936a13a73c7284cb
Author: Yuexin Zhang
Date: 2016-12-15T06:39:12Z

    fix a typo noProgressMaxAg -> noProgressMaxAge

commit 989422d310a0addeb25217e61fda85c34e5d4c89
Author: Yuexin Zhang
Date: 2016-12-15T06:41:57Z

    fix checkstyle failures in FsHistoryProviderSuite.scala
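The age-based cleanup rule proposed above can be sketched in isolation. This is a hypothetical illustration of the described behavior, not the actual FsHistoryProvider code; the `LogEntry` type and parameter names are invented for the example:

```scala
// Hypothetical model of an event-log file; not the actual History Server code.
case class LogEntry(name: String, lastModified: Long)

// A log is eligible for cleanup once it is older than maxAgeMs; .inprogress
// files are only eligible when the new opt-in flag is enabled.
def eligibleForDeletion(log: LogEntry, now: Long,
                        deleteInProgress: Boolean, maxAgeMs: Long): Boolean = {
  val tooOld = now - log.lastModified > maxAgeMs
  if (log.name.endsWith(".inprogress")) deleteInProgress && tooOld else tooOld
}

val now = 1000000L
val stale = LogEntry("app-1.inprogress", now - 200000L)
val fresh = LogEntry("app-2.inprogress", now - 1000L)

// Feature off: .inprogress files are never cleaned up (current behavior).
assert(!eligibleForDeletion(stale, now, deleteInProgress = false, maxAgeMs = 100000L))
// Feature on: only sufficiently old .inprogress files become eligible.
assert(eligibleForDeletion(stale, now, deleteInProgress = true, maxAgeMs = 100000L))
assert(!eligibleForDeletion(fresh, now, deleteInProgress = true, maxAgeMs = 100000L))
```

Gating on both the flag and the age keeps the default behavior unchanged, which matches the opt-in design of the proposed `spark.history.fs.cleaner.deleteInProgress.enabled` setting.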
[GitHub] spark issue #16263: [SPARK-18281][SQL][PySpark] Consumes the returned local ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16263

Merged build finished. Test FAILed.
[GitHub] spark issue #16263: [SPARK-18281][SQL][PySpark] Consumes the returned local ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16263

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70175/
[GitHub] spark issue #16263: [SPARK-18281][SQL][PySpark] Consumes the returned local ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16263

**[Test build #70175 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70175/testReport)** for PR 16263 at commit [`a2d071d`](https://github.com/apache/spark/commit/a2d071d6f5ab916f9e39b5ccb50e4fb11cba183d).
* This patch **fails from timeout after a configured wait of `250m`**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16030

Merged build finished. Test PASSed.
[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16030

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70178/
[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16030

**[Test build #70178 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70178/testReport)** for PR 16030 at commit [`dc54b69`](https://github.com/apache/spark/commit/dc54b699c3c93f11eaa93063b3b950e04c614a56).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16263: [SPARK-18281][SQL][PySpark] Consumes the returned local ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16263

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70176/
[GitHub] spark issue #16263: [SPARK-18281][SQL][PySpark] Consumes the returned local ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16263

Merged build finished. Test PASSed.
[GitHub] spark issue #16263: [SPARK-18281][SQL][PySpark] Consumes the returned local ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16263

**[Test build #70176 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70176/testReport)** for PR 16263 at commit [`67882d2`](https://github.com/apache/spark/commit/67882d2d4ebfad955b07cf0020c726ea5a153864).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16285: [SPARK-18867] [SQL] Throw cause if IsolatedClientLoad ca...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16285

Is it ever possible that cause is null?
[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16030

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70177/
[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16030

Merged build finished. Test PASSed.
[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16030

**[Test build #70177 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70177/testReport)** for PR 16030 at commit [`5b23b89`](https://github.com/apache/spark/commit/5b23b89a4a0b9b16f16c56d03fc226b8eb53c92f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #16290: [SPARK-18817] [SPARKR] [SQL] Set default warehous...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16290#discussion_r92553628

--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
```
@@ -2165,6 +2165,14 @@ test_that("SQL error message is returned from JVM", {
   expect_equal(grepl("blah", retError), TRUE)
 })
+
+test_that("Default warehouse dir should be set to tempdir", {
+  # nothing should be written outside tempdir() without explicit user permission
+  inital_working_directory_files <- list.files()
```
--- End diff --

From my test, the `spark-warehouse` directory is created when I run `a <- createDataFrame(iris)`.
[GitHub] spark pull request #16290: [SPARK-18817] [SPARKR] [SQL] Set default warehous...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16290#discussion_r92553387

--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
```
@@ -2165,6 +2165,14 @@ test_that("SQL error message is returned from JVM", {
   expect_equal(grepl("blah", retError), TRUE)
 })
+
+test_that("Default warehouse dir should be set to tempdir", {
+  # nothing should be written outside tempdir() without explicit user permission
+  inital_working_directory_files <- list.files()
```
--- End diff --

I'm referring to other tests in this test file, test_sparkSQL, which call APIs that might already initialize the warehouse dir. `sparkR.session()` is called at the top. Does this `createOrReplaceTempView` cause the warehouse dir to be created? https://github.com/shivaram/spark-1/blob/25834109588e8e545deafb1da162958766a057e2/R/pkg/inst/tests/testthat/test_sparkSQL.R#L570
[GitHub] spark pull request #16030: [SPARK-18108][SQL] Fix a bug to fail partition sc...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16030#discussion_r92551213

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFsRelation.scala ---
```
@@ -49,9 +49,12 @@ case class HadoopFsRelation(
   override def sqlContext: SQLContext = sparkSession.sqlContext
   val schema: StructType = {
-    val dataSchemaColumnNames = dataSchema.map(_.name.toLowerCase).toSet
-    StructType(dataSchema ++ partitionSchema.filterNot { column =>
-      dataSchemaColumnNames.contains(column.name.toLowerCase)
+    val equality = sparkSession.sessionState.conf.resolver
+    val overriddenDataSchema = dataSchema.map { dataField =>
```
--- End diff --

how about

```
val getColName: (StructField => String) =
  if (conf.caseSensitive) _.name else _.name.toLowerCase

val overlappedPartCols = mutable.Map.empty[String, StructField]
for {
  dataField <- dataSchema
  partitionField <- partitionSchema
  if getColName(dataField) == getColName(partitionField)
} overlappedPartCols += getColName(partitionField) -> partitionField

StructType(dataSchema.map(f => overlappedPartCols.getOrElse(getColName(f), f)) ++
  partitionSchema.filterNot(f => overlappedPartCols.contains(getColName(f))))
```
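The merge logic in this suggestion can be exercised without Spark. The following Spark-free sketch substitutes a plain `Field` case class for `StructField` and a hard-coded `caseSensitive` flag for the SQL conf, but the overlap-and-append algorithm is the same:

```scala
import scala.collection.mutable

// Minimal stand-in for StructField (illustrative only).
case class Field(name: String)

val caseSensitive = false
val getColName: Field => String =
  if (caseSensitive) f => f.name else f => f.name.toLowerCase

// A data schema whose "Part" column overlaps (case-insensitively) with a
// partition column, plus a partition-only column "date".
val dataSchema = Seq(Field("id"), Field("Part"), Field("value"))
val partitionSchema = Seq(Field("part"), Field("date"))

// Collect partition columns that collide with data columns by normalized name.
val overlappedPartCols = mutable.Map.empty[String, Field]
for {
  dataField <- dataSchema
  partitionField <- partitionSchema
  if getColName(dataField) == getColName(partitionField)
} overlappedPartCols += getColName(partitionField) -> partitionField

// Overlapping data columns are replaced in place by their partition-side
// definition; non-overlapping partition columns are appended at the end.
val merged =
  dataSchema.map(f => overlappedPartCols.getOrElse(getColName(f), f)) ++
    partitionSchema.filterNot(f => overlappedPartCols.contains(getColName(f)))

// merged.map(_.name) == Seq("id", "part", "value", "date")
```

This preserves the ordering of the data schema (the property the PR is fixing) while still deduplicating partition columns.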
[GitHub] spark issue #16290: [SPARK-18817] [SPARKR] [SQL] Set default warehouse dir t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16290

**[Test build #70181 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70181/testReport)** for PR 16290 at commit [`1d0d1d2`](https://github.com/apache/spark/commit/1d0d1d219f392721e9be73e21752100db0ce065f).
[GitHub] spark pull request #16276: [SPARK-18855][CORE] Add RDD flatten function
Github user linbojin commented on a diff in the pull request: https://github.com/apache/spark/pull/16276#discussion_r92550699

--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
```
@@ -381,6 +381,14 @@ abstract class RDD[T: ClassTag](
   }

   /**
+   * Return a new RDD by flattening all elements from RDD with traversable elements
+   */
+  def flatten[U: ClassTag](implicit asTraversable: T => TraversableOnce[U]): RDD[U] = withScope {
```
--- End diff --

@srowen I think I figured out a simpler way:

```
def flatten[U: ClassTag](implicit asTraversable: T => TraversableOnce[U]): RDD[U] = withScope {
  new MapPartitionsRDD[U, T](this, (context, pid, iter) => {
    var newIter: Iterator[U] = Iterator.empty
    for (x <- iter) newIter ++= asTraversable(x)
    newIter
  })
}
```
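For intuition, the per-partition transformation above is equivalent to a `flatMap` over the element-to-collection view, which also avoids building a long chain of concatenated iterators. A plain-Scala sketch of the same iterator logic, with no Spark dependency:

```scala
// Spark-free sketch of the per-partition logic behind the proposed RDD.flatten:
// flattening an iterator of traversable elements is flatMap over the
// element-to-collection conversion.
def flattenIter[T, U](iter: Iterator[T])(asTraversable: T => TraversableOnce[U]): Iterator[U] =
  iter.flatMap(asTraversable)

// One simulated partition containing collection-valued elements.
val partition = Iterator(Seq(1, 2), Seq.empty[Int], Seq(3, 4, 5))
val flat = flattenIter(partition)(identity).toList
// flat == List(1, 2, 3, 4, 5)
```

Note that repeatedly doing `newIter ++= asTraversable(x)` as in the suggestion builds a nested chain of `++` iterators, so the `flatMap` formulation is both shorter and cheaper for partitions with many elements.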
[GitHub] spark issue #16290: [SPARK-18817] [SPARKR] [SQL] Set default warehouse dir t...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16290

If the default database has already been created in the metastore, any subsequent change of `spark.sql.default.warehouse.dir` can trigger an issue when we create a data source table in the default database (here, we assume Hive support is enabled). Note, we will not hit any issue if we create a Hive serde table in the default database, or create a data source table in a non-default database.

The directory of managed data source tables is created by Hive. When creating a new data source table, the created directory is based on the current value of `hive.metastore.warehouse.dir`. However, the value of the table location in the metastore points to the child directory of the location of the default database. Thus, you will not hit any issue when creating such a table. However, the mismatch causes a problem (because the expected directory does not exist) when we try to select from / insert into this table. This is a bug in the Hive metastore. @dilipbiswal hit this issue very recently.

Below shows the location of these two tables. `t11` is a Hive managed data source table we created in the default database.

```
spark-sql> describe extended t11;
...
Storage(Location: file:/user/hive/warehouse/t11, InputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat, OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat, Serde: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe, Properties: [serialization.format=1]))
Time taken: 0.105 seconds, Fetched 8 row(s)
```

`t1` is a Hive managed data source table we created in a non-default database.

```
spark-sql> use dilip;
Time taken: 0.028 seconds
spark-sql> describe extended t1;
...
Storage(Location: file:/home/cloudera/mygit/apache/spark/bin/spark-warehouse/dilip.db/t1, InputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat, OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat, Serde: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe, Properties: [serialization.format=1]))
```
[GitHub] spark pull request #16290: [SPARK-18817] [SPARKR] [SQL] Set default warehous...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16290#discussion_r92548697

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala ---
```
@@ -55,14 +55,19 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging {
         s"is set. Setting ${WAREHOUSE_PATH.key} to the value of " +
         s"hive.metastore.warehouse.dir ('$hiveWarehouseDir').")
       hiveWarehouseDir
-    } else {
+    } else if (sparkContext.conf.contains(WAREHOUSE_PATH.key) &&
+        sparkContext.conf.get(WAREHOUSE_PATH).isDefined) {
```
--- End diff --

Nit: indent is not right.
[GitHub] spark pull request #16286: [SPARK-18849][ML][SPARKR][DOC] vignettes final ch...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16286
[GitHub] spark issue #16249: [SPARK-18828][SPARKR] Refactor scripts for R
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/16249 Just FYI, I'm holding off on this till 2.1 - I think it's better to make this change after the release, just to be safe
[GitHub] spark issue #16286: [SPARK-18849][ML][SPARKR][DOC] vignettes final check upd...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/16286 Merging this to master, branch-2.1 - to catch the next RC. @mengxr feel free to open a follow-up if you find anything
[GitHub] spark pull request #15996: [SPARK-18567][SQL] Simplify CreateDataSourceTable...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15996#discussion_r92547533 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -363,48 +364,120 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) { throw new AnalysisException("Cannot create hive serde table with saveAsTable API") } -val tableExists = df.sparkSession.sessionState.catalog.tableExists(tableIdent) - -(tableExists, mode) match { - case (true, SaveMode.Ignore) => -// Do nothing - - case (true, SaveMode.ErrorIfExists) => -throw new AnalysisException(s"Table $tableIdent already exists.") - - case _ => -val existingTable = if (tableExists) { - Some(df.sparkSession.sessionState.catalog.getTableMetadata(tableIdent)) -} else { - None -} -val storage = if (tableExists) { - existingTable.get.storage -} else { - DataSource.buildStorageFormatFromOptions(extraOptions.toMap) -} -val tableType = if (tableExists) { - existingTable.get.tableType -} else if (storage.locationUri.isDefined) { - CatalogTableType.EXTERNAL -} else { - CatalogTableType.MANAGED +val catalog = df.sparkSession.sessionState.catalog +val db = tableIdent.database.getOrElse(catalog.getCurrentDatabase) +val tableIdentWithDB = tableIdent.copy(database = Some(db)) +catalog.getTableMetadataOption(tableIdent) match { + // If the table already exists... + case Some(tableMeta) => +mode match { + case SaveMode.Ignore => // Do nothing + + case SaveMode.ErrorIfExists => +throw new AnalysisException(s"Table $tableIdent already exists. You can set SaveMode " + + "to SaveMode.Append to insert data into the table or set SaveMode to " + + "SaveMode.Overwrite to overwrite the existing data.") + + case SaveMode.Append => +// Check if the specified data source match the data source of the existing table. 
+val specifiedProvider = DataSource.lookupDataSource(source) +// TODO: Check that options from the resolved relation match the relation that we are +// inserting into (i.e. using the same compression). + +// Pass a table identifier with database part, so that `lookupRelation` won't get temp +// views unexpectedly. + EliminateSubqueryAliases(catalog.lookupRelation(tableIdentWithDB)) match { + case l @ LogicalRelation(_: InsertableRelation | _: HadoopFsRelation, _, _) => +// check if the file formats match +l.relation match { + case r: HadoopFsRelation if r.fileFormat.getClass != specifiedProvider => +throw new AnalysisException( + s"The file format of the existing table $tableIdent is " + +s"`${r.fileFormat.getClass.getName}`. It doesn't match the specified " + +s"format `$source`") + case _ => +} + case s: SimpleCatalogRelation if DDLUtils.isDatasourceTable(s.metadata) => // OK. + case c: CatalogRelation if c.catalogTable.provider == Some(DDLUtils.HIVE_PROVIDER) => +throw new AnalysisException("Saving data in the Hive serde table " + + s"${c.catalogTable.identifier} is not supported yet. Please use the " + + "insertInto() API as an alternative..") + case o => +throw new AnalysisException(s"Saving data in ${o.toString} is not supported.") +} + +val existingSchema = tableMeta.schema +if (df.logicalPlan.schema.size != existingSchema.size) { + throw new AnalysisException( +s"The column number of the existing schema[$existingSchema] " + + s"doesn't match the data schema[${df.logicalPlan.schema}]") +} + +val specifiedPartCols = partitioningColumns.getOrElse(Nil) +val existingPartCols = tableMeta.partitionColumnNames +if (specifiedPartCols.map(_.toLowerCase) != existingPartCols.map(_.toLowerCase)) { + throw new AnalysisException("The partition columns of the existing table " + +s"$tableIdent are: [${existingPartCols.mkString(", ")}]. It doesn't match the " + +s"specified partition columns: [${specifiedPartCols.mkString(", ")}]")
[GitHub] spark pull request #16292: [SPARK-18875][SPARKR][DOCS] Fix R API doc generat...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16292
[GitHub] spark issue #16292: [SPARK-18875][SPARKR][DOCS] Fix R API doc generation by ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16292 Thank you, @shivaram!
[GitHub] spark issue #16292: [SPARK-18875][SPARKR][DOCS] Fix R API doc generation by ...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/16292 LGTM. Thanks @dongjoon-hyun - Merging this to master, branch-2.1 and branch-2.0
[GitHub] spark issue #16292: [SPARK-18875][SPARKR][DOCS] Fix R API doc generation by ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16292 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70179/ Test PASSed.
[GitHub] spark pull request #16276: [SPARK-18855][CORE] Add RDD flatten function
Github user linbojin commented on a diff in the pull request: https://github.com/apache/spark/pull/16276#discussion_r92546374 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -381,6 +381,14 @@ abstract class RDD[T: ClassTag]( } /** +* Return a new RDD by flattening all elements from RDD with traversable elements +*/ + def flatten[U: ClassTag](implicit asTraversable: T => TraversableOnce[U]): RDD[U] = withScope { --- End diff -- Hi @srowen, thanks for your suggestion. One way is to reuse Scala's flatMap:

```
def flatten[U: ClassTag](implicit asTraversable: T => TraversableOnce[U]): RDD[U] = withScope {
  val f = (x: T) => asTraversable(x)
  val cleanF = sc.clean(f)
  new MapPartitionsRDD[U, T](this, (context, pid, iter) => iter.flatMap(cleanF))
}
```

Or I can implement the logic myself:

```
def flatten[U: ClassTag](implicit asTraversable: T => TraversableOnce[U]): RDD[U] = withScope {
  new MapPartitionsRDD[U, T](this, (context, pid, iter) => new Iterator[U] {
    private val empty = Iterator.empty
    private var cur: Iterator[U] = empty

    private def nextCur() { cur = asTraversable(iter.next).toIterator }

    def hasNext: Boolean = {
      while (!cur.hasNext) {
        if (!iter.hasNext) return false
        nextCur()
      }
      true
    }

    def next(): U = (if (hasNext) cur else empty).next()
  })
}
```

ref: https://github.com/scala/scala/blob/v2.11.8/src/library/scala/collection/Iterator.scala#L432 Which one do you think is better?
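For intuition, the semantics the proposed `flatten` would give an RDD can be demonstrated on plain Scala collections (a sketch only, not RDD code): it is simply `flatMap` with the identity view over any element type that can be seen as a `TraversableOnce`.

```scala
// Plain-collections sketch of the flatten semantics discussed above.
object FlattenSketch {
  // Same signature shape as the proposed RDD.flatten, on Seq instead of RDD.
  def flatten[T, U](xs: Seq[T])(implicit asTraversable: T => TraversableOnce[U]): Seq[U] =
    xs.flatMap(x => asTraversable(x))

  def main(args: Array[String]): Unit = {
    val nested = Seq(Seq(1, 2), Seq(3), Seq.empty[Int])
    println(flatten(nested)) // List(1, 2, 3)
  }
}
```

The implicit-view parameter is what makes the call fail to compile for element types that are not collection-like, mirroring Scala's own `Iterator.flatten`.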
[GitHub] spark issue #16292: [SPARK-18875][SPARKR][DOCS] Fix R API doc generation by ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16292 Merged build finished. Test PASSed.
[GitHub] spark issue #16292: [SPARK-18875][SPARKR][DOCS] Fix R API doc generation by ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16292 **[Test build #70179 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70179/testReport)** for PR 16292 at commit [`50a5c2e`](https://github.com/apache/spark/commit/50a5c2e51d1bc99f7237ca896ce406caa33cd9bc). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16263: [SPARK-18281][SQL][PySpark] Consumes the returned local ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16263 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70174/ Test FAILed.
[GitHub] spark issue #16263: [SPARK-18281][SQL][PySpark] Consumes the returned local ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16263 Merged build finished. Test FAILed.
[GitHub] spark issue #16263: [SPARK-18281][SQL][PySpark] Consumes the returned local ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16263 **[Test build #70174 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70174/testReport)** for PR 16263 at commit [`003da89`](https://github.com/apache/spark/commit/003da89d22f04cae62de1a3ed38d105d42fe0051). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16240: [SPARK-16792][SQL] Dataset containing a Case Clas...
Github user aray commented on a diff in the pull request: https://github.com/apache/spark/pull/16240#discussion_r92546082 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLImplicits.scala --- @@ -100,31 +100,76 @@ abstract class SQLImplicits { // Seqs /** @since 1.6.1 */ - implicit def newIntSeqEncoder: Encoder[Seq[Int]] = ExpressionEncoder() + implicit def newIntSeqEncoder[T <: Seq[Int] : TypeTag]: Encoder[T] = ExpressionEncoder() /** @since 1.6.1 */ - implicit def newLongSeqEncoder: Encoder[Seq[Long]] = ExpressionEncoder() + implicit def newLongSeqEncoder[T <: Seq[Long] : TypeTag]: Encoder[T] = ExpressionEncoder() /** @since 1.6.1 */ - implicit def newDoubleSeqEncoder: Encoder[Seq[Double]] = ExpressionEncoder() + implicit def newDoubleSeqEncoder[T <: Seq[Double] : TypeTag]: Encoder[T] = ExpressionEncoder() /** @since 1.6.1 */ - implicit def newFloatSeqEncoder: Encoder[Seq[Float]] = ExpressionEncoder() + implicit def newFloatSeqEncoder[T <: Seq[Float] : TypeTag]: Encoder[T] = ExpressionEncoder() /** @since 1.6.1 */ - implicit def newByteSeqEncoder: Encoder[Seq[Byte]] = ExpressionEncoder() + implicit def newByteSeqEncoder[T <: Seq[Byte] : TypeTag]: Encoder[T] = ExpressionEncoder() /** @since 1.6.1 */ - implicit def newShortSeqEncoder: Encoder[Seq[Short]] = ExpressionEncoder() + implicit def newShortSeqEncoder[T <: Seq[Short] : TypeTag]: Encoder[T] = ExpressionEncoder() /** @since 1.6.1 */ - implicit def newBooleanSeqEncoder: Encoder[Seq[Boolean]] = ExpressionEncoder() + implicit def newBooleanSeqEncoder[T <: Seq[Boolean] : TypeTag]: Encoder[T] = ExpressionEncoder() /** @since 1.6.1 */ - implicit def newStringSeqEncoder: Encoder[Seq[String]] = ExpressionEncoder() + implicit def newStringSeqEncoder[T <: Seq[String] : TypeTag]: Encoder[T] = ExpressionEncoder() /** @since 1.6.1 */ - implicit def newProductSeqEncoder[A <: Product : TypeTag]: Encoder[Seq[A]] = ExpressionEncoder() + implicit def newProductSeqEncoder[A <: Product : TypeTag, T <: Seq[A] : TypeTag]: 
Encoder[T] = +ExpressionEncoder() + + // Seqs with product (List) disambiguation + + /** @since 2.2.0 */ + implicit def newIntSeqWithProductEncoder[T <: Seq[Int] with Product : TypeTag]: Encoder[T] = +newIntSeqEncoder + + /** @since 2.2.0 */ + implicit def newLongSeqWithProductEncoder[T <: Seq[Long] with Product : TypeTag]: Encoder[T] = +newLongSeqEncoder + + /** @since 2.2.0 */ + implicit def newDoubleListEncoder[T <: Seq[Double] with Product : TypeTag]: Encoder[T] = --- End diff -- Should this be `newDoubleSeqWithProductEncoder`?
[GitHub] spark issue #16291: [SPARK-18838][WIP] Use separate executor service for eac...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16291 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70180/ Test FAILed.
[GitHub] spark pull request #16289: [SPARK-18870] Disallowed Distinct Aggregations on...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16289#discussion_r92545873 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala --- @@ -95,6 +96,15 @@ object UnsupportedOperationChecker { // Operations that cannot exists anywhere in a streaming plan subPlan match { +case Aggregate(_, aggregateExpressions, child) => + val distinctAggExprs = aggregateExpressions.flatMap { expr => +expr.collect { case ae: AggregateExpression if ae.isDistinct => ae } + } + throwErrorIf( +child.isStreaming && distinctAggExprs.nonEmpty, +"Distinct aggregations are not supported on streaming DataFrames/Datasets, unless" + --- End diff -- you need an extra space here. I'd also recommend pointing users to approximate distinct counts.
[GitHub] spark issue #16291: [SPARK-18838][WIP] Use separate executor service for eac...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16291 **[Test build #70180 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70180/testReport)** for PR 16291 at commit [`defd536`](https://github.com/apache/spark/commit/defd536bd3a1692156a3bcc82526ffbea01ca702). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class StreamingQueryListenerBus(val sparkListenerBus: LiveListenerBus)`
[GitHub] spark issue #16291: [SPARK-18838][WIP] Use separate executor service for eac...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16291 Merged build finished. Test FAILed.
[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/16030 @liancheng As for `DataFrameReader.dataSchema()` and `DataFrameReader.partitionSchema()`, did you mean we should add new interfaces there for users to set user-defined data and partition schemas, respectively?
[GitHub] spark issue #16291: [SPARK-18838] Use separate executor service for each eve...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16291 **[Test build #70180 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70180/testReport)** for PR 16291 at commit [`defd536`](https://github.com/apache/spark/commit/defd536bd3a1692156a3bcc82526ffbea01ca702).
[GitHub] spark pull request #16288: [SPARK-18869][SQL] Add TreeNode.p that returns Ba...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16288
[GitHub] spark issue #16228: [WIP] [SPARK-17076] [SQL] Cardinality estimation for joi...
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/16228 @Tagar We can always find extreme cases to which these formulas can't apply. In my opinion, it's better to over-estimate than to under-estimate, since under-estimation can lead to OOM problems, e.g. broadcasting a very large result. If A is a big table and B is a small one, and every A.k has a match in B (a common case for PK and FK), then > cardinality(A) + cardinality(B) - inner_join_cardinality(table_A, table_B) becomes card(B), which is dramatically smaller than the real outer join cardinality. Even worse, it can be negative if all A.k and B.k have the same value, since the inner join part becomes a Cartesian product. This formula, > cardinality = MAX(card(A) + card(B), innerCard(AB)) although it over-estimates sometimes, is still obviously better than the original one in Spark: card(A) * card(B).
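A worked numeric sketch of the formulas being compared in this thread (assumption: the standard uniform-distribution estimate for the inner equi-join, i.e. card(A) * card(B) / max distinct key count):

```scala
// Sketch of the outer-join cardinality estimates from the discussion above.
object JoinCardSketch {
  // Textbook inner equi-join estimate under a uniformity assumption.
  def innerJoinCard(cardA: Long, cardB: Long, maxDistinctK: Long): Long =
    cardA * cardB / maxDistinctK

  // The proposed bound: a full outer join yields at least
  // max(card(A) + card(B), inner join estimate).
  def fullOuterJoinCard(cardA: Long, cardB: Long, inner: Long): Long =
    math.max(cardA + cardB, inner)

  def main(args: Array[String]): Unit = {
    val (cardA, cardB, distinctK) = (1000000L, 100L, 100L)
    val inner = innerJoinCard(cardA, cardB, distinctK)
    println(inner)                                  // 1000000
    // The subtraction-based formula collapses to card(B), a huge underestimate:
    println(cardA + cardB - inner)                  // 100
    println(fullOuterJoinCard(cardA, cardB, inner)) // 1000100
  }
}
```

This reproduces the PK/FK case from the comment: with a million-row A and a 100-row B where every A.k matches, the subtraction formula estimates only card(B) = 100 rows, while the MAX formula stays at or above card(A) + card(B).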
[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13909#discussion_r92545598 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- @@ -56,33 +58,93 @@ case class CreateArray(children: Seq[Expression]) extends Expression { } override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val arrayClass = classOf[GenericArrayData].getName -val values = ctx.freshName("values") -ctx.addMutableState("Object[]", values, s"this.$values = null;") +val array = ctx.freshName("array") -ev.copy(code = s""" - this.$values = new Object[${children.size}];""" + +val et = dataType.elementType +val evals = children.map(e => e.genCode(ctx)) +val isPrimitiveArray = ctx.isPrimitiveType(et) +val primitiveTypeName = if (isPrimitiveArray) ctx.primitiveTypeName(et) else "" +val (preprocess, arrayData, arrayWriter) = + GenArrayData.getCodeArrayData(ctx, et, children.size, isPrimitiveArray, array) + +ev.copy(code = + preprocess + ctx.splitExpressions( ctx.INPUT_ROW, -children.zipWithIndex.map { case (e, i) => - val eval = e.genCode(ctx) - eval.code + s""" -if (${eval.isNull}) { - $values[$i] = null; +evals.zipWithIndex.map { case (eval, i) => + eval.code + +(if (isPrimitiveArray) { + (if (!children(i).nullable) { +s"\n$arrayWriter.write($i, ${eval.value});" + } else { +s""" +if (${eval.isNull}) { + $arrayWriter.setNull$primitiveTypeName($i); +} else { + $arrayWriter.write($i, ${eval.value}); +} + """ + }) } else { - $values[$i] = ${eval.value}; -} - """ + s""" + if (${eval.isNull}) { +$array[$i] = null; + } else { +$array[$i] = ${eval.value}; + } + """ +}) }) + - s""" -final ArrayData ${ev.value} = new $arrayClass($values); -this.$values = null; - """, isNull = "false") + s"\nfinal ArrayData ${ev.value} = $arrayData;\n", + isNull = "false") } override def prettyName: String = "array" } +private [sql] object GenArrayData { + // This function returns Java code pieces based on DataType 
and isPrimitive + // for allocation of ArrayData class + def getCodeArrayData( + ctx: CodegenContext, + dt: DataType, + size: Int, + isPrimitive : Boolean, + array: String): (String, String, String) = { +if (!isPrimitive) { + val arrayClass = classOf[GenericArrayData].getName + ctx.addMutableState("Object[]", array, +s"this.$array = new Object[${size}];") + ("", s"new $arrayClass($array)", null) +} else { + val holder = ctx.freshName("holder") + val arrayWriter = ctx.freshName("createArrayWriter") + val unsafeArrayClass = classOf[UnsafeArrayData].getName + val holderClass = classOf[BufferHolder].getName + val arrayWriterClass = classOf[UnsafeArrayWriter].getName + ctx.addMutableState(unsafeArrayClass, array, "") + ctx.addMutableState(holderClass, holder, "") + ctx.addMutableState(arrayWriterClass, arrayWriter, "") + val baseOffset = Platform.BYTE_ARRAY_OFFSET + val unsafeArraySizeInBytes = +UnsafeArrayData.calculateHeaderPortionInBytes(size) + +ByteArrayMethods.roundNumberOfBytesToNearestWord(dt.defaultSize * size) + + (s""" +$array = new $unsafeArrayClass(); +$holder = new $holderClass($unsafeArraySizeInBytes); +$arrayWriter = new $arrayWriterClass(); --- End diff -- @cloud-fan what do you think?
[GitHub] spark pull request #16232: [SPARK-18800][SQL] Fix UnsafeKVExternalSorter by ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16232#discussion_r92545564 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeKVExternalSorter.java --- @@ -96,13 +98,35 @@ public UnsafeKVExternalSorter( numElementsForSpillThreshold, canUseRadixSort); } else { - // The array will be used to do in-place sort, which require half of the space to be empty. - assert(map.numKeys() <= map.getArray().size() / 2); + // Because we insert the number of values in the map into `UnsafeInMemorySorter`, if + // the number of values is more than the number of keys, and the array in the map is + // not big enough to do an in-place sort, we must acquire a new array. + // Inserting a record into `UnsafeInMemorySorter` consumes two spaces in the array. + // We must keep half of the array empty. In total there are `map.numValues()` records + // to be inserted. + LongArray sortArray = null; + boolean useAllocatedArray = false; + if (map.numValues() > map.numKeys() && map.numValues() * 2 > map.getArray().size() / 2) { --- End diff -- oh. I added the comment to explain the correct number. So I keep the multiplication and division to make it clear and match the explanation. Do you prefer to simplify it?
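The capacity condition being debated can be modeled standalone (a sketch under the accounting stated in the quoted comment, not the real `UnsafeKVExternalSorter` code): each inserted record consumes two slots in the pointer array, and the in-place sort needs half of the array kept free.

```scala
// Standalone model of the capacity check in the quoted diff above.
object SortArraySketch {
  def needsNewArray(numKeys: Long, numValues: Long, arraySize: Long): Boolean =
    numValues > numKeys &&            // duplicate keys: more records than map entries
      numValues * 2 > arraySize / 2   // 2 slots per record must fit in half the array

  def main(args: Array[String]): Unit = {
    println(needsNewArray(10, 30, 100)) // true: 60 slots needed, only 50 usable
    println(needsNewArray(10, 10, 100)) // false: no duplicate keys
  }
}
```

Keeping the explicit `* 2` and `/ 2` (rather than simplifying to `numValues * 4 > arraySize`) preserves the one-to-one mapping between the code and the two-slots-per-record / half-array-free explanation.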
[GitHub] spark issue #16288: [SPARK-18869][SQL] Add TreeNode.p that returns BaseType
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/16288 lgtm
[GitHub] spark pull request #16280: [SPARK-18856][SQL] non-empty partitioned table sh...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16280
[GitHub] spark issue #16292: [SPARK-18875][SPARKR][DOCS] Fix R API doc generation by ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16292 cc @shivaram @felixcheung
[GitHub] spark issue #16280: [SPARK-18856][SQL] non-empty partitioned table should no...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16280 Merging in master/branch-2.1.
[GitHub] spark issue #16292: [SPARK-18875][SPARKR][DOCS] Fix R API doc generation by ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16292 **[Test build #70179 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70179/testReport)** for PR 16292 at commit [`50a5c2e`](https://github.com/apache/spark/commit/50a5c2e51d1bc99f7237ca896ce406caa33cd9bc).
[GitHub] spark pull request #16292: [SPARK-18875][SPARKR][DOCS] Fix R API doc generat...
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/16292 [SPARK-18875][SPARKR][DOCS] Fix R API doc generation by adding `DESCRIPTION` file ## What changes were proposed in this pull request? Since Apache Spark 1.4.0, the R API documentation page has had a broken link to the `DESCRIPTION` file because the Jekyll plugin script doesn't copy the file. This PR aims to fix that. - Official Latest Website: http://spark.apache.org/docs/latest/api/R/index.html - Apache Spark 2.1.0-rc2: http://people.apache.org/~pwendell/spark-releases/spark-2.1.0-rc2-docs/api/R/index.html ## How was this patch tested? Manual. ```bash cd docs SKIP_SCALADOC=1 jekyll build ``` You can merge this pull request into a Git repository by running: $ git pull https://github.com/dongjoon-hyun/spark SPARK-18875 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16292.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16292 commit 50a5c2e51d1bc99f7237ca896ce406caa33cd9bc Author: Dongjoon Hyun Date: 2016-12-15T04:38:21Z [SPARK-18875][SPARKR][DOCS] Fix R API doc generation by adding `DESCRIPTION` file
[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16030 **[Test build #70178 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70178/testReport)** for PR 16030 at commit [`dc54b69`](https://github.com/apache/spark/commit/dc54b699c3c93f11eaa93063b3b950e04c614a56).
[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/16030 @cloud-fan Does the latest fix address what you suggested?
[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16030 **[Test build #70177 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70177/testReport)** for PR 16030 at commit [`5b23b89`](https://github.com/apache/spark/commit/5b23b89a4a0b9b16f16c56d03fc226b8eb53c92f).
[GitHub] spark issue #16263: [SPARK-18281][SQL][PySpark] Consumes the returned local ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16263 **[Test build #70176 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70176/testReport)** for PR 16263 at commit [`67882d2`](https://github.com/apache/spark/commit/67882d2d4ebfad955b07cf0020c726ea5a153864).
[GitHub] spark pull request #16030: [SPARK-18108][SQL] Fix a bug to fail partition sc...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/16030#discussion_r92543342 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetPartitionDiscoverySuite.scala --- @@ -969,4 +969,17 @@ class ParquetPartitionDiscoverySuite extends QueryTest with ParquetTest with Sha )) } } + + test("SPARK-18108 Partition discovery fails with explicitly written long partitions") { --- End diff -- yea, thanks. I'm now working on this and I'll update soon.
[GitHub] spark pull request #16030: [SPARK-18108][SQL] Fix a bug to fail partition sc...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16030#discussion_r92542776 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetPartitionDiscoverySuite.scala --- @@ -969,4 +969,17 @@ class ParquetPartitionDiscoverySuite extends QueryTest with ParquetTest with Sha )) } } + + test("SPARK-18108 Partition discovery fails with explicitly written long partitions") { --- End diff -- I think it's not `Partition discovery fails`, but `parquet reader fails`
[GitHub] spark issue #16280: [SPARK-18856][SQL] non-empty partitioned table should no...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16280 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70167/ Test PASSed.
[GitHub] spark issue #16280: [SPARK-18856][SQL] non-empty partitioned table should no...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16280 Merged build finished. Test PASSed.
[GitHub] spark issue #16280: [SPARK-18856][SQL] non-empty partitioned table should no...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16280 **[Test build #70167 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70167/testReport)** for PR 16280 at commit [`1628b29`](https://github.com/apache/spark/commit/1628b29f90eb97f0d951c3850518ce6bd9b49d2c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16228: [WIP] [SPARK-17076] [SQL] Cardinality estimation for joi...
Github user Tagar commented on the issue: https://github.com/apache/spark/pull/16228 @wzhfy, it's easier to check the validity of these kinds of expressions when you look at extreme cases. Your formula for full outer join cardinality, > cardinality = MAX(card(A) + card(B), innerCard(AB)) in one of the extreme cases, when set(A) and set(B) are the same set, would give a cardinality twice the actual one. Whereas > full_outer_join_cardinality(table_A, table_B) = cardinality(A) + cardinality(B) - inner_join_cardinality(table_A, table_B) produces the correct result. ps. I find this visualization http://www.radacad.com/wp-content/uploads/2015/07/joins.jpg very helpful. https://en.wikipedia.org/wiki/Inclusion%E2%80%93exclusion_principle |A ∪ B| = |A| + |B| - |A ∩ B| Hope this helps. Thanks!
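The extreme case Tagar describes can be checked directly with inclusion-exclusion. The sketch below is a simplified, hypothetical model (not Spark's estimator): it treats each table as a set of distinct join keys, so the full-outer-join row count is |A| + |B| − |A ∩ B|.

```java
import java.util.HashSet;
import java.util.Set;

public class OuterJoinCardinality {
    // Simplified model of full-outer-join cardinality over distinct join
    // keys (hypothetical helper, not Spark's actual estimation code):
    // |A union B| = |A| + |B| - |A intersect B|.
    static int fullOuterCardinality(Set<Integer> a, Set<Integer> b) {
        Set<Integer> inner = new HashSet<>(a);
        inner.retainAll(b);  // inner-join keys = A intersect B
        return a.size() + b.size() - inner.size();
    }

    public static void main(String[] args) {
        // Extreme case from the comment: A and B are the same set.
        Set<Integer> a = new HashSet<>(Set.of(1, 2, 3));
        Set<Integer> b = new HashSet<>(Set.of(1, 2, 3));
        // MAX(card(A) + card(B), innerCard) would estimate 6 here,
        // twice the true cardinality of 3.
        System.out.println(fullOuterCardinality(a, b));  // 3
    }
}
```

With disjoint key sets the inner-join term is zero and the formula degrades gracefully to |A| + |B|, so it covers both extremes that the MAX formula does not.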
[GitHub] spark issue #16288: [SPARK-18869][SQL] Add TreeNode.p that returns BaseType
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16288 Merged build finished. Test PASSed.
[GitHub] spark issue #16288: [SPARK-18869][SQL] Add TreeNode.p that returns BaseType
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16288 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70164/ Test PASSed.
[GitHub] spark issue #16288: [SPARK-18869][SQL] Add TreeNode.p that returns BaseType
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16288 **[Test build #70164 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70164/testReport)** for PR 16288 at commit [`f498b4a`](https://github.com/apache/spark/commit/f498b4a0f2cbc9d3f0c038b73053c8018a6d9984). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16289: [SPARK-18870] Disallowed Distinct Aggregations on Stream...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16289 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70170/ Test FAILed.
[GitHub] spark issue #16289: [SPARK-18870] Disallowed Distinct Aggregations on Stream...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16289 Merged build finished. Test FAILed.
[GitHub] spark issue #16282: [DO_NOT_MERGE]Try to fix kafka
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16282 **[Test build #70153 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70153/testReport)** for PR 16282 at commit [`c4e6962`](https://github.com/apache/spark/commit/c4e6962dbf22c2ec7658f95fd1be069628860855). * This patch **fails from timeout after a configured wait of `250m`**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16282: [DO_NOT_MERGE]Try to fix kafka
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16282 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70153/ Test FAILed.
[GitHub] spark issue #16289: [SPARK-18870] Disallowed Distinct Aggregations on Stream...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16289 **[Test build #70170 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70170/testReport)** for PR 16289 at commit [`dd2b2c8`](https://github.com/apache/spark/commit/dd2b2c8c1a11b0b9d9fa70dc146a36c65d94530c). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16282: [DO_NOT_MERGE]Try to fix kafka
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16282 Merged build finished. Test FAILed.
[GitHub] spark issue #16263: [SPARK-18281][SQL][PySpark] Consumes the returned local ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16263 **[Test build #70175 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70175/testReport)** for PR 16263 at commit [`a2d071d`](https://github.com/apache/spark/commit/a2d071d6f5ab916f9e39b5ccb50e4fb11cba183d).
[GitHub] spark issue #16263: [SPARK-18281][SQL][PySpark] Consumes the returned local ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16263 **[Test build #70174 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70174/testReport)** for PR 16263 at commit [`003da89`](https://github.com/apache/spark/commit/003da89d22f04cae62de1a3ed38d105d42fe0051).
[GitHub] spark issue #16290: [SPARK-18817] [SPARKR] [SQL] Set default warehouse dir t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16290 Merged build finished. Test FAILed.
[GitHub] spark issue #16290: [SPARK-18817] [SPARKR] [SQL] Set default warehouse dir t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16290 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70171/ Test FAILed.
[GitHub] spark issue #16288: [SPARK-18869][SQL] Add TreeNode.p that returns BaseType
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16288 Merged build finished. Test FAILed.
[GitHub] spark issue #16288: [SPARK-18869][SQL] Add TreeNode.p that returns BaseType
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16288 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70166/ Test FAILed.
[GitHub] spark issue #16290: [SPARK-18817] [SPARKR] [SQL] Set default warehouse dir t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16290 **[Test build #70171 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70171/testReport)** for PR 16290 at commit [`2583410`](https://github.com/apache/spark/commit/25834109588e8e545deafb1da162958766a057e2). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16288: [SPARK-18869][SQL] Add TreeNode.p that returns BaseType
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16288 **[Test build #70166 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70166/testReport)** for PR 16288 at commit [`62acdb6`](https://github.com/apache/spark/commit/62acdb6ecbf8c645be2cecbd7202819f74438efe). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16281: [SPARK-13127][SQL] Update Parquet to 1.9.0
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16281 Thank you for the review, @rxin. Forking may give more controllability, but it soon becomes invisible (in terms of documentation). According to a recent mail on the Spark dev list, only committers seem to know the latest repository location of the Spark Hive fork. I also want to contribute some work there, but it's difficult for me to find out how. Every Apache project (including Apache Spark) has some bugs in every release. I don't claim Parquet 1.9.0 is bug-free, but the Parquet community exists to fix those, doesn't it? BTW, @rxin and @srowen, to reduce the risk: - Do you want to add more Spark-side test cases here? - Or, would you prefer to skip 1.9.0 and go to 1.10 directly?
[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16030 Merged build finished. Test FAILed.
[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16030 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70169/ Test FAILed.
[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16030 **[Test build #70169 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70169/testReport)** for PR 16030 at commit [`248833d`](https://github.com/apache/spark/commit/248833d0b689e243467c9901d40c3c53a63b284a). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16291: [SPARK-18838] Use separate executor service for each eve...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16291 Merged build finished. Test FAILed.
[GitHub] spark issue #16291: [SPARK-18838] Use separate executor service for each eve...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16291 **[Test build #70173 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70173/testReport)** for PR 16291 at commit [`ed79578`](https://github.com/apache/spark/commit/ed795783ddc413bedadcadc012b94041c82ac71f). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class StreamingQueryListenerBus(val sparkListenerBus: LiveListenerBus)`
[GitHub] spark issue #16291: [SPARK-18838] Use separate executor service for each eve...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16291 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70173/ Test FAILed.
[GitHub] spark issue #16291: [SPARK-18838] Use separate executor service for each eve...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16291

**[Test build #70173 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70173/testReport)** for PR 16291 at commit [`ed79578`](https://github.com/apache/spark/commit/ed795783ddc413bedadcadc012b94041c82ac71f).
[GitHub] spark issue #16287: [SPARK-18868][FLAKY-TEST] Deflake StreamingQueryListener...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16287

Merged build finished. Test FAILed.
[GitHub] spark issue #16287: [SPARK-18868][FLAKY-TEST] Deflake StreamingQueryListener...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16287

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70163/
[GitHub] spark issue #16287: [SPARK-18868][FLAKY-TEST] Deflake StreamingQueryListener...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16287

**[Test build #70163 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70163/testReport)** for PR 16287 at commit [`cedaafd`](https://github.com/apache/spark/commit/cedaafdff23ad99e4be06077c5b5cc3bee6ebf07).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16291: [SPARK-18838] Use separate executor service for each eve...
Github user sitalkedia commented on the issue: https://github.com/apache/spark/pull/16291

cc @zsxwing. Please note that the PR is incomplete and there are some test failures; I just wanted some initial feedback on the design before investing more time in it.
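The design under discussion, giving each event listener its own executor service so that one slow listener cannot stall event delivery to the others, can be sketched roughly as follows. This is a minimal illustration of the idea, not Spark's actual `LiveListenerBus`/`StreamingQueryListenerBus` implementation; all names here are hypothetical:

```scala
import java.util.concurrent.{ExecutorService, Executors, TimeUnit}
import scala.collection.concurrent.TrieMap

// Hypothetical sketch: every registered listener is backed by its own
// single-thread executor, so events are delivered to each listener
// independently and a slow listener only delays its own queue.
trait EventListener {
  def onEvent(event: String): Unit
}

class PerListenerEventBus {
  private val executors = TrieMap.empty[EventListener, ExecutorService]

  def addListener(listener: EventListener): Unit =
    executors.putIfAbsent(listener, Executors.newSingleThreadExecutor())

  def post(event: String): Unit =
    // submit() returns immediately; each listener processes the event
    // on its own dedicated thread.
    executors.foreach { case (listener, exec) =>
      exec.submit(new Runnable {
        override def run(): Unit = listener.onEvent(event)
      })
    }

  def stop(): Unit =
    executors.values.foreach { exec =>
      // shutdown() still lets already-queued events drain.
      exec.shutdown()
      exec.awaitTermination(10, TimeUnit.SECONDS)
    }
}
```

With a single shared dispatch thread, a listener that sleeps in `onEvent` would delay every other listener; here it only backs up its own queue.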
[GitHub] spark issue #16291: [SPARK-18838] Use separate executor service for each eve...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16291

Merged build finished. Test FAILed.
[GitHub] spark issue #16291: [SPARK-18838] Use separate executor service for each eve...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16291

**[Test build #70172 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70172/testReport)** for PR 16291 at commit [`b4af82f`](https://github.com/apache/spark/commit/b4af82f0a95487cd099432d17864b3cfac2780bb).

* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class StreamingQueryListenerBus(val sparkListenerBus: LiveListenerBus)`