[GitHub] spark issue #13593: [SPARK-15864] [SQL] Fix Inconsistent Behaviors when Unca...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/13593
@rxin @liancheng I see. The existing Dataset API `sparkSession.catalog.uncacheTable("non-cachedTable")` issues an error when asked to uncache a table that is not cached. To ensure that SQL statements and the Dataset API have the same behavior, we still need to change one of them, right? I will follow what @rxin said: no-op if the table is already uncached.
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
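The two behaviors discussed in the comment above can be illustrated with a minimal sketch. It does not use Spark at all: `CacheRegistry`, `uncacheTableStrict`, and the exception type are hypothetical names chosen for illustration, standing in for whatever the SQL command and the Dataset API actually call internally.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a table cache; this is NOT Spark's CacheManager.
final class CacheRegistry {
  private val cached = mutable.Set.empty[String]

  def cacheTable(name: String): Unit = cached += name

  def isCached(name: String): Boolean = cached.contains(name)

  // Strict variant: raises an error when the table is not cached,
  // which is the behavior the comment attributes to the Dataset API.
  def uncacheTableStrict(name: String): Unit = {
    if (!cached.remove(name)) {
      throw new IllegalArgumentException(s"Table $name is not cached")
    }
  }

  // No-op variant: the behavior the thread settles on.
  // Returns true only when an entry was actually removed.
  def uncacheTable(name: String): Boolean = cached.remove(name)
}

object UncacheDemo {
  def main(args: Array[String]): Unit = {
    val registry = new CacheRegistry
    registry.cacheTable("t1")
    registry.uncacheTable("t1") // removes the entry
    registry.uncacheTable("t1") // second call silently does nothing
    println(registry.isCached("t1"))
  }
}
```

With the no-op variant, callers can uncache unconditionally without first checking whether the table is cached, and the SQL path and the API path cannot disagree about the error case.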
[GitHub] spark issue #13137: [SPARK-15247][SQL] Set the default number of partitions ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13137 Merged build finished. Test PASSed.
[GitHub] spark issue #13137: [SPARK-15247][SQL] Set the default number of partitions ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13137 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60336/ Test PASSed.
[GitHub] spark issue #13137: [SPARK-15247][SQL] Set the default number of partitions ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13137
**[Test build #60336 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60336/consoleFull)** for PR 13137 at commit [`4f3ee3c`](https://github.com/apache/spark/commit/4f3ee3cccba78911530767feef99a07794428b73).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #13444: [SPARK-15530][SQL] Set #parallelism for file listing in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13444 Merged build finished. Test PASSed.
[GitHub] spark issue #13444: [SPARK-15530][SQL] Set #parallelism for file listing in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13444 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60335/ Test PASSed.
[GitHub] spark issue #13444: [SPARK-15530][SQL] Set #parallelism for file listing in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13444
**[Test build #60335 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60335/consoleFull)** for PR 13444 at commit [`f392f91`](https://github.com/apache/spark/commit/f392f915ade9fb2863e421891981cc278a887bdb).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #13572: [SPARK-15862] [SQL] Better Error Message When Hav...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/13572#discussion_r66701399

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/cache.scala ---
    @@ -17,30 +17,30 @@
     package org.apache.spark.sql.execution.command
     
    -import org.apache.spark.sql.{Dataset, Row, SparkSession}
    +import org.apache.spark.sql.{AnalysisException, Dataset, Row, SparkSession}
    +import org.apache.spark.sql.catalyst.TableIdentifier
     import org.apache.spark.sql.catalyst.expressions.Attribute
     import org.apache.spark.sql.catalyst.plans.QueryPlan
     import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
     
     case class CacheTableCommand(
    -  tableName: String,
    -  plan: Option[LogicalPlan],
    -  isLazy: Boolean)
    -  extends RunnableCommand {
    +    tableIdent: TableIdentifier,
    +    plan: Option[LogicalPlan],
    +    isLazy: Boolean) extends RunnableCommand {
    --- End diff --

Just added. : ) Please check whether the check is sufficient. https://github.com/apache/spark/pull/13572/files#diff-bc55b5f76add105ec32ae4107075b278R30 Creating a temp table with a qualified name such as `default`.`tab` is still not allowed, so I did not change that part. Let me know if there is anything else I need to change. Thanks!
[GitHub] spark issue #13613: [SPARK-15889][SQL][STREAMING] Add a unique id to Continu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13613 Merged build finished. Test PASSed.
[GitHub] spark issue #13613: [SPARK-15889][SQL][STREAMING] Add a unique id to Continu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13613 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60333/ Test PASSed.
[GitHub] spark issue #13571: [SPARK-15369][WIP][RFC][PySpark][SQL] Expose potential t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13571 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60330/ Test PASSed.
[GitHub] spark issue #13613: [SPARK-15889][SQL][STREAMING] Add a unique id to Continu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13613
**[Test build #60333 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60333/consoleFull)** for PR 13613 at commit [`5f74d95`](https://github.com/apache/spark/commit/5f74d9529c59e28341906429ba27450f91ffbcc4).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #13571: [SPARK-15369][WIP][RFC][PySpark][SQL] Expose potential t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13571 Merged build finished. Test PASSed.
[GitHub] spark issue #13571: [SPARK-15369][WIP][RFC][PySpark][SQL] Expose potential t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13571
**[Test build #60330 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60330/consoleFull)** for PR 13571 at commit [`84e1bf1`](https://github.com/apache/spark/commit/84e1bf14e51f98c13b2177d6c04c0a02e54982f7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class CollectionAccumulator[T] extends AccumulatorV2[T, java.util.List[T]]`
  * `class LibSVMFileFormat extends TextBasedFileFormat with DataSourceRegister`
  * `abstract class ForeachWriter[T] extends Serializable`
  * ` * case class Person(name: String, age: Long)`
  * `abstract class SparkStrategy extends GenericStrategy[SparkPlan]`
  * `class CSVFileFormat extends TextBasedFileFormat with DataSourceRegister`
  * `abstract class TextBasedFileFormat extends FileFormat`
  * `class JsonFileFormat extends TextBasedFileFormat with DataSourceRegister`
  * `class TextFileFormat extends TextBasedFileFormat with DataSourceRegister`
  * `class ForeachSink[T : Encoder](writer: ForeachWriter[T]) extends Sink with Serializable`
[GitHub] spark issue #13616: [SPARK-15585][SQL] Add doc for turning off quotations
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13616 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60334/ Test PASSed.
[GitHub] spark issue #13616: [SPARK-15585][SQL] Add doc for turning off quotations
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13616 Merged build finished. Test PASSed.
[GitHub] spark issue #13616: [SPARK-15585][SQL] Add doc for turning off quotations
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13616
**[Test build #60334 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60334/consoleFull)** for PR 13616 at commit [`edb0395`](https://github.com/apache/spark/commit/edb03956308bb78e09330587c6fbf6ee1ab53a71).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #13596: [SPARK-15870][SQL] DataFrame can't execute after uncache...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13596 **[Test build #60337 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60337/consoleFull)** for PR 13596 at commit [`cf4b6d8`](https://github.com/apache/spark/commit/cf4b6d89657434dc7cc0cda6f84fedeeb2578a7b).
[GitHub] spark issue #13572: [SPARK-15862] [SQL] Better Error Message When Having Dat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13572 **[Test build #60338 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60338/consoleFull)** for PR 13572 at commit [`b22c44a`](https://github.com/apache/spark/commit/b22c44ab232fe712547cd6dd6c3180fa9c84d2cf).
[GitHub] spark issue #13596: [SPARK-15870][SQL] DataFrame can't execute after uncache...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/13596 @cloud-fan I modified the test. Please take a look at it again.
[GitHub] spark pull request #13596: [SPARK-15870][SQL] DataFrame can't execute after ...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/13596#discussion_r66700916

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
    @@ -321,7 +321,8 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
         assert(spark.sharedState.cacheManager.isEmpty)
       }
     
    -  test("Clear accumulators when uncacheTable to prevent memory leaking") {
    +  // This test would be flaky.
    +  ignore("Ensure accumulators to be cleared after GC when uncacheTable") {
    --- End diff --

Thank you for the pointer. Let me check it and I'll update the test.
[GitHub] spark issue #13595: [MINOR][SQL] Standardize 'continuous queries' to 'stream...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13595 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60329/ Test PASSed.
[GitHub] spark issue #13595: [MINOR][SQL] Standardize 'continuous queries' to 'stream...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13595 Merged build finished. Test PASSed.
[GitHub] spark issue #13595: [MINOR][SQL] Standardize 'continuous queries' to 'stream...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13595
**[Test build #60329 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60329/consoleFull)** for PR 13595 at commit [`097f2ca`](https://github.com/apache/spark/commit/097f2ca06614dbf1c8299cbd788829fbb32063f1).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class CollectionAccumulator[T] extends AccumulatorV2[T, java.util.List[T]]`
  * `class LibSVMFileFormat extends TextBasedFileFormat with DataSourceRegister`
  * `abstract class SparkStrategy extends GenericStrategy[SparkPlan]`
  * `class CSVFileFormat extends TextBasedFileFormat with DataSourceRegister`
  * `abstract class TextBasedFileFormat extends FileFormat`
  * `class JsonFileFormat extends TextBasedFileFormat with DataSourceRegister`
  * `class TextFileFormat extends TextBasedFileFormat with DataSourceRegister`
[GitHub] spark issue #13419: [SPARK-15678][SQL] Not use cache on appends and overwrit...
Github user sameeragarwal commented on the issue: https://github.com/apache/spark/pull/13419 I ended up creating a small design doc describing the problem and presenting 2 possible solutions at https://docs.google.com/document/d/1h5SzfC5UsvIrRpeLNDKSMKrKJvohkkccFlXo-GBAwQQ/edit?ts=574f717f#. Based on this, we decided in favor of option 2 (https://github.com/apache/spark/pull/13566) as it is a less intrusive change to the default behavior. I'm going to close this PR for now, but we may revisit this approach (i.e., option 1) for 2.1.
[GitHub] spark pull request #13419: [SPARK-15678][SQL] Not use cache on appends and o...
Github user sameeragarwal closed the pull request at: https://github.com/apache/spark/pull/13419
[GitHub] spark issue #12938: [SPARK-15162][SPARK-15164][PySpark][DOCS][ML] update som...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/12938 Merged build finished. Test FAILed.
[GitHub] spark pull request #13381: [SPARK-15608][ml][examples][doc] add examples and...
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/13381#discussion_r66700697

    --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/IsotonicRegressionExample.scala ---
    @@ -0,0 +1,62 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements. See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License. You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +// scalastyle:off println
    +package org.apache.spark.examples.ml
    +
    +// $example on$
    +import org.apache.spark.ml.regression.IsotonicRegression
    +// $example off$
    +import org.apache.spark.sql.SparkSession
    +
    +/**
    + * An example demonstrating Isotonic Regression.
    + * Run with
    + * {{{
    + * bin/run-example ml.IsotonicRegressionExample
    + * }}}
    + */
    +object IsotonicRegressionExample {
    +
    +  def main(args: Array[String]): Unit = {
    +
    +    // Creates a SparkSession.
    +    val spark = SparkSession
    +      .builder
    +      .appName(s"${this.getClass.getSimpleName}")
    +      .getOrCreate()
    +
    +    // $example on$
    +    // Loads data.
    +    val dataset = spark.read.format("libsvm")
    +      .load("data/mllib/sample_isotonic_regression_libsvm_data.txt")
    +
    +    // Trains an isotonic regression model.
    +    val ir = new IsotonicRegression()
    +    val model = ir.fit(dataset)
    +
    +    println(s"Boundaries in increasing order: ${model.boundaries}")
    +    println(s"Predictions associated with the boundaries: ${model.predictions}")
    +
    +    // Makes predictions.
    +    model.transform(dataset).show
    --- End diff --

@jkbradley Done.
[GitHub] spark issue #12938: [SPARK-15162][SPARK-15164][PySpark][DOCS][ML] update som...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/12938 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60332/ Test FAILed.
[GitHub] spark issue #13415: [SPARK-15676] [SQL] Disallow Column Names as Partition C...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/13415 Thank you! @andrewor14
[GitHub] spark issue #12938: [SPARK-15162][SPARK-15164][PySpark][DOCS][ML] update som...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12938
**[Test build #60332 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60332/consoleFull)** for PR 12938 at commit [`873f6c8`](https://github.com/apache/spark/commit/873f6c8656c9f07543e5907d6bde7bf0c582673d).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #13381: [SPARK-15608][ml][examples][doc] add examples and docume...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13381 Merged build finished. Test PASSed.
[GitHub] spark issue #13381: [SPARK-15608][ml][examples][doc] add examples and docume...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13381 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60331/ Test PASSed.
[GitHub] spark issue #13381: [SPARK-15608][ml][examples][doc] add examples and docume...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13381
**[Test build #60331 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60331/consoleFull)** for PR 13381 at commit [`2e46416`](https://github.com/apache/spark/commit/2e46416317356f8f8fa53457ba2318449a795218).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #13588: SPARK-15858: Fix calculating error by tree stack over fl...
Github user mhmoudr commented on the issue: https://github.com/apache/spark/pull/13588 This PR contains exactly the same fix but targets version 1.6, in case there is a plan to release 1.6.2 in the future. If that is not the case, let me know and I will close it.
[GitHub] spark pull request #13501: [SPARK-15759] [SQL] Fallback to non-codegen when ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13501
[GitHub] spark issue #13501: [SPARK-15759] [SQL] Fallback to non-codegen when fail to...
Github user davies commented on the issue: https://github.com/apache/spark/pull/13501 Merging this into master and 2.0, thanks!
[GitHub] spark issue #13137: [SPARK-15247][SQL] Set the default number of partitions ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13137 **[Test build #60336 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60336/consoleFull)** for PR 13137 at commit [`4f3ee3c`](https://github.com/apache/spark/commit/4f3ee3cccba78911530767feef99a07794428b73).
[GitHub] spark issue #13444: [SPARK-15530][SQL] Set #parallelism for file listing in ...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/13444 @yhuai okay, fixed. I also fixed #13137 in the same way.
[GitHub] spark issue #13444: [SPARK-15530][SQL] Set #parallelism for file listing in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13444 **[Test build #60335 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60335/consoleFull)** for PR 13444 at commit [`f392f91`](https://github.com/apache/spark/commit/f392f915ade9fb2863e421891981cc278a887bdb).
[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/13371 @liancheng Got it.
[GitHub] spark pull request #13596: [SPARK-15870][SQL] DataFrame can't execute after ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13596#discussion_r66700295
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
@@ -321,7 +321,8 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
     assert(spark.sharedState.cacheManager.isEmpty)
   }
-  test("Clear accumulators when uncacheTable to prevent memory leaking") {
+  // This test would be flaky.
+  ignore("Ensure accumulators to be cleared after GC when uncacheTable") {
--- End diff --
How about we attach a listener to `ContextCleaner` and watch for the `accumCleaned` event? An example is: https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/ContextCleanerSuite.scala#L406-L417
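The listener idea in the comment above can be sketched without Spark. This is a minimal model under stated assumptions: `CleanupListener`, `FakeCleaner`, and `awaitCleaned` are hypothetical names invented for illustration and are not Spark's `ContextCleaner` API. The point is only the shape of the test: register a callback, trigger the asynchronous cleanup, then block on a latch with a timeout instead of sleeping and hoping GC has run.

```scala
import java.util.concurrent.{CopyOnWriteArrayList, CountDownLatch, TimeUnit}
import scala.collection.mutable

object CleanerListenerSketch {
  // Hypothetical callback interface; a stand-in, not Spark's actual listener trait.
  trait CleanupListener {
    def accumCleaned(accId: Long): Unit
  }

  // Hypothetical cleaner that reports cleanup events from another thread,
  // mimicking how a GC-driven cleaner reports them asynchronously.
  class FakeCleaner {
    private val listeners = new CopyOnWriteArrayList[CleanupListener]()

    def attachListener(listener: CleanupListener): Unit = listeners.add(listener)

    def cleanAccumAsync(accId: Long): Thread = {
      val worker = new Thread(new Runnable {
        override def run(): Unit = {
          val it = listeners.iterator()
          while (it.hasNext) it.next().accumCleaned(accId)
        }
      })
      worker.start()
      worker
    }
  }

  // Attaches a listener, runs `trigger`, and waits until every expected id
  // has been reported. Returns false if the timeout expires first.
  def awaitCleaned(cleaner: FakeCleaner, expectedIds: Set[Long], timeoutMs: Long)
                  (trigger: => Unit): Boolean = {
    val latch = new CountDownLatch(expectedIds.size)
    val remaining = mutable.Set(expectedIds.toSeq: _*)
    cleaner.attachListener(new CleanupListener {
      override def accumCleaned(accId: Long): Unit = remaining.synchronized {
        // Count each expected id once, even if it is reported repeatedly.
        if (remaining.remove(accId)) latch.countDown()
      }
    })
    trigger
    latch.await(timeoutMs, TimeUnit.MILLISECONDS)
  }
}
```

In a real suite the trigger would be the uncache call followed by a GC request, and the assertion would be on the boolean returned by the wait, so a missed cleanup shows up as a timeout rather than as an intermittent failure.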
[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/13592#discussion_r66700281
--- Diff: docs/sql-programming-guide.md ---
@@ -517,24 +517,26 @@ types such as Sequences or Arrays. This RDD can be implicitly converted to a Dat
 registered as a table. Tables can be used in subsequent SQL statements.
 {% highlight scala %}
-// sc is an existing SparkContext.
-val sqlContext = new org.apache.spark.sql.SQLContext(sc)
+val spark: SparkSession // An existing SparkSession
 // this is used to implicitly convert an RDD to a DataFrame.
-import sqlContext.implicits._
+import spark.implicits._
 // Define the schema using a case class.
 // Note: Case classes in Scala 2.10 can support only up to 22 fields. To work around this limit,
 // you can use custom classes that implement the Product interface.
 case class Person(name: String, age: Int)
-// Create an RDD of Person objects and register it as a table.
-val people = sc.textFile("examples/src/main/resources/people.txt").map(_.split(",")).map(p => Person(p(0), p(1).trim.toInt)).toDF()
+// Create an RDD of Person objects and register it as a temporary view.
+val people = sc
+  .textFile("examples/src/main/resources/people.txt")
+  .map(_.split(","))
+  .map(p => Person(p(0), p(1).trim.toInt))
+  .toDF()
 people.createOrReplaceTempView("people")
--- End diff --
It seems better to switch the input data file to JSON format here and use `SparkSession.read.json('path/to/data.json')`. That way we don't need `SparkContext` and get a `DataFrame` directly, which simplifies the example code.
[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/13592#discussion_r66700277
--- Diff: docs/sql-programming-guide.md ---
@@ -1607,13 +1600,13 @@ a regular multi-line JSON file will most often fail.
 {% highlight r %}
 # sc is an existing SparkContext.
-sqlContext <- sparkRSQL.init(sc)
+spark <- sparkRSQL.init(sc)
--- End diff --
The R API is still experimental, and we haven't introduced `SparkSession` to SparkR yet.
[GitHub] spark issue #13616: [SPARK-15585][SQL] Add doc for turning off quotations
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13616 **[Test build #60334 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60334/consoleFull)** for PR 13616 at commit [`edb0395`](https://github.com/apache/spark/commit/edb03956308bb78e09330587c6fbf6ee1ab53a71).
[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13371 Reverted from master and branch-2.0. @viirya For the benchmark, there are two things:
1. The benchmark also counts Parquet file writing, so the real number should be much better than the posted one.
2. We should also benchmark cases where no filters are pushed down, to verify that this patch doesn't affect the normal code path.
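Both benchmark points come down to keeping setup work out of the timed region and running the same harness over each configuration. A minimal sketch in plain Scala follows; `timeExcludingSetup` is a hypothetical helper written for illustration, not part of Spark's benchmark utilities, and the setup step merely stands in for something like writing the Parquet input.

```scala
object BenchmarkSketch {
  // Runs `setup` once outside the timer, then times only `measured`.
  // Returns the last measured result and the elapsed nanoseconds per iteration.
  def timeExcludingSetup[S, R](setup: () => S, iterations: Int)
                              (measured: S => R): (R, Seq[Long]) = {
    require(iterations > 0, "iterations must be positive")
    val state = setup() // e.g. writing the input files; deliberately not timed
    var last: Option[R] = None
    val timings = (1 to iterations).map { _ =>
      val start = System.nanoTime()
      last = Some(measured(state)) // e.g. the scan under test; this is what is timed
      System.nanoTime() - start
    }
    (last.get, timings)
  }
}
```

Calling the helper once per configuration (filters pushed down, and no filters pushed down) on the same prepared input would give numbers that are comparable and that exclude the write cost.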
[GitHub] spark pull request #13566: [SPARK-15678] Add support to REFRESH data source ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13566
[GitHub] spark pull request #13596: [SPARK-15870][SQL] DataFrame can't execute after ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13596#discussion_r66700211
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
@@ -321,7 +321,8 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
     assert(spark.sharedState.cacheManager.isEmpty)
   }
-  test("Clear accumulators when uncacheTable to prevent memory leaking") {
+  // This test would be flaky.
+  ignore("Ensure accumulators to be cleared after GC when uncacheTable") {
--- End diff --
This is the only risky part of this PR. I'll think about how to test it deterministically.
[GitHub] spark pull request #13616: [SPARK-15585][SQL] Add doc for turning off quotat...
GitHub user maropu opened a pull request: https://github.com/apache/spark/pull/13616
[SPARK-15585][SQL] Add doc for turning off quotations
## What changes were proposed in this pull request?
This PR adds documentation for turning off quotations, because this behavior differs from `com.databricks.spark.csv`.
## How was this patch tested?
Checked the behavior of passing an empty string in the CSV options.
You can merge this pull request into a Git repository by running: $ git pull https://github.com/maropu/spark SPARK-15585-2
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13616.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13616
commit edb03956308bb78e09330587c6fbf6ee1ab53a71
Author: Takeshi YAMAMURO
Date: 2016-06-07T08:16:16Z
Add doc for turning off quotations
[GitHub] spark issue #13613: [SPARK-15889][SQL][STREAMING] Add a unique id to Continu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13613 **[Test build #60333 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60333/consoleFull)** for PR 13613 at commit [`5f74d95`](https://github.com/apache/spark/commit/5f74d9529c59e28341906429ba27450f91ffbcc4).
[GitHub] spark pull request #13596: [SPARK-15870][SQL] DataFrame can't execute after ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13596#discussion_r66700185
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala ---
@@ -105,7 +105,7 @@ private[sql] class CacheManager extends Logging {
     val planToCache = query.queryExecution.analyzed
     val dataIndex = cachedData.indexWhere(cd => planToCache.sameResult(cd.plan))
     require(dataIndex >= 0, s"Table $query is not cached.")
-    cachedData(dataIndex).cachedRepresentation.uncache(blocking)
+    cachedData(dataIndex).cachedRepresentation.cachedColumnBuffers.unpersist(blocking)
--- End diff --
Yeah, the null setting looks useless. This change LGTM.
[GitHub] spark issue #13566: [SPARK-15678] Add support to REFRESH data source paths
Github user davies commented on the issue: https://github.com/apache/spark/pull/13566 LGTM, Merging this into master and 2.0, thanks!
[GitHub] spark issue #12938: [SPARK-15162][SPARK-15164][PySpark][DOCS][ML] update som...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12938 **[Test build #60332 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60332/consoleFull)** for PR 12938 at commit [`873f6c8`](https://github.com/apache/spark/commit/873f6c8656c9f07543e5907d6bde7bf0c582673d).
[GitHub] spark issue #13381: [SPARK-15608][ml][examples][doc] add examples and docume...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13381 **[Test build #60331 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60331/consoleFull)** for PR 13381 at commit [`2e46416`](https://github.com/apache/spark/commit/2e46416317356f8f8fa53457ba2318449a795218).
[GitHub] spark issue #13571: [SPARK-15369][WIP][RFC][PySpark][SQL] Expose potential t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13571 **[Test build #60330 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60330/consoleFull)** for PR 13571 at commit [`84e1bf1`](https://github.com/apache/spark/commit/84e1bf14e51f98c13b2177d6c04c0a02e54982f7).
[GitHub] spark issue #13570: [SPARK-15832][SQL] Embedded IN/EXISTS predicate subquery...
Github user ioana-delaney commented on the issue: https://github.com/apache/spark/pull/13570 @hvanhovell The EXISTS/NOT EXISTS predicates will have an empty condition, e.g.
select c1 from t1 where EXISTS (select c2 from t2)
== Optimized Logical Plan ==
Project [_1#224 AS c1#227]
+- Join LeftSemi
   :- LocalRelation [_1#224, _2#225]
   +- LocalRelation [c2#239]
But the other subquery predicates are guaranteed to have at least one condition. Regarding the rewriteExistentialExpr interface, I think I need to pass an expression instead of a sequence of conditions, since the last case in the main rewrite rule does not have conditions; it is just an expression, e.g.
where (case when c2 IN (select 1 as one) then 1 else 2) = c1
Please let me know. Thanks.
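To make the "empty condition" case concrete, here is a small model of left semi and left anti joins over plain Scala collections. It is a sketch for illustration only: `leftSemi` and `leftAnti` are hypothetical helpers, not Catalyst code, and `None` is simply this sketch's way of modelling a join with no condition, as in the uncorrelated EXISTS plan above.

```scala
object SemiJoinSketch {
  // Left semi join: keep each left row that has at least one matching right row.
  // With no condition (None), every left row matches as long as `right` is non-empty,
  // which is what an uncorrelated EXISTS means.
  def leftSemi[L, R](left: Seq[L], right: Seq[R],
                     condition: Option[(L, R) => Boolean]): Seq[L] =
    condition match {
      case Some(cond) => left.filter(l => right.exists(r => cond(l, r)))
      case None       => if (right.nonEmpty) left else Seq.empty
    }

  // Left anti join: the NOT EXISTS counterpart, keeping left rows with no match.
  def leftAnti[L, R](left: Seq[L], right: Seq[R],
                     condition: Option[(L, R) => Boolean]): Seq[L] =
    condition match {
      case Some(cond) => left.filterNot(l => right.exists(r => cond(l, r)))
      case None       => if (right.isEmpty) left else Seq.empty
    }
}
```

A correlated predicate such as `c1 IN (select c2 from t2)` would correspond to passing `Some((l, r) => l == r)`, which is the "at least one condition" case the comment contrasts with plain EXISTS.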
[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/13371 @rxin One thing that needs explaining: because we have only one configuration to control filter push-down, it affects both row-based filter push-down and this row-group filter push-down. The benchmark I posted above was run against this patch and the master branch individually. It does include the time to write the Parquet data; I will change that. I'd like to confirm whether this kind of benchmark is enough.
[GitHub] spark issue #13557: [SPARK-15819][PYSPARK][ML] Add KMeanSummary in KMeans of...
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/13557 @jkbradley Could you help review it? Thanks.
[GitHub] spark issue #12313: [SPARK-14543] [SQL] Improve InsertIntoTable column resol...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/12313 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60327/ Test PASSed.
[GitHub] spark issue #12313: [SPARK-14543] [SQL] Improve InsertIntoTable column resol...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/12313 Merged build finished. Test PASSed.
[GitHub] spark issue #12313: [SPARK-14543] [SQL] Improve InsertIntoTable column resol...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12313 **[Test build #60327 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60327/consoleFull)** for PR 12313 at commit [`906e68d`](https://github.com/apache/spark/commit/906e68d071daf4e2f15b0f2017b248b872bb6285). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13595: [MINOR][SQL] Standardize 'continuous queries' to 'stream...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13595 **[Test build #60329 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60329/consoleFull)** for PR 13595 at commit [`097f2ca`](https://github.com/apache/spark/commit/097f2ca06614dbf1c8299cbd788829fbb32063f1).
[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/13592#discussion_r66699400
--- Diff: docs/sql-programming-guide.md ---
@@ -1607,13 +1600,13 @@ a regular multi-line JSON file will most often fail.
 {% highlight r %}
 # sc is an existing SparkContext.
-sqlContext <- sparkRSQL.init(sc)
+spark <- sparkRSQL.init(sc)
--- End diff --
Currently, `sparkRSQL.init` calls `org.apache.spark.sql.api.r.SQLUtils.createSQLContext`, which returns a `SQLContext` object, not a `SparkSession` object. So does this mean the R API needs to be updated here?
[GitHub] spark issue #13544: [SPARK-15805][SQL][Documents] update sql programming gui...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/13544 @liancheng OK, no problem!
[GitHub] spark issue #13613: [SPARK-15889][SQL][STREAMING] Add a unique id to Continu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13613 Merged build finished. Test FAILed.
[GitHub] spark issue #13613: [SPARK-15889][SQL][STREAMING] Add a unique id to Continu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13613 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60328/ Test FAILed.
[GitHub] spark issue #13613: [SPARK-15889][SQL][STREAMING] Add a unique id to Continu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13613 **[Test build #60328 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60328/consoleFull)** for PR 13613 at commit [`459bbb1`](https://github.com/apache/spark/commit/459bbb17603f132eb737f1272f05e29b60d04842). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13606: [SPARK-15086] [CORE] [STREAMING] Deprecate old Java accu...
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/13606 @srowen, the [[streaming programming guide] - accumulators-and-broadcast-variables](https://github.com/apache/spark/blob/1e2c9311871968426e019164b129652fd6d0037f/docs/streaming-programming-guide.md#accumulators-and-broadcast-variables) section might also need an update to reflect the code change here. Thanks!
[GitHub] spark issue #13613: [SPARK-15889][SQL][STREAMING] Add a unique id to Continu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13613 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60326/ Test FAILed.
[GitHub] spark issue #13613: [SPARK-15889][SQL][STREAMING] Add a unique id to Continu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13613 Merged build finished. Test FAILed.
[GitHub] spark issue #13613: [SPARK-15889][SQL][STREAMING] Add a unique id to Continu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13613 **[Test build #60326 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60326/consoleFull)** for PR 13613 at commit [`d1fbb9d`](https://github.com/apache/spark/commit/d1fbb9d7eccb20a8e5ad7a521393bb979866c243). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13147: [SPARK-6320][SQL] Move planLater method into GenericStra...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/13147 @marmbrus Thank you for merging this!
[GitHub] spark issue #13612: [SPARK-15851] [Build] Fix the call of the bash script to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13612 Merged build finished. Test PASSed.
[GitHub] spark issue #13612: [SPARK-15851] [Build] Fix the call of the bash script to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13612 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60325/ Test PASSed.
[GitHub] spark issue #13612: [SPARK-15851] [Build] Fix the call of the bash script to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13612 **[Test build #60325 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60325/consoleFull)** for PR 13612 at commit [`0379025`](https://github.com/apache/spark/commit/03790251ccef03687535ea9b2968101d1206ae22). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/13371 And once we have more data, it might make sense to merge this in 2.0!
[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/13371 To be clearer, please write a proper benchmark that reads data when filter push-down is not useful, to check whether this regresses performance for the non-push-down case. Also make sure the benchmark does not include the time it takes to write the Parquet data.
[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/13371 I just talked to @liancheng offline. I don't think we should've merged this until we have verified there is no performance regression, and we definitely shouldn't have merged this in 2.0. @liancheng can you revert this from both master and branch-2.0? @viirya can you run some parquet scan benchmark and make sure this does not result in perf regression?
[GitHub] spark issue #13596: [SPARK-15870][SQL] DataFrame can't execute after uncache...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/13596 @cloud-fan Thank you for your review. That's right, so we can't unregister the `batchStats` accumulator here yet.
[GitHub] spark issue #13381: [SPARK-15608][ml][examples][doc] add examples and docume...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13381 **[Test build #3079 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3079/consoleFull)** for PR 13381 at commit [`13027b7`](https://github.com/apache/spark/commit/13027b79bbe8e77119207cc8810a775bca022c32). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #13596: [SPARK-15870][SQL] DataFrame can't execute after ...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/13596#discussion_r66698444 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala --- @@ -105,7 +105,7 @@ private[sql] class CacheManager extends Logging { val planToCache = query.queryExecution.analyzed val dataIndex = cachedData.indexWhere(cd => planToCache.sameResult(cd.plan)) require(dataIndex >= 0, s"Table $query is not cached.") -cachedData(dataIndex).cachedRepresentation.uncache(blocking) + cachedData(dataIndex).cachedRepresentation.cachedColumnBuffers.unpersist(blocking) --- End diff -- Yes, that's right. But I noticed that the original `InMemoryRelation` instance whose `_cachedColumnBuffers` would be set to `null` is not the same instance that will be executed by the `DataFrame`, because it was copied by `withOutput` when `CacheManager` replaced the logical plan for the `DataFrame`. So we don't need to set it to null, and the original one will be collected by GC soon.
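The point about `withOutput` producing a separate instance can be shown with a small Spark-free sketch. `RelationSketch` and its fields are hypothetical stand-ins invented for illustration; they are not the real `InMemoryRelation`, and an `Option` is used here in place of a nullable field.

```scala
// Hypothetical stand-in for a relation that holds a mutable reference to cached buffers.
final class RelationSketch(val output: Seq[String], var buffers: Option[Vector[Int]]) {
  // Copy-style method: returns a NEW instance that shares the same buffers.
  def withOutput(newOutput: Seq[String]): RelationSketch =
    new RelationSketch(newOutput, buffers)
}

object CopyDemo {
  def main(args: Array[String]): Unit = {
    val original = new RelationSketch(Seq("a"), Some(Vector(1, 2, 3)))
    val planned = original.withOutput(Seq("a#1"))

    // Clearing the field on the original does not touch the copy that gets executed.
    original.buffers = None

    println(original.buffers.isDefined) // false
    println(planned.buffers.isDefined)  // true
  }
}
```

Under these assumptions, clearing the reference on the original has no effect on the copy, which is why releasing the shared buffers themselves is the step that matters.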
[GitHub] spark pull request #13393: [SPARK-14615][ML][FOLLOWUP] Fix Python examples t...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13393
[GitHub] spark issue #13544: [SPARK-15805][SQL][Documents] update sql programming gui...
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13544 Ah, too bad... I wasn't aware of this PR when I was doing #13592. Will review this one to see whether I missed something in #13592. Thanks for working on this!
[GitHub] spark pull request #13381: [SPARK-15608][ml][examples][doc] add examples and...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/13381#discussion_r66698378 --- Diff: examples/src/main/java/org/apache/spark/examples/mllib/JavaIsotonicRegressionExample.java --- @@ -35,14 +37,15 @@ public static void main(String[] args) { SparkConf sparkConf = new SparkConf().setAppName("JavaIsotonicRegressionExample"); JavaSparkContext jsc = new JavaSparkContext(sparkConf); // $example on$ -JavaRDD data = jsc.textFile("data/mllib/sample_isotonic_regression_data.txt"); +JavaRDD data = MLUtils.loadLibSVMFile( +jsc.sc(), "data/mllib/sample_isotonic_regression_libsvm_data.txt").toJavaRDD(); --- End diff -- Fix indentation: indent by 2 spaces here and elsewhere
[GitHub] spark pull request #13381: [SPARK-15608][ml][examples][doc] add examples and...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/13381#discussion_r66698376 --- Diff: docs/ml-classification-regression.md --- @@ -685,6 +685,76 @@ The implementation matches the result from R's survival function +## Isotonic regression +[Isotonic regression](http://en.wikipedia.org/wiki/Isotonic_regression) +belongs to the family of regression algorithms. Formally isotonic regression is a problem where +given a finite set of real numbers `$Y = {y_1, y_2, ..., y_n}$` representing observed responses +and `$X = {x_1, x_2, ..., x_n}$` the unknown response values to be fitted +finding a function that minimises + +`\begin{equation} + f(x) = \sum_{i=1}^n w_i (y_i - x_i)^2 +\end{equation}` + +with respect to complete order subject to +`$x_1\le x_2\le ...\le x_n$` where `$w_i$` are positive weights. +The resulting function is called isotonic regression and it is unique. +It can be viewed as least squares problem under order restriction. +Essentially isotonic regression is a +[monotonic function](http://en.wikipedia.org/wiki/Monotonic_function) +best fitting the original data points. + +MLlib supports a +[pool adjacent violators algorithm](http://doi.org/10.1198/TECH.2010.10111) +which uses an approach to +[parallelizing isotonic regression](http://doi.org/10.1007/978-3-642-99789-1_10). +The training input is a RDD of tuples of three double values that represent --- End diff -- not an RDD
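For readers unfamiliar with the objective quoted in the diff above, the following is a minimal, sequential sketch of pool adjacent violators for the weighted least-squares problem under `x_1 <= x_2 <= ... <= x_n`. It is not MLlib's implementation and does not attempt the parallel approach the documentation mentions; `PavaSketch`, `Block`, and `fit` are names made up for this illustration, and the inputs are assumed to be already ordered by feature.

```scala
object PavaSketch {
  // One pooled block: weighted mean, total weight, and number of points it covers.
  private final case class Block(mean: Double, weight: Double, count: Int)

  // Merge the newest block (head) into its predecessor while monotonicity is violated.
  @annotation.tailrec
  private def pool(stack: List[Block]): List[Block] = stack match {
    case b :: a :: rest if a.mean > b.mean =>
      val total = a.weight + b.weight
      val mean = (a.mean * a.weight + b.mean * b.weight) / total
      pool(Block(mean, total, a.count + b.count) :: rest)
    case _ => stack
  }

  /** Returns fitted values minimising sum w_i (y_i - x_i)^2 subject to x_1 <= ... <= x_n. */
  def fit(y: Seq[Double], w: Seq[Double]): Seq[Double] = {
    require(y.length == w.length, "y and w must have the same length")
    require(w.forall(_ > 0.0), "weights must be positive")
    // Blocks are accumulated in reverse order (head = most recent block).
    val blocks = y.zip(w).foldLeft(List.empty[Block]) { case (acc, (yi, wi)) =>
      pool(Block(yi, wi, 1) :: acc)
    }
    blocks.reverse.flatMap(b => Seq.fill(b.count)(b.mean))
  }

  def main(args: Array[String]): Unit = {
    // The pair (3.0, 2.0) violates the ordering and is pooled to its mean, 2.5.
    println(fit(Seq(1.0, 3.0, 2.0, 4.0), Seq(1.0, 1.0, 1.0, 1.0)))
  }
}
```

Each violating pair is replaced by its weighted mean, which is the least-squares value for a block forced to share one fitted value.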
[GitHub] spark pull request #13381: [SPARK-15608][ml][examples][doc] add examples and...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/13381#discussion_r66698380 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/IsotonicRegressionExample.scala --- @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +// scalastyle:off println +package org.apache.spark.examples.ml + +// $example on$ +import org.apache.spark.ml.regression.IsotonicRegression +// $example off$ +import org.apache.spark.sql.SparkSession + +/** + * An example demonstrating Isotonic Regression. + * Run with + * {{{ + * bin/run-example ml.IsotonicRegressionExample + * }}} + */ +object IsotonicRegressionExample { + + def main(args: Array[String]): Unit = { + +// Creates a SparkSession. +val spark = SparkSession + .builder + .appName(s"${this.getClass.getSimpleName}") + .getOrCreate() + +// $example on$ +// Loads data. +val dataset = spark.read.format("libsvm") + .load("data/mllib/sample_isotonic_regression_libsvm_data.txt") + +// Trains an isotonic regression model. 
+val ir = new IsotonicRegression() +val model = ir.fit(dataset) + +println(s"Boundaries in increasing order: ${model.boundaries}") +println(s"Predictions associated with the boundaries: ${model.predictions}") + +// Makes predictions. +model.transform(dataset).show --- End diff -- "show" --> "show()"
[GitHub] spark pull request #13371: [SPARK-15639][SQL] Try to push down filter at Row...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13371
[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13371 @yhuai We used to support row group level filter push-down before refactoring `HadoopFsRelation` into `FileFormat`, but lost it (by accident I guess) after the refactoring. So now we only have row group level filtering when the vectorized reader is not used, [see here][1]. And yes, both `ParquetInputFormat` and `ParquetRecordReader` do row group level filtering. This LGTM. Thanks for fixing it! Merging to master and 2.0. [1]: https://github.com/apache/spark/blob/54f758b5fc60ecb0da6b191939a72ef5829be38c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L371-L378
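As a rough illustration of what row group level filtering means, here is a toy in-memory sketch. It does not use Parquet or Spark APIs; `RowGroup`, `rowGroup`, and `scanGreaterThan` are hypothetical names, and the min/max statistics stand in for the per-row-group statistics a columnar file would carry.

```scala
// Hypothetical row group: a chunk of values plus min/max statistics.
final case class RowGroup(min: Int, max: Int, values: Vector[Int])

object RowGroupFilterSketch {
  def rowGroup(values: Vector[Int]): RowGroup =
    RowGroup(values.min, values.max, values)

  /** Returns the values greater than `threshold` and how many row groups were read. */
  def scanGreaterThan(groups: Seq[RowGroup], threshold: Int): (Vector[Int], Int) = {
    // Row group level filter: a group whose max is <= threshold cannot contain a match.
    val candidates = groups.filter(_.max > threshold)
    // Row level filter is still applied inside the groups that survive.
    val rows = candidates.flatMap(_.values.filter(_ > threshold)).toVector
    (rows, candidates.size)
  }

  def main(args: Array[String]): Unit = {
    val groups = Seq(rowGroup(Vector(1, 2, 3)), rowGroup(Vector(10, 11)), rowGroup(Vector(4, 20)))
    println(scanGreaterThan(groups, 9)) // selective: the first group is skipped
    println(scanGreaterThan(groups, 0)) // not selective: every group is still read
  }
}
```

The second call corresponds to the case raised elsewhere in this thread: when the predicate rules out no row groups, the statistics check is pure overhead, which is why a benchmark of the non-selective case is informative.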
[GitHub] spark issue #13613: [SPARK-15889][SQL][STREAMING] Add a unique id to Continu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13613 Merged build finished. Test FAILed.
[GitHub] spark issue #13613: [SPARK-15889][SQL][STREAMING] Add a unique id to Continu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13613 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60324/ Test FAILed.
[GitHub] spark issue #13613: [SPARK-15889][SQL][STREAMING] Add a unique id to Continu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13613 **[Test build #60324 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60324/consoleFull)** for PR 13613 at commit [`d686df4`](https://github.com/apache/spark/commit/d686df49399d5387721e2ad761a23eb1a63a0890). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13612: [SPARK-15851] [Build] Fix the call of the bash script to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13612 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60323/ Test PASSed.
[GitHub] spark issue #13595: [MINOR][SQL] Standardize 'continuous queries' to 'stream...
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/13595 @zsxwing @tdas, sure, this can wait. Thanks!
[GitHub] spark issue #13612: [SPARK-15851] [Build] Fix the call of the bash script to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13612 Merged build finished. Test PASSed.
[GitHub] spark issue #13612: [SPARK-15851] [Build] Fix the call of the bash script to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13612 **[Test build #60323 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60323/consoleFull)** for PR 13612 at commit [`aa1927f`](https://github.com/apache/spark/commit/aa1927f19ed24af60001fd822898cae51043f8e4). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13570: [SPARK-15832][SQL] Embedded IN/EXISTS predicate subquery...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/13570 @ioana-delaney no worries. I think the approach you have taken is the correct one. I have left one smallish comment.
[GitHub] spark pull request #13570: [SPARK-15832][SQL] Embedded IN/EXISTS predicate s...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/13570#discussion_r66697830 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1715,31 +1715,52 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper { // Filter the plan by applying left semi and left anti joins. withSubquery.foldLeft(newFilter) { case (p, PredicateSubquery(sub, conditions, _, _)) => - Join(p, sub, LeftSemi, conditions.reduceOption(And)) + val (joinCond, outerPlan) = rewriteExistentialExpr(conditions.reduceOption(And), p) + Join(outerPlan, sub, LeftSemi, joinCond) case (p, Not(PredicateSubquery(sub, conditions, false, _))) => - Join(p, sub, LeftAnti, conditions.reduceOption(And)) + val (joinCond, outerPlan) = rewriteExistentialExpr(conditions.reduceOption(And), p) + Join(outerPlan, sub, LeftAnti, joinCond) case (p, Not(PredicateSubquery(sub, conditions, true, _))) => - // This is a NULL-aware (left) anti join (NAAJ). + // This is a NULL-aware (left) anti join (NAAJ) e.g. col NOT IN expr // Construct the condition. A NULL in one of the conditions is regarded as a positive // result; such a row will be filtered out by the Anti-Join operator. - val anyNull = conditions.map(IsNull).reduceLeft(Or) - val condition = conditions.reduceLeft(And) - // Note that will almost certainly be planned as a Broadcast Nested Loop join. Use EXISTS - // if performance matters to you. - Join(p, sub, LeftAnti, Option(Or(anyNull, condition))) + // Note that will almost certainly be planned as a Broadcast Nested Loop join. + // Use EXISTS if performance matters to you. 
+ val (joinCond, outerPlan) = rewriteExistentialExpr(conditions.reduceLeftOption(And), p) + val anyNull = splitConjunctivePredicates(joinCond.get).map(IsNull).reduceLeft(Or) + Join(outerPlan, sub, LeftAnti, Option(Or(anyNull, joinCond.get))) case (p, predicate) => - var joined = p - val replaced = predicate transformUp { -case PredicateSubquery(sub, conditions, nullAware, _) => - // TODO: support null-aware join - val exists = AttributeReference("exists", BooleanType, nullable = false)() - joined = Join(joined, sub, ExistenceJoin(exists), conditions.reduceLeftOption(And)) - exists - } - Project(p.output, Filter(replaced, joined)) + val (newCond, inputPlan) = rewriteExistentialExpr(Option(predicate), p) + Project(p.output, Filter(newCond.get, inputPlan)) } } + + /** + * Given a predicate expression and an input plan, it rewrites + * any embedded existential sub-query into an existential join. + * It returns the rewritten expression together with the updated plan. + * Currently, it does not support null-aware joins. Embedded NOT IN predicates + * are blocked in the Analyzer. + */ + private def rewriteExistentialExpr( + expr: Option[Expression], + plan: LogicalPlan): (Option[Expression], LogicalPlan) = { +var newPlan = plan --- End diff -- Move this down to the Some(case). A bit of mutability is not a problem though.
[GitHub] spark pull request #13570: [SPARK-15832][SQL] Embedded IN/EXISTS predicate s...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/13570#discussion_r66697790 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1715,31 +1715,52 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper { // Filter the plan by applying left semi and left anti joins. withSubquery.foldLeft(newFilter) { case (p, PredicateSubquery(sub, conditions, _, _)) => - Join(p, sub, LeftSemi, conditions.reduceOption(And)) + val (joinCond, outerPlan) = rewriteExistentialExpr(conditions.reduceOption(And), p) + Join(outerPlan, sub, LeftSemi, joinCond) case (p, Not(PredicateSubquery(sub, conditions, false, _))) => - Join(p, sub, LeftAnti, conditions.reduceOption(And)) + val (joinCond, outerPlan) = rewriteExistentialExpr(conditions.reduceOption(And), p) + Join(outerPlan, sub, LeftAnti, joinCond) case (p, Not(PredicateSubquery(sub, conditions, true, _))) => - // This is a NULL-aware (left) anti join (NAAJ). + // This is a NULL-aware (left) anti join (NAAJ) e.g. col NOT IN expr // Construct the condition. A NULL in one of the conditions is regarded as a positive // result; such a row will be filtered out by the Anti-Join operator. - val anyNull = conditions.map(IsNull).reduceLeft(Or) - val condition = conditions.reduceLeft(And) - // Note that will almost certainly be planned as a Broadcast Nested Loop join. Use EXISTS - // if performance matters to you. - Join(p, sub, LeftAnti, Option(Or(anyNull, condition))) + // Note that will almost certainly be planned as a Broadcast Nested Loop join. + // Use EXISTS if performance matters to you. 
+ val (joinCond, outerPlan) = rewriteExistentialExpr(conditions.reduceLeftOption(And), p) + val anyNull = splitConjunctivePredicates(joinCond.get).map(IsNull).reduceLeft(Or) + Join(outerPlan, sub, LeftAnti, Option(Or(anyNull, joinCond.get))) case (p, predicate) => - var joined = p - val replaced = predicate transformUp { -case PredicateSubquery(sub, conditions, nullAware, _) => - // TODO: support null-aware join - val exists = AttributeReference("exists", BooleanType, nullable = false)() - joined = Join(joined, sub, ExistenceJoin(exists), conditions.reduceLeftOption(And)) - exists - } - Project(p.output, Filter(replaced, joined)) + val (newCond, inputPlan) = rewriteExistentialExpr(Option(predicate), p) + Project(p.output, Filter(newCond.get, inputPlan)) } } + + /** + * Given a predicate expression and an input plan, it rewrites + * any embedded existential sub-query into an existential join. + * It returns the rewritten expression together with the updated plan. + * Currently, it does not support null-aware joins. Embedded NOT IN predicates + * are blocked in the Analyzer. + */ + private def rewriteExistentialExpr( + expr: Option[Expression], --- End diff -- Let's just pass a sequence of expressions. Predicate subqueries are guaranteed to have one or more conditions.
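The shape of the NULL-aware anti-join condition in the diff above, and why a non-empty sequence of conditions is enough to build it, can be sketched with a toy expression type. The `Expr` hierarchy below is a hypothetical stand-in written for this note and is not Catalyst's `Expression`; `NaajCondition.build` is likewise an invented name.

```scala
// Toy expression tree; a stand-in for illustration only.
sealed trait Expr
final case class Attr(name: String) extends Expr
final case class EqualTo(left: Expr, right: Expr) extends Expr
final case class IsNull(child: Expr) extends Expr
final case class And(left: Expr, right: Expr) extends Expr
final case class Or(left: Expr, right: Expr) extends Expr

object NaajCondition {
  /** Builds (c1 IS NULL OR c2 IS NULL ...) OR (c1 AND c2 ...) from one or more conditions. */
  def build(conditions: Seq[Expr]): Expr = {
    // reduceLeft is safe only because the sequence is required to be non-empty.
    require(conditions.nonEmpty, "at least one condition is expected")
    val anyNull = conditions.map(c => IsNull(c): Expr).reduceLeft[Expr]((a, b) => Or(a, b))
    val allHold = conditions.reduceLeft[Expr]((a, b) => And(a, b))
    Or(anyNull, allHold)
  }

  def main(args: Array[String]): Unit = {
    val c1 = EqualTo(Attr("a"), Attr("x"))
    val c2 = EqualTo(Attr("b"), Attr("y"))
    println(build(Seq(c1, c2)))
  }
}
```

With a `Seq` that is known to be non-empty, no `Option` wrapping or `.get` call is needed at the call site, which appears to be the motivation for the review suggestion.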
[GitHub] spark issue #13596: [SPARK-15870][SQL] DataFrame can't execute after uncache...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/13596 My suggestion is: in `InMemoryRelation.uncache`, we set `batchStats` to null at the end; when this `InMemoryRelation` gets executed again, it will regenerate the accumulator and register it.
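A minimal sketch of the reset-and-regenerate pattern suggested here, written without Spark. `CachedRelationSketch`, `statsForExecution`, and the `register` callback are hypothetical names; the callback only models accumulator registration, and an `Option` replaces the nullable field from the suggestion.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a cached relation; `register` models accumulator registration.
final class CachedRelationSketch(register: () => Unit) {
  private var batchStats: Option[ArrayBuffer[Long]] = None

  /** Returns the stats collector, creating and registering a fresh one if none exists. */
  def statsForExecution(): ArrayBuffer[Long] = synchronized {
    batchStats match {
      case Some(existing) => existing
      case None =>
        val fresh = ArrayBuffer.empty[Long]
        register()
        batchStats = Some(fresh)
        fresh
    }
  }

  /** Drops the collector so that the next execution regenerates it. */
  def uncache(): Unit = synchronized { batchStats = None }
}

object UncacheDemo {
  def main(args: Array[String]): Unit = {
    var registrations = 0
    val relation = new CachedRelationSketch(() => registrations += 1)

    relation.statsForExecution() += 42L // first execution registers once
    relation.statsForExecution()        // reuses the same collector
    relation.uncache()
    relation.statsForExecution()        // regenerated and registered again

    println(registrations) // 2
  }
}
```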
[GitHub] spark issue #13381: [SPARK-15608][ml][examples][doc] add examples and docume...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13381 **[Test build #3079 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3079/consoleFull)** for PR 13381 at commit [`13027b7`](https://github.com/apache/spark/commit/13027b79bbe8e77119207cc8810a775bca022c32).