[GitHub] spark issue #14937: [WIP] [SPARK-8519][SPARK-11560] [ML] [MLlib] Optimize KM...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14937 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64927/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14937: [WIP] [SPARK-8519][SPARK-11560] [ML] [MLlib] Optimize KM...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14937 Merged build finished. Test PASSed.
[GitHub] spark issue #14937: [WIP] [SPARK-8519][SPARK-11560] [ML] [MLlib] Optimize KM...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14937

**[Test build #64927 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64927/consoleFull)** for PR 14937 at commit [`1c31cda`](https://github.com/apache/spark/commit/1c31cda0f78b8c2b11406d76da447e9b3216a97d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #14912: [SPARK-17357][SQL] Simplified predicates should b...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/14912#discussion_r77472048

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FilterPushdownSuite.scala ---
@@ -171,6 +172,27 @@ class FilterPushdownSuite extends PlanTest {
     comparePlans(optimized, correctAnswer)
   }
+  test("push down filters that are combined") {
+    // The following predicate ('a === 2 || 'a === 3) && ('c > 10 || 'a === 2)
+    // will be simplified as ('a == 2) || ('c > 10 && 'a == 3).
+    // ('a === 2 || 'a === 3) can be pushed down. But the simplified one can't.
--- End diff --

Considering how the Optimizer works, we can't extract `CombineFilters` and `PushDownPredicates` into a new batch, because we also have to respect the interaction between them and the other rules. I took an alternative approach: convert the filter predicates to CNF while combining filters, and then perform the additional predicate pushdown immediately. That way the subsequent BooleanSimplification does not affect the predicate pushdown.
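The rewrite quoted in the test comment can be checked by brute force. The following is only a minimal sketch in plain Scala, not Catalyst code: `Sample`, `original`, `simplified`, and `pushable` are hypothetical names, and integer columns `a` and `c` are an assumption made for illustration. It checks that the two forms agree on a small grid of values and that the `a`-only conjunct is implied by the original predicate, which is why that conjunct is a candidate for pushdown on its own.

```scala
// Hypothetical stand-in for a row with integer columns a and c (not a Spark type).
final case class Sample(a: Int, c: Int)

object PredicateEquivalence {
  // The conjunction of two disjunctions, as written in the test comment.
  def original(s: Sample): Boolean = (s.a == 2 || s.a == 3) && (s.c > 10 || s.a == 2)

  // The form the test comment says simplification produces.
  def simplified(s: Sample): Boolean = s.a == 2 || (s.c > 10 && s.a == 3)

  // The conjunct that references only column a.
  def pushable(s: Sample): Boolean = s.a == 2 || s.a == 3

  // A small grid covering both sides of every comparison above.
  val samples: Seq[Sample] = for { a <- 0 to 5; c <- 8 to 12 } yield Sample(a, c)

  // Both forms accept exactly the same rows on the grid.
  val equivalent: Boolean = samples.forall(s => original(s) == simplified(s))

  // Every row accepted by the original predicate also passes the a-only conjunct.
  val conjunctIsImplied: Boolean = samples.forall(s => !original(s) || pushable(s))
}
```

In the simplified form the top-level operator is a disjunction that mixes `a` and `c`, so there is no longer a standalone `a`-only conjunct to split off.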
[GitHub] spark pull request #14955: [SPARK-17394][SQL] should not allow specify datab...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14955
[GitHub] spark issue #14955: [SPARK-17394][SQL] should not allow specify database in ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14955 thanks for the review, merging to master!
[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution by dedupli...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14452 **[Test build #64928 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64928/consoleFull)** for PR 14452 at commit [`6d79beb`](https://github.com/apache/spark/commit/6d79bebacd8e9f0672c713c1f954502dbda3f992).
[GitHub] spark issue #14937: [WIP] [SPARK-8519][SPARK-11560] [ML] [MLlib] Optimize KM...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14937 **[Test build #64927 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64927/consoleFull)** for PR 14937 at commit [`1c31cda`](https://github.com/apache/spark/commit/1c31cda0f78b8c2b11406d76da447e9b3216a97d).
[GitHub] spark issue #14955: [SPARK-17394][SQL] should not allow specify database in ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14955 LGTM
[GitHub] spark issue #14955: [SPARK-17394][SQL] should not allow specify database in ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14955

DB2 behaves differently, but the target table name still must not have a schema name.
```
db2 => rename table DB2INST1.t1 to DB2INST1.t2
DB21034E The command was processed as an SQL statement because it was not a valid Command Line Processor command. During SQL processing it returned:
SQL0108N The name "T2" has the wrong number of qualifiers. SQLSTATE=42601
db2 => rename table DB2INST1.t1 to t2
DB2I The SQL command completed successfully.
db2 => rename table t2 to t3
DB2I The SQL command completed successfully.
```
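The rule shown in the DB2 transcript (a qualified source is accepted, a qualified target is rejected) could be sketched as a small validation function. This is only an illustration of that rule, not the code in the pull request: `TableName`, `RenameCheck`, and `validateRename` are hypothetical names, and keeping the renamed table under the source's qualifier is an assumption.

```scala
// Hypothetical identifier type for illustration; not Spark's TableIdentifier.
final case class TableName(table: String, qualifier: Option[String] = None)

object RenameCheck {
  // Rejects a qualified target name; otherwise the renamed table keeps the
  // qualifier of the source (an assumption made for this sketch).
  def validateRename(from: TableName, to: TableName): Either[String, TableName] =
    to.qualifier match {
      case Some(q) => Left(s"target name '${q}.${to.table}' must not be qualified")
      case None    => Right(to.copy(qualifier = from.qualifier))
    }
}
```

Returning `Either` keeps the check free of exceptions, so a caller can decide how to report the error.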
[GitHub] spark issue #14955: [SPARK-17394][SQL] should not allow specify database in ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14955

Also confirmed that Oracle does not allow it either. When we rename a table, the table must belong to the current owner.
```
SQL> rename system.t1 to system.t2;
rename system.t1 to system.t2
*
ERROR at line 1:
ORA-01765: specifying owner's name of the table is not allowed

SQL> rename t1 to system.t2;
rename t1 to system.t2
*
ERROR at line 1:
ORA-01765: specifying owner's name of the table is not allowed

SQL> rename t1 to t2;
Table renamed.
```
[GitHub] spark pull request #14954: [SPARK-17393] [SQL] Error Handling when CTAS Agai...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14954#discussion_r77468619

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/MetastoreDataSourcesSuite.scala ---
@@ -1151,6 +1151,58 @@ class MetastoreDataSourcesSuite extends QueryTest with SQLTestUtils with TestHiv
     }
   }
+  test("saveAsTable - source and target are the same table") {
+    val tableName = "tab1"
+    withTable(tableName) {
+      Seq((1, 2)).toDF("i", "j").write.saveAsTable(tableName)
+
+      table(tableName).write.mode(SaveMode.Append).saveAsTable(tableName)
+      checkAnswer(table(tableName),
+        Seq(Row(1, 2), Row(1, 2)))
+
+      table(tableName).write.mode(SaveMode.Ignore).saveAsTable(tableName)
+      checkAnswer(table(tableName),
+        Seq(Row(1, 2), Row(1, 2)))
+
+      var e = intercept[AnalysisException] {
+        table(tableName).write.mode(SaveMode.Overwrite).saveAsTable(tableName)
+      }.getMessage
+      assert(e.contains(s"Cannot overwrite table `$tableName` that is also being read from"))
+
+      e = intercept[AnalysisException] {
+        table(tableName).write.mode(SaveMode.ErrorIfExists).saveAsTable(tableName)
+      }.getMessage
+      assert(e.contains(s"Table `$tableName` already exists"))
+    }
+  }
+
+  test("insertInto - source and target are the same table") {
+    val tableName = "tab1"
+    withTable(tableName) {
+      Seq((1, 2)).toDF("i", "j").write.saveAsTable(tableName)
+
+      table(tableName).write.mode(SaveMode.Append).insertInto(tableName)
+      checkAnswer(
+        table(tableName),
+        Seq(Row(1, 2), Row(1, 2)))
+
+      table(tableName).write.mode(SaveMode.Ignore).insertInto(tableName)
--- End diff --

Sure, will do it. Thanks!
[GitHub] spark issue #14951: [SPARK-17391] [TEST] [2.0] Fix Two Test Failures After B...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14951 Since it is merged, I just
[GitHub] spark pull request #14951: [SPARK-17391] [TEST] [2.0] Fix Two Test Failures ...
Github user gatorsmile closed the pull request at: https://github.com/apache/spark/pull/14951
[GitHub] spark pull request #14954: [SPARK-17393] [SQL] Error Handling when CTAS Agai...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14954
[GitHub] spark issue #14954: [SPARK-17393] [SQL] Error Handling when CTAS Against the...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14954 LGTM, merging to master!
[GitHub] spark pull request #14954: [SPARK-17393] [SQL] Error Handling when CTAS Agai...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14954#discussion_r77467255

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/MetastoreDataSourcesSuite.scala ---
@@ -1151,6 +1151,58 @@ class MetastoreDataSourcesSuite extends QueryTest with SQLTestUtils with TestHiv
     }
   }
+  test("saveAsTable - source and target are the same table") {
+    val tableName = "tab1"
+    withTable(tableName) {
+      Seq((1, 2)).toDF("i", "j").write.saveAsTable(tableName)
+
+      table(tableName).write.mode(SaveMode.Append).saveAsTable(tableName)
+      checkAnswer(table(tableName),
+        Seq(Row(1, 2), Row(1, 2)))
+
+      table(tableName).write.mode(SaveMode.Ignore).saveAsTable(tableName)
+      checkAnswer(table(tableName),
+        Seq(Row(1, 2), Row(1, 2)))
+
+      var e = intercept[AnalysisException] {
+        table(tableName).write.mode(SaveMode.Overwrite).saveAsTable(tableName)
+      }.getMessage
+      assert(e.contains(s"Cannot overwrite table `$tableName` that is also being read from"))
+
+      e = intercept[AnalysisException] {
+        table(tableName).write.mode(SaveMode.ErrorIfExists).saveAsTable(tableName)
+      }.getMessage
+      assert(e.contains(s"Table `$tableName` already exists"))
+    }
+  }
+
+  test("insertInto - source and target are the same table") {
+    val tableName = "tab1"
+    withTable(tableName) {
+      Seq((1, 2)).toDF("i", "j").write.saveAsTable(tableName)
+
+      table(tableName).write.mode(SaveMode.Append).insertInto(tableName)
+      checkAnswer(
+        table(tableName),
+        Seq(Row(1, 2), Row(1, 2)))
+
+      table(tableName).write.mode(SaveMode.Ignore).insertInto(tableName)
--- End diff --

Yeah, I think we should. Let's do it in a follow-up PR.
[GitHub] spark pull request #14954: [SPARK-17393] [SQL] Error Handling when CTAS Agai...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14954#discussion_r77467164

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala ---
@@ -357,32 +376,6 @@ case class PreWriteCheck(conf: SQLConf, catalog: SessionCatalog)
         // The relation in l is not an InsertableRelation.
         failAnalysis(s"$l does not allow insertion.")
-      case CreateTable(tableDesc, mode, Some(query)) =>
--- End diff --

ah good catch! This branch is actually unreachable.
[GitHub] spark issue #14951: [SPARK-17391] [TEST] [2.0] Fix Two Test Failures After B...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14951 thanks, merging to 2.0!
[GitHub] spark issue #14913: [SPARK-17358][SQL] Cached table(parquet/orc) should be s...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14913 Oh sorry, I meant like this: `case class HadoopFsRelation(location: FileCatalog, ...)(val sparkSession: SparkSession)`
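How a second parameter list with an explicit `val` behaves on a case class can be seen in isolation. This is a minimal sketch with hypothetical stand-in types (`Session`, `Relation`), not Spark's `HadoopFsRelation`: the `val` makes the field readable from outside, and the compiler-generated `equals` and `hashCode` of a case class only look at the first parameter list.

```scala
// Hypothetical stand-ins for illustration; not Spark's real classes.
final class Session(val name: String)

// The second parameter list is declared with an explicit `val`.
final case class Relation(location: String)(val session: Session)

object CurriedParamDemo {
  val r1 = Relation("/data")(new Session("s1"))
  val r2 = Relation("/data")(new Session("s2"))

  // Because of `val`, the field is a public member and can be read directly.
  val sessionName: String = r1.session.name

  // Generated equality only considers the first parameter list,
  // so two relations with different sessions still compare equal.
  val sameRelation: Boolean = r1 == r2
}
```

Without the `val`, `session` would be a plain constructor parameter and `r1.session` would not compile, which is the access problem raised in this thread.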
[GitHub] spark issue #14913: [SPARK-17358][SQL] Cached table(parquet/orc) should be s...
Github user watermen commented on the issue: https://github.com/apache/spark/pull/14913

@cloud-fan, if we just make `sparkSession` a curried parameter, it can't be accessed from outside the class, e.g.
```scala
val supportsBatch = relation.fileFormat.supportBatch(
  relation.sparkSession, StructType.fromAttributes(output))
```
Or do we need to add a method `_sparkSession` to `HadoopFsRelation` to get the value of `sparkSession`?
```scala
def _sparkSession = sparkSession
```
[GitHub] spark issue #14803: [SPARK-17153][SQL] Should read partition data when readi...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14803 ping @marmbrus @zsxwing Can you take a quick look? Thanks.
[GitHub] spark pull request #14858: [SPARK-17219][ML] Add NaN value handling in Bucke...
Github user VinceShieh commented on a diff in the pull request: https://github.com/apache/spark/pull/14858#discussion_r77465278

--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala ---
@@ -114,10 +115,10 @@ final class QuantileDiscretizer @Since("1.6.0") (@Since("1.6.0") override val ui
     splits(0) = Double.NegativeInfinity
     splits(splits.length - 1) = Double.PositiveInfinity
-    val distinctSplits = splits.distinct
+    val distinctSplits = splits.filter(!_.isNaN).distinct
--- End diff --

@srowen then maybe we should, as we discussed earlier on JIRA, align with R by putting the NaN check inside approxQuantile rather than filtering before calling approxQuantile. We could also add a flag that lets the user choose between removing NaN values and throwing an error when the data contains NaN, although this API change would affect several existing callers.
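The changed line in the diff can be tried on a plain array. This is only a sketch using the Scala standard library, with made-up candidate values and hypothetical names (`NaNSplits`, `cleanSplits`); it mirrors the three steps visible in the diff (pin the first and last split to the infinities, drop NaN, then deduplicate) and does not call `approxQuantile`.

```scala
object NaNSplits {
  // Made-up candidate splits containing a NaN and a duplicate value.
  val raw: Array[Double] = Array(0.5, 1.0, Double.NaN, 1.0, 2.0, 9.9)

  def cleanSplits(candidates: Array[Double]): Array[Double] = {
    // Work on a copy so the caller's array is left untouched.
    val splits = candidates.clone()
    // Pin the outer boundaries, as in the diff context lines.
    splits(0) = Double.NegativeInfinity
    splits(splits.length - 1) = Double.PositiveInfinity
    // Drop NaN first, then remove duplicates (first occurrence wins).
    splits.filter(!_.isNaN).distinct
  }

  val cleaned: Array[Double] = cleanSplits(raw)
}
```

`isNaN` is used for the filter because a comparison such as `x != Double.NaN` is true for every `x`, including NaN itself, and so would filter nothing.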
[GitHub] spark issue #14867: [SPARK-17296][SQL] Simplify parser join processing.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14867 Merged build finished. Test PASSed.
[GitHub] spark issue #14867: [SPARK-17296][SQL] Simplify parser join processing.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14867 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64926/ Test PASSed.
[GitHub] spark issue #14867: [SPARK-17296][SQL] Simplify parser join processing.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14867

**[Test build #64926 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64926/consoleFull)** for PR 14867 at commit [`c30e665`](https://github.com/apache/spark/commit/c30e665f8a228633398456d6893684c224e4d3ea).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #14919: [SPARK-17354][SQL] Partitioning by dates/timestamps shou...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14919 @davies, do you mind if I ask you to review this, please?
[GitHub] spark issue #14867: [SPARK-17296][SQL] Simplify parser join processing.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14867 **[Test build #64926 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64926/consoleFull)** for PR 14867 at commit [`c30e665`](https://github.com/apache/spark/commit/c30e665f8a228633398456d6893684c224e4d3ea).
[GitHub] spark issue #14952: [SPARK-17110] Fix StreamCorruptionException in BlockMana...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14952 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64925/ Test PASSed.
[GitHub] spark issue #14952: [SPARK-17110] Fix StreamCorruptionException in BlockMana...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14952 Merged build finished. Test PASSed.
[GitHub] spark issue #14952: [SPARK-17110] Fix StreamCorruptionException in BlockMana...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14952

**[Test build #64925 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64925/consoleFull)** for PR 14952 at commit [`68db68d`](https://github.com/apache/spark/commit/68db68dbbc3ad7ecfe8180b165155f643c92cf2b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #14952: [SPARK-17110] Fix StreamCorruptionException in Bl...
Github user ericl commented on a diff in the pull request: https://github.com/apache/spark/pull/14952#discussion_r77459578

--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
@@ -520,10 +520,11 @@ private[spark] class BlockManager(
    *
    * This does not acquire a lock on this block in this JVM.
    */
-  private def getRemoteValues(blockId: BlockId): Option[BlockResult] = {
+  private def getRemoteValues[T: ClassTag](blockId: BlockId): Option[BlockResult] = {
+    val ct = implicitly[ClassTag[T]]
     getRemoteBytes(blockId).map { data =>
       val values =
-        serializerManager.dataDeserializeStream(blockId, data.toInputStream(dispose = true))
+        serializerManager.dataDeserializeStream(blockId, data.toInputStream(dispose = true))(ct)
--- End diff --

It seems like it is easy to accidentally forget to pass a correct classtag, since this has happened twice already.
[GitHub] spark pull request #14952: [SPARK-17110] Fix StreamCorruptionException in Bl...
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/14952#discussion_r77459444

--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
@@ -520,10 +520,11 @@ private[spark] class BlockManager(
    *
    * This does not acquire a lock on this block in this JVM.
    */
-  private def getRemoteValues(blockId: BlockId): Option[BlockResult] = {
+  private def getRemoteValues[T: ClassTag](blockId: BlockId): Option[BlockResult] = {
+    val ct = implicitly[ClassTag[T]]
     getRemoteBytes(blockId).map { data =>
       val values =
-        serializerManager.dataDeserializeStream(blockId, data.toInputStream(dispose = true))
+        serializerManager.dataDeserializeStream(blockId, data.toInputStream(dispose = true))(ct)
--- End diff --

I'm not saying this should definitely be done one way or the other, but I'm curious why you have a preference for the extra code and more verbose API that come with making the classTag an explicit parameter.
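The two API shapes under discussion can be put side by side. This is a minimal sketch using only `scala.reflect.ClassTag`; `ClassTagStyles` and its two methods are hypothetical and are not taken from `BlockManager`. It shows a context-bound version, where the compiler supplies the tag at the call site, next to a version that takes the tag as an ordinary argument.

```scala
import scala.reflect.{ClassTag, classTag}

object ClassTagStyles {
  // Context bound: the ClassTag is an implicit parameter resolved by the compiler.
  def runtimeClassImplicit[T: ClassTag]: Class[_] =
    implicitly[ClassTag[T]].runtimeClass

  // Explicit parameter: the caller has to hand the tag over as a normal argument.
  def runtimeClassExplicit[T](ct: ClassTag[T]): Class[_] =
    ct.runtimeClass

  // Both styles resolve to the same runtime class when the type is stated.
  val viaContextBound: Class[_] = runtimeClassImplicit[String]
  val viaExplicit: Class[_] = runtimeClassExplicit(classTag[String])
}
```

The trade-off raised in the review is visible here: the explicit form is more verbose, but a call cannot be written without naming a tag, whereas the context-bound form relies on the caller stating or the compiler inferring the intended type argument.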
[GitHub] spark issue #14952: [SPARK-17110] Fix StreamCorruptionException in BlockMana...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14952 **[Test build #64925 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64925/consoleFull)** for PR 14952 at commit [`68db68d`](https://github.com/apache/spark/commit/68db68dbbc3ad7ecfe8180b165155f643c92cf2b).
[GitHub] spark pull request #14954: [SPARK-17393] [SQL] Error Handling when CTAS Agai...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14954#discussion_r77456629

    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/MetastoreDataSourcesSuite.scala ---
    @@ -1151,6 +1151,58 @@ class MetastoreDataSourcesSuite extends QueryTest with SQLTestUtils with TestHiv
         }
       }

    +  test("saveAsTable - source and target are the same table") {
    +    val tableName = "tab1"
    +    withTable(tableName) {
    +      Seq((1, 2)).toDF("i", "j").write.saveAsTable(tableName)
    +
    +      table(tableName).write.mode(SaveMode.Append).saveAsTable(tableName)
    +      checkAnswer(table(tableName),
    +        Seq(Row(1, 2), Row(1, 2)))
    +
    +      table(tableName).write.mode(SaveMode.Ignore).saveAsTable(tableName)
    +      checkAnswer(table(tableName),
    +        Seq(Row(1, 2), Row(1, 2)))
    +
    +      var e = intercept[AnalysisException] {
    +        table(tableName).write.mode(SaveMode.Overwrite).saveAsTable(tableName)
    +      }.getMessage
    +      assert(e.contains(s"Cannot overwrite table `$tableName` that is also being read from"))
    +
    +      e = intercept[AnalysisException] {
    +        table(tableName).write.mode(SaveMode.ErrorIfExists).saveAsTable(tableName)
    +      }.getMessage
    +      assert(e.contains(s"Table `$tableName` already exists"))
    +    }
    +  }
    +
    +  test("insertInto - source and target are the same table") {
    +    val tableName = "tab1"
    +    withTable(tableName) {
    +      Seq((1, 2)).toDF("i", "j").write.saveAsTable(tableName)
    +
    +      table(tableName).write.mode(SaveMode.Append).insertInto(tableName)
    +      checkAnswer(
    +        table(tableName),
    +        Seq(Row(1, 2), Row(1, 2)))
    +
    +      table(tableName).write.mode(SaveMode.Ignore).insertInto(tableName)
    --- End diff --

    Should we issue error messages when the operation is `insertInto` but the mode is `Ignore` or `ErrorIfExists`?
[GitHub] spark issue #14956: [SPARK-17389] [ML] [MLLIB] KMeans speedup with better ch...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14956 Merged build finished. Test PASSed.
[GitHub] spark issue #14956: [SPARK-17389] [ML] [MLLIB] KMeans speedup with better ch...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14956 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64924/ Test PASSed.
[GitHub] spark issue #14956: [SPARK-17389] [ML] [MLLIB] KMeans speedup with better ch...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14956 **[Test build #64924 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64924/consoleFull)** for PR 14956 at commit [`3343acc`](https://github.com/apache/spark/commit/3343accdcf450ad1fa4f48f7d624b051c2113170). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14956: [SPARK-17389] [ML] [MLLIB] KMeans speedup with better ch...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14956 **[Test build #64924 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64924/consoleFull)** for PR 14956 at commit [`3343acc`](https://github.com/apache/spark/commit/3343accdcf450ad1fa4f48f7d624b051c2113170).
[GitHub] spark issue #14951: [SPARK-17391] [TEST] [2.0] Fix Two Test Failures After B...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14951 Yeah, it is a little bit painful when we backport the PRs. When I backported `CREATE TABLE LIKE`, I found I had already forgotten how `CREATE TABLE` works in Spark 2.0. : )
[GitHub] spark pull request #14951: [SPARK-17391] [TEST] [2.0] Fix Two Test Failures ...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14951#discussion_r77454002

    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveExplainSuite.scala ---
    @@ -77,7 +77,7 @@ class HiveExplainSuite extends QueryTest with SQLTestUtils with TestHiveSingleto
           "src")
       }

    -  test("SPARK-6212: The EXPLAIN output of CTAS only shows the analyzed plan") {
    +  test("SPARK-17230: The EXPLAIN output of CTAS only shows the analyzed plan") {
    --- End diff --

    When we backported this PR, https://github.com/apache/spark/pull/14797, it broke the existing test case. The test case does not work correctly in the master branch. If we do not want to optimize the query of CTAS, we should see `SubqueryAlias`. Thus, the test case did not fail in the master branch.
[GitHub] spark issue #14954: [SPARK-17393] [SQL] Error Handling when CTAS Against the...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14954 cc @cloud-fan @yhuai
[GitHub] spark issue #14859: [SPARK-17200][PROJECT INFRA][BUILD][SPARKR] Automate bui...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14859 Merged build finished. Test PASSed.
[GitHub] spark issue #14859: [SPARK-17200][PROJECT INFRA][BUILD][SPARKR] Automate bui...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14859 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64922/ Test PASSed.
[GitHub] spark issue #14859: [SPARK-17200][PROJECT INFRA][BUILD][SPARKR] Automate bui...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14859 **[Test build #64922 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64922/consoleFull)** for PR 14859 at commit [`ccf176d`](https://github.com/apache/spark/commit/ccf176da8b6b8742e18abdf8058ec0d866956de5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14955: [SPARK-17394][SQL] should not allow specify database in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14955 Merged build finished. Test PASSed.
[GitHub] spark issue #14955: [SPARK-17394][SQL] should not allow specify database in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14955 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64920/ Test PASSed.
[GitHub] spark issue #14955: [SPARK-17394][SQL] should not allow specify database in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14955 **[Test build #64920 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64920/consoleFull)** for PR 14955 at commit [`5c8b892`](https://github.com/apache/spark/commit/5c8b8922d8b96cc406f9b462c7368095c2167e2a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14788: [SPARK-17174][SQL] Add the support for TimestampType for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14788 Merged build finished. Test FAILed.
[GitHub] spark issue #14788: [SPARK-17174][SQL] Add the support for TimestampType for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14788 **[Test build #64921 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64921/consoleFull)** for PR 14788 at commit [`0040f35`](https://github.com/apache/spark/commit/0040f354ec0b34bc36bed190322df03e5baac453). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14788: [SPARK-17174][SQL] Add the support for TimestampType for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14788 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64921/ Test FAILed.
[GitHub] spark pull request #14953: [Minor] [ML] [MLlib] Remove work around for breez...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14953
[GitHub] spark issue #14953: [Minor] [ML] [MLlib] Remove work around for breeze spars...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/14953 Merged into master. Thanks for review.
[GitHub] spark issue #14937: [WIP] [SPARK-8519][SPARK-11560] [ML] [MLlib] Optimize KM...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/14937 @srowen Thanks for your suggestion, I will update it soon.
[GitHub] spark issue #14956: [SPARK-17389] [ML] [MLLIB] KMeans speedup with better ch...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14956 **[Test build #64923 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64923/consoleFull)** for PR 14956 at commit [`014f26a`](https://github.com/apache/spark/commit/014f26ad71f04caa9049f49a39d4dd169290b6df). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14956: [SPARK-17389] [ML] [MLLIB] KMeans speedup with better ch...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14956 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64923/ Test FAILed.
[GitHub] spark issue #14956: [SPARK-17389] [ML] [MLLIB] KMeans speedup with better ch...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14956 Merged build finished. Test FAILed.
[GitHub] spark issue #14956: [SPARK-17389] [ML] [MLLIB] KMeans speedup with better ch...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14956 **[Test build #64923 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64923/consoleFull)** for PR 14956 at commit [`014f26a`](https://github.com/apache/spark/commit/014f26ad71f04caa9049f49a39d4dd169290b6df).
[GitHub] spark pull request #14956: [SPARK-17389] [ML] [MLLIB] KMeans speedup with be...
GitHub user srowen opened a pull request:

    https://github.com/apache/spark/pull/14956

    [SPARK-17389] [ML] [MLLIB] KMeans speedup with better choice of k-means|| init steps = 2

    ## What changes were proposed in this pull request?

    Reduce default k-means|| init steps to 2 from 5. See JIRA for discussion.

    ## How was this patch tested?

    Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/srowen/spark SPARK-17389.2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14956.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14956

commit 014f26ad71f04caa9049f49a39d4dd169290b6df
Author: Sean Owen
Date:   2016-09-04T11:49:58Z

    Reduce default k-means|| init steps to 2 from 5 (see SPARK-17389 for discussion)
[GitHub] spark pull request #14826: [SPARK-17311] [MLLIB] Standardize Python-Java MLl...
Github user srowen closed the pull request at: https://github.com/apache/spark/pull/14826
[GitHub] spark pull request #14873: [SPARK-17308]Improved the spark core code by repl...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14873
[GitHub] spark issue #14873: [SPARK-17308]Improved the spark core code by replacing a...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14873 I'm going to merge this for reasons above. I think it's more than a cosmetic change, but barely.
[GitHub] spark issue #14953: [Minor] [ML] [MLlib] Remove work around for breeze spars...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14953 Looks good for master (only)
[GitHub] spark issue #14859: [SPARK-17200][PROJECT INFRA][BUILD][SPARKR] Automate bui...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14859 **[Test build #64922 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64922/consoleFull)** for PR 14859 at commit [`ccf176d`](https://github.com/apache/spark/commit/ccf176da8b6b8742e18abdf8058ec0d866956de5).
[GitHub] spark issue #14937: [WIP] [SPARK-8519][SPARK-11560] [ML] [MLlib] Optimize KM...
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/14937

    @yanboliang here are a few other changes I made in my PR that accidentally duplicated some of this work. Refer to https://github.com/apache/spark/pull/14948 for details. For your consideration:

    I think getRuns/setRuns should be formally deprecated and the runs param to the constructor removed (it's private). There are some mentions of 'runs' in the docs that should be removed too at this point.

    mergeContribs and the "type WeightedPoint" don't really serve a purpose at this point and can be 'inlined' IMHO.

    Minor: the "contribs.iterator" can really be an iterator only over triples with non-zero counts, which eliminates the filtering by 0 counts.

    The "run finished" log message is obsolete now.

    Minor, but in k-means|| the sample of 1 element is very slightly better if it's without replacement. Won't matter much but otherwise you might sample a couple elements.

    pointsWithCosts.flatMap might be a little faster as filter + map instead because virtually every element is filtered out.

    mergeNewCenters() is pretty superfluous, because it's simpler to compute newCenters, then add it to centers, in the same loop. No clear() or multiple calls to update this.

    weightMap can be computed with countByValue directly.
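Two of the suggestions above (filter + map in place of flatMap, and counting values directly for the weight map) can be sketched as follows. This is only an illustration under stated assumptions: plain Scala collections stand in for the RDDs, the data and the `threshold` are made up, and `groupBy` is swapped in where the comment proposes the RDD method `countByValue`.

```scala
// Hypothetical stand-in for the KMeans internals; only the shape of the
// transformations is meant to match the review comment.
object FilterMapSketch {
  // (point id, cost) pairs; in the real sampling step almost all are rejected.
  val pointsWithCosts: Seq[(Int, Double)] = Seq((0, 0.1), (1, 9.0), (2, 0.2), (3, 7.5))
  val threshold = 5.0 // made-up acceptance rule for the example

  // flatMap form: every element goes through an Option, including the dropped ones.
  val viaFlatMap: Seq[Int] =
    pointsWithCosts.flatMap { case (id, cost) => if (cost > threshold) Some(id) else None }

  // filter + map form: rejected elements are discarded by the predicate before mapping.
  val viaFilterMap: Seq[Int] =
    pointsWithCosts.filter { case (_, cost) => cost > threshold }.map { case (id, _) => id }

  // Weight of each candidate center = number of points whose best center it is.
  val bestCenters: Seq[Int] = Seq(0, 1, 1, 0, 1)
  val weightMap: Map[Int, Long] =
    bestCenters.groupBy(identity).map { case (center, hits) => center -> hits.size.toLong }
}
```

Both forms select the same points, so the choice between them is purely about per-element overhead when nearly everything is filtered out.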
[GitHub] spark issue #14788: [SPARK-17174][SQL] Add the support for TimestampType for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14788 **[Test build #64921 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64921/consoleFull)** for PR 14788 at commit [`0040f35`](https://github.com/apache/spark/commit/0040f354ec0b34bc36bed190322df03e5baac453).
[GitHub] spark issue #14955: [SPARK-17394][SQL] should not allow specify database in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14955 **[Test build #64920 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64920/consoleFull)** for PR 14955 at commit [`5c8b892`](https://github.com/apache/spark/commit/5c8b8922d8b96cc406f9b462c7368095c2167e2a).
[GitHub] spark issue #14948: [SPARK-17389] [SPARK-3261] [MLLIB] Significant KMeans sp...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14948 @yanboliang I'm going to close this PR and instead 'port' a few changes I made in this version for your consideration. Yours should be the primary PR for removing `runs` I think. I'll break out the two other changes in separate PRs. You're right that this is the question regarding init steps -- more steps could make for a better clustering, which could indirectly mean a faster convergence too. Maybe you can check my work. I think the default of 5 was taken from Table 6 in http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf but it's not saying 5 is necessarily an optimal value. In fact Figure 5.2/5.3 imply that (for l/k=2 as we've chosen here) there's virtually no improvement for more than 2 init steps. Coupled with the fact that an init step now takes about 5x longer than a single iteration, it seems like 5 is pretty expensive as a default too.
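The cost argument above can be made concrete with a little arithmetic. This is a rough sketch, not a measurement: the 5x ratio and l/k = 2 are the figures quoted in the comment, `InitCostSketch` and its members are hypothetical names, and the candidate count assumes the usual k-means|| behaviour of sampling about l points per step.

```scala
// Back-of-the-envelope model of k-means|| initialization cost.
object InitCostSketch {
  // From the comment: one init step costs about 5x a single iteration.
  val initStepCostInIterations = 5.0
  // From the comment: oversampling factor l = 2k.
  val oversamplingRatio = 2

  // Initialization cost, expressed in "equivalent iterations".
  def initCost(steps: Int): Double = steps * initStepCostInIterations

  // Rough expected number of candidate centers gathered over all steps
  // (assumption: about l = 2k points are sampled per step).
  def expectedCandidates(k: Int, steps: Int): Int = steps * oversamplingRatio * k
}
```

Under these assumptions the old default of 5 steps costs about as much as 25 iterations, while 2 steps costs about 10.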
[GitHub] spark pull request #14948: [SPARK-17389] [SPARK-3261] [MLLIB] Significant KM...
Github user srowen closed the pull request at: https://github.com/apache/spark/pull/14948
[GitHub] spark issue #14859: [SPARK-17200][PROJECT INFRA][BUILD][SPARKR] Automate bui...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14859 **[Test build #64916 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64916/consoleFull)** for PR 14859 at commit [`ceef9bf`](https://github.com/apache/spark/commit/ceef9bf03b286910b5b96aa6129f537c8c7cae54). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14788: [SPARK-17174][SQL] Add the support for TimestampType for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14788 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64918/ Test FAILed.
[GitHub] spark issue #14788: [SPARK-17174][SQL] Add the support for TimestampType for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14788 Merged build finished. Test FAILed.
[GitHub] spark issue #14788: [SPARK-17174][SQL] Add the support for TimestampType for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14788 **[Test build #64918 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64918/consoleFull)** for PR 14788 at commit [`03bb6a1`](https://github.com/apache/spark/commit/03bb6a1cf8984b97aa68f0cf4bfbf118fd75eeab). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14859: [SPARK-17200][PROJECT INFRA][BUILD][SPARKR] Automate bui...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14859 Merged build finished. Test PASSed.
[GitHub] spark issue #14859: [SPARK-17200][PROJECT INFRA][BUILD][SPARKR] Automate bui...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14859 Merged build finished. Test PASSed.
[GitHub] spark issue #14859: [SPARK-17200][PROJECT INFRA][BUILD][SPARKR] Automate bui...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14859 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64916/ Test PASSed.
[GitHub] spark issue #14859: [SPARK-17200][PROJECT INFRA][BUILD][SPARKR] Automate bui...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14859 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64915/ Test PASSed.
[GitHub] spark issue #14859: [SPARK-17200][PROJECT INFRA][BUILD][SPARKR] Automate bui...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14859 **[Test build #64915 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64915/consoleFull)** for PR 14859 at commit [`e6793a7`](https://github.com/apache/spark/commit/e6793a74a7a931c1f84d3ba8b96076dd27ba0021). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14954: [SPARK-17393] [SQL] Error Handling when CTAS Against the...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14954 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64914/ Test PASSed.
[GitHub] spark issue #14954: [SPARK-17393] [SQL] Error Handling when CTAS Against the...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14954 Merged build finished. Test PASSed.
[GitHub] spark issue #14954: [SPARK-17393] [SQL] Error Handling when CTAS Against the...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14954 **[Test build #64914 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64914/consoleFull)** for PR 14954 at commit [`85e51dc`](https://github.com/apache/spark/commit/85e51dc6361b3607fafc95021d93f791f5947080). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14953: [Minor] [ML] [MLlib] Remove work around for breeze spars...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/14953 cc @srowen
[GitHub] spark pull request #14948: [SPARK-17389] [SPARK-3261] [MLLIB] Significant KM...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/14948#discussion_r77449026
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala ---
@@ -258,149 +252,107 @@ class KMeans private (
       }
     }
     val initTimeInSeconds = (System.nanoTime() - initStartTime) / 1e9
-    logInfo(s"Initialization with $initializationMode took " + "%.3f".format(initTimeInSeconds) +
-      " seconds.")
+    logInfo(f"Initialization with $initializationMode took $initTimeInSeconds%.3f seconds.")
-    val active = Array.fill(numRuns)(true)
-    val costs = Array.fill(numRuns)(0.0)
-
-    var activeRuns = new ArrayBuffer[Int] ++ (0 until numRuns)
+    var active = true
+    var cost = 0.0
     var iteration = 0
     val iterationStartTime = System.nanoTime()
-    instr.foreach(_.logNumFeatures(centers(0)(0).vector.size))
-
-    // Execute iterations of Lloyd's algorithm until all runs have converged
-    while (iteration < maxIterations && !activeRuns.isEmpty) {
-      type WeightedPoint = (Vector, Long)
-      def mergeContribs(x: WeightedPoint, y: WeightedPoint): WeightedPoint = {
-        axpy(1.0, x._1, y._1)
-        (y._1, x._2 + y._2)
-      }
-
-      val activeCenters = activeRuns.map(r => centers(r)).toArray
-      val costAccums = activeRuns.map(_ => sc.doubleAccumulator)
+    instr.foreach(_.logNumFeatures(centers.head.vector.size))
-      val bcActiveCenters = sc.broadcast(activeCenters)
+    // Execute iterations of Lloyd's algorithm until converged
+    while (iteration < maxIterations && active) {
+      val costAccum = sc.doubleAccumulator
+      val bcCenters = sc.broadcast(centers)
       // Find the sum and count of points mapping to each center
       val totalContribs = data.mapPartitions { points =>
-        val thisActiveCenters = bcActiveCenters.value
-        val runs = thisActiveCenters.length
-        val k = thisActiveCenters(0).length
-        val dims = thisActiveCenters(0)(0).vector.size
+        val thisCenters = bcCenters.value
+        val dims = thisCenters.head.vector.size
-        val sums = Array.fill(runs, k)(Vectors.zeros(dims))
-        val counts = Array.fill(runs, k)(0L)
+        val sums = Array.fill(thisCenters.length)(Vectors.zeros(dims))
+        val counts = Array.fill(thisCenters.length)(0L)
         points.foreach { point =>
-          (0 until runs).foreach { i =>
-            val (bestCenter, cost) = KMeans.findClosest(thisActiveCenters(i), point)
-            costAccums(i).add(cost)
-            val sum = sums(i)(bestCenter)
-            axpy(1.0, point.vector, sum)
-            counts(i)(bestCenter) += 1
-          }
+          val (bestCenter, cost) = KMeans.findClosest(thisCenters, point)
+          costAccum.add(cost)
+          val sum = sums(bestCenter)
+          axpy(1.0, point.vector, sum)
+          counts(bestCenter) += 1
         }
-        val contribs = for (i <- 0 until runs; j <- 0 until k) yield {
-          ((i, j), (sums(i)(j), counts(i)(j)))
-        }
-        contribs.iterator
-      }.reduceByKey(mergeContribs).collectAsMap()
-
-      bcActiveCenters.destroy(blocking = false)
-
-      // Update the cluster centers and costs for each active run
-      for ((run, i) <- activeRuns.zipWithIndex) {
-        var changed = false
-        var j = 0
-        while (j < k) {
-          val (sum, count) = totalContribs((i, j))
-          if (count != 0) {
-            scal(1.0 / count, sum)
-            val newCenter = new VectorWithNorm(sum)
-            if (KMeans.fastSquaredDistance(newCenter, centers(run)(j)) > epsilon * epsilon) {
-              changed = true
-            }
-            centers(run)(j) = newCenter
-          }
-          j += 1
+        counts.indices.filter(counts(_) > 0).map(j => (j, (sums(j), counts(j)))).iterator
+      }.reduceByKey { case ((sum1, count1), (sum2, count2)) =>
+        axpy(1.0, sum2, sum1)
+        (sum1, count1 + count2)
+      }.collectAsMap()
+
+      bcCenters.destroy(blocking = false)
+
+      // Update the cluster centers and costs
+      active = false
+      totalContribs.foreach { case (j, (sum, count)) =>
+        scal(1.0 / count, sum)
+        val newCenter = new VectorWithNorm(sum)
+        if (!active && KMeans.fastSquaredDistance(newCenter, centers(j)) > epsilon * epsilon) {
+          active = true
         }
-        if (!changed) {
-          active(run) = false
-          logInfo("Run " + run + " finished in " + (iteration + 1) + " iterations")
-        }
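For readers who want to follow the single-run logic on the `+` side of this diff without a Spark cluster, here is a minimal plain-Scala sketch of one Lloyd iteration. It is an illustration under stated assumptions, not the Spark implementation: plain `Array[Double]` stands in for `VectorWithNorm`, a local loop replaces `mapPartitions`/`reduceByKey`, and the names `LloydSketch` and `lloydStep` are hypothetical. Because the quoted diff is cut off, assigning the new center back is assumed to match the removed code.

```scala
// Plain-Scala stand-in for one single-run Lloyd iteration; not Spark code.
object LloydSketch {
  type Point = Array[Double]

  def squaredDistance(a: Point, b: Point): Double = {
    var s = 0.0
    var i = 0
    while (i < a.length) { val d = a(i) - b(i); s += d * d; i += 1 }
    s
  }

  /** Index of the closest center and the squared distance to it. */
  def findClosest(centers: Array[Point], p: Point): (Int, Double) = {
    var best = 0
    var bestDist = Double.PositiveInfinity
    var j = 0
    while (j < centers.length) {
      val d = squaredDistance(centers(j), p)
      if (d < bestDist) { bestDist = d; best = j }
      j += 1
    }
    (best, bestDist)
  }

  /** Returns (new centers, total cost, whether any center moved by more than epsilon). */
  def lloydStep(points: Seq[Point], centers: Array[Point],
                epsilon: Double): (Array[Point], Double, Boolean) = {
    val dims = centers.head.length
    val sums = Array.fill(centers.length)(new Array[Double](dims))
    val counts = Array.fill(centers.length)(0L)
    var cost = 0.0
    // Accumulate per-center coordinate sums and point counts.
    points.foreach { p =>
      val (best, d) = findClosest(centers, p)
      cost += d
      var i = 0
      while (i < dims) { sums(best)(i) += p(i); i += 1 }
      counts(best) += 1
    }
    val newCenters = centers.map(_.clone())
    var active = false
    // Only centers that received at least one point are updated.
    counts.indices.filter(counts(_) > 0).foreach { j =>
      val newCenter = sums(j).map(_ / counts(j))
      if (squaredDistance(newCenter, centers(j)) > epsilon * epsilon) active = true
      newCenters(j) = newCenter
    }
    (newCenters, cost, active)
  }
}
```

Calling `lloydStep` repeatedly until the returned flag is false (or an iteration cap is hit) mirrors the `while (iteration < maxIterations && active)` loop in the diff.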
[GitHub] spark issue #14859: [SPARK-17200][PROJECT INFRA][BUILD][SPARKR] Automate bui...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14859 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64913/ Test PASSed.
[GitHub] spark issue #14859: [SPARK-17200][PROJECT INFRA][BUILD][SPARKR] Automate bui...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14859 Merged build finished. Test PASSed.
[GitHub] spark issue #14859: [SPARK-17200][PROJECT INFRA][BUILD][SPARKR] Automate bui...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14859 **[Test build #64913 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64913/consoleFull)** for PR 14859 at commit [`43a5a44`](https://github.com/apache/spark/commit/43a5a44237d70261369e0d6d26ef12cbde604a3d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14948: [SPARK-17389] [SPARK-3261] [MLLIB] Significant KMeans sp...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/14948 Yep, these changes mostly do not overlap with #14937, and where they do overlap they are actually the same changes. Let's focus this PR on optimizing the initialization step and on fixing its tendency to return duplicate centroids. As for reducing the default number of k-means|| initialization steps to 2, I wonder about the impact on the total number of training iterations. It will definitely reduce the initialization time, but will it introduce more training iterations because the initial centers are not good enough? If it does not introduce extra iterations in most cases, I think it's OK. Otherwise we should trade off initialization against training iterations. Thanks!
[GitHub] spark issue #14955: [SPARK-17394][SQL] should not allow specify database in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14955 Merged build finished. Test FAILed.
[GitHub] spark issue #14955: [SPARK-17394][SQL] should not allow specify database in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14955 **[Test build #64919 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64919/consoleFull)** for PR 14955 at commit [`43efac9`](https://github.com/apache/spark/commit/43efac98b93dc32a20296825862c57099819ca36). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14955: [SPARK-17394][SQL] should not allow specify database in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14955 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64919/ Test FAILed.
[GitHub] spark issue #14955: [SPARK-17394][SQL] should not allow specify database in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14955 **[Test build #64919 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64919/consoleFull)** for PR 14955 at commit [`43efac9`](https://github.com/apache/spark/commit/43efac98b93dc32a20296825862c57099819ca36).
[GitHub] spark issue #14955: [SPARK-17394][SQL] should not allow specify database in ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14955 cc @yhuai @gatorsmile
[GitHub] spark pull request #14955: [SPARK-17394][SQL] should not allow specify datab...
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/14955
[SPARK-17394][SQL] should not allow specify database in table/view name after RENAME TO
## What changes were proposed in this pull request?
It's really weird that we allow users to specify a database in both the from-table name and the to-table name in `ALTER TABLE RENAME TO`, as logically we can't support renaming a table to a different database. Both Postgres and MySQL disallow this syntax, so it's reasonable to follow them and simplify our code.
## How was this patch tested?
New test in `DDLCommandSuite`.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/cloud-fan/spark rename
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14955.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14955
commit 43efac98b93dc32a20296825862c57099819ca36
Author: Wenchen Fan
Date: 2016-09-04T08:29:09Z
should not allow specify database in table/view name after RENAME TO
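The rule proposed in this pull request can be illustrated with a small standalone sketch. This is not the patch itself and does not use Spark's parser classes: `TableName` and `checkRenameTarget` are hypothetical names, and keeping the renamed table in the source table's database is an assumption drawn from the description's point that a rename cannot move a table to a different database.

```scala
// Hypothetical stand-in for a parsed table name; not Spark's own class.
final case class TableName(table: String, database: Option[String] = None)

object RenameCheck {
  /** Accepts an unqualified target name; rejects a target that carries a database qualifier. */
  def checkRenameTarget(from: TableName, to: TableName): Either[String, TableName] =
    to.database match {
      case Some(db) =>
        Left(s"RENAME TO target must not specify a database, got `$db`.`${to.table}`")
      case None =>
        // Assumption: the renamed table stays in the source table's database.
        Right(to.copy(database = from.database))
    }
}
```

With this check, a statement like `ALTER TABLE db1.t1 RENAME TO t2` would be accepted, while one whose target is written as `db2.t2` would be rejected up front instead of being handled later in the rename logic.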
[GitHub] spark issue #14788: [SPARK-17174][SQL] Add the support for TimestampType for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14788 **[Test build #64918 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64918/consoleFull)** for PR 14788 at commit [`03bb6a1`](https://github.com/apache/spark/commit/03bb6a1cf8984b97aa68f0cf4bfbf118fd75eeab).
[GitHub] spark pull request #14894: [SPARK-17330] [SPARK UT] Clean up spark-warehouse...
Github user tone-zhang commented on a diff in the pull request: https://github.com/apache/spark/pull/14894#discussion_r77448364
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala ---
@@ -109,6 +109,8 @@ class HiveSparkSubmitSuite
   }
   test("SPARK-8368: includes jars passed in through --jars") {
+    val warehousePath = System.getProperty("user.dir") + "/../../spark-warehouse"
--- End diff --
@srowen Thanks a lot for your help! Yes, it is better to clean up the temporary path at the source. I will have a check and update the PR. Thanks!
[GitHub] spark issue #14788: [SPARK-17174][SQL] Add the support for TimestampType for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14788 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64917/ Test FAILed.
[GitHub] spark issue #14788: [SPARK-17174][SQL] Add the support for TimestampType for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14788 Merged build finished. Test FAILed.
[GitHub] spark issue #14951: [SPARK-17391] [TEST] [2.0] Fix Two Test Failures After B...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14951 Thanks for fixing it! Sorry, this is my bad. I should be more careful when backporting DDL-related bug fixes to 2.0, as the code in master and 2.0 differs a lot now.
[GitHub] spark issue #14859: [WIP][SPARK-17200][PROJECT INFRA][BUILD][SPARKR] Automat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14859 **[Test build #64916 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64916/consoleFull)** for PR 14859 at commit [`ceef9bf`](https://github.com/apache/spark/commit/ceef9bf03b286910b5b96aa6129f537c8c7cae54).
[GitHub] spark pull request #14951: [SPARK-17391] [TEST] [2.0] Fix Two Test Failures ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14951#discussion_r77447824
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveExplainSuite.scala ---
@@ -77,7 +77,7 @@ class HiveExplainSuite extends QueryTest with SQLTestUtils with TestHiveSingleto
       "src")
   }
-  test("SPARK-6212: The EXPLAIN output of CTAS only shows the analyzed plan") {
+  test("SPARK-17230: The EXPLAIN output of CTAS only shows the analyzed plan") {
--- End diff --
how did we break this test?