[GitHub] spark issue #14867: [SPARK-17296][SQL] Simplify parser join processing.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14867 **[Test build #64880 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64880/consoleFull)** for PR 14867 at commit [`3b13cd7`](https://github.com/apache/spark/commit/3b13cd7531e3f6f8e27c9cd231f8f9ea77c8fa39). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14517: [SPARK-16931][PYTHON] PySpark APIS for bucketBy a...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/14517#discussion_r77420246 --- Diff: python/pyspark/sql/readwriter.py --- @@ -747,16 +800,25 @@ def _test(): except py4j.protocol.Py4JError: spark = SparkSession(sc) +seed = int(time() * 1000) --- End diff -- It's better to have a deterministic test; testing with parquet should be enough.
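To illustrate the point about deterministic tests: the diff seeds from the wall clock (`int(time() * 1000)`), so each run sees different data. A minimal sketch of the alternative follows, assuming a fixed constant seed and a private generator. `FIXED_SEED` and `sample_bucket_ids` are hypothetical names for illustration and are not part of the PR.

```python
import random

# Hypothetical constant; any fixed value works, as long as it never changes between runs.
FIXED_SEED = 42


def sample_bucket_ids(seed, num_rows=5, num_buckets=4):
    """Draw pseudo-random bucket ids from a private generator seeded with `seed`."""
    rng = random.Random(seed)  # private generator, does not touch global random state
    return [rng.randrange(num_buckets) for _ in range(num_rows)]


# Two runs with the same seed produce the same data, so a failure can be reproduced.
first_run = sample_bucket_ids(FIXED_SEED)
second_run = sample_bucket_ids(FIXED_SEED)
```

With a time-based seed the two calls could differ, which makes an intermittent test failure hard to reproduce.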
[GitHub] spark issue #14866: [SPARK-17298][SQL] Require explicit CROSS join for carte...
Github user davies commented on the issue: https://github.com/apache/spark/pull/14866 LGTM
[GitHub] spark pull request #14866: [SPARK-17298][SQL] Require explicit CROSS join fo...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14866
[GitHub] spark issue #14866: [SPARK-17298][SQL] Require explicit CROSS join for carte...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/14866 Merging to master. Thanks!
[GitHub] spark pull request #14690: [SPARK-16980][SQL] Load only catalog table partit...
Github user ericl commented on a diff in the pull request: https://github.com/apache/spark/pull/14690#discussion_r77419498 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/TableFileCatalog.scala --- @@ -0,0 +1,102 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources + +import org.apache.hadoop.fs.Path + +import org.apache.spark.sql.SparkSession +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.catalog.CatalogTablePartition +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.types.{StructField, StructType} + + +/** + * A [[BasicFileCatalog]] for a metastore catalog table. 
+ * + * @param sparkSession a [[SparkSession]] + * @param db the table's database name + * @param table the table's (unqualified) name + * @param partitionSchema the schema of a partitioned table's partition columns + * @param sizeInBytes the table's data size in bytes + */ +class TableFileCatalog( +sparkSession: SparkSession, +db: String, +table: String, +partitionSchema: Option[StructType], +override val sizeInBytes: Long) + extends SessionFileCatalog(sparkSession) { + + override protected val hadoopConf = sparkSession.sessionState.newHadoopConf + + private val externalCatalog = sparkSession.sharedState.externalCatalog + + private val catalogTable = externalCatalog.getTable(db, table) + + private val baseLocation = catalogTable.storage.locationUri + + override def rootPaths: Seq[Path] = baseLocation.map(new Path(_)).toSeq + + override def listFiles(filters: Seq[Expression]): Seq[Partition] = partitionSchema match { +case Some(partitionSchema) => + externalCatalog.listPartitionsByFilter(db, table, filters).flatMap { +case CatalogTablePartition(spec, storage, _) => + storage.locationUri.map(new Path(_)).map { path => +val files = listDataLeafFiles(path :: Nil).toSeq +val values = + InternalRow.fromSeq(partitionSchema.map { case StructField(name, dataType, _, _) => +Cast(Literal(spec(name)), dataType).eval() + }) +Partition(values, files) + } + } +case None => + Partition(InternalRow.empty, listDataLeafFiles(rootPaths).toSeq) :: Nil + } + + override def refresh(): Unit = {} + + + /** + * Returns a [[ListingFileCatalog]] for this table restricted to the subset of partitions + * specified by the given partition-pruning filters. + * + * @param filters partition-pruning filters + */ + def filterPartitions(filters: Seq[Expression]): ListingFileCatalog = { --- End diff -- It seems a little weird to have catalogs that refer to a pruned table. We should try to do this at execution time instead, so that planning does not block behind pruning. 
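The suggestion above, pruning at execution time so that planning does not block, can be sketched in a language-neutral way. This is only an illustration under stated assumptions, not Spark's implementation: `FakeMetastore`, `LazyPrunedScan`, `explain` and `execute` are hypothetical names, and partitions are modelled as plain dictionaries.

```python
from typing import Callable, Dict, List, Optional

Partition = Dict[str, str]
Filter = Callable[[Partition], bool]


class FakeMetastore:
    """Hypothetical stand-in for an external catalog; counts listing calls."""

    def __init__(self, partitions: List[Partition]) -> None:
        self._partitions = partitions
        self.calls = 0

    def list_partitions(self) -> List[Partition]:
        self.calls += 1
        return list(self._partitions)


class LazyPrunedScan:
    """Keeps the filters at planning time and lists partitions only on execute()."""

    def __init__(self, metastore: FakeMetastore, filters: List[Filter]) -> None:
        self._metastore = metastore
        self._filters = filters
        self._pruned: Optional[List[Partition]] = None

    def explain(self) -> str:
        # Planning-time description; it must not trigger a partition listing.
        return f"LazyPrunedScan(pending_filters={len(self._filters)})"

    def execute(self) -> List[Partition]:
        # The listing and pruning happen here, once, and the result is cached.
        if self._pruned is None:
            self._pruned = [
                p for p in self._metastore.list_partitions()
                if all(f(p) for f in self._filters)
            ]
        return self._pruned


metastore = FakeMetastore(
    [{"ds": "2016-09-01"}, {"ds": "2016-09-02"}, {"ds": "2016-09-03"}]
)
scan = LazyPrunedScan(metastore, [lambda p: p["ds"] >= "2016-09-02"])
plan_text = scan.explain()
calls_after_planning = metastore.calls  # still zero: planning did not list anything
pruned = scan.execute()
```

The design choice is that building and describing the plan never touches the metastore; only `execute()` does, and repeated executions reuse the cached result.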
[GitHub] spark pull request #14690: [SPARK-16980][SQL] Load only catalog table partit...
Github user ericl commented on a diff in the pull request: https://github.com/apache/spark/pull/14690#discussion_r77419464 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala --- @@ -79,8 +79,16 @@ object FileSourceStrategy extends Strategy with Logging { ExpressionSet(normalizedFilters.filter(_.references.subsetOf(partitionSet))) logInfo(s"Pruning directories with: ${partitionKeyFilters.mkString(",")}") + val prunedFsRelation = fsRelation.location match { --- End diff -- Can we push this pruning into the scan (i.e. do it when computing `inputRDD` in `FileSourceScanExec`)?
[GitHub] spark pull request #14690: [SPARK-16980][SQL] Load only catalog table partit...
Github user ericl commented on a diff in the pull request: https://github.com/apache/spark/pull/14690#discussion_r77419510 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/fileSourceInterfaces.scala --- @@ -346,11 +340,30 @@ trait FileCatalog { */ def listFiles(filters: Seq[Expression]): Seq[Partition] + /** Refresh any cached file listings */ + def refresh(): Unit + + /** Sum of table file sizes, in bytes */ + def sizeInBytes: Long +} + +/** + * A [[BasicFileCatalog]] which can enumerate all of the files comprising a relation and, from + * those, infer the relation's partition specification. + */ +trait FileCatalog extends BasicFileCatalog { --- End diff -- What's the motivation behind splitting FileCatalog and BasicFileCatalog? Is it to prevent accidental calls to allFiles()?
[GitHub] spark pull request #14690: [SPARK-16980][SQL] Load only catalog table partit...
Github user ericl commented on a diff in the pull request: https://github.com/apache/spark/pull/14690#discussion_r77419448 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -184,7 +184,7 @@ case class FileSourceScanExec( "Batched" -> supportsBatch.toString, "PartitionFilters" -> partitionFilters.mkString("[", ", ", "]"), "PushedFilters" -> dataFilters.mkString("[", ", ", "]"), -"InputPaths" -> relation.location.paths.mkString(", ")) +"RootPaths" -> relation.location.rootPaths.mkString(", ")) --- End diff -- Btw, it would be nice to make sure the physical plan still has a good debug string when you call explain (i.e. tells which catalog it's using) since that will greatly impact performance in this case.
[GitHub] spark pull request #14690: [SPARK-16980][SQL] Load only catalog table partit...
Github user ericl commented on a diff in the pull request: https://github.com/apache/spark/pull/14690#discussion_r77419496 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/TableFileCatalog.scala --- @@ -0,0 +1,102 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources + +import org.apache.hadoop.fs.Path + +import org.apache.spark.sql.SparkSession +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.catalog.CatalogTablePartition +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.types.{StructField, StructType} + + +/** + * A [[BasicFileCatalog]] for a metastore catalog table. 
+ * + * @param sparkSession a [[SparkSession]] + * @param db the table's database name + * @param table the table's (unqualified) name + * @param partitionSchema the schema of a partitioned table's partition columns + * @param sizeInBytes the table's data size in bytes + */ +class TableFileCatalog( +sparkSession: SparkSession, +db: String, +table: String, +partitionSchema: Option[StructType], +override val sizeInBytes: Long) + extends SessionFileCatalog(sparkSession) { + + override protected val hadoopConf = sparkSession.sessionState.newHadoopConf + + private val externalCatalog = sparkSession.sharedState.externalCatalog --- End diff -- Can we make this an explicit constructor parameter?
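As a rough illustration of the request to make the catalog an explicit constructor parameter: the class receives its collaborator instead of reaching into session state for it, which also lets a test pass in a stub. The sketch below is an assumption-laden analogue, not the Spark API; `InMemoryCatalog`, `TableFileCatalogSketch`, `get_table` and `root_paths` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple


class InMemoryCatalog:
    """Hypothetical stub catalog mapping (db, table) to a storage location."""

    def __init__(self, locations: Dict[Tuple[str, str], Optional[str]]) -> None:
        self._locations = locations

    def get_table(self, db: str, table: str) -> Dict[str, Optional[str]]:
        return {"location": self._locations[(db, table)]}


class TableFileCatalogSketch:
    """Takes the external catalog as an explicit constructor argument."""

    def __init__(self, external_catalog: InMemoryCatalog, db: str, table: str) -> None:
        self._external_catalog = external_catalog  # injected, not looked up internally
        self._db = db
        self._table = table

    def root_paths(self) -> List[str]:
        # A missing location yields an empty list rather than a list holding None.
        location = self._external_catalog.get_table(self._db, self._table)["location"]
        return [location] if location is not None else []


stub = InMemoryCatalog({("default", "events"): "/warehouse/events", ("default", "tmp"): None})
events_paths = TableFileCatalogSketch(stub, "default", "events").root_paths()
tmp_paths = TableFileCatalogSketch(stub, "default", "tmp").root_paths()
```

The dependency is then visible in the constructor signature, and swapping the catalog in a test needs no session at all.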
[GitHub] spark pull request #14941: [SPARK-16334] Reusing same dictionary column for ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14941
[GitHub] spark pull request #14690: [SPARK-16980][SQL] Load only catalog table partit...
Github user ericl commented on a diff in the pull request: https://github.com/apache/spark/pull/14690#discussion_r77419455 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -2531,6 +2531,8 @@ class Dataset[T] private[sql]( */ def inputFiles: Array[String] = { val files: Seq[String] = logicalPlan.collect { + case LogicalRelation(HadoopFsRelation(_, location: FileCatalog, _, _, _, _, _), _, _) => --- End diff -- Hm, should we still have HadoopFsRelation implement FileRelation?
[GitHub] spark issue #14797: [SPARK-17230] [SQL] Should not pass optimized query into...
Github user davies commented on the issue: https://github.com/apache/spark/pull/14797 Merged this into master and 2.0 branch, thanks!
[GitHub] spark issue #14941: [SPARK-16334] Reusing same dictionary column for decodin...
Github user davies commented on the issue: https://github.com/apache/spark/pull/14941 Merging this into master and 2.0 branch, thanks!
[GitHub] spark issue #14941: [SPARK-16334] Reusing same dictionary column for decodin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14941 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64870/ Test PASSed.
[GitHub] spark issue #14941: [SPARK-16334] Reusing same dictionary column for decodin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14941 Merged build finished. Test PASSed.
[GitHub] spark pull request #14797: [SPARK-17230] [SQL] Should not pass optimized que...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14797
[GitHub] spark issue #14941: [SPARK-16334] Reusing same dictionary column for decodin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14941 **[Test build #64870 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64870/consoleFull)** for PR 14941 at commit [`efda298`](https://github.com/apache/spark/commit/efda29864506b4a9eb716652e0fcf5cd705c9b4c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14867: [SPARK-17296][SQL] Simplify parser join processin...
Github user srinathshankar commented on a diff in the pull request: https://github.com/apache/spark/pull/14867#discussion_r77418488 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala --- @@ -360,10 +360,25 @@ class PlanParserSuite extends PlanTest { test("left anti join", LeftAnti, testExistence) test("anti join", LeftAnti, testExistence) +// Test natural cross join +intercept("select * from a natural cross join b") + +// Test natural join with a condition +intercept("select * from a natural join b on a.id = b.id") + // Test multiple consecutive joins assertEqual( "select * from a join b join c right join d", table("a").join(table("b")).join(table("c")).join(table("d"), RightOuter).select(star())) + +// SPARK-17296 +assertEqual( + "select * from t1 cross join t2 join t3 on t3.id = t1.id join t4 on t4.id = t1.id", --- End diff -- To clarify, it looks like your patch will disallow both queries at the parser level. Could you add a test that enforces this?
[GitHub] spark issue #14941: [SPARK-16334] Reusing same dictionary column for decodin...
Github user heroldus commented on the issue: https://github.com/apache/spark/pull/14941 @davies Fine, thx.
[GitHub] spark issue #14638: [SPARK-11374][SQL] Support `skip.header.line.count` opti...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14638 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64872/ Test PASSed.
[GitHub] spark issue #14638: [SPARK-11374][SQL] Support `skip.header.line.count` opti...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14638 Merged build finished. Test PASSed.
[GitHub] spark issue #14638: [SPARK-11374][SQL] Support `skip.header.line.count` opti...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14638 **[Test build #64872 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64872/consoleFull)** for PR 14638 at commit [`3857e32`](https://github.com/apache/spark/commit/3857e321ac86c5e4777b508eb60999312a233e99). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14872: [SPARK-3162][MLlib][WIP] Add local tree training ...
Github user smurching closed the pull request at: https://github.com/apache/spark/pull/14872
[GitHub] spark issue #14931: [SPARK-17370] Shuffle service files not invalidated when...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/14931 The issue I see is how easy is it for the driver to know that? Adding a new flag to the `SlaveLost` class doesn't mean that you know how to set its value. I'm pretty sure, on the YARN side, that we don't know when hosts die, just that a container on that host went away. Maybe Standalone or Mesos would have that info more easily available (e.g. the `WorkerWatcher` code for Standalone).
[GitHub] spark issue #14866: [SPARK-17298][SQL] Require explicit CROSS join for carte...
Github user sameeragarwal commented on the issue: https://github.com/apache/spark/pull/14866 LGTM
[GitHub] spark issue #14942: [SparkR][Minor] Fix docs for sparkR.session and count
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14942 cc @felixcheung
[GitHub] spark issue #14866: [SPARK-17298][SQL] Require explicit CROSS join for carte...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14866 Merged build finished. Test PASSed.
[GitHub] spark issue #14866: [SPARK-17298][SQL] Require explicit CROSS join for carte...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14866 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64869/ Test PASSed.
[GitHub] spark issue #14866: [SPARK-17298][SQL] Require explicit CROSS join for carte...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14866 **[Test build #64869 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64869/consoleFull)** for PR 14866 at commit [`2509e45`](https://github.com/apache/spark/commit/2509e451326d673ba6ea9d4d9a4e3991ea73b291). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14872: [SPARK-3162][MLlib][WIP] Add local tree training for dec...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14872 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64879/ Test FAILed.
[GitHub] spark issue #14872: [SPARK-3162][MLlib][WIP] Add local tree training for dec...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14872 **[Test build #64879 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64879/consoleFull)** for PR 14872 at commit [`8d443ce`](https://github.com/apache/spark/commit/8d443ce38f958e7b83b502e614e01c824cb63c4b). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14872: [SPARK-3162][MLlib][WIP] Add local tree training for dec...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14872 Merged build finished. Test FAILed.
[GitHub] spark issue #14797: [SPARK-17230] [SQL] Should not pass optimized query into...
Github user srinathshankar commented on the issue: https://github.com/apache/spark/pull/14797 Looks fine.
[GitHub] spark issue #14872: [SPARK-3162][MLlib][WIP] Add local tree training for dec...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14872 **[Test build #64879 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64879/consoleFull)** for PR 14872 at commit [`8d443ce`](https://github.com/apache/spark/commit/8d443ce38f958e7b83b502e614e01c824cb63c4b).
[GitHub] spark pull request #14867: [SPARK-17296][SQL] Simplify parser join processin...
Github user srinathshankar commented on a diff in the pull request: https://github.com/apache/spark/pull/14867#discussion_r77417316
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala ---
@@ -360,10 +360,25 @@ class PlanParserSuite extends PlanTest {
     test("left anti join", LeftAnti, testExistence)
     test("anti join", LeftAnti, testExistence)
+    // Test natural cross join
+    intercept("select * from a natural cross join b")
+
+    // Test natural join with a condition
+    intercept("select * from a natural join b on a.id = b.id")
+
     // Test multiple consecutive joins
     assertEqual(
       "select * from a join b join c right join d",
       table("a").join(table("b")).join(table("c")).join(table("d"), RightOuter).select(star()))
+
+    // SPARK-17296
+    assertEqual(
+      "select * from t1 cross join t2 join t3 on t3.id = t1.id join t4 on t4.id = t1.id",
--- End diff --
How is something like
    SELECT * FROM T1 INNER JOIN T2 INNER JOIN T3 ON col3 = col2 ON col3 = col1;
supposed to parse? Without your change it returns the following error:
    org.apache.spark.sql.AnalysisException: cannot resolve '`col3`' given input columns: [col1, col2]; line 1 pos 63
which I don't understand. The following parses, though:
    SELECT * FROM T1 INNER JOIN T2 INNER JOIN T3 ON col1 = col2 ON col2 = col1
and returns a result.
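The "multiple consecutive joins" test in the diff above expects `a join b join c right join d` to become a left-deep plan, where each new join wraps everything parsed so far. A minimal sketch of that folding in plain Python follows. It is not Spark's parser: `Table`, `Join`, `fold_joins`, and `show` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Table:
    name: str


@dataclass(frozen=True)
class Join:
    left: "Plan"
    right: "Plan"
    join_type: str = "inner"
    condition: Optional[str] = None


Plan = Union[Table, Join]


def fold_joins(first, rest):
    """Fold (table, join_type, condition) triples onto `first`, left to right."""
    plan = first
    for table, join_type, condition in rest:
        # Each new join takes the whole plan built so far as its left side.
        plan = Join(plan, table, join_type, condition)
    return plan


def show(plan):
    """Render a plan with explicit parentheses so the nesting is visible."""
    if isinstance(plan, Table):
        return plan.name
    cond = f" on {plan.condition}" if plan.condition else ""
    return f"({show(plan.left)} {plan.join_type} join {show(plan.right)}{cond})"


plan = fold_joins(
    Table("a"),
    [(Table("b"), "inner", None), (Table("c"), "inner", None), (Table("d"), "right", None)],
)
print(show(plan))  # (((a inner join b) inner join c) right join d)
```

In this sketch every `ON` condition would attach to the join that was just folded in, so a query with two trailing `ON` clauses, like the one asked about above, needs a nested (non-left-deep) reading that the fold does not model.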
[GitHub] spark issue #14797: [SPARK-17230] [SQL] Should not pass optimized query into...
Github user sameeragarwal commented on the issue: https://github.com/apache/spark/pull/14797 LGTM
[GitHub] spark issue #14866: [SPARK-17298][SQL] Require explicit CROSS join for carte...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14866 **[Test build #3245 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3245/consoleFull)** for PR 14866 at commit [`2509e45`](https://github.com/apache/spark/commit/2509e451326d673ba6ea9d4d9a4e3991ea73b291). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14887: [SPARK-17321][YARN] YARN shuffle service should u...
Github user zhaoyunjiong closed the pull request at: https://github.com/apache/spark/pull/14887
[GitHub] spark pull request #14929: [SPARK-17374][SQL] Better error messages when par...
Github user clockfly commented on a diff in the pull request: https://github.com/apache/spark/pull/14929#discussion_r77416301
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JacksonParser.scala ---
@@ -62,8 +68,39 @@ class JacksonParser(
       throw new RuntimeException(s"Malformed line in FAILFAST mode: $record")
     }
     if (options.dropMalformed) {
-      logWarning(s"Dropping malformed line: $record")
+      if (!isWarningPrintedForMalformedRecord) {
+        logWarning(
+          s"""Found at least one malformed records (sample: $record). The JSON reader will drop
+             |all malformed records in current $DROP_MALFORMED_MODE parser mode. To find out which
+             |corrupted records have been dropped, please switch the parser mode to $PERMISSIVE_MODE
+             |mode and use the default inferred schema.
+             |
+             |Code example to print all malformed records (scala):
+             |===
+             |// The corrupted record exists in column ${columnNameOfCorruptRecord}
+             |val parsedJson = spark.read.json("/path/to/json/file/test.json")
+             |
+           """.stripMargin)
+        isWarningPrintedForMalformedRecord = true
+      }
       Nil
+    } else if (schema.getFieldIndex(columnNameOfCorruptRecord).isEmpty) {
+      if (!isWarningPrintedForMalformedRecord) {
+        logWarning(
+          s"""Found at least one malformed records (sample: $record). The JSON reader will replace
--- End diff --
It is different, although similar.
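The change in the diff above replaces a warning per malformed line with a single warning guarded by a flag. The pattern can be sketched in plain Python as below. This is an illustration only, not the `JacksonParser` code: the class name `DropMalformedReader`, its attributes, and the logger name are assumptions made for the example.

```python
import json
import logging

logger = logging.getLogger("json_reader_sketch")  # hypothetical logger name


class DropMalformedReader:
    """Drops malformed JSON lines and logs a warning only for the first one."""

    def __init__(self):
        self.warned = False   # plays the role of isWarningPrintedForMalformedRecord
        self.dropped = 0      # extra bookkeeping for the example, not in the diff

    def parse(self, lines):
        rows = []
        for record in lines:
            try:
                rows.append(json.loads(record))
            except json.JSONDecodeError:
                self.dropped += 1
                if not self.warned:
                    # Log one sample record, then stay quiet for later failures.
                    logger.warning(
                        "Found at least one malformed record (sample: %s); "
                        "all malformed records will be dropped.",
                        record,
                    )
                    self.warned = True
        return rows


reader = DropMalformedReader()
rows = reader.parse(['{"a": 1}', '{bad', '{"a": 2}', 'nope'])
print(rows, reader.dropped)
```

The flag lives on the reader instance, so each reader warns at most once regardless of how many records it drops.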
[GitHub] spark issue #11119: [SPARK-10780][ML] Add an initial model to kmeans
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9 **[Test build #64878 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64878/consoleFull)** for PR 9 at commit [`47f182b`](https://github.com/apache/spark/commit/47f182b88242dbc2fa198591de5099b5644f4076). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #11119: [SPARK-10780][ML] Add an initial model to kmeans
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/9 Merged build finished. Test FAILed.
[GitHub] spark issue #11119: [SPARK-10780][ML] Add an initial model to kmeans
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/9 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64878/ Test FAILed.
[GitHub] spark issue #11119: [SPARK-10780][ML] Add an initial model to kmeans
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9 **[Test build #64878 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64878/consoleFull)** for PR 9 at commit [`47f182b`](https://github.com/apache/spark/commit/47f182b88242dbc2fa198591de5099b5644f4076).
[GitHub] spark pull request #11119: [SPARK-10780][ML] Add an initial model to kmeans
Github user yinxusen commented on a diff in the pull request: https://github.com/apache/spark/pull/9#discussion_r77414688
--- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala ---
@@ -139,16 +145,32 @@ class KMeansSuite extends SparkFunSuite with MLlibTestSparkContext with DefaultR
     val kmeans = new KMeans()
     testEstimatorAndModelReadWrite(kmeans, dataset, KMeansSuite.allParamSettings, checkModelData)
   }
+
+  test("Initialize using given cluster centers") {
--- End diff --
I think the current test is OK for asserting the right behavior of initialModel. And it's more economical to test with only one or two iterations.
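The point about needing only one or two iterations can be illustrated with a small, Spark-free sketch: if the supplied initial centers are already the cluster means, a single Lloyd iteration leaves them unchanged, which is cheap to assert. The code below is plain Python written for this note. The function names `nearest` and `lloyd` and the toy data are assumptions, and this is not the `KMeansSuite` test from the PR.

```python
def nearest(point, centers):
    """Index of the center closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centers[k])),
    )


def lloyd(points, initial_centers, max_iter):
    """Run `max_iter` Lloyd iterations starting from the given centers."""
    centers = [tuple(c) for c in initial_centers]
    for _ in range(max_iter):
        labels = [nearest(pt, centers) for pt in points]
        new_centers = []
        for k in range(len(centers)):
            members = [pt for pt, label in zip(points, labels) if label == k]
            if members:
                # New center is the coordinate-wise mean of the assigned points.
                new_centers.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                # Keep an empty cluster's center where it was.
                new_centers.append(centers[k])
        centers = new_centers
    return centers


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
initial = [(0.0, 1.0), (10.0, 1.0)]  # already the means of the two obvious clusters
print(lloyd(points, initial, max_iter=1))
```

Starting instead from poor centers would move them within the first iteration, so a short run is enough to tell whether the initial centers were actually used.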
[GitHub] spark issue #11119: [SPARK-10780][ML] Add an initial model to kmeans
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9 **[Test build #64877 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64877/consoleFull)** for PR 9 at commit [`d4f59d9`](https://github.com/apache/spark/commit/d4f59d9b2331df89b2745ed6050634defeaee08d).
[GitHub] spark issue #14938: [SPARK-17335][SQL] Fix ArrayType and MapType CatalogStri...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14938 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64867/ Test PASSed.
[GitHub] spark issue #14938: [SPARK-17335][SQL] Fix ArrayType and MapType CatalogStri...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14938 Merged build finished. Test PASSed.
[GitHub] spark issue #14938: [SPARK-17335][SQL] Fix ArrayType and MapType CatalogStri...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14938 **[Test build #64867 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64867/consoleFull)** for PR 14938 at commit [`b57bbb6`](https://github.com/apache/spark/commit/b57bbb6704cd360427126da2e2e1ef2e8f758e93). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14941: [SPARK-16334] Reusing same dictionary column for decodin...
Github user davies commented on the issue: https://github.com/apache/spark/pull/14941 @heroldus decodeDictionaryIds() is only used when a batch spans pages with different encodings (dictionary or plain), so it's not in the hot path; I think the performance impact should be fine.
[GitHub] spark issue #14942: [SparkR][Minor] Fix docs for sparkR.session and count
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14942 Merged build finished. Test PASSed.
[GitHub] spark issue #14942: [SparkR][Minor] Fix docs for sparkR.session and count
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14942 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64871/ Test PASSed.
[GitHub] spark issue #14942: [SparkR][Minor] Fix docs for sparkR.session and count
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14942 **[Test build #64871 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64871/consoleFull)** for PR 14942 at commit [`41db2cb`](https://github.com/apache/spark/commit/41db2cbe02afae68c82297f76d685fe4e6edf10c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14854: [SPARK-17283][Core] Cancel job in RDD.take() as soon as ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14854 Merged build finished. Test PASSed.
[GitHub] spark issue #14854: [SPARK-17283][Core] Cancel job in RDD.take() as soon as ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14854 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64866/ Test PASSed.
[GitHub] spark issue #14854: [SPARK-17283][Core] Cancel job in RDD.take() as soon as ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14854 **[Test build #64866 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64866/consoleFull)** for PR 14854 at commit [`32c3959`](https://github.com/apache/spark/commit/32c395966ed085371af025dc44d690280c726ea9). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `public class LevelDBProvider ` * ` public static class StoreVersion ` * ` public static class AppId `
[GitHub] spark issue #14941: [SPARK-16334] Reusing same dictionary column for decodin...
Github user heroldus commented on the issue: https://github.com/apache/spark/pull/14941 @sameeragarwal: Do you expect any performance impact from this commit? It's an additional `if (!column.isNullAt(i))` for every single value read.
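To make the cost being asked about concrete, here is a small sketch of a dictionary-decoding loop with a per-value null guard. It is plain Python, not the vectorized Parquet reader: `decode_dictionary_ids` and its arguments are hypothetical. The idea that a null slot may hold a leftover, out-of-range id is an assumption made for the example, not something stated in this thread.

```python
def decode_dictionary_ids(ids, is_null, dictionary):
    """Decode dictionary ids into values; null slots stay None and are never looked up."""
    out = [None] * len(ids)
    for i, dict_id in enumerate(ids):
        # The per-value guard: one extra branch for every value in the batch.
        if not is_null[i]:
            out[i] = dictionary[dict_id]
    return out


dictionary = ["red", "green"]
ids = [0, 7, 1]                    # slot 1 is null; 7 is a made-up stale id
is_null = [False, True, False]
print(decode_dictionary_ids(ids, is_null, dictionary))  # ['red', None, 'green']
```

Without the guard, the lookup for slot 1 would raise `IndexError` in this toy version, which is the kind of failure a null check avoids at the price of a branch per value.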
[GitHub] spark issue #14527: [SPARK-16938][SQL] `drop/dropDuplicate` should handle th...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14527 **[Test build #64874 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64874/consoleFull)** for PR 14527 at commit [`af51466`](https://github.com/apache/spark/commit/af5146672228c34fe1bc0c720bf6d4cd267f9747).
[GitHub] spark issue #14426: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14426 **[Test build #64875 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64875/consoleFull)** for PR 14426 at commit [`2cc19b3`](https://github.com/apache/spark/commit/2cc19b362745a6d55c5102eadf55e65f191709f5).
[GitHub] spark issue #14116: [SPARK-16452][SQL] Support basic INFORMATION_SCHEMA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14116 **[Test build #64876 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64876/consoleFull)** for PR 14116 at commit [`d7bfc7b`](https://github.com/apache/spark/commit/d7bfc7b4ad1d350932b5d6a09327b25ad9b3d315).
[GitHub] spark issue #14623: [SPARK-17044][SQL] Make test files for window functions ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14623 **[Test build #64873 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64873/consoleFull)** for PR 14623 at commit [`2b8f2cb`](https://github.com/apache/spark/commit/2b8f2cba5e41ac9ec8d6a31723cac0b9640d24ac).
[GitHub] spark issue #14638: [SPARK-11374][SQL] Support `skip.header.line.count` opti...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14638 **[Test build #64872 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64872/consoleFull)** for PR 14638 at commit [`3857e32`](https://github.com/apache/spark/commit/3857e321ac86c5e4777b508eb60999312a233e99).
[GitHub] spark issue #14941: [SPARK-16334] Reusing same dictionary column for decodin...
Github user davies commented on the issue: https://github.com/apache/spark/pull/14941 LGTM, pending jenkins.
[GitHub] spark issue #14942: [SparkR][Minor] Fix docs for sparkR.session and count
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14942 **[Test build #64871 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64871/consoleFull)** for PR 14942 at commit [`41db2cb`](https://github.com/apache/spark/commit/41db2cbe02afae68c82297f76d685fe4e6edf10c).
[GitHub] spark issue #14941: [SPARK-16334] Reusing same dictionary column for decodin...
Github user sameeragarwal commented on the issue: https://github.com/apache/spark/pull/14941 cc @davies
[GitHub] spark pull request #14942: [SparkR][Minor] Fix docs for sparkR.session and c...
GitHub user junyangq opened a pull request: https://github.com/apache/spark/pull/14942
[SparkR][Minor] Fix docs for sparkR.session and count
## What changes were proposed in this pull request?
This PR adds some more explanation to `sparkR.session`. It also modifies the doc for `count` so that, when grouped in one doc, the description doesn't confuse users.
## How was this patch tested?
Manual test. ![screen shot 2016-09-02 at 1 21 36 pm](https://cloud.githubusercontent.com/assets/15318264/18217198/409613ac-7110-11e6-8dae-cb0c8df557bf.png)
You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/junyangq/spark fixSparkRSessionDoc
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/spark/pull/14942.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
    This closes #14942
commit 41db2cbe02afae68c82297f76d685fe4e6edf10c
Author: Junyang Qian
Date: 2016-09-02T20:15:12Z
    Fix doc for sparkR.session and count.
[GitHub] spark issue #12436: [SPARK-14649][CORE] DagScheduler should not run duplicat...
Github user davies commented on the issue: https://github.com/apache/spark/pull/12436 @sitalkedia Had a quick look at this one; the use case sounds good, and we should improve the stability of long-running tasks. Could you explain a bit more how the current patch works (in the PR description)?
[GitHub] spark issue #14941: [SPARK-16334] Reusing same dictionary column for decodin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14941 **[Test build #64870 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64870/consoleFull)** for PR 14941 at commit [`efda298`](https://github.com/apache/spark/commit/efda29864506b4a9eb716652e0fcf5cd705c9b4c).
[GitHub] spark pull request #14941: [SPARK-16334] Reusing same dictionary column for ...
GitHub user sameeragarwal opened a pull request: https://github.com/apache/spark/pull/14941 [SPARK-16334] Reusing same dictionary column for decoding consecutive row groups shouldn't throw an error ## What changes were proposed in this pull request? This patch fixes a bug in the vectorized parquet reader that is caused by re-using the same dictionary column vector while reading consecutive row groups. Specifically, this issue manifests for a certain distribution of dictionary/plain encoded data while we read/populate the underlying bit-packed dictionary data into a column-vector based data structure. ## How was this patch tested? Manually tested on datasets provided by the community. Thanks to Chris Perluss and Keith Kraus for their invaluable help in tracking down this issue! You can merge this pull request into a Git repository by running: $ git pull https://github.com/sameeragarwal/spark parquet-exception-2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14941.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14941 commit efda29864506b4a9eb716652e0fcf5cd705c9b4c Author: Sameer Agarwal Date: 2016-09-02T19:03:36Z Reusing dictionary column vectors for reading consecutive row groups shouldn't throw an error
[GitHub] spark issue #14882: [SPARK-17316][Core] Make CoarseGrainedSchedulerBackend.r...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/14882 I just cherry-picked this one into branch 1.6.
[GitHub] spark issue #14854: [SPARK-17283][Core] Cancel job in RDD.take() as soon as ...
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/14854 I'm actually going to close this now and will revisit later; the scheduling complexity may not be warranted now given benefits of simpler approaches.
[GitHub] spark pull request #14854: [SPARK-17283][Core] Cancel job in RDD.take() as s...
Github user JoshRosen closed the pull request at: https://github.com/apache/spark/pull/14854
[GitHub] spark issue #14938: [SPARK-17335][SQL] Fix ArrayType and MapType CatalogStri...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/14938 I just compared the wide schema benchmark on master with this patch and there do not seem to be performance regressions.
[GitHub] spark issue #14866: [SPARK-17298][SQL] Require explicit CROSS join for carte...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14866 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64863/ Test PASSed.
[GitHub] spark issue #14866: [SPARK-17298][SQL] Require explicit CROSS join for carte...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14866 Merged build finished. Test PASSed.
[GitHub] spark issue #14866: [SPARK-17298][SQL] Require explicit CROSS join for carte...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14866 **[Test build #64863 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64863/consoleFull)** for PR 14866 at commit [`7f3d67f`](https://github.com/apache/spark/commit/7f3d67fb6f4c49e14f67b4dda2e0e11e076808e5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14866: [SPARK-17298][SQL] Require explicit CROSS join for carte...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14866 **[Test build #64869 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64869/consoleFull)** for PR 14866 at commit [`2509e45`](https://github.com/apache/spark/commit/2509e451326d673ba6ea9d4d9a4e3991ea73b291).
[GitHub] spark issue #14866: [SPARK-17298][SQL] Require explicit CROSS join for carte...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14866 **[Test build #3245 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3245/consoleFull)** for PR 14866 at commit [`2509e45`](https://github.com/apache/spark/commit/2509e451326d673ba6ea9d4d9a4e3991ea73b291).
[GitHub] spark issue #14881: [SPARK-17315][SparkR] Kolmogorov-Smirnov test SparkR wra...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14881 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64865/ Test PASSed.
[GitHub] spark issue #14881: [SPARK-17315][SparkR] Kolmogorov-Smirnov test SparkR wra...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14881 Merged build finished. Test PASSed.
[GitHub] spark issue #14881: [SPARK-17315][SparkR] Kolmogorov-Smirnov test SparkR wra...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14881 **[Test build #64865 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64865/consoleFull)** for PR 14881 at commit [`caeb91e`](https://github.com/apache/spark/commit/caeb91eb42ec47efd428c9a174d9d54c45f290fb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14854: [SPARK-17283][Core] Cancel job in RDD.take() as soon as ...
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/14854 @davies brought up a reasonable point that we might be able to achieve similar benefits with less complexity by replacing the exponential ramp-up with something that's linearly proportional to the number of available executor cores, thereby running a larger number of smaller jobs. That approach is going to incur more per-job overhead but avoids adding any scheduler complexity.
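The trade-off in the comment above can be illustrated with a small, Spark-independent sketch. It is a simplification under stated assumptions, not the actual `RDD.take()` code: the growth factor of 4, the worst case where every partition ends up being scanned, and the function names `exponential_schedule` and `linear_schedule` are all illustrative.

```python
def exponential_schedule(total_partitions, growth_factor=4):
    """Job sizes when each job scans growth_factor times the partitions scanned so far.

    Assumption: the first job scans a single partition and the worst case
    (every partition must be scanned) applies.
    """
    sizes = []
    scanned = 0
    while scanned < total_partitions:
        size = 1 if scanned == 0 else scanned * growth_factor
        size = min(size, total_partitions - scanned)  # never scan past the end
        sizes.append(size)
        scanned += size
    return sizes


def linear_schedule(total_partitions, available_cores):
    """Job sizes when every job scans as many partitions as there are cores."""
    sizes = []
    scanned = 0
    while scanned < total_partitions:
        size = min(available_cores, total_partitions - scanned)
        sizes.append(size)
        scanned += size
    return sizes


if __name__ == "__main__":
    print(exponential_schedule(100))   # few jobs, the last ones are large
    print(linear_schedule(100, 16))    # more jobs, each bounded by the core count
```

In this sketch the linear schedule launches more jobs (more per-job overhead) but no single job is larger than the number of cores, which is the behaviour the comment describes.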
[GitHub] spark issue #14866: [SPARK-17298][SQL] Require explicit CROSS join for carte...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14866 Merged build finished. Test FAILed.
[GitHub] spark issue #14866: [SPARK-17298][SQL] Require explicit CROSS join for carte...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14866 **[Test build #64868 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64868/consoleFull)** for PR 14866 at commit [`686b549`](https://github.com/apache/spark/commit/686b54986875cd9d47d4b772764af06ba301d96e).
[GitHub] spark issue #14866: [SPARK-17298][SQL] Require explicit CROSS join for carte...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14866 **[Test build #64868 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64868/consoleFull)** for PR 14866 at commit [`686b549`](https://github.com/apache/spark/commit/686b54986875cd9d47d4b772764af06ba301d96e). * This patch **fails some tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14866: [SPARK-17298][SQL] Require explicit CROSS join for carte...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14866 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64868/ Test FAILed.
[GitHub] spark issue #14940: [SPARK-17383][GRAPHX]LabelPropagation
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14940 Can one of the admins verify this patch?
[GitHub] spark issue #14866: [SPARK-17298][SQL] Require explicit CROSS join for carte...
Github user srinathshankar commented on the issue: https://github.com/apache/spark/pull/14866 I'll update the Python and R APIs in a follow-up. Right now in Python and R a cross join is done if no join exprs/columns and join types are specified. It would be good to require explicit cross joins in these APIs as well.
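The behaviour requested above can be sketched without Spark. This is a toy illustration on lists of dicts, not the PySpark API: the function `join_rows` and its `on`/`how` parameters are hypothetical, and only inner and cross joins are modelled.

```python
from itertools import product


def join_rows(left, right, on=None, how=None):
    """Join two lists of dict rows; a cartesian product must be requested explicitly.

    Hypothetical helper: `on` is a single shared column name, `how` is
    None/"inner" for an equi-join or "cross" for a cartesian product.
    """
    if on is None:
        if how != "cross":
            # No condition and no explicit request: refuse instead of
            # silently producing a cartesian product.
            raise ValueError("no join condition given; pass how='cross' for a cartesian product")
        return [{**l, **r} for l, r in product(left, right)]
    if how not in (None, "inner"):
        raise ValueError("this sketch only supports inner joins when 'on' is given")
    return [{**l, **r} for l, r in product(left, right) if l[on] == r[on]]


if __name__ == "__main__":
    users = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
    orders = [{"id": 1, "item": "x"}, {"id": 3, "item": "y"}]
    print(join_rows(users, orders, on="id"))
    print(len(join_rows(users, orders, how="cross")))
```

Raising an error in the implicit case makes an accidental cartesian product a visible failure rather than an unexpectedly large result.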
[GitHub] spark pull request #14940: [SPARK-17383][GRAPHX]LabelPropagation
GitHub user bookling opened a pull request: https://github.com/apache/spark/pull/14940 [SPARK-17383][GRAPHX]LabelPropagation In the LabelPropagation of the GraphX lib, each node is initialized with a unique label, and at every step each node adopts the label that most of its neighbors currently have, ignoring the label it currently has. I think this is unreasonable, because the label a node already has is also useful. When a node tends toward a stable label, there is an association between two consecutive iterations, so a node is affected not only by its neighbors but also by its current label. So I changed the code to use both the labels of its neighbors and its own label. Through this iterative process, densely connected groups of nodes form a consensus on a unique label to form communities. But the communities found by LabelPropagation are often discontinuous, because several labels can tie for the most frequent label among a node's neighbors; e.g., node "0" has 6 neighbors labeled {"1","1","2","2","3","3"}, so it may select a label at random. To get stable community labels and prevent this randomness, I choose the max label of the node. You can test with a graph with edges {10L->11L, 10L->12L, 11L->12L, 11L->14L, 12L->14L, 13L->14L, 13L->15L, 13L->16L, 15L->16L, 15L->17L, 16L->17L}, or a dandelion shape {1L->2L, 2L->7L, 2L->3L, 2L->4L, 2L->5L, 2L->6L}, etc. You can merge this pull request into a Git repository by running: $ git pull https://github.com/bookling/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14940.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14940 commit 11bdab6bb042cd2102570c96db17279cf6ebbd92 Author: bookling Date: 2016-08-30T17:51:43Z to solve "label shock" I have tested the result, which is more reasonable. LabelPropagation often suffers from "label shock", and the resulting communities are often non-adjacent. I think the label of a node is useful between adjacent supersteps, and the adjacent supersteps are relevant. commit bb875fef8f47ec99878d972f2c17b50123375a4c Author: bookling Date: 2016-08-30T17:55:06Z to reduce "label shock" I have tested the result, which is more reasonable. LabelPropagation often suffers from "label shock", and the resulting communities are often non-adjacent. I think the label of a node is useful between adjacent supersteps, and the adjacent supersteps are relevant. commit 60e6f0ee2a3cdfb2b526a6d12887513f3aabed42 Author: XiaoSen Lee Date: 2016-09-02T18:57:29Z Improvement labelPropagation of garphx lib
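The update rule proposed in this pull request can be sketched in plain Python, independent of GraphX. This is one reading of the description rather than the code in the patch: edges are treated as undirected, a node's own label counts as one vote alongside its neighbors' labels, and "the max label" is interpreted as the largest label among those tied for the highest count. The function names are illustrative.

```python
from collections import Counter, defaultdict


def build_neighbors(edges):
    """Build an undirected adjacency map from (src, dst) pairs."""
    neighbors = defaultdict(set)
    for src, dst in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)
    return neighbors


def propagate_step(labels, neighbors):
    """One synchronous step: neighbors vote, the node's own label votes too."""
    new_labels = {}
    for node, own_label in labels.items():
        counts = Counter(labels[n] for n in neighbors[node])
        counts[own_label] += 1  # the node's current label is also a vote
        best = max(counts.values())
        # Deterministic tie-break: the largest label among the most frequent ones.
        new_labels[node] = max(l for l, c in counts.items() if c == best)
    return new_labels


def label_propagation(edges, max_steps):
    """Run up to max_steps steps, starting from one unique label per node."""
    neighbors = build_neighbors(edges)
    labels = {node: node for node in neighbors}
    for _ in range(max_steps):
        updated = propagate_step(labels, neighbors)
        if updated == labels:  # stop early once nothing changes
            break
        labels = updated
    return labels


if __name__ == "__main__":
    edges = [(10, 11), (10, 12), (11, 12), (11, 14), (12, 14), (13, 14),
             (13, 15), (13, 16), (15, 16), (15, 17), (16, 17)]
    print(label_propagation(edges, max_steps=5))
```

Because ties are resolved by a fixed rule instead of an arbitrary choice, repeated runs on the same graph give the same labels, which is the stability property the description asks for.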
[GitHub] spark issue #14931: [SPARK-17370] Shuffle service files not invalidated when...
Github user ericl commented on the issue: https://github.com/apache/spark/pull/14931 What if we added a flag to SlaveLost indicating if we think the entire host is lost? In many cases that should be true, if the event originated from worker loss or Mesos slave loss events.
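The suggestion above can be sketched as a small model in Python. It is not Spark code: the `SlaveLost` dataclass here is only a stand-in for the Scala class of the same name, and the `host_lost` field, the `MapOutput` tuple, and `outputs_to_invalidate` are hypothetical names chosen for illustration.

```python
from collections import namedtuple
from dataclasses import dataclass

# Hypothetical record of where a shuffle map output lives.
MapOutput = namedtuple("MapOutput", ["shuffle_id", "map_id", "executor_id", "host"])


@dataclass(frozen=True)
class SlaveLost:
    """Stand-in for the loss reason; host_lost is the proposed extra flag."""
    message: str
    host_lost: bool = False


def outputs_to_invalidate(outputs, executor_id, host, reason):
    """Pick the map outputs to drop after an executor is reported lost.

    If the whole host is believed lost, every output on that host is dropped;
    otherwise only the outputs written by the lost executor are dropped.
    """
    if reason.host_lost:
        return [o for o in outputs if o.host == host]
    return [o for o in outputs if o.executor_id == executor_id]


if __name__ == "__main__":
    outputs = [
        MapOutput(0, 0, "exec-1", "host-a"),
        MapOutput(0, 1, "exec-2", "host-a"),
        MapOutput(0, 2, "exec-3", "host-b"),
    ]
    print(outputs_to_invalidate(outputs, "exec-1", "host-a", SlaveLost("executor exited")))
    print(outputs_to_invalidate(outputs, "exec-1", "host-a", SlaveLost("worker lost", host_lost=True)))
```

Carrying the flag on the loss event keeps the decision with whoever raised the event, such as a worker-loss or Mesos slave-loss handler, so the code that invalidates outputs does not have to guess.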
[GitHub] spark issue #14938: [SPARK-17335][SQL] Fix ArrayType and MapType CatalogStri...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14938 **[Test build #64867 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64867/consoleFull)** for PR 14938 at commit [`b57bbb6`](https://github.com/apache/spark/commit/b57bbb6704cd360427126da2e2e1ef2e8f758e93).
[GitHub] spark issue #14931: [SPARK-17370] Shuffle service files not invalidated when...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14931 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64862/ Test FAILed.
[GitHub] spark issue #14931: [SPARK-17370] Shuffle service files not invalidated when...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14931 Merged build finished. Test FAILed.
[GitHub] spark issue #14931: [SPARK-17370] Shuffle service files not invalidated when...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14931 **[Test build #64862 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64862/consoleFull)** for PR 14931 at commit [`2430b69`](https://github.com/apache/spark/commit/2430b698db4062aeded30018dceffc2700d32fe5). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14797: [SPARK-17230] [SQL] Should not pass optimized que...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/14797#discussion_r77394975 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -479,13 +480,23 @@ case class DataSource( } } +// SPARK-17230: Resolve the partition columns so InsertIntoHadoopFsRelationCommand does +// not need to have the query as child, to avoid to analyze an optimized query, +// because InsertIntoHadoopFsRelationCommand will be optimized first. +val columns = partitionColumns.map { name => --- End diff -- This is only for write(); it does not have `val partitionSchema =` (the others do).
[GitHub] spark issue #14939: [SPARK-17376][SPARKR] followup - change since version
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14939 Ah thanks - I didn't notice this while merging the earlier PR.
[GitHub] spark issue #14854: [SPARK-17283][Core] Cancel job in RDD.take() as soon as ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14854 **[Test build #64866 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64866/consoleFull)** for PR 14854 at commit [`32c3959`](https://github.com/apache/spark/commit/32c395966ed085371af025dc44d690280c726ea9).
[GitHub] spark issue #14881: [SPARK-17315][SparkR] Kolmogorov-Smirnov test SparkR wra...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14881 **[Test build #64865 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64865/consoleFull)** for PR 14881 at commit [`caeb91e`](https://github.com/apache/spark/commit/caeb91eb42ec47efd428c9a174d9d54c45f290fb).
[GitHub] spark issue #14828: [SPARK-17258][SQL] Parse scientific decimal literals as ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14828 Merged build finished. Test PASSed.
[GitHub] spark issue #14828: [SPARK-17258][SQL] Parse scientific decimal literals as ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14828 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64861/ Test PASSed.