date:20160902

[GitHub] spark issue #14867: [SPARK-17296][SQL] Simplify parser join processing.

2016-09-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14867
  
**[Test build #64880 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64880/consoleFull)** for PR 14867 at commit [`3b13cd7`](https://github.com/apache/spark/commit/3b13cd7531e3f6f8e27c9cd231f8f9ea77c8fa39).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14517: [SPARK-16931][PYTHON] PySpark APIS for bucketBy a...

2016-09-02 Thread davies

Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/14517#discussion_r77420246
  
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -747,16 +800,25 @@ def _test():
     except py4j.protocol.Py4JError:
         spark = SparkSession(sc)
 
+    seed = int(time() * 1000)
--- End diff --

It's better to have a deterministic test; testing with parquet should be enough.
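
A minimal sketch of the point about determinism, in plain Python rather than PySpark: a fixed seed makes a randomized choice reproducible, whereas a seed derived from `time()` differs on every run. The helper `pick_format` and the `FORMATS` list are hypothetical and only serve the illustration.

```python
import random

# Hypothetical list of formats a test might choose from (assumption).
FORMATS = ["parquet", "json", "orc"]


def pick_format(seed):
    """Pick a format with a private RNG so the result depends only on `seed`."""
    rng = random.Random(seed)
    return rng.choice(FORMATS)


# With a fixed seed the choice is identical on every call and every run.
first = pick_format(42)
second = pick_format(42)
```

Pinning the test to a single format such as parquet, as suggested above, removes the randomness altogether.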



[GitHub] spark issue #14866: [SPARK-17298][SQL] Require explicit CROSS join for carte...

2016-09-02 Thread davies

Github user davies commented on the issue:

https://github.com/apache/spark/pull/14866
  
LGTM



[GitHub] spark pull request #14866: [SPARK-17298][SQL] Require explicit CROSS join fo...

2016-09-02 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14866



[GitHub] spark issue #14866: [SPARK-17298][SQL] Require explicit CROSS join for carte...

2016-09-02 Thread hvanhovell

Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/14866
  
Merging to master. Thanks!



[GitHub] spark pull request #14690: [SPARK-16980][SQL] Load only catalog table partit...

2016-09-02 Thread ericl

Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/14690#discussion_r77419498
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/TableFileCatalog.scala ---
@@ -0,0 +1,102 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import org.apache.hadoop.fs.Path
+
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.catalog.CatalogTablePartition
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.types.{StructField, StructType}
+
+
+/**
+ * A [[BasicFileCatalog]] for a metastore catalog table.
+ *
+ * @param sparkSession a [[SparkSession]]
+ * @param db the table's database name
+ * @param table the table's (unqualified) name
+ * @param partitionSchema the schema of a partitioned table's partition columns
+ * @param sizeInBytes the table's data size in bytes
+ */
+class TableFileCatalog(
+    sparkSession: SparkSession,
+    db: String,
+    table: String,
+    partitionSchema: Option[StructType],
+    override val sizeInBytes: Long)
+  extends SessionFileCatalog(sparkSession) {
+
+  override protected val hadoopConf = sparkSession.sessionState.newHadoopConf
+
+  private val externalCatalog = sparkSession.sharedState.externalCatalog
+
+  private val catalogTable = externalCatalog.getTable(db, table)
+
+  private val baseLocation = catalogTable.storage.locationUri
+
+  override def rootPaths: Seq[Path] = baseLocation.map(new Path(_)).toSeq
+
+  override def listFiles(filters: Seq[Expression]): Seq[Partition] = partitionSchema match {
+    case Some(partitionSchema) =>
+      externalCatalog.listPartitionsByFilter(db, table, filters).flatMap {
+        case CatalogTablePartition(spec, storage, _) =>
+          storage.locationUri.map(new Path(_)).map { path =>
+            val files = listDataLeafFiles(path :: Nil).toSeq
+            val values =
+              InternalRow.fromSeq(partitionSchema.map { case StructField(name, dataType, _, _) =>
+                Cast(Literal(spec(name)), dataType).eval()
+              })
+            Partition(values, files)
+          }
+      }
+    case None =>
+      Partition(InternalRow.empty, listDataLeafFiles(rootPaths).toSeq) :: Nil
+  }
+
+  override def refresh(): Unit = {}
+
+
+  /**
+   * Returns a [[ListingFileCatalog]] for this table restricted to the subset of partitions
+   * specified by the given partition-pruning filters.
+   *
+   * @param filters partition-pruning filters
+   */
+  def filterPartitions(filters: Seq[Expression]): ListingFileCatalog = {
--- End diff --

It seems a little weird to have catalogs that refer to a pruned table. We 
should try to do this at execution time instead, so that planning does not 
block behind pruning.
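
A minimal sketch of the idea of deferring pruning to execution time, in plain Python rather than Spark's Scala. The class `LazyPrunedListing` and its fields are hypothetical; the only point is that constructing the object (the "planning" step) does no pruning work, and the filter runs once when the files are first requested.

```python
class LazyPrunedListing:
    """Defers partition pruning until the file list is first requested."""

    def __init__(self, partitions, predicate):
        self._partitions = partitions  # mapping: partition value -> list of files
        self._predicate = predicate    # partition-pruning filter
        self._files = None             # populated on first access
        self.prune_calls = 0           # counts how often pruning actually ran

    def files(self):
        if self._files is None:
            self.prune_calls += 1
            self._files = [
                f
                for value, fs in sorted(self._partitions.items())
                if self._predicate(value)
                for f in fs
            ]
        return self._files


partitions = {1: ["p1/a"], 2: ["p2/a", "p2/b"], 3: ["p3/a"]}
listing = LazyPrunedListing(partitions, lambda v: v >= 2)  # "planning": no pruning yet
calls_after_planning = listing.prune_calls
selected = listing.files()                                  # "execution": pruning runs once
```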



[GitHub] spark pull request #14690: [SPARK-16980][SQL] Load only catalog table partit...

2016-09-02 Thread ericl

Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/14690#discussion_r77419464
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala ---
@@ -79,8 +79,16 @@ object FileSourceStrategy extends Strategy with Logging {
         ExpressionSet(normalizedFilters.filter(_.references.subsetOf(partitionSet)))
       logInfo(s"Pruning directories with: ${partitionKeyFilters.mkString(",")}")
 
+      val prunedFsRelation = fsRelation.location match {
--- End diff --

Can we push this pruning into the scan (i.e. do it when computing 
`inputRDD` in `FileSourceScanExec`)?



[GitHub] spark pull request #14690: [SPARK-16980][SQL] Load only catalog table partit...

2016-09-02 Thread ericl

Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/14690#discussion_r77419510
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/fileSourceInterfaces.scala ---
@@ -346,11 +340,30 @@ trait FileCatalog {
*/
   def listFiles(filters: Seq[Expression]): Seq[Partition]
 
+  /** Refresh any cached file listings */
+  def refresh(): Unit
+
+  /** Sum of table file sizes, in bytes */
+  def sizeInBytes: Long
+}
+
+/**
+ * A [[BasicFileCatalog]] which can enumerate all of the files comprising a relation and, from
+ * those, infer the relation's partition specification.
+ */
+trait FileCatalog extends BasicFileCatalog {
--- End diff --

What's the motivation behind splitting FileCatalog and BasicFileCatalog? Is 
it to prevent accidental calls to allFiles()?
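
One way to read the question is as interface segregation: a narrow base type that only offers filtered listing, and a wider subtype that adds full enumeration. The sketch below shows that pattern in plain Python. The names `BasicCatalog`, `FullCatalog`, and `MetastoreBackedCatalog` are hypothetical stand-ins, not the classes in the pull request.

```python
from abc import ABC, abstractmethod


class BasicCatalog(ABC):
    """Narrow interface: filtered listing only, no full enumeration."""

    @abstractmethod
    def list_files(self, filters):
        ...


class FullCatalog(BasicCatalog):
    """Wider interface: adds the potentially expensive full enumeration."""

    @abstractmethod
    def all_files(self):
        ...


class MetastoreBackedCatalog(BasicCatalog):
    """Implements only the narrow interface, so all_files() cannot be called on it."""

    def __init__(self, files_by_partition):
        self._files = files_by_partition

    def list_files(self, filters):
        return [
            f
            for part, fs in sorted(self._files.items())
            if all(flt(part) for flt in filters)
            for f in fs
        ]


catalog = MetastoreBackedCatalog({"2016-09-01": ["a"], "2016-09-02": ["b"]})
listed = catalog.list_files([lambda p: p.endswith("02")])
has_all_files = hasattr(catalog, "all_files")
```

Code that accepts a `BasicCatalog` cannot reach the full enumeration by accident, which is the motivation the comment guesses at.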



[GitHub] spark pull request #14690: [SPARK-16980][SQL] Load only catalog table partit...

2016-09-02 Thread ericl

Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/14690#discussion_r77419448
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala ---
@@ -184,7 +184,7 @@ case class FileSourceScanExec(
 "Batched" -> supportsBatch.toString,
 "PartitionFilters" -> partitionFilters.mkString("[", ", ", "]"),
 "PushedFilters" -> dataFilters.mkString("[", ", ", "]"),
-"InputPaths" -> relation.location.paths.mkString(", "))
+"RootPaths" -> relation.location.rootPaths.mkString(", "))
--- End diff --

Btw, it would be nice to make sure the physical plan still has a good debug 
string when you call explain (i.e. tells which catalog it's using) since that 
will greatly impact performance in this case.
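
A minimal sketch of a debug string that names the catalog implementation, in plain Python. The `"Location"` key, the classes, and `debug_string` are hypothetical; only the `"RootPaths"` key comes from the quoted diff.

```python
class ListingCatalog:
    def __init__(self, root_paths):
        self.root_paths = root_paths


class TableCatalog(ListingCatalog):
    pass


def scan_metadata(location):
    """Build the key/value pairs a plan's debug string could show."""
    return {
        "Location": type(location).__name__,  # which catalog implementation is in use
        "RootPaths": ", ".join(location.root_paths),
    }


def debug_string(location):
    meta = scan_metadata(location)
    return "FileScan " + " ".join(f"{k}: {v}" for k, v in meta.items())


line = debug_string(TableCatalog(["/warehouse/t"]))
```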



[GitHub] spark pull request #14690: [SPARK-16980][SQL] Load only catalog table partit...

2016-09-02 Thread ericl

Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/14690#discussion_r77419496
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/TableFileCatalog.scala ---
@@ -0,0 +1,102 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import org.apache.hadoop.fs.Path
+
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.catalog.CatalogTablePartition
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.types.{StructField, StructType}
+
+
+/**
+ * A [[BasicFileCatalog]] for a metastore catalog table.
+ *
+ * @param sparkSession a [[SparkSession]]
+ * @param db the table's database name
+ * @param table the table's (unqualified) name
+ * @param partitionSchema the schema of a partitioned table's partition columns
+ * @param sizeInBytes the table's data size in bytes
+ */
+class TableFileCatalog(
+    sparkSession: SparkSession,
+    db: String,
+    table: String,
+    partitionSchema: Option[StructType],
+    override val sizeInBytes: Long)
+  extends SessionFileCatalog(sparkSession) {
+
+  override protected val hadoopConf = sparkSession.sessionState.newHadoopConf
+
+  private val externalCatalog = sparkSession.sharedState.externalCatalog
--- End diff --

Can we make this an explicit constructor parameter?
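
A minimal sketch of what an explicit constructor parameter buys, in plain Python: the dependency is passed in rather than looked up from a session object, so a test can substitute a stand-in. `InMemoryExternalCatalog` and `TableCatalog` are hypothetical names used only for this illustration.

```python
class InMemoryExternalCatalog:
    """Stand-in for a metastore client (hypothetical, for illustration)."""

    def __init__(self, tables):
        self._tables = tables

    def get_table(self, db, table):
        return self._tables[(db, table)]


class TableCatalog:
    # The catalog dependency is passed in instead of being read from a session object.
    def __init__(self, external_catalog, db, table):
        self._external_catalog = external_catalog
        self._db = db
        self._table = table

    def location(self):
        return self._external_catalog.get_table(self._db, self._table)["location"]


fake = InMemoryExternalCatalog({("default", "t"): {"location": "/warehouse/t"}})
loc = TableCatalog(fake, "default", "t").location()
```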



[GitHub] spark pull request #14941: [SPARK-16334] Reusing same dictionary column for ...

2016-09-02 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14941



[GitHub] spark pull request #14690: [SPARK-16980][SQL] Load only catalog table partit...

2016-09-02 Thread ericl

Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/14690#discussion_r77419455
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -2531,6 +2531,8 @@ class Dataset[T] private[sql](
    */
   def inputFiles: Array[String] = {
     val files: Seq[String] = logicalPlan.collect {
+      case LogicalRelation(HadoopFsRelation(_, location: FileCatalog, _, _, _, _, _), _, _) =>
--- End diff --

Hm, should we still have HadoopFsRelation implement FileRelation?



[GitHub] spark issue #14797: [SPARK-17230] [SQL] Should not pass optimized query into...

2016-09-02 Thread davies

Github user davies commented on the issue:

https://github.com/apache/spark/pull/14797
  
Merged this into master and 2.0 branch, thanks!



[GitHub] spark issue #14941: [SPARK-16334] Reusing same dictionary column for decodin...

2016-09-02 Thread davies

Github user davies commented on the issue:

https://github.com/apache/spark/pull/14941
  
Merging this into master and 2.0 branch, thanks!



[GitHub] spark issue #14941: [SPARK-16334] Reusing same dictionary column for decodin...

2016-09-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14941
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64870/
Test PASSed.



[GitHub] spark issue #14941: [SPARK-16334] Reusing same dictionary column for decodin...

2016-09-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14941
  
Merged build finished. Test PASSed.



[GitHub] spark pull request #14797: [SPARK-17230] [SQL] Should not pass optimized que...

2016-09-02 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14797



[GitHub] spark issue #14941: [SPARK-16334] Reusing same dictionary column for decodin...

2016-09-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14941
  
**[Test build #64870 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64870/consoleFull)** for PR 14941 at commit [`efda298`](https://github.com/apache/spark/commit/efda29864506b4a9eb716652e0fcf5cd705c9b4c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #14867: [SPARK-17296][SQL] Simplify parser join processin...

2016-09-02 Thread srinathshankar

Github user srinathshankar commented on a diff in the pull request:

https://github.com/apache/spark/pull/14867#discussion_r77418488
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala ---
@@ -360,10 +360,25 @@ class PlanParserSuite extends PlanTest {
     test("left anti join", LeftAnti, testExistence)
     test("anti join", LeftAnti, testExistence)
 
+    // Test natural cross join
+    intercept("select * from a natural cross join b")
+
+    // Test natural join with a condition
+    intercept("select * from a natural join b on a.id = b.id")
+
     // Test multiple consecutive joins
     assertEqual(
       "select * from a join b join c right join d",
       table("a").join(table("b")).join(table("c")).join(table("d"), RightOuter).select(star()))
+
+    // SPARK-17296
+    assertEqual(
+      "select * from t1 cross join t2 join t3 on t3.id = t1.id join t4 on t4.id = t1.id",
--- End diff --

To clarify, it looks like your patch will disallow both queries at the 
parser level. Could you add a test that enforces this?
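
As a rough sketch of what a parser-level rejection test looks like, the plain-Python toy below rejects the two `intercept` queries from the quoted diff, which may not be the two queries this comment refers to. `check_joins`, `ParseError`, and `rejects` are hypothetical; this is a string check for illustration, not Spark's parser.

```python
import re


class ParseError(ValueError):
    pass


def check_joins(sql):
    """Toy check: reject NATURAL CROSS JOIN and NATURAL JOIN ... ON."""
    text = " ".join(sql.lower().split())
    if re.search(r"\bnatural cross join\b", text):
        raise ParseError("NATURAL CROSS JOIN is not supported")
    if re.search(r"\bnatural join \w+ on\b", text):
        raise ParseError("NATURAL JOIN cannot have a join condition")
    return text


def rejects(sql):
    """Return True when the toy checker refuses the query."""
    try:
        check_joins(sql)
    except ParseError:
        return True
    return False
```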



[GitHub] spark issue #14941: [SPARK-16334] Reusing same dictionary column for decodin...

2016-09-02 Thread heroldus

Github user heroldus commented on the issue:

https://github.com/apache/spark/pull/14941
  
@davies Fine, thx.



[GitHub] spark issue #14638: [SPARK-11374][SQL] Support `skip.header.line.count` opti...

2016-09-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14638
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64872/
Test PASSed.



[GitHub] spark issue #14638: [SPARK-11374][SQL] Support `skip.header.line.count` opti...

2016-09-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14638
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14638: [SPARK-11374][SQL] Support `skip.header.line.count` opti...

2016-09-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14638
  
**[Test build #64872 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64872/consoleFull)** for PR 14638 at commit [`3857e32`](https://github.com/apache/spark/commit/3857e321ac86c5e4777b508eb60999312a233e99).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #14872: [SPARK-3162][MLlib][WIP] Add local tree training ...

2016-09-02 Thread smurching

Github user smurching closed the pull request at:

https://github.com/apache/spark/pull/14872



[GitHub] spark issue #14931: [SPARK-17370] Shuffle service files not invalidated when...

2016-09-02 Thread vanzin

Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/14931
  
The issue I see is: how easy is it for the driver to know that? Adding a new 
flag to the `SlaveLost` class doesn't mean that you know how to set its value.

I'm pretty sure, on the YARN side, that we don't know when hosts die, just 
that a container on that host went away. Maybe Standalone or Mesos would have 
that info more easily available (e.g. the `WorkerWatcher` code for Standalone).



[GitHub] spark issue #14866: [SPARK-17298][SQL] Require explicit CROSS join for carte...

2016-09-02 Thread sameeragarwal

Github user sameeragarwal commented on the issue:

https://github.com/apache/spark/pull/14866
  
LGTM



[GitHub] spark issue #14942: [SparkR][Minor] Fix docs for sparkR.session and count

2016-09-02 Thread shivaram

Github user shivaram commented on the issue:

https://github.com/apache/spark/pull/14942
  
cc @felixcheung 



[GitHub] spark issue #14866: [SPARK-17298][SQL] Require explicit CROSS join for carte...

2016-09-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14866
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14866: [SPARK-17298][SQL] Require explicit CROSS join for carte...

2016-09-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14866
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64869/
Test PASSed.



[GitHub] spark issue #14866: [SPARK-17298][SQL] Require explicit CROSS join for carte...

2016-09-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14866
  
**[Test build #64869 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64869/consoleFull)** for PR 14866 at commit [`2509e45`](https://github.com/apache/spark/commit/2509e451326d673ba6ea9d4d9a4e3991ea73b291).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14872: [SPARK-3162][MLlib][WIP] Add local tree training for dec...

2016-09-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14872
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64879/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14872: [SPARK-3162][MLlib][WIP] Add local tree training for dec...

2016-09-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14872
  
**[Test build #64879 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64879/consoleFull)**
 for PR 14872 at commit 
[`8d443ce`](https://github.com/apache/spark/commit/8d443ce38f958e7b83b502e614e01c824cb63c4b).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14872: [SPARK-3162][MLlib][WIP] Add local tree training for dec...

2016-09-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14872
  
Merged build finished. Test FAILed.



[GitHub] spark issue #14797: [SPARK-17230] [SQL] Should not pass optimized query into...

2016-09-02 Thread srinathshankar

Github user srinathshankar commented on the issue:

https://github.com/apache/spark/pull/14797
  
Looks fine.



[GitHub] spark issue #14872: [SPARK-3162][MLlib][WIP] Add local tree training for dec...

2016-09-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14872
  
**[Test build #64879 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64879/consoleFull)**
 for PR 14872 at commit 
[`8d443ce`](https://github.com/apache/spark/commit/8d443ce38f958e7b83b502e614e01c824cb63c4b).



[GitHub] spark pull request #14867: [SPARK-17296][SQL] Simplify parser join processin...

2016-09-02 Thread srinathshankar

Github user srinathshankar commented on a diff in the pull request:

https://github.com/apache/spark/pull/14867#discussion_r77417316
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala ---
@@ -360,10 +360,25 @@ class PlanParserSuite extends PlanTest {
 test("left anti join", LeftAnti, testExistence)
 test("anti join", LeftAnti, testExistence)
 
+// Test natural cross join
+intercept("select * from a natural cross join b")
+
+// Test natural join with a condition
+intercept("select * from a natural join b on a.id = b.id")
+
 // Test multiple consecutive joins
 assertEqual(
   "select * from a join b join c right join d",
   table("a").join(table("b")).join(table("c")).join(table("d"), RightOuter).select(star()))
+
+// SPARK-17296
+assertEqual(
+  "select * from t1 cross join t2 join t3 on t3.id = t1.id join t4 on t4.id = t1.id",
--- End diff --

How is something like
SELECT * FROM T1 INNER JOIN T2 INNER JOIN T3 ON col3 = col2 ON col3 = col1;
supposed to parse?
Without your change it returns the following error:
org.apache.spark.sql.AnalysisException: cannot resolve '`col3`' given input columns: [col1, col2]; line 1 pos 63
which I don't understand. The following parses, though:
SELECT * FROM T1 INNER JOIN T2 INNER JOIN T3 ON col1 = col2 ON col2 = col1
and returns a result.
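
For reference, one way to read the first query is the nested form, where the first ON belongs to the inner join of T2 and T3 and the second ON belongs to the outer join with T1. The sketch below only illustrates that reading with explicit parentheses. It uses SQLite through Python's `sqlite3` module rather than Spark, the table contents are made up, and it says nothing about how the Spark parser resolves the unparenthesized query.

```python
import sqlite3

# In-memory database with three single-column tables (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (col1 INTEGER);
    CREATE TABLE T2 (col2 INTEGER);
    CREATE TABLE T3 (col3 INTEGER);
    INSERT INTO T1 VALUES (1), (2);
    INSERT INTO T2 VALUES (1), (3);
    INSERT INTO T3 VALUES (1), (3);
""")

# Nested reading made explicit: T2 and T3 are joined first on col3 = col2,
# and the result is then joined to T1 on col3 = col1.
rows = conn.execute("""
    SELECT * FROM T1
    INNER JOIN (T2 INNER JOIN T3 ON col3 = col2)
    ON col3 = col1
""").fetchall()
print(rows)
```

Under this reading `col3` is visible to both ON clauses, because T3 is already part of the inner join when each condition is evaluated.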



[GitHub] spark issue #14797: [SPARK-17230] [SQL] Should not pass optimized query into...

2016-09-02 Thread sameeragarwal

Github user sameeragarwal commented on the issue:

https://github.com/apache/spark/pull/14797
  
LGTM



[GitHub] spark issue #14866: [SPARK-17298][SQL] Require explicit CROSS join for carte...

2016-09-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14866
  
**[Test build #3245 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3245/consoleFull)**
 for PR 14866 at commit 
[`2509e45`](https://github.com/apache/spark/commit/2509e451326d673ba6ea9d4d9a4e3991ea73b291).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #14887: [SPARK-17321][YARN] YARN shuffle service should u...

2016-09-02 Thread zhaoyunjiong

Github user zhaoyunjiong closed the pull request at:

https://github.com/apache/spark/pull/14887



[GitHub] spark pull request #14929: [SPARK-17374][SQL] Better error messages when par...

2016-09-02 Thread clockfly

Github user clockfly commented on a diff in the pull request:

https://github.com/apache/spark/pull/14929#discussion_r77416301
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JacksonParser.scala ---
@@ -62,8 +68,39 @@ class JacksonParser(
   throw new RuntimeException(s"Malformed line in FAILFAST mode: $record")
 }
 if (options.dropMalformed) {
-  logWarning(s"Dropping malformed line: $record")
+  if (!isWarningPrintedForMalformedRecord) {
+logWarning(
+  s"""Found at least one malformed records (sample: $record). The JSON reader will drop
+ |all malformed records in current $DROP_MALFORMED_MODE parser mode. To find out which
+ |corrupted records have been dropped, please switch the parser mode to $PERMISSIVE_MODE
+ |mode and use the default inferred schema.
+ |
+ |Code example to print all malformed records (scala):
+ |===
+ |// The corrupted record exists in column ${columnNameOfCorruptRecord}
+ |val parsedJson = spark.read.json("/path/to/json/file/test.json")
+ |
+   """.stripMargin)
+isWarningPrintedForMalformedRecord = true
+  }
   Nil
+} else if (schema.getFieldIndex(columnNameOfCorruptRecord).isEmpty) {
+  if (!isWarningPrintedForMalformedRecord) {
+logWarning(
+  s"""Found at least one malformed records (sample: $record). The JSON reader will replace
--- End diff --

It is different, although similar.
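
The diff above guards both warnings with a single "already printed" flag, so the long message is logged only for the first malformed record. A minimal sketch of that warn-once pattern follows. It is written in Python rather than Scala, and `MalformedRecordReporter`, `report`, and the logger name are hypothetical names for illustration, not part of JacksonParser.

```python
import logging

logger = logging.getLogger("json_reader_sketch")  # hypothetical logger name


class MalformedRecordReporter:
    """Counts malformed records but logs the detailed warning only once."""

    def __init__(self):
        self.warning_printed = False  # plays the role of the flag in the diff
        self.dropped = 0

    def report(self, record):
        """Record one malformed input; return True if a warning was logged."""
        self.dropped += 1
        if self.warning_printed:
            return False
        logger.warning(
            "Found at least one malformed record (sample: %s); "
            "further malformed records will be dropped silently.",
            record,
        )
        self.warning_printed = True
        return True
```

With one shared flag, whichever branch hits a malformed record first suppresses the other branch's message too, which matters if the two messages really are different.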



[GitHub] spark issue #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-09-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/11119
  
**[Test build #64878 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64878/consoleFull)**
 for PR 11119 at commit 
[`47f182b`](https://github.com/apache/spark/commit/47f182b88242dbc2fa198591de5099b5644f4076).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-09-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/11119
  
Merged build finished. Test FAILed.



[GitHub] spark issue #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-09-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/11119
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64878/
Test FAILed.



[GitHub] spark issue #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-09-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/11119
  
**[Test build #64878 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64878/consoleFull)**
 for PR 11119 at commit 
[`47f182b`](https://github.com/apache/spark/commit/47f182b88242dbc2fa198591de5099b5644f4076).



[GitHub] spark pull request #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-09-02 Thread yinxusen

Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/11119#discussion_r77414688
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala ---
@@ -139,16 +145,32 @@ class KMeansSuite extends SparkFunSuite with MLlibTestSparkContext with DefaultR
 val kmeans = new KMeans()
 testEstimatorAndModelReadWrite(kmeans, dataset, KMeansSuite.allParamSettings, checkModelData)
   }
+
+  test("Initialize using given cluster centers") {
--- End diff --

I think the current test is sufficient to assert the correct behavior of 
initialModel, and it's more economical to test with only one or two iterations.
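
To illustrate why one or two iterations can be enough, here is a minimal sketch of k-means started from caller-supplied centers. It is plain Python on 1-D points, not Spark's `KMeans`; `lloyd_step`, `kmeans`, and the sample data are hypothetical. When the given centers are already close to the true clusters, a single Lloyd step yields a deterministic result that a test can assert on.

```python
def lloyd_step(points, centers):
    """One Lloyd iteration: assign points to the nearest center, then average."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: (p - centers[i]) ** 2)
        clusters[nearest].append(p)
    # An empty cluster keeps its previous center.
    return [
        sum(members) / len(members) if members else centers[i]
        for i, members in enumerate(clusters)
    ]


def kmeans(points, initial_centers, max_iter):
    """Run at most max_iter Lloyd iterations from the supplied centers."""
    centers = list(initial_centers)
    for _ in range(max_iter):
        centers = lloyd_step(points, centers)
    return centers


points = [0.0, 1.0, 9.0, 10.0]
centers = kmeans(points, initial_centers=[0.0, 10.0], max_iter=1)
print(centers)
```

Because the starting centers are fixed rather than random, the outcome depends only on the data, so the test needs no seed and no long convergence loop.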



[GitHub] spark issue #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-09-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/11119
  
**[Test build #64877 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64877/consoleFull)**
 for PR 11119 at commit 
[`d4f59d9`](https://github.com/apache/spark/commit/d4f59d9b2331df89b2745ed6050634defeaee08d).



[GitHub] spark issue #14938: [SPARK-17335][SQL] Fix ArrayType and MapType CatalogStri...

2016-09-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14938
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64867/
Test PASSed.



[GitHub] spark issue #14938: [SPARK-17335][SQL] Fix ArrayType and MapType CatalogStri...

2016-09-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14938
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14938: [SPARK-17335][SQL] Fix ArrayType and MapType CatalogStri...

2016-09-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14938
  
**[Test build #64867 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64867/consoleFull)**
 for PR 14938 at commit 
[`b57bbb6`](https://github.com/apache/spark/commit/b57bbb6704cd360427126da2e2e1ef2e8f758e93).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14941: [SPARK-16334] Reusing same dictionary column for decodin...

2016-09-02 Thread davies

Github user davies commented on the issue:

https://github.com/apache/spark/pull/14941
  
@heroldus decodeDictionaryIds() is only used when a batch spans pages with 
different encodings (dictionary or plain), so it's not in the hot path. I think 
the performance impact should be fine.



[GitHub] spark issue #14942: [SparkR][Minor] Fix docs for sparkR.session and count

2016-09-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14942
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14942: [SparkR][Minor] Fix docs for sparkR.session and count

2016-09-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14942
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64871/
Test PASSed.



[GitHub] spark issue #14942: [SparkR][Minor] Fix docs for sparkR.session and count

2016-09-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14942
  
**[Test build #64871 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64871/consoleFull)**
 for PR 14942 at commit 
[`41db2cb`](https://github.com/apache/spark/commit/41db2cbe02afae68c82297f76d685fe4e6edf10c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14854: [SPARK-17283][Core] Cancel job in RDD.take() as soon as ...

2016-09-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14854
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14854: [SPARK-17283][Core] Cancel job in RDD.take() as soon as ...

2016-09-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14854
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64866/
Test PASSed.



[GitHub] spark issue #14854: [SPARK-17283][Core] Cancel job in RDD.take() as soon as ...

2016-09-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14854
  
**[Test build #64866 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64866/consoleFull)**
 for PR 14854 at commit 
[`32c3959`](https://github.com/apache/spark/commit/32c395966ed085371af025dc44d690280c726ea9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `public class LevelDBProvider `
  * `  public static class StoreVersion `
  * `  public static class AppId `



[GitHub] spark issue #14941: [SPARK-16334] Reusing same dictionary column for decodin...

2016-09-02 Thread heroldus

Github user heroldus commented on the issue:

https://github.com/apache/spark/pull/14941
  
@sameeragarwal: Do you expect any performance impact from this commit? It adds an 
additional `if (!column.isNullAt(i))` check for every single value read. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14527: [SPARK-16938][SQL] `drop/dropDuplicate` should handle th...

2016-09-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14527
  
**[Test build #64874 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64874/consoleFull)**
 for PR 14527 at commit 
[`af51466`](https://github.com/apache/spark/commit/af5146672228c34fe1bc0c720bf6d4cd267f9747).



[GitHub] spark issue #14426: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-09-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14426
  
**[Test build #64875 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64875/consoleFull)**
 for PR 14426 at commit 
[`2cc19b3`](https://github.com/apache/spark/commit/2cc19b362745a6d55c5102eadf55e65f191709f5).



[GitHub] spark issue #14116: [SPARK-16452][SQL] Support basic INFORMATION_SCHEMA

2016-09-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14116
  
**[Test build #64876 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64876/consoleFull)**
 for PR 14116 at commit 
[`d7bfc7b`](https://github.com/apache/spark/commit/d7bfc7b4ad1d350932b5d6a09327b25ad9b3d315).



[GitHub] spark issue #14623: [SPARK-17044][SQL] Make test files for window functions ...

2016-09-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14623
  
**[Test build #64873 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64873/consoleFull)**
 for PR 14623 at commit 
[`2b8f2cb`](https://github.com/apache/spark/commit/2b8f2cba5e41ac9ec8d6a31723cac0b9640d24ac).



[GitHub] spark issue #14638: [SPARK-11374][SQL] Support `skip.header.line.count` opti...

2016-09-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14638
  
**[Test build #64872 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64872/consoleFull)**
 for PR 14638 at commit 
[`3857e32`](https://github.com/apache/spark/commit/3857e321ac86c5e4777b508eb60999312a233e99).



[GitHub] spark issue #14941: [SPARK-16334] Reusing same dictionary column for decodin...

2016-09-02 Thread davies

Github user davies commented on the issue:

https://github.com/apache/spark/pull/14941
  
LGTM, pending jenkins.



[GitHub] spark issue #14942: [SparkR][Minor] Fix docs for sparkR.session and count

2016-09-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14942
  
**[Test build #64871 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64871/consoleFull)**
 for PR 14942 at commit 
[`41db2cb`](https://github.com/apache/spark/commit/41db2cbe02afae68c82297f76d685fe4e6edf10c).



[GitHub] spark issue #14941: [SPARK-16334] Reusing same dictionary column for decodin...

2016-09-02 Thread sameeragarwal

Github user sameeragarwal commented on the issue:

https://github.com/apache/spark/pull/14941
  
cc @davies 



[GitHub] spark pull request #14942: [SparkR][Minor] Fix docs for sparkR.session and c...

2016-09-02 Thread junyangq

GitHub user junyangq opened a pull request:

https://github.com/apache/spark/pull/14942

[SparkR][Minor] Fix docs for sparkR.session and count

## What changes were proposed in this pull request?

This PR adds more explanation to `sparkR.session`. It also modifies the doc 
for `count` so that, when grouped in one doc, the description doesn't 
confuse users.

## How was this patch tested?

Manual test.

![screen shot 2016-09-02 at 1 21 36 
pm](https://cloud.githubusercontent.com/assets/15318264/18217198/409613ac-7110-11e6-8dae-cb0c8df557bf.png)




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/junyangq/spark fixSparkRSessionDoc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14942.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14942


commit 41db2cbe02afae68c82297f76d685fe4e6edf10c
Author: Junyang Qian 
Date:   2016-09-02T20:15:12Z

Fix doc for sparkR.session and count.





[GitHub] spark issue #12436: [SPARK-14649][CORE] DagScheduler should not run duplicat...

2016-09-02 Thread davies

Github user davies commented on the issue:

https://github.com/apache/spark/pull/12436
  
@sitalkedia I had a quick look at this one. The use case sounds good; we 
should improve the stability of long-running tasks. Could you explain a bit 
more about how the current patch works (in the PR description)?



[GitHub] spark issue #14941: [SPARK-16334] Reusing same dictionary column for decodin...

2016-09-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14941
  
**[Test build #64870 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64870/consoleFull)**
 for PR 14941 at commit 
[`efda298`](https://github.com/apache/spark/commit/efda29864506b4a9eb716652e0fcf5cd705c9b4c).



[GitHub] spark pull request #14941: [SPARK-16334] Reusing same dictionary column for ...

2016-09-02 Thread sameeragarwal

GitHub user sameeragarwal opened a pull request:

https://github.com/apache/spark/pull/14941

[SPARK-16334] Reusing same dictionary column for decoding consecutive row 
groups shouldn't throw an error

## What changes were proposed in this pull request?

This patch fixes a bug in the vectorized parquet reader that's caused by 
re-using the same dictionary column vector while reading consecutive row 
groups. Specifically, this issue manifests for a certain distribution of 
dictionary/plain encoded data while we read/populate the underlying bit packed 
dictionary data into a column-vector based data structure.

## How was this patch tested?

Manually tested on datasets provided by the community. Thanks to Chris 
Perluss and Keith Kraus for their invaluable help in tracking down this issue!

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sameeragarwal/spark parquet-exception-2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14941.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14941


commit efda29864506b4a9eb716652e0fcf5cd705c9b4c
Author: Sameer Agarwal 
Date:   2016-09-02T19:03:36Z

Reusing dictionary column vectors for reading consecutive row groups 
shouldn't throw an error





[GitHub] spark issue #14882: [SPARK-17316][Core] Make CoarseGrainedSchedulerBackend.r...

2016-09-02 Thread zsxwing

Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/14882
  
I just cherry-picked this one into branch 1.6.



[GitHub] spark issue #14854: [SPARK-17283][Core] Cancel job in RDD.take() as soon as ...

2016-09-02 Thread JoshRosen

Github user JoshRosen commented on the issue:

https://github.com/apache/spark/pull/14854
  
I'm actually going to close this now and will revisit later; the scheduling 
complexity may not be warranted now given benefits of simpler approaches.



[GitHub] spark pull request #14854: [SPARK-17283][Core] Cancel job in RDD.take() as s...

2016-09-02 Thread JoshRosen

Github user JoshRosen closed the pull request at:

https://github.com/apache/spark/pull/14854



[GitHub] spark issue #14938: [SPARK-17335][SQL] Fix ArrayType and MapType CatalogStri...

2016-09-02 Thread hvanhovell

Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/14938
  
I just compared the wide schema benchmark on master with this patch and 
there do not seem to be performance regressions.



[GitHub] spark issue #14866: [SPARK-17298][SQL] Require explicit CROSS join for carte...

2016-09-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14866
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64863/
Test PASSed.



[GitHub] spark issue #14866: [SPARK-17298][SQL] Require explicit CROSS join for carte...

2016-09-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14866
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14866: [SPARK-17298][SQL] Require explicit CROSS join for carte...

2016-09-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14866
  
**[Test build #64863 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64863/consoleFull)**
 for PR 14866 at commit 
[`7f3d67f`](https://github.com/apache/spark/commit/7f3d67fb6f4c49e14f67b4dda2e0e11e076808e5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14866: [SPARK-17298][SQL] Require explicit CROSS join for carte...

2016-09-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14866
  
**[Test build #64869 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64869/consoleFull)**
 for PR 14866 at commit 
[`2509e45`](https://github.com/apache/spark/commit/2509e451326d673ba6ea9d4d9a4e3991ea73b291).



[GitHub] spark issue #14866: [SPARK-17298][SQL] Require explicit CROSS join for carte...

2016-09-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14866
  
**[Test build #3245 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3245/consoleFull)**
 for PR 14866 at commit 
[`2509e45`](https://github.com/apache/spark/commit/2509e451326d673ba6ea9d4d9a4e3991ea73b291).



[GitHub] spark issue #14881: [SPARK-17315][SparkR] Kolmogorov-Smirnov test SparkR wra...

2016-09-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14881
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64865/
Test PASSed.



[GitHub] spark issue #14881: [SPARK-17315][SparkR] Kolmogorov-Smirnov test SparkR wra...

2016-09-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14881
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14881: [SPARK-17315][SparkR] Kolmogorov-Smirnov test SparkR wra...

2016-09-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14881
  
**[Test build #64865 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64865/consoleFull)**
 for PR 14881 at commit 
[`caeb91e`](https://github.com/apache/spark/commit/caeb91eb42ec47efd428c9a174d9d54c45f290fb).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14854: [SPARK-17283][Core] Cancel job in RDD.take() as soon as ...

2016-09-02 Thread JoshRosen

Github user JoshRosen commented on the issue:

https://github.com/apache/spark/pull/14854
  
@davies brought up a reasonable point that we might be able to achieve 
similar benefits with less complexity by replacing the exponential ramp-up with 
something that's linearly proportional to the amount of available executor 
cores, thereby running a larger number of smaller jobs. That approach is going 
to incur more per-job overheads but avoids adding any scheduler complexity.
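
The trade-off described above can be sketched in a few lines of Python. This is a simplified model, not Spark's actual `RDD.take()` code: the function names, the growth factor of 4, and the one-partition-per-core step size are all assumptions chosen for illustration. It lists how many partitions each successive job would scan in the worst case, where every partition has to be visited.

```python
def exponential_schedule(total_partitions, factor=4):
    """Partitions scanned per job when each job scans `factor` times
    the number of partitions scanned so far (hypothetical ramp-up)."""
    scanned, plan = 0, []
    while scanned < total_partitions:
        step = 1 if scanned == 0 else scanned * factor
        step = min(step, total_partitions - scanned)  # never overshoot
        plan.append(step)
        scanned += step
    return plan


def linear_schedule(total_partitions, executor_cores, per_core=1):
    """Partitions scanned per job when every job scans a fixed number
    of partitions proportional to the available executor cores."""
    step_size = max(1, executor_cores * per_core)
    scanned, plan = 0, []
    while scanned < total_partitions:
        step = min(step_size, total_partitions - scanned)
        plan.append(step)
        scanned += step
    return plan


if __name__ == "__main__":
    print(exponential_schedule(100))  # few jobs, the last ones are large
    print(linear_schedule(100, 16))   # more jobs, each bounded by core count
```

In this model the linear plan runs more jobs (more per-job overhead), but no single job is larger than what the cluster can run in one wave, so there is less wasted work to cancel once enough rows have been collected.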



[GitHub] spark issue #14866: [SPARK-17298][SQL] Require explicit CROSS join for carte...

2016-09-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14866
  
Merged build finished. Test FAILed.



[GitHub] spark issue #14866: [SPARK-17298][SQL] Require explicit CROSS join for carte...

2016-09-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14866
  
**[Test build #64868 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64868/consoleFull)**
 for PR 14866 at commit 
[`686b549`](https://github.com/apache/spark/commit/686b54986875cd9d47d4b772764af06ba301d96e).



[GitHub] spark issue #14866: [SPARK-17298][SQL] Require explicit CROSS join for carte...

2016-09-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14866
  
**[Test build #64868 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64868/consoleFull)**
 for PR 14866 at commit 
[`686b549`](https://github.com/apache/spark/commit/686b54986875cd9d47d4b772764af06ba301d96e).
 * This patch **fails some tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14866: [SPARK-17298][SQL] Require explicit CROSS join for carte...

2016-09-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14866
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64868/
Test FAILed.



[GitHub] spark issue #14940: [SPARK-17383][GRAPHX]LabelPropagation

2016-09-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14940
  
Can one of the admins verify this patch?



[GitHub] spark issue #14866: [SPARK-17298][SQL] Require explicit CROSS join for carte...

2016-09-02 Thread srinathshankar

Github user srinathshankar commented on the issue:

https://github.com/apache/spark/pull/14866
  
I'll update the Python and R APIs in a follow-up. Right now in Python and R 
a cross join is done if no join expressions/columns and join types are specified. 
It would be good to require explicit cross joins in these APIs as well.
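
As a rough illustration of the behaviour being proposed, here is a plain-Python sketch that does not use Spark. The `join` function, its `on` and `how` parameters, and the error messages are hypothetical and are not the PySpark API: a join with no condition is rejected unless the caller asks for a cross join explicitly.

```python
from itertools import product


def join(left, right, on=None, how="inner"):
    """Join two lists of rows. `on` is a predicate over (left_row, right_row).

    A cartesian product is only produced when how="cross" is requested;
    a missing condition otherwise raises instead of silently cross joining.
    """
    if how == "cross":
        if on is not None:
            raise ValueError("a cross join takes no join condition")
        return list(product(left, right))
    if on is None:
        raise ValueError(
            "no join condition given; pass how='cross' for a cartesian product"
        )
    return [(l, r) for l, r in product(left, right) if on(l, r)]


if __name__ == "__main__":
    print(join([1, 2], [3, 4], how="cross"))
    print(join([1, 2], [2, 3], on=lambda l, r: l == r))
```

The point of the design is that an accidentally omitted join condition fails fast rather than producing a potentially huge cartesian product.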



[GitHub] spark pull request #14940: [SPARK-17383][GRAPHX]LabelPropagation

2016-09-02 Thread bookling

GitHub user bookling opened a pull request:

https://github.com/apache/spark/pull/14940

[SPARK-17383][GRAPHX]LabelPropagation

In the LabelPropagation of the graphx lib, each node is initialized with a unique
label, and at every step each node adopts the label that most of its 
neighbors currently have, but ignores the label it currently has. I think this is 
unreasonable, because the label a node already has is also useful. When a node tends 
toward a stable label, there is an association between two consecutive iterations, 
so a node is affected not only by its neighbors but also by its current label.
So I changed the code to use both the labels of its neighbors and its own label.

Through this iterative process, densely connected groups of nodes form a consensus 
on a unique label to form
communities. But the communities found by LabelPropagation are often 
discontinuous.
This is because several labels can be tied for the most common label among a 
node's neighbors; e.g., node "0" has 6 neighbors labeled {"1","1","2","2","3","3"}, 
so it may randomly select a label. To get stable community labels and 
prevent this randomness, I chose the max label.

You can test with a graph with edges {10L->11L, 10L->12L, 
11L->12L, 11L->14L, 12L->14L, 13L->14L, 13L->15L, 13L->16L, 15L->16L, 15L->17L, 16L->17L}, 
or a dandelion shape {1L->2L, 2L->7L, 2L->3L, 2L->4L, 2L->5L, 2L->6L}, etc.
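
The rule described above can be sketched in plain Python. This is not the GraphX code and not the patch itself: the function name, the synchronous update, and the treatment of edges as undirected are assumptions made for illustration. Each node counts its neighbors' labels plus its own current label, then takes the most frequent one, breaking ties with the largest label.

```python
from collections import Counter


def label_propagation(edges, max_steps):
    """Synchronous label propagation where a node's own label gets a vote
    and ties are broken deterministically by taking the max label."""
    # Build an undirected adjacency map (assumption: direction is ignored).
    neighbors = {}
    for src, dst in edges:
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)

    # Every node starts with its own id as its label.
    labels = {v: v for v in neighbors}

    for _ in range(max_steps):
        new_labels = {}
        for v, nbrs in neighbors.items():
            counts = Counter(labels[n] for n in nbrs)
            counts[labels[v]] += 1  # the node's current label also votes
            best = max(counts.values())
            # Deterministic tie-break: largest label among the most frequent.
            new_labels[v] = max(l for l, c in counts.items() if c == best)
        if new_labels == labels:  # converged, nothing changed this step
            break
        labels = new_labels
    return labels


if __name__ == "__main__":
    dandelion = [(1, 2), (2, 7), (2, 3), (2, 4), (2, 5), (2, 6)]
    print(label_propagation(dandelion, max_steps=5))
```

Because the tie-break never involves a random choice, repeated runs on the same graph give the same communities, which is the stability the description is after.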



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bookling/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14940.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14940


commit 11bdab6bb042cd2102570c96db17279cf6ebbd92
Author: bookling 
Date:   2016-08-30T17:51:43Z

to solve "label shock"

I have tested the result, which is more reasonable.
LabelPropagation often suffers from "label shock", and the resulting 
communities are often non-adjacent.
I think the label of a node is useful between adjacent supersteps, and 
the adjacent supersteps are relevant.

commit bb875fef8f47ec99878d972f2c17b50123375a4c
Author: bookling 
Date:   2016-08-30T17:55:06Z

to reduce "label shock"

I have tested the result, which is more reasonable.
LabelPropagation often suffers from "label shock", and the resulting 
communities are often non-adjacent.
I think the label of a node is useful between adjacent supersteps, and 
the adjacent supersteps are relevant.

commit 60e6f0ee2a3cdfb2b526a6d12887513f3aabed42
Author: XiaoSen Lee 
Date:   2016-09-02T18:57:29Z

Improve labelPropagation of graphx lib



In the LabelPropagation of the graphx lib, each node is initialized with a unique
label, and at every step each node adopts the label that most of its 
neighbors currently have, but ignores the label it currently has. I think this is 
unreasonable, because the label a node already has is also useful. When a node tends 
toward a stable label, there is an association between two consecutive iterations, 
so a node is affected not only by its neighbors but also by its current label.
So I changed the code to use both the labels of its neighbors and its own label.

Through this iterative process, densely connected groups of nodes form a consensus 
on a unique label to form
communities. But the communities found by LabelPropagation are often 
discontinuous.
This is because several labels can be tied for the most common label among a 
node's neighbors; e.g., node "0" has 6 neighbors labeled {"1","1","2","2","3","3"}, 
so it may randomly select a label. To get stable community labels and 
prevent this randomness, I chose the max label.

You can test with a graph with edges {10L->11L, 10L->12L, 
11L->12L, 11L->14L, 12L->14L, 13L->14L, 13L->15L, 13L->16L, 15L->16L, 15L->17L, 16L->17L}, 
or a dandelion shape {1L->2L, 2L->7L, 2L->3L, 2L->4L, 2L->5L, 2L->6L}, etc.





[GitHub] spark issue #14931: [SPARK-17370] Shuffle service files not invalidated when...

2016-09-02 Thread ericl

Github user ericl commented on the issue:

https://github.com/apache/spark/pull/14931
  
What if we added a flag to SlaveLost indicating if we think the entire host 
is lost? In many cases that should be true, if the event originated from worker 
loss or Mesos slave loss events.



[GitHub] spark issue #14938: [SPARK-17335][SQL] Fix ArrayType and MapType CatalogStri...

2016-09-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14938
  
**[Test build #64867 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64867/consoleFull)**
 for PR 14938 at commit 
[`b57bbb6`](https://github.com/apache/spark/commit/b57bbb6704cd360427126da2e2e1ef2e8f758e93).



[GitHub] spark issue #14931: [SPARK-17370] Shuffle service files not invalidated when...

2016-09-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14931
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64862/
Test FAILed.



[GitHub] spark issue #14931: [SPARK-17370] Shuffle service files not invalidated when...

2016-09-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14931
  
Merged build finished. Test FAILed.



[GitHub] spark issue #14931: [SPARK-17370] Shuffle service files not invalidated when...

2016-09-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14931
  
**[Test build #64862 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64862/consoleFull)**
 for PR 14931 at commit 
[`2430b69`](https://github.com/apache/spark/commit/2430b698db4062aeded30018dceffc2700d32fe5).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #14797: [SPARK-17230] [SQL] Should not pass optimized que...

2016-09-02 Thread davies

Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/14797#discussion_r77394975
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -479,13 +480,23 @@ case class DataSource(
   }
 }
 
+// SPARK-17230: Resolve the partition columns so InsertIntoHadoopFsRelationCommand does
+// not need to have the query as child, to avoid to analyze an optimized query,
+// because InsertIntoHadoopFsRelationCommand will be optimized first.
+val columns = partitionColumns.map { name =>
--- End diff --

This is only for write(); it does not have `val partitionSchema =` (the others 
do).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14939: [SPARK-17376][SPARKR] followup - change since version

2016-09-02 Thread shivaram

Github user shivaram commented on the issue:

https://github.com/apache/spark/pull/14939
  
Ah thanks - I didn't notice this while merging the earlier PR.



[GitHub] spark issue #14854: [SPARK-17283][Core] Cancel job in RDD.take() as soon as ...

2016-09-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14854
  
**[Test build #64866 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64866/consoleFull)**
 for PR 14854 at commit 
[`32c3959`](https://github.com/apache/spark/commit/32c395966ed085371af025dc44d690280c726ea9).



[GitHub] spark issue #14881: [SPARK-17315][SparkR] Kolmogorov-Smirnov test SparkR wra...

2016-09-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14881
  
**[Test build #64865 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64865/consoleFull)** for PR 14881 at commit [`caeb91e`](https://github.com/apache/spark/commit/caeb91eb42ec47efd428c9a174d9d54c45f290fb).



[GitHub] spark issue #14828: [SPARK-17258][SQL] Parse scientific decimal literals as ...

2016-09-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14828
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14828: [SPARK-17258][SQL] Parse scientific decimal literals as ...

2016-09-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14828
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64861/
Test PASSed.


