[GitHub] spark pull request: [SPARK-13734][SPARKR] Added histogram function

2016-03-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11569#issuecomment-193585815
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52628/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13734][SPARKR] Added histogram function

2016-03-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11569#issuecomment-193585535
  
**[Test build #52628 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52628/consoleFull)**
 for PR 11569 at commit 
[`ac8f4c9`](https://github.com/apache/spark/commit/ac8f4c9ca56b592c32c60dc945023050df89bdb4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13600] [MLlib] [WIP] Incorrect number o...

2016-03-07 Thread oliverpierson
Github user oliverpierson commented on the pull request:

https://github.com/apache/spark/pull/11553#issuecomment-193584221
  
Putting this up for review now. Tests are passing on my machine. Using 
`approxQuantile` in DataFrame stats reduces the amount of code required by a 
good bit.

As for the default `relativeError` value, which is passed on to 
`approxQuantile`... perhaps @jkbradley has a suggestion? I basically chose 
0.01 on a whim, since I couldn't really make a compelling argument for any 
particular value.
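
For context on what a `relativeError` of 0.01 means in practice: as documented for `approxQuantile`, with N rows and a requested probability p, the returned value has an exact rank between floor((p - err) * N) and ceil((p + err) * N). The sketch below checks that bound against an exact in-memory sample. It uses plain Scala collections rather than Spark, and the names `RelativeErrorSketch`, `rank`, and `withinBound` are made up for illustration.

```scala
// Illustrative only: checks the documented approxQuantile rank bound
// against an exact, in-memory sorted sample. No Spark dependency.
object RelativeErrorSketch {
  /** Number of elements <= x in the sample, i.e. the 1-based rank of x. */
  def rank(sorted: Vector[Double], x: Double): Int = sorted.count(_ <= x)

  /** True if x is an acceptable answer for quantile p at relative error err. */
  def withinBound(sorted: Vector[Double], x: Double, p: Double, err: Double): Boolean = {
    val n = sorted.length
    val lo = math.floor((p - err) * n) // lowest acceptable rank
    val hi = math.ceil((p + err) * n)  // highest acceptable rank
    val r = rank(sorted, x)
    r >= lo && r <= hi
  }
}
```

With 1000 rows and an error of 0.01, any value ranked roughly 490th to 510th is an acceptable median, which may help when weighing a default.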





[GitHub] spark pull request: [SPARK-13732] [SQL] Remove projectList from Wi...

2016-03-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11565#issuecomment-193583045
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-13732] [SQL] Remove projectList from Wi...

2016-03-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11565#issuecomment-193583046
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52620/





[GitHub] spark pull request: [SPARK-13732] [SQL] Remove projectList from Wi...

2016-03-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11565#issuecomment-193582844
  
**[Test build #52620 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52620/consoleFull)**
 for PR 11565 at commit 
[`467b095`](https://github.com/apache/spark/commit/467b095d89ce641f568aade09d710fb9ea573273).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13667][SQL] Support for specifying cust...

2016-03-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11550#issuecomment-193582705
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-13667][SQL] Support for specifying cust...

2016-03-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11550#issuecomment-193582706
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52621/





[GitHub] spark pull request: [SPARK-13667][SQL] Support for specifying cust...

2016-03-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11550#issuecomment-193582028
  
**[Test build #52621 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52621/consoleFull)**
 for PR 11550 at commit 
[`c51d4ef`](https://github.com/apache/spark/commit/c51d4efbe72e4713a53ee7706996bef837d79fa5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13734][SPARKR] Added histogram function

2016-03-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11569#issuecomment-193580343
  
**[Test build #52628 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52628/consoleFull)**
 for PR 11569 at commit 
[`ac8f4c9`](https://github.com/apache/spark/commit/ac8f4c9ca56b592c32c60dc945023050df89bdb4).





[GitHub] spark pull request: [SPARK-13732] [SQL] Remove projectList from Wi...

2016-03-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/11565#discussion_r55311096
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -345,8 +343,6 @@ object ColumnPruning extends Rule[LogicalPlan] {
 // Prunes the unused columns from child of Aggregate/Window/Expand/Generate
 case a @ Aggregate(_, _, child) if (child.outputSet -- a.references).nonEmpty =>
   a.copy(child = prunedChild(child, a.references))
-case w @ Window(_, _, _, _, child) if (child.outputSet -- w.references).nonEmpty =>
--- End diff --

It seems we don't even have any tests for this in `ColumnPruningSuite`.
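
As a rough illustration of what such a test would check, here is a minimal sketch of the pruning pattern from the diff, modelled on the `Aggregate` case. The plan classes are simplified stand-ins invented for this example and are not Catalyst's; `prune` only mirrors the `child.outputSet -- references` guard.

```scala
// Hypothetical, simplified plan nodes; these are not Catalyst's classes.
object ColumnPruningSketch {
  sealed trait Plan { def output: Set[String] }
  case class Relation(output: Set[String]) extends Plan
  case class Project(output: Set[String], child: Plan) extends Plan
  // `references` are the child columns this node actually reads.
  case class Aggregate(references: Set[String], output: Set[String], child: Plan) extends Plan

  /** Inserts a Project under the node when its child produces unused columns. */
  def prune(plan: Plan): Plan = plan match {
    case a @ Aggregate(refs, _, child) if (child.output -- refs).nonEmpty =>
      a.copy(child = Project(child.output.intersect(refs), child))
    case other => other
  }
}
```

A suite-style check would compare the rewritten plan with the expected one and confirm that a second application of the rule leaves it unchanged.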





[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-03-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-193579525
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-03-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-193579527
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52617/





[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-03-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-193578999
  
**[Test build #52617 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52617/consoleFull)**
 for PR 11235 at commit 
[`312cb32`](https://github.com/apache/spark/commit/312cb326922624e95528b7f2dc92129c59b3b524).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [HOT-FIX][BUILD] Use the new location of `chec...

2016-03-07 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the pull request:

https://github.com/apache/spark/pull/11567#issuecomment-193578502
  
The `PySpark` failure is unrelated to this PR, but I rebased this PR onto 
master because the problem it addresses is still present.





[GitHub] spark pull request: [HOT-FIX][BUILD] Use the new location of `chec...

2016-03-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11567#issuecomment-193578669
  
**[Test build #52627 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52627/consoleFull)**
 for PR 11567 at commit 
[`4a58fba`](https://github.com/apache/spark/commit/4a58fba530df6e4b665389804908d04da88e7d4f).





[GitHub] spark pull request: [SPARK-13667][SQL] Support for specifying cust...

2016-03-07 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/11550#issuecomment-193576069
  
@falaki Just to let you know, I renamed `CSVInferSchema` to `InferSchema`, 
mainly to keep the names consistent between the CSV and JSON data sources, but 
maybe they should be `CSVInferSchema` and `JSONInferSchema` instead.





[GitHub] spark pull request: [SPARK-12718][SPARK-13720][SQL] SQL generation...

2016-03-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11555#issuecomment-193573893
  
**[Test build #52626 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52626/consoleFull)**
 for PR 11555 at commit 
[`c82229a`](https://github.com/apache/spark/commit/c82229a42efec9131652435b9543df81d1feab6c).





[GitHub] spark pull request: [SPARK-9325][SPARK-R] collect() head() and sho...

2016-03-07 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/11336#issuecomment-193573753
  
I personally find it confusing having to reason about when we can 
"head"/"collect"/"show" and when we cannot, and that's why the Scala/Python 
version of the API didn't have this feature.






[GitHub] spark pull request: [SPARK-12719][SQL] SQL generation support for ...

2016-03-07 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/11563#issuecomment-193572743
  
"as we wanted to generate SQLs which is closer to the original SQL"

Why is this a goal? I worry about the fragility of having these two cases if 
we really only need one to satisfy correctness.






[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-03-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/11235#discussion_r55309414
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -144,6 +146,56 @@ object EliminateSerialization extends Rule[LogicalPlan] {
 }
 
 /**
+ * Add Filter to left and right of an inner Join to filter out rows with null keys.
+ * So we may not need to check nullability of keys while joining. Besides, by filtering
+ * out keys with null, we can also reduce data size in Join.
+ */
+object AddFilterOfNullForInnerJoin extends Rule[LogicalPlan] with PredicateHelper {

Renamed. I will handle the semi join and outer join parts in a separate PR 
once this one is merged.
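
The idea behind the rule can be sketched outside Catalyst: for an inner equi-join, a row whose key is null can never match, so dropping such rows from both sides first gives the same result with less data to join. The example below is an assumption-laden illustration in plain Scala, with `None` standing in for a NULL key; `NullKeyFilterSketch` and its methods are invented names, not Spark APIs.

```scala
// Illustrative only: None models a NULL join key.
object NullKeyFilterSketch {
  type Row = (Option[Int], String)

  /** Baseline inner equi-join that checks key nullability for every pair. */
  def joinCheckingNulls(left: Seq[Row], right: Seq[Row]): Seq[(Int, String, String)] =
    for {
      (lk, lv) <- left
      (rk, rv) <- right
      k <- lk.toSeq      // a NULL left key contributes nothing
      if rk.contains(k)  // a NULL right key never matches
    } yield (k, lv, rv)

  /** Same join, with NULL-keyed rows filtered out of both sides first. */
  def joinAfterFilteringNulls(left: Seq[Row], right: Seq[Row]): Seq[(Int, String, String)] = {
    val l = left.collect { case (Some(k), v) => (k, v) }
    val r = right.collect { case (Some(k), v) => (k, v) }
    for ((lk, lv) <- l; (rk, rv) <- r if lk == rk) yield (lk, lv, rv)
  }
}
```

Both functions return the same rows; the second one never compares nullable keys inside the join loop.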





[GitHub] spark pull request: [SPARK-13734][SPARKR] Added histogram function

2016-03-07 Thread sun-rui
Github user sun-rui commented on the pull request:

https://github.com/apache/spark/pull/11569#issuecomment-193566395
  
It seems better to keep SparkR as a base package providing core 
functionalities, while visualization features can be implemented in other 
packages based on SparkR. There is an example at 
https://github.com/PAPL-SKKU/ggplot2.SparkR.





[GitHub] spark pull request: [SPARK-12719][SQL] SQL generation support for ...

2016-03-07 Thread dilipbiswal
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/11563#discussion_r55309116
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/LogicalPlanToSQLSuite.scala 
---
@@ -445,4 +461,86 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest with SQLTestUtils {
   "f1", "b[0].f1", "f1", "c[foo]", "d[0]"
 )
   }
+
+  test("SQL generation for generate") {
--- End diff --

@rxin I have split the tests into 5 groups. Please let me know if it looks OK 
to you.





[GitHub] spark pull request: [SPARK-13738][SQL] Cleanup Data Source resolut...

2016-03-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11572#issuecomment-193565461
  
**[Test build #52625 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52625/consoleFull)**
 for PR 11572 at commit 
[`cf7c719`](https://github.com/apache/spark/commit/cf7c719b72896450affad9b866ad9077a6140e40).





[GitHub] spark pull request: [SPARK-12718][SPARK-13720][SQL] SQL generation...

2016-03-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11555#issuecomment-193562698
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-10387][ML] Add code gen for gbt

2016-03-07 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/9524#discussion_r55308566
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/codeGenerator.scala 
---
@@ -0,0 +1,156 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.tree
+
+import org.codehaus.janino.ClassBodyEvaluator
+
+import org.apache.spark.Logging
+import org.apache.spark.mllib.linalg.{Vector, Vectors}
+
+/**
+ * An object for creating a code generated decision tree model.
+ * NodeToTree is used to convert a node to a series of code gen
+ * if/else statements returning the prediction for a
+ * given vector.
+ * getScorer wraps this and provides a function we can use to get
+ * the prediction.
+ */
+private[spark] object CodeGenerationDecisionTreeModel extends Logging {
+  private val prefix = "mllibCodeGen"
+  private val curId = new java.util.concurrent.atomic.AtomicInteger()
+
+  /**
+   * Compile the Java source code into a Java class, using Janino.
+   * Based on Spark SQL's implementation. This should be moved to a common class
+   * once we have multiple code generators in ML.
+   *
+   * It will track the time used to compile
+   */
+  protected def compile(code: String, implements: Array[Class[_]]): Class[_] = {
+val startTime = System.nanoTime()
+val evaluator = new ClassBodyEvaluator()
+val clName = freshName()
+evaluator.setParentClassLoader(getClass.getClassLoader)
+evaluator.setImplementedInterfaces(implements)
+evaluator.setClassName(clName)
+evaluator.setDefaultImports(Array(
+  "org.apache.spark.mllib.linalg.Vectors",
+  "org.apache.spark.mllib.linalg.Vector"
+))
+evaluator.cook(s"${clName}.java", code)
+val clazz = evaluator.getClazz()
+val endTime = System.nanoTime()
+def timeMs: Double = (endTime - startTime).toDouble / 1000000 // nanoseconds to milliseconds
+logDebug(s"Compiled Java code (${code.size} bytes) in $timeMs ms")
+clazz
+  }
+
+  protected def freshName(): String = {
+s"$prefix${curId.getAndIncrement}"
+  }
+
+
+  /**
+   * Convert the tree starting at the provided root node into a code generated
+   * series of if/else statements. If the tree is too large to fit in a single
+   * in-line method, it is broken up into multiple methods.
+   * Returns a string for the current function body and a string of any additional
+   * functions.
+   */
+  def nodeToTree(root: Node, depth: Int): (String, String) = {
+// Handle the different types of nodes
+root match {
+  case node: InternalNode => {
+// Handle trees that get too large to fit in a single in-line java method
+depth match {
+  case 8 => {
+val newFunctionName = freshName()
+val newFunction = nodeToFunction(root, newFunctionName)
+(s"return ${newFunctionName}(input);", newFunction)
+  }
+  case _ => {
+val nodeSplit = node.split
+val (leftSubCode, leftSubFunction) = nodeToTree(node.leftChild, depth + 1)
+val (rightSubCode, rightSubFunction) = nodeToTree(node.rightChild, depth + 1)
+val subCode = nodeSplit match {
+  case split: CategoricalSplit => {
+val isLeft = split.isLeft
+isLeft match {
+  case true => s"""
+  if (categories.contains(input.apply(${split.featureIndex}))) {
--- End diff --

Sounds like a good idea, I'll take a look at in-lining this for small sets.
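
One way the in-lining could look, purely as a sketch: for a small category set, the generator emits a chain of equality checks instead of a `categories.contains(...)` call. The object name, the `inlineThreshold` value, and the exact emitted strings are assumptions made for this example; the thread does not say what counts as "small".

```scala
// Illustrative only: builds the Java condition text for a categorical split.
object CategoricalConditionSketch {
  // Hypothetical cut-off; not taken from the pull request.
  val inlineThreshold = 4

  /** Returns the condition string for a split on `featureIndex`. */
  def condition(featureIndex: Int, categories: Seq[Double]): String = {
    val feature = s"input.apply($featureIndex)"
    if (categories.nonEmpty && categories.size <= inlineThreshold) {
      // Small set: in-line one equality test per category.
      categories.map(c => s"$feature == $c").mkString("(", " || ", ")")
    } else {
      // Large (or empty) set: fall back to the lookup used in the diff.
      s"categories.contains($feature)"
    }
  }
}
```

Whether the in-lined form is actually faster would need measuring; the sketch only shows the shape of the generated condition.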



[GitHub] spark pull request: [SPARK-12718][SPARK-13720][SQL] SQL generation...

2016-03-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11555#issuecomment-193562700
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52616/





[GitHub] spark pull request: [SPARK-13737][SQL][wip]Add getOrCreate method ...

2016-03-07 Thread mwws
Github user mwws closed the pull request at:

https://github.com/apache/spark/pull/11571





[GitHub] spark pull request: [SPARK-12718][SPARK-13720][SQL] SQL generation...

2016-03-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11555#issuecomment-193562358
  
**[Test build #52616 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52616/consoleFull)**
 for PR 11555 at commit 
[`656a13a`](https://github.com/apache/spark/commit/656a13a84be56de2a6806296492951016082092e).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13738][SQL] Cleanup Data Source resolut...

2016-03-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11572#issuecomment-193562059
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52624/





[GitHub] spark pull request: [SPARK-13738][SQL] Cleanup Data Source resolut...

2016-03-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11572#issuecomment-193562058
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-13737][SQL][wip]Add getOrCreate method ...

2016-03-07 Thread mwws
Github user mwws commented on the pull request:

https://github.com/apache/spark/pull/11571#issuecomment-193562023
  
OK, thanks for the explanation, I will close this PR.





[GitHub] spark pull request: [SPARK-13738][SQL] Cleanup Data Source resolut...

2016-03-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11572#issuecomment-193562054
  
**[Test build #52624 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52624/consoleFull)**
 for PR 11572 at commit 
[`36969f8`](https://github.com/apache/spark/commit/36969f8671ed396e9ed2027b8ee2c7435bbf7dfc).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13738][SQL] Cleanup Data Source resolut...

2016-03-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11572#issuecomment-193561759
  
**[Test build #52624 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52624/consoleFull)**
 for PR 11572 at commit 
[`36969f8`](https://github.com/apache/spark/commit/36969f8671ed396e9ed2027b8ee2c7435bbf7dfc).





[GitHub] spark pull request: [SPARK-9325][SPARK-R] collect() head() and sho...

2016-03-07 Thread sun-rui
Github user sun-rui commented on the pull request:

https://github.com/apache/spark/pull/11336#issuecomment-193561063
  
A column can be applied to different DataFrames.
For example, if both df1 and df2 have a column named "col", then
col <- column("col")
collect(select(df1, col))
collect(select(df2, col))
both work.

Taking the join case above as an example,
you can have different DataFrames resulting from different joins on both 
df1 and df2,
and applying c3 to each of the resulting DataFrames also works.

So how do you know which DataFrame to associate with a column in such cases?

@rxin, any comments on this issue?





[GitHub] spark pull request: [SPARK-13695] Don't cache MEMORY_AND_DISK bloc...

2016-03-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11533#issuecomment-193560625
  
**[Test build #2615 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2615/consoleFull)**
 for PR 11533 at commit 
[`8f332a7`](https://github.com/apache/spark/commit/8f332a7c14aff8aebfd8b36ec56fa33b8330605e).





[GitHub] spark pull request: [SPARK-13664][SQL] Cleanup Data Source resolut...

2016-03-07 Thread marmbrus
GitHub user marmbrus opened a pull request:

https://github.com/apache/spark/pull/11572

[SPARK-13664][SQL] Cleanup Data Source resolution

Follow-up to #11509 that simply refactors the interface that we use when 
resolving a pluggable `DataSource`.
 - Multiple functions share the same set of arguments, so we make this a 
case class `DataSource`.  Actual resolution is now done by calling a function.
 - Instead of having multiple methods named `apply` (some of which do 
writing, some of which do reading), we now explicitly have `resolveRelation(...)` 
and `write(...)`.
 - Get rid of `Array[String]` since this is an internal API and was forcing 
us to call `toArray` in a bunch of places.
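
For readers skimming the archive, the shape of the refactoring described in the list above might look roughly like the sketch below. It is not the code from this pull request: only the names `DataSource`, `resolveRelation` and `write` come from the description, while the fields, the `Relation` type and the method bodies are hypothetical placeholders chosen so the example runs without Spark.

```scala
object DataSourceSketch {
  // Hypothetical, simplified stand-in for whatever a resolved relation is.
  final case class Relation(
      provider: String,
      paths: Seq[String],
      partitionColumns: Seq[String])

  // The arguments that several functions used to share are bundled in one
  // case class. Field names are illustrative assumptions.
  final case class DataSource(
      className: String,
      paths: Seq[String] = Nil,
      // Seq instead of Array, so call sites need no toArray conversion.
      partitionColumns: Seq[String] = Nil,
      options: Map[String, String] = Map.empty) {

    /** Read path: turn this description into a relation. */
    def resolveRelation(): Relation =
      Relation(className, paths, partitionColumns)

    /** Write path: this sketch only reports what would be written. */
    def write(rows: Seq[String]): String = {
      val target = options.getOrElse("path", "<no path>")
      s"would write ${rows.size} row(s) as $className to $target"
    }
  }
}
```

With explicitly named methods, a call site such as `DataSource("parquet", paths = Seq("/tmp/in")).resolveRelation()` states whether it reads or writes, which overloaded `apply` methods do not.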

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/marmbrus/spark dataSourceResolution

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11572.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11572


commit 36969f8671ed396e9ed2027b8ee2c7435bbf7dfc
Author: Michael Armbrust 
Date:   2016-03-08T02:18:44Z

[SPARK-13664][SQL] Cleanup Data Source resolution







[GitHub] spark pull request: [SPARK-13737][SQL][wip]Add getOrCreate method ...

2016-03-07 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/11571#issuecomment-193560334
  
But this can only be added to 2.0 (we won't be able to change an existing 
release). If users already need to change the constructor in order to use it, 
why don't they just create a SQLContext/SparkSession?






[GitHub] spark pull request: [SPARK-13432][SQL] add the source file name an...

2016-03-07 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/11301#discussion_r55307944
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala
 ---
@@ -50,7 +50,7 @@ object ExpressionSet {
 class ExpressionSet protected(
 protected val baseSet: mutable.Set[Expression] = new mutable.HashSet,
 protected val originals: mutable.Buffer[Expression] = new ArrayBuffer)
-  extends Set[Expression] {
+  extends Set[Expression] with Serializable {
--- End diff --

Yes, I got a non-serializable exception in the test suites in `hive` when 
`ExpressionSet` is not `Serializable`. This is why I added `Serializable` 
to `ExpressionSet`.
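
The general failure mode mentioned here can be reproduced outside Spark with plain Java serialization. The sketch below is only an illustration under that assumption: `PlainWrapper` and `SerializableWrapper` are hypothetical stand-ins, not `ExpressionSet`, and the helper `canSerialize` is made up for the example.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}
import scala.collection.mutable

object SerializableSketch {
  // Hypothetical class that wraps a mutable set but is not Serializable.
  final class PlainWrapper(
      val baseSet: mutable.Set[String] = new mutable.HashSet[String])

  // Same shape, but mixing in Serializable, analogous to the change in the diff.
  final class SerializableWrapper(
      val baseSet: mutable.Set[String] = new mutable.HashSet[String])
    extends Serializable

  /** Returns true if Java serialization of `value` succeeds. */
  def canSerialize(value: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try {
      out.writeObject(value)
      true
    } catch {
      case _: NotSerializableException => false
    } finally {
      out.close()
    }
  }
}
```

Writing a `PlainWrapper` raises `NotSerializableException` inside the helper even though the wrapped `HashSet` is itself serializable, because the enclosing class must opt in as well.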





[GitHub] spark pull request: [SPARK-13527] [SQL] Prune Filters based on Con...

2016-03-07 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/11406#discussion_r55307963
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -769,6 +770,28 @@ object CombineFilters extends Rule[LogicalPlan] {
 }
 
 /**
+ * Remove all the deterministic conditions in a [[Filter]] that are guaranteed to be true
+ * given the constraints on the child's output.
+ */
+object PruneFilters extends Rule[LogicalPlan] with PredicateHelper {
--- End diff --

Sure, will do it. Thanks!





[GitHub] spark pull request: [SPARK-13527] [SQL] Prune Filters based on Con...

2016-03-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/11406#discussion_r55307448
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -769,6 +770,28 @@ object CombineFilters extends Rule[LogicalPlan] {
 }
 
 /**
+ * Remove all the deterministic conditions in a [[Filter]] that are guaranteed to be true
+ * given the constraints on the child's output.
+ */
+object PruneFilters extends Rule[LogicalPlan] with PredicateHelper {
--- End diff --

Looks like `SimplifyFilters` is similar in purpose to this rule. Can we 
merge them?





[GitHub] spark pull request: [HOT-FIX][BUILD] Use the new location of `chec...

2016-03-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11567#issuecomment-19389
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [HOT-FIX][BUILD] Use the new location of `chec...

2016-03-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11567#issuecomment-19395
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52614/
Test FAILed.





[GitHub] spark pull request: [SPARK-13737][SQL][wip]Add getOrCreate method ...

2016-03-07 Thread mwws
Github user mwws commented on the pull request:

https://github.com/apache/spark/pull/11571#issuecomment-19338
  
@rxin HiveContext is heavily used by many users now, and many of them are still 
coupled to old Spark versions. As this change would be trivial but not 
constructive, I think there is no conflict with the context combination work.





[GitHub] spark pull request: [HOT-FIX][BUILD] Use the new location of `chec...

2016-03-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11567#issuecomment-193555142
  
**[Test build #52614 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52614/consoleFull)**
 for PR 11567 at commit 
[`380ceb3`](https://github.com/apache/spark/commit/380ceb30823ea2fbd76a33538381a64fe3d5171a).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13737][SQL][wip]Add getOrCreate method ...

2016-03-07 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/11571#issuecomment-193555075
  
Thanks for the pull request. So we are actually going to deprecate 
HiveContext because it has been one of the most confusing contexts in Spark.

See more in https://issues.apache.org/jira/browse/SPARK-13485





[GitHub] spark pull request: [SPARK-13713][SQL] Migrate parser from ANTLR3 ...

2016-03-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11557#issuecomment-193554969
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52613/
Test FAILed.





[GitHub] spark pull request: [SPARK-13713][SQL] Migrate parser from ANTLR3 ...

2016-03-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11557#issuecomment-193554967
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-13713][SQL] Migrate parser from ANTLR3 ...

2016-03-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11557#issuecomment-193554765
  
**[Test build #52613 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52613/consoleFull)**
 for PR 11557 at commit 
[`723edfb`](https://github.com/apache/spark/commit/723edfba11c40e832916d90b5d1453c926317022).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class ParseException(`





[GitHub] spark pull request: [SPARK-13689] [SQL] Move helper things in Cata...

2016-03-07 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/11529





[GitHub] spark pull request: [SPARK-13737][SQL][wip]Add getOrCreate method ...

2016-03-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11571#issuecomment-193553927
  
**[Test build #52623 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52623/consoleFull)**
 for PR 11571 at commit 
[`a64a0a4`](https://github.com/apache/spark/commit/a64a0a4bb9dad43b837678f06f45e7a15215826f).





[GitHub] spark pull request: [SPARK-13689] [SQL] Move helper things in Cata...

2016-03-07 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/11529#issuecomment-193553008
  
Merging into master, thanks.





[GitHub] spark pull request: [SPARK-13737][SQL][wip]Add getOrCreate method ...

2016-03-07 Thread mwws
GitHub user mwws opened a pull request:

https://github.com/apache/spark/pull/11571

[SPARK-13737][SQL][wip]Add getOrCreate method for HiveContext

There is a "getOrCreate" method in SQLContext, which is useful for 
recoverable streaming applications with SQL operations. 

https://spark.apache.org/docs/latest/streaming-programming-guide.html#dataframe-and-sql-operations
But the corresponding method is missing in HiveContext.
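
As a rough illustration of the get-or-create pattern being discussed, here is a minimal sketch in plain Scala. It is not the implementation in SQLContext or in this patch: `Context`, `appName` and the object name are hypothetical, and the only point is that repeated calls hand back the instance created first.

```scala
object GetOrCreateSketch {
  // Hypothetical stand-in for an expensive-to-build context; not HiveContext.
  final class Context(val appName: String)

  // The single shared instance, if one has been created already.
  private var instance: Option[Context] = None

  /** Returns the existing Context, or creates and remembers one on first use. */
  def getOrCreate(appName: String): Context = synchronized {
    instance.getOrElse {
      val created = new Context(appName)
      instance = Some(created)
      created
    }
  }
}
```

Code that may be re-run after a restart can call `getOrCreate` unconditionally instead of tracking whether a context already exists in the current JVM.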

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mwws/spark SPARK-HiveGetOrCreate

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11571.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11571


commit a64a0a4bb9dad43b837678f06f45e7a15215826f
Author: mwws 
Date:   2016-03-08T01:48:38Z

Add getOrCreate method for HiveContext







[GitHub] spark pull request: [SPARK-529] [sql] Modify SQLConf to use new co...

2016-03-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11570#issuecomment-193552569
  
**[Test build #52622 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52622/consoleFull)**
 for PR 11570 at commit 
[`884926c`](https://github.com/apache/spark/commit/884926c76e0403eca0aba43319eb28c37eca2e66).





[GitHub] spark pull request: [SPARK-529] [sql] Modify SQLConf to use new co...

2016-03-07 Thread vanzin
GitHub user vanzin opened a pull request:

https://github.com/apache/spark/pull/11570

[SPARK-529] [sql] Modify SQLConf to use new config API from core.

Because SQL keeps track of all known configs, some customization was
needed in SQLConf to allow that, since the core API does not have that
feature.

Tested via existing (and slightly updated) unit tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vanzin/spark SPARK-529-sql

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11570.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11570


commit 884926c76e0403eca0aba43319eb28c37eca2e66
Author: Marcelo Vanzin 
Date:   2015-12-07T19:54:00Z

[SPARK-529] [sql] Modify SQLConf to use new config API from core.

Because SQL keeps track of all known configs, some customization was
needed in SQLConf to allow that, since the core API does not have that
feature.







[GitHub] spark pull request: [SPARK-13692][CORE][SQL] Fix trivial Coverity/...

2016-03-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11530#issuecomment-193552260
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-13692][CORE][SQL] Fix trivial Coverity/...

2016-03-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11530#issuecomment-193552263
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52611/
Test FAILed.





[GitHub] spark pull request: [SPARK-13692][CORE][SQL] Fix trivial Coverity/...

2016-03-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11530#issuecomment-193552052
  
**[Test build #52611 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52611/consoleFull)**
 for PR 11530 at commit 
[`9a0f8fa`](https://github.com/apache/spark/commit/9a0f8fabeccf56800dd8af74c39f14a99b8041a7).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13711][Core]Don't call SparkUncaughtExc...

2016-03-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11566#issuecomment-193551318
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52609/
Test PASSed.





[GitHub] spark pull request: [SPARK-13711][Core]Don't call SparkUncaughtExc...

2016-03-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11566#issuecomment-193551316
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-13711][Core]Don't call SparkUncaughtExc...

2016-03-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11566#issuecomment-193551056
  
**[Test build #52609 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52609/consoleFull)**
 for PR 11566 at commit 
[`398859c`](https://github.com/apache/spark/commit/398859cf12df28a38b1fbf0d740eb14a1af20e63).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13734][SPARKR] Added histogram function

2016-03-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11569#issuecomment-193550601
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52618/
Test FAILed.





[GitHub] spark pull request: [SPARK-13734][SPARKR] Added histogram function

2016-03-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11569#issuecomment-193550529
  
**[Test build #52618 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52618/consoleFull)**
 for PR 11569 at commit 
[`d19992b`](https://github.com/apache/spark/commit/d19992b4ec5141221cbf8724dc592b09e541039b).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13734][SPARKR] Added histogram function

2016-03-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11569#issuecomment-193550598
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-13667][SQL] Support for specifying cust...

2016-03-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11550#issuecomment-193550315
  
**[Test build #52621 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52621/consoleFull)**
 for PR 11550 at commit 
[`c51d4ef`](https://github.com/apache/spark/commit/c51d4efbe72e4713a53ee7706996bef837d79fa5).





[GitHub] spark pull request: [SPARK-13732] [SQL] Remove projectList from Wi...

2016-03-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11565#issuecomment-193550327
  
**[Test build #52620 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52620/consoleFull)**
 for PR 11565 at commit 
[`467b095`](https://github.com/apache/spark/commit/467b095d89ce641f568aade09d710fb9ea573273).





[GitHub] spark pull request: [SPARK-13667][SQL] Support for specifying cust...

2016-03-07 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/11550#issuecomment-193549873
  
test this please





[GitHub] spark pull request: [SPARK-13667][SQL] Support for specifying cust...

2016-03-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11550#issuecomment-193549215
  
**[Test build #52619 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52619/consoleFull)**
 for PR 11550 at commit 
[`db27259`](https://github.com/apache/spark/commit/db27259629721f2e584457b4e5739baabfd851ea).





[GitHub] spark pull request: [SPARK-13732] [SQL] Remove projectList from Wi...

2016-03-07 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/11565#discussion_r55304511
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -345,8 +343,6 @@ object ColumnPruning extends Rule[LogicalPlan] {
 // Prunes the unused columns from child of Aggregate/Window/Expand/Generate
 case a @ Aggregate(_, _, child) if (child.outputSet -- a.references).nonEmpty =>
   a.copy(child = prunedChild(child, a.references))
-case w @ Window(_, _, _, _, child) if (child.outputSet -- w.references).nonEmpty =>
--- End diff --

First, `w.outputSet` always includes `child.outputSet`.

Second, `w.references` only includes the expressions present in the current
`Window` operator. This set does not include attributes that are implicitly
referenced by being passed through to the output tuple. Thus, the rule is no
longer valid: if we keep it, it will wrongly prune the child. Please correct
me if my understanding is wrong.
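
To illustrate the second point, here is a toy sketch. It is not Catalyst code: `ToyWindow`, its fields, and the helper names are hypothetical, and attributes are modeled as plain strings. It only assumes what is stated above, namely that a window operator passes every child attribute through to its output while its references cover just the attributes its own expressions use.

```scala
object WindowPruningSketch {
  // Toy stand-in for a Window operator (hypothetical, attribute names only).
  //   childOutput:  attributes produced by the child
  //   windowRefs:   attributes used by the window expressions and specs
  //   windowOutput: new attributes computed by the window expressions
  final case class ToyWindow(
      childOutput: Set[String],
      windowRefs: Set[String],
      windowOutput: Set[String]) {
    // Every child attribute is passed through, plus the computed ones.
    def output: Set[String] = childOutput ++ windowOutput
  }

  // What pruning by references alone would keep in the child.
  def prunedByReferences(w: ToyWindow): Set[String] =
    w.childOutput.intersect(w.windowRefs)

  // Pass-through attributes the operator still promises in its output
  // but that the pruned child would no longer supply.
  def lostAttributes(w: ToyWindow): Set[String] =
    (w.output -- w.windowOutput) -- prunedByReferences(w)
}
```

With `ToyWindow(Set("a", "b", "c"), Set("b"), Set("rank"))`, pruning by references keeps only `b` in the child, so `a` and `c` would go missing from the output even though nothing above the operator asked for them to be dropped.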





[GitHub] spark pull request: [SPARK-13404] [SQL] Create variables for input...

2016-03-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11274#issuecomment-193548355
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52612/
Test PASSed.





[GitHub] spark pull request: [SPARK-13404] [SQL] Create variables for input...

2016-03-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11274#issuecomment-193548354
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-13404] [SQL] Create variables for input...

2016-03-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11274#issuecomment-193547869
  
**[Test build #52612 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52612/consoleFull)**
 for PR 11274 at commit 
[`f431170`](https://github.com/apache/spark/commit/f4311709dd0c66add99aeb248acdc70863fba239).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13734][SPARKR] Added histogram function

2016-03-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11569#issuecomment-193545016
  
**[Test build #52618 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52618/consoleFull)**
 for PR 11569 at commit 
[`d19992b`](https://github.com/apache/spark/commit/d19992b4ec5141221cbf8724dc592b09e541039b).





[GitHub] spark pull request: [SPARK-13577] [yarn] Allow Spark jar to be mul...

2016-03-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11500#issuecomment-193543523
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52607/
Test PASSed.





[GitHub] spark pull request: [SPARK-13577] [yarn] Allow Spark jar to be mul...

2016-03-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11500#issuecomment-193543521
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-13577] [yarn] Allow Spark jar to be mul...

2016-03-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11500#issuecomment-193542877
  
**[Test build #52607 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52607/consoleFull)**
 for PR 11500 at commit 
[`9bab2ea`](https://github.com/apache/spark/commit/9bab2ea1fc5bbb91497e1994b3613cd6cdc4b3be).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12719][SQL] SQL generation support for ...

2016-03-07 Thread dilipbiswal
Github user dilipbiswal commented on the pull request:

https://github.com/apache/spark/pull/11563#issuecomment-193542558
  
@rxin.  Hi Reynold,

We have two cases to handle.
```SQL
SELECT explode(array(1,2,3)) FROM src
SELECT gentab2.* FROM t1 LATERAL VIEW explode(array(array(1,2,3))) gentab1 
AS gencol1 LATERAL VIEW explode(gentab1.gencol1) gentab2 AS gencol2
```
Currently, I handle the first case in `projToSql` and the second case in
`generateToSql`, as we wanted to generate SQL that is close to the original
SQL.

A lateral view can also refer to columns from tables that precede it, so I
felt it was safer to generate SQL very close to the source SQL to reduce any
risk. I also thought about treating the first case as a special case of
LATERAL VIEW. In that case we would have had to generate a table alias, which
is missing in case 1, and fix up the projection list above to refer to it.

However, I went with the approach in this PR as it didn't seem too complex
and also retained the layout of the original SQL. I could easily be
overlooking something here and would appreciate your guidance. Please let me
know.






[GitHub] spark pull request: [SPARK-11108][ML] OneHotEncoder should support...

2016-03-07 Thread frreiss
Github user frreiss commented on the pull request:

https://github.com/apache/spark/pull/9777#issuecomment-193542138
  
LGTM aside from that one typo





[GitHub] spark pull request: [SPARK-11108][ML] OneHotEncoder should support...

2016-03-07 Thread frreiss
Github user frreiss commented on a diff in the pull request:

https://github.com/apache/spark/pull/9777#discussion_r55302705
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoder.scala ---
@@ -132,8 +133,10 @@ class OneHotEncoder(override val uid: String) extends Transformer
   val numAttrs = dataset.select(col(inputColName).cast(DoubleType)).map(_.getDouble(0))
 .aggregate(0.0)(
   (m, x) => {
-assert(x >=0.0 && x == x.toInt,
-  s"Values from column $inputColName must be indices, but got $x.")
+assert(x <= Int.MaxValue,
+  s"OneHotEncoder only supports up to ${Int.MaxValue} indices, but got $x")
+assert(x >= 0.0 && x == x.toInt,
+  s"Values e column $inputColName must be indices, but got $x.")
--- End diff --

values *in column
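
For reference, the two assertions in the diff above can be read as one small validation step. The sketch below is a standalone illustration, not the PR's code: `IndexCheckSketch` and `toIndex` are hypothetical names, and it returns an `Either` instead of asserting so that it can be tried outside Spark. The second message uses the corrected wording.

```scala
object IndexCheckSketch {
  // Returns Right(index) for a valid category index, Left(message) otherwise.
  def toIndex(x: Double, columnName: String): Either[String, Int] = {
    if (x > Int.MaxValue) {
      // Mirrors the new upper-bound assertion.
      Left(s"Only up to ${Int.MaxValue} indices are supported, but got $x")
    } else if (!(x >= 0.0 && x == x.toInt)) {
      // Mirrors the existing check: non-negative and integral.
      Left(s"Values in column $columnName must be indices, but got $x.")
    } else {
      Right(x.toInt)
    }
  }
}
```

Checking the upper bound first gives oversized values their own message rather than the generic "must be indices" one.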





[GitHub] spark pull request: [SPARK-11535][ML] handling empty string in Str...

2016-03-07 Thread thunterdb
Github user thunterdb commented on the pull request:

https://github.com/apache/spark/pull/9522#issuecomment-193542005
  
@pravingadakh sorry for the delay. Would you mind resolving the conflicts?





[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-03-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-193541806
  
**[Test build #52617 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52617/consoleFull)**
 for PR 11235 at commit 
[`312cb32`](https://github.com/apache/spark/commit/312cb326922624e95528b7f2dc92129c59b3b524).





[GitHub] spark pull request: [SPARK-13549] [SQL] Refactor the Optimizer Rul...

2016-03-07 Thread frreiss
Github user frreiss commented on the pull request:

https://github.com/apache/spark/pull/11427#issuecomment-193540108
  
LGTM





[GitHub] spark pull request: [SPARK-13117] [Web UI] WebUI should use the lo...

2016-03-07 Thread zsxwing
Github user zsxwing commented on the pull request:

https://github.com/apache/spark/pull/11490#issuecomment-193539323
  
LGTM





[GitHub] spark pull request: [SPARK-13734][SPARKR] Added histogram function

2016-03-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11569#issuecomment-193538403
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-13734][SPARKR] Added histogram function

2016-03-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11569#issuecomment-193538292
  
**[Test build #52615 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52615/consoleFull)**
 for PR 11569 at commit 
[`0ad424b`](https://github.com/apache/spark/commit/0ad424bbcd03bf4c57566dbe92e537db213ba187).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13734][SPARKR] Added histogram function

2016-03-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11569#issuecomment-193538409
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52615/
Test FAILed.





[GitHub] spark pull request: [SPARK-13711][Core]Don't call SparkUncaughtExc...

2016-03-07 Thread markhamstra
Github user markhamstra commented on the pull request:

https://github.com/apache/spark/pull/11566#issuecomment-193537547
  
LGTM





[GitHub] spark pull request: [SPARK-13732] [SQL] Remove projectList from Wi...

2016-03-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/11565#discussion_r55301619
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -345,8 +343,6 @@ object ColumnPruning extends Rule[LogicalPlan] {
 // Prunes the unused columns from child of Aggregate/Window/Expand/Generate
 case a @ Aggregate(_, _, child) if (child.outputSet -- a.references).nonEmpty =>
   a.copy(child = prunedChild(child, a.references))
-case w @ Window(_, _, _, _, child) if (child.outputSet -- w.references).nonEmpty =>
--- End diff --

Isn't it still a valid optimization?





[GitHub] spark pull request: [SPARK-13711][Core]Don't call SparkUncaughtExc...

2016-03-07 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/11566#discussion_r55301512
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/client/AppClient.scala ---
@@ -125,16 +125,14 @@ private[spark] class AppClient(
   registerMasterFutures.set(tryRegisterAllMasters())
   registrationRetryTimer.set(registrationRetryThread.schedule(new Runnable {
 override def run(): Unit = {
-  Utils.tryOrExit {
-if (registered.get) {
-  registerMasterFutures.get.foreach(_.cancel(true))
-  registerMasterThreadPool.shutdownNow()
-} else if (nthRetry >= REGISTRATION_RETRIES) {
-  markDead("All masters are unresponsive! Giving up.")
-} else {
-  registerMasterFutures.get.foreach(_.cancel(true))
-  registerWithMaster(nthRetry + 1)
-}
+  if (registered.get) {
+registerMasterFutures.get.foreach(_.cancel(true))
+registerMasterThreadPool.shutdownNow()
+  } else if (nthRetry >= REGISTRATION_RETRIES) {
+markDead("All masters are unresponsive! Giving up.")
--- End diff --

FYI, this line will call `sc.stop()`: 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala#L136





[GitHub] spark pull request: [SPARK-12718][SPARK-13720][SQL] SQL generation...

2016-03-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11555#issuecomment-193534923
  
**[Test build #52616 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52616/consoleFull)**
 for PR 11555 at commit 
[`656a13a`](https://github.com/apache/spark/commit/656a13a84be56de2a6806296492951016082092e).





[GitHub] spark pull request: [SPARK-13711][Core]Don't call SparkUncaughtExc...

2016-03-07 Thread zsxwing
Github user zsxwing commented on the pull request:

https://github.com/apache/spark/pull/11566#issuecomment-193534718
  
> So what happens now if the scheduled Runnable throws an exception?

It just goes to `Thread.getDefaultUncaughtExceptionHandler()`.
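
As a minimal sketch of that mechanism on a bare JVM thread: `DefaultHandlerSketch` and `runAndCapture` are hypothetical names, and the sketch deliberately uses a plain `Thread` rather than the scheduled executor in `AppClient`. Whether an exception from a task submitted to an executor reaches this handler depends on how that executor wraps the task, which this example does not model.

```scala
import java.util.concurrent.atomic.AtomicReference

object DefaultHandlerSketch {
  // Runs `body` on a bare thread and returns the Throwable seen by the
  // JVM-wide default handler, or None if nothing reached it.
  def runAndCapture(body: () => Unit): Option[Throwable] = {
    val seen = new AtomicReference[Throwable](null)
    val previous = Thread.getDefaultUncaughtExceptionHandler
    Thread.setDefaultUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler {
      override def uncaughtException(t: Thread, e: Throwable): Unit = seen.set(e)
    })
    try {
      val worker = new Thread(new Runnable {
        override def run(): Unit = body()
      })
      worker.start()
      // The handler runs on the dying thread, so it has finished once join returns.
      worker.join()
    } finally {
      // Restore whatever handler was installed before.
      Thread.setDefaultUncaughtExceptionHandler(previous)
    }
    Option(seen.get)
  }
}
```

Calling `runAndCapture` with a body that throws returns the thrown exception, and a body that completes normally returns `None`.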





[GitHub] spark pull request: [SPARK-13675][UI] Fix wrong historyserver url ...

2016-03-07 Thread jerryshao
Github user jerryshao commented on the pull request:

https://github.com/apache/spark/pull/11518#issuecomment-193534057
  
Yes, I also tested with multiple attempts.





[GitHub] spark pull request: [SPARK-13404] [SQL] Create variables for input...

2016-03-07 Thread nongli
Github user nongli commented on the pull request:

https://github.com/apache/spark/pull/11274#issuecomment-193534012
  
sounds good.





[GitHub] spark pull request: [SPARK-12718][SPARK-13720][SQL] SQL generation...

2016-03-07 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/11555#issuecomment-193533951
  
retest this please





[GitHub] spark pull request: [SPARK-13732] [SQL] Remove projectList from Wi...

2016-03-07 Thread frreiss
Github user frreiss commented on the pull request:

https://github.com/apache/spark/pull/11565#issuecomment-193533582
  
LGTM





[GitHub] spark pull request: [SPARK-13711][Core]Don't call SparkUncaughtExc...

2016-03-07 Thread markhamstra
Github user markhamstra commented on the pull request:

https://github.com/apache/spark/pull/11566#issuecomment-193533661
  
So what happens now if the scheduled Runnable throws an exception?





[GitHub] spark pull request: [SPARK-11011][SQL] Narrow type of UDT serializ...

2016-03-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11379#issuecomment-193531378
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52600/
Test PASSed.





[GitHub] spark pull request: [SPARK-11011][SQL] Narrow type of UDT serializ...

2016-03-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11379#issuecomment-193531376
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-12469][CORE][WIP/RFC] Consistent accumu...

2016-03-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11105#issuecomment-193531213
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52601/
Test PASSed.





[GitHub] spark pull request: [SPARK-12469][CORE][WIP/RFC] Consistent accumu...

2016-03-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11105#issuecomment-193531212
  
Merged build finished. Test PASSed.




