[GitHub] spark pull request: [SPARK-12379][ML][MLLIB] Copy GBT implementati...
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/10607#issuecomment-192170689 @sethah This looks fine to me though there are merge conflicts that need to be resolved. It would be good to get this in ASAP so the work (and clean up that can happen) in [SPARK-12381](https://issues.apache.org/jira/browse/SPARK-12381) and [SPARK-12382](https://issues.apache.org/jira/browse/SPARK-12382) can begin. @jkbradley can you take a quick pass?
[GitHub] spark pull request: [SPARK-13523] [SQL] WIP: reuse exchanges in a ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11403#issuecomment-192164174 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52452/ Test FAILed.
[GitHub] spark pull request: [SPARK-13523] [SQL] WIP: reuse exchanges in a ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11403#issuecomment-192164158 **[Test build #52452 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52452/consoleFull)** for PR 11403 at commit [`e2b9987`](https://github.com/apache/spark/commit/e2b998702a7f82bc5fdf41ab689efa56631af910). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanType] ` * `abstract class Exchange extends UnaryNode ` * `case class ReusedExchange(override val output: Seq[Attribute], child: Exchange) extends LeafNode `
[GitHub] spark pull request: [SPARK-13523] [SQL] WIP: reuse exchanges in a ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11403#issuecomment-192164166 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13523] [SQL] WIP: reuse exchanges in a ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11403#issuecomment-192163462 **[Test build #52452 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52452/consoleFull)** for PR 11403 at commit [`e2b9987`](https://github.com/apache/spark/commit/e2b998702a7f82bc5fdf41ab689efa56631af910).
[GitHub] spark pull request: [SPARK-12379][ML][MLLIB] Copy GBT implementati...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/10607#discussion_r54998395 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/GradientBoostedTrees.scala --- @@ -0,0 +1,272 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.ml.tree.impl + +import org.apache.spark.Logging +import org.apache.spark.mllib.impl.PeriodicRDDCheckpointer +import org.apache.spark.mllib.regression.LabeledPoint +import org.apache.spark.ml.regression.{DecisionTreeRegressionModel, DecisionTreeRegressor} +import org.apache.spark.mllib.tree.configuration.Algo._ +import org.apache.spark.mllib.tree.configuration.BoostingStrategy +import org.apache.spark.mllib.tree.impl.TimeTracker +import org.apache.spark.mllib.tree.impurity.Variance +import org.apache.spark.mllib.tree.loss.Loss +import org.apache.spark.rdd.RDD +import org.apache.spark.storage.StorageLevel + +private[ml] object GradientBoostedTrees extends Logging { + + /** + * Method to train a gradient boosting model + * @param input Training dataset: RDD of [[org.apache.spark.mllib.regression.LabeledPoint]]. + * @return a gradient boosted trees model that can be used for prediction + */ + def run(input: RDD[LabeledPoint], + boostingStrategy: BoostingStrategy): (Array[DecisionTreeRegressionModel], Array[Double]) = { +val algo = boostingStrategy.treeStrategy.algo +algo match { + case Regression => +GradientBoostedTrees.boost(input, input, boostingStrategy, validate = false) + case Classification => +// Map labels to -1, +1 so binary classification can be treated as regression. +val remappedInput = input.map(x => new LabeledPoint((x.label * 2) - 1, x.features)) +GradientBoostedTrees.boost(remappedInput, remappedInput, boostingStrategy, validate = false) + case _ => +throw new IllegalArgumentException(s"$algo is not supported by gradient boosting.") +} + } + + /** + * Method to validate a gradient boosting model + * @param input Training dataset: RDD of [[org.apache.spark.mllib.regression.LabeledPoint]]. + * @param validationInput Validation dataset. + *This dataset should be different from the training dataset, + *but it should follow the same distribution. 
+ *E.g., these two datasets could be created from an original dataset + *by using [[org.apache.spark.rdd.RDD.randomSplit()]] + * @return a gradient boosted trees model that can be used for prediction + */ + def runWithValidation( + input: RDD[LabeledPoint], + validationInput: RDD[LabeledPoint], + boostingStrategy: BoostingStrategy): (Array[DecisionTreeRegressionModel], Array[Double]) = { +val algo = boostingStrategy.treeStrategy.algo +algo match { + case Regression => +GradientBoostedTrees.boost(input, validationInput, boostingStrategy, validate = true) + case Classification => +// Map labels to -1, +1 so binary classification can be treated as regression. +val remappedInput = input.map( + x => new LabeledPoint((x.label * 2) - 1, x.features)) +val remappedValidationInput = validationInput.map( + x => new LabeledPoint((x.label * 2) - 1, x.features)) +GradientBoostedTrees.boost(remappedInput, remappedValidationInput, boostingStrategy, + validate = true) + case _ => +throw new IllegalArgumentException(s"$algo is not supported by the gradient boosting.") +} + } + + /** + * Compute the initial predictions and errors for a dataset for the first + * iteration of gradient boosting. + * @param data: training data. + * @param initTreeWeight: learning rate assigned to the first tree. + * @param initT
[GitHub] spark pull request: [SPARK-13523] [SQL] WIP: reuse exchanges in a ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11403#issuecomment-192161609 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13523] [SQL] WIP: reuse exchanges in a ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11403#issuecomment-192161600 **[Test build #52451 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52451/consoleFull)** for PR 11403 at commit [`42096c8`](https://github.com/apache/spark/commit/42096c8f1707fd9a66a5dcc4a0df4f8d9d8f046e). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanType] ` * `abstract class Exchange extends UnaryNode ` * `case class ReusedExchange(override val output: Seq[Attribute], child: Exchange) extends LeafNode `
[GitHub] spark pull request: [SPARK-13523] [SQL] WIP: reuse exchanges in a ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11403#issuecomment-192161616 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52451/ Test FAILed.
[GitHub] spark pull request: [SPARK-13523] [SQL] WIP: reuse exchanges in a ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11403#issuecomment-192161109 **[Test build #52451 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52451/consoleFull)** for PR 11403 at commit [`42096c8`](https://github.com/apache/spark/commit/42096c8f1707fd9a66a5dcc4a0df4f8d9d8f046e).
[GitHub] spark pull request: [SPARK-13636][SQL] Directly consume UnsafeRow ...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/11484#issuecomment-192155333 @kiszk this is not just for the Sort operator.
[GitHub] spark pull request: [SPARK-13636][SQL] Directly consume UnsafeRow ...
Github user kiszk commented on the pull request: https://github.com/apache/spark/pull/11484#issuecomment-192154924 Would it be better to add "in sort" to the title of this PR?
[GitHub] spark pull request: [ML] testEstimatorAndModelReadWrite should cal...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11513#issuecomment-192152376 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52449/ Test FAILed.
[GitHub] spark pull request: [ML] testEstimatorAndModelReadWrite should cal...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11513#issuecomment-192152366 Merged build finished. Test FAILed.
[GitHub] spark pull request: [ML] testEstimatorAndModelReadWrite should cal...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11513#issuecomment-192152149 **[Test build #52449 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52449/consoleFull)** for PR 11513 at commit [`8f95454`](https://github.com/apache/spark/commit/8f95454176b289c886f3eeaa82af0401541663d1). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13636][SQL] Directly consume UnsafeRow ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/11484#discussion_r54997367 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegen.scala --- @@ -109,7 +114,10 @@ trait CodegenSupport extends SparkPlan { * Consume the columns generated from current SparkPlan, call it's parent. */ final def consume(ctx: CodegenContext, input: Seq[ExprCode], row: String = null): String = { -if (input != null) { +// We check if input expressions has same length as output when: +// 1. parent can't consume UnsafeRow and input is not null. +// 2. parent consumes UnsafeRow and row is null. +if ((input != null && !parent.consumeUnsafeRow) || (parent.consumeUnsafeRow && row == null)) { --- End diff -- When the child knows its parent can consume UnsafeRow, it can choose to pass an UnsafeRow and empty input. If so, we don't need to check if `input.length == output.length` here.
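A minimal, self-contained paraphrase of the guard discussed above (the names are illustrative stand-ins, not the actual CodegenSupport members), showing the two cases in which the input/output length check still applies:

```scala
object ConsumeGuardSketch {
  // Paraphrase of the condition in the diff above: check input.length == output.length
  // only when the parent will actually consume unpacked column variables rather than
  // a whole UnsafeRow.
  def shouldCheckInputLength(
      inputIsDefined: Boolean,            // stand-in for `input != null`
      rowIsDefined: Boolean,              // stand-in for `row != null`
      parentConsumesUnsafeRow: Boolean): Boolean = {
    // Case 1: parent can't take an UnsafeRow and column variables were passed.
    // Case 2: parent could take an UnsafeRow, but none was passed, so it falls back to columns.
    (inputIsDefined && !parentConsumesUnsafeRow) || (parentConsumesUnsafeRow && !rowIsDefined)
  }
}
```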
[GitHub] spark pull request: [SPARK-7478][SQL] Added SQLContext.getOrCreate
Github user mwws commented on the pull request: https://github.com/apache/spark/pull/6006#issuecomment-192150357 @jelez you can create a HiveContextSingleton to work around it. Refer to the example "SqlNetworkWordCount". @tdas Why did you remove HiveContext.getOrCreate? I can't find an obvious reason in the conversation.
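A sketch of the suggested workaround, modeled on the lazily-initialized SQLContextSingleton from the SqlNetworkWordCount streaming example (the object name and structure are illustrative, not an official API):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext

// Lazily instantiated singleton HiveContext (Spark 1.x), usable from streaming
// batches in place of the removed HiveContext.getOrCreate.
object HiveContextSingleton {
  @transient private var instance: HiveContext = _

  def getOrCreate(sc: SparkContext): HiveContext = synchronized {
    if (instance == null) {
      instance = new HiveContext(sc)
    }
    instance
  }
}
```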
[GitHub] spark pull request: [SPARK-13636][SQL] Directly consume UnsafeRow ...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/11484#issuecomment-192148712 @davies Yea. That will be good.
[GitHub] spark pull request: [SPARK-13625][PYSPARK][ML] Added a check to se...
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/11476#issuecomment-192148581 @BryanCutler does it perhaps make sense to add a little test case?
[GitHub] spark pull request: SPARK-12925. Improve HiveInspectors.unwrap for...
Github user rajeshbalamohan commented on the pull request: https://github.com/apache/spark/pull/11477#issuecomment-192146953 Thanks @srowen. Incorporated the changes. This was tested with HiveCompatibilitySuite and HiveQuerySuite. These tests ran fine on the master branch without the changes as well. However, when tried on the 1.6 branch, these test suites failed with the copy issues. Hence the explicit bytes copy in master, so that this does not fail in the future.
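An illustrative sketch (not the actual HiveInspectors change) of why the explicit copy matters: Hadoop reuses Writable instances, and BytesWritable.getBytes returns a backing array padded to capacity, so unwrap should copy only the valid range rather than keep a reference to the shared buffer:

```scala
import org.apache.hadoop.io.BytesWritable

object BytesCopySketch {
  // Copy only the valid bytes instead of keeping the shared, reused buffer.
  def unwrapBytes(bw: BytesWritable): Array[Byte] =
    java.util.Arrays.copyOfRange(bw.getBytes, 0, bw.getLength)
}
```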
[GitHub] spark pull request: [SPARK-13636][SQL] Directly consume UnsafeRow ...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/11484#issuecomment-192146205 @viirya Can we wait for #11274? Then we could avoid some complexity.
[GitHub] spark pull request: [SPARK-13636][SQL] Directly consume UnsafeRow ...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11484#discussion_r54996591 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegen.scala --- @@ -125,14 +133,27 @@ trait CodegenSupport extends SparkPlan { row: String = null): String = { ctx.freshNamePrefix = variablePrefix if (row != null) { - ctx.currentVars = null - ctx.INPUT_ROW = row - val evals = child.output.zipWithIndex.map { case (attr, i) => -BoundReference(i, attr.dataType, attr.nullable).gen(ctx) + val evals: Seq[ExprCode] = if (!consumeUnsafeRow) { +// If this SparkPlan can't consume UnsafeRow and there is an UnsafeRow, +// we extract the columns from the row and call doConsume. +ctx.currentVars = null +ctx.INPUT_ROW = row +child.output.zipWithIndex.map { case (attr, i) => + BoundReference(i, attr.dataType, attr.nullable).gen(ctx) +} + } else { +// If this SparkPlan consumes UnsafeRow and there is an UnsafeRow, +// we don't need to unpack variables from the row. +Seq.empty + } + val evalCode = if (evals.isEmpty) { +"" + } else { +s"${evals.map(_.code).mkString("\n")}" } s""" - | ${evals.map(_.code).mkString("\n")} - | ${doConsume(ctx, evals)} + | $evalCode + | ${doConsume(ctx, evals, row)} """.stripMargin } else { doConsume(ctx, input) --- End diff -- pass `null` here.
[GitHub] spark pull request: [SPARK-13636][SQL] Directly consume UnsafeRow ...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11484#discussion_r54996609 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegen.scala --- @@ -125,14 +133,27 @@ trait CodegenSupport extends SparkPlan { row: String = null): String = { ctx.freshNamePrefix = variablePrefix if (row != null) { - ctx.currentVars = null - ctx.INPUT_ROW = row - val evals = child.output.zipWithIndex.map { case (attr, i) => -BoundReference(i, attr.dataType, attr.nullable).gen(ctx) + val evals: Seq[ExprCode] = if (!consumeUnsafeRow) { +// If this SparkPlan can't consume UnsafeRow and there is an UnsafeRow, +// we extract the columns from the row and call doConsume. +ctx.currentVars = null +ctx.INPUT_ROW = row +child.output.zipWithIndex.map { case (attr, i) => + BoundReference(i, attr.dataType, attr.nullable).gen(ctx) +} + } else { +// If this SparkPlan consumes UnsafeRow and there is an UnsafeRow, +// we don't need to unpack variables from the row. +Seq.empty + } + val evalCode = if (evals.isEmpty) { +"" + } else { +s"${evals.map(_.code).mkString("\n")}" } s""" - | ${evals.map(_.code).mkString("\n")} - | ${doConsume(ctx, evals)} + | $evalCode + | ${doConsume(ctx, evals, row)} """.stripMargin } else { doConsume(ctx, input) --- End diff -- pass null here.
[GitHub] spark pull request: [SPARK-13636][SQL] Directly consume UnsafeRow ...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11484#discussion_r54996552 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Expand.scala --- @@ -93,7 +93,7 @@ case class Expand( child.asInstanceOf[CodegenSupport].produce(ctx, this) } - override def doConsume(ctx: CodegenContext, input: Seq[ExprCode]): String = { --- End diff -- I think we can remove the default value for row here.
[GitHub] spark pull request: [SPARK-13255][SQL] Update vectorized reader to...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11435#issuecomment-192145369 **[Test build #52450 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52450/consoleFull)** for PR 11435 at commit [`ed79eee`](https://github.com/apache/spark/commit/ed79eee5daeab177c4350f6f111898f0e7339309).
[GitHub] spark pull request: [SPARK-13636][SQL] Directly consume UnsafeRow ...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11484#discussion_r54996447 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegen.scala --- @@ -109,7 +114,10 @@ trait CodegenSupport extends SparkPlan { * Consume the columns generated from current SparkPlan, call it's parent. */ final def consume(ctx: CodegenContext, input: Seq[ExprCode], row: String = null): String = { -if (input != null) { +// We check if input expressions has same length as output when: +// 1. parent can't consume UnsafeRow and input is not null. +// 2. parent consumes UnsafeRow and row is null. +if ((input != null && !parent.consumeUnsafeRow) || (parent.consumeUnsafeRow && row == null)) { --- End diff -- Why do we need to change this?
[GitHub] spark pull request: [SPARK-13636][SQL] Directly consume UnsafeRow ...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11484#discussion_r54996174 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegen.scala --- @@ -67,7 +67,12 @@ trait CodegenSupport extends SparkPlan { /** * Which SparkPlan is calling produce() of this one. It's itself for the first SparkPlan. */ - private var parent: CodegenSupport = null + protected var parent: CodegenSupport = null + + /** +* Whether this SparkPlan accepts UnsafeRow as input in consumeChild. --- End diff -- consumeChild -> doConsume
[GitHub] spark pull request: [SPARK-13652][Core]Copy ByteBuffer in sendRpcS...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/11499
[GitHub] spark pull request: [SPARK-13652][Core]Copy ByteBuffer in sendRpcS...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/11499#issuecomment-192142269 Merging to master and 1.6
[GitHub] spark pull request: [SPARK-13652][Core]Copy ByteBuffer in sendRpcS...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11499#issuecomment-192141632 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52445/ Test PASSed.
[GitHub] spark pull request: [SPARK-13652][Core]Copy ByteBuffer in sendRpcS...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11499#issuecomment-192141626 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13668][SQL] Reorder filter/join predica...
Github user sameeragarwal commented on the pull request: https://github.com/apache/spark/pull/11511#issuecomment-192141195 cc @nongli @yhuai
[GitHub] spark pull request: [SPARK-13652][Core]Copy ByteBuffer in sendRpcS...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11499#issuecomment-192140867 **[Test build #52445 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52445/consoleFull)** for PR 11499 at commit [`7199237`](https://github.com/apache/spark/commit/71992375d2d3ad6e1b2db2769e21facb6c7cfe8c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [ML] testEstimatorAndModelReadWrite should cal...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11513#issuecomment-192139327 **[Test build #52449 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52449/consoleFull)** for PR 11513 at commit [`8f95454`](https://github.com/apache/spark/commit/8f95454176b289c886f3eeaa82af0401541663d1).
[GitHub] spark pull request: [SPARK-13668][SQL] Reorder filter/join predica...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11511#issuecomment-192138366 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13668][SQL] Reorder filter/join predica...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11511#issuecomment-192138372 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52447/ Test PASSed.
[GitHub] spark pull request: [SPARK-13668][SQL] Reorder filter/join predica...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11511#issuecomment-192137331 **[Test build #52447 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52447/consoleFull)** for PR 11511 at commit [`dfb33ec`](https://github.com/apache/spark/commit/dfb33ecd27bb65903dd4a0a2cd6bfcd0d8d912c3). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class ReorderedPredicateSuite extends QueryTest with SharedSQLContext with PredicateHelper `
[GitHub] spark pull request: [ML] testEstimatorAndModelReadWrite should cal...
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/11513 [ML] testEstimatorAndModelReadWrite should call checkModelData ## What changes were proposed in this pull request? Although we define ```checkModelData``` in the ```read/write``` tests of ML estimators/models and pass it to ```testEstimatorAndModelReadWrite```, ```testEstimatorAndModelReadWrite``` never calls ```checkModelData``` to verify the equality of model data. So we currently do not run the model-data equality check for any test case; we should fix it. cc @jkbradley @mengxr ## How was this patch tested? No new unit test; it should pass the existing ones. You can merge this pull request into a Git repository by running: $ git pull https://github.com/yanboliang/spark ml-check-model-data Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/11513.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #11513 commit 8f95454176b289c886f3eeaa82af0401541663d1 Author: Yanbo Liang Date: 2016-03-04T06:33:04Z testEstimatorAndModelReadWrite should call checkModelData
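An illustrative skeleton of the fix being described (the signature is simplified and hypothetical, not the actual DefaultReadWriteTest helper); the essential change is that the helper must invoke the caller-supplied checkModelData after reloading the model:

```scala
object ReadWriteTestSketch {
  // Hypothetical helper: save a fitted model, load it back, and run the
  // caller-supplied equality check that was previously never invoked.
  def testModelReadWrite[M](
      fitAndSave: () => (M, String),      // hypothetical: returns (model, save path)
      load: String => M,                  // hypothetical: reloads the model from the path
      checkModelData: (M, M) => Unit): Unit = {
    val (model, path) = fitAndSave()
    val loaded = load(path)
    checkModelData(model, loaded)         // the call this PR adds
  }
}
```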
[GitHub] spark pull request: [SPARK-13631] [CORE] Thread-safe getLocationsW...
Github user a1k0n commented on the pull request: https://github.com/apache/spark/pull/11505#issuecomment-192136179 Jenkins, retest this please
[GitHub] spark pull request: [SPARK-13631] [CORE] Thread-safe getLocationsW...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11505#issuecomment-192131987 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52444/ Test FAILed.
[GitHub] spark pull request: [SPARK-13631] [CORE] Thread-safe getLocationsW...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11505#issuecomment-192131985 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13642][Yarn] Properly handle signal kil...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11512#issuecomment-192131877 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13642][Yarn] Properly handle signal kil...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11512#issuecomment-192131879 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52448/ Test PASSed.
[GitHub] spark pull request: [SPARK-13631] [CORE] Thread-safe getLocationsW...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11505#issuecomment-192131731 **[Test build #52444 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52444/consoleFull)** for PR 11505 at commit [`4f78803`](https://github.com/apache/spark/commit/4f7880340f9c05e54b0758a308493b3d8dced83d). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13642][Yarn] Properly handle signal kil...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11512#issuecomment-192131732 **[Test build #52448 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52448/consoleFull)** for PR 11512 at commit [`b461b71`](https://github.com/apache/spark/commit/b461b717ed51b532f823615bcb79f66b17635c4d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13603] [SQL] support SQL generation for...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11453#issuecomment-192128896 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52443/ Test PASSed.
[GitHub] spark pull request: [SPARK-13603] [SQL] support SQL generation for...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11453#issuecomment-192128892 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13603] [SQL] support SQL generation for...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11453#issuecomment-192128187 **[Test build #52443 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52443/consoleFull)** for PR 11453 at commit [`5fbc714`](https://github.com/apache/spark/commit/5fbc714e3273ff5aadd347b53cc3af2d693db153). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13636][SQL] Directly consume UnsafeRow ...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/11484#issuecomment-192128562 cc @davies @rxin @nongli
[GitHub] spark pull request: [SPARK-13642][Yarn] Properly handle signal kil...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11512#issuecomment-192125457 **[Test build #52448 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52448/consoleFull)** for PR 11512 at commit [`b461b71`](https://github.com/apache/spark/commit/b461b717ed51b532f823615bcb79f66b17635c4d).
[GitHub] spark pull request: [SPARK-13642][Yarn] Properly handle signal kil...
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/11512 [SPARK-13642][Yarn] Properly handle signal kill in ApplicationMaster ## What changes were proposed in this pull request? This patch fixes the race condition in ApplicationMaster when receiving a signal. In the current implementation, if a signal is received and no exception is thrown, the application finishes with a successful state in Yarn, and there is no further attempt. Since the application was actually killed by a signal at runtime, another attempt is expected. This patch adds a signal handler so that, when a signal is received, the application is marked as finished with failure rather than success. ## How was this patch tested? This patch was tested in the following situations: 1. The application finishes normally. 2. The application finishes by calling `System.exit(n)`. 3. The application is killed by a yarn command. 4. ApplicationMaster is killed by "SIGTERM" sent by the `kill pid` command. 5. ApplicationMaster is killed by the NM with "SIGTERM" in case of NM failure. All scenarios return the expected states. CC @tgravescs, please help to review this fix, thanks a lot. You can merge this pull request into a Git repository by running: $ git pull https://github.com/jerryshao/apache-spark SPARK-13642 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/11512.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #11512 commit b461b717ed51b532f823615bcb79f66b17635c4d Author: jerryshao Date: 2016-03-04T05:52:18Z Properly handle signal kill in ApplicationMaster
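A minimal sketch of the approach described above (illustrative only; the actual ApplicationMaster change may register its handlers differently): install handlers for termination signals so the attempt is marked failed instead of exiting with a success state:

```scala
import sun.misc.{Signal, SignalHandler}

object SignalKillSketch {
  /** Register handlers for the given signals; on receipt, mark the attempt
    * failed via the supplied callback and exit with the conventional code. */
  def install(markFailed: String => Unit,
              signals: Seq[String] = Seq("TERM", "INT", "HUP")): Unit = {
    signals.foreach { name =>
      Signal.handle(new Signal(name), new SignalHandler {
        override def handle(sig: Signal): Unit = {
          markFailed(s"Received SIG${sig.getName}")
          System.exit(128 + sig.getNumber)
        }
      })
    }
  }
}
```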
[GitHub] spark pull request: [SPARK-13255][SQL] Update vectorized reader to...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11435#issuecomment-192122554 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13255][SQL] Update vectorized reader to...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11435#issuecomment-192122559 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52446/ Test FAILed.
[GitHub] spark pull request: [SPARK-13255][SQL] Update vectorized reader to...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11435#issuecomment-192122057 **[Test build #52446 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52446/consoleFull)** for PR 11435 at commit [`f5f1e2b`](https://github.com/apache/spark/commit/f5f1e2be578ad40daafe25c6cc1b09bb4f8bb71a). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13626] [core] Avoid duplicate config de...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11510#issuecomment-192121421 Build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13626] [core] Avoid duplicate config de...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11510#issuecomment-192121425 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52442/ Test FAILed.
[GitHub] spark pull request: [SPARK-13626] [core] Avoid duplicate config de...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11510#issuecomment-192120956 **[Test build #52442 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52442/consoleFull)** for PR 11510 at commit [`c5338f6`](https://github.com/apache/spark/commit/c5338f6561d62ac4a869012f369df9339b1437cb). * This patch **fails Spark unit tests**. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13117] [Web UI] WebUI should use the lo...
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/11490#issuecomment-192118319 I agree @srowen, I see that SPARK_PUBLIC_DNS is not meant for binding purposes. I have changed the env var to SPARK_LOCAL_IP.
[GitHub] spark pull request: [SPARK-13633] [SQL] Move things into catalyst....
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/11506#issuecomment-192116711 LGTM --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13640][SQL] Synchronize ScalaReflection...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11487#issuecomment-192109035 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52441/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13640][SQL] Synchronize ScalaReflection...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11487#issuecomment-192109034 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13640][SQL] Synchronize ScalaReflection...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11487#issuecomment-192108763 **[Test build #52441 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52441/consoleFull)** for PR 11487 at commit [`cee5896`](https://github.com/apache/spark/commit/cee58960dee030116ab5b027aaadc8203828d8cb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11502#issuecomment-192107904 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52438/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11283#issuecomment-192107849 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52440/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11283#issuecomment-192107847 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11502#issuecomment-192107901 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11502#issuecomment-192107709 **[Test build #52438 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52438/consoleFull)** for PR 11502 at commit [`6381b00`](https://github.com/apache/spark/commit/6381b00a94c7bf4ea0693fc4ae6868ef0f866dc4). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11283#issuecomment-192107596 **[Test build #52440 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52440/consoleFull)** for PR 11283 at commit [`9eaca51`](https://github.com/apache/spark/commit/9eaca515a3a86f07ed4ca85ba6da080ad605d1c0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11270#issuecomment-192106315 @rxin If you think we should not list the files even once, should we instead detect the source from the given paths alone, without listing, and otherwise just fall back to the `sqlContext.conf.defaultDataSourceName` option? In other words:

```bash
├── iamjson.json        # Detected by the extension of `iamjson.json`
│   ├── part-001
│   └── part-002
├── iamjson             # Try `sqlContext.conf.defaultDataSourceName` and then
│   │                   # throw an exception on the Parquet side.
│   ├── part-001
│   └── part-002
├── iamparquet.parquet  # Detected by the extension of `iamparquet.parquet`
│   ├── part-001.parquet
│   └── part-002.parquet
└── iamparquet          # Just use `sqlContext.conf.defaultDataSourceName`
    ├── part-001.parquet
    └── part-002.parquet
```

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
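The extension-based fallback described in the comment above can be illustrated with a small helper. This is only a sketch of the idea, not code from the PR: the function name and the extension table are made up, and the real detection would go through Spark's data source resolution path.

```scala
// Illustrative sketch only (not the PR's implementation): pick a data source
// name from a path's extension, falling back to the configured default
// (e.g. sqlContext.conf.defaultDataSourceName) when nothing matches.
def inferSourceFromPath(path: String, defaultSource: String): String = {
  val knownExtensions = Map("json" -> "json", "parquet" -> "parquet", "csv" -> "csv")
  val dot = path.lastIndexOf('.')
  if (dot < 0) defaultSource
  else knownExtensions.getOrElse(path.substring(dot + 1).toLowerCase, defaultSource)
}

// inferSourceFromPath("iamjson.json", "parquet")   // "json"    (detected by extension)
// inferSourceFromPath("iamparquet", "parquet")     // "parquet" (falls back to the default)
```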
[GitHub] spark pull request: [SPARK-13495][SQL] Add Null Filters in the que...
Github user nongli commented on a diff in the pull request: https://github.com/apache/spark/pull/11372#discussion_r54990462

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/NullFilteringSuite.scala ---
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans._
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.catalyst.rules._
+
+class NullFilteringSuite extends PlanTest {
+
+  object Optimize extends RuleExecutor[LogicalPlan] {
+    val batches = Batch("NullFiltering", Once, NullFiltering) ::
+      Batch("CombineFilters", Once, CombineFilters) :: Nil
+  }
+
+  val testRelation = LocalRelation('a.int, 'b.int, 'c.int)
+
+  test("filter: filter out nulls in condition") {
+    val originalQuery = testRelation.where('a === 1)
+    val correctAnswer = testRelation.where(IsNotNull('a) && 'a === 1).analyze
--- End diff --

We can do that in a follow up pr

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13495][SQL] Add Null Filters in the que...
Github user nongli commented on a diff in the pull request: https://github.com/apache/spark/pull/11372#discussion_r54990555

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/NullFilteringSuite.scala ---
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans._
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.catalyst.rules._
+
+class NullFilteringSuite extends PlanTest {
+
+  object Optimize extends RuleExecutor[LogicalPlan] {
+    val batches = Batch("NullFiltering", Once, NullFiltering) ::
+      Batch("CombineFilters", Once, CombineFilters) :: Nil
+  }
+
+  val testRelation = LocalRelation('a.int, 'b.int, 'c.int)
+
+  test("filter: filter out nulls in condition") {
+    val originalQuery = testRelation.where('a === 1)
+    val correctAnswer = testRelation.where(IsNotNull('a) && 'a === 1).analyze
+    val optimized = Optimize.execute(originalQuery.analyze)
+    comparePlans(optimized, correctAnswer)
+  }
+
+  test("join: filter out nulls on either side") {
+    val x = testRelation.subquery('x)
+    val y = testRelation.subquery('y)
+    val originalQuery = x.join(y,
+      condition = Some("x.a".attr === "y.a".attr && "x.b".attr === 1 && "y.c".attr > 5))
+    val left = x.where(IsNotNull('a) && IsNotNull('b))
+    val right = y.where(IsNotNull('a) && IsNotNull('c))
+    val correctAnswer = left.join(right,
+      condition = Some("x.a".attr === "y.a".attr && "x.b".attr === 1 && "y.c".attr > 5)).analyze
+    val optimized = Optimize.execute(originalQuery.analyze)
+    comparePlans(optimized, correctAnswer)
+  }
+
+  test("join with pre-existing filters: filter out nulls on either side") {
+    val x = testRelation.subquery('x)
+    val y = testRelation.subquery('y)
+    val originalQuery = x.where('b > 5).join(y.where('c === 10),
+      condition = Some("x.a".attr === "y.a".attr))
+    val left = x.where(IsNotNull('a) && IsNotNull('b) && 'b > 5)
+    val right = y.where(IsNotNull('a) && IsNotNull('c) && 'c === 10)
+    val correctAnswer = left.join(right,
+      condition = Some("x.a".attr === "y.a".attr)).analyze
+    val optimized = Optimize.execute(originalQuery.analyze)
+    comparePlans(optimized, correctAnswer)
+  }
--- End diff --

I had a few more test cases when I tried this. Can you see if any of them should be added? https://github.com/nongli/spark/commit/ea0edd46e080cd0a1c6a1d41374563c149a030f7

We should also have outer-join tests to make sure they don't add the IsNotNull filter.

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well.
If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
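To make the outer-join suggestion in the message above concrete, a test along the following lines could be added to the quoted suite. This is a hypothetical sketch, not part of the patch; it assumes the same DSL and helpers as the suite (`testRelation`, `Optimize`, `comparePlans`), and it encodes the behaviour the reviewer asks for, namely that no `IsNotNull` filters are inferred for a full outer join.

```scala
// Hypothetical test sketch: a full outer join should come out of the
// NullFiltering batch unchanged, i.e. with no inferred IsNotNull filters
// added on either side.
test("full outer join: no null filters are inferred") {
  val x = testRelation.subquery('x)
  val y = testRelation.subquery('y)
  val originalQuery = x.join(y, FullOuter, Some("x.a".attr === "y.a".attr))
  val optimized = Optimize.execute(originalQuery.analyze)
  comparePlans(optimized, originalQuery.analyze)
}
```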
[GitHub] spark pull request: [SPARK-13495][SQL] Add Null Filters in the que...
Github user nongli commented on a diff in the pull request: https://github.com/apache/spark/pull/11372#discussion_r54990320

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/NullFilteringSuite.scala ---
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans._
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.catalyst.rules._
+
+class NullFilteringSuite extends PlanTest {
+
+  object Optimize extends RuleExecutor[LogicalPlan] {
+    val batches = Batch("NullFiltering", Once, NullFiltering) ::
+      Batch("CombineFilters", Once, CombineFilters) :: Nil
+  }
+
+  val testRelation = LocalRelation('a.int, 'b.int, 'c.int)
+
+  test("filter: filter out nulls in condition") {
+    val originalQuery = testRelation.where('a === 1)
+    val correctAnswer = testRelation.where(IsNotNull('a) && 'a === 1).analyze
--- End diff --

You haven't done anything with `a === 1`, right? There's still no logic inferring that `a === 1` implies `a` is not nullable.

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13668][SQL] Reorder filter/join predica...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11511#issuecomment-192105598 **[Test build #52447 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52447/consoleFull)** for PR 11511 at commit [`dfb33ec`](https://github.com/apache/spark/commit/dfb33ecd27bb65903dd4a0a2cd6bfcd0d8d912c3). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13668][SQL] Reorder filter/join predica...
GitHub user sameeragarwal opened a pull request: https://github.com/apache/spark/pull/11511

[SPARK-13668][SQL] Reorder filter/join predicates to short-circuit isNotNull checks

## What changes were proposed in this pull request?

If a filter predicate or a join condition includes `IsNotNull` checks, we should reorder these checks so that the non-nullability checks are evaluated before the rest of the predicates. For example, if a filter predicate is of the form `a > 5 && isNotNull(b)`, we should rewrite this as `isNotNull(b) && a > 5` during physical plan generation.

## How was this patch tested?

New unit tests in `ReorderedPredicateSuite` that verify the physical plan for both filters and joins.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sameeragarwal/spark reorder-isnotnull

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/11511.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #11511

commit 9341da6bc45c868d14c4f5d4c020e40a6b5ba593
Author: Sameer Agarwal
Date: 2016-03-02T23:57:57Z
Reorder conditions in join and filters

commit dfb33ecd27bb65903dd4a0a2cd6bfcd0d8d912c3
Author: Sameer Agarwal
Date: 2016-03-03T01:20:18Z
unit tests: ReorderedPredicateSuite

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
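The core of the rewrite described in this PR is essentially a stable partition of the conjuncts. A minimal sketch of the idea follows; it is not the PR's actual rule, the object name is made up, and it assumes Catalyst's `PredicateHelper` and expression classes.

```scala
import org.apache.spark.sql.catalyst.expressions.{And, Expression, IsNotNull, PredicateHelper}

// Minimal sketch of the reordering idea: split the conjunction and move the
// IsNotNull checks to the front so they short-circuit the remaining predicates.
object ReorderIsNotNullSketch extends PredicateHelper {
  def reorder(condition: Expression): Expression = {
    val (notNullChecks, others) =
      splitConjunctivePredicates(condition).partition(_.isInstanceOf[IsNotNull])
    // e.g. (a > 5 && isNotNull(b)) becomes (isNotNull(b) && a > 5)
    (notNullChecks ++ others).reduce(And)
  }
}
```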
[GitHub] spark pull request: [SPARK-13495][SQL] Add Null Filters in the que...
Github user nongli commented on a diff in the pull request: https://github.com/apache/spark/pull/11372#discussion_r54990183

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -586,6 +587,52 @@ object NullPropagation extends Rule[LogicalPlan] {
 }
 /**
+ * Attempts to eliminate reading (unnecessary) NULL values if they are not required for correctness
+ * by inserting isNotNull filters is the query plan. These filters are currently inserted beneath
+ * existing Filters and Join operators and are inferred based on their data constraints.
+ *
+ * Note: While this optimization is applicable to all types of join, it primarily benefits Inner and
+ * LeftSemi joins.
+ */
+object NullFiltering extends Rule[LogicalPlan] with PredicateHelper {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+    case filter @ Filter(condition, child: LogicalPlan) =>
+      // We generate a list of additional isNotNull filters from the operator's existing constraints
+      // but remove those that are either already part of the filter condition or are part of the
+      // operator's child constraints.
+      val newIsNotNullConstraints = filter.constraints.filter(_.isInstanceOf[IsNotNull]) --
+        (child.constraints ++ splitConjunctivePredicates(condition))
+      val newCondition = if (newIsNotNullConstraints.nonEmpty) {
+        And(newIsNotNullConstraints.reduce(And), condition)
+      } else {
+        condition
+      }
+      Filter(newCondition, child)
+
+    case join @ Join(left: LogicalPlan, right: LogicalPlan, joinType: JoinType,
+        condition: Option[Expression]) =>
+      val leftIsNotNullConstraints = join.constraints
+        .filter(_.isInstanceOf[IsNotNull])
+        .filter(_.references.subsetOf(left.outputSet)) -- left.constraints
+      val rightIsNotNullConstraints =
+        join.constraints
+          .filter(_.isInstanceOf[IsNotNull])
+          .filter(_.references.subsetOf(right.outputSet)) -- right.constraints
+      val newLeftChild = if (leftIsNotNullConstraints.nonEmpty) {
+        Filter(leftIsNotNullConstraints.reduce(And), left)
+      } else {
+        left
+      }
+      val newRightChild = if (rightIsNotNullConstraints.nonEmpty) {
+        Filter(rightIsNotNullConstraints.reduce(And), right)
+      } else {
+        right
+      }
+      Join(newLeftChild, newRightChild, joinType, condition)
--- End diff --

same here, would be nice to reuse `join` if it is not changed

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13495][SQL] Add Null Filters in the que...
Github user nongli commented on a diff in the pull request: https://github.com/apache/spark/pull/11372#discussion_r54990168

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -586,6 +587,52 @@ object NullPropagation extends Rule[LogicalPlan] {
 }
 /**
+ * Attempts to eliminate reading (unnecessary) NULL values if they are not required for correctness
+ * by inserting isNotNull filters is the query plan. These filters are currently inserted beneath
+ * existing Filters and Join operators and are inferred based on their data constraints.
+ *
+ * Note: While this optimization is applicable to all types of join, it primarily benefits Inner and
+ * LeftSemi joins.
+ */
+object NullFiltering extends Rule[LogicalPlan] with PredicateHelper {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+    case filter @ Filter(condition, child: LogicalPlan) =>
+      // We generate a list of additional isNotNull filters from the operator's existing constraints
+      // but remove those that are either already part of the filter condition or are part of the
+      // operator's child constraints.
+      val newIsNotNullConstraints = filter.constraints.filter(_.isInstanceOf[IsNotNull]) --
+        (child.constraints ++ splitConjunctivePredicates(condition))
+      val newCondition = if (newIsNotNullConstraints.nonEmpty) {
+        And(newIsNotNullConstraints.reduce(And), condition)
+      } else {
+        condition
+      }
+      Filter(newCondition, child)
+
+    case join @ Join(left: LogicalPlan, right: LogicalPlan, joinType: JoinType,
+        condition: Option[Expression]) =>
+      val leftIsNotNullConstraints = join.constraints
--- End diff --

indenting

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13495][SQL] Add Null Filters in the que...
Github user nongli commented on a diff in the pull request: https://github.com/apache/spark/pull/11372#discussion_r54989945

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -586,6 +587,52 @@ object NullPropagation extends Rule[LogicalPlan] {
 }
 /**
+ * Attempts to eliminate reading (unnecessary) NULL values if they are not required for correctness
+ * by inserting isNotNull filters is the query plan. These filters are currently inserted beneath
+ * existing Filters and Join operators and are inferred based on their data constraints.
+ *
+ * Note: While this optimization is applicable to all types of join, it primarily benefits Inner and
+ * LeftSemi joins.
+ */
+object NullFiltering extends Rule[LogicalPlan] with PredicateHelper {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+    case filter @ Filter(condition, child: LogicalPlan) =>
+      // We generate a list of additional isNotNull filters from the operator's existing constraints
+      // but remove those that are either already part of the filter condition or are part of the
+      // operator's child constraints.
+      val newIsNotNullConstraints = filter.constraints.filter(_.isInstanceOf[IsNotNull]) --
+        (child.constraints ++ splitConjunctivePredicates(condition))
+      val newCondition = if (newIsNotNullConstraints.nonEmpty) {
--- End diff --

Remove newCondition and just return `filter` if this doesn't do anything, so we can reuse that filter subplan.

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
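The suggestion above (return the original `Filter` when no new constraints are found, so the existing subplan is reused) could look roughly like the sketch below. This is a sketch of the reviewer's proposal, not the actual patch; it assumes the `constraints` framework the patch relies on, and the object name is made up.

```scala
import org.apache.spark.sql.catalyst.expressions.{And, IsNotNull, PredicateHelper}
import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan}
import org.apache.spark.sql.catalyst.rules.Rule

// Sketch of the reviewer's suggestion: when no new IsNotNull constraints are
// found, return the original Filter node unchanged so its subplan is reused.
object NullFilteringFilterCaseSketch extends Rule[LogicalPlan] with PredicateHelper {
  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case filter @ Filter(condition, child: LogicalPlan) =>
      val newIsNotNullConstraints = filter.constraints.filter(_.isInstanceOf[IsNotNull]) --
        (child.constraints ++ splitConjunctivePredicates(condition))
      if (newIsNotNullConstraints.isEmpty) {
        filter // nothing changed: reuse the existing node
      } else {
        Filter(And(newIsNotNullConstraints.reduce(And), condition), child)
      }
  }
}
```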
[GitHub] spark pull request: [SPARK-13495][SQL] Add Null Filters in the que...
Github user nongli commented on a diff in the pull request: https://github.com/apache/spark/pull/11372#discussion_r54989882

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -586,6 +587,52 @@ object NullPropagation extends Rule[LogicalPlan] {
 }
 /**
+ * Attempts to eliminate reading (unnecessary) NULL values if they are not required for correctness
+ * by inserting isNotNull filters is the query plan. These filters are currently inserted beneath
--- End diff --

"in the query plan"

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13255][SQL] Update vectorized reader to...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11435#issuecomment-192099892 **[Test build #52446 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52446/consoleFull)** for PR 11435 at commit [`f5f1e2b`](https://github.com/apache/spark/commit/f5f1e2be578ad40daafe25c6cc1b09bb4f8bb71a). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12941][SQL][MASTER] Spark-SQL JDBC Orac...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/11489 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12941][SQL][MASTER] Spark-SQL JDBC Orac...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/11489#issuecomment-192096905 thanks. Merging to master. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13652][Core]Copy ByteBuffer in sendRpcS...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11499#issuecomment-192096385 **[Test build #52445 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52445/consoleFull)** for PR 11499 at commit [`7199237`](https://github.com/apache/spark/commit/71992375d2d3ad6e1b2db2769e21facb6c7cfe8c). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13652][Core]Copy ByteBuffer in sendRpcS...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/11499#issuecomment-192096096 retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13631] [CORE] Thread-safe getLocationsW...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11505#issuecomment-192095362 **[Test build #52444 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52444/consoleFull)** for PR 11505 at commit [`4f78803`](https://github.com/apache/spark/commit/4f7880340f9c05e54b0758a308493b3d8dced83d). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13603] [SQL] support SQL generation for...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11453#issuecomment-192094176 **[Test build #52443 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52443/consoleFull)** for PR 11453 at commit [`5fbc714`](https://github.com/apache/spark/commit/5fbc714e3273ff5aadd347b53cc3af2d693db153). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13631] [CORE] Thread-safe getLocationsW...
Github user a1k0n commented on the pull request: https://github.com/apache/spark/pull/11505#issuecomment-192094056 rebasing to pick up flaky test fix --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13647][SQL] also check if numeric value...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/11492 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13637][SQL] use more information to sim...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11485#issuecomment-192092754 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52435/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13637][SQL] use more information to sim...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11485#issuecomment-192092752 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13637][SQL] use more information to sim...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11485#issuecomment-192092226 **[Test build #52435 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52435/consoleFull)** for PR 11485 at commit [`4f31c5c`](https://github.com/apache/spark/commit/4f31c5c8e1461a63a6e4ce9f74712b746ad098f4). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13647][SQL] also check if numeric value...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/11492#issuecomment-192092156 Merging this into master, thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13647][SQL] also check if numeric value...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/11492#issuecomment-192091999 LGTM --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12941][SQL][MASTER] Spark-SQL JDBC Orac...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11489#issuecomment-192091588 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52434/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12941][SQL][MASTER] Spark-SQL JDBC Orac...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11489#issuecomment-192091584 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12941][SQL][MASTER] Spark-SQL JDBC Orac...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11489#issuecomment-192091121 **[Test build #52434 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52434/consoleFull)** for PR 11489 at commit [`3ba7dc5`](https://github.com/apache/spark/commit/3ba7dc52e1980eef320faea07cc12eef7863a621). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11283#issuecomment-192090338 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52433/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11283#issuecomment-192090337 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11283#issuecomment-192090048 **[Test build #52433 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52433/consoleFull)** for PR 11283 at commit [`6f609fb`](https://github.com/apache/spark/commit/6f609fb2d844e2aaf4c809ef8c0fcd9e6eca38bb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13449] Naive Bayes wrapper in SparkR
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/11486#issuecomment-192089154 It's a good question! It's possible that the labels of the input dataset are not zero-based or not contiguous, so we should use ```StringIndexer``` to index the labels into [0, numLabels), and after training use ```IndexToString``` to map the indexed labels back to the original ones. We already store the label map in the metadata of the label column. All models under the ML package follow this rule. For example, training ```LogisticRegression``` with input labels ```"-1, +1"``` will produce erroneous results; you should first use ```StringIndexer``` to transform the labels to ```"0, 1"```. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
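For readers less familiar with that convention, a typical pipeline looks like the sketch below. The DataFrame `training` and its column names are placeholders; the point is only that `StringIndexer` maps arbitrary labels into [0, numLabels) before training and `IndexToString` maps predictions back to the original labels afterwards.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.NaiveBayes
import org.apache.spark.ml.feature.{IndexToString, StringIndexer}

// Index the (possibly non-zero-based, non-contiguous) labels into [0, numLabels).
val labelIndexer = new StringIndexer()
  .setInputCol("label")
  .setOutputCol("indexedLabel")
  .fit(training)

// Train on the indexed label column.
val nb = new NaiveBayes()
  .setLabelCol("indexedLabel")
  .setFeaturesCol("features")

// Map predicted indices back to the original label values.
val labelConverter = new IndexToString()
  .setInputCol("prediction")
  .setOutputCol("predictedLabel")
  .setLabels(labelIndexer.labels)

val model = new Pipeline()
  .setStages(Array(labelIndexer, nb, labelConverter))
  .fit(training)
```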
[GitHub] spark pull request: [SPARK-13174][SparkR] Add read.csv and write.c...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/11457#discussion_r54987993

--- Diff: R/pkg/inst/tests/testthat/test_context.R ---
@@ -26,7 +26,7 @@ test_that("Check masked functions", {
   maskedBySparkR <- masked[funcSparkROrEmpty]
   namesOfMasked <- c("describe", "cov", "filter", "lag", "na.omit", "predict", "sd", "var",
                      "colnames", "colnames<-", "intersect", "rank", "rbind", "sample", "subset",
-                     "summary", "transform", "drop")
+                     "summary", "transform", "drop", "read.csv", "write.csv")
--- End diff --

@felixcheung Thanks.

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org