[GitHub] spark issue #19110: [SPARK-21027][ML][PYTHON] Added tunable parallelism to o...

2017-09-12 Thread jkbradley
Github user jkbradley commented on the issue:

https://github.com/apache/spark/pull/19110
  
LGTM
Merging with master
Thanks!


---




[GitHub] spark pull request #19202: [SPARK-21980][SQL]References in grouping function...

2017-09-12 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19202#discussion_r138408545
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -314,7 +314,7 @@ class Analyzer(
                s"grouping columns (${groupByExprs.mkString(",")})")
          }
        case e @ Grouping(col: Expression) =>
-          val idx = groupByExprs.indexOf(col)
+          val idx = groupByExprs.indexWhere(x => resolver(x.toString, col.toString))
--- End diff --

`indexWhere(_.semanticEquals(col))`
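For intuition, a minimal sketch (simplified stand-ins, not the real Catalyst classes) of why matching by semantic identity is more robust than comparing string forms, which can differ in case, qualifiers, or other cosmetic details:

    case class Attr(name: String, exprId: Int) {
      def semanticEquals(other: Attr): Boolean = exprId == other.exprId
    }

    val groupByExprs = Seq(Attr("a", 1), Attr("B", 2))
    val col = Attr("b", 2)  // same attribute (exprId 2), different-case name

    groupByExprs.indexWhere(_.toString == col.toString)  // -1: string forms differ
    groupByExprs.indexWhere(_.semanticEquals(col))       // 1: matched by identity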


---




[GitHub] spark issue #19106: [SPARK-21770][ML] ProbabilisticClassificationModel fix c...

2017-09-12 Thread smurching
Github user smurching commented on the issue:

https://github.com/apache/spark/pull/19106
  
@sethah I haven't heard of anybody hitting this issue in practice, but it 
did seem best to ensure that valid probability distributions would be produced 
regardless of input. There was some discussion of this in the JIRA: 
https://issues.apache.org/jira/browse/SPARK-21770
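For a sense of what "valid regardless of input" means here, a minimal sketch of the normalization idea (an assumed helper for illustration, not the actual ProbabilisticClassificationModel code):

    // Clamp raw scores to be non-negative, then renormalize to sum to 1.
    def normalizeToProbabilities(raw: Array[Double]): Array[Double] = {
      val nonNeg = raw.map(math.max(_, 0.0))
      val sum = nonNeg.sum
      if (sum > 0) nonNeg.map(_ / sum)
      else Array.fill(raw.length)(1.0 / raw.length)  // degenerate input: uniform
    }

    normalizeToProbabilities(Array(2.0, -1.0, 2.0))  // Array(0.5, 0.0, 0.5)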


---




[GitHub] spark pull request #19201: [SPARK-21979][SQL]Improve QueryPlanConstraints fr...

2017-09-12 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19201#discussion_r138401827
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala ---
@@ -106,91 +106,48 @@ trait QueryPlanConstraints { self: LogicalPlan =>
    * Infers an additional set of constraints from a given set of equality constraints.
    * For e.g., if an operator has constraints of the form (`a = 5`, `a = b`), this returns an
    * additional constraint of the form `b = 5`.
-   *
-   * [SPARK-17733] We explicitly prevent producing recursive constraints of the form `a = f(a, b)`
-   * as they are often useless and can lead to a non-converging set of constraints.
    */
   private def inferAdditionalConstraints(constraints: Set[Expression]): Set[Expression] = {
-    val constraintClasses = generateEquivalentConstraintClasses(constraints)
-
+    val aliasedConstraints = eliminateAliasedExpressionInConstraints(constraints)
     var inferredConstraints = Set.empty[Expression]
-    constraints.foreach {
+    aliasedConstraints.foreach {
       case eq @ EqualTo(l: Attribute, r: Attribute) =>
-        val candidateConstraints = constraints - eq
-        inferredConstraints ++= candidateConstraints.map(_ transform {
-          case a: Attribute if a.semanticEquals(l) &&
-            !isRecursiveDeduction(r, constraintClasses) => r
-        })
-        inferredConstraints ++= candidateConstraints.map(_ transform {
-          case a: Attribute if a.semanticEquals(r) &&
-            !isRecursiveDeduction(l, constraintClasses) => l
-        })
+        val candidateConstraints = aliasedConstraints - eq
+        inferredConstraints ++= replaceConstraints(candidateConstraints, l, r)
+        inferredConstraints ++= replaceConstraints(candidateConstraints, r, l)
       case _ => // No inference
     }
     inferredConstraints -- constraints
   }

   /**
-   * Generate a sequence of expression sets from constraints, where each set stores an equivalence
-   * class of expressions. For example, Set(`a = b`, `b = c`, `e = f`) will generate the following
-   * expression sets: (Set(a, b, c), Set(e, f)). This will be used to search all expressions equal
-   * to an selected attribute.
+   * Replace the aliased expression in [[Alias]] with the alias name if both exist in constraints.
+   * Thus non-converging inference can be prevented.
+   * E.g. `a = f(a, b)`, `a = f(b, c) && c = g(a, b)`.
--- End diff --

This example doesn't even have an alias...
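To make the inference in the doc comment concrete, a minimal self-contained sketch (simplified stand-ins, not the real Catalyst classes) of the substitution it describes, deriving `b = 5` from `a = 5` and `a = b`:

    sealed trait Expr
    case class Attr(name: String) extends Expr
    case class Lit(v: Int) extends Expr
    case class EqualTo(l: Expr, r: Expr) extends Expr

    // Replace every occurrence of `from` with `to`, recursing into EqualTo.
    def replace(e: Expr, from: Expr, to: Expr): Expr = e match {
      case `from` => to
      case EqualTo(l, r) => EqualTo(replace(l, from, to), replace(r, from, to))
      case other => other
    }

    val a = Attr("a"); val b = Attr("b")
    val constraints: Set[Expr] = Set(EqualTo(a, Lit(5)), EqualTo(a, b))
    val inferred = (constraints - EqualTo(a, b)).map(replace(_, a, b)) -- constraints
    // inferred == Set(EqualTo(Attr("b"), Lit(5)))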


---




[GitHub] spark pull request #19201: [SPARK-21979][SQL]Improve QueryPlanConstraints fr...

2017-09-12 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19201#discussion_r138384197
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala ---
@@ -106,91 +106,48 @@ trait QueryPlanConstraints { self: LogicalPlan =>
    * Infers an additional set of constraints from a given set of equality constraints.
    * For e.g., if an operator has constraints of the form (`a = 5`, `a = b`), this returns an
    * additional constraint of the form `b = 5`.
-   *
-   * [SPARK-17733] We explicitly prevent producing recursive constraints of the form `a = f(a, b)`
-   * as they are often useless and can lead to a non-converging set of constraints.
    */
   private def inferAdditionalConstraints(constraints: Set[Expression]): Set[Expression] = {
-    val constraintClasses = generateEquivalentConstraintClasses(constraints)
-
+    val aliasedConstraints = eliminateAliasedExpressionInConstraints(constraints)
     var inferredConstraints = Set.empty[Expression]
-    constraints.foreach {
+    aliasedConstraints.foreach {
       case eq @ EqualTo(l: Attribute, r: Attribute) =>
-        val candidateConstraints = constraints - eq
-        inferredConstraints ++= candidateConstraints.map(_ transform {
-          case a: Attribute if a.semanticEquals(l) &&
-            !isRecursiveDeduction(r, constraintClasses) => r
-        })
-        inferredConstraints ++= candidateConstraints.map(_ transform {
-          case a: Attribute if a.semanticEquals(r) &&
-            !isRecursiveDeduction(l, constraintClasses) => l
-        })
+        val candidateConstraints = aliasedConstraints - eq
+        inferredConstraints ++= replaceConstraints(candidateConstraints, l, r)
+        inferredConstraints ++= replaceConstraints(candidateConstraints, r, l)
      case _ => // No inference
     }
     inferredConstraints -- constraints
   }

   /**
-   * Generate a sequence of expression sets from constraints, where each set stores an equivalence
-   * class of expressions. For example, Set(`a = b`, `b = c`, `e = f`) will generate the following
-   * expression sets: (Set(a, b, c), Set(e, f)). This will be used to search all expressions equal
-   * to an selected attribute.
+   * Replace the aliased expression in [[Alias]] with the alias name if both exist in constraints.
+   * Thus non-converging inference can be prevented.
+   * E.g. `a = f(a, b)`, `a = f(b, c) && c = g(a, b)`.
+   * Also, the size of constraints is reduced without losing any information.
+   * When the inferred filters are pushed down the operators that generate the alias,
+   * the alias names used in filters are replaced by the aliased expressions.
    */
-  private def generateEquivalentConstraintClasses(
-      constraints: Set[Expression]): Seq[Set[Expression]] = {
-    var constraintClasses = Seq.empty[Set[Expression]]
-    constraints.foreach {
-      case eq @ EqualTo(l: Attribute, r: Attribute) =>
-        // Transform [[Alias]] to its child.
-        val left = aliasMap.getOrElse(l, l)
-        val right = aliasMap.getOrElse(r, r)
-        // Get the expression set for an equivalence constraint class.
-        val leftConstraintClass = getConstraintClass(left, constraintClasses)
-        val rightConstraintClass = getConstraintClass(right, constraintClasses)
-        if (leftConstraintClass.nonEmpty && rightConstraintClass.nonEmpty) {
-          // Combine the two sets.
-          constraintClasses = constraintClasses
-            .diff(leftConstraintClass :: rightConstraintClass :: Nil) :+
-            (leftConstraintClass ++ rightConstraintClass)
-        } else if (leftConstraintClass.nonEmpty) { // && rightConstraintClass.isEmpty
-          // Update equivalence class of `left` expression.
-          constraintClasses = constraintClasses
-            .diff(leftConstraintClass :: Nil) :+ (leftConstraintClass + right)
-        } else if (rightConstraintClass.nonEmpty) { // && leftConstraintClass.isEmpty
-          // Update equivalence class of `right` expression.
-          constraintClasses = constraintClasses
-            .diff(rightConstraintClass :: Nil) :+ (rightConstraintClass + left)
-        } else { // leftConstraintClass.isEmpty && rightConstraintClass.isEmpty
-          // Create new equivalence constraint class since neither expression presents
-          // in any classes.
-          constraintClasses = constraintClasses :+ Set(left, right)
-        }
-      case _ => // Skip
+  private def eliminateAliasedExpressionInConstraints(constraints: Set[Expression])
+    : Set[Expression] = {
+    val 

[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...

2017-09-12 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/18253
  
You commented on my code, not on the idea. My code was hacked together 
quickly; it can be cleaned up a lot. Your comments don't show that separating 
the refactoring of the listener bus hierarchy from the introduction of queues 
is impossible or undesirable.


---




[GitHub] spark issue #18253: [SPARK-18838][CORE] Introduce multiple queues in LiveLis...

2017-09-12 Thread bOOm-X
Github user bOOm-X commented on the issue:

https://github.com/apache/spark/pull/18253
  
@vanzin I pushed some comments on your code. I think that trying to keep 
the exact same class hierarchy leads to very complex code, with many 
drawbacks. 


---




[GitHub] spark pull request #19141: [SPARK-21384] [YARN] Spark + YARN fails with Loca...

2017-09-12 Thread devaraj-kavali
Github user devaraj-kavali commented on a diff in the pull request:

https://github.com/apache/spark/pull/19141#discussion_r138402807
  
--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -565,7 +565,6 @@ private[spark] class Client(
       distribute(jarsArchive.toURI.getPath,
         resType = LocalResourceType.ARCHIVE,
         destName = Some(LOCALIZED_LIB_DIR))
-      jarsArchive.delete()
--- End diff --

Thanks @jerryshao for the comment.

> What if your scenario and SPARK-20741's scenario are both encountered? 
Looks like your approach above cannot be worked.

Can you provide some information on why you think it doesn't work? If we 
delete the spark_libs.zip after the application completes (similar to the 
staging dir deletion), it would not stack up till the process exits, which 
solves SPARK-20741, and the archive also remains available during execution 
for this current issue. 
> I'm wondering if we can copy or move this spark_libs.zip temp file to 
another non-temp file and add that file to the dist cache. That non-temp file 
will not be deleted and can be overwritten during another launching, so we will 
always have only one copy.

If there are multiple jobs submitted/running concurrently, we would be 
overwriting the existing spark_libs.zip with the latest one, which may lead to 
app failures while the copy is in progress, and it would also be ambiguous 
which application should delete the file.

> Besides, I think we have several workarounds to handle this issue like 
spark.yarn.jars or spark.yarn.archive, so looks like this corner case is not so 
necessary to fix (just my thinking, normally people will not use local FS in a 
real cluster).

I agree, this is a corner case and can be handled with a workaround.
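For reference, the workarounds mentioned above amount to pre-staging the Spark jars so that no temporary spark_libs.zip is created at submit time (the paths below are illustrative):

    # spark-defaults.conf (illustrative paths)
    spark.yarn.archive  hdfs:///apps/spark/spark-libs.zip
    # or, alternatively, a glob of jars:
    # spark.yarn.jars   hdfs:///apps/spark/jars/*.jar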


---




[GitHub] spark issue #19132: [SPARK-21922] Fix duration always updating when task fai...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19132
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81678/
Test PASSed.


---




[GitHub] spark issue #19132: [SPARK-21922] Fix duration always updating when task fai...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19132
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19132: [SPARK-21922] Fix duration always updating when task fai...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19132
  
**[Test build #81678 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81678/testReport)**
 for PR 19132 at commit 
[`25fe22c`](https://github.com/apache/spark/commit/25fe22cddde276f846fd4808de1b575a87b1c059).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19207: [SPARK-21809] : Change Stage Page to use datatables to s...

2017-09-12 Thread pgandhi999
Github user pgandhi999 commented on the issue:

https://github.com/apache/spark/pull/19207
  
The error logs for test build #81683 state that the method 
this(Long,Int,Int,Long,Long,Long,Long,Long,Long)Unit in class 
org.apache.spark.status.api.v1.ExecutorStageSummary does not have a 
correspondent in the current version. All I have done is add new fields to the 
ExecutorStageSummary API; I have not modified any existing ones. It should be 
fine, but please let me know if it is not.


---




[GitHub] spark issue #19195: [DOCS] Fix unreachable links in the document

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19195
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19195: [DOCS] Fix unreachable links in the document

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19195
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81680/
Test PASSed.


---




[GitHub] spark issue #19195: [DOCS] Fix unreachable links in the document

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19195
  
**[Test build #81680 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81680/testReport)**
 for PR 19195 at commit 
[`bec41c8`](https://github.com/apache/spark/commit/bec41c8702b11654202b179769e291ac6bfa9894).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19106: [SPARK-21770][ML] ProbabilisticClassificationModel fix c...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19106
  
**[Test build #81687 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81687/testReport)**
 for PR 19106 at commit 
[`53891ed`](https://github.com/apache/spark/commit/53891ed5c16daebce40e37cc9109b71299a33aca).


---




[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-09-12 Thread hhbyyh
Github user hhbyyh commented on the issue:

https://github.com/apache/spark/pull/17862
  
Tested with several larger data sets and the hinge loss function, to compare 
the l-bfgs and owlqn solvers.
Each run continued until convergence or until exceeding maxIter (2000).

dataset | numRecords | numFeatures | l-bfgs iterations | owlqn iterations | l-bfgs final loss | owlqn final loss
---|---|---|---|---|---|---
url_combined | 2396130 | 3231961 | 317 (952 sec) | 287 (1661 sec) | 9.71E-5 | 1.64E-4
kdda | 8407752 | 20216830 | 2000+ (29729 sec) | 288 (13664 sec) | 0.0068 | 0.0135
webspam | 35 | 254 | 344 (67 sec) | 1502 (714 sec) | 0.18273 | 0.18273
SUSY | 500 | 18 | 152 (145 sec) | 1242 (3357 sec) | 0.499 | 0.499

l-bfgs does not always take fewer iterations, but it converges to a smaller 
final loss.
Per iteration, owlqn takes 2 to 3 times longer than l-bfgs. Logistic 
Regression exhibits similar behavior.
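For anyone who wants to reproduce the qualitative difference on a toy problem, a small self-contained Breeze sketch (not the benchmark code above; constructor arguments assumed from Breeze's public API) minimizing f(x) = ||x - 3||^2 with both solvers. With l1reg = 0, OWLQN reduces to an unregularized quasi-Newton run:

    import breeze.linalg.DenseVector
    import breeze.optimize.{DiffFunction, LBFGS, OWLQN}

    // f(x) = ||x - 3||^2, with gradient 2 * (x - 3)
    val f = new DiffFunction[DenseVector[Double]] {
      def calculate(x: DenseVector[Double]) = {
        val d = x - 3.0
        (d.dot(d), d * 2.0)
      }
    }
    val x0 = DenseVector.zeros[Double](5)
    val lbfgs = new LBFGS[DenseVector[Double]](maxIter = 2000, m = 10)
    val owlqn = new OWLQN[Int, DenseVector[Double]](2000, 10, 0.0)  // L1 reg = 0
    println(lbfgs.minimize(f, x0))  // ~DenseVector(3.0, 3.0, 3.0, 3.0, 3.0)
    println(owlqn.minimize(f, x0))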




---




[GitHub] spark issue #19208: [SPARK-21087] [ML] CrossValidator, TrainValidationSplit ...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19208
  
**[Test build #81686 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81686/testReport)**
 for PR 19208 at commit 
[`ae13440`](https://github.com/apache/spark/commit/ae13440fd2220e28b58df52836f55fe5ed77c43f).


---




[GitHub] spark issue #19201: [SPARK-21979][SQL]Improve QueryPlanConstraints framework

2017-09-12 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19201
  
LGTM except a minor comment.


---




[GitHub] spark pull request #19201: [SPARK-21979][SQL]Improve QueryPlanConstraints fr...

2017-09-12 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19201#discussion_r138393814
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala ---
@@ -106,91 +106,48 @@ trait QueryPlanConstraints { self: LogicalPlan =>
    * Infers an additional set of constraints from a given set of equality constraints.
    * For e.g., if an operator has constraints of the form (`a = 5`, `a = b`), this returns an
    * additional constraint of the form `b = 5`.
-   *
-   * [SPARK-17733] We explicitly prevent producing recursive constraints of the form `a = f(a, b)`
-   * as they are often useless and can lead to a non-converging set of constraints.
    */
   private def inferAdditionalConstraints(constraints: Set[Expression]): Set[Expression] = {
-    val constraintClasses = generateEquivalentConstraintClasses(constraints)
-
+    val aliasedConstraints = eliminateAliasedExpressionInConstraints(constraints)
     var inferredConstraints = Set.empty[Expression]
-    constraints.foreach {
+    aliasedConstraints.foreach {
       case eq @ EqualTo(l: Attribute, r: Attribute) =>
-        val candidateConstraints = constraints - eq
-        inferredConstraints ++= candidateConstraints.map(_ transform {
-          case a: Attribute if a.semanticEquals(l) &&
-            !isRecursiveDeduction(r, constraintClasses) => r
-        })
-        inferredConstraints ++= candidateConstraints.map(_ transform {
-          case a: Attribute if a.semanticEquals(r) &&
-            !isRecursiveDeduction(l, constraintClasses) => l
-        })
+        val candidateConstraints = aliasedConstraints - eq
+        inferredConstraints ++= replaceConstraints(candidateConstraints, l, r)
+        inferredConstraints ++= replaceConstraints(candidateConstraints, r, l)
       case _ => // No inference
     }
     inferredConstraints -- constraints
   }

   /**
-   * Generate a sequence of expression sets from constraints, where each set stores an equivalence
-   * class of expressions. For example, Set(`a = b`, `b = c`, `e = f`) will generate the following
-   * expression sets: (Set(a, b, c), Set(e, f)). This will be used to search all expressions equal
-   * to an selected attribute.
+   * Replace the aliased expression in [[Alias]] with the alias name if both exist in constraints.
+   * Thus non-converging inference can be prevented.
+   * E.g. `a = f(a, b)`, `a = f(b, c) && c = g(a, b)`.
+   * Also, the size of constraints is reduced without losing any information.
+   * When the inferred filters are pushed down the operators that generate the alias,
+   * the alias names used in filters are replaced by the aliased expressions.
    */
-  private def generateEquivalentConstraintClasses(
-      constraints: Set[Expression]): Seq[Set[Expression]] = {
-    var constraintClasses = Seq.empty[Set[Expression]]
-    constraints.foreach {
-      case eq @ EqualTo(l: Attribute, r: Attribute) =>
-        // Transform [[Alias]] to its child.
-        val left = aliasMap.getOrElse(l, l)
-        val right = aliasMap.getOrElse(r, r)
-        // Get the expression set for an equivalence constraint class.
-        val leftConstraintClass = getConstraintClass(left, constraintClasses)
-        val rightConstraintClass = getConstraintClass(right, constraintClasses)
-        if (leftConstraintClass.nonEmpty && rightConstraintClass.nonEmpty) {
-          // Combine the two sets.
-          constraintClasses = constraintClasses
-            .diff(leftConstraintClass :: rightConstraintClass :: Nil) :+
-            (leftConstraintClass ++ rightConstraintClass)
-        } else if (leftConstraintClass.nonEmpty) { // && rightConstraintClass.isEmpty
-          // Update equivalence class of `left` expression.
-          constraintClasses = constraintClasses
-            .diff(leftConstraintClass :: Nil) :+ (leftConstraintClass + right)
-        } else if (rightConstraintClass.nonEmpty) { // && leftConstraintClass.isEmpty
-          // Update equivalence class of `right` expression.
-          constraintClasses = constraintClasses
-            .diff(rightConstraintClass :: Nil) :+ (rightConstraintClass + left)
-        } else { // leftConstraintClass.isEmpty && rightConstraintClass.isEmpty
-          // Create new equivalence constraint class since neither expression presents
-          // in any classes.
-          constraintClasses = constraintClasses :+ Set(left, right)
-        }
-      case _ => // Skip
+  private def eliminateAliasedExpressionInConstraints(constraints: Set[Expression])
+    : Set[Expression] = {
+    val 

[GitHub] spark pull request #19208: [SPARK-21087] [ML] CrossValidator, TrainValidatio...

2017-09-12 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19208#discussion_r138391134
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/ValidatorParams.scala ---
@@ -150,20 +150,14 @@ private[ml] object ValidatorParams {
       }.toSeq
     ))

-    val validatorSpecificParams = instance match {
-      case cv: CrossValidatorParams =>
-        List("numFolds" -> parse(cv.numFolds.jsonEncode(cv.getNumFolds)))
-      case tvs: TrainValidationSplitParams =>
-        List("trainRatio" -> parse(tvs.trainRatio.jsonEncode(tvs.getTrainRatio)))
-      case _ =>
-        // This should not happen.
-        throw new NotImplementedError("ValidatorParams.saveImpl does not handle type: " +
-          instance.getClass.getCanonicalName)
-    }
-
-    val jsonParams = validatorSpecificParams ++ List(
-      "estimatorParamMaps" -> parse(estimatorParamMapsJson),
-      "seed" -> parse(instance.seed.jsonEncode(instance.getSeed)))
+    val params = instance.extractParamMap().toSeq
+    val skipParams = List("estimator", "evaluator", "estimatorParamMaps")
+    val jsonParams = render(params
+      .filter { case ParamPair(p, v) => !skipParams.contains(p.name) }
+      .map { case ParamPair(p, v) =>
+        p.name -> parse(p.jsonEncode(v))
+      }.toList ++ List("estimatorParamMaps" -> parse(estimatorParamMapsJson))
+    )
--- End diff --

Improve the code here so that we don't need to add code for each parameter. 
We now have 3 newly added parameters (parallelism, collectSubModels, 
persistSubModelPath), all added only in the CV/TVS estimators. The old code 
here easily causes bugs if we forget to update it when we add new params.


---




[GitHub] spark pull request #19208: [SPARK-21087] [ML] CrossValidator, TrainValidatio...

2017-09-12 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19208#discussion_r138393318
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
@@ -212,14 +238,12 @@ object CrossValidator extends MLReadable[CrossValidator] {

       val (metadata, estimator, evaluator, estimatorParamMaps) =
         ValidatorParams.loadImpl(path, sc, className)
-      val numFolds = (metadata.params \ "numFolds").extract[Int]
-      val seed = (metadata.params \ "seed").extract[Long]
-      new CrossValidator(metadata.uid)
+      val cv = new CrossValidator(metadata.uid)
         .setEstimator(estimator)
         .setEvaluator(evaluator)
         .setEstimatorParamMaps(estimatorParamMaps)
-        .setNumFolds(numFolds)
-        .setSeed(seed)
+      DefaultParamsReader.getAndSetParams(cv, metadata, skipParams = List("estimatorParamMaps"))
--- End diff --

Use `getAndSetParams` instead of setting all params manually. This simplifies 
the code, and it keeps read/write compatibility.


---




[GitHub] spark pull request #19208: [SPARK-21087] [ML] CrossValidator, TrainValidatio...

2017-09-12 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19208#discussion_r138389265
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
@@ -261,17 +290,40 @@ class CrossValidatorModel private[ml] (
     val copied = new CrossValidatorModel(
       uid,
       bestModel.copy(extra).asInstanceOf[Model[_]],
-      avgMetrics.clone())
+      avgMetrics.clone(),
+      CrossValidatorModel.copySubModels(subModels))
     copyValues(copied, extra).setParent(parent)
   }

   @Since("1.6.0")
   override def write: MLWriter = new CrossValidatorModel.CrossValidatorModelWriter(this)
+
+  @Since("2.3.0")
+  @throws[IOException]("If the input path already exists but overwrite is not enabled.")
+  def save(path: String, persistSubModels: Boolean): Unit = {
+    write.asInstanceOf[CrossValidatorModel.CrossValidatorModelWriter]
+      .persistSubModels(persistSubModels).save(path)
+  }
--- End diff --

I added this method because `CrossValidatorModelWriter` is private, so users 
cannot call it directly. I don't know whether there is a better solution, though.
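For context, a sketch of how the overload in the diff above would be called (names as proposed in this PR, still subject to review):

    // hypothetical call site; cvModel is a fitted CrossValidatorModel
    cvModel.save("/tmp/cv-model", persistSubModels = true)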


---




[GitHub] spark issue #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATTED tabl...

2017-09-12 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16422
  
Thanks! Merged to master.


---




[GitHub] spark issue #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATTED tabl...

2017-09-12 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16422
  
LGTM


---




[GitHub] spark issue #19205: [SPARK-21982] Set locale to US

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19205
  
**[Test build #3918 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3918/testReport)**
 for PR 19205 at commit 
[`22bbb92`](https://github.com/apache/spark/commit/22bbb924eae20b8d3f899008317f5d623c6a49ef).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19208: [SPARK-21087] [ML] CrossValidator, TrainValidationSplit ...

2017-09-12 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/19208
  
cc @jkbradley 


---




[GitHub] spark issue #18313: [SPARK-21087] [ML] CrossValidator, TrainValidationSplit ...

2017-09-12 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/18313
  
@hhbyyh I owe you an apology: your PR is valuable (in the case where the model 
list is very big).
But your PR is now stale, so I have integrated it into my new PR #19208 
Would you mind taking a look?
Thanks!


---




[GitHub] spark issue #19208: [SPARK-21087] [ML] CrossValidator, TrainValidationSplit ...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19208
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19208: [SPARK-21087] [ML] CrossValidator, TrainValidationSplit ...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19208
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81685/
Test FAILed.


---




[GitHub] spark issue #19208: [SPARK-21087] [ML] CrossValidator, TrainValidationSplit ...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19208
  
**[Test build #81685 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81685/testReport)**
 for PR 19208 at commit 
[`46d3ab3`](https://github.com/apache/spark/commit/46d3ab3899c196311368b3383338b3d4e6d5aeaa).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #16774: [SPARK-19357][ML] Adding parallel model evaluation in ML...

2017-09-12 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/16774
  
@BryanCutler @MLnick I found a bug in this PR: after saving an estimator (CV or 
TVS) and then loading it again, the "parallelism" setting is lost. I fix this 
in #19208, by the way.
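A minimal reproduction sketch (illustrative only; a real save also requires the estimator and evaluator to be set):

    val cv = new CrossValidator()
      .setEstimator(estimator)   // assumed to exist
      .setEvaluator(evaluator)   // assumed to exist
      .setParallelism(4)
    cv.write.overwrite().save("/tmp/cv")
    val loaded = CrossValidator.load("/tmp/cv")
    assert(loaded.getParallelism == 4)  // fails before #19208: the setting is lost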


---




[GitHub] spark issue #19208: [SPARK-21087] [ML] CrossValidator, TrainValidationSplit ...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19208
  
**[Test build #81685 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81685/testReport)**
 for PR 19208 at commit 
[`46d3ab3`](https://github.com/apache/spark/commit/46d3ab3899c196311368b3383338b3d4e6d5aeaa).


---




[GitHub] spark pull request #19208: [SPARK-21087] [ML] CrossValidator, TrainValidatio...

2017-09-12 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request:

https://github.com/apache/spark/pull/19208

[SPARK-21087] [ML] CrossValidator, TrainValidationSplit should preserve all 
models after fitting: Scala

## What changes were proposed in this pull request?

1. We add a parameter controlling whether to collect the full model list when 
CrossValidator/TrainValidationSplit runs training (the default is NOT to 
collect, so the change cannot cause OOM); see the usage sketch below.

- Add a method in CrossValidatorModel/TrainValidationSplitModel that allows 
users to get the model list.

- CrossValidatorModelWriter adds an "option" that allows users to control 
whether to persist the model list to disk.

- Note: when persisting the model list, use indices as the sub-model path.

2. We add a parameter indicating whether to persist models to disk during 
training (default = off).

- This will use ML persistence to dump models to a directory so they are 
available later but do not consume memory.

- Note: when persisting the model list, use indices as the sub-model path.
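As a usage sketch of the proposed API (parameter and accessor names as described above, still subject to review; `pipeline`, `evaluator`, `paramGrid`, and `training` are placeholders):

    val cv = new CrossValidator()
      .setEstimator(pipeline)
      .setEvaluator(evaluator)
      .setEstimatorParamMaps(paramGrid)
      .setCollectSubModels(true)  // off by default to avoid OOM
    val cvModel = cv.fit(training)
    val subModels = cvModel.subModels  // one model per (fold, param map) pair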


## How was this patch tested?

Test cases added.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/WeichenXu123/spark expose-model-list

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19208.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19208


commit 46d3ab3899c196311368b3383338b3d4e6d5aeaa
Author: WeichenXu 
Date:   2017-09-11T13:28:53Z

init pr




---




[GitHub] spark issue #19175: [SPARK-21964][SQL]Enable splitting the Aggregate (on Exp...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19175
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19175: [SPARK-21964][SQL]Enable splitting the Aggregate (on Exp...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19175
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81676/
Test PASSed.


---




[GitHub] spark issue #15544: [SPARK-17997] [SQL] Add an aggregation function for coun...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15544
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19175: [SPARK-21964][SQL]Enable splitting the Aggregate (on Exp...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19175
  
**[Test build #81676 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81676/testReport)**
 for PR 19175 at commit 
[`709c2d3`](https://github.com/apache/spark/commit/709c2d3d81e331d6f69d8ed7ecdabe035142d296).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #15544: [SPARK-17997] [SQL] Add an aggregation function for coun...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15544
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81677/
Test PASSed.


---




[GitHub] spark issue #15544: [SPARK-17997] [SQL] Add an aggregation function for coun...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15544
  
**[Test build #81677 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81677/testReport)**
 for PR 15544 at commit 
[`cd61382`](https://github.com/apache/spark/commit/cd61382aa7f5ef54059edead709da6b818267801).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19207: [SPARK-21809] : Change Stage Page to use datatables to s...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19207
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19207: [SPARK-21809] : Change Stage Page to use datatables to s...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19207
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81683/
Test FAILed.


---




[GitHub] spark issue #19207: [SPARK-21809] : Change Stage Page to use datatables to s...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19207
  
**[Test build #81683 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81683/testReport)**
 for PR 19207 at commit 
[`20e04fa`](https://github.com/apache/spark/commit/20e04fa5e45556b7945203e332e6c4bb2f719e3a).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #18875: [SPARK-21513][SQL] Allow UDF to_json support converting ...

2017-09-12 Thread goldmedal
Github user goldmedal commented on the issue:

https://github.com/apache/spark/pull/18875
  
@HyukjinKwon  @viirya 
Sorry for updating this PR so late. Please take a look when you are 
available. Thanks :)


---




[GitHub] spark issue #18875: [SPARK-21513][SQL] Allow UDF to_json support converting ...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18875
  
**[Test build #81684 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81684/testReport)**
 for PR 18875 at commit 
[`bddf283`](https://github.com/apache/spark/commit/bddf2838868b2b676ae9eb3c595b53f56de07468).


---




[GitHub] spark issue #19207: [SPARK-21809] : Change Stage Page to use datatables to s...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19207
  
**[Test build #81683 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81683/testReport)**
 for PR 19207 at commit 
[`20e04fa`](https://github.com/apache/spark/commit/20e04fa5e45556b7945203e332e6c4bb2f719e3a).


---




[GitHub] spark issue #18592: [SPARK-21368][SQL] TPCDSQueryBenchmark can't refer query...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18592
  
**[Test build #81682 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81682/testReport)**
 for PR 18592 at commit 
[`d2d22d4`](https://github.com/apache/spark/commit/d2d22d4502b8d1bc3ff6c0af207a2b64bc1bb5f6).


---




[GitHub] spark issue #19203: [BUILD] Close stale PRs

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19203
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81675/
Test PASSed.


---




[GitHub] spark issue #19203: [BUILD] Close stale PRs

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19203
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19203: [BUILD] Close stale PRs

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19203
  
**[Test build #81675 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81675/testReport)**
 for PR 19203 at commit 
[`6386e0c`](https://github.com/apache/spark/commit/6386e0c6ef027d2858d0860c6f9dd472e8ede6aa).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19207: [SPARK-21809] : Change Stage Page to use datatables to s...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19207
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81681/
Test FAILed.


---




[GitHub] spark issue #19207: [SPARK-21809] : Change Stage Page to use datatables to s...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19207
  
**[Test build #81681 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81681/testReport)**
 for PR 19207 at commit 
[`d95d69b`](https://github.com/apache/spark/commit/d95d69b110f27e409b9185d694cde13a472762c2).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19207: [SPARK-21809] : Change Stage Page to use datatables to s...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19207
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19207: [SPARK-21809] : Change Stage Page to use datatables to s...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19207
  
**[Test build #81681 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81681/testReport)**
 for PR 19207 at commit 
[`d95d69b`](https://github.com/apache/spark/commit/d95d69b110f27e409b9185d694cde13a472762c2).


---




[GitHub] spark pull request #19181: [SPARK-21907][CORE] oom during spill

2017-09-12 Thread eyalfa
Github user eyalfa commented on a diff in the pull request:

https://github.com/apache/spark/pull/19181#discussion_r138373142
  
--- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeInMemorySorter.java ---
@@ -170,6 +170,10 @@ public void free() {
   public void reset() {
     if (consumer != null) {
       consumer.freeArray(array);
+      array = LongArray.empty;
--- End diff --

@hvanhovell ,
I'm starting to have second thoughts about the special `empty` instance 
here; I'm afraid that the nested call might trigger `freeArray` or something 
similar on it.
Perhaps using null is a better option here?


---




[GitHub] spark issue #19207: [SPARK-21809] : Change Stage Page to use datatables to s...

2017-09-12 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/19207
  
ok to test


---




[GitHub] spark issue #19207: [SPARK-21809] : Change Stage Page to use datatables to s...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19207
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #19207: [SPARK-21809] : Change Stage Page to use datatabl...

2017-09-12 Thread pgandhi999
GitHub user pgandhi999 opened a pull request:

https://github.com/apache/spark/pull/19207

[SPARK-21809] : Change Stage Page to use datatables to support sorting 
columns and searching

Support column sorting and searching for the Stage page using jQuery DataTables 
and the REST API. Before this commit, the Stage page was generated as 
hard-coded HTML and could not support search; sorting was also disabled if any 
application had more than one attempt. Supporting search and sort (over all 
entries rather than the 20 entries on the current page) will greatly improve 
the user experience.
Created stagespage-template.html for displaying application information 
in DataTables. Added a REST API endpoint and JavaScript code to fetch data from 
the endpoint and display it in the data table.

## How was this patch tested?
I have attached the screenshots of the Stage Page UI before and after the 
fix.
Before:
https://user-images.githubusercontent.com/8190/30331985-d773d9ac-979e-11e7-8920-5d11fdf8766a.png

After:
https://user-images.githubusercontent.com/8190/30331998-dd22a1d0-979e-11e7-860e-3694e45cd782.png



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/pgandhi999/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19207.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19207


commit 172fc20898896058b7288360eb5292ed9df9d79c
Author: pgandhi 
Date:   2017-07-21T21:00:22Z

[SPARK-21503]: Fixed the issue

Added the case ExecutorLostFailure which was previously not there, thus, 
the default case would be executed in which case, task would be marked as 
completed.

commit 81422e0f634c0f06eb2ea29fba4281176a1ab528
Author: pgandhi 
Date:   2017-07-25T14:54:41Z

[SPARK-21503][UI]: Adding changes as per comments

commit 55c6c37d09b41ae6914edb5d067e7f2c252ac92a
Author: pgandhi999 
Date:   2017-07-26T21:26:27Z

Merge pull request #1 from apache/master

Apache Spark Pull Request - July 26, 2017

commit f454c8933e07967548095e068063bd313ae4845c
Author: pgandhi 
Date:   2017-07-26T21:41:16Z

[SPARK-21541]: Spark Logs show incorrect job status for a job that does not 
create SparkContext

Added a flag to check whether user has initialized Spark Context. If it is 
true, then we let Application Master unregister with Resource Manager else we 
do not.

commit 6b7d5c6e2565c7c4dd97f31fe404c59e73c7474c
Author: pgandhi 
Date:   2017-07-26T21:58:27Z

Revert "[SPARK-21541]: Spark Logs show incorrect job status for a job that 
does not create SparkContext"

This reverts commit f454c8933e07967548095e068063bd313ae4845c.

"Merged another issue to this one by mistake"

commit bc4166490d2ff68898c00fae4c1ca1b8abe1e795
Author: pgandhi999 
Date:   2017-07-28T15:24:55Z

Merge pull request #2 from apache/master

Spark - July 28, 2017

commit e46126fe0f3d8d6f92f7f51c30d8c2154bddc126
Author: pgandhi 
Date:   2017-07-28T16:08:08Z

[SPARK-21503]- Making Changes as per comments

[SPARK-21503]- Making Changes as per comments: Removed match case statement 
and replaced it with an if clause.

commit 9b3cebc6b65d2da835f02efaa27015cfd1b0ccae
Author: pgandhi999 
Date:   2017-08-01T13:58:12Z

Merge pull request #4 from apache/master

Spark - August 1, 2017

commit 7f03341093c843086920e8218463b5d2ba6e37d2
Author: pgandhi 
Date:   2017-08-01T15:52:13Z

[SPARK-21503]: Reverting Unit Test Code

[SPARK-21503]: Reverting Unit Test Code - Not needed.

commit 2d01cab45ae269db9044815970dd008c851a46cc
Author: pgandhi999 
Date:   2017-08-24T21:59:52Z

Merge pull request #5 from apache/master

SPARK - August 24, 2017

commit eaf63e6bd4dddc726cf57fda080b9b5d6341e2f8
Author: pgandhi 
Date:   2017-08-24T22:03:29Z

[SPARK-21798]: No config to replace deprecated SPARK_CLASSPATH config for 
launching daemons like History Server

Adding new env variable SPARK_DAEMON_CLASSPATH to set classpath for 
launching daemons. Tested and verified for History Server and Standalone Mode.

commit e421a03acbd410a835cf3117fe6592523dc649b5
Author: pgandhi 
Date:   2017-08-25T16:13:47Z

[SPARK-21798]: No config to replace deprecated SPARK_CLASSPATH config for 
launching daemons like History Server

Reverted the previous code change and added the environment variable 
SPARK_DAEMON_CLASSPATH only for launching daemon processes.

commit 

[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2

2017-09-12 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19136#discussion_r138366512
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala ---
@@ -0,0 +1,95 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.v2
+
+import org.apache.spark.sql.Strategy
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.planning.PhysicalOperation
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.execution.{FilterExec, ProjectExec, SparkPlan}
+import org.apache.spark.sql.execution.datasources.DataSourceStrategy
+import org.apache.spark.sql.sources.Filter
+import org.apache.spark.sql.sources.v2.reader.downward.{CatalystFilterPushDownSupport, ColumnPruningSupport, FilterPushDownSupport}
+
+object DataSourceV2Strategy extends Strategy {
+  // TODO: write path
+  override def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
+    case PhysicalOperation(projects, filters, DataSourceV2Relation(output, reader)) =>
+  val attrMap = AttributeMap(output.zip(output))
+
+  val projectSet = AttributeSet(projects.flatMap(_.references))
+  val filterSet = AttributeSet(filters.flatMap(_.references))
+
+  // Match original case of attributes.
+  // TODO: nested fields pruning
+  val requiredColumns = (projectSet ++ filterSet).toSeq.map(attrMap)
+  reader match {
+case r: ColumnPruningSupport =>
+  r.pruneColumns(requiredColumns.toStructType)
+case _ =>
+  }
+
+  val stayUpFilters: Seq[Expression] = reader match {
+case r: CatalystFilterPushDownSupport =>
+  r.pushCatalystFilters(filters.toArray)
+
+case r: FilterPushDownSupport =>
--- End diff --

By doing so, do we still need to match both `CatalystFilterPushDownSupport` 
and `FilterPushDownSupport` here?


---




[GitHub] spark issue #18659: [SPARK-21190][PYSPARK][WIP] Simple Python Vectorized UDF...

2017-09-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18659
  
(I am sorry, I didn't realise this PR was open already ..)


---




[GitHub] spark pull request #18704: [SPARK-20783][SQL] Create ColumnVector to abstrac...

2017-09-12 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18704#discussion_r138364852
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/ColumnAccessor.scala ---
@@ -149,4 +153,23 @@ private[columnar] object ColumnAccessor {
         throw new Exception(s"not support type: $other")
     }
   }
+
+  def decompress(columnAccessor: ColumnAccessor, columnVector: WritableColumnVector, numRows: Int):
+  Unit = {
+    if (columnAccessor.isInstanceOf[NativeColumnAccessor[_]]) {
+      val nativeAccessor = columnAccessor.asInstanceOf[NativeColumnAccessor[_]]
+      nativeAccessor.decompress(columnVector, numRows)
+    } else {
+      val dataBuffer = columnAccessor.asInstanceOf[BasicColumnAccessor[_]].getByteBuffer
+      val nullsBuffer = dataBuffer.duplicate().order(ByteOrder.nativeOrder())
+      nullsBuffer.rewind()
+
+      val numNulls = ByteBufferHelper.getInt(nullsBuffer)
+      for (i <- 0 until numNulls) {
+        val cordinal = ByteBufferHelper.getInt(nullsBuffer)
--- End diff --

typo? `ordinal`?


---




[GitHub] spark pull request #18704: [SPARK-20783][SQL] Create ColumnVector to abstrac...

2017-09-12 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18704#discussion_r138363787
  
--- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java ---
@@ -147,6 +147,11 @@ private void throwUnsupportedException(int requiredCapacity, Throwable cause) {
   public abstract void putShorts(int rowId, int count, short[] src, int srcIndex);

   /**
+   * Sets values from [rowId, rowId + count) to [src[srcIndex], src[srcIndex + count])
--- End diff --

This description is a little vague, as the input data is `byte[]`. Can we 
say more about this, e.g. the expected endianness?
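A tiny illustration of why the byte order matters when decoding a `byte[]` (plain JVM APIs, not the Spark code under review):

    import java.nio.{ByteBuffer, ByteOrder}

    val src = Array[Byte](1, 0, 0, 0)
    ByteBuffer.wrap(src).order(ByteOrder.LITTLE_ENDIAN).getInt  // 1
    ByteBuffer.wrap(src).order(ByteOrder.BIG_ENDIAN).getInt     // 16777216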


---




[GitHub] spark pull request #18704: [SPARK-20783][SQL] Create ColumnVector to abstrac...

2017-09-12 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18704#discussion_r138366156
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/compression/compressionSchemes.scala
 ---
@@ -61,6 +63,162 @@ private[columnar] case object PassThrough extends 
CompressionScheme {
 }
 
 override def hasNext: Boolean = buffer.hasRemaining
+
+override def decompress(columnVector: WritableColumnVector, capacity: 
Int): Unit = {
+  val nullsBuffer = buffer.duplicate().order(ByteOrder.nativeOrder())
+  nullsBuffer.rewind()
+  val nullCount = ByteBufferHelper.getInt(nullsBuffer)
+  var nextNullIndex = if (nullCount > 0) 
ByteBufferHelper.getInt(nullsBuffer) else capacity
+  var pos = 0
+  var seenNulls = 0
+  val srcArray = buffer.array
+  var bufferPos = buffer.position
+  columnType.dataType match {
+case _: BooleanType =>
+  val unitSize = 1
+  while (pos < capacity) {
+if (pos != nextNullIndex) {
+  val len = nextNullIndex - pos
+  assert(len * unitSize < Int.MaxValue)
+  for (i <- 0 until len) {
+val value = buffer.get(bufferPos + i) != 0
+columnVector.putBoolean(pos + i, value)
+  }
+  bufferPos += len
+  pos += len
+} else {
+  seenNulls += 1
+  nextNullIndex = if (seenNulls < nullCount) {
+ByteBufferHelper.getInt(nullsBuffer)
+  } else {
+capacity
+  }
+  columnVector.putNull(pos)
+  pos += 1
+}
+  }
+case _: ByteType =>
--- End diff --

hmmm, is there any way to reduce the code duplication? maybe codegen?
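
As one non-codegen alternative, here is a hedged sketch that shares the null-tracking loop and varies only the element copy per type. It assumes it is nested inside the PR's `decompress`, so `buffer`, `capacity`, `columnVector`, `nextNullIndex`, `seenNulls`, `nullCount`, and `nullsBuffer` are the surrounding locals; the helper and `putUnit` names are illustrative:

```scala
// Shared null-aware loop; putUnit copies one element from a buffer
// offset into a row of the column vector.
def decompressWith(unitSize: Int)(putUnit: (Int, Int) => Unit): Unit = {
  var pos = 0
  var bufferPos = buffer.position
  while (pos < capacity) {
    if (pos != nextNullIndex) {
      val len = nextNullIndex - pos
      assert(len * unitSize < Int.MaxValue)
      var i = 0
      while (i < len) {
        putUnit(pos + i, bufferPos + i * unitSize)
        i += 1
      }
      bufferPos += len * unitSize
      pos += len
    } else {
      seenNulls += 1
      nextNullIndex =
        if (seenNulls < nullCount) ByteBufferHelper.getInt(nullsBuffer) else capacity
      columnVector.putNull(pos)
      pos += 1
    }
  }
}

// Each type then shrinks to one case, e.g.:
// case _: BooleanType =>
//   decompressWith(1)((row, off) => columnVector.putBoolean(row, buffer.get(off) != 0))
// case _: ByteType =>
//   decompressWith(1)((row, off) => columnVector.putByte(row, buffer.get(off)))
```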


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18704: [SPARK-20783][SQL] Create ColumnVector to abstrac...

2017-09-12 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18704#discussion_r138365192
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/ColumnAccessor.scala
 ---
@@ -149,4 +153,23 @@ private[columnar] object ColumnAccessor {
 throw new Exception(s"not support type: $other")
 }
   }
+
+  def decompress(columnAccessor: ColumnAccessor, columnVector: 
WritableColumnVector, numRows: Int):
+  Unit = {
+if (columnAccessor.isInstanceOf[NativeColumnAccessor[_]]) {
+  val nativeAccessor = 
columnAccessor.asInstanceOf[NativeColumnAccessor[_]]
+  nativeAccessor.decompress(columnVector, numRows)
+} else {
+  val dataBuffer = 
columnAccessor.asInstanceOf[BasicColumnAccessor[_]].getByteBuffer
+  val nullsBuffer = 
dataBuffer.duplicate().order(ByteOrder.nativeOrder())
+  nullsBuffer.rewind()
+
+  val numNulls = ByteBufferHelper.getInt(nullsBuffer)
+  for (i <- 0 until numNulls) {
+val cordinal = ByteBufferHelper.getInt(nullsBuffer)
+columnVector.putNull(cordinal)
+  }
+  throw new RuntimeException("Not support non-primitive type now")
--- End diff --

If we need to throw an exception at the end anyway, why not do it at the beginning?
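
A sketch of the suggested reordering, based only on the quoted code (the error message is reworded, and a pattern match replaces the isInstanceOf/asInstanceOf pair):

```scala
def decompress(
    columnAccessor: ColumnAccessor,
    columnVector: WritableColumnVector,
    numRows: Int): Unit = columnAccessor match {
  case nativeAccessor: NativeColumnAccessor[_] =>
    nativeAccessor.decompress(columnVector, numRows)
  case _ =>
    // Fail fast instead of first materializing the null positions
    // and then throwing anyway.
    throw new RuntimeException("Non-primitive types are not supported yet")
}
```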


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18704: [SPARK-20783][SQL] Create ColumnVector to abstrac...

2017-09-12 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18704#discussion_r138363222
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/execution/columnar/ColumnDictionary.java
 ---
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.columnar;
+
+import org.apache.spark.sql.execution.vectorized.Dictionary;
+
+public final class ColumnDictionary implements Dictionary {
+  private Object[] dictionary;
+
+  public ColumnDictionary(Object[] dictionary) {
+this.dictionary = dictionary;
+  }
+
+  @Override
+  public int decodeToInt(int id) {
+return (Integer)dictionary[id];
--- End diff --

is it possible to avoid boxing here? e.g. we could keep a set of primitive 
array members (one per type) instead of `Object[]`.
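
As a hedged illustration of that direction (not the PR's actual design), dictionaries backed by primitive arrays avoid the `Integer` unboxing on every decode:

```scala
// Illustrative only: one dictionary per primitive type; float, double,
// and binary variants would follow the same pattern.
final class IntColumnDictionary(dict: Array[Int]) {
  def decodeToInt(id: Int): Int = dict(id) // direct primitive read, no unboxing
}

final class LongColumnDictionary(dict: Array[Long]) {
  def decodeToLong(id: Int): Long = dict(id)
}
```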


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19134: [SPARK-21893][BUILD][STREAMING][WIP] Put Kafka 0.8 behin...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19134
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19134: [SPARK-21893][BUILD][STREAMING][WIP] Put Kafka 0.8 behin...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19134
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81674/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19134: [SPARK-21893][BUILD][STREAMING][WIP] Put Kafka 0.8 behin...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19134
  
**[Test build #81674 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81674/testReport)**
 for PR 19134 at commit 
[`d888f7b`](https://github.com/apache/spark/commit/d888f7b4b457d537c6875de31cbd77f5460c7d3b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19195: [DOCS] Fix unreachable links in the document

2017-09-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19195


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19195: [DOCS] Fix unreachable links in the document

2017-09-12 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/19195
  
Merged to master/2.2, and now that I look again here, I realize the last 
tests technically didn't pass. As it's a doc-only change that passed before, I 
can't see how it would fail, but I will keep an eye out. Oops.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2

2017-09-12 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19136#discussion_r138357726
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/upward/StatisticsSupport.java
 ---
@@ -0,0 +1,26 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources.v2.reader.upward;
+
+/**
+ * A mix-in interface for `DataSourceV2Reader`. Users can implement this 
interface to report
+ * statistics to Spark.
+ */
+public interface StatisticsSupport {
+  Statistics getStatistics();
--- End diff --

It should, but we need some refactoring of the optimizer; see 
https://github.com/apache/spark/pull/19136#discussion_r137023744


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2

2017-09-12 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19136#discussion_r138357442
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/downward/CatalystFilterPushDownSupport.java
 ---
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources.v2.reader.downward;
+
+import org.apache.spark.annotation.Experimental;
+import org.apache.spark.annotation.InterfaceStability;
+import org.apache.spark.sql.catalyst.expressions.Expression;
+
+/**
+ * A mix-in interface for `DataSourceV2Reader`. Users can implement this 
interface to push down
+ * arbitrary expressions as predicates to the data source.
+ */
+@Experimental
+@InterfaceStability.Unstable
+public interface CatalystFilterPushDownSupport {
+
+  /**
+   * Pushes down the given filters, and returns the unsupported ones.
+   */
+  Expression[] pushCatalystFilters(Expression[] filters);
--- End diff --

a Java list is not friendly to Scala implementations :)
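
For example, with the array-based signature a Scala implementation stays close to idiomatic collections code; `canHandle` and the push step below are hypothetical stand-ins:

```scala
import org.apache.spark.sql.catalyst.expressions.Expression

class ExampleReader extends CatalystFilterPushDownSupport {
  // Hypothetical capability check, for illustration only.
  private def canHandle(e: Expression): Boolean = false

  override def pushCatalystFilters(filters: Array[Expression]): Array[Expression] = {
    val (pushed, rejected) = filters.partition(canHandle)
    // ... hand `pushed` to the underlying source here ...
    rejected // Spark still evaluates these on top of the scan
  }
  // With java.util.List in the signature, the same body would need
  // JavaConverters asScala/asJava plumbing at both ends.
}
```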


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19199: [SPARK-21610][SQL][FOLLOWUP] Corrupt records are not han...

2017-09-12 Thread jmchung
Github user jmchung commented on the issue:

https://github.com/apache/spark/pull/19199
  
Thanks @HyukjinKwon and @viirya :)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2

2017-09-12 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19136#discussion_r138355462
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala
 ---
@@ -0,0 +1,95 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.v2
+
+import org.apache.spark.sql.Strategy
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.planning.PhysicalOperation
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.execution.{FilterExec, ProjectExec, SparkPlan}
+import org.apache.spark.sql.execution.datasources.DataSourceStrategy
+import org.apache.spark.sql.sources.Filter
+import 
org.apache.spark.sql.sources.v2.reader.downward.{CatalystFilterPushDownSupport, 
ColumnPruningSupport, FilterPushDownSupport}
+
+object DataSourceV2Strategy extends Strategy {
+  // TODO: write path
+  override def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
+case PhysicalOperation(projects, filters, DataSourceV2Relation(output, 
reader)) =>
+  val attrMap = AttributeMap(output.zip(output))
+
+  val projectSet = AttributeSet(projects.flatMap(_.references))
+  val filterSet = AttributeSet(filters.flatMap(_.references))
+
+  // Match original case of attributes.
+  // TODO: nested fields pruning
+  val requiredColumns = (projectSet ++ filterSet).toSeq.map(attrMap)
+  reader match {
+case r: ColumnPruningSupport =>
+  r.pruneColumns(requiredColumns.toStructType)
+case _ =>
+  }
+
+  val stayUpFilters: Seq[Expression] = reader match {
+case r: CatalystFilterPushDownSupport =>
+  r.pushCatalystFilters(filters.toArray)
+
+case r: FilterPushDownSupport =>
--- End diff --

good idea!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19199: [SPARK-21610][SQL][FOLLOWUP] Corrupt records are ...

2017-09-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19199


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19199: [SPARK-21610][SQL][FOLLOWUP] Corrupt records are not han...

2017-09-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19199
  
Merged to master.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19195: [DOCS] Fix unreachable links in the document

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19195
  
**[Test build #81680 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81680/testReport)**
 for PR 19195 at commit 
[`bec41c8`](https://github.com/apache/spark/commit/bec41c8702b11654202b179769e291ac6bfa9894).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19206: Client and ApplicationMaster resolvePath is inappropriat...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19206
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19195: [DOCS] Fix unreachable links in the document

2017-09-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19195
  
LGTM


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19199: [SPARK-21610][SQL][FOLLOWUP] Corrupt records are not han...

2017-09-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19199
  
LGTM too


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19181: [SPARK-21907][CORE] oom during spill

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19181
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18592: [SPARK-21368][SQL] TPCDSQueryBenchmark can't refer query...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18592
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81679/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18592: [SPARK-21368][SQL] TPCDSQueryBenchmark can't refer query...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18592
  
**[Test build #81679 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81679/testReport)**
 for PR 18592 at commit 
[`06e306f`](https://github.com/apache/spark/commit/06e306fdb4199a8c7850a6a370ce67aeac0cdf8e).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class TPCDSQueryBenchmarkArguments(val args: Array[String]) `


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19206: Client and ApplicationMaster resolvePath is inapp...

2017-09-12 Thread Chaos-Ju
GitHub user Chaos-Ju opened a pull request:

https://github.com/apache/spark/pull/19206

Client and ApplicationMaster resolvePath is inappropriate when using viewfs 

## What changes were proposed in this pull request?
When HDFS uses viewfs and Spark constructs the Executor's and 
ApplicationMaster's localResource map (the list of localized files), it cannot 
convert a viewfs:// path to the real hdfs:// path. Therefore, when the 
NodeManager downloads the local resource, it throws java.io.IOException: 
ViewFs: Cannot initialize: Empty Mount table in config for viewfs://clusterName/ 

Exception stack:

java.io.IOException: ViewFs: Cannot initialize: Empty Mount table in config 
for viewfs://ns-view/ 
at org.apache.hadoop.fs.viewfs.InodeTree.<init>(InodeTree.java:337) 
at 
org.apache.hadoop.fs.viewfs.ViewFileSystem$1.<init>(ViewFileSystem.java:167) 
at 
org.apache.hadoop.fs.viewfs.ViewFileSystem.initialize(ViewFileSystem.java:167) 
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669) 
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94) 
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703) 
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685) 
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373) 
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295) 
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:251) 
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63) 
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361) 
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359) 
at java.security.AccessController.doPrivileged(Native Method) 
at javax.security.auth.Subject.doAs(Subject.java:422) 
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1700)
 
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:358) 
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62) 
at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
at java.lang.Thread.run(Thread.java:748) 
Failing this attempt. Failing the application

## How was this patch tested?
Manual tests.
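
For context, a minimal sketch of the conversion the report says is missing; this illustrates the idea only and is not the patch itself. Hadoop's `FileSystem.resolvePath` walks the client-side mount table, so resolving before recording the YARN local resource means the NodeManager never needs the viewfs configuration:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// On ViewFileSystem, resolvePath maps viewfs://ns-view/... to its mounted
// target, e.g. hdfs://nameservice1/...; on plain HDFS it just returns the
// fully qualified path.
def toNodeManagerVisiblePath(p: Path, conf: Configuration): Path =
  p.getFileSystem(conf).resolvePath(p)
```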

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Chaos-Ju/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19206.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19206


commit f1fff009d32b8f7d1d2b24734e4d677c6264ec90
Author: Chaos-Ju 
Date:   2017-09-12T12:45:36Z

fix spark support viewfs




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18592: [SPARK-21368][SQL] TPCDSQueryBenchmark can't refer query...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18592
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19190: [SPARK-21976][DOC] Fix wrong documentation for Mean Abso...

2017-09-12 Thread FavioVazquez
Github user FavioVazquez commented on the issue:

https://github.com/apache/spark/pull/19190
  
Thanks to Carlos Munguia, Jared Romero and Christhian Flores :). 
@montactuaria @jared275 @chris122flores


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19181: [SPARK-21907][CORE] oom during spill

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19181
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81672/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19181: [SPARK-21907][CORE] oom during spill

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19181
  
**[Test build #81672 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81672/testReport)**
 for PR 19181 at commit 
[`ae7fbc4`](https://github.com/apache/spark/commit/ae7fbc48b349f5608aaef9f66e9e692354b72d18).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18592: [SPARK-21368][SQL] TPCDSQueryBenchmark can't refer query...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18592
  
**[Test build #81679 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81679/testReport)**
 for PR 18592 at commit 
[`06e306f`](https://github.com/apache/spark/commit/06e306fdb4199a8c7850a6a370ce67aeac0cdf8e).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19203: [BUILD] Close stale PRs

2017-09-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19203
  
@srowen, it looks like `19091` was missed. The rest of mine is a subset of the 
current list.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19199: [SPARK-21610][SQL][FOLLOWUP] Corrupt records are not han...

2017-09-12 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/19199
  
LGTM


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19132: [SPARK-21922] Fix duration always updating when task fai...

2017-09-12 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/19132
  
Thanks @HyukjinKwon, I will ping Josh about this 😄.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19201: [SPARK-21979][SQL]Improve QueryPlanConstraints framework

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19201
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19132: [SPARK-21922] Fix duration always updating when task fai...

2017-09-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19132
  
@jerryshao, for triggering tests on Jenkins, I believe this has to be enabled 
manually by an admin as well. In my case, I asked Josh Rosen about it privately 
via email. I am quite sure you are facing the same issue that I (and Holden, 
Felix, and Takuya) ran into before.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19201: [SPARK-21979][SQL]Improve QueryPlanConstraints framework

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19201
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81671/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19201: [SPARK-21979][SQL]Improve QueryPlanConstraints framework

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19201
  
**[Test build #81671 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81671/testReport)**
 for PR 19201 at commit 
[`7b414fa`](https://github.com/apache/spark/commit/7b414fafcf53e9e9e79a403a47e409238c0b9761).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19132: [SPARK-21922] Fix duration always updating when task fai...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19132
  
**[Test build #81678 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81678/testReport)**
 for PR 19132 at commit 
[`25fe22c`](https://github.com/apache/spark/commit/25fe22cddde276f846fd4808de1b575a87b1c059).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19132: [SPARK-21922] Fix duration always updating when task fai...

2017-09-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19132
  
ok to test


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19199: [SPARK-21610][SQL][FOLLOWUP] Corrupt records are not han...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19199
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19199: [SPARK-21610][SQL][FOLLOWUP] Corrupt records are not han...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19199
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81673/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


