[GitHub] spark pull request: [SPARK-7956] [SQL] Use Janino to compile SQL e...

2015-06-03 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/6479#discussion_r31697101
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -313,295 +318,309 @@ abstract class CodeGenerator[InType <: AnyRef, OutType <: AnyRef] extends Logging
 }
 val $nullTerm = false
 val $primitiveTerm = $funcName
-""".children
+"""
   */
 
  case GreaterThan(e1 @ NumericType(), e2 @ NumericType()) =>
-(e1, e2).evaluateAs (BooleanType) { case (eval1, eval2) => q"$eval1 > $eval2" }
+(e1, e2).evaluateAs (BooleanType) { case (eval1, eval2) => s"$eval1 > $eval2" }
  case GreaterThanOrEqual(e1 @ NumericType(), e2 @ NumericType()) =>
-(e1, e2).evaluateAs (BooleanType) { case (eval1, eval2) => q"$eval1 >= $eval2" }
+(e1, e2).evaluateAs (BooleanType) { case (eval1, eval2) => s"$eval1 >= $eval2" }
  case LessThan(e1 @ NumericType(), e2 @ NumericType()) =>
-(e1, e2).evaluateAs (BooleanType) { case (eval1, eval2) => q"$eval1 < $eval2" }
+(e1, e2).evaluateAs (BooleanType) { case (eval1, eval2) => s"$eval1 < $eval2" }
  case LessThanOrEqual(e1 @ NumericType(), e2 @ NumericType()) =>
-(e1, e2).evaluateAs (BooleanType) { case (eval1, eval2) => q"$eval1 <= $eval2" }
+(e1, e2).evaluateAs (BooleanType) { case (eval1, eval2) => s"$eval1 <= $eval2" }
 
   case And(e1, e2) =>
-val eval1 = expressionEvaluator(e1)
-val eval2 = expressionEvaluator(e2)
-
-q"""
-  ..${eval1.code}
-  var $nullTerm = false
-  var $primitiveTerm: ${termForType(BooleanType)} = false
-
-  if (!${eval1.nullTerm} && ${eval1.primitiveTerm} == false) {
+val eval1 = expressionEvaluator(e1, ctx)
+val eval2 = expressionEvaluator(e2, ctx)
+// TODO(davies): This is different than And.eval()
+s"""
+  ${eval1.code}
+  boolean $nullTerm = false;
+  boolean $primitiveTerm  = false;
+
+  if (!${eval1.nullTerm} && !${eval1.primitiveTerm}) {
   } else {
-..${eval2.code}
-if (!${eval2.nullTerm} && ${eval2.primitiveTerm} == false) {
+${eval2.code}
+if (!${eval2.nullTerm} && !${eval2.primitiveTerm}) {
 } else if (!${eval1.nullTerm} && !${eval2.nullTerm}) {
-  $primitiveTerm = true
+  $primitiveTerm = true;
 } else {
-  $nullTerm = true
+  $nullTerm = true;
 }
   }
- """.children
+ """
 
   case Or(e1, e2) =>
-val eval1 = expressionEvaluator(e1)
-val eval2 = expressionEvaluator(e2)
+val eval1 = expressionEvaluator(e1, ctx)
+val eval2 = expressionEvaluator(e2, ctx)
 
-q"""
-  ..${eval1.code}
-  var $nullTerm = false
-  var $primitiveTerm: ${termForType(BooleanType)} = false
+s"""
+  ${eval1.code}
+  boolean $nullTerm = false;
+  boolean $primitiveTerm = false;
 
   if (!${eval1.nullTerm} && ${eval1.primitiveTerm}) {
-$primitiveTerm = true
+$primitiveTerm = true;
   } else {
-..${eval2.code}
+${eval2.code}
 if (!${eval2.nullTerm} && ${eval2.primitiveTerm}) {
-  $primitiveTerm = true
+  $primitiveTerm = true;
 } else if (!${eval1.nullTerm} && !${eval2.nullTerm}) {
-  $primitiveTerm = false
+  $primitiveTerm = false;
 } else {
-  $nullTerm = true
+  $nullTerm = true;
 }
   }
- """.children
+ """
 
   case Not(child) =>
 // Uh, bad function name...
-child.castOrNull(c => q"!$c", BooleanType)
-
-  case Add(e1, e2) => (e1, e2) evaluate { case (eval1, eval2) => q"$eval1 + $eval2" }
-  case Subtract(e1, e2) => (e1, e2) evaluate { case (eval1, eval2) => q"$eval1 - $eval2" }
-  case Multiply(e1, e2) => (e1, e2) evaluate { case (eval1, eval2) => q"$eval1 * $eval2" }
+child.castOrNull(c => s"!$c", BooleanType)
+
+  case Add(e1 @ DecimalType(), e2 @ DecimalType()) =>
--- End diff --

@JoshRosen this is actually not correct if you use `e1: DecimalType`, 
because we are not matching against `DecimalType` here, but rather against 
expressions whose output type is `DecimalType`.
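For readers less familiar with this Scala idiom, the distinction can be sketched with stand-in types (all names below are illustrative, not the actual Catalyst classes): `e1 @ DecimalType()` invokes an extractor's `unapply` that tests the *expression's* output type, whereas `e1: DecimalType` would require `e1` itself to be an instance of `DecimalType`, which an expression never is.

```scala
// Stand-in types, loosely modeled on Catalyst; names are illustrative only.
sealed trait AbstractDataType
case object DecimalDT extends AbstractDataType
case object IntDT extends AbstractDataType

case class Expr(name: String, dataType: AbstractDataType)

// Extractor object: `case e @ DecimalType()` matches any Expr whose OUTPUT
// type is decimal -- it does not require `e` itself to be a data type.
object DecimalType {
  def unapply(e: Expr): Boolean = e.dataType == DecimalDT
}

def describe(e: Expr): String = e match {
  case e1 @ DecimalType() => s"${e1.name} evaluates to a decimal"
  case other              => s"${other.name} does not"
}
```

Here `describe(Expr("a", DecimalDT))` takes the first branch even though `Expr` is not a data type, which is the behavior the pattern in the diff relies on.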



[GitHub] spark pull request: [SPARK-4118] [MLlib] [PySpark] Python bindings...

2015-06-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6499#issuecomment-108755212
  
Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4118] [MLlib] [PySpark] Python bindings...

2015-06-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6499#issuecomment-108755193
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-7937][SQL] Support comparison on Struct...

2015-06-03 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6519#issuecomment-108754525
  
  [Test build #34157 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34157/consoleFull) for PR 6519 at commit [`2071693`](https://github.com/apache/spark/commit/20716936db429f8fc793b65ff70fd841c8c6d428).





[GitHub] spark pull request: [SPARK-7956] [SQL] Use Janino to compile SQL e...

2015-06-03 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/6479#issuecomment-108754378
  
Nah, not a correctness issue. Just some general cleanup; happy to tackle it 
myself in a followup.  By the way, Catalyst seems to compile fine when I remove 
both the Scala reflection and compiler JARs.  We can remove those as part of 
the followup, though; not a blocker for this.





[GitHub] spark pull request: [SPARK-7990][SQL] Add methods to facilitate eq...

2015-06-03 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/6616#discussion_r31696920
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala ---
@@ -395,22 +395,35 @@ class DataFrame private[sql](
* @since 1.4.0
*/
   def join(right: DataFrame, usingColumn: String): DataFrame = {
+join(right, Seq(usingColumn))
+  }
+
+  def join(right: DataFrame, usingColumns: Seq[String]): DataFrame = {
 // Analyze the self join. The assumption is that the analyzer will disambiguate left vs right
 // by creating a new instance for one of the branch.
 val joined = sqlContext.executePlan(
   Join(logicalPlan, right.logicalPlan, joinType = Inner, None)).analyzed.asInstanceOf[Join]

-// Project only one of the join column.
-val joinedCol = joined.right.resolve(usingColumn)
+// Project only one of the join columns.
+val joinedCols = usingColumns.map(col => joined.right.resolve(col))
+val condition = usingColumns.map { col =>
+  catalyst.expressions.EqualTo(joined.left.resolve(col), joined.right.resolve(col))
+}.foldLeft[Option[catalyst.expressions.BinaryExpression]](None) { (opt, eqTo) =>
--- End diff --

this can be simplified into a `reduceOption`, right?
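For illustration, the `foldLeft[Option[...]]` accumulation in the diff and the suggested `reduceOption` are equivalent here. A small self-contained sketch over plain strings (stand-ins for the `EqualTo`/`And` expression trees, since the Catalyst types are not available in this email):

```scala
// Stand-in for catalyst.expressions.And(left, right); illustrative only.
def and(l: String, r: String): String = s"($l AND $r)"

val usingColumns = Seq("a", "b", "c")
val predicates = usingColumns.map(col => s"left.$col = right.$col")

// foldLeft over Option, mirroring the shape of the code under review:
val folded = predicates.foldLeft[Option[String]](None) {
  case (None, eq)      => Some(eq)
  case (Some(acc), eq) => Some(and(acc, eq))
}

// The suggested simplification: reduceOption applies `and` pairwise
// and yields None for an empty sequence, matching the fold exactly.
val reduced = predicates.reduceOption(and)
```

Both produce `Some(((left.a = right.a AND left.b = right.b) AND left.c = right.c))` for three columns and `None` for an empty list, so the fold can be replaced without changing behavior.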





[GitHub] spark pull request: [SPARK-7937][SQL] Support comparison on Struct...

2015-06-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6519#issuecomment-108753616
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-7937][SQL] Support comparison on Struct...

2015-06-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6519#issuecomment-108753655
  
Merged build started.





[GitHub] spark pull request: [SPARK-7990][SQL] Add methods to facilitate eq...

2015-06-03 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/6616#discussion_r31696855
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala ---
@@ -395,22 +395,35 @@ class DataFrame private[sql](
* @since 1.4.0
*/
   def join(right: DataFrame, usingColumn: String): DataFrame = {
+join(right, Seq(usingColumn))
+  }
+
+  def join(right: DataFrame, usingColumns: Seq[String]): DataFrame = {
--- End diff --

add javadoc





[GitHub] spark pull request: [SPARK-7990][SQL] Add methods to facilitate eq...

2015-06-03 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/6616#discussion_r31696822
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala ---
@@ -448,6 +461,10 @@ class DataFrame private[sql](
* @since 1.3.0
*/
   def join(right: DataFrame, joinExprs: Column, joinType: String): DataFrame = {
+join(right, Seq(joinExprs), joinType)
+  }
+
+  def join(right: DataFrame, joinExprs: Seq[Column], joinType: String): DataFrame = {
--- End diff --

I think we should remove this one for Scala.





[GitHub] spark pull request: [SPARK-3431] [WIP] Parallelize Scala/Java test...

2015-06-03 Thread nchammas
Github user nchammas closed the pull request at:

https://github.com/apache/spark/pull/3564





[GitHub] spark pull request: [SPARK-7956] [SQL] Use Janino to compile SQL e...

2015-06-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6479#issuecomment-108752930
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-7956] [SQL] Use Janino to compile SQL e...

2015-06-03 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6479#issuecomment-108752895
  
  [Test build #34153 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34153/consoleFull) for PR 6479 at commit [`262d848`](https://github.com/apache/spark/commit/262d84839a0876c07ffe57031fb505e664abfe66).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `public final class UnsafeRow extends BaseMutableRow `
  * `public abstract class BaseMutableRow extends BaseRow implements 
MutableRow `
  * `public abstract class BaseRow implements Row `
  * `  protected class CodeGenContext `
  * `abstract class BaseMutableProjection extends MutableProjection `
  * `  class SpecificProjection extends $`
  * `class BaseOrdering extends Ordering[Row] `
  * `  class SpecificOrdering extends $`
  * `abstract class Predicate `
  * `  class SpecificPredicate extends $`
  * `abstract class BaseProject extends Projection `
  * `class SpecificProjection extends $`
  * `final class SpecificRow extends $`






[GitHub] spark pull request: SPARK-7436: Fixed instantiation of custom reco...

2015-06-03 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/5976





[GitHub] spark pull request: [SPARK-2387][Core]Remove Stage's barrier

2015-06-03 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/3430





[GitHub] spark pull request: [SPARK-7180][SPARK-8090][SPARK-8091] Fix a num...

2015-06-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6625#issuecomment-108751844
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-7180][SPARK-8090][SPARK-8091] Fix a num...

2015-06-03 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6625#issuecomment-108751839
  
  [Test build #34154 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34154/consoleFull) for PR 6625 at commit [`ae212c8`](https://github.com/apache/spark/commit/ae212c89ce9feba731c3dd60b3d4332addee2a0e).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  // class ParentClass(parentField: Int)`
  * `  // class ChildClass(childField: Int) extends ParentClass(1)`
  * `  // If the class type corresponding to current slot has 
writeObject() defined,`
  * `  // then its not obvious which fields of the class will be 
serialized as the writeObject()`






[GitHub] spark pull request: [SPARK-2774] Set preferred locations for reduc...

2015-06-03 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/4576





[GitHub] spark pull request: [SPARK-3650] Fix TriangleCount handling of rev...

2015-06-03 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/2495





[GitHub] spark pull request: [SPARK-7956] [SQL] Use Janino to compile SQL e...

2015-06-03 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/6479#issuecomment-108751601
  
Does it impact correctness? If it doesn't, I think it's fine to do it in follow-up PRs.






[GitHub] spark pull request: [SPARK-7956] [SQL] Use Janino to compile SQL e...

2015-06-03 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/6479#issuecomment-108751305
  
Actually... if you agree that we only need the synchronization for 
`ClassBodyEvaluator`, how about removing `globalLock` and just synchronizing on 
the `ClassBodyEvaluator` instance instead?  I think that this is a little 
clearer / more explicit.  I think that we only need to keep the `globalLock` in 
place if it's implicitly guarding Scala reflection calls.

We don't need to do this cleanup here necessarily, but we might as well if 
it's easy to do.  The smaller nitpick cleanups, such as type aliases for 
`String`, etc., should definitely be deferred, though.
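The suggestion above can be sketched as follows. This is a minimal, hypothetical illustration of locking granularity, not the PR's code: `Evaluator` stands in for Janino's `ClassBodyEvaluator`, which the discussion assumes is not thread-safe.

```scala
// Stand-in for a non-thread-safe compiler object such as ClassBodyEvaluator.
class Evaluator {
  private var cooked: String = _
  def cook(code: String): Unit = { cooked = code } // not thread-safe itself
  def result: String = cooked
}

object GlobalLockStyle {
  private val globalLock = new Object
  // One process-wide lock: every compilation, even of unrelated
  // evaluators, is serialized behind it.
  def compile(ev: Evaluator, code: String): String =
    globalLock.synchronized { ev.cook(code); ev.result }
}

object PerInstanceStyle {
  // Lock on the evaluator itself: only callers sharing the SAME
  // instance contend; independent evaluators proceed in parallel.
  def compile(ev: Evaluator, code: String): String =
    ev.synchronized { ev.cook(code); ev.result }
}
```

The per-instance variant also makes the intent explicit: the lock exists to guard this one non-thread-safe object, not some wider invariant.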





[GitHub] spark pull request: [SPARK-4006] Block Manager - Double Register C...

2015-06-03 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/2854#issuecomment-108751032
  
@tsliwowicz can you please close this pull request?





[GitHub] spark pull request: [SPARK-3172] [SPARK-3577] improve shuffle spil...

2015-06-03 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/2504#issuecomment-108750834
  
This has gone stale so I'd like to close this issue pending further 
discussion.





[GitHub] spark pull request: [SPARK-3172] [SPARK-3577] improve shuffle spil...

2015-06-03 Thread sryza
Github user sryza closed the pull request at:

https://github.com/apache/spark/pull/2504





[GitHub] spark pull request: [SPARK-3431] [WIP] Parallelize Scala/Java test...

2015-06-03 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/3564#issuecomment-108750767
  
I'd like to close this issue for now pending further development.





[GitHub] spark pull request: [SPARK-2774] Set preferred locations for reduc...

2015-06-03 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/4576#issuecomment-108750698
  
I am cleaning up old PRs and would propose to close this issue pending 
further discussion.





[GitHub] spark pull request: [SPARK-7956] [SQL] Use Janino to compile SQL e...

2015-06-03 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/6479#issuecomment-108750436
  
@davies, regarding the global lock, I think that we need it strictly 
because of `ClassBodyEvaluator`'s lack of thread-safety, not to prevent 
duplicate loading for expressions.

According to Guava's [CacheBuilder documentation](http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/cache/CacheBuilder.html#build(com.google.common.cache.CacheLoader)) for `CacheBuilder.build(loader)`:

> Builds a cache, which either returns an already-loaded value for a given 
key or atomically computes or retrieves it using the supplied CacheLoader. If 
another thread is currently loading the value for this key, simply waits for 
that thread to finish and returns its loaded value. Note that multiple threads 
can concurrently load values for distinct keys.

Spark's HistoryServer makes use of this and I've used this pattern in some 
other code as well.  Not a correctness / performance issue for this PR, but 
just wanted to point this out since it's a neat thread-safety trick that Guava 
gives us.
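The per-key loading behavior described in the quoted Guava documentation can be illustrated with the JDK's `ConcurrentHashMap.computeIfAbsent`, which has similar semantics: the loader runs at most once per key, a thread requesting a key being loaded waits for that load, and distinct keys load concurrently. This is only an analogy for the pattern, not the code in the PR; the names below are made up for the sketch.

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicInteger

val cache = new ConcurrentHashMap[String, String]()
val loads = new AtomicInteger(0)

// computeIfAbsent invokes the mapping function at most once per key,
// even under concurrent access, and caches its result.
def compile(source: String): String =
  cache.computeIfAbsent(source, (src: String) => {
    loads.incrementAndGet() // count actual "compilations"
    s"bytecode-for:$src"    // stand-in for generated code
  })
```

A second call with the same source string is served from the cache without re-running the loader, which is the deduplication-plus-thread-safety property Guava's `LoadingCache` provides out of the box.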





[GitHub] spark pull request: [SPARK-4118] [MLlib] [PySpark] Python bindings...

2015-06-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6499#issuecomment-108750534
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-4118] [MLlib] [PySpark] Python bindings...

2015-06-03 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6499#issuecomment-108750520
  
  [Test build #34156 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34156/consoleFull) for PR 6499 at commit [`e12c590`](https://github.com/apache/spark/commit/e12c590801d5eed7666b33d7ac2b8543e2d340e2).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class StreamingKMeansModel(KMeansModel):`
  * `class StreamingKMeans(object):`






[GitHub] spark pull request: SPARK-5425: Use synchronised methods in system...

2015-06-03 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/4222#issuecomment-108749998
  
@jacek-lewandowski can you close this issue? It didn't close properly 
because of the way GitHub auto-closes patches into release branches.





[GitHub] spark pull request: SPARK-4585. Spark dynamic executor allocation ...

2015-06-03 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/4051#issuecomment-108749911
  
For some reason this didn't close. @sryza can you close this issue?





[GitHub] spark pull request: [SPARK-3650] Fix TriangleCount handling of rev...

2015-06-03 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/2495#issuecomment-108749807
  
I'd like to close this issue pending further updates.





[GitHub] spark pull request: [SPARK-2387][Core]Remove Stage's barrier

2015-06-03 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/3430#issuecomment-108749762
  
I'd propose to close this issue. It's a fairly large change and needs more 
discussion before being seriously considered as part of Spark.





[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-06-03 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/5694#issuecomment-108749526
  
Hey @brennonyork - sorry it took so long to review this; I can iterate on it 
much more closely over the next few days. I took a close look, and this is a 
faithful port of the bash implementation, but I think we should leverage 
Python to make things a bit nicer than they could be in bash. As part of 
rewriting this in Python I think we should do some much-needed simplification 
of how the various permutations of test configurations are constructed. I 
sketched an outline of how that can work; let me know if you have any 
questions. Without those changes I think this will remain really hard for 
anyone else to understand.





[GitHub] spark pull request: [SPARK-7956] [SQL] Use Janino to compile SQL e...

2015-06-03 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/6479#issuecomment-108749295
  
As long as we're removing quasiquotes, we might see if we can delete their 
dependencies from the `pom` as well:

I think we might be able to remove the following (should confirm, though):

```xml
<profiles>
  <profile>
    <id>scala-2.10</id>
    <activation>
      <property><name>!scala-2.11</name></property>
    </activation>
    <dependencies>
      <dependency>
        <groupId>org.scalamacros</groupId>
        <artifactId>quasiquotes_${scala.binary.version}</artifactId>
        <version>${scala.macros.version}</version>
      </dependency>
    </dependencies>
  </profile>
</profiles>
```

and

```xml
<dependency>
  <groupId>org.scala-lang</groupId>
  <artifactId>scala-compiler</artifactId>
</dependency>
```

We might have to keep

```xml
<dependency>
  <groupId>org.scala-lang</groupId>
  <artifactId>scala-reflect</artifactId>
</dependency>
```

if we actually depend on Scala reflection.
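Before dropping `scala-reflect`, one quick sanity check (hypothetical, not part of this PR) is to scan the module's sources for `scala.reflect` references. The sketch below builds a throwaway source tree so it is self-contained; in practice `root` would be a real module directory:

```python
import os
import tempfile

def uses_scala_reflect(root):
    """Return True if any file under `root` mentions scala.reflect."""
    for dirpath, _, files in os.walk(root):
        for name in files:
            with open(os.path.join(dirpath, name)) as fh:
                if "scala.reflect" in fh.read():
                    return True
    return False

# Stand-in for a real module source tree such as a catalyst src directory.
src = tempfile.mkdtemp()
with open(os.path.join(src, "Example.scala"), "w") as f:
    f.write("import scala.reflect.runtime.universe._\n")

print(uses_scala_reflect(src))
```

Any hit means the dependency is still needed; no hits make it a candidate for removal.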





[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-06-03 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/5694#discussion_r31696016
  
--- Diff: dev/run-tests.py ---
@@ -0,0 +1,417 @@
+#!/usr/bin/env python2
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import os
+import re
+import sys
+import shutil
+import subprocess
+from collections import namedtuple
+
+SPARK_PROJ_ROOT = os.path.join(os.path.dirname(os.path.realpath(__file__)), "..")
+USER_HOME_DIR = os.environ.get("HOME")
+
+SBT_MAVEN_PROFILE_ARGS_ENV = "SBT_MAVEN_PROFILES_ARGS"
+AMPLAB_JENKINS_BUILD_TOOL = os.environ.get("AMPLAB_JENKINS_BUILD_TOOL")
+AMPLAB_JENKINS = os.environ.get("AMPLAB_JENKINS")
+
+SBT_OUTPUT_FILTER = re.compile("^.*[info].*Resolving" + "|" + 
+   "^.*[warn].*Merging" + "|" +
+   "^.*[info].*Including")
+
+
+def get_error_codes(err_code_file):
+"""Function to retrieve all block numbers from the `run-tests-codes.sh`
+file to maintain backwards compatibility with the `run-tests-jenkins` 
+script"""
+
+with open(err_code_file, 'r') as f:
+err_codes = [e.split()[1].strip().split('=') 
+ for e in f if e.startswith("readonly")]
+return dict(err_codes)
+
+
+def exit_from_command_with_retcode(cmd, retcode):
+print "[error] running", cmd, "; received return code", retcode
+sys.exit(int(os.environ.get("CURRENT_BLOCK", 255)))
+
+
+def rm_r(path):
+"""Given an arbitrary path properly remove it with the correct python
+construct if it exists
+- from: http://stackoverflow.com/a/9559881"""
+
+if os.path.isdir(path):
+shutil.rmtree(path)
+elif os.path.exists(path):
+os.remove(path)
+
+
+def lineno():
+"""Returns the current line number in our program
+- from: http://stackoverflow.com/a/3056059"""
+
+return inspect.currentframe().f_back.f_lineno
+
+
+def run_cmd(cmd):
+"""Given a command as a list of arguments will attempt to execute the
+command and, on failure, print an error message"""
+
+if not isinstance(cmd, list):
+cmd = cmd.split()
+try:
+subprocess.check_call(cmd)
+except subprocess.CalledProcessError as e:
+exit_from_command_with_retcode(e.cmd, e.returncode)
+
+
+def set_sbt_maven_profile_args():
+"""Properly sets the SBT environment variable arguments with additional
+checks to determine if this is running on an Amplab Jenkins machine"""
+
+# base environment values for SBT_MAVEN_PROFILE_ARGS_ENV which will be appended on
+sbt_maven_profile_args_base = ["-Pkinesis-asl"]
+
+sbt_maven_profile_arg_dict = {
+"hadoop1.0" : ["-Phadoop-1", "-Dhadoop.version=1.0.4"],
+"hadoop2.0" : ["-Phadoop-1", "-Dhadoop.version=2.0.0-mr1-cdh4.1.1"],
+"hadoop2.2" : ["-Pyarn", "-Phadoop-2.2"],
+"hadoop2.3" : ["-Pyarn", "-Phadoop-2.3", "-Dhadoop.version=2.3.0"],
+}
+
+# set the SBT maven build profile argument environment variable and ensure
+# we build against the right version of Hadoop
+if os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE"):
+os.environ[SBT_MAVEN_PROFILE_ARGS_ENV] = \
+" ".join(sbt_maven_profile_arg_dict.get(ajbp, []) 
+ + sbt_maven_profile_args_base)
+else:
+os.environ[SBT_MAVEN_PROFILE_ARGS_ENV] = \
+" ".join(sbt_maven_profile_arg_dict.get("hadoop2.3", [])
+ + sbt_maven_profile_args_base)
+
+
+def is_exe(path):
+"""Check if a given path is an executable file
+- from: http://stackoverflow.com/a/377028"""
+
+return os.path.isfile(path) and os.access(path, os.X_OK)
+
+
+def which(program):
+"""Find and return the given program by its absolute path or

[GitHub] spark pull request: [SPARK-7990][SQL] Add methods to facilitate eq...

2015-06-03 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/6616#discussion_r31695804
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -509,30 +509,42 @@ def join(self, other, joinExprs=None, joinType=None):
 The following performs a full outer join between ``df1`` and ``df2``.
 
 :param other: Right side of the join
-:param joinExprs: a string for join column name, or a join expression (Column).
-If joinExprs is a string indicating the name of the join column,
-the column must exist on both sides, and this performs an inner equi-join.
+:param joinExprs: a string for join column name, a list of column names,
--- End diff --

ok.





[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-06-03 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/5694#issuecomment-108743088
  
Can you update the Python lint checks to run on this file as well?  I 
noticed that it fails `pep8` checks:

```
pep8 dev/run-tests.py
dev/run-tests.py:34:62: W291 trailing whitespace
dev/run-tests.py:41:74: W291 trailing whitespace
dev/run-tests.py:43:1: W293 blank line contains whitespace
dev/run-tests.py:45:53: W291 trailing whitespace
dev/run-tests.py:93:20: E203 whitespace before ':'
dev/run-tests.py:94:20: E203 whitespace before ':'
dev/run-tests.py:95:20: E203 whitespace before ':'
dev/run-tests.py:96:20: E203 whitespace before ':'
dev/run-tests.py:103:62: W291 trailing whitespace
dev/run-tests.py:104:22: W503 line break before binary operator
dev/run-tests.py:108:22: W503 line break before binary operator
dev/run-tests.py:151:65: W291 trailing whitespace
dev/run-tests.py:153:48: E261 at least two spaces before inline comment
dev/run-tests.py:154:57: E261 at least two spaces before inline comment
dev/run-tests.py:155:45: E261 at least two spaces before inline comment
dev/run-tests.py:157:44: W291 trailing whitespace
dev/run-tests.py:197:45: W291 trailing whitespace
dev/run-tests.py:198:50: W291 trailing whitespace
dev/run-tests.py:199:59: W291 trailing whitespace
dev/run-tests.py:268:24: W291 trailing whitespace
dev/run-tests.py:283:60: W291 trailing whitespace
dev/run-tests.py:288:33: W291 trailing whitespace
dev/run-tests.py:289:43: W291 trailing whitespace
dev/run-tests.py:296:57: W291 trailing whitespace
dev/run-tests.py:317:40: W291 trailing whitespace
dev/run-tests.py:320:79: W291 trailing whitespace
dev/run-tests.py:332:44: W291 trailing whitespace
dev/run-tests.py:342:44: W291 trailing whitespace
dev/run-tests.py:353:1: W293 blank line contains whitespace
dev/run-tests.py:392:79: W291 trailing whitespace
dev/run-tests.py:404:1: W293 blank line contains whitespace
```





[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-06-03 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/5694#discussion_r31695415
  
--- Diff: dev/run-tests.py ---

[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-06-03 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/5694#discussion_r31695382
  
--- Diff: dev/run-tests.py ---
@@ -0,0 +1,417 @@
+def lineno():
+"""Returns the current line number in our program
+- from: http://stackoverflow.com/a/3056059""";
+
+return inspect.currentframe().f_back.f_lineno
--- End diff --

Also, `inspect` needs to be imported if you intend to call this function.
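For reference, a self-contained version of the helper with the missing import added (a sketch mirroring the quoted code, not the PR's final version):

```python
import inspect

def lineno():
    """Return the line number of the call site.

    `inspect.currentframe()` is lineno's own frame; `.f_back` is the
    caller's frame, and `f_lineno` is the line currently executing there.
    """
    return inspect.currentframe().f_back.f_lineno
```

Calling `lineno()` anywhere in a module returns the line number of that call, which suits the kind of progress and error reporting `run-tests.py` does.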





[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-06-03 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/5694#discussion_r31695346
  
--- Diff: dev/run-tests.py ---
@@ -0,0 +1,417 @@
+def set_sbt_maven_profile_args():
+"""Properly sets the SBT environment variable arguments with additional
+checks to determine if this is running on an Amplab Jenkins machine"""
+
+# base environment values for SBT_MAVEN_PROFILE_ARGS_ENV which will be 
appended on
+sbt_maven_profile_args_base = ["-Pkinesis-asl"]
+
+sbt_maven_profile_arg_dict = {
+"hadoop1.0" : ["-Phadoop-1", "-Dhadoop.version=1.0.4"],
+"hadoop2.0" : ["-Phadoop-1", 
"-Dhadoop.version=2.0.0-mr1-cdh4.1.1"],
+"hadoop2.2" : ["-Pyarn", "-Phadoop-2.2"],
+"hadoop2.3" : ["-Pyarn", "-Phadoop-2.3", "-Dhadoop.version=2.3.0"],
+}
+
+# set the SBT maven build profile argument environment variable and 
ensure
+# we build against the right version of Hadoop
+if os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE"):
+os.environ[SBT_MAVEN_PROFILE_ARGS_ENV] = \
+" ".join(sbt_maven_profile_arg_dict.get(ajbp, []) 
--- End diff --

This looks like a typo; `ajbp` is an undefined variable.
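One possible shape for the fix (a sketch only, not the PR's final code): read the profile name from `AMPLAB_JENKINS_BUILD_PROFILE` once, falling back to `hadoop2.3` as in the quoted code.

```python
import os

def sbt_maven_profile_args(env=None):
    """Build the sbt/maven profile argument string for a Jenkins build.

    Sketch of a possible fix: the undefined `ajbp` is replaced by an
    explicit lookup of AMPLAB_JENKINS_BUILD_PROFILE, defaulting to the
    hadoop2.3 profile as the quoted code's else-branch does.
    """
    env = os.environ if env is None else env
    base = ["-Pkinesis-asl"]
    profiles = {
        "hadoop1.0": ["-Phadoop-1", "-Dhadoop.version=1.0.4"],
        "hadoop2.0": ["-Phadoop-1", "-Dhadoop.version=2.0.0-mr1-cdh4.1.1"],
        "hadoop2.2": ["-Pyarn", "-Phadoop-2.2"],
        "hadoop2.3": ["-Pyarn", "-Phadoop-2.3", "-Dhadoop.version=2.3.0"],
    }
    # Look up the profile name once, then fall back to hadoop2.3.
    profile = env.get("AMPLAB_JENKINS_BUILD_PROFILE", "hadoop2.3")
    return " ".join(profiles.get(profile, []) + base)
```

Passing the environment as a parameter also makes the function testable without mutating `os.environ`.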



[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-06-03 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/5694#discussion_r31695334
  
--- Diff: dev/run-tests.py ---

[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-06-03 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/5694#discussion_r31695312
  
--- Diff: dev/run-tests.py ---

[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-06-03 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/5694#discussion_r31695270
  
--- Diff: dev/run-tests.py ---
@@ -0,0 +1,417 @@
+#!/usr/bin/env python2
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import os
+import re
+import sys
+import shutil
+import subprocess
+from collections import namedtuple
+
+SPARK_PROJ_ROOT = os.path.join(os.path.dirname(os.path.realpath(__file__)), "..")
+USER_HOME_DIR = os.environ.get("HOME")
+
+SBT_MAVEN_PROFILE_ARGS_ENV = "SBT_MAVEN_PROFILES_ARGS"
+AMPLAB_JENKINS_BUILD_TOOL = os.environ.get("AMPLAB_JENKINS_BUILD_TOOL")
+AMPLAB_JENKINS = os.environ.get("AMPLAB_JENKINS")
+
+SBT_OUTPUT_FILTER = re.compile("^.*[info].*Resolving" + "|" + 
+   "^.*[warn].*Merging" + "|" +
+   "^.*[info].*Including")
+
+
+def get_error_codes(err_code_file):
+"""Function to retrieve all block numbers from the `run-tests-codes.sh`
+file to maintain backwards compatibility with the `run-tests-jenkins` 
+script"""
+
+with open(err_code_file, 'r') as f:
+err_codes = [e.split()[1].strip().split('=') 
+ for e in f if e.startswith("readonly")]
+return dict(err_codes)
+
+
+def exit_from_command_with_retcode(cmd, retcode):
+print "[error] running", cmd, "; received return code", retcode
+sys.exit(int(os.environ.get("CURRENT_BLOCK", 255)))
+
+
+def rm_r(path):
+"""Given an arbitrary path properly remove it with the correct python
+construct if it exists
+- from: http://stackoverflow.com/a/9559881"""
+
+if os.path.isdir(path):
+shutil.rmtree(path)
+elif os.path.exists(path):
+os.remove(path)
+
+
+def lineno():
+"""Returns the current line number in our program
+- from: http://stackoverflow.com/a/3056059"""
+
+return inspect.currentframe().f_back.f_lineno
+
+
+def run_cmd(cmd):
+"""Given a command as a list of arguments will attempt to execute the
+command and, on failure, print an error message"""
+
+if not isinstance(cmd, list):
+cmd = cmd.split()
+try:
+subprocess.check_call(cmd)
+except subprocess.CalledProcessError as e:
+exit_from_command_with_retcode(e.cmd, e.returncode)
+
+
+def set_sbt_maven_profile_args():
+"""Properly sets the SBT environment variable arguments with additional
+checks to determine if this is running on an Amplab Jenkins machine"""
+
+# base environment values for SBT_MAVEN_PROFILE_ARGS_ENV which will be appended on
+sbt_maven_profile_args_base = ["-Pkinesis-asl"]
+
+sbt_maven_profile_arg_dict = {
+"hadoop1.0" : ["-Phadoop-1", "-Dhadoop.version=1.0.4"],
+"hadoop2.0" : ["-Phadoop-1", "-Dhadoop.version=2.0.0-mr1-cdh4.1.1"],
+"hadoop2.2" : ["-Pyarn", "-Phadoop-2.2"],
+"hadoop2.3" : ["-Pyarn", "-Phadoop-2.3", "-Dhadoop.version=2.3.0"],
+}
+
+# set the SBT maven build profile argument environment variable and ensure
+# we build against the right version of Hadoop
+if os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE"):
+os.environ[SBT_MAVEN_PROFILE_ARGS_ENV] = \
+" ".join(sbt_maven_profile_arg_dict.get(ajbp, []) 
+ + sbt_maven_profile_args_base)
+else:
+os.environ[SBT_MAVEN_PROFILE_ARGS_ENV] = \
+" ".join(sbt_maven_profile_arg_dict.get("hadoop2.3", [])
+ + sbt_maven_profile_args_base)
+
+
+def is_exe(path):
+"""Check if a given path is an executable file
+- from: http://stackoverflow.com/a/377028"""
+
+return os.path.isfile(path) and os.access(path, os.X_OK)
+
+
+def which(program):
+"""Find and return the given program by its absolute path o

[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-06-03 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/5694#discussion_r31695181
  
--- Diff: dev/run-tests.py ---
@@ -0,0 +1,417 @@
+#!/usr/bin/env python2
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import os
+import re
+import sys
+import shutil
+import subprocess
+from collections import namedtuple
+
+SPARK_PROJ_ROOT = os.path.join(os.path.dirname(os.path.realpath(__file__)), "..")
+USER_HOME_DIR = os.environ.get("HOME")
+
+SBT_MAVEN_PROFILE_ARGS_ENV = "SBT_MAVEN_PROFILES_ARGS"
+AMPLAB_JENKINS_BUILD_TOOL = os.environ.get("AMPLAB_JENKINS_BUILD_TOOL")
+AMPLAB_JENKINS = os.environ.get("AMPLAB_JENKINS")
+
+SBT_OUTPUT_FILTER = re.compile("^.*[info].*Resolving" + "|" + 
+   "^.*[warn].*Merging" + "|" +
+   "^.*[info].*Including")
+
+
+def get_error_codes(err_code_file):
+"""Function to retrieve all block numbers from the `run-tests-codes.sh`
+file to maintain backwards compatibility with the `run-tests-jenkins` 
+script"""
+
+with open(err_code_file, 'r') as f:
+err_codes = [e.split()[1].strip().split('=') 
+ for e in f if e.startswith("readonly")]
+return dict(err_codes)
+
+
+def exit_from_command_with_retcode(cmd, retcode):
+print "[error] running", cmd, "; received return code", retcode
+sys.exit(int(os.environ.get("CURRENT_BLOCK", 255)))
+
+
+def rm_r(path):
+"""Given an arbitrary path properly remove it with the correct python
+construct if it exists
+- from: http://stackoverflow.com/a/9559881"""
+
+if os.path.isdir(path):
+shutil.rmtree(path)
+elif os.path.exists(path):
+os.remove(path)
+
+
+def lineno():
+"""Returns the current line number in our program
+- from: http://stackoverflow.com/a/3056059"""
+
+return inspect.currentframe().f_back.f_lineno
+
+
+def run_cmd(cmd):
+"""Given a command as a list of arguments will attempt to execute the
+command and, on failure, print an error message"""
+
+if not isinstance(cmd, list):
+cmd = cmd.split()
+try:
+subprocess.check_call(cmd)
+except subprocess.CalledProcessError as e:
+exit_from_command_with_retcode(e.cmd, e.returncode)
+
+
+def set_sbt_maven_profile_args():
+"""Properly sets the SBT environment variable arguments with additional
+checks to determine if this is running on an Amplab Jenkins machine"""
+
+# base environment values for SBT_MAVEN_PROFILE_ARGS_ENV which will be appended on
+sbt_maven_profile_args_base = ["-Pkinesis-asl"]
+
+sbt_maven_profile_arg_dict = {
+"hadoop1.0" : ["-Phadoop-1", "-Dhadoop.version=1.0.4"],
+"hadoop2.0" : ["-Phadoop-1", "-Dhadoop.version=2.0.0-mr1-cdh4.1.1"],
+"hadoop2.2" : ["-Pyarn", "-Phadoop-2.2"],
+"hadoop2.3" : ["-Pyarn", "-Phadoop-2.3", "-Dhadoop.version=2.3.0"],
+}
+
+# set the SBT maven build profile argument environment variable and ensure
+# we build against the right version of Hadoop
+if os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE"):
+os.environ[SBT_MAVEN_PROFILE_ARGS_ENV] = \
+" ".join(sbt_maven_profile_arg_dict.get(ajbp, []) 
+ + sbt_maven_profile_args_base)
+else:
+os.environ[SBT_MAVEN_PROFILE_ARGS_ENV] = \
+" ".join(sbt_maven_profile_arg_dict.get("hadoop2.3", [])
+ + sbt_maven_profile_args_base)
+
+
+def is_exe(path):
+"""Check if a given path is an executable file
+- from: http://stackoverflow.com/a/377028"""
+
+return os.path.isfile(path) and os.access(path, os.X_OK)
+
+
+def which(program):
+"""Find and return the given program by its absolute path o

[GitHub] spark pull request: [SPARK-7990][SQL] Add methods to facilitate eq...

2015-06-03 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/6616#discussion_r31695152
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -509,30 +509,42 @@ def join(self, other, joinExprs=None, joinType=None):
 The following performs a full outer join between ``df1`` and 
``df2``.
 
 :param other: Right side of the join
-:param joinExprs: a string for join column name, or a join 
expression (Column).
-If joinExprs is a string indicating the name of the join 
column,
-the column must exist on both sides, and this performs an 
inner equi-join.
+:param joinExprs: a string for join column name, a list of column 
names,
--- End diff --

Let's change the argument name in Python as explained in the JIRA ticket.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-06-03 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/5694#discussion_r31695142
  
--- Diff: dev/run-tests.py ---
@@ -0,0 +1,417 @@
+#!/usr/bin/env python2
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import os
+import re
+import sys
+import shutil
+import subprocess
+from collections import namedtuple
+
+SPARK_PROJ_ROOT = os.path.join(os.path.dirname(os.path.realpath(__file__)), "..")
+USER_HOME_DIR = os.environ.get("HOME")
+
+SBT_MAVEN_PROFILE_ARGS_ENV = "SBT_MAVEN_PROFILES_ARGS"
+AMPLAB_JENKINS_BUILD_TOOL = os.environ.get("AMPLAB_JENKINS_BUILD_TOOL")
+AMPLAB_JENKINS = os.environ.get("AMPLAB_JENKINS")
+
+SBT_OUTPUT_FILTER = re.compile("^.*[info].*Resolving" + "|" + 
--- End diff --

Well, it probably does get called a bunch of times, so compiling it may 
help, but doing the compilation, say, twice instead of once isn't a huge cost.
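As a sketch of the trade-off being discussed (the pattern and names below are illustrative, not the exact ones in the PR): compiling the pattern once at module import means the cost is paid a single time no matter how often the filter runs.

```python
import re

# Compiled once at import time; every call below reuses the same object.
# Brackets are escaped here so "[info]" matches literally rather than as
# a character class (an illustrative choice, not a claim about the PR).
SBT_NOISE = re.compile(r"\[info\].*Resolving"
                       r"|\[warn\].*Merging"
                       r"|\[info\].*Including")

def is_noise(line):
    """Return True if an sbt output line should be filtered out."""
    return SBT_NOISE.search(line) is not None

print(is_noise("[info] Resolving org.apache.spark#spark-core ..."))  # True
print(is_noise("[error] compilation failed"))                        # False
```

Since the script only scans sbt output once per run, the difference between one compilation and a handful is negligible either way, which is the point being made above.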





[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-06-03 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/5694#discussion_r31695111
  
--- Diff: dev/run-tests.py ---
@@ -0,0 +1,417 @@
+#!/usr/bin/env python2
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import os
+import re
+import sys
+import shutil
+import subprocess
+from collections import namedtuple
+
+SPARK_PROJ_ROOT = os.path.join(os.path.dirname(os.path.realpath(__file__)), "..")
+USER_HOME_DIR = os.environ.get("HOME")
+
+SBT_MAVEN_PROFILE_ARGS_ENV = "SBT_MAVEN_PROFILES_ARGS"
+AMPLAB_JENKINS_BUILD_TOOL = os.environ.get("AMPLAB_JENKINS_BUILD_TOOL")
+AMPLAB_JENKINS = os.environ.get("AMPLAB_JENKINS")
+
+SBT_OUTPUT_FILTER = re.compile("^.*[info].*Resolving" + "|" + 
--- End diff --

This is only called in one place and it's not a performance-critical regex, 
so I think it would be cleaner to just move this into the function where it's 
used.





[GitHub] spark pull request: [SPARK-8010][SQL]Promote numeric types to stri...

2015-06-03 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/6551#discussion_r31695131
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala
 ---
@@ -69,6 +69,11 @@ class HiveQuerySuite extends HiveComparisonTest with 
BeforeAndAfter {
 sql("DROP TEMPORARY FUNCTION udtf_count2")
   }
 
+  test("SPARK-8010: implicit promote to string type") {
+sql("select case when true then '1' else 1 end from src ")
+sql("select coalesce(null, 1, '1') from src ")
--- End diff --

And probably it's better to move the test into 
https://github.com/OopsOutOfMemory/spark/blob/pnts/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala





[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-06-03 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/5694#discussion_r31695085
  
--- Diff: dev/run-tests.py ---
@@ -0,0 +1,417 @@
+#!/usr/bin/env python2
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import os
+import re
+import sys
+import shutil
+import subprocess
+from collections import namedtuple
+
+SPARK_PROJ_ROOT = os.path.join(os.path.dirname(os.path.realpath(__file__)), "..")
+USER_HOME_DIR = os.environ.get("HOME")
+
+SBT_MAVEN_PROFILE_ARGS_ENV = "SBT_MAVEN_PROFILES_ARGS"
+AMPLAB_JENKINS_BUILD_TOOL = os.environ.get("AMPLAB_JENKINS_BUILD_TOOL")
+AMPLAB_JENKINS = os.environ.get("AMPLAB_JENKINS")
+
+SBT_OUTPUT_FILTER = re.compile("^.*[info].*Resolving" + "|" + 
+   "^.*[warn].*Merging" + "|" +
+   "^.*[info].*Including")
+
+
+def get_error_codes(err_code_file):
+"""Function to retrieve all block numbers from the `run-tests-codes.sh`
+file to maintain backwards compatibility with the `run-tests-jenkins` 
+script"""
+
+with open(err_code_file, 'r') as f:
+err_codes = [e.split()[1].strip().split('=') 
+ for e in f if e.startswith("readonly")]
+return dict(err_codes)
+
+
+def exit_from_command_with_retcode(cmd, retcode):
+print "[error] running", cmd, "; received return code", retcode
+sys.exit(int(os.environ.get("CURRENT_BLOCK", 255)))
+
+
+def rm_r(path):
+"""Given an arbitrary path properly remove it with the correct python
+construct if it exists
+- from: http://stackoverflow.com/a/9559881"""
+
+if os.path.isdir(path):
+shutil.rmtree(path)
+elif os.path.exists(path):
+os.remove(path)
+
+
+def lineno():
--- End diff --

This doesn't seem to be used anywhere?
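Beyond being unused, the helper as quoted would also fail at runtime: the file's imports (os, re, sys, shutil, subprocess, namedtuple) never include `inspect`. A self-contained version of the same idiom would be:

```python
import inspect

def lineno():
    """Return the line number of the call site.

    Same idiom as the quoted PR code, but with the `inspect`
    import that the quoted file is missing.
    """
    return inspect.currentframe().f_back.f_lineno

n = lineno()  # the line number of this statement within this file
print(isinstance(n, int), n > 0)
```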





[GitHub] spark pull request: [SPARK-8010][SQL]Promote numeric types to stri...

2015-06-03 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/6551#discussion_r31695034
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala
 ---
@@ -69,6 +69,11 @@ class HiveQuerySuite extends HiveComparisonTest with 
BeforeAndAfter {
 sql("DROP TEMPORARY FUNCTION udtf_count2")
   }
 
+  test("SPARK-8010: implicit promote to string type") {
+sql("select case when true then '1' else 1 end from src ")
+sql("select coalesce(null, 1, '1') from src ")
--- End diff --

Can you compare the result with Hive?





[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-06-03 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/5694#discussion_r31695009
  
--- Diff: dev/run-tests.py ---
@@ -0,0 +1,417 @@
+#!/usr/bin/env python2
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import os
+import re
+import sys
+import shutil
+import subprocess
+from collections import namedtuple
+
+SPARK_PROJ_ROOT = os.path.join(os.path.dirname(os.path.realpath(__file__)), "..")
+USER_HOME_DIR = os.environ.get("HOME")
+
+SBT_MAVEN_PROFILE_ARGS_ENV = "SBT_MAVEN_PROFILES_ARGS"
+AMPLAB_JENKINS_BUILD_TOOL = os.environ.get("AMPLAB_JENKINS_BUILD_TOOL")
+AMPLAB_JENKINS = os.environ.get("AMPLAB_JENKINS")
+
+SBT_OUTPUT_FILTER = re.compile("^.*[info].*Resolving" + "|" + 
+   "^.*[warn].*Merging" + "|" +
+   "^.*[info].*Including")
+
+
+def get_error_codes(err_code_file):
+"""Function to retrieve all block numbers from the `run-tests-codes.sh`
+file to maintain backwards compatibility with the `run-tests-jenkins` 
+script"""
+
+with open(err_code_file, 'r') as f:
+err_codes = [e.split()[1].strip().split('=') 
+ for e in f if e.startswith("readonly")]
+return dict(err_codes)
+
+
+def exit_from_command_with_retcode(cmd, retcode):
+print "[error] running", cmd, "; received return code", retcode
+sys.exit(int(os.environ.get("CURRENT_BLOCK", 255)))
+
+
+def rm_r(path):
+"""Given an arbitrary path properly remove it with the correct python
+construct if it exists
+- from: http://stackoverflow.com/a/9559881"""
+
+if os.path.isdir(path):
+shutil.rmtree(path)
+elif os.path.exists(path):
+os.remove(path)
+
+
+def lineno():
+"""Returns the current line number in our program
+- from: http://stackoverflow.com/a/3056059"""
+
+return inspect.currentframe().f_back.f_lineno
+
+
+def run_cmd(cmd):
+"""Given a command as a list of arguments will attempt to execute the
+command and, on failure, print an error message"""
+
+if not isinstance(cmd, list):
+cmd = cmd.split()
+try:
+subprocess.check_call(cmd)
+except subprocess.CalledProcessError as e:
+exit_from_command_with_retcode(e.cmd, e.returncode)
+
+
+def set_sbt_maven_profile_args():
+"""Properly sets the SBT environment variable arguments with additional
+checks to determine if this is running on an Amplab Jenkins machine"""
+
+# base environment values for SBT_MAVEN_PROFILE_ARGS_ENV which will be appended on
+sbt_maven_profile_args_base = ["-Pkinesis-asl"]
+
+sbt_maven_profile_arg_dict = {
+"hadoop1.0" : ["-Phadoop-1", "-Dhadoop.version=1.0.4"],
+"hadoop2.0" : ["-Phadoop-1", "-Dhadoop.version=2.0.0-mr1-cdh4.1.1"],
+"hadoop2.2" : ["-Pyarn", "-Phadoop-2.2"],
+"hadoop2.3" : ["-Pyarn", "-Phadoop-2.3", "-Dhadoop.version=2.3.0"],
+}
+
+# set the SBT maven build profile argument environment variable and ensure
+# we build against the right version of Hadoop
+if os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE"):
+os.environ[SBT_MAVEN_PROFILE_ARGS_ENV] = \
+" ".join(sbt_maven_profile_arg_dict.get(ajbp, []) 
+ + sbt_maven_profile_args_base)
+else:
+os.environ[SBT_MAVEN_PROFILE_ARGS_ENV] = \
+" ".join(sbt_maven_profile_arg_dict.get("hadoop2.3", [])
+ + sbt_maven_profile_args_base)
+
+
+def is_exe(path):
+"""Check if a given path is an executable file
+- from: http://stackoverflow.com/a/377028"""
+
+return os.path.isfile(path) and os.access(path, os.X_OK)
+
+
+def which(program):
+"""Find and return the given program by its absolute path o

[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-06-03 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/5694#discussion_r31694976
  
--- Diff: dev/run-tests.py ---
@@ -0,0 +1,417 @@
+#!/usr/bin/env python2
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import os
+import re
+import sys
+import shutil
+import subprocess
+from collections import namedtuple
+
+SPARK_PROJ_ROOT = os.path.join(os.path.dirname(os.path.realpath(__file__)), "..")
+USER_HOME_DIR = os.environ.get("HOME")
+
+SBT_MAVEN_PROFILE_ARGS_ENV = "SBT_MAVEN_PROFILES_ARGS"
+AMPLAB_JENKINS_BUILD_TOOL = os.environ.get("AMPLAB_JENKINS_BUILD_TOOL")
+AMPLAB_JENKINS = os.environ.get("AMPLAB_JENKINS")
+
+SBT_OUTPUT_FILTER = re.compile("^.*[info].*Resolving" + "|" + 
+   "^.*[warn].*Merging" + "|" +
+   "^.*[info].*Including")
+
+
+def get_error_codes(err_code_file):
+"""Function to retrieve all block numbers from the `run-tests-codes.sh`
+file to maintain backwards compatibility with the `run-tests-jenkins` 
+script"""
+
+with open(err_code_file, 'r') as f:
+err_codes = [e.split()[1].strip().split('=') 
+ for e in f if e.startswith("readonly")]
+return dict(err_codes)
+
+
+def exit_from_command_with_retcode(cmd, retcode):
+print "[error] running", cmd, "; received return code", retcode
+sys.exit(int(os.environ.get("CURRENT_BLOCK", 255)))
+
+
+def rm_r(path):
+"""Given an arbitrary path properly remove it with the correct python
+construct if it exists
+- from: http://stackoverflow.com/a/9559881"""
+
+if os.path.isdir(path):
+shutil.rmtree(path)
+elif os.path.exists(path):
+os.remove(path)
+
+
+def lineno():
+"""Returns the current line number in our program
+- from: http://stackoverflow.com/a/3056059"""
+
+return inspect.currentframe().f_back.f_lineno
+
+
+def run_cmd(cmd):
+"""Given a command as a list of arguments will attempt to execute the
+command and, on failure, print an error message"""
+
+if not isinstance(cmd, list):
+cmd = cmd.split()
+try:
+subprocess.check_call(cmd)
+except subprocess.CalledProcessError as e:
+exit_from_command_with_retcode(e.cmd, e.returncode)
+
+
+def set_sbt_maven_profile_args():
+"""Properly sets the SBT environment variable arguments with additional
+checks to determine if this is running on an Amplab Jenkins machine"""
+
+# base environment values for SBT_MAVEN_PROFILE_ARGS_ENV which will be appended on
+sbt_maven_profile_args_base = ["-Pkinesis-asl"]
+
+sbt_maven_profile_arg_dict = {
+"hadoop1.0" : ["-Phadoop-1", "-Dhadoop.version=1.0.4"],
+"hadoop2.0" : ["-Phadoop-1", "-Dhadoop.version=2.0.0-mr1-cdh4.1.1"],
+"hadoop2.2" : ["-Pyarn", "-Phadoop-2.2"],
+"hadoop2.3" : ["-Pyarn", "-Phadoop-2.3", "-Dhadoop.version=2.3.0"],
+}
+
+# set the SBT maven build profile argument environment variable and ensure
+# we build against the right version of Hadoop
+if os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE"):
+os.environ[SBT_MAVEN_PROFILE_ARGS_ENV] = \
+" ".join(sbt_maven_profile_arg_dict.get(ajbp, []) 
+ + sbt_maven_profile_args_base)
+else:
+os.environ[SBT_MAVEN_PROFILE_ARGS_ENV] = \
+" ".join(sbt_maven_profile_arg_dict.get("hadoop2.3", [])
+ + sbt_maven_profile_args_base)
+
+
+def is_exe(path):
+"""Check if a given path is an executable file
+- from: http://stackoverflow.com/a/377028"""
+
+return os.path.isfile(path) and os.access(path, os.X_OK)
+
+
+def which(program):
+"""Find and return the given program by its absolute path o

[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-06-03 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/5694#discussion_r31694893
  
--- Diff: dev/run-tests.py ---
@@ -0,0 +1,417 @@
+#!/usr/bin/env python2
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import os
+import re
+import sys
+import shutil
+import subprocess
+from collections import namedtuple
+
+SPARK_PROJ_ROOT = os.path.join(os.path.dirname(os.path.realpath(__file__)), "..")
+USER_HOME_DIR = os.environ.get("HOME")
+
+SBT_MAVEN_PROFILE_ARGS_ENV = "SBT_MAVEN_PROFILES_ARGS"
+AMPLAB_JENKINS_BUILD_TOOL = os.environ.get("AMPLAB_JENKINS_BUILD_TOOL")
+AMPLAB_JENKINS = os.environ.get("AMPLAB_JENKINS")
+
+SBT_OUTPUT_FILTER = re.compile("^.*[info].*Resolving" + "|" + 
+   "^.*[warn].*Merging" + "|" +
+   "^.*[info].*Including")
+
+
+def get_error_codes(err_code_file):
+"""Function to retrieve all block numbers from the `run-tests-codes.sh`
+file to maintain backwards compatibility with the `run-tests-jenkins` 
+script"""
+
+with open(err_code_file, 'r') as f:
+err_codes = [e.split()[1].strip().split('=') 
+ for e in f if e.startswith("readonly")]
+return dict(err_codes)
+
+
+def exit_from_command_with_retcode(cmd, retcode):
+print "[error] running", cmd, "; received return code", retcode
+sys.exit(int(os.environ.get("CURRENT_BLOCK", 255)))
+
+
+def rm_r(path):
+"""Given an arbitrary path properly remove it with the correct python
+construct if it exists
+- from: http://stackoverflow.com/a/9559881"""
+
+if os.path.isdir(path):
+shutil.rmtree(path)
+elif os.path.exists(path):
+os.remove(path)
+
+
+def lineno():
+"""Returns the current line number in our program
+- from: http://stackoverflow.com/a/3056059"""
+
+return inspect.currentframe().f_back.f_lineno
+
+
+def run_cmd(cmd):
+"""Given a command as a list of arguments will attempt to execute the
+command and, on failure, print an error message"""
+
+if not isinstance(cmd, list):
+cmd = cmd.split()
+try:
+subprocess.check_call(cmd)
+except subprocess.CalledProcessError as e:
+exit_from_command_with_retcode(e.cmd, e.returncode)
+
+
+def set_sbt_maven_profile_args():
+"""Properly sets the SBT environment variable arguments with additional
+checks to determine if this is running on an Amplab Jenkins machine"""
+
+# base environment values for SBT_MAVEN_PROFILE_ARGS_ENV which will be appended on
+sbt_maven_profile_args_base = ["-Pkinesis-asl"]
+
+sbt_maven_profile_arg_dict = {
+"hadoop1.0" : ["-Phadoop-1", "-Dhadoop.version=1.0.4"],
+"hadoop2.0" : ["-Phadoop-1", "-Dhadoop.version=2.0.0-mr1-cdh4.1.1"],
+"hadoop2.2" : ["-Pyarn", "-Phadoop-2.2"],
+"hadoop2.3" : ["-Pyarn", "-Phadoop-2.3", "-Dhadoop.version=2.3.0"],
+}
+
+# set the SBT maven build profile argument environment variable and ensure
+# we build against the right version of Hadoop
+if os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE"):
+os.environ[SBT_MAVEN_PROFILE_ARGS_ENV] = \
+" ".join(sbt_maven_profile_arg_dict.get(ajbp, []) 
+ + sbt_maven_profile_args_base)
+else:
+os.environ[SBT_MAVEN_PROFILE_ARGS_ENV] = \
+" ".join(sbt_maven_profile_arg_dict.get("hadoop2.3", [])
+ + sbt_maven_profile_args_base)
+
+
+def is_exe(path):
+"""Check if a given path is an executable file
+- from: http://stackoverflow.com/a/377028"""
+
+return os.path.isfile(path) and os.access(path, os.X_OK)
+
+
+def which(program):
+"""Find and return the given program by its absolute path o

[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-06-03 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/5694#discussion_r31694857
  
--- Diff: dev/run-tests.py ---
@@ -0,0 +1,417 @@
+#!/usr/bin/env python2
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import os
+import re
+import sys
+import shutil
+import inspect
+import subprocess
+from collections import namedtuple
+
+SPARK_PROJ_ROOT = os.path.join(os.path.dirname(os.path.realpath(__file__)), "..")
+USER_HOME_DIR = os.environ.get("HOME")
+
+SBT_MAVEN_PROFILE_ARGS_ENV = "SBT_MAVEN_PROFILES_ARGS"
+AMPLAB_JENKINS_BUILD_TOOL = os.environ.get("AMPLAB_JENKINS_BUILD_TOOL")
+AMPLAB_JENKINS = os.environ.get("AMPLAB_JENKINS")
+
+SBT_OUTPUT_FILTER = re.compile("^.*[info].*Resolving" + "|" +
+                               "^.*[warn].*Merging" + "|" +
+                               "^.*[info].*Including")
+
+
+def get_error_codes(err_code_file):
+    """Function to retrieve all block numbers from the `run-tests-codes.sh`
+    file to maintain backwards compatibility with the `run-tests-jenkins`
+    script"""
+
+    with open(err_code_file, 'r') as f:
+        err_codes = [e.split()[1].strip().split('=')
+                     for e in f if e.startswith("readonly")]
+    return dict(err_codes)
+
+
+def exit_from_command_with_retcode(cmd, retcode):
+    print "[error] running", cmd, "; received return code", retcode
+    sys.exit(int(os.environ.get("CURRENT_BLOCK", 255)))
+
+
+def rm_r(path):
+    """Given an arbitrary path properly remove it with the correct python
+    construct if it exists
+    - from: http://stackoverflow.com/a/9559881"""
+
+    if os.path.isdir(path):
+        shutil.rmtree(path)
+    elif os.path.exists(path):
+        os.remove(path)
+
+
+def lineno():
+    """Returns the current line number in our program
+    - from: http://stackoverflow.com/a/3056059"""
+
+    return inspect.currentframe().f_back.f_lineno
+
+
+def run_cmd(cmd):
+    """Given a command as a list of arguments will attempt to execute the
+    command and, on failure, print an error message"""
+
+    if not isinstance(cmd, list):
+        cmd = cmd.split()
+    try:
+        subprocess.check_call(cmd)
+    except subprocess.CalledProcessError as e:
+        exit_from_command_with_retcode(e.cmd, e.returncode)
+
+
+def set_sbt_maven_profile_args():
+    """Properly sets the SBT environment variable arguments with additional
+    checks to determine if this is running on an Amplab Jenkins machine"""
+
+    # base environment values for SBT_MAVEN_PROFILE_ARGS_ENV, which will be
+    # appended on
+    sbt_maven_profile_args_base = ["-Pkinesis-asl"]
+
+    sbt_maven_profile_arg_dict = {
+        "hadoop1.0": ["-Phadoop-1", "-Dhadoop.version=1.0.4"],
+        "hadoop2.0": ["-Phadoop-1", "-Dhadoop.version=2.0.0-mr1-cdh4.1.1"],
+        "hadoop2.2": ["-Pyarn", "-Phadoop-2.2"],
+        "hadoop2.3": ["-Pyarn", "-Phadoop-2.3", "-Dhadoop.version=2.3.0"],
+    }
+
+    # set the SBT maven build profile argument environment variable and ensure
+    # we build against the right version of Hadoop
+    ajbp = os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE")
+    if ajbp:
+        os.environ[SBT_MAVEN_PROFILE_ARGS_ENV] = \
+            " ".join(sbt_maven_profile_arg_dict.get(ajbp, [])
+                     + sbt_maven_profile_args_base)
+    else:
+        os.environ[SBT_MAVEN_PROFILE_ARGS_ENV] = \
+            " ".join(sbt_maven_profile_arg_dict.get("hadoop2.3", [])
+                     + sbt_maven_profile_args_base)
+
+
+def is_exe(path):
+    """Check if a given path is an executable file
+    - from: http://stackoverflow.com/a/377028"""
+
+    return os.path.isfile(path) and os.access(path, os.X_OK)
+
+
+def which(program):
+    """Find and return the given program by its absolute path o

[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-06-03 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/5694#discussion_r31694783
  
--- Diff: dev/run-tests.py ---
@@ -0,0 +1,417 @@

[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-06-03 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/5694#discussion_r31694780
  
--- Diff: dev/run-tests.py ---
@@ -0,0 +1,417 @@

[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-06-03 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/5694#discussion_r31694637
  
--- Diff: dev/run-tests.py ---
@@ -0,0 +1,417 @@

[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-06-03 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/5694#discussion_r31694598
  
--- Diff: dev/run-tests.py ---
@@ -0,0 +1,417 @@
+#!/usr/bin/env python2
+
+import os
+import re
+import sys
+import shutil
+import subprocess
+from collections import namedtuple
+
+SPARK_PROJ_ROOT = os.path.join(os.path.dirname(os.path.realpath(__file__)), "..")
--- End diff --

For consistency it might be nice to call this `SPARK_HOME` (our standard
name for the Spark dir).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
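The `SPARK_PROJ_ROOT` computation under discussion takes the directory containing `dev/run-tests.py` and steps one level up. A small illustration (Python 3; the path is hypothetical, and `normpath` is added here only for a readable result, while the PR keeps the raw `..` path):

```python
import os.path

def project_root(script_path):
    # dev/run-tests.py lives in dev/, so the repo root is one level up
    # from the script's directory.
    return os.path.normpath(os.path.join(os.path.dirname(script_path), ".."))

print(project_root("/opt/spark/dev/run-tests.py"))  # -> /opt/spark
```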



[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-06-03 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/5694#discussion_r31694485
  
--- Diff: dev/run-tests.py ---
@@ -0,0 +1,417 @@

[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-06-03 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/5694#discussion_r31694123
  
--- Diff: dev/run-tests.py ---
@@ -0,0 +1,417 @@

[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-06-03 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/5694#discussion_r31694092
  
--- Diff: dev/run-tests.py ---
@@ -0,0 +1,417 @@

[GitHub] spark pull request: [SPARK-7699][Core] Lazy start the scheduler fo...

2015-06-03 Thread jerryshao
Github user jerryshao commented on the pull request:

https://github.com/apache/spark/pull/6430#issuecomment-108731637
  
I just tried it in my local pseudo-cluster for functionality; I haven't yet
tried it in a real cluster for performance, but I will test that.





[GitHub] spark pull request: [SPARK-7699][Core] Lazy start the scheduler fo...

2015-06-03 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/6430#issuecomment-108730438
  
lgtm. @jerryshao have you had a chance to test this on a real cluster?





[GitHub] spark pull request: [SPARK-7558] Demarcate tests in unit-tests.log...

2015-06-03 Thread andrewor14
Github user andrewor14 closed the pull request at:

https://github.com/apache/spark/pull/6598





[GitHub] spark pull request: [SPARK-7699][Core] Lazy start the scheduler fo...

2015-06-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6430#issuecomment-108728952
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-7699][Core] Lazy start the scheduler fo...

2015-06-03 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6430#issuecomment-108728901
  
  [Test build #34151 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34151/consoleFull) for PR 6430 at commit [`02cac8e`](https://github.com/apache/spark/commit/02cac8e8ff5c0d91a2cc905f9412942d74c751b6).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [Testing only] [Do not merge] [1.4]

2015-06-03 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6626#issuecomment-108725031
  
  [Test build #34149 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34149/consoleFull) for PR 6626 at commit [`4fe0b14`](https://github.com/apache/spark/commit/4fe0b14d506740efffc6b2a11b552ec0d26ae6f2).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [Testing only] [Do not merge] [1.4]

2015-06-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6626#issuecomment-108725087
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-7269] [SQL] [WIP] Refactor the class At...

2015-06-03 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6587#issuecomment-108721462
  
  [Test build #34155 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34155/consoleFull) for PR 6587 at commit [`f1ddbf1`](https://github.com/apache/spark/commit/f1ddbf1e4bff3710596ddd7ba45ef2e42695628a).





[GitHub] spark pull request: [SPARK-7180][SPARK-8090][SPARK-8091] Fix a num...

2015-06-03 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6625#issuecomment-108721394
  
  [Test build #34154 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34154/consoleFull) for PR 6625 at commit [`ae212c8`](https://github.com/apache/spark/commit/ae212c89ce9feba731c3dd60b3d4332addee2a0e).





[GitHub] spark pull request: [SPARK-4118] [MLlib] [PySpark] Python bindings...

2015-06-03 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6499#issuecomment-108721147
  
  [Test build #34156 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34156/consoleFull) for PR 6499 at commit [`e12c590`](https://github.com/apache/spark/commit/e12c590801d5eed7666b33d7ac2b8543e2d340e2).





[GitHub] spark pull request: [SPARK-7180][SPARK-8090][SPARK-8091] Fix a num...

2015-06-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6625#issuecomment-108720948
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-7180][SPARK-8090][SPARK-8091] Fix a num...

2015-06-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6625#issuecomment-108720966
  
Merged build started.





[GitHub] spark pull request: [SPARK-4118] [MLlib] [PySpark] Python bindings...

2015-06-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6499#issuecomment-108720982
  
Merged build started.





[GitHub] spark pull request: [SPARK-4118] [MLlib] [PySpark] Python bindings...

2015-06-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6499#issuecomment-108720956
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-7269] [SQL] [WIP] Refactor the class At...

2015-06-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6587#issuecomment-108720952
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-7269] [SQL] [WIP] Refactor the class At...

2015-06-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6587#issuecomment-108720983
  
Merged build started.





[GitHub] spark pull request: [SPARK-7180][SPARK-8090][SPARK-8091] Fix a num...

2015-06-03 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/6625#discussion_r31693024
  
--- Diff: core/src/main/scala/org/apache/spark/serializer/SerializationDebugger.scala ---
@@ -125,6 +125,13 @@ private[spark] object SerializationDebugger extends Logging {
   return List.empty
 }
 
+/**
+ * Visit an externalizable object.
+ * Since writeExternal() can choose to add arbitrary objects at the time of serialization,
+ * the only way to capture all the objects it will serialize is by using a
+ * dummy ObjectOutput object that captures all the inner objects, and then visit all the
--- End diff --

Okay, I can shorten it for this, as well as for the new visit function.
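The doc comment under review describes a capture-then-visit pattern: because a custom serialization hook can write arbitrary objects, the only way to trace them all is to hand the hook a dummy output sink that records everything written, then visit the recorded objects. Spark's actual implementation is Scala code against `java.io.ObjectOutput`; the sketch below is a language-neutral Python analogue with hypothetical names:

```python
class RecordingSink:
    """Dummy output that records every object written to it instead of serializing."""

    def __init__(self):
        self.captured = []

    def write_object(self, obj):
        # Remember the object so it can be visited later.
        self.captured.append(obj)


class Externalizable:
    """An object that decides at write time which inner objects to serialize."""

    def __init__(self, parts):
        self.parts = parts

    def write_external(self, out):
        # May write arbitrary inner objects, unknown until this is called.
        for part in self.parts:
            out.write_object(part)


def visit(obj):
    """Visit obj; for externalizable objects, capture then recurse."""
    if isinstance(obj, Externalizable):
        sink = RecordingSink()
        obj.write_external(sink)      # capture everything it would serialize
        return [visit(inner) for inner in sink.captured]
    return obj                        # leaf: nothing more to trace
```

The key point of the pattern is that the sink never performs real serialization; it only exposes the object graph that serialization would have walked.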





[GitHub] spark pull request: [SPARK-6583][SQL] Support aggregated function ...

2015-06-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/5290#discussion_r31692885
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -364,11 +364,24 @@ class Analyzer(
 
 val (resolvedOrdering, missing) = resolveAndFindMissing(ordering, a, groupingRelation)
 
-if (missing.nonEmpty) {
+val addForAlias = new ArrayBuffer[NamedExpression]()
+val aliasedOrdering = resolvedOrdering.zipWithIndex.map {
+  case (o, i) => {
+o transform {
+  case aggOrSub @ (_: AggregateExpression | _: Substring) =>
--- End diff --

It's another bug, and it doesn't only affect `Substring`... I tried `SELECT b + 1, count(*) FROM orderByData GROUP BY b + 1 ORDER BY b + 1` and it also failed. The key problem is that we build the temp grouping relation with `grouping.collect { case ne: NamedExpression => ne.toAttribute }`, which filters out all un-named expressions like `Add`, `Substring`, etc. I will work on it.
cc @marmbrus @rxin 
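The failure mode described here can be shown with a tiny model (hypothetical classes, not Catalyst's real API): collecting only named expressions from the GROUP BY list drops compound expressions such as `b + 1`, so the temporary grouping relation is empty and an `ORDER BY b + 1` has nothing to resolve against.

```python
class Expr:
    """Base class for expressions in this toy model."""


class Attribute(Expr):
    """A named expression, e.g. a plain column reference."""

    def __init__(self, name):
        self.name = name

    def to_attribute(self):
        return self


class Add(Expr):
    """An un-named compound expression, e.g. b + 1."""

    def __init__(self, left, right):
        self.left, self.right = left, right


# GROUP BY b + 1
grouping = [Add(Attribute("b"), 1)]

# Mirrors grouping.collect { case ne: NamedExpression => ne.toAttribute }:
# only named expressions survive, so the compound grouping key is lost and
# the temp grouping relation ends up empty.
grouping_relation = [e.to_attribute() for e in grouping
                     if isinstance(e, Attribute)]
```

With `GROUP BY b` instead, `Attribute("b")` is named and would survive the collect, which is why the bug only shows up for expression-valued grouping keys.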






[GitHub] spark pull request: [SPARK-7956] [SQL] Use Janino to compile SQL e...

2015-06-03 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6479#issuecomment-108717176
  
  [Test build #34153 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34153/consoleFull) for PR 6479 at commit [`262d848`](https://github.com/apache/spark/commit/262d84839a0876c07ffe57031fb505e664abfe66).





[GitHub] spark pull request: [SPARK-7269] [SQL] [WIP] Refactor the class At...

2015-06-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6587#issuecomment-108717174
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-7269] [SQL] [WIP] Refactor the class At...

2015-06-03 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6587#issuecomment-108717155
  
  [Test build #34150 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34150/consoleFull) for PR 6587 at commit [`35f1892`](https://github.com/apache/spark/commit/35f1892bd1e60eb7b45f1737b08664d71e4c1f38).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class ElementwiseProduct(val scalingVec: Vector) extends VectorTransformer `
  * `trait TypeCheckResult `
  * `  case class TypeCheckFailure(message: String) extends TypeCheckResult `
  * `abstract class UnaryArithmetic extends UnaryExpression `
  * `case class UnaryMinus(child: Expression) extends UnaryArithmetic `
  * `case class Sqrt(child: Expression) extends UnaryArithmetic `
  * `case class Abs(child: Expression) extends UnaryArithmetic `
  * `case class BitwiseNot(child: Expression) extends UnaryArithmetic `
  * `case class MaxOf(left: Expression, right: Expression) extends BinaryArithmetic `
  * `case class MinOf(left: Expression, right: Expression) extends BinaryArithmetic `
  * `case class Atan2(left: Expression, right: Expression)`
  * `case class Hypot(left: Expression, right: Expression)`
  * `case class EqualTo(left: Expression, right: Expression) extends BinaryComparison `






[GitHub] spark pull request: [SPARK-7956] [SQL] Use Janino to compile SQL e...

2015-06-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6479#issuecomment-108716127
  
Merged build started.





[GitHub] spark pull request: [SPARK-7956] [SQL] Use Janino to compile SQL e...

2015-06-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6479#issuecomment-108716085
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-7956] [SQL] Use Janino to compile SQL e...

2015-06-03 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/6479#discussion_r31692338
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -313,295 +318,309 @@ abstract class CodeGenerator[InType <: AnyRef, OutType <: AnyRef] extends Loggin
 }
 val $nullTerm = false
 val $primitiveTerm = $funcName
-""".children
+"""
   */
 
   case GreaterThan(e1 @ NumericType(), e2 @ NumericType()) =>
-(e1, e2).evaluateAs (BooleanType) { case (eval1, eval2) => q"$eval1 > $eval2" }
+(e1, e2).evaluateAs (BooleanType) { case (eval1, eval2) => s"$eval1 > $eval2" }
   case GreaterThanOrEqual(e1 @ NumericType(), e2 @ NumericType()) =>
-(e1, e2).evaluateAs (BooleanType) { case (eval1, eval2) => q"$eval1 >= $eval2" }
+(e1, e2).evaluateAs (BooleanType) { case (eval1, eval2) => s"$eval1 >= $eval2" }
   case LessThan(e1 @ NumericType(), e2 @ NumericType()) =>
-(e1, e2).evaluateAs (BooleanType) { case (eval1, eval2) => q"$eval1 < $eval2" }
+(e1, e2).evaluateAs (BooleanType) { case (eval1, eval2) => s"$eval1 < $eval2" }
   case LessThanOrEqual(e1 @ NumericType(), e2 @ NumericType()) =>
-(e1, e2).evaluateAs (BooleanType) { case (eval1, eval2) => q"$eval1 <= $eval2" }
+(e1, e2).evaluateAs (BooleanType) { case (eval1, eval2) => s"$eval1 <= $eval2" }
 
   case And(e1, e2) =>
-val eval1 = expressionEvaluator(e1)
-val eval2 = expressionEvaluator(e2)
-
-q"""
-  ..${eval1.code}
-  var $nullTerm = false
-  var $primitiveTerm: ${termForType(BooleanType)} = false
-
-  if (!${eval1.nullTerm} && ${eval1.primitiveTerm} == false) {
+val eval1 = expressionEvaluator(e1, ctx)
+val eval2 = expressionEvaluator(e2, ctx)
+// TODO(davies): This is different than And.eval()
+s"""
+  ${eval1.code}
+  boolean $nullTerm = false;
+  boolean $primitiveTerm  = false;
+
+  if (!${eval1.nullTerm} && !${eval1.primitiveTerm}) {
   } else {
-..${eval2.code}
-if (!${eval2.nullTerm} && ${eval2.primitiveTerm} == false) {
+${eval2.code}
+if (!${eval2.nullTerm} && !${eval2.primitiveTerm}) {
 } else if (!${eval1.nullTerm} && !${eval2.nullTerm}) {
-  $primitiveTerm = true
+  $primitiveTerm = true;
 } else {
-  $nullTerm = true
+  $nullTerm = true;
 }
   }
- """.children
+ """
 
   case Or(e1, e2) =>
-val eval1 = expressionEvaluator(e1)
-val eval2 = expressionEvaluator(e2)
+val eval1 = expressionEvaluator(e1, ctx)
+val eval2 = expressionEvaluator(e2, ctx)
 
-q"""
-  ..${eval1.code}
-  var $nullTerm = false
-  var $primitiveTerm: ${termForType(BooleanType)} = false
+s"""
+  ${eval1.code}
+  boolean $nullTerm = false;
+  boolean $primitiveTerm = false;
 
   if (!${eval1.nullTerm} && ${eval1.primitiveTerm}) {
-$primitiveTerm = true
+$primitiveTerm = true;
   } else {
-..${eval2.code}
+${eval2.code}
 if (!${eval2.nullTerm} && ${eval2.primitiveTerm}) {
-  $primitiveTerm = true
+  $primitiveTerm = true;
 } else if (!${eval1.nullTerm} && !${eval2.nullTerm}) {
-  $primitiveTerm = false
+  $primitiveTerm = false;
 } else {
-  $nullTerm = true
+  $nullTerm = true;
 }
   }
- """.children
+ """
 
   case Not(child) =>
 // Uh, bad function name...
-child.castOrNull(c => q"!$c", BooleanType)
-
-  case Add(e1, e2) => (e1, e2) evaluate { case (eval1, eval2) => q"$eval1 + $eval2" }
-  case Subtract(e1, e2) => (e1, e2) evaluate { case (eval1, eval2) => q"$eval1 - $eval2" }
-  case Multiply(e1, e2) => (e1, e2) evaluate { case (eval1, eval2) => q"$eval1 * $eval2" }
+child.castOrNull(c => s"!$c", BooleanType)
+
+  case Add(e1 @ DecimalType(), e2 @ DecimalType()) =>
--- End diff --

DecimalType.unapply() will catch all the DecimalType objects.
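The `And`/`Or` branches in the diff above carry the SQL three-valued boolean logic that the generated Java must preserve (and that the `TODO(davies)` comment flags as subtly different from `And.eval()`). A minimal sketch of those semantics, using Python's `None` to model a NULL boolean, is:

```python
def sql_and(a, b):
    """Three-valued AND over True/False/None (None models SQL NULL)."""
    if a is False or b is False:
        return False      # a definite false wins, even against NULL
    if a is None or b is None:
        return None       # otherwise any NULL makes the result NULL
    return True


def sql_or(a, b):
    """Three-valued OR, the dual of sql_and."""
    if a is True or b is True:
        return True       # a definite true wins, even against NULL
    if a is None or b is None:
        return None
    return False
```

This is why the generated code checks `!nullTerm && !primitiveTerm` before consulting the other operand: a known-false left side decides AND without evaluating nullability of the right side, and symmetrically for OR.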



[GitHub] spark pull request: [SPARK-7956] [SQL] Use Janino to compile SQL e...

2015-06-03 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/6479#discussion_r31692314
  
--- Diff: core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala ---
@@ -37,7 +37,6 @@ import com.google.common.hash.Hashing
 * It uses quadratic probing with a power-of-2 hash table size, which is guaranteed
 * to explore all spaces for each key (see http://en.wikipedia.org/wiki/Quadratic_probing).
  */
-private[spark]
--- End diff --

done





[GitHub] spark pull request: [SPARK-7956] [SQL] Use Janino to compile SQL e...

2015-06-03 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/6479#discussion_r31692322
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -163,133 +190,111 @@ abstract class CodeGenerator[InType <: AnyRef, OutType <: AnyRef] extends Loggin
*
* @param f a function from two primitive term names to a tree that evaluates them.
*/
-  def evaluate(f: (TermName, TermName) => Tree): Seq[Tree] =
+  def evaluate(f: (String, String) => String): String =
--- End diff --

Could be done later.





[GitHub] spark pull request: [SPARK-7956] [SQL] Use Janino to compile SQL e...

2015-06-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6479#issuecomment-108714776
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-7956] [SQL] Use Janino to compile SQL e...

2015-06-03 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6479#issuecomment-108714769
  
  [Test build #34152 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34152/consoleFull) for PR 6479 at commit [`eec3a33`](https://github.com/apache/spark/commit/eec3a33434342bfb342fc541f97d879fbd80d974).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `public final class UnsafeRow extends BaseMutableRow `
  * `public abstract class BaseMutableRow extends BaseRow implements MutableRow `
  * `public abstract class BaseRow implements Row `
  * `  protected class CodeGenContext `
  * `abstract class BaseMutableProjection extends MutableProjection `
  * `  class SpecificProjection extends $`
  * `class BaseOrdering extends Ordering[Row] `
  * `  class SpecificOrdering extends $`
  * `abstract class Predicate `
  * `  class SpecificPredicate extends $`
  * `abstract class BaseProject extends Projection `
  * `class SpecificProjection extends $`
  * `final class SpecificRow extends $`






[GitHub] spark pull request: [SPARK-7956] [SQL] Use Janino to compile SQL e...

2015-06-03 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/6479#discussion_r31692302
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateProjection.scala ---
@@ -38,201 +42,191 @@ object GenerateProjection extends CodeGenerator[Seq[Expression], Projection] {
 
   // Make Mutablility optional...
   protected def create(expressions: Seq[Expression]): Projection = {
-val tupleLength = ru.Literal(Constant(expressions.length))
-val lengthDef = q"final val length = $tupleLength"
-
 /* TODO: Configurable...
 val nullFunctions =
-  q"""
+  s"""
private final val nullSet = new org.apache.spark.util.collection.BitSet(length)
 final def setNullAt(i: Int) = nullSet.set(i)
 final def isNullAt(i: Int) = nullSet.get(i)
   """
  */
 
-val nullFunctions =
-  q"""
-private[this] var nullBits = new Array[Boolean](${expressions.size})
-override def setNullAt(i: Int) = { nullBits(i) = true }
-override def isNullAt(i: Int) = nullBits(i)
-  """.children
-
-val tupleElements = expressions.zipWithIndex.flatMap {
+val ctx = newCodeGenContext()
+val columns = expressions.zipWithIndex.map {
   case (e, i) =>
-val elementName = newTermName(s"c$i")
-val evaluatedExpression = expressionEvaluator(e)
-val iLit = ru.Literal(Constant(i))
+s"private ${primitiveForType(e.dataType)} c$i = ${defaultPrimitive(e.dataType)};\n"
+}.mkString("\n  ")
 
-q"""
-var ${newTermName(s"c$i")}: ${termForType(e.dataType)} = _
+val initColumns = expressions.zipWithIndex.map {
+  case (e, i) =>
+val eval = expressionEvaluator(e, ctx)
+s"""
 {
-  ..${evaluatedExpression.code}
-  if(${evaluatedExpression.nullTerm})
-setNullAt($iLit)
-  else {
-nullBits($iLit) = false
-$elementName = ${evaluatedExpression.primitiveTerm}
+  // column$i
+  ${eval.code}
+  nullBits[$i] = ${eval.nullTerm};
+  if(!${eval.nullTerm}) {
+c$i = ${eval.primitiveTerm};
   }
 }
-""".children : Seq[Tree]
-}
+"""
+}.mkString("\n")
 
-val accessorFailure = q"""scala.sys.error("Invalid ordinal:" + i)"""
-val applyFunction = {
-  val cases = (0 until expressions.size).map { i =>
-val ordinal = ru.Literal(Constant(i))
-val elementName = newTermName(s"c$i")
-val iLit = ru.Literal(Constant(i))
+val getCases = (0 until expressions.size).map { i =>
+  s"case $i: return c$i;"
+}.mkString("\n")
 
-q"if(i == $ordinal) { if(isNullAt($i)) return null else return $elementName }"
-  }
-  q"override def apply(i: Int): Any = { ..$cases; $accessorFailure }"
-}
-
-val updateFunction = {
-  val cases = expressions.zipWithIndex.map {case (e, i) =>
-val ordinal = ru.Literal(Constant(i))
-val elementName = newTermName(s"c$i")
-val iLit = ru.Literal(Constant(i))
-
-q"""
-  if(i == $ordinal) {
-if(value == null) {
-  setNullAt(i)
-} else {
-  nullBits(i) = false
-  $elementName = value.asInstanceOf[${termForType(e.dataType)}]
-}
-return
-  }"""
-  }
-  q"override def update(i: Int, value: Any): Unit = { ..$cases; $accessorFailure }"
-}
+val updateCases = expressions.zipWithIndex.map { case (e, i) =>
+  s"case $i: { c$i = (${termForType(e.dataType)})value; return;}"
+}.mkString("\n")
 
 val specificAccessorFunctions = nativeTypes.map { dataType =>
-  val ifStatements = expressions.zipWithIndex.flatMap {
-// getString() is not used by expressions
-case (e, i) if e.dataType == dataType && dataType != StringType =>
-  val elementName = newTermName(s"c$i")
-  // TODO: The string of ifs gets pretty inefficient as the row grows in size.
-  // TODO: Optional null checks?
-  q"if(i == $i) return $elementName" :: Nil
-case _ => Nil
-  }
-  dataType match {
-// Row() need this interface to compile
-case StringType =>
-  q"""
-  override def getString(i: Int): String = {
-$accessorFailure
-  }"""
-case other =>
-  q"""
-  override def ${accessorFor
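A side note on the pattern visible in this hunk: the new generator emits plain Java `switch` cases by interpolating over column ordinals and joining with newlines, instead of building quasiquote trees. A minimal, self-contained sketch of that string-building step (illustrative names only, not Spark's actual generator):

```java
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Sketch of the getCases pattern above: build a Java switch body by
// mapping over column ordinals and joining the case lines with newlines.
public class SwitchGen {
    static String getCases(int n) {
        return IntStream.range(0, n)
                .mapToObj(i -> "case " + i + ": return c" + i + ";")
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        System.out.println(getCases(3));
    }
}
```

The per-ordinal fields `c0`, `c1`, ... mirror the mutable columns of the generated row class in the diff.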

[GitHub] spark pull request: [SPARK-7956] [SQL] Use Janino to compile SQL e...

2015-06-03 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6479#issuecomment-108714133
  
  [Test build #34152 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34152/consoleFull) for PR 6479 at commit [`eec3a33`](https://github.com/apache/spark/commit/eec3a33434342bfb342fc541f97d879fbd80d974).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7558] Demarcate tests in unit-tests.log...

2015-06-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6598#issuecomment-108714273
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-7558] Demarcate tests in unit-tests.log...

2015-06-03 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6598#issuecomment-108714237
  
  [Test build #34147 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34147/consoleFull) for PR 6598 at commit [`4c3c566`](https://github.com/apache/spark/commit/4c3c566b155e01149ff1e8c9fd55c1ae78602954).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-7956] [SQL] Use Janino to compile SQL e...

2015-06-03 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/6479#issuecomment-108714012
  
@JoshRosen I think it's fine to keep the global lock, because we want to 
compile only once for a given expression. Also, `ClassBodyEvaluator` is not 
thread-safe, so otherwise we would need to create a new one for each 
compilation unit.

IntegerHashSet is a specialized version for codegen; it should have better 
performance than OpenHashSet[Row].

There is a follow-up PR (based on this one) that pushes the codegen into 
Expression; a lot of code will be moved, so some minor issues can be 
addressed there. 
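The compile-once-under-a-lock idea can be sketched as follows. This is a hypothetical illustration (the class and method names are mine, not Spark's): compiled artifacts are memoized per source string, and a single lock guards the compiler because, as noted above, the underlying `ClassBodyEvaluator` is not thread-safe.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: memoize compiled results per expression source so
// each expression is compiled at most once, with one global lock protecting
// the (non-thread-safe) compiler. Not Spark's actual API.
public class CompiledCache {
    private static final Map<String, Object> cache = new ConcurrentHashMap<>();
    private static final Object compilerLock = new Object();

    // Stand-in for invoking the real compiler (Janino in the PR).
    private static Object compile(String source) {
        return "compiled:" + source;
    }

    public static Object getOrCompile(String source) {
        synchronized (compilerLock) {
            return cache.computeIfAbsent(source, CompiledCache::compile);
        }
    }
}
```

Holding one coarse lock trades some concurrency for simplicity; the alternative mentioned above is one compiler instance per compilation unit.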





[GitHub] spark pull request: [SPARK-7956] [SQL] Use Janino to compile SQL e...

2015-06-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6479#issuecomment-108713625
  
Merged build started.





[GitHub] spark pull request: [SPARK-7956] [SQL] Use Janino to compile SQL e...

2015-06-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6479#issuecomment-108713614
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-7956] [SQL] Use Janino to compile SQL e...

2015-06-03 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/6479#discussion_r31691732
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -313,295 +318,309 @@ abstract class CodeGenerator[InType <: AnyRef, OutType <: AnyRef] extends Loggin
 }
 val $nullTerm = false
 val $primitiveTerm = $funcName
-""".children
+"""
   */
 
  case GreaterThan(e1 @ NumericType(), e2 @ NumericType()) =>
-(e1, e2).evaluateAs (BooleanType) { case (eval1, eval2) => q"$eval1 > $eval2" }
+(e1, e2).evaluateAs (BooleanType) { case (eval1, eval2) => s"$eval1 > $eval2" }
  case GreaterThanOrEqual(e1 @ NumericType(), e2 @ NumericType()) =>
-(e1, e2).evaluateAs (BooleanType) { case (eval1, eval2) => q"$eval1 >= $eval2" }
+(e1, e2).evaluateAs (BooleanType) { case (eval1, eval2) => s"$eval1 >= $eval2" }
  case LessThan(e1 @ NumericType(), e2 @ NumericType()) =>
-(e1, e2).evaluateAs (BooleanType) { case (eval1, eval2) => q"$eval1 < $eval2" }
+(e1, e2).evaluateAs (BooleanType) { case (eval1, eval2) => s"$eval1 < $eval2" }
  case LessThanOrEqual(e1 @ NumericType(), e2 @ NumericType()) =>
-(e1, e2).evaluateAs (BooleanType) { case (eval1, eval2) => q"$eval1 <= $eval2" }
+(e1, e2).evaluateAs (BooleanType) { case (eval1, eval2) => s"$eval1 <= $eval2" }
 
   case And(e1, e2) =>
-val eval1 = expressionEvaluator(e1)
-val eval2 = expressionEvaluator(e2)
-
-q"""
-  ..${eval1.code}
-  var $nullTerm = false
-  var $primitiveTerm: ${termForType(BooleanType)} = false
-
-  if (!${eval1.nullTerm} && ${eval1.primitiveTerm} == false) {
+val eval1 = expressionEvaluator(e1, ctx)
+val eval2 = expressionEvaluator(e2, ctx)
+// TODO(davies): This is different than And.eval()
+s"""
+  ${eval1.code}
+  boolean $nullTerm = false;
+  boolean $primitiveTerm  = false;
+
+  if (!${eval1.nullTerm} && !${eval1.primitiveTerm}) {
   } else {
-..${eval2.code}
-if (!${eval2.nullTerm} && ${eval2.primitiveTerm} == false) {
+${eval2.code}
+if (!${eval2.nullTerm} && !${eval2.primitiveTerm}) {
 } else if (!${eval1.nullTerm} && !${eval2.nullTerm}) {
-  $primitiveTerm = true
+  $primitiveTerm = true;
 } else {
-  $nullTerm = true
+  $nullTerm = true;
 }
   }
- """.children
+ """
 
   case Or(e1, e2) =>
-val eval1 = expressionEvaluator(e1)
-val eval2 = expressionEvaluator(e2)
+val eval1 = expressionEvaluator(e1, ctx)
+val eval2 = expressionEvaluator(e2, ctx)
 
-q"""
-  ..${eval1.code}
-  var $nullTerm = false
-  var $primitiveTerm: ${termForType(BooleanType)} = false
+s"""
+  ${eval1.code}
+  boolean $nullTerm = false;
+  boolean $primitiveTerm = false;
 
   if (!${eval1.nullTerm} && ${eval1.primitiveTerm}) {
-$primitiveTerm = true
+$primitiveTerm = true;
   } else {
-..${eval2.code}
+${eval2.code}
 if (!${eval2.nullTerm} && ${eval2.primitiveTerm}) {
-  $primitiveTerm = true
+  $primitiveTerm = true;
 } else if (!${eval1.nullTerm} && !${eval2.nullTerm}) {
-  $primitiveTerm = false
+  $primitiveTerm = false;
 } else {
-  $nullTerm = true
+  $nullTerm = true;
 }
   }
- """.children
+ """
 
   case Not(child) =>
 // Uh, bad function name...
-child.castOrNull(c => q"!$c", BooleanType)
-
-  case Add(e1, e2) => (e1, e2) evaluate { case (eval1, eval2) => q"$eval1 + $eval2" }
-  case Subtract(e1, e2) => (e1, e2) evaluate { case (eval1, eval2) => q"$eval1 - $eval2" }
-  case Multiply(e1, e2) => (e1, e2) evaluate { case (eval1, eval2) => q"$eval1 * $eval2" }
+child.castOrNull(c => s"!$c", BooleanType)
+
+  case Add(e1 @ DecimalType(), e2 @ DecimalType()) =>
+(e1, e2) evaluate { case (eval1, eval2) => s"$eval1.$$plus($eval2)" }
+  case Subtract(e1 @ DecimalType(), e2 @ DecimalType()) =>
+(e1, e2) evaluate { case (eval1, eval2) => s"$eval1.$$minus($eval2)" }
+  case Mult
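The And/Or branches in this hunk encode SQL three-valued logic with a null flag plus a primitive boolean. As a reading aid, here is a hedged sketch of the standard Kleene semantics those branches aim at (the TODO above notes the generated And does not yet exactly match `And.eval()`), using a boxed `Boolean` where `null` stands for SQL NULL. This is illustrative only, not Spark code:

```java
// Three-valued SQL logic: FALSE dominates AND, TRUE dominates OR,
// otherwise any NULL operand makes the result NULL.
public class ThreeValued {
    static Boolean and(Boolean a, Boolean b) {
        if (Boolean.FALSE.equals(a) || Boolean.FALSE.equals(b)) return false;
        if (a == null || b == null) return null;
        return true;
    }

    static Boolean or(Boolean a, Boolean b) {
        if (Boolean.TRUE.equals(a) || Boolean.TRUE.equals(b)) return true;
        if (a == null || b == null) return null;
        return false;
    }
}
```

The generated code expresses the same cases with `$nullTerm`/`$primitiveTerm` pairs instead of a boxed value, since codegen avoids boxing.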

[GitHub] spark pull request: [Streaming][Kafka] cleanup tests from SPARK-28...

2015-06-03 Thread koeninger
Github user koeninger commented on the pull request:

https://github.com/apache/spark/pull/5921#issuecomment-108709503
  
Another ping on this, even if it misses 1.4

Seeing waitUntilLeaderOffset all over the place in test code I'm working on 
right now made me sad :(





[GitHub] spark pull request: [SPARK-7889] [UI] make sure click the "App ID"...

2015-06-03 Thread XuTingjun
Github user XuTingjun commented on the pull request:

https://github.com/apache/spark/pull/6545#issuecomment-108708777
  
I think the reason is not just the appCache. After debugging the code, I 
found there are two main reasons:
1. The first time a *"/history/appid"* request is sent, the *handler 
"/history/*"* deals with it. While doing so, the handler adds the SparkUI's 
handlers into the history server, including the *handler "/history/appid"*. 
So when a *"/history/appid"* request is sent a second time, the *handler 
"/history/appid"* deals with it instead of the *handler "/history/*"*, and 
the second UI is the same as the first one;

2. In the *handler "/history/*"*, the code uses `appCache.get()` to cache 
the app UI. I think we should use `appCache.refresh()` instead so that it 
can actually refresh.





  1   2   3   4   5   6   7   8   9   >