[GitHub] spark pull request: [SPARK-7956] [SQL] Use Janino to compile SQL e...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/6479#discussion_r31697101 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -313,295 +318,309 @@ abstract class CodeGenerator[InType <: AnyRef, OutType <: AnyRef] extends Loggin } val $nullTerm = false val $primitiveTerm = $funcName -""".children +""" */ case GreaterThan(e1 @ NumericType(), e2 @ NumericType()) => -(e1, e2).evaluateAs (BooleanType) { case (eval1, eval2) => q"$eval1 > $eval2" } +(e1, e2).evaluateAs (BooleanType) { case (eval1, eval2) => s"$eval1 > $eval2" } case GreaterThanOrEqual(e1 @ NumericType(), e2 @ NumericType()) => -(e1, e2).evaluateAs (BooleanType) { case (eval1, eval2) => q"$eval1 >= $eval2" } +(e1, e2).evaluateAs (BooleanType) { case (eval1, eval2) => s"$eval1 >= $eval2" } case LessThan(e1 @ NumericType(), e2 @ NumericType()) => -(e1, e2).evaluateAs (BooleanType) { case (eval1, eval2) => q"$eval1 < $eval2" } +(e1, e2).evaluateAs (BooleanType) { case (eval1, eval2) => s"$eval1 < $eval2" } case LessThanOrEqual(e1 @ NumericType(), e2 @ NumericType()) => -(e1, e2).evaluateAs (BooleanType) { case (eval1, eval2) => q"$eval1 <= $eval2" } +(e1, e2).evaluateAs (BooleanType) { case (eval1, eval2) => s"$eval1 <= $eval2" } case And(e1, e2) => -val eval1 = expressionEvaluator(e1) -val eval2 = expressionEvaluator(e2) - -q""" - ..${eval1.code} - var $nullTerm = false - var $primitiveTerm: ${termForType(BooleanType)} = false - - if (!${eval1.nullTerm} && ${eval1.primitiveTerm} == false) { +val eval1 = expressionEvaluator(e1, ctx) +val eval2 = expressionEvaluator(e2, ctx) +// TODO(davies): This is different than And.eval() +s""" + ${eval1.code} + boolean $nullTerm = false; + boolean $primitiveTerm = false; + + if (!${eval1.nullTerm} && !${eval1.primitiveTerm}) { } else { -..${eval2.code} -if (!${eval2.nullTerm} && ${eval2.primitiveTerm} == false) { +${eval2.code} +if (!${eval2.nullTerm} && 
!${eval2.primitiveTerm}) { } else if (!${eval1.nullTerm} && !${eval2.nullTerm}) { - $primitiveTerm = true + $primitiveTerm = true; } else { - $nullTerm = true + $nullTerm = true; } } - """.children + """ case Or(e1, e2) => -val eval1 = expressionEvaluator(e1) -val eval2 = expressionEvaluator(e2) +val eval1 = expressionEvaluator(e1, ctx) +val eval2 = expressionEvaluator(e2, ctx) -q""" - ..${eval1.code} - var $nullTerm = false - var $primitiveTerm: ${termForType(BooleanType)} = false +s""" + ${eval1.code} + boolean $nullTerm = false; + boolean $primitiveTerm = false; if (!${eval1.nullTerm} && ${eval1.primitiveTerm}) { -$primitiveTerm = true +$primitiveTerm = true; } else { -..${eval2.code} +${eval2.code} if (!${eval2.nullTerm} && ${eval2.primitiveTerm}) { - $primitiveTerm = true + $primitiveTerm = true; } else if (!${eval1.nullTerm} && !${eval2.nullTerm}) { - $primitiveTerm = false + $primitiveTerm = false; } else { - $nullTerm = true + $nullTerm = true; } } - """.children + """ case Not(child) => // Uh, bad function name... -child.castOrNull(c => q"!$c", BooleanType) - - case Add(e1, e2) => (e1, e2) evaluate { case (eval1, eval2) => q"$eval1 + $eval2" } - case Subtract(e1, e2) => (e1, e2) evaluate { case (eval1, eval2) => q"$eval1 - $eval2" } - case Multiply(e1, e2) => (e1, e2) evaluate { case (eval1, eval2) => q"$eval1 * $eval2" } +child.castOrNull(c => s"!$c", BooleanType) + + case Add(e1 @ DecimalType(), e2 @ DecimalType()) => --- End diff -- @JoshRosen this is actually not correct if you use `e1: DecimalType`, because we are not matching against `DecimalType` here, but rather against expressions whose output is decimaltype. --- If your project is set up for it, yo
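To make the distinction in the comment above concrete: a pattern like `e1 @ DecimalType()` calls an extractor that inspects the expression's output type, while a type pattern like `e1: DecimalType` would test the runtime class of the expression object itself. The sketch below uses toy classes (`Expr`, `DecType`, `IntType`, `Literal`, `Add` are hypothetical stand-ins defined here, not Catalyst's real classes) to show a Boolean extractor of that kind.

```scala
// Toy model only; every name in this object is a hypothetical stand-in.
object ExtractorDemo {
  sealed trait DataType
  case object IntType extends DataType
  case object DecType extends DataType {
    // Boolean extractor: succeeds for any expression whose *output* type is DecType.
    def unapply(e: Expr): Boolean = e.dataType == DecType
  }

  sealed trait Expr { def dataType: DataType }
  case class Literal(value: Any, dataType: DataType) extends Expr
  case class Add(left: Expr, right: Expr) extends Expr {
    def dataType: DataType = left.dataType
  }

  def describe(e: Expr): String = e match {
    // `l @ DecType()` binds l to the child and runs DecType.unapply(l).
    // A type pattern (`l: SomeClass`) would instead check the class of the
    // expression node, which says nothing about its output type.
    case Add(l @ DecType(), r @ DecType()) => "decimal add"
    case Add(_, _)                         => "other add"
    case _                                 => "not an add"
  }
}
```

Here `describe` takes the first branch only when both children report `DecType` as their output type, whatever kind of expression node they are.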
[GitHub] spark pull request: [SPARK-4118] [MLlib] [PySpark] Python bindings...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6499#issuecomment-108755212 Merged build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4118] [MLlib] [PySpark] Python bindings...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6499#issuecomment-108755193 Merged build triggered.
[GitHub] spark pull request: [SPARK-7937][SQL] Support comparison on Struct...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6519#issuecomment-108754525 [Test build #34157 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34157/consoleFull) for PR 6519 at commit [`2071693`](https://github.com/apache/spark/commit/20716936db429f8fc793b65ff70fd841c8c6d428).
[GitHub] spark pull request: [SPARK-7956] [SQL] Use Janino to compile SQL e...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/6479#issuecomment-108754378 Nah, not a correctness issue. Just some general cleanup; happy to tackle it myself in a followup. By the way, Catalyst seems to compile fine when I remove both the Scala reflection and compiler JARs. We can remove those as part of the followup, though; not a blocker for this.
[GitHub] spark pull request: [SPARK-7990][SQL] Add methods to facilitate eq...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/6616#discussion_r31696920 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala --- @@ -395,22 +395,35 @@ class DataFrame private[sql]( * @since 1.4.0 */ def join(right: DataFrame, usingColumn: String): DataFrame = { +join(right, Seq(usingColumn)) + } + + def join(right: DataFrame, usingColumns: Seq[String]): DataFrame = { // Analyze the self join. The assumption is that the analyzer will disambiguate left vs right // by creating a new instance for one of the branch. val joined = sqlContext.executePlan( Join(logicalPlan, right.logicalPlan, joinType = Inner, None)).analyzed.asInstanceOf[Join] -// Project only one of the join column. -val joinedCol = joined.right.resolve(usingColumn) +// Project only one of the join columns. +val joinedCols = usingColumns.map(col => joined.right.resolve(col)) +val condition = usingColumns.map { col => + catalyst.expressions.EqualTo(joined.left.resolve(col), joined.right.resolve(col)) +}.foldLeft[Option[catalyst.expressions.BinaryExpression]](None) { (opt, eqTo) => --- End diff -- This can be simplified into a `reduceOption`, right?
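A minimal sketch of the `reduceOption` suggestion, under stated assumptions: the body of the `foldLeft` is cut off in the quoted diff, so the fold below is a guess at its intent (AND the equality conditions together, yielding `None` when there are no columns). `Expr`, `EqualTo` and `And` here are hypothetical miniatures, not the Catalyst classes.

```scala
// Hypothetical miniature expression tree; Catalyst's real classes differ.
object ReduceOptionDemo {
  sealed trait Expr
  case class EqualTo(left: String, right: String) extends Expr
  case class And(left: Expr, right: Expr) extends Expr

  // Fold with an explicit Option accumulator (assumed shape of the original code).
  def viaFold(conds: Seq[Expr]): Option[Expr] =
    conds.foldLeft[Option[Expr]](None) { (opt, c) =>
      opt match {
        case Some(acc) => Some(And(acc, c))
        case None      => Some(c)
      }
    }

  // Same result: None for an empty Seq, otherwise a left-nested chain of And.
  def viaReduce(conds: Seq[Expr]): Option[Expr] =
    conds.reduceOption[Expr](And(_, _))
}
```

Both functions return the same value for any input; `reduceOption` simply removes the hand-written handling of the empty case.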
[GitHub] spark pull request: [SPARK-7937][SQL] Support comparison on Struct...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6519#issuecomment-108753616 Merged build triggered.
[GitHub] spark pull request: [SPARK-7937][SQL] Support comparison on Struct...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6519#issuecomment-108753655 Merged build started.
[GitHub] spark pull request: [SPARK-7990][SQL] Add methods to facilitate eq...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/6616#discussion_r31696855 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala --- @@ -395,22 +395,35 @@ class DataFrame private[sql]( * @since 1.4.0 */ def join(right: DataFrame, usingColumn: String): DataFrame = { +join(right, Seq(usingColumn)) + } + + def join(right: DataFrame, usingColumns: Seq[String]): DataFrame = { --- End diff -- add javadoc
[GitHub] spark pull request: [SPARK-7990][SQL] Add methods to facilitate eq...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/6616#discussion_r31696822 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala --- @@ -448,6 +461,10 @@ class DataFrame private[sql]( * @since 1.3.0 */ def join(right: DataFrame, joinExprs: Column, joinType: String): DataFrame = { +join(right, Seq(joinExprs), joinType) + } + + def join(right: DataFrame, joinExprs: Seq[Column], joinType: String): DataFrame = { --- End diff -- I think we should remove this one for Scala.
[GitHub] spark pull request: [SPARK-3431] [WIP] Parallelize Scala/Java test...
Github user nchammas closed the pull request at: https://github.com/apache/spark/pull/3564
[GitHub] spark pull request: [SPARK-7956] [SQL] Use Janino to compile SQL e...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6479#issuecomment-108752930 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-7956] [SQL] Use Janino to compile SQL e...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6479#issuecomment-108752895 [Test build #34153 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34153/consoleFull) for PR 6479 at commit [`262d848`](https://github.com/apache/spark/commit/262d84839a0876c07ffe57031fb505e664abfe66). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `public final class UnsafeRow extends BaseMutableRow ` * `public abstract class BaseMutableRow extends BaseRow implements MutableRow ` * `public abstract class BaseRow implements Row ` * ` protected class CodeGenContext ` * `abstract class BaseMutableProjection extends MutableProjection ` * ` class SpecificProjection extends $` * `class BaseOrdering extends Ordering[Row] ` * ` class SpecificOrdering extends $` * `abstract class Predicate ` * ` class SpecificPredicate extends $` * `abstract class BaseProject extends Projection ` * `class SpecificProjection extends $` * `final class SpecificRow extends $`
[GitHub] spark pull request: SPARK-7436: Fixed instantiation of custom reco...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/5976
[GitHub] spark pull request: [SPARK-2387][Core]Remove Stage's barrier
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3430
[GitHub] spark pull request: [SPARK-7180][SPARK-8090][SPARK-8091] Fix a num...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6625#issuecomment-108751844 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-7180][SPARK-8090][SPARK-8091] Fix a num...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6625#issuecomment-108751839 [Test build #34154 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34154/consoleFull) for PR 6625 at commit [`ae212c8`](https://github.com/apache/spark/commit/ae212c89ce9feba731c3dd60b3d4332addee2a0e). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` // class ParentClass(parentField: Int)` * ` // class ChildClass(childField: Int) extends ParentClass(1)` * ` // If the class type corresponding to current slot has writeObject() defined,` * ` // then its not obvious which fields of the class will be serialized as the writeObject()`
[GitHub] spark pull request: [SPARK-2774] Set preferred locations for reduc...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4576
[GitHub] spark pull request: [SPARK-3650] Fix TriangleCount handling of rev...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2495
[GitHub] spark pull request: [SPARK-7956] [SQL] Use Janino to compile SQL e...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/6479#issuecomment-108751601 Does it impact correctness? If it doesn't I think it's fine to do it in followup prs.
[GitHub] spark pull request: [SPARK-7956] [SQL] Use Janino to compile SQL e...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/6479#issuecomment-108751305 Actually... if you agree that we only need the synchronization for `ClassBodyEvaluator`, how about removing `globalLock` and just synchronizing on the `ClassBodyEvaluator` instance instead? I think that this is a little clearer / more explicit. I think that we only need to keep the `globalLock` in place if it's implicitly guarding Scala reflection calls. We don't need to do this cleanup here necessarily, but we might as well if it's easy to do. The nitpickier, smaller cleanups, such as type aliases for `String`, etc. should definitely be deferred, though.
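A rough sketch of the idea in this comment, assuming a single shared, non-thread-safe evaluator object: instead of a separate global lock, callers take the monitor of the object they are protecting. `UnsafeEvaluator` and its `cook` method are hypothetical stand-ins written for this example; they do not reproduce Janino's `ClassBodyEvaluator` API, and the PR's actual sharing of evaluator instances is not shown in this thread.

```scala
object LockDemo {
  // Hypothetical stand-in for an object that is not safe for concurrent use.
  final class UnsafeEvaluator {
    private var compiled = 0
    // Unsynchronized read-modify-write: racy if called from several threads at once.
    def cook(source: String): Int = { compiled += 1; compiled }
  }

  private val evaluator = new UnsafeEvaluator

  // Lock on the evaluator itself, so the guarded resource and the lock are the same object.
  def compile(source: String): Int = evaluator.synchronized {
    evaluator.cook(source)
  }
}
```

The design point is readability: a reader of `compile` can see exactly which object the lock protects, whereas a global lock leaves that implicit.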
[GitHub] spark pull request: [SPARK-4006] Block Manager - Double Register C...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2854#issuecomment-108751032 @tsliwowicz can you please close this pull request?
[GitHub] spark pull request: [SPARK-3172] [SPARK-3577] improve shuffle spil...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2504#issuecomment-108750834 This has gone stale so I'd like to close this issue pending further discussion.
[GitHub] spark pull request: [SPARK-3172] [SPARK-3577] improve shuffle spil...
Github user sryza closed the pull request at: https://github.com/apache/spark/pull/2504
[GitHub] spark pull request: [SPARK-3431] [WIP] Parallelize Scala/Java test...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3564#issuecomment-108750767 I'd like to close this issue for now pending further development.
[GitHub] spark pull request: [SPARK-2774] Set preferred locations for reduc...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4576#issuecomment-108750698 I am cleaning up old PRs and would propose to close this issue pending further discussion.
[GitHub] spark pull request: [SPARK-7956] [SQL] Use Janino to compile SQL e...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/6479#issuecomment-108750436 @davies, regarding the global lock, I think that we need it strictly because of `ClassBodyEvaluator`'s lack of thread-safety, not to prevent duplicate loading for expressions. According to Guava's [CacheBuilder documentation](http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/cache/CacheBuilder.html#build(com.google.common.cache.CacheLoader)) for `CacheBuilder.build(loader)`: > Builds a cache, which either returns an already-loaded value for a given key or atomically computes or retrieves it using the supplied CacheLoader. If another thread is currently loading the value for this key, simply waits for that thread to finish and returns its loaded value. Note that multiple threads can concurrently load values for distinct keys. Spark's HistoryServer makes use of this and I've used this pattern in some other code as well. Not a correctness / performance issue for this PR, but just wanted to point this out since it's a neat thread-safety trick that Guava gives us.
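The "load at most once per key" behaviour quoted above can be illustrated without Guava. The sketch below swaps in the JDK's `ConcurrentHashMap.computeIfAbsent`, which gives a similar guarantee (the mapping function is applied at most once per key, and other threads asking for that key wait for it); it is not Guava's `LoadingCache`, and `AtomicLoadDemo`, `loads` and the fake "compile" step are names invented for this example.

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicInteger
import java.util.function.{Function => JFunction}

object AtomicLoadDemo {
  // Counts how many times the (pretend) expensive load actually runs.
  val loads = new AtomicInteger(0)
  private val cache = new ConcurrentHashMap[String, String]()

  // Stand-in for an expensive step such as compiling generated code.
  private val loader = new JFunction[String, String] {
    override def apply(key: String): String = {
      loads.incrementAndGet()
      s"compiled($key)"
    }
  }

  // computeIfAbsent applies the loader at most once per key; concurrent
  // callers for the same key get the value produced by that single load.
  def get(key: String): String = cache.computeIfAbsent(key, loader)
}
```

With a cache like this, duplicate loading is already prevented per key, which is why the comment argues that the remaining reason for a lock is the evaluator's own lack of thread-safety.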
[GitHub] spark pull request: [SPARK-4118] [MLlib] [PySpark] Python bindings...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6499#issuecomment-108750534 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-4118] [MLlib] [PySpark] Python bindings...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6499#issuecomment-108750520 [Test build #34156 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34156/consoleFull) for PR 6499 at commit [`e12c590`](https://github.com/apache/spark/commit/e12c590801d5eed7666b33d7ac2b8543e2d340e2). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class StreamingKMeansModel(KMeansModel):` * `class StreamingKMeans(object):`
[GitHub] spark pull request: SPARK-5425: Use synchronised methods in system...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4222#issuecomment-108749998 @jacek-lewandowski can you close this issue? It didn't close properly because of the way GitHub auto-closes patches into release branches.
[GitHub] spark pull request: SPARK-4585. Spark dynamic executor allocation ...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4051#issuecomment-108749911 For some reason this didn't close. @sryza can you close this issue?
[GitHub] spark pull request: [SPARK-3650] Fix TriangleCount handling of rev...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2495#issuecomment-108749807 I'd like to close this issue pending further updates.
[GitHub] spark pull request: [SPARK-2387][Core]Remove Stage's barrier
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3430#issuecomment-108749762 I'd propose to close this issue. It's a fairly large change and needs more discussion before being seriously considered to be part of spark.
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/5694#issuecomment-108749526 Hey @brennonyork - sorry it took so long to review this. I can iterate on this much more closely over the next few days. I took a close look at this. This is a faithful porting of the bash implementation, but I think we should leverage Python to make things a bit nicer than they could be in bash. As part of rewriting this in Python I think we should do some much-needed simplification of the construction of all the various permutations of test configurations. I sketched an outline as to how it can work, let me know if you have any questions. Without those changes I think this will remain really hard for someone else to understand.
[GitHub] spark pull request: [SPARK-7956] [SQL] Use Janino to compile SQL e...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/6479#issuecomment-108749295

As long as we're removing quasiquotes, we might see if we can delete their dependencies from the `pom` as well: I think we might be able to remove the following (should confirm, though):

```
<profile>
  <id>scala-2.10</id>
  <activation>
    <property><name>!scala-2.11</name></property>
  </activation>
  <dependencies>
    <dependency>
      <groupId>org.scalamacros</groupId>
      <artifactId>quasiquotes_${scala.binary.version}</artifactId>
      <version>${scala.macros.version}</version>
    </dependency>
  </dependencies>
</profile>
```

and

```
<dependency>
  <groupId>org.scala-lang</groupId>
  <artifactId>scala-compiler</artifactId>
</dependency>
```

We might have to keep

```
<dependency>
  <groupId>org.scala-lang</groupId>
  <artifactId>scala-reflect</artifactId>
</dependency>
```

if we actually depend on Scala reflection.
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/5694#discussion_r31696016

--- Diff: dev/run-tests.py ---
@@ -0,0 +1,417 @@
+#!/usr/bin/env python2
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import os
+import re
+import sys
+import shutil
+import subprocess
+from collections import namedtuple
+
+SPARK_PROJ_ROOT = os.path.join(os.path.dirname(os.path.realpath(__file__)), "..")
+USER_HOME_DIR = os.environ.get("HOME")
+
+SBT_MAVEN_PROFILE_ARGS_ENV = "SBT_MAVEN_PROFILES_ARGS"
+AMPLAB_JENKINS_BUILD_TOOL = os.environ.get("AMPLAB_JENKINS_BUILD_TOOL")
+AMPLAB_JENKINS = os.environ.get("AMPLAB_JENKINS")
+
+SBT_OUTPUT_FILTER = re.compile("^.*[info].*Resolving" + "|" +
+                               "^.*[warn].*Merging" + "|" +
+                               "^.*[info].*Including")
+
+
+def get_error_codes(err_code_file):
+    """Function to retrieve all block numbers from the `run-tests-codes.sh`
+    file to maintain backwards compatibility with the `run-tests-jenkins`
+    script"""
+
+    with open(err_code_file, 'r') as f:
+        err_codes = [e.split()[1].strip().split('=')
+                     for e in f if e.startswith("readonly")]
+    return dict(err_codes)
+
+
+def exit_from_command_with_retcode(cmd, retcode):
+    print "[error] running", cmd, "; received return code", retcode
+    sys.exit(int(os.environ.get("CURRENT_BLOCK", 255)))
+
+
+def rm_r(path):
+    """Given an arbitrary path properly remove it with the correct python
+    construct if it exists
+    - from: http://stackoverflow.com/a/9559881"""
+
+    if os.path.isdir(path):
+        shutil.rmtree(path)
+    elif os.path.exists(path):
+        os.remove(path)
+
+
+def lineno():
+    """Returns the current line number in our program
+    - from: http://stackoverflow.com/a/3056059"""
+
+    return inspect.currentframe().f_back.f_lineno
+
+
+def run_cmd(cmd):
+    """Given a command as a list of arguments will attempt to execute the
+    command and, on failure, print an error message"""
+
+    if not isinstance(cmd, list):
+        cmd = cmd.split()
+    try:
+        subprocess.check_call(cmd)
+    except subprocess.CalledProcessError as e:
+        exit_from_command_with_retcode(e.cmd, e.returncode)
+
+
+def set_sbt_maven_profile_args():
+    """Properly sets the SBT environment variable arguments with additional
+    checks to determine if this is running on an Amplab Jenkins machine"""
+
+    # base environment values for SBT_MAVEN_PROFILE_ARGS_ENV which will be appended on
+    sbt_maven_profile_args_base = ["-Pkinesis-asl"]
+
+    sbt_maven_profile_arg_dict = {
+        "hadoop1.0" : ["-Phadoop-1", "-Dhadoop.version=1.0.4"],
+        "hadoop2.0" : ["-Phadoop-1", "-Dhadoop.version=2.0.0-mr1-cdh4.1.1"],
+        "hadoop2.2" : ["-Pyarn", "-Phadoop-2.2"],
+        "hadoop2.3" : ["-Pyarn", "-Phadoop-2.3", "-Dhadoop.version=2.3.0"],
+    }
+
+    # set the SBT maven build profile argument environment variable and ensure
+    # we build against the right version of Hadoop
+    if os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE"):
+        os.environ[SBT_MAVEN_PROFILE_ARGS_ENV] = \
+            " ".join(sbt_maven_profile_arg_dict.get(ajbp, [])
+                     + sbt_maven_profile_args_base)
+    else:
+        os.environ[SBT_MAVEN_PROFILE_ARGS_ENV] = \
+            " ".join(sbt_maven_profile_arg_dict.get("hadoop2.3", [])
+                     + sbt_maven_profile_args_base)
+
+
+def is_exe(path):
+    """Check if a given path is an executable file
+    - from: http://stackoverflow.com/a/377028"""
+
+    return os.path.isfile(path) and os.access(path, os.X_OK)
+
+
+def which(program):
+    """Find and return the given program by its absolute path or
[GitHub] spark pull request: [SPARK-7990][SQL] Add methods to facilitate eq...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/6616#discussion_r31695804

--- Diff: python/pyspark/sql/dataframe.py ---
@@ -509,30 +509,42 @@ def join(self, other, joinExprs=None, joinType=None):
         The following performs a full outer join between ``df1`` and ``df2``.

         :param other: Right side of the join
-        :param joinExprs: a string for join column name, or a join expression (Column).
-        If joinExprs is a string indicating the name of the join column,
-        the column must exist on both sides, and this performs an inner equi-join.
+        :param joinExprs: a string for join column name, a list of column names,
--- End diff --

ok.
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/5694#issuecomment-108743088

Can you update the Python lint checks to run on this file as well? I noticed that it fails `pep8` checks:

```python
pep8 dev/run-tests.py
dev/run-tests.py:34:62: W291 trailing whitespace
dev/run-tests.py:41:74: W291 trailing whitespace
dev/run-tests.py:43:1: W293 blank line contains whitespace
dev/run-tests.py:45:53: W291 trailing whitespace
dev/run-tests.py:93:20: E203 whitespace before ':'
dev/run-tests.py:94:20: E203 whitespace before ':'
dev/run-tests.py:95:20: E203 whitespace before ':'
dev/run-tests.py:96:20: E203 whitespace before ':'
dev/run-tests.py:103:62: W291 trailing whitespace
dev/run-tests.py:104:22: W503 line break before binary operator
dev/run-tests.py:108:22: W503 line break before binary operator
dev/run-tests.py:151:65: W291 trailing whitespace
dev/run-tests.py:153:48: E261 at least two spaces before inline comment
dev/run-tests.py:154:57: E261 at least two spaces before inline comment
dev/run-tests.py:155:45: E261 at least two spaces before inline comment
dev/run-tests.py:157:44: W291 trailing whitespace
dev/run-tests.py:197:45: W291 trailing whitespace
dev/run-tests.py:198:50: W291 trailing whitespace
dev/run-tests.py:199:59: W291 trailing whitespace
dev/run-tests.py:268:24: W291 trailing whitespace
dev/run-tests.py:283:60: W291 trailing whitespace
dev/run-tests.py:288:33: W291 trailing whitespace
dev/run-tests.py:289:43: W291 trailing whitespace
dev/run-tests.py:296:57: W291 trailing whitespace
dev/run-tests.py:317:40: W291 trailing whitespace
dev/run-tests.py:320:79: W291 trailing whitespace
dev/run-tests.py:332:44: W291 trailing whitespace
dev/run-tests.py:342:44: W291 trailing whitespace
dev/run-tests.py:353:1: W293 blank line contains whitespace
dev/run-tests.py:392:79: W291 trailing whitespace
dev/run-tests.py:404:1: W293 blank line contains whitespace
```
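For readers unfamiliar with the codes in the report above, here is a minimal sketch of how the non-whitespace warnings (E203, E261, W503) are usually addressed. The names `base_args`, `profile_args` and `joined` are hypothetical and the values are illustrative only; this is not the change that was made in the PR.

```python
# Hypothetical names and shortened values, for illustration only.
base_args = ["-Pkinesis-asl"]

profile_args = {
    "hadoop2.2": ["-Pyarn", "-Phadoop-2.2"],  # E203: no space before ':'
    "hadoop2.3": ["-Pyarn", "-Phadoop-2.3"],  # E261: two spaces before '#'
}

# W503: the continuation line no longer starts with the binary operator.
joined = " ".join(profile_args.get("hadoop2.3", []) +
                  base_args)
```

The W291 and W293 entries need no code change beyond stripping trailing whitespace from the listed lines.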
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/5694#discussion_r31695415 --- Diff: dev/run-tests.py ---
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/5694#discussion_r31695382 --- Diff: dev/run-tests.py --- @@ -0,0 +1,417 @@ +#!/usr/bin/env python2 + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +import os +import re +import sys +import shutil +import subprocess +from collections import namedtuple + +SPARK_PROJ_ROOT = os.path.join(os.path.dirname(os.path.realpath(__file__)), "..") +USER_HOME_DIR = os.environ.get("HOME") + +SBT_MAVEN_PROFILE_ARGS_ENV = "SBT_MAVEN_PROFILES_ARGS" +AMPLAB_JENKINS_BUILD_TOOL = os.environ.get("AMPLAB_JENKINS_BUILD_TOOL") +AMPLAB_JENKINS = os.environ.get("AMPLAB_JENKINS") + +SBT_OUTPUT_FILTER = re.compile("^.*[info].*Resolving" + "|" + + "^.*[warn].*Merging" + "|" + + "^.*[info].*Including") + + +def get_error_codes(err_code_file): +"""Function to retrieve all block numbers from the `run-tests-codes.sh` +file to maintain backwards compatibility with the `run-tests-jenkins` +script""" + +with open(err_code_file, 'r') as f: +err_codes = [e.split()[1].strip().split('=') + for e in f if e.startswith("readonly")] +return dict(err_codes) + + +def exit_from_command_with_retcode(cmd, retcode): +print "[error] running", cmd, "; received return code", retcode 
+    sys.exit(int(os.environ.get("CURRENT_BLOCK", 255)))
+
+
+def rm_r(path):
+    """Given an arbitrary path properly remove it with the correct python
+    construct if it exists
+    - from: http://stackoverflow.com/a/9559881"""
+
+    if os.path.isdir(path):
+        shutil.rmtree(path)
+    elif os.path.exists(path):
+        os.remove(path)
+
+
+def lineno():
+    """Returns the current line number in our program
+    - from: http://stackoverflow.com/a/3056059"""
+
+    return inspect.currentframe().f_back.f_lineno
--- End diff --

Also, `inspect` needs to be imported if you intend to call this function.
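To make the point about the missing import concrete, a minimal sketch of `lineno()` with `inspect` imported could look like the following. The `caller_line` variable and the `print` call are illustrative additions, not part of the PR.

```python
import inspect


def lineno():
    """Return the line number of the line that called this function."""
    # currentframe() is lineno()'s own frame; f_back is the caller's frame.
    return inspect.currentframe().f_back.f_lineno


# Illustrative usage (hypothetical, not from the PR).
caller_line = lineno()
print("called from line %d" % caller_line)
```

Without the `import inspect` line, the first call to `lineno()` would raise a `NameError`, which is presumably why the reviewer flagged it.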
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/5694#discussion_r31695346 --- Diff: dev/run-tests.py --- @@ -0,0 +1,417 @@ +#!/usr/bin/env python2 + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +import os +import re +import sys +import shutil +import subprocess +from collections import namedtuple + +SPARK_PROJ_ROOT = os.path.join(os.path.dirname(os.path.realpath(__file__)), "..") +USER_HOME_DIR = os.environ.get("HOME") + +SBT_MAVEN_PROFILE_ARGS_ENV = "SBT_MAVEN_PROFILES_ARGS" +AMPLAB_JENKINS_BUILD_TOOL = os.environ.get("AMPLAB_JENKINS_BUILD_TOOL") +AMPLAB_JENKINS = os.environ.get("AMPLAB_JENKINS") + +SBT_OUTPUT_FILTER = re.compile("^.*[info].*Resolving" + "|" + + "^.*[warn].*Merging" + "|" + + "^.*[info].*Including") + + +def get_error_codes(err_code_file): +"""Function to retrieve all block numbers from the `run-tests-codes.sh` +file to maintain backwards compatibility with the `run-tests-jenkins` +script""" + +with open(err_code_file, 'r') as f: +err_codes = [e.split()[1].strip().split('=') + for e in f if e.startswith("readonly")] +return dict(err_codes) + + +def exit_from_command_with_retcode(cmd, retcode): +print "[error] running", cmd, "; received return code", retcode 
+    sys.exit(int(os.environ.get("CURRENT_BLOCK", 255)))
+
+
+def rm_r(path):
+    """Given an arbitrary path properly remove it with the correct python
+    construct if it exists
+    - from: http://stackoverflow.com/a/9559881"""
+
+    if os.path.isdir(path):
+        shutil.rmtree(path)
+    elif os.path.exists(path):
+        os.remove(path)
+
+
+def lineno():
+    """Returns the current line number in our program
+    - from: http://stackoverflow.com/a/3056059"""
+
+    return inspect.currentframe().f_back.f_lineno
+
+
+def run_cmd(cmd):
+    """Given a command as a list of arguments will attempt to execute the
+    command and, on failure, print an error message"""
+
+    if not isinstance(cmd, list):
+        cmd = cmd.split()
+    try:
+        subprocess.check_call(cmd)
+    except subprocess.CalledProcessError as e:
+        exit_from_command_with_retcode(e.cmd, e.returncode)
+
+
+def set_sbt_maven_profile_args():
+    """Properly sets the SBT environment variable arguments with additional
+    checks to determine if this is running on an Amplab Jenkins machine"""
+
+    # base environment values for SBT_MAVEN_PROFILE_ARGS_ENV which will be appended on
+    sbt_maven_profile_args_base = ["-Pkinesis-asl"]
+
+    sbt_maven_profile_arg_dict = {
+        "hadoop1.0" : ["-Phadoop-1", "-Dhadoop.version=1.0.4"],
+        "hadoop2.0" : ["-Phadoop-1", "-Dhadoop.version=2.0.0-mr1-cdh4.1.1"],
+        "hadoop2.2" : ["-Pyarn", "-Phadoop-2.2"],
+        "hadoop2.3" : ["-Pyarn", "-Phadoop-2.3", "-Dhadoop.version=2.3.0"],
+    }
+
+    # set the SBT maven build profile argument environment variable and ensure
+    # we build against the right version of Hadoop
+    if os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE"):
+        os.environ[SBT_MAVEN_PROFILE_ARGS_ENV] = \
+            " ".join(sbt_maven_profile_arg_dict.get(ajbp, [])
--- End diff --

This looks like a typo; `ajbp` is an undefined variable.
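One way the undefined `ajbp` could be resolved is sketched below, under the assumption that it was meant to hold the value of `AMPLAB_JENKINS_BUILD_PROFILE`. The optional `environ` parameter is a hypothetical addition that makes the function easy to exercise without touching the real environment; it is not part of the PR.

```python
import os

SBT_MAVEN_PROFILE_ARGS_ENV = "SBT_MAVEN_PROFILES_ARGS"


def set_sbt_maven_profile_args(environ=None):
    """Set the SBT profile arguments in `environ` (os.environ by default)."""
    # `environ` is a hypothetical parameter added for this sketch.
    if environ is None:
        environ = os.environ

    sbt_maven_profile_args_base = ["-Pkinesis-asl"]
    sbt_maven_profile_arg_dict = {
        "hadoop1.0": ["-Phadoop-1", "-Dhadoop.version=1.0.4"],
        "hadoop2.0": ["-Phadoop-1", "-Dhadoop.version=2.0.0-mr1-cdh4.1.1"],
        "hadoop2.2": ["-Pyarn", "-Phadoop-2.2"],
        "hadoop2.3": ["-Pyarn", "-Phadoop-2.3", "-Dhadoop.version=2.3.0"],
    }

    # Assumption: `ajbp` stands for the Jenkins build profile. Fall back to
    # "hadoop2.3" when the variable is unset or empty, as the else branch did.
    ajbp = environ.get("AMPLAB_JENKINS_BUILD_PROFILE") or "hadoop2.3"
    environ[SBT_MAVEN_PROFILE_ARGS_ENV] = " ".join(
        sbt_maven_profile_arg_dict.get(ajbp, []) + sbt_maven_profile_args_base)
```

Reading the variable once also collapses the two nearly identical branches into a single assignment, while an unrecognized profile name still yields only the base arguments, as in the quoted code.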
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/5694#discussion_r31695334 --- Diff: dev/run-tests.py ---
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/5694#discussion_r31695312 --- Diff: dev/run-tests.py ---
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/5694#discussion_r31695270 --- Diff: dev/run-tests.py ---
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/5694#discussion_r31695181 --- Diff: dev/run-tests.py --- @@ -0,0 +1,417 @@ +#!/usr/bin/env python2 + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +import os +import re +import sys +import shutil +import subprocess +from collections import namedtuple + +SPARK_PROJ_ROOT = os.path.join(os.path.dirname(os.path.realpath(__file__)), "..") +USER_HOME_DIR = os.environ.get("HOME") + +SBT_MAVEN_PROFILE_ARGS_ENV = "SBT_MAVEN_PROFILES_ARGS" +AMPLAB_JENKINS_BUILD_TOOL = os.environ.get("AMPLAB_JENKINS_BUILD_TOOL") +AMPLAB_JENKINS = os.environ.get("AMPLAB_JENKINS") + +SBT_OUTPUT_FILTER = re.compile("^.*[info].*Resolving" + "|" + + "^.*[warn].*Merging" + "|" + + "^.*[info].*Including") + + +def get_error_codes(err_code_file): +"""Function to retrieve all block numbers from the `run-tests-codes.sh` +file to maintain backwards compatibility with the `run-tests-jenkins` +script""" + +with open(err_code_file, 'r') as f: +err_codes = [e.split()[1].strip().split('=') + for e in f if e.startswith("readonly")] +return dict(err_codes) + + +def exit_from_command_with_retcode(cmd, retcode): +print "[error] running", cmd, "; received return code", retcode 
+sys.exit(int(os.environ.get("CURRENT_BLOCK", 255))) + + +def rm_r(path): +"""Given an arbitrary path properly remove it with the correct python +construct if it exists +- from: http://stackoverflow.com/a/9559881"""; + +if os.path.isdir(path): +shutil.rmtree(path) +elif os.path.exists(path): +os.remove(path) + + +def lineno(): +"""Returns the current line number in our program +- from: http://stackoverflow.com/a/3056059"""; + +return inspect.currentframe().f_back.f_lineno + + +def run_cmd(cmd): +"""Given a command as a list of arguments will attempt to execute the +command and, on failure, print an error message""" + +if not isinstance(cmd, list): +cmd = cmd.split() +try: +subprocess.check_call(cmd) +except subprocess.CalledProcessError as e: +exit_from_command_with_retcode(e.cmd, e.returncode) + + +def set_sbt_maven_profile_args(): +"""Properly sets the SBT environment variable arguments with additional +checks to determine if this is running on an Amplab Jenkins machine""" + +# base environment values for SBT_MAVEN_PROFILE_ARGS_ENV which will be appended on +sbt_maven_profile_args_base = ["-Pkinesis-asl"] + +sbt_maven_profile_arg_dict = { +"hadoop1.0" : ["-Phadoop-1", "-Dhadoop.version=1.0.4"], +"hadoop2.0" : ["-Phadoop-1", "-Dhadoop.version=2.0.0-mr1-cdh4.1.1"], +"hadoop2.2" : ["-Pyarn", "-Phadoop-2.2"], +"hadoop2.3" : ["-Pyarn", "-Phadoop-2.3", "-Dhadoop.version=2.3.0"], +} + +# set the SBT maven build profile argument environment variable and ensure +# we build against the right version of Hadoop +if os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE"): +os.environ[SBT_MAVEN_PROFILE_ARGS_ENV] = \ +" ".join(sbt_maven_profile_arg_dict.get(ajbp, []) + + sbt_maven_profile_args_base) +else: +os.environ[SBT_MAVEN_PROFILE_ARGS_ENV] = \ +" ".join(sbt_maven_profile_arg_dict.get("hadoop2.3", []) + + sbt_maven_profile_args_base) + + +def is_exe(path): +"""Check if a given path is an executable file +- from: http://stackoverflow.com/a/377028"""; + +return os.path.isfile(path) 
and os.access(path, os.X_OK) + + +def which(program): +"""Find and return the given program by its absolute path o
[GitHub] spark pull request: [SPARK-7990][SQL] Add methods to facilitate eq...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/6616#discussion_r31695152 --- Diff: python/pyspark/sql/dataframe.py --- @@ -509,30 +509,42 @@ def join(self, other, joinExprs=None, joinType=None): The following performs a full outer join between ``df1`` and ``df2``. :param other: Right side of the join -:param joinExprs: a string for join column name, or a join expression (Column). -If joinExprs is a string indicating the name of the join column, -the column must exist on both sides, and this performs an inner equi-join. +:param joinExprs: a string for join column name, a list of column names, --- End diff -- let's change the argument name in python as explained in the jira ticket --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/5694#discussion_r31695142 --- Diff: dev/run-tests.py --- @@ -0,0 +1,417 @@ +#!/usr/bin/env python2 + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +import os +import re +import sys +import shutil +import subprocess +from collections import namedtuple + +SPARK_PROJ_ROOT = os.path.join(os.path.dirname(os.path.realpath(__file__)), "..") +USER_HOME_DIR = os.environ.get("HOME") + +SBT_MAVEN_PROFILE_ARGS_ENV = "SBT_MAVEN_PROFILES_ARGS" +AMPLAB_JENKINS_BUILD_TOOL = os.environ.get("AMPLAB_JENKINS_BUILD_TOOL") +AMPLAB_JENKINS = os.environ.get("AMPLAB_JENKINS") + +SBT_OUTPUT_FILTER = re.compile("^.*[info].*Resolving" + "|" + --- End diff -- Well, it probably does get called a bunch of times, so compiling it may help, but doing the compilation, say, twice instead of once isn't a huge cost.
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/5694#discussion_r31695111 --- Diff: dev/run-tests.py --- @@ -0,0 +1,417 @@ +#!/usr/bin/env python2 + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +import os +import re +import sys +import shutil +import subprocess +from collections import namedtuple + +SPARK_PROJ_ROOT = os.path.join(os.path.dirname(os.path.realpath(__file__)), "..") +USER_HOME_DIR = os.environ.get("HOME") + +SBT_MAVEN_PROFILE_ARGS_ENV = "SBT_MAVEN_PROFILES_ARGS" +AMPLAB_JENKINS_BUILD_TOOL = os.environ.get("AMPLAB_JENKINS_BUILD_TOOL") +AMPLAB_JENKINS = os.environ.get("AMPLAB_JENKINS") + +SBT_OUTPUT_FILTER = re.compile("^.*[info].*Resolving" + "|" + --- End diff -- This is only called in one place and it's not a performance-critical regex, so I think it would be cleaner to just move this into the function where it's used.
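For readers following the two review comments above about SBT_OUTPUT_FILTER, here is a minimal sketch of what moving the pattern into the function that uses it might look like. The helper name filter_sbt_output and its generator shape are assumptions made for illustration and do not come from the patch. The sketch also escapes the square brackets, which is a deliberate departure from the quoted pattern: unescaped, "[info]" is a character class that matches any single one of the letters i, n, f, o rather than the literal tag.

```python
import re


def filter_sbt_output(lines):
    """Yield only the lines that are not routine sbt noise.

    Hypothetical helper: the name and signature are assumptions made for
    this sketch, not part of the patch under review.
    """
    # Compiled locally, as the review suggests, because only this function
    # needs the pattern. The brackets are escaped so that "[info]" and
    # "[warn]" are matched literally instead of as character classes.
    noise = re.compile(r"^.*\[info\].*Resolving"
                       r"|^.*\[warn\].*Merging"
                       r"|^.*\[info\].*Including")
    for line in lines:
        if not noise.match(line):
            yield line
```

Python's re module keeps an internal cache of recently compiled patterns, so calling the helper more than once should not repeat much of the compilation work, which fits the remark that compiling twice instead of once is not a large cost.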
[GitHub] spark pull request: [SPARK-8010][SQL]Promote numeric types to stri...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/6551#discussion_r31695131 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala --- @@ -69,6 +69,11 @@ class HiveQuerySuite extends HiveComparisonTest with BeforeAndAfter { sql("DROP TEMPORARY FUNCTION udtf_count2") } + test("SPARK-8010: implicit promote to string type") { +sql("select case when true then '1' else 1 end from src ") +sql("select coalesce(null, 1, '1') from src ") --- End diff -- And probably it's better to move the test into https://github.com/OopsOutOfMemory/spark/blob/pnts/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/5694#discussion_r31695085 --- Diff: dev/run-tests.py --- @@ -0,0 +1,417 @@ +#!/usr/bin/env python2 + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +import os +import re +import sys +import shutil +import subprocess +from collections import namedtuple + +SPARK_PROJ_ROOT = os.path.join(os.path.dirname(os.path.realpath(__file__)), "..") +USER_HOME_DIR = os.environ.get("HOME") + +SBT_MAVEN_PROFILE_ARGS_ENV = "SBT_MAVEN_PROFILES_ARGS" +AMPLAB_JENKINS_BUILD_TOOL = os.environ.get("AMPLAB_JENKINS_BUILD_TOOL") +AMPLAB_JENKINS = os.environ.get("AMPLAB_JENKINS") + +SBT_OUTPUT_FILTER = re.compile("^.*[info].*Resolving" + "|" + + "^.*[warn].*Merging" + "|" + + "^.*[info].*Including") + + +def get_error_codes(err_code_file): +"""Function to retrieve all block numbers from the `run-tests-codes.sh` +file to maintain backwards compatibility with the `run-tests-jenkins` +script""" + +with open(err_code_file, 'r') as f: +err_codes = [e.split()[1].strip().split('=') + for e in f if e.startswith("readonly")] +return dict(err_codes) + + +def exit_from_command_with_retcode(cmd, retcode): +print "[error] running", cmd, "; received return code", retcode 
+sys.exit(int(os.environ.get("CURRENT_BLOCK", 255))) + + +def rm_r(path): +"""Given an arbitrary path properly remove it with the correct python +construct if it exists +- from: http://stackoverflow.com/a/9559881"""; + +if os.path.isdir(path): +shutil.rmtree(path) +elif os.path.exists(path): +os.remove(path) + + +def lineno(): --- End diff -- This doesn't seem to be used anywhere?
[GitHub] spark pull request: [SPARK-8010][SQL]Promote numeric types to stri...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/6551#discussion_r31695034 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala --- @@ -69,6 +69,11 @@ class HiveQuerySuite extends HiveComparisonTest with BeforeAndAfter { sql("DROP TEMPORARY FUNCTION udtf_count2") } + test("SPARK-8010: implicit promote to string type") { +sql("select case when true then '1' else 1 end from src ") +sql("select coalesce(null, 1, '1') from src ") --- End diff -- Can you compare the result with Hive?
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/5694#discussion_r31695009 --- Diff: dev/run-tests.py --- @@ -0,0 +1,417 @@ +#!/usr/bin/env python2 + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +import os +import re +import sys +import shutil +import subprocess +from collections import namedtuple + +SPARK_PROJ_ROOT = os.path.join(os.path.dirname(os.path.realpath(__file__)), "..") +USER_HOME_DIR = os.environ.get("HOME") + +SBT_MAVEN_PROFILE_ARGS_ENV = "SBT_MAVEN_PROFILES_ARGS" +AMPLAB_JENKINS_BUILD_TOOL = os.environ.get("AMPLAB_JENKINS_BUILD_TOOL") +AMPLAB_JENKINS = os.environ.get("AMPLAB_JENKINS") + +SBT_OUTPUT_FILTER = re.compile("^.*[info].*Resolving" + "|" + + "^.*[warn].*Merging" + "|" + + "^.*[info].*Including") + + +def get_error_codes(err_code_file): +"""Function to retrieve all block numbers from the `run-tests-codes.sh` +file to maintain backwards compatibility with the `run-tests-jenkins` +script""" + +with open(err_code_file, 'r') as f: +err_codes = [e.split()[1].strip().split('=') + for e in f if e.startswith("readonly")] +return dict(err_codes) + + +def exit_from_command_with_retcode(cmd, retcode): +print "[error] running", cmd, "; received return code", retcode 
+sys.exit(int(os.environ.get("CURRENT_BLOCK", 255))) + + +def rm_r(path): +"""Given an arbitrary path properly remove it with the correct python +construct if it exists +- from: http://stackoverflow.com/a/9559881"""; + +if os.path.isdir(path): +shutil.rmtree(path) +elif os.path.exists(path): +os.remove(path) + + +def lineno(): +"""Returns the current line number in our program +- from: http://stackoverflow.com/a/3056059"""; + +return inspect.currentframe().f_back.f_lineno + + +def run_cmd(cmd): +"""Given a command as a list of arguments will attempt to execute the +command and, on failure, print an error message""" + +if not isinstance(cmd, list): +cmd = cmd.split() +try: +subprocess.check_call(cmd) +except subprocess.CalledProcessError as e: +exit_from_command_with_retcode(e.cmd, e.returncode) + + +def set_sbt_maven_profile_args(): +"""Properly sets the SBT environment variable arguments with additional +checks to determine if this is running on an Amplab Jenkins machine""" + +# base environment values for SBT_MAVEN_PROFILE_ARGS_ENV which will be appended on +sbt_maven_profile_args_base = ["-Pkinesis-asl"] + +sbt_maven_profile_arg_dict = { +"hadoop1.0" : ["-Phadoop-1", "-Dhadoop.version=1.0.4"], +"hadoop2.0" : ["-Phadoop-1", "-Dhadoop.version=2.0.0-mr1-cdh4.1.1"], +"hadoop2.2" : ["-Pyarn", "-Phadoop-2.2"], +"hadoop2.3" : ["-Pyarn", "-Phadoop-2.3", "-Dhadoop.version=2.3.0"], +} + +# set the SBT maven build profile argument environment variable and ensure +# we build against the right version of Hadoop +if os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE"): +os.environ[SBT_MAVEN_PROFILE_ARGS_ENV] = \ +" ".join(sbt_maven_profile_arg_dict.get(ajbp, []) + + sbt_maven_profile_args_base) +else: +os.environ[SBT_MAVEN_PROFILE_ARGS_ENV] = \ +" ".join(sbt_maven_profile_arg_dict.get("hadoop2.3", []) + + sbt_maven_profile_args_base) + + +def is_exe(path): +"""Check if a given path is an executable file +- from: http://stackoverflow.com/a/377028"""; + +return os.path.isfile(path) 
and os.access(path, os.X_OK) + + +def which(program): +"""Find and return the given program by its absolute path o
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/5694#discussion_r31694976 --- Diff: dev/run-tests.py --- @@ -0,0 +1,417 @@ +#!/usr/bin/env python2 + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +import os +import re +import sys +import shutil +import subprocess +from collections import namedtuple + +SPARK_PROJ_ROOT = os.path.join(os.path.dirname(os.path.realpath(__file__)), "..") +USER_HOME_DIR = os.environ.get("HOME") + +SBT_MAVEN_PROFILE_ARGS_ENV = "SBT_MAVEN_PROFILES_ARGS" +AMPLAB_JENKINS_BUILD_TOOL = os.environ.get("AMPLAB_JENKINS_BUILD_TOOL") +AMPLAB_JENKINS = os.environ.get("AMPLAB_JENKINS") + +SBT_OUTPUT_FILTER = re.compile("^.*[info].*Resolving" + "|" + + "^.*[warn].*Merging" + "|" + + "^.*[info].*Including") + + +def get_error_codes(err_code_file): +"""Function to retrieve all block numbers from the `run-tests-codes.sh` +file to maintain backwards compatibility with the `run-tests-jenkins` +script""" + +with open(err_code_file, 'r') as f: +err_codes = [e.split()[1].strip().split('=') + for e in f if e.startswith("readonly")] +return dict(err_codes) + + +def exit_from_command_with_retcode(cmd, retcode): +print "[error] running", cmd, "; received return code", retcode 
+sys.exit(int(os.environ.get("CURRENT_BLOCK", 255))) + + +def rm_r(path): +"""Given an arbitrary path properly remove it with the correct python +construct if it exists +- from: http://stackoverflow.com/a/9559881"""; + +if os.path.isdir(path): +shutil.rmtree(path) +elif os.path.exists(path): +os.remove(path) + + +def lineno(): +"""Returns the current line number in our program +- from: http://stackoverflow.com/a/3056059"""; + +return inspect.currentframe().f_back.f_lineno + + +def run_cmd(cmd): +"""Given a command as a list of arguments will attempt to execute the +command and, on failure, print an error message""" + +if not isinstance(cmd, list): +cmd = cmd.split() +try: +subprocess.check_call(cmd) +except subprocess.CalledProcessError as e: +exit_from_command_with_retcode(e.cmd, e.returncode) + + +def set_sbt_maven_profile_args(): +"""Properly sets the SBT environment variable arguments with additional +checks to determine if this is running on an Amplab Jenkins machine""" + +# base environment values for SBT_MAVEN_PROFILE_ARGS_ENV which will be appended on +sbt_maven_profile_args_base = ["-Pkinesis-asl"] + +sbt_maven_profile_arg_dict = { +"hadoop1.0" : ["-Phadoop-1", "-Dhadoop.version=1.0.4"], +"hadoop2.0" : ["-Phadoop-1", "-Dhadoop.version=2.0.0-mr1-cdh4.1.1"], +"hadoop2.2" : ["-Pyarn", "-Phadoop-2.2"], +"hadoop2.3" : ["-Pyarn", "-Phadoop-2.3", "-Dhadoop.version=2.3.0"], +} + +# set the SBT maven build profile argument environment variable and ensure +# we build against the right version of Hadoop +if os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE"): +os.environ[SBT_MAVEN_PROFILE_ARGS_ENV] = \ +" ".join(sbt_maven_profile_arg_dict.get(ajbp, []) + + sbt_maven_profile_args_base) +else: +os.environ[SBT_MAVEN_PROFILE_ARGS_ENV] = \ +" ".join(sbt_maven_profile_arg_dict.get("hadoop2.3", []) + + sbt_maven_profile_args_base) + + +def is_exe(path): +"""Check if a given path is an executable file +- from: http://stackoverflow.com/a/377028"""; + +return os.path.isfile(path) 
and os.access(path, os.X_OK) + + +def which(program): +"""Find and return the given program by its absolute path o
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/5694#discussion_r31694893 --- Diff: dev/run-tests.py --- @@ -0,0 +1,417 @@ +#!/usr/bin/env python2 + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +import os +import re +import sys +import shutil +import subprocess +from collections import namedtuple + +SPARK_PROJ_ROOT = os.path.join(os.path.dirname(os.path.realpath(__file__)), "..") +USER_HOME_DIR = os.environ.get("HOME") + +SBT_MAVEN_PROFILE_ARGS_ENV = "SBT_MAVEN_PROFILES_ARGS" +AMPLAB_JENKINS_BUILD_TOOL = os.environ.get("AMPLAB_JENKINS_BUILD_TOOL") +AMPLAB_JENKINS = os.environ.get("AMPLAB_JENKINS") + +SBT_OUTPUT_FILTER = re.compile("^.*[info].*Resolving" + "|" + + "^.*[warn].*Merging" + "|" + + "^.*[info].*Including") + + +def get_error_codes(err_code_file): +"""Function to retrieve all block numbers from the `run-tests-codes.sh` +file to maintain backwards compatibility with the `run-tests-jenkins` +script""" + +with open(err_code_file, 'r') as f: +err_codes = [e.split()[1].strip().split('=') + for e in f if e.startswith("readonly")] +return dict(err_codes) + + +def exit_from_command_with_retcode(cmd, retcode): +print "[error] running", cmd, "; received return code", retcode 
+sys.exit(int(os.environ.get("CURRENT_BLOCK", 255))) + + +def rm_r(path): +"""Given an arbitrary path properly remove it with the correct python +construct if it exists +- from: http://stackoverflow.com/a/9559881"""; + +if os.path.isdir(path): +shutil.rmtree(path) +elif os.path.exists(path): +os.remove(path) + + +def lineno(): +"""Returns the current line number in our program +- from: http://stackoverflow.com/a/3056059"""; + +return inspect.currentframe().f_back.f_lineno + + +def run_cmd(cmd): +"""Given a command as a list of arguments will attempt to execute the +command and, on failure, print an error message""" + +if not isinstance(cmd, list): +cmd = cmd.split() +try: +subprocess.check_call(cmd) +except subprocess.CalledProcessError as e: +exit_from_command_with_retcode(e.cmd, e.returncode) + + +def set_sbt_maven_profile_args(): +"""Properly sets the SBT environment variable arguments with additional +checks to determine if this is running on an Amplab Jenkins machine""" + +# base environment values for SBT_MAVEN_PROFILE_ARGS_ENV which will be appended on +sbt_maven_profile_args_base = ["-Pkinesis-asl"] + +sbt_maven_profile_arg_dict = { +"hadoop1.0" : ["-Phadoop-1", "-Dhadoop.version=1.0.4"], +"hadoop2.0" : ["-Phadoop-1", "-Dhadoop.version=2.0.0-mr1-cdh4.1.1"], +"hadoop2.2" : ["-Pyarn", "-Phadoop-2.2"], +"hadoop2.3" : ["-Pyarn", "-Phadoop-2.3", "-Dhadoop.version=2.3.0"], +} + +# set the SBT maven build profile argument environment variable and ensure +# we build against the right version of Hadoop +if os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE"): +os.environ[SBT_MAVEN_PROFILE_ARGS_ENV] = \ +" ".join(sbt_maven_profile_arg_dict.get(ajbp, []) + + sbt_maven_profile_args_base) +else: +os.environ[SBT_MAVEN_PROFILE_ARGS_ENV] = \ +" ".join(sbt_maven_profile_arg_dict.get("hadoop2.3", []) + + sbt_maven_profile_args_base) + + +def is_exe(path): +"""Check if a given path is an executable file +- from: http://stackoverflow.com/a/377028"""; + +return os.path.isfile(path) 
and os.access(path, os.X_OK) + + +def which(program): +"""Find and return the given program by its absolute path o
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/5694#discussion_r31694857 --- Diff: dev/run-tests.py --- @@ -0,0 +1,417 @@ +#!/usr/bin/env python2 + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +import os +import re +import sys +import shutil +import subprocess +from collections import namedtuple + +SPARK_PROJ_ROOT = os.path.join(os.path.dirname(os.path.realpath(__file__)), "..") +USER_HOME_DIR = os.environ.get("HOME") + +SBT_MAVEN_PROFILE_ARGS_ENV = "SBT_MAVEN_PROFILES_ARGS" +AMPLAB_JENKINS_BUILD_TOOL = os.environ.get("AMPLAB_JENKINS_BUILD_TOOL") +AMPLAB_JENKINS = os.environ.get("AMPLAB_JENKINS") + +SBT_OUTPUT_FILTER = re.compile("^.*[info].*Resolving" + "|" + + "^.*[warn].*Merging" + "|" + + "^.*[info].*Including") + + +def get_error_codes(err_code_file): +"""Function to retrieve all block numbers from the `run-tests-codes.sh` +file to maintain backwards compatibility with the `run-tests-jenkins` +script""" + +with open(err_code_file, 'r') as f: +err_codes = [e.split()[1].strip().split('=') + for e in f if e.startswith("readonly")] +return dict(err_codes) + + +def exit_from_command_with_retcode(cmd, retcode): +print "[error] running", cmd, "; received return code", retcode 
+sys.exit(int(os.environ.get("CURRENT_BLOCK", 255))) + + +def rm_r(path): +"""Given an arbitrary path properly remove it with the correct python +construct if it exists +- from: http://stackoverflow.com/a/9559881"""; + +if os.path.isdir(path): +shutil.rmtree(path) +elif os.path.exists(path): +os.remove(path) + + +def lineno(): +"""Returns the current line number in our program +- from: http://stackoverflow.com/a/3056059"""; + +return inspect.currentframe().f_back.f_lineno + + +def run_cmd(cmd): +"""Given a command as a list of arguments will attempt to execute the +command and, on failure, print an error message""" + +if not isinstance(cmd, list): +cmd = cmd.split() +try: +subprocess.check_call(cmd) +except subprocess.CalledProcessError as e: +exit_from_command_with_retcode(e.cmd, e.returncode) + + +def set_sbt_maven_profile_args(): +"""Properly sets the SBT environment variable arguments with additional +checks to determine if this is running on an Amplab Jenkins machine""" + +# base environment values for SBT_MAVEN_PROFILE_ARGS_ENV which will be appended on +sbt_maven_profile_args_base = ["-Pkinesis-asl"] + +sbt_maven_profile_arg_dict = { +"hadoop1.0" : ["-Phadoop-1", "-Dhadoop.version=1.0.4"], +"hadoop2.0" : ["-Phadoop-1", "-Dhadoop.version=2.0.0-mr1-cdh4.1.1"], +"hadoop2.2" : ["-Pyarn", "-Phadoop-2.2"], +"hadoop2.3" : ["-Pyarn", "-Phadoop-2.3", "-Dhadoop.version=2.3.0"], +} + +# set the SBT maven build profile argument environment variable and ensure +# we build against the right version of Hadoop +if os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE"): +os.environ[SBT_MAVEN_PROFILE_ARGS_ENV] = \ +" ".join(sbt_maven_profile_arg_dict.get(ajbp, []) + + sbt_maven_profile_args_base) +else: +os.environ[SBT_MAVEN_PROFILE_ARGS_ENV] = \ +" ".join(sbt_maven_profile_arg_dict.get("hadoop2.3", []) + + sbt_maven_profile_args_base) + + +def is_exe(path): +"""Check if a given path is an executable file +- from: http://stackoverflow.com/a/377028"""; + +return os.path.isfile(path) 
and os.access(path, os.X_OK) + + +def which(program): +"""Find and return the given program by its absolute path o
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/5694#discussion_r31694783 --- Diff: dev/run-tests.py --- @@ -0,0 +1,417 @@ +#!/usr/bin/env python2 + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +import os +import re +import sys +import shutil +import subprocess +from collections import namedtuple + +SPARK_PROJ_ROOT = os.path.join(os.path.dirname(os.path.realpath(__file__)), "..") +USER_HOME_DIR = os.environ.get("HOME") + +SBT_MAVEN_PROFILE_ARGS_ENV = "SBT_MAVEN_PROFILES_ARGS" +AMPLAB_JENKINS_BUILD_TOOL = os.environ.get("AMPLAB_JENKINS_BUILD_TOOL") +AMPLAB_JENKINS = os.environ.get("AMPLAB_JENKINS") + +SBT_OUTPUT_FILTER = re.compile("^.*[info].*Resolving" + "|" + + "^.*[warn].*Merging" + "|" + + "^.*[info].*Including") + + +def get_error_codes(err_code_file): +"""Function to retrieve all block numbers from the `run-tests-codes.sh` +file to maintain backwards compatibility with the `run-tests-jenkins` +script""" + +with open(err_code_file, 'r') as f: +err_codes = [e.split()[1].strip().split('=') + for e in f if e.startswith("readonly")] +return dict(err_codes) + + +def exit_from_command_with_retcode(cmd, retcode): +print "[error] running", cmd, "; received return code", retcode 
+sys.exit(int(os.environ.get("CURRENT_BLOCK", 255))) + + +def rm_r(path): +"""Given an arbitrary path properly remove it with the correct python +construct if it exists +- from: http://stackoverflow.com/a/9559881"""; + +if os.path.isdir(path): +shutil.rmtree(path) +elif os.path.exists(path): +os.remove(path) + + +def lineno(): +"""Returns the current line number in our program +- from: http://stackoverflow.com/a/3056059"""; + +return inspect.currentframe().f_back.f_lineno + + +def run_cmd(cmd): +"""Given a command as a list of arguments will attempt to execute the +command and, on failure, print an error message""" + +if not isinstance(cmd, list): +cmd = cmd.split() +try: +subprocess.check_call(cmd) +except subprocess.CalledProcessError as e: +exit_from_command_with_retcode(e.cmd, e.returncode) + + +def set_sbt_maven_profile_args(): +"""Properly sets the SBT environment variable arguments with additional +checks to determine if this is running on an Amplab Jenkins machine""" + +# base environment values for SBT_MAVEN_PROFILE_ARGS_ENV which will be appended on +sbt_maven_profile_args_base = ["-Pkinesis-asl"] + +sbt_maven_profile_arg_dict = { +"hadoop1.0" : ["-Phadoop-1", "-Dhadoop.version=1.0.4"], +"hadoop2.0" : ["-Phadoop-1", "-Dhadoop.version=2.0.0-mr1-cdh4.1.1"], +"hadoop2.2" : ["-Pyarn", "-Phadoop-2.2"], +"hadoop2.3" : ["-Pyarn", "-Phadoop-2.3", "-Dhadoop.version=2.3.0"], +} + +# set the SBT maven build profile argument environment variable and ensure +# we build against the right version of Hadoop +if os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE"): +os.environ[SBT_MAVEN_PROFILE_ARGS_ENV] = \ +" ".join(sbt_maven_profile_arg_dict.get(ajbp, []) + + sbt_maven_profile_args_base) +else: +os.environ[SBT_MAVEN_PROFILE_ARGS_ENV] = \ +" ".join(sbt_maven_profile_arg_dict.get("hadoop2.3", []) + + sbt_maven_profile_args_base) + + +def is_exe(path): +"""Check if a given path is an executable file +- from: http://stackoverflow.com/a/377028"""; + +return os.path.isfile(path) 
and os.access(path, os.X_OK) + + +def which(program): +"""Find and return the given program by its absolute path or
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/5694#discussion_r31694780 --- Diff: dev/run-tests.py --- @@ -0,0 +1,417 @@ +#!/usr/bin/env python2 + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +import os +import re +import sys +import shutil +import subprocess +from collections import namedtuple + +SPARK_PROJ_ROOT = os.path.join(os.path.dirname(os.path.realpath(__file__)), "..") +USER_HOME_DIR = os.environ.get("HOME") + +SBT_MAVEN_PROFILE_ARGS_ENV = "SBT_MAVEN_PROFILES_ARGS" +AMPLAB_JENKINS_BUILD_TOOL = os.environ.get("AMPLAB_JENKINS_BUILD_TOOL") +AMPLAB_JENKINS = os.environ.get("AMPLAB_JENKINS") + +SBT_OUTPUT_FILTER = re.compile("^.*[info].*Resolving" + "|" + + "^.*[warn].*Merging" + "|" + + "^.*[info].*Including") + + +def get_error_codes(err_code_file): +"""Function to retrieve all block numbers from the `run-tests-codes.sh` +file to maintain backwards compatibility with the `run-tests-jenkins` +script""" + +with open(err_code_file, 'r') as f: +err_codes = [e.split()[1].strip().split('=') + for e in f if e.startswith("readonly")] +return dict(err_codes) + + +def exit_from_command_with_retcode(cmd, retcode): +print "[error] running", cmd, "; received return code", retcode 
+sys.exit(int(os.environ.get("CURRENT_BLOCK", 255))) + + +def rm_r(path): +"""Given an arbitrary path properly remove it with the correct python +construct if it exists +- from: http://stackoverflow.com/a/9559881"""; + +if os.path.isdir(path): +shutil.rmtree(path) +elif os.path.exists(path): +os.remove(path) + + +def lineno(): +"""Returns the current line number in our program +- from: http://stackoverflow.com/a/3056059"""; + +return inspect.currentframe().f_back.f_lineno + + +def run_cmd(cmd): +"""Given a command as a list of arguments will attempt to execute the +command and, on failure, print an error message""" + +if not isinstance(cmd, list): +cmd = cmd.split() +try: +subprocess.check_call(cmd) +except subprocess.CalledProcessError as e: +exit_from_command_with_retcode(e.cmd, e.returncode) + + +def set_sbt_maven_profile_args(): +"""Properly sets the SBT environment variable arguments with additional +checks to determine if this is running on an Amplab Jenkins machine""" + +# base environment values for SBT_MAVEN_PROFILE_ARGS_ENV which will be appended on +sbt_maven_profile_args_base = ["-Pkinesis-asl"] + +sbt_maven_profile_arg_dict = { +"hadoop1.0" : ["-Phadoop-1", "-Dhadoop.version=1.0.4"], +"hadoop2.0" : ["-Phadoop-1", "-Dhadoop.version=2.0.0-mr1-cdh4.1.1"], +"hadoop2.2" : ["-Pyarn", "-Phadoop-2.2"], +"hadoop2.3" : ["-Pyarn", "-Phadoop-2.3", "-Dhadoop.version=2.3.0"], +} + +# set the SBT maven build profile argument environment variable and ensure +# we build against the right version of Hadoop +if os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE"): +os.environ[SBT_MAVEN_PROFILE_ARGS_ENV] = \ +" ".join(sbt_maven_profile_arg_dict.get(ajbp, []) + + sbt_maven_profile_args_base) +else: +os.environ[SBT_MAVEN_PROFILE_ARGS_ENV] = \ +" ".join(sbt_maven_profile_arg_dict.get("hadoop2.3", []) + + sbt_maven_profile_args_base) + + +def is_exe(path): +"""Check if a given path is an executable file +- from: http://stackoverflow.com/a/377028"""; + +return os.path.isfile(path) 
and os.access(path, os.X_OK) + + +def which(program): +"""Find and return the given program by its absolute path o
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/5694#discussion_r31694637 --- Diff: dev/run-tests.py --- @@ -0,0 +1,417 @@ +#!/usr/bin/env python2 + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +import os +import re +import sys +import shutil +import subprocess +from collections import namedtuple + +SPARK_PROJ_ROOT = os.path.join(os.path.dirname(os.path.realpath(__file__)), "..") +USER_HOME_DIR = os.environ.get("HOME") + +SBT_MAVEN_PROFILE_ARGS_ENV = "SBT_MAVEN_PROFILES_ARGS" +AMPLAB_JENKINS_BUILD_TOOL = os.environ.get("AMPLAB_JENKINS_BUILD_TOOL") +AMPLAB_JENKINS = os.environ.get("AMPLAB_JENKINS") + +SBT_OUTPUT_FILTER = re.compile("^.*[info].*Resolving" + "|" + + "^.*[warn].*Merging" + "|" + + "^.*[info].*Including") + + +def get_error_codes(err_code_file): +"""Function to retrieve all block numbers from the `run-tests-codes.sh` +file to maintain backwards compatibility with the `run-tests-jenkins` +script""" + +with open(err_code_file, 'r') as f: +err_codes = [e.split()[1].strip().split('=') + for e in f if e.startswith("readonly")] +return dict(err_codes) + + +def exit_from_command_with_retcode(cmd, retcode): +print "[error] running", cmd, "; received return code", retcode 
+sys.exit(int(os.environ.get("CURRENT_BLOCK", 255))) + + +def rm_r(path): +"""Given an arbitrary path properly remove it with the correct python +construct if it exists +- from: http://stackoverflow.com/a/9559881"""; + +if os.path.isdir(path): +shutil.rmtree(path) +elif os.path.exists(path): +os.remove(path) + + +def lineno(): +"""Returns the current line number in our program +- from: http://stackoverflow.com/a/3056059"""; + +return inspect.currentframe().f_back.f_lineno + + +def run_cmd(cmd): +"""Given a command as a list of arguments will attempt to execute the +command and, on failure, print an error message""" + +if not isinstance(cmd, list): +cmd = cmd.split() +try: +subprocess.check_call(cmd) +except subprocess.CalledProcessError as e: +exit_from_command_with_retcode(e.cmd, e.returncode) + + +def set_sbt_maven_profile_args(): +"""Properly sets the SBT environment variable arguments with additional +checks to determine if this is running on an Amplab Jenkins machine""" + +# base environment values for SBT_MAVEN_PROFILE_ARGS_ENV which will be appended on +sbt_maven_profile_args_base = ["-Pkinesis-asl"] + +sbt_maven_profile_arg_dict = { +"hadoop1.0" : ["-Phadoop-1", "-Dhadoop.version=1.0.4"], +"hadoop2.0" : ["-Phadoop-1", "-Dhadoop.version=2.0.0-mr1-cdh4.1.1"], +"hadoop2.2" : ["-Pyarn", "-Phadoop-2.2"], +"hadoop2.3" : ["-Pyarn", "-Phadoop-2.3", "-Dhadoop.version=2.3.0"], +} + +# set the SBT maven build profile argument environment variable and ensure +# we build against the right version of Hadoop +if os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE"): +os.environ[SBT_MAVEN_PROFILE_ARGS_ENV] = \ +" ".join(sbt_maven_profile_arg_dict.get(ajbp, []) + + sbt_maven_profile_args_base) +else: +os.environ[SBT_MAVEN_PROFILE_ARGS_ENV] = \ +" ".join(sbt_maven_profile_arg_dict.get("hadoop2.3", []) + + sbt_maven_profile_args_base) + + +def is_exe(path): +"""Check if a given path is an executable file +- from: http://stackoverflow.com/a/377028"""; + +return os.path.isfile(path) 
and os.access(path, os.X_OK) + + +def which(program): +"""Find and return the given program by its absolute path or
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/5694#discussion_r31694598 --- Diff: dev/run-tests.py --- @@ -0,0 +1,417 @@ +#!/usr/bin/env python2 + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +import os +import re +import sys +import shutil +import subprocess +from collections import namedtuple + +SPARK_PROJ_ROOT = os.path.join(os.path.dirname(os.path.realpath(__file__)), "..") --- End diff -- For consistency, it might be nice to call this `SPARK_HOME` (our standard name for the Spark dir). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
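A minimal sketch of the rename suggested in this comment, assuming the script sits in a `dev/` directory directly under the Spark root, as in the quoted diff. The helper name `spark_home_from` is hypothetical and exists only to make the path logic easy to check; it is not part of the PR.

```python
import os


def spark_home_from(script_path):
    """Return the parent of the directory that contains script_path.

    Hypothetical helper: it mirrors the SPARK_PROJ_ROOT expression from the
    diff, but normalizes the result so no trailing ".." is left in the path.
    """
    script_dir = os.path.dirname(os.path.realpath(script_path))
    return os.path.normpath(os.path.join(script_dir, ".."))


# In dev/run-tests.py this would be called with __file__ instead of a
# made-up relative path.
SPARK_HOME = spark_home_from(os.path.join("spark", "dev", "run-tests.py"))
```

Normalizing the path is a design choice of this sketch, not something the comment asks for; the only change requested in the review is the constant's name.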
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/5694#discussion_r31694485 --- Diff: dev/run-tests.py --- @@ -0,0 +1,417 @@ +#!/usr/bin/env python2 + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +import os +import re +import sys +import shutil +import subprocess +from collections import namedtuple + +SPARK_PROJ_ROOT = os.path.join(os.path.dirname(os.path.realpath(__file__)), "..") +USER_HOME_DIR = os.environ.get("HOME") + +SBT_MAVEN_PROFILE_ARGS_ENV = "SBT_MAVEN_PROFILES_ARGS" +AMPLAB_JENKINS_BUILD_TOOL = os.environ.get("AMPLAB_JENKINS_BUILD_TOOL") +AMPLAB_JENKINS = os.environ.get("AMPLAB_JENKINS") + +SBT_OUTPUT_FILTER = re.compile("^.*[info].*Resolving" + "|" + + "^.*[warn].*Merging" + "|" + + "^.*[info].*Including") + + +def get_error_codes(err_code_file): +"""Function to retrieve all block numbers from the `run-tests-codes.sh` +file to maintain backwards compatibility with the `run-tests-jenkins` +script""" + +with open(err_code_file, 'r') as f: +err_codes = [e.split()[1].strip().split('=') + for e in f if e.startswith("readonly")] +return dict(err_codes) + + +def exit_from_command_with_retcode(cmd, retcode): +print "[error] running", cmd, "; received return code", retcode 
+sys.exit(int(os.environ.get("CURRENT_BLOCK", 255))) + + +def rm_r(path): +"""Given an arbitrary path properly remove it with the correct python +construct if it exists +- from: http://stackoverflow.com/a/9559881"""; + +if os.path.isdir(path): +shutil.rmtree(path) +elif os.path.exists(path): +os.remove(path) + + +def lineno(): +"""Returns the current line number in our program +- from: http://stackoverflow.com/a/3056059"""; + +return inspect.currentframe().f_back.f_lineno + + +def run_cmd(cmd): +"""Given a command as a list of arguments will attempt to execute the +command and, on failure, print an error message""" + +if not isinstance(cmd, list): +cmd = cmd.split() +try: +subprocess.check_call(cmd) +except subprocess.CalledProcessError as e: +exit_from_command_with_retcode(e.cmd, e.returncode) + + +def set_sbt_maven_profile_args(): +"""Properly sets the SBT environment variable arguments with additional +checks to determine if this is running on an Amplab Jenkins machine""" + +# base environment values for SBT_MAVEN_PROFILE_ARGS_ENV which will be appended on +sbt_maven_profile_args_base = ["-Pkinesis-asl"] + +sbt_maven_profile_arg_dict = { +"hadoop1.0" : ["-Phadoop-1", "-Dhadoop.version=1.0.4"], +"hadoop2.0" : ["-Phadoop-1", "-Dhadoop.version=2.0.0-mr1-cdh4.1.1"], +"hadoop2.2" : ["-Pyarn", "-Phadoop-2.2"], +"hadoop2.3" : ["-Pyarn", "-Phadoop-2.3", "-Dhadoop.version=2.3.0"], +} + +# set the SBT maven build profile argument environment variable and ensure +# we build against the right version of Hadoop +if os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE"): +os.environ[SBT_MAVEN_PROFILE_ARGS_ENV] = \ +" ".join(sbt_maven_profile_arg_dict.get(ajbp, []) + + sbt_maven_profile_args_base) +else: +os.environ[SBT_MAVEN_PROFILE_ARGS_ENV] = \ +" ".join(sbt_maven_profile_arg_dict.get("hadoop2.3", []) + + sbt_maven_profile_args_base) + + +def is_exe(path): +"""Check if a given path is an executable file +- from: http://stackoverflow.com/a/377028"""; + +return os.path.isfile(path) 
and os.access(path, os.X_OK) + + +def which(program): +"""Find and return the given program by its absolute path or
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/5694#discussion_r31694123 --- Diff: dev/run-tests.py --- @@ -0,0 +1,417 @@ +#!/usr/bin/env python2 + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +import os +import re +import sys +import shutil +import subprocess +from collections import namedtuple + +SPARK_PROJ_ROOT = os.path.join(os.path.dirname(os.path.realpath(__file__)), "..") +USER_HOME_DIR = os.environ.get("HOME") + +SBT_MAVEN_PROFILE_ARGS_ENV = "SBT_MAVEN_PROFILES_ARGS" +AMPLAB_JENKINS_BUILD_TOOL = os.environ.get("AMPLAB_JENKINS_BUILD_TOOL") +AMPLAB_JENKINS = os.environ.get("AMPLAB_JENKINS") + +SBT_OUTPUT_FILTER = re.compile("^.*[info].*Resolving" + "|" + + "^.*[warn].*Merging" + "|" + + "^.*[info].*Including") + + +def get_error_codes(err_code_file): +"""Function to retrieve all block numbers from the `run-tests-codes.sh` +file to maintain backwards compatibility with the `run-tests-jenkins` +script""" + +with open(err_code_file, 'r') as f: +err_codes = [e.split()[1].strip().split('=') + for e in f if e.startswith("readonly")] +return dict(err_codes) + + +def exit_from_command_with_retcode(cmd, retcode): +print "[error] running", cmd, "; received return code", retcode 
+sys.exit(int(os.environ.get("CURRENT_BLOCK", 255))) + + +def rm_r(path): +"""Given an arbitrary path properly remove it with the correct python +construct if it exists +- from: http://stackoverflow.com/a/9559881"""; + +if os.path.isdir(path): +shutil.rmtree(path) +elif os.path.exists(path): +os.remove(path) + + +def lineno(): +"""Returns the current line number in our program +- from: http://stackoverflow.com/a/3056059"""; + +return inspect.currentframe().f_back.f_lineno + + +def run_cmd(cmd): +"""Given a command as a list of arguments will attempt to execute the +command and, on failure, print an error message""" + +if not isinstance(cmd, list): +cmd = cmd.split() +try: +subprocess.check_call(cmd) +except subprocess.CalledProcessError as e: +exit_from_command_with_retcode(e.cmd, e.returncode) + + +def set_sbt_maven_profile_args(): +"""Properly sets the SBT environment variable arguments with additional +checks to determine if this is running on an Amplab Jenkins machine""" + +# base environment values for SBT_MAVEN_PROFILE_ARGS_ENV which will be appended on +sbt_maven_profile_args_base = ["-Pkinesis-asl"] + +sbt_maven_profile_arg_dict = { +"hadoop1.0" : ["-Phadoop-1", "-Dhadoop.version=1.0.4"], +"hadoop2.0" : ["-Phadoop-1", "-Dhadoop.version=2.0.0-mr1-cdh4.1.1"], +"hadoop2.2" : ["-Pyarn", "-Phadoop-2.2"], +"hadoop2.3" : ["-Pyarn", "-Phadoop-2.3", "-Dhadoop.version=2.3.0"], +} + +# set the SBT maven build profile argument environment variable and ensure +# we build against the right version of Hadoop +if os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE"): +os.environ[SBT_MAVEN_PROFILE_ARGS_ENV] = \ +" ".join(sbt_maven_profile_arg_dict.get(ajbp, []) + + sbt_maven_profile_args_base) +else: +os.environ[SBT_MAVEN_PROFILE_ARGS_ENV] = \ +" ".join(sbt_maven_profile_arg_dict.get("hadoop2.3", []) + + sbt_maven_profile_args_base) + + +def is_exe(path): +"""Check if a given path is an executable file +- from: http://stackoverflow.com/a/377028"""; + +return os.path.isfile(path) 
and os.access(path, os.X_OK) + + +def which(program): +"""Find and return the given program by its absolute path or
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/5694#discussion_r31694092 --- Diff: dev/run-tests.py --- @@ -0,0 +1,417 @@ +#!/usr/bin/env python2 + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +import os +import re +import sys +import shutil +import subprocess +from collections import namedtuple + +SPARK_PROJ_ROOT = os.path.join(os.path.dirname(os.path.realpath(__file__)), "..") +USER_HOME_DIR = os.environ.get("HOME") + +SBT_MAVEN_PROFILE_ARGS_ENV = "SBT_MAVEN_PROFILES_ARGS" +AMPLAB_JENKINS_BUILD_TOOL = os.environ.get("AMPLAB_JENKINS_BUILD_TOOL") +AMPLAB_JENKINS = os.environ.get("AMPLAB_JENKINS") + +SBT_OUTPUT_FILTER = re.compile("^.*[info].*Resolving" + "|" + + "^.*[warn].*Merging" + "|" + + "^.*[info].*Including") + + +def get_error_codes(err_code_file): +"""Function to retrieve all block numbers from the `run-tests-codes.sh` +file to maintain backwards compatibility with the `run-tests-jenkins` +script""" + +with open(err_code_file, 'r') as f: +err_codes = [e.split()[1].strip().split('=') + for e in f if e.startswith("readonly")] +return dict(err_codes) + + +def exit_from_command_with_retcode(cmd, retcode): +print "[error] running", cmd, "; received return code", retcode 
+sys.exit(int(os.environ.get("CURRENT_BLOCK", 255))) + + +def rm_r(path): +"""Given an arbitrary path properly remove it with the correct python +construct if it exists +- from: http://stackoverflow.com/a/9559881"""; + +if os.path.isdir(path): +shutil.rmtree(path) +elif os.path.exists(path): +os.remove(path) + + +def lineno(): +"""Returns the current line number in our program +- from: http://stackoverflow.com/a/3056059"""; + +return inspect.currentframe().f_back.f_lineno + + +def run_cmd(cmd): +"""Given a command as a list of arguments will attempt to execute the +command and, on failure, print an error message""" + +if not isinstance(cmd, list): +cmd = cmd.split() +try: +subprocess.check_call(cmd) +except subprocess.CalledProcessError as e: +exit_from_command_with_retcode(e.cmd, e.returncode) + + +def set_sbt_maven_profile_args(): +"""Properly sets the SBT environment variable arguments with additional +checks to determine if this is running on an Amplab Jenkins machine""" + +# base environment values for SBT_MAVEN_PROFILE_ARGS_ENV which will be appended on +sbt_maven_profile_args_base = ["-Pkinesis-asl"] + +sbt_maven_profile_arg_dict = { +"hadoop1.0" : ["-Phadoop-1", "-Dhadoop.version=1.0.4"], +"hadoop2.0" : ["-Phadoop-1", "-Dhadoop.version=2.0.0-mr1-cdh4.1.1"], +"hadoop2.2" : ["-Pyarn", "-Phadoop-2.2"], +"hadoop2.3" : ["-Pyarn", "-Phadoop-2.3", "-Dhadoop.version=2.3.0"], +} + +# set the SBT maven build profile argument environment variable and ensure +# we build against the right version of Hadoop +if os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE"): +os.environ[SBT_MAVEN_PROFILE_ARGS_ENV] = \ +" ".join(sbt_maven_profile_arg_dict.get(ajbp, []) + + sbt_maven_profile_args_base) +else: +os.environ[SBT_MAVEN_PROFILE_ARGS_ENV] = \ +" ".join(sbt_maven_profile_arg_dict.get("hadoop2.3", []) + + sbt_maven_profile_args_base) + + +def is_exe(path): +"""Check if a given path is an executable file +- from: http://stackoverflow.com/a/377028"""; + +return os.path.isfile(path) 
and os.access(path, os.X_OK) + + +def which(program): +"""Find and return the given program by its absolute path or
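One observation on the `SBT_OUTPUT_FILTER` pattern quoted in the diffs above: inside a regular expression, `[info]` is a character class that matches any single one of the letters `i`, `n`, `f`, `o`, not the literal tag `[info]`. A minimal sketch of an escaped variant follows, assuming the intent is to match literal `[info]` and `[warn]` prefixes; the names `UNESCAPED` and `ESCAPED` and the sample lines are made up for illustration.

```python
import re

# Pattern as written in the diff: the brackets form character classes.
UNESCAPED = re.compile("^.*[info].*Resolving" + "|" +
                       "^.*[warn].*Merging" + "|" +
                       "^.*[info].*Including")

# Escaped variant (assumption: literal "[info]" / "[warn]" tags are intended).
ESCAPED = re.compile(r"^.*\[info\].*Resolving" + "|" +
                     r"^.*\[warn\].*Merging" + "|" +
                     r"^.*\[info\].*Including")

# Made-up sample lines, not real sbt output.
tagged = "[info] Resolving org.example#artifact;1.0 ..."
untagged = "no tag here: Resolving conflicts"

# One boolean per sample line, in the order (tagged, untagged).
unescaped_hits = [bool(UNESCAPED.match(s)) for s in (tagged, untagged)]
escaped_hits = [bool(ESCAPED.match(s)) for s in (tagged, untagged)]
```

With these samples the unescaped pattern accepts both lines, while the escaped one accepts only the line that carries the literal `[info]` tag.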
[GitHub] spark pull request: [SPARK-7699][Core] Lazy start the scheduler fo...
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/6430#issuecomment-108731637 I have only tried this on my local pseudo-cluster to check functionality; I haven't tried it on a real cluster for performance yet. I will test it.
[GitHub] spark pull request: [SPARK-7699][Core] Lazy start the scheduler fo...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/6430#issuecomment-108730438 lgtm. @jerryshao have you had a chance to test this on a real cluster?
[GitHub] spark pull request: [SPARK-7558] Demarcate tests in unit-tests.log...
Github user andrewor14 closed the pull request at: https://github.com/apache/spark/pull/6598
[GitHub] spark pull request: [SPARK-7699][Core] Lazy start the scheduler fo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6430#issuecomment-108728952 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-7699][Core] Lazy start the scheduler fo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6430#issuecomment-108728901 [Test build #34151 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34151/consoleFull) for PR 6430 at commit [`02cac8e`](https://github.com/apache/spark/commit/02cac8e8ff5c0d91a2cc905f9412942d74c751b6). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [Testing only] [Do not merge] [1.4]
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6626#issuecomment-108725031 [Test build #34149 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34149/consoleFull) for PR 6626 at commit [`4fe0b14`](https://github.com/apache/spark/commit/4fe0b14d506740efffc6b2a11b552ec0d26ae6f2). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [Testing only] [Do not merge] [1.4]
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6626#issuecomment-108725087 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-7269] [SQL] [WIP] Refactor the class At...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6587#issuecomment-108721462 [Test build #34155 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34155/consoleFull) for PR 6587 at commit [`f1ddbf1`](https://github.com/apache/spark/commit/f1ddbf1e4bff3710596ddd7ba45ef2e42695628a).
[GitHub] spark pull request: [SPARK-7180][SPARK-8090][SPARK-8091] Fix a num...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6625#issuecomment-108721394 [Test build #34154 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34154/consoleFull) for PR 6625 at commit [`ae212c8`](https://github.com/apache/spark/commit/ae212c89ce9feba731c3dd60b3d4332addee2a0e).
[GitHub] spark pull request: [SPARK-4118] [MLlib] [PySpark] Python bindings...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6499#issuecomment-108721147 [Test build #34156 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34156/consoleFull) for PR 6499 at commit [`e12c590`](https://github.com/apache/spark/commit/e12c590801d5eed7666b33d7ac2b8543e2d340e2).
[GitHub] spark pull request: [SPARK-7180][SPARK-8090][SPARK-8091] Fix a num...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6625#issuecomment-108720948 Merged build triggered.
[GitHub] spark pull request: [SPARK-7180][SPARK-8090][SPARK-8091] Fix a num...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6625#issuecomment-108720966 Merged build started.
[GitHub] spark pull request: [SPARK-4118] [MLlib] [PySpark] Python bindings...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6499#issuecomment-108720982 Merged build started.
[GitHub] spark pull request: [SPARK-4118] [MLlib] [PySpark] Python bindings...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6499#issuecomment-108720956 Merged build triggered.
[GitHub] spark pull request: [SPARK-7269] [SQL] [WIP] Refactor the class At...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6587#issuecomment-108720952 Merged build triggered.
[GitHub] spark pull request: [SPARK-7269] [SQL] [WIP] Refactor the class At...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6587#issuecomment-108720983 Merged build started.
[GitHub] spark pull request: [SPARK-7180][SPARK-8090][SPARK-8091] Fix a num...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/6625#discussion_r31693024 --- Diff: core/src/main/scala/org/apache/spark/serializer/SerializationDebugger.scala --- @@ -125,6 +125,13 @@ private[spark] object SerializationDebugger extends Logging { return List.empty } +/** + * Visit an externalizable object. + * Since writeExternal() can choose add arbitrary objects at the time of serialization, + * the only way to capture all the objects it will serialize is by using a + * dummy ObjectOutput object that captures all the inner objects, and then visit all the --- End diff -- Okay, I can shorten it for this, as well as for the new visit function.
[GitHub] spark pull request: [SPARK-6583][SQL] Support aggregated function ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/5290#discussion_r31692885 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -364,11 +364,24 @@ class Analyzer( val (resolvedOrdering, missing) = resolveAndFindMissing(ordering, a, groupingRelation) -if (missing.nonEmpty) { +val addForAlias = new ArrayBuffer[NamedExpression]() +val aliasedOrdering = resolvedOrdering.zipWithIndex.map { + case (o, i) => { +o transform { + case aggOrSub @ (_: AggregateExpression | _: Substring) => --- End diff -- It's another bug, and it doesn't only affect `Substring`. I tried `SELECT b + 1, count(*) FROM orderByData GROUP BY b + 1 ORDER BY b + 1` and it also failed. The key problem is that we build the temporary grouping relation with `grouping.collect { case ne: NamedExpression => ne.toAttribute }`, which filters out all unnamed expressions such as `Add`, `Substring`, etc. I will work on it. cc @marmbrus @rxin
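The `collect` behaviour described in the comment above can be reproduced in isolation. This is only a minimal sketch: `Expression`, `NamedExpression`, `AttributeReference`, `Literal` and `Add` below are hypothetical stand-ins defined for the example, not Catalyst's real classes, and `namedOnly` is an illustrative helper that mirrors the shape of the `grouping.collect { ... }` call.

```scala
// Hypothetical stand-ins for Catalyst's expression classes; not the real API.
sealed trait Expression
trait NamedExpression extends Expression { def name: String }
case class AttributeReference(name: String) extends NamedExpression
case class Literal(value: Int) extends Expression
case class Add(left: Expression, right: Expression) extends Expression

object GroupingCollectDemo {
  // Keeps only the named grouping expressions; everything else is dropped silently.
  def namedOnly(grouping: Seq[Expression]): Seq[String] =
    grouping.collect { case ne: NamedExpression => ne.name }

  def main(args: Array[String]): Unit = {
    val b = AttributeReference("b")
    // GROUP BY b: the attribute is named, so it survives.
    println(namedOnly(Seq(b)))
    // GROUP BY b + 1: Add is not a NamedExpression, so nothing survives.
    println(namedOnly(Seq(Add(b, Literal(1)))))
  }
}
```

Under these assumptions, grouping by `b + 1` leaves the collected sequence empty, which is consistent with the `ORDER BY b + 1` failure reported in the comment.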
[GitHub] spark pull request: [SPARK-7956] [SQL] Use Janino to compile SQL e...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6479#issuecomment-108717176 [Test build #34153 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34153/consoleFull) for PR 6479 at commit [`262d848`](https://github.com/apache/spark/commit/262d84839a0876c07ffe57031fb505e664abfe66).
[GitHub] spark pull request: [SPARK-7269] [SQL] [WIP] Refactor the class At...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6587#issuecomment-108717174 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-7269] [SQL] [WIP] Refactor the class At...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6587#issuecomment-108717155 [Test build #34150 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34150/consoleFull) for PR 6587 at commit [`35f1892`](https://github.com/apache/spark/commit/35f1892bd1e60eb7b45f1737b08664d71e4c1f38).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class ElementwiseProduct(val scalingVec: Vector) extends VectorTransformer`
  * `trait TypeCheckResult`
  * `case class TypeCheckFailure(message: String) extends TypeCheckResult`
  * `abstract class UnaryArithmetic extends UnaryExpression`
  * `case class UnaryMinus(child: Expression) extends UnaryArithmetic`
  * `case class Sqrt(child: Expression) extends UnaryArithmetic`
  * `case class Abs(child: Expression) extends UnaryArithmetic`
  * `case class BitwiseNot(child: Expression) extends UnaryArithmetic`
  * `case class MaxOf(left: Expression, right: Expression) extends BinaryArithmetic`
  * `case class MinOf(left: Expression, right: Expression) extends BinaryArithmetic`
  * `case class Atan2(left: Expression, right: Expression)`
  * `case class Hypot(left: Expression, right: Expression)`
  * `case class EqualTo(left: Expression, right: Expression) extends BinaryComparison`
[GitHub] spark pull request: [SPARK-7956] [SQL] Use Janino to compile SQL e...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6479#issuecomment-108716127 Merged build started.
[GitHub] spark pull request: [SPARK-7956] [SQL] Use Janino to compile SQL e...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6479#issuecomment-108716085 Merged build triggered.
[GitHub] spark pull request: [SPARK-7956] [SQL] Use Janino to compile SQL e...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/6479#discussion_r31692338 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -313,295 +318,309 @@ abstract class CodeGenerator[InType <: AnyRef, OutType <: AnyRef] extends Loggin } val $nullTerm = false val $primitiveTerm = $funcName -""".children +""" */ case GreaterThan(e1 @ NumericType(), e2 @ NumericType()) => -(e1, e2).evaluateAs (BooleanType) { case (eval1, eval2) => q"$eval1 > $eval2" } +(e1, e2).evaluateAs (BooleanType) { case (eval1, eval2) => s"$eval1 > $eval2" } case GreaterThanOrEqual(e1 @ NumericType(), e2 @ NumericType()) => -(e1, e2).evaluateAs (BooleanType) { case (eval1, eval2) => q"$eval1 >= $eval2" } +(e1, e2).evaluateAs (BooleanType) { case (eval1, eval2) => s"$eval1 >= $eval2" } case LessThan(e1 @ NumericType(), e2 @ NumericType()) => -(e1, e2).evaluateAs (BooleanType) { case (eval1, eval2) => q"$eval1 < $eval2" } +(e1, e2).evaluateAs (BooleanType) { case (eval1, eval2) => s"$eval1 < $eval2" } case LessThanOrEqual(e1 @ NumericType(), e2 @ NumericType()) => -(e1, e2).evaluateAs (BooleanType) { case (eval1, eval2) => q"$eval1 <= $eval2" } +(e1, e2).evaluateAs (BooleanType) { case (eval1, eval2) => s"$eval1 <= $eval2" } case And(e1, e2) => -val eval1 = expressionEvaluator(e1) -val eval2 = expressionEvaluator(e2) - -q""" - ..${eval1.code} - var $nullTerm = false - var $primitiveTerm: ${termForType(BooleanType)} = false - - if (!${eval1.nullTerm} && ${eval1.primitiveTerm} == false) { +val eval1 = expressionEvaluator(e1, ctx) +val eval2 = expressionEvaluator(e2, ctx) +// TODO(davies): This is different than And.eval() +s""" + ${eval1.code} + boolean $nullTerm = false; + boolean $primitiveTerm = false; + + if (!${eval1.nullTerm} && !${eval1.primitiveTerm}) { } else { -..${eval2.code} -if (!${eval2.nullTerm} && ${eval2.primitiveTerm} == false) { +${eval2.code} +if (!${eval2.nullTerm} && 
!${eval2.primitiveTerm}) { } else if (!${eval1.nullTerm} && !${eval2.nullTerm}) { - $primitiveTerm = true + $primitiveTerm = true; } else { - $nullTerm = true + $nullTerm = true; } } - """.children + """ case Or(e1, e2) => -val eval1 = expressionEvaluator(e1) -val eval2 = expressionEvaluator(e2) +val eval1 = expressionEvaluator(e1, ctx) +val eval2 = expressionEvaluator(e2, ctx) -q""" - ..${eval1.code} - var $nullTerm = false - var $primitiveTerm: ${termForType(BooleanType)} = false +s""" + ${eval1.code} + boolean $nullTerm = false; + boolean $primitiveTerm = false; if (!${eval1.nullTerm} && ${eval1.primitiveTerm}) { -$primitiveTerm = true +$primitiveTerm = true; } else { -..${eval2.code} +${eval2.code} if (!${eval2.nullTerm} && ${eval2.primitiveTerm}) { - $primitiveTerm = true + $primitiveTerm = true; } else if (!${eval1.nullTerm} && !${eval2.nullTerm}) { - $primitiveTerm = false + $primitiveTerm = false; } else { - $nullTerm = true + $nullTerm = true; } } - """.children + """ case Not(child) => // Uh, bad function name... -child.castOrNull(c => q"!$c", BooleanType) - - case Add(e1, e2) => (e1, e2) evaluate { case (eval1, eval2) => q"$eval1 + $eval2" } - case Subtract(e1, e2) => (e1, e2) evaluate { case (eval1, eval2) => q"$eval1 - $eval2" } - case Multiply(e1, e2) => (e1, e2) evaluate { case (eval1, eval2) => q"$eval1 * $eval2" } +child.castOrNull(c => s"!$c", BooleanType) + + case Add(e1 @ DecimalType(), e2 @ DecimalType()) => --- End diff -- DecimalType.unapply() will catch all the DecimalType objects. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled a
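The remark about `DecimalType.unapply()` relies on Scala extractor objects: a pattern such as `e1 @ DecimalType()` calls `unapply` on the matched value and succeeds whenever it returns true. The following is a minimal sketch of that mechanism only. `Expr`, `IsDecimal`, `describeAdd` and the small `DataType` hierarchy are hypothetical names defined for the example; they are not Catalyst's classes, and the real `unapply` may differ in detail.

```scala
// Hypothetical stand-ins; these are not Catalyst's real classes.
sealed trait DataType
case object IntegerType extends DataType
case class DecimalType(precision: Int, scale: Int) extends DataType
case class Expr(dataType: DataType)

// Boolean extractor: `IsDecimal()` matches any Expr whose type is a DecimalType,
// regardless of its precision and scale.
object IsDecimal {
  def unapply(e: Expr): Boolean = e.dataType.isInstanceOf[DecimalType]
}

object ExtractorDemo {
  // Picks a code path the way the `case Add(e1 @ DecimalType(), ...)` clause would.
  def describeAdd(e1: Expr, e2: Expr): String = (e1, e2) match {
    case (IsDecimal(), IsDecimal()) => "decimal add"
    case _                          => "primitive add"
  }

  def main(args: Array[String]): Unit = {
    println(describeAdd(Expr(DecimalType(10, 2)), Expr(DecimalType(38, 18))))
    println(describeAdd(Expr(IntegerType), Expr(IntegerType)))
  }
}
```

In this sketch two decimals with different precision and scale still take the decimal branch, which is the sense in which a type-based `unapply` "catches all" decimal-typed operands.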
[GitHub] spark pull request: [SPARK-7956] [SQL] Use Janino to compile SQL e...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/6479#discussion_r31692314 --- Diff: core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala --- @@ -37,7 +37,6 @@ import com.google.common.hash.Hashing * It uses quadratic probing with a power-of-2 hash table size, which is guaranteed * to explore all spaces for each key (see http://en.wikipedia.org/wiki/Quadratic_probing). */ -private[spark] --- End diff -- done
[GitHub] spark pull request: [SPARK-7956] [SQL] Use Janino to compile SQL e...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/6479#discussion_r31692322 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -163,133 +190,111 @@ abstract class CodeGenerator[InType <: AnyRef, OutType <: AnyRef] extends Loggin * * @param f a function from two primitive term names to a tree that evaluates them. */ - def evaluate(f: (TermName, TermName) => Tree): Seq[Tree] = + def evaluate(f: (String, String) => String): String = --- End diff -- Could be done later.
[GitHub] spark pull request: [SPARK-7956] [SQL] Use Janino to compile SQL e...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6479#issuecomment-108714776 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-7956] [SQL] Use Janino to compile SQL e...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6479#issuecomment-108714769 [Test build #34152 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34152/consoleFull) for PR 6479 at commit [`eec3a33`](https://github.com/apache/spark/commit/eec3a33434342bfb342fc541f97d879fbd80d974).
* This patch **fails Python style tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `public final class UnsafeRow extends BaseMutableRow`
  * `public abstract class BaseMutableRow extends BaseRow implements MutableRow`
  * `public abstract class BaseRow implements Row`
  * `protected class CodeGenContext`
  * `abstract class BaseMutableProjection extends MutableProjection`
  * `class SpecificProjection extends $`
  * `class BaseOrdering extends Ordering[Row]`
  * `class SpecificOrdering extends $`
  * `abstract class Predicate`
  * `class SpecificPredicate extends $`
  * `abstract class BaseProject extends Projection`
  * `class SpecificProjection extends $`
  * `final class SpecificRow extends $`
[GitHub] spark pull request: [SPARK-7956] [SQL] Use Janino to compile SQL e...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/6479#discussion_r31692302 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateProjection.scala --- @@ -38,201 +42,191 @@ object GenerateProjection extends CodeGenerator[Seq[Expression], Projection] { // Make Mutablility optional... protected def create(expressions: Seq[Expression]): Projection = { -val tupleLength = ru.Literal(Constant(expressions.length)) -val lengthDef = q"final val length = $tupleLength" - /* TODO: Configurable... val nullFunctions = - q""" + s""" private final val nullSet = new org.apache.spark.util.collection.BitSet(length) final def setNullAt(i: Int) = nullSet.set(i) final def isNullAt(i: Int) = nullSet.get(i) """ */ -val nullFunctions = - q""" -private[this] var nullBits = new Array[Boolean](${expressions.size}) -override def setNullAt(i: Int) = { nullBits(i) = true } -override def isNullAt(i: Int) = nullBits(i) - """.children - -val tupleElements = expressions.zipWithIndex.flatMap { +val ctx = newCodeGenContext() +val columns = expressions.zipWithIndex.map { case (e, i) => -val elementName = newTermName(s"c$i") -val evaluatedExpression = expressionEvaluator(e) -val iLit = ru.Literal(Constant(i)) +s"private ${primitiveForType(e.dataType)} c$i = ${defaultPrimitive(e.dataType)};\n" +}.mkString("\n ") -q""" -var ${newTermName(s"c$i")}: ${termForType(e.dataType)} = _ +val initColumns = expressions.zipWithIndex.map { + case (e, i) => +val eval = expressionEvaluator(e, ctx) +s""" { - ..${evaluatedExpression.code} - if(${evaluatedExpression.nullTerm}) -setNullAt($iLit) - else { -nullBits($iLit) = false -$elementName = ${evaluatedExpression.primitiveTerm} + // column$i + ${eval.code} + nullBits[$i] = ${eval.nullTerm}; + if(!${eval.nullTerm}) { +c$i = ${eval.primitiveTerm}; } } -""".children : Seq[Tree] -} +""" +}.mkString("\n") -val accessorFailure = q"""scala.sys.error("Invalid ordinal:" + i)""" -val 
applyFunction = { - val cases = (0 until expressions.size).map { i => -val ordinal = ru.Literal(Constant(i)) -val elementName = newTermName(s"c$i") -val iLit = ru.Literal(Constant(i)) +val getCases = (0 until expressions.size).map { i => + s"case $i: return c$i;" +}.mkString("\n") -q"if(i == $ordinal) { if(isNullAt($i)) return null else return $elementName }" - } - q"override def apply(i: Int): Any = { ..$cases; $accessorFailure }" -} - -val updateFunction = { - val cases = expressions.zipWithIndex.map {case (e, i) => -val ordinal = ru.Literal(Constant(i)) -val elementName = newTermName(s"c$i") -val iLit = ru.Literal(Constant(i)) - -q""" - if(i == $ordinal) { -if(value == null) { - setNullAt(i) -} else { - nullBits(i) = false - $elementName = value.asInstanceOf[${termForType(e.dataType)}] -} -return - }""" - } - q"override def update(i: Int, value: Any): Unit = { ..$cases; $accessorFailure }" -} +val updateCases = expressions.zipWithIndex.map { case (e, i) => + s"case $i: { c$i = (${termForType(e.dataType)})value; return;}" +}.mkString("\n") val specificAccessorFunctions = nativeTypes.map { dataType => - val ifStatements = expressions.zipWithIndex.flatMap { -// getString() is not used by expressions -case (e, i) if e.dataType == dataType && dataType != StringType => - val elementName = newTermName(s"c$i") - // TODO: The string of ifs gets pretty inefficient as the row grows in size. - // TODO: Optional null checks? - q"if(i == $i) return $elementName" :: Nil -case _ => Nil - } - dataType match { -// Row() need this interface to compile -case StringType => - q""" - override def getString(i: Int): String = { -$accessorFailure - }""" -case other => - q""" - override def ${accessorFor
[GitHub] spark pull request: [SPARK-7956] [SQL] Use Janino to compile SQL e...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6479#issuecomment-108714133 [Test build #34152 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34152/consoleFull) for PR 6479 at commit [`eec3a33`](https://github.com/apache/spark/commit/eec3a33434342bfb342fc541f97d879fbd80d974).
[GitHub] spark pull request: [SPARK-7558] Demarcate tests in unit-tests.log...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6598#issuecomment-108714273 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-7558] Demarcate tests in unit-tests.log...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6598#issuecomment-108714237 [Test build #34147 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34147/consoleFull) for PR 6598 at commit [`4c3c566`](https://github.com/apache/spark/commit/4c3c566b155e01149ff1e8c9fd55c1ae78602954).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-7956] [SQL] Use Janino to compile SQL e...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/6479#issuecomment-108714012 @JoshRosen I think it's fine to keep the global lock, because we only want to compile once for a given expression. Also, the `ClassBodyEvaluator` is not thread-safe, so without the lock we would need to create a new one for each compile unit. IntegerHashSet is a specialized version for codegen; it will have better performance than OpenHashSet[Row]. There is a follow-up PR (based on this one) to push the codegen into Expression; lots of code will be moved, so some minor issues could be addressed in that one.
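The "compile once per expression under a global lock" idea from the comment above can be sketched as a small synchronized cache. This is an illustration only and not the code in the PR: `CompileOnceCache`, its `compile` parameter and the `compileCount` counter are hypothetical names, and `compile` simply stands in for an expensive compiler call that is not thread-safe.

```scala
import scala.collection.mutable

// Hypothetical sketch: `compile` stands in for an expensive, non-thread-safe compiler call.
class CompileOnceCache[K, V](compile: K => V) {
  private val cache = mutable.HashMap.empty[K, V]

  // Counts real compilations; only here to make the behaviour observable.
  var compileCount = 0

  // One lock guards both the lookup and the compilation, so a given key is
  // compiled at most once and the compiler is never entered concurrently.
  def get(key: K): V = synchronized {
    cache.getOrElseUpdate(key, {
      compileCount += 1
      compile(key)
    })
  }
}

object CompileOnceDemo {
  def main(args: Array[String]): Unit = {
    val cache = new CompileOnceCache[String, Int](source => source.length)
    cache.get("a > b")
    cache.get("a > b") // served from the cache, no second compilation
    println(cache.compileCount)
  }
}
```

The trade-off in a design like this is that unrelated expressions also wait on the same lock; that cost is accepted in exchange for never compiling the same key twice.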
[GitHub] spark pull request: [SPARK-7956] [SQL] Use Janino to compile SQL e...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6479#issuecomment-108713625 Merged build started.
[GitHub] spark pull request: [SPARK-7956] [SQL] Use Janino to compile SQL e...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6479#issuecomment-108713614 Merged build triggered.
[GitHub] spark pull request: [SPARK-7956] [SQL] Use Janino to compile SQL e...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/6479#discussion_r31691732 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -313,295 +318,309 @@ abstract class CodeGenerator[InType <: AnyRef, OutType <: AnyRef] extends Loggin } val $nullTerm = false val $primitiveTerm = $funcName -""".children +""" */ case GreaterThan(e1 @ NumericType(), e2 @ NumericType()) => -(e1, e2).evaluateAs (BooleanType) { case (eval1, eval2) => q"$eval1 > $eval2" } +(e1, e2).evaluateAs (BooleanType) { case (eval1, eval2) => s"$eval1 > $eval2" } case GreaterThanOrEqual(e1 @ NumericType(), e2 @ NumericType()) => -(e1, e2).evaluateAs (BooleanType) { case (eval1, eval2) => q"$eval1 >= $eval2" } +(e1, e2).evaluateAs (BooleanType) { case (eval1, eval2) => s"$eval1 >= $eval2" } case LessThan(e1 @ NumericType(), e2 @ NumericType()) => -(e1, e2).evaluateAs (BooleanType) { case (eval1, eval2) => q"$eval1 < $eval2" } +(e1, e2).evaluateAs (BooleanType) { case (eval1, eval2) => s"$eval1 < $eval2" } case LessThanOrEqual(e1 @ NumericType(), e2 @ NumericType()) => -(e1, e2).evaluateAs (BooleanType) { case (eval1, eval2) => q"$eval1 <= $eval2" } +(e1, e2).evaluateAs (BooleanType) { case (eval1, eval2) => s"$eval1 <= $eval2" } case And(e1, e2) => -val eval1 = expressionEvaluator(e1) -val eval2 = expressionEvaluator(e2) - -q""" - ..${eval1.code} - var $nullTerm = false - var $primitiveTerm: ${termForType(BooleanType)} = false - - if (!${eval1.nullTerm} && ${eval1.primitiveTerm} == false) { +val eval1 = expressionEvaluator(e1, ctx) +val eval2 = expressionEvaluator(e2, ctx) +// TODO(davies): This is different than And.eval() +s""" + ${eval1.code} + boolean $nullTerm = false; + boolean $primitiveTerm = false; + + if (!${eval1.nullTerm} && !${eval1.primitiveTerm}) { } else { -..${eval2.code} -if (!${eval2.nullTerm} && ${eval2.primitiveTerm} == false) { +${eval2.code} +if (!${eval2.nullTerm} && 
!${eval2.primitiveTerm}) { } else if (!${eval1.nullTerm} && !${eval2.nullTerm}) { - $primitiveTerm = true + $primitiveTerm = true; } else { - $nullTerm = true + $nullTerm = true; } } - """.children + """ case Or(e1, e2) => -val eval1 = expressionEvaluator(e1) -val eval2 = expressionEvaluator(e2) +val eval1 = expressionEvaluator(e1, ctx) +val eval2 = expressionEvaluator(e2, ctx) -q""" - ..${eval1.code} - var $nullTerm = false - var $primitiveTerm: ${termForType(BooleanType)} = false +s""" + ${eval1.code} + boolean $nullTerm = false; + boolean $primitiveTerm = false; if (!${eval1.nullTerm} && ${eval1.primitiveTerm}) { -$primitiveTerm = true +$primitiveTerm = true; } else { -..${eval2.code} +${eval2.code} if (!${eval2.nullTerm} && ${eval2.primitiveTerm}) { - $primitiveTerm = true + $primitiveTerm = true; } else if (!${eval1.nullTerm} && !${eval2.nullTerm}) { - $primitiveTerm = false + $primitiveTerm = false; } else { - $nullTerm = true + $nullTerm = true; } } - """.children + """ case Not(child) => // Uh, bad function name... -child.castOrNull(c => q"!$c", BooleanType) - - case Add(e1, e2) => (e1, e2) evaluate { case (eval1, eval2) => q"$eval1 + $eval2" } - case Subtract(e1, e2) => (e1, e2) evaluate { case (eval1, eval2) => q"$eval1 - $eval2" } - case Multiply(e1, e2) => (e1, e2) evaluate { case (eval1, eval2) => q"$eval1 * $eval2" } +child.castOrNull(c => s"!$c", BooleanType) + + case Add(e1 @ DecimalType(), e2 @ DecimalType()) => +(e1, e2) evaluate { case (eval1, eval2) => s"$eval1.$$plus($eval2)" } + case Subtract(e1 @ DecimalType(), e2 @ DecimalType()) => +(e1, e2) evaluate { case (eval1, eval2) => s"$eval1.$$minus($eval2)" } + case Mult
[GitHub] spark pull request: [Streaming][Kafka] cleanup tests from SPARK-28...
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/5921#issuecomment-108709503 Another ping on this, even if it misses 1.4. Seeing `waitUntilLeaderOffset` all over the place in test code I'm working on right now made me sad :( --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7889] [UI] make sure click the "App ID"...
Github user XuTingjun commented on the pull request: https://github.com/apache/spark/pull/6545#issuecomment-108708777 I don't think the appCache is the only reason. After debugging the code, I found two main reasons: 1. The first time a `/history/appid` request is sent, the `/history/*` handler deals with it. While doing so, that handler adds the SparkUI's handlers to the history server, including a handler for `/history/appid`. So the second time a `/history/appid` request is sent, the `/history/appid` handler deals with it instead of the `/history/*` handler, and the second UI is the same as the first one. 2. In the `/history/*` handler, the code uses `appCache.get()` to cache the app UI. I think we should use `appCache.refresh()` instead so that the cached UI can be refreshed.
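To illustrate point 2, here is a minimal sketch of how a `get()` call differs from a `refresh()` call on a loading cache. `TinyLoadingCache` and `CacheRefreshSketch` are hypothetical stand-ins written for this example, not the history server's actual classes. The sketch only assumes that `appCache` behaves like a typical loading cache, where `get()` returns the cached value once loaded and `refresh()` re-runs the loader.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a loading cache; not the history server's real class.
final class TinyLoadingCache<K, V> {
    private final Map<K, V> entries = new HashMap<>();
    private final Function<K, V> loader;

    TinyLoadingCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Loads the value on first access, then keeps returning the cached copy.
    V get(K key) {
        return entries.computeIfAbsent(key, loader);
    }

    // Re-runs the loader and replaces whatever was cached for the key.
    void refresh(K key) {
        entries.put(key, loader.apply(key));
    }
}

public class CacheRefreshSketch {
    public static void main(String[] args) {
        int[] loads = {0}; // counts how often the (assumed) UI loader runs
        TinyLoadingCache<String, String> appCache =
            new TinyLoadingCache<>(appId -> "ui-for-" + appId + "-v" + (++loads[0]));

        System.out.println(appCache.get("app-1")); // first load
        System.out.println(appCache.get("app-1")); // same cached value, loader not re-run
        appCache.refresh("app-1");                 // forces a reload
        System.out.println(appCache.get("app-1")); // value produced by the reload
    }
}
```

With only `get()`, a stale UI stays in the cache until the entry is evicted; an explicit `refresh()` is what replaces it. This does not address point 1: if a more specific handler has already been registered, the request may never reach the code that consults the cache.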