[GitHub] spark issue #16264: [SPARK-18793] [SPARK-18794] [R] add spark.randomForest/s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16264 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70069/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16264: [SPARK-18793] [SPARK-18794] [R] add spark.randomForest/s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16264 Merged build finished. Test PASSed.
[GitHub] spark issue #16264: [SPARK-18793] [SPARK-18794] [R] add spark.randomForest/s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16264 **[Test build #70069 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70069/consoleFull)** for PR 16264 at commit [`fb1933d`](https://github.com/apache/spark/commit/fb1933d22bfd8294b38c7f0e712c0f753ce56ba6). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16264: [SPARK-18793] [SPARK-18794] [R] add spark.randomForest/s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16264 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70070/ Test PASSed.
[GitHub] spark issue #16264: [SPARK-18793] [SPARK-18794] [R] add spark.randomForest/s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16264 Merged build finished. Test PASSed.
[GitHub] spark issue #16264: [SPARK-18793] [SPARK-18794] [R] add spark.randomForest/s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16264 **[Test build #70070 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70070/consoleFull)** for PR 16264 at commit [`be9c846`](https://github.com/apache/spark/commit/be9c8466209a0d656ce9b7bb08ee94014391ea0d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16119: [SPARK-18687][Pyspark][SQL]Backward compatibility - crea...
Github user vijoshi commented on the issue: https://github.com/apache/spark/pull/16119 @holdenk reminder; also pinging @davies
[GitHub] spark pull request #16161: [SPARK-18717][SQL] Make code generation for Scala...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16161
[GitHub] spark issue #16161: [SPARK-18717][SQL] Make code generation for Scala Map wo...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16161 thanks, merging to master!
[GitHub] spark issue #16264: [SPARK-18793] [SPARK-18794] [R] add spark.randomForest/s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16264 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70068/ Test PASSed.
[GitHub] spark issue #16264: [SPARK-18793] [SPARK-18794] [R] add spark.randomForest/s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16264 Merged build finished. Test PASSed.
[GitHub] spark issue #16264: [SPARK-18793] [SPARK-18794] [R] add spark.randomForest/s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16264 **[Test build #70068 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70068/consoleFull)** for PR 16264 at commit [`292745b`](https://github.com/apache/spark/commit/292745b5ea8fc7815db0bcc1be995cae0a33b0ec). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16264: [SPARK-18793] [SPARK-18794] [R] add spark.randomF...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16264#discussion_r92112001 --- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd --- @@ -539,7 +539,7 @@ In the following example, we use the `longley` dataset to train a random forest ```{r} df <- createDataFrame(longley) -rfModel <- spark.randomForest(df, Employed ~ ., type = "regression", numTrees = 5) +rfModel <- spark.randomForest(df, Employed ~ ., type = "regression", maxDepth = 2, numTrees = 2) --- End diff -- you could also do this to limit output ```{r, include=FALSE} ops <- options() options(max.print=40) ``` there is another example in the vignettes
[GitHub] spark issue #16264: [SPARK-18793] [SPARK-18794] [R] add spark.randomForest/s...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/16264 and if we are doing "(Added in Spark 2.1.0)" in https://github.com/apache/spark/pull/16222/files perhaps we should have the same for these 2? Actually, I'd vote for removing them: vignettes are for the specific version of the package you install. If you install SparkR 2.1.0, what is described there is in 2.1.0 by default.
[GitHub] spark issue #16263: [SPARK-18281][SQL][PySpark] Consumes the returned local ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16263 **[Test build #70071 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70071/consoleFull)** for PR 16263 at commit [`5ba5d11`](https://github.com/apache/spark/commit/5ba5d1163802a5250510491f97aa9592c785596f).
[GitHub] spark issue #16259: [Minor][SparkR]:fix kstest example error and add unit te...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/16259 LGTM. Let's trigger AppVeyor again as Hyukjin suggests
[GitHub] spark pull request #16259: [Minor][SparkR]:fix kstest example error and add ...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16259#discussion_r92111051 --- Diff: R/pkg/R/mllib.R --- @@ -1595,14 +1595,14 @@ setMethod("write.ml", signature(object = "ALSModel", path = "character"), #' \dontrun{ #' data <- data.frame(test = c(0.1, 0.15, 0.2, 0.3, 0.25)) #' df <- createDataFrame(data) -#' test <- spark.ktest(df, "test", "norm", c(0, 1)) +#' test <- spark.kstest(df, "test", "norm", c(0, 1)) --- End diff -- yep, I saw this recently too but forgot to follow up.
[GitHub] spark issue #16264: [SPARK-18793] [SPARK-18794] [R] add spark.randomForest/s...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/16264 LGTM. It might be good to have one example for classification, but that's optional.
[GitHub] spark issue #16247: [SPARK-18817][SparkR] set default spark-warehouse path t...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/16247 Possibly; SPARK-16027 was just a hack, and I think the root issue remains.
[GitHub] spark issue #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory fail to...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15819 The current fix does not resolve the issue when users hit abnormal termination of the JVM. In addition, if the JVM does not stop, these temporary files could consume a lot of space. Thus, I think https://github.com/apache/spark/pull/16134 needs to be added too. This is just my opinion; we also need feedback from the other committers.
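The distinction above, between cleanup on normal shutdown and cleanup after abnormal termination, is easy to demonstrate outside of Spark: an exit hook only covers the normal case. A minimal Python sketch (the directory prefix is a hypothetical stand-in, not Spark's actual staging path):

```python
import atexit
import os
import shutil
import tempfile

# Hypothetical stand-in for a Hive staging directory.
staging_dir = tempfile.mkdtemp(prefix=".hive-staging_")

# A registered cleanup runs on *normal* interpreter exit only; a hard
# kill (SIGKILL, OOM killer) skips it, which is why abnormal JVM
# termination leaves staging directories behind.
atexit.register(shutil.rmtree, staging_dir, ignore_errors=True)

print(os.path.isdir(staging_dir))  # True while the process is alive
```

This is why a shutdown-hook-based fix alone is insufficient, and why a complementary cleanup of leftover directories (as in the linked PR) is worth considering.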
[GitHub] spark issue #16259: [Minor][SparkR]:fix kstest example error and add unit te...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16259 (@wangmiao1981 you could close and re-open this to re-trigger the build, as we already discussed before, if you or anyone cares).
[GitHub] spark issue #16264: [SPARK-18793] [SPARK-18794] [R] add spark.randomForest/s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16264 **[Test build #70070 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70070/consoleFull)** for PR 16264 at commit [`be9c846`](https://github.com/apache/spark/commit/be9c8466209a0d656ce9b7bb08ee94014391ea0d).
[GitHub] spark issue #16264: [SPARK-18793] [SPARK-18794] [R] add spark.randomForest/s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16264 **[Test build #70069 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70069/consoleFull)** for PR 16264 at commit [`fb1933d`](https://github.com/apache/spark/commit/fb1933d22bfd8294b38c7f0e712c0f753ce56ba6).
[GitHub] spark issue #16262: [SPARK-17932][SQL][FOLLOWUP] Change statement `SHOW TABL...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16262 LGTM
[GitHub] spark issue #16264: [SPARK-18793] [R] add spark.randomForest to vignettes
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16264 **[Test build #70068 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70068/consoleFull)** for PR 16264 at commit [`292745b`](https://github.com/apache/spark/commit/292745b5ea8fc7815db0bcc1be995cae0a33b0ec).
[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13909#discussion_r92108007 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- @@ -56,33 +58,93 @@ case class CreateArray(children: Seq[Expression]) extends Expression { } override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val arrayClass = classOf[GenericArrayData].getName -val values = ctx.freshName("values") -ctx.addMutableState("Object[]", values, s"this.$values = null;") +val array = ctx.freshName("array") -ev.copy(code = s""" - this.$values = new Object[${children.size}];""" + +val et = dataType.elementType +val evals = children.map(e => e.genCode(ctx)) +val isPrimitiveArray = ctx.isPrimitiveType(et) +val primitiveTypeName = if (isPrimitiveArray) ctx.primitiveTypeName(et) else "" +val (preprocess, arrayData, arrayWriter) = + GenArrayData.getCodeArrayData(ctx, et, children.size, isPrimitiveArray, array) + +ev.copy(code = + preprocess + ctx.splitExpressions( ctx.INPUT_ROW, -children.zipWithIndex.map { case (e, i) => - val eval = e.genCode(ctx) - eval.code + s""" -if (${eval.isNull}) { - $values[$i] = null; +evals.zipWithIndex.map { case (eval, i) => + eval.code + +(if (isPrimitiveArray) { + (if (!children(i).nullable) { +s"\n$arrayWriter.write($i, ${eval.value});" + } else { +s""" +if (${eval.isNull}) { --- End diff -- I think branch prediction can work very well for this case, we don't need to manually optimize the code.
[GitHub] spark pull request #16264: [SPARK-18792] [R] add spark.randomForest to vignet...
GitHub user mengxr opened a pull request: https://github.com/apache/spark/pull/16264 [SPARK-18792] [R] add spark.randomForest to vignettes ## What changes were proposed in this pull request? Mention `spark.randomForest` in vignettes. Keep the content minimal since users can type `?spark.randomForest` to see the full doc. cc: @jkbradley You can merge this pull request into a Git repository by running: $ git pull https://github.com/mengxr/spark SPARK-18793 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16264.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16264
[GitHub] spark issue #16247: [SPARK-18817][SparkR] set default spark-warehouse path t...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16247 This test failure seems related to this PR. It seems to be because the previous Hive-enabled Spark session is not closed properly between `test_sparkSQL.R` and `test_sparkR.R`; in particular, I suspect the lock on Derby taken via the Hive client. It seems related to SPARK-16027? There was only a single instance of a Hive-enabled Spark session there, but the test here introduces another one. I manually tested each after removing the other, and each seems to work fine.
[GitHub] spark issue #16247: [SPARK-18817][SparkR] set default spark-warehouse path t...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/16247 re: test failure, it might be related to this change? The call stack is hidden; it should be trying to call into https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionState.scala and failing. Is it possible that tmpdir() is not writeable on Jenkins?
[GitHub] spark pull request #16222: [SPARK-18797][SparkR]:Update spark.logit in spark...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16222
[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double numeric d...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15314 **[Test build #70067 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70067/consoleFull)** for PR 15314 at commit [`20e77fb`](https://github.com/apache/spark/commit/20e77fbc29bcba0238edc1c1368f984f44f7776c).
[GitHub] spark issue #16222: [SPARK-18797][SparkR]:Update spark.logit in sparkr-vigne...
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/16222 LGTM. Merged into master and branch-2.1. I will change the `regParam` value in a follow-up PR.
[GitHub] spark issue #16149: [SPARK-18715][ML]Fix AIC calculations in Binomial GLM
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16149 @srowen @sethah One more commit adds a test case with `weight = 4.7`, which will round up to 5, to test the case @sethah described. All tests passed. I'm pretty sure R's rounding is the same as what I'm doing here. Please merge if there is no other issue. Thanks.
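On the rounding question above: R's `round()` and Python 3's built-in `round()` both use round-half-to-even ("banker's rounding"), so a weight of 4.7 rounds up to 5 (it is not a tie), while exact halves round toward the even neighbor. A quick illustration, with Python used here purely as a stand-in for the R behavior:

```python
# Round-half-to-even, matching R's round():
print(round(4.7))  # 5 (not a tie, rounds to the nearest integer)
print(round(4.5))  # 4 (tie, rounds to the even neighbor)
print(round(5.5))  # 6 (tie, rounds to the even neighbor)
```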
[GitHub] spark issue #16142: [SPARK-18716][CORE] Restrict the disk usage of spark eve...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16142 **[Test build #70066 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70066/consoleFull)** for PR 16142 at commit [`96e9063`](https://github.com/apache/spark/commit/96e906326327c96f0c6d07bba14c46ac46e86ae0).
[GitHub] spark pull request #16220: [SPARK-18796][SS]StreamingQueryManager should not...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16220
[GitHub] spark issue #15915: [SPARK-18485][CORE] Underlying integer overflow when cre...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15915 **[Test build #70065 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70065/consoleFull)** for PR 15915 at commit [`c48b8f9`](https://github.com/apache/spark/commit/c48b8f976e812d947f0cc54dfc159ac55dfb6211).
[GitHub] spark issue #16142: [SPARK-18716][CORE] Restrict the disk usage of spark eve...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16142 retest please
[GitHub] spark issue #15717: [SPARK-17910][SQL] Allow users to update the comment of ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15717 **[Test build #70064 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70064/consoleFull)** for PR 15717 at commit [`a71683c`](https://github.com/apache/spark/commit/a71683c773a1117f58c7e2b5cb5a4b1b101326f4).
[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...
Github user uncleGen commented on a diff in the pull request: https://github.com/apache/spark/pull/15915#discussion_r92104532 --- Diff: core/src/test/scala/org/apache/spark/util/io/ChunkedByteBufferOutputStreamSuite.scala --- @@ -119,4 +119,21 @@ class ChunkedByteBufferOutputStreamSuite extends SparkFunSuite { assert(arrays(1).toSeq === ref.slice(10, 20)) assert(arrays(2).toSeq === ref.slice(20, 30)) } + + test("negative chunk size") { +val ref = new Array[Byte](8 * 1024 * 1024 + 10) +Random.nextBytes(ref) +val o = new ChunkedByteBufferOutputStream(-10, ByteBuffer.allocate) +o.write(ref) +o.close() +val arrays = o.toChunkedByteBuffer.getChunks().map(_.array()) +assert(arrays.length === 3) +assert(arrays(0).length === 4 * 1024 * 1024) +assert(arrays(1).length === 4 * 1024 * 1024) +assert(arrays(2).length === 10 ) + +assert(arrays(0).toSeq === ref.slice(0, 4 * 1024 * 1024)) +assert(arrays(1).toSeq === ref.slice(4 * 1024 * 1024, 8 * 1024 * 1024)) +assert(arrays(2).toSeq === ref.slice(8 * 1024 * 1024, 8 * 1024 * 1024 + 10)) + } } --- End diff -- Discovery starting. Discovery completed in 42 seconds, 124 milliseconds. Run starting. Expected test count is: 9 ChunkedByteBufferOutputStreamSuite: - empty output - write a single byte - write a single near boundary - write a single at boundary - single chunk output - single chunk output at boundary size - multiple chunk output - multiple chunk output at boundary size - negative chunk size Run completed in 42 seconds, 700 milliseconds. Total number of tests run: 9 Suites: completed 2, aborted 0 Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0 All tests passed.
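The test above asserts that a `ChunkedByteBufferOutputStream` built with chunk size `-10` still produces two 4 MB chunks plus a 10-byte tail. A minimal standalone sketch of that behavior — assuming, as the 4 MB assertions suggest, that a non-positive requested size falls back to a default chunk size (names here are illustrative, not Spark's actual implementation):

```scala
// Hypothetical sketch of invalid-chunk-size handling, inferred from the
// test's expectations; not the actual ChunkedByteBufferOutputStream code.
object ChunkSizeSketch {
  // Assumed default, matching the 4 * 1024 * 1024-byte chunks in the test.
  private val DefaultChunkSize = 4 * 1024 * 1024

  def effectiveChunkSize(requested: Int): Int =
    if (requested <= 0) DefaultChunkSize else requested

  // Split a payload of totalBytes into per-chunk lengths.
  def chunkLengths(totalBytes: Int, requested: Int): Seq[Int] = {
    val size = effectiveChunkSize(requested)
    val fullChunks = totalBytes / size
    val remainder = totalBytes % size
    Seq.fill(fullChunks)(size) ++ (if (remainder > 0) Seq(remainder) else Nil)
  }
}
```

With the test's 8 MB + 10 byte payload and a requested size of -10, this yields chunk lengths of 4 MB, 4 MB, and 10 bytes, mirroring the suite's assertions.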
[GitHub] spark pull request #14079: [SPARK-8425][CORE] Application Level Blacklisting
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r92081435 --- Diff: core/src/test/scala/org/apache/spark/scheduler/SchedulerIntegrationSuite.scala --- @@ -157,8 +160,16 @@ abstract class SchedulerIntegrationSuite[T <: MockBackend: ClassTag] extends Spa } // When a job fails, we terminate before waiting for all the task end events to come in, // so there might still be a running task set. So we only check these conditions - // when the job succeeds - assert(taskScheduler.runningTaskSets.isEmpty) + // when the job succeeds. + // When the final task of a taskset completes, we post + // the event to the DAGScheduler event loop before we finish processing in the taskscheduler + // thread. Its possible the DAGScheduler thread processes the event, finishes the job, + // and notifies the job waiter before our original thread in the task scheduler finishes + // handling the event and marks the taskset as complete. So its ok if we need to wait a + // *little* bit longer for the original taskscheduler thread to finish up to deal w/ the race. + eventually(timeout(1 second), interval(100 millis)) { +assert(taskScheduler.runningTaskSets.isEmpty) --- End diff -- Is this related to this PR?
[GitHub] spark pull request #14079: [SPARK-8425][CORE] Application Level Blacklisting
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r92079858 --- Diff: core/src/test/scala/org/apache/spark/scheduler/BlacklistTrackerSuite.scala --- @@ -17,10 +17,322 @@ package org.apache.spark.scheduler -import org.apache.spark.{SparkConf, SparkFunSuite} +import org.mockito.Mockito.when +import org.scalatest.BeforeAndAfterEach +import org.scalatest.mock.MockitoSugar + +import org.apache.spark._ import org.apache.spark.internal.config +import org.apache.spark.util.ManualClock + +class BlacklistTrackerSuite extends SparkFunSuite with BeforeAndAfterEach with MockitoSugar +with LocalSparkContext { + + private val clock = new ManualClock(0) + + private var blacklist: BlacklistTracker = _ + private var scheduler: TaskSchedulerImpl = _ + private var conf: SparkConf = _ + + override def beforeEach(): Unit = { +conf = new SparkConf().setAppName("test").setMaster("local") + .set(config.BLACKLIST_ENABLED.key, "true") +scheduler = mockTaskSchedWithConf(conf) + +clock.setTime(0) +blacklist = new BlacklistTracker(conf, clock) + } + + override def afterEach(): Unit = { +if (blacklist != null) { + blacklist = null +} +if (scheduler != null) { + scheduler.stop() + scheduler = null +} +super.afterEach() + } + + val allExecutorAndHostIds = (('A' to 'Z').map("host" + _) ++ (1 to 100).map{_.toString}).toSet + + /** + * Its easier to write our tests as if we could directly look at the sets of nodes & executors in + * the blacklist. However the api doesn't expose a set, so this is a simple way to test + * something similar, since we know the universe of values that might appear in these sets. 
+ */ + def assertEquivalentToSet(f: String => Boolean, expected: Set[String]): Unit = { +allExecutorAndHostIds.foreach { id => + val actual = f(id) + val exp = expected.contains(id) + assert(actual === exp, raw"""for string "$id" """) +} + } + + def mockTaskSchedWithConf(conf: SparkConf): TaskSchedulerImpl = { +sc = new SparkContext(conf) +val scheduler = mock[TaskSchedulerImpl] +when(scheduler.sc).thenReturn(sc) + when(scheduler.mapOutputTracker).thenReturn(SparkEnv.get.mapOutputTracker) +scheduler + } + + def createTaskSetBlacklist(stageId: Int = 0): TaskSetBlacklist = { +new TaskSetBlacklist(conf, stageId, clock) + } + + test("executors can be blacklisted with only a few failures per stage") { +// For 4 different stages, executor 1 fails a task, then executor 2 succeeds the task, +// and then the task set is done. Not enough failures to blacklist the executor *within* +// any particular taskset, but we still blacklist the executor overall eventually. +// Also, we intentionally have a mix of task successes and failures -- there are even some +// successes after the executor is blacklisted. The idea here is those tasks get scheduled +// before the executor is blacklisted. We might get successes after blacklisting (because the +// executor might be flaky but not totally broken). But successes should not unblacklist the +// executor. +val failuresUntilBlacklisted = conf.get(config.MAX_FAILURES_PER_EXEC) +var failuresSoFar = 0 +(0 until failuresUntilBlacklisted * 10).foreach { stageId => + val taskSetBlacklist = createTaskSetBlacklist(stageId) + if (stageId % 2 == 0) { +// fail every other task --- End diff -- fail a task in every other stage?
[GitHub] spark pull request #14079: [SPARK-8425][CORE] Application Level Blacklisting
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r92076411 --- Diff: core/src/main/scala/org/apache/spark/scheduler/BlacklistTracker.scala --- @@ -17,10 +17,275 @@ package org.apache.spark.scheduler +import java.util.concurrent.atomic.AtomicReference + +import scala.collection.mutable.{ArrayBuffer, HashMap, HashSet} + import org.apache.spark.SparkConf import org.apache.spark.internal.Logging import org.apache.spark.internal.config -import org.apache.spark.util.Utils +import org.apache.spark.util.{Clock, SystemClock, Utils} + +/** + * BlacklistTracker is designed to track problematic executors and nodes. It supports blacklisting + * executors and nodes across an entire application (with a periodic expiry). TaskSetManagers add + * additional blacklisting of executors and nodes for individual tasks and stages which works in + * concert with the blacklisting here. + * + * The tracker needs to deal with a variety of workloads, eg.: + * + * * bad user code -- this may lead to many task failures, but that should not count against + * individual executors + * * many small stages -- this may prevent a bad executor for having many failures within one + * stage, but still many failures over the entire application + * * "flaky" executors -- they don't fail every task, but are still faulty enough to merit + * blacklisting + * + * See the design doc on SPARK-8425 for a more in-depth discussion. + * + * THREADING: As with most helpers of TaskSchedulerImpl, this is not thread-safe. Though it is + * called by multiple threads, callers must already have a lock on the TaskSchedulerImpl. The + * one exception is [[nodeBlacklist()]], which can be called without holding a lock. 
+ */ +private[scheduler] class BlacklistTracker ( +conf: SparkConf, +clock: Clock = new SystemClock()) extends Logging { + + BlacklistTracker.validateBlacklistConfs(conf) + private val MAX_FAILURES_PER_EXEC = conf.get(config.MAX_FAILURES_PER_EXEC) + private val MAX_FAILED_EXEC_PER_NODE = conf.get(config.MAX_FAILED_EXEC_PER_NODE) + val BLACKLIST_TIMEOUT_MILLIS = BlacklistTracker.getBlacklistTimeout(conf) + + /** + * A map from executorId to information on task failures. Tracks the time of each task failure, + * so that we can avoid blacklisting executors due to failures that are very far apart. We do not + * actively remove from this as soon as tasks hit their timeouts, to avoid the time it would take + * to do so. But it will not grow too large, because as soon as an executor gets too many + * failures, we blacklist the executor and remove its entry here. + */ + private val executorIdToFailureList = new HashMap[String, ExecutorFailureList]() + val executorIdToBlacklistStatus = new HashMap[String, BlacklistedExecutor]() + val nodeIdToBlacklistExpiryTime = new HashMap[String, Long]() + /** + * An immutable copy of the set of nodes that are currently blacklisted. Kept in an + * AtomicReference to make [[nodeBlacklist()]] thread-safe. + */ + private val _nodeBlacklist = new AtomicReference[Set[String]](Set()) + /** + * Time when the next blacklist will expire. Used as a + * shortcut to avoid iterating over all entries in the blacklist when none will have expired. + */ + var nextExpiryTime: Long = Long.MaxValue + /** + * Mapping from nodes to all of the executors that have been blacklisted on that node. We do *not* + * remove from this when executors are removed from spark, so we can track when we get multiple + * successive blacklisted executors on one node. 
Nonetheless, it will not grow too large because + * there cannot be many blacklisted executors on one node, before we stop requesting more + * executors on that node, and we clean up the list of blacklisted executors once an executor has + * been blacklisted for BLACKLIST_TIMEOUT_MILLIS. + */ + val nodeToBlacklistedExecs = new HashMap[String, HashSet[String]]() + + /** + * Un-blacklists executors and nodes that have been blacklisted for at least + * BLACKLIST_TIMEOUT_MILLIS + */ + def applyBlacklistTimeout(): Unit = { +val now = clock.getTimeMillis() +// quickly check if we've got anything to expire from blacklist -- if not, avoid doing any work +if (now > nextExpiryTime) { + // Apply the timeout to blacklisted nodes and executors + val execsToUnblacklist = executorIdToBlacklistStatus.filter(_._2.expiryTime < now).keys + if (execsToUnblacklist.nonEmpty) { +// Un-blacklist any executors that have been blacklisted longer
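The `nextExpiryTime` field documented above is a small but important optimization: `applyBlacklistTimeout()` only scans the blacklist maps when the earliest recorded expiry has actually passed. A simplified, self-contained sketch of that pattern (class and method names here are illustrative, not the actual `BlacklistTracker` code):

```scala
import scala.collection.mutable

// Sketch of the expiry-shortcut pattern: track the earliest expiry time so
// that most timeout checks can return immediately without scanning the map.
class ExpirySketch {
  private val expiryByExec = mutable.HashMap.empty[String, Long]
  private var nextExpiryTime: Long = Long.MaxValue

  def blacklist(execId: String, expiresAt: Long): Unit = {
    expiryByExec(execId) = expiresAt
    nextExpiryTime = math.min(nextExpiryTime, expiresAt)
  }

  def applyTimeout(now: Long): Unit = {
    // Fast path: if nothing can have expired yet, skip the scan entirely.
    if (now > nextExpiryTime) {
      val expired = expiryByExec.filter { case (_, t) => t < now }.keys.toSeq
      expiryByExec --= expired
      // Recompute the shortcut for the remaining entries.
      nextExpiryTime =
        if (expiryByExec.isEmpty) Long.MaxValue else expiryByExec.values.min
    }
  }

  def isBlacklisted(execId: String): Boolean = expiryByExec.contains(execId)
}
```

The fast path makes the common case (no expiries due) O(1), while the full scan runs only when at least one entry is guaranteed to be removable.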
[GitHub] spark pull request #14079: [SPARK-8425][CORE] Application Level Blacklisting
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r92079426 --- Diff: core/src/test/scala/org/apache/spark/scheduler/BlacklistTrackerSuite.scala --- @@ -17,10 +17,322 @@ package org.apache.spark.scheduler -import org.apache.spark.{SparkConf, SparkFunSuite} +import org.mockito.Mockito.when +import org.scalatest.BeforeAndAfterEach +import org.scalatest.mock.MockitoSugar + +import org.apache.spark._ import org.apache.spark.internal.config +import org.apache.spark.util.ManualClock + +class BlacklistTrackerSuite extends SparkFunSuite with BeforeAndAfterEach with MockitoSugar +with LocalSparkContext { + + private val clock = new ManualClock(0) + + private var blacklist: BlacklistTracker = _ + private var scheduler: TaskSchedulerImpl = _ + private var conf: SparkConf = _ + + override def beforeEach(): Unit = { +conf = new SparkConf().setAppName("test").setMaster("local") + .set(config.BLACKLIST_ENABLED.key, "true") +scheduler = mockTaskSchedWithConf(conf) + +clock.setTime(0) +blacklist = new BlacklistTracker(conf, clock) + } + + override def afterEach(): Unit = { +if (blacklist != null) { + blacklist = null +} +if (scheduler != null) { + scheduler.stop() + scheduler = null +} +super.afterEach() + } + + val allExecutorAndHostIds = (('A' to 'Z').map("host" + _) ++ (1 to 100).map{_.toString}).toSet --- End diff -- can you add a comment here saying all host / executor IDs used in this class's tests should be in this list so the below function works?
[GitHub] spark pull request #14079: [SPARK-8425][CORE] Application Level Blacklisting
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r90543377 --- Diff: core/src/main/scala/org/apache/spark/scheduler/BlacklistTracker.scala --- @@ -17,10 +17,275 @@ package org.apache.spark.scheduler +import java.util.concurrent.atomic.AtomicReference + +import scala.collection.mutable.{ArrayBuffer, HashMap, HashSet} + import org.apache.spark.SparkConf import org.apache.spark.internal.Logging import org.apache.spark.internal.config -import org.apache.spark.util.Utils +import org.apache.spark.util.{Clock, SystemClock, Utils} + +/** + * BlacklistTracker is designed to track problematic executors and nodes. It supports blacklisting + * executors and nodes across an entire application (with a periodic expiry). TaskSetManagers add + * additional blacklisting of executors and nodes for individual tasks and stages which works in + * concert with the blacklisting here. + * + * The tracker needs to deal with a variety of workloads, eg.: + * + * * bad user code -- this may lead to many task failures, but that should not count against + * individual executors + * * many small stages -- this may prevent a bad executor for having many failures within one + * stage, but still many failures over the entire application + * * "flaky" executors -- they don't fail every task, but are still faulty enough to merit + * blacklisting + * + * See the design doc on SPARK-8425 for a more in-depth discussion. + * + * THREADING: As with most helpers of TaskSchedulerImpl, this is not thread-safe. Though it is + * called by multiple threads, callers must already have a lock on the TaskSchedulerImpl. The + * one exception is [[nodeBlacklist()]], which can be called without holding a lock. 
+ */ +private[scheduler] class BlacklistTracker ( +conf: SparkConf, +clock: Clock = new SystemClock()) extends Logging { + + BlacklistTracker.validateBlacklistConfs(conf) + private val MAX_FAILURES_PER_EXEC = conf.get(config.MAX_FAILURES_PER_EXEC) + private val MAX_FAILED_EXEC_PER_NODE = conf.get(config.MAX_FAILED_EXEC_PER_NODE) + val BLACKLIST_TIMEOUT_MILLIS = BlacklistTracker.getBlacklistTimeout(conf) + + /** + * A map from executorId to information on task failures. Tracks the time of each task failure, + * so that we can avoid blacklisting executors due to failures that are very far apart. We do not + * actively remove from this as soon as tasks hit their timeouts, to avoid the time it would take + * to do so. But it will not grow too large, because as soon as an executor gets too many + * failures, we blacklist the executor and remove its entry here. + */ + private val executorIdToFailureList = new HashMap[String, ExecutorFailureList]() + val executorIdToBlacklistStatus = new HashMap[String, BlacklistedExecutor]() + val nodeIdToBlacklistExpiryTime = new HashMap[String, Long]() + /** + * An immutable copy of the set of nodes that are currently blacklisted. Kept in an + * AtomicReference to make [[nodeBlacklist()]] thread-safe. + */ + private val _nodeBlacklist = new AtomicReference[Set[String]](Set()) + /** + * Time when the next blacklist will expire. Used as a + * shortcut to avoid iterating over all entries in the blacklist when none will have expired. + */ + var nextExpiryTime: Long = Long.MaxValue + /** + * Mapping from nodes to all of the executors that have been blacklisted on that node. We do *not* + * remove from this when executors are removed from spark, so we can track when we get multiple + * successive blacklisted executors on one node. 
Nonetheless, it will not grow too large because + * there cannot be many blacklisted executors on one node, before we stop requesting more + * executors on that node, and we clean up the list of blacklisted executors once an executor has + * been blacklisted for BLACKLIST_TIMEOUT_MILLIS. + */ + val nodeToBlacklistedExecs = new HashMap[String, HashSet[String]]() + + /** + * Un-blacklists executors and nodes that have been blacklisted for at least + * BLACKLIST_TIMEOUT_MILLIS + */ + def applyBlacklistTimeout(): Unit = { +val now = clock.getTimeMillis() +// quickly check if we've got anything to expire from blacklist -- if not, avoid doing any work +if (now > nextExpiryTime) { + // Apply the timeout to blacklisted nodes and executors + val execsToUnblacklist = executorIdToBlacklistStatus.filter(_._2.expiryTime < now).keys + if (execsToUnblacklist.nonEmpty) { +// Un-blacklist any executors that have been blacklisted longer
[GitHub] spark pull request #14079: [SPARK-8425][CORE] Application Level Blacklisting
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r92078434 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -584,6 +614,14 @@ private[spark] class TaskSchedulerImpl( executorIdToTaskCount.getOrElse(execId, -1) > 0 } + /** + * Get a snapshot of the currently blacklisted nodes for the entire application. This is + * thread-safe -- it can be called without a lock on the TaskScheduler. + */ + def nodeBlacklist(): scala.collection.immutable.Set[String] = { --- End diff -- Got it that makes sense / I agree.
[GitHub] spark pull request #14079: [SPARK-8425][CORE] Application Level Blacklisting
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r92079769 --- Diff: core/src/test/scala/org/apache/spark/scheduler/BlacklistTrackerSuite.scala --- @@ -17,10 +17,322 @@ package org.apache.spark.scheduler -import org.apache.spark.{SparkConf, SparkFunSuite} +import org.mockito.Mockito.when +import org.scalatest.BeforeAndAfterEach +import org.scalatest.mock.MockitoSugar + +import org.apache.spark._ import org.apache.spark.internal.config +import org.apache.spark.util.ManualClock + +class BlacklistTrackerSuite extends SparkFunSuite with BeforeAndAfterEach with MockitoSugar +with LocalSparkContext { + + private val clock = new ManualClock(0) + + private var blacklist: BlacklistTracker = _ + private var scheduler: TaskSchedulerImpl = _ + private var conf: SparkConf = _ + + override def beforeEach(): Unit = { +conf = new SparkConf().setAppName("test").setMaster("local") + .set(config.BLACKLIST_ENABLED.key, "true") +scheduler = mockTaskSchedWithConf(conf) + +clock.setTime(0) +blacklist = new BlacklistTracker(conf, clock) + } + + override def afterEach(): Unit = { +if (blacklist != null) { + blacklist = null +} +if (scheduler != null) { + scheduler.stop() + scheduler = null +} +super.afterEach() + } + + val allExecutorAndHostIds = (('A' to 'Z').map("host" + _) ++ (1 to 100).map{_.toString}).toSet + + /** + * Its easier to write our tests as if we could directly look at the sets of nodes & executors in + * the blacklist. However the api doesn't expose a set, so this is a simple way to test + * something similar, since we know the universe of values that might appear in these sets. 
+ */ + def assertEquivalentToSet(f: String => Boolean, expected: Set[String]): Unit = { +allExecutorAndHostIds.foreach { id => + val actual = f(id) + val exp = expected.contains(id) + assert(actual === exp, raw"""for string "$id" """) +} + } + + def mockTaskSchedWithConf(conf: SparkConf): TaskSchedulerImpl = { +sc = new SparkContext(conf) +val scheduler = mock[TaskSchedulerImpl] +when(scheduler.sc).thenReturn(sc) + when(scheduler.mapOutputTracker).thenReturn(SparkEnv.get.mapOutputTracker) +scheduler + } + + def createTaskSetBlacklist(stageId: Int = 0): TaskSetBlacklist = { +new TaskSetBlacklist(conf, stageId, clock) + } + + test("executors can be blacklisted with only a few failures per stage") { +// For 4 different stages, executor 1 fails a task, then executor 2 succeeds the task, --- End diff -- I don't think this is for 4 different stages? (looks like ~20?)
[GitHub] spark pull request #14079: [SPARK-8425][CORE] Application Level Blacklisting
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r90543538 --- Diff: core/src/main/scala/org/apache/spark/scheduler/BlacklistTracker.scala --- @@ -17,10 +17,254 @@ package org.apache.spark.scheduler +import java.util.concurrent.atomic.AtomicReference + +import scala.collection.mutable.{ArrayBuffer, HashMap, HashSet} + import org.apache.spark.SparkConf import org.apache.spark.internal.Logging import org.apache.spark.internal.config -import org.apache.spark.util.Utils +import org.apache.spark.util.{Clock, SystemClock, Utils} + +/** + * BlacklistTracker is designed to track problematic executors and nodes. It supports blacklisting + * executors and nodes across an entire application (with a periodic expiry). TaskSetManagers add + * additional blacklisting of executors and nodes for individual tasks and stages which works in + * concert with the blacklisting here. + * + * The tracker needs to deal with a variety of workloads, eg.: + * + * * bad user code -- this may lead to many task failures, but that should not count against + * individual executors + * * many small stages -- this may prevent a bad executor for having many failures within one + * stage, but still many failures over the entire application + * * "flaky" executors -- they don't fail every task, but are still faulty enough to merit + * blacklisting + * + * See the design doc on SPARK-8425 for a more in-depth discussion. + * + * THREADING: As with most helpers of TaskSchedulerImpl, this is not thread-safe. Though it is + * called by multiple threads, callers must already have a lock on the TaskSchedulerImpl. The + * one exception is [[nodeBlacklist()]], which can be called without holding a lock. 
+ */ +private[scheduler] class BlacklistTracker ( +conf: SparkConf, +clock: Clock = new SystemClock()) extends Logging { + + BlacklistTracker.validateBlacklistConfs(conf) + private val MAX_FAILURES_PER_EXEC = conf.get(config.MAX_FAILURES_PER_EXEC) + private val MAX_FAILED_EXEC_PER_NODE = conf.get(config.MAX_FAILED_EXEC_PER_NODE) + val BLACKLIST_TIMEOUT_MILLIS = BlacklistTracker.getBlacklistTimeout(conf) + + /** + * A map from executorId to information on task failures. Tracks the time of each task failure, + * so that we can avoid blacklisting executors due to failures that are very far apart. We do not + * actively remove from this as soon as tasks hit their timeouts, to avoid the time it would take + * to do so. But it will not grow too large, because as soon as an executor gets too many + * failures, we blacklist the executor and remove its entry here. + */ + private val executorIdToFailureList = new HashMap[String, ExecutorFailureList]() + val executorIdToBlacklistStatus = new HashMap[String, BlacklistedExecutor]() + val nodeIdToBlacklistExpiryTime = new HashMap[String, Long]() + /** + * An immutable copy of the set of nodes that are currently blacklisted. Kept in an + * AtomicReference to make [[nodeBlacklist()]] thread-safe. + */ + private val _nodeBlacklist = new AtomicReference[Set[String]](Set()) + /** + * Time when the next blacklist will expire. Used as a + * shortcut to avoid iterating over all entries in the blacklist when none will have expired. + */ + var nextExpiryTime: Long = Long.MaxValue + /** + * Mapping from nodes to all of the executors that have been blacklisted on that node. We do *not* + * remove from this when executors are removed from spark, so we can track when we get multiple + * successive blacklisted executors on one node. 
Nonetheless, it will not grow too large because + * there cannot be many blacklisted executors on one node, before we stop requesting more + * executors on that node, and we periodically clean up the list of blacklisted executors. + */ + val nodeToBlacklistedExecs = new HashMap[String, HashSet[String]]() + + /** + * Un-blacklists executors and nodes that have been blacklisted for at least + * BLACKLIST_TIMEOUT_MILLIS + */ + def applyBlacklistTimeout(): Unit = { +val now = clock.getTimeMillis() +// quickly check if we've got anything to expire from blacklist -- if not, avoid doing any work +if (now > nextExpiryTime) { + // Apply the timeout to blacklisted nodes and executors + val execsToUnblacklist = executorIdToBlacklistStatus.filter(_._2.expiryTime < now).keys + if (execsToUnblacklist.nonEmpty) { +// Un-blacklist any executors that have been blacklisted longer than the blacklist timeout. +logInfo(s"Removing ex
[GitHub] spark pull request #14079: [SPARK-8425][CORE] Application Level Blacklisting
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r92103550 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala --- @@ -217,18 +219,35 @@ private[yarn] class YarnAllocator( * @param localityAwareTasks number of locality aware tasks to be used as container placement hint * @param hostToLocalTaskCount a map of preferred hostname to possible task counts to be used as * container placement hint. + * @param nodeBlacklist a set of blacklisted nodes, which is passed in to avoid allocating new +* containers on them. It will be used to update the application master's +* blacklist. * @return Whether the new requested total is different than the old value. */ def requestTotalExecutorsWithPreferredLocalities( requestedTotal: Int, localityAwareTasks: Int, - hostToLocalTaskCount: Map[String, Int]): Boolean = synchronized { + hostToLocalTaskCount: Map[String, Int], + nodeBlacklist: Set[String]): Boolean = synchronized { this.numLocalityAwareTasks = localityAwareTasks this.hostToLocalTaskCounts = hostToLocalTaskCount if (requestedTotal != targetNumExecutors) { logInfo(s"Driver requested a total number of $requestedTotal executor(s).") targetNumExecutors = requestedTotal + + // Update blacklist infomation to YARN ResouceManager for this application, + // in order to avoid allocating new Containers on the problematic nodes. + val blacklistAdditions = nodeBlacklist -- currentNodeBlacklist + val blacklistRemovals = currentNodeBlacklist -- nodeBlacklist + if (blacklistAdditions.nonEmpty) { +logInfo(s"adding nodes to blacklist: $blacklistAdditions") --- End diff -- can you add "YARN application master" here and below? (so "adding nodes to YARN application master's blacklist: $...") (to differentiate clearly from the app-level blacklisting used elsewhere)
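The diff quoted above reports only incremental changes to YARN: new blacklist entries are the set difference of the current blacklist against the previously reported one, and vice versa for removals. A standalone sketch of that delta computation (object and variable names are illustrative):

```scala
// Sketch of incremental blacklist reporting: compare the newly computed
// blacklist against the last state we reported, and emit only the deltas.
object BlacklistDelta {
  def deltas(current: Set[String],
             previous: Set[String]): (Set[String], Set[String]) = {
    val additions = current -- previous // nodes newly blacklisted
    val removals  = previous -- current // nodes whose blacklisting expired
    (additions, removals)
  }
}
```

Sending only the deltas keeps the update idempotent and avoids re-reporting nodes the resource manager already knows about.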
[GitHub] spark pull request #14079: [SPARK-8425][CORE] Application Level Blacklisting
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r92081806 --- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala --- @@ -408,6 +411,96 @@ class TaskSchedulerImplSuite extends SparkFunSuite with LocalSparkContext with B } assert(tsm.isZombie) } + +// the tasksSets complete, so the tracker should be notified of the successful ones +verify(blacklist, times(1)).updateBlacklistForSuccessfulTaskSet( + stageId = 0, + stageAttemptId = 0, + failuresByExec = stageToMockTaskSetBlacklist(0).execToFailures) +verify(blacklist, times(1)).updateBlacklistForSuccessfulTaskSet( + stageId = 1, + stageAttemptId = 0, + failuresByExec = stageToMockTaskSetBlacklist(1).execToFailures) +// but we shouldn't update for the failed taskset +verify(blacklist, never).updateBlacklistForSuccessfulTaskSet( + stageId = meq(2), + stageAttemptId = anyInt(), + failuresByExec = anyObject()) + } + + test("scheduled tasks obey node and executor blacklists") { --- End diff -- why not have this test be a normal case (i.e., when the tsm doesn't get aborted, but the blacklist is obeyed)? That seems like the more common case, and abortion is tested in the test below
[GitHub] spark pull request #14079: [SPARK-8425][CORE] Application Level Blacklisting
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r92103779

--- Diff: yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala ---
@@ -121,13 +121,20 @@ private[spark] abstract class YarnSchedulerBackend(
     }
   }
+
+  private[cluster] def prepareRequestExecutors(requestedTotal: Int): RequestExecutors = {
+    val nodeBlacklist: Set[String] = scheduler.nodeBlacklist()
+    val filteredHostToLocalTaskCount =
+      hostToLocalTaskCount.filter { case (k, v) => !nodeBlacklist.contains(k) }

--- End diff --

Ok, cool. Can you add a comment here with the first thing that you said ("For locality preferences, ignore preferences for nodes that are blacklisted")?
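The filter under review simply drops locality preferences for hosts that are on the node blacklist. A rough Java sketch of the same idea (self-contained illustration only; the names mirror the Scala diff but this is not Spark's implementation):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class LocalityFilter {
    // Drop locality preferences for any host that is currently blacklisted,
    // so the resource request does not ask for containers on bad nodes.
    static Map<String, Integer> filterBlacklisted(
            Map<String, Integer> hostToLocalTaskCount, Set<String> nodeBlacklist) {
        return hostToLocalTaskCount.entrySet().stream()
                .filter(e -> !nodeBlacklist.contains(e.getKey()))
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("hostA", 3);
        counts.put("hostB", 5);
        Map<String, Integer> filtered = filterBlacklisted(counts, Set.of("hostA"));
        assert !filtered.containsKey("hostA"); // blacklisted host is ignored
        assert filtered.get("hostB") == 5;     // healthy host preference survives
    }
}
```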
[GitHub] spark pull request #14079: [SPARK-8425][CORE] Application Level Blacklisting
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r92103358

--- Diff: docs/configuration.md ---
@@ -1339,6 +1347,28 @@ Apart from these, the following properties are also available, and may be useful

+ spark.blacklist.application.maxFailedTasksPerExecutor
+ 2
+ (Experimental) How many different tasks must fail on one executor, in successful task sets,
+ before the executor is blacklisted for the entire application. Blacklisted executors will
+ be automatically added back to the pool of available resources after the timeout specified by
+ spark.blacklist.timeout. Note that with dynamic allocation, though, the executors
+ may get marked as idle and be reclaimed by the cluster manager.
+
+ spark.blacklist.application.maxFailedExecutorsPerNode
+ 2
+ (Experimental) How many different executors must be blacklisted for the entire application,
+ before the node is blacklisted for the entire application. Blacklisted nodes will
+ be automatically added back to the pool of available resources after the timeout specified by
+ spark.blacklist.timeout. Note that with dynamic allocation, though, the executors
+ may get marked as idle and be reclaimed by the cluster manager.

--- End diff --

the executors on the node may
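The two properties describe a two-level escalation: enough distinct task failures blacklist an executor, and enough blacklisted executors on one node (default 2) blacklist the whole node. A toy Java sketch of the node-level escalation (illustrative only; the constant and names are assumptions modeled on the docs, not Spark internals):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class NodeEscalation {
    // Mirrors the documented default of spark.blacklist.application.maxFailedExecutorsPerNode.
    static final int MAX_FAILED_EXECUTORS_PER_NODE = 2;

    private final Map<String, Set<String>> nodeToBlacklistedExecs = new HashMap<>();
    private final Set<String> blacklistedNodes = new HashSet<>();

    // When enough distinct executors on one node have been blacklisted,
    // escalate and blacklist the node itself.
    void blacklistExecutor(String node, String execId) {
        Set<String> execs = nodeToBlacklistedExecs.computeIfAbsent(node, k -> new HashSet<>());
        execs.add(execId);
        if (execs.size() >= MAX_FAILED_EXECUTORS_PER_NODE) {
            blacklistedNodes.add(node);
        }
    }

    boolean isNodeBlacklisted(String node) {
        return blacklistedNodes.contains(node);
    }

    public static void main(String[] args) {
        NodeEscalation n = new NodeEscalation();
        n.blacklistExecutor("hostA", "1");
        assert !n.isNodeBlacklisted("hostA"); // one bad executor is not enough
        n.blacklistExecutor("hostA", "2");
        assert n.isNodeBlacklisted("hostA");  // second bad executor trips the node
    }
}
```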
[GitHub] spark pull request #14079: [SPARK-8425][CORE] Application Level Blacklisting
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r92077755

--- Diff: core/src/main/scala/org/apache/spark/scheduler/ExecutorFailuresInTaskSet.scala ---
@@ -25,26 +25,30 @@ import scala.collection.mutable.HashMap
 private[scheduler] class ExecutorFailuresInTaskSet(val node: String) {
   /**
    * Mapping from index of the tasks in the taskset, to the number of times it has failed on this
-   * executor.
+   * executor and the failure time.

--- End diff --

Can you say "and the last failure time" or "and the most recent failure time"? Otherwise it's confusing, since there's a count of failures, and it's unclear which failure the time belongs to.
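The structure being documented is a per-executor map from task index to a (failure count, most recent failure time) pair, which is exactly why the reviewer wants the comment to say which failure the time refers to: every failure bumps the count, but only the latest timestamp is kept. A hypothetical Java sketch of that update (not Spark code):

```java
import java.util.HashMap;
import java.util.Map;

public class TaskFailures {
    // For each task index: entry[0] = failure count, entry[1] = most recent failure time (ms).
    private final Map<Integer, long[]> failures = new HashMap<>();

    void recordFailure(int taskIndex, long nowMillis) {
        long[] entry = failures.computeIfAbsent(taskIndex, k -> new long[2]);
        entry[0] += 1;        // bump the count on every failure
        entry[1] = nowMillis; // overwrite: only the most recent failure time is kept
    }

    long countFor(int taskIndex) {
        return failures.getOrDefault(taskIndex, new long[2])[0];
    }

    long lastFailureTime(int taskIndex) {
        return failures.getOrDefault(taskIndex, new long[2])[1];
    }

    public static void main(String[] args) {
        TaskFailures f = new TaskFailures();
        f.recordFailure(0, 100L);
        f.recordFailure(0, 250L);
        assert f.countFor(0) == 2;          // two failures counted
        assert f.lastFailureTime(0) == 250L; // but only the latest time survives
    }
}
```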
[GitHub] spark pull request #14079: [SPARK-8425][CORE] Application Level Blacklisting
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r92081374

--- Diff: core/src/test/scala/org/apache/spark/scheduler/SchedulerIntegrationSuite.scala ---
@@ -157,8 +160,16 @@ abstract class SchedulerIntegrationSuite[T <: MockBackend: ClassTag] extends Spa
     }
     // When a job fails, we terminate before waiting for all the task end events to come in,
     // so there might still be a running task set. So we only check these conditions
-    // when the job succeeds
-    assert(taskScheduler.runningTaskSets.isEmpty)
+    // when the job succeeds.
+    // When the final task of a taskset completes, we post
+    // the event to the DAGScheduler event loop before we finish processing in the taskscheduler
+    // thread. Its possible the DAGScheduler thread processes the event, finishes the job,

--- End diff --

It's
[GitHub] spark pull request #14079: [SPARK-8425][CORE] Application Level Blacklisting
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r90543965

--- Diff: core/src/main/scala/org/apache/spark/scheduler/BlacklistTracker.scala ---
@@ -17,10 +17,275 @@
 package org.apache.spark.scheduler

+import java.util.concurrent.atomic.AtomicReference
+
+import scala.collection.mutable.{ArrayBuffer, HashMap, HashSet}
+
 import org.apache.spark.SparkConf
 import org.apache.spark.internal.Logging
 import org.apache.spark.internal.config
-import org.apache.spark.util.Utils
+import org.apache.spark.util.{Clock, SystemClock, Utils}
+
+/**
+ * BlacklistTracker is designed to track problematic executors and nodes. It supports blacklisting
+ * executors and nodes across an entire application (with a periodic expiry). TaskSetManagers add
+ * additional blacklisting of executors and nodes for individual tasks and stages which works in
+ * concert with the blacklisting here.
+ *
+ * The tracker needs to deal with a variety of workloads, eg.:
+ *
+ *  * bad user code -- this may lead to many task failures, but that should not count against
+ *    individual executors
+ *  * many small stages -- this may prevent a bad executor for having many failures within one
+ *    stage, but still many failures over the entire application
+ *  * "flaky" executors -- they don't fail every task, but are still faulty enough to merit
+ *    blacklisting
+ *
+ * See the design doc on SPARK-8425 for a more in-depth discussion.
+ *
+ * THREADING: As with most helpers of TaskSchedulerImpl, this is not thread-safe. Though it is
+ * called by multiple threads, callers must already have a lock on the TaskSchedulerImpl. The
+ * one exception is [[nodeBlacklist()]], which can be called without holding a lock.
+ */
+private[scheduler] class BlacklistTracker (
+    conf: SparkConf,
+    clock: Clock = new SystemClock()) extends Logging {
+
+  BlacklistTracker.validateBlacklistConfs(conf)
+  private val MAX_FAILURES_PER_EXEC = conf.get(config.MAX_FAILURES_PER_EXEC)
+  private val MAX_FAILED_EXEC_PER_NODE = conf.get(config.MAX_FAILED_EXEC_PER_NODE)
+  val BLACKLIST_TIMEOUT_MILLIS = BlacklistTracker.getBlacklistTimeout(conf)
+
+  /**
+   * A map from executorId to information on task failures. Tracks the time of each task failure,
+   * so that we can avoid blacklisting executors due to failures that are very far apart. We do not
+   * actively remove from this as soon as tasks hit their timeouts, to avoid the time it would take
+   * to do so. But it will not grow too large, because as soon as an executor gets too many
+   * failures, we blacklist the executor and remove its entry here.
+   */
+  private val executorIdToFailureList = new HashMap[String, ExecutorFailureList]()
+  val executorIdToBlacklistStatus = new HashMap[String, BlacklistedExecutor]()
+  val nodeIdToBlacklistExpiryTime = new HashMap[String, Long]()
+  /**
+   * An immutable copy of the set of nodes that are currently blacklisted. Kept in an
+   * AtomicReference to make [[nodeBlacklist()]] thread-safe.
+   */
+  private val _nodeBlacklist = new AtomicReference[Set[String]](Set())
+  /**
+   * Time when the next blacklist will expire. Used as a
+   * shortcut to avoid iterating over all entries in the blacklist when none will have expired.
+   */
+  var nextExpiryTime: Long = Long.MaxValue
+  /**
+   * Mapping from nodes to all of the executors that have been blacklisted on that node. We do *not*
+   * remove from this when executors are removed from spark, so we can track when we get multiple
+   * successive blacklisted executors on one node. Nonetheless, it will not grow too large because
+   * there cannot be many blacklisted executors on one node, before we stop requesting more
+   * executors on that node, and we clean up the list of blacklisted executors once an executor has
+   * been blacklisted for BLACKLIST_TIMEOUT_MILLIS.
+   */
+  val nodeToBlacklistedExecs = new HashMap[String, HashSet[String]]()
+
+  /**
+   * Un-blacklists executors and nodes that have been blacklisted for at least
+   * BLACKLIST_TIMEOUT_MILLIS
+   */
+  def applyBlacklistTimeout(): Unit = {
+    val now = clock.getTimeMillis()
+    // quickly check if we've got anything to expire from blacklist -- if not, avoid doing any work
+    if (now > nextExpiryTime) {
+      // Apply the timeout to blacklisted nodes and executors
+      val execsToUnblacklist = executorIdToBlacklistStatus.filter(_._2.expiryTime < now).keys
+      if (execsToUnblacklist.nonEmpty) {
+        // Un-blacklist any executors that have been blacklisted longer
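The `nextExpiryTime` field in the quoted code is a shortcut: when the earliest expiry is still in the future, `applyBlacklistTimeout` returns without scanning the whole blacklist. A self-contained Java sketch of that pattern (illustrative only; names are borrowed from the diff, and the real tracker also handles nodes and failure lists):

```java
import java.util.HashMap;
import java.util.Map;

public class ExpiryTracker {
    private final Map<String, Long> execToExpiryTime = new HashMap<>();
    // Earliest expiry across all blacklisted executors; Long.MAX_VALUE means "nothing to expire".
    private long nextExpiryTime = Long.MAX_VALUE;

    void blacklist(String execId, long now, long timeoutMillis) {
        long expiry = now + timeoutMillis;
        execToExpiryTime.put(execId, expiry);
        nextExpiryTime = Math.min(nextExpiryTime, expiry);
    }

    // Un-blacklist anything whose timeout has passed. The nextExpiryTime check lets
    // the common case (nothing expired yet) return without iterating the map.
    void applyTimeout(long now) {
        if (now <= nextExpiryTime) {
            return; // shortcut: the earliest expiry is still in the future
        }
        execToExpiryTime.values().removeIf(expiry -> expiry < now);
        nextExpiryTime = execToExpiryTime.values().stream()
                .mapToLong(Long::longValue).min().orElse(Long.MAX_VALUE);
    }

    boolean isBlacklisted(String execId) {
        return execToExpiryTime.containsKey(execId);
    }

    public static void main(String[] args) {
        ExpiryTracker t = new ExpiryTracker();
        t.blacklist("exec1", 0L, 1000L);
        t.applyTimeout(500L);           // nothing can have expired: fast path
        assert t.isBlacklisted("exec1");
        t.applyTimeout(1001L);          // past the expiry: executor is released
        assert !t.isBlacklisted("exec1");
    }
}
```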
[GitHub] spark pull request #14079: [SPARK-8425][CORE] Application Level Blacklisting
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r90147862

--- Diff: core/src/main/scala/org/apache/spark/scheduler/BlacklistTracker.scala ---
@@ -17,10 +17,272 @@
 package org.apache.spark.scheduler

+import java.util.concurrent.atomic.AtomicReference
+
+import scala.collection.mutable.{ArrayBuffer, HashMap, HashSet}
+
 import org.apache.spark.SparkConf
 import org.apache.spark.internal.Logging
 import org.apache.spark.internal.config
-import org.apache.spark.util.Utils
+import org.apache.spark.util.{Clock, SystemClock, Utils}
+
+/**
+ * BlacklistTracker is designed to track problematic executors and nodes. It supports blacklisting
+ * executors and nodes across an entire application (with a periodic expiry). TaskSetManagers add
+ * additional blacklisting of executors and nodes for individual tasks and stages which works in
+ * concert with the blacklisting here.
+ *
+ * The tracker needs to deal with a variety of workloads, eg.:
+ *
+ *  * bad user code -- this may lead to many task failures, but that should not count against
+ *    individual executors
+ *  * many small stages -- this may prevent a bad executor for having many failures within one
+ *    stage, but still many failures over the entire application
+ *  * "flaky" executors -- they don't fail every task, but are still faulty enough to merit
+ *    blacklisting
+ *
+ * See the design doc on SPARK-8425 for a more in-depth discussion.
+ *
+ * THREADING: As with most helpers of TaskSchedulerImpl, this is not thread-safe. Though it is
+ * called by multiple threads, callers must already have a lock on the TaskSchedulerImpl. The
+ * one exception is [[nodeBlacklist()]], which can be called without holding a lock.
+ */
+private[scheduler] class BlacklistTracker (
+    conf: SparkConf,
+    clock: Clock = new SystemClock()) extends Logging {
+
+  BlacklistTracker.validateBlacklistConfs(conf)
+  private val MAX_FAILURES_PER_EXEC = conf.get(config.MAX_FAILURES_PER_EXEC)
+  private val MAX_FAILED_EXEC_PER_NODE = conf.get(config.MAX_FAILED_EXEC_PER_NODE)
+  val BLACKLIST_TIMEOUT_MILLIS = BlacklistTracker.getBlacklistTimeout(conf)
+
+  /**
+   * A map from executorId to information on task failures. Tracks the time of each task failure,
+   * so that we can avoid blacklisting executors due to failures that are very far apart. We do not
+   * actively remove from this as soon as tasks hit their timeouts, to avoid the time it would take
+   * to do so. But it will not grow too large, because as soon as an executor gets too many
+   * failures, we blacklist the executor and remove its entry here.
+   */
+  private val executorIdToFailureList = new  HashMap[String, ExecutorFailureList]()

--- End diff --

Super nit: remove the extra space between "new" and "HashMap".
[GitHub] spark pull request #14079: [SPARK-8425][CORE] Application Level Blacklisting
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r92103809

--- Diff: yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala ---
@@ -121,13 +121,20 @@ private[spark] abstract class YarnSchedulerBackend(
     }
   }
+
+  private[cluster] def prepareRequestExecutors(requestedTotal: Int): RequestExecutors = {

--- End diff --

Ok makes sense
[GitHub] spark pull request #14079: [SPARK-8425][CORE] Application Level Blacklisting
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r92078684

--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -678,4 +716,13 @@ private[spark] object TaskSchedulerImpl {
     retval.toList
   }
+
+  private def maybeCreateBlacklistTracker(conf: SparkConf): Option[BlacklistTracker] = {

--- End diff --

Does it make sense for this method to be in the BlacklistTracker object? (Then it could just be BlacklistTracker.maybeCreate().)
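The suggestion is to move the factory next to the class it creates, as a companion-object method returning the tracker only when blacklisting is enabled. A hypothetical Java sketch of that shape (the `spark.blacklist.enabled` key matches the `config.BLACKLIST_ENABLED` flag set in the test suite elsewhere in this thread, but the code below is an illustration, not Spark's):

```java
import java.util.Map;
import java.util.Optional;

public class MaybeCreate {
    static class Tracker {} // stand-in for BlacklistTracker

    // Factory colocated with the class it creates: callers just ask
    // MaybeCreate.maybeCreate(conf) and get an empty Optional when disabled.
    static Optional<Tracker> maybeCreate(Map<String, String> conf) {
        boolean enabled = Boolean.parseBoolean(
                conf.getOrDefault("spark.blacklist.enabled", "false"));
        return enabled ? Optional.of(new Tracker()) : Optional.empty();
    }

    public static void main(String[] args) {
        assert maybeCreate(Map.of()).isEmpty(); // disabled by default
        assert maybeCreate(Map.of("spark.blacklist.enabled", "true")).isPresent();
    }
}
```

Keeping the factory on the companion object means the enable/disable decision lives with the tracker itself rather than in every caller.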
[GitHub] spark pull request #14079: [SPARK-8425][CORE] Application Level Blacklisting
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r92103410

--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -691,11 +691,11 @@ private[spark] class ApplicationMaster(
   }

   override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
-    case RequestExecutors(requestedTotal, localityAwareTasks, hostToLocalTaskCount) =>
+    case r: RequestExecutors =>

--- End diff --

Ok cool, seems fine
[GitHub] spark pull request #14079: [SPARK-8425][CORE] Application Level Blacklisting
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r92079984

--- Diff: core/src/test/scala/org/apache/spark/scheduler/BlacklistTrackerSuite.scala ---
@@ -17,10 +17,322 @@
 package org.apache.spark.scheduler

-import org.apache.spark.{SparkConf, SparkFunSuite}
+import org.mockito.Mockito.when
+import org.scalatest.BeforeAndAfterEach
+import org.scalatest.mock.MockitoSugar
+
+import org.apache.spark._
 import org.apache.spark.internal.config
+import org.apache.spark.util.ManualClock
+
+class BlacklistTrackerSuite extends SparkFunSuite with BeforeAndAfterEach with MockitoSugar
+    with LocalSparkContext {
+
+  private val clock = new ManualClock(0)
+
+  private var blacklist: BlacklistTracker = _
+  private var scheduler: TaskSchedulerImpl = _
+  private var conf: SparkConf = _
+
+  override def beforeEach(): Unit = {
+    conf = new SparkConf().setAppName("test").setMaster("local")
+      .set(config.BLACKLIST_ENABLED.key, "true")
+    scheduler = mockTaskSchedWithConf(conf)
+
+    clock.setTime(0)
+    blacklist = new BlacklistTracker(conf, clock)
+  }
+
+  override def afterEach(): Unit = {
+    if (blacklist != null) {
+      blacklist = null
+    }
+    if (scheduler != null) {
+      scheduler.stop()
+      scheduler = null
+    }
+    super.afterEach()
+  }
+
+  val allExecutorAndHostIds = (('A' to 'Z').map("host" + _) ++ (1 to 100).map{_.toString}).toSet
+
+  /**
+   * Its easier to write our tests as if we could directly look at the sets of nodes & executors in
+   * the blacklist. However the api doesn't expose a set, so this is a simple way to test
+   * something similar, since we know the universe of values that might appear in these sets.
+   */
+  def assertEquivalentToSet(f: String => Boolean, expected: Set[String]): Unit = {
+    allExecutorAndHostIds.foreach { id =>
+      val actual = f(id)
+      val exp = expected.contains(id)
+      assert(actual === exp, raw"""for string "$id" """)
+    }
+  }
+
+  def mockTaskSchedWithConf(conf: SparkConf): TaskSchedulerImpl = {
+    sc = new SparkContext(conf)
+    val scheduler = mock[TaskSchedulerImpl]
+    when(scheduler.sc).thenReturn(sc)
+    when(scheduler.mapOutputTracker).thenReturn(SparkEnv.get.mapOutputTracker)
+    scheduler
+  }
+
+  def createTaskSetBlacklist(stageId: Int = 0): TaskSetBlacklist = {
+    new TaskSetBlacklist(conf, stageId, clock)
+  }
+
+  test("executors can be blacklisted with only a few failures per stage") {
+    // For 4 different stages, executor 1 fails a task, then executor 2 succeeds the task,
+    // and then the task set is done. Not enough failures to blacklist the executor *within*
+    // any particular taskset, but we still blacklist the executor overall eventually.
+    // Also, we intentionally have a mix of task successes and failures -- there are even some
+    // successes after the executor is blacklisted. The idea here is those tasks get scheduled
+    // before the executor is blacklisted. We might get successes after blacklisting (because the
+    // executor might be flaky but not totally broken). But successes should not unblacklist the
+    // executor.
+    val failuresUntilBlacklisted = conf.get(config.MAX_FAILURES_PER_EXEC)
+    var failuresSoFar = 0
+    (0 until failuresUntilBlacklisted * 10).foreach { stageId =>
+      val taskSetBlacklist = createTaskSetBlacklist(stageId)
+      if (stageId % 2 == 0) {
+        // fail every other task
+        taskSetBlacklist.updateBlacklistForFailedTask("hostA", exec = "1", index = 0)
+        failuresSoFar += 1
+      }
+      blacklist.updateBlacklistForSuccessfulTaskSet(stageId, 0, taskSetBlacklist.execToFailures)
+      assert(failuresSoFar == stageId / 2 + 1)
+      if (failuresSoFar < failuresUntilBlacklisted) {
+        assertEquivalentToSet(blacklist.isExecutorBlacklisted(_), Set())
+      } else {
+        assertEquivalentToSet(blacklist.isExecutorBlacklisted(_), Set("1"))
+      }
+    }
+  }
+
+  // If an executor has many task failures, but the task set ends up failing, it shouldn't be
+  // counted against the executor.
+  test("executors aren't blacklisted if task sets fail") {

--- End diff --

"executors aren't blacklisted as a result of tasks in failed task sets"?
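The `assertEquivalentToSet` helper in the diff above compares a membership predicate against an expected set over a known universe of ids, because the blacklist API exposes only a membership test, not the set itself. The same trick in a stand-alone Java sketch (illustrative; this is not the Spark test code):

```java
import java.util.Set;
import java.util.function.Predicate;

public class PredicateVsSet {
    // The API exposes only a membership test, so check the predicate against the
    // expected set for every id in the universe of values that could appear.
    static void assertEquivalentToSet(Set<String> universe,
                                      Predicate<String> f, Set<String> expected) {
        for (String id : universe) {
            if (f.test(id) != expected.contains(id)) {
                throw new AssertionError("mismatch for \"" + id + "\"");
            }
        }
    }

    public static void main(String[] args) {
        Set<String> universe = Set.of("hostA", "hostB", "1", "2");
        Set<String> blacklisted = Set.of("1");
        // Passes: the predicate agrees with the expected set on every id.
        assertEquivalentToSet(universe, blacklisted::contains, Set.of("1"));
    }
}
```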
[GitHub] spark issue #16258: [SPARK-18834][SS] Expose event time stats through Stream...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16258 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70058/ Test PASSed.
[GitHub] spark issue #16258: [SPARK-18834][SS] Expose event time stats through Stream...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16258 Merged build finished. Test PASSed.
[GitHub] spark issue #16247: [SPARK-18817][SparkR] set default spark-warehouse path t...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16247 A new property for the default warehouse: LGTM.
[GitHub] spark issue #16258: [SPARK-18834][SS] Expose event time stats through Stream...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16258 **[Test build #70058 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70058/consoleFull)** for PR 16258 at commit [`b59ab80`](https://github.com/apache/spark/commit/b59ab8083de3f2441133fed35658dea39cd4a759).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15915: [SPARK-18485][CORE] Underlying integer overflow when cre...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15915 **[Test build #70063 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70063/consoleFull)** for PR 15915 at commit [`041992f`](https://github.com/apache/spark/commit/041992fa467d1475eb5b9ec7303b22f21ccf97ea).
[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...
Github user uncleGen commented on a diff in the pull request: https://github.com/apache/spark/pull/15915#discussion_r92103662

    --- Diff: core/src/main/scala/org/apache/spark/util/io/ChunkedByteBufferOutputStream.scala ---
    @@ -30,9 +31,14 @@ import org.apache.spark.storage.StorageUtils
      * @param chunkSize size of each chunk, in bytes.
      */
     private[spark] class ChunkedByteBufferOutputStream(
    -    chunkSize: Int,
    +    var chunkSize: Int,
         allocator: Int => ByteBuffer)
    -  extends OutputStream {
    +  extends OutputStream with Logging {
    +
    +  if (chunkSize < 0) {
    +    logWarning(s"chunkSize should not be a negative value, replaced by the 4MB default.")
    +    chunkSize = 4 * 1024 * 1024
    +  }
    --- End diff --

Since `chunkSize` is passed in from many code paths, and there is an underlying integer overflow when converting from `Long` to `Int`, introducing a protection check may be a reasonable tradeoff as a first step, given that we do not have a better solution. @JoshRosen @srowen
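The overflow discussed above happens because a 64-bit byte count cast to a JVM `Int` wraps around into a negative number. Below is a minimal Python sketch of that truncation and of the defensive fallback in the patch; the helper names are illustrative, not Spark's API.

```python
import ctypes

DEFAULT_CHUNK = 4 * 1024 * 1024  # the 4 MB fallback used in the patch above


def to_jvm_int(n: int) -> int:
    """Truncate an arbitrary integer to a signed 32-bit value,
    mimicking a JVM Long-to-Int cast."""
    return ctypes.c_int32(n & 0xFFFFFFFF).value


def safe_chunk_size(requested_bytes: int) -> int:
    """Mirror the guard in the diff: replace a negative (overflowed)
    chunk size with the default instead of failing later."""
    chunk = to_jvm_int(requested_bytes)
    return DEFAULT_CHUNK if chunk < 0 else chunk


# A 3 GB request exceeds Int.MaxValue (2^31 - 1), so the truncated value is negative:
print(to_jvm_int(3 * 1024 ** 3))       # -1073741824
# The guard falls back to the 4 MB default:
print(safe_chunk_size(3 * 1024 ** 3))  # 4194304
```

This only illustrates why a protection check is needed; the underlying `Long`-to-`Int` conversion in the caller remains the real bug to fix.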
[GitHub] spark issue #16263: [SPARK-18281][SQL][PySpark] Consumes the returned local ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16263 Merged build finished. Test FAILed.
[GitHub] spark issue #16263: [SPARK-18281][SQL][PySpark] Consumes the returned local ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16263 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70062/ Test FAILed.
[GitHub] spark issue #16262: [SPARK-17932][SQL][FOLLOWUP] Change statement `SHOW TABL...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16262 Merged build finished. Test PASSed.
[GitHub] spark issue #16263: [SPARK-18281][SQL][PySpark] Consumes the returned local ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16263 **[Test build #70062 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70062/consoleFull)** for PR 16263 at commit [`6905a70`](https://github.com/apache/spark/commit/6905a700376b2deff77ff539400951cf5e12885d).
* This patch **fails Python style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16142: [SPARK-18716][CORE] Restrict the disk usage of spark eve...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16142 **[Test build #70059 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70059/consoleFull)** for PR 16142 at commit [`c54c7f3`](https://github.com/apache/spark/commit/c54c7f3a1a021ab8d6d47990921d15f5e202b9ed).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16142: [SPARK-18716][CORE] Restrict the disk usage of spark eve...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16142 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70059/ Test FAILed.
[GitHub] spark issue #16262: [SPARK-17932][SQL][FOLLOWUP] Change statement `SHOW TABL...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16262 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70056/ Test PASSed.
[GitHub] spark issue #16263: [SPARK-18281][SQL][PySpark] Consumes the returned local ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16263 **[Test build #70062 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70062/consoleFull)** for PR 16263 at commit [`6905a70`](https://github.com/apache/spark/commit/6905a700376b2deff77ff539400951cf5e12885d).
[GitHub] spark issue #16142: [SPARK-18716][CORE] Restrict the disk usage of spark eve...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16142 Merged build finished. Test FAILed.
[GitHub] spark issue #16262: [SPARK-17932][SQL][FOLLOWUP] Change statement `SHOW TABL...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16262 **[Test build #70056 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70056/consoleFull)** for PR 16262 at commit [`0c875c0`](https://github.com/apache/spark/commit/0c875c00eeb927a47b7d8eb47c9c0c8f27111d47).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #16263: [SPARK-18281][SQL][PySpark] Consumes the returned...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/16263 [SPARK-18281][SQL][PySpark] Consumes the returned local iterator immediately to prevent timeout on the socket serving the data

## What changes were proposed in this pull request?

There is a timeout failure when using `rdd.toLocalIterator()` or `df.toLocalIterator()` with a PySpark RDD or DataFrame:

    df = spark.createDataFrame([[1], [2], [3]])
    it = df.toLocalIterator()
    row = next(it)

The cause of this issue is that we open a socket on the JVM side to serve the data and set a timeout for that socket to accept a connection. If we don't consume the local iterator returned by `toLocalIterator` in Python immediately, the socket times out and the call fails.

## How was this patch tested?

Added tests to PySpark.

Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 fix-pyspark-localiterator

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16263.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16263

commit 6905a700376b2deff77ff539400951cf5e12885d
Author: Liang-Chi Hsieh
Date: 2016-12-13T03:47:10Z
Consumes the returned local iterator immediately to prevent timeout on the socket serving the data.
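The idea behind the fix can be sketched without Spark: drain the remote iterator into a local buffer as soon as it is handed back, so the caller's pace no longer matters to the connection serving the data. This is a hedged, Spark-free sketch; the real patch does the equivalent inside PySpark's serializer layer, and `rows` here stands in for the socket stream.

```python
from typing import Iterable, Iterator, List


def eager_local_iterator(rows: Iterable[List[int]]) -> Iterator[List[int]]:
    # Consume the source immediately, before returning control to the caller.
    # With a lazy generator, a delayed next() would leave the serving socket
    # waiting past its accept timeout; buffering up front avoids that.
    buffered = list(rows)
    return iter(buffered)


it = eager_local_iterator([[1], [2], [3]])
print(next(it))  # [1]
```

The trade-off, of course, is that eager buffering materializes all rows on the consumer side, which works against the streaming intent of `toLocalIterator`; how much to buffer is exactly the kind of question this PR's review has to weigh.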
[GitHub] spark pull request #15717: [SPARK-17910][SQL] Allow users to update the comm...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15717#discussion_r92102030

    --- Diff: sql/core/src/test/resources/sql-tests/inputs/change-column.sql ---
    @@ -0,0 +1,53 @@
    +-- Create the origin table
    +CREATE TABLE test_change(a Int, b String, c Int);
    --- End diff --

nit: in SQL statements, we should upper-case the type strings, i.e. `STRING`, `INT`, etc.
[GitHub] spark pull request #15717: [SPARK-17910][SQL] Allow users to update the comm...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/15717#discussion_r92101718

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
    @@ -274,6 +274,80 @@ case class AlterTableUnsetPropertiesCommand(
     }

    +/**
    + * A command to change the columns for a table; only changing the comments of non-partition
    + * columns is supported for now.
    + *
    + * The syntax of using this command in SQL is:
    + * {{{
    + *   ALTER TABLE table_identifier
    + *   CHANGE [COLUMN] column_old_name column_new_name column_dataType [COMMENT column_comment]
    + *   [FIRST | AFTER column_name];
    + * }}}
    + */
    +case class AlterTableChangeColumnsCommand(
    +    tableName: TableIdentifier,
    +    columns: Map[String, StructField]) extends RunnableCommand {
    +
    +  // TODO: support change column name/dataType/metadata/position.
    +  override def run(sparkSession: SparkSession): Seq[Row] = {
    +    val catalog = sparkSession.sessionState.catalog
    +    val table = catalog.getTableMetadata(tableName)
    +    val resolver = sparkSession.sessionState.conf.resolver
    +    DDLUtils.verifyAlterTableType(catalog, table, isView = false)
    +
    +    // Create a map that converts the origin column to the new column with changed comment;
    +    // throw an Exception if the column reference is invalid or the column name/dataType is
    +    // changed.
    +    val columnsMap = columns.map { case (oldName: String, newField: StructField) =>
    +      // Find the origin column from the schema by column name.
    +      val originColumn = findColumn(table.schema, oldName, resolver)
    +      // Throw an Exception if the column name/dataType is changed.
    +      if (!columnEqual(originColumn, newField, resolver)) {
    +        throw new AnalysisException(
    +          "ALTER TABLE CHANGE COLUMN is not supported for changing column " +
    +            s"'${getDesc(originColumn)}' to '${getDesc(newField)}'")
    +      }
    +      // Create a new column from the origin column with the new comment.
    +      val newColumn = addComment(originColumn, newField.getComment)
    +      // Create the map from origin column to changed column.
    +      originColumn -> newColumn
    --- End diff --

+1
[GitHub] spark pull request #15717: [SPARK-17910][SQL] Allow users to update the comm...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15717#discussion_r92101553

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
    [...]
    +      // Create a new column from the origin column with the new comment.
    +      val newColumn = addComment(originColumn, newField.getComment)
    +      // Create the map from origin column to changed column.
    +      originColumn -> newColumn
    --- End diff --

How about using `originColumn.name` as the map key?
[GitHub] spark pull request #15717: [SPARK-17910][SQL] Allow users to update the comm...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15717#discussion_r92101519

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
    [...]
    +      originColumn -> newColumn
    +    }
    +
    +    val newSchema = table.schema.fields.map(field => columnsMap.getOrElse(field, field))
    --- End diff --

ah i see
[GitHub] spark issue #15996: [SPARK-18567][SQL] Simplify CreateDataSourceTableAsSelec...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15996 **[Test build #70061 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70061/consoleFull)** for PR 15996 at commit [`172f6eb`](https://github.com/apache/spark/commit/172f6eb5eeb36819aaf731c547540c5af90c49cc).
[GitHub] spark issue #15995: [SPARK-18566][SQL] remove OverwriteOptions
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15995 **[Test build #70060 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70060/consoleFull)** for PR 15995 at commit [`ed548e6`](https://github.com/apache/spark/commit/ed548e6437d64e67545f6bcb60384eb8badb4cec).
[GitHub] spark issue #16142: [SPARK-18716][CORE] Restrict the disk usage of spark eve...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16142 **[Test build #70059 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70059/consoleFull)** for PR 16142 at commit [`c54c7f3`](https://github.com/apache/spark/commit/c54c7f3a1a021ab8d6d47990921d15f5e202b9ed).
[GitHub] spark issue #16142: [SPARK-18716][CORE] Restrict the disk usage of spark eve...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16142 @vanzin I have removed the related changes in `EventLoggingListener` and provided a new clean-up mode, i.e. a `space`-based mode. Please take a look.
[GitHub] spark pull request #15717: [SPARK-17910][SQL] Allow users to update the comm...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/15717#discussion_r92100070

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
    [...]
    +      originColumn -> newColumn
    +    }
    +
    +    val newSchema = table.schema.fields.map(field => columnsMap.getOrElse(field, field))
    --- End diff --

If `newColumn` is empty, we should throw an exception like Hive does.
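The loop under review can be paraphrased in a few lines of plain Python, with dicts standing in for Spark's `StructField` (the names here are illustrative, not Spark's API). It also captures the two review points: keying the result map by column name, and rejecting anything other than a comment change.

```python
def change_column_comments(schema, changes):
    """Rebuild `schema` (a list of column dicts) with updated comments.

    `changes` maps an existing column name to its replacement field; only the
    comment may differ, mirroring the command's current restriction.
    """
    new_by_name = {}
    for old_name, new_field in changes.items():
        matches = [f for f in schema if f["name"] == old_name]
        if not matches:
            # Invalid column reference, analogous to the AnalysisException above.
            raise ValueError(f"Invalid column reference: {old_name}")
        origin = matches[0]
        if origin["name"] != new_field["name"] or origin["type"] != new_field["type"]:
            raise ValueError("only comment changes are supported")
        # Key by column name, as suggested in the review, rather than by the
        # field object itself.
        new_by_name[old_name] = {**origin, "comment": new_field.get("comment")}
    # Columns not mentioned in `changes` pass through unchanged.
    return [new_by_name.get(f["name"], f) for f in schema]


schema = [{"name": "a", "type": "int", "comment": None},
          {"name": "b", "type": "string", "comment": None}]
out = change_column_comments(
    schema, {"a": {"name": "a", "type": "int", "comment": "id"}})
print(out[0]["comment"])  # id
```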
[GitHub] spark pull request #15995: [SPARK-18566][SQL] remove OverwriteOptions
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15995#discussion_r92099484

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala ---
    @@ -128,61 +128,69 @@ case class DataSourceAnalysis(conf: CatalystConf) extends Rule[LogicalPlan] {
         projectList
       }

    +  private def hasBeenPreprocessed(
    +      tableOutput: Seq[Attribute],
    +      partSchema: StructType,
    +      partSpec: Map[String, Option[String]],
    +      query: LogicalPlan): Boolean = {
    +    val partColNames = partSchema.map(_.name).toSet
    +    query.resolved && partSpec.keys.forall(partColNames.contains) && {
    +      val staticPartCols = partSpec.filter(_._2.isDefined).keySet
    +      val expectedColumns = tableOutput.filterNot(a => staticPartCols.contains(a.name))
    +      expectedColumns.toStructType.sameType(query.schema)
    --- End diff --

This is to follow the previous condition: https://github.com/apache/spark/pull/15995/files#diff-d99813bd5bbc18277e4090475e4944cfL166 The mismatch can be caused by users issuing an invalid command, e.g. `INSERT INTO src SELECT 1, 2` while table `src` has 3 columns.
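The check in the diff can be restated concretely. In this illustrative Python sketch, lists of column names stand in for Spark's `Attribute` and `StructType`, and arity stands in for the full `sameType` schema comparison: a query counts as preprocessed only if every partition-spec key names a real partition column and the query yields exactly the columns not pinned by static partition values.

```python
def has_been_preprocessed(table_cols, part_cols, part_spec, query_cols):
    # Every key in the partition spec must be an actual partition column.
    if not set(part_spec).issubset(set(part_cols)):
        return False
    # Columns fixed by a static value (spec maps them to a literal) are
    # excluded; the query must then supply the remaining columns. (The real
    # check compares full schemas with sameType; arity is the stand-in here.)
    static = {k for k, v in part_spec.items() if v is not None}
    expected = [c for c in table_cols if c not in static]
    return len(expected) == len(query_cols)


# The invalid command from the comment: `src` has 3 columns but the query
# supplies only 2, so the check (rightly) fails.
print(has_been_preprocessed(["a", "b", "c"], ["c"], {}, ["x", "y"]))        # False
# With partition column `c` pinned to a static value, 2 columns suffice.
print(has_been_preprocessed(["a", "b", "c"], ["c"], {"c": "1"}, ["x", "y"]))  # True
```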
[GitHub] spark pull request #15995: [SPARK-18566][SQL] remove OverwriteOptions
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15995#discussion_r92099139

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala ---
    [...]
    +    val partColNames = partSchema.map(_.name).toSet
    +    query.resolved && partSpec.keys.forall(partColNames.contains) && {
    --- End diff --

yup
[GitHub] spark issue #16252: [SPARK-18827][Core] Fix cannot read broadcast on disk
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16252 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70055/ Test PASSed.
[GitHub] spark issue #16252: [SPARK-18827][Core] Fix cannot read broadcast on disk
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16252 Merged build finished. Test PASSed.
[GitHub] spark issue #16252: [SPARK-18827][Core] Fix cannot read broadcast on disk
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16252 **[Test build #70055 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70055/consoleFull)** for PR 16252 at commit [`d964c54`](https://github.com/apache/spark/commit/d964c540974e680fc2809cd40a25ef72992a17db).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16194: [SPARK-18767][ML] Unify Models' toString methods
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16194 Merged build finished. Test PASSed.
[GitHub] spark issue #16194: [SPARK-18767][ML] Unify Models' toString methods
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16194 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70057/ Test PASSed.
[GitHub] spark issue #16194: [SPARK-18767][ML] Unify Models' toString methods
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16194 **[Test build #70057 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70057/consoleFull)** for PR 16194 at commit [`3b7ddfe`](https://github.com/apache/spark/commit/3b7ddfe02a00e7722b6ec41aef5ea1e69f738561).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13599 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70054/ Test PASSed.
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13599 Merged build finished. Test PASSed.
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13599 **[Test build #70054 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70054/consoleFull)** for PR 13599 at commit [`131ae1f`](https://github.com/apache/spark/commit/131ae1f66f753ae4de0d477aee029fe2103d270f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16258: [SPARK-18834][SS] Expose event time stats through Stream...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16258 **[Test build #70058 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70058/consoleFull)** for PR 16258 at commit [`b59ab80`](https://github.com/apache/spark/commit/b59ab8083de3f2441133fed35658dea39cd4a759).
[GitHub] spark issue #16194: [SPARK-18767][ML] Unify Models' toString methods
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16194 **[Test build #70057 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70057/consoleFull)** for PR 16194 at commit [`3b7ddfe`](https://github.com/apache/spark/commit/3b7ddfe02a00e7722b6ec41aef5ea1e69f738561).