[GitHub] spark pull request #14374: [SPARK-16735][SQL] `map` should create a decimal ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/14374#discussion_r72567433 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- @@ -93,20 +93,45 @@ case class CreateMap(children: Seq[Expression]) extends Expression { if (children.size % 2 != 0) { TypeCheckResult.TypeCheckFailure(s"$prettyName expects a positive even number of arguments.") } else if (keys.map(_.dataType).distinct.length > 1) { - TypeCheckResult.TypeCheckFailure("The given keys of function map should all be the same " + -"type, but they are " + keys.map(_.dataType.simpleString).mkString("[", ", ", "]")) + if (keys.map(_.dataType).forall(_.isInstanceOf[DecimalType])) { +TypeCheckResult.TypeCheckSuccess + } else { +TypeCheckResult.TypeCheckFailure("The given keys of function map should all be the same " + + "type, but they are " + keys.map(_.dataType.simpleString).mkString("[", ", ", "]")) + } } else if (values.map(_.dataType).distinct.length > 1) { - TypeCheckResult.TypeCheckFailure("The given values of function map should all be the same " + -"type, but they are " + values.map(_.dataType.simpleString).mkString("[", ", ", "]")) + if (values.map(_.dataType).forall(_.isInstanceOf[DecimalType])) { +TypeCheckResult.TypeCheckSuccess + } else { +TypeCheckResult.TypeCheckFailure("The given values of function map should all be the " + + "same type, but they are " + values.map(_.dataType.simpleString).mkString("[", ", ", "]")) + } } else { TypeCheckResult.TypeCheckSuccess } } + private def checkDecimalType(colType: Seq[Expression]): DataType = { +val elementType = colType.headOption.map(_.dataType).getOrElse(NullType) +elementType match { + case _ if elementType.isInstanceOf[DecimalType] => +var tighter: DataType = elementType +colType.foreach { child => + if (elementType.asInstanceOf[DecimalType].isTighterThan(child.dataType)) { --- End diff -- What I was referring to was that isTighterThan was not associative, and i don't think you can just take the tightest one this way. As an example: a precision 10, scale 5 b precision 7, scale 1 in this case a is not tighter than b, but b would be chosen as the target data type, leading to lose of precision. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14353: [SPARK-16714][SQL] `array` should create a decimal array...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14353 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14353: [SPARK-16714][SQL] `array` should create a decimal array...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14353 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62954/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14353: [SPARK-16714][SQL] `array` should create a decimal array...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14353 **[Test build #62954 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62954/consoleFull)** for PR 14353 at commit [`a095389`](https://github.com/apache/spark/commit/a0953891e0d1b80ee8aa2e7d24409f987ba76a83). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14364: [SPARK-16730][SQL] Implement function aliases for...
Github user petermaxlee commented on a diff in the pull request: https://github.com/apache/spark/pull/14364#discussion_r72565468 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala --- @@ -452,14 +466,37 @@ object FunctionRegistry { } } -val clazz = tag.runtimeClass +(name, (expressionInfo[T](name), builder)) + } + + /** + * Creates a function registry lookup entry for cast aliases (SPARK-16730). + * For example, if name is "int", and dataType is IntegerType, this means int(x) would become + * an alias for cast(x as IntegerType). + * See usage above. + */ + private def castAlias( + name: String, + dataType: DataType): (String, (ExpressionInfo, FunctionBuilder)) = { +val builder = (args: Seq[Expression]) => { + if (args.size != 1) { +throw new AnalysisException(s"Function $name accepts only one argument") + } + Cast(args.head, dataType) +} +(name, (expressionInfo[Cast](name), builder)) --- End diff -- Yes - this is a limitation. That's not what Hive does because Hive actually does not have a single cast expression. It has a cast expression for each target type. I think it's a pretty small detail and fixing it would require a lot of work. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #13680: [SPARK-15962][SQL] Introduce implementation with a dense...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/13680 We need inputs from @davies , but unfortunately he is on vacation now... --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14364: [SPARK-16730][SQL] Implement function aliases for...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14364#discussion_r72563566 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala --- @@ -452,14 +466,37 @@ object FunctionRegistry { } } -val clazz = tag.runtimeClass +(name, (expressionInfo[T](name), builder)) + } + + /** + * Creates a function registry lookup entry for cast aliases (SPARK-16730). + * For example, if name is "int", and dataType is IntegerType, this means int(x) would become + * an alias for cast(x as IntegerType). + * See usage above. + */ + private def castAlias( + name: String, + dataType: DataType): (String, (ExpressionInfo, FunctionBuilder)) = { +val builder = (args: Seq[Expression]) => { + if (args.size != 1) { +throw new AnalysisException(s"Function $name accepts only one argument") + } + Cast(args.head, dataType) +} +(name, (expressionInfo[Cast](name), builder)) --- End diff -- so whatever cast function we describe, we will always show `Cast`'s description right? Is it same with hive? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14132 for case 2, what does hive(or other databases that support sql hint) do? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14231: [SPARK-16586] Change the way the exit code of lau...
Github user zasdfgbnm commented on a diff in the pull request: https://github.com/apache/spark/pull/14231#discussion_r72562015 --- Diff: bin/spark-class --- @@ -65,24 +65,25 @@ fi # characters that would be otherwise interpreted by the shell. Read that in a while loop, populating # an array that will be used to exec the final command. # -# The exit code of the launcher is appended to the output, so the parent shell removes it from the -# command array and checks the value to see if the launcher succeeded. -build_command() { - "$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@" - printf "%d\0" $? -} +# To keep both the output and the exit code of the launcher, the output is first converted to a hex +# dump which prevents the bash from getting rid of the NULL character, and the exit code retrieved +# from the bash array ${PIPESTATUS[@]}. +# +# Note that the seperator NULL character can not be replace with space or '\n' so that the command +# won't fail if some path of the user contain special characher such as '\n' or space +# +# Also note that when the launcher fails, it might not output something ending with '\0' [SPARK-16586] +_CMD=$("$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"|xxd -p|tr -d '\n';exit ${PIPESTATUS[0]}) --- End diff -- This won't work. `build_command` is executed in a subshell. The `exit $?` will only terminate the subshell. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14386: [SPARK-16761][DOC][ML] Fix doc link in docs/ml-guide.md
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14386 Can one of the admins verify this patch? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14386: [SPARK-16761][DOC][ML] Fix doc link in docs/ml-gu...
GitHub user sundapeng opened a pull request: https://github.com/apache/spark/pull/14386 [SPARK-16761][DOC][ML] Fix doc link in docs/ml-guide.md ## What changes were proposed in this pull request? Fix the link at http://spark.apache.org/docs/latest/ml-guide.html. ## How was this patch tested? None You can merge this pull request into a Git repository by running: $ git pull https://github.com/sundapeng/spark doclink Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14386.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14386 commit 4d1b3c58a4a35339fc088652b242cc6acd3838cb Author: Sun DapengDate: 2016-07-28T03:18:47Z SPARK-16761: Fix doc link in docs/ml-guide.md --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14378: [SPARK-16750] [ML] Fix GaussianMixture training f...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/14378#discussion_r72560646 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinMaxScaler.scala --- @@ -111,7 +111,7 @@ class MinMaxScaler @Since("1.5.0") (@Since("1.5.0") override val uid: String) @Since("2.0.0") override def fit(dataset: Dataset[_]): MinMaxScalerModel = { -transformSchema(dataset.schema, logging = true) +transformSchema(dataset.schema) --- End diff -- It's a good question. Since ```MinMaxScaler``` override ```transformSchema``` with no argument ```logging```, we should use that one rather than the function in the base class. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14385: [SPARK-16312][STREAMING][KAFKA][DOC] Doc for Kafka 0.10 ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14385 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14385: [SPARK-16312][STREAMING][KAFKA][DOC] Doc for Kafka 0.10 ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14385 **[Test build #62953 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62953/consoleFull)** for PR 14385 at commit [`c44611d`](https://github.com/apache/spark/commit/c44611dad2a5b7ec27b6584c05ca03ff868ebe08). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14385: [SPARK-16312][STREAMING][KAFKA][DOC] Doc for Kafka 0.10 ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14385 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62953/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14353: [SPARK-16714][SQL] `array` should create a decimal array...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14353 **[Test build #62954 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62954/consoleFull)** for PR 14353 at commit [`a095389`](https://github.com/apache/spark/commit/a0953891e0d1b80ee8aa2e7d24409f987ba76a83). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14353: [SPARK-16714][SQL] `array` should create a decimal array...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14353 Rebased. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14385: [SPARK-16312][STREAMING][KAFKA][DOC] Doc for Kafka 0.10 ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14385 **[Test build #62953 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62953/consoleFull)** for PR 14385 at commit [`c44611d`](https://github.com/apache/spark/commit/c44611dad2a5b7ec27b6584c05ca03ff868ebe08). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14385: [SPARK-16312][STREAMING][KAFKA][DOC] Doc for Kafk...
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/14385 [SPARK-16312][STREAMING][KAFKA][DOC] Doc for Kafka 0.10 integration ## What changes were proposed in this pull request? Doc for the Kafka 0.10 integration ## How was this patch tested? Scala code examples were taken from my example repo, so hopefully they compile. You can merge this pull request into a Git repository by running: $ git pull https://github.com/koeninger/spark-1 SPARK-16312 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14385.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14385 commit c44611dad2a5b7ec27b6584c05ca03ff868ebe08 Author: cody koeningerDate: 2016-07-28T03:00:06Z [SPARK-16312][STREAMING][KAFKA][DOC] Doc for Kafka 0.10 integration --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14182: [SPARK-16444][WIP][SparkR]: Isotonic Regression w...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/14182#discussion_r72558746 --- Diff: R/pkg/inst/tests/testthat/test_mllib.R --- @@ -454,4 +454,9 @@ test_that("spark.survreg", { } }) +test_that("spark.isotonicRegression", { + +}) + sparkR.session.stop() + --- End diff -- Remove this line. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14176: [SPARK-16525][SQL] Enable Row Based HashMap in HashAggre...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14176 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14176: [SPARK-16525][SQL] Enable Row Based HashMap in HashAggre...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14176 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62952/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14176: [SPARK-16525][SQL] Enable Row Based HashMap in HashAggre...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14176 **[Test build #62952 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62952/consoleFull)** for PR 14176 at commit [`122cf18`](https://github.com/apache/spark/commit/122cf181841046a026c4ffd7fd0c597a03ff30dd). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14132 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14132 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62951/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14132 **[Test build #62951 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62951/consoleFull)** for PR 14132 at commit [`d733393`](https://github.com/apache/spark/commit/d733393d480a9f06b4ceb23292dcd753e061685c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14176: [SPARK-16525][SQL] Enable Row Based HashMap in HashAggre...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14176 **[Test build #62952 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62952/consoleFull)** for PR 14176 at commit [`122cf18`](https://github.com/apache/spark/commit/122cf181841046a026c4ffd7fd0c597a03ff30dd). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14124 @cloud-fan If nullability should be not ignored, then I can fix this PR to make them consistent to not ignoring it (and of course I will try to identify the related problems). In this case, I will work on what @gatorsmile pointed out in https://github.com/apache/spark/pull/14124#issuecomment-231594416 about JSON (and will check the other data sources as well). I will follow your decision. To cut the all comments above short, (for other reviewers), - The purpose of this PR is whether it should force all schema to nullable schema or not. - This is already happening with normal reading and writing for data sources based on `FileFormat`. - This is for both inferred/read schema and user-given schema. - For `json(rdd: RDD[String])` API, this is not hapenning. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14355: [SPARK-16726][SQL] Improve `Union/Intersect/Except` erro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14355 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14355: [SPARK-16726][SQL] Improve `Union/Intersect/Except` erro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14355 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62948/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14355: [SPARK-16726][SQL] Improve `Union/Intersect/Except` erro...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14355 **[Test build #62948 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62948/consoleFull)** for PR 14355 at commit [`71e8ec9`](https://github.com/apache/spark/commit/71e8ec90413a58c25bd7b5c612a9821a797652b3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14266: [SPARK-16526][SQL] Benchmarking Performance for Fast Has...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14266 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62947/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14266: [SPARK-16526][SQL] Benchmarking Performance for Fast Has...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14266 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14383: [SPARK-15232][SQL] Add subquery SQL building tests to Lo...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14383 Hi, @rxin and @hvanhovell . It's ready for review again. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14266: [SPARK-16526][SQL] Benchmarking Performance for Fast Has...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14266 **[Test build #62947 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62947/consoleFull)** for PR 14266 at commit [`4944b29`](https://github.com/apache/spark/commit/4944b29558ba8f3563f1c0c3d1a485b29dcdc39b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14132 **[Test build #62951 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62951/consoleFull)** for PR 14132 at commit [`d733393`](https://github.com/apache/spark/commit/d733393d480a9f06b4ceb23292dcd753e061685c). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14384: [Spark-16443][SparkR] Alternating Least Squares (ALS) wr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14384 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62950/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14384: [Spark-16443][SparkR] Alternating Least Squares (ALS) wr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14384 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14384: [Spark-16443][SparkR] Alternating Least Squares (ALS) wr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14384 **[Test build #62950 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62950/consoleFull)** for PR 14384 at commit [`962e12f`](https://github.com/apache/spark/commit/962e12fccc7969e9a009af9f49bf7f6b9268b08a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14374: [SPARK-16735][SQL] `map` should create a decimal ...
Github user biglobster commented on a diff in the pull request: https://github.com/apache/spark/pull/14374#discussion_r72544676 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- @@ -93,20 +93,45 @@ case class CreateMap(children: Seq[Expression]) extends Expression { if (children.size % 2 != 0) { TypeCheckResult.TypeCheckFailure(s"$prettyName expects a positive even number of arguments.") } else if (keys.map(_.dataType).distinct.length > 1) { - TypeCheckResult.TypeCheckFailure("The given keys of function map should all be the same " + -"type, but they are " + keys.map(_.dataType.simpleString).mkString("[", ", ", "]")) + if (keys.map(_.dataType).forall(_.isInstanceOf[DecimalType])) { +TypeCheckResult.TypeCheckSuccess + } else { +TypeCheckResult.TypeCheckFailure("The given keys of function map should all be the same " + + "type, but they are " + keys.map(_.dataType.simpleString).mkString("[", ", ", "]")) + } } else if (values.map(_.dataType).distinct.length > 1) { - TypeCheckResult.TypeCheckFailure("The given values of function map should all be the same " + -"type, but they are " + values.map(_.dataType.simpleString).mkString("[", ", ", "]")) + if (values.map(_.dataType).forall(_.isInstanceOf[DecimalType])) { +TypeCheckResult.TypeCheckSuccess + } else { +TypeCheckResult.TypeCheckFailure("The given values of function map should all be the " + + "same type, but they are " + values.map(_.dataType.simpleString).mkString("[", ", ", "]")) + } } else { TypeCheckResult.TypeCheckSuccess } } + private def checkDecimalType(colType: Seq[Expression]): DataType = { +val elementType = colType.headOption.map(_.dataType).getOrElse(NullType) +elementType match { + case _ if elementType.isInstanceOf[DecimalType] => +var tighter: DataType = elementType +colType.foreach { child => + if (elementType.asInstanceOf[DecimalType].isTighterThan(child.dataType)) { --- End diff -- @rxin I have checked this function, and it will not lost any precision or range ,it 's safe . can you give me some advise?thank you :) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14384: [Spark-16443][SparkR] Alternating Least Squares (ALS) wr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14384 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14384: [Spark-16443][SparkR] Alternating Least Squares (ALS) wr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14384 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62949/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14384: [Spark-16443][SparkR] Alternating Least Squares (ALS) wr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14384 **[Test build #62949 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62949/consoleFull)** for PR 14384 at commit [`3939ccc`](https://github.com/apache/spark/commit/3939ccce37cd899edf007d0ac3b9589748db0bc3). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user kayousterhout commented on the issue: https://github.com/apache/spark/pull/14079 OK done with a pass over this -- overall this looks good. Sorry for the huge number of naming suggestions. This behavior is subtle/complex and I'm in favor of spending some effort to make it as understandable as possible to future folks who modify this code. A few more comments: The design doc mentions some invariants that must be true of the parameters (e.g., that some values must be lower than others). Can you enforce those invariants in the creation of the BlacklistManager? It would be helpful to add a comment somewhere describing the division of functionality. Maybe a good place would be in the BlacklistTracker docstring? Something that explains that the TaskSetManager is responsible for handling blacklisting at the task set and task level, while the TaskSchedulerImpl is responsible for handling application-level blacklists? I added a few high level comments / concerns to the design doc, also. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14380: [SPARK-16485][DOC][ML] Remove useless latex in scaladoc ...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14380 Do a search for some latex tags and you'll find more, like AFTSurvivalRegression.scala, LinearRegression.scala, etc --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14231: [SPARK-16586] Change the way the exit code of lau...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/14231#discussion_r72543191 --- Diff: bin/spark-class --- @@ -65,24 +65,25 @@ fi # characters that would be otherwise interpreted by the shell. Read that in a while loop, populating # an array that will be used to exec the final command. # -# The exit code of the launcher is appended to the output, so the parent shell removes it from the -# command array and checks the value to see if the launcher succeeded. -build_command() { - "$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@" - printf "%d\0" $? -} +# To keep both the output and the exit code of the launcher, the output is first converted to a hex +# dump which prevents the bash from getting rid of the NULL character, and the exit code retrieved +# from the bash array ${PIPESTATUS[@]}. +# +# Note that the seperator NULL character can not be replace with space or '\n' so that the command +# won't fail if some path of the user contain special characher such as '\n' or space +# +# Also note that when the launcher fails, it might not output something ending with '\0' [SPARK-16586] +_CMD=$("$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"|xxd -p|tr -d '\n';exit ${PIPESTATUS[0]}) --- End diff -- Eh, I'm talking about something _much_ simpler. ``` build_command() { "$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@" if [ $? != 0 ]; then echo "Couldn't run launcher" exit $? fi printf "%d\0" $? } ``` Does that make sense? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14132 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14379: [SPARK-16751] Upgrade derby to 10.12.1.1
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14379 Oh I see, so this is actually packaged, not strictly part of tests. Well we should do it anyway. Not sure if the risk actually surfaces in Spark but better to update and be safe. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14132 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62946/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14132 **[Test build #62946 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62946/consoleFull)** for PR 14132 at commit [`381045d`](https://github.com/apache/spark/commit/381045da10cb74d0a2b45d743cac16c441a526e5). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14379: [SPARK-16751] Upgrade derby to 10.12.1.1
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14379 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14379: [SPARK-16751] Upgrade derby to 10.12.1.1
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14379 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62943/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14379: [SPARK-16751] Upgrade derby to 10.12.1.1
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14379 **[Test build #62943 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62943/consoleFull)** for PR 14379 at commit [`f3815cf`](https://github.com/apache/spark/commit/f3815cfd1b6b6d7ba29dc8f2e12b1334ad415847). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14373: [SPARK-16740][SQL] Fix Long overflow in LongToUns...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/14373#discussion_r72541802 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala --- @@ -608,7 +608,8 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap def optimize(): Unit = { val range = maxKey - minKey --- End diff -- They're already `Long`s so `range` is too. The difference can overflow though. This should correctly handle the case. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r72541736 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -592,7 +599,9 @@ private[spark] class TaskSetManager( * failures (this is because the method picks on unscheduled task, and then iterates through each * executor until it finds one that the task hasn't failed on already). */ - private[scheduler] def abortIfCompletelyBlacklisted(executors: Iterable[String]): Unit = { + private[scheduler] def abortIfCompletelyBlacklisted( + executorsByHost: HashMap[String, HashSet[String]], + blacklist: BlacklistTracker): Unit = { --- End diff -- looks like you don't need to pass this in anymore, since it's now part of the class? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r72541451 --- Diff: core/src/main/scala/org/apache/spark/scheduler/BlacklistTracker.scala --- @@ -0,0 +1,214 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.scheduler + +import java.util.concurrent.atomic.AtomicReference + +import scala.collection.mutable.{HashMap, HashSet} + +import org.apache.spark.SparkConf +import org.apache.spark.internal.Logging +import org.apache.spark.internal.config +import org.apache.spark.util.{Clock, SystemClock, Utils} + +/** + * BlacklistTracker is designed to track problematic executors and nodes. It supports blacklisting + * specific (executor, task) pairs within a stage, blacklisting entire executors and nodes for a + * stage, and blacklisting executors and nodes across an entire application (with a periodic + * expiry). + * + * The tracker needs to deal with a variety of workloads, eg.: bad user code, which may lead to many + * task failures, but that should not count against individual executors; many small stages, which + * may prevent a bad executor for having many failures within one stage, but still many failures + * over the entire application; "flaky" executors, that don't fail every task, but are still + * faulty; etc. + * + * THREADING: As with most helpers of TaskSchedulerImpl, this is not thread-safe. Though it is + * called by multiple threads, callers must already have a lock on the TaskSchedulerImpl. The + * one exception is [[nodeBlacklist()]], which can be called without holding a lock. + */ +private[scheduler] class BlacklistTracker ( +conf: SparkConf, +clock: Clock = new SystemClock()) extends Logging { + + private val MAX_FAILURES_PER_EXEC = conf.get(config.MAX_FAILURES_PER_EXEC) + private val MAX_FAILED_EXEC_PER_NODE = conf.get(config.MAX_FAILED_EXEC_PER_NODE) + val EXECUTOR_RECOVERY_MILLIS = BlacklistTracker.getBlacklistExpiryTime(conf) + + // a count of failed tasks for each executor. Only counts failures after tasksets complete + // successfully + private val executorIdToFailureCount: HashMap[String, Int] = new HashMap() + private val executorIdToBlacklistStatus: HashMap[String, BlacklistedExecutor] = new HashMap() + private val nodeIdToBlacklistExpiryTime: HashMap[String, Long] = new HashMap() + private val _nodeBlacklist: AtomicReference[Set[String]] = new AtomicReference(Set()) + private var nextExpiryTime: Long = Long.MaxValue + // for blacklisted executors, the node it is on. We do *not* remove from this when executors are + // removed from spark, so we can track when we get multiple successive blacklisted executors on + // one node. --- End diff -- Also can you add a sentence here explaining why you don't expect this to grow to be infinitely large for a long-running app? (assume this is because there will be a max # of executors on each node in this map before the node as a whole will be blacklisted?) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r72541285 --- Diff: core/src/main/scala/org/apache/spark/scheduler/BlacklistTracker.scala --- @@ -0,0 +1,214 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.scheduler + +import java.util.concurrent.atomic.AtomicReference + +import scala.collection.mutable.{HashMap, HashSet} + +import org.apache.spark.SparkConf +import org.apache.spark.internal.Logging +import org.apache.spark.internal.config +import org.apache.spark.util.{Clock, SystemClock, Utils} + +/** + * BlacklistTracker is designed to track problematic executors and nodes. It supports blacklisting + * specific (executor, task) pairs within a stage, blacklisting entire executors and nodes for a + * stage, and blacklisting executors and nodes across an entire application (with a periodic + * expiry). + * + * The tracker needs to deal with a variety of workloads, eg.: bad user code, which may lead to many + * task failures, but that should not count against individual executors; many small stages, which + * may prevent a bad executor for having many failures within one stage, but still many failures + * over the entire application; "flaky" executors, that don't fail every task, but are still + * faulty; etc. + * + * THREADING: As with most helpers of TaskSchedulerImpl, this is not thread-safe. Though it is + * called by multiple threads, callers must already have a lock on the TaskSchedulerImpl. The + * one exception is [[nodeBlacklist()]], which can be called without holding a lock. + */ +private[scheduler] class BlacklistTracker ( +conf: SparkConf, +clock: Clock = new SystemClock()) extends Logging { + + private val MAX_FAILURES_PER_EXEC = conf.get(config.MAX_FAILURES_PER_EXEC) + private val MAX_FAILED_EXEC_PER_NODE = conf.get(config.MAX_FAILED_EXEC_PER_NODE) + val EXECUTOR_RECOVERY_MILLIS = BlacklistTracker.getBlacklistExpiryTime(conf) + + // a count of failed tasks for each executor. Only counts failures after tasksets complete + // successfully + private val executorIdToFailureCount: HashMap[String, Int] = new HashMap() + private val executorIdToBlacklistStatus: HashMap[String, BlacklistedExecutor] = new HashMap() + private val nodeIdToBlacklistExpiryTime: HashMap[String, Long] = new HashMap() --- End diff -- maybe just nodeToBlacklistExpiryTime? I think the key here is a hostname, so ID is slightly misleading (since other IDs we use are 0/1/2/3/4 etc.) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r72541175 --- Diff: core/src/main/scala/org/apache/spark/scheduler/BlacklistTracker.scala --- @@ -0,0 +1,214 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.scheduler + +import java.util.concurrent.atomic.AtomicReference + +import scala.collection.mutable.{HashMap, HashSet} + +import org.apache.spark.SparkConf +import org.apache.spark.internal.Logging +import org.apache.spark.internal.config +import org.apache.spark.util.{Clock, SystemClock, Utils} + +/** + * BlacklistTracker is designed to track problematic executors and nodes. It supports blacklisting + * specific (executor, task) pairs within a stage, blacklisting entire executors and nodes for a + * stage, and blacklisting executors and nodes across an entire application (with a periodic + * expiry). + * + * The tracker needs to deal with a variety of workloads, eg.: bad user code, which may lead to many + * task failures, but that should not count against individual executors; many small stages, which + * may prevent a bad executor for having many failures within one stage, but still many failures + * over the entire application; "flaky" executors, that don't fail every task, but are still + * faulty; etc. + * + * THREADING: As with most helpers of TaskSchedulerImpl, this is not thread-safe. Though it is + * called by multiple threads, callers must already have a lock on the TaskSchedulerImpl. The + * one exception is [[nodeBlacklist()]], which can be called without holding a lock. + */ +private[scheduler] class BlacklistTracker ( +conf: SparkConf, +clock: Clock = new SystemClock()) extends Logging { + + private val MAX_FAILURES_PER_EXEC = conf.get(config.MAX_FAILURES_PER_EXEC) + private val MAX_FAILED_EXEC_PER_NODE = conf.get(config.MAX_FAILED_EXEC_PER_NODE) + val EXECUTOR_RECOVERY_MILLIS = BlacklistTracker.getBlacklistExpiryTime(conf) + + // a count of failed tasks for each executor. Only counts failures after tasksets complete + // successfully + private val executorIdToFailureCount: HashMap[String, Int] = new HashMap() + private val executorIdToBlacklistStatus: HashMap[String, BlacklistedExecutor] = new HashMap() + private val nodeIdToBlacklistExpiryTime: HashMap[String, Long] = new HashMap() + private val _nodeBlacklist: AtomicReference[Set[String]] = new AtomicReference(Set()) --- End diff -- add comment here saying that you keep a copy of the node blacklist in an atomicref so that it can be read in a sep. thread? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14383: [SPARK-15232][SQL] Add subquery SQL building tests to Lo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14383 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62942/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r72541001 --- Diff: core/src/main/scala/org/apache/spark/scheduler/BlacklistTracker.scala --- @@ -0,0 +1,214 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.scheduler + +import java.util.concurrent.atomic.AtomicReference + +import scala.collection.mutable.{HashMap, HashSet} + +import org.apache.spark.SparkConf +import org.apache.spark.internal.Logging +import org.apache.spark.internal.config +import org.apache.spark.util.{Clock, SystemClock, Utils} + +/** + * BlacklistTracker is designed to track problematic executors and nodes. It supports blacklisting + * specific (executor, task) pairs within a stage, blacklisting entire executors and nodes for a --- End diff -- Also it looks like task-specific blacklisting actually isn't handled here? (it's in TSM?) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14383: [SPARK-15232][SQL] Add subquery SQL building tests to Lo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14383 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14383: [SPARK-15232][SQL] Add subquery SQL building tests to Lo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14383 **[Test build #62942 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62942/consoleFull)** for PR 14383 at commit [`d7d70e6`](https://github.com/apache/spark/commit/d7d70e67dbeaa456d8e0c308f0ddc13a9963b0cf). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r72540910 --- Diff: core/src/main/scala/org/apache/spark/scheduler/BlacklistTracker.scala --- @@ -0,0 +1,214 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.scheduler + +import java.util.concurrent.atomic.AtomicReference + +import scala.collection.mutable.{HashMap, HashSet} + +import org.apache.spark.SparkConf +import org.apache.spark.internal.Logging +import org.apache.spark.internal.config +import org.apache.spark.util.{Clock, SystemClock, Utils} + +/** + * BlacklistTracker is designed to track problematic executors and nodes. It supports blacklisting + * specific (executor, task) pairs within a stage, blacklisting entire executors and nodes for a + * stage, and blacklisting executors and nodes across an entire application (with a periodic + * expiry). + * + * The tracker needs to deal with a variety of workloads, eg.: bad user code, which may lead to many + * task failures, but that should not count against individual executors; many small stages, which + * may prevent a bad executor for having many failures within one stage, but still many failures + * over the entire application; "flaky" executors, that don't fail every task, but are still + * faulty; etc. + * + * THREADING: As with most helpers of TaskSchedulerImpl, this is not thread-safe. Though it is + * called by multiple threads, callers must already have a lock on the TaskSchedulerImpl. The + * one exception is [[nodeBlacklist()]], which can be called without holding a lock. + */ +private[scheduler] class BlacklistTracker ( +conf: SparkConf, +clock: Clock = new SystemClock()) extends Logging { + + private val MAX_FAILURES_PER_EXEC = conf.get(config.MAX_FAILURES_PER_EXEC) + private val MAX_FAILED_EXEC_PER_NODE = conf.get(config.MAX_FAILED_EXEC_PER_NODE) + val EXECUTOR_RECOVERY_MILLIS = BlacklistTracker.getBlacklistExpiryTime(conf) + + // a count of failed tasks for each executor. Only counts failures after tasksets complete + // successfully + private val executorIdToFailureCount: HashMap[String, Int] = new HashMap() + private val executorIdToBlacklistStatus: HashMap[String, BlacklistedExecutor] = new HashMap() + private val nodeIdToBlacklistExpiryTime: HashMap[String, Long] = new HashMap() + private val _nodeBlacklist: AtomicReference[Set[String]] = new AtomicReference(Set()) + private var nextExpiryTime: Long = Long.MaxValue --- End diff -- docstring here? "Time when the next blacklist will expire. Used to avoid checking for expired blacklist entries when none will have expired." --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14383: [SPARK-15232][SQL] Add subquery SQL building tests to Lo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14383 **[Test build #62941 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62941/consoleFull)** for PR 14383 at commit [`9a49580`](https://github.com/apache/spark/commit/9a49580549a3980364ed225b77a6e0342ef4d90c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r72539715 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -800,6 +832,78 @@ private[spark] class TaskSetManager( maybeFinishTaskSet() } + private[scheduler] def updateBlacklistForFailedTask( + host: String, + exec: String, + index: Int): Unit = { +val failureStatus = execToFailures.getOrElseUpdate(exec, new FailureStatus(host)) +failureStatus.totalFailures += 1 +failureStatus.tasksWithFailures += index + +// check if this task has also failed on other executors on the same host, and if so, blacklist +// this task from the host +val execsWithFailuresOnNode = nodesToExecsWithFailures.getOrElseUpdate(host, new HashSet()) +execsWithFailuresOnNode += exec +val failuresOnHost = (for { + exec <- execsWithFailuresOnNode.toIterator + failures <- execToFailures.get(exec) +} yield { + if (failures.tasksWithFailures.contains(index)) 1 else 0 +}).sum +if (failuresOnHost >= MAX_TASK_ATTEMPTS_PER_NODE) { + nodeBlacklistedTasks.getOrElseUpdate(host, new HashSet()) += index +} + +if (failureStatus.totalFailures >= MAX_FAILURES_PER_EXEC_STAGE) { --- End diff -- This logic seems to be repeated in at least one other place (where you first blacklist something on an executor, and then escalate the issue to the host). I wonder if there's a helper class that could be added that would handle this in both cases (i.e., for a specific task, and for a whole stage, and I think also for the app)? Maybe there's not enough shared logic? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r72540361 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala --- @@ -217,18 +219,34 @@ private[yarn] class YarnAllocator( * @param localityAwareTasks number of locality aware tasks to be used as container placement hint * @param hostToLocalTaskCount a map of preferred hostname to possible task counts to be used as * container placement hint. + * @param nodeBlacklist a set of blacklisted node to avoid allocating new container on them. It + * will be used to update AM blacklist. * @return Whether the new requested total is different than the old value. */ def requestTotalExecutorsWithPreferredLocalities( requestedTotal: Int, localityAwareTasks: Int, - hostToLocalTaskCount: Map[String, Int]): Boolean = synchronized { + hostToLocalTaskCount: Map[String, Int], + nodeBlacklist: Set[String]): Boolean = synchronized { this.numLocalityAwareTasks = localityAwareTasks this.hostToLocalTaskCounts = hostToLocalTaskCount if (requestedTotal != targetNumExecutors) { logInfo(s"Driver requested a total number of $requestedTotal executor(s).") targetNumExecutors = requestedTotal + + // Update blacklist infomation to YARN ResouceManager for this application, + // in order to avoid allocating new Container on the problematic nodes. --- End diff -- Containers --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r72540337 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala --- @@ -217,18 +219,34 @@ private[yarn] class YarnAllocator( * @param localityAwareTasks number of locality aware tasks to be used as container placement hint * @param hostToLocalTaskCount a map of preferred hostname to possible task counts to be used as * container placement hint. + * @param nodeBlacklist a set of blacklisted node to avoid allocating new container on them. It --- End diff -- a set of blacklisted nodes, which is passed in to avoid allocating new containers on them. It will be used to update the application manager (? right terminology?) blacklist. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r72540080 --- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala --- @@ -281,15 +323,216 @@ class TaskSchedulerImplSuite extends SparkFunSuite with LocalSparkContext with B assert(!failedTaskSet) } + test("scheduled tasks obey task and stage blacklists") { +val blacklist = mock[BlacklistTracker] +taskScheduler = setupSchedulerWithMockTsm(blacklist) +(0 to 2).foreach { stageId => + val taskSet = FakeTask.createTaskSet(numTasks = 2, stageId = stageId, stageAttemptId = 0) + taskScheduler.submitTasks(taskSet) +} + +val offers = Seq( + new WorkerOffer("executor0", "host0", 1), + new WorkerOffer("executor1", "host1", 1), + new WorkerOffer("executor2", "host1", 1), + new WorkerOffer("executor3", "host2", 10) +) + +// setup our mock blacklist: +// stage 0 is blacklisted on node "host1" +// stage 1 is blacklisted on executor "executor3" +// stage 0, part 0 is blacklisted on executor 0 +// (later stubs take precedence over earlier ones) +when(blacklist.isNodeBlacklisted(anyString())).thenReturn(false) +when(blacklist.isExecutorBlacklisted(anyString())).thenReturn(false) +// setup some defaults, then override them with particulars +stageToMockTsm.values.foreach { tsm => + when(tsm.isNodeBlacklistedForTaskSet(anyString())).thenReturn(false) + when(tsm.isExecutorBlacklistedForTaskSet(anyString())).thenReturn(false) + when(tsm.isExecutorBlacklistedForTask(anyString(), anyInt())).thenReturn(false) --- End diff -- Do you need the isExecutorBlacklistedForTask here (and below)? wondering since it looks like it's never called outside the TSM --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14384: [Spark-16443][SparkR] Alternating Least Squares (ALS) wr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14384 **[Test build #62950 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62950/consoleFull)** for PR 14384 at commit [`962e12f`](https://github.com/apache/spark/commit/962e12fccc7969e9a009af9f49bf7f6b9268b08a). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14383: [SPARK-15232][SQL] Add subquery SQL building tests to Lo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14383 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62941/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14383: [SPARK-15232][SQL] Add subquery SQL building tests to Lo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14383 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r72539615 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -800,6 +832,78 @@ private[spark] class TaskSetManager( maybeFinishTaskSet() } + private[scheduler] def updateBlacklistForFailedTask( + host: String, + exec: String, + index: Int): Unit = { +val failureStatus = execToFailures.getOrElseUpdate(exec, new FailureStatus(host)) +failureStatus.totalFailures += 1 +failureStatus.tasksWithFailures += index + +// check if this task has also failed on other executors on the same host, and if so, blacklist +// this task from the host +val execsWithFailuresOnNode = nodesToExecsWithFailures.getOrElseUpdate(host, new HashSet()) +execsWithFailuresOnNode += exec +val failuresOnHost = (for { + exec <- execsWithFailuresOnNode.toIterator + failures <- execToFailures.get(exec) +} yield { + if (failures.tasksWithFailures.contains(index)) 1 else 0 +}).sum --- End diff -- I personally find Scala's `for` incredibly hard to read, so +1 for `map`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14384: [Spark-16443][SparkR] Alternating Least Squares (ALS) wr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14384 **[Test build #62949 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62949/consoleFull)** for PR 14384 at commit [`3939ccc`](https://github.com/apache/spark/commit/3939ccce37cd899edf007d0ac3b9589748db0bc3). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r72539450 --- Diff: core/src/main/scala/org/apache/spark/scheduler/BlacklistTracker.scala --- @@ -0,0 +1,214 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.scheduler + +import java.util.concurrent.atomic.AtomicReference + +import scala.collection.mutable.{HashMap, HashSet} + +import org.apache.spark.SparkConf +import org.apache.spark.internal.Logging +import org.apache.spark.internal.config +import org.apache.spark.util.{Clock, SystemClock, Utils} + +/** + * BlacklistTracker is designed to track problematic executors and nodes. It supports blacklisting + * specific (executor, task) pairs within a stage, blacklisting entire executors and nodes for a + * stage, and blacklisting executors and nodes across an entire application (with a periodic + * expiry). + * + * The tracker needs to deal with a variety of workloads, eg.: bad user code, which may lead to many + * task failures, but that should not count against individual executors; many small stages, which + * may prevent a bad executor for having many failures within one stage, but still many failures + * over the entire application; "flaky" executors, that don't fail every task, but are still + * faulty; etc. + * + * THREADING: As with most helpers of TaskSchedulerImpl, this is not thread-safe. Though it is + * called by multiple threads, callers must already have a lock on the TaskSchedulerImpl. The + * one exception is [[nodeBlacklist()]], which can be called without holding a lock. + */ +private[scheduler] class BlacklistTracker ( +conf: SparkConf, +clock: Clock = new SystemClock()) extends Logging { + + private val MAX_FAILURES_PER_EXEC = conf.get(config.MAX_FAILURES_PER_EXEC) + private val MAX_FAILED_EXEC_PER_NODE = conf.get(config.MAX_FAILED_EXEC_PER_NODE) + val EXECUTOR_RECOVERY_MILLIS = BlacklistTracker.getBlacklistExpiryTime(conf) + + // a count of failed tasks for each executor. Only counts failures after tasksets complete + // successfully + private val executorIdToFailureCount: HashMap[String, Int] = new HashMap() + private val executorIdToBlacklistStatus: HashMap[String, BlacklistedExecutor] = new HashMap() + private val nodeIdToBlacklistExpiryTime: HashMap[String, Long] = new HashMap() + private val _nodeBlacklist: AtomicReference[Set[String]] = new AtomicReference(Set()) + private var nextExpiryTime: Long = Long.MaxValue + // for blacklisted executors, the node it is on. We do *not* remove from this when executors are + // removed from spark, so we can track when we get multiple successive blacklisted executors on + // one node. + val nodeToFailedExecs: HashMap[String, HashSet[String]] = new HashMap() + + def expireExecutorsInBlacklist(): Unit = { +val now = clock.getTimeMillis() +// quickly check if we've got anything to expire from blacklist -- if not, avoid doing any work +if (now > nextExpiryTime) { + val execsToClear = executorIdToBlacklistStatus.filter(_._2.expiryTime < now).keys + if (execsToClear.nonEmpty) { +logInfo(s"Removing executors $execsToClear from blacklist during periodic recovery") +execsToClear.foreach { exec => + val status = executorIdToBlacklistStatus.remove(exec).get + val failedExecsOnNode = nodeToFailedExecs(status.node) + failedExecsOnNode.remove(exec) + if (failedExecsOnNode.isEmpty) { +nodeToFailedExecs.remove(status.node) + } +} + } + if (executorIdToBlacklistStatus.nonEmpty) { +nextExpiryTime = executorIdToBlacklistStatus.map{_._2.expiryTime}.min + } else { +nextExpiryTime = Long.MaxValue + } + val nodesToClear = nodeIdToBlacklistExpiryTime.filter(_._2 <
[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r72539148 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -800,6 +832,78 @@ private[spark] class TaskSetManager( maybeFinishTaskSet() } + private[scheduler] def updateBlacklistForFailedTask( + host: String, + exec: String, + index: Int): Unit = { +val failureStatus = execToFailures.getOrElseUpdate(exec, new FailureStatus(host)) +failureStatus.totalFailures += 1 +failureStatus.tasksWithFailures += index + +// check if this task has also failed on other executors on the same host, and if so, blacklist +// this task from the host +val execsWithFailuresOnNode = nodesToExecsWithFailures.getOrElseUpdate(host, new HashSet()) +execsWithFailuresOnNode += exec +val failuresOnHost = (for { + exec <- execsWithFailuresOnNode.toIterator + failures <- execToFailures.get(exec) +} yield { + if (failures.tasksWithFailures.contains(index)) 1 else 0 +}).sum --- End diff -- I always prefer map to yield (but maybe I'm in the minority, in which case leave it?): execsWithFailuresOnNode.map { val failures = execToFailures.get(_) if (failures.tasksWithFailures.contains(index)) 1 else 0 }.sum --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r72538818 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -611,15 +620,31 @@ private[spark] class TaskSetManager( // If no executors have registered yet, don't abort the stage, just wait. We probably // got here because a task set was added before the executors registered. -if (executors.nonEmpty) { +if (executorsByHost.nonEmpty) { // take any task that needs to be scheduled, and see if we can find some executor it *could* // run on - pendingTask.foreach { taskId => -if (executors.forall(executorIsBlacklisted(_, taskId))) { - val execs = executors.toIndexedSeq.sorted.mkString("(", ",", ")") - val partition = tasks(taskId).partitionId - abort(s"Aborting ${taskSet} because task $taskId (partition $partition)" + -s" has already failed on executors $execs, and no other executors are available.") + pendingTask.foreach { indexInTaskSet => +// try to find some executor this task can run on. Its possible that some *other* +// task isn't schedulable anywhere, but we will discover that in some later call, +// when that unschedulable task is the last task remaining. +val blacklistedEverywhere = executorsByHost.forall { case (host, execs) => + val nodeBlacklisted = blacklist.isNodeBlacklisted(host) || +isNodeBlacklistedForTaskSet(host) || +isNodeBlacklistedForTask(host, indexInTaskSet) + if (nodeBlacklisted) { +true + } else { +execs.forall { exec => + blacklist.isExecutorBlacklisted(exec) || +isExecutorBlacklistedForTaskSet(exec) || +isExecutorBlacklistedForTask(exec, indexInTaskSet) +} + } +} +if (blacklistedEverywhere) { + val partition = tasks(indexInTaskSet).partitionId + abort(s"Aborting ${taskSet} because task $indexInTaskSet (partition $partition) cannot " + +s"run anywhere due to node and executor blacklist.") --- End diff -- Maybe refer to the config options here? Anticipating confused users... --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...
GitHub user junyangq opened a pull request: https://github.com/apache/spark/pull/14384 [Spark-16443][SparkR] Alternating Least Squares (ALS) wrapper ## What changes were proposed in this pull request? Add Alternating Least Squares wrapper in SparkR. Unit tests have been updated. ## How was this patch tested? SparkR unit tests. (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) ![screen shot 2016-07-27 at 3 50 46 pm](https://cloud.githubusercontent.com/assets/15318264/17195348/f7a7d452-5411-11e6-845f-6d292283bc28.png) ![screen shot 2016-07-27 at 3 50 31 pm](https://cloud.githubusercontent.com/assets/15318264/17195347/f7a6352a-5411-11e6-8e21-61a48070192a.png) You can merge this pull request into a Git repository by running: $ git pull https://github.com/junyangq/spark SPARK-16443 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14384.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14384 commit ecf918546d3d25b60ece78cad42ec41c7c188f3d Author: Junyang QianDate: 2016-07-08T21:56:31Z ALS main added to mllib.R commit 766b55b464263879d90a0cce6f0d3215f3dba1e3 Author: Junyang Qian Date: 2016-07-11T20:04:09Z minimal wrapper commit e4cd463486577d544ec096fe59a603b33374a11c Author: Junyang Qian Date: 2016-07-21T22:03:25Z clean comments commit d0dfe7d21e99b8440de255a6a55f34971d08163f Author: Junyang Qian Date: 2016-07-25T18:31:40Z first set commit 7d0c139e427f8f6fef4ed8f6380c966ad998da39 Author: Junyang Qian Date: 2016-07-25T18:56:22Z add spark.als to namespace commit 243defa8c5a95923cdfab4d306624150d01b6121 Author: Junyang Qian Date: 2016-07-27T04:51:27Z ALS wrapper with summary, predict and write.ml commit a3cca04c54596ea466cd98edc4ee5e4611f207df Author: Junyang Qian Date: 2016-07-27T17:05:42Z fix typo in reading model commit d785b2a93e4e68661b661e08641bbb01dee8875f Author: Junyang Qian Date: 2016-07-27T21:10:52Z allow more arguments to als commit 3939ccce37cd899edf007d0ac3b9589748db0bc3 Author: Junyang Qian Date: 2016-07-27T22:15:00Z fix type issue in mllib.R --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r72538697 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -770,9 +794,19 @@ private[spark] class TaskSetManager( logError("Unknown TaskEndReason: " + e) None } -// always add to failed executors -failedExecutors.getOrElseUpdate(index, new HashMap[String, Long]()). - put(info.executorId, clock.getTimeMillis()) + +// we might rack up a bunch of fetch-failures in rapid succession, due to a bad node. But +// that bad node will get handled separately by spark's stage-failure handling mechanism. It +// shouldn't penalize *this* executor at all, so don't count it as a task-failure as far as +// the blacklist is concerned. +val countTowardsTaskFailures = reason match { + case fail: TaskFailedReason => fail.countTowardsTaskFailures --- End diff -- the code on the old line 710 makes it seem like we always expect this to be a TaskFailedReason (should the param type just be changed?)? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r72538489 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -677,8 +702,9 @@ private[spark] class TaskSetManager( } if (!successful(index)) { tasksSuccessful += 1 - logInfo("Finished task %s in stage %s (TID %d) in %d ms on %s (%d/%d)".format( -info.id, taskSet.id, info.taskId, info.duration, info.host, tasksSuccessful, numTasks)) + logInfo("Finished task %s in stage %s (TID %d) in %d ms on %s / exec %s (%d/%d)".format( --- End diff -- maybe "(executor %s)" (favoring parens / verbosity since this is a message that does commonly pop up to users in the console --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r72538532 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -706,16 +731,15 @@ private[spark] class TaskSetManager( val index = info.index copiesRunning(index) -= 1 var accumUpdates: Seq[AccumulatorV2[_, _]] = Seq.empty -val failureReason = s"Lost task ${info.id} in stage ${taskSet.id} (TID $tid, ${info.host}): " + - reason.asInstanceOf[TaskFailedReason].toErrorString +val failureReason = s"Lost task ${info.id} in stage ${taskSet.id} (TID $tid, ${info.host}," + + s" exec ${info.executorId}): ${reason.asInstanceOf[TaskFailedReason].toErrorString}" --- End diff -- executor again (same thought as above) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r72538331 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -611,15 +620,31 @@ private[spark] class TaskSetManager( // If no executors have registered yet, don't abort the stage, just wait. We probably // got here because a task set was added before the executors registered. -if (executors.nonEmpty) { +if (executorsByHost.nonEmpty) { // take any task that needs to be scheduled, and see if we can find some executor it *could* // run on - pendingTask.foreach { taskId => -if (executors.forall(executorIsBlacklisted(_, taskId))) { - val execs = executors.toIndexedSeq.sorted.mkString("(", ",", ")") - val partition = tasks(taskId).partitionId - abort(s"Aborting ${taskSet} because task $taskId (partition $partition)" + -s" has already failed on executors $execs, and no other executors are available.") + pendingTask.foreach { indexInTaskSet => +// try to find some executor this task can run on. Its possible that some *other* +// task isn't schedulable anywhere, but we will discover that in some later call, +// when that unschedulable task is the last task remaining. +val blacklistedEverywhere = executorsByHost.forall { case (host, execs) => + val nodeBlacklisted = blacklist.isNodeBlacklisted(host) || +isNodeBlacklistedForTaskSet(host) || +isNodeBlacklistedForTask(host, indexInTaskSet) + if (nodeBlacklisted) { +true + } else { +execs.forall { exec => --- End diff -- and "Check if the task can run on any of the executors" --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14355: [SPARK-16726][SQL] Improve `Union/Intersect/Except` erro...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14355 **[Test build #62948 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62948/consoleFull)** for PR 14355 at commit [`71e8ec9`](https://github.com/apache/spark/commit/71e8ec90413a58c25bd7b5c612a9821a797652b3). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r72538301 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -611,15 +620,31 @@ private[spark] class TaskSetManager( // If no executors have registered yet, don't abort the stage, just wait. We probably // got here because a task set was added before the executors registered. -if (executors.nonEmpty) { +if (executorsByHost.nonEmpty) { // take any task that needs to be scheduled, and see if we can find some executor it *could* // run on - pendingTask.foreach { taskId => -if (executors.forall(executorIsBlacklisted(_, taskId))) { - val execs = executors.toIndexedSeq.sorted.mkString("(", ",", ")") - val partition = tasks(taskId).partitionId - abort(s"Aborting ${taskSet} because task $taskId (partition $partition)" + -s" has already failed on executors $execs, and no other executors are available.") + pendingTask.foreach { indexInTaskSet => +// try to find some executor this task can run on. Its possible that some *other* +// task isn't schedulable anywhere, but we will discover that in some later call, +// when that unschedulable task is the last task remaining. +val blacklistedEverywhere = executorsByHost.forall { case (host, execs) => + val nodeBlacklisted = blacklist.isNodeBlacklisted(host) || --- End diff -- Add a comment above here with something like "Check if the task can run on the node" --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14275: [SPARK-16637] Unified containerizer
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14275 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62939/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14275: [SPARK-16637] Unified containerizer
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14275 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r72538143 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -611,15 +620,31 @@ private[spark] class TaskSetManager( // If no executors have registered yet, don't abort the stage, just wait. We probably // got here because a task set was added before the executors registered. -if (executors.nonEmpty) { +if (executorsByHost.nonEmpty) { // take any task that needs to be scheduled, and see if we can find some executor it *could* --- End diff -- Can you move this comment up to the beginning of the method (above where pendingTask is defined) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14275: [SPARK-16637] Unified containerizer
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14275 **[Test build #62939 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62939/consoleFull)** for PR 14275 at commit [`ec6707e`](https://github.com/apache/spark/commit/ec6707e35784113896063f5b5aec042e2e54ecb1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14355: [SPARK-16726][SQL] Improve `Union/Intersect/Except` erro...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14355 Hi, @hvanhovell . The logic is refactored into a function now. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r72537531 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -266,19 +278,11 @@ private[spark] class TaskSetManager( taskAttempts(taskIndex).exists(_.host == host) } - /** - * Is this re-execution of a failed task on an executor it already failed in before - * EXECUTOR_TASK_BLACKLIST_TIMEOUT has elapsed ? - */ - private[scheduler] def executorIsBlacklisted(execId: String, taskId: Int): Boolean = { -if (failedExecutors.contains(taskId)) { - val failed = failedExecutors.get(taskId).get - - return failed.contains(execId) && -clock.getTimeMillis() - failed.get(execId).get < EXECUTOR_TASK_BLACKLIST_TIMEOUT -} - -false + private def blacklistedOnExec(execId: String, host: String, index: Int): Boolean = { +blacklistTracker.map { bl => --- End diff -- you can use _ since b1 is never used --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14266: [SPARK-16526][SQL] Benchmarking Performance for Fast Has...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14266 **[Test build #62947 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62947/consoleFull)** for PR 14266 at commit [`4944b29`](https://github.com/apache/spark/commit/4944b29558ba8f3563f1c0c3d1a485b29dcdc39b). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r72537361 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -266,19 +278,11 @@ private[spark] class TaskSetManager( taskAttempts(taskIndex).exists(_.host == host) } - /** - * Is this re-execution of a failed task on an executor it already failed in before - * EXECUTOR_TASK_BLACKLIST_TIMEOUT has elapsed ? - */ - private[scheduler] def executorIsBlacklisted(execId: String, taskId: Int): Boolean = { -if (failedExecutors.contains(taskId)) { - val failed = failedExecutors.get(taskId).get - - return failed.contains(execId) && -clock.getTimeMillis() - failed.get(execId).get < EXECUTOR_TASK_BLACKLIST_TIMEOUT -} - -false + private def blacklistedOnExec(execId: String, host: String, index: Int): Boolean = { --- End diff -- can you call this taskIsBlacklistedOnExec? (to make it obvious this is task-specific) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r72537255 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -83,8 +85,15 @@ private[spark] class TaskSetManager( val copiesRunning = new Array[Int](numTasks) val successful = new Array[Boolean](numTasks) private val numFailures = new Array[Int](numTasks) - // key is taskId (aka TaskInfo.index), value is a Map of executor id to when it failed - private val failedExecutors = new HashMap[Int, HashMap[String, Long]]() + val execToFailures: HashMap[String, FailureStatus] = new HashMap() + /** + * Map from node to all executors on it with failures. Needed because we want to know about + * executors on a node even after they have died. + */ + private val nodesToExecsWithFailures: HashMap[String, HashSet[String]] = new HashMap() + private val nodeBlacklistedTasks: HashMap[String, HashSet[Int]] = new HashMap() --- End diff -- nodeToBlacklistedTasks --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r72537224 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -83,8 +85,15 @@ private[spark] class TaskSetManager( val copiesRunning = new Array[Int](numTasks) val successful = new Array[Boolean](numTasks) private val numFailures = new Array[Int](numTasks) - // key is taskId (aka TaskInfo.index), value is a Map of executor id to when it failed - private val failedExecutors = new HashMap[Int, HashMap[String, Long]]() + val execToFailures: HashMap[String, FailureStatus] = new HashMap() + /** + * Map from node to all executors on it with failures. Needed because we want to know about + * executors on a node even after they have died. + */ + private val nodesToExecsWithFailures: HashMap[String, HashSet[String]] = new HashMap() --- End diff -- nodeToExecsWithFailures? (make plural-ness consistent with nodeToBlacklistedTasks) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14355: [SPARK-16726][SQL] Improve `Union/Intersect/Excep...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14355#discussion_r72536913 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala --- @@ -280,6 +267,33 @@ trait CheckAnalysis extends PredicateHelper { case p if p.expressions.exists(PredicateSubquery.hasPredicateSubquery) => failAnalysis(s"Predicate sub-queries can only be used in a Filter: $p") + case _: Union | _: SetOperation if operator.children.length > 1 => +def dataTypes(plan: LogicalPlan): Seq[DataType] = plan.output.map(_.dataType) +val ref = dataTypes(operator.children.head) +operator.children.tail.zipWithIndex.foreach { case (child, ti) => + // Check the number of columns + if (child.output.length != ref.length) { +failAnalysis( + s""" +|${operator.nodeName} can only be performed on tables with the same number +|of columns, but the first table has ${ref.length} columns and +|the ${if (ti == 0) "second" else (ti + 2) + "th"} table has --- End diff -- Oh, sure. I'll fix soon. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r72536830 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -552,12 +614,17 @@ private[spark] class TaskSchedulerImpl( executorIdToHost -= executorId rootPool.executorLost(executorId, host, reason) } +blacklistTracker.foreach(_.removeExecutor(executorId)) } def executorAdded(execId: String, host: String) { dagScheduler.executorAdded(execId, host) } + def getHostForExecutor(execId: String): String = synchronized { --- End diff -- I think this is no longer used / can be removed? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/14079#discussion_r72536726 --- Diff: core/src/main/scala/org/apache/spark/scheduler/BlacklistTracker.scala --- @@ -0,0 +1,214 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.scheduler + +import java.util.concurrent.atomic.AtomicReference + +import scala.collection.mutable.{HashMap, HashSet} + +import org.apache.spark.SparkConf +import org.apache.spark.internal.Logging +import org.apache.spark.internal.config +import org.apache.spark.util.{Clock, SystemClock, Utils} + +/** + * BlacklistTracker is designed to track problematic executors and nodes. It supports blacklisting + * specific (executor, task) pairs within a stage, blacklisting entire executors and nodes for a + * stage, and blacklisting executors and nodes across an entire application (with a periodic + * expiry). + * + * The tracker needs to deal with a variety of workloads, eg.: bad user code, which may lead to many + * task failures, but that should not count against individual executors; many small stages, which + * may prevent a bad executor for having many failures within one stage, but still many failures + * over the entire application; "flaky" executors, that don't fail every task, but are still + * faulty; etc. + * + * THREADING: As with most helpers of TaskSchedulerImpl, this is not thread-safe. Though it is + * called by multiple threads, callers must already have a lock on the TaskSchedulerImpl. The + * one exception is [[nodeBlacklist()]], which can be called without holding a lock. + */ +private[scheduler] class BlacklistTracker ( +conf: SparkConf, +clock: Clock = new SystemClock()) extends Logging { + + private val MAX_FAILURES_PER_EXEC = conf.get(config.MAX_FAILURES_PER_EXEC) + private val MAX_FAILED_EXEC_PER_NODE = conf.get(config.MAX_FAILED_EXEC_PER_NODE) + val EXECUTOR_RECOVERY_MILLIS = BlacklistTracker.getBlacklistExpiryTime(conf) + + // a count of failed tasks for each executor. Only counts failures after tasksets complete + // successfully + private val executorIdToFailureCount: HashMap[String, Int] = new HashMap() + private val executorIdToBlacklistStatus: HashMap[String, BlacklistedExecutor] = new HashMap() + private val nodeIdToBlacklistExpiryTime: HashMap[String, Long] = new HashMap() + private val _nodeBlacklist: AtomicReference[Set[String]] = new AtomicReference(Set()) + private var nextExpiryTime: Long = Long.MaxValue + // for blacklisted executors, the node it is on. We do *not* remove from this when executors are + // removed from spark, so we can track when we get multiple successive blacklisted executors on + // one node. + val nodeToFailedExecs: HashMap[String, HashSet[String]] = new HashMap() + + def expireExecutorsInBlacklist(): Unit = { +val now = clock.getTimeMillis() +// quickly check if we've got anything to expire from blacklist -- if not, avoid doing any work +if (now > nextExpiryTime) { + val execsToClear = executorIdToBlacklistStatus.filter(_._2.expiryTime < now).keys + if (execsToClear.nonEmpty) { +logInfo(s"Removing executors $execsToClear from blacklist during periodic recovery") +execsToClear.foreach { exec => + val status = executorIdToBlacklistStatus.remove(exec).get + val failedExecsOnNode = nodeToFailedExecs(status.node) + failedExecsOnNode.remove(exec) + if (failedExecsOnNode.isEmpty) { +nodeToFailedExecs.remove(status.node) + } +} + } + if (executorIdToBlacklistStatus.nonEmpty) { +nextExpiryTime = executorIdToBlacklistStatus.map{_._2.expiryTime}.min + } else { +nextExpiryTime = Long.MaxValue + } + val nodesToClear = nodeIdToBlacklistExpiryTime.filter(_._2 <
[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14132 Hi, @cloud-fan , @yhuai , @hvanhovell , and @rxin . The remaining comment is about matching algorithm. While digging Hive's dead code, I suddenly feel it's useless because Hive does not use it any more. - Case 1: I think we agree to use DFS to match the first `t1` in the following query. ``` select /*+ MAPJOIN(t1) */ * from t1 JOIN t1 JOIN t1 JOIN t1 == Optimized Logical Plan == Join Inner :- Join Inner : :- Join Inner : : :- BroadcastHint : : : +- Range (0, 10, splits=8) : : +- Range (0, 10, splits=8) : +- Range (0, 10, splits=8) +- Range (0, 10, splits=8) ``` - Case 2: What should we do for the following query? (The following is just the current behaviors) ``` SELECT /*+ MAPJOIN(t1) */ * FROM (SELECT * FROM t1) x JOIN t1 == Optimized Logical Plan == Join Inner :- BroadcastHint : +- Range (0, 10, splits=8) +- Range (0, 10, splits=8) ``` Until now, I thought the current behavior is an intuitive result. But, if there is another direction for `case 2`, I will try to find a way satisfying both cases. How can I proceed this PR? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14176: [SPARK-16525][SQL] Enable Row Based HashMap in HashAggre...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14176 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14176: [SPARK-16525][SQL] Enable Row Based HashMap in HashAggre...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14176 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62944/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14176: [SPARK-16525][SQL] Enable Row Based HashMap in HashAggre...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14176 **[Test build #62944 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62944/consoleFull)** for PR 14176 at commit [`7194394`](https://github.com/apache/spark/commit/71943941ebe548e0a2c66d633893b7e2196b94a6). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org