[GitHub] spark pull request #14374: [SPARK-16735][SQL] `map` should create a decimal ...

2016-07-27 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/14374#discussion_r72567433
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
@@ -93,20 +93,45 @@ case class CreateMap(children: Seq[Expression]) extends Expression {
     if (children.size % 2 != 0) {
       TypeCheckResult.TypeCheckFailure(s"$prettyName expects a positive even number of arguments.")
     } else if (keys.map(_.dataType).distinct.length > 1) {
-      TypeCheckResult.TypeCheckFailure("The given keys of function map should all be the same " +
-        "type, but they are " + keys.map(_.dataType.simpleString).mkString("[", ", ", "]"))
+      if (keys.map(_.dataType).forall(_.isInstanceOf[DecimalType])) {
+        TypeCheckResult.TypeCheckSuccess
+      } else {
+        TypeCheckResult.TypeCheckFailure("The given keys of function map should all be the same " +
+          "type, but they are " + keys.map(_.dataType.simpleString).mkString("[", ", ", "]"))
+      }
     } else if (values.map(_.dataType).distinct.length > 1) {
-      TypeCheckResult.TypeCheckFailure("The given values of function map should all be the same " +
-        "type, but they are " + values.map(_.dataType.simpleString).mkString("[", ", ", "]"))
+      if (values.map(_.dataType).forall(_.isInstanceOf[DecimalType])) {
+        TypeCheckResult.TypeCheckSuccess
+      } else {
+        TypeCheckResult.TypeCheckFailure("The given values of function map should all be the " +
+          "same type, but they are " + values.map(_.dataType.simpleString).mkString("[", ", ", "]"))
+      }
     } else {
       TypeCheckResult.TypeCheckSuccess
     }
   }
 
+  private def checkDecimalType(colType: Seq[Expression]): DataType = {
+    val elementType = colType.headOption.map(_.dataType).getOrElse(NullType)
+    elementType match {
+      case _ if elementType.isInstanceOf[DecimalType] =>
+        var tighter: DataType = elementType
+        colType.foreach { child =>
+          if (elementType.asInstanceOf[DecimalType].isTighterThan(child.dataType)) {
--- End diff --

What I was referring to was that isTighterThan is not a total order (two decimal types can be incomparable), and I don't think you can just take the tightest one this way.

As an example:

a: precision 10, scale 5
b: precision 7, scale 1

In this case a is not tighter than b, but b would be chosen as the target data type, leading to loss of precision.
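
For readers following along, a minimal Scala sketch of that incomparability, assuming the usual decimal tightness rule ((p1, s1) is tighter than (p2, s2) iff p1 - s1 <= p2 - s2 and s1 <= s2); `Dec` here is a stand-in, not Spark's `DecimalType`:

```scala
// Stand-in for DecimalType; isTighterThan mirrors the tightness rule above.
case class Dec(precision: Int, scale: Int) {
  def isTighterThan(o: Dec): Boolean =
    (precision - scale) <= (o.precision - o.scale) && scale <= o.scale
}

object IncomparableDemo extends App {
  val a = Dec(10, 5) // 5 integer digits, 5 fractional digits
  val b = Dec(7, 1)  // 6 integer digits, 1 fractional digit
  println(a.isTighterThan(b)) // false: a's scale 5 exceeds b's scale 1
  println(b.isTighterThan(a)) // false: b's 6 integer digits exceed a's 5
  // Neither type is tighter than the other, so a pairwise scan for
  // "the tightest" type has no correct answer to pick.
}
```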






[GitHub] spark issue #14353: [SPARK-16714][SQL] `array` should create a decimal array...

2016-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14353
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14353: [SPARK-16714][SQL] `array` should create a decimal array...

2016-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14353
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62954/
Test PASSed.





[GitHub] spark issue #14353: [SPARK-16714][SQL] `array` should create a decimal array...

2016-07-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14353
  
**[Test build #62954 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62954/consoleFull)** for PR 14353 at commit [`a095389`](https://github.com/apache/spark/commit/a0953891e0d1b80ee8aa2e7d24409f987ba76a83).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #14364: [SPARK-16730][SQL] Implement function aliases for...

2016-07-27 Thread petermaxlee
Github user petermaxlee commented on a diff in the pull request:

https://github.com/apache/spark/pull/14364#discussion_r72565468
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ---
@@ -452,14 +466,37 @@ object FunctionRegistry {
       }
     }
 
-    val clazz = tag.runtimeClass
+    (name, (expressionInfo[T](name), builder))
+  }
+
+  /**
+   * Creates a function registry lookup entry for cast aliases (SPARK-16730).
+   * For example, if name is "int", and dataType is IntegerType, this means int(x) would become
+   * an alias for cast(x as IntegerType).
+   * See usage above.
+   */
+  private def castAlias(
+      name: String,
+      dataType: DataType): (String, (ExpressionInfo, FunctionBuilder)) = {
+    val builder = (args: Seq[Expression]) => {
+      if (args.size != 1) {
+        throw new AnalysisException(s"Function $name accepts only one argument")
+      }
+      Cast(args.head, dataType)
+    }
+    (name, (expressionInfo[Cast](name), builder))
--- End diff --

Yes - this is a limitation. That's not what Hive does because Hive actually 
does not have a single cast expression. It has a cast expression for each 
target type. I think it's a pretty small detail and fixing it would require a 
lot of work.
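
As a side note, a self-contained sketch of the alias pattern in the diff above (`Expr`, `Literal`, and `Cast` are simplified stand-ins for the Catalyst classes, not the real API):

```scala
object CastAliasSketch extends App {
  // Simplified stand-ins for Catalyst's Expression and Cast.
  sealed trait Expr
  case class Literal(value: String) extends Expr
  case class Cast(child: Expr, dataType: String) extends Expr

  type FunctionBuilder = Seq[Expr] => Expr

  // castAlias("int", "int") returns a builder that rewrites int(x)
  // into Cast(x, "int"); exactly one argument is allowed.
  def castAlias(name: String, dataType: String): (String, FunctionBuilder) = {
    val builder = (args: Seq[Expr]) => {
      if (args.size != 1) sys.error(s"Function $name accepts only one argument")
      Cast(args.head, dataType)
    }
    (name, builder)
  }

  val (_, build) = castAlias("int", "int")
  println(build(Seq(Literal("42")))) // Cast(Literal(42),int)
}
```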





[GitHub] spark issue #13680: [SPARK-15962][SQL] Introduce implementation with a dense...

2016-07-27 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/13680
  
We need inputs from @davies, but unfortunately he is on vacation now...





[GitHub] spark pull request #14364: [SPARK-16730][SQL] Implement function aliases for...

2016-07-27 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14364#discussion_r72563566
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ---
@@ -452,14 +466,37 @@ object FunctionRegistry {
       }
     }
 
-    val clazz = tag.runtimeClass
+    (name, (expressionInfo[T](name), builder))
+  }
+
+  /**
+   * Creates a function registry lookup entry for cast aliases (SPARK-16730).
+   * For example, if name is "int", and dataType is IntegerType, this means int(x) would become
+   * an alias for cast(x as IntegerType).
+   * See usage above.
+   */
+  private def castAlias(
+      name: String,
+      dataType: DataType): (String, (ExpressionInfo, FunctionBuilder)) = {
+    val builder = (args: Seq[Expression]) => {
+      if (args.size != 1) {
+        throw new AnalysisException(s"Function $name accepts only one argument")
+      }
+      Cast(args.head, dataType)
+    }
+    (name, (expressionInfo[Cast](name), builder))
--- End diff --

So whatever cast function we describe, we will always show `Cast`'s description, right? Is it the same as Hive?





[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-27 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/14132
  
For case 2, what does Hive (or other databases that support SQL hints) do?





[GitHub] spark pull request #14231: [SPARK-16586] Change the way the exit code of lau...

2016-07-27 Thread zasdfgbnm
Github user zasdfgbnm commented on a diff in the pull request:

https://github.com/apache/spark/pull/14231#discussion_r72562015
  
--- Diff: bin/spark-class ---
@@ -65,24 +65,25 @@ fi
 # characters that would be otherwise interpreted by the shell. Read that in a while loop, populating
 # an array that will be used to exec the final command.
 #
-# The exit code of the launcher is appended to the output, so the parent shell removes it from the
-# command array and checks the value to see if the launcher succeeded.
-build_command() {
-  "$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"
-  printf "%d\0" $?
-}
+# To keep both the output and the exit code of the launcher, the output is first converted to a
+# hex dump, which keeps bash from dropping the NUL character, and the exit code is retrieved from
+# the bash array ${PIPESTATUS[@]}.
+#
+# Note that the separator NUL character cannot be replaced with a space or '\n'; otherwise the
+# command would fail when a user-supplied path contains a special character such as '\n' or a space.
+#
+# Also note that when the launcher fails, it might not output something ending with '\0' [SPARK-16586]
+_CMD=$("$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"|xxd -p|tr -d '\n';exit ${PIPESTATUS[0]})
--- End diff --

This won't work. `build_command` is executed in a subshell. The `exit $?`  
will only terminate the subshell.
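
A tiny standalone bash illustration of that point (nothing Spark-specific; `build` is a made-up stand-in for `build_command`):

```bash
#!/usr/bin/env bash
build() { echo "some output"; exit 3; }  # 'exit' here only ends the subshell

CMD=$(build)  # command substitution runs build() in a forked subshell
echo "parent still running; captured='$CMD', status=$?"
# prints: parent still running; captured='some output', status=3
```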





[GitHub] spark issue #14386: [SPARK-16761][DOC][ML] Fix doc link in docs/ml-guide.md

2016-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14386
  
Can one of the admins verify this patch?





[GitHub] spark pull request #14386: [SPARK-16761][DOC][ML] Fix doc link in docs/ml-gu...

2016-07-27 Thread sundapeng
GitHub user sundapeng opened a pull request:

https://github.com/apache/spark/pull/14386

[SPARK-16761][DOC][ML] Fix doc link in docs/ml-guide.md

## What changes were proposed in this pull request?

Fix the link at http://spark.apache.org/docs/latest/ml-guide.html.

## How was this patch tested?

None


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sundapeng/spark doclink

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14386.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14386


commit 4d1b3c58a4a35339fc088652b242cc6acd3838cb
Author: Sun Dapeng 
Date:   2016-07-28T03:18:47Z

SPARK-16761: Fix doc link in docs/ml-guide.md







[GitHub] spark pull request #14378: [SPARK-16750] [ML] Fix GaussianMixture training f...

2016-07-27 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/14378#discussion_r72560646
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinMaxScaler.scala ---
@@ -111,7 +111,7 @@ class MinMaxScaler @Since("1.5.0") (@Since("1.5.0") override val uid: String)
 
   @Since("2.0.0")
   override def fit(dataset: Dataset[_]): MinMaxScalerModel = {
-    transformSchema(dataset.schema, logging = true)
+    transformSchema(dataset.schema)
--- End diff --

It's a good question. Since ```MinMaxScaler``` overrides ```transformSchema``` without the ```logging``` argument, we should use that one rather than the function in the base class.
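
To illustrate the shape of the API (a minimal sketch with `String` standing in for `StructType`; not the actual Spark source):

```scala
// The base class pairs an abstract one-argument transformSchema with a
// protected logging variant that delegates to it.
abstract class PipelineStage {
  def transformSchema(schema: String): String

  protected def transformSchema(schema: String, logging: Boolean): String = {
    if (logging) println(s"Input schema: $schema")
    transformSchema(schema)
  }
}

// A MinMaxScaler-style subclass overrides only the one-argument variant,
// which is why fit() can call transformSchema(dataset.schema) directly.
class Scaler extends PipelineStage {
  override def transformSchema(schema: String): String = schema + ", output: vector"
}
```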





[GitHub] spark issue #14385: [SPARK-16312][STREAMING][KAFKA][DOC] Doc for Kafka 0.10 ...

2016-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14385
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14385: [SPARK-16312][STREAMING][KAFKA][DOC] Doc for Kafka 0.10 ...

2016-07-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14385
  
**[Test build #62953 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62953/consoleFull)** for PR 14385 at commit [`c44611d`](https://github.com/apache/spark/commit/c44611dad2a5b7ec27b6584c05ca03ff868ebe08).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14385: [SPARK-16312][STREAMING][KAFKA][DOC] Doc for Kafka 0.10 ...

2016-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14385
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62953/
Test PASSed.





[GitHub] spark issue #14353: [SPARK-16714][SQL] `array` should create a decimal array...

2016-07-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14353
  
**[Test build #62954 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62954/consoleFull)** for PR 14353 at commit [`a095389`](https://github.com/apache/spark/commit/a0953891e0d1b80ee8aa2e7d24409f987ba76a83).





[GitHub] spark issue #14353: [SPARK-16714][SQL] `array` should create a decimal array...

2016-07-27 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/14353
  
Rebased.





[GitHub] spark issue #14385: [SPARK-16312][STREAMING][KAFKA][DOC] Doc for Kafka 0.10 ...

2016-07-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14385
  
**[Test build #62953 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62953/consoleFull)** for PR 14385 at commit [`c44611d`](https://github.com/apache/spark/commit/c44611dad2a5b7ec27b6584c05ca03ff868ebe08).





[GitHub] spark pull request #14385: [SPARK-16312][STREAMING][KAFKA][DOC] Doc for Kafk...

2016-07-27 Thread koeninger
GitHub user koeninger opened a pull request:

https://github.com/apache/spark/pull/14385

[SPARK-16312][STREAMING][KAFKA][DOC] Doc for Kafka 0.10 integration

## What changes were proposed in this pull request?
Doc for the Kafka 0.10 integration


## How was this patch tested?
Scala code examples were taken from my example repo, so hopefully they 
compile.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/koeninger/spark-1 SPARK-16312

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14385.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14385


commit c44611dad2a5b7ec27b6584c05ca03ff868ebe08
Author: cody koeninger 
Date:   2016-07-28T03:00:06Z

[SPARK-16312][STREAMING][KAFKA][DOC] Doc for Kafka 0.10 integration







[GitHub] spark pull request #14182: [SPARK-16444][WIP][SparkR]: Isotonic Regression w...

2016-07-27 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/14182#discussion_r72558746
  
--- Diff: R/pkg/inst/tests/testthat/test_mllib.R ---
@@ -454,4 +454,9 @@ test_that("spark.survreg", {
   }
 })
 
+test_that("spark.isotonicRegression", {
+
+})
+
 sparkR.session.stop()
+
--- End diff --

Remove this line.





[GitHub] spark issue #14176: [SPARK-16525][SQL] Enable Row Based HashMap in HashAggre...

2016-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14176
  
Merged build finished. Test FAILed.





[GitHub] spark issue #14176: [SPARK-16525][SQL] Enable Row Based HashMap in HashAggre...

2016-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14176
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62952/
Test FAILed.





[GitHub] spark issue #14176: [SPARK-16525][SQL] Enable Row Based HashMap in HashAggre...

2016-07-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14176
  
**[Test build #62952 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62952/consoleFull)** for PR 14176 at commit [`122cf18`](https://github.com/apache/spark/commit/122cf181841046a026c4ffd7fd0c597a03ff30dd).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14132
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14132
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62951/
Test PASSed.





[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14132
  
**[Test build #62951 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62951/consoleFull)** for PR 14132 at commit [`d733393`](https://github.com/apache/spark/commit/d733393d480a9f06b4ceb23292dcd753e061685c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14176: [SPARK-16525][SQL] Enable Row Based HashMap in HashAggre...

2016-07-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14176
  
**[Test build #62952 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62952/consoleFull)** for PR 14176 at commit [`122cf18`](https://github.com/apache/spark/commit/122cf181841046a026c4ffd7fd0c597a03ff30dd).





[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-07-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14124
  
@cloud-fan If nullability should not be ignored, then I can fix this PR to make them consistent in not ignoring it (and of course I will try to identify the related problems). In this case, I will work on what @gatorsmile pointed out in https://github.com/apache/spark/pull/14124#issuecomment-231594416 about JSON (and will check the other data sources as well).

I will follow your decision.


To cut all the comments above short (for other reviewers):
 - The question in this PR is whether it should force all schemas to nullable or not (see the sketch below).
 - This is already happening with normal reading and writing for data sources based on `FileFormat`.
 - This applies to both inferred/read schemas and user-given schemas.
 - For the `json(rdd: RDD[String])` API, this is not happening.
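
For other reviewers, a hypothetical spark-shell illustration of the behavior in question (paths and data invented):

```scala
import org.apache.spark.sql.types._

// A user-given schema marks the field as non-nullable...
val schema = StructType(Seq(StructField("a", IntegerType, nullable = false)))
val df = spark.read.schema(schema).json("/tmp/in.json")

// ...but after a round trip through a FileFormat-based source the field
// comes back as nullable = true.
df.write.mode("overwrite").parquet("/tmp/out")
spark.read.parquet("/tmp/out").printSchema()
// root
//  |-- a: integer (nullable = true)
```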





[GitHub] spark issue #14355: [SPARK-16726][SQL] Improve `Union/Intersect/Except` erro...

2016-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14355
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14355: [SPARK-16726][SQL] Improve `Union/Intersect/Except` erro...

2016-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14355
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62948/
Test PASSed.





[GitHub] spark issue #14355: [SPARK-16726][SQL] Improve `Union/Intersect/Except` erro...

2016-07-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14355
  
**[Test build #62948 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62948/consoleFull)** for PR 14355 at commit [`71e8ec9`](https://github.com/apache/spark/commit/71e8ec90413a58c25bd7b5c612a9821a797652b3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14266: [SPARK-16526][SQL] Benchmarking Performance for Fast Has...

2016-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14266
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62947/
Test PASSed.





[GitHub] spark issue #14266: [SPARK-16526][SQL] Benchmarking Performance for Fast Has...

2016-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14266
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14383: [SPARK-15232][SQL] Add subquery SQL building tests to Lo...

2016-07-27 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/14383
  
Hi, @rxin and @hvanhovell.
It's ready for review again.





[GitHub] spark issue #14266: [SPARK-16526][SQL] Benchmarking Performance for Fast Has...

2016-07-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14266
  
**[Test build #62947 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62947/consoleFull)** for PR 14266 at commit [`4944b29`](https://github.com/apache/spark/commit/4944b29558ba8f3563f1c0c3d1a485b29dcdc39b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14132
  
**[Test build #62951 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62951/consoleFull)** for PR 14132 at commit [`d733393`](https://github.com/apache/spark/commit/d733393d480a9f06b4ceb23292dcd753e061685c).





[GitHub] spark issue #14384: [Spark-16443][SparkR] Alternating Least Squares (ALS) wr...

2016-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14384
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62950/
Test PASSed.





[GitHub] spark issue #14384: [Spark-16443][SparkR] Alternating Least Squares (ALS) wr...

2016-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14384
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14384: [Spark-16443][SparkR] Alternating Least Squares (ALS) wr...

2016-07-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14384
  
**[Test build #62950 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62950/consoleFull)** for PR 14384 at commit [`962e12f`](https://github.com/apache/spark/commit/962e12fccc7969e9a009af9f49bf7f6b9268b08a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #14374: [SPARK-16735][SQL] `map` should create a decimal ...

2016-07-27 Thread biglobster
Github user biglobster commented on a diff in the pull request:

https://github.com/apache/spark/pull/14374#discussion_r72544676
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
@@ -93,20 +93,45 @@ case class CreateMap(children: Seq[Expression]) extends Expression {
     if (children.size % 2 != 0) {
       TypeCheckResult.TypeCheckFailure(s"$prettyName expects a positive even number of arguments.")
     } else if (keys.map(_.dataType).distinct.length > 1) {
-      TypeCheckResult.TypeCheckFailure("The given keys of function map should all be the same " +
-        "type, but they are " + keys.map(_.dataType.simpleString).mkString("[", ", ", "]"))
+      if (keys.map(_.dataType).forall(_.isInstanceOf[DecimalType])) {
+        TypeCheckResult.TypeCheckSuccess
+      } else {
+        TypeCheckResult.TypeCheckFailure("The given keys of function map should all be the same " +
+          "type, but they are " + keys.map(_.dataType.simpleString).mkString("[", ", ", "]"))
+      }
     } else if (values.map(_.dataType).distinct.length > 1) {
-      TypeCheckResult.TypeCheckFailure("The given values of function map should all be the same " +
-        "type, but they are " + values.map(_.dataType.simpleString).mkString("[", ", ", "]"))
+      if (values.map(_.dataType).forall(_.isInstanceOf[DecimalType])) {
+        TypeCheckResult.TypeCheckSuccess
+      } else {
+        TypeCheckResult.TypeCheckFailure("The given values of function map should all be the " +
+          "same type, but they are " + values.map(_.dataType.simpleString).mkString("[", ", ", "]"))
+      }
     } else {
       TypeCheckResult.TypeCheckSuccess
     }
   }
 
+  private def checkDecimalType(colType: Seq[Expression]): DataType = {
+    val elementType = colType.headOption.map(_.dataType).getOrElse(NullType)
+    elementType match {
+      case _ if elementType.isInstanceOf[DecimalType] =>
+        var tighter: DataType = elementType
+        colType.foreach { child =>
+          if (elementType.asInstanceOf[DecimalType].isTighterThan(child.dataType)) {
--- End diff --

@rxin I have checked this function, and it will not lose any precision or range; it's safe. Can you give me some advice? Thank you :)
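
A sketch of the alternative rxin is pointing toward: fold the types with a commutative widening rule instead of scanning for the tightest one (`Dec` is a stand-in; this mirrors the usual decimal widening arithmetic, not Spark's actual API):

```scala
object WiderDecimalSketch extends App {
  case class Dec(precision: Int, scale: Int)

  // Keep enough integer digits and enough scale to hold every value of
  // both inputs, so no precision or range is lost in either direction.
  def wider(a: Dec, b: Dec): Dec = {
    val scale = math.max(a.scale, b.scale)
    val intDigits = math.max(a.precision - a.scale, b.precision - b.scale)
    Dec(intDigits + scale, scale)
  }

  val keyTypes = Seq(Dec(10, 5), Dec(7, 1))
  println(keyTypes.reduce(wider)) // Dec(11,5): holds both types exactly
}
```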





[GitHub] spark issue #14384: [Spark-16443][SparkR] Alternating Least Squares (ALS) wr...

2016-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14384
  
Merged build finished. Test FAILed.





[GitHub] spark issue #14384: [Spark-16443][SparkR] Alternating Least Squares (ALS) wr...

2016-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14384
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62949/
Test FAILed.





[GitHub] spark issue #14384: [Spark-16443][SparkR] Alternating Least Squares (ALS) wr...

2016-07-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14384
  
**[Test build #62949 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62949/consoleFull)** for PR 14384 at commit [`3939ccc`](https://github.com/apache/spark/commit/3939ccce37cd899edf007d0ac3b9589748db0bc3).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14079: [SPARK-8425][CORE] New Blacklist Mechanism

2016-07-27 Thread kayousterhout
Github user kayousterhout commented on the issue:

https://github.com/apache/spark/pull/14079
  
OK done with a pass over this -- overall this looks good.  Sorry for the 
huge number of naming suggestions.  This behavior is subtle/complex and I'm in 
favor of spending some effort to make it as understandable as possible to 
future folks who modify this code.

A few more comments:
 
The design doc mentions some invariants that must be true of the parameters 
(e.g., that some values must be lower than others). Can you enforce those 
invariants in the creation of the BlacklistManager?

It would be helpful to add a comment somewhere describing the division of 
functionality.  Maybe a good place would be in the BlacklistTracker docstring? 
Something that explains that the TaskSetManager is responsible for handling 
blacklisting at the task set and task level, while the TaskSchedulerImpl is 
responsible for handling application-level blacklists?

I added a few high level comments / concerns to the design doc, also.
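
(As a sketch of what enforcing those invariants at construction could look like; the parameter names and the particular inequality are invented for illustration, not taken from the PR:)

```scala
// Hypothetical: fail fast in the constructor instead of misbehaving later.
class BlacklistTracker(
    maxTaskFailuresPerExecutor: Int,
    maxFailuresBeforeExecutorBlacklist: Int) {
  require(maxTaskFailuresPerExecutor > 0,
    "per-executor task failure threshold must be positive")
  require(maxTaskFailuresPerExecutor <= maxFailuresBeforeExecutorBlacklist,
    "per-task threshold must not exceed the executor-level threshold")
}
```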






[GitHub] spark issue #14380: [SPARK-16485][DOC][ML] Remove useless latex in scaladoc ...

2016-07-27 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/14380
  
Do a search for some LaTeX tags and you'll find more, like AFTSurvivalRegression.scala, LinearRegression.scala, etc.





[GitHub] spark pull request #14231: [SPARK-16586] Change the way the exit code of lau...

2016-07-27 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14231#discussion_r72543191
  
--- Diff: bin/spark-class ---
@@ -65,24 +65,25 @@ fi
 # characters that would be otherwise interpreted by the shell. Read that in a while loop, populating
 # an array that will be used to exec the final command.
 #
-# The exit code of the launcher is appended to the output, so the parent shell removes it from the
-# command array and checks the value to see if the launcher succeeded.
-build_command() {
-  "$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"
-  printf "%d\0" $?
-}
+# To keep both the output and the exit code of the launcher, the output is first converted to a
+# hex dump, which keeps bash from dropping the NUL character, and the exit code is retrieved from
+# the bash array ${PIPESTATUS[@]}.
+#
+# Note that the separator NUL character cannot be replaced with a space or '\n'; otherwise the
+# command would fail when a user-supplied path contains a special character such as '\n' or a space.
+#
+# Also note that when the launcher fails, it might not output something ending with '\0' [SPARK-16586]
+_CMD=$("$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"|xxd -p|tr -d '\n';exit ${PIPESTATUS[0]})
--- End diff --

Eh, I'm talking about something _much_ simpler.

```
build_command() {
  "$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"
  # Save the launcher's status before the [ test or echo overwrites $?.
  local ret=$?
  if [ $ret != 0 ]; then
    echo "Couldn't run launcher"
    exit $ret
  fi
  printf "%d\0" $ret
}
```
Does that make sense?





[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14132
  
Merged build finished. Test FAILed.





[GitHub] spark issue #14379: [SPARK-16751] Upgrade derby to 10.12.1.1

2016-07-27 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/14379
  
Oh I see, so this is actually packaged, not strictly part of tests. Well we 
should do it anyway. Not sure if the risk actually surfaces in Spark but better 
to update and be safe.





[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14132
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62946/
Test FAILed.





[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14132
  
**[Test build #62946 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62946/consoleFull)** for PR 14132 at commit [`381045d`](https://github.com/apache/spark/commit/381045da10cb74d0a2b45d743cac16c441a526e5).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14379: [SPARK-16751] Upgrade derby to 10.12.1.1

2016-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14379
  
Merged build finished. Test FAILed.





[GitHub] spark issue #14379: [SPARK-16751] Upgrade derby to 10.12.1.1

2016-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14379
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62943/
Test FAILed.





[GitHub] spark issue #14379: [SPARK-16751] Upgrade derby to 10.12.1.1

2016-07-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14379
  
**[Test build #62943 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62943/consoleFull)** for PR 14379 at commit [`f3815cf`](https://github.com/apache/spark/commit/f3815cfd1b6b6d7ba29dc8f2e12b1334ad415847).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #14373: [SPARK-16740][SQL] Fix Long overflow in LongToUns...

2016-07-27 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14373#discussion_r72541802
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ---
@@ -608,7 +608,8 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap
   def optimize(): Unit = {
     val range = maxKey - minKey
--- End diff --

They're already `Long`s so `range` is too. The difference can overflow 
though. This should correctly handle the case.
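
The overflow itself is easy to demonstrate in isolation (a standalone sketch, not the `HashedRelation` code):

```scala
object RangeOverflowDemo extends App {
  val maxKey = Long.MaxValue
  val minKey = Long.MinValue + 1

  val range = maxKey - minKey // mathematically 2^64 - 2, but Long wraps to -2
  println(range)              // -2

  // One way to guard: only trust the subtraction when it cannot have
  // wrapped, i.e. when the result is non-negative for maxKey >= minKey.
  val safeRange = if (maxKey >= minKey && maxKey - minKey >= 0) Some(maxKey - minKey) else None
  println(safeRange)          // None
}
```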





[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism

2016-07-27 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/14079#discussion_r72541736
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -592,7 +599,9 @@ private[spark] class TaskSetManager(
* failures (this is because the method picks on unscheduled task, and 
then iterates through each
* executor until it finds one that the task hasn't failed on already).
*/
-  private[scheduler] def abortIfCompletelyBlacklisted(executors: 
Iterable[String]): Unit = {
+  private[scheduler] def abortIfCompletelyBlacklisted(
+  executorsByHost: HashMap[String, HashSet[String]],
+  blacklist: BlacklistTracker): Unit = {
--- End diff --

looks like you don't need to pass this in anymore, since it's now part of 
the class?


[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism

2016-07-27 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/14079#discussion_r72541451
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/BlacklistTracker.scala ---
@@ -0,0 +1,214 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler
+
+import java.util.concurrent.atomic.AtomicReference
+
+import scala.collection.mutable.{HashMap, HashSet}
+
+import org.apache.spark.SparkConf
+import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config
+import org.apache.spark.util.{Clock, SystemClock, Utils}
+
+/**
+ * BlacklistTracker is designed to track problematic executors and nodes.  
It supports blacklisting
+ * specific (executor, task) pairs within a stage, blacklisting entire 
executors and nodes for a
+ * stage, and blacklisting executors and nodes across an entire 
application (with a periodic
+ * expiry).
+ *
+ * The tracker needs to deal with a variety of workloads, eg.: bad user 
code, which may lead to many
+ * task failures, but that should not count against individual executors; 
many small stages, which
+ * may prevent a bad executor for having many failures within one stage, 
but still many failures
+ * over the entire application; "flaky" executors, that don't fail every 
task, but are still
+ * faulty; etc.
+ *
+ * THREADING: As with most helpers of TaskSchedulerImpl, this is not 
thread-safe.  Though it is
+ * called by multiple threads, callers must already have a lock on the 
TaskSchedulerImpl.  The
+ * one exception is [[nodeBlacklist()]], which can be called without 
holding a lock.
+ */
+private[scheduler] class BlacklistTracker (
+conf: SparkConf,
+clock: Clock = new SystemClock()) extends Logging {
+
+  private val MAX_FAILURES_PER_EXEC = 
conf.get(config.MAX_FAILURES_PER_EXEC)
+  private val MAX_FAILED_EXEC_PER_NODE = 
conf.get(config.MAX_FAILED_EXEC_PER_NODE)
+  val EXECUTOR_RECOVERY_MILLIS = 
BlacklistTracker.getBlacklistExpiryTime(conf)
+
+  // a count of failed tasks for each executor.  Only counts failures 
after tasksets complete
+  // successfully
+  private val executorIdToFailureCount: HashMap[String, Int] = new 
HashMap()
+  private val executorIdToBlacklistStatus: HashMap[String, 
BlacklistedExecutor] = new HashMap()
+  private val nodeIdToBlacklistExpiryTime: HashMap[String, Long] = new 
HashMap()
+  private val _nodeBlacklist: AtomicReference[Set[String]] = new 
AtomicReference(Set())
+  private var nextExpiryTime: Long = Long.MaxValue
+  // for blacklisted executors, the node it is on.  We do *not* remove 
from this when executors are
+  // removed from spark, so we can track when we get multiple successive 
blacklisted executors on
+  // one node.
--- End diff --

Also, can you add a sentence here explaining why you don't expect this map to
grow without bound for a long-running app? (I assume it's because only a
bounded number of executors on each node can accumulate in this map before the
node as a whole is blacklisted?)


[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism

2016-07-27 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/14079#discussion_r72541285
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/BlacklistTracker.scala ---
@@ -0,0 +1,214 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler
+
+import java.util.concurrent.atomic.AtomicReference
+
+import scala.collection.mutable.{HashMap, HashSet}
+
+import org.apache.spark.SparkConf
+import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config
+import org.apache.spark.util.{Clock, SystemClock, Utils}
+
+/**
+ * BlacklistTracker is designed to track problematic executors and nodes.  
It supports blacklisting
+ * specific (executor, task) pairs within a stage, blacklisting entire 
executors and nodes for a
+ * stage, and blacklisting executors and nodes across an entire 
application (with a periodic
+ * expiry).
+ *
+ * The tracker needs to deal with a variety of workloads, eg.: bad user 
code, which may lead to many
+ * task failures, but that should not count against individual executors; 
many small stages, which
+ * may prevent a bad executor for having many failures within one stage, 
but still many failures
+ * over the entire application; "flaky" executors, that don't fail every 
task, but are still
+ * faulty; etc.
+ *
+ * THREADING: As with most helpers of TaskSchedulerImpl, this is not 
thread-safe.  Though it is
+ * called by multiple threads, callers must already have a lock on the 
TaskSchedulerImpl.  The
+ * one exception is [[nodeBlacklist()]], which can be called without 
holding a lock.
+ */
+private[scheduler] class BlacklistTracker (
+conf: SparkConf,
+clock: Clock = new SystemClock()) extends Logging {
+
+  private val MAX_FAILURES_PER_EXEC = 
conf.get(config.MAX_FAILURES_PER_EXEC)
+  private val MAX_FAILED_EXEC_PER_NODE = 
conf.get(config.MAX_FAILED_EXEC_PER_NODE)
+  val EXECUTOR_RECOVERY_MILLIS = 
BlacklistTracker.getBlacklistExpiryTime(conf)
+
+  // a count of failed tasks for each executor.  Only counts failures 
after tasksets complete
+  // successfully
+  private val executorIdToFailureCount: HashMap[String, Int] = new 
HashMap()
+  private val executorIdToBlacklistStatus: HashMap[String, 
BlacklistedExecutor] = new HashMap()
+  private val nodeIdToBlacklistExpiryTime: HashMap[String, Long] = new 
HashMap()
--- End diff --

maybe just nodeToBlacklistExpiryTime? I think the key here is a hostname, 
so ID is slightly misleading (since other IDs we use are 0/1/2/3/4 etc.)


[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism

2016-07-27 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/14079#discussion_r72541175
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/BlacklistTracker.scala ---
@@ -0,0 +1,214 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler
+
+import java.util.concurrent.atomic.AtomicReference
+
+import scala.collection.mutable.{HashMap, HashSet}
+
+import org.apache.spark.SparkConf
+import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config
+import org.apache.spark.util.{Clock, SystemClock, Utils}
+
+/**
+ * BlacklistTracker is designed to track problematic executors and nodes.  
It supports blacklisting
+ * specific (executor, task) pairs within a stage, blacklisting entire 
executors and nodes for a
+ * stage, and blacklisting executors and nodes across an entire 
application (with a periodic
+ * expiry).
+ *
+ * The tracker needs to deal with a variety of workloads, eg.: bad user 
code, which may lead to many
+ * task failures, but that should not count against individual executors; 
many small stages, which
+ * may prevent a bad executor for having many failures within one stage, 
but still many failures
+ * over the entire application; "flaky" executors, that don't fail every 
task, but are still
+ * faulty; etc.
+ *
+ * THREADING: As with most helpers of TaskSchedulerImpl, this is not 
thread-safe.  Though it is
+ * called by multiple threads, callers must already have a lock on the 
TaskSchedulerImpl.  The
+ * one exception is [[nodeBlacklist()]], which can be called without 
holding a lock.
+ */
+private[scheduler] class BlacklistTracker (
+conf: SparkConf,
+clock: Clock = new SystemClock()) extends Logging {
+
+  private val MAX_FAILURES_PER_EXEC = 
conf.get(config.MAX_FAILURES_PER_EXEC)
+  private val MAX_FAILED_EXEC_PER_NODE = 
conf.get(config.MAX_FAILED_EXEC_PER_NODE)
+  val EXECUTOR_RECOVERY_MILLIS = 
BlacklistTracker.getBlacklistExpiryTime(conf)
+
+  // a count of failed tasks for each executor.  Only counts failures 
after tasksets complete
+  // successfully
+  private val executorIdToFailureCount: HashMap[String, Int] = new 
HashMap()
+  private val executorIdToBlacklistStatus: HashMap[String, 
BlacklistedExecutor] = new HashMap()
+  private val nodeIdToBlacklistExpiryTime: HashMap[String, Long] = new 
HashMap()
+  private val _nodeBlacklist: AtomicReference[Set[String]] = new 
AtomicReference(Set())
--- End diff --

add a comment here saying that you keep a copy of the node blacklist in an
AtomicReference so that it can be read from a separate thread?
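
For example, a condensed sketch of that pattern (illustrative names, not the
patch's code): writers mutate under the TaskSchedulerImpl lock and publish an
immutable snapshot through the AtomicReference, so readers on other threads
never need the lock:

    import java.util.concurrent.atomic.AtomicReference
    import scala.collection.mutable

    class NodeBlacklistHolder {
      private val nodeExpiryTimes = mutable.HashMap.empty[String, Long]
      private val snapshot = new AtomicReference[Set[String]](Set.empty)

      // called only with the scheduler lock held
      def addNode(node: String, expiryTime: Long): Unit = {
        nodeExpiryTimes(node) = expiryTime
        snapshot.set(nodeExpiryTimes.keySet.toSet)  // publish an immutable copy
      }

      // safe to call from any thread, without the lock
      def nodeBlacklist(): Set[String] = snapshot.get()
    }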


[GitHub] spark issue #14383: [SPARK-15232][SQL] Add subquery SQL building tests to Lo...

2016-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14383
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62942/


[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism

2016-07-27 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/14079#discussion_r72541001
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/BlacklistTracker.scala ---
@@ -0,0 +1,214 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler
+
+import java.util.concurrent.atomic.AtomicReference
+
+import scala.collection.mutable.{HashMap, HashSet}
+
+import org.apache.spark.SparkConf
+import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config
+import org.apache.spark.util.{Clock, SystemClock, Utils}
+
+/**
+ * BlacklistTracker is designed to track problematic executors and nodes.  
It supports blacklisting
+ * specific (executor, task) pairs within a stage, blacklisting entire 
executors and nodes for a
--- End diff --

Also it looks like task-specific blacklisting actually isn't handled here? 
(it's in TSM?)


[GitHub] spark issue #14383: [SPARK-15232][SQL] Add subquery SQL building tests to Lo...

2016-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14383
  
Merged build finished. Test PASSed.


[GitHub] spark issue #14383: [SPARK-15232][SQL] Add subquery SQL building tests to Lo...

2016-07-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14383
  
**[Test build #62942 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62942/consoleFull)**
 for PR 14383 at commit 
[`d7d70e6`](https://github.com/apache/spark/commit/d7d70e67dbeaa456d8e0c308f0ddc13a9963b0cf).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism

2016-07-27 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/14079#discussion_r72540910
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/BlacklistTracker.scala ---
@@ -0,0 +1,214 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler
+
+import java.util.concurrent.atomic.AtomicReference
+
+import scala.collection.mutable.{HashMap, HashSet}
+
+import org.apache.spark.SparkConf
+import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config
+import org.apache.spark.util.{Clock, SystemClock, Utils}
+
+/**
+ * BlacklistTracker is designed to track problematic executors and nodes.  
It supports blacklisting
+ * specific (executor, task) pairs within a stage, blacklisting entire 
executors and nodes for a
+ * stage, and blacklisting executors and nodes across an entire 
application (with a periodic
+ * expiry).
+ *
+ * The tracker needs to deal with a variety of workloads, eg.: bad user 
code, which may lead to many
+ * task failures, but that should not count against individual executors; 
many small stages, which
+ * may prevent a bad executor for having many failures within one stage, 
but still many failures
+ * over the entire application; "flaky" executors, that don't fail every 
task, but are still
+ * faulty; etc.
+ *
+ * THREADING: As with most helpers of TaskSchedulerImpl, this is not 
thread-safe.  Though it is
+ * called by multiple threads, callers must already have a lock on the 
TaskSchedulerImpl.  The
+ * one exception is [[nodeBlacklist()]], which can be called without 
holding a lock.
+ */
+private[scheduler] class BlacklistTracker (
+conf: SparkConf,
+clock: Clock = new SystemClock()) extends Logging {
+
+  private val MAX_FAILURES_PER_EXEC = 
conf.get(config.MAX_FAILURES_PER_EXEC)
+  private val MAX_FAILED_EXEC_PER_NODE = 
conf.get(config.MAX_FAILED_EXEC_PER_NODE)
+  val EXECUTOR_RECOVERY_MILLIS = 
BlacklistTracker.getBlacklistExpiryTime(conf)
+
+  // a count of failed tasks for each executor.  Only counts failures 
after tasksets complete
+  // successfully
+  private val executorIdToFailureCount: HashMap[String, Int] = new 
HashMap()
+  private val executorIdToBlacklistStatus: HashMap[String, 
BlacklistedExecutor] = new HashMap()
+  private val nodeIdToBlacklistExpiryTime: HashMap[String, Long] = new 
HashMap()
+  private val _nodeBlacklist: AtomicReference[Set[String]] = new 
AtomicReference(Set())
+  private var nextExpiryTime: Long = Long.MaxValue
--- End diff --

docstring here? "Time when the next blacklist will expire. Used to avoid 
checking for expired blacklist entries when none will have expired."


[GitHub] spark issue #14383: [SPARK-15232][SQL] Add subquery SQL building tests to Lo...

2016-07-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14383
  
**[Test build #62941 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62941/consoleFull)**
 for PR 14383 at commit 
[`9a49580`](https://github.com/apache/spark/commit/9a49580549a3980364ed225b77a6e0342ef4d90c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism

2016-07-27 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/14079#discussion_r72539715
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -800,6 +832,78 @@ private[spark] class TaskSetManager(
 maybeFinishTaskSet()
   }
 
+  private[scheduler] def updateBlacklistForFailedTask(
+  host: String,
+  exec: String,
+  index: Int): Unit = {
+val failureStatus = execToFailures.getOrElseUpdate(exec, new 
FailureStatus(host))
+failureStatus.totalFailures += 1
+failureStatus.tasksWithFailures += index
+
+// check if this task has also failed on other executors on the same 
host, and if so, blacklist
+// this task from the host
+val execsWithFailuresOnNode = 
nodesToExecsWithFailures.getOrElseUpdate(host, new HashSet())
+execsWithFailuresOnNode += exec
+val failuresOnHost = (for {
+  exec <- execsWithFailuresOnNode.toIterator
+  failures <- execToFailures.get(exec)
+} yield {
+  if (failures.tasksWithFailures.contains(index)) 1 else 0
+}).sum
+if (failuresOnHost >= MAX_TASK_ATTEMPTS_PER_NODE) {
+  nodeBlacklistedTasks.getOrElseUpdate(host, new HashSet()) += index
+}
+
+if (failureStatus.totalFailures >= MAX_FAILURES_PER_EXEC_STAGE) {
--- End diff --

This logic seems to be repeated in at least one other place (where you 
first blacklist something on an executor, and then escalate the issue to the 
host). I wonder if there's a helper class that could be added that would handle 
this in both cases (i.e., for a specific task, and for a whole stage, and I 
think also for the app)? Maybe there's not enough shared logic?
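
Roughly something like this, sketched with hypothetical names and thresholds
(not code from the patch): it counts failures per executor and escalates to
the host once enough executors on that host have crossed the per-executor
threshold.

    import scala.collection.mutable

    class FailureEscalator(maxFailuresPerExec: Int, maxBadExecsPerNode: Int) {
      private val failuresByExec = mutable.HashMap.empty[String, Int].withDefaultValue(0)
      private val badExecsByNode = mutable.HashMap.empty[String, mutable.HashSet[String]]

      /** Records one failure; returns (executorNowBlacklisted, nodeNowBlacklisted). */
      def recordFailure(host: String, exec: String): (Boolean, Boolean) = {
        failuresByExec(exec) += 1
        if (failuresByExec(exec) >= maxFailuresPerExec) {
          // this executor crossed the threshold; see if the host should follow
          val bad = badExecsByNode.getOrElseUpdate(host, mutable.HashSet.empty)
          bad += exec
          (true, bad.size >= maxBadExecsPerNode)
        } else {
          (false, false)
        }
      }
    }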


[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism

2016-07-27 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/14079#discussion_r72540361
  
--- Diff: 
yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
@@ -217,18 +219,34 @@ private[yarn] class YarnAllocator(
* @param localityAwareTasks number of locality aware tasks to be used 
as container placement hint
* @param hostToLocalTaskCount a map of preferred hostname to possible 
task counts to be used as
* container placement hint.
+   * @param nodeBlacklist a set of blacklisted node to avoid allocating 
new container on them. It
+   *  will be used to update AM blacklist.
* @return Whether the new requested total is different than the old 
value.
*/
   def requestTotalExecutorsWithPreferredLocalities(
   requestedTotal: Int,
   localityAwareTasks: Int,
-  hostToLocalTaskCount: Map[String, Int]): Boolean = synchronized {
+  hostToLocalTaskCount: Map[String, Int],
+  nodeBlacklist: Set[String]): Boolean = synchronized {
 this.numLocalityAwareTasks = localityAwareTasks
 this.hostToLocalTaskCounts = hostToLocalTaskCount
 
 if (requestedTotal != targetNumExecutors) {
   logInfo(s"Driver requested a total number of $requestedTotal 
executor(s).")
   targetNumExecutors = requestedTotal
+
+  // Update blacklist infomation to YARN ResouceManager for this 
application,
+  // in order to avoid allocating new Container on the problematic 
nodes.
--- End diff --

Containers


[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism

2016-07-27 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/14079#discussion_r72540337
  
--- Diff: 
yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
@@ -217,18 +219,34 @@ private[yarn] class YarnAllocator(
* @param localityAwareTasks number of locality aware tasks to be used 
as container placement hint
* @param hostToLocalTaskCount a map of preferred hostname to possible 
task counts to be used as
* container placement hint.
+   * @param nodeBlacklist a set of blacklisted node to avoid allocating 
new container on them. It
--- End diff --

a set of blacklisted nodes, which is passed in to avoid allocating new 
containers on them.  It will be used to update the application manager (? right 
terminology?) blacklist.


[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism

2016-07-27 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/14079#discussion_r72540080
  
--- Diff: 
core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala ---
@@ -281,15 +323,216 @@ class TaskSchedulerImplSuite extends SparkFunSuite 
with LocalSparkContext with B
 assert(!failedTaskSet)
   }
 
+  test("scheduled tasks obey task and stage blacklists") {
+val blacklist = mock[BlacklistTracker]
+taskScheduler = setupSchedulerWithMockTsm(blacklist)
+(0 to 2).foreach { stageId =>
+  val taskSet = FakeTask.createTaskSet(numTasks = 2, stageId = 
stageId, stageAttemptId = 0)
+  taskScheduler.submitTasks(taskSet)
+}
+
+val offers = Seq(
+  new WorkerOffer("executor0", "host0", 1),
+  new WorkerOffer("executor1", "host1", 1),
+  new WorkerOffer("executor2", "host1", 1),
+  new WorkerOffer("executor3", "host2", 10)
+)
+
+// setup our mock blacklist:
+// stage 0 is blacklisted on node "host1"
+// stage 1 is blacklisted on executor "executor3"
+// stage 0, part 0 is blacklisted on executor 0
+// (later stubs take precedence over earlier ones)
+when(blacklist.isNodeBlacklisted(anyString())).thenReturn(false)
+when(blacklist.isExecutorBlacklisted(anyString())).thenReturn(false)
+// setup some defaults, then override them with particulars
+stageToMockTsm.values.foreach { tsm =>
+  when(tsm.isNodeBlacklistedForTaskSet(anyString())).thenReturn(false)
+  
when(tsm.isExecutorBlacklistedForTaskSet(anyString())).thenReturn(false)
+  when(tsm.isExecutorBlacklistedForTask(anyString(), 
anyInt())).thenReturn(false)
--- End diff --

Do you need the isExecutorBlacklistedForTask here (and below)? wondering 
since it looks like it's never called outside the TSM


[GitHub] spark issue #14384: [Spark-16443][SparkR] Alternating Least Squares (ALS) wr...

2016-07-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14384
  
**[Test build #62950 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62950/consoleFull)**
 for PR 14384 at commit 
[`962e12f`](https://github.com/apache/spark/commit/962e12fccc7969e9a009af9f49bf7f6b9268b08a).


[GitHub] spark issue #14383: [SPARK-15232][SQL] Add subquery SQL building tests to Lo...

2016-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14383
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62941/


[GitHub] spark issue #14383: [SPARK-15232][SQL] Add subquery SQL building tests to Lo...

2016-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14383
  
Merged build finished. Test PASSed.


[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism

2016-07-27 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/14079#discussion_r72539615
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -800,6 +832,78 @@ private[spark] class TaskSetManager(
 maybeFinishTaskSet()
   }
 
+  private[scheduler] def updateBlacklistForFailedTask(
+  host: String,
+  exec: String,
+  index: Int): Unit = {
+val failureStatus = execToFailures.getOrElseUpdate(exec, new 
FailureStatus(host))
+failureStatus.totalFailures += 1
+failureStatus.tasksWithFailures += index
+
+// check if this task has also failed on other executors on the same 
host, and if so, blacklist
+// this task from the host
+val execsWithFailuresOnNode = 
nodesToExecsWithFailures.getOrElseUpdate(host, new HashSet())
+execsWithFailuresOnNode += exec
+val failuresOnHost = (for {
+  exec <- execsWithFailuresOnNode.toIterator
+  failures <- execToFailures.get(exec)
+} yield {
+  if (failures.tasksWithFailures.contains(index)) 1 else 0
+}).sum
--- End diff --

I personally find Scala's `for` incredibly hard to read, so +1 for `map`.


[GitHub] spark issue #14384: [Spark-16443][SparkR] Alternating Least Squares (ALS) wr...

2016-07-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14384
  
**[Test build #62949 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62949/consoleFull)**
 for PR 14384 at commit 
[`3939ccc`](https://github.com/apache/spark/commit/3939ccce37cd899edf007d0ac3b9589748db0bc3).


[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism

2016-07-27 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/14079#discussion_r72539450
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/BlacklistTracker.scala ---
@@ -0,0 +1,214 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler
+
+import java.util.concurrent.atomic.AtomicReference
+
+import scala.collection.mutable.{HashMap, HashSet}
+
+import org.apache.spark.SparkConf
+import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config
+import org.apache.spark.util.{Clock, SystemClock, Utils}
+
+/**
+ * BlacklistTracker is designed to track problematic executors and nodes.  
It supports blacklisting
+ * specific (executor, task) pairs within a stage, blacklisting entire 
executors and nodes for a
+ * stage, and blacklisting executors and nodes across an entire 
application (with a periodic
+ * expiry).
+ *
+ * The tracker needs to deal with a variety of workloads, eg.: bad user 
code, which may lead to many
+ * task failures, but that should not count against individual executors; 
many small stages, which
+ * may prevent a bad executor for having many failures within one stage, 
but still many failures
+ * over the entire application; "flaky" executors, that don't fail every 
task, but are still
+ * faulty; etc.
+ *
+ * THREADING: As with most helpers of TaskSchedulerImpl, this is not 
thread-safe.  Though it is
+ * called by multiple threads, callers must already have a lock on the 
TaskSchedulerImpl.  The
+ * one exception is [[nodeBlacklist()]], which can be called without 
holding a lock.
+ */
+private[scheduler] class BlacklistTracker (
+conf: SparkConf,
+clock: Clock = new SystemClock()) extends Logging {
+
+  private val MAX_FAILURES_PER_EXEC = 
conf.get(config.MAX_FAILURES_PER_EXEC)
+  private val MAX_FAILED_EXEC_PER_NODE = 
conf.get(config.MAX_FAILED_EXEC_PER_NODE)
+  val EXECUTOR_RECOVERY_MILLIS = 
BlacklistTracker.getBlacklistExpiryTime(conf)
+
+  // a count of failed tasks for each executor.  Only counts failures 
after tasksets complete
+  // successfully
+  private val executorIdToFailureCount: HashMap[String, Int] = new 
HashMap()
+  private val executorIdToBlacklistStatus: HashMap[String, 
BlacklistedExecutor] = new HashMap()
+  private val nodeIdToBlacklistExpiryTime: HashMap[String, Long] = new 
HashMap()
+  private val _nodeBlacklist: AtomicReference[Set[String]] = new 
AtomicReference(Set())
+  private var nextExpiryTime: Long = Long.MaxValue
+  // for blacklisted executors, the node it is on.  We do *not* remove 
from this when executors are
+  // removed from spark, so we can track when we get multiple successive 
blacklisted executors on
+  // one node.
+  val nodeToFailedExecs: HashMap[String, HashSet[String]] = new HashMap()
+
+  def expireExecutorsInBlacklist(): Unit = {
+val now = clock.getTimeMillis()
+// quickly check if we've got anything to expire from blacklist -- if 
not, avoid doing any work
+if (now > nextExpiryTime) {
+  val execsToClear = 
executorIdToBlacklistStatus.filter(_._2.expiryTime < now).keys
+  if (execsToClear.nonEmpty) {
+logInfo(s"Removing executors $execsToClear from blacklist during 
periodic recovery")
+execsToClear.foreach { exec =>
+  val status = executorIdToBlacklistStatus.remove(exec).get
+  val failedExecsOnNode = nodeToFailedExecs(status.node)
+  failedExecsOnNode.remove(exec)
+  if (failedExecsOnNode.isEmpty) {
+nodeToFailedExecs.remove(status.node)
+  }
+}
+  }
+  if (executorIdToBlacklistStatus.nonEmpty) {
+nextExpiryTime = 
executorIdToBlacklistStatus.map{_._2.expiryTime}.min
+  } else {
+nextExpiryTime = Long.MaxValue
+  }
+  val nodesToClear = nodeIdToBlacklistExpiryTime.filter(_._2 < 

[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism

2016-07-27 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/14079#discussion_r72539148
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -800,6 +832,78 @@ private[spark] class TaskSetManager(
 maybeFinishTaskSet()
   }
 
+  private[scheduler] def updateBlacklistForFailedTask(
+  host: String,
+  exec: String,
+  index: Int): Unit = {
+val failureStatus = execToFailures.getOrElseUpdate(exec, new 
FailureStatus(host))
+failureStatus.totalFailures += 1
+failureStatus.tasksWithFailures += index
+
+// check if this task has also failed on other executors on the same 
host, and if so, blacklist
+// this task from the host
+val execsWithFailuresOnNode = 
nodesToExecsWithFailures.getOrElseUpdate(host, new HashSet())
+execsWithFailuresOnNode += exec
+val failuresOnHost = (for {
+  exec <- execsWithFailuresOnNode.toIterator
+  failures <- execToFailures.get(exec)
+} yield {
+  if (failures.tasksWithFailures.contains(index)) 1 else 0
+}).sum
--- End diff --

I always prefer map to yield (but maybe I'm in the minority, in which case
leave it?). Something like:

execsWithFailuresOnNode.toIterator.map { exec =>
  // count this executor only if the task at `index` has already failed on it
  if (execToFailures.get(exec).exists(_.tasksWithFailures.contains(index))) 1 else 0
}.sum


[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism

2016-07-27 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/14079#discussion_r72538818
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -611,15 +620,31 @@ private[spark] class TaskSetManager(
 
 // If no executors have registered yet, don't abort the stage, just 
wait.  We probably
 // got here because a task set was added before the executors 
registered.
-if (executors.nonEmpty) {
+if (executorsByHost.nonEmpty) {
   // take any task that needs to be scheduled, and see if we can find 
some executor it *could*
   // run on
-  pendingTask.foreach { taskId =>
-if (executors.forall(executorIsBlacklisted(_, taskId))) {
-  val execs = executors.toIndexedSeq.sorted.mkString("(", ",", ")")
-  val partition = tasks(taskId).partitionId
-  abort(s"Aborting ${taskSet} because task $taskId (partition 
$partition)" +
-s" has already failed on executors $execs, and no other 
executors are available.")
+  pendingTask.foreach { indexInTaskSet =>
+// try to find some executor this task can run on.  Its possible 
that some *other*
+// task isn't schedulable anywhere, but we will discover that in 
some later call,
+// when that unschedulable task is the last task remaining.
+val blacklistedEverywhere = executorsByHost.forall { case (host, 
execs) =>
+  val nodeBlacklisted = blacklist.isNodeBlacklisted(host) ||
+isNodeBlacklistedForTaskSet(host) ||
+isNodeBlacklistedForTask(host, indexInTaskSet)
+  if (nodeBlacklisted) {
+true
+  } else {
+execs.forall { exec =>
+  blacklist.isExecutorBlacklisted(exec) ||
+isExecutorBlacklistedForTaskSet(exec) ||
+isExecutorBlacklistedForTask(exec, indexInTaskSet)
+}
+  }
+}
+if (blacklistedEverywhere) {
+  val partition = tasks(indexInTaskSet).partitionId
+  abort(s"Aborting ${taskSet} because task $indexInTaskSet 
(partition $partition) cannot " +
+s"run anywhere due to node and executor blacklist.")
--- End diff --

Maybe refer to the config options here? Anticipating confused users...
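
e.g. something along these lines (the spark.blacklist.* namespace is my guess
at the naming, not something settled by this patch):

    abort(s"Aborting ${taskSet} because task $indexInTaskSet (partition $partition) cannot " +
      s"run anywhere due to node and executor blacklist. Blacklisting behavior can be " +
      s"configured via the spark.blacklist.* options.")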


[GitHub] spark pull request #14384: [Spark-16443][SparkR] Alternating Least Squares (...

2016-07-27 Thread junyangq
GitHub user junyangq opened a pull request:

https://github.com/apache/spark/pull/14384

[Spark-16443][SparkR] Alternating Least Squares (ALS) wrapper

## What changes were proposed in this pull request?

Add Alternating Least Squares wrapper in SparkR. Unit tests have been 
updated.

## How was this patch tested?

SparkR unit tests.

![screen shot 2016-07-27 at 3 50 46 
pm](https://cloud.githubusercontent.com/assets/15318264/17195348/f7a7d452-5411-11e6-845f-6d292283bc28.png)
![screen shot 2016-07-27 at 3 50 31 
pm](https://cloud.githubusercontent.com/assets/15318264/17195347/f7a6352a-5411-11e6-8e21-61a48070192a.png)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/junyangq/spark SPARK-16443

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14384.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14384


commit ecf918546d3d25b60ece78cad42ec41c7c188f3d
Author: Junyang Qian 
Date:   2016-07-08T21:56:31Z

ALS main added to mllib.R

commit 766b55b464263879d90a0cce6f0d3215f3dba1e3
Author: Junyang Qian 
Date:   2016-07-11T20:04:09Z

minimal wrapper

commit e4cd463486577d544ec096fe59a603b33374a11c
Author: Junyang Qian 
Date:   2016-07-21T22:03:25Z

clean comments

commit d0dfe7d21e99b8440de255a6a55f34971d08163f
Author: Junyang Qian 
Date:   2016-07-25T18:31:40Z

first set

commit 7d0c139e427f8f6fef4ed8f6380c966ad998da39
Author: Junyang Qian 
Date:   2016-07-25T18:56:22Z

add spark.als to namespace

commit 243defa8c5a95923cdfab4d306624150d01b6121
Author: Junyang Qian 
Date:   2016-07-27T04:51:27Z

ALS wrapper with summary, predict and write.ml

commit a3cca04c54596ea466cd98edc4ee5e4611f207df
Author: Junyang Qian 
Date:   2016-07-27T17:05:42Z

fix typo in reading model

commit d785b2a93e4e68661b661e08641bbb01dee8875f
Author: Junyang Qian 
Date:   2016-07-27T21:10:52Z

allow more arguments to als

commit 3939ccce37cd899edf007d0ac3b9589748db0bc3
Author: Junyang Qian 
Date:   2016-07-27T22:15:00Z

fix type issue in mllib.R




[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism

2016-07-27 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/14079#discussion_r72538697
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -770,9 +794,19 @@ private[spark] class TaskSetManager(
 logError("Unknown TaskEndReason: " + e)
 None
 }
-// always add to failed executors
-failedExecutors.getOrElseUpdate(index, new HashMap[String, Long]()).
-  put(info.executorId, clock.getTimeMillis())
+
+// we might rack up a bunch of fetch-failures in rapid succession, due 
to a bad node.  But
+// that bad node will get handled separately by spark's stage-failure 
handling mechanism.  It
+// shouldn't penalize *this* executor at all, so don't count it as a 
task-failure as far as
+// the blacklist is concerned.
+val countTowardsTaskFailures = reason match {
+  case fail: TaskFailedReason => fail.countTowardsTaskFailures
--- End diff --

the code on the old line 710 makes it seem like we always expect this to be a
TaskFailedReason (so should the param type just be changed?)


[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism

2016-07-27 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/14079#discussion_r72538489
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -677,8 +702,9 @@ private[spark] class TaskSetManager(
 }
 if (!successful(index)) {
   tasksSuccessful += 1
-  logInfo("Finished task %s in stage %s (TID %d) in %d ms on %s 
(%d/%d)".format(
-info.id, taskSet.id, info.taskId, info.duration, info.host, 
tasksSuccessful, numTasks))
+  logInfo("Finished task %s in stage %s (TID %d) in %d ms on %s / exec 
%s (%d/%d)".format(
--- End diff --

maybe "(executor %s)" (favoring parens / verbosity since this is a message 
that does commonly pop up to users in the console


[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism

2016-07-27 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/14079#discussion_r72538532
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -706,16 +731,15 @@ private[spark] class TaskSetManager(
 val index = info.index
 copiesRunning(index) -= 1
 var accumUpdates: Seq[AccumulatorV2[_, _]] = Seq.empty
-val failureReason = s"Lost task ${info.id} in stage ${taskSet.id} (TID 
$tid, ${info.host}): " +
-  reason.asInstanceOf[TaskFailedReason].toErrorString
+val failureReason = s"Lost task ${info.id} in stage ${taskSet.id} (TID 
$tid, ${info.host}," +
+  s" exec ${info.executorId}): 
${reason.asInstanceOf[TaskFailedReason].toErrorString}"
--- End diff --

executor again (same thought as above)


[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism

2016-07-27 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/14079#discussion_r72538331
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -611,15 +620,31 @@ private[spark] class TaskSetManager(
 
 // If no executors have registered yet, don't abort the stage, just 
wait.  We probably
 // got here because a task set was added before the executors 
registered.
-if (executors.nonEmpty) {
+if (executorsByHost.nonEmpty) {
   // take any task that needs to be scheduled, and see if we can find 
some executor it *could*
   // run on
-  pendingTask.foreach { taskId =>
-if (executors.forall(executorIsBlacklisted(_, taskId))) {
-  val execs = executors.toIndexedSeq.sorted.mkString("(", ",", ")")
-  val partition = tasks(taskId).partitionId
-  abort(s"Aborting ${taskSet} because task $taskId (partition 
$partition)" +
-s" has already failed on executors $execs, and no other 
executors are available.")
+  pendingTask.foreach { indexInTaskSet =>
+// try to find some executor this task can run on.  Its possible 
that some *other*
+// task isn't schedulable anywhere, but we will discover that in 
some later call,
+// when that unschedulable task is the last task remaining.
+val blacklistedEverywhere = executorsByHost.forall { case (host, 
execs) =>
+  val nodeBlacklisted = blacklist.isNodeBlacklisted(host) ||
+isNodeBlacklistedForTaskSet(host) ||
+isNodeBlacklistedForTask(host, indexInTaskSet)
+  if (nodeBlacklisted) {
+true
+  } else {
+execs.forall { exec =>
--- End diff --

and "Check if the task can run on any of the executors"


[GitHub] spark issue #14355: [SPARK-16726][SQL] Improve `Union/Intersect/Except` erro...

2016-07-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14355
  
**[Test build #62948 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62948/consoleFull)**
 for PR 14355 at commit 
[`71e8ec9`](https://github.com/apache/spark/commit/71e8ec90413a58c25bd7b5c612a9821a797652b3).


[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism

2016-07-27 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/14079#discussion_r72538301
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -611,15 +620,31 @@ private[spark] class TaskSetManager(
 
 // If no executors have registered yet, don't abort the stage, just 
wait.  We probably
 // got here because a task set was added before the executors 
registered.
-if (executors.nonEmpty) {
+if (executorsByHost.nonEmpty) {
   // take any task that needs to be scheduled, and see if we can find 
some executor it *could*
   // run on
-  pendingTask.foreach { taskId =>
-if (executors.forall(executorIsBlacklisted(_, taskId))) {
-  val execs = executors.toIndexedSeq.sorted.mkString("(", ",", ")")
-  val partition = tasks(taskId).partitionId
-  abort(s"Aborting ${taskSet} because task $taskId (partition 
$partition)" +
-s" has already failed on executors $execs, and no other 
executors are available.")
+  pendingTask.foreach { indexInTaskSet =>
+// try to find some executor this task can run on.  Its possible 
that some *other*
+// task isn't schedulable anywhere, but we will discover that in 
some later call,
+// when that unschedulable task is the last task remaining.
+val blacklistedEverywhere = executorsByHost.forall { case (host, 
execs) =>
+  val nodeBlacklisted = blacklist.isNodeBlacklisted(host) ||
--- End diff --

Add a comment above here with something like "Check if the task can run on 
the node"





[GitHub] spark issue #14275: [SPARK-16637] Unified containerizer

2016-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14275
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62939/
Test PASSed.





[GitHub] spark issue #14275: [SPARK-16637] Unified containerizer

2016-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14275
  
Merged build finished. Test PASSed.





[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism

2016-07-27 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/14079#discussion_r72538143
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -611,15 +620,31 @@ private[spark] class TaskSetManager(
 
 // If no executors have registered yet, don't abort the stage, just wait.  We probably
 // got here because a task set was added before the executors registered.
-if (executors.nonEmpty) {
+if (executorsByHost.nonEmpty) {
   // take any task that needs to be scheduled, and see if we can find some executor it *could*
--- End diff --

Can you move this comment up to the beginning of the method (above where 
pendingTask is defined)





[GitHub] spark issue #14275: [SPARK-16637] Unified containerizer

2016-07-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14275
  
**[Test build #62939 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62939/consoleFull)** for PR 14275 at commit [`ec6707e`](https://github.com/apache/spark/commit/ec6707e35784113896063f5b5aec042e2e54ecb1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14355: [SPARK-16726][SQL] Improve `Union/Intersect/Except` erro...

2016-07-27 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/14355
  
Hi, @hvanhovell . 
The logic has been refactored into a function now.





[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism

2016-07-27 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/14079#discussion_r72537531
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -266,19 +278,11 @@ private[spark] class TaskSetManager(
 taskAttempts(taskIndex).exists(_.host == host)
   }
 
-  /**
-   * Is this re-execution of a failed task on an executor it already failed in before
-   * EXECUTOR_TASK_BLACKLIST_TIMEOUT has elapsed ?
-   */
-  private[scheduler] def executorIsBlacklisted(execId: String, taskId: Int): Boolean = {
-if (failedExecutors.contains(taskId)) {
-  val failed = failedExecutors.get(taskId).get
-
-  return failed.contains(execId) &&
-clock.getTimeMillis() - failed.get(execId).get < EXECUTOR_TASK_BLACKLIST_TIMEOUT
-}
-
-false
+  private def blacklistedOnExec(execId: String, host: String, index: Int): Boolean = {
+blacklistTracker.map { bl =>
--- End diff --

You can use `_` since `bl` is never used.
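
A minimal sketch of that suggestion, using a toy `Option[String]` in place of the real `Option[BlacklistTracker]`:

```scala
object UnusedParamSketch {
  val blacklistTracker: Option[String] = Some("tracker") // toy stand-in

  // Before: binds a name even though the body never reads it.
  val before: Option[Boolean] = blacklistTracker.map { bl => false }

  // After: `_` signals that the bound value is deliberately ignored.
  val after: Option[Boolean] = blacklistTracker.map { _ => false }

  def main(args: Array[String]): Unit = println(before == after) // prints: true
}
```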





[GitHub] spark issue #14266: [SPARK-16526][SQL] Benchmarking Performance for Fast Has...

2016-07-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14266
  
**[Test build #62947 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62947/consoleFull)** for PR 14266 at commit [`4944b29`](https://github.com/apache/spark/commit/4944b29558ba8f3563f1c0c3d1a485b29dcdc39b).





[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism

2016-07-27 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/14079#discussion_r72537361
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -266,19 +278,11 @@ private[spark] class TaskSetManager(
 taskAttempts(taskIndex).exists(_.host == host)
   }
 
-  /**
-   * Is this re-execution of a failed task on an executor it already failed in before
-   * EXECUTOR_TASK_BLACKLIST_TIMEOUT has elapsed ?
-   */
-  private[scheduler] def executorIsBlacklisted(execId: String, taskId: Int): Boolean = {
-if (failedExecutors.contains(taskId)) {
-  val failed = failedExecutors.get(taskId).get
-
-  return failed.contains(execId) &&
-clock.getTimeMillis() - failed.get(execId).get < EXECUTOR_TASK_BLACKLIST_TIMEOUT
-}
-
-false
+  private def blacklistedOnExec(execId: String, host: String, index: Int): Boolean = {
--- End diff --

Can you call this `taskIsBlacklistedOnExec`? (to make it obvious this is task-specific)





[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism

2016-07-27 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/14079#discussion_r72537255
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -83,8 +85,15 @@ private[spark] class TaskSetManager(
   val copiesRunning = new Array[Int](numTasks)
   val successful = new Array[Boolean](numTasks)
   private val numFailures = new Array[Int](numTasks)
-  // key is taskId (aka TaskInfo.index), value is a Map of executor id to when it failed
-  private val failedExecutors = new HashMap[Int, HashMap[String, Long]]()
+  val execToFailures: HashMap[String, FailureStatus] = new HashMap()
+  /**
+   * Map from node to all executors on it with failures.  Needed because we want to know about
+   * executors on a node even after they have died.
+   */
+  private val nodesToExecsWithFailures: HashMap[String, HashSet[String]] = new HashMap()
+  private val nodeBlacklistedTasks: HashMap[String, HashSet[Int]] = new HashMap()
--- End diff --

nodeToBlacklistedTasks





[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism

2016-07-27 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/14079#discussion_r72537224
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -83,8 +85,15 @@ private[spark] class TaskSetManager(
   val copiesRunning = new Array[Int](numTasks)
   val successful = new Array[Boolean](numTasks)
   private val numFailures = new Array[Int](numTasks)
-  // key is taskId (aka TaskInfo.index), value is a Map of executor id to when it failed
-  private val failedExecutors = new HashMap[Int, HashMap[String, Long]]()
+  val execToFailures: HashMap[String, FailureStatus] = new HashMap()
+  /**
+   * Map from node to all executors on it with failures.  Needed because we want to know about
+   * executors on a node even after they have died.
+   */
+  private val nodesToExecsWithFailures: HashMap[String, HashSet[String]] = new HashMap()
--- End diff --

`nodeToExecsWithFailures`? (to make the pluralization consistent with `nodeToBlacklistedTasks`)





[GitHub] spark pull request #14355: [SPARK-16726][SQL] Improve `Union/Intersect/Excep...

2016-07-27 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14355#discussion_r72536913
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
@@ -280,6 +267,33 @@ trait CheckAnalysis extends PredicateHelper {
   case p if p.expressions.exists(PredicateSubquery.hasPredicateSubquery) =>
 failAnalysis(s"Predicate sub-queries can only be used in a Filter: $p")
 
+  case _: Union | _: SetOperation if operator.children.length > 1 =>
+def dataTypes(plan: LogicalPlan): Seq[DataType] = plan.output.map(_.dataType)
+val ref = dataTypes(operator.children.head)
+operator.children.tail.zipWithIndex.foreach { case (child, ti) =>
+  // Check the number of columns
+  if (child.output.length != ref.length) {
+failAnalysis(
+  s"""
+|${operator.nodeName} can only be performed on tables with the same number
+|of columns, but the first table has ${ref.length} columns and
+|the ${if (ti == 0) "second" else (ti + 2) + "th"} table has
--- End diff --

Oh, sure. I'll fix it soon.
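
For reference only, a sketch of an ordinal formatter that avoids emitting "3th" in the message above; `ordinal` is a hypothetical helper name, not necessarily what the fix will use:

```scala
object OrdinalSketch {
  def ordinal(n: Int): String = n match {
    case 1 => "first"
    case 2 => "second"
    case 3 => "third"
    case i if i % 100 / 10 == 1 => s"${i}th" // 11th, 12th, 13th, 111th, ...
    case i if i % 10 == 1 => s"${i}st"
    case i if i % 10 == 2 => s"${i}nd"
    case i if i % 10 == 3 => s"${i}rd"
    case i => s"${i}th"
  }

  def main(args: Array[String]): Unit = {
    // ti is the zero-based index into children.tail, so the table number is ti + 2.
    Seq(0, 1, 2, 19, 21).foreach(ti => print(ordinal(ti + 2) + " "))
    // prints: second third 4th 21st 23rd
  }
}
```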





[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism

2016-07-27 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/14079#discussion_r72536830
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -552,12 +614,17 @@ private[spark] class TaskSchedulerImpl(
   executorIdToHost -= executorId
   rootPool.executorLost(executorId, host, reason)
 }
+blacklistTracker.foreach(_.removeExecutor(executorId))
   }
 
   def executorAdded(execId: String, host: String) {
 dagScheduler.executorAdded(execId, host)
   }
 
+  def getHostForExecutor(execId: String): String = synchronized {
--- End diff --

I think this is no longer used / can be removed?





[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism

2016-07-27 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/14079#discussion_r72536726
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/BlacklistTracker.scala ---
@@ -0,0 +1,214 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler
+
+import java.util.concurrent.atomic.AtomicReference
+
+import scala.collection.mutable.{HashMap, HashSet}
+
+import org.apache.spark.SparkConf
+import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config
+import org.apache.spark.util.{Clock, SystemClock, Utils}
+
+/**
+ * BlacklistTracker is designed to track problematic executors and nodes.  It supports blacklisting
+ * specific (executor, task) pairs within a stage, blacklisting entire executors and nodes for a
+ * stage, and blacklisting executors and nodes across an entire application (with a periodic
+ * expiry).
+ *
+ * The tracker needs to deal with a variety of workloads, eg.: bad user code, which may lead to many
+ * task failures, but that should not count against individual executors; many small stages, which
+ * may prevent a bad executor for having many failures within one stage, but still many failures
+ * over the entire application; "flaky" executors, that don't fail every task, but are still
+ * faulty; etc.
+ *
+ * THREADING: As with most helpers of TaskSchedulerImpl, this is not thread-safe.  Though it is
+ * called by multiple threads, callers must already have a lock on the TaskSchedulerImpl.  The
+ * one exception is [[nodeBlacklist()]], which can be called without holding a lock.
+ */
+private[scheduler] class BlacklistTracker (
+conf: SparkConf,
+clock: Clock = new SystemClock()) extends Logging {
+
+  private val MAX_FAILURES_PER_EXEC = conf.get(config.MAX_FAILURES_PER_EXEC)
+  private val MAX_FAILED_EXEC_PER_NODE = conf.get(config.MAX_FAILED_EXEC_PER_NODE)
+  val EXECUTOR_RECOVERY_MILLIS = BlacklistTracker.getBlacklistExpiryTime(conf)
+
+  // a count of failed tasks for each executor.  Only counts failures after tasksets complete
+  // successfully
+  private val executorIdToFailureCount: HashMap[String, Int] = new HashMap()
+  private val executorIdToBlacklistStatus: HashMap[String, BlacklistedExecutor] = new HashMap()
+  private val nodeIdToBlacklistExpiryTime: HashMap[String, Long] = new HashMap()
+  private val _nodeBlacklist: AtomicReference[Set[String]] = new AtomicReference(Set())
+  private var nextExpiryTime: Long = Long.MaxValue
+  // for blacklisted executors, the node it is on.  We do *not* remove from this when executors are
+  // removed from spark, so we can track when we get multiple successive blacklisted executors on
+  // one node.
+  val nodeToFailedExecs: HashMap[String, HashSet[String]] = new HashMap()
+
+  def expireExecutorsInBlacklist(): Unit = {
+val now = clock.getTimeMillis()
+// quickly check if we've got anything to expire from blacklist -- if not, avoid doing any work
+if (now > nextExpiryTime) {
+  val execsToClear = executorIdToBlacklistStatus.filter(_._2.expiryTime < now).keys
+  if (execsToClear.nonEmpty) {
+logInfo(s"Removing executors $execsToClear from blacklist during periodic recovery")
+execsToClear.foreach { exec =>
+  val status = executorIdToBlacklistStatus.remove(exec).get
+  val failedExecsOnNode = nodeToFailedExecs(status.node)
+  failedExecsOnNode.remove(exec)
+  if (failedExecsOnNode.isEmpty) {
+nodeToFailedExecs.remove(status.node)
+  }
+}
+  }
+  if (executorIdToBlacklistStatus.nonEmpty) {
+nextExpiryTime = executorIdToBlacklistStatus.map{_._2.expiryTime}.min
+  } else {
+nextExpiryTime = Long.MaxValue
+  }
+  val nodesToClear = nodeIdToBlacklistExpiryTime.filter(_._2 <

[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-27 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/14132
  
Hi, @cloud-fan , @yhuai , @hvanhovell , and @rxin .

The remaining open comment is about the matching algorithm. While digging through Hive's code, I came to think that logic is useless here, because it is dead code that Hive itself no longer uses.

- Case 1: I think we agree to use DFS to match the first `t1` in the following query (a toy sketch of this DFS matching appears after this comment).
 ```
select /*+ MAPJOIN(t1) */ * from t1 JOIN t1 JOIN t1 JOIN t1
== Optimized Logical Plan ==
Join Inner
:- Join Inner
:  :- Join Inner
:  :  :- BroadcastHint
:  :  :  +- Range (0, 10, splits=8)
:  :  +- Range (0, 10, splits=8)
:  +- Range (0, 10, splits=8)
+- Range (0, 10, splits=8)
```

- Case 2: What should we do for the following query? (The plan below shows the current behavior.)
 ```
SELECT /*+ MAPJOIN(t1) */ * FROM (SELECT * FROM t1) x JOIN t1
== Optimized Logical Plan ==
Join Inner
:- BroadcastHint
:  +- Range (0, 10, splits=8)
+- Range (0, 10, splits=8)
```

Until now, I have considered the current behavior intuitive. But if there is another preferred direction for case 2, I will try to find a way that satisfies both cases. How should I proceed with this PR?
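
To make the DFS of case 1 concrete, here is a self-contained toy sketch over simplified plan nodes; the classes below are illustrative stand-ins, not Catalyst's actual API:

```scala
object HintMatchSketch {
  // Simplified plan nodes; purely illustrative, not Catalyst's classes.
  sealed trait Plan
  case class Relation(name: String) extends Plan
  case class Join(left: Plan, right: Plan) extends Plan
  case class BroadcastHint(child: Plan) extends Plan

  // Depth-first search: wrap only the *first* relation whose name matches
  // the hint, mirroring the behavior described in case 1.
  def applyHint(plan: Plan, table: String): (Plan, Boolean) = plan match {
    case r @ Relation(`table`) => (BroadcastHint(r), true)
    case Join(l, r) =>
      val (newL, hitL) = applyHint(l, table)
      if (hitL) (Join(newL, r), true)
      else {
        val (newR, hitR) = applyHint(r, table)
        (Join(l, newR), hitR)
      }
    case other => (other, false)
  }

  def main(args: Array[String]): Unit = {
    // t1 JOIN t1 JOIN t1 parses left-deep, so DFS hits the bottom-left leaf.
    val plan = Join(Join(Relation("t1"), Relation("t1")), Relation("t1"))
    println(applyHint(plan, "t1")._1)
    // prints: Join(Join(BroadcastHint(Relation(t1)),Relation(t1)),Relation(t1))
  }
}
```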





[GitHub] spark issue #14176: [SPARK-16525][SQL] Enable Row Based HashMap in HashAggre...

2016-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14176
  
Merged build finished. Test FAILed.





[GitHub] spark issue #14176: [SPARK-16525][SQL] Enable Row Based HashMap in HashAggre...

2016-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14176
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62944/
Test FAILed.





[GitHub] spark issue #14176: [SPARK-16525][SQL] Enable Row Based HashMap in HashAggre...

2016-07-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14176
  
**[Test build #62944 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62944/consoleFull)** for PR 14176 at commit [`7194394`](https://github.com/apache/spark/commit/71943941ebe548e0a2c66d633893b7e2196b94a6).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.




