[GitHub] spark issue #21083: [SPARK-21479][SPARK-23564][SQL] infer additional filters...
Github user maryannxue commented on the issue: https://github.com/apache/spark/pull/21083 cc @mgaido91 @maryannxue @KaiXinXiaoLei @gatorsmile @jiangxb1987 @gengliangwang: I do not think this is the right way to do things, @cloud-fan. Since you have been aware of my and others' work, like https://github.com/apache/spark/pull/20816, you could have, or I'd say should have, given your input/suggestions on the related PRs. People who have worked on this deserve more credit than being mentioned as "inspired" here. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21080: [SPARK-23986][SQL] freshName can generate non-uni...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21080#discussion_r181966326

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -575,7 +575,7 @@ class CodegenContext {
     if (freshNameIds.contains(fullName)) {
       val id = freshNameIds(fullName)
       freshNameIds(fullName) = id + 1
-      s"$fullName$id"
+      s"${fullName}_$id"
     } else {
       freshNameIds += fullName -> 1
       fullName
--- End diff --

@viirya my comment was for the current code without your change.
[GitHub] spark issue #21083: [SPARK-21479][SPARK-23564][SQL] infer additional filters...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21083 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2373/ Test PASSed.
[GitHub] spark issue #21083: [SPARK-21479][SPARK-23564][SQL] infer additional filters...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21083 Merged build finished. Test PASSed.
[GitHub] spark issue #21083: [SPARK-21479][SPARK-23564][SQL] infer additional filters...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21083 **[Test build #89432 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89432/testReport)** for PR 21083 at commit [`371c1df`](https://github.com/apache/spark/commit/371c1df8aed72ed42f74c8c4cdbc6c4c1791fcdf).
[GitHub] spark issue #21083: [SPARK-21479][SPARK-23564][SQL] infer additional filters...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21083 **[Test build #89431 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89431/testReport)** for PR 21083 at commit [`b967955`](https://github.com/apache/spark/commit/b967955ec2c7d33f28845dd55a1a9b70c5c2ba03).
[GitHub] spark issue #21083: [SPARK-21479][SPARK-23564][SQL] infer additional filters...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21083 cc @mgaido91 @maryannxue @KaiXinXiaoLei @gatorsmile @jiangxb1987 @gengliangwang
[GitHub] spark issue #20984: [SPARK-23875][SQL] Add IndexedSeq wrapper for ArrayData
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20984 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89425/ Test PASSed.
[GitHub] spark pull request #21083: [SPARK-21479][SPARK-23564][SQL] infer additional ...
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/21083 [SPARK-21479][SPARK-23564][SQL] infer additional filters from constraints for join's children

## What changes were proposed in this pull request?

The existing query constraints framework has two steps:
1. propagate constraints bottom up.
2. use constraints to infer additional filters for better data pruning.

Step 2 mostly helps with Join, because we can connect the constraints from the children to the join condition and infer powerful filters to prune the data on the join sides. E.g., if the left side has the constraint `a = 1` and the join condition is `left.a = right.a`, then we can infer `right.a = 1` for the right side and prune it a lot.

However, the current logic of inferring filters from constraints for Join is pretty weak: it infers the filters from the Join's own constraints, and some joins, like left semi/anti, exclude output from the right side, so the right-side constraints are lost there.

This PR proposes to check the left and right constraints individually, expand the constraints with the join condition, and add filters to the children of the join directly, instead of adding them to the join condition.

This reverts https://github.com/apache/spark/pull/20670 , covers https://github.com/apache/spark/pull/20717 and https://github.com/apache/spark/pull/20816

This is inspired by the original PRs and the tests are all from these PRs. Thanks to the authors @mgaido91 @maryannxue @KaiXinXiaoLei !

## How was this patch tested?
new tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark join

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21083.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21083

commit 2977a5e037eb862a530a777e349f328ffbda39bb
Author: Wenchen Fan
Date: 2018-04-16T16:15:04Z

    Revert "[SPARK-23405] Generate additional constraints for Join's children"

    This reverts commit cdcccd7b41c43d79edff2fec7a84cd00e9524f75.

commit b967955ec2c7d33f28845dd55a1a9b70c5c2ba03
Author: Wenchen Fan
Date: 2018-04-16T19:39:50Z

    fix join filter inference from constraints
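The inference described in the PR description above can be sketched with a small hypothetical model. This is NOT Spark's Catalyst code: constraints are modeled here as (column, literal) equality pairs and the equi-join condition as pairs of equated left/right columns, purely for illustration.

```python
def infer_filters(left_constraints, right_constraints, join_equalities):
    """Expand each side's constraints through the equi-join condition and
    return the extra filters inferred for (left, right).

    left_constraints / right_constraints: sets of (column, literal) pairs,
    e.g. {("a", 1)} meaning a = 1.
    join_equalities: pairs (left_col, right_col), e.g. [("a", "a")]
    for left.a = right.a.
    """
    new_left, new_right = set(), set()
    for lcol, rcol in join_equalities:
        for col, lit in left_constraints:
            if col == lcol:
                # left.a = 1 and left.a = right.a  =>  right.a = 1
                new_right.add((rcol, lit))
        for col, lit in right_constraints:
            if col == rcol:
                new_left.add((lcol, lit))
    # Keep only filters that are genuinely new for each side.
    return new_left - left_constraints, new_right - right_constraints

# The example from the description: left side has a = 1,
# join condition is left.a = right.a.
print(infer_filters({("a", 1)}, set(), [("a", "a")]))
```

With that input the sketch infers `("a", 1)` as a new filter for the right side and nothing new for the left, mirroring the `right.a = 1` example in the description.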
[GitHub] spark issue #20984: [SPARK-23875][SQL] Add IndexedSeq wrapper for ArrayData
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20984 Merged build finished. Test PASSed.
[GitHub] spark issue #20984: [SPARK-23875][SQL] Add IndexedSeq wrapper for ArrayData
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20984 **[Test build #89425 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89425/testReport)** for PR 20984 at commit [`77640c4`](https://github.com/apache/spark/commit/77640c449af4f3ba0d7e5a231c24f34f090314db). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #21036: [SPARK-23958][CORE] HadoopRdd filters empty files...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/21036#discussion_r181963024

--- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -323,7 +323,7 @@ package object config {
       .internal()
       .doc("When true, HadoopRDD/NewHadoopRDD will not create partitions for empty input splits.")
       .booleanConf
-      .createWithDefault(false)
+      .createWithDefault(true)
--- End diff --

This seems to silently change the behavior. I would suggest not doing it.
[GitHub] spark issue #21036: [SPARK-23958][CORE] HadoopRdd filters empty files to avo...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21036 Nope, it's a radical change that affects many integrations. I wouldn't enable it by default for now. Also, this is a non-critical path; it's fine to loop twice if it's more readable.
[GitHub] spark issue #21029: [SPARK-23952] remove type parameter in DataReaderFactory
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21029 **[Test build #89430 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89430/testReport)** for PR 21029 at commit [`36031f2`](https://github.com/apache/spark/commit/36031f2bc803535ba49e058951aef3faea30380d). * This patch **fails to generate documentation**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21029: [SPARK-23952] remove type parameter in DataReaderFactory
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21029 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89430/ Test FAILed.
[GitHub] spark issue #21029: [SPARK-23952] remove type parameter in DataReaderFactory
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21029 Merged build finished. Test FAILed.
[GitHub] spark issue #21029: [SPARK-23952] remove type parameter in DataReaderFactory
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21029 Merged build finished. Test PASSed.
[GitHub] spark issue #21029: [SPARK-23952] remove type parameter in DataReaderFactory
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21029 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2372/ Test PASSed.
[GitHub] spark issue #21029: [SPARK-23952] remove type parameter in DataReaderFactory
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21029 **[Test build #89430 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89430/testReport)** for PR 21029 at commit [`36031f2`](https://github.com/apache/spark/commit/36031f2bc803535ba49e058951aef3faea30380d).
[GitHub] spark issue #21036: [SPARK-23958][CORE] HadoopRdd filters empty files to avo...
Github user guoxiaolongzte commented on the issue: https://github.com/apache/spark/pull/21036 1. There is no need to loop twice; we can filter out splits whose length is not greater than 0 in a single pass. 2. This feature improves performance, so the switch should be enabled by default.
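Point 1 above can be illustrated with a tiny hypothetical sketch. The stand-in data is invented for illustration; in HadoopRDD the real objects are Hadoop `InputSplit`s with a length in bytes.

```python
# Stand-in splits: (name, length in bytes).
splits = [("part-0", 0), ("part-1", 128), ("part-2", 0), ("part-3", 64)]

# A single pass both checks each split's length and filters out the empty
# ones; there is no need for one loop to check and a second loop to filter.
non_empty = [s for s in splits if s[1] > 0]
print(non_empty)  # [('part-1', 128), ('part-3', 64)]
```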
[GitHub] spark pull request #21080: [SPARK-23986][SQL] freshName can generate non-uni...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21080#discussion_r181957835

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -575,7 +575,7 @@ class CodegenContext {
     if (freshNameIds.contains(fullName)) {
       val id = freshNameIds(fullName)
       freshNameIds(fullName) = id + 1
-      s"$fullName$id"
+      s"${fullName}_$id"
     } else {
       freshNameIds += fullName -> 1
       fullName
--- End diff --

Ah, sorry for my misunderstanding. Changing line 581 would work well.
[GitHub] spark issue #20984: [SPARK-23875][SQL] Add IndexedSeq wrapper for ArrayData
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20984 Merged build finished. Test PASSed.
[GitHub] spark issue #20984: [SPARK-23875][SQL] Add IndexedSeq wrapper for ArrayData
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20984 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2371/ Test PASSed.
[GitHub] spark issue #20984: [SPARK-23875][SQL] Add IndexedSeq wrapper for ArrayData
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20984 **[Test build #89429 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89429/testReport)** for PR 20984 at commit [`8b5de0f`](https://github.com/apache/spark/commit/8b5de0f7d18af1d194305d7cd8c43ca007a5b980).
[GitHub] spark issue #20984: [SPARK-23875][SQL] Add IndexedSeq wrapper for ArrayData
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20984 Merged build finished. Test PASSed.
[GitHub] spark issue #20984: [SPARK-23875][SQL] Add IndexedSeq wrapper for ArrayData
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20984 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2370/ Test PASSed.
[GitHub] spark issue #20984: [SPARK-23875][SQL] Add IndexedSeq wrapper for ArrayData
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20984 **[Test build #89428 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89428/testReport)** for PR 20984 at commit [`6e8a697`](https://github.com/apache/spark/commit/6e8a697098e146ab0582df049831ec5f1237aa40).
[GitHub] spark issue #20816: [SPARK-21479][SQL] Outer join filter pushdown in null su...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20816 cc @cloud-fan
[GitHub] spark pull request #21080: [SPARK-23986][SQL] freshName can generate non-uni...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21080#discussion_r181953816

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -575,7 +575,7 @@ class CodegenContext {
     if (freshNameIds.contains(fullName)) {
       val id = freshNameIds(fullName)
       freshNameIds(fullName) = id + 1
-      s"$fullName$id"
+      s"${fullName}_$id"
    } else {
      freshNameIds += fullName -> 1
      fullName
--- End diff --

Hmm, wouldn't it be `a_1` -> `a_1_0`, `a` -> `a_0`, `a` -> `a_1`?
[GitHub] spark issue #21078: [SPARK-23990][ML] Instruments logging improvements - ML ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21078 Merged build finished. Test FAILed.
[GitHub] spark issue #21078: [SPARK-23990][ML] Instruments logging improvements - ML ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21078 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89427/ Test FAILed.
[GitHub] spark issue #21078: [SPARK-23990][ML] Instruments logging improvements - ML ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21078 **[Test build #89427 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89427/testReport)** for PR 21078 at commit [`2d7a8ca`](https://github.com/apache/spark/commit/2d7a8cadb81eb1adf896ef566c1d27f42f8b26ba). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20633 Merged build finished. Test PASSed.
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20633 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89421/ Test PASSed.
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20633 **[Test build #89421 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89421/testReport)** for PR 20633 at commit [`80b668a`](https://github.com/apache/spark/commit/80b668afb0303b67ead8aed8d4d1f1996fa02658). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #21080: [SPARK-23986][SQL] freshName can generate non-uni...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21080#discussion_r181952028

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -575,7 +575,7 @@ class CodegenContext {
     if (freshNameIds.contains(fullName)) {
       val id = freshNameIds(fullName)
       freshNameIds(fullName) = id + 1
-      s"$fullName$id"
+      s"${fullName}_$id"
     } else {
       freshNameIds += fullName -> 1
       fullName
--- End diff --

I think it still has a problem. For the sequence `a_1`, `a`, `a`, we get the duplicated name `a_1`. We can solve this problem by always adding the postfix.
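The collision discussed in this thread can be demonstrated with a hypothetical Python re-implementation of the freshName counter logic (the real code is Scala in `CodegenContext`); this only illustrates why `s"${fullName}_$id"` can still collide and how always appending the postfix avoids it.

```python
def fresh_name_underscore():
    """Model of the proposed change: first request for a name returns the
    bare name; later requests get _<id> appended."""
    ids = {}
    def fresh(name):
        if name in ids:
            i = ids[name]
            ids[name] = i + 1
            return f"{name}_{i}"
        ids[name] = 1
        return name
    return fresh

def fresh_name_always_postfix():
    """Model of the suggested fix: every request gets _<id> appended, so
    generated names cannot collide with each other."""
    ids = {}
    def fresh(name):
        i = ids.get(name, 0)
        ids[name] = i + 1
        return f"{name}_{i}"
    return fresh

f = fresh_name_underscore()
print([f(n) for n in ["a_1", "a", "a"]])  # ['a_1', 'a', 'a_1'] -- duplicate!

g = fresh_name_always_postfix()
print([g(n) for n in ["a_1", "a", "a"]])  # ['a_1_0', 'a_0', 'a_1'] -- all unique
```

The second run reproduces exactly the `a_1` -> `a_1_0`, `a` -> `a_0`, `a` -> `a_1` sequence mentioned elsewhere in the thread.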
[GitHub] spark pull request #21080: [SPARK-23986][SQL] freshName can generate non-uni...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21080#discussion_r181950606

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -575,7 +575,7 @@ class CodegenContext {
     if (freshNameIds.contains(fullName)) {
       val id = freshNameIds(fullName)
       freshNameIds(fullName) = id + 1
-      s"$fullName$id"
+      s"${fullName}_$id"
     } else {
       freshNameIds += fullName -> 1
       fullName
--- End diff --

Isn't it changed to `s"${fullName}_$id"`? So you will get `a_0_1`.
[GitHub] spark issue #19381: [SPARK-10884][ML] Support prediction on single instance ...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19381 @dbtsai Good idea! Is there a related JIRA, or could you open one for it? cc @jkbradley
[GitHub] spark issue #21078: [SPARK-23990][ML] Instruments logging improvements - ML ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21078 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2369/ Test PASSed.
[GitHub] spark issue #21078: [SPARK-23990][ML] Instruments logging improvements - ML ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21078 Merged build finished. Test PASSed.
[GitHub] spark issue #21078: [SPARK-23990][ML] Instruments logging improvements - ML ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21078 **[Test build #89427 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89427/testReport)** for PR 21078 at commit [`2d7a8ca`](https://github.com/apache/spark/commit/2d7a8cadb81eb1adf896ef566c1d27f42f8b26ba).
[GitHub] spark pull request #21080: [SPARK-23986][SQL] freshName can generate non-uni...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21080#discussion_r181949672

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -575,7 +575,7 @@ class CodegenContext {
     if (freshNameIds.contains(fullName)) {
       val id = freshNameIds(fullName)
       freshNameIds(fullName) = id + 1
-      s"$fullName$id"
+      s"${fullName}_$id"
     } else {
       freshNameIds += fullName -> 1
       fullName
--- End diff --

If I am correct, does this still have a conflict between `a_01` and `a_0$id` where `$id = 1`?
[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21060

I am fine to accept different opinions for this specific PR. Reverting this PR is not my goal here. This is a public community. It sounds like the commit message clearly delivers to you what this PR does: `This PR proposes to add collect to a query executor as an action.`, although I still have different opinions. We always need to collect and accept different opinions. I am also glad you agree on the backport policy I proposed above. Hopefully, everyone is on the same page about avoiding unnecessary overhead.

> The minor bug fixes/improvements that have external behavior changes

I personally thought this PR fits this category. No matter whether the behavior changes are correct or not, we should still not backport it if the issue is neither critical nor a regression. That is what I emphasized multiple times in the argument above. API inconsistencies are not rare among our APIs, and we did not backport those PRs. Now, I am fine to backport this one because it is an experimental API; thus, we can say we do not guarantee backport compatibility. If it were a public API, I would insist on my original opinion.

I am also glad many community members have a lot of experience with mission-critical software development. This can help improve documentation, code quality and test coverage. Development of application/mobile software is completely different from development of system software. We are in the right direction. We need to enforce it with stricter discipline.
[GitHub] spark issue #20816: [SPARK-21479][SQL] Outer join filter pushdown in null su...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20816 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89420/ Test PASSed.
[GitHub] spark issue #20816: [SPARK-21479][SQL] Outer join filter pushdown in null su...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20816 Merged build finished. Test PASSed.
[GitHub] spark issue #20816: [SPARK-21479][SQL] Outer join filter pushdown in null su...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20816 **[Test build #89420 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89420/testReport)** for PR 20816 at commit [`194e6e7`](https://github.com/apache/spark/commit/194e6e76f3f4605c0c3026ab693124be44608715). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20787: [MINOR][DOCS] Documenting months_between direction
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20787 Merged build finished. Test PASSed.
[GitHub] spark issue #20787: [MINOR][DOCS] Documenting months_between direction
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20787 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89419/ Test PASSed.
[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/21060 I do not see a problem with the commit message here. Is that really the issue? It accurately describes _what_ changes. The _why_ has always been documented in discussion, and it is here already. Sometimes the _why_ is documented in comments too; I don't see a particular need for that here, but if that's the issue, why isn't that what we're talking about? You continue to portray this as a behavior change, and I think you mean "a change in what is considered correct behavior". However, all the other comments suggest otherwise; the argument from consistency seems much stronger. Your proposed criteria for backports sort of align with accepted practice, which is to follow semver.org semantics. I think semver is reasonably clear, in general and in this case. I see broad agreement for this backport, and people simply disagree with your interpretation. It is not a failure to understand criteria. Believe me, people here have plenty of experience with software, versioning, and the impact of changes. I'd put more faith in the judgment of your peers. Your anecdotes are of a type that's familiar to many people, but I also fail to see how they're relevant here. You are adopting a 'conservative' position, and I think in this case it's out of line with normal practice. I think you should accept that people disagree and move on.
[GitHub] spark issue #20787: [MINOR][DOCS] Documenting months_between direction
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20787 **[Test build #89419 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89419/testReport)** for PR 20787 at commit [`3ca9948`](https://github.com/apache/spark/commit/3ca99489a1a88d16f371091e86e4a58962e00759). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20940: [SPARK-23429][CORE] Add executor memory metrics t...
Github user edwinalu commented on a diff in the pull request: https://github.com/apache/spark/pull/20940#discussion_r181943630

--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -234,8 +244,22 @@ private[spark] class EventLoggingListener(
     }
   }

-  // No-op because logging every update would be overkill
-  override def onExecutorMetricsUpdate(event: SparkListenerExecutorMetricsUpdate): Unit = { }
+  /**
+   * Log if there is a new peak value for one of the memory metrics for the given executor.
+   * Metrics are cleared out when a new stage is started in onStageSubmitted, so this will
+   * log new peak memory metric values per executor per stage.
+   */
+  override def onExecutorMetricsUpdate(event: SparkListenerExecutorMetricsUpdate): Unit = {
--- End diff --

Thanks for your thoughts on this. The concerns are the size of the message and also logging, but it is only an extra few longs per heartbeat, and similarly for logging. Task end would help with minimizing communication for longer-running tasks. The heartbeats are only every 10 seconds, so perhaps not so bad.
[GitHub] spark pull request #21073: [SPARK-23936][SQL][WIP] Implement map_concat
Github user bersprockets commented on a diff in the pull request: https://github.com/apache/spark/pull/21073#discussion_r181943238 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -115,6 +116,62 @@ case class MapValues(child: Expression) override def prettyName: String = "map_values" } +/** + * Returns the union of all the given maps. + */ +@ExpressionDescription( +usage = "_FUNC_(map, ...) - Returns the union of all the given maps", +examples = """ +Examples: + > SELECT _FUNC_(map(1, 'a', 2, 'b'), map(2, 'c', 3, 'd')); + [[1 -> "a"], [2 -> "c"], [3 -> "d"] + """) +case class MapConcat(children: Seq[Expression]) extends Expression + with CodegenFallback { + + override def checkInputDataTypes(): TypeCheckResult = { +// this check currently does not allow valueContainsNull to vary, +// and unfortunately none of the MapType toString methods include +// valueContainsNull for the error message +if (children.exists(!_.dataType.isInstanceOf[MapType])) { + TypeCheckResult.TypeCheckFailure( +s"The given input of function $prettyName should all be of type map, " + + "but they are " + children.map(_.dataType.simpleString).mkString("[", ", ", "]")) +} else if (children.map(_.dataType).distinct.length > 1) { + TypeCheckResult.TypeCheckFailure( +s"The given input maps of function $prettyName should all be the same type, " + + "but they are " + children.map(_.dataType.simpleString).mkString("[", ", ", "]")) +} else { + TypeCheckResult.TypeCheckSuccess +} + } + override def dataType: MapType = { +children.headOption.map(_.dataType.asInstanceOf[MapType]) + .getOrElse(MapType(keyType = StringType, valueType = StringType)) + } + + override def nullable: Boolean = false --- End diff -- @henryr empty map: scala> df.select(map_concat('map1, 'map2).as('newMap)).show df.select(map_concat('map1, 'map2).as('newMap)).show +--+ |newMap| +--+ |[]| |[]| +--+ Presto docs (from which the proposed spec comes) are quiet on the matter. 
Even after looking at the Presto code, I am still hard-pressed to say. I did divine from the Presto code that there should be at least two inputs (and I don't currently verify that). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
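The merge semantics implied by the example above (entries from later maps win on duplicate keys) can be modeled in plain Scala. This is only a sketch of the SQL-level behavior shown in the `ExpressionDescription` example, not Spark's actual `MapConcat` implementation, which operates on Catalyst expressions:

```scala
// Hypothetical model of map_concat's duplicate-key behavior: later maps win.
object MapConcatSketch {
  def mapConcat[K, V](maps: Seq[Map[K, V]]): Map[K, V] =
    // `++` lets entries from the right-hand map override duplicates on the left
    maps.foldLeft(Map.empty[K, V])(_ ++ _)
}
```

With the inputs from the example, `mapConcat(Seq(Map(1 -> "a", 2 -> "b"), Map(2 -> "c", 3 -> "d")))` yields the entries `1 -> "a"`, `2 -> "c"`, `3 -> "d"`; with no inputs it yields an empty map, matching the `[]` rows in the `show` output above.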
[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21060

I might not have explained it well. Sorry for the misunderstanding, and thank you @rxin for helping me clarify my points. It sounds like many of you think this backport is fine. I am not against this specific PR. We do not need to revert it, just improve the documentation. That should be fine, although I still personally prefer adding the configuration.

As I said in the original PR https://github.com/apache/spark/pull/21007 that was merged to master, let me make two points here too.

- PR descriptions become part of the commit log, so we need to be very careful before merging. In the past, I also missed a few when I did the merge. To be honest, I am not sure how native English speakers read it, but the first paragraph scared me when I read the PR commit log. @srowen WDYT?
```
This PR proposes to add collect to a query executor as an action.
```
- Document behavior changes that are visible to external users/developers. In Spark 2.3, we started to enforce this on every merged PR, and I believe many of you got similar comments on previous PRs. This PR should also update the migration guide. @HyukjinKwon Do you agree?

Before we finalize the backport policy, below are my inputs on a whitelist of what we can backport:

- Critical/important bug fixes and security fixes.
- Regression fixes.
- PRs that do not touch production code, such as test-only patches, documentation fixes, and log message fixes.

Avoid backporting PRs that contain:

- New features
- Minor bug fixes/improvements with external behavior changes
- Code refactoring
- Code changes with high/medium risk

In the OSS community, I believe no committer will be fired just because we merged/introduced a bug, right? If a user's application fails after an upgrade, we normally blame the users or say the bug was introduced accidentally.
However, this was not acceptable in my first team. Let me share what I experienced: various customer accidents in my related product teams.

- One director got demoted (and was almost fired) because of a bad release. She is a very nice lady; we really liked her. That release had many cool features, but its quality was not well controlled, and many customers were not willing to upgrade.
- There was a famous system-upgrade failure a few years ago. The whole system became very slow after the upgrade, and it took tens of hours to recover. A few days later, the GM went to the customer site and got blamed for a whole day. Multiple architects and VPs were forced to write apology letters, and the customer planned to sue us. On the customer side, the CTO was later fired, and the upgrade accident even made national TV news because it affected so many people.
- A few directors were on call with me for 10+ nights to resolve a data corruption issue for one Japanese customer. The client teams ran multiple systems in parallel to reproduce the issue. After a few weeks, it was finally resolved by reading the memory dump. The root cause was a code merge from one branch to another many years earlier.

If all the people above believe Spark is the best product in Big Data, we need to be more conservative. Our decisions could affect many people. This is not the first time I have argued with other committers/contributors about PR quality; on one previous PR, I left almost 100 comments just because the documents were not accurate.

If my comments offend anyone, I apologize. Everyone has a different understanding of software development because we have different work experience. The whole community has already done a wonderful job compared with other open source projects, but I still believe we can do better, right? Let us formalize the backport policy and enforce it in each release.
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21079: [SPARK-23992][CORE] ShuffleDependency does not ne...
Github user 10110346 commented on a diff in the pull request: https://github.com/apache/spark/pull/21079#discussion_r181942230

--- Diff: core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala ---
@@ -113,3 +118,24 @@ private[spark] class ShuffleMapTask(
 
   override def toString: String = "ShuffleMapTask(%d, %d)".format(stageId, partitionId)
 }
+
+object ShuffleMapTask extends Logging {
+  private val cache = CacheBuilder.newBuilder()
--- End diff --

We don't need to clear it; this cache can recover itself.

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
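The point that the cache "can recover itself" refers to bounded caches evicting entries on their own, so no explicit clear() is needed at application shutdown. The PR uses Guava's `CacheBuilder`; as a self-contained stand-in, the same idea can be sketched with a size-capped LRU map from the JDK. The class name and `maxEntries` cap are illustrative, not from the PR:

```scala
import java.util.{LinkedHashMap => JLinkedHashMap, Map => JMap}

// A size-bounded LRU cache: once more than maxEntries are present, the
// least-recently-accessed entry is dropped automatically on each insertion.
class LruCache[K, V](maxEntries: Int)
    extends JLinkedHashMap[K, V](16, 0.75f, /* accessOrder = */ true) {
  // LinkedHashMap calls this after every put; returning true evicts `eldest`.
  override protected def removeEldestEntry(eldest: JMap.Entry[K, V]): Boolean =
    size() > maxEntries
}
```

With a cap of 2, putting keys "a", "b", "c" in order silently evicts "a" — the cache bounds its own memory use without any external cleanup hook.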
[GitHub] spark issue #21019: [SPARK-23948] Trigger mapstage's job listener in submitM...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21019 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2368/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21019: [SPARK-23948] Trigger mapstage's job listener in submitM...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21019 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21019: [SPARK-23948] Trigger mapstage's job listener in submitM...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/21019 Thanks for the comments from Imran and Xingbo. I made some changes; please take another look when you have time. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21019: [SPARK-23948] Trigger mapstage's job listener in submitM...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21019 **[Test build #89426 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89426/testReport)** for PR 21019 at commit [`42a9b2e`](https://github.com/apache/spark/commit/42a9b2e2a4f23401affbf53f207a7c98d4da0bce). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20695: [SPARK-21741][ML][PySpark] Python API for DataFrame-base...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20695 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20695: [SPARK-21741][ML][PySpark] Python API for DataFrame-base...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20695 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89422/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20984: [SPARK-23875][SQL] Add IndexedSeq wrapper for ArrayData
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20984 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20984: [SPARK-23875][SQL] Add IndexedSeq wrapper for ArrayData
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20984 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2367/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20695: [SPARK-21741][ML][PySpark] Python API for DataFrame-base...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20695 **[Test build #89422 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89422/testReport)** for PR 20695 at commit [`9a4a0ca`](https://github.com/apache/spark/commit/9a4a0ca43185d46800a9e29c9c3b0a139a1e29e9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20984: [SPARK-23875][SQL] Add IndexedSeq wrapper for ArrayData
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20984 **[Test build #89425 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89425/testReport)** for PR 20984 at commit [`77640c4`](https://github.com/apache/spark/commit/77640c449af4f3ba0d7e5a231c24f34f090314db). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21078: [SPARK-23990][ML] Instruments logging improvements - ML ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21078 **[Test build #89423 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89423/testReport)** for PR 21078 at commit [`7d286c4`](https://github.com/apache/spark/commit/7d286c4322bca3c95f678018dc85e476d77e48da). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19868: [SPARK-22676] Avoid iterating all partition paths when s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19868 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21078: [SPARK-23990][ML] Instruments logging improvements - ML ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21078 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21078: [SPARK-23990][ML] Instruments logging improvements - ML ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21078 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89423/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19868: [SPARK-22676] Avoid iterating all partition paths when s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19868 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2366/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21078: [SPARK-23990][ML] Instruments logging improvements - ML ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21078 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2365/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21078: [SPARK-23990][ML] Instruments logging improvements - ML ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21078 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21079: [SPARK-23992][CORE] ShuffleDependency does not ne...
Github user Ngone51 commented on a diff in the pull request: https://github.com/apache/spark/pull/21079#discussion_r181938225

--- Diff: core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala ---
@@ -113,3 +118,24 @@ private[spark] class ShuffleMapTask(
 
   override def toString: String = "ShuffleMapTask(%d, %d)".format(stageId, partitionId)
 }
+
+object ShuffleMapTask extends Logging {
+  private val cache = CacheBuilder.newBuilder()
--- End diff --

Do we need to clear this `cache` at the end of an app?

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19868: [SPARK-22676] Avoid iterating all partition paths when s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19868 **[Test build #89424 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89424/testReport)** for PR 19868 at commit [`bf4277b`](https://github.com/apache/spark/commit/bf4277bf862000bb000d0cbf11092d8565f42afb). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19868: [SPARK-22676] Avoid iterating all partition paths...
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/19868#discussion_r181939704 --- Diff: core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala --- @@ -195,6 +205,10 @@ class NewHadoopRDD[K, V]( e) finished = true null + case e: FileNotFoundException if ignoreMissingFiles => --- End diff -- Yes, I should change this. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19868: [SPARK-22676] Avoid iterating all partition paths...
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/19868#discussion_r181939661

--- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
@@ -276,6 +292,12 @@ class HadoopRDD[K, V](
       try {
         finished = !reader.next(key, value)
       } catch {
+        case e: FileNotFoundException if ignoreMissingFiles =>
+          logWarning(s"Skipped missing file: ${split.inputSplit}", e)
+          finished = true
+          null
--- End diff --

I removed this.

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
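The guarded-catch pattern in the diffs above — swallow a `FileNotFoundException` only when an `ignoreMissingFiles` flag is set, propagate it otherwise — can be isolated into a small sketch. The `readOrSkip` helper and its signature are illustrative; in the real PR this logic lives inside the record-reader loops of `HadoopRDD`/`NewHadoopRDD` and uses `logWarning`:

```scala
import java.io.FileNotFoundException

object MissingFileSketch {
  // Returns Some(result) on success; returns None (after a warning) for a
  // missing file when ignoreMissingFiles is set; rethrows otherwise, because
  // the guard on the case clause makes the exception fall through uncaught.
  def readOrSkip[A](read: () => A, ignoreMissingFiles: Boolean): Option[A] =
    try Some(read())
    catch {
      case e: FileNotFoundException if ignoreMissingFiles =>
        Console.err.println(s"Skipped missing file: ${e.getMessage}")
        None
    }
}
```

The guard (`if ignoreMissingFiles`) on the `case` clause is what makes the flag opt-in: with the flag off, the same exception is simply not matched and propagates to the caller unchanged.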
[GitHub] spark issue #21078: [SPARK-23990][ML] Instruments logging improvements - ML ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21078 **[Test build #89423 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89423/testReport)** for PR 21078 at commit [`7d286c4`](https://github.com/apache/spark/commit/7d286c4322bca3c95f678018dc85e476d77e48da). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21034: [SPARK-23926][SQL] Extending reverse function to support...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/21034 LGTM. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20695: [SPARK-21741][ML][PySpark] Python API for DataFrame-base...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20695 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2364/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20695: [SPARK-21741][ML][PySpark] Python API for DataFrame-base...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20695 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20695: [SPARK-21741][ML][PySpark] Python API for DataFrame-base...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20695 **[Test build #89422 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89422/testReport)** for PR 20695 at commit [`9a4a0ca`](https://github.com/apache/spark/commit/9a4a0ca43185d46800a9e29c9c3b0a139a1e29e9). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20888: [SPARK-23775][TEST] Make DataFrameRangeSuite not flaky
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/20888 Overall this LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21080: [SPARK-23986][SQL] freshName can generate non-uni...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21080#discussion_r181931856

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -575,7 +575,7 @@ class CodegenContext {
     if (freshNameIds.contains(fullName)) {
       val id = freshNameIds(fullName)
       freshNameIds(fullName) = id + 1
-      s"$fullName$id"
+      s"${fullName}_$id"
     } else {
       freshNameIds += fullName -> 1
       fullName
--- End diff --

I'd suggest something like `s"${fullName}_0"` at L581. It also solves the failed tests.

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21080: [SPARK-23986][SQL] freshName can generate non-uni...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21080#discussion_r181931283

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -575,7 +575,7 @@ class CodegenContext {
     if (freshNameIds.contains(fullName)) {
       val id = freshNameIds(fullName)
       freshNameIds(fullName) = id + 1
-      s"$fullName$id"
+      s"${fullName}_$id"
    } else {
       freshNameIds += fullName -> 1
       fullName
--- End diff --

If the given name is something like `name1_1`, I think you can still produce a non-unique variable name.

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
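The collision under discussion is easy to reproduce with a simplified model of the per-name counter. This is only a sketch of the scheme shown in the diff, not the actual `CodegenContext.freshName` (which also prefixes and sanitizes names); the `separator` parameter stands in for the `""`-versus-`"_"` difference between the old and new code:

```scala
import scala.collection.mutable

class FreshNamer(separator: String) {
  private val freshNameIds = mutable.Map.empty[String, Int]

  // First request for a name returns it verbatim; later requests append
  // separator + a counter, mirroring the logic in the diff above.
  def freshName(fullName: String): String =
    freshNameIds.get(fullName) match {
      case Some(id) =>
        freshNameIds(fullName) = id + 1
        s"$fullName$separator$id"
      case None =>
        freshNameIds += fullName -> 1
        fullName
    }
}
```

Without a separator, requesting `name11` once and `name1` twice yields `name11`, `name1`, `name11` — a collision. And as the comment above notes, adding `_` only narrows the problem: requesting `name1_1` once and `name1` twice yields `name1_1`, `name1`, `name1_1`.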
[GitHub] spark issue #21070: [SPARK-23972][BUILD][SQL] Update Parquet to 1.10.0.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21070 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21070: [SPARK-23972][BUILD][SQL] Update Parquet to 1.10.0.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21070 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89415/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21070: [SPARK-23972][BUILD][SQL] Update Parquet to 1.10.0.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21070 **[Test build #89415 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89415/testReport)** for PR 21070 at commit [`27a66d8`](https://github.com/apache/spark/commit/27a66d8114552e199389c944517d42719861b9de). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/21060 Adding a flag just in 2.3 is, at least, an unusual thing to do. By this logic lots of backports should be flag protected but we don't. Why is this special? I still don't see much argument against this backport. I count about 3-4 committers in favor and 1 against. Let's leave it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21060: [SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21060

I am okay if there's a specific reason. I think this is the point: if there's a specific reason, it should be mentioned and explained ahead of time. Actually, I (and @srowen as well, IIUC) asked this many times; see above. I would have investigated, or would have just said that I am okay with reverting. I don't usually get in the way if there's a specific reason. It would be great if we could have more open talks next time.

> for the 2.3.x backport, add a config that so it is possible to turn this off in production, if somebody actually has their job failed because of this? It's a small delta from what this PR already does, and that should alleviate the concerns @gatorsmile has.

I am personally fine with reverting or adding a configuration if that's what you all feel strongly about; however, I should say it sounds unusual to have a config controlling this behaviour in branch-2.3 alone, and it seems less worthwhile. The case you mention sounds really unlikely, and I wonder whether it makes sense, though. It's also experimental, as you all said.

Also, I should note that I have been confused about the backporting policy and the bunch of configurations controlling each behaviour. If those are just concerns to be addressed, that's fine, but it sounds like something people have had to follow so far. If that is true, I feel sure this should be documented so we don't have such overhead next time. I am pretty sure this isn't the first time.

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20633 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2363/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20633 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20633 **[Test build #89421 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89421/testReport)** for PR 20633 at commit [`80b668a`](https://github.com/apache/spark/commit/80b668afb0303b67ead8aed8d4d1f1996fa02658). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20633 @dbtsai Thanks! I've solved the conflicts. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20816: [SPARK-21479][SQL] Outer join filter pushdown in null su...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20816 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20816: [SPARK-21479][SQL] Outer join filter pushdown in null su...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20816 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2362/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20816: [SPARK-21479][SQL] Outer join filter pushdown in null su...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20816 **[Test build #89420 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89420/testReport)** for PR 20816 at commit [`194e6e7`](https://github.com/apache/spark/commit/194e6e76f3f4605c0c3026ab693124be44608715). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/20633 Can you address the conflicts? Gonna start to review it soon. Thanks. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19381: [SPARK-10884][ML] Support prediction on single instance ...
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/19381 Too late to the party! Great PR. We should also think about refactoring the predictors into `mllib-local` eventually, so people can use them in prod without depending on Spark. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20787: [MINOR][DOCS] Documenting months_between direction
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20787 **[Test build #89419 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89419/testReport)** for PR 20787 at commit [`3ca9948`](https://github.com/apache/spark/commit/3ca99489a1a88d16f371091e86e4a58962e00759). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21073: [SPARK-23936][SQL][WIP] Implement map_concat
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21073#discussion_r181915827

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -115,6 +116,62 @@ case class MapValues(child: Expression)
 
   override def prettyName: String = "map_values"
 }
 
+/**
+ * Returns the union of all the given maps.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(map, ...) - Returns the union of all the given maps",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(map(1, 'a', 2, 'b'), map(2, 'c', 3, 'd'));
+       [[1 -> "a"], [2 -> "c"], [3 -> "d"]]
+  """)
+case class MapConcat(children: Seq[Expression]) extends Expression
+  with CodegenFallback {
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+    // this check currently does not allow valueContainsNull to vary,
+    // and unfortunately none of the MapType toString methods include
+    // valueContainsNull for the error message
+    if (children.exists(!_.dataType.isInstanceOf[MapType])) {
+      TypeCheckResult.TypeCheckFailure(
+        s"The given input of function $prettyName should all be of type map, " +
+          "but they are " + children.map(_.dataType.simpleString).mkString("[", ", ", "]"))
+    } else if (children.map(_.dataType).distinct.length > 1) {
+      TypeCheckResult.TypeCheckFailure(
+        s"The given input maps of function $prettyName should all be the same type, " +
+          "but they are " + children.map(_.dataType.simpleString).mkString("[", ", ", "]"))
+    } else {
+      TypeCheckResult.TypeCheckSuccess
+    }
+  }
+
+  override def dataType: MapType = {
+    children.headOption.map(_.dataType.asInstanceOf[MapType])
+      .getOrElse(MapType(keyType = StringType, valueType = StringType))
+  }
+
+  override def nullable: Boolean = false
--- End diff --

What's the result of `map_concat(NULL, NULL)`?

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21068: [SPARK-16630][YARN] Blacklist a node if executors...
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21068#discussion_r181908529

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocatorBlacklistTracker.scala ---
@@ -0,0 +1,155 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.yarn
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable.HashMap
+
+import org.apache.hadoop.yarn.client.api.AMRMClient
+import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.yarn.config._
+import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config._
+import org.apache.spark.util.{Clock, SystemClock, Utils}
+
+private[spark] class YarnAllocatorBlacklistTracker(
+    sparkConf: SparkConf,
+    amClient: AMRMClient[ContainerRequest],
+    failureWithinTimeIntervalTracker: FailureWithinTimeIntervalTracker)
+  extends Logging {
+
+  private val DEFAULT_TIMEOUT = "1h"
+
+  private val BLACKLIST_TIMEOUT_MILLIS =
+    sparkConf.get(BLACKLIST_TIMEOUT_CONF).getOrElse(Utils.timeStringAsMs(DEFAULT_TIMEOUT))
+
+  private val IS_YARN_ALLOCATION_BLACKLIST_ENABLED =
+    sparkConf.get(YARN_ALLOCATION_BLACKLIST_ENABLED).getOrElse(false)
+
+  private val BLACKLIST_MAX_FAILED_EXEC_PER_NODE = sparkConf.get(MAX_FAILED_EXEC_PER_NODE)
+
+  private val BLACKLIST_SIZE_LIMIT = sparkConf.get(YARN_BLACKLIST_SIZE_LIMIT)
+
+  private val BLACKLIST_SIZE_DEFAULT_WEIGHT = sparkConf.get(YARN_BLACKLIST_SIZE_DEFAULT_WEIGHT)
+
+  private var clock: Clock = new SystemClock
+
+  private val allocationBlacklistedNodesWithExpiry = new HashMap[String, Long]()
+
+  private var currentBlacklistedYarnNodes = Set.empty[String]
+
+  private var schedulerBlacklistedNodesWithExpiry = Map.empty[String, Long]
--- End diff --

Do you need to keep a separate data structure for the scheduler and allocator blacklisted nodes? Instead, could you add the scheduler ones into a shared map when `setSchedulerBlacklistedNodes` is called?

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
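The shared-map alternative suggested in the review can be sketched as a pure function: fold both the scheduler- and allocator-blacklisted nodes into one node-to-expiry map and derive the current blacklist by dropping entries whose expiry has passed. All names and the explicit `now` parameter are illustrative, not from the PR:

```scala
object BlacklistSketch {
  def currentBlacklist(
      schedulerNodes: Map[String, Long],   // node -> expiry timestamp (millis)
      allocatorNodes: Map[String, Long],
      now: Long): Set[String] = {
    // Merge into one map; if both sides list a node, keep the later expiry.
    val merged = allocatorNodes ++ schedulerNodes.map { case (node, expiry) =>
      node -> math.max(expiry, allocatorNodes.getOrElse(node, Long.MinValue))
    }
    // A node is blacklisted only while its expiry lies in the future.
    merged.collect { case (node, expiry) if expiry > now => node }.toSet
  }
}
```

One design point this makes concrete: with a single map, "which blacklist is a node on?" becomes a non-question — expiry handling, size limiting, and the diff sent to YARN can all be computed from one structure.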