date:20190716

[GitHub] [spark] dongjoon-hyun closed pull request #24993: [SPARK-18299][SQL] Allow more aggregations on KeyValueGroupedDataset

2019-07-16 Thread GitBox

dongjoon-hyun closed pull request #24993: [SPARK-18299][SQL] Allow more 
aggregations on KeyValueGroupedDataset
URL: https://github.com/apache/spark/pull/24993
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] vanzin closed pull request #24817: [SPARK-27963][core] Allow dynamic allocation without a shuffle service.

2019-07-16 Thread GitBox

vanzin closed pull request #24817: [SPARK-27963][core] Allow dynamic allocation 
without a shuffle service.
URL: https://github.com/apache/spark/pull/24817
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] dongjoon-hyun commented on issue #24993: [SPARK-18299][SQL] Allow more aggregations on KeyValueGroupedDataset

2019-07-16 Thread GitBox

dongjoon-hyun commented on issue #24993: [SPARK-18299][SQL] Allow more 
aggregations on KeyValueGroupedDataset
URL: https://github.com/apache/spark/pull/24993#issuecomment-512041697
 
 
   @nooberfsh . What is your Apache JIRA ID? I want to add you to the Apache 
Spark Contributor group and assign SPARK-18299 to you.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] vanzin commented on issue #24817: [SPARK-27963][core] Allow dynamic allocation without a shuffle service.

2019-07-16 Thread GitBox

vanzin commented on issue #24817: [SPARK-27963][core] Allow dynamic allocation 
without a shuffle service.
URL: https://github.com/apache/spark/pull/24817#issuecomment-512041567
 
 
   Merging to master.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] dongjoon-hyun commented on issue #24993: [SPARK-18299][SQL] Allow more aggregations on KeyValueGroupedDataset

2019-07-16 Thread GitBox

dongjoon-hyun commented on issue #24993: [SPARK-18299][SQL] Allow more 
aggregations on KeyValueGroupedDataset
URL: https://github.com/apache/spark/pull/24993#issuecomment-512041073
 
 
   Thank you, @nooberfsh and @JoshRosen .


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] dongjoon-hyun commented on issue #25174: [SPARK-27485][BRANCH-2.4] EnsureRequirements.reorder should handle duplicate expressions gracefully

2019-07-16 Thread GitBox

dongjoon-hyun commented on issue #25174: [SPARK-27485][BRANCH-2.4] 
EnsureRequirements.reorder should handle duplicate expressions gracefully
URL: https://github.com/apache/spark/pull/25174#issuecomment-512039299
 
 
   Thank you so much, @hvanhovell !


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #25173: [SPARK-28416][SQL] Use java.time API in timestampAddInterval

2019-07-16 Thread GitBox

AmplabJenkins removed a comment on issue #25173: [SPARK-28416][SQL] Use 
java.time API in timestampAddInterval
URL: https://github.com/apache/spark/pull/25173#issuecomment-512037229
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107758/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #25173: [SPARK-28416][SQL] Use java.time API in timestampAddInterval

2019-07-16 Thread GitBox

AmplabJenkins commented on issue #25173: [SPARK-28416][SQL] Use java.time API 
in timestampAddInterval
URL: https://github.com/apache/spark/pull/25173#issuecomment-512037229
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107758/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #25173: [SPARK-28416][SQL] Use java.time API in timestampAddInterval

2019-07-16 Thread GitBox

AmplabJenkins commented on issue #25173: [SPARK-28416][SQL] Use java.time API 
in timestampAddInterval
URL: https://github.com/apache/spark/pull/25173#issuecomment-512037222
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #25173: [SPARK-28416][SQL] Use java.time API in timestampAddInterval

2019-07-16 Thread GitBox

AmplabJenkins removed a comment on issue #25173: [SPARK-28416][SQL] Use 
java.time API in timestampAddInterval
URL: https://github.com/apache/spark/pull/25173#issuecomment-512037222
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on issue #25173: [SPARK-28416][SQL] Use java.time API in timestampAddInterval

2019-07-16 Thread GitBox

SparkQA removed a comment on issue #25173: [SPARK-28416][SQL] Use java.time API 
in timestampAddInterval
URL: https://github.com/apache/spark/pull/25173#issuecomment-511964088
 
 
   **[Test build #107758 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107758/testReport)**
 for PR 25173 at commit 
[`9eaffa7`](https://github.com/apache/spark/commit/9eaffa70bad926fb595e8b89f19b1649ae245477).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #25173: [SPARK-28416][SQL] Use java.time API in timestampAddInterval

2019-07-16 Thread GitBox

SparkQA commented on issue #25173: [SPARK-28416][SQL] Use java.time API in 
timestampAddInterval
URL: https://github.com/apache/spark/pull/25173#issuecomment-512036844
 
 
   **[Test build #107758 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107758/testReport)**
 for PR 25173 at commit 
[`9eaffa7`](https://github.com/apache/spark/commit/9eaffa70bad926fb595e8b89f19b1649ae245477).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] mccheah commented on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver

2019-07-16 Thread GitBox

mccheah commented on issue #24796: [SPARK-27900][CORE] Add uncaught exception 
handler to the driver
URL: https://github.com/apache/spark/pull/24796#issuecomment-512029989
 
 
   Yeah adding the uncaught exception hook shouldn't hurt here. As long as this 
doesn't eliminate the pod from the system (so that one can access the logs if 
necessary), it should be fine.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] ifilonenko commented on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver

2019-07-16 Thread GitBox

ifilonenko commented on issue #24796: [SPARK-27900][CORE] Add uncaught 
exception handler to the driver
URL: https://github.com/apache/spark/pull/24796#issuecomment-512028704
 
 
   I can see no reason why Kubernetes shouldn't include a kill on OOM. I agree 
that this should be extended for Kubernetes. These OOM issues do cause issues 
for our operators so this would be useful to report back a failed job. @skonto 
I'd prefer to see `-XX:+ExitOnOutOfMemoryError` be added to the kube 
entrypoint. at the very least to have parity with yarn. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #24798: [SPARK-27724][SQL] Implement REPLACE TABLE and REPLACE TABLE AS SELECT with V2

2019-07-16 Thread GitBox

AmplabJenkins removed a comment on issue #24798: [SPARK-27724][SQL] Implement 
REPLACE TABLE and REPLACE TABLE AS SELECT with V2
URL: https://github.com/apache/spark/pull/24798#issuecomment-512028227
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12883/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #24798: [SPARK-27724][SQL] Implement REPLACE TABLE and REPLACE TABLE AS SELECT with V2

2019-07-16 Thread GitBox

AmplabJenkins removed a comment on issue #24798: [SPARK-27724][SQL] Implement 
REPLACE TABLE and REPLACE TABLE AS SELECT with V2
URL: https://github.com/apache/spark/pull/24798#issuecomment-512028221
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #24798: [SPARK-27724][SQL] Implement REPLACE TABLE and REPLACE TABLE AS SELECT with V2

2019-07-16 Thread GitBox

AmplabJenkins commented on issue #24798: [SPARK-27724][SQL] Implement REPLACE 
TABLE and REPLACE TABLE AS SELECT with V2
URL: https://github.com/apache/spark/pull/24798#issuecomment-512028221
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #24798: [SPARK-27724][SQL] Implement REPLACE TABLE and REPLACE TABLE AS SELECT with V2

2019-07-16 Thread GitBox

AmplabJenkins commented on issue #24798: [SPARK-27724][SQL] Implement REPLACE 
TABLE and REPLACE TABLE AS SELECT with V2
URL: https://github.com/apache/spark/pull/24798#issuecomment-512028227
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12883/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #24798: [SPARK-27724][SQL] Implement REPLACE TABLE and REPLACE TABLE AS SELECT with V2

2019-07-16 Thread GitBox

SparkQA commented on issue #24798: [SPARK-27724][SQL] Implement REPLACE TABLE 
and REPLACE TABLE AS SELECT with V2
URL: https://github.com/apache/spark/pull/24798#issuecomment-512026634
 
 
   **[Test build #107764 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107764/testReport)**
 for PR 24798 at commit 
[`581dba2`](https://github.com/apache/spark/commit/581dba2822a8c1d23f9da653e8c26c84b18cfa3e).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #24798: [SPARK-27724][SQL] Implement REPLACE TABLE and REPLACE TABLE AS SELECT with V2

2019-07-16 Thread GitBox

AmplabJenkins removed a comment on issue #24798: [SPARK-27724][SQL] Implement 
REPLACE TABLE and REPLACE TABLE AS SELECT with V2
URL: https://github.com/apache/spark/pull/24798#issuecomment-512025787
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107756/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #24798: [SPARK-27724][SQL] Implement REPLACE TABLE and REPLACE TABLE AS SELECT with V2

2019-07-16 Thread GitBox

AmplabJenkins removed a comment on issue #24798: [SPARK-27724][SQL] Implement 
REPLACE TABLE and REPLACE TABLE AS SELECT with V2
URL: https://github.com/apache/spark/pull/24798#issuecomment-512025783
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #24798: [SPARK-27724][SQL] Implement REPLACE TABLE and REPLACE TABLE AS SELECT with V2

2019-07-16 Thread GitBox

AmplabJenkins commented on issue #24798: [SPARK-27724][SQL] Implement REPLACE 
TABLE and REPLACE TABLE AS SELECT with V2
URL: https://github.com/apache/spark/pull/24798#issuecomment-512025783
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #24798: [SPARK-27724][SQL] Implement REPLACE TABLE and REPLACE TABLE AS SELECT with V2

2019-07-16 Thread GitBox

AmplabJenkins commented on issue #24798: [SPARK-27724][SQL] Implement REPLACE 
TABLE and REPLACE TABLE AS SELECT with V2
URL: https://github.com/apache/spark/pull/24798#issuecomment-512025787
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107756/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on issue #24798: [SPARK-27724][SQL] Implement REPLACE TABLE and REPLACE TABLE AS SELECT with V2

2019-07-16 Thread GitBox

SparkQA removed a comment on issue #24798: [SPARK-27724][SQL] Implement REPLACE 
TABLE and REPLACE TABLE AS SELECT with V2
URL: https://github.com/apache/spark/pull/24798#issuecomment-511950480
 
 
   **[Test build #107756 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107756/testReport)**
 for PR 24798 at commit 
[`0b5c029`](https://github.com/apache/spark/commit/0b5c0290c6aca26030f3f4b3c93069d8c2366969).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #24798: [SPARK-27724][SQL] Implement REPLACE TABLE and REPLACE TABLE AS SELECT with V2

2019-07-16 Thread GitBox

SparkQA commented on issue #24798: [SPARK-27724][SQL] Implement REPLACE TABLE 
and REPLACE TABLE AS SELECT with V2
URL: https://github.com/apache/spark/pull/24798#issuecomment-512024041
 
 
   **[Test build #107756 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107756/testReport)**
 for PR 24798 at commit 
[`0b5c029`](https://github.com/apache/spark/commit/0b5c0290c6aca26030f3f4b3c93069d8c2366969).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] rdblue commented on a change in pull request #24798: [SPARK-27724][SQL] Implement REPLACE TABLE and REPLACE TABLE AS SELECT with V2

2019-07-16 Thread GitBox

rdblue commented on a change in pull request #24798: [SPARK-27724][SQL] 
Implement REPLACE TABLE and REPLACE TABLE AS SELECT with V2
URL: https://github.com/apache/spark/pull/24798#discussion_r304148360
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Exec.scala
 ##
 @@ -78,7 +80,8 @@ case class CreateTableAsSelectExec(
 }
 
 Utils.tryWithSafeFinallyAndFailureCallbacks({
-  catalog.createTable(ident, query.schema, partitioning.toArray, 
properties.asJava) match {
+  catalog.createTable(
+ident, query.schema, partitioning.toArray, properties.asJava) match {
 
 Review comment:
   I agree. The scope of this suggestion is larger than this PR.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #25176: [SPARK-28417][Spark Core] Wrap File Glob Resolution in DoAs to use ProxyUser Credentials

2019-07-16 Thread GitBox

AmplabJenkins removed a comment on issue #25176: [SPARK-28417][Spark Core] Wrap 
File Glob Resolution in DoAs to use ProxyUser Credentials
URL: https://github.com/apache/spark/pull/25176#issuecomment-512005604
 
 
   Can one of the admins verify this patch?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #25176: [SPARK-28417][Spark Core] Wrap File Glob Resolution in DoAs to use ProxyUser Credentials

2019-07-16 Thread GitBox

AmplabJenkins commented on issue #25176: [SPARK-28417][Spark Core] Wrap File 
Glob Resolution in DoAs to use ProxyUser Credentials
URL: https://github.com/apache/spark/pull/25176#issuecomment-512006177
 
 
   Can one of the admins verify this patch?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #25176: [SPARK-28417][Spark Core] Wrap File Glob Resolution in DoAs to use ProxyUser Credentials

2019-07-16 Thread GitBox

AmplabJenkins removed a comment on issue #25176: [SPARK-28417][Spark Core] Wrap 
File Glob Resolution in DoAs to use ProxyUser Credentials
URL: https://github.com/apache/spark/pull/25176#issuecomment-512005450
 
 
   Can one of the admins verify this patch?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] squito commented on a change in pull request #23677: [SPARK-26755][SCHEDULER] : Optimize Spark Scheduler to dequeue speculative tasks…

2019-07-16 Thread GitBox

squito commented on a change in pull request #23677: [SPARK-26755][SCHEDULER] : 
Optimize Spark Scheduler to dequeue speculative tasks…
URL: https://github.com/apache/spark/pull/23677#discussion_r304138587
 
 

 ##
 File path: 
core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala
 ##
 @@ -1723,4 +1722,48 @@ class TaskSetManagerSuite extends SparkFunSuite with 
LocalSparkContext with Logg
 assert(manager.resourceOffer("exec2", "host2", ANY).isEmpty)
 assert(manager.resourceOffer("exec3", "host3", ANY).isEmpty)
   }
+
+  test("SPARK-26755 Ensure that a speculative task obeys the original locality 
preferences") {
+sc = new SparkContext("local", "test")
+sched = new FakeTaskScheduler(sc, ("exec1", "host1"),
+  ("exec2", "host2"), ("exec3", "host3"), ("exec4", "host4"))
+// Create 3 tasks with locality preferences
+val taskSet = FakeTask.createTaskSet(3,
+  Seq(TaskLocation("host1"), TaskLocation("host3")),
+  Seq(TaskLocation("host2")),
+  Seq(TaskLocation("host3")))
+// Set the speculation multiplier to be 0 so speculative tasks are 
launched immediately
+sc.conf.set(config.SPECULATION_MULTIPLIER, 0.0)
+sc.conf.set(config.SPECULATION_ENABLED, true)
+sc.conf.set(config.SPECULATION_QUANTILE, 0.5)
+val clock = new ManualClock()
+val manager = new TaskSetManager(sched, taskSet, MAX_TASK_FAILURES, clock 
= clock)
+val accumUpdatesByTask: Array[Seq[AccumulatorV2[_, _]]] = 
taskSet.tasks.map { task =>
+  task.metrics.internalAccums
+}
+// Offer resources for 3 tasks to start
+Seq("exec1" -> "host1", "exec2" -> "host2", "exec3" -> "host3").foreach { 
case (exec, host) =>
+  val taskOption = manager.resourceOffer(exec, host, NO_PREF)
+  assert(taskOption.isDefined)
+  assert(taskOption.get.executorId === exec)
+}
+assert(sched.startedTasks.toSet === Set(0, 1, 2))
+clock.advance(1)
+// Finish one task and mark the others as speculatable
+manager.handleSuccessfulTask(2, createTaskResult(2, accumUpdatesByTask(2)))
+assert(sched.endedTasks(2) === Success)
+clock.advance(1)
+assert(manager.checkSpeculatableTasks(0))
+assert(sched.speculativeTasks.toSet === Set(0, 1))
+// Ensure that the speculatable tasks obey the original locality 
preferences
+assert(manager.resourceOffer("exec4", "host4", NODE_LOCAL).isEmpty)
+assert(manager.resourceOffer("exec2", "host2", NODE_LOCAL).isEmpty)
+assert(manager.resourceOffer("exec3", "host3", NODE_LOCAL).isDefined)
+assert(manager.resourceOffer("exec4", "host4", ANY).isDefined)
 
 Review comment:
   any particular reason to pull this out into a separate test case?  Seems 
like it could be combined.  Its fine if there is a good reason, but I don't 
like a proliferation of test cases that are all doing more or less the same 
thing.  It seems the only thing which you aren't doing here, but you are doing 
above, is checking the taskId etc. of the speculative tasks.
   
   also another thing missing from both tests -- there is no check that we do 
not schedule a speculative task on the same host as the original task, even 
despite locality preferences.
   
   (I realize some of these tests were missing before, but this logic is 
getting a little trickier now, and maybe those tests always should have been 
there)


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] squito commented on a change in pull request #23677: [SPARK-26755][SCHEDULER] : Optimize Spark Scheduler to dequeue speculative tasks…

2019-07-16 Thread GitBox

squito commented on a change in pull request #23677: [SPARK-26755][SCHEDULER] : 
Optimize Spark Scheduler to dequeue speculative tasks…
URL: https://github.com/apache/spark/pull/23677#discussion_r304136584
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala
 ##
 @@ -1054,6 +1043,19 @@ private[spark] object TaskSetManager {
   val TASK_SIZE_TO_WARN_KIB = 1000
 }
 
+// Set of pending tasks for various levels of locality: executor, host, rack,
+// noPrefs and anyPrefs. These collections are actually
+// treated as stacks, in which new tasks are added to the end of the
+// ArrayBuffer and removed from the end. This makes it faster to detect
+// tasks that repeatedly fail because whenever a task failed, it is put
+// back at the head of the stack. These collections may contain duplicates
+// for two reasons:
+// (1): Tasks are only removed lazily; when a task is launched, it remains
+// in all the pending lists except the one that it was launched from.
+// (2): Tasks may be re-added to these lists multiple times as a result
+// of failures.
+// Duplicates are handled in dequeueTaskFromList, which ensures that a
+// task hasn't already started running before launching it.
 
 Review comment:
   turn this into a scaladoc comment so its shows up in IDEs for 
PendingTasksByLocality
   ```
   /**
* Set of ...
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #25175: [SPARK-28411][PYTHON][SQL] InsertInto with overwrite is not honored

2019-07-16 Thread GitBox

AmplabJenkins removed a comment on issue #25175: [SPARK-28411][PYTHON][SQL] 
InsertInto with overwrite is not honored
URL: https://github.com/apache/spark/pull/25175#issuecomment-512005480
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] nvander1 commented on issue #24232: [SPARK-27297] [SQL] Add higher order functions to scala API

2019-07-16 Thread GitBox

nvander1 commented on issue #24232: [SPARK-27297] [SQL] Add higher order 
functions to scala API
URL: https://github.com/apache/spark/pull/24232#issuecomment-512005589
 
 
   I had only launched the ./build/sbt and run compile in interactive mode 
without any flags to the build. Thanks for helping me find the build config 
@HyukjinKwon . I'll test locally with that and report back later.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #25175: [SPARK-28411][PYTHON][SQL] InsertInto with overwrite is not honored

2019-07-16 Thread GitBox

AmplabJenkins commented on issue #25175: [SPARK-28411][PYTHON][SQL] InsertInto 
with overwrite is not honored
URL: https://github.com/apache/spark/pull/25175#issuecomment-512005484
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107763/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #25176: [SPARK-28417][Spark Core] Wrap File Glob Resolution in DoAs to use ProxyUser Credentials

2019-07-16 Thread GitBox

AmplabJenkins commented on issue #25176: [SPARK-28417][Spark Core] Wrap File 
Glob Resolution in DoAs to use ProxyUser Credentials
URL: https://github.com/apache/spark/pull/25176#issuecomment-512005604
 
 
   Can one of the admins verify this patch?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #25175: [SPARK-28411][PYTHON][SQL] InsertInto with overwrite is not honored

2019-07-16 Thread GitBox

AmplabJenkins commented on issue #25175: [SPARK-28411][PYTHON][SQL] InsertInto 
with overwrite is not honored
URL: https://github.com/apache/spark/pull/25175#issuecomment-512005480
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #25175: [SPARK-28411][PYTHON][SQL] InsertInto with overwrite is not honored

2019-07-16 Thread GitBox

AmplabJenkins removed a comment on issue #25175: [SPARK-28411][PYTHON][SQL] 
InsertInto with overwrite is not honored
URL: https://github.com/apache/spark/pull/25175#issuecomment-512005484
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107763/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on issue #25175: [SPARK-28411][PYTHON][SQL] InsertInto with overwrite is not honored

2019-07-16 Thread GitBox

SparkQA removed a comment on issue #25175: [SPARK-28411][PYTHON][SQL] 
InsertInto with overwrite is not honored
URL: https://github.com/apache/spark/pull/25175#issuecomment-511996567
 
 
   **[Test build #107763 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107763/testReport)**
 for PR 25175 at commit 
[`566fc84`](https://github.com/apache/spark/commit/566fc84ab87470dfac1103854bc550b33d593bdd).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #25176: [SPARK-28417][Spark Core] Wrap File Glob Resolution in DoAs to use ProxyUser Credentials

2019-07-16 Thread GitBox

AmplabJenkins commented on issue #25176: [SPARK-28417][Spark Core] Wrap File 
Glob Resolution in DoAs to use ProxyUser Credentials
URL: https://github.com/apache/spark/pull/25176#issuecomment-512005450
 
 
   Can one of the admins verify this patch?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #25175: [SPARK-28411][PYTHON][SQL] InsertInto with overwrite is not honored

2019-07-16 Thread GitBox

SparkQA commented on issue #25175: [SPARK-28411][PYTHON][SQL] InsertInto with 
overwrite is not honored
URL: https://github.com/apache/spark/pull/25175#issuecomment-512005183
 
 
   **[Test build #107763 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107763/testReport)**
 for PR 25175 at commit 
[`566fc84`](https://github.com/apache/spark/commit/566fc84ab87470dfac1103854bc550b33d593bdd).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] modi95 opened a new pull request #25176: [SPARK-28417][Spark Core] Wrap File Glob Resolution in DoAs to use ProxyUser Credentials

2019-07-16 Thread GitBox

modi95 opened a new pull request #25176: [SPARK-28417][Spark Core] Wrap File 
Glob Resolution in DoAs to use ProxyUser Credentials
URL: https://github.com/apache/spark/pull/25176
 
 
   ## What changes were proposed in this pull request?
   
   This PR addresses SPARK-28417 by wrapping the Glob file path resolution in a 
DoAs so that ProxyUser credentials are used for the glob resolution.
   
   ## How was this patch tested?
   
   This patch was tested using the unit tests and via manual testing. This was 
a bug encountered by Spark users at Uber. The bug was patched and is currently 
being used in prod at Uber. 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #24817: [SPARK-27963][core] Allow dynamic allocation without a shuffle service.

2019-07-16 Thread GitBox

AmplabJenkins removed a comment on issue #24817: [SPARK-27963][core] Allow 
dynamic allocation without a shuffle service.
URL: https://github.com/apache/spark/pull/24817#issuecomment-512004788
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #24817: [SPARK-27963][core] Allow dynamic allocation without a shuffle service.

2019-07-16 Thread GitBox

AmplabJenkins removed a comment on issue #24817: [SPARK-27963][core] Allow 
dynamic allocation without a shuffle service.
URL: https://github.com/apache/spark/pull/24817#issuecomment-512004794
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107757/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on issue #24817: [SPARK-27963][core] Allow dynamic allocation without a shuffle service.

2019-07-16 Thread GitBox

SparkQA removed a comment on issue #24817: [SPARK-27963][core] Allow dynamic 
allocation without a shuffle service.
URL: https://github.com/apache/spark/pull/24817#issuecomment-511961467
 
 
   **[Test build #107757 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107757/testReport)**
 for PR 24817 at commit 
[`6154bf4`](https://github.com/apache/spark/commit/6154bf486e68dbb5a4c16dc9e71030cc20d8ca58).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #24817: [SPARK-27963][core] Allow dynamic allocation without a shuffle service.

2019-07-16 Thread GitBox

AmplabJenkins commented on issue #24817: [SPARK-27963][core] Allow dynamic 
allocation without a shuffle service.
URL: https://github.com/apache/spark/pull/24817#issuecomment-512004794
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107757/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #24817: [SPARK-27963][core] Allow dynamic allocation without a shuffle service.

2019-07-16 Thread GitBox

AmplabJenkins commented on issue #24817: [SPARK-27963][core] Allow dynamic 
allocation without a shuffle service.
URL: https://github.com/apache/spark/pull/24817#issuecomment-512004788
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #24817: [SPARK-27963][core] Allow dynamic allocation without a shuffle service.

2019-07-16 Thread GitBox

SparkQA commented on issue #24817: [SPARK-27963][core] Allow dynamic allocation 
without a shuffle service.
URL: https://github.com/apache/spark/pull/24817#issuecomment-512004162
 
 
   **[Test build #107757 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107757/testReport)**
 for PR 24817 at commit 
[`6154bf4`](https://github.com/apache/spark/commit/6154bf486e68dbb5a4c16dc9e71030cc20d8ca58).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] pgandhi999 commented on issue #23677: [SPARK-26755][SCHEDULER] : Optimize Spark Scheduler to dequeue speculative tasks…

2019-07-16 Thread GitBox

pgandhi999 commented on issue #23677: [SPARK-26755][SCHEDULER] : Optimize Spark 
Scheduler to dequeue speculative tasks…
URL: https://github.com/apache/spark/pull/23677#issuecomment-512001628
 
 
   @squito @Ngone51 Have updated the PR with the changes. Thank you once again 
for your valuable reviews.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] pgandhi999 commented on a change in pull request #23677: [SPARK-26755][SCHEDULER] : Optimize Spark Scheduler to dequeue speculative tasks…

2019-07-16 Thread GitBox

pgandhi999 commented on a change in pull request #23677: 
[SPARK-26755][SCHEDULER] : Optimize Spark Scheduler to dequeue speculative 
tasks…
URL: https://github.com/apache/spark/pull/23677#discussion_r304135359
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala
 ##
 @@ -143,25 +144,18 @@ private[spark] class TaskSetManager(
   // of failures.
   // Duplicates are handled in dequeueTaskFromList, which ensures that a
   // task hasn't already started running before launching it.
-  private val pendingTasksForExecutor = new HashMap[String, ArrayBuffer[Int]]
 
-  // Set of pending tasks for each host. Similar to pendingTasksForExecutor,
-  // but at host level.
-  private val pendingTasksForHost = new HashMap[String, ArrayBuffer[Int]]
-
-  // Set of pending tasks for each rack -- similar to the above.
-  private val pendingTasksForRack = new HashMap[String, ArrayBuffer[Int]]
-
-  // Set containing pending tasks with no locality preferences.
-  private[scheduler] var pendingTasksWithNoPrefs = new ArrayBuffer[Int]
-
-  // Set containing all pending tasks (also used as a stack, as above).
-  private val allPendingTasks = new ArrayBuffer[Int]
+  private[scheduler] val pendingTasks = new PendingTasksByLocality()
 
   // Tasks that can be speculated. Since these will be a small fraction of 
total
-  // tasks, we'll just hold them in a HashSet.
+  // tasks, we'll just hold them in a HashSet. The HashSet here ensures that 
we do not add
+  // duplicate speculative tasks.
   private[scheduler] val speculatableTasks = new HashSet[Int]
 
 Review comment:
   So I ran a Join query on a dataset of size 10TB without this change and out 
of 10 tasks for the ShuffleMapStage, the maximum number of speculatable 
tasks that was noted was close to 7900-8000 at a point. That is when we start 
seeing the bottleneck on the scheduler lock.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] wypoon commented on issue #23767: [SPARK-26329][CORE] Faster polling of executor memory metrics.

2019-07-16 Thread GitBox

wypoon commented on issue #23767: [SPARK-26329][CORE] Faster polling of 
executor memory metrics.
URL: https://github.com/apache/spark/pull/23767#issuecomment-512000423
 
 
   @attilapiros regarding type aliases: I think the injunction against them 
applies to writing Java compatible APIs, which does not apply in this case. 
ExecutorMetricsPoller is an internal implementation detail.
   I agree that TCMP is cryptic, but I was hoping that the preceding comment 
makes clear that it stands for "Task Count and Metric Peaks".


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] squito commented on a change in pull request #23767: [SPARK-26329][CORE] Faster polling of executor memory metrics.

2019-07-16 Thread GitBox

squito commented on a change in pull request #23767: [SPARK-26329][CORE] Faster 
polling of executor memory metrics.
URL: https://github.com/apache/spark/pull/23767#discussion_r304098114
 
 

 ##
 File path: 
core/src/main/scala/org/apache/spark/executor/ExecutorMetricsPoller.scala
 ##
 @@ -0,0 +1,196 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.executor
+
+import java.lang.Long.{MAX_VALUE => LONG_MAX_VALUE}
+import java.util.concurrent.{ConcurrentHashMap, TimeUnit}
+import java.util.concurrent.atomic.{AtomicLong, AtomicLongArray}
+
+import scala.collection.mutable.HashMap
+
+import org.apache.spark.annotation.DeveloperApi
+import org.apache.spark.internal.Logging
+import org.apache.spark.memory.MemoryManager
+import org.apache.spark.metrics.ExecutorMetricType
+import org.apache.spark.util.{ThreadUtils, Utils}
+
+/**
+ * :: DeveloperApi ::
 
 Review comment:
   why is this a developerapi?  I think it should just be internal


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] squito commented on a change in pull request #23767: [SPARK-26329][CORE] Faster polling of executor memory metrics.

2019-07-16 Thread GitBox

squito commented on a change in pull request #23767: [SPARK-26329][CORE] Faster 
polling of executor memory metrics.
URL: https://github.com/apache/spark/pull/23767#discussion_r304113971
 
 

 ##
 File path: 
core/src/main/scala/org/apache/spark/executor/ExecutorMetricsPoller.scala
 ##
 @@ -0,0 +1,199 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.executor
+
+import java.lang.Long.{MAX_VALUE => LONG_MAX_VALUE}
+import java.util.concurrent.{ConcurrentHashMap, TimeUnit}
+import java.util.concurrent.atomic.{AtomicLong, AtomicLongArray}
+
+import scala.collection.mutable.HashMap
+
+import org.apache.spark.annotation.DeveloperApi
+import org.apache.spark.internal.Logging
+import org.apache.spark.memory.MemoryManager
+import org.apache.spark.metrics.ExecutorMetricType
+import org.apache.spark.util.{ThreadUtils, Utils}
+
+/**
+ * :: DeveloperApi ::
+ * A class that polls executor metrics, and tracks their peaks per task and 
per stage.
+ * Each executor keeps an instance of this class.
+ * The poll method polls the executor metrics, and is either run in its own 
thread or
+ * called by the executor's heartbeater thread, depending on configuration.
+ * The class keeps two ConcurrentHashMaps that are accessed (via its methods) 
by the
+ * executor's task runner threads concurrently with the polling thread. One 
thread may
+ * update one of these maps while another reads it, so the reading thread may 
not get
+ * the latest metrics, but this is ok.
+ *
+ * @param memoryManager the memory manager used by the executor.
+ * @param pollingInterval the polling interval in milliseconds.
+ */
+@DeveloperApi
+private[spark] class ExecutorMetricsPoller(
+memoryManager: MemoryManager,
+pollingInterval: Long)
+  extends Logging {
+
+  type StageKey = (Int, Int)
+  // tuple for Task Count and Metric Peaks
+  type TCMP = (AtomicLong, AtomicLongArray)
 
 Review comment:
   I'm not sure I see the value in the extra util methods, but I agree about 
having a small case class -- if nothing else, it would make the code a lot 
cleared if instead of `v._2` it was `taskCountAndMetrics.metrics`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] squito commented on a change in pull request #23767: [SPARK-26329][CORE] Faster polling of executor memory metrics.

2019-07-16 Thread GitBox

squito commented on a change in pull request #23767: [SPARK-26329][CORE] Faster 
polling of executor memory metrics.
URL: https://github.com/apache/spark/pull/23767#discussion_r304123225
 
 

 ##
 File path: core/src/test/scala/org/apache/spark/executor/ExecutorSuite.scala
 ##
 @@ -342,6 +320,87 @@ class ExecutorSuite extends SparkFunSuite
 }
   }
 
+  test("Send task executor metrics in DirectTaskResult") {
+// Run a successful, trivial result task
+// We need to ensure, however, that executor metrics are polled after the 
task is started
+// so this requires some coordination using ExecutorSuiteHelper.
+val conf = new SparkConf().setMaster("local").setAppName("executor suite 
test")
+sc = new SparkContext(conf)
+val serializer = SparkEnv.get.closureSerializer.newInstance()
+ExecutorSuiteHelper.latches = new ExecutorSuiteHelper
+val resultFunc =
+  (context: TaskContext, itr: Iterator[Int]) => {
+ExecutorSuiteHelper.latches.latch1.await(300, TimeUnit.MILLISECONDS)
+ExecutorSuiteHelper.latches.latch2.countDown()
+ExecutorSuiteHelper.latches.latch3.await(500, TimeUnit.MILLISECONDS)
 
 Review comment:
   would help to have a short comment about the purpose of these latches.  Eg. 
"latch2 tells the metricsPoller to poll and send results, and latch3 waits till 
the poller has sent its results"
   
   (I don't see the point of latch1, so you'd probably switch to just using 
latch1 & latch2)


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] squito commented on a change in pull request #23767: [SPARK-26329][CORE] Faster polling of executor memory metrics.

2019-07-16 Thread GitBox

squito commented on a change in pull request #23767: [SPARK-26329][CORE] Faster 
polling of executor memory metrics.
URL: https://github.com/apache/spark/pull/23767#discussion_r304127412
 
 

 ##
 File path: core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala
 ##
 @@ -600,8 +611,16 @@ private[spark] object JsonProtocolSuite extends 
Assertions {
 assert(stageAttemptId1 === stageAttemptId2)
 assertSeqEquals[AccumulableInfo](updates1, updates2, (a, b) => 
a.equals(b))
   })
-assertOptionEquals(e1.executorUpdates, e2.executorUpdates,
-(e1: ExecutorMetrics, e2: ExecutorMetrics) => assertEquals(e1, e2))
+assertSeqEquals[((Int, Int), ExecutorMetrics)](
+  e1.executorUpdates.toSeq.sortWith((x, y) => lexOrder(x._1, y._1)),
+  e2.executorUpdates.toSeq.sortWith((x, y) => lexOrder(x._1, y._1)),
+  (a, b) => {
+val (k1, v1) = a
+val (k2, v2) = b
+assert(k1 === k2)
+assertEquals(v1, v2)
+  }
 
 Review comment:
   can use pattern matching earlier to clean up a bit more:
   
   ```scala
 { case ((k1, v1), (k2, v2)) =>
   assert(k1 === k2)
   assertEquals(v1, v2)
 }
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] squito commented on a change in pull request #23767: [SPARK-26329][CORE] Faster polling of executor memory metrics.

2019-07-16 Thread GitBox

squito commented on a change in pull request #23767: [SPARK-26329][CORE] Faster 
polling of executor memory metrics.
URL: https://github.com/apache/spark/pull/23767#discussion_r304098528
 
 

 ##
 File path: 
core/src/main/scala/org/apache/spark/executor/ExecutorMetricsPoller.scala
 ##
 @@ -0,0 +1,196 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.executor
+
+import java.lang.Long.{MAX_VALUE => LONG_MAX_VALUE}
+import java.util.concurrent.{ConcurrentHashMap, TimeUnit}
+import java.util.concurrent.atomic.{AtomicLong, AtomicLongArray}
+
+import scala.collection.mutable.HashMap
+
+import org.apache.spark.annotation.DeveloperApi
+import org.apache.spark.internal.Logging
+import org.apache.spark.memory.MemoryManager
+import org.apache.spark.metrics.ExecutorMetricType
+import org.apache.spark.util.{ThreadUtils, Utils}
+
+/**
+ * :: DeveloperApi ::
+ * A class that polls executor metrics, and tracks their peaks per task and 
per stage.
+ * Each executor keeps an instance of this class.
+ * The poll method polls the executor metrics, and is either run in its own 
thread or
+ * called by the executor's heartbeater thread, depending on configuration.
+ * The class keeps two ConcurrentHashMaps that are accessed (via its methods) 
by the
+ * executor's task runner threads concurrently with the polling thread. One 
thread may
+ * update one of these maps while another reads it, so the reading thread may 
not get
+ * the latest metrics, but this is ok.
+ *
+ * @param memoryManager the memory manager used by the executor.
+ * @param pollingInterval the polling interval in milliseconds.
+ */
+@DeveloperApi
+private[spark] class ExecutorMetricsPoller(
+memoryManager: MemoryManager,
+pollingInterval: Long)
+  extends Logging {
 
 Review comment:
   nit: move this up to the line above
   
   ```scala
   pollingInterval: Long) extends Logging {
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] squito commented on a change in pull request #23767: [SPARK-26329][CORE] Faster polling of executor memory metrics.

2019-07-16 Thread GitBox

squito commented on a change in pull request #23767: [SPARK-26329][CORE] Faster 
polling of executor memory metrics.
URL: https://github.com/apache/spark/pull/23767#discussion_r304125444
 
 

 ##
 File path: core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala
 ##
 @@ -600,8 +611,16 @@ private[spark] object JsonProtocolSuite extends 
Assertions {
 assert(stageAttemptId1 === stageAttemptId2)
 assertSeqEquals[AccumulableInfo](updates1, updates2, (a, b) => 
a.equals(b))
   })
-assertOptionEquals(e1.executorUpdates, e2.executorUpdates,
-(e1: ExecutorMetrics, e2: ExecutorMetrics) => assertEquals(e1, e2))
+assertSeqEquals[((Int, Int), ExecutorMetrics)](
+  e1.executorUpdates.toSeq.sortWith((x, y) => lexOrder(x._1, y._1)),
+  e2.executorUpdates.toSeq.sortWith((x, y) => lexOrder(x._1, y._1)),
 
 Review comment:
   scala provides a default ordering on tuples, which is the same as your 
`lexOrder`, so I think you can get rid of that and this becomes  `.sortBy(_._1)`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] squito commented on a change in pull request #23767: [SPARK-26329][CORE] Faster polling of executor memory metrics.

2019-07-16 Thread GitBox

squito commented on a change in pull request #23767: [SPARK-26329][CORE] Faster 
polling of executor memory metrics.
URL: https://github.com/apache/spark/pull/23767#discussion_r304122257
 
 

 ##
 File path: core/src/test/scala/org/apache/spark/executor/ExecutorSuite.scala
 ##
 @@ -342,6 +320,87 @@ class ExecutorSuite extends SparkFunSuite
 }
   }
 
+  test("Send task executor metrics in DirectTaskResult") {
+// Run a successful, trivial result task
+// We need to ensure, however, that executor metrics are polled after the 
task is started
+// so this requires some coordination using ExecutorSuiteHelper.
+val conf = new SparkConf().setMaster("local").setAppName("executor suite 
test")
+sc = new SparkContext(conf)
+val serializer = SparkEnv.get.closureSerializer.newInstance()
+ExecutorSuiteHelper.latches = new ExecutorSuiteHelper
+val resultFunc =
+  (context: TaskContext, itr: Iterator[Int]) => {
+ExecutorSuiteHelper.latches.latch1.await(300, TimeUnit.MILLISECONDS)
+ExecutorSuiteHelper.latches.latch2.countDown()
+ExecutorSuiteHelper.latches.latch3.await(500, TimeUnit.MILLISECONDS)
+itr.size
+  }
+val rdd = new RDD[Int](sc, Nil) {
+  override def compute(split: Partition, context: TaskContext): 
Iterator[Int] = {
+val l = List(1)
+l.iterator
 
 Review comment:
   can just be `Iterator(1)`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] squito commented on a change in pull request #23767: [SPARK-26329][CORE] Faster polling of executor memory metrics.

2019-07-16 Thread GitBox

squito commented on a change in pull request #23767: [SPARK-26329][CORE] Faster 
polling of executor memory metrics.
URL: https://github.com/apache/spark/pull/23767#discussion_r304109127
 
 

 ##
 File path: 
core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala
 ##
 @@ -268,12 +278,16 @@ private[spark] class EventLoggingListener(
 
   override def onExecutorMetricsUpdate(event: 
SparkListenerExecutorMetricsUpdate): Unit = {
 if (shouldLogStageExecutorMetrics) {
-  // For the active stages, record any new peak values for the memory 
metrics for the executor
-  event.executorUpdates.foreach { executorUpdates =>
-liveStageExecutorMetrics.values.foreach { peakExecutorMetrics =>
-  val peakMetrics = peakExecutorMetrics.getOrElseUpdate(
-event.execId, new ExecutorMetrics())
-  peakMetrics.compareAndUpdatePeakValues(executorUpdates)
+  event.executorUpdates.foreach { case (stageKey1, peaks) =>
+liveStageExecutorMetrics.foreach { case (stageKey2, 
metricsPerExecutor) =>
+  // If the update came from the driver, stageKey1 will be the dummy 
key (-1, -1),
+  // so record those peaks for all active stages.
+  // Otherwise, record the peaks for the matching stage.
+  if (stageKey1 == DRIVER_STAGE_KEY || stageKey1 == stageKey2) {
+val metrics = metricsPerExecutor.getOrElseUpdate(
+  event.execId, new ExecutorMetrics())
+metrics.compareAndUpdatePeakValues(peaks)
 
 Review comment:
   minor, can you rename `peaks` -> `newPeaks` ?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] squito commented on a change in pull request #23767: [SPARK-26329][CORE] Faster polling of executor memory metrics.

2019-07-16 Thread GitBox

squito commented on a change in pull request #23767: [SPARK-26329][CORE] Faster 
polling of executor memory metrics.
URL: https://github.com/apache/spark/pull/23767#discussion_r304100902
 
 

 ##
 File path: 
core/src/main/scala/org/apache/spark/executor/ExecutorMetricsPoller.scala
 ##
 @@ -0,0 +1,196 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.executor
+
+import java.lang.Long.{MAX_VALUE => LONG_MAX_VALUE}
+import java.util.concurrent.{ConcurrentHashMap, TimeUnit}
+import java.util.concurrent.atomic.{AtomicLong, AtomicLongArray}
+
+import scala.collection.mutable.HashMap
+
+import org.apache.spark.annotation.DeveloperApi
+import org.apache.spark.internal.Logging
+import org.apache.spark.memory.MemoryManager
+import org.apache.spark.metrics.ExecutorMetricType
+import org.apache.spark.util.{ThreadUtils, Utils}
+
+/**
+ * :: DeveloperApi ::
+ * A class that polls executor metrics, and tracks their peaks per task and 
per stage.
+ * Each executor keeps an instance of this class.
+ * The poll method polls the executor metrics, and is either run in its own 
thread or
+ * called by the executor's heartbeater thread, depending on configuration.
+ * The class keeps two ConcurrentHashMaps that are accessed (via its methods) 
by the
+ * executor's task runner threads concurrently with the polling thread. One 
thread may
+ * update one of these maps while another reads it, so the reading thread may 
not get
+ * the latest metrics, but this is ok.
+ *
+ * @param memoryManager the memory manager used by the executor.
+ * @param pollingInterval the polling interval in milliseconds.
+ */
+@DeveloperApi
+private[spark] class ExecutorMetricsPoller(
+memoryManager: MemoryManager,
+pollingInterval: Long)
+  extends Logging {
+
+  type StageKey = (Int, Int)
+  // tuple for Task Count and Metric Peaks
+  type TCMP = (AtomicLong, AtomicLongArray)
+
+  // Map of (stageId, stageAttemptId) to (count of running tasks, executor 
metric peaks)
+  private val stageTCMP = new ConcurrentHashMap[StageKey, TCMP]
+
+  // Map of taskId to executor metric peaks
+  private val taskMetricPeaks = new ConcurrentHashMap[Long, AtomicLongArray]
+
+  private val poller =
+if (pollingInterval > 0) {
+  
Some(ThreadUtils.newDaemonSingleThreadScheduledExecutor("executor-metrics-poller"))
+} else {
+  None
+}
+
+  /**
+   * Function to poll executor metrics.
+   * On start, if pollingInterval is positive, this is scheduled to run at 
that interval.
+   * Otherwise, this is called by the reportHeartBeat function defined in 
Executor and passed
+   * to its Heartbeater.
+   */
+  def poll(): Unit = {
+// Note: Task runner threads may update stageTCMP or read from 
taskMetricPeaks concurrently
+// with this function via calls to methods of this class.
+
+// get the latest values for the metrics
+val latestMetrics = ExecutorMetrics.getCurrentMetrics(memoryManager)
+
+def updatePeaks(metrics: AtomicLongArray): Unit = {
+  (0 until metrics.length).foreach { i =>
+metrics.getAndAccumulate(i, latestMetrics(i), math.max)
+  }
+}
+
+// for each active stage, update the peaks
+stageTCMP.forEachValue(LONG_MAX_VALUE, v => updatePeaks(v._2))
+
+// for each running task, update the peaks
+taskMetricPeaks.forEachValue(LONG_MAX_VALUE, updatePeaks)
+  }
+
+  /** Starts the polling thread. */
+  def start(): Unit = {
+poller.foreach { exec =>
+  val pollingTask: Runnable = () => Utils.logUncaughtExceptions(poll())
+  exec.scheduleAtFixedRate(pollingTask, 0L, pollingInterval, 
TimeUnit.MILLISECONDS)
+}
+  }
+
+  /**
+   * Called by TaskRunner#run.
+   *
+   * @param taskId the id of the task being run.
+   * @param stageId the id of the stage the task belongs to.
+   * @param stageAttemptId the attempt number of the stage the task belongs to.
+   */
+  def onTaskStart(taskId: Long, stageId: Int, stageAttemptId: Int): Unit = {
+// Put an entry in taskMetricPeaks for the task.
+taskMetricPeaks.put(taskId, new 
AtomicLongArray(ExecutorMetricType.numMetrics))
+
+// Put a new entry in stageTCMP for the stage if there isn't one already.
+// Increment the task

[GitHub] [spark] squito commented on a change in pull request #23767: [SPARK-26329][CORE] Faster polling of executor memory metrics.

2019-07-16 Thread GitBox

squito commented on a change in pull request #23767: [SPARK-26329][CORE] Faster 
polling of executor memory metrics.
URL: https://github.com/apache/spark/pull/23767#discussion_r304118293
 
 

 ##
 File path: 
core/src/main/scala/org/apache/spark/executor/ExecutorMetricsPoller.scala
 ##
 @@ -0,0 +1,196 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.executor
+
+import java.lang.Long.{MAX_VALUE => LONG_MAX_VALUE}
+import java.util.concurrent.{ConcurrentHashMap, TimeUnit}
+import java.util.concurrent.atomic.{AtomicLong, AtomicLongArray}
+
+import scala.collection.mutable.HashMap
+
+import org.apache.spark.annotation.DeveloperApi
+import org.apache.spark.internal.Logging
+import org.apache.spark.memory.MemoryManager
+import org.apache.spark.metrics.ExecutorMetricType
+import org.apache.spark.util.{ThreadUtils, Utils}
+
+/**
+ * :: DeveloperApi ::
+ * A class that polls executor metrics, and tracks their peaks per task and 
per stage.
+ * Each executor keeps an instance of this class.
+ * The poll method polls the executor metrics, and is either run in its own 
thread or
+ * called by the executor's heartbeater thread, depending on configuration.
+ * The class keeps two ConcurrentHashMaps that are accessed (via its methods) 
by the
+ * executor's task runner threads concurrently with the polling thread. One 
thread may
+ * update one of these maps while another reads it, so the reading thread may 
not get
+ * the latest metrics, but this is ok.
+ *
+ * @param memoryManager the memory manager used by the executor.
 
 Review comment:
   We should also explain why there are metrics tracked per stage and also per 
task.  I think mostly just your comment here: 
https://issues.apache.org/jira/browse/SPARK-26329?focusedCommentId=16745313=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16745313


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] squito commented on a change in pull request #23767: [SPARK-26329][CORE] Faster polling of executor memory metrics.

2019-07-16 Thread GitBox

squito commented on a change in pull request #23767: [SPARK-26329][CORE] Faster 
polling of executor memory metrics.
URL: https://github.com/apache/spark/pull/23767#discussion_r304123384
 
 

 ##
 File path: core/src/test/scala/org/apache/spark/executor/ExecutorSuite.scala
 ##
 @@ -342,6 +320,87 @@ class ExecutorSuite extends SparkFunSuite
 }
   }
 
+  test("Send task executor metrics in DirectTaskResult") {
+// Run a successful, trivial result task
+// We need to ensure, however, that executor metrics are polled after the 
task is started
+// so this requires some coordination using ExecutorSuiteHelper.
+val conf = new SparkConf().setMaster("local").setAppName("executor suite 
test")
+sc = new SparkContext(conf)
+val serializer = SparkEnv.get.closureSerializer.newInstance()
+ExecutorSuiteHelper.latches = new ExecutorSuiteHelper
+val resultFunc =
+  (context: TaskContext, itr: Iterator[Int]) => {
+ExecutorSuiteHelper.latches.latch1.await(300, TimeUnit.MILLISECONDS)
+ExecutorSuiteHelper.latches.latch2.countDown()
+ExecutorSuiteHelper.latches.latch3.await(500, TimeUnit.MILLISECONDS)
+itr.size
+  }
+val rdd = new RDD[Int](sc, Nil) {
+  override def compute(split: Partition, context: TaskContext): 
Iterator[Int] = {
+val l = List(1)
+l.iterator
+  }
+  override protected def getPartitions: Array[Partition] = {
+Array(new SimplePartition)
+  }
+}
+val taskDescription = createResultTaskDescription(serializer, resultFunc, 
rdd, 0)
+
+val mockBackend = mock[ExecutorBackend]
+when(mockBackend.statusUpdate(any(), meq(TaskState.RUNNING), any()))
+  .thenAnswer(new Answer[Unit] {
+override def answer(invocationOnMock: InvocationOnMock): Unit = {
+  ExecutorSuiteHelper.latches.latch1.countDown()
+}
+  })
 
 Review comment:
   as I mentioned above, i don't see any point to having the call to 
`statusUpdate` set a latch.  the statusUpdate occurs in the same thread as the 
taskRunner, so you know its completed by the time `resultFunc` is called.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] pgandhi999 commented on a change in pull request #23677: [SPARK-26755][SCHEDULER] : Optimize Spark Scheduler to dequeue speculative tasks…

2019-07-16 Thread GitBox

pgandhi999 commented on a change in pull request #23677: 
[SPARK-26755][SCHEDULER] : Optimize Spark Scheduler to dequeue speculative 
tasks…
URL: https://github.com/apache/spark/pull/23677#discussion_r304133121
 
 

 ##
 File path: 
core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala
 ##
 @@ -1655,4 +1657,70 @@ class TaskSetManagerSuite extends SparkFunSuite with 
LocalSparkContext with Logg
 // get removed inside TaskSchedulerImpl later.
 assert(availableResources(GPU) sameElements Array("0", "1", "2", "3"))
   }
+
+  test("SPARK-26755 Ensure that a speculative task is submitted only once for 
execution") {
+sc = new SparkContext("local", "test")
+sched = new FakeTaskScheduler(sc, ("exec1", "host1"), ("exec2", "host2"))
+val taskSet = FakeTask.createTaskSet(4)
+// Set the speculation multiplier to be 0 so speculative tasks are 
launched immediately
+sc.conf.set(config.SPECULATION_MULTIPLIER, 0.0)
+sc.conf.set(config.SPECULATION_ENABLED, true)
+sc.conf.set(config.SPECULATION_QUANTILE, 0.5)
+val clock = new ManualClock()
+val manager = new TaskSetManager(sched, taskSet, MAX_TASK_FAILURES, clock 
= clock)
+val accumUpdatesByTask: Array[Seq[AccumulatorV2[_, _]]] = 
taskSet.tasks.map { task =>
+  task.metrics.internalAccums
+}
+// Offer resources for 4 tasks to start
+for ((k, v) <- List(
+  "exec1" -> "host1",
+  "exec1" -> "host1",
+  "exec2" -> "host2",
+  "exec2" -> "host2")) {
+  val taskOption = manager.resourceOffer(k, v, NO_PREF)
+  assert(taskOption.isDefined)
+  val task = taskOption.get
+  assert(task.executorId === k)
+}
+assert(sched.startedTasks.toSet === Set(0, 1, 2, 3))
+clock.advance(1)
+// Complete the first 2 tasks and leave the other 2 tasks in running
+for (id <- Set(0, 1)) {
+  manager.handleSuccessfulTask(id, createTaskResult(id, 
accumUpdatesByTask(id)))
+  assert(sched.endedTasks(id) === Success)
+}
+// checkSpeculatableTasks checks that the task runtime is greater than the 
threshold for
+// speculating. Since we use a threshold of 0 for speculation, tasks need 
to be running for
+// > 0ms, so advance the clock by 1ms here.
+clock.advance(1)
+assert(manager.checkSpeculatableTasks(0))
+assert(sched.speculativeTasks.toSet === Set(2, 3))
+assert(manager.copiesRunning(2) === 1)
+assert(manager.copiesRunning(3) === 1)
+
+// Offer resource to start the speculative attempt for the running task
 
 Review comment:
   Have written a separate unit test to account for the locality preference 
feature. Thank you.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] pgandhi999 commented on a change in pull request #23677: [SPARK-26755][SCHEDULER] : Optimize Spark Scheduler to dequeue speculative tasks…

2019-07-16 Thread GitBox

pgandhi999 commented on a change in pull request #23677: 
[SPARK-26755][SCHEDULER] : Optimize Spark Scheduler to dequeue speculative 
tasks…
URL: https://github.com/apache/spark/pull/23677#discussion_r304133210
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala
 ##
 @@ -1064,7 +979,8 @@ private[spark] class TaskSetManager(
 val info = taskInfos(tid)
 val index = info.index
 if (!successful(index) && copiesRunning(index) == 1 && 
info.timeRunning(time) > threshold &&
-  !speculatableTasks.contains(index)) {
+!speculatableTasks.contains(index)) {
+  addPendingTask(index, speculative = true)
   logInfo(
 "Marking task %d in stage %s (on %s) as speculatable because it 
ran more than %.0f ms"
   .format(index, taskSet.id, info.host, threshold))
 
 Review comment:
   Makes sense, have updated the comment.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] pgandhi999 commented on a change in pull request #23677: [SPARK-26755][SCHEDULER] : Optimize Spark Scheduler to dequeue speculative tasks…

2019-07-16 Thread GitBox

pgandhi999 commented on a change in pull request #23677: 
[SPARK-26755][SCHEDULER] : Optimize Spark Scheduler to dequeue speculative 
tasks…
URL: https://github.com/apache/spark/pull/23677#discussion_r304132725
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala
 ##
 @@ -234,63 +228,41 @@ private[spark] class TaskSetManager(
   /** Add a task to all the pending-task lists that it should be on. */
   private[spark] def addPendingTask(
   index: Int,
-  resolveRacks: Boolean = true): Unit = {
+  resolveRacks: Boolean = true,
+  speculative: Boolean = false): Unit = {
 
 Review comment:
   Done


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] pgandhi999 commented on a change in pull request #23677: [SPARK-26755][SCHEDULER] : Optimize Spark Scheduler to dequeue speculative tasks…

2019-07-16 Thread GitBox

pgandhi999 commented on a change in pull request #23677: 
[SPARK-26755][SCHEDULER] : Optimize Spark Scheduler to dequeue speculative 
tasks…
URL: https://github.com/apache/spark/pull/23677#discussion_r304132824
 
 

 ##
 File path: 
core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala
 ##
 @@ -1655,4 +1657,70 @@ class TaskSetManagerSuite extends SparkFunSuite with 
LocalSparkContext with Logg
 // get removed inside TaskSchedulerImpl later.
 assert(availableResources(GPU) sameElements Array("0", "1", "2", "3"))
   }
+
+  test("SPARK-26755 Ensure that a speculative task is submitted only once for 
execution") {
+sc = new SparkContext("local", "test")
+sched = new FakeTaskScheduler(sc, ("exec1", "host1"), ("exec2", "host2"))
+val taskSet = FakeTask.createTaskSet(4)
+// Set the speculation multiplier to be 0 so speculative tasks are 
launched immediately
+sc.conf.set(config.SPECULATION_MULTIPLIER, 0.0)
+sc.conf.set(config.SPECULATION_ENABLED, true)
+sc.conf.set(config.SPECULATION_QUANTILE, 0.5)
+val clock = new ManualClock()
+val manager = new TaskSetManager(sched, taskSet, MAX_TASK_FAILURES, clock 
= clock)
+val accumUpdatesByTask: Array[Seq[AccumulatorV2[_, _]]] = 
taskSet.tasks.map { task =>
+  task.metrics.internalAccums
+}
+// Offer resources for 4 tasks to start
+for ((k, v) <- List(
+  "exec1" -> "host1",
+  "exec1" -> "host1",
+  "exec2" -> "host2",
+  "exec2" -> "host2")) {
 
 Review comment:
   Done


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] pgandhi999 commented on a change in pull request #23677: [SPARK-26755][SCHEDULER] : Optimize Spark Scheduler to dequeue speculative tasks…

2019-07-16 Thread GitBox

pgandhi999 commented on a change in pull request #23677: 
[SPARK-26755][SCHEDULER] : Optimize Spark Scheduler to dequeue speculative 
tasks…
URL: https://github.com/apache/spark/pull/23677#discussion_r304132776
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala
 ##
 @@ -302,16 +274,22 @@ private[spark] class TaskSetManager(
   private def dequeueTaskFromList(
   execId: String,
   host: String,
-  list: ArrayBuffer[Int]): Option[Int] = {
+  list: ArrayBuffer[Int],
+  speculative: Boolean = false): Option[Int] = {
 var indexOffset = list.size
 while (indexOffset > 0) {
   indexOffset -= 1
   val index = list(indexOffset)
-  if (!isTaskBlacklistedOnExecOrNode(index, execId, host)) {
+  if (!isTaskBlacklistedOnExecOrNode(index, execId, host) &&
+!(speculative && hasAttemptOnHost(index, host))) {
 
 Review comment:
   Done


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] pgandhi999 commented on a change in pull request #23677: [SPARK-26755][SCHEDULER] : Optimize Spark Scheduler to dequeue speculative tasks…

2019-07-16 Thread GitBox

pgandhi999 commented on a change in pull request #23677: 
[SPARK-26755][SCHEDULER] : Optimize Spark Scheduler to dequeue speculative 
tasks…
URL: https://github.com/apache/spark/pull/23677#discussion_r304132690
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala
 ##
 @@ -143,25 +144,18 @@ private[spark] class TaskSetManager(
   // of failures.
   // Duplicates are handled in dequeueTaskFromList, which ensures that a
   // task hasn't already started running before launching it.
-  private val pendingTasksForExecutor = new HashMap[String, ArrayBuffer[Int]]
 
-  // Set of pending tasks for each host. Similar to pendingTasksForExecutor,
-  // but at host level.
-  private val pendingTasksForHost = new HashMap[String, ArrayBuffer[Int]]
-
-  // Set of pending tasks for each rack -- similar to the above.
-  private val pendingTasksForRack = new HashMap[String, ArrayBuffer[Int]]
-
-  // Set containing pending tasks with no locality preferences.
-  private[scheduler] var pendingTasksWithNoPrefs = new ArrayBuffer[Int]
-
-  // Set containing all pending tasks (also used as a stack, as above).
-  private val allPendingTasks = new ArrayBuffer[Int]
+  private[scheduler] val pendingTasks = new PendingTasksByLocality()
 
   // Tasks that can be speculated. Since these will be a small fraction of 
total
-  // tasks, we'll just hold them in a HashSet.
+  // tasks, we'll just hold them in a HashSet. The HashSet here ensures that 
we do not add
+  // duplicate speculative tasks.
 
 Review comment:
   Done


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] pgandhi999 commented on a change in pull request #23677: [SPARK-26755][SCHEDULER] : Optimize Spark Scheduler to dequeue speculative tasks…

2019-07-16 Thread GitBox

pgandhi999 commented on a change in pull request #23677: 
[SPARK-26755][SCHEDULER] : Optimize Spark Scheduler to dequeue speculative 
tasks…
URL: https://github.com/apache/spark/pull/23677#discussion_r304132653
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala
 ##
 @@ -143,25 +144,18 @@ private[spark] class TaskSetManager(
   // of failures.
   // Duplicates are handled in dequeueTaskFromList, which ensures that a
   // task hasn't already started running before launching it.
 
 Review comment:
   Done


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] pgandhi999 commented on a change in pull request #23677: [SPARK-26755][SCHEDULER] : Optimize Spark Scheduler to dequeue speculative tasks…

2019-07-16 Thread GitBox

pgandhi999 commented on a change in pull request #23677: 
[SPARK-26755][SCHEDULER] : Optimize Spark Scheduler to dequeue speculative 
tasks…
URL: https://github.com/apache/spark/pull/23677#discussion_r304132612
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala
 ##
 @@ -330,128 +308,65 @@ private[spark] class TaskSetManager(
 }
   }
 
-  /**
-   * Return a speculative task for a given executor if any are available. The 
task should not have
-   * an attempt running on this host, in case the host is slow. In addition, 
the task should meet
-   * the given locality constraint.
-   */
-  // Labeled as protected to allow tests to override providing speculative 
tasks if necessary
-  protected def dequeueSpeculativeTask(execId: String, host: String, locality: 
TaskLocality.Value)
-: Option[(Int, TaskLocality.Value)] =
-  {
-speculatableTasks.retain(index => !successful(index)) // Remove finished 
tasks from set
-
-def canRunOnHost(index: Int): Boolean = {
-  !hasAttemptOnHost(index, host) &&
-!isTaskBlacklistedOnExecOrNode(index, execId, host)
-}
-
-if (!speculatableTasks.isEmpty) {
-  // Check for process-local tasks; note that tasks can be process-local
-  // on multiple nodes when we replicate cached blocks, as in Spark 
Streaming
-  for (index <- speculatableTasks if canRunOnHost(index)) {
-val prefs = tasks(index).preferredLocations
-val executors = prefs.flatMap(_ match {
-  case e: ExecutorCacheTaskLocation => Some(e.executorId)
-  case _ => None
-})
-if (executors.contains(execId)) {
-  speculatableTasks -= index
-  return Some((index, TaskLocality.PROCESS_LOCAL))
-}
-  }
-
-  // Check for node-local tasks
-  if (TaskLocality.isAllowed(locality, TaskLocality.NODE_LOCAL)) {
-for (index <- speculatableTasks if canRunOnHost(index)) {
-  val locations = tasks(index).preferredLocations.map(_.host)
-  if (locations.contains(host)) {
-speculatableTasks -= index
-return Some((index, TaskLocality.NODE_LOCAL))
-  }
-}
-  }
-
-  // Check for no-preference tasks
-  if (TaskLocality.isAllowed(locality, TaskLocality.NO_PREF)) {
-for (index <- speculatableTasks if canRunOnHost(index)) {
-  val locations = tasks(index).preferredLocations
-  if (locations.size == 0) {
-speculatableTasks -= index
-return Some((index, TaskLocality.PROCESS_LOCAL))
-  }
-}
-  }
-
-  // Check for rack-local tasks
-  if (TaskLocality.isAllowed(locality, TaskLocality.RACK_LOCAL)) {
-for (rack <- sched.getRackForHost(host)) {
-  for (index <- speculatableTasks if canRunOnHost(index)) {
-val racks = 
tasks(index).preferredLocations.map(_.host).flatMap(sched.getRackForHost)
-if (racks.contains(rack)) {
-  speculatableTasks -= index
-  return Some((index, TaskLocality.RACK_LOCAL))
-}
-  }
-}
-  }
-
-  // Check for non-local tasks
-  if (TaskLocality.isAllowed(locality, TaskLocality.ANY)) {
-for (index <- speculatableTasks if canRunOnHost(index)) {
-  speculatableTasks -= index
-  return Some((index, TaskLocality.ANY))
-}
-  }
-}
-
-None
-  }
-
   /**
* Dequeue a pending task for a given node and return its index and locality 
level.
* Only search for tasks matching the given locality constraint.
*
* @return An option containing (task index within the task set, locality, 
is speculative?)
*/
   private def dequeueTask(execId: String, host: String, maxLocality: 
TaskLocality.Value)
-: Option[(Int, TaskLocality.Value, Boolean)] =
-  {
-for (index <- dequeueTaskFromList(execId, host, 
getPendingTasksForExecutor(execId))) {
-  return Some((index, TaskLocality.PROCESS_LOCAL, false))
+  : Option[(Int, TaskLocality.Value, Boolean)] = {
 
 Review comment:
   Done


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #25175: [SPARK-28411][PYTHON][SQL] InsertInto with overwrite is not honored

2019-07-16 Thread GitBox

AmplabJenkins removed a comment on issue #25175: [SPARK-28411][PYTHON][SQL] 
InsertInto with overwrite is not honored
URL: https://github.com/apache/spark/pull/25175#issuecomment-511995847
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12882/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #25175: [SPARK-28411][PYTHON][SQL] InsertInto with overwrite is not honored

2019-07-16 Thread GitBox

AmplabJenkins removed a comment on issue #25175: [SPARK-28411][PYTHON][SQL] 
InsertInto with overwrite is not honored
URL: https://github.com/apache/spark/pull/25175#issuecomment-511995842
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #25175: [SPARK-28411][PYTHON][SQL] InsertInto with overwrite is not honored

2019-07-16 Thread GitBox

SparkQA commented on issue #25175: [SPARK-28411][PYTHON][SQL] InsertInto with 
overwrite is not honored
URL: https://github.com/apache/spark/pull/25175#issuecomment-511996567
 
 
   **[Test build #107763 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107763/testReport)**
 for PR 25175 at commit 
[`566fc84`](https://github.com/apache/spark/commit/566fc84ab87470dfac1103854bc550b33d593bdd).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] brkyvz commented on issue #24798: [SPARK-27724][SQL] Implement REPLACE TABLE and REPLACE TABLE AS SELECT with V2

2019-07-16 Thread GitBox

brkyvz commented on issue #24798: [SPARK-27724][SQL] Implement REPLACE TABLE 
and REPLACE TABLE AS SELECT with V2
URL: https://github.com/apache/spark/pull/24798#issuecomment-511996105
 
 
   Approach and interface LGTM! +9000 on "keep[ing] commits smaller and more 
focused." in the future. Would really help speed up the development cycle.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #25175: [SPARK-28411][PYTHON][SQL] InsertInto with overwrite is not honored

2019-07-16 Thread GitBox

AmplabJenkins commented on issue #25175: [SPARK-28411][PYTHON][SQL] InsertInto 
with overwrite is not honored
URL: https://github.com/apache/spark/pull/25175#issuecomment-511995847
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12882/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #25175: [SPARK-28411][PYTHON][SQL] InsertInto with overwrite is not honored

2019-07-16 Thread GitBox

AmplabJenkins commented on issue #25175: [SPARK-28411][PYTHON][SQL] InsertInto 
with overwrite is not honored
URL: https://github.com/apache/spark/pull/25175#issuecomment-511995842
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] brkyvz commented on a change in pull request #24798: [SPARK-27724][SQL] Implement REPLACE TABLE and REPLACE TABLE AS SELECT with V2

2019-07-16 Thread GitBox

brkyvz commented on a change in pull request #24798: [SPARK-27724][SQL] 
Implement REPLACE TABLE and REPLACE TABLE AS SELECT with V2
URL: https://github.com/apache/spark/pull/24798#discussion_r304127699
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CreateTableExec.scala
 ##
 @@ -20,7 +20,7 @@ package org.apache.spark.sql.execution.datasources.v2
 import scala.collection.JavaConverters._
 
 import org.apache.spark.rdd.RDD
-import org.apache.spark.sql.catalog.v2.{Identifier, TableCatalog}
+import org.apache.spark.sql.catalog.v2.{Identifier, StagingTableCatalog, 
TableCatalog}
 
 Review comment:
   spurious change?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] brkyvz commented on a change in pull request #24798: [SPARK-27724][SQL] Implement REPLACE TABLE and REPLACE TABLE AS SELECT with V2

2019-07-16 Thread GitBox

brkyvz commented on a change in pull request #24798: [SPARK-27724][SQL] 
Implement REPLACE TABLE and REPLACE TABLE AS SELECT with V2
URL: https://github.com/apache/spark/pull/24798#discussion_r304127549
 
 

 ##
 File path: 
sql/core/src/test/scala/org/apache/spark/sql/sources/v2/TestInMemoryTableCatalog.scala
 ##
 @@ -196,7 +196,103 @@ private class InMemoryTable(
   }
 }
 
-private class BufferedRows extends WriterCommitMessage with InputPartition 
with Serializable {
+object TestInMemoryTableCatalog {
+  val SIMULATE_FAILED_WRITE_OPTION = "spark.sql.test.simulateFailedWrite"
+  val SIMULATE_FAILED_CREATE_PROPERTY = "spark.sql.test.simulateFailedCreate"
+
+  def maybeSimulateFailedTableCreation(tableProperties: util.Map[String, 
String]): Unit = {
+if 
(tableProperties.containsKey(TestInMemoryTableCatalog.SIMULATE_FAILED_CREATE_PROPERTY)
+  && 
tableProperties.get(TestInMemoryTableCatalog.SIMULATE_FAILED_CREATE_PROPERTY)
+  .equalsIgnoreCase("true")) {
+  throw new IllegalStateException("Manual create table failure.")
+}
+  }
+
+  def maybeSimulateFailedTableWrite(tableOptions: CaseInsensitiveStringMap): 
Unit = {
+if 
(tableOptions.containsKey(TestInMemoryTableCatalog.SIMULATE_FAILED_WRITE_OPTION)
+  && 
tableOptions.get(TestInMemoryTableCatalog.SIMULATE_FAILED_WRITE_OPTION)
+  .equalsIgnoreCase("true")) {
 
 Review comment:
   nit: can these long clauses be evaluated above before an if statement?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] mccheah commented on a change in pull request #24798: [SPARK-27724][SQL] Implement REPLACE TABLE and REPLACE TABLE AS SELECT with V2

2019-07-16 Thread GitBox

mccheah commented on a change in pull request #24798: [SPARK-27724][SQL] 
Implement REPLACE TABLE and REPLACE TABLE AS SELECT with V2
URL: https://github.com/apache/spark/pull/24798#discussion_r304127138
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Exec.scala
 ##
 @@ -78,7 +80,8 @@ case class CreateTableAsSelectExec(
 }
 
 Utils.tryWithSafeFinallyAndFailureCallbacks({
-  catalog.createTable(ident, query.schema, partitioning.toArray, 
properties.asJava) match {
+  catalog.createTable(
+ident, query.schema, partitioning.toArray, properties.asJava) match {
 
 Review comment:
   I'm not keen to adjust any assumptions from the previous implementation in 
this PR - that can be a follow-up question perhaps for the dev list or in 
another JIRA.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] mccheah commented on a change in pull request #24798: [SPARK-27724][SQL] Implement REPLACE TABLE and REPLACE TABLE AS SELECT with V2

2019-07-16 Thread GitBox

mccheah commented on a change in pull request #24798: [SPARK-27724][SQL] 
Implement REPLACE TABLE and REPLACE TABLE AS SELECT with V2
URL: https://github.com/apache/spark/pull/24798#discussion_r304127138
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Exec.scala
 ##
 @@ -78,7 +80,8 @@ case class CreateTableAsSelectExec(
 }
 
 Utils.tryWithSafeFinallyAndFailureCallbacks({
-  catalog.createTable(ident, query.schema, partitioning.toArray, 
properties.asJava) match {
+  catalog.createTable(
+ident, query.schema, partitioning.toArray, properties.asJava) match {
 
 Review comment:
   I'm not quite willing to adjust any assumptions from the previous 
implementation in this PR - that can be a follow-up question perhaps for the 
dev list or in another JIRA.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] huaxingao opened a new pull request #25175: [SPARK-28411][PYTHON][SQL] InsertInto with overwrite is not honored

2019-07-16 Thread GitBox

huaxingao opened a new pull request #25175: [SPARK-28411][PYTHON][SQL] 
InsertInto with overwrite is not honored
URL: https://github.com/apache/spark/pull/25175
 
 
   ## What changes were proposed in this pull request?
   In the following python code
   ```
   df.write.mode("overwrite").insertInto("table") 
   ```
   ```insertInto``` ignores ```mode("overwrite")```  and appends by default.
   
   
   ## How was this patch tested?
   
   Add Unit test.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #25174: [SPARK-27485][BRANCH-2.4] EnsureRequirements.reorder should handle duplicate expressions gracefully

2019-07-16 Thread GitBox

AmplabJenkins removed a comment on issue #25174: [SPARK-27485][BRANCH-2.4] 
EnsureRequirements.reorder should handle duplicate expressions gracefully
URL: https://github.com/apache/spark/pull/25174#issuecomment-511993132
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12881/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #25174: [SPARK-27485][BRANCH-2.4] EnsureRequirements.reorder should handle duplicate expressions gracefully

2019-07-16 Thread GitBox

AmplabJenkins removed a comment on issue #25174: [SPARK-27485][BRANCH-2.4] 
EnsureRequirements.reorder should handle duplicate expressions gracefully
URL: https://github.com/apache/spark/pull/25174#issuecomment-511993126
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] brkyvz commented on a change in pull request #24798: [SPARK-27724][SQL] Implement REPLACE TABLE and REPLACE TABLE AS SELECT with V2

2019-07-16 Thread GitBox

brkyvz commented on a change in pull request #24798: [SPARK-27724][SQL] 
Implement REPLACE TABLE and REPLACE TABLE AS SELECT with V2
URL: https://github.com/apache/spark/pull/24798#discussion_r304126475
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Exec.scala
 ##
 @@ -78,7 +80,8 @@ case class CreateTableAsSelectExec(
 }
 
 Utils.tryWithSafeFinallyAndFailureCallbacks({
-  catalog.createTable(ident, query.schema, partitioning.toArray, 
properties.asJava) match {
+  catalog.createTable(
+ident, query.schema, partitioning.toArray, properties.asJava) match {
 
 Review comment:
   should the schema be called with `asNullable`? I feel the nullability of the 
input data cannot be trusted (or just because a column is not-nullable for a 
query doesn't mean it can't be null in the future)


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #23677: [SPARK-26755][SCHEDULER] : Optimize Spark Scheduler to dequeue speculative tasks…

2019-07-16 Thread GitBox

SparkQA commented on issue #23677: [SPARK-26755][SCHEDULER] : Optimize Spark 
Scheduler to dequeue speculative tasks…
URL: https://github.com/apache/spark/pull/23677#issuecomment-511993929
 
 
   **[Test build #107762 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107762/testReport)**
 for PR 23677 at commit 
[`466849e`](https://github.com/apache/spark/commit/466849e3ebf8e39f994423947f21bcf390c26894).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #25174: [SPARK-27485][BRANCH-2.4] EnsureRequirements.reorder should handle duplicate expressions gracefully

2019-07-16 Thread GitBox

AmplabJenkins commented on issue #25174: [SPARK-27485][BRANCH-2.4] 
EnsureRequirements.reorder should handle duplicate expressions gracefully
URL: https://github.com/apache/spark/pull/25174#issuecomment-511993132
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12881/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #25174: [SPARK-27485][BRANCH-2.4] EnsureRequirements.reorder should handle duplicate expressions gracefully

2019-07-16 Thread GitBox

AmplabJenkins commented on issue #25174: [SPARK-27485][BRANCH-2.4] 
EnsureRequirements.reorder should handle duplicate expressions gracefully
URL: https://github.com/apache/spark/pull/25174#issuecomment-511993126
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #22560: [SPARK-25547][SQL] Pluggable JDBC connection factory

2019-07-16 Thread GitBox

SparkQA commented on issue #22560: [SPARK-25547][SQL] Pluggable JDBC connection 
factory
URL: https://github.com/apache/spark/pull/22560#issuecomment-511991052
 
 
   **[Test build #107761 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107761/testReport)**
 for PR 22560 at commit 
[`edd3245`](https://github.com/apache/spark/commit/edd3245ed5604dee2aad56c3ee0b8f99d76b48a0).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #25174: [SPARK-27485][BRANCH-2.4] EnsureRequirements.reorder should handle duplicate expressions gracefully

2019-07-16 Thread GitBox

SparkQA commented on issue #25174: [SPARK-27485][BRANCH-2.4] 
EnsureRequirements.reorder should handle duplicate expressions gracefully
URL: https://github.com/apache/spark/pull/25174#issuecomment-511991024
 
 
   **[Test build #107760 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107760/testReport)**
 for PR 25174 at commit 
[`c3ae8c8`](https://github.com/apache/spark/commit/c3ae8c82e345d31320a1b754985d685259375861).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] hvanhovell opened a new pull request #25174: [SPARK-27485][BRANCH-2.4] EnsureRequirements.reorder should handle duplicate expressions gracefully

2019-07-16 Thread GitBox

hvanhovell opened a new pull request #25174: [SPARK-27485][BRANCH-2.4] 
EnsureRequirements.reorder should handle duplicate expressions gracefully
URL: https://github.com/apache/spark/pull/25174
 
 
   Backport of 421d9d56efd447d31787e77316ce0eafb5fe45a5
   
   ## What changes were proposed in this pull request?
   When reordering joins EnsureRequirements only checks if all the join keys 
are present in the partitioning expression seq. This is problematic when the 
joins keys and and partitioning expressions both contain duplicates but not the 
same number of duplicates for each expression, e.g. `Seq(a, a, b)` vs `Seq(a, 
b, b)`. This fails with an index lookup failure in the `reorder` function.
   
   This PR fixes this removing the equality checking logic from the 
`reorderJoinKeys` function, and by doing the multiset equality in the `reorder` 
function while building the reordered key sequences.
   
   ## How was this patch tested?
   Added a unit test to the `PlannerSuite` and added an integration test to 
`JoinSuite`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] joshrosen-stripe commented on a change in pull request #25164: [SPARK-28375] Prevent the PullupCorrelatedPredicates optimizer rule from removing predicates if run multiple times

2019-07-16 Thread GitBox

joshrosen-stripe commented on a change in pull request #25164: [SPARK-28375] 
Prevent the PullupCorrelatedPredicates optimizer rule from removing predicates 
if run multiple times
URL: https://github.com/apache/spark/pull/25164#discussion_r304121011
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala
 ##
 @@ -275,13 +275,16 @@ object PullupCorrelatedPredicates extends 
Rule[LogicalPlan] with PredicateHelper
 plan transformExpressions {
   case ScalarSubquery(sub, children, exprId) if children.nonEmpty =>
 val (newPlan, newCond) = pullOutCorrelatedPredicates(sub, outerPlans)
-ScalarSubquery(newPlan, newCond, exprId)
+val conds = newCond ++ children.filter(_.isInstanceOf[Predicate])
 
 Review comment:
   @gatorsmile 
   
   > We want to ensure all these optimizer rules can be run multiple times. 
This is critical for adaptive query execution.
   
   Quick clarification question: does Spark 2.4.x perform double-optimization 
when adaptive execution is used? Or is that only a characteristic of the new / 
expanded adaptive execution features in Spark 3.x? I ask in order to help 
determine whether this patch should be a correctness backport for 2.4.x.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] srowen commented on a change in pull request #25134: [SPARK-28366][CORE] Logging in driver when loading single large unsplittable file via sc.textFile

2019-07-16 Thread GitBox

srowen commented on a change in pull request #25134: [SPARK-28366][CORE] 
Logging in driver when loading single large unsplittable file via sc.textFile
URL: https://github.com/apache/spark/pull/25134#discussion_r304120087
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/internal/config/package.scala
 ##
 @@ -1180,6 +1180,13 @@ package object config {
   .intConf
   .createWithDefault(1)
 
+  private[spark] val IO_FILE_UNSPLITTABLE_WARNING_THRESHOLD =
 
 Review comment:
   I just don't think this is worth another config


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #25111: [SPARK-28346][SQL] clone the query plan between analyzer, optimizer and planner

2019-07-16 Thread GitBox

AmplabJenkins removed a comment on issue #25111: [SPARK-28346][SQL] clone the 
query plan between analyzer, optimizer and planner
URL: https://github.com/apache/spark/pull/25111#issuecomment-511987351
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #25111: [SPARK-28346][SQL] clone the query plan between analyzer, optimizer and planner

2019-07-16 Thread GitBox

AmplabJenkins removed a comment on issue #25111: [SPARK-28346][SQL] clone the 
query plan between analyzer, optimizer and planner
URL: https://github.com/apache/spark/pull/25111#issuecomment-511987361
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107753/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #25111: [SPARK-28346][SQL] clone the query plan between analyzer, optimizer and planner

2019-07-16 Thread GitBox

AmplabJenkins commented on issue #25111: [SPARK-28346][SQL] clone the query 
plan between analyzer, optimizer and planner
URL: https://github.com/apache/spark/pull/25111#issuecomment-511987361
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107753/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #25111: [SPARK-28346][SQL] clone the query plan between analyzer, optimizer and planner

2019-07-16 Thread GitBox

AmplabJenkins commented on issue #25111: [SPARK-28346][SQL] clone the query 
plan between analyzer, optimizer and planner
URL: https://github.com/apache/spark/pull/25111#issuecomment-511987351
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on issue #25111: [SPARK-28346][SQL] clone the query plan between analyzer, optimizer and planner

2019-07-16 Thread GitBox

SparkQA removed a comment on issue #25111: [SPARK-28346][SQL] clone the query 
plan between analyzer, optimizer and planner
URL: https://github.com/apache/spark/pull/25111#issuecomment-511914738
 
 
   **[Test build #107753 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107753/testReport)**
 for PR 25111 at commit 
[`6f9b59f`](https://github.com/apache/spark/commit/6f9b59f5d360a041f6b825ddeb010dc03abee64c).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #25111: [SPARK-28346][SQL] clone the query plan between analyzer, optimizer and planner

2019-07-16 Thread GitBox

SparkQA commented on issue #25111: [SPARK-28346][SQL] clone the query plan 
between analyzer, optimizer and planner
URL: https://github.com/apache/spark/pull/25111#issuecomment-511986682
 
 
   **[Test build #107753 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107753/testReport)**
 for PR 25111 at commit 
[`6f9b59f`](https://github.com/apache/spark/commit/6f9b59f5d360a041f6b825ddeb010dc03abee64c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #22560: [SPARK-25547][SQL] Pluggable JDBC connection factory

2019-07-16 Thread GitBox

AmplabJenkins removed a comment on issue #22560: [SPARK-25547][SQL] Pluggable 
JDBC connection factory
URL: https://github.com/apache/spark/pull/22560#issuecomment-511982199
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107759/
   Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #22560: [SPARK-25547][SQL] Pluggable JDBC connection factory

2019-07-16 Thread GitBox

AmplabJenkins removed a comment on issue #22560: [SPARK-25547][SQL] Pluggable 
JDBC connection factory
URL: https://github.com/apache/spark/pull/22560#issuecomment-511982186
 
 
   Merged build finished. Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #22560: [SPARK-25547][SQL] Pluggable JDBC connection factory

2019-07-16 Thread GitBox

AmplabJenkins removed a comment on issue #22560: [SPARK-25547][SQL] Pluggable 
JDBC connection factory
URL: https://github.com/apache/spark/pull/22560#issuecomment-510680246
 
 
   Can one of the admins verify this patch?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

< 1 2 3 4 5 6 7 8 9 >

201 - 300 of 882 matches

Mail list logo