[GitHub] [spark] AmplabJenkins commented on issue #24704: [SPARK-20286][core] Improve logic for timing out executors in dynamic allocation.
AmplabJenkins commented on issue #24704: [SPARK-20286][core] Improve logic for timing out executors in dynamic allocation. URL: https://github.com/apache/spark/pull/24704#issuecomment-497029953 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105918/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #24704: [SPARK-20286][core] Improve logic for timing out executors in dynamic allocation.
AmplabJenkins commented on issue #24704: [SPARK-20286][core] Improve logic for timing out executors in dynamic allocation. URL: https://github.com/apache/spark/pull/24704#issuecomment-497029951 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24704: [SPARK-20286][core] Improve logic for timing out executors in dynamic allocation.
AmplabJenkins removed a comment on issue #24704: [SPARK-20286][core] Improve logic for timing out executors in dynamic allocation. URL: https://github.com/apache/spark/pull/24704#issuecomment-497029953 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105918/
[GitHub] [spark] SparkQA removed a comment on issue #24704: [SPARK-20286][core] Improve logic for timing out executors in dynamic allocation.
SparkQA removed a comment on issue #24704: [SPARK-20286][core] Improve logic for timing out executors in dynamic allocation. URL: https://github.com/apache/spark/pull/24704#issuecomment-496979565 **[Test build #105918 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105918/testReport)** for PR 24704 at commit [`7927239`](https://github.com/apache/spark/commit/7927239f81297fc9d90aa09feabebdc130d92f75).
[GitHub] [spark] AmplabJenkins removed a comment on issue #24704: [SPARK-20286][core] Improve logic for timing out executors in dynamic allocation.
AmplabJenkins removed a comment on issue #24704: [SPARK-20286][core] Improve logic for timing out executors in dynamic allocation. URL: https://github.com/apache/spark/pull/24704#issuecomment-497029951 Merged build finished. Test PASSed.
[GitHub] [spark] dongjoon-hyun closed pull request #24733: [SPARK-27869][CORE] Redact sensitive information in System Properties from UI
dongjoon-hyun closed pull request #24733: [SPARK-27869][CORE] Redact sensitive information in System Properties from UI URL: https://github.com/apache/spark/pull/24733
[GitHub] [spark] SparkQA commented on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled
SparkQA commented on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled URL: https://github.com/apache/spark/pull/24677#issuecomment-497036774 **[Test build #105915 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105915/testReport)** for PR 24677 at commit [`d9936d5`](https://github.com/apache/spark/commit/d9936d5084b71f4a8736d035fbe1fa3afade6966). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled
SparkQA removed a comment on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled URL: https://github.com/apache/spark/pull/24677#issuecomment-496964895 **[Test build #105915 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105915/testReport)** for PR 24677 at commit [`d9936d5`](https://github.com/apache/spark/commit/d9936d5084b71f4a8736d035fbe1fa3afade6966).
[GitHub] [spark] AmplabJenkins commented on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled
AmplabJenkins commented on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled URL: https://github.com/apache/spark/pull/24677#issuecomment-497037495 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105915/
[GitHub] [spark] AmplabJenkins commented on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled
AmplabJenkins commented on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled URL: https://github.com/apache/spark/pull/24677#issuecomment-497037484 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled
AmplabJenkins removed a comment on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled URL: https://github.com/apache/spark/pull/24677#issuecomment-497037495 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105915/
[GitHub] [spark] AmplabJenkins removed a comment on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled
AmplabJenkins removed a comment on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled URL: https://github.com/apache/spark/pull/24677#issuecomment-497037484 Merged build finished. Test PASSed.
[GitHub] [spark] skonto commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s
skonto commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s URL: https://github.com/apache/spark/pull/24703#discussion_r288691773 ## File path: docs/configuration.md ## @@ -206,6 +206,13 @@ of the most common options to set are: name and an array of addresses. + + spark.driver.resource.{resourceName}.vendor Review comment: Is a template harder to use for specifying the gpu limits, as in the example [here](https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/)?
[GitHub] [spark] liyinan926 commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s
liyinan926 commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s URL: https://github.com/apache/spark/pull/24703#discussion_r288694830 ## File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala ## @@ -129,6 +132,7 @@ private[spark] class BasicDriverFeatureStep(conf: KubernetesDriverConf) .addToLimits(maybeCpuLimitQuantity.toMap.asJava) .addToRequests("memory", driverMemoryQuantity) .addToLimits("memory", driverMemoryQuantity) +.addToLimits(driverResourceQuantities.toMap.asJava) Review comment: Why only apply them to the limits?
[GitHub] [spark] liyinan926 commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s
liyinan926 commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s URL: https://github.com/apache/spark/pull/24703#discussion_r288693711 ## File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala ## @@ -199,4 +199,13 @@ private[spark] object KubernetesConf { .replaceAll("[^a-z0-9\\-]", "") .replaceAll("-+", "-") } + + /** + * Build a resources name based on the vendor device plugin naming + * convention of: vendor-domain/resource. For example, an Nvidia GPU is + * advertised as nvidia.com/gpu. + */ + def buildKubernetesResourceName(vendor: String, resourceName: String): String = { +s"${vendor}/${resourceName}" Review comment: The Kubernetes resource name uses the format `<vendor>.com/gpu`. Is `vendor` expected to contain the full domain name like `nvidia.com`, or just the vendor name?
[GitHub] [spark] SparkQA commented on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled
SparkQA commented on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled URL: https://github.com/apache/spark/pull/24677#issuecomment-497044695 **[Test build #105916 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105916/testReport)** for PR 24677 at commit [`ccfeb9e`](https://github.com/apache/spark/commit/ccfeb9e408b3fd804c0f68308da7ca0adf3094b5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled
SparkQA removed a comment on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled URL: https://github.com/apache/spark/pull/24677#issuecomment-496972265 **[Test build #105916 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105916/testReport)** for PR 24677 at commit [`ccfeb9e`](https://github.com/apache/spark/commit/ccfeb9e408b3fd804c0f68308da7ca0adf3094b5).
[GitHub] [spark] AmplabJenkins commented on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled
AmplabJenkins commented on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled URL: https://github.com/apache/spark/pull/24677#issuecomment-497045554 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105916/
[GitHub] [spark] AmplabJenkins commented on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled
AmplabJenkins commented on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled URL: https://github.com/apache/spark/pull/24677#issuecomment-497045538 Merged build finished. Test PASSed.
[GitHub] [spark] mwlon commented on issue #24713: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs
mwlon commented on issue #24713: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs URL: https://github.com/apache/spark/pull/24713#issuecomment-497046845 Ok. If we only want the functional part of SPARK-26192 and not the other associated changes, then close this PR and I'll make a new one later.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled
AmplabJenkins removed a comment on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled URL: https://github.com/apache/spark/pull/24677#issuecomment-497045554 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105916/
[GitHub] [spark] AmplabJenkins removed a comment on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled
AmplabJenkins removed a comment on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled URL: https://github.com/apache/spark/pull/24677#issuecomment-497045538 Merged build finished. Test PASSed.
[GitHub] [spark] IvanVergiliev commented on a change in pull request #24068: [SPARK-27105][SQL] Optimize away exponential complexity in ORC predicate conversion
IvanVergiliev commented on a change in pull request #24068: [SPARK-27105][SQL] Optimize away exponential complexity in ORC predicate conversion URL: https://github.com/apache/spark/pull/24068#discussion_r288700718 ## File path: sql/core/v1.2.1/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilters.scala ## @@ -63,55 +64,26 @@ private[sql] object OrcFilters extends OrcFiltersBase { */ def createFilter(schema: StructType, filters: Seq[Filter]): Option[SearchArgument] = { val dataTypeMap = schema.map(f => f.name -> f.dataType).toMap +val orcFilterConverter = new OrcFilterConverter(dataTypeMap) for { // Combines all convertible filters using `And` to produce a single conjunction - conjunction <- buildTree(convertibleFilters(schema, dataTypeMap, filters)) + conjunction <- buildTree(filters.flatMap(orcFilterConverter.trimUnconvertibleFilters)) // Then tries to build a single ORC `SearchArgument` for the conjunction predicate - builder <- buildSearchArgument(dataTypeMap, conjunction, newBuilder) + builder <- orcFilterConverter.buildSearchArgument(conjunction, newBuilder) Review comment: I'm not sure which loop you're referring to here. The `for-yield` is used for dealing with the `Option`s that both of these methods can return, and since they can still return Options we still need the for-comprehension. We could create a separate function that takes in a trimmed filter and just handles the building, but I'm not sure how useful that would be. I rather like the isolation, and the slight guarantee that `build` will only be called on a trimmed filter, that having one common `buildSearchArgument` method gives us. Let me know if you had something else in mind.
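[Editor's note] The Option-composing behavior of the for-yield discussed above can be sketched with stand-in types (this is not the real OrcFilters API; `buildTree` and `buildSearchArgument` here are simplified hypothetical versions that merely mimic the Option-returning shape):

```scala
// Stand-in sketch: both steps return Option, and the for-comprehension
// yields Some only when every step succeeds, None as soon as one fails.
object OptionComposition {
  // Simplified stand-in: combine filters into one conjunction, None if empty.
  def buildTree(filters: Seq[String]): Option[String] =
    if (filters.isEmpty) None else Some(filters.mkString(" AND "))

  // Simplified stand-in: "build" a search argument from the conjunction.
  def buildSearchArgument(conjunction: String): Option[String] =
    Some(s"sarg[$conjunction]")

  def main(args: Array[String]): Unit = {
    val result: Option[String] = for {
      conjunction <- buildTree(Seq("a = 1", "b > 2"))
      sarg        <- buildSearchArgument(conjunction)
    } yield sarg
    println(result) // Some(sarg[a = 1 AND b > 2])

    // With no convertible filters, the whole pipeline short-circuits to None.
    println(for {
      c <- buildTree(Seq.empty)
      s <- buildSearchArgument(c)
    } yield s) // None
  }
}
```

This is why the for-comprehension stays even after the refactor: it is handling Option failure at each stage, not iterating.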
[GitHub] [spark] dongjoon-hyun commented on issue #24713: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs
dongjoon-hyun commented on issue #24713: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs URL: https://github.com/apache/spark/pull/24713#issuecomment-497048471 Yep. Thanks.
[GitHub] [spark] dongjoon-hyun closed pull request #24713: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs
dongjoon-hyun closed pull request #24713: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs URL: https://github.com/apache/spark/pull/24713
[GitHub] [spark] dongjoon-hyun commented on issue #24728: [SPARK-27737][SQL][FOLLOW-UP] Move sql/hive-thriftserver/v2.3.4 to sql/hive-thriftserver/v2.3.5
dongjoon-hyun commented on issue #24728: [SPARK-27737][SQL][FOLLOW-UP] Move sql/hive-thriftserver/v2.3.4 to sql/hive-thriftserver/v2.3.5 URL: https://github.com/apache/spark/pull/24728#issuecomment-497048212 Hi, @juliuszsompolski. This PR is okay because it clearly says `move`, not `upgrading`, and we are in the middle of a huge transition in #24628. I merged this intentionally.
[GitHub] [spark] tgravescs commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s
tgravescs commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s URL: https://github.com/apache/spark/pull/24703#discussion_r288702763 ## File path: docs/configuration.md ## @@ -206,6 +206,13 @@ of the most common options to set are: name and an array of addresses. + + spark.driver.resource.{resourceName}.vendor Review comment: I'm not sure what you mean by "template is harder to use"? Here you specify the spark conf and we translate it into the k8s one.
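[Editor's note] A minimal sketch of the conf-to-k8s translation described in the comment above. The helper `k8sLimitKey` is hypothetical (not Spark code); only the `spark.{component}.resource.{resourceName}.vendor`/`.amount` property pattern comes from the discussion:

```scala
// Hypothetical sketch: a Spark conf names a resource and its vendor domain,
// and the K8s backend turns that into an extended-resource limit entry.
object ConfToK8s {
  def k8sLimitKey(conf: Map[String, String],
                  component: String,
                  resource: String): (String, String) = {
    val vendor = conf(s"spark.$component.resource.$resource.vendor")
    val amount = conf(s"spark.$component.resource.$resource.amount")
    // Kubernetes device-plugin resources are keyed as vendor-domain/resource.
    s"$vendor/$resource" -> amount
  }

  def main(args: Array[String]): Unit = {
    val conf = Map(
      "spark.driver.resource.gpu.vendor" -> "nvidia.com",
      "spark.driver.resource.gpu.amount" -> "2")
    println(k8sLimitKey(conf, "driver", "gpu")) // (nvidia.com/gpu,2)
  }
}
```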
[GitHub] [spark] tgravescs commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s
tgravescs commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s URL: https://github.com/apache/spark/pull/24703#discussion_r288703148 ## File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala ## @@ -199,4 +199,13 @@ private[spark] object KubernetesConf { .replaceAll("[^a-z0-9\\-]", "") .replaceAll("-+", "-") } + + /** + * Build a resources name based on the vendor device plugin naming + * convention of: vendor-domain/resource. For example, an Nvidia GPU is + * advertised as nvidia.com/gpu. + */ + def buildKubernetesResourceName(vendor: String, resourceName: String): String = { +s"${vendor}/${resourceName}" Review comment: It's meant to be vendor and domain, like the comment says. I can update the name of the variable if that makes it clearer?
[GitHub] [spark] liyinan926 commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s
liyinan926 commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s URL: https://github.com/apache/spark/pull/24703#discussion_r288704228 ## File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala ## @@ -199,4 +199,13 @@ private[spark] object KubernetesConf { .replaceAll("[^a-z0-9\\-]", "") .replaceAll("-+", "-") } + + /** + * Build a resources name based on the vendor device plugin naming + * convention of: vendor-domain/resource. For example, an Nvidia GPU is + * advertised as nvidia.com/gpu. + */ + def buildKubernetesResourceName(vendor: String, resourceName: String): String = { +s"${vendor}/${resourceName}" Review comment: What's the reason for requiring the full vendor domain name?
[GitHub] [spark] tgravescs commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s
tgravescs commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s URL: https://github.com/apache/spark/pull/24703#discussion_r288704248 ## File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala ## @@ -129,6 +132,7 @@ private[spark] class BasicDriverFeatureStep(conf: KubernetesDriverConf) .addToLimits(maybeCpuLimitQuantity.toMap.asJava) .addToRequests("memory", driverMemoryQuantity) .addToLimits("memory", driverMemoryQuantity) +.addToLimits(driverResourceQuantities.toMap.asJava) Review comment: I'm going by https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/, which says GPUs are only supposed to be specified in the limits section. I'm also assuming that all resource plugins use limits, going by https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/, where it just has:

containers:
  - name: demo-container-1
    image: k8s.gcr.io/pause:2.0
    resources:
      limits:
        vendor-domain/resource: 2 # requesting 2 vendor-domain/resource

If there is other documentation that says otherwise, please let me know.
[GitHub] [spark] tgravescs commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s
tgravescs commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s URL: https://github.com/apache/spark/pull/24703#discussion_r288704815 ## File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala ## @@ -199,4 +199,13 @@ private[spark] object KubernetesConf { .replaceAll("[^a-z0-9\\-]", "") .replaceAll("-+", "-") } + + /** + * Build a resources name based on the vendor device plugin naming + * convention of: vendor-domain/resource. For example, an Nvidia GPU is + * advertised as nvidia.com/gpu. + */ + def buildKubernetesResourceName(vendor: String, resourceName: String): String = { +s"${vendor}/${resourceName}" Review comment: From reading the docs, that is what k8s takes; if I am wrong, please point me to the correct docs.
[GitHub] [spark] tgravescs commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s
tgravescs commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s URL: https://github.com/apache/spark/pull/24703#discussion_r288704913 ## File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala ## @@ -199,4 +199,13 @@ private[spark] object KubernetesConf { .replaceAll("[^a-z0-9\\-]", "") .replaceAll("-+", "-") } + + /** + * Build a resources name based on the vendor device plugin naming + * convention of: vendor-domain/resource. For example, an Nvidia GPU is + * advertised as nvidia.com/gpu. + */ + def buildKubernetesResourceName(vendor: String, resourceName: String): String = { +s"${vendor}/${resourceName}" Review comment: https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/
[GitHub] [spark] tgravescs commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s
tgravescs commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s URL: https://github.com/apache/spark/pull/24703#discussion_r288705075 ## File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala ## @@ -199,4 +199,13 @@ private[spark] object KubernetesConf { .replaceAll("[^a-z0-9\\-]", "") .replaceAll("-+", "-") } + + /** + * Build a resources name based on the vendor device plugin naming + * convention of: vendor-domain/resource. For example, an Nvidia GPU is + * advertised as nvidia.com/gpu. + */ + def buildKubernetesResourceName(vendor: String, resourceName: String): String = { +s"${vendor}/${resourceName}" Review comment: If we don't allow that, then I don't know how we can be generic for any resource type.
[GitHub] [spark] liyinan926 commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s
liyinan926 commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s URL: https://github.com/apache/spark/pull/24703#discussion_r288706042 ## File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala ## @@ -199,4 +199,13 @@ private[spark] object KubernetesConf { .replaceAll("[^a-z0-9\\-]", "") .replaceAll("-+", "-") } + + /** + * Build a resources name based on the vendor device plugin naming + * convention of: vendor-domain/resource. For example, an Nvidia GPU is + * advertised as nvidia.com/gpu. + */ + def buildKubernetesResourceName(vendor: String, resourceName: String): String = { +s"${vendor}/${resourceName}" Review comment: https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/. It seems from the doc that only the vendor name needs to be configurable.
[GitHub] [spark] rdblue commented on issue #24723: [SPARK-27857][SQL] Move ALTER TABLE parsing into Catalyst
rdblue commented on issue #24723: [SPARK-27857][SQL] Move ALTER TABLE parsing into Catalyst URL: https://github.com/apache/spark/pull/24723#issuecomment-497053324 Retest this please.
[GitHub] [spark] AmplabJenkins commented on issue #24723: [SPARK-27857][SQL] Move ALTER TABLE parsing into Catalyst
AmplabJenkins commented on issue #24723: [SPARK-27857][SQL] Move ALTER TABLE parsing into Catalyst URL: https://github.com/apache/spark/pull/24723#issuecomment-497054247 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11175/
[GitHub] [spark] AmplabJenkins commented on issue #24706: [SPARK-23128][SQL] A new approach to do adaptive execution in Spark SQL
AmplabJenkins commented on issue #24706: [SPARK-23128][SQL] A new approach to do adaptive execution in Spark SQL URL: https://github.com/apache/spark/pull/24706#issuecomment-497054283 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11176/
[GitHub] [spark] liyinan926 commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s
liyinan926 commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s URL: https://github.com/apache/spark/pull/24703#discussion_r288707710 ## File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala ## @@ -129,6 +132,7 @@ private[spark] class BasicDriverFeatureStep(conf: KubernetesDriverConf) .addToLimits(maybeCpuLimitQuantity.toMap.asJava) .addToRequests("memory", driverMemoryQuantity) .addToLimits("memory", driverMemoryQuantity) +.addToLimits(driverResourceQuantities.toMap.asJava) Review comment: You are right. The request will be set by default to the limit value.
[GitHub] [spark] AmplabJenkins commented on issue #24723: [SPARK-27857][SQL] Move ALTER TABLE parsing into Catalyst
AmplabJenkins commented on issue #24723: [SPARK-27857][SQL] Move ALTER TABLE parsing into Catalyst URL: https://github.com/apache/spark/pull/24723#issuecomment-497054228 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24706: [SPARK-23128][SQL] A new approach to do adaptive execution in Spark SQL
AmplabJenkins commented on issue #24706: [SPARK-23128][SQL] A new approach to do adaptive execution in Spark SQL URL: https://github.com/apache/spark/pull/24706#issuecomment-497054275 Merged build finished. Test PASSed.
[GitHub] [spark] vanzin commented on issue #24645: [SPARK-27773][Shuffle] add metrics for number of exceptions caught in ExternalShuffleBlockHandler
vanzin commented on issue #24645: [SPARK-27773][Shuffle] add metrics for number of exceptions caught in ExternalShuffleBlockHandler URL: https://github.com/apache/spark/pull/24645#issuecomment-497054560 ok to test
[GitHub] [spark] SparkQA commented on issue #24723: [SPARK-27857][SQL] Move ALTER TABLE parsing into Catalyst
SparkQA commented on issue #24723: [SPARK-27857][SQL] Move ALTER TABLE parsing into Catalyst URL: https://github.com/apache/spark/pull/24723#issuecomment-497054977 **[Test build #105923 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105923/testReport)** for PR 24723 at commit [`36a2bcd`](https://github.com/apache/spark/commit/36a2bcdbce6af6f95481580bf25fbf2231a523b7).
[GitHub] [spark] SparkQA commented on issue #24645: [SPARK-27773][Shuffle] add metrics for number of exceptions caught in ExternalShuffleBlockHandler
SparkQA commented on issue #24645: [SPARK-27773][Shuffle] add metrics for number of exceptions caught in ExternalShuffleBlockHandler URL: https://github.com/apache/spark/pull/24645#issuecomment-497055047 **[Test build #105925 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105925/testReport)** for PR 24645 at commit [`6d733aa`](https://github.com/apache/spark/commit/6d733aa1a84dcb15b96fd004de44d12fbcbe4891).
[GitHub] [spark] SparkQA commented on issue #24706: [SPARK-23128][SQL] A new approach to do adaptive execution in Spark SQL
SparkQA commented on issue #24706: [SPARK-23128][SQL] A new approach to do adaptive execution in Spark SQL URL: https://github.com/apache/spark/pull/24706#issuecomment-497055052 **[Test build #105924 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105924/testReport)** for PR 24706 at commit [`a9b4209`](https://github.com/apache/spark/commit/a9b420982f1db4988a76966b81fb5f43800a0523).
[GitHub] [spark] rednaxelafx commented on a change in pull request #24735: [SPARK-27871][SQL] LambdaVariable should use per-query unique IDs instead of globally unique IDs
rednaxelafx commented on a change in pull request #24735: [SPARK-27871][SQL] LambdaVariable should use per-query unique IDs instead of globally unique IDs URL: https://github.com/apache/spark/pull/24735#discussion_r288701933 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/objects.scala ## @@ -228,3 +228,31 @@ object ObjectSerializerPruning extends Rule[LogicalPlan] { } } } + +/** + * Reassigns per-query unique IDs to `LambdaVariable`s, whose original IDs are globally unique. This + * can help Spark to hit codegen cache more often and improve performance. + */ +object ReassignLambdaVariableID extends Rule[LogicalPlan] { + override def apply(plan: LogicalPlan): LogicalPlan = { +if (!SQLConf.get.getConf(SQLConf.OPTIMIZER_REASSIGN_LAMBDA_VARIABLE_ID)) return plan + +// The original LambdaVariable IDs are all positive. To avoid conflicts, the new IDs are all +// negative and starts with -1. +var newId = -1L +val oldIdToNewId = scala.collection.mutable.Map.empty[Long, Long] + +plan.transformAllExpressions { Review comment: The two traversals on the plan here certainly make the intent clear: one pass to collect old-to-new ID mappings, and another pass to actually do the transformation. But efficiency-wise, these two traversals can easily be combined into one, right? Reducing the number of traversals can save a lot of time when dealing with large plans.
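The single-pass idea suggested in this comment can be illustrated with a small standalone sketch (hypothetical code, not the Catalyst rule itself): assign the next negative ID the first time each old ID is encountered during the rewrite traversal, so no separate collection pass is needed.

```scala
import scala.collection.mutable

// Stand-in for walking a plan's LambdaVariable IDs in a single traversal:
// each previously unseen old ID gets the next negative ID on first sight,
// and later occurrences of the same old ID reuse the existing mapping.
def reassignIds(oldIds: Seq[Long]): Seq[Long] = {
  var nextId = -1L
  val oldToNew = mutable.Map.empty[Long, Long]
  oldIds.map { oldId =>
    oldToNew.getOrElseUpdate(oldId, { val assigned = nextId; nextId -= 1; assigned })
  }
}
```

For example, `reassignIds(Seq(42L, 7L, 42L))` yields `Seq(-1L, -2L, -1L)`: the repeated old ID 42 maps to the same new ID both times, which is exactly the property the two-pass version guarantees.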
[GitHub] [spark] rednaxelafx commented on a change in pull request #24735: [SPARK-27871][SQL] LambdaVariable should use per-query unique IDs instead of globally unique IDs
rednaxelafx commented on a change in pull request #24735: [SPARK-27871][SQL] LambdaVariable should use per-query unique IDs instead of globally unique IDs URL: https://github.com/apache/spark/pull/24735#discussion_r288700780 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/objects.scala ## @@ -228,3 +228,31 @@ object ObjectSerializerPruning extends Rule[LogicalPlan] { } } } + +/** + * Reassigns per-query unique IDs to `LambdaVariable`s, whose original IDs are globally unique. This + * can help Spark to hit codegen cache more often and improve performance. + */ +object ReassignLambdaVariableID extends Rule[LogicalPlan] { + override def apply(plan: LogicalPlan): LogicalPlan = { +if (!SQLConf.get.getConf(SQLConf.OPTIMIZER_REASSIGN_LAMBDA_VARIABLE_ID)) return plan + +// The original LambdaVariable IDs are all positive. To avoid conflicts, the new IDs are all +// negative and starts with -1. Review comment: Nit: "starts from"
[GitHub] [spark] rednaxelafx commented on a change in pull request #24735: [SPARK-27871][SQL] LambdaVariable should use per-query unique IDs instead of globally unique IDs
rednaxelafx commented on a change in pull request #24735: [SPARK-27871][SQL] LambdaVariable should use per-query unique IDs instead of globally unique IDs URL: https://github.com/apache/spark/pull/24735#discussion_r288705981 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ## @@ -595,12 +623,15 @@ case class LambdaVariable( } override def genCode(ctx: CodegenContext): ExprCode = { -val isNullValue = if (nullable) { - JavaCode.isNullVariable(isNull) +// The id can be negative if it's reassigned by `ReassignLambdaVariableID`. +val suffix = "lambda_variable_" + math.abs(id) Review comment: There's an implicit assumption here that, within a plan, all the `LambdaVariable`s hold either original IDs (positive) or reassigned ones (negative). We should probably add a comment on that, because if positive and negative IDs are mixed together, you can actually get a conflict when you do `abs`.
[GitHub] [spark] rednaxelafx commented on a change in pull request #24735: [SPARK-27871][SQL] LambdaVariable should use per-query unique IDs instead of globally unique IDs
rednaxelafx commented on a change in pull request #24735: [SPARK-27871][SQL] LambdaVariable should use per-query unique IDs instead of globally unique IDs URL: https://github.com/apache/spark/pull/24735#discussion_r288706970 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ## @@ -575,15 +575,43 @@ case class WrapOption(child: Expression, optType: DataType) } } +object LambdaVariable { + private val curId = new java.util.concurrent.atomic.AtomicLong() + + // Returns the codegen-ed `LambdaVariable` and add it to mutable states, so that it can be Review comment: I don't like this part. It might make codegen a bit easier to write, but you're unnecessarily hoisting local variables into Java object fields. That doesn't sound like a good idea to me.
[GitHub] [spark] rdblue commented on issue #24689: [SPARK-26946][SQL][FOLLOWUP] Require lookup function
rdblue commented on issue #24689: [SPARK-26946][SQL][FOLLOWUP] Require lookup function URL: https://github.com/apache/spark/pull/24689#issuecomment-497055324 Looks good to me. One minor comment: it doesn't look like Analyzer implementations created in BaseSessionStateBuilder actually override the lookup method to call the session's catalog lookup. I would probably include that in this PR, but we can also add it in later PRs when it is used. +1, I think this is ready to merge either way. @cloud-fan, @dongjoon-hyun, could you take a look at this DSv2 PR? Thanks!
[GitHub] [spark] rednaxelafx commented on a change in pull request #24735: [SPARK-27871][SQL] LambdaVariable should use per-query unique IDs instead of globally unique IDs
rednaxelafx commented on a change in pull request #24735: [SPARK-27871][SQL] LambdaVariable should use per-query unique IDs instead of globally unique IDs URL: https://github.com/apache/spark/pull/24735#discussion_r288692135 ## File path: sql/core/src/test/scala/org/apache/spark/sql/DatasetOptimizationSuite.scala ## @@ -166,4 +164,42 @@ class DatasetOptimizationSuite extends QueryTest with SharedSQLContext { checkAnswer(df, Seq(Row("1"), Row("2"), Row("3"))) } } + + test("SPARK-27871: Dataset encoder should benefit from codegen cache") { +def checkCodegenCache(createDataset: () => Dataset[_]): Unit = { + val count1 = CodegenMetrics.METRIC_COMPILATION_TIME.getCount() Review comment: My earlier PR that added the whole-stage codegen ID used another metric for the same purpose: https://github.com/apache/spark/commit/e57f394818b0a62f99609e1032fede7e981f306f#diff-0314224342bb8c30143ab784b3805d19R296 Should we try to make them use the exact same logic for checking whether or not the codegen cache was hit?
[GitHub] [spark] AmplabJenkins removed a comment on issue #24706: [SPARK-23128][SQL] A new approach to do adaptive execution in Spark SQL
AmplabJenkins removed a comment on issue #24706: [SPARK-23128][SQL] A new approach to do adaptive execution in Spark SQL URL: https://github.com/apache/spark/pull/24706#issuecomment-497054275 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24645: [SPARK-27773][Shuffle] add metrics for number of exceptions caught in ExternalShuffleBlockHandler
AmplabJenkins removed a comment on issue #24645: [SPARK-27773][Shuffle] add metrics for number of exceptions caught in ExternalShuffleBlockHandler URL: https://github.com/apache/spark/pull/24645#issuecomment-493853645 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins removed a comment on issue #24706: [SPARK-23128][SQL] A new approach to do adaptive execution in Spark SQL
AmplabJenkins removed a comment on issue #24706: [SPARK-23128][SQL] A new approach to do adaptive execution in Spark SQL URL: https://github.com/apache/spark/pull/24706#issuecomment-497054283 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11176/
[GitHub] [spark] AmplabJenkins removed a comment on issue #24723: [SPARK-27857][SQL] Move ALTER TABLE parsing into Catalyst
AmplabJenkins removed a comment on issue #24723: [SPARK-27857][SQL] Move ALTER TABLE parsing into Catalyst URL: https://github.com/apache/spark/pull/24723#issuecomment-497054247 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11175/
[GitHub] [spark] AmplabJenkins removed a comment on issue #24723: [SPARK-27857][SQL] Move ALTER TABLE parsing into Catalyst
AmplabJenkins removed a comment on issue #24723: [SPARK-27857][SQL] Move ALTER TABLE parsing into Catalyst URL: https://github.com/apache/spark/pull/24723#issuecomment-497054228 Merged build finished. Test PASSed.
[GitHub] [spark] rdblue edited a comment on issue #24689: [SPARK-26946][SQL][FOLLOWUP] Require lookup function
rdblue edited a comment on issue #24689: [SPARK-26946][SQL][FOLLOWUP] Require lookup function URL: https://github.com/apache/spark/pull/24689#issuecomment-497055324 Looks good to me. One minor comment: it doesn't look like Analyzer implementations created in BaseSessionStateBuilder actually override the lookup method to call the session's catalog lookup. I would probably include that in this PR, but we can also add it in later PRs when it is used. +1, I think this is ready to merge either way. @cloud-fan, @dongjoon-hyun, @HyukjinKwon, could you take a look at this DSv2 PR? Thanks!
[GitHub] [spark] tgravescs commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s
tgravescs commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s URL: https://github.com/apache/spark/pull/24703#discussion_r288709084 ## File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala ## @@ -199,4 +199,13 @@ private[spark] object KubernetesConf { .replaceAll("[^a-z0-9\\-]", "") .replaceAll("-+", "-") } + + /** + * Build a resources name based on the vendor device plugin naming + * convention of: vendor-domain/resource. For example, an Nvidia GPU is + * advertised as nvidia.com/gpu. + */ + def buildKubernetesResourceName(vendor: String, resourceName: String): String = { +s"${vendor}/${resourceName}" Review comment: Sorry, maybe I'm missing it, but that doc says:

limits:
  nvidia.com/gpu: 1 # requesting 1 GPU

nvidia is the vendor, .com is the domain, and gpu is the resource. This PR is trying to handle any generic resource type, not just GPUs, so the config has to include both the vendor and the domain. Otherwise we would have to hard-code something that says, for GPUs, use .com as the domain. Again, I'm just going by https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/; maybe the domain is always .com?
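The helper under review can be sketched as a standalone function (a hedged sketch, not the actual Spark code; `vendorDomain` is a renamed parameter for clarity, and the object name is hypothetical):

```scala
object KubernetesResourceNames {
  // Sketch of the helper under discussion: the caller supplies the full
  // vendor domain (e.g. "nvidia.com"), so any device-plugin resource can
  // be addressed without hard-coding a top-level domain for each vendor.
  def buildKubernetesResourceName(vendorDomain: String, resourceName: String): String =
    s"$vendorDomain/$resourceName"
}

// An Nvidia GPU advertised by its device plugin:
// KubernetesResourceNames.buildKubernetesResourceName("nvidia.com", "gpu")
// yields "nvidia.com/gpu", matching the vendor-domain/resource convention.
```

Requiring the full domain is what keeps the function generic: a plugin advertising, say, `amd.com/gpu` or any other `vendor-domain/resource` key works without Spark knowing the vendor's TLD.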
[GitHub] [spark] skambha commented on a change in pull request #24593: [SPARK-27692][SQL] Add new optimizer rule to evaluate the deterministic scala udf only once if all inputs are literals
skambha commented on a change in pull request #24593: [SPARK-27692][SQL] Add new optimizer rule to evaluate the deterministic scala udf only once if all inputs are literals URL: https://github.com/apache/spark/pull/24593#discussion_r288709384 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ## @@ -150,6 +150,16 @@ object SQLConf { } } + val DETERMINISTIC_UDF_FOLD_ENABLED = buildConf("spark.deterministic.udf.folding.enabled") Review comment: I didn't hear back, so I used the name I suggested here in the latest push. Let me know if you have any other comments.
[GitHub] [spark] rdblue commented on a change in pull request #24686: [SPARK-27813][SQL] DataSourceV2: Add DropTable logical operation
rdblue commented on a change in pull request #24686: [SPARK-27813][SQL] DataSourceV2: Add DropTable logical operation URL: https://github.com/apache/spark/pull/24686#discussion_r288709315

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala

## @@ -2195,4 +2195,22 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
     }
   }
+
+  /**
+   * Create a [[sql.DropTableStatement]] command.
+   */
+  override def visitDropTable(ctx: DropTableContext): LogicalPlan = withOrigin(ctx) {
+    sql.DropTableStatement(

Review comment: Nit: I don't think `sql.` is required here.
[GitHub] [spark] liyinan926 commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s
liyinan926 commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s URL: https://github.com/apache/spark/pull/24703#discussion_r288706042 ## File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala (quotes the same `buildKubernetesResourceName` hunk as the comment above) Review comment: https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/. It seems from that doc that only the vendor name needs to be configurable. But the doc you referred to suggests that the resource name can be an arbitrary domain name, so I guess it's safer to use the full domain name.
[GitHub] [spark] skambha commented on issue #24593: [SPARK-27692][SQL] Add new optimizer rule to evaluate the deterministic scala udf only once if all inputs are literals
skambha commented on issue #24593: [SPARK-27692][SQL] Add new optimizer rule to evaluate the deterministic scala udf only once if all inputs are literals URL: https://github.com/apache/spark/pull/24593#issuecomment-497056361 @dongjoon-hyun , I have pushed the changes to address the renames you suggested. Please take a look. Thanks.
[GitHub] [spark] skambha edited a comment on issue #24593: [SPARK-27692][SQL] Add new optimizer rule to evaluate the deterministic scala udf only once if all inputs are literals
skambha edited a comment on issue #24593: [SPARK-27692][SQL] Add new optimizer rule to evaluate the deterministic scala udf only once if all inputs are literals URL: https://github.com/apache/spark/pull/24593#issuecomment-497056361 @dongjoon-hyun , I have pushed the changes to address the renames https://github.com/apache/spark/pull/24593/commits/ee5fa4ed49ad26a4f0a84034c14621b266f2a3f1 you suggested. Please take a look. Thanks.
[GitHub] [spark] liyinan926 commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s
liyinan926 commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s URL: https://github.com/apache/spark/pull/24703#discussion_r288710024 ## File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala (quotes the same `buildKubernetesResourceName` hunk as the comment above) Review comment: Can you update the documentation of the two new options to make it clear that they expect vendor domain names?
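To make the naming convention under discussion concrete, here is a hedged, self-contained sketch of the helper quoted in this thread. The method name and behavior follow the hunk above; the enclosing object name is ours, not Spark's.

```scala
// Sketch of the helper under review: join a vendor domain and a resource name
// into the Kubernetes extended-resource form "vendor-domain/resource".
// The enclosing object is illustrative; the real method lives in KubernetesConf.
object KubernetesResourceNames {
  def buildKubernetesResourceName(vendor: String, resourceName: String): String =
    s"$vendor/$resourceName"
}
```

Passing the full vendor domain `nvidia.com` with resource `gpu` yields `nvidia.com/gpu`, matching the device-plugin advertisement the reviewers discuss, which is why the config is expected to carry the domain and not just the vendor name.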
[GitHub] [spark] rdblue commented on a change in pull request #24686: [SPARK-27813][SQL] DataSourceV2: Add DropTable logical operation
rdblue commented on a change in pull request #24686: [SPARK-27813][SQL] DataSourceV2: Add DropTable logical operation URL: https://github.com/apache/spark/pull/24686#discussion_r288710143

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceResolution.scala

## @@ -83,6 +84,15 @@ case class DataSourceResolution(
           s"No catalog specified for table ${identifier.quoted} and no default catalog is set"))
         .asTableCatalog
       convertCTAS(catalog, identifier, create)
+
+    case DropTableStatement(CatalogObjectIdentifier(Some(catalog), ident), ifExists, _) =>
+      DropTable(catalog.asTableCatalog, ident, ifExists)
+
+    case DropTableStatement(AsTableIdentifier(tableName), ifExists, purge) =>
+      DropTableCommand(tableName, ifExists, false, purge)

Review comment: Boolean arguments should use the named form: `isView = false`.
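The style point in this review can be shown with a small stand-in for the command. The case class below is hypothetical (the real `DropTableCommand` takes a `TableIdentifier`); it only mirrors the flag layout to illustrate the named-argument form being requested.

```scala
// Hypothetical stand-in mirroring the flag layout of DropTableCommand,
// used only to show the named-argument style the reviewer asks for.
case class DropTableCommand(
    tableName: String,
    ifExists: Boolean,
    isView: Boolean,
    purge: Boolean)

// Positional booleans are opaque at the call site: which flag is which?
val positional = DropTableCommand("t", true, false, false)

// Naming the flag documents intent, as requested in the review:
val named = DropTableCommand("t", ifExists = true, isView = false, purge = false)
```

Both calls build the same value; the named form simply makes the middle `false` self-describing to a reader scanning the pattern match.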
[GitHub] [spark] AmplabJenkins commented on issue #24645: [SPARK-27773][Shuffle] add metrics for number of exceptions caught in ExternalShuffleBlockHandler
AmplabJenkins commented on issue #24645: [SPARK-27773][Shuffle] add metrics for number of exceptions caught in ExternalShuffleBlockHandler URL: https://github.com/apache/spark/pull/24645#issuecomment-497057576 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11177/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24593: [SPARK-27692][SQL] Add new optimizer rule to evaluate the deterministic scala udf only once if all inputs are literals
AmplabJenkins commented on issue #24593: [SPARK-27692][SQL] Add new optimizer rule to evaluate the deterministic scala udf only once if all inputs are literals URL: https://github.com/apache/spark/pull/24593#issuecomment-497057602 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24593: [SPARK-27692][SQL] Add new optimizer rule to evaluate the deterministic scala udf only once if all inputs are literals
AmplabJenkins commented on issue #24593: [SPARK-27692][SQL] Add new optimizer rule to evaluate the deterministic scala udf only once if all inputs are literals URL: https://github.com/apache/spark/pull/24593#issuecomment-497057608 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11178/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24645: [SPARK-27773][Shuffle] add metrics for number of exceptions caught in ExternalShuffleBlockHandler
AmplabJenkins commented on issue #24645: [SPARK-27773][Shuffle] add metrics for number of exceptions caught in ExternalShuffleBlockHandler URL: https://github.com/apache/spark/pull/24645#issuecomment-497057565 Merged build finished. Test PASSed.
[GitHub] [spark] vanzin commented on issue #24634: [SPARK-27361][YARN][test-hadoop3.2] YARN support for GPU-aware scheduling
vanzin commented on issue #24634: [SPARK-27361][YARN][test-hadoop3.2] YARN support for GPU-aware scheduling URL: https://github.com/apache/spark/pull/24634#issuecomment-497058065 retest this please
[GitHub] [spark] vanzin commented on a change in pull request #24634: [SPARK-27361][YARN] YARN support for GPU-aware scheduling
vanzin commented on a change in pull request #24634: [SPARK-27361][YARN] YARN support for GPU-aware scheduling URL: https://github.com/apache/spark/pull/24634#discussion_r288710015

## File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala

## @@ -39,6 +40,25 @@ object YarnSparkHadoopUtil {
   val MEMORY_OVERHEAD_MIN = 384L
   val ANY_HOST = "*"
+
+  val YARN_GPU_RESOURCE_CONFIG = "yarn.io/gpu"
+  val YARN_FPGA_RESOURCE_CONFIG = "yarn.io/fpga"
+
+  /**
+   * Convert Spark resources into YARN resources.
+   * The only resources we know how to map from spark configs to yarn configs are
+   * gpus and fpgas; everything else the user has to specify in both the
+   * spark.yarn.*.resource and the spark.*.resource configs.
+   */
+  private[yarn] def getYarnResourcesFromSparkResources(
+      confPrefix: String,
+      sparkConf: SparkConf): Map[String, String] = {
+    Map("gpu" -> YARN_GPU_RESOURCE_CONFIG, "fpga" -> YARN_FPGA_RESOURCE_CONFIG).map {
+      case (rName, yarnName) =>
+        val resourceCountSparkConf = s"${confPrefix}${rName}${SPARK_RESOURCE_COUNT_SUFFIX}"
+        (yarnName -> sparkConf.getOption(resourceCountSparkConf).getOrElse("0"))

Review comment: `get(resourceCountSparkConf, "0")`
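A self-contained sketch of the mapping quoted above, with the reviewer's suggestion applied: `SparkConf.get(key, default)` replaces the `getOption(...).getOrElse(...)` chain. A plain `Map` stands in for `SparkConf` here, and the `.count` suffix is an assumption for this sketch (the real value comes from `SPARK_RESOURCE_COUNT_SUFFIX`).

```scala
// Sketch of getYarnResourcesFromSparkResources with the review suggestion
// applied. A Map[String, String] stands in for SparkConf, so getOrElse plays
// the role of SparkConf.get(key, "0").
object YarnResourceMapping {
  val YARN_GPU_RESOURCE_CONFIG = "yarn.io/gpu"
  val YARN_FPGA_RESOURCE_CONFIG = "yarn.io/fpga"
  val SPARK_RESOURCE_COUNT_SUFFIX = ".count" // assumed suffix for this sketch

  def getYarnResourcesFromSparkResources(
      confPrefix: String,
      conf: Map[String, String]): Map[String, String] = {
    Map("gpu" -> YARN_GPU_RESOURCE_CONFIG, "fpga" -> YARN_FPGA_RESOURCE_CONFIG).map {
      case (rName, yarnName) =>
        val resourceCountSparkConf = s"$confPrefix$rName$SPARK_RESOURCE_COUNT_SUFFIX"
        // Mirrors the suggested sparkConf.get(resourceCountSparkConf, "0"):
        yarnName -> conf.getOrElse(resourceCountSparkConf, "0")
    }
  }
}
```

Unmapped resources default to "0", so a user who only configures a GPU count still gets a well-formed FPGA entry rather than a missing key.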
[GitHub] [spark] rdblue commented on issue #24686: [SPARK-27813][SQL] DataSourceV2: Add DropTable logical operation
rdblue commented on issue #24686: [SPARK-27813][SQL] DataSourceV2: Add DropTable logical operation URL: https://github.com/apache/spark/pull/24686#issuecomment-497059525 +1 overall, just one minor style problem with boolean args to fix. I think this is ready to go. You might also mention a couple of things in the PR description:

* This moves parsing of `DROP TABLE` into Catalyst and adds parsed plans, like #24029 did for create
* Like #24029, parsing tests for `DROP TABLE` have been moved to `PlanResolutionSuite` to validate existing behavior, and new tests for the catalyst parser have been added to Catalyst's `DDLParserSuite`

@cloud-fan, @dongjoon-hyun, could you review this DSv2 PR? I think it is ready to merge.
[GitHub] [spark] squito commented on a change in pull request #24704: [SPARK-20286][core] Improve logic for timing out executors in dynamic allocation.
squito commented on a change in pull request #24704: [SPARK-20286][core] Improve logic for timing out executors in dynamic allocation. URL: https://github.com/apache/spark/pull/24704#discussion_r288698370 ## File path: core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala ## @@ -0,0 +1,297 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.scheduler.dynalloc + +import java.util.concurrent.{ConcurrentHashMap, TimeUnit} +import java.util.concurrent.atomic.AtomicLong + +import scala.collection.JavaConverters._ +import scala.collection.mutable + +import org.apache.spark._ +import org.apache.spark.internal.Logging +import org.apache.spark.internal.config._ +import org.apache.spark.scheduler._ +import org.apache.spark.storage.RDDBlockId +import org.apache.spark.util.Clock + +/** + * A monitor for executor activity, used by ExecutorAllocationManager to detect idle executors. 
+ */ +private[spark] class ExecutorMonitor( +conf: SparkConf, +client: ExecutorAllocationClient, +clock: Clock) extends SparkListener with Logging { + + private val idleTimeoutMs = TimeUnit.SECONDS.toMillis( +conf.get(DYN_ALLOCATION_EXECUTOR_IDLE_TIMEOUT)) + private val storageTimeoutMs = TimeUnit.SECONDS.toMillis( +conf.get(DYN_ALLOCATION_CACHED_EXECUTOR_IDLE_TIMEOUT)) + private val fetchFromShuffleSvcEnabled = conf.get(SHUFFLE_SERVICE_ENABLED) && +conf.get(SHUFFLE_SERVICE_FETCH_RDD_ENABLED) + + private val executors = new ConcurrentHashMap[String, Tracker]() + + // The following fields are an optimization to avoid having to scan all executors on every EAM + // schedule interval to find out which ones are timed out. They keep track of when the next + // executor timeout is expected to happen, and the current list of timed out executors. There's + // also a flag that forces the EAM task to recompute the timed out executors, in case some event + // arrives on the listener bus that may cause the current list of timed out executors to change. + // + // There's also per-executor state used for this purpose, so that recomputations can be triggered + // only when really necessary. + // + // Note that this isn't meant to, and cannot, always make the right decision about which executors + // are indeed timed out. For example, the EAM thread may detect a timed out executor while a new + // "task start" event has just been posted to the listener bus and hasn't yet been delivered to + // this listener. There are safeguards in other parts of the code that would prevent that executor + // from being removed. 
+ private var nextTimeout = new AtomicLong(Long.MaxValue) + private var timedOutExecs = Seq.empty[String] + + if (idleTimeoutMs < 0) { +throw new SparkException(s"${DYN_ALLOCATION_EXECUTOR_IDLE_TIMEOUT.key} must be >= 0!") + } + if (storageTimeoutMs < 0) { +throw new SparkException(s"${DYN_ALLOCATION_CACHED_EXECUTOR_IDLE_TIMEOUT.key} must be >= 0!") + } + + def reset(): Unit = { +executors.clear() +nextTimeout.set(Long.MaxValue) +timedOutExecs = Nil + } + + def timedOutExecutors(): Seq[String] = { +val now = clock.getTimeMillis() +if (now >= nextTimeout.get()) { + // Temporarily set the next timeout at Long.MaxValue. This ensures that after + // scanning all executors below, we know when the next timeout for non-timed out + // executors is (whether that update came from the scan, or from a new event + // arriving in a different thread). + nextTimeout.set(Long.MaxValue) + + var newNextTimeout = Long.MaxValue + timedOutExecs = executors.asScala +.filter { case (_, exec) => !exec.pendingRemoval } +.filter { case (_, exec) => + val deadline = exec.timeoutAt + if (deadline > now) { +newNextTimeout = math.min(newNextTimeout, deadline) +exec.timedOut = false +false + } else { +exec.timedOut = true + +// An event arriving while this scan is happening may cause the deadline for +// the executor to change after it was read above. Check the deadline again, +// and if it chang
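The comment block quoted above describes a next-timeout optimization: rather than scanning every executor on each ExecutorAllocationManager tick, the monitor keeps the earliest expected deadline in an `AtomicLong` and only rescans once the clock passes it. Below is a minimal, single-threaded sketch of that pattern; `IdleTimeoutTracker` and its methods are illustrative names, not Spark's API, and the listener-bus wiring, `pendingRemoval` handling, and rescan race checks are omitted.

```scala
import java.util.concurrent.atomic.AtomicLong
import scala.collection.mutable

// Sketch of the "nextTimeout" optimization: remember the earliest upcoming
// deadline and rescan the executor map only once the clock passes it.
class IdleTimeoutTracker {
  private val deadlines = mutable.LinkedHashMap.empty[String, Long] // executorId -> timeoutAt
  private val nextTimeout = new AtomicLong(Long.MaxValue)

  def track(execId: String, timeoutAt: Long): Unit = {
    deadlines(execId) = timeoutAt
    // Pull the shared deadline down if this executor times out sooner.
    nextTimeout.getAndUpdate(cur => math.min(cur, timeoutAt))
  }

  def timedOutExecutors(now: Long): Seq[String] = {
    if (now < nextTimeout.get()) return Nil // fast path: nothing can be due yet

    // Temporarily park at MaxValue, then recompute from the still-pending executors.
    nextTimeout.set(Long.MaxValue)
    val (due, pending) = deadlines.toSeq.partition { case (_, deadline) => deadline <= now }
    val newNext = (Long.MaxValue +: pending.map(_._2)).min
    nextTimeout.getAndUpdate(cur => math.min(cur, newNext))

    // The real monitor keeps timed-out executors until they are actually killed;
    // dropping them here just keeps the sketch small.
    due.foreach { case (id, _) => deadlines.remove(id) }
    due.map(_._1)
  }
}
```

The cheap `now < nextTimeout.get()` check is what lets the schedule interval stay O(1) in the common case, which is the stated goal of the fields quoted above.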
[GitHub] [spark] squito commented on a change in pull request #24704: [SPARK-20286][core] Improve logic for timing out executors in dynamic allocation.
squito commented on a change in pull request #24704: [SPARK-20286][core] Improve logic for timing out executors in dynamic allocation. URL: https://github.com/apache/spark/pull/24704#discussion_r288677519 ## File path: core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala (quotes the same `ExecutorMonitor.scala` hunk as the comment above)
[GitHub] [spark] squito commented on a change in pull request #24704: [SPARK-20286][core] Improve logic for timing out executors in dynamic allocation.
squito commented on a change in pull request #24704: [SPARK-20286][core] Improve logic for timing out executors in dynamic allocation.
URL: https://github.com/apache/spark/pull/24704#discussion_r288699085

File path: core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala

@@ -0,0 +1,297 @@

```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.scheduler.dynalloc

import java.util.concurrent.{ConcurrentHashMap, TimeUnit}
import java.util.concurrent.atomic.AtomicLong

import scala.collection.JavaConverters._
import scala.collection.mutable

import org.apache.spark._
import org.apache.spark.internal.Logging
import org.apache.spark.internal.config._
import org.apache.spark.scheduler._
import org.apache.spark.storage.RDDBlockId
import org.apache.spark.util.Clock

/**
 * A monitor for executor activity, used by ExecutorAllocationManager to detect idle executors.
 */
private[spark] class ExecutorMonitor(
    conf: SparkConf,
    client: ExecutorAllocationClient,
    clock: Clock) extends SparkListener with Logging {

  private val idleTimeoutMs = TimeUnit.SECONDS.toMillis(
    conf.get(DYN_ALLOCATION_EXECUTOR_IDLE_TIMEOUT))
  private val storageTimeoutMs = TimeUnit.SECONDS.toMillis(
    conf.get(DYN_ALLOCATION_CACHED_EXECUTOR_IDLE_TIMEOUT))
  private val fetchFromShuffleSvcEnabled = conf.get(SHUFFLE_SERVICE_ENABLED) &&
    conf.get(SHUFFLE_SERVICE_FETCH_RDD_ENABLED)

  private val executors = new ConcurrentHashMap[String, Tracker]()

  // The following fields are an optimization to avoid having to scan all executors on every EAM
  // schedule interval to find out which ones are timed out. They keep track of when the next
  // executor timeout is expected to happen, and the current list of timed out executors. There's
  // also a flag that forces the EAM task to recompute the timed out executors, in case some event
  // arrives on the listener bus that may cause the current list of timed out executors to change.
  //
  // There's also per-executor state used for this purpose, so that recomputations can be triggered
  // only when really necessary.
  //
  // Note that this isn't meant to, and cannot, always make the right decision about which executors
  // are indeed timed out. For example, the EAM thread may detect a timed out executor while a new
  // "task start" event has just been posted to the listener bus and hasn't yet been delivered to
  // this listener. There are safeguards in other parts of the code that would prevent that executor
  // from being removed.
  private var nextTimeout = new AtomicLong(Long.MaxValue)
  private var timedOutExecs = Seq.empty[String]

  if (idleTimeoutMs < 0) {
    throw new SparkException(s"${DYN_ALLOCATION_EXECUTOR_IDLE_TIMEOUT.key} must be >= 0!")
  }
  if (storageTimeoutMs < 0) {
    throw new SparkException(s"${DYN_ALLOCATION_CACHED_EXECUTOR_IDLE_TIMEOUT.key} must be >= 0!")
  }

  def reset(): Unit = {
    executors.clear()
    nextTimeout.set(Long.MaxValue)
    timedOutExecs = Nil
  }

  def timedOutExecutors(): Seq[String] = {
    val now = clock.getTimeMillis()
    if (now >= nextTimeout.get()) {
      // Temporarily set the next timeout at Long.MaxValue. This ensures that after
      // scanning all executors below, we know when the next timeout for non-timed out
      // executors is (whether that update came from the scan, or from a new event
      // arriving in a different thread).
      nextTimeout.set(Long.MaxValue)

      var newNextTimeout = Long.MaxValue
      timedOutExecs = executors.asScala
        .filter { case (_, exec) => !exec.pendingRemoval }
        .filter { case (_, exec) =>
          val deadline = exec.timeoutAt
          if (deadline > now) {
            newNextTimeout = math.min(newNextTimeout, deadline)
            exec.timedOut = false
            false
          } else {
            exec.timedOut = true

            // An event arriving while this scan is happening may cause the deadline for
            // the executor to change after it was read above. Check the deadline again,
            // and if it chang
```
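The fast-path scan described in the comment block above (keep the earliest known deadline in an `AtomicLong` and skip work until the clock reaches it) can be sketched in isolation. This is a simplified, hypothetical model — `IdleTracker` and `TimeoutScanner` are illustrative names, not Spark's actual classes:

```scala
import java.util.concurrent.atomic.AtomicLong
import scala.collection.concurrent.TrieMap

// Hypothetical stand-in for Spark's per-executor Tracker state.
final case class IdleTracker(var timeoutAt: Long, var timedOut: Boolean = false)

final class TimeoutScanner {
  val executors = TrieMap.empty[String, IdleTracker]
  // Earliest deadline seen so far; scans are skipped until the clock reaches it.
  private val nextTimeout = new AtomicLong(Long.MaxValue)

  // Lower the watermark monotonically; concurrent events may race to update it.
  def updateNextTimeout(newValue: Long): Unit = {
    var done = false
    while (!done) {
      val current = nextTimeout.get()
      done = newValue >= current || nextTimeout.compareAndSet(current, newValue)
    }
  }

  def timedOutExecutors(now: Long): Seq[String] = {
    if (now < nextTimeout.get()) return Seq.empty // cheap fast path, no scan
    // Reset to MaxValue so concurrent events can only lower it while we scan.
    nextTimeout.set(Long.MaxValue)
    var newNext = Long.MaxValue
    val expired = Seq.newBuilder[String]
    executors.foreach { case (id, t) =>
      if (t.timeoutAt <= now) {
        t.timedOut = true
        expired += id
      } else {
        t.timedOut = false
        newNext = math.min(newNext, t.timeoutAt)
      }
    }
    updateNextTimeout(newNext)
    expired.result()
  }
}

// Demo: one executor times out at t=100, the other at t=500.
val scanner = new TimeoutScanner
scanner.executors.put("1", IdleTracker(timeoutAt = 100L))
scanner.executors.put("2", IdleTracker(timeoutAt = 500L))
scanner.updateNextTimeout(100L)
assert(scanner.timedOutExecutors(50L).isEmpty)
assert(scanner.timedOutExecutors(200L) == Seq("1"))
assert(scanner.timedOutExecutors(600L).toSet == Set("1", "2"))
```

The point of the design is that the common case (no deadline reached yet) costs one atomic read instead of a full map scan.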
[GitHub] [spark] squito commented on a change in pull request #24704: [SPARK-20286][core] Improve logic for timing out executors in dynamic allocation.
squito commented on a change in pull request #24704: [SPARK-20286][core] Improve logic for timing out executors in dynamic allocation.
URL: https://github.com/apache/spark/pull/24704#discussion_r288695928

File path: core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala

```scala
  if (idleTimeoutMs < 0) {
    throw new SparkException(s"${DYN_ALLOCATION_EXECUTOR_IDLE_TIMEOUT.key} must be >= 0!")
  }
  if (storageTimeoutMs < 0) {
    throw new SparkException(s"${DYN_ALLOCATION_CACHED_EXECUTOR_IDLE_TIMEOUT.key} must be >= 0!")
  }
```

Review comment: these checks could now go on the configs themselves

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
With regards, Apache Git Services
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
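The quoted checks throw from the `ExecutorMonitor` constructor; Spark's internal `ConfigBuilder` can instead attach the validation at the config definition via `checkValue`, which appears to be what the reviewer suggests. A rough, untested sketch against that private API (the exact builder chain and message are unverified assumptions):

```scala
private[spark] val DYN_ALLOCATION_EXECUTOR_IDLE_TIMEOUT =
  ConfigBuilder("spark.dynamicAllocation.executorIdleTimeout")
    .timeConf(TimeUnit.SECONDS)
    .checkValue(_ >= 0L, "Timeout must be >= 0.")
    .createWithDefault(60)
```

With the check on the entry itself, any code reading the config gets the validation for free and the monitor's constructor guards become redundant.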
[GitHub] [spark] squito commented on a change in pull request #24704: [SPARK-20286][core] Improve logic for timing out executors in dynamic allocation.
squito commented on a change in pull request #24704: [SPARK-20286][core] Improve logic for timing out executors in dynamic allocation.
URL: https://github.com/apache/spark/pull/24704#discussion_r288698722

File path: core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala (same new-file hunk as quoted above)
[GitHub] [spark] squito commented on a change in pull request #24704: [SPARK-20286][core] Improve logic for timing out executors in dynamic allocation.
squito commented on a change in pull request #24704: [SPARK-20286][core] Improve logic for timing out executors in dynamic allocation.
URL: https://github.com/apache/spark/pull/24704#discussion_r288713829

File path: core/src/test/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitorSuite.scala

@@ -0,0 +1,253 @@

```scala
package org.apache.spark.scheduler.dynalloc

import java.util.concurrent.TimeUnit

import scala.collection.mutable

import org.mockito.ArgumentMatchers.any
import org.mockito.Mockito.{mock, when}

import org.apache.spark._
import org.apache.spark.internal.config._
import org.apache.spark.scheduler._
import org.apache.spark.storage._
import org.apache.spark.util.ManualClock

class ExecutorMonitorSuite extends SparkFunSuite {

  private val idleTimeoutMs = TimeUnit.SECONDS.toMillis(60L)
  private val storageTimeoutMs = TimeUnit.SECONDS.toMillis(120L)

  private val conf = new SparkConf()
    .set(DYN_ALLOCATION_EXECUTOR_IDLE_TIMEOUT.key, "60s")
    .set(DYN_ALLOCATION_CACHED_EXECUTOR_IDLE_TIMEOUT.key, "120s")

  private var monitor: ExecutorMonitor = _
  private var client: ExecutorAllocationClient = _
  private var clock: ManualClock = _

  // List of known executors. Allows easily mocking which executors are alive without
  // having to use mockito APIs directly in each test.
  private val knownExecs = mutable.HashSet[String]()

  override def beforeEach(): Unit = {
    super.beforeEach()
    knownExecs.clear()
    clock = new ManualClock()
    client = mock(classOf[ExecutorAllocationClient])
    when(client.isExecutorActive(any())).thenAnswer { invocation =>
      knownExecs.contains(invocation.getArguments()(0).asInstanceOf[String])
    }
    monitor = new ExecutorMonitor(conf, client, clock)
  }

  test("basic executor timeout") {
    knownExecs += "1"
    monitor.onExecutorAdded(SparkListenerExecutorAdded(clock.getTimeMillis(), "1", null))
    assert(monitor.executorCount === 1)
    assert(monitor.isExecutorIdle("1"))
    assert(monitor.timedOutExecutors(idleDeadline) === Seq("1"))
  }

  test("SPARK-4951, SPARK-26927: handle out of order task start events") {
    knownExecs ++= Set("1", "2")

    monitor.onTaskStart(SparkListenerTaskStart(1, 1, taskInfo("1", 1)))
    assert(monitor.executorCount === 1)

    monitor.onExecutorAdded(SparkListenerExecutorAdded(clock.getTimeMillis(), "1", null))
    assert(monitor.executorCount === 1)

    monitor.onExecutorAdded(SparkListenerExecutorAdded(clock.getTimeMillis(), "2", null))
    assert(monitor.executorCount === 2)

    monitor.onExecutorRemoved(SparkListenerExecutorRemoved(clock.getTimeMillis(), "2", null))
    assert(monitor.executorCount === 1)

    knownExecs -= "2"

    monitor.onTaskStart(SparkListenerTaskStart(1, 1, taskInfo("2", 2)))
    assert(monitor.executorCount === 1)
  }

  test("track tasks running on executor") {
    knownExecs += "1"

    monitor.onExecutorAdded(SparkListenerExecutorAdded(clock.getTimeMillis(), "1", null))
    monitor.onTaskStart(SparkListenerTaskStart(1, 1, taskInfo("1", 1)))
    assert(!monitor.isExecutorIdle("1"))

    // Start/end a few tasks and make sure the executor does not go idle.
    (2 to 10).foreach { i =>
      monitor.onTaskStart(SparkListenerTaskStart(i, 1, taskInfo("1", 1)))
      assert(!monitor.isExecutorIdle("1"))

      monitor.onTaskEnd(SparkListenerTaskEnd(i, 1, "foo", Success, taskInfo("1", 1), null))
      assert(!monitor.isExecutorIdle("1"))
    }

    monitor.onTaskEnd(SparkListenerTaskEnd(1, 1, "foo", Success, taskInfo("1", 1), null))
    assert(monitor.isExecutorIdle("1"))
    assert(monitor.timedOutExecutors(clock.getTimeMillis()).isEmpty)
    assert(monitor.timedOutExecutors(clock.getTimeMillis() + idleTimeoutMs + 1) === Seq("1"))
  }

  test("use appropriate time out depending on whether blocks are stored") {
    knownExecs += "1"
    monitor.onExecutorAdded(SparkListenerExecutorAdded(clock.getTimeMillis(), "1", null))
    assert(monitor.isExecutorIdle("1"))
    assert(monitor.timedOutExecutors(idleDeadline) === Seq("1"))

    monitor.o
```
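The suite above drives time with Spark's `ManualClock` instead of the wall clock, which makes timeout tests deterministic. The pattern can be shown in isolation with minimal stand-ins (the `Clock`/`ManualClock` traits here only mirror `org.apache.spark.util`'s versions, and `IdleTimer` is a hypothetical component):

```scala
// Minimal stand-ins mirroring org.apache.spark.util.{Clock, ManualClock}.
trait Clock { def getTimeMillis(): Long }

final class ManualClock(private var now: Long = 0L) extends Clock {
  def getTimeMillis(): Long = now
  def advance(ms: Long): Unit = { now += ms }
}

// A component under test only ever reads the injected clock, so tests can
// advance time deterministically instead of sleeping.
final class IdleTimer(clock: Clock, timeoutMs: Long) {
  private var lastActive = clock.getTimeMillis()
  def touch(): Unit = { lastActive = clock.getTimeMillis() }
  def isTimedOut: Boolean = clock.getTimeMillis() - lastActive > timeoutMs
}

val clock = new ManualClock()
val timer = new IdleTimer(clock, timeoutMs = 60000L) // mirrors the 60s idle timeout
assert(!timer.isTimedOut)
clock.advance(60001L)
assert(timer.isTimedOut)
timer.touch() // activity resets the idle window
assert(!timer.isTimedOut)
```

Injecting the clock is the design choice that lets a 60-second timeout be exercised in microseconds of test time.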
[GitHub] [spark] squito commented on a change in pull request #24704: [SPARK-20286][core] Improve logic for timing out executors in dynamic allocation.
squito commented on a change in pull request #24704: [SPARK-20286][core] Improve logic for timing out executors in dynamic allocation.
URL: https://github.com/apache/spark/pull/24704#discussion_r288714016

File path: core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala (same new-file hunk as quoted above)
[GitHub] [spark] AmplabJenkins commented on issue #24634: [SPARK-27361][YARN][test-hadoop3.2] YARN support for GPU-aware scheduling
AmplabJenkins commented on issue #24634: [SPARK-27361][YARN][test-hadoop3.2] YARN support for GPU-aware scheduling
URL: https://github.com/apache/spark/pull/24634#issuecomment-497060571

Test PASSed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11179/
[GitHub] [spark] AmplabJenkins commented on issue #24634: [SPARK-27361][YARN][test-hadoop3.2] YARN support for GPU-aware scheduling
AmplabJenkins commented on issue #24634: [SPARK-27361][YARN][test-hadoop3.2] YARN support for GPU-aware scheduling
URL: https://github.com/apache/spark/pull/24634#issuecomment-497060567

Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #24634: [SPARK-27361][YARN][test-hadoop3.2] YARN support for GPU-aware scheduling
SparkQA commented on issue #24634: [SPARK-27361][YARN][test-hadoop3.2] YARN support for GPU-aware scheduling
URL: https://github.com/apache/spark/pull/24634#issuecomment-497061285

**[Test build #105926 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105926/testReport)** for PR 24634 at commit [`dcc1bbe`](https://github.com/apache/spark/commit/dcc1bbe93ba00de05a6f2f159a29ed83b98bb3a5).
[GitHub] [spark] SparkQA commented on issue #24593: [SPARK-27692][SQL] Add new optimizer rule to evaluate the deterministic scala udf only once if all inputs are literals
SparkQA commented on issue #24593: [SPARK-27692][SQL] Add new optimizer rule to evaluate the deterministic scala udf only once if all inputs are literals
URL: https://github.com/apache/spark/pull/24593#issuecomment-497061299

**[Test build #105927 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105927/testReport)** for PR 24593 at commit [`ee5fa4e`](https://github.com/apache/spark/commit/ee5fa4ed49ad26a4f0a84034c14621b266f2a3f1).
[GitHub] [spark] SparkQA commented on issue #24603: [SPARK-27706][SQL][WEBUI] Add SQL metrics of numOutputRows for BroadcastExchangeExec
SparkQA commented on issue #24603: [SPARK-27706][SQL][WEBUI] Add SQL metrics of numOutputRows for BroadcastExchangeExec
URL: https://github.com/apache/spark/pull/24603#issuecomment-497061335

**[Test build #105919 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105919/testReport)** for PR 24603 at commit [`f230282`](https://github.com/apache/spark/commit/f230282fae680e11e876518eeb91db90b15f2aed).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on issue #24603: [SPARK-27706][SQL][WEBUI] Add SQL metrics of numOutputRows for BroadcastExchangeExec
AmplabJenkins commented on issue #24603: [SPARK-27706][SQL][WEBUI] Add SQL metrics of numOutputRows for BroadcastExchangeExec
URL: https://github.com/apache/spark/pull/24603#issuecomment-497062085

Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24603: [SPARK-27706][SQL][WEBUI] Add SQL metrics of numOutputRows for BroadcastExchangeExec
AmplabJenkins commented on issue #24603: [SPARK-27706][SQL][WEBUI] Add SQL metrics of numOutputRows for BroadcastExchangeExec
URL: https://github.com/apache/spark/pull/24603#issuecomment-497062089

Test PASSed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105919/
[GitHub] [spark] gaborgsomogyi commented on issue #24738: [WIP][SPARK-23098][SQL] Migrate Kafka Batch source to v2.
gaborgsomogyi commented on issue #24738: [WIP][SPARK-23098][SQL] Migrate Kafka Batch source to v2. URL: https://github.com/apache/spark/pull/24738#issuecomment-497063432 @cloud-fan @gatorsmile you're driving the DSv2 effort and have a lot of experience, so I'd like to ask whether some parts are missing from the framework on the sink side, since the following exception is thrown:
```
sbt.ForkMain$ForkError: org.apache.spark.sql.AnalysisException: Cannot write incompatible data to table 'KafkaTable':
- Cannot find data for output column 'key'
- Cannot safely cast 'value': StringType to BinaryType
- Cannot find data for output column 'partition'
- Cannot find data for output column 'offset'
- Cannot find data for output column 'timestamp'
- Cannot find data for output column 'timestampType';
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveOutputRelation$.resolveOutputColumns(Analyzer.scala:2353)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveOutputRelation$$anonfun$apply$26.applyOrElse(Analyzer.scala:2283)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveOutputRelation$$anonfun$apply$26.applyOrElse(Analyzer.scala:2280)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$2(AnalysisHelper.scala:108)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$1(AnalysisHelper.scala:108)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown(AnalysisHelper.scala:106)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown$(AnalysisHelper.scala:104)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDown(LogicalPlan.scala:29)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators(AnalysisHelper.scala:73)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators$(AnalysisHelper.scala:72)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:29)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveOutputRelation$.apply(Analyzer.scala:2280)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveOutputRelation$.apply(Analyzer.scala:2279)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:109)
	at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
	at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
	at scala.collection.immutable.List.foldLeft(List.scala:89)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:106)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:98)
	at scala.collection.immutable.List.foreach(List.scala:392)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:98)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:136)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:130)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:96)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:77)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:77)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:114)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:201)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:113)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:62)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
	at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:60)
	at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:60)
	at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:52)
	at org.apache.spark.sql.execution.QueryExecution.withCachedData$lzycompute(QueryExecution.scala:66)
	at org.apache.spark.sql.execution.QueryExecution.withCachedData(QueryExecution.scala:6
```
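For context, the `AnalysisException` above comes from the analyzer's write-side schema compatibility check (`ResolveOutputRelation`): every output column of the target table must be matched by a query column, and any type difference must be a safe up-cast. The sketch below is a toy model of that check in plain Scala; it is not Spark's actual implementation, and all type and function names here are invented for illustration. It reproduces the same error messages for a query that only produces a `StringType` `value` column while the table expects binary key/value plus metadata columns:

```scala
// Toy model of the write-side schema compatibility check.
// NOT Spark's implementation -- names and rules are simplified for illustration.
sealed trait DataType
case object StringType extends DataType
case object BinaryType extends DataType
case object LongType   extends DataType

case class Field(name: String, dataType: DataType)

object OutputCheck {
  // Treat only exact type matches as safe; in particular String -> Binary
  // is rejected, matching the "Cannot safely cast" message in the trace.
  private def canSafelyCast(from: DataType, to: DataType): Boolean =
    from == to

  /** Returns the list of problems; empty means the write is compatible. */
  def validate(tableSchema: Seq[Field], querySchema: Seq[Field]): Seq[String] = {
    val byName = querySchema.map(f => f.name -> f).toMap
    tableSchema.flatMap { out =>
      byName.get(out.name) match {
        case None =>
          Some(s"Cannot find data for output column '${out.name}'")
        case Some(in) if !canSafelyCast(in.dataType, out.dataType) =>
          Some(s"Cannot safely cast '${out.name}': ${in.dataType} to ${out.dataType}")
        case _ => None
      }
    }
  }
}

// The table expects key/value/partition, but the query only produces a
// StringType 'value', so three problems are reported.
val kafkaTable = Seq(Field("key", BinaryType), Field("value", BinaryType),
                     Field("partition", LongType))
val query      = Seq(Field("value", StringType))
val problems   = OutputCheck.validate(kafkaTable, query)
problems.foreach(println)
```

The practical consequence for the Kafka sink is the same as what the real check enforces: the incoming query must either supply every table column or the framework must allow writing a subset, and `value` must already be binary before the write.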
[GitHub] [spark] dongjoon-hyun commented on issue #24686: [SPARK-27813][SQL] DataSourceV2: Add DropTable logical operation
dongjoon-hyun commented on issue #24686: [SPARK-27813][SQL] DataSourceV2: Add DropTable logical operation URL: https://github.com/apache/spark/pull/24686#issuecomment-497064751 Thank you for pinging me, @rdblue . Yep. I'll take a look, too~
[GitHub] [spark] SparkQA commented on issue #24634: [SPARK-27361][YARN][test-hadoop3.2] YARN support for GPU-aware scheduling
SparkQA commented on issue #24634: [SPARK-27361][YARN][test-hadoop3.2] YARN support for GPU-aware scheduling URL: https://github.com/apache/spark/pull/24634#issuecomment-497068566 **[Test build #105926 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105926/testReport)** for PR 24634 at commit [`dcc1bbe`](https://github.com/apache/spark/commit/dcc1bbe93ba00de05a6f2f159a29ed83b98bb3a5).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on issue #24634: [SPARK-27361][YARN][test-hadoop3.2] YARN support for GPU-aware scheduling
AmplabJenkins commented on issue #24634: [SPARK-27361][YARN][test-hadoop3.2] YARN support for GPU-aware scheduling URL: https://github.com/apache/spark/pull/24634#issuecomment-497068697 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105926/