[GitHub] [spark] AmplabJenkins commented on issue #24704: [SPARK-20286][core] Improve logic for timing out executors in dynamic allocation.
AmplabJenkins commented on issue #24704: [SPARK-20286][core] Improve logic for timing out executors in dynamic allocation. URL: https://github.com/apache/spark/pull/24704#issuecomment-497029953 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105918/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #24704: [SPARK-20286][core] Improve logic for timing out executors in dynamic allocation.
AmplabJenkins commented on issue #24704: [SPARK-20286][core] Improve logic for timing out executors in dynamic allocation. URL: https://github.com/apache/spark/pull/24704#issuecomment-497029951 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24704: [SPARK-20286][core] Improve logic for timing out executors in dynamic allocation.
AmplabJenkins removed a comment on issue #24704: [SPARK-20286][core] Improve logic for timing out executors in dynamic allocation. URL: https://github.com/apache/spark/pull/24704#issuecomment-497029953 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105918/
[GitHub] [spark] SparkQA removed a comment on issue #24704: [SPARK-20286][core] Improve logic for timing out executors in dynamic allocation.
SparkQA removed a comment on issue #24704: [SPARK-20286][core] Improve logic for timing out executors in dynamic allocation. URL: https://github.com/apache/spark/pull/24704#issuecomment-496979565 **[Test build #105918 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105918/testReport)** for PR 24704 at commit [`7927239`](https://github.com/apache/spark/commit/7927239f81297fc9d90aa09feabebdc130d92f75).
[GitHub] [spark] AmplabJenkins removed a comment on issue #24704: [SPARK-20286][core] Improve logic for timing out executors in dynamic allocation.
AmplabJenkins removed a comment on issue #24704: [SPARK-20286][core] Improve logic for timing out executors in dynamic allocation. URL: https://github.com/apache/spark/pull/24704#issuecomment-497029951 Merged build finished. Test PASSed.
[GitHub] [spark] dongjoon-hyun closed pull request #24733: [SPARK-27869][CORE] Redact sensitive information in System Properties from UI
dongjoon-hyun closed pull request #24733: [SPARK-27869][CORE] Redact sensitive information in System Properties from UI URL: https://github.com/apache/spark/pull/24733
[GitHub] [spark] SparkQA commented on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled
SparkQA commented on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled URL: https://github.com/apache/spark/pull/24677#issuecomment-497036774 **[Test build #105915 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105915/testReport)** for PR 24677 at commit [`d9936d5`](https://github.com/apache/spark/commit/d9936d5084b71f4a8736d035fbe1fa3afade6966). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled
SparkQA removed a comment on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled URL: https://github.com/apache/spark/pull/24677#issuecomment-496964895 **[Test build #105915 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105915/testReport)** for PR 24677 at commit [`d9936d5`](https://github.com/apache/spark/commit/d9936d5084b71f4a8736d035fbe1fa3afade6966).
[GitHub] [spark] AmplabJenkins commented on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled
AmplabJenkins commented on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled URL: https://github.com/apache/spark/pull/24677#issuecomment-497037495 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105915/
[GitHub] [spark] AmplabJenkins commented on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled
AmplabJenkins commented on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled URL: https://github.com/apache/spark/pull/24677#issuecomment-497037484 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled
AmplabJenkins removed a comment on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled URL: https://github.com/apache/spark/pull/24677#issuecomment-497037495 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105915/
[GitHub] [spark] AmplabJenkins removed a comment on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled
AmplabJenkins removed a comment on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled URL: https://github.com/apache/spark/pull/24677#issuecomment-497037484 Merged build finished. Test PASSed.
[GitHub] [spark] skonto commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s
skonto commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s URL: https://github.com/apache/spark/pull/24703#discussion_r288691773 ## File path: docs/configuration.md ## @@ -206,6 +206,13 @@ of the most common options to set are: name and an array of addresses. + + spark.driver.resource.{resourceName}.vendor Review comment: Is a template harder to use for specifying the gpu limits, as in the example [here](https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/)?
[GitHub] [spark] liyinan926 commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s
liyinan926 commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s URL: https://github.com/apache/spark/pull/24703#discussion_r288694830 ## File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala ## @@ -129,6 +132,7 @@ private[spark] class BasicDriverFeatureStep(conf: KubernetesDriverConf) .addToLimits(maybeCpuLimitQuantity.toMap.asJava) .addToRequests("memory", driverMemoryQuantity) .addToLimits("memory", driverMemoryQuantity) +.addToLimits(driverResourceQuantities.toMap.asJava) Review comment: Why only apply them to the limits?
[GitHub] [spark] liyinan926 commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s
liyinan926 commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s URL: https://github.com/apache/spark/pull/24703#discussion_r288693711 ## File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala ## @@ -199,4 +199,13 @@ private[spark] object KubernetesConf { .replaceAll("[^a-z0-9\\-]", "") .replaceAll("-+", "-") } + + /** + * Build a resources name based on the vendor device plugin naming + * convention of: vendor-domain/resource. For example, an Nvidia GPU is + * advertised as nvidia.com/gpu. + */ + def buildKubernetesResourceName(vendor: String, resourceName: String): String = { +s"${vendor}/${resourceName}" Review comment: The Kubernetes resource name uses the format `<vendor>.com/gpu`. Is `vendor` expected to contain the full domain name like `nvidia.com`, or just the vendor name?
[GitHub] [spark] SparkQA commented on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled
SparkQA commented on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled URL: https://github.com/apache/spark/pull/24677#issuecomment-497044695 **[Test build #105916 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105916/testReport)** for PR 24677 at commit [`ccfeb9e`](https://github.com/apache/spark/commit/ccfeb9e408b3fd804c0f68308da7ca0adf3094b5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled
SparkQA removed a comment on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled URL: https://github.com/apache/spark/pull/24677#issuecomment-496972265 **[Test build #105916 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105916/testReport)** for PR 24677 at commit [`ccfeb9e`](https://github.com/apache/spark/commit/ccfeb9e408b3fd804c0f68308da7ca0adf3094b5).
[GitHub] [spark] AmplabJenkins commented on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled
AmplabJenkins commented on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled URL: https://github.com/apache/spark/pull/24677#issuecomment-497045554 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105916/
[GitHub] [spark] AmplabJenkins commented on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled
AmplabJenkins commented on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled URL: https://github.com/apache/spark/pull/24677#issuecomment-497045538 Merged build finished. Test PASSed.
[GitHub] [spark] mwlon commented on issue #24713: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs
mwlon commented on issue #24713: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs URL: https://github.com/apache/spark/pull/24713#issuecomment-497046845 Ok. If we only want the functional part of SPARK-26192 and not the other associated changes, then close this PR and I'll make a new one later.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled
AmplabJenkins removed a comment on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled URL: https://github.com/apache/spark/pull/24677#issuecomment-497045554 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105916/
[GitHub] [spark] AmplabJenkins removed a comment on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled
AmplabJenkins removed a comment on issue #24677: [SPARK-27805][PYTHON] Propagate SparkExceptions during toPandas with arrow enabled URL: https://github.com/apache/spark/pull/24677#issuecomment-497045538 Merged build finished. Test PASSed.
[GitHub] [spark] IvanVergiliev commented on a change in pull request #24068: [SPARK-27105][SQL] Optimize away exponential complexity in ORC predicate conversion
IvanVergiliev commented on a change in pull request #24068: [SPARK-27105][SQL] Optimize away exponential complexity in ORC predicate conversion URL: https://github.com/apache/spark/pull/24068#discussion_r288700718 ## File path: sql/core/v1.2.1/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilters.scala ## @@ -63,55 +64,26 @@ private[sql] object OrcFilters extends OrcFiltersBase { */ def createFilter(schema: StructType, filters: Seq[Filter]): Option[SearchArgument] = { val dataTypeMap = schema.map(f => f.name -> f.dataType).toMap +val orcFilterConverter = new OrcFilterConverter(dataTypeMap) for { // Combines all convertible filters using `And` to produce a single conjunction - conjunction <- buildTree(convertibleFilters(schema, dataTypeMap, filters)) + conjunction <- buildTree(filters.flatMap(orcFilterConverter.trimUnconvertibleFilters)) // Then tries to build a single ORC `SearchArgument` for the conjunction predicate - builder <- buildSearchArgument(dataTypeMap, conjunction, newBuilder) + builder <- orcFilterConverter.buildSearchArgument(conjunction, newBuilder) Review comment: I'm not sure which loop you're referring to here. The `for-yield` is used for dealing with the `Option`s that both of these methods can return, and since they can still return Options we still need the for-comprehension. We could create a separate function that takes in a trimmed filter and just handles the building, but I'm not sure how useful that would be. I rather like the isolation, and the slight guarantee that `build` will only be called on a trimmed filter, that having one common `buildSearchArgument` method gives us. Let me know if you had something else in mind.
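[Editor's note] The Option-composing behavior of the for-yield discussed above can be sketched with stand-in types (this is not the real OrcFilters API; `buildTree` and `buildSearchArgument` here are simplified hypothetical versions that merely mimic the Option-returning shape):

```scala
// Stand-in sketch: both steps return Option, and the for-comprehension
// yields Some only when every step succeeds, None as soon as one fails.
object OptionComposition {
  // Simplified stand-in: combine filters into one conjunction, None if empty.
  def buildTree(filters: Seq[String]): Option[String] =
    if (filters.isEmpty) None else Some(filters.mkString(" AND "))

  // Simplified stand-in: "build" a search argument from the conjunction.
  def buildSearchArgument(conjunction: String): Option[String] =
    Some(s"sarg[$conjunction]")

  def main(args: Array[String]): Unit = {
    val result: Option[String] = for {
      conjunction <- buildTree(Seq("a = 1", "b > 2"))
      sarg        <- buildSearchArgument(conjunction)
    } yield sarg
    println(result) // Some(sarg[a = 1 AND b > 2])

    // With no convertible filters, the whole pipeline short-circuits to None.
    println(for {
      c <- buildTree(Seq.empty)
      s <- buildSearchArgument(c)
    } yield s) // None
  }
}
```

This is why the for-comprehension stays even after the refactor: it is handling Option failure at each stage, not iterating.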
[GitHub] [spark] dongjoon-hyun commented on issue #24713: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs
dongjoon-hyun commented on issue #24713: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs URL: https://github.com/apache/spark/pull/24713#issuecomment-497048471 Yep. Thanks.
[GitHub] [spark] dongjoon-hyun closed pull request #24713: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs
dongjoon-hyun closed pull request #24713: [SPARK-26192][MESOS][2.4] Retrieve enableFetcherCache option from submission for driver URIs URL: https://github.com/apache/spark/pull/24713
[GitHub] [spark] dongjoon-hyun commented on issue #24728: [SPARK-27737][SQL][FOLLOW-UP] Move sql/hive-thriftserver/v2.3.4 to sql/hive-thriftserver/v2.3.5
dongjoon-hyun commented on issue #24728: [SPARK-27737][SQL][FOLLOW-UP] Move sql/hive-thriftserver/v2.3.4 to sql/hive-thriftserver/v2.3.5 URL: https://github.com/apache/spark/pull/24728#issuecomment-497048212 Hi, @juliuszsompolski. This PR is okay because it clearly says `move`, not `upgrading`, and we are in the middle of a huge transition in #24628. I merged this intentionally.
[GitHub] [spark] tgravescs commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s
tgravescs commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s URL: https://github.com/apache/spark/pull/24703#discussion_r288702763 ## File path: docs/configuration.md ## @@ -206,6 +206,13 @@ of the most common options to set are: name and an array of addresses. + + spark.driver.resource.{resourceName}.vendor Review comment: I'm not sure what you mean by "template is harder to use"? Here you specify the spark conf and we translate it into the k8s one.
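[Editor's note] A minimal sketch of the conf-to-k8s translation described in the comment above. The helper `k8sLimitKey` is hypothetical (not Spark code); only the `spark.{component}.resource.{resourceName}.vendor`/`.amount` property pattern comes from the discussion:

```scala
// Hypothetical sketch: a Spark conf names a resource and its vendor domain,
// and the K8s backend turns that into an extended-resource limit entry.
object ConfToK8s {
  def k8sLimitKey(conf: Map[String, String],
                  component: String,
                  resource: String): (String, String) = {
    val vendor = conf(s"spark.$component.resource.$resource.vendor")
    val amount = conf(s"spark.$component.resource.$resource.amount")
    // Kubernetes device-plugin resources are keyed as vendor-domain/resource.
    s"$vendor/$resource" -> amount
  }

  def main(args: Array[String]): Unit = {
    val conf = Map(
      "spark.driver.resource.gpu.vendor" -> "nvidia.com",
      "spark.driver.resource.gpu.amount" -> "2")
    println(k8sLimitKey(conf, "driver", "gpu")) // (nvidia.com/gpu,2)
  }
}
```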
[GitHub] [spark] tgravescs commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s
tgravescs commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s URL: https://github.com/apache/spark/pull/24703#discussion_r288703148 ## File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala ## @@ -199,4 +199,13 @@ private[spark] object KubernetesConf { .replaceAll("[^a-z0-9\\-]", "") .replaceAll("-+", "-") } + + /** + * Build a resources name based on the vendor device plugin naming + * convention of: vendor-domain/resource. For example, an Nvidia GPU is + * advertised as nvidia.com/gpu. + */ + def buildKubernetesResourceName(vendor: String, resourceName: String): String = { +s"${vendor}/${resourceName}" Review comment: It's meant to be vendor and domain, like the comment says. I can update the name of the variable if that makes it clearer?
[GitHub] [spark] liyinan926 commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s
liyinan926 commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s URL: https://github.com/apache/spark/pull/24703#discussion_r288704228 ## File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala ## @@ -199,4 +199,13 @@ private[spark] object KubernetesConf { .replaceAll("[^a-z0-9\\-]", "") .replaceAll("-+", "-") } + + /** + * Build a resources name based on the vendor device plugin naming + * convention of: vendor-domain/resource. For example, an Nvidia GPU is + * advertised as nvidia.com/gpu. + */ + def buildKubernetesResourceName(vendor: String, resourceName: String): String = { +s"${vendor}/${resourceName}" Review comment: What's the reason for requiring the full vendor domain name?
[GitHub] [spark] tgravescs commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s
tgravescs commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s URL: https://github.com/apache/spark/pull/24703#discussion_r288704248 ## File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala ## @@ -129,6 +132,7 @@ private[spark] class BasicDriverFeatureStep(conf: KubernetesDriverConf) .addToLimits(maybeCpuLimitQuantity.toMap.asJava) .addToRequests("memory", driverMemoryQuantity) .addToLimits("memory", driverMemoryQuantity) +.addToLimits(driverResourceQuantities.toMap.asJava) Review comment: I'm going by https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/, which says GPUs are only supposed to be specified in the limits section. I'm also assuming that all resource plugins use limits, going by https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/, where it just has:

containers:
  - name: demo-container-1
    image: k8s.gcr.io/pause:2.0
    resources:
      limits:
        vendor-domain/resource: 2 # requesting 2 vendor-domain/resource

If there is other documentation that says otherwise, please let me know.
[GitHub] [spark] tgravescs commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s
tgravescs commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s URL: https://github.com/apache/spark/pull/24703#discussion_r288704815 ## File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala ## @@ -199,4 +199,13 @@ private[spark] object KubernetesConf { .replaceAll("[^a-z0-9\\-]", "") .replaceAll("-+", "-") } + + /** + * Build a resources name based on the vendor device plugin naming + * convention of: vendor-domain/resource. For example, an Nvidia GPU is + * advertised as nvidia.com/gpu. + */ + def buildKubernetesResourceName(vendor: String, resourceName: String): String = { +s"${vendor}/${resourceName}" Review comment: From reading the docs, that is what k8s takes; if I am wrong, please point me to the correct docs.
[GitHub] [spark] tgravescs commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s
tgravescs commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s URL: https://github.com/apache/spark/pull/24703#discussion_r288704913 ## File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala ## @@ -199,4 +199,13 @@ private[spark] object KubernetesConf { .replaceAll("[^a-z0-9\\-]", "") .replaceAll("-+", "-") } + + /** + * Build a resources name based on the vendor device plugin naming + * convention of: vendor-domain/resource. For example, an Nvidia GPU is + * advertised as nvidia.com/gpu. + */ + def buildKubernetesResourceName(vendor: String, resourceName: String): String = { +s"${vendor}/${resourceName}" Review comment: https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/
[GitHub] [spark] tgravescs commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s
tgravescs commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s URL: https://github.com/apache/spark/pull/24703#discussion_r288705075 ## File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala ## @@ -199,4 +199,13 @@ private[spark] object KubernetesConf { .replaceAll("[^a-z0-9\\-]", "") .replaceAll("-+", "-") } + + /** + * Build a resources name based on the vendor device plugin naming + * convention of: vendor-domain/resource. For example, an Nvidia GPU is + * advertised as nvidia.com/gpu. + */ + def buildKubernetesResourceName(vendor: String, resourceName: String): String = { +s"${vendor}/${resourceName}" Review comment: If we don't allow that, then I don't know how we can be generic for any resource type.
[GitHub] [spark] liyinan926 commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s
liyinan926 commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s URL: https://github.com/apache/spark/pull/24703#discussion_r288706042 ## File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala ## @@ -199,4 +199,13 @@ private[spark] object KubernetesConf { .replaceAll("[^a-z0-9\\-]", "") .replaceAll("-+", "-") } + + /** + * Build a resources name based on the vendor device plugin naming + * convention of: vendor-domain/resource. For example, an Nvidia GPU is + * advertised as nvidia.com/gpu. + */ + def buildKubernetesResourceName(vendor: String, resourceName: String): String = { +s"${vendor}/${resourceName}" Review comment: https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/. It seems from the doc that only the vendor name needs to be configurable.
[GitHub] [spark] rdblue commented on issue #24723: [SPARK-27857][SQL] Move ALTER TABLE parsing into Catalyst
rdblue commented on issue #24723: [SPARK-27857][SQL] Move ALTER TABLE parsing into Catalyst URL: https://github.com/apache/spark/pull/24723#issuecomment-497053324 Retest this please.
[GitHub] [spark] AmplabJenkins commented on issue #24723: [SPARK-27857][SQL] Move ALTER TABLE parsing into Catalyst
AmplabJenkins commented on issue #24723: [SPARK-27857][SQL] Move ALTER TABLE parsing into Catalyst URL: https://github.com/apache/spark/pull/24723#issuecomment-497054247 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11175/
[GitHub] [spark] AmplabJenkins commented on issue #24706: [SPARK-23128][SQL] A new approach to do adaptive execution in Spark SQL
AmplabJenkins commented on issue #24706: [SPARK-23128][SQL] A new approach to do adaptive execution in Spark SQL URL: https://github.com/apache/spark/pull/24706#issuecomment-497054283 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11176/
[GitHub] [spark] liyinan926 commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s
liyinan926 commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s URL: https://github.com/apache/spark/pull/24703#discussion_r288707710 ## File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala ## @@ -129,6 +132,7 @@ private[spark] class BasicDriverFeatureStep(conf: KubernetesDriverConf) .addToLimits(maybeCpuLimitQuantity.toMap.asJava) .addToRequests("memory", driverMemoryQuantity) .addToLimits("memory", driverMemoryQuantity) +.addToLimits(driverResourceQuantities.toMap.asJava) Review comment: You are right. The request will be set by default to the limit value.
[GitHub] [spark] AmplabJenkins commented on issue #24723: [SPARK-27857][SQL] Move ALTER TABLE parsing into Catalyst
AmplabJenkins commented on issue #24723: [SPARK-27857][SQL] Move ALTER TABLE parsing into Catalyst URL: https://github.com/apache/spark/pull/24723#issuecomment-497054228 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24706: [SPARK-23128][SQL] A new approach to do adaptive execution in Spark SQL
AmplabJenkins commented on issue #24706: [SPARK-23128][SQL] A new approach to do adaptive execution in Spark SQL URL: https://github.com/apache/spark/pull/24706#issuecomment-497054275 Merged build finished. Test PASSed.
[GitHub] [spark] vanzin commented on issue #24645: [SPARK-27773][Shuffle] add metrics for number of exceptions caught in ExternalShuffleBlockHandler
vanzin commented on issue #24645: [SPARK-27773][Shuffle] add metrics for number of exceptions caught in ExternalShuffleBlockHandler URL: https://github.com/apache/spark/pull/24645#issuecomment-497054560 ok to test
[GitHub] [spark] SparkQA commented on issue #24723: [SPARK-27857][SQL] Move ALTER TABLE parsing into Catalyst
SparkQA commented on issue #24723: [SPARK-27857][SQL] Move ALTER TABLE parsing into Catalyst URL: https://github.com/apache/spark/pull/24723#issuecomment-497054977 **[Test build #105923 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105923/testReport)** for PR 24723 at commit [`36a2bcd`](https://github.com/apache/spark/commit/36a2bcdbce6af6f95481580bf25fbf2231a523b7).
[GitHub] [spark] SparkQA commented on issue #24645: [SPARK-27773][Shuffle] add metrics for number of exceptions caught in ExternalShuffleBlockHandler
SparkQA commented on issue #24645: [SPARK-27773][Shuffle] add metrics for number of exceptions caught in ExternalShuffleBlockHandler URL: https://github.com/apache/spark/pull/24645#issuecomment-497055047 **[Test build #105925 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105925/testReport)** for PR 24645 at commit [`6d733aa`](https://github.com/apache/spark/commit/6d733aa1a84dcb15b96fd004de44d12fbcbe4891).
[GitHub] [spark] SparkQA commented on issue #24706: [SPARK-23128][SQL] A new approach to do adaptive execution in Spark SQL
SparkQA commented on issue #24706: [SPARK-23128][SQL] A new approach to do adaptive execution in Spark SQL URL: https://github.com/apache/spark/pull/24706#issuecomment-497055052 **[Test build #105924 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105924/testReport)** for PR 24706 at commit [`a9b4209`](https://github.com/apache/spark/commit/a9b420982f1db4988a76966b81fb5f43800a0523).
[GitHub] [spark] rednaxelafx commented on a change in pull request #24735: [SPARK-27871][SQL] LambdaVariable should use per-query unique IDs instead of globally unique IDs
rednaxelafx commented on a change in pull request #24735: [SPARK-27871][SQL] LambdaVariable should use per-query unique IDs instead of globally unique IDs URL: https://github.com/apache/spark/pull/24735#discussion_r288701933 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/objects.scala ## @@ -228,3 +228,31 @@ object ObjectSerializerPruning extends Rule[LogicalPlan] { } } } + +/** + * Reassigns per-query unique IDs to `LambdaVariable`s, whose original IDs are globally unique. This + * can help Spark to hit codegen cache more often and improve performance. + */ +object ReassignLambdaVariableID extends Rule[LogicalPlan] { + override def apply(plan: LogicalPlan): LogicalPlan = { +if (!SQLConf.get.getConf(SQLConf.OPTIMIZER_REASSIGN_LAMBDA_VARIABLE_ID)) return plan + +// The original LambdaVariable IDs are all positive. To avoid conflicts, the new IDs are all +// negative and starts with -1. +var newId = -1L +val oldIdToNewId = scala.collection.mutable.Map.empty[Long, Long] + +plan.transformAllExpressions { Review comment: The two traversals on the plan here certainly make the intent clear: one pass to collect old-to-new ID mappings, and another pass to actually do the transformation. But efficiency-wise, these two traversals can easily be combined into one, right? Reducing the number of traversals can save a lot of time when dealing with large plans.
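The single-pass idea suggested in this comment can be illustrated with a small standalone sketch (hypothetical code, not the Catalyst rule itself): assign the next negative ID the first time each old ID is encountered during the rewrite traversal, so no separate collection pass is needed.

```scala
import scala.collection.mutable

// Stand-in for walking a plan's LambdaVariable IDs in a single traversal:
// each previously unseen old ID gets the next negative ID on first sight,
// and later occurrences of the same old ID reuse the existing mapping.
def reassignIds(oldIds: Seq[Long]): Seq[Long] = {
  var nextId = -1L
  val oldToNew = mutable.Map.empty[Long, Long]
  oldIds.map { oldId =>
    oldToNew.getOrElseUpdate(oldId, { val assigned = nextId; nextId -= 1; assigned })
  }
}
```

For example, `reassignIds(Seq(42L, 7L, 42L))` yields `Seq(-1L, -2L, -1L)`: the repeated old ID 42 maps to the same new ID both times, which is exactly the property the two-pass version guarantees.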
[GitHub] [spark] rednaxelafx commented on a change in pull request #24735: [SPARK-27871][SQL] LambdaVariable should use per-query unique IDs instead of globally unique IDs
rednaxelafx commented on a change in pull request #24735: [SPARK-27871][SQL] LambdaVariable should use per-query unique IDs instead of globally unique IDs URL: https://github.com/apache/spark/pull/24735#discussion_r288700780 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/objects.scala ## @@ -228,3 +228,31 @@ object ObjectSerializerPruning extends Rule[LogicalPlan] { } } } + +/** + * Reassigns per-query unique IDs to `LambdaVariable`s, whose original IDs are globally unique. This + * can help Spark to hit codegen cache more often and improve performance. + */ +object ReassignLambdaVariableID extends Rule[LogicalPlan] { + override def apply(plan: LogicalPlan): LogicalPlan = { +if (!SQLConf.get.getConf(SQLConf.OPTIMIZER_REASSIGN_LAMBDA_VARIABLE_ID)) return plan + +// The original LambdaVariable IDs are all positive. To avoid conflicts, the new IDs are all +// negative and starts with -1. Review comment: Nit: "starts from"
[GitHub] [spark] rednaxelafx commented on a change in pull request #24735: [SPARK-27871][SQL] LambdaVariable should use per-query unique IDs instead of globally unique IDs
rednaxelafx commented on a change in pull request #24735: [SPARK-27871][SQL] LambdaVariable should use per-query unique IDs instead of globally unique IDs URL: https://github.com/apache/spark/pull/24735#discussion_r288705981 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ## @@ -595,12 +623,15 @@ case class LambdaVariable( } override def genCode(ctx: CodegenContext): ExprCode = { -val isNullValue = if (nullable) { - JavaCode.isNullVariable(isNull) +// The id can be negative if it's reassigned by `ReassignLambdaVariableID`. +val suffix = "lambda_variable_" + math.abs(id) Review comment: There's an implicit assumption here that, within a plan, all the `LambdaVariable`s hold either original IDs (positive) or reassigned ones (negative). We should probably add a comment on that, because if positive and negative IDs are mixed together, you can actually get a conflict when you do `abs`.
[GitHub] [spark] rednaxelafx commented on a change in pull request #24735: [SPARK-27871][SQL] LambdaVariable should use per-query unique IDs instead of globally unique IDs
rednaxelafx commented on a change in pull request #24735: [SPARK-27871][SQL] LambdaVariable should use per-query unique IDs instead of globally unique IDs URL: https://github.com/apache/spark/pull/24735#discussion_r288706970 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ## @@ -575,15 +575,43 @@ case class WrapOption(child: Expression, optType: DataType) } } +object LambdaVariable { + private val curId = new java.util.concurrent.atomic.AtomicLong() + + // Returns the codegen-ed `LambdaVariable` and add it to mutable states, so that it can be Review comment: I don't like this part. It might make codegen a bit easier to write, but you're unnecessarily hoisting local variables into Java object fields. That doesn't sound like a good idea to me.
[GitHub] [spark] rdblue commented on issue #24689: [SPARK-26946][SQL][FOLLOWUP] Require lookup function
rdblue commented on issue #24689: [SPARK-26946][SQL][FOLLOWUP] Require lookup function URL: https://github.com/apache/spark/pull/24689#issuecomment-497055324 Looks good to me. One minor comment: it doesn't look like Analyzer implementations created in BaseSessionStateBuilder actually override the lookup method to call the session's catalog lookup. I would probably include that in this PR, but we can also add it in later PRs when it is used. +1, I think this is ready to merge either way. @cloud-fan, @dongjoon-hyun, could you take a look at this DSv2 PR? Thanks!
[GitHub] [spark] rednaxelafx commented on a change in pull request #24735: [SPARK-27871][SQL] LambdaVariable should use per-query unique IDs instead of globally unique IDs
rednaxelafx commented on a change in pull request #24735: [SPARK-27871][SQL] LambdaVariable should use per-query unique IDs instead of globally unique IDs URL: https://github.com/apache/spark/pull/24735#discussion_r288692135 ## File path: sql/core/src/test/scala/org/apache/spark/sql/DatasetOptimizationSuite.scala ## @@ -166,4 +164,42 @@ class DatasetOptimizationSuite extends QueryTest with SharedSQLContext { checkAnswer(df, Seq(Row("1"), Row("2"), Row("3"))) } } + + test("SPARK-27871: Dataset encoder should benefit from codegen cache") { +def checkCodegenCache(createDataset: () => Dataset[_]): Unit = { + val count1 = CodegenMetrics.METRIC_COMPILATION_TIME.getCount() Review comment: My earlier PR that added the whole-stage codegen ID used another metric for the same purpose: https://github.com/apache/spark/commit/e57f394818b0a62f99609e1032fede7e981f306f#diff-0314224342bb8c30143ab784b3805d19R296 Should we try to make them use the exact same logic for checking whether or not the codegen cache was hit?
[GitHub] [spark] AmplabJenkins removed a comment on issue #24706: [SPARK-23128][SQL] A new approach to do adaptive execution in Spark SQL
AmplabJenkins removed a comment on issue #24706: [SPARK-23128][SQL] A new approach to do adaptive execution in Spark SQL URL: https://github.com/apache/spark/pull/24706#issuecomment-497054275 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24645: [SPARK-27773][Shuffle] add metrics for number of exceptions caught in ExternalShuffleBlockHandler
AmplabJenkins removed a comment on issue #24645: [SPARK-27773][Shuffle] add metrics for number of exceptions caught in ExternalShuffleBlockHandler URL: https://github.com/apache/spark/pull/24645#issuecomment-493853645 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins removed a comment on issue #24706: [SPARK-23128][SQL] A new approach to do adaptive execution in Spark SQL
AmplabJenkins removed a comment on issue #24706: [SPARK-23128][SQL] A new approach to do adaptive execution in Spark SQL URL: https://github.com/apache/spark/pull/24706#issuecomment-497054283 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11176/
[GitHub] [spark] AmplabJenkins removed a comment on issue #24723: [SPARK-27857][SQL] Move ALTER TABLE parsing into Catalyst
AmplabJenkins removed a comment on issue #24723: [SPARK-27857][SQL] Move ALTER TABLE parsing into Catalyst URL: https://github.com/apache/spark/pull/24723#issuecomment-497054247 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11175/
[GitHub] [spark] AmplabJenkins removed a comment on issue #24723: [SPARK-27857][SQL] Move ALTER TABLE parsing into Catalyst
AmplabJenkins removed a comment on issue #24723: [SPARK-27857][SQL] Move ALTER TABLE parsing into Catalyst URL: https://github.com/apache/spark/pull/24723#issuecomment-497054228 Merged build finished. Test PASSed.
[GitHub] [spark] rdblue edited a comment on issue #24689: [SPARK-26946][SQL][FOLLOWUP] Require lookup function
rdblue edited a comment on issue #24689: [SPARK-26946][SQL][FOLLOWUP] Require lookup function URL: https://github.com/apache/spark/pull/24689#issuecomment-497055324 Looks good to me. One minor comment: it doesn't look like Analyzer implementations created in BaseSessionStateBuilder actually override the lookup method to call the session's catalog lookup. I would probably include that in this PR, but we can also add it in later PRs when it is used. +1, I think this is ready to merge either way. @cloud-fan, @dongjoon-hyun, @HyukjinKwon, could you take a look at this DSv2 PR? Thanks!
[GitHub] [spark] tgravescs commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s
tgravescs commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s URL: https://github.com/apache/spark/pull/24703#discussion_r288709084 ## File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala ## @@ -199,4 +199,13 @@ private[spark] object KubernetesConf { .replaceAll("[^a-z0-9\\-]", "") .replaceAll("-+", "-") } + + /** + * Build a resources name based on the vendor device plugin naming + * convention of: vendor-domain/resource. For example, an Nvidia GPU is + * advertised as nvidia.com/gpu. + */ + def buildKubernetesResourceName(vendor: String, resourceName: String): String = { +s"${vendor}/${resourceName}" Review comment: Sorry, maybe I'm missing it, but that doc says:

limits:
  nvidia.com/gpu: 1 # requesting 1 GPU

nvidia is the vendor, .com is the domain, and gpu is the resource. This PR is trying to handle any generic resource type, not just GPUs, so the config has to include both the vendor and the domain. Otherwise we would have to hard-code something that says, for GPUs, use .com as the domain. Again, I'm just going by https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/; maybe the domain is always .com?
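The helper under review can be sketched as a standalone function (a hedged sketch, not the actual Spark code; `vendorDomain` is a renamed parameter for clarity, and the object name is hypothetical):

```scala
object KubernetesResourceNames {
  // Sketch of the helper under discussion: the caller supplies the full
  // vendor domain (e.g. "nvidia.com"), so any device-plugin resource can
  // be addressed without hard-coding a top-level domain for each vendor.
  def buildKubernetesResourceName(vendorDomain: String, resourceName: String): String =
    s"$vendorDomain/$resourceName"
}

// An Nvidia GPU advertised by its device plugin:
// KubernetesResourceNames.buildKubernetesResourceName("nvidia.com", "gpu")
// yields "nvidia.com/gpu", matching the vendor-domain/resource convention.
```

Requiring the full domain is what keeps the function generic: a plugin advertising, say, `amd.com/gpu` or any other `vendor-domain/resource` key works without Spark knowing the vendor's TLD.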
[GitHub] [spark] skambha commented on a change in pull request #24593: [SPARK-27692][SQL] Add new optimizer rule to evaluate the deterministic scala udf only once if all inputs are literals
skambha commented on a change in pull request #24593: [SPARK-27692][SQL] Add new optimizer rule to evaluate the deterministic scala udf only once if all inputs are literals URL: https://github.com/apache/spark/pull/24593#discussion_r288709384 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ## @@ -150,6 +150,16 @@ object SQLConf { } } + val DETERMINISTIC_UDF_FOLD_ENABLED = buildConf("spark.deterministic.udf.folding.enabled") Review comment: I didn't hear back, so I used the name I suggested here in the latest push. Let me know if you have any other comments.
[GitHub] [spark] rdblue commented on a change in pull request #24686: [SPARK-27813][SQL] DataSourceV2: Add DropTable logical operation
rdblue commented on a change in pull request #24686: [SPARK-27813][SQL] DataSourceV2: Add DropTable logical operation URL: https://github.com/apache/spark/pull/24686#discussion_r288709315

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala

## @@ -2195,4 +2195,22 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
     }
   }
+
+  /**
+   * Create a [[sql.DropTableStatement]] command.
+   */
+  override def visitDropTable(ctx: DropTableContext): LogicalPlan = withOrigin(ctx) {
+    sql.DropTableStatement(

Review comment: Nit: I don't think `sql.` is required here.
[GitHub] [spark] liyinan926 commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s
liyinan926 commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s URL: https://github.com/apache/spark/pull/24703#discussion_r288706042 ## File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala (quotes the same `buildKubernetesResourceName` hunk as the comment above) Review comment: https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/. It seems from that doc that only the vendor name needs to be configurable. But the doc you referred to suggests that the resource name can be an arbitrary domain name, so I guess it's safer to use the full domain name.
[GitHub] [spark] skambha commented on issue #24593: [SPARK-27692][SQL] Add new optimizer rule to evaluate the deterministic scala udf only once if all inputs are literals
skambha commented on issue #24593: [SPARK-27692][SQL] Add new optimizer rule to evaluate the deterministic scala udf only once if all inputs are literals URL: https://github.com/apache/spark/pull/24593#issuecomment-497056361 @dongjoon-hyun , I have pushed the changes to address the renames you suggested. Please take a look. Thanks.
[GitHub] [spark] skambha edited a comment on issue #24593: [SPARK-27692][SQL] Add new optimizer rule to evaluate the deterministic scala udf only once if all inputs are literals
skambha edited a comment on issue #24593: [SPARK-27692][SQL] Add new optimizer rule to evaluate the deterministic scala udf only once if all inputs are literals URL: https://github.com/apache/spark/pull/24593#issuecomment-497056361 @dongjoon-hyun , I have pushed the changes to address the renames https://github.com/apache/spark/pull/24593/commits/ee5fa4ed49ad26a4f0a84034c14621b266f2a3f1 you suggested. Please take a look. Thanks.
[GitHub] [spark] liyinan926 commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s
liyinan926 commented on a change in pull request #24703: [SPARK-27362][K8S] Resource Scheduling support for k8s URL: https://github.com/apache/spark/pull/24703#discussion_r288710024 ## File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala (quotes the same `buildKubernetesResourceName` hunk as the comment above) Review comment: Can you update the documentation of the two new options to make it clear that they expect vendor domain names?
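To make the naming convention under discussion concrete, here is a hedged, self-contained sketch of the helper quoted in this thread. The method name and behavior follow the hunk above; the enclosing object name is ours, not Spark's.

```scala
// Sketch of the helper under review: join a vendor domain and a resource name
// into the Kubernetes extended-resource form "vendor-domain/resource".
// The enclosing object is illustrative; the real method lives in KubernetesConf.
object KubernetesResourceNames {
  def buildKubernetesResourceName(vendor: String, resourceName: String): String =
    s"$vendor/$resourceName"
}
```

Passing the full vendor domain `nvidia.com` with resource `gpu` yields `nvidia.com/gpu`, matching the device-plugin advertisement the reviewers discuss, which is why the config is expected to carry the domain and not just the vendor name.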
[GitHub] [spark] rdblue commented on a change in pull request #24686: [SPARK-27813][SQL] DataSourceV2: Add DropTable logical operation
rdblue commented on a change in pull request #24686: [SPARK-27813][SQL] DataSourceV2: Add DropTable logical operation URL: https://github.com/apache/spark/pull/24686#discussion_r288710143

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceResolution.scala

## @@ -83,6 +84,15 @@ case class DataSourceResolution(
           s"No catalog specified for table ${identifier.quoted} and no default catalog is set"))
         .asTableCatalog
       convertCTAS(catalog, identifier, create)
+
+    case DropTableStatement(CatalogObjectIdentifier(Some(catalog), ident), ifExists, _) =>
+      DropTable(catalog.asTableCatalog, ident, ifExists)
+
+    case DropTableStatement(AsTableIdentifier(tableName), ifExists, purge) =>
+      DropTableCommand(tableName, ifExists, false, purge)

Review comment: Boolean arguments should use the named form: `isView = false`.
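The style point in this review can be shown with a small stand-in for the command. The case class below is hypothetical (the real `DropTableCommand` takes a `TableIdentifier`); it only mirrors the flag layout to illustrate the named-argument form being requested.

```scala
// Hypothetical stand-in mirroring the flag layout of DropTableCommand,
// used only to show the named-argument style the reviewer asks for.
case class DropTableCommand(
    tableName: String,
    ifExists: Boolean,
    isView: Boolean,
    purge: Boolean)

// Positional booleans are opaque at the call site: which flag is which?
val positional = DropTableCommand("t", true, false, false)

// Naming the flag documents intent, as requested in the review:
val named = DropTableCommand("t", ifExists = true, isView = false, purge = false)
```

Both calls build the same value; the named form simply makes the middle `false` self-describing to a reader scanning the pattern match.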
[GitHub] [spark] AmplabJenkins commented on issue #24645: [SPARK-27773][Shuffle] add metrics for number of exceptions caught in ExternalShuffleBlockHandler
AmplabJenkins commented on issue #24645: [SPARK-27773][Shuffle] add metrics for number of exceptions caught in ExternalShuffleBlockHandler URL: https://github.com/apache/spark/pull/24645#issuecomment-497057576 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11177/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24593: [SPARK-27692][SQL] Add new optimizer rule to evaluate the deterministic scala udf only once if all inputs are literals
AmplabJenkins commented on issue #24593: [SPARK-27692][SQL] Add new optimizer rule to evaluate the deterministic scala udf only once if all inputs are literals URL: https://github.com/apache/spark/pull/24593#issuecomment-497057602 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24593: [SPARK-27692][SQL] Add new optimizer rule to evaluate the deterministic scala udf only once if all inputs are literals
AmplabJenkins commented on issue #24593: [SPARK-27692][SQL] Add new optimizer rule to evaluate the deterministic scala udf only once if all inputs are literals URL: https://github.com/apache/spark/pull/24593#issuecomment-497057608 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11178/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24645: [SPARK-27773][Shuffle] add metrics for number of exceptions caught in ExternalShuffleBlockHandler
AmplabJenkins commented on issue #24645: [SPARK-27773][Shuffle] add metrics for number of exceptions caught in ExternalShuffleBlockHandler URL: https://github.com/apache/spark/pull/24645#issuecomment-497057565 Merged build finished. Test PASSed.
[GitHub] [spark] vanzin commented on issue #24634: [SPARK-27361][YARN][test-hadoop3.2] YARN support for GPU-aware scheduling
vanzin commented on issue #24634: [SPARK-27361][YARN][test-hadoop3.2] YARN support for GPU-aware scheduling URL: https://github.com/apache/spark/pull/24634#issuecomment-497058065 retest this please
[GitHub] [spark] vanzin commented on a change in pull request #24634: [SPARK-27361][YARN] YARN support for GPU-aware scheduling
vanzin commented on a change in pull request #24634: [SPARK-27361][YARN] YARN support for GPU-aware scheduling URL: https://github.com/apache/spark/pull/24634#discussion_r288710015

## File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala

## @@ -39,6 +40,25 @@ object YarnSparkHadoopUtil {
   val MEMORY_OVERHEAD_MIN = 384L
   val ANY_HOST = "*"
+
+  val YARN_GPU_RESOURCE_CONFIG = "yarn.io/gpu"
+  val YARN_FPGA_RESOURCE_CONFIG = "yarn.io/fpga"
+
+  /**
+   * Convert Spark resources into YARN resources.
+   * The only resources we know how to map from spark configs to yarn configs are
+   * gpus and fpgas; everything else the user has to specify in both the
+   * spark.yarn.*.resource and the spark.*.resource configs.
+   */
+  private[yarn] def getYarnResourcesFromSparkResources(
+      confPrefix: String,
+      sparkConf: SparkConf): Map[String, String] = {
+    Map("gpu" -> YARN_GPU_RESOURCE_CONFIG, "fpga" -> YARN_FPGA_RESOURCE_CONFIG).map {
+      case (rName, yarnName) =>
+        val resourceCountSparkConf = s"${confPrefix}${rName}${SPARK_RESOURCE_COUNT_SUFFIX}"
+        (yarnName -> sparkConf.getOption(resourceCountSparkConf).getOrElse("0"))

Review comment: `get(resourceCountSparkConf, "0")`
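A self-contained sketch of the mapping quoted above, with the reviewer's suggestion applied: `SparkConf.get(key, default)` replaces the `getOption(...).getOrElse(...)` chain. A plain `Map` stands in for `SparkConf` here, and the `.count` suffix is an assumption for this sketch (the real value comes from `SPARK_RESOURCE_COUNT_SUFFIX`).

```scala
// Sketch of getYarnResourcesFromSparkResources with the review suggestion
// applied. A Map[String, String] stands in for SparkConf, so getOrElse plays
// the role of SparkConf.get(key, "0").
object YarnResourceMapping {
  val YARN_GPU_RESOURCE_CONFIG = "yarn.io/gpu"
  val YARN_FPGA_RESOURCE_CONFIG = "yarn.io/fpga"
  val SPARK_RESOURCE_COUNT_SUFFIX = ".count" // assumed suffix for this sketch

  def getYarnResourcesFromSparkResources(
      confPrefix: String,
      conf: Map[String, String]): Map[String, String] = {
    Map("gpu" -> YARN_GPU_RESOURCE_CONFIG, "fpga" -> YARN_FPGA_RESOURCE_CONFIG).map {
      case (rName, yarnName) =>
        val resourceCountSparkConf = s"$confPrefix$rName$SPARK_RESOURCE_COUNT_SUFFIX"
        // Mirrors the suggested sparkConf.get(resourceCountSparkConf, "0"):
        yarnName -> conf.getOrElse(resourceCountSparkConf, "0")
    }
  }
}
```

Unmapped resources default to "0", so a user who only configures a GPU count still gets a well-formed FPGA entry rather than a missing key.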
[GitHub] [spark] rdblue commented on issue #24686: [SPARK-27813][SQL] DataSourceV2: Add DropTable logical operation
rdblue commented on issue #24686: [SPARK-27813][SQL] DataSourceV2: Add DropTable logical operation URL: https://github.com/apache/spark/pull/24686#issuecomment-497059525 +1 overall, just one minor style problem with boolean args to fix. I think this is ready to go. You might also mention a couple of things in the PR description:

* This moves parsing of `DROP TABLE` into Catalyst and adds parsed plans, like #24029 did for create
* Like #24029, parsing tests for `DROP TABLE` have been moved to `PlanResolutionSuite` to validate existing behavior, and new tests for the catalyst parser have been added to Catalyst's `DDLParserSuite`

@cloud-fan, @dongjoon-hyun, could you review this DSv2 PR? I think it is ready to merge.
[GitHub] [spark] squito commented on a change in pull request #24704: [SPARK-20286][core] Improve logic for timing out executors in dynamic allocation.
squito commented on a change in pull request #24704: [SPARK-20286][core] Improve logic for timing out executors in dynamic allocation. URL: https://github.com/apache/spark/pull/24704#discussion_r288698370 ## File path: core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala ## @@ -0,0 +1,297 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.scheduler.dynalloc + +import java.util.concurrent.{ConcurrentHashMap, TimeUnit} +import java.util.concurrent.atomic.AtomicLong + +import scala.collection.JavaConverters._ +import scala.collection.mutable + +import org.apache.spark._ +import org.apache.spark.internal.Logging +import org.apache.spark.internal.config._ +import org.apache.spark.scheduler._ +import org.apache.spark.storage.RDDBlockId +import org.apache.spark.util.Clock + +/** + * A monitor for executor activity, used by ExecutorAllocationManager to detect idle executors. 
+ */ +private[spark] class ExecutorMonitor( +conf: SparkConf, +client: ExecutorAllocationClient, +clock: Clock) extends SparkListener with Logging { + + private val idleTimeoutMs = TimeUnit.SECONDS.toMillis( +conf.get(DYN_ALLOCATION_EXECUTOR_IDLE_TIMEOUT)) + private val storageTimeoutMs = TimeUnit.SECONDS.toMillis( +conf.get(DYN_ALLOCATION_CACHED_EXECUTOR_IDLE_TIMEOUT)) + private val fetchFromShuffleSvcEnabled = conf.get(SHUFFLE_SERVICE_ENABLED) && +conf.get(SHUFFLE_SERVICE_FETCH_RDD_ENABLED) + + private val executors = new ConcurrentHashMap[String, Tracker]() + + // The following fields are an optimization to avoid having to scan all executors on every EAM + // schedule interval to find out which ones are timed out. They keep track of when the next + // executor timeout is expected to happen, and the current list of timed out executors. There's + // also a flag that forces the EAM task to recompute the timed out executors, in case some event + // arrives on the listener bus that may cause the current list of timed out executors to change. + // + // There's also per-executor state used for this purpose, so that recomputations can be triggered + // only when really necessary. + // + // Note that this isn't meant to, and cannot, always make the right decision about which executors + // are indeed timed out. For example, the EAM thread may detect a timed out executor while a new + // "task start" event has just been posted to the listener bus and hasn't yet been delivered to + // this listener. There are safeguards in other parts of the code that would prevent that executor + // from being removed. 
+ private var nextTimeout = new AtomicLong(Long.MaxValue) + private var timedOutExecs = Seq.empty[String] + + if (idleTimeoutMs < 0) { +throw new SparkException(s"${DYN_ALLOCATION_EXECUTOR_IDLE_TIMEOUT.key} must be >= 0!") + } + if (storageTimeoutMs < 0) { +throw new SparkException(s"${DYN_ALLOCATION_CACHED_EXECUTOR_IDLE_TIMEOUT.key} must be >= 0!") + } + + def reset(): Unit = { +executors.clear() +nextTimeout.set(Long.MaxValue) +timedOutExecs = Nil + } + + def timedOutExecutors(): Seq[String] = { +val now = clock.getTimeMillis() +if (now >= nextTimeout.get()) { + // Temporarily set the next timeout at Long.MaxValue. This ensures that after + // scanning all executors below, we know when the next timeout for non-timed out + // executors is (whether that update came from the scan, or from a new event + // arriving in a different thread). + nextTimeout.set(Long.MaxValue) + + var newNextTimeout = Long.MaxValue + timedOutExecs = executors.asScala +.filter { case (_, exec) => !exec.pendingRemoval } +.filter { case (_, exec) => + val deadline = exec.timeoutAt + if (deadline > now) { +newNextTimeout = math.min(newNextTimeout, deadline) +exec.timedOut = false +false + } else { +exec.timedOut = true + +// An event arriving while this scan is happening may cause the deadline for +// the executor to change after it was read above. Check the deadline again, +// and if it chang
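The comment block quoted above describes a next-timeout optimization: rather than scanning every executor on each ExecutorAllocationManager tick, the monitor keeps the earliest expected deadline in an `AtomicLong` and only rescans once the clock passes it. Below is a minimal, single-threaded sketch of that pattern; `IdleTimeoutTracker` and its methods are illustrative names, not Spark's API, and the listener-bus wiring, `pendingRemoval` handling, and rescan race checks are omitted.

```scala
import java.util.concurrent.atomic.AtomicLong
import scala.collection.mutable

// Sketch of the "nextTimeout" optimization: remember the earliest upcoming
// deadline and rescan the executor map only once the clock passes it.
class IdleTimeoutTracker {
  private val deadlines = mutable.LinkedHashMap.empty[String, Long] // executorId -> timeoutAt
  private val nextTimeout = new AtomicLong(Long.MaxValue)

  def track(execId: String, timeoutAt: Long): Unit = {
    deadlines(execId) = timeoutAt
    // Pull the shared deadline down if this executor times out sooner.
    nextTimeout.getAndUpdate(cur => math.min(cur, timeoutAt))
  }

  def timedOutExecutors(now: Long): Seq[String] = {
    if (now < nextTimeout.get()) return Nil // fast path: nothing can be due yet

    // Temporarily park at MaxValue, then recompute from the still-pending executors.
    nextTimeout.set(Long.MaxValue)
    val (due, pending) = deadlines.toSeq.partition { case (_, deadline) => deadline <= now }
    val newNext = (Long.MaxValue +: pending.map(_._2)).min
    nextTimeout.getAndUpdate(cur => math.min(cur, newNext))

    // The real monitor keeps timed-out executors until they are actually killed;
    // dropping them here just keeps the sketch small.
    due.foreach { case (id, _) => deadlines.remove(id) }
    due.map(_._1)
  }
}
```

The cheap `now < nextTimeout.get()` check is what lets the schedule interval stay O(1) in the common case, which is the stated goal of the fields quoted above.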
[GitHub] [spark] squito commented on a change in pull request #24704: [SPARK-20286][core] Improve logic for timing out executors in dynamic allocation.
squito commented on a change in pull request #24704: [SPARK-20286][core] Improve logic for timing out executors in dynamic allocation. URL: https://github.com/apache/spark/pull/24704#discussion_r288677519 ## File path: core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala (quotes the same `ExecutorMonitor.scala` hunk as the comment above)
[GitHub] [spark] squito commented on a change in pull request #24704: [SPARK-20286][core] Improve logic for timing out executors in dynamic allocation.
squito commented on a change in pull request #24704: [SPARK-20286][core] Improve logic for timing out executors in dynamic allocation.
URL: https://github.com/apache/spark/pull/24704#discussion_r288699085

File path: core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala

@@ -0,0 +1,297 @@

```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.scheduler.dynalloc

import java.util.concurrent.{ConcurrentHashMap, TimeUnit}
import java.util.concurrent.atomic.AtomicLong

import scala.collection.JavaConverters._
import scala.collection.mutable

import org.apache.spark._
import org.apache.spark.internal.Logging
import org.apache.spark.internal.config._
import org.apache.spark.scheduler._
import org.apache.spark.storage.RDDBlockId
import org.apache.spark.util.Clock

/**
 * A monitor for executor activity, used by ExecutorAllocationManager to detect idle executors.
 */
private[spark] class ExecutorMonitor(
    conf: SparkConf,
    client: ExecutorAllocationClient,
    clock: Clock) extends SparkListener with Logging {

  private val idleTimeoutMs = TimeUnit.SECONDS.toMillis(
    conf.get(DYN_ALLOCATION_EXECUTOR_IDLE_TIMEOUT))
  private val storageTimeoutMs = TimeUnit.SECONDS.toMillis(
    conf.get(DYN_ALLOCATION_CACHED_EXECUTOR_IDLE_TIMEOUT))
  private val fetchFromShuffleSvcEnabled = conf.get(SHUFFLE_SERVICE_ENABLED) &&
    conf.get(SHUFFLE_SERVICE_FETCH_RDD_ENABLED)

  private val executors = new ConcurrentHashMap[String, Tracker]()

  // The following fields are an optimization to avoid having to scan all executors on every EAM
  // schedule interval to find out which ones are timed out. They keep track of when the next
  // executor timeout is expected to happen, and the current list of timed out executors. There's
  // also a flag that forces the EAM task to recompute the timed out executors, in case some event
  // arrives on the listener bus that may cause the current list of timed out executors to change.
  //
  // There's also per-executor state used for this purpose, so that recomputations can be triggered
  // only when really necessary.
  //
  // Note that this isn't meant to, and cannot, always make the right decision about which executors
  // are indeed timed out. For example, the EAM thread may detect a timed out executor while a new
  // "task start" event has just been posted to the listener bus and hasn't yet been delivered to
  // this listener. There are safeguards in other parts of the code that would prevent that executor
  // from being removed.
  private var nextTimeout = new AtomicLong(Long.MaxValue)
  private var timedOutExecs = Seq.empty[String]

  if (idleTimeoutMs < 0) {
    throw new SparkException(s"${DYN_ALLOCATION_EXECUTOR_IDLE_TIMEOUT.key} must be >= 0!")
  }
  if (storageTimeoutMs < 0) {
    throw new SparkException(s"${DYN_ALLOCATION_CACHED_EXECUTOR_IDLE_TIMEOUT.key} must be >= 0!")
  }

  def reset(): Unit = {
    executors.clear()
    nextTimeout.set(Long.MaxValue)
    timedOutExecs = Nil
  }

  def timedOutExecutors(): Seq[String] = {
    val now = clock.getTimeMillis()
    if (now >= nextTimeout.get()) {
      // Temporarily set the next timeout at Long.MaxValue. This ensures that after
      // scanning all executors below, we know when the next timeout for non-timed out
      // executors is (whether that update came from the scan, or from a new event
      // arriving in a different thread).
      nextTimeout.set(Long.MaxValue)

      var newNextTimeout = Long.MaxValue
      timedOutExecs = executors.asScala
        .filter { case (_, exec) => !exec.pendingRemoval }
        .filter { case (_, exec) =>
          val deadline = exec.timeoutAt
          if (deadline > now) {
            newNextTimeout = math.min(newNextTimeout, deadline)
            exec.timedOut = false
            false
          } else {
            exec.timedOut = true

            // An event arriving while this scan is happening may cause the deadline for
            // the executor to change after it was read above. Check the deadline again,
            // and if it chang
```
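The fast-path scan described in the comment block above (keep the earliest known deadline in an `AtomicLong` and skip work until the clock reaches it) can be sketched in isolation. This is a simplified, hypothetical model — `IdleTracker` and `TimeoutScanner` are illustrative names, not Spark's actual classes:

```scala
import java.util.concurrent.atomic.AtomicLong
import scala.collection.concurrent.TrieMap

// Hypothetical stand-in for Spark's per-executor Tracker state.
final case class IdleTracker(var timeoutAt: Long, var timedOut: Boolean = false)

final class TimeoutScanner {
  val executors = TrieMap.empty[String, IdleTracker]
  // Earliest deadline seen so far; scans are skipped until the clock reaches it.
  private val nextTimeout = new AtomicLong(Long.MaxValue)

  // Lower the watermark monotonically; concurrent events may race to update it.
  def updateNextTimeout(newValue: Long): Unit = {
    var done = false
    while (!done) {
      val current = nextTimeout.get()
      done = newValue >= current || nextTimeout.compareAndSet(current, newValue)
    }
  }

  def timedOutExecutors(now: Long): Seq[String] = {
    if (now < nextTimeout.get()) return Seq.empty // cheap fast path, no scan
    // Reset to MaxValue so concurrent events can only lower it while we scan.
    nextTimeout.set(Long.MaxValue)
    var newNext = Long.MaxValue
    val expired = Seq.newBuilder[String]
    executors.foreach { case (id, t) =>
      if (t.timeoutAt <= now) {
        t.timedOut = true
        expired += id
      } else {
        t.timedOut = false
        newNext = math.min(newNext, t.timeoutAt)
      }
    }
    updateNextTimeout(newNext)
    expired.result()
  }
}

// Demo: one executor times out at t=100, the other at t=500.
val scanner = new TimeoutScanner
scanner.executors.put("1", IdleTracker(timeoutAt = 100L))
scanner.executors.put("2", IdleTracker(timeoutAt = 500L))
scanner.updateNextTimeout(100L)
assert(scanner.timedOutExecutors(50L).isEmpty)
assert(scanner.timedOutExecutors(200L) == Seq("1"))
assert(scanner.timedOutExecutors(600L).toSet == Set("1", "2"))
```

The point of the design is that the common case (no deadline reached yet) costs one atomic read instead of a full map scan.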
[GitHub] [spark] squito commented on a change in pull request #24704: [SPARK-20286][core] Improve logic for timing out executors in dynamic allocation.
squito commented on a change in pull request #24704: [SPARK-20286][core] Improve logic for timing out executors in dynamic allocation.
URL: https://github.com/apache/spark/pull/24704#discussion_r288695928

File path: core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala

```scala
  if (idleTimeoutMs < 0) {
    throw new SparkException(s"${DYN_ALLOCATION_EXECUTOR_IDLE_TIMEOUT.key} must be >= 0!")
  }
  if (storageTimeoutMs < 0) {
    throw new SparkException(s"${DYN_ALLOCATION_CACHED_EXECUTOR_IDLE_TIMEOUT.key} must be >= 0!")
  }
```

Review comment: these checks could now go on the configs themselves

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
With regards, Apache Git Services
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
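The quoted checks throw from the `ExecutorMonitor` constructor; Spark's internal `ConfigBuilder` can instead attach the validation at the config definition via `checkValue`, which appears to be what the reviewer suggests. A rough, untested sketch against that private API (the exact builder chain and message are unverified assumptions):

```scala
private[spark] val DYN_ALLOCATION_EXECUTOR_IDLE_TIMEOUT =
  ConfigBuilder("spark.dynamicAllocation.executorIdleTimeout")
    .timeConf(TimeUnit.SECONDS)
    .checkValue(_ >= 0L, "Timeout must be >= 0.")
    .createWithDefault(60)
```

With the check on the entry itself, any code reading the config gets the validation for free and the monitor's constructor guards become redundant.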
[GitHub] [spark] squito commented on a change in pull request #24704: [SPARK-20286][core] Improve logic for timing out executors in dynamic allocation.
squito commented on a change in pull request #24704: [SPARK-20286][core] Improve logic for timing out executors in dynamic allocation.
URL: https://github.com/apache/spark/pull/24704#discussion_r288698722

File path: core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala (same new-file hunk as quoted above)
[GitHub] [spark] squito commented on a change in pull request #24704: [SPARK-20286][core] Improve logic for timing out executors in dynamic allocation.
squito commented on a change in pull request #24704: [SPARK-20286][core] Improve logic for timing out executors in dynamic allocation.
URL: https://github.com/apache/spark/pull/24704#discussion_r288713829

File path: core/src/test/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitorSuite.scala

@@ -0,0 +1,253 @@

```scala
package org.apache.spark.scheduler.dynalloc

import java.util.concurrent.TimeUnit

import scala.collection.mutable

import org.mockito.ArgumentMatchers.any
import org.mockito.Mockito.{mock, when}

import org.apache.spark._
import org.apache.spark.internal.config._
import org.apache.spark.scheduler._
import org.apache.spark.storage._
import org.apache.spark.util.ManualClock

class ExecutorMonitorSuite extends SparkFunSuite {

  private val idleTimeoutMs = TimeUnit.SECONDS.toMillis(60L)
  private val storageTimeoutMs = TimeUnit.SECONDS.toMillis(120L)

  private val conf = new SparkConf()
    .set(DYN_ALLOCATION_EXECUTOR_IDLE_TIMEOUT.key, "60s")
    .set(DYN_ALLOCATION_CACHED_EXECUTOR_IDLE_TIMEOUT.key, "120s")

  private var monitor: ExecutorMonitor = _
  private var client: ExecutorAllocationClient = _
  private var clock: ManualClock = _

  // List of known executors. Allows easily mocking which executors are alive without
  // having to use mockito APIs directly in each test.
  private val knownExecs = mutable.HashSet[String]()

  override def beforeEach(): Unit = {
    super.beforeEach()
    knownExecs.clear()
    clock = new ManualClock()
    client = mock(classOf[ExecutorAllocationClient])
    when(client.isExecutorActive(any())).thenAnswer { invocation =>
      knownExecs.contains(invocation.getArguments()(0).asInstanceOf[String])
    }
    monitor = new ExecutorMonitor(conf, client, clock)
  }

  test("basic executor timeout") {
    knownExecs += "1"
    monitor.onExecutorAdded(SparkListenerExecutorAdded(clock.getTimeMillis(), "1", null))
    assert(monitor.executorCount === 1)
    assert(monitor.isExecutorIdle("1"))
    assert(monitor.timedOutExecutors(idleDeadline) === Seq("1"))
  }

  test("SPARK-4951, SPARK-26927: handle out of order task start events") {
    knownExecs ++= Set("1", "2")

    monitor.onTaskStart(SparkListenerTaskStart(1, 1, taskInfo("1", 1)))
    assert(monitor.executorCount === 1)

    monitor.onExecutorAdded(SparkListenerExecutorAdded(clock.getTimeMillis(), "1", null))
    assert(monitor.executorCount === 1)

    monitor.onExecutorAdded(SparkListenerExecutorAdded(clock.getTimeMillis(), "2", null))
    assert(monitor.executorCount === 2)

    monitor.onExecutorRemoved(SparkListenerExecutorRemoved(clock.getTimeMillis(), "2", null))
    assert(monitor.executorCount === 1)

    knownExecs -= "2"

    monitor.onTaskStart(SparkListenerTaskStart(1, 1, taskInfo("2", 2)))
    assert(monitor.executorCount === 1)
  }

  test("track tasks running on executor") {
    knownExecs += "1"

    monitor.onExecutorAdded(SparkListenerExecutorAdded(clock.getTimeMillis(), "1", null))
    monitor.onTaskStart(SparkListenerTaskStart(1, 1, taskInfo("1", 1)))
    assert(!monitor.isExecutorIdle("1"))

    // Start/end a few tasks and make sure the executor does not go idle.
    (2 to 10).foreach { i =>
      monitor.onTaskStart(SparkListenerTaskStart(i, 1, taskInfo("1", 1)))
      assert(!monitor.isExecutorIdle("1"))

      monitor.onTaskEnd(SparkListenerTaskEnd(i, 1, "foo", Success, taskInfo("1", 1), null))
      assert(!monitor.isExecutorIdle("1"))
    }

    monitor.onTaskEnd(SparkListenerTaskEnd(1, 1, "foo", Success, taskInfo("1", 1), null))
    assert(monitor.isExecutorIdle("1"))
    assert(monitor.timedOutExecutors(clock.getTimeMillis()).isEmpty)
    assert(monitor.timedOutExecutors(clock.getTimeMillis() + idleTimeoutMs + 1) === Seq("1"))
  }

  test("use appropriate time out depending on whether blocks are stored") {
    knownExecs += "1"
    monitor.onExecutorAdded(SparkListenerExecutorAdded(clock.getTimeMillis(), "1", null))
    assert(monitor.isExecutorIdle("1"))
    assert(monitor.timedOutExecutors(idleDeadline) === Seq("1"))

    monitor.o
```
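The suite above drives time with Spark's `ManualClock` instead of the wall clock, which makes timeout tests deterministic. The pattern can be shown in isolation with minimal stand-ins (the `Clock`/`ManualClock` traits here only mirror `org.apache.spark.util`'s versions, and `IdleTimer` is a hypothetical component):

```scala
// Minimal stand-ins mirroring org.apache.spark.util.{Clock, ManualClock}.
trait Clock { def getTimeMillis(): Long }

final class ManualClock(private var now: Long = 0L) extends Clock {
  def getTimeMillis(): Long = now
  def advance(ms: Long): Unit = { now += ms }
}

// A component under test only ever reads the injected clock, so tests can
// advance time deterministically instead of sleeping.
final class IdleTimer(clock: Clock, timeoutMs: Long) {
  private var lastActive = clock.getTimeMillis()
  def touch(): Unit = { lastActive = clock.getTimeMillis() }
  def isTimedOut: Boolean = clock.getTimeMillis() - lastActive > timeoutMs
}

val clock = new ManualClock()
val timer = new IdleTimer(clock, timeoutMs = 60000L) // mirrors the 60s idle timeout
assert(!timer.isTimedOut)
clock.advance(60001L)
assert(timer.isTimedOut)
timer.touch() // activity resets the idle window
assert(!timer.isTimedOut)
```

Injecting the clock is the design choice that lets a 60-second timeout be exercised in microseconds of test time.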
[GitHub] [spark] squito commented on a change in pull request #24704: [SPARK-20286][core] Improve logic for timing out executors in dynamic allocation.
squito commented on a change in pull request #24704: [SPARK-20286][core] Improve logic for timing out executors in dynamic allocation.
URL: https://github.com/apache/spark/pull/24704#discussion_r288714016

File path: core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala (same new-file hunk as quoted above)
[GitHub] [spark] AmplabJenkins commented on issue #24634: [SPARK-27361][YARN][test-hadoop3.2] YARN support for GPU-aware scheduling
AmplabJenkins commented on issue #24634: [SPARK-27361][YARN][test-hadoop3.2] YARN support for GPU-aware scheduling
URL: https://github.com/apache/spark/pull/24634#issuecomment-497060571

Test PASSed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11179/
[GitHub] [spark] AmplabJenkins commented on issue #24634: [SPARK-27361][YARN][test-hadoop3.2] YARN support for GPU-aware scheduling
AmplabJenkins commented on issue #24634: [SPARK-27361][YARN][test-hadoop3.2] YARN support for GPU-aware scheduling
URL: https://github.com/apache/spark/pull/24634#issuecomment-497060567

Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #24634: [SPARK-27361][YARN][test-hadoop3.2] YARN support for GPU-aware scheduling
SparkQA commented on issue #24634: [SPARK-27361][YARN][test-hadoop3.2] YARN support for GPU-aware scheduling
URL: https://github.com/apache/spark/pull/24634#issuecomment-497061285

**[Test build #105926 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105926/testReport)** for PR 24634 at commit [`dcc1bbe`](https://github.com/apache/spark/commit/dcc1bbe93ba00de05a6f2f159a29ed83b98bb3a5).
[GitHub] [spark] SparkQA commented on issue #24593: [SPARK-27692][SQL] Add new optimizer rule to evaluate the deterministic scala udf only once if all inputs are literals
SparkQA commented on issue #24593: [SPARK-27692][SQL] Add new optimizer rule to evaluate the deterministic scala udf only once if all inputs are literals
URL: https://github.com/apache/spark/pull/24593#issuecomment-497061299

**[Test build #105927 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105927/testReport)** for PR 24593 at commit [`ee5fa4e`](https://github.com/apache/spark/commit/ee5fa4ed49ad26a4f0a84034c14621b266f2a3f1).
[GitHub] [spark] SparkQA commented on issue #24603: [SPARK-27706][SQL][WEBUI] Add SQL metrics of numOutputRows for BroadcastExchangeExec
SparkQA commented on issue #24603: [SPARK-27706][SQL][WEBUI] Add SQL metrics of numOutputRows for BroadcastExchangeExec
URL: https://github.com/apache/spark/pull/24603#issuecomment-497061335

**[Test build #105919 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105919/testReport)** for PR 24603 at commit [`f230282`](https://github.com/apache/spark/commit/f230282fae680e11e876518eeb91db90b15f2aed).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on issue #24603: [SPARK-27706][SQL][WEBUI] Add SQL metrics of numOutputRows for BroadcastExchangeExec
AmplabJenkins commented on issue #24603: [SPARK-27706][SQL][WEBUI] Add SQL metrics of numOutputRows for BroadcastExchangeExec
URL: https://github.com/apache/spark/pull/24603#issuecomment-497062085

Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24603: [SPARK-27706][SQL][WEBUI] Add SQL metrics of numOutputRows for BroadcastExchangeExec
AmplabJenkins commented on issue #24603: [SPARK-27706][SQL][WEBUI] Add SQL metrics of numOutputRows for BroadcastExchangeExec
URL: https://github.com/apache/spark/pull/24603#issuecomment-497062089

Test PASSed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105919/
[GitHub] [spark] gaborgsomogyi commented on issue #24738: [WIP][SPARK-23098][SQL] Migrate Kafka Batch source to v2.
gaborgsomogyi commented on issue #24738: [WIP][SPARK-23098][SQL] Migrate Kafka Batch source to v2. URL: https://github.com/apache/spark/pull/24738#issuecomment-497063432 @cloud-fan @gatorsmile you're driving the DSv2 effort and have a lot of experience, so I'd like to ask whether some parts are missing from the framework on the sink side, since the following exception is thrown:
```
sbt.ForkMain$ForkError: org.apache.spark.sql.AnalysisException: Cannot write incompatible data to table 'KafkaTable':
- Cannot find data for output column 'key'
- Cannot safely cast 'value': StringType to BinaryType
- Cannot find data for output column 'partition'
- Cannot find data for output column 'offset'
- Cannot find data for output column 'timestamp'
- Cannot find data for output column 'timestampType';
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveOutputRelation$.resolveOutputColumns(Analyzer.scala:2353)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveOutputRelation$$anonfun$apply$26.applyOrElse(Analyzer.scala:2283)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveOutputRelation$$anonfun$apply$26.applyOrElse(Analyzer.scala:2280)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$2(AnalysisHelper.scala:108)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$1(AnalysisHelper.scala:108)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown(AnalysisHelper.scala:106)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown$(AnalysisHelper.scala:104)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDown(LogicalPlan.scala:29)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators(AnalysisHelper.scala:73)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators$(AnalysisHelper.scala:72)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:29)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveOutputRelation$.apply(Analyzer.scala:2280)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveOutputRelation$.apply(Analyzer.scala:2279)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:109)
	at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
	at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
	at scala.collection.immutable.List.foldLeft(List.scala:89)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:106)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:98)
	at scala.collection.immutable.List.foreach(List.scala:392)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:98)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:136)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:130)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:96)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:77)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:77)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:114)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:201)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:113)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:62)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
	at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:60)
	at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:60)
	at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:52)
	at org.apache.spark.sql.execution.QueryExecution.withCachedData$lzycompute(QueryExecution.scala:66)
	at org.apache.spark.sql.execution.QueryExecution.withCachedData(QueryExecution.scala:6
```
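For context, the `AnalysisException` above comes from the analyzer's write-side schema compatibility check (`ResolveOutputRelation`): every output column of the target table must be matched by a query column, and any type difference must be a safe up-cast. The sketch below is a toy model of that check in plain Scala; it is not Spark's actual implementation, and all type and function names here are invented for illustration. It reproduces the same error messages for a query that only produces a `StringType` `value` column while the table expects binary key/value plus metadata columns:

```scala
// Toy model of the write-side schema compatibility check.
// NOT Spark's implementation -- names and rules are simplified for illustration.
sealed trait DataType
case object StringType extends DataType
case object BinaryType extends DataType
case object LongType   extends DataType

case class Field(name: String, dataType: DataType)

object OutputCheck {
  // Treat only exact type matches as safe; in particular String -> Binary
  // is rejected, matching the "Cannot safely cast" message in the trace.
  private def canSafelyCast(from: DataType, to: DataType): Boolean =
    from == to

  /** Returns the list of problems; empty means the write is compatible. */
  def validate(tableSchema: Seq[Field], querySchema: Seq[Field]): Seq[String] = {
    val byName = querySchema.map(f => f.name -> f).toMap
    tableSchema.flatMap { out =>
      byName.get(out.name) match {
        case None =>
          Some(s"Cannot find data for output column '${out.name}'")
        case Some(in) if !canSafelyCast(in.dataType, out.dataType) =>
          Some(s"Cannot safely cast '${out.name}': ${in.dataType} to ${out.dataType}")
        case _ => None
      }
    }
  }
}

// The table expects key/value/partition, but the query only produces a
// StringType 'value', so three problems are reported.
val kafkaTable = Seq(Field("key", BinaryType), Field("value", BinaryType),
                     Field("partition", LongType))
val query      = Seq(Field("value", StringType))
val problems   = OutputCheck.validate(kafkaTable, query)
problems.foreach(println)
```

The practical consequence for the Kafka sink is the same as what the real check enforces: the incoming query must either supply every table column or the framework must allow writing a subset, and `value` must already be binary before the write.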
[GitHub] [spark] dongjoon-hyun commented on issue #24686: [SPARK-27813][SQL] DataSourceV2: Add DropTable logical operation
dongjoon-hyun commented on issue #24686: [SPARK-27813][SQL] DataSourceV2: Add DropTable logical operation URL: https://github.com/apache/spark/pull/24686#issuecomment-497064751 Thank you for pinging me, @rdblue . Yep. I'll take a look, too~
[GitHub] [spark] SparkQA commented on issue #24634: [SPARK-27361][YARN][test-hadoop3.2] YARN support for GPU-aware scheduling
SparkQA commented on issue #24634: [SPARK-27361][YARN][test-hadoop3.2] YARN support for GPU-aware scheduling URL: https://github.com/apache/spark/pull/24634#issuecomment-497068566 **[Test build #105926 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105926/testReport)** for PR 24634 at commit [`dcc1bbe`](https://github.com/apache/spark/commit/dcc1bbe93ba00de05a6f2f159a29ed83b98bb3a5).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on issue #24634: [SPARK-27361][YARN][test-hadoop3.2] YARN support for GPU-aware scheduling
AmplabJenkins commented on issue #24634: [SPARK-27361][YARN][test-hadoop3.2] YARN support for GPU-aware scheduling URL: https://github.com/apache/spark/pull/24634#issuecomment-497068697 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105926/