[GitHub] AmplabJenkins removed a comment on issue #23272: [SPARK-26265][Core] Fix deadlock in BytesToBytesMap.MapIterator when locking both BytesToBytesMap.MapIterator and TaskMemoryManager
AmplabJenkins removed a comment on issue #23272: [SPARK-26265][Core] Fix deadlock in BytesToBytesMap.MapIterator when locking both BytesToBytesMap.MapIterator and TaskMemoryManager URL: https://github.com/apache/spark/pull/23272#issuecomment-445909870 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99915/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins commented on issue #23272: [SPARK-26265][Core] Fix deadlock in BytesToBytesMap.MapIterator when locking both BytesToBytesMap.MapIterator and TaskMemoryManager
AmplabJenkins commented on issue #23272: [SPARK-26265][Core] Fix deadlock in BytesToBytesMap.MapIterator when locking both BytesToBytesMap.MapIterator and TaskMemoryManager URL: https://github.com/apache/spark/pull/23272#issuecomment-445909870 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99915/ Test PASSed.
[GitHub] AmplabJenkins commented on issue #23272: [SPARK-26265][Core] Fix deadlock in BytesToBytesMap.MapIterator when locking both BytesToBytesMap.MapIterator and TaskMemoryManager
AmplabJenkins commented on issue #23272: [SPARK-26265][Core] Fix deadlock in BytesToBytesMap.MapIterator when locking both BytesToBytesMap.MapIterator and TaskMemoryManager URL: https://github.com/apache/spark/pull/23272#issuecomment-445909862 Merged build finished. Test PASSed.
[GitHub] SparkQA removed a comment on issue #23272: [SPARK-26265][Core] Fix deadlock in BytesToBytesMap.MapIterator when locking both BytesToBytesMap.MapIterator and TaskMemoryManager
SparkQA removed a comment on issue #23272: [SPARK-26265][Core] Fix deadlock in BytesToBytesMap.MapIterator when locking both BytesToBytesMap.MapIterator and TaskMemoryManager URL: https://github.com/apache/spark/pull/23272#issuecomment-445815525 **[Test build #99915 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99915/testReport)** for PR 23272 at commit [`9d52320`](https://github.com/apache/spark/commit/9d52320e24077a8c94639aad6b21a4af5d3e83d9).
[GitHub] SparkQA commented on issue #23272: [SPARK-26265][Core] Fix deadlock in BytesToBytesMap.MapIterator when locking both BytesToBytesMap.MapIterator and TaskMemoryManager
SparkQA commented on issue #23272: [SPARK-26265][Core] Fix deadlock in BytesToBytesMap.MapIterator when locking both BytesToBytesMap.MapIterator and TaskMemoryManager URL: https://github.com/apache/spark/pull/23272#issuecomment-445909073 **[Test build #99915 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99915/testReport)** for PR 23272 at commit [`9d52320`](https://github.com/apache/spark/commit/9d52320e24077a8c94639aad6b21a4af5d3e83d9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] AmplabJenkins removed a comment on issue #23272: [SPARK-26265][Core] Fix deadlock in BytesToBytesMap.MapIterator when locking both BytesToBytesMap.MapIterator and TaskMemoryManager
AmplabJenkins removed a comment on issue #23272: [SPARK-26265][Core] Fix deadlock in BytesToBytesMap.MapIterator when locking both BytesToBytesMap.MapIterator and TaskMemoryManager URL: https://github.com/apache/spark/pull/23272#issuecomment-445906449 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99914/ Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #23272: [SPARK-26265][Core] Fix deadlock in BytesToBytesMap.MapIterator when locking both BytesToBytesMap.MapIterator and TaskMemoryManager
AmplabJenkins removed a comment on issue #23272: [SPARK-26265][Core] Fix deadlock in BytesToBytesMap.MapIterator when locking both BytesToBytesMap.MapIterator and TaskMemoryManager URL: https://github.com/apache/spark/pull/23272#issuecomment-445906442 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins commented on issue #23272: [SPARK-26265][Core] Fix deadlock in BytesToBytesMap.MapIterator when locking both BytesToBytesMap.MapIterator and TaskMemoryManager
AmplabJenkins commented on issue #23272: [SPARK-26265][Core] Fix deadlock in BytesToBytesMap.MapIterator when locking both BytesToBytesMap.MapIterator and TaskMemoryManager URL: https://github.com/apache/spark/pull/23272#issuecomment-445906442 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins commented on issue #23272: [SPARK-26265][Core] Fix deadlock in BytesToBytesMap.MapIterator when locking both BytesToBytesMap.MapIterator and TaskMemoryManager
AmplabJenkins commented on issue #23272: [SPARK-26265][Core] Fix deadlock in BytesToBytesMap.MapIterator when locking both BytesToBytesMap.MapIterator and TaskMemoryManager URL: https://github.com/apache/spark/pull/23272#issuecomment-445906449 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99914/ Test PASSed.
[GitHub] SparkQA removed a comment on issue #23272: [SPARK-26265][Core] Fix deadlock in BytesToBytesMap.MapIterator when locking both BytesToBytesMap.MapIterator and TaskMemoryManager
SparkQA removed a comment on issue #23272: [SPARK-26265][Core] Fix deadlock in BytesToBytesMap.MapIterator when locking both BytesToBytesMap.MapIterator and TaskMemoryManager URL: https://github.com/apache/spark/pull/23272#issuecomment-445815520 **[Test build #99914 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99914/testReport)** for PR 23272 at commit [`4c621d2`](https://github.com/apache/spark/commit/4c621d2bd36c50a10591d93ccd77bd7c0432a873).
[GitHub] SparkQA commented on issue #23272: [SPARK-26265][Core] Fix deadlock in BytesToBytesMap.MapIterator when locking both BytesToBytesMap.MapIterator and TaskMemoryManager
SparkQA commented on issue #23272: [SPARK-26265][Core] Fix deadlock in BytesToBytesMap.MapIterator when locking both BytesToBytesMap.MapIterator and TaskMemoryManager URL: https://github.com/apache/spark/pull/23272#issuecomment-445905638 **[Test build #99914 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99914/testReport)** for PR 23272 at commit [`4c621d2`](https://github.com/apache/spark/commit/4c621d2bd36c50a10591d93ccd77bd7c0432a873). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] AmplabJenkins removed a comment on issue #23278: [SPARK-24920][Core] Allow sharing Netty's memory pool allocators
AmplabJenkins removed a comment on issue #23278: [SPARK-24920][Core] Allow sharing Netty's memory pool allocators URL: https://github.com/apache/spark/pull/23278#issuecomment-445905073 Merged build finished. Test PASSed.
[GitHub] SparkQA commented on issue #23278: [SPARK-24920][Core] Allow sharing Netty's memory pool allocators
SparkQA commented on issue #23278: [SPARK-24920][Core] Allow sharing Netty's memory pool allocators URL: https://github.com/apache/spark/pull/23278#issuecomment-445905307 **[Test build #99928 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99928/testReport)** for PR 23278 at commit [`f73bc8f`](https://github.com/apache/spark/commit/f73bc8fde7208c6256303c850c49ffbe22feda07).
[GitHub] AmplabJenkins removed a comment on issue #23278: [SPARK-24920][Core] Allow sharing Netty's memory pool allocators
AmplabJenkins removed a comment on issue #23278: [SPARK-24920][Core] Allow sharing Netty's memory pool allocators URL: https://github.com/apache/spark/pull/23278#issuecomment-445905081 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5934/ Test PASSed.
[GitHub] AmplabJenkins commented on issue #23278: [SPARK-24920][Core] Allow sharing Netty's memory pool allocators
AmplabJenkins commented on issue #23278: [SPARK-24920][Core] Allow sharing Netty's memory pool allocators URL: https://github.com/apache/spark/pull/23278#issuecomment-445905081 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5934/ Test PASSed.
[GitHub] AmplabJenkins commented on issue #23278: [SPARK-24920][Core] Allow sharing Netty's memory pool allocators
AmplabJenkins commented on issue #23278: [SPARK-24920][Core] Allow sharing Netty's memory pool allocators URL: https://github.com/apache/spark/pull/23278#issuecomment-445905073 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #23278: [SPARK-24920][Core] Allow sharing Netty's memory pool allocators
AmplabJenkins removed a comment on issue #23278: [SPARK-24920][Core] Allow sharing Netty's memory pool allocators URL: https://github.com/apache/spark/pull/23278#issuecomment-445902818 Can one of the admins verify this patch?
[GitHub] vanzin commented on issue #23278: [SPARK-24920][Core] Allow sharing Netty's memory pool allocators
vanzin commented on issue #23278: [SPARK-24920][Core] Allow sharing Netty's memory pool allocators URL: https://github.com/apache/spark/pull/23278#issuecomment-445903549 add to whitelist
[GitHub] SparkQA commented on issue #23278: [SPARK-24920][Core] Allow sharing Netty's memory pool allocators
SparkQA commented on issue #23278: [SPARK-24920][Core] Allow sharing Netty's memory pool allocators URL: https://github.com/apache/spark/pull/23278#issuecomment-445903508 **[Test build #99927 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99927/testReport)** for PR 23278 at commit [`f73bc8f`](https://github.com/apache/spark/commit/f73bc8fde7208c6256303c850c49ffbe22feda07).
[GitHub] AmplabJenkins removed a comment on issue #23278: [SPARK-24920][Core] Allow sharing Netty's memory pool allocators
AmplabJenkins removed a comment on issue #23278: [SPARK-24920][Core] Allow sharing Netty's memory pool allocators URL: https://github.com/apache/spark/pull/23278#issuecomment-445902681 Can one of the admins verify this patch?
[GitHub] AmplabJenkins commented on issue #23278: [SPARK-24920][Core] Allow sharing Netty's memory pool allocators
AmplabJenkins commented on issue #23278: [SPARK-24920][Core] Allow sharing Netty's memory pool allocators URL: https://github.com/apache/spark/pull/23278#issuecomment-445902681 Can one of the admins verify this patch?
[GitHub] AmplabJenkins commented on issue #23278: [SPARK-24920][Core] Allow sharing Netty's memory pool allocators
AmplabJenkins commented on issue #23278: [SPARK-24920][Core] Allow sharing Netty's memory pool allocators URL: https://github.com/apache/spark/pull/23278#issuecomment-445902818 Can one of the admins verify this patch?
[GitHub] vanzin commented on a change in pull request #22904: [SPARK-25887][K8S] Configurable K8S context support
vanzin commented on a change in pull request #22904: [SPARK-25887][K8S] Configurable K8S context support
URL: https://github.com/apache/spark/pull/22904#discussion_r240308164

## File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala ##

@@ -67,8 +66,16 @@ private[spark] object SparkKubernetesClientFactory {
     val dispatcher = new Dispatcher(
       ThreadUtils.newDaemonCachedThreadPool("kubernetes-dispatcher"))

-    // TODO [SPARK-25887] Create builder in a way that respects configurable context
-    val config = new ConfigBuilder()
+    // Allow for specifying a context used to auto-configure from the users K8S config file
+    val kubeContext = sparkConf.get(KUBERNETES_CONTEXT).filter(c => StringUtils.isNotBlank(c))
+    logInfo(s"Auto-configuring K8S client using " +
+      s"${if (kubeContext.isEmpty) s"context ${kubeContext.get}" else "current context"}" +
+      s" from users K8S config file")
+
+    // Start from an auto-configured config with the desired context
+    // Fabric 8 uses null to indicate that the users current context should be used so if no
+    // explicit setting pass null
+    val config = new ConfigBuilder(autoConfigure(kubeContext.getOrElse(null)))

Review comment:
> What does client mode mean to you?

Client mode means that the driver process / container is not started by Spark. It's started directly by the user.

> Also - how should one interpret this paragraph in the docs?

I have no idea.
[GitHub] vanzin commented on a change in pull request #22904: [SPARK-25887][K8S] Configurable K8S context support
vanzin commented on a change in pull request #22904: [SPARK-25887][K8S] Configurable K8S context support
URL: https://github.com/apache/spark/pull/22904#discussion_r240308164

## File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala ##

@@ -67,8 +66,16 @@ private[spark] object SparkKubernetesClientFactory {
     val dispatcher = new Dispatcher(
       ThreadUtils.newDaemonCachedThreadPool("kubernetes-dispatcher"))

-    // TODO [SPARK-25887] Create builder in a way that respects configurable context
-    val config = new ConfigBuilder()
+    // Allow for specifying a context used to auto-configure from the users K8S config file
+    val kubeContext = sparkConf.get(KUBERNETES_CONTEXT).filter(c => StringUtils.isNotBlank(c))
+    logInfo(s"Auto-configuring K8S client using " +
+      s"${if (kubeContext.isEmpty) s"context ${kubeContext.get}" else "current context"}" +
+      s" from users K8S config file")
+
+    // Start from an auto-configured config with the desired context
+    // Fabric 8 uses null to indicate that the users current context should be used so if no
+    // explicit setting pass null
+    val config = new ConfigBuilder(autoConfigure(kubeContext.getOrElse(null)))

Review comment:
> What does client mode mean to you?

Client mode means that the driver is not started by Spark. It's started directly by the user.

> Also - how should one interpret this paragraph in the docs?

I have no idea.
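Editorial note on the quoted hunk: as written, the `logInfo` expression calls `kubeContext.get` in the branch where `kubeContext.isEmpty` is true, and `Option.get` on an empty `Option` throws `NoSuchElementException`. A minimal sketch of one way to build the same message and the nullable context argument without calling `get` is shown below. The helper names `contextLogMessage` and `contextOrNull` are hypothetical and not part of the PR, and `trim.nonEmpty` only approximates `StringUtils.isNotBlank`.

```scala
// Hypothetical helpers for illustration only; these names do not appear in the PR.

// Builds the log message from an optional context name.
// Blank values are treated as "not set", approximating the isNotBlank filter in the diff.
def contextLogMessage(configured: Option[String]): String = {
  val kubeContext = configured.filter(_.trim.nonEmpty)
  // map/getOrElse never calls .get, so the empty case cannot throw.
  val which = kubeContext.map(c => s"context $c").getOrElse("current context")
  s"Auto-configuring K8S client using $which from users K8S config file"
}

// The diff's comment says the client takes null to mean "use the current context";
// orNull expresses the same thing as getOrElse(null).
def contextOrNull(configured: Option[String]): String =
  configured.filter(_.trim.nonEmpty).orNull
```

Using `map(...).getOrElse(...)` keeps both branches total, so the choice of message no longer depends on getting an `isEmpty`/`isDefined` test the right way round.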
[GitHub] AmplabJenkins removed a comment on issue #23068: [SPARK-26098][WebUI] Show associated SQL query in Job page
AmplabJenkins removed a comment on issue #23068: [SPARK-26098][WebUI] Show associated SQL query in Job page URL: https://github.com/apache/spark/pull/23068#issuecomment-445901339 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #23068: [SPARK-26098][WebUI] Show associated SQL query in Job page
AmplabJenkins removed a comment on issue #23068: [SPARK-26098][WebUI] Show associated SQL query in Job page URL: https://github.com/apache/spark/pull/23068#issuecomment-445901351 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5933/ Test PASSed.
[GitHub] AmplabJenkins commented on issue #23068: [SPARK-26098][WebUI] Show associated SQL query in Job page
AmplabJenkins commented on issue #23068: [SPARK-26098][WebUI] Show associated SQL query in Job page URL: https://github.com/apache/spark/pull/23068#issuecomment-445901351 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5933/ Test PASSed.
[GitHub] attilapiros opened a new pull request #23278: [SPARK-24920][Core] Allow sharing Netty's memory pool allocators
attilapiros opened a new pull request #23278: [SPARK-24920][Core] Allow sharing Netty's memory pool allocators
URL: https://github.com/apache/spark/pull/23278

## What changes were proposed in this pull request?

Introducing shared pooled ByteBuf allocators. This feature can be enabled via the "spark.network.sharedByteBufAllocators" configuration.

When it is on, only two pooled ByteBuf allocators are created:
- one for transport servers, where caching is allowed, and
- one for transport clients, where caching is disabled

This way the cache allowance remains as before.
Both shareable pools are created with the numCores parameter set to 0 (which defaults to the available processors), as conf.serverThreads() and conf.clientThreads() are module dependent and the lazy creation of these allocators would lead to unpredictable behaviour.

When "spark.network.sharedByteBufAllocators" is false, a new allocator is created for every transport client and server separately, as was the case before this PR.

## How was this patch tested?

Existing unit tests.
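Editorial note: the sharing scheme described in this PR text can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the PR's implementation: `PoolAllocator` is a hypothetical stand-in for a Netty pooled allocator, and `SharedAllocators`, `allocatorFor`, and `moduleThreads` are invented names.

```scala
// Hypothetical stand-in for a pooled ByteBuf allocator; this is NOT Netty's class.
final case class PoolAllocator(allowCache: Boolean, numCores: Int)

object SharedAllocators {
  // Assumption taken from the PR text: numCores == 0 means "use the available processors".
  private def create(allowCache: Boolean, numCores: Int): PoolAllocator = {
    val cores = if (numCores == 0) Runtime.getRuntime.availableProcessors() else numCores
    PoolAllocator(allowCache, cores)
  }

  // Two lazily created shared pools: caching allowed for servers, disabled for clients.
  // Both use numCores = 0 so the result does not depend on which module touches them first.
  private lazy val sharedServerPool: PoolAllocator = create(allowCache = true, numCores = 0)
  private lazy val sharedClientPool: PoolAllocator = create(allowCache = false, numCores = 0)

  // shared        ~ the "spark.network.sharedByteBufAllocators" setting
  // moduleThreads ~ conf.serverThreads() / conf.clientThreads() for the calling module
  def allocatorFor(isServer: Boolean, shared: Boolean, moduleThreads: Int): PoolAllocator =
    if (shared) {
      if (isServer) sharedServerPool else sharedClientPool
    } else {
      // Non-shared mode: a fresh allocator per transport client or server.
      create(allowCache = isServer, numCores = moduleThreads)
    }
}
```

The sketch makes the stated trade-off visible: a shared pool is created once by whichever caller arrives first, so sizing it from a module-specific thread count would make its size depend on initialization order, which is why a fixed `numCores = 0` is used for the shared case.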
[GitHub] AmplabJenkins commented on issue #23068: [SPARK-26098][WebUI] Show associated SQL query in Job page
AmplabJenkins commented on issue #23068: [SPARK-26098][WebUI] Show associated SQL query in Job page URL: https://github.com/apache/spark/pull/23068#issuecomment-445901339 Merged build finished. Test PASSed.
[GitHub] gatorsmile commented on a change in pull request #23068: [SPARK-26098][WebUI] Show associated SQL query in Job page
gatorsmile commented on a change in pull request #23068: [SPARK-26098][WebUI] Show associated SQL query in Job page
URL: https://github.com/apache/spark/pull/23068#discussion_r240306597

## File path: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala ##

@@ -56,6 +56,11 @@ private[spark] class AppStatusStore(
     store.read(classOf[JobDataWrapper], jobId).info
   }

+  def jobWithAssociatedSql(jobId: Int): (v1.JobData, Option[Long]) = {

Review comment:
Add a function description above this line.
[GitHub] gatorsmile commented on a change in pull request #23068: [SPARK-26098][WebUI] Show associated SQL query in Job page
gatorsmile commented on a change in pull request #23068: [SPARK-26098][WebUI] Show associated SQL query in Job page
URL: https://github.com/apache/spark/pull/23068#discussion_r240305948

## File path: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala ##

@@ -56,6 +56,11 @@ private[spark] class AppStatusStore(
     store.read(classOf[JobDataWrapper], jobId).info
   }

+  def jobWithAssociatedSql(jobId: Int): (v1.JobData, Option[Long]) = {

Review comment:
Add a function description above this line.
[GitHub] gatorsmile commented on a change in pull request #23068: [SPARK-26098][WebUI] Show associated SQL query in Job page
gatorsmile commented on a change in pull request #23068: [SPARK-26098][WebUI] Show associated SQL query in Job page
URL: https://github.com/apache/spark/pull/23068#discussion_r240305749

## File path: core/src/main/scala/org/apache/spark/ui/jobs/JobPage.scala ##

@@ -189,14 +189,19 @@ private[ui] class JobPage(parent: JobsTab, store: AppStatusStore) extends WebUIP
     require(parameterId != null && parameterId.nonEmpty, "Missing id parameter")

     val jobId = parameterId.toInt
-    val jobData = store.asOption(store.job(jobId)).getOrElse {
+    val (jobData, sqlExecutionId) = store.asOption(store.jobWithAssociatedSql(jobId)).getOrElse {
       val content = No information to display for job {jobId}
       return UIUtils.headerSparkPage(
         request, s"Details for Job $jobId", content, parent)
     }

+    val sqlDetailUrl = sqlExecutionId.map { id =>

Review comment:
Add a code comment to explain when `sqlExecutionId` can be None.
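Editorial note: the `Option[Long]` in the quoted hunk suggests the usual pattern of deriving an optional link from an optional id. The sketch below illustrates that pattern only. The function name, the `basePath` parameter, and the URL layout are assumptions, since the body of the `map` is not shown in the quoted diff, and the comment about when the id is absent is a guess at the kind of explanation the reviewer is asking for.

```scala
// Hypothetical helper; the URL layout below is an assumption, not taken from the PR.
def sqlDetailUrlFor(basePath: String, sqlExecutionId: Option[Long]): Option[String] =
  // sqlExecutionId is presumably None when the job has no associated SQL execution,
  // in which case no link is produced and the page can simply omit it.
  sqlExecutionId.map(id => s"$basePath/SQL/execution/?id=$id")
```

For example, `sqlDetailUrlFor("/app", Some(7L))` yields `Some("/app/SQL/execution/?id=7")`, while a `None` id yields `None`.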
[GitHub] SparkQA commented on issue #23068: [SPARK-26098][WebUI] Show associated SQL query in Job page
SparkQA commented on issue #23068: [SPARK-26098][WebUI] Show associated SQL query in Job page URL: https://github.com/apache/spark/pull/23068#issuecomment-445899666 **[Test build #99926 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99926/testReport)** for PR 23068 at commit [`e7c2ebb`](https://github.com/apache/spark/commit/e7c2ebbda949918034cb9cb92ac6ef30af17d943).
[GitHub] gatorsmile commented on issue #23068: [SPARK-26098][WebUI] Show associated SQL query in Job page
gatorsmile commented on issue #23068: [SPARK-26098][WebUI] Show associated SQL query in Job page URL: https://github.com/apache/spark/pull/23068#issuecomment-445899184 retest this please
[GitHub] AmplabJenkins commented on issue #21881: [SPARK-24930][SQL] Improve exception information when using LOAD DATA LOCAL INPATH
AmplabJenkins commented on issue #21881: [SPARK-24930][SQL] Improve exception information when using LOAD DATA LOCAL INPATH URL: https://github.com/apache/spark/pull/21881#issuecomment-445898969 Can one of the admins verify this patch?
[GitHub] AmplabJenkins removed a comment on issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate pyarrow is installed and related tests will run
AmplabJenkins removed a comment on issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate pyarrow is installed and related tests will run URL: https://github.com/apache/spark/pull/22273#issuecomment-445898081 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99925/ Test FAILed.
[GitHub] SparkQA removed a comment on issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate pyarrow is installed and related tests will run
SparkQA removed a comment on issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate pyarrow is installed and related tests will run URL: https://github.com/apache/spark/pull/22273#issuecomment-445889277 **[Test build #99925 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99925/testReport)** for PR 22273 at commit [`8574291`](https://github.com/apache/spark/commit/8574291a0b84574626ca213bc6f95dc0db73b0ef).
[GitHub] AmplabJenkins removed a comment on issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate pyarrow is installed and related tests will run
AmplabJenkins removed a comment on issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate pyarrow is installed and related tests will run URL: https://github.com/apache/spark/pull/22273#issuecomment-445897956 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99924/ Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate pyarrow is installed and related tests will run
AmplabJenkins removed a comment on issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate pyarrow is installed and related tests will run URL: https://github.com/apache/spark/pull/22273#issuecomment-445898071 Merged build finished. Test FAILed.
[GitHub] AmplabJenkins removed a comment on issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate pyarrow is installed and related tests will run
AmplabJenkins removed a comment on issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate pyarrow is installed and related tests will run URL: https://github.com/apache/spark/pull/22273#issuecomment-445897948 Merged build finished. Test PASSed.
[GitHub] SparkQA removed a comment on issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate pyarrow is installed and related tests will run
SparkQA removed a comment on issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate pyarrow is installed and related tests will run URL: https://github.com/apache/spark/pull/22273#issuecomment-445887183 **[Test build #99924 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99924/testReport)** for PR 22273 at commit [`8574291`](https://github.com/apache/spark/commit/8574291a0b84574626ca213bc6f95dc0db73b0ef).
[GitHub] AmplabJenkins commented on issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate pyarrow is installed and related tests will run
AmplabJenkins commented on issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate pyarrow is installed and related tests will run URL: https://github.com/apache/spark/pull/22273#issuecomment-445898081 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99925/ Test FAILed.
[GitHub] AmplabJenkins commented on issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate pyarrow is installed and related tests will run
AmplabJenkins commented on issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate pyarrow is installed and related tests will run URL: https://github.com/apache/spark/pull/22273#issuecomment-445898071 Merged build finished. Test FAILed.
[GitHub] SparkQA commented on issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate pyarrow is installed and related tests will run
SparkQA commented on issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate pyarrow is installed and related tests will run URL: https://github.com/apache/spark/pull/22273#issuecomment-445897833 **[Test build #99925 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99925/testReport)** for PR 22273 at commit [`8574291`](https://github.com/apache/spark/commit/8574291a0b84574626ca213bc6f95dc0db73b0ef).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `class HaveArrowTests(unittest.TestCase):`
[GitHub] AmplabJenkins commented on issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate pyarrow is installed and related tests will run
AmplabJenkins commented on issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate pyarrow is installed and related tests will run URL: https://github.com/apache/spark/pull/22273#issuecomment-445897956 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99924/ Test PASSed.
[GitHub] AmplabJenkins commented on issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate pyarrow is installed and related tests will run
AmplabJenkins commented on issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate pyarrow is installed and related tests will run URL: https://github.com/apache/spark/pull/22273#issuecomment-445897948 Merged build finished. Test PASSed.
[GitHub] SparkQA commented on issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate pyarrow is installed and related tests will run
SparkQA commented on issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate pyarrow is installed and related tests will run URL: https://github.com/apache/spark/pull/22273#issuecomment-445897623 **[Test build #99924 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99924/testReport)** for PR 22273 at commit [`8574291`](https://github.com/apache/spark/commit/8574291a0b84574626ca213bc6f95dc0db73b0ef).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `class HaveArrowTests(unittest.TestCase):`
[GitHub] AmplabJenkins removed a comment on issue #23262: [SPARK-26312][SQL]Replace RDDConversions.rowToRowRdd with RowEncoder to improve its conversion performance
AmplabJenkins removed a comment on issue #23262: [SPARK-26312][SQL]Replace RDDConversions.rowToRowRdd with RowEncoder to improve its conversion performance URL: https://github.com/apache/spark/pull/23262#issuecomment-445896751 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #23262: [SPARK-26312][SQL]Replace RDDConversions.rowToRowRdd with RowEncoder to improve its conversion performance
AmplabJenkins removed a comment on issue #23262: [SPARK-26312][SQL]Replace RDDConversions.rowToRowRdd with RowEncoder to improve its conversion performance URL: https://github.com/apache/spark/pull/23262#issuecomment-445896763 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99913/ Test PASSed.
[GitHub] rezasafi commented on issue #22612: [SPARK-24958][CORE] Add memory from procfs to executor metrics.
rezasafi commented on issue #22612: [SPARK-24958][CORE] Add memory from procfs to executor metrics. URL: https://github.com/apache/spark/pull/22612#issuecomment-445897006 Thank you very much @squito
[GitHub] AmplabJenkins commented on issue #23262: [SPARK-26312][SQL]Replace RDDConversions.rowToRowRdd with RowEncoder to improve its conversion performance
AmplabJenkins commented on issue #23262: [SPARK-26312][SQL]Replace RDDConversions.rowToRowRdd with RowEncoder to improve its conversion performance URL: https://github.com/apache/spark/pull/23262#issuecomment-445896763 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99913/ Test PASSed.
[GitHub] AmplabJenkins commented on issue #23262: [SPARK-26312][SQL]Replace RDDConversions.rowToRowRdd with RowEncoder to improve its conversion performance
AmplabJenkins commented on issue #23262: [SPARK-26312][SQL]Replace RDDConversions.rowToRowRdd with RowEncoder to improve its conversion performance URL: https://github.com/apache/spark/pull/23262#issuecomment-445896751 Merged build finished. Test PASSed.
[GitHub] squito commented on issue #22612: [SPARK-24958][CORE] Add memory from procfs to executor metrics.
squito commented on issue #22612: [SPARK-24958][CORE] Add memory from procfs to executor metrics. URL: https://github.com/apache/spark/pull/22612#issuecomment-445895902 merged to master, thanks @rezasafi
[GitHub] SparkQA removed a comment on issue #23262: [SPARK-26312][SQL]Replace RDDConversions.rowToRowRdd with RowEncoder to improve its conversion performance
SparkQA removed a comment on issue #23262: [SPARK-26312][SQL]Replace RDDConversions.rowToRowRdd with RowEncoder to improve its conversion performance URL: https://github.com/apache/spark/pull/23262#issuecomment-445815532 **[Test build #99913 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99913/testReport)** for PR 23262 at commit [`9758534`](https://github.com/apache/spark/commit/9758534ef28109df25d4ef9155c54f09ac58a45c).
[GitHub] SparkQA commented on issue #23262: [SPARK-26312][SQL]Replace RDDConversions.rowToRowRdd with RowEncoder to improve its conversion performance
SparkQA commented on issue #23262: [SPARK-26312][SQL]Replace RDDConversions.rowToRowRdd with RowEncoder to improve its conversion performance URL: https://github.com/apache/spark/pull/23262#issuecomment-445895749 **[Test build #99913 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99913/testReport)** for PR 23262 at commit [`9758534`](https://github.com/apache/spark/commit/9758534ef28109df25d4ef9155c54f09ac58a45c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] srowen closed pull request #23048: transform DenseVector x DenseVector sqdist from imperativ to function…
srowen closed pull request #23048: transform DenseVector x DenseVector sqdist from imperativ to function…
URL: https://github.com/apache/spark/pull/23048

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala b/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala
index 6e950f968a65d..42364fe132dd5 100644
--- a/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala
+++ b/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala
@@ -370,14 +370,19 @@ object Vectors {
       case (v1: DenseVector, v2: SparseVector) =>
         squaredDistance = sqdist(v2, v1)

-      case (DenseVector(vv1), DenseVector(vv2)) =>
-        var kv = 0
+      case (DenseVector(vv1), DenseVector(vv2)) => {
         val sz = vv1.length
-        while (kv < sz) {
-          val score = vv1(kv) - vv2(kv)
-          squaredDistance += score * score
-          kv += 1
+        @annotation.tailrec
+        def go(d: Double, kv: Int): Double = {
+          if (kv < sz) {
+            val score = vv1(kv) - vv2(kv)
+            go(d + score * score, kv + 1)
+          }
+          else d
         }
+        go(0D, 0)
+      }
+
       case _ =>
         throw new IllegalArgumentException("Do not support vector type " + v1.getClass +
           " and " + v2.getClass)
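For readers comparing the two versions in the diff above, the following is a minimal standalone sketch of the tail-recursive form, not the code from the pull request. It assumes plain `Array[Double]` inputs instead of Spark's `DenseVector`, and the names `SqDistSketch`, `sqdist`, `a`, `b`, and `acc` are hypothetical. Unlike the loop it replaces, which accumulated into an outer `squaredDistance` variable, the sketch returns the accumulated value directly.

```scala
import scala.annotation.tailrec

// Hypothetical standalone object; Spark's real implementation lives in
// org.apache.spark.ml.linalg.Vectors and operates on Vector types.
object SqDistSketch {

  /** Squared Euclidean distance between two dense arrays of equal length. */
  def sqdist(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length,
      s"Vector dimensions do not match: ${a.length} vs ${b.length}")
    val sz = a.length

    // @tailrec makes the compiler reject this method unless the recursive
    // call is in tail position, so it compiles down to a loop.
    @tailrec
    def go(acc: Double, i: Int): Double =
      if (i < sz) {
        val diff = a(i) - b(i)
        go(acc + diff * diff, i + 1)
      } else acc

    go(0.0, 0)
  }

  def main(args: Array[String]): Unit = {
    // (1-4)^2 + (2-6)^2 + (3-3)^2 = 9 + 16 + 0
    println(sqdist(Array(1.0, 2.0, 3.0), Array(4.0, 6.0, 3.0))) // prints 25.0
  }
}
```

Because the recursive call is in tail position, the accumulator version does the same work per element as the `while` loop; the difference is mainly one of style.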
[GitHub] AmplabJenkins removed a comment on issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Partitioning
AmplabJenkins removed a comment on issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Partitioning URL: https://github.com/apache/spark/pull/23249#issuecomment-445894455 Merged build finished. Test PASSed.
[GitHub] hvanhovell commented on a change in pull request #23249: [SPARK-26297][SQL] improve the doc of Distribution/Partitioning
hvanhovell commented on a change in pull request #23249: [SPARK-26297][SQL] improve the doc of Distribution/Partitioning
URL: https://github.com/apache/spark/pull/23249#discussion_r240300150

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala ##
@@ -241,12 +240,12 @@ case class HashPartitioning(expressions: Seq[Expression], numPartitions: Int)
 /**
  * Represents a partitioning where rows are split across partitions based on some total ordering of
- * the expressions specified in `ordering`. When data is partitioned in this manner the following
- * two conditions are guaranteed to hold:
- *  - All row where the expressions in `ordering` evaluate to the same values will be in the same
- *    partition.
- *  - Each partition will have a `min` and `max` row, relative to the given ordering. All rows
- *    that are in between `min` and `max` in this `ordering` will reside in this partition.
+ * the expressions specified in `ordering`. When data is partitioned in this manner, it guarantees:
+ *  - Given any 2 adjacent partitions, all the rows of the second partition must be larger than

Review comment:
Nit: don't use bullets if you have only one of them.
[GitHub] AmplabJenkins removed a comment on issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Partitioning
AmplabJenkins removed a comment on issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Partitioning URL: https://github.com/apache/spark/pull/23249#issuecomment-445894461 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99912/ Test PASSed.
[GitHub] AmplabJenkins commented on issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Partitioning
AmplabJenkins commented on issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Partitioning URL: https://github.com/apache/spark/pull/23249#issuecomment-445894461 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99912/ Test PASSed.
[GitHub] AmplabJenkins commented on issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Partitioning
AmplabJenkins commented on issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Partitioning URL: https://github.com/apache/spark/pull/23249#issuecomment-445894455 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #23277: [SPARK-26327][SQL] Metrics in FileSourceScanExec not update correctly
AmplabJenkins removed a comment on issue #23277: [SPARK-26327][SQL] Metrics in FileSourceScanExec not update correctly URL: https://github.com/apache/spark/pull/23277#issuecomment-445893199 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99922/ Test FAILed.
[GitHub] SparkQA removed a comment on issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Partitioning
SparkQA removed a comment on issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Partitioning URL: https://github.com/apache/spark/pull/23249#issuecomment-445815528 **[Test build #99912 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99912/testReport)** for PR 23249 at commit [`adfcec4`](https://github.com/apache/spark/commit/adfcec41adbffbef2e33fb85db5ad48eba5f3d71).
[GitHub] AmplabJenkins removed a comment on issue #23277: [SPARK-26327][SQL] Metrics in FileSourceScanExec not update correctly
AmplabJenkins removed a comment on issue #23277: [SPARK-26327][SQL] Metrics in FileSourceScanExec not update correctly URL: https://github.com/apache/spark/pull/23277#issuecomment-445893192 Merged build finished. Test FAILed.
[GitHub] hvanhovell commented on a change in pull request #23249: [SPARK-26297][SQL] improve the doc of Distribution/Partitioning
hvanhovell commented on a change in pull request #23249: [SPARK-26297][SQL] improve the doc of Distribution/Partitioning
URL: https://github.com/apache/spark/pull/23249#discussion_r240299365

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala ##
@@ -118,10 +115,12 @@ case class HashClusteredDistribution(
 /**
  * Represents data where tuples have been ordered according to the `ordering`
- * [[Expression Expressions]]. This is a strictly stronger guarantee than
- * [[ClusteredDistribution]] as an ordering will ensure that tuples that share the
- * same value for the ordering expressions are contiguous and will never be split across
- * partitions.
+ * [[Expression Expressions]]. Its requirement is defined as the following:
+ *   - Given any 2 adjacent partitions, all the rows of the second partition must be larger than or
+ *     equal to any row in the first partition, according to the `ordering` expressions.

Review comment:
Global sort (actually the `RangePartitioner`) currently guarantees that all rows in partition `p + 1` are larger than the rows in partition `p`. I don't think we should relax this; besides collect limit, I can't think of any use cases that could work with this relaxed requirement.
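The difference between the two requirements discussed in the review comment above (strictly larger vs. larger than or equal across adjacent partitions) can be illustrated with a small sketch. This is not Spark code: partitions are modeled as plain in-memory sequences, and `OrderedPartitionsSketch`, `satisfiesOrdering`, and the `strict` flag are hypothetical names introduced only for illustration.

```scala
// Hypothetical helper, assuming each partition is a plain Seq of comparable rows.
object OrderedPartitionsSketch {

  /**
   * Checks that, for every pair of adjacent non-empty partitions, all rows of the
   * later partition are larger than all rows of the earlier one.
   * With strict = false, equal rows may appear on both sides of a boundary
   * (the relaxed, "larger than or equal" requirement).
   */
  def satisfiesOrdering[T](partitions: Seq[Seq[T]], strict: Boolean)(
      implicit ord: Ordering[T]): Boolean = {
    val nonEmpty = partitions.filter(_.nonEmpty)
    nonEmpty.zip(nonEmpty.drop(1)).forall { case (p, next) =>
      // Comparing the largest row of p with the smallest row of next is
      // sufficient to compare every row of p with every row of next.
      val cmp = ord.compare(p.max, next.min)
      if (strict) cmp < 0 else cmp <= 0
    }
  }

  def main(args: Array[String]): Unit = {
    val sharedBoundary = Seq(Seq(1, 2), Seq(2, 3)) // value 2 is split across partitions
    println(satisfiesOrdering(sharedBoundary, strict = false)) // prints true
    println(satisfiesOrdering(sharedBoundary, strict = true))  // prints false
  }
}
```

In the `sharedBoundary` example, rows with the same ordering value end up in two partitions, which the relaxed wording permits and the strict guarantee rules out.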
[GitHub] srowen closed pull request #22723: [SPARK-25729][CORE]It is better to replace `minPartitions` with `defaultParallelism` , when `minPartitions` is less than `defaultParallelism`
srowen closed pull request #22723: [SPARK-25729][CORE]It is better to replace `minPartitions` with `defaultParallelism` , when `minPartitions` is less than `defaultParallelism`
URL: https://github.com/apache/spark/pull/22723

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/core/src/main/scala/org/apache/spark/input/WholeTextFileInputFormat.scala b/core/src/main/scala/org/apache/spark/input/WholeTextFileInputFormat.scala
index 04c5c4b90e8a1..9400879f27048 100644
--- a/core/src/main/scala/org/apache/spark/input/WholeTextFileInputFormat.scala
+++ b/core/src/main/scala/org/apache/spark/input/WholeTextFileInputFormat.scala
@@ -46,13 +46,15 @@ private[spark] class WholeTextFileInputFormat
 
   /**
    * Allow minPartitions set by end-user in order to keep compatibility with old Hadoop API,
-   * which is set through setMaxSplitSize
+   * which is set through setMaxSplitSize. But when minPartitions is less than defaultParallelism,
+   * it is better to replace minPartitions with defaultParallelism, because this can improve
+   * parallelism.
    */
-  def setMinPartitions(context: JobContext, minPartitions: Int) {
+  def setMinPartitions(defaultParallelism: Int, context: JobContext, minPartitions: Int) {
     val files = listStatus(context).asScala
     val totalLen = files.map(file => if (file.isDirectory) 0L else file.getLen).sum
-    val maxSplitSize = Math.ceil(totalLen * 1.0 /
-      (if (minPartitions == 0) 1 else minPartitions)).toLong
+    val minPartNum = Math.max(defaultParallelism, minPartitions)
+    val maxSplitSize = Math.ceil(totalLen * 1.0 / minPartNum).toLong
 
     // For small files we need to ensure the min split size per node & rack <= maxSplitSize
     val config = context.getConfiguration
diff --git a/core/src/main/scala/org/apache/spark/rdd/WholeTextFileRDD.scala b/core/src/main/scala/org/apache/spark/rdd/WholeTextFileRDD.scala
index 9f3d0745c33c9..6377b677ed10c 100644
--- a/core/src/main/scala/org/apache/spark/rdd/WholeTextFileRDD.scala
+++ b/core/src/main/scala/org/apache/spark/rdd/WholeTextFileRDD.scala
@@ -30,7 +30,7 @@ import org.apache.spark.input.WholeTextFileInputFormat
  * An RDD that reads a bunch of text files in, and each text file becomes one record.
  */
 private[spark] class WholeTextFileRDD(
-    sc : SparkContext,
+    @transient private val sc: SparkContext,
     inputFormatClass: Class[_ <: WholeTextFileInputFormat],
     keyClass: Class[Text],
     valueClass: Class[Text],
@@ -51,7 +51,7 @@ private[spark] class WholeTextFileRDD(
       case _ =>
     }
     val jobContext = new JobContextImpl(conf, jobId)
-    inputFormat.setMinPartitions(jobContext, minPartitions)
+    inputFormat.setMinPartitions(sc.defaultParallelism, jobContext, minPartitions)
     val rawSplits = inputFormat.getSplits(jobContext).toArray
     val result = new Array[Partition](rawSplits.size)
     for (i <- 0 until rawSplits.size) {
diff --git a/core/src/test/scala/org/apache/spark/input/WholeTextFileInputFormatSuite.scala b/core/src/test/scala/org/apache/spark/input/WholeTextFileInputFormatSuite.scala
index 817dc082b7d38..531ac936a4d5d 100644
--- a/core/src/test/scala/org/apache/spark/input/WholeTextFileInputFormatSuite.scala
+++ b/core/src/test/scala/org/apache/spark/input/WholeTextFileInputFormatSuite.scala
@@ -38,7 +38,7 @@ class WholeTextFileInputFormatSuite extends SparkFunSuite with BeforeAndAfterAll
   override def beforeAll() {
     super.beforeAll()
     val conf = new SparkConf()
-    sc = new SparkContext("local", "test", conf)
+    sc = new SparkContext("local[2]", "test", conf)
   }
 
   override def afterAll() {
@@ -79,6 +79,22 @@ class WholeTextFileInputFormatSuite extends SparkFunSuite with BeforeAndAfterAll
       Utils.deleteRecursively(dir)
     }
   }
+
+  test("Test the number of partitions for WholeTextFileRDD") {
+    var dir: File = null
+    try {
+      dir = Utils.createTempDir()
+      WholeTextFileInputFormatSuite.files.foreach { case (filename, contents) =>
+        createNativeFile(dir, filename, contents, true)
+      }
+      // set `minPartitions = 1`
+      val rdd = sc.wholeTextFiles(dir.toString, 1)
+      // The number of partitions is equal to 2, not equal to 1, because the defaultParallelism is 2
+      assert(rdd.getNumPartitions === 2)
+    } finally {
+      Utils.deleteRecursively(dir)
+    }
+  }
 }
 
 /**
@@ -88,7 +104,7 @@ object WholeTextFileInputFormatSuite {
   private val testWords: IndexedSeq[Byte] = "Spark is easy to use.\n".map(_.toByte)
 
   private val fileNames = Array("part-0", "part-1", "part-2")
-  private val fileLengths = Array(10, 100, 1000)
+  private val fileLengths = Array(10, 100, 100)
 
   private val files = fileLengths.zip(fileNames).map { case (upperBound,
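The effect of the changed split-size arithmetic in the diff above can be checked in isolation. The sketch below copies only the two formulas from the removed and added lines; `SplitSizeSketch`, `oldMaxSplitSize`, `newMaxSplitSize`, and the 210-byte total are hypothetical names and values chosen for illustration, and nothing here touches Hadoop or Spark.

```scala
object SplitSizeSketch {
  // Formula from the removed lines: divide by minPartitions, guarding against 0.
  def oldMaxSplitSize(totalLen: Long, minPartitions: Int): Long =
    Math.ceil(totalLen * 1.0 / (if (minPartitions == 0) 1 else minPartitions)).toLong

  // Formula from the added lines: divide by the larger of defaultParallelism and minPartitions.
  def newMaxSplitSize(totalLen: Long, defaultParallelism: Int, minPartitions: Int): Long = {
    val minPartNum = Math.max(defaultParallelism, minPartitions)
    Math.ceil(totalLen * 1.0 / minPartNum).toLong
  }

  def main(args: Array[String]): Unit = {
    val totalLen = 210L // illustrative total input size in bytes (an assumption)

    // minPartitions = 1: the old formula allows one split as large as the whole input.
    println(oldMaxSplitSize(totalLen, minPartitions = 1))

    // With defaultParallelism = 2, the new formula halves the maximum split size.
    println(newMaxSplitSize(totalLen, defaultParallelism = 2, minPartitions = 1))

    // When minPartitions already exceeds defaultParallelism, it still wins.
    println(newMaxSplitSize(totalLen, defaultParallelism = 2, minPartitions = 4))
  }
}
```

A smaller maximum split size is what lets the added test observe 2 partitions with `local[2]` even though `minPartitions` is 1; the actual partition count still depends on how the input format combines files into splits.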
[GitHub] srowen commented on a change in pull request #23275: [SPARK-26323][SQL] Scala UDF should still check input types even if some inputs are of type Any
srowen commented on a change in pull request #23275: [SPARK-26323][SQL] Scala UDF should still check input types even if some inputs are of type Any
URL: https://github.com/apache/spark/pull/23275#discussion_r240298939

## File path: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala ##

@@ -88,68 +88,49 @@ sealed trait UserDefinedFunction {
 private[sql] case class SparkUserDefinedFunction(
     f: AnyRef,
     dataType: DataType,
-    inputTypes: Option[Seq[DataType]],
-    nullableTypes: Option[Seq[Boolean]],
+    inputSchemas: Seq[Option[ScalaReflection.Schema]],
     name: Option[String] = None,
     nullable: Boolean = true,
     deterministic: Boolean = true) extends UserDefinedFunction {
 
   @scala.annotation.varargs
-  override def apply(exprs: Column*): Column = {
-    // TODO: make sure this class is only instantiated through `SparkUserDefinedFunction.create()`
-    // and `nullableTypes` is always set.
-    if (inputTypes.isDefined) {
-      assert(inputTypes.get.length == nullableTypes.get.length)
-    }
-
-    val inputsNullSafe = nullableTypes.getOrElse {
-      ScalaReflection.getParameterTypeNullability(f)
-    }
+  override def apply(cols: Column*): Column = {
+    Column(createScalaUDF(cols.map(_.expr)))
+  }
 
-    Column(ScalaUDF(
+  private[sql] def createScalaUDF(exprs: Seq[Expression]): ScalaUDF = {
+    // It's possible that some of the inputs don't have a specific type(e.g. `Any`), skip type
+    // check and null check for them.
+    val inputTypes = inputSchemas.map(_.map(_.dataType).getOrElse(AnyDataType))
+    val inputsNullSafe = inputSchemas.map(_.map(_.nullable).getOrElse(true))

Review comment:
Ah right. I'm neutral on whether it's clearer than getOrElse; I think we end up using the latter in the code more. I know IJ suggests forall though.
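For readers comparing the two spellings discussed in the review comment, here is a minimal sketch, assuming the `forall` mentioned refers to `Option.forall`. The `Schema` case class is a hypothetical stand-in reduced to the single `nullable` field; it is not Spark's `ScalaReflection.Schema`.

```scala
object OptionDefaultSketch {
  // Hypothetical stand-in for a schema entry; only the field used below is modelled.
  final case class Schema(nullable: Boolean)

  // Spelling used in the diff: map to the flag, default to true when the schema is missing.
  def viaGetOrElse(schema: Option[Schema]): Boolean =
    schema.map(_.nullable).getOrElse(true)

  // Alternative spelling: forall is true for None and applies the predicate for Some.
  def viaForall(schema: Option[Schema]): Boolean =
    schema.forall(_.nullable)

  def main(args: Array[String]): Unit = {
    val inputs: Seq[Option[Schema]] = Seq(Some(Schema(true)), Some(Schema(false)), None)
    inputs.foreach { s =>
      println(s"$s getOrElse=${viaGetOrElse(s)} forall=${viaForall(s)}")
    }
  }
}
```

Both functions return the same value for every `Option[Schema]`, so the choice between them is purely one of readability, which matches the neutral stance in the comment.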
[GitHub] AmplabJenkins commented on issue #23277: [SPARK-26327][SQL] Metrics in FileSourceScanExec not update correctly
AmplabJenkins commented on issue #23277: [SPARK-26327][SQL] Metrics in FileSourceScanExec not update correctly URL: https://github.com/apache/spark/pull/23277#issuecomment-445893199 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99922/ Test FAILed.
[GitHub] SparkQA commented on issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Partitioning
SparkQA commented on issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Partitioning URL: https://github.com/apache/spark/pull/23249#issuecomment-445893422 **[Test build #99912 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99912/testReport)** for PR 23249 at commit [`adfcec4`](https://github.com/apache/spark/commit/adfcec41adbffbef2e33fb85db5ad48eba5f3d71). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] AmplabJenkins commented on issue #23277: [SPARK-26327][SQL] Metrics in FileSourceScanExec not update correctly
AmplabJenkins commented on issue #23277: [SPARK-26327][SQL] Metrics in FileSourceScanExec not update correctly URL: https://github.com/apache/spark/pull/23277#issuecomment-445893192 Merged build finished. Test FAILed.
[GitHub] SparkQA removed a comment on issue #23277: [SPARK-26327][SQL] Metrics in FileSourceScanExec not update correctly
SparkQA removed a comment on issue #23277: [SPARK-26327][SQL] Metrics in FileSourceScanExec not update correctly URL: https://github.com/apache/spark/pull/23277#issuecomment-445854780 **[Test build #99922 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99922/testReport)** for PR 23277 at commit [`0e00aa7`](https://github.com/apache/spark/commit/0e00aa7a219805f3d14ca4d222df4a922a34d825).
[GitHub] SparkQA commented on issue #23277: [SPARK-26327][SQL] Metrics in FileSourceScanExec not update correctly
SparkQA commented on issue #23277: [SPARK-26327][SQL] Metrics in FileSourceScanExec not update correctly URL: https://github.com/apache/spark/pull/23277#issuecomment-445892950 **[Test build #99922 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99922/testReport)** for PR 23277 at commit [`0e00aa7`](https://github.com/apache/spark/commit/0e00aa7a219805f3d14ca4d222df4a922a34d825). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] AmplabJenkins commented on issue #23262: [SPARK-26312][SQL]Replace RDDConversions.rowToRowRdd with RowEncoder to improve its conversion performance
AmplabJenkins commented on issue #23262: [SPARK-26312][SQL]Replace RDDConversions.rowToRowRdd with RowEncoder to improve its conversion performance URL: https://github.com/apache/spark/pull/23262#issuecomment-445891136 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #23262: [SPARK-26312][SQL]Replace RDDConversions.rowToRowRdd with RowEncoder to improve its conversion performance
AmplabJenkins removed a comment on issue #23262: [SPARK-26312][SQL]Replace RDDConversions.rowToRowRdd with RowEncoder to improve its conversion performance URL: https://github.com/apache/spark/pull/23262#issuecomment-445891136 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #23262: [SPARK-26312][SQL]Replace RDDConversions.rowToRowRdd with RowEncoder to improve its conversion performance
AmplabJenkins removed a comment on issue #23262: [SPARK-26312][SQL]Replace RDDConversions.rowToRowRdd with RowEncoder to improve its conversion performance URL: https://github.com/apache/spark/pull/23262#issuecomment-445891152 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99910/ Test PASSed.
[GitHub] AmplabJenkins commented on issue #23262: [SPARK-26312][SQL]Replace RDDConversions.rowToRowRdd with RowEncoder to improve its conversion performance
AmplabJenkins commented on issue #23262: [SPARK-26312][SQL]Replace RDDConversions.rowToRowRdd with RowEncoder to improve its conversion performance URL: https://github.com/apache/spark/pull/23262#issuecomment-445891152 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99910/ Test PASSed.
[GitHub] SparkQA commented on issue #23262: [SPARK-26312][SQL]Replace RDDConversions.rowToRowRdd with RowEncoder to improve its conversion performance
SparkQA commented on issue #23262: [SPARK-26312][SQL]Replace RDDConversions.rowToRowRdd with RowEncoder to improve its conversion performance URL: https://github.com/apache/spark/pull/23262#issuecomment-445890454 **[Test build #99910 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99910/testReport)** for PR 23262 at commit [`56cf4e5`](https://github.com/apache/spark/commit/56cf4e5f079c6ddd36de197eecc3b51393a5859b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] SparkQA removed a comment on issue #23262: [SPARK-26312][SQL]Replace RDDConversions.rowToRowRdd with RowEncoder to improve its conversion performance
SparkQA removed a comment on issue #23262: [SPARK-26312][SQL]Replace RDDConversions.rowToRowRdd with RowEncoder to improve its conversion performance URL: https://github.com/apache/spark/pull/23262#issuecomment-445815524 **[Test build #99910 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99910/testReport)** for PR 23262 at commit [`56cf4e5`](https://github.com/apache/spark/commit/56cf4e5f079c6ddd36de197eecc3b51393a5859b).
[GitHub] AmplabJenkins removed a comment on issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate pyarrow is installed and related tests will run
AmplabJenkins removed a comment on issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate pyarrow is installed and related tests will run URL: https://github.com/apache/spark/pull/22273#issuecomment-445889234 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5932/ Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate pyarrow is installed and related tests will run
AmplabJenkins removed a comment on issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate pyarrow is installed and related tests will run URL: https://github.com/apache/spark/pull/22273#issuecomment-445889224 Merged build finished. Test PASSed.
[GitHub] SparkQA commented on issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate pyarrow is installed and related tests will run
SparkQA commented on issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate pyarrow is installed and related tests will run URL: https://github.com/apache/spark/pull/22273#issuecomment-445889277 **[Test build #99925 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99925/testReport)** for PR 22273 at commit [`8574291`](https://github.com/apache/spark/commit/8574291a0b84574626ca213bc6f95dc0db73b0ef).
[GitHub] AmplabJenkins commented on issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate pyarrow is installed and related tests will run
AmplabJenkins commented on issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate pyarrow is installed and related tests will run URL: https://github.com/apache/spark/pull/22273#issuecomment-445889234 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5932/ Test PASSed.
[GitHub] AmplabJenkins commented on issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate pyarrow is installed and related tests will run
AmplabJenkins commented on issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate pyarrow is installed and related tests will run URL: https://github.com/apache/spark/pull/22273#issuecomment-445889224 Merged build finished. Test PASSed.
[GitHub] SparkQA commented on issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate pyarrow is installed and related tests will run
SparkQA commented on issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate pyarrow is installed and related tests will run URL: https://github.com/apache/spark/pull/22273#issuecomment-445887183 **[Test build #99924 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99924/testReport)** for PR 22273 at commit [`8574291`](https://github.com/apache/spark/commit/8574291a0b84574626ca213bc6f95dc0db73b0ef).
[GitHub] BryanCutler commented on issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate pyarrow is installed and related tests will run
BryanCutler commented on issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate pyarrow is installed and related tests will run URL: https://github.com/apache/spark/pull/22273#issuecomment-445887027 retest this please
[GitHub] BryanCutler commented on issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate pyarrow is installed and related tests will run
BryanCutler commented on issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate pyarrow is installed and related tests will run URL: https://github.com/apache/spark/pull/22273#issuecomment-445886897 Hey @holdenk , yeah we could do that but I'm ok with the current way which is to only print if the tests are skipped and assume they ran otherwise. I just wanted to make sure that the weird behavior I saw earlier wasn't happening anymore. Looks good so far, but I was trying to hit one more worker to check, then I'll close this out.
[GitHub] AmplabJenkins removed a comment on issue #23275: [SPARK-26323][SQL] Scala UDF should still check input types even if some inputs are of type Any
AmplabJenkins removed a comment on issue #23275: [SPARK-26323][SQL] Scala UDF should still check input types even if some inputs are of type Any URL: https://github.com/apache/spark/pull/23275#issuecomment-445882601 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99920/ Test FAILed.
[GitHub] AmplabJenkins removed a comment on issue #23275: [SPARK-26323][SQL] Scala UDF should still check input types even if some inputs are of type Any
AmplabJenkins removed a comment on issue #23275: [SPARK-26323][SQL] Scala UDF should still check input types even if some inputs are of type Any URL: https://github.com/apache/spark/pull/23275#issuecomment-445882594 Merged build finished. Test FAILed.
[GitHub] AmplabJenkins commented on issue #23275: [SPARK-26323][SQL] Scala UDF should still check input types even if some inputs are of type Any
AmplabJenkins commented on issue #23275: [SPARK-26323][SQL] Scala UDF should still check input types even if some inputs are of type Any URL: https://github.com/apache/spark/pull/23275#issuecomment-445882601 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99920/ Test FAILed.
[GitHub] AmplabJenkins commented on issue #23275: [SPARK-26323][SQL] Scala UDF should still check input types even if some inputs are of type Any
AmplabJenkins commented on issue #23275: [SPARK-26323][SQL] Scala UDF should still check input types even if some inputs are of type Any URL: https://github.com/apache/spark/pull/23275#issuecomment-445882594 Merged build finished. Test FAILed.
[GitHub] SparkQA removed a comment on issue #23275: [SPARK-26323][SQL] Scala UDF should still check input types even if some inputs are of type Any
SparkQA removed a comment on issue #23275: [SPARK-26323][SQL] Scala UDF should still check input types even if some inputs are of type Any URL: https://github.com/apache/spark/pull/23275#issuecomment-445840083 **[Test build #99920 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99920/testReport)** for PR 23275 at commit [`92466d4`](https://github.com/apache/spark/commit/92466d486734f3904be31e45b85e49654eb39255).
[GitHub] SparkQA commented on issue #23275: [SPARK-26323][SQL] Scala UDF should still check input types even if some inputs are of type Any
SparkQA commented on issue #23275: [SPARK-26323][SQL] Scala UDF should still check input types even if some inputs are of type Any URL: https://github.com/apache/spark/pull/23275#issuecomment-445882225 **[Test build #99920 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99920/testReport)** for PR 23275 at commit [`92466d4`](https://github.com/apache/spark/commit/92466d486734f3904be31e45b85e49654eb39255). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] AmplabJenkins removed a comment on issue #23275: [SPARK-26323][SQL] Scala UDF should still check input types even if some inputs are of type Any
AmplabJenkins removed a comment on issue #23275: [SPARK-26323][SQL] Scala UDF should still check input types even if some inputs are of type Any URL: https://github.com/apache/spark/pull/23275#issuecomment-445878940 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99919/ Test FAILed.
[GitHub] SparkQA removed a comment on issue #23275: [SPARK-26323][SQL] Scala UDF should still check input types even if some inputs are of type Any
SparkQA removed a comment on issue #23275: [SPARK-26323][SQL] Scala UDF should still check input types even if some inputs are of type Any URL: https://github.com/apache/spark/pull/23275#issuecomment-445836177 **[Test build #99919 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99919/testReport)** for PR 23275 at commit [`8582607`](https://github.com/apache/spark/commit/8582607195f12a4c133fb28b59e8a7fce7a97fbb).
[GitHub] AmplabJenkins removed a comment on issue #23275: [SPARK-26323][SQL] Scala UDF should still check input types even if some inputs are of type Any
AmplabJenkins removed a comment on issue #23275: [SPARK-26323][SQL] Scala UDF should still check input types even if some inputs are of type Any URL: https://github.com/apache/spark/pull/23275#issuecomment-445878926 Merged build finished. Test FAILed.
[GitHub] AmplabJenkins commented on issue #23275: [SPARK-26323][SQL] Scala UDF should still check input types even if some inputs are of type Any
AmplabJenkins commented on issue #23275: [SPARK-26323][SQL] Scala UDF should still check input types even if some inputs are of type Any URL: https://github.com/apache/spark/pull/23275#issuecomment-445878926 Merged build finished. Test FAILed.
[GitHub] AmplabJenkins commented on issue #23275: [SPARK-26323][SQL] Scala UDF should still check input types even if some inputs are of type Any
AmplabJenkins commented on issue #23275: [SPARK-26323][SQL] Scala UDF should still check input types even if some inputs are of type Any URL: https://github.com/apache/spark/pull/23275#issuecomment-445878940 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99919/ Test FAILed.
[GitHub] SparkQA commented on issue #23275: [SPARK-26323][SQL] Scala UDF should still check input types even if some inputs are of type Any
SparkQA commented on issue #23275: [SPARK-26323][SQL] Scala UDF should still check input types even if some inputs are of type Any
URL: https://github.com/apache/spark/pull/23275#issuecomment-445878673
**[Test build #99919 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99919/testReport)** for PR 23275 at commit [`8582607`](https://github.com/apache/spark/commit/8582607195f12a4c133fb28b59e8a7fce7a97fbb).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] mgaido91 commented on a change in pull request #23275: [SPARK-26323][SQL] Scala UDF should still check input types even if some inputs are of type Any
mgaido91 commented on a change in pull request #23275: [SPARK-26323][SQL] Scala UDF should still check input types even if some inputs are of type Any
URL: https://github.com/apache/spark/pull/23275#discussion_r240278143

## File path: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala
##
@@ -88,68 +88,49 @@ sealed trait UserDefinedFunction {
 private[sql] case class SparkUserDefinedFunction(
     f: AnyRef,
     dataType: DataType,
-    inputTypes: Option[Seq[DataType]],
-    nullableTypes: Option[Seq[Boolean]],
+    inputSchemas: Seq[Option[ScalaReflection.Schema]],
     name: Option[String] = None,
     nullable: Boolean = true,
     deterministic: Boolean = true) extends UserDefinedFunction {

   @scala.annotation.varargs
-  override def apply(exprs: Column*): Column = {
-    // TODO: make sure this class is only instantiated through `SparkUserDefinedFunction.create()`
-    // and `nullableTypes` is always set.
-    if (inputTypes.isDefined) {
-      assert(inputTypes.get.length == nullableTypes.get.length)
-    }
-
-    val inputsNullSafe = nullableTypes.getOrElse {
-      ScalaReflection.getParameterTypeNullability(f)
-    }
+  override def apply(cols: Column*): Column = {
+    Column(createScalaUDF(cols.map(_.expr)))
+  }

-    Column(ScalaUDF(
+  private[sql] def createScalaUDF(exprs: Seq[Expression]): ScalaUDF = {
+    // It's possible that some of the inputs don't have a specific type(e.g. `Any`), skip type
+    // check and null check for them.
+    val inputTypes = inputSchemas.map(_.map(_.dataType).getOrElse(AnyDataType))
+    val inputsNullSafe = inputSchemas.map(_.map(_.nullable).getOrElse(true))

Review comment:
I mean `inputSchemas.map(_.forall(_.nullable))`
[GitHub] srowen commented on a change in pull request #23275: [SPARK-26323][SQL] Scala UDF should still check input types even if some inputs are of type Any
srowen commented on a change in pull request #23275: [SPARK-26323][SQL] Scala UDF should still check input types even if some inputs are of type Any
URL: https://github.com/apache/spark/pull/23275#discussion_r240275041

## File path: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala
##
@@ -88,68 +88,49 @@ sealed trait UserDefinedFunction {
 private[sql] case class SparkUserDefinedFunction(
     f: AnyRef,
     dataType: DataType,
-    inputTypes: Option[Seq[DataType]],
-    nullableTypes: Option[Seq[Boolean]],
+    inputSchemas: Seq[Option[ScalaReflection.Schema]],
     name: Option[String] = None,
     nullable: Boolean = true,
     deterministic: Boolean = true) extends UserDefinedFunction {

   @scala.annotation.varargs
-  override def apply(exprs: Column*): Column = {
-    // TODO: make sure this class is only instantiated through `SparkUserDefinedFunction.create()`
-    // and `nullableTypes` is always set.
-    if (inputTypes.isDefined) {
-      assert(inputTypes.get.length == nullableTypes.get.length)
-    }
-
-    val inputsNullSafe = nullableTypes.getOrElse {
-      ScalaReflection.getParameterTypeNullability(f)
-    }
+  override def apply(cols: Column*): Column = {
+    Column(createScalaUDF(cols.map(_.expr)))
+  }

-    Column(ScalaUDF(
+  private[sql] def createScalaUDF(exprs: Seq[Expression]): ScalaUDF = {
+    // It's possible that some of the inputs don't have a specific type(e.g. `Any`), skip type
+    // check and null check for them.
+    val inputTypes = inputSchemas.map(_.map(_.dataType).getOrElse(AnyDataType))
+    val inputsNullSafe = inputSchemas.map(_.map(_.nullable).getOrElse(true))

Review comment:
I'm missing it, how could you write this more simply with `forall` to get from `Option[Schema]` to `Boolean`?
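
The `forall` question raised in the review comment above can be checked with a small standalone sketch. `Option.forall(p)` returns `true` for `None` and `p(x)` for `Some(x)`, so it should agree with `map(p).getOrElse(true)`. The `Schema` case class and the `ForallEquivalence` object below are hypothetical stand-ins for illustration only (not Spark's `ScalaReflection.Schema` or any code from the PR), and `dataType` is simplified to a `String`.

```scala
// Hypothetical stand-in for ScalaReflection.Schema; only `nullable` matters here.
final case class Schema(dataType: String, nullable: Boolean)

object ForallEquivalence {
  // Form used in the patch under review: map to the flag, default to true for None.
  def viaMapGetOrElse(schemas: Seq[Option[Schema]]): Seq[Boolean] =
    schemas.map(_.map(_.nullable).getOrElse(true))

  // Form suggested in the review: Option.forall is true for None, p(x) for Some(x).
  def viaForall(schemas: Seq[Option[Schema]]): Seq[Boolean] =
    schemas.map(_.forall(_.nullable))

  // Illustrative inputs: a non-nullable schema, a nullable one, and a missing one.
  val sample: Seq[Option[Schema]] = Seq(
    Some(Schema("int", nullable = false)),
    Some(Schema("string", nullable = true)),
    None // e.g. a parameter typed as `Any`, for which no schema is available
  )

  def main(args: Array[String]): Unit = {
    println(viaMapGetOrElse(sample))
    println(viaForall(sample))
  }
}
```

On this sample both forms yield `false, true, true`, so under these assumptions the `forall` version is a shorter spelling of the same `Option[Schema]` to `Boolean` conversion rather than a behavior change.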