[GitHub] [spark] maropu opened a new pull request #24200: [SPARK-27266][SQL] Support ANALYZE TABLE to collect tables stats for cached catalog views
maropu opened a new pull request #24200: [SPARK-27266][SQL] Support ANALYZE TABLE to collect tables stats for cached catalog views
URL: https://github.com/apache/spark/pull/24200

## What changes were proposed in this pull request?
The current master doesn't support ANALYZE TABLE to collect table stats for catalog views, even if they are cached, as follows:
```
scala> sql(s"CREATE VIEW v AS SELECT 1 c")
scala> sql(s"CACHE LAZY TABLE v")
scala> sql(s"ANALYZE TABLE v COMPUTE STATISTICS")
org.apache.spark.sql.AnalysisException: ANALYZE TABLE is not supported on views.;
...
```
Since SPARK-25196 added support for an ANALYZE command that collects column statistics for cached catalog views, we could support table stats, too.

## How was this patch tested?
Added tests in `StatisticsCollectionSuite` and `InMemoryColumnarQuerySuite`.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] hehuiyuan edited a comment on issue #24188: [SPARK-27258][K8S]Add the prefix 'spark-' for resourceNamePrefix that starts with number
hehuiyuan edited a comment on issue #24188: [SPARK-27258][K8S]Add the prefix 'spark-' for resourceNamePrefix that starts with number URL: https://github.com/apache/spark/pull/24188#issuecomment-476063794 I think it may be better to update only the resourceNamePrefix of the service.
[GitHub] [spark] hehuiyuan commented on issue #24188: [SPARK-27258][K8S]Add the prefix 'spark-' for resourceNamePrefix that starts with number
hehuiyuan commented on issue #24188: [SPARK-27258][K8S]Add the prefix 'spark-' for resourceNamePrefix that starts with number URL: https://github.com/apache/spark/pull/24188#issuecomment-476063794 I think we can update only the resourceNamePrefix of the service.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #23783: [SPARK-26854][SQL] Support ANY/SOME subquery
dongjoon-hyun commented on a change in pull request #23783: [SPARK-26854][SQL] Support ANY/SOME subquery
URL: https://github.com/apache/spark/pull/23783#discussion_r268494095

## File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ##
@@ -576,6 +576,7 @@ predicate
     : NOT? kind=BETWEEN lower=valueExpression AND upper=valueExpression
     | NOT? kind=IN '(' expression (',' expression)* ')'
     | NOT? kind=IN '(' query ')'
+    | comparisonOperator kind=(ANY | SOME) '(' query ')'

Review comment:
Hi, @francis0407. I have some questions that go beyond the current implementation of this PR.
Currently, this PR aims to add `ANY/SOME` over a query.
1. Are we going to extend this to 'ALL' later? e.g. `SELECT 1 > ALL (SELECT 0)` (PostgreSQL example)
2. Are we going to support `array`, too? e.g. `SELECT 1 > ANY (ARRAY[0,1])` (PostgreSQL example)

Although I used PostgreSQL examples, `ALL` and arrays are supported in Oracle, too.
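For readers following the questions above, the difference between `ANY`/`SOME` and `ALL` can be sketched in plain Scala. This is only an illustration of the standard SQL semantics, not code from the PR: the object `QuantifiedComparison` and its method names are hypothetical, the subquery result is modelled as an in-memory `Seq`, and NULL (three-valued) handling is deliberately left out.

```scala
// Illustrative model only: `QuantifiedComparison` is a hypothetical name and
// NULL (three-valued logic) handling is intentionally omitted.
object QuantifiedComparison {

  /** `value op ANY (rows)` (SOME is a synonym): true if `op` holds for at least one row. */
  def any[A](value: A, rows: Seq[A])(op: (A, A) => Boolean): Boolean =
    rows.exists(row => op(value, row))

  /** `value op ALL (rows)`: true if `op` holds for every row (vacuously true when empty). */
  def all[A](value: A, rows: Seq[A])(op: (A, A) => Boolean): Boolean =
    rows.forall(row => op(value, row))
}
```

Under these assumptions, `QuantifiedComparison.all(1, Seq(0))(_ > _)` corresponds to `SELECT 1 > ALL (SELECT 0)` and `QuantifiedComparison.any(1, Seq(0, 1))(_ > _)` to `SELECT 1 > ANY (ARRAY[0,1])`.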
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #23783: [SPARK-26854][SQL] Support ANY/SOME subquery
dongjoon-hyun commented on a change in pull request #23783: [SPARK-26854][SQL] Support ANY/SOME subquery
URL: https://github.com/apache/spark/pull/23783#discussion_r268494095

## File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ##
@@ -576,6 +576,7 @@ predicate
     : NOT? kind=BETWEEN lower=valueExpression AND upper=valueExpression
     | NOT? kind=IN '(' expression (',' expression)* ')'
     | NOT? kind=IN '(' query ')'
+    | comparisonOperator kind=(ANY | SOME) '(' query ')'

Review comment:
Hi, @francis0407. I have some questions that go beyond the current implementation of this PR.
Currently, this PR aims to add `ANY/SOME` over a query.
1. Are we going to extend this to 'ALL' later? e.g. `SELECT 1 > ALL (SELECT 0)` (PostgreSQL example)
2. Are we going to support `array`, too? e.g. `SELECT 1 > ANY (ARRAY[0,1])` (PostgreSQL example)
[GitHub] [spark] AmplabJenkins removed a comment on issue #24187: [SPARK-27256][CORE][SQL]If the configuration is used to set the number of bytes, we'd better use `bytesConf`'.
AmplabJenkins removed a comment on issue #24187: [SPARK-27256][CORE][SQL]If the configuration is used to set the number of bytes, we'd better use `bytesConf`'. URL: https://github.com/apache/spark/pull/24187#issuecomment-476060943 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24187: [SPARK-27256][CORE][SQL]If the configuration is used to set the number of bytes, we'd better use `bytesConf`'.
AmplabJenkins removed a comment on issue #24187: [SPARK-27256][CORE][SQL]If the configuration is used to set the number of bytes, we'd better use `bytesConf`'. URL: https://github.com/apache/spark/pull/24187#issuecomment-476060945 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/9257/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24187: [SPARK-27256][CORE][SQL]If the configuration is used to set the number of bytes, we'd better use `bytesConf`'.
AmplabJenkins commented on issue #24187: [SPARK-27256][CORE][SQL]If the configuration is used to set the number of bytes, we'd better use `bytesConf`'. URL: https://github.com/apache/spark/pull/24187#issuecomment-476060943 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24187: [SPARK-27256][CORE][SQL]If the configuration is used to set the number of bytes, we'd better use `bytesConf`'.
AmplabJenkins commented on issue #24187: [SPARK-27256][CORE][SQL]If the configuration is used to set the number of bytes, we'd better use `bytesConf`'. URL: https://github.com/apache/spark/pull/24187#issuecomment-476060945 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/9257/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #24187: [SPARK-27256][CORE][SQL]If the configuration is used to set the number of bytes, we'd better use `bytesConf`'.
SparkQA commented on issue #24187: [SPARK-27256][CORE][SQL]If the configuration is used to set the number of bytes, we'd better use `bytesConf`'. URL: https://github.com/apache/spark/pull/24187#issuecomment-476059772 **[Test build #103894 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/103894/testReport)** for PR 24187 at commit [`ef67451`](https://github.com/apache/spark/commit/ef67451a1aeacc1004e140d62ca99868a146f001).
[GitHub] [spark] 10110346 commented on issue #24187: [SPARK-27256][CORE][SQL]If the configuration is used to set the number of bytes, we'd better use `bytesConf`'.
10110346 commented on issue #24187: [SPARK-27256][CORE][SQL]If the configuration is used to set the number of bytes, we'd better use `bytesConf`'. URL: https://github.com/apache/spark/pull/24187#issuecomment-476059626 retest this please
[GitHub] [spark] AmplabJenkins removed a comment on issue #24177: [SPARK-27240][PYTHON] Use pandas DataFrame for struct type argument in Scalar Pandas UDF.
AmplabJenkins removed a comment on issue #24177: [SPARK-27240][PYTHON] Use pandas DataFrame for struct type argument in Scalar Pandas UDF. URL: https://github.com/apache/spark/pull/24177#issuecomment-476057938 Merged build finished. Test PASSed.
[GitHub] [spark] hehuiyuan commented on a change in pull request #24188: [SPARK-27258][K8S]Add the prefix 'spark-' for resourceNamePrefix that starts with number
hehuiyuan commented on a change in pull request #24188: [SPARK-27258][K8S]Add the prefix 'spark-' for resourceNamePrefix that starts with number
URL: https://github.com/apache/spark/pull/24188#discussion_r268491200

## File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala ##
@@ -78,7 +78,10 @@ private[spark] class KubernetesDriverConf(
   override val resourceNamePrefix: String = {
     val custom = if (Utils.isTesting) get(KUBERNETES_DRIVER_POD_NAME_PREFIX) else None
-    custom.getOrElse(KubernetesConf.getResourceNamePrefix(appName))
+    val resourceNamePrefix = custom.getOrElse(KubernetesConf.getResourceNamePrefix(appName))
+    // If the first character of resourceNamePrefix is number,add the extra prefix : "spark-".
+    val prefix = "spark-" + resourceNamePrefix.charAt(0)
+    resourceNamePrefix.replaceAll("^[0-9]", prefix)

Review comment:
1. For example, resourceNamePrefix = 1min-xxx.
   After executing this code:
   `val prefix = "spark-" + resourceNamePrefix.charAt(0)`
   `resourceNamePrefix.replaceAll("^[0-9]", prefix)`
   resourceNamePrefix = spark-1min-xxx, which satisfies the regular expression `[a-z]([-a-z0-9]*[a-z0-9])?`.
2. For example, resourceNamePrefix = min-xxx.
   After executing this code:
   `val prefix = "spark-" + resourceNamePrefix.charAt(0)`
   `resourceNamePrefix.replaceAll("^[0-9]", prefix)`
   resourceNamePrefix = min-xxx, which is unchanged.

Local testing was done for this `override val resourceNamePrefix` only.
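The two cases in the comment above can be checked with a small standalone sketch. It is an alternative formulation for illustration, not the code in the PR: the object `ResourceNamePrefix` and its methods `normalize` and `isValid` are hypothetical names, and the only inputs taken from the thread are the `spark-` prefix, the sample values, and the quoted regular expression. Unlike the quoted snippet, it does not call `charAt(0)`, which throws on an empty string.

```scala
// Hypothetical helper for illustration; not part of Spark's KubernetesConf.
object ResourceNamePrefix {
  // Name pattern quoted in the review comment.
  private val ValidName = "[a-z]([-a-z0-9]*[a-z0-9])?".r

  /** Prepends "spark-" when the prefix starts with an ASCII digit; other inputs are returned unchanged. */
  def normalize(prefix: String): String =
    if (prefix.headOption.exists(c => c >= '0' && c <= '9')) s"spark-$prefix" else prefix

  /** True when the whole name matches the quoted pattern. */
  def isValid(name: String): Boolean =
    ValidName.pattern.matcher(name).matches()
}
```

With this sketch, `ResourceNamePrefix.normalize("1min-xxx")` gives `spark-1min-xxx`, which `isValid` accepts, while `min-xxx` passes through unchanged.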
[GitHub] [spark] AmplabJenkins removed a comment on issue #24177: [SPARK-27240][PYTHON] Use pandas DataFrame for struct type argument in Scalar Pandas UDF.
AmplabJenkins removed a comment on issue #24177: [SPARK-27240][PYTHON] Use pandas DataFrame for struct type argument in Scalar Pandas UDF. URL: https://github.com/apache/spark/pull/24177#issuecomment-476057941 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/103892/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24177: [SPARK-27240][PYTHON] Use pandas DataFrame for struct type argument in Scalar Pandas UDF.
AmplabJenkins commented on issue #24177: [SPARK-27240][PYTHON] Use pandas DataFrame for struct type argument in Scalar Pandas UDF. URL: https://github.com/apache/spark/pull/24177#issuecomment-476057941 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/103892/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24177: [SPARK-27240][PYTHON] Use pandas DataFrame for struct type argument in Scalar Pandas UDF.
AmplabJenkins commented on issue #24177: [SPARK-27240][PYTHON] Use pandas DataFrame for struct type argument in Scalar Pandas UDF. URL: https://github.com/apache/spark/pull/24177#issuecomment-476057938 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #24177: [SPARK-27240][PYTHON] Use pandas DataFrame for struct type argument in Scalar Pandas UDF.
SparkQA removed a comment on issue #24177: [SPARK-27240][PYTHON] Use pandas DataFrame for struct type argument in Scalar Pandas UDF. URL: https://github.com/apache/spark/pull/24177#issuecomment-476052906 **[Test build #103892 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/103892/testReport)** for PR 24177 at commit [`4309d46`](https://github.com/apache/spark/commit/4309d46b341eb7b552e9bb7e9491974be78eb9c4).
[GitHub] [spark] hehuiyuan commented on a change in pull request #24188: [SPARK-27258][K8S]Add the prefix 'spark-' for resourceNamePrefix that starts with number
hehuiyuan commented on a change in pull request #24188: [SPARK-27258][K8S]Add the prefix 'spark-' for resourceNamePrefix that starts with number
URL: https://github.com/apache/spark/pull/24188#discussion_r268491200

## File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala ##
@@ -78,7 +78,10 @@ private[spark] class KubernetesDriverConf(
   override val resourceNamePrefix: String = {
     val custom = if (Utils.isTesting) get(KUBERNETES_DRIVER_POD_NAME_PREFIX) else None
-    custom.getOrElse(KubernetesConf.getResourceNamePrefix(appName))
+    val resourceNamePrefix = custom.getOrElse(KubernetesConf.getResourceNamePrefix(appName))
+    // If the first character of resourceNamePrefix is number,add the extra prefix : "spark-".
+    val prefix = "spark-" + resourceNamePrefix.charAt(0)
+    resourceNamePrefix.replaceAll("^[0-9]", prefix)

Review comment:
1. For example, resourceNamePrefix = 1min-xxx.
   After executing this code:
   `val prefix = "spark-" + resourceNamePrefix.charAt(0)`
   `resourceNamePrefix.replaceAll("^[0-9]", prefix)`
   resourceNamePrefix = spark-1min-xxx, which satisfies the regular expression `[a-z]([-a-z0-9]*[a-z0-9])?`.
2. For example, resourceNamePrefix = min-xxx.
   After executing this code:
   `val prefix = "spark-" + resourceNamePrefix.charAt(0)`
   `resourceNamePrefix.replaceAll("^[0-9]", prefix)`
   resourceNamePrefix = min-xxx, which is unchanged.

Local testing was done for this method only.
[GitHub] [spark] SparkQA commented on issue #24177: [SPARK-27240][PYTHON] Use pandas DataFrame for struct type argument in Scalar Pandas UDF.
SparkQA commented on issue #24177: [SPARK-27240][PYTHON] Use pandas DataFrame for struct type argument in Scalar Pandas UDF.
URL: https://github.com/apache/spark/pull/24177#issuecomment-476057757
**[Test build #103892 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/103892/testReport)** for PR 24177 at commit [`4309d46`](https://github.com/apache/spark/commit/4309d46b341eb7b552e9bb7e9491974be78eb9c4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24197: [SPARK-24102][ML][MLLIB][PYSPARK][FOLLOWUP] Added weight column to pyspark API for regression evaluator and metrics
AmplabJenkins removed a comment on issue #24197: [SPARK-24102][ML][MLLIB][PYSPARK][FOLLOWUP] Added weight column to pyspark API for regression evaluator and metrics URL: https://github.com/apache/spark/pull/24197#issuecomment-476056020 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/103891/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24197: [SPARK-24102][ML][MLLIB][PYSPARK][FOLLOWUP] Added weight column to pyspark API for regression evaluator and metrics
AmplabJenkins removed a comment on issue #24197: [SPARK-24102][ML][MLLIB][PYSPARK][FOLLOWUP] Added weight column to pyspark API for regression evaluator and metrics URL: https://github.com/apache/spark/pull/24197#issuecomment-476056018 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on issue #24197: [SPARK-24102][ML][MLLIB][PYSPARK][FOLLOWUP] Added weight column to pyspark API for regression evaluator and metrics
SparkQA removed a comment on issue #24197: [SPARK-24102][ML][MLLIB][PYSPARK][FOLLOWUP] Added weight column to pyspark API for regression evaluator and metrics URL: https://github.com/apache/spark/pull/24197#issuecomment-476049672 **[Test build #103891 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/103891/testReport)** for PR 24197 at commit [`d050f32`](https://github.com/apache/spark/commit/d050f32eb0a4d4217d4bca20ea3b15d74a226383).
[GitHub] [spark] AmplabJenkins commented on issue #24197: [SPARK-24102][ML][MLLIB][PYSPARK][FOLLOWUP] Added weight column to pyspark API for regression evaluator and metrics
AmplabJenkins commented on issue #24197: [SPARK-24102][ML][MLLIB][PYSPARK][FOLLOWUP] Added weight column to pyspark API for regression evaluator and metrics URL: https://github.com/apache/spark/pull/24197#issuecomment-476056020 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/103891/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #24197: [SPARK-24102][ML][MLLIB][PYSPARK][FOLLOWUP] Added weight column to pyspark API for regression evaluator and metrics
AmplabJenkins commented on issue #24197: [SPARK-24102][ML][MLLIB][PYSPARK][FOLLOWUP] Added weight column to pyspark API for regression evaluator and metrics URL: https://github.com/apache/spark/pull/24197#issuecomment-476056018 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on issue #24197: [SPARK-24102][ML][MLLIB][PYSPARK][FOLLOWUP] Added weight column to pyspark API for regression evaluator and metrics
SparkQA commented on issue #24197: [SPARK-24102][ML][MLLIB][PYSPARK][FOLLOWUP] Added weight column to pyspark API for regression evaluator and metrics
URL: https://github.com/apache/spark/pull/24197#issuecomment-476055918
**[Test build #103891 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/103891/testReport)** for PR 24197 at commit [`d050f32`](https://github.com/apache/spark/commit/d050f32eb0a4d4217d4bca20ea3b15d74a226383).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `class RegressionEvaluator(JavaEvaluator, HasLabelCol, HasPredictionCol, HasWeightCol,`
[GitHub] [spark] AmplabJenkins removed a comment on issue #24187: [SPARK-27256][CORE][SQL]If the configuration is used to set the number of bytes, we'd better use `bytesConf`'.
AmplabJenkins removed a comment on issue #24187: [SPARK-27256][CORE][SQL]If the configuration is used to set the number of bytes, we'd better use `bytesConf`'. URL: https://github.com/apache/spark/pull/24187#issuecomment-476055107 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/103881/ Test FAILed.
[GitHub] [spark] dongjoon-hyun closed pull request #24191: [SPARK-27261][DOC] Improve app submission doc for passing multiple configs
dongjoon-hyun closed pull request #24191: [SPARK-27261][DOC] Improve app submission doc for passing multiple configs URL: https://github.com/apache/spark/pull/24191
[GitHub] [spark] AmplabJenkins removed a comment on issue #24187: [SPARK-27256][CORE][SQL]If the configuration is used to set the number of bytes, we'd better use `bytesConf`'.
AmplabJenkins removed a comment on issue #24187: [SPARK-27256][CORE][SQL]If the configuration is used to set the number of bytes, we'd better use `bytesConf`'. URL: https://github.com/apache/spark/pull/24187#issuecomment-476055101 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #24187: [SPARK-27256][CORE][SQL]If the configuration is used to set the number of bytes, we'd better use `bytesConf`'.
AmplabJenkins commented on issue #24187: [SPARK-27256][CORE][SQL]If the configuration is used to set the number of bytes, we'd better use `bytesConf`'. URL: https://github.com/apache/spark/pull/24187#issuecomment-476055101 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #24187: [SPARK-27256][CORE][SQL]If the configuration is used to set the number of bytes, we'd better use `bytesConf`'.
AmplabJenkins commented on issue #24187: [SPARK-27256][CORE][SQL]If the configuration is used to set the number of bytes, we'd better use `bytesConf`'. URL: https://github.com/apache/spark/pull/24187#issuecomment-476055107 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/103881/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24197: [SPARK-24102][ML][MLLIB][PYSPARK][FOLLOWUP] Added weight column to pyspark API for regression evaluator and metrics
AmplabJenkins removed a comment on issue #24197: [SPARK-24102][ML][MLLIB][PYSPARK][FOLLOWUP] Added weight column to pyspark API for regression evaluator and metrics URL: https://github.com/apache/spark/pull/24197#issuecomment-476054843 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24197: [SPARK-24102][ML][MLLIB][PYSPARK][FOLLOWUP] Added weight column to pyspark API for regression evaluator and metrics
AmplabJenkins removed a comment on issue #24197: [SPARK-24102][ML][MLLIB][PYSPARK][FOLLOWUP] Added weight column to pyspark API for regression evaluator and metrics URL: https://github.com/apache/spark/pull/24197#issuecomment-476054852 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/9256/ Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #24187: [SPARK-27256][CORE][SQL]If the configuration is used to set the number of bytes, we'd better use `bytesConf`'.
SparkQA removed a comment on issue #24187: [SPARK-27256][CORE][SQL]If the configuration is used to set the number of bytes, we'd better use `bytesConf`'. URL: https://github.com/apache/spark/pull/24187#issuecomment-476021759 **[Test build #103881 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/103881/testReport)** for PR 24187 at commit [`ef67451`](https://github.com/apache/spark/commit/ef67451a1aeacc1004e140d62ca99868a146f001).
[GitHub] [spark] AmplabJenkins commented on issue #24197: [SPARK-24102][ML][MLLIB][PYSPARK][FOLLOWUP] Added weight column to pyspark API for regression evaluator and metrics
AmplabJenkins commented on issue #24197: [SPARK-24102][ML][MLLIB][PYSPARK][FOLLOWUP] Added weight column to pyspark API for regression evaluator and metrics URL: https://github.com/apache/spark/pull/24197#issuecomment-476054843 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #24187: [SPARK-27256][CORE][SQL]If the configuration is used to set the number of bytes, we'd better use `bytesConf`'.
SparkQA commented on issue #24187: [SPARK-27256][CORE][SQL]If the configuration is used to set the number of bytes, we'd better use `bytesConf`'. URL: https://github.com/apache/spark/pull/24187#issuecomment-476054929 **[Test build #103881 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/103881/testReport)** for PR 24187 at commit [`ef67451`](https://github.com/apache/spark/commit/ef67451a1aeacc1004e140d62ca99868a146f001).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on issue #24197: [SPARK-24102][ML][MLLIB][PYSPARK][FOLLOWUP] Added weight column to pyspark API for regression evaluator and metrics
AmplabJenkins commented on issue #24197: [SPARK-24102][ML][MLLIB][PYSPARK][FOLLOWUP] Added weight column to pyspark API for regression evaluator and metrics URL: https://github.com/apache/spark/pull/24197#issuecomment-476054852 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/9256/ Test PASSed.
[GitHub] [spark] dongjoon-hyun closed pull request #24160: [SPARK-27219][core] Treat timeouts as fatal in SASL fallback path.
dongjoon-hyun closed pull request #24160: [SPARK-27219][core] Treat timeouts as fatal in SASL fallback path. URL: https://github.com/apache/spark/pull/24160
[GitHub] [spark] AmplabJenkins removed a comment on issue #24160: [SPARK-27219][core] Treat timeouts as fatal in SASL fallback path.
AmplabJenkins removed a comment on issue #24160: [SPARK-27219][core] Treat timeouts as fatal in SASL fallback path. URL: https://github.com/apache/spark/pull/24160#issuecomment-476054059 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/103879/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24160: [SPARK-27219][core] Treat timeouts as fatal in SASL fallback path.
AmplabJenkins commented on issue #24160: [SPARK-27219][core] Treat timeouts as fatal in SASL fallback path. URL: https://github.com/apache/spark/pull/24160#issuecomment-476054057 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24160: [SPARK-27219][core] Treat timeouts as fatal in SASL fallback path.
AmplabJenkins removed a comment on issue #24160: [SPARK-27219][core] Treat timeouts as fatal in SASL fallback path. URL: https://github.com/apache/spark/pull/24160#issuecomment-476054057 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24160: [SPARK-27219][core] Treat timeouts as fatal in SASL fallback path.
AmplabJenkins commented on issue #24160: [SPARK-27219][core] Treat timeouts as fatal in SASL fallback path. URL: https://github.com/apache/spark/pull/24160#issuecomment-476054059 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/103879/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #24197: [SPARK-24102][ML][MLLIB][PYSPARK][FOLLOWUP] Added weight column to pyspark API for regression evaluator and metrics
SparkQA commented on issue #24197: [SPARK-24102][ML][MLLIB][PYSPARK][FOLLOWUP] Added weight column to pyspark API for regression evaluator and metrics URL: https://github.com/apache/spark/pull/24197#issuecomment-476053913 **[Test build #103893 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/103893/testReport)** for PR 24197 at commit [`f659fb2`](https://github.com/apache/spark/commit/f659fb2ade7124ea80d620d42b446aae7d999d24).
[GitHub] [spark] SparkQA removed a comment on issue #24160: [SPARK-27219][core] Treat timeouts as fatal in SASL fallback path.
SparkQA removed a comment on issue #24160: [SPARK-27219][core] Treat timeouts as fatal in SASL fallback path. URL: https://github.com/apache/spark/pull/24160#issuecomment-476014878 **[Test build #103879 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/103879/testReport)** for PR 24160 at commit [`d6dfa8c`](https://github.com/apache/spark/commit/d6dfa8c63f36d9c51431439332688e8eeffd25c5).
[GitHub] [spark] SparkQA commented on issue #24160: [SPARK-27219][core] Treat timeouts as fatal in SASL fallback path.
SparkQA commented on issue #24160: [SPARK-27219][core] Treat timeouts as fatal in SASL fallback path. URL: https://github.com/apache/spark/pull/24160#issuecomment-476053729 **[Test build #103879 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/103879/testReport)** for PR 24160 at commit [`d6dfa8c`](https://github.com/apache/spark/commit/d6dfa8c63f36d9c51431439332688e8eeffd25c5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] ueshin commented on issue #24177: [SPARK-27240][PYTHON] Use pandas DataFrame for struct type argument in Scalar Pandas UDF.
ueshin commented on issue #24177: [SPARK-27240][PYTHON] Use pandas DataFrame for struct type argument in Scalar Pandas UDF. URL: https://github.com/apache/spark/pull/24177#issuecomment-476053576 @BryanCutler I'm sorry, but I couldn't figure out what you meant. So, do you want to use multiple "flattened" arguments instead of a single DataFrame in Grouped Map Pandas UDFs?
[GitHub] [spark] SparkQA commented on issue #24177: [SPARK-27240][PYTHON] Use pandas DataFrame for struct type argument in Scalar Pandas UDF.
SparkQA commented on issue #24177: [SPARK-27240][PYTHON] Use pandas DataFrame for struct type argument in Scalar Pandas UDF. URL: https://github.com/apache/spark/pull/24177#issuecomment-476052906 **[Test build #103892 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/103892/testReport)** for PR 24177 at commit [`4309d46`](https://github.com/apache/spark/commit/4309d46b341eb7b552e9bb7e9491974be78eb9c4).
[GitHub] [spark] AmplabJenkins commented on issue #24177: [SPARK-27240][PYTHON] Use pandas DataFrame for struct type argument in Scalar Pandas UDF.
AmplabJenkins commented on issue #24177: [SPARK-27240][PYTHON] Use pandas DataFrame for struct type argument in Scalar Pandas UDF. URL: https://github.com/apache/spark/pull/24177#issuecomment-476052747 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/9255/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24177: [SPARK-27240][PYTHON] Use pandas DataFrame for struct type argument in Scalar Pandas UDF.
AmplabJenkins removed a comment on issue #24177: [SPARK-27240][PYTHON] Use pandas DataFrame for struct type argument in Scalar Pandas UDF. URL: https://github.com/apache/spark/pull/24177#issuecomment-476052745 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24177: [SPARK-27240][PYTHON] Use pandas DataFrame for struct type argument in Scalar Pandas UDF.
AmplabJenkins removed a comment on issue #24177: [SPARK-27240][PYTHON] Use pandas DataFrame for struct type argument in Scalar Pandas UDF. URL: https://github.com/apache/spark/pull/24177#issuecomment-476052747 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/9255/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24177: [SPARK-27240][PYTHON] Use pandas DataFrame for struct type argument in Scalar Pandas UDF.
AmplabJenkins commented on issue #24177: [SPARK-27240][PYTHON] Use pandas DataFrame for struct type argument in Scalar Pandas UDF. URL: https://github.com/apache/spark/pull/24177#issuecomment-476052745 Merged build finished. Test PASSed.
[GitHub] [spark] ueshin commented on a change in pull request #24177: [SPARK-27240][PYTHON] Use pandas DataFrame for struct type argument in Scalar Pandas UDF.
ueshin commented on a change in pull request #24177: [SPARK-27240][PYTHON] Use pandas DataFrame for struct type argument in Scalar Pandas UDF. URL: https://github.com/apache/spark/pull/24177#discussion_r268486139
 ## File path: python/pyspark/serializers.py ##
 @@ -378,6 +379,29 @@ class ArrowStreamPandasUDFSerializer(ArrowStreamPandasSerializer):
      Serializer used by Python worker to evaluate Pandas UDFs
      """
 
 +    def __init__(self, timezone, safecheck, assign_cols_by_name, df_for_struct=False):
 +        super(ArrowStreamPandasUDFSerializer, self) \
 +            .__init__(timezone, safecheck, assign_cols_by_name)
 +        self._df_for_struct = df_for_struct
 +
 +    def arrow_to_pandas(self, arrow_column, data_type):
 +        from pyspark.sql.types import StructType, \
 +            _arrow_column_to_pandas, _check_dataframe_localize_timestamps
 +
 +        if self._df_for_struct and type(data_type) == StructType:
 +            import pandas as pd
 +            import pyarrow as pa
 +            column_arrays = zip(*[[chunk.field(i)
 +                                   for i in range(chunk.type.num_children)]
 +                                  for chunk in arrow_column.data.iterchunks()])
 Review comment: `arrow_column.flatten()` is great! Then we can support pyarrow>=0.10.
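The `column_arrays` expression in the diff above regroups per-chunk field arrays into per-field groups, which is the job the review comment suggests handing to `arrow_column.flatten()`. As a rough illustration of that regrouping only, here is a minimal sketch in plain Python. It does not use pyarrow: `StructChunk`, `flatten_chunks`, and the sample data are hypothetical stand-ins invented for this example, and the final concatenation step is an addition for readability, not something taken from the patch.

```python
from itertools import chain


class StructChunk:
    """Hypothetical stand-in for one chunk of a struct column.

    Values are stored column-wise: one Python list per struct field.
    """

    def __init__(self, fields):
        self.fields = fields

    @property
    def num_children(self):
        # Number of fields in the struct.
        return len(self.fields)

    def field(self, i):
        # Values of the i-th field within this chunk.
        return self.fields[i]


def flatten_chunks(chunks):
    """Turn [chunk][field] arrays into one concatenated list per field."""
    # Same shape as the expression in the diff: build a list of field
    # arrays for every chunk, then transpose with zip(*...).
    per_field = zip(*[[chunk.field(i) for i in range(chunk.num_children)]
                      for chunk in chunks])
    # Join each field's per-chunk pieces into a single list (assumed step).
    return [list(chain.from_iterable(arrays)) for arrays in per_field]


chunks = [
    StructChunk([[1, 2], ["a", "b"]]),  # first chunk, fields (id, name)
    StructChunk([[3], ["c"]]),          # second chunk, same two fields
]
columns = flatten_chunks(chunks)
print(columns)
```

Each entry of `columns` then corresponds to one struct field, which is the shape needed to build one DataFrame column per field.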
[GitHub] [spark] ueshin commented on a change in pull request #24177: [SPARK-27240][PYTHON] Use pandas DataFrame for struct type argument in Scalar Pandas UDF.
ueshin commented on a change in pull request #24177: [SPARK-27240][PYTHON] Use pandas DataFrame for struct type argument in Scalar Pandas UDF. URL: https://github.com/apache/spark/pull/24177#discussion_r268486150
 ## File path: python/pyspark/serializers.py ##
 @@ -378,6 +379,29 @@ class ArrowStreamPandasUDFSerializer(ArrowStreamPandasSerializer):
      Serializer used by Python worker to evaluate Pandas UDFs
      """
 
 +    def __init__(self, timezone, safecheck, assign_cols_by_name, df_for_struct=False):
 +        super(ArrowStreamPandasUDFSerializer, self) \
 +            .__init__(timezone, safecheck, assign_cols_by_name)
 +        self._df_for_struct = df_for_struct
 +
 +    def arrow_to_pandas(self, arrow_column, data_type):
 +        from pyspark.sql.types import StructType, \
 +            _arrow_column_to_pandas, _check_dataframe_localize_timestamps
 +
 +        if self._df_for_struct and type(data_type) == StructType:
 Review comment: I don't think so. We can't construct a pandas DataFrame with a nested DataFrame. I might be missing what you mean?
[GitHub] [spark] cloud-fan commented on issue #24195: [SPARK-25496][SQL] Deprecate from_utc_timestamp and to_utc_timestamp
cloud-fan commented on issue #24195: [SPARK-25496][SQL] Deprecate from_utc_timestamp and to_utc_timestamp URL: https://github.com/apache/spark/pull/24195#issuecomment-476051715 AFAIK Hive's timestamp has different semantics from Spark's timestamp, so I think it's more than a naming issue. It's arguable whether we should deprecate these two functions, but we definitely should not ban them.
[GitHub] [spark] dongjoon-hyun commented on issue #23823: [SPARK-27262][R] Add explicit UTF-8 Encoding to DESCRIPTION
dongjoon-hyun commented on issue #23823: [SPARK-27262][R] Add explicit UTF-8 Encoding to DESCRIPTION URL: https://github.com/apache/spark/pull/23823#issuecomment-476051378 I see. Thank you for reverting, @HyukjinKwon .
[GitHub] [spark] srowen commented on issue #24199: [SPARK-27265][SQL] - convenience methods for accessing values by column name
srowen commented on issue #24199: [SPARK-27265][SQL] - convenience methods for accessing values by column name URL: https://github.com/apache/spark/pull/24199#issuecomment-476050799 I think these weren't added just because of the number of additional methods vs value add over `getAs[Type](String)`
[GitHub] [spark] AmplabJenkins removed a comment on issue #24199: [SPARK-27265][SQL] - convenience methods for accessing values by column name
AmplabJenkins removed a comment on issue #24199: [SPARK-27265][SQL] - convenience methods for accessing values by column name URL: https://github.com/apache/spark/pull/24199#issuecomment-476049308 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on issue #24199: [SPARK-27265][SQL] - convenience methods for accessing values by column name
AmplabJenkins commented on issue #24199: [SPARK-27265][SQL] - convenience methods for accessing values by column name URL: https://github.com/apache/spark/pull/24199#issuecomment-476049636 Can one of the admins verify this patch?
[GitHub] [spark] SparkQA commented on issue #24197: [SPARK-24102][ML][MLLIB][PYSPARK][FOLLOWUP] Added weight column to pyspark API for regression evaluator and metrics
SparkQA commented on issue #24197: [SPARK-24102][ML][MLLIB][PYSPARK][FOLLOWUP] Added weight column to pyspark API for regression evaluator and metrics URL: https://github.com/apache/spark/pull/24197#issuecomment-476049672 **[Test build #103891 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/103891/testReport)** for PR 24197 at commit [`d050f32`](https://github.com/apache/spark/commit/d050f32eb0a4d4217d4bca20ea3b15d74a226383).
[GitHub] [spark] AmplabJenkins removed a comment on issue #24197: [SPARK-24102][ML][MLLIB][PYSPARK][FOLLOWUP] Added weight column to pyspark API for regression evaluator and metrics
AmplabJenkins removed a comment on issue #24197: [SPARK-24102][ML][MLLIB][PYSPARK][FOLLOWUP] Added weight column to pyspark API for regression evaluator and metrics URL: https://github.com/apache/spark/pull/24197#issuecomment-476049425 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/9254/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24197: [SPARK-24102][ML][MLLIB][PYSPARK][FOLLOWUP] Added weight column to pyspark API for regression evaluator and metrics
AmplabJenkins commented on issue #24197: [SPARK-24102][ML][MLLIB][PYSPARK][FOLLOWUP] Added weight column to pyspark API for regression evaluator and metrics URL: https://github.com/apache/spark/pull/24197#issuecomment-476049425 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/9254/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24197: [SPARK-24102][ML][MLLIB][PYSPARK][FOLLOWUP] Added weight column to pyspark API for regression evaluator and metrics
AmplabJenkins removed a comment on issue #24197: [SPARK-24102][ML][MLLIB][PYSPARK][FOLLOWUP] Added weight column to pyspark API for regression evaluator and metrics URL: https://github.com/apache/spark/pull/24197#issuecomment-476049422 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24197: [SPARK-24102][ML][MLLIB][PYSPARK][FOLLOWUP] Added weight column to pyspark API for regression evaluator and metrics
AmplabJenkins commented on issue #24197: [SPARK-24102][ML][MLLIB][PYSPARK][FOLLOWUP] Added weight column to pyspark API for regression evaluator and metrics URL: https://github.com/apache/spark/pull/24197#issuecomment-476049422 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24199: [SPARK-27265][SQL] - convenience methods for accessing values by column name
AmplabJenkins commented on issue #24199: [SPARK-27265][SQL] - convenience methods for accessing values by column name URL: https://github.com/apache/spark/pull/24199#issuecomment-476049308 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins removed a comment on issue #24199: [SPARK-27265][SQL] - convenience methods for accessing values by column name
AmplabJenkins removed a comment on issue #24199: [SPARK-27265][SQL] - convenience methods for accessing values by column name URL: https://github.com/apache/spark/pull/24199#issuecomment-476049244 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on issue #24199: [SPARK-27265][SQL] - convenience methods for accessing values by column name
AmplabJenkins commented on issue #24199: [SPARK-27265][SQL] - convenience methods for accessing values by column name URL: https://github.com/apache/spark/pull/24199#issuecomment-476049244 Can one of the admins verify this patch?
[GitHub] [spark] reynoldsm88 opened a new pull request #24199: [SPARK-27265][SQL] - convenience methods for accessing values by column name
reynoldsm88 opened a new pull request #24199: [SPARK-27265][SQL] - convenience methods for accessing values by column name URL: https://github.com/apache/spark/pull/24199 ## What changes were proposed in this pull request? Add convenience methods that make accessing values by column name similar to how they are accessed by column index ## How was this patch tested? Relevant unit tests were updated.
[GitHub] [spark] srowen commented on a change in pull request #24187: [SPARK-27256][CORE][SQL]If the configuration is used to set the number of bytes, we'd better use `bytesConf`'.
srowen commented on a change in pull request #24187: [SPARK-27256][CORE][SQL]If the configuration is used to set the number of bytes, we'd better use `bytesConf`'. URL: https://github.com/apache/spark/pull/24187#discussion_r268483319 ## File path: core/src/main/scala/org/apache/spark/internal/config/package.scala ## @@ -579,15 +579,15 @@ package object config { private[spark] val FILES_MAX_PARTITION_BYTES = ConfigBuilder("spark.files.maxPartitionBytes") .doc("The maximum number of bytes to pack into a single partition when reading files.") -.longConf +.bytesConf(ByteUnit.BYTE) Review comment: OK, I'm fine with it.
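The `-.longConf +.bytesConf(ByteUnit.BYTE)` change in the message above can be illustrated with a minimal sketch. It assumes the practical difference is that a bytes-typed setting accepts size strings such as `128m` as well as a plain byte count, while a long-typed setting accepts only the raw number. `ByteSizeSketch` and `parseBytes` are hypothetical names invented for this illustration; they are not Spark's implementation, and the suffix table is an assumption.

```scala
// Hypothetical illustration only: not Spark's ConfigBuilder or its byte-string parser.
object ByteSizeSketch {
  // Assumed binary multipliers for a few common suffixes.
  private val Units: Map[String, Long] = Map(
    "b" -> 1L,
    "k" -> 1024L,
    "m" -> 1024L * 1024L,
    "g" -> 1024L * 1024L * 1024L
  )

  private val Pattern = "([0-9]+)([a-z]*)".r

  // Accepts a plain number of bytes ("134217728") or a suffixed size ("128m").
  def parseBytes(s: String): Long = s.trim.toLowerCase match {
    case Pattern(num, "") => num.toLong
    case Pattern(num, suffix) if Units.contains(suffix) => num.toLong * Units(suffix)
    case other => throw new IllegalArgumentException(s"Invalid size string: $other")
  }

  def main(args: Array[String]): Unit = {
    // Both spellings resolve to the same number of bytes.
    assert(parseBytes("128m") == parseBytes("134217728"))
    println(parseBytes("128m"))
  }
}
```

With a parser of this shape, a value that was previously written as a raw byte count keeps working, which would be consistent with choosing `ByteUnit.BYTE` as the default unit in the diff.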
[GitHub] [spark] srowen commented on issue #24193: [SPARK-27263][HistoryServer] Hide `Hadoop Properties` table in Environment page if it's empty
srowen commented on issue #24193: [SPARK-27263][HistoryServer] Hide `Hadoop Properties` table in Environment page if it's empty URL: https://github.com/apache/spark/pull/24193#issuecomment-476047682 I would not change this; it's not necessarily better to totally hide the table.
[GitHub] [spark] srowen commented on a change in pull request #24080: [SPARK-27147][TEST]Create new unit test cases for SortShuffleWriter
srowen commented on a change in pull request #24080: [SPARK-27147][TEST]Create new unit test cases for SortShuffleWriter URL: https://github.com/apache/spark/pull/24080#discussion_r268483014 ## File path: core/src/test/scala/org/apache/spark/shuffle/sort/SortShuffleWriterSuite.scala ## @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.shuffle.sort + +import java.io.File + +import org.mockito.Mockito._ +import org.mockito.MockitoAnnotations +import org.scalatest.BeforeAndAfterEach +import org.apache.spark.{ShuffleDependency, _} +import org.apache.spark.memory.MemoryTestingUtils +import org.apache.spark.serializer.JavaSerializer +import org.apache.spark.shuffle.{BaseShuffleHandle, IndexShuffleBlockResolver} +import org.apache.spark.util.Utils + +class SortShuffleWriterSuite extends SparkFunSuite with BeforeAndAfterEach { + + private val shuffleId = 0 + private val numMaps = 5 + private var outputFile: File = _ + private val conf: SparkConf = new SparkConf(loadDefaults = false) + private val sc = new SparkContext("local-cluster[1,1,1024]", "test", conf) + private var shuffleHandle: BaseShuffleHandle[Int, Int, Int] = _ + private val shuffleBlockResolver = new IndexShuffleBlockResolver(conf) + private val serializer = new JavaSerializer(conf) + private val context = MemoryTestingUtils.fakeTaskContext(sc.env) + + override def beforeEach(): Unit = { +super.beforeEach() +MockitoAnnotations.initMocks(this) +val partitioner = new Partitioner() { + def numPartitions = numMaps + def getPartition(key: Any) = Utils.nonNegativeMod(key.hashCode, numPartitions) +} +shuffleHandle = { + val dependency = mock(classOf[ShuffleDependency[Int, Int, Int]]) + when(dependency.partitioner).thenReturn(partitioner) + when(dependency.serializer).thenReturn(serializer) + when(dependency.aggregator).thenReturn(None) + when(dependency.keyOrdering).thenReturn(None) + new BaseShuffleHandle(shuffleId, numMaps = numMaps, dependency) +} + } + + override def afterEach(): Unit = { +try { + if (outputFile != null) { +Utils.deleteRecursively(outputFile) + } +} finally { + super.afterEach() +} + } + + test("write empty iterator") { +val writer = new SortShuffleWriter[Int, Int, Int]( + shuffleBlockResolver, + shuffleHandle, + mapId = 1, // MapId + context +) +writer.write(Iterator.empty) +writer.stop( /* 
success = */ true) +val dataFile = shuffleBlockResolver.getDataFile(shuffleId, 1) +assert(!dataFile.exists()) +assert(dataFile.length() == 0) +assert(context.taskMetrics().shuffleWriteMetrics.bytesWritten == 0) +assert(context.taskMetrics().shuffleWriteMetrics.recordsWritten == 0) + + } + + test("write with some records") { +val records = Iterator((1, 2), (2, 3), (4, 4), (6, 5)) +val writer = new SortShuffleWriter[Int, Int, Int]( + shuffleBlockResolver, + shuffleHandle, + mapId = 2, // MapId + context +) +writer.write(records) +writer.stop( /* success = */ true) +val dataFile = shuffleBlockResolver.getDataFile(shuffleId, 2) +assert(dataFile.exists()) +assert(dataFile.length() != 0) +assert(context.taskMetrics().shuffleWriteMetrics.bytesWritten != 0) Review comment: Also, use `===` or `!==` from scalatest.
[GitHub] [spark] srowen commented on a change in pull request #24080: [SPARK-27147][TEST]Create new unit test cases for SortShuffleWriter
srowen commented on a change in pull request #24080: [SPARK-27147][TEST]Create new unit test cases for SortShuffleWriter URL: https://github.com/apache/spark/pull/24080#discussion_r268483062 ## File path: core/src/test/scala/org/apache/spark/shuffle/sort/SortShuffleWriterSuite.scala ## @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.shuffle.sort + +import java.io.File + +import org.mockito.Mockito._ +import org.mockito.MockitoAnnotations +import org.scalatest.BeforeAndAfterEach +import org.apache.spark.{ShuffleDependency, _} +import org.apache.spark.memory.MemoryTestingUtils +import org.apache.spark.serializer.JavaSerializer +import org.apache.spark.shuffle.{BaseShuffleHandle, IndexShuffleBlockResolver} +import org.apache.spark.util.Utils + +class SortShuffleWriterSuite extends SparkFunSuite with BeforeAndAfterEach { + + private val shuffleId = 0 + private val numMaps = 5 + private var outputFile: File = _ + private val conf: SparkConf = new SparkConf(loadDefaults = false) + private val sc = new SparkContext("local-cluster[1,1,1024]", "test", conf) + private var shuffleHandle: BaseShuffleHandle[Int, Int, Int] = _ + private val shuffleBlockResolver = new IndexShuffleBlockResolver(conf) + private val serializer = new JavaSerializer(conf) + private val context = MemoryTestingUtils.fakeTaskContext(sc.env) + + override def beforeEach(): Unit = { +super.beforeEach() +MockitoAnnotations.initMocks(this) +val partitioner = new Partitioner() { + def numPartitions = numMaps + def getPartition(key: Any) = Utils.nonNegativeMod(key.hashCode, numPartitions) +} +shuffleHandle = { + val dependency = mock(classOf[ShuffleDependency[Int, Int, Int]]) + when(dependency.partitioner).thenReturn(partitioner) + when(dependency.serializer).thenReturn(serializer) + when(dependency.aggregator).thenReturn(None) + when(dependency.keyOrdering).thenReturn(None) + new BaseShuffleHandle(shuffleId, numMaps = numMaps, dependency) +} + } + + override def afterEach(): Unit = { +try { + if (outputFile != null) { +Utils.deleteRecursively(outputFile) Review comment: We have test support code for managing a SparkContext and temp files; you can use those rather than recreate it This is an automated message from the Apache Git Service. 
[GitHub] [spark] SparkQA commented on issue #24086: [SPARK-27155][TEST] Parameterize Oracle docker image name
SparkQA commented on issue #24086: [SPARK-27155][TEST] Parameterize Oracle docker image name URL: https://github.com/apache/spark/pull/24086#issuecomment-476047273 **[Test build #4660 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4660/testReport)** for PR 24086 at commit [`e7bc15d`](https://github.com/apache/spark/commit/e7bc15d96faa033b3f0aa9a7ff9606ce0cbba2cf).
[GitHub] [spark] AmplabJenkins removed a comment on issue #23599: [SPARK-24793][K8s] Enhance spark-submit for app management
AmplabJenkins removed a comment on issue #23599: [SPARK-24793][K8s] Enhance spark-submit for app management URL: https://github.com/apache/spark/pull/23599#issuecomment-476047143 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/103878/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #23599: [SPARK-24793][K8s] Enhance spark-submit for app management
AmplabJenkins removed a comment on issue #23599: [SPARK-24793][K8s] Enhance spark-submit for app management URL: https://github.com/apache/spark/pull/23599#issuecomment-476047139 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #23599: [SPARK-24793][K8s] Enhance spark-submit for app management
AmplabJenkins commented on issue #23599: [SPARK-24793][K8s] Enhance spark-submit for app management URL: https://github.com/apache/spark/pull/23599#issuecomment-476047143 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/103878/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #23599: [SPARK-24793][K8s] Enhance spark-submit for app management
AmplabJenkins commented on issue #23599: [SPARK-24793][K8s] Enhance spark-submit for app management URL: https://github.com/apache/spark/pull/23599#issuecomment-476047139 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #23599: [SPARK-24793][K8s] Enhance spark-submit for app management
SparkQA removed a comment on issue #23599: [SPARK-24793][K8s] Enhance spark-submit for app management URL: https://github.com/apache/spark/pull/23599#issuecomment-476012307 **[Test build #103878 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/103878/testReport)** for PR 23599 at commit [`119f5a3`](https://github.com/apache/spark/commit/119f5a360b6064c4f033c9a34f0a5b4efcbe83a0).
[GitHub] [spark] SparkQA commented on issue #23599: [SPARK-24793][K8s] Enhance spark-submit for app management
SparkQA commented on issue #23599: [SPARK-24793][K8s] Enhance spark-submit for app management URL: https://github.com/apache/spark/pull/23599#issuecomment-476046866 **[Test build #103878 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/103878/testReport)** for PR 23599 at commit [`119f5a3`](https://github.com/apache/spark/commit/119f5a360b6064c4f033c9a34f0a5b4efcbe83a0). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `trait CommandLineLoggingUtils `
[GitHub] [spark] kiszk commented on a change in pull request #24080: [SPARK-27147][TEST]Create new unit test cases for SortShuffleWriter
kiszk commented on a change in pull request #24080: [SPARK-27147][TEST]Create new unit test cases for SortShuffleWriter URL: https://github.com/apache/spark/pull/24080#discussion_r268482573 ## File path: core/src/test/scala/org/apache/spark/shuffle/sort/SortShuffleWriterSuite.scala ## @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.shuffle.sort + +import java.io.File + +import org.mockito.Mockito._ +import org.mockito.MockitoAnnotations +import org.scalatest.BeforeAndAfterEach +import org.apache.spark.{ShuffleDependency, _} +import org.apache.spark.memory.MemoryTestingUtils +import org.apache.spark.serializer.JavaSerializer +import org.apache.spark.shuffle.{BaseShuffleHandle, IndexShuffleBlockResolver} +import org.apache.spark.util.Utils + +class SortShuffleWriterSuite extends SparkFunSuite with BeforeAndAfterEach { + + private val shuffleId = 0 + private val numMaps = 5 + private var outputFile: File = _ + private val conf: SparkConf = new SparkConf(loadDefaults = false) + private val sc = new SparkContext("local-cluster[1,1,1024]", "test", conf) + private var shuffleHandle: BaseShuffleHandle[Int, Int, Int] = _ + private val shuffleBlockResolver = new IndexShuffleBlockResolver(conf) + private val serializer = new JavaSerializer(conf) + private val context = MemoryTestingUtils.fakeTaskContext(sc.env) + + override def beforeEach(): Unit = { +super.beforeEach() +MockitoAnnotations.initMocks(this) +val partitioner = new Partitioner() { + def numPartitions = numMaps + def getPartition(key: Any) = Utils.nonNegativeMod(key.hashCode, numPartitions) +} +shuffleHandle = { + val dependency = mock(classOf[ShuffleDependency[Int, Int, Int]]) + when(dependency.partitioner).thenReturn(partitioner) + when(dependency.serializer).thenReturn(serializer) + when(dependency.aggregator).thenReturn(None) + when(dependency.keyOrdering).thenReturn(None) + new BaseShuffleHandle(shuffleId, numMaps = numMaps, dependency) +} + } + + override def afterEach(): Unit = { +try { + if (outputFile != null) { +Utils.deleteRecursively(outputFile) + } +} finally { + super.afterEach() +} + } + + test("write empty iterator") { +val writer = new SortShuffleWriter[Int, Int, Int]( + shuffleBlockResolver, + shuffleHandle, + mapId = 1, // MapId + context +) +writer.write(Iterator.empty) +writer.stop( /* 
success = */ true) +val dataFile = shuffleBlockResolver.getDataFile(shuffleId, 1) +assert(!dataFile.exists()) +assert(dataFile.length() == 0) +assert(context.taskMetrics().shuffleWriteMetrics.bytesWritten == 0) +assert(context.taskMetrics().shuffleWriteMetrics.recordsWritten == 0) + + } + + test("write with some records") { +val records = Iterator((1, 2), (2, 3), (4, 4), (6, 5)) +val writer = new SortShuffleWriter[Int, Int, Int]( + shuffleBlockResolver, + shuffleHandle, + mapId = 2, // MapId + context +) +writer.write(records) +writer.stop( /* success = */ true) +val dataFile = shuffleBlockResolver.getDataFile(shuffleId, 2) +assert(dataFile.exists()) +assert(dataFile.length() != 0) Review comment: Can we compare against specific expected values instead of `!= 0` in these three lines?
[GitHub] [spark] shivusondur commented on a change in pull request #24193: [SPARK-27263][HistoryServer] Hide `Hadoop Properties` table in Environment page if it's empty
shivusondur commented on a change in pull request #24193: [SPARK-27263][HistoryServer] Hide `Hadoop Properties` table in Environment page if it's empty URL: https://github.com/apache/spark/pull/24193#discussion_r268482018 ## File path: core/src/main/scala/org/apache/spark/ui/env/EnvironmentPage.scala ## @@ -83,6 +84,7 @@ private[ui] class EnvironmentPage( {hadoopPropertiesTable} + }} Review comment: @dongjoon-hyun Corrected the indentation and removed the extra space.
[GitHub] [spark] AmplabJenkins commented on issue #23599: [SPARK-24793][K8s] Enhance spark-submit for app management
AmplabJenkins commented on issue #23599: [SPARK-24793][K8s] Enhance spark-submit for app management URL: https://github.com/apache/spark/pull/23599#issuecomment-476046282 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #23599: [SPARK-24793][K8s] Enhance spark-submit for app management
AmplabJenkins removed a comment on issue #23599: [SPARK-24793][K8s] Enhance spark-submit for app management URL: https://github.com/apache/spark/pull/23599#issuecomment-476046282 Merged build finished. Test PASSed.
[GitHub] [spark] shivusondur commented on a change in pull request #24193: [SPARK-27263][HistoryServer] Hide `Hadoop Properties` table in Environment page if it's empty
shivusondur commented on a change in pull request #24193: [SPARK-27263][HistoryServer] Hide `Hadoop Properties` table in Environment page if it's empty URL: https://github.com/apache/spark/pull/24193#discussion_r268482018 ## File path: core/src/main/scala/org/apache/spark/ui/env/EnvironmentPage.scala ## @@ -83,6 +84,7 @@ private[ui] class EnvironmentPage( {hadoopPropertiesTable} + }} Review comment: @dongjoon-hyun
[GitHub] [spark] AmplabJenkins removed a comment on issue #23599: [SPARK-24793][K8s] Enhance spark-submit for app management
AmplabJenkins removed a comment on issue #23599: [SPARK-24793][K8s] Enhance spark-submit for app management URL: https://github.com/apache/spark/pull/23599#issuecomment-476046286 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/103877/ Test PASSed.
[GitHub] [spark] DaveDeCaprio commented on a change in pull request #24028: [SPARK-26917][SQL] Further reduce locks in CacheManager
DaveDeCaprio commented on a change in pull request #24028: [SPARK-26917][SQL] Further reduce locks in CacheManager URL: https://github.com/apache/spark/pull/24028#discussion_r268482158 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala ## @@ -194,46 +167,36 @@ class CacheManager extends Logging { private def recacheByCondition( spark: SparkSession, condition: CachedData => Boolean): Unit = { -val needToRecache = scala.collection.mutable.ArrayBuffer.empty[CachedData] -readLock { - val it = cachedData.iterator() - while (it.hasNext) { -val cd = it.next() -if (condition(cd)) { - needToRecache += cd -} - } +val needToRecache = cachedData.filter(condition) +this.synchronized { + // Remove the cache entry before creating a new ones. + cachedData = cachedData.filterNot(cd => needToRecache.exists(_ eq cd)) Review comment: The write itself is atomic, but you need `this.synchronized` to ensure that the value you write is derived from the most up-to-date value of `cachedData`. The value is read as part of `cachedData.filterNot`, and you need to make sure `cachedData` doesn't change between the time it is read and the time it is written.
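The read-then-write concern in the review comment above can be sketched in isolation. This is a minimal illustration under stated assumptions, not Spark's `CacheManager`: `CowList`, `items`, `add`, `removeWhere`, and `CowListDemo` are hypothetical names, and the sketch assumes the shared field is a `@volatile var` holding an immutable collection, so readers can take a snapshot without a lock while every update that derives a new value from the old one runs under `this.synchronized`.

```scala
// Hypothetical copy-on-write holder; a sketch, not Spark's CacheManager.
final class CowList[A <: AnyRef] {
  // Assumption: volatile field holding an immutable list, so a reader
  // always sees some complete, consistent snapshot without locking.
  @volatile private var items: List[A] = Nil

  def snapshot: List[A] = items

  // Read-modify-write: the read of `items` and the assignment happen
  // under the same lock, so no concurrent update can be lost.
  def add(a: A): Unit = this.synchronized {
    items = a :: items
  }

  def removeWhere(condition: A => Boolean): List[A] = {
    // Lock-free read: picks candidates from whatever snapshot is current.
    val candidates = items.filter(condition)
    this.synchronized {
      // Re-read `items` inside the lock so the new value is derived from
      // the latest list, not from the possibly stale snapshot above.
      items = items.filterNot(x => candidates.exists(_ eq x))
    }
    candidates
  }
}

object CowListDemo {
  def main(args: Array[String]): Unit = {
    val list = new CowList[String]
    val writers = (1 to 4).map { t =>
      new Thread(() => (1 to 1000).foreach(i => list.add(s"t$t-$i")))
    }
    writers.foreach(_.start())
    writers.foreach(_.join())
    // No additions are lost because each add is a synchronized read-modify-write.
    assert(list.snapshot.size == 4000)
    val removed = list.removeWhere(_.startsWith("t1-"))
    assert(removed.size == 1000)
    assert(list.snapshot.size == 3000)
  }
}
```

If `add` assigned `a :: items` without the lock, two threads could read the same old list and one addition would be overwritten, even though each individual assignment to the field is atomic.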
[GitHub] [spark] AmplabJenkins commented on issue #23599: [SPARK-24793][K8s] Enhance spark-submit for app management
AmplabJenkins commented on issue #23599: [SPARK-24793][K8s] Enhance spark-submit for app management URL: https://github.com/apache/spark/pull/23599#issuecomment-476046286 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/103877/ Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #23599: [SPARK-24793][K8s] Enhance spark-submit for app management
SparkQA removed a comment on issue #23599: [SPARK-24793][K8s] Enhance spark-submit for app management URL: https://github.com/apache/spark/pull/23599#issuecomment-476011536 **[Test build #103877 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/103877/testReport)** for PR 23599 at commit [`dc50774`](https://github.com/apache/spark/commit/dc507741258bbc4de4092932b67eb422f4a1d982).
[GitHub] [spark] SparkQA commented on issue #23599: [SPARK-24793][K8s] Enhance spark-submit for app management
SparkQA commented on issue #23599: [SPARK-24793][K8s] Enhance spark-submit for app management URL: https://github.com/apache/spark/pull/23599#issuecomment-476046015 **[Test build #103877 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/103877/testReport)** for PR 23599 at commit [`dc50774`](https://github.com/apache/spark/commit/dc507741258bbc4de4092932b67eb422f4a1d982). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `trait CommandLineLoggingUtils `
[GitHub] [spark] AmplabJenkins removed a comment on issue #24198: [SPARK-25196][SQL][FOLLOWUP] Fix wrong tests in StatisticsCollectionSuite
AmplabJenkins removed a comment on issue #24198: [SPARK-25196][SQL][FOLLOWUP] Fix wrong tests in StatisticsCollectionSuite URL: https://github.com/apache/spark/pull/24198#issuecomment-476045376 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/9252/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24198: [SPARK-25196][SQL][FOLLOWUP] Fix wrong tests in StatisticsCollectionSuite
AmplabJenkins removed a comment on issue #24198: [SPARK-25196][SQL][FOLLOWUP] Fix wrong tests in StatisticsCollectionSuite URL: https://github.com/apache/spark/pull/24198#issuecomment-476045373 Merged build finished. Test PASSed.
[GitHub] [spark] qiuchenjian commented on issue #20965: [SPARK-21870][SQL] Split aggregation code into small functions
qiuchenjian commented on issue #20965: [SPARK-21870][SQL] Split aggregation code into small functions URL: https://github.com/apache/spark/pull/20965#issuecomment-476045557 @mgaido91 Hi, are there any updates on this issue, the performance degradation with multiple columns under whole-stage codegen? We found that it is not fully solved in Spark 2.4.
[GitHub] [spark] SparkQA commented on issue #24198: [SPARK-25196][SQL][FOLLOWUP] Fix wrong tests in StatisticsCollectionSuite
SparkQA commented on issue #24198: [SPARK-25196][SQL][FOLLOWUP] Fix wrong tests in StatisticsCollectionSuite URL: https://github.com/apache/spark/pull/24198#issuecomment-476045568 **[Test build #103889 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/103889/testReport)** for PR 24198 at commit [`c3150c6`](https://github.com/apache/spark/commit/c3150c65029f74233301c8d411a303acc3631664).
[GitHub] [spark] SparkQA commented on issue #24193: [SPARK-27263][HistoryServer] Hide `Hadoop Properties` table in Environment page if it's empty
SparkQA commented on issue #24193: [SPARK-27263][HistoryServer] Hide `Hadoop Properties` table in Environment page if it's empty URL: https://github.com/apache/spark/pull/24193#issuecomment-476045571 **[Test build #103890 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/103890/testReport)** for PR 24193 at commit [`786da8a`](https://github.com/apache/spark/commit/786da8ad8cb05437b06b0ef3d69dec82a5b001ac).
[GitHub] [spark] AmplabJenkins commented on issue #24193: [SPARK-27263][HistoryServer] Hide `Hadoop Properties` table in Environment page if it's empty
AmplabJenkins commented on issue #24193: [SPARK-27263][HistoryServer] Hide `Hadoop Properties` table in Environment page if it's empty URL: https://github.com/apache/spark/pull/24193#issuecomment-476045381 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/9253/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24193: [SPARK-27263][HistoryServer] Hide `Hadoop Properties` table in Environment page if it's empty
AmplabJenkins removed a comment on issue #24193: [SPARK-27263][HistoryServer] Hide `Hadoop Properties` table in Environment page if it's empty URL: https://github.com/apache/spark/pull/24193#issuecomment-476045381 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/9253/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24193: [SPARK-27263][HistoryServer] Hide `Hadoop Properties` table in Environment page if it's empty
AmplabJenkins commented on issue #24193: [SPARK-27263][HistoryServer] Hide `Hadoop Properties` table in Environment page if it's empty URL: https://github.com/apache/spark/pull/24193#issuecomment-476045377 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24198: [SPARK-25196][SQL][FOLLOWUP] Fix wrong tests in StatisticsCollectionSuite
AmplabJenkins commented on issue #24198: [SPARK-25196][SQL][FOLLOWUP] Fix wrong tests in StatisticsCollectionSuite URL: https://github.com/apache/spark/pull/24198#issuecomment-476045373 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24193: [SPARK-27263][HistoryServer] Hide `Hadoop Properties` table in Environment page if it's empty
AmplabJenkins removed a comment on issue #24193: [SPARK-27263][HistoryServer] Hide `Hadoop Properties` table in Environment page if it's empty URL: https://github.com/apache/spark/pull/24193#issuecomment-476045377 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24198: [SPARK-25196][SQL][FOLLOWUP] Fix wrong tests in StatisticsCollectionSuite
AmplabJenkins commented on issue #24198: [SPARK-25196][SQL][FOLLOWUP] Fix wrong tests in StatisticsCollectionSuite URL: https://github.com/apache/spark/pull/24198#issuecomment-476045376 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/9252/ Test PASSed.
[GitHub] [spark] maropu commented on a change in pull request #24047: [SPARK-25196][SQL] Extends the analyze column command for cached tables
maropu commented on a change in pull request #24047: [SPARK-25196][SQL] Extends the analyze column command for cached tables URL: https://github.com/apache/spark/pull/24047#discussion_r268481175
## File path: sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ##
@@ -470,4 +471,77 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
       }
     }
   }
+
+  def getStatAttrNames(tableName: String): Set[String] = {
+    val queryStats = spark.table(tableName).queryExecution.optimizedPlan.stats.attributeStats
+    queryStats.map(_._1.name).toSet
+  }
+
+  test("analyzes column statistics in cached query") {
+    withTempView("cachedQuery") {
+      sql(
+        """CACHE TABLE cachedQuery AS
+          |  SELECT c0, avg(c1) AS v1, avg(c2) AS v2
+          |  FROM (SELECT id % 3 AS c0, id % 5 AS c1, 2 AS c2 FROM range(1, 30))
+          |  GROUP BY c0
+        """.stripMargin)
+
+      // Analyzes one column in the cached logical plan
+      sql("ANALYZE TABLE cachedQuery COMPUTE STATISTICS FOR COLUMNS v1")
+      assert(getStatAttrNames("cachedQuery") === Set("v1"))
+
+      // Analyzes two more columns
+      sql("ANALYZE TABLE cachedQuery COMPUTE STATISTICS FOR COLUMNS c0, v2")
+      assert(getStatAttrNames("cachedQuery") === Set("c0", "v1", "v2"))
+    }
+  }
+
+  test("analyzes column statistics in cached local temporary view") {
+    withTempView("tempView") {
+      // Analyzes in a temporary view
+      sql("CREATE TEMPORARY VIEW tempView AS SELECT * FROM range(1, 30)")
+      val errMsg = intercept[AnalysisException] {
+        sql("ANALYZE TABLE tempView COMPUTE STATISTICS FOR COLUMNS id")
+      }.getMessage
+      assert(errMsg.contains(s"Table or view 'tempView' not found in database 'default'"))
+
+      // Cache the view then analyze it
+      sql("CACHE TABLE tempView")
+      assert(getStatAttrNames("tempView") !== Set("id"))
+      sql("ANALYZE TABLE tempView COMPUTE STATISTICS FOR COLUMNS id")
+      assert(getStatAttrNames("tempView") === Set("id"))
+    }
+  }
+
+  test("analyzes column statistics in cached global temporary view") {
+    withGlobalTempView("gTempView") {
+      val globalTempDB = spark.sharedState.globalTempViewManager.database
+      val errMsg1 = intercept[NoSuchTableException] {
+        sql(s"ANALYZE TABLE $globalTempDB.gTempView COMPUTE STATISTICS FOR COLUMNS id")
+      }.getMessage
+      assert(errMsg1.contains(s"Table or view 'gTempView' not found in database '$globalTempDB'"))
+      // Analyzes in a global temporary view
+      sql("CREATE GLOBAL TEMP VIEW gTempView AS SELECT * FROM range(1, 30)")
+      val errMsg2 = intercept[AnalysisException] {
+        sql(s"ANALYZE TABLE $globalTempDB.gTempView COMPUTE STATISTICS FOR COLUMNS id")
+      }.getMessage
+      assert(errMsg2.contains(s"Table or view 'gTempView' not found in database '$globalTempDB'"))
+
+      // Cache the view then analyze it
+      sql(s"CACHE TABLE $globalTempDB.gTempView")
+      assert(getStatAttrNames(s"$globalTempDB.gTempView") !== Set("id"))
+      sql(s"ANALYZE TABLE $globalTempDB.gTempView COMPUTE STATISTICS FOR COLUMNS id")
+      assert(getStatAttrNames(s"$globalTempDB.gTempView") === Set("id"))
+    }
+  }
+
+  test("analyzes column statistics in cached catalog view") {
+    withTempDatabase { database =>
+      sql(s"CREATE VIEW $database.v AS SELECT 1 c")
+      sql(s"CACHE TABLE $database.v")
+      assert(getStatAttrNames(s"$database.v") !== Set("id"))
+      sql(s"ANALYZE TABLE $database.v COMPUTE STATISTICS FOR COLUMNS c")
+      assert(getStatAttrNames(s"$database.v") !== Set("id"))
Review comment: @dongjoon-hyun Sorry, but I found my silly mistake just now, so could you check https://github.com/apache/spark/pull/24198?