[jira] [Commented] (SPARK-26345) Parquet support Column indexes
[ https://issues.apache.org/jira/browse/SPARK-26345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17248526#comment-17248526 ]

Yuming Wang commented on SPARK-26345:
-------------------------------------

[~jamestaylor] Please see [https://github.com/apache/spark/pull/30517] for more details.

> Parquet support Column indexes
> ------------------------------
>
>                 Key: SPARK-26345
>                 URL: https://issues.apache.org/jira/browse/SPARK-26345
>             Project: Spark
>          Issue Type: Umbrella
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Yuming Wang
>            Priority: Major
>
> Parquet 1.11.0 supports column indexes. Spark can support this feature for
> better read performance.
> More details: https://issues.apache.org/jira/browse/PARQUET-1201

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
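The column-index feature referenced above (PARQUET-1201) stores per-page min/max statistics so that a reader can skip whole pages whose value range cannot match a predicate. The following is a rough illustration of that pruning idea in pure Python, not Spark's or parquet-mr's actual implementation; the function name and data layout are invented for the sketch.

```python
def prune_pages(page_stats, lower, upper):
    """Return indices of pages whose [min, max] range overlaps [lower, upper].

    page_stats: list of (page_min, page_max) tuples, one per data page,
    mimicking the min/max entries a Parquet column index records.
    """
    keep = []
    for i, (pmin, pmax) in enumerate(page_stats):
        # A page can only contain matching rows if its range intersects the predicate.
        if pmax >= lower and pmin <= upper:
            keep.append(i)
    return keep

# Three pages of a sorted column; only pages overlapping [150, 250] are read.
pages = [(0, 99), (100, 199), (200, 299)]
print(prune_pages(pages, 150, 250))  # → [1, 2]
```

With a sorted or clustered column, most pages fall entirely outside a selective predicate's range, which is where the "good read performance" mentioned in the issue comes from.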
[jira] [Commented] (SPARK-32617) Upgrade kubernetes client version to support latest minikube version.
[ https://issues.apache.org/jira/browse/SPARK-32617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17248523#comment-17248523 ]

Apache Spark commented on SPARK-32617:
--------------------------------------

User 'attilapiros' has created a pull request for this issue:
https://github.com/apache/spark/pull/30751

> Upgrade kubernetes client version to support latest minikube version.
> ---------------------------------------------------------------------
>
>                 Key: SPARK-32617
>                 URL: https://issues.apache.org/jira/browse/SPARK-32617
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 3.1.0
>            Reporter: Prashant Sharma
>            Priority: Major
>
> The following error occurs when the k8s integration tests are run against a
> minikube cluster with version 1.2.1:
> {code:java}
> Run starting. Expected test count is: 18
> KubernetesSuite:
> org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite *** ABORTED ***
>   io.fabric8.kubernetes.client.KubernetesClientException: An error has occurred.
>   at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
>   at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:53)
>   at io.fabric8.kubernetes.client.utils.HttpClientUtils.createHttpClient(HttpClientUtils.java:196)
>   at io.fabric8.kubernetes.client.utils.HttpClientUtils.createHttpClient(HttpClientUtils.java:62)
>   at io.fabric8.kubernetes.client.BaseClient.<init>(BaseClient.java:51)
>   at io.fabric8.kubernetes.client.DefaultKubernetesClient.<init>(DefaultKubernetesClient.java:105)
>   at org.apache.spark.deploy.k8s.integrationtest.backend.minikube.Minikube$.getKubernetesClient(Minikube.scala:81)
>   at org.apache.spark.deploy.k8s.integrationtest.backend.minikube.MinikubeTestBackend$.initialize(MinikubeTestBackend.scala:33)
>   at org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.beforeAll(KubernetesSuite.scala:131)
>   at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
>   ...
>   Cause: java.nio.file.NoSuchFileException: /root/.minikube/apiserver.crt
>   at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>   at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
>   at java.nio.file.Files.newByteChannel(Files.java:361)
>   at java.nio.file.Files.newByteChannel(Files.java:407)
>   at java.nio.file.Files.readAllBytes(Files.java:3152)
>   at io.fabric8.kubernetes.client.internal.CertUtils.getInputStreamFromDataOrFile(CertUtils.java:72)
>   at io.fabric8.kubernetes.client.internal.CertUtils.createKeyStore(CertUtils.java:242)
>   at io.fabric8.kubernetes.client.internal.SSLUtils.keyManagers(SSLUtils.java:128)
>   ...
> Run completed in 1 second, 821 milliseconds.
> Total number of tests run: 0
> Suites: completed 1, aborted 1
> Tests: succeeded 0, failed 0, canceled 0, ignored 0, pending 0
> *** 1 SUITE ABORTED ***
> [INFO] ------------------------------------------------------------------------
> [INFO] Reactor Summary for Spark Project Parent POM 3.1.0-SNAPSHOT:
> [INFO]
> [INFO] Spark Project Parent POM ................................. SUCCESS [  4.454 s]
> [INFO] Spark Project Tags ....................................... SUCCESS [  4.768 s]
> [INFO] Spark Project Local DB ................................... SUCCESS [  2.961 s]
> [INFO] Spark Project Networking ................................. SUCCESS [  4.258 s]
> [INFO] Spark Project Shuffle Streaming Service .................. SUCCESS [  5.703 s]
> [INFO] Spark Project Unsafe ..................................... SUCCESS [  3.239 s]
> [INFO] Spark Project Launcher ................................... SUCCESS [  3.224 s]
> [INFO] Spark Project Core ....................................... SUCCESS [02:25 min]
> [INFO] Spark Project Kubernetes Integration Tests ............... FAILURE [ 17.244 s]
> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD FAILURE
> [INFO] ------------------------------------------------------------------------
> [INFO] Total time: 03:12 min
> [INFO] Finished at: 2020-08-11T06:26:15-05:00
> [INFO] ------------------------------------------------------------------------
> [ERROR] Failed to execute goal org.scalatest:scalatest-maven-plugin:2.0.0:test (integration-test) on project spark-kubernetes-integration-tests_2.12: There are test failures -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions, please read the following articles:
> [ERROR] [Help 1]
> {code}
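The `NoSuchFileException` above shows the client looking for `apiserver.crt` at a single fixed path under `~/.minikube`, which newer minikube releases no longer populate. One version-tolerant approach is to probe several candidate locations. This is only an illustrative sketch, not the fabric8 client's actual logic, and the `profiles/minikube` layout used below is an assumption for the example.

```python
import os
import os.path

def find_apiserver_cert(minikube_home):
    """Probe candidate certificate locations instead of one hard-coded path.

    The second candidate (a per-profile directory) is a hypothetical layout
    used here only to illustrate the probing strategy.
    """
    candidates = [
        os.path.join(minikube_home, "apiserver.crt"),
        os.path.join(minikube_home, "profiles", "minikube", "apiserver.crt"),
    ]
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None  # caller can fall back to kubeconfig-driven discovery
```

A client structured this way degrades gracefully when the certificate layout changes between minikube versions, instead of aborting the whole suite as in the log above.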
[jira] [Assigned] (SPARK-32617) Upgrade kubernetes client version to support latest minikube version.
[ https://issues.apache.org/jira/browse/SPARK-32617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-32617:
------------------------------------

    Assignee:     (was: Apache Spark)
[jira] [Commented] (SPARK-32617) Upgrade kubernetes client version to support latest minikube version.
[ https://issues.apache.org/jira/browse/SPARK-32617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17248522#comment-17248522 ]

Apache Spark commented on SPARK-32617:
--------------------------------------

User 'attilapiros' has created a pull request for this issue:
https://github.com/apache/spark/pull/30751
[jira] [Assigned] (SPARK-32617) Upgrade kubernetes client version to support latest minikube version.
[ https://issues.apache.org/jira/browse/SPARK-32617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-32617:
------------------------------------

    Assignee: Apache Spark
[jira] [Updated] (SPARK-33705) Three cases always fail in hive-thriftserver module
[ https://issues.apache.org/jira/browse/SPARK-33705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-33705:
---------------------------------
    Priority: Critical  (was: Major)

> Three cases always fail in hive-thriftserver module
> ---------------------------------------------------
>
>                 Key: SPARK-33705
>                 URL: https://issues.apache.org/jira/browse/SPARK-33705
>             Project: Spark
>          Issue Type: Bug
>          Components: Tests
>    Affects Versions: 3.2.0
>            Reporter: Yang Jie
>            Priority: Critical
>
> The following tests seem to always fail, both in Jenkins and GitHub Actions:
>  * org.apache.spark.sql.hive.thriftserver.HiveThriftHttpServerSuite.JDBC query execution
>  * org.apache.spark.sql.hive.thriftserver.HiveThriftHttpServerSuite.Checks Hive version
>  * org.apache.spark.sql.hive.thriftserver.HiveThriftHttpServerSuite.SPARK-24829 Checks cast as float
> [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132189/testReport/]
> [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132403/testReport/]
[jira] [Commented] (SPARK-32447) Use python3 by default in pyspark and find-spark-home scripts
[ https://issues.apache.org/jira/browse/SPARK-32447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17248489#comment-17248489 ]

Apache Spark commented on SPARK-32447:
--------------------------------------

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/30750

> Use python3 by default in pyspark and find-spark-home scripts
> -------------------------------------------------------------
>
>                 Key: SPARK-32447
>                 URL: https://issues.apache.org/jira/browse/SPARK-32447
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.1.0
>            Reporter: Dongjoon Hyun
>            Assignee: Dongjoon Hyun
>            Priority: Minor
>             Fix For: 3.1.0
>
> This script depends on `find_spark_home.py`, which is already migrated to
> `python3` by using `#!/usr/bin/env python3`.
> {code}
> FIND_SPARK_HOME_PYTHON_SCRIPT="$(cd "$(dirname "$0")"; pwd)/find_spark_home.py"
> {code}
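For readers unfamiliar with the quoted shell fragment: `$(cd "$(dirname "$0")"; pwd)` expands to the absolute directory containing the running script, and `find_spark_home.py` is resolved relative to it. The same resolution can be modeled in Python; the paths below are illustrative only, and this assumes a POSIX filesystem layout.

```python
import os.path

def find_spark_home_script(script_path):
    # Equivalent of: "$(cd "$(dirname "$0")"; pwd)/find_spark_home.py"
    # abspath mirrors the `cd ...; pwd` trick; dirname mirrors `dirname "$0"`.
    script_dir = os.path.dirname(os.path.abspath(script_path))
    return os.path.join(script_dir, "find_spark_home.py")

print(find_spark_home_script("/opt/spark/bin/pyspark"))
# → /opt/spark/bin/find_spark_home.py
```

Because `find_spark_home.py` already carries a `#!/usr/bin/env python3` shebang, the shell wrappers that invoke it only need its location, which is why defaulting the wrappers themselves to `python3` is a small, self-contained change.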
[jira] [Commented] (SPARK-32447) Use python3 by default in pyspark and find-spark-home scripts
[ https://issues.apache.org/jira/browse/SPARK-32447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17248488#comment-17248488 ]

Apache Spark commented on SPARK-32447:
--------------------------------------

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/30750
[jira] [Commented] (SPARK-33768) Remove unused parameter `retainData` from AlterTableDropPartition
[ https://issues.apache.org/jira/browse/SPARK-33768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17248446#comment-17248446 ]

Apache Spark commented on SPARK-33768:
--------------------------------------

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/30748

> Remove unused parameter `retainData` from AlterTableDropPartition
> -----------------------------------------------------------------
>
>                 Key: SPARK-33768
>                 URL: https://issues.apache.org/jira/browse/SPARK-33768
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Maxim Gekk
>            Priority: Major
>
> The parameter is hard-coded to `false` while parsing in AstBuilder, so it
> can be removed from the logical node.
[jira] [Assigned] (SPARK-33768) Remove unused parameter `retainData` from AlterTableDropPartition
[ https://issues.apache.org/jira/browse/SPARK-33768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-33768:
------------------------------------

    Assignee:     (was: Apache Spark)
[jira] [Assigned] (SPARK-33768) Remove unused parameter `retainData` from AlterTableDropPartition
[ https://issues.apache.org/jira/browse/SPARK-33768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-33768:
------------------------------------

    Assignee: Apache Spark
[jira] [Commented] (SPARK-33768) Remove unused parameter `retainData` from AlterTableDropPartition
[ https://issues.apache.org/jira/browse/SPARK-33768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17248445#comment-17248445 ]

Apache Spark commented on SPARK-33768:
--------------------------------------

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/30748
[jira] [Created] (SPARK-33768) Remove unused parameter `retainData` from AlterTableDropPartition
Maxim Gekk created SPARK-33768:
----------------------------------

             Summary: Remove unused parameter `retainData` from AlterTableDropPartition
                 Key: SPARK-33768
                 URL: https://issues.apache.org/jira/browse/SPARK-33768
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 3.2.0
            Reporter: Maxim Gekk
[jira] [Commented] (SPARK-33744) Canonicalization error in SortAggregate
[ https://issues.apache.org/jira/browse/SPARK-33744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17248417#comment-17248417 ]

Pablo Langa Blanco commented on SPARK-33744:
--------------------------------------------

I'm taking a look at it, thanks for reporting.

> Canonicalization error in SortAggregate
> ---------------------------------------
>
>                 Key: SPARK-33744
>                 URL: https://issues.apache.org/jira/browse/SPARK-33744
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.1
>            Reporter: Andy Grove
>            Priority: Minor
>
> The canonicalized plan for a simple aggregate query is different each time
> for SortAggregate, but not for HashAggregate.
> The issue can be demonstrated by adding the following unit tests to
> SQLQuerySuite. The HashAggregate test passes and the SortAggregate test fails.
> The first test has numeric input, and the second test operates on strings,
> which forces the use of SortAggregate rather than HashAggregate.
> {code:java}
> test("HashAggregate canonicalization") {
>   val data = Seq((1, 1)).toDF("c0", "c1")
>   val df1 = data.groupBy(col("c0")).agg(first("c1"))
>   val df2 = data.groupBy(col("c0")).agg(first("c1"))
>   assert(df1.queryExecution.executedPlan.canonicalized ==
>     df2.queryExecution.executedPlan.canonicalized)
> }
>
> test("SortAggregate canonicalization") {
>   val data = Seq(("a", "a")).toDF("c0", "c1")
>   val df1 = data.groupBy(col("c0")).agg(first("c1"))
>   val df2 = data.groupBy(col("c0")).agg(first("c1"))
>   assert(df1.queryExecution.executedPlan.canonicalized ==
>     df2.queryExecution.executedPlan.canonicalized)
> } {code}
> The SortAggregate test fails with the following output:
> {code:java}
> SortAggregate(key=[none#0], functions=[first(none#0, false)], output=[none#0, #1])
> +- *(2) Sort [none#0 ASC NULLS FIRST], false, 0
>    +- Exchange hashpartitioning(none#0, 5), ENSURE_REQUIREMENTS, [id=#105]
>       +- SortAggregate(key=[none#0], functions=[partial_first(none#1, false)], output=[none#0, none#2, none#3])
>          +- *(1) Sort [none#0 ASC NULLS FIRST], false, 0
>             +- *(1) Project [none#0 AS #0, none#1 AS #1]
>                +- *(1) LocalTableScan [none#0, none#1]
> did not equal
> SortAggregate(key=[none#0], functions=[first(none#0, false)], output=[none#0, #1])
> +- *(2) Sort [none#0 ASC NULLS FIRST], false, 0
>    +- Exchange hashpartitioning(none#0, 5), ENSURE_REQUIREMENTS, [id=#148]
>       +- SortAggregate(key=[none#0], functions=[partial_first(none#1, false)], output=[none#0, none#2, none#3])
>          +- *(1) Sort [none#0 ASC NULLS FIRST], false, 0
>             +- *(1) Project [none#0 AS #0, none#1 AS #1]
>                +- *(1) LocalTableScan [none#0, none#1] {code}
> The error is caused by the resultExpression for the aggregate function being
> assigned a new ExprId in the final aggregate.
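The failure mode described in the issue, two plans that are identical except for freshly allocated expression IDs, is the general problem canonicalization solves: IDs must be renumbered deterministically before comparison. Here is a toy model of that idea in Python; it is not Spark code, and the plan representation is invented for illustration.

```python
def canonicalize(plan):
    """Renumber expression IDs in first-seen order.

    plan: list of (operator_name, expr_id) pairs. Two plans that differ only
    in which raw IDs were allocated become equal after renumbering.
    """
    mapping = {}
    out = []
    for op, expr_id in plan:
        if expr_id not in mapping:
            mapping[expr_id] = len(mapping)  # assign 0, 1, 2, ... deterministically
        out.append((op, mapping[expr_id]))
    return out

# Same plan shape, but the second run allocated fresh IDs (105/106 vs 148/149),
# much like the [id=#105] vs [id=#148] difference in the failure output above.
plan_a = [("Sort", 105), ("Aggregate", 105), ("Scan", 106)]
plan_b = [("Sort", 148), ("Aggregate", 148), ("Scan", 149)]
print(plan_a == plan_b)                                # → False
print(canonicalize(plan_a) == canonicalize(plan_b))    # → True
```

The bug report says the resultExpression gets a *new* ExprId in the final aggregate on every planning run; in terms of this model, that is an ID created outside the renumbering scheme, so the canonical forms still differ.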
[jira] [Commented] (SPARK-26345) Parquet support Column indexes
[ https://issues.apache.org/jira/browse/SPARK-26345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17248412#comment-17248412 ]

James R. Taylor commented on SPARK-26345:
-----------------------------------------

Those results are excellent, [~yumwang]. I thought from earlier comments that vectorized reads in Spark weren't compatible with column indexing? Do the child JIRAs here fix that?
[jira] [Commented] (SPARK-33767) Unify v1 and v2 ALTER TABLE .. DROP PARTITION tests
[ https://issues.apache.org/jira/browse/SPARK-33767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17248403#comment-17248403 ]

Apache Spark commented on SPARK-33767:
--------------------------------------

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/30747

> Unify v1 and v2 ALTER TABLE .. DROP PARTITION tests
> ---------------------------------------------------
>
>                 Key: SPARK-33767
>                 URL: https://issues.apache.org/jira/browse/SPARK-33767
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Maxim Gekk
>            Assignee: Maxim Gekk
>            Priority: Major
>             Fix For: 3.1.0
>
> Extract the ALTER TABLE .. DROP PARTITION tests to a common place to run
> them for V1 and V2 datasources. Some tests can be placed in V1- and
> V2-specific test suites.
[jira] [Assigned] (SPARK-33767) Unify v1 and v2 ALTER TABLE .. DROP PARTITION tests
[ https://issues.apache.org/jira/browse/SPARK-33767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-33767:
------------------------------------

    Assignee: Maxim Gekk  (was: Apache Spark)
[jira] [Commented] (SPARK-33767) Unify v1 and v2 ALTER TABLE .. DROP PARTITION tests
[ https://issues.apache.org/jira/browse/SPARK-33767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17248401#comment-17248401 ]

Apache Spark commented on SPARK-33767:
--------------------------------------

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/30747
[jira] [Assigned] (SPARK-33767) Unify v1 and v2 ALTER TABLE .. DROP PARTITION tests
[ https://issues.apache.org/jira/browse/SPARK-33767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33767: Assignee: Apache Spark (was: Maxim Gekk) > Unify v1 and v2 ALTER TABLE .. DROP PARTITION tests > --- > > Key: SPARK-33767 > URL: https://issues.apache.org/jira/browse/SPARK-33767 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Assignee: Apache Spark >Priority: Major > Fix For: 3.1.0 > > > Extract ALTER TABLE .. DROP PARTITION tests to the common place to run them > for V1 and V2 datasources. Some tests can be placed in V1 and V2 specific > test suites.
[jira] [Updated] (SPARK-33767) Unify v1 and v2 ALTER TABLE .. DROP PARTITION tests
[ https://issues.apache.org/jira/browse/SPARK-33767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-33767: --- Description: Extract ALTER TABLE .. DROP PARTITION tests to the common place to run them for V1 and V2 datasources. Some tests can be placed in V1 and V2 specific test suites. (was: Extract ALTER TABLE .. ADD PARTITION tests to the common place to run them for V1 and V2 datasources. Some tests can be placed in V1 and V2 specific test suites.) > Unify v1 and v2 ALTER TABLE .. DROP PARTITION tests > --- > > Key: SPARK-33767 > URL: https://issues.apache.org/jira/browse/SPARK-33767 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.1.0 > > > Extract ALTER TABLE .. DROP PARTITION tests to the common place to run them > for V1 and V2 datasources. Some tests can be placed in V1 and V2 specific > test suites.
[jira] [Created] (SPARK-33767) Unify v1 and v2 ALTER TABLE .. DROP PARTITION tests
Maxim Gekk created SPARK-33767: -- Summary: Unify v1 and v2 ALTER TABLE .. DROP PARTITION tests Key: SPARK-33767 URL: https://issues.apache.org/jira/browse/SPARK-33767 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.1.0 Reporter: Maxim Gekk Assignee: Maxim Gekk Fix For: 3.1.0 Extract ALTER TABLE .. ADD PARTITION tests to the common place to run them for V1 and V2 datasources. Some tests can be placed in V1 and V2 specific test suites.
[jira] [Commented] (SPARK-26345) Parquet support Column indexes
[ https://issues.apache.org/jira/browse/SPARK-26345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248374#comment-17248374 ] Yuming Wang commented on SPARK-26345: - Benchmark and benchmark result:

{code:scala}
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.sql.execution.benchmark

import java.io.File

import scala.util.Random

import org.apache.spark.SparkConf
import org.apache.spark.benchmark.Benchmark
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{monotonically_increasing_id, timestamp_seconds}
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.internal.SQLConf.ParquetOutputTimestampType
import org.apache.spark.sql.types.{ByteType, Decimal, DecimalType}

/**
 * Benchmark to measure read performance with Parquet column index.
 * To run this benchmark:
 * {{{
 *   1. without sbt: bin/spark-submit --class
 *   2. build/sbt "sql/test:runMain "
 *   3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain "
 *      Results will be written to "benchmarks/ParquetFilterPushdownBenchmark-results.txt".
 * }}}
 */
object ParquetFilterPushdownBenchmark extends SqlBasedBenchmark {

  override def getSparkSession: SparkSession = {
    val conf = new SparkConf()
      .setAppName(this.getClass.getSimpleName)
      // Since `spark.master` always exists, overrides this value
      .set("spark.master", "local[1]")
      .setIfMissing("spark.driver.memory", "3g")
      .setIfMissing("spark.executor.memory", "3g")
      .setIfMissing("orc.compression", "snappy")
      .setIfMissing("spark.sql.parquet.compression.codec", "snappy")

    SparkSession.builder().config(conf).getOrCreate()
  }

  private val numRows = 1024 * 1024 * 15
  private val width = 5
  private val mid = numRows / 2

  def withTempTable(tableNames: String*)(f: => Unit): Unit = {
    try f finally tableNames.foreach(spark.catalog.dropTempView)
  }

  private def prepareTable(
      dir: File, numRows: Int, width: Int, useStringForValue: Boolean): Unit = {
    import spark.implicits._
    val selectExpr = (1 to width).map(i => s"CAST(value AS STRING) c$i")

    val valueCol = if (useStringForValue) {
      monotonically_increasing_id().cast("string")
    } else {
      monotonically_increasing_id()
    }
    val df = spark.range(numRows).map(_ => Random.nextLong).selectExpr(selectExpr: _*)
      .withColumn("value", valueCol)
      .sort("value")

    saveAsTable(df, dir)
  }

  private def prepareStringDictTable(
      dir: File, numRows: Int, numDistinctValues: Int, width: Int): Unit = {
    val selectExpr = (0 to width).map {
      case 0 => s"CAST(id % $numDistinctValues AS STRING) AS value"
      case i => s"CAST(rand() AS STRING) c$i"
    }
    val df = spark.range(numRows).selectExpr(selectExpr: _*).sort("value")

    saveAsTable(df, dir, true)
  }

  private def saveAsTable(df: DataFrame, dir: File, useDictionary: Boolean = false): Unit = {
    val parquetPath = dir.getCanonicalPath + "/parquet"
    df.write.mode("overwrite").parquet(parquetPath)
    spark.read.parquet(parquetPath).createOrReplaceTempView("parquetTable")
  }

  def filterPushDownBenchmark(
      values: Int,
      title: String,
      whereExpr: String,
      selectExpr: String = "*"): Unit = {
    val benchmark = new Benchmark(title, values, minNumIters = 5, output = output)

    Seq(false, true).foreach { columnIndexEnabled =>
      val name = s"Parquet Vectorized ${if (columnIndexEnabled) s"(columnIndex)" else ""}"
      benchmark.addCase(name) { _ =>
        withSQLConf("parquet.filter.columnindex.enabled" -> s"$columnIndexEnabled") {
          spark.sql(s"SELECT $selectExpr FROM parquetTable WHERE $whereExpr").noop()
        }
      }
    }

    benchmark.run()
  }

  private def runIntBenchmark(numRows: Int, width: Int, mid: Int): Unit = {
    Seq("value IS NULL", s"$mid < value AND value < $mid").foreach { whereExpr =>
      val title = s"Select 0 int row ($whereExpr)".replace("value AND value", "value")
      filterPushDownBenchmark(numRows, title, whereExpr)
    }

    Seq(
      s"value = $mid",
      s"value <=> $mid",
      s"$mid
[jira] [Commented] (SPARK-33734) Spark Core ::Spark core versions upto 3.0.1 using interdependency on Jackson-core-asl version 1.9.13, which is having security issues reported.
[ https://issues.apache.org/jira/browse/SPARK-33734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248364#comment-17248364 ] Aparna commented on SPARK-33734: [~hyukjin.kwon] Updated the title. I don't have the CVE ticket. Please let me know the updated version of Spark-core to pick. > Spark Core ::Spark core versions upto 3.0.1 using interdependency on > Jackson-core-asl version 1.9.13, which is having security issues reported. > > > Key: SPARK-33734 > URL: https://issues.apache.org/jira/browse/SPARK-33734 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.1 >Reporter: Aparna >Priority: Major > > spark-core version upto latest 3.0.1 is using dependency > [org.apache.avro|https://mvnrepository.com/artifact/org.apache.avro] version > 1.8.2 which is having > [jackson-core-asl|https://mvnrepository.com/artifact/org.codehaus.jackson/jackson-core-asl] > version 1.9.13 which has security issues. > Please fix and share the new version.
[jira] [Updated] (SPARK-33734) Spark Core ::Spark core versions upto 3.0.1 using interdependency on Jackson-core-asl version 1.9.13, which is having security issues reported.
[ https://issues.apache.org/jira/browse/SPARK-33734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aparna updated SPARK-33734: --- Summary: Spark Core ::Spark core versions upto 3.0.1 using interdependency on Jackson-core-asl version 1.9.13, which is having security issues reported. (was: Spark Core ) > Spark Core ::Spark core versions upto 3.0.1 using interdependency on > Jackson-core-asl version 1.9.13, which is having security issues reported. > > > Key: SPARK-33734 > URL: https://issues.apache.org/jira/browse/SPARK-33734 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.1 >Reporter: Aparna >Priority: Major > > spark-core version upto latest 3.0.1 is using dependency > [org.apache.avro|https://mvnrepository.com/artifact/org.apache.avro] version > 1.8.2 which is having > [jackson-core-asl|https://mvnrepository.com/artifact/org.codehaus.jackson/jackson-core-asl] > version 1.9.13 which has security issues. > Please fix and share the new version.
[jira] [Assigned] (SPARK-32195) Standardize warning types and messages
[ https://issues.apache.org/jira/browse/SPARK-32195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-32195: Assignee: Maciej Szymkiewicz > Standardize warning types and messages > -- > > Key: SPARK-32195 > URL: https://issues.apache.org/jira/browse/SPARK-32195 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Assignee: Maciej Szymkiewicz >Priority: Major > > Currently PySpark uses a somewhat inconsistent warning type and message such > as UserWarning. We should standardize it.
[jira] [Assigned] (SPARK-33730) Standardize warning types
[ https://issues.apache.org/jira/browse/SPARK-33730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-33730: Assignee: Maciej Bryński > Standardize warning types > - > > Key: SPARK-33730 > URL: https://issues.apache.org/jira/browse/SPARK-33730 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Assignee: Maciej Bryński >Priority: Major > > We should use warnings properly per > [https://docs.python.org/3/library/warnings.html#warning-categories] > In particular, > - we should use {{FutureWarning}} instead of {{DeprecationWarning}} for the > places we should show the warnings to end-users by default. > - we should __maybe__ think about customizing stacklevel > ([https://docs.python.org/3/library/warnings.html#warnings.warn]) like pandas > does. > - ... > Current warnings are a bit messy and somewhat arbitrary. > To be more explicit, we'll have to fix: > {code:java} > pyspark/context.py:warnings.warn( > pyspark/context.py:warnings.warn( > pyspark/ml/classification.py:warnings.warn("weightCol is > ignored, " > pyspark/ml/clustering.py:warnings.warn("Deprecated in 3.0.0. It will > be removed in future versions. Use " > pyspark/mllib/classification.py:warnings.warn( > pyspark/mllib/feature.py:warnings.warn("Both withMean and withStd > are false. 
The model does nothing.") > pyspark/mllib/regression.py:warnings.warn( > pyspark/mllib/regression.py:warnings.warn( > pyspark/mllib/regression.py:warnings.warn( > pyspark/rdd.py:warnings.warn("mapPartitionsWithSplit is deprecated; " > pyspark/rdd.py:warnings.warn( > pyspark/shell.py:warnings.warn("Failed to initialize Spark session.") > pyspark/shuffle.py:warnings.warn("Please install psutil to have > better " > pyspark/sql/catalog.py:warnings.warn( > pyspark/sql/catalog.py:warnings.warn( > pyspark/sql/column.py:warnings.warn( > pyspark/sql/column.py:warnings.warn( > pyspark/sql/context.py:warnings.warn( > pyspark/sql/context.py:warnings.warn( > pyspark/sql/context.py:warnings.warn( > pyspark/sql/context.py:warnings.warn( > pyspark/sql/context.py:warnings.warn( > pyspark/sql/dataframe.py:warnings.warn( > pyspark/sql/dataframe.py:warnings.warn("to_replace is a dict > and value is not None. value will be ignored.") > pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use degrees > instead.", DeprecationWarning) > pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use radians > instead.", DeprecationWarning) > pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use > approx_count_distinct instead.", DeprecationWarning) > pyspark/sql/pandas/conversion.py:warnings.warn(msg) > pyspark/sql/pandas/conversion.py:warnings.warn(msg) > pyspark/sql/pandas/conversion.py:warnings.warn(msg) > pyspark/sql/pandas/conversion.py:warnings.warn(msg) > pyspark/sql/pandas/conversion.py:warnings.warn(msg) > pyspark/sql/pandas/functions.py:warnings.warn( > pyspark/sql/pandas/group_ops.py:warnings.warn( > pyspark/sql/session.py:warnings.warn("Fall back to non-hive > support because failing to access HiveConf, " > {code} > PySpark prints warnings via using {{print}} in some places as well. We should > also see if we should switch and replace to {{warnings.warn}}. 
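The conventions proposed above (FutureWarning for deprecations that end users should see by default, plus a customized stacklevel so the warning points at the caller, as pandas does) can be sketched in plain Python. The `train` function and its `weight_col` parameter below are hypothetical, not PySpark's actual API:

```python
import warnings

def train(weight_col=None):
    """Hypothetical API whose `weight_col` parameter is being deprecated."""
    if weight_col is not None:
        # FutureWarning is visible to end users by default; the default
        # filters hide DeprecationWarning outside __main__. stacklevel=2
        # attributes the warning to the caller's line, not this one.
        warnings.warn(
            "weight_col is deprecated and will be ignored",
            FutureWarning,
            stacklevel=2,
        )
    return "model"
```

Called as `train(weight_col="w")`, this emits one FutureWarning attributed to the call site; `train()` emits nothing.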
[jira] [Commented] (SPARK-26345) Parquet support Column indexes
[ https://issues.apache.org/jira/browse/SPARK-26345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248323#comment-17248323 ] Yuming Wang commented on SPARK-26345: - We have a PR to test compatibility against Parquet 1.11.1, Avro 1.10.1 and Hive 2.3.8: https://github.com/apache/spark/pull/30517 > Parquet support Column indexes > -- > > Key: SPARK-26345 > URL: https://issues.apache.org/jira/browse/SPARK-26345 > Project: Spark > Issue Type: Umbrella > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Priority: Major > > Parquet 1.11.0 supports column indexing. Spark can support this feature for > good read performance. > More details: > https://issues.apache.org/jira/browse/PARQUET-1201
[jira] [Assigned] (SPARK-33766) Upgrade Jackson to 2.11.4
[ https://issues.apache.org/jira/browse/SPARK-33766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33766: Assignee: (was: Apache Spark) > Upgrade Jackson to 2.11.4 > - > > Key: SPARK-33766 > URL: https://issues.apache.org/jira/browse/SPARK-33766 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Priority: Major > > Upgrade Jackson to 2.11.4 to make it easy to upgrade Avro 1.10.1. > More details: > https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.11 > {noformat} > com.fasterxml.jackson.databind.JsonMappingException: Scala module 2.10.5 > requires Jackson Databind version >= 2.10.0 and < 2.11.0 > {noformat}
[jira] [Commented] (SPARK-33766) Upgrade Jackson to 2.11.4
[ https://issues.apache.org/jira/browse/SPARK-33766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248322#comment-17248322 ] Apache Spark commented on SPARK-33766: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/30746 > Upgrade Jackson to 2.11.4 > - > > Key: SPARK-33766 > URL: https://issues.apache.org/jira/browse/SPARK-33766 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Priority: Major > > Upgrade Jackson to 2.11.4 to make it easy to upgrade Avro 1.10.1. > More details: > https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.11 > {noformat} > com.fasterxml.jackson.databind.JsonMappingException: Scala module 2.10.5 > requires Jackson Databind version >= 2.10.0 and < 2.11.0 > {noformat}
[jira] [Assigned] (SPARK-33766) Upgrade Jackson to 2.11.4
[ https://issues.apache.org/jira/browse/SPARK-33766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33766: Assignee: Apache Spark > Upgrade Jackson to 2.11.4 > - > > Key: SPARK-33766 > URL: https://issues.apache.org/jira/browse/SPARK-33766 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Assignee: Apache Spark >Priority: Major > > Upgrade Jackson to 2.11.4 to make it easy to upgrade Avro 1.10.1. > More details: > https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.11 > {noformat} > com.fasterxml.jackson.databind.JsonMappingException: Scala module 2.10.5 > requires Jackson Databind version >= 2.10.0 and < 2.11.0 > {noformat}
[jira] [Commented] (SPARK-33766) Upgrade Jackson to 2.11.4
[ https://issues.apache.org/jira/browse/SPARK-33766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248321#comment-17248321 ] Apache Spark commented on SPARK-33766: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/30746 > Upgrade Jackson to 2.11.4 > - > > Key: SPARK-33766 > URL: https://issues.apache.org/jira/browse/SPARK-33766 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Priority: Major > > Upgrade Jackson to 2.11.4 to make it easy to upgrade Avro 1.10.1. > More details: > https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.11 > {noformat} > com.fasterxml.jackson.databind.JsonMappingException: Scala module 2.10.5 > requires Jackson Databind version >= 2.10.0 and < 2.11.0 > {noformat}
[jira] [Updated] (SPARK-33766) Upgrade Jackson to 2.11.4
[ https://issues.apache.org/jira/browse/SPARK-33766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-33766: Description: Upgrade Jackson to 2.11.4 to make it easy to upgrade Avro 1.10.1. More details: https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.11 {noformat} com.fasterxml.jackson.databind.JsonMappingException: Scala module 2.10.5 requires Jackson Databind version >= 2.10.0 and < 2.11.0 {noformat} was: Upgrade Jackson to 2.11.3 to make it easy to upgrade Avro 1.10.1. More details: https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.11 {noformat} com.fasterxml.jackson.databind.JsonMappingException: Scala module 2.10.5 requires Jackson Databind version >= 2.10.0 and < 2.11.0 {noformat} > Upgrade Jackson to 2.11.4 > - > > Key: SPARK-33766 > URL: https://issues.apache.org/jira/browse/SPARK-33766 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Priority: Major > > Upgrade Jackson to 2.11.4 to make it easy to upgrade Avro 1.10.1. > More details: > https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.11 > {noformat} > com.fasterxml.jackson.databind.JsonMappingException: Scala module 2.10.5 > requires Jackson Databind version >= 2.10.0 and < 2.11.0 > {noformat}
[jira] [Created] (SPARK-33766) Upgrade Jackson to 2.11.4
Yuming Wang created SPARK-33766: --- Summary: Upgrade Jackson to 2.11.4 Key: SPARK-33766 URL: https://issues.apache.org/jira/browse/SPARK-33766 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.2.0 Reporter: Yuming Wang Upgrade Jackson to 2.11.3 to make it easy to upgrade Avro 1.10.1. More details: https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.11 {noformat} com.fasterxml.jackson.databind.JsonMappingException: Scala module 2.10.5 requires Jackson Databind version >= 2.10.0 and < 2.11.0 {noformat}
[jira] [Commented] (SPARK-33678) Numerical product aggregation
[ https://issues.apache.org/jira/browse/SPARK-33678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248314#comment-17248314 ] Apache Spark commented on SPARK-33678: -- User 'rwpenney' has created a pull request for this issue: https://github.com/apache/spark/pull/30745 > Numerical product aggregation > - > > Key: SPARK-33678 > URL: https://issues.apache.org/jira/browse/SPARK-33678 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.7, 3.0.0, 3.1.0 >Reporter: Richard Penney >Priority: Minor > > There is currently no facility in {{spark.sql.functions}} to allow > computation of the product of all numbers in a grouping expression. Such a > facility would likely be useful when computing statistical quantities such as > the combined probability of a set of independent events, or in financial > applications when calculating a cumulative interest rate. > Although it is certainly possible to emulate this by an expression of the > form {{exp(sum(log(column)))}}, this has a number of significant drawbacks: > * It involves computationally costly functions (exp, log) > * It is more verbose than something like {{product(column)}} > * It is more prone to numerical inaccuracies when handling quantities that > are close to one than by directly multiplying a set of numbers > * It will not handle zeros or negative numbers cleanly > I am currently developing an addition to {{sql.functions}}, which involves [a > new Catalyst aggregation > expression|https://github.com/rwpenney/spark/blob/feature/agg-product/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Product.scala]. > This needs some additional testing, and I hope to issue a pull-request soon.
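The drawbacks of the {{exp(sum(log(column)))}} workaround listed in the issue description are easy to reproduce outside Spark. This illustrative plain-Python comparison (not Spark code) contrasts direct multiplication with the log-based emulation:

```python
import math
from functools import reduce

def product_direct(xs):
    """Multiply the values directly, as a product() aggregate would."""
    return reduce(lambda acc, x: acc * x, xs, 1.0)

def product_via_logs(xs):
    # The exp(sum(log(x))) emulation: it calls two transcendental
    # functions per value, accumulates rounding error for values near
    # one, and math.log raises for zero or negative inputs instead of
    # returning a zero or signed product.
    return math.exp(sum(math.log(x) for x in xs))
```

For [2.0, 3.0, 4.0] the direct product is exactly 24.0, while the log form only approximates it; for [2.0, 0.0, 4.0] the direct product is 0.0, but the log form raises a ValueError.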
[jira] [Updated] (SPARK-33678) Numerical product aggregation
[ https://issues.apache.org/jira/browse/SPARK-33678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Penney updated SPARK-33678: --- Flags: Patch Pull-request now issued (https://github.com/apache/spark/pull/30745) > Numerical product aggregation > - > > Key: SPARK-33678 > URL: https://issues.apache.org/jira/browse/SPARK-33678 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.7, 3.0.0, 3.1.0 >Reporter: Richard Penney >Priority: Minor > > There is currently no facility in {{spark.sql.functions}} to allow > computation of the product of all numbers in a grouping expression. Such a > facility would likely be useful when computing statistical quantities such as > the combined probability of a set of independent events, or in financial > applications when calculating a cumulative interest rate. > Although it is certainly possible to emulate this by an expression of the > form {{exp(sum(log(column)))}}, this has a number of significant drawbacks: > * It involves computationally costly functions (exp, log) > * It is more verbose than something like {{product(column)}} > * It is more prone to numerical inaccuracies when handling quantities that > are close to one than by directly multiplying a set of numbers > * It will not handle zeros or negative numbers cleanly > I am currently developing an addition to {{sql.functions}}, which involves [a > new Catalyst aggregation > expression|https://github.com/rwpenney/spark/blob/feature/agg-product/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Product.scala]. > This needs some additional testing, and I hope to issue a pull-request soon.
[jira] [Commented] (SPARK-33678) Numerical product aggregation
[ https://issues.apache.org/jira/browse/SPARK-33678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248312#comment-17248312 ] Apache Spark commented on SPARK-33678: -- User 'rwpenney' has created a pull request for this issue: https://github.com/apache/spark/pull/30745 > Numerical product aggregation > - > > Key: SPARK-33678 > URL: https://issues.apache.org/jira/browse/SPARK-33678 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.7, 3.0.0, 3.1.0 >Reporter: Richard Penney >Priority: Minor > > There is currently no facility in {{spark.sql.functions}} to allow > computation of the product of all numbers in a grouping expression. Such a > facility would likely be useful when computing statistical quantities such as > the combined probability of a set of independent events, or in financial > applications when calculating a cumulative interest rate. > Although it is certainly possible to emulate this by an expression of the > form {{exp(sum(log(column)))}}, this has a number of significant drawbacks: > * It involves computationally costly functions (exp, log) > * It is more verbose than something like {{product(column)}} > * It is more prone to numerical inaccuracies when handling quantities that > are close to one than by directly multiplying a set of numbers > * It will not handle zeros or negative numbers cleanly > I am currently developing an addition to {{sql.functions}}, which involves [a > new Catalyst aggregation > expression|https://github.com/rwpenney/spark/blob/feature/agg-product/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Product.scala]. > This needs some additional testing, and I hope to issue a pull-request soon.
[jira] [Assigned] (SPARK-33678) Numerical product aggregation
[ https://issues.apache.org/jira/browse/SPARK-33678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33678: Assignee: Apache Spark > Numerical product aggregation > - > > Key: SPARK-33678 > URL: https://issues.apache.org/jira/browse/SPARK-33678 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.7, 3.0.0, 3.1.0 >Reporter: Richard Penney >Assignee: Apache Spark >Priority: Minor > > There is currently no facility in {{spark.sql.functions}} to allow > computation of the product of all numbers in a grouping expression. Such a > facility would likely be useful when computing statistical quantities such as > the combined probability of a set of independent events, or in financial > applications when calculating a cumulative interest rate. > Although it is certainly possible to emulate this by an expression of the > form {{exp(sum(log(column)))}}, this has a number of significant drawbacks: > * It involves computationally costly functions (exp, log) > * It is more verbose than something like {{product(column)}} > * It is more prone to numerical inaccuracies when handling quantities that > are close to one than by directly multiplying a set of numbers > * It will not handle zeros or negative numbers cleanly > I am currently developing an addition to {{sql.functions}}, which involves [a > new Catalyst aggregation > expression|https://github.com/rwpenney/spark/blob/feature/agg-product/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Product.scala]. > This needs some additional testing, and I hope to issue a pull-request soon.
[jira] [Assigned] (SPARK-33678) Numerical product aggregation
[ https://issues.apache.org/jira/browse/SPARK-33678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33678: Assignee: (was: Apache Spark) > Numerical product aggregation > - > > Key: SPARK-33678 > URL: https://issues.apache.org/jira/browse/SPARK-33678 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.7, 3.0.0, 3.1.0 >Reporter: Richard Penney >Priority: Minor > > There is currently no facility in {{spark.sql.functions}} to allow > computation of the product of all numbers in a grouping expression. Such a > facility would likely be useful when computing statistical quantities such as > the combined probability of a set of independent events, or in financial > applications when calculating a cumulative interest rate. > Although it is certainly possible to emulate this by an expression of the > form {{exp(sum(log(column)))}}, this has a number of significant drawbacks: > * It involves computationally costly functions (exp, log) > * It is more verbose than something like {{product(column)}} > * It is more prone to numerical inaccuracies when handling quantities that > are close to one than by directly multiplying a set of numbers > * It will not handle zeros or negative numbers cleanly > I am currently developing an addition to {{sql.functions}}, which involves [a > new Catalyst aggregation > expression|https://github.com/rwpenney/spark/blob/feature/agg-product/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Product.scala]. > This needs some additional testing, and I hope to issue a pull-request soon.
[jira] [Commented] (SPARK-33589) Close opened session if the initialization fails
[ https://issues.apache.org/jira/browse/SPARK-33589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248310#comment-17248310 ] Apache Spark commented on SPARK-33589: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/30744 > Close opened session if the initialization fails > > > Key: SPARK-33589 > URL: https://issues.apache.org/jira/browse/SPARK-33589 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 3.1.0 > >
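The fix named in this issue's title follows a general resource-safety pattern: if initialization throws after a session object has been created, close the session before propagating the error so its resources do not leak. A minimal sketch of that pattern, with illustrative names rather than Spark's actual classes:

```python
def open_session(session_factory):
    """Open and initialize a session; close it if initialization fails."""
    session = session_factory()
    try:
        session.initialize()
    except Exception:
        # Without this, a half-initialized session would leak its
        # underlying resources every time initialize() raises.
        session.close()
        raise
    return session
```

On success the caller receives a fully initialized session; on failure the original exception is re-raised, but only after the session has been closed.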
[jira] [Commented] (SPARK-33589) Close opened session if the initialization fails
[ https://issues.apache.org/jira/browse/SPARK-33589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248311#comment-17248311 ] Apache Spark commented on SPARK-33589: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/30744 > Close opened session if the initialization fails > > > Key: SPARK-33589 > URL: https://issues.apache.org/jira/browse/SPARK-33589 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 3.1.0 > >