[jira] [Commented] (SPARK-26345) Parquet support Column indexes

2020-12-12 Thread Yuming Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-26345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248526#comment-17248526
 ] 

Yuming Wang commented on SPARK-26345:
-

[~jamestaylor] Please see [https://github.com/apache/spark/pull/30517] for more 
details.

> Parquet support Column indexes
> --
>
> Key: SPARK-26345
> URL: https://issues.apache.org/jira/browse/SPARK-26345
> Project: Spark
>  Issue Type: Umbrella
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Priority: Major
>
> Parquet 1.11.0 supports column indexing. Spark can support this feature for 
> better read performance.
> More details:
> https://issues.apache.org/jira/browse/PARQUET-1201
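
For illustration, a minimal way to try the reader-side feature (a sketch, assuming 
an active SparkSession {{spark}} and a Spark build on Parquet 1.11.x; the 
"parquet.filter.columnindex.enabled" key is the same one toggled by the benchmark 
later in this thread, and the path is just a placeholder):

{code:scala}
// Write a sorted table so the per-page min/max column indexes are selective.
val path = "/tmp/parquet_column_index_demo"  // illustrative path
spark.range(0, 1000000).toDF("value").sort("value")
  .write.mode("overwrite").parquet(path)

// File-source read options are merged into the Hadoop conf, so the Parquet
// reader sees the column-index switch.
spark.read
  .option("parquet.filter.columnindex.enabled", "true")
  .parquet(path)
  .where("value = 123456")
  .count()
{code}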






[jira] [Commented] (SPARK-32617) Upgrade kubernetes client version to support latest minikube version.

2020-12-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248523#comment-17248523
 ] 

Apache Spark commented on SPARK-32617:
--

User 'attilapiros' has created a pull request for this issue:
https://github.com/apache/spark/pull/30751

> Upgrade kubernetes client version to support latest minikube version.
> -
>
> Key: SPARK-32617
> URL: https://issues.apache.org/jira/browse/SPARK-32617
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Prashant Sharma
>Priority: Major
>
> The following error occurs when the k8s integration tests are run against a 
> minikube cluster at version 1.2.1:
> {code:java}
> Run starting. Expected test count is: 18
> KubernetesSuite:
> org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite *** ABORTED ***
>   io.fabric8.kubernetes.client.KubernetesClientException: An error has 
> occurred.
>   at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
>   at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:53)
>   at 
> io.fabric8.kubernetes.client.utils.HttpClientUtils.createHttpClient(HttpClientUtils.java:196)
>   at 
> io.fabric8.kubernetes.client.utils.HttpClientUtils.createHttpClient(HttpClientUtils.java:62)
>   at io.fabric8.kubernetes.client.BaseClient.<init>(BaseClient.java:51)
>   at 
> io.fabric8.kubernetes.client.DefaultKubernetesClient.<init>(DefaultKubernetesClient.java:105)
>   at 
> org.apache.spark.deploy.k8s.integrationtest.backend.minikube.Minikube$.getKubernetesClient(Minikube.scala:81)
>   at 
> org.apache.spark.deploy.k8s.integrationtest.backend.minikube.MinikubeTestBackend$.initialize(MinikubeTestBackend.scala:33)
>   at 
> org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.beforeAll(KubernetesSuite.scala:131)
>   at 
> org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
>   ...
>   Cause: java.nio.file.NoSuchFileException: /root/.minikube/apiserver.crt
>   at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>   at 
> sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
>   at java.nio.file.Files.newByteChannel(Files.java:361)
>   at java.nio.file.Files.newByteChannel(Files.java:407)
>   at java.nio.file.Files.readAllBytes(Files.java:3152)
>   at 
> io.fabric8.kubernetes.client.internal.CertUtils.getInputStreamFromDataOrFile(CertUtils.java:72)
>   at 
> io.fabric8.kubernetes.client.internal.CertUtils.createKeyStore(CertUtils.java:242)
>   at 
> io.fabric8.kubernetes.client.internal.SSLUtils.keyManagers(SSLUtils.java:128)
>   ...
> Run completed in 1 second, 821 milliseconds.
> Total number of tests run: 0
> Suites: completed 1, aborted 1
> Tests: succeeded 0, failed 0, canceled 0, ignored 0, pending 0
> *** 1 SUITE ABORTED ***
> [INFO] 
> 
> [INFO] Reactor Summary for Spark Project Parent POM 3.1.0-SNAPSHOT:
> [INFO] 
> [INFO] Spark Project Parent POM ... SUCCESS [  4.454 
> s]
> [INFO] Spark Project Tags . SUCCESS [  4.768 
> s]
> [INFO] Spark Project Local DB . SUCCESS [  2.961 
> s]
> [INFO] Spark Project Networking ... SUCCESS [  4.258 
> s]
> [INFO] Spark Project Shuffle Streaming Service  SUCCESS [  5.703 
> s]
> [INFO] Spark Project Unsafe ... SUCCESS [  3.239 
> s]
> [INFO] Spark Project Launcher . SUCCESS [  3.224 
> s]
> [INFO] Spark Project Core . SUCCESS [02:25 
> min]
> [INFO] Spark Project Kubernetes Integration Tests . FAILURE [ 17.244 
> s]
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time:  03:12 min
> [INFO] Finished at: 2020-08-11T06:26:15-05:00
> [INFO] 
> 
> [ERROR] Failed to execute goal 
> org.scalatest:scalatest-maven-plugin:2.0.0:test (integration-test) on project 
> spark-kubernetes-integration-tests_2.12: There are test failures -> [Help 1]
> [ERROR] 
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR] 
> [ERROR] For more information about the errors and 
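
(The root cause above is that older fabric8 clients look for certs at the legacy 
minikube locations such as ~/.minikube/apiserver.crt, while newer minikube keeps 
client certs under ~/.minikube/profiles/<profile>/. Until the client upgrade 
lands, a sketch of pointing fabric8 at the new locations explicitly; the master 
URL and profile name are illustrative assumptions, not values from this issue:)

{code:scala}
import io.fabric8.kubernetes.client.{ConfigBuilder, DefaultKubernetesClient}

val home = sys.props("user.home")
val config = new ConfigBuilder()
  .withMasterUrl("https://192.168.49.2:8443")  // placeholder; see `kubectl config view`
  .withCaCertFile(s"$home/.minikube/ca.crt")
  .withClientCertFile(s"$home/.minikube/profiles/minikube/client.crt")
  .withClientKeyFile(s"$home/.minikube/profiles/minikube/client.key")
  .build()
val client = new DefaultKubernetesClient(config)
{code}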

[jira] [Assigned] (SPARK-32617) Upgrade kubernetes client version to support latest minikube version.

2020-12-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32617:


Assignee: (was: Apache Spark)

> Upgrade kubernetes client version to support latest minikube version.
> -
>
> Key: SPARK-32617
> URL: https://issues.apache.org/jira/browse/SPARK-32617
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Prashant Sharma
>Priority: Major
>
> The following error occurs when the k8s integration tests are run against a 
> minikube cluster at version 1.2.1:
> {code:java}
> Run starting. Expected test count is: 18
> KubernetesSuite:
> org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite *** ABORTED ***
>   io.fabric8.kubernetes.client.KubernetesClientException: An error has 
> occurred.
>   at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
>   at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:53)
>   at 
> io.fabric8.kubernetes.client.utils.HttpClientUtils.createHttpClient(HttpClientUtils.java:196)
>   at 
> io.fabric8.kubernetes.client.utils.HttpClientUtils.createHttpClient(HttpClientUtils.java:62)
>   at io.fabric8.kubernetes.client.BaseClient.<init>(BaseClient.java:51)
>   at 
> io.fabric8.kubernetes.client.DefaultKubernetesClient.<init>(DefaultKubernetesClient.java:105)
>   at 
> org.apache.spark.deploy.k8s.integrationtest.backend.minikube.Minikube$.getKubernetesClient(Minikube.scala:81)
>   at 
> org.apache.spark.deploy.k8s.integrationtest.backend.minikube.MinikubeTestBackend$.initialize(MinikubeTestBackend.scala:33)
>   at 
> org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.beforeAll(KubernetesSuite.scala:131)
>   at 
> org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
>   ...
>   Cause: java.nio.file.NoSuchFileException: /root/.minikube/apiserver.crt
>   at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>   at 
> sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
>   at java.nio.file.Files.newByteChannel(Files.java:361)
>   at java.nio.file.Files.newByteChannel(Files.java:407)
>   at java.nio.file.Files.readAllBytes(Files.java:3152)
>   at 
> io.fabric8.kubernetes.client.internal.CertUtils.getInputStreamFromDataOrFile(CertUtils.java:72)
>   at 
> io.fabric8.kubernetes.client.internal.CertUtils.createKeyStore(CertUtils.java:242)
>   at 
> io.fabric8.kubernetes.client.internal.SSLUtils.keyManagers(SSLUtils.java:128)
>   ...
> Run completed in 1 second, 821 milliseconds.
> Total number of tests run: 0
> Suites: completed 1, aborted 1
> Tests: succeeded 0, failed 0, canceled 0, ignored 0, pending 0
> *** 1 SUITE ABORTED ***
> [INFO] 
> 
> [INFO] Reactor Summary for Spark Project Parent POM 3.1.0-SNAPSHOT:
> [INFO] 
> [INFO] Spark Project Parent POM ... SUCCESS [  4.454 
> s]
> [INFO] Spark Project Tags . SUCCESS [  4.768 
> s]
> [INFO] Spark Project Local DB . SUCCESS [  2.961 
> s]
> [INFO] Spark Project Networking ... SUCCESS [  4.258 
> s]
> [INFO] Spark Project Shuffle Streaming Service  SUCCESS [  5.703 
> s]
> [INFO] Spark Project Unsafe ... SUCCESS [  3.239 
> s]
> [INFO] Spark Project Launcher . SUCCESS [  3.224 
> s]
> [INFO] Spark Project Core . SUCCESS [02:25 
> min]
> [INFO] Spark Project Kubernetes Integration Tests . FAILURE [ 17.244 
> s]
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time:  03:12 min
> [INFO] Finished at: 2020-08-11T06:26:15-05:00
> [INFO] 
> 
> [ERROR] Failed to execute goal 
> org.scalatest:scalatest-maven-plugin:2.0.0:test (integration-test) on project 
> spark-kubernetes-integration-tests_2.12: There are test failures -> [Help 1]
> [ERROR] 
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR] 
> [ERROR] For more information about the errors and possible solutions, please 
> read the following articles:
> [ERROR] [Help 1] 
> 

[jira] [Commented] (SPARK-32617) Upgrade kubernetes client version to support latest minikube version.

2020-12-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248522#comment-17248522
 ] 

Apache Spark commented on SPARK-32617:
--

User 'attilapiros' has created a pull request for this issue:
https://github.com/apache/spark/pull/30751

> Upgrade kubernetes client version to support latest minikube version.
> -
>
> Key: SPARK-32617
> URL: https://issues.apache.org/jira/browse/SPARK-32617
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Prashant Sharma
>Priority: Major
>
> The following error occurs when the k8s integration tests are run against a 
> minikube cluster at version 1.2.1:
> {code:java}
> Run starting. Expected test count is: 18
> KubernetesSuite:
> org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite *** ABORTED ***
>   io.fabric8.kubernetes.client.KubernetesClientException: An error has 
> occurred.
>   at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
>   at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:53)
>   at 
> io.fabric8.kubernetes.client.utils.HttpClientUtils.createHttpClient(HttpClientUtils.java:196)
>   at 
> io.fabric8.kubernetes.client.utils.HttpClientUtils.createHttpClient(HttpClientUtils.java:62)
>   at io.fabric8.kubernetes.client.BaseClient.<init>(BaseClient.java:51)
>   at 
> io.fabric8.kubernetes.client.DefaultKubernetesClient.<init>(DefaultKubernetesClient.java:105)
>   at 
> org.apache.spark.deploy.k8s.integrationtest.backend.minikube.Minikube$.getKubernetesClient(Minikube.scala:81)
>   at 
> org.apache.spark.deploy.k8s.integrationtest.backend.minikube.MinikubeTestBackend$.initialize(MinikubeTestBackend.scala:33)
>   at 
> org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.beforeAll(KubernetesSuite.scala:131)
>   at 
> org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
>   ...
>   Cause: java.nio.file.NoSuchFileException: /root/.minikube/apiserver.crt
>   at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>   at 
> sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
>   at java.nio.file.Files.newByteChannel(Files.java:361)
>   at java.nio.file.Files.newByteChannel(Files.java:407)
>   at java.nio.file.Files.readAllBytes(Files.java:3152)
>   at 
> io.fabric8.kubernetes.client.internal.CertUtils.getInputStreamFromDataOrFile(CertUtils.java:72)
>   at 
> io.fabric8.kubernetes.client.internal.CertUtils.createKeyStore(CertUtils.java:242)
>   at 
> io.fabric8.kubernetes.client.internal.SSLUtils.keyManagers(SSLUtils.java:128)
>   ...
> Run completed in 1 second, 821 milliseconds.
> Total number of tests run: 0
> Suites: completed 1, aborted 1
> Tests: succeeded 0, failed 0, canceled 0, ignored 0, pending 0
> *** 1 SUITE ABORTED ***
> [INFO] 
> 
> [INFO] Reactor Summary for Spark Project Parent POM 3.1.0-SNAPSHOT:
> [INFO] 
> [INFO] Spark Project Parent POM ... SUCCESS [  4.454 
> s]
> [INFO] Spark Project Tags . SUCCESS [  4.768 
> s]
> [INFO] Spark Project Local DB . SUCCESS [  2.961 
> s]
> [INFO] Spark Project Networking ... SUCCESS [  4.258 
> s]
> [INFO] Spark Project Shuffle Streaming Service  SUCCESS [  5.703 
> s]
> [INFO] Spark Project Unsafe ... SUCCESS [  3.239 
> s]
> [INFO] Spark Project Launcher . SUCCESS [  3.224 
> s]
> [INFO] Spark Project Core . SUCCESS [02:25 
> min]
> [INFO] Spark Project Kubernetes Integration Tests . FAILURE [ 17.244 
> s]
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time:  03:12 min
> [INFO] Finished at: 2020-08-11T06:26:15-05:00
> [INFO] 
> 
> [ERROR] Failed to execute goal 
> org.scalatest:scalatest-maven-plugin:2.0.0:test (integration-test) on project 
> spark-kubernetes-integration-tests_2.12: There are test failures -> [Help 1]
> [ERROR] 
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR] 
> [ERROR] For more information about the errors and 

[jira] [Assigned] (SPARK-32617) Upgrade kubernetes client version to support latest minikube version.

2020-12-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32617:


Assignee: Apache Spark

> Upgrade kubernetes client version to support latest minikube version.
> -
>
> Key: SPARK-32617
> URL: https://issues.apache.org/jira/browse/SPARK-32617
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Prashant Sharma
>Assignee: Apache Spark
>Priority: Major
>
> The following error occurs when the k8s integration tests are run against a 
> minikube cluster at version 1.2.1:
> {code:java}
> Run starting. Expected test count is: 18
> KubernetesSuite:
> org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite *** ABORTED ***
>   io.fabric8.kubernetes.client.KubernetesClientException: An error has 
> occurred.
>   at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
>   at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:53)
>   at 
> io.fabric8.kubernetes.client.utils.HttpClientUtils.createHttpClient(HttpClientUtils.java:196)
>   at 
> io.fabric8.kubernetes.client.utils.HttpClientUtils.createHttpClient(HttpClientUtils.java:62)
>   at io.fabric8.kubernetes.client.BaseClient.<init>(BaseClient.java:51)
>   at 
> io.fabric8.kubernetes.client.DefaultKubernetesClient.<init>(DefaultKubernetesClient.java:105)
>   at 
> org.apache.spark.deploy.k8s.integrationtest.backend.minikube.Minikube$.getKubernetesClient(Minikube.scala:81)
>   at 
> org.apache.spark.deploy.k8s.integrationtest.backend.minikube.MinikubeTestBackend$.initialize(MinikubeTestBackend.scala:33)
>   at 
> org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.beforeAll(KubernetesSuite.scala:131)
>   at 
> org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
>   ...
>   Cause: java.nio.file.NoSuchFileException: /root/.minikube/apiserver.crt
>   at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>   at 
> sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
>   at java.nio.file.Files.newByteChannel(Files.java:361)
>   at java.nio.file.Files.newByteChannel(Files.java:407)
>   at java.nio.file.Files.readAllBytes(Files.java:3152)
>   at 
> io.fabric8.kubernetes.client.internal.CertUtils.getInputStreamFromDataOrFile(CertUtils.java:72)
>   at 
> io.fabric8.kubernetes.client.internal.CertUtils.createKeyStore(CertUtils.java:242)
>   at 
> io.fabric8.kubernetes.client.internal.SSLUtils.keyManagers(SSLUtils.java:128)
>   ...
> Run completed in 1 second, 821 milliseconds.
> Total number of tests run: 0
> Suites: completed 1, aborted 1
> Tests: succeeded 0, failed 0, canceled 0, ignored 0, pending 0
> *** 1 SUITE ABORTED ***
> [INFO] 
> 
> [INFO] Reactor Summary for Spark Project Parent POM 3.1.0-SNAPSHOT:
> [INFO] 
> [INFO] Spark Project Parent POM ... SUCCESS [  4.454 
> s]
> [INFO] Spark Project Tags . SUCCESS [  4.768 
> s]
> [INFO] Spark Project Local DB . SUCCESS [  2.961 
> s]
> [INFO] Spark Project Networking ... SUCCESS [  4.258 
> s]
> [INFO] Spark Project Shuffle Streaming Service  SUCCESS [  5.703 
> s]
> [INFO] Spark Project Unsafe ... SUCCESS [  3.239 
> s]
> [INFO] Spark Project Launcher . SUCCESS [  3.224 
> s]
> [INFO] Spark Project Core . SUCCESS [02:25 
> min]
> [INFO] Spark Project Kubernetes Integration Tests . FAILURE [ 17.244 
> s]
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time:  03:12 min
> [INFO] Finished at: 2020-08-11T06:26:15-05:00
> [INFO] 
> 
> [ERROR] Failed to execute goal 
> org.scalatest:scalatest-maven-plugin:2.0.0:test (integration-test) on project 
> spark-kubernetes-integration-tests_2.12: There are test failures -> [Help 1]
> [ERROR] 
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR] 
> [ERROR] For more information about the errors and possible solutions, please 
> read the following articles:
> [ERROR] [Help 1] 
> 

[jira] [Updated] (SPARK-33705) Three cases always fail in hive-thriftserver module

2020-12-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-33705:
-
Priority: Critical  (was: Major)

> Three cases always fail in hive-thriftserver module
> ---
>
> Key: SPARK-33705
> URL: https://issues.apache.org/jira/browse/SPARK-33705
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.2.0
>Reporter: Yang Jie
>Priority: Critical
>
> It seems the following tests always fail both in Jenkins and in GitHub Actions:
>  * org.apache.spark.sql.hive.thriftserver.HiveThriftHttpServerSuite.JDBC 
> query execution
>  * org.apache.spark.sql.hive.thriftserver.HiveThriftHttpServerSuite.Checks 
> Hive version
>  * 
> org.apache.spark.sql.hive.thriftserver.HiveThriftHttpServerSuite.SPARK-24829 
> Checks cast as float
> [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132189/testReport/]
> [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132403/testReport/]
>  
>  






[jira] [Commented] (SPARK-32447) Use python3 by default in pyspark and find-spark-home scripts

2020-12-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248489#comment-17248489
 ] 

Apache Spark commented on SPARK-32447:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/30750

> Use python3 by default in pyspark and find-spark-home scripts
> -
>
> Key: SPARK-32447
> URL: https://issues.apache.org/jira/browse/SPARK-32447
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 3.1.0
>
>
> This script depends on `find_spark_home.py`, which has already been migrated 
> to `python3` by using `#!/usr/bin/env python3`.
> {code}
> FIND_SPARK_HOME_PYTHON_SCRIPT="$(cd "$(dirname "$0")"; 
> pwd)/find_spark_home.py"
> {code}






[jira] [Commented] (SPARK-32447) Use python3 by default in pyspark and find-spark-home scripts

2020-12-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248488#comment-17248488
 ] 

Apache Spark commented on SPARK-32447:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/30750

> Use python3 by default in pyspark and find-spark-home scripts
> -
>
> Key: SPARK-32447
> URL: https://issues.apache.org/jira/browse/SPARK-32447
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 3.1.0
>
>
> This script depends on `find_spark_home.py`, which has already been migrated 
> to `python3` by using `#!/usr/bin/env python3`.
> {code}
> FIND_SPARK_HOME_PYTHON_SCRIPT="$(cd "$(dirname "$0")"; 
> pwd)/find_spark_home.py"
> {code}






[jira] [Commented] (SPARK-33768) Remove unused parameter `retainData` from AlterTableDropPartition

2020-12-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248446#comment-17248446
 ] 

Apache Spark commented on SPARK-33768:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/30748

> Remove unused parameter `retainData` from AlterTableDropPartition
> -
>
> Key: SPARK-33768
> URL: https://issues.apache.org/jira/browse/SPARK-33768
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Priority: Major
>
> The parameter is hard-coded to false during parsing in AstBuilder, so it can 
> be removed from the logical node.
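
A minimal sketch of the resulting node shape (field names here are illustrative 
assumptions, not the exact catalyst definition):

{code:scala}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

// After the cleanup the flag is simply gone; before it, AstBuilder always
// constructed the node with retainData = false, so no caller could ever
// observe a true value.
case class AlterTableDropPartition(
    table: LogicalPlan,
    partitionSpecs: Seq[Map[String, String]],
    ifExists: Boolean,
    purge: Boolean)
{code}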






[jira] [Assigned] (SPARK-33768) Remove unused parameter `retainData` from AlterTableDropPartition

2020-12-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33768:


Assignee: (was: Apache Spark)

> Remove unused parameter `retainData` from AlterTableDropPartition
> -
>
> Key: SPARK-33768
> URL: https://issues.apache.org/jira/browse/SPARK-33768
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Priority: Major
>
> The parameter is hard-coded to false during parsing in AstBuilder, so it can 
> be removed from the logical node.






[jira] [Assigned] (SPARK-33768) Remove unused parameter `retainData` from AlterTableDropPartition

2020-12-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33768:


Assignee: Apache Spark

> Remove unused parameter `retainData` from AlterTableDropPartition
> -
>
> Key: SPARK-33768
> URL: https://issues.apache.org/jira/browse/SPARK-33768
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Assignee: Apache Spark
>Priority: Major
>
> The parameter is hard-coded to false during parsing in AstBuilder, so it can 
> be removed from the logical node.






[jira] [Commented] (SPARK-33768) Remove unused parameter `retainData` from AlterTableDropPartition

2020-12-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248445#comment-17248445
 ] 

Apache Spark commented on SPARK-33768:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/30748

> Remove unused parameter `retainData` from AlterTableDropPartition
> -
>
> Key: SPARK-33768
> URL: https://issues.apache.org/jira/browse/SPARK-33768
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Priority: Major
>
> The parameter is hard-coded to false during parsing in AstBuilder, so it can 
> be removed from the logical node.






[jira] [Created] (SPARK-33768) Remove unused parameter `retainData` from AlterTableDropPartition

2020-12-12 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-33768:
--

 Summary: Remove unused parameter `retainData` from 
AlterTableDropPartition
 Key: SPARK-33768
 URL: https://issues.apache.org/jira/browse/SPARK-33768
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Maxim Gekk


The parameter is hard-coded to false during parsing in AstBuilder, so it can be 
removed from the logical node.






[jira] [Commented] (SPARK-33744) Canonicalization error in SortAggregate

2020-12-12 Thread Pablo Langa Blanco (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248417#comment-17248417
 ] 

Pablo Langa Blanco commented on SPARK-33744:


I'm taking a look at it, thanks for reporting.

 

> Canonicalization error in SortAggregate
> ---
>
> Key: SPARK-33744
> URL: https://issues.apache.org/jira/browse/SPARK-33744
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Andy Grove
>Priority: Minor
>
> The canonicalized plan for a simple aggregate query is different on each run 
> for SortAggregate, but not for HashAggregate.
> The issue can be demonstrated by adding the following unit tests to 
> SQLQuerySuite. The HashAggregate test passes and the SortAggregate test fails.
> The first test has numeric input; the second operates on strings, which 
> forces the use of SortAggregate rather than HashAggregate.
> {code:java}
> test("HashAggregate canonicalization") {
>   val data = Seq((1, 1)).toDF("c0", "c1")
>   val df1 = data.groupBy(col("c0")).agg(first("c1"))
>   val df2 = data.groupBy(col("c0")).agg(first("c1"))
>   assert(df1.queryExecution.executedPlan.canonicalized ==
>   df2.queryExecution.executedPlan.canonicalized)
> }
> test("SortAggregate canonicalization") {
>   val data = Seq(("a", "a")).toDF("c0", "c1")
>   val df1 = data.groupBy(col("c0")).agg(first("c1"))
>   val df2 = data.groupBy(col("c0")).agg(first("c1"))
>   assert(df1.queryExecution.executedPlan.canonicalized ==
>   df2.queryExecution.executedPlan.canonicalized)
> } {code}
> The SortAggregate test fails with the following output.
> {code:java}
> SortAggregate(key=[none#0], functions=[first(none#0, false)], output=[none#0, 
> #1])
> +- *(2) Sort [none#0 ASC NULLS FIRST], false, 0
>+- Exchange hashpartitioning(none#0, 5), ENSURE_REQUIREMENTS, [id=#105]
>   +- SortAggregate(key=[none#0], functions=[partial_first(none#1, 
> false)], output=[none#0, none#2, none#3])
>  +- *(1) Sort [none#0 ASC NULLS FIRST], false, 0
> +- *(1) Project [none#0 AS #0, none#1 AS #1]
>+- *(1) LocalTableScan [none#0, none#1]
>  did not equal 
> SortAggregate(key=[none#0], functions=[first(none#0, false)], output=[none#0, 
> #1])
> +- *(2) Sort [none#0 ASC NULLS FIRST], false, 0
>+- Exchange hashpartitioning(none#0, 5), ENSURE_REQUIREMENTS, [id=#148]
>   +- SortAggregate(key=[none#0], functions=[partial_first(none#1, 
> false)], output=[none#0, none#2, none#3])
>  +- *(1) Sort [none#0 ASC NULLS FIRST], false, 0
> +- *(1) Project [none#0 AS #0, none#1 AS #1]
>+- *(1) LocalTableScan [none#0, none#1] {code}
> The error is caused by the resultExpression for the aggregate function being 
> assigned a new ExprId in the final aggregate.






[jira] [Commented] (SPARK-26345) Parquet support Column indexes

2020-12-12 Thread James R. Taylor (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-26345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248412#comment-17248412
 ] 

James R. Taylor commented on SPARK-26345:
-

Those results are excellent, [~yumwang]. I thought, based on earlier comments, 
that vectorized reads in Spark weren't compatible with column indexing. Do the 
child JIRAs here fix that?

> Parquet support Column indexes
> --
>
> Key: SPARK-26345
> URL: https://issues.apache.org/jira/browse/SPARK-26345
> Project: Spark
>  Issue Type: Umbrella
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Priority: Major
>
> Parquet 1.11.0 supports column indexing. Spark can support this feature for 
> better read performance.
> More details:
> https://issues.apache.org/jira/browse/PARQUET-1201






[jira] [Commented] (SPARK-33767) Unify v1 and v2 ALTER TABLE .. DROP PARTITION tests

2020-12-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248403#comment-17248403
 ] 

Apache Spark commented on SPARK-33767:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/30747

> Unify v1 and v2 ALTER TABLE .. DROP PARTITION tests
> ---
>
> Key: SPARK-33767
> URL: https://issues.apache.org/jira/browse/SPARK-33767
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.1.0
>
>
> Extract ALTER TABLE .. DROP PARTITION tests to a common place so they run 
> against both V1 and V2 datasources. Some tests can be placed in V1- and 
> V2-specific test suites.
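
A minimal sketch of the unification pattern (suite and member names here are 
illustrative assumptions, not the exact names used in the PR): the shared checks 
live in one base trait, and thin datasource-specific suites only pin down the 
catalog they run against.

{code:scala}
import org.apache.spark.sql.QueryTest
import org.apache.spark.sql.test.SQLTestUtils

trait AlterTableDropPartitionSuiteBase extends QueryTest with SQLTestUtils {
  // "" for the v1 session catalog, "test_cat.ns." for a v2 catalog.
  protected def catalogAndNamespace: String

  test("drop an existing partition") {
    withTable(s"${catalogAndNamespace}tbl") {
      sql(s"CREATE TABLE ${catalogAndNamespace}tbl (id INT, p INT) " +
        "USING parquet PARTITIONED BY (p)")
      sql(s"ALTER TABLE ${catalogAndNamespace}tbl ADD PARTITION (p = 1)")
      sql(s"ALTER TABLE ${catalogAndNamespace}tbl DROP PARTITION (p = 1)")
    }
  }
}
{code}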






[jira] [Assigned] (SPARK-33767) Unify v1 and v2 ALTER TABLE .. DROP PARTITION tests

2020-12-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33767:


Assignee: Maxim Gekk  (was: Apache Spark)

> Unify v1 and v2 ALTER TABLE .. DROP PARTITION tests
> ---
>
> Key: SPARK-33767
> URL: https://issues.apache.org/jira/browse/SPARK-33767
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.1.0
>
>
> Extract ALTER TABLE .. DROP PARTITION tests to a common place so they run 
> against both V1 and V2 datasources. Some tests can be placed in V1- and 
> V2-specific test suites.






[jira] [Commented] (SPARK-33767) Unify v1 and v2 ALTER TABLE .. DROP PARTITION tests

2020-12-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248401#comment-17248401
 ] 

Apache Spark commented on SPARK-33767:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/30747

> Unify v1 and v2 ALTER TABLE .. DROP PARTITION tests
> ---
>
> Key: SPARK-33767
> URL: https://issues.apache.org/jira/browse/SPARK-33767
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.1.0
>
>
> Extract ALTER TABLE .. DROP PARTITION tests to a common place so they run 
> against both V1 and V2 datasources. Some tests can be placed in V1- and 
> V2-specific test suites.






[jira] [Assigned] (SPARK-33767) Unify v1 and v2 ALTER TABLE .. DROP PARTITION tests

2020-12-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33767:


Assignee: Apache Spark  (was: Maxim Gekk)

> Unify v1 and v2 ALTER TABLE .. DROP PARTITION tests
> ---
>
> Key: SPARK-33767
> URL: https://issues.apache.org/jira/browse/SPARK-33767
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Assignee: Apache Spark
>Priority: Major
> Fix For: 3.1.0
>
>
> Extract ALTER TABLE .. DROP PARTITION tests to a common place so they run 
> against both V1 and V2 datasources. Some tests can be placed in V1- and 
> V2-specific test suites.






[jira] [Updated] (SPARK-33767) Unify v1 and v2 ALTER TABLE .. DROP PARTITION tests

2020-12-12 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-33767:
---
Description: Extract ALTER TABLE .. DROP PARTITION tests to a common place 
so they run against both V1 and V2 datasources. Some tests can be placed in V1- 
and V2-specific test suites.  (was: Extract ALTER TABLE .. ADD PARTITION tests 
to a common place so they run against both V1 and V2 datasources. Some tests 
can be placed in V1- and V2-specific test suites.)

> Unify v1 and v2 ALTER TABLE .. DROP PARTITION tests
> ---
>
> Key: SPARK-33767
> URL: https://issues.apache.org/jira/browse/SPARK-33767
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.1.0
>
>
> Extract ALTER TABLE .. DROP PARTITION tests to a common place so they run 
> against both V1 and V2 datasources. Some tests can be placed in V1- and 
> V2-specific test suites.






[jira] [Created] (SPARK-33767) Unify v1 and v2 ALTER TABLE .. DROP PARTITION tests

2020-12-12 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-33767:
--

 Summary: Unify v1 and v2 ALTER TABLE .. DROP PARTITION tests
 Key: SPARK-33767
 URL: https://issues.apache.org/jira/browse/SPARK-33767
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.1.0
Reporter: Maxim Gekk
Assignee: Maxim Gekk
 Fix For: 3.1.0


Extract ALTER TABLE .. ADD PARTITION tests to a common place so they run against 
both V1 and V2 datasources. Some tests can be placed in V1- and V2-specific test 
suites.






[jira] [Commented] (SPARK-26345) Parquet support Column indexes

2020-12-12 Thread Yuming Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-26345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248374#comment-17248374
 ] 

Yuming Wang commented on SPARK-26345:
-

Benchmark and benchmark results:

{code:scala}
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.sql.execution.benchmark

import java.io.File

import scala.util.Random

import org.apache.spark.SparkConf
import org.apache.spark.benchmark.Benchmark
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{monotonically_increasing_id, timestamp_seconds}
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.internal.SQLConf.ParquetOutputTimestampType
import org.apache.spark.sql.types.{ByteType, Decimal, DecimalType}

/**
 * Benchmark to measure read performance with Parquet column index.
 * To run this benchmark:
 * {{{
 *   1. without sbt: bin/spark-submit --class <this class> <spark sql test jar>
 *   2. build/sbt "sql/test:runMain <this class>"
 *   3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
 *      Results will be written to "benchmarks/ParquetFilterPushdownBenchmark-results.txt".
 * }}}
 */
object ParquetFilterPushdownBenchmark extends SqlBasedBenchmark {

  override def getSparkSession: SparkSession = {
    val conf = new SparkConf()
      .setAppName(this.getClass.getSimpleName)
      // Since `spark.master` always exists, overrides this value
      .set("spark.master", "local[1]")
      .setIfMissing("spark.driver.memory", "3g")
      .setIfMissing("spark.executor.memory", "3g")
      .setIfMissing("orc.compression", "snappy")
      .setIfMissing("spark.sql.parquet.compression.codec", "snappy")

    SparkSession.builder().config(conf).getOrCreate()
  }

  private val numRows = 1024 * 1024 * 15
  private val width = 5
  private val mid = numRows / 2

  def withTempTable(tableNames: String*)(f: => Unit): Unit = {
    try f finally tableNames.foreach(spark.catalog.dropTempView)
  }

  private def prepareTable(
      dir: File, numRows: Int, width: Int, useStringForValue: Boolean): Unit = {
    import spark.implicits._
    val selectExpr = (1 to width).map(i => s"CAST(value AS STRING) c$i")
    val valueCol = if (useStringForValue) {
      monotonically_increasing_id().cast("string")
    } else {
      monotonically_increasing_id()
    }
    val df = spark.range(numRows).map(_ => Random.nextLong).selectExpr(selectExpr: _*)
      .withColumn("value", valueCol)
      .sort("value")

    saveAsTable(df, dir)
  }

  private def prepareStringDictTable(
      dir: File, numRows: Int, numDistinctValues: Int, width: Int): Unit = {
    val selectExpr = (0 to width).map {
      case 0 => s"CAST(id % $numDistinctValues AS STRING) AS value"
      case i => s"CAST(rand() AS STRING) c$i"
    }
    val df = spark.range(numRows).selectExpr(selectExpr: _*).sort("value")

    saveAsTable(df, dir, true)
  }

  private def saveAsTable(df: DataFrame, dir: File, useDictionary: Boolean = false): Unit = {
    val parquetPath = dir.getCanonicalPath + "/parquet"
    df.write.mode("overwrite").parquet(parquetPath)
    spark.read.parquet(parquetPath).createOrReplaceTempView("parquetTable")
  }

  def filterPushDownBenchmark(
      values: Int,
      title: String,
      whereExpr: String,
      selectExpr: String = "*"): Unit = {
    val benchmark = new Benchmark(title, values, minNumIters = 5, output = output)

    Seq(false, true).foreach { columnIndexEnabled =>
      val name = s"Parquet Vectorized ${if (columnIndexEnabled) s"(columnIndex)" else ""}"
      benchmark.addCase(name) { _ =>
        withSQLConf("parquet.filter.columnindex.enabled" -> s"$columnIndexEnabled") {
          spark.sql(s"SELECT $selectExpr FROM parquetTable WHERE $whereExpr").noop()
        }
      }
    }

    benchmark.run()
  }

  private def runIntBenchmark(numRows: Int, width: Int, mid: Int): Unit = {
    Seq("value IS NULL", s"$mid < value AND value < $mid").foreach { whereExpr =>
      val title = s"Select 0 int row ($whereExpr)".replace("value AND value", "value")
      filterPushDownBenchmark(numRows, title, whereExpr)
    }

    Seq(
      s"value = $mid",
      s"value <=> $mid",
      s"$mid 

[jira] [Commented] (SPARK-33734) Spark Core ::Spark core versions upto 3.0.1 using interdependency on Jackson-core-asl version 1.9.13, which is having security issues reported.

2020-12-12 Thread Aparna (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248364#comment-17248364
 ] 

Aparna commented on SPARK-33734:


[~hyukjin.kwon] 
I updated the title. I don't have the CVE ticket. 
Please let me know which updated version of spark-core to pick. 

> Spark Core ::Spark core versions upto 3.0.1 using interdependency on 
> Jackson-core-asl version 1.9.13, which is having security issues reported. 
> 
>
> Key: SPARK-33734
> URL: https://issues.apache.org/jira/browse/SPARK-33734
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: Aparna
>Priority: Major
>
> spark-core versions up to the latest 3.0.1 depend on 
> [org.apache.avro|https://mvnrepository.com/artifact/org.apache.avro] version 
> 1.8.2, which in turn pulls in 
> [jackson-core-asl|https://mvnrepository.com/artifact/org.codehaus.jackson/jackson-core-asl]
>  version 1.9.13, which has reported security issues.
> Please fix this and share the new version.
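
Until a Spark release moves off avro 1.8.2, one possible interim mitigation on 
the consumer side (a build.sbt sketch, assuming sbt and that nothing on your 
runtime classpath still needs the legacy codehaus classes):

{code:scala}
// Pull in spark-core but drop the transitive jackson-*-asl 1.9.13 that
// arrives via org.apache.avro:avro:1.8.2.
libraryDependencies += ("org.apache.spark" %% "spark-core" % "3.0.1")
  .exclude("org.codehaus.jackson", "jackson-core-asl")
  .exclude("org.codehaus.jackson", "jackson-mapper-asl")
{code}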






[jira] [Updated] (SPARK-33734) Spark Core ::Spark core versions upto 3.0.1 using interdependency on Jackson-core-asl version 1.9.13, which is having security issues reported.

2020-12-12 Thread Aparna (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aparna updated SPARK-33734:
---
Summary: Spark Core ::Spark core versions upto 3.0.1 using interdependency 
on Jackson-core-asl version 1.9.13, which is having security issues reported.   
(was: Spark Core )

> Spark Core ::Spark core versions upto 3.0.1 using interdependency on 
> Jackson-core-asl version 1.9.13, which is having security issues reported. 
> 
>
> Key: SPARK-33734
> URL: https://issues.apache.org/jira/browse/SPARK-33734
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: Aparna
>Priority: Major
>
> spark-core versions up to the latest 3.0.1 depend on 
> [org.apache.avro|https://mvnrepository.com/artifact/org.apache.avro] version 
> 1.8.2, which in turn pulls in 
> [jackson-core-asl|https://mvnrepository.com/artifact/org.codehaus.jackson/jackson-core-asl]
>  version 1.9.13, which has reported security issues.
> Please fix this and share the new version.






[jira] [Assigned] (SPARK-32195) Standardize warning types and messages

2020-12-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-32195:


Assignee: Maciej Szymkiewicz

> Standardize warning types and messages
> --
>
> Key: SPARK-32195
> URL: https://issues.apache.org/jira/browse/SPARK-32195
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Maciej Szymkiewicz
>Priority: Major
>
> Currently PySpark uses somewhat inconsistent warning types and messages, such 
> as UserWarning. We should standardize them.






[jira] [Assigned] (SPARK-33730) Standardize warning types

2020-12-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-33730:


Assignee: Maciej Bryński

> Standardize warning types
> -
>
> Key: SPARK-33730
> URL: https://issues.apache.org/jira/browse/SPARK-33730
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Maciej Bryński
>Priority: Major
>
> We should use warnings properly per 
> [https://docs.python.org/3/library/warnings.html#warning-categories]
> In particular,
>  - we should use {{FutureWarning}} instead of {{DeprecationWarning}} in the 
> places where warnings should be shown to end users by default.
>  - we should __maybe__ think about customizing stacklevel 
> ([https://docs.python.org/3/library/warnings.html#warnings.warn]) like pandas 
> does.
>  - ...
> Current warnings are a bit messy and somewhat arbitrary.
> To be more explicit, we'll have to fix:
> {code:java}
> pyspark/context.py:warnings.warn(
> pyspark/context.py:warnings.warn(
> pyspark/ml/classification.py:warnings.warn("weightCol is 
> ignored, "
> pyspark/ml/clustering.py:warnings.warn("Deprecated in 3.0.0. It will 
> be removed in future versions. Use "
> pyspark/mllib/classification.py:warnings.warn(
> pyspark/mllib/feature.py:warnings.warn("Both withMean and withStd 
> are false. The model does nothing.")
> pyspark/mllib/regression.py:warnings.warn(
> pyspark/mllib/regression.py:warnings.warn(
> pyspark/mllib/regression.py:warnings.warn(
> pyspark/rdd.py:warnings.warn("mapPartitionsWithSplit is deprecated; "
> pyspark/rdd.py:warnings.warn(
> pyspark/shell.py:warnings.warn("Failed to initialize Spark session.")
> pyspark/shuffle.py:warnings.warn("Please install psutil to have 
> better "
> pyspark/sql/catalog.py:warnings.warn(
> pyspark/sql/catalog.py:warnings.warn(
> pyspark/sql/column.py:warnings.warn(
> pyspark/sql/column.py:warnings.warn(
> pyspark/sql/context.py:warnings.warn(
> pyspark/sql/context.py:warnings.warn(
> pyspark/sql/context.py:warnings.warn(
> pyspark/sql/context.py:warnings.warn(
> pyspark/sql/context.py:warnings.warn(
> pyspark/sql/dataframe.py:warnings.warn(
> pyspark/sql/dataframe.py:warnings.warn("to_replace is a dict 
> and value is not None. value will be ignored.")
> pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use degrees 
> instead.", DeprecationWarning)
> pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use radians 
> instead.", DeprecationWarning)
> pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use 
> approx_count_distinct instead.", DeprecationWarning)
> pyspark/sql/pandas/conversion.py:warnings.warn(msg)
> pyspark/sql/pandas/conversion.py:warnings.warn(msg)
> pyspark/sql/pandas/conversion.py:warnings.warn(msg)
> pyspark/sql/pandas/conversion.py:warnings.warn(msg)
> pyspark/sql/pandas/conversion.py:warnings.warn(msg)
> pyspark/sql/pandas/functions.py:warnings.warn(
> pyspark/sql/pandas/group_ops.py:warnings.warn(
> pyspark/sql/session.py:warnings.warn("Fall back to non-hive 
> support because failing to access HiveConf, "
> {code}
> PySpark prints warnings using {{print}} in some places as well. We should also 
> see if we should switch those to {{warnings.warn}}.






[jira] [Commented] (SPARK-26345) Parquet support Column indexes

2020-12-12 Thread Yuming Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-26345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248323#comment-17248323
 ] 

Yuming Wang commented on SPARK-26345:
-

We have a PR testing compatibility against Parquet 1.11.1, Avro 1.10.1 and Hive 
2.3.8: https://github.com/apache/spark/pull/30517

> Parquet support Column indexes
> --
>
> Key: SPARK-26345
> URL: https://issues.apache.org/jira/browse/SPARK-26345
> Project: Spark
>  Issue Type: Umbrella
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Priority: Major
>
> Parquet 1.11.0 supports column indexing. Spark can support this feature for 
> better read performance.
> More details:
> https://issues.apache.org/jira/browse/PARQUET-1201






[jira] [Assigned] (SPARK-33766) Upgrade Jackson to 2.11.4

2020-12-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33766:


Assignee: (was: Apache Spark)

> Upgrade Jackson to 2.11.4
> -
>
> Key: SPARK-33766
> URL: https://issues.apache.org/jira/browse/SPARK-33766
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Priority: Major
>
> Upgrade Jackson to 2.11.4 to make it easy to upgrade to Avro 1.10.1.
> More details:
> https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.11
> {noformat}
>  com.fasterxml.jackson.databind.JsonMappingException: Scala module 2.10.5 
> requires Jackson Databind version >= 2.10.0 and < 2.11.0
> {noformat}
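
A sketch of what keeping the jackson family aligned looks like on the consumer 
side (build.sbt, assuming sbt; the quoted error comes from jackson-module-scala 
pinning databind to [2.10.0, 2.11.0), so all jackson artifacts have to move in 
lockstep):

{code:scala}
dependencyOverrides ++= Seq(
  "com.fasterxml.jackson.core" % "jackson-core" % "2.11.4",
  "com.fasterxml.jackson.core" % "jackson-databind" % "2.11.4",
  "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.11.4"
)
{code}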






[jira] [Commented] (SPARK-33766) Upgrade Jackson to 2.11.4

2020-12-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248322#comment-17248322
 ] 

Apache Spark commented on SPARK-33766:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/30746

> Upgrade Jackson to 2.11.4
> -
>
> Key: SPARK-33766
> URL: https://issues.apache.org/jira/browse/SPARK-33766
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Priority: Major
>
> Upgrade Jackson to 2.11.4 to make it easy to upgrade to Avro 1.10.1.
> More details:
> https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.11
> {noformat}
>  com.fasterxml.jackson.databind.JsonMappingException: Scala module 2.10.5 
> requires Jackson Databind version >= 2.10.0 and < 2.11.0
> {noformat}






[jira] [Assigned] (SPARK-33766) Upgrade Jackson to 2.11.4

2020-12-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33766:


Assignee: Apache Spark

> Upgrade Jackson to 2.11.4
> -
>
> Key: SPARK-33766
> URL: https://issues.apache.org/jira/browse/SPARK-33766
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Assignee: Apache Spark
>Priority: Major
>
> Upgrade Jackson to 2.11.4 to make it easy to upgrade to Avro 1.10.1.
> More details:
> https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.11
> {noformat}
>  com.fasterxml.jackson.databind.JsonMappingException: Scala module 2.10.5 
> requires Jackson Databind version >= 2.10.0 and < 2.11.0
> {noformat}






[jira] [Commented] (SPARK-33766) Upgrade Jackson to 2.11.4

2020-12-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248321#comment-17248321
 ] 

Apache Spark commented on SPARK-33766:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/30746

> Upgrade Jackson to 2.11.4
> -
>
> Key: SPARK-33766
> URL: https://issues.apache.org/jira/browse/SPARK-33766
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Priority: Major
>
> Upgrade Jackson to 2.11.4 to make it easy to upgrade to Avro 1.10.1.
> More details:
> https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.11
> {noformat}
>  com.fasterxml.jackson.databind.JsonMappingException: Scala module 2.10.5 
> requires Jackson Databind version >= 2.10.0 and < 2.11.0
> {noformat}






[jira] [Updated] (SPARK-33766) Upgrade Jackson to 2.11.4

2020-12-12 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-33766:

Description: 
Upgrade Jackson to 2.11.4 to make it easy to upgrade to Avro 1.10.1.

More details:
https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.11

{noformat}
 com.fasterxml.jackson.databind.JsonMappingException: Scala module 2.10.5 
requires Jackson Databind version >= 2.10.0 and < 2.11.0
{noformat}

  was:
Upgrade Jackson to 2.11.3 to make it easy to upgrade to Avro 1.10.1.

More details:
https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.11

{noformat}
 com.fasterxml.jackson.databind.JsonMappingException: Scala module 2.10.5 
requires Jackson Databind version >= 2.10.0 and < 2.11.0
{noformat}


> Upgrade Jackson to 2.11.4
> -
>
> Key: SPARK-33766
> URL: https://issues.apache.org/jira/browse/SPARK-33766
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Priority: Major
>
> Upgrade Jackson to 2.11.4 to make it easy to upgrade to Avro 1.10.1.
> More details:
> https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.11
> {noformat}
>  com.fasterxml.jackson.databind.JsonMappingException: Scala module 2.10.5 
> requires Jackson Databind version >= 2.10.0 and < 2.11.0
> {noformat}






[jira] [Created] (SPARK-33766) Upgrade Jackson to 2.11.4

2020-12-12 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-33766:
---

 Summary: Upgrade Jackson to 2.11.4
 Key: SPARK-33766
 URL: https://issues.apache.org/jira/browse/SPARK-33766
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.2.0
Reporter: Yuming Wang


Upgrade Jackson to 2.11.3 to make it easier to upgrade to Avro 1.10.1.

More details:
https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.11

{noformat}
 com.fasterxml.jackson.databind.JsonMappingException: Scala module 2.10.5 
requires Jackson Databind version >= 2.10.0 and < 2.11.0
{noformat}






[jira] [Commented] (SPARK-33678) Numerical product aggregation

2020-12-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248314#comment-17248314
 ] 

Apache Spark commented on SPARK-33678:
--

User 'rwpenney' has created a pull request for this issue:
https://github.com/apache/spark/pull/30745

> Numerical product aggregation
> -
>
> Key: SPARK-33678
> URL: https://issues.apache.org/jira/browse/SPARK-33678
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.7, 3.0.0, 3.1.0
>Reporter: Richard Penney
>Priority: Minor
>
> There is currently no facility in {{spark.sql.functions}} for computing the 
> product of all numbers in a grouping expression. Such a facility would likely 
> be useful when computing statistical quantities such as the combined 
> probability of a set of independent events, or in financial applications when 
> calculating a cumulative interest rate.
> Although it is certainly possible to emulate this with an expression of the 
> form {{exp(sum(log(column)))}} (see the sketch after this description), it 
> has a number of significant drawbacks:
>  * It involves computationally costly functions (exp, log)
>  * It is more verbose than something like {{product(column)}}
>  * It is more prone to numerical inaccuracies when handling quantities close 
> to one than directly multiplying the numbers
>  * It will not handle zeros or negative numbers cleanly
> I am currently developing an addition to {{sql.functions}}, which involves [a 
> new Catalyst aggregation 
> expression|https://github.com/rwpenney/spark/blob/feature/agg-product/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Product.scala].
>  This needs some additional testing, and I hope to issue a pull request soon.
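A minimal Scala sketch of the sharpest of these edge cases, assuming only a local {{SparkSession}} (the data and column names are invented for illustration): Spark SQL's {{log}} returns NULL for non-positive input and {{sum}} skips NULLs, so a zero anywhere in a group is silently dropped instead of collapsing the product to zero.

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Two groups of multiplicative factors; group "b" contains a zero.
val df = Seq(("a", 1.01), ("a", 1.02), ("b", 0.0), ("b", 3.0)).toDF("k", "v")

// The exp(sum(log)) emulation: log(0.0) evaluates to NULL, sum ignores
// NULLs, so group "b" yields 3.0 rather than the correct product 0.0.
df.groupBy($"k")
  .agg(exp(sum(log($"v"))).as("prod_via_logs"))
  .show()
{code}

A native {{product(column)}} aggregate of the kind proposed here would multiply the values directly and return 0.0 for group "b".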






[jira] [Updated] (SPARK-33678) Numerical product aggregation

2020-12-12 Thread Richard Penney (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Penney updated SPARK-33678:
---
Flags: Patch

Pull request now issued (https://github.com/apache/spark/pull/30745)

> Numerical product aggregation
> -
>
> Key: SPARK-33678
> URL: https://issues.apache.org/jira/browse/SPARK-33678
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.7, 3.0.0, 3.1.0
>Reporter: Richard Penney
>Priority: Minor
>
> There is currently no facility in {{spark.sql.functions}} for computing the 
> product of all numbers in a grouping expression. Such a facility would likely 
> be useful when computing statistical quantities such as the combined 
> probability of a set of independent events, or in financial applications when 
> calculating a cumulative interest rate.
> Although it is certainly possible to emulate this with an expression of the 
> form {{exp(sum(log(column)))}}, this has a number of significant drawbacks:
>  * It involves computationally costly functions (exp, log)
>  * It is more verbose than something like {{product(column)}}
>  * It is more prone to numerical inaccuracies when handling quantities close 
> to one than directly multiplying the numbers
>  * It will not handle zeros or negative numbers cleanly
> I am currently developing an addition to {{sql.functions}}, which involves [a 
> new Catalyst aggregation 
> expression|https://github.com/rwpenney/spark/blob/feature/agg-product/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Product.scala].
>  This needs some additional testing, and I hope to issue a pull request soon.






[jira] [Commented] (SPARK-33678) Numerical product aggregation

2020-12-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248312#comment-17248312
 ] 

Apache Spark commented on SPARK-33678:
--

User 'rwpenney' has created a pull request for this issue:
https://github.com/apache/spark/pull/30745

> Numerical product aggregation
> -
>
> Key: SPARK-33678
> URL: https://issues.apache.org/jira/browse/SPARK-33678
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.7, 3.0.0, 3.1.0
>Reporter: Richard Penney
>Priority: Minor
>
> There is currently no facility in {{spark.sql.functions}} for computing the 
> product of all numbers in a grouping expression. Such a facility would likely 
> be useful when computing statistical quantities such as the combined 
> probability of a set of independent events, or in financial applications when 
> calculating a cumulative interest rate.
> Although it is certainly possible to emulate this with an expression of the 
> form {{exp(sum(log(column)))}}, this has a number of significant drawbacks:
>  * It involves computationally costly functions (exp, log)
>  * It is more verbose than something like {{product(column)}}
>  * It is more prone to numerical inaccuracies when handling quantities close 
> to one than directly multiplying the numbers
>  * It will not handle zeros or negative numbers cleanly
> I am currently developing an addition to {{sql.functions}}, which involves [a 
> new Catalyst aggregation 
> expression|https://github.com/rwpenney/spark/blob/feature/agg-product/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Product.scala].
>  This needs some additional testing, and I hope to issue a pull request soon.






[jira] [Assigned] (SPARK-33678) Numerical product aggregation

2020-12-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33678:


Assignee: Apache Spark

> Numerical product aggregation
> -
>
> Key: SPARK-33678
> URL: https://issues.apache.org/jira/browse/SPARK-33678
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.7, 3.0.0, 3.1.0
>Reporter: Richard Penney
>Assignee: Apache Spark
>Priority: Minor
>
> There is currently no facility in {{spark.sql.functions}} for computing the 
> product of all numbers in a grouping expression. Such a facility would likely 
> be useful when computing statistical quantities such as the combined 
> probability of a set of independent events, or in financial applications when 
> calculating a cumulative interest rate.
> Although it is certainly possible to emulate this with an expression of the 
> form {{exp(sum(log(column)))}}, this has a number of significant drawbacks:
>  * It involves computationally costly functions (exp, log)
>  * It is more verbose than something like {{product(column)}}
>  * It is more prone to numerical inaccuracies when handling quantities close 
> to one than directly multiplying the numbers
>  * It will not handle zeros or negative numbers cleanly
> I am currently developing an addition to {{sql.functions}}, which involves [a 
> new Catalyst aggregation 
> expression|https://github.com/rwpenney/spark/blob/feature/agg-product/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Product.scala].
>  This needs some additional testing, and I hope to issue a pull request soon.






[jira] [Assigned] (SPARK-33678) Numerical product aggregation

2020-12-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33678:


Assignee: (was: Apache Spark)

> Numerical product aggregation
> -
>
> Key: SPARK-33678
> URL: https://issues.apache.org/jira/browse/SPARK-33678
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.7, 3.0.0, 3.1.0
>Reporter: Richard Penney
>Priority: Minor
>
> There is currently no facility in {{spark.sql.functions}} for computing the 
> product of all numbers in a grouping expression. Such a facility would likely 
> be useful when computing statistical quantities such as the combined 
> probability of a set of independent events, or in financial applications when 
> calculating a cumulative interest rate.
> Although it is certainly possible to emulate this with an expression of the 
> form {{exp(sum(log(column)))}}, this has a number of significant drawbacks:
>  * It involves computationally costly functions (exp, log)
>  * It is more verbose than something like {{product(column)}}
>  * It is more prone to numerical inaccuracies when handling quantities close 
> to one than directly multiplying the numbers
>  * It will not handle zeros or negative numbers cleanly
> I am currently developing an addition to {{sql.functions}}, which involves [a 
> new Catalyst aggregation 
> expression|https://github.com/rwpenney/spark/blob/feature/agg-product/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Product.scala].
>  This needs some additional testing, and I hope to issue a pull request soon.






[jira] [Commented] (SPARK-33589) Close opened session if the initialization fails

2020-12-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248310#comment-17248310
 ] 

Apache Spark commented on SPARK-33589:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/30744

> Close opened session if the initialization fails
> 
>
> Key: SPARK-33589
> URL: https://issues.apache.org/jira/browse/SPARK-33589
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.1.0
>
>







[jira] [Commented] (SPARK-33589) Close opened session if the initialization fails

2020-12-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248311#comment-17248311
 ] 

Apache Spark commented on SPARK-33589:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/30744

> Close opened session if the initialization fails
> 
>
> Key: SPARK-33589
> URL: https://issues.apache.org/jira/browse/SPARK-33589
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.1.0
>
>



