date:20210702

[jira] [Commented] (SPARK-36006) Migrate ALTER TABLE ADD/RENAME COLUMNS command to the new resolution framework

2021-07-02 Thread Apache Spark (Jira)



[https://issues.apache.org/jira/browse/SPARK-36006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373912#comment-17373912]

Apache Spark commented on SPARK-36006:
--

User 'imback82' has created a pull request for this issue:
https://github.com/apache/spark/pull/33200

>  Migrate ALTER TABLE ADD/RENAME COLUMNS command to the new resolution 
> framework
> ---
>
> Key: SPARK-36006
> URL: https://issues.apache.org/jira/browse/SPARK-36006
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Terry Kim
>Priority: Major
>
> Migrate ALTER TABLE ADD/RENAME COLUMNS command to the new resolution framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-36006) Migrate ALTER TABLE ADD/RENAME COLUMNS command to the new resolution framework

2021-07-02 Thread Apache Spark (Jira)



[https://issues.apache.org/jira/browse/SPARK-36006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373911#comment-17373911]

Apache Spark commented on SPARK-36006:
--

User 'imback82' has created a pull request for this issue:
https://github.com/apache/spark/pull/33200

>  Migrate ALTER TABLE ADD/RENAME COLUMNS command to the new resolution 
> framework
> ---
>
> Key: SPARK-36006
> URL: https://issues.apache.org/jira/browse/SPARK-36006
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Terry Kim
>Priority: Major
>
> Migrate ALTER TABLE ADD/RENAME COLUMNS command to the new resolution framework




[jira] [Assigned] (SPARK-36006) Migrate ALTER TABLE ADD/RENAME COLUMNS command to the new resolution framework

2021-07-02 Thread Apache Spark (Jira)



[https://issues.apache.org/jira/browse/SPARK-36006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Apache Spark reassigned SPARK-36006:


Assignee: (was: Apache Spark)

>  Migrate ALTER TABLE ADD/RENAME COLUMNS command to the new resolution 
> framework
> ---
>
> Key: SPARK-36006
> URL: https://issues.apache.org/jira/browse/SPARK-36006
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Terry Kim
>Priority: Major
>
> Migrate ALTER TABLE ADD/RENAME COLUMNS command to the new resolution framework




[jira] [Assigned] (SPARK-36006) Migrate ALTER TABLE ADD/RENAME COLUMNS command to the new resolution framework

2021-07-02 Thread Apache Spark (Jira)



[https://issues.apache.org/jira/browse/SPARK-36006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Apache Spark reassigned SPARK-36006:


Assignee: Apache Spark

>  Migrate ALTER TABLE ADD/RENAME COLUMNS command to the new resolution 
> framework
> ---
>
> Key: SPARK-36006
> URL: https://issues.apache.org/jira/browse/SPARK-36006
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Terry Kim
>Assignee: Apache Spark
>Priority: Major
>
> Migrate ALTER TABLE ADD/RENAME COLUMNS command to the new resolution framework




[jira] [Created] (SPARK-36006) Migrate ALTER TABLE ADD/RENAME COLUMNS command to the new resolution framework

2021-07-02 Thread Terry Kim (Jira)

Terry Kim created SPARK-36006:
-

 Summary:  Migrate ALTER TABLE ADD/RENAME COLUMNS command to the 
new resolution framework
 Key: SPARK-36006
 URL: https://issues.apache.org/jira/browse/SPARK-36006
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0, 3.3.0
Reporter: Terry Kim


Migrate ALTER TABLE ADD/RENAME COLUMNS command to the new resolution framework




[jira] [Created] (SPARK-36005) in spark3.1.2 version The canCast method of type of char/varchar needs to be consistent with StringType

2021-07-02 Thread Sun BiaoBiao (Jira)

Sun BiaoBiao created SPARK-36005:


 Summary: in spark3.1.2 version The canCast method of type of 
char/varchar needs to be consistent with StringType
 Key: SPARK-36005
 URL: https://issues.apache.org/jira/browse/SPARK-36005
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.1.2
Reporter: Sun BiaoBiao
 Fix For: 3.1.3


PR https://github.com/apache/spark/pull/32109 introduced the char/varchar types.

As described there:

To be safe, this PR doesn't add char/varchar type to the query 
engine(expression input check, internal row framework, codegen framework, 
etc.). We will replace char/varchar type by string type with metadata 
(Attribute.metadata or StructField.metadata) that includes the original type 
string before it goes into the query engine. That said, the existing code will 
not see char/varchar type but only string type.


Therefore, the canCast method for char/varchar types needs to be consistent
with StringType.
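The consistency requirement can be illustrated with a small, self-contained Python model. The type classes and the `can_cast_raw`/`can_cast` functions below are hypothetical stand-ins, not Spark's actual data types or its canCast implementation, and the castability rule is a toy. The point is only that normalizing char/varchar to string before the check makes the answers for char/varchar match those for StringType.

```python
from dataclasses import dataclass


# Hypothetical, simplified data types (not Spark's org.apache.spark.sql.types classes).
@dataclass(frozen=True)
class StringType:
    pass


@dataclass(frozen=True)
class IntegerType:
    pass


@dataclass(frozen=True)
class CharType:
    length: int


@dataclass(frozen=True)
class VarcharType:
    length: int


def replace_char_varchar_with_string(data_type):
    """Map char/varchar to plain string; the original type would live in metadata."""
    if isinstance(data_type, (CharType, VarcharType)):
        return StringType()
    return data_type


# Toy rule: string and integer can be cast to each other; nothing else is known.
_CASTABLE = {StringType, IntegerType}


def can_cast_raw(src, dst):
    """Castability check that only knows about StringType and IntegerType."""
    if src == dst:
        return True
    return type(src) in _CASTABLE and type(dst) in _CASTABLE


def can_cast(src, dst):
    """Normalize char/varchar first so they behave exactly like StringType."""
    return can_cast_raw(
        replace_char_varchar_with_string(src),
        replace_char_varchar_with_string(dst),
    )


print(can_cast_raw(VarcharType(10), IntegerType()))  # False: varchar is unknown here
print(can_cast(VarcharType(10), IntegerType()))      # True: same answer as for StringType
```

Without the normalization step, a check written only for StringType silently rejects char/varchar inputs, which is the kind of inconsistency the issue describes.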




[jira] [Commented] (SPARK-36004) Update MiMa and audit Scala/Java API changes

2021-07-02 Thread Apache Spark (Jira)



[https://issues.apache.org/jira/browse/SPARK-36004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373859#comment-17373859]

Apache Spark commented on SPARK-36004:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/33199

> Update MiMa and audit Scala/Java API changes
> 
>
> Key: SPARK-36004
> URL: https://issues.apache.org/jira/browse/SPARK-36004
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Blocker
>





[jira] [Assigned] (SPARK-36004) Update MiMa and audit Scala/Java API changes

2021-07-02 Thread Apache Spark (Jira)



[https://issues.apache.org/jira/browse/SPARK-36004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Apache Spark reassigned SPARK-36004:


Assignee: Apache Spark  (was: Dongjoon Hyun)

> Update MiMa and audit Scala/Java API changes
> 
>
> Key: SPARK-36004
> URL: https://issues.apache.org/jira/browse/SPARK-36004
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Blocker
>





[jira] [Assigned] (SPARK-36004) Update MiMa and audit Scala/Java API changes

2021-07-02 Thread Apache Spark (Jira)



[https://issues.apache.org/jira/browse/SPARK-36004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Apache Spark reassigned SPARK-36004:


Assignee: Dongjoon Hyun  (was: Apache Spark)

> Update MiMa and audit Scala/Java API changes
> 
>
> Key: SPARK-36004
> URL: https://issues.apache.org/jira/browse/SPARK-36004
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Blocker
>





[jira] [Assigned] (SPARK-36004) Update MiMa and audit API changes

2021-07-02 Thread Dongjoon Hyun (Jira)



[https://issues.apache.org/jira/browse/SPARK-36004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Dongjoon Hyun reassigned SPARK-36004:
-

Assignee: Dongjoon Hyun

> Update MiMa and audit API changes
> -
>
> Key: SPARK-36004
> URL: https://issues.apache.org/jira/browse/SPARK-36004
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Blocker
>





[jira] [Updated] (SPARK-36004) Update MiMa and audit Scala/Java API changes

2021-07-02 Thread Dongjoon Hyun (Jira)



[https://issues.apache.org/jira/browse/SPARK-36004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Dongjoon Hyun updated SPARK-36004:
--
Summary: Update MiMa and audit Scala/Java API changes  (was: Update MiMa 
and audit API changes)

> Update MiMa and audit Scala/Java API changes
> 
>
> Key: SPARK-36004
> URL: https://issues.apache.org/jira/browse/SPARK-36004
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Blocker
>





[jira] [Updated] (SPARK-36004) Update MiMa and audit API changes

2021-07-02 Thread Dongjoon Hyun (Jira)



[https://issues.apache.org/jira/browse/SPARK-36004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Dongjoon Hyun updated SPARK-36004:
--
Target Version/s: 3.2.0

> Update MiMa and audit API changes
> -
>
> Key: SPARK-36004
> URL: https://issues.apache.org/jira/browse/SPARK-36004
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Blocker
>





[jira] [Created] (SPARK-36004) Update MiMa and audit API changes

2021-07-02 Thread Dongjoon Hyun (Jira)

Dongjoon Hyun created SPARK-36004:
-

 Summary: Update MiMa and audit API changes
 Key: SPARK-36004
 URL: https://issues.apache.org/jira/browse/SPARK-36004
 Project: Spark
  Issue Type: Bug
  Components: Project Infra
Affects Versions: 3.2.0
Reporter: Dongjoon Hyun







[jira] [Updated] (SPARK-35974) Spark submit REST cluster/standalone mode - launching an s3a jar with STS

2021-07-02 Thread t oo (Jira)



[https://issues.apache.org/jira/browse/SPARK-35974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

t oo updated SPARK-35974:
-
Affects Version/s: (was: 2.4.6)
   2.4.8
  Description: 
{code:java}
/var/lib/spark-2.4.8-bin-hadoop2.7/bin/spark-submit --master 
spark://myhost:6066 --conf spark.hadoop.fs.s3a.access.key='redact1' --conf 
spark.executorEnv.AWS_ACCESS_KEY_ID='redact1' --conf 
spark.driverEnv.AWS_ACCESS_KEY_ID='redact1' --conf 
spark.hadoop.fs.s3a.secret.key='redact2' --conf 
spark.executorEnv.AWS_SECRET_ACCESS_KEY='redact2' --conf 
spark.driverEnv.AWS_SECRET_ACCESS_KEY='redact2' --conf 
spark.hadoop.fs.s3a.session.token='redact3' --conf 
spark.executorEnv.AWS_SESSION_TOKEN='redact3' --conf 
spark.driverEnv.AWS_SESSION_TOKEN='redact3' --conf 
spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider
 --conf spark.driver.extraJavaOptions='-DAWS_ACCESS_KEY_ID=redact1 
-DAWS_SECRET_ACCESS_KEY=redact2 -DAWS_SESSION_TOKEN=redact3' --conf 
spark.executor.extraJavaOptions='-DAWS_ACCESS_KEY_ID=redact1 
-DAWS_SECRET_ACCESS_KEY=redact2 -DAWS_SESSION_TOKEN=redact3' 
--total-executor-cores 4 --executor-cores 2 --executor-memory 2g 
--driver-memory 1g --name lin1 --deploy-mode cluster --conf 
spark.eventLog.enabled=false --class com.yotpo.metorikku.Metorikku 
s3a://mybuc/metorikku_2.11.jar -c s3a://mybuc/spark_ingestion_job.yaml
{code}
Running the above command gives the stack trace below:

 
{code:java}
 Exception from the cluster:\njava.nio.file.AccessDeniedException: 
s3a://mybuc/metorikku_2.11.jar: getFileStatus on 
s3a://mybuc/metorikku_2.11.jar: 
com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon 
S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: xx; S3 Extended 
Request ID: /1qj/yy=), S3 Extended Request ID: /1qj/yy=\n\
org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:158)
org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:101)
org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1542)
org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:117)
org.apache.hadoop.fs.FileSystem.isFile(FileSystem.java:1463)
org.apache.hadoop.fs.s3a.S3AFileSystem.isFile(S3AFileSystem.java:2030)
org.apache.spark.util.Utils$.fetchHcfsFile(Utils.scala:747)
org.apache.spark.util.Utils$.doFetchFile(Utils.scala:723)
org.apache.spark.util.Utils$.fetchFile(Utils.scala:509)
org.apache.spark.deploy.worker.DriverRunner.downloadUserJar(DriverRunner.scala:155)
org.apache.spark.deploy.worker.DriverRunner.prepareAndRunDriver(DriverRunner.scala:173)
org.apache.spark.deploy.worker.DriverRunner$$anon$1.run(DriverRunner.scala:92){code}
All the EC2 instances in the Spark cluster can access S3 only via STS tokens.
The jar itself reads CSVs from S3 using the tokens, and everything works if I
either (1) change the command line to point to local jars on the EC2 instance,
or (2) use port 7077/client mode instead of cluster mode. It seems the jar
itself can't be launched from S3, as if the tokens are not being picked up
properly.

  was:
{code:java}
/var/lib/spark-2.3.4-bin-hadoop2.7/bin/spark-submit --master 
spark://myhost:6066 --conf spark.hadoop.fs.s3a.access.key='redact1' --conf 
spark.executorEnv.AWS_ACCESS_KEY_ID='redact1' --conf 
spark.driverEnv.AWS_ACCESS_KEY_ID='redact1' --conf 
spark.hadoop.fs.s3a.secret.key='redact2' --conf 
spark.executorEnv.AWS_SECRET_ACCESS_KEY='redact2' --conf 
spark.driverEnv.AWS_SECRET_ACCESS_KEY='redact2' --conf 
spark.hadoop.fs.s3a.session.token='redact3' --conf 
spark.executorEnv.AWS_SESSION_TOKEN='redact3' --conf 
spark.driverEnv.AWS_SESSION_TOKEN='redact3' --conf 
spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider
 --conf spark.driver.extraJavaOptions='-DAWS_ACCESS_KEY_ID=redact1 
-DAWS_SECRET_ACCESS_KEY=redact2 -DAWS_SESSION_TOKEN=redact3' --conf 
spark.executor.extraJavaOptions='-DAWS_ACCESS_KEY_ID=redact1 
-DAWS_SECRET_ACCESS_KEY=redact2 -DAWS_SESSION_TOKEN=redact3' 
--total-executor-cores 4 --executor-cores 2 --executor-memory 2g 
--driver-memory 1g --name lin1 --deploy-mode cluster --conf 
spark.eventLog.enabled=false --class com.yotpo.metorikku.Metorikku 
s3a://mybuc/metorikku_2.11.jar -c s3a://mybuc/spark_ingestion_job.yaml
{code}
Running the above command gives the stack trace below:

 
{code:java}
 Exception from the cluster:\njava.nio.file.AccessDeniedException: 
s3a://mybuc/metorikku_2.11.jar: getFileStatus on 
s3a://mybuc/metorikku_2.11.jar: 
com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon 
S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: xx; S3 Extended 
Request ID: /1qj/yy=), S3 Extended Request ID: /1qj/yy=\n\
org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:158)
org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:101)
org.apache.hadoop.fs.s3a.S3AFileSyst

[jira] [Reopened] (SPARK-35974) Spark submit REST cluster/standalone mode - launching an s3a jar with STS

2021-07-02 Thread t oo (Jira)



[https://issues.apache.org/jira/browse/SPARK-35974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

t oo reopened SPARK-35974:
--

v2.4.8 is less than 2 months old

> Spark submit REST cluster/standalone mode - launching an s3a jar with STS
> -
>
> Key: SPARK-35974
> URL: https://issues.apache.org/jira/browse/SPARK-35974
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.8
>Reporter: t oo
>Priority: Major
>
> {code:java}
> /var/lib/spark-2.4.8-bin-hadoop2.7/bin/spark-submit --master 
> spark://myhost:6066 --conf spark.hadoop.fs.s3a.access.key='redact1' --conf 
> spark.executorEnv.AWS_ACCESS_KEY_ID='redact1' --conf 
> spark.driverEnv.AWS_ACCESS_KEY_ID='redact1' --conf 
> spark.hadoop.fs.s3a.secret.key='redact2' --conf 
> spark.executorEnv.AWS_SECRET_ACCESS_KEY='redact2' --conf 
> spark.driverEnv.AWS_SECRET_ACCESS_KEY='redact2' --conf 
> spark.hadoop.fs.s3a.session.token='redact3' --conf 
> spark.executorEnv.AWS_SESSION_TOKEN='redact3' --conf 
> spark.driverEnv.AWS_SESSION_TOKEN='redact3' --conf 
> spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider
>  --conf spark.driver.extraJavaOptions='-DAWS_ACCESS_KEY_ID=redact1 
> -DAWS_SECRET_ACCESS_KEY=redact2 -DAWS_SESSION_TOKEN=redact3' --conf 
> spark.executor.extraJavaOptions='-DAWS_ACCESS_KEY_ID=redact1 
> -DAWS_SECRET_ACCESS_KEY=redact2 -DAWS_SESSION_TOKEN=redact3' 
> --total-executor-cores 4 --executor-cores 2 --executor-memory 2g 
> --driver-memory 1g --name lin1 --deploy-mode cluster --conf 
> spark.eventLog.enabled=false --class com.yotpo.metorikku.Metorikku 
> s3a://mybuc/metorikku_2.11.jar -c s3a://mybuc/spark_ingestion_job.yaml
> {code}
> Running the above command gives the stack trace below:
>  
> {code:java}
>  Exception from the cluster:\njava.nio.file.AccessDeniedException: 
> s3a://mybuc/metorikku_2.11.jar: getFileStatus on 
> s3a://mybuc/metorikku_2.11.jar: 
> com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon 
> S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: xx; S3 Extended 
> Request ID: /1qj/yy=), S3 Extended Request ID: /1qj/yy=\n\
> org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:158)
> org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:101)
> org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1542)
> org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:117)
> org.apache.hadoop.fs.FileSystem.isFile(FileSystem.java:1463)
> org.apache.hadoop.fs.s3a.S3AFileSystem.isFile(S3AFileSystem.java:2030)
> org.apache.spark.util.Utils$.fetchHcfsFile(Utils.scala:747)
> org.apache.spark.util.Utils$.doFetchFile(Utils.scala:723)
> org.apache.spark.util.Utils$.fetchFile(Utils.scala:509)
> org.apache.spark.deploy.worker.DriverRunner.downloadUserJar(DriverRunner.scala:155)
> org.apache.spark.deploy.worker.DriverRunner.prepareAndRunDriver(DriverRunner.scala:173)
> org.apache.spark.deploy.worker.DriverRunner$$anon$1.run(DriverRunner.scala:92){code}
> All the EC2 instances in the Spark cluster can access S3 only via STS tokens.
> The jar itself reads CSVs from S3 using the tokens, and everything works if I
> either (1) change the command line to point to local jars on the EC2 instance,
> or (2) use port 7077/client mode instead of cluster mode. It seems the jar
> itself can't be launched from S3, as if the tokens are not being picked up
> properly.
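For reference, the S3A-related flags in the spark-submit command above can be checked mechanically. Spark copies every `spark.hadoop.*` property into the Hadoop configuration with the prefix removed, and Hadoop's `TemporaryAWSCredentialsProvider` reads `fs.s3a.access.key`, `fs.s3a.secret.key` and `fs.s3a.session.token`. The Python sketch below uses hypothetical function names and only verifies that the submitted configuration is complete. It does not show whether the standalone worker, which fetches the jar in `DriverRunner.downloadUserJar` according to the stack trace, actually applies these settings.

```python
HADOOP_PREFIX = "spark.hadoop."

# Hadoop keys read by org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider.
STS_KEYS = ("fs.s3a.access.key", "fs.s3a.secret.key", "fs.s3a.session.token")


def hadoop_conf_from_spark(spark_conf):
    """Keep spark.hadoop.* entries and strip the prefix, as Spark does."""
    return {
        key[len(HADOOP_PREFIX):]: value
        for key, value in spark_conf.items()
        if key.startswith(HADOOP_PREFIX)
    }


def missing_sts_keys(spark_conf):
    """Return the STS keys that are absent or empty after prefix stripping."""
    hadoop_conf = hadoop_conf_from_spark(spark_conf)
    return [key for key in STS_KEYS if not hadoop_conf.get(key)]


# A subset of the --conf values from the command above (secrets already redacted).
submitted = {
    "spark.hadoop.fs.s3a.access.key": "redact1",
    "spark.hadoop.fs.s3a.secret.key": "redact2",
    "spark.hadoop.fs.s3a.session.token": "redact3",
    "spark.hadoop.fs.s3a.aws.credentials.provider":
        "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider",
    "spark.eventLog.enabled": "false",
}

print(missing_sts_keys(submitted))  # []
```

An empty result means the submitted configuration carries all three STS values; the open question in this report is whether they reach the process that downloads the jar.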




[jira] [Commented] (SPARK-35993) Flaky test: org.apache.spark.sql.execution.streaming.state.RocksDBSuite.ensure that concurrent update and cleanup consistent versions

2021-07-02 Thread Jungtaek Lim (Jira)



[https://issues.apache.org/jira/browse/SPARK-35993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373855#comment-17373855]

Jungtaek Lim commented on SPARK-35993:
--

Thanks for reporting [~gsomogyi]! The test is marked as "ignored" for now. We 
will try to fix or remove the test via this JIRA issue.

> Flaky test: 
> org.apache.spark.sql.execution.streaming.state.RocksDBSuite.ensure that 
> concurrent update and cleanup consistent versions
> -
>
> Key: SPARK-35993
> URL: https://issues.apache.org/jira/browse/SPARK-35993
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Tests
>Affects Versions: 3.2.0
>Reporter: Gabor Somogyi
>Priority: Major
>
> Appeared in jenkins: 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140575/testReport/org.apache.spark.sql.execution.streaming.state/RocksDBSuite/ensure_that_concurrent_update_and_cleanup_consistent_versions/
> {code:java}
> Error Message
> java.io.FileNotFoundException: File 
> /home/jenkins/workspace/SparkPullRequestBuilder@2/target/tmp/spark-21674620-ac83-4ad3-a153-5a7adf909244/20.zip
>  does not exist
> Stacktrace
> sbt.ForkMain$ForkError: java.io.FileNotFoundException: File 
> /home/jenkins/workspace/SparkPullRequestBuilder@2/target/tmp/spark-21674620-ac83-4ad3-a153-5a7adf909244/20.zip
>  does not exist
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:779)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:1100)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:769)
>   at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:462)
>   at 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.(ChecksumFileSystem.java:160)
>   at 
> org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:372)
>   at org.apache.spark.DebugFilesystem.open(DebugFilesystem.scala:74)
>   at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:976)
>   at org.apache.spark.util.Utils$.unzipFilesFromFile(Utils.scala:3132)
>   at 
> org.apache.spark.sql.execution.streaming.state.RocksDBFileManager.loadCheckpointFromDfs(RocksDBFileManager.scala:174)
>   at 
> org.apache.spark.sql.execution.streaming.state.RocksDB.load(RocksDB.scala:103)
>   at 
> org.apache.spark.sql.execution.streaming.state.RocksDBSuite.withDB(RocksDBSuite.scala:443)
>   at 
> org.apache.spark.sql.execution.streaming.state.RocksDBSuite.$anonfun$new$57(RocksDBSuite.scala:397)
>   at org.apache.spark.sql.catalyst.util.package$.quietly(package.scala:42)
>   at 
> org.apache.spark.sql.execution.streaming.state.RocksDBSuite.$anonfun$new$56(RocksDBSuite.scala:341)
>   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at 
> org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
>   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:190)
>   at 
> org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
>   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
>   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
>   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
>   at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:62)
>   at 
> org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
>   at 
> org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
>   at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:62)
>   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
>   at 
> org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
>   at scala.collection.immutable.List.foreach(List.scala:431)
>   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
>   at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396)
>   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475)
>   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269)
>   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest

[jira] [Updated] (SPARK-35993) Flaky test: org.apache.spark.sql.execution.streaming.state.RocksDBSuite.ensure that concurrent update and cleanup consistent versions

2021-07-02 Thread Jungtaek Lim (Jira)



[https://issues.apache.org/jira/browse/SPARK-35993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Jungtaek Lim updated SPARK-35993:
-
Affects Version/s: (was: 3.1.2)
   3.2.0

> Flaky test: 
> org.apache.spark.sql.execution.streaming.state.RocksDBSuite.ensure that 
> concurrent update and cleanup consistent versions
> -
>
> Key: SPARK-35993
> URL: https://issues.apache.org/jira/browse/SPARK-35993
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Tests
>Affects Versions: 3.2.0
>Reporter: Gabor Somogyi
>Priority: Major
>
> Appeared in jenkins: 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140575/testReport/org.apache.spark.sql.execution.streaming.state/RocksDBSuite/ensure_that_concurrent_update_and_cleanup_consistent_versions/
> {code:java}
> Error Message
> java.io.FileNotFoundException: File 
> /home/jenkins/workspace/SparkPullRequestBuilder@2/target/tmp/spark-21674620-ac83-4ad3-a153-5a7adf909244/20.zip
>  does not exist
> Stacktrace
> sbt.ForkMain$ForkError: java.io.FileNotFoundException: File 
> /home/jenkins/workspace/SparkPullRequestBuilder@2/target/tmp/spark-21674620-ac83-4ad3-a153-5a7adf909244/20.zip
>  does not exist
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:779)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:1100)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:769)
>   at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:462)
>   at 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.(ChecksumFileSystem.java:160)
>   at 
> org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:372)
>   at org.apache.spark.DebugFilesystem.open(DebugFilesystem.scala:74)
>   at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:976)
>   at org.apache.spark.util.Utils$.unzipFilesFromFile(Utils.scala:3132)
>   at 
> org.apache.spark.sql.execution.streaming.state.RocksDBFileManager.loadCheckpointFromDfs(RocksDBFileManager.scala:174)
>   at 
> org.apache.spark.sql.execution.streaming.state.RocksDB.load(RocksDB.scala:103)
>   at 
> org.apache.spark.sql.execution.streaming.state.RocksDBSuite.withDB(RocksDBSuite.scala:443)
>   at 
> org.apache.spark.sql.execution.streaming.state.RocksDBSuite.$anonfun$new$57(RocksDBSuite.scala:397)
>   at org.apache.spark.sql.catalyst.util.package$.quietly(package.scala:42)
>   at 
> org.apache.spark.sql.execution.streaming.state.RocksDBSuite.$anonfun$new$56(RocksDBSuite.scala:341)
>   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at 
> org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
>   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:190)
>   at 
> org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
>   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
>   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
>   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
>   at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:62)
>   at 
> org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
>   at 
> org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
>   at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:62)
>   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
>   at 
> org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
>   at scala.collection.immutable.List.foreach(List.scala:431)
>   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
>   at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396)
>   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475)
>   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269)
>   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268)
>   at org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1563)
>   at org.

[jira] [Commented] (SPARK-33349) ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed

2021-07-02 Thread Jim Kleckner (Jira)



[https://issues.apache.org/jira/browse/SPARK-33349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373849#comment-17373849]

Jim Kleckner commented on SPARK-33349:
--

[~redsk], can you confirm that https://issues.apache.org/jira/browse/SPARK-33471
fixes your issue with 4.12.0?

> ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed
> --
>
> Key: SPARK-33349
> URL: https://issues.apache.org/jira/browse/SPARK-33349
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.0.1, 3.0.2, 3.1.0
>Reporter: Nicola Bova
>Priority: Critical
>
> I launch my spark application with the 
> [spark-on-kubernetes-operator|https://github.com/GoogleCloudPlatform/spark-on-k8s-operator]
>  with the following yaml file:
> {code:yaml}
> apiVersion: sparkoperator.k8s.io/v1beta2
> kind: SparkApplication
> metadata:
>   name: spark-kafka-streamer-test
>   namespace: kafka2hdfs
> spec:
>   type: Scala
>   mode: cluster
>   image: /spark:3.0.2-SNAPSHOT-2.12-0.1.0
>   imagePullPolicy: Always
>   timeToLiveSeconds: 259200
>   mainClass: path.to.my.class.KafkaStreamer
>   mainApplicationFile: spark-kafka-streamer_2.12-spark300-assembly.jar
>   sparkVersion: 3.0.1
>   restartPolicy:
>     type: Always
>   sparkConf:
>     "spark.kafka.consumer.cache.capacity": "8192"
>     "spark.kubernetes.memoryOverheadFactor": "0.3"
>   deps:
>     jars:
>       - my
>       - jar
>       - list
>   hadoopConfigMap: hdfs-config
>   driver:
>     cores: 4
>     memory: 12g
>     labels:
>       version: 3.0.1
>     serviceAccount: default
>     javaOptions: "-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties"
>   executor:
>     instances: 4
>     cores: 4
>     memory: 16g
>     labels:
>       version: 3.0.1
>     javaOptions: "-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties"
> {code}
>  I have tried with both Spark `3.0.1` and `3.0.2-SNAPSHOT` with the ["Restart 
> the watcher when we receive a version changed from 
> k8s"|https://github.com/apache/spark/pull/29533] patch.
> This is the driver log:
> {code}
> 20/11/04 12:16:02 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> ... // my app log, it's a structured streaming app reading from kafka and writing to hdfs
> 20/11/04 13:12:12 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed (this is expected if the application is shutting down.)
> io.fabric8.kubernetes.client.KubernetesClientException: too old resource version: 1574101276 (1574213896)
>  at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259)
>  at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323)
>  at okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219)
>  at okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105)
>  at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274)
>  at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214)
>  at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203)
>  at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
>  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>  at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>  at java.base/java.lang.Thread.run(Unknown Source)
> {code}
> The error above appears after roughly 50 minutes.
> After the exception above, no more logs are produced and the app hangs.
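
The "too old resource version" exception is how the Kubernetes API server reports that a watch's resourceVersion has expired (HTTP 410 Gone). The usual client-side response is to re-list and open a new watch instead of stopping, which appears to be the idea behind the "Restart the watcher" patch mentioned above. Below is a minimal Python sketch of that loop under stated assumptions: `WatchExpired`, `list_pods`, and `watch_pods` are hypothetical stand-ins, not the fabric8 kubernetes-client API or Spark's `ExecutorPodsWatchSnapshotSource`.

```python
from typing import Callable, Iterator, List, Tuple


class WatchExpired(Exception):
    """Hypothetical stand-in for a 410 Gone / "too old resource version" error."""


def run_watch(
    list_pods: Callable[[], Tuple[List[str], int]],
    watch_pods: Callable[[int], Iterator[str]],
    max_restarts: int = 3,
) -> List[str]:
    """Collect events, re-listing and re-watching whenever the watch expires."""
    seen: List[str] = []
    restarts = 0
    while True:
        # A fresh list returns the current state plus a valid resourceVersion.
        pods, resource_version = list_pods()
        seen.extend(pods)
        try:
            for event in watch_pods(resource_version):
                seen.append(event)
            return seen  # the watch stream ended normally
        except WatchExpired:
            restarts += 1
            if restarts > max_restarts:
                raise
            # Otherwise loop: re-list and open a new watch instead of hanging.


calls = {"n": 0}


def fake_list() -> Tuple[List[str], int]:
    calls["n"] += 1
    return [f"list-{calls['n']}"], calls["n"]


def fake_watch(rv: int) -> Iterator[str]:
    yield f"event-rv{rv}"
    if rv == 1:
        raise WatchExpired("too old resource version")


print(run_watch(fake_list, fake_watch))
# ['list-1', 'event-rv1', 'list-2', 'event-rv2']
```

Bounding the number of restarts keeps a persistently failing watch from looping forever while still surviving the occasional expiry.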




[jira] [Created] (SPARK-36003) Implement unary operator `invert` of numeric ps.Series/Index

2021-07-02 Thread Xinrong Meng (Jira)

Xinrong Meng created SPARK-36003:


 Summary: Implement unary operator `invert` of numeric 
ps.Series/Index
 Key: SPARK-36003
 URL: https://issues.apache.org/jira/browse/SPARK-36003
 Project: Spark
  Issue Type: Story
  Components: PySpark
Affects Versions: 3.2.0
Reporter: Xinrong Meng


 
{code:java}
>>> ~ps.Series([1, 2, 3])
Traceback (most recent call last):
...
pyspark.sql.utils.AnalysisException: cannot resolve '(NOT `0`)' due to data type mismatch: argument 1 requires boolean type, however, '`0`' is of bigint type.;
'Project [unresolvedalias(NOT 0#1L, Some(org.apache.spark.sql.Column$$Lambda$1365/2097273578@53165e1))]
+- Project [__index_level_0__#0L, 0#1L, monotonically_increasing_id() AS __natural_order__#4L]
   +- LogicalRDD [__index_level_0__#0L, 0#1L], false
{code}
 

Currently, the unary operator `invert` (`~`) is not supported for numeric
ps.Series/Index. We ought to implement it following pandas' behavior.
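
For integer data, pandas' `~` is an element-wise bitwise NOT, so `~pd.Series([1, 2, 3])` yields -2, -3, -4, whereas the plan above maps `~` to the boolean `NOT`. The pure-Python sketch below only illustrates the expected values; `invert_numeric` is a hypothetical helper, and on the Spark side a plausible (unconfirmed) building block is the SQL bitwise NOT operator `~` rather than `NOT`.

```python
from typing import List


def invert_numeric(values: List[int]) -> List[int]:
    """Element-wise bitwise NOT, the result pandas gives for `~` on an integer Series."""
    # With two's-complement semantics, ~x == -x - 1 for every integer x.
    return [~v for v in values]


print(invert_numeric([1, 2, 3]))  # [-2, -3, -4]
```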




[jira] [Commented] (SPARK-36002) Consolidate tests for data-type-based operations of decimal Series

2021-07-02 Thread Xinrong Meng (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-36002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373839#comment-17373839
 ] 

Xinrong Meng commented on SPARK-36002:
--

What do you think about that, [~yikunkero]?

> Consolidate tests for data-type-based operations of decimal Series
> --
>
> Key: SPARK-36002
> URL: https://issues.apache.org/jira/browse/SPARK-36002
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Tests for data-type-based operations of decimal Series are in two places:
>  * python/pyspark/pandas/tests/data_type_ops/test_decimal_ops.py
>  * python/pyspark/pandas/tests/data_type_ops/test_num_ops.py
> We'd better either merge test_decimal_ops into test_num_ops or keep all tests
> related to decimal Series in test_decimal_ops.




[jira] [Created] (SPARK-36002) Consolidate tests for data-type-based operations of decimal Series

2021-07-02 Thread Xinrong Meng (Jira)

Xinrong Meng created SPARK-36002:


 Summary: Consolidate tests for data-type-based operations of 
decimal Series
 Key: SPARK-36002
 URL: https://issues.apache.org/jira/browse/SPARK-36002
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.2.0
Reporter: Xinrong Meng


Tests for data-type-based operations of decimal Series are in two places:
 * python/pyspark/pandas/tests/data_type_ops/test_decimal_ops.py
 * python/pyspark/pandas/tests/data_type_ops/test_num_ops.py

We'd better either merge test_decimal_ops into test_num_ops or keep all tests
related to decimal Series in test_decimal_ops.




[jira] [Created] (SPARK-36001) Assume result's index to be disordered in tests with operations on different Series

2021-07-02 Thread Xinrong Meng (Jira)

Xinrong Meng created SPARK-36001:


 Summary:  Assume result's index to be disordered in tests with 
operations on different Series 
 Key: SPARK-36001
 URL: https://issues.apache.org/jira/browse/SPARK-36001
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.2.0
Reporter: Xinrong Meng


Many tests in spark/python/pyspark/pandas/tests/data_type_ops/ perform operations
on different Series, assume that the result's index is sorted, and then compare
the result with pandas' behavior. That assumption is wrong, so we should sort the
index of such results before comparing them with pandas.
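
The comparison pattern can be sketched without Spark. Here a Series is modelled as a list of (index, value) pairs and `compare_ignoring_order` is a hypothetical helper; in the actual tests the equivalent step would presumably be calling `sort_index()` on the pandas-on-Spark result before comparing it with the pandas result.

```python
from typing import List, Tuple

# A "Series" modelled as (index label, value) pairs; labels are assumed sortable.
Series = List[Tuple[int, float]]


def sort_index(series: Series) -> Series:
    """Return the pairs ordered by index label, like pandas' Series.sort_index()."""
    return sorted(series, key=lambda pair: pair[0])


def compare_ignoring_order(actual: Series, expected: Series) -> bool:
    """Compare two results after sorting both by index, so row order does not matter."""
    return sort_index(actual) == sort_index(expected)


# pandas returns rows in index order; a distributed engine may not.
expected = [(0, 2.0), (1, 4.0), (2, 6.0)]
actual = [(2, 6.0), (0, 2.0), (1, 4.0)]

print(actual == expected)                        # False: order differs
print(compare_ignoring_order(actual, expected))  # True
```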




[jira] [Created] (SPARK-36000) Support creating a ps.Series/Index with `Decimal('NaN')` with Arrow disabled

2021-07-02 Thread Xinrong Meng (Jira)

Xinrong Meng created SPARK-36000:


 Summary: Support creating a ps.Series/Index with `Decimal('NaN')` 
with Arrow disabled
 Key: SPARK-36000
 URL: https://issues.apache.org/jira/browse/SPARK-36000
 Project: Spark
  Issue Type: Story
  Components: PySpark
Affects Versions: 3.2.0
Reporter: Xinrong Meng


 
{code:java}
>>> import decimal as d
>>> import pyspark.pandas as ps
>>> import numpy as np
>>> ps.utils.default_session().conf.set('spark.sql.execution.arrow.pyspark.enabled', True)
>>> ps.Series([d.Decimal(1.0), d.Decimal(2.0), d.Decimal(np.nan)])
0       1
1       2
2    None
dtype: object
>>> ps.utils.default_session().conf.set('spark.sql.execution.arrow.pyspark.enabled', False)
>>> ps.Series([d.Decimal(1.0), d.Decimal(2.0), d.Decimal(np.nan)])
21/07/02 15:01:07 ERROR Executor: Exception in task 6.0 in stage 13.0 (TID 51)
net.razorvine.pickle.PickleException: problem construction object: java.lang.reflect.InvocationTargetException
...

{code}
As shown above, we cannot create a Series with `Decimal('NaN')` when Arrow is
disabled. We ought to fix that.
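
One possible direction, an assumption rather than the fix chosen in this issue, is to normalize `Decimal('NaN')` to `None` before the data reaches the non-Arrow conversion path, which would match the Arrow-enabled output above. `normalize_decimal_nan` is a hypothetical helper; `Decimal.is_nan()` is part of Python's standard `decimal` module.

```python
import math
from decimal import Decimal
from typing import List, Optional


def normalize_decimal_nan(values: List[Optional[Decimal]]) -> List[Optional[Decimal]]:
    """Replace NaN decimals (quiet or signaling) with None; keep everything else."""
    return [None if isinstance(v, Decimal) and v.is_nan() else v for v in values]


# Decimal(float('nan')) is how d.Decimal(np.nan) in the report yields Decimal('NaN').
data = [Decimal(1.0), Decimal(2.0), Decimal(math.nan)]
print(data)                         # [Decimal('1'), Decimal('2'), Decimal('NaN')]
print(normalize_decimal_nan(data))  # [Decimal('1'), Decimal('2'), None]
```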

 




[jira] [Resolved] (SPARK-35996) Setting version to 3.3.0-SNAPSHOT

2021-07-02 Thread Dongjoon Hyun (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-35996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-35996.
---
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 33196
[https://github.com/apache/spark/pull/33196]

> Setting version to 3.3.0-SNAPSHOT
> -
>
> Key: SPARK-35996
> URL: https://issues.apache.org/jira/browse/SPARK-35996
> Project: Spark
>  Issue Type: Task
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.3.0
>
>





[jira] [Assigned] (SPARK-35996) Setting version to 3.3.0-SNAPSHOT

2021-07-02 Thread Dongjoon Hyun (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-35996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-35996:
-

Assignee: Dongjoon Hyun

> Setting version to 3.3.0-SNAPSHOT
> -
>
> Key: SPARK-35996
> URL: https://issues.apache.org/jira/browse/SPARK-35996
> Project: Spark
>  Issue Type: Task
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>





[jira] [Commented] (SPARK-35995) Enable GitHub Action build_and_test on branch-3.2

2021-07-02 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-35995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373753#comment-17373753
 ] 

Apache Spark commented on SPARK-35995:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/33197

> Enable GitHub Action build_and_test on branch-3.2
> -
>
> Key: SPARK-35995
> URL: https://issues.apache.org/jira/browse/SPARK-35995
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.2.0
>
>





[jira] [Commented] (SPARK-35995) Enable GitHub Action build_and_test on branch-3.2

2021-07-02 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-35995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373754#comment-17373754
 ] 

Apache Spark commented on SPARK-35995:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/33197

> Enable GitHub Action build_and_test on branch-3.2
> -
>
> Key: SPARK-35995
> URL: https://issues.apache.org/jira/browse/SPARK-35995
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.2.0
>
>





[jira] [Created] (SPARK-35999) Make from_csv/to_csv to handle day-time intervals properly

2021-07-02 Thread Kousuke Saruta (Jira)

Kousuke Saruta created SPARK-35999:
--

 Summary: Make from_csv/to_csv to handle day-time intervals properly
 Key: SPARK-35999
 URL: https://issues.apache.org/jira/browse/SPARK-35999
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0, 3.3.0
Reporter: Kousuke Saruta
Assignee: Kousuke Saruta


from_csv throws an exception if day-time interval types are given.

{code}
spark-sql> select from_csv("interval '1 2:3:4' day to second", "a interval day to second");
21/07/03 04:39:13 ERROR SparkSQLDriver: Failed in [select from_csv("interval '1 2:3:4' day to second", "a interval day to second")]
java.lang.Exception: Unsupported type: interval day to second
 at org.apache.spark.sql.errors.QueryExecutionErrors$.unsupportedTypeError(QueryExecutionErrors.scala:775)
 at org.apache.spark.sql.catalyst.csv.UnivocityParser.makeConverter(UnivocityParser.scala:224)
 at org.apache.spark.sql.catalyst.csv.UnivocityParser.$anonfun$valueConverters$1(UnivocityParser.scala:134)
{code}

Also, to_csv doesn't handle day-time interval types properly, although no
exception is thrown: the result of to_csv for day-time interval types is not
in the ANSI-compliant interval form.

{code}
spark-sql> select to_csv(named_struct("a", interval '1 2:3:4' day to second));
9378400
{code}

The result above should be INTERVAL '1 02:03:04' DAY TO SECOND.
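
Assuming to_csv emits the interval's internal value, which for day-time intervals is a count of microseconds, the input above corresponds to 1 * 86400 + 2 * 3600 + 3 * 60 + 4 = 93784 seconds, i.e. 93,784,000,000 microseconds. The sketch below converts such a count into the expected ANSI string; `format_day_time_interval` is a hypothetical helper, and its handling of negative values and fractional seconds is an assumption, not Spark's actual formatter.

```python
def format_day_time_interval(micros: int) -> str:
    """Render a DAY TO SECOND interval, given in microseconds, in ANSI literal form."""
    sign = "-" if micros < 0 else ""
    micros = abs(micros)
    seconds, frac = divmod(micros, 1_000_000)
    minutes, sec = divmod(seconds, 60)
    hours, minute = divmod(minutes, 60)
    days, hour = divmod(hours, 24)
    # Print fractional seconds only when present, without trailing zeros (assumption).
    frac_str = f".{frac:06d}".rstrip("0") if frac else ""
    return f"INTERVAL '{sign}{days} {hour:02d}:{minute:02d}:{sec:02d}{frac_str}' DAY TO SECOND"


micros = (1 * 86400 + 2 * 3600 + 3 * 60 + 4) * 1_000_000
print(micros)                            # 93784000000
print(format_day_time_interval(micros))  # INTERVAL '1 02:03:04' DAY TO SECOND
```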




[jira] [Created] (SPARK-35998) Make from_csv/to_csv to handle year-month intervals properly

2021-07-02 Thread Kousuke Saruta (Jira)

Kousuke Saruta created SPARK-35998:
--

 Summary: Make from_csv/to_csv to handle year-month intervals 
properly
 Key: SPARK-35998
 URL: https://issues.apache.org/jira/browse/SPARK-35998
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0, 3.3.0
Reporter: Kousuke Saruta
Assignee: Kousuke Saruta


from_csv throws an exception if year-month interval types are given.
{code}
spark-sql> select from_csv("interval '1-2' year to month", "a interval year to month");
21/07/03 04:32:24 ERROR SparkSQLDriver: Failed in [select from_csv("interval '1-2' year to month", "a interval year to month")]
java.lang.Exception: Unsupported type: interval year to month
 at org.apache.spark.sql.errors.QueryExecutionErrors$.unsupportedTypeError(QueryExecutionErrors.scala:775)
 at org.apache.spark.sql.catalyst.csv.UnivocityParser.makeConverter(UnivocityParser.scala:224)
 at org.apache.spark.sql.catalyst.csv.UnivocityParser.$anonfun$valueConverters$1(UnivocityParser.scala:134)
{code}

Also, to_csv doesn't handle year-month interval types properly, although no
exception is thrown: the result of to_csv for year-month interval types is not
in the ANSI-compliant interval form.

{code}
spark-sql> select to_csv(named_struct("a", interval '1-2' year to month));
14
{code}

The result above should be INTERVAL '1-2' YEAR TO MONTH.
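
The raw output 14 above is consistent with to_csv writing the interval's internal value, the total number of months (1 * 12 + 2 = 14). A minimal sketch of turning that count back into the expected ANSI form follows; `format_year_month_interval` is a hypothetical helper, and the format used for negative values is an assumption.

```python
def format_year_month_interval(months: int) -> str:
    """Render a YEAR TO MONTH interval, given as a month count, in ANSI literal form."""
    sign = "-" if months < 0 else ""
    years, month = divmod(abs(months), 12)
    return f"INTERVAL '{sign}{years}-{month}' YEAR TO MONTH"


print(format_year_month_interval(14))   # INTERVAL '1-2' YEAR TO MONTH
print(format_year_month_interval(-14))  # INTERVAL '-1-2' YEAR TO MONTH
```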





[jira] [Created] (SPARK-35997) Implement comparison operators for CategoricalDtype in pandas API on Spark

2021-07-02 Thread Xinrong Meng (Jira)

Xinrong Meng created SPARK-35997:


 Summary: Implement comparison operators for CategoricalDtype in 
pandas API on Spark
 Key: SPARK-35997
 URL: https://issues.apache.org/jira/browse/SPARK-35997
 Project: Spark
  Issue Type: Story
  Components: PySpark
Affects Versions: 3.2.0
Reporter: Xinrong Meng


In pandas API on Spark, the comparison operators <, <=, > and >= have not been
implemented for CategoricalDtype.

We ought to match pandas' behavior.
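
In pandas, these operators are defined only for ordered categoricals: values are compared by their position in the category list, not by their raw values, and comparing an unordered categorical raises a TypeError. The sketch below mimics that rule in plain Python for comparisons with a scalar; `compare_categorical` is a hypothetical helper, and treating missing values (None) as never satisfying the comparison is a simplifying assumption.

```python
import operator
from typing import Callable, Dict, List, Optional, Sequence

# Ordering operators covered by the issue.
_OPS: Dict[str, Callable[[int, int], bool]] = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def compare_categorical(
    values: Sequence[Optional[str]],
    categories: Sequence[str],
    ordered: bool,
    op: str,
    other: str,
) -> List[bool]:
    """Compare categorical values with a scalar by category position."""
    if not ordered:
        raise TypeError("Unordered Categoricals can only compare equality or not")
    if other not in categories:
        raise TypeError(f"{other!r} is not a category")
    codes = {cat: i for i, cat in enumerate(categories)}
    compare = _OPS[op]
    # Missing values (None) never satisfy an ordering comparison.
    return [v is not None and compare(codes[v], codes[other]) for v in values]


sizes = ["low", "high", None, "mid"]
print(compare_categorical(sizes, ["low", "mid", "high"], True, "<", "high"))
# [True, False, False, True]; a plain string comparison would make "low" < "high" False
```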




[jira] [Commented] (SPARK-35996) Setting version to 3.3.0-SNAPSHOT

2021-07-02 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-35996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373714#comment-17373714
 ] 

Apache Spark commented on SPARK-35996:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/33196

> Setting version to 3.3.0-SNAPSHOT
> -
>
> Key: SPARK-35996
> URL: https://issues.apache.org/jira/browse/SPARK-35996
> Project: Spark
>  Issue Type: Task
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Major
>





[jira] [Commented] (SPARK-35996) Setting version to 3.3.0-SNAPSHOT

2021-07-02 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-35996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373712#comment-17373712
 ] 

Apache Spark commented on SPARK-35996:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/33196

> Setting version to 3.3.0-SNAPSHOT
> -
>
> Key: SPARK-35996
> URL: https://issues.apache.org/jira/browse/SPARK-35996
> Project: Spark
>  Issue Type: Task
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Major
>





[jira] [Assigned] (SPARK-35996) Setting version to 3.3.0-SNAPSHOT

2021-07-02 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-35996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35996:


Assignee: Apache Spark

> Setting version to 3.3.0-SNAPSHOT
> -
>
> Key: SPARK-35996
> URL: https://issues.apache.org/jira/browse/SPARK-35996
> Project: Spark
>  Issue Type: Task
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>





[jira] [Assigned] (SPARK-35996) Setting version to 3.3.0-SNAPSHOT

2021-07-02 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-35996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35996:


Assignee: (was: Apache Spark)

> Setting version to 3.3.0-SNAPSHOT
> -
>
> Key: SPARK-35996
> URL: https://issues.apache.org/jira/browse/SPARK-35996
> Project: Spark
>  Issue Type: Task
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Major
>





[jira] [Created] (SPARK-35996) Setting version to 3.3.0-SNAPSHOT

2021-07-02 Thread Dongjoon Hyun (Jira)

Dongjoon Hyun created SPARK-35996:
-

 Summary: Setting version to 3.3.0-SNAPSHOT
 Key: SPARK-35996
 URL: https://issues.apache.org/jira/browse/SPARK-35996
 Project: Spark
  Issue Type: Task
  Components: Build
Affects Versions: 3.3.0
Reporter: Dongjoon Hyun







[jira] [Resolved] (SPARK-35990) Remove avro-sbt plugin dependency

2021-07-02 Thread Dongjoon Hyun (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-35990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-35990.
---
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 33190
[https://github.com/apache/spark/pull/33190]

> Remove avro-sbt plugin dependency
> -
>
> Key: SPARK-35990
> URL: https://issues.apache.org/jira/browse/SPARK-35990
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
> Fix For: 3.2.0
>
>
> The avro-sbt plugin no longer seems to be used in the build.
> Let's consider removing it.




[jira] [Resolved] (SPARK-35995) Enable GitHub Action build_and_test on branch-3.2

2021-07-02 Thread Dongjoon Hyun (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-35995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-35995.
---
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 33194
[https://github.com/apache/spark/pull/33194]

> Enable GitHub Action build_and_test on branch-3.2
> -
>
> Key: SPARK-35995
> URL: https://issues.apache.org/jira/browse/SPARK-35995
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.2.0
>
>





[jira] [Assigned] (SPARK-35995) Enable GitHub Action build_and_test on branch-3.2

2021-07-02 Thread Dongjoon Hyun (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-35995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-35995:
-

Assignee: Dongjoon Hyun

> Enable GitHub Action build_and_test on branch-3.2
> -
>
> Key: SPARK-35995
> URL: https://issues.apache.org/jira/browse/SPARK-35995
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>





[jira] [Updated] (SPARK-35994) Publish snapshot from branch-3.2

2021-07-02 Thread Dongjoon Hyun (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-35994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-35994:
--
Fix Version/s: (was: 3.3.0)
   3.2.0

> Publish snapshot from branch-3.2
> 
>
> Key: SPARK-35994
> URL: https://issues.apache.org/jira/browse/SPARK-35994
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.2.0
>
>





[jira] [Resolved] (SPARK-35994) Publish snapshot from branch-3.2

2021-07-02 Thread Dongjoon Hyun (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-35994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-35994.
---
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 33192
[https://github.com/apache/spark/pull/33192]

> Publish snapshot from branch-3.2
> 
>
> Key: SPARK-35994
> URL: https://issues.apache.org/jira/browse/SPARK-35994
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.3.0
>
>





[jira] [Assigned] (SPARK-35994) Publish snapshot from branch-3.2

2021-07-02 Thread Dongjoon Hyun (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-35994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-35994:
-

Assignee: Dongjoon Hyun

> Publish snapshot from branch-3.2
> 
>
> Key: SPARK-35994
> URL: https://issues.apache.org/jira/browse/SPARK-35994
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>





[jira] [Commented] (SPARK-35995) Enable GitHub Action build_and_test on branch-3.2

2021-07-02 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-35995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373694#comment-17373694
 ] 

Apache Spark commented on SPARK-35995:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/33194

> Enable GitHub Action build_and_test on branch-3.2
> -
>
> Key: SPARK-35995
> URL: https://issues.apache.org/jira/browse/SPARK-35995
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Major
>





[jira] [Commented] (SPARK-35785) Cleanup support for RocksDB instance

2021-07-02 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-35785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373695#comment-17373695
 ] 

Apache Spark commented on SPARK-35785:
--

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/33195

> Cleanup support for RocksDB instance
> 
>
> Key: SPARK-35785
> URL: https://issues.apache.org/jira/browse/SPARK-35785
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 3.2.0
>Reporter: Yuanjian Li
>Assignee: Yuanjian Li
>Priority: Major
> Fix For: 3.2.0, 3.3.0
>
>
> Add functionality for cleaning up files of old versions for the RocksDB
> instance and RocksDBFileManager.
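
A minimal sketch of the kind of retention rule such a cleanup typically applies: keep the newest N versions and delete only files that no retained version still references. It assumes, as is common for RocksDB checkpoints, that versions can share immutable SST files. `files_to_delete`, the version-to-files mapping, and `min_versions_to_retain` are hypothetical; the issue does not describe RocksDBFileManager's actual policy.

```python
from typing import Dict, Set


def files_to_delete(version_files: Dict[int, Set[str]], min_versions_to_retain: int) -> Set[str]:
    """Return files referenced only by versions older than the newest N versions."""
    if min_versions_to_retain < 1:
        raise ValueError("must retain at least one version")
    versions = sorted(version_files)
    retained = versions[-min_versions_to_retain:]
    expired = versions[:-min_versions_to_retain]
    # A file shared with a retained version must survive even if an expired version used it.
    still_needed: Set[str] = set().union(*(version_files[v] for v in retained))
    candidates: Set[str] = set().union(*(version_files[v] for v in expired))
    return candidates - still_needed


history = {
    1: {"001.sst"},
    2: {"001.sst", "002.sst"},
    3: {"002.sst", "003.sst"},
}
print(sorted(files_to_delete(history, min_versions_to_retain=1)))  # ['001.sst']
print(sorted(files_to_delete(history, min_versions_to_retain=2)))  # []
```

With two versions retained nothing can be deleted, because version 2 still references `001.sst`.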




[jira] [Commented] (SPARK-35981) Use check_exact=False in StatsTest.test_cov_corr_meta to loosen the check precision

2021-07-02 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-35981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373693#comment-17373693
 ] 

Apache Spark commented on SPARK-35981:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/33193

> Use check_exact=False in StatsTest.test_cov_corr_meta to loosen the check 
> precision
> ---
>
> Key: SPARK-35981
> URL: https://issues.apache.org/jira/browse/SPARK-35981
> Project: Spark
>  Issue Type: Test
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Assignee: Takuya Ueshin
>Priority: Major
> Fix For: 3.2.0
>
>
> In some environments, the precision of the {{DataFrame.corr}} result may differ.
> We should use {{check_exact=False}} to loosen the precision check.
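
With {{check_exact=False}}, pandas' testing helpers compare floating-point values within a tolerance instead of bit-for-bit (recent pandas versions default to a relative tolerance of 1e-5). The pure-Python sketch below shows the same idea with `math.isclose`; `approx_equal` is a hypothetical helper whose tolerance rule is similar in spirit to, but not identical with, pandas' implementation, and the sample values are made up.

```python
import math
from typing import Sequence


def approx_equal(
    left: Sequence[float], right: Sequence[float], rtol: float = 1e-5, atol: float = 1e-8
) -> bool:
    """Element-wise tolerance check, similar in spirit to check_exact=False."""
    return len(left) == len(right) and all(
        math.isclose(a, b, rel_tol=rtol, abs_tol=atol) for a, b in zip(left, right)
    )


spark_corr = [1.0, 0.5000000000001]  # hypothetical result with a tiny rounding difference
pandas_corr = [1.0, 0.5]
print(spark_corr == pandas_corr)              # False
print(approx_equal(spark_corr, pandas_corr))  # True
```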




[jira] [Assigned] (SPARK-35995) Enable GitHub Action build_and_test on branch-3.2

2021-07-02 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-35995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35995:


Assignee: (was: Apache Spark)

> Enable GitHub Action build_and_test on branch-3.2
> -
>
> Key: SPARK-35995
> URL: https://issues.apache.org/jira/browse/SPARK-35995
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Major
>





[jira] [Assigned] (SPARK-35995) Enable GitHub Action build_and_test on branch-3.2

2021-07-02 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-35995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35995:


Assignee: Apache Spark

> Enable GitHub Action build_and_test on branch-3.2
> -
>
> Key: SPARK-35995
> URL: https://issues.apache.org/jira/browse/SPARK-35995
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>





[jira] [Commented] (SPARK-35981) Use check_exact=False in StatsTest.test_cov_corr_meta to loosen the check precision

2021-07-02 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-35981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373691#comment-17373691
 ] 

Apache Spark commented on SPARK-35981:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/33193

> Use check_exact=False in StatsTest.test_cov_corr_meta to loosen the check 
> precision
> ---
>
> Key: SPARK-35981
> URL: https://issues.apache.org/jira/browse/SPARK-35981
> Project: Spark
>  Issue Type: Test
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Assignee: Takuya Ueshin
>Priority: Major
> Fix For: 3.2.0
>
>
> In some environments, the precision of the {{DataFrame.corr}} result may differ.
> We should use {{check_exact=False}} to loosen the precision check.




[jira] [Commented] (SPARK-35995) Enable GitHub Action build_and_test on branch-3.2

2021-07-02 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-35995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373692#comment-17373692
 ] 

Apache Spark commented on SPARK-35995:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/33194

> Enable GitHub Action build_and_test on branch-3.2
> -
>
> Key: SPARK-35995
> URL: https://issues.apache.org/jira/browse/SPARK-35995
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Major
>





[jira] [Created] (SPARK-35995) Enable GitHub Action build_and_test on branch-3.2

2021-07-02 Thread Dongjoon Hyun (Jira)

Dongjoon Hyun created SPARK-35995:
-

 Summary: Enable GitHub Action build_and_test on branch-3.2
 Key: SPARK-35995
 URL: https://issues.apache.org/jira/browse/SPARK-35995
 Project: Spark
  Issue Type: Bug
  Components: Project Infra
Affects Versions: 3.2.0
Reporter: Dongjoon Hyun







[jira] [Assigned] (SPARK-35994) Publish snapshot from branch-3.2

2021-07-02 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-35994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35994:


Assignee: Apache Spark

> Publish snapshot from branch-3.2
> 
>
> Key: SPARK-35994
> URL: https://issues.apache.org/jira/browse/SPARK-35994
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>





[jira] [Assigned] (SPARK-35994) Publish snapshot from branch-3.2

2021-07-02 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-35994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35994:


Assignee: (was: Apache Spark)

> Publish snapshot from branch-3.2
> 
>
> Key: SPARK-35994
> URL: https://issues.apache.org/jira/browse/SPARK-35994
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Major
>





[jira] [Commented] (SPARK-35994) Publish snapshot from branch-3.2

2021-07-02 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-35994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373665#comment-17373665
 ] 

Apache Spark commented on SPARK-35994:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/33192

> Publish snapshot from branch-3.2
> 
>
> Key: SPARK-35994
> URL: https://issues.apache.org/jira/browse/SPARK-35994
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Major
>





[jira] [Created] (SPARK-35994) Publish snapshot from branch-3.2

2021-07-02 Thread Dongjoon Hyun (Jira)

Dongjoon Hyun created SPARK-35994:
-

 Summary: Publish snapshot from branch-3.2
 Key: SPARK-35994
 URL: https://issues.apache.org/jira/browse/SPARK-35994
 Project: Spark
  Issue Type: Bug
  Components: Project Infra
Affects Versions: 3.2.0
Reporter: Dongjoon Hyun







[jira] [Resolved] (SPARK-35992) Upgrade ORC to 1.6.9

2021-07-02 Thread Dongjoon Hyun (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-35992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-35992.
---
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 33189
[https://github.com/apache/spark/pull/33189]

> Upgrade ORC to 1.6.9
> 
>
> Key: SPARK-35992
> URL: https://issues.apache.org/jira/browse/SPARK-35992
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Critical
> Fix For: 3.2.0
>
>
> This issue aims to upgrade Apache ORC to 1.6.9 to bring in the ORC encryption 
> masking fix.




[jira] [Assigned] (SPARK-35992) Upgrade ORC to 1.6.9

2021-07-02 Thread Dongjoon Hyun (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-35992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-35992:
-

Assignee: Dongjoon Hyun

> Upgrade ORC to 1.6.9
> 
>
> Key: SPARK-35992
> URL: https://issues.apache.org/jira/browse/SPARK-35992
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Critical
>
> This issue aims to upgrade Apache ORC to 1.6.9 to bring in the ORC encryption 
> masking fix.




[jira] [Assigned] (SPARK-35985) File source V2 ignores partition filters when empty readDataSchema

2021-07-02 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-35985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35985:


Assignee: (was: Apache Spark)

> File source V2 ignores partition filters when empty readDataSchema
> --
>
> Key: SPARK-35985
> URL: https://issues.apache.org/jira/browse/SPARK-35985
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Steven Aerts
>Priority: Major
>
> A V2 datasource does not use partition filters when it only needs to know 
> how many entries there are and is not interested in their content.
> So when the {{readDataSchema}} of the {{FileScan}} is empty, partition 
> filters are not pushed down and all data is scanned.
> Some examples where this happens:
> {code:java}
> scala> spark.sql("SELECT count(*) FROM parq WHERE day=20210702").explain
> == Physical Plan ==
> *(2) HashAggregate(keys=[], functions=[count(1)])
> +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [id=#136]
>    +- *(1) HashAggregate(keys=[], functions=[partial_count(1)])
>       +- *(1) Project
>          +- *(1) Filter (isnotnull(day#68) AND (day#68 = 20210702))
>             +- *(1) ColumnarToRow
>                +- BatchScan[day#68] ParquetScan DataFilters: [], Format: parquet, Location: InMemoryFileIndex[file:/..., PartitionFilters: [], PushedFilers: [IsNotNull(day), EqualTo(day,20210702)], ReadSchema: struct<>, PushedFilters: [IsNotNull(day), EqualTo(day,20210702)]
> scala> spark.sql("SELECT input_file_name() FROM parq WHERE day=20210702").explain
> == Physical Plan ==
> *(1) Project [input_file_name() AS input_file_name()#131]
> +- *(1) Filter (isnotnull(day#68) AND (day#68 = 20210702))
>    +- *(1) ColumnarToRow
>       +- BatchScan[day#68] ParquetScan DataFilters: [], Format: parquet, Location: InMemoryFileIndex[file:/..., PartitionFilters: [], PushedFilers: [IsNotNull(day), EqualTo(day,20210702)], ReadSchema: struct<>, PushedFilters: [IsNotNull(day), EqualTo(day,20210702)]
> {code}
>  
> Once the {{readDataSchema}} is not empty, it works correctly:
> {code:java}
> scala> spark.sql("SELECT header.tenant FROM parq WHERE day=20210702").explain
> == Physical Plan ==
> *(1) Project [header#51.tenant AS tenant#199]
> +- BatchScan[header#51, day#68] ParquetScan DataFilters: [], Format: parquet, Location: InMemoryFileIndex[file:/..., PartitionFilters: [isnotnull(day#68), (day#68 = 20210702)], PushedFilers: [IsNotNull(day), EqualTo(day,20210702)], ReadSchema: struct>, PushedFilters: [IsNotNull(day), EqualTo(day,20210702)]
> {code}
>  
> In V1 this optimization is available:
> {code:java}
> scala> spark.sql("SELECT count(*) FROM parq WHERE day=20210702").explain
> == Physical Plan ==
> *(2) HashAggregate(keys=[], functions=[count(1)])
> +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [id=#27]
>    +- *(1) HashAggregate(keys=[], functions=[partial_count(1)])
>       +- *(1) Project
>          +- *(1) ColumnarToRow
>             +- FileScan parquet [year#15,month#16,day#17,hour#18] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[file:/..., PartitionFilters: [isnotnull(day#17), (day#17 = 20210702)], PushedFilters: [], ReadSchema: struct<>
> {code}
> The examples use {{ParquetScan}}, but the problem happens for all file-based 
> V2 datasources.
> The fix for this issue seems very straightforward. In 
> {{PruneFileSourcePartitions}}, queries with an empty {{readDataSchema}} are 
> explicitly excluded from partition filter pushdown:
> {code:java}
> if filters.nonEmpty && scan.readDataSchema.nonEmpty =>{code}
> Removing that condition seems to fix the issue; however, this might be too 
> naive.
> I am making a PR with tests where this change can be discussed.
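To make the effect of the guard quoted above concrete, here is a toy Python model; it is not Spark code. `ToyFileScan`, `prune` and the `require_non_empty_schema` switch are hypothetical names made up for this sketch, and only the shape of the condition `filters.nonEmpty && scan.readDataSchema.nonEmpty` comes from the issue. With the guard in place, a scan that reads no data columns (as for `SELECT count(*)`) keeps every partition; without the schema check, the partition filter prunes it down to the matching directory.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, Dict, List

Partition = Dict[str, int]
PartitionFilter = Callable[[Partition], bool]


@dataclass
class ToyFileScan:
    """Hypothetical stand-in for a V2 file scan; not a Spark class."""
    read_data_schema: List[str]  # non-partition columns the query actually reads
    partitions: List[Partition]  # partition values of each directory
    partition_filters: List[PartitionFilter] = field(default_factory=list)

    def partitions_to_scan(self) -> List[Partition]:
        # With no partition filters attached, every directory is listed and read.
        return [p for p in self.partitions
                if all(f(p) for f in self.partition_filters)]


def prune(scan: ToyFileScan, filters: List[PartitionFilter],
          require_non_empty_schema: bool = True) -> ToyFileScan:
    """Attach partition filters to the scan, modelling the guard from the issue."""
    if not filters:
        return scan
    if require_non_empty_schema and not scan.read_data_schema:
        # Behaviour described in the issue: a scan that reads no data columns
        # (e.g. for count(*)) is skipped, so no partition is pruned.
        return scan
    # Return a copy of the scan that carries the partition filters.
    return replace(scan, partition_filters=list(filters))


if __name__ == "__main__":
    days = [{"day": d} for d in (20210701, 20210702, 20210703)]
    day_filter = [lambda p: p["day"] == 20210702]

    count_scan = ToyFileScan(read_data_schema=[], partitions=days)
    print(len(prune(count_scan, day_filter).partitions_to_scan()))  # 3
    print(len(prune(count_scan, day_filter,
                    require_non_empty_schema=False).partitions_to_scan()))  # 1
```

The model says nothing about why the schema check exists; as the reporter notes, simply removing it may be too naive, so read the second call as an illustration of the intended effect rather than as a proposed fix.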




[jira] [Assigned] (SPARK-35985) File source V2 ignores partition filters when empty readDataSchema

2021-07-02 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-35985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35985:


Assignee: Apache Spark

> File source V2 ignores partition filters when empty readDataSchema
> --
>
> Key: SPARK-35985
> URL: https://issues.apache.org/jira/browse/SPARK-35985
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Steven Aerts
>Assignee: Apache Spark
>Priority: Major
>
> A V2 datasource does not use partition filters when it only needs to know 
> how many entries there are and is not interested in their content.
> So when the {{readDataSchema}} of the {{FileScan}} is empty, partition 
> filters are not pushed down and all data is scanned.
> Some examples where this happens:
> {code:java}
> scala> spark.sql("SELECT count(*) FROM parq WHERE day=20210702").explain
> == Physical Plan ==
> *(2) HashAggregate(keys=[], functions=[count(1)])
> +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [id=#136]
>    +- *(1) HashAggregate(keys=[], functions=[partial_count(1)])
>       +- *(1) Project
>          +- *(1) Filter (isnotnull(day#68) AND (day#68 = 20210702))
>             +- *(1) ColumnarToRow
>                +- BatchScan[day#68] ParquetScan DataFilters: [], Format: parquet, Location: InMemoryFileIndex[file:/..., PartitionFilters: [], PushedFilers: [IsNotNull(day), EqualTo(day,20210702)], ReadSchema: struct<>, PushedFilters: [IsNotNull(day), EqualTo(day,20210702)]
> scala> spark.sql("SELECT input_file_name() FROM parq WHERE day=20210702").explain
> == Physical Plan ==
> *(1) Project [input_file_name() AS input_file_name()#131]
> +- *(1) Filter (isnotnull(day#68) AND (day#68 = 20210702))
>    +- *(1) ColumnarToRow
>       +- BatchScan[day#68] ParquetScan DataFilters: [], Format: parquet, Location: InMemoryFileIndex[file:/..., PartitionFilters: [], PushedFilers: [IsNotNull(day), EqualTo(day,20210702)], ReadSchema: struct<>, PushedFilters: [IsNotNull(day), EqualTo(day,20210702)]
> {code}
>  
> Once the {{readDataSchema}} is not empty, it works correctly:
> {code:java}
> scala> spark.sql("SELECT header.tenant FROM parq WHERE day=20210702").explain
> == Physical Plan ==
> *(1) Project [header#51.tenant AS tenant#199]
> +- BatchScan[header#51, day#68] ParquetScan DataFilters: [], Format: parquet, Location: InMemoryFileIndex[file:/..., PartitionFilters: [isnotnull(day#68), (day#68 = 20210702)], PushedFilers: [IsNotNull(day), EqualTo(day,20210702)], ReadSchema: struct>, PushedFilters: [IsNotNull(day), EqualTo(day,20210702)]
> {code}
>  
> In V1 this optimization is available:
> {code:java}
> scala> spark.sql("SELECT count(*) FROM parq WHERE day=20210702").explain
> == Physical Plan ==
> *(2) HashAggregate(keys=[], functions=[count(1)])
> +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [id=#27]
>    +- *(1) HashAggregate(keys=[], functions=[partial_count(1)])
>       +- *(1) Project
>          +- *(1) ColumnarToRow
>             +- FileScan parquet [year#15,month#16,day#17,hour#18] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[file:/..., PartitionFilters: [isnotnull(day#17), (day#17 = 20210702)], PushedFilters: [], ReadSchema: struct<>
> {code}
> The examples use {{ParquetScan}}, but the problem happens for all file-based 
> V2 datasources.
> The fix for this issue seems very straightforward. In 
> {{PruneFileSourcePartitions}}, queries with an empty {{readDataSchema}} are 
> explicitly excluded from partition filter pushdown:
> {code:java}
> if filters.nonEmpty && scan.readDataSchema.nonEmpty =>{code}
> Removing that condition seems to fix the issue; however, this might be too 
> naive.
> I am making a PR with tests where this change can be discussed.
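The symptom is visible in the plan text itself: the V2 `count(*)` scan prints `PartitionFilters: []`, while the V1 scan lists the `day` predicates. A small, hypothetical helper (plain Python over the string printed by `explain`, no Spark API involved) can flag scans with an empty list. It is only a heuristic: an empty list is suspicious only when the query actually filters on a partition column, and the regular expression assumes the filter list contains no `]` of its own.

```python
import re
from typing import List

# Matches the "PartitionFilters: [...]" entry printed for each file scan node.
_PARTITION_FILTERS = re.compile(r"PartitionFilters: \[([^\]]*)\]")


def partition_filter_lists(plan_text: str) -> List[str]:
    """Return the raw PartitionFilters contents of every scan in the plan text."""
    return [body.strip() for body in _PARTITION_FILTERS.findall(plan_text)]


def has_unpruned_scan(plan_text: str) -> bool:
    """True if some scan in the plan text has an empty PartitionFilters list."""
    return any(body == "" for body in partition_filter_lists(plan_text))


if __name__ == "__main__":
    # Plan excerpts copied from the issue description above.
    v2_count = ("BatchScan[day#68] ParquetScan DataFilters: [], Format: parquet, "
                "Location: InMemoryFileIndex[file:/..., PartitionFilters: [], "
                "PushedFilers: [IsNotNull(day), EqualTo(day,20210702)], "
                "ReadSchema: struct<>")
    v1_count = ("FileScan parquet [year#15,month#16,day#17,hour#18] Batched: true, "
                "DataFilters: [], Format: Parquet, Location: "
                "InMemoryFileIndex[file:/..., PartitionFilters: "
                "[isnotnull(day#17), (day#17 = 20210702)], PushedFilters: [], "
                "ReadSchema: struct<>")
    print(has_unpruned_scan(v2_count))  # True
    print(has_unpruned_scan(v1_count))  # False
```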




[jira] [Commented] (SPARK-35985) File source V2 ignores partition filters when empty readDataSchema

2021-07-02 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-35985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373557#comment-17373557
 ] 

Apache Spark commented on SPARK-35985:
--

User 'steven-aerts' has created a pull request for this issue:
https://github.com/apache/spark/pull/33191

> File source V2 ignores partition filters when empty readDataSchema
> --
>
> Key: SPARK-35985
> URL: https://issues.apache.org/jira/browse/SPARK-35985
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Steven Aerts
>Priority: Major
>
> A V2 datasource does not use partition filters when it only needs to know 
> how many entries there are and is not interested in their content.
> So when the {{readDataSchema}} of the {{FileScan}} is empty, partition 
> filters are not pushed down and all data is scanned.
> Some examples where this happens:
> {code:java}
> scala> spark.sql("SELECT count(*) FROM parq WHERE day=20210702").explain
> == Physical Plan ==
> *(2) HashAggregate(keys=[], functions=[count(1)])
> +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [id=#136]
>    +- *(1) HashAggregate(keys=[], functions=[partial_count(1)])
>       +- *(1) Project
>          +- *(1) Filter (isnotnull(day#68) AND (day#68 = 20210702))
>             +- *(1) ColumnarToRow
>                +- BatchScan[day#68] ParquetScan DataFilters: [], Format: parquet, Location: InMemoryFileIndex[file:/..., PartitionFilters: [], PushedFilers: [IsNotNull(day), EqualTo(day,20210702)], ReadSchema: struct<>, PushedFilters: [IsNotNull(day), EqualTo(day,20210702)]
> scala> spark.sql("SELECT input_file_name() FROM parq WHERE day=20210702").explain
> == Physical Plan ==
> *(1) Project [input_file_name() AS input_file_name()#131]
> +- *(1) Filter (isnotnull(day#68) AND (day#68 = 20210702))
>    +- *(1) ColumnarToRow
>       +- BatchScan[day#68] ParquetScan DataFilters: [], Format: parquet, Location: InMemoryFileIndex[file:/..., PartitionFilters: [], PushedFilers: [IsNotNull(day), EqualTo(day,20210702)], ReadSchema: struct<>, PushedFilters: [IsNotNull(day), EqualTo(day,20210702)]
> {code}
>  
> Once the {{readDataSchema}} is not empty, it works correctly:
> {code:java}
> scala> spark.sql("SELECT header.tenant FROM parq WHERE day=20210702").explain
> == Physical Plan ==
> *(1) Project [header#51.tenant AS tenant#199]
> +- BatchScan[header#51, day#68] ParquetScan DataFilters: [], Format: parquet, Location: InMemoryFileIndex[file:/..., PartitionFilters: [isnotnull(day#68), (day#68 = 20210702)], PushedFilers: [IsNotNull(day), EqualTo(day,20210702)], ReadSchema: struct>, PushedFilters: [IsNotNull(day), EqualTo(day,20210702)]
> {code}
>  
> In V1 this optimization is available:
> {code:java}
> scala> spark.sql("SELECT count(*) FROM parq WHERE day=20210702").explain
> == Physical Plan ==
> *(2) HashAggregate(keys=[], functions=[count(1)])
> +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [id=#27]
>    +- *(1) HashAggregate(keys=[], functions=[partial_count(1)])
>       +- *(1) Project
>          +- *(1) ColumnarToRow
>             +- FileScan parquet [year#15,month#16,day#17,hour#18] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[file:/..., PartitionFilters: [isnotnull(day#17), (day#17 = 20210702)], PushedFilters: [], ReadSchema: struct<>
> {code}
> The examples use {{ParquetScan}}, but the problem happens for all file-based 
> V2 datasources.
> The fix for this issue seems very straightforward. In 
> {{PruneFileSourcePartitions}}, queries with an empty {{readDataSchema}} are 
> explicitly excluded from partition filter pushdown:
> {code:java}
> if filters.nonEmpty && scan.readDataSchema.nonEmpty =>{code}
> Removing that condition seems to fix the issue; however, this might be too 
> naive.
> I am making a PR with tests where this change can be discussed.




[jira] [Commented] (SPARK-34883) Setting CSV reader option "multiLine" to "true" causes URISyntaxException when colon is in file path

2021-07-02 Thread Mike Pieters (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-34883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373552#comment-17373552
 ] 

Mike Pieters commented on SPARK-34883:
--

I've got the same error when I try to run:
{code:java}
spark.read.csv(URL_ABFS_RAW + "/salesforce/Case/timestamp=2021-07-02 00:14:15.129481", header=True, multiLine=True)
{code}
I'm running Spark 3.0.1.

 

> Setting CSV reader option "multiLine" to "true" causes URISyntaxException 
> when colon is in file path
> 
>
> Key: SPARK-34883
> URL: https://issues.apache.org/jira/browse/SPARK-34883
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0, 3.1.1
>Reporter: Brady Tello
>Priority: Major
>
> Setting the CSV reader's "multiLine" option to "True" throws the following 
> exception when a ':' character is in the file path.
>  
> {code:java}
> java.net.URISyntaxException: Relative path in absolute URI: test:dir
> {code}
> I've tested this in both Spark 3.0.0 and Spark 3.1.1 and I get the same error 
> whether I use Scala, Python, or SQL.
> The following code works fine:
>  
> {code:java}
> csvFile = "/FileStore/myDir/test:dir/pageviews_by_second.tsv"
> tempDF = (spark.read.option("sep", "\t").csv(csvFile))
> {code}
> While the following code fails:
>  
> {code:java}
> csvFile = "/FileStore/myDir/test:dir/pageviews_by_second.tsv"
> tempDF = (spark.read.option("sep", "\t").option("multiLine", "True")
>           .csv(csvFile))
> {code}
> Full Stack Trace from Python:
>  
> {code:java}
> --- 
> IllegalArgumentException Traceback (most recent call last)  
> in  
> 3 csvFile = "/FileStore/myDir/test:dir/pageviews_by_second.tsv" 
> 4 
> > 5  tempDF = (spark.read.option("sep", "\t").option("multiLine", "True") 
> /databricks/spark/python/pyspark/sql/readwriter.py in csv(self, path, schema, 
> sep, encoding, quote, escape, comment, header, inferSchema, 
> ignoreLeadingWhiteSpace, ignoreTrailingWhiteSpace, nullValue, nanValue, 
> positiveInf, negativeInf, dateFormat, timestampFormat, maxColumns, 
> maxCharsPerColumn, maxMalformedLogPerPartition, mode, 
> columnNameOfCorruptRecord, multiLine, charToEscapeQuoteEscaping, 
> samplingRatio, enforceSchema, emptyValue, locale, lineSep, pathGlobFilter, 
> recursiveFileLookup, modifiedBefore, modifiedAfter, unescapedQuoteHandling) 
> 735 path = [path] 
> 736 if type(path) == list: 
> --> 737 return 
> self._df(self._jreader.csv(self._spark._sc._jvm.PythonUtils.toSeq(path))) 
> 738 elif isinstance(path, RDD): 
> 739 def func(iterator): 
> /databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in 
> __call__(self, *args) 
> 1302 
> 1303 answer = self.gateway_client.send_command(command) 
> -> 1304 return_value = get_return_value( 
> 1305 answer, self.gateway_client, self.target_id, self.name) 
> 1306 
> /databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw) 
> 114 # Hide where the exception came from that shows a non-Pythonic 
> 115 # JVM exception message. 
> --> 116 raise converted from None 
> 117 else: 
> 118 raise
> IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: test:dir
> {code}
>  
>  




[jira] [Commented] (SPARK-23814) Couldn't read file with colon in name and new line character in one of the field.

2021-07-02 Thread Mike Pieters (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-23814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373526#comment-17373526
 ] 

Mike Pieters commented on SPARK-23814:
--

I also got the same error in version 3.0.1.

> Couldn't read file with colon in name and new line character in one of the 
> field.
> -
>
> Key: SPARK-23814
> URL: https://issues.apache.org/jira/browse/SPARK-23814
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Spark Shell
>Affects Versions: 2.2.0
>Reporter: bharath kumar avusherla
>Priority: Major
>
> When the file name contains a colon and the data contains a newline character, 
> reading it with spark.read.option("multiLine","true").csv("s3n://DirectoryPath/") 
> throws a *"java.lang.IllegalArgumentException: java.net.URISyntaxException: 
> Relative path in absolute URI: 2017-08-01T00:00:00Z.csv.gz"* error. If we remove 
> option("multiLine","true"), it works fine even though the file name has a 
> colon in it. It also works fine if I apply *option("multiLine","true")* to 
> any other file that doesn't have a colon in its name. But when both are present 
> (a colon in the file name and a newline in the data), it fails.
> {quote}java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: 2017-08-01T00:00:00Z.csv.gz
>   at org.apache.hadoop.fs.Path.initialize(Path.java:205)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:171)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:93)
>   at org.apache.hadoop.fs.Globber.glob(Globber.java:253)
>   at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1676)
>   at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:294)
>   at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265)
>   at org.apache.spark.input.StreamFileInputFormat.setMinPartitions(PortableDataStream.scala:51)
>   at org.apache.spark.rdd.BinaryFileRDD.getPartitions(BinaryFileRDD.scala:46)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
>   at scala.Option.getOrElse(Option.scala:121)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
>   at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
>   at scala.Option.getOrElse(Option.scala:121)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
>   at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
>   at scala.Option.getOrElse(Option.scala:121)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
>   at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1333)
>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>   at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
>   at org.apache.spark.rdd.RDD.take(RDD.scala:1327)
>   at org.apache.spark.sql.execution.datasources.csv.MultiLineCSVDataSource$.infer(CSVDataSource.scala:224)
>   at org.apache.spark.sql.execution.datasources.csv.CSVDataSource.inferSchema(CSVDataSource.scala:62)
>   at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.inferSchema(CSVFileFormat.scala:57)
>   at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:177)
>   at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:177)
>   at scala.Option.orElse(Option.scala:289)
>   at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:176)
>   at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:366)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
>   at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:533)
>   at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:412)
>   ... 48 elided
> Caused by: java.net.URISyntaxException: Relative path in absolute URI: 2017-08-01T00:00:00Z.csv.gz
>   at java.net.URI.checkPath(URI.java:1823)
>   at java.net.URI.<init>(URI.java:745)
>   at org.apache.hadoop.fs.Path.initialize(Path.java:202)
>   ... 86 more
> {quote}
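Reading the trace bottom-up, the URI in the message is the bare file name, and it is rejected while `Globber.glob` builds an `org.apache.hadoop.fs.Path`, a step the multiLine (binary file) read path goes through and the line-based path apparently does not. Under that assumption, the Python sketch below is a simplified model of the scheme detection in Hadoop's `Path(String)` constructor plus the `java.net.URI` check that raises the error: a colon that appears before the first `/` ends a URI scheme, and a scheme followed by a path that does not start with `/` is rejected. `split_scheme`, `to_path` and `RelativePathInAbsoluteURI` are hypothetical names; authority parsing and Windows drive letters are deliberately left out.

```python
from typing import Optional, Tuple


class RelativePathInAbsoluteURI(ValueError):
    """Hypothetical stand-in for java.net.URISyntaxException in this model."""


def split_scheme(path_string: str) -> Tuple[Optional[str], str]:
    """Simplified model: a colon before the first '/' is read as 'scheme:'."""
    colon = path_string.find(":")
    slash = path_string.find("/")
    if colon != -1 and (slash == -1 or colon < slash):
        return path_string[:colon], path_string[colon + 1:]
    return None, path_string


def to_path(path_string: str) -> Tuple[Optional[str], str]:
    """Return (scheme, path), or raise like the URI check seen in the trace."""
    scheme, path = split_scheme(path_string)
    # An absolute URI (one with a scheme) must not carry a relative path.
    if scheme is not None and path and not path.startswith("/"):
        raise RelativePathInAbsoluteURI(
            f"Relative path in absolute URI: {path_string}")
    return scheme, path


if __name__ == "__main__":
    # A full path is fine: its first '/' comes before the colon.
    print(to_path("/FileStore/myDir/test:dir/pageviews_by_second.tsv"))
    # A bare name containing a colon is read as "scheme:relative-path" and fails.
    for component in ("test:dir", "2017-08-01T00:00:00Z.csv.gz"):
        try:
            to_path(component)
        except RelativePathInAbsoluteURI as err:
            print(err)
```

This explains why the same file name can be harmless in a full path and fatal once it is handled as a standalone component; it does not show where exactly Spark or Hadoop strips the parent path, which remains an inference from the stack frames.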




[jira] [Created] (SPARK-35993) Flaky test: org.apache.spark.sql.execution.streaming.state.RocksDBSuite.ensure that concurrent update and cleanup consistent versions

2021-07-02 Thread Gabor Somogyi (Jira)

Gabor Somogyi created SPARK-35993:
-

 Summary: Flaky test: 
org.apache.spark.sql.execution.streaming.state.RocksDBSuite.ensure that 
concurrent update and cleanup consistent versions
 Key: SPARK-35993
 URL: https://issues.apache.org/jira/browse/SPARK-35993
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, Tests
Affects Versions: 3.1.2
Reporter: Gabor Somogyi


Appeared in Jenkins:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140575/testReport/org.apache.spark.sql.execution.streaming.state/RocksDBSuite/ensure_that_concurrent_update_and_cleanup_consistent_versions/

{code:java}
Error Message
java.io.FileNotFoundException: File /home/jenkins/workspace/SparkPullRequestBuilder@2/target/tmp/spark-21674620-ac83-4ad3-a153-5a7adf909244/20.zip does not exist
Stacktrace
sbt.ForkMain$ForkError: java.io.FileNotFoundException: File /home/jenkins/workspace/SparkPullRequestBuilder@2/target/tmp/spark-21674620-ac83-4ad3-a153-5a7adf909244/20.zip does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:779)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:1100)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:769)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:462)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:160)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:372)
at org.apache.spark.DebugFilesystem.open(DebugFilesystem.scala:74)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:976)
at org.apache.spark.util.Utils$.unzipFilesFromFile(Utils.scala:3132)
at org.apache.spark.sql.execution.streaming.state.RocksDBFileManager.loadCheckpointFromDfs(RocksDBFileManager.scala:174)
at org.apache.spark.sql.execution.streaming.state.RocksDB.load(RocksDB.scala:103)
at org.apache.spark.sql.execution.streaming.state.RocksDBSuite.withDB(RocksDBSuite.scala:443)
at org.apache.spark.sql.execution.streaming.state.RocksDBSuite.$anonfun$new$57(RocksDBSuite.scala:397)
at org.apache.spark.sql.catalyst.util.package$.quietly(package.scala:42)
at org.apache.spark.sql.execution.streaming.state.RocksDBSuite.$anonfun$new$56(RocksDBSuite.scala:341)
at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
at org.scalatest.Transformer.apply(Transformer.scala:22)
at org.scalatest.Transformer.apply(Transformer.scala:20)
at org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:190)
at org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
at org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
at org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:62)
at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:62)
at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
at scala.collection.immutable.List.foreach(List.scala:431)
at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396)
at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475)
at org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269)
at org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268)
at org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1563)
at org.scalatest.Suite.run(Suite.scala:1112)
at org.scalatest.Suite.run$(Suite.scala:1094)
at org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1563)
at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:273)
at org.scalatest.SuperEngine.runImpl(Engine.scala:535)
at org.scalatest.funsuite.AnyFunSuiteLike.

[jira] [Commented] (SPARK-35990) Remove avro-sbt plugin dependency

2021-07-02 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-35990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373443#comment-17373443
 ] 

Apache Spark commented on SPARK-35990:
--

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/33190

> Remove avro-sbt plugin dependency
> -
>
> Key: SPARK-35990
> URL: https://issues.apache.org/jira/browse/SPARK-35990
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
>
> The avro-sbt plugin no longer seems to be used in the build.
> Let's consider removing it.




[jira] [Assigned] (SPARK-35990) Remove avro-sbt plugin dependency

2021-07-02 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-35990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35990:


Assignee: Apache Spark  (was: Kousuke Saruta)

> Remove avro-sbt plugin dependency
> -
>
> Key: SPARK-35990
> URL: https://issues.apache.org/jira/browse/SPARK-35990
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Kousuke Saruta
>Assignee: Apache Spark
>Priority: Minor
>
> The avro-sbt plugin no longer seems to be used in the build.
> Let's consider removing it.




[jira] [Commented] (SPARK-35990) Remove avro-sbt plugin dependency

2021-07-02 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-35990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373442#comment-17373442
 ] 

Apache Spark commented on SPARK-35990:
--

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/33190

> Remove avro-sbt plugin dependency
> -
>
> Key: SPARK-35990
> URL: https://issues.apache.org/jira/browse/SPARK-35990
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
>
> The avro-sbt plugin no longer seems to be used in the build.
> Let's consider removing it.




[jira] [Assigned] (SPARK-35990) Remove avro-sbt plugin dependency

2021-07-02 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-35990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35990:


Assignee: Kousuke Saruta  (was: Apache Spark)

> Remove avro-sbt plugin dependency
> -
>
> Key: SPARK-35990
> URL: https://issues.apache.org/jira/browse/SPARK-35990
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
>
> The avro-sbt plugin no longer seems to be used in the build.
> Let's consider removing it.




[jira] [Updated] (SPARK-35992) Upgrade ORC to 1.6.9

2021-07-02 Thread Dongjoon Hyun (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-35992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-35992:
--
Description: This issue aims to upgrade Apache ORC to 1.6.9 to bring in the 
ORC encryption masking fix.

> Upgrade ORC to 1.6.9
> 
>
> Key: SPARK-35992
> URL: https://issues.apache.org/jira/browse/SPARK-35992
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Critical
>
> This issue aims to upgrade Apache ORC to 1.6.9 to bring in the ORC 
> encryption masking fix.




[jira] [Updated] (SPARK-35992) Upgrade ORC to 1.6.9

2021-07-02 Thread Dongjoon Hyun (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-35992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-35992:
--
Priority: Critical  (was: Major)

> Upgrade ORC to 1.6.9
> 
>
> Key: SPARK-35992
> URL: https://issues.apache.org/jira/browse/SPARK-35992
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Critical
>





[jira] [Assigned] (SPARK-35992) Upgrade ORC to 1.6.9

2021-07-02 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-35992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35992:


Assignee: (was: Apache Spark)

> Upgrade ORC to 1.6.9
> 
>
> Key: SPARK-35992
> URL: https://issues.apache.org/jira/browse/SPARK-35992
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Major
>





[jira] [Assigned] (SPARK-35992) Upgrade ORC to 1.6.9

2021-07-02 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-35992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35992:


Assignee: Apache Spark

> Upgrade ORC to 1.6.9
> 
>
> Key: SPARK-35992
> URL: https://issues.apache.org/jira/browse/SPARK-35992
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>





[jira] [Commented] (SPARK-35992) Upgrade ORC to 1.6.9

2021-07-02 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-35992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373439#comment-17373439
 ] 

Apache Spark commented on SPARK-35992:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/33189

> Upgrade ORC to 1.6.9
> 
>
> Key: SPARK-35992
> URL: https://issues.apache.org/jira/browse/SPARK-35992
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Major
>





[jira] [Created] (SPARK-35992) Upgrade ORC to 1.6.9

2021-07-02 Thread Dongjoon Hyun (Jira)

Dongjoon Hyun created SPARK-35992:
-

 Summary: Upgrade ORC to 1.6.9
 Key: SPARK-35992
 URL: https://issues.apache.org/jira/browse/SPARK-35992
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 3.2.0
Reporter: Dongjoon Hyun







[jira] [Created] (SPARK-35991) Add PlanStability suite for TPCH

2021-07-02 Thread angerszhu (Jira)

angerszhu created SPARK-35991:
-

 Summary: Add PlanStability suite for TPCH
 Key: SPARK-35991
 URL: https://issues.apache.org/jira/browse/SPARK-35991
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.1.2, 3.2.0
Reporter: angerszhu







[jira] [Created] (SPARK-35990) Remove avro-sbt plugin dependency

2021-07-02 Thread Kousuke Saruta (Jira)

Kousuke Saruta created SPARK-35990:
--

 Summary: Remove avro-sbt plugin dependency
 Key: SPARK-35990
 URL: https://issues.apache.org/jira/browse/SPARK-35990
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.2.0
Reporter: Kousuke Saruta
Assignee: Kousuke Saruta


The avro-sbt plugin no longer seems to be used in the build.
Let's consider removing it.




[jira] [Commented] (SPARK-34632) Can we create 'SessionState' with a username in 'HiveClientImpl'

2021-07-02 Thread HonglunChen (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-34632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373428#comment-17373428
 ] 

HonglunChen commented on SPARK-34632:
-

Yes, we can do that. I just want Spark to support this by default, and it has 
no effect on Spark at all.

> Can we create 'SessionState' with a username in 'HiveClientImpl'
> 
>
> Key: SPARK-34632
> URL: https://issues.apache.org/jira/browse/SPARK-34632
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: HonglunChen
>Priority: Minor
>
> [https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L165]
> Like this:
> val state = new SessionState(hiveConf, userName)
> We can then easily use the Hive Authorization through the user information in 
> the 'SessionState'.
>  




[jira] [Commented] (SPARK-35989) Do not remove REPARTITION_BY_NUM shuffle if AQE is enabled

2021-07-02 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-35989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373408#comment-17373408
 ] 

Apache Spark commented on SPARK-35989:
--

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/33188

> Do not remove REPARTITION_BY_NUM shuffle if AQE is enabled
> --
>
> Key: SPARK-35989
> URL: https://issues.apache.org/jira/browse/SPARK-35989
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: XiDuo You
>Priority: Major
>
> The shuffle origin is `REPARTITION_BY_NUM` when the user specifies an exact 
> partition number with repartition, so we should not change that number. That 
> is, the number of shuffle output partitions should always be what the user
> expected.
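A toy Python model may help show the intent. `REPARTITION_BY_NUM` is the name used in this issue; the other enum members, the `coalesce_partitions` function, and its packing rule are hypothetical and far simpler than Spark's actual AQE coalescing. For other origins the sketch merges adjacent small partitions, but a `REPARTITION_BY_NUM` shuffle keeps exactly the partitions the user asked for.

```python
from enum import Enum, auto
from typing import List


class ShuffleOrigin(Enum):
    # REPARTITION_BY_NUM comes from the issue; the other names are illustrative.
    ENSURE_REQUIREMENTS = auto()
    REPARTITION_BY_COL = auto()
    REPARTITION_BY_NUM = auto()


def coalesce_partitions(sizes: List[int], target_size: int,
                        origin: ShuffleOrigin) -> List[int]:
    """Hypothetical AQE-style coalescing of shuffle partition sizes."""
    if origin is ShuffleOrigin.REPARTITION_BY_NUM:
        # The user asked for an exact count via repartition(n): keep it.
        return list(sizes)
    merged: List[int] = []
    current = 0
    for size in sizes:
        # Start a new group when adding this partition would exceed the target.
        if current > 0 and current + size > target_size:
            merged.append(current)
            current = 0
        current += size
    if current > 0:
        merged.append(current)
    return merged


sizes = [10, 10, 10, 10, 50]
print(coalesce_partitions(sizes, 30, ShuffleOrigin.REPARTITION_BY_COL))  # [30, 10, 50]
print(coalesce_partitions(sizes, 30, ShuffleOrigin.REPARTITION_BY_NUM))  # [10, 10, 10, 10, 50]
```

In this model the explicit count is treated as a requirement rather than a hint, which is the behavior the issue asks AQE to preserve.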




[jira] [Assigned] (SPARK-35989) Do not remove REPARTITION_BY_NUM shuffle if AQE is enabled

2021-07-02 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-35989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35989:


Assignee: Apache Spark

> Do not remove REPARTITION_BY_NUM shuffle if AQE is enabled
> --
>
> Key: SPARK-35989
> URL: https://issues.apache.org/jira/browse/SPARK-35989
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: XiDuo You
>Assignee: Apache Spark
>Priority: Major
>
> The shuffle origin is `REPARTITION_BY_NUM` when the user specifies an exact 
> partition number with repartition, so we should not change that number. That 
> is, the number of shuffle output partitions should always be what the user
> expected.




[jira] [Assigned] (SPARK-35989) Do not remove REPARTITION_BY_NUM shuffle if AQE is enabled

2021-07-02 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-35989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35989:


Assignee: (was: Apache Spark)

> Do not remove REPARTITION_BY_NUM shuffle if AQE is enabled
> --
>
> Key: SPARK-35989
> URL: https://issues.apache.org/jira/browse/SPARK-35989
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: XiDuo You
>Priority: Major
>
> The shuffle origin is `REPARTITION_BY_NUM` when the user specifies an exact 
> partition number with repartition, so we should not change that number. That 
> is, the number of shuffle output partitions should always be what the user
> expected.




[jira] [Commented] (SPARK-34632) Can we create 'SessionState' with a username in 'HiveClientImpl'

2021-07-02 Thread dzcxzl (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-34632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373410#comment-17373410
 ] 

dzcxzl commented on SPARK-34632:


You can use the default Authenticator to get the username through UGI (Hadoop's UserGroupInformation).
hive.security.authenticator.manager=org.apache.hadoop.hive.ql.security.HadoopDefaultAuthenticator

> Can we create 'SessionState' with a username in 'HiveClientImpl'
> 
>
> Key: SPARK-34632
> URL: https://issues.apache.org/jira/browse/SPARK-34632
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: HonglunChen
>Priority: Minor
>
> [https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L165]
> Like this:
> val state = new SessionState(hiveConf, userName)
> We can then easily use the Hive Authorization through the user information in 
> the 'SessionState'.
>  




[jira] [Updated] (SPARK-35989) Do not remove REPARTITION_BY_NUM shuffle if AQE is enabled

2021-07-02 Thread XiDuo You (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-35989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

XiDuo You updated SPARK-35989:
--
Description: The shuffle origin is `REPARTITION_BY_NUM` when the user specifies 
an exact partition number with repartition, so we should not change that 
number. That is, the number of shuffle output partitions should always be what 
the user expected.

> Do not remove REPARTITION_BY_NUM shuffle if AQE is enabled
> --
>
> Key: SPARK-35989
> URL: https://issues.apache.org/jira/browse/SPARK-35989
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: XiDuo You
>Priority: Major
>
> The shuffle origin is `REPARTITION_BY_NUM` when the user specifies an exact 
> partition number with repartition, so we should not change that number. That 
> is, the number of shuffle output partitions should always be what the user
> expected.




[jira] [Updated] (SPARK-35989) Do not remove REPARTITION_BY_NUM shuffle if AQE is enabled

2021-07-02 Thread XiDuo You (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-35989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

XiDuo You updated SPARK-35989:
--
Environment: (was: The shuffle origin is `REPARTITION_BY_NUM` when the user 
specifies an exact partition number with repartition, so we should not change 
that number. That is, the number of shuffle output partitions should always be 
what the user expected.)

> Do not remove REPARTITION_BY_NUM shuffle if AQE is enabled
> --
>
> Key: SPARK-35989
> URL: https://issues.apache.org/jira/browse/SPARK-35989
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: XiDuo You
>Priority: Major
>





[jira] [Created] (SPARK-35989) Do not remove REPARTITION_BY_NUM shuffle if AQE is enabled

2021-07-02 Thread XiDuo You (Jira)

XiDuo You created SPARK-35989:
-

 Summary: Do not remove REPARTITION_BY_NUM shuffle if AQE is enabled
 Key: SPARK-35989
 URL: https://issues.apache.org/jira/browse/SPARK-35989
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
 Environment: The shuffle origin is `REPARTITION_BY_NUM` when the user 
specifies an exact partition number with repartition, so we should not change 
that number. That is, the number of shuffle output partitions should always be 
what the user expected.
Reporter: XiDuo You







[jira] [Assigned] (SPARK-35988) The implementation for RocksDBStateStoreProvider

2021-07-02 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-35988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35988:


Assignee: (was: Apache Spark)

> The implementation for RocksDBStateStoreProvider
> 
>
> Key: SPARK-35988
> URL: https://issues.apache.org/jira/browse/SPARK-35988
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 3.2.0
>Reporter: Yuanjian Li
>Priority: Major
>
> Add the implementation for the RocksDBStateStoreProvider. It's the subclass 
> of StateStoreProvider that leverages all the functionalities implemented in 
> the RocksDB instance.




[jira] [Commented] (SPARK-35988) The implementation for RocksDBStateStoreProvider

2021-07-02 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-35988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373396#comment-17373396
 ] 

Apache Spark commented on SPARK-35988:
--

User 'xuanyuanking' has created a pull request for this issue:
https://github.com/apache/spark/pull/33187

> The implementation for RocksDBStateStoreProvider
> 
>
> Key: SPARK-35988
> URL: https://issues.apache.org/jira/browse/SPARK-35988
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 3.2.0
>Reporter: Yuanjian Li
>Priority: Major
>
> Add the implementation for the RocksDBStateStoreProvider. It's the subclass 
> of StateStoreProvider that leverages all the functionalities implemented in 
> the RocksDB instance.




[jira] [Assigned] (SPARK-35988) The implementation for RocksDBStateStoreProvider

2021-07-02 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-35988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35988:


Assignee: Apache Spark

> The implementation for RocksDBStateStoreProvider
> 
>
> Key: SPARK-35988
> URL: https://issues.apache.org/jira/browse/SPARK-35988
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 3.2.0
>Reporter: Yuanjian Li
>Assignee: Apache Spark
>Priority: Major
>
> Add the implementation for the RocksDBStateStoreProvider. It's the subclass 
> of StateStoreProvider that leverages all the functionalities implemented in 
> the RocksDB instance.




[jira] [Commented] (SPARK-35988) The implementation for RocksDBStateStoreProvider

2021-07-02 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-35988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373397#comment-17373397
 ] 

Apache Spark commented on SPARK-35988:
--

User 'xuanyuanking' has created a pull request for this issue:
https://github.com/apache/spark/pull/33187

> The implementation for RocksDBStateStoreProvider
> 
>
> Key: SPARK-35988
> URL: https://issues.apache.org/jira/browse/SPARK-35988
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 3.2.0
>Reporter: Yuanjian Li
>Priority: Major
>
> Add the implementation for the RocksDBStateStoreProvider. It's the subclass 
> of StateStoreProvider that leverages all the functionalities implemented in 
> the RocksDB instance.




[jira] [Created] (SPARK-35988) The implementation for RocksDBStateStoreProvider

2021-07-02 Thread Yuanjian Li (Jira)

Yuanjian Li created SPARK-35988:
---

 Summary: The implementation for RocksDBStateStoreProvider
 Key: SPARK-35988
 URL: https://issues.apache.org/jira/browse/SPARK-35988
 Project: Spark
  Issue Type: Sub-task
  Components: Structured Streaming
Affects Versions: 3.2.0
Reporter: Yuanjian Li


Add the implementation for the RocksDBStateStoreProvider. It's the subclass of 
StateStoreProvider that leverages all the functionalities implemented in the 
RocksDB instance.




[jira] [Commented] (SPARK-35987) The ANSI flags of Sum and Avg should be kept after being copied

2021-07-02 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-35987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373365#comment-17373365
 ] 

Apache Spark commented on SPARK-35987:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/33186

> The ANSI flags of Sum and Avg should be kept after being copied
> ---
>
> Key: SPARK-35987
> URL: https://issues.apache.org/jira/browse/SPARK-35987
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> For Views and UDFs, it is important to show consistent results even if the 
> ANSI configuration is different in the running session. This is why many 
> expressions like 'Add'/'Divide'/'CAST' make the ANSI flag part of their case 
> class parameter lists.
> We should make it consistent for `Sum`/`Avg`
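The rationale can be illustrated with a Python analogy (not Spark code). A Scala case class's `copy` re-runs the constructor, so a flag computed in the class body is re-read from the current session, while a flag in the parameter list is carried over. Below, `dataclasses.replace` plays the part of `copy`, the two `Sum*` classes are hypothetical, and `SESSION_CONF` is a stand-in for a setting such as `spark.sql.ansi.enabled`.

```python
import dataclasses
from dataclasses import dataclass, field

# Hypothetical stand-in for a session setting such as spark.sql.ansi.enabled.
SESSION_CONF = {"ansi_enabled": False}


def current_ansi() -> bool:
    return SESSION_CONF["ansi_enabled"]


@dataclass(frozen=True)
class SumFlagInBody:
    """Flag computed at construction but NOT a constructor parameter,
    like a `val` in a case class body: a copy re-reads the session."""
    child: str
    fail_on_error: bool = field(init=False, default_factory=current_ansi)


@dataclass(frozen=True)
class SumFlagInParams:
    """Flag is a constructor parameter: a copy keeps the original value."""
    child: str
    fail_on_error: bool = field(default_factory=current_ansi)


SESSION_CONF["ansi_enabled"] = True    # e.g. the view is created with ANSI on
body, params = SumFlagInBody("price"), SumFlagInParams("price")
SESSION_CONF["ansi_enabled"] = False   # a later session runs with ANSI off
print(dataclasses.replace(body, child="amount").fail_on_error)    # False
print(dataclasses.replace(params, child="amount").fail_on_error)  # True
```

Only the parameter-list version keeps the behavior fixed at creation time, which is the consistency the issue wants for `Sum`/`Avg`.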




[jira] [Assigned] (SPARK-35987) The ANSI flags of Sum and Avg should be kept after being copied

2021-07-02 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-35987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35987:


Assignee: Apache Spark  (was: Gengliang Wang)

> The ANSI flags of Sum and Avg should be kept after being copied
> ---
>
> Key: SPARK-35987
> URL: https://issues.apache.org/jira/browse/SPARK-35987
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Major
>
> For Views and UDFs, it is important to show consistent results even if the 
> ANSI configuration is different in the running session. This is why many 
> expressions like 'Add'/'Divide'/'CAST' make the ANSI flag part of their case 
> class parameter lists.
> We should make it consistent for `Sum`/`Avg`




[jira] [Commented] (SPARK-35987) The ANSI flags of Sum and Avg should be kept after being copied

2021-07-02 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-35987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373364#comment-17373364
 ] 

Apache Spark commented on SPARK-35987:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/33186

> The ANSI flags of Sum and Avg should be kept after being copied
> ---
>
> Key: SPARK-35987
> URL: https://issues.apache.org/jira/browse/SPARK-35987
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> For Views and UDFs, it is important to show consistent results even if the 
> ANSI configuration is different in the running session. This is why many 
> expressions like 'Add'/'Divide'/'CAST' make the ANSI flag part of their case 
> class parameter lists.
> We should make it consistent for `Sum`/`Avg`




[jira] [Assigned] (SPARK-35987) The ANSI flags of Sum and Avg should be kept after being copied

2021-07-02 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-35987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35987:


Assignee: Gengliang Wang  (was: Apache Spark)

> The ANSI flags of Sum and Avg should be kept after being copied
> ---
>
> Key: SPARK-35987
> URL: https://issues.apache.org/jira/browse/SPARK-35987
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> For Views and UDFs, it is important to show consistent results even if the 
> ANSI configuration is different in the running session. This is why many 
> expressions like 'Add'/'Divide'/'CAST' make the ANSI flag part of their case 
> class parameter lists.
> We should make it consistent for `Sum`/`Avg`




[jira] [Created] (SPARK-35987) The ANSI flags of Sum and Avg should be kept after being copied

2021-07-02 Thread Gengliang Wang (Jira)

Gengliang Wang created SPARK-35987:
--

 Summary: The ANSI flags of Sum and Avg should be kept after being 
copied
 Key: SPARK-35987
 URL: https://issues.apache.org/jira/browse/SPARK-35987
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang


For Views and UDFs, it is important to show consistent results even if the ANSI 
configuration is different in the running session. This is why many expressions 
like 'Add'/'Divide'/'CAST' make the ANSI flag part of their case class 
parameter lists.
We should make it consistent for `Sum`/`Avg`




[jira] [Resolved] (SPARK-35981) Use check_exact=False in StatsTest.test_cov_corr_meta to loosen the check precision

2021-07-02 Thread Hyukjin Kwon (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-35981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-35981.
--
Fix Version/s: 3.2.0
 Assignee: Takuya Ueshin
   Resolution: Fixed

Fixed in https://github.com/apache/spark/pull/33179

> Use check_exact=False in StatsTest.test_cov_corr_meta to loosen the check 
> precision
> ---
>
> Key: SPARK-35981
> URL: https://issues.apache.org/jira/browse/SPARK-35981
> Project: Spark
>  Issue Type: Test
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Assignee: Takuya Ueshin
>Priority: Major
> Fix For: 3.2.0
>
>
> In some environments, the precision of the {{DataFrame.corr}} function could 
> differ.
> We should use {{check_exact=False}} to loosen the precision.
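As a rough illustration of why an exact comparison is brittle: with `check_exact=False`, pandas' testing helpers compare floating-point values within a tolerance instead of bit for bit. The stdlib sketch below imitates that idea; `rows_equal` and its tolerance values are hypothetical and are not pandas' implementation.

```python
import math
from typing import Sequence


def rows_equal(left: Sequence[Sequence[float]],
               right: Sequence[Sequence[float]],
               exact: bool = True,
               rel_tol: float = 1e-5,
               abs_tol: float = 1e-8) -> bool:
    """Compare two row-major float tables, exactly or within a tolerance."""
    if len(left) != len(right):
        return False
    for lrow, rrow in zip(left, right):
        if len(lrow) != len(rrow):
            return False
        for x, y in zip(lrow, rrow):
            same = (x == y) if exact else math.isclose(
                x, y, rel_tol=rel_tol, abs_tol=abs_tol)
            if not same:
                return False
    return True


# 0.1 + 0.2 is not bit-for-bit equal to 0.3 in IEEE-754 doubles.
computed = [[1.0, 0.1 + 0.2], [0.1 + 0.2, 1.0]]
expected = [[1.0, 0.3], [0.3, 1.0]]
print(rows_equal(computed, expected))               # False
print(rows_equal(computed, expected, exact=False))  # True
```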




[jira] [Assigned] (SPARK-35986) fix pyspark.rdd.RDD.histogram's buckets argument

2021-07-02 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-35986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35986:


Assignee: (was: Apache Spark)

> fix pyspark.rdd.RDD.histogram's buckets argument
> 
>
> Key: SPARK-35986
> URL: https://issues.apache.org/jira/browse/SPARK-35986
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2
>Reporter: Tomas Pereira de Vasconcelos
>Priority: Minor
>  Labels: PySpark, pyspark, stubs
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> I originally opened an issue and created a PR in the 
> [https://github.com/zero323/pyspark-stubs] repository.
> Issue: [https://github.com/zero323/pyspark-stubs/issues/548]
> PR: [https://github.com/zero323/pyspark-stubs/pull/549]
> —
> The type hint for {{pyspark.rdd.RDD.histogram}}'s {{buckets}} argument should 
> be {{Union[int, List[T], Tuple[T, ...]]}}
> From {{pyspark}} source:
> {code:python}
> if isinstance(buckets, int):
>     ...
> elif isinstance(buckets, (list, tuple)):
>     ...
> else:
>     raise TypeError("buckets should be a list or tuple or number(int or long)")
> {code}
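The proposed annotation can be checked against the dispatch quoted above. In this sketch `Buckets` is the type alias from the issue, and `bucket_spec` is a hypothetical helper (not PySpark's code) that mirrors the `isinstance` branches: an `int` requests that many evenly spaced buckets, and a list or tuple gives sorted bucket boundaries, so n boundaries define n - 1 buckets.

```python
from typing import List, Tuple, TypeVar, Union

T = TypeVar("T")

# Type alias proposed in SPARK-35986 for RDD.histogram's `buckets` argument.
Buckets = Union[int, List[T], Tuple[T, ...]]


def bucket_spec(buckets: Buckets[float]) -> Tuple[str, int]:
    """Hypothetical helper: classify `buckets` the way the quoted source does."""
    if isinstance(buckets, int):
        # A number: that many evenly spaced buckets between min and max.
        return ("even", buckets)
    elif isinstance(buckets, (list, tuple)):
        # Sorted boundaries: n boundaries define n - 1 buckets.
        return ("explicit", len(buckets) - 1)
    else:
        raise TypeError("buckets should be a list or tuple or number(int or long)")


print(bucket_spec(4))                # ('even', 4)
print(bucket_spec([1, 10, 20, 50]))  # ('explicit', 3)
```

Accepting both `List[T]` and `Tuple[T, ...]` in the hint matches the `(list, tuple)` branch, so a type checker no longer rejects tuple boundaries that the runtime already accepts.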




[jira] [Assigned] (SPARK-35986) fix pyspark.rdd.RDD.histogram's buckets argument

2021-07-02 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-35986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35986:


Assignee: Apache Spark

> fix pyspark.rdd.RDD.histogram's buckets argument
> 
>
> Key: SPARK-35986
> URL: https://issues.apache.org/jira/browse/SPARK-35986
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2
>Reporter: Tomas Pereira de Vasconcelos
>Assignee: Apache Spark
>Priority: Minor
>  Labels: PySpark, pyspark, stubs
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> I originally opened an issue and created a PR in the 
> [https://github.com/zero323/pyspark-stubs] repository.
> Issue: [https://github.com/zero323/pyspark-stubs/issues/548]
> PR: [https://github.com/zero323/pyspark-stubs/pull/549]
> —
> The type hint for {{pyspark.rdd.RDD.histogram}}'s {{buckets}} argument should 
> be {{Union[int, List[T], Tuple[T, ...]]}}
> From {{pyspark}} source:
> {code:python}
> if isinstance(buckets, int):
>     ...
> elif isinstance(buckets, (list, tuple)):
>     ...
> else:
>     raise TypeError("buckets should be a list or tuple or number(int or long)")
> {code}




[jira] [Commented] (SPARK-35986) fix pyspark.rdd.RDD.histogram's buckets argument

2021-07-02 Thread Tomas Pereira de Vasconcelos (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-35986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373352#comment-17373352
 ] 

Tomas Pereira de Vasconcelos commented on SPARK-35986:
--

Pull request: https://github.com/apache/spark/pull/33185

> fix pyspark.rdd.RDD.histogram's buckets argument
> 
>
> Key: SPARK-35986
> URL: https://issues.apache.org/jira/browse/SPARK-35986
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2
>Reporter: Tomas Pereira de Vasconcelos
>Priority: Minor
>  Labels: PySpark, pyspark, stubs
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> I originally opened an issue and created a PR in the 
> [https://github.com/zero323/pyspark-stubs] repository.
> Issue: [https://github.com/zero323/pyspark-stubs/issues/548]
> PR: [https://github.com/zero323/pyspark-stubs/pull/549]
> —
> The type hint for {{pyspark.rdd.RDD.histogram}}'s {{buckets}} argument should 
> be {{Union[int, List[T], Tuple[T, ...]]}}
> From {{pyspark}} source:
> {code:python}
> if isinstance(buckets, int):
>     ...
> elif isinstance(buckets, (list, tuple)):
>     ...
> else:
>     raise TypeError("buckets should be a list or tuple or number(int or long)")
> {code}




[jira] [Commented] (SPARK-35986) fix pyspark.rdd.RDD.histogram's buckets argument

2021-07-02 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-35986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373353#comment-17373353
 ] 

Apache Spark commented on SPARK-35986:
--

User 'tpvasconcelos' has created a pull request for this issue:
https://github.com/apache/spark/pull/33185

[jira] [Updated] (SPARK-35986) fix pyspark.rdd.RDD.histogram's buckets argument

2021-07-02 Thread Tomas Pereira de Vasconcelos (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-35986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomas Pereira de Vasconcelos updated SPARK-35986:
-
Description: 
I originally opened an issue and created a PR in the 
[https://github.com/zero323/pyspark-stubs] repository.

Issue: [https://github.com/zero323/pyspark-stubs/issues/548]

PR: [https://github.com/zero323/pyspark-stubs/pull/549]

---

The type hint for {{pyspark.rdd.RDD.histogram}}'s {{buckets}} argument should 
be {{Union[int, List[T], Tuple[T, ...]]}}.

From {{pyspark}} source:
{code:python}
if isinstance(buckets, int):
    ...
elif isinstance(buckets, (list, tuple)):
    ...
else:
    raise TypeError("buckets should be a list or tuple or number(int or long)")
{code}

  was:
I originally opened an issue and created a PR in the 
[https://github.com/zero323/pyspark-stubs] repository.

Issue: [https://github.com/zero323/pyspark-stubs/issues/548]

PR: [https://github.com/zero323/pyspark-stubs/pull/549]

---

The type hint for {{pyspark.rdd.RDD.histogram}}'s {{buckets}} argument should 
be {{Union[int, List[T], Tuple[T]]}}

From {{pyspark}} source:
{code:python}
if isinstance(buckets, int):
    ...
elif isinstance(buckets, (list, tuple)):
    ...
else:
    raise TypeError("buckets should be a list or tuple or number(int or long)")
{code}


[jira] [Created] (SPARK-35986) fix pyspark.rdd.RDD.histogram's buckets argument

2021-07-02 Thread Tomas Pereira de Vasconcelos (Jira)

Tomas Pereira de Vasconcelos created SPARK-35986:


 Summary: fix pyspark.rdd.RDD.histogram's buckets argument
 Key: SPARK-35986
 URL: https://issues.apache.org/jira/browse/SPARK-35986
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 3.1.2, 3.1.1, 3.1.0, 3.0.3, 3.0.2, 3.0.1, 3.0.0
Reporter: Tomas Pereira de Vasconcelos


I originally opened an issue and created a PR in the 
[https://github.com/zero323/pyspark-stubs] repository.

Issue: [https://github.com/zero323/pyspark-stubs/issues/548]

PR: [https://github.com/zero323/pyspark-stubs/pull/549]

---

The type hint for {{pyspark.rdd.RDD.histogram}}'s {{buckets}} argument should 
be {{Union[int, List[T], Tuple[T]]}}

From {{pyspark}} source:
{code:python}
if isinstance(buckets, int):
    ...
elif isinstance(buckets, (list, tuple)):
    ...
else:
    raise TypeError("buckets should be a list or tuple or number(int or long)")
{code}
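In Python's `typing` module, `Tuple[T]` describes a tuple of exactly one element, whereas `Tuple[T, ...]` describes a tuple of any length whose items are all `T`; this is presumably why the description was later revised from `Tuple[T]` to `Tuple[T, ...]`. The sketch below is only an illustration of that reading: `classify_buckets` is a hypothetical helper that copies the quoted `isinstance` dispatch and carries the proposed annotation. It is not PySpark's actual `histogram` code and computes no histogram.

```python
from typing import List, Tuple, TypeVar, Union

T = TypeVar("T")


def classify_buckets(buckets: Union[int, List[T], Tuple[T, ...]]) -> str:
    """Hypothetical helper mirroring the isinstance dispatch quoted above.

    The annotation is the one proposed in SPARK-35986; the body only reports
    which branch a value would take.
    """
    if isinstance(buckets, int):
        # An int is read as a number of evenly spaced buckets.
        return "count"
    elif isinstance(buckets, (list, tuple)):
        # A list or tuple of any length is read as explicit bucket boundaries.
        return "boundaries"
    else:
        raise TypeError("buckets should be a list or tuple or number(int or long)")


print(classify_buckets(10))
print(classify_buckets([0, 5, 10]))
# A 3-element tuple matches Tuple[T, ...] but would not match Tuple[T].
print(classify_buckets((0, 5, 10)))
```

At runtime the dispatch already accepts tuples of any length, so the annotation change only brings static type checking in line with the existing behaviour.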



[jira] [Updated] (SPARK-35785) Cleanup support for RocksDB instance

2021-07-02 Thread Jungtaek Lim (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-35785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim updated SPARK-35785:
-
Fix Version/s: 3.2.0

> Cleanup support for RocksDB instance
> 
>
> Key: SPARK-35785
> URL: https://issues.apache.org/jira/browse/SPARK-35785
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 3.2.0
>Reporter: Yuanjian Li
>Assignee: Yuanjian Li
>Priority: Major
> Fix For: 3.2.0, 3.3.0
>
>
> Add functionality to clean up files of old versions for the RocksDB 
> instance and RocksDBFileManager.
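The cleanup described above can be pictured as a retention rule over versioned files. The sketch below assumes, purely for illustration, that each state-store version is described by a set of file names that may be shared with other versions, and that the newest `num_versions_to_retain` versions must be kept. `plan_cleanup` and its inputs are hypothetical and do not reflect the actual RocksDB or RocksDBFileManager code in Spark. The point it shows is that a file belonging to an old version is only safe to delete if no retained version still uses it.

```python
from typing import Dict, List, Set, Tuple


def plan_cleanup(
    version_files: Dict[int, Set[str]], num_versions_to_retain: int
) -> Tuple[List[int], Set[str]]:
    """Hypothetical sketch: choose old versions and files that can be deleted.

    version_files maps a version number to the set of file names it uses;
    files may be shared between versions. The newest versions are kept, and
    a file is deletable only if no kept version references it.
    """
    if num_versions_to_retain < 1:
        raise ValueError("num_versions_to_retain must be at least 1")
    ordered = sorted(version_files)
    kept = ordered[-num_versions_to_retain:]
    dropped = ordered[:-num_versions_to_retain]

    # Every file referenced by a retained version must survive.
    live_files: Set[str] = set()
    for version in kept:
        live_files |= version_files[version]

    # Files of dropped versions that no retained version still needs.
    stale_files: Set[str] = set()
    for version in dropped:
        stale_files |= version_files[version] - live_files
    return dropped, stale_files


files = {1: {"a.sst", "b.sst"}, 2: {"b.sst", "c.sst"}, 3: {"c.sst", "d.sst"}}
print(plan_cleanup(files, 2))
```

With two versions retained, only version 1 is dropped, and `b.sst` survives because version 2 still references it.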



[jira] [Commented] (SPARK-35785) Cleanup support for RocksDB instance

2021-07-02 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-35785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373319#comment-17373319
 ] 

Apache Spark commented on SPARK-35785:
--

User 'xuanyuanking' has created a pull request for this issue:
https://github.com/apache/spark/pull/33184

> Cleanup support for RocksDB instance
> 
>
> Key: SPARK-35785
> URL: https://issues.apache.org/jira/browse/SPARK-35785
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 3.2.0
>Reporter: Yuanjian Li
>Assignee: Yuanjian Li
>Priority: Major
> Fix For: 3.3.0
>
>
> Add functionality to clean up files of old versions for the RocksDB 
> instance and RocksDBFileManager.


