[jira] [Commented] (SPARK-36006) Migrate ALTER TABLE ADD/RENAME COLUMNS command to the new resolution framework
[ https://issues.apache.org/jira/browse/SPARK-36006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373912#comment-17373912 ] Apache Spark commented on SPARK-36006: -- User 'imback82' has created a pull request for this issue: https://github.com/apache/spark/pull/33200 > Migrate ALTER TABLE ADD/RENAME COLUMNS command to the new resolution > framework > --- > > Key: SPARK-36006 > URL: https://issues.apache.org/jira/browse/SPARK-36006 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Terry Kim >Priority: Major > > Migrate ALTER TABLE ADD/RENAME COLUMNS command to the new resolution framework -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36006) Migrate ALTER TABLE ADD/RENAME COLUMNS command to the new resolution framework
[ https://issues.apache.org/jira/browse/SPARK-36006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373911#comment-17373911 ] Apache Spark commented on SPARK-36006: -- User 'imback82' has created a pull request for this issue: https://github.com/apache/spark/pull/33200 > Migrate ALTER TABLE ADD/RENAME COLUMNS command to the new resolution > framework > --- > > Key: SPARK-36006 > URL: https://issues.apache.org/jira/browse/SPARK-36006 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Terry Kim >Priority: Major > > Migrate ALTER TABLE ADD/RENAME COLUMNS command to the new resolution framework -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36006) Migrate ALTER TABLE ADD/RENAME COLUMNS command to the new resolution framework
[ https://issues.apache.org/jira/browse/SPARK-36006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36006: Assignee: (was: Apache Spark) > Migrate ALTER TABLE ADD/RENAME COLUMNS command to the new resolution > framework > --- > > Key: SPARK-36006 > URL: https://issues.apache.org/jira/browse/SPARK-36006 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Terry Kim >Priority: Major > > Migrate ALTER TABLE ADD/RENAME COLUMNS command to the new resolution framework -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36006) Migrate ALTER TABLE ADD/RENAME COLUMNS command to the new resolution framework
[ https://issues.apache.org/jira/browse/SPARK-36006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36006: Assignee: Apache Spark > Migrate ALTER TABLE ADD/RENAME COLUMNS command to the new resolution > framework > --- > > Key: SPARK-36006 > URL: https://issues.apache.org/jira/browse/SPARK-36006 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Terry Kim >Assignee: Apache Spark >Priority: Major > > Migrate ALTER TABLE ADD/RENAME COLUMNS command to the new resolution framework -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36006) Migrate ALTER TABLE ADD/RENAME COLUMNS command to the new resolution framework
Terry Kim created SPARK-36006: - Summary: Migrate ALTER TABLE ADD/RENAME COLUMNS command to the new resolution framework Key: SPARK-36006 URL: https://issues.apache.org/jira/browse/SPARK-36006 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0, 3.3.0 Reporter: Terry Kim Migrate ALTER TABLE ADD/RENAME COLUMNS command to the new resolution framework -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36005) In Spark 3.1.2, the canCast method for char/varchar types needs to be consistent with StringType
Sun BiaoBiao created SPARK-36005: Summary: In Spark 3.1.2, the canCast method for char/varchar types needs to be consistent with StringType Key: SPARK-36005 URL: https://issues.apache.org/jira/browse/SPARK-36005 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.1.2 Reporter: Sun BiaoBiao Fix For: 3.1.3 In https://github.com/apache/spark/pull/32109 we introduced the char/varchar type. As described in that PR: "To be safe, this PR doesn't add char/varchar type to the query engine (expression input check, internal row framework, codegen framework, etc.). We will replace char/varchar type by string type with metadata (Attribute.metadata or StructField.metadata) that includes the original type string before it goes into the query engine. That said, the existing code will not see char/varchar type but only string type." Therefore, the canCast method for char/varchar types needs to be consistent with StringType. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
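As a hypothetical, surface-level sketch of the intent (the table and column names are made up, and the actual inconsistency lives in the internal canCast check rather than necessarily in this exact query): since a VARCHAR column is carried through the engine as StringType plus metadata, a cast that canCast accepts for StringType should also be accepted for char/varchar.
{code:python}
# Hypothetical illustration only; demo_varchar and its column are made-up names.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sql("CREATE TABLE demo_varchar (c VARCHAR(10)) USING parquet")
spark.table("demo_varchar").printSchema()  # c is surfaced as string; VARCHAR(10) is kept in metadata

# A cast that is valid for StringType should therefore also be treated as valid
# when the declared type is char/varchar.
spark.sql("SELECT CAST(c AS INT) FROM demo_varchar")
{code}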
[jira] [Commented] (SPARK-36004) Update MiMa and audit Scala/Java API changes
[ https://issues.apache.org/jira/browse/SPARK-36004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373859#comment-17373859 ] Apache Spark commented on SPARK-36004: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/33199 > Update MiMa and audit Scala/Java API changes > > > Key: SPARK-36004 > URL: https://issues.apache.org/jira/browse/SPARK-36004 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Blocker > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36004) Update MiMa and audit Scala/Java API changes
[ https://issues.apache.org/jira/browse/SPARK-36004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36004: Assignee: Apache Spark (was: Dongjoon Hyun) > Update MiMa and audit Scala/Java API changes > > > Key: SPARK-36004 > URL: https://issues.apache.org/jira/browse/SPARK-36004 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Blocker > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36004) Update MiMa and audit Scala/Java API changes
[ https://issues.apache.org/jira/browse/SPARK-36004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36004: Assignee: Dongjoon Hyun (was: Apache Spark) > Update MiMa and audit Scala/Java API changes > > > Key: SPARK-36004 > URL: https://issues.apache.org/jira/browse/SPARK-36004 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Blocker > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36004) Update MiMa and audit API changes
[ https://issues.apache.org/jira/browse/SPARK-36004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-36004: - Assignee: Dongjoon Hyun > Update MiMa and audit API changes > - > > Key: SPARK-36004 > URL: https://issues.apache.org/jira/browse/SPARK-36004 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Blocker > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36004) Update MiMa and audit Scala/Java API changes
[ https://issues.apache.org/jira/browse/SPARK-36004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-36004: -- Summary: Update MiMa and audit Scala/Java API changes (was: Update MiMa and audit API changes) > Update MiMa and audit Scala/Java API changes > > > Key: SPARK-36004 > URL: https://issues.apache.org/jira/browse/SPARK-36004 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Blocker > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36004) Update MiMa and audit API changes
[ https://issues.apache.org/jira/browse/SPARK-36004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-36004: -- Target Version/s: 3.2.0 > Update MiMa and audit API changes > - > > Key: SPARK-36004 > URL: https://issues.apache.org/jira/browse/SPARK-36004 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Priority: Blocker > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36004) Update MiMa and audit API changes
Dongjoon Hyun created SPARK-36004: - Summary: Update MiMa and audit API changes Key: SPARK-36004 URL: https://issues.apache.org/jira/browse/SPARK-36004 Project: Spark Issue Type: Bug Components: Project Infra Affects Versions: 3.2.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35974) Spark submit REST cluster/standalone mode - launching an s3a jar with STS
[ https://issues.apache.org/jira/browse/SPARK-35974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] t oo updated SPARK-35974: - Affects Version/s: (was: 2.4.6) 2.4.8 Description: {code:java} /var/lib/spark-2.4.8-bin-hadoop2.7/bin/spark-submit --master spark://myhost:6066 --conf spark.hadoop.fs.s3a.access.key='redact1' --conf spark.executorEnv.AWS_ACCESS_KEY_ID='redact1' --conf spark.driverEnv.AWS_ACCESS_KEY_ID='redact1' --conf spark.hadoop.fs.s3a.secret.key='redact2' --conf spark.executorEnv.AWS_SECRET_ACCESS_KEY='redact2' --conf spark.driverEnv.AWS_SECRET_ACCESS_KEY='redact2' --conf spark.hadoop.fs.s3a.session.token='redact3' --conf spark.executorEnv.AWS_SESSION_TOKEN='redact3' --conf spark.driverEnv.AWS_SESSION_TOKEN='redact3' --conf spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider --conf spark.driver.extraJavaOptions='-DAWS_ACCESS_KEY_ID=redact1 -DAWS_SECRET_ACCESS_KEY=redact2 -DAWS_SESSION_TOKEN=redact3' --conf spark.executor.extraJavaOptions='-DAWS_ACCESS_KEY_ID=redact1 -DAWS_SECRET_ACCESS_KEY=redact2 -DAWS_SESSION_TOKEN=redact3' --total-executor-cores 4 --executor-cores 2 --executor-memory 2g --driver-memory 1g --name lin1 --deploy-mode cluster --conf spark.eventLog.enabled=false --class com.yotpo.metorikku.Metorikku s3a://mybuc/metorikku_2.11.jar -c s3a://mybuc/spark_ingestion_job.yaml {code} running the above command give below stack trace: {code:java} Exception from the cluster:\njava.nio.file.AccessDeniedException: s3a://mybuc/metorikku_2.11.jar: getFileStatus on s3a://mybuc/metorikku_2.11.jar: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: xx; S3 Extended Request ID: /1qj/yy=), S3 Extended Request ID: /1qj/yy=\n\ org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:158) org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:101) org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1542) org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:117) org.apache.hadoop.fs.FileSystem.isFile(FileSystem.java:1463) org.apache.hadoop.fs.s3a.S3AFileSystem.isFile(S3AFileSystem.java:2030) org.apache.spark.util.Utils$.fetchHcfsFile(Utils.scala:747) org.apache.spark.util.Utils$.doFetchFile(Utils.scala:723) org.apache.spark.util.Utils$.fetchFile(Utils.scala:509) org.apache.spark.deploy.worker.DriverRunner.downloadUserJar(DriverRunner.scala:155) org.apache.spark.deploy.worker.DriverRunner.prepareAndRunDriver(DriverRunner.scala:173) org.apache.spark.deploy.worker.DriverRunner$$anon$1.run(DriverRunner.scala:92){code} all the ec2s in the spark cluster only have access to s3 via STS tokens. The jar itself reads csvs from s3 using the tokens, and everything works if either 1. i change the commandline to point to local jars on the ec2 OR 2. use port 7077/client mode instead of cluster mode. But it seems the jar itself can't be launched off s3, as if the tokens are not being picked up properly. 
was: {code:java} /var/lib/spark-2.3.4-bin-hadoop2.7/bin/spark-submit --master spark://myhost:6066 --conf spark.hadoop.fs.s3a.access.key='redact1' --conf spark.executorEnv.AWS_ACCESS_KEY_ID='redact1' --conf spark.driverEnv.AWS_ACCESS_KEY_ID='redact1' --conf spark.hadoop.fs.s3a.secret.key='redact2' --conf spark.executorEnv.AWS_SECRET_ACCESS_KEY='redact2' --conf spark.driverEnv.AWS_SECRET_ACCESS_KEY='redact2' --conf spark.hadoop.fs.s3a.session.token='redact3' --conf spark.executorEnv.AWS_SESSION_TOKEN='redact3' --conf spark.driverEnv.AWS_SESSION_TOKEN='redact3' --conf spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider --conf spark.driver.extraJavaOptions='-DAWS_ACCESS_KEY_ID=redact1 -DAWS_SECRET_ACCESS_KEY=redact2 -DAWS_SESSION_TOKEN=redact3' --conf spark.executor.extraJavaOptions='-DAWS_ACCESS_KEY_ID=redact1 -DAWS_SECRET_ACCESS_KEY=redact2 -DAWS_SESSION_TOKEN=redact3' --total-executor-cores 4 --executor-cores 2 --executor-memory 2g --driver-memory 1g --name lin1 --deploy-mode cluster --conf spark.eventLog.enabled=false --class com.yotpo.metorikku.Metorikku s3a://mybuc/metorikku_2.11.jar -c s3a://mybuc/spark_ingestion_job.yaml {code} running the above command give below stack trace: {code:java} Exception from the cluster:\njava.nio.file.AccessDeniedException: s3a://mybuc/metorikku_2.11.jar: getFileStatus on s3a://mybuc/metorikku_2.11.jar: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: xx; S3 Extended Request ID: /1qj/yy=), S3 Extended Request ID: /1qj/yy=\n\ org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:158) org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:101) org.apache.hadoop.fs.s3a.S3AFileSyst
[jira] [Reopened] (SPARK-35974) Spark submit REST cluster/standalone mode - launching an s3a jar with STS
[ https://issues.apache.org/jira/browse/SPARK-35974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] t oo reopened SPARK-35974: -- v2.4.8 is less than 2 months old > Spark submit REST cluster/standalone mode - launching an s3a jar with STS > - > > Key: SPARK-35974 > URL: https://issues.apache.org/jira/browse/SPARK-35974 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.8 >Reporter: t oo >Priority: Major > > {code:java} > /var/lib/spark-2.4.8-bin-hadoop2.7/bin/spark-submit --master > spark://myhost:6066 --conf spark.hadoop.fs.s3a.access.key='redact1' --conf > spark.executorEnv.AWS_ACCESS_KEY_ID='redact1' --conf > spark.driverEnv.AWS_ACCESS_KEY_ID='redact1' --conf > spark.hadoop.fs.s3a.secret.key='redact2' --conf > spark.executorEnv.AWS_SECRET_ACCESS_KEY='redact2' --conf > spark.driverEnv.AWS_SECRET_ACCESS_KEY='redact2' --conf > spark.hadoop.fs.s3a.session.token='redact3' --conf > spark.executorEnv.AWS_SESSION_TOKEN='redact3' --conf > spark.driverEnv.AWS_SESSION_TOKEN='redact3' --conf > spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider > --conf spark.driver.extraJavaOptions='-DAWS_ACCESS_KEY_ID=redact1 > -DAWS_SECRET_ACCESS_KEY=redact2 -DAWS_SESSION_TOKEN=redact3' --conf > spark.executor.extraJavaOptions='-DAWS_ACCESS_KEY_ID=redact1 > -DAWS_SECRET_ACCESS_KEY=redact2 -DAWS_SESSION_TOKEN=redact3' > --total-executor-cores 4 --executor-cores 2 --executor-memory 2g > --driver-memory 1g --name lin1 --deploy-mode cluster --conf > spark.eventLog.enabled=false --class com.yotpo.metorikku.Metorikku > s3a://mybuc/metorikku_2.11.jar -c s3a://mybuc/spark_ingestion_job.yaml > {code} > running the above command give below stack trace: > > {code:java} > Exception from the cluster:\njava.nio.file.AccessDeniedException: > s3a://mybuc/metorikku_2.11.jar: getFileStatus on > s3a://mybuc/metorikku_2.11.jar: > com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon > S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: xx; S3 Extended > Request ID: /1qj/yy=), S3 Extended Request ID: /1qj/yy=\n\ > org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:158) > org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:101) > org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1542) > org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:117) > org.apache.hadoop.fs.FileSystem.isFile(FileSystem.java:1463) > org.apache.hadoop.fs.s3a.S3AFileSystem.isFile(S3AFileSystem.java:2030) > org.apache.spark.util.Utils$.fetchHcfsFile(Utils.scala:747) > org.apache.spark.util.Utils$.doFetchFile(Utils.scala:723) > org.apache.spark.util.Utils$.fetchFile(Utils.scala:509) > org.apache.spark.deploy.worker.DriverRunner.downloadUserJar(DriverRunner.scala:155) > org.apache.spark.deploy.worker.DriverRunner.prepareAndRunDriver(DriverRunner.scala:173) > org.apache.spark.deploy.worker.DriverRunner$$anon$1.run(DriverRunner.scala:92){code} > all the ec2s in the spark cluster only have access to s3 via STS tokens. The > jar itself reads csvs from s3 using the tokens, and everything works if > either 1. i change the commandline to point to local jars on the ec2 OR 2. > use port 7077/client mode instead of cluster mode. But it seems the jar > itself can't be launched off s3, as if the tokens are not being picked up > properly. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
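For reference, a minimal PySpark sketch of the same s3a/STS configuration that the spark-submit command above passes via --conf; the credential values and the CSV path are placeholders, and this does not exercise the cluster-mode jar download path where the reported failure occurs.
{code:python}
# Minimal sketch only; <access-key>, <secret-key>, <session-token> and some.csv are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.hadoop.fs.s3a.aws.credentials.provider",
            "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
    .config("spark.hadoop.fs.s3a.access.key", "<access-key>")
    .config("spark.hadoop.fs.s3a.secret.key", "<secret-key>")
    .config("spark.hadoop.fs.s3a.session.token", "<session-token>")
    .getOrCreate()
)

# With these settings the application code can read from s3a (as the reporter observes);
# the reported failure happens earlier, when the standalone DriverRunner downloads the
# application jar from s3a in cluster mode and the STS credentials are not picked up.
df = spark.read.csv("s3a://mybuc/some.csv")
{code}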
[jira] [Commented] (SPARK-35993) Flaky test: org.apache.spark.sql.execution.streaming.state.RocksDBSuite.ensure that concurrent update and cleanup consistent versions
[ https://issues.apache.org/jira/browse/SPARK-35993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373855#comment-17373855 ] Jungtaek Lim commented on SPARK-35993: -- Thanks for reporting [~gsomogyi]! The test is marked as "ignored" for now. We will try to fix or remove the test via this JIRA issue. > Flaky test: > org.apache.spark.sql.execution.streaming.state.RocksDBSuite.ensure that > concurrent update and cleanup consistent versions > - > > Key: SPARK-35993 > URL: https://issues.apache.org/jira/browse/SPARK-35993 > Project: Spark > Issue Type: Bug > Components: Spark Core, Tests >Affects Versions: 3.2.0 >Reporter: Gabor Somogyi >Priority: Major > > Appeared in jenkins: > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140575/testReport/org.apache.spark.sql.execution.streaming.state/RocksDBSuite/ensure_that_concurrent_update_and_cleanup_consistent_versions/ > {code:java} > Error Message > java.io.FileNotFoundException: File > /home/jenkins/workspace/SparkPullRequestBuilder@2/target/tmp/spark-21674620-ac83-4ad3-a153-5a7adf909244/20.zip > does not exist > Stacktrace > sbt.ForkMain$ForkError: java.io.FileNotFoundException: File > /home/jenkins/workspace/SparkPullRequestBuilder@2/target/tmp/spark-21674620-ac83-4ad3-a153-5a7adf909244/20.zip > does not exist > at > org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:779) > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:1100) > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:769) > at > org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:462) > at > org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.(ChecksumFileSystem.java:160) > at > org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:372) > at org.apache.spark.DebugFilesystem.open(DebugFilesystem.scala:74) > at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:976) > at org.apache.spark.util.Utils$.unzipFilesFromFile(Utils.scala:3132) > at > org.apache.spark.sql.execution.streaming.state.RocksDBFileManager.loadCheckpointFromDfs(RocksDBFileManager.scala:174) > at > org.apache.spark.sql.execution.streaming.state.RocksDB.load(RocksDB.scala:103) > at > org.apache.spark.sql.execution.streaming.state.RocksDBSuite.withDB(RocksDBSuite.scala:443) > at > org.apache.spark.sql.execution.streaming.state.RocksDBSuite.$anonfun$new$57(RocksDBSuite.scala:397) > at org.apache.spark.sql.catalyst.util.package$.quietly(package.scala:42) > at > org.apache.spark.sql.execution.streaming.state.RocksDBSuite.$anonfun$new$56(RocksDBSuite.scala:341) > at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at > org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226) > at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:190) > at > org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224) > at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236) > at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > at > org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236) > at > 
org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218) > at > org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:62) > at > org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234) > at > org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227) > at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:62) > at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269) > at > org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413) > at scala.collection.immutable.List.foreach(List.scala:431) > at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396) > at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475) > at > org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269) > at > org.scalatest.funsuite.AnyFunSuiteLike.runTest
[jira] [Updated] (SPARK-35993) Flaky test: org.apache.spark.sql.execution.streaming.state.RocksDBSuite.ensure that concurrent update and cleanup consistent versions
[ https://issues.apache.org/jira/browse/SPARK-35993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim updated SPARK-35993: - Affects Version/s: (was: 3.1.2) 3.2.0 > Flaky test: > org.apache.spark.sql.execution.streaming.state.RocksDBSuite.ensure that > concurrent update and cleanup consistent versions > - > > Key: SPARK-35993 > URL: https://issues.apache.org/jira/browse/SPARK-35993 > Project: Spark > Issue Type: Bug > Components: Spark Core, Tests >Affects Versions: 3.2.0 >Reporter: Gabor Somogyi >Priority: Major > > Appeared in jenkins: > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140575/testReport/org.apache.spark.sql.execution.streaming.state/RocksDBSuite/ensure_that_concurrent_update_and_cleanup_consistent_versions/ > {code:java} > Error Message > java.io.FileNotFoundException: File > /home/jenkins/workspace/SparkPullRequestBuilder@2/target/tmp/spark-21674620-ac83-4ad3-a153-5a7adf909244/20.zip > does not exist > Stacktrace > sbt.ForkMain$ForkError: java.io.FileNotFoundException: File > /home/jenkins/workspace/SparkPullRequestBuilder@2/target/tmp/spark-21674620-ac83-4ad3-a153-5a7adf909244/20.zip > does not exist > at > org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:779) > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:1100) > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:769) > at > org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:462) > at > org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.(ChecksumFileSystem.java:160) > at > org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:372) > at org.apache.spark.DebugFilesystem.open(DebugFilesystem.scala:74) > at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:976) > at org.apache.spark.util.Utils$.unzipFilesFromFile(Utils.scala:3132) > at > org.apache.spark.sql.execution.streaming.state.RocksDBFileManager.loadCheckpointFromDfs(RocksDBFileManager.scala:174) > at > org.apache.spark.sql.execution.streaming.state.RocksDB.load(RocksDB.scala:103) > at > org.apache.spark.sql.execution.streaming.state.RocksDBSuite.withDB(RocksDBSuite.scala:443) > at > org.apache.spark.sql.execution.streaming.state.RocksDBSuite.$anonfun$new$57(RocksDBSuite.scala:397) > at org.apache.spark.sql.catalyst.util.package$.quietly(package.scala:42) > at > org.apache.spark.sql.execution.streaming.state.RocksDBSuite.$anonfun$new$56(RocksDBSuite.scala:341) > at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at > org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226) > at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:190) > at > org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224) > at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236) > at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > at > org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236) > at > org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218) > at > org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:62) > at > 
org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234) > at > org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227) > at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:62) > at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269) > at > org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413) > at scala.collection.immutable.List.foreach(List.scala:431) > at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396) > at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475) > at > org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269) > at > org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268) > at org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1563) > at org.
[jira] [Commented] (SPARK-33349) ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed
[ https://issues.apache.org/jira/browse/SPARK-33349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373849#comment-17373849 ] Jim Kleckner commented on SPARK-33349: -- [~redsk] can you confirm that https://issues.apache.org/jira/browse/SPARK-33471 fixes your issue with 4.12.0 ? > ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed > -- > > Key: SPARK-33349 > URL: https://issues.apache.org/jira/browse/SPARK-33349 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.0.1, 3.0.2, 3.1.0 >Reporter: Nicola Bova >Priority: Critical > > I launch my spark application with the > [spark-on-kubernetes-operator|https://github.com/GoogleCloudPlatform/spark-on-k8s-operator] > with the following yaml file: > {code:yaml} > apiVersion: sparkoperator.k8s.io/v1beta2 > kind: SparkApplication > metadata: > name: spark-kafka-streamer-test > namespace: kafka2hdfs > spec: > type: Scala > mode: cluster > image: /spark:3.0.2-SNAPSHOT-2.12-0.1.0 > imagePullPolicy: Always > timeToLiveSeconds: 259200 > mainClass: path.to.my.class.KafkaStreamer > mainApplicationFile: spark-kafka-streamer_2.12-spark300-assembly.jar > sparkVersion: 3.0.1 > restartPolicy: > type: Always > sparkConf: > "spark.kafka.consumer.cache.capacity": "8192" > "spark.kubernetes.memoryOverheadFactor": "0.3" > deps: > jars: > - my > - jar > - list > hadoopConfigMap: hdfs-config > driver: > cores: 4 > memory: 12g > labels: > version: 3.0.1 > serviceAccount: default > javaOptions: > "-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties" > executor: > instances: 4 > cores: 4 > memory: 16g > labels: > version: 3.0.1 > javaOptions: > "-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties" > {code} > I have tried with both Spark `3.0.1` and `3.0.2-SNAPSHOT` with the ["Restart > the watcher when we receive a version changed from > k8s"|https://github.com/apache/spark/pull/29533] patch. > This is the driver log: > {code} > 20/11/04 12:16:02 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable > ... // my app log, it's a structured streaming app reading from kafka and > writing to hdfs > 20/11/04 13:12:12 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has > been closed (this is expected if the application is shutting down.) > io.fabric8.kubernetes.client.KubernetesClientException: too old resource > version: 1574101276 (1574213896) > at > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259) > at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323) > at > okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219) > at > okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105) > at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274) > at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214) > at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203) > at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) > at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown > Source) > at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown > Source) > at java.base/java.lang.Thread.run(Unknown Source) > {code} > The error above appears after roughly 50 minutes. > After the exception above, no more logs are produced and the app hangs. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36003) Implement unary operator `invert` of numeric ps.Series/Index
Xinrong Meng created SPARK-36003: Summary: Implement unary operator `invert` of numeric ps.Series/Index Key: SPARK-36003 URL: https://issues.apache.org/jira/browse/SPARK-36003 Project: Spark Issue Type: Story Components: PySpark Affects Versions: 3.2.0 Reporter: Xinrong Meng {code:java} >>> ~ps.Series([1, 2, 3]) Traceback (most recent call last): ... pyspark.sql.utils.AnalysisException: cannot resolve '(NOT `0`)' due to data type mismatch: argument 1 requires boolean type, however, '`0`' is of bigint type.; 'Project [unresolvedalias(NOT 0#1L, Some(org.apache.spark.sql.Column$$Lambda$1365/2097273578@53165e1))] +- Project [__index_level_0__#0L, 0#1L, monotonically_increasing_id() AS __natural_order__#4L] +- LogicalRDD [__index_level_0__#0L, 0#1L], false {code} Currently, unary operator `invert` of numeric ps.Series/Index is not supported. We ought to implement that following pandas' behaviors. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
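For context, a small plain-pandas sketch of the behavior to follow: unary invert on an integer Series is a bitwise NOT, i.e. ~x == -(x + 1).
{code:python}
# Plain pandas, for comparison with the failing pandas-on-Spark call above.
import pandas as pd

~pd.Series([1, 2, 3])
# 0   -2
# 1   -3
# 2   -4
# dtype: int64
{code}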
[jira] [Commented] (SPARK-36002) Consolidate tests for data-type-based operations of decimal Series
[ https://issues.apache.org/jira/browse/SPARK-36002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373839#comment-17373839 ] Xinrong Meng commented on SPARK-36002: -- What do you think about that? [~yikunkero] > Consolidate tests for data-type-based operations of decimal Series > -- > > Key: SPARK-36002 > URL: https://issues.apache.org/jira/browse/SPARK-36002 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Priority: Major > > Tests for data-type-based operations of decimal Series are in two places: > * python/pyspark/pandas/tests/data_type_ops/test_decimal_ops.py > * python/pyspark/pandas/tests/data_type_ops/test_num_ops.py > We'd better either merge test_decimal_ops into test_num_ops or keep all tests > related to decimal Series in test_decimal_ops. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36002) Consolidate tests for data-type-based operations of decimal Series
Xinrong Meng created SPARK-36002: Summary: Consolidate tests for data-type-based operations of decimal Series Key: SPARK-36002 URL: https://issues.apache.org/jira/browse/SPARK-36002 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.2.0 Reporter: Xinrong Meng Tests for data-type-based operations of decimal Series are in two places: * python/pyspark/pandas/tests/data_type_ops/test_decimal_ops.py * python/pyspark/pandas/tests/data_type_ops/test_num_ops.py We'd better either merge test_decimal_ops into test_num_ops or keep all tests related to decimal Series in test_decimal_ops. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36001) Assume result's index to be disordered in tests with operations on different Series
Xinrong Meng created SPARK-36001: Summary: Assume result's index to be disordered in tests with operations on different Series Key: SPARK-36001 URL: https://issues.apache.org/jira/browse/SPARK-36001 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.2.0 Reporter: Xinrong Meng We have many tests with operations on different Series in spark/python/pyspark/pandas/tests/data_type_ops/ that assume the result's index is sorted and then compare against pandas' behavior. That assumption is wrong, so we should sort the index of such results before comparing with pandas. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
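A minimal sketch of the proposed fix for such tests (the Series values and variable names below are placeholders, not taken from an actual test): sort the index of the pandas-on-Spark result before comparing with pandas.
{code:python}
# Sketch only; values and names are made up.
import pandas as pd
import pyspark.pandas as ps

pser1, pser2 = pd.Series([1, 2, 3]), pd.Series([10, 20, 30])
psser1, psser2 = ps.from_pandas(pser1), ps.from_pandas(pser2)

with ps.option_context("compute.ops_on_diff_frames", True):
    # The row order of the result is not guaranteed, so sort the index first.
    result = (psser1 + psser2).sort_index().to_pandas()

pd.testing.assert_series_equal(result, pser1 + pser2)
{code}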
[jira] [Created] (SPARK-36000) Support creating a ps.Series/Index with `Decimal('NaN')` with Arrow disabled
Xinrong Meng created SPARK-36000: Summary: Support creating a ps.Series/Index with `Decimal('NaN')` with Arrow disabled Key: SPARK-36000 URL: https://issues.apache.org/jira/browse/SPARK-36000 Project: Spark Issue Type: Story Components: PySpark Affects Versions: 3.2.0 Reporter: Xinrong Meng {code:java} >>> import decimal as d >>> import pyspark.pandas as ps >>> import numpy as np >>> ps.utils.default_session().conf.set('spark.sql.execution.arrow.pyspark.enabled', True) >>> ps.Series([d.Decimal(1.0), d.Decimal(2.0), d.Decimal(np.nan)]) 0 1 1 2 2 None dtype: object >>> ps.utils.default_session().conf.set('spark.sql.execution.arrow.pyspark.enabled', False) >>> ps.Series([d.Decimal(1.0), d.Decimal(2.0), d.Decimal(np.nan)]) 21/07/02 15:01:07 ERROR Executor: Exception in task 6.0 in stage 13.0 (TID 51) net.razorvine.pickle.PickleException: problem construction object: java.lang.reflect.InvocationTargetException ... {code} As shown in the code above, we cannot create a Series with `Decimal('NaN')` when Arrow is disabled. We ought to fix that. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
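For comparison, plain pandas accepts Decimal('NaN') in an object-dtype Series, which is roughly the behavior being asked for when Arrow is disabled.
{code:python}
# Plain pandas, for reference.
import decimal as d
import numpy as np
import pandas as pd

pd.Series([d.Decimal(1.0), d.Decimal(2.0), d.Decimal(np.nan)])
# 0      1
# 1      2
# 2    NaN
# dtype: object
{code}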
[jira] [Resolved] (SPARK-35996) Setting version to 3.3.0-SNAPSHOT
[ https://issues.apache.org/jira/browse/SPARK-35996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-35996. --- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 33196 [https://github.com/apache/spark/pull/33196] > Setting version to 3.3.0-SNAPSHOT > - > > Key: SPARK-35996 > URL: https://issues.apache.org/jira/browse/SPARK-35996 > Project: Spark > Issue Type: Task > Components: Build >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35996) Setting version to 3.3.0-SNAPSHOT
[ https://issues.apache.org/jira/browse/SPARK-35996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-35996: - Assignee: Dongjoon Hyun > Setting version to 3.3.0-SNAPSHOT > - > > Key: SPARK-35996 > URL: https://issues.apache.org/jira/browse/SPARK-35996 > Project: Spark > Issue Type: Task > Components: Build >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35995) Enable GitHub Action build_and_test on branch-3.2
[ https://issues.apache.org/jira/browse/SPARK-35995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373753#comment-17373753 ] Apache Spark commented on SPARK-35995: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/33197 > Enable GitHub Action build_and_test on branch-3.2 > - > > Key: SPARK-35995 > URL: https://issues.apache.org/jira/browse/SPARK-35995 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.2.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35995) Enable GitHub Action build_and_test on branch-3.2
[ https://issues.apache.org/jira/browse/SPARK-35995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373754#comment-17373754 ] Apache Spark commented on SPARK-35995: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/33197 > Enable GitHub Action build_and_test on branch-3.2 > - > > Key: SPARK-35995 > URL: https://issues.apache.org/jira/browse/SPARK-35995 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.2.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35999) Make from_csv/to_csv handle day-time intervals properly
Kousuke Saruta created SPARK-35999: -- Summary: Make from_csv/to_csv handle day-time intervals properly Key: SPARK-35999 URL: https://issues.apache.org/jira/browse/SPARK-35999 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0, 3.3.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta from_csv throws an exception if day-time interval types are given. {code} spark-sql> select from_csv("interval '1 2:3:4' day to second", "a interval day to second"); 21/07/03 04:39:13 ERROR SparkSQLDriver: Failed in [select from_csv("interval '1 2:3:4' day to second", "a interval day to second")] java.lang.Exception: Unsupported type: interval day to second at org.apache.spark.sql.errors.QueryExecutionErrors$.unsupportedTypeError(QueryExecutionErrors.scala:775) at org.apache.spark.sql.catalyst.csv.UnivocityParser.makeConverter(UnivocityParser.scala:224) at org.apache.spark.sql.catalyst.csv.UnivocityParser.$anonfun$valueConverters$1(UnivocityParser.scala:134) {code} Also, to_csv doesn't handle day-time interval types properly, though no exception is thrown. The result of to_csv for day-time interval types is not in ANSI-compliant interval form. {code} spark-sql> select to_csv(named_struct("a", interval '1 2:3:4' day to second)); 9378400 {code} The result above should be INTERVAL '1 02:03:04' DAY TO SECOND. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
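A sketch of the intended behavior, expressed through the Python API with the same queries as above; the expected outputs are taken from the description, not from a run against a fixed build.
{code:python}
# Sketch only; expected results are those stated in the description.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# to_csv should emit the ANSI day-time interval form instead of a raw numeric encoding:
spark.sql("SELECT to_csv(named_struct('a', interval '1 2:3:4' day to second))").show(truncate=False)
# expected: INTERVAL '1 02:03:04' DAY TO SECOND

# and from_csv should accept a day-time interval field in the schema instead of
# raising "Unsupported type: interval day to second":
spark.sql("""SELECT from_csv("interval '1 2:3:4' day to second", "a interval day to second")""")
{code}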
[jira] [Created] (SPARK-35998) Make from_csv/to_csv handle year-month intervals properly
Kousuke Saruta created SPARK-35998: -- Summary: Make from_csv/to_csv handle year-month intervals properly Key: SPARK-35998 URL: https://issues.apache.org/jira/browse/SPARK-35998 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0, 3.3.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta from_csv throws an exception if year-month interval types are given. {code} spark-sql> select from_csv("interval '1-2' year to month", "a interval year to month"); 21/07/03 04:32:24 ERROR SparkSQLDriver: Failed in [select from_csv("interval '1-2' year to month", "a interval year to month")] java.lang.Exception: Unsupported type: interval year to month at org.apache.spark.sql.errors.QueryExecutionErrors$.unsupportedTypeError(QueryExecutionErrors.scala:775) at org.apache.spark.sql.catalyst.csv.UnivocityParser.makeConverter(UnivocityParser.scala:224) at org.apache.spark.sql.catalyst.csv.UnivocityParser.$anonfun$valueConverters$1(UnivocityParser.scala:134) {code} Also, to_csv doesn't handle year-month interval types properly, though no exception is thrown. The result of to_csv for year-month interval types is not in ANSI-compliant interval form. {code} spark-sql> select to_csv(named_struct("a", interval '1-2' year to month)); 14 {code} The result above should be INTERVAL '1-2' YEAR TO MONTH. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35997) Implement comparison operators for CategoricalDtype in pandas API on Spark
Xinrong Meng created SPARK-35997: Summary: Implement comparison operators for CategoricalDtype in pandas API on Spark Key: SPARK-35997 URL: https://issues.apache.org/jira/browse/SPARK-35997 Project: Spark Issue Type: Story Components: PySpark Affects Versions: 3.2.0 Reporter: Xinrong Meng In pandas API on Spark, "<, <=, >, >=" have not been implemented for CategoricalDtype. We ought to match pandas' behavior. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
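For reference, the pandas behavior to match: comparison operators are defined for an ordered CategoricalDtype (comparisons on unordered categoricals raise a TypeError in pandas).
{code:python}
# Plain pandas, for comparison.
import pandas as pd

s = pd.Series(pd.Categorical(["a", "b", "c"], categories=["a", "b", "c"], ordered=True))
s < "b"
# 0     True
# 1    False
# 2    False
# dtype: bool
{code}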
[jira] [Commented] (SPARK-35996) Setting version to 3.3.0-SNAPSHOT
[ https://issues.apache.org/jira/browse/SPARK-35996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373714#comment-17373714 ] Apache Spark commented on SPARK-35996: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/33196 > Setting version to 3.3.0-SNAPSHOT > - > > Key: SPARK-35996 > URL: https://issues.apache.org/jira/browse/SPARK-35996 > Project: Spark > Issue Type: Task > Components: Build >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35996) Setting version to 3.3.0-SNAPSHOT
[ https://issues.apache.org/jira/browse/SPARK-35996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373712#comment-17373712 ] Apache Spark commented on SPARK-35996: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/33196 > Setting version to 3.3.0-SNAPSHOT > - > > Key: SPARK-35996 > URL: https://issues.apache.org/jira/browse/SPARK-35996 > Project: Spark > Issue Type: Task > Components: Build >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35996) Setting version to 3.3.0-SNAPSHOT
[ https://issues.apache.org/jira/browse/SPARK-35996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35996: Assignee: Apache Spark > Setting version to 3.3.0-SNAPSHOT > - > > Key: SPARK-35996 > URL: https://issues.apache.org/jira/browse/SPARK-35996 > Project: Spark > Issue Type: Task > Components: Build >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35996) Setting version to 3.3.0-SNAPSHOT
[ https://issues.apache.org/jira/browse/SPARK-35996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35996: Assignee: (was: Apache Spark) > Setting version to 3.3.0-SNAPSHOT > - > > Key: SPARK-35996 > URL: https://issues.apache.org/jira/browse/SPARK-35996 > Project: Spark > Issue Type: Task > Components: Build >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35996) Setting version to 3.3.0-SNAPSHOT
Dongjoon Hyun created SPARK-35996: - Summary: Setting version to 3.3.0-SNAPSHOT Key: SPARK-35996 URL: https://issues.apache.org/jira/browse/SPARK-35996 Project: Spark Issue Type: Task Components: Build Affects Versions: 3.3.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35990) Remove avro-sbt plugin dependency
[ https://issues.apache.org/jira/browse/SPARK-35990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-35990. --- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 33190 [https://github.com/apache/spark/pull/33190] > Remove avro-sbt plugin dependency > - > > Key: SPARK-35990 > URL: https://issues.apache.org/jira/browse/SPARK-35990 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > Fix For: 3.2.0 > > > avro-sbt plugin seems to be no longer used in build. > Let's consider to remove it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35995) Enable GitHub Action build_and_test on branch-3.2
[ https://issues.apache.org/jira/browse/SPARK-35995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-35995. --- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 33194 [https://github.com/apache/spark/pull/33194] > Enable GitHub Action build_and_test on branch-3.2 > - > > Key: SPARK-35995 > URL: https://issues.apache.org/jira/browse/SPARK-35995 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.2.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35995) Enable GitHub Action build_and_test on branch-3.2
[ https://issues.apache.org/jira/browse/SPARK-35995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-35995: - Assignee: Dongjoon Hyun > Enable GitHub Action build_and_test on branch-3.2 > - > > Key: SPARK-35995 > URL: https://issues.apache.org/jira/browse/SPARK-35995 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35994) Publish snapshot from branch-3.2
[ https://issues.apache.org/jira/browse/SPARK-35994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-35994: -- Fix Version/s: (was: 3.3.0) 3.2.0 > Publish snapshot from branch-3.2 > > > Key: SPARK-35994 > URL: https://issues.apache.org/jira/browse/SPARK-35994 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.2.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35994) Publish snapshot from branch-3.2
[ https://issues.apache.org/jira/browse/SPARK-35994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-35994. --- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 33192 [https://github.com/apache/spark/pull/33192] > Publish snapshot from branch-3.2 > > > Key: SPARK-35994 > URL: https://issues.apache.org/jira/browse/SPARK-35994 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35994) Publish snapshot from branch-3.2
[ https://issues.apache.org/jira/browse/SPARK-35994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-35994: - Assignee: Dongjoon Hyun > Publish snapshot from branch-3.2 > > > Key: SPARK-35994 > URL: https://issues.apache.org/jira/browse/SPARK-35994 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35995) Enable GitHub Action build_and_test on branch-3.2
[ https://issues.apache.org/jira/browse/SPARK-35995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373694#comment-17373694 ] Apache Spark commented on SPARK-35995: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/33194 > Enable GitHub Action build_and_test on branch-3.2 > - > > Key: SPARK-35995 > URL: https://issues.apache.org/jira/browse/SPARK-35995 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35785) Cleanup support for RocksDB instance
[ https://issues.apache.org/jira/browse/SPARK-35785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373695#comment-17373695 ] Apache Spark commented on SPARK-35785: -- User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/33195 > Cleanup support for RocksDB instance > > > Key: SPARK-35785 > URL: https://issues.apache.org/jira/browse/SPARK-35785 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Affects Versions: 3.2.0 >Reporter: Yuanjian Li >Assignee: Yuanjian Li >Priority: Major > Fix For: 3.2.0, 3.3.0 > > > Add the functionality of cleaning up files of old versions for the RocksDB > instance and RocksDBFileManager. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35981) Use check_exact=False in StatsTest.test_cov_corr_meta to loosen the check precision
[ https://issues.apache.org/jira/browse/SPARK-35981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373693#comment-17373693 ] Apache Spark commented on SPARK-35981: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/33193 > Use check_exact=False in StatsTest.test_cov_corr_meta to loosen the check > precision > --- > > Key: SPARK-35981 > URL: https://issues.apache.org/jira/browse/SPARK-35981 > Project: Spark > Issue Type: Test > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin >Priority: Major > Fix For: 3.2.0 > > > In some environment, the precision could be different in {{DataFrame.corr}} > function. > We should use {{check_exact=False}} to loosen the precision. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
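An illustrative use of check_exact=False with the pandas testing helpers; the values are made up, and the actual assertion helper used in the Spark test suite may differ.
{code:python}
# Sketch only; the DataFrames are illustrative.
import pandas as pd

left = pd.DataFrame({"corr": [0.1234567890123]})
right = pd.DataFrame({"corr": [0.1234567890124]})  # differs only in the last digit, within the default rtol

# check_exact=False compares floating-point values approximately rather than bit-for-bit.
pd.testing.assert_frame_equal(left, right, check_exact=False)
{code}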
[jira] [Assigned] (SPARK-35995) Enable GitHub Action build_and_test on branch-3.2
[ https://issues.apache.org/jira/browse/SPARK-35995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35995: Assignee: (was: Apache Spark) > Enable GitHub Action build_and_test on branch-3.2 > - > > Key: SPARK-35995 > URL: https://issues.apache.org/jira/browse/SPARK-35995 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35995) Enable GitHub Action build_and_test on branch-3.2
[ https://issues.apache.org/jira/browse/SPARK-35995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35995: Assignee: Apache Spark > Enable GitHub Action build_and_test on branch-3.2 > - > > Key: SPARK-35995 > URL: https://issues.apache.org/jira/browse/SPARK-35995 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35981) Use check_exact=False in StatsTest.test_cov_corr_meta to loosen the check precision
[ https://issues.apache.org/jira/browse/SPARK-35981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373691#comment-17373691 ] Apache Spark commented on SPARK-35981: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/33193 > Use check_exact=False in StatsTest.test_cov_corr_meta to loosen the check > precision > --- > > Key: SPARK-35981 > URL: https://issues.apache.org/jira/browse/SPARK-35981 > Project: Spark > Issue Type: Test > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin >Priority: Major > Fix For: 3.2.0 > > > In some environment, the precision could be different in {{DataFrame.corr}} > function. > We should use {{check_exact=False}} to loosen the precision. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35995) Enable GitHub Action build_and_test on branch-3.2
[ https://issues.apache.org/jira/browse/SPARK-35995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373692#comment-17373692 ] Apache Spark commented on SPARK-35995: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/33194 > Enable GitHub Action build_and_test on branch-3.2 > - > > Key: SPARK-35995 > URL: https://issues.apache.org/jira/browse/SPARK-35995 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35995) Enable GitHub Action build_and_test on branch-3.2
Dongjoon Hyun created SPARK-35995: - Summary: Enable GitHub Action build_and_test on branch-3.2 Key: SPARK-35995 URL: https://issues.apache.org/jira/browse/SPARK-35995 Project: Spark Issue Type: Bug Components: Project Infra Affects Versions: 3.2.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35994) Publish snapshot from branch-3.2
[ https://issues.apache.org/jira/browse/SPARK-35994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35994: Assignee: Apache Spark > Publish snapshot from branch-3.2 > > > Key: SPARK-35994 > URL: https://issues.apache.org/jira/browse/SPARK-35994 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35994) Publish snapshot from branch-3.2
[ https://issues.apache.org/jira/browse/SPARK-35994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35994: Assignee: (was: Apache Spark) > Publish snapshot from branch-3.2 > > > Key: SPARK-35994 > URL: https://issues.apache.org/jira/browse/SPARK-35994 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35994) Publish snapshot from branch-3.2
[ https://issues.apache.org/jira/browse/SPARK-35994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373665#comment-17373665 ] Apache Spark commented on SPARK-35994: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/33192 > Publish snapshot from branch-3.2 > > > Key: SPARK-35994 > URL: https://issues.apache.org/jira/browse/SPARK-35994 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35994) Publish snapshot from branch-3.2
Dongjoon Hyun created SPARK-35994: - Summary: Publish snapshot from branch-3.2 Key: SPARK-35994 URL: https://issues.apache.org/jira/browse/SPARK-35994 Project: Spark Issue Type: Bug Components: Project Infra Affects Versions: 3.2.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35992) Upgrade ORC to 1.6.9
[ https://issues.apache.org/jira/browse/SPARK-35992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-35992. --- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 33189 [https://github.com/apache/spark/pull/33189] > Upgrade ORC to 1.6.9 > > > Key: SPARK-35992 > URL: https://issues.apache.org/jira/browse/SPARK-35992 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Critical > Fix For: 3.2.0 > > > This issue aims to upgrade Apache ORC to 1.6.9 to bring ORC encryption > masking fix. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35992) Upgrade ORC to 1.6.9
[ https://issues.apache.org/jira/browse/SPARK-35992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-35992: - Assignee: Dongjoon Hyun > Upgrade ORC to 1.6.9 > > > Key: SPARK-35992 > URL: https://issues.apache.org/jira/browse/SPARK-35992 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Critical > > This issue aims to upgrade Apache ORC to 1.6.9 to bring ORC encryption > masking fix. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35985) File source V2 ignores partition filters when empty readDataSchema
[ https://issues.apache.org/jira/browse/SPARK-35985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35985: Assignee: (was: Apache Spark) > File source V2 ignores partition filters when empty readDataSchema > -- > > Key: SPARK-35985 > URL: https://issues.apache.org/jira/browse/SPARK-35985 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Steven Aerts >Priority: Major > > A V2 datasource fails to rely on partition filters when it only wants to know > how many entries there are, and is not interested of their context. > So when the {{readDataSchema}} of the {{FileScan}} is empty, partition > filters are not pushed down and all data is scanned. > Some examples where this happens: > {code:java} > scala> spark.sql("SELECT count(*) FROM parq WHERE day=20210702").explain > == Physical Plan == > *(2) HashAggregate(keys=[], functions=[count(1)]) > +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [id=#136] > +- *(1) HashAggregate(keys=[], functions=[partial_count(1)]) > +- *(1) Project > +- *(1) Filter (isnotnull(day#68) AND (day#68 = 20210702)) > +- *(1) ColumnarToRow > +- BatchScan[day#68] ParquetScan DataFilters: [], Format: parquet, Location: > InMemoryFileIndex[file:/..., PartitionFilters: [], PushedFilers: > [IsNotNull(day), EqualTo(day,20210702)], ReadSchema: struct<>, PushedFilters: > [IsNotNull(day), EqualTo(day,20210702)] > scala> spark.sql("SELECT input_file_name() FROM parq WHERE > day=20210702").explain > == Physical Plan == > *(1) Project [input_file_name() AS input_file_name()#131] > +- *(1) Filter (isnotnull(day#68) AND (day#68 = 20210702)) > +- *(1) ColumnarToRow > +- BatchScan[day#68] ParquetScan DataFilters: [], Format: parquet, Location: > InMemoryFileIndex[file:/..., PartitionFilters: [], PushedFilers: > [IsNotNull(day), EqualTo(day,20210702)], ReadSchema: struct<>, PushedFilters: > [IsNotNull(day), EqualTo(day,20210702)] > {code} > > Once the {{readDataSchema}} is not empty, it works correctly: > {code:java} > scala> spark.sql("SELECT header.tenant FROM parq WHERE day=20210702").explain > == Physical Plan == > *(1) Project [header#51.tenant AS tenant#199] > +- BatchScan[header#51, day#68] ParquetScan DataFilters: [], Format: parquet, > Location: InMemoryFileIndex[file:/..., PartitionFilters: [isnotnull(day#68), > (day#68 = 20210702)], PushedFilers: [IsNotNull(day), EqualTo(day,20210702)], > ReadSchema: struct>, PushedFilters: > [IsNotNull(day), EqualTo(day,20210702)]{code} > > In V1 this optimization is available: > {code:java} > scala> spark.sql("SELECT count(*) FROM parq WHERE day=20210702").explain > == Physical Plan == > *(2) HashAggregate(keys=[], functions=[count(1)]) > +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [id=#27] > +- *(1) HashAggregate(keys=[], functions=[partial_count(1)]) > +- *(1) Project > +- *(1) ColumnarToRow > +- FileScan parquet [year#15,month#16,day#17,hour#18] Batched: true, > DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[file:/..., > PartitionFilters: [isnotnull(day#17), (day#17 = 20210702)], PushedFilters: > [], ReadSchema: struct<>{code} > The examples use {{ParquetScan}}, but the problem happens for all File based > V2 datasources. > The fix for this issue feels very straight forward. 
In > {{PruneFileSourcePartitions}} queries with an empty {{readDataSchema}} are > explicitly excluded from being pushed down: > {code:java} > if filters.nonEmpty && scan.readDataSchema.nonEmpty =>{code} > Removing that condition seems to fix the issue however, this might be too > naive. > I am making a PR with tests where this change can be discussed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35985) File source V2 ignores partition filters when empty readDataSchema
[ https://issues.apache.org/jira/browse/SPARK-35985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35985: Assignee: Apache Spark > File source V2 ignores partition filters when empty readDataSchema > -- > > Key: SPARK-35985 > URL: https://issues.apache.org/jira/browse/SPARK-35985 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Steven Aerts >Assignee: Apache Spark >Priority: Major > > A V2 datasource fails to rely on partition filters when it only wants to know > how many entries there are, and is not interested of their context. > So when the {{readDataSchema}} of the {{FileScan}} is empty, partition > filters are not pushed down and all data is scanned. > Some examples where this happens: > {code:java} > scala> spark.sql("SELECT count(*) FROM parq WHERE day=20210702").explain > == Physical Plan == > *(2) HashAggregate(keys=[], functions=[count(1)]) > +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [id=#136] > +- *(1) HashAggregate(keys=[], functions=[partial_count(1)]) > +- *(1) Project > +- *(1) Filter (isnotnull(day#68) AND (day#68 = 20210702)) > +- *(1) ColumnarToRow > +- BatchScan[day#68] ParquetScan DataFilters: [], Format: parquet, Location: > InMemoryFileIndex[file:/..., PartitionFilters: [], PushedFilers: > [IsNotNull(day), EqualTo(day,20210702)], ReadSchema: struct<>, PushedFilters: > [IsNotNull(day), EqualTo(day,20210702)] > scala> spark.sql("SELECT input_file_name() FROM parq WHERE > day=20210702").explain > == Physical Plan == > *(1) Project [input_file_name() AS input_file_name()#131] > +- *(1) Filter (isnotnull(day#68) AND (day#68 = 20210702)) > +- *(1) ColumnarToRow > +- BatchScan[day#68] ParquetScan DataFilters: [], Format: parquet, Location: > InMemoryFileIndex[file:/..., PartitionFilters: [], PushedFilers: > [IsNotNull(day), EqualTo(day,20210702)], ReadSchema: struct<>, PushedFilters: > [IsNotNull(day), EqualTo(day,20210702)] > {code} > > Once the {{readDataSchema}} is not empty, it works correctly: > {code:java} > scala> spark.sql("SELECT header.tenant FROM parq WHERE day=20210702").explain > == Physical Plan == > *(1) Project [header#51.tenant AS tenant#199] > +- BatchScan[header#51, day#68] ParquetScan DataFilters: [], Format: parquet, > Location: InMemoryFileIndex[file:/..., PartitionFilters: [isnotnull(day#68), > (day#68 = 20210702)], PushedFilers: [IsNotNull(day), EqualTo(day,20210702)], > ReadSchema: struct>, PushedFilters: > [IsNotNull(day), EqualTo(day,20210702)]{code} > > In V1 this optimization is available: > {code:java} > scala> spark.sql("SELECT count(*) FROM parq WHERE day=20210702").explain > == Physical Plan == > *(2) HashAggregate(keys=[], functions=[count(1)]) > +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [id=#27] > +- *(1) HashAggregate(keys=[], functions=[partial_count(1)]) > +- *(1) Project > +- *(1) ColumnarToRow > +- FileScan parquet [year#15,month#16,day#17,hour#18] Batched: true, > DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[file:/..., > PartitionFilters: [isnotnull(day#17), (day#17 = 20210702)], PushedFilters: > [], ReadSchema: struct<>{code} > The examples use {{ParquetScan}}, but the problem happens for all File based > V2 datasources. > The fix for this issue feels very straight forward. 
In > {{PruneFileSourcePartitions}} queries with an empty {{readDataSchema}} are > explicitly excluded from being pushed down: > {code:java} > if filters.nonEmpty && scan.readDataSchema.nonEmpty =>{code} > Removing that condition seems to fix the issue however, this might be too > naive. > I am making a PR with tests where this change can be discussed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35985) File source V2 ignores partition filters when empty readDataSchema
[ https://issues.apache.org/jira/browse/SPARK-35985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373557#comment-17373557 ] Apache Spark commented on SPARK-35985: -- User 'steven-aerts' has created a pull request for this issue: https://github.com/apache/spark/pull/33191 > File source V2 ignores partition filters when empty readDataSchema > -- > > Key: SPARK-35985 > URL: https://issues.apache.org/jira/browse/SPARK-35985 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Steven Aerts >Priority: Major > > A V2 datasource fails to rely on partition filters when it only wants to know > how many entries there are, and is not interested of their context. > So when the {{readDataSchema}} of the {{FileScan}} is empty, partition > filters are not pushed down and all data is scanned. > Some examples where this happens: > {code:java} > scala> spark.sql("SELECT count(*) FROM parq WHERE day=20210702").explain > == Physical Plan == > *(2) HashAggregate(keys=[], functions=[count(1)]) > +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [id=#136] > +- *(1) HashAggregate(keys=[], functions=[partial_count(1)]) > +- *(1) Project > +- *(1) Filter (isnotnull(day#68) AND (day#68 = 20210702)) > +- *(1) ColumnarToRow > +- BatchScan[day#68] ParquetScan DataFilters: [], Format: parquet, Location: > InMemoryFileIndex[file:/..., PartitionFilters: [], PushedFilers: > [IsNotNull(day), EqualTo(day,20210702)], ReadSchema: struct<>, PushedFilters: > [IsNotNull(day), EqualTo(day,20210702)] > scala> spark.sql("SELECT input_file_name() FROM parq WHERE > day=20210702").explain > == Physical Plan == > *(1) Project [input_file_name() AS input_file_name()#131] > +- *(1) Filter (isnotnull(day#68) AND (day#68 = 20210702)) > +- *(1) ColumnarToRow > +- BatchScan[day#68] ParquetScan DataFilters: [], Format: parquet, Location: > InMemoryFileIndex[file:/..., PartitionFilters: [], PushedFilers: > [IsNotNull(day), EqualTo(day,20210702)], ReadSchema: struct<>, PushedFilters: > [IsNotNull(day), EqualTo(day,20210702)] > {code} > > Once the {{readDataSchema}} is not empty, it works correctly: > {code:java} > scala> spark.sql("SELECT header.tenant FROM parq WHERE day=20210702").explain > == Physical Plan == > *(1) Project [header#51.tenant AS tenant#199] > +- BatchScan[header#51, day#68] ParquetScan DataFilters: [], Format: parquet, > Location: InMemoryFileIndex[file:/..., PartitionFilters: [isnotnull(day#68), > (day#68 = 20210702)], PushedFilers: [IsNotNull(day), EqualTo(day,20210702)], > ReadSchema: struct>, PushedFilters: > [IsNotNull(day), EqualTo(day,20210702)]{code} > > In V1 this optimization is available: > {code:java} > scala> spark.sql("SELECT count(*) FROM parq WHERE day=20210702").explain > == Physical Plan == > *(2) HashAggregate(keys=[], functions=[count(1)]) > +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [id=#27] > +- *(1) HashAggregate(keys=[], functions=[partial_count(1)]) > +- *(1) Project > +- *(1) ColumnarToRow > +- FileScan parquet [year#15,month#16,day#17,hour#18] Batched: true, > DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[file:/..., > PartitionFilters: [isnotnull(day#17), (day#17 = 20210702)], PushedFilters: > [], ReadSchema: struct<>{code} > The examples use {{ParquetScan}}, but the problem happens for all File based > V2 datasources. > The fix for this issue feels very straight forward. 
In > {{PruneFileSourcePartitions}} queries with an empty {{readDataSchema}} are > explicitly excluded from being pushed down: > {code:java} > if filters.nonEmpty && scan.readDataSchema.nonEmpty =>{code} > Removing that condition seems to fix the issue however, this might be too > naive. > I am making a PR with tests where this change can be discussed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
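A minimal PySpark repro sketch of the behaviour reported in SPARK-35985 (paths and column names are made up, and clearing spark.sql.sources.useV1SourceList to force the V2 file source is an assumption of the sketch): the count(*) query leaves readDataSchema empty, so before the fix its plan shows empty PartitionFilters, unlike the query that reads a data column.
{code:python}
# Force the V2 (BatchScan) file source code path for parquet.
spark.conf.set("spark.sql.sources.useV1SourceList", "")

(spark.range(100)
    .selectExpr("id", "id % 2 AS day")
    .write.partitionBy("day").mode("overwrite").parquet("/tmp/parq"))

spark.read.parquet("/tmp/parq").createOrReplaceTempView("parq")

# Before the fix: empty PartitionFilters here, so every file is scanned.
spark.sql("SELECT count(*) FROM parq WHERE day = 1").explain()

# Non-empty readDataSchema: partition filters are pushed down as expected.
spark.sql("SELECT id FROM parq WHERE day = 1").explain()
{code}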
[jira] [Commented] (SPARK-34883) Setting CSV reader option "multiLine" to "true" causes URISyntaxException when colon is in file path
[ https://issues.apache.org/jira/browse/SPARK-34883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373552#comment-17373552 ] Mike Pieters commented on SPARK-34883: -- I've got the same error here when I try to run: {code:java} spark.read.csv(URL_ABFS_RAW + "/salesforce/Case/timestamp=2021-07-02 00:14:15.129481", header=True, multiLine=True) {code} I'm running Spark 3.0.1 > Setting CSV reader option "multiLine" to "true" causes URISyntaxException > when colon is in file path > > > Key: SPARK-34883 > URL: https://issues.apache.org/jira/browse/SPARK-34883 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0, 3.1.1 >Reporter: Brady Tello >Priority: Major > > Setting the CSV reader's "multiLine" option to "True" throws the following > exception when a ':' character is in the file path. > > {code:java} > java.net.URISyntaxException: Relative path in absolute URI: test:dir > {code} > I've tested this in both Spark 3.0.0 and Spark 3.1.1 and I get the same error > whether I use Scala, Python, or SQL. > The following code works fine: > > {code:java} > csvFile = "/FileStore/myDir/test:dir/pageviews_by_second.tsv" > tempDF = (spark.read.option("sep", "\t").csv(csvFile) > {code} > While the following code fails: > > {code:java} > csvFile = "/FileStore/myDir/test:dir/pageviews_by_second.tsv" > tempDF = (spark.read.option("sep", "\t").option("multiLine", > "True").csv(csvFile) > {code} > Full Stack Trace from Python: > > {code:java} > --- > IllegalArgumentException Traceback (most recent call last) > in > 3 csvFile = "/FileStore/myDir/test:dir/pageviews_by_second.tsv" > 4 > > 5 tempDF = (spark.read.option("sep", "\t").option("multiLine", "True") > /databricks/spark/python/pyspark/sql/readwriter.py in csv(self, path, schema, > sep, encoding, quote, escape, comment, header, inferSchema, > ignoreLeadingWhiteSpace, ignoreTrailingWhiteSpace, nullValue, nanValue, > positiveInf, negativeInf, dateFormat, timestampFormat, maxColumns, > maxCharsPerColumn, maxMalformedLogPerPartition, mode, > columnNameOfCorruptRecord, multiLine, charToEscapeQuoteEscaping, > samplingRatio, enforceSchema, emptyValue, locale, lineSep, pathGlobFilter, > recursiveFileLookup, modifiedBefore, modifiedAfter, unescapedQuoteHandling) > 735 path = [path] > 736 if type(path) == list: > --> 737 return > self._df(self._jreader.csv(self._spark._sc._jvm.PythonUtils.toSeq(path))) > 738 elif isinstance(path, RDD): > 739 def func(iterator): > /databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in > __call__(self, *args) > 1302 > 1303 answer = self.gateway_client.send_command(command) > -> 1304 return_value = get_return_value( > 1305 answer, self.gateway_client, self.target_id, self.name) > 1306 > /databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw) > 114 # Hide where the exception came from that shows a non-Pythonic > 115 # JVM exception message. > --> 116 raise converted from None > 117 else: > 118 raise IllegalArgumentException: java.net.URISyntaxException: Relative > path in absolute URI: test:dir > {code} > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
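A local repro sketch of the same failure, with made-up paths; the stack trace in SPARK-23814 below suggests why it only appears with multiLine=True, since that option routes the read through the whole-file (binary file) code path, which builds a Hadoop Path from the raw name and trips over the ':'.
{code:python}
import os
import pathlib

# A directory whose name contains a colon (legal on Linux local filesystems).
os.makedirs("/tmp/test:dir", exist_ok=True)
pathlib.Path("/tmp/test:dir/data.csv").write_text("a,b\n1,2\n")

# Works: the default (non-multiLine) CSV path handles the colon fine.
spark.read.option("header", "true").csv("/tmp/test:dir/data.csv").show()

# Fails as reported above:
# java.net.URISyntaxException: Relative path in absolute URI: test:dir
(spark.read.option("header", "true")
    .option("multiLine", "true")
    .csv("/tmp/test:dir/data.csv")
    .show())
{code}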
[jira] [Commented] (SPARK-23814) Couldn't read file with colon in name and new line character in one of the field.
[ https://issues.apache.org/jira/browse/SPARK-23814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373526#comment-17373526 ] Mike Pieters commented on SPARK-23814: -- I also got the same error in version 3.0.1 > Couldn't read file with colon in name and new line character in one of the > field. > - > > Key: SPARK-23814 > URL: https://issues.apache.org/jira/browse/SPARK-23814 > Project: Spark > Issue Type: Bug > Components: Spark Core, Spark Shell >Affects Versions: 2.2.0 >Reporter: bharath kumar avusherla >Priority: Major > > When the file name has colon and new line character in data, while reading > using spark.read.option("multiLine","true").csv("s3n://DirectoryPath/") > function. It is throwing *"**java.lang.IllegalArgumentException: > java.net.URISyntaxException: Relative path in absolute URI: > 2017-08-01T00:00:00Z.csv.gz"* error. If we remove the > option("multiLine","true"), it is working just fine though the file name has > colon in it. It is working fine, If i apply this option > *option("multiLine","true")* on any other file which doesn't have colon in > it. But when both are present (colon in file name and new line in the data), > it's not working. > {quote}java.lang.IllegalArgumentException: java.net.URISyntaxException: > Relative path in absolute URI: 2017-08-01T00:00:00Z.csv.gz > at org.apache.hadoop.fs.Path.initialize(Path.java:205) > at org.apache.hadoop.fs.Path.(Path.java:171) > at org.apache.hadoop.fs.Path.(Path.java:93) > at org.apache.hadoop.fs.Globber.glob(Globber.java:253) > at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1676) > at > org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:294) > at > org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265) > at > org.apache.spark.input.StreamFileInputFormat.setMinPartitions(PortableDataStream.scala:51) > at org.apache.spark.rdd.BinaryFileRDD.getPartitions(BinaryFileRDD.scala:46) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250) > at scala.Option.getOrElse(Option.scala:121) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:250) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250) > at scala.Option.getOrElse(Option.scala:121) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:250) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250) > at scala.Option.getOrElse(Option.scala:121) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:250) > at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1333) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) > at org.apache.spark.rdd.RDD.withScope(RDD.scala:362) > at org.apache.spark.rdd.RDD.take(RDD.scala:1327) > at > org.apache.spark.sql.execution.datasources.csv.MultiLineCSVDataSource$.infer(CSVDataSource.scala:224) > at > org.apache.spark.sql.execution.datasources.csv.CSVDataSource.inferSchema(CSVDataSource.scala:62) > at > 
org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.inferSchema(CSVFileFormat.scala:57) > at > org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:177) > at > org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:177) > at scala.Option.orElse(Option.scala:289) > at > org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:176) > at > org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:366) > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178) > at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:533) > at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:412) > ... 48 elided > Caused by: java.net.URISyntaxException: Relative path in absolute URI: > 2017-08-01T00:00:00Z.csv.gz > at java.net.URI.checkPath(URI.java:1823) > at java.net.URI.(URI.java:745) > at org.apache.hadoop.fs.Path.initialize(Path.java:202) > ... 86 more > {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005) --
[jira] [Created] (SPARK-35993) Flaky test: org.apache.spark.sql.execution.streaming.state.RocksDBSuite.ensure that concurrent update and cleanup consistent versions
Gabor Somogyi created SPARK-35993: - Summary: Flaky test: org.apache.spark.sql.execution.streaming.state.RocksDBSuite.ensure that concurrent update and cleanup consistent versions Key: SPARK-35993 URL: https://issues.apache.org/jira/browse/SPARK-35993 Project: Spark Issue Type: Bug Components: Spark Core, Tests Affects Versions: 3.1.2 Reporter: Gabor Somogyi Appeared in jenkins: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140575/testReport/org.apache.spark.sql.execution.streaming.state/RocksDBSuite/ensure_that_concurrent_update_and_cleanup_consistent_versions/ {code:java} Error Message java.io.FileNotFoundException: File /home/jenkins/workspace/SparkPullRequestBuilder@2/target/tmp/spark-21674620-ac83-4ad3-a153-5a7adf909244/20.zip does not exist Stacktrace sbt.ForkMain$ForkError: java.io.FileNotFoundException: File /home/jenkins/workspace/SparkPullRequestBuilder@2/target/tmp/spark-21674620-ac83-4ad3-a153-5a7adf909244/20.zip does not exist at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:779) at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:1100) at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:769) at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:462) at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.(ChecksumFileSystem.java:160) at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:372) at org.apache.spark.DebugFilesystem.open(DebugFilesystem.scala:74) at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:976) at org.apache.spark.util.Utils$.unzipFilesFromFile(Utils.scala:3132) at org.apache.spark.sql.execution.streaming.state.RocksDBFileManager.loadCheckpointFromDfs(RocksDBFileManager.scala:174) at org.apache.spark.sql.execution.streaming.state.RocksDB.load(RocksDB.scala:103) at org.apache.spark.sql.execution.streaming.state.RocksDBSuite.withDB(RocksDBSuite.scala:443) at org.apache.spark.sql.execution.streaming.state.RocksDBSuite.$anonfun$new$57(RocksDBSuite.scala:397) at org.apache.spark.sql.catalyst.util.package$.quietly(package.scala:42) at org.apache.spark.sql.execution.streaming.state.RocksDBSuite.$anonfun$new$56(RocksDBSuite.scala:341) at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226) at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:190) at org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224) at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236) at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) at org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236) at org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218) at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:62) at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234) at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227) at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:62) at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269) at 
org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413) at scala.collection.immutable.List.foreach(List.scala:431) at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396) at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475) at org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269) at org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268) at org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1563) at org.scalatest.Suite.run(Suite.scala:1112) at org.scalatest.Suite.run$(Suite.scala:1094) at org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1563) at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:273) at org.scalatest.SuperEngine.runImpl(Engine.scala:535) at org.scalatest.funsuite.AnyFunSuiteLike.
[jira] [Commented] (SPARK-35990) Remove avro-sbt plugin dependency
[ https://issues.apache.org/jira/browse/SPARK-35990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373443#comment-17373443 ] Apache Spark commented on SPARK-35990: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/33190 > Remove avro-sbt plugin dependency > - > > Key: SPARK-35990 > URL: https://issues.apache.org/jira/browse/SPARK-35990 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > avro-sbt plugin seems to be no longer used in build. > Let's consider to remove it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35990) Remove avro-sbt plugin dependency
[ https://issues.apache.org/jira/browse/SPARK-35990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35990: Assignee: Apache Spark (was: Kousuke Saruta) > Remove avro-sbt plugin dependency > - > > Key: SPARK-35990 > URL: https://issues.apache.org/jira/browse/SPARK-35990 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Assignee: Apache Spark >Priority: Minor > > avro-sbt plugin seems to be no longer used in build. > Let's consider to remove it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35990) Remove avro-sbt plugin dependency
[ https://issues.apache.org/jira/browse/SPARK-35990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373442#comment-17373442 ] Apache Spark commented on SPARK-35990: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/33190 > Remove avro-sbt plugin dependency > - > > Key: SPARK-35990 > URL: https://issues.apache.org/jira/browse/SPARK-35990 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > avro-sbt plugin seems to be no longer used in build. > Let's consider to remove it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35990) Remove avro-sbt plugin dependency
[ https://issues.apache.org/jira/browse/SPARK-35990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35990: Assignee: Kousuke Saruta (was: Apache Spark) > Remove avro-sbt plugin dependency > - > > Key: SPARK-35990 > URL: https://issues.apache.org/jira/browse/SPARK-35990 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > avro-sbt plugin seems to be no longer used in build. > Let's consider to remove it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35992) Upgrade ORC to 1.6.9
[ https://issues.apache.org/jira/browse/SPARK-35992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-35992: -- Description: This issue aims to upgrade Apache ORC to 1.6.9 to bring ORC encryption masking fix. > Upgrade ORC to 1.6.9 > > > Key: SPARK-35992 > URL: https://issues.apache.org/jira/browse/SPARK-35992 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Priority: Critical > > This issue aims to upgrade Apache ORC to 1.6.9 to bring ORC encryption > masking fix. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35992) Upgrade ORC to 1.6.9
[ https://issues.apache.org/jira/browse/SPARK-35992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-35992: -- Priority: Critical (was: Major) > Upgrade ORC to 1.6.9 > > > Key: SPARK-35992 > URL: https://issues.apache.org/jira/browse/SPARK-35992 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Priority: Critical > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35992) Upgrade ORC to 1.6.9
[ https://issues.apache.org/jira/browse/SPARK-35992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35992: Assignee: (was: Apache Spark) > Upgrade ORC to 1.6.9 > > > Key: SPARK-35992 > URL: https://issues.apache.org/jira/browse/SPARK-35992 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35992) Upgrade ORC to 1.6.9
[ https://issues.apache.org/jira/browse/SPARK-35992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35992: Assignee: Apache Spark > Upgrade ORC to 1.6.9 > > > Key: SPARK-35992 > URL: https://issues.apache.org/jira/browse/SPARK-35992 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35992) Upgrade ORC to 1.6.9
[ https://issues.apache.org/jira/browse/SPARK-35992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373439#comment-17373439 ] Apache Spark commented on SPARK-35992: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/33189 > Upgrade ORC to 1.6.9 > > > Key: SPARK-35992 > URL: https://issues.apache.org/jira/browse/SPARK-35992 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35992) Upgrade ORC to 1.6.9
Dongjoon Hyun created SPARK-35992: - Summary: Upgrade ORC to 1.6.9 Key: SPARK-35992 URL: https://issues.apache.org/jira/browse/SPARK-35992 Project: Spark Issue Type: Bug Components: Build Affects Versions: 3.2.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35991) Add PlanStability suite for TPCH
angerszhu created SPARK-35991: - Summary: Add PlanStability suite for TPCH Key: SPARK-35991 URL: https://issues.apache.org/jira/browse/SPARK-35991 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.1.2, 3.2.0 Reporter: angerszhu -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35990) Remove avro-sbt plugin dependency
Kousuke Saruta created SPARK-35990: -- Summary: Remove avro-sbt plugin dependency Key: SPARK-35990 URL: https://issues.apache.org/jira/browse/SPARK-35990 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.2.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta The avro-sbt plugin no longer seems to be used in the build. Let's consider removing it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34632) Can we create 'SessionState' with a username in 'HiveClientImpl'
[ https://issues.apache.org/jira/browse/SPARK-34632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373428#comment-17373428 ] HonglunChen commented on SPARK-34632: - Yes, we can do that. I just want Spark to support this by default, and it has no effect on Spark at all. > Can we create 'SessionState' with a username in 'HiveClientImpl' > > > Key: SPARK-34632 > URL: https://issues.apache.org/jira/browse/SPARK-34632 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: HonglunChen >Priority: Minor > > [https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L165] > Like this: > val state = new SessionState(hiveConf, userName) > We can then easily use the Hive Authorization through the user information in > the 'SessionState'. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35989) Do not remove REPARTITION_BY_NUM shuffle if AQE is enabled
[ https://issues.apache.org/jira/browse/SPARK-35989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373408#comment-17373408 ] Apache Spark commented on SPARK-35989: -- User 'ulysses-you' has created a pull request for this issue: https://github.com/apache/spark/pull/33188 > Do not remove REPARTITION_BY_NUM shuffle if AQE is enabled > -- > > Key: SPARK-35989 > URL: https://issues.apache.org/jira/browse/SPARK-35989 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: XiDuo You >Priority: Major > > The shuffle origin is `REPARTITION_BY_NUM` if user specify an exact partition > number with repartition, then we should not do any change of the number. That > said, the shuffle output partitioning number should be always same with user > expected. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
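A PySpark sketch of the user-facing expectation behind SPARK-35989 (column names are made up; the shuffle-origin names follow the description above): an explicit numPartitions should survive AQE unchanged, while repartitioning by column alone leaves AQE free to pick the number.
{code:python}
df = spark.range(1000).selectExpr("id", "id % 10 AS key")

by_num = df.repartition(100, "key")   # REPARTITION_BY_NUM: the user pinned 100
by_col = df.repartition("key")        # REPARTITION_BY_COL: AQE may coalesce

# The guarantee this ticket is about: exactly 100 output partitions,
# even with spark.sql.adaptive.enabled=true.
assert by_num.rdd.getNumPartitions() == 100
{code}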
[jira] [Assigned] (SPARK-35989) Do not remove REPARTITION_BY_NUM shuffle if AQE is enabled
[ https://issues.apache.org/jira/browse/SPARK-35989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35989: Assignee: Apache Spark > Do not remove REPARTITION_BY_NUM shuffle if AQE is enabled > -- > > Key: SPARK-35989 > URL: https://issues.apache.org/jira/browse/SPARK-35989 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: XiDuo You >Assignee: Apache Spark >Priority: Major > > The shuffle origin is `REPARTITION_BY_NUM` if user specify an exact partition > number with repartition, then we should not do any change of the number. That > said, the shuffle output partitioning number should be always same with user > expected. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35989) Do not remove REPARTITION_BY_NUM shuffle if AQE is enabled
[ https://issues.apache.org/jira/browse/SPARK-35989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35989: Assignee: (was: Apache Spark) > Do not remove REPARTITION_BY_NUM shuffle if AQE is enabled > -- > > Key: SPARK-35989 > URL: https://issues.apache.org/jira/browse/SPARK-35989 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: XiDuo You >Priority: Major > > The shuffle origin is `REPARTITION_BY_NUM` if user specify an exact partition > number with repartition, then we should not do any change of the number. That > said, the shuffle output partitioning number should be always same with user > expected. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34632) Can we create 'SessionState' with a username in 'HiveClientImpl'
[ https://issues.apache.org/jira/browse/SPARK-34632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373410#comment-17373410 ] dzcxzl commented on SPARK-34632: You can use the default Authenticator to get the username through ugi. hive.security.authenticator.manager=org.apache.hadoop.hive.ql.security.HadoopDefaultAuthenticator > Can we create 'SessionState' with a username in 'HiveClientImpl' > > > Key: SPARK-34632 > URL: https://issues.apache.org/jira/browse/SPARK-34632 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: HonglunChen >Priority: Minor > > [https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L165] > Like this: > val state = new SessionState(hiveConf, userName) > We can then easily use the Hive Authorization through the user information in > the 'SessionState'. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35989) Do not remove REPARTITION_BY_NUM shuffle if AQE is enabled
[ https://issues.apache.org/jira/browse/SPARK-35989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] XiDuo You updated SPARK-35989: -- Description: The shuffle origin is `REPARTITION_BY_NUM` if user specify an exact partition number with repartition, then we should not do any change of the number. That said, the shuffle output partitioning number should be always same with user expected. > Do not remove REPARTITION_BY_NUM shuffle if AQE is enabled > -- > > Key: SPARK-35989 > URL: https://issues.apache.org/jira/browse/SPARK-35989 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: XiDuo You >Priority: Major > > The shuffle origin is `REPARTITION_BY_NUM` if user specify an exact partition > number with repartition, then we should not do any change of the number. That > said, the shuffle output partitioning number should be always same with user > expected. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35989) Do not remove REPARTITION_BY_NUM shuffle if AQE is enabled
[ https://issues.apache.org/jira/browse/SPARK-35989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] XiDuo You updated SPARK-35989: -- Environment: (was: The shuffle origin is `REPARTITION_BY_NUM` if user specify an exact partition number with repartition, then we should not do any change of the number. That said, the shuffle output partitioning number should be always same with user expected.) > Do not remove REPARTITION_BY_NUM shuffle if AQE is enabled > -- > > Key: SPARK-35989 > URL: https://issues.apache.org/jira/browse/SPARK-35989 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: XiDuo You >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35989) Do not remove REPARTITION_BY_NUM shuffle if AQE is enabled
XiDuo You created SPARK-35989: - Summary: Do not remove REPARTITION_BY_NUM shuffle if AQE is enabled Key: SPARK-35989 URL: https://issues.apache.org/jira/browse/SPARK-35989 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Environment: The shuffle origin is `REPARTITION_BY_NUM` if user specify an exact partition number with repartition, then we should not do any change of the number. That said, the shuffle output partitioning number should be always same with user expected. Reporter: XiDuo You -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35988) The implementation for RocksDBStateStoreProvider
[ https://issues.apache.org/jira/browse/SPARK-35988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35988: Assignee: (was: Apache Spark) > The implementation for RocksDBStateStoreProvider > > > Key: SPARK-35988 > URL: https://issues.apache.org/jira/browse/SPARK-35988 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Affects Versions: 3.2.0 >Reporter: Yuanjian Li >Priority: Major > > Add the implementation for the RocksDBStateStoreProvider. It's the subclass > of StateStoreProvider that leverages all the functionalities implemented in > the RocksDB instance. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35988) The implementation for RocksDBStateStoreProvider
[ https://issues.apache.org/jira/browse/SPARK-35988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373396#comment-17373396 ] Apache Spark commented on SPARK-35988: -- User 'xuanyuanking' has created a pull request for this issue: https://github.com/apache/spark/pull/33187 > The implementation for RocksDBStateStoreProvider > > > Key: SPARK-35988 > URL: https://issues.apache.org/jira/browse/SPARK-35988 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Affects Versions: 3.2.0 >Reporter: Yuanjian Li >Priority: Major > > Add the implementation for the RocksDBStateStoreProvider. It's the subclass > of StateStoreProvider that leverages all the functionalities implemented in > the RocksDB instance. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35988) The implementation for RocksDBStateStoreProvider
[ https://issues.apache.org/jira/browse/SPARK-35988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35988: Assignee: Apache Spark > The implementation for RocksDBStateStoreProvider > > > Key: SPARK-35988 > URL: https://issues.apache.org/jira/browse/SPARK-35988 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Affects Versions: 3.2.0 >Reporter: Yuanjian Li >Assignee: Apache Spark >Priority: Major > > Add the implementation for the RocksDBStateStoreProvider. It's the subclass > of StateStoreProvider that leverages all the functionalities implemented in > the RocksDB instance. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35988) The implementation for RocksDBStateStoreProvider
[ https://issues.apache.org/jira/browse/SPARK-35988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373397#comment-17373397 ] Apache Spark commented on SPARK-35988: -- User 'xuanyuanking' has created a pull request for this issue: https://github.com/apache/spark/pull/33187 > The implementation for RocksDBStateStoreProvider > > > Key: SPARK-35988 > URL: https://issues.apache.org/jira/browse/SPARK-35988 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Affects Versions: 3.2.0 >Reporter: Yuanjian Li >Priority: Major > > Add the implementation for the RocksDBStateStoreProvider. It's the subclass > of StateStoreProvider that leverages all the functionalities implemented in > the RocksDB instance. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35988) The implementation for RocksDBStateStoreProvider
Yuanjian Li created SPARK-35988: --- Summary: The implementation for RocksDBStateStoreProvider Key: SPARK-35988 URL: https://issues.apache.org/jira/browse/SPARK-35988 Project: Spark Issue Type: Sub-task Components: Structured Streaming Affects Versions: 3.2.0 Reporter: Yuanjian Li Add the implementation for the RocksDBStateStoreProvider. It is a subclass of StateStoreProvider that leverages all the functionality implemented in the RocksDB instance. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35987) The ANSI flags of Sum and Avg should be kept after being copied
[ https://issues.apache.org/jira/browse/SPARK-35987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373365#comment-17373365 ] Apache Spark commented on SPARK-35987: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/33186 > The ANSI flags of Sum and Avg should be kept after being copied > --- > > Key: SPARK-35987 > URL: https://issues.apache.org/jira/browse/SPARK-35987 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > > For Views and UDFs, it is important to show consistent results even the ANSI > configuration is different in the running session. This is why many > expressions like 'Add'/'Divide'/'CAST' making the ANSI flag part of its case > class parameter list. > We should make it consistent for `Sum`/`Avg` -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35987) The ANSI flags of Sum and Avg should be kept after being copied
[ https://issues.apache.org/jira/browse/SPARK-35987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35987: Assignee: Apache Spark (was: Gengliang Wang) > The ANSI flags of Sum and Avg should be kept after being copied > --- > > Key: SPARK-35987 > URL: https://issues.apache.org/jira/browse/SPARK-35987 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Assignee: Apache Spark >Priority: Major > > For Views and UDFs, it is important to show consistent results even the ANSI > configuration is different in the running session. This is why many > expressions like 'Add'/'Divide'/'CAST' making the ANSI flag part of its case > class parameter list. > We should make it consistent for `Sum`/`Avg` -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35987) The ANSI flags of Sum and Avg should be kept after being copied
[ https://issues.apache.org/jira/browse/SPARK-35987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373364#comment-17373364 ] Apache Spark commented on SPARK-35987: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/33186 > The ANSI flags of Sum and Avg should be kept after being copied > --- > > Key: SPARK-35987 > URL: https://issues.apache.org/jira/browse/SPARK-35987 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > > For Views and UDFs, it is important to show consistent results even the ANSI > configuration is different in the running session. This is why many > expressions like 'Add'/'Divide'/'CAST' making the ANSI flag part of its case > class parameter list. > We should make it consistent for `Sum`/`Avg` -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35987) The ANSI flags of Sum and Avg should be kept after being copied
[ https://issues.apache.org/jira/browse/SPARK-35987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35987: Assignee: Gengliang Wang (was: Apache Spark) > The ANSI flags of Sum and Avg should be kept after being copied > --- > > Key: SPARK-35987 > URL: https://issues.apache.org/jira/browse/SPARK-35987 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > > For Views and UDFs, it is important to show consistent results even if the ANSI > configuration is different in the running session. This is why many > expressions like 'Add'/'Divide'/'CAST' make the ANSI flag part of their case > class parameter lists. > We should do the same for `Sum`/`Avg`. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35987) The ANSI flags of Sum and Avg should be kept after being copied
Gengliang Wang created SPARK-35987: -- Summary: The ANSI flags of Sum and Avg should be kept after being copied Key: SPARK-35987 URL: https://issues.apache.org/jira/browse/SPARK-35987 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Gengliang Wang Assignee: Gengliang Wang For Views and UDFs, it is important to show consistent results even if the ANSI configuration is different in the running session. This is why many expressions like 'Add'/'Divide'/'CAST' make the ANSI flag part of their case class parameter lists. We should do the same for `Sum`/`Avg`. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
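For readers outside the Catalyst codebase, the snippet below is only a Python analogy for the behaviour the ticket describes (the actual expressions are Scala case classes): a flag captured as a constructor parameter survives copying, while a flag re-read from ambient session configuration silently follows whichever session evaluates the copy. The names ansi_enabled and session_conf are illustrative, not Spark API.

{code:python}
# Analogy in Python for why the ANSI flag belongs in the parameter list.
from dataclasses import dataclass, replace

session_conf = {"ansi.enabled": True}  # stand-in for the active session's config


@dataclass(frozen=True)
class SumCapturedFlag:
    column: str
    ansi_enabled: bool  # constructor parameter: every copy keeps it


@dataclass(frozen=True)
class SumAmbientFlag:
    column: str

    @property
    def ansi_enabled(self):
        return session_conf["ansi.enabled"]  # re-read on every access


captured = SumCapturedFlag("x", ansi_enabled=session_conf["ansi.enabled"])
ambient = SumAmbientFlag("x")

session_conf["ansi.enabled"] = False    # e.g. a different running session
copied = replace(captured, column="y")  # copying keeps the captured flag

print(copied.ansi_enabled)   # True  -> consistent result for views/UDFs
print(ambient.ansi_enabled)  # False -> drifts with the current session
{code}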
[jira] [Resolved] (SPARK-35981) Use check_exact=False in StatsTest.test_cov_corr_meta to loosen the check precision
[ https://issues.apache.org/jira/browse/SPARK-35981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-35981. -- Fix Version/s: 3.2.0 Assignee: Takuya Ueshin Resolution: Fixed Fixed in https://github.com/apache/spark/pull/33179 > Use check_exact=False in StatsTest.test_cov_corr_meta to loosen the check > precision > --- > > Key: SPARK-35981 > URL: https://issues.apache.org/jira/browse/SPARK-35981 > Project: Spark > Issue Type: Test > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin >Priority: Major > Fix For: 3.2.0 > > > In some environments, the precision can differ in the {{DataFrame.corr}} > function. > We should use {{check_exact=False}} to loosen the precision check. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
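For readers unfamiliar with the pandas testing helpers, the standalone snippet below only demonstrates the check_exact=False behaviour the fix relies on; the real change is inside pyspark's StatsTest.test_cov_corr_meta, and the values here are made up.

{code:python}
# check_exact=False tolerates last-bit floating point differences across platforms.
import pandas as pd
from pandas.testing import assert_frame_equal

expected = pd.DataFrame({"corr": [0.3333333333333333]})
actual = pd.DataFrame({"corr": [0.33333333333333337]})  # platform-dependent rounding

# assert_frame_equal(expected, actual, check_exact=True)  # would raise AssertionError
assert_frame_equal(expected, actual, check_exact=False)   # passes within rtol/atol
{code}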
[jira] [Assigned] (SPARK-35986) fix pyspark.rdd.RDD.histogram's buckets argument
[ https://issues.apache.org/jira/browse/SPARK-35986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35986: Assignee: (was: Apache Spark) > fix pyspark.rdd.RDD.histogram's buckets argument > > > Key: SPARK-35986 > URL: https://issues.apache.org/jira/browse/SPARK-35986 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2 >Reporter: Tomas Pereira de Vasconcelos >Priority: Minor > Labels: PySpark, pyspark, stubs > Original Estimate: 1m > Remaining Estimate: 1m > > I originally opened an issue and created a PR in the > [https://github.com/zero323/pyspark-stubs] repository. > Issue: [https://github.com/zero323/pyspark-stubs/issues/548] > PR: [https://github.com/zero323/pyspark-stubs/pull/549] > — > The type hint for {{pyspark.rdd.RDD.histogram}}'s {{buckets}} argument should > be {{Union[int, List[T], Tuple[T, ...]]}} > From {{pyspark}} source: > {code:java} > if isinstance(buckets, int): > ... > elif isinstance(buckets, (list, tuple)): > ... > else: > raise TypeError("buckets should be a list or tuple or number(int or > long)") > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35986) fix pyspark.rdd.RDD.histogram's buckets argument
[ https://issues.apache.org/jira/browse/SPARK-35986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35986: Assignee: Apache Spark > fix pyspark.rdd.RDD.histogram's buckets argument > > > Key: SPARK-35986 > URL: https://issues.apache.org/jira/browse/SPARK-35986 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2 >Reporter: Tomas Pereira de Vasconcelos >Assignee: Apache Spark >Priority: Minor > Labels: PySpark, pyspark, stubs > Original Estimate: 1m > Remaining Estimate: 1m > > I originally opened an issue and created a PR in the > [https://github.com/zero323/pyspark-stubs] repository. > Issue: [https://github.com/zero323/pyspark-stubs/issues/548] > PR: [https://github.com/zero323/pyspark-stubs/pull/549] > — > The type hint for {{pyspark.rdd.RDD.histogram}}'s {{buckets}} argument should > be {{Union[int, List[T], Tuple[T, ...]]}} > From {{pyspark}} source: > {code:java} > if isinstance(buckets, int): > ... > elif isinstance(buckets, (list, tuple)): > ... > else: > raise TypeError("buckets should be a list or tuple or number(int or > long)") > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35986) fix pyspark.rdd.RDD.histogram's buckets argument
[ https://issues.apache.org/jira/browse/SPARK-35986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373352#comment-17373352 ] Tomas Pereira de Vasconcelos commented on SPARK-35986: -- Pull request: https://github.com/apache/spark/pull/33185 > fix pyspark.rdd.RDD.histogram's buckets argument > > > Key: SPARK-35986 > URL: https://issues.apache.org/jira/browse/SPARK-35986 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2 >Reporter: Tomas Pereira de Vasconcelos >Priority: Minor > Labels: PySpark, pyspark, stubs > Original Estimate: 1m > Remaining Estimate: 1m > > I originally opened an issue and created a PR in the > [https://github.com/zero323/pyspark-stubs] repository. > Issue: [https://github.com/zero323/pyspark-stubs/issues/548] > PR: [https://github.com/zero323/pyspark-stubs/pull/549] > — > The type hint for {{pyspark.rdd.RDD.histogram}}'s {{buckets}} argument should > be {{Union[int, List[T], Tuple[T, ...]]}} > From {{pyspark}} source: > {code:java} > if isinstance(buckets, int): > ... > elif isinstance(buckets, (list, tuple)): > ... > else: > raise TypeError("buckets should be a list or tuple or number(int or > long)") > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35986) fix pyspark.rdd.RDD.histogram's buckets argument
[ https://issues.apache.org/jira/browse/SPARK-35986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373353#comment-17373353 ] Apache Spark commented on SPARK-35986: -- User 'tpvasconcelos' has created a pull request for this issue: https://github.com/apache/spark/pull/33185 > fix pyspark.rdd.RDD.histogram's buckets argument > > > Key: SPARK-35986 > URL: https://issues.apache.org/jira/browse/SPARK-35986 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2 >Reporter: Tomas Pereira de Vasconcelos >Priority: Minor > Labels: PySpark, pyspark, stubs > Original Estimate: 1m > Remaining Estimate: 1m > > I originally opened an issue and created a PR in the > [https://github.com/zero323/pyspark-stubs] repository. > Issue: [https://github.com/zero323/pyspark-stubs/issues/548] > PR: [https://github.com/zero323/pyspark-stubs/pull/549] > — > The type hint for {{pyspark.rdd.RDD.histogram}}'s {{buckets}} argument should > be {{Union[int, List[T], Tuple[T, ...]]}} > From {{pyspark}} source: > {code:java} > if isinstance(buckets, int): > ... > elif isinstance(buckets, (list, tuple)): > ... > else: > raise TypeError("buckets should be a list or tuple or number(int or > long)") > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35986) fix pyspark.rdd.RDD.histogram's buckets argument
[ https://issues.apache.org/jira/browse/SPARK-35986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomas Pereira de Vasconcelos updated SPARK-35986: - Description: I originally opened an issue and created a PR in the [https://github.com/zero323/pyspark-stubs] repository. Issue: [https://github.com/zero323/pyspark-stubs/issues/548] PR: [https://github.com/zero323/pyspark-stubs/pull/549] — The type hint for {{pyspark.rdd.RDD.histogram}}'s {{buckets}} argument should be {{Union[int, List[T], Tuple[T, ...]]}} From {{pyspark}} source: {code:java} if isinstance(buckets, int): ... elif isinstance(buckets, (list, tuple)): ... else: raise TypeError("buckets should be a list or tuple or number(int or long)") {code} was: I originally opened an issue and created a PR in the [https://github.com/zero323/pyspark-stubs] repository. Issue: [https://github.com/zero323/pyspark-stubs/issues/548] PR: [https://github.com/zero323/pyspark-stubs/pull/549] --- The type hint for {{pyspark.rdd.RDD.histogram}}'s {{buckets}} argument should be {{Union[int, List[T], Tuple[T]]}} From {{pyspark}} source: {code:java} if isinstance(buckets, int): ... elif isinstance(buckets, (list, tuple)): ... else: raise TypeError("buckets should be a list or tuple or number(int or long)") {code} > fix pyspark.rdd.RDD.histogram's buckets argument > > > Key: SPARK-35986 > URL: https://issues.apache.org/jira/browse/SPARK-35986 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2 >Reporter: Tomas Pereira de Vasconcelos >Priority: Minor > Labels: PySpark, pyspark, stubs > Original Estimate: 1m > Remaining Estimate: 1m > > I originally opened an issue and created a PR in the > [https://github.com/zero323/pyspark-stubs] repository. > Issue: [https://github.com/zero323/pyspark-stubs/issues/548] > PR: [https://github.com/zero323/pyspark-stubs/pull/549] > — > The type hint for {{pyspark.rdd.RDD.histogram}}'s {{buckets}} argument should > be {{Union[int, List[T], Tuple[T, ...]]}} > From {{pyspark}} source: > {code:java} > if isinstance(buckets, int): > ... > elif isinstance(buckets, (list, tuple)): > ... > else: > raise TypeError("buckets should be a list or tuple or number(int or > long)") > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35986) fix pyspark.rdd.RDD.histogram's buckets argument
Tomas Pereira de Vasconcelos created SPARK-35986: Summary: fix pyspark.rdd.RDD.histogram's buckets argument Key: SPARK-35986 URL: https://issues.apache.org/jira/browse/SPARK-35986 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 3.1.2, 3.1.1, 3.1.0, 3.0.3, 3.0.2, 3.0.1, 3.0.0 Reporter: Tomas Pereira de Vasconcelos I originally opened an issue and created a PR in the [https://github.com/zero323/pyspark-stubs] repository. Issue: [https://github.com/zero323/pyspark-stubs/issues/548] PR: [https://github.com/zero323/pyspark-stubs/pull/549] --- The type hint for {{pyspark.rdd.RDD.histogram}}'s {{buckets}} argument should be {{Union[int, List[T], Tuple[T]]}} From {{pyspark}} source: {code:java} if isinstance(buckets, int): ... elif isinstance(buckets, (list, tuple)): ... else: raise TypeError("buckets should be a list or tuple or number(int or long)") {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
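To make the type-stub change concrete: Tuple[T] means a tuple of exactly one element, so a call such as rdd.histogram((0, 10, 20, 30)) was rejected by type checkers, while Tuple[T, ...] accepts a variable-length tuple. The minimal stub below is a sketch; the class wrapper and the return annotation are simplified rather than copied from pyspark.

{code:python}
from typing import List, Tuple, TypeVar, Union

T = TypeVar("T")


class RDDStub:
    # old, too narrow:  buckets: Union[int, List[T], Tuple[T]]
    def histogram(
        self, buckets: Union[int, List[T], Tuple[T, ...]]
    ) -> Tuple[List[T], List[int]]: ...
{code}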
[jira] [Updated] (SPARK-35785) Cleanup support for RocksDB instance
[ https://issues.apache.org/jira/browse/SPARK-35785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim updated SPARK-35785: - Fix Version/s: 3.2.0 > Cleanup support for RocksDB instance > > > Key: SPARK-35785 > URL: https://issues.apache.org/jira/browse/SPARK-35785 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Affects Versions: 3.2.0 >Reporter: Yuanjian Li >Assignee: Yuanjian Li >Priority: Major > Fix For: 3.2.0, 3.3.0 > > > Add the functionality of cleaning up files of old versions for the RocksDB > instance and RocksDBFileManager. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35785) Cleanup support for RocksDB instance
[ https://issues.apache.org/jira/browse/SPARK-35785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373319#comment-17373319 ] Apache Spark commented on SPARK-35785: -- User 'xuanyuanking' has created a pull request for this issue: https://github.com/apache/spark/pull/33184 > Cleanup support for RocksDB instance > > > Key: SPARK-35785 > URL: https://issues.apache.org/jira/browse/SPARK-35785 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Affects Versions: 3.2.0 >Reporter: Yuanjian Li >Assignee: Yuanjian Li >Priority: Major > Fix For: 3.3.0 > > > Add the functionality of cleaning up files of old versions for the RocksDB > instance and RocksDBFileManager. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
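As a rough illustration of what cleaning up files of old versions means here, the helper below deletes state directories older than a retention threshold. The directory layout (one numeric sub-directory per committed version) and the retention rule are assumptions made for the example; the actual logic lives in RocksDBFileManager and, unlike this sketch, has to account for files shared across retained versions.

{code:python}
# Illustrative sketch only, not the RocksDBFileManager code.
import shutil
from pathlib import Path


def cleanup_old_versions(checkpoint_dir: str, versions_to_retain: int = 2) -> None:
    """Delete version sub-directories beyond the newest `versions_to_retain`."""
    root = Path(checkpoint_dir)
    # assume one sub-directory per committed version, named by its numeric id
    versions = sorted(
        (int(p.name), p) for p in root.iterdir() if p.is_dir() and p.name.isdigit()
    )
    stale = versions[:-versions_to_retain] if versions_to_retain > 0 else versions
    for _, path in stale:
        shutil.rmtree(path)  # no retained version references these files any more
{code}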