[jira] [Created] (SPARK-47982) Update code style plugins to latest version

2024-04-24 Thread BingKun Pan (Jira)
BingKun Pan created SPARK-47982:
---

 Summary: Update code style plugins to latest version
 Key: SPARK-47982
 URL: https://issues.apache.org/jira/browse/SPARK-47982
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 4.0.0
Reporter: BingKun Pan









[jira] [Commented] (SPARK-46122) Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default

2024-04-24 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840644#comment-17840644
 ] 

Dongjoon Hyun commented on SPARK-46122:
---

I started the discussion thread for this issue:

- [https://lists.apache.org/thread/ylk96fg4lvn6klxhj6t6yh42lyqb8wmd]
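For context, a minimal PySpark sketch of what flipping this default means (the session setup and table name are illustrative, not taken from this issue): with `spark.sql.legacy.createHiveTableByDefault` set to `false`, a CREATE TABLE statement without an explicit USING or STORED AS clause creates a data source table in the default format instead of a Hive SerDe table.

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# After this change the legacy flag is expected to default to "false".
print(spark.conf.get("spark.sql.legacy.createHiveTableByDefault"))

# No USING / STORED AS clause: with the flag at false this becomes a data
# source table (spark.sql.sources.default), not a Hive SerDe table.
spark.sql("CREATE TABLE t_demo (id INT)")
spark.sql("DESCRIBE TABLE EXTENDED t_demo").show(truncate=False)
{code}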

> Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default
> -
>
> Key: SPARK-46122
> URL: https://issues.apache.org/jira/browse/SPARK-46122
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Closed] (SPARK-44304) Broadcast operation is not required when no parameters are specified

2024-04-24 Thread 7mming7 (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

7mming7 closed SPARK-44304.
---

> Broadcast operation is not required when no parameters are specified
> 
>
> Key: SPARK-44304
> URL: https://issues.apache.org/jira/browse/SPARK-44304
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: 7mming7
>Priority: Minor
>
> SPARK-14912 introduced the ability to broadcast data source parameters to 
> read and write operations. However, the broadcast is performed even when the 
> user does not specify any parameters, which has a significant performance 
> impact. We should avoid broadcasting the full Hadoop configuration when the 
> user does not specify any data source parameters.
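Below is a rough, generic sketch of the idea in PySpark (not the Spark internals touched by this issue): ship only the user-specified data source options to executors, and skip the broadcast entirely when the user supplied none. The option key is purely illustrative.

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical user-specified reader options; an empty dict means the user
# supplied nothing beyond the defaults already present on the driver.
user_options = {"fs.s3a.connection.maximum": "200"}

if user_options:
    # Broadcast only the small, user-specified delta, not the full Hadoop conf.
    opts_bc = spark.sparkContext.broadcast(user_options)
{code}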






[jira] [Updated] (SPARK-46122) Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default

2024-04-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-46122:
--
Summary: Set `spark.sql.legacy.createHiveTableByDefault` to `false` by 
default  (was: Set `spark.sql.legacy.createHiveTableByDefault` to false by 
default)

> Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default
> -
>
> Key: SPARK-46122
> URL: https://issues.apache.org/jira/browse/SPARK-46122
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-46122) Set `spark.sql.legacy.createHiveTableByDefault` to false by default

2024-04-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-46122:
--
Summary: Set `spark.sql.legacy.createHiveTableByDefault` to false by 
default  (was: Disable spark.sql.legacy.createHiveTableByDefault by default)

> Set `spark.sql.legacy.createHiveTableByDefault` to false by default
> ---
>
> Key: SPARK-46122
> URL: https://issues.apache.org/jira/browse/SPARK-46122
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-47962) Improve doc test in pyspark dataframe

2024-04-24 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47962.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46189
[https://github.com/apache/spark/pull/46189]

> Improve doc test in pyspark dataframe
> -
>
> Key: SPARK-47962
> URL: https://issues.apache.org/jira/browse/SPARK-47962
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Wei Liu
>Assignee: Wei Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> The doc test for the DataFrame observe API doesn't use a streaming DataFrame, 
> which is wrong. We should start a streaming DataFrame to make sure it runs.
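A minimal sketch of what such a doctest could look like (the rate source, noop sink, and metric names are assumptions for illustration, not the actual doctest added by the PR):

{code:python}
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# observe() with a string name works on streaming DataFrames; metrics are
# reported per micro-batch via QueryProgressEvent.observedMetrics.
stream_df = spark.readStream.format("rate").load()
observed = stream_df.observe("metrics", F.count(F.lit(1)).alias("cnt"),
                             F.max("value").alias("max_value"))

query = observed.writeStream.format("noop").start()
query.awaitTermination(10)  # let a few micro-batches run
query.stop()
{code}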






[jira] [Resolved] (SPARK-47979) Use Hive tables explicitly for Hive table capability tests

2024-04-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47979.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46211
[https://github.com/apache/spark/pull/46211]

> Use Hive tables explicitly for Hive table capability tests
> --
>
> Key: SPARK-47979
> URL: https://issues.apache.org/jira/browse/SPARK-47979
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-47921) Fix ExecuteJobTag creation in ExecuteHolder

2024-04-24 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-47921.
---
Fix Version/s: 3.5.2
   4.0.0
   Resolution: Fixed

Issue resolved by pull request 46140
[https://github.com/apache/spark/pull/46140]

> Fix ExecuteJobTag creation in ExecuteHolder
> ---
>
> Key: SPARK-47921
> URL: https://issues.apache.org/jira/browse/SPARK-47921
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.2, 4.0.0
>
>







[jira] [Updated] (SPARK-47979) Use Hive tables explicitly for Hive table capability tests

2024-04-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47979:
---
Labels: pull-request-available  (was: )

> Use Hive tables explicitly for Hive table capability tests
> --
>
> Key: SPARK-47979
> URL: https://issues.apache.org/jira/browse/SPARK-47979
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-47979) Use Hive table explicitly for Hive table capability tests

2024-04-24 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-47979:
-

 Summary: Use Hive table explicitly for Hive table capability tests
 Key: SPARK-47979
 URL: https://issues.apache.org/jira/browse/SPARK-47979
 Project: Spark
  Issue Type: Test
  Components: SQL, Tests
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun









[jira] [Updated] (SPARK-47979) Use Hive tables explicitly for Hive table capability tests

2024-04-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-47979:
--
Summary: Use Hive tables explicitly for Hive table capability tests  (was: 
Use Hive table explicitly for Hive table capability tests)

> Use Hive tables explicitly for Hive table capability tests
> --
>
> Key: SPARK-47979
> URL: https://issues.apache.org/jira/browse/SPARK-47979
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>







[jira] [Updated] (SPARK-47977) DateTimeUtils.timestampDiff and DateTimeUtils.timestampAdd should not throw INTERNAL_ERROR exception

2024-04-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47977:
---
Labels: pull-request-available  (was: )

> DateTimeUtils.timestampDiff and DateTimeUtils.timestampAdd should not throw 
> INTERNAL_ERROR exception
> 
>
> Key: SPARK-47977
> URL: https://issues.apache.org/jira/browse/SPARK-47977
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 4.0.0
>Reporter: Vitalii Li
>Priority: Major
>  Labels: pull-request-available
>
> https://github.com/apache/spark/pull/44263 converted `IllegalStateException` 
> to `InternalError` when the unit passed to DateTimeUtils.timestampDiff or 
> DateTimeUtils.timestampAdd is not from the sanctioned list.
> Originally, an incorrect unit would have been caught by the parser before the 
> expression was constructed. However, PySpark introduced PythonSQLUtils, which 
> creates these expressions without validation. As a result, `INTERNAL_ERROR` is 
> the wrong error class here; the failure should be converted to an execution 
> error with the correct error class instead.






[jira] [Updated] (SPARK-47842) Spark job relying over Hudi are blocked after one or zero commit

2024-04-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47842:
---
Labels: pull-request-available  (was: )

> Spark job relying over Hudi are blocked after one or zero commit
> 
>
> Key: SPARK-47842
> URL: https://issues.apache.org/jira/browse/SPARK-47842
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, Structured Streaming
>Affects Versions: 3.3.0
> Environment: Hudi version : 0.12.1-amzn-0
> Spark version : 3.3.0
> Hive version : 3.1.3
> Hadoop version : 3.3.3 amz
> Storage (HDFS/S3/GCS..) : S3
> Running on Docker? (yes/no) : no (EMR 6.9.0)
> Additional context
>Reporter: alessandro pontis
>Priority: Blocker
>  Labels: pull-request-available
> Attachments: console_spark.png
>
>
> Hello, we are seeing that some PySpark jobs relying on Hudi appear to be 
> blocked. In the Spark console we can see the situation shown in the 
> attachment: there are 71 completed jobs, but these are CDC processes that 
> should read from a Kafka topic continuously. We have verified that there are 
> messages queued on the Kafka topic. If we kill the application and restart 
> it, in some cases the job behaves normally and in other cases it remains 
> stuck.
> Our deployment conditions are the following:
> We read INSERT, UPDATE and DELETE operations from a Kafka topic and replicate 
> them into a target Hudi table stored on Hive via a PySpark job running 24/7.
>  
> PYSPARK WRITE
> df_source.writeStream.foreachBatch(foreach_batch_write_function)
>
> FOR EACH BATCH FUNCTION:
> # management of delete messages
> batchDF_deletes.write.format('hudi') \
>     .option('hoodie.datasource.write.operation', 'delete') \
>     .options(**hudiOptions_table) \
>     .mode('append') \
>     .save(S3_OUTPUT_PATH)
> # management of update and insert messages
> batchDF_upserts.write.format('org.apache.hudi') \
>     .option('hoodie.datasource.write.operation', 'upsert') \
>     .options(**hudiOptions_table) \
>     .mode('append') \
>     .save(S3_OUTPUT_PATH)
>  
> SPARK SUBMIT
> spark-submit --master yarn --deploy-mode cluster --num-executors 1 
> --executor-memory 1G --executor-cores 2 --conf 
> spark.dynamicAllocation.enabled=false --packages 
> org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2 --conf 
> spark.serializer=org.apache.spark.serializer.KryoSerializer --conf 
> spark.sql.hive.convertMetastoreParquet=false --jars 
> /usr/lib/hudi/hudi-spark-bundle.jar 
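For reference, a hedged sketch of wiring the reported batch function into the streaming write; the checkpoint location is illustrative and not taken from the report, and `df_source` / `foreach_batch_write_function` are the objects defined above.

{code:python}
# df_source and foreach_batch_write_function as defined in the report above.
query = (
    df_source.writeStream
    .foreachBatch(foreach_batch_write_function)
    .option("checkpointLocation", "s3://my-bucket/checkpoints/cdc_demo")
    .start()
)
query.awaitTermination()
{code}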






[jira] [Created] (SPARK-47977) DateTimeUtils.timestampDiff and DateTimeUtils.timestampAdd should not throw INTERNAL_ERROR exception

2024-04-24 Thread Vitalii Li (Jira)
Vitalii Li created SPARK-47977:
--

 Summary: DateTimeUtils.timestampDiff and 
DateTimeUtils.timestampAdd should not throw INTERNAL_ERROR exception
 Key: SPARK-47977
 URL: https://issues.apache.org/jira/browse/SPARK-47977
 Project: Spark
  Issue Type: Bug
  Components: PySpark, SQL
Affects Versions: 4.0.0
Reporter: Vitalii Li


https://github.com/apache/spark/pull/44263 converted `IllegalStateException` to 
`InternalError` when the unit passed to DateTimeUtils.timestampDiff or 
DateTimeUtils.timestampAdd is not from the sanctioned list.

Originally, an incorrect unit would have been caught by the parser before the 
expression was constructed. However, PySpark introduced PythonSQLUtils, which 
creates these expressions without validation. As a result, `INTERNAL_ERROR` is 
the wrong error class here; the failure should be converted to an execution 
error with the correct error class instead.
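A hedged illustration of the reported gap, assuming the PySpark helpers `timestamp_diff`/`timestamp_add` that build these expressions directly through PythonSQLUtils (bypassing the SQL parser); the bogus unit below is made up.

{code:python}
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.sql("SELECT timestamp'2024-01-01' AS t1, timestamp'2024-04-24' AS t2")

# A sanctioned unit works as expected.
df.select(F.timestamp_diff("DAY", "t1", "t2")).show()

# A unit outside the sanctioned list is never seen by the parser here, so it
# currently surfaces as an INTERNAL_ERROR at execution time instead of a
# proper user-facing error class; that is what this issue proposes to fix.
df.select(F.timestamp_diff("FORTNIGHT", "t1", "t2")).show()
{code}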






[jira] [Resolved] (SPARK-47583) SQL core: Migrate logError with variables to structured logging framework

2024-04-24 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-47583.

Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45969
[https://github.com/apache/spark/pull/45969]

> SQL core: Migrate logError with variables to structured logging framework
> -
>
> Key: SPARK-47583
> URL: https://issues.apache.org/jira/browse/SPARK-47583
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Gengliang Wang
>Assignee: Daniel
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-47939) Parameterized queries fail for DESCRIBE & EXPLAIN w/ UNBOUND_SQL_PARAMETER error

2024-04-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47939:
---
Labels: pull-request-available  (was: )

> Parameterized queries fail for DESCRIBE & EXPLAIN w/ UNBOUND_SQL_PARAMETER 
> error
> 
>
> Key: SPARK-47939
> URL: https://issues.apache.org/jira/browse/SPARK-47939
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Vladimir Golubev
>Priority: Major
>  Labels: pull-request-available
>
> *Succeeds:* scala> spark.sql("select ?", Array(1)).show();
> *Fails:* spark.sql("describe select ?", Array(1)).show();
> *Fails:* spark.sql("explain select ?", Array(1)).show();
> Failures are of the form:
> org.apache.spark.sql.catalyst.ExtendedAnalysisException: 
> [UNBOUND_SQL_PARAMETER] Found the unbound parameter: _16. Please, fix `args` 
> and provide a mapping of the parameter to either a SQL literal or collection 
> constructor functions such as `map()`, `array()`, `struct()`. SQLSTATE: 
> 42P02; line 1 pos 16; 'Project [unresolvedalias(posparameter(16))] +- 
> OneRowRelation
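The same repro in PySpark, assuming the positional-parameter form of `spark.sql` available in recent releases:

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sql("select ?", args=[1]).show()            # succeeds: parameter is bound
spark.sql("describe select ?", args=[1]).show()   # reported to fail with UNBOUND_SQL_PARAMETER
spark.sql("explain select ?", args=[1]).show()    # reported to fail with UNBOUND_SQL_PARAMETER
{code}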






[jira] [Assigned] (SPARK-45265) Support Hive 4.0 metastore

2024-04-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-45265:
-

Assignee: (was: Attila Zsolt Piros)

> Support Hive 4.0 metastore
> --
>
> Key: SPARK-45265
> URL: https://issues.apache.org/jira/browse/SPARK-45265
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Attila Zsolt Piros
>Priority: Major
>  Labels: pull-request-available
>
> Although Hive 4.0.0 is still in beta, I would like to work on this, as Hive 
> 4.0.0 will support the pushdown of partition column filters with VARCHAR/CHAR 
> types.
> For details please see HIVE-26661: Support partition filter for char and 
> varchar types on Hive metastore
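For reference, connecting to an external Hive metastore is driven by existing configs; below is a hedged sketch of what targeting Hive 4.0 could look like once this issue lands ("4.0.0" is not a valid value today, it is the hypothetical outcome of this work):

{code:python}
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # Hypothetical value; only valid once Hive 4.0 metastore support is added.
    .config("spark.sql.hive.metastore.version", "4.0.0")
    .config("spark.sql.hive.metastore.jars", "maven")
    .enableHiveSupport()
    .getOrCreate()
)
{code}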






[jira] [Updated] (SPARK-44677) Drop legacy Hive-based ORC file format

2024-04-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-44677:
--
Parent: (was: SPARK-44111)
Issue Type: Task  (was: Sub-task)

> Drop legacy Hive-based ORC file format
> --
>
> Key: SPARK-44677
> URL: https://issues.apache.org/jira/browse/SPARK-44677
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Cheng Pan
>Priority: Major
>
> Currently, Spark allows using spark.sql.orc.impl=native/hive to switch the 
> ORC FileFormat implementation.
> SPARK-23456 (2.4) switched the default value of spark.sql.orc.impl from "hive" 
> to "native" and prepared to drop the "hive" implementation in the future.
> > ... eventually, Apache Spark will drop old Hive-based ORC code.
> The native implementation has worked well throughout the Spark 3.x period, so 
> it's a good time to consider dropping the "hive" one in Spark 4.0.
> Also, we should take care of backward compatibility during the change.
> > BTW, IIRC, there was a difference in the Hive ORC CHAR implementation 
> > before, so we couldn't remove it for backward-compatibility reasons. Since 
> > Spark implements many CHAR features, we need to re-verify that the {{native}} 
> > implementation has all legacy Hive-based ORC features.
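A small sketch of the switch in question (paths are illustrative): `spark.sql.orc.impl` selects between the native ORC reader/writer and the legacy Hive-based one that this issue proposes to drop.

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# "native" has been the default since Spark 2.4; "hive" is the legacy path.
spark.conf.set("spark.sql.orc.impl", "native")

spark.range(10).write.mode("overwrite").orc("/tmp/orc_native_demo")
spark.read.orc("/tmp/orc_native_demo").show()
{code}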






[jira] [Commented] (SPARK-47499) Reuse `test_help_command` in Connect

2024-04-24 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840514#comment-17840514
 ] 

Dongjoon Hyun commented on SPARK-47499:
---

Thank you for adding this to the umbrella Jira, [~podongfeng].

> Reuse `test_help_command` in Connect
> 
>
> Key: SPARK-47499
> URL: https://issues.apache.org/jira/browse/SPARK-47499
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-47633) Cache miss for queries using JOIN LATERAL with join condition

2024-04-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-47633:
-

Assignee: Bruce Robbins

> Cache miss for queries using JOIN LATERAL with join condition
> -
>
> Key: SPARK-47633
> URL: https://issues.apache.org/jira/browse/SPARK-47633
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.2, 4.0.0, 3.5.1
>Reporter: Bruce Robbins
>Assignee: Bruce Robbins
>Priority: Major
>  Labels: pull-request-available
>
> For example:
> {noformat}
> CREATE or REPLACE TEMP VIEW t1(c1, c2) AS VALUES (0, 1), (1, 2);
> CREATE or REPLACE TEMP VIEW t2(c1, c2) AS VALUES (0, 1), (1, 2);
> create or replace temp view v1 as
> select *
> from t1
> join lateral (
>   select c1 as a, c2 as b
>   from t2)
> on c1 = a;
> cache table v1;
> explain select * from v1;
> == Physical Plan ==
> AdaptiveSparkPlan isFinalPlan=false
> +- BroadcastHashJoin [c1#180], [a#173], Inner, BuildRight, false
>:- LocalTableScan [c1#180, c2#181]
>+- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, 
> false] as bigint)),false), [plan_id=113]
>   +- LocalTableScan [a#173, b#174]
> {noformat}
> Note that there is no {{InMemoryRelation}}.
> However, if you move the join condition into the subquery, the cached plan is 
> used:
> {noformat}
> CREATE or REPLACE TEMP VIEW t1(c1, c2) AS VALUES (0, 1), (1, 2);
> CREATE or REPLACE TEMP VIEW t2(c1, c2) AS VALUES (0, 1), (1, 2);
> create or replace temp view v2 as
> select *
> from t1
> join lateral (
>   select c1 as a, c2 as b
>   from t2
>   where t1.c1 = t2.c1);
> cache table v2;
> explain select * from v2;
> == Physical Plan ==
> AdaptiveSparkPlan isFinalPlan=false
> +- Scan In-memory table v2 [c1#176, c2#177, a#178, b#179]
>   +- InMemoryRelation [c1#176, c2#177, a#178, b#179], StorageLevel(disk, 
> memory, deserialized, 1 replicas)
> +- AdaptiveSparkPlan isFinalPlan=true
>+- == Final Plan ==
>   *(1) Project [c1#26, c2#27, a#19, b#20]
>   +- *(1) BroadcastHashJoin [c1#26], [c1#30], Inner, 
> BuildLeft, false
>  :- BroadcastQueryStage 0
>  :  +- BroadcastExchange 
> HashedRelationBroadcastMode(List(cast(input[0, int, false] as 
> bigint)),false), [plan_id=37]
>  : +- LocalTableScan [c1#26, c2#27]
>  +- *(1) LocalTableScan [a#19, b#20, c1#30]
>+- == Initial Plan ==
>   Project [c1#26, c2#27, a#19, b#20]
>   +- BroadcastHashJoin [c1#26], [c1#30], Inner, BuildLeft, 
> false
>  :- BroadcastExchange 
> HashedRelationBroadcastMode(List(cast(input[0, int, false] as 
> bigint)),false), [plan_id=37]
>  :  +- LocalTableScan [c1#26, c2#27]
>  +- LocalTableScan [a#19, b#20, c1#30]
> {noformat}






[jira] [Resolved] (SPARK-47633) Cache miss for queries using JOIN LATERAL with join condition

2024-04-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47633.
---
Fix Version/s: 3.5.2
   Resolution: Fixed

Issue resolved by pull request 46190
[https://github.com/apache/spark/pull/46190]

> Cache miss for queries using JOIN LATERAL with join condition
> -
>
> Key: SPARK-47633
> URL: https://issues.apache.org/jira/browse/SPARK-47633
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.2, 4.0.0, 3.5.1
>Reporter: Bruce Robbins
>Assignee: Bruce Robbins
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.2
>
>
> For example:
> {noformat}
> CREATE or REPLACE TEMP VIEW t1(c1, c2) AS VALUES (0, 1), (1, 2);
> CREATE or REPLACE TEMP VIEW t2(c1, c2) AS VALUES (0, 1), (1, 2);
> create or replace temp view v1 as
> select *
> from t1
> join lateral (
>   select c1 as a, c2 as b
>   from t2)
> on c1 = a;
> cache table v1;
> explain select * from v1;
> == Physical Plan ==
> AdaptiveSparkPlan isFinalPlan=false
> +- BroadcastHashJoin [c1#180], [a#173], Inner, BuildRight, false
>:- LocalTableScan [c1#180, c2#181]
>+- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, 
> false] as bigint)),false), [plan_id=113]
>   +- LocalTableScan [a#173, b#174]
> {noformat}
> Note that there is no {{InMemoryRelation}}.
> However, if you move the join condition into the subquery, the cached plan is 
> used:
> {noformat}
> CREATE or REPLACE TEMP VIEW t1(c1, c2) AS VALUES (0, 1), (1, 2);
> CREATE or REPLACE TEMP VIEW t2(c1, c2) AS VALUES (0, 1), (1, 2);
> create or replace temp view v2 as
> select *
> from t1
> join lateral (
>   select c1 as a, c2 as b
>   from t2
>   where t1.c1 = t2.c1);
> cache table v2;
> explain select * from v2;
> == Physical Plan ==
> AdaptiveSparkPlan isFinalPlan=false
> +- Scan In-memory table v2 [c1#176, c2#177, a#178, b#179]
>   +- InMemoryRelation [c1#176, c2#177, a#178, b#179], StorageLevel(disk, 
> memory, deserialized, 1 replicas)
> +- AdaptiveSparkPlan isFinalPlan=true
>+- == Final Plan ==
>   *(1) Project [c1#26, c2#27, a#19, b#20]
>   +- *(1) BroadcastHashJoin [c1#26], [c1#30], Inner, 
> BuildLeft, false
>  :- BroadcastQueryStage 0
>  :  +- BroadcastExchange 
> HashedRelationBroadcastMode(List(cast(input[0, int, false] as 
> bigint)),false), [plan_id=37]
>  : +- LocalTableScan [c1#26, c2#27]
>  +- *(1) LocalTableScan [a#19, b#20, c1#30]
>+- == Initial Plan ==
>   Project [c1#26, c2#27, a#19, b#20]
>   +- BroadcastHashJoin [c1#26], [c1#30], Inner, BuildLeft, 
> false
>  :- BroadcastExchange 
> HashedRelationBroadcastMode(List(cast(input[0, int, false] as 
> bigint)),false), [plan_id=37]
>  :  +- LocalTableScan [c1#26, c2#27]
>  +- LocalTableScan [a#19, b#20, c1#30]
> {noformat}






[jira] [Updated] (SPARK-46841) Language support for collations

2024-04-24 Thread Nikola Mandic (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nikola Mandic updated SPARK-46841:
--
Description: 
Languages and localization for collations are supported by ICU library. 
Collation naming format is as follows:
{code:java}
<2-letter language code>[_<4-letter script>][_<3-letter country 
code>][_specifier_specifier...]{code}
Locale specifier consists of the first part of the collation name (language + 
script + country). Locale specifiers need to be stable across ICU versions; to 
keep existing ids and names invariant, we introduce a golden file with the 
locale table, which should cause a CI failure on any silent changes.

Currently supported optional specifiers:
 * CS/CI - case sensitivity, default is case-sensitive; supported by 
configuring ICU collation levels
 * AS/AI - accent sensitivity; default is accent-sensitive; supported by 
configuring ICU collation levels
 * /LCASE/UCASE - case conversion performed prior to comparisons; 
supported by internal implementation relying on ICU locale-aware conversions

Users can use collation specifiers in any order, except for the locale, which 
is mandatory and must go first. There is a one-to-one mapping between collation 
ids and collation names defined in CollationFactory.

  was:
Languages and localization for collations are supported by ICU library. 
Collation naming format is as follows:
{code:java}
<2-letter language code>__[_specifier_specifier...]{code}
Locale specifier consists of the first part of the collation name (language + 
script + country). Locale specifiers need to be stable across ICU versions; to 
keep existing ids and names invariant, we introduce a golden file with the 
locale table, which should cause a CI failure on any silent changes.

Currently supported optional specifiers:
 * CS/CI - case sensitivity, default is case-sensitive; supported by 
configuring ICU collation levels
 * AS/AI - accent sensitivity; default is accent-sensitive; supported by 
configuring ICU collation levels
 * /LCASE/UCASE - case conversion performed prior to comparisons; 
supported by internal implementation relying on ICU locale-aware conversions

Users can use collation specifiers in any order, except for the locale, which 
is mandatory and must go first. There is a one-to-one mapping between collation 
ids and collation names defined in CollationFactory.


> Language support for collations
> ---
>
> Key: SPARK-46841
> URL: https://issues.apache.org/jira/browse/SPARK-46841
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Aleksandar Tomic
>Priority: Major
>  Labels: pull-request-available
>
> Languages and localization for collations are supported by ICU library. 
> Collation naming format is as follows:
> {code:java}
> <2-letter language code>[_<4-letter script>][_<3-letter country 
> code>][_specifier_specifier...]{code}
> Locale specifier consists of the first part of the collation name (language + 
> script + country). Locale specifiers need to be stable across ICU versions; 
> to keep existing ids and names invariant, we introduce a golden file with the 
> locale table, which should cause a CI failure on any silent changes.
> Currently supported optional specifiers:
>  * CS/CI - case sensitivity, default is case-sensitive; supported by 
> configuring ICU collation levels
>  * AS/AI - accent sensitivity; default is accent-sensitive; supported by 
> configuring ICU collation levels
>  * /LCASE/UCASE - case conversion performed prior to 
> comparisons; supported by internal implementation relying on ICU locale-aware 
> conversions
> Users can use collation specifiers in any order, except for the locale, which 
> is mandatory and must go first. There is a one-to-one mapping between 
> collation ids and collation names defined in CollationFactory.
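To make the naming format concrete, a hedged PySpark example (the collation identifier below is hypothetical and merely follows the described pattern; actual availability depends on the ICU locale table shipped with Spark):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# language "sr" + script "Cyrl" + country "SRB", plus CI (case-insensitive)
# and AI (accent-insensitive) specifiers.
spark.sql("SELECT 'Ć' COLLATE sr_Cyrl_SRB_CI_AI = 'c' AS equal").show()
{code}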






[jira] [Updated] (SPARK-46841) Language support for collations

2024-04-24 Thread Nikola Mandic (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nikola Mandic updated SPARK-46841:
--
Description: 
Languages and localization for collations are supported by ICU library. 
Collation naming format is as follows:
{code:java}
<2-letter language code>__[_specifier_specifier...]{code}
Locale specifier consists of the first part of the collation name (language + 
script + country). Locale specifiers need to be stable across ICU versions; to 
keep existing ids and names invariant, we introduce a golden file with the 
locale table, which should cause a CI failure on any silent changes.

Currently supported optional specifiers:
 * CS/CI - case sensitivity, default is case-sensitive; supported by 
configuring ICU collation levels
 * AS/AI - accent sensitivity; default is accent-sensitive; supported by 
configuring ICU collation levels
 * /LCASE/UCASE - case conversion performed prior to comparisons; 
supported by internal implementation relying on ICU locale-aware conversions

Users can use collation specifiers in any order, except for the locale, which 
is mandatory and must go first. There is a one-to-one mapping between collation 
ids and collation names defined in CollationFactory.

  was:
Languages and localization for collations are supported by ICU library. 
Collation naming format is as follows:
{code:java}
<2-letter language code>__<3-letter country 
code>[_specifier_specifier...]{code}
Locale specifier consists of the first part of the collation name (language + 
script + country). Locale specifiers need to be stable across ICU versions; to 
keep existing ids and names invariant, we introduce a golden file with the 
locale table, which should cause a CI failure on any silent changes.

Currently supported optional specifiers:
 * CS/CI - case sensitivity, default is case-sensitive; supported by 
configuring ICU collation levels
 * AS/AI - accent sensitivity; default is accent-sensitive; supported by 
configuring ICU collation levels
 * /LCASE/UCASE - case conversion performed prior to comparisons; 
supported by internal implementation relying on ICU locale-aware conversions

Users can use collation specifiers in any order, except for the locale, which 
is mandatory and must go first. There is a one-to-one mapping between collation 
ids and collation names defined in CollationFactory.


> Language support for collations
> ---
>
> Key: SPARK-46841
> URL: https://issues.apache.org/jira/browse/SPARK-46841
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Aleksandar Tomic
>Priority: Major
>  Labels: pull-request-available
>
> Languages and localization for collations are supported by ICU library. 
> Collation naming format is as follows:
> {code:java}
> <2-letter language code>__ country code>[_specifier_specifier...]{code}
> Locale specifier consists of the first part of the collation name (language + 
> script + country). Locale specifiers need to be stable across ICU versions; 
> to keep existing ids and names invariant, we introduce a golden file with the 
> locale table, which should cause a CI failure on any silent changes.
> Currently supported optional specifiers:
>  * CS/CI - case sensitivity, default is case-sensitive; supported by 
> configuring ICU collation levels
>  * AS/AI - accent sensitivity; default is accent-sensitive; supported by 
> configuring ICU collation levels
>  * /LCASE/UCASE - case conversion performed prior to 
> comparisons; supported by internal implementation relying on ICU locale-aware 
> conversions
> Users can use collation specifiers in any order, except for the locale, which 
> is mandatory and must go first. There is a one-to-one mapping between 
> collation ids and collation names defined in CollationFactory.






[jira] [Updated] (SPARK-46841) Language support for collations

2024-04-24 Thread Nikola Mandic (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nikola Mandic updated SPARK-46841:
--
Component/s: SQL
Description: 
Languages and localization for collations are supported by ICU library. 
Collation naming format is as follows:
{code:java}
<2-letter language code>__<3-letter country 
code>[_specifier_specifier...]{code}
Locale specifier consists of the first part of the collation name (language + 
script + country). Locale specifiers need to be stable across ICU versions; to 
keep existing ids and names invariant, we introduce a golden file with the 
locale table, which should cause a CI failure on any silent changes.

Currently supported optional specifiers:
 * CS/CI - case sensitivity, default is case-sensitive; supported by 
configuring ICU collation levels
 * AS/AI - accent sensitivity; default is accent-sensitive; supported by 
configuring ICU collation levels
 * /LCASE/UCASE - case conversion performed prior to comparisons; 
supported by internal implementation relying on ICU locale-aware conversions

Users can use collation specifiers in any order, except for the locale, which 
is mandatory and must go first. There is a one-to-one mapping between collation 
ids and collation names defined in CollationFactory.

> Language support for collations
> ---
>
> Key: SPARK-46841
> URL: https://issues.apache.org/jira/browse/SPARK-46841
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Aleksandar Tomic
>Priority: Major
>  Labels: pull-request-available
>
> Languages and localization for collations are supported by ICU library. 
> Collation naming format is as follows:
> {code:java}
> <2-letter language code>__<3-letter country 
> code>[_specifier_specifier...]{code}
> Locale specifier consists of the first part of the collation name (language + 
> script + country). Locale specifiers need to be stable across ICU versions; 
> to keep existing ids and names invariant, we introduce a golden file with the 
> locale table, which should cause a CI failure on any silent changes.
> Currently supported optional specifiers:
>  * CS/CI - case sensitivity, default is case-sensitive; supported by 
> configuring ICU collation levels
>  * AS/AI - accent sensitivity; default is accent-sensitive; supported by 
> configuring ICU collation levels
>  * /LCASE/UCASE - case conversion performed prior to 
> comparisons; supported by internal implementation relying on ICU locale-aware 
> conversions
> Users can use collation specifiers in any order, except for the locale, which 
> is mandatory and must go first. There is a one-to-one mapping between 
> collation ids and collation names defined in CollationFactory.






[jira] [Updated] (SPARK-47974) Remove install_scala from build/mvn

2024-04-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-47974:
--
Parent: SPARK-44111
Issue Type: Sub-task  (was: Improvement)

> Remove install_scala from build/mvn
> ---
>
> Key: SPARK-47974
> URL: https://issues.apache.org/jira/browse/SPARK-47974
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-47974) Remove install_scala from build/mvn

2024-04-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47974.
---
Fix Version/s: 4.0.0
 Assignee: Cheng Pan
   Resolution: Fixed

This is resolved via [https://github.com/apache/spark/pull/46204]

> Remove install_scala from build/mvn
> ---
>
> Key: SPARK-47974
> URL: https://issues.apache.org/jira/browse/SPARK-47974
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Commented] (SPARK-44478) Executor decommission causes stage failure

2024-04-24 Thread Jira


[ 
https://issues.apache.org/jira/browse/SPARK-44478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840442#comment-17840442
 ] 

Juha Iso-Sipilä commented on SPARK-44478:
-

Just a short update. This is clearly related to evictions/lost nodes/etc. I've 
now retried with Spark 3.5.1 and 3.4.3 with a cluster configuration that caused 
excessive pod evictions (by Karpenter). I had a handful of different tasks that 
all consistently failed during execution with this error. But not all tasks 
were equal: in a pipeline of tasks, the map tasks mostly succeeded, while those 
doing shuffles (join, sort, groupby) were the ones that mostly failed. I 
suppose decommissioning has an impact on resolving lost shuffle data, so the 
issue may lie somewhere there.

Our SRE team then fixed the eviction issue and suddenly all tasks with the 
above Spark versions, as well as 3.5.0, work normally. Conclusion: no evictions 
=> no failures. I don't know what the triggering mechanism is, but it is 
related to pods decommissioned by k8s. I am going to follow this ticket in 
case I see any more issues.

> Executor decommission causes stage failure
> --
>
> Key: SPARK-44478
> URL: https://issues.apache.org/jira/browse/SPARK-44478
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 3.4.0, 3.4.1
>Reporter: Dale Huettenmoser
>Priority: Minor
>
> During spark execution, save fails due to executor decommissioning. Issue not 
> present in 3.3.0
> Sample error:
>  
> {code:java}
> An error occurred while calling o8948.save.
> : org.apache.spark.SparkException: Job aborted due to stage failure: 
> Authorized committer (attemptNumber=0, stage=170, partition=233) failed; but 
> task commit success, data duplication may happen. 
> reason=ExecutorLostFailure(1,false,Some(Executor decommission: Executor 1 is 
> decommissioned.))
>     at 
> org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2785)
>     at 
> org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2721)
>     at 
> org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2720)
>     at 
> scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>     at 
> scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>     at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2720)
>     at 
> org.apache.spark.scheduler.DAGScheduler.$anonfun$handleStageFailed$1(DAGScheduler.scala:1199)
>     at 
> org.apache.spark.scheduler.DAGScheduler.$anonfun$handleStageFailed$1$adapted(DAGScheduler.scala:1199)
>     at scala.Option.foreach(Option.scala:407)
>     at 
> org.apache.spark.scheduler.DAGScheduler.handleStageFailed(DAGScheduler.scala:1199)
>     at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2981)
>     at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2923)
>     at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2912)
>     at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
>     at 
> org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:971)
>     at org.apache.spark.SparkContext.runJob(SparkContext.scala:2263)
>     at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeWrite$4(FileFormatWriter.scala:307)
>     at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.writeAndCommit(FileFormatWriter.scala:271)
>     at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeWrite(FileFormatWriter.scala:304)
>     at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:190)
>     at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:190)
>     at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:113)
>     at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:111)
>     at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:125)
>     at 
> org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:98)
>     at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:118)
>     at 
> 

[jira] [Updated] (SPARK-47819) Use asynchronous callback for execution cleanup

2024-04-24 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-47819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell updated SPARK-47819:
--
Fix Version/s: 3.5.2

> Use asynchronous callback for execution cleanup
> ---
>
> Key: SPARK-47819
> URL: https://issues.apache.org/jira/browse/SPARK-47819
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0, 4.0.0, 3.5.1
>Reporter: Xi Lyu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0, 3.5.2
>
>
> Expired sessions are regularly checked and cleaned up by a maintenance 
> thread. However, currently, this process is synchronous. Therefore, in rare 
> cases, interrupting the execution thread of a query in a session can take 
> hours, causing the entire maintenance process to stall, resulting in a large 
> amount of memory not being cleared.
> We address this by introducing asynchronous callbacks for execution cleanup, 
> avoiding synchronous joins of execution threads, and preventing the 
> maintenance thread from stalling in the above scenarios. To be more specific, 
> instead of calling {{runner.join()}} in ExecutorHolder.close(), we set a 
> post-cleanup function as the callback through 
{{runner.processOnCompletion}}, which will be called asynchronously once 
the execution runner is completed or interrupted. In this way, the 
maintenance thread won't get blocked on {{join}}ing an execution thread.
>  
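A language-agnostic sketch of the pattern being described (plain Python threading, not the actual Spark Connect code): the maintenance path registers a completion callback instead of join()ing the execution thread.

{code:python}
import threading
import time

def run_execution(on_completion):
    """Start the 'execution' on its own thread; run cleanup via callback when done."""
    def work():
        try:
            time.sleep(1)  # stand-in for a long-running (or interrupted) query
        finally:
            on_completion()  # post-cleanup runs asynchronously; nobody blocks on join()
    threading.Thread(target=work, daemon=True).start()

run_execution(lambda: print("execution resources released"))
print("maintenance thread keeps going without blocking")
time.sleep(2)  # keep the demo alive long enough to see the callback fire
{code}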






[jira] [Resolved] (SPARK-47958) Task Scheduler may not know about executor when using LocalSchedulerBackend

2024-04-24 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47958.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46187
[https://github.com/apache/spark/pull/46187]

> Task Scheduler may not know about executor when using LocalSchedulerBackend
> ---
>
> Key: SPARK-47958
> URL: https://issues.apache.org/jira/browse/SPARK-47958
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 4.0.0
>Reporter: Davin Tjong
>Assignee: Davin Tjong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> When using LocalSchedulerBackend, the task scheduler will not know about the 
> executor until a task is run, which can lead to unexpected behavior in tests.






[jira] [Resolved] (SPARK-47965) Avoid orNull in TypedConfigBuilder and OptionalConfigEntry

2024-04-24 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47965.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46197
[https://github.com/apache/spark/pull/46197]

> Avoid orNull in TypedConfigBuilder and OptionalConfigEntry
> --
>
> Key: SPARK-47965
> URL: https://issues.apache.org/jira/browse/SPARK-47965
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Configuration values/keys cannot be nulls. We should fix:
> {code}
> diff --git 
> a/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala 
> b/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala
> index 1f19e9444d38..d06535722625 100644
> --- a/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala
> +++ b/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala
> @@ -94,7 +94,7 @@ private[spark] class TypedConfigBuilder[T](
>import ConfigHelpers._
>def this(parent: ConfigBuilder, converter: String => T) = {
> -this(parent, converter, Option(_).map(_.toString).orNull)
> +this(parent, converter, { v: T => v.toString })
>}
>/** Apply a transformation to the user-provided values of the config 
> entry. */
> {code}






[jira] [Assigned] (SPARK-47965) Avoid orNull in TypedConfigBuilder and OptionalConfigEntry

2024-04-24 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-47965:


Assignee: Hyukjin Kwon

> Avoid orNull in TypedConfigBuilder and OptionalConfigEntry
> --
>
> Key: SPARK-47965
> URL: https://issues.apache.org/jira/browse/SPARK-47965
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
>  Labels: pull-request-available
>
> Configuration values/keys cannot be nulls. We should fix:
> {code}
> diff --git 
> a/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala 
> b/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala
> index 1f19e9444d38..d06535722625 100644
> --- a/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala
> +++ b/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala
> @@ -94,7 +94,7 @@ private[spark] class TypedConfigBuilder[T](
>import ConfigHelpers._
>def this(parent: ConfigBuilder, converter: String => T) = {
> -this(parent, converter, Option(_).map(_.toString).orNull)
> +this(parent, converter, { v: T => v.toString })
>}
>/** Apply a transformation to the user-provided values of the config 
> entry. */
> {code}






[jira] [Updated] (SPARK-47974) Remove install_scala from build/mvn

2024-04-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47974:
---
Labels: pull-request-available  (was: )

> Remove install_scala from build/mvn
> ---
>
> Key: SPARK-47974
> URL: https://issues.apache.org/jira/browse/SPARK-47974
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Cheng Pan
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-47974) Remove install_scala from build/mvn

2024-04-24 Thread Cheng Pan (Jira)
Cheng Pan created SPARK-47974:
-

 Summary: Remove install_scala from build/mvn
 Key: SPARK-47974
 URL: https://issues.apache.org/jira/browse/SPARK-47974
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 4.0.0
Reporter: Cheng Pan









[jira] [Updated] (SPARK-47961) CREATE TABLE AS SELECT changes behaviour in SPARK 3.4.0

2024-04-24 Thread Eugen Stoianovici (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugen Stoianovici updated SPARK-47961:
--
Component/s: Spark Core

> CREATE TABLE AS SELECT changes behaviour in SPARK 3.4.0
> ---
>
> Key: SPARK-47961
> URL: https://issues.apache.org/jira/browse/SPARK-47961
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 3.4.0
>Reporter: Eugen Stoianovici
>Priority: Major
>
> SPARK-41859 changes the behaviour for `CREATE TABLE AS SELECT ...` from 
> OVERWRITE to APPEND when {{spark.sql.legacy.allowNonEmptyLocationInCTAS}} is 
set to {{true}}:
> {{drop table if exists test_table;}}
> {{create table test_table location '/tmp/test_table' stored as parquet as 
> select 1 as col union all select 2 as col;}}
> {{drop table if exists test_table;}}
> {{create table test_table location '/tmp/test_table' stored as parquet as 
> select 3 as col union all select 4 as col;}}
> {{select * from test_table;}}
> This produces {3, 4} in Spark <3.4.0 and {1, 2, 3, 4} in Spark 3.4.0 and 
> later. This is a silent change in 
> {{spark.sql.legacy.allowNonEmptyLocationInCTAS}} behaviour which introduces 
> wrong results in the user application
>  
>  






[jira] [Updated] (SPARK-47962) Improve doc test in pyspark dataframe

2024-04-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47962:
---
Labels: pull-request-available  (was: )

> Improve doc test in pyspark dataframe
> -
>
> Key: SPARK-47962
> URL: https://issues.apache.org/jira/browse/SPARK-47962
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Wei Liu
>Priority: Major
>  Labels: pull-request-available
>
> The doc test for the DataFrame observe API doesn't use a streaming DataFrame, 
> which is wrong. We should start a streaming DataFrame to make sure it runs.






[jira] [Assigned] (SPARK-47764) Cleanup shuffle dependencies for Spark Connect SQL executions

2024-04-24 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47764:
---

Assignee: Bo Zhang

> Cleanup shuffle dependencies for Spark Connect SQL executions
> -
>
> Key: SPARK-47764
> URL: https://issues.apache.org/jira/browse/SPARK-47764
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Bo Zhang
>Assignee: Bo Zhang
>Priority: Major
>  Labels: pull-request-available
>
> Shuffle dependencies are created by shuffle map stages and consist of 
> files on disk and the corresponding references in Spark JVM heap memory. 
> Currently Spark cleans up unused shuffle dependencies through JVM GCs, and 
> periodic GCs are triggered once every 30 minutes (see ContextCleaner). 
> However, we still found cases in which the shuffle data files are 
> too large, which makes shuffle data migration slow.
>  
> We do have chances to clean up shuffle dependencies, especially for SQL 
> queries created by Spark Connect, since we have better control of the 
> DataFrame instances there. Even if DataFrame instances are reused on the 
> client side, on the server side the instances are still recreated. 
>  
> We might also provide the option to 1. clean up eagerly after each query 
> execution, or 2. only mark the shuffle dependencies and not migrate them at 
> node decommission.
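> As a sketch of what is tunable today (not the eager cleanup proposed here), 
> the ContextCleaner's periodic GC interval mentioned above can be shortened 
> via configuration:
> {code:python}
> from pyspark.sql import SparkSession
> 
> # Sketch only: tightens the existing 30-minute periodic GC; it does not
> # implement the eager, per-query shuffle dependency cleanup proposed here.
> spark = (SparkSession.builder
>          .appName("shuffle-cleanup-tuning")
>          .config("spark.cleaner.periodicGC.interval", "5min")
>          .getOrCreate())
> {code}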



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47764) Cleanup shuffle dependencies for Spark Connect SQL executions

2024-04-24 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47764.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45930
[https://github.com/apache/spark/pull/45930]

> Cleanup shuffle dependencies for Spark Connect SQL executions
> -
>
> Key: SPARK-47764
> URL: https://issues.apache.org/jira/browse/SPARK-47764
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Bo Zhang
>Assignee: Bo Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Shuffle dependencies are created by shuffle map stages and consist of 
> files on disk and the corresponding references in Spark JVM heap memory. 
> Currently Spark cleans up unused shuffle dependencies through JVM GCs, and 
> periodic GCs are triggered once every 30 minutes (see ContextCleaner). 
> However, we still found cases in which the shuffle data files are 
> too large, which makes shuffle data migration slow.
>  
> We do have chances to clean up shuffle dependencies, especially for SQL 
> queries created by Spark Connect, since we have better control of the 
> DataFrame instances there. Even if DataFrame instances are reused on the 
> client side, on the server side the instances are still recreated. 
>  
> We might also provide the option to 1. clean up eagerly after each query 
> execution, or 2. only mark the shuffle dependencies and not migrate them at 
> node decommission.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47418) Optimize string predicate expressions for UTF8_BINARY_LCASE collation

2024-04-24 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47418.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46181
[https://github.com/apache/spark/pull/46181]

> Optimize string predicate expressions for UTF8_BINARY_LCASE collation
> -
>
> Key: SPARK-47418
> URL: https://issues.apache.org/jira/browse/SPARK-47418
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Implement the *contains*, *startsWith*, and *endsWith* built-in string 
> Spark functions using the optimized lowercase comparison approach introduced by 
> [~nikolamand-db] in [https://github.com/apache/spark/pull/45816]. Refer to 
> the latest design and code structure imposed by [~uros-db] in 
> https://issues.apache.org/jira/browse/SPARK-47410 to understand how collation 
> support is introduced for Spark SQL expressions. In addition, review previous 
> Jira tickets under the current parent in order to understand how 
> *StringPredicate* expressions are currently used and tested in Spark:
>  * [SPARK-47131|https://issues.apache.org/jira/browse/SPARK-47131]
>  * [SPARK-47248|https://issues.apache.org/jira/browse/SPARK-47248]
>  * [SPARK-47295|https://issues.apache.org/jira/browse/SPARK-47295]
> These tickets should help you understand what changes were introduced in 
> order to enable collation support for these functions. Lastly, feel free to 
> use your chosen Spark SQL Editor to play around with the existing functions 
> and learn more about how they work.
>  
> The goal for this Jira ticket is to improve the UTF8_BINARY_LCASE 
> implementation for the *contains*, *startsWith*, and *endsWith* 
> functions so that they use the optimized lowercase comparison approach (following 
> the general logic in Nikola's PR), and benchmark the results accordingly. As 
> for testing, the currently existing unit test cases and end-to-end tests 
> should already fully cover the expected behaviour of *StringPredicate* 
> expressions for all collation types. In other words, the objective of this 
> ticket is only to enhance the internal implementation, without introducing 
> any user-facing changes to Spark SQL API.
>  
> Finally, feel free to refer to the Unicode Technical Standard for string 
> [searching|https://www.unicode.org/reports/tr10/#Searching] and 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].
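> For a quick feel of the (unchanged) user-facing behaviour, something like the 
> following can be run in a SQL session (illustrative sketch; it assumes an active 
> SparkSession named `spark` and the collation syntax introduced by SPARK-47410):
> {code:python}
> # The optimization is internal, so these results should be identical before
> # and after this ticket; all three are expected to return true under
> # UTF8_BINARY_LCASE because comparison is case-insensitive.
> spark.sql("SELECT contains(collate('Apache Spark', 'UTF8_BINARY_LCASE'), 'SPARK')").show()
> spark.sql("SELECT startswith(collate('Apache Spark', 'UTF8_BINARY_LCASE'), 'APACHE')").show()
> spark.sql("SELECT endswith(collate('Apache Spark', 'UTF8_BINARY_LCASE'), 'spark')").show()
> {code}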



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47972) Restrict CAST expression for collations

2024-04-24 Thread Mihailo Milosevic (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mihailo Milosevic updated SPARK-47972:
--
Description: The current state of the code allows calls like CAST(1 AS STRING 
COLLATE UNICODE). We want to restrict the CAST expression so that it can only cast 
to the default collation string, and to only allow the COLLATE expression to produce 
explicitly collated strings.

> Restrict CAST expression for collations
> ---
>
> Key: SPARK-47972
> URL: https://issues.apache.org/jira/browse/SPARK-47972
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Mihailo Milosevic
>Priority: Major
>
> The current state of the code allows calls like CAST(1 AS STRING COLLATE 
> UNICODE). We want to restrict the CAST expression so that it can only cast to 
> the default collation string, and to only allow the COLLATE expression to produce 
> explicitly collated strings.
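> The intended split is roughly the following (an illustrative sketch of the 
> desired behaviour, assuming an active SparkSession named `spark`, not the 
> implemented rules):
> {code:python}
> # Expected to be rejected once CAST is restricted as described above.
> spark.sql("SELECT CAST(1 AS STRING COLLATE UNICODE)")
> 
> # Expected to remain valid: cast to the default collation string, then apply
> # COLLATE explicitly to obtain an explicitly collated string.
> spark.sql("SELECT collate(CAST(1 AS STRING), 'UNICODE')")
> {code}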



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47927) Nullability after join not respected in UDF

2024-04-24 Thread Emil Ejbyfeldt (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emil Ejbyfeldt updated SPARK-47927:
---
Labels: correctness pull-request-available  (was: pull-request-available)

> Nullability after join not respected in UDF
> ---
>
> Key: SPARK-47927
> URL: https://issues.apache.org/jira/browse/SPARK-47927
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0, 3.5.1, 3.4.3
>Reporter: Emil Ejbyfeldt
>Priority: Major
>  Labels: correctness, pull-request-available
>
> {code:java}
> val ds1 = Seq(1).toDS()
> val ds2 = Seq[Int]().toDS()
> val f = udf[(Int, Option[Int]), (Int, Option[Int])](identity)
> ds1.join(ds2, ds1("value") === ds2("value"), 
> "outer").select(f(struct(ds1("value"), ds2("value".show()
> ds1.join(ds2, ds1("value") === ds2("value"), 
> "outer").select(struct(ds1("value"), ds2("value"))).show() {code}
> outputs
> {code:java}
> +---------------------------------------+
> |UDF(struct(value, value, value, value))|
> +---------------------------------------+
> |                                 {1, 0}|
> +---------------------------------------+
> +--------------------+
> |struct(value, value)|
> +--------------------+
> |           {1, NULL}|
> +--------------------+ {code}
> So when the result is passed to the UDF, the nullability after the join is 
> not respected and we incorrectly end up with a 0 value instead of a null/None 
> value.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47964) Hide SQLContext and HiveContext

2024-04-24 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47964.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46194
[https://github.com/apache/spark/pull/46194]

> Hide SQLContext and HiveContext
> ---
>
> Key: SPARK-47964
> URL: https://issues.apache.org/jira/browse/SPARK-47964
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47499) Reuse `test_help_command` in Connect

2024-04-24 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng updated SPARK-47499:
--
Parent: SPARK-47970
Issue Type: Sub-task  (was: Test)

> Reuse `test_help_command` in Connect
> 
>
> Key: SPARK-47499
> URL: https://issues.apache.org/jira/browse/SPARK-47499
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-47970) Revisit skipped parity tests for PySpark

2024-04-24 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-47970:
-

 Summary: Revisit skipped parity tests for PySpark
 Key: SPARK-47970
 URL: https://issues.apache.org/jira/browse/SPARK-47970
 Project: Spark
  Issue Type: Umbrella
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47969) Make `test_creation_index` deterministic

2024-04-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47969.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46200
[https://github.com/apache/spark/pull/46200]

> Make `test_creation_index` deterministic
> 
>
> Key: SPARK-47969
> URL: https://issues.apache.org/jira/browse/SPARK-47969
> Project: Spark
>  Issue Type: Test
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org