[jira] [Updated] (SPARK-48267) Regression e2e test with SPARK-47305

2024-05-13 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim updated SPARK-48267:
-
Fix Version/s: 3.5.2

> Regression e2e test with SPARK-47305
> 
>
> Key: SPARK-48267
> URL: https://issues.apache.org/jira/browse/SPARK-48267
> Project: Spark
>  Issue Type: Test
>  Components: Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0, 3.5.2
>
>
> In SPARK-47305, we fixed a bug in a QO rule and added a test to demonstrate the 
> issue and verify that the fix works. The scope of that test was limited to the 
> bugfix itself, because we had no clear idea of a (simpler) e2e reproducer.
> We have since come up with a simple reproducer, which is an e2e streaming query. 
> We'd like to add this reproducer as a regression test.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48267) Regression e2e test with SPARK-47305

2024-05-13 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim resolved SPARK-48267.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46569
[https://github.com/apache/spark/pull/46569]

> Regression e2e test with SPARK-47305
> 
>
> Key: SPARK-48267
> URL: https://issues.apache.org/jira/browse/SPARK-48267
> Project: Spark
>  Issue Type: Test
>  Components: Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> In SPARK-47305, we fixed a bug in a QO rule and added a test to demonstrate the 
> issue and verify that the fix works. The scope of that test was limited to the 
> bugfix itself, because we had no clear idea of a (simpler) e2e reproducer.
> We have since come up with a simple reproducer, which is an e2e streaming query. 
> We'd like to add this reproducer as a regression test.






[jira] [Assigned] (SPARK-48267) Regression e2e test with SPARK-47305

2024-05-13 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim reassigned SPARK-48267:


Assignee: Jungtaek Lim

> Regression e2e test with SPARK-47305
> 
>
> Key: SPARK-48267
> URL: https://issues.apache.org/jira/browse/SPARK-48267
> Project: Spark
>  Issue Type: Test
>  Components: Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
>  Labels: pull-request-available
>
> In SPARK-47305, we fixed a bug in a QO rule and added a test to demonstrate the 
> issue and verify that the fix works. The scope of that test was limited to the 
> bugfix itself, because we had no clear idea of a (simpler) e2e reproducer.
> We have since come up with a simple reproducer, which is an e2e streaming query. 
> We'd like to add this reproducer as a regression test.






[jira] [Resolved] (SPARK-48157) CSV expressions (all collations)

2024-05-13 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-48157.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46504
[https://github.com/apache/spark/pull/46504]

> CSV expressions (all collations)
> 
>
> Key: SPARK-48157
> URL: https://issues.apache.org/jira/browse/SPARK-48157
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Enable collation support for *CSV* built-in string functions in Spark 
> ({*}CsvToStructs{*}, {*}SchemaOfCsv{*}, {*}StructsToCsv{*}). First confirm 
> what is the expected behaviour for these functions when given collated 
> strings, and then move on to implementation and testing. You will find these 
> expressions in the *csvExpressions.scala* file, and they should mostly be 
> pass-through functions. Implement the corresponding E2E SQL tests 
> (CollationSQLExpressionsSuite) to reflect how this function should be used 
> with collation in SparkSQL, and feel free to use your chosen Spark SQL Editor 
> to experiment with the existing functions to learn more about how they work. 
> In addition, look into the possible use-cases and implementation of similar 
> functions within other open-source DBMSs, such as 
> [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the *CSV* expressions so that 
> they support all collation types currently supported in Spark. To understand 
> what changes were introduced in order to enable full collation support for 
> other existing functions in Spark, take a look at the Spark PRs and Jira 
> tickets for completed tasks in this parent (for example: Ascii, Chr, Base64, 
> UnBase64, Decode, StringDecode, Encode, ToBinary, FormatNumber, Sentences).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and 
> [Collator|http://example.com/] class. Also, refer to the Unicode Technical 
> Standard for string 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].






[jira] [Assigned] (SPARK-48157) CSV expressions (all collations)

2024-05-13 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-48157:
---

Assignee: Uroš Bojanić

> CSV expressions (all collations)
> 
>
> Key: SPARK-48157
> URL: https://issues.apache.org/jira/browse/SPARK-48157
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
>
> Enable collation support for *CSV* built-in string functions in Spark 
> ({*}CsvToStructs{*}, {*}SchemaOfCsv{*}, {*}StructsToCsv{*}). First confirm 
> what is the expected behaviour for these functions when given collated 
> strings, and then move on to implementation and testing. You will find these 
> expressions in the *csvExpressions.scala* file, and they should mostly be 
> pass-through functions. Implement the corresponding E2E SQL tests 
> (CollationSQLExpressionsSuite) to reflect how this function should be used 
> with collation in SparkSQL, and feel free to use your chosen Spark SQL Editor 
> to experiment with the existing functions to learn more about how they work. 
> In addition, look into the possible use-cases and implementation of similar 
> functions within other open-source DBMSs, such as 
> [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the *CSV* expressions so that 
> they support all collation types currently supported in Spark. To understand 
> what changes were introduced in order to enable full collation support for 
> other existing functions in Spark, take a look at the Spark PRs and Jira 
> tickets for completed tasks in this parent (for example: Ascii, Chr, Base64, 
> UnBase64, Decode, StringDecode, Encode, ToBinary, FormatNumber, Sentences).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and 
> [Collator|http://example.com/] class. Also, refer to the Unicode Technical 
> Standard for string 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].






[jira] [Updated] (SPARK-48157) CSV expressions (all collations)

2024-05-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48157:
---
Labels: pull-request-available  (was: )

> CSV expressions (all collations)
> 
>
> Key: SPARK-48157
> URL: https://issues.apache.org/jira/browse/SPARK-48157
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
>
> Enable collation support for *CSV* built-in string functions in Spark 
> ({*}CsvToStructs{*}, {*}SchemaOfCsv{*}, {*}StructsToCsv{*}). First confirm 
> what is the expected behaviour for these functions when given collated 
> strings, and then move on to implementation and testing. You will find these 
> expressions in the *csvExpressions.scala* file, and they should mostly be 
> pass-through functions. Implement the corresponding E2E SQL tests 
> (CollationSQLExpressionsSuite) to reflect how this function should be used 
> with collation in SparkSQL, and feel free to use your chosen Spark SQL Editor 
> to experiment with the existing functions to learn more about how they work. 
> In addition, look into the possible use-cases and implementation of similar 
> functions within other open-source DBMSs, such as 
> [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the *CSV* expressions so that 
> they support all collation types currently supported in Spark. To understand 
> what changes were introduced in order to enable full collation support for 
> other existing functions in Spark, take a look at the Spark PRs and Jira 
> tickets for completed tasks in this parent (for example: Ascii, Chr, Base64, 
> UnBase64, Decode, StringDecode, Encode, ToBinary, FormatNumber, Sentences).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and 
> [Collator|http://example.com/] class. Also, refer to the Unicode Technical 
> Standard for string 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].






[jira] [Resolved] (SPARK-48229) inputFile expressions (all collations)

2024-05-13 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-48229.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46503
[https://github.com/apache/spark/pull/46503]

> inputFile expressions (all collations)
> --
>
> Key: SPARK-48229
> URL: https://issues.apache.org/jira/browse/SPARK-48229
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-48265) Infer window group limit batch should do constant folding

2024-05-13 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-48265:
---

Assignee: angerszhu

> Infer window group limit batch should do constant folding
> -
>
> Key: SPARK-48265
> URL: https://issues.apache.org/jira/browse/SPARK-48265
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0, 3.5.1
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
>  Labels: pull-request-available
>
> {code:java}
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger:
> === Result of Batch LocalRelation ===
>  GlobalLimit 21                                                               
>                                                               GlobalLimit 21
>  +- LocalLimit 21                                                             
>                                                               +- LocalLimit 21
> !   +- Union false, false                                                     
>                                                                  +- 
> LocalLimit 21
> !      :- LocalLimit 21                                                       
>                                                                     +- 
> Project [item_id#647L]
> !      :  +- Project [item_id#647L]                                           
>                                                                        +- 
> Filter (((isnotnull(tz_type#734) AND (tz_type#734 = local)) AND 
> (grass_region#735 = BR)) AND isnotnull(grass_region#735))
> !      :     +- Filter (((isnotnull(tz_type#734) AND (tz_type#734 = local)) 
> AND (grass_region#735 = BR)) AND isnotnull(grass_region#735))               
> +- Relation db.table[,... 91 more fields] parquet
> !      :        +- Relation db.table[,... 91 more fields] parquet
> !      +- LocalLimit 21
> !         +- Project [item_id#738L]
> !            +- LocalRelation , [, ... 91 more fields]
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Check Cartesian 
> Products has no effect.
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch RewriteSubquery has no 
> effect.
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch 
> NormalizeFloatingNumbers has no effect.
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch 
> ReplaceUpdateFieldsExpression has no effect.
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Optimize Metadata Only 
> Query has no effect.
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch PartitionPruning has 
> no effect.
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch InjectRuntimeFilter 
> has no effect.
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Pushdown Filters from 
> PartitionPruning has no effect.
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Cleanup filters that 
> cannot be pushed down has no effect.
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Extract Python UDFs 
> has no effect.
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger:
> === Applying Rule org.apache.spark.sql.catalyst.optimizer.EliminateLimits ===
>  GlobalLimit 21                                                               
>                                                            GlobalLimit 21
> !+- LocalLimit 21                                                             
>                                                            +- LocalLimit 
> least(, ... 2 more fields)
> !   +- LocalLimit 21                                                          
>                                                               +- Project 
> [item_id#647L]
> !      +- Project [item_id#647L]                                              
>                                                                  +- Filter 
> (((isnotnull(tz_type#734) AND (tz_type#734 = local)) AND (grass_region#735 = 
> BR)) AND isnotnull(grass_region#735))
> !         +- Filter (((isnotnull(tz_type#734) AND (tz_type#734 = local)) AND 
> (grass_region#735 = BR)) AND isnotnull(grass_region#735))            +- 
> Relation db.table[,... 91 more fields] parquet
> !            +- Relation db.table[,... 91 more fields] parquet
>  {code}
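
The log above shows EliminateLimits replacing two nested LocalLimit 21 nodes with a single LocalLimit whose limit is a least(...) expression, and the issue title asks for constant folding in the batch that follows. A minimal sketch of what constant folding does to such an expression is given below. It is a toy model in Python, not Spark's Catalyst code: the Literal and Least classes and the fold_constants function are hypothetical names, and because the arguments of least(...) are elided in the log, the values used here are purely illustrative.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Literal:
    """A constant integer expression (hypothetical stand-in for a literal)."""
    value: int


@dataclass(frozen=True)
class Least:
    """least(child1, child2, ...) over integer expressions (hypothetical)."""
    children: Tuple["Expr", ...]


Expr = Union[Literal, Least]


def fold_constants(expr: Expr) -> Expr:
    """Replace least(...) with a literal when all of its arguments fold to literals."""
    if isinstance(expr, Least):
        folded = tuple(fold_constants(child) for child in expr.children)
        if all(isinstance(child, Literal) for child in folded):
            return Literal(min(child.value for child in folded))
        return Least(folded)
    return expr


# Illustrative values only: the real arguments are elided in the log.
merged_limit = Least((Literal(21), Literal(21)))
folded_limit = fold_constants(merged_limit)
print(folded_limit)  # Literal(value=21)
```

Once the limit is a plain literal again, later rules that only match a constant limit can apply to the plan.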






[jira] [Resolved] (SPARK-48265) Infer window group limit batch should do constant folding

2024-05-13 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-48265.
-
Fix Version/s: 3.5.2
   4.0.0
   Resolution: Fixed

Issue resolved by pull request 46568
[https://github.com/apache/spark/pull/46568]

> Infer window group limit batch should do constant folding
> -
>
> Key: SPARK-48265
> URL: https://issues.apache.org/jira/browse/SPARK-48265
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0, 3.5.1
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.2, 4.0.0
>
>
> {code:java}
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger:
> === Result of Batch LocalRelation ===
>  GlobalLimit 21                                                               
>                                                               GlobalLimit 21
>  +- LocalLimit 21                                                             
>                                                               +- LocalLimit 21
> !   +- Union false, false                                                     
>                                                                  +- 
> LocalLimit 21
> !      :- LocalLimit 21                                                       
>                                                                     +- 
> Project [item_id#647L]
> !      :  +- Project [item_id#647L]                                           
>                                                                        +- 
> Filter (((isnotnull(tz_type#734) AND (tz_type#734 = local)) AND 
> (grass_region#735 = BR)) AND isnotnull(grass_region#735))
> !      :     +- Filter (((isnotnull(tz_type#734) AND (tz_type#734 = local)) 
> AND (grass_region#735 = BR)) AND isnotnull(grass_region#735))               
> +- Relation db.table[,... 91 more fields] parquet
> !      :        +- Relation db.table[,... 91 more fields] parquet
> !      +- LocalLimit 21
> !         +- Project [item_id#738L]
> !            +- LocalRelation , [, ... 91 more fields]
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Check Cartesian 
> Products has no effect.
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch RewriteSubquery has no 
> effect.
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch 
> NormalizeFloatingNumbers has no effect.
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch 
> ReplaceUpdateFieldsExpression has no effect.
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Optimize Metadata Only 
> Query has no effect.
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch PartitionPruning has 
> no effect.
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch InjectRuntimeFilter 
> has no effect.
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Pushdown Filters from 
> PartitionPruning has no effect.
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Cleanup filters that 
> cannot be pushed down has no effect.
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Extract Python UDFs 
> has no effect.
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger:
> === Applying Rule org.apache.spark.sql.catalyst.optimizer.EliminateLimits ===
>  GlobalLimit 21                                                               
>                                                            GlobalLimit 21
> !+- LocalLimit 21                                                             
>                                                            +- LocalLimit 
> least(, ... 2 more fields)
> !   +- LocalLimit 21                                                          
>                                                               +- Project 
> [item_id#647L]
> !      +- Project [item_id#647L]                                              
>                                                                  +- Filter 
> (((isnotnull(tz_type#734) AND (tz_type#734 = local)) AND (grass_region#735 = 
> BR)) AND isnotnull(grass_region#735))
> !         +- Filter (((isnotnull(tz_type#734) AND (tz_type#734 = local)) AND 
> (grass_region#735 = BR)) AND isnotnull(grass_region#735))            +- 
> Relation db.table[,... 91 more fields] parquet
> !            +- Relation db.table[,... 91 more fields] parquet
>  {code}






[jira] [Resolved] (SPARK-48266) Move o.a.spark.sql.connect.dsl to test dir

2024-05-13 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48266.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46567
[https://github.com/apache/spark/pull/46567]

> Move o.a.spark.sql.connect.dsl to test dir
> --
>
> Key: SPARK-48266
> URL: https://issues.apache.org/jira/browse/SPARK-48266
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-48268) Add a configuration for SparkContext.setCheckpointDir

2024-05-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48268:
---
Labels: pull-request-available  (was: )

> Add a configuration for SparkContext.setCheckpointDir
> -
>
> Key: SPARK-48268
> URL: https://issues.apache.org/jira/browse/SPARK-48268
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> Would be great to have it






[jira] [Resolved] (SPARK-48241) CSV parsing failure with char/varchar type columns

2024-05-13 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-48241.
-
Fix Version/s: 3.5.2
   Resolution: Fixed

Issue resolved by pull request 46565
[https://github.com/apache/spark/pull/46565]

> CSV parsing failure with char/varchar type columns
> --
>
> Key: SPARK-48241
> URL: https://issues.apache.org/jira/browse/SPARK-48241
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.1
>Reporter: Jiayi Liu
>Assignee: Jiayi Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.2, 4.0.0
>
>
> Selecting from a CSV table that contains char and varchar columns fails with 
> the following error:
> {code:java}
> java.lang.IllegalArgumentException: requirement failed: requiredSchema 
> (struct) should be the subset of dataSchema 
> (struct).
>     at scala.Predef$.require(Predef.scala:281)
>     at 
> org.apache.spark.sql.catalyst.csv.UnivocityParser.(UnivocityParser.scala:56)
>     at 
> org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.$anonfun$buildReader$2(CSVFileFormat.scala:127)
>     at 
> org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:155)
>     at 
> org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:140)
>     at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:231)
>     at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:293)
>     at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:125){code}
> The error occurs because the StringType columns in the dataSchema and 
> requiredSchema of UnivocityParser are inconsistent: the StringType StructField 
> in the dataSchema carries metadata that is missing from the requiredSchema. 
> We need to retain the metadata when resolving the schema.
>  
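
The failure mode described above can be sketched with a small model: if field equality includes metadata, a required field that lost its metadata no longer counts as a member of the data schema, so a subset check fails. The code below is a simplified illustration in plain Python, not Spark's implementation. The Field class, the is_subset function, and the "original_type" metadata key are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Field:
    """Hypothetical stand-in for a schema field; equality compares metadata too."""
    name: str
    data_type: str
    metadata: Dict[str, str] = field(default_factory=dict)


def is_subset(required: List[Field], data: List[Field]) -> bool:
    """True when every required field appears, with identical metadata, in data."""
    return all(f in data for f in required)


# The data schema keeps metadata recording the declared char type (key is made up).
data_schema = [Field("c", "string", {"original_type": "char(5)"})]

# A required schema resolved without the metadata does not match.
required_without_metadata = [Field("c", "string")]
print(is_subset(required_without_metadata, data_schema))  # False

# Retaining the metadata during resolution makes the check pass.
required_with_metadata = [Field("c", "string", {"original_type": "char(5)"})]
print(is_subset(required_with_metadata, data_schema))  # True
```

In this model the two fields have the same name and type and differ only in metadata, which is the kind of inconsistency the description attributes to the two schemas.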






[jira] [Assigned] (SPARK-48241) CSV parsing failure with char/varchar type columns

2024-05-13 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-48241:
---

Assignee: Jiayi Liu

> CSV parsing failure with char/varchar type columns
> --
>
> Key: SPARK-48241
> URL: https://issues.apache.org/jira/browse/SPARK-48241
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.1
>Reporter: Jiayi Liu
>Assignee: Jiayi Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Selecting from a CSV table that contains char and varchar columns fails with 
> the following error:
> {code:java}
> java.lang.IllegalArgumentException: requirement failed: requiredSchema 
> (struct) should be the subset of dataSchema 
> (struct).
>     at scala.Predef$.require(Predef.scala:281)
>     at 
> org.apache.spark.sql.catalyst.csv.UnivocityParser.(UnivocityParser.scala:56)
>     at 
> org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.$anonfun$buildReader$2(CSVFileFormat.scala:127)
>     at 
> org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:155)
>     at 
> org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:140)
>     at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:231)
>     at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:293)
>     at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:125){code}
> The error occurs because the StringType columns in the dataSchema and 
> requiredSchema of UnivocityParser are inconsistent: the StringType StructField 
> in the dataSchema carries metadata that is missing from the requiredSchema. 
> We need to retain the metadata when resolving the schema.
>  






[jira] [Created] (SPARK-48268) Add a configuration for SparkContext.setCheckpointDir

2024-05-13 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-48268:


 Summary: Add a configuration for SparkContext.setCheckpointDir
 Key: SPARK-48268
 URL: https://issues.apache.org/jira/browse/SPARK-48268
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


Would be great to have it






[jira] [Updated] (SPARK-48258) Implement DataFrame.checkpoint and DataFrame.localCheckpoint

2024-05-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48258:
---
Labels: pull-request-available  (was: )

> Implement DataFrame.checkpoint and DataFrame.localCheckpoint
> 
>
> Key: SPARK-48258
> URL: https://issues.apache.org/jira/browse/SPARK-48258
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> We should add DataFrame.checkpoint and DataFrame.localCheckpoint for feature 
> parity.






[jira] [Resolved] (SPARK-48261) RoundRobin based coalesce in spark

2024-05-13 Thread Subham Singhal (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subham Singhal resolved SPARK-48261.

Resolution: Abandoned

> RoundRobin based coalesce in spark
> --
>
> Key: SPARK-48261
> URL: https://issues.apache.org/jira/browse/SPARK-48261
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.5.1
>Reporter: Subham Singhal
>Priority: Minor
>
> Currently, the default coalesce does not take partition size into account and 
> simply merges partitions. This often results in non-uniform data 
> distribution. There has been a proposal for size-based 
> coalesce (https://github.com/apache/spark/pull/27248).
> I am proposing a custom round-robin coalesce that distributes data evenly 
> across partitions within the same executor.






[jira] [Assigned] (SPARK-48209) Common (java side): Migrate `error/warn/info` with variables to structured logging framework

2024-05-13 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang reassigned SPARK-48209:
--

Assignee: BingKun Pan

> Common (java side): Migrate `error/warn/info` with variables to structured 
> logging framework
> 
>
> Key: SPARK-48209
> URL: https://issues.apache.org/jira/browse/SPARK-48209
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Critical
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-48209) Common (java side): Migrate `error/warn/info` with variables to structured logging framework

2024-05-13 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-48209.

Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46493
[https://github.com/apache/spark/pull/46493]

> Common (java side): Migrate `error/warn/info` with variables to structured 
> logging framework
> 
>
> Key: SPARK-48209
> URL: https://issues.apache.org/jira/browse/SPARK-48209
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-48267) Regression e2e test with SPARK-47305

2024-05-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48267:
---
Labels: pull-request-available  (was: )

> Regression e2e test with SPARK-47305
> 
>
> Key: SPARK-48267
> URL: https://issues.apache.org/jira/browse/SPARK-48267
> Project: Spark
>  Issue Type: Test
>  Components: Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Jungtaek Lim
>Priority: Major
>  Labels: pull-request-available
>
> In SPARK-47305, we fixed a bug in a query optimizer (QO) rule and added a test 
> that demonstrates the issue and verifies the fix. The scope of that test was 
> limited to the bug fix itself, because we had no clear idea of a (simpler) 
> e2e reproducer at the time.
> We have since come up with a simple reproducer, which is an e2e streaming 
> query. We'd like to add this reproducer as a regression test.






[jira] [Created] (SPARK-48267) Regression e2e test with SPARK-47305

2024-05-13 Thread Jungtaek Lim (Jira)
Jungtaek Lim created SPARK-48267:


 Summary: Regression e2e test with SPARK-47305
 Key: SPARK-48267
 URL: https://issues.apache.org/jira/browse/SPARK-48267
 Project: Spark
  Issue Type: Test
  Components: Structured Streaming
Affects Versions: 4.0.0
Reporter: Jungtaek Lim


In SPARK-47305, we fixed a bug in a query optimizer (QO) rule and added a test 
that demonstrates the issue and verifies the fix. The scope of that test was 
limited to the bug fix itself, because we had no clear idea of a (simpler) 
e2e reproducer at the time.

We have since come up with a simple reproducer, which is an e2e streaming 
query. We'd like to add this reproducer as a regression test.






[jira] [Updated] (SPARK-48265) Infer window group limit batch should do constant folding

2024-05-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48265:
---
Labels: pull-request-available  (was: )

> Infer window group limit batch should do constant folding
> -
>
> Key: SPARK-48265
> URL: https://issues.apache.org/jira/browse/SPARK-48265
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0, 3.5.1
>Reporter: angerszhu
>Priority: Major
>  Labels: pull-request-available
>
> {code:java}
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger:
> === Result of Batch LocalRelation ===
>  GlobalLimit 21                                                               
>                                                               GlobalLimit 21
>  +- LocalLimit 21                                                             
>                                                               +- LocalLimit 21
> !   +- Union false, false                                                     
>                                                                  +- 
> LocalLimit 21
> !      :- LocalLimit 21                                                       
>                                                                     +- 
> Project [item_id#647L]
> !      :  +- Project [item_id#647L]                                           
>                                                                        +- 
> Filter (((isnotnull(tz_type#734) AND (tz_type#734 = local)) AND 
> (grass_region#735 = BR)) AND isnotnull(grass_region#735))
> !      :     +- Filter (((isnotnull(tz_type#734) AND (tz_type#734 = local)) 
> AND (grass_region#735 = BR)) AND isnotnull(grass_region#735))               
> +- Relation db.table[,... 91 more fields] parquet
> !      :        +- Relation db.table[,... 91 more fields] parquet
> !      +- LocalLimit 21
> !         +- Project [item_id#738L]
> !            +- LocalRelation , [, ... 91 more fields]
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Check Cartesian 
> Products has no effect.
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch RewriteSubquery has no 
> effect.
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch 
> NormalizeFloatingNumbers has no effect.
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch 
> ReplaceUpdateFieldsExpression has no effect.
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Optimize Metadata Only 
> Query has no effect.
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch PartitionPruning has 
> no effect.
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch InjectRuntimeFilter 
> has no effect.
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Pushdown Filters from 
> PartitionPruning has no effect.
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Cleanup filters that 
> cannot be pushed down has no effect.
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Extract Python UDFs 
> has no effect.
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger:
> === Applying Rule org.apache.spark.sql.catalyst.optimizer.EliminateLimits ===
>  GlobalLimit 21                                                               
>                                                            GlobalLimit 21
> !+- LocalLimit 21                                                             
>                                                            +- LocalLimit 
> least(, ... 2 more fields)
> !   +- LocalLimit 21                                                          
>                                                               +- Project 
> [item_id#647L]
> !      +- Project [item_id#647L]                                              
>                                                                  +- Filter 
> (((isnotnull(tz_type#734) AND (tz_type#734 = local)) AND (grass_region#735 = 
> BR)) AND isnotnull(grass_region#735))
> !         +- Filter (((isnotnull(tz_type#734) AND (tz_type#734 = local)) AND 
> (grass_region#735 = BR)) AND isnotnull(grass_region#735))            +- 
> Relation db.table[,... 91 more fields] parquet
> !            +- Relation db.table[,... 91 more fields] parquet
>  {code}
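
One reading of the log above, offered as an assumption because the optimizer output is truncated: EliminateLimits merges the two adjacent `LocalLimit 21` nodes into a single `LocalLimit least(...)`, and the issue title asks for that expression to be constant-folded. The sketch below is not Spark code. `Expr`, `Lit`, `Ref`, `Least` and `fold` are hypothetical names, used only to illustrate what folding `least(21, 21)` into the literal `21` would mean.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical mini expression tree; none of these types are Spark classes.
final class ConstantFoldingSketch {
    interface Expr {}

    // A literal integer, such as the 21 in "LocalLimit 21".
    static final class Lit implements Expr {
        final int value;
        Lit(int value) { this.value = value; }
    }

    // A value that is only known at run time, so it cannot be folded.
    static final class Ref implements Expr {
        final String name;
        Ref(String name) { this.name = name; }
    }

    // least(a, b, ...): evaluates to the smallest of its children.
    static final class Least implements Expr {
        final List<Expr> children;
        Least(Expr... children) { this.children = Arrays.asList(children); }
    }

    // Fold bottom-up: a Least whose children are all literals becomes one literal.
    static Expr fold(Expr e) {
        if (!(e instanceof Least)) {
            return e;
        }
        List<Expr> folded = new ArrayList<>();
        boolean allLiterals = true;
        for (Expr child : ((Least) e).children) {
            Expr f = fold(child);
            folded.add(f);
            allLiterals &= f instanceof Lit;
        }
        if (!allLiterals || folded.isEmpty()) {
            // At least one child is not a constant: keep the (partially folded) Least.
            return new Least(folded.toArray(new Expr[0]));
        }
        int min = Integer.MAX_VALUE;
        for (Expr f : folded) {
            min = Math.min(min, ((Lit) f).value);
        }
        return new Lit(min);
    }

    public static void main(String[] args) {
        Expr limit = fold(new Least(new Lit(21), new Lit(21)));
        System.out.println(limit instanceof Lit
            ? "LocalLimit " + ((Lit) limit).value
            : "LocalLimit least(...) was not folded");
    }
}
```

With a folded limit, a later rule that expects a literal limit value can match the plan again. Whether that is the exact failure mode here is not stated in the ticket.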






[jira] [Updated] (SPARK-48265) Infer window group limit batch should do constant folding

2024-05-13 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-48265:
--
Description: 
{code:java}
24/05/13 17:39:25 ERROR [main] PlanChangeLogger:
=== Result of Batch LocalRelation ===
 GlobalLimit 21                                                                 
                                                            GlobalLimit 21
 +- LocalLimit 21                                                               
                                                            +- LocalLimit 21
!   +- Union false, false                                                       
                                                               +- LocalLimit 21
!      :- LocalLimit 21                                                         
                                                                  +- Project 
[item_id#647L]
!      :  +- Project [item_id#647L]                                             
                                                                     +- Filter 
(((isnotnull(tz_type#734) AND (tz_type#734 = local)) AND (grass_region#735 = 
BR)) AND isnotnull(grass_region#735))
!      :     +- Filter (((isnotnull(tz_type#734) AND (tz_type#734 = local)) AND 
(grass_region#735 = BR)) AND isnotnull(grass_region#735))               +- 
Relation db.table[,... 91 more fields] parquet
!      :        +- Relation db.table[,... 91 more fields] parquet
!      +- LocalLimit 21
!         +- Project [item_id#738L]
!            +- LocalRelation , [, ... 91 more fields]
24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Check Cartesian Products 
has no effect.
24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch RewriteSubquery has no 
effect.
24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch NormalizeFloatingNumbers 
has no effect.
24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch 
ReplaceUpdateFieldsExpression has no effect.
24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Optimize Metadata Only 
Query has no effect.
24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch PartitionPruning has no 
effect.
24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch InjectRuntimeFilter has 
no effect.
24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Pushdown Filters from 
PartitionPruning has no effect.
24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Cleanup filters that 
cannot be pushed down has no effect.
24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Extract Python UDFs has 
no effect.
24/05/13 17:39:25 ERROR [main] PlanChangeLogger:
=== Applying Rule org.apache.spark.sql.catalyst.optimizer.EliminateLimits ===
 GlobalLimit 21                                                                 
                                                         GlobalLimit 21
!+- LocalLimit 21                                                               
                                                         +- LocalLimit least(, 
... 2 more fields)
!   +- LocalLimit 21                                                            
                                                            +- Project 
[item_id#647L]
!      +- Project [item_id#647L]                                                
                                                               +- Filter 
(((isnotnull(tz_type#734) AND (tz_type#734 = local)) AND (grass_region#735 = 
BR)) AND isnotnull(grass_region#735))
!         +- Filter (((isnotnull(tz_type#734) AND (tz_type#734 = local)) AND 
(grass_region#735 = BR)) AND isnotnull(grass_region#735))            +- 
Relation db.table[,... 91 more fields] parquet
!            +- Relation db.table[,... 91 more fields] parquet
 {code}

> Infer window group limit batch should do constant folding
> -
>
> Key: SPARK-48265
> URL: https://issues.apache.org/jira/browse/SPARK-48265
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0, 3.5.1
>Reporter: angerszhu
>Priority: Major
>

[jira] [Created] (SPARK-48265) Infer window group limit batch should do constant folding

2024-05-13 Thread angerszhu (Jira)
angerszhu created SPARK-48265:
-

 Summary: Infer window group limit batch should do constant folding
 Key: SPARK-48265
 URL: https://issues.apache.org/jira/browse/SPARK-48265
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.5.1, 4.0.0
Reporter: angerszhu









[jira] [Resolved] (SPARK-48259) Add 3 missing methods in dsl

2024-05-13 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-48259.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46559
[https://github.com/apache/spark/pull/46559]

> Add 3 missing methods in dsl
> 
>
> Key: SPARK-48259
> URL: https://issues.apache.org/jira/browse/SPARK-48259
> Project: Spark
>  Issue Type: Test
>  Components: Connect, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-48264) Upgrade `datasketches-java` to 6.0.0

2024-05-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48264:
---
Labels: pull-request-available  (was: )

> Upgrade `datasketches-java` to 6.0.0
> 
>
> Key: SPARK-48264
> URL: https://issues.apache.org/jira/browse/SPARK-48264
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Assigned] (SPARK-48260) disable output committer coordination in one test of ParquetIOSuite

2024-05-13 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang reassigned SPARK-48260:
--

Assignee: Gengliang Wang

> disable output committer coordination in one test of ParquetIOSuite
> ---
>
> Key: SPARK-48260
> URL: https://issues.apache.org/jira/browse/SPARK-48260
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Wenchen Fan
>Assignee: Gengliang Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-48260) disable output committer coordination in one test of ParquetIOSuite

2024-05-13 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-48260.

Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46562
[https://github.com/apache/spark/pull/46562]

> disable output committer coordination in one test of ParquetIOSuite
> ---
>
> Key: SPARK-48260
> URL: https://issues.apache.org/jira/browse/SPARK-48260
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Wenchen Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-44953) Log a warning (or automatically disable) when shuffle tracking is enabled along side another DA supported mechanism

2024-05-13 Thread Holden Karau (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Holden Karau reassigned SPARK-44953:


Assignee: binjie yang

> Log a warning (or automatically disable) when shuffle tracking is enabled 
> along side another DA supported mechanism
> ---
>
> Key: SPARK-44953
> URL: https://issues.apache.org/jira/browse/SPARK-44953
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Holden Karau
>Assignee: binjie yang
>Priority: Major
>  Labels: pull-request-available
>
> Some people enable both shuffle tracking and another mechanism (like 
> migration) and then are confused when their jobs don't scale down.
>  
> We should at least log a warning here (or automatically disable shuffle 
> tracking?) when it is configured alongside another DA supported mechanism.
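
A minimal sketch of the proposed check, assuming the configuration is available as a plain string map. `spark.dynamicAllocation.shuffleTracking.enabled` is the shuffle tracking switch. Which other settings count as "another DA supported mechanism" is not spelled out in the ticket, so the two keys listed in `OTHER_MECHANISMS` (the external shuffle service and shuffle block migration during decommissioning) are an assumption. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper; not part of Spark.
final class ShuffleTrackingCheck {
    static final String SHUFFLE_TRACKING =
        "spark.dynamicAllocation.shuffleTracking.enabled";

    // Assumption: these are the "other" mechanisms worth warning about.
    static final String[] OTHER_MECHANISMS = {
        "spark.shuffle.service.enabled",
        "spark.storage.decommission.shuffleBlocks.enabled"
    };

    // A missing key is treated as disabled (a simplification of this sketch).
    static boolean isTrue(Map<String, String> conf, String key) {
        return Boolean.parseBoolean(conf.getOrDefault(key, "false"));
    }

    // Returns a warning text when shuffle tracking is combined with another mechanism.
    static Optional<String> conflictWarning(Map<String, String> conf) {
        if (!isTrue(conf, SHUFFLE_TRACKING)) {
            return Optional.empty();
        }
        List<String> enabled = new ArrayList<>();
        for (String key : OTHER_MECHANISMS) {
            if (isTrue(conf, key)) {
                enabled.add(key);
            }
        }
        if (enabled.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(SHUFFLE_TRACKING + " is enabled together with "
            + String.join(", ", enabled)
            + "; jobs may not scale down as expected.");
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(SHUFFLE_TRACKING, "true");
        conf.put("spark.storage.decommission.shuffleBlocks.enabled", "true");
        conflictWarning(conf).ifPresent(w -> System.out.println("WARN " + w));
    }
}
```

Returning the message instead of logging it keeps the decision (warn, or disable shuffle tracking automatically) with the caller, which matches the open question in the description.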






[jira] [Resolved] (SPARK-44953) Log a warning (or automatically disable) when shuffle tracking is enabled along side another DA supported mechanism

2024-05-13 Thread Holden Karau (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Holden Karau resolved SPARK-44953.
--
Resolution: Fixed

> Log a warning (or automatically disable) when shuffle tracking is enabled 
> along side another DA supported mechanism
> ---
>
> Key: SPARK-44953
> URL: https://issues.apache.org/jira/browse/SPARK-44953
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Holden Karau
>Assignee: binjie yang
>Priority: Major
>  Labels: pull-request-available
>
> Some people enable both shuffle tracking and another mechanism (like 
> migration) and then are confused when their jobs don't scale down.
>  
> We should at least log a warning here (or automatically disable shuffle 
> tracking?) when it is configured alongside another DA supported mechanism.






[jira] [Updated] (SPARK-48233) Tests for non-stateful streaming with collations

2024-05-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48233:
---
Labels: pull-request-available  (was: )

> Tests for non-stateful streaming with collations
> 
>
> Key: SPARK-48233
> URL: https://issues.apache.org/jira/browse/SPARK-48233
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Aleksandar Tomic
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-33752) Avoid the getSimpleMessage of AnalysisException adds semicolon repeatedly

2024-05-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-33752:
---
Labels: pull-request-available  (was: )

> Avoid the getSimpleMessage of AnalysisException adds semicolon repeatedly
> -
>
> Key: SPARK-33752
> URL: https://issues.apache.org/jira/browse/SPARK-33752
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Jiaan Geng
>Assignee: Jiaan Geng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.0
>
>
> The current getSimpleMessage of AnalysisException may add a semicolon 
> repeatedly. An example is shown below:
> {code:java}
> select decode()
> {code}
> The output will be:
> {code:java}
> org.apache.spark.sql.AnalysisException
> Invalid number of arguments for function decode. Expected: 2; Found: 0;; line 1 pos 7
> {code}
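
The doubled semicolon in the output above can be reproduced with a small sketch. It assumes that the message text already ends in `;` and that the formatter then adds its own `;` before the line and position suffix. That is a plausible reading of the output, not a quote of Spark's implementation, and `naive` and `guarded` are hypothetical names.

```java
// Hypothetical formatter; it only mimics the shape of the reported output.
final class SimpleMessageSketch {
    // Always inserts ";" before the position suffix, even if one is already there.
    static String naive(String message, int line, int pos) {
        return message + "; line " + line + " pos " + pos;
    }

    // Inserts ";" only when the message does not already end with one.
    static String guarded(String message, int line, int pos) {
        String separator = message.endsWith(";") ? "" : ";";
        return message + separator + " line " + line + " pos " + pos;
    }

    public static void main(String[] args) {
        String msg = "Invalid number of arguments for function decode. Expected: 2; Found: 0;";
        System.out.println(naive(msg, 1, 7));   // ends with "Found: 0;; line 1 pos 7"
        System.out.println(guarded(msg, 1, 7)); // ends with "Found: 0; line 1 pos 7"
    }
}
```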






[jira] [Commented] (SPARK-48263) Collate expression not working when default collation set

2024-05-13 Thread Nebojsa Savic (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-48263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846035#comment-17846035
 ] 

Nebojsa Savic commented on SPARK-48263:
---

Working on it.

> Collate expression not working when default collation set
> -
>
> Key: SPARK-48263
> URL: https://issues.apache.org/jira/browse/SPARK-48263
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Nebojsa Savic
>Priority: Major
>
> When the default collation level config is set to a collation other than 
> UTF8_BINARY (e.g. UTF8_BINARY_LCASE), executing a COLLATE (or collation) 
> expression fails, because the expression only accepts StringType(0) as the 
> argument for the collation name.






[jira] [Commented] (SPARK-47008) Spark to support S3 Express One Zone Storage

2024-05-13 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846014#comment-17846014
 ] 

Steve Loughran commented on SPARK-47008:


Yes, that looks like it. This feature is a real PITA, though apparently it's 
there to let you know that you have outstanding uploads to purge, as there are 
no lifecycle rules, see.

FWIW you can explicitly create the real situation with a touch command under a 
__magic path:

{code}
hadoop fs -touch s3a://stevel--usw1-az2--x-s3/cli/__magic/__base/d/file.txt
{code}

this creates an incomplete upload under /cli//d/file.txt

> Spark to support S3 Express One Zone Storage
> 
>
> Key: SPARK-47008
> URL: https://issues.apache.org/jira/browse/SPARK-47008
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Steve Loughran
>Priority: Major
>
> Hadoop 3.4.0 adds support for AWS S3 Express One Zone Storage.
> Most of this is transparent. However, one aspect which can surface as an 
> issue is that these stores report prefixes in a listing when there are 
> pending uploads, *even when there are no files underneath*.
> This leads to a situation where a listStatus of a path returns a list of file 
> status entries which appears to contain one or more directories, but a 
> listStatus on such a directory raises a FileNotFoundException: there is 
> nothing there.
> HADOOP-18996 handles this in all of the Hadoop code, including FileInputFormat.
> A filesystem can now be probed for inconsistent directory listings through 
> {{fs.hasPathCapability(path, "fs.capability.directory.listing.inconsistent")}}.
> If true, then treewalking code SHOULD NOT report a failure if, when walking 
> into a subdirectory, a list/getFileStatus on that directory raises a 
> FileNotFoundException.
> Although most of this is handled in the Hadoop code, there are some places 
> where treewalking is done inside Spark. These need to be identified and made 
> resilient to failure when recursing down the tree:
> * SparkHadoopUtil list methods, especially listLeafStatuses used by OrcFileOperator
> * org.apache.spark.util.Utils#fetchHcfsFile
> {{org.apache.hadoop.fs.FileUtil.maybeIgnoreMissingDirectory()}} can assist 
> here, or the logic can be replicated. Using the Hadoop implementation would 
> be better from a maintenance perspective.
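
The tolerant tree walk described above can be sketched without any Hadoop dependency. Everything in the block is hypothetical: `Lister` stands in for a filesystem listing call, paths ending in `/` are treated as directories, and the `listingInconsistent` flag plays the role of the result of the `hasPathCapability` probe named in the description. Real code would use the Hadoop APIs mentioned there instead.

```java
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, Hadoop-free model of a recursive listing.
final class TolerantTreeWalk {
    // Stand-in for a filesystem listing call; entries ending in "/" are directories.
    interface Lister {
        List<String> list(String dir) throws FileNotFoundException;
    }

    static List<String> listLeafFiles(Lister fs, String root, boolean listingInconsistent)
            throws FileNotFoundException {
        List<String> files = new ArrayList<>();
        walk(fs, root, true, listingInconsistent, files);
        return files;
    }

    private static void walk(Lister fs, String dir, boolean isRoot,
                             boolean listingInconsistent, List<String> out)
            throws FileNotFoundException {
        List<String> children;
        try {
            children = fs.list(dir);
        } catch (FileNotFoundException e) {
            // A missing root is always an error. A missing subdirectory is skipped
            // only when the store is known to report directories that do not exist.
            if (isRoot || !listingInconsistent) {
                throw e;
            }
            return;
        }
        for (String child : children) {
            if (child.endsWith("/")) {
                walk(fs, child, false, listingInconsistent, out);
            } else {
                out.add(child);
            }
        }
    }

    // In-memory lister for the demo: directories absent from the map do not exist.
    static Lister inMemory(Map<String, List<String>> dirs) {
        return dir -> {
            List<String> children = dirs.get(dir);
            if (children == null) {
                throw new FileNotFoundException(dir);
            }
            return children;
        };
    }

    public static void main(String[] args) throws FileNotFoundException {
        Map<String, List<String>> dirs = new HashMap<>();
        // "/cli/d/" is listed because of a pending upload, but nothing exists under it.
        dirs.put("/cli/", Arrays.asList("/cli/a.txt", "/cli/d/"));
        System.out.println(listLeafFiles(inMemory(dirs), "/cli/", true)); // [/cli/a.txt]
    }
}
```

Failing on a missing root even in tolerant mode is a design choice of this sketch: only directories discovered through a listing can be phantom entries, so a missing starting path still indicates a genuine error.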






[jira] [Created] (SPARK-48263) Collate expression not working when default collation set

2024-05-13 Thread Nebojsa Savic (Jira)
Nebojsa Savic created SPARK-48263:
-

 Summary: Collate expression not working when default collation set
 Key: SPARK-48263
 URL: https://issues.apache.org/jira/browse/SPARK-48263
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 4.0.0
Reporter: Nebojsa Savic


When the default collation level config is set to a collation other than 
UTF8_BINARY (e.g. UTF8_BINARY_LCASE), executing a COLLATE (or collation) 
expression fails, because the expression only accepts StringType(0) as the 
argument for the collation name.






[jira] [Resolved] (SPARK-48206) Add tests for window expression rewrites in RewriteWithExpression

2024-05-13 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-48206.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46492
[https://github.com/apache/spark/pull/46492]

> Add tests for window expression rewrites in RewriteWithExpression
> -
>
> Key: SPARK-48206
> URL: https://issues.apache.org/jira/browse/SPARK-48206
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Kelvin Jiang
>Assignee: Kelvin Jiang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Window expressions can be problematic if we pull a window expression out of 
> a `Window` operator. Right now this shouldn't happen, but we should add some 
> tests to make sure it doesn't break.






[jira] [Assigned] (SPARK-48206) Add tests for window expression rewrites in RewriteWithExpression

2024-05-13 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-48206:
---

Assignee: Kelvin Jiang

> Add tests for window expression rewrites in RewriteWithExpression
> -
>
> Key: SPARK-48206
> URL: https://issues.apache.org/jira/browse/SPARK-48206
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Kelvin Jiang
>Assignee: Kelvin Jiang
>Priority: Major
>  Labels: pull-request-available
>
> Window expressions can be problematic if we pull a window expression out of 
> a `Window` operator. Right now this shouldn't happen, but we should add some 
> tests to make sure it doesn't break.






[jira] [Updated] (SPARK-48215) DateFormatClass (all collations)

2024-05-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48215:
---
Labels: pull-request-available  (was: )

> DateFormatClass (all collations)
> 
>
> Key: SPARK-48215
> URL: https://issues.apache.org/jira/browse/SPARK-48215
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
>
> Enable collation support for the *DateFormatClass* built-in function in 
> Spark. First confirm the expected behaviour of this expression when given 
> collated strings, and then move on to implementation and testing. You 
> will find this expression in the *datetimeExpressions.scala* file, and it 
> should be considered a pass-through function with respect to collation 
> awareness. Implement the corresponding E2E SQL tests 
> (CollationSQLExpressionsSuite) to reflect how this function should be used 
> with collation in SparkSQL, and feel free to use your chosen Spark SQL Editor 
> to experiment with the existing functions to learn more about how they work. 
> In addition, look into the possible use-cases and implementation of similar 
> functions within other open-source DBMS, such as 
> [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the *DateFormatClass* 
> expression so that it supports all collation types currently supported in 
> Spark. To understand what changes were introduced in order to enable full 
> collation support for other existing functions in Spark, take a look at the 
> Spark PRs and Jira tickets for completed tasks in this parent (for example: 
> Ascii, Chr, Base64, UnBase64, Decode, StringDecode, Encode, ToBinary, 
> FormatNumber, Sentences).
>  
> Read more about the ICU [Collation Concepts|http://example.com/] and the 
> [Collator|http://example.com/] class. Also, refer to the Unicode Technical 
> Standard for string 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].






[jira] [Assigned] (SPARK-48031) Add schema evolution options to views

2024-05-13 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-48031:
---

Assignee: Serge Rielau

> Add schema evolution options to views 
> --
>
> Key: SPARK-48031
> URL: https://issues.apache.org/jira/browse/SPARK-48031
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Serge Rielau
>Assignee: Serge Rielau
>Priority: Major
>  Labels: pull-request-available
>
> We want to provide the ability for views to react to changes in the query 
> resolution in ways other than simply failing the view.
> For example, we want the view to be able to compensate for type changes by 
> casting the query result to the view column types,
> or to adopt any change in column arity into the view.
>  






[jira] [Resolved] (SPARK-48031) Add schema evolution options to views

2024-05-13 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-48031.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46267
[https://github.com/apache/spark/pull/46267]

> Add schema evolution options to views 
> --
>
> Key: SPARK-48031
> URL: https://issues.apache.org/jira/browse/SPARK-48031
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Serge Rielau
>Assignee: Serge Rielau
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> We want to provide the ability for views to react to changes in the query 
> resolution in ways other than simply failing the view.
> For example, we want the view to be able to compensate for type changes by 
> casting the query result to the view column types,
> or to adopt any change in column arity into the view.
>  



[jira] [Resolved] (SPARK-48257) Polish POM for Hive dependencies

2024-05-13 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie resolved SPARK-48257.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46558
[https://github.com/apache/spark/pull/46558]

> Polish POM for Hive dependencies
> 
>
> Key: SPARK-48257
> URL: https://issues.apache.org/jira/browse/SPARK-48257
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




[jira] [Assigned] (SPARK-48257) Polish POM for Hive dependencies

2024-05-13 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie reassigned SPARK-48257:


Assignee: Cheng Pan

> Polish POM for Hive dependencies
> 
>
> Key: SPARK-48257
> URL: https://issues.apache.org/jira/browse/SPARK-48257
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Major
>  Labels: pull-request-available
>




[jira] [Updated] (SPARK-48043) Kryo serialization issue with push-based shuffle

2024-05-13 Thread Romain Ardiet (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Romain Ardiet updated SPARK-48043:
--
Description: 
I'm running a Spark job on AWS EMR. I wanted to test the push-based shuffle 
introduced in Spark 3.2, but the job fails with a Kryo exception when I 
enable it.
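
For context, a configuration along the following lines would combine Kryo class registration with push-based shuffle. The report does not list the job's actual settings, so every value here is an assumption; only the class name is taken from the stack trace, and `to_submit_args` is a hypothetical helper. The property names are standard Spark configuration keys.

```python
# Assumed settings; the report does not state the real configuration.
conf = {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    # Class name taken from the ClassNotFoundException in the trace.
    "spark.kryo.classesToRegister": "com.analytics.AnalyticsEventWrapper",
    "spark.shuffle.service.enabled": "true",
    "spark.shuffle.push.enabled": "true",
}

def to_submit_args(conf):
    """Render a settings dict as spark-submit style '--conf key=value' arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args

print(" ".join(to_submit_args(conf)))
```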

The issue occurs when the executor starts, during the 
KryoSerializerInstance.getAutoReset() check:
{code:java}
24/04/24 15:36:22 ERROR YarnCoarseGrainedExecutorBackend: Executor self-exiting 
due to : Unable to create executor due to Failed to register classes with Kryo
org.apache.spark.SparkException: Failed to register classes with Kryo
at 
org.apache.spark.serializer.KryoSerializer.$anonfun$newKryo$5(KryoSerializer.scala:186)
 ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) 
~[scala-library-2.12.15.jar:?]
at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:241) 
~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at 
org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:174) 
~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at 
org.apache.spark.serializer.KryoSerializer$$anon$1.create(KryoSerializer.scala:105)
 ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at 
com.esotericsoftware.kryo.pool.KryoPoolQueueImpl.borrow(KryoPoolQueueImpl.java:48)
 ~[kryo-shaded-4.0.2.jar:?]
at 
org.apache.spark.serializer.KryoSerializer$PoolWrapper.borrow(KryoSerializer.scala:112)
 ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at 
org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:352)
 ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at 
org.apache.spark.serializer.KryoSerializerInstance.getAutoReset(KryoSerializer.scala:452)
 ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at 
org.apache.spark.serializer.KryoSerializer.supportsRelocationOfSerializedObjects$lzycompute(KryoSerializer.scala:259)
 ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at 
org.apache.spark.serializer.KryoSerializer.supportsRelocationOfSerializedObjects(KryoSerializer.scala:255)
 ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at 
org.apache.spark.util.Utils$.serializerIsSupported$lzycompute$1(Utils.scala:2721)
 ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.util.Utils$.serializerIsSupported$1(Utils.scala:2716) 
~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.util.Utils$.isPushBasedShuffleEnabled(Utils.scala:2730) 
~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.storage.BlockManager.initialize(BlockManager.scala:554) 
~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.executor.Executor.<init>(Executor.scala:143) 
~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at 
org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:190)
 ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115) 
~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213) 
~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100) 
~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at 
org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
 ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41) 
~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
~[?:1.8.0_402]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
~[?:1.8.0_402]
at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_402]
Caused by: java.lang.ClassNotFoundException: com.analytics.AnalyticsEventWrapper
at java.net.URLClassLoader.findClass(URLClassLoader.java:387) ~[?:1.8.0_402]
at java.lang.ClassLoader.loadClass(ClassLoader.java:418) ~[?:1.8.0_402]
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) 
~[?:1.8.0_402]
at java.lang.ClassLoader.loadClass(ClassLoader.java:351) ~[?:1.8.0_402]
at java.lang.Class.forName0(Native Method) ~[?:1.8.0_402]
at java.lang.Class.forName(Class.java:348) ~[?:1.8.0_402]
at org.apache.spark.util.Utils$.classForName(Utils.scala:228) 
~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at 
org.apache.spark.serializer.KryoSerializer.$anonfun$newKryo$6(KryoSerializer.scala:177)
 ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) 
~[scala-library-2.12.15.jar:?]
at 
scala.collection.mutable.ResizableArray.foreach$(Resizab

[jira] [Created] (SPARK-48262) Substitute BinaryExpression for explicit Expressions in CollationTypeCast

2024-05-13 Thread Mihailo Milosevic (Jira)
Mihailo Milosevic created SPARK-48262:
-

 Summary: Substitute BinaryExpression for explicit Expressions in 
CollationTypeCast
 Key: SPARK-48262
 URL: https://issues.apache.org/jira/browse/SPARK-48262
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 4.0.0
Reporter: Mihailo Milosevic






[jira] [Updated] (SPARK-48231) Remove unused CodeHaus Jackson dependencies

2024-05-13 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-48231:
--
Parent: (was: SPARK-47046)
Issue Type: Bug  (was: Sub-task)

> Remove unused CodeHaus Jackson dependencies
> ---
>
> Key: SPARK-48231
> URL: https://issues.apache.org/jira/browse/SPARK-48231
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Major
>  Labels: pull-request-available
>




[jira] [Updated] (SPARK-48230) Remove unused jodd-core

2024-05-13 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-48230:
--
Parent: (was: SPARK-47046)
Issue Type: Bug  (was: Sub-task)

> Remove unused jodd-core
> ---
>
> Key: SPARK-48230
> URL: https://issues.apache.org/jira/browse/SPARK-48230
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Cheng Pan
>Priority: Major
>  Labels: pull-request-available
>




[jira] [Created] (SPARK-48261) RoundRobin based coalesce in spark

2024-05-13 Thread Subham Singhal (Jira)
Subham Singhal created SPARK-48261:
--

 Summary: RoundRobin based coalesce in spark
 Key: SPARK-48261
 URL: https://issues.apache.org/jira/browse/SPARK-48261
 Project: Spark
  Issue Type: New Feature
  Components: Spark Core
Affects Versions: 3.5.1
Reporter: Subham Singhal


Currently, the default coalesce does not take partition size into account and 
simply merges partitions. This often results in non-uniform data distribution. 
There has been a proposal for size-based coalesce 
(https://github.com/apache/spark/pull/27248).

I am proposing a custom round-robin coalesce that distributes data evenly 
across partitions within the same executor.
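
The difference between the two strategies can be sketched in plain Python. This is a simplified model rather than Spark's implementation: partitions are modelled as lists, and `coalesce_contiguous` and `coalesce_round_robin` are hypothetical names. The first function merges neighbouring partitions without looking at their sizes; the second deals records out one at a time.

```python
from itertools import chain

def coalesce_contiguous(partitions, n):
    """Merge neighbouring partitions into at most n groups, ignoring their sizes."""
    size = -(-len(partitions) // n)  # ceiling division
    return [list(chain.from_iterable(partitions[i:i + size]))
            for i in range(0, len(partitions), size)]

def coalesce_round_robin(partitions, n):
    """Deal records one at a time across n output partitions."""
    out = [[] for _ in range(n)]
    for i, record in enumerate(chain.from_iterable(partitions)):
        out[i % n].append(record)
    return out

skewed = [list(range(100)), [0], [1], [2]]  # one large and three tiny partitions
print([len(p) for p in coalesce_contiguous(skewed, 2)])   # [101, 2]
print([len(p) for p in coalesce_round_robin(skewed, 2)])  # [52, 51]
```

With skewed input, the size-unaware merge leaves almost all records in one output partition, while the round-robin variant keeps the output sizes within one record of each other.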



[jira] [Updated] (SPARK-48260) disable output committer coordination in one test of ParquetIOSuite

2024-05-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48260:
---
Labels: pull-request-available  (was: )

> disable output committer coordination in one test of ParquetIOSuite
> ---
>
> Key: SPARK-48260
> URL: https://issues.apache.org/jira/browse/SPARK-48260
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Wenchen Fan
>Priority: Major
>  Labels: pull-request-available
>




[jira] [Created] (SPARK-48260) disable output committer coordination in one test of ParquetIOSuite

2024-05-13 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-48260:
---

 Summary: disable output committer coordination in one test of 
ParquetIOSuite
 Key: SPARK-48260
 URL: https://issues.apache.org/jira/browse/SPARK-48260
 Project: Spark
  Issue Type: Test
  Components: SQL
Affects Versions: 4.0.0
Reporter: Wenchen Fan






[jira] [Commented] (SPARK-48215) DateFormatClass (all collations)

2024-05-13 Thread Nebojsa Savic (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-48215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845880#comment-17845880
 ] 

Nebojsa Savic commented on SPARK-48215:
---

Starting work.

> DateFormatClass (all collations)
> 
>
> Key: SPARK-48215
> URL: https://issues.apache.org/jira/browse/SPARK-48215
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>
> Enable collation support for the *DateFormatClass* built-in function in 
> Spark. First confirm the expected behaviour for this expression when 
> given collated strings, and then move on to implementation and testing. You 
> will find this expression in the *datetimeExpressions.scala* file, and it 
> should be considered a pass-through function with respect to collation 
> awareness. Implement the corresponding E2E SQL tests 
> (CollationSQLExpressionsSuite) to reflect how this function should be used 
> with collation in SparkSQL, and feel free to use your chosen Spark SQL Editor 
> to experiment with the existing functions to learn more about how they work. 
> In addition, look into the possible use-cases and implementation of similar 
> functions within other open-source DBMS, such as 
> [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the *DateFormatClass* 
> expression so that it supports all collation types currently supported in 
> Spark. To understand what changes were introduced in order to enable full 
> collation support for other existing functions in Spark, take a look at the 
> Spark PRs and Jira tickets for completed tasks in this parent (for example: 
> Ascii, Chr, Base64, UnBase64, Decode, StringDecode, Encode, ToBinary, 
> FormatNumber, Sentences).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and 
> [Collator|http://example.com/] class. Also, refer to the Unicode Technical 
> Standard for string 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].



[jira] [Updated] (SPARK-48259) Add 3 missing methods in dsl

2024-05-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48259:
---
Labels: pull-request-available  (was: )

> Add 3 missing methods in dsl
> 
>
> Key: SPARK-48259
> URL: https://issues.apache.org/jira/browse/SPARK-48259
> Project: Spark
>  Issue Type: Test
>  Components: Connect, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>




[jira] [Created] (SPARK-48259) Add 3 missing methods in dsl

2024-05-13 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-48259:
-

 Summary: Add 3 missing methods in dsl
 Key: SPARK-48259
 URL: https://issues.apache.org/jira/browse/SPARK-48259
 Project: Spark
  Issue Type: Test
  Components: Connect, Tests
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng






[jira] [Created] (SPARK-48258) Implement DataFrame.checkpoint and DataFrame.localCheckpoint

2024-05-13 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-48258:


 Summary: Implement DataFrame.checkpoint and 
DataFrame.localCheckpoint
 Key: SPARK-48258
 URL: https://issues.apache.org/jira/browse/SPARK-48258
 Project: Spark
  Issue Type: Improvement
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


We should add DataFrame.checkpoint and DataFrame.localCheckpoint for feature 
parity.



[jira] [Resolved] (SPARK-48254) Enhance Guava version extraction rule in dev/test-dependencies.sh

2024-05-13 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48254.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46555
[https://github.com/apache/spark/pull/46555]

> Enhance Guava version extraction rule in dev/test-dependencies.sh
> -
>
> Key: SPARK-48254
> URL: https://issues.apache.org/jira/browse/SPARK-48254
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




[jira] [Assigned] (SPARK-48254) Enhance Guava version extraction rule in dev/test-dependencies.sh

2024-05-13 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48254:


Assignee: Cheng Pan

> Enhance Guava version extraction rule in dev/test-dependencies.sh
> -
>
> Key: SPARK-48254
> URL: https://issues.apache.org/jira/browse/SPARK-48254
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Major
>  Labels: pull-request-available
>




[jira] [Updated] (SPARK-48257) Polish POM for Hive dependencies

2024-05-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48257:
---
Labels: pull-request-available  (was: )

> Polish POM for Hive dependencies
> 
>
> Key: SPARK-48257
> URL: https://issues.apache.org/jira/browse/SPARK-48257
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Cheng Pan
>Priority: Major
>  Labels: pull-request-available
>




[jira] [Created] (SPARK-48257) Polish POM for Hive dependencies

2024-05-13 Thread Cheng Pan (Jira)
Cheng Pan created SPARK-48257:
-

 Summary: Polish POM for Hive dependencies
 Key: SPARK-48257
 URL: https://issues.apache.org/jira/browse/SPARK-48257
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 4.0.0
Reporter: Cheng Pan






[jira] [Updated] (SPARK-48256) Add a rule to check file headers for the java side, and fix inconsistent files

2024-05-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48256:
---
Labels: pull-request-available  (was: )

> Add a rule to check file headers for the java side, and fix inconsistent files
> --
>
> Key: SPARK-48256
> URL: https://issues.apache.org/jira/browse/SPARK-48256
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>




[jira] [Updated] (SPARK-48255) Guava should not respect hadoop.deps.scope

2024-05-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48255:
---
Labels: pull-request-available  (was: )

> Guava should not respect hadoop.deps.scope
> --
>
> Key: SPARK-48255
> URL: https://issues.apache.org/jira/browse/SPARK-48255
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Cheng Pan
>Priority: Major
>  Labels: pull-request-available
>




[jira] [Created] (SPARK-48256) Add a rule to check file headers for the java side, and fix inconsistent files

2024-05-13 Thread BingKun Pan (Jira)
BingKun Pan created SPARK-48256:
---

 Summary: Add a rule to check file headers for the java side, and 
fix inconsistent files
 Key: SPARK-48256
 URL: https://issues.apache.org/jira/browse/SPARK-48256
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 4.0.0
Reporter: BingKun Pan






[jira] [Assigned] (SPARK-47415) Levenshtein (all collations)

2024-05-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-47415:
--

Assignee: (was: Apache Spark)

> Levenshtein (all collations)
> 
>
> Key: SPARK-47415
> URL: https://issues.apache.org/jira/browse/SPARK-47415
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
>
> Enable collation support for the *Levenshtein* built-in string function in 
> Spark. First confirm the expected behaviour for this function when 
> given collated strings, and then move on to implementation and testing. 
> Implement the corresponding unit tests and E2E SQL tests to reflect how this 
> function should be used with collation in SparkSQL, and feel free to use your 
> chosen Spark SQL Editor to experiment with the existing functions to learn 
> more about how they work. In addition, look into the possible use-cases and 
> implementation of similar functions within other open-source DBMS, such 
> as [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the *Levenshtein* function so 
> it supports all collation types currently supported in Spark. To understand 
> what changes were introduced in order to enable full collation support for 
> other existing functions in Spark, take a look at the Spark PRs and Jira 
> tickets for completed tasks in this parent (for example: Contains, 
> StartsWith, EndsWith).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and 
> [Collator|http://example.com/] class. Also, refer to the Unicode Technical 
> Standard for string 
> [searching|https://www.unicode.org/reports/tr10/#Searching] and 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].
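
As a starting point for the behaviour question raised in the description, the sketch below shows one candidate semantics, assuming a case-insensitive collation can be approximated by lowercasing both inputs before comparison. It is plain Python for illustration, not Spark's implementation; `levenshtein` and its `key` parameter are hypothetical names.

```python
def levenshtein(left, right, key=lambda s: s):
    """Edit distance between two strings after applying a normalising key."""
    a, b = key(left), key(right)
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))              # 3
print(levenshtein("Spark", "SPARK"))                 # 4 under binary comparison
print(levenshtein("Spark", "SPARK", key=str.lower))  # 0 when case is ignored
```

Whether the function should behave this way under a given collation, or simply pass its inputs through unchanged, is exactly what the ticket asks to confirm first.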



[jira] [Assigned] (SPARK-47415) Levenshtein (all collations)

2024-05-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-47415:
--

Assignee: Apache Spark

> Levenshtein (all collations)
> 
>
> Key: SPARK-47415
> URL: https://issues.apache.org/jira/browse/SPARK-47415
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
>
> Enable collation support for the *Levenshtein* built-in string function in 
> Spark. First confirm the expected behaviour for this function when 
> given collated strings, and then move on to implementation and testing. 
> Implement the corresponding unit tests and E2E SQL tests to reflect how this 
> function should be used with collation in SparkSQL, and feel free to use your 
> chosen Spark SQL Editor to experiment with the existing functions to learn 
> more about how they work. In addition, look into the possible use-cases and 
> implementation of similar functions within other open-source DBMS, such 
> as [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the *Levenshtein* function so 
> it supports all collation types currently supported in Spark. To understand 
> what changes were introduced in order to enable full collation support for 
> other existing functions in Spark, take a look at the Spark PRs and Jira 
> tickets for completed tasks in this parent (for example: Contains, 
> StartsWith, EndsWith).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and 
> [Collator|http://example.com/] class. Also, refer to the Unicode Technical 
> Standard for string 
> [searching|https://www.unicode.org/reports/tr10/#Searching] and 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].



[jira] [Created] (SPARK-48255) Guava should not respect hadoop.deps.scope

2024-05-13 Thread Cheng Pan (Jira)
Cheng Pan created SPARK-48255:
-

 Summary: Guava should not respect hadoop.deps.scope
 Key: SPARK-48255
 URL: https://issues.apache.org/jira/browse/SPARK-48255
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 4.0.0
Reporter: Cheng Pan






[jira] [Assigned] (SPARK-48254) Enhance Guava version extraction rule in dev/test-dependencies.sh

2024-05-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-48254:
--

Assignee: (was: Apache Spark)

> Enhance Guava version extraction rule in dev/test-dependencies.sh
> -
>
> Key: SPARK-48254
> URL: https://issues.apache.org/jira/browse/SPARK-48254
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Cheng Pan
>Priority: Major
>  Labels: pull-request-available
>




[jira] [Assigned] (SPARK-48254) Enhance Guava version extraction rule in dev/test-dependencies.sh

2024-05-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-48254:
--

Assignee: Apache Spark

> Enhance Guava version extraction rule in dev/test-dependencies.sh
> -
>
> Key: SPARK-48254
> URL: https://issues.apache.org/jira/browse/SPARK-48254
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Cheng Pan
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-48254) Enhance Guava version extraction rule in dev/test-dependencies.sh

2024-05-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48254:
---
Labels: pull-request-available  (was: )

> Enhance Guava version extraction rule in dev/test-dependencies.sh
> -
>
> Key: SPARK-48254
> URL: https://issues.apache.org/jira/browse/SPARK-48254
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Cheng Pan
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-48254) Enhance Guava version extraction rule in dev/test-dependencies.sh

2024-05-13 Thread Cheng Pan (Jira)
Cheng Pan created SPARK-48254:
-

 Summary: Enhance Guava version extraction rule in dev/test-dependencies.sh
 Key: SPARK-48254
 URL: https://issues.apache.org/jira/browse/SPARK-48254
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 4.0.0
Reporter: Cheng Pan









[jira] [Updated] (SPARK-48253) Support default mode for Pandas API on Spark

2024-05-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48253:
---
Labels: pull-request-available  (was: )

> Support default mode for Pandas API on Spark
> 
>
> Key: SPARK-48253
> URL: https://issues.apache.org/jira/browse/SPARK-48253
> Project: Spark
>  Issue Type: Bug
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>
> To reduce the communication cost between the Python process and the JVM, we 
> suggest supporting a default mode.






[jira] [Assigned] (SPARK-48248) Fix nested array to respect legacy conf of inferArrayTypeFromFirstElement

2024-05-13 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48248:


Assignee: Hyukjin Kwon

> Fix nested array to respect legacy conf of inferArrayTypeFromFirstElement
> -
>
> Key: SPARK-48248
> URL: https://issues.apache.org/jira/browse/SPARK-48248
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> {code}
> >>> spark.conf.set("spark.sql.pyspark.legacy.inferArrayTypeFromFirstElement.enabled", True)
> >>> spark.createDataFrame(1, "a")
> DataFrame[_1: array>]
> {code}
> This should be inferred as an array of integers.
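
For readers unfamiliar with the flag, the difference between first-element inference and merging all elements can be sketched in plain Python. This is a simplified illustration, not PySpark's implementation: the helper names `infer`, `merge`, and `render` are hypothetical, and falling back to `string` on a type conflict is an assumption made only for this sketch.

```python
from functools import reduce


def merge(a, b):
    """Merge two inferred types (assumed rule: conflicts fall back to string)."""
    if a == b:
        return a
    if a == "void":
        return b
    if b == "void":
        return a
    if isinstance(a, tuple) and isinstance(b, tuple):
        # Both are arrays: merge their element types recursively.
        return ("array", merge(a[1], b[1]))
    return "string"


def infer(value, first_element_only):
    """Infer a simple type; arrays are represented as ("array", element_type)."""
    if isinstance(value, list):
        if not value:
            return ("array", "void")
        if first_element_only:
            # Legacy-style behaviour: only the first element decides the type,
            # and the same rule is applied at every nesting level.
            element = infer(value[0], first_element_only)
        else:
            # Otherwise every element contributes to the merged type.
            element = reduce(merge, (infer(v, first_element_only) for v in value))
        return ("array", element)
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"unsupported value: {value!r}")


def render(t):
    """Render the nested tuple form as a type string."""
    return f"array<{render(t[1])}>" if isinstance(t, tuple) else t


nested = [[1, "a"]]
print(render(infer(nested, first_element_only=True)))   # array<array<bigint>>
print(render(infer(nested, first_element_only=False)))  # array<array<string>>
```

The point of the sketch is that the first-element rule has to be carried through the recursive call; if the inner array were inferred by merging instead, the nested case would ignore the legacy setting.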






[jira] [Resolved] (SPARK-48248) Fix nested array to respect legacy conf of inferArrayTypeFromFirstElement

2024-05-13 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48248.
--
Fix Version/s: 3.4.4
   3.5.2
   4.0.0
   Resolution: Fixed

Issue resolved by pull request 46548
[https://github.com/apache/spark/pull/46548]

> Fix nested array to respect legacy conf of inferArrayTypeFromFirstElement
> -
>
> Key: SPARK-48248
> URL: https://issues.apache.org/jira/browse/SPARK-48248
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.4, 3.5.2, 4.0.0
>
>
> {code}
> >>> spark.conf.set("spark.sql.pyspark.legacy.inferArrayTypeFromFirstElement.enabled", True)
> >>> spark.createDataFrame(1, "a")
> DataFrame[_1: array>]
> {code}
> This should be inferred as an array of integers.






[jira] [Resolved] (SPARK-48250) Enable array inference tests at test_parity_types.py

2024-05-13 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48250.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46550
[https://github.com/apache/spark/pull/46550]

> Enable array inference tests at test_parity_types.py
> 
>
> Key: SPARK-48250
> URL: https://issues.apache.org/jira/browse/SPARK-48250
> Project: Spark
>  Issue Type: Test
>  Components: Connect, PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Some tests in test_types.py use RDDs unnecessarily. We can remove that usage 
> to enable those tests with Spark Connect.






[jira] [Updated] (SPARK-48252) Update CommonExpressionRef when necessary

2024-05-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48252:
---
Labels: pull-request-available  (was: )

> Update CommonExpressionRef when necessary
> -
>
> Key: SPARK-48252
> URL: https://issues.apache.org/jira/browse/SPARK-48252
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Wenchen Fan
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-48253) Support default mode for Pandas API on Spark

2024-05-13 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-48253:
---

 Summary: Support default mode for Pandas API on Spark
 Key: SPARK-48253
 URL: https://issues.apache.org/jira/browse/SPARK-48253
 Project: Spark
  Issue Type: Bug
  Components: Pandas API on Spark
Affects Versions: 4.0.0
Reporter: Haejoon Lee


To reduce the communication cost between the Python process and the JVM, we 
suggest supporting a default mode.






[jira] [Created] (SPARK-48252) Update CommonExpressionRef when necessary

2024-05-13 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-48252:
---

 Summary: Update CommonExpressionRef when necessary
 Key: SPARK-48252
 URL: https://issues.apache.org/jira/browse/SPARK-48252
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 4.0.0
Reporter: Wenchen Fan









[jira] [Updated] (SPARK-48251) Disable `maven local cache` on GA's step `MIMA test`

2024-05-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48251:
---
Labels: pull-request-available  (was: )

> Disable `maven local cache` on GA's step `MIMA test`
> 
>
> Key: SPARK-48251
> URL: https://issues.apache.org/jira/browse/SPARK-48251
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-48251) Disable `maven local cache` on GA's step `MIMA test`

2024-05-13 Thread BingKun Pan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BingKun Pan updated SPARK-48251:

Summary: Disable `maven local cache` on GA's step `MIMA test`  (was: 
Disable `maven local cache` on step `MIMA test` of the GA's job `lint`)

> Disable `maven local cache` on GA's step `MIMA test`
> 
>
> Key: SPARK-48251
> URL: https://issues.apache.org/jira/browse/SPARK-48251
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Priority: Minor
>







[jira] [Created] (SPARK-48251) Disable `maven local cache` on step `MIMA test` of the GA's job `lint`

2024-05-13 Thread BingKun Pan (Jira)
BingKun Pan created SPARK-48251:
---

 Summary: Disable `maven local cache` on step `MIMA test` of the GA's job `lint`
 Key: SPARK-48251
 URL: https://issues.apache.org/jira/browse/SPARK-48251
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 4.0.0
Reporter: BingKun Pan





