[jira] [Created] (SPARK-47947) Add AssertDataFrameEquality util function for scala

2024-04-22 Thread Anh Tuan Pham (Jira)
Anh Tuan Pham created SPARK-47947:
-

 Summary: Add AssertDataFrameEquality util function for scala
 Key: SPARK-47947
 URL: https://issues.apache.org/jira/browse/SPARK-47947
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 3.5.2
Reporter: Anh Tuan Pham
 Fix For: 3.5.2






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47943) Add Operator CI Task for Java Build and Test

2024-04-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-47943:
-

Assignee: Zhou JIANG

> Add Operator CI Task for Java Build and Test
> 
>
> Key: SPARK-47943
> URL: https://issues.apache.org/jira/browse/SPARK-47943
> Project: Spark
>  Issue Type: Sub-task
>  Components: k8s
>Affects Versions: kubernetes-operator-0.1.0
>Reporter: Zhou JIANG
>Assignee: Zhou JIANG
>Priority: Major
>  Labels: pull-request-available
>
> We need to add a CI task to build and test Java code for upcoming operator 
> pull requests.






[jira] [Resolved] (SPARK-47943) Add Operator CI Task for Java Build and Test

2024-04-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47943.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 7
[https://github.com/apache/spark-kubernetes-operator/pull/7]

> Add Operator CI Task for Java Build and Test
> 
>
> Key: SPARK-47943
> URL: https://issues.apache.org/jira/browse/SPARK-47943
> Project: Spark
>  Issue Type: Sub-task
>  Components: k8s
>Affects Versions: kubernetes-operator-0.1.0
>Reporter: Zhou JIANG
>Assignee: Zhou JIANG
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> We need to add a CI task to build and test Java code for upcoming operator 
> pull requests.






[jira] [Resolved] (SPARK-47929) Setup Static Analysis for Operator

2024-04-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47929.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 6
[https://github.com/apache/spark-kubernetes-operator/pull/6]

> Setup Static Analysis for Operator
> --
>
> Key: SPARK-47929
> URL: https://issues.apache.org/jira/browse/SPARK-47929
> Project: Spark
>  Issue Type: Sub-task
>  Components: k8s
>Affects Versions: kubernetes-operator-0.1.0
>Reporter: Zhou JIANG
>Assignee: Zhou JIANG
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Add common static analysis tasks, including Checkstyle, SpotBugs, and 
> JaCoCo. Also include Spotless for automatic style fixes.






[jira] [Resolved] (SPARK-47938) MsSQLServer: Cannot find data type BYTE error

2024-04-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47938.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46164
[https://github.com/apache/spark/pull/46164]

> MsSQLServer: Cannot find data type BYTE error
> -
>
> Key: SPARK-47938
> URL: https://issues.apache.org/jira/browse/SPARK-47938
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-47600) MLLib: Migrate logInfo with variables to structured logging framework

2024-04-22 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-47600.

Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46151
[https://github.com/apache/spark/pull/46151]

> MLLib: Migrate logInfo with variables to structured logging framework
> -
>
> Key: SPARK-47600
> URL: https://issues.apache.org/jira/browse/SPARK-47600
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Gengliang Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-47933) Parent Column class for Spark Connect and Spark Classic

2024-04-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-47933:


Assignee: Hyukjin Kwon

> Parent Column class for Spark Connect and Spark Classic
> ---
>
> Key: SPARK-47933
> URL: https://issues.apache.org/jira/browse/SPARK-47933
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-47933) Parent Column class for Spark Connect and Spark Classic

2024-04-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47933.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46155
[https://github.com/apache/spark/pull/46155]

> Parent Column class for Spark Connect and Spark Classic
> ---
>
> Key: SPARK-47933
> URL: https://issues.apache.org/jira/browse/SPARK-47933
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-47946) Nested field's nullable value could be invalid after extracted using GetStructField

2024-04-22 Thread Junyoung Cho (Jira)
Junyoung Cho created SPARK-47946:


 Summary: Nested field's nullable value could be invalid after 
extracted using GetStructField
 Key: SPARK-47946
 URL: https://issues.apache.org/jira/browse/SPARK-47946
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, SQL
Affects Versions: 3.4.2
Reporter: Junyoung Cho


I got an error when appending to a table using DataFrameWriterV2.

The error occurred in TableOutputResolver.checkNullability. It occurs when the 
data types of the two schemas are the same, but the order of the fields is 
different.

I found that GetStructField.nullable returns an unexpected result.
{code:java}
override def nullable: Boolean = child.nullable || 
childSchema(ordinal).nullable {code}
Even if the nested field does not have the nullability attribute, it returns 
true when the parent struct has it.
||Parent nullability||Child nullability||Result||
|true|true|true|
|true|false|true|
|false|true|true|
|false|false|false|

I think the logic should be changed to an AND operation, because both the 
parent and the child should be nullable for the extracted field to be 
considered nullable.

I would like to confirm whether the current logic is reasonable, or whether my 
suggestion could cause other side effects.
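For illustration, the two candidate rules can be sketched side by side. This is a minimal standalone sketch with hypothetical names, not Spark's actual GetStructField code:

```java
// Minimal illustration of the two nullability rules discussed above.
// Class and method names are hypothetical, not Spark internals.
public class NullabilityRule {
    // Current behavior: the extracted field is nullable if either the
    // parent struct or the child field is nullable (logical OR).
    static boolean currentRule(boolean parentNullable, boolean childNullable) {
        return parentNullable || childNullable;
    }

    // Behavior proposed in this ticket: nullable only when both are (AND).
    static boolean proposedRule(boolean parentNullable, boolean childNullable) {
        return parentNullable && childNullable;
    }

    public static void main(String[] args) {
        boolean[][] cases = {{true, true}, {true, false}, {false, true}, {false, false}};
        System.out.println("parent\tchild\tOR\tAND");
        for (boolean[] c : cases) {
            System.out.println(c[0] + "\t" + c[1] + "\t"
                + currentRule(c[0], c[1]) + "\t" + proposedRule(c[0], c[1]));
        }
    }
}
```

One counterpoint to weigh: under the OR rule, a nullable parent makes every extracted child nullable because extracting a field from a null struct yields null, which may be exactly what the current code intends.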






[jira] [Updated] (SPARK-47899) StageFailed event should attach the exception chain

2024-04-22 Thread Arjun Sahoo (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arjun Sahoo updated SPARK-47899:

Affects Version/s: 3.5.1

> StageFailed event should attach the exception chain
> ---
>
> Key: SPARK-47899
> URL: https://issues.apache.org/jira/browse/SPARK-47899
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler, Spark Core
>Affects Versions: 3.4.0, 3.5.1
>Reporter: Arjun Sahoo
>Assignee: BingKun Pan
>Priority: Minor
>
> As part of SPARK-39195, the task is marked as failed but the exception chain 
> is not sent, so the cause ultimately becomes `null` in the SparkException. 
> This makes it inconvenient to find the root cause from the detail message.
> {code}
>   /**
>* Called by the OutputCommitCoordinator to cancel stage due to data 
> duplication may happen.
>*/
>   private[scheduler] def stageFailed(stageId: Int, reason: String): Unit = {
> eventProcessLoop.post(StageFailed(stageId, reason, None))
>   }
> {code}
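The essence of the improvement is to pass the originating Throwable through instead of dropping it, so that getCause() on the surfaced exception is populated. A plain-Java sketch of the difference, using generic exception types rather than Spark's scheduler API:

```java
// Illustrates why dropping the cause loses the root error, and why
// passing it along preserves the chain. Names are generic Java,
// not Spark's DAGScheduler types.
public class CauseChain {
    static RuntimeException withoutCause(String reason) {
        return new RuntimeException(reason);        // cause is null
    }

    static RuntimeException withCause(String reason, Throwable t) {
        return new RuntimeException(reason, t);     // chain preserved
    }

    public static void main(String[] args) {
        Throwable root = new IllegalStateException("disk full");
        System.out.println(withoutCause("stage failed").getCause()); // prints null
        System.out.println(withCause("stage failed", root).getCause().getMessage());
    }
}
```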






[jira] [Updated] (SPARK-47943) Add Operator CI Task for Java Build and Test

2024-04-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47943:
---
Labels: pull-request-available  (was: )

> Add Operator CI Task for Java Build and Test
> 
>
> Key: SPARK-47943
> URL: https://issues.apache.org/jira/browse/SPARK-47943
> Project: Spark
>  Issue Type: Sub-task
>  Components: k8s
>Affects Versions: kubernetes-operator-0.1.0
>Reporter: Zhou JIANG
>Priority: Major
>  Labels: pull-request-available
>
> We need to add a CI task to build and test Java code for upcoming operator 
> pull requests.






[jira] [Updated] (SPARK-47945) MsSQLServer: Document Mapping Spark SQL Data Types from Microsoft SQL Server and add tests

2024-04-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47945:
---
Labels: pull-request-available  (was: )

> MsSQLServer: Document Mapping Spark SQL Data Types from Microsoft SQL Server 
> and add tests
> --
>
> Key: SPARK-47945
> URL: https://issues.apache.org/jira/browse/SPARK-47945
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-47945) MsSQLServer: Document Mapping Spark SQL Data Types from Microsoft SQL Server and add tests

2024-04-22 Thread Kent Yao (Jira)
Kent Yao created SPARK-47945:


 Summary: MsSQLServer: Document Mapping Spark SQL Data Types from 
Microsoft SQL Server and add tests
 Key: SPARK-47945
 URL: https://issues.apache.org/jira/browse/SPARK-47945
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 4.0.0
Reporter: Kent Yao









[jira] [Resolved] (SPARK-47937) Fix docstring of `hll_sketch_agg`

2024-04-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47937.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46163
[https://github.com/apache/spark/pull/46163]

> Fix docstring of `hll_sketch_agg`
> -
>
> Key: SPARK-47937
> URL: https://issues.apache.org/jira/browse/SPARK-47937
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-47937) Fix docstring of `hll_sketch_agg`

2024-04-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-47937:
-

Assignee: Ruifeng Zheng

> Fix docstring of `hll_sketch_agg`
> -
>
> Key: SPARK-47937
> URL: https://issues.apache.org/jira/browse/SPARK-47937
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Commented] (SPARK-47903) Add remaining scalar types to the Python variant library

2024-04-22 Thread Harsh Motwani (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839897#comment-17839897
 ] 

Harsh Motwani commented on SPARK-47903:
---

Follow-up: some changes from another branch were accidentally pushed during 
the late stages of this PR. I will open another PR to resolve this.

> Add remaining scalar types to the Python variant library
> 
>
> Key: SPARK-47903
> URL: https://issues.apache.org/jira/browse/SPARK-47903
> Project: Spark
>  Issue Type: Sub-task
>  Components: PS
>Affects Versions: 4.0.0
>Reporter: Harsh Motwani
>Assignee: Harsh Motwani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Added support for reading the remaining scalar data types (binary, timestamp, 
> timestamp_ntz, date, float) to the Python Variant library.






[jira] [Created] (SPARK-47943) Add Operator CI Task for Java Build and Test

2024-04-22 Thread Zhou JIANG (Jira)
Zhou JIANG created SPARK-47943:
--

 Summary: Add Operator CI Task for Java Build and Test
 Key: SPARK-47943
 URL: https://issues.apache.org/jira/browse/SPARK-47943
 Project: Spark
  Issue Type: Sub-task
  Components: k8s
Affects Versions: kubernetes-operator-0.1.0
Reporter: Zhou JIANG


We need to add a CI task to build and test Java code for upcoming operator 
pull requests.






[jira] [Resolved] (SPARK-47903) Add remaining scalar types to the Python variant library

2024-04-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47903.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46122
[https://github.com/apache/spark/pull/46122]

> Add remaining scalar types to the Python variant library
> 
>
> Key: SPARK-47903
> URL: https://issues.apache.org/jira/browse/SPARK-47903
> Project: Spark
>  Issue Type: Sub-task
>  Components: PS
>Affects Versions: 4.0.0
>Reporter: Harsh Motwani
>Assignee: Harsh Motwani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Added support for reading the remaining scalar data types (binary, timestamp, 
> timestamp_ntz, date, float) to the Python Variant library.






[jira] [Assigned] (SPARK-47903) Add remaining scalar types to the Python variant library

2024-04-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-47903:


Assignee: Harsh Motwani

> Add remaining scalar types to the Python variant library
> 
>
> Key: SPARK-47903
> URL: https://issues.apache.org/jira/browse/SPARK-47903
> Project: Spark
>  Issue Type: Sub-task
>  Components: PS
>Affects Versions: 4.0.0
>Reporter: Harsh Motwani
>Assignee: Harsh Motwani
>Priority: Major
>  Labels: pull-request-available
>
> Added support for reading the remaining scalar data types (binary, timestamp, 
> timestamp_ntz, date, float) to the Python Variant library.






[jira] [Resolved] (SPARK-47904) Preserve case in Avro schema when using enableStableIdentifiersForUnionType

2024-04-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47904.
---
Fix Version/s: 3.5.2
   Resolution: Fixed

Issue resolved by pull request 46169
[https://github.com/apache/spark/pull/46169]

> Preserve case in Avro schema when using enableStableIdentifiersForUnionType
> ---
>
> Key: SPARK-47904
> URL: https://issues.apache.org/jira/browse/SPARK-47904
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0, 3.5.2
>Reporter: Ivan Sadikov
>Assignee: Ivan Sadikov
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.2
>
>
> When enableStableIdentifiersForUnionType is enabled, all of the types are 
> lowercased, which creates a problem when field types are case-sensitive: 
> {code:java}
> Schema.createEnum("myENUM", "", null, List[String]("E1", "e2").asJava),
> Schema.createRecord("myRecord2", "", null, false, List[Schema.Field](new 
> Schema.Field("F", Schema.create(Type.FLOAT))).asJava){code}
>  would become
> {code:java}
> struct>  {code}
> but instead should be 
> {code:java}
> struct>  {code}
>  
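The root hazard is that lowercasing branch names is lossy for case-sensitive identifiers: myENUM becomes myenum, and two branches differing only in case would even collapse to the same member name. A hypothetical sketch of the naming step (the member_ prefix mirrors the stable-identifier convention; the helper is illustrative, not the Avro connector's code):

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StableNames {
    // Derive a member field name for each union branch, optionally
    // lowercasing it as the pre-fix behavior did. Hypothetical helper.
    static Set<String> memberNames(List<String> branches, boolean lowercase) {
        Set<String> names = new LinkedHashSet<>();
        for (String b : branches) {
            String n = lowercase ? b.toLowerCase(Locale.ROOT) : b;
            names.add("member_" + n);
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> branches = List.of("myENUM", "myenum");
        // Lowercasing collapses the two distinct branches into one name.
        System.out.println(memberNames(branches, true).size());  // prints 1
        System.out.println(memberNames(branches, false).size()); // prints 2
    }
}
```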






[jira] [Updated] (SPARK-47904) Preserve case in Avro schema when using enableStableIdentifiersForUnionType

2024-04-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-47904:
--
Fix Version/s: 4.0.0

> Preserve case in Avro schema when using enableStableIdentifiersForUnionType
> ---
>
> Key: SPARK-47904
> URL: https://issues.apache.org/jira/browse/SPARK-47904
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0, 3.5.2
>Reporter: Ivan Sadikov
>Assignee: Ivan Sadikov
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0, 3.5.2
>
>
> When enableStableIdentifiersForUnionType is enabled, all of the types are 
> lowercased, which creates a problem when field types are case-sensitive: 
> {code:java}
> Schema.createEnum("myENUM", "", null, List[String]("E1", "e2").asJava),
> Schema.createRecord("myRecord2", "", null, false, List[Schema.Field](new 
> Schema.Field("F", Schema.create(Type.FLOAT))).asJava){code}
>  would become
> {code:java}
> struct>  {code}
> but instead should be 
> {code:java}
> struct>  {code}
>  






[jira] [Assigned] (SPARK-47904) Preserve case in Avro schema when using enableStableIdentifiersForUnionType

2024-04-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-47904:
-

Assignee: Ivan Sadikov

> Preserve case in Avro schema when using enableStableIdentifiersForUnionType
> ---
>
> Key: SPARK-47904
> URL: https://issues.apache.org/jira/browse/SPARK-47904
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0, 3.5.2
>Reporter: Ivan Sadikov
>Assignee: Ivan Sadikov
>Priority: Major
>  Labels: pull-request-available
>
> When enableStableIdentifiersForUnionType is enabled, all of the types are 
> lowercased, which creates a problem when field types are case-sensitive: 
> {code:java}
> Schema.createEnum("myENUM", "", null, List[String]("E1", "e2").asJava),
> Schema.createRecord("myRecord2", "", null, false, List[Schema.Field](new 
> Schema.Field("F", Schema.create(Type.FLOAT))).asJava){code}
>  would become
> {code:java}
> struct>  {code}
> but instead should be 
> {code:java}
> struct>  {code}
>  






[jira] [Resolved] (SPARK-47942) Drop K8s v1.26 Support

2024-04-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47942.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46168
[https://github.com/apache/spark/pull/46168]

> Drop K8s v1.26 Support
> --
>
> Key: SPARK-47942
> URL: https://issues.apache.org/jira/browse/SPARK-47942
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-47942) Drop K8s v1.26 Support

2024-04-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-47942:
-

Assignee: Dongjoon Hyun

> Drop K8s v1.26 Support
> --
>
> Key: SPARK-47942
> URL: https://issues.apache.org/jira/browse/SPARK-47942
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Assigned] (SPARK-47907) Put removal of '!' as a synonym for 'NOT' on a keyword level under a config

2024-04-22 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang reassigned SPARK-47907:
--

Assignee: Serge Rielau

> Put removal of '!' as a synonym for 'NOT' on a keyword level under a config
> ---
>
> Key: SPARK-47907
> URL: https://issues.apache.org/jira/browse/SPARK-47907
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Serge Rielau
>Assignee: Serge Rielau
>Priority: Major
>  Labels: pull-request-available
>
> Recently we dissolved the lexer equivalence between '!' and 'NOT'.
> ! is a prefix operator and a synonym for NOT only in that case.
> But NOT is used in many more cases in the grammar.
> Given that there are a handful of known scenarios where users have exploited 
> the undocumented loophole, it's best to add a config.
> Usage found so far:
> `c1 ! IN(1, 2)`
> `c1 ! BETWEEN 1 AND 2`
> `c1 ! LIKE 'a%'`
> But there are worse cases:
> `c1 IS ! NULL`
> `CREATE TABLE T(c1 INT ! NULL)`
> or even
> `CREATE TABLE IF ! EXISTS T(c1 INT)`






[jira] [Resolved] (SPARK-47907) Put removal of '!' as a synonym for 'NOT' on a keyword level under a config

2024-04-22 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-47907.

Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46138
[https://github.com/apache/spark/pull/46138]

> Put removal of '!' as a synonym for 'NOT' on a keyword level under a config
> ---
>
> Key: SPARK-47907
> URL: https://issues.apache.org/jira/browse/SPARK-47907
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Serge Rielau
>Assignee: Serge Rielau
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Recently we dissolved the lexer equivalence between '!' and 'NOT'.
> ! is a prefix operator and a synonym for NOT only in that case.
> But NOT is used in many more cases in the grammar.
> Given that there are a handful of known scenarios where users have exploited 
> the undocumented loophole, it's best to add a config.
> Usage found so far:
> `c1 ! IN(1, 2)`
> `c1 ! BETWEEN 1 AND 2`
> `c1 ! LIKE 'a%'`
> But there are worse cases:
> `c1 IS ! NULL`
> `CREATE TABLE T(c1 INT ! NULL)`
> or even
> `CREATE TABLE IF ! EXISTS T(c1 INT)`






[jira] [Updated] (SPARK-47942) Drop K8s v1.26 Support

2024-04-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47942:
---
Labels: pull-request-available  (was: )

> Drop K8s v1.26 Support
> --
>
> Key: SPARK-47942
> URL: https://issues.apache.org/jira/browse/SPARK-47942
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-47942) Drop K8s v1.26 Support

2024-04-22 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-47942:
-

 Summary: Drop K8s v1.26 Support
 Key: SPARK-47942
 URL: https://issues.apache.org/jira/browse/SPARK-47942
 Project: Spark
  Issue Type: Sub-task
  Components: Kubernetes
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun









[jira] [Updated] (SPARK-47940) Upgrade `guava` dependency to `33.1.0-jre` in Docker IT

2024-04-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-47940:
--
Reporter: Cheng Pan  (was: Dongjoon Hyun)

> Upgrade `guava` dependency to `33.1.0-jre` in Docker IT
> ---
>
> Key: SPARK-47940
> URL: https://issues.apache.org/jira/browse/SPARK-47940
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Tests
>Affects Versions: 4.0.0
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-47940) Upgrade `guava` dependency to `33.1.0-jre` in Docker IT

2024-04-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47940.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46167
[https://github.com/apache/spark/pull/46167]

> Upgrade `guava` dependency to `33.1.0-jre` in Docker IT
> ---
>
> Key: SPARK-47940
> URL: https://issues.apache.org/jira/browse/SPARK-47940
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Cheng Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-47941) Propagate ForeachBatch worker initialization errors to users for PySpark

2024-04-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47941:
---
Labels: pull-request-available  (was: )

> Propagate ForeachBatch worker initialization errors to users for PySpark
> 
>
> Key: SPARK-47941
> URL: https://issues.apache.org/jira/browse/SPARK-47941
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark, Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Eric Marnadi
>Priority: Major
>  Labels: pull-request-available
>
> Ensure that errors and exceptions thrown during foreachBatch worker 
> initialization are propagated to the user instead of only being written to 
> stderr.






[jira] [Updated] (SPARK-47941) Propagate ForeachBatch worker initialization errors to users for PySpark

2024-04-22 Thread Eric Marnadi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Marnadi updated SPARK-47941:
-
Summary: Propagate ForeachBatch worker initialization errors to users for 
PySpark  (was: Propagate ForeachBatch initialization errors to users)

> Propagate ForeachBatch worker initialization errors to users for PySpark
> 
>
> Key: SPARK-47941
> URL: https://issues.apache.org/jira/browse/SPARK-47941
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Eric Marnadi
>Priority: Major
>
> Ensure that errors and exceptions thrown during foreachBatch worker 
> initialization are propagated to the user, instead of just stderr. 






[jira] [Updated] (SPARK-47940) Upgrade `guava` dependency to `33.1.0-jre` in Docker IT

2024-04-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47940:
---
Labels: pull-request-available  (was: )

> Upgrade `guava` dependency to `33.1.0-jre` in Docker IT
> ---
>
> Key: SPARK-47940
> URL: https://issues.apache.org/jira/browse/SPARK-47940
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-47940) Upgrade `guava` dependency to `33.1.0-jre` in Docker IT

2024-04-22 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-47940:
-

 Summary: Upgrade `guava` dependency to `33.1.0-jre` in Docker IT
 Key: SPARK-47940
 URL: https://issues.apache.org/jira/browse/SPARK-47940
 Project: Spark
  Issue Type: Sub-task
  Components: Build, Tests
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun









[jira] [Assigned] (SPARK-47935) Pin pandas==2.0.3 for pypy3.8

2024-04-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-47935:
-

Assignee: Ruifeng Zheng

> Pin pandas==2.0.3 for pypy3.8
> -
>
> Key: SPARK-47935
> URL: https://issues.apache.org/jira/browse/SPARK-47935
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-47935) Pin pandas==2.0.3 for pypy3.8

2024-04-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47935.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46159
[https://github.com/apache/spark/pull/46159]

> Pin pandas==2.0.3 for pypy3.8
> -
>
> Key: SPARK-47935
> URL: https://issues.apache.org/jira/browse/SPARK-47935
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Commented] (SPARK-47010) Kubernetes: support csi driver for volume type

2024-04-22 Thread Oleg Frenkel (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839707#comment-17839707
 ] 

Oleg Frenkel commented on SPARK-47010:
--

Posted question on Stackoverflow: 
https://stackoverflow.com/questions/78366961/apache-spark-supporting-csi-driver-for-volume-type

> Kubernetes: support csi driver for volume type
> --
>
> Key: SPARK-47010
> URL: https://issues.apache.org/jira/browse/SPARK-47010
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 3.5.0
>Reporter: Oleg Frenkel
>Priority: Major
>
> Today Spark supports the following types of Kubernetes 
> [volumes|https://kubernetes.io/docs/concepts/storage/volumes/]: hostPath, 
> emptyDir, nfs and persistentVolumeClaim.
> In our case, Kubernetes cluster is multi-tenant and we cannot make 
> cluster-wide changes when deploying our application to the Kubernetes 
> cluster. Our application requires static shared file system. So, we cannot 
> use hostPath (don't have control of hosting VMs) and persistentVolumeClaim 
> (requires cluster-wide change when deploying PV). Our security department 
> does not allow nfs. 
> What would help in our case, is the use of csi driver (taken from here: 
> https://github.com/kubernetes-sigs/azurefile-csi-driver/blob/master/deploy/example/e2e_usage.md#option3-inline-volume):
> {code:java}
> kind: Pod
> apiVersion: v1
> metadata:
>   name: nginx-azurefile-inline-volume
> spec:
>   nodeSelector:
>     "kubernetes.io/os": linux
>   containers:
>     - image: mcr.microsoft.com/oss/nginx/nginx:1.19.5
>       name: nginx-azurefile
>       command:
>         - "/bin/bash"
>         - "-c"
>         - set -euo pipefail; while true; do echo $(date) >> /mnt/azurefile/outfile; sleep 1; done
>       volumeMounts:
>         - name: persistent-storage
>           mountPath: "/mnt/azurefile"
>           readOnly: false
>   volumes:
>     - name: persistent-storage
>       csi:
>         driver: file.csi.azure.com
>         volumeAttributes:
>           shareName: EXISTING_SHARE_NAME  # required
>           secretName: azure-secret  # required
>           mountOptions: "dir_mode=0777,file_mode=0777,cache=strict,actimeo=30,nosharesock"  # optional
> {code}
>  
>  






[jira] [Commented] (SPARK-47869) Upgrade built in hive to Hive-4.0

2024-04-22 Thread Cheng Pan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839669#comment-17839669
 ] 

Cheng Pan commented on SPARK-47869:
---

cross link SPARK-44114

> Upgrade built in hive to Hive-4.0
> -
>
> Key: SPARK-47869
> URL: https://issues.apache.org/jira/browse/SPARK-47869
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Affects Versions: 3.5.1
>Reporter: Simhadri Govindappa
>Priority: Major
>
> Hive 4.0 has been released. It brings in a lot of new features, bug fixes and 
> performance improvements. 
> We would like to update the version of Hive used in Spark to Hive 4.0.
> [https://lists.apache.org/thread/2jqpvsx8n801zb5pmlhb8f4zloq27p82] 






[jira] [Created] (SPARK-47939) Parameterized queries fail for DESCRIBE & EXPLAIN w/ UNBOUND_SQL_PARAMETER error

2024-04-22 Thread Vladimir Golubev (Jira)
Vladimir Golubev created SPARK-47939:


 Summary: Parameterized queries fail for DESCRIBE & EXPLAIN w/ 
UNBOUND_SQL_PARAMETER error
 Key: SPARK-47939
 URL: https://issues.apache.org/jira/browse/SPARK-47939
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 4.0.0
Reporter: Vladimir Golubev


*Succeeds:* scala> spark.sql("select ?", Array(1)).show();

*Fails:* spark.sql("describe select ?", Array(1)).show();

*Fails:* spark.sql("explain select ?", Array(1)).show();

Failures are of the form:

org.apache.spark.sql.catalyst.ExtendedAnalysisException: 
[UNBOUND_SQL_PARAMETER] Found the unbound parameter: _16. Please, fix `args` 
and provide a mapping of the parameter to either a SQL literal or collection 
constructor functions such as `map()`, `array()`, `struct()`. SQLSTATE: 42P02; 
line 1 pos 16; 'Project [unresolvedalias(posparameter(16))] +- OneRowRelation
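As an illustration only (this toy binder is not Spark's parser), positional-parameter binding replaces each `?` left to right with the matching literal from `args`; a `?` left over with no value supplied is precisely the unbound-parameter condition the error reports, here flagged with a character offset rather than Spark's exact position encoding.

```python
def bind_positional(sql: str, args: list) -> str:
    # Replace each '?' placeholder, left to right, with the next literal.
    out, it = [], iter(args)
    for ch in sql:
        if ch == "?":
            try:
                out.append(repr(next(it)))
            except StopIteration:
                # The UNBOUND_SQL_PARAMETER situation: a placeholder with
                # no value supplied for it.
                raise ValueError(
                    "unbound parameter at position " + str(len("".join(out)))
                )
        else:
            out.append(ch)
    return "".join(out)
```

The bug report suggests the binding step is skipped when the statement is wrapped in DESCRIBE or EXPLAIN, so the placeholder survives to analysis and trips this condition.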






[jira] [Assigned] (SPARK-47411) StringInstr, FindInSet (all collations)

2024-04-22 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47411:
---

Assignee: Milan Dankovic

> StringInstr, FindInSet (all collations)
> ---
>
> Key: SPARK-47411
> URL: https://issues.apache.org/jira/browse/SPARK-47411
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Milan Dankovic
>Priority: Major
>  Labels: pull-request-available
>
> Enable collation support for the *StringInstr* and *FindInSet* built-in 
> string functions in Spark. First, confirm the expected behaviour of 
> these functions when given collated strings, and then move on to 
> implementation and testing. One way to go about this is to consider using 
> {_}StringSearch{_}, an efficient ICU service for string matching. Implement 
> the corresponding unit tests (CollationStringExpressionsSuite) and E2E tests 
> (CollationSuite) to reflect how this function should be used with collation 
> in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment 
> with the existing functions to learn more about how they work. In addition, 
> look into the possible use-cases and implementation of similar functions 
> within other open-source DBMS, such as 
> [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the *StringInstr* and 
> *FindInSet* functions so that they support all collation types currently 
> supported in Spark. To understand what changes were introduced in order to 
> enable full collation support for other existing functions in Spark, take a 
> look at the Spark PRs and Jira tickets for completed tasks in this parent 
> (for example: Contains, StartsWith, EndsWith).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and 
> [Collator|http://example.com/] class, as well as _StringSearch_ using the 
> [ICU user 
> guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html]
>  and [ICU 
> docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html].
>  Also, refer to the Unicode Technical Standard for string 
> [searching|https://www.unicode.org/reports/tr10/#Searching] and 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].
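As a rough illustration of the intended semantics (using plain `str.lower()` rather than ICU's `StringSearch`, so this only approximates a real case-insensitive collation), `instr` and `find_in_set` under such a collation might behave like:

```python
def instr_lcase(haystack: str, needle: str) -> int:
    # 1-based position of needle in haystack, ignoring case; 0 if absent.
    # A real collation would match via ICU StringSearch, not lowercasing.
    return haystack.lower().find(needle.lower()) + 1

def find_in_set_lcase(item: str, str_list: str) -> int:
    # 1-based index of item in a comma-separated list, ignoring case;
    # 0 if the item does not appear.
    for i, elem in enumerate(str_list.split(","), start=1):
        if elem.lower() == item.lower():
            return i
    return 0
```

Simple lowercasing breaks down for locale-sensitive mappings (Turkish dotted/dotless I, for instance), which is why the ticket points at ICU's StringSearch for the real implementation.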






[jira] [Resolved] (SPARK-47411) StringInstr, FindInSet (all collations)

2024-04-22 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47411.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45643
[https://github.com/apache/spark/pull/45643]

> StringInstr, FindInSet (all collations)
> ---
>
> Key: SPARK-47411
> URL: https://issues.apache.org/jira/browse/SPARK-47411
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Milan Dankovic
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Enable collation support for the *StringInstr* and *FindInSet* built-in 
> string functions in Spark. First, confirm the expected behaviour of 
> these functions when given collated strings, and then move on to 
> implementation and testing. One way to go about this is to consider using 
> {_}StringSearch{_}, an efficient ICU service for string matching. Implement 
> the corresponding unit tests (CollationStringExpressionsSuite) and E2E tests 
> (CollationSuite) to reflect how this function should be used with collation 
> in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment 
> with the existing functions to learn more about how they work. In addition, 
> look into the possible use-cases and implementation of similar functions 
> within other open-source DBMS, such as 
> [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the *StringInstr* and 
> *FindInSet* functions so that they support all collation types currently 
> supported in Spark. To understand what changes were introduced in order to 
> enable full collation support for other existing functions in Spark, take a 
> look at the Spark PRs and Jira tickets for completed tasks in this parent 
> (for example: Contains, StartsWith, EndsWith).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and 
> [Collator|http://example.com/] class, as well as _StringSearch_ using the 
> [ICU user 
> guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html]
>  and [ICU 
> docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html].
>  Also, refer to the Unicode Technical Standard for string 
> [searching|https://www.unicode.org/reports/tr10/#Searching] and 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].






[jira] [Updated] (SPARK-47938) MsSQLServer: Cannot find data type BYTE error

2024-04-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47938:
---
Labels: pull-request-available  (was: )

> MsSQLServer: Cannot find data type BYTE error
> -
>
> Key: SPARK-47938
> URL: https://issues.apache.org/jira/browse/SPARK-47938
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-47938) MsSQLServer: Cannot find data type BYTE error

2024-04-22 Thread Kent Yao (Jira)
Kent Yao created SPARK-47938:


 Summary: MsSQLServer: Cannot find data type BYTE error
 Key: SPARK-47938
 URL: https://issues.apache.org/jira/browse/SPARK-47938
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 4.0.0
Reporter: Kent Yao









[jira] [Resolved] (SPARK-47928) Speed up test "Add jar support Ivy URI in SQL"

2024-04-22 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie resolved SPARK-47928.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46150
[https://github.com/apache/spark/pull/46150]

> Speed up test "Add jar support Ivy URI in SQL"
> --
>
> Key: SPARK-47928
> URL: https://issues.apache.org/jira/browse/SPARK-47928
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-47928) Speed up test "Add jar support Ivy URI in SQL"

2024-04-22 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie reassigned SPARK-47928:


Assignee: Cheng Pan

> Speed up test "Add jar support Ivy URI in SQL"
> --
>
> Key: SPARK-47928
> URL: https://issues.apache.org/jira/browse/SPARK-47928
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-47351) StringToMap & Mask (all collations)

2024-04-22 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-47351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uroš Bojanić updated SPARK-47351:
-
Summary: StringToMap & Mask (all collations)  (was: StringToMap (all 
collations))

> StringToMap & Mask (all collations)
> ---
>
> Key: SPARK-47351
> URL: https://issues.apache.org/jira/browse/SPARK-47351
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>







[jira] [Updated] (SPARK-47421) TBD

2024-04-22 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-47421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uroš Bojanić updated SPARK-47421:
-
Summary: TBD  (was: Mask (all collations))

> TBD
> ---
>
> Key: SPARK-47421
> URL: https://issues.apache.org/jira/browse/SPARK-47421
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>







[jira] [Updated] (SPARK-47937) Fix docstring of `hll_sketch_agg`

2024-04-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47937:
---
Labels: pull-request-available  (was: )

> Fix docstring of `hll_sketch_agg`
> -
>
> Key: SPARK-47937
> URL: https://issues.apache.org/jira/browse/SPARK-47937
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-47937) Fix docstring of `hll_sketch_agg`

2024-04-22 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-47937:
-

 Summary: Fix docstring of `hll_sketch_agg`
 Key: SPARK-47937
 URL: https://issues.apache.org/jira/browse/SPARK-47937
 Project: Spark
  Issue Type: Improvement
  Components: Documentation, PySpark
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Assigned] (SPARK-47936) Improve `toUpperCase` & `toLowerCase` with `Locale.ROOT` rules

2024-04-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-47936:
--

Assignee: (was: Apache Spark)

> Improve `toUpperCase` & `toLowerCase` with `Locale.ROOT` rules
> --
>
> Key: SPARK-47936
> URL: https://issues.apache.org/jira/browse/SPARK-47936
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>









[jira] [Updated] (SPARK-47936) Improve `toUpperCase` & `toLowerCase` with `Locale.ROOT` rules

2024-04-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47936:
---
Labels: pull-request-available  (was: )

> Improve `toUpperCase` & `toLowerCase` with `Locale.ROOT` rules
> --
>
> Key: SPARK-47936
> URL: https://issues.apache.org/jira/browse/SPARK-47936
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Assigned] (SPARK-47936) Improve `toUpperCase` & `toLowerCase` with `Locale.ROOT` rules

2024-04-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-47936:
--

Assignee: Apache Spark

> Improve `toUpperCase` & `toLowerCase` with `Locale.ROOT` rules
> --
>
> Key: SPARK-47936
> URL: https://issues.apache.org/jira/browse/SPARK-47936
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: Apache Spark
>Priority: Minor
>  Labels: pull-request-available
>










[jira] [Assigned] (SPARK-47873) Write collated strings to hive as regular strings

2024-04-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-47873:
--

Assignee: (was: Apache Spark)

> Write collated strings to hive as regular strings
> -
>
> Key: SPARK-47873
> URL: https://issues.apache.org/jira/browse/SPARK-47873
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Stefan Kandic
>Priority: Major
>  Labels: pull-request-available
>
> As Hive doesn't support collations, we should write collated strings with a 
> regular string type but keep the collation in table metadata to properly read 
> them back.
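A schematic sketch of that round trip (the property key below is made up for illustration; the real mechanism lives in Spark's Hive catalog layer): the column is written to Hive as a plain string, the collation is parked in table properties, and the collated type is restored on read.

```python
def write_schema(columns):
    # columns: list of (name, type) pairs, where a type may be
    # "string collate X". Hive sees only plain "string"; the collation
    # moves into table properties under a hypothetical key.
    hive_cols, props = [], {}
    for name, typ in columns:
        if typ.startswith("string collate "):
            props["spark.sql.column." + name + ".collation"] = typ[len("string collate "):]
            typ = "string"
        hive_cols.append((name, typ))
    return hive_cols, props

def read_schema(hive_cols, props):
    # Reconstruct the collated type from the stored properties.
    out = []
    for name, typ in hive_cols:
        coll = props.get("spark.sql.column." + name + ".collation")
        if typ == "string" and coll:
            typ = "string collate " + coll
        out.append((name, typ))
    return out
```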






[jira] [Assigned] (SPARK-47873) Write collated strings to hive as regular strings

2024-04-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-47873:
--

Assignee: Apache Spark

> Write collated strings to hive as regular strings
> -
>
> Key: SPARK-47873
> URL: https://issues.apache.org/jira/browse/SPARK-47873
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Stefan Kandic
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
>
> As Hive doesn't support collations, we should write collated strings with a 
> regular string type but keep the collation in table metadata to properly read 
> them back.






[jira] [Created] (SPARK-47936) Improve `toUpperCase` & `toLowerCase` with `Locale.ROOT` rules

2024-04-22 Thread BingKun Pan (Jira)
BingKun Pan created SPARK-47936:
---

 Summary: Improve `toUpperCase` & `toLowerCase` with `Locale.ROOT` 
rules
 Key: SPARK-47936
 URL: https://issues.apache.org/jira/browse/SPARK-47936
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 4.0.0
Reporter: BingKun Pan









[jira] [Commented] (SPARK-47351) StringToMap (all collations)

2024-04-22 Thread Jira


[ 
https://issues.apache.org/jira/browse/SPARK-47351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839550#comment-17839550
 ] 

Uroš Bojanić commented on SPARK-47351:
--

working on this

> StringToMap (all collations)
> 
>
> Key: SPARK-47351
> URL: https://issues.apache.org/jira/browse/SPARK-47351
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>







[jira] [Updated] (SPARK-47935) Pin pandas==2.0.3 for pypy3.8

2024-04-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47935:
---
Labels: pull-request-available  (was: )

> Pin pandas==2.0.3 for pypy3.8
> -
>
> Key: SPARK-47935
> URL: https://issues.apache.org/jira/browse/SPARK-47935
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-47935) Pin pandas==2.0.3 for pypy3.8

2024-04-22 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-47935:
-

 Summary: Pin pandas==2.0.3 for pypy3.8
 Key: SPARK-47935
 URL: https://issues.apache.org/jira/browse/SPARK-47935
 Project: Spark
  Issue Type: Improvement
  Components: Project Infra, PySpark
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Updated] (SPARK-47350) SplitPart (binary & lowercase collation only)

2024-04-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47350:
---
Labels: pull-request-available  (was: )

> SplitPart (binary & lowercase collation only)
> -
>
> Key: SPARK-47350
> URL: https://issues.apache.org/jira/browse/SPARK-47350
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-47900) Fix check for implicit collation

2024-04-22 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47900.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46116
[https://github.com/apache/spark/pull/46116]

> Fix check for implicit collation
> 
>
> Key: SPARK-47900
> URL: https://issues.apache.org/jira/browse/SPARK-47900
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Stefan Kandic
>Assignee: Stefan Kandic
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-47930) Upgrade RoaringBitmap to 1.0.6

2024-04-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47930.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46152
[https://github.com/apache/spark/pull/46152]

> Upgrade RoaringBitmap to 1.0.6
> --
>
> Key: SPARK-47930
> URL: https://issues.apache.org/jira/browse/SPARK-47930
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-47934) Inefficient Redirect Handling Due to Missing Trailing Slashes in URL Redirection

2024-04-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47934:
---
Labels: pull-request-available  (was: )

> Inefficient Redirect Handling Due to Missing Trailing Slashes in URL 
> Redirection
> 
>
> Key: SPARK-47934
> URL: https://issues.apache.org/jira/browse/SPARK-47934
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.4, 3.3.2, 3.5.1, 3.4.3
>Reporter: huangzhir
>Priority: Trivial
>  Labels: pull-request-available
> Attachments: image-2024-04-22-15-14-13-468.png
>
>
> *Summary:*
> The current implementation of URL redirection in Spark's history web UI does 
> not consistently add trailing slashes to URLs when constructing redirection 
> targets. This inconsistency leads to additional HTTP redirects by Jetty, 
> which increases the load time and reduces the efficiency of the Spark UI.
> *Problem Description:*
> When constructing redirect URLs, particularly in scenarios where an attempt 
> ID needs to be appended, the system does not ensure that the base URL ends 
> with a slash. This omission results in the generated URL being redirected by 
> Jetty to add a trailing slash, thus causing an unnecessary additional HTTP 
> redirect.
> For example, when the `shouldAppendAttemptId` flag is true, the URL is formed 
> without a trailing slash before the attempt ID is appended, leading to two 
> redirects: one by our logic to add the attempt ID, and another by Jetty to 
> correct the missing slash. 
> !image-2024-04-22-15-14-13-468.png!
> *Proposed Solution:*
> [https://github.com/apache/spark/blob/2d0b56c3eac611e743c41d16ea8e439bc8a504e4/core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala#L118]
> Ensure that all redirect URLs uniformly end with a trailing slash regardless 
> of whether an attempt ID is appended. This can be achieved by modifying the 
> URL construction logic as follows:
> ```scala
> val redirect = if (shouldAppendAttemptId) {
>   req.getRequestURI.stripSuffix("/") + "/" + attemptId.get + "/"
> } else {
>   req.getRequestURI.stripSuffix("/") + "/"
> }
> ```
>  
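For reference, the proposed construction can be sketched as a small standalone function (`RedirectSketch` and `buildRedirect` are illustrative names only, mirroring the snippet above rather than the actual HistoryServer code):

```scala
object RedirectSketch {
  // Mirror of the proposed logic: strip a single trailing slash, then always
  // re-append exactly one, so Jetty never needs a second redirect to add it.
  def buildRedirect(requestUri: String, attemptId: Option[String]): String = {
    val base = requestUri.stripSuffix("/")
    attemptId match {
      case Some(id) => s"$base/$id/"   // attempt ID appended, trailing slash kept
      case None     => s"$base/"       // plain redirect, trailing slash kept
    }
  }
}
```

With this shape, `buildRedirect("/history/app-1", Some("1"))` produces `/history/app-1/1/` in one step, regardless of whether the incoming URI already ended with a slash.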






[jira] [Resolved] (SPARK-47890) Add python and scala dataframe variant expression aliases.

2024-04-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47890.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46123
[https://github.com/apache/spark/pull/46123]

> Add python and scala dataframe variant expression aliases.
> --
>
> Key: SPARK-47890
> URL: https://issues.apache.org/jira/browse/SPARK-47890
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 4.0.0
>Reporter: Chenhao Li
>Assignee: Chenhao Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-47413) Substring, Right, Left (all collations)

2024-04-22 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47413:
---

Assignee: Gideon P

> Substring, Right, Left (all collations)
> ---
>
> Key: SPARK-47413
> URL: https://issues.apache.org/jira/browse/SPARK-47413
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Gideon P
>Priority: Major
>  Labels: pull-request-available
>
> Enable collation support for the *Substring* built-in string function in 
> Spark (including the *Right* and *Left* functions). First confirm the 
> expected behaviour of these functions when given collated strings, then move 
> on to an implementation that can handle strings of all collation types. 
> Implement the corresponding unit tests (CollationStringExpressionsSuite) and 
> E2E tests (CollationSuite) to reflect how these functions should be used with 
> collation in Spark SQL, and feel free to use your chosen Spark SQL editor to 
> experiment with the existing functions to learn more about how they work. In 
> addition, look into the possible use cases and implementations of similar 
> functions in other open-source DBMSs, such as 
> [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the {*}Substring{*}, 
> {*}Right{*}, and *Left* functions so that they support all collation types 
> currently supported in Spark. To understand what changes were introduced in 
> order to enable full collation support for other existing functions in Spark, 
> take a look at the Spark PRs and Jira tickets for completed tasks in this 
> parent (for example: Contains, StartsWith, EndsWith).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and 
> [Collator|http://example.com/] class. Also, refer to the Unicode Technical 
> Standard for 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].
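As a rough illustration of what collation awareness means here (a plain JVM sketch using `java.text.Collator`, not Spark's actual UTF8String implementation): extraction functions like Substring/Right/Left operate on characters and are collation-agnostic, while any comparison on the extracted value must go through the collator.

```scala
import java.text.Collator
import java.util.Locale

object CollationSketch {
  // PRIMARY strength ignores case and accent differences, roughly analogous
  // to a case-insensitive collation in Spark.
  private val ci: Collator = Collator.getInstance(Locale.ROOT)
  ci.setStrength(Collator.PRIMARY)

  // Extraction itself does not depend on collation: it just slices characters.
  def left(s: String, n: Int): String = s.substring(0, math.min(n, s.length))

  // Equality on the extracted value, however, must respect the collation.
  def ciEquals(a: String, b: String): Boolean = ci.compare(a, b) == 0
}
```

Under this sketch, `left("Spark", 2)` yields `"Sp"`, and a case-insensitive comparison treats `"Sp"` and `"sp"` as equal while still distinguishing different base letters.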






[jira] [Resolved] (SPARK-47413) Substring, Right, Left (all collations)

2024-04-22 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47413.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46040
[https://github.com/apache/spark/pull/46040]

> Substring, Right, Left (all collations)
> ---
>
> Key: SPARK-47413
> URL: https://issues.apache.org/jira/browse/SPARK-47413
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Gideon P
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Enable collation support for the *Substring* built-in string function in 
> Spark (including the *Right* and *Left* functions). First confirm the 
> expected behaviour of these functions when given collated strings, then move 
> on to an implementation that can handle strings of all collation types. 
> Implement the corresponding unit tests (CollationStringExpressionsSuite) and 
> E2E tests (CollationSuite) to reflect how these functions should be used with 
> collation in Spark SQL, and feel free to use your chosen Spark SQL editor to 
> experiment with the existing functions to learn more about how they work. In 
> addition, look into the possible use cases and implementations of similar 
> functions in other open-source DBMSs, such as 
> [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the {*}Substring{*}, 
> {*}Right{*}, and *Left* functions so that they support all collation types 
> currently supported in Spark. To understand what changes were introduced in 
> order to enable full collation support for other existing functions in Spark, 
> take a look at the Spark PRs and Jira tickets for completed tasks in this 
> parent (for example: Contains, StartsWith, EndsWith).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and 
> [Collator|http://example.com/] class. Also, refer to the Unicode Technical 
> Standard for 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].






[jira] [Updated] (SPARK-47934) Inefficient Redirect Handling Due to Missing Trailing Slashes in URL Redirection

2024-04-22 Thread huangzhir (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huangzhir updated SPARK-47934:
--
Attachment: image-2024-04-22-15-14-13-468.png

> Inefficient Redirect Handling Due to Missing Trailing Slashes in URL 
> Redirection
> 
>
> Key: SPARK-47934
> URL: https://issues.apache.org/jira/browse/SPARK-47934
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.4, 3.3.2, 3.5.1, 3.4.3
>Reporter: huangzhir
>Priority: Trivial
> Attachments: image-2024-04-22-15-14-13-468.png
>
>
> *Summary:*
> The current implementation of URL redirection in Spark's history web UI does 
> not consistently add trailing slashes to URLs when constructing redirection 
> targets. This inconsistency leads to additional HTTP redirects by Jetty, 
> which increases the load time and reduces the efficiency of the Spark UI.
> *Problem Description:*
> When constructing redirect URLs, particularly in scenarios where an attempt 
> ID needs to be appended, the system does not ensure that the base URL ends 
> with a slash. This omission results in the generated URL being redirected by 
> Jetty to add a trailing slash, thus causing an unnecessary additional HTTP 
> redirect.
> For example, when the `shouldAppendAttemptId` flag is true, the URL is formed 
> without a trailing slash before the attempt ID is appended, leading to two 
> redirects: one by our logic to add the attempt ID, and another by Jetty to 
> correct the missing slash. 
> !image-2024-04-22-15-06-29-357.png!
> *Proposed Solution:*
> [https://github.com/apache/spark/blob/2d0b56c3eac611e743c41d16ea8e439bc8a504e4/core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala#L118]
> Ensure that all redirect URLs uniformly end with a trailing slash regardless 
> of whether an attempt ID is appended. This can be achieved by modifying the 
> URL construction logic as follows:
> ```scala
> val redirect = if (shouldAppendAttemptId) {
>   req.getRequestURI.stripSuffix("/") + "/" + attemptId.get + "/"
> } else {
>   req.getRequestURI.stripSuffix("/") + "/"
> }
> ```
>  






[jira] [Updated] (SPARK-47934) Inefficient Redirect Handling Due to Missing Trailing Slashes in URL Redirection

2024-04-22 Thread huangzhir (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huangzhir updated SPARK-47934:
--
Description: 
*Summary:*
The current implementation of URL redirection in Spark's history web UI does 
not consistently add trailing slashes to URLs when constructing redirection 
targets. This inconsistency leads to additional HTTP redirects by Jetty, which 
increases the load time and reduces the efficiency of the Spark UI.

*Problem Description:*
When constructing redirect URLs, particularly in scenarios where an attempt ID 
needs to be appended, the system does not ensure that the base URL ends with a 
slash. This omission results in the generated URL being redirected by Jetty to 
add a trailing slash, thus causing an unnecessary additional HTTP redirect.

For example, when the `shouldAppendAttemptId` flag is true, the URL is formed 
without a trailing slash before the attempt ID is appended, leading to two 
redirects: one by our logic to add the attempt ID, and another by Jetty to 
correct the missing slash. 

!image-2024-04-22-15-14-13-468.png!

*Proposed Solution:*

[https://github.com/apache/spark/blob/2d0b56c3eac611e743c41d16ea8e439bc8a504e4/core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala#L118]

Ensure that all redirect URLs uniformly end with a trailing slash regardless of 
whether an attempt ID is appended. This can be achieved by modifying the URL 
construction logic as follows:

```scala
val redirect = if (shouldAppendAttemptId) {
  req.getRequestURI.stripSuffix("/") + "/" + attemptId.get + "/"
} else {
  req.getRequestURI.stripSuffix("/") + "/"
}
```
 

  was:
{*}{{*}}Summary:{{*}}{*}
The current implementation of URL redirection in Spark's history web UI does 
not consistently add trailing slashes to URLs when constructing redirection 
targets. This inconsistency leads to additional HTTP redirects by Jetty, which 
increases the load time and reduces the efficiency of the Spark UI.

{*}{{*}}Problem Description:{{*}}{*}
When constructing redirect URLs, particularly in scenarios where an attempt ID 
needs to be appended, the system does not ensure that the base URL ends with a 
slash. This omission results in the generated URL being redirected by Jetty to 
add a trailing slash, thus causing an unnecessary additional HTTP redirect.

For example, when the `shouldAppendAttemptId` flag is true, the URL is formed 
without a trailing slash before the attempt ID is appended, leading to two 
redirects: one by our logic to add the attempt ID, and another by Jetty to 
correct the missing slash. 

!image-2024-04-22-15-14-13-468.png!

{*}{{*}}Proposed Solution:{{*}}{*}

[https://github.com/apache/spark/blob/2d0b56c3eac611e743c41d16ea8e439bc8a504e4/core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala#L118]

Ensure that all redirect URLs uniformly end with a trailing slash regardless of 
whether an attempt ID is appended. This can be achieved by modifying the URL 
construction logic as follows:

```scala
val redirect = if (shouldAppendAttemptId)

{ req.getRequestURI.stripSuffix("/") + "/" + attemptId.get + "/" }

else

{ req.getRequestURI.stripSuffix("/") + "/" }

```
 


> Inefficient Redirect Handling Due to Missing Trailing Slashes in URL 
> Redirection
> 
>
> Key: SPARK-47934
> URL: https://issues.apache.org/jira/browse/SPARK-47934
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.4, 3.3.2, 3.5.1, 3.4.3
>Reporter: huangzhir
>Priority: Trivial
> Attachments: image-2024-04-22-15-14-13-468.png
>
>
> *Summary:*
> The current implementation of URL redirection in Spark's history web UI does 
> not consistently add trailing slashes to URLs when constructing redirection 
> targets. This inconsistency leads to additional HTTP redirects by Jetty, 
> which increases the load time and reduces the efficiency of the Spark UI.
> *Problem Description:*
> When constructing redirect URLs, particularly in scenarios where an attempt 
> ID needs to be appended, the system does not ensure that the base URL ends 
> with a slash. This omission results in the generated URL being redirected by 
> Jetty to add a trailing slash, thus causing an unnecessary additional HTTP 
> redirect.
> For example, when the `shouldAppendAttemptId` flag is true, the URL is formed 
> without a trailing slash before the attempt ID is appended, leading to two 
> redirects: one by our logic to add the attempt ID, and another by Jetty to 
> correct the missing slash. 
> !image-2024-04-22-15-14-13-468.png!
> *Proposed Solution:*
> [https://github.com/apache/spark/blob/2d0b56c3eac611e743c41d16ea8e439bc8a504e4/core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala#L118]
> Ensure that all 

[jira] [Updated] (SPARK-47934) Inefficient Redirect Handling Due to Missing Trailing Slashes in URL Redirection

2024-04-22 Thread huangzhir (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huangzhir updated SPARK-47934:
--
Description: 
{*}{{*}}Summary:{{*}}{*}
The current implementation of URL redirection in Spark's history web UI does 
not consistently add trailing slashes to URLs when constructing redirection 
targets. This inconsistency leads to additional HTTP redirects by Jetty, which 
increases the load time and reduces the efficiency of the Spark UI.

{*}{{*}}Problem Description:{{*}}{*}
When constructing redirect URLs, particularly in scenarios where an attempt ID 
needs to be appended, the system does not ensure that the base URL ends with a 
slash. This omission results in the generated URL being redirected by Jetty to 
add a trailing slash, thus causing an unnecessary additional HTTP redirect.

For example, when the `shouldAppendAttemptId` flag is true, the URL is formed 
without a trailing slash before the attempt ID is appended, leading to two 
redirects: one by our logic to add the attempt ID, and another by Jetty to 
correct the missing slash. 

!image-2024-04-22-15-14-13-468.png!

{*}{{*}}Proposed Solution:{{*}}{*}

[https://github.com/apache/spark/blob/2d0b56c3eac611e743c41d16ea8e439bc8a504e4/core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala#L118]

Ensure that all redirect URLs uniformly end with a trailing slash regardless of 
whether an attempt ID is appended. This can be achieved by modifying the URL 
construction logic as follows:

```scala
val redirect = if (shouldAppendAttemptId)

{ req.getRequestURI.stripSuffix("/") + "/" + attemptId.get + "/" }

else

{ req.getRequestURI.stripSuffix("/") + "/" }

```
 

  was:
*{*}Summary:{*}*
The current implementation of URL redirection in Spark's history web UI does 
not consistently add trailing slashes to URLs when constructing redirection 
targets. This inconsistency leads to additional HTTP redirects by Jetty, which 
increases the load time and reduces the efficiency of the Spark UI.

*{*}Problem Description:{*}*
When constructing redirect URLs, particularly in scenarios where an attempt ID 
needs to be appended, the system does not ensure that the base URL ends with a 
slash. This omission results in the generated URL being redirected by Jetty to 
add a trailing slash, thus causing an unnecessary additional HTTP redirect.

For example, when the `shouldAppendAttemptId` flag is true, the URL is formed 
without a trailing slash before the attempt ID is appended, leading to two 
redirects: one by our logic to add the attempt ID, and another by Jetty to 
correct the missing slash. 

!image-2024-04-22-15-06-29-357.png!

*{*}Proposed Solution:{*}*

[https://github.com/apache/spark/blob/2d0b56c3eac611e743c41d16ea8e439bc8a504e4/core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala#L118]

Ensure that all redirect URLs uniformly end with a trailing slash regardless of 
whether an attempt ID is appended. This can be achieved by modifying the URL 
construction logic as follows:

```scala
val redirect = if (shouldAppendAttemptId) {
req.getRequestURI.stripSuffix("/") + "/" + attemptId.get + "/"
} else {
req.getRequestURI.stripSuffix("/") + "/"
}

```
 


> Inefficient Redirect Handling Due to Missing Trailing Slashes in URL 
> Redirection
> 
>
> Key: SPARK-47934
> URL: https://issues.apache.org/jira/browse/SPARK-47934
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.4, 3.3.2, 3.5.1, 3.4.3
>Reporter: huangzhir
>Priority: Trivial
> Attachments: image-2024-04-22-15-14-13-468.png
>
>
> *Summary:*
> The current implementation of URL redirection in Spark's history web UI does 
> not consistently add trailing slashes to URLs when constructing redirection 
> targets. This inconsistency leads to additional HTTP redirects by Jetty, 
> which increases the load time and reduces the efficiency of the Spark UI.
> *Problem Description:*
> When constructing redirect URLs, particularly in scenarios where an attempt 
> ID needs to be appended, the system does not ensure that the base URL ends 
> with a slash. This omission results in the generated URL being redirected by 
> Jetty to add a trailing slash, thus causing an unnecessary additional HTTP 
> redirect.
> For example, when the `shouldAppendAttemptId` flag is true, the URL is formed 
> without a trailing slash before the attempt ID is appended, leading to two 
> redirects: one by our logic to add the attempt ID, and another by Jetty to 
> correct the missing slash. 
> !image-2024-04-22-15-14-13-468.png!
> *Proposed Solution:*
> 

[jira] [Created] (SPARK-47934) Inefficient Redirect Handling Due to Missing Trailing Slashes in URL Redirection

2024-04-22 Thread huangzhir (Jira)
huangzhir created SPARK-47934:
-

 Summary: Inefficient Redirect Handling Due to Missing Trailing 
Slashes in URL Redirection
 Key: SPARK-47934
 URL: https://issues.apache.org/jira/browse/SPARK-47934
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.4.3, 3.5.1, 3.3.2, 3.2.4
Reporter: huangzhir


*Summary:*
The current implementation of URL redirection in Spark's history web UI does 
not consistently add trailing slashes to URLs when constructing redirection 
targets. This inconsistency leads to additional HTTP redirects by Jetty, which 
increases the load time and reduces the efficiency of the Spark UI.

*Problem Description:*
When constructing redirect URLs, particularly in scenarios where an attempt ID 
needs to be appended, the system does not ensure that the base URL ends with a 
slash. This omission results in the generated URL being redirected by Jetty to 
add a trailing slash, thus causing an unnecessary additional HTTP redirect.

For example, when the `shouldAppendAttemptId` flag is true, the URL is formed 
without a trailing slash before the attempt ID is appended, leading to two 
redirects: one by our logic to add the attempt ID, and another by Jetty to 
correct the missing slash. 

!image-2024-04-22-15-06-29-357.png!

*Proposed Solution:*

[https://github.com/apache/spark/blob/2d0b56c3eac611e743c41d16ea8e439bc8a504e4/core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala#L118]

Ensure that all redirect URLs uniformly end with a trailing slash regardless of 
whether an attempt ID is appended. This can be achieved by modifying the URL 
construction logic as follows:

```scala
val redirect = if (shouldAppendAttemptId) {
  req.getRequestURI.stripSuffix("/") + "/" + attemptId.get + "/"
} else {
  req.getRequestURI.stripSuffix("/") + "/"
}
```
 






[jira] [Updated] (SPARK-47927) Nullability after join not respected in UDF

2024-04-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47927:
---
Labels: pull-request-available  (was: )

> Nullability after join not respected in UDF
> ---
>
> Key: SPARK-47927
> URL: https://issues.apache.org/jira/browse/SPARK-47927
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0, 3.5.1, 3.4.3
>Reporter: Emil Ejbyfeldt
>Priority: Major
>  Labels: pull-request-available
>
> {code:java}
> val ds1 = Seq(1).toDS()
> val ds2 = Seq[Int]().toDS()
> val f = udf[(Int, Option[Int]), (Int, Option[Int])](identity)
> ds1.join(ds2, ds1("value") === ds2("value"), 
> "outer").select(f(struct(ds1("value"), ds2("value")))).show()
> ds1.join(ds2, ds1("value") === ds2("value"), 
> "outer").select(struct(ds1("value"), ds2("value"))).show() {code}
> outputs
> {code:java}
> +---------------------------------------+
> |UDF(struct(value, value, value, value))|
> +---------------------------------------+
> |                                 {1, 0}|
> +---------------------------------------+
> +--------------------+
> |struct(value, value)|
> +--------------------+
> |           {1, NULL}|
> +--------------------+ {code}
> So when the result is passed to the UDF, the nullability after the join is 
> not respected, and we incorrectly end up with a 0 value instead of a 
> null/None value.






[jira] [Resolved] (SPARK-47932) Avoid using legacy commons-lang

2024-04-22 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie resolved SPARK-47932.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46154
[https://github.com/apache/spark/pull/46154]

> Avoid using legacy commons-lang
> ---
>
> Key: SPARK-47932
> URL: https://issues.apache.org/jira/browse/SPARK-47932
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 4.0.0
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-47773) Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-22 Thread Ke Jia (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Jia updated SPARK-47773:
---
Description: 
SPIP doc: 
[https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing]

 

Refined SPIP doc:

[https://docs.google.com/document/d/1oY26KtqXoJJNHbAhtmVgaXSVt6NlO6t1iWYEuGvCc1s/edit#heading=h.ud7930xhlsm6]

 

This 
[SPIP|https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing]
 outlines the integration of Gluten's physical plan conversion, validation, and 
fallback framework into Apache Spark. The goal is to enhance Spark's 
flexibility and robustness in executing physical plans and to leverage Gluten's 
performance optimizations. Currently, Spark lacks official cross-platform 
execution support for physical plans. Gluten's mechanism, which employs the 
Substrait standard, can convert and optimize Spark's physical plans, thus 
improving portability, interoperability, and execution efficiency.

The design proposal advocates for the incorporation of the TransformSupport 
interface and its specialized variants—LeafTransformSupport, 
UnaryTransformSupport, and BinaryTransformSupport. These are instrumental in 
streamlining the conversion of different operator types into a Substrait-based 
common format. The validation phase entails a thorough assessment of the 
Substrait plan against native backends to ensure compatibility. In instances 
where validation does not succeed, Spark's native operators will be deployed, 
with requisite transformations to adapt data formats accordingly. The proposal 
emphasizes the centrality of the plan transformation phase, positing it as the 
foundational step. The subsequent validation and fallback procedures are slated 
for consideration upon the successful establishment of the initial phase.

The integration of Gluten into Spark has already shown significant performance 
improvements with ClickHouse and Velox backends and has been successfully 
deployed in production by several customers. 

  was:
SPIP doc: 
https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing

This 
[SPIP|https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing]
 outlines the integration of Gluten's physical plan conversion, validation, and 
fallback framework into Apache Spark. The goal is to enhance Spark's 
flexibility and robustness in executing physical plans and to leverage Gluten's 
performance optimizations. Currently, Spark lacks an official cross-platform 
execution support for physical plans. Gluten's mechanism, which employs the 
Substrait standard, can convert and optimize Spark's physical plans, thus 
improving portability, interoperability, and execution efficiency.

The design proposal advocates for the incorporation of the TransformSupport 
interface and its specialized variants—LeafTransformSupport, 
UnaryTransformSupport, and BinaryTransformSupport. These are instrumental in 
streamlining the conversion of different operator types into a Substrait-based 
common format. The validation phase entails a thorough assessment of the 
Substrait plan against native backends to ensure compatibility. In instances 
where validation does not succeed, Spark's native operators will be deployed, 
with requisite transformations to adapt data formats accordingly. The proposal 
emphasizes the centrality of the plan transformation phase, positing it as the 
foundational step. The subsequent validation and fallback procedures are slated 
for consideration upon the successful establishment of the initial phase.

The integration of Gluten into Spark has already shown significant performance 
improvements with ClickHouse and Velox backends and has been successfully 
deployed in production by several customers. 


> Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on 
> Various Native Engines
> 
>
> Key: SPARK-47773
> URL: https://issues.apache.org/jira/browse/SPARK-47773
> Project: Spark
>  Issue Type: Epic
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Ke Jia
>Priority: Major
>
> SPIP doc: 
> [https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing]
>  
> Refined SPIP doc:
> [https://docs.google.com/document/d/1oY26KtqXoJJNHbAhtmVgaXSVt6NlO6t1iWYEuGvCc1s/edit#heading=h.ud7930xhlsm6]
>  
> This 
> [SPIP|https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing]
>  outlines the integration of Gluten's physical plan conversion, validation, 
> and fallback framework into Apache Spark. The goal is to enhance Spark's 
> flexibility and robustness in executing physical 

[jira] [Comment Edited] (SPARK-47773) Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-22 Thread Ke Jia (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839497#comment-17839497
 ] 

Ke Jia edited comment on SPARK-47773 at 4/22/24 6:22 AM:
-

We have refined the above SPIP in accordance with the specifications from the 
Spark community. The latest version of the SPIP is now available 
[here|https://docs.google.com/document/d/1oY26KtqXoJJNHbAhtmVgaXSVt6NlO6t1iWYEuGvCc1s/edit?usp=sharing].
We welcome and value your suggestions and comments.


was (Author: jk_self):
We have refined the above [SPIP  
|[https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing]]in
 accordance with the specifications from the Spark community. The latest 
version of the SPIP is now available 
[here|https://docs.google.com/document/d/1oY26KtqXoJJNHbAhtmVgaXSVt6NlO6t1iWYEuGvCc1s/edit?usp=sharing].
  Welcome and value your suggestions and comments.

> Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on 
> Various Native Engines
> 
>
> Key: SPARK-47773
> URL: https://issues.apache.org/jira/browse/SPARK-47773
> Project: Spark
>  Issue Type: Epic
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Ke Jia
>Priority: Major
>
> SPIP doc: 
> https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing
> This 
> [SPIP|https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing]
>  outlines the integration of Gluten's physical plan conversion, validation, 
> and fallback framework into Apache Spark. The goal is to enhance Spark's 
> flexibility and robustness in executing physical plans and to leverage 
> Gluten's performance optimizations. Currently, Spark lacks official 
> cross-platform execution support for physical plans. Gluten's mechanism, 
> which employs the Substrait standard, can convert and optimize Spark's 
> physical plans, thus improving portability, interoperability, and execution 
> efficiency.
> The design proposal advocates for the incorporation of the TransformSupport 
> interface and its specialized variants—LeafTransformSupport, 
> UnaryTransformSupport, and BinaryTransformSupport. These are instrumental in 
> streamlining the conversion of different operator types into a 
> Substrait-based common format. The validation phase entails a thorough 
> assessment of the Substrait plan against native backends to ensure 
> compatibility. In instances where validation does not succeed, Spark's native 
> operators will be deployed, with requisite transformations to adapt data 
> formats accordingly. The proposal emphasizes the centrality of the plan 
> transformation phase, positing it as the foundational step. The subsequent 
> validation and fallback procedures are slated for consideration upon the 
> successful establishment of the initial phase.
> The integration of Gluten into Spark has already shown significant 
> performance improvements with ClickHouse and Velox backends and has been 
> successfully deployed in production by several customers. 
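
The TransformSupport hierarchy described above might be sketched in Scala roughly as follows. This is a minimal illustration, not the SPIP's actual design: only the four trait names come from the text above; `SubstraitRel`, the method signatures, and the toy operators are assumptions, and a real implementation would build Substrait protobuf messages rather than strings.

```scala
// A stand-in for a Substrait relational node. A real implementation would
// carry Substrait protobuf messages; this string form is purely illustrative.
final case class SubstraitRel(repr: String)

// Core conversion contract named in the SPIP.
trait TransformSupport {
  def transform(): SubstraitRel // convert this operator (and its children) to Substrait
  def doValidate(): Boolean     // would the native backend accept the converted node?
}

// Specialized variants named in the SPIP, distinguished by operator arity.
trait LeafTransformSupport extends TransformSupport
trait UnaryTransformSupport extends TransformSupport { def child: TransformSupport }
trait BinaryTransformSupport extends TransformSupport {
  def left: TransformSupport
  def right: TransformSupport
}

// Toy operators (assumptions) showing how conversion composes bottom-up.
final case class ScanTransform(table: String) extends LeafTransformSupport {
  def transform(): SubstraitRel = SubstraitRel(s"read[$table]")
  def doValidate(): Boolean = true
}

final case class FilterTransform(cond: String, child: TransformSupport)
    extends UnaryTransformSupport {
  def transform(): SubstraitRel =
    SubstraitRel(s"filter[$cond](${child.transform().repr})")
  // If any part of the subtree fails validation, the whole subtree would
  // fall back to Spark's native operators (the fallback path in the SPIP).
  def doValidate(): Boolean = child.doValidate()
}
```

Under this sketch, a plan such as `FilterTransform("x > 1", ScanTransform("t"))` converts bottom-up into a single Substrait tree, and validation likewise propagates from the leaves, which mirrors the transform-then-validate-then-fallback flow the proposal outlines.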



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-47773) Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-22 Thread Ke Jia (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839497#comment-17839497
 ] 

Ke Jia edited comment on SPARK-47773 at 4/22/24 6:22 AM:
-

We have refined the above 
[SPIP|https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing]
 in accordance with the specifications from the Spark community. The latest 
version of the SPIP is now available 
[here|https://docs.google.com/document/d/1oY26KtqXoJJNHbAhtmVgaXSVt6NlO6t1iWYEuGvCc1s/edit?usp=sharing].
We welcome and value your suggestions and comments.


was (Author: jk_self):
We have refined the above 
[SPIP|[https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing]]
  in accordance with the specifications from the Spark community. The latest 
version of the SPIP is now available 
[here|https://docs.google.com/document/d/1oY26KtqXoJJNHbAhtmVgaXSVt6NlO6t1iWYEuGvCc1s/edit?usp=sharing].
  Welcome and value your suggestions and comments.



[jira] [Commented] (SPARK-47773) Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-22 Thread Ke Jia (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839497#comment-17839497
 ] 

Ke Jia commented on SPARK-47773:


We have refined the above 
[SPIP|https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing]
 in accordance with the specifications from the Spark community. The latest 
version of the SPIP is now available 
[here|https://docs.google.com/document/d/1oY26KtqXoJJNHbAhtmVgaXSVt6NlO6t1iWYEuGvCc1s/edit?usp=sharing].
We welcome and value your suggestions and comments.
