[jira] [Commented] (SPARK-42631) Support custom extensions in Spark Connect Scala client
[ https://issues.apache.org/jira/browse/SPARK-42631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695254#comment-17695254 ] Apache Spark commented on SPARK-42631: -- User 'tomvanbussel' has created a pull request for this issue: https://github.com/apache/spark/pull/40234 > Support custom extensions in Spark Connect Scala client > --- > > Key: SPARK-42631 > URL: https://issues.apache.org/jira/browse/SPARK-42631 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Tom van Bussel >Priority: Major >
[jira] [Assigned] (SPARK-42631) Support custom extensions in Spark Connect Scala client
[ https://issues.apache.org/jira/browse/SPARK-42631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42631: Assignee: Apache Spark > Support custom extensions in Spark Connect Scala client > --- > > Key: SPARK-42631 > URL: https://issues.apache.org/jira/browse/SPARK-42631 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Tom van Bussel >Assignee: Apache Spark >Priority: Major >
[jira] [Assigned] (SPARK-42631) Support custom extensions in Spark Connect Scala client
[ https://issues.apache.org/jira/browse/SPARK-42631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42631: Assignee: (was: Apache Spark) > Support custom extensions in Spark Connect Scala client > --- > > Key: SPARK-42631 > URL: https://issues.apache.org/jira/browse/SPARK-42631 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Tom van Bussel >Priority: Major >
[jira] [Commented] (SPARK-42631) Support custom extensions in Spark Connect Scala client
[ https://issues.apache.org/jira/browse/SPARK-42631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695252#comment-17695252 ] Apache Spark commented on SPARK-42631: -- User 'tomvanbussel' has created a pull request for this issue: https://github.com/apache/spark/pull/40234 > Support custom extensions in Spark Connect Scala client > --- > > Key: SPARK-42631 > URL: https://issues.apache.org/jira/browse/SPARK-42631 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Tom van Bussel >Priority: Major >
[jira] [Commented] (SPARK-42632) Fix scala paths in tests
[ https://issues.apache.org/jira/browse/SPARK-42632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695232#comment-17695232 ] Apache Spark commented on SPARK-42632: -- User 'hvanhovell' has created a pull request for this issue: https://github.com/apache/spark/pull/40235 > Fix scala paths in tests > > > Key: SPARK-42632 > URL: https://issues.apache.org/jira/browse/SPARK-42632 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.1 >Reporter: Herman van Hövell >Assignee: Herman van Hövell >Priority: Major > > The jar resolution in the connect client tests can resolve the jar for the > wrong scala version.
[jira] [Assigned] (SPARK-42632) Fix scala paths in tests
[ https://issues.apache.org/jira/browse/SPARK-42632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42632: Assignee: Apache Spark (was: Herman van Hövell) > Fix scala paths in tests > > > Key: SPARK-42632 > URL: https://issues.apache.org/jira/browse/SPARK-42632 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.1 >Reporter: Herman van Hövell >Assignee: Apache Spark >Priority: Major > > The jar resolution in the connect client tests can resolve the jar for the > wrong scala version.
[jira] [Assigned] (SPARK-42632) Fix scala paths in tests
[ https://issues.apache.org/jira/browse/SPARK-42632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42632: Assignee: Herman van Hövell (was: Apache Spark) > Fix scala paths in tests > > > Key: SPARK-42632 > URL: https://issues.apache.org/jira/browse/SPARK-42632 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.1 >Reporter: Herman van Hövell >Assignee: Herman van Hövell >Priority: Major > > The jar resolution in the connect client tests can resolve the jar for the > wrong scala version.
[jira] [Assigned] (SPARK-34827) Support fetching shuffle blocks in batch with i/o encryption
[ https://issues.apache.org/jira/browse/SPARK-34827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34827: Assignee: Apache Spark > Support fetching shuffle blocks in batch with i/o encryption > > > Key: SPARK-34827 > URL: https://issues.apache.org/jira/browse/SPARK-34827 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Blocker >
[jira] [Assigned] (SPARK-34827) Support fetching shuffle blocks in batch with i/o encryption
[ https://issues.apache.org/jira/browse/SPARK-34827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34827: Assignee: (was: Apache Spark) > Support fetching shuffle blocks in batch with i/o encryption > > > Key: SPARK-34827 > URL: https://issues.apache.org/jira/browse/SPARK-34827 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Dongjoon Hyun >Priority: Blocker >
[jira] [Commented] (SPARK-34827) Support fetching shuffle blocks in batch with i/o encryption
[ https://issues.apache.org/jira/browse/SPARK-34827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695157#comment-17695157 ] Apache Spark commented on SPARK-34827: -- User 'tomvanbussel' has created a pull request for this issue: https://github.com/apache/spark/pull/40234 > Support fetching shuffle blocks in batch with i/o encryption > > > Key: SPARK-34827 > URL: https://issues.apache.org/jira/browse/SPARK-34827 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Dongjoon Hyun >Priority: Blocker >
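For context, the combination at issue here is configuration-driven; a minimal sketch, assuming a plain SparkSession build and the standard config keys (whether the two settings can be honored together is exactly what this ticket tracks):
{code:scala}
import org.apache.spark.sql.SparkSession

// Batch fetch of contiguous shuffle blocks is an AQE optimization; with I/O
// encryption enabled, encrypted blocks cannot simply be concatenated, which
// is why the two settings historically could not be combined.
val spark = SparkSession.builder()
  .appName("batch-fetch-with-io-encryption")
  .config("spark.io.encryption.enabled", "true")
  .config("spark.sql.adaptive.fetchShuffleBlocksInBatch", "true")
  .getOrCreate()
{code}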
[jira] [Assigned] (SPARK-42630) Make `parse_data_type` use new proto message `DDLParse`
[ https://issues.apache.org/jira/browse/SPARK-42630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42630: Assignee: (was: Apache Spark) > Make `parse_data_type` use new proto message `DDLParse` > --- > > Key: SPARK-42630 > URL: https://issues.apache.org/jira/browse/SPARK-42630 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major >
[jira] [Assigned] (SPARK-42630) Make `parse_data_type` use new proto message `DDLParse`
[ https://issues.apache.org/jira/browse/SPARK-42630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42630: Assignee: Apache Spark > Make `parse_data_type` use new proto message `DDLParse` > --- > > Key: SPARK-42630 > URL: https://issues.apache.org/jira/browse/SPARK-42630 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Apache Spark >Priority: Major >
[jira] [Commented] (SPARK-42630) Make `parse_data_type` use new proto message `DDLParse`
[ https://issues.apache.org/jira/browse/SPARK-42630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695056#comment-17695056 ] Apache Spark commented on SPARK-42630: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/40233 > Make `parse_data_type` use new proto message `DDLParse` > --- > > Key: SPARK-42630 > URL: https://issues.apache.org/jira/browse/SPARK-42630 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major >
[jira] [Commented] (SPARK-42629) Update the description of default data source in the document
[ https://issues.apache.org/jira/browse/SPARK-42629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695046#comment-17695046 ] Apache Spark commented on SPARK-42629: -- User 'huangxiaopingRD' has created a pull request for this issue: https://github.com/apache/spark/pull/40232 > Update the description of default data source in the document > - > > Key: SPARK-42629 > URL: https://issues.apache.org/jira/browse/SPARK-42629 > Project: Spark > Issue Type: Task > Components: Documentation >Affects Versions: 3.5.0 >Reporter: xiaoping.huang >Priority: Minor >
[jira] [Assigned] (SPARK-42629) Update the description of default data source in the document
[ https://issues.apache.org/jira/browse/SPARK-42629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42629: Assignee: (was: Apache Spark) > Update the description of default data source in the document > - > > Key: SPARK-42629 > URL: https://issues.apache.org/jira/browse/SPARK-42629 > Project: Spark > Issue Type: Task > Components: Documentation >Affects Versions: 3.5.0 >Reporter: xiaoping.huang >Priority: Minor >
[jira] [Assigned] (SPARK-42629) Update the description of default data source in the document
[ https://issues.apache.org/jira/browse/SPARK-42629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42629: Assignee: Apache Spark > Update the description of default data source in the document > - > > Key: SPARK-42629 > URL: https://issues.apache.org/jira/browse/SPARK-42629 > Project: Spark > Issue Type: Task > Components: Documentation >Affects Versions: 3.5.0 >Reporter: xiaoping.huang >Assignee: Apache Spark >Priority: Minor >
[jira] [Assigned] (SPARK-42628) Add a migration note for bloom filter join
[ https://issues.apache.org/jira/browse/SPARK-42628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42628: Assignee: (was: Apache Spark) > Add a migration note for bloom filter join > -- > > Key: SPARK-42628 > URL: https://issues.apache.org/jira/browse/SPARK-42628 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Priority: Major >
[jira] [Commented] (SPARK-42628) Add a migration note for bloom filter join
[ https://issues.apache.org/jira/browse/SPARK-42628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694958#comment-17694958 ] Apache Spark commented on SPARK-42628: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/40231 > Add a migration note for bloom filter join > -- > > Key: SPARK-42628 > URL: https://issues.apache.org/jira/browse/SPARK-42628 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Priority: Major >
[jira] [Assigned] (SPARK-42628) Add a migration note for bloom filter join
[ https://issues.apache.org/jira/browse/SPARK-42628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42628: Assignee: Apache Spark > Add a migration note for bloom filter join > -- > > Key: SPARK-42628 > URL: https://issues.apache.org/jira/browse/SPARK-42628 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major >
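For context, the optimization the migration note covers is gated by a single SQL conf; a minimal sketch, assuming a SparkSession in scope as `spark`:
{code:scala}
// Runtime bloom-filter join pruning is controlled by this flag; a user who
// hits a regression after upgrading can switch it back off per session.
spark.conf.set("spark.sql.optimizer.runtime.bloomFilter.enabled", "false")
{code}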
[jira] [Commented] (SPARK-42521) Add NULL values for INSERT commands with user-specified lists of fewer columns than the target table
[ https://issues.apache.org/jira/browse/SPARK-42521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694799#comment-17694799 ] Apache Spark commented on SPARK-42521: -- User 'dtenedor' has created a pull request for this issue: https://github.com/apache/spark/pull/40229 > Add NULL values for INSERT commands with user-specified lists of fewer > columns than the target table > > > Key: SPARK-42521 > URL: https://issues.apache.org/jira/browse/SPARK-42521 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Daniel >Priority: Major >
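A short sketch of the behavior this ticket describes, assuming a simple Parquet table; columns omitted from the user-specified column list are padded with NULLs instead of failing analysis:
{code:scala}
spark.sql("CREATE TABLE t (a INT, b INT, c STRING) USING parquet")
// Only column a is named, so b and c should be filled with NULL.
spark.sql("INSERT INTO t (a) VALUES (42)")
spark.sql("SELECT * FROM t").show()  // one row: 42, NULL, NULL
{code}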
[jira] [Commented] (SPARK-41874) Implement DataFrame `sameSemantics`
[ https://issues.apache.org/jira/browse/SPARK-41874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694796#comment-17694796 ] Apache Spark commented on SPARK-41874: -- User 'amaliujia' has created a pull request for this issue: https://github.com/apache/spark/pull/40228 > Implement DataFrame `sameSemantics` > --- > > Key: SPARK-41874 > URL: https://issues.apache.org/jira/browse/SPARK-41874 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major >
[jira] [Commented] (SPARK-41874) Implement DataFrame `sameSemantics`
[ https://issues.apache.org/jira/browse/SPARK-41874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694795#comment-17694795 ] Apache Spark commented on SPARK-41874: -- User 'amaliujia' has created a pull request for this issue: https://github.com/apache/spark/pull/40228 > Implement DataFrame `sameSemantics` > --- > > Key: SPARK-41874 > URL: https://issues.apache.org/jira/browse/SPARK-41874 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major >
[jira] [Commented] (SPARK-41870) Handle duplicate columns in `createDataFrame`
[ https://issues.apache.org/jira/browse/SPARK-41870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694791#comment-17694791 ] Apache Spark commented on SPARK-41870: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/40227 > Handle duplicate columns in `createDataFrame` > - > > Key: SPARK-41870 > URL: https://issues.apache.org/jira/browse/SPARK-41870 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > df = self.spark.createDataFrame([(1, 2)], ["c", "c"]){code} > Error: > {code:java} > Traceback (most recent call last): > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_dataframe.py", > line 65, in test_duplicated_column_names > df = self.spark.createDataFrame([(1, 2)], ["c", "c"]) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", > line 277, in createDataFrame > raise ValueError( > ValueError: Length mismatch: Expected axis has 1 elements, new values have 2 > elements{code}
[jira] [Assigned] (SPARK-41870) Handle duplicate columns in `createDataFrame`
[ https://issues.apache.org/jira/browse/SPARK-41870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41870: Assignee: (was: Apache Spark) > Handle duplicate columns in `createDataFrame` > - > > Key: SPARK-41870 > URL: https://issues.apache.org/jira/browse/SPARK-41870 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > df = self.spark.createDataFrame([(1, 2)], ["c", "c"]){code} > Error: > {code:java} > Traceback (most recent call last): > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_dataframe.py", > line 65, in test_duplicated_column_names > df = self.spark.createDataFrame([(1, 2)], ["c", "c"]) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", > line 277, in createDataFrame > raise ValueError( > ValueError: Length mismatch: Expected axis has 1 elements, new values have 2 > elements{code}
[jira] [Assigned] (SPARK-41870) Handle duplicate columns in `createDataFrame`
[ https://issues.apache.org/jira/browse/SPARK-41870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41870: Assignee: Apache Spark > Handle duplicate columns in `createDataFrame` > - > > Key: SPARK-41870 > URL: https://issues.apache.org/jira/browse/SPARK-41870 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Assignee: Apache Spark >Priority: Major > > {code:java} > df = self.spark.createDataFrame([(1, 2)], ["c", "c"]){code} > Error: > {code:java} > Traceback (most recent call last): > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_dataframe.py", > line 65, in test_duplicated_column_names > df = self.spark.createDataFrame([(1, 2)], ["c", "c"]) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", > line 277, in createDataFrame > raise ValueError( > ValueError: Length mismatch: Expected axis has 1 elements, new values have 2 > elements{code}
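For comparison, the classic (non-Connect) API tolerates duplicate column names at creation time, which is the behavior the Connect client needs to match; a minimal Scala sketch, assuming a SparkSession in scope as `spark`:
{code:scala}
// Both output columns are named "c"; creating the DataFrame succeeds, and
// only a later ambiguous reference to "c" would fail analysis.
val df = spark.range(1).selectExpr("1 AS c", "2 AS c")
df.printSchema()  // two int fields, both named "c"
{code}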
[jira] [Commented] (SPARK-41868) Support data type Duration(NANOSECOND)
[ https://issues.apache.org/jira/browse/SPARK-41868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694781#comment-17694781 ] Apache Spark commented on SPARK-41868: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/40226 > Support data type Duration(NANOSECOND) > -- > > Key: SPARK-41868 > URL: https://issues.apache.org/jira/browse/SPARK-41868 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > import pandas as pd > from datetime import timedelta > df = self.spark.createDataFrame(pd.DataFrame({"a": > [timedelta(microseconds=123)]})) {code} > {code:java} > Traceback (most recent call last): > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_dataframe.py", > line 1291, in test_create_dataframe_from_pandas_with_day_time_interval > self.assertEqual(df.toPandas().a.iloc[0], timedelta(microseconds=123)) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 1031, in toPandas > return self._session.client.to_pandas(query) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 413, in to_pandas > return self._execute_and_fetch(req) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 573, in _execute_and_fetch > self._handle_error(rpc_error) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 623, in _handle_error > raise SparkConnectException(status.message, info.reason) from None > pyspark.sql.connect.client.SparkConnectException: > (org.apache.spark.SparkUnsupportedOperationException) Unsupported data type: > Duration(NANOSECOND){code}
[jira] [Assigned] (SPARK-41868) Support data type Duration(NANOSECOND)
[ https://issues.apache.org/jira/browse/SPARK-41868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41868: Assignee: Apache Spark > Support data type Duration(NANOSECOND) > -- > > Key: SPARK-41868 > URL: https://issues.apache.org/jira/browse/SPARK-41868 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Assignee: Apache Spark >Priority: Major > > {code:java} > import pandas as pd > from datetime import timedelta > df = self.spark.createDataFrame(pd.DataFrame({"a": > [timedelta(microseconds=123)]})) {code} > {code:java} > Traceback (most recent call last): > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_dataframe.py", > line 1291, in test_create_dataframe_from_pandas_with_day_time_interval > self.assertEqual(df.toPandas().a.iloc[0], timedelta(microseconds=123)) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 1031, in toPandas > return self._session.client.to_pandas(query) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 413, in to_pandas > return self._execute_and_fetch(req) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 573, in _execute_and_fetch > self._handle_error(rpc_error) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 623, in _handle_error > raise SparkConnectException(status.message, info.reason) from None > pyspark.sql.connect.client.SparkConnectException: > (org.apache.spark.SparkUnsupportedOperationException) Unsupported data type: > Duration(NANOSECOND){code}
[jira] [Assigned] (SPARK-41868) Support data type Duration(NANOSECOND)
[ https://issues.apache.org/jira/browse/SPARK-41868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41868: Assignee: (was: Apache Spark) > Support data type Duration(NANOSECOND) > -- > > Key: SPARK-41868 > URL: https://issues.apache.org/jira/browse/SPARK-41868 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > import pandas as pd > from datetime import timedelta > df = self.spark.createDataFrame(pd.DataFrame({"a": > [timedelta(microseconds=123)]})) {code} > {code:java} > Traceback (most recent call last): > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_dataframe.py", > line 1291, in test_create_dataframe_from_pandas_with_day_time_interval > self.assertEqual(df.toPandas().a.iloc[0], timedelta(microseconds=123)) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 1031, in toPandas > return self._session.client.to_pandas(query) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 413, in to_pandas > return self._execute_and_fetch(req) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 573, in _execute_and_fetch > self._handle_error(rpc_error) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 623, in _handle_error > raise SparkConnectException(status.message, info.reason) from None > pyspark.sql.connect.client.SparkConnectException: > (org.apache.spark.SparkUnsupportedOperationException) Unsupported data type: > Duration(NANOSECOND){code}
[jira] [Commented] (SPARK-41868) Support data type Duration(NANOSECOND)
[ https://issues.apache.org/jira/browse/SPARK-41868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694780#comment-17694780 ] Apache Spark commented on SPARK-41868: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/40226 > Support data type Duration(NANOSECOND) > -- > > Key: SPARK-41868 > URL: https://issues.apache.org/jira/browse/SPARK-41868 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > import pandas as pd > from datetime import timedelta > df = self.spark.createDataFrame(pd.DataFrame({"a": > [timedelta(microseconds=123)]})) {code} > {code:java} > Traceback (most recent call last): > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_dataframe.py", > line 1291, in test_create_dataframe_from_pandas_with_day_time_interval > self.assertEqual(df.toPandas().a.iloc[0], timedelta(microseconds=123)) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 1031, in toPandas > return self._session.client.to_pandas(query) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 413, in to_pandas > return self._execute_and_fetch(req) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 573, in _execute_and_fetch > self._handle_error(rpc_error) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 623, in _handle_error > raise SparkConnectException(status.message, info.reason) from None > pyspark.sql.connect.client.SparkConnectException: > (org.apache.spark.SparkUnsupportedOperationException) Unsupported data type: > Duration(NANOSECOND){code}
[jira] [Assigned] (SPARK-42625) Upgrade zstd-jni to 1.5.4-2
[ https://issues.apache.org/jira/browse/SPARK-42625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42625: Assignee: (was: Apache Spark) > Upgrade zstd-jni to 1.5.4-2 > --- > > Key: SPARK-42625 > URL: https://issues.apache.org/jira/browse/SPARK-42625 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.5.0 >Reporter: Dongjoon Hyun >Priority: Minor >
[jira] [Assigned] (SPARK-42625) Upgrade zstd-jni to 1.5.4-2
[ https://issues.apache.org/jira/browse/SPARK-42625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42625: Assignee: Apache Spark > Upgrade zstd-jni to 1.5.4-2 > --- > > Key: SPARK-42625 > URL: https://issues.apache.org/jira/browse/SPARK-42625 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.5.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Minor >
[jira] [Commented] (SPARK-42625) Upgrade zstd-jni to 1.5.4-2
[ https://issues.apache.org/jira/browse/SPARK-42625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694755#comment-17694755 ] Apache Spark commented on SPARK-42625: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/40225 > Upgrade zstd-jni to 1.5.4-2 > --- > > Key: SPARK-42625 > URL: https://issues.apache.org/jira/browse/SPARK-42625 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.5.0 >Reporter: Dongjoon Hyun >Priority: Minor >
[jira] [Commented] (SPARK-42539) User-provided JARs can override Spark's Hive metadata client JARs when using "builtin"
[ https://issues.apache.org/jira/browse/SPARK-42539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694753#comment-17694753 ] Apache Spark commented on SPARK-42539: -- User 'xkrogen' has created a pull request for this issue: https://github.com/apache/spark/pull/40224 > User-provided JARs can override Spark's Hive metadata client JARs when using > "builtin" > -- > > Key: SPARK-42539 > URL: https://issues.apache.org/jira/browse/SPARK-42539 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.3, 3.2.3, 3.3.2 >Reporter: Erik Krogen >Priority: Major > > Recently we observed that on version 3.2.0 and Java 8, it is possible for > user-provided Hive JARs to break the ability for Spark, via the Hive metadata > client / {{IsolatedClientLoader}}, to communicate with Hive Metastore, when > using the default behavior of the "builtin" Hive version. After SPARK-35321, > when Spark is compiled against Hive >= 2.3.9 and the "builtin" Hive client > version is used, we will call the method {{Hive.getWithoutRegisterFns()}} > (from HIVE-21563) instead of {{Hive.get()}}. If the user has included, for > example, {{hive-exec-2.3.8.jar}} on their classpath, the client will break > with a {{NoSuchMethodError}}. This particular failure mode was resolved in > 3.2.1 by SPARK-37446, but while investigating, we found a general issue that > it's possible for user JARs to override Spark's own JARs -- but only inside > of the IsolatedClientLoader when using "builtin". This happens because even > when Spark is configured to use the "builtin" Hive classes, it still creates > a separate URLClassLoader for the HiveClientImpl used for HMS communication. > To get the set of JAR URLs to use for this classloader, Spark [collects all > of the JARs used by the user classloader (and its parent, and that > classloader's parent, and so > on)|https://github.com/apache/spark/blob/87e3d5625e76bb734b8dd753bfb25002822c8585/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala#L412-L438]. > Thus the newly created classloader will have all of the same JARs as the > user classloader, but the ordering has been reversed! User JARs get > prioritized ahead of system JARs, because the classloader hierarchy is > traversed from bottom-to-top. For example let's say we have user JARs > "foo.jar" and "hive-exec-2.3.8.jar". The user classloader will look like this: > {code} > MutableURLClassLoader > -- foo.jar > -- hive-exec-2.3.8.jar > -- parent: URLClassLoader > - spark-core_2.12-3.2.0.jar > - ... > - hive-exec-2.3.9.jar > - ... > {code} > This setup provides the expected behavior within the user classloader; it > will first check the parent, so hive-exec-2.3.9.jar takes precedence, and the > MutableURLClassLoader is only checked if the class doesn't exist in the > parent. But when a JAR list is constructed for the IsolatedClientLoader, it > traverses the URLs from MutableURLClassLoader first, then its parent, so the > final list looks like (in order): > {code} > URLClassLoader [IsolatedClientLoader] > -- foo.jar > -- hive-exec-2.3.8.jar > -- spark-core_2.12-3.2.0.jar > -- ... > -- hive-exec-2.3.9.jar > -- ... > -- parent: boot classloader (JVM classes) > {code} > Now when a lookup happens, all of the JARs are within the same > URLClassLoader, and the user JARs are in front of the Spark ones, so the user > JARs get prioritized.
This is the opposite of the expected behavior when > using the default user/application classloader in Spark, which has > parent-first behavior, prioritizing the Spark/system classes over the user > classes. (Note that this behavior is correct when using the > {{ChildFirstURLClassLoader}}.) > After SPARK-37446, the NoSuchMethodError is no longer an issue, but this > still breaks assumptions about how user JARs should be treated vs. system > JARs, and leaves the client open to breaking in other ways. For > example, SPARK-37446 describes a scenario whereby Hive 2.3.8 JARs have > been included; the changes in Hive 2.3.9 were needed to improve compatibility > with older HMS, so if a user were to accidentally include these older JARs, > it could break the ability of Spark to communicate with HMS 1.x. > I see two solutions to this: > *(A) Remove the separate classloader entirely when using "builtin"* > Starting from 3.0.0, due to SPARK-26839, when using Java 9+, we don't even > create a new classloader when using "builtin". This makes sense, as [called > out in this > comment|https://github.com/apache/spark/pull/24057#discussion_r265142878], > since the point of "builtin" is to use the existing JARs on the classpath > anyway. This proposes simply extending the changes from SPARK-26839
[jira] [Commented] (SPARK-42539) User-provided JARs can override Spark's Hive metadata client JARs when using "builtin"
[ https://issues.apache.org/jira/browse/SPARK-42539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694752#comment-17694752 ] Apache Spark commented on SPARK-42539: -- User 'xkrogen' has created a pull request for this issue: https://github.com/apache/spark/pull/40224 > User-provided JARs can override Spark's Hive metadata client JARs when using > "builtin" > -- > > Key: SPARK-42539 > URL: https://issues.apache.org/jira/browse/SPARK-42539 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.3, 3.2.3, 3.3.2 >Reporter: Erik Krogen >Priority: Major > > Recently we observed that on version 3.2.0 and Java 8, it is possible for > user-provided Hive JARs to break the ability for Spark, via the Hive metadata > client / {{IsolatedClientLoader}}, to communicate with Hive Metastore, when > using the default behavior of the "builtin" Hive version. After SPARK-35321, > when Spark is compiled against Hive >= 2.3.9 and the "builtin" Hive client > version is used, we will call the method {{Hive.getWithoutRegisterFns()}} > (from HIVE-21563) instead of {{Hive.get()}}. If the user has included, for > example, {{hive-exec-2.3.8.jar}} on their classpath, the client will break > with a {{NoSuchMethodError}}. This particular failure mode was resolved in > 3.2.1 by SPARK-37446, but while investigating, we found a general issue that > it's possible for user JARs to override Spark's own JARs -- but only inside > of the IsolatedClientLoader when using "builtin". This happens because even > when Spark is configured to use the "builtin" Hive classes, it still creates > a separate URLClassLoader for the HiveClientImpl used for HMS communication. > To get the set of JAR URLs to use for this classloader, Spark [collects all > of the JARs used by the user classloader (and its parent, and that > classloader's parent, and so > on)|https://github.com/apache/spark/blob/87e3d5625e76bb734b8dd753bfb25002822c8585/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala#L412-L438]. > Thus the newly created classloader will have all of the same JARs as the > user classloader, but the ordering has been reversed! User JARs get > prioritized ahead of system JARs, because the classloader hierarchy is > traversed from bottom-to-top. For example let's say we have user JARs > "foo.jar" and "hive-exec-2.3.8.jar". The user classloader will look like this: > {code} > MutableURLClassLoader > -- foo.jar > -- hive-exec-2.3.8.jar > -- parent: URLClassLoader > - spark-core_2.12-3.2.0.jar > - ... > - hive-exec-2.3.9.jar > - ... > {code} > This setup provides the expected behavior within the user classloader; it > will first check the parent, so hive-exec-2.3.9.jar takes precedence, and the > MutableURLClassLoader is only checked if the class doesn't exist in the > parent. But when a JAR list is constructed for the IsolatedClientLoader, it > traverses the URLs from MutableURLClassLoader first, then its parent, so the > final list looks like (in order): > {code} > URLClassLoader [IsolatedClientLoader] > -- foo.jar > -- hive-exec-2.3.8.jar > -- spark-core_2.12-3.2.0.jar > -- ... > -- hive-exec-2.3.9.jar > -- ... > -- parent: boot classloader (JVM classes) > {code} > Now when a lookup happens, all of the JARs are within the same > URLClassLoader, and the user JARs are in front of the Spark ones, so the user > JARs get prioritized.
This is the opposite of the expected behavior when > using the default user/application classloader in Spark, which has > parent-first behavior, prioritizing the Spark/system classes over the user > classes. (Note that this behavior is correct when using the > {{ChildFirstURLClassLoader}}.) > After SPARK-37446, the NoSuchMethodError is no longer an issue, but this > still breaks assumptions about how user JARs should be treated vs. system > JARs, and leaves the client open to breaking in other ways. For > example, SPARK-37446 describes a scenario whereby Hive 2.3.8 JARs have > been included; the changes in Hive 2.3.9 were needed to improve compatibility > with older HMS, so if a user were to accidentally include these older JARs, > it could break the ability of Spark to communicate with HMS 1.x. > I see two solutions to this: > *(A) Remove the separate classloader entirely when using "builtin"* > Starting from 3.0.0, due to SPARK-26839, when using Java 9+, we don't even > create a new classloader when using "builtin". This makes sense, as [called > out in this > comment|https://github.com/apache/spark/pull/24057#discussion_r265142878], > since the point of "builtin" is to use the existing JARs on the classpath > anyway. This proposes simply extending the changes from SPARK-26839
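A standalone sketch (not the actual HiveUtils code) of why flattening a parent-first hierarchy into one URL list reverses precedence, as the description above explains:
{code:scala}
import java.net.{URL, URLClassLoader}

// Walking from the user classloader up through its parents and concatenating
// URLs puts the child's JARs at the front of the flattened list; a single
// URLClassLoader built from that list searches user JARs before system JARs,
// the reverse of the parent-first delegation in the original hierarchy.
def flattenUrls(loader: ClassLoader): Seq[URL] = loader match {
  case null              => Seq.empty
  case u: URLClassLoader => u.getURLs.toSeq ++ flattenUrls(u.getParent)
  case other             => flattenUrls(other.getParent)
}
{code}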
[jira] [Assigned] (SPARK-42624) Reorganize imports in test_functions
[ https://issues.apache.org/jira/browse/SPARK-42624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42624: Assignee: (was: Apache Spark) > Reorganize imports in test_functions > > > Key: SPARK-42624 > URL: https://issues.apache.org/jira/browse/SPARK-42624 > Project: Spark > Issue Type: Task > Components: PySpark, Tests >Affects Versions: 3.4.0 >Reporter: Takuya Ueshin >Priority: Major >
[jira] [Assigned] (SPARK-42624) Reorganize imports in test_functions
[ https://issues.apache.org/jira/browse/SPARK-42624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42624: Assignee: Apache Spark > Reorganize imports in test_functions > > > Key: SPARK-42624 > URL: https://issues.apache.org/jira/browse/SPARK-42624 > Project: Spark > Issue Type: Task > Components: PySpark, Tests >Affects Versions: 3.4.0 >Reporter: Takuya Ueshin >Assignee: Apache Spark >Priority: Major >
[jira] [Commented] (SPARK-42624) Reorganize imports in test_functions
[ https://issues.apache.org/jira/browse/SPARK-42624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694729#comment-17694729 ] Apache Spark commented on SPARK-42624: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/40223 > Reorganize imports in test_functions > > > Key: SPARK-42624 > URL: https://issues.apache.org/jira/browse/SPARK-42624 > Project: Spark > Issue Type: Task > Components: PySpark, Tests >Affects Versions: 3.4.0 >Reporter: Takuya Ueshin >Priority: Major >
[jira] [Commented] (SPARK-42615) Refactor the AnalyzePlan RPC and add `session.version`
[ https://issues.apache.org/jira/browse/SPARK-42615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694661#comment-17694661 ] Apache Spark commented on SPARK-42615: -- User 'amaliujia' has created a pull request for this issue: https://github.com/apache/spark/pull/40222 > Refactor the AnalyzePlan RPC and add `session.version` > -- > > Key: SPARK-42615 > URL: https://issues.apache.org/jira/browse/SPARK-42615 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.4.1 > >
[jira] [Commented] (SPARK-41551) Improve/complete PathOutputCommitProtocol support for dynamic partitioning
[ https://issues.apache.org/jira/browse/SPARK-41551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694632#comment-17694632 ] Apache Spark commented on SPARK-41551: -- User 'steveloughran' has created a pull request for this issue: https://github.com/apache/spark/pull/40221 > Improve/complete PathOutputCommitProtocol support for dynamic partitioning > -- > > Key: SPARK-41551 > URL: https://issues.apache.org/jira/browse/SPARK-41551 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.1 >Reporter: Steve Loughran >Priority: Minor > > Followup to SPARK-40034 as > * that is incomplete as it doesn't record the partitions > * as long as the job doesn't call `newTaskTempFileAbsPath()`, and slow > renames are ok, both s3a committers are actually OK to use. > It's only the newTaskTempFileAbsPath operation which is unsupported in s3a > committers; the post-job dir rename is O(data) but file-by-file rename is > correct for a non-atomic job commit. > # Cut PathOutputCommitProtocol.newTaskTempFile; to update super > partitionPaths (needs a setter). The superclass can't just say if (committer > instanceof PathOutputCommitter) as spark-core needs to compile with older > hadoop versions > # downgrade failure in setup to log (info?) > # retain failure in the newTaskTempFileAbsPath call. > Testing: yes
[jira] [Commented] (SPARK-41551) Improve/complete PathOutputCommitProtocol support for dynamic partitioning
[ https://issues.apache.org/jira/browse/SPARK-41551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694629#comment-17694629 ] Apache Spark commented on SPARK-41551: -- User 'steveloughran' has created a pull request for this issue: https://github.com/apache/spark/pull/40221 > Improve/complete PathOutputCommitProtocol support for dynamic partitioning > -- > > Key: SPARK-41551 > URL: https://issues.apache.org/jira/browse/SPARK-41551 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.1 >Reporter: Steve Loughran >Priority: Minor > > Followup to SPARK-40034 as > * that is incomplete as it doesn't record the partitions > * as long as the job doesn't call `newTaskTempFileAbsPath()`, and slow > renames are ok, both s3a committers are actually OK to use. > It's only the newTaskTempFileAbsPath operation which is unsupported in s3a > committers; the post-job dir rename is O(data) but file-by-file rename is > correct for a non-atomic job commit. > # Cut PathOutputCommitProtocol.newTaskTempFile; to update super > partitionPaths (needs a setter). The superclass can't just say if (committer > instanceof PathOutputCommitter) as spark-core needs to compile with older > hadoop versions > # downgrade failure in setup to log (info?) > # retain failure in the newTaskTempFileAbsPath call. > Testing: yes
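For reference, a hedged sketch of how a PathOutputCommitter-based committer is typically wired into SQL writes (class names from the spark-hadoop-cloud module, which must be on the classpath):
{code:scala}
// Route SQL file writes through the Hadoop PathOutputCommitter machinery;
// dynamic partition overwrite on this path is what the ticket improves.
spark.conf.set("spark.sql.sources.commitProtocolClass",
  "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol")
spark.conf.set("spark.sql.parquet.output.committer.class",
  "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter")
{code}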
[jira] [Commented] (SPARK-42622) StackOverflowError reading json that does not conform to schema
[ https://issues.apache.org/jira/browse/SPARK-42622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694622#comment-17694622 ] Apache Spark commented on SPARK-42622: -- User 'jelmerk' has created a pull request for this issue: https://github.com/apache/spark/pull/40219 > StackOverflowError reading json that does not conform to schema > --- > > Key: SPARK-42622 > URL: https://issues.apache.org/jira/browse/SPARK-42622 > Project: Spark > Issue Type: Bug > Components: Input/Output >Affects Versions: 3.4.0 >Reporter: Jelmer Kuperus >Priority: Critical > > Databricks runtime 12.1 uses a pre-release version of Spark 3.4.x; we > encountered the following problem: > > !https://user-images.githubusercontent.com/133639/221866500-99f187a0-8db3-42a7-85ca-b027fdec160d.png!
[jira] [Assigned] (SPARK-42622) StackOverflowError reading json that does not conform to schema
[ https://issues.apache.org/jira/browse/SPARK-42622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42622: Assignee: (was: Apache Spark) > StackOverflowError reading json that does not conform to schema > --- > > Key: SPARK-42622 > URL: https://issues.apache.org/jira/browse/SPARK-42622 > Project: Spark > Issue Type: Bug > Components: Input/Output >Affects Versions: 3.4.0 >Reporter: Jelmer Kuperus >Priority: Critical > > Databricks runtime 12.1 uses a pre-release version of Spark 3.4.x; we > encountered the following problem: > > !https://user-images.githubusercontent.com/133639/221866500-99f187a0-8db3-42a7-85ca-b027fdec160d.png!
[jira] [Commented] (SPARK-42622) StackOverflowError reading json that does not conform to schema
[ https://issues.apache.org/jira/browse/SPARK-42622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694621#comment-17694621 ] Apache Spark commented on SPARK-42622: -- User 'jelmerk' has created a pull request for this issue: https://github.com/apache/spark/pull/40219 > StackOverflowError reading json that does not conform to schema > --- > > Key: SPARK-42622 > URL: https://issues.apache.org/jira/browse/SPARK-42622 > Project: Spark > Issue Type: Bug > Components: Input/Output >Affects Versions: 3.4.0 >Reporter: Jelmer Kuperus >Priority: Critical > > Databricks runtime 12.1 uses a pre-release version of Spark 3.4.x; we > encountered the following problem: > > !https://user-images.githubusercontent.com/133639/221866500-99f187a0-8db3-42a7-85ca-b027fdec160d.png!
[jira] [Assigned] (SPARK-42622) StackOverflowError reading json that does not conform to schema
[ https://issues.apache.org/jira/browse/SPARK-42622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42622: Assignee: Apache Spark > StackOverflowError reading json that does not conform to schema > --- > > Key: SPARK-42622 > URL: https://issues.apache.org/jira/browse/SPARK-42622 > Project: Spark > Issue Type: Bug > Components: Input/Output >Affects Versions: 3.4.0 >Reporter: Jelmer Kuperus >Assignee: Apache Spark >Priority: Critical > > Databricks runtime 12.1 uses a pre-release version of Spark 3.4.x; we > encountered the following problem: > > !https://user-images.githubusercontent.com/133639/221866500-99f187a0-8db3-42a7-85ca-b027fdec160d.png!
[jira] [Commented] (SPARK-42579) Extend function.lit() to match Literal.apply()
[ https://issues.apache.org/jira/browse/SPARK-42579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694533#comment-17694533 ] Apache Spark commented on SPARK-42579: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/40218 > Extend function.lit() to match Literal.apply() > -- > > Key: SPARK-42579 > URL: https://issues.apache.org/jira/browse/SPARK-42579 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Priority: Major > > function.lit should support the same conversions as the original. > This requires an addition to the connect protocol, since it does not support > nested type literals at the moment.
[jira] [Commented] (SPARK-42579) Extend function.lit() to match Literal.apply()
[ https://issues.apache.org/jira/browse/SPARK-42579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694532#comment-17694532 ] Apache Spark commented on SPARK-42579: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/40218 > Extend function.lit() to match Literal.apply() > -- > > Key: SPARK-42579 > URL: https://issues.apache.org/jira/browse/SPARK-42579 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Priority: Major > > function.lit should support the same conversions as the original. > This requires an addition to the connect protocol, since it does not support > nested type literals at the moment.
[jira] [Assigned] (SPARK-42579) Extend function.lit() to match Literal.apply()
[ https://issues.apache.org/jira/browse/SPARK-42579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42579: Assignee: (was: Apache Spark) > Extend function.lit() to match Literal.apply() > -- > > Key: SPARK-42579 > URL: https://issues.apache.org/jira/browse/SPARK-42579 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Priority: Major > > function.lit should support the same conversions as the original. > This requires an addition to the connect protocol, since it does not support > nested type literals at the moment.
[jira] [Assigned] (SPARK-42579) Extend function.lit() to match Literal.apply()
[ https://issues.apache.org/jira/browse/SPARK-42579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42579: Assignee: Apache Spark > Extend function.lit() to match Literal.apply() > -- > > Key: SPARK-42579 > URL: https://issues.apache.org/jira/browse/SPARK-42579 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Assignee: Apache Spark >Priority: Major > > function.lit should support the same conversions as the original. > This requires an addition to the connect protocol, since it does not support > nested type literals at the moment.
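To illustrate the gap, a sketch of a conversion that Literal.apply already handles in the classic API but which lit() in the Connect client could not yet express, since the protocol lacked nested literals:
{code:scala}
import org.apache.spark.sql.functions.lit

// Literal.apply can turn a Scala Array into an ARRAY<INT> literal; matching
// this in functions.lit on Connect requires nested literals in the protocol.
val arrayLiteral = lit(Array(1, 2, 3))
{code}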
[jira] [Assigned] (SPARK-42559) Implement DataFrameNaFunctions
[ https://issues.apache.org/jira/browse/SPARK-42559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42559: Assignee: Apache Spark (was: BingKun Pan) > Implement DataFrameNaFunctions > -- > > Key: SPARK-42559 > URL: https://issues.apache.org/jira/browse/SPARK-42559 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Assignee: Apache Spark >Priority: Major > > Implement DataFrameNaFunctions for connect and hook it up to Dataset.
[jira] [Assigned] (SPARK-42559) Implement DataFrameNaFunctions
[ https://issues.apache.org/jira/browse/SPARK-42559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42559: Assignee: BingKun Pan (was: Apache Spark) > Implement DataFrameNaFunctions > -- > > Key: SPARK-42559 > URL: https://issues.apache.org/jira/browse/SPARK-42559 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Assignee: BingKun Pan >Priority: Major > > Implement DataFrameNaFunctions for connect and hook it up to Dataset. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42559) Implement DataFrameNaFunctions
[ https://issues.apache.org/jira/browse/SPARK-42559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694502#comment-17694502 ] Apache Spark commented on SPARK-42559: -- User 'panbingkun' has created a pull request for this issue: https://github.com/apache/spark/pull/40217 > Implement DataFrameNaFunctions > -- > > Key: SPARK-42559 > URL: https://issues.apache.org/jira/browse/SPARK-42559 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Assignee: BingKun Pan >Priority: Major > > Implement DataFrameNaFunctions for connect and hook it up to Dataset. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
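[editor's note] As a quick reference, the Dataset.na surface being wired up for connect looks like this in the existing Scala API; the data and session setup are illustrative:

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq((Some(1), "a"), (None, null: String)).toDF("i", "s")
df.na.drop().show()                            // drop rows with any null
df.na.fill(Map("i" -> 0, "s" -> "n/a")).show() // fill per-column defaults
df.na.replace("s", Map("a" -> "A")).show()     // replace values in a column
{code}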
[jira] [Assigned] (SPARK-42593) Deprecate & remove the APIs that will be removed in pandas 2.0.
[ https://issues.apache.org/jira/browse/SPARK-42593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42593: Assignee: (was: Apache Spark) > Deprecate & remove the APIs that will be removed in pandas 2.0. > --- > > Key: SPARK-42593 > URL: https://issues.apache.org/jira/browse/SPARK-42593 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > > pandas is preparing to release 2.0, which includes a bunch of API changes. > ([https://pandas.pydata.org/pandas-docs/version/2.0/whatsnew/v2.0.0.html#removal-of-prior-version-deprecations-changes]) > We should also deprecate these APIs so that we can remove them in the next > release. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42593) Deprecate & remove the APIs that will be removed in pandas 2.0.
[ https://issues.apache.org/jira/browse/SPARK-42593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42593: Assignee: Apache Spark > Deprecate & remove the APIs that will be removed in pandas 2.0. > --- > > Key: SPARK-42593 > URL: https://issues.apache.org/jira/browse/SPARK-42593 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Apache Spark >Priority: Major > > pandas is preparing to release 2.0, which includes a bunch of API changes. > ([https://pandas.pydata.org/pandas-docs/version/2.0/whatsnew/v2.0.0.html#removal-of-prior-version-deprecations-changes]) > We should also deprecate these APIs so that we can remove them in the next > release. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42593) Deprecate & remove the APIs that will be removed in pandas 2.0.
[ https://issues.apache.org/jira/browse/SPARK-42593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694464#comment-17694464 ] Apache Spark commented on SPARK-42593: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/40216 > Deprecate & remove the APIs that will be removed in pandas 2.0. > --- > > Key: SPARK-42593 > URL: https://issues.apache.org/jira/browse/SPARK-42593 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > > pandas is preparing to release 2.0, which includes a bunch of API changes. > ([https://pandas.pydata.org/pandas-docs/version/2.0/whatsnew/v2.0.0.html#removal-of-prior-version-deprecations-changes]) > We should also deprecate these APIs so that we can remove them in the next > release. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42593) Deprecate & remove the APIs that will be removed in pandas 2.0.
[ https://issues.apache.org/jira/browse/SPARK-42593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694463#comment-17694463 ] Apache Spark commented on SPARK-42593: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/40216 > Deprecate & remove the APIs that will be removed in pandas 2.0. > --- > > Key: SPARK-42593 > URL: https://issues.apache.org/jira/browse/SPARK-42593 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > > pandas is preparing to release 2.0, which includes a bunch of API changes. > ([https://pandas.pydata.org/pandas-docs/version/2.0/whatsnew/v2.0.0.html#removal-of-prior-version-deprecations-changes]) > We should also deprecate these APIs so that we can remove them in the next > release. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42376) Introduce watermark propagation among operators
[ https://issues.apache.org/jira/browse/SPARK-42376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694401#comment-17694401 ] Apache Spark commented on SPARK-42376: -- User 'HeartSaVioR' has created a pull request for this issue: https://github.com/apache/spark/pull/40215 > Introduce watermark propagation among operators > --- > > Key: SPARK-42376 > URL: https://issues.apache.org/jira/browse/SPARK-42376 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.5.0 >Reporter: Jungtaek Lim >Priority: Major > > With the introduction of SPARK-40925, we enabled workloads containing multiple > stateful operators in a single streaming query. > That ticket clearly described what was out of scope: "Here we propose fixing the > late record filtering in stateful operators to allow chaining of stateful > operators {*}which do not produce delayed records (like time-interval join or > potentially flatMapGroupsWithState){*}". > We identified a production use case for a stream-stream time-interval join > followed by a stateful operator (e.g. window aggregation), and propose to > address that use case via this ticket. > The design will be described in the PR, but the sketched idea is to introduce a > simulation of watermark propagation among operators. As of now, Spark > considers all stateful operators to have the same input and output > watermark, which introduces the limitation. With this ticket, we construct > the logic to simulate watermark propagation so that each operator can have > its own (input watermark, output watermark). Operators introducing delayed > records will produce a delayed output watermark, and the downstream operator can > take the delay into account as its input watermark will be adjusted. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
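[editor's note] To make the use case concrete, here is a sketch of the query shape this ticket unlocks, assuming an active SparkSession `spark`; the rate sources and column names are placeholders, not part of the ticket:

{code:scala}
import org.apache.spark.sql.functions._

val impressions = spark.readStream.format("rate").load()
  .select(col("timestamp").as("impressionTime"), col("value").as("adId"))
  .withWatermark("impressionTime", "10 seconds")
  .alias("imp")
val clicks = spark.readStream.format("rate").load()
  .select(col("timestamp").as("clickTime"), col("value").as("adId"))
  .withWatermark("clickTime", "10 seconds")
  .alias("clk")

// The time-interval join can emit records delayed relative to its input
// watermark, so its simulated output watermark is delayed accordingly.
val joined = impressions.join(clicks, expr(
  "imp.adId = clk.adId AND " +
  "clickTime BETWEEN impressionTime AND impressionTime + interval 1 minute"))

// With per-operator propagation, this downstream aggregation receives an
// input watermark adjusted for the join's delay instead of a single
// query-wide watermark.
val counts = joined
  .groupBy(window(col("clickTime"), "1 minute"), col("imp.adId"))
  .count()
{code}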
[jira] [Assigned] (SPARK-42591) Document SS guide doc for introducing watermark propagation among operators
[ https://issues.apache.org/jira/browse/SPARK-42591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42591: Assignee: (was: Apache Spark) > Document SS guide doc for introducing watermark propagation among operators > --- > > Key: SPARK-42591 > URL: https://issues.apache.org/jira/browse/SPARK-42591 > Project: Spark > Issue Type: Documentation > Components: Structured Streaming >Affects Versions: 3.5.0 >Reporter: Jungtaek Lim >Priority: Major > > Once SPARK-42376 has been merged, we also want to provide an example of > using a stream-stream time-interval join followed by a streaming aggregation. > Adding the feature without proper documentation may mean that no one > even knows this is supported. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42591) Document SS guide doc for introducing watermark propagation among operators
[ https://issues.apache.org/jira/browse/SPARK-42591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694400#comment-17694400 ] Apache Spark commented on SPARK-42591: -- User 'HeartSaVioR' has created a pull request for this issue: https://github.com/apache/spark/pull/40215 > Document SS guide doc for introducing watermark propagation among operators > --- > > Key: SPARK-42591 > URL: https://issues.apache.org/jira/browse/SPARK-42591 > Project: Spark > Issue Type: Documentation > Components: Structured Streaming >Affects Versions: 3.5.0 >Reporter: Jungtaek Lim >Priority: Major > > Once SPARK-42376 has been merged, we also want to provide an example of > using a stream-stream time-interval join followed by a streaming aggregation. > Adding the feature without proper documentation may mean that no one > even knows this is supported. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42591) Document SS guide doc for introducing watermark propagation among operators
[ https://issues.apache.org/jira/browse/SPARK-42591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42591: Assignee: Apache Spark > Document SS guide doc for introducing watermark propagation among operators > --- > > Key: SPARK-42591 > URL: https://issues.apache.org/jira/browse/SPARK-42591 > Project: Spark > Issue Type: Documentation > Components: Structured Streaming >Affects Versions: 3.5.0 >Reporter: Jungtaek Lim >Assignee: Apache Spark >Priority: Major > > Once SPARK-42376 has been merged, we also want to provide an example of > using a stream-stream time-interval join followed by a streaming aggregation. > Adding the feature without proper documentation may mean that no one > even knows this is supported. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42491) Upgrade jetty to 9.4.51.v20230217
[ https://issues.apache.org/jira/browse/SPARK-42491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42491: Assignee: (was: Apache Spark) > Upgrade jetty to 9.4.51.v20230217 > -- > > Key: SPARK-42491 > URL: https://issues.apache.org/jira/browse/SPARK-42491 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Minor > > https://github.com/eclipse/jetty.project/releases/tag/jetty-9.4.51.v20230217 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42491) Upgrade jetty to 9.4.51.v20230217
[ https://issues.apache.org/jira/browse/SPARK-42491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42491: Assignee: Apache Spark > Upgrade jetty to 9.4.51.v20230217 > -- > > Key: SPARK-42491 > URL: https://issues.apache.org/jira/browse/SPARK-42491 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor > > https://github.com/eclipse/jetty.project/releases/tag/jetty-9.4.51.v20230217 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42491) Upgrade jetty to 9.4.51.v20230217
[ https://issues.apache.org/jira/browse/SPARK-42491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694393#comment-17694393 ] Apache Spark commented on SPARK-42491: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/40214 > Upgrade jetty to 9.4.51.v20230217 > -- > > Key: SPARK-42491 > URL: https://issues.apache.org/jira/browse/SPARK-42491 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Minor > > https://github.com/eclipse/jetty.project/releases/tag/jetty-9.4.51.v20230217 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42599) Make `CompatibilitySuite` as a tool like `dev/mima`
[ https://issues.apache.org/jira/browse/SPARK-42599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694378#comment-17694378 ] Apache Spark commented on SPARK-42599: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/40213 > Make `CompatibilitySuite` as a tool like `dev/mima` > --- > > Key: SPARK-42599 > URL: https://issues.apache.org/jira/browse/SPARK-42599 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0, 3.5.0 >Reporter: Yang Jie >Priority: Major > > Using Maven to test `CompatibilitySuite` requires some pre-work (Maven needs to > build the sql & > connect-client-jvm modules before testing), so when we run `mvn package test`, > the following errors occur: > > {code:java} > CompatibilitySuite: > - compatibility MiMa tests *** FAILED *** > java.lang.AssertionError: assertion failed: Failed to find the jar inside > folder: /home/bjorn/spark-3.4.0/connector/connect/client/jvm/target > at scala.Predef$.assert(Predef.scala:223) > at > org.apache.spark.sql.connect.client.util.IntegrationTestUtils$.findJar(IntegrationTestUtils.scala:67) > at > org.apache.spark.sql.connect.client.CompatibilitySuite.clientJar$lzycompute(CompatibilitySuite.scala:57) > at > org.apache.spark.sql.connect.client.CompatibilitySuite.clientJar(CompatibilitySuite.scala:53) > at > org.apache.spark.sql.connect.client.CompatibilitySuite.$anonfun$new$1(CompatibilitySuite.scala:69) > at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > ... > - compatibility API tests: Dataset *** FAILED *** > java.lang.AssertionError: assertion failed: Failed to find the jar inside > folder: /home/bjorn/spark-3.4.0/connector/connect/client/jvm/target > at scala.Predef$.assert(Predef.scala:223) > at > org.apache.spark.sql.connect.client.util.IntegrationTestUtils$.findJar(IntegrationTestUtils.scala:67) > at > org.apache.spark.sql.connect.client.CompatibilitySuite.clientJar$lzycompute(CompatibilitySuite.scala:57) > at > org.apache.spark.sql.connect.client.CompatibilitySuite.clientJar(CompatibilitySuite.scala:53) > at > org.apache.spark.sql.connect.client.CompatibilitySuite.$anonfun$new$7(CompatibilitySuite.scala:110) > at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
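[editor's note] For orientation, a rough Scala sketch of the kind of jar lookup that trips the assertion above when the module jar has not been built yet; the names and logic are illustrative, not the actual IntegrationTestUtils code:

{code:scala}
import java.io.File

def findJar(targetDir: String, prefix: String): File = {
  // List the build output directory; empty if it does not exist yet.
  val files = Option(new File(targetDir).listFiles()).getOrElse(Array.empty[File])
  files.find(f => f.getName.startsWith(prefix) && f.getName.endsWith(".jar"))
    .getOrElse(throw new AssertionError(
      s"assertion failed: Failed to find the jar inside folder: $targetDir"))
}
{code}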
[jira] [Assigned] (SPARK-42616) SparkSQLCLIDriver shall only close started hive sessionState
[ https://issues.apache.org/jira/browse/SPARK-42616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42616: Assignee: Apache Spark > SparkSQLCLIDriver shall only close started hive sessionState > > > Key: SPARK-42616 > URL: https://issues.apache.org/jira/browse/SPARK-42616 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.2 >Reporter: Kent Yao >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42613) PythonRunner should set OMP_NUM_THREADS to task cpus instead of executor cores by default
[ https://issues.apache.org/jira/browse/SPARK-42613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42613: Assignee: Apache Spark > PythonRunner should set OMP_NUM_THREADS to task cpus instead of executor > cores by default > - > > Key: SPARK-42613 > URL: https://issues.apache.org/jira/browse/SPARK-42613 > Project: Spark > Issue Type: Bug > Components: PySpark, YARN >Affects Versions: 3.3.0 >Reporter: John Zhuge >Assignee: Apache Spark >Priority: Major > > Follow up from > [https://github.com/apache/spark/pull/40199#discussion_r1119453996] > If OMP_NUM_THREADS is not set explicitly, we should set it to > `spark.task.cpus` instead of `spark.executor.cores` as described in [PR > #38699|https://github.com/apache/spark/pull/38699]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42613) PythonRunner should set OMP_NUM_THREADS to task cpus instead of executor cores by default
[ https://issues.apache.org/jira/browse/SPARK-42613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694324#comment-17694324 ] Apache Spark commented on SPARK-42613: -- User 'jzhuge' has created a pull request for this issue: https://github.com/apache/spark/pull/40212 > PythonRunner should set OMP_NUM_THREADS to task cpus instead of executor > cores by default > - > > Key: SPARK-42613 > URL: https://issues.apache.org/jira/browse/SPARK-42613 > Project: Spark > Issue Type: Bug > Components: PySpark, YARN >Affects Versions: 3.3.0 >Reporter: John Zhuge >Priority: Major > > Follow up from > [https://github.com/apache/spark/pull/40199#discussion_r1119453996] > If OMP_NUM_THREADS is not set explicitly, we should set it to > `spark.task.cpus` instead of `spark.executor.cores` as described in [PR > #38699|https://github.com/apache/spark/pull/38699]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
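[editor's note] A minimal Scala sketch of the proposed default, not the actual PythonRunner code: when the user has not set OMP_NUM_THREADS explicitly, size native thread pools by the cpus of a single task rather than of the whole executor:

{code:scala}
import org.apache.spark.SparkConf
import scala.collection.mutable

val conf = new SparkConf()
val env = mutable.Map[String, String]() // worker environment being built
if (!env.contains("OMP_NUM_THREADS")) {
  // spark.task.cpus defaults to 1; using spark.executor.cores would
  // oversubscribe when several tasks share the executor.
  env("OMP_NUM_THREADS") = conf.get("spark.task.cpus", "1")
}
{code}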
[jira] [Assigned] (SPARK-42616) SparkSQLCLIDriver shall only close started hive sessionState
[ https://issues.apache.org/jira/browse/SPARK-42616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42616: Assignee: (was: Apache Spark) > SparkSQLCLIDriver shall only close started hive sessionState > > > Key: SPARK-42616 > URL: https://issues.apache.org/jira/browse/SPARK-42616 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.2 >Reporter: Kent Yao >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42613) PythonRunner should set OMP_NUM_THREADS to task cpus instead of executor cores by default
[ https://issues.apache.org/jira/browse/SPARK-42613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42613: Assignee: (was: Apache Spark) > PythonRunner should set OMP_NUM_THREADS to task cpus instead of executor > cores by default > - > > Key: SPARK-42613 > URL: https://issues.apache.org/jira/browse/SPARK-42613 > Project: Spark > Issue Type: Bug > Components: PySpark, YARN >Affects Versions: 3.3.0 >Reporter: John Zhuge >Priority: Major > > Follow up from > [https://github.com/apache/spark/pull/40199#discussion_r1119453996] > If OMP_NUM_THREADS is not set explicitly, we should set it to > `spark.task.cpus` instead of `spark.executor.cores` as described in [PR > #38699|https://github.com/apache/spark/pull/38699]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42616) SparkSQLCLIDriver shall only close started hive sessionState
[ https://issues.apache.org/jira/browse/SPARK-42616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694323#comment-17694323 ] Apache Spark commented on SPARK-42616: -- User 'yaooqinn' has created a pull request for this issue: https://github.com/apache/spark/pull/40211 > SparkSQLCLIDriver shall only close started hive sessionState > > > Key: SPARK-42616 > URL: https://issues.apache.org/jira/browse/SPARK-42616 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.2 >Reporter: Kent Yao >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42615) Refactor the AnalyzePlan RPC and add `session.version`
[ https://issues.apache.org/jira/browse/SPARK-42615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694313#comment-17694313 ] Apache Spark commented on SPARK-42615: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/40210 > Refactor the AnalyzePlan RPC and add `session.version` > -- > > Key: SPARK-42615 > URL: https://issues.apache.org/jira/browse/SPARK-42615 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42615) Refactor the AnalyzePlan RPC and add `session.version`
[ https://issues.apache.org/jira/browse/SPARK-42615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42615: Assignee: (was: Apache Spark) > Refactor the AnalyzePlan RPC and add `session.version` > -- > > Key: SPARK-42615 > URL: https://issues.apache.org/jira/browse/SPARK-42615 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42615) Refactor the AnalyzePlan RPC and add `session.version`
[ https://issues.apache.org/jira/browse/SPARK-42615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42615: Assignee: Apache Spark > Refactor the AnalyzePlan RPC and add `session.version` > -- > > Key: SPARK-42615 > URL: https://issues.apache.org/jira/browse/SPARK-42615 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42427) Conv should return an error if the internal conversion overflows
[ https://issues.apache.org/jira/browse/SPARK-42427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694299#comment-17694299 ] Apache Spark commented on SPARK-42427: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/40209 > Conv should return an error if the internal conversion overflows > > > Key: SPARK-42427 > URL: https://issues.apache.org/jira/browse/SPARK-42427 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
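[editor's note] A small illustration of the overflow this ticket targets; the exact error class comes from the fix and is assumed here to surface under ANSI mode:

{code:scala}
// 20 hex digits exceed the unsigned 64-bit range conv uses internally;
// per this ticket the overflow should raise an error rather than return
// a silently wrapped value (ANSI mode assumed).
spark.sql("SET spark.sql.ansi.enabled = true")
spark.sql("SELECT conv('ffffffffffffffffffff', 16, 10)").show()
{code}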
[jira] [Assigned] (SPARK-42539) User-provided JARs can override Spark's Hive metadata client JARs when using "builtin"
[ https://issues.apache.org/jira/browse/SPARK-42539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42539: Assignee: Apache Spark > User-provided JARs can override Spark's Hive metadata client JARs when using > "builtin" > -- > > Key: SPARK-42539 > URL: https://issues.apache.org/jira/browse/SPARK-42539 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.3, 3.2.3, 3.3.2 >Reporter: Erik Krogen >Assignee: Apache Spark >Priority: Major > > Recently we observed that on version 3.2.0 and Java 8, it is possible for > user-provided Hive JARs to break the ability for Spark, via the Hive metadata > client / {{IsolatedClientLoader}}, to communicate with Hive Metastore, when > using the default behavior of the "builtin" Hive version. After SPARK-35321, > when Spark is compiled against Hive >= 2.3.9 and the "builtin" Hive client > version is used, we will call the method {{Hive.getWithoutRegisterFns()}} > (from HIVE-21563) instead of {{Hive.get()}}. If the user has included, for > example, {{hive-exec-2.3.8.jar}} on their classpath, the client will break > with a {{NoSuchMethodError}}. This particular failure mode was resolved in > 3.2.1 by SPARK-37446, but while investigating, we found a general issue that > it's possible for user JARs to override Spark's own JARs -- but only inside > of the IsolatedClientLoader when using "builtin". This happens because even > when Spark is configured to use the "builtin" Hive classes, it still creates > a separate URLClassLoader for the HiveClientImpl used for HMS communication. > To get the set of JAR URLs to use for this classloader, Spark [collects all > of the JARs used by the user classloader (and its parent, and that > classloader's parent, and so > on)|https://github.com/apache/spark/blob/87e3d5625e76bb734b8dd753bfb25002822c8585/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala#L412-L438]. > Thus the newly created classloader will have all of the same JARs as the > user classloader, but the ordering has been reversed! User JARs get > prioritized ahead of system JARs, because the classloader hierarchy is > traversed from bottom-to-top. For example, let's say we have user JARs > "foo.jar" and "hive-exec-2.3.8.jar". The user classloader will look like this: > {code} > MutableURLClassLoader > -- foo.jar > -- hive-exec-2.3.8.jar > -- parent: URLClassLoader > - spark-core_2.12-3.2.0.jar > - ... > - hive-exec-2.3.9.jar > - ... > {code} > This setup provides the expected behavior within the user classloader; it > will first check the parent, so hive-exec-2.3.9.jar takes precedence, and the > MutableURLClassLoader is only checked if the class doesn't exist in the > parent. But when a JAR list is constructed for the IsolatedClientLoader, it > traverses the URLs from MutableURLClassLoader first, then its parent, so the > final list looks like (in order): > {code} > URLClassLoader [IsolatedClientLoader] > -- foo.jar > -- hive-exec-2.3.8.jar > -- spark-core_2.12-3.2.0.jar > -- ... > -- hive-exec-2.3.9.jar > -- ... > -- parent: boot classloader (JVM classes) > {code} > Now when a lookup happens, all of the JARs are within the same > URLClassLoader, and the user JARs are in front of the Spark ones, so the user > JARs get prioritized. This is the opposite of the expected behavior when > using the default user/application classloader in Spark, which has > parent-first behavior, prioritizing the Spark/system classes over the user > classes.
(Note that this behavior is correct when using the > {{ChildFirstURLClassLoader}}.) > After SPARK-37446, the NoSuchMethodError is no longer an issue, but this > still breaks assumptions about how user JARs should be treated vs. system > JARs, and leaves the client open to breaking in other ways. For > example, SPARK-37446 describes a scenario whereby Hive 2.3.8 JARs have > been included; the changes in Hive 2.3.9 were needed to improve compatibility > with older HMS, so if a user were to accidentally include these older JARs, > it could break the ability of Spark to communicate with HMS 1.x. > I see two solutions to this: > *(A) Remove the separate classloader entirely when using "builtin"* > Starting from 3.0.0, due to SPARK-26839, when using Java 9+, we don't even > create a new classloader when using "builtin". This makes sense, as [called > out in this > comment|https://github.com/apache/spark/pull/24057#discussion_r265142878], > since the point of "builtin" is to use the existing JARs on the classpath > anyway. This proposes simply extending the changes from SPARK-26839 to all > Java versions, instead of restricting to Java 9+ only. > *(B) Reverse the ordering of parent/child JAR
[jira] [Assigned] (SPARK-42539) User-provided JARs can override Spark's Hive metadata client JARs when using "builtin"
[ https://issues.apache.org/jira/browse/SPARK-42539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42539: Assignee: (was: Apache Spark) > User-provided JARs can override Spark's Hive metadata client JARs when using > "builtin" > -- > > Key: SPARK-42539 > URL: https://issues.apache.org/jira/browse/SPARK-42539 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.3, 3.2.3, 3.3.2 >Reporter: Erik Krogen >Priority: Major > > Recently we observed that on version 3.2.0 and Java 8, it is possible for > user-provided Hive JARs to break the ability for Spark, via the Hive metadata > client / {{IsolatedClientLoader}}, to communicate with Hive Metastore, when > using the default behavior of the "builtin" Hive version. After SPARK-35321, > when Spark is compiled against Hive >= 2.3.9 and the "builtin" Hive client > version is used, we will call the method {{Hive.getWithoutRegisterFns()}} > (from HIVE-21563) instead of {{Hive.get()}}. If the user has included, for > example, {{hive-exec-2.3.8.jar}} on their classpath, the client will break > with a {{NoSuchMethodError}}. This particular failure mode was resolved in > 3.2.1 by SPARK-37446, but while investigating, we found a general issue that > it's possible for user JARs to override Spark's own JARs -- but only inside > of the IsolatedClientLoader when using "builtin". This happens because even > when Spark is configured to use the "builtin" Hive classes, it still creates > a separate URLClassLoader for the HiveClientImpl used for HMS communication. > To get the set of JAR URLs to use for this classloader, Spark [collects all > of the JARs used by the user classloader (and its parent, and that > classloader's parent, and so > on)|https://github.com/apache/spark/blob/87e3d5625e76bb734b8dd753bfb25002822c8585/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala#L412-L438]. > Thus the newly created classloader will have all of the same JARs as the > user classloader, but the ordering has been reversed! User JARs get > prioritized ahead of system JARs, because the classloader hierarchy is > traversed from bottom-to-top. For example, let's say we have user JARs > "foo.jar" and "hive-exec-2.3.8.jar". The user classloader will look like this: > {code} > MutableURLClassLoader > -- foo.jar > -- hive-exec-2.3.8.jar > -- parent: URLClassLoader > - spark-core_2.12-3.2.0.jar > - ... > - hive-exec-2.3.9.jar > - ... > {code} > This setup provides the expected behavior within the user classloader; it > will first check the parent, so hive-exec-2.3.9.jar takes precedence, and the > MutableURLClassLoader is only checked if the class doesn't exist in the > parent. But when a JAR list is constructed for the IsolatedClientLoader, it > traverses the URLs from MutableURLClassLoader first, then its parent, so the > final list looks like (in order): > {code} > URLClassLoader [IsolatedClientLoader] > -- foo.jar > -- hive-exec-2.3.8.jar > -- spark-core_2.12-3.2.0.jar > -- ... > -- hive-exec-2.3.9.jar > -- ... > -- parent: boot classloader (JVM classes) > {code} > Now when a lookup happens, all of the JARs are within the same > URLClassLoader, and the user JARs are in front of the Spark ones, so the user > JARs get prioritized. This is the opposite of the expected behavior when > using the default user/application classloader in Spark, which has > parent-first behavior, prioritizing the Spark/system classes over the user > classes.
(Note that this behavior is correct when using the > {{ChildFirstURLClassLoader}}.) > After SPARK-37446, the NoSuchMethodError is no longer an issue, but this > still breaks assumptions about how user JARs should be treated vs. system > JARs, and leaves the client open to breaking in other ways. For > example, SPARK-37446 describes a scenario whereby Hive 2.3.8 JARs have > been included; the changes in Hive 2.3.9 were needed to improve compatibility > with older HMS, so if a user were to accidentally include these older JARs, > it could break the ability of Spark to communicate with HMS 1.x. > I see two solutions to this: > *(A) Remove the separate classloader entirely when using "builtin"* > Starting from 3.0.0, due to SPARK-26839, when using Java 9+, we don't even > create a new classloader when using "builtin". This makes sense, as [called > out in this > comment|https://github.com/apache/spark/pull/24057#discussion_r265142878], > since the point of "builtin" is to use the existing JARs on the classpath > anyway. This proposes simply extending the changes from SPARK-26839 to all > Java versions, instead of restricting to Java 9+ only. > *(B) Reverse the ordering of parent/child JAR
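[editor's note] A self-contained Scala sketch of the ordering problem described above; the jar names are illustrative. Flattening the loader hierarchy bottom-up puts user jars ahead of Spark's own jars in the resulting flat list:

{code:scala}
import java.net.{URL, URLClassLoader}

// Collect jar URLs child-first, mirroring the bottom-to-top traversal.
def flattenJars(cl: ClassLoader): Seq[URL] = cl match {
  case null => Seq.empty
  case u: URLClassLoader => u.getURLs.toSeq ++ flattenJars(u.getParent)
  case other => flattenJars(other.getParent)
}

val sparkJars = Array(new URL("file:/opt/spark/jars/hive-exec-2.3.9.jar"))
val userJars = Array(
  new URL("file:/tmp/foo.jar"),
  new URL("file:/tmp/hive-exec-2.3.8.jar"))
val parent = new URLClassLoader(sparkJars, null)      // system classes
val userLoader = new URLClassLoader(userJars, parent) // user classes (child)

// Prints the user jars first: in one flat loader built from this list,
// hive-exec-2.3.8.jar shadows the hive-exec-2.3.9.jar Spark needs.
flattenJars(userLoader).foreach(println)
{code}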
[jira] [Commented] (SPARK-42592) Document SS guide doc for supporting multiple stateful operators (especially chained aggregations)
[ https://issues.apache.org/jira/browse/SPARK-42592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694281#comment-17694281 ] Apache Spark commented on SPARK-42592: -- User 'HeartSaVioR' has created a pull request for this issue: https://github.com/apache/spark/pull/40208 > Document SS guide doc for supporting multiple stateful operators (especially > chained aggregations) > -- > > Key: SPARK-42592 > URL: https://issues.apache.org/jira/browse/SPARK-42592 > Project: Spark > Issue Type: Documentation > Components: Structured Streaming >Affects Versions: 3.5.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Major > Fix For: 3.4.1, 3.5.0 > > > We made a change to the guide doc for SPARK-40925 via SPARK-42105, but in > SPARK-42105 we only removed the section on the "limitation of global watermark". > That said, we haven't provided any example of the new functionality, especially > since users need to know about the change to the SQL function (window) in chained > time window aggregations. > In this ticket, we will add an example of chained time window aggregations, > introducing the new functionality of the SQL function. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
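[editor's note] For reference, a sketch of the chained time window aggregations the guide should cover, assuming an active SparkSession `spark`; the rate source is a placeholder:

{code:scala}
import org.apache.spark.sql.functions._

val events = spark.readStream.format("rate").load()
  .select(col("timestamp").as("eventTime"), col("value"))
  .withWatermark("eventTime", "10 seconds")

// First aggregation: counts per 1-minute window.
val perMinute = events
  .groupBy(window(col("eventTime"), "1 minute"))
  .agg(count("*").as("cnt"))

// Second aggregation windows over the window column itself; applying the
// SQL function `window` to a time window column is the behavior change
// the guide needs to call out.
val perFiveMinutes = perMinute
  .groupBy(window(col("window"), "5 minutes"))
  .agg(sum("cnt").as("cnt"))
{code}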
[jira] [Commented] (SPARK-42614) Make all constructors private[sql]
[ https://issues.apache.org/jira/browse/SPARK-42614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694279#comment-17694279 ] Apache Spark commented on SPARK-42614: -- User 'hvanhovell' has created a pull request for this issue: https://github.com/apache/spark/pull/40207 > Make all constructors private[sql] > -- > > Key: SPARK-42614 > URL: https://issues.apache.org/jira/browse/SPARK-42614 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Assignee: Herman van Hövell >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42614) Make all constructors private[sql]
[ https://issues.apache.org/jira/browse/SPARK-42614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42614: Assignee: Herman van Hövell (was: Apache Spark) > Make all constructors private[sql] > -- > > Key: SPARK-42614 > URL: https://issues.apache.org/jira/browse/SPARK-42614 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Assignee: Herman van Hövell >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42614) Make all constructors private[sql]
[ https://issues.apache.org/jira/browse/SPARK-42614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42614: Assignee: Apache Spark (was: Herman van Hövell) > Make all constructors private[sql] > -- > > Key: SPARK-42614 > URL: https://issues.apache.org/jira/browse/SPARK-42614 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42614) Make all constructors private[sql]
[ https://issues.apache.org/jira/browse/SPARK-42614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694278#comment-17694278 ] Apache Spark commented on SPARK-42614: -- User 'hvanhovell' has created a pull request for this issue: https://github.com/apache/spark/pull/40207 > Make all constructors private[sql] > -- > > Key: SPARK-42614 > URL: https://issues.apache.org/jira/browse/SPARK-42614 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Assignee: Herman van Hövell >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42611) Insert char/varchar length checks for inner fields during resolution
[ https://issues.apache.org/jira/browse/SPARK-42611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42611: Assignee: (was: Apache Spark) > Insert char/varchar length checks for inner fields during resolution > > > Key: SPARK-42611 > URL: https://issues.apache.org/jira/browse/SPARK-42611 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Anton Okolnychyi >Priority: Major > > In SPARK-36498, we added support for reordering inner fields in structs > during resolution. Unfortunately, we don't add any length validation for > nested char/varchar columns in that path. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42611) Insert char/varchar length checks for inner fields during resolution
[ https://issues.apache.org/jira/browse/SPARK-42611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694272#comment-17694272 ] Apache Spark commented on SPARK-42611: -- User 'aokolnychyi' has created a pull request for this issue: https://github.com/apache/spark/pull/40206 > Insert char/varchar length checks for inner fields during resolution > > > Key: SPARK-42611 > URL: https://issues.apache.org/jira/browse/SPARK-42611 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Anton Okolnychyi >Priority: Major > > In SPARK-36498, we added support for reordering inner fields in structs > during resolution. Unfortunately, we don't add any length validation for > nested char/varchar columns in that path. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42611) Insert char/varchar length checks for inner fields during resolution
[ https://issues.apache.org/jira/browse/SPARK-42611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42611: Assignee: Apache Spark > Insert char/varchar length checks for inner fields during resolution > > > Key: SPARK-42611 > URL: https://issues.apache.org/jira/browse/SPARK-42611 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Anton Okolnychyi >Assignee: Apache Spark >Priority: Major > > In SPARK-36498, we added support for reordering inner fields in structs > during resolution. Unfortunately, we don't add any length validation for > nested char/varchar columns in that path. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42611) Insert char/varchar length checks for inner fields during resolution
[ https://issues.apache.org/jira/browse/SPARK-42611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694271#comment-17694271 ] Apache Spark commented on SPARK-42611: -- User 'aokolnychyi' has created a pull request for this issue: https://github.com/apache/spark/pull/40206 > Insert char/varchar length checks for inner fields during resolution > > > Key: SPARK-42611 > URL: https://issues.apache.org/jira/browse/SPARK-42611 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Anton Okolnychyi >Priority: Major > > In SPARK-36498, we added support for reordering inner fields in structs > during resolution. Unfortunately, we don't add any length validation for > nested char/varchar columns in that path. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
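[editor's note] To illustrate the gap, a sketch assuming an active SparkSession `spark`; the table and values are made up. When inner struct fields arrive in a different order than the table schema, resolution reorders them, and the CHAR/VARCHAR length check on the nested field must be inserted as well:

{code:scala}
spark.sql("CREATE TABLE t (s STRUCT<n: INT, c: CHAR(3)>) USING parquet")
// Fields are given in a different order than the schema, triggering the
// reordering path from SPARK-36498; 'way-too-long' must still fail CHAR(3).
spark.sql("INSERT INTO t SELECT named_struct('c', 'way-too-long', 'n', 1)")
{code}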
[jira] [Commented] (SPARK-42610) Add implicit encoders to SQLImplicits
[ https://issues.apache.org/jira/browse/SPARK-42610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694265#comment-17694265 ] Apache Spark commented on SPARK-42610: -- User 'hvanhovell' has created a pull request for this issue: https://github.com/apache/spark/pull/40205 > Add implicit encoders to SQLImplicits > - > > Key: SPARK-42610 > URL: https://issues.apache.org/jira/browse/SPARK-42610 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Assignee: Herman van Hövell >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
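[editor's note] As a reminder of what SQLImplicits-style implicit encoders give the client, a sketch assuming an active SparkSession `spark`:

{code:scala}
import spark.implicits._

case class Person(name: String, age: Int)
// The product encoder for Person is summoned implicitly by toDS().
val ds = Seq(Person("a", 1), Person("b", 2)).toDS()
ds.map(p => p.age + 1).show()
{code}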
[jira] [Commented] (SPARK-42601) New physical type Decimal128 for DecimalType
[ https://issues.apache.org/jira/browse/SPARK-42601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694259#comment-17694259 ] Apache Spark commented on SPARK-42601: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/40204 > New physical type Decimal128 for DecimalType > > > Key: SPARK-42601 > URL: https://issues.apache.org/jira/browse/SPARK-42601 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: jiaan.geng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42601) New physical type Decimal128 for DecimalType
[ https://issues.apache.org/jira/browse/SPARK-42601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42601: Assignee: (was: Apache Spark) > New physical type Decimal128 for DecimalType > > > Key: SPARK-42601 > URL: https://issues.apache.org/jira/browse/SPARK-42601 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: jiaan.geng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42601) New physical type Decimal128 for DecimalType
[ https://issues.apache.org/jira/browse/SPARK-42601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42601: Assignee: Apache Spark > New physical type Decimal128 for DecimalType > > > Key: SPARK-42601 > URL: https://issues.apache.org/jira/browse/SPARK-42601 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: jiaan.geng >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42601) New physical type Decimal128 for DecimalType
[ https://issues.apache.org/jira/browse/SPARK-42601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694258#comment-17694258 ] Apache Spark commented on SPARK-42601: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/40204 > New physical type Decimal128 for DecimalType > > > Key: SPARK-42601 > URL: https://issues.apache.org/jira/browse/SPARK-42601 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: jiaan.geng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42612) Enable more parity tests related to functions
[ https://issues.apache.org/jira/browse/SPARK-42612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42612: Assignee: (was: Apache Spark) > Enable more parity tests related to functions > - > > Key: SPARK-42612 > URL: https://issues.apache.org/jira/browse/SPARK-42612 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Takuya Ueshin >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42612) Enable more parity tests related to functions
[ https://issues.apache.org/jira/browse/SPARK-42612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42612: Assignee: Apache Spark > Enable more parity tests related to functions > - > > Key: SPARK-42612 > URL: https://issues.apache.org/jira/browse/SPARK-42612 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Takuya Ueshin >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42612) Enable more parity tests related to functions
[ https://issues.apache.org/jira/browse/SPARK-42612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694244#comment-17694244 ] Apache Spark commented on SPARK-42612: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/40203 > Enable more parity tests related to functions > - > > Key: SPARK-42612 > URL: https://issues.apache.org/jira/browse/SPARK-42612 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Takuya Ueshin >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42612) Enable more parity tests related to functions
[ https://issues.apache.org/jira/browse/SPARK-42612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694245#comment-17694245 ] Apache Spark commented on SPARK-42612: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/40203 > Enable more parity tests related to functions > - > > Key: SPARK-42612 > URL: https://issues.apache.org/jira/browse/SPARK-42612 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Takuya Ueshin >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42608) Use full column names for inner fields in resolution errors
[ https://issues.apache.org/jira/browse/SPARK-42608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694242#comment-17694242 ] Apache Spark commented on SPARK-42608: -- User 'aokolnychyi' has created a pull request for this issue: https://github.com/apache/spark/pull/40202 > Use full column names for inner fields in resolution errors > --- > > Key: SPARK-42608 > URL: https://issues.apache.org/jira/browse/SPARK-42608 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Anton Okolnychyi >Priority: Major > > If there are multiple inner columns with the same name, resolution errors may > be confusing as we only use field names, not full column names. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42608) Use full column names for inner fields in resolution errors
[ https://issues.apache.org/jira/browse/SPARK-42608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42608: Assignee: Apache Spark > Use full column names for inner fields in resolution errors > --- > > Key: SPARK-42608 > URL: https://issues.apache.org/jira/browse/SPARK-42608 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Anton Okolnychyi >Assignee: Apache Spark >Priority: Major > > If there are multiple inner columns with the same name, resolution errors may > be confusing as we only use field names, not full column names. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42608) Use full column names for inner fields in resolution errors
[ https://issues.apache.org/jira/browse/SPARK-42608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42608: Assignee: (was: Apache Spark) > Use full column names for inner fields in resolution errors > --- > > Key: SPARK-42608 > URL: https://issues.apache.org/jira/browse/SPARK-42608 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Anton Okolnychyi >Priority: Major > > If there are multiple inner columns with the same name, resolution errors may > be confusing as we only use field names, not full column names. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
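[editor's note] To illustrate the confusion, a sketch assuming an active SparkSession `spark`; the tables are made up. Both structs contain an inner field `x`, so an error naming only "x" does not say which column failed, while "b.x" would:

{code:scala}
spark.sql("CREATE TABLE src (a STRUCT<x: INT>, b STRUCT<x: ARRAY<INT>>) USING parquet")
spark.sql("CREATE TABLE dst (a STRUCT<x: INT>, b STRUCT<x: INT>) USING parquet")
// Resolution fails on b.x (ARRAY<INT> cannot be stored into INT); the error
// should name b.x rather than the bare field name x.
spark.sql("INSERT INTO dst SELECT * FROM src")
{code}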
[jira] [Assigned] (SPARK-41725) Remove the workaround of sql(...).collect back in PySpark tests
[ https://issues.apache.org/jira/browse/SPARK-41725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41725: Assignee: (was: Apache Spark) > Remove the workaround of sql(...).collect back in PySpark tests > --- > > Key: SPARK-41725 > URL: https://issues.apache.org/jira/browse/SPARK-41725 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, Tests >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > > See https://github.com/apache/spark/pull/39224/files#r1057436437 > We don't have to `collect` for every `sql`, but Spark Connect requires it. We > should remove these workarounds. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41725) Remove the workaround of sql(...).collect back in PySpark tests
[ https://issues.apache.org/jira/browse/SPARK-41725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694208#comment-17694208 ] Apache Spark commented on SPARK-41725: -- User 'grundprinzip' has created a pull request for this issue: https://github.com/apache/spark/pull/40160 > Remove the workaround of sql(...).collect back in PySpark tests > --- > > Key: SPARK-41725 > URL: https://issues.apache.org/jira/browse/SPARK-41725 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, Tests >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > > See https://github.com/apache/spark/pull/39224/files#r1057436437 > We don't have to `collect` for every `sql`, but Spark Connect requires it. We > should remove these workarounds. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41725) Remove the workaround of sql(...).collect back in PySpark tests
[ https://issues.apache.org/jira/browse/SPARK-41725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41725: Assignee: Apache Spark > Remove the workaround of sql(...).collect back in PySpark tests > --- > > Key: SPARK-41725 > URL: https://issues.apache.org/jira/browse/SPARK-41725 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, Tests >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Major > > See https://github.com/apache/spark/pull/39224/files#r1057436437 > We don't have to `collect` for every `sql`, but Spark Connect requires it. We > should remove these workarounds. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42510) Implement `DataFrame.mapInPandas`
[ https://issues.apache.org/jira/browse/SPARK-42510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694162#comment-17694162 ] Apache Spark commented on SPARK-42510: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/40201 > Implement `DataFrame.mapInPandas` > - > > Key: SPARK-42510 > URL: https://issues.apache.org/jira/browse/SPARK-42510 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.4.0 > > > Implement `DataFrame.mapInPandas` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org