[jira] [Assigned] (SPARK-42396) Upgrade Apache Kafka to 3.4.0
[ https://issues.apache.org/jira/browse/SPARK-42396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-42396: - Assignee: Bjørn Jørgensen > Upgrade Apache Kafka to 3.4.0 > - > > Key: SPARK-42396 > URL: https://issues.apache.org/jira/browse/SPARK-42396 > Project: Spark > Issue Type: Dependency upgrade > Components: Build >Affects Versions: 3.5.0 >Reporter: Bjørn Jørgensen >Assignee: Bjørn Jørgensen >Priority: Major > > [CVE-2023-25194|https://www.cve.org/CVERecord?id=CVE-2023-25194] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42396) Upgrade Apache Kafka to 3.4.0
[ https://issues.apache.org/jira/browse/SPARK-42396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-42396. --- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 39969 [https://github.com/apache/spark/pull/39969] > Upgrade Apache Kafka to 3.4.0 > - > > Key: SPARK-42396 > URL: https://issues.apache.org/jira/browse/SPARK-42396 > Project: Spark > Issue Type: Dependency upgrade > Components: Build >Affects Versions: 3.5.0 >Reporter: Bjørn Jørgensen >Assignee: Bjørn Jørgensen >Priority: Major > Fix For: 3.5.0 > > > [CVE-2023-25194|https://www.cve.org/CVERecord?id=CVE-2023-25194]
[jira] [Updated] (SPARK-42339) Improve Kryo Serializer Support
[ https://issues.apache.org/jira/browse/SPARK-42339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-42339: -- Summary: Improve Kryo Serializer Support (was: Improve Kryo Serialize Support) > Improve Kryo Serializer Support > --- > > Key: SPARK-42339 > URL: https://issues.apache.org/jira/browse/SPARK-42339 > Project: Spark > Issue Type: Umbrella > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: releasenotes > Fix For: 3.4.0 > >
[jira] [Updated] (SPARK-42408) Register DoubleType to KryoSerializer
[ https://issues.apache.org/jira/browse/SPARK-42408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-42408: -- Parent: SPARK-42339 Issue Type: Sub-task (was: Improvement) > Register DoubleType to KryoSerializer > - > > Key: SPARK-42408 > URL: https://issues.apache.org/jira/browse/SPARK-42408 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 3.4.0 >Reporter: William Hyun >Assignee: William Hyun >Priority: Minor > Fix For: 3.4.0 > >
[jira] [Assigned] (SPARK-42408) Register DoubleType to KryoSerializer
[ https://issues.apache.org/jira/browse/SPARK-42408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-42408: - Assignee: William Hyun > Register DoubleType to KryoSerializer > - > > Key: SPARK-42408 > URL: https://issues.apache.org/jira/browse/SPARK-42408 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.4.0 >Reporter: William Hyun >Assignee: William Hyun >Priority: Minor > Fix For: 3.4.0 > >
[jira] [Resolved] (SPARK-42408) Register DoubleType to KryoSerializer
[ https://issues.apache.org/jira/browse/SPARK-42408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-42408. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39978 [https://github.com/apache/spark/pull/39978] > Register DoubleType to KryoSerializer > - > > Key: SPARK-42408 > URL: https://issues.apache.org/jira/browse/SPARK-42408 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.4.0 >Reporter: William Hyun >Priority: Minor > Fix For: 3.4.0 > >
[jira] [Assigned] (SPARK-42327) Assign name to _LEGACY_ERROR_TEMP_2177
[ https://issues.apache.org/jira/browse/SPARK-42327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42327: Assignee: Apache Spark > Assign name to _LEGACY_ERROR_TEMP_2177 > - > > Key: SPARK-42327 > URL: https://issues.apache.org/jira/browse/SPARK-42327 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Apache Spark >Priority: Major >
[jira] [Commented] (SPARK-42327) Assign name to _LEGACY_ERROR_TEMP_2177
[ https://issues.apache.org/jira/browse/SPARK-42327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17687519#comment-17687519 ] Apache Spark commented on SPARK-42327: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/39980 > Assign name to _LEGACY_ERROR_TEMP_2177 > - > > Key: SPARK-42327 > URL: https://issues.apache.org/jira/browse/SPARK-42327 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major >
[jira] [Assigned] (SPARK-42327) Assign name to _LEGACY_ERROR_TEMP_2177
[ https://issues.apache.org/jira/browse/SPARK-42327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42327: Assignee: (was: Apache Spark) > Assign name to _LEGACY_ERROR_TEMP_2177 > - > > Key: SPARK-42327 > URL: https://issues.apache.org/jira/browse/SPARK-42327 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major >
[jira] [Assigned] (SPARK-42326) Assign name to _LEGACY_ERROR_TEMP_2099
[ https://issues.apache.org/jira/browse/SPARK-42326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42326: Assignee: (was: Apache Spark) > Assign name to _LEGACY_ERROR_TEMP_2099 > -- > > Key: SPARK-42326 > URL: https://issues.apache.org/jira/browse/SPARK-42326 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major >
[jira] [Assigned] (SPARK-42326) Assign name to _LEGACY_ERROR_TEMP_2099
[ https://issues.apache.org/jira/browse/SPARK-42326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42326: Assignee: Apache Spark > Assign name to _LEGACY_ERROR_TEMP_2099 > -- > > Key: SPARK-42326 > URL: https://issues.apache.org/jira/browse/SPARK-42326 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Apache Spark >Priority: Major >
[jira] [Commented] (SPARK-42326) Assign name to _LEGACY_ERROR_TEMP_2099
[ https://issues.apache.org/jira/browse/SPARK-42326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17687517#comment-17687517 ] Apache Spark commented on SPARK-42326: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/39979 > Assign name to _LEGACY_ERROR_TEMP_2099 > -- > > Key: SPARK-42326 > URL: https://issues.apache.org/jira/browse/SPARK-42326 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major >
[jira] [Assigned] (SPARK-42408) Register DoubleType to KryoSerializer
[ https://issues.apache.org/jira/browse/SPARK-42408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42408: Assignee: (was: Apache Spark) > Register DoubleType to KryoSerializer > - > > Key: SPARK-42408 > URL: https://issues.apache.org/jira/browse/SPARK-42408 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.4.0 >Reporter: William Hyun >Priority: Minor >
[jira] [Assigned] (SPARK-42408) Register DoubleType to KryoSerializer
[ https://issues.apache.org/jira/browse/SPARK-42408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42408: Assignee: Apache Spark > Register DoubleType to KryoSerializer > - > > Key: SPARK-42408 > URL: https://issues.apache.org/jira/browse/SPARK-42408 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.4.0 >Reporter: William Hyun >Assignee: Apache Spark >Priority: Minor >
[jira] [Commented] (SPARK-42408) Register DoubleType to KryoSerializer
[ https://issues.apache.org/jira/browse/SPARK-42408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17687516#comment-17687516 ] Apache Spark commented on SPARK-42408: -- User 'williamhyun' has created a pull request for this issue: https://github.com/apache/spark/pull/39978 > Register DoubleType to KryoSerializer > - > > Key: SPARK-42408 > URL: https://issues.apache.org/jira/browse/SPARK-42408 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.4.0 >Reporter: William Hyun >Priority: Minor >
[jira] [Resolved] (SPARK-42377) Test Framework for Connect Scala Client
[ https://issues.apache.org/jira/browse/SPARK-42377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hövell resolved SPARK-42377. --- Fix Version/s: 3.4.0 Assignee: Herman van Hövell Resolution: Fixed > Test Framework for Connect Scala Client > --- > > Key: SPARK-42377 > URL: https://issues.apache.org/jira/browse/SPARK-42377 > Project: Spark > Issue Type: Task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Assignee: Herman van Hövell >Priority: Major > Fix For: 3.4.0 > >
[jira] [Created] (SPARK-42408) Register DoubleType to KryoSerializer
William Hyun created SPARK-42408: Summary: Register DoubleType to KryoSerializer Key: SPARK-42408 URL: https://issues.apache.org/jira/browse/SPARK-42408 Project: Spark Issue Type: Improvement Components: Spark Core, SQL Affects Versions: 3.4.0 Reporter: William Hyun
[jira] [Resolved] (SPARK-42390) Upgrade buf from 1.13.1 to 1.14.0
[ https://issues.apache.org/jira/browse/SPARK-42390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-42390. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39959 [https://github.com/apache/spark/pull/39959] > Upgrade buf from 1.13.1 to 1.14.0 > - > > Key: SPARK-42390 > URL: https://issues.apache.org/jira/browse/SPARK-42390 > Project: Spark > Issue Type: Improvement > Components: Build, Connect >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Fix For: 3.4.0 > >
[jira] [Assigned] (SPARK-42390) Upgrade buf from 1.13.1 to 1.14.0
[ https://issues.apache.org/jira/browse/SPARK-42390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-42390: Assignee: BingKun Pan > Upgrade buf from 1.13.1 to 1.14.0 > - > > Key: SPARK-42390 > URL: https://issues.apache.org/jira/browse/SPARK-42390 > Project: Spark > Issue Type: Improvement > Components: Build, Connect >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor >
[jira] [Resolved] (SPARK-42310) Assign name to _LEGACY_ERROR_TEMP_1289
[ https://issues.apache.org/jira/browse/SPARK-42310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-42310. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39946 [https://github.com/apache/spark/pull/39946] > Assign name to _LEGACY_ERROR_TEMP_1289 > -- > > Key: SPARK-42310 > URL: https://issues.apache.org/jira/browse/SPARK-42310 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 3.4.0 > >
[jira] [Assigned] (SPARK-42310) Assign name to _LEGACY_ERROR_TEMP_1289
[ https://issues.apache.org/jira/browse/SPARK-42310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-42310: Assignee: Haejoon Lee > Assign name to _LEGACY_ERROR_TEMP_1289 > -- > > Key: SPARK-42310 > URL: https://issues.apache.org/jira/browse/SPARK-42310 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major >
[jira] [Assigned] (SPARK-42402) Support parameterized SQL by sql()
[ https://issues.apache.org/jira/browse/SPARK-42402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-42402: Assignee: Takuya Ueshin > Support parameterized SQL by sql() > -- > > Key: SPARK-42402 > URL: https://issues.apache.org/jira/browse/SPARK-42402 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin >Priority: Major >
[jira] [Resolved] (SPARK-42402) Support parameterized SQL by sql()
[ https://issues.apache.org/jira/browse/SPARK-42402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-42402. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39971 [https://github.com/apache/spark/pull/39971] > Support parameterized SQL by sql() > -- > > Key: SPARK-42402 > URL: https://issues.apache.org/jira/browse/SPARK-42402 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin >Priority: Major > Fix For: 3.4.0 > >
[jira] [Updated] (SPARK-42407) `with as` executed again
[ https://issues.apache.org/jira/browse/SPARK-42407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yiku123 updated SPARK-42407: Summary: `with as` executed again (was: with as ) > `with as` executed again > > > Key: SPARK-42407 > URL: https://issues.apache.org/jira/browse/SPARK-42407 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.3 >Reporter: yiku123 >Priority: Critical > > When a 'with as' subquery is referenced multiple times, it is executed again on > each reference rather than its result being saved, resulting in low efficiency. > Will you consider improving this behavior of 'with as'? >
[jira] [Created] (SPARK-42407) with as
yiku123 created SPARK-42407: --- Summary: with as Key: SPARK-42407 URL: https://issues.apache.org/jira/browse/SPARK-42407 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.3 Reporter: yiku123 When a 'with as' subquery is referenced multiple times, it is executed again on each reference rather than its result being saved, resulting in low efficiency. Will you consider improving this behavior of 'with as'?
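The SPARK-42407 report concerns a CTE ('with as') referenced more than once in a query. The sketch below shows the query shape in question; it uses the stdlib sqlite3 module purely so it runs anywhere, and the table and column names are made up for illustration. In Spark SQL, each of the two references to the CTE re-runs its body; the usual workaround is to compute the intermediate result as a DataFrame and call .cache() on it before reusing it.

```python
import sqlite3

# Toy illustration of a CTE referenced twice (hypothetical table/columns).
# In Spark SQL, both references to `expensive` would recompute its body.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?)", [(1, 10), (1, 20), (2, 5)])

query = """
WITH expensive AS (
    SELECT user_id, SUM(amount) AS total
    FROM events
    GROUP BY user_id
)
SELECT a.user_id, a.total
FROM expensive a
JOIN expensive b ON a.total >= b.total  -- second reference: re-executed in Spark
GROUP BY a.user_id, a.total
"""
rows = conn.execute(query).fetchall()
```

Note that SQLite's planner is not Spark's; the snippet only demonstrates the query shape, not Spark's recomputation behavior itself.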
[jira] [Commented] (SPARK-42406) [PROTOBUF] Recursive field handling is incompatible with delta
[ https://issues.apache.org/jira/browse/SPARK-42406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17687489#comment-17687489 ] Raghu Angadi commented on SPARK-42406: -- cc: [~sanysand...@gmail.com] PTAL.
> [PROTOBUF] Recursive field handling is incompatible with delta
> -- > > Key: SPARK-42406 > URL: https://issues.apache.org/jira/browse/SPARK-42406 > Project: Spark > Issue Type: Bug > Components: Protobuf >Affects Versions: 3.4.0 >Reporter: Raghu Angadi >Priority: Major > Fix For: 3.4.1 > >
> The Protobuf deserializer (the `from_protobuf()` function) optionally supports
> recursive fields by limiting the depth to a certain level. See the example below.
> It assigns a 'NullType' to such a field when the allowed depth is reached.
> This causes a few issues. E.g. a repeated field, as in the following example,
> results in an Array field with 'NullType'. Delta does not support null type in
> a complex type, and `Array[NullType]` is not really useful anyway.
> Proposed fix: drop the recursive field when the limit is reached rather than
> using a NullType. The example below makes it clear.
> Consider a recursive Protobuf:
> {code}
> message TreeNode {
>   string value = 1;
>   repeated TreeNode children = 2;
> }
> {code}
> Allow a depth of 2:
> {code:python}
> df.select(
>     from_protobuf('proto',
>                   messageName = 'TreeNode',
>                   options = { ... "recursive.fields.max.depth" : "2" })
> ).printSchema()
> {code}
> The schema looks like this:
> {noformat}
> root
>  |-- from_protobuf(proto): struct (nullable = true)
>  |    |-- value: string (nullable = true)
>  |    |-- children: array (nullable = false)
>  |    |    |-- element: struct (containsNull = false)
>  |    |    |    |-- value: string (nullable = true)
>  |    |    |    |-- children: array (nullable = false)
>  |    |    |    |    |-- element: struct (containsNull = false)
>  |    |    |    |    |    |-- value: string (nullable = true)
>  |    |    |    |    |    |-- children: array (nullable = false)   [ === Proposed fix: Drop this field === ]
>  |    |    |    |    |    |    |-- element: void (containsNull = false)   [ === NOTICE 'void' HERE === ]
> {noformat}
> When we try to write this to a delta table, we get an error:
> {noformat}
> AnalysisException: Found nested NullType in column
> from_protobuf(proto).children which is of ArrayType. Delta doesn't support
> writing NullType in complex types.
> {noformat}
> We could just drop the field 'element' when the recursion depth is reached. It
> is simpler and does not need to deal with NullType. We are ignoring the value
> anyway. There is no use in keeping the field.
[jira] [Updated] (SPARK-42406) [PROTOBUF] Recursive field handling is incompatible with delta
[ https://issues.apache.org/jira/browse/SPARK-42406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raghu Angadi updated SPARK-42406: - Description:
The Protobuf deserializer (the `from_protobuf()` function) optionally supports recursive fields by limiting the depth to a certain level. See the example below. It assigns a 'NullType' to such a field when the allowed depth is reached. This causes a few issues. E.g. a repeated field, as in the following example, results in an Array field with 'NullType'. Delta does not support null type in a complex type, and `Array[NullType]` is not really useful anyway. Proposed fix: drop the recursive field when the limit is reached rather than using a NullType. The example below makes it clear. Consider a recursive Protobuf:
{code}
message TreeNode {
  string value = 1;
  repeated TreeNode children = 2;
}
{code}
Allow a depth of 2:
{code:python}
df.select(
    from_protobuf('proto',
                  messageName = 'TreeNode',
                  options = { ... "recursive.fields.max.depth" : "2" })
).printSchema()
{code}
The schema looks like this:
{noformat}
root
 |-- from_protobuf(proto): struct (nullable = true)
 |    |-- value: string (nullable = true)
 |    |-- children: array (nullable = false)
 |    |    |-- element: struct (containsNull = false)
 |    |    |    |-- value: string (nullable = true)
 |    |    |    |-- children: array (nullable = false)
 |    |    |    |    |-- element: struct (containsNull = false)
 |    |    |    |    |    |-- value: string (nullable = true)
 |    |    |    |    |    |-- children: array (nullable = false)   [ === Proposed fix: Drop this field === ]
 |    |    |    |    |    |    |-- element: void (containsNull = false)   [ === NOTICE 'void' HERE === ]
{noformat}
When we try to write this to a delta table, we get an error:
{noformat}
AnalysisException: Found nested NullType in column from_protobuf(proto).children which is of ArrayType. Delta doesn't support writing NullType in complex types.
{noformat}
We could just drop the field 'element' when the recursion depth is reached. It is simpler and does not need to deal with NullType.
We are ignoring the value anyway. There is no use in keeping the field. was: Protobuf deserializer (`from_protobuf()` function()) optionally supports recursive fields by limiting the depth to certain level. See example below. It assigns a 'NullType' for such a field when allowed depth is reached. It causes a few issues. E.g. a repeated field as in the following example results in a Array field with 'NullType'. Delta does not support null type in a complex type. Actually `Array[NullType]` is not really useful anyway. How about this fix: Drop the recursive field when the limit reached rather than using a NullType. The example below makes it clear: Consider a recursive Protobuf: ``` message TreeNode { string value = 1; repeated TreeNode children = 2; } ``` Allow depth of 2: ```python df.select( 'proto', messageName = 'TreeNode', options = { ... "recursive.fields.max.depth" : "2" } ).printSchema() ``` Schema looks like this: ``` root |-- from_protobuf(proto): struct (nullable = true) ||-- value: string (nullable = true) ||-- children: array (nullable = false) |||-- element: struct (containsNull = false) ||||-- value: string (nullable = true) ||||-- children: array (nullable = false) |||||-- element: struct (containsNull = false) ||||||-- value: string (nullable = true) ||||||-- children: array (nullable = false). [ === Proposed fix: Drop this field === ] |||||||-- element: void (containsNull = false) [ === NOTICE 'void' HERE === ] ``` When we try to write this to a delta table, we get an error: ``` AnalysisException: Found nested NullType in column from_protobuf(proto).children which is of ArrayType. Delta doesn't support writing NullType in complex types. ``` We could just drop the field 'element' when recursion depth is reached. It is simpler and does not need to deal with NullType. We are ignoring the value anyway. There is no use in keeping the field. 
> [PROTOBUF] Recursive field handling is incompatible with delta > -- > > Key: SPARK-42406 > URL: https://issues.apache.org/jira/browse/SPARK-42406 > Project: Spark > Issue Type: Bug > Components: Protobuf >Affects Versions: 3.4.0 >Reporter: Raghu Angadi >Priority: Major > Fix For: 3.4.1 > > > Protobuf deserializer (`from_protobuf()` function()) optionally supports > recursive fields by limiting the depth to certain level. See example below. > It assigns a 'NullType' for such a field when allowed depth is reached. > It causes a few issues. E.g. a repeated field as in the following example > results in a Array field with 'NullType'. Delta does not support null
[jira] [Created] (SPARK-42406) [PROTOBUF] Recursive field handling is incompatible with delta
Raghu Angadi created SPARK-42406: Summary: [PROTOBUF] Recursive field handling is incompatible with delta Key: SPARK-42406 URL: https://issues.apache.org/jira/browse/SPARK-42406 Project: Spark Issue Type: Bug Components: Protobuf Affects Versions: 3.4.0 Reporter: Raghu Angadi Fix For: 3.4.1
The Protobuf deserializer (the `from_protobuf()` function) optionally supports recursive fields by limiting the depth to a certain level. See the example below. It assigns a 'NullType' to such a field when the allowed depth is reached. This causes a few issues. E.g. a repeated field, as in the following example, results in an Array field with 'NullType'. Delta does not support null type in a complex type, and `Array[NullType]` is not really useful anyway. Proposed fix: drop the recursive field when the limit is reached rather than using a NullType. The example below makes it clear. Consider a recursive Protobuf:
```
message TreeNode {
  string value = 1;
  repeated TreeNode children = 2;
}
```
Allow a depth of 2:
```python
df.select(
    from_protobuf('proto',
                  messageName = 'TreeNode',
                  options = { ... "recursive.fields.max.depth" : "2" })
).printSchema()
```
Schema looks like this:
```
root
 |-- from_protobuf(proto): struct (nullable = true)
 |    |-- value: string (nullable = true)
 |    |-- children: array (nullable = false)
 |    |    |-- element: struct (containsNull = false)
 |    |    |    |-- value: string (nullable = true)
 |    |    |    |-- children: array (nullable = false)
 |    |    |    |    |-- element: struct (containsNull = false)
 |    |    |    |    |    |-- value: string (nullable = true)
 |    |    |    |    |    |-- children: array (nullable = false)   [ === Proposed fix: Drop this field === ]
 |    |    |    |    |    |    |-- element: void (containsNull = false)   [ === NOTICE 'void' HERE === ]
```
When we try to write this to a delta table, we get an error:
```
AnalysisException: Found nested NullType in column from_protobuf(proto).children which is of ArrayType. Delta doesn't support writing NullType in complex types.
```
We could just drop the field 'element' when the recursion depth is reached. It is simpler and does not need to deal with NullType. We are ignoring the value anyway. There is no use in keeping the field.
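The fix proposed in SPARK-42406 can be sketched in isolation. The toy function below is a hypothetical stand-in for the real Protobuf-to-Catalyst schema converter: a plain dict plays the role of the schema, and the field names mirror the TreeNode example above. It shows the proposed behavior, where the recursive field is omitted at the depth limit instead of being emitted as a NullType placeholder, so no Array[NullType] ever reaches a writer such as Delta.

```python
# Hypothetical sketch of the proposed fix for recursive Protobuf schemas.
# A plain dict stands in for the converted schema.

def tree_node_schema(max_depth: int) -> dict:
    """Build the schema for `TreeNode`, recursing at most `max_depth` levels."""
    fields = {"value": "string"}
    if max_depth > 1:
        # Still under the limit: recurse into the repeated `children` field.
        fields["children"] = {"array_of": tree_node_schema(max_depth - 1)}
    # At the limit, `children` is simply omitted -- no NullType placeholder.
    return fields

schema = tree_node_schema(2)
```

With `max_depth=2`, the innermost struct contains only `value`; the `children` field is dropped entirely rather than typed as void.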
[jira] [Assigned] (SPARK-42034) QueryExecutionListener and Observation API, df.observe do not work with `foreach` action.
[ https://issues.apache.org/jira/browse/SPARK-42034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42034: Assignee: Apache Spark > QueryExecutionListener and Observation API, df.observe do not work with > `foreach` action. > - > > Key: SPARK-42034 > URL: https://issues.apache.org/jira/browse/SPARK-42034 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.3, 3.2.2, 3.3.1 > Environment: I tested it locally and on YARN in cluster mode. > Spark 3.3.1, 3.2.2 and 3.1.1. > Yarn 2.9.2 and 3.2.1. >Reporter: Nick Hryhoriev >Assignee: Apache Spark >Priority: Major > Labels: sql-api > > The Observation API, the {{observe}} dataframe transformation, and custom > QueryExecutionListeners do not work with {{foreach}} or {{foreachPartition}} actions. > This is because QueryExecutionListener callbacks are not triggered for queries > whose action is {{foreach}} or {{foreachPartition}}. > The Spark UI SQL tab, however, still shows such a query as a SQL query, including > its query plans. > Here is the code to reproduce it: > https://gist.github.com/GrigorievNick/e7cf9ec5584b417d9719e2812722e6d3
[jira] [Commented] (SPARK-42034) QueryExecutionListener and Observation API, df.observe do not work with `foreach` action.
[ https://issues.apache.org/jira/browse/SPARK-42034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17687465#comment-17687465 ] Apache Spark commented on SPARK-42034: -- User 'ming95' has created a pull request for this issue: https://github.com/apache/spark/pull/39976 > QueryExecutionListener and Observation API, df.observe do not work with > `foreach` action. > - > > Key: SPARK-42034 > URL: https://issues.apache.org/jira/browse/SPARK-42034 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.3, 3.2.2, 3.3.1 > Environment: I tested it locally and on YARN in cluster mode. > Spark 3.3.1, 3.2.2 and 3.1.1. > Yarn 2.9.2 and 3.2.1. >Reporter: Nick Hryhoriev >Priority: Major > Labels: sql-api > > The Observation API, the {{observe}} dataframe transformation, and custom > QueryExecutionListeners do not work with {{foreach}} or {{foreachPartition}} actions. > This is because QueryExecutionListener callbacks are not triggered for queries > whose action is {{foreach}} or {{foreachPartition}}. > The Spark UI SQL tab, however, still shows such a query as a SQL query, including > its query plans. > Here is the code to reproduce it: > https://gist.github.com/GrigorievNick/e7cf9ec5584b417d9719e2812722e6d3
[jira] [Assigned] (SPARK-42034) QueryExecutionListener and Observation API, df.observe do not work with `foreach` action.
[ https://issues.apache.org/jira/browse/SPARK-42034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42034: Assignee: (was: Apache Spark) > QueryExecutionListener and Observation API, df.observe do not work with > `foreach` action. > - > > Key: SPARK-42034 > URL: https://issues.apache.org/jira/browse/SPARK-42034 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.3, 3.2.2, 3.3.1 > Environment: I tested it locally and on YARN in cluster mode. > Spark 3.3.1, 3.2.2 and 3.1.1. > Yarn 2.9.2 and 3.2.1. >Reporter: Nick Hryhoriev >Priority: Major > Labels: sql-api > > The Observation API, the {{observe}} dataframe transformation, and custom > QueryExecutionListeners do not work with {{foreach}} or {{foreachPartition}} actions. > This is because QueryExecutionListener callbacks are not triggered for queries > whose action is {{foreach}} or {{foreachPartition}}. > The Spark UI SQL tab, however, still shows such a query as a SQL query, including > its query plans. > Here is the code to reproduce it: > https://gist.github.com/GrigorievNick/e7cf9ec5584b417d9719e2812722e6d3
[jira] [Created] (SPARK-42405) Better documentation of array_insert function
Daniel Davies created SPARK-42405: - Summary: Better documentation of array_insert function Key: SPARK-42405 URL: https://issues.apache.org/jira/browse/SPARK-42405 Project: Spark Issue Type: Documentation Components: SQL Affects Versions: 3.4.0 Reporter: Daniel Davies See the following thread for discussion: https://github.com/apache/spark/pull/38867#discussion_r1097054656 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42235) Missing typing for pandas_udf
[ https://issues.apache.org/jira/browse/SPARK-42235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42235: Assignee: (was: Apache Spark) > Missing typing for pandas_udf > - > > Key: SPARK-42235 > URL: https://issues.apache.org/jira/browse/SPARK-42235 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Donggu Kang >Priority: Minor > > The typing stub {{site-packages/pyspark/sql/pandas/functions.pyi}} has a list > of possible signatures of {{pandas_udf}}. It is missing a case: > {code:python} > import pyspark.sql.functions as F > # PySpark3's typing stub error > @F.pandas_udf( > returnType=LongType() > ) > def my_udf(dummy_col: pd.Series) -> pd.Series: > ... > {code} > The stub defines {{pandas_udf(f, returnType, functionType)}} but not > {{pandas_udf(f, returnType)}}. The official documentation recommends using a > return type hint instead of {{returnType}}. > > {code:python} > # defined > @overload > def pandas_udf( f: PandasScalarToScalarFunction, returnType: > Union[AtomicDataTypeOrString, ArrayType], functionType: PandasScalarUDFType, > ) -> UserDefinedFunctionLike: ... > > # not defined > @overload > def pandas_udf( f: PandasScalarToScalarFunction, returnType: > Union[AtomicDataTypeOrString, ArrayType]) -> UserDefinedFunctionLike: ... > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42235) Missing typing for pandas_udf
[ https://issues.apache.org/jira/browse/SPARK-42235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42235: Assignee: Apache Spark > Missing typing for pandas_udf > - > > Key: SPARK-42235 > URL: https://issues.apache.org/jira/browse/SPARK-42235 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Donggu Kang >Assignee: Apache Spark >Priority: Minor > > The typing stub {{site-packages/pyspark/sql/pandas/functions.pyi}} has a list > of possible signatures of {{pandas_udf}}. It is missing a case: > {code:python} > import pyspark.sql.functions as F > # PySpark3's typing stub error > @F.pandas_udf( > returnType=LongType() > ) > def my_udf(dummy_col: pd.Series) -> pd.Series: > ... > {code} > The stub defines {{pandas_udf(f, returnType, functionType)}} but not > {{pandas_udf(f, returnType)}}. The official documentation recommends using a > return type hint instead of {{returnType}}. > > {code:python} > # defined > @overload > def pandas_udf( f: PandasScalarToScalarFunction, returnType: > Union[AtomicDataTypeOrString, ArrayType], functionType: PandasScalarUDFType, > ) -> UserDefinedFunctionLike: ... > > # not defined > @overload > def pandas_udf( f: PandasScalarToScalarFunction, returnType: > Union[AtomicDataTypeOrString, ArrayType]) -> UserDefinedFunctionLike: ... > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42235) Missing typing for pandas_udf
[ https://issues.apache.org/jira/browse/SPARK-42235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17687447#comment-17687447 ] Apache Spark commented on SPARK-42235: -- User 'wayneguow' has created a pull request for this issue: https://github.com/apache/spark/pull/39974 > Missing typing for pandas_udf > - > > Key: SPARK-42235 > URL: https://issues.apache.org/jira/browse/SPARK-42235 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Donggu Kang >Priority: Minor > > The typing stub {{site-packages/pyspark/sql/pandas/functions.pyi}} has a list > of possible signatures of {{pandas_udf}}. It is missing a case: > {code:python} > import pyspark.sql.functions as F > # PySpark3's typing stub error > @F.pandas_udf( > returnType=LongType() > ) > def my_udf(dummy_col: pd.Series) -> pd.Series: > ... > {code} > The stub defines {{pandas_udf(f, returnType, functionType)}} but not > {{pandas_udf(f, returnType)}}. The official documentation recommends using a > return type hint instead of {{returnType}}. > > {code:python} > # defined > @overload > def pandas_udf( f: PandasScalarToScalarFunction, returnType: > Union[AtomicDataTypeOrString, ArrayType], functionType: PandasScalarUDFType, > ) -> UserDefinedFunctionLike: ... > > # not defined > @overload > def pandas_udf( f: PandasScalarToScalarFunction, returnType: > Union[AtomicDataTypeOrString, ArrayType]) -> UserDefinedFunctionLike: ... > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
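A sketch of the two-argument overload the stub would need, using {{typing.overload}}. The type aliases below are hypothetical stand-ins for the real types in {{functions.pyi}}, and the runtime body is a trivial placeholder, since a stub's overloads carry no behaviour:

```python
# Sketch of the missing pandas_udf(f, returnType) overload (hypothetical
# simplified aliases; the real stub uses pyspark-specific types).

from typing import Any, Callable, Union, overload

AtomicDataTypeOrString = str
ArrayType = list
PandasScalarUDFType = int
PandasScalarToScalarFunction = Callable[..., Any]
UserDefinedFunctionLike = Callable[..., Any]


@overload
def pandas_udf(
    f: PandasScalarToScalarFunction,
    returnType: Union[AtomicDataTypeOrString, ArrayType],
    functionType: PandasScalarUDFType,
) -> UserDefinedFunctionLike: ...


@overload  # the overload the reporter says is missing
def pandas_udf(
    f: PandasScalarToScalarFunction,
    returnType: Union[AtomicDataTypeOrString, ArrayType],
) -> UserDefinedFunctionLike: ...


def pandas_udf(f, returnType, functionType=None):
    # Placeholder runtime body: the overloads above only matter to a
    # static type checker, so just return the function unchanged.
    return f
```

With both overloads declared, a static checker accepts `pandas_udf(my_func, "long")` without a `functionType` argument, which matches the usage the documentation recommends.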
[jira] [Assigned] (SPARK-39142) Type overloads in `pandas_udf`
[ https://issues.apache.org/jira/browse/SPARK-39142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39142: Assignee: (was: Apache Spark) > Type overloads in `pandas_udf` > --- > > Key: SPARK-39142 > URL: https://issues.apache.org/jira/browse/SPARK-39142 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.2.1 >Reporter: Philip Kahn >Priority: Minor > Original Estimate: 1h > Remaining Estimate: 1h > > It seems that `returnType` in the type overloads for `pandas_udf` is never > given a generic for PySpark SQL types, nor are those types explicitly listed: > > [https://github.com/apache/spark/blob/f84018a4810867afa84658fec76494aaae6d57fc/python/pyspark/sql/pandas/functions.pyi] > > This results in static type checkers flagging the type of the decorated > functions (and their parameters) as incorrect; see > [https://github.com/microsoft/pylance-release/issues/2789] as an example. > > For someone familiar with the code base, this should be a very fast patch. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39142) Type overloads in `pandas_udf`
[ https://issues.apache.org/jira/browse/SPARK-39142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39142: Assignee: Apache Spark > Type overloads in `pandas_udf` > --- > > Key: SPARK-39142 > URL: https://issues.apache.org/jira/browse/SPARK-39142 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.2.1 >Reporter: Philip Kahn >Assignee: Apache Spark >Priority: Minor > Original Estimate: 1h > Remaining Estimate: 1h > > It seems that `returnType` in the type overloads for `pandas_udf` is never > given a generic for PySpark SQL types, nor are those types explicitly listed: > > [https://github.com/apache/spark/blob/f84018a4810867afa84658fec76494aaae6d57fc/python/pyspark/sql/pandas/functions.pyi] > > This results in static type checkers flagging the type of the decorated > functions (and their parameters) as incorrect; see > [https://github.com/microsoft/pylance-release/issues/2789] as an example. > > For someone familiar with the code base, this should be a very fast patch. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39142) Type overloads in `pandas_udf`
[ https://issues.apache.org/jira/browse/SPARK-39142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17687445#comment-17687445 ] Apache Spark commented on SPARK-39142: -- User 'wayneguow' has created a pull request for this issue: https://github.com/apache/spark/pull/39974 > Type overloads in `pandas_udf` > --- > > Key: SPARK-39142 > URL: https://issues.apache.org/jira/browse/SPARK-39142 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.2.1 >Reporter: Philip Kahn >Priority: Minor > Original Estimate: 1h > Remaining Estimate: 1h > > It seems that `returnType` in the type overloads for `pandas_udf` is never > given a generic for PySpark SQL types, nor are those types explicitly listed: > > [https://github.com/apache/spark/blob/f84018a4810867afa84658fec76494aaae6d57fc/python/pyspark/sql/pandas/functions.pyi] > > This results in static type checkers flagging the type of the decorated > functions (and their parameters) as incorrect; see > [https://github.com/microsoft/pylance-release/issues/2789] as an example. > > For someone familiar with the code base, this should be a very fast patch. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org