date:20221227

[jira] [Commented] (SPARK-41579) Assign name to _LEGACY_ERROR_TEMP_1249

2022-12-27 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-41579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652407#comment-17652407
 ] 

Apache Spark commented on SPARK-41579:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/39260

> Assign name to _LEGACY_ERROR_TEMP_1249
> --
>
> Key: SPARK-41579
> URL: https://issues.apache.org/jira/browse/SPARK-41579
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We should use a proper error class name rather than `_LEGACY_ERROR_TEMP_xxx`.
>  
> *NOTE:* Please reply to this ticket before you start working on it, to avoid 
> several people working on the same ticket at the same time.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Assigned] (SPARK-41579) Assign name to _LEGACY_ERROR_TEMP_1249

2022-12-27 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-41579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41579:


Assignee: (was: Apache Spark)

> Assign name to _LEGACY_ERROR_TEMP_1249
> --
>
> Key: SPARK-41579
> URL: https://issues.apache.org/jira/browse/SPARK-41579
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We should use a proper error class name rather than `_LEGACY_ERROR_TEMP_xxx`.
>  
> *NOTE:* Please reply to this ticket before you start working on it, to avoid 
> several people working on the same ticket at the same time.




[jira] [Assigned] (SPARK-41579) Assign name to _LEGACY_ERROR_TEMP_1249

2022-12-27 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-41579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41579:


Assignee: Apache Spark

> Assign name to _LEGACY_ERROR_TEMP_1249
> --
>
> Key: SPARK-41579
> URL: https://issues.apache.org/jira/browse/SPARK-41579
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Apache Spark
>Priority: Major
>
> We should use a proper error class name rather than `_LEGACY_ERROR_TEMP_xxx`.
>  
> *NOTE:* Please reply to this ticket before you start working on it, to avoid 
> several people working on the same ticket at the same time.




[jira] [Assigned] (SPARK-41739) CheckRule should not be executed when analyze view child

2022-12-27 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-41739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41739:


Assignee: (was: Apache Spark)

> CheckRule should not be executed when analyze view child
> 
>
> Key: SPARK-41739
> URL: https://issues.apache.org/jira/browse/SPARK-41739
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.1
>Reporter: Yi Zhu
>Priority: Major
>
> Currently, analyzing a view runs analysis on the view's child, which also 
> executes the check rules. This is not correct.




[jira] [Commented] (SPARK-41739) CheckRule should not be executed when analyze view child

2022-12-27 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-41739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652402#comment-17652402
 ] 

Apache Spark commented on SPARK-41739:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/39259

> CheckRule should not be executed when analyze view child
> 
>
> Key: SPARK-41739
> URL: https://issues.apache.org/jira/browse/SPARK-41739
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.1
>Reporter: Yi Zhu
>Priority: Major
>
> Currently, analyzing a view runs analysis on the view's child, which also 
> executes the check rules. This is not correct.




[jira] [Assigned] (SPARK-41739) CheckRule should not be executed when analyze view child

2022-12-27 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-41739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41739:


Assignee: Apache Spark

> CheckRule should not be executed when analyze view child
> 
>
> Key: SPARK-41739
> URL: https://issues.apache.org/jira/browse/SPARK-41739
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.1
>Reporter: Yi Zhu
>Assignee: Apache Spark
>Priority: Major
>
> Currently, analyzing a view runs analysis on the view's child, which also 
> executes the check rules. This is not correct.




[jira] [Created] (SPARK-41739) CheckRule should not be executed when analyze view child

2022-12-27 Thread Yi Zhu (Jira)

Yi Zhu created SPARK-41739:
--

 Summary: CheckRule should not be executed when analyze view child
 Key: SPARK-41739
 URL: https://issues.apache.org/jira/browse/SPARK-41739
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.3.1
Reporter: Yi Zhu


Currently, analyzing a view runs analysis on the view's child, which also 
executes the check rules. This is not correct.




[jira] [Assigned] (SPARK-41572) Assign name to _LEGACY_ERROR_TEMP_2149

2022-12-27 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-41572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41572:


Assignee: (was: Apache Spark)

> Assign name to _LEGACY_ERROR_TEMP_2149
> --
>
> Key: SPARK-41572
> URL: https://issues.apache.org/jira/browse/SPARK-41572
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We should use a proper error class name rather than `_LEGACY_ERROR_TEMP_xxx`.
>  
> *NOTE:* Please reply to this ticket before you start working on it, to avoid 
> several people working on the same ticket at the same time.




[jira] [Assigned] (SPARK-41572) Assign name to _LEGACY_ERROR_TEMP_2149

2022-12-27 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-41572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41572:


Assignee: Apache Spark

> Assign name to _LEGACY_ERROR_TEMP_2149
> --
>
> Key: SPARK-41572
> URL: https://issues.apache.org/jira/browse/SPARK-41572
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Apache Spark
>Priority: Major
>
> We should use a proper error class name rather than `_LEGACY_ERROR_TEMP_xxx`.
>  
> *NOTE:* Please reply to this ticket before you start working on it, to avoid 
> several people working on the same ticket at the same time.




[jira] [Commented] (SPARK-41572) Assign name to _LEGACY_ERROR_TEMP_2149

2022-12-27 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-41572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652401#comment-17652401
 ] 

Apache Spark commented on SPARK-41572:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/39258

> Assign name to _LEGACY_ERROR_TEMP_2149
> --
>
> Key: SPARK-41572
> URL: https://issues.apache.org/jira/browse/SPARK-41572
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We should use a proper error class name rather than `_LEGACY_ERROR_TEMP_xxx`.
>  
> *NOTE:* Please reply to this ticket before you start working on it, to avoid 
> several people working on the same ticket at the same time.




[jira] [Commented] (SPARK-41049) Nondeterministic expressions have unstable values if they are children of CodegenFallback expressions

2022-12-27 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-41049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652396#comment-17652396
 ] 

Apache Spark commented on SPARK-41049:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/39248

> Nondeterministic expressions have unstable values if they are children of 
> CodegenFallback expressions
> -
>
> Key: SPARK-41049
> URL: https://issues.apache.org/jira/browse/SPARK-41049
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: Guy Boo
>Priority: Major
>
> h2. Expectation
> For a given row, Nondeterministic expressions are expected to have stable 
> values.
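> {code:scala}
> import org.apache.spark.sql.functions._
> import org.apache.spark.sql.types.IntegerType
> // Assumes a spark-shell session, where `spark` and the implicits needed for toDF are in scope.
> val df = spark.sparkContext.parallelize(1 to 5).toDF("x")
> // The factor 10000 is inferred from the four-digit sample values below; the archived text shows lit(1).
> val v1 = rand().*(lit(10000)).cast(IntegerType)
> df.select(v1, v1).collect{code}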
> Returns a set like this:
> |8777|8777|
> |1357|1357|
> |3435|3435|
> |9204|9204|
> |3870|3870|
> where both columns always have the same value, but what that value is changes 
> from row to row. This is different from the following:
> {code:scala}
> df.select(rand(), rand()).collect{code}
> In this case, because the rand() calls are distinct, the values in both 
> columns should be different.
> h2. Problem
> This expectation does not appear to hold when any subsequent expression is a 
> CodegenFallback. This program:
> {code:scala}
> import org.apache.spark.sql._
> import org.apache.spark.sql.types._
> import org.apache.spark.sql.functions._
> val sparkSession = SparkSession.builder().getOrCreate()
> import sparkSession.implicits._ // needed for toDF
> val df = sparkSession.sparkContext.parallelize(1 to 5).toDF("x")
> // The factor 10000 is inferred from the sample values below; the archived text shows lit(1).
> val v1 = rand().*(lit(10000)).cast(IntegerType)
> val v2 = to_csv(struct(v1.as("a"))) // to_csv is CodegenFallback
> df.select(v1, v1, v2, v2).collect {code}
> produces output like this:
> |8159|8159|8159|{color:#ff}2028{color}|
> |8320|8320|8320|{color:#ff}1640{color}|
> |7937|7937|7937|{color:#ff}769{color}|
> |436|436|436|{color:#ff}8924{color}|
> |8924|8924|2827|{color:#ff}2731{color}|
> Not sure why the first call via the CodegenFallback path should be correct 
> while subsequent calls aren't.
> h2. Workaround
> If the Nondeterministic expression is moved to a separate, earlier select() 
> call, so the CodegenFallback instead only refers to a column reference, then 
> the problem seems to go away. But this workaround may not be reliable if 
> optimization is ever able to restructure adjacent select()s.
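The workaround described in the report can be sketched as follows. This is only an illustration under stated assumptions, not a verified fix: the local-mode `SparkSession` named `spark`, the column name `v1`, and the factor 10000 (chosen so the values resemble the sample output) are all assumptions that do not come from the report. As the report itself notes, nothing guarantees that the optimizer keeps the two select() calls separate.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.IntegerType

// Assumption: a local session is acceptable for trying this out.
val spark = SparkSession.builder().master("local[1]").appName("workaround-sketch").getOrCreate()
import spark.implicits._ // needed for toDF

val df = spark.sparkContext.parallelize(1 to 5).toDF("x")

// Step 1: evaluate the nondeterministic expression in an earlier select()
// and give it a (hypothetical) column name.
val withV1 = df.select(col("x"), (rand() * lit(10000)).cast(IntegerType).as("v1"))

// Step 2: the CodegenFallback expression now refers only to the column,
// not to the nondeterministic expression itself.
val v2 = to_csv(struct(col("v1").as("a")))
val rows = withV1.select(col("v1"), col("v1"), v2, v2).collect()
```

Comparing the third and fourth columns of `rows` against the first two is a quick way to check whether the workaround holds on a given Spark version.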




[jira] [Assigned] (SPARK-41733) Session window: analysis rule "ResolveWindowTime" does not apply tree-pattern based pruning

2022-12-27 Thread Jungtaek Lim (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-41733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim reassigned SPARK-41733:


Assignee: Jungtaek Lim

> Session window: analysis rule "ResolveWindowTime" does not apply tree-pattern 
> based pruning
> ---
>
> Key: SPARK-41733
> URL: https://issues.apache.org/jira/browse/SPARK-41733
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
>
> Tree-pattern based pruning was not applied in the rule ResolveWindowTime, 
> so ResolveWindowTime is evaluated unnecessarily, multiple times, against 
> logical nodes it does not apply to.
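The idea behind tree-pattern based pruning can be sketched in plain Scala. This is a simplified illustration, not Spark's implementation: the `Node` classes, the string pattern tags, the `visited` counter, and `rewriteWithPruning` are hypothetical stand-ins invented for this sketch. Each node records which patterns occur in its subtree, so a rule can skip any subtree that cannot contain the pattern it looks for. The top-level definitions are meant to be pasted into a Scala REPL or run as a script.

```scala
// Hypothetical miniature of a logical plan tree; these are not Spark's classes.
sealed trait Node {
  def children: Seq[Node]
  def ownPatterns: Set[String]
  // Pattern tags found anywhere in this subtree, computed once per node.
  lazy val patterns: Set[String] = ownPatterns ++ children.flatMap(_.patterns)
}
case class Leaf(name: String) extends Node {
  def children: Seq[Node] = Nil
  def ownPatterns: Set[String] = Set.empty
}
case class Project(child: Node) extends Node {
  def children: Seq[Node] = Seq(child)
  def ownPatterns: Set[String] = Set.empty
}
case class WindowTime(child: Node) extends Node {
  def children: Seq[Node] = Seq(child)
  def ownPatterns: Set[String] = Set("WINDOW_TIME")
}
case class Join(left: Node, right: Node) extends Node {
  def children: Seq[Node] = Seq(left, right)
  def ownPatterns: Set[String] = Set.empty
}

// Counts how many nodes the traversal actually entered.
var visited = 0

// Rewrites bottom-up, but returns early for any subtree that cannot contain `pattern`.
def rewriteWithPruning(node: Node, pattern: String)(rule: PartialFunction[Node, Node]): Node =
  if (!node.patterns.contains(pattern)) node
  else {
    visited += 1
    val rebuilt = node match {
      case Project(c)    => Project(rewriteWithPruning(c, pattern)(rule))
      case WindowTime(c) => WindowTime(rewriteWithPruning(c, pattern)(rule))
      case Join(l, r)    => Join(rewriteWithPruning(l, pattern)(rule), rewriteWithPruning(r, pattern)(rule))
      case leaf          => leaf
    }
    rule.applyOrElse(rebuilt, identity[Node])
  }

val plan = Join(Project(WindowTime(Leaf("a"))), Project(Project(Leaf("b"))))
val result = rewriteWithPruning(plan, "WINDOW_TIME") {
  case WindowTime(child) => Project(child) // stand-in for a real rewrite
}
```

In this example the traversal enters only three of the seven nodes: the right-hand branch and the leaf under `WindowTime` are skipped because their subtrees carry no `WINDOW_TIME` tag.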




[jira] [Resolved] (SPARK-41733) Session window: analysis rule "ResolveWindowTime" does not apply tree-pattern based pruning

2022-12-27 Thread Jungtaek Lim (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-41733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim resolved SPARK-41733.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39247
[https://github.com/apache/spark/pull/39247]

> Session window: analysis rule "ResolveWindowTime" does not apply tree-pattern 
> based pruning
> ---
>
> Key: SPARK-41733
> URL: https://issues.apache.org/jira/browse/SPARK-41733
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
> Fix For: 3.4.0
>
>
> Tree-pattern based pruning was not applied in the rule ResolveWindowTime, 
> so ResolveWindowTime is evaluated unnecessarily, multiple times, against 
> logical nodes it does not apply to.




[jira] [Resolved] (SPARK-41736) pyspark_types_to_proto_types should supports ArrayType

2022-12-27 Thread Ruifeng Zheng (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-41736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-41736.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39251
[https://github.com/apache/spark/pull/39251]

> pyspark_types_to_proto_types should supports ArrayType
> --
>
> Key: SPARK-41736
> URL: https://issues.apache.org/jira/browse/SPARK-41736
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Assignee: jiaan.geng
>Priority: Major
> Fix For: 3.4.0
>
>
> pyspark_types_to_proto_types does not currently support ArrayType.




[jira] [Assigned] (SPARK-41736) pyspark_types_to_proto_types should supports ArrayType

2022-12-27 Thread Ruifeng Zheng (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-41736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-41736:
-

Assignee: jiaan.geng

> pyspark_types_to_proto_types should supports ArrayType
> --
>
> Key: SPARK-41736
> URL: https://issues.apache.org/jira/browse/SPARK-41736
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Assignee: jiaan.geng
>Priority: Major
>
> pyspark_types_to_proto_types does not currently support ArrayType.




[jira] [Resolved] (SPARK-41734) Wrap catalog messages into a parent message

2022-12-27 Thread Hyukjin Kwon (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-41734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-41734.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39252
[https://github.com/apache/spark/pull/39252]

> Wrap catalog messages into a parent message
> ---
>
> Key: SPARK-41734
> URL: https://issues.apache.org/jira/browse/SPARK-41734
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>





[jira] [Assigned] (SPARK-41734) Wrap catalog messages into a parent message

2022-12-27 Thread Hyukjin Kwon (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-41734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41734:


Assignee: Hyukjin Kwon

> Wrap catalog messages into a parent message
> ---
>
> Key: SPARK-41734
> URL: https://issues.apache.org/jira/browse/SPARK-41734
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>





[jira] [Resolved] (SPARK-41729) Assign name to _LEGACY_ERROR_TEMP_0011

2022-12-27 Thread Max Gekk (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-41729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-41729.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39235
[https://github.com/apache/spark/pull/39235]

>  Assign name to _LEGACY_ERROR_TEMP_0011
> ---
>
> Key: SPARK-41729
> URL: https://issues.apache.org/jira/browse/SPARK-41729
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 3.4.0
>
>





[jira] [Assigned] (SPARK-41729) Assign name to _LEGACY_ERROR_TEMP_0011

2022-12-27 Thread Max Gekk (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-41729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-41729:


Assignee: Yang Jie

>  Assign name to _LEGACY_ERROR_TEMP_0011
> ---
>
> Key: SPARK-41729
> URL: https://issues.apache.org/jira/browse/SPARK-41729
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
>





[jira] [Assigned] (SPARK-41738) Client ID should be mixed into SparkSession cache

2022-12-27 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-41738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41738:


Assignee: (was: Apache Spark)

> Client ID should be mixed into SparkSession cache
> -
>
> Key: SPARK-41738
> URL: https://issues.apache.org/jira/browse/SPARK-41738
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Priority: Major
>





[jira] [Assigned] (SPARK-41738) Client ID should be mixed into SparkSession cache

2022-12-27 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-41738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41738:


Assignee: Apache Spark

> Client ID should be mixed into SparkSession cache
> -
>
> Key: SPARK-41738
> URL: https://issues.apache.org/jira/browse/SPARK-41738
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Assignee: Apache Spark
>Priority: Major
>





[jira] [Commented] (SPARK-41738) Client ID should be mixed into SparkSession cache

2022-12-27 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-41738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652375#comment-17652375
 ] 

Apache Spark commented on SPARK-41738:
--

User 'grundprinzip' has created a pull request for this issue:
https://github.com/apache/spark/pull/39256

> Client ID should be mixed into SparkSession cache
> -
>
> Key: SPARK-41738
> URL: https://issues.apache.org/jira/browse/SPARK-41738
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Priority: Major
>





[jira] [Created] (SPARK-41738) Client ID should be mixed into SparkSession cache

2022-12-27 Thread Martin Grund (Jira)

Martin Grund created SPARK-41738:


 Summary: Client ID should be mixed into SparkSession cache
 Key: SPARK-41738
 URL: https://issues.apache.org/jira/browse/SPARK-41738
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Martin Grund







[jira] [Commented] (SPARK-41737) Implement `GroupedData.{min, max, avg, sum}`

2022-12-27 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-41737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652373#comment-17652373
 ] 

Apache Spark commented on SPARK-41737:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39254

> Implement `GroupedData.{min, max, avg, sum}`
> 
>
> Key: SPARK-41737
> URL: https://issues.apache.org/jira/browse/SPARK-41737
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>





[jira] [Assigned] (SPARK-41737) Implement `GroupedData.{min, max, avg, sum}`

2022-12-27 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-41737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41737:


Assignee: (was: Apache Spark)

> Implement `GroupedData.{min, max, avg, sum}`
> 
>
> Key: SPARK-41737
> URL: https://issues.apache.org/jira/browse/SPARK-41737
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>





[jira] [Assigned] (SPARK-41737) Implement `GroupedData.{min, max, avg, sum}`

2022-12-27 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-41737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41737:


Assignee: Apache Spark

> Implement `GroupedData.{min, max, avg, sum}`
> 
>
> Key: SPARK-41737
> URL: https://issues.apache.org/jira/browse/SPARK-41737
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Apache Spark
>Priority: Major
>





[jira] [Commented] (SPARK-41737) Implement `GroupedData.{min, max, avg, sum}`

2022-12-27 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-41737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652372#comment-17652372
 ] 

Apache Spark commented on SPARK-41737:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39254

> Implement `GroupedData.{min, max, avg, sum}`
> 
>
> Key: SPARK-41737
> URL: https://issues.apache.org/jira/browse/SPARK-41737
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>





[jira] [Commented] (SPARK-41737) Implement `GroupedData.{min, max, avg, sum}`

2022-12-27 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-41737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652371#comment-17652371
 ] 

Apache Spark commented on SPARK-41737:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39254

> Implement `GroupedData.{min, max, avg, sum}`
> 
>
> Key: SPARK-41737
> URL: https://issues.apache.org/jira/browse/SPARK-41737
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>





[jira] [Assigned] (SPARK-41333) Make `Groupby.{min, max, sum, avg, mean}` compatible with PySpark

2022-12-27 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-41333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41333:


Assignee: Apache Spark  (was: Ruifeng Zheng)

> Make `Groupby.{min, max, sum, avg, mean}` compatible with PySpark
> -
>
> Key: SPARK-41333
> URL: https://issues.apache.org/jira/browse/SPARK-41333
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Apache Spark
>Priority: Major
>





[jira] [Assigned] (SPARK-41333) Make `Groupby.{min, max, sum, avg, mean}` compatible with PySpark

2022-12-27 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-41333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41333:


Assignee: Ruifeng Zheng  (was: Apache Spark)

> Make `Groupby.{min, max, sum, avg, mean}` compatible with PySpark
> -
>
> Key: SPARK-41333
> URL: https://issues.apache.org/jira/browse/SPARK-41333
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>





[jira] [Commented] (SPARK-41333) Make `Groupby.{min, max, sum, avg, mean}` compatible with PySpark

2022-12-27 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-41333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652370#comment-17652370
 ] 

Apache Spark commented on SPARK-41333:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39254

> Make `Groupby.{min, max, sum, avg, mean}` compatible with PySpark
> -
>
> Key: SPARK-41333
> URL: https://issues.apache.org/jira/browse/SPARK-41333
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>





[jira] [Commented] (SPARK-41333) Make `Groupby.{min, max, sum, avg, mean}` compatible with PySpark

2022-12-27 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-41333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652369#comment-17652369
 ] 

Apache Spark commented on SPARK-41333:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39254

> Make `Groupby.{min, max, sum, avg, mean}` compatible with PySpark
> -
>
> Key: SPARK-41333
> URL: https://issues.apache.org/jira/browse/SPARK-41333
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>





[jira] [Created] (SPARK-41737) Implement `GroupedData.{min, max, avg, sum}`

2022-12-27 Thread Ruifeng Zheng (Jira)

Ruifeng Zheng created SPARK-41737:
-

 Summary: Implement `GroupedData.{min, max, avg, sum}`
 Key: SPARK-41737
 URL: https://issues.apache.org/jira/browse/SPARK-41737
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 3.4.0
Reporter: Ruifeng Zheng







[jira] [Resolved] (SPARK-41732) Session window: analysis rule "SessionWindowing" does not apply tree-pattern based pruning

2022-12-27 Thread Jungtaek Lim (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-41732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim resolved SPARK-41732.
--
Fix Version/s: 3.4.0
 Assignee: Jungtaek Lim
   Resolution: Fixed

Issue resolved via https://github.com/apache/spark/pull/39245

> Session window: analysis rule "SessionWindowing" does not apply tree-pattern 
> based pruning
> --
>
> Key: SPARK-41732
> URL: https://issues.apache.org/jira/browse/SPARK-41732
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Structured Streaming
>Affects Versions: 3.2.2, 3.3.1, 3.4.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
> Fix For: 3.4.0
>
>
> We failed to apply tree-pattern-based pruning in the rule SessionWindowing, 
> which causes SessionWindowing to be evaluated unnecessarily against other 
> logical nodes multiple times.
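
The pruning idea in the description above can be sketched with a toy plan tree. This is a simplified illustration in Python, not Spark's Catalyst code: `Node`, `tree_patterns`, and `transform_with_pruning` are hypothetical names, and the pattern label `SESSION_WINDOW` is assumed for the example. A rule that declares the pattern it cares about skips every subtree that does not contain that pattern.

```python
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, List


@dataclass
class Node:
    """Toy logical-plan node (hypothetical, not a Spark class)."""
    name: str
    own_patterns: FrozenSet[str] = frozenset()
    children: List["Node"] = field(default_factory=list)

    def tree_patterns(self) -> FrozenSet[str]:
        # Patterns found anywhere in this subtree: the node's own plus its children's.
        found = set(self.own_patterns)
        for child in self.children:
            found |= child.tree_patterns()
        return frozenset(found)


def transform_with_pruning(node: Node, pattern: str,
                           rule: Callable[[Node], Node],
                           visited: List[str]) -> Node:
    # Prune: if the pattern occurs nowhere in this subtree, return it untouched.
    if pattern not in node.tree_patterns():
        return node
    visited.append(node.name)
    new_children = [transform_with_pruning(c, pattern, rule, visited)
                    for c in node.children]
    return rule(Node(node.name, node.own_patterns, new_children))


plan = Node("Project", children=[
    Node("Join", children=[
        Node("Aggregate", frozenset({"SESSION_WINDOW"}), [Node("ScanA")]),
        Node("Filter", children=[Node("ScanB")]),
    ]),
])

visited: List[str] = []
transform_with_pruning(plan, "SESSION_WINDOW", lambda n: n, visited)
print(visited)  # ['Project', 'Join', 'Aggregate']
```

Only the nodes on the path to the session-window aggregate are visited; `Filter`, `ScanA`, and `ScanB` are skipped. The toy version recomputes the pattern set on every call for brevity; a real implementation would presumably cache it per node.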




[jira] [Commented] (SPARK-41732) Session window: analysis rule "SessionWindowing" does not apply tree-pattern based pruning

2022-12-27 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-41732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652363#comment-17652363
 ] 

Apache Spark commented on SPARK-41732:
--

User 'HeartSaVioR' has created a pull request for this issue:
https://github.com/apache/spark/pull/39253

> Session window: analysis rule "SessionWindowing" does not apply tree-pattern 
> based pruning
> --
>
> Key: SPARK-41732
> URL: https://issues.apache.org/jira/browse/SPARK-41732
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Structured Streaming
>Affects Versions: 3.2.2, 3.3.1, 3.4.0
>Reporter: Jungtaek Lim
>Priority: Major
>
> We failed to apply tree-pattern-based pruning in the rule SessionWindowing, 
> which causes SessionWindowing to be evaluated unnecessarily against other 
> logical nodes multiple times.




[jira] (SPARK-41069) Implement `DataFrame.approxQuantile` and `DataFrame.stat.approxQuantile`

2022-12-27 Thread jiaan.geng (Jira)



[ https://issues.apache.org/jira/browse/SPARK-41069 ]


jiaan.geng deleted comment on SPARK-41069:


was (Author: beliefer):
I will try this.

> Implement `DataFrame.approxQuantile` and `DataFrame.stat.approxQuantile`
> 
>
> Key: SPARK-41069
> URL: https://issues.apache.org/jira/browse/SPARK-41069
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>





[jira] [Commented] (SPARK-41734) Wrap catalog messages into a parent message

2022-12-27 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-41734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652357#comment-17652357
 ] 

Apache Spark commented on SPARK-41734:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/39252

> Wrap catalog messages into a parent message
> ---
>
> Key: SPARK-41734
> URL: https://issues.apache.org/jira/browse/SPARK-41734
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>





[jira] [Assigned] (SPARK-41734) Wrap catalog messages into a parent message

2022-12-27 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-41734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41734:


Assignee: (was: Apache Spark)

> Wrap catalog messages into a parent message
> ---
>
> Key: SPARK-41734
> URL: https://issues.apache.org/jira/browse/SPARK-41734
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>





[jira] [Assigned] (SPARK-41734) Wrap catalog messages into a parent message

2022-12-27 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-41734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41734:


Assignee: Apache Spark

> Wrap catalog messages into a parent message
> ---
>
> Key: SPARK-41734
> URL: https://issues.apache.org/jira/browse/SPARK-41734
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>





[jira] [Commented] (SPARK-41734) Wrap catalog messages into a parent message

2022-12-27 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-41734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652356#comment-17652356
 ] 

Apache Spark commented on SPARK-41734:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/39252

> Wrap catalog messages into a parent message
> ---
>
> Key: SPARK-41734
> URL: https://issues.apache.org/jira/browse/SPARK-41734
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>





[jira] [Commented] (SPARK-41736) pyspark_types_to_proto_types should supports ArrayType

2022-12-27 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-41736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652354#comment-17652354
 ] 

Apache Spark commented on SPARK-41736:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/39251

> pyspark_types_to_proto_types should supports ArrayType
> --
>
> Key: SPARK-41736
> URL: https://issues.apache.org/jira/browse/SPARK-41736
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Priority: Major
>
> pyspark_types_to_proto_types doesn't support ArrayType now.




[jira] [Commented] (SPARK-41069) Implement `DataFrame.approxQuantile` and `DataFrame.stat.approxQuantile`

2022-12-27 Thread jiaan.geng (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-41069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652355#comment-17652355
 ] 

jiaan.geng commented on SPARK-41069:


I will try this.

> Implement `DataFrame.approxQuantile` and `DataFrame.stat.approxQuantile`
> 
>
> Key: SPARK-41069
> URL: https://issues.apache.org/jira/browse/SPARK-41069
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>





[jira] [Commented] (SPARK-41736) pyspark_types_to_proto_types should supports ArrayType

2022-12-27 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-41736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652353#comment-17652353
 ] 

Apache Spark commented on SPARK-41736:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/39251

> pyspark_types_to_proto_types should supports ArrayType
> --
>
> Key: SPARK-41736
> URL: https://issues.apache.org/jira/browse/SPARK-41736
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Priority: Major
>
> pyspark_types_to_proto_types doesn't support ArrayType now.




[jira] [Assigned] (SPARK-41736) pyspark_types_to_proto_types should supports ArrayType

2022-12-27 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-41736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41736:


Assignee: (was: Apache Spark)

> pyspark_types_to_proto_types should supports ArrayType
> --
>
> Key: SPARK-41736
> URL: https://issues.apache.org/jira/browse/SPARK-41736
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Priority: Major
>
> pyspark_types_to_proto_types doesn't support ArrayType now.




[jira] [Assigned] (SPARK-41736) pyspark_types_to_proto_types should supports ArrayType

2022-12-27 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-41736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41736:


Assignee: Apache Spark

> pyspark_types_to_proto_types should supports ArrayType
> --
>
> Key: SPARK-41736
> URL: https://issues.apache.org/jira/browse/SPARK-41736
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Assignee: Apache Spark
>Priority: Major
>
> pyspark_types_to_proto_types doesn't support ArrayType now.




[jira] [Created] (SPARK-41736) pyspark_types_to_proto_types should supports ArrayType

2022-12-27 Thread jiaan.geng (Jira)

jiaan.geng created SPARK-41736:
--

 Summary: pyspark_types_to_proto_types should supports ArrayType
 Key: SPARK-41736
 URL: https://issues.apache.org/jira/browse/SPARK-41736
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: jiaan.geng


pyspark_types_to_proto_types doesn't support ArrayType now.




[jira] [Created] (SPARK-41735) Any SparkThrowable (with an error class) not in `error-classes.json` is masked in `SQLExecution.withNewExecutionId` and end-user will see `org.apache.spark.SparkExceptio

2022-12-27 Thread Allison Portis (Jira)

Allison Portis created SPARK-41735:
--

 Summary: Any SparkThrowable (with an error class) not in 
`error-classes.json` is masked in `SQLExecution.withNewExecutionId` and 
end-user will see `org.apache.spark.SparkException: [INTERNAL_ERROR]` 
 Key: SPARK-41735
 URL: https://issues.apache.org/jira/browse/SPARK-41735
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.4.0
Reporter: Allison Portis


This change 
[here|https://github.com/apache/spark/pull/38302/files#diff-fdd1e9e26aa1ba9d1cc923ee7c84a1935dcc285502330a471f1ade7f3ad08bf9]
 means that every error seen is passed to `SparkThrowableHelper.getMessage(...)`. 
Any SparkThrowable whose error class is not defined in `error-classes.json` (for 
example, one thrown by a connector that uses the Spark error format; see 
`ErrorClassesJsonReader`) will be masked as 
{code:java}
org.apache.spark.SparkException: [INTERNAL_ERROR] Cannot find main error class 
'SOME_ERROR_CLASS'{code}
in `SparkThrowableHelper.getMessage`, because 
`errorReader.getMessageTemplate(errorClass)` fails for an error class that is not 
defined in `error-classes.json`.
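
The masking described above can be reproduced with a small stand-alone model. This is a sketch in Python under stated assumptions, not Spark's code: `ToyErrorReader`, `ConnectorError`, `InternalError`, and `get_message` are hypothetical stand-ins for the reader, the connector's throwable, the internal error, and the message helper. It only shows why a lookup that fails on an unknown error class hides the original error.

```python
class InternalError(Exception):
    """Stand-in for the [INTERNAL_ERROR] exception the end user sees."""


class ConnectorError(Exception):
    """Stand-in for a throwable that carries its own error class."""

    def __init__(self, error_class: str, message: str):
        super().__init__(message)
        self.error_class = error_class


class ToyErrorReader:
    """Holds the message templates known to the core project (hypothetical)."""

    def __init__(self, templates: dict):
        self.templates = templates

    def get_message_template(self, error_class: str) -> str:
        # The lookup fails for any error class that is not registered.
        if error_class not in self.templates:
            raise InternalError(
                f"[INTERNAL_ERROR] Cannot find main error class '{error_class}'")
        return self.templates[error_class]


def get_message(reader: ToyErrorReader, err: ConnectorError) -> str:
    # Every error is routed through the template lookup, registered or not.
    template = reader.get_message_template(err.error_class)
    return f"[{err.error_class}] {template}"


reader = ToyErrorReader({"DIVIDE_BY_ZERO": "Division by zero."})
connector_error = ConnectorError("SOME_ERROR_CLASS", "table not found in connector")

try:
    shown = get_message(reader, connector_error)
except InternalError as masked:
    shown = str(masked)

print(shown)
```

The connector's own message ("table not found in connector") never reaches the user; only the internal lookup failure is reported.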




[jira] [Resolved] (SPARK-41654) Enable doctests in pyspark.sql.connect.window

2022-12-27 Thread Hyukjin Kwon (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-41654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-41654.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39225
[https://github.com/apache/spark/pull/39225]

> Enable doctests in pyspark.sql.connect.window
> -
>
> Key: SPARK-41654
> URL: https://issues.apache.org/jira/browse/SPARK-41654
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>





[jira] [Assigned] (SPARK-41654) Enable doctests in pyspark.sql.connect.window

2022-12-27 Thread Hyukjin Kwon (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-41654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41654:


Assignee: Hyukjin Kwon

> Enable doctests in pyspark.sql.connect.window
> -
>
> Key: SPARK-41654
> URL: https://issues.apache.org/jira/browse/SPARK-41654
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>





[jira] [Assigned] (SPARK-41655) Enable doctests in pyspark.sql.connect.column

2022-12-27 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-41655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41655:


Assignee: (was: Apache Spark)

> Enable doctests in pyspark.sql.connect.column
> -
>
> Key: SPARK-41655
> URL: https://issues.apache.org/jira/browse/SPARK-41655
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>





[jira] [Assigned] (SPARK-41655) Enable doctests in pyspark.sql.connect.column

2022-12-27 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-41655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41655:


Assignee: Apache Spark

> Enable doctests in pyspark.sql.connect.column
> -
>
> Key: SPARK-41655
> URL: https://issues.apache.org/jira/browse/SPARK-41655
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>





[jira] [Commented] (SPARK-41655) Enable doctests in pyspark.sql.connect.column

2022-12-27 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-41655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652324#comment-17652324
 ] 

Apache Spark commented on SPARK-41655:
--

User 'techaddict' has created a pull request for this issue:
https://github.com/apache/spark/pull/39249

> Enable doctests in pyspark.sql.connect.column
> -
>
> Key: SPARK-41655
> URL: https://issues.apache.org/jira/browse/SPARK-41655
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>





[jira] [Commented] (SPARK-41655) Enable doctests in pyspark.sql.connect.column

2022-12-27 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-41655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652325#comment-17652325
 ] 

Apache Spark commented on SPARK-41655:
--

User 'techaddict' has created a pull request for this issue:
https://github.com/apache/spark/pull/39249

> Enable doctests in pyspark.sql.connect.column
> -
>
> Key: SPARK-41655
> URL: https://issues.apache.org/jira/browse/SPARK-41655
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>





[jira] [Resolved] (SPARK-41469) Task rerun on decommissioned executor can be avoided if shuffle data has migrated

2022-12-27 Thread Mridul Muralidharan (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-41469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan resolved SPARK-41469.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39011
[https://github.com/apache/spark/pull/39011]

> Task rerun on decommissioned executor can be avoided if shuffle data has 
> migrated
> -
>
> Key: SPARK-41469
> URL: https://issues.apache.org/jira/browse/SPARK-41469
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.3, 3.2.2, 3.3.1
>Reporter: wuyi
>Assignee: wuyi
>Priority: Major
> Fix For: 3.4.0
>
>
> Currently, we always rerun a finished shuffle map task if it ran on the 
> lost executor. However, when the executor loss is caused by decommission, 
> the shuffle data might have been migrated, so the task doesn't need to 
> rerun.
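
The difference between the two behaviours described above can be sketched as follows. This is a simplified Python model, not the scheduler's actual logic: `MapOutput`, `rerun_always`, and `rerun_unless_migrated` are hypothetical names, and the assumption is that each finished map task records both the executor it ran on and the executor that currently stores its shuffle data.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MapOutput:
    task_id: int
    ran_on: str      # executor that ran the shuffle map task
    stored_on: str   # executor currently holding the shuffle data


def rerun_always(outputs: List[MapOutput], lost_executor: str) -> List[int]:
    # Old behaviour: rerun every finished task that ran on the lost executor.
    return [o.task_id for o in outputs if o.ran_on == lost_executor]


def rerun_unless_migrated(outputs: List[MapOutput], lost_executor: str) -> List[int]:
    # Improved behaviour: rerun only if the shuffle data is still on the lost executor.
    return [o.task_id for o in outputs if o.stored_on == lost_executor]


outputs = [
    MapOutput(task_id=0, ran_on="exec-1", stored_on="exec-2"),  # migrated away
    MapOutput(task_id=1, ran_on="exec-1", stored_on="exec-1"),  # not migrated
    MapOutput(task_id=2, ran_on="exec-3", stored_on="exec-3"),  # unaffected
]

print(rerun_always(outputs, "exec-1"))           # [0, 1]
print(rerun_unless_migrated(outputs, "exec-1"))  # [1]
```

Task 0 ran on the decommissioned executor, but its shuffle data was already migrated to `exec-2`, so in this model it does not need to run again.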




[jira] [Assigned] (SPARK-41469) Task rerun on decommissioned executor can be avoided if shuffle data has migrated

2022-12-27 Thread Mridul Muralidharan (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-41469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan reassigned SPARK-41469:
---

Assignee: wuyi

> Task rerun on decommissioned executor can be avoided if shuffle data has 
> migrated
> -
>
> Key: SPARK-41469
> URL: https://issues.apache.org/jira/browse/SPARK-41469
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.3, 3.2.2, 3.3.1
>Reporter: wuyi
>Assignee: wuyi
>Priority: Major
>
> Currently, we always rerun a finished shuffle map task if it ran on the 
> lost executor. However, when the executor loss is caused by decommission, 
> the shuffle data might have been migrated, so the task doesn't need to 
> rerun.




[jira] [Assigned] (SPARK-41714) Update maven-checkstyle-plugin from 3.1.2 to 3.2.0

2022-12-27 Thread Sean R. Owen (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-41714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-41714:


Assignee: BingKun Pan

> Update maven-checkstyle-plugin from 3.1.2 to 3.2.0
> --
>
> Key: SPARK-41714
> URL: https://issues.apache.org/jira/browse/SPARK-41714
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>





[jira] [Resolved] (SPARK-41714) Update maven-checkstyle-plugin from 3.1.2 to 3.2.0

2022-12-27 Thread Sean R. Owen (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-41714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-41714.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39218
[https://github.com/apache/spark/pull/39218]

> Update maven-checkstyle-plugin from 3.1.2 to 3.2.0
> --
>
> Key: SPARK-41714
> URL: https://issues.apache.org/jira/browse/SPARK-41714
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
> Fix For: 3.4.0
>
>





[jira] [Updated] (SPARK-41714) Update maven-checkstyle-plugin from 3.1.2 to 3.2.0

2022-12-27 Thread Sean R. Owen (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-41714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-41714:
-
Priority: Trivial  (was: Minor)

> Update maven-checkstyle-plugin from 3.1.2 to 3.2.0
> --
>
> Key: SPARK-41714
> URL: https://issues.apache.org/jira/browse/SPARK-41714
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Trivial
> Fix For: 3.4.0
>
>





[jira] [Resolved] (SPARK-41731) Implement the column accessor

2022-12-27 Thread Hyukjin Kwon (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-41731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-41731.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39241
[https://github.com/apache/spark/pull/39241]

> Implement the column accessor
> -
>
> Key: SPARK-41731
> URL: https://issues.apache.org/jira/browse/SPARK-41731
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>





[jira] [Assigned] (SPARK-41731) Implement the column accessor

2022-12-27 Thread Hyukjin Kwon (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-41731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41731:


Assignee: Ruifeng Zheng

> Implement the column accessor
> -
>
> Key: SPARK-41731
> URL: https://issues.apache.org/jira/browse/SPARK-41731
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>





[jira] [Commented] (SPARK-41065) Implement `DataFrame.freqItems ` and `DataFrame.stat.freqItems `

2022-12-27 Thread jiaan.geng (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-41065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652239#comment-17652239
 ] 

jiaan.geng commented on SPARK-41065:


I'll try this.

> Implement `DataFrame.freqItems ` and `DataFrame.stat.freqItems `
> 
>
> Key: SPARK-41065
> URL: https://issues.apache.org/jira/browse/SPARK-41065
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>





[jira] (SPARK-41068) Implement `DataFrame.stat.corr`

2022-12-27 Thread jiaan.geng (Jira)



[ https://issues.apache.org/jira/browse/SPARK-41068 ]


jiaan.geng deleted comment on SPARK-41068:


was (Author: beliefer):
I will try.

> Implement `DataFrame.stat.corr`
> ---
>
> Key: SPARK-41068
> URL: https://issues.apache.org/jira/browse/SPARK-41068
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>





[jira] (SPARK-41067) Implement `DataFrame.stat.cov`

2022-12-27 Thread jiaan.geng (Jira)



[ https://issues.apache.org/jira/browse/SPARK-41067 ]


jiaan.geng deleted comment on SPARK-41067:


was (Author: beliefer):
I will try it !

> Implement `DataFrame.stat.cov`
> --
>
> Key: SPARK-41067
> URL: https://issues.apache.org/jira/browse/SPARK-41067
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>





[jira] [Created] (SPARK-41734) Wrap catalog messages into a parent message

2022-12-27 Thread Hyukjin Kwon (Jira)

Hyukjin Kwon created SPARK-41734:


 Summary: Wrap catalog messages into a parent message
 Key: SPARK-41734
 URL: https://issues.apache.org/jira/browse/SPARK-41734
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Hyukjin Kwon







[jira] [Assigned] (SPARK-41733) Session window: analysis rule "ResolveWindowTime" does not apply tree-pattern based pruning

2022-12-27 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-41733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41733:


Assignee: Apache Spark

> Session window: analysis rule "ResolveWindowTime" does not apply tree-pattern 
> based pruning
> ---
>
> Key: SPARK-41733
> URL: https://issues.apache.org/jira/browse/SPARK-41733
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Jungtaek Lim
>Assignee: Apache Spark
>Priority: Major
>
> We failed to apply tree-pattern-based pruning in the rule ResolveWindowTime, 
> which causes ResolveWindowTime to be evaluated unnecessarily against other 
> logical nodes multiple times.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-41733) Session window: analysis rule "ResolveWindowTime" does not apply tree-pattern based pruning

2022-12-27 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-41733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652236#comment-17652236
 ] 

Apache Spark commented on SPARK-41733:
--

User 'HeartSaVioR' has created a pull request for this issue:
https://github.com/apache/spark/pull/39247

> Session window: analysis rule "ResolveWindowTime" does not apply tree-pattern 
> based pruning
> ---
>
> Key: SPARK-41733
> URL: https://issues.apache.org/jira/browse/SPARK-41733
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Jungtaek Lim
>Priority: Major
>
> We failed to apply tree-pattern-based pruning in the rule ResolveWindowTime, 
> which causes ResolveWindowTime to be evaluated unnecessarily against other 
> logical nodes multiple times.




[jira] [Commented] (SPARK-41067) Implement `DataFrame.stat.cov`

2022-12-27 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-41067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652235#comment-17652235
 ] 

Apache Spark commented on SPARK-41067:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/39246

> Implement `DataFrame.stat.cov`
> --
>
> Key: SPARK-41067
> URL: https://issues.apache.org/jira/browse/SPARK-41067
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>





[jira] [Assigned] (SPARK-41733) Session window: analysis rule "ResolveWindowTime" does not apply tree-pattern based pruning

2022-12-27 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-41733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41733:


Assignee: (was: Apache Spark)

> Session window: analysis rule "ResolveWindowTime" does not apply tree-pattern 
> based pruning
> ---
>
> Key: SPARK-41733
> URL: https://issues.apache.org/jira/browse/SPARK-41733
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Jungtaek Lim
>Priority: Major
>
> We failed to apply tree-pattern-based pruning in the rule ResolveWindowTime, 
> which causes ResolveWindowTime to be evaluated unnecessarily against other 
> logical nodes multiple times.




[jira] [Assigned] (SPARK-41067) Implement `DataFrame.stat.cov`

2022-12-27 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-41067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41067:


Assignee: Ruifeng Zheng  (was: Apache Spark)

> Implement `DataFrame.stat.cov`
> --
>
> Key: SPARK-41067
> URL: https://issues.apache.org/jira/browse/SPARK-41067
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>





[jira] [Commented] (SPARK-41067) Implement `DataFrame.stat.cov`

2022-12-27 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-41067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652234#comment-17652234
 ] 

Apache Spark commented on SPARK-41067:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/39246

> Implement `DataFrame.stat.cov`
> --
>
> Key: SPARK-41067
> URL: https://issues.apache.org/jira/browse/SPARK-41067
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>





[jira] [Assigned] (SPARK-41067) Implement `DataFrame.stat.cov`

2022-12-27 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-41067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41067:


Assignee: Apache Spark  (was: Ruifeng Zheng)

> Implement `DataFrame.stat.cov`
> --
>
> Key: SPARK-41067
> URL: https://issues.apache.org/jira/browse/SPARK-41067
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Apache Spark
>Priority: Major
>





[jira] [Commented] (SPARK-41732) Session window: analysis rule "SessionWindowing" does not apply tree-pattern based pruning

2022-12-27 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-41732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652231#comment-17652231
 ] 

Apache Spark commented on SPARK-41732:
--

User 'HeartSaVioR' has created a pull request for this issue:
https://github.com/apache/spark/pull/39245

> Session window: analysis rule "SessionWindowing" does not apply tree-pattern 
> based pruning
> --
>
> Key: SPARK-41732
> URL: https://issues.apache.org/jira/browse/SPARK-41732
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Structured Streaming
>Affects Versions: 3.2.2, 3.3.1, 3.4.0
>Reporter: Jungtaek Lim
>Priority: Major
>
> The rule SessionWindowing does not apply tree-pattern based pruning,
> so it is evaluated unnecessarily, and multiple times, against other
> logical nodes.




[jira] [Assigned] (SPARK-41732) Session window: analysis rule "SessionWindowing" does not apply tree-pattern based pruning

2022-12-27 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-41732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41732:


Assignee: (was: Apache Spark)

> Session window: analysis rule "SessionWindowing" does not apply tree-pattern 
> based pruning
> --
>
> Key: SPARK-41732
> URL: https://issues.apache.org/jira/browse/SPARK-41732
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Structured Streaming
>Affects Versions: 3.2.2, 3.3.1, 3.4.0
>Reporter: Jungtaek Lim
>Priority: Major
>
> The rule SessionWindowing does not apply tree-pattern based pruning,
> so it is evaluated unnecessarily, and multiple times, against other
> logical nodes.




[jira] [Assigned] (SPARK-41732) Session window: analysis rule "SessionWindowing" does not apply tree-pattern based pruning

2022-12-27 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-41732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41732:


Assignee: Apache Spark

> Session window: analysis rule "SessionWindowing" does not apply tree-pattern 
> based pruning
> --
>
> Key: SPARK-41732
> URL: https://issues.apache.org/jira/browse/SPARK-41732
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Structured Streaming
>Affects Versions: 3.2.2, 3.3.1, 3.4.0
>Reporter: Jungtaek Lim
>Assignee: Apache Spark
>Priority: Major
>
> The rule SessionWindowing does not apply tree-pattern based pruning,
> so it is evaluated unnecessarily, and multiple times, against other
> logical nodes.




[jira] [Commented] (SPARK-41732) Session window: analysis rule "SessionWindowing" does not apply tree-pattern based pruning

2022-12-27 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-41732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652230#comment-17652230
 ] 

Apache Spark commented on SPARK-41732:
--

User 'HeartSaVioR' has created a pull request for this issue:
https://github.com/apache/spark/pull/39245

> Session window: analysis rule "SessionWindowing" does not apply tree-pattern 
> based pruning
> --
>
> Key: SPARK-41732
> URL: https://issues.apache.org/jira/browse/SPARK-41732
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Structured Streaming
>Affects Versions: 3.2.2, 3.3.1, 3.4.0
>Reporter: Jungtaek Lim
>Priority: Major
>
> The rule SessionWindowing does not apply tree-pattern based pruning,
> so it is evaluated unnecessarily, and multiple times, against other
> logical nodes.




[jira] [Created] (SPARK-41733) Session window: analysis rule "ResolveWindowTime" does not apply tree-pattern based pruning

2022-12-27 Thread Jungtaek Lim (Jira)

Jungtaek Lim created SPARK-41733:


 Summary: Session window: analysis rule "ResolveWindowTime" does 
not apply tree-pattern based pruning
 Key: SPARK-41733
 URL: https://issues.apache.org/jira/browse/SPARK-41733
 Project: Spark
  Issue Type: Bug
  Components: SQL, Structured Streaming
Affects Versions: 3.4.0
Reporter: Jungtaek Lim


The rule ResolveWindowTime does not apply tree-pattern based pruning,
so it is evaluated unnecessarily, and multiple times, against other
logical nodes.




[jira] [Commented] (SPARK-41732) Session window: analysis rule "SessionWindowing" does not apply tree-pattern based pruning

2022-12-27 Thread Jungtaek Lim (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-41732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1765#comment-1765
 ] 

Jungtaek Lim commented on SPARK-41732:
--

Will submit a PR soon.

> Session window: analysis rule "SessionWindowing" does not apply tree-pattern 
> based pruning
> --
>
> Key: SPARK-41732
> URL: https://issues.apache.org/jira/browse/SPARK-41732
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Structured Streaming
>Affects Versions: 3.2.2, 3.3.1, 3.4.0
>Reporter: Jungtaek Lim
>Priority: Major
>
> The rule SessionWindowing does not apply tree-pattern based pruning,
> so it is evaluated unnecessarily, and multiple times, against other
> logical nodes.




[jira] [Created] (SPARK-41732) Session window: analysis rule "SessionWindowing" does not apply tree-pattern based pruning

2022-12-27 Thread Jungtaek Lim (Jira)

Jungtaek Lim created SPARK-41732:


 Summary: Session window: analysis rule "SessionWindowing" does not 
apply tree-pattern based pruning
 Key: SPARK-41732
 URL: https://issues.apache.org/jira/browse/SPARK-41732
 Project: Spark
  Issue Type: Bug
  Components: SQL, Structured Streaming
Affects Versions: 3.3.1, 3.2.2, 3.4.0
Reporter: Jungtaek Lim


The rule SessionWindowing does not apply tree-pattern based pruning,
so it is evaluated unnecessarily, and multiple times, against other
logical nodes.
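
For readers unfamiliar with the idea, here is a minimal sketch of tree-pattern
based pruning. It is a toy model in Python, not Catalyst code (Catalyst is
written in Scala), and every name in it (Node, SESSION_WINDOW, transform_up,
resolve_session_window) is hypothetical. Each node records the set of patterns
present anywhere in its subtree, so a rule can skip a whole subtree that
cannot contain a match.

```python
from typing import Callable, List, Optional, Sequence

SESSION_WINDOW = "SESSION_WINDOW"  # hypothetical pattern tag for this sketch


class Node:
    """A toy plan node that records which patterns occur in its subtree."""

    def __init__(self, name: str, children: Sequence["Node"] = (),
                 pattern: Optional[str] = None):
        self.name = name
        self.children = tuple(children)
        self.pattern = pattern
        own = {pattern} if pattern is not None else set()
        # Union of this node's own pattern and every pattern found below it.
        self.patterns = frozenset(own).union(*(c.patterns for c in self.children))


def transform_up(node: Node,
                 rule: Callable[[Node], Node],
                 cond: Callable[[Node], bool] = lambda n: True,
                 applied: Optional[List[str]] = None) -> Node:
    """Apply `rule` bottom-up, skipping any subtree for which `cond` is false."""
    if not cond(node):
        return node  # pruned: nothing in this subtree can match
    children = [transform_up(c, rule, cond, applied) for c in node.children]
    if applied is not None:
        applied.append(node.name)  # record each node the rule is evaluated on
    return rule(Node(node.name, children, node.pattern))


def resolve_session_window(node: Node) -> Node:
    """Toy rule: rewrite only the node kind it cares about."""
    if node.pattern == SESSION_WINDOW:
        return Node("ResolvedSessionWindow", node.children)
    return node


plan = Node("Project", [
    Node("Join", [
        Node("Filter", [Node("Scan_a")]),
        Node("Aggregate", [Node("SessionWindow", [Node("Scan_b")], SESSION_WINDOW)]),
    ]),
])

unpruned: List[str] = []
pruned: List[str] = []
transform_up(plan, resolve_session_window, applied=unpruned)
result = transform_up(plan, resolve_session_window,
                      lambda n: SESSION_WINDOW in n.patterns, pruned)
```

Without the condition the rule is evaluated on all seven nodes. With it, only
the path from the root down to the session window node is visited, and the
rewritten plan is the same.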




[jira] [Commented] (SPARK-41643) Deduplicate docstrings in pyspark.sql.connect.column

2022-12-27 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-41643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652219#comment-17652219
 ] 

Apache Spark commented on SPARK-41643:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39244

> Deduplicate docstrings in pyspark.sql.connect.column
> 
>
> Key: SPARK-41643
> URL: https://issues.apache.org/jira/browse/SPARK-41643
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>





[jira] [Commented] (SPARK-41643) Deduplicate docstrings in pyspark.sql.connect.column

2022-12-27 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-41643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652220#comment-17652220
 ] 

Apache Spark commented on SPARK-41643:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39244

> Deduplicate docstrings in pyspark.sql.connect.column
> 
>
> Key: SPARK-41643
> URL: https://issues.apache.org/jira/browse/SPARK-41643
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>





[jira] [Commented] (SPARK-41697) Enable test_df_show, test_drop, test_dropna, test_toDF_with_schema_string and test_with_columns_renamed

2022-12-27 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-41697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652213#comment-17652213
 ] 

Apache Spark commented on SPARK-41697:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/39243

> Enable test_df_show, test_drop, test_dropna, test_toDF_with_schema_string and 
> test_with_columns_renamed
> ---
>
> Key: SPARK-41697
> URL: https://issues.apache.org/jira/browse/SPARK-41697
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Tests
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>
> These tests pass now. Should enable them.




[jira] [Commented] (SPARK-41649) Deduplicate docstrings in pyspark.sql.connect.window

2022-12-27 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-41649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652211#comment-17652211
 ] 

Apache Spark commented on SPARK-41649:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/39242

> Deduplicate docstrings in pyspark.sql.connect.window
> 
>
> Key: SPARK-41649
> URL: https://issues.apache.org/jira/browse/SPARK-41649
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>





[jira] [Assigned] (SPARK-41731) Implement the column accessor

2022-12-27 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-41731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41731:


Assignee: (was: Apache Spark)

> Implement the column accessor
> -
>
> Key: SPARK-41731
> URL: https://issues.apache.org/jira/browse/SPARK-41731
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>





[jira] [Assigned] (SPARK-41731) Implement the column accessor

2022-12-27 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-41731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41731:


Assignee: Apache Spark

> Implement the column accessor
> -
>
> Key: SPARK-41731
> URL: https://issues.apache.org/jira/browse/SPARK-41731
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Apache Spark
>Priority: Major
>





[jira] [Commented] (SPARK-41731) Implement the column accessor

2022-12-27 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-41731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652210#comment-17652210
 ] 

Apache Spark commented on SPARK-41731:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39241

> Implement the column accessor
> -
>
> Key: SPARK-41731
> URL: https://issues.apache.org/jira/browse/SPARK-41731
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>





[jira] [Created] (SPARK-41731) Implement the column accessor

2022-12-27 Thread Ruifeng Zheng (Jira)

Ruifeng Zheng created SPARK-41731:
-

 Summary: Implement the column accessor
 Key: SPARK-41731
 URL: https://issues.apache.org/jira/browse/SPARK-41731
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 3.4.0
Reporter: Ruifeng Zheng







[jira] [Assigned] (SPARK-41728) Implement `unwrap_udt` function

2022-12-27 Thread Hyukjin Kwon (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-41728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41728:


Assignee: Ruifeng Zheng

> Implement `unwrap_udt` function
> ---
>
> Key: SPARK-41728
> URL: https://issues.apache.org/jira/browse/SPARK-41728
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>





[jira] [Resolved] (SPARK-41728) Implement `unwrap_udt` function

2022-12-27 Thread Hyukjin Kwon (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-41728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-41728.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39234
[https://github.com/apache/spark/pull/39234]

> Implement `unwrap_udt` function
> ---
>
> Key: SPARK-41728
> URL: https://issues.apache.org/jira/browse/SPARK-41728
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>





[jira] [Assigned] (SPARK-41730) `min` fails on the minimal timestamp

2022-12-27 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-41730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41730:


Assignee: Max Gekk  (was: Apache Spark)

> `min` fails on the minimal timestamp
> 
>
> Key: SPARK-41730
> URL: https://issues.apache.org/jira/browse/SPARK-41730
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>
> The code below demonstrates the issue:
> {code:python}
> >>> from datetime import datetime, timezone
> >>> from pyspark.sql.types import TimestampType
> >>> from pyspark.sql import functions as F
> >>> ts = spark.createDataFrame([datetime(1, 1, 1, 0, 0, 0, 0, tzinfo=timezone.utc)], TimestampType()).toDF("test_column")
> >>> ts.select(F.min('test_column')).first()[0]
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/dataframe.py", 
> line 2762, in first
> return self.head()
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/dataframe.py", 
> line 2738, in head
> rs = self.head(1)
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/dataframe.py", 
> line 2740, in head
> return self.take(n)
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/dataframe.py", 
> line 1297, in take
> return self.limit(num).collect()
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/dataframe.py", 
> line 1198, in collect
> return list(_load_from_socket(sock_info, 
> BatchedSerializer(CPickleSerializer(
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/serializers.py", 
> line 152, in load_stream
> yield self._read_with_length(stream)
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/serializers.py", 
> line 174, in _read_with_length
> return self.loads(obj)
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/serializers.py", 
> line 472, in loads
> return cloudpickle.loads(obj, encoding=encoding)
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/types.py", line 
> 2010, in 
> return lambda *a: dataType.fromInternal(a)
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/types.py", line 
> 1018, in fromInternal
> values = [
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/types.py", line 
> 1019, in 
> f.fromInternal(v) if c else v
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/types.py", line 
> 667, in fromInternal
> return self.dataType.fromInternal(obj)
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/types.py", line 
> 279, in fromInternal
> return datetime.datetime.fromtimestamp(ts // 1000000).replace(microsecond=ts % 1000000)
> ValueError: year 0 is out of range
> {code}




[jira] [Assigned] (SPARK-41730) `min` fails on the minimal timestamp

2022-12-27 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-41730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41730:


Assignee: Apache Spark  (was: Max Gekk)

> `min` fails on the minimal timestamp
> 
>
> Key: SPARK-41730
> URL: https://issues.apache.org/jira/browse/SPARK-41730
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Major
>
> The code below demonstrates the issue:
> {code:python}
> >>> from datetime import datetime, timezone
> >>> from pyspark.sql.types import TimestampType
> >>> from pyspark.sql import functions as F
> >>> ts = spark.createDataFrame([datetime(1, 1, 1, 0, 0, 0, 0, tzinfo=timezone.utc)], TimestampType()).toDF("test_column")
> >>> ts.select(F.min('test_column')).first()[0]
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/dataframe.py", 
> line 2762, in first
> return self.head()
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/dataframe.py", 
> line 2738, in head
> rs = self.head(1)
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/dataframe.py", 
> line 2740, in head
> return self.take(n)
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/dataframe.py", 
> line 1297, in take
> return self.limit(num).collect()
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/dataframe.py", 
> line 1198, in collect
> return list(_load_from_socket(sock_info, 
> BatchedSerializer(CPickleSerializer(
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/serializers.py", 
> line 152, in load_stream
> yield self._read_with_length(stream)
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/serializers.py", 
> line 174, in _read_with_length
> return self.loads(obj)
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/serializers.py", 
> line 472, in loads
> return cloudpickle.loads(obj, encoding=encoding)
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/types.py", line 
> 2010, in 
> return lambda *a: dataType.fromInternal(a)
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/types.py", line 
> 1018, in fromInternal
> values = [
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/types.py", line 
> 1019, in 
> f.fromInternal(v) if c else v
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/types.py", line 
> 667, in fromInternal
> return self.dataType.fromInternal(obj)
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/types.py", line 
> 279, in fromInternal
> return datetime.datetime.fromtimestamp(ts // 1000000).replace(microsecond=ts % 1000000)
> ValueError: year 0 is out of range
> {code}




[jira] [Commented] (SPARK-41730) `min` fails on the minimal timestamp

2022-12-27 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-41730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652199#comment-17652199
 ] 

Apache Spark commented on SPARK-41730:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/39239

> `min` fails on the minimal timestamp
> 
>
> Key: SPARK-41730
> URL: https://issues.apache.org/jira/browse/SPARK-41730
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>
> The code below demonstrates the issue:
> {code:python}
> >>> from datetime import datetime, timezone
> >>> from pyspark.sql.types import TimestampType
> >>> from pyspark.sql import functions as F
> >>> ts = spark.createDataFrame([datetime(1, 1, 1, 0, 0, 0, 0, tzinfo=timezone.utc)], TimestampType()).toDF("test_column")
> >>> ts.select(F.min('test_column')).first()[0]
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/dataframe.py", 
> line 2762, in first
> return self.head()
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/dataframe.py", 
> line 2738, in head
> rs = self.head(1)
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/dataframe.py", 
> line 2740, in head
> return self.take(n)
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/dataframe.py", 
> line 1297, in take
> return self.limit(num).collect()
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/dataframe.py", 
> line 1198, in collect
> return list(_load_from_socket(sock_info, 
> BatchedSerializer(CPickleSerializer(
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/serializers.py", 
> line 152, in load_stream
> yield self._read_with_length(stream)
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/serializers.py", 
> line 174, in _read_with_length
> return self.loads(obj)
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/serializers.py", 
> line 472, in loads
> return cloudpickle.loads(obj, encoding=encoding)
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/types.py", line 
> 2010, in 
> return lambda *a: dataType.fromInternal(a)
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/types.py", line 
> 1018, in fromInternal
> values = [
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/types.py", line 
> 1019, in 
> f.fromInternal(v) if c else v
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/types.py", line 
> 667, in fromInternal
> return self.dataType.fromInternal(obj)
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/types.py", line 
> 279, in fromInternal
> return datetime.datetime.fromtimestamp(ts // 1000000).replace(microsecond=ts % 1000000)
> ValueError: year 0 is out of range
> {code}




[jira] [Commented] (SPARK-41440) Implement DataFrame.randomSplit

2022-12-27 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-41440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652198#comment-17652198
 ] 

Apache Spark commented on SPARK-41440:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/39240

> Implement DataFrame.randomSplit
> ---
>
> Key: SPARK-41440
> URL: https://issues.apache.org/jira/browse/SPARK-41440
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: jiaan.geng
>Priority: Major
> Fix For: 3.4.0
>
>





[jira] [Commented] (SPARK-41440) Implement DataFrame.randomSplit

2022-12-27 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-41440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652197#comment-17652197
 ] 

Apache Spark commented on SPARK-41440:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/39240

> Implement DataFrame.randomSplit
> ---
>
> Key: SPARK-41440
> URL: https://issues.apache.org/jira/browse/SPARK-41440
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: jiaan.geng
>Priority: Major
> Fix For: 3.4.0
>
>





[jira] [Created] (SPARK-41730) `min` fails on the minimal timestamp

2022-12-27 Thread Max Gekk (Jira)

Max Gekk created SPARK-41730:


 Summary: `min` fails on the minimal timestamp
 Key: SPARK-41730
 URL: https://issues.apache.org/jira/browse/SPARK-41730
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 3.4.0
Reporter: Max Gekk
Assignee: Max Gekk


The code below demonstrates the issue:

{code:python}
>>> from datetime import datetime, timezone
>>> from pyspark.sql.types import TimestampType
>>> from pyspark.sql import functions as F
>>> ts = spark.createDataFrame([datetime(1, 1, 1, 0, 0, 0, 0, tzinfo=timezone.utc)], TimestampType()).toDF("test_column")
>>> ts.select(F.min('test_column')).first()[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/dataframe.py", line 2762, in first
    return self.head()
  File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/dataframe.py", line 2738, in head
    rs = self.head(1)
  File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/dataframe.py", line 2740, in head
    return self.take(n)
  File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/dataframe.py", line 1297, in take
    return self.limit(num).collect()
  File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/dataframe.py", line 1198, in collect
    return list(_load_from_socket(sock_info, BatchedSerializer(CPickleSerializer())))
  File "/Users/maximgekk/proj/apache-spark/python/pyspark/serializers.py", line 152, in load_stream
    yield self._read_with_length(stream)
  File "/Users/maximgekk/proj/apache-spark/python/pyspark/serializers.py", line 174, in _read_with_length
    return self.loads(obj)
  File "/Users/maximgekk/proj/apache-spark/python/pyspark/serializers.py", line 472, in loads
    return cloudpickle.loads(obj, encoding=encoding)
  File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/types.py", line 2010, in <lambda>
    return lambda *a: dataType.fromInternal(a)
  File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/types.py", line 1018, in fromInternal
    values = [
  File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/types.py", line 1019, in <listcomp>
    f.fromInternal(v) if c else v
  File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/types.py", line 667, in fromInternal
    return self.dataType.fromInternal(obj)
  File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/types.py", line 279, in fromInternal
    return datetime.datetime.fromtimestamp(ts // 1000000).replace(microsecond=ts % 1000000)
ValueError: year 0 is out of range
{code}
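
The last frame of the traceback is a call to datetime.datetime.fromtimestamp,
which converts to the machine's local time zone. One plausible reading of the
error, offered here as an assumption rather than a confirmed diagnosis, is
that 0001-01-01T00:00:00Z shifted by a negative local offset lands before the
smallest year Python's datetime can represent. The sketch below is plain
Python with no Spark dependency and is not the change made in the linked pull
request. It assumes the internal value is microseconds since the Unix epoch,
and the helper names to_micros and from_micros_utc are hypothetical. It shows
that the same value round-trips when the conversion stays in UTC and uses
timedelta arithmetic.

```python
from datetime import datetime, timedelta, timezone

EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_micros(dt: datetime) -> int:
    """Microseconds since the Unix epoch for an aware datetime (assumed encoding)."""
    return (dt - EPOCH_UTC) // timedelta(microseconds=1)


def from_micros_utc(ts: int) -> datetime:
    """Inverse of to_micros that never leaves UTC, so no local offset is applied."""
    return EPOCH_UTC + timedelta(microseconds=ts)


min_ts = datetime(1, 1, 1, 0, 0, 0, 0, tzinfo=timezone.utc)
micros = to_micros(min_ts)
roundtrip = from_micros_utc(micros)
```

The result is an aware UTC datetime. Code that needs local-time values would
still have to decide how to handle instants that fall outside the
representable range once an offset is applied.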





[jira] [Commented] (SPARK-41649) Deduplicate docstrings in pyspark.sql.connect.window

2022-12-27 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-41649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652192#comment-17652192
 ] 

Apache Spark commented on SPARK-41649:
--

User 'bjornjorgensen' has created a pull request for this issue:
https://github.com/apache/spark/pull/39238

> Deduplicate docstrings in pyspark.sql.connect.window
> 
>
> Key: SPARK-41649
> URL: https://issues.apache.org/jira/browse/SPARK-41649
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>




