[jira] [Assigned] (SPARK-41414) Implement date/timestamp functions

2022-12-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41414:


Assignee: Apache Spark

> Implement date/timestamp functions
> --
>
> Key: SPARK-41414
> URL: https://issues.apache.org/jira/browse/SPARK-41414
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Apache Spark
>Priority: Major
>
> Implement date/timestamp functions



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41414) Implement date/timestamp functions

2022-12-06 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-41414:
-
Summary: Implement date/timestamp functions  (was: Implement data/timestamp 
functions)

> Implement date/timestamp functions
> --
>
> Key: SPARK-41414
> URL: https://issues.apache.org/jira/browse/SPARK-41414
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Implement date/timestamp functions






[jira] [Resolved] (SPARK-41410) Support PVC-oriented executor pod allocation

2022-12-06 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-41410.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38943
[https://github.com/apache/spark/pull/38943]

> Support PVC-oriented executor pod allocation
> 
>
> Key: SPARK-41410
> URL: https://issues.apache.org/jira/browse/SPARK-41410
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-41410) Support PVC-oriented executor pod allocation

2022-12-06 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-41410:
-

Assignee: Dongjoon Hyun

> Support PVC-oriented executor pod allocation
> 
>
> Key: SPARK-41410
> URL: https://issues.apache.org/jira/browse/SPARK-41410
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>







[jira] [Created] (SPARK-41414) Implement data/timestamp functions

2022-12-06 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-41414:


 Summary: Implement data/timestamp functions
 Key: SPARK-41414
 URL: https://issues.apache.org/jira/browse/SPARK-41414
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 3.4.0
Reporter: Xinrong Meng


Implement data/timestamp functions






[jira] [Commented] (SPARK-38918) Nested column pruning should filter out attributes that do not belong to the current relation

2022-12-06 Thread Wing Yew Poon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644041#comment-17644041
 ] 

Wing Yew Poon commented on SPARK-38918:
---

It seems that this is fixed in 3.2.2 
([7c0b9e6e|https://github.com/apache/spark/commit/7c0b9e6e6f680db45c1e2602b85753d9b521bb58]),
 but for some reason, 3.2.2 is not in the Fixed Version/s. Can we please 
correct this?
Probably because of this, this issue does not appear in 
https://spark.apache.org/releases/spark-release-3-2-2.html.

> Nested column pruning should filter out attributes that do not belong to the 
> current relation
> -
>
> Key: SPARK-38918
> URL: https://issues.apache.org/jira/browse/SPARK-38918
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
> Fix For: 3.1.3, 3.0.4, 3.3.0, 3.4.0
>
>
> `SchemaPruning` currently does not check if the root field of a nested column 
> belongs to the current relation. This can happen when the filter contains 
> correlated subqueries, where the children field can contain attributes from 
> both the inner and the outer query.
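
The check described above can be sketched in a few lines of Python. This is a toy model, not Spark's `SchemaPruning` code: the `NestedField` class, the `fields_for_relation` helper, and the column names are all hypothetical, and attributes are matched by name here, whereas Spark resolves attributes by expression ID.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class NestedField:
    """A nested column reference (hypothetical toy type)."""
    root: str  # root attribute name, e.g. "inner_col"
    path: str  # nested path under the root, e.g. "a.b"


def fields_for_relation(
    fields: List[NestedField], relation_output: Set[str]
) -> List[NestedField]:
    """Keep only nested fields whose root attribute is produced by this relation."""
    return [f for f in fields if f.root in relation_output]


# A filter containing a correlated subquery may reference attributes from
# both the inner and the outer query; only the inner ones belong here.
referenced = [NestedField("inner_col", "a.b"), NestedField("outer_col", "x.y")]
kept = fields_for_relation(referenced, relation_output={"inner_col"})
```

Only the field rooted at `inner_col` survives the filter, so pruning for this relation never sees the outer query's attribute.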






[jira] [Created] (SPARK-41413) Storage-Partitioned Join should avoid shuffle when partition keys mismatch, but join expressions are compatible

2022-12-06 Thread Chao Sun (Jira)
Chao Sun created SPARK-41413:


 Summary: Storage-Partitioned Join should avoid shuffle when 
partition keys mismatch, but join expressions are compatible
 Key: SPARK-41413
 URL: https://issues.apache.org/jira/browse/SPARK-41413
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.3.1
Reporter: Chao Sun


Currently, when checking whether the two sides of a Storage-Partitioned Join are 
compatible, we require that both the partition expressions and the partition 
keys be compatible. However, this condition could be relaxed so that we only 
require the former. When the latter are not compatible, we can 
calculate a common superset of keys, push that information down to both sides 
of the join, and use empty partitions for the missing keys.
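
As a rough illustration of the proposal, the following Python sketch aligns two sides over the union of their partition keys and substitutes an empty partition wherever a key is missing. It is a toy model under stated assumptions, not Spark's implementation: partitions are plain dictionaries keyed by partition value, and `align_partitions` is a hypothetical helper name.

```python
from typing import Dict, List, Tuple

Partition = List[dict]  # rows of one partition (toy representation)


def align_partitions(
    left: Dict[int, Partition],
    right: Dict[int, Partition],
) -> List[Tuple[int, Partition, Partition]]:
    """Pair up partitions from both sides over the union of their keys.

    A key missing on one side is paired with an empty partition, so both
    sides expose the same key set and can be joined partition by partition.
    """
    common_keys = sorted(set(left) | set(right))
    return [(k, left.get(k, []), right.get(k, [])) for k in common_keys]


# The two sides agree on the partition expression but not on the keys.
left = {1: [{"id": 1, "v": "a"}], 2: [{"id": 2, "v": "b"}]}
right = {2: [{"id": 2, "w": "x"}], 3: [{"id": 3, "w": "y"}]}
aligned = align_partitions(left, right)
```

In this sketch, key 1 is paired with an empty right-hand partition and key 3 with an empty left-hand partition, while key 2 has data on both sides.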






[jira] [Created] (SPARK-41412) Implement `Cast`

2022-12-06 Thread Rui Wang (Jira)
Rui Wang created SPARK-41412:


 Summary: Implement `Cast`
 Key: SPARK-41412
 URL: https://issues.apache.org/jira/browse/SPARK-41412
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Rui Wang
Assignee: Rui Wang









[jira] [Updated] (SPARK-41411) Multi-Stateful Operator watermark support bug fix

2022-12-06 Thread Wei Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Liu updated SPARK-41411:

Affects Version/s: (was: 3.3.2)

> Multi-Stateful Operator watermark support bug fix
> -
>
> Key: SPARK-41411
> URL: https://issues.apache.org/jira/browse/SPARK-41411
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Wei Liu
>Priority: Major
>
> A typo in passing the event-time watermark to `StreamingSymmetricHashJoinExec` 
> causes logic errors. With the bug, the query runs with no error reported 
> but produces incorrect results.






[jira] (SPARK-41411) Multi-Stateful Operator watermark support bug fix

2022-12-06 Thread Wei Liu (Jira)


[ https://issues.apache.org/jira/browse/SPARK-41411 ]


Wei Liu deleted comment on SPARK-41411:
-

was (Author: JIRAUSER295948):
PR: https://github.com/apache/spark/pull/38945

> Multi-Stateful Operator watermark support bug fix
> -
>
> Key: SPARK-41411
> URL: https://issues.apache.org/jira/browse/SPARK-41411
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.3.2, 3.4.0
>Reporter: Wei Liu
>Priority: Major
>
> A typo in passing the event-time watermark to `StreamingSymmetricHashJoinExec` 
> causes logic errors. With the bug, the query runs with no error reported 
> but produces incorrect results.






[jira] [Commented] (SPARK-41411) Multi-Stateful Operator watermark support bug fix

2022-12-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644008#comment-17644008
 ] 

Apache Spark commented on SPARK-41411:
--

User 'WweiL' has created a pull request for this issue:
https://github.com/apache/spark/pull/38945

> Multi-Stateful Operator watermark support bug fix
> -
>
> Key: SPARK-41411
> URL: https://issues.apache.org/jira/browse/SPARK-41411
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.3.2, 3.4.0
>Reporter: Wei Liu
>Priority: Major
>
> A typo in passing the event-time watermark to `StreamingSymmetricHashJoinExec` 
> causes logic errors. With the bug, the query runs with no error reported 
> but produces incorrect results.






[jira] [Commented] (SPARK-41411) Multi-Stateful Operator watermark support bug fix

2022-12-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644007#comment-17644007
 ] 

Apache Spark commented on SPARK-41411:
--

User 'WweiL' has created a pull request for this issue:
https://github.com/apache/spark/pull/38945

> Multi-Stateful Operator watermark support bug fix
> -
>
> Key: SPARK-41411
> URL: https://issues.apache.org/jira/browse/SPARK-41411
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.3.2, 3.4.0
>Reporter: Wei Liu
>Priority: Major
>
> A typo in passing the event-time watermark to `StreamingSymmetricHashJoinExec` 
> causes logic errors. With the bug, the query runs with no error reported 
> but produces incorrect results.






[jira] [Assigned] (SPARK-41411) Multi-Stateful Operator watermark support bug fix

2022-12-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41411:


Assignee: (was: Apache Spark)

> Multi-Stateful Operator watermark support bug fix
> -
>
> Key: SPARK-41411
> URL: https://issues.apache.org/jira/browse/SPARK-41411
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.3.2, 3.4.0
>Reporter: Wei Liu
>Priority: Major
>
> A typo in passing the event-time watermark to `StreamingSymmetricHashJoinExec` 
> causes logic errors. With the bug, the query runs with no error reported 
> but produces incorrect results.






[jira] [Assigned] (SPARK-41411) Multi-Stateful Operator watermark support bug fix

2022-12-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41411:


Assignee: Apache Spark

> Multi-Stateful Operator watermark support bug fix
> -
>
> Key: SPARK-41411
> URL: https://issues.apache.org/jira/browse/SPARK-41411
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.3.2, 3.4.0
>Reporter: Wei Liu
>Assignee: Apache Spark
>Priority: Major
>
> A typo in passing the event-time watermark to `StreamingSymmetricHashJoinExec` 
> causes logic errors. With the bug, the query runs with no error reported 
> but produces incorrect results.






[jira] [Commented] (SPARK-41411) Multi-Stateful Operator watermark support bug fix

2022-12-06 Thread Wei Liu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644006#comment-17644006
 ] 

Wei Liu commented on SPARK-41411:
-

PR: https://github.com/apache/spark/pull/38945

> Multi-Stateful Operator watermark support bug fix
> -
>
> Key: SPARK-41411
> URL: https://issues.apache.org/jira/browse/SPARK-41411
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.3.2, 3.4.0
>Reporter: Wei Liu
>Priority: Major
>
> A typo in passing the event-time watermark to `StreamingSymmetricHashJoinExec` 
> causes logic errors. With the bug, the query runs with no error reported 
> but produces incorrect results.






[jira] [Created] (SPARK-41411) Multi-Stateful Operator watermark support bug fix

2022-12-06 Thread Wei Liu (Jira)
Wei Liu created SPARK-41411:
---

 Summary: Multi-Stateful Operator watermark support bug fix
 Key: SPARK-41411
 URL: https://issues.apache.org/jira/browse/SPARK-41411
 Project: Spark
  Issue Type: Bug
  Components: Structured Streaming
Affects Versions: 3.3.2, 3.4.0
Reporter: Wei Liu


A typo in passing the event-time watermark to `StreamingSymmetricHashJoinExec` 
causes logic errors. With the bug, the query runs with no error reported 
but produces incorrect results.






[jira] [Commented] (SPARK-41369) Refactor connect directory structure

2022-12-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643976#comment-17643976
 ] 

Apache Spark commented on SPARK-41369:
--

User 'hvanhovell' has created a pull request for this issue:
https://github.com/apache/spark/pull/38944

> Refactor connect directory structure
> 
>
> Key: SPARK-41369
> URL: https://issues.apache.org/jira/browse/SPARK-41369
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.3.2, 3.4.0
>Reporter: Venkata Sai Akhil Gudesa
>Priority: Major
>
> Currently, `spark/connector/connect/` is a single module that contains both 
> the "server"/service as well as the protobuf definitions.
> However, this module can be split into multiple modules - "server" and 
> "common". This brings the advantage of separating out the protobuf generation 
> from the core "server" module for efficient reuse.






[jira] [Assigned] (SPARK-41410) Support PVC-oriented executor pod allocation

2022-12-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41410:


Assignee: (was: Apache Spark)

> Support PVC-oriented executor pod allocation
> 
>
> Key: SPARK-41410
> URL: https://issues.apache.org/jira/browse/SPARK-41410
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>







[jira] [Commented] (SPARK-41410) Support PVC-oriented executor pod allocation

2022-12-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643973#comment-17643973
 ] 

Apache Spark commented on SPARK-41410:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/38943

> Support PVC-oriented executor pod allocation
> 
>
> Key: SPARK-41410
> URL: https://issues.apache.org/jira/browse/SPARK-41410
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>







[jira] [Assigned] (SPARK-41410) Support PVC-oriented executor pod allocation

2022-12-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41410:


Assignee: Apache Spark

> Support PVC-oriented executor pod allocation
> 
>
> Key: SPARK-41410
> URL: https://issues.apache.org/jira/browse/SPARK-41410
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Created] (SPARK-41410) Support PVC-oriented executor pod allocation

2022-12-06 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-41410:
-

 Summary: Support PVC-oriented executor pod allocation
 Key: SPARK-41410
 URL: https://issues.apache.org/jira/browse/SPARK-41410
 Project: Spark
  Issue Type: New Feature
  Components: Kubernetes
Affects Versions: 3.4.0
Reporter: Dongjoon Hyun









[jira] [Assigned] (SPARK-41393) Upgrade slf4j to 2.0.5

2022-12-06 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-41393:
-

Assignee: Yang Jie

> Upgrade slf4j to 2.0.5
> --
>
> Key: SPARK-41393
> URL: https://issues.apache.org/jira/browse/SPARK-41393
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
>
> https://www.slf4j.org/news.html#2.0.5






[jira] [Resolved] (SPARK-41393) Upgrade slf4j to 2.0.5

2022-12-06 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-41393.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38918
[https://github.com/apache/spark/pull/38918]

> Upgrade slf4j to 2.0.5
> --
>
> Key: SPARK-41393
> URL: https://issues.apache.org/jira/browse/SPARK-41393
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 3.4.0
>
>
> https://www.slf4j.org/news.html#2.0.5






[jira] [Resolved] (SPARK-41398) Relax constraints on Storage-Partitioned Join when partition keys after runtime filtering do not match

2022-12-06 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-41398.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38924
[https://github.com/apache/spark/pull/38924]

> Relax constraints on Storage-Partitioned Join when partition keys after 
> runtime filtering do not match
> --
>
> Key: SPARK-41398
> URL: https://issues.apache.org/jira/browse/SPARK-41398
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.1
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-41398) Relax constraints on Storage-Partitioned Join when partition keys after runtime filtering do not match

2022-12-06 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-41398:
-

Assignee: Chao Sun

> Relax constraints on Storage-Partitioned Join when partition keys after 
> runtime filtering do not match
> --
>
> Key: SPARK-41398
> URL: https://issues.apache.org/jira/browse/SPARK-41398
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.1
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>







[jira] [Commented] (SPARK-41409) Reuse `WRONG_NUM_ARGS` instead of `_LEGACY_ERROR_TEMP_1043`

2022-12-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643930#comment-17643930
 ] 

Apache Spark commented on SPARK-41409:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/38940

> Reuse `WRONG_NUM_ARGS` instead of `_LEGACY_ERROR_TEMP_1043`
> ---
>
> Key: SPARK-41409
> URL: https://issues.apache.org/jira/browse/SPARK-41409
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>







[jira] [Assigned] (SPARK-41409) Reuse `WRONG_NUM_ARGS` instead of `_LEGACY_ERROR_TEMP_1043`

2022-12-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41409:


Assignee: Apache Spark

> Reuse `WRONG_NUM_ARGS` instead of `_LEGACY_ERROR_TEMP_1043`
> ---
>
> Key: SPARK-41409
> URL: https://issues.apache.org/jira/browse/SPARK-41409
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Minor
>







[jira] [Commented] (SPARK-41409) Reuse `WRONG_NUM_ARGS` instead of `_LEGACY_ERROR_TEMP_1043`

2022-12-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643929#comment-17643929
 ] 

Apache Spark commented on SPARK-41409:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/38940

> Reuse `WRONG_NUM_ARGS` instead of `_LEGACY_ERROR_TEMP_1043`
> ---
>
> Key: SPARK-41409
> URL: https://issues.apache.org/jira/browse/SPARK-41409
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>







[jira] [Assigned] (SPARK-41409) Reuse `WRONG_NUM_ARGS` instead of `_LEGACY_ERROR_TEMP_1043`

2022-12-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41409:


Assignee: (was: Apache Spark)

> Reuse `WRONG_NUM_ARGS` instead of `_LEGACY_ERROR_TEMP_1043`
> ---
>
> Key: SPARK-41409
> URL: https://issues.apache.org/jira/browse/SPARK-41409
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>







[jira] [Created] (SPARK-41409) Reuse `WRONG_NUM_ARGS` instead of `_LEGACY_ERROR_TEMP_1043`

2022-12-06 Thread Yang Jie (Jira)
Yang Jie created SPARK-41409:


 Summary: Reuse `WRONG_NUM_ARGS` instead of 
`_LEGACY_ERROR_TEMP_1043`
 Key: SPARK-41409
 URL: https://issues.apache.org/jira/browse/SPARK-41409
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Yang Jie









[jira] [Assigned] (SPARK-41408) Upgrade scala-maven-plugin to 4.8.0

2022-12-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41408:


Assignee: (was: Apache Spark)

> Upgrade scala-maven-plugin to 4.8.0
> ---
>
> Key: SPARK-41408
> URL: https://issues.apache.org/jira/browse/SPARK-41408
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>







[jira] [Assigned] (SPARK-41408) Upgrade scala-maven-plugin to 4.8.0

2022-12-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41408:


Assignee: Apache Spark

> Upgrade scala-maven-plugin to 4.8.0
> ---
>
> Key: SPARK-41408
> URL: https://issues.apache.org/jira/browse/SPARK-41408
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Minor
>







[jira] [Commented] (SPARK-41408) Upgrade scala-maven-plugin to 4.8.0

2022-12-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643892#comment-17643892
 ] 

Apache Spark commented on SPARK-41408:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/38936

> Upgrade scala-maven-plugin to 4.8.0
> ---
>
> Key: SPARK-41408
> URL: https://issues.apache.org/jira/browse/SPARK-41408
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>







[jira] [Created] (SPARK-41408) Upgrade scala-maven-plugin to 4.8.0

2022-12-06 Thread Yang Jie (Jira)
Yang Jie created SPARK-41408:


 Summary: Upgrade scala-maven-plugin to 4.8.0
 Key: SPARK-41408
 URL: https://issues.apache.org/jira/browse/SPARK-41408
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.4.0
Reporter: Yang Jie









[jira] [Commented] (SPARK-41407) Pull out v1 write to WriteFiles

2022-12-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643861#comment-17643861
 ] 

Apache Spark commented on SPARK-41407:
--

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/38939

> Pull out v1 write to WriteFiles
> ---
>
> Key: SPARK-41407
> URL: https://issues.apache.org/jira/browse/SPARK-41407
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: XiDuo You
>Priority: Major
>
> Add a new plan, WriteFiles, to perform the file writes for v1 writes.
> We can make v1 writes support whole-stage codegen in the future.






[jira] [Assigned] (SPARK-41407) Pull out v1 write to WriteFiles

2022-12-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41407:


Assignee: (was: Apache Spark)

> Pull out v1 write to WriteFiles
> ---
>
> Key: SPARK-41407
> URL: https://issues.apache.org/jira/browse/SPARK-41407
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: XiDuo You
>Priority: Major
>
> Add a new plan, WriteFiles, to perform the file writes for v1 writes.
> We can make v1 writes support whole-stage codegen in the future.






[jira] [Assigned] (SPARK-41407) Pull out v1 write to WriteFiles

2022-12-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41407:


Assignee: Apache Spark

> Pull out v1 write to WriteFiles
> ---
>
> Key: SPARK-41407
> URL: https://issues.apache.org/jira/browse/SPARK-41407
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: XiDuo You
>Assignee: Apache Spark
>Priority: Major
>
> Add a new plan, WriteFiles, to write files for v1 writes.
> We can then make v1 writes support whole-stage codegen in the future.






[jira] [Commented] (SPARK-41407) Pull out v1 write to WriteFiles

2022-12-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643859#comment-17643859
 ] 

Apache Spark commented on SPARK-41407:
--

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/38939

> Pull out v1 write to WriteFiles
> ---
>
> Key: SPARK-41407
> URL: https://issues.apache.org/jira/browse/SPARK-41407
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: XiDuo You
>Priority: Major
>
> Add a new plan, WriteFiles, to write files for v1 writes.
> We can then make v1 writes support whole-stage codegen in the future.






[jira] [Created] (SPARK-41407) Pull out v1 write to WriteFiles

2022-12-06 Thread XiDuo You (Jira)
XiDuo You created SPARK-41407:
-

 Summary: Pull out v1 write to WriteFiles
 Key: SPARK-41407
 URL: https://issues.apache.org/jira/browse/SPARK-41407
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0
Reporter: XiDuo You


Add a new plan, WriteFiles, to write files for v1 writes.

We can then make v1 writes support whole-stage codegen in the future.






[jira] [Commented] (SPARK-41403) Implement DataFrame.describe

2022-12-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643850#comment-17643850
 ] 

Apache Spark commented on SPARK-41403:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/38938

> Implement DataFrame.describe
> 
>
> Key: SPARK-41403
> URL: https://issues.apache.org/jira/browse/SPARK-41403
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>







[jira] [Assigned] (SPARK-41403) Implement DataFrame.describe

2022-12-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41403:


Assignee: (was: Apache Spark)

> Implement DataFrame.describe
> 
>
> Key: SPARK-41403
> URL: https://issues.apache.org/jira/browse/SPARK-41403
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>







[jira] [Assigned] (SPARK-41403) Implement DataFrame.describe

2022-12-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41403:


Assignee: Apache Spark

> Implement DataFrame.describe
> 
>
> Key: SPARK-41403
> URL: https://issues.apache.org/jira/browse/SPARK-41403
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Assigned] (SPARK-41406) Refactor error message for `NUM_COLUMNS_MISMATCH` to make it more generic

2022-12-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41406:


Assignee: (was: Apache Spark)

> Refactor error message for `NUM_COLUMNS_MISMATCH` to make it more generic
> -
>
> Key: SPARK-41406
> URL: https://issues.apache.org/jira/browse/SPARK-41406
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Priority: Minor
>







[jira] [Commented] (SPARK-41406) Refactor error message for `NUM_COLUMNS_MISMATCH` to make it more generic

2022-12-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643845#comment-17643845
 ] 

Apache Spark commented on SPARK-41406:
--

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/38937

> Refactor error message for `NUM_COLUMNS_MISMATCH` to make it more generic
> -
>
> Key: SPARK-41406
> URL: https://issues.apache.org/jira/browse/SPARK-41406
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Priority: Minor
>







[jira] [Assigned] (SPARK-41406) Refactor error message for `NUM_COLUMNS_MISMATCH` to make it more generic

2022-12-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41406:


Assignee: Apache Spark

> Refactor error message for `NUM_COLUMNS_MISMATCH` to make it more generic
> -
>
> Key: SPARK-41406
> URL: https://issues.apache.org/jira/browse/SPARK-41406
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Assignee: Apache Spark
>Priority: Minor
>







[jira] [Commented] (SPARK-28869) Roll over event log files

2022-12-06 Thread Ranga Reddy (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643804#comment-17643804
 ] 

Ranga Reddy commented on SPARK-28869:
-

Hi [~kabhwan] 

I have enabled event log rolling for the Spark Streaming network word count 
example, but the event log files are not compacted. 

*Configuration Parameters:*
{code:java}
spark.eventLog.rolling.enabled=true
spark.eventLog.rolling.maxFileSize=10m
spark.history.fs.eventLog.rolling.maxFilesToRetain=2
spark.history.fs.cleaner.interval=1800{code}
*Event log file list:*

[^application_1670216197043_0012.log]

Could you please check the issue?
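
The four settings above fall into two groups by key prefix. The sketch below only sorts them into those groups; it is not Spark code. It assumes, based on the Spark monitoring documentation rather than on anything stated in this thread, that `spark.eventLog.*` keys are read by the running application while `spark.history.*` keys are read by the Spark History Server (SHS) process. The class name `EventLogConfSketch` and its helper methods are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EventLogConfSketch {
    // The four settings exactly as reported in the comment above.
    static Map<String, String> reported() {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("spark.eventLog.rolling.enabled", "true");
        conf.put("spark.eventLog.rolling.maxFileSize", "10m");
        conf.put("spark.history.fs.eventLog.rolling.maxFilesToRetain", "2");
        conf.put("spark.history.fs.cleaner.interval", "1800");
        return conf;
    }

    // Returns the entries whose key starts with the given prefix, in insertion order.
    static Map<String, String> withPrefix(Map<String, String> conf, String prefix) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> conf = reported();
        // Assumption: keys in the first group are read by the application (driver),
        // keys in the second group by the History Server.
        System.out.println("application side:    " + withPrefix(conf, "spark.eventLog.").keySet());
        System.out.println("history server side: " + withPrefix(conf, "spark.history.").keySet());
    }
}
```

If that assumption holds, one thing to check is whether the two `spark.history.*` values are present in the History Server's own configuration and not only in the application's.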

> Roll over event log files
> -
>
> Key: SPARK-28869
> URL: https://issues.apache.org/jira/browse/SPARK-28869
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: application_1670216197043_0012.log
>
>
> This issue tracks the effort to roll over event log files in the driver and 
> let the SHS replay the multiple event log files correctly.
> This issue doesn't deal with the overall size of the event log, and makes no 
> guarantee about when old event log files are deleted.






[jira] [Updated] (SPARK-28869) Roll over event log files

2022-12-06 Thread Ranga Reddy (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ranga Reddy updated SPARK-28869:

Attachment: application_1670216197043_0012.log

> Roll over event log files
> -
>
> Key: SPARK-28869
> URL: https://issues.apache.org/jira/browse/SPARK-28869
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: application_1670216197043_0012.log
>
>
> This issue tracks the effort to roll over event log files in the driver and 
> let the SHS replay the multiple event log files correctly.
> This issue doesn't deal with the overall size of the event log, and makes no 
> guarantee about when old event log files are deleted.






[jira] [Assigned] (SPARK-41319) when-otherwise support

2022-12-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41319:


Assignee: (was: Apache Spark)

> when-otherwise support
> --
>
> Key: SPARK-41319
> URL: https://issues.apache.org/jira/browse/SPARK-41319
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>
> 1. Add a protobuf message for the expression 'CaseWhen';
> 2. Support the 'Column.{when, otherwise}' methods in Spark Connect.






[jira] [Assigned] (SPARK-41319) when-otherwise support

2022-12-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41319:


Assignee: Apache Spark

> when-otherwise support
> --
>
> Key: SPARK-41319
> URL: https://issues.apache.org/jira/browse/SPARK-41319
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Apache Spark
>Priority: Major
>
> 1. Add a protobuf message for the expression 'CaseWhen';
> 2. Support the 'Column.{when, otherwise}' methods in Spark Connect.






[jira] [Commented] (SPARK-41319) when-otherwise support

2022-12-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643775#comment-17643775
 ] 

Apache Spark commented on SPARK-41319:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/38935

> when-otherwise support
> --
>
> Key: SPARK-41319
> URL: https://issues.apache.org/jira/browse/SPARK-41319
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>
> 1. Add a protobuf message for the expression 'CaseWhen';
> 2. Support the 'Column.{when, otherwise}' methods in Spark Connect.






[jira] [Commented] (SPARK-41319) when-otherwise support

2022-12-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643776#comment-17643776
 ] 

Apache Spark commented on SPARK-41319:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/38935

> when-otherwise support
> --
>
> Key: SPARK-41319
> URL: https://issues.apache.org/jira/browse/SPARK-41319
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>
> 1. Add a protobuf message for the expression 'CaseWhen';
> 2. Support the 'Column.{when, otherwise}' methods in Spark Connect.






[jira] [Commented] (SPARK-41392) spark builds against hadoop trunk/3.4.0-SNAPSHOT fail in scala-maven plugin

2022-12-06 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643774#comment-17643774
 ] 

Steve Loughran commented on SPARK-41392:


This may relate to the Bouncy Castle 1.68 update of HADOOP-1756, but that update is also in 
the 3.3.5/3.3 branches and Spark is happy there, so there must be more to it.

> spark builds against hadoop trunk/3.4.0-SNAPSHOT fail in scala-maven plugin
> ---
>
> Key: SPARK-41392
> URL: https://issues.apache.org/jira/browse/SPARK-41392
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Priority: Minor
>
> on hadoop trunk (but not the 3.3.x line), spark builds fail with a CNFE
> {code}
> net.alchim31.maven:scala-maven-plugin:4.7.2:testCompile: 
> org/bouncycastle/jce/provider/BouncyCastleProvider
> {code}
> full stack
> {code}
> [ERROR] Failed to execute goal 
> net.alchim31.maven:scala-maven-plugin:4.7.2:testCompile 
> (scala-test-compile-first) on project spark-sql_2.12: Execution 
> scala-test-compile-first of goal 
> net.alchim31.maven:scala-maven-plugin:4.7.2:testCompile failed: A required 
> class was missing while executing 
> net.alchim31.maven:scala-maven-plugin:4.7.2:testCompile: 
> org/bouncycastle/jce/provider/BouncyCastleProvider
> [ERROR] -
> [ERROR] realm =plugin>net.alchim31.maven:scala-maven-plugin:4.7.2
> [ERROR] strategy = org.codehaus.plexus.classworlds.strategy.SelfFirstStrategy
> [ERROR] urls[0] = 
> file:/Users/stevel/.m2/repository/net/alchim31/maven/scala-maven-plugin/4.7.2/scala-maven-plugin-4.7.2.jar
> [ERROR] urls[1] = 
> file:/Users/stevel/.m2/repository/org/apache/maven/shared/maven-dependency-tree/3.2.0/maven-dependency-tree-3.2.0.jar
> [ERROR] urls[2] = 
> file:/Users/stevel/.m2/repository/org/eclipse/aether/aether-util/1.0.0.v20140518/aether-util-1.0.0.v20140518.jar
> [ERROR] urls[3] = 
> file:/Users/stevel/.m2/repository/org/apache/maven/reporting/maven-reporting-api/3.1.1/maven-reporting-api-3.1.1.jar
> [ERROR] urls[4] = 
> file:/Users/stevel/.m2/repository/org/apache/maven/doxia/doxia-sink-api/1.11.1/doxia-sink-api-1.11.1.jar
> [ERROR] urls[5] = 
> file:/Users/stevel/.m2/repository/org/apache/maven/doxia/doxia-logging-api/1.11.1/doxia-logging-api-1.11.1.jar
> [ERROR] urls[6] = 
> file:/Users/stevel/.m2/repository/org/apache/maven/maven-archiver/3.6.0/maven-archiver-3.6.0.jar
> [ERROR] urls[7] = 
> file:/Users/stevel/.m2/repository/org/codehaus/plexus/plexus-io/3.4.0/plexus-io-3.4.0.jar
> [ERROR] urls[8] = 
> file:/Users/stevel/.m2/repository/org/codehaus/plexus/plexus-interpolation/1.26/plexus-interpolation-1.26.jar
> [ERROR] urls[9] = 
> file:/Users/stevel/.m2/repository/org/apache/commons/commons-exec/1.3/commons-exec-1.3.jar
> [ERROR] urls[10] = 
> file:/Users/stevel/.m2/repository/org/codehaus/plexus/plexus-utils/3.4.2/plexus-utils-3.4.2.jar
> [ERROR] urls[11] = 
> file:/Users/stevel/.m2/repository/org/codehaus/plexus/plexus-archiver/4.5.0/plexus-archiver-4.5.0.jar
> [ERROR] urls[12] = 
> file:/Users/stevel/.m2/repository/commons-io/commons-io/2.11.0/commons-io-2.11.0.jar
> [ERROR] urls[13] = 
> file:/Users/stevel/.m2/repository/org/apache/commons/commons-compress/1.21/commons-compress-1.21.jar
> [ERROR] urls[14] = 
> file:/Users/stevel/.m2/repository/org/iq80/snappy/snappy/0.4/snappy-0.4.jar
> [ERROR] urls[15] = 
> file:/Users/stevel/.m2/repository/org/tukaani/xz/1.9/xz-1.9.jar
> [ERROR] urls[16] = 
> file:/Users/stevel/.m2/repository/com/github/luben/zstd-jni/1.5.2-4/zstd-jni-1.5.2-4.jar
> [ERROR] urls[17] = 
> file:/Users/stevel/.m2/repository/org/scala-sbt/zinc_2.13/1.7.1/zinc_2.13-1.7.1.jar
> [ERROR] urls[18] = 
> file:/Users/stevel/.m2/repository/org/scala-lang/scala-library/2.13.8/scala-library-2.13.8.jar
> [ERROR] urls[19] = 
> file:/Users/stevel/.m2/repository/org/scala-sbt/zinc-core_2.13/1.7.1/zinc-core_2.13-1.7.1.jar
> [ERROR] urls[20] = 
> file:/Users/stevel/.m2/repository/org/scala-sbt/zinc-apiinfo_2.13/1.7.1/zinc-apiinfo_2.13-1.7.1.jar
> [ERROR] urls[21] = 
> file:/Users/stevel/.m2/repository/org/scala-sbt/compiler-bridge_2.13/1.7.1/compiler-bridge_2.13-1.7.1.jar
> [ERROR] urls[22] = 
> file:/Users/stevel/.m2/repository/org/scala-sbt/zinc-classpath_2.13/1.7.1/zinc-classpath_2.13-1.7.1.jar
> [ERROR] urls[23] = 
> file:/Users/stevel/.m2/repository/org/scala-lang/scala-compiler/2.13.8/scala-compiler-2.13.8.jar
> [ERROR] urls[24] = 
> file:/Users/stevel/.m2/repository/org/scala-sbt/compiler-interface/1.7.1/compiler-interface-1.7.1.jar
> [ERROR] urls[25] = 
> file:/Users/stevel/.m2/repository/org/scala-sbt/util-interface/1.7.0/util-interface-1.7.0.jar
> [ERROR] urls[26] = 
> 

[jira] [Created] (SPARK-41406) Refactor error message for `NUM_COLUMNS_MISMATCH` to make it more generic

2022-12-06 Thread BingKun Pan (Jira)
BingKun Pan created SPARK-41406:
---

 Summary: Refactor error message for `NUM_COLUMNS_MISMATCH` to make 
it more generic
 Key: SPARK-41406
 URL: https://issues.apache.org/jira/browse/SPARK-41406
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: BingKun Pan









[jira] [Assigned] (SPARK-41121) Upgrade sbt-assembly from 1.2.0 to 2.0.0

2022-12-06 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-41121:
-

Assignee: BingKun Pan

> Upgrade sbt-assembly from 1.2.0 to 2.0.0
> 
>
> Key: SPARK-41121
> URL: https://issues.apache.org/jira/browse/SPARK-41121
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>







[jira] [Resolved] (SPARK-41121) Upgrade sbt-assembly from 1.2.0 to 2.0.0

2022-12-06 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-41121.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38637
[https://github.com/apache/spark/pull/38637]

> Upgrade sbt-assembly from 1.2.0 to 2.0.0
> 
>
> Key: SPARK-41121
> URL: https://issues.apache.org/jira/browse/SPARK-41121
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
> Fix For: 3.4.0
>
>







[jira] [Commented] (SPARK-41317) PySpark write API for Spark Connect

2022-12-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643747#comment-17643747
 ] 

Apache Spark commented on SPARK-41317:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/38934

> PySpark write API for Spark Connect
> ---
>
> Key: SPARK-41317
> URL: https://issues.apache.org/jira/browse/SPARK-41317
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Commented] (SPARK-41317) PySpark write API for Spark Connect

2022-12-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643746#comment-17643746
 ] 

Apache Spark commented on SPARK-41317:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/38934

> PySpark write API for Spark Connect
> ---
>
> Key: SPARK-41317
> URL: https://issues.apache.org/jira/browse/SPARK-41317
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Commented] (SPARK-41405) centralize the column resolution logic

2022-12-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643743#comment-17643743
 ] 

Apache Spark commented on SPARK-41405:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/3

> centralize the column resolution logic
> --
>
> Key: SPARK-41405
> URL: https://issues.apache.org/jira/browse/SPARK-41405
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Priority: Major
>







[jira] [Assigned] (SPARK-41405) centralize the column resolution logic

2022-12-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41405:


Assignee: (was: Apache Spark)

> centralize the column resolution logic
> --
>
> Key: SPARK-41405
> URL: https://issues.apache.org/jira/browse/SPARK-41405
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Priority: Major
>







[jira] [Assigned] (SPARK-41405) centralize the column resolution logic

2022-12-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41405:


Assignee: Apache Spark

> centralize the column resolution logic
> --
>
> Key: SPARK-41405
> URL: https://issues.apache.org/jira/browse/SPARK-41405
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Commented] (SPARK-41405) centralize the column resolution logic

2022-12-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643742#comment-17643742
 ] 

Apache Spark commented on SPARK-41405:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/3

> centralize the column resolution logic
> --
>
> Key: SPARK-41405
> URL: https://issues.apache.org/jira/browse/SPARK-41405
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Priority: Major
>







[jira] [Created] (SPARK-41405) centralize the column resolution logic

2022-12-06 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-41405:
---

 Summary: centralize the column resolution logic
 Key: SPARK-41405
 URL: https://issues.apache.org/jira/browse/SPARK-41405
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0
Reporter: Wenchen Fan









[jira] [Commented] (SPARK-41403) Implement DataFrame.describe

2022-12-06 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643733#comment-17643733
 ] 

jiaan.geng commented on SPARK-41403:


[~podongfeng] Thank you for your ping. I will try to do this!

> Implement DataFrame.describe
> 
>
> Key: SPARK-41403
> URL: https://issues.apache.org/jira/browse/SPARK-41403
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>







[jira] [Assigned] (SPARK-41404) Support `ColumnarBatchSuite#testRandomRows` to test more primitive dataType

2022-12-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41404:


Assignee: (was: Apache Spark)

> Support `ColumnarBatchSuite#testRandomRows` to test more primitive dataType
> ---
>
> Key: SPARK-41404
> URL: https://issues.apache.org/jira/browse/SPARK-41404
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>







[jira] [Commented] (SPARK-41404) Support `ColumnarBatchSuite#testRandomRows` to test more primitive dataType

2022-12-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643718#comment-17643718
 ] 

Apache Spark commented on SPARK-41404:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/38933

> Support `ColumnarBatchSuite#testRandomRows` to test more primitive dataType
> ---
>
> Key: SPARK-41404
> URL: https://issues.apache.org/jira/browse/SPARK-41404
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>







[jira] [Assigned] (SPARK-41404) Support `ColumnarBatchSuite#testRandomRows` to test more primitive dataType

2022-12-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41404:


Assignee: Apache Spark

> Support `ColumnarBatchSuite#testRandomRows` to test more primitive dataType
> ---
>
> Key: SPARK-41404
> URL: https://issues.apache.org/jira/browse/SPARK-41404
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Minor
>







[jira] [Created] (SPARK-41404) Support `ColumnarBatchSuite#testRandomRows` to test more primitive dataType

2022-12-06 Thread Yang Jie (Jira)
Yang Jie created SPARK-41404:


 Summary: Support `ColumnarBatchSuite#testRandomRows` to test more 
primitive dataType
 Key: SPARK-41404
 URL: https://issues.apache.org/jira/browse/SPARK-41404
 Project: Spark
  Issue Type: Improvement
  Components: Tests
Affects Versions: 3.4.0
Reporter: Yang Jie









[jira] [Commented] (SPARK-41403) Implement DataFrame.describe

2022-12-06 Thread Ruifeng Zheng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643714#comment-17643714
 ] 

Ruifeng Zheng commented on SPARK-41403:
---

[~beliefer] Jiaan, would you like to give this a try? You can refer to 
https://issues.apache.org/jira/browse/SPARK-40852

> Implement DataFrame.describe
> 
>
> Key: SPARK-41403
> URL: https://issues.apache.org/jira/browse/SPARK-41403
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>







[jira] [Created] (SPARK-41403) Implement DataFrame.describe

2022-12-06 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-41403:
-

 Summary: Implement DataFrame.describe
 Key: SPARK-41403
 URL: https://issues.apache.org/jira/browse/SPARK-41403
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 3.4.0
Reporter: Ruifeng Zheng









[jira] [Comment Edited] (SPARK-41266) Spark does not parse timestamp strings when using the IN operator

2022-12-06 Thread huldar chen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643706#comment-17643706
 ] 

huldar chen edited comment on SPARK-41266 at 12/6/22 8:11 AM:
--

You can try enabling ANSI compliance:
{code:java}
spark.sql.ansi.enabled=true {code}
Under the default Hive-compliant behavior, type coercion promotes all the way to StringType.

Under ANSI compliance, StringType is promoted to the other data type instead.


was (Author: huldar):
You can try to use ANSI compliance:
{code:java}
spark.sql.ansi.enabled=true {code}

> Spark does not parse timestamp strings when using the IN operator
> -
>
> Key: SPARK-41266
> URL: https://issues.apache.org/jira/browse/SPARK-41266
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.2.1
> Environment: Windows 10, Spark 3.2.1 with Java 11
> Reporter: Laurens Versluis
> Priority: Major
>
> This likely affects more versions; it was tested only with 3.2.1.
>  
> Summary:
> Spark converts a timestamp string to a timestamp when using the equality 
> operator (=), but does not do so when using the IN operator.
>  
> Details:
> While debugging why a query returned no results, we found that when the 
> WHERE clause compares a TimestampType column using the equals sign `=`, 
> Spark converts the string to a timestamp and then filters.
> However, with the IN operator (as in our query), Spark does not do so and 
> instead casts the column to a string. We expected the two to behave 
> similarly, or at least that Spark would recognize that the IN clause 
> operates on a TimestampType column and attempt the conversion to timestamp 
> before falling back to string comparison.
>  
> *Minimal reproducible example:*
> Suppose we have a one-row dataset with the following contents and schema:
>  
> {noformat}
> +-------------------+
> |starttime          |
> +-------------------+
> |2019-08-11 19:33:05|
> +-------------------+
> root
>  |-- starttime: timestamp (nullable = true){noformat}
> If we then run the following queries, the IN-clause query that uses a 
> timestamp string with timezone information returns no results:
>  
>  
> {code:java}
> // Works - Spark casts the argument to a string, and the internal
> // representation of the time seems to match it...
> singleCol.filter("starttime IN ('2019-08-11 19:33:05')").show();
> // Works
> singleCol.filter("starttime = '2019-08-11 19:33:05'").show();
> // Works
> singleCol.filter("starttime = '2019-08-11T19:33:05Z'").show();
> // Doesn't work
> singleCol.filter("starttime IN ('2019-08-11T19:33:05Z')").show();
> // Works
> singleCol.filter("starttime IN (to_timestamp('2019-08-11T19:33:05Z'))").show(); {code}
>  
> We can see from the output that a cast to string is taking place:
> {noformat}
> [...] isnotnull(starttime#59),(cast(starttime#59 as string) = 2019-08-11 19:33:05){noformat}
> Since the = operator does work, we would expect operators such as IN to 
> behave consistently with it.
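
The difference reported above can be illustrated outside Spark with a minimal, hedged sketch in plain Java. It is not Spark's implementation: the class and method names are hypothetical, the "yyyy-MM-dd HH:mm:ss" rendering is assumed from the cast output quoted in the report, and a UTC session time zone is assumed. It only shows why comparing the column as a string rejects the ISO-8601 literal while comparing as a timestamp accepts it.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical illustration class; none of these names come from Spark.
final class InVersusEquals {
    // Assumed text rendering of a timestamp that has been cast to string.
    static final DateTimeFormatter CAST_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Mimics "cast(starttime as string) = literal": the column value is
    // rendered as text and compared character by character.
    static boolean compareAsString(LocalDateTime column, String literal) {
        return column.format(CAST_FORMAT).equals(literal);
    }

    // Mimics "starttime = cast(literal as timestamp)": the ISO-8601 literal
    // is parsed first, then compared as a point in time (UTC assumed).
    static boolean compareAsTimestamp(LocalDateTime column, String isoLiteral) {
        Instant parsed = Instant.parse(isoLiteral);
        return column.toInstant(ZoneOffset.UTC).equals(parsed);
    }

    public static void main(String[] args) {
        LocalDateTime starttime = LocalDateTime.of(2019, 8, 11, 19, 33, 5);
        System.out.println(compareAsString(starttime, "2019-08-11 19:33:05"));
        System.out.println(compareAsString(starttime, "2019-08-11T19:33:05Z"));
        System.out.println(compareAsTimestamp(starttime, "2019-08-11T19:33:05Z"));
    }
}
```

Under these assumptions the string comparison matches only the literal that happens to have the same textual form as the rendered column, which is consistent with the IN results listed in the report; the timestamp comparison matches the ISO-8601 literal because both sides are first converted to the same instant.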





