[jira] [Commented] (SPARK-40398) Use Loop instead of Arrays.stream api

2022-09-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17602136#comment-17602136
 ] 

Apache Spark commented on SPARK-40398:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/37843

> Use Loop instead of Arrays.stream api
> -
>
> Key: SPARK-40398
> URL: https://issues.apache.org/jira/browse/SPARK-40398
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>
> When the logic of the stream pipeline is relatively simple, using Arrays.stream is 
> always slower than using a loop directly.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40398) Use Loop instead of Arrays.stream api

2022-09-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40398:


Assignee: (was: Apache Spark)

> Use Loop instead of Arrays.stream api
> -
>
> Key: SPARK-40398
> URL: https://issues.apache.org/jira/browse/SPARK-40398
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>
> When the logic of the stream pipeline is relatively simple, using Arrays.stream is 
> always slower than using a loop directly.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40398) Use Loop instead of Arrays.stream api

2022-09-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40398:


Assignee: Apache Spark

> Use Loop instead of Arrays.stream api
> -
>
> Key: SPARK-40398
> URL: https://issues.apache.org/jira/browse/SPARK-40398
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Minor
>
> When the logic of the stream pipeline is relatively simple, using Arrays.stream is 
> always slower than using a loop directly.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40398) Use Loop instead of Arrays.stream api

2022-09-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17602135#comment-17602135
 ] 

Apache Spark commented on SPARK-40398:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/37843

> Use Loop instead of Arrays.stream api
> -
>
> Key: SPARK-40398
> URL: https://issues.apache.org/jira/browse/SPARK-40398
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>
> When the logic of the stream pipeline is relatively simple, using Arrays.stream is 
> always slower than using a loop directly.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40398) Use Loop instead of Arrays.stream

2022-09-08 Thread Yang Jie (Jira)
Yang Jie created SPARK-40398:


 Summary: Use Loop instead of Arrays.stream
 Key: SPARK-40398
 URL: https://issues.apache.org/jira/browse/SPARK-40398
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Yang Jie


When the logic of the stream pipeline is relatively simple, using Arrays.stream is 
always slower than using a loop directly.
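
For illustration, a minimal, hypothetical Scala sketch of the kind of rewrite this
issue targets (the actual code paths being changed are those in the linked pull
request; the names below are made up):

    import java.util.Arrays

    object StreamVsLoop {
      // Arrays.stream builds a whole stream pipeline even for a trivial reduction.
      def sumWithStream(values: Array[Long]): Long =
        Arrays.stream(values).sum()

      // A plain while loop does the same work without the pipeline overhead.
      def sumWithLoop(values: Array[Long]): Long = {
        var total = 0L
        var i = 0
        while (i < values.length) {
          total += values(i)
          i += 1
        }
        total
      }

      def main(args: Array[String]): Unit = {
        val data = Array.tabulate(1000000)(_.toLong)
        assert(sumWithStream(data) == sumWithLoop(data))
      }
    }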

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40398) Use Loop instead of Arrays.stream api

2022-09-08 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-40398:
-
Summary: Use Loop instead of Arrays.stream api  (was: Use Loop instead of 
Arrays.stream)

> Use Loop instead of Arrays.stream api
> -
>
> Key: SPARK-40398
> URL: https://issues.apache.org/jira/browse/SPARK-40398
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>
> When the logic of the stream pipeline is relatively simple, using Arrays.stream is 
> always slower than using a loop directly.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40377) Allow customize maxBroadcastTableBytes and maxBroadcastRows

2022-09-08 Thread LeeeeLiu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LeeeeLiu updated SPARK-40377:
-
Description: 
Recently, we encountered some driver OOM problems. Some large tables were 
compressed using Snappy and then broadcast join was performed, but the actual 
data volume was too large, which resulted in driver OOM.

The values of maxBroadcastTableBytes and maxBroadcastRows are hardcoded to 8GB 
and 51200 respectively. Maybe we can allow these values to be customized, so that 
smaller values can be configured for different scenarios and broadcast joins can 
be prohibited for some large tables, to avoid driver OOM.

  was:
Recently, we encountered some driver OOM problems. Some tables with large data 
volume were compressed using Snappy and then broadcast join was performed, but 
the actual data volume was too large, which resulted in driver OOM.

The values of maxBroadcastTableBytes and maxBroadcastRows are hardcoded, 8GB 
and 51200 respectively. Maybe we can allow customization of these values, 
configure smaller values according to different scenarios, and prohibit 
broadcast joins for tables with large data volumes to avoid driver OOM.


> Allow customize maxBroadcastTableBytes and maxBroadcastRows
> ---
>
> Key: SPARK-40377
> URL: https://issues.apache.org/jira/browse/SPARK-40377
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: LeeeeLiu
>Priority: Major
> Attachments: 截屏2022-09-07 20.40.06.png, 截屏2022-09-07 20.40.16.png
>
>
> Recently, we encountered some driver OOM problems. Some large tables were 
> compressed using Snappy and then broadcast join was performed, but the actual 
> data volume was too large, which resulted in driver OOM.
> The values of maxBroadcastTableBytes and maxBroadcastRows are hardcoded to 8GB 
> and 51200 respectively. Maybe we can allow these values to be customized, so 
> that smaller values can be configured for different scenarios and broadcast 
> joins can be prohibited for some large tables, to avoid driver OOM.
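
For context, a minimal Scala sketch of the existing planner-side knob that is often
used today to avoid the same driver OOM (this is not the hardcoded
maxBroadcastTableBytes/maxBroadcastRows limits that the ticket proposes to make
configurable):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("broadcast-threshold-sketch").getOrCreate()
    // Lower the automatic broadcast threshold (bytes), or set it to -1 to disable
    // automatic broadcast joins entirely for this session.
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")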



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40397) Migrate selenium-java from 3.1 to 4.2 and upgrade org.scalatestplus:selenium to 3.2.13.0

2022-09-08 Thread Yang Jie (Jira)
Yang Jie created SPARK-40397:


 Summary: Migrate selenium-java from 3.1 to 4.2 and upgrade 
org.scalatestplus:selenium to 3.2.13.0
 Key: SPARK-40397
 URL: https://issues.apache.org/jira/browse/SPARK-40397
 Project: Spark
  Issue Type: Improvement
  Components: Build, Tests
Affects Versions: 3.4.0
Reporter: Yang Jie






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40396) Update scalatest and scalatestplus to use latest version

2022-09-08 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-40396:
-
Summary: Update scalatest and scalatestplus to use latest version  (was: 
Update scalatest and scalatestplus to use a stable version)

> Update scalatest and scalatestplus to use latest version
> 
>
> Key: SPARK-40396
> URL: https://issues.apache.org/jira/browse/SPARK-40396
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>
> Currently, they use 3.3.0-snap3
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40396) Update scalatest and scalatestplus to use a stable version

2022-09-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40396:


Assignee: Apache Spark

> Update scalatest and scalatestplus to use a stable version
> --
>
> Key: SPARK-40396
> URL: https://issues.apache.org/jira/browse/SPARK-40396
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Minor
>
> Currently, they use 3.3.0-snap3
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40396) Update scalatest and scalatestplus to use a stable version

2022-09-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17602092#comment-17602092
 ] 

Apache Spark commented on SPARK-40396:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/37842

> Update scalatest and scalatestplus to use a stable version
> --
>
> Key: SPARK-40396
> URL: https://issues.apache.org/jira/browse/SPARK-40396
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>
> Currently, they use 3.3.0-snap3
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40396) Update scalatest and scalatestplus to use a stable version

2022-09-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40396:


Assignee: (was: Apache Spark)

> Update scalatest and scalatestplus to use a stable version
> --
>
> Key: SPARK-40396
> URL: https://issues.apache.org/jira/browse/SPARK-40396
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>
> Currently, they use 3.3.0-snap3
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40396) Update scalatest and scalatestplus to use a stable version

2022-09-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17602091#comment-17602091
 ] 

Apache Spark commented on SPARK-40396:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/37842

> Update scalatest and scalatestplus to use a stable version
> --
>
> Key: SPARK-40396
> URL: https://issues.apache.org/jira/browse/SPARK-40396
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>
> Currently, they use 3.3.0-snap3
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40396) Update scalatest and scalatestplus to use a stable version

2022-09-08 Thread Yang Jie (Jira)
Yang Jie created SPARK-40396:


 Summary: Update scalatest and scalatestplus to use a stable version
 Key: SPARK-40396
 URL: https://issues.apache.org/jira/browse/SPARK-40396
 Project: Spark
  Issue Type: Improvement
  Components: Tests
Affects Versions: 3.4.0
Reporter: Yang Jie


Currently, they use 3.3.0-snap3

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40375) Implement `spark.show_versions`

2022-09-08 Thread Yikun Jiang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17602089#comment-17602089
 ] 

Yikun Jiang commented on SPARK-40375:
-

The sys part is OK, but the deps part might depend on a third-party lib 
`distutils`; there was some discussion on it:

[1] [https://github.com/apache/spark/pull/35977#issuecomment-1079557507]

> Implement `spark.show_versions`
> ---
>
> Key: SPARK-40375
> URL: https://issues.apache.org/jira/browse/SPARK-40375
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We might want to have `spark.show_versions` to provide useful environment 
> information, similar to 
> [https://pandas.pydata.org/docs/reference/api/pandas.show_versions.html].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40373) Implement `ps.show_versions`

2022-09-08 Thread Yikun Jiang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17602088#comment-17602088
 ] 

Yikun Jiang commented on SPARK-40373:
-

The sys part is OK, but the deps part might depend on a third-party lib 
`distutils`; there was some discussion on it:

[1] https://github.com/apache/spark/pull/35977#issuecomment-1079557507

> Implement `ps.show_versions`
> 
>
> Key: SPARK-40373
> URL: https://issues.apache.org/jira/browse/SPARK-40373
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We want to have `ps.show_versions` to reach pandas parity.
>  
> pandas docs: 
> https://pandas.pydata.org/docs/reference/api/pandas.show_versions.html.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38888) Add `RocksDBProvider` similar to `LevelDBProvider`

2022-09-08 Thread Mridul Muralidharan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan resolved SPARK-38888.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37610
[https://github.com/apache/spark/pull/37610]

> Add `RocksDBProvider` similar to `LevelDBProvider`
> --
>
> Key: SPARK-38888
> URL: https://issues.apache.org/jira/browse/SPARK-38888
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, YARN
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 3.4.0
>
>
> `LevelDBProvider` is used by `ExternalShuffleBlockResolver` and 
> `YarnShuffleService`; a corresponding `RocksDB` implementation should be added.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38888) Add `RocksDBProvider` similar to `LevelDBProvider`

2022-09-08 Thread Mridul Muralidharan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan reassigned SPARK-38888:
---

Assignee: Yang Jie

> Add `RocksDBProvider` similar to `LevelDBProvider`
> --
>
> Key: SPARK-38888
> URL: https://issues.apache.org/jira/browse/SPARK-38888
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, YARN
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
>
> `LevelDBProvider` is used by `ExternalShuffleBlockResolver` and 
> `YarnShuffleService`; a corresponding `RocksDB` implementation should be added.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40324) Provide a query context of ParseException

2022-09-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17602053#comment-17602053
 ] 

Apache Spark commented on SPARK-40324:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/37841

> Provide a query context of ParseException
> -
>
> Key: SPARK-40324
> URL: https://issues.apache.org/jira/browse/SPARK-40324
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.4.0
>
>
> Extend the exception ParseException and add a queryContext to it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40332) Implement `GroupBy.quantile`.

2022-09-08 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-40332.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37816
[https://github.com/apache/spark/pull/37816]

> Implement `GroupBy.quantile`.
> -
>
> Key: SPARK-40332
> URL: https://issues.apache.org/jira/browse/SPARK-40332
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Yikun Jiang
>Priority: Major
> Fix For: 3.4.0
>
>
> We should implement `GroupBy.quantile` to increase pandas API coverage.
> pandas docs: 
> https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.quantile.html



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40332) Implement `GroupBy.quantile`.

2022-09-08 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-40332:
-

Assignee: Yikun Jiang

> Implement `GroupBy.quantile`.
> -
>
> Key: SPARK-40332
> URL: https://issues.apache.org/jira/browse/SPARK-40332
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Yikun Jiang
>Priority: Major
>
> We should implement `GroupBy.quantile` to increase pandas API coverage.
> pandas docs: 
> https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.quantile.html



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40395) Provide query context in AnalysisException

2022-09-08 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-40395:
--

 Summary: Provide query context in AnalysisException
 Key: SPARK-40395
 URL: https://issues.apache.org/jira/browse/SPARK-40395
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang


Provide query context in AnalysisException for better error messages



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40383) Pin mypy ==0.920 in dev/requirements.txt

2022-09-08 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-40383:
-

Assignee: Ruifeng Zheng

> Pin mypy ==0.920 in dev/requirements.txt
> 
>
> Key: SPARK-40383
> URL: https://issues.apache.org/jira/browse/SPARK-40383
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40383) Pin mypy ==0.920 in dev/requirements.txt

2022-09-08 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-40383.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37827
[https://github.com/apache/spark/pull/37827]

> Pin mypy ==0.920 in dev/requirements.txt
> 
>
> Key: SPARK-40383
> URL: https://issues.apache.org/jira/browse/SPARK-40383
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Minor
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40394) Move subquery expression CheckAnalysis error messages to use the new error framework

2022-09-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17602042#comment-17602042
 ] 

Apache Spark commented on SPARK-40394:
--

User 'dtenedor' has created a pull request for this issue:
https://github.com/apache/spark/pull/37840

> Move subquery expression CheckAnalysis error messages to use the new error 
> framework
> 
>
> Key: SPARK-40394
> URL: https://issues.apache.org/jira/browse/SPARK-40394
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Daniel
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40394) Move subquery expression CheckAnalysis error messages to use the new error framework

2022-09-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40394:


Assignee: (was: Apache Spark)

> Move subquery expression CheckAnalysis error messages to use the new error 
> framework
> 
>
> Key: SPARK-40394
> URL: https://issues.apache.org/jira/browse/SPARK-40394
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Daniel
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40394) Move subquery expression CheckAnalysis error messages to use the new error framework

2022-09-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40394:


Assignee: Apache Spark

> Move subquery expression CheckAnalysis error messages to use the new error 
> framework
> 
>
> Key: SPARK-40394
> URL: https://issues.apache.org/jira/browse/SPARK-40394
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Daniel
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40394) Move subquery expression CheckAnalysis error messages to use the new error framework

2022-09-08 Thread Daniel (Jira)
Daniel created SPARK-40394:
--

 Summary: Move subquery expression CheckAnalysis error messages to 
use the new error framework
 Key: SPARK-40394
 URL: https://issues.apache.org/jira/browse/SPARK-40394
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0
Reporter: Daniel






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40389) Decimals can't upcast as integral types if the cast can overflow

2022-09-08 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-40389:
---
Affects Version/s: 3.3.1

> Decimals can't upcast as integral types if the cast can overflow
> 
>
> Key: SPARK-40389
> URL: https://issues.apache.org/jira/browse/SPARK-40389
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.4.0, 3.3.1
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.4.0, 3.3.1
>
>
> In Spark SQL, the method "canUpCast" returns true iff we can safely up-cast 
> the `from` type to the `to` type without any truncation, precision loss, or 
> possible runtime failures.
> Meanwhile, DecimalType(10, 0) is considered as "canUpCast" to Integer type. 
> This is wrong, since casting 9000000000BD as Integer type will overflow.
> As a result:
>  * The optimizer rule SimplifyCasts relies on the method "canUpCast" and it 
> will mistakenly convert "cast(cast(9000000000BD as int) as long)" to 
> "cast(9000000000BD as long)"
>  * The STRICT store assignment policy relies on this method too. With the 
> policy enabled, inserting 9000000000BD into integer columns will pass the 
> compile-time check and insert an unexpected value 410065408.
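
A short, hypothetical spark-shell illustration of the problem described above
(exact behavior depends on the Spark version and ANSI settings):

    // The query explicitly narrows to INT before widening back to BIGINT.
    // With DecimalType(10, 0) -> IntegerType treated as "canUpCast", the
    // SimplifyCasts rule may rewrite the plan to CAST(9000000000BD AS BIGINT),
    // silently dropping the narrowing step the query asked for.
    spark.sql("SELECT CAST(CAST(9000000000BD AS INT) AS BIGINT) AS v").explain(true)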



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40389) Decimals can't upcast as integral types if the cast can overflow

2022-09-08 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-40389.

Fix Version/s: 3.3.1
   3.4.0
   Resolution: Fixed

Issue resolved by pull request 37832
[https://github.com/apache/spark/pull/37832]

> Decimals can't upcast as integral types if the cast can overflow
> 
>
> Key: SPARK-40389
> URL: https://issues.apache.org/jira/browse/SPARK-40389
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
>
> In Spark SQL, the method "canUpCast" returns true iff we can safely up-cast 
> the `from` type to the `to` type without any truncation, precision loss, or 
> possible runtime failures.
> Meanwhile, DecimalType(10, 0) is considered as "canUpCast" to Integer type. 
> This is wrong, since casting 9000000000BD as Integer type will overflow.
> As a result:
>  * The optimizer rule SimplifyCasts relies on the method "canUpCast" and it 
> will mistakenly convert "cast(cast(9000000000BD as int) as long)" to 
> "cast(9000000000BD as long)"
>  * The STRICT store assignment policy relies on this method too. With the 
> policy enabled, inserting 9000000000BD into integer columns will pass the 
> compile-time check and insert an unexpected value 410065408.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39546) Support ports definition in executor pod template

2022-09-08 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-39546:
--
Summary: Support ports definition in executor pod template  (was: Respect 
port definitions on K8S pod templates for both driver and executor)

> Support ports definition in executor pod template
> -
>
> Key: SPARK-39546
> URL: https://issues.apache.org/jira/browse/SPARK-39546
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Oliver Koeth
>Assignee: Yilun Fan
>Priority: Minor
> Fix For: 3.4.0
>
>
> *Description:*
> Spark on K8S allows opening additional ports for custom purposes on the 
> driver pod via the pod template, but ignores the port specification in the 
> executor pod template. Port specifications from the pod template should be 
> preserved (and extended) for both drivers and executors.
> *Scenario:*
> I want to run functionality in the executor that exposes data on an 
> additional port. In my case, this is monitoring data exposed by Spark's JMX 
> metrics sink via the JMX prometheus exporter java agent 
> https://github.com/prometheus/jmx_exporter -- the java agent opens an extra 
> port inside the container, but for prometheus to detect and scrape the port, 
> it must be exposed in the K8S pod resource.
> (More background if desired: This seems to be the "classic" Spark 2 way to 
> expose prometheus metrics. Spark 3 introduced a native equivalent servlet for 
> the driver, but for the executor, only a rather limited set of metrics is 
> forwarded via the driver, and that also follows a completely different naming 
> scheme. So the JMX + exporter approach still turns out to be more useful for 
> me, even in Spark 3)
> Expected behavior:
> I add the following to my pod template to expose the extra port opened by the 
> JMX exporter java agent
> spec:
>   containers:
>   - ...
>     ports:
>     - containerPort: 8090
>       name: jmx-prometheus
>       protocol: TCP
> Observed behavior:
> The port is exposed for driver pods but not for executor pods
> *Corresponding code:*
> driver pod creation just adds ports
> [https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala]
>  (currently line 115)
> val driverContainer = new ContainerBuilder(pod.container)
> ...
>   .addNewPort()
> ...
>   .addNewPort()
> while executor pod creation replaces the ports
> [https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala]
>  (currently line 211)
> val executorContainer = new ContainerBuilder(pod.container)
> ...
>   .withPorts(requiredPorts.asJava)
> The current handling is inconsistent and unnecessarily limiting. It seems that 
> the executor creation could/should just as well preserve ports from the 
> template and add extra required ports.
> *Workaround:*
> It is possible to work around this limitation by adding a full sidecar 
> container to the executor pod spec which declares the port. Sidecar 
> containers are left unchanged by pod template handling.
> As all containers in a pod share the same network, it does not matter which 
> container actually declares to expose the port.
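
A schematic, self-contained Scala sketch of the behavior the reporter asks for:
preserve the ports declared in the executor pod template and add Spark's required
ports on top (this models the merge only; it does not use the actual Fabric8
builder API that BasicExecutorFeatureStep relies on, and the port names and numbers
below are illustrative):

    // Hypothetical, simplified port model for illustration.
    final case class Port(name: String, containerPort: Int, protocol: String = "TCP")

    object PortMerge {
      // Keep every template port and append required ports not already declared.
      def mergePorts(templatePorts: Seq[Port], requiredPorts: Seq[Port]): Seq[Port] =
        templatePorts ++ requiredPorts.filterNot(r =>
          templatePorts.exists(_.containerPort == r.containerPort))

      def main(args: Array[String]): Unit = {
        val fromTemplate = Seq(Port("jmx-prometheus", 8090))
        val required     = Seq(Port("blockmanager", 7079), Port("executor", 10000))
        // Expected: the jmx-prometheus port survives alongside the required ones.
        println(mergePorts(fromTemplate, required))
      }
    }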



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39546) Support ports definition in executor pod template

2022-09-08 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-39546:
--
Affects Version/s: 3.4.0
   (was: 3.3.0)

> Support ports definition in executor pod template
> -
>
> Key: SPARK-39546
> URL: https://issues.apache.org/jira/browse/SPARK-39546
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.4.0
>Reporter: Oliver Koeth
>Assignee: Yilun Fan
>Priority: Minor
> Fix For: 3.4.0
>
>
> *Description:*
> Spark on K8S allows opening additional ports for custom purposes on the 
> driver pod via the pod template, but ignores the port specification in the 
> executor pod template. Port specifications from the pod template should be 
> preserved (and extended) for both drivers and executors.
> *Scenario:*
> I want to run functionality in the executor that exposes data on an 
> additional port. In my case, this is monitoring data exposed by Spark's JMX 
> metrics sink via the JMX prometheus exporter java agent 
> https://github.com/prometheus/jmx_exporter -- the java agent opens an extra 
> port inside the container, but for prometheus to detect and scrape the port, 
> it must be exposed in the K8S pod resource.
> (More background if desired: This seems to be the "classic" Spark 2 way to 
> expose prometheus metrics. Spark 3 introduced a native equivalent servlet for 
> the driver, but for the executor, only a rather limited set of metrics is 
> forwarded via the driver, and that also follows a completely different naming 
> scheme. So the JMX + exporter approach still turns out to be more useful for 
> me, even in Spark 3)
> Expected behavior:
> I add the following to my pod template to expose the extra port opened by the 
> JMX exporter java agent
> spec:
>   containers:
>   - ...
>     ports:
>     - containerPort: 8090
>       name: jmx-prometheus
>       protocol: TCP
> Observed behavior:
> The port is exposed for driver pods but not for executor pods
> *Corresponding code:*
> driver pod creation just adds ports
> [https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala]
>  (currently line 115)
> val driverContainer = new ContainerBuilder(pod.container)
> ...
>   .addNewPort()
> ...
>   .addNewPort()
> while executor pod creation replaces the ports
> [https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala]
>  (currently line 211)
> val executorContainer = new ContainerBuilder(pod.container)
> ...
>   .withPorts(requiredPorts.asJava)
> The current handling is inconsistent and unnecessarily limiting. It seems that 
> the executor creation could/should just as well preserve ports from the 
> template and add extra required ports.
> *Workaround:*
> It is possible to work around this limitation by adding a full sidecar 
> container to the executor pod spec which declares the port. Sidecar 
> containers are left unchanged by pod template handling.
> As all containers in a pod share the same network, it does not matter which 
> container actually declares to expose the port.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-39546) Respect port definitions on K8S pod templates for both driver and executor

2022-09-08 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-39546.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37803
[https://github.com/apache/spark/pull/37803]

> Respect port definitions on K8S pod templates for both driver and executor
> 
>
> Key: SPARK-39546
> URL: https://issues.apache.org/jira/browse/SPARK-39546
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Oliver Koeth
>Assignee: Yilun Fan
>Priority: Minor
> Fix For: 3.4.0
>
>
> *Description:*
> Spark on K8S allows opening additional ports for custom purposes on the 
> driver pod via the pod template, but ignores the port specification in the 
> executor pod template. Port specifications from the pod template should be 
> preserved (and extended) for both drivers and executors.
> *Scenario:*
> I want to run functionality in the executor that exposes data on an 
> additional port. In my case, this is monitoring data exposed by Spark's JMX 
> metrics sink via the JMX prometheus exporter java agent 
> https://github.com/prometheus/jmx_exporter -- the java agent opens an extra 
> port inside the container, but for prometheus to detect and scrape the port, 
> it must be exposed in the K8S pod resource.
> (More background if desired: This seems to be the "classic" Spark 2 way to 
> expose prometheus metrics. Spark 3 introduced a native equivalent servlet for 
> the driver, but for the executor, only a rather limited set of metrics is 
> forwarded via the driver, and that also follows a completely different naming 
> scheme. So the JMX + exporter approach still turns out to be more useful for 
> me, even in Spark 3)
> Expected behavior:
> I add the following to my pod template to expose the extra port opened by the 
> JMX exporter java agent
> spec:
>   containers:
>   - ...
>     ports:
>     - containerPort: 8090
>       name: jmx-prometheus
>       protocol: TCP
> Observed behavior:
> The port is exposed for driver pods but not for executor pods
> *Corresponding code:*
> driver pod creation just adds ports
> [https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala]
>  (currently line 115)
> val driverContainer = new ContainerBuilder(pod.container)
> ...
>   .addNewPort()
> ...
>   .addNewPort()
> while executor pod creation replaces the ports
> [https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala]
>  (currently line 211)
> val executorContainer = new ContainerBuilder(pod.container)
> ...
>   .withPorts(requiredPorts.asJava)
> The current handling is inconsistent and unnecessarily limiting. It seems that 
> the executor creation could/should just as well preserve ports from the 
> template and add extra required ports.
> *Workaround:*
> It is possible to work around this limitation by adding a full sidecar 
> container to the executor pod spec which declares the port. Sidecar 
> containers are left unchanged by pod template handling.
> As all containers in a pod share the same network, it does not matter which 
> container actually declares to expose the port.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39546) Respect port definitions on K8S pod templates for both driver and executor

2022-09-08 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-39546:
-

Assignee: Yilun Fan

> Respect port definitions on K8S pod templates for both driver and executor
> 
>
> Key: SPARK-39546
> URL: https://issues.apache.org/jira/browse/SPARK-39546
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Oliver Koeth
>Assignee: Yilun Fan
>Priority: Minor
>
> *Description:*
> Spark on K8S allows opening additional ports for custom purposes on the 
> driver pod via the pod template, but ignores the port specification in the 
> executor pod template. Port specifications from the pod template should be 
> preserved (and extended) for both drivers and executors.
> *Scenario:*
> I want to run functionality in the executor that exposes data on an 
> additional port. In my case, this is monitoring data exposed by Spark's JMX 
> metrics sink via the JMX prometheus exporter java agent 
> https://github.com/prometheus/jmx_exporter -- the java agent opens an extra 
> port inside the container, but for prometheus to detect and scrape the port, 
> it must be exposed in the K8S pod resource.
> (More background if desired: This seems to be the "classic" Spark 2 way to 
> expose prometheus metrics. Spark 3 introduced a native equivalent servlet for 
> the driver, but for the executor, only a rather limited set of metrics is 
> forwarded via the driver, and that also follows a completely different naming 
> scheme. So the JMX + exporter approach still turns out to be more useful for 
> me, even in Spark 3)
> Expected behavior:
> I add the following to my pod template to expose the extra port opened by the 
> JMX exporter java agent
> spec:
>   containers:
>   - ...
>     ports:
>     - containerPort: 8090
>       name: jmx-prometheus
>       protocol: TCP
> Observed behavior:
> The port is exposed for driver pods but not for executor pods
> *Corresponding code:*
> driver pod creation just adds ports
> [https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala]
>  (currently line 115)
> val driverContainer = new ContainerBuilder(pod.container)
> ...
>   .addNewPort()
> ...
>   .addNewPort()
> while executor pod creation replaces the ports
> [https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala]
>  (currently line 211)
> val executorContainer = new ContainerBuilder(pod.container)
> ...
>   .withPorts(requiredPorts.asJava)
> The current handling is inconsistent and unnecessarily limiting. It seems that 
> the executor creation could/should just as well preserve ports from the 
> template and add extra required ports.
> *Workaround:*
> It is possible to work around this limitation by adding a full sidecar 
> container to the executor pod spec which declares the port. Sidecar 
> containers are left unchanged by pod template handling.
> As all containers in a pod share the same network, it does not matter which 
> container actually declares to expose the port.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40362) Bug in Canonicalization of expressions like Add & Multiply i.e Commutative Operators

2022-09-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40362:


Assignee: (was: Apache Spark)

> Bug in Canonicalization of expressions like Add & Multiply i.e Commutative 
> Operators
> 
>
> Key: SPARK-40362
> URL: https://issues.apache.org/jira/browse/SPARK-40362
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Asif
>Priority: Major
>  Labels: spark-sql
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> In the canonicalization code, which is now in two stages, canonicalization 
> involving commutative operators is broken if they are subexpressions of 
> certain types of expressions which override precanonicalize, for example 
> BinaryComparison.
> Consider the following expression:
> a + b > 10
>          GT
>             |
> a + b          10
> The BinaryComparison operator's precanonicalize first precanonicalizes the 
> children and then may swap the operands based on left/right hashCode inequality.
> Let's say Add(a, b).hashCode is > 10.hashCode; as a result GT is converted 
> to LT.
> But If the same tree is created 
>            GT
>             |
>  b + a      10
> The hashCode of Add(b, a) is not the same as that of Add(a, b), thus it is 
> possible that for this tree
> Add(b, a).hashCode is < 10.hashCode, in which case GT remains as is.
> Thus two similar trees result in different canonicalizations, one having GT, 
> the other LT.
>  
> The problem occurs because for commutative expressions canonicalization 
> normalizes the expression to a consistent hashCode, which is not the case with 
> precanonicalize: the hashCodes of a commutative expression's precanonicalized 
> and canonicalized forms differ.
>  
>  
> The test 
> {quote}
> test("bug X") {
>   val tr1 = LocalRelation('c.int, 'b.string, 'a.int)
>   val y = tr1.where('a.attr + 'c.attr > 10).analyze
>   val fullCond = y.asInstanceOf[Filter].condition.clone()
>   val addExpr = (fullCond match {
>     // match case lost in the original Jira rendering
>   }).clone().asInstanceOf[Add]
>   val canonicalizedFullCond = fullCond.canonicalized
>   // swap the operands of add
>   val newAddExpr = Add(addExpr.right, addExpr.left)
>   // build a new condition which is the same as the previous one, but with the
>   // operands of Add reversed
>   val builtCondnCanonicalized = GreaterThan(newAddExpr, Literal(10)).canonicalized
>   assertEquals(canonicalizedFullCond, builtCondnCanonicalized)
> }
> {quote}
> This test fails.
> The fix which I propose is that for commutative expressions, precanonicalize 
> should be overridden and 
> Canonicalize.reorderCommutativeOperators should be invoked on the expression 
> there instead of in canonicalize; effectively, for commutative operators (Add, 
> Or, Multiply, etc.) canonicalize and precanonicalize should be the same.
> PR:
> [https://github.com/apache/spark/pull/37824]
>  
>  
> I am also trying a better fix, whereby for commutative expressions the murmur 
> hashCodes are calculated using unorderedHash so that they are order 
> independent (i.e. symmetric).
> The above approach works fine, but in the case of Least & Greatest, the 
> Product's element is a Seq, and that messes with the consistency of the hashCode.
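
A minimal, standalone Scala sketch of the order-independent hashing idea mentioned
above (plain Scala for illustration, not Spark's actual Expression classes):

    import scala.util.hashing.MurmurHash3

    // For a commutative node, hash the children as an unordered collection so that
    // Add(a, b) and Add(b, a) produce the same hashCode and canonicalize consistently.
    final case class CommutativeNode(op: String, children: Seq[Int]) {
      override def hashCode(): Int =
        MurmurHash3.unorderedHash(children, MurmurHash3.stringHash(op))
    }

    object UnorderedHashDemo {
      def main(args: Array[String]): Unit = {
        assert(CommutativeNode("Add", Seq(1, 2)).hashCode() ==
               CommutativeNode("Add", Seq(2, 1)).hashCode())
      }
    }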



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40362) Bug in Canonicalization of expressions like Add & Multiply i.e Commutative Operators

2022-09-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40362:


Assignee: Apache Spark

> Bug in Canonicalization of expressions like Add & Multiply i.e Commutative 
> Operators
> 
>
> Key: SPARK-40362
> URL: https://issues.apache.org/jira/browse/SPARK-40362
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Asif
>Assignee: Apache Spark
>Priority: Major
>  Labels: spark-sql
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> In the canonicalization code, which is now in two stages, canonicalization 
> involving commutative operators is broken if they are subexpressions of 
> certain types of expressions which override precanonicalize, for example 
> BinaryComparison.
> Consider the following expression:
> a + b > 10
>          GT
>             |
> a + b          10
> The BinaryComparison operator's precanonicalize first precanonicalizes the 
> children and then may swap the operands based on left/right hashCode inequality.
> Let's say Add(a, b).hashCode is > 10.hashCode; as a result GT is converted 
> to LT.
> But If the same tree is created 
>            GT
>             |
>  b + a      10
> The hashCode of Add(b, a) is not the same as that of Add(a, b), thus it is 
> possible that for this tree
> Add(b, a).hashCode is < 10.hashCode, in which case GT remains as is.
> Thus two similar trees result in different canonicalizations, one having GT, 
> the other LT.
>  
> The problem occurs because for commutative expressions canonicalization 
> normalizes the expression to a consistent hashCode, which is not the case with 
> precanonicalize: the hashCodes of a commutative expression's precanonicalized 
> and canonicalized forms differ.
>  
>  
> The test 
> {quote}
> test("bug X") {
>   val tr1 = LocalRelation('c.int, 'b.string, 'a.int)
>   val y = tr1.where('a.attr + 'c.attr > 10).analyze
>   val fullCond = y.asInstanceOf[Filter].condition.clone()
>   val addExpr = (fullCond match {
>     // match case lost in the original Jira rendering
>   }).clone().asInstanceOf[Add]
>   val canonicalizedFullCond = fullCond.canonicalized
>   // swap the operands of add
>   val newAddExpr = Add(addExpr.right, addExpr.left)
>   // build a new condition which is the same as the previous one, but with the
>   // operands of Add reversed
>   val builtCondnCanonicalized = GreaterThan(newAddExpr, Literal(10)).canonicalized
>   assertEquals(canonicalizedFullCond, builtCondnCanonicalized)
> }
> {quote}
> This test fails.
> The fix which I propose is that for commutative expressions, precanonicalize 
> should be overridden and 
> Canonicalize.reorderCommutativeOperators should be invoked on the expression 
> there instead of in canonicalize; effectively, for commutative operators (Add, 
> Or, Multiply, etc.) canonicalize and precanonicalize should be the same.
> PR:
> [https://github.com/apache/spark/pull/37824]
>  
>  
> I am also trying a better fix, whereby for commutative expressions the murmur 
> hashCodes are calculated using unorderedHash so that they are order 
> independent (i.e. symmetric).
> The above approach works fine, but in the case of Least & Greatest, the 
> Product's element is a Seq, and that messes with the consistency of the hashCode.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40362) Bug in Canonicalization of expressions like Add & Multiply i.e Commutative Operators

2022-09-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601933#comment-17601933
 ] 

Apache Spark commented on SPARK-40362:
--

User 'ahshahid' has created a pull request for this issue:
https://github.com/apache/spark/pull/37824

> Bug in Canonicalization of expressions like Add & Multiply i.e Commutative 
> Operators
> 
>
> Key: SPARK-40362
> URL: https://issues.apache.org/jira/browse/SPARK-40362
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Asif
>Priority: Major
>  Labels: spark-sql
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> In the canonicalization code, which is now in two stages, canonicalization 
> involving commutative operators is broken if they are subexpressions of 
> certain types of expressions which override precanonicalize, for example 
> BinaryComparison.
> Consider the following expression:
> a + b > 10
>          GT
>             |
> a + b          10
> The BinaryComparison operator's precanonicalize first precanonicalizes the 
> children and then may swap the operands based on left/right hashCode inequality.
> Let's say Add(a, b).hashCode is > 10.hashCode; as a result GT is converted 
> to LT.
> But If the same tree is created 
>            GT
>             |
>  b + a      10
> The hashCode of Add(b, a) is not the same as that of Add(a, b), thus it is 
> possible that for this tree
> Add(b, a).hashCode is < 10.hashCode, in which case GT remains as is.
> Thus two similar trees result in different canonicalizations, one having GT, 
> the other LT.
>  
> The problem occurs because for commutative expressions canonicalization 
> normalizes the expression to a consistent hashCode, which is not the case with 
> precanonicalize: the hashCodes of a commutative expression's precanonicalized 
> and canonicalized forms differ.
>  
>  
> The test 
> {quote}
> test("bug X") {
>   val tr1 = LocalRelation('c.int, 'b.string, 'a.int)
>   val y = tr1.where('a.attr + 'c.attr > 10).analyze
>   val fullCond = y.asInstanceOf[Filter].condition.clone()
>   val addExpr = (fullCond match {
>     // match case lost in the original Jira rendering
>   }).clone().asInstanceOf[Add]
>   val canonicalizedFullCond = fullCond.canonicalized
>   // swap the operands of add
>   val newAddExpr = Add(addExpr.right, addExpr.left)
>   // build a new condition which is the same as the previous one, but with the
>   // operands of Add reversed
>   val builtCondnCanonicalized = GreaterThan(newAddExpr, Literal(10)).canonicalized
>   assertEquals(canonicalizedFullCond, builtCondnCanonicalized)
> }
> {quote}
> This test fails.
> The fix which I propose is that for commutative expressions, precanonicalize 
> should be overridden and 
> Canonicalize.reorderCommutativeOperators should be invoked on the expression 
> there instead of in canonicalize; effectively, for commutative operators (Add, 
> Or, Multiply, etc.) canonicalize and precanonicalize should be the same.
> PR:
> [https://github.com/apache/spark/pull/37824]
>  
>  
> I am also trying a better fix, whereby for commutative expressions the murmur 
> hashCodes are calculated using unorderedHash so that they are order 
> independent (i.e. symmetric).
> The above approach works fine, but in the case of Least & Greatest, the 
> Product's element is a Seq, and that messes with the consistency of the hashCode.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40280) Failure to create parquet predicate push down for ints and longs on some valid files

2022-09-08 Thread Thomas Graves (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-40280.
---
Fix Version/s: 3.4.0
   3.3.1
   3.2.3
 Assignee: Robert Joseph Evans
   Resolution: Fixed

> Failure to create parquet predicate push down for ints and longs on some 
> valid files
> 
>
> Key: SPARK-40280
> URL: https://issues.apache.org/jira/browse/SPARK-40280
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0, 3.2.0, 3.3.0, 3.4.0
>Reporter: Robert Joseph Evans
>Assignee: Robert Joseph Evans
>Priority: Major
> Fix For: 3.4.0, 3.3.1, 3.2.3
>
>
> The [parquet 
> format|https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#signed-integers]
>  specification states that...
> bq. {{INT(8, true)}}, {{INT(16, true)}}, and {{INT(32, true)}} must 
> annotate an {{int32}} primitive type and {{INT(64, true)}} must annotate an 
> {{int64}} primitive type. {{INT(32, true)}} and {{INT(64, true)}} are implied 
> by the {{int32}} and {{int64}} primitive types if no other annotation is 
> present and should be considered optional.
> But the code inside 
> [ParquetFilters.scala|https://github.com/apache/spark/blob/296fe49ec855ac8c15c080e7bab6d519fe504bd3/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala#L125-L126]
>  requires that {{int32}} and {{int64}} have no annotation. If 
> there is an annotation on those columns and they are part of a predicate 
> push down, the hard-coded types will not match and the corresponding filter 
> ends up being {{None}}.
> This can be a huge performance penalty for a valid parquet file.
> I am happy to provide files that show the issue if needed for testing.
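
A rough sketch of the more permissive check argued for above, written against the
parquet-mr schema API; the helper name and shape are made up for illustration and
are not the actual ParquetFilters code:
{code:scala}
import org.apache.parquet.schema.LogicalTypeAnnotation.IntLogicalTypeAnnotation
import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName
import org.apache.parquet.schema.Type

// Hypothetical helper: accept an int32/int64 column for an integral pushdown
// filter when it has no logical type annotation OR carries the optional
// INT(32, true) / INT(64, true) annotation the spec allows.
// bitWidth is assumed to be 32 or 64.
def isPlainSignedInt(parquetType: Type, bitWidth: Int): Boolean = {
  val primitive = parquetType.asPrimitiveType()
  val expectedName =
    if (bitWidth == 32) PrimitiveTypeName.INT32 else PrimitiveTypeName.INT64
  primitive.getPrimitiveTypeName == expectedName &&
    (primitive.getLogicalTypeAnnotation match {
      case null => true  // no annotation: implied signed int per the spec
      case i: IntLogicalTypeAnnotation => i.getBitWidth == bitWidth && i.isSigned
      case _ => false    // some other annotation (e.g. DATE), not a plain int
    })
}
{code}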



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40385) Classes with companion object constructor fails interpreted path

2022-09-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40385:


Assignee: Apache Spark

> Classes with companion object constructor fails interpreted path
> 
>
> Key: SPARK-40385
> URL: https://issues.apache.org/jira/browse/SPARK-40385
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.3, 3.3.0, 3.2.2
>Reporter: Emil Ejbyfeldt
>Assignee: Apache Spark
>Priority: Major
>
> The Encoder implemented in SPARK-8288 for classes with only a companion 
> object constructor fails when using the interpreted path.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40385) Classes with companion object constructor fails interpreted path

2022-09-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40385:


Assignee: (was: Apache Spark)

> Classes with companion object constructor fails interpreted path
> 
>
> Key: SPARK-40385
> URL: https://issues.apache.org/jira/browse/SPARK-40385
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.3, 3.3.0, 3.2.2
>Reporter: Emil Ejbyfeldt
>Priority: Major
>
> The Encoder implemented in SPARK-8288 for classes with only a companion 
> object constructor fails when using the interpreted path.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40385) Classes with companion object constructor fails interpreted path

2022-09-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601830#comment-17601830
 ] 

Apache Spark commented on SPARK-40385:
--

User 'eejbyfeldt' has created a pull request for this issue:
https://github.com/apache/spark/pull/37837

> Classes with companion object constructor fails interpreted path
> 
>
> Key: SPARK-40385
> URL: https://issues.apache.org/jira/browse/SPARK-40385
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.3, 3.3.0, 3.2.2
>Reporter: Emil Ejbyfeldt
>Priority: Major
>
> The Encoder implemented in SPARK-8288 for classes with only a companion 
> object constructor fails when using the interpreted path.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26365) spark-submit for k8s cluster doesn't propagate exit code

2022-09-08 Thread Shrikant Prasad (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-26365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601827#comment-17601827
 ] 

Shrikant Prasad commented on SPARK-26365:
-

A spark-submit exit code ($?) of 0 is okay here, as there is no error in job 
submission. It is the job itself that failed, and that information is available 
in the container exit code (1). When job submission fails, we do get a proper 
non-zero exit code. So it doesn't seem to be a bug.
{code:java}
container status: 
 container name: spark-kubernetes-driver
 container image: **
 container state: terminated
 container started at: 2022-09-08T13:40:39Z
 container finished at: 2022-09-08T13:40:43Z
 exit code: 1
 termination reason: Error {code}
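
For completeness, a sketch of reading that container exit code programmatically
with the fabric8 Kubernetes client (the client Spark's own K8s backend depends
on); the namespace, pod name and container name below are placeholders:
{code:scala}
import io.fabric8.kubernetes.client.DefaultKubernetesClient
import scala.jdk.CollectionConverters._  // Scala 2.13; use JavaConverters on 2.12

val client = new DefaultKubernetesClient()  // kubeconfig / in-cluster config
try {
  // Placeholder namespace and driver pod name; adjust to your deployment.
  val pod = client.pods().inNamespace("spark").withName("my-app-driver").get()
  val driverExitCode: Option[Int] = pod.getStatus.getContainerStatuses.asScala
    .find(_.getName == "spark-kubernetes-driver")
    .flatMap(cs => Option(cs.getState.getTerminated)) // null until terminated
    .map(_.getExitCode.intValue)
  println(s"driver exit code: $driverExitCode")
} finally {
  client.close()
}
{code}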

> spark-submit for k8s cluster doesn't propagate exit code
> 
>
> Key: SPARK-26365
> URL: https://issues.apache.org/jira/browse/SPARK-26365
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Spark Core, Spark Submit
>Affects Versions: 2.3.2, 2.4.0, 3.0.0, 3.1.0
>Reporter: Oscar Bonilla
>Priority: Major
> Attachments: spark-2.4.5-raise-exception-k8s-failure.patch, 
> spark-3.0.0-raise-exception-k8s-failure.patch
>
>
> When launching apps using spark-submit in a Kubernetes cluster, if the Spark 
> application fails (returns exit code = 1 for example), spark-submit will 
> still exit gracefully and return exit code = 0.
> This is problematic, since there's no way to know if there's been a problem 
> with the Spark application.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40380) Constant-folding of InvokeLike should not result in non-serializable result

2022-09-08 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-40380.
-
Fix Version/s: 3.3.1
   3.4.0
   Resolution: Fixed

Issue resolved by pull request 37823
[https://github.com/apache/spark/pull/37823]

> Constant-folding of InvokeLike should not result in non-serializable result
> ---
>
> Key: SPARK-40380
> URL: https://issues.apache.org/jira/browse/SPARK-40380
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Kris Mok
>Assignee: Kris Mok
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
>
> SPARK-37907 added constant-folding support to the {{InvokeLike}} family of 
> expressions. Unfortunately it introduced a regression for cases when a 
> constant-folded {{InvokeLike}} expression returned a non-serializable result. 
> {{ExpressionEncoder}}s is an area where this problem may be exposed, e.g. 
> when using sparksql-scalapb on Spark 3.3.0+.
> Below is a minimal repro to demonstrate this issue:
> {code:scala}
> import org.apache.spark.sql.Column
> import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
> import org.apache.spark.sql.catalyst.expressions.Literal
> import org.apache.spark.sql.catalyst.expressions.objects.{Invoke, 
> StaticInvoke}
> import org.apache.spark.sql.types.{LongType, ObjectType}
> class NotSerializableBoxedLong(longVal: Long) { def add(other: Long): Long = 
> longVal + other }
> case class SerializableBoxedLong(longVal: Long) { def toNotSerializable(): 
> NotSerializableBoxedLong = new NotSerializableBoxedLong(longVal) }
> val litExpr = Literal.fromObject(SerializableBoxedLong(42L), 
> ObjectType(classOf[SerializableBoxedLong]))
> val toNotSerializableExpr = Invoke(litExpr, "toNotSerializable", 
> ObjectType(classOf[NotSerializableBoxedLong]))
> val addExpr = Invoke(toNotSerializableExpr, "add", LongType, 
> Seq(UnresolvedAttribute.quotedString("id")))
> val df = spark.range(2).select(new Column(addExpr))
> df.collect
> {code}
> Before SPARK-37907, this example would run fine and result in {{[[42], 
> [43]]}}. But after SPARK-37907, it'd fail with:
> {code:none}
> ...
> Caused by: java.io.NotSerializableException: NotSerializableBoxedLong
> Serialization stack:
>   - object not serializable (class: NotSerializableBoxedLong, value: 
> NotSerializableBoxedLong@71231636)
>   - element of array (index: 1)
>   - array (class [Ljava.lang.Object;, size 2)
>   - element of array (index: 1)
>   - array (class [Ljava.lang.Object;, size 3)
>   - field (class: java.lang.invoke.SerializedLambda, name: capturedArgs, 
> type: class [Ljava.lang.Object;)
>   - object (class java.lang.invoke.SerializedLambda, 
> SerializedLambda[capturingClass=class 
> org.apache.spark.sql.execution.WholeStageCodegenExec, 
> functionalInterfaceMethod=scala/Function2.apply:(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;,
>  implementation=invokeStatic 
> org/apache/spark/sql/execution/WholeStageCodegenExec.$anonfun$doExecute$4$adapted:(Lorg/apache/spark/sql/catalyst/expressions/codegen/CodeAndComment;[Ljava/lang/Object;Lorg/apache/spark/sql/execution/metric/SQLMetric;Ljava/lang/Object;Lscala/collection/Iterator;)Lscala/collection/Iterator;,
>  
> instantiatedMethodType=(Ljava/lang/Object;Lscala/collection/Iterator;)Lscala/collection/Iterator;,
>  numCaptured=3])
>   - writeReplace data (class: java.lang.invoke.SerializedLambda)
>   - object (class 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$Lambda$3123/1641694389, 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$Lambda$3123/1641694389@185db22c)
>   at 
> org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:41)
>   at 
> org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:49)
>   at 
> org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:115)
>   at 
> org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:441)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40380) Constant-folding of InvokeLike should not result in non-serializable result

2022-09-08 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-40380:
---

Assignee: Kris Mok

> Constant-folding of InvokeLike should not result in non-serializable result
> ---
>
> Key: SPARK-40380
> URL: https://issues.apache.org/jira/browse/SPARK-40380
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Kris Mok
>Assignee: Kris Mok
>Priority: Major
>
> SPARK-37907 added constant-folding support to the {{InvokeLike}} family of 
> expressions. Unfortunately it introduced a regression for cases when a 
> constant-folded {{InvokeLike}} expression returned a non-serializable result. 
> {{ExpressionEncoder}}s is an area where this problem may be exposed, e.g. 
> when using sparksql-scalapb on Spark 3.3.0+.
> Below is a minimal repro to demonstrate this issue:
> {code:scala}
> import org.apache.spark.sql.Column
> import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
> import org.apache.spark.sql.catalyst.expressions.Literal
> import org.apache.spark.sql.catalyst.expressions.objects.{Invoke, 
> StaticInvoke}
> import org.apache.spark.sql.types.{LongType, ObjectType}
> class NotSerializableBoxedLong(longVal: Long) { def add(other: Long): Long = 
> longVal + other }
> case class SerializableBoxedLong(longVal: Long) { def toNotSerializable(): 
> NotSerializableBoxedLong = new NotSerializableBoxedLong(longVal) }
> val litExpr = Literal.fromObject(SerializableBoxedLong(42L), 
> ObjectType(classOf[SerializableBoxedLong]))
> val toNotSerializableExpr = Invoke(litExpr, "toNotSerializable", 
> ObjectType(classOf[NotSerializableBoxedLong]))
> val addExpr = Invoke(toNotSerializableExpr, "add", LongType, 
> Seq(UnresolvedAttribute.quotedString("id")))
> val df = spark.range(2).select(new Column(addExpr))
> df.collect
> {code}
> Before SPARK-37907, this example would run fine and result in {{[[42], 
> [43]]}}. But after SPARK-37907, it'd fail with:
> {code:none}
> ...
> Caused by: java.io.NotSerializableException: NotSerializableBoxedLong
> Serialization stack:
>   - object not serializable (class: NotSerializableBoxedLong, value: 
> NotSerializableBoxedLong@71231636)
>   - element of array (index: 1)
>   - array (class [Ljava.lang.Object;, size 2)
>   - element of array (index: 1)
>   - array (class [Ljava.lang.Object;, size 3)
>   - field (class: java.lang.invoke.SerializedLambda, name: capturedArgs, 
> type: class [Ljava.lang.Object;)
>   - object (class java.lang.invoke.SerializedLambda, 
> SerializedLambda[capturingClass=class 
> org.apache.spark.sql.execution.WholeStageCodegenExec, 
> functionalInterfaceMethod=scala/Function2.apply:(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;,
>  implementation=invokeStatic 
> org/apache/spark/sql/execution/WholeStageCodegenExec.$anonfun$doExecute$4$adapted:(Lorg/apache/spark/sql/catalyst/expressions/codegen/CodeAndComment;[Ljava/lang/Object;Lorg/apache/spark/sql/execution/metric/SQLMetric;Ljava/lang/Object;Lscala/collection/Iterator;)Lscala/collection/Iterator;,
>  
> instantiatedMethodType=(Ljava/lang/Object;Lscala/collection/Iterator;)Lscala/collection/Iterator;,
>  numCaptured=3])
>   - writeReplace data (class: java.lang.invoke.SerializedLambda)
>   - object (class 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$Lambda$3123/1641694389, 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$Lambda$3123/1641694389@185db22c)
>   at 
> org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:41)
>   at 
> org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:49)
>   at 
> org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:115)
>   at 
> org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:441)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40345) Implement `ExpandingGroupby.quantile`.

2022-09-08 Thread Yikun Jiang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601791#comment-17601791
 ] 

Yikun Jiang commented on SPARK-40345:
-

https://github.com/apache/spark/pull/37836

> Implement `ExpandingGroupby.quantile`.
> --
>
> Key: SPARK-40345
> URL: https://issues.apache.org/jira/browse/SPARK-40345
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We should implement `ExpandingGroupby.quantile` for increasing pandas API 
> coverage.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40339) Implement `Expanding.quantile`.

2022-09-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601792#comment-17601792
 ] 

Apache Spark commented on SPARK-40339:
--

User 'Yikun' has created a pull request for this issue:
https://github.com/apache/spark/pull/37836

> Implement `Expanding.quantile`.
> ---
>
> Key: SPARK-40339
> URL: https://issues.apache.org/jira/browse/SPARK-40339
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We should implement `Expanding.quantile` for increasing pandas API coverage.
> pandas docs: 
> https://pandas.pydata.org/docs/reference/api/pandas.core.window.expanding.Expanding.quantile.html



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40339) Implement `Expanding.quantile`.

2022-09-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40339:


Assignee: Apache Spark

> Implement `Expanding.quantile`.
> ---
>
> Key: SPARK-40339
> URL: https://issues.apache.org/jira/browse/SPARK-40339
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Apache Spark
>Priority: Major
>
> We should implement `Expanding.quantile` for increasing pandas API coverage.
> pandas docs: 
> https://pandas.pydata.org/docs/reference/api/pandas.core.window.expanding.Expanding.quantile.html



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40348) Implement `RollingGroupby.quantile`.

2022-09-08 Thread Yikun Jiang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601789#comment-17601789
 ] 

Yikun Jiang commented on SPARK-40348:
-

https://github.com/apache/spark/pull/37836

> Implement `RollingGroupby.quantile`.
> 
>
> Key: SPARK-40348
> URL: https://issues.apache.org/jira/browse/SPARK-40348
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We should implement `RollingGroupby.quantile` for increasing pandas API 
> coverage.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40339) Implement `Expanding.quantile`.

2022-09-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601790#comment-17601790
 ] 

Apache Spark commented on SPARK-40339:
--

User 'Yikun' has created a pull request for this issue:
https://github.com/apache/spark/pull/37836

> Implement `Expanding.quantile`.
> ---
>
> Key: SPARK-40339
> URL: https://issues.apache.org/jira/browse/SPARK-40339
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We should implement `Expanding.quantile` for increasing pandas API coverage.
> pandas docs: 
> https://pandas.pydata.org/docs/reference/api/pandas.core.window.expanding.Expanding.quantile.html



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40339) Implement `Expanding.quantile`.

2022-09-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40339:


Assignee: (was: Apache Spark)

> Implement `Expanding.quantile`.
> ---
>
> Key: SPARK-40339
> URL: https://issues.apache.org/jira/browse/SPARK-40339
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We should implement `Expanding.quantile` for increasing pandas API coverage.
> pandas docs: 
> https://pandas.pydata.org/docs/reference/api/pandas.core.window.expanding.Expanding.quantile.html



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40342) Implement `Rolling.quantile`.

2022-09-08 Thread Yikun Jiang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601788#comment-17601788
 ] 

Yikun Jiang commented on SPARK-40342:
-

https://github.com/apache/spark/pull/37836

> Implement `Rolling.quantile`.
> -
>
> Key: SPARK-40342
> URL: https://issues.apache.org/jira/browse/SPARK-40342
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We should implement `Rolling.quantile` for increasing pandas API coverage.
> pandas docs: 
> https://pandas.pydata.org/docs/reference/api/pandas.core.window.rolling.Rolling.quantile.html



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40339) Implement `Expanding.quantile`.

2022-09-08 Thread Yikun Jiang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601787#comment-17601787
 ] 

Yikun Jiang commented on SPARK-40339:
-

https://github.com/apache/spark/pull/37836

> Implement `Expanding.quantile`.
> ---
>
> Key: SPARK-40339
> URL: https://issues.apache.org/jira/browse/SPARK-40339
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We should implement `Expanding.quantile` for increasing pandas API coverage.
> pandas docs: 
> https://pandas.pydata.org/docs/reference/api/pandas.core.window.expanding.Expanding.quantile.html



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40393) Refactor expanding and rolling test for function with input

2022-09-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40393:


Assignee: (was: Apache Spark)

> Refactor expanding and rolling test for function with input
> ---
>
> Key: SPARK-40393
> URL: https://issues.apache.org/jira/browse/SPARK-40393
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40393) Refactor expanding and rolling test for function with input

2022-09-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40393:


Assignee: Apache Spark

> Refactor expanding and rolling test for function with input
> ---
>
> Key: SPARK-40393
> URL: https://issues.apache.org/jira/browse/SPARK-40393
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40393) Refactor expanding and rolling test for function with input

2022-09-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601785#comment-17601785
 ] 

Apache Spark commented on SPARK-40393:
--

User 'Yikun' has created a pull request for this issue:
https://github.com/apache/spark/pull/37835

> Refactor expanding and rolling test for function with input
> ---
>
> Key: SPARK-40393
> URL: https://issues.apache.org/jira/browse/SPARK-40393
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40393) Refactor expanding and rolling test for function with input

2022-09-08 Thread Yikun Jiang (Jira)
Yikun Jiang created SPARK-40393:
---

 Summary: Refactor expanding and rolling test for function with 
input
 Key: SPARK-40393
 URL: https://issues.apache.org/jira/browse/SPARK-40393
 Project: Spark
  Issue Type: Sub-task
  Components: Pandas API on Spark
Affects Versions: 3.4.0
Reporter: Yikun Jiang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40392) Test the error class INDEX_OUT_OF_BOUNDS

2022-09-08 Thread Max Gekk (Jira)
Max Gekk created SPARK-40392:


 Summary: Test the error class INDEX_OUT_OF_BOUNDS
 Key: SPARK-40392
 URL: https://issues.apache.org/jira/browse/SPARK-40392
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Max Gekk


Add a test for the error class UNSUPPORTED_FEATURE.JDBC_TRANSACTION and place 
it to QueryExecutionErrorsSuite.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40392) Test the error class INDEX_OUT_OF_BOUNDS

2022-09-08 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-40392:
-
Description: Add a test for the error class INDEX_OUT_OF_BOUNDS and place 
it to QueryExecutionErrorsSuite.  (was: Add a test for the error class 
UNSUPPORTED_FEATURE.JDBC_TRANSACTION and place it to QueryExecutionErrorsSuite.)

> Test the error class INDEX_OUT_OF_BOUNDS
> 
>
> Key: SPARK-40392
> URL: https://issues.apache.org/jira/browse/SPARK-40392
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Priority: Minor
>
> Add a test for the error class INDEX_OUT_OF_BOUNDS and place it to 
> QueryExecutionErrorsSuite.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40391) Test the error class UNSUPPORTED_FEATURE.JDBC_TRANSACTION

2022-09-08 Thread Max Gekk (Jira)
Max Gekk created SPARK-40391:


 Summary: Test the error class UNSUPPORTED_FEATURE.JDBC_TRANSACTION
 Key: SPARK-40391
 URL: https://issues.apache.org/jira/browse/SPARK-40391
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Max Gekk


Add a test for the error class UNSUPPORTED_FEATURE.JDBC_TRANSACTION and place 
it to QueryExecutionErrorsSuite.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40390) Spark Master UI - SSL implementation

2022-09-08 Thread Rhajvijay Manoharan (Jira)
Rhajvijay Manoharan created SPARK-40390:
---

 Summary: Spark Master UI - SSL implementation
 Key: SPARK-40390
 URL: https://issues.apache.org/jira/browse/SPARK-40390
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.4.8
Reporter: Rhajvijay Manoharan


While trying to implement SSL for the master UI, we get the below error with 
spark-core (spark-core_2.11-{*}2.4.8{*}.jar):

22/09/08 03:45:03 ERROR MasterWebUI: Failed to bind MasterWebUI
java.lang.IllegalStateException: KeyStores with multiple certificates are not 
supported on the base class org.spark_project.jetty.util.ssl.SslContextFactory. 
(Use org.spark_project.jetty.util.ssl.SslContextFactory$Server or 
org.spark_project.jetty.util.ssl.SslContextFactory$Client instead)
        at 
org.spark_project.jetty.util.ssl.SslContextFactory.newSniX509ExtendedKeyManager(SslContextFactory.java:1283)
        at 
org.spark_project.jetty.util.ssl.SslContextFactory.getKeyManagers(SslContextFactory.java:1265)
        ...
        at java.lang.Thread.run(Thread.java:745)

 

But with spark-core (spark-core_2.11-{*}2.4.3{*}.jar) we do not see this issue.

Please suggest how we can mitigate this issue in the latest version, 
spark-core_2.11-2.4.8.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40333) Implement `GroupBy.nth`.

2022-09-08 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-40333.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37801
[https://github.com/apache/spark/pull/37801]

> Implement `GroupBy.nth`.
> 
>
> Key: SPARK-40333
> URL: https://issues.apache.org/jira/browse/SPARK-40333
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>
> We should implement `GroupBy.nth` for increasing pandas API coverage.
> pandas docs: 
> https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.GroupBy.nth.html



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40292) arrays_zip output unexpected alias column names

2022-09-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40292:


Assignee: (was: Apache Spark)

> arrays_zip output unexpected alias column names
> ---
>
> Key: SPARK-40292
> URL: https://issues.apache.org/jira/browse/SPARK-40292
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Linhong Liu
>Priority: Major
>
> For the below query:
> {code:sql}
> with q as (
>   select
>     named_struct(
>       'my_array', array(named_struct('x', 1, 'y', 2))
>     ) as my_struct
> )
> select
>   arrays_zip(my_struct.my_array)
> from
>   q {code}
> The latest Spark gives the below schema; the field name "my_array" was 
> changed to "0".
> {code:java}
> root
>  |-- arrays_zip(my_struct.my_array): array (nullable = true)
>  |    |-- element: struct (containsNull = false)
>  |    |    |-- 0: struct (nullable = true)
>  |    |    |    |-- x: integer (nullable = true)
>  |    |    |    |-- y: integer (nullable = true){code}
> While Spark 3.1 gives the expected result
> {code:java}
> root
>  |-- arrays_zip(my_struct.my_array): array (nullable = true)
>  |    |-- element: struct (containsNull = false)
>  |    |    |-- my_array: struct (nullable = true)
>  |    |    |    |-- x: integer (nullable = true)
>  |    |    |    |-- y: integer (nullable = true)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40292) arrays_zip output unexpected alias column names

2022-09-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601694#comment-17601694
 ] 

Apache Spark commented on SPARK-40292:
--

User 'sadikovi' has created a pull request for this issue:
https://github.com/apache/spark/pull/37833

> arrays_zip output unexpected alias column names
> ---
>
> Key: SPARK-40292
> URL: https://issues.apache.org/jira/browse/SPARK-40292
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Linhong Liu
>Priority: Major
>
> For the below query:
> {code:sql}
> with q as (
>   select
>     named_struct(
>       'my_array', array(named_struct('x', 1, 'y', 2))
>     ) as my_struct
> )
> select
>   arrays_zip(my_struct.my_array)
> from
>   q {code}
> The latest Spark gives the below schema; the field name "my_array" was 
> changed to "0".
> {code:java}
> root
>  |-- arrays_zip(my_struct.my_array): array (nullable = true)
>  |    |-- element: struct (containsNull = false)
>  |    |    |-- 0: struct (nullable = true)
>  |    |    |    |-- x: integer (nullable = true)
>  |    |    |    |-- y: integer (nullable = true){code}
> While Spark 3.1 gives the expected result
> {code:java}
> root
>  |-- arrays_zip(my_struct.my_array): array (nullable = true)
>  |    |-- element: struct (containsNull = false)
>  |    |    |-- my_array: struct (nullable = true)
>  |    |    |    |-- x: integer (nullable = true)
>  |    |    |    |-- y: integer (nullable = true)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40292) arrays_zip output unexpected alias column names

2022-09-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40292:


Assignee: Apache Spark

> arrays_zip output unexpected alias column names
> ---
>
> Key: SPARK-40292
> URL: https://issues.apache.org/jira/browse/SPARK-40292
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Linhong Liu
>Assignee: Apache Spark
>Priority: Major
>
> For the below query:
> {code:sql}
> with q as (
>   select
>     named_struct(
>       'my_array', array(named_struct('x', 1, 'y', 2))
>     ) as my_struct
> )
> select
>   arrays_zip(my_struct.my_array)
> from
>   q {code}
> The latest Spark gives the below schema; the field name "my_array" was 
> changed to "0".
> {code:java}
> root
>  |-- arrays_zip(my_struct.my_array): array (nullable = true)
>  |    |-- element: struct (containsNull = false)
>  |    |    |-- 0: struct (nullable = true)
>  |    |    |    |-- x: integer (nullable = true)
>  |    |    |    |-- y: integer (nullable = true){code}
> While Spark 3.1 gives the expected result
> {code:java}
> root
>  |-- arrays_zip(my_struct.my_array): array (nullable = true)
>  |    |-- element: struct (containsNull = false)
>  |    |    |-- my_array: struct (nullable = true)
>  |    |    |    |-- x: integer (nullable = true)
>  |    |    |    |-- y: integer (nullable = true)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40389) Decimals can't upcast as integral types if the cast can overflow

2022-09-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601684#comment-17601684
 ] 

Apache Spark commented on SPARK-40389:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/37832

> Decimals can't upcast as integral types if the cast can overflow
> 
>
> Key: SPARK-40389
> URL: https://issues.apache.org/jira/browse/SPARK-40389
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> In Spark SQL, the method "canUpCast" returns true iff we can safely up-cast 
> the `from` type to the `to` type without any truncation, precision loss, or 
> possible runtime failures.
> Meanwhile, DecimalType(10, 0) is considered as "canUpCast" to Integer type. 
> This is wrong, since casting 90BD as Integer type will overflow.
> As a result:
>  * The optimizer rule SimplifyCasts relies on the method "canUpCast" and it 
> will mistakenly convert "cast(cast(90BD as int) as long)" into 
> "cast(90BD as long)"
>  * The STRICT store assignment policy relies on this method too. With the 
> policy enabled, inserting 90BD into integer columns will pass the 
> compile-time check and insert an unexpected value 410065408.
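
For intuition, a small sketch of the precision-based rule this implies: a scale-0
decimal only fits an Int when its precision is at most 9 digits, and a Long when
it is at most 18. This is an illustration written against Spark's public type
API, not the actual canUpCast code:
{code:scala}
import org.apache.spark.sql.types._

// Sketch only: any 9-digit integer fits in an Int (Int.MaxValue has 10 digits),
// and any 18-digit integer fits in a Long (Long.MaxValue has 19 digits).
def decimalSafelyUpCastsTo(dec: DecimalType, target: DataType): Boolean =
  (dec, target) match {
    case (DecimalType.Fixed(p, 0), IntegerType) => p <= 9
    case (DecimalType.Fixed(p, 0), LongType)    => p <= 18
    case _                                      => false
  }

// DecimalType(10, 0) => false for IntegerType: a 10-digit value may overflow an Int.
{code}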



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40389) Decimals can't upcast as integral types if the cast can overflow

2022-09-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40389:


Assignee: Gengliang Wang  (was: Apache Spark)

> Decimals can't upcast as integral types if the cast can overflow
> 
>
> Key: SPARK-40389
> URL: https://issues.apache.org/jira/browse/SPARK-40389
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> In Spark SQL, the method "canUpCast" returns true iff we can safely up-cast 
> the `from` type to the `to` type without any truncation, precision loss, or 
> possible runtime failures.
> Meanwhile, DecimalType(10, 0) is considered as "canUpCast" to Integer type. 
> This is wrong, since casting 90BD as Integer type will overflow.
> As a result:
>  * The optimizer rule SimplifyCasts relies on the method "canUpCast" and it 
> will mistakenly convert "cast(cast(90BD as int) as long)" into 
> "cast(90BD as long)"
>  * The STRICT store assignment policy relies on this method too. With the 
> policy enabled, inserting 90BD into integer columns will pass the 
> compile-time check and insert an unexpected value 410065408.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40389) Decimals can't upcast as integral types if the cast can overflow

2022-09-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40389:


Assignee: Apache Spark  (was: Gengliang Wang)

> Decimals can't upcast as integral types if the cast can overflow
> 
>
> Key: SPARK-40389
> URL: https://issues.apache.org/jira/browse/SPARK-40389
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Major
>
> In Spark SQL, the method "canUpCast" returns true iff we can safely up-cast 
> the `from` type to the `to` type without any truncation, precision loss, or 
> possible runtime failures.
> Meanwhile, DecimalType(10, 0) is considered as "canUpCast" to Integer type. 
> This is wrong, since casting 90BD as Integer type will overflow.
> As a result:
>  * The optimizer rule SimplifyCasts relies on the method "canUpCast" and it 
> will mistakenly convert "cast(cast(90BD as int) as long)" into 
> "cast(90BD as long)"
>  * The STRICT store assignment policy relies on this method too. With the 
> policy enabled, inserting 90BD into integer columns will pass the 
> compile-time check and insert an unexpected value 410065408.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40389) Decimals can't upcast as integral types if the cast can overflow

2022-09-08 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-40389:
---
Description: 
In Spark SQL, the method "canUpCast" returns true iff we can safely up-cast the 
`from` type to the `to` type without any truncation, precision loss, or possible 
runtime failures.

Meanwhile, DecimalType(10, 0) is considered as "canUpCast" to Integer type. 
This is wrong, since casting 90BD as Integer type will overflow.

As a result:
 * The optimizer rule SimplifyCasts relies on the method "canUpCast" and it 
will mistakenly convert "cast(cast(90BD as int) as long)" into 
"cast(90BD as long)"
 * The STRICT store assignment policy relies on this method too. With the 
policy enabled, inserting 90BD into integer columns will pass the compile-time 
check and insert an unexpected value 410065408.

  was:
In Spark SQL, the method "canUpCast" returns true iff we can safely up-cast the 
`from` type to the `to` type without any truncation, precision loss, or possible 
runtime failures.

Meanwhile, DecimalType(10, 0) is considered as "canUpCast" to Integer type. 
This is wrong, since casting 90BD as Integer type will overflow.

As a result:
 * The optimizer rule SimplifyCasts relies on the method "canUpCast" and it 
will mistakenly convert "cast(cast(90BD as int) as long)" into 
"cast(90BD as long)"
 * The STRICT store assignment policy relies on this method too. With the 
policy enabled, inserting 90BD into integer columns will pass the compile-time 
check and unexpectedly cause runtime errors.


> Decimals can't upcast as integral types if the cast can overflow
> 
>
> Key: SPARK-40389
> URL: https://issues.apache.org/jira/browse/SPARK-40389
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> In Spark SQL, the method "canUpCast" returns true iff we can safely up-cast 
> the `from` type to the `to` type without any truncation, precision loss, or 
> possible runtime failures.
> Meanwhile, DecimalType(10, 0) is considered as "canUpCast" to Integer type. 
> This is wrong, since casting 90BD as Integer type will overflow.
> As a result:
>  * The optimizer rule SimplifyCasts relies on the method "canUpCast" and it 
> will mistakenly convert "cast(cast(90BD as int) as long)" into 
> "cast(90BD as long)"
>  * The STRICT store assignment policy relies on this method too. With the 
> policy enabled, inserting 90BD into integer columns will pass the 
> compile-time check and insert an unexpected value 410065408.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40389) Decimals can't upcast as integral types if the cast can overflow

2022-09-08 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-40389:
--

 Summary: Decimals can't upcast as integral types if the cast can 
overflow
 Key: SPARK-40389
 URL: https://issues.apache.org/jira/browse/SPARK-40389
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang


In Spark SQL, the method "canUpCast" returns true iff we can safely up-cast the 
`from` type to the `to` type without any truncation, precision loss, or possible 
runtime failures.

Meanwhile, DecimalType(10, 0) is considered as "canUpCast" to Integer type. 
This is wrong, since casting 90BD as Integer type will overflow.

As a result:
 * The optimizer rule SimplifyCasts relies on the method "canUpCast" and it 
will mistakenly convert "cast(cast(90BD as int) as long)" into 
"cast(90BD as long)"
 * The STRICT store assignment policy relies on this method too. With the 
policy enabled, inserting 90BD into integer columns will pass the compile-time 
check and unexpectedly cause runtime errors.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40388) SQL configuration spark.sql.mapKeyDedupPolicy not always applied

2022-09-08 Thread Paul Praet (Jira)
Paul Praet created SPARK-40388:
--

 Summary: SQL configuration spark.sql.mapKeyDedupPolicy not always 
applied
 Key: SPARK-40388
 URL: https://issues.apache.org/jira/browse/SPARK-40388
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.1.2
Reporter: Paul Praet


I have set spark.sql.mapKeyDedupPolicy to LAST_WIN.

However, I still had one failure where I got
{quote}Caused by: org.apache.spark.SparkException: Job aborted due to stage 
failure: Task 7 in stage 1201.0 failed 4 times, most recent failure: Lost task 
7.3 in stage 1201.0 (TID 1011313) (ip-10-1-34-47.eu-west-1.compute.internal 
executor 228): java.lang.RuntimeException: Duplicate map key domain was found, 
please check the input data. If you want to remove the duplicated keys, you can 
set spark.sql.mapKeyDedupPolicy to LAST_WIN so that the key inserted at last 
takes precedence.
{quote}
We are confident we set the right configuration in SparkConf (we can find it on 
the Spark UI -> Environment).

It is our impression this configuration is not propagated reliably to the 
executors.
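
For reference, a small spark-shell sketch of the behaviour this setting is meant
to enable; here the conf is set on the session for illustration, whereas the
report above sets it through SparkConf:
{code:scala}
// With LAST_WIN, building a map that contains a duplicate key keeps the key
// inserted last instead of failing with "Duplicate map key ... was found".
spark.conf.set("spark.sql.mapKeyDedupPolicy", "LAST_WIN")

spark.sql("SELECT map_concat(map('domain', 1), map('domain', 2)) AS m").show(false)
// Expected with LAST_WIN: {domain -> 2}; with the default EXCEPTION policy the
// same query fails at runtime.
{code}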



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40354) Support eliminate dynamic partition for v1 writes

2022-09-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40354:


Assignee: Apache Spark

> Support eliminate dynamic partition for v1 writes
> -
>
> Key: SPARK-40354
> URL: https://issues.apache.org/jira/browse/SPARK-40354
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: XiDuo You
>Assignee: Apache Spark
>Priority: Major
>
> v1 writes will add an extra sort for dynamic columns, e.g.
> {code:java}
> INSERT INTO TABLE t1 PARTITION(p)
> SELECT c1, c2, 'a' as p FROM t2 {code}
> if the dynamic columns are foldable, we can optimize it to:
> {code:java}
> INSERT INTO TABLE t1 PARTITION(p='a')
> SELECT c1, c2 FROM t2 {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40354) Support eliminate dynamic partition for v1 writes

2022-09-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40354:


Assignee: (was: Apache Spark)

> Support eliminate dynamic partition for v1 writes
> -
>
> Key: SPARK-40354
> URL: https://issues.apache.org/jira/browse/SPARK-40354
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: XiDuo You
>Priority: Major
>
> v1 writes will add an extra sort for dynamic columns, e.g.
> {code:java}
> INSERT INTO TABLE t1 PARTITION(p)
> SELECT c1, c2, 'a' as p FROM t2 {code}
> if the dynamic columns are foldable, we can optimize it to:
> {code:java}
> INSERT INTO TABLE t1 PARTITION(p='a')
> SELECT c1, c2 FROM t2 {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40354) Support eliminate dynamic partition for v1 writes

2022-09-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601673#comment-17601673
 ] 

Apache Spark commented on SPARK-40354:
--

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/37831

> Support eliminate dynamic partition for v1 writes
> -
>
> Key: SPARK-40354
> URL: https://issues.apache.org/jira/browse/SPARK-40354
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: XiDuo You
>Priority: Major
>
> v1 writes will add an extra sort for dynamic columns, e.g.
> {code:java}
> INSERT INTO TABLE t1 PARTITION(p)
> SELECT c1, c2, 'a' as p FROM t2 {code}
> if the dynamic columns are foldable, we can optimize it to:
> {code:java}
> INSERT INTO TABLE t1 PARTITION(p='a')
> SELECT c1, c2 FROM t2 {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40387) Improve the implementation of Spark Decimal

2022-09-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40387:


Assignee: (was: Apache Spark)

> Improve the implementation of Spark Decimal
> ---
>
> Key: SPARK-40387
> URL: https://issues.apache.org/jira/browse/SPARK-40387
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Priority: Major
>
> Spark Decimal always uses ne first, but eq is better.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40387) Improve the implementation of Spark Decimal

2022-09-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40387:


Assignee: Apache Spark

> Improve the implementation of Spark Decimal
> ---
>
> Key: SPARK-40387
> URL: https://issues.apache.org/jira/browse/SPARK-40387
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Assignee: Apache Spark
>Priority: Major
>
> Spark Decimal always uses ne first, but eq is better.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40387) Improve the implementation of Spark Decimal

2022-09-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601672#comment-17601672
 ] 

Apache Spark commented on SPARK-40387:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/37830

> Improve the implementation of Spark Decimal
> ---
>
> Key: SPARK-40387
> URL: https://issues.apache.org/jira/browse/SPARK-40387
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Priority: Major
>
> Spark Decimal always uses ne first, but eq is better.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40387) Improve the implementation of Spark Decimal

2022-09-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601671#comment-17601671
 ] 

Apache Spark commented on SPARK-40387:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/37830

> Improve the implementation of Spark Decimal
> ---
>
> Key: SPARK-40387
> URL: https://issues.apache.org/jira/browse/SPARK-40387
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Priority: Major
>
> Spark Decimal always uses ne first, but eq is better.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40387) Improve the implementation of Spark Decimal

2022-09-08 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-40387:
--

 Summary: Improve the implementation of Spark Decimal
 Key: SPARK-40387
 URL: https://issues.apache.org/jira/browse/SPARK-40387
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0
Reporter: jiaan.geng


Spark Decimal always uses ne first, but eq is better.
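
As a purely illustrative note on what "ne first" versus "eq" means here (generic
Scala, not Spark's actual Decimal code): eq and ne are reference (in)equality on
AnyRef, the two forms below are equivalent, and one plausible reading of the
sentence above is that it is about which branch the null check puts first:
{code:scala}
// `v` stands in for an internal, possibly-null java.math.BigDecimal field.
def usingNe(v: java.math.BigDecimal): String =
  if (v ne null) "backed by a BigDecimal" else "backed by a compact long"

def usingEq(v: java.math.BigDecimal): String =
  if (v eq null) "backed by a compact long" else "backed by a BigDecimal"
{code}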



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40386) Implement `ddof` in `DataFrame.cov`

2022-09-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40386:


Assignee: (was: Apache Spark)

> Implement `ddof` in `DataFrame.cov`
> ---
>
> Key: SPARK-40386
> URL: https://issues.apache.org/jira/browse/SPARK-40386
> Project: Spark
>  Issue Type: Sub-task
>  Components: ps, SQL
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40386) Implement `ddof` in `DataFrame.cov`

2022-09-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601651#comment-17601651
 ] 

Apache Spark commented on SPARK-40386:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/37829

> Implement `ddof` in `DataFrame.cov`
> ---
>
> Key: SPARK-40386
> URL: https://issues.apache.org/jira/browse/SPARK-40386
> Project: Spark
>  Issue Type: Sub-task
>  Components: ps, SQL
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40386) Implement `ddof` in `DataFrame.cov`

2022-09-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40386:


Assignee: Apache Spark

> Implement `ddof` in `DataFrame.cov`
> ---
>
> Key: SPARK-40386
> URL: https://issues.apache.org/jira/browse/SPARK-40386
> Project: Spark
>  Issue Type: Sub-task
>  Components: ps, SQL
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40386) Implement `ddof` in `DataFrame.cov`

2022-09-08 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-40386:
-

 Summary: Implement `ddof` in `DataFrame.cov`
 Key: SPARK-40386
 URL: https://issues.apache.org/jira/browse/SPARK-40386
 Project: Spark
  Issue Type: Sub-task
  Components: ps, SQL
Affects Versions: 3.4.0
Reporter: Ruifeng Zheng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org