[jira] [Updated] (SPARK-46263) Clean up unnecessary `SeqOps.view` and `ArrayOps.view` conversions.

2023-12-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46263:
---
Labels: pull-request-available  (was: )

> Clean up unnecessary `SeqOps.view` and `ArrayOps.view` conversions.
> ---
>
> Key: SPARK-46263
> URL: https://issues.apache.org/jira/browse/SPARK-46263
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib, Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>
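The ticket body is empty. As a hypothetical illustration (not taken from the actual PR), the pattern being cleaned up is a `.view` that is immediately materialized again, so the lazy view buys nothing:

{code:scala}
val xs = Array(1, 2, 3)

// Before: the view is created and then materialized straight away.
val before: Array[Int] = xs.view.map(_ * 2).toArray

// After: map on the array directly; same result, one conversion fewer.
val after: Array[Int] = xs.map(_ * 2)
{code}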







[jira] [Created] (SPARK-46263) Clean up unnecessary `SeqOps.view` and `ArrayOps.view` conversions.

2023-12-04 Thread Yang Jie (Jira)
Yang Jie created SPARK-46263:


 Summary: Clean up unnecessary `SeqOps.view` and `ArrayOps.view` 
conversions.
 Key: SPARK-46263
 URL: https://issues.apache.org/jira/browse/SPARK-46263
 Project: Spark
  Issue Type: Improvement
  Components: MLlib, Spark Core, SQL
Affects Versions: 4.0.0
Reporter: Yang Jie









[jira] [Updated] (SPARK-46262) Support `np.left_shift` for Pandas-on-Spark object.

2023-12-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46262:
---
Labels: pull-request-available  (was: )

> Support `np.left_shift` for Pandas-on-Spark object.
> ---
>
> Key: SPARK-46262
> URL: https://issues.apache.org/jira/browse/SPARK-46262
> Project: Spark
>  Issue Type: Bug
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>
> Now that we support PyArrow>=4.0.0, we can enable the test for `np.left_shift`.






[jira] [Created] (SPARK-46262) Support `np.left_shift` for Pandas-on-Spark object.

2023-12-04 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-46262:
---

 Summary: Support `np.left_shift` for Pandas-on-Spark object.
 Key: SPARK-46262
 URL: https://issues.apache.org/jira/browse/SPARK-46262
 Project: Spark
  Issue Type: Bug
  Components: Pandas API on Spark
Affects Versions: 4.0.0
Reporter: Haejoon Lee


Now that we support PyArrow>=4.0.0, we can enable the test for `np.left_shift`.






[jira] [Resolved] (SPARK-46249) Fix state store metrics access after commit

2023-12-04 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim resolved SPARK-46249.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44165
[https://github.com/apache/spark/pull/44165]

> Fix state store metrics access after commit
> ---
>
> Key: SPARK-46249
> URL: https://issues.apache.org/jira/browse/SPARK-46249
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Anish Shrigondekar
>Assignee: Anish Shrigondekar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Fix state store metrics access after commit






[jira] [Assigned] (SPARK-46249) Fix state store metrics access after commit

2023-12-04 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim reassigned SPARK-46249:


Assignee: Anish Shrigondekar

> Fix state store metrics access after commit
> ---
>
> Key: SPARK-46249
> URL: https://issues.apache.org/jira/browse/SPARK-46249
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Anish Shrigondekar
>Assignee: Anish Shrigondekar
>Priority: Major
>  Labels: pull-request-available
>
> Fix state store metrics access after commit






[jira] [Updated] (SPARK-46260) DataFrame.withColumnsRenamed should respect the dict ordering

2023-12-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46260:
---
Labels: pull-request-available  (was: )

> DataFrame.withColumnsRenamed should respect the dict ordering
> -
>
> Key: SPARK-46260
> URL: https://issues.apache.org/jira/browse/SPARK-46260
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>
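The ticket body is empty, but the title implies that the rename order must be observable. A hedged sketch of why ordering matters, using the analogous Scala `Map` overload (column names are hypothetical; the ticket itself concerns the Python dict form):

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq(1).toDF("a")

// Applied in entry order, "a" first becomes "b" and that "b" then
// becomes "c", leaving a single column "c". Applied as an unordered
// set over the original names, only "a" -> "b" matches and the
// result is "b". Respecting dict ordering makes the outcome defined.
df.withColumnsRenamed(Map("a" -> "b", "b" -> "c")).printSchema()
{code}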







[jira] [Updated] (SPARK-46257) Upgrade Derby to 10.16.1.1

2023-12-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-46257:
--
Parent: SPARK-44111
Issue Type: Sub-task  (was: Improvement)

> Upgrade Derby to 10.16.1.1
> --
>
> Key: SPARK-46257
> URL: https://issues.apache.org/jira/browse/SPARK-46257
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>
> https://db.apache.org/derby/releases/release-10_16_1_1.cgi






[jira] [Created] (SPARK-46261) Python Client DataFrame.withColumnsRenamed should respect the dict ordering

2023-12-04 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-46261:
-

 Summary: Python Client DataFrame.withColumnsRenamed should respect 
the dict ordering
 Key: SPARK-46261
 URL: https://issues.apache.org/jira/browse/SPARK-46261
 Project: Spark
  Issue Type: Improvement
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Created] (SPARK-46260) DataFrame.withColumnsRenamed should respect the dict ordering

2023-12-04 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-46260:
-

 Summary: DataFrame.withColumnsRenamed should respect the dict 
ordering
 Key: SPARK-46260
 URL: https://issues.apache.org/jira/browse/SPARK-46260
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Assigned] (SPARK-46258) Add RocksDBPersistenceEngine

2023-12-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-46258:
-

Assignee: Dongjoon Hyun

> Add RocksDBPersistenceEngine
> 
>
> Key: SPARK-46258
> URL: https://issues.apache.org/jira/browse/SPARK-46258
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-46001) Spark UI Test Improvements

2023-12-04 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao updated SPARK-46001:
-
Shepherd: Dongjoon Hyun

> Spark UI Test Improvements
> --
>
> Key: SPARK-46001
> URL: https://issues.apache.org/jira/browse/SPARK-46001
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL, Tests, UI
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Priority: Major
>
> Spark UI tests are not supported; this makes the UI hard for developers to
> test and for its owners to maintain.






[jira] [Updated] (SPARK-46259) Add appropriate link for error class usage documentation.

2023-12-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46259:
---
Labels: pull-request-available  (was: )

> Add appropriate link for error class usage documentation.
> -
>
> Key: SPARK-46259
> URL: https://issues.apache.org/jira/browse/SPARK-46259
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>
> We don't have an appropriate link for the error class usage documentation.






[jira] [Updated] (SPARK-46257) Upgrade Derby to 10.16.1.1

2023-12-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46257:
---
Labels: pull-request-available  (was: )

> Upgrade Derby to 10.16.1.1
> --
>
> Key: SPARK-46257
> URL: https://issues.apache.org/jira/browse/SPARK-46257
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>
> https://db.apache.org/derby/releases/release-10_16_1_1.cgi






[jira] [Created] (SPARK-46259) Add appropriate link for error class usage documentation.

2023-12-04 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-46259:
---

 Summary: Add appropriate link for error class usage documentation.
 Key: SPARK-46259
 URL: https://issues.apache.org/jira/browse/SPARK-46259
 Project: Spark
  Issue Type: Bug
  Components: Documentation, PySpark
Affects Versions: 4.0.0
Reporter: Haejoon Lee


We don't have an appropriate link for the error class usage documentation.






[jira] [Updated] (SPARK-46258) Add RocksDBPersistenceEngine

2023-12-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46258:
---
Labels: pull-request-available  (was: )

> Add RocksDBPersistenceEngine
> 
>
> Key: SPARK-46258
> URL: https://issues.apache.org/jira/browse/SPARK-46258
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-46258) Add RocksDBPersistenceEngine

2023-12-04 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-46258:
-

 Summary: Add RocksDBPersistenceEngine
 Key: SPARK-46258
 URL: https://issues.apache.org/jira/browse/SPARK-46258
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun









[jira] [Updated] (SPARK-46257) Upgrade Derby to 10.16.1.1

2023-12-04 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-46257:
-
Summary: Upgrade Derby to 10.16.1.1  (was: Upgrade Derby to 10.17.1.0)

> Upgrade Derby to 10.16.1.1
> --
>
> Key: SPARK-46257
> URL: https://issues.apache.org/jira/browse/SPARK-46257
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>
> https://db.apache.org/derby/releases/release-10_17_1_0.cgi#New+Features






[jira] [Updated] (SPARK-46256) Parallel Compression Support for ZSTD

2023-12-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46256:
---
Labels: pull-request-available  (was: )

> Parallel Compression Support for ZSTD
> -
>
> Key: SPARK-46256
> URL: https://issues.apache.org/jira/browse/SPARK-46256
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Priority: Major
>  Labels: pull-request-available
>
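The ticket body is empty. For context only: zstd supports multi-threaded compression via internal worker threads. With the zstd-jni library that Spark already bundles, that looks roughly like the sketch below (assuming zstd-jni's `setLevel`/`setWorkers` setters; this is not necessarily how the Spark PR wires it up):

{code:scala}
import java.io.{BufferedOutputStream, FileOutputStream}
import com.github.luben.zstd.ZstdOutputStream

// Sketch: let zstd compress with 4 worker threads so a large shuffle
// or spill stream is compressed in parallel with the writing thread.
val out = new ZstdOutputStream(
  new BufferedOutputStream(new FileOutputStream("/tmp/example.zst")))
out.setLevel(3)    // compression level (assumption: zstd-jni setter)
out.setWorkers(4)  // assumption: maps to zstd's ZSTD_c_nbWorkers
out.write("example payload".getBytes("UTF-8"))
out.close()
{code}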







[jira] [Created] (SPARK-46257) Upgrade Derby to 10.17.1.0

2023-12-04 Thread Yang Jie (Jira)
Yang Jie created SPARK-46257:


 Summary: Upgrade Derby to 10.17.1.0
 Key: SPARK-46257
 URL: https://issues.apache.org/jira/browse/SPARK-46257
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 4.0.0
Reporter: Yang Jie


https://db.apache.org/derby/releases/release-10_17_1_0.cgi#New+Features






[jira] [Created] (SPARK-46256) Parallel Compression Support for ZSTD

2023-12-04 Thread Kent Yao (Jira)
Kent Yao created SPARK-46256:


 Summary: Parallel Compression Support for ZSTD
 Key: SPARK-46256
 URL: https://issues.apache.org/jira/browse/SPARK-46256
 Project: Spark
  Issue Type: Improvement
  Components: Shuffle, Spark Core
Affects Versions: 4.0.0
Reporter: Kent Yao









[jira] [Resolved] (SPARK-46254) Remove stale Python 3.8/3.7 version checks

2023-12-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46254.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44169
[https://github.com/apache/spark/pull/44169]

> Remove stale Python 3.8/3.7 version checks
> -
>
> Key: SPARK-46254
> URL: https://issues.apache.org/jira/browse/SPARK-46254
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> See PR linked. We dropped Python 3.7 and the lowest supported version is 
> Python 3.8, so we can remove all stale checks.






[jira] [Created] (SPARK-46255) Support complex type -> string conversion

2023-12-04 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-46255:
-

 Summary: Support complex type -> string conversion
 Key: SPARK-46255
 URL: https://issues.apache.org/jira/browse/SPARK-46255
 Project: Spark
  Issue Type: Improvement
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Updated] (SPARK-46253) Plan Python data source read using mapInArrow

2023-12-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46253:
---
Labels: pull-request-available  (was: )

> Plan Python data source read using mapInArrow
> -
>
> Key: SPARK-46253
> URL: https://issues.apache.org/jira/browse/SPARK-46253
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Priority: Major
>  Labels: pull-request-available
>
> Instead of using a regular Python UDTF, we can use an Arrow UDF and plan the
> data source read using the mapInArrow operator.






[jira] [Assigned] (SPARK-46043) Support create table using DSv2 sources

2023-12-04 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-46043:
---

Assignee: Allison Wang

> Support create table using DSv2 sources
> ---
>
> Key: SPARK-46043
> URL: https://issues.apache.org/jira/browse/SPARK-46043
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>  Labels: pull-request-available
>
> Support CREATE TABLE ... USING DSv2 sources.
>  
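For illustration, the statement shape this enables (the source class and schema below are hypothetical):

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// "com.example.MyDataSourceV2" stands in for any DSv2 implementation
// on the classpath; table name, schema and options are illustrative.
spark.sql("""
  CREATE TABLE events (id BIGINT, ts TIMESTAMP)
  USING com.example.MyDataSourceV2
  OPTIONS (path '/tmp/events')
""")
{code}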






[jira] [Resolved] (SPARK-46043) Support create table using DSv2 sources

2023-12-04 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-46043.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43949
[https://github.com/apache/spark/pull/43949]

> Support create table using DSv2 sources
> ---
>
> Key: SPARK-46043
> URL: https://issues.apache.org/jira/browse/SPARK-46043
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Support CREATE TABLE ... USING DSv2 sources.
>  






[jira] [Reopened] (SPARK-46213) Add PySparkImportError for error framework

2023-12-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reopened SPARK-46213:
--
  Assignee: (was: Haejoon Lee)

Reverted at 
https://github.com/apache/spark/commit/7f59565b9fc19c496bc7600e168650e7663c0065

> Add PySparkImportError for error framework
> --
>
> Key: SPARK-46213
> URL: https://issues.apache.org/jira/browse/SPARK-46213
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Add PySparkImportError to the error framework for wrapping ImportError






[jira] [Updated] (SPARK-46213) Add PySparkImportError for error framework

2023-12-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-46213:
-
Fix Version/s: (was: 4.0.0)

> Add PySparkImportError for error framework
> --
>
> Key: SPARK-46213
> URL: https://issues.apache.org/jira/browse/SPARK-46213
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>
> Add PySparkImportError to the error framework for wrapping ImportError






[jira] [Resolved] (SPARK-46009) Merge the parse rule of PercentileCont and PercentileDisc into functionCall

2023-12-04 Thread Jiaan Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiaan Geng resolved SPARK-46009.

Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43910
[https://github.com/apache/spark/pull/43910]

> Merge the parse rule of PercentileCont and PercentileDisc into functionCall
> ---
>
> Key: SPARK-46009
> URL: https://issues.apache.org/jira/browse/SPARK-46009
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Jiaan Geng
>Assignee: Jiaan Geng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> The Spark SQL parser has a special rule to parse
> [percentile_cont|percentile_disc](percentage) WITHIN GROUP (ORDER BY v).
> We should merge this rule into the functionCall rule.
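For reference, the WITHIN GROUP syntax the special rule handles (table and column names are illustrative):

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

Seq(1.0, 2.0, 10.0).toDF("v").createOrReplaceTempView("t")

// Continuous and discrete percentiles via the WITHIN GROUP form:
spark.sql(
  "SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY v), " +
  "       percentile_disc(0.5) WITHIN GROUP (ORDER BY v) FROM t"
).show()
{code}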






[jira] [Updated] (SPARK-46254) Remove stale Python 3.8/3.7 version checks

2023-12-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46254:
---
Labels: pull-request-available  (was: )

> Remove stale Python 3.8/3.7 version checks
> -
>
> Key: SPARK-46254
> URL: https://issues.apache.org/jira/browse/SPARK-46254
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> See PR linked. We dropped Python 3.7 and the lowest supported version is 
> Python 3.8, so we can remove all stale checks.






[jira] [Updated] (SPARK-46254) Remove stale Python 3.8/3.7 version checks

2023-12-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-46254:
-
Issue Type: Improvement  (was: New Feature)

> Remove stale Python 3.8/3.7 version checks
> -
>
> Key: SPARK-46254
> URL: https://issues.apache.org/jira/browse/SPARK-46254
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> See PR linked.






[jira] [Updated] (SPARK-46254) Remove stale Python 3.8/3.7 version checks

2023-12-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-46254:
-
Description: See PR linked. We dropped Python 3.7 and the lowest supported 
version is Python 3.8, so we can remove all stale checks.  (was: See PR linked.)

> Remove stale Python 3.8/3.7 version checks
> -
>
> Key: SPARK-46254
> URL: https://issues.apache.org/jira/browse/SPARK-46254
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> See PR linked. We dropped Python 3.7 and the lowest supported version is 
> Python 3.8, so we can remove all stale checks.






[jira] [Created] (SPARK-46254) Remove stale Python 3.8/3.7 version checks

2023-12-04 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46254:


 Summary: Remove stale Python 3.8/3.7 version checks
 Key: SPARK-46254
 URL: https://issues.apache.org/jira/browse/SPARK-46254
 Project: Spark
  Issue Type: New Feature
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


See PR linked.






[jira] [Created] (SPARK-46253) Plan Python data source read using mapInArrow

2023-12-04 Thread Allison Wang (Jira)
Allison Wang created SPARK-46253:


 Summary: Plan Python data source read using mapInArrow
 Key: SPARK-46253
 URL: https://issues.apache.org/jira/browse/SPARK-46253
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Allison Wang


Instead of using a regular Python UDTF, we can use an Arrow UDF and plan the
data source read using the mapInArrow operator.






[jira] [Resolved] (SPARK-46040) Update API for 'analyze' partitioning/ordering columns to support general expressions

2023-12-04 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-46040.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43946
[https://github.com/apache/spark/pull/43946]

> Update API for 'analyze' partitioning/ordering columns to support general 
> expressions
> -
>
> Key: SPARK-46040
> URL: https://issues.apache.org/jira/browse/SPARK-46040
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 4.0.0
>Reporter: Daniel
>Assignee: Daniel
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-46233) Migrate all remaining AttributeError into PySpark error framework

2023-12-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-46233:
-

Assignee: Haejoon Lee

> Migrate all remaining AttributeError into PySpark error framework
> -
>
> Key: SPARK-46233
> URL: https://issues.apache.org/jira/browse/SPARK-46233
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-46233) Migrate all remaining AttributeError into PySpark error framework

2023-12-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-46233.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44150
[https://github.com/apache/spark/pull/44150]

> Migrate all remaining AttributeError into PySpark error framework
> -
>
> Key: SPARK-46233
> URL: https://issues.apache.org/jira/browse/SPARK-46233
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-46250) Deflake test_parity_listener

2023-12-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46250:


Assignee: Wei Liu

> Deflake test_parity_listener
> 
>
> Key: SPARK-46250
> URL: https://issues.apache.org/jira/browse/SPARK-46250
> Project: Spark
>  Issue Type: Task
>  Components: Connect, SS
>Affects Versions: 4.0.0
>Reporter: Wei Liu
>Assignee: Wei Liu
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-46250) Deflake test_parity_listener

2023-12-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46250.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44166
[https://github.com/apache/spark/pull/44166]

> Deflake test_parity_listener
> 
>
> Key: SPARK-46250
> URL: https://issues.apache.org/jira/browse/SPARK-46250
> Project: Spark
>  Issue Type: Task
>  Components: Connect, SS
>Affects Versions: 4.0.0
>Reporter: Wei Liu
>Assignee: Wei Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-46243) Describe arguments of decode()

2023-12-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46243.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44157
[https://github.com/apache/spark/pull/44157]

> Describe arguments of decode()
> --
>
> Key: SPARK-46243
> URL: https://issues.apache.org/jira/browse/SPARK-46243
> Project: Spark
>  Issue Type: Documentation
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Update the description of the `StringDecode` expression, and with it the
> `decode()` function, by describing the arguments `bin` and `charset`. The
> ticket aims to improve the user experience with Spark SQL by documenting the
> public function.
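A quick sketch of the two arguments being documented, `bin` (the binary value) and `charset` (the character set used to decode it):

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// decode(bin, charset): decode binary `bin` to a string using `charset`.
// encode() produces the binary input here, so the round trip yields 'Spark SQL'.
spark.sql("SELECT decode(encode('Spark SQL', 'UTF-8'), 'UTF-8') AS s").show()
{code}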






[jira] [Created] (SPARK-46252) Improve test coverage of memory_profiler.py

2023-12-04 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-46252:


 Summary: Improve test coverage of memory_profiler.py
 Key: SPARK-46252
 URL: https://issues.apache.org/jira/browse/SPARK-46252
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark, Tests
Affects Versions: 4.0.0
Reporter: Xinrong Meng









[jira] [Updated] (SPARK-46239) Hide Jetty info

2023-12-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-46239:
--
Affects Version/s: 3.4.1
   3.3.2
   3.2.4
   3.1.3
   3.0.3

> Hide Jetty info 
> 
>
> Key: SPARK-46239
> URL: https://issues.apache.org/jira/browse/SPARK-46239
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.3, 3.1.3, 3.2.4, 3.3.2, 3.4.1, 3.5.0
>Reporter: chenyu
>Assignee: chenyu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0, 3.5.1, 3.3.4, 3.4.3
>
> Attachments: screenshot-1.png
>
>
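The ticket body is empty; in Jetty's own API, suppressing server-version disclosure is typically done through `HttpConfiguration`, roughly as below (a generic Jetty sketch, not necessarily the exact Spark patch):

{code:scala}
import org.eclipse.jetty.server.{HttpConfiguration, HttpConnectionFactory, Server, ServerConnector}

// Generic Jetty setup with version disclosure turned off.
val httpConfig = new HttpConfiguration()
httpConfig.setSendServerVersion(false) // no "Server: Jetty(x.y.z)" header
httpConfig.setSendXPoweredBy(false)    // no "X-Powered-By" header

val server = new Server()
val connector = new ServerConnector(server, new HttpConnectionFactory(httpConfig))
connector.setPort(8080)
server.addConnector(connector)
server.start()
{code}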







[jira] [Resolved] (SPARK-46239) Spark jetty exposes version information

2023-12-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-46239.
---
Fix Version/s: 3.3.4
   3.4.3
   3.5.1
   4.0.0
   Resolution: Fixed

Issue resolved by pull request 44158
[https://github.com/apache/spark/pull/44158]

> Spark jetty exposes version information
> ---
>
> Key: SPARK-46239
> URL: https://issues.apache.org/jira/browse/SPARK-46239
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: chenyu
>Assignee: chenyu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.4, 3.4.3, 3.5.1, 4.0.0
>
> Attachments: screenshot-1.png
>
>







[jira] [Assigned] (SPARK-46239) Spark jetty exposes version information

2023-12-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-46239:
-

Assignee: chenyu

> Spark jetty exposes version information
> ---
>
> Key: SPARK-46239
> URL: https://issues.apache.org/jira/browse/SPARK-46239
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: chenyu
>Assignee: chenyu
>Priority: Major
>  Labels: pull-request-available
> Attachments: screenshot-1.png
>
>







[jira] [Updated] (SPARK-46239) Hide Jetty info

2023-12-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-46239:
--
Summary: Hide Jetty info   (was: Spark jetty exposes version information)

> Hide Jetty info 
> 
>
> Key: SPARK-46239
> URL: https://issues.apache.org/jira/browse/SPARK-46239
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: chenyu
>Assignee: chenyu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0, 3.5.1, 3.3.4, 3.4.3
>
> Attachments: screenshot-1.png
>
>







[jira] [Updated] (SPARK-46251) Spark 3.3.3 tuple encoders built using `Encoders.tuple` do not correctly cast null into None for Option values

2023-12-04 Thread Will Boulter (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Boulter updated SPARK-46251:
-
Description: 
In Spark {{3.3.2}} encoders created using {{Encoders.tuple(encoder1, encoder2, 
..)}} correctly handle casting {{null}} into {{None}} when the target type is 
an Option.

In Spark {{3.3.3}}, this behaviour has changed and the Option value comes 
through as {{null}} which is likely to cause a {{NullPointerException}} for 
most Scala code that operates on the Option. The change seems to be related to 
the following commit:

[https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a]

I have made a reproduction with a couple of examples in a public Github repo 
here:

[https://github.com/q-willboulter/spark-tuple-encoders-bug] 

The common use case where this is likely to be encountered is while doing any 
joins that can return null, e.g. left or outer joins. When casting the result 
of a left join it is sensible to wrap the right-hand side in an Option to 
handle the case where there is no match. Since 3.3.3 this would fail if the 
encoder is derived manually using {{Encoders.tuple(leftEncoder, rightEncoder).}}

If the entire tuple encoder {{Encoder[(Left, Option[Right]])}} is derived at 
once using reflection, the encoder works as expected. The bug appears to be in 
the following function inside {{ExpressionEncoder.scala}}
{code:java}
def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = ...{code}
 

  was:
In Spark {{3.3.2}} encoders created using {{Encoders.tuple(encoder1, encoder2, 
..)}} correctly handle casting {{null}} into {{None}} when the target type 
is an Option.

In Spark {{3.3.3}}, this behaviour has changed and the Option value comes 
through as {{null}} which is likely to cause a {{NullPointerException}} for 
most Scala code that operates on the Option. The change seems to be related to 
the following commit:

[https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a]

I have made a reproduction with a couple of examples in a public Github repo 
here:

[https://github.com/q-willboulter/spark-tuple-encoders-bug] 

The common use case where this is likely to be encountered is while doing any 
joins that can return null, e.g. left or outer joins. When casting the result 
of a left join it is sensible to wrap the right-hand side in an Option to 
handle the case where there is no match. Since 3.3.3 this would fail if the 
encoder is derived manually using {{Encoders.tuple(leftEncoder, rightEncoder).}}

If the entire tuple encoder {{Encoder[(Left, Option[Right]])}} is derived at 
once using reflection, the encoder works as expected. The bug appears to be in 
the following function inside {{ExpressionEncoder.scala}}
{code:java}
def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = ...{code}
 


> Spark 3.3.3 tuple encoders built using `Encoders.tuple` do not correctly cast 
> null into None for Option values
> --
>
> Key: SPARK-46251
> URL: https://issues.apache.org/jira/browse/SPARK-46251
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.3, 3.4.2, 3.4.0, 3.4.1, 3.5.0
>Reporter: Will Boulter
>Priority: Major
>
> In Spark {{3.3.2}} encoders created using {{Encoders.tuple(encoder1, 
> encoder2, ..)}} correctly handle casting {{null}} into {{None}} when the 
> target type is an Option.
> In Spark {{3.3.3}}, this behaviour has changed and the Option value comes 
> through as {{null}} which is likely to cause a {{NullPointerException}} for 
> most Scala code that operates on the Option. The change seems to be related 
> to the following commit:
> [https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a]
> I have made a reproduction with a couple of examples in a public Github repo 
> here:
> [https://github.com/q-willboulter/spark-tuple-encoders-bug] 
> The common use case where this is likely to be encountered is while doing any 
> joins that can return null, e.g. left or outer joins. When casting the result 
> of a left join it is sensible to wrap the right-hand side in an Option to 
> handle the case where there is no match. Since 3.3.3 this would fail if the 
> encoder is derived manually using {{Encoders.tuple(leftEncoder, 
> rightEncoder).}}
> If the entire tuple encoder {{Encoder[(Left, Option[Right]])}} is derived at 
> once using reflection, the encoder works as expected. The bug appears to be 
> in the following function inside {{ExpressionEncoder.scala}}
> {code:java}
> def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = 
> ...{code}
>  





[jira] [Updated] (SPARK-46251) Spark 3.3.3 tuple encoders built using `Encoders.tuple` do not correctly cast null into None for Option values

2023-12-04 Thread Will Boulter (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Boulter updated SPARK-46251:
-
Description: 
In Spark {{3.3.2}} encoders created using {{Encoders.tuple(encoder1, encoder2, 
..)}} correctly handle casting {{null}} into {{None}} when the target type 
is an Option.

In Spark {{3.3.3}}, this behaviour has changed and the Option value comes 
through as {{null}} which is likely to cause a {{NullPointerException}} for 
most Scala code that operates on the Option. The change seems to be related to 
the following commit:

[https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a]

I have made a reproduction with a couple of examples in a public Github repo 
here:

[https://github.com/q-willboulter/spark-tuple-encoders-bug] 

The common use case where this is likely to be encountered is while doing any 
joins that can return null, e.g. left or outer joins. When casting the result 
of a left join it is sensible to wrap the right-hand side in an Option to 
handle the case where there is no match. Since 3.3.3 this would fail if the 
encoder is derived manually using {{Encoders.tuple(leftEncoder, rightEncoder).}}

If the entire tuple encoder {{Encoder[(Left, Option[Right]])}} is derived at 
once using reflection, the encoder works as expected. The bug appears to be in 
the following function inside {{ExpressionEncoder.scala}}
{code:java}
def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = ...{code}
 

  was:
In Spark {{3.3.2}} encoders created using {{Encoders.tuple(encoder1, encoder2, 
..)}} correctly handle casting null into None when the target 
type is an Option.

In Spark {{3.3.3}}, this behaviour has changed and the Option value comes 
through as {{null}} which is likely to cause a {{NullPointerException}} for 
most Scala code that operates on the Option. The change seems to be related to 
the following commit:

[https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a]

I have made a reproduction with a couple of examples in a public Github repo 
here:

[https://github.com/q-willboulter/spark-tuple-encoders-bug] 

The common use case where this is likely to be encountered is while doing any 
joins that can return null, e.g. left or outer joins. When casting the result 
of a left join it is sensible to wrap the right-hand side in an Option to 
handle the case where there is no match. Since 3.3.3 this would fail if the 
encoder is derived manually using {{Encoders.tuple(leftEncoder, rightEncoder).}}

If the entire tuple encoder {{Encoder[(Left, Option[Right]])}} is derived at 
once using reflection, the encoder works as expected. The bug appears to be in 
the following function inside {{ExpressionEncoder.scala}}
{code:java}
def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = ...{code}
 


> Spark 3.3.3 tuple encoders built using `Encoders.tuple` do not correctly cast 
> null into None for Option values
> --
>
> Key: SPARK-46251
> URL: https://issues.apache.org/jira/browse/SPARK-46251
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.3, 3.4.2, 3.4.0, 3.4.1, 3.5.0
>Reporter: Will Boulter
>Priority: Major
>
> In Spark {{3.3.2}} encoders created using {{Encoders.tuple(encoder1, 
> encoder2, ..)}} correctly handle casting {{null}} into {{None}} when the 
> target type is an Option.
> In Spark {{3.3.3}}, this behaviour has changed and the Option value comes 
> through as {{null}} which is likely to cause a {{NullPointerException}} for 
> most Scala code that operates on the Option. The change seems to be related 
> to the following commit:
> [https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a]
> I have made a reproduction with a couple of examples in a public Github repo 
> here:
> [https://github.com/q-willboulter/spark-tuple-encoders-bug] 
> The common use case where this is likely to be encountered is while doing any 
> joins that can return null, e.g. left or outer joins. When casting the result 
> of a left join it is sensible to wrap the right-hand side in an Option to 
> handle the case where there is no match. Since 3.3.3 this would fail if the 
> encoder is derived manually using {{Encoders.tuple(leftEncoder, 
> rightEncoder).}}
> If the entire tuple encoder {{Encoder[(Left, Option[Right]])}} is derived at 
> once using reflection, the encoder works as expected. The bug appears to be 
> in the following function inside {{ExpressionEncoder.scala}}
> {code:java}
> def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = 
> ...{code}
>  




[jira] [Updated] (SPARK-46251) Spark 3.3.3 tuple encoders built using Encoders.tuple do not correctly cast null into None for Option values

2023-12-04 Thread Will Boulter (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Boulter updated SPARK-46251:
-
Summary: Spark 3.3.3 tuple encoders built using Encoders.tuple do not 
correctly cast null into None for Option values  (was: Spark 3.3.3 tuple 
encoders built using `Encoders.tuple` do not correctly cast null into None for 
Option values)

> Spark 3.3.3 tuple encoders built using Encoders.tuple do not correctly cast 
> null into None for Option values
> 
>
> Key: SPARK-46251
> URL: https://issues.apache.org/jira/browse/SPARK-46251
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.3, 3.4.2, 3.4.0, 3.4.1, 3.5.0
>Reporter: Will Boulter
>Priority: Major
>
> In Spark {{3.3.2}} encoders created using {{Encoders.tuple(encoder1, 
> encoder2, ..)}} correctly handle casting {{null}} into {{None}} when the 
> target type is an Option. 
> In Spark {{3.3.3}}, this behaviour has changed and the Option value comes 
> through as {{null}} which is likely to cause a {{NullPointerException}} for 
> most Scala code that operates on the Option. The change seems to be related 
> to the following commit:
> [https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a]
> I have made a reproduction with a couple of examples in a public Github repo 
> here:
> [https://github.com/q-willboulter/spark-tuple-encoders-bug] 
> The common use case where this is likely to be encountered is while doing any 
> joins that can return null, e.g. left or outer joins. When casting the result 
> of a left join it is sensible to wrap the right-hand side in an Option to 
> handle the case where there is no match. Since 3.3.3 this would fail if the 
> encoder is derived manually using {{Encoders.tuple(leftEncoder, 
> rightEncoder).}}
> If the entire tuple encoder {{Encoder[(Left, Option[Right]])}} is derived at 
> once using reflection, the encoder works as expected. The bug appears to be 
> in the following function inside {{ExpressionEncoder.scala}}
> {code:java}
> def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = 
> ...{code}
>  






[jira] [Updated] (SPARK-46251) Spark 3.3.3 tuple encoders built using `Encoders.tuple` do not correctly cast null into None for Option values

2023-12-04 Thread Will Boulter (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Boulter updated SPARK-46251:
-
Description: 
In Spark {{3.3.2}} encoders created using {{Encoders.tuple(encoder1, encoder2, 
..)}} correctly handle casting {{null}} into {{None}} when the target type is 
an Option. 

In Spark {{3.3.3}}, this behaviour has changed and the Option value comes 
through as {{null}} which is likely to cause a {{NullPointerException}} for 
most Scala code that operates on the Option. The change seems to be related to 
the following commit:

[https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a]

I have made a reproduction with a couple of examples in a public Github repo 
here:

[https://github.com/q-willboulter/spark-tuple-encoders-bug] 

The common use case where this is likely to be encountered is while doing any 
joins that can return null, e.g. left or outer joins. When casting the result 
of a left join it is sensible to wrap the right-hand side in an Option to 
handle the case where there is no match. Since 3.3.3 this would fail if the 
encoder is derived manually using {{Encoders.tuple(leftEncoder, rightEncoder).}}

If the entire tuple encoder {{Encoder[(Left, Option[Right]])}} is derived at 
once using reflection, the encoder works as expected. The bug appears to be in 
the following function inside {{ExpressionEncoder.scala}}
{code:java}
def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = ...{code}
 

  was:
In Spark {{3.3.2}} encoders created using {{Encoders.tuple(encoder1, encoder2, 
..)}} correctly handle casting {{null}} into {{None}} when the target type is 
an Option.

In Spark {{3.3.3}}, this behaviour has changed and the Option value comes 
through as {{null}} which is likely to cause a {{NullPointerException}} for 
most Scala code that operates on the Option. The change seems to be related to 
the following commit:

[https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a]

I have made a reproduction with a couple of examples in a public Github repo 
here:

[https://github.com/q-willboulter/spark-tuple-encoders-bug] 

The common use case where this is likely to be encountered is while doing any 
joins that can return null, e.g. left or outer joins. When casting the result 
of a left join it is sensible to wrap the right-hand side in an Option to 
handle the case where there is no match. Since 3.3.3 this would fail if the 
encoder is derived manually using {{Encoders.tuple(leftEncoder, rightEncoder).}}

If the entire tuple encoder {{Encoder[(Left, Option[Right]])}} is derived at 
once using reflection, the encoder works as expected. The bug appears to be in 
the following function inside {{ExpressionEncoder.scala}}
{code:java}
def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = ...{code}
 


> Spark 3.3.3 tuple encoders built using `Encoders.tuple` do not correctly cast 
> null into None for Option values
> --
>
> Key: SPARK-46251
> URL: https://issues.apache.org/jira/browse/SPARK-46251
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.3, 3.4.2, 3.4.0, 3.4.1, 3.5.0
>Reporter: Will Boulter
>Priority: Major
>
> In Spark {{3.3.2}} encoders created using {{Encoders.tuple(encoder1, 
> encoder2, ..)}} correctly handle casting {{null}} into {{None}} when the 
> target type is an Option. 
> In Spark {{3.3.3}}, this behaviour has changed and the Option value comes 
> through as {{null}} which is likely to cause a {{NullPointerException}} for 
> most Scala code that operates on the Option. The change seems to be related 
> to the following commit:
> [https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a]
> I have made a reproduction with a couple of examples in a public Github repo 
> here:
> [https://github.com/q-willboulter/spark-tuple-encoders-bug] 
> The common use case where this is likely to be encountered is while doing any 
> joins that can return null, e.g. left or outer joins. When casting the result 
> of a left join it is sensible to wrap the right-hand side in an Option to 
> handle the case where there is no match. Since 3.3.3 this would fail if the 
> encoder is derived manually using {{Encoders.tuple(leftEncoder, 
> rightEncoder).}}
> If the entire tuple encoder {{Encoder[(Left, Option[Right]])}} is derived at 
> once using reflection, the encoder works as expected. The bug appears to be 
> in the following function inside {{ExpressionEncoder.scala}}
> {code:java}
> def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = 
> ...{code}
>  
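To make the failing shape concrete, here is a minimal sketch assuming hypothetical case classes, with the component encoders derived via Spark's internal `ExpressionEncoder` (in the spirit of the linked repro, not copied from it):

{code:scala}
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}

case class User(id: Int, name: String)
case class Order(id: Int, item: String)

object TupleEncoderRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    val users  = Seq(User(1, "ann"), User(2, "bob")).toDS()
    val orders = Seq(Order(1, "book")).toDS()

    // Manually combined tuple encoder, as described in the report.
    val manual: Encoder[(User, Option[Order])] =
      Encoders.tuple(ExpressionEncoder[User](), ExpressionEncoder[Option[Order]]())

    // Left join: user 2 has no matching order, so the right side is null.
    val joined = users
      .joinWith(orders, users("id") === orders("id"), "left_outer")
      .as[(User, Option[Order])](manual)

    // On 3.3.2 the unmatched row decodes as (User(2,"bob"), None);
    // per the report, on 3.3.3 it comes through as (User(2,"bob"), null).
    joined.collect().foreach(println)
  }
}
{code}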





[jira] [Updated] (SPARK-46251) Spark 3.3.3 tuple encoders built using `Encoders.tuple` do not correctly cast null into None for Option values

2023-12-04 Thread Will Boulter (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Boulter updated SPARK-46251:
-
Description: 
In Spark {{3.3.2}} encoders created using {{Encoders.tuple(encoder1, encoder2, 
..)}} correctly handle casting null into None when the target 
type is an Option.

In Spark {{3.3.3}}, this behaviour has changed and the Option value comes 
through as {{null}} which is likely to cause a {{NullPointerException}} for 
most Scala code that operates on the Option. The change seems to be related to 
the following commit:

[https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a]

I have made a reproduction with a couple of examples in a public Github repo 
here:

[https://github.com/q-willboulter/spark-tuple-encoders-bug] 

The common use case where this is likely to be encountered is while doing any 
joins that can return null, e.g. left or outer joins. When casting the result 
of a left join it is sensible to wrap the right-hand side in an Option to 
handle the case where there is no match. Since 3.3.3 this would fail if the 
encoder is derived manually using {{Encoders.tuple(leftEncoder, rightEncoder).}}

If the entire tuple encoder {{Encoder[(Left, Option[Right]])}} is derived at 
once using reflection, the encoder works as expected. The bug appears to be in 
the following function inside {{ExpressionEncoder.scala}}
{code:java}
def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = ...{code}
 

  was:
In Spark {{3.3.2}} encoders created using {{Encoders.tuple(encoder1, encoder2, 
..)}} correctly handle casting {{null}} into {{None}} when the target type is 
an {{Option}}.

 

In Spark {{3.3.3}}, this behaviour has changed and the Option value comes 
through as {{null}} which is likely to cause a {{NullPointerException}} for 
most Scala code that operates on the Option. The change seems to be related to 
the following commit:

[https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a]

 

I have made a reproduction with a couple of examples in a public Github repo 
here:

[https://github.com/q-willboulter/spark-tuple-encoders-bug] 

 

The common use case where this is likely to be encountered is while doing any 
joins that can return null, e.g. left or outer joins. When casting the result 
of a left join it is sensible to wrap the right-hand side in an Option to 
handle the case where there is no match. Since 3.3.3 this would fail if the 
encoder is derived manually using {{Encoders.tuple(leftEncoder, rightEncoder).}}

If the entire tuple encoder {{Encoder[(Left, Option[Right]])}} is derived at 
once using reflection, the encoder works as expected. The bug appears to be in 
the following function inside {{ExpressionEncoder.scala}}

 
{code:java}
def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = ...{code}
 


> Spark 3.3.3 tuple encoders built using `Encoders.tuple` do not correctly cast 
> null into None for Option values
> --
>
> Key: SPARK-46251
> URL: https://issues.apache.org/jira/browse/SPARK-46251
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.3, 3.4.2, 3.4.0, 3.4.1, 3.5.0
>Reporter: Will Boulter
>Priority: Major
>
> In Spark {{3.3.2}}, encoders created using {{Encoders.tuple(encoder1, 
> encoder2, ..)}} correctly handle casting {{null}} into {{None}} when 
> the target type is an Option. 
> In Spark {{3.3.3}}, this behaviour has changed and the Option value comes 
> through as {{null}}, which is likely to cause a {{NullPointerException}} for 
> most Scala code that operates on the Option. The change seems to be related 
> to the following commit:
> [https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a]
> I have made a reproduction with a couple of examples in a public GitHub repo 
> here:
> [https://github.com/q-willboulter/spark-tuple-encoders-bug] 
> The common use case where this is likely to be encountered is any join that 
> can return null, e.g. a left or outer join. When casting the result of a left 
> join, it is sensible to wrap the right-hand side in an Option to handle the 
> case where there is no match. Since 3.3.3 this fails if the encoder is 
> derived manually using {{Encoders.tuple(leftEncoder, rightEncoder)}}.
> If the entire tuple encoder {{Encoder[(Left, Option[Right])]}} is derived at 
> once using reflection, the encoder works as expected. The bug appears to be 
> in the following function inside {{ExpressionEncoder.scala}}
> {code:java}
> def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = 
> ...{code}
>  



--
This message was sent by 

[jira] [Updated] (SPARK-46251) Spark 3.3.3 tuple encoders built using `Encoders.tuple` do not correctly cast null into None for Option values

2023-12-04 Thread Will Boulter (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Boulter updated SPARK-46251:
-
Description: 
In Spark {{3.3.2}} encoders created using {{Encoders.tuple(encoder1, encoder2, 
..)}} correctly handle casting {{null}} into {{None }}when the target type is 
an {{{}Option{}}}. 

 

In Spark {{{}3.3.3{}}}, this behaviour has changed and the Option value comes 
through as {{null}} which is likely to cause a {{NullPointerException}} for 
most Scala code that operates on the Option. The change seems to be related to 
the following commit:

[https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a]

 

I have made a reproduction with a couple of examples in a public Github repo 
here:

[https://github.com/q-willboulter/spark-tuple-encoders-bug] 

 

The common use case where this is likely to be encountered is while doing any 
joins that can return null, e.g. left or outer joins. When casting the result 
of a left join it is sensible to wrap the right-hand side in an Option to 
handle the case where there is no match. Since 3.3.3 this would fail if the 
encoder is derived manually using {{Encoders.tuple(leftEncoder, rightEncoder).}}

If the entire tuple encoder {{Encoder[(Left, Option[Right]])}} is derived at 
once using reflection, the encoder works as expected. The bug appears to be in 
the following function inside {{ExpressionEncoder.scala}}

 
{code:java}
def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = ...{code}
 

  was:
In Spark `3.3.2`, encoders created using `Encoders.tuple(encoder1, encoder2, 
..)` correctly handle casting `null` into `None` when the target type is an 
`Option`. 

 

In Spark `3.3.3`, this behaviour has changed and the Option value comes through 
as `null` which is likely to cause a `NullPointerException` for most Scala code 
that operates on the Option. The change seems to be related to the following 
commit:

[https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a]

 

I have made a reproduction with a couple of examples in a public Github repo 
here:

[https://github.com/q-willboulter/spark-tuple-encoders-bug] 

 

The common use case where this is likely to be encountered is while doing any 
joins that can return null, e.g. left or outer joins. When casting the result 
of a left join it is sensible to wrap the right-hand side in an Option to 
handle the case where there is no match - since 3.3.3 this could fail if the 
encoder is derived manually using `Encoders.tuple(leftEncoder, rightEncoder)`. 
If the entire tuple encoder `Encoder[(Left, Option[Right]])` is derived at 
once, the encoder works as expected - the bug appears to be in the following 
function inside `ExpressionEncoder.scala`

```
def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = ...
```


> Spark 3.3.3 tuple encoders built using `Encoders.tuple` do not correctly cast 
> null into None for Option values
> --
>
> Key: SPARK-46251
> URL: https://issues.apache.org/jira/browse/SPARK-46251
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.3, 3.4.2, 3.4.0, 3.4.1, 3.5.0
>Reporter: Will Boulter
>Priority: Major
>
> In Spark {{3.3.2}} encoders created using {{Encoders.tuple(encoder1, 
> encoder2, ..)}} correctly handle casting {{null}} into {{None }}when the 
> target type is an {{{}Option{}}}. 
>  
> In Spark {{{}3.3.3{}}}, this behaviour has changed and the Option value comes 
> through as {{null}} which is likely to cause a {{NullPointerException}} for 
> most Scala code that operates on the Option. The change seems to be related 
> to the following commit:
> [https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a]
>  
> I have made a reproduction with a couple of examples in a public Github repo 
> here:
> [https://github.com/q-willboulter/spark-tuple-encoders-bug] 
>  
> The common use case where this is likely to be encountered is while doing any 
> joins that can return null, e.g. left or outer joins. When casting the result 
> of a left join it is sensible to wrap the right-hand side in an Option to 
> handle the case where there is no match. Since 3.3.3 this would fail if the 
> encoder is derived manually using {{Encoders.tuple(leftEncoder, 
> rightEncoder).}}
> If the entire tuple encoder {{Encoder[(Left, Option[Right]])}} is derived at 
> once using reflection, the encoder works as expected. The bug appears to be 
> in the following function inside {{ExpressionEncoder.scala}}
>  
> {code:java}
> def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = 
> ...{code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (SPARK-46250) Deflake test_parity_listener

2023-12-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46250:
---
Labels: pull-request-available  (was: )

> Deflake test_parity_listener
> 
>
> Key: SPARK-46250
> URL: https://issues.apache.org/jira/browse/SPARK-46250
> Project: Spark
>  Issue Type: Task
>  Components: Connect, SS
>Affects Versions: 4.0.0
>Reporter: Wei Liu
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46251) Spark 3.3.3 tuple encoders built using `Encoders.tuple` do not correctly cast null into None for Option values

2023-12-04 Thread Will Boulter (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Boulter updated SPARK-46251:
-
Summary: Spark 3.3.3 tuple encoders built using `Encoders.tuple` do not 
correctly cast null into None for Option values  (was: Spark 3.3.3 tuple 
encoders do not correctly cast null into None for Option values)

> Spark 3.3.3 tuple encoders built using `Encoders.tuple` do not correctly cast 
> null into None for Option values
> --
>
> Key: SPARK-46251
> URL: https://issues.apache.org/jira/browse/SPARK-46251
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.3, 3.4.2, 3.4.0, 3.4.1, 3.5.0
>Reporter: Will Boulter
>Priority: Major
>
> In Spark `3.3.2`, encoders created using `Encoders.tuple(encoder1, encoder2, 
> ..)` correctly handle casting `null` into `None` when the target type is an 
> `Option`. 
>  
> In Spark `3.3.3`, this behaviour has changed and the Option value comes 
> through as `null` which is likely to cause a `NullPointerException` for most 
> Scala code that operates on the Option. The change seems to be related to the 
> following commit:
> [https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a]
>  
> I have made a reproduction with a couple of examples in a public Github repo 
> here:
> [https://github.com/q-willboulter/spark-tuple-encoders-bug] 
>  
> The common use case where this is likely to be encountered is while doing any 
> joins that can return null, e.g. left or outer joins. When casting the result 
> of a left join it is sensible to wrap the right-hand side in an Option to 
> handle the case where there is no match - since 3.3.3 this could fail if the 
> encoder is derived manually using `Encoders.tuple(leftEncoder, 
> rightEncoder)`. If the entire tuple encoder `Encoder[(Left, Option[Right]])` 
> is derived at once, the encoder works as expected - the bug appears to be in 
> the following function inside `ExpressionEncoder.scala`
> ```
> def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = ...
> ```



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46251) Spark 3.3.3 tuple encoders do not correctly cast null into None for Option values

2023-12-04 Thread Will Boulter (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Boulter updated SPARK-46251:
-
Summary: Spark 3.3.3 tuple encoders do not correctly cast null into None 
for Option values  (was: Spark 3.3.3 tuple encoders do not correctly casting 
null into None for Option values)

> Spark 3.3.3 tuple encoders do not correctly cast null into None for Option 
> values
> -
>
> Key: SPARK-46251
> URL: https://issues.apache.org/jira/browse/SPARK-46251
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.3, 3.4.2, 3.4.0, 3.4.1, 3.5.0
>Reporter: Will Boulter
>Priority: Major
>
> In Spark `3.3.2`, encoders created using `Encoders.tuple(encoder1, encoder2, 
> ..)` correctly handle casting `null` into `None` when the target type is an 
> `Option`. 
>  
> In Spark `3.3.3`, this behaviour has changed and the Option value comes 
> through as `null` which is likely to cause a `NullPointerException` for most 
> Scala code that operates on the Option. The change seems to be related to the 
> following commit:
> [https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a]
>  
> I have made a reproduction with a couple of examples in a public Github repo 
> here:
> [https://github.com/q-willboulter/spark-tuple-encoders-bug] 
>  
> The common use case where this is likely to be encountered is while doing any 
> joins that can return null, e.g. left or outer joins. When casting the result 
> of a left join it is sensible to wrap the right-hand side in an Option to 
> handle the case where there is no match - since 3.3.3 this could fail if the 
> encoder is derived manually using `Encoders.tuple(leftEncoder, 
> rightEncoder)`. If the entire tuple encoder `Encoder[(Left, Option[Right]])` 
> is derived at once, the encoder works as expected - the bug appears to be in 
> the following function inside `ExpressionEncoder.scala`
> ```
> def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = ...
> ```



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46251) Spark 3.3.3 tuple encoders do not correctly casting null into None for Option values

2023-12-04 Thread Will Boulter (Jira)
Will Boulter created SPARK-46251:


 Summary: Spark 3.3.3 tuple encoders do not correctly casting null 
into None for Option values
 Key: SPARK-46251
 URL: https://issues.apache.org/jira/browse/SPARK-46251
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.5.0, 3.4.1, 3.4.0, 3.4.2, 3.3.3
Reporter: Will Boulter


In Spark `3.3.2`, encoders created using `Encoders.tuple(encoder1, encoder2, 
..)` correctly handle casting `null` into `None` when the target type is an 
`Option`. 

 

In Spark `3.3.3`, this behaviour has changed and the Option value comes through 
as `null` which is likely to cause a `NullPointerException` for most Scala code 
that operates on the Option. The change seems to be related to the following 
commit:

[https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a]

 

I have made a reproduction with a couple of examples in a public Github repo 
here:

[https://github.com/q-willboulter/spark-tuple-encoders-bug] 

 

The common use case where this is likely to be encountered is while doing any 
joins that can return null, e.g. left or outer joins. When casting the result 
of a left join it is sensible to wrap the right-hand side in an Option to 
handle the case where there is no match - since 3.3.3 this could fail if the 
encoder is derived manually using `Encoders.tuple(leftEncoder, rightEncoder)`. 
If the entire tuple encoder `Encoder[(Left, Option[Right]])` is derived at 
once, the encoder works as expected - the bug appears to be in the following 
function inside `ExpressionEncoder.scala`

```
def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = ...
```



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46250) Deflake test_parity_listener

2023-12-04 Thread Wei Liu (Jira)
Wei Liu created SPARK-46250:
---

 Summary: Deflake test_parity_listener
 Key: SPARK-46250
 URL: https://issues.apache.org/jira/browse/SPARK-46250
 Project: Spark
  Issue Type: Task
  Components: Connect, SS
Affects Versions: 4.0.0
Reporter: Wei Liu






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-46249) Fix state store metrics access after commit

2023-12-04 Thread Anish Shrigondekar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17793036#comment-17793036
 ] 

Anish Shrigondekar commented on SPARK-46249:


PR here - [https://github.com/apache/spark/pull/44165]

 

cc - [~kabhwan] 

> Fix state store metrics access after commit
> ---
>
> Key: SPARK-46249
> URL: https://issues.apache.org/jira/browse/SPARK-46249
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Anish Shrigondekar
>Priority: Major
>  Labels: pull-request-available
>
> Fix state store metrics access after commit



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46249) Fix state store metrics access after commit

2023-12-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46249:
---
Labels: pull-request-available  (was: )

> Fix state store metrics access after commit
> ---
>
> Key: SPARK-46249
> URL: https://issues.apache.org/jira/browse/SPARK-46249
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Anish Shrigondekar
>Priority: Major
>  Labels: pull-request-available
>
> Fix state store metrics access after commit



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46092) Overflow in Parquet row group filter creation causes incorrect results

2023-12-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-46092:
--
Fix Version/s: 3.3.4

> Overflow in Parquet row group filter creation causes incorrect results
> --
>
> Key: SPARK-46092
> URL: https://issues.apache.org/jira/browse/SPARK-46092
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Johan Lasperas
>Assignee: Johan Lasperas
>Priority: Major
>  Labels: correctness, pull-request-available
> Fix For: 4.0.0, 3.5.1, 3.3.4, 3.4.3
>
>
> While the Parquet readers don't support reading Parquet values into larger 
> Spark types, it's possible to trigger an overflow when creating a Parquet row 
> group filter, which will then incorrectly skip row groups and bypass the 
> exception in the reader.
> Repro:
> {code:java}
> Seq(0).toDF("a").write.parquet(path)
> spark.read.schema("a LONG").parquet(path).where(s"a < 
> ${Long.MaxValue}").collect(){code}
> This succeeds and returns no results. This should either fail if the Parquet 
> reader doesn't support the upcast from int to long or produce result `[0]` if 
> it does.
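For intuition on the skipped row group: narrowing the {{Long.MaxValue}} literal 
to the file's 32-bit physical type overflows, and a filter rebuilt on the 
truncated value excludes the matching row group. A standalone sketch of the 
arithmetic (not the actual filter-building code):

{code:java}
// Long.MaxValue truncated to Int overflows to -1, so a predicate rebuilt on
// the truncated value ("a < -1") wrongly rules out the row group holding 0.
val literal   = Long.MaxValue
val truncated = literal.toInt      // -1 after overflow
assert(truncated == -1)
assert(!(0 < truncated))           // the row with a = 0 would be skipped
{code}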



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46245) Replace `s.c.MapOps.view.filterKeys` with `s.c.MapOps.filter`

2023-12-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-46245:
-

Assignee: Yang Jie

> Replace `s.c.MapOps.view.filterKeys` with `s.c.MapOps.filter`
> --
>
> Key: SPARK-46245
> URL: https://issues.apache.org/jira/browse/SPARK-46245
> Project: Spark
>  Issue Type: Improvement
>  Components: k8s, Spark Core, SQL, YARN
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>
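For context, a minimal before/after sketch of the replacement (map contents and 
predicate invented for illustration, not taken from the Spark codebase):

{code:java}
val conf = Map("spark.app.name" -> "demo", "other.key" -> "x")

// Before: builds a lazy MapView, typically forced back with toMap.
val viaView = conf.view.filterKeys(_.startsWith("spark.")).toMap

// After: filters strictly in a single pass, with no intermediate view.
val viaFilter = conf.filter { case (k, _) => k.startsWith("spark.") }

assert(viaView == viaFilter)
{code}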




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46245) Replace `s.c.MapOps.view.filterKeys` with `s.c.MapOps.filter`

2023-12-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-46245.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44160
[https://github.com/apache/spark/pull/44160]

> Replace `s.c.MapOps.view.filterKeys` with `s.c.MapOps.filter`
> --
>
> Key: SPARK-46245
> URL: https://issues.apache.org/jira/browse/SPARK-46245
> Project: Spark
>  Issue Type: Improvement
>  Components: k8s, Spark Core, SQL, YARN
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32246) Enable streaming-kinesis-asl tests in Github Action CI

2023-12-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-32246:
-

Assignee: junyuc25

> Enable streaming-kinesis-asl tests in Github Action CI
> --
>
> Key: SPARK-32246
> URL: https://issues.apache.org/jira/browse/SPARK-32246
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 2.4.6, 3.0.0, 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: junyuc25
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> See https://github.com/HyukjinKwon/spark/pull/4. Kinesis tests depend on the 
> external Amazon Kinesis service.
> We should have a way to run them optionally. Currently, they are not being 
> run in GitHub Actions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32246) Enable streaming-kinesis-asl tests in Github Action CI

2023-12-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-32246:
--
Summary: Enable streaming-kinesis-asl tests in Github Action CI  (was: Have 
a way to optionally run streaming-kinesis-asl)

> Enable streaming-kinesis-asl tests in Github Action CI
> --
>
> Key: SPARK-32246
> URL: https://issues.apache.org/jira/browse/SPARK-32246
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 2.4.6, 3.0.0, 3.1.0
>Reporter: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> See https://github.com/HyukjinKwon/spark/pull/4. Kinesis tests depend on the 
> external Amazon Kinesis service.
> We should have a way to run them optionally. Currently, they are not being 
> run in GitHub Actions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-32246) Have a way to optionally run streaming-kinesis-asl

2023-12-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-32246.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43736
[https://github.com/apache/spark/pull/43736]

> Have a way to optionally run streaming-kinesis-asl
> --
>
> Key: SPARK-32246
> URL: https://issues.apache.org/jira/browse/SPARK-32246
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 2.4.6, 3.0.0, 3.1.0
>Reporter: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> See https://github.com/HyukjinKwon/spark/pull/4. Kinesis tests depend on the 
> external Amazon Kinesis service.
> We should have a way to run them optionally. Currently, they are not being 
> run in GitHub Actions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] (SPARK-38473) Use error classes in org.apache.spark.scheduler

2023-12-04 Thread Asmita Limaye (Jira)


[ https://issues.apache.org/jira/browse/SPARK-38473 ]


Asmita Limaye deleted comment on SPARK-38473:
---

was (Author: JIRAUSER303226):
Hi [~bozhang],

I was working on this issue (with Hannah) and had some doubts related to how to 
assign the sqlState field. 

You can see my (WIP) PR here: [https://github.com/apache/spark/pull/43941]

I was wondering if you could help me figure out how to properly assign the 
sqlStates to the new error classes I created. I have gone through the error 
README, but I'm still not sure how to verify I'm doing it correctly.

Thanks!

-Asmita

> Use error classes in org.apache.spark.scheduler
> ---
>
> Key: SPARK-38473
> URL: https://issues.apache.org/jira/browse/SPARK-38473
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Bo Zhang
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46249) Fix state store metrics access after commit

2023-12-04 Thread Anish Shrigondekar (Jira)
Anish Shrigondekar created SPARK-46249:
--

 Summary: Fix state store metrics access after commit
 Key: SPARK-46249
 URL: https://issues.apache.org/jira/browse/SPARK-46249
 Project: Spark
  Issue Type: Task
  Components: Structured Streaming
Affects Versions: 4.0.0
Reporter: Anish Shrigondekar


Fix state store metrics access after commit



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46248) Support ignoreCorruptFiles and ignoreMissingFiles options in XML

2023-12-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46248:
---
Labels: pull-request-available  (was: )

> Support ignoreCorruptFiles and ignoreMissingFiles options in XML
> 
>
> Key: SPARK-46248
> URL: https://issues.apache.org/jira/browse/SPARK-46248
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Shujing Yang
>Priority: Major
>  Labels: pull-request-available
>
> This PR corrects the handling of corrupt or missing multiline XML files by 
> respecting the user-specified options.
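A hedged usage sketch of the two options against the XML source (the rowTag 
value and path are invented for illustration):

{code:java}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// With these options set, corrupt or missing multiline XML files should be
// skipped instead of failing the whole read.
val df = spark.read
  .format("xml")
  .option("rowTag", "record")
  .option("ignoreCorruptFiles", "true")
  .option("ignoreMissingFiles", "true")
  .load("/data/xml/*.xml")
{code}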



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46248) Support ignoreCorruptFiles and ignoreMissingFiles options in XML

2023-12-04 Thread Shujing Yang (Jira)
Shujing Yang created SPARK-46248:


 Summary: Support ignoreCorruptFiles and ignoreMissingFiles options 
in XML
 Key: SPARK-46248
 URL: https://issues.apache.org/jira/browse/SPARK-46248
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Shujing Yang


This PR corrects the handling of corrupt or missing multiline XML files by 
respecting the user-specified options.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39800) DataSourceV2: View support

2023-12-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-39800:
---
Labels: pull-request-available  (was: )

> DataSourceV2: View support
> --
>
> Key: SPARK-39800
> URL: https://issues.apache.org/jira/browse/SPARK-39800
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: John Zhuge
>Priority: Major
>  Labels: pull-request-available
>
> Support Data source V2 views.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46225) Collapse stacked withColumns into a single message

2023-12-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46225:
---
Labels: pull-request-available  (was: )

> Collapse stacked withColumns into a single message
> --
>
> Key: SPARK-46225
> URL: https://issues.apache.org/jira/browse/SPARK-46225
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Herman van Hövell
>Assignee: Herman van Hövell
>Priority: Major
>  Labels: pull-request-available
>
> It is quite a common pattern to create queries with heavily nested 
> withColumns(..) calls. This can easily lead to hitting proto recursion limits.
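For illustration, the kind of chain being described (column names invented): 
each call wraps the previous plan, so over Connect a long chain previously 
meant one nested message per call.

{code:java}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// 200 stacked withColumn calls; collapsing them lets the Connect client send
// a single WithColumns message instead of a 200-deep nested plan.
val wide = (1 to 200).foldLeft(spark.range(10).toDF()) { (df, i) =>
  df.withColumn(s"c$i", lit(i))
}
{code}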



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46246) Support EXECUTE IMMEDIATE syntax in Spark SQL

2023-12-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46246:
---
Labels: pull-request-available  (was: )

> Support EXECUTE IMMEDIATE syntax in Spark SQL
> 
>
> Key: SPARK-46246
> URL: https://issues.apache.org/jira/browse/SPARK-46246
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Milan Stefanovic
>Priority: Major
>  Labels: pull-request-available
>
> Introducing new EXECUTE IMMEDIATE syntax to support parameterized queries 
> from within SQL.
> This API executes a query passed as a string, together with its arguments (a 
> usage sketch follows the list below).
> Other DBs that support this:
>  * 
> [Oracle|https://docs.oracle.com/cd/B13789_01/appdev.101/b10807/13_elems017.htm]
>  * 
> [Snowflake|https://docs.snowflake.com/en/sql-reference/sql/execute-immediate]
>  * 
> [PgSql|https://www.postgresql.org/docs/current/ecpg-sql-execute-immediate.html#:~:text=Description,statement%2C%20without%20retrieving%20result%20rows.]
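A hypothetical usage sketch modeled on the dialects linked above; the exact 
Spark grammar is defined by the pull request, not by this example:

{code:java}
// Both statements are assumptions for illustration: DECLARE creates a session
// variable, and EXECUTE IMMEDIATE runs the query string with the bound value.
spark.sql("DECLARE threshold INT DEFAULT 10")
spark.sql("EXECUTE IMMEDIATE 'SELECT * FROM events WHERE id > ?' USING threshold").show()
{code}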



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46092) Overflow in Parquet row group filter creation causes incorrect results

2023-12-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-46092:
--
Fix Version/s: 3.4.3

> Overflow in Parquet row group filter creation causes incorrect results
> --
>
> Key: SPARK-46092
> URL: https://issues.apache.org/jira/browse/SPARK-46092
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Johan Lasperas
>Assignee: Johan Lasperas
>Priority: Major
>  Labels: correctness, pull-request-available
> Fix For: 4.0.0, 3.5.1, 3.4.3
>
>
> While the Parquet readers don't support reading Parquet values into larger 
> Spark types, it's possible to trigger an overflow when creating a Parquet row 
> group filter, which will then incorrectly skip row groups and bypass the 
> exception in the reader.
> Repro:
> {code:java}
> Seq(0).toDF("a").write.parquet(path)
> spark.read.schema("a LONG").parquet(path).where(s"a < 
> ${Long.MaxValue}").collect(){code}
> This succeeds and returns no results. This should either fail if the Parquet 
> reader doesn't support the upcast from int to long or produce result `[0]` if 
> it does.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46092) Overflow in Parquet row group filter creation causes incorrect results

2023-12-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-46092:
--
Fix Version/s: 3.5.1

> Overflow in Parquet row group filter creation causes incorrect results
> --
>
> Key: SPARK-46092
> URL: https://issues.apache.org/jira/browse/SPARK-46092
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Johan Lasperas
>Assignee: Johan Lasperas
>Priority: Major
>  Labels: correctness, pull-request-available
> Fix For: 4.0.0, 3.5.1
>
>
> While the Parquet readers don't support reading Parquet values into larger 
> Spark types, it's possible to trigger an overflow when creating a Parquet row 
> group filter, which will then incorrectly skip row groups and bypass the 
> exception in the reader.
> Repro:
> {code:java}
> Seq(0).toDF("a").write.parquet(path)
> spark.read.schema("a LONG").parquet(path).where(s"a < 
> ${Long.MaxValue}").collect(){code}
> This succeeds and returns no results. This should either fail if the Parquet 
> reader doesn't support the upcast from int to long or produce result `[0]` if 
> it does.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46231) Migrate all remaining NotImplementedError & TypeError into PySpark error framework

2023-12-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-46231.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44148
[https://github.com/apache/spark/pull/44148]

> Migrate all remaining NotImplementedError & TypeError into PySpark error 
> framework
> --
>
> Key: SPARK-46231
> URL: https://issues.apache.org/jira/browse/SPARK-46231
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46231) Migrate all remaining NotImplementedError & TypeError into PySpark error framework

2023-12-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-46231:
-

Assignee: Haejoon Lee

> Migrate all remaining NotImplementedError & TypeError into PySpark error 
> framework
> --
>
> Key: SPARK-46231
> URL: https://issues.apache.org/jira/browse/SPARK-46231
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46237) Make `HiveDDLSuite` independently testable

2023-12-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-46237:
--
Summary: Make `HiveDDLSuite` independently testable  (was: Fix test failed 
of `HiveDDLSuite`)

> Make `HiveDDLSuite` independently testable
> --
>
> Key: SPARK-46237
> URL: https://issues.apache.org/jira/browse/SPARK-46237
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Run `
> build/sbt "hive/testOnly org.apache.spark.sql.hive.execution.HiveDDLSuite" 
> -Phive
> `
> {code:java}
> [info] - SPARK-34261: Avoid side effect if create exists temporary function 
> *** FAILED *** (4 milliseconds)
> [info]   java.util.NoSuchElementException: key not found: default
> [info]   at scala.collection.MapOps.default(Map.scala:274)
> [info]   at scala.collection.MapOps.default$(Map.scala:273)
> [info]   at scala.collection.AbstractMap.default(Map.scala:405)
> [info]   at scala.collection.MapOps.apply(Map.scala:176)
> [info]   at scala.collection.MapOps.apply$(Map.scala:175)
> [info]   at scala.collection.AbstractMap.apply(Map.scala:405)
> [info]   at 
> org.apache.spark.sql.hive.execution.HiveDDLSuite.$anonfun$new$445(HiveDDLSuite.scala:3275)
> [info]   at 
> org.apache.spark.sql.test.SQLTestUtilsBase.withUserDefinedFunction(SQLTestUtils.scala:256)
> [info]   at 
> org.apache.spark.sql.test.SQLTestUtilsBase.withUserDefinedFunction$(SQLTestUtils.scala:254)
> [info]   at 
> org.apache.spark.sql.execution.command.DDLSuite.withUserDefinedFunction(DDLSuite.scala:326)
> [info]   at 
> org.apache.spark.sql.hive.execution.HiveDDLSuite.$anonfun$new$444(HiveDDLSuite.scala:3267)
> [info]   at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
> [info]   at org.scalatest.enablers.Timed$$anon$1.timeoutAfter(Timed.scala:127)
> [info]   at 
> org.scalatest.concurrent.TimeLimits$.failAfterImpl(TimeLimits.scala:282)
> [info]   at 
> org.scalatest.concurrent.TimeLimits.failAfter(TimeLimits.scala:231)
> [info]   at 
> org.scalatest.concurrent.TimeLimits.failAfter$(TimeLimits.scala:230)
> [info]   at org.apache.spark.SparkFunSuite.failAfter(SparkFunSuite.scala:69)
> [info]   at 
> org.apache.spark.SparkFunSuite.$anonfun$test$2(SparkFunSuite.scala:155)
> [info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
> [info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
> [info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> [info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
> [info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
> [info]   at 
> org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:227)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
> [info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
> [info]   at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:69)
> [info]   at 
> org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
> [info]   at 
> org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
> [info]   at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:69)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
> [info]   at 
> org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
> [info]   at scala.collection.immutable.List.foreach(List.scala:333)
> [info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
> [info]   at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396)
> [info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564)
> [info]   at org.scalatest.Suite.run(Suite.scala:1114)
> [info]   at org.scalatest.Suite.run$(Suite.scala:1096)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1564)
> [info]   at 
> 
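For readers unfamiliar with the failure mode: {{key not found: default}} is 
what {{scala.collection.MapOps.apply}} throws for an absent key, as in the 
first frames of the trace. A standalone illustration (invented map, not the 
suite's actual state):

{code:java}
import java.util.NoSuchElementException

val functionsByDb = Map("db1" -> Seq("f1"))

// Map.apply throws for a missing key, matching the stack trace above.
try functionsByDb("default")
catch { case e: NoSuchElementException => println(e.getMessage) } // key not found: default
{code}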

[jira] [Assigned] (SPARK-46237) Fix test failed of `HiveDDLSuite`

2023-12-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-46237:
-

Assignee: Yang Jie

> Fix test failed of `HiveDDLSuite`
> -
>
> Key: SPARK-46237
> URL: https://issues.apache.org/jira/browse/SPARK-46237
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
>  Labels: pull-request-available
>
> Run `
> build/sbt "hive/testOnly org.apache.spark.sql.hive.execution.HiveDDLSuite" 
> -Phive
> `
> {code:java}
> [info] - SPARK-34261: Avoid side effect if create exists temporary function 
> *** FAILED *** (4 milliseconds)
> [info]   java.util.NoSuchElementException: key not found: default
> [info]   at scala.collection.MapOps.default(Map.scala:274)
> [info]   at scala.collection.MapOps.default$(Map.scala:273)
> [info]   at scala.collection.AbstractMap.default(Map.scala:405)
> [info]   at scala.collection.MapOps.apply(Map.scala:176)
> [info]   at scala.collection.MapOps.apply$(Map.scala:175)
> [info]   at scala.collection.AbstractMap.apply(Map.scala:405)
> [info]   at 
> org.apache.spark.sql.hive.execution.HiveDDLSuite.$anonfun$new$445(HiveDDLSuite.scala:3275)
> [info]   at 
> org.apache.spark.sql.test.SQLTestUtilsBase.withUserDefinedFunction(SQLTestUtils.scala:256)
> [info]   at 
> org.apache.spark.sql.test.SQLTestUtilsBase.withUserDefinedFunction$(SQLTestUtils.scala:254)
> [info]   at 
> org.apache.spark.sql.execution.command.DDLSuite.withUserDefinedFunction(DDLSuite.scala:326)
> [info]   at 
> org.apache.spark.sql.hive.execution.HiveDDLSuite.$anonfun$new$444(HiveDDLSuite.scala:3267)
> [info]   at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
> [info]   at org.scalatest.enablers.Timed$$anon$1.timeoutAfter(Timed.scala:127)
> [info]   at 
> org.scalatest.concurrent.TimeLimits$.failAfterImpl(TimeLimits.scala:282)
> [info]   at 
> org.scalatest.concurrent.TimeLimits.failAfter(TimeLimits.scala:231)
> [info]   at 
> org.scalatest.concurrent.TimeLimits.failAfter$(TimeLimits.scala:230)
> [info]   at org.apache.spark.SparkFunSuite.failAfter(SparkFunSuite.scala:69)
> [info]   at 
> org.apache.spark.SparkFunSuite.$anonfun$test$2(SparkFunSuite.scala:155)
> [info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
> [info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
> [info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> [info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
> [info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
> [info]   at 
> org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:227)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
> [info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
> [info]   at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:69)
> [info]   at 
> org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
> [info]   at 
> org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
> [info]   at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:69)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
> [info]   at 
> org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
> [info]   at scala.collection.immutable.List.foreach(List.scala:333)
> [info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
> [info]   at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396)
> [info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564)
> [info]   at org.scalatest.Suite.run(Suite.scala:1114)
> [info]   at org.scalatest.Suite.run$(Suite.scala:1096)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1564)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:273)
> [info]   at org.scalatest.SuperEngine.runImpl(Engine.scala:535)
> 

[jira] [Resolved] (SPARK-46237) Fix test failed of `HiveDDLSuite`

2023-12-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-46237.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44153
[https://github.com/apache/spark/pull/44153]

> Fix test failed of `HiveDDLSuite`
> -
>
> Key: SPARK-46237
> URL: https://issues.apache.org/jira/browse/SPARK-46237
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Run `
> build/sbt "hive/testOnly org.apache.spark.sql.hive.execution.HiveDDLSuite" 
> -Phive
> `
> {code:java}
> [info] - SPARK-34261: Avoid side effect if create exists temporary function 
> *** FAILED *** (4 milliseconds)
> [info]   java.util.NoSuchElementException: key not found: default
> [info]   at scala.collection.MapOps.default(Map.scala:274)
> [info]   at scala.collection.MapOps.default$(Map.scala:273)
> [info]   at scala.collection.AbstractMap.default(Map.scala:405)
> [info]   at scala.collection.MapOps.apply(Map.scala:176)
> [info]   at scala.collection.MapOps.apply$(Map.scala:175)
> [info]   at scala.collection.AbstractMap.apply(Map.scala:405)
> [info]   at 
> org.apache.spark.sql.hive.execution.HiveDDLSuite.$anonfun$new$445(HiveDDLSuite.scala:3275)
> [info]   at 
> org.apache.spark.sql.test.SQLTestUtilsBase.withUserDefinedFunction(SQLTestUtils.scala:256)
> [info]   at 
> org.apache.spark.sql.test.SQLTestUtilsBase.withUserDefinedFunction$(SQLTestUtils.scala:254)
> [info]   at 
> org.apache.spark.sql.execution.command.DDLSuite.withUserDefinedFunction(DDLSuite.scala:326)
> [info]   at 
> org.apache.spark.sql.hive.execution.HiveDDLSuite.$anonfun$new$444(HiveDDLSuite.scala:3267)
> [info]   at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
> [info]   at org.scalatest.enablers.Timed$$anon$1.timeoutAfter(Timed.scala:127)
> [info]   at 
> org.scalatest.concurrent.TimeLimits$.failAfterImpl(TimeLimits.scala:282)
> [info]   at 
> org.scalatest.concurrent.TimeLimits.failAfter(TimeLimits.scala:231)
> [info]   at 
> org.scalatest.concurrent.TimeLimits.failAfter$(TimeLimits.scala:230)
> [info]   at org.apache.spark.SparkFunSuite.failAfter(SparkFunSuite.scala:69)
> [info]   at 
> org.apache.spark.SparkFunSuite.$anonfun$test$2(SparkFunSuite.scala:155)
> [info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
> [info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
> [info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> [info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
> [info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
> [info]   at 
> org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:227)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
> [info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
> [info]   at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:69)
> [info]   at 
> org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
> [info]   at 
> org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
> [info]   at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:69)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
> [info]   at 
> org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
> [info]   at scala.collection.immutable.List.foreach(List.scala:333)
> [info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
> [info]   at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396)
> [info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564)
> [info]   at org.scalatest.Suite.run(Suite.scala:1114)
> [info]   at org.scalatest.Suite.run$(Suite.scala:1096)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1564)
> [info]   at 
> 

[jira] [Commented] (SPARK-45644) After upgrading to Spark 3.4.1 and 3.5.0 we receive RuntimeException "scala.Some is not a valid external type for schema of array"

2023-12-04 Thread Bruce Robbins (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17792942#comment-17792942
 ] 

Bruce Robbins commented on SPARK-45644:
---

Even though this is the original issue, I closed it as a duplicate because the 
fix was applied under SPARK-45896.

> After upgrading to Spark 3.4.1 and 3.5.0 we receive RuntimeException 
> "scala.Some is not a valid external type for schema of array"
> --
>
> Key: SPARK-45644
> URL: https://issues.apache.org/jira/browse/SPARK-45644
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 3.4.1, 3.5.0
>Reporter: Adi Wehrli
>Priority: Major
>
> I do not really know if this is a bug, but I am at the end of my knowledge.
> A Spark job ran successfully with Spark 3.2.x and 3.3.x. 
> But after upgrading to 3.4.1 (and likewise 3.5.0), running the same job 
> with the same data now always produces the following:
> {code}
> scala.Some is not a valid external type for schema of array
> {code}
> The corresponding stacktrace is:
> {code}
> 2023-10-24T06:28:50.932 level=ERROR logger=org.apache.spark.executor.Executor 
> msg="Exception in task 0.0 in stage 0.0 (TID 0)" thread="Executor task launch 
> worker for task 0.0 in stage 0.0 (TID 0)"
> java.lang.RuntimeException: scala.Some is not a valid external type for 
> schema of array
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.MapObjects_10$(Unknown
>  Source) ~[?:?]
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.ExternalMapToCatalyst_1$(Unknown
>  Source) ~[?:?]
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.createNamedStruct_14_3$(Unknown
>  Source) ~[?:?]
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.If_12$(Unknown
>  Source) ~[?:?]
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown
>  Source) ~[?:?]
>   at 
> org.apache.spark.sql.execution.ObjectOperator$.$anonfun$serializeObjectToRow$1(objects.scala:165)
>  ~[spark-sql_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.sql.execution.AppendColumnsWithObjectExec.$anonfun$doExecute$15(objects.scala:380)
>  ~[spark-sql_2.12-3.5.0.jar:3.5.0]
>   at scala.collection.Iterator$$anon$10.next(Iterator.scala:461) 
> ~[scala-library-2.12.15.jar:?]
>   at scala.collection.Iterator$$anon$10.next(Iterator.scala:461) 
> ~[scala-library-2.12.15.jar:?]
>   at 
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:169)
>  ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
>  ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:104) 
> ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54) 
> ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161) 
> ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at org.apache.spark.scheduler.Task.run(Task.scala:141) 
> ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
>  ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
>  ~[spark-common-utils_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
>  ~[spark-common-utils_2.12-3.5.0.jar:3.5.0]
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94) 
> ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623) 
> [spark-core_2.12-3.5.0.jar:3.5.0]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  [?:?]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  [?:?]
>   at java.lang.Thread.run(Thread.java:834) [?:?]
> 2023-10-24T06:28:50.932 level=ERROR logger=org.apache.spark.executor.Executor 
> msg="Exception in task 1.0 in stage 0.0 (TID 1)" thread="Executor task launch 
> worker for task 1.0 in stage 0.0 (TID 1)"
> java.lang.RuntimeException: scala.Some is not a valid external type for 
> schema of array
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.MapObjects_10$(Unknown
>  Source) ~[?:?]
>   at 
> 
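A hedged sketch of the shape behind this message (case class and field names 
invented): the serializer receives a {{scala.Some}} where the resolved schema 
expects a bare array.

{code:java}
import org.apache.spark.sql.SparkSession

case class Record(tags: Seq[String])

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Matching shape: an external Seq for an array<string> schema works.
Seq(Record(Seq("a", "b"))).toDS().show()

// The reported failure is the mismatched shape: at runtime the generated
// serializer is handed Some(Seq(...)) for a field whose schema says
// array<string>, and it rejects scala.Some as an external type.
{code}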

[jira] [Resolved] (SPARK-45644) After upgrading to Spark 3.4.1 and 3.5.0 we receive RuntimeException "scala.Some is not a valid external type for schema of array"

2023-12-04 Thread Bruce Robbins (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruce Robbins resolved SPARK-45644.
---
Resolution: Duplicate

> After upgrading to Spark 3.4.1 and 3.5.0 we receive RuntimeException 
> "scala.Some is not a valid external type for schema of array"
> --
>
> Key: SPARK-45644
> URL: https://issues.apache.org/jira/browse/SPARK-45644
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 3.4.1, 3.5.0
>Reporter: Adi Wehrli
>Priority: Major
>
> I do not really know if this is a bug, but I am at the end of my knowledge.
> A Spark job ran successfully with Spark 3.2.x and 3.3.x. 
> But after upgrading to 3.4.1 (and likewise 3.5.0), running the same job 
> with the same data now always produces the following:
> {code}
> scala.Some is not a valid external type for schema of array
> {code}
> The corresponding stacktrace is:
> {code}
> 2023-10-24T06:28:50.932 level=ERROR logger=org.apache.spark.executor.Executor msg="Exception in task 0.0 in stage 0.0 (TID 0)" thread="Executor task launch worker for task 0.0 in stage 0.0 (TID 0)"
> java.lang.RuntimeException: scala.Some is not a valid external type for schema of array
>   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.MapObjects_10$(Unknown Source) ~[?:?]
>   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.ExternalMapToCatalyst_1$(Unknown Source) ~[?:?]
>   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.createNamedStruct_14_3$(Unknown Source) ~[?:?]
>   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.If_12$(Unknown Source) ~[?:?]
>   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source) ~[?:?]
>   at org.apache.spark.sql.execution.ObjectOperator$.$anonfun$serializeObjectToRow$1(objects.scala:165) ~[spark-sql_2.12-3.5.0.jar:3.5.0]
>   at org.apache.spark.sql.execution.AppendColumnsWithObjectExec.$anonfun$doExecute$15(objects.scala:380) ~[spark-sql_2.12-3.5.0.jar:3.5.0]
>   at scala.collection.Iterator$$anon$10.next(Iterator.scala:461) ~[scala-library-2.12.15.jar:?]
>   at scala.collection.Iterator$$anon$10.next(Iterator.scala:461) ~[scala-library-2.12.15.jar:?]
>   at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:169) ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59) ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:104) ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54) ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161) ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at org.apache.spark.scheduler.Task.run(Task.scala:141) ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620) ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64) ~[spark-common-utils_2.12-3.5.0.jar:3.5.0]
>   at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61) ~[spark-common-utils_2.12-3.5.0.jar:3.5.0]
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94) ~[spark-core_2.12-3.5.0.jar:3.5.0]
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623) [spark-core_2.12-3.5.0.jar:3.5.0]
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
>   at java.lang.Thread.run(Thread.java:834) [?:?]
> 2023-10-24T06:28:50.932 level=ERROR logger=org.apache.spark.executor.Executor msg="Exception in task 1.0 in stage 0.0 (TID 1)" thread="Executor task launch worker for task 1.0 in stage 0.0 (TID 1)"
> java.lang.RuntimeException: scala.Some is not a valid external type for schema of array
>   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.MapObjects_10$(Unknown Source) ~[?:?]
>   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.ExternalMapToCatalyst_1$(Unknown Source) ~[?:?]
>   at 
> 
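
For context, a minimal sketch of the class of failure reported above (not taken from the original job; schema and values are invented): an external {{scala.Some}} arriving in a slot whose Catalyst schema is an array.

{code:java}
// Hypothetical repro sketch: the Row carries Some(Seq(...)) where the declared
// schema expects a bare array, so the encoder's external-type check fails with
// "scala.Some is not a valid external type for schema of array".
import java.util.Arrays
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{ArrayType, IntegerType, StructField, StructType}

object SomeNotValidExternalType {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").appName("repro").getOrCreate()
    val schema = StructType(Seq(StructField("xs", ArrayType(IntegerType), nullable = true)))
    // Wrong: the Option should be unwrapped (or the field left null) before it hits the row.
    val rows = Arrays.asList(Row(Some(Seq(1, 2, 3))))
    spark.createDataFrame(rows, schema).show() // throws the RuntimeException when the job runs
    spark.stop()
  }
}
{code}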

[jira] [Created] (SPARK-46247) Invalid bucket file error when reading from bucketed table created with PathOutputCommitProtocol

2023-12-04 Thread Jira
Никита Соколов created SPARK-46247:
--

 Summary: Invalid bucket file error when reading from bucketed 
table created with PathOutputCommitProtocol
 Key: SPARK-46247
 URL: https://issues.apache.org/jira/browse/SPARK-46247
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.5.0
Reporter: Никита Соколов


I am trying to create an external partitioned bucketed table using this code:
{code:java}
spark.read.parquet("s3://faucct/input")
  .repartition(128, col("product_id"))
  .write.partitionBy("features_date").bucketBy(128, "product_id")
  .option("path", "s3://faucct/tmp/output")
  .option("compression", "uncompressed")
  .saveAsTable("tmp.output"){code}
At first it took more time than expected because it had to rename a lot of 
files at the end, which requires copying in S3. So I used the configuration 
from the documentation – 
[https://spark.apache.org/docs/3.0.0-preview/cloud-integration.html#committing-work-into-cloud-storage-safely-and-fast]:
{code:java}
spark.hadoop.fs.s3a.committer.name directory
spark.sql.sources.commitProtocolClass org.apache.spark.internal.io.cloud.PathOutputCommitProtocol
spark.sql.parquet.output.committer.class org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter {code}
It is properly partitioned: every partition_date has exactly 128 files named 
like 
[part-00117-43293810-d0e9-4eee-9be8-e9e50a3e10fd_00117-5eb66a54-2fbb-4775-8f3b-3040b2966a71.c000.parquet|https://s3.console.aws.amazon.com/s3/object/joom-analytics-recom?region=eu-central-1=recom/dataset/best/best-to-cart-rt/user-product-v4/to_cart-faucct/fnw/ipw/msv2/2023-09-15/14d/tmp_3/features_date%3D2023-09-01/part-00117-43293810-d0e9-4eee-9be8-e9e50a3e10fd_00117-5eb66a54-2fbb-4775-8f3b-3040b2966a71.c000.parquet].
Then I am trying to join this table with another one, for example like this:
{code:java}
spark.table("tmp.output").repartition(128, $"product_id")
  .join(spark.table("tmp.output").repartition(128, $"product_id"), 
Seq("product_id")).count(){code}
Because of the configuration I get the following errors:
{code:java}
org.apache.spark.SparkException: [INVALID_BUCKET_FILE] Invalid bucket file: s3://faucct/tmp/output/features_date=2023-09-01/part-0-43293810-d0e9-4eee-9be8-e9e50a3e10fd_0-5eb66a54-2fbb-4775-8f3b-3040b2966a71.c000.parquet.
  at org.apache.spark.sql.errors.QueryExecutionErrors$.invalidBucketFile(QueryExecutionErrors.scala:2731)
  at org.apache.spark.sql.execution.FileSourceScanExec.$anonfun$createBucketedReadRDD$5(DataSourceScanExec.scala:636) {code}
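
One plausible reading of the failure, offered here as an assumption rather than something stated in the ticket: Spark recovers the bucket id by parsing the output file name, and the extra unique suffix added under PathOutputCommitProtocol keeps the name from matching the expected pattern. A self-contained sketch of that parsing:

{code:java}
// Mimics the idea behind org.apache.spark.sql.execution.datasources.BucketingUtils:
// the bucket id is read from the "_NNNNN" chunk right before the extension, and
// files whose names don't match surface as INVALID_BUCKET_FILE at read time.
val bucketIdPattern = """.*_(\d+)(?:\..*)?$""".r

def getBucketId(fileName: String): Option[Int] = fileName match {
  case bucketIdPattern(id) => Some(id.toInt)
  case _                   => None
}

getBucketId("part-00117-abc_00117.c000.parquet")          // Some(117): standard naming
getBucketId("part-00117-abc_00117-5eb66a54.c000.parquet") // None: extra suffix hides the id
{code}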



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46246) Support EXECUTE IMMEDIATE syntax in Spark SQL

2023-12-04 Thread Milan Stefanovic (Jira)
Milan Stefanovic created SPARK-46246:


 Summary: Support EXECUTE IMMEDIATE syntax in Spark SQL
 Key: SPARK-46246
 URL: https://issues.apache.org/jira/browse/SPARK-46246
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.5.0
Reporter: Milan Stefanovic


Introducing new EXECUTE IMMEDIATE syntax to support parameterized queries from 
within SQL.

This API executes a query passed as a string, together with its arguments.

Other DBs that support this:
 * 
[Oracle|https://docs.oracle.com/cd/B13789_01/appdev.101/b10807/13_elems017.htm]
 * [Snowflake|https://docs.snowflake.com/en/sql-reference/sql/execute-immediate]
 * 
[PgSql|https://www.postgresql.org/docs/current/ecpg-sql-execute-immediate.html#:~:text=Description,statement%2C%20without%20retrieving%20result%20rows.]
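
A rough illustration of the intended usage, assuming Spark follows the analogues above (the exact grammar is not settled by this ticket):

{code:java}
// Hypothetical sketch: run a SQL string with bound arguments from within SQL.
// The statement text and the USING clause shape are assumptions at this stage.
spark.sql("EXECUTE IMMEDIATE 'SELECT * FROM sales WHERE region = ?' USING 'EMEA'").show()
{code}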



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44900) Cached DataFrame keeps growing

2023-12-04 Thread Jira


[ 
https://issues.apache.org/jira/browse/SPARK-44900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17792872#comment-17792872
 ] 

王范明 commented on SPARK-44900:
-

I have analyzed the program execution details in the logs, and it seems that 
there is an issue with the 
{{org.apache.spark.status.AppStatusListener#updateRDDBlock}} method. The method 
directly calculates the usage of {{rdd.memoryUsed}} and {{rdd.diskUsed}}, but 
it does not pay sufficient attention to the {{storageLevel}}.

> Cached DataFrame keeps growing
> --
>
> Key: SPARK-44900
> URL: https://issues.apache.org/jira/browse/SPARK-44900
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Varun Nalla
>Priority: Blocker
>
> Scenario :
> We have a kafka streaming application where the data lookups are happening by 
> joining  another DF which is cached, and the caching strategy is 
> MEMORY_AND_DISK.
> However the size of the cached DataFrame keeps on growing for every micro 
> batch the streaming application process and that's being visible under 
> storage tab.
> A similar stack overflow thread was already raised.
> https://stackoverflow.com/questions/55601779/spark-dataframe-cache-keeps-growing
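
A minimal sketch of the reported setup (stream source, paths, and column names are invented here), for anyone trying to reproduce the growth in the Storage tab:

{code:java}
// Spark-shell style; assumes a SparkSession named `spark`. A static lookup
// DataFrame persisted with MEMORY_AND_DISK is joined against every micro-batch;
// per the report, its size under the Storage tab keeps growing.
import org.apache.spark.storage.StorageLevel

val lookupDf = spark.read.parquet("/data/lookup").persist(StorageLevel.MEMORY_AND_DISK)

val query = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "events")
  .load()
  .selectExpr("CAST(key AS STRING) AS key")
  .join(lookupDf, Seq("key")) // lookupDf is assumed to have a "key" column
  .writeStream
  .format("console")
  .start()
{code}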



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46245) Replace `s.c.MapOps.view.filterKeys` with `s.c.MapOps.filter`

2023-12-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46245:
---
Labels: pull-request-available  (was: )

> Replace `s.c.MapOps.view.filterKeys` with `s.c.MapOps.filter`
> --
>
> Key: SPARK-46245
> URL: https://issues.apache.org/jira/browse/SPARK-46245
> Project: Spark
>  Issue Type: Improvement
>  Components: k8s, Spark Core, SQL, YARN
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>
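
For reference, the shape of the cleanup (a sketch; the actual call sites are spread across the components listed above):

{code:java}
// Scala 2.13: view.filterKeys returns a lazy MapView that call sites typically
// force back into a Map right away; a strict filter does the same work in one pass.
val conf = Map("spark.a" -> "1", "other.b" -> "2")

val before = conf.view.filterKeys(_.startsWith("spark.")).toMap
val after  = conf.filter { case (k, _) => k.startsWith("spark.") }
// before == after
{code}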




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46245) Replace `s.c.MapOps.view.filterKeys` with `s.c.MapOps.filter`

2023-12-04 Thread Yang Jie (Jira)
Yang Jie created SPARK-46245:


 Summary: Replace `s.c.MapOps.view.filterKeys` with 
`s.c.MapOps.filter`
 Key: SPARK-46245
 URL: https://issues.apache.org/jira/browse/SPARK-46245
 Project: Spark
  Issue Type: Improvement
  Components: k8s, Spark Core, SQL, YARN
Affects Versions: 4.0.0
Reporter: Yang Jie






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46244) INSERT/UPDATE * in MERGE should follow the same semantic of INSERT BY NAME

2023-12-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46244:
---
Labels: pull-request-available  (was: )

> INSERT/UPDATE * in MERGE should follow the same semantic of INSERT BY NAME
> --
>
> Key: SPARK-46244
> URL: https://issues.apache.org/jira/browse/SPARK-46244
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.1
>Reporter: Wenchen Fan
>Priority: Major
>  Labels: pull-request-available
>
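
A sketch of the semantics in question (table and column names invented): with the proposed behavior, the star expansion in MERGE would match source columns to target columns by name, as `INSERT INTO target BY NAME SELECT ...` does, instead of relying on column position.

{code:java}
// The * in UPDATE SET * / INSERT * below would resolve source columns by name.
spark.sql("""
  MERGE INTO target t
  USING source s
  ON t.id = s.id
  WHEN MATCHED THEN UPDATE SET *
  WHEN NOT MATCHED THEN INSERT *
""")
{code}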




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46244) INSERT/UPDATE * in MERGE should follow the same semantic of INSERT BY NAME

2023-12-04 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-46244:
---

 Summary: INSERT/UPDATE * in MERGE should follow the same semantic 
of INSERT BY NAME
 Key: SPARK-46244
 URL: https://issues.apache.org/jira/browse/SPARK-46244
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.5.1
Reporter: Wenchen Fan






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46243) Describe arguments of decode()

2023-12-04 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-46243:
-
Description: Update the description of the `StringDecode` expression and, 
accordingly, the `decode()` function by describing the arguments `bin` and 
`charset`. The ticket aims to improve user experience with Spark SQL by 
documenting the public function.  (was: Update the description of the `Encode` 
expression and, accordingly, the `encode()` function by describing the 
arguments `str` and `charset`. The ticket aims to improve user experience with 
Spark SQL by documenting the public function.)

> Describe arguments of decode()
> --
>
> Key: SPARK-46243
> URL: https://issues.apache.org/jira/browse/SPARK-46243
> Project: Spark
>  Issue Type: Documentation
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>  Labels: pull-request-available
>
> Update the description of the `StringDecode` expression and, accordingly, the 
> `decode()` function by describing the arguments `bin` and `charset`. The 
> ticket aims to improve user experience with Spark SQL by documenting the 
> public function.
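
For reference, a round-trip example of the function being documented:

{code:java}
// decode(bin, charset) converts binary to a string in the given charset;
// encode(str, charset) is its inverse, so this round-trip returns "abc".
spark.sql("SELECT decode(encode('abc', 'utf-8'), 'utf-8')").show()
{code}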



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46243) Describe arguments of decode()

2023-12-04 Thread Max Gekk (Jira)
Max Gekk created SPARK-46243:


 Summary: Describe arguments of decode()
 Key: SPARK-46243
 URL: https://issues.apache.org/jira/browse/SPARK-46243
 Project: Spark
  Issue Type: Documentation
  Components: SQL
Affects Versions: 4.0.0
Reporter: Max Gekk
Assignee: Max Gekk
 Fix For: 4.0.0


Update the description of the `Encode` expression and, accordingly, the 
`encode()` function by describing the arguments `str` and `charset`. The ticket 
aims to improve user experience with Spark SQL by documenting the public 
function.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46243) Describe arguments of decode()

2023-12-04 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-46243:
-
Fix Version/s: (was: 4.0.0)

> Describe arguments of decode()
> --
>
> Key: SPARK-46243
> URL: https://issues.apache.org/jira/browse/SPARK-46243
> Project: Spark
>  Issue Type: Documentation
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>  Labels: pull-request-available
>
> Update the description of the `Encode` expression and, accordingly, the 
> `encode()` function by describing the arguments `str` and `charset`. The 
> ticket aims to improve user experience with Spark SQL by documenting the 
> public function.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46234) Introduce PySparkKeyError for PySpark error framework

2023-12-04 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-46234.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44151
[https://github.com/apache/spark/pull/44151]

> Introduce PySparkKeyError for PySpark error framework
> -
>
> Key: SPARK-46234
> URL: https://issues.apache.org/jira/browse/SPARK-46234
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46234) Introduce PySparkKeyError for PySpark error framework

2023-12-04 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-46234:
-

Assignee: Haejoon Lee

> Introduce PySparkKeyError for PySpark error framework
> -
>
> Key: SPARK-46234
> URL: https://issues.apache.org/jira/browse/SPARK-46234
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46242) Enable all Kinesis tests in Github Actions by default

2023-12-04 Thread junyuc25 (Jira)
junyuc25 created SPARK-46242:


 Summary: Enable all Kinesis tests in Github Actions by default
 Key: SPARK-46242
 URL: https://issues.apache.org/jira/browse/SPARK-46242
 Project: Spark
  Issue Type: Improvement
  Components: Project Infra
Affects Versions: 3.5.0
Reporter: junyuc25


This ticket is created as per the discussion in this PR: 
[https://github.com/apache/spark/pull/43736#issuecomment-1833368339]. Some 
Kinesis tests require interaction with the Amazon Kinesis service, which would 
incur billing costs for users, so they are not enabled by default. Further 
investigation is needed to figure out a way to run all Kinesis tests in GitHub 
Actions.
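
The gating itself is the usual opt-in pattern for tests against paid external services; a sketch (the flag name below is an assumption, not read from kinesis-asl):

{code:java}
// Tests that hit the real Amazon Kinesis service run only when explicitly
// enabled, so default CI and local builds don't incur AWS charges.
import org.scalatest.funsuite.AnyFunSuite

class KinesisGatedSuite extends AnyFunSuite {
  private val enabled = sys.env.get("ENABLE_KINESIS_TESTS").contains("1")

  protected def testIfEnabled(name: String)(body: => Unit): Unit =
    if (enabled) test(name)(body) else ignore(name)(body)

  testIfEnabled("reads records from a Kinesis stream") {
    // would talk to the real service here
  }
}
{code}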



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46241) Fix error handling routine so it wouldn't fall into infinite recursion

2023-12-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46241:
---
Labels: pull-request-available  (was: )

> Fix error handling routine so it wouldn't fall into infinite recursion
> --
>
> Key: SPARK-46241
> URL: https://issues.apache.org/jira/browse/SPARK-46241
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Alice Sayutina
>Priority: Major
>  Labels: pull-request-available
>
> Currently, we can fall into infinite recursion as follows:
> {quote}[Some error happens] -> _handle_error -> _handle_rpc_error -> 
> _display_server_stack_trace -> RuntimeConf.get -> SparkConnectClient.config 
> -> [An error happens] -> _handle_error.{quote}
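
The patch targets the Python client, but the general shape of such a fix is a re-entrancy guard; a language-agnostic sketch in Scala (names invented, not the actual change):

{code:java}
// If the handler is re-entered while already handling an error (e.g. because
// fetching the server-side stack trace itself failed), rethrow instead of
// recursing into another round of error handling.
final class ErrorHandler(displayServerStackTrace: Throwable => Unit) {
  private var handling = false

  def handleError(e: Throwable): Nothing = {
    if (handling) throw e // break the cycle
    handling = true
    try displayServerStackTrace(e) // may fail and call handleError again
    finally handling = false
    throw e
  }
}
{code}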



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46186) Invalid Spark Connect execution state transition if interrupted before thread started

2023-12-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46186.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44095
[https://github.com/apache/spark/pull/44095]

> Invalid Spark Connect execution state transition if interrupted before thread 
> started
> -
>
> Key: SPARK-46186
> URL: https://issues.apache.org/jira/browse/SPARK-46186
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Juliusz Sompolski
>Assignee: Juliusz Sompolski
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Fix an edge case where interrupting execution before the ExecuteThreadRunner 
> has started could lead to an illegal state transition.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46241) Fix error handling routine so it wouldn't fall into infinite recursion

2023-12-04 Thread Alice Sayutina (Jira)
Alice Sayutina created SPARK-46241:
--

 Summary: Fix error handling routine so it wouldn't fall into 
infinite recursion
 Key: SPARK-46241
 URL: https://issues.apache.org/jira/browse/SPARK-46241
 Project: Spark
  Issue Type: Bug
  Components: Connect
Affects Versions: 4.0.0
Reporter: Alice Sayutina


Currently, we can fall into infinite recursion as follows:

{quote}[Some error happens] -> _handle_error -> _handle_rpc_error -> 
_display_server_stack_trace -> RuntimeConf.get -> SparkConnectClient.config -> 
[An error happens] -> _handle_error.{quote}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46240) Add PrepExecutedPlanRule to SparkSessionExtensions

2023-12-04 Thread jiang13021 (Jira)
jiang13021 created SPARK-46240:
--

 Summary: Add PrepExecutedPlanRule to SparkSessionExtensions
 Key: SPARK-46240
 URL: https://issues.apache.org/jira/browse/SPARK-46240
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0, 3.3.0, 3.2.0
Reporter: jiang13021


Some rules (Rule[SparkPlan]) are applied when preparing the executedPlan. 
However, users do not have the ability to add rules in this context.
{code:java}
// org.apache.spark.sql.execution.QueryExecution#preparations
private[execution] def preparations(
    sparkSession: SparkSession,
    adaptiveExecutionRule: Option[InsertAdaptiveSparkPlan] = None,
    subquery: Boolean): Seq[Rule[SparkPlan]] = {
  // `AdaptiveSparkPlanExec` is a leaf node. If inserted, all the following rules will be no-op
  // as the original plan is hidden behind `AdaptiveSparkPlanExec`.
  adaptiveExecutionRule.toSeq ++
  Seq(
    CoalesceBucketsInJoin,
    PlanDynamicPruningFilters(sparkSession),
    PlanSubqueries(sparkSession),
    RemoveRedundantProjects,
    EnsureRequirements(),
    // `ReplaceHashWithSortAgg` needs to be added after `EnsureRequirements` to guarantee the
    // sort order of each node is checked to be valid.
    ReplaceHashWithSortAgg,
    // `RemoveRedundantSorts` needs to be added after `EnsureRequirements` to guarantee the same
    // number of partitions when instantiating PartitioningCollection.
    RemoveRedundantSorts,
    DisableUnnecessaryBucketedScan,
    ApplyColumnarRulesAndInsertTransitions(
      sparkSession.sessionState.columnarRules, outputsColumnar = false),
    CollapseCodegenStages()) ++
  (if (subquery) {
    Nil
  } else {
    Seq(ReuseExchangeAndSubquery)
  })
}{code}
We could add an extension called "PrepExecutedPlanRule" to 
SparkSessionExtensions, which would allow users to add their own rules, as 
sketched below.
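
A hypothetical sketch of the proposed extension point (neither {{injectPrepExecutedPlanRule}} nor the rule below exists in Spark today; the shape mirrors existing injectors such as {{injectQueryStagePrepRule}}):

{code:java}
// Proposed usage sketch: register a user Rule[SparkPlan] that runs alongside
// the executed-plan preparation rules shown above.
import org.apache.spark.sql.{SparkSession, SparkSessionExtensions}
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.execution.SparkPlan

case class MyExecutedPlanRule(session: SparkSession) extends Rule[SparkPlan] {
  override def apply(plan: SparkPlan): SparkPlan = plan // inspect or rewrite here
}

class MyExtensions extends (SparkSessionExtensions => Unit) {
  override def apply(extensions: SparkSessionExtensions): Unit = {
    extensions.injectPrepExecutedPlanRule(MyExecutedPlanRule) // hypothetical API
  }
}
{code}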



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46240) Add PrepExecutedPlanRule to SparkSessionExtensions

2023-12-04 Thread jiang13021 (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiang13021 updated SPARK-46240:
---
Description: 
Some rules (Rule[SparkPlan]) are applied when preparing the executedPlan. 
However, users do not have the ability to add rules in this context.
{code:java}
// org.apache.spark.sql.execution.QueryExecution#preparations
private[execution] def preparations(
    sparkSession: SparkSession,
    adaptiveExecutionRule: Option[InsertAdaptiveSparkPlan] = None,
    subquery: Boolean): Seq[Rule[SparkPlan]] = {
  // `AdaptiveSparkPlanExec` is a leaf node. If inserted, all the following rules will be no-op
  // as the original plan is hidden behind `AdaptiveSparkPlanExec`.
  adaptiveExecutionRule.toSeq ++
  Seq(
    CoalesceBucketsInJoin,
    PlanDynamicPruningFilters(sparkSession),
    PlanSubqueries(sparkSession),
    RemoveRedundantProjects,
    EnsureRequirements(),
    // `ReplaceHashWithSortAgg` needs to be added after `EnsureRequirements` to guarantee the
    // sort order of each node is checked to be valid.
    ReplaceHashWithSortAgg,
    // `RemoveRedundantSorts` needs to be added after `EnsureRequirements` to guarantee the same
    // number of partitions when instantiating PartitioningCollection.
    RemoveRedundantSorts,
    DisableUnnecessaryBucketedScan,
    ApplyColumnarRulesAndInsertTransitions(
      sparkSession.sessionState.columnarRules, outputsColumnar = false),
    CollapseCodegenStages()) ++
  (if (subquery) {
    Nil
  } else {
    Seq(ReuseExchangeAndSubquery)
  })
}{code}
We could add an extension called "PrepExecutedPlanRule" to 
SparkSessionExtensions, which would allow users to add their own rules.

  was:
Some rules (Rule[SparkPlan]) are applied when preparing the executedPlan. 
However, users do not have the ability to add rules in this context.
{code:java}
// org.apache.spark.sql.execution.QueryExecution#preparations
private[execution] def preparations(
    sparkSession: SparkSession,
    adaptiveExecutionRule: Option[InsertAdaptiveSparkPlan] = None,
    subquery: Boolean): Seq[Rule[SparkPlan]] = {
  // `AdaptiveSparkPlanExec` is a leaf node. If inserted, all the following rules will be no-op
  // as the original plan is hidden behind `AdaptiveSparkPlanExec`.
  adaptiveExecutionRule.toSeq ++
  Seq(
    CoalesceBucketsInJoin,
    PlanDynamicPruningFilters(sparkSession),
    PlanSubqueries(sparkSession),
    RemoveRedundantProjects,
    EnsureRequirements(),
    // `ReplaceHashWithSortAgg` needs to be added after `EnsureRequirements` to guarantee the
    // sort order of each node is checked to be valid.
    ReplaceHashWithSortAgg,
    // `RemoveRedundantSorts` needs to be added after `EnsureRequirements` to guarantee the same
    // number of partitions when instantiating PartitioningCollection.
    RemoveRedundantSorts,
    DisableUnnecessaryBucketedScan,
    ApplyColumnarRulesAndInsertTransitions(
      sparkSession.sessionState.columnarRules, outputsColumnar = false),
    CollapseCodegenStages()) ++
  (if (subquery) {
    Nil
  } else {
    Seq(ReuseExchangeAndSubquery)
  })
}{code}
We could add an extension called "PrepExecutedPlanRule" to 
SparkSessionExtensions, which would allow users to add their own rules.


> Add PrepExecutedPlanRule to SparkSessionExtensions
> --
>
> Key: SPARK-46240
> URL: https://issues.apache.org/jira/browse/SPARK-46240
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0, 3.3.0, 3.4.0
>Reporter: jiang13021
>Priority: Major
>
> Some rules (Rule[SparkPlan]) are applied when preparing the executedPlan. 
> However, users do not have the ability to add rules in this context.
> {code:java}
> // org.apache.spark.sql.execution.QueryExecution#preparations
> private[execution] def preparations(
>     sparkSession: SparkSession,
>     adaptiveExecutionRule: Option[InsertAdaptiveSparkPlan] = None,
>     subquery: Boolean): Seq[Rule[SparkPlan]] = {
>   // `AdaptiveSparkPlanExec` is a leaf node. If inserted, all the following rules will be no-op
>   // as the original plan is hidden behind `AdaptiveSparkPlanExec`.
>   adaptiveExecutionRule.toSeq ++
>   Seq(
>     CoalesceBucketsInJoin,
>     PlanDynamicPruningFilters(sparkSession),
>     PlanSubqueries(sparkSession),
>     RemoveRedundantProjects,
>     EnsureRequirements(),
>     // `ReplaceHashWithSortAgg` needs to be added after `EnsureRequirements` to guarantee the
>     // sort order of each node is checked to be valid.
>     ReplaceHashWithSortAgg,
>     // `RemoveRedundantSorts` needs to be added after `EnsureRequirements` to guarantee the same
>     // number of partitions when instantiating PartitioningCollection.
>     RemoveRedundantSorts,
>     DisableUnnecessaryBucketedScan,
> 

[jira] [Updated] (SPARK-46239) Spark jetty exposes version information

2023-12-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46239:
---
Labels: pull-request-available  (was: )

> Spark jetty exposes version information
> ---
>
> Key: SPARK-46239
> URL: https://issues.apache.org/jira/browse/SPARK-46239
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: chenyu
>Priority: Major
>  Labels: pull-request-available
> Attachments: screenshot-1.png
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-46239) Spark jetty exposes version information

2023-12-04 Thread chenyu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17792746#comment-17792746
 ] 

chenyu commented on SPARK-46239:


It is unsafe to expose version information.

An attacker can obtain the remote WWW service's version information through 
plain HTTP requests.
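
A common mitigation, sketched below; whether the eventual fix takes this route is an assumption:

{code:java}
// Disable Jetty's Server response header so HTTP responses stop advertising
// the Jetty version to remote clients.
import org.eclipse.jetty.server.{HttpConfiguration, HttpConnectionFactory, Server, ServerConnector}

val server = new Server()
val httpConfig = new HttpConfiguration()
httpConfig.setSendServerVersion(false) // drops "Server: Jetty(9.x)" from responses
server.addConnector(new ServerConnector(server, new HttpConnectionFactory(httpConfig)))
{code}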

> Spark jetty exposes version information
> ---
>
> Key: SPARK-46239
> URL: https://issues.apache.org/jira/browse/SPARK-46239
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: chenyu
>Priority: Major
> Attachments: screenshot-1.png
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


