[jira] [Updated] (SPARK-46263) Clean up unnecessary `SeqOps.view` and `ArrayOps.view` conversions.
ASF GitHub Bot updated SPARK-46263:
    Labels: pull-request-available  (was: )

> Clean up unnecessary `SeqOps.view` and `ArrayOps.view` conversions.
> Key: SPARK-46263
> URL: https://issues.apache.org/jira/browse/SPARK-46263
> Project: Spark
> Issue Type: Improvement
> Components: MLlib, Spark Core, SQL
> Affects Versions: 4.0.0
> Reporter: Yang Jie
> Priority: Major
> Labels: pull-request-available
[jira] [Created] (SPARK-46263) Clean up unnecessary `SeqOps.view` and `ArrayOps.view` conversions.
Yang Jie created SPARK-46263:
    Summary: Clean up unnecessary `SeqOps.view` and `ArrayOps.view` conversions.
    Key: SPARK-46263
    URL: https://issues.apache.org/jira/browse/SPARK-46263
    Project: Spark
    Issue Type: Improvement
    Components: MLlib, Spark Core, SQL
    Affects Versions: 4.0.0
    Reporter: Yang Jie
[jira] [Updated] (SPARK-46262) Support `np.left_shift` for Pandas-on-Spark object.
ASF GitHub Bot updated SPARK-46262:
    Labels: pull-request-available  (was: )

> Support `np.left_shift` for Pandas-on-Spark object.
> Key: SPARK-46262
> URL: https://issues.apache.org/jira/browse/SPARK-46262
> Project: Spark
> Issue Type: Bug
> Components: Pandas API on Spark
> Affects Versions: 4.0.0
> Reporter: Haejoon Lee
> Priority: Major
> Labels: pull-request-available
>
> Now that we support PyArrow >= 4.0.0, we can enable the test for `np.left_shift`.
[jira] [Created] (SPARK-46262) Support `np.left_shift` for Pandas-on-Spark object.
Haejoon Lee created SPARK-46262:
    Summary: Support `np.left_shift` for Pandas-on-Spark object.
    Key: SPARK-46262
    URL: https://issues.apache.org/jira/browse/SPARK-46262
    Project: Spark
    Issue Type: Bug
    Components: Pandas API on Spark
    Affects Versions: 4.0.0
    Reporter: Haejoon Lee

Now that we support PyArrow >= 4.0.0, we can enable the test for `np.left_shift`.
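For reference, a minimal sketch of what the enabled behaviour would look like, assuming NumPy's ufunc dispatch on pandas-on-Spark Series covers `left_shift` once this lands:

{code:python}
import numpy as np
import pyspark.pandas as ps

# pandas-on-Spark Series hook into NumPy's ufunc dispatch, so calling a
# NumPy ufunc on a Series runs distributed on Spark.
psser = ps.Series([1, 2, 3])
shifted = np.left_shift(psser, 1)  # shift each value left by one bit
print(shifted.to_pandas().tolist())  # [2, 4, 6]
{code}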
[jira] [Resolved] (SPARK-46249) Fix state store metrics access after commit
Jungtaek Lim resolved SPARK-46249.
    Fix Version/s: 4.0.0
    Resolution: Fixed

Issue resolved by pull request 44165
https://github.com/apache/spark/pull/44165

> Fix state store metrics access after commit
> Key: SPARK-46249
> URL: https://issues.apache.org/jira/browse/SPARK-46249
> Project: Spark
> Issue Type: Task
> Components: Structured Streaming
> Affects Versions: 4.0.0
> Reporter: Anish Shrigondekar
> Assignee: Anish Shrigondekar
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Fix state store metrics access after commit
[jira] [Assigned] (SPARK-46249) Fix state store metrics access after commit
Jungtaek Lim reassigned SPARK-46249:
    Assignee: Anish Shrigondekar

> Fix state store metrics access after commit
> Key: SPARK-46249
> URL: https://issues.apache.org/jira/browse/SPARK-46249
> Project: Spark
> Issue Type: Task
> Components: Structured Streaming
> Affects Versions: 4.0.0
> Reporter: Anish Shrigondekar
> Assignee: Anish Shrigondekar
> Priority: Major
> Labels: pull-request-available
[jira] [Updated] (SPARK-46260) DataFrame.withColumnsRenamed should respect the dict ordering
ASF GitHub Bot updated SPARK-46260:
    Labels: pull-request-available  (was: )

> DataFrame.withColumnsRenamed should respect the dict ordering
> Key: SPARK-46260
> URL: https://issues.apache.org/jira/browse/SPARK-46260
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Affects Versions: 4.0.0
> Reporter: Ruifeng Zheng
> Priority: Major
> Labels: pull-request-available
[jira] [Updated] (SPARK-46257) Upgrade Derby to 10.16.1.1
Dongjoon Hyun updated SPARK-46257:
    Parent: SPARK-44111
    Issue Type: Sub-task  (was: Improvement)

> Upgrade Derby to 10.16.1.1
> Key: SPARK-46257
> URL: https://issues.apache.org/jira/browse/SPARK-46257
> Project: Spark
> Issue Type: Sub-task
> Components: Build
> Affects Versions: 4.0.0
> Reporter: Yang Jie
> Priority: Major
> Labels: pull-request-available
>
> https://db.apache.org/derby/releases/release-10_16_1_1.cgi
[jira] [Created] (SPARK-46261) Python Client DataFrame.withColumnsRenamed should respect the dict ordering
Ruifeng Zheng created SPARK-46261:
    Summary: Python Client DataFrame.withColumnsRenamed should respect the dict ordering
    Key: SPARK-46261
    URL: https://issues.apache.org/jira/browse/SPARK-46261
    Project: Spark
    Issue Type: Improvement
    Components: Connect, PySpark
    Affects Versions: 4.0.0
    Reporter: Ruifeng Zheng
[jira] [Created] (SPARK-46260) DataFrame.withColumnsRenamed should respect the dict ordering
Ruifeng Zheng created SPARK-46260:
    Summary: DataFrame.withColumnsRenamed should respect the dict ordering
    Key: SPARK-46260
    URL: https://issues.apache.org/jira/browse/SPARK-46260
    Project: Spark
    Issue Type: Improvement
    Components: PySpark
    Affects Versions: 4.0.0
    Reporter: Ruifeng Zheng
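To illustrate the expected behaviour (a sketch; the column names are made up): renames should be applied in the dict's insertion order, which matters when one rename's target collides with a later rename's source.

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, 2)], ["a", "b"])

# With insertion order respected, "a" becomes "c" first, and only then
# does the original "b" become "a"; a different order would change the
# result or raise a conflict.
renamed = df.withColumnsRenamed({"a": "c", "b": "a"})
print(renamed.columns)  # expected per this ticket: ['c', 'a']
{code}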
[jira] [Assigned] (SPARK-46258) Add RocksDBPersistenceEngine
Dongjoon Hyun reassigned SPARK-46258:
    Assignee: Dongjoon Hyun

> Add RocksDBPersistenceEngine
> Key: SPARK-46258
> URL: https://issues.apache.org/jira/browse/SPARK-46258
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core
> Affects Versions: 4.0.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Major
> Labels: pull-request-available
[jira] [Updated] (SPARK-46001) Spark UI Test Improvements
Kent Yao updated SPARK-46001:
    Shepherd: Dongjoon Hyun

> Spark UI Test Improvements
> Key: SPARK-46001
> URL: https://issues.apache.org/jira/browse/SPARK-46001
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core, SQL, Tests, UI
> Affects Versions: 4.0.0
> Reporter: Kent Yao
> Priority: Major
>
> Spark UI tests are not supported, which makes the UI hard for developers to test and hard for its owners to maintain.
[jira] [Updated] (SPARK-46259) Add appropriate link for error class usage documentation.
ASF GitHub Bot updated SPARK-46259:
    Labels: pull-request-available  (was: )

> Add appropriate link for error class usage documentation.
> Key: SPARK-46259
> URL: https://issues.apache.org/jira/browse/SPARK-46259
> Project: Spark
> Issue Type: Bug
> Components: Documentation, PySpark
> Affects Versions: 4.0.0
> Reporter: Haejoon Lee
> Priority: Major
> Labels: pull-request-available
>
> We don't have an appropriate link for the error class usage documentation.
[jira] [Updated] (SPARK-46257) Upgrade Derby to 10.16.1.1
ASF GitHub Bot updated SPARK-46257:
    Labels: pull-request-available  (was: )

> Upgrade Derby to 10.16.1.1
> Key: SPARK-46257
> URL: https://issues.apache.org/jira/browse/SPARK-46257
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 4.0.0
> Reporter: Yang Jie
> Priority: Major
> Labels: pull-request-available
>
> https://db.apache.org/derby/releases/release-10_16_1_1.cgi
[jira] [Created] (SPARK-46259) Add appropriate link for error class usage documentation.
Haejoon Lee created SPARK-46259:
    Summary: Add appropriate link for error class usage documentation.
    Key: SPARK-46259
    URL: https://issues.apache.org/jira/browse/SPARK-46259
    Project: Spark
    Issue Type: Bug
    Components: Documentation, PySpark
    Affects Versions: 4.0.0
    Reporter: Haejoon Lee

We don't have an appropriate link for the error class usage documentation.
[jira] [Updated] (SPARK-46258) Add RocksDBPersistenceEngine
ASF GitHub Bot updated SPARK-46258:
    Labels: pull-request-available  (was: )

> Add RocksDBPersistenceEngine
> Key: SPARK-46258
> URL: https://issues.apache.org/jira/browse/SPARK-46258
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core
> Affects Versions: 4.0.0
> Reporter: Dongjoon Hyun
> Priority: Major
> Labels: pull-request-available
[jira] [Created] (SPARK-46258) Add RocksDBPersistenceEngine
Dongjoon Hyun created SPARK-46258:
    Summary: Add RocksDBPersistenceEngine
    Key: SPARK-46258
    URL: https://issues.apache.org/jira/browse/SPARK-46258
    Project: Spark
    Issue Type: Sub-task
    Components: Spark Core
    Affects Versions: 4.0.0
    Reporter: Dongjoon Hyun
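For context, standalone Master recovery engines are selected through the deploy recovery configs. A sketch of how the new engine would presumably be enabled; `ROCKSDB` as a mode value is an assumption by analogy with the existing `FILESYSTEM`/`ZOOKEEPER` modes, and in practice these properties are set on the standalone Master rather than in application code:

{code:python}
from pyspark import SparkConf

# spark.deploy.recoveryMode and spark.deploy.recoveryDirectory are existing
# Master recovery configs; "ROCKSDB" is assumed to be the value this
# sub-task adds, persisting recovery state in a RocksDB instance.
conf = (
    SparkConf()
    .set("spark.deploy.recoveryMode", "ROCKSDB")
    .set("spark.deploy.recoveryDirectory", "/var/lib/spark/recovery")
)
{code}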
[jira] [Updated] (SPARK-46257) Upgrade Derby to 10.16.1.1
Yang Jie updated SPARK-46257:
    Summary: Upgrade Derby to 10.16.1.1  (was: Upgrade Derby to 10.17.1.0)

> Upgrade Derby to 10.16.1.1
> Key: SPARK-46257
> URL: https://issues.apache.org/jira/browse/SPARK-46257
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 4.0.0
> Reporter: Yang Jie
> Priority: Major
>
> https://db.apache.org/derby/releases/release-10_17_1_0.cgi#New+Features
[jira] [Updated] (SPARK-46256) Parallel Compression Support for ZSTD
ASF GitHub Bot updated SPARK-46256:
    Labels: pull-request-available  (was: )

> Parallel Compression Support for ZSTD
> Key: SPARK-46256
> URL: https://issues.apache.org/jira/browse/SPARK-46256
> Project: Spark
> Issue Type: Improvement
> Components: Shuffle, Spark Core
> Affects Versions: 4.0.0
> Reporter: Kent Yao
> Priority: Major
> Labels: pull-request-available
[jira] [Created] (SPARK-46257) Upgrade Derby to 10.17.1.0
Yang Jie created SPARK-46257:
    Summary: Upgrade Derby to 10.17.1.0
    Key: SPARK-46257
    URL: https://issues.apache.org/jira/browse/SPARK-46257
    Project: Spark
    Issue Type: Improvement
    Components: Build
    Affects Versions: 4.0.0
    Reporter: Yang Jie

https://db.apache.org/derby/releases/release-10_17_1_0.cgi#New+Features
[jira] [Created] (SPARK-46256) Parallel Compression Support for ZSTD
Kent Yao created SPARK-46256:
    Summary: Parallel Compression Support for ZSTD
    Key: SPARK-46256
    URL: https://issues.apache.org/jira/browse/SPARK-46256
    Project: Spark
    Issue Type: Improvement
    Components: Shuffle, Spark Core
    Affects Versions: 4.0.0
    Reporter: Kent Yao
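The ticket has no description; presumably it exposes zstd-jni's multi-threaded compression workers for Spark's IO/shuffle compression. A sketch, where the `workers` knob name is an assumption defined by the linked PR (the codec and level configs already exist):

{code:python}
from pyspark import SparkConf

conf = (
    SparkConf()
    .set("spark.io.compression.codec", "zstd")    # existing config
    .set("spark.io.compression.zstd.level", "3")  # existing config
    # Assumed new knob: number of zstd worker threads used per compressing
    # stream (0 would mean single-threaded compression, as today).
    .set("spark.io.compression.zstd.workers", "2")
)
{code}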
[jira] [Resolved] (SPARK-46254) Remove stale Python 3.8/3.7 version checks
Hyukjin Kwon resolved SPARK-46254.
    Fix Version/s: 4.0.0
    Resolution: Fixed

Issue resolved by pull request 44169
https://github.com/apache/spark/pull/44169

> Remove stale Python 3.8/3.7 version checks
> Key: SPARK-46254
> URL: https://issues.apache.org/jira/browse/SPARK-46254
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> See the linked PR. We dropped Python 3.7, and the lowest supported version is now Python 3.8, so we can remove all the stale checks.
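For illustration, the kind of guard that becomes dead code once the floor is Python 3.8 (a representative pattern, not a specific excerpt from the PR):

{code:python}
import sys

# With Python 3.7 support dropped, version guards like this one can be
# deleted outright, along with any 3.7-only fallback branches they protect.
if sys.version_info < (3, 8):
    raise RuntimeError("PySpark requires Python 3.8 or newer")
{code}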
[jira] [Created] (SPARK-46255) Support complex type -> string conversion
Ruifeng Zheng created SPARK-46255:
    Summary: Support complex type -> string conversion
    Key: SPARK-46255
    URL: https://issues.apache.org/jira/browse/SPARK-46255
    Project: Spark
    Issue Type: Improvement
    Components: Connect, PySpark
    Affects Versions: 4.0.0
    Reporter: Ruifeng Zheng
[jira] [Updated] (SPARK-46253) Plan Python data source read using mapInArrow
ASF GitHub Bot updated SPARK-46253:
    Labels: pull-request-available  (was: )

> Plan Python data source read using mapInArrow
> Key: SPARK-46253
> URL: https://issues.apache.org/jira/browse/SPARK-46253
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark
> Affects Versions: 4.0.0
> Reporter: Allison Wang
> Priority: Major
> Labels: pull-request-available
>
> Instead of using a regular Python UDTF, we can use an Arrow UDF and plan the data source read using the mapInArrow operator.
[jira] [Assigned] (SPARK-46043) Support create table using DSv2 sources
Wenchen Fan reassigned SPARK-46043:
    Assignee: Allison Wang

> Support create table using DSv2 sources
> Key: SPARK-46043
> URL: https://issues.apache.org/jira/browse/SPARK-46043
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Allison Wang
> Assignee: Allison Wang
> Priority: Major
> Labels: pull-request-available
>
> Support CREATE TABLE ... USING DSv2 sources.
[jira] [Resolved] (SPARK-46043) Support create table using DSv2 sources
Wenchen Fan resolved SPARK-46043.
    Fix Version/s: 4.0.0
    Resolution: Fixed

Issue resolved by pull request 43949
https://github.com/apache/spark/pull/43949

> Support create table using DSv2 sources
> Key: SPARK-46043
> URL: https://issues.apache.org/jira/browse/SPARK-46043
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Allison Wang
> Assignee: Allison Wang
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Support CREATE TABLE ... USING DSv2 sources.
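For illustration, what this enables at the SQL surface; `my_source` is a placeholder for any short name registered by a DSv2 data source (hypothetical name, not from the ticket):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# CREATE TABLE ... USING <provider> now resolves DSv2 sources as well;
# "my_source" stands in for a registered DSv2 short name.
spark.sql("CREATE TABLE points (x INT, y INT) USING my_source")
{code}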
[jira] [Reopened] (SPARK-46213) Add PySparkImportError for error framework
Hyukjin Kwon reopened SPARK-46213:
    Assignee: (was: Haejoon Lee)

Reverted at https://github.com/apache/spark/commit/7f59565b9fc19c496bc7600e168650e7663c0065

> Add PySparkImportError for error framework
> Key: SPARK-46213
> URL: https://issues.apache.org/jira/browse/SPARK-46213
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark
> Affects Versions: 4.0.0
> Reporter: Haejoon Lee
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Add PySparkImportError to the error framework, for wrapping ImportError.
[jira] [Updated] (SPARK-46213) Add PySparkImportError for error framework
Hyukjin Kwon updated SPARK-46213:
    Fix Version/s: (was: 4.0.0)

> Add PySparkImportError for error framework
> Key: SPARK-46213
> URL: https://issues.apache.org/jira/browse/SPARK-46213
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark
> Affects Versions: 4.0.0
> Reporter: Haejoon Lee
> Priority: Major
> Labels: pull-request-available
>
> Add PySparkImportError to the error framework, for wrapping ImportError.
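For context, a sketch of the intended usage before the revert; the error class name and message parameters here are illustrative, and the final shape depends on how the PR re-lands:

{code:python}
from pyspark.errors import PySparkImportError

# Illustrative: wrap a failed optional-dependency import in the PySpark
# error framework instead of letting a bare ImportError escape.
try:
    import pandas  # noqa: F401
except ImportError as e:
    raise PySparkImportError(
        error_class="PACKAGE_NOT_INSTALLED",
        message_parameters={"package_name": "pandas", "minimum_version": "1.0.5"},
    ) from e
{code}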
[jira] [Resolved] (SPARK-46009) Merge the parse rule of PercentileCont and PercentileDisc into functionCall
Jiaan Geng resolved SPARK-46009.
    Fix Version/s: 4.0.0
    Resolution: Fixed

Issue resolved by pull request 43910
https://github.com/apache/spark/pull/43910

> Merge the parse rule of PercentileCont and PercentileDisc into functionCall
> Key: SPARK-46009
> URL: https://issues.apache.org/jira/browse/SPARK-46009
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Jiaan Geng
> Assignee: Jiaan Geng
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> The Spark SQL parser has a special rule to parse [percentile_cont|percentile_disc](percentage) WITHIN GROUP (ORDER BY v). We should merge this rule into the functionCall rule.
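The special-cased syntax in question, for reference (the continuous median of 1.0, 2.0, 10.0 is 2.0); merging the rule into functionCall should leave the behaviour unchanged:

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# WITHIN GROUP ordering applies only to percentile_cont/percentile_disc;
# the parser previously handled it with a dedicated grammar rule.
spark.sql("""
    SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY v) AS median
    FROM VALUES (1.0), (2.0), (10.0) AS t(v)
""").show()
{code}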
[jira] [Updated] (SPARK-46254) Remove stale Python 3.8/3.7 version checks
ASF GitHub Bot updated SPARK-46254:
    Labels: pull-request-available  (was: )

> Remove stale Python 3.8/3.7 version checks
> Key: SPARK-46254
> URL: https://issues.apache.org/jira/browse/SPARK-46254
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
>
> See the linked PR. We dropped Python 3.7, and the lowest supported version is now Python 3.8, so we can remove all the stale checks.
[jira] [Updated] (SPARK-46254) Remove stale Python 3.8/3.7 version checks
Hyukjin Kwon updated SPARK-46254:
    Issue Type: Improvement  (was: New Feature)

> Remove stale Python 3.8/3.7 version checks
> Key: SPARK-46254
> URL: https://issues.apache.org/jira/browse/SPARK-46254
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Priority: Major
>
> See the linked PR.
[jira] [Updated] (SPARK-46254) Remove stale Python 3.8/3.7 version checks
Hyukjin Kwon updated SPARK-46254:
    Description: See the linked PR. We dropped Python 3.7, and the lowest supported version is now Python 3.8, so we can remove all the stale checks.  (was: See the linked PR.)

> Remove stale Python 3.8/3.7 version checks
> Key: SPARK-46254
> URL: https://issues.apache.org/jira/browse/SPARK-46254
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Priority: Major
>
> See the linked PR. We dropped Python 3.7, and the lowest supported version is now Python 3.8, so we can remove all the stale checks.
[jira] [Created] (SPARK-46254) Remove stale Python 3.8/3.7 version checks
Hyukjin Kwon created SPARK-46254:
    Summary: Remove stale Python 3.8/3.7 version checks
    Key: SPARK-46254
    URL: https://issues.apache.org/jira/browse/SPARK-46254
    Project: Spark
    Issue Type: New Feature
    Components: PySpark
    Affects Versions: 4.0.0
    Reporter: Hyukjin Kwon

See the linked PR.
[jira] [Created] (SPARK-46253) Plan Python data source read using mapInArrow
Allison Wang created SPARK-46253:
    Summary: Plan Python data source read using mapInArrow
    Key: SPARK-46253
    URL: https://issues.apache.org/jira/browse/SPARK-46253
    Project: Spark
    Issue Type: Sub-task
    Components: PySpark
    Affects Versions: 4.0.0
    Reporter: Allison Wang

Instead of using a regular Python UDTF, we can use an Arrow UDF and plan the data source read using the mapInArrow operator.
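For context, `mapInArrow` streams `pyarrow.RecordBatch` objects through a Python function. A minimal sketch of the operator the read would be planned on; the identity transform here is illustrative, not the actual planner code:

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(3)

def passthrough(batches):
    # Receives and yields pyarrow.RecordBatch objects; a data source read
    # would instead yield batches produced by the Python reader here.
    for batch in batches:
        yield batch

df.mapInArrow(passthrough, schema="id long").show()
{code}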
[jira] [Resolved] (SPARK-46040) Update API for 'analyze' partitioning/ordering columns to support general expressions
Takuya Ueshin resolved SPARK-46040.
    Fix Version/s: 4.0.0
    Resolution: Fixed

Issue resolved by pull request 43946
https://github.com/apache/spark/pull/43946

> Update API for 'analyze' partitioning/ordering columns to support general expressions
> Key: SPARK-46040
> URL: https://issues.apache.org/jira/browse/SPARK-46040
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark, SQL
> Affects Versions: 4.0.0
> Reporter: Daniel
> Assignee: Daniel
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
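A sketch of the updated 'analyze' surface, assuming `PartitioningColumn` accepts a general expression string after this change (class and module names follow the PySpark UDTF API, but the exact shape is defined by the linked PR):

{code:python}
from pyspark.sql.functions import udtf
from pyspark.sql.types import IntegerType, StructType
from pyspark.sql.udtf import AnalyzeArgument, AnalyzeResult, PartitioningColumn

class CountPerGroup:
    @staticmethod
    def analyze(table: AnalyzeArgument) -> AnalyzeResult:
        # With this change, the partitioning column may be a general
        # expression such as "a % 2", not only a bare column reference.
        return AnalyzeResult(
            schema=StructType().add("cnt", IntegerType()),
            partitionBy=[PartitioningColumn("a % 2")],
        )

    def __init__(self):
        self._cnt = 0

    def eval(self, row):
        self._cnt += 1

    def terminate(self):
        yield (self._cnt,)

count_per_group = udtf(CountPerGroup)  # returnType comes from analyze()
{code}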
[jira] [Assigned] (SPARK-46233) Migrate all remaining AttributeError into PySpark error framework
Dongjoon Hyun reassigned SPARK-46233:
    Assignee: Haejoon Lee

> Migrate all remaining AttributeError into PySpark error framework
> Key: SPARK-46233
> URL: https://issues.apache.org/jira/browse/SPARK-46233
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark
> Affects Versions: 4.0.0
> Reporter: Haejoon Lee
> Assignee: Haejoon Lee
> Priority: Major
> Labels: pull-request-available
[jira] [Resolved] (SPARK-46233) Migrate all remaining AttributeError into PySpark error framework
Dongjoon Hyun resolved SPARK-46233.
    Fix Version/s: 4.0.0
    Resolution: Fixed

Issue resolved by pull request 44150
https://github.com/apache/spark/pull/44150

> Migrate all remaining AttributeError into PySpark error framework
> Key: SPARK-46233
> URL: https://issues.apache.org/jira/browse/SPARK-46233
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark
> Affects Versions: 4.0.0
> Reporter: Haejoon Lee
> Assignee: Haejoon Lee
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Assigned] (SPARK-46250) Deflake test_parity_listener
Hyukjin Kwon reassigned SPARK-46250:
    Assignee: Wei Liu

> Deflake test_parity_listener
> Key: SPARK-46250
> URL: https://issues.apache.org/jira/browse/SPARK-46250
> Project: Spark
> Issue Type: Task
> Components: Connect, SS
> Affects Versions: 4.0.0
> Reporter: Wei Liu
> Assignee: Wei Liu
> Priority: Major
> Labels: pull-request-available
[jira] [Resolved] (SPARK-46250) Deflake test_parity_listener
Hyukjin Kwon resolved SPARK-46250.
    Fix Version/s: 4.0.0
    Resolution: Fixed

Issue resolved by pull request 44166
https://github.com/apache/spark/pull/44166

> Deflake test_parity_listener
> Key: SPARK-46250
> URL: https://issues.apache.org/jira/browse/SPARK-46250
> Project: Spark
> Issue Type: Task
> Components: Connect, SS
> Affects Versions: 4.0.0
> Reporter: Wei Liu
> Assignee: Wei Liu
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Resolved] (SPARK-46243) Describe arguments of decode()
Hyukjin Kwon resolved SPARK-46243.
    Fix Version/s: 4.0.0
    Resolution: Fixed

Issue resolved by pull request 44157
https://github.com/apache/spark/pull/44157

> Describe arguments of decode()
> Key: SPARK-46243
> URL: https://issues.apache.org/jira/browse/SPARK-46243
> Project: Spark
> Issue Type: Documentation
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Max Gekk
> Assignee: Max Gekk
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Update the description of the `StringDecode` expression and, consequently, the `decode()` function by describing the arguments `bin` and `charset`. The ticket aims to improve the user experience with Spark SQL by documenting the public function.
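The two arguments being documented, shown round-tripping through `encode` (the first argument is a binary value, the second a character set name):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# decode(bin, charset) turns a binary value back into a string using the
# named character set; encode(str, charset) is its inverse.
spark.sql("SELECT decode(encode('Spark SQL', 'UTF-8'), 'UTF-8') AS s").show()
{code}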
[jira] [Created] (SPARK-46252) Improve test coverage of memory_profiler.py
Xinrong Meng created SPARK-46252:
    Summary: Improve test coverage of memory_profiler.py
    Key: SPARK-46252
    URL: https://issues.apache.org/jira/browse/SPARK-46252
    Project: Spark
    Issue Type: Sub-task
    Components: PySpark, Tests
    Affects Versions: 4.0.0
    Reporter: Xinrong Meng
[jira] [Updated] (SPARK-46239) Hide Jetty info
Dongjoon Hyun updated SPARK-46239:
    Affects Version/s: 3.4.1, 3.3.2, 3.2.4, 3.1.3, 3.0.3

> Hide Jetty info
> Key: SPARK-46239
> URL: https://issues.apache.org/jira/browse/SPARK-46239
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.0.3, 3.1.3, 3.2.4, 3.3.2, 3.4.1, 3.5.0
> Reporter: chenyu
> Assignee: chenyu
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0, 3.5.1, 3.3.4, 3.4.3
> Attachments: screenshot-1.png
[jira] [Resolved] (SPARK-46239) Spark Jetty exposes version information
Dongjoon Hyun resolved SPARK-46239.
    Fix Version/s: 3.3.4, 3.4.3, 3.5.1, 4.0.0
    Resolution: Fixed

Issue resolved by pull request 44158
https://github.com/apache/spark/pull/44158

> Spark Jetty exposes version information
> Key: SPARK-46239
> URL: https://issues.apache.org/jira/browse/SPARK-46239
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.5.0
> Reporter: chenyu
> Assignee: chenyu
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.3.4, 3.4.3, 3.5.1, 4.0.0
> Attachments: screenshot-1.png
[jira] [Assigned] (SPARK-46239) Spark Jetty exposes version information
Dongjoon Hyun reassigned SPARK-46239:
    Assignee: chenyu

> Spark Jetty exposes version information
> Key: SPARK-46239
> URL: https://issues.apache.org/jira/browse/SPARK-46239
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.5.0
> Reporter: chenyu
> Assignee: chenyu
> Priority: Major
> Labels: pull-request-available
> Attachments: screenshot-1.png
[jira] [Updated] (SPARK-46239) Hide Jetty info
Dongjoon Hyun updated SPARK-46239:
    Summary: Hide Jetty info  (was: Spark Jetty exposes version information)

> Hide Jetty info
> Key: SPARK-46239
> URL: https://issues.apache.org/jira/browse/SPARK-46239
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.5.0
> Reporter: chenyu
> Assignee: chenyu
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0, 3.5.1, 3.3.4, 3.4.3
> Attachments: screenshot-1.png
[jira] [Updated] (SPARK-46251) Spark 3.3.3 tuple encoders built using `Encoders.tuple` do not correctly cast null into None for Option values
[ https://issues.apache.org/jira/browse/SPARK-46251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Will Boulter updated SPARK-46251: - Description: In Spark {{3.3.2}} encoders created using {{Encoders.tuple(encoder1, encoder2, ..)}} correctly handle casting {{null}} into \{{None}} when the target type is \{{{}an Option. In Spark {{{}3.3.3{}}}, this behaviour has changed and the Option value comes through as {{null}} which is likely to cause a {{NullPointerException}} for most Scala code that operates on the Option. The change seems to be related to the following commit: [https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a] I have made a reproduction with a couple of examples in a public Github repo here: [https://github.com/q-willboulter/spark-tuple-encoders-bug] The common use case where this is likely to be encountered is while doing any joins that can return null, e.g. left or outer joins. When casting the result of a left join it is sensible to wrap the right-hand side in an Option to handle the case where there is no match. Since 3.3.3 this would fail if the encoder is derived manually using {{Encoders.tuple(leftEncoder, rightEncoder).}} If the entire tuple encoder {{Encoder[(Left, Option[Right]])}} is derived at once using reflection, the encoder works as expected. The bug appears to be in the following function inside {{ExpressionEncoder.scala}} {code:java} def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = ...{code} was: In Spark {{3.3.2}} encoders created using {{Encoders.tuple(encoder1, encoder2, ..)}} correctly handle casting \{{null}} into \{{None }} when the target type is \{{{}an Option. In Spark {{{}3.3.3{}}}, this behaviour has changed and the Option value comes through as {{null}} which is likely to cause a {{NullPointerException}} for most Scala code that operates on the Option. The change seems to be related to the following commit: [https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a] I have made a reproduction with a couple of examples in a public Github repo here: [https://github.com/q-willboulter/spark-tuple-encoders-bug] The common use case where this is likely to be encountered is while doing any joins that can return null, e.g. left or outer joins. When casting the result of a left join it is sensible to wrap the right-hand side in an Option to handle the case where there is no match. Since 3.3.3 this would fail if the encoder is derived manually using {{Encoders.tuple(leftEncoder, rightEncoder).}} If the entire tuple encoder {{Encoder[(Left, Option[Right]])}} is derived at once using reflection, the encoder works as expected. The bug appears to be in the following function inside {{ExpressionEncoder.scala}} {code:java} def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = ...{code} > Spark 3.3.3 tuple encoders built using `Encoders.tuple` do not correctly cast > null into None for Option values > -- > > Key: SPARK-46251 > URL: https://issues.apache.org/jira/browse/SPARK-46251 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.3, 3.4.2, 3.4.0, 3.4.1, 3.5.0 >Reporter: Will Boulter >Priority: Major > > In Spark {{3.3.2}} encoders created using {{Encoders.tuple(encoder1, > encoder2, ..)}} correctly handle casting {{null}} into \{{None}} when the > target type is \{{{}an Option. 
> In Spark {{{}3.3.3{}}}, this behaviour has changed and the Option value comes > through as {{null}} which is likely to cause a {{NullPointerException}} for > most Scala code that operates on the Option. The change seems to be related > to the following commit: > [https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a] > I have made a reproduction with a couple of examples in a public Github repo > here: > [https://github.com/q-willboulter/spark-tuple-encoders-bug] > The common use case where this is likely to be encountered is while doing any > joins that can return null, e.g. left or outer joins. When casting the result > of a left join it is sensible to wrap the right-hand side in an Option to > handle the case where there is no match. Since 3.3.3 this would fail if the > encoder is derived manually using {{Encoders.tuple(leftEncoder, > rightEncoder).}} > If the entire tuple encoder {{Encoder[(Left, Option[Right]])}} is derived at > once using reflection, the encoder works as expected. The bug appears to be > in the following function inside {{ExpressionEncoder.scala}} > {code:java} > def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = > ...{code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) --
[jira] [Updated] (SPARK-46251) Spark 3.3.3 tuple encoders built using `Encoders.tuple` do not correctly cast null into None for Option values
[ https://issues.apache.org/jira/browse/SPARK-46251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Will Boulter updated SPARK-46251: - Description: In Spark {{3.3.2}} encoders created using {{Encoders.tuple(encoder1, encoder2, ..)}} correctly handle casting \{{null}} into \{{None }} when the target type is \{{{}an Option. In Spark {{{}3.3.3{}}}, this behaviour has changed and the Option value comes through as {{null}} which is likely to cause a {{NullPointerException}} for most Scala code that operates on the Option. The change seems to be related to the following commit: [https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a] I have made a reproduction with a couple of examples in a public Github repo here: [https://github.com/q-willboulter/spark-tuple-encoders-bug] The common use case where this is likely to be encountered is while doing any joins that can return null, e.g. left or outer joins. When casting the result of a left join it is sensible to wrap the right-hand side in an Option to handle the case where there is no match. Since 3.3.3 this would fail if the encoder is derived manually using {{Encoders.tuple(leftEncoder, rightEncoder).}} If the entire tuple encoder {{Encoder[(Left, Option[Right]])}} is derived at once using reflection, the encoder works as expected. The bug appears to be in the following function inside {{ExpressionEncoder.scala}} {code:java} def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = ...{code} was: In Spark {{3.3.2}} encoders created using {{Encoders.tuple(encoder1, encoder2, ..)}} correctly handle casting null into None when the target type is {{{}an Option{}}}. In Spark {{{}3.3.3{}}}, this behaviour has changed and the Option value comes through as {{null}} which is likely to cause a {{NullPointerException}} for most Scala code that operates on the Option. The change seems to be related to the following commit: [https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a] I have made a reproduction with a couple of examples in a public Github repo here: [https://github.com/q-willboulter/spark-tuple-encoders-bug] The common use case where this is likely to be encountered is while doing any joins that can return null, e.g. left or outer joins. When casting the result of a left join it is sensible to wrap the right-hand side in an Option to handle the case where there is no match. Since 3.3.3 this would fail if the encoder is derived manually using {{Encoders.tuple(leftEncoder, rightEncoder).}} If the entire tuple encoder {{Encoder[(Left, Option[Right]])}} is derived at once using reflection, the encoder works as expected. The bug appears to be in the following function inside {{ExpressionEncoder.scala}} {code:java} def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = ...{code} > Spark 3.3.3 tuple encoders built using `Encoders.tuple` do not correctly cast > null into None for Option values > -- > > Key: SPARK-46251 > URL: https://issues.apache.org/jira/browse/SPARK-46251 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.3, 3.4.2, 3.4.0, 3.4.1, 3.5.0 >Reporter: Will Boulter >Priority: Major > > In Spark {{3.3.2}} encoders created using {{Encoders.tuple(encoder1, > encoder2, ..)}} correctly handle casting \{{null}} into \{{None }} when the > target type is \{{{}an Option. 
> In Spark {{{}3.3.3{}}}, this behaviour has changed and the Option value comes > through as {{null}} which is likely to cause a {{NullPointerException}} for > most Scala code that operates on the Option. The change seems to be related > to the following commit: > [https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a] > I have made a reproduction with a couple of examples in a public Github repo > here: > [https://github.com/q-willboulter/spark-tuple-encoders-bug] > The common use case where this is likely to be encountered is while doing any > joins that can return null, e.g. left or outer joins. When casting the result > of a left join it is sensible to wrap the right-hand side in an Option to > handle the case where there is no match. Since 3.3.3 this would fail if the > encoder is derived manually using {{Encoders.tuple(leftEncoder, > rightEncoder).}} > If the entire tuple encoder {{Encoder[(Left, Option[Right]])}} is derived at > once using reflection, the encoder works as expected. The bug appears to be > in the following function inside {{ExpressionEncoder.scala}} > {code:java} > def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = > ...{code} > -- This message was sent by Atlassian Jira (v8.20.10#820
[jira] [Updated] (SPARK-46251) Spark 3.3.3 tuple encoders built using Encoders.tuple do not correctly cast null into None for Option values
[ https://issues.apache.org/jira/browse/SPARK-46251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Will Boulter updated SPARK-46251: - Summary: Spark 3.3.3 tuple encoders built using Encoders.tuple do not correctly cast null into None for Option values (was: Spark 3.3.3 tuple encoders built using `Encoders.tuple` do not correctly cast null into None for Option values) > Spark 3.3.3 tuple encoders built using Encoders.tuple do not correctly cast > null into None for Option values > > > Key: SPARK-46251 > URL: https://issues.apache.org/jira/browse/SPARK-46251 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.3, 3.4.2, 3.4.0, 3.4.1, 3.5.0 >Reporter: Will Boulter >Priority: Major > > In Spark {{3.3.2}} encoders created using {{Encoders.tuple(encoder1, > encoder2, ..)}} correctly handle casting {{null}} into {{None}} when the > target type is an Option. > In Spark {{{}3.3.3{}}}, this behaviour has changed and the Option value comes > through as {{null}} which is likely to cause a {{NullPointerException}} for > most Scala code that operates on the Option. The change seems to be related > to the following commit: > [https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a] > I have made a reproduction with a couple of examples in a public Github repo > here: > [https://github.com/q-willboulter/spark-tuple-encoders-bug] > The common use case where this is likely to be encountered is while doing any > joins that can return null, e.g. left or outer joins. When casting the result > of a left join it is sensible to wrap the right-hand side in an Option to > handle the case where there is no match. Since 3.3.3 this would fail if the > encoder is derived manually using {{Encoders.tuple(leftEncoder, > rightEncoder).}} > If the entire tuple encoder {{Encoder[(Left, Option[Right]])}} is derived at > once using reflection, the encoder works as expected. The bug appears to be > in the following function inside {{ExpressionEncoder.scala}} > {code:java} > def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = > ...{code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46251) Spark 3.3.3 tuple encoders built using `Encoders.tuple` do not correctly cast null into None for Option values
[ https://issues.apache.org/jira/browse/SPARK-46251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Will Boulter updated SPARK-46251: - Description: In Spark {{3.3.2}} encoders created using {{Encoders.tuple(encoder1, encoder2, ..)}} correctly handle casting {{null}} into {{None}} when the target type is an Option. In Spark {{{}3.3.3{}}}, this behaviour has changed and the Option value comes through as {{null}} which is likely to cause a {{NullPointerException}} for most Scala code that operates on the Option. The change seems to be related to the following commit: [https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a] I have made a reproduction with a couple of examples in a public Github repo here: [https://github.com/q-willboulter/spark-tuple-encoders-bug] The common use case where this is likely to be encountered is while doing any joins that can return null, e.g. left or outer joins. When casting the result of a left join it is sensible to wrap the right-hand side in an Option to handle the case where there is no match. Since 3.3.3 this would fail if the encoder is derived manually using {{Encoders.tuple(leftEncoder, rightEncoder).}} If the entire tuple encoder {{Encoder[(Left, Option[Right]])}} is derived at once using reflection, the encoder works as expected. The bug appears to be in the following function inside {{ExpressionEncoder.scala}} {code:java} def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = ...{code} was: In Spark {{3.3.2}} encoders created using {{Encoders.tuple(encoder1, encoder2, ..)}} correctly handle casting {{null}} into \{{None}} when the target type is \{{{}an Option. In Spark {{{}3.3.3{}}}, this behaviour has changed and the Option value comes through as {{null}} which is likely to cause a {{NullPointerException}} for most Scala code that operates on the Option. The change seems to be related to the following commit: [https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a] I have made a reproduction with a couple of examples in a public Github repo here: [https://github.com/q-willboulter/spark-tuple-encoders-bug] The common use case where this is likely to be encountered is while doing any joins that can return null, e.g. left or outer joins. When casting the result of a left join it is sensible to wrap the right-hand side in an Option to handle the case where there is no match. Since 3.3.3 this would fail if the encoder is derived manually using {{Encoders.tuple(leftEncoder, rightEncoder).}} If the entire tuple encoder {{Encoder[(Left, Option[Right]])}} is derived at once using reflection, the encoder works as expected. The bug appears to be in the following function inside {{ExpressionEncoder.scala}} {code:java} def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = ...{code} > Spark 3.3.3 tuple encoders built using `Encoders.tuple` do not correctly cast > null into None for Option values > -- > > Key: SPARK-46251 > URL: https://issues.apache.org/jira/browse/SPARK-46251 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.3, 3.4.2, 3.4.0, 3.4.1, 3.5.0 >Reporter: Will Boulter >Priority: Major > > In Spark {{3.3.2}} encoders created using {{Encoders.tuple(encoder1, > encoder2, ..)}} correctly handle casting {{null}} into {{None}} when the > target type is an Option. 
> In Spark {{{}3.3.3{}}}, this behaviour has changed and the Option value comes > through as {{null}} which is likely to cause a {{NullPointerException}} for > most Scala code that operates on the Option. The change seems to be related > to the following commit: > [https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a] > I have made a reproduction with a couple of examples in a public Github repo > here: > [https://github.com/q-willboulter/spark-tuple-encoders-bug] > The common use case where this is likely to be encountered is while doing any > joins that can return null, e.g. left or outer joins. When casting the result > of a left join it is sensible to wrap the right-hand side in an Option to > handle the case where there is no match. Since 3.3.3 this would fail if the > encoder is derived manually using {{Encoders.tuple(leftEncoder, > rightEncoder).}} > If the entire tuple encoder {{Encoder[(Left, Option[Right]])}} is derived at > once using reflection, the encoder works as expected. The bug appears to be > in the following function inside {{ExpressionEncoder.scala}} > {code:java} > def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = > ...{code} > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (SPARK-46251) Spark 3.3.3 tuple encoders built using `Encoders.tuple` do not correctly cast null into None for Option values
[ https://issues.apache.org/jira/browse/SPARK-46251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Will Boulter updated SPARK-46251: - Description: In Spark {{3.3.2}} encoders created using {{Encoders.tuple(encoder1, encoder2, ..)}} correctly handle casting null into None when the target type is {{{}an Option{}}}. In Spark {{{}3.3.3{}}}, this behaviour has changed and the Option value comes through as {{null}} which is likely to cause a {{NullPointerException}} for most Scala code that operates on the Option. The change seems to be related to the following commit: [https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a] I have made a reproduction with a couple of examples in a public Github repo here: [https://github.com/q-willboulter/spark-tuple-encoders-bug] The common use case where this is likely to be encountered is while doing any joins that can return null, e.g. left or outer joins. When casting the result of a left join it is sensible to wrap the right-hand side in an Option to handle the case where there is no match. Since 3.3.3 this would fail if the encoder is derived manually using {{Encoders.tuple(leftEncoder, rightEncoder).}} If the entire tuple encoder {{Encoder[(Left, Option[Right]])}} is derived at once using reflection, the encoder works as expected. The bug appears to be in the following function inside {{ExpressionEncoder.scala}} {code:java} def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = ...{code} was: In Spark {{3.3.2}} encoders created using {{Encoders.tuple(encoder1, encoder2, ..)}} correctly handle casting {{null}} into {{None }}when the target type is an {{{}Option{}}}. In Spark {{{}3.3.3{}}}, this behaviour has changed and the Option value comes through as {{null}} which is likely to cause a {{NullPointerException}} for most Scala code that operates on the Option. The change seems to be related to the following commit: [https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a] I have made a reproduction with a couple of examples in a public Github repo here: [https://github.com/q-willboulter/spark-tuple-encoders-bug] The common use case where this is likely to be encountered is while doing any joins that can return null, e.g. left or outer joins. When casting the result of a left join it is sensible to wrap the right-hand side in an Option to handle the case where there is no match. Since 3.3.3 this would fail if the encoder is derived manually using {{Encoders.tuple(leftEncoder, rightEncoder).}} If the entire tuple encoder {{Encoder[(Left, Option[Right]])}} is derived at once using reflection, the encoder works as expected. The bug appears to be in the following function inside {{ExpressionEncoder.scala}} {code:java} def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = ...{code} > Spark 3.3.3 tuple encoders built using `Encoders.tuple` do not correctly cast > null into None for Option values > -- > > Key: SPARK-46251 > URL: https://issues.apache.org/jira/browse/SPARK-46251 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.3, 3.4.2, 3.4.0, 3.4.1, 3.5.0 >Reporter: Will Boulter >Priority: Major > > In Spark {{3.3.2}} encoders created using {{Encoders.tuple(encoder1, > encoder2, ..)}} correctly handle casting null into None when > the target type is {{{}an Option{}}}. 
> In Spark {{{}3.3.3{}}}, this behaviour has changed and the Option value comes > through as {{null}} which is likely to cause a {{NullPointerException}} for > most Scala code that operates on the Option. The change seems to be related > to the following commit: > [https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a] > I have made a reproduction with a couple of examples in a public Github repo > here: > [https://github.com/q-willboulter/spark-tuple-encoders-bug] > The common use case where this is likely to be encountered is while doing any > joins that can return null, e.g. left or outer joins. When casting the result > of a left join it is sensible to wrap the right-hand side in an Option to > handle the case where there is no match. Since 3.3.3 this would fail if the > encoder is derived manually using {{Encoders.tuple(leftEncoder, > rightEncoder).}} > If the entire tuple encoder {{Encoder[(Left, Option[Right]])}} is derived at > once using reflection, the encoder works as expected. The bug appears to be > in the following function inside {{ExpressionEncoder.scala}} > {code:java} > def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = > ...{code} > -- This message was sent by Atlassia
[jira] [Updated] (SPARK-46251) Spark 3.3.3 tuple encoders built using `Encoders.tuple` do not correctly cast null into None for Option values
[ https://issues.apache.org/jira/browse/SPARK-46251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Will Boulter updated SPARK-46251: - Description: In Spark {{3.3.2}} encoders created using {{Encoders.tuple(encoder1, encoder2, ..)}} correctly handle casting {{null}} into {{None }}when the target type is an {{{}Option{}}}. In Spark {{{}3.3.3{}}}, this behaviour has changed and the Option value comes through as {{null}} which is likely to cause a {{NullPointerException}} for most Scala code that operates on the Option. The change seems to be related to the following commit: [https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a] I have made a reproduction with a couple of examples in a public Github repo here: [https://github.com/q-willboulter/spark-tuple-encoders-bug] The common use case where this is likely to be encountered is while doing any joins that can return null, e.g. left or outer joins. When casting the result of a left join it is sensible to wrap the right-hand side in an Option to handle the case where there is no match. Since 3.3.3 this would fail if the encoder is derived manually using {{Encoders.tuple(leftEncoder, rightEncoder).}} If the entire tuple encoder {{Encoder[(Left, Option[Right]])}} is derived at once using reflection, the encoder works as expected. The bug appears to be in the following function inside {{ExpressionEncoder.scala}} {code:java} def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = ...{code} was: In Spark `3.3.2`, encoders created using `Encoders.tuple(encoder1, encoder2, ..)` correctly handle casting `null` into `None` when the target type is an `Option`. In Spark `3.3.3`, this behaviour has changed and the Option value comes through as `null` which is likely to cause a `NullPointerException` for most Scala code that operates on the Option. The change seems to be related to the following commit: [https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a] I have made a reproduction with a couple of examples in a public Github repo here: [https://github.com/q-willboulter/spark-tuple-encoders-bug] The common use case where this is likely to be encountered is while doing any joins that can return null, e.g. left or outer joins. When casting the result of a left join it is sensible to wrap the right-hand side in an Option to handle the case where there is no match - since 3.3.3 this could fail if the encoder is derived manually using `Encoders.tuple(leftEncoder, rightEncoder)`. If the entire tuple encoder `Encoder[(Left, Option[Right]])` is derived at once, the encoder works as expected - the bug appears to be in the following function inside `ExpressionEncoder.scala` ``` def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = ... ``` > Spark 3.3.3 tuple encoders built using `Encoders.tuple` do not correctly cast > null into None for Option values > -- > > Key: SPARK-46251 > URL: https://issues.apache.org/jira/browse/SPARK-46251 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.3, 3.4.2, 3.4.0, 3.4.1, 3.5.0 >Reporter: Will Boulter >Priority: Major > > In Spark {{3.3.2}} encoders created using {{Encoders.tuple(encoder1, > encoder2, ..)}} correctly handle casting {{null}} into {{None }}when the > target type is an {{{}Option{}}}. 
> > In Spark {{3.3.3}}, this behaviour has changed and the Option value comes > through as {{null}}, which is likely to cause a {{NullPointerException}} for > most Scala code that operates on the Option. The change seems to be related > to the following commit: > [https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a] > > I have made a reproduction with a couple of examples in a public GitHub repo > here: > [https://github.com/q-willboulter/spark-tuple-encoders-bug] > > The common use case where this is likely to be encountered is while doing any > joins that can return null, e.g. left or outer joins. When casting the result > of a left join it is sensible to wrap the right-hand side in an Option to > handle the case where there is no match. Since 3.3.3 this would fail if the > encoder is derived manually using {{Encoders.tuple(leftEncoder, > rightEncoder)}}. > If the entire tuple encoder {{Encoder[(Left, Option[Right])]}} is derived at > once using reflection, the encoder works as expected. The bug appears to be > in the following function inside {{ExpressionEncoder.scala}} > > {code:java} > def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = > ...{code} > -- This message was sent by Atlassian Jira (v8.20.10#820010)
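Editor's note: a minimal sketch of the failure mode described above, using hypothetical case classes (the full repro lives in the linked GitHub repo). It contrasts an encoder derived in one go by reflection with one composed via {{Encoders.tuple}}:
{code:java}
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}

// Hypothetical classes standing in for the two sides of a left join.
case class User(id: Long)
case class Order(userId: Long)

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Derived in one go via reflection: decodes null as None on all versions.
val wholeEnc = implicitly[Encoder[(User, Option[Order])]]

// Composed manually from part encoders: the path broken since 3.3.3.
val manualEnc: Encoder[(User, Option[Order])] =
  Encoders.tuple(implicitly[Encoder[User]], implicitly[Encoder[Option[Order]]])

val ds = spark.createDataset(Seq((User(1), None: Option[Order])))(manualEnc)

// Expected: the missing side decodes as None. Since 3.3.3 it arrives as
// null instead, so touching the Option throws a NullPointerException.
ds.map { case (_, maybeOrder) => maybeOrder.isEmpty }.collect()
{code}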
[jira] [Updated] (SPARK-46250) Deflake test_parity_listener
[ https://issues.apache.org/jira/browse/SPARK-46250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46250: --- Labels: pull-request-available (was: ) > Deflake test_parity_listener > > > Key: SPARK-46250 > URL: https://issues.apache.org/jira/browse/SPARK-46250 > Project: Spark > Issue Type: Task > Components: Connect, SS >Affects Versions: 4.0.0 >Reporter: Wei Liu >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46251) Spark 3.3.3 tuple encoders built using `Encoders.tuple` do not correctly cast null into None for Option values
[ https://issues.apache.org/jira/browse/SPARK-46251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Will Boulter updated SPARK-46251: - Summary: Spark 3.3.3 tuple encoders built using `Encoders.tuple` do not correctly cast null into None for Option values (was: Spark 3.3.3 tuple encoders do not correctly cast null into None for Option values) > Spark 3.3.3 tuple encoders built using `Encoders.tuple` do not correctly cast > null into None for Option values > -- > > Key: SPARK-46251 > URL: https://issues.apache.org/jira/browse/SPARK-46251 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.3, 3.4.2, 3.4.0, 3.4.1, 3.5.0 >Reporter: Will Boulter >Priority: Major > > In Spark `3.3.2`, encoders created using `Encoders.tuple(encoder1, encoder2, > ..)` correctly handle casting `null` into `None` when the target type is an > `Option`. > > In Spark `3.3.3`, this behaviour has changed and the Option value comes > through as `null` which is likely to cause a `NullPointerException` for most > Scala code that operates on the Option. The change seems to be related to the > following commit: > [https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a] > > I have made a reproduction with a couple of examples in a public Github repo > here: > [https://github.com/q-willboulter/spark-tuple-encoders-bug] > > The common use case where this is likely to be encountered is while doing any > joins that can return null, e.g. left or outer joins. When casting the result > of a left join it is sensible to wrap the right-hand side in an Option to > handle the case where there is no match - since 3.3.3 this could fail if the > encoder is derived manually using `Encoders.tuple(leftEncoder, > rightEncoder)`. If the entire tuple encoder `Encoder[(Left, Option[Right]])` > is derived at once, the encoder works as expected - the bug appears to be in > the following function inside `ExpressionEncoder.scala` > ``` > def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = ... > ``` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46251) Spark 3.3.3 tuple encoders do not correctly cast null into None for Option values
[ https://issues.apache.org/jira/browse/SPARK-46251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Will Boulter updated SPARK-46251: - Summary: Spark 3.3.3 tuple encoders do not correctly cast null into None for Option values (was: Spark 3.3.3 tuple encoders do not correctly casting null into None for Option values) > Spark 3.3.3 tuple encoders do not correctly cast null into None for Option > values > - > > Key: SPARK-46251 > URL: https://issues.apache.org/jira/browse/SPARK-46251 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.3, 3.4.2, 3.4.0, 3.4.1, 3.5.0 >Reporter: Will Boulter >Priority: Major > > In Spark `3.3.2`, encoders created using `Encoders.tuple(encoder1, encoder2, > ..)` correctly handle casting `null` into `None` when the target type is an > `Option`. > > In Spark `3.3.3`, this behaviour has changed and the Option value comes > through as `null` which is likely to cause a `NullPointerException` for most > Scala code that operates on the Option. The change seems to be related to the > following commit: > [https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a] > > I have made a reproduction with a couple of examples in a public Github repo > here: > [https://github.com/q-willboulter/spark-tuple-encoders-bug] > > The common use case where this is likely to be encountered is while doing any > joins that can return null, e.g. left or outer joins. When casting the result > of a left join it is sensible to wrap the right-hand side in an Option to > handle the case where there is no match - since 3.3.3 this could fail if the > encoder is derived manually using `Encoders.tuple(leftEncoder, > rightEncoder)`. If the entire tuple encoder `Encoder[(Left, Option[Right]])` > is derived at once, the encoder works as expected - the bug appears to be in > the following function inside `ExpressionEncoder.scala` > ``` > def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = ... > ``` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46251) Spark 3.3.3 tuple encoders do not correctly casting null into None for Option values
Will Boulter created SPARK-46251: Summary: Spark 3.3.3 tuple encoders do not correctly casting null into None for Option values Key: SPARK-46251 URL: https://issues.apache.org/jira/browse/SPARK-46251 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.5.0, 3.4.1, 3.4.0, 3.4.2, 3.3.3 Reporter: Will Boulter In Spark `3.3.2`, encoders created using `Encoders.tuple(encoder1, encoder2, ..)` correctly handle casting `null` into `None` when the target type is an `Option`. In Spark `3.3.3`, this behaviour has changed and the Option value comes through as `null` which is likely to cause a `NullPointerException` for most Scala code that operates on the Option. The change seems to be related to the following commit: [https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a] I have made a reproduction with a couple of examples in a public Github repo here: [https://github.com/q-willboulter/spark-tuple-encoders-bug] The common use case where this is likely to be encountered is while doing any joins that can return null, e.g. left or outer joins. When casting the result of a left join it is sensible to wrap the right-hand side in an Option to handle the case where there is no match - since 3.3.3 this could fail if the encoder is derived manually using `Encoders.tuple(leftEncoder, rightEncoder)`. If the entire tuple encoder `Encoder[(Left, Option[Right]])` is derived at once, the encoder works as expected - the bug appears to be in the following function inside `ExpressionEncoder.scala` ``` def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = ... ``` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46250) Deflake test_parity_listener
Wei Liu created SPARK-46250: --- Summary: Deflake test_parity_listener Key: SPARK-46250 URL: https://issues.apache.org/jira/browse/SPARK-46250 Project: Spark Issue Type: Task Components: Connect, SS Affects Versions: 4.0.0 Reporter: Wei Liu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-46249) Fix state store metrics access after commit
[ https://issues.apache.org/jira/browse/SPARK-46249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17793036#comment-17793036 ] Anish Shrigondekar commented on SPARK-46249: PR here - [https://github.com/apache/spark/pull/44165] cc - [~kabhwan] > Fix state store metrics access after commit > --- > > Key: SPARK-46249 > URL: https://issues.apache.org/jira/browse/SPARK-46249 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Anish Shrigondekar >Priority: Major > Labels: pull-request-available > > Fix state store metrics access after commit -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46249) Fix state store metrics access after commit
[ https://issues.apache.org/jira/browse/SPARK-46249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46249: --- Labels: pull-request-available (was: ) > Fix state store metrics access after commit > --- > > Key: SPARK-46249 > URL: https://issues.apache.org/jira/browse/SPARK-46249 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Anish Shrigondekar >Priority: Major > Labels: pull-request-available > > Fix state store metrics access after commit -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46092) Overflow in Parquet row group filter creation causes incorrect results
[ https://issues.apache.org/jira/browse/SPARK-46092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-46092: -- Fix Version/s: 3.3.4 > Overflow in Parquet row group filter creation causes incorrect results > -- > > Key: SPARK-46092 > URL: https://issues.apache.org/jira/browse/SPARK-46092 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: Johan Lasperas >Assignee: Johan Lasperas >Priority: Major > Labels: correctness, pull-request-available > Fix For: 4.0.0, 3.5.1, 3.3.4, 3.4.3 > > > While the Parquet readers don't support reading Parquet values into larger > Spark types, it's possible to trigger an overflow when creating a Parquet row > group filter that will then incorrectly skip row groups and bypass the > exception in the reader. > Repro: > {code:java} > Seq(0).toDF("a").write.parquet(path) > spark.read.schema("a LONG").parquet(path).where(s"a < > ${Long.MaxValue}").collect(){code} > This succeeds and returns no results. This should either fail if the Parquet > reader doesn't support the upcast from int to long or produce result `[0]` if > it does. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
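Editor's note: the arithmetic behind the skipped row group, sketched under the assumption that the filter literal is narrowed to the column's 32-bit physical Parquet type:
{code:java}
// The file stores INT32 values, so the row-group filter compares against Int.
// Narrowing the Long literal silently wraps instead of failing:
val threshold: Long = Long.MaxValue
val narrowed: Int = threshold.toInt // -1 (overflow, not an error)

// At the row-group level "a < Long.MaxValue" thus becomes "a < -1", which a
// group containing only 0 cannot satisfy, so the group is skipped entirely.
{code}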
[jira] [Assigned] (SPARK-46245) Replace `s.c.MapOps.view.filterKeys` with `s.c.MapOps.filter`
[ https://issues.apache.org/jira/browse/SPARK-46245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-46245: - Assignee: Yang Jie > Replace `s.c.MapOps.view.filterKeys` with `s.c.MapOps.filter` > -- > > Key: SPARK-46245 > URL: https://issues.apache.org/jira/browse/SPARK-46245 > Project: Spark > Issue Type: Improvement > Components: k8s, Spark Core, SQL, YARN >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46245) Replace `s.c.MapOps.view.filterKeys` with `s.c.MapOps.filter`
[ https://issues.apache.org/jira/browse/SPARK-46245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-46245. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44160 [https://github.com/apache/spark/pull/44160] > Replace `s.c.MapOps.view.filterKeys` with `s.c.MapOps.filter` > -- > > Key: SPARK-46245 > URL: https://issues.apache.org/jira/browse/SPARK-46245 > Project: Spark > Issue Type: Improvement > Components: k8s, Spark Core, SQL, YARN >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
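Editor's note: the shape of the change, as a small sketch. {{MapOps.view.filterKeys}} builds a lazy {{MapView}} that callers usually materialize right away; a plain {{filter}} does the same work in one strict pass:
{code:java}
val conf = Map("spark.x" -> "1", "other.y" -> "2")

// Before: lazy view, typically forced straight back into a Map.
val oldStyle = conf.view.filterKeys(_.startsWith("spark.")).toMap

// After: one strict pass, no intermediate view allocation.
val newStyle = conf.filter { case (k, _) => k.startsWith("spark.") }

assert(oldStyle == newStyle)
{code}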
[jira] [Assigned] (SPARK-32246) Enable streaming-kinesis-asl tests in GitHub Actions CI
[ https://issues.apache.org/jira/browse/SPARK-32246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-32246: - Assignee: junyuc25 > Enable streaming-kinesis-asl tests in GitHub Actions CI > -- > > Key: SPARK-32246 > URL: https://issues.apache.org/jira/browse/SPARK-32246 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 2.4.6, 3.0.0, 3.1.0 >Reporter: Hyukjin Kwon >Assignee: junyuc25 >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > See https://github.com/HyukjinKwon/spark/pull/4. Kinesis tests depend on the > external Amazon Kinesis service. > We should have a way to run them optionally. Currently, they are not being run > in GitHub Actions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32246) Enable streaming-kinesis-asl tests in GitHub Actions CI
[ https://issues.apache.org/jira/browse/SPARK-32246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-32246: -- Summary: Enable streaming-kinesis-asl tests in GitHub Actions CI (was: Have a way to optionally run streaming-kinesis-asl) > Enable streaming-kinesis-asl tests in GitHub Actions CI > -- > > Key: SPARK-32246 > URL: https://issues.apache.org/jira/browse/SPARK-32246 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 2.4.6, 3.0.0, 3.1.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > See https://github.com/HyukjinKwon/spark/pull/4. Kinesis tests depend on the > external Amazon Kinesis service. > We should have a way to run them optionally. Currently, they are not being run > in GitHub Actions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-32246) Have a way to optionally run streaming-kinesis-asl
[ https://issues.apache.org/jira/browse/SPARK-32246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-32246. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43736 [https://github.com/apache/spark/pull/43736] > Have a way to optionally run streaming-kinesis-asl > -- > > Key: SPARK-32246 > URL: https://issues.apache.org/jira/browse/SPARK-32246 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 2.4.6, 3.0.0, 3.1.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > See https://github.com/HyukjinKwon/spark/pull/4. Kinesis tests depend on the > external Amazon Kinesis service. > We should have a way to run them optionally. Currently, they are not being run > in GitHub Actions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] (SPARK-38473) Use error classes in org.apache.spark.scheduler
[ https://issues.apache.org/jira/browse/SPARK-38473 ] Asmita Limaye deleted comment on SPARK-38473: --- was (Author: JIRAUSER303226): Hi [~bozhang], I was working on this issue (with Hannah) and had some questions about how to assign the sqlState field. You can see my (WIP) PR here: [https://github.com/apache/spark/pull/43941] I was wondering if you could help me understand how to properly assign the sqlStates to the newly created error classes. I have gone through the error README, but I'm still not sure how to ensure I'm doing it correctly. Thanks! -Asmita > Use error classes in org.apache.spark.scheduler > --- > > Key: SPARK-38473 > URL: https://issues.apache.org/jira/browse/SPARK-38473 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Bo Zhang >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46249) Fix state store metrics access after commit
Anish Shrigondekar created SPARK-46249: -- Summary: Fix state store metrics access after commit Key: SPARK-46249 URL: https://issues.apache.org/jira/browse/SPARK-46249 Project: Spark Issue Type: Task Components: Structured Streaming Affects Versions: 4.0.0 Reporter: Anish Shrigondekar Fix state store metrics access after commit -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46248) Support ignoreCorruptFiles and ignoreMissingFiles options in XML
[ https://issues.apache.org/jira/browse/SPARK-46248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46248: --- Labels: pull-request-available (was: ) > Support ignoreCorruptFiles and ignoreMissingFiles options in XML > > > Key: SPARK-46248 > URL: https://issues.apache.org/jira/browse/SPARK-46248 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Shujing Yang >Priority: Major > Labels: pull-request-available > > This PR corrects the handling of corrupt or missing multiline XML files by > respecting user-specified options. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46248) Support ignoreCorruptFiles and ignoreMissingFiles options in XML
Shujing Yang created SPARK-46248: Summary: Support ignoreCorruptFiles and ignoreMissingFiles options in XML Key: SPARK-46248 URL: https://issues.apache.org/jira/browse/SPARK-46248 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Shujing Yang This PR corrects the handling of corrupt or missing multiline XML files by respecting user-specified options. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
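Editor's note: a hedged sketch of the reader-side usage; the path and rowTag are hypothetical, and the two options come from the ticket title, mirroring their equivalents in the other file sources (skip unreadable files, skip files deleted mid-scan):
{code:java}
val df = spark.read
  .format("xml")
  .option("rowTag", "book")             // hypothetical row element
  .option("ignoreCorruptFiles", "true") // skip files that fail to read/parse
  .option("ignoreMissingFiles", "true") // skip files deleted after listing
  .load("s3://bucket/books/*.xml")
{code}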
[jira] [Updated] (SPARK-39800) DataSourceV2: View support
[ https://issues.apache.org/jira/browse/SPARK-39800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-39800: --- Labels: pull-request-available (was: ) > DataSourceV2: View support > -- > > Key: SPARK-39800 > URL: https://issues.apache.org/jira/browse/SPARK-39800 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: John Zhuge >Priority: Major > Labels: pull-request-available > > Support Data source V2 views. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46225) Collapse stacked withColumns into a single message
[ https://issues.apache.org/jira/browse/SPARK-46225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46225: --- Labels: pull-request-available (was: ) > Collapse stacked withColumns into a single message > -- > > Key: SPARK-46225 > URL: https://issues.apache.org/jira/browse/SPARK-46225 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 4.0.0 >Reporter: Herman van Hövell >Assignee: Herman van Hövell >Priority: Major > Labels: pull-request-available > > It is quite a common pattern to create queries with heavily nested > withColumns(..) calls. This can easily lead to hitting proto recursion limits. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
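Editor's note: a sketch of the pattern in question (column names illustrative). Each chained {{withColumn}} nests another relation into the Connect plan, which is what can exceed proto recursion limits; a single {{withColumns}} call keeps the plan flat:
{code:java}
import org.apache.spark.sql.functions.col

// Stacked: every call wraps the previous plan one level deeper.
var deep = spark.range(10).toDF("id")
for (i <- 1 to 500) {
  deep = deep.withColumn(s"c$i", col("id") + i)
}

// Collapsed: one withColumns call adds all columns at the same depth.
val cols = (1 to 500).map(i => s"c$i" -> (col("id") + i)).toMap
val flat = spark.range(10).toDF("id").withColumns(cols)
{code}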
[jira] [Updated] (SPARK-46246) Support EXECUTE IMMEDIATE syntax in Spark SQL
[ https://issues.apache.org/jira/browse/SPARK-46246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46246: --- Labels: pull-request-available (was: ) > Support EXECUTE IMMEDIATE syntax in Spark SQL > > > Key: SPARK-46246 > URL: https://issues.apache.org/jira/browse/SPARK-46246 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Milan Stefanovic >Priority: Major > Labels: pull-request-available > > Introducing new EXECUTE IMMEDIATE syntax to support parameterized queries > from within SQL. > This API executes a query passed as a string, with arguments. > Other DBs that support this: > * > [Oracle|https://docs.oracle.com/cd/B13789_01/appdev.101/b10807/13_elems017.htm] > * > [Snowflake|https://docs.snowflake.com/en/sql-reference/sql/execute-immediate] > * > [PgSql|https://www.postgresql.org/docs/current/ecpg-sql-execute-immediate.html] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
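Editor's note: a hedged sketch of the intended usage; the exact Spark grammar was still under review in the linked PR, so the shape below is an assumption modeled on the Oracle/Snowflake syntax cited in the ticket:
{code:java}
// Assumed shape: EXECUTE IMMEDIATE <query-string> [USING <arg>, ...]
val rows = spark.sql(
  "EXECUTE IMMEDIATE 'SELECT name FROM users WHERE id = ?' USING 42")
rows.show()
{code}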
[jira] [Updated] (SPARK-46092) Overflow in Parquet row group filter creation causes incorrect results
[ https://issues.apache.org/jira/browse/SPARK-46092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-46092: -- Fix Version/s: 3.4.3 > Overflow in Parquet row group filter creation causes incorrect results > -- > > Key: SPARK-46092 > URL: https://issues.apache.org/jira/browse/SPARK-46092 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: Johan Lasperas >Assignee: Johan Lasperas >Priority: Major > Labels: correctness, pull-request-available > Fix For: 4.0.0, 3.5.1, 3.4.3 > > > While the Parquet readers don't support reading Parquet values into larger > Spark types, it's possible to trigger an overflow when creating a Parquet row > group filter that will then incorrectly skip row groups and bypass the > exception in the reader. > Repro: > {code:java} > Seq(0).toDF("a").write.parquet(path) > spark.read.schema("a LONG").parquet(path).where(s"a < > ${Long.MaxValue}").collect(){code} > This succeeds and returns no results. This should either fail if the Parquet > reader doesn't support the upcast from int to long or produce result `[0]` if > it does. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46092) Overflow in Parquet row group filter creation causes incorrect results
[ https://issues.apache.org/jira/browse/SPARK-46092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-46092: -- Fix Version/s: 3.5.1 > Overflow in Parquet row group filter creation causes incorrect results > -- > > Key: SPARK-46092 > URL: https://issues.apache.org/jira/browse/SPARK-46092 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: Johan Lasperas >Assignee: Johan Lasperas >Priority: Major > Labels: correctness, pull-request-available > Fix For: 4.0.0, 3.5.1 > > > While the Parquet readers don't support reading Parquet values into larger > Spark types, it's possible to trigger an overflow when creating a Parquet row > group filter that will then incorrectly skip row groups and bypass the > exception in the reader. > Repro: > {code:java} > Seq(0).toDF("a").write.parquet(path) > spark.read.schema("a LONG").parquet(path).where(s"a < > ${Long.MaxValue}").collect(){code} > This succeeds and returns no results. This should either fail if the Parquet > reader doesn't support the upcast from int to long or produce result `[0]` if > it does. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46231) Migrate all remaining NotImplementedError & TypeError into PySpark error framework
[ https://issues.apache.org/jira/browse/SPARK-46231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-46231. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44148 [https://github.com/apache/spark/pull/44148] > Migrate all remaining NotImplementedError & TypeError into PySpark error > framework > -- > > Key: SPARK-46231 > URL: https://issues.apache.org/jira/browse/SPARK-46231 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46231) Migrate all remaining NotImplementedError & TypeError into PySpark error framework
[ https://issues.apache.org/jira/browse/SPARK-46231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-46231: - Assignee: Haejoon Lee > Migrate all remaining NotImplementedError & TypeError into PySpark error > framework > -- > > Key: SPARK-46231 > URL: https://issues.apache.org/jira/browse/SPARK-46231 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46237) Make `HiveDDLSuite` independently testable
[ https://issues.apache.org/jira/browse/SPARK-46237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-46237: -- Summary: Make `HiveDDLSuite` independently testable (was: Fix test failed of `HiveDDLSuite`) > Make `HiveDDLSuite` independently testable > -- > > Key: SPARK-46237 > URL: https://issues.apache.org/jira/browse/SPARK-46237 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > Run ` > build/sbt "hive/testOnly org.apache.spark.sql.hive.execution.HiveDDLSuite" > -Phive > ` > {code:java} > [info] - SPARK-34261: Avoid side effect if create exists temporary function > *** FAILED *** (4 milliseconds) > [info] java.util.NoSuchElementException: key not found: default > [info] at scala.collection.MapOps.default(Map.scala:274) > [info] at scala.collection.MapOps.default$(Map.scala:273) > [info] at scala.collection.AbstractMap.default(Map.scala:405) > [info] at scala.collection.MapOps.apply(Map.scala:176) > [info] at scala.collection.MapOps.apply$(Map.scala:175) > [info] at scala.collection.AbstractMap.apply(Map.scala:405) > [info] at > org.apache.spark.sql.hive.execution.HiveDDLSuite.$anonfun$new$445(HiveDDLSuite.scala:3275) > [info] at > org.apache.spark.sql.test.SQLTestUtilsBase.withUserDefinedFunction(SQLTestUtils.scala:256) > [info] at > org.apache.spark.sql.test.SQLTestUtilsBase.withUserDefinedFunction$(SQLTestUtils.scala:254) > [info] at > org.apache.spark.sql.execution.command.DDLSuite.withUserDefinedFunction(DDLSuite.scala:326) > [info] at > org.apache.spark.sql.hive.execution.HiveDDLSuite.$anonfun$new$444(HiveDDLSuite.scala:3267) > [info] at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18) > [info] at org.scalatest.enablers.Timed$$anon$1.timeoutAfter(Timed.scala:127) > [info] at > org.scalatest.concurrent.TimeLimits$.failAfterImpl(TimeLimits.scala:282) > [info] at > org.scalatest.concurrent.TimeLimits.failAfter(TimeLimits.scala:231) > [info] at > org.scalatest.concurrent.TimeLimits.failAfter$(TimeLimits.scala:230) > [info] at org.apache.spark.SparkFunSuite.failAfter(SparkFunSuite.scala:69) > [info] at > org.apache.spark.SparkFunSuite.$anonfun$test$2(SparkFunSuite.scala:155) > [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > [info] at org.scalatest.Transformer.apply(Transformer.scala:22) > [info] at org.scalatest.Transformer.apply(Transformer.scala:20) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226) > [info] at > org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:227) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236) > [info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218) > [info] at > org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:69) > [info] at > org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234) > [info] at > 
org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227) > [info] at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:69) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269) > [info] at > org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413) > [info] at scala.collection.immutable.List.foreach(List.scala:333) > [info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > [info] at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396) > [info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268) > [info] at > org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564) > [info] at org.scalatest.Suite.run(Suite.scala:1114) > [info] at org.scalatest.Suite.run$(Suite.scala:1096) > [info] at > org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1564) > [info] at > org.scalatest.funsuite.AnyFunSuit
[jira] [Assigned] (SPARK-46237) Fix test failed of `HiveDDLSuite`
[ https://issues.apache.org/jira/browse/SPARK-46237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-46237: - Assignee: Yang Jie > Fix test failed of `HiveDDLSuite` > - > > Key: SPARK-46237 > URL: https://issues.apache.org/jira/browse/SPARK-46237 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Labels: pull-request-available > > Run ` > build/sbt "hive/testOnly org.apache.spark.sql.hive.execution.HiveDDLSuite" > -Phive > ` > {code:java} > [info] - SPARK-34261: Avoid side effect if create exists temporary function > *** FAILED *** (4 milliseconds) > [info] java.util.NoSuchElementException: key not found: default > [info] at scala.collection.MapOps.default(Map.scala:274) > [info] at scala.collection.MapOps.default$(Map.scala:273) > [info] at scala.collection.AbstractMap.default(Map.scala:405) > [info] at scala.collection.MapOps.apply(Map.scala:176) > [info] at scala.collection.MapOps.apply$(Map.scala:175) > [info] at scala.collection.AbstractMap.apply(Map.scala:405) > [info] at > org.apache.spark.sql.hive.execution.HiveDDLSuite.$anonfun$new$445(HiveDDLSuite.scala:3275) > [info] at > org.apache.spark.sql.test.SQLTestUtilsBase.withUserDefinedFunction(SQLTestUtils.scala:256) > [info] at > org.apache.spark.sql.test.SQLTestUtilsBase.withUserDefinedFunction$(SQLTestUtils.scala:254) > [info] at > org.apache.spark.sql.execution.command.DDLSuite.withUserDefinedFunction(DDLSuite.scala:326) > [info] at > org.apache.spark.sql.hive.execution.HiveDDLSuite.$anonfun$new$444(HiveDDLSuite.scala:3267) > [info] at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18) > [info] at org.scalatest.enablers.Timed$$anon$1.timeoutAfter(Timed.scala:127) > [info] at > org.scalatest.concurrent.TimeLimits$.failAfterImpl(TimeLimits.scala:282) > [info] at > org.scalatest.concurrent.TimeLimits.failAfter(TimeLimits.scala:231) > [info] at > org.scalatest.concurrent.TimeLimits.failAfter$(TimeLimits.scala:230) > [info] at org.apache.spark.SparkFunSuite.failAfter(SparkFunSuite.scala:69) > [info] at > org.apache.spark.SparkFunSuite.$anonfun$test$2(SparkFunSuite.scala:155) > [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > [info] at org.scalatest.Transformer.apply(Transformer.scala:22) > [info] at org.scalatest.Transformer.apply(Transformer.scala:20) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226) > [info] at > org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:227) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236) > [info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218) > [info] at > org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:69) > [info] at > org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234) > [info] at > org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227) > [info] at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:69) > 
[info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269) > [info] at > org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413) > [info] at scala.collection.immutable.List.foreach(List.scala:333) > [info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > [info] at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396) > [info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268) > [info] at > org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564) > [info] at org.scalatest.Suite.run(Suite.scala:1114) > [info] at org.scalatest.Suite.run$(Suite.scala:1096) > [info] at > org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1564) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:273) > [info] at org.scalatest.SuperEngine.runImpl(Engine.scala:535) > [i
[jira] [Resolved] (SPARK-46237) Fix test failed of `HiveDDLSuite`
[ https://issues.apache.org/jira/browse/SPARK-46237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-46237. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44153 [https://github.com/apache/spark/pull/44153] > Fix test failed of `HiveDDLSuite` > - > > Key: SPARK-46237 > URL: https://issues.apache.org/jira/browse/SPARK-46237 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > Run ` > build/sbt "hive/testOnly org.apache.spark.sql.hive.execution.HiveDDLSuite" > -Phive > ` > {code:java} > [info] - SPARK-34261: Avoid side effect if create exists temporary function > *** FAILED *** (4 milliseconds) > [info] java.util.NoSuchElementException: key not found: default > [info] at scala.collection.MapOps.default(Map.scala:274) > [info] at scala.collection.MapOps.default$(Map.scala:273) > [info] at scala.collection.AbstractMap.default(Map.scala:405) > [info] at scala.collection.MapOps.apply(Map.scala:176) > [info] at scala.collection.MapOps.apply$(Map.scala:175) > [info] at scala.collection.AbstractMap.apply(Map.scala:405) > [info] at > org.apache.spark.sql.hive.execution.HiveDDLSuite.$anonfun$new$445(HiveDDLSuite.scala:3275) > [info] at > org.apache.spark.sql.test.SQLTestUtilsBase.withUserDefinedFunction(SQLTestUtils.scala:256) > [info] at > org.apache.spark.sql.test.SQLTestUtilsBase.withUserDefinedFunction$(SQLTestUtils.scala:254) > [info] at > org.apache.spark.sql.execution.command.DDLSuite.withUserDefinedFunction(DDLSuite.scala:326) > [info] at > org.apache.spark.sql.hive.execution.HiveDDLSuite.$anonfun$new$444(HiveDDLSuite.scala:3267) > [info] at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18) > [info] at org.scalatest.enablers.Timed$$anon$1.timeoutAfter(Timed.scala:127) > [info] at > org.scalatest.concurrent.TimeLimits$.failAfterImpl(TimeLimits.scala:282) > [info] at > org.scalatest.concurrent.TimeLimits.failAfter(TimeLimits.scala:231) > [info] at > org.scalatest.concurrent.TimeLimits.failAfter$(TimeLimits.scala:230) > [info] at org.apache.spark.SparkFunSuite.failAfter(SparkFunSuite.scala:69) > [info] at > org.apache.spark.SparkFunSuite.$anonfun$test$2(SparkFunSuite.scala:155) > [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > [info] at org.scalatest.Transformer.apply(Transformer.scala:22) > [info] at org.scalatest.Transformer.apply(Transformer.scala:20) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226) > [info] at > org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:227) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236) > [info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218) > [info] at > org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:69) > [info] at > org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234) > [info] at > 
org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227) > [info] at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:69) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269) > [info] at > org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413) > [info] at scala.collection.immutable.List.foreach(List.scala:333) > [info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > [info] at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396) > [info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268) > [info] at > org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564) > [info] at org.scalatest.Suite.run(Suite.scala:1114) > [info] at org.scalatest.Suite.run$(Suite.scala:1096) > [info] at > org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1564) > [info] at > org.scalatest.f
[jira] [Commented] (SPARK-45644) After upgrading to Spark 3.4.1 and 3.5.0 we receive RuntimeException "scala.Some is not a valid external type for schema of array"
[ https://issues.apache.org/jira/browse/SPARK-45644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17792942#comment-17792942 ] Bruce Robbins commented on SPARK-45644: --- Even though this is the original issue, I closed it as a duplicate because the fix was applied under SPARK-45896. > After upgrading to Spark 3.4.1 and 3.5.0 we receive RuntimeException > "scala.Some is not a valid external type for schema of array" > -- > > Key: SPARK-45644 > URL: https://issues.apache.org/jira/browse/SPARK-45644 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 3.4.1, 3.5.0 >Reporter: Adi Wehrli >Priority: Major > > I do not really know if this is a bug, but I am at the end with my knowledge. > A Spark job ran successfully with Spark 3.2.x and 3.3.x. > But after upgrading to 3.4.1 (as well as with 3.5.0) running the same job > with the same data the following always occurs now: > {code} > scala.Some is not a valid external type for schema of array > {code} > The corresponding stacktrace is: > {code} > 2023-10-24T06:28:50.932 level=ERROR logger=org.apache.spark.executor.Executor > msg="Exception in task 0.0 in stage 0.0 (TID 0)" thread="Executor task launch > worker for task 0.0 in stage 0.0 (TID 0)" > java.lang.RuntimeException: scala.Some is not a valid external type for > schema of array > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.MapObjects_10$(Unknown > Source) ~[?:?] > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.ExternalMapToCatalyst_1$(Unknown > Source) ~[?:?] > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.createNamedStruct_14_3$(Unknown > Source) ~[?:?] > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.If_12$(Unknown > Source) ~[?:?] > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown > Source) ~[?:?] > at > org.apache.spark.sql.execution.ObjectOperator$.$anonfun$serializeObjectToRow$1(objects.scala:165) > ~[spark-sql_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.sql.execution.AppendColumnsWithObjectExec.$anonfun$doExecute$15(objects.scala:380) > ~[spark-sql_2.12-3.5.0.jar:3.5.0] > at scala.collection.Iterator$$anon$10.next(Iterator.scala:461) > ~[scala-library-2.12.15.jar:?] > at scala.collection.Iterator$$anon$10.next(Iterator.scala:461) > ~[scala-library-2.12.15.jar:?] 
> at > org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:169) > ~[spark-core_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59) > ~[spark-core_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:104) > ~[spark-core_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54) > ~[spark-core_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161) > ~[spark-core_2.12-3.5.0.jar:3.5.0] > at org.apache.spark.scheduler.Task.run(Task.scala:141) > ~[spark-core_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620) > ~[spark-core_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64) > ~[spark-common-utils_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61) > ~[spark-common-utils_2.12-3.5.0.jar:3.5.0] > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94) > ~[spark-core_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623) > [spark-core_2.12-3.5.0.jar:3.5.0] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > [?:?] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > [?:?] > at java.lang.Thread.run(Thread.java:834) [?:?] > 2023-10-24T06:28:50.932 level=ERROR logger=org.apache.spark.executor.Executor > msg="Exception in task 1.0 in stage 0.0 (TID 1)" thread="Executor task launch > worker for task 1.0 in stage 0.0 (TID 1)" > java.lang.RuntimeException: scala.Some is not a valid external type for > schema of array > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.MapObjects_10$(Unknown > Source) ~[?:?] > at > org.apache.s
[jira] [Resolved] (SPARK-45644) After upgrading to Spark 3.4.1 and 3.5.0 we receive RuntimeException "scala.Some is not a valid external type for schema of array"
[ https://issues.apache.org/jira/browse/SPARK-45644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins resolved SPARK-45644. --- Resolution: Duplicate > After upgrading to Spark 3.4.1 and 3.5.0 we receive RuntimeException > "scala.Some is not a valid external type for schema of array" > -- > > Key: SPARK-45644 > URL: https://issues.apache.org/jira/browse/SPARK-45644 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 3.4.1, 3.5.0 >Reporter: Adi Wehrli >Priority: Major > > I do not really know if this is a bug, but I am at the end with my knowledge. > A Spark job ran successfully with Spark 3.2.x and 3.3.x. > But after upgrading to 3.4.1 (as well as with 3.5.0) running the same job > with the same data the following always occurs now: > {code} > scala.Some is not a valid external type for schema of array > {code} > The corresponding stacktrace is: > {code} > 2023-10-24T06:28:50.932 level=ERROR logger=org.apache.spark.executor.Executor > msg="Exception in task 0.0 in stage 0.0 (TID 0)" thread="Executor task launch > worker for task 0.0 in stage 0.0 (TID 0)" > java.lang.RuntimeException: scala.Some is not a valid external type for > schema of array > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.MapObjects_10$(Unknown > Source) ~[?:?] > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.ExternalMapToCatalyst_1$(Unknown > Source) ~[?:?] > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.createNamedStruct_14_3$(Unknown > Source) ~[?:?] > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.If_12$(Unknown > Source) ~[?:?] > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown > Source) ~[?:?] > at > org.apache.spark.sql.execution.ObjectOperator$.$anonfun$serializeObjectToRow$1(objects.scala:165) > ~[spark-sql_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.sql.execution.AppendColumnsWithObjectExec.$anonfun$doExecute$15(objects.scala:380) > ~[spark-sql_2.12-3.5.0.jar:3.5.0] > at scala.collection.Iterator$$anon$10.next(Iterator.scala:461) > ~[scala-library-2.12.15.jar:?] > at scala.collection.Iterator$$anon$10.next(Iterator.scala:461) > ~[scala-library-2.12.15.jar:?] 
> at > org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:169) > ~[spark-core_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59) > ~[spark-core_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:104) > ~[spark-core_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54) > ~[spark-core_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161) > ~[spark-core_2.12-3.5.0.jar:3.5.0] > at org.apache.spark.scheduler.Task.run(Task.scala:141) > ~[spark-core_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620) > ~[spark-core_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64) > ~[spark-common-utils_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61) > ~[spark-common-utils_2.12-3.5.0.jar:3.5.0] > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94) > ~[spark-core_2.12-3.5.0.jar:3.5.0] > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623) > [spark-core_2.12-3.5.0.jar:3.5.0] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > [?:?] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > [?:?] > at java.lang.Thread.run(Thread.java:834) [?:?] > 2023-10-24T06:28:50.932 level=ERROR logger=org.apache.spark.executor.Executor > msg="Exception in task 1.0 in stage 0.0 (TID 1)" thread="Executor task launch > worker for task 1.0 in stage 0.0 (TID 1)" > java.lang.RuntimeException: scala.Some is not a valid external type for > schema of array > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.MapObjects_10$(Unknown > Source) ~[?:?] > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.ExternalMapToCatalyst_1$(Unknown > Source) ~[?:?] > at > org.apache
[jira] [Created] (SPARK-46247) Invalid bucket file error when reading from bucketed table created with PathOutputCommitProtocol
Никита Соколов created SPARK-46247: -- Summary: Invalid bucket file error when reading from bucketed table created with PathOutputCommitProtocol Key: SPARK-46247 URL: https://issues.apache.org/jira/browse/SPARK-46247 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.5.0 Reporter: Никита Соколов I am trying to create an external partitioned bucketed table using this code: {code:java} spark.read.parquet("s3://faucct/input") .repartition(128, col("product_id")) .write.partitionBy("features_date").bucketBy(128, "product_id") .option("path", "s3://faucct/tmp/output") .option("compression", "uncompressed") .saveAsTable("tmp.output"){code} At first it took more time than expected because it had to rename a lot of files in the end, which requires copying in S3. But I have used the configuration from the documentation – [https://spark.apache.org/docs/3.0.0-preview/cloud-integration.html#committing-work-into-cloud-storage-safely-and-fast]: {code:java} spark.hadoop.fs.s3a.committer.name directory spark.sql.sources.commitProtocolClass org.apache.spark.internal.io.cloud.PathOutputCommitProtocol spark.sql.parquet.output.committer.class org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter {code} It is properly partitioned: every partition_date has exactly 128 files named like [part-00117-43293810-d0e9-4eee-9be8-e9e50a3e10fd_00117-5eb66a54-2fbb-4775-8f3b-3040b2966a71.c000.parquet|https://s3.console.aws.amazon.com/s3/object/joom-analytics-recom?region=eu-central-1&prefix=recom/dataset/best/best-to-cart-rt/user-product-v4/to_cart-faucct/fnw/ipw/msv2/2023-09-15/14d/tmp_3/features_date%3D2023-09-01/part-00117-43293810-d0e9-4eee-9be8-e9e50a3e10fd_00117-5eb66a54-2fbb-4775-8f3b-3040b2966a71.c000.parquet]. Then I am trying to join this table with another one, for example like this: {code:java} spark.table("tmp.output").repartition(128, $"product_id") .join(spark.table("tmp.output").repartition(128, $"product_id"), Seq("product_id")).count(){code} Because of the configuration I get the following errors: {code:java} org.apache.spark.SparkException: [INVALID_BUCKET_FILE] Invalid bucket file: s3://faucct/tmp/output/features_date=2023-09-01/part-0-43293810-d0e9-4eee-9be8-e9e50a3e10fd_0-5eb66a54-2fbb-4775-8f3b-3040b2966a71.c000.parquet. at org.apache.spark.sql.errors.QueryExecutionErrors$.invalidBucketFile(QueryExecutionErrors.scala:2731) at org.apache.spark.sql.execution.FileSourceScanExec.$anonfun$createBucketedReadRDD$5(DataSourceScanExec.scala:636) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
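Editor's note: a sketch of why the scan rejects these files, under the assumption that the failing step is recovering the bucket id from the file name (Spark's bucketed scan does this via {{BucketingUtils.getBucketId}}); a regex of that shape parses the default committer's names but not the doubled-UUID names above:
{code:java}
// Assumed equivalent of the internal bucket-id parse: trailing "_<digits>".
val bucketId = """.*_(\d+)(?:\..*)?$""".r

"part-00117-43293810_00117.c000.parquet" match {
  case bucketId(id) => println(s"bucket $id") // bucket 00117
  case _            => println("no bucket id")
}
"part-00117-43293810_00117-5eb66a54.c000.parquet" match {
  case bucketId(id) => println(s"bucket $id")
  case _            => println("no bucket id") // parse fails -> INVALID_BUCKET_FILE
}
{code}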
[jira] [Created] (SPARK-46246) Support EXECUTE IMMEDIATE syntax in Spark SQL
Milan Stefanovic created SPARK-46246: Summary: Support EXECUTE IMMEDIATE syntax in Spark SQL Key: SPARK-46246 URL: https://issues.apache.org/jira/browse/SPARK-46246 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.5.0 Reporter: Milan Stefanovic Introducing new EXECUTE IMMEDIATE syntax to support parameterized queries from within SQL. This API executes a query passed as a string, with arguments. Other DBs that support this: * [Oracle|https://docs.oracle.com/cd/B13789_01/appdev.101/b10807/13_elems017.htm] * [Snowflake|https://docs.snowflake.com/en/sql-reference/sql/execute-immediate] * [PgSql|https://www.postgresql.org/docs/current/ecpg-sql-execute-immediate.html] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44900) Cached DataFrame keeps growing
[ https://issues.apache.org/jira/browse/SPARK-44900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17792872#comment-17792872 ] 王范明 commented on SPARK-44900: - I have analyzed the program execution details in the logs and it seems that there is an issue with the 'org.apache.spark.status.AppStatusListener#updateRDDBlock' method. The method directly calculates the usage of {{rdd.memoryUsed}} and {{rdd.diskUsed}}, but it does not pay sufficient attention to the {{storageLevel}}. > Cached DataFrame keeps growing > -- > > Key: SPARK-44900 > URL: https://issues.apache.org/jira/browse/SPARK-44900 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Varun Nalla >Priority: Blocker > > Scenario : > We have a Kafka streaming application where the data lookups are happening by > joining another DF which is cached, and the caching strategy is > MEMORY_AND_DISK. > However the size of the cached DataFrame keeps on growing for every micro > batch the streaming application processes, and that's visible under the > Storage tab. > A similar Stack Overflow thread was already raised: > https://stackoverflow.com/questions/55601779/spark-dataframe-cache-keeps-growing -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46245) Replace `s.c.MapOps.view.filterKeys` with `s.c.MapOps.filter`
[ https://issues.apache.org/jira/browse/SPARK-46245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46245: --- Labels: pull-request-available (was: ) > Replace `s.c.MapOps.view.filterKeys` with `s.c.MapOps.filter` > -- > > Key: SPARK-46245 > URL: https://issues.apache.org/jira/browse/SPARK-46245 > Project: Spark > Issue Type: Improvement > Components: k8s, Spark Core, SQL, YARN >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46245) Replace `s.c.MapOps.view.filterKeys` with `s.c.MapOps.filter`
Yang Jie created SPARK-46245: Summary: Replace `s.c.MapOps.view.filterKeys` with `s.c.MapOps.filter` Key: SPARK-46245 URL: https://issues.apache.org/jira/browse/SPARK-46245 Project: Spark Issue Type: Improvement Components: k8s, Spark Core, SQL, YARN Affects Versions: 4.0.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
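For illustration (not from the ticket): in Scala 2.13, `view.filterKeys` returns a lazy MapView that must be converted back into a strict Map, whereas `filter` on the Map does the same work in a single strict pass.

{code:java}
// Equivalent forms; the second avoids the lazy MapView round-trip.
val m = Map("a" -> 1, "b" -> 2)
val viaView   = m.view.filterKeys(_ != "a").toMap      // view + forced conversion
val viaFilter = m.filter { case (k, _) => k != "a" }   // strict, single pass
{code}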
[jira] [Updated] (SPARK-46244) INSERT/UPDATE * in MERGE should follow the same semantics as INSERT BY NAME
[ https://issues.apache.org/jira/browse/SPARK-46244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46244: --- Labels: pull-request-available (was: ) > INSERT/UPDATE * in MERGE should follow the same semantics as INSERT BY NAME > -- > > Key: SPARK-46244 > URL: https://issues.apache.org/jira/browse/SPARK-46244 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.1 >Reporter: Wenchen Fan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46244) INSERT/UPDATE * in MERGE should follow the same semantics as INSERT BY NAME
Wenchen Fan created SPARK-46244: --- Summary: INSERT/UPDATE * in MERGE should follow the same semantics as INSERT BY NAME Key: SPARK-46244 URL: https://issues.apache.org/jira/browse/SPARK-46244 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.5.1 Reporter: Wenchen Fan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
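To make the semantics question concrete, a hedged sketch (table and column names are hypothetical): INSERT BY NAME matches source columns to target columns by name, and the ticket proposes that the star forms in MERGE resolve columns the same way rather than by position.

{code:java}
// Hypothetical illustration of the star forms whose column resolution
// the ticket aligns with INSERT BY NAME.
spark.sql("""
  MERGE INTO target t
  USING source s
  ON t.id = s.id
  WHEN MATCHED THEN UPDATE SET *
  WHEN NOT MATCHED THEN INSERT *
""")
{code}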
[jira] [Updated] (SPARK-46243) Describe arguments of decode()
[ https://issues.apache.org/jira/browse/SPARK-46243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk updated SPARK-46243: - Description: Update the description of the `StringDecode` expression and, accordingly, the `decode()` function by describing the arguments `bin` and `charset`. The ticket aims to improve user experience with Spark SQL by documenting the public function. (was: Update the description of the `Encode` expression and, accordingly, the `encode()` function by describing the arguments `str` and `charset`. The ticket aims to improve user experience with Spark SQL by documenting the public function.) > Describe arguments of decode() > -- > > Key: SPARK-46243 > URL: https://issues.apache.org/jira/browse/SPARK-46243 > Project: Spark > Issue Type: Documentation > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Labels: pull-request-available > > Update the description of the `StringDecode` expression and, accordingly, the > `decode()` function by describing the arguments `bin` and `charset`. The > ticket aims to improve user experience with Spark SQL by documenting the > public function. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
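For reference, a round-trip example of the function being documented: `encode` produces a binary value in the given charset, and `decode(bin, charset)` converts it back to a string.

{code:java}
// decode(bin, charset): bin is the binary input, charset names the encoding.
spark.sql("SELECT decode(encode('abc', 'utf-8'), 'utf-8')").show() // -> abc
{code}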
[jira] [Created] (SPARK-46243) Describe arguments of decode()
Max Gekk created SPARK-46243: Summary: Describe arguments of decode() Key: SPARK-46243 URL: https://issues.apache.org/jira/browse/SPARK-46243 Project: Spark Issue Type: Documentation Components: SQL Affects Versions: 4.0.0 Reporter: Max Gekk Assignee: Max Gekk Fix For: 4.0.0 Update the description of the `Encode` expression and, accordingly, the `encode()` function by describing the arguments `str` and `charset`. The ticket aims to improve user experience with Spark SQL by documenting the public function. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46243) Describe arguments of decode()
[ https://issues.apache.org/jira/browse/SPARK-46243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk updated SPARK-46243: - Fix Version/s: (was: 4.0.0) > Describe arguments of decode() > -- > > Key: SPARK-46243 > URL: https://issues.apache.org/jira/browse/SPARK-46243 > Project: Spark > Issue Type: Documentation > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Labels: pull-request-available > > Update the description of the `Encode` expression and, accordingly, the > `encode()` function by describing the arguments `str` and `charset`. The > ticket aims to improve user experience with Spark SQL by documenting the > public function. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46234) Introduce PySparkKeyError for PySpark error framework
[ https://issues.apache.org/jira/browse/SPARK-46234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-46234. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44151 [https://github.com/apache/spark/pull/44151] > Introduce PySparkKeyError for PySpark error framework > - > > Key: SPARK-46234 > URL: https://issues.apache.org/jira/browse/SPARK-46234 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46234) Introduce PySparkKeyError for PySpark error framework
[ https://issues.apache.org/jira/browse/SPARK-46234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-46234: - Assignee: Haejoon Lee > Introduce PySparkKeyError for PySpark error framework > - > > Key: SPARK-46234 > URL: https://issues.apache.org/jira/browse/SPARK-46234 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46242) Enable all Kinesis tests in GitHub Actions by default
junyuc25 created SPARK-46242: Summary: Enable all Kinesis tests in GitHub Actions by default Key: SPARK-46242 URL: https://issues.apache.org/jira/browse/SPARK-46242 Project: Spark Issue Type: Improvement Components: Project Infra Affects Versions: 3.5.0 Reporter: junyuc25 This ticket was created per the discussion in this PR: [https://github.com/apache/spark/pull/43736#issuecomment-1833368339]. Some Kinesis tests require interaction with the Amazon Kinesis service, which would incur billing costs for users, so they are not enabled by default. Further investigation is needed to figure out a way to run all Kinesis tests in GitHub Actions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46241) Fix error handling routine so it wouldn't fall into infinite recursion
[ https://issues.apache.org/jira/browse/SPARK-46241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46241: --- Labels: pull-request-available (was: ) > Fix error handling routine so it wouldn't fall into infinite recursion > -- > > Key: SPARK-46241 > URL: https://issues.apache.org/jira/browse/SPARK-46241 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 4.0.0 >Reporter: Alice Sayutina >Priority: Major > Labels: pull-request-available > > Currently, we can fall into infinite recursion as follows: > {quote}[Some error happens] -> _handle_error -> _handle_rpc_error -> > _display_server_stack_trace -> RuntimeConf.get -> SparkConnectClient.config > -> [An error happens] -> _handle_error.{quote} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46186) Invalid Spark Connect execution state transition if interrupted before thread started
[ https://issues.apache.org/jira/browse/SPARK-46186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46186. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44095 [https://github.com/apache/spark/pull/44095] > Invalid Spark Connect execution state transition if interrupted before thread > started > - > > Key: SPARK-46186 > URL: https://issues.apache.org/jira/browse/SPARK-46186 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 4.0.0 >Reporter: Juliusz Sompolski >Assignee: Juliusz Sompolski >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Fix edge case where interrupting execution before the ExecuteThreadRunner > started could lead to illegal state transition. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46241) Fix error handling routine so it wouldn't fall into infinite recursion
Alice Sayutina created SPARK-46241: -- Summary: Fix error handling routine so it wouldn't fall into infinite recursion Key: SPARK-46241 URL: https://issues.apache.org/jira/browse/SPARK-46241 Project: Spark Issue Type: Bug Components: Connect Affects Versions: 4.0.0 Reporter: Alice Sayutina Currently, we can fall into infinite recursion as follows: {quote}[Some error happens] -> _handle_error -> _handle_rpc_error -> _display_server_stack_trace -> RuntimeConf.get -> SparkConnectClient.config -> [An error happens] -> _handle_error.{quote} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46240) Add PrepExecutedPlanRule to SparkSessionExtensions
jiang13021 created SPARK-46240: -- Summary: Add PrepExecutedPlanRule to SparkSessionExtensions Key: SPARK-46240 URL: https://issues.apache.org/jira/browse/SPARK-46240 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.4.0, 3.3.0, 3.2.0 Reporter: jiang13021 Some rules (Rule[SparkPlan]) are applied when preparing the executedPlan. However, users do not have the ability to add rules in this context. {code:java} // org.apache.spark.sql.execution.QueryExecution#preparations private[execution] def preparations( sparkSession: SparkSession, adaptiveExecutionRule: Option[InsertAdaptiveSparkPlan] = None, subquery: Boolean): Seq[Rule[SparkPlan]] = { // `AdaptiveSparkPlanExec` is a leaf node. If inserted, all the following rules will be no-op // as the original plan is hidden behind `AdaptiveSparkPlanExec`. adaptiveExecutionRule.toSeq ++ Seq( CoalesceBucketsInJoin, PlanDynamicPruningFilters(sparkSession), PlanSubqueries(sparkSession), RemoveRedundantProjects, EnsureRequirements(), // `ReplaceHashWithSortAgg` needs to be added after `EnsureRequirements` to guarantee the // sort order of each node is checked to be valid. ReplaceHashWithSortAgg, // `RemoveRedundantSorts` needs to be added after `EnsureRequirements` to guarantee the same // number of partitions when instantiating PartitioningCollection. RemoveRedundantSorts, DisableUnnecessaryBucketedScan, ApplyColumnarRulesAndInsertTransitions( sparkSession.sessionState.columnarRules, outputsColumnar = false), CollapseCodegenStages()) ++ (if (subquery) { Nil } else { Seq(ReuseExchangeAndSubquery) }) }{code} We could add an extension called "PrepExecutedPlanRule" to SparkSessionExtensions, which would allow users to add their own rules. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
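A sketch of how the proposed extension point might be used. Note that `injectPrepExecutedPlanRule` is the ticket's proposal, modeled here on the existing `injectQueryStagePrepRule` builder signature; it is not a method that exists in SparkSessionExtensions today.

{code:java}
import org.apache.spark.sql.SparkSessionExtensions
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.execution.SparkPlan

// Hypothetical API: injectPrepExecutedPlanRule is the extension point this
// ticket proposes; the rule below is a no-op placeholder.
class MyExtensions extends (SparkSessionExtensions => Unit) {
  override def apply(extensions: SparkSessionExtensions): Unit = {
    extensions.injectPrepExecutedPlanRule { session =>
      new Rule[SparkPlan] {
        override def apply(plan: SparkPlan): SparkPlan = plan
      }
    }
  }
}
{code}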
[jira] [Updated] (SPARK-46240) Add PrepExecutedPlanRule to SparkSessionExtensions
[ https://issues.apache.org/jira/browse/SPARK-46240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiang13021 updated SPARK-46240: --- Description: Some rules (Rule[SparkPlan]) are applied when preparing the executedPlan. However, users do not have the ability to add rules in this context. {code:java} // org.apache.spark.sql.execution.QueryExecution#preparations private[execution] def preparations( sparkSession: SparkSession, adaptiveExecutionRule: Option[InsertAdaptiveSparkPlan] = None, subquery: Boolean): Seq[Rule[SparkPlan]] = { // `AdaptiveSparkPlanExec` is a leaf node. If inserted, all the following rules will be no-op // as the original plan is hidden behind `AdaptiveSparkPlanExec`. adaptiveExecutionRule.toSeq ++ Seq( CoalesceBucketsInJoin, PlanDynamicPruningFilters(sparkSession), PlanSubqueries(sparkSession), RemoveRedundantProjects, EnsureRequirements(), // `ReplaceHashWithSortAgg` needs to be added after `EnsureRequirements` to guarantee the // sort order of each node is checked to be valid. ReplaceHashWithSortAgg, // `RemoveRedundantSorts` needs to be added after `EnsureRequirements` to guarantee the same // number of partitions when instantiating PartitioningCollection. RemoveRedundantSorts, DisableUnnecessaryBucketedScan, ApplyColumnarRulesAndInsertTransitions( sparkSession.sessionState.columnarRules, outputsColumnar = false), CollapseCodegenStages()) ++ (if (subquery) { Nil } else { Seq(ReuseExchangeAndSubquery) }) }{code} We could add an extension called "PrepExecutedPlanRule" to SparkSessionExtensions, which would allow users to add their own rules. was: Some rules (Rule[SparkPlan]) are applied when preparing the executedPlan. However, users do not have the ability to add rules in this context. {code:java} // org.apache.spark.sql.execution.QueryExecution#preparations private[execution] def preparations( sparkSession: SparkSession, adaptiveExecutionRule: Option[InsertAdaptiveSparkPlan] = None, subquery: Boolean): Seq[Rule[SparkPlan]] = { // `AdaptiveSparkPlanExec` is a leaf node. If inserted, all the following rules will be no-op // as the original plan is hidden behind `AdaptiveSparkPlanExec`. adaptiveExecutionRule.toSeq ++ Seq( CoalesceBucketsInJoin, PlanDynamicPruningFilters(sparkSession), PlanSubqueries(sparkSession), RemoveRedundantProjects, EnsureRequirements(), // `ReplaceHashWithSortAgg` needs to be added after `EnsureRequirements` to guarantee the // sort order of each node is checked to be valid. ReplaceHashWithSortAgg, // `RemoveRedundantSorts` needs to be added after `EnsureRequirements` to guarantee the same // number of partitions when instantiating PartitioningCollection. RemoveRedundantSorts, DisableUnnecessaryBucketedScan, ApplyColumnarRulesAndInsertTransitions( sparkSession.sessionState.columnarRules, outputsColumnar = false), CollapseCodegenStages()) ++ (if (subquery) { Nil } else { Seq(ReuseExchangeAndSubquery) }) }{code} We could add an extension called "PrepExecutedPlanRule" to SparkSessionExtensions, which would allow users to add their own rules. > Add PrepExecutedPlanRule to SparkSessionExtensions > -- > > Key: SPARK-46240 > URL: https://issues.apache.org/jira/browse/SPARK-46240 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0, 3.3.0, 3.4.0 >Reporter: jiang13021 >Priority: Major > > Some rules (Rule[SparkPlan]) are applied when preparing the executedPlan. > However, users do not have the ability to add rules in this context.
> {code:java} > // org.apache.spark.sql.execution.QueryExecution#preparations > private[execution] def preparations( > sparkSession: SparkSession, > adaptiveExecutionRule: Option[InsertAdaptiveSparkPlan] = None, > subquery: Boolean): Seq[Rule[SparkPlan]] = { > // `AdaptiveSparkPlanExec` is a leaf node. If inserted, all the following > rules will be no-op > // as the original plan is hidden behind `AdaptiveSparkPlanExec`. > adaptiveExecutionRule.toSeq ++ > Seq( > CoalesceBucketsInJoin, > PlanDynamicPruningFilters(sparkSession), > PlanSubqueries(sparkSession), > RemoveRedundantProjects, > EnsureRequirements(), > // `ReplaceHashWithSortAgg` needs to be added after `EnsureRequirements` > to guarantee the > // sort order of each node is checked to be valid. > ReplaceHashWithSortAgg, > // `RemoveRedundantSorts` needs to be added after `EnsureRequirements` to > guarantee the same > // number of partitions when instantiating PartitioningCollection. > RemoveRedundantSorts, > DisableUnnecessaryBucketedScan, > ApplyColumnarRulesAndInsertTransitions( > sparkSession.sessionState.columnarRules, outputsColumnar = false), > CollapseCodegenStages()) ++ > (if (subquery) { > Nil > } else { > Seq(ReuseExchangeAndSubquery) > }) > }{code} > We could add an extension called "PrepExecutedPlanRule" to SparkSessionExtensions, which would allow users to add their own rules. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46239) Spark jetty exposes version information
[ https://issues.apache.org/jira/browse/SPARK-46239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46239: --- Labels: pull-request-available (was: ) > Spark jetty exposes version information > --- > > Key: SPARK-46239 > URL: https://issues.apache.org/jira/browse/SPARK-46239 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: chenyu >Priority: Major > Labels: pull-request-available > Attachments: screenshot-1.png > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-46239) Spark jetty exposes version information
[ https://issues.apache.org/jira/browse/SPARK-46239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17792746#comment-17792746 ] chenyu commented on SPARK-46239: Exposing version information is unsafe: the version of the remote WWW service can be obtained through HTTP. > Spark jetty exposes version information > --- > > Key: SPARK-46239 > URL: https://issues.apache.org/jira/browse/SPARK-46239 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: chenyu >Priority: Major > Attachments: screenshot-1.png > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
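For reference, Jetty itself can be told not to advertise its version; a minimal sketch of the standard Jetty API follows (whether Spark's fix takes this route is not stated in the ticket).

{code:java}
import org.eclipse.jetty.server.HttpConfiguration

// Stops Jetty from emitting its version in the Server response header
// and in generated error pages.
val httpConfig = new HttpConfiguration()
httpConfig.setSendServerVersion(false)
{code}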