[jira] [Updated] (SPARK-48187) Run `docs` only in PR builders and Java 21 Daily CI
[ https://issues.apache.org/jira/browse/SPARK-48187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48187: --- Labels: pull-request-available (was: ) > Run `docs` only in PR builders and Java 21 Daily CI > --- > > Key: SPARK-48187 > URL: https://issues.apache.org/jira/browse/SPARK-48187 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48138) Disable a flaky `SparkSessionE2ESuite.interrupt tag` test
[ https://issues.apache.org/jira/browse/SPARK-48138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-48138: -- Fix Version/s: 3.5.2 > Disable a flaky `SparkSessionE2ESuite.interrupt tag` test > - > > Key: SPARK-48138 > URL: https://issues.apache.org/jira/browse/SPARK-48138 > Project: Spark > Issue Type: Sub-task > Components: Connect, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0, 3.5.2 > > > - https://github.com/apache/spark/actions/runs/8962353911/job/24611130573 > (Master, 5/5) > - https://github.com/apache/spark/actions/runs/8948176536/job/24581022674 > (Master, 5/4)
[jira] [Updated] (SPARK-48139) Re-enable `SparkSessionE2ESuite.interrupt tag`
[ https://issues.apache.org/jira/browse/SPARK-48139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-48139: -- Affects Version/s: 3.5.2 > Re-enable `SparkSessionE2ESuite.interrupt tag` > -- > > Key: SPARK-48139 > URL: https://issues.apache.org/jira/browse/SPARK-48139 > Project: Spark > Issue Type: Sub-task > Components: Connect, Tests >Affects Versions: 4.0.0, 3.5.2 >Reporter: Dongjoon Hyun >Priority: Blocker >
[jira] [Updated] (SPARK-48037) SortShuffleWriter lacks shuffle write related metrics resulting in potentially inaccurate data
[ https://issues.apache.org/jira/browse/SPARK-48037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-48037: -- Fix Version/s: 3.5.2 > SortShuffleWriter lacks shuffle write related metrics resulting in > potentially inaccurate data > -- > > Key: SPARK-48037 > URL: https://issues.apache.org/jira/browse/SPARK-48037 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 3.3.0, 4.0.0, 3.5.1, 3.4.3 >Reporter: dzcxzl >Assignee: dzcxzl >Priority: Blocker > Labels: correctness, pull-request-available > Fix For: 4.0.0, 3.5.2 > >
[jira] [Updated] (SPARK-48160) XPath expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-48160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-48160: - Component/s: SQL (was: Spark Core) > XPath expressions (all collations) > -- > > Key: SPARK-48160 > URL: https://issues.apache.org/jira/browse/SPARK-48160 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major >
[jira] [Updated] (SPARK-48158) XML expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-48158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-48158: - Component/s: SQL (was: Spark Core) > XML expressions (all collations) > > > Key: SPARK-48158 > URL: https://issues.apache.org/jira/browse/SPARK-48158 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > > Enable collation support for *XML* built-in string functions in Spark > ({*}XmlToStructs{*}, {*}SchemaOfXml{*}, {*}StructsToXml{*}). First confirm > what the expected behaviour is for these functions when given collated > strings, and then move on to implementation and testing. You will find these > expressions in the *xmlExpressions.scala* file, and they should mostly be > pass-through functions. Implement the corresponding E2E SQL tests > (CollationSQLExpressionsSuite) to reflect how these functions should be used > with collation in SparkSQL, and feel free to use your chosen Spark SQL Editor > to experiment with the existing functions to learn more about how they work. > In addition, look into the possible use-cases and implementation of similar > functions within other open-source DBMS, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *XML* expressions so that > they support all collation types currently supported in Spark. To understand > what changes were introduced in order to enable full collation support for > other existing functions in Spark, take a look at the Spark PRs and Jira > tickets for completed tasks in this parent (for example: Ascii, Chr, Base64, > UnBase64, Decode, StringDecode, Encode, ToBinary, FormatNumber, Sentences). > > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class.
Also, refer to the Unicode Technical > Standard for string > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].
[jira] [Updated] (SPARK-48159) Datetime expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-48159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-48159: - Component/s: SQL (was: Spark Core) > Datetime expressions (all collations) > - > > Key: SPARK-48159 > URL: https://issues.apache.org/jira/browse/SPARK-48159 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major >
[jira] [Updated] (SPARK-48157) CSV expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-48157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-48157: - Component/s: SQL (was: Spark Core) > CSV expressions (all collations) > > > Key: SPARK-48157 > URL: https://issues.apache.org/jira/browse/SPARK-48157 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > > Enable collation support for *CSV* built-in string functions in Spark > ({*}CsvToStructs{*}, {*}SchemaOfCsv{*}, {*}StructsToCsv{*}). First confirm > what the expected behaviour is for these functions when given collated > strings, and then move on to implementation and testing. You will find these > expressions in the *csvExpressions.scala* file, and they should mostly be > pass-through functions. Implement the corresponding E2E SQL tests > (CollationSQLExpressionsSuite) to reflect how these functions should be used > with collation in SparkSQL, and feel free to use your chosen Spark SQL Editor > to experiment with the existing functions to learn more about how they work. > In addition, look into the possible use-cases and implementation of similar > functions within other open-source DBMS, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *CSV* expressions so that > they support all collation types currently supported in Spark. To understand > what changes were introduced in order to enable full collation support for > other existing functions in Spark, take a look at the Spark PRs and Jira > tickets for completed tasks in this parent (for example: Ascii, Chr, Base64, > UnBase64, Decode, StringDecode, Encode, ToBinary, FormatNumber, Sentences). > > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class.
Also, refer to the Unicode Technical > Standard for string > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].
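[Editor's note] For orientation, the three expressions named in the ticket surface in Spark SQL as `schema_of_csv`, `from_csv`, and `to_csv`. The sketch below shows the plain (uncollated) usage that exists today; the final `COLLATE` line is an assumption about how a collated input string might be passed once this ticket lands, not confirmed syntax.

```sql
-- Existing behavior (Spark 3.x/4.x):
SELECT schema_of_csv('1,abc');                 -- infers a struct schema from a sample row
SELECT from_csv('1,abc', 'a INT, b STRING');   -- parses a CSV string into a struct
SELECT to_csv(named_struct('a', 1, 'b', 'abc'));

-- Hypothetical collated input (syntax assumed; this is what the ticket would enable):
SELECT from_csv('1,abc' COLLATE UTF8_LCASE, 'a INT, b STRING');
```

If these are truly pass-through functions, the collated case should parse identically to the uncollated one, which is what the CollationSQLExpressionsSuite tests would assert.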
[jira] [Updated] (SPARK-48161) JSON expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-48161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-48161: - Component/s: SQL (was: Spark Core) > JSON expressions (all collations) > - > > Key: SPARK-48161 > URL: https://issues.apache.org/jira/browse/SPARK-48161 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major >
[jira] [Updated] (SPARK-48162) Miscellaneous expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-48162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-48162: - Component/s: SQL (was: Spark Core) > Miscellaneous expressions (all collations) > -- > > Key: SPARK-48162 > URL: https://issues.apache.org/jira/browse/SPARK-48162 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major >
[jira] [Created] (SPARK-48186) Add support for AbstractMapType
Uroš Bojanić created SPARK-48186: Summary: Add support for AbstractMapType Key: SPARK-48186 URL: https://issues.apache.org/jira/browse/SPARK-48186 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Uroš Bojanić
[jira] [Resolved] (SPARK-48183) Update error contribution guide to respect new error class file
[ https://issues.apache.org/jira/browse/SPARK-48183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-48183. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46455 [https://github.com/apache/spark/pull/46455] > Update error contribution guide to respect new error class file > --- > > Key: SPARK-48183 > URL: https://issues.apache.org/jira/browse/SPARK-48183 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > We moved error class definition from .py to .json but documentation still > shows old behavior. We should update it.
[jira] [Assigned] (SPARK-48183) Update error contribution guide to respect new error class file
[ https://issues.apache.org/jira/browse/SPARK-48183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-48183: - Assignee: Haejoon Lee > Update error contribution guide to respect new error class file > --- > > Key: SPARK-48183 > URL: https://issues.apache.org/jira/browse/SPARK-48183 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > > We moved error class definition from .py to .json but documentation still > shows old behavior. We should update it.
[jira] [Resolved] (SPARK-47914) Do not display the splits parameter in Rang
[ https://issues.apache.org/jira/browse/SPARK-47914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-47914. -- Resolution: Fixed Issue resolved by pull request 46136 [https://github.com/apache/spark/pull/46136] > Do not display the splits parameter in Rang > --- > > Key: SPARK-47914 > URL: https://issues.apache.org/jira/browse/SPARK-47914 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.1 >Reporter: guihuawen >Assignee: guihuawen >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > [SQL] > explain extended select * from range(0, 4); > plan > == Parsed Logical Plan == > 'Project [*] > +- 'UnresolvedTableValuedFunction [range], [0, 4] > > == Analyzed Logical Plan == > id: bigint > Project [id#11L|#11L] > +- Range (0, 4, step=1, splits=None) > > == Optimized Logical Plan == > Range (0, 4, step=1, splits=None) > > == Physical Plan == > *(1) Range (0, 4, step=1, splits=1) > > The splits parameter will only be set during the physical execution phase. > But it is also displayed in the logical execution phase as None, which is not > very user-friendly. Showing the physical execution plan can help users. > >
[jira] [Assigned] (SPARK-47914) Do not display the splits parameter in Rang
[ https://issues.apache.org/jira/browse/SPARK-47914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-47914: Assignee: guihuawen > Do not display the splits parameter in Rang > --- > > Key: SPARK-47914 > URL: https://issues.apache.org/jira/browse/SPARK-47914 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.1 >Reporter: guihuawen >Assignee: guihuawen >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > [SQL] > explain extended select * from range(0, 4); > plan > == Parsed Logical Plan == > 'Project [*] > +- 'UnresolvedTableValuedFunction [range], [0, 4] > > == Analyzed Logical Plan == > id: bigint > Project [id#11L|#11L] > +- Range (0, 4, step=1, splits=None) > > == Optimized Logical Plan == > Range (0, 4, step=1, splits=None) > > == Physical Plan == > *(1) Range (0, 4, step=1, splits=1) > > The splits parameter will only be set during the physical execution phase. > But it is also displayed in the logical execution phase as None, which is not > very user-friendly. Showing the physical execution plan can help users. > >
[jira] [Updated] (SPARK-48185) Fix 'symbolic reference class is not accessible: class sun.util.calendar.ZoneInfo'
[ https://issues.apache.org/jira/browse/SPARK-48185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48185: --- Labels: pull-request-available (was: ) > Fix 'symbolic reference class is not accessible: class > sun.util.calendar.ZoneInfo' > -- > > Key: SPARK-48185 > URL: https://issues.apache.org/jira/browse/SPARK-48185 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available >
[jira] [Created] (SPARK-48185) Fix 'symbolic reference class is not accessible: class sun.util.calendar.ZoneInfo'
Kent Yao created SPARK-48185: Summary: Fix 'symbolic reference class is not accessible: class sun.util.calendar.ZoneInfo' Key: SPARK-48185 URL: https://issues.apache.org/jira/browse/SPARK-48185 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 4.0.0 Reporter: Kent Yao
[jira] [Updated] (SPARK-48183) Update error contribution guide to respect new error class file
[ https://issues.apache.org/jira/browse/SPARK-48183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48183: --- Labels: pull-request-available (was: ) > Update error contribution guide to respect new error class file > --- > > Key: SPARK-48183 > URL: https://issues.apache.org/jira/browse/SPARK-48183 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > > We moved error class definition from .py to .json but documentation still > shows old behavior. We should update it.
[jira] [Created] (SPARK-48183) Update error contribution guide to respect new error class file
Haejoon Lee created SPARK-48183: --- Summary: Update error contribution guide to respect new error class file Key: SPARK-48183 URL: https://issues.apache.org/jira/browse/SPARK-48183 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 4.0.0 Reporter: Haejoon Lee We moved error class definition from .py to .json but documentation still shows old behavior. We should update it.
[jira] [Updated] (SPARK-47365) Add toArrowTable() DataFrame method to PySpark
[ https://issues.apache.org/jira/browse/SPARK-47365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Cook updated SPARK-47365: - Description: Over in the Apache Arrow community, we hear from a lot of users who want to return the contents of a PySpark DataFrame as a [PyArrow Table|https://arrow.apache.org/docs/python/generated/pyarrow.Table.html]. Currently the only documented way to do this is: *PySpark DataFrame* --> *pandas DataFrame* --> *PyArrow Table* This adds significant overhead compared to going direct from PySpark DataFrame to PyArrow Table. Since [PySpark already goes through PyArrow to convert to pandas|https://spark.apache.org/docs/latest/api/python/user_guide/sql/arrow_pandas.html], would it be possible to publicly expose a (possibly experimental) *toArrowTable()* method of the Spark DataFrame class? was: Over in the Apache Arrow community, we hear from a lot of users who want to return the contents of a PySpark DataFrame as a [PyArrow Table|https://arrow.apache.org/docs/python/generated/pyarrow.Table.html]. Currently the only documented way to do this is: *PySpark DataFrame* --> *pandas DataFrame* --> *PyArrow Table* This adds significant overhead compared to going direct from PySpark DataFrame to PyArrow Table. Since [PySpark already goes through PyArrow to convert to pandas|https://spark.apache.org/docs/latest/api/python/user_guide/sql/arrow_pandas.html], would it be possible to publicly expose an experimental *_toArrowTable()* method of the Spark DataFrame class? 
> Add toArrowTable() DataFrame method to PySpark > -- > > Key: SPARK-47365 > URL: https://issues.apache.org/jira/browse/SPARK-47365 > Project: Spark > Issue Type: Improvement > Components: Connect, Input/Output, PySpark, SQL >Affects Versions: 3.5.1 >Reporter: Ian Cook >Priority: Major > Labels: pull-request-available > > Over in the Apache Arrow community, we hear from a lot of users who want to > return the contents of a PySpark DataFrame as a [PyArrow > Table|https://arrow.apache.org/docs/python/generated/pyarrow.Table.html]. > Currently the only documented way to do this is: > *PySpark DataFrame* --> *pandas DataFrame* --> *PyArrow Table* > This adds significant overhead compared to going direct from PySpark > DataFrame to PyArrow Table. Since [PySpark already goes through PyArrow to > convert to > pandas|https://spark.apache.org/docs/latest/api/python/user_guide/sql/arrow_pandas.html], > would it be possible to publicly expose a (possibly experimental) > *toArrowTable()* method of the Spark DataFrame class?
[jira] [Resolved] (SPARK-48126) Make spark.log.structuredLogging.enabled effective
[ https://issues.apache.org/jira/browse/SPARK-48126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-48126. Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46452 [https://github.com/apache/spark/pull/46452] > Make spark.log.structuredLogging.enabled effective > -- > > Key: SPARK-48126 > URL: https://issues.apache.org/jira/browse/SPARK-48126 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Currently, the spark conf spark.log.structuredLogging.enabled is not taking > effect. We need to fix it.
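[Editor's note] For readers unfamiliar with the conf in question: it is the switch for Spark's structured (JSON) logging output, and is typically set at launch. A minimal sketch, assuming a standard `spark-defaults.conf` deployment (the value shown is illustrative, not a recommended default):

```properties
# spark-defaults.conf — sketch only; SPARK-48126 is the change that makes
# this flag actually take effect at startup.
spark.log.structuredLogging.enabled  true
```

The same conf could equivalently be passed as `--conf spark.log.structuredLogging.enabled=true` on the spark-submit command line.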
[jira] [Updated] (SPARK-47365) Add toArrowTable() DataFrame method to PySpark
[ https://issues.apache.org/jira/browse/SPARK-47365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Cook updated SPARK-47365: - Summary: Add toArrowTable() DataFrame method to PySpark (was: Add _toArrowTable() DataFrame method to PySpark) > Add toArrowTable() DataFrame method to PySpark > -- > > Key: SPARK-47365 > URL: https://issues.apache.org/jira/browse/SPARK-47365 > Project: Spark > Issue Type: Improvement > Components: Connect, Input/Output, PySpark, SQL >Affects Versions: 3.5.1 >Reporter: Ian Cook >Priority: Major > Labels: pull-request-available > > Over in the Apache Arrow community, we hear from a lot of users who want to > return the contents of a PySpark DataFrame as a [PyArrow > Table|https://arrow.apache.org/docs/python/generated/pyarrow.Table.html]. > Currently the only documented way to do this is: > *PySpark DataFrame* --> *pandas DataFrame* --> *PyArrow Table* > This adds significant overhead compared to going direct from PySpark > DataFrame to PyArrow Table. Since [PySpark already goes through PyArrow to > convert to > pandas|https://spark.apache.org/docs/latest/api/python/user_guide/sql/arrow_pandas.html], > would it be possible to publicly expose an experimental *_toArrowTable()* > method of the Spark DataFrame class?
[jira] [Assigned] (SPARK-48045) Pandas API groupby with multi-agg-relabel ignores as_index=False
[ https://issues.apache.org/jira/browse/SPARK-48045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48045: Assignee: Saidatt Sinai Amonkar > Pandas API groupby with multi-agg-relabel ignores as_index=False > > > Key: SPARK-48045 > URL: https://issues.apache.org/jira/browse/SPARK-48045 > Project: Spark > Issue Type: Bug > Components: Pandas API on Spark >Affects Versions: 3.5.1 > Environment: Python 3.11, PySpark 3.5.1, Pandas=2.2.2 >Reporter: Paul George >Assignee: Saidatt Sinai Amonkar >Priority: Minor > Labels: pull-request-available > > A Pandas API DataFrame groupby with as_index=False and a multilevel > relabeling, such as > {code:java} > from pyspark import pandas as ps > ps.DataFrame({"a": [0, 0], "b": [0, 1]}).groupby("a", > as_index=False).agg(b_max=("b", "max")){code} > fails to include group keys in the resulting DataFrame. This diverges from > expected behavior as well as from the behavior of native Pandas, e.g. > *actual* > {code:java} > b_max > 0 1 {code} > *expected* > {code:java} > a b_max > 0 0 1 {code} > > A possible fix is to prepend groupby key columns to {{*order*}} and > {{*columns*}} before filtering here: > [https://github.com/apache/spark/blob/master/python/pyspark/pandas/groupby.py#L327-L328] > >
[jira] [Resolved] (SPARK-48045) Pandas API groupby with multi-agg-relabel ignores as_index=False
[ https://issues.apache.org/jira/browse/SPARK-48045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48045. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46391 [https://github.com/apache/spark/pull/46391] > Pandas API groupby with multi-agg-relabel ignores as_index=False > > > Key: SPARK-48045 > URL: https://issues.apache.org/jira/browse/SPARK-48045 > Project: Spark > Issue Type: Bug > Components: Pandas API on Spark >Affects Versions: 3.5.1 > Environment: Python 3.11, PySpark 3.5.1, Pandas=2.2.2 >Reporter: Paul George >Assignee: Saidatt Sinai Amonkar >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > A Pandas API DataFrame groupby with as_index=False and a multilevel > relabeling, such as > {code:java} > from pyspark import pandas as ps > ps.DataFrame({"a": [0, 0], "b": [0, 1]}).groupby("a", > as_index=False).agg(b_max=("b", "max")){code} > fails to include group keys in the resulting DataFrame. This diverges from > expected behavior as well as from the behavior of native Pandas, e.g. > *actual* > {code:java} > b_max > 0 1 {code} > *expected* > {code:java} > a b_max > 0 0 1 {code} > > A possible fix is to prepend groupby key columns to {{*order*}} and > {{*columns*}} before filtering here: > [https://github.com/apache/spark/blob/master/python/pyspark/pandas/groupby.py#L327-L328] > >
[jira] [Resolved] (SPARK-48152) Make spark-profiler as a part of release and publish to maven central repo
[ https://issues.apache.org/jira/browse/SPARK-48152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-48152. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46402 [https://github.com/apache/spark/pull/46402] > Make spark-profiler as a part of release and publish to maven central repo > -- > > Key: SPARK-48152 > URL: https://issues.apache.org/jira/browse/SPARK-48152 > Project: Spark > Issue Type: Improvement > Components: Build, Documentation >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-48152) Make spark-profiler as a part of release and publish to maven central repo
[ https://issues.apache.org/jira/browse/SPARK-48152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-48152: - Assignee: BingKun Pan > Make spark-profiler as a part of release and publish to maven central repo > -- > > Key: SPARK-48152 > URL: https://issues.apache.org/jira/browse/SPARK-48152 > Project: Spark > Issue Type: Improvement > Components: Build, Documentation >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available >
[jira] [Assigned] (SPARK-47960) Support Chaining Stateful Operators in TransformWithState
[ https://issues.apache.org/jira/browse/SPARK-47960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim reassigned SPARK-47960: Assignee: Bhuwan Sahni > Support Chaining Stateful Operators in TransformWithState > - > > Key: SPARK-47960 > URL: https://issues.apache.org/jira/browse/SPARK-47960 > Project: Spark > Issue Type: New Feature > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Bhuwan Sahni >Assignee: Bhuwan Sahni >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > This issue tracks adding support to chain stateful operators after the > Arbitrary State API, transformWithState. In order to support chaining, we > need to allow the user to specify the new eventTimeColumn in the output from > StatefulProcessor. Any watermark evaluation expressions downstream after > transformWithState would use the user specified eventTimeColumn.
[jira] [Resolved] (SPARK-47960) Support Chaining Stateful Operators in TransformWithState
[ https://issues.apache.org/jira/browse/SPARK-47960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim resolved SPARK-47960. -- Resolution: Fixed Issue resolved by pull request 45376 [https://github.com/apache/spark/pull/45376] > Support Chaining Stateful Operators in TransformWithState > - > > Key: SPARK-47960 > URL: https://issues.apache.org/jira/browse/SPARK-47960 > Project: Spark > Issue Type: New Feature > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Bhuwan Sahni >Assignee: Bhuwan Sahni >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > This issue tracks adding support to chain stateful operators after the > Arbitrary State API, transformWithState. In order to support chaining, we > need to allow the user to specify the new eventTimeColumn in the output from > StatefulProcessor. Any watermark evaluation expressions downstream after > transformWithState would use the user specified eventTimeColumn.
[jira] [Updated] (SPARK-48126) Make spark.log.structuredLogging.enabled effective
[ https://issues.apache.org/jira/browse/SPARK-48126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48126: --- Labels: pull-request-available (was: ) > Make spark.log.structuredLogging.enabled effective > -- > > Key: SPARK-48126 > URL: https://issues.apache.org/jira/browse/SPARK-48126 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Labels: pull-request-available > > Currently, the spark conf spark.log.structuredLogging.enabled is not taking > effect. We need to fix it.
[jira] [Commented] (SPARK-48180) Analyzer bug with multiple ORDER BY items for input table argument
[ https://issues.apache.org/jira/browse/SPARK-48180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17844471#comment-17844471 ] Daniel commented on SPARK-48180: Fix here: [https://github.com/apache/spark/pull/46451] > Analyzer bug with multiple ORDER BY items for input table argument > -- > > Key: SPARK-48180 > URL: https://issues.apache.org/jira/browse/SPARK-48180 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.5.0, 4.0.0, 3.5.1 >Reporter: Daniel >Priority: Major > Labels: pull-request-available > > Steps to reproduce: > > {{from pyspark.sql.functions import udtf}} > {{@udtf(returnType="a: int, b: int")}} > {{class tvf:}} > {{ def eval(self, *args):}} > {{ yield 1, 2}} > > {{SELECT * FROM tvf(}} > {{ TABLE(}} > {{ SELECT 1 AS device_id, 2 AS data_ds}} > {{ )}} > {{ WITH SINGLE PARTITION}} > {{ ORDER BY device_id, data_ds}} > {{ )}} > {{[UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.UNSUPPORTED_TABLE_ARGUMENT] > Unsupported subquery expression: Table arguments are used in a function where > they are not supported:}} > {{'UnresolvedTableValuedFunction [tvf], [table-argument#338 [], 'data_ds], > false}} > {{ +- Project [1 AS device_id#336, 2 AS data_ds#337]}} > {{ +- OneRowRelation}} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48126) Make spark.log.structuredLogging.enabled effective
[ https://issues.apache.org/jira/browse/SPARK-48126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang updated SPARK-48126: --- Summary: Make spark.log.structuredLogging.enabled effective (was: Make spark.log.structuredLogging.enabled effecitve) > Make spark.log.structuredLogging.enabled effective > -- > > Key: SPARK-48126 > URL: https://issues.apache.org/jira/browse/SPARK-48126 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > > Currently, the Spark conf spark.log.structuredLogging.enabled is not taking > effect. We need to fix it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48180) Analyzer bug with multiple ORDER BY items for input table argument
[ https://issues.apache.org/jira/browse/SPARK-48180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48180: --- Labels: pull-request-available (was: ) > Analyzer bug with multiple ORDER BY items for input table argument > -- > > Key: SPARK-48180 > URL: https://issues.apache.org/jira/browse/SPARK-48180 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.5.0, 4.0.0, 3.5.1 >Reporter: Daniel >Priority: Major > Labels: pull-request-available > > Steps to reproduce: > > {{from pyspark.sql.functions import udtf}} > {{@udtf(returnType="a: int, b: int")}} > {{class tvf:}} > {{ def eval(self, *args):}} > {{ yield 1, 2}} > > {{SELECT * FROM tvf(}} > {{ TABLE(}} > {{ SELECT 1 AS device_id, 2 AS data_ds}} > {{ )}} > {{ WITH SINGLE PARTITION}} > {{ ORDER BY device_id, data_ds}} > {{ )}} > {{[UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.UNSUPPORTED_TABLE_ARGUMENT] > Unsupported subquery expression: Table arguments are used in a function where > they are not supported:}} > {{'UnresolvedTableValuedFunction [tvf], [table-argument#338 [], 'data_ds], > false}} > {{ +- Project [1 AS device_id#336, 2 AS data_ds#337]}} > {{ +- OneRowRelation}} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48182) SQL (java side): Migrate `error/warn/info` with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-48182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48182: --- Labels: pull-request-available (was: ) > SQL (java side): Migrate `error/warn/info` with variables to structured > logging framework > - > > Key: SPARK-48182 > URL: https://issues.apache.org/jira/browse/SPARK-48182 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Critical > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48182) SQL (java side): Migrate `error/warn/info` with variables to structured logging framework
BingKun Pan created SPARK-48182: --- Summary: SQL (java side): Migrate `error/warn/info` with variables to structured logging framework Key: SPARK-48182 URL: https://issues.apache.org/jira/browse/SPARK-48182 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: BingKun Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48152) Make spark-profiler as a part of release and publish to maven central repo
[ https://issues.apache.org/jira/browse/SPARK-48152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BingKun Pan updated SPARK-48152: Summary: Make spark-profiler as a part of release and publish to maven central repo (was: Publish the module `spark-profiler` to `maven central repository`) > Make spark-profiler as a part of release and publish to maven central repo > -- > > Key: SPARK-48152 > URL: https://issues.apache.org/jira/browse/SPARK-48152 > Project: Spark > Issue Type: Improvement > Components: Build, Documentation >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48178) Run `build/scala-213/java-11-17` jobs of branch-3.5 only if needed
[ https://issues.apache.org/jira/browse/SPARK-48178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-48178. --- Fix Version/s: 3.5.2 Resolution: Fixed Issue resolved by pull request 46449 [https://github.com/apache/spark/pull/46449] > Run `build/scala-213/java-11-17` jobs of branch-3.5 only if needed > -- > > Key: SPARK-48178 > URL: https://issues.apache.org/jira/browse/SPARK-48178 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 3.5.2 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 3.5.2 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-48180) Analyzer bug with multiple ORDER BY items for input table argument
[ https://issues.apache.org/jira/browse/SPARK-48180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17844467#comment-17844467 ] Daniel commented on SPARK-48180: The bug is that parentheses are required around the two arguments in {{ORDER BY device_id, data_ds.}} Otherwise the SQL analyzer cannot tell the difference between ordering by an additional table column vs. another expression argument to the TVF. It could help to improve the error message here to make it more explicit. > Analyzer bug with multiple ORDER BY items for input table argument > -- > > Key: SPARK-48180 > URL: https://issues.apache.org/jira/browse/SPARK-48180 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.5.0, 4.0.0, 3.5.1 >Reporter: Daniel >Priority: Major > > Steps to reproduce: > > {{from pyspark.sql.functions import udtf}} > {{@udtf(returnType="a: int, b: int")}} > {{class tvf:}} > {{ def eval(self, *args):}} > {{ yield 1, 2}} > > {{SELECT * FROM tvf(}} > {{ TABLE(}} > {{ SELECT 1 AS device_id, 2 AS data_ds}} > {{ )}} > {{ WITH SINGLE PARTITION}} > {{ ORDER BY device_id, data_ds}} > {{ )}} > {{[UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.UNSUPPORTED_TABLE_ARGUMENT] > Unsupported subquery expression: Table arguments are used in a function where > they are not supported:}} > {{'UnresolvedTableValuedFunction [tvf], [table-argument#338 [], 'data_ds], > false}} > {{ +- Project [1 AS device_id#336, 2 AS data_ds#337]}} > {{ +- OneRowRelation}} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
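The workaround described in the comment above can be sketched as follows (untested SQL; the parenthesized form is taken from the comment's description, and the TVF and column names come from the reproduction in this issue):

```sql
-- Ambiguous: the analyzer cannot tell whether data_ds is a second
-- ORDER BY column or another expression argument to the TVF.
SELECT * FROM tvf(
  TABLE(SELECT 1 AS device_id, 2 AS data_ds)
  WITH SINGLE PARTITION
  ORDER BY device_id, data_ds
);

-- Workaround per the comment above: parenthesize the ORDER BY items so
-- they are read as one ordering list for the table argument.
SELECT * FROM tvf(
  TABLE(SELECT 1 AS device_id, 2 AS data_ds)
  WITH SINGLE PARTITION
  ORDER BY (device_id, data_ds)
);
```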
[jira] [Assigned] (SPARK-48178) Run `build/scala-213/java-11-17` jobs of branch-3.5 only if needed
[ https://issues.apache.org/jira/browse/SPARK-48178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-48178: - Assignee: Dongjoon Hyun > Run `build/scala-213/java-11-17` jobs of branch-3.5 only if needed > -- > > Key: SPARK-48178 > URL: https://issues.apache.org/jira/browse/SPARK-48178 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 3.5.2 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48181) Unify StreamingPythonRunner and PythonPlannerRunner
Wei Liu created SPARK-48181: --- Summary: Unify StreamingPythonRunner and PythonPlannerRunner Key: SPARK-48181 URL: https://issues.apache.org/jira/browse/SPARK-48181 Project: Spark Issue Type: New Feature Components: Connect, SS Affects Versions: 4.0.0 Reporter: Wei Liu We should unify the two driver-side Python runners for PySpark. To do this, we should move away from StreamingPythonRunner and enhance PythonPlannerRunner with streaming support (a multiple read-write loop). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48178) Run `build/scala-213/java-11-17` jobs of branch-3.5 only if needed
[ https://issues.apache.org/jira/browse/SPARK-48178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-48178: -- Summary: Run `build/scala-213/java-11-17` jobs of branch-3.5 only if needed (was: Run `build/scala-211/java-11-17` jobs of branch-3.5 only if needed) > Run `build/scala-213/java-11-17` jobs of branch-3.5 only if needed > -- > > Key: SPARK-48178 > URL: https://issues.apache.org/jira/browse/SPARK-48178 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 3.5.2 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48179) Pin `nbsphinx` to `0.9.3`
[ https://issues.apache.org/jira/browse/SPARK-48179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-48179. --- Fix Version/s: 3.5.2 Resolution: Fixed Issue resolved by pull request 46448 [https://github.com/apache/spark/pull/46448] > Pin `nbsphinx` to `0.9.3` > -- > > Key: SPARK-48179 > URL: https://issues.apache.org/jira/browse/SPARK-48179 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.5.2 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 3.5.2 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48179) Pin `nbsphinx` to `0.9.3`
[ https://issues.apache.org/jira/browse/SPARK-48179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-48179: - Assignee: Dongjoon Hyun > Pin `nbsphinx` to `0.9.3` > -- > > Key: SPARK-48179 > URL: https://issues.apache.org/jira/browse/SPARK-48179 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.5.2 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48179) Pin `nbsphinx` to `0.9.3`
[ https://issues.apache.org/jira/browse/SPARK-48179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48179: --- Labels: pull-request-available (was: ) > Pin `nbsphinx` to `0.9.3` > -- > > Key: SPARK-48179 > URL: https://issues.apache.org/jira/browse/SPARK-48179 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.5.2 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48180) Analyzer bug with multiple ORDER BY items for input table argument
Daniel created SPARK-48180: -- Summary: Analyzer bug with multiple ORDER BY items for input table argument Key: SPARK-48180 URL: https://issues.apache.org/jira/browse/SPARK-48180 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.5.1, 3.5.0, 4.0.0 Reporter: Daniel Steps to reproduce: {{from pyspark.sql.functions import udtf}} {{@udtf(returnType="a: int, b: int")}} {{class tvf:}} {{ def eval(self, *args):}} {{ yield 1, 2}} {{SELECT * FROM tvf(}} {{ TABLE(}} {{ SELECT 1 AS device_id, 2 AS data_ds}} {{ )}} {{ WITH SINGLE PARTITION}} {{ ORDER BY device_id, data_ds}} {{ )}} {{[UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.UNSUPPORTED_TABLE_ARGUMENT] Unsupported subquery expression: Table arguments are used in a function where they are not supported:}} {{'UnresolvedTableValuedFunction [tvf], [table-argument#338 [], 'data_ds], false}} {{ +- Project [1 AS device_id#336, 2 AS data_ds#337]}} {{ +- OneRowRelation}} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48179) Pin `nbsphinx` to `0.9.3`
Dongjoon Hyun created SPARK-48179: - Summary: Pin `nbsphinx` to `0.9.3` Key: SPARK-48179 URL: https://issues.apache.org/jira/browse/SPARK-48179 Project: Spark Issue Type: Bug Components: Project Infra Affects Versions: 3.5.2 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48178) Run `build/scala-211/java-11-17` jobs of branch-3.5 only if needed
[ https://issues.apache.org/jira/browse/SPARK-48178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48178: --- Labels: pull-request-available (was: ) > Run `build/scala-211/java-11-17` jobs of branch-3.5 only if needed > -- > > Key: SPARK-48178 > URL: https://issues.apache.org/jira/browse/SPARK-48178 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 3.5.2 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48178) Run `build/scala-211/java-11-17` jobs of branch-3.5 only if needed
Dongjoon Hyun created SPARK-48178: - Summary: Run `build/scala-211/java-11-17` jobs of branch-3.5 only if needed Key: SPARK-48178 URL: https://issues.apache.org/jira/browse/SPARK-48178 Project: Spark Issue Type: Sub-task Components: Project Infra Affects Versions: 3.5.2 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48177) Upgrade `Parquet` to 1.14.0
[ https://issues.apache.org/jira/browse/SPARK-48177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-48177: -- Summary: Upgrade `Parquet` to 1.14.0 (was: Bump Parquet to 1.14.0) > Upgrade `Parquet` to 1.14.0 > --- > > Key: SPARK-48177 > URL: https://issues.apache.org/jira/browse/SPARK-48177 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Fokko Driesprong >Assignee: Fokko Driesprong >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48177) Bump Parquet to 1.14.0
[ https://issues.apache.org/jira/browse/SPARK-48177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-48177: -- Affects Version/s: 4.0.0 (was: 3.5.2) > Bump Parquet to 1.14.0 > -- > > Key: SPARK-48177 > URL: https://issues.apache.org/jira/browse/SPARK-48177 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Fokko Driesprong >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48177) Bump Parquet to 1.14.0
[ https://issues.apache.org/jira/browse/SPARK-48177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-48177: - Assignee: Fokko Driesprong > Bump Parquet to 1.14.0 > -- > > Key: SPARK-48177 > URL: https://issues.apache.org/jira/browse/SPARK-48177 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Fokko Driesprong >Assignee: Fokko Driesprong >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48177) Bump Parquet to 1.14.0
[ https://issues.apache.org/jira/browse/SPARK-48177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-48177: -- Parent: SPARK-44111 Issue Type: Sub-task (was: Improvement) > Bump Parquet to 1.14.0 > -- > > Key: SPARK-48177 > URL: https://issues.apache.org/jira/browse/SPARK-48177 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Fokko Driesprong >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48177) Bump Parquet to 1.14.0
[ https://issues.apache.org/jira/browse/SPARK-48177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-48177: -- Fix Version/s: (was: 4.0.0) > Bump Parquet to 1.14.0 > -- > > Key: SPARK-48177 > URL: https://issues.apache.org/jira/browse/SPARK-48177 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Fokko Driesprong >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48177) Bump Parquet to 1.14.0
[ https://issues.apache.org/jira/browse/SPARK-48177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48177: --- Labels: pull-request-available (was: ) > Bump Parquet to 1.14.0 > -- > > Key: SPARK-48177 > URL: https://issues.apache.org/jira/browse/SPARK-48177 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.2 >Reporter: Fokko Driesprong >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48177) Bump Parquet to 1.14.0
Fokko Driesprong created SPARK-48177: Summary: Bump Parquet to 1.14.0 Key: SPARK-48177 URL: https://issues.apache.org/jira/browse/SPARK-48177 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.5.2 Reporter: Fokko Driesprong Fix For: 4.0.0 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48134) Spark core (java side): Migrate `error/warn/info` with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-48134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-48134. Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46390 [https://github.com/apache/spark/pull/46390] > Spark core (java side): Migrate `error/warn/info` with variables to > structured logging framework > > > Key: SPARK-48134 > URL: https://issues.apache.org/jira/browse/SPARK-48134 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Critical > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48134) Spark core (java side): Migrate `error/warn/info` with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-48134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang reassigned SPARK-48134: -- Assignee: BingKun Pan > Spark core (java side): Migrate `error/warn/info` with variables to > structured logging framework > > > Key: SPARK-48134 > URL: https://issues.apache.org/jira/browse/SPARK-48134 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Critical > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48176) Fix name of FIELD_ALREADY_EXISTS error condition
Nicholas Chammas created SPARK-48176: Summary: Fix name of FIELD_ALREADY_EXISTS error condition Key: SPARK-48176 URL: https://issues.apache.org/jira/browse/SPARK-48176 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Nicholas Chammas -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48037) SortShuffleWriter lacks shuffle write related metrics resulting in potentially inaccurate data
[ https://issues.apache.org/jira/browse/SPARK-48037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-48037. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46273 [https://github.com/apache/spark/pull/46273] > SortShuffleWriter lacks shuffle write related metrics resulting in > potentially inaccurate data > -- > > Key: SPARK-48037 > URL: https://issues.apache.org/jira/browse/SPARK-48037 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 3.3.0, 4.0.0, 3.5.1, 3.4.3 >Reporter: dzcxzl >Assignee: dzcxzl >Priority: Blocker > Labels: correctness, pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48037) SortShuffleWriter lacks shuffle write related metrics resulting in potentially inaccurate data
[ https://issues.apache.org/jira/browse/SPARK-48037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48037: --- Labels: correctness pull-request-available (was: correctness) > SortShuffleWriter lacks shuffle write related metrics resulting in > potentially inaccurate data > -- > > Key: SPARK-48037 > URL: https://issues.apache.org/jira/browse/SPARK-48037 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 3.3.0, 4.0.0, 3.5.1, 3.4.3 >Reporter: dzcxzl >Assignee: dzcxzl >Priority: Blocker > Labels: correctness, pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-48037) SortShuffleWriter lacks shuffle write related metrics resulting in potentially inaccurate data
[ https://issues.apache.org/jira/browse/SPARK-48037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17844388#comment-17844388 ] Dongjoon Hyun commented on SPARK-48037: --- Thank you, [~dzcxzl]. I raised the priority to `Blocker` for all future releases and added a label, `correctness`. > SortShuffleWriter lacks shuffle write related metrics resulting in > potentially inaccurate data > -- > > Key: SPARK-48037 > URL: https://issues.apache.org/jira/browse/SPARK-48037 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 3.3.0, 4.0.0, 3.5.1, 3.4.3 >Reporter: dzcxzl >Priority: Blocker > Labels: correctness > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48037) SortShuffleWriter lacks shuffle write related metrics resulting in potentially inaccurate data
[ https://issues.apache.org/jira/browse/SPARK-48037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-48037: - Assignee: dzcxzl > SortShuffleWriter lacks shuffle write related metrics resulting in > potentially inaccurate data > -- > > Key: SPARK-48037 > URL: https://issues.apache.org/jira/browse/SPARK-48037 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 3.3.0, 4.0.0, 3.5.1, 3.4.3 >Reporter: dzcxzl >Assignee: dzcxzl >Priority: Blocker > Labels: correctness > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48037) SortShuffleWriter lacks shuffle write related metrics resulting in potentially inaccurate data
[ https://issues.apache.org/jira/browse/SPARK-48037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-48037: -- Affects Version/s: 3.4.3 3.5.1 4.0.0 > SortShuffleWriter lacks shuffle write related metrics resulting in > potentially inaccurate data > -- > > Key: SPARK-48037 > URL: https://issues.apache.org/jira/browse/SPARK-48037 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 3.3.0, 4.0.0, 3.5.1, 3.4.3 >Reporter: dzcxzl >Priority: Blocker > Labels: correctness > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48037) SortShuffleWriter lacks shuffle write related metrics resulting in potentially inaccurate data
[ https://issues.apache.org/jira/browse/SPARK-48037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-48037: -- Target Version/s: 4.0.0, 3.5.2, 3.4.4 > SortShuffleWriter lacks shuffle write related metrics resulting in > potentially inaccurate data > -- > > Key: SPARK-48037 > URL: https://issues.apache.org/jira/browse/SPARK-48037 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 3.3.0, 4.0.0, 3.5.1, 3.4.3 >Reporter: dzcxzl >Priority: Blocker > Labels: correctness > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48037) SortShuffleWriter lacks shuffle write related metrics resulting in potentially inaccurate data
[ https://issues.apache.org/jira/browse/SPARK-48037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-48037: -- Labels: correctness (was: pull-request-available) > SortShuffleWriter lacks shuffle write related metrics resulting in > potentially inaccurate data > -- > > Key: SPARK-48037 > URL: https://issues.apache.org/jira/browse/SPARK-48037 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 3.3.0 >Reporter: dzcxzl >Priority: Major > Labels: correctness > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48037) SortShuffleWriter lacks shuffle write related metrics resulting in potentially inaccurate data
[ https://issues.apache.org/jira/browse/SPARK-48037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-48037: -- Priority: Blocker (was: Major) > SortShuffleWriter lacks shuffle write related metrics resulting in > potentially inaccurate data > -- > > Key: SPARK-48037 > URL: https://issues.apache.org/jira/browse/SPARK-48037 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 3.3.0 >Reporter: dzcxzl >Priority: Blocker > Labels: correctness > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48146) Fix error with aggregate function in With child
[ https://issues.apache.org/jira/browse/SPARK-48146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48146: --- Labels: pull-request-available (was: ) > Fix error with aggregate function in With child > --- > > Key: SPARK-48146 > URL: https://issues.apache.org/jira/browse/SPARK-48146 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kelvin Jiang >Priority: Major > Labels: pull-request-available > > Right now, if we have an aggregate function in the child of a With > expression, we fail an assertion. However, queries like this used to work: > {code:sql} > select > id between cast(max(id between 1 and 2) as int) and id > from range(10) > group by id > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41547) Reenable ANSI mode in pyspark.sql.tests.connect.test_connect_functions
[ https://issues.apache.org/jira/browse/SPARK-41547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-41547. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46432 [https://github.com/apache/spark/pull/46432] > Reenable ANSI mode in pyspark.sql.tests.connect.test_connect_functions > -- > > Key: SPARK-41547 > URL: https://issues.apache.org/jira/browse/SPARK-41547 > Project: Spark > Issue Type: Sub-task > Components: Connect, Tests >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Xinrong Meng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > See https://issues.apache.org/jira/browse/SPARK-41548 > We should fix the tests. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48169) Use lazy BadRecordException cause for StaxXmlParser and JacksonParser
[ https://issues.apache.org/jira/browse/SPARK-48169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-48169. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46438 [https://github.com/apache/spark/pull/46438] > Use lazy BadRecordException cause for StaxXmlParser and JacksonParser > - > > Key: SPARK-48169 > URL: https://issues.apache.org/jira/browse/SPARK-48169 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Vladimir Golubev >Assignee: Vladimir Golubev >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > For now, since https://issues.apache.org/jira/browse/SPARK-48143, the old > constructor is used -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48175) Store collation information in metadata and not in type for SER/DE
Stefan Kandic created SPARK-48175: - Summary: Store collation information in metadata and not in type for SER/DE Key: SPARK-48175 URL: https://issues.apache.org/jira/browse/SPARK-48175 Project: Spark Issue Type: Improvement Components: PySpark, SQL Affects Versions: 4.0.0 Reporter: Stefan Kandic Changing serialization and deserialization of collated strings so that the collation information is put in the metadata of the enclosing struct field - and then read back from there during parsing. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
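The SER/DE change described above can be sketched in plain Python with stdlib `json`. The key names here are illustrative assumptions, not Spark's actual schema serialization format: the field keeps a plain, collation-free type, and the collation travels in the enclosing struct field's metadata, where parsing reads it back.

```python
import json

# Hypothetical shape of a serialized struct field (names are illustrative,
# not Spark's real schema JSON): the type stays collation-free, while the
# collation is carried in the field's metadata.
field = {
    "name": "title",
    "type": "string",                         # no collation baked into the type
    "nullable": True,
    "metadata": {"collation": "UNICODE_CI"},  # collation stored alongside
}

serialized = json.dumps(field)     # SER: metadata is written out with the field
restored = json.loads(serialized)  # DE: read back during parsing
collation = restored["metadata"].get("collation", "UTF8_BINARY")
print(restored["type"], collation)  # string UNICODE_CI
```

The point of the design is that readers unaware of collations still see an ordinary string field; only collation-aware readers consult the metadata.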
[jira] [Resolved] (SPARK-48165) Update `ap-loader` to 3.0-9
[ https://issues.apache.org/jira/browse/SPARK-48165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-48165. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46427 [https://github.com/apache/spark/pull/46427] > Update `ap-loader` to 3.0-9 > --- > > Key: SPARK-48165 > URL: https://issues.apache.org/jira/browse/SPARK-48165 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48173) CheckAnalysis should see the entire query plan
[ https://issues.apache.org/jira/browse/SPARK-48173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-48173. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46439 [https://github.com/apache/spark/pull/46439] > CheckAnalysis should see the entire query plan > - > > Key: SPARK-48173 > URL: https://issues.apache.org/jira/browse/SPARK-48173 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48173) CheckAnalysis should see the entire query plan
[ https://issues.apache.org/jira/browse/SPARK-48173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-48173: - Assignee: Wenchen Fan > CheckAnalysis should see the entire query plan > - > > Key: SPARK-48173 > URL: https://issues.apache.org/jira/browse/SPARK-48173 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47297) Format expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47297: --- Assignee: Uroš Bojanić > Format expressions (all collations) > --- > > Key: SPARK-47297 > URL: https://issues.apache.org/jira/browse/SPARK-47297 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47297) Format expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47297. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46423 [https://github.com/apache/spark/pull/46423] > Format expressions (all collations) > --- > > Key: SPARK-47297 > URL: https://issues.apache.org/jira/browse/SPARK-47297 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48171) Clean up the use of deprecated APIs related to `o.rocksdb.Logger`
[ https://issues.apache.org/jira/browse/SPARK-48171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-48171: - Assignee: Yang Jie > Clean up the use of deprecated APIs related to `o.rocksdb.Logger` > - > > Key: SPARK-48171 > URL: https://issues.apache.org/jira/browse/SPARK-48171 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > > {code:java} > /** > * AbstractLogger constructor. > * > * Important: the log level set within > * the {@link org.rocksdb.Options} instance will be used as > * maximum log level of RocksDB. > * > * @param options {@link org.rocksdb.Options} instance. > * > * @deprecated Use {@link Logger#Logger(InfoLogLevel)} instead, e.g. {@code > new > * Logger(options.infoLogLevel())}. > */ > @Deprecated > public Logger(final Options options) { > this(options.infoLogLevel()); > } {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48171) Clean up the use of deprecated APIs related to `o.rocksdb.Logger`
[ https://issues.apache.org/jira/browse/SPARK-48171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-48171. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46436 [https://github.com/apache/spark/pull/46436] > Clean up the use of deprecated APIs related to `o.rocksdb.Logger` > - > > Key: SPARK-48171 > URL: https://issues.apache.org/jira/browse/SPARK-48171 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > {code:java} > /** > * AbstractLogger constructor. > * > * Important: the log level set within > * the {@link org.rocksdb.Options} instance will be used as > * maximum log level of RocksDB. > * > * @param options {@link org.rocksdb.Options} instance. > * > * @deprecated Use {@link Logger#Logger(InfoLogLevel)} instead, e.g. {@code > new > * Logger(options.infoLogLevel())}. > */ > @Deprecated > public Logger(final Options options) { > this(options.infoLogLevel()); > } {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47465) Remove experimental tag from toArrowTable() PySpark DataFrame method
[ https://issues.apache.org/jira/browse/SPARK-47465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Cook updated SPARK-47465: - Description: As a follow-up to SPARK-47365: What is needed to consider making the *toArrowTable()* PySpark DataFrame method non-experimental? What can the Apache Arrow developers do to help with this? was: As a follow-up to SPARK-47365: What is needed to consider making the *toArrow()* PySpark DataFrame method non-experimental? What can the Apache Arrow developers do to help with this? > Remove experimental tag from toArrowTable() PySpark DataFrame method > > > Key: SPARK-47465 > URL: https://issues.apache.org/jira/browse/SPARK-47465 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.5.1 >Reporter: Ian Cook >Priority: Major > > As a follow-up to SPARK-47365: > What is needed to consider making the *toArrowTable()* PySpark DataFrame method > non-experimental? > What can the Apache Arrow developers do to help with this? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47466) Add PySpark DataFrame method to return iterator of PyArrow RecordBatches
[ https://issues.apache.org/jira/browse/SPARK-47466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Cook updated SPARK-47466: - Description: As a follow-up to SPARK-47365: *toArrowTable()* is useful when the data is relatively small. For larger data, the best way to return the contents of a PySpark DataFrame in Arrow format is to return an iterator of [PyArrow RecordBatches|https://arrow.apache.org/docs/python/generated/pyarrow.RecordBatch.html]. was: As a follow-up to SPARK-47365: *toArrow()* is useful when the data is relatively small. For larger data, the best way to return the contents of a PySpark DataFrame in Arrow format is to return an iterator of [PyArrow RecordBatches|https://arrow.apache.org/docs/python/generated/pyarrow.RecordBatch.html]. > Add PySpark DataFrame method to return iterator of PyArrow RecordBatches > > > Key: SPARK-47466 > URL: https://issues.apache.org/jira/browse/SPARK-47466 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.5.1 >Reporter: Ian Cook >Priority: Major > > As a follow-up to SPARK-47365: > *toArrowTable()* is useful when the data is relatively small. For larger > data, the best way to return the contents of a PySpark DataFrame in Arrow > format is to return an iterator of [PyArrow > RecordBatches|https://arrow.apache.org/docs/python/generated/pyarrow.RecordBatch.html]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47465) Remove experimental tag from toArrowTable() PySpark DataFrame method
[ https://issues.apache.org/jira/browse/SPARK-47465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Cook updated SPARK-47465: - Summary: Remove experimental tag from toArrowTable() PySpark DataFrame method (was: Remove experimental tag from toArrow() PySpark DataFrame method) > Remove experimental tag from toArrowTable() PySpark DataFrame method > > > Key: SPARK-47465 > URL: https://issues.apache.org/jira/browse/SPARK-47465 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.5.1 >Reporter: Ian Cook >Priority: Major > > As a follow-up to SPARK-47365: > What is needed to consider making the *toArrow()* PySpark DataFrame method > non-experimental? > What can the Apache Arrow developers do to help with this? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47365) Add _toArrowTable() DataFrame method to PySpark
[ https://issues.apache.org/jira/browse/SPARK-47365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Cook updated SPARK-47365: - Summary: Add _toArrowTable() DataFrame method to PySpark (was: Add _toArrow() DataFrame method to PySpark) > Add _toArrowTable() DataFrame method to PySpark > --- > > Key: SPARK-47365 > URL: https://issues.apache.org/jira/browse/SPARK-47365 > Project: Spark > Issue Type: Improvement > Components: Connect, Input/Output, PySpark, SQL >Affects Versions: 3.5.1 >Reporter: Ian Cook >Priority: Major > Labels: pull-request-available > > Over in the Apache Arrow community, we hear from a lot of users who want to > return the contents of a PySpark DataFrame as a [PyArrow > Table|https://arrow.apache.org/docs/python/generated/pyarrow.Table.html]. > Currently the only documented way to do this is: > *PySpark DataFrame* --> *pandas DataFrame* --> *PyArrow Table* > This adds significant overhead compared to going direct from PySpark > DataFrame to PyArrow Table. Since [PySpark already goes through PyArrow to > convert to > pandas|https://spark.apache.org/docs/latest/api/python/user_guide/sql/arrow_pandas.html], > would it be possible to publicly expose an experimental *_toArrow()* method > of the Spark DataFrame class? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47365) Add _toArrowTable() DataFrame method to PySpark
[ https://issues.apache.org/jira/browse/SPARK-47365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Cook updated SPARK-47365: - Description: Over in the Apache Arrow community, we hear from a lot of users who want to return the contents of a PySpark DataFrame as a [PyArrow Table|https://arrow.apache.org/docs/python/generated/pyarrow.Table.html]. Currently the only documented way to do this is: *PySpark DataFrame* --> *pandas DataFrame* --> *PyArrow Table* This adds significant overhead compared to going direct from PySpark DataFrame to PyArrow Table. Since [PySpark already goes through PyArrow to convert to pandas|https://spark.apache.org/docs/latest/api/python/user_guide/sql/arrow_pandas.html], would it be possible to publicly expose an experimental *_toArrowTable()* method of the Spark DataFrame class? was: Over in the Apache Arrow community, we hear from a lot of users who want to return the contents of a PySpark DataFrame as a [PyArrow Table|https://arrow.apache.org/docs/python/generated/pyarrow.Table.html]. Currently the only documented way to do this is: *PySpark DataFrame* --> *pandas DataFrame* --> *PyArrow Table* This adds significant overhead compared to going direct from PySpark DataFrame to PyArrow Table. Since [PySpark already goes through PyArrow to convert to pandas|https://spark.apache.org/docs/latest/api/python/user_guide/sql/arrow_pandas.html], would it be possible to publicly expose an experimental *_toArrow()* method of the Spark DataFrame class? 
> Add _toArrowTable() DataFrame method to PySpark > --- > > Key: SPARK-47365 > URL: https://issues.apache.org/jira/browse/SPARK-47365 > Project: Spark > Issue Type: Improvement > Components: Connect, Input/Output, PySpark, SQL >Affects Versions: 3.5.1 >Reporter: Ian Cook >Priority: Major > Labels: pull-request-available > > Over in the Apache Arrow community, we hear from a lot of users who want to > return the contents of a PySpark DataFrame as a [PyArrow > Table|https://arrow.apache.org/docs/python/generated/pyarrow.Table.html]. > Currently the only documented way to do this is: > *PySpark DataFrame* --> *pandas DataFrame* --> *PyArrow Table* > This adds significant overhead compared to going direct from PySpark > DataFrame to PyArrow Table. Since [PySpark already goes through PyArrow to > convert to > pandas|https://spark.apache.org/docs/latest/api/python/user_guide/sql/arrow_pandas.html], > would it be possible to publicly expose an experimental *_toArrowTable()* > method of the Spark DataFrame class? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48173) CheckAnalysis should see the entire query plan
[ https://issues.apache.org/jira/browse/SPARK-48173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48173: --- Labels: pull-request-available (was: ) > CheckAnalysis should see the entire query plan > - > > Key: SPARK-48173 > URL: https://issues.apache.org/jira/browse/SPARK-48173 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Wenchen Fan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48173) CheckAnalysis should see the entire query plan
Wenchen Fan created SPARK-48173: --- Summary: CheckAnalysis should see the entire query plan Key: SPARK-48173 URL: https://issues.apache.org/jira/browse/SPARK-48173 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.4.0 Reporter: Wenchen Fan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48169) Use lazy BadRecordException cause for StaxXmlParser and JacksonParser
[ https://issues.apache.org/jira/browse/SPARK-48169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48169: --- Labels: pull-request-available (was: ) > Use lazy BadRecordException cause for StaxXmlParser and JacksonParser > - > > Key: SPARK-48169 > URL: https://issues.apache.org/jira/browse/SPARK-48169 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Vladimir Golubev >Priority: Minor > Labels: pull-request-available > > For now, since https://issues.apache.org/jira/browse/SPARK-48143, the old > constructor is used -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48143) UnivocityParser is slow when parsing partially-malformed CSV in PERMISSIVE mode
[ https://issues.apache.org/jira/browse/SPARK-48143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48143: --- Assignee: Vladimir Golubev > UnivocityParser is slow when parsing partially-malformed CSV in PERMISSIVE > mode > --- > > Key: SPARK-48143 > URL: https://issues.apache.org/jira/browse/SPARK-48143 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Vladimir Golubev >Assignee: Vladimir Golubev >Priority: Major > Labels: pull-request-available > > Parsing partially-malformed CSV in permissive mode is slow due to heavy > exception construction -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48143) UnivocityParser is slow when parsing partially-malformed CSV in PERMISSIVE mode
[ https://issues.apache.org/jira/browse/SPARK-48143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48143. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46400 [https://github.com/apache/spark/pull/46400] > UnivocityParser is slow when parsing partially-malformed CSV in PERMISSIVE > mode > --- > > Key: SPARK-48143 > URL: https://issues.apache.org/jira/browse/SPARK-48143 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Vladimir Golubev >Assignee: Vladimir Golubev >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Parsing partially-malformed CSV in permissive mode is slow due to heavy > exception construction -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
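The fix above hinges on making exception construction lazy. A small pure-Python sketch of the pattern (class and attribute names are illustrative, not Spark's actual classes): the expensive cause is built only if something actually inspects the bad record, so the permissive happy path pays nothing for records that are merely counted or nulled out.

```python
class BadRecordError(Exception):
    """Illustrative analogue of a BadRecordException with a lazy cause."""

    def __init__(self, record, cause_factory):
        super().__init__(f"malformed record: {record!r}")
        self._cause_factory = cause_factory
        self._lazy_cause = None

    @property
    def lazy_cause(self):
        if self._lazy_cause is None:
            self._lazy_cause = self._cause_factory()  # built on first access
        return self._lazy_cause

calls = []

def expensive_cause():
    calls.append(1)  # stands in for costly message/stack-trace construction
    return ValueError("field 'x' is not an int")

err = BadRecordError("a,b,oops", expensive_cause)
print(len(calls))                     # 0 -- nothing built on the happy path
print(type(err.lazy_cause).__name__)  # ValueError -- built on demand
print(len(calls))                     # 1 -- and cached after that
```

The same idea applied to every partially malformed row is what removes the per-record construction overhead in PERMISSIVE mode.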
[jira] [Updated] (SPARK-48172) Fix escaping issue for mysql
[ https://issues.apache.org/jira/browse/SPARK-48172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48172: --- Labels: pull-request-available (was: ) > Fix escaping issue for mysql > > > Key: SPARK-48172 > URL: https://issues.apache.org/jira/browse/SPARK-48172 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48172) Fix escaping issue for mysql
Mihailo Milosevic created SPARK-48172: - Summary: Fix escaping issue for mysql Key: SPARK-48172 URL: https://issues.apache.org/jira/browse/SPARK-48172 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 4.0.0 Reporter: Mihailo Milosevic -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48171) Clean up the use of deprecated APIs related to `o.rocksdb.Logger`
[ https://issues.apache.org/jira/browse/SPARK-48171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48171: --- Labels: pull-request-available (was: ) > Clean up the use of deprecated APIs related to `o.rocksdb.Logger` > - > > Key: SPARK-48171 > URL: https://issues.apache.org/jira/browse/SPARK-48171 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > Labels: pull-request-available > > {code:java} > /** > * AbstractLogger constructor. > * > * Important: the log level set within > * the {@link org.rocksdb.Options} instance will be used as > * maximum log level of RocksDB. > * > * @param options {@link org.rocksdb.Options} instance. > * > * @deprecated Use {@link Logger#Logger(InfoLogLevel)} instead, e.g. {@code > new > * Logger(options.infoLogLevel())}. > */ > @Deprecated > public Logger(final Options options) { > this(options.infoLogLevel()); > } {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48171) Clean up the use of deprecated APIs related to `o.rocksdb.Logger`
Yang Jie created SPARK-48171: Summary: Clean up the use of deprecated APIs related to `o.rocksdb.Logger` Key: SPARK-48171 URL: https://issues.apache.org/jira/browse/SPARK-48171 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 4.0.0 Reporter: Yang Jie {code:java} /** * AbstractLogger constructor. * * Important: the log level set within * the {@link org.rocksdb.Options} instance will be used as * maximum log level of RocksDB. * * @param options {@link org.rocksdb.Options} instance. * * @deprecated Use {@link Logger#Logger(InfoLogLevel)} instead, e.g. {@code new * Logger(options.infoLogLevel())}. */ @Deprecated public Logger(final Options options) { this(options.infoLogLevel()); } {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48000) Hash join support for strings with collation
[ https://issues.apache.org/jira/browse/SPARK-48000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48000: --- Labels: pull-request-available (was: ) > Hash join support for strings with collation > > > Key: SPARK-48000 > URL: https://issues.apache.org/jira/browse/SPARK-48000 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
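A toy illustration of what collation support in a hash join requires (plain Python, not Spark's implementation): keys that compare equal under the collation must also hash equally, so both the build and probe sides normalize the join key first. Here `str.casefold` stands in for a case-insensitive collation key function.

```python
def collation_key(s: str, case_insensitive: bool = True) -> str:
    # Stand-in for a collation key; casefold ~ a case-insensitive collation.
    return s.casefold() if case_insensitive else s

def hash_join(left, right):
    # Build side: bucket right-hand rows by the normalized join key.
    buckets = {}
    for key, value in right:
        buckets.setdefault(collation_key(key), []).append(value)
    # Probe side: look up each left row under the same normalization.
    return [
        (key, lv, rv)
        for key, lv in left
        for rv in buckets.get(collation_key(key), [])
    ]

left = [("Abc", 1), ("zzz", 2)]
right = [("ABC", 10), ("xyz", 20)]
print(hash_join(left, right))  # [('Abc', 1, 10)]
```

Without the normalization step, "Abc" and "ABC" land in different buckets and the join silently drops rows that the collation says should match.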
[jira] [Commented] (SPARK-48123) Provide a constant table schema for querying structured logs
[ https://issues.apache.org/jira/browse/SPARK-48123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17844245#comment-17844245 ] Steve Loughran commented on SPARK-48123: this doesn't handle nested stack traces. I seem to have my comments here ignored, so let me repeat: * deeply nested exceptions are common, especially those coming from networks, where we have to translate things like AWS SDK errors into meaningful and well-known exceptions. * these consist of a chain of exceptions, each with its own message and stack trace. * any log format which fails to anticipate or support these is inadequate for diagnosing a large portion of the stack traces a failing app will generate, * thus destroying its utility value. Has a decision been made to ignore my requirements? > Provide a constant table schema for querying structured logs > > > Key: SPARK-48123 > URL: https://issues.apache.org/jira/browse/SPARK-48123 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Providing a table schema LOG_SCHEMA, so that users can load structured logs > with the following: > ``` > spark.read.schema(LOG_SCHEMA).json(logPath) > ``` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
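To make the comment's point concrete, here is a sketch of the shape a log schema needs in order to capture chained exceptions. The field names are assumptions for illustration, not Spark's actual LOG_SCHEMA: each link in the chain carries its own class, message, and stack trace, recursively via a `cause` field.

```python
import json

# One structured log line with a two-link exception chain (illustrative names).
record = {
    "ts": "2024-05-07T10:00:00Z",
    "level": "ERROR",
    "msg": "stage failed",
    "exception": {
        "class": "java.io.IOException",
        "message": "connection reset",
        "stack": ["a.b.C.run(C.java:10)"],
        "cause": {  # nested cause: the case a flat schema cannot represent
            "class": "java.net.SocketException",
            "message": "reset by peer",
            "stack": ["x.y.Net.read(Net.java:42)"],
            "cause": None,
        },
    },
}

def exception_chain(exc):
    """Walk the cause chain, collecting (class, message) per link."""
    out = []
    while exc:
        out.append((exc["class"], exc["message"]))
        exc = exc.get("cause")
    return out

parsed = json.loads(json.dumps(record))  # round-trip as a JSON log line
print(exception_chain(parsed["exception"]))
```

A flat schema with a single exception column would keep only the first pair and lose the root cause, which is exactly the diagnostic information the nested chain exists to preserve.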
[jira] [Updated] (SPARK-41794) Reenable ANSI mode in pyspark.sql.tests.connect.test_connect_column
[ https://issues.apache.org/jira/browse/SPARK-41794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-41794: --- Labels: pull-request-available (was: ) > Reenable ANSI mode in pyspark.sql.tests.connect.test_connect_column > --- > > Key: SPARK-41794 > URL: https://issues.apache.org/jira/browse/SPARK-41794 > Project: Spark > Issue Type: Sub-task > Components: Connect, Tests >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > > {code} > == > ERROR [0.901s]: test_column_accessor > (pyspark.sql.tests.connect.test_connect_column.SparkConnectTests) > -- > Traceback (most recent call last): > File "/.../spark/python/pyspark/sql/tests/connect/test_connect_column.py", > line 744, in test_column_accessor > cdf.select(CF.col("z")[0], cdf.z[10], CF.col("z")[-10]).toPandas(), > File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 949, in > toPandas > return self._session.client.to_pandas(query) > File "/.../spark/python/pyspark/sql/connect/client.py", line 413, in > to_pandas > return self._execute_and_fetch(req) > File "/.../spark/python/pyspark/sql/connect/client.py", line 573, in > _execute_and_fetch > self._handle_error(rpc_error) > File "/.../spark/python/pyspark/sql/connect/client.py", line 623, in > _handle_error > raise SparkConnectException(status.message, info.reason) from None > pyspark.sql.connect.client.SparkConnectException: > (org.apache.spark.SparkArrayIndexOutOfBoundsException) [INVALID_ARRAY_INDEX] > The index 10 is out of bounds. The array has 3 elements. Use the SQL function > `get()` to tolerate accessing element at invalid index and return NULL > instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this > error. 
> == > ERROR [0.245s]: test_column_arithmetic_ops > (pyspark.sql.tests.connect.test_connect_column.SparkConnectTests) > -- > Traceback (most recent call last): > File "/.../spark/python/pyspark/sql/tests/connect/test_connect_column.py", > line 799, in test_column_arithmetic_ops > cdf.select(cdf.a % cdf["b"], cdf["a"] % 2, 12 % cdf.c).toPandas(), > File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 949, in > toPandas > return self._session.client.to_pandas(query) > File "/.../spark/python/pyspark/sql/connect/client.py", line 413, in > to_pandas > return self._execute_and_fetch(req) > File "/.../spark/python/pyspark/sql/connect/client.py", line 573, in > _execute_and_fetch > self._handle_error(rpc_error) > File "/.../spark/python/pyspark/sql/connect/client.py", line 623, in > _handle_error > raise SparkConnectException(status.message, info.reason) from None > pyspark.sql.connect.client.SparkConnectException: > (org.apache.spark.SparkArithmeticException) [DIVIDE_BY_ZERO] Division by > zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. > If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error. > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48086) Different Arrow versions in client and server
[ https://issues.apache.org/jira/browse/SPARK-48086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48086. -- Fix Version/s: 3.5.2 Resolution: Fixed Issue resolved by pull request 46431 [https://github.com/apache/spark/pull/46431] > Different Arrow versions in client and server > -- > > Key: SPARK-48086 > URL: https://issues.apache.org/jira/browse/SPARK-48086 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, SQL >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 3.5.2 > > > {code} > == > FAIL [1.071s]: test_pandas_udf_arrow_overflow > (pyspark.sql.tests.connect.test_parity_pandas_udf.PandasUDFParityTests.test_pandas_udf_arrow_overflow) > -- > pyspark.errors.exceptions.connect.PythonException: > An exception was thrown from the Python worker. Please see the stack trace > below. > Traceback (most recent call last): > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", > line 302, in _create_array > return pa.Array.from_pandas( >^ > File "pyarrow/array.pxi", line 1054, in pyarrow.lib.Array.from_pandas > File "pyarrow/array.pxi", line 323, in pyarrow.lib.array > File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array > File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status > pyarrow.lib.ArrowInvalid: Integer value 128 not in range: -128 to 127 > The above exception was the direct cause of the following exception: > Traceback (most recent call last): > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1834, in main > process() > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1826, in process > serializer.dump_stream(out_iter, outfile) > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", > line 531, in dump_stream > return 
ArrowStreamSerializer.dump_stream(self, > init_stream_yield_batches(), stream) > > > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", > line 104, in dump_stream > for batch in iterator: > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", > line 525, in init_stream_yield_batches > batch = self._create_batch(series) > ^^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", > line 511, in _create_batch > arrs.append(self._create_array(s, t, arrow_cast=self._arrow_cast)) > ^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", > line 330, in _create_array > raise PySparkValueError(error_msg % (series.dtype, series.na... > During handling of the above exception, another exception occurred: > Traceback (most recent call last): > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/pandas/test_pandas_udf.py", > line 299, in test_pandas_udf_arrow_overflow > with self.assertRaisesRegex( > AssertionError: "Exception thrown when converting pandas.Series" does not > match " > An exception was thrown from the Python worker. Please see the stack trace > below. 
> Traceback (most recent call last): > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", > line 302, in _create_array > return pa.Array.from_pandas( >^ > File "pyarrow/array.pxi", line 1054, in pyarrow.lib.Array.from_pandas > File "pyarrow/array.pxi", line 323, in pyarrow.lib.array > File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array > File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status > pyarrow.lib.ArrowInvalid: Integer value 128 not in range: -128 to 127 > The above exception was the direct cause of the following exception: > Traceback (most recent call last): > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1834, in main > process() > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1826, in process > serializer.dump_stream(out_iter, outfile) > File >
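The failing assertion above boils down to the range check PyArrow performs when casting to a signed 8-bit type: 128 exceeds int8's maximum of 127. A minimal stdlib-only sketch of that check (illustrative only, not Spark's or PyArrow's actual code) reproduces the error message from the traceback:

```python
# Hypothetical sketch: the int8 range check behind the ArrowInvalid error
# "Integer value 128 not in range: -128 to 127" seen in the traceback.
INT8_MIN, INT8_MAX = -128, 127

def check_int8(values):
    """Raise if any value cannot be represented as a signed 8-bit integer."""
    for v in values:
        if not (INT8_MIN <= v <= INT8_MAX):
            raise ValueError(
                f"Integer value {v} not in range: {INT8_MIN} to {INT8_MAX}"
            )
    return values
```

The parity test failed not because the overflow went undetected, but because the client and server ran different Arrow versions and so wrapped the error in different exception messages.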
[jira] [Assigned] (SPARK-48086) Different Arrow versions in client and server
[ https://issues.apache.org/jira/browse/SPARK-48086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48086: Assignee: Hyukjin Kwon > Different Arrow versions in client and server > -- > > Key: SPARK-48086 > URL: https://issues.apache.org/jira/browse/SPARK-48086 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, SQL >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > {code} > == > FAIL [1.071s]: test_pandas_udf_arrow_overflow > (pyspark.sql.tests.connect.test_parity_pandas_udf.PandasUDFParityTests.test_pandas_udf_arrow_overflow) > -- > pyspark.errors.exceptions.connect.PythonException: > An exception was thrown from the Python worker. Please see the stack trace > below. > Traceback (most recent call last): > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", > line 302, in _create_array > return pa.Array.from_pandas( >^ > File "pyarrow/array.pxi", line 1054, in pyarrow.lib.Array.from_pandas > File "pyarrow/array.pxi", line 323, in pyarrow.lib.array > File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array > File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status > pyarrow.lib.ArrowInvalid: Integer value 128 not in range: -128 to 127 > The above exception was the direct cause of the following exception: > Traceback (most recent call last): > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1834, in main > process() > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1826, in process > serializer.dump_stream(out_iter, outfile) > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", > line 531, in dump_stream > return ArrowStreamSerializer.dump_stream(self, > init_stream_yield_batches(), stream) > > > File > 
"/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", > line 104, in dump_stream > for batch in iterator: > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", > line 525, in init_stream_yield_batches > batch = self._create_batch(series) > ^^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", > line 511, in _create_batch > arrs.append(self._create_array(s, t, arrow_cast=self._arrow_cast)) > ^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", > line 330, in _create_array > raise PySparkValueError(error_msg % (series.dtype, series.na... > During handling of the above exception, another exception occurred: > Traceback (most recent call last): > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/pandas/test_pandas_udf.py", > line 299, in test_pandas_udf_arrow_overflow > with self.assertRaisesRegex( > AssertionError: "Exception thrown when converting pandas.Series" does not > match " > An exception was thrown from the Python worker. Please see the stack trace > below. 
> Traceback (most recent call last): > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", > line 302, in _create_array > return pa.Array.from_pandas( >^ > File "pyarrow/array.pxi", line 1054, in pyarrow.lib.Array.from_pandas > File "pyarrow/array.pxi", line 323, in pyarrow.lib.array > File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array > File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status > pyarrow.lib.ArrowInvalid: Integer value 128 not in range: -128 to 127 > The above exception was the direct cause of the following exception: > Traceback (most recent call last): > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1834, in main > process() > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1826, in process > serializer.dump_stream(out_iter, outfile) > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", > line 531, in dump_stream > Traceback (most recent call last): > File >
[jira] [Created] (SPARK-48169) Use lazy BadRecordException cause for StaxXmlParser and JacksonParser
Vladimir Golubev created SPARK-48169: Summary: Use lazy BadRecordException cause for StaxXmlParser and JacksonParser Key: SPARK-48169 URL: https://issues.apache.org/jira/browse/SPARK-48169 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Vladimir Golubev For now, since https://issues.apache.org/jira/browse/SPARK-48143, the old constructor is still used -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
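The "lazy cause" idea in SPARK-48169 is to defer constructing the underlying cause exception until a caller actually asks for it, since building the cause can be expensive and many bad records are simply counted or dropped. A hedged Python sketch of the pattern (the class and method names here are illustrative, not Spark's actual Scala API):

```python
# Hypothetical sketch of a lazily-constructed exception cause: the factory
# callable is only invoked when cause() is called, not at raise time.
class BadRecordError(Exception):
    def __init__(self, record, cause_factory):
        super().__init__(f"Bad record: {record!r}")
        self._cause_factory = cause_factory  # zero-arg callable, invoked lazily
        self._cause = None

    def cause(self):
        """Build the cause on first access and cache it."""
        if self._cause is None:
            self._cause = self._cause_factory()
        return self._cause
```

In permissive parsing modes, where malformed rows are tolerated rather than surfaced, the factory is never called and the cost of building the cause is avoided entirely.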
[jira] [Updated] (SPARK-41547) Reenable ANSI mode in pyspark.sql.tests.connect.test_connect_functions
[ https://issues.apache.org/jira/browse/SPARK-41547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-41547: --- Labels: pull-request-available (was: ) > Reenable ANSI mode in pyspark.sql.tests.connect.test_connect_functions > -- > > Key: SPARK-41547 > URL: https://issues.apache.org/jira/browse/SPARK-41547 > Project: Spark > Issue Type: Sub-task > Components: Connect, Tests >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Xinrong Meng >Priority: Major > Labels: pull-request-available > > See https://issues.apache.org/jira/browse/SPARK-41548 > We should fix the tests. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48168) Add bitwise shifting operators support
Kent Yao created SPARK-48168: Summary: Add bitwise shifting operators support Key: SPARK-48168 URL: https://issues.apache.org/jira/browse/SPARK-48168 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
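Spark SQL already exposes shift operations as the functions `shiftleft` and `shiftright`; SPARK-48168 proposes operator syntax for them. A small Python sketch of the semantics a 32-bit signed left shift typically follows on the JVM (this mirrors Java `int` behavior, which masks the shift count to 5 bits; it is an illustration, not Spark's implementation):

```python
# Hypothetical sketch of JVM-style 32-bit signed left shift: the shift
# count is masked to 5 bits and the result wraps in two's complement.
def shiftleft32(x: int, n: int) -> int:
    r = (x << (n & 31)) & 0xFFFFFFFF
    # Reinterpret the 32-bit pattern as a signed value
    return r - 0x100000000 if r >= 0x80000000 else r
```

Note the wraparound: shifting into the sign bit yields a negative result, and a shift count of 32 is treated as 0.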
[jira] [Assigned] (SPARK-47267) Hash functions should respect collation
[ https://issues.apache.org/jira/browse/SPARK-47267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47267: --- Assignee: Uroš Bojanić > Hash functions should respect collation > --- > > Key: SPARK-47267 > URL: https://issues.apache.org/jira/browse/SPARK-47267 > Project: Spark > Issue Type: Task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Aleksandar Tomic >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > > All functions in the `hash_funcs` group should respect collation. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47267) Hash functions should respect collation
[ https://issues.apache.org/jira/browse/SPARK-47267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47267: --- Labels: pull-request-available (was: ) > Hash functions should respect collation > --- > > Key: SPARK-47267 > URL: https://issues.apache.org/jira/browse/SPARK-47267 > Project: Spark > Issue Type: Task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Aleksandar Tomic >Priority: Major > Labels: pull-request-available > > All functions in the `hash_funcs` group should respect collation. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
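The motivation behind SPARK-47267 is that if two strings compare equal under a collation (for example, a case-insensitive one), their hashes must also be equal, or hash-based operations such as aggregation and joins would scatter equal keys across buckets. A hedged, pure-Python sketch of the idea (normalizing before hashing; not Spark's actual implementation):

```python
# Hypothetical sketch: a collation-aware hash normalizes the string first,
# so strings that are equal under a case-insensitive collation hash alike.
import hashlib

def ci_hash(s: str) -> str:
    # casefold() stands in for case-insensitive collation normalization
    return hashlib.sha256(s.casefold().encode("utf-8")).hexdigest()
```

Without the normalization step, 'ABC' and 'abc' would be "equal" to the comparator but hash to different values, which is exactly the inconsistency the ticket asks the `hash_funcs` group to avoid.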