[jira] [Comment Edited] (SPARK-44581) ShutdownHookManager get wrong hadoop user group information

2023-08-08 Thread Kent Yao (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752270#comment-17752270 ] Kent Yao edited comment on SPARK-44581 at 8/9/23 5:58 AM: -- Issue resolved by  

[jira] [Resolved] (SPARK-44726) Improve HeartbeatReceiver config validation error message

2023-08-08 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-44726. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42403

[jira] [Assigned] (SPARK-44726) Improve HeartbeatReceiver config validation error message

2023-08-08 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-44726: - Assignee: Dongjoon Hyun > Improve HeartbeatReceiver config validation error message >

[jira] [Resolved] (SPARK-44581) ShutdownHookManager get wrong hadoop user group information

2023-08-08 Thread Kent Yao (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-44581. -- Fix Version/s: 3.4.2 3.5.0 3.3.4 Resolution: Fixed Issue

[jira] [Assigned] (SPARK-44581) ShutdownHookManager get wrong hadoop user group information

2023-08-08 Thread Kent Yao (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-44581: Assignee: liang yu > ShutdownHookManager get wrong hadoop user group information >

[jira] [Resolved] (SPARK-43907) Add SQL functions into Scala, Python and R API

2023-08-08 Thread Ruifeng Zheng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-43907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-43907. --- Resolution: Resolved > Add SQL functions into Scala, Python and R API >

[jira] [Resolved] (SPARK-43709) Enable NamespaceTests.test_date_range for pandas 2.0.0.

2023-08-08 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-43709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-43709. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42389

[jira] [Assigned] (SPARK-43709) Enable NamespaceTests.test_date_range for pandas 2.0.0.

2023-08-08 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-43709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-43709: Assignee: Haejoon Lee > Enable NamespaceTests.test_date_range for pandas 2.0.0. >

[jira] [Commented] (SPARK-44725) Document spark.network.timeoutInterval

2023-08-08 Thread Snoot.io (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752260#comment-17752260 ] Snoot.io commented on SPARK-44725: -- User 'dongjoon-hyun' has created a pull request for this issue:

[jira] [Commented] (SPARK-44725) Document spark.network.timeoutInterval

2023-08-08 Thread Snoot.io (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752259#comment-17752259 ] Snoot.io commented on SPARK-44725: -- User 'dongjoon-hyun' has created a pull request for this issue:

[jira] [Commented] (SPARK-44726) Improve HeartbeatReceiver config validation error message

2023-08-08 Thread Snoot.io (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752255#comment-17752255 ] Snoot.io commented on SPARK-44726: -- User 'dongjoon-hyun' has created a pull request for this issue:

[jira] [Commented] (SPARK-44726) Improve HeartbeatReceiver config validation error message

2023-08-08 Thread Snoot.io (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752254#comment-17752254 ] Snoot.io commented on SPARK-44726: -- User 'dongjoon-hyun' has created a pull request for this issue:

[jira] [Commented] (SPARK-44737) Should not display json format errors on SQL page for non-SparkThrowables

2023-08-08 Thread Snoot.io (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752251#comment-17752251 ] Snoot.io commented on SPARK-44737: -- User 'yaooqinn' has created a pull request for this issue:

[jira] [Commented] (SPARK-44737) Should not display json format errors on SQL page for non-SparkThrowables

2023-08-08 Thread Snoot.io (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752248#comment-17752248 ] Snoot.io commented on SPARK-44737: -- User 'yaooqinn' has created a pull request for this issue:

[jira] [Commented] (SPARK-42746) Add the LISTAGG() aggregate function

2023-08-08 Thread Snoot.io (Jira)
[ https://issues.apache.org/jira/browse/SPARK-42746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752249#comment-17752249 ] Snoot.io commented on SPARK-42746: -- User 'Hisoka-X' has created a pull request for this issue:

[jira] [Commented] (SPARK-44718) High On-heap memory usage is detected while doing parquet-file reading with Off-Heap memory mode enabled on spark

2023-08-08 Thread Snoot.io (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752247#comment-17752247 ] Snoot.io commented on SPARK-44718: -- User 'majdyz' has created a pull request for this issue:

[jira] [Created] (SPARK-44737) Should not display json format errors on SQL page for non-SparkThrowables

2023-08-08 Thread Kent Yao (Jira)
Kent Yao created SPARK-44737: Summary: Should not display json format errors on SQL page for non-SparkThrowables Key: SPARK-44737 URL: https://issues.apache.org/jira/browse/SPARK-44737 Project: Spark

[jira] [Created] (SPARK-44736) Implement Dataset.explode

2023-08-08 Thread Herman van Hövell (Jira)
Herman van Hövell created SPARK-44736: - Summary: Implement Dataset.explode Key: SPARK-44736 URL: https://issues.apache.org/jira/browse/SPARK-44736 Project: Spark Issue Type: New Feature

[jira] [Commented] (SPARK-43429) Add default/active SparkSession APIs

2023-08-08 Thread Snoot.io (Jira)
[ https://issues.apache.org/jira/browse/SPARK-43429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752241#comment-17752241 ] Snoot.io commented on SPARK-43429: -- User 'hvanhovell' has created a pull request for this issue:

[jira] [Created] (SPARK-44735) Log a warning when inserting columns with the same name by row that don't match up

2023-08-08 Thread Holden Karau (Jira)
Holden Karau created SPARK-44735: Summary: Log a warning when inserting columns with the same name by row that don't match up Key: SPARK-44735 URL: https://issues.apache.org/jira/browse/SPARK-44735

[jira] [Commented] (SPARK-44690) Downgrade Scala to 2.13.8

2023-08-08 Thread Snoot.io (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752240#comment-17752240 ] Snoot.io commented on SPARK-44690: -- User 'LuciferYang' has created a pull request for this issue:

[jira] [Updated] (SPARK-43711) Support `pyspark.ml.feature.Bucketizer` and `pyspark.mllib.stat.KernelDensity` to work with Spark Connect.

2023-08-08 Thread Haejoon Lee (Jira)
[ https://issues.apache.org/jira/browse/SPARK-43711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-43711: Affects Version/s: 4.0.0 (was: 3.5.0) Description: Repro:

[jira] [Commented] (SPARK-24087) Avoid shuffle when join keys are a super-set of bucket keys

2023-08-08 Thread Yuming Wang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-24087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752239#comment-17752239 ] Yuming Wang commented on SPARK-24087: - Fixed by SPARK-35703. > Avoid shuffle when join keys are a

[jira] [Updated] (SPARK-43711) Support `pyspark.ml.feature.Bucketizer` and `pyspark.mllib.stat.KernelDensity` to work with Spark Connect.

2023-08-08 Thread Haejoon Lee (Jira)
[ https://issues.apache.org/jira/browse/SPARK-43711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-43711: Summary: Support `pyspark.ml.feature.Bucketizer` and `pyspark.mllib.stat.KernelDensity` to work

[jira] [Updated] (SPARK-43711) Support `pyspark.ml.feature.Bucketizer` and `pyspark.mllib.stat.KernelDensity` to work with Spark Connect.

2023-08-08 Thread Haejoon Lee (Jira)
[ https://issues.apache.org/jira/browse/SPARK-43711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-43711: Component/s: MLlib (was: Pandas API on Spark) > Support

[jira] [Updated] (SPARK-43711) Support `pyspark.ml.feature.Bucketizer` to work with Spark Connect.

2023-08-08 Thread Haejoon Lee (Jira)
[ https://issues.apache.org/jira/browse/SPARK-43711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-43711: Summary: Support `pyspark.ml.feature.Bucketizer` to work with Spark Connect. (was: Fix

[jira] [Updated] (SPARK-44581) ShutdownHookManager get wrong hadoop user group information

2023-08-08 Thread Kent Yao (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-44581: - Affects Version/s: 3.4.1 3.3.2 > ShutdownHookManager get wrong hadoop user group

[jira] [Updated] (SPARK-44581) ShutdownHookManager get wrong hadoop user group information

2023-08-08 Thread Kent Yao (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-44581: - Priority: Minor (was: Major) > ShutdownHookManager get wrong hadoop user group information >

[jira] [Updated] (SPARK-44734) Add documentation for type casting rules in Python UDFs/UDTFs

2023-08-08 Thread Allison Wang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allison Wang updated SPARK-44734: - Description: In addition to type mappings between Spark data types and Python data types

[jira] [Created] (SPARK-44734) Add documentation for type casting rules in Python UDFs/UDTFs

2023-08-08 Thread Allison Wang (Jira)
Allison Wang created SPARK-44734: Summary: Add documentation for type casting rules in Python UDFs/UDTFs Key: SPARK-44734 URL: https://issues.apache.org/jira/browse/SPARK-44734 Project: Spark

[jira] [Updated] (SPARK-44733) Add documentation for type mappings between Spark and Python data types

2023-08-08 Thread Allison Wang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allison Wang updated SPARK-44733: - Summary: Add documentation for type mappings between Spark and Python data types (was: Add

[jira] [Updated] (SPARK-44733) Add type mappings between Spark data types and Python types

2023-08-08 Thread Allison Wang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allison Wang updated SPARK-44733: - Summary: Add type mappings between Spark data types and Python types (was: Add type mappings

[jira] [Created] (SPARK-44733) Add type mappings between Spark data type and Python type

2023-08-08 Thread Allison Wang (Jira)
Allison Wang created SPARK-44733: Summary: Add type mappings between Spark data type and Python type Key: SPARK-44733 URL: https://issues.apache.org/jira/browse/SPARK-44733 Project: Spark

[jira] [Resolved] (SPARK-44717) "pyspark.pandas.resample" is incorrect when DST is overlapped and setting "spark.sql.timestampType" to TIMESTAMP_NTZ does not help

2023-08-08 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-44717. -- Fix Version/s: 3.5.0 4.0.0 Resolution: Fixed Issue resolved by pull

[jira] [Assigned] (SPARK-44717) "pyspark.pandas.resample" is incorrect when DST is overlapped and setting "spark.sql.timestampType" to TIMESTAMP_NTZ does not help

2023-08-08 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-44717: Assignee: Hyukjin Kwon > "pyspark.pandas.resample" is incorrect when DST is overlapped

[jira] [Resolved] (SPARK-43633) Enable CategoricalIndexTests.test_remove_categories for pandas 2.0.0.

2023-08-08 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-43633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-43633. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42273

[jira] [Assigned] (SPARK-43568) Enable CategoricalIndexTests.test_categories_setter for pandas 2.0.0.

2023-08-08 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-43568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-43568: Assignee: Haejoon Lee > Enable CategoricalIndexTests.test_categories_setter for pandas

[jira] [Resolved] (SPARK-43568) Enable CategoricalIndexTests.test_categories_setter for pandas 2.0.0.

2023-08-08 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-43568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-43568. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42273

[jira] [Assigned] (SPARK-43633) Enable CategoricalIndexTests.test_remove_categories for pandas 2.0.0.

2023-08-08 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-43633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-43633: Assignee: Haejoon Lee > Enable CategoricalIndexTests.test_remove_categories for pandas

[jira] [Assigned] (SPARK-44695) Improve error message for `DataFrame.toDF`.

2023-08-08 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-44695: Assignee: Haejoon Lee > Improve error message for `DataFrame.toDF`. >

[jira] [Resolved] (SPARK-44695) Improve error message for `DataFrame.toDF`.

2023-08-08 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-44695. -- Fix Version/s: 3.5.0 4.0.0 Resolution: Fixed Issue resolved by pull

[jira] [Commented] (SPARK-44732) Port the initial implementation of Spark XML data source

2023-08-08 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752216#comment-17752216 ] Hyukjin Kwon commented on SPARK-44732: -- https://github.com/apache/spark/pull/41832 > Port the

[jira] (SPARK-44265) Built-in XML data source support

2023-08-08 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44265 ] Hyukjin Kwon deleted comment on SPARK-44265: -- was (Author: snoot): User 'sandip-db' has created a pull request for this issue: https://github.com/apache/spark/pull/41832 > Built-in XML

[jira] [Updated] (SPARK-44265) Built-in XML data source support

2023-08-08 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-44265: - Affects Version/s: 4.0.0 (was: 3.5.0) > Built-in XML data source

[jira] [Created] (SPARK-44732) Port the initial implementation of Spark XML data source

2023-08-08 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-44732: Summary: Port the initial implementation of Spark XML data source Key: SPARK-44732 URL: https://issues.apache.org/jira/browse/SPARK-44732 Project: Spark

[jira] [Updated] (SPARK-44265) Built-in XML data source support

2023-08-08 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-44265: - Issue Type: Umbrella (was: New Feature) > Built-in XML data source support >

[jira] [Resolved] (SPARK-44723) Upgrade `gcs-connector` to 2.2.16

2023-08-08 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-44723. -- Fix Version/s: 4.0.0 Assignee: Dongjoon Hyun Resolution: Fixed Fixed in

[jira] [Resolved] (SPARK-44665) Add support for pandas DataFrame assertDataFrameEqual

2023-08-08 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-44665. -- Fix Version/s: 3.5.0 4.0.0 Resolution: Fixed Issue resolved by pull

[jira] [Assigned] (SPARK-44665) Add support for pandas DataFrame assertDataFrameEqual

2023-08-08 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-44665: Assignee: Amanda Liu > Add support for pandas DataFrame assertDataFrameEqual >

[jira] [Resolved] (SPARK-44722) reattach.py: AttributeError: 'NoneType' object has no attribute 'message'

2023-08-08 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-44722. -- Fix Version/s: 3.5.0 4.0.0 Resolution: Fixed Issue resolved by pull

[jira] [Assigned] (SPARK-44722) reattach.py: AttributeError: 'NoneType' object has no attribute 'message'

2023-08-08 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-44722: Assignee: Juliusz Sompolski > reattach.py: AttributeError: 'NoneType' object has no

[jira] [Created] (SPARK-44731) Support 'spark.sql.timestampType' in Python Spark Connect client

2023-08-08 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-44731: Summary: Support 'spark.sql.timestampType' in Python Spark Connect client Key: SPARK-44731 URL: https://issues.apache.org/jira/browse/SPARK-44731 Project: Spark

[jira] [Created] (SPARK-44730) Spark Connect: Cleaner thread not stopped when SparkSession stops

2023-08-08 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-44730: - Summary: Spark Connect: Cleaner thread not stopped when SparkSession stops Key: SPARK-44730 URL: https://issues.apache.org/jira/browse/SPARK-44730 Project:

[jira] [Updated] (SPARK-44725) Document spark.network.timeoutInterval

2023-08-08 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-44725: -- Fix Version/s: 3.3.4 3.5.1 (was: 3.5.0)

[jira] [Assigned] (SPARK-44725) Document spark.network.timeoutInterval

2023-08-08 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-44725: - Assignee: Dongjoon Hyun > Document spark.network.timeoutInterval >

[jira] [Resolved] (SPARK-44725) Document spark.network.timeoutInterval

2023-08-08 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-44725. --- Fix Version/s: 3.3.3 3.5.0 4.0.0

[jira] [Created] (SPARK-44729) Add canonical links to the PySpark docs page

2023-08-08 Thread Allison Wang (Jira)
Allison Wang created SPARK-44729: Summary: Add canonical links to the PySpark docs page Key: SPARK-44729 URL: https://issues.apache.org/jira/browse/SPARK-44729 Project: Spark Issue Type:

[jira] [Created] (SPARK-44728) Improve PySpark documentations

2023-08-08 Thread Allison Wang (Jira)
Allison Wang created SPARK-44728: Summary: Improve PySpark documentations Key: SPARK-44728 URL: https://issues.apache.org/jira/browse/SPARK-44728 Project: Spark Issue Type: Umbrella

[jira] [Updated] (SPARK-44725) Document spark.network.timeoutInterval

2023-08-08 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-44725: -- Affects Version/s: 3.4.1 3.3.2 > Document

[jira] [Created] (SPARK-44727) Improve the error message for dynamic allocation conditions

2023-08-08 Thread Cheng Pan (Jira)
Cheng Pan created SPARK-44727: - Summary: Improve the error message for dynamic allocation conditions Key: SPARK-44727 URL: https://issues.apache.org/jira/browse/SPARK-44727 Project: Spark Issue

[jira] [Updated] (SPARK-44726) Improve HeartbeatReceiver config validation error message

2023-08-08 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-44726: -- Description: {code} $ bin/spark-shell -c spark.network.timeout=30s Setting default log level

[jira] [Created] (SPARK-44726) Improve HeartbeatReceiver config validation error message

2023-08-08 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-44726: - Summary: Improve HeartbeatReceiver config validation error message Key: SPARK-44726 URL: https://issues.apache.org/jira/browse/SPARK-44726 Project: Spark

[jira] [Created] (SPARK-44725) Document spark.network.timeoutInterval

2023-08-08 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-44725: - Summary: Document spark.network.timeoutInterval Key: SPARK-44725 URL: https://issues.apache.org/jira/browse/SPARK-44725 Project: Spark Issue Type:

[jira] [Created] (SPARK-44724) INSET hash hset set to None when plan exported into JSON

2023-08-08 Thread Matteo Interlandi (Jira)
Matteo Interlandi created SPARK-44724: - Summary: INSET hash hset set to None when plan exported into JSON Key: SPARK-44724 URL: https://issues.apache.org/jira/browse/SPARK-44724 Project: Spark

[jira] [Created] (SPARK-44723) Upgrade `gcs-connector` to 2.2.16

2023-08-08 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-44723: - Summary: Upgrade `gcs-connector` to 2.2.16 Key: SPARK-44723 URL: https://issues.apache.org/jira/browse/SPARK-44723 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-44699) Add logging for complete write events to file in EventLogFileWriter.closeWriter

2023-08-08 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752133#comment-17752133 ] Hudson commented on SPARK-44699: User 'shuyouZZ' has created a pull request for this issue:

[jira] [Commented] (SPARK-44691) Move Subclasses of Analysis to sql/api

2023-08-08 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752134#comment-17752134 ] Hudson commented on SPARK-44691: User 'heyihong' has created a pull request for this issue:

[jira] [Commented] (SPARK-43754) Spark Connect Session & Query lifecycle

2023-08-08 Thread Juliusz Sompolski (Jira)
[ https://issues.apache.org/jira/browse/SPARK-43754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752130#comment-17752130 ] Juliusz Sompolski commented on SPARK-43754: --- Not in epic, but nice-to-have refactoring:

[jira] [Updated] (SPARK-43756) Spark Connect - prefer to pass around SessionHolder / ExecuteHolder more

2023-08-08 Thread Juliusz Sompolski (Jira)
[ https://issues.apache.org/jira/browse/SPARK-43756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juliusz Sompolski updated SPARK-43756: -- Epic Link: (was: SPARK-43754) > Spark Connect - prefer to pass around SessionHolder

[jira] [Resolved] (SPARK-44709) Fix flow control in ExecuteGrpcResponseSender

2023-08-08 Thread Herman van Hövell (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hövell resolved SPARK-44709. --- Fix Version/s: 3.5.0 Assignee: Juliusz Sompolski Resolution: Fixed

[jira] [Commented] (SPARK-44719) NoClassDefFoundError when using Hive UDF

2023-08-08 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752119#comment-17752119 ] Dongjoon Hyun commented on SPARK-44719: --- No, there is no Apache Hive 2.3.10 release yet. Given

[jira] [Created] (SPARK-44722) reattach.py: AttributeError: 'NoneType' object has no attribute 'message'

2023-08-08 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-44722: - Summary: reattach.py: AttributeError: 'NoneType' object has no attribute 'message' Key: SPARK-44722 URL: https://issues.apache.org/jira/browse/SPARK-44722

[jira] [Created] (SPARK-44721) Retry Policy Revamp

2023-08-08 Thread Alice Sayutina (Jira)
Alice Sayutina created SPARK-44721: -- Summary: Retry Policy Revamp Key: SPARK-44721 URL: https://issues.apache.org/jira/browse/SPARK-44721 Project: Spark Issue Type: Improvement

[jira] [Created] (SPARK-44720) Make Dataset use Encoder instead of AgnosticEncoder

2023-08-08 Thread Herman van Hövell (Jira)
Herman van Hövell created SPARK-44720: - Summary: Make Dataset use Encoder instead of AgnosticEncoder Key: SPARK-44720 URL: https://issues.apache.org/jira/browse/SPARK-44720 Project: Spark

[jira] [Resolved] (SPARK-44715) Add missing udf and callUdf functions

2023-08-08 Thread Herman van Hövell (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hövell resolved SPARK-44715. --- Fix Version/s: 3.5.0 Resolution: Fixed > Add missing udf and callUdf

[jira] [Commented] (SPARK-44719) NoClassDefFoundError when using Hive UDF

2023-08-08 Thread Manu Zhang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752049#comment-17752049 ] Manu Zhang commented on SPARK-44719: Is there a 2.3.10 release? > NoClassDefFoundError when using

[jira] [Resolved] (SPARK-44710) Support Dataset.dropDuplicatesWithinWatermark in Scala Client

2023-08-08 Thread Herman van Hövell (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hövell resolved SPARK-44710. --- Fix Version/s: 3.5.0 Assignee: Herman van Hövell Resolution: Fixed

[jira] [Resolved] (SPARK-44713) Deduplicate files between sql/core and Spark Connect Scala Client

2023-08-08 Thread Herman van Hövell (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hövell resolved SPARK-44713. --- Fix Version/s: 3.5.0 Resolution: Fixed > Deduplicate files between sql/core

[jira] [Commented] (SPARK-44719) NoClassDefFoundError when using Hive UDF

2023-08-08 Thread Yuming Wang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752023#comment-17752023 ] Yuming Wang commented on SPARK-44719: - There are two ways to fix it: 1. Upgrade the built-in hive to

[jira] [Updated] (SPARK-44719) NoClassDefFoundError when using Hive UDF

2023-08-08 Thread Yuming Wang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-44719: Description: How to reproduce: {noformat} spark-sql (default)> add jar

[jira] [Updated] (SPARK-44719) NoClassDefFoundError when using Hive UDF

2023-08-08 Thread Yuming Wang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-44719: Attachment: HiveUDFs-1.0-SNAPSHOT.jar > NoClassDefFoundError when using Hive UDF >

[jira] [Created] (SPARK-44719) NoClassDefFoundError when using Hive UDF

2023-08-08 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-44719: --- Summary: NoClassDefFoundError when using Hive UDF Key: SPARK-44719 URL: https://issues.apache.org/jira/browse/SPARK-44719 Project: Spark Issue Type: Bug

[jira] [Created] (SPARK-44718) High On-heap memory usage is detected while doing parquet-file reading with Off-Heap memory mode enabled on spark

2023-08-08 Thread Zamil Majdy (Jira)
Zamil Majdy created SPARK-44718: --- Summary: High On-heap memory usage is detected while doing parquet-file reading with Off-Heap memory mode enabled on spark Key: SPARK-44718 URL:

[jira] [Updated] (SPARK-44564) Refine the documents with LLM

2023-08-08 Thread Ruifeng Zheng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng updated SPARK-44564: -- Description: Let's first focus on the Documents of *PySpark DataFrame APIs*. *1*, Chose a

[jira] [Updated] (SPARK-44564) Refine the documents with LLM

2023-08-08 Thread Ruifeng Zheng (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng updated SPARK-44564: -- Attachment: docstr_prompt.py > Refine the documents with LLM > -

[jira] [Commented] (SPARK-44717) "pyspark.pandas.resample" is incorrect when DST is overlapped and setting "spark.sql.timestampType" to TIMESTAMP_NTZ does not help

2023-08-08 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752006#comment-17752006 ] Hyukjin Kwon commented on SPARK-44717: -- Made a quick fix:

[jira] [Assigned] (SPARK-44236) Even `spark.sql.codegen.factoryMode` is NO_CODEGEN, the WholeStageCodegen also will be generated.

2023-08-08 Thread Wenchen Fan (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-44236: --- Assignee: Jia Fan > Even `spark.sql.codegen.factoryMode` is NO_CODEGEN, the

[jira] [Resolved] (SPARK-44236) Even `spark.sql.codegen.factoryMode` is NO_CODEGEN, the WholeStageCodegen also will be generated.

2023-08-08 Thread Wenchen Fan (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-44236. - Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41779

[jira] [Resolved] (SPARK-44714) Ease restriction of LCA resolution regarding queries with having

2023-08-08 Thread Wenchen Fan (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-44714. - Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 42276

[jira] [Assigned] (SPARK-44714) Ease restriction of LCA resolution regarding queries with having

2023-08-08 Thread Wenchen Fan (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-44714: --- Assignee: Xinyi Yu > Ease restriction of LCA resolution regarding queries with having >

[jira] [Commented] (SPARK-44714) Ease restriction of LCA resolution regarding queries with having

2023-08-08 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17751966#comment-17751966 ] ASF GitHub Bot commented on SPARK-44714: User 'anchovYu' has created a pull request for this

[jira] [Comment Edited] (SPARK-44717) "pyspark.pandas.resample" is incorrect when DST is overlapped and setting "spark.sql.timestampType" to TIMESTAMP_NTZ does not help

2023-08-08 Thread Attila Zsolt Piros (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17751962#comment-17751962 ] Attila Zsolt Piros edited comment on SPARK-44717 at 8/8/23 8:56 AM:

[jira] [Commented] (SPARK-44717) "pyspark.pandas.resample" is incorrect when DST is overlapped and setting "spark.sql.timestampType" to TIMESTAMP_NTZ does not help

2023-08-08 Thread Attila Zsolt Piros (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17751962#comment-17751962 ] Attila Zsolt Piros commented on SPARK-44717: The TIMESTAMP_NTZ would work for sure. Here is

[jira] [Resolved] (SPARK-44657) Incorrect limit handling and config parsing in Arrow collect

2023-08-08 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-44657. -- Fix Version/s: 3.5.0 4.0.0 3.4.2 Resolution:

[jira] [Assigned] (SPARK-44657) Incorrect limit handling and config parsing in Arrow collect

2023-08-08 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-44657: Assignee: Venkata Sai Akhil Gudesa > Incorrect limit handling and config parsing in

[jira] [Resolved] (SPARK-44680) parameter markers are not blocked from DEFAULT (and other places)

2023-08-08 Thread Max Gekk (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-44680. -- Fix Version/s: 3.5.0 4.0.0 Resolution: Fixed Issue resolved by pull request

[jira] [Assigned] (SPARK-44680) parameter markers are not blocked from DEFAULT (and other places)

2023-08-08 Thread Max Gekk (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-44680: Assignee: Max Gekk > parameter markers are not blocked from DEFAULT (and other places) >

[jira] [Commented] (SPARK-44717) "pyspark.pandas.resample" is incorrect when DST is overlapped and setting "spark.sql.timestampType" to TIMESTAMP_NTZ does not help

2023-08-08 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17751928#comment-17751928 ] Hyukjin Kwon commented on SPARK-44717: -- [~attilapiros] which time zone are you in? Would you mind

[jira] [Resolved] (SPARK-43567) Enable CategoricalIndexTests.test_factorize for pandas 2.0.0.

2023-08-08 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-43567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-43567. -- Fix Version/s: 4.0.0 Assignee: Haejoon Lee Resolution: Fixed Fixed in