[jira] [Commented] (SPARK-40241) Correct the link of GenericUDTF

2022-08-26 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585669#comment-17585669 ] Apache Spark commented on SPARK-40241: -- User 'zhengruifeng' has created a pull request for this

[jira] [Assigned] (SPARK-40241) Correct the link of GenericUDTF

2022-08-26 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40241: Assignee: (was: Apache Spark) > Correct the link of GenericUDTF >

[jira] [Assigned] (SPARK-40241) Correct the link of GenericUDTF

2022-08-26 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40241: Assignee: Apache Spark > Correct the link of GenericUDTF >

[jira] [Commented] (SPARK-40241) Correct the link of GenericUDTF

2022-08-26 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585668#comment-17585668 ] Apache Spark commented on SPARK-40241: -- User 'zhengruifeng' has created a pull request for this

[jira] [Created] (SPARK-40241) Correct the link of GenericUDTF

2022-08-26 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-40241: - Summary: Correct the link of GenericUDTF Key: SPARK-40241 URL: https://issues.apache.org/jira/browse/SPARK-40241 Project: Spark Issue Type: Improvement

[jira] [Assigned] (SPARK-40039) Introducing a streaming checkpoint file manager based on Hadoop's Abortable interface

2022-08-26 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40039: Assignee: (was: Apache Spark) > Introducing a streaming checkpoint file manager

[jira] [Assigned] (SPARK-40039) Introducing a streaming checkpoint file manager based on Hadoop's Abortable interface

2022-08-26 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40039: Assignee: Apache Spark > Introducing a streaming checkpoint file manager based on

[jira] [Commented] (SPARK-40152) Codegen compilation error when using split_part

2022-08-26 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585665#comment-17585665 ] Apache Spark commented on SPARK-40152: -- User 'HyukjinKwon' has created a pull request for this

[jira] [Commented] (SPARK-40152) Codegen compilation error when using split_part

2022-08-26 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585666#comment-17585666 ] Apache Spark commented on SPARK-40152: -- User 'HyukjinKwon' has created a pull request for this

[jira] [Assigned] (SPARK-40156) url_decode() exposes a Java error

2022-08-26 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40156: Assignee: Apache Spark > url_decode() exposes a Java error >

[jira] [Assigned] (SPARK-40156) url_decode() exposes a Java error

2022-08-26 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40156: Assignee: (was: Apache Spark) > url_decode() exposes a Java error >

[jira] [Updated] (SPARK-40156) url_decode() exposes a Java error

2022-08-26 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-40156: - Fix Version/s: (was: 3.4.0) > url_decode() exposes a Java error >

[jira] [Reopened] (SPARK-40156) url_decode() exposes a Java error

2022-08-26 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reopened SPARK-40156: -- Assignee: (was: ming95) Reverted at

[jira] [Updated] (SPARK-40039) Introducing a streaming checkpoint file manager based on Hadoop's Abortable interface

2022-08-26 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-40039: - Fix Version/s: (was: 3.4.0) > Introducing a streaming checkpoint file manager based on

[jira] [Reopened] (SPARK-40039) Introducing a streaming checkpoint file manager based on Hadoop's Abortable interface

2022-08-26 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reopened SPARK-40039: -- Assignee: (was: Attila Zsolt Piros) Reverted at

[jira] [Assigned] (SPARK-40240) PySpark rdd.takeSample should validate `num > maxSampleSize` at first

2022-08-26 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40240: Assignee: (was: Apache Spark) > PySpark rdd.takeSample should validate `num >

[jira] [Assigned] (SPARK-40240) PySpark rdd.takeSample should validate `num > maxSampleSize` at first

2022-08-26 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40240: Assignee: Apache Spark > PySpark rdd.takeSample should validate `num > maxSampleSize` at

[jira] [Commented] (SPARK-40240) PySpark rdd.takeSample should validate `num > maxSampleSize` at first

2022-08-26 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585656#comment-17585656 ] Apache Spark commented on SPARK-40240: -- User 'zhengruifeng' has created a pull request for this

[jira] [Created] (SPARK-40240) PySpark rdd.takeSample should validate `num > maxSampleSize` at first

2022-08-26 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-40240: - Summary: PySpark rdd.takeSample should validate `num > maxSampleSize` at first Key: SPARK-40240 URL: https://issues.apache.org/jira/browse/SPARK-40240 Project:

[jira] [Assigned] (SPARK-40239) Remove duplicated 'fraction' validation in RDD.sample

2022-08-26 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40239: Assignee: Apache Spark > Remove duplicated 'fraction' validation in RDD.sample >

[jira] [Commented] (SPARK-40239) Remove duplicated 'fraction' validation in RDD.sample

2022-08-26 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585654#comment-17585654 ] Apache Spark commented on SPARK-40239: -- User 'zhengruifeng' has created a pull request for this

[jira] [Assigned] (SPARK-40239) Remove duplicated 'fraction' validation in RDD.sample

2022-08-26 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40239: Assignee: (was: Apache Spark) > Remove duplicated 'fraction' validation in

[jira] [Created] (SPARK-40239) Remove duplicated 'fraction' validation in RDD.sample

2022-08-26 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-40239: - Summary: Remove duplicated 'fraction' validation in RDD.sample Key: SPARK-40239 URL: https://issues.apache.org/jira/browse/SPARK-40239 Project: Spark

[jira] [Created] (SPARK-40238) support scaleUpFactor and initialNumPartition in pyspark rdd API

2022-08-26 Thread Ziqi Liu (Jira)
Ziqi Liu created SPARK-40238: Summary: support scaleUpFactor and initialNumPartition in pyspark rdd API Key: SPARK-40238 URL: https://issues.apache.org/jira/browse/SPARK-40238 Project: Spark

[jira] [Resolved] (SPARK-40211) Allow executeTake() / collectLimit's number of starting partitions to be customized

2022-08-26 Thread Josh Rosen (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-40211. Fix Version/s: 3.4.0 Assignee: Ziqi Liu Resolution: Fixed Resolved by

[jira] [Commented] (SPARK-40235) Use interruptible lock instead of synchronized in Executor.updateDependencies()

2022-08-26 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585633#comment-17585633 ] Apache Spark commented on SPARK-40235: -- User 'JoshRosen' has created a pull request for this issue:

[jira] [Updated] (SPARK-40235) Use interruptible lock instead of synchronized in Executor.updateDependencies()

2022-08-26 Thread Josh Rosen (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-40235: --- Description: This patch modifies the synchronization in {{Executor.updateDependencies()}} in order

[jira] [Created] (SPARK-40237) Can't get JDBC type for map in Spark 3.3.0 and PostgreSQL

2022-08-26 Thread Igor Suhorukov (Jira)
Igor Suhorukov created SPARK-40237: -- Summary: Can't get JDBC type for map in Spark 3.3.0 and PostgreSQL Key: SPARK-40237 URL: https://issues.apache.org/jira/browse/SPARK-40237 Project: Spark

[jira] [Resolved] (SPARK-40222) Numeric try_add/try_divide/try_subtract/try_multiply should throw error from their children

2022-08-26 Thread Gengliang Wang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-40222. Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37663

[jira] [Updated] (SPARK-40236) Error messages produced from error-classes.json should not have hard-coded sentences as parameters

2022-08-26 Thread Vitalii Li (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vitalii Li updated SPARK-40236: --- Description: Relevant comment: [https://github.com/apache/spark/pull/37621#discussion_r955101102].

[jira] [Created] (SPARK-40236) Error messages produced from error-classes.json should not have hard-coded sentences as parameters

2022-08-26 Thread Vitalii Li (Jira)
Vitalii Li created SPARK-40236: -- Summary: Error messages produced from error-classes.json should not have hard-coded sentences as parameters Key: SPARK-40236 URL: https://issues.apache.org/jira/browse/SPARK-40236

[jira] [Created] (SPARK-40235) Use interruptible lock instead of synchronized in Executor.updateDependencies()

2022-08-26 Thread Josh Rosen (Jira)
Josh Rosen created SPARK-40235: -- Summary: Use interruptible lock instead of synchronized in Executor.updateDependencies() Key: SPARK-40235 URL: https://issues.apache.org/jira/browse/SPARK-40235 Project:

[jira] [Assigned] (SPARK-40234) Clean only MDC items set by Spark

2022-08-26 Thread L. C. Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] L. C. Hsieh reassigned SPARK-40234: --- Assignee: L. C. Hsieh > Clean only MDC items set by Spark >

[jira] [Assigned] (SPARK-40234) Clean only MDC items set by Spark

2022-08-26 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40234: Assignee: (was: Apache Spark) > Clean only MDC items set by Spark >

[jira] [Assigned] (SPARK-40234) Clean only MDC items set by Spark

2022-08-26 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40234: Assignee: Apache Spark > Clean only MDC items set by Spark >

[jira] [Commented] (SPARK-40234) Clean only MDC items set by Spark

2022-08-26 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585609#comment-17585609 ] Apache Spark commented on SPARK-40234: -- User 'viirya' has created a pull request for this issue:

[jira] [Created] (SPARK-40234) Clean only MDC items set by Spark

2022-08-26 Thread L. C. Hsieh (Jira)
L. C. Hsieh created SPARK-40234: --- Summary: Clean only MDC items set by Spark Key: SPARK-40234 URL: https://issues.apache.org/jira/browse/SPARK-40234 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-35242) Support change catalog default database for spark

2022-08-26 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585601#comment-17585601 ] Apache Spark commented on SPARK-35242: -- User 'roczei' has created a pull request for this issue:

[jira] [Commented] (SPARK-35242) Support change catalog default database for spark

2022-08-26 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585602#comment-17585602 ] Apache Spark commented on SPARK-35242: -- User 'roczei' has created a pull request for this issue:

[jira] [Created] (SPARK-40233) Unable to load large pandas dataframe to pyspark

2022-08-26 Thread Niranda Perera (Jira)
Niranda Perera created SPARK-40233: -- Summary: Unable to load large pandas dataframe to pyspark Key: SPARK-40233 URL: https://issues.apache.org/jira/browse/SPARK-40233 Project: Spark Issue

[jira] [Created] (SPARK-40232) KMeans: high variability in results despite high initSteps parameter value

2022-08-26 Thread Patryk Piekarski (Jira)
Patryk Piekarski created SPARK-40232: Summary: KMeans: high variability in results despite high initSteps parameter value Key: SPARK-40232 URL: https://issues.apache.org/jira/browse/SPARK-40232

[jira] [Updated] (SPARK-40232) KMeans: high variability in results despite high initSteps parameter value

2022-08-26 Thread Patryk Piekarski (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patryk Piekarski updated SPARK-40232: - Attachment: sample_data.csv > KMeans: high variability in results despite high

[jira] [Commented] (SPARK-40124) Update TPCDS v1.4 q32 for Plan Stability tests

2022-08-26 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585318#comment-17585318 ] Apache Spark commented on SPARK-40124: -- User 'mskapilks' has created a pull request for this issue:

[jira] [Updated] (SPARK-39931) Improve performance of applyInPandas for very small groups

2022-08-26 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-39931: -- Description: Calling {{DataFrame.groupby(...).applyInPandas(...)}} for very small groups in

[jira] [Commented] (SPARK-40039) Introducing a streaming checkpoint file manager based on Hadoop's Abortable interface

2022-08-26 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585315#comment-17585315 ] Apache Spark commented on SPARK-40039: -- User 'roczei' has created a pull request for this issue:

[jira] [Commented] (SPARK-40039) Introducing a streaming checkpoint file manager based on Hadoop's Abortable interface

2022-08-26 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585314#comment-17585314 ] Apache Spark commented on SPARK-40039: -- User 'roczei' has created a pull request for this issue:

[jira] [Updated] (SPARK-34265) Instrument Python UDF execution using SQL Metrics

2022-08-26 Thread Luca Canali (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luca Canali updated SPARK-34265: Affects Version/s: 3.3.0 (was: 3.2.0) > Instrument Python UDF

[jira] [Updated] (SPARK-34265) Instrument Python UDF execution using SQL Metrics

2022-08-26 Thread Luca Canali (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luca Canali updated SPARK-34265: Description: This proposes to add SQLMetrics instrumentation for Python UDF. This is aimed at

[jira] [Commented] (SPARK-40124) Update TPCDS v1.4 q32 for Plan Stability tests

2022-08-26 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585297#comment-17585297 ] Apache Spark commented on SPARK-40124: -- User 'mskapilks' has created a pull request for this issue:

[jira] [Commented] (SPARK-40124) Update TPCDS v1.4 q32 for Plan Stability tests

2022-08-26 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585296#comment-17585296 ] Apache Spark commented on SPARK-40124: -- User 'mskapilks' has created a pull request for this issue:

[jira] [Assigned] (SPARK-40156) url_decode() exposes a Java error

2022-08-26 Thread Max Gekk (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-40156: Assignee: ming95 > url_decode() exposes a Java error >

[jira] [Resolved] (SPARK-40156) url_decode() exposes a Java error

2022-08-26 Thread Max Gekk (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-40156. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37636

[jira] [Updated] (SPARK-39931) Improve performance of applyInPandas for very small groups

2022-08-26 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-39931: -- Description: Calling {{DataFrame.groupby(...).applyInPandas(...)}} for very small groups in

[jira] [Assigned] (SPARK-40231) Add 1TB TPCDS Plan stability tests

2022-08-26 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40231: Assignee: (was: Apache Spark) > Add 1TB TPCDS Plan stability tests >

[jira] [Commented] (SPARK-40231) Add 1TB TPCDS Plan stability tests

2022-08-26 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585279#comment-17585279 ] Apache Spark commented on SPARK-40231: -- User 'mskapilks' has created a pull request for this issue:

[jira] [Commented] (SPARK-40231) Add 1TB TPCDS Plan stability tests

2022-08-26 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585277#comment-17585277 ] Apache Spark commented on SPARK-40231: -- User 'mskapilks' has created a pull request for this issue:

[jira] [Assigned] (SPARK-40231) Add 1TB TPCDS Plan stability tests

2022-08-26 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40231: Assignee: Apache Spark > Add 1TB TPCDS Plan stability tests >

[jira] [Commented] (SPARK-38934) Provider TemporaryAWSCredentialsProvider has no credentials

2022-08-26 Thread Steve Loughran (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585275#comment-17585275 ] Steve Loughran commented on SPARK-38934: [~graceee318] try explicitly setting the aws secrets as

[jira] [Created] (SPARK-40231) Add 1TB TPCDS Plan stability tests

2022-08-26 Thread Kapil Singh (Jira)
Kapil Singh created SPARK-40231: --- Summary: Add 1TB TPCDS Plan stability tests Key: SPARK-40231 URL: https://issues.apache.org/jira/browse/SPARK-40231 Project: Spark Issue Type: Task

[jira] [Updated] (SPARK-39931) Improve performance of applyInPandas for very small groups

2022-08-26 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-39931: -- Description: Calling {{DataFrame.groupby(...).applyInPandas(...)}} for very small groups in

[jira] [Updated] (SPARK-39931) Improve performance of applyInPandas for very small groups

2022-08-26 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-39931: -- Description: Calling `DataFrame.groupby(...).applyInPandas(...)` for very small groups in

[jira] [Reopened] (SPARK-38934) Provider TemporaryAWSCredentialsProvider has no credentials

2022-08-26 Thread Steve Loughran (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran reopened SPARK-38934: > Provider TemporaryAWSCredentialsProvider has no credentials >

[jira] [Commented] (SPARK-38934) Provider TemporaryAWSCredentialsProvider has no credentials

2022-08-26 Thread Steve Loughran (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585268#comment-17585268 ] Steve Loughran commented on SPARK-38934: staring at this some more, as there's enough

[jira] [Updated] (SPARK-40230) Executor connection issue in hybrid cloud deployment

2022-08-26 Thread Gleb Abroskin (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gleb Abroskin updated SPARK-40230: -- Environment: About the k8s setup: * 6+ nodes in AWS * 4 nodes in DC Spark 3.2.1 +

[jira] [Created] (SPARK-40230) Executor connection issue in hybrid cloud deployment

2022-08-26 Thread Gleb Abroskin (Jira)
Gleb Abroskin created SPARK-40230: - Summary: Executor connection issue in hybrid cloud deployment Key: SPARK-40230 URL: https://issues.apache.org/jira/browse/SPARK-40230 Project: Spark Issue

[jira] [Resolved] (SPARK-38749) Test the error class: RENAME_SRC_PATH_NOT_FOUND

2022-08-26 Thread Max Gekk (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-38749. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37611

[jira] [Assigned] (SPARK-40228) Don't simplify multiLike if child is not attribute

2022-08-26 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40228: Assignee: (was: Apache Spark) > Don't simplify multiLike if child is not attribute >

[jira] [Assigned] (SPARK-40228) Don't simplify multiLike if child is not attribute

2022-08-26 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40228: Assignee: Apache Spark > Don't simplify multiLike if child is not attribute >

[jira] [Commented] (SPARK-40228) Don't simplify multiLike if child is not attribute

2022-08-26 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585217#comment-17585217 ] Apache Spark commented on SPARK-40228: -- User 'wangyum' has created a pull request for this issue:

[jira] [Commented] (SPARK-40229) Re-enable excel I/O test for pandas API on Spark.

2022-08-26 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585210#comment-17585210 ] Apache Spark commented on SPARK-40229: -- User 'itholic' has created a pull request for this issue:

[jira] [Assigned] (SPARK-40229) Re-enable excel I/O test for pandas API on Spark.

2022-08-26 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40229: Assignee: Apache Spark > Re-enable excel I/O test for pandas API on Spark. >

[jira] [Assigned] (SPARK-40229) Re-enable excel I/O test for pandas API on Spark.

2022-08-26 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40229: Assignee: (was: Apache Spark) > Re-enable excel I/O test for pandas API on Spark. >

[jira] [Updated] (SPARK-40229) Re-enable excel I/O test for pandas API on Spark.

2022-08-26 Thread Haejoon Lee (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-40229: Description: Currently we're skipping the `read_excel` and `to_excel` test for pandas API on

[jira] [Updated] (SPARK-40229) Re-enable excel I/O test for pandas API on Spark.

2022-08-26 Thread Haejoon Lee (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-40229: Summary: Re-enable excel I/O test for pandas API on Spark. (was: Re-enable read_excel test for

[jira] [Updated] (SPARK-40229) Re-enable excel I/O test for pandas API on Spark.

2022-08-26 Thread Haejoon Lee (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-40229: Description: Currently we're skipping the `read_excel` and `to_excel` tests for pandas API on

[jira] [Created] (SPARK-40229) Re-enable read_excel test for pandas API on Spark.

2022-08-26 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-40229: --- Summary: Re-enable read_excel test for pandas API on Spark. Key: SPARK-40229 URL: https://issues.apache.org/jira/browse/SPARK-40229 Project: Spark Issue Type:

[jira] [Created] (SPARK-40228) Don't simplify multiLike if child is not attribute

2022-08-26 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-40228: --- Summary: Don't simplify multiLike if child is not attribute Key: SPARK-40228 URL: https://issues.apache.org/jira/browse/SPARK-40228 Project: Spark Issue Type:

[jira] [Resolved] (SPARK-40197) Replace query plan with context for MULTI_VALUE_SUBQUERY_ERROR

2022-08-26 Thread Max Gekk (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-40197. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37632

[jira] [Assigned] (SPARK-40197) Replace query plan with context for MULTI_VALUE_SUBQUERY_ERROR

2022-08-26 Thread Max Gekk (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-40197: Assignee: Vitalii Li > Replace query plan with context for MULTI_VALUE_SUBQUERY_ERROR >

[jira] [Resolved] (SPARK-40220) Don't output the empty map of error message parameters

2022-08-26 Thread Max Gekk (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-40220. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37660

[jira] [Resolved] (SPARK-40218) GROUPING SETS should preserve the grouping columns

2022-08-26 Thread Wenchen Fan (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-40218. - Fix Version/s: 3.3.1 3.2.3 3.4.0 Resolution: Fixed

[jira] [Commented] (SPARK-40221) Not able to format using scalafmt

2022-08-26 Thread Ziqi Liu (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585191#comment-17585191 ] Ziqi Liu commented on SPARK-40221: -- [~hyukjin.kwon] I think running in master should always be good,

[jira] [Assigned] (SPARK-40215) Add SQL configs to control CSV/JSON date and timestamp parsing behaviour

2022-08-26 Thread Max Gekk (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-40215: Assignee: Ivan Sadikov > Add SQL configs to control CSV/JSON date and timestamp parsing

[jira] [Resolved] (SPARK-40215) Add SQL configs to control CSV/JSON date and timestamp parsing behaviour

2022-08-26 Thread Max Gekk (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-40215. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37653

[jira] [Commented] (SPARK-40227) Data Source V2: Support creating table with the duplicate transform with different arguments

2022-08-26 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585186#comment-17585186 ] Apache Spark commented on SPARK-40227: -- User 'ConeyLiu' has created a pull request for this issue:

[jira] [Assigned] (SPARK-40227) Data Source V2: Support creating table with the duplicate transform with different arguments

2022-08-26 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40227: Assignee: (was: Apache Spark) > Data Source V2: Support creating table with the

[jira] [Assigned] (SPARK-40227) Data Source V2: Support creating table with the duplicate transform with different arguments

2022-08-26 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40227: Assignee: Apache Spark > Data Source V2: Support creating table with the duplicate

[jira] [Commented] (SPARK-40227) Data Source V2: Support creating table with the duplicate transform with different arguments

2022-08-26 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585185#comment-17585185 ] Apache Spark commented on SPARK-40227: -- User 'ConeyLiu' has created a pull request for this issue:

[jira] [Created] (SPARK-40227) Data Source V2: Support creating table with the duplicate transform with different arguments

2022-08-26 Thread Xianyang Liu (Jira)
Xianyang Liu created SPARK-40227: Summary: Data Source V2: Support creating table with the duplicate transform with different arguments Key: SPARK-40227 URL: https://issues.apache.org/jira/browse/SPARK-40227

[jira] [Created] (SPARK-40226) ps.DataFrame should support MultiIndex with a Distributed Dataset

2022-08-26 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-40226: - Summary: ps.DataFrame should support MultiIndex with a Distributed Dataset Key: SPARK-40226 URL: https://issues.apache.org/jira/browse/SPARK-40226 Project: Spark

[jira] [Resolved] (SPARK-40225) PySpark rdd.takeOrdered should check num and numPartitions

2022-08-26 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-40225. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37669

[jira] [Assigned] (SPARK-40225) PySpark rdd.takeOrdered should check num and numPartitions

2022-08-26 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-40225: Assignee: Ruifeng Zheng > PySpark rdd.takeOrdered should check num and numPartitions >