[jira] [Updated] (SPARK-44772) Reading blocks from remote executors causes timeout issue

2023-08-10 Thread nebi mert aydin (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nebi mert aydin updated SPARK-44772: Component/s: Shuffle Spark Core > Reading blocks from remote executors

[jira] [Updated] (SPARK-44772) Reading blocks from remote executors causes timeout issue

2023-08-10 Thread nebi mert aydin (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nebi mert aydin updated SPARK-44772: Description: I'm using EMR 6.5 with Spark 3.1.2. I'm shuffling 1.5 TiB of data with 3000

[jira] [Updated] (SPARK-44772) Reading blocks from remote executors causes timeout issue

2023-08-10 Thread nebi mert aydin (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nebi mert aydin updated SPARK-44772: Description: I'm using EMR 6.5 with Spark 3.1.2. I'm shuffling 1.5 TiB of data with 3000

[jira] [Created] (SPARK-44772) Reading blocks from remote executors causes timeout issue

2023-08-10 Thread nebi mert aydin (Jira)
nebi mert aydin created SPARK-44772: --- Summary: Reading blocks from remote executors causes timeout issue Key: SPARK-44772 URL: https://issues.apache.org/jira/browse/SPARK-44772 Project: Spark

[jira] [Commented] (SPARK-24578) Reading remote cache block behavior changes and causes timeout issue

2023-08-10 Thread nebi mert aydin (Jira)
[ https://issues.apache.org/jira/browse/SPARK-24578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17753038#comment-17753038 ] nebi mert aydin commented on SPARK-24578: - I still have this problem in Amazon EMR Spark 3.1.2,

[jira] [Commented] (SPARK-44719) NoClassDefFoundError when using Hive UDF

2023-08-10 Thread Snoot.io (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17753037#comment-17753037 ] Snoot.io commented on SPARK-44719: -- User 'wangyum' has created a pull request for this issue:

[jira] [Commented] (SPARK-44461) Enable Process Isolation for streaming python worker

2023-08-10 Thread Snoot.io (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17753036#comment-17753036 ] Snoot.io commented on SPARK-44461: -- User 'WweiL' has created a pull request for this issue:

[jira] [Updated] (SPARK-44768) Improve WSCG handling of row buffer by accounting for executor memory . Exploding nested arrays can easily lead to out of memory errors.

2023-08-10 Thread Franck Tago (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Franck Tago updated SPARK-44768: Description: The code sample below is to showcase the wholestagecodegen generated when exploding

[jira] [Commented] (SPARK-44768) Improve WSCG handling of row buffer by accounting for executor memory . Exploding nested arrays can easily lead to out of memory errors.

2023-08-10 Thread Franck Tago (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17753035#comment-17753035 ] Franck Tago commented on SPARK-44768: - !image-2023-08-10-20-32-55-684.png! > Improve WSCG handling

[jira] [Updated] (SPARK-44768) Improve WSCG handling of row buffer by accounting for executor memory . Exploding nested arrays can easily lead to out of memory errors.

2023-08-10 Thread Franck Tago (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Franck Tago updated SPARK-44768: Attachment: image-2023-08-10-20-32-55-684.png > Improve WSCG handling of row buffer by accounting

[jira] [Commented] (SPARK-43194) PySpark 3.4.0 cannot convert timestamp-typed objects to pandas with pandas 2.0

2023-08-10 Thread Berg Lloyd-Haig (Jira)
[ https://issues.apache.org/jira/browse/SPARK-43194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17753034#comment-17753034 ] Berg Lloyd-Haig commented on SPARK-43194: - This is affecting us also with parquet read from AWS

[jira] [Resolved] (SPARK-44771) Remove 'sudo' in 'pip install' suggestions in the dev scripts

2023-08-10 Thread Gengliang Wang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-44771. Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42444

[jira] [Resolved] (SPARK-42132) DeduplicateRelations rule breaks plan when co-grouping the same DataFrame

2023-08-10 Thread Jia Fan (Jira)
[ https://issues.apache.org/jira/browse/SPARK-42132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jia Fan resolved SPARK-42132. - Fix Version/s: 3.5.0 Resolution: Fixed > DeduplicateRelations rule breaks plan when co-grouping

[jira] [Created] (SPARK-44771) Remove 'sudo' in 'pip install' suggestions in the dev scripts

2023-08-10 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-44771: -- Summary: Remove 'sudo' in 'pip install' suggestions in the dev scripts Key: SPARK-44771 URL: https://issues.apache.org/jira/browse/SPARK-44771 Project: Spark

[jira] [Resolved] (SPARK-44763) Fix a bug of promoting string as double in binary arithmetic with interval

2023-08-10 Thread Gengliang Wang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-44763. Target Version/s: 4.0.0 Resolution: Fixed > Fix a bug of promoting string as

[jira] [Commented] (SPARK-44763) Fix a bug of promoting string as double in binary arithmetic with interval

2023-08-10 Thread Gengliang Wang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17753020#comment-17753020 ] Gengliang Wang commented on SPARK-44763: Resolved in https://github.com/apache/spark/pull/42436

[jira] [Resolved] (SPARK-43781) IllegalStateException when cogrouping two datasets derived from the same source

2023-08-10 Thread Wenchen Fan (Jira)
[ https://issues.apache.org/jira/browse/SPARK-43781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-43781. - Fix Version/s: 3.5.0 Assignee: Jia Fan Resolution: Fixed >

[jira] [Reopened] (SPARK-44760) Index Out Of Bound for JIRA resolution in merge_spark_pr

2023-08-10 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reopened SPARK-44760: -- Assignee: (was: Kent Yao) Reverted at

[jira] [Updated] (SPARK-44760) Index Out Of Bound for JIRA resolution in merge_spark_pr

2023-08-10 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-44760: - Fix Version/s: (was: 4.0.0) > Index Out Of Bound for JIRA resolution in merge_spark_pr >

[jira] [Assigned] (SPARK-43872) Enable DataFramePlotMatplotlibTests for pandas 2.0.0.

2023-08-10 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-43872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-43872: Assignee: Haejoon Lee > Enable DataFramePlotMatplotlibTests for pandas 2.0.0. >

[jira] [Resolved] (SPARK-44705) Make PythonRunner single-threaded

2023-08-10 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-44705. -- Assignee: Utkarsh Agarwal Resolution: Fixed Fixed in

[jira] [Assigned] (SPARK-44765) Make ReleaseExecute retry in ExecutePlanResponseReattachableIterator reuse common mechanism

2023-08-10 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-44765: Assignee: Juliusz Sompolski > Make ReleaseExecute retry in

[jira] [Updated] (SPARK-43032) Add StreamingQueryManager API

2023-08-10 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-43032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-43032: - Fix Version/s: 3.5.0 4.0.0 > Add StreamingQueryManager API >

[jira] [Resolved] (SPARK-44766) Cache the pandas converter for Python UDTFs

2023-08-10 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-44766. -- Fix Version/s: 3.5.0 4.0.0 Resolution: Fixed Issue resolved by pull

[jira] [Assigned] (SPARK-44766) Cache the pandas converter for Python UDTFs

2023-08-10 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-44766: Assignee: Allison Wang > Cache the pandas converter for Python UDTFs >

[jira] [Updated] (SPARK-44768) Improve WSCG handling of row buffer by accounting for executor memory . Exploding nested arrays can easily lead to out of memory errors.

2023-08-10 Thread Franck Tago (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Franck Tago updated SPARK-44768: Description: consider a scenario where you flatten a nested array // e.g. you can use the

[jira] [Updated] (SPARK-44768) Improve WSCG handling of row buffer by accounting for executor memory . Exploding nested arrays can easily lead to out of memory errors.

2023-08-10 Thread Franck Tago (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Franck Tago updated SPARK-44768: Attachment: spark-jira_wscg_code.txt > Improve WSCG handling of row buffer by accounting for

[jira] [Created] (SPARK-44770) Add a displayOrder variable to WebUITab to specify the order in which tabs appear

2023-08-10 Thread Jason Li (Jira)
Jason Li created SPARK-44770: Summary: Add a displayOrder variable to WebUITab to specify the order in which tabs appear Key: SPARK-44770 URL: https://issues.apache.org/jira/browse/SPARK-44770 Project:

[jira] [Created] (SPARK-44769) Add SQL statement to create an empty array with a type

2023-08-10 Thread Holden Karau (Jira)
Holden Karau created SPARK-44769: Summary: Add SQL statement to create an empty array with a type Key: SPARK-44769 URL: https://issues.apache.org/jira/browse/SPARK-44769 Project: Spark Issue

[jira] [Updated] (SPARK-44768) Improve WSCG handling of row buffer by accounting for executor memory . Exploding nested arrays can easily lead to out of memory errors.

2023-08-10 Thread Franck Tago (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Franck Tago updated SPARK-44768: Summary: Improve WSCG handling of row buffer by accounting for executor memory . Exploding

[jira] [Created] (SPARK-44768) Improve WSCG handling of row buffer by accounting for executor memory

2023-08-10 Thread Franck Tago (Jira)
Franck Tago created SPARK-44768: --- Summary: Improve WSCG handling of row buffer by accounting for executor memory Key: SPARK-44768 URL: https://issues.apache.org/jira/browse/SPARK-44768 Project: Spark

[jira] [Commented] (SPARK-44767) Plugin API for PySpark and SparkR workers

2023-08-10 Thread Willi Raschkowski (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752926#comment-17752926 ] Willi Raschkowski commented on SPARK-44767: --- I put up a proposal implementation here:

[jira] [Updated] (SPARK-44767) Plugin API for PySpark and SparkR workers

2023-08-10 Thread Willi Raschkowski (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Willi Raschkowski updated SPARK-44767: -- Summary: Plugin API for PySpark and SparkR workers (was: Plugin API for PySpark and

[jira] [Created] (SPARK-44767) Plugin API for PySpark and SparkR subprocesses

2023-08-10 Thread Willi Raschkowski (Jira)
Willi Raschkowski created SPARK-44767: - Summary: Plugin API for PySpark and SparkR subprocesses Key: SPARK-44767 URL: https://issues.apache.org/jira/browse/SPARK-44767 Project: Spark

[jira] [Updated] (SPARK-44767) Plugin API for PySpark and SparkR subprocesses

2023-08-10 Thread Willi Raschkowski (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Willi Raschkowski updated SPARK-44767: -- Description: An API to customize Python and R workers allows for extensibility beyond

[jira] [Created] (SPARK-44766) Cache the pandas converter for Python UDTFs

2023-08-10 Thread Allison Wang (Jira)
Allison Wang created SPARK-44766: Summary: Cache the pandas converter for Python UDTFs Key: SPARK-44766 URL: https://issues.apache.org/jira/browse/SPARK-44766 Project: Spark Issue Type:

[jira] [Created] (SPARK-44765) Make ReleaseExecute retry in ExecutePlanResponseReattachableIterator reuse common mechanism

2023-08-10 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-44765: - Summary: Make ReleaseExecute retry in ExecutePlanResponseReattachableIterator reuse common mechanism Key: SPARK-44765 URL:

[jira] [Created] (SPARK-44764) Streaming process improvement

2023-08-10 Thread Wei Liu (Jira)
Wei Liu created SPARK-44764: --- Summary: Streaming process improvement Key: SPARK-44764 URL: https://issues.apache.org/jira/browse/SPARK-44764 Project: Spark Issue Type: New Feature

[jira] [Updated] (SPARK-44759) Do not combine multiple Generate operators in the same WholeStageCodeGen node because it can easily cause OOM failures if arrays are relatively large

2023-08-10 Thread Franck Tago (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Franck Tago updated SPARK-44759: Description: This is an issue since the WSCG implementation of the generate node. Because WSCG

[jira] [Created] (SPARK-44763) Fix a bug of promoting string as double in binary arithmetic with interval

2023-08-10 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-44763: -- Summary: Fix a bug of promoting string as double in binary arithmetic with interval Key: SPARK-44763 URL: https://issues.apache.org/jira/browse/SPARK-44763

[jira] [Updated] (SPARK-44759) Do not combine multiple Generate operators in the same WholeStageCodeGen node because it can easily cause OOM failures if arrays are relatively large

2023-08-10 Thread Franck Tago (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Franck Tago updated SPARK-44759: Summary: Do not combine multiple Generate operators in the same WholeStageCodeGen node because it

[jira] [Updated] (SPARK-44759) Do not combine multiple Generate nodes in the same WholeStageCodeGen node because it can easily cause OOM failures if arrays are relatively large

2023-08-10 Thread Franck Tago (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Franck Tago updated SPARK-44759: Summary: Do not combine multiple Generate nodes in the same WholeStageCodeGen node because it can

[jira] [Updated] (SPARK-44759) Do not combine multiple Generate nodes in the same WholeStageCodeGen nodebecause it can easily cause OOM failures if arrays are relatively large

2023-08-10 Thread Franck Tago (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Franck Tago updated SPARK-44759: Description: This is an issue since the WSCG implementation of the generate node. The generate

[jira] [Updated] (SPARK-44759) Do not combine multiple Generate nodes in the same WholeStageCodeGen nodebecause it can easily cause OOM failures if arrays are relatively large

2023-08-10 Thread Franck Tago (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Franck Tago updated SPARK-44759: Affects Version/s: 3.4.1 3.4.0 3.3.2

[jira] [Updated] (SPARK-44759) Do not combine multiple Generate nodes in the same WholeStageCodeGen nodebecause it can easily cause OOM failures if arrays are relatively large

2023-08-10 Thread Franck Tago (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Franck Tago updated SPARK-44759: Description: The generate node used to flatten array generally produces an amount of output

[jira] [Updated] (SPARK-44759) Do not combine multiple Generate nodes in the same WholeStageCodeGen nodebecause it can easily cause OOM failures if arrays are relatively large

2023-08-10 Thread Franck Tago (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Franck Tago updated SPARK-44759: Description: The generate node used to flatten array generally produces an amount of output

[jira] [Commented] (SPARK-44759) Do not combine multiple Generate nodes in the same WholeStageCodeGen nodebecause it can easily cause OOM failures if arrays are relatively large

2023-08-10 Thread Franck Tago (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752870#comment-17752870 ] Franck Tago commented on SPARK-44759: - Spark DAG for the use case. The failure is from the

[jira] [Updated] (SPARK-44759) Do not combine multiple Generate nodes in the same WholeStageCodeGen nodebecause it can easily cause OOM failures if arrays are relatively large

2023-08-10 Thread Franck Tago (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Franck Tago updated SPARK-44759: Attachment: image-2023-08-10-09-33-47-788.png > Do not combine multiple Generate nodes in the

[jira] [Commented] (SPARK-44759) Do not combine multiple Generate nodes in the same WholeStageCodeGen nodebecause it can easily cause OOM failures if arrays are relatively large

2023-08-10 Thread Franck Tago (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752868#comment-17752868 ] Franck Tago commented on SPARK-44759: - WSCG generated code for second Generate node

[jira] [Updated] (SPARK-44759) Do not combine multiple Generate nodes in the same WholeStageCodeGen nodebecause it can easily cause OOM failures if arrays are relatively large

2023-08-10 Thread Franck Tago (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Franck Tago updated SPARK-44759: Attachment: image-2023-08-10-09-32-46-163.png > Do not combine multiple Generate nodes in the

[jira] [Updated] (SPARK-44759) Do not combine multiple Generate nodes in the same WholeStageCodeGen nodebecause it can easily cause OOM failures if arrays are relatively large

2023-08-10 Thread Franck Tago (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Franck Tago updated SPARK-44759: Attachment: image-2023-08-10-09-29-24-804.png > Do not combine multiple Generate nodes in the

[jira] [Commented] (SPARK-44759) Do not combine multiple Generate nodes in the same WholeStageCodeGen nodebecause it can easily cause OOM failures if arrays are relatively large

2023-08-10 Thread Franck Tago (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752866#comment-17752866 ] Franck Tago commented on SPARK-44759: - WSCG generated code for first Generate node

[jira] [Comment Edited] (SPARK-44759) Do not combine multiple Generate nodes in the same WholeStageCodeGen nodebecause it can easily cause OOM failures if arrays are relatively large

2023-08-10 Thread Franck Tago (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752864#comment-17752864 ] Franck Tago edited comment on SPARK-44759 at 8/10/23 4:28 PM: -- WSCG 

[jira] [Updated] (SPARK-44759) Do not combine multiple Generate nodes in the same WholeStageCodeGen nodebecause it can easily cause OOM failures if arrays are relatively large

2023-08-10 Thread Franck Tago (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Franck Tago updated SPARK-44759: Attachment: image-2023-08-10-09-27-24-124.png > Do not combine multiple Generate nodes in the

[jira] [Commented] (SPARK-44759) Do not combine multiple Generate nodes in the same WholeStageCodeGen nodebecause it can easily cause OOM failures if arrays are relatively large

2023-08-10 Thread Franck Tago (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752864#comment-17752864 ] Franck Tago commented on SPARK-44759: - !image-2023-08-10-09-27-24-124.png! > Do not combine

[jira] [Updated] (SPARK-44759) Do not combine multiple Generate nodes in the same WholeStageCodeGen nodebecause it can easily cause OOM failures if arrays are relatively large

2023-08-10 Thread Franck Tago (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Franck Tago updated SPARK-44759: Summary: Do not combine multiple Generate nodes in the same WholeStageCodeGen nodebecause it can

[jira] [Updated] (SPARK-44759) Do not combine multiple Generate nodes in the same WholeStageCodeGen because it can easily cause OOM failures

2023-08-10 Thread Franck Tago (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Franck Tago updated SPARK-44759: Description: The generate node used to flatten array generally produces an amount of output

[jira] [Resolved] (SPARK-44760) Index Out Of Bound for JIRA resolution in merge_spark_pr

2023-08-10 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-44760. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42429

[jira] [Created] (SPARK-44762) Add more documentation and examples for using job tags for interrupt

2023-08-10 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-44762: - Summary: Add more documentation and examples for using job tags for interrupt Key: SPARK-44762 URL: https://issues.apache.org/jira/browse/SPARK-44762

[jira] [Updated] (SPARK-44741) Support regex-based MetricFilter in StatsdSink

2023-08-10 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-44741: -- Affects Version/s: 4.0.0 (was: 3.4.1) > Support regex-based

[jira] [Updated] (SPARK-44741) Support regex-based MetricFilter in StatsdSink

2023-08-10 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-44741: -- Summary: Support regex-based MetricFilter in StatsdSink (was: Spark StatsD metrics reported

[jira] [Assigned] (SPARK-44741) Spark StatsD metrics reported to support metrics filter option

2023-08-10 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-44741: - Assignee: rameshkrishnan muthusamy > Spark StatsD metrics reported to support metrics

[jira] [Resolved] (SPARK-44741) Spark StatsD metrics reported to support metrics filter option

2023-08-10 Thread Dongjoon Hyun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-44741. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42416

[jira] [Updated] (SPARK-44700) Rule OptimizeCsvJsonExprs should not be applied to expression like from_json(regexp_replace)

2023-08-10 Thread Yuming Wang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-44700: Fix Version/s: 3.3.0 > Rule OptimizeCsvJsonExprs should not be applied to expression like >

[jira] [Resolved] (SPARK-44700) Rule OptimizeCsvJsonExprs should not be applied to expression like from_json(regexp_replace)

2023-08-10 Thread Yuming Wang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang resolved SPARK-44700. - Resolution: Fixed Please upgrade Spark to the latest version to fix this issue. > Rule

[jira] [Updated] (SPARK-44700) Rule OptimizeCsvJsonExprs should not be applied to expression like from_json(regexp_replace)

2023-08-10 Thread Yuming Wang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-44700: Affects Version/s: 3.1.1 (was: 3.4.0) (was:

[jira] [Created] (SPARK-44761) Add DataStreamWriter.foreachBatch(org.apache.spark.api.java.function.VoidFunction2) signature

2023-08-10 Thread Herman van Hövell (Jira)
Herman van Hövell created SPARK-44761: - Summary: Add DataStreamWriter.foreachBatch(org.apache.spark.api.java.function.VoidFunction2) signature Key: SPARK-44761 URL:

[jira] [Commented] (SPARK-44756) Executor hangs when RetryingBlockTransferor fails to initiate retry

2023-08-10 Thread GridGain Integration (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752785#comment-17752785 ] GridGain Integration commented on SPARK-44756: -- User 'hdaikoku' has created a pull request

[jira] [Commented] (SPARK-44756) Executor hangs when RetryingBlockTransferor fails to initiate retry

2023-08-10 Thread Harunobu Daikoku (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752781#comment-17752781 ] Harunobu Daikoku commented on SPARK-44756: -- PR raised:

[jira] [Updated] (SPARK-44756) Executor hangs when RetryingBlockTransferor fails to initiate retry

2023-08-10 Thread Harunobu Daikoku (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harunobu Daikoku updated SPARK-44756: - Component/s: Spark Core > Executor hangs when RetryingBlockTransferor fails to initiate

[jira] [Created] (SPARK-44760) Index Out Of Bound for JIRA resolution in merge_spark_pr

2023-08-10 Thread Kent Yao (Jira)
Kent Yao created SPARK-44760: Summary: Index Out Of Bound for JIRA resolution in merge_spark_pr Key: SPARK-44760 URL: https://issues.apache.org/jira/browse/SPARK-44760 Project: Spark Issue Type:

[jira] [Updated] (SPARK-44758) Support memory limit configurable

2023-08-10 Thread zhuml (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhuml updated SPARK-44758: -- Description: Currently the memory request and limit are set by summing the values of

[jira] [Updated] (SPARK-44758) Support memory limit configurable

2023-08-10 Thread zhuml (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhuml updated SPARK-44758: -- Summary: Support memory limit configurable (was: Support memory limit) > Support memory limit configurable >

[jira] [Updated] (SPARK-44759) Do not combine multiple Generate nodes in the same WholeStageCodeGen because it can easily cause OOM failures

2023-08-10 Thread Franck Tago (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Franck Tago updated SPARK-44759: Attachment: (was: spark-verbosewithcodegenenabled) > Do not combine multiple Generate nodes

[jira] [Updated] (SPARK-44759) Do not combine multiple Generate nodes in the same WholeStageCodeGen because it can easily cause OOM failures

2023-08-10 Thread Franck Tago (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Franck Tago updated SPARK-44759: Attachment: wholestagecodegen_wc1_debug_wholecodegen_passed > Do not combine multiple Generate

[jira] [Updated] (SPARK-44759) Do not combine multiple Generate nodes in the same WholeStageCodeGen because it can easily cause OOM failures

2023-08-10 Thread Franck Tago (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Franck Tago updated SPARK-44759: Attachment: spark-verbosewithcodegenenabled > Do not combine multiple Generate nodes in the same

[jira] [Created] (SPARK-44759) Do not combine multiple Generate nodes in the same WholeStageCodeGen because it can easily cause OOM failures

2023-08-10 Thread Franck Tago (Jira)
Franck Tago created SPARK-44759: --- Summary: Do not combine multiple Generate nodes in the same WholeStageCodeGen because it can easily cause OOM failures Key: SPARK-44759 URL:

[jira] [Updated] (SPARK-44757) Vulnerabilities in Spark3.4

2023-08-10 Thread Anand Balasubramaniam (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anand Balasubramaniam updated SPARK-44757: -- Priority: Major (was: Minor) > Vulnerabilities in Spark3.4 >

[jira] [Created] (SPARK-44758) Support memory limit

2023-08-10 Thread zhuml (Jira)
zhuml created SPARK-44758: - Summary: Support memory limit Key: SPARK-44758 URL: https://issues.apache.org/jira/browse/SPARK-44758 Project: Spark Issue Type: Improvement Components:

[jira] [Created] (SPARK-44757) Vulnerabilities in Spark3.4

2023-08-10 Thread Anand Balasubramaniam (Jira)
Anand Balasubramaniam created SPARK-44757: - Summary: Vulnerabilities in Spark3.4 Key: SPARK-44757 URL: https://issues.apache.org/jira/browse/SPARK-44757 Project: Spark Issue Type:

[jira] [Assigned] (SPARK-43705) Enable TimedeltaIndexTests.test_properties for pandas 2.0.0.

2023-08-10 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-43705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-43705: Assignee: Haejoon Lee > Enable TimedeltaIndexTests.test_properties for pandas 2.0.0. >

[jira] [Resolved] (SPARK-43245) Fix DatetimeIndex.microsecond to return 'int32' instead of 'int64' type of Index.

2023-08-10 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-43245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-43245. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42271

[jira] [Resolved] (SPARK-43705) Enable TimedeltaIndexTests.test_properties for pandas 2.0.0.

2023-08-10 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-43705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-43705. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42271

[jira] [Assigned] (SPARK-43245) Fix DatetimeIndex.microsecond to return 'int32' instead of 'int64' type of Index.

2023-08-10 Thread Hyukjin Kwon (Jira)
[ https://issues.apache.org/jira/browse/SPARK-43245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-43245: Assignee: Haejoon Lee > Fix DatetimeIndex.microsecond to return 'int32' instead of

[jira] [Commented] (SPARK-44742) Add Spark version drop down to the PySpark doc site

2023-08-10 Thread BingKun Pan (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752689#comment-17752689 ] BingKun Pan commented on SPARK-44742: - I work on it. > Add Spark version drop down to the PySpark

[jira] [Updated] (SPARK-44756) Executor hangs when RetryingBlockTransferor fails to initiate retry

2023-08-10 Thread Harunobu Daikoku (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harunobu Daikoku updated SPARK-44756: - Description: We have been observing this issue several times in our production where

[jira] [Updated] (SPARK-44756) Executor hangs when RetryingBlockTransferor fails to initiate retry

2023-08-10 Thread Harunobu Daikoku (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harunobu Daikoku updated SPARK-44756: - Description: We have been observing this issue several times in our production where

[jira] [Updated] (SPARK-44756) Executor hangs when RetryingBlockTransferor fails to initiate retry

2023-08-10 Thread Harunobu Daikoku (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harunobu Daikoku updated SPARK-44756: - Summary: Executor hangs when RetryingBlockTransferor fails to initiate retry (was:

[jira] [Created] (SPARK-44756) Executor hangs when RetryingBlockTransferor fails to submit retry request

2023-08-10 Thread Harunobu Daikoku (Jira)
Harunobu Daikoku created SPARK-44756: Summary: Executor hangs when RetryingBlockTransferor fails to submit retry request Key: SPARK-44756 URL: https://issues.apache.org/jira/browse/SPARK-44756

[jira] [Updated] (SPARK-44573) Couldn't submit Spark application to Kubenetes in versions v1.27.3

2023-08-10 Thread Siddaraju G C (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddaraju G C updated SPARK-44573: -- Priority: Blocker (was: Major) > Couldn't submit Spark application to Kubenetes in versions

[jira] [Assigned] (SPARK-44691) Move Subclasses of Analysis to sql/api

2023-08-10 Thread Wenchen Fan (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-44691: --- Assignee: Yihong He > Move Subclasses of Analysis to sql/api >

[jira] [Resolved] (SPARK-44691) Move Subclasses of Analysis to sql/api

2023-08-10 Thread Wenchen Fan (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-44691. - Fix Version/s: 3.5.0 Resolution: Fixed > Move Subclasses of Analysis to sql/api >

[jira] [Created] (SPARK-44755) Local tmp data is not cleared while using spark streaming consuming from kafka

2023-08-10 Thread leesf (Jira)
leesf created SPARK-44755: - Summary: Local tmp data is not cleared while using spark streaming consuming from kafka Key: SPARK-44755 URL: https://issues.apache.org/jira/browse/SPARK-44755 Project: Spark

[jira] [Updated] (SPARK-44754) Improve DeduplicateRelations rewriteAttrs compatibility

2023-08-10 Thread Jia Fan (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jia Fan updated SPARK-44754: Description: Follow https://github.com/apache/spark/pull/41554, we should add test for

[jira] [Created] (SPARK-44754) Improve DeduplicateRelations rewriteAttrs compatibility

2023-08-10 Thread Jia Fan (Jira)
Jia Fan created SPARK-44754: --- Summary: Improve DeduplicateRelations rewriteAttrs compatibility Key: SPARK-44754 URL: https://issues.apache.org/jira/browse/SPARK-44754 Project: Spark Issue Type: