[jira] [Updated] (SPARK-44509) Fine grained interrupt in Python Spark Connect

2023-07-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-44509:
-
Fix Version/s: (was: 3.5.0)
   (was: 4.0.0)

> Fine grained interrupt in Python Spark Connect
> --
>
> Key: SPARK-44509
> URL: https://issues.apache.org/jira/browse/SPARK-44509
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Next to SparkSession.interruptAll, provide a mechanism for interrupting 
>  * individual queries
>  * user-defined groups of queries in a session (by a tag), as sketched below
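A minimal PySpark sketch of the tag-based interrupt this ticket asks for, assuming the Python client mirrors the API added on the Scala side in SPARK-44422 (addTag, interruptTag, interruptOperation); the remote address, tag name, and exact signatures are illustrative assumptions, not the confirmed implementation.

```python
import threading

from pyspark.sql import SparkSession

# Spark Connect session; the remote address is an assumption for illustration.
spark = SparkSession.builder.remote("sc://localhost").getOrCreate()

spark.addTag("nightly-etl")  # tag every query started from this thread
heavy = spark.range(10_000_000_000).selectExpr("sum(id) AS total")

def cancel() -> None:
    # Interrupt only the queries carrying the tag, not the whole session;
    # spark.interruptOperation(<operation id>) would target a single query.
    spark.interruptTag("nightly-etl")

threading.Timer(5.0, cancel).start()  # fire the interrupt after 5 seconds
heavy.show()                          # fails once the operation is interrupted
```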






[jira] [Updated] (SPARK-44509) Fine grained interrupt in Python Spark Connect

2023-07-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-44509:
-
Reporter: Hyukjin Kwon  (was: Juliusz Sompolski)

> Fine grained interrupt in Python Spark Connect
> --
>
> Key: SPARK-44509
> URL: https://issues.apache.org/jira/browse/SPARK-44509
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Hyukjin Kwon
>Priority: Major
> Fix For: 3.5.0, 4.0.0
>
>
> Next to SparkSession.interruptAll, provide a mechanism for interrupting 
>  * individual queries
>  * user-defined groups of queries in a session (by a tag)






[jira] [Created] (SPARK-44509) Fine grained interrupt in Python Spark Connect

2023-07-20 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-44509:


 Summary: Fine grained interrupt in Python Spark Connect
 Key: SPARK-44509
 URL: https://issues.apache.org/jira/browse/SPARK-44509
 Project: Spark
  Issue Type: New Feature
  Components: Connect
Affects Versions: 3.5.0
Reporter: Juliusz Sompolski
Assignee: Juliusz Sompolski
 Fix For: 3.5.0, 4.0.0


Next to SparkSession.interruptAll, provide a mechanism for interrupting 
 * individual queries
 * user-defined groups of queries in a session (by a tag)






[jira] [Updated] (SPARK-44509) Fine grained interrupt in Python Spark Connect

2023-07-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-44509:
-
Description: 
Same as https://issues.apache.org/jira/browse/SPARK-44422, but we need it for Python.

 

  was:
Next to SparkSession.interruptAll, provide a mechanism for interrupting 
 * individual queries
 * user-defined groups of queries in a session (by a tag)


> Fine grained interrupt in Python Spark Connect
> --
>
> Key: SPARK-44509
> URL: https://issues.apache.org/jira/browse/SPARK-44509
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Same as https://issues.apache.org/jira/browse/SPARK-44422, but we need it for 
> Python.
>  






[jira] [Updated] (SPARK-44509) Fine grained interrupt in Python Spark Connect

2023-07-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-44509:
-
Component/s: PySpark

> Fine grained interrupt in Python Spark Connect
> --
>
> Key: SPARK-44509
> URL: https://issues.apache.org/jira/browse/SPARK-44509
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect, PySpark
>Affects Versions: 3.5.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Same as https://issues.apache.org/jira/browse/SPARK-44422, but we need it for 
> Python.
>  






[jira] [Assigned] (SPARK-44509) Fine grained interrupt in Python Spark Connect

2023-07-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-44509:


Assignee: (was: Juliusz Sompolski)

> Fine grained interrupt in Python Spark Connect
> --
>
> Key: SPARK-44509
> URL: https://issues.apache.org/jira/browse/SPARK-44509
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Juliusz Sompolski
>Priority: Major
> Fix For: 3.5.0, 4.0.0
>
>
> Next to SparkSession.interruptAll, provide a mechanism for interrupting 
>  * individual queries
>  * user-defined groups of queries in a session (by a tag)






[jira] [Assigned] (SPARK-44422) Fine grained interrupt in Spark Connect

2023-07-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-44422:


Assignee: Juliusz Sompolski

> Fine grained interrupt in Spark Connect
> ---
>
> Key: SPARK-44422
> URL: https://issues.apache.org/jira/browse/SPARK-44422
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Juliusz Sompolski
>Assignee: Juliusz Sompolski
>Priority: Major
>
> Next to SparkSession.interruptAll, provide a mechanism for interrupting 
>  * individual queries
>  * user-defined groups of queries in a session (by a tag)






[jira] [Resolved] (SPARK-44422) Fine grained interrupt in Spark Connect

2023-07-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-44422.
--
Fix Version/s: 3.5.0
   4.0.0
   Resolution: Fixed

Issue resolved by pull request 42009
[https://github.com/apache/spark/pull/42009]

> Fine grained interrupt in Spark Connect
> ---
>
> Key: SPARK-44422
> URL: https://issues.apache.org/jira/browse/SPARK-44422
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Juliusz Sompolski
>Assignee: Juliusz Sompolski
>Priority: Major
> Fix For: 3.5.0, 4.0.0
>
>
> Next to SparkSession.interruptAll, provide a mechanism for interrupting 
>  * individual queries
>  * user-defined groups of queries in a session (by a tag)






[jira] [Resolved] (SPARK-39634) Allow file splitting in combination with row index generation

2023-07-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-39634.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 40728
[https://github.com/apache/spark/pull/40728]

> Allow file splitting in combination with row index generation
> -
>
> Key: SPARK-39634
> URL: https://issues.apache.org/jira/browse/SPARK-39634
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Ala Luszczak
>Assignee: Ala Luszczak
>Priority: Major
> Fix For: 3.5.0
>
>
> This issue is a follow-up to SPARK-37980.
> Because of a bug in parquet-mr 
> (https://issues.apache.org/jira/browse/PARQUET-2161), it is currently impossible 
> to generate row indexes for parquet files if they are split into multiple 
> pieces. Instead, each file must be read in a single task. 
> Once the version of parquet-mr with the fix is included in Spark, we should 
> remove the workarounds (marked with this ticket number) from the code, so that 
> parquet files are splittable even when the row indexes need to be generated.
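For context, a hedged PySpark sketch of the row-index feature this ticket unblocks; it assumes the hidden `_metadata.row_index` field added by SPARK-37980, and the input path is hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# /tmp/events is a hypothetical Parquet dataset.
df = spark.read.parquet("/tmp/events")

# Row indexes are generated per file; before this fix, selecting them forced
# each Parquet file into a single task instead of allowing the file to split.
df.select("_metadata.file_name", "_metadata.row_index").show()
```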






[jira] [Assigned] (SPARK-39634) Allow file splitting in combination with row index generation

2023-07-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-39634:
---

Assignee: Ala Luszczak

> Allow file splitting in combination with row index generation
> -
>
> Key: SPARK-39634
> URL: https://issues.apache.org/jira/browse/SPARK-39634
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Ala Luszczak
>Assignee: Ala Luszczak
>Priority: Major
>
> This issue is a follow-up to SPARK-37980.
> Because of a bug in parquet-mr 
> (https://issues.apache.org/jira/browse/PARQUET-2161), it is currently impossible 
> to generate row indexes for parquet files if they are split into multiple 
> pieces. Instead, each file must be read in a single task. 
> Once the version of parquet-mr with the fix is included in Spark, we should 
> remove the workarounds (marked with this ticket number) from the code, so that 
> parquet files are splittable even when the row indexes need to be generated.






[jira] [Updated] (SPARK-44464) Fix applyInPandasWithStatePythonRunner to output rows that have Null as first column value

2023-07-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-44464:
-
Fix Version/s: 3.4.2

> Fix applyInPandasWithStatePythonRunner to output rows that have Null as first 
> column value
> --
>
> Key: SPARK-44464
> URL: https://issues.apache.org/jira/browse/SPARK-44464
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.3.3
>Reporter: Siying Dong
>Assignee: Siying Dong
>Priority: Major
> Fix For: 3.5.0, 3.4.2
>
>
> The current implementation of {{ApplyInPandasWithStatePythonRunner}} cannot 
> deal with outputs where the first column of the row is {{null}}, as it 
> cannot distinguish between a column that is genuinely null and a field that 
> is merely padding because there are fewer data records than state records. 
> This causes incorrect results in the former case.
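A hedged sketch of the code path this bug affects: a stateful function whose output rows carry null in their first column. The rate source, grouping key, schemas, and function name are illustrative assumptions.

```python
from typing import Any, Iterator, Tuple

import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.streaming.state import GroupState, GroupStateTimeout

spark = SparkSession.builder.getOrCreate()
events = spark.readStream.format("rate").load()  # stand-in streaming source

def count_events(
    key: Tuple[Any, ...], pdfs: Iterator[pd.DataFrame], state: GroupState
) -> Iterator[pd.DataFrame]:
    count = state.get[0] if state.exists else 0
    for pdf in pdfs:
        count += len(pdf)
    state.update((count,))
    # The first output column is deliberately None: this is the row shape the
    # runner previously mishandled, mistaking the null for internal padding.
    yield pd.DataFrame({"label": [None], "count": [count]})

out = events.groupBy("value").applyInPandasWithState(
    count_events,
    outputStructType="label string, count long",
    stateStructType="count long",
    outputMode="update",
    timeoutConf=GroupStateTimeout.NoTimeout,
)
```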






[jira] [Assigned] (SPARK-44484) Add missing json field batchDuration to StreamingQueryProgress

2023-07-20 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim reassigned SPARK-44484:


Assignee: Wei Liu

> Add missing json field batchDuration to StreamingQueryProgress
> --
>
> Key: SPARK-44484
> URL: https://issues.apache.org/jira/browse/SPARK-44484
> Project: Spark
>  Issue Type: New Feature
>  Components: Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Wei Liu
>Assignee: Wei Liu
>Priority: Major
>
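A hedged sketch of consuming the field this ticket adds; the stream definition is illustrative, and the dict key assumes the JSON field name given in the title.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
query = (spark.readStream.format("rate").load()
         .writeStream.format("memory").queryName("probe").start())

progress = query.lastProgress  # dict parsed from the progress JSON, or None
if progress is not None:
    print(progress["batchDuration"])  # duration of the last micro-batch, in ms
```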







[jira] [Resolved] (SPARK-44484) Add missing json field batchDuration to StreamingQueryProgress

2023-07-20 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim resolved SPARK-44484.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 42077
[https://github.com/apache/spark/pull/42077]

> Add missing json field batchDuration to StreamingQueryProgress
> --
>
> Key: SPARK-44484
> URL: https://issues.apache.org/jira/browse/SPARK-44484
> Project: Spark
>  Issue Type: New Feature
>  Components: Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Wei Liu
>Assignee: Wei Liu
>Priority: Major
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-44504) Maintenance task should clean up loaded providers on stop error

2023-07-20 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim updated SPARK-44504:
-
Fix Version/s: 3.5.0

> Maintenance task should clean up loaded providers on stop error
> ---
>
> Key: SPARK-44504
> URL: https://issues.apache.org/jira/browse/SPARK-44504
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Anish Shrigondekar
>Assignee: Anish Shrigondekar
>Priority: Major
> Fix For: 3.5.0, 4.0.0
>
>
> Maintenance task should clean up loaded providers on stop error






[jira] [Assigned] (SPARK-44504) Maintenance task should clean up loaded providers on stop error

2023-07-20 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim reassigned SPARK-44504:


Assignee: Anish Shrigondekar

> Maintenance task should clean up loaded providers on stop error
> ---
>
> Key: SPARK-44504
> URL: https://issues.apache.org/jira/browse/SPARK-44504
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Anish Shrigondekar
>Assignee: Anish Shrigondekar
>Priority: Major
>
> Maintenance task should clean up loaded providers on stop error






[jira] [Resolved] (SPARK-44504) Maintenance task should clean up loaded providers on stop error

2023-07-20 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim resolved SPARK-44504.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 42098
[https://github.com/apache/spark/pull/42098]

> Maintenance task should clean up loaded providers on stop error
> ---
>
> Key: SPARK-44504
> URL: https://issues.apache.org/jira/browse/SPARK-44504
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Anish Shrigondekar
>Assignee: Anish Shrigondekar
>Priority: Major
> Fix For: 4.0.0
>
>
> Maintenance task should clean up loaded providers on stop error






[jira] [Updated] (SPARK-44504) Maintenance task should clean up loaded providers on stop error

2023-07-20 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim updated SPARK-44504:
-
Issue Type: Bug  (was: Task)

> Maintenance task should clean up loaded providers on stop error
> ---
>
> Key: SPARK-44504
> URL: https://issues.apache.org/jira/browse/SPARK-44504
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Anish Shrigondekar
>Priority: Major
>
> Maintenance task should clean up loaded providers on stop error






[jira] (SPARK-44464) Fix applyInPandasWithStatePythonRunner to output rows that have Null as first column value

2023-07-20 Thread Jungtaek Lim (Jira)


[ https://issues.apache.org/jira/browse/SPARK-44464 ]


Jungtaek Lim deleted comment on SPARK-44464:
--

was (Author: JIRAUSER39):
[~kabhwan] should we backport it all the way to 11.3? Or is it OK to only fix 
newer versions?

> Fix applyInPandasWithStatePythonRunner to output rows that have Null as first 
> column value
> --
>
> Key: SPARK-44464
> URL: https://issues.apache.org/jira/browse/SPARK-44464
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.3.3
>Reporter: Siying Dong
>Assignee: Siying Dong
>Priority: Major
> Fix For: 3.5.0
>
>
> The current implementation of {{ApplyInPandasWithStatePythonRunner}} cannot 
> deal with outputs where the first column of the row is {{null}}, as it 
> cannot distinguish between a column that is genuinely null and a field that 
> is merely padding because there are fewer data records than state records. 
> This causes incorrect results in the former case.






[jira] [Assigned] (SPARK-43966) Support non-deterministic Python UDTFs

2023-07-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-43966:
---

Assignee: Allison Wang

> Support non-deterministic Python UDTFs
> --
>
> Key: SPARK-43966
> URL: https://issues.apache.org/jira/browse/SPARK-43966
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>
> Support Python UDTFs with non-deterministic function body and inputs.
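A hedged sketch of what this enables: a Python UDTF whose body is non-deterministic. The `asNondeterministic()` marker is an assumption mirroring the existing UDF API; verify against the merged change.

```python
import random

from pyspark.sql.functions import lit, udtf

@udtf(returnType="draw: double")
class RandomDraw:
    def eval(self, n: int):
        for _ in range(n):
            yield (random.random(),)  # non-deterministic output

# Assumed marker from this ticket, by analogy with UDF.asNondeterministic().
RandomDraw = RandomDraw.asNondeterministic()
RandomDraw(lit(3)).show()  # three rows of random draws
```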






[jira] [Resolved] (SPARK-43966) Support non-deterministic Python UDTFs

2023-07-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-43966.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 42075
[https://github.com/apache/spark/pull/42075]

> Support non-deterministic Python UDTFs
> --
>
> Key: SPARK-43966
> URL: https://issues.apache.org/jira/browse/SPARK-43966
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
> Fix For: 3.5.0
>
>
> Support Python UDTFs with non-deterministic function body and inputs.






[jira] [Comment Edited] (SPARK-44365) Use PartitionEvaluator API in FileSourceScanExec, RowDataSourceScanExec, MergeRowsExec

2023-07-20 Thread Vinod KC (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17741837#comment-17741837
 ] 

Vinod KC edited comment on SPARK-44365 at 7/21/23 4:15 AM:
---

Raised PR: https://github.com/apache/spark/pull/42105


was (Author: vinodkc):
I'm working on it.

> Use PartitionEvaluator API in FileSourceScanExec, RowDataSourceScanExec, 
> MergeRowsExec
> --
>
> Key: SPARK-44365
> URL: https://issues.apache.org/jira/browse/SPARK-44365
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Vinod KC
>Priority: Major
>
> Define the computing logic through PartitionEvaluator API and use it in SQL 
> operators 
> FileSourceScanExec
> RowDataSourceScanExec
> MergeRowsExec






[jira] [Created] (SPARK-44508) Add user guide and documentation for Python UDTFs

2023-07-20 Thread Allison Wang (Jira)
Allison Wang created SPARK-44508:


 Summary: Add user guide and documentation for Python UDTFs
 Key: SPARK-44508
 URL: https://issues.apache.org/jira/browse/SPARK-44508
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.5.0
Reporter: Allison Wang


Add documentation for Python UDTFs






[jira] [Commented] (SPARK-42898) Cast from string to date and date to string say timezone is needed, but it is not used

2023-07-20 Thread Kent Yao (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17745365#comment-17745365
 ] 

Kent Yao commented on SPARK-42898:
--

Issue resolved via https://github.com/apache/spark/pull/42089

> Cast from string to date and date to string say timezone is needed, but it is 
> not used
> --
>
> Key: SPARK-42898
> URL: https://issues.apache.org/jira/browse/SPARK-42898
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Robert Joseph Evans
>Assignee: Robert Joseph Evans
>Priority: Major
>
> This is really minor, but SPARK-35581 removed the need for a timezone when 
> casting from a `StringType` to a `DateType`; however, the patch didn't update the 
> `needsTimeZone` function to indicate that it was no longer required.
> Currently, casting from a DateType to a StringType also says that it needs the 
> timezone, but it only uses the `DateFormatter` with its default parameters, 
> which do not use the time zone at all.
> I think this can be fixed with just a two-line change.






[jira] [Resolved] (SPARK-42898) Cast from string to date and date to string say timezone is needed, but it is not used

2023-07-20 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao resolved SPARK-42898.
--
  Assignee: Robert Joseph Evans
Resolution: Fixed

> Cast from string to date and date to string say timezone is needed, but it is 
> not used
> --
>
> Key: SPARK-42898
> URL: https://issues.apache.org/jira/browse/SPARK-42898
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Robert Joseph Evans
>Assignee: Robert Joseph Evans
>Priority: Major
>
> This is really minor, but SPARK-35581 removed the need for a timezone when 
> casting from a `StringType` to a `DateType`; however, the patch didn't update the 
> `needsTimeZone` function to indicate that it was no longer required.
> Currently, casting from a DateType to a StringType also says that it needs the 
> timezone, but it only uses the `DateFormatter` with its default parameters, 
> which do not use the time zone at all.
> I think this can be fixed with just a two-line change.






[jira] [Created] (SPARK-44507) SCSC does not depend on AnalysisException

2023-07-20 Thread Rui Wang (Jira)
Rui Wang created SPARK-44507:


 Summary: SCSC does not depend on AnalysisException
 Key: SPARK-44507
 URL: https://issues.apache.org/jira/browse/SPARK-44507
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, SQL
Affects Versions: 3.5.0
Reporter: Rui Wang
Assignee: Rui Wang









[jira] [Resolved] (SPARK-44502) Add missing versionchanged field to docs

2023-07-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-44502.
--
  Assignee: Wei Liu
Resolution: Fixed

Fixed in https://github.com/apache/spark/pull/42097

> Add missing versionchanged field to docs
> 
>
> Key: SPARK-44502
> URL: https://issues.apache.org/jira/browse/SPARK-44502
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect, Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Wei Liu
>Assignee: Wei Liu
>Priority: Major
> Fix For: 3.5.0, 4.0.0
>
>
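For context, a hedged sketch of the docstring convention at issue: PySpark API docs carry Sphinx `versionadded`/`versionchanged` directives, and this ticket adds the missing `versionchanged` entries. The method below is a made-up stand-in, not the actual API touched by the PR.

```python
def toTable(self, tableName: str) -> "StreamingQuery":
    """Starts the streaming query, writing results to the given table.

    .. versionadded:: 3.1.0

    .. versionchanged:: 3.5.0
        Supports Spark Connect.
    """
    ...
```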







[jira] [Updated] (SPARK-44502) Add missing versionchanged field to docs

2023-07-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-44502:
-
Fix Version/s: 3.5.0
   4.0.0

> Add missing versionchanged field to docs
> 
>
> Key: SPARK-44502
> URL: https://issues.apache.org/jira/browse/SPARK-44502
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect, Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Wei Liu
>Priority: Major
> Fix For: 3.5.0, 4.0.0
>
>







[jira] [Created] (SPARK-44506) Upgrade mima-core & sbt-mima-plugin from 1.1.2 to 1.1.3

2023-07-20 Thread BingKun Pan (Jira)
BingKun Pan created SPARK-44506:
---

 Summary: Upgrade mima-core & sbt-mima-plugin from 1.1.2 to 1.1.3
 Key: SPARK-44506
 URL: https://issues.apache.org/jira/browse/SPARK-44506
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 4.0.0
Reporter: BingKun Pan









[jira] [Commented] (SPARK-42118) Wrong result when parsing a multiline JSON file with differing types for same column

2023-07-20 Thread Jia Fan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17745345#comment-17745345
 ] 

Jia Fan commented on SPARK-42118:
-

Seems like this is already fixed on the master branch.

> Wrong result when parsing a multiline JSON file with differing types for same 
> column
> 
>
> Key: SPARK-42118
> URL: https://issues.apache.org/jira/browse/SPARK-42118
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.1
>Reporter: Dilip Biswal
>Priority: Major
>
> Here is a simple reproduction of the problem. We have a JSON file whose 
> content looks like the following and is in multiLine format.
> {code}
> [{"name":""},{"name":123.34}]
> {code}
> Here is the result of the Spark query when we read the above content.
> {code}
> scala> val df = spark.read.format("json").option("multiLine", true).load("/tmp/json")
> df: org.apache.spark.sql.DataFrame = [name: double]
> scala> df.show(false)
> +----+
> |name|
> +----+
> |null|
> +----+
> scala> df.count
> res5: Long = 2
> {code}
> This is quite a serious problem for us, as it is causing us to persist corrupt 
> data in the lake. If there is some issue with parsing the input, we expect Spark to 
> set the "_corrupt_record" column so that we can act on it. Please note that df.count 
> reports 2 rows whereas df.show only reports 1 row with a null value.
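A hedged illustration of the behavior the reporter expects: supplying an explicit schema with a `_corrupt_record` column so that malformed multiLine JSON is surfaced instead of silently becoming null. The path is illustrative.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = (spark.read.format("json")
      .option("multiLine", True)
      .schema("name DOUBLE, _corrupt_record STRING")
      .load("/tmp/json"))  # hypothetical path holding the content above

# Records that do not fit the schema land in _corrupt_record for triage.
df.show(truncate=False)
```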






[jira] [Updated] (SPARK-44359) Define the computing logic through PartitionEvaluator API and use it in BaseScriptTransformationExec, InMemoryTableScanExec, ReferenceSort

2023-07-20 Thread Vinod KC (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod KC updated SPARK-44359:
-
Summary: Define the computing logic through PartitionEvaluator API and use 
it in BaseScriptTransformationExec, InMemoryTableScanExec, ReferenceSort  (was: 
Define the computing logic through PartitionEvaluator API and use it in 
BaseScriptTransformationExec)

> Define the computing logic through PartitionEvaluator API and use it in 
> BaseScriptTransformationExec, InMemoryTableScanExec, ReferenceSort
> --
>
> Key: SPARK-44359
> URL: https://issues.apache.org/jira/browse/SPARK-44359
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Vinod KC
>Priority: Major
>
> Define the computing logic through PartitionEvaluator API and use it in SQL 
> aggregate operators
> BaseScriptTransformationExec






[jira] [Updated] (SPARK-44359) Define the computing logic through PartitionEvaluator API and use it in BaseScriptTransformationExec, InMemoryTableScanExec, ReferenceSort

2023-07-20 Thread Vinod KC (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod KC updated SPARK-44359:
-
Description: 
Define the computing logic through PartitionEvaluator API and use it in SQL 
aggregate operators

BaseScriptTransformationExec

InMemoryTableScanExec

ReferenceSort

  was:
Define the computing logic through PartitionEvaluator API and use it in SQL 
aggregate operators

BaseScriptTransformationExec


> Define the computing logic through PartitionEvaluator API and use it in 
> BaseScriptTransformationExec, InMemoryTableScanExec, ReferenceSort
> --
>
> Key: SPARK-44359
> URL: https://issues.apache.org/jira/browse/SPARK-44359
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Vinod KC
>Priority: Major
>
> Define the computing logic through PartitionEvaluator API and use it in SQL 
> aggregate operators
> BaseScriptTransformationExec
> InMemoryTableScanExec
> ReferenceSort






[jira] [Updated] (SPARK-44365) Use PartitionEvaluator API in FileSourceScanExec, RowDataSourceScanExec, MergeRowsExec

2023-07-20 Thread Vinod KC (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod KC updated SPARK-44365:
-
Description: 
Define the computing logic through PartitionEvaluator API and use it in SQL 
operators 

FileSourceScanExec

RowDataSourceScanExec

MergeRowsExec

  was:
Define the computing logic through PartitionEvaluator API and use it in SQL 
operators 

InMemoryTableScanExec

DataSourceScanExec

MergeRowsExec

ReferenceSort


> Use PartitionEvaluator API in FileSourceScanExec, RowDataSourceScanExec, 
> MergeRowsExec
> --
>
> Key: SPARK-44365
> URL: https://issues.apache.org/jira/browse/SPARK-44365
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Vinod KC
>Priority: Major
>
> Define the computing logic through PartitionEvaluator API and use it in SQL 
> operators 
> FileSourceScanExec
> RowDataSourceScanExec
> MergeRowsExec






[jira] [Updated] (SPARK-44365) Use PartitionEvaluator API in FileSourceScanExec, RowDataSourceScanExec, MergeRowsExec

2023-07-20 Thread Vinod KC (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod KC updated SPARK-44365:
-
Summary: Use PartitionEvaluator API in FileSourceScanExec, 
RowDataSourceScanExec, MergeRowsExec  (was: Define the computing logic through 
PartitionEvaluator API and use it in SQL operators InMemoryTableScanExec, 
DataSourceScanExec, MergeRowsExec , ReferenceSort)

> Use PartitionEvaluator API in FileSourceScanExec, RowDataSourceScanExec, 
> MergeRowsExec
> --
>
> Key: SPARK-44365
> URL: https://issues.apache.org/jira/browse/SPARK-44365
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Vinod KC
>Priority: Major
>
> Define the computing logic through PartitionEvaluator API and use it in SQL 
> operators 
> InMemoryTableScanExec
> DataSourceScanExec
> MergeRowsExec
> ReferenceSort






[jira] [Assigned] (SPARK-44487) KubernetesSuite report NPE when not set spark.kubernetes.test.unpackSparkDir

2023-07-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-44487:


Assignee: Jia Fan

> KubernetesSuite report NPE when not set spark.kubernetes.test.unpackSparkDir
> 
>
> Key: SPARK-44487
> URL: https://issues.apache.org/jira/browse/SPARK-44487
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Tests
>Affects Versions: 3.4.1
>Reporter: Jia Fan
>Assignee: Jia Fan
>Priority: Major
>
> KubernetesSuite report NPE when not set spark.kubernetes.test.unpackSparkDir
>  
> Exception encountered when invoking run on a nested suite.
> java.lang.NullPointerException
>     at sun.nio.fs.UnixPath.normalizeAndCheck(UnixPath.java:77)
>     at sun.nio.fs.UnixPath.<init>(UnixPath.java:71)
>     at sun.nio.fs.UnixFileSystem.getPath(UnixFileSystem.java:281)
>     at java.nio.file.Paths.get(Paths.java:84)
>     at 
> org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.$anonfun$beforeAll$4(KubernetesSuite.scala:164)
>     at 
> org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.$anonfun$beforeAll$4$adapted(KubernetesSuite.scala:163)
>     at scala.collection.LinearSeqOptimized.find(LinearSeqOptimized.scala:115)
>     at scala.collection.LinearSeqOptimized.find$(LinearSeqOptimized.scala:112)
>     at scala.collection.immutable.List.find(List.scala:91)
>     at 
> org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.beforeAll(KubernetesSuite.scala:163)






[jira] [Resolved] (SPARK-44487) KubernetesSuite report NPE when not set spark.kubernetes.test.unpackSparkDir

2023-07-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-44487.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 42081
[https://github.com/apache/spark/pull/42081]

> KubernetesSuite report NPE when not set spark.kubernetes.test.unpackSparkDir
> 
>
> Key: SPARK-44487
> URL: https://issues.apache.org/jira/browse/SPARK-44487
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Tests
>Affects Versions: 3.4.1
>Reporter: Jia Fan
>Assignee: Jia Fan
>Priority: Major
> Fix For: 4.0.0
>
>
> KubernetesSuite report NPE when not set spark.kubernetes.test.unpackSparkDir
>  
> Exception encountered when invoking run on a nested suite.
> java.lang.NullPointerException
>     at sun.nio.fs.UnixPath.normalizeAndCheck(UnixPath.java:77)
>     at sun.nio.fs.UnixPath.<init>(UnixPath.java:71)
>     at sun.nio.fs.UnixFileSystem.getPath(UnixFileSystem.java:281)
>     at java.nio.file.Paths.get(Paths.java:84)
>     at 
> org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.$anonfun$beforeAll$4(KubernetesSuite.scala:164)
>     at 
> org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.$anonfun$beforeAll$4$adapted(KubernetesSuite.scala:163)
>     at scala.collection.LinearSeqOptimized.find(LinearSeqOptimized.scala:115)
>     at scala.collection.LinearSeqOptimized.find$(LinearSeqOptimized.scala:112)
>     at scala.collection.immutable.List.find(List.scala:91)
>     at 
> org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.beforeAll(KubernetesSuite.scala:163)






[jira] [Assigned] (SPARK-44477) CheckAnalysis uses error subclass as an error class

2023-07-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-44477:


Assignee: Bruce Robbins

> CheckAnalysis uses error subclass as an error class
> ---
>
> Key: SPARK-44477
> URL: https://issues.apache.org/jira/browse/SPARK-44477
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Bruce Robbins
>Assignee: Bruce Robbins
>Priority: Minor
>
> {{CheckAnalysis}} treats {{TYPE_CHECK_FAILURE_WITH_HINT}} as an error class, 
> but it is instead an error subclass of {{DATATYPE_MISMATCH}}.
> {noformat}
> spark-sql (default)> select bitmap_count(12);
> [INTERNAL_ERROR] Cannot find main error class 'TYPE_CHECK_FAILURE_WITH_HINT'
> org.apache.spark.SparkException: [INTERNAL_ERROR] Cannot find main error 
> class 'TYPE_CHECK_FAILURE_WITH_HINT'
> at org.apache.spark.SparkException$.internalError(SparkException.scala:83)
> at org.apache.spark.SparkException$.internalError(SparkException.scala:87)
> at 
> org.apache.spark.ErrorClassesJsonReader.$anonfun$getMessageTemplate$1(ErrorClassesJSONReader.scala:68)
> at scala.collection.immutable.HashMap$HashMap1.getOrElse0(HashMap.scala:361)
> at 
> scala.collection.immutable.HashMap$HashTrieMap.getOrElse0(HashMap.scala:594)
> at 
> scala.collection.immutable.HashMap$HashTrieMap.getOrElse0(HashMap.scala:589)
> at scala.collection.immutable.HashMap.getOrElse(HashMap.scala:73)
> {noformat}
> This issue only occurs when an expression uses 
> {{TypeCheckResult.TypeCheckFailure}} to indicate input type check failure. 
> {{TypeCheckResult.TypeCheckFailure}} appears to be deprecated in favor of 
> {{TypeCheckResult.DataTypeMismatch}}, but recently two expressions were 
> added that use {{TypeCheckResult.TypeCheckFailure}}: {{BitmapCount}} and 
> {{BitmapOrAgg}}.
> {{BitmapCount}} and {{BitmapOrAgg}} should probably be fixed to use 
> {{TypeCheckResult.DataTypeMismatch}}. Regardless, the code in 
> {{CheckAnalysis}} that handles {{TypeCheckResult.TypeCheckFailure}} should be 
> corrected (or removed).






[jira] [Resolved] (SPARK-44477) CheckAnalysis uses error subclass as an error class

2023-07-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-44477.
--
Fix Version/s: 3.5.0
   4.0.0
   Resolution: Fixed

Issue resolved by pull request 42064
[https://github.com/apache/spark/pull/42064]

> CheckAnalysis uses error subclass as an error class
> ---
>
> Key: SPARK-44477
> URL: https://issues.apache.org/jira/browse/SPARK-44477
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Bruce Robbins
>Assignee: Bruce Robbins
>Priority: Minor
> Fix For: 3.5.0, 4.0.0
>
>
> {{CheckAnalysis}} treats {{TYPE_CHECK_FAILURE_WITH_HINT}} as an error class, 
> but it is instead an error subclass of {{DATATYPE_MISMATCH}}.
> {noformat}
> spark-sql (default)> select bitmap_count(12);
> [INTERNAL_ERROR] Cannot find main error class 'TYPE_CHECK_FAILURE_WITH_HINT'
> org.apache.spark.SparkException: [INTERNAL_ERROR] Cannot find main error 
> class 'TYPE_CHECK_FAILURE_WITH_HINT'
> at org.apache.spark.SparkException$.internalError(SparkException.scala:83)
> at org.apache.spark.SparkException$.internalError(SparkException.scala:87)
> at 
> org.apache.spark.ErrorClassesJsonReader.$anonfun$getMessageTemplate$1(ErrorClassesJSONReader.scala:68)
> at scala.collection.immutable.HashMap$HashMap1.getOrElse0(HashMap.scala:361)
> at 
> scala.collection.immutable.HashMap$HashTrieMap.getOrElse0(HashMap.scala:594)
> at 
> scala.collection.immutable.HashMap$HashTrieMap.getOrElse0(HashMap.scala:589)
> at scala.collection.immutable.HashMap.getOrElse(HashMap.scala:73)
> {noformat}
> This issue only occurs when an expression uses 
> {{TypeCheckResult.TypeCheckFailure}} to indicate input type check failure. 
> {{TypeCheckResult.TypeCheckFailure}} appears to be deprecated in favor of 
> {{TypeCheckResult.DataTypeMismatch}}, but recently two expressions were 
> added that use {{TypeCheckResult.TypeCheckFailure}}: {{BitmapCount}} and 
> {{BitmapOrAgg}}.
> {{BitmapCount}} and {{BitmapOrAgg}} should probably be fixed to use 
> {{TypeCheckResult.DataTypeMismatch}}. Regardless, the code in 
> {{CheckAnalysis}} that handles {{TypeCheckResult.TypeCheckFailure}} should be 
> corrected (or removed).






[jira] [Resolved] (SPARK-44252) Add error class for the case when loading state from DFS fails

2023-07-20 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim resolved SPARK-44252.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 41705
[https://github.com/apache/spark/pull/41705]

> Add error class for the case when loading state from DFS fails
> --
>
> Key: SPARK-44252
> URL: https://issues.apache.org/jira/browse/SPARK-44252
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 3.2.0
>Reporter: Lucy Yao
>Assignee: Lucy Yao
>Priority: Major
> Fix For: 4.0.0
>
>
> This is part of https://github.com/apache/spark/pull/41705.
> Wrap the exception thrown while loading state so that the proper error class is 
> assigned. With error classes assigned, we can classify the errors, which helps us 
> determine which errors customers struggle with the most. 
> StateStoreProvider.getStore() and StateStoreProvider.getReadStore() are the 
> entry points.
> This ticket also covers failedToReadDeltaFileError and 
> failedToReadSnapshotFileError from 
> https://issues.apache.org/jira/browse/SPARK-36305.






[jira] [Assigned] (SPARK-44252) Add error class for the case when loading state from DFS fails

2023-07-20 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim reassigned SPARK-44252:


Assignee: Lucy Yao

> Add error class for the case when loading state from DFS fails
> --
>
> Key: SPARK-44252
> URL: https://issues.apache.org/jira/browse/SPARK-44252
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 3.2.0
>Reporter: Lucy Yao
>Assignee: Lucy Yao
>Priority: Major
>
> This is part of https://github.com/apache/spark/pull/41705.
> Wrap the exception thrown while loading state so that the proper error class is 
> assigned. With error classes assigned, we can classify the errors, which helps us 
> determine which errors customers struggle with the most. 
> StateStoreProvider.getStore() and StateStoreProvider.getReadStore() are the 
> entry points.
> This ticket also covers failedToReadDeltaFileError and 
> failedToReadSnapshotFileError from 
> https://issues.apache.org/jira/browse/SPARK-36305.






[jira] [Created] (SPARK-44505) DataSource v2 Scans should not require planning the input partitions on explain

2023-07-20 Thread Martin Grund (Jira)
Martin Grund created SPARK-44505:


 Summary: DataSource v2 Scans should not require planning the input 
partitions on explain
 Key: SPARK-44505
 URL: https://issues.apache.org/jira/browse/SPARK-44505
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0
Reporter: Martin Grund


Right now, we always call `planInputPartitions()` for a DSv2 implementation 
even when no Spark job is run and the plan is only explained.

We should provide a way to avoid planning all input partitions just to 
determine whether the input is columnar. The scan should provide an override.






[jira] [Commented] (SPARK-44504) Maintenance task should clean up loaded providers on stop error

2023-07-20 Thread Anish Shrigondekar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17745272#comment-17745272
 ] 

Anish Shrigondekar commented on SPARK-44504:


Sent PR here: https://github.com/apache/spark/pull/42098

> Maintenance task should clean up loaded providers on stop error
> ---
>
> Key: SPARK-44504
> URL: https://issues.apache.org/jira/browse/SPARK-44504
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Anish Shrigondekar
>Priority: Major
>
> Maintenance task should clean up loaded providers on stop error






[jira] [Comment Edited] (SPARK-44504) Maintenance task should clean up loaded providers on stop error

2023-07-20 Thread Anish Shrigondekar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17745272#comment-17745272
 ] 

Anish Shrigondekar edited comment on SPARK-44504 at 7/20/23 8:57 PM:
-

Sent PR here: [https://github.com/apache/spark/pull/42098]

 

cc - [~kabhwan] 


was (Author: JIRAUSER287599):
Sent PR here: https://github.com/apache/spark/pull/42098

> Maintenance task should clean up loaded providers on stop error
> ---
>
> Key: SPARK-44504
> URL: https://issues.apache.org/jira/browse/SPARK-44504
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Anish Shrigondekar
>Priority: Major
>
> Maintenance task should clean up loaded providers on stop error






[jira] [Created] (SPARK-44504) Maintenance task should clean up loaded providers on stop error

2023-07-20 Thread Anish Shrigondekar (Jira)
Anish Shrigondekar created SPARK-44504:
--

 Summary: Maintenance task should clean up loaded providers on stop 
error
 Key: SPARK-44504
 URL: https://issues.apache.org/jira/browse/SPARK-44504
 Project: Spark
  Issue Type: Task
  Components: Structured Streaming
Affects Versions: 3.5.0
Reporter: Anish Shrigondekar


Maintenance task should clean up loaded providers on stop error






[jira] [Resolved] (SPARK-44501) Ignore checksum files in KubernetesLocalDiskShuffleExecutorComponents

2023-07-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-44501.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 42094
[https://github.com/apache/spark/pull/42094]

> Ignore checksum files in KubernetesLocalDiskShuffleExecutorComponents
> -
>
> Key: SPARK-44501
> URL: https://issues.apache.org/jira/browse/SPARK-44501
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.5.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 3.5.0
>
>







[jira] [Assigned] (SPARK-44501) Ignore checksum files in KubernetesLocalDiskShuffleExecutorComponents

2023-07-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-44501:
-

Assignee: Dongjoon Hyun

> Ignore checksum files in KubernetesLocalDiskShuffleExecutorComponents
> -
>
> Key: SPARK-44501
> URL: https://issues.apache.org/jira/browse/SPARK-44501
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.5.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>







[jira] [Commented] (SPARK-44503) Support PARTITION BY and ORDER BY clause for table arguments

2023-07-20 Thread Daniel (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17745258#comment-17745258
 ] 

Daniel commented on SPARK-44503:


I can work on this part

> Support PARTITION BY and ORDER BY clause for table arguments
> 
>
> Key: SPARK-44503
> URL: https://issues.apache.org/jira/browse/SPARK-44503
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Daniel
>Priority: Major
>
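A hedged sketch of the syntax this sub-task proposes, inferred from the title and standard SQL table-function notation; the final merged syntax may differ, and `my_udtf`, `events`, and the column names are hypothetical.

```python
spark.sql("""
    SELECT *
    FROM my_udtf(
        TABLE(events)
        PARTITION BY device_id
        ORDER BY event_time
    )
""").show()
```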







[jira] [Created] (SPARK-44503) Support PARTITION BY and ORDER BY clause for table arguments

2023-07-20 Thread Daniel (Jira)
Daniel created SPARK-44503:
--

 Summary: Support PARTITION BY and ORDER BY clause for table 
arguments
 Key: SPARK-44503
 URL: https://issues.apache.org/jira/browse/SPARK-44503
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 4.0.0
Reporter: Daniel









[jira] [Created] (SPARK-44502) Add missing versionchanged field to docs

2023-07-20 Thread Wei Liu (Jira)
Wei Liu created SPARK-44502:
---

 Summary: Add missing versionchanged field to docs
 Key: SPARK-44502
 URL: https://issues.apache.org/jira/browse/SPARK-44502
 Project: Spark
  Issue Type: New Feature
  Components: Connect, Structured Streaming
Affects Versions: 3.5.0
Reporter: Wei Liu









[jira] [Created] (SPARK-44501) Ignore checksum files in KubernetesLocalDiskShuffleExecutorComponents

2023-07-20 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-44501:
-

 Summary: Ignore checksum files in 
KubernetesLocalDiskShuffleExecutorComponents
 Key: SPARK-44501
 URL: https://issues.apache.org/jira/browse/SPARK-44501
 Project: Spark
  Issue Type: Bug
  Components: Kubernetes
Affects Versions: 3.5.0
Reporter: Dongjoon Hyun









[jira] [Updated] (SPARK-44501) Ignore checksum files in KubernetesLocalDiskShuffleExecutorComponents

2023-07-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-44501:
--
Issue Type: Improvement  (was: Bug)

> Ignore checksum files in KubernetesLocalDiskShuffleExecutorComponents
> -
>
> Key: SPARK-44501
> URL: https://issues.apache.org/jira/browse/SPARK-44501
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.5.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>







[jira] [Commented] (SPARK-44464) Fix applyInPandasWithStatePythonRunner to output rows that have Null as first column value

2023-07-20 Thread Siying Dong (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17745191#comment-17745191
 ] 

Siying Dong commented on SPARK-44464:
-

[~kabhwan] should we backport it all the way to 11.3? Or is it OK to only fix 
newer versions?

> Fix applyInPandasWithStatePythonRunner to output rows that have Null as first 
> column value
> --
>
> Key: SPARK-44464
> URL: https://issues.apache.org/jira/browse/SPARK-44464
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.3.3
>Reporter: Siying Dong
>Assignee: Siying Dong
>Priority: Major
> Fix For: 3.5.0
>
>
> The current implementation of {{ApplyInPandasWithStatePythonRunner}} cannot 
> deal with outputs where the first column of the row is {{null}}, as it 
> cannot distinguish between a column that is genuinely null and a field that 
> is merely padding because there are fewer data records than state records. 
> This causes incorrect results in the former case.






[jira] [Created] (SPARK-44500) parse_url treats key as regular expression

2023-07-20 Thread Robert Joseph Evans (Jira)
Robert Joseph Evans created SPARK-44500:
---

 Summary: parse_url treats key as regular expression
 Key: SPARK-44500
 URL: https://issues.apache.org/jira/browse/SPARK-44500
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.4.1, 3.4.0, 3.3.0, 3.2.0
Reporter: Robert Joseph Evans


To be clear, I am not 100% sure that this is a bug. It might be a feature, but I 
don't see anywhere that it is used as a feature. If it is a feature, it really 
should be documented, because there are pitfalls. If it is a bug, it should be 
fixed, because it is really confusing and it is easy to shoot yourself in the 
foot.

```scala
> val urls = Seq("http://foo/bar?abc=BAD&a.c=GOOD",
>   "http://foo/bar?a.c=GOOD&abc=BAD").toDF
> urls.selectExpr("parse_url(value, 'QUERY', 'a.c')").show(false)

+----------------------------+
|parse_url(value, QUERY, a.c)|
+----------------------------+
|BAD                         |
|GOOD                        |
+----------------------------+

> urls.selectExpr("parse_url(value, 'QUERY', 'a[c')").show(false)
java.util.regex.PatternSyntaxException: Unclosed character class near index 15
(&|^)a[c=([^&]*)
               ^
  at java.util.regex.Pattern.error(Pattern.java:1969)
  at java.util.regex.Pattern.clazz(Pattern.java:2562)
  at java.util.regex.Pattern.sequence(Pattern.java:2077)
  at java.util.regex.Pattern.expr(Pattern.java:2010)
  at java.util.regex.Pattern.compile(Pattern.java:1702)
  at java.util.regex.Pattern.<init>(Pattern.java:1352)
  at java.util.regex.Pattern.compile(Pattern.java:1028)

```
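
To see why the key behaves as a regular expression, here is a plain-Python 
re-enactment of the pattern construction (illustrative only, not Spark code; 
the two constants mirror the ones used by parse_url):

```python
# The key is spliced into the regex unescaped, so the '.' in 'a.c' matches
# any character (here the 'b' in 'abc'); escaping restores literal matching.
import re

REGEXPREFIX, REGEXSUBFIX = r"(&|^)", r"=([^&]*)"

def parse_query_value(query, key):
    m = re.search(REGEXPREFIX + key + REGEXSUBFIX, query)
    return m.group(2) if m else None

print(parse_query_value("abc=BAD&a.c=GOOD", "a.c"))             # BAD (!)
print(parse_query_value("abc=BAD&a.c=GOOD", re.escape("a.c")))  # GOOD
```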

The simple fix is to quote the key when making the pattern.

```scala
private def getPattern(key: UTF8String): Pattern = {
  Pattern.compile(REGEXPREFIX + Pattern.quote(key.toString) + REGEXSUBFIX)
}
```



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44499) FileSourceScanExec OutputPartitioning for non bucketed scan

2023-07-20 Thread Tushar Mahale (Jira)
Tushar Mahale created SPARK-44499:
-

 Summary: FileSourceScanExec OutputPartitioning for non bucketed 
scan
 Key: SPARK-44499
 URL: https://issues.apache.org/jira/browse/SPARK-44499
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.4.1
Reporter: Tushar Mahale


FileSourceScanExec.outputPartitioning is currently calculated only for bucketed 
scans; for a non-bucketed scan we return UnknownPartitioning(0). This may 
result in the creation of unnecessary empty tasks, based on the SQLConf 
defaultParallelism setting, even though the actual file may have a very low 
number of partitions.

We also need to calculate and set the number of output partitions correctly for 
a non-bucketed scan.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44466) Exclude configs starting with SPARK_DRIVER_PREFIX and SPARK_EXECUTOR_PREFIX from modifiedConfigs

2023-07-20 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-44466:

Summary: Exclude configs starting with SPARK_DRIVER_PREFIX and 
SPARK_EXECUTOR_PREFIX from modifiedConfigs  (was: Update initialSessionOptions 
to the value after supplementation)

> Exclude configs starting with SPARK_DRIVER_PREFIX and SPARK_EXECUTOR_PREFIX 
> from modifiedConfigs
> 
>
> Key: SPARK-44466
> URL: https://issues.apache.org/jira/browse/SPARK-44466
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.1
>Reporter: Yuming Wang
>Priority: Major
> Attachments: screenshot-1.png
>
>
> Should not include this value: 
> !screenshot-1.png! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44498) Add support for Micrometer Observation

2023-07-20 Thread Marcin Grzejszczak (Jira)
Marcin Grzejszczak created SPARK-44498:
--

 Summary: Add support for Micrometer Observation
 Key: SPARK-44498
 URL: https://issues.apache.org/jira/browse/SPARK-44498
 Project: Spark
  Issue Type: New Feature
  Components: Java API
Affects Versions: 3.5.0
Reporter: Marcin Grzejszczak


I'm a co-maintainer of Spring Cloud Sleuth and Micrometer projects (together 
with Tommy Ludwig and Jonatan Ivanov).

[Micrometer Observation|https://micrometer.io/docs/observation] is part of the 
Micrometer 1.10 release, and [Micrometer 
Tracing|https://micrometer.io/docs/tracing] is a new project. The idea of 
Micrometer Observation is that you instrument code once but get multiple 
benefits out of it (e.g. tracing, metrics, logging, or whatever you see fit).

I was curious whether there's interest in adding Micrometer Observation support 
so that, when it is on the classpath, not only metrics but also spans could be 
created, and tracing context propagation could happen automatically. In other 
words, this project could produce metrics and tracing, and any Micrometer 
Observation compatible projects involved would join the same graph (e.g. the 
whole Spring Framework 6, Apache Dubbo, Apache Camel, Resilience4j, etc.)

If there's interest in adding that feature, I can provide a PR.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44494) K8s-it test failed

2023-07-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-44494:
--
Fix Version/s: 3.5.0

> K8s-it test failed
> --
>
> Key: SPARK-44494
> URL: https://issues.apache.org/jira/browse/SPARK-44494
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Tests
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
> Fix For: 3.5.0, 4.0.0
>
>
> * [https://github.com/apache/spark/actions/runs/5607397734/jobs/10258527838]
> {code:java}
> [info] - PVs with local hostpath storage on statefulsets *** FAILED *** (3 
> minutes, 11 seconds)
> 3786[info]   The code passed to eventually never returned normally. Attempted 
> 7921 times over 3.000105988813 minutes. Last failure message: "++ id -u
> 3787[info]   + myuid=185
> 3788[info]   ++ id -g
> 3789[info]   + mygid=0
> 3790[info]   + set +e
> 3791[info]   ++ getent passwd 185
> 3792[info]   + uidentry=
> 3793[info]   + set -e
> 3794[info]   + '[' -z '' ']'
> 3795[info]   + '[' -w /etc/passwd ']'
> 3796[info]   + echo '185:x:185:0:anonymous uid:/opt/spark:/bin/false'
> 3797[info]   + '[' -z /opt/java/openjdk ']'
> 3798[info]   + SPARK_CLASSPATH=':/opt/spark/jars/*'
> 3799[info]   + grep SPARK_JAVA_OPT_
> 3800[info]   + sort -t_ -k4 -n
> 3801[info]   + sed 's/[^=]*=\(.*\)/\1/g'
> 3802[info]   + env
> 3803[info]   ++ command -v readarray
> 3804[info]   + '[' readarray ']'
> 3805[info]   + readarray -t SPARK_EXECUTOR_JAVA_OPTS
> 3806[info]   + '[' -n '' ']'
> 3807[info]   + '[' -z ']'
> 3808[info]   + '[' -z ']'
> 3809[info]   + '[' -n '' ']'
> 3810[info]   + '[' -z ']'
> 3811[info]   + '[' -z x ']'
> 3812[info]   + SPARK_CLASSPATH='/opt/spark/conf::/opt/spark/jars/*'
> 3813[info]   + 
> SPARK_CLASSPATH='/opt/spark/conf::/opt/spark/jars/*:/opt/spark/work-dir'
> 3814[info]   + case "$1" in
> 3815[info]   + shift 1
> 3816[info]   + CMD=("$SPARK_HOME/bin/spark-submit" --conf 
> "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --conf 
> "spark.executorEnv.SPARK_DRIVER_POD_IP=$SPARK_DRIVER_BIND_ADDRESS" 
> --deploy-mode client "$@")
> 3817[info]   + exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf 
> spark.driver.bindAddress=10.244.0.45 --conf 
> spark.executorEnv.SPARK_DRIVER_POD_IP=10.244.0.45 --deploy-mode client 
> --properties-file /opt/spark/conf/spark.properties --class 
> org.apache.spark.examples.MiniReadWriteTest 
> local:///opt/spark/examples/jars/spark-examples_2.12-4.0.0-SNAPSHOT.jar 
> /opt/spark/pv-tests/tmp3727659354473892032.txt
> 3818[info]   Files 
> local:///opt/spark/examples/jars/spark-examples_2.12-4.0.0-SNAPSHOT.jar from 
> /opt/spark/examples/jars/spark-examples_2.12-4.0.0-SNAPSHOT.jar to 
> /opt/spark/work-dir/spark-examples_2.12-4.0.0-SNAPSHOT.jar
> 3819[info]   23/07/20 06:15:15 WARN NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> 3820[info]   Performing local word count from 
> /opt/spark/pv-tests/tmp3727659354473892032.txt
> 3821[info]   File contents are List(test PVs)
> 3822[info]   Creating SparkSession
> 3823[info]   23/07/20 06:15:15 INFO SparkContext: Running Spark version 
> 4.0.0-SNAPSHOT
> 3824[info]   23/07/20 06:15:15 INFO SparkContext: OS info Linux, 
> 5.15.0-1041-azure, amd64
> 3825[info]   23/07/20 06:15:15 INFO SparkContext: Java version 17.0.7
> 3826[info]   23/07/20 06:15:15 INFO ResourceUtils: 
> ==
> 3827[info]   23/07/20 06:15:15 INFO ResourceUtils: No custom resources 
> configured for spark.driver.
> 3828[info]   23/07/20 06:15:15 INFO ResourceUtils: 
> ==
> 3829[info]   23/07/20 06:15:15 INFO SparkContext: Submitted application: Mini 
> Read Write Test
> 3830[info]   23/07/20 06:15:16 INFO ResourceProfile: Default ResourceProfile 
> created, executor resources: Map(cores -> name: cores, amount: 1, script: , 
> vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap 
> -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> 
> name: cpus, amount: 1.0) {code}
> The tests have been failing for the past two days.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-44494) K8s-it test failed

2023-07-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-44494:
-

Assignee: Yang Jie

> K8s-it test failed
> --
>
> Key: SPARK-44494
> URL: https://issues.apache.org/jira/browse/SPARK-44494
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Tests
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>
> * [https://github.com/apache/spark/actions/runs/5607397734/jobs/10258527838]
> {code:java}
> [info] - PVs with local hostpath storage on statefulsets *** FAILED *** (3 
> minutes, 11 seconds)
> 3786[info]   The code passed to eventually never returned normally. Attempted 
> 7921 times over 3.000105988813 minutes. Last failure message: "++ id -u
> 3787[info]   + myuid=185
> 3788[info]   ++ id -g
> 3789[info]   + mygid=0
> 3790[info]   + set +e
> 3791[info]   ++ getent passwd 185
> 3792[info]   + uidentry=
> 3793[info]   + set -e
> 3794[info]   + '[' -z '' ']'
> 3795[info]   + '[' -w /etc/passwd ']'
> 3796[info]   + echo '185:x:185:0:anonymous uid:/opt/spark:/bin/false'
> 3797[info]   + '[' -z /opt/java/openjdk ']'
> 3798[info]   + SPARK_CLASSPATH=':/opt/spark/jars/*'
> 3799[info]   + grep SPARK_JAVA_OPT_
> 3800[info]   + sort -t_ -k4 -n
> 3801[info]   + sed 's/[^=]*=\(.*\)/\1/g'
> 3802[info]   + env
> 3803[info]   ++ command -v readarray
> 3804[info]   + '[' readarray ']'
> 3805[info]   + readarray -t SPARK_EXECUTOR_JAVA_OPTS
> 3806[info]   + '[' -n '' ']'
> 3807[info]   + '[' -z ']'
> 3808[info]   + '[' -z ']'
> 3809[info]   + '[' -n '' ']'
> 3810[info]   + '[' -z ']'
> 3811[info]   + '[' -z x ']'
> 3812[info]   + SPARK_CLASSPATH='/opt/spark/conf::/opt/spark/jars/*'
> 3813[info]   + 
> SPARK_CLASSPATH='/opt/spark/conf::/opt/spark/jars/*:/opt/spark/work-dir'
> 3814[info]   + case "$1" in
> 3815[info]   + shift 1
> 3816[info]   + CMD=("$SPARK_HOME/bin/spark-submit" --conf 
> "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --conf 
> "spark.executorEnv.SPARK_DRIVER_POD_IP=$SPARK_DRIVER_BIND_ADDRESS" 
> --deploy-mode client "$@")
> 3817[info]   + exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf 
> spark.driver.bindAddress=10.244.0.45 --conf 
> spark.executorEnv.SPARK_DRIVER_POD_IP=10.244.0.45 --deploy-mode client 
> --properties-file /opt/spark/conf/spark.properties --class 
> org.apache.spark.examples.MiniReadWriteTest 
> local:///opt/spark/examples/jars/spark-examples_2.12-4.0.0-SNAPSHOT.jar 
> /opt/spark/pv-tests/tmp3727659354473892032.txt
> 3818[info]   Files 
> local:///opt/spark/examples/jars/spark-examples_2.12-4.0.0-SNAPSHOT.jar from 
> /opt/spark/examples/jars/spark-examples_2.12-4.0.0-SNAPSHOT.jar to 
> /opt/spark/work-dir/spark-examples_2.12-4.0.0-SNAPSHOT.jar
> 3819[info]   23/07/20 06:15:15 WARN NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> 3820[info]   Performing local word count from 
> /opt/spark/pv-tests/tmp3727659354473892032.txt
> 3821[info]   File contents are List(test PVs)
> 3822[info]   Creating SparkSession
> 3823[info]   23/07/20 06:15:15 INFO SparkContext: Running Spark version 
> 4.0.0-SNAPSHOT
> 3824[info]   23/07/20 06:15:15 INFO SparkContext: OS info Linux, 
> 5.15.0-1041-azure, amd64
> 3825[info]   23/07/20 06:15:15 INFO SparkContext: Java version 17.0.7
> 3826[info]   23/07/20 06:15:15 INFO ResourceUtils: 
> ==
> 3827[info]   23/07/20 06:15:15 INFO ResourceUtils: No custom resources 
> configured for spark.driver.
> 3828[info]   23/07/20 06:15:15 INFO ResourceUtils: 
> ==
> 3829[info]   23/07/20 06:15:15 INFO SparkContext: Submitted application: Mini 
> Read Write Test
> 3830[info]   23/07/20 06:15:16 INFO ResourceProfile: Default ResourceProfile 
> created, executor resources: Map(cores -> name: cores, amount: 1, script: , 
> vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap 
> -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> 
> name: cpus, amount: 1.0) {code}
> The tests have been failing for the past two days.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-44494) K8s-it test failed

2023-07-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-44494.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 42091
[https://github.com/apache/spark/pull/42091]

> K8s-it test failed
> --
>
> Key: SPARK-44494
> URL: https://issues.apache.org/jira/browse/SPARK-44494
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Tests
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
> Fix For: 4.0.0
>
>
> * [https://github.com/apache/spark/actions/runs/5607397734/jobs/10258527838]
> {code:java}
> [info] - PVs with local hostpath storage on statefulsets *** FAILED *** (3 
> minutes, 11 seconds)
> 3786[info]   The code passed to eventually never returned normally. Attempted 
> 7921 times over 3.000105988813 minutes. Last failure message: "++ id -u
> 3787[info]   + myuid=185
> 3788[info]   ++ id -g
> 3789[info]   + mygid=0
> 3790[info]   + set +e
> 3791[info]   ++ getent passwd 185
> 3792[info]   + uidentry=
> 3793[info]   + set -e
> 3794[info]   + '[' -z '' ']'
> 3795[info]   + '[' -w /etc/passwd ']'
> 3796[info]   + echo '185:x:185:0:anonymous uid:/opt/spark:/bin/false'
> 3797[info]   + '[' -z /opt/java/openjdk ']'
> 3798[info]   + SPARK_CLASSPATH=':/opt/spark/jars/*'
> 3799[info]   + grep SPARK_JAVA_OPT_
> 3800[info]   + sort -t_ -k4 -n
> 3801[info]   + sed 's/[^=]*=\(.*\)/\1/g'
> 3802[info]   + env
> 3803[info]   ++ command -v readarray
> 3804[info]   + '[' readarray ']'
> 3805[info]   + readarray -t SPARK_EXECUTOR_JAVA_OPTS
> 3806[info]   + '[' -n '' ']'
> 3807[info]   + '[' -z ']'
> 3808[info]   + '[' -z ']'
> 3809[info]   + '[' -n '' ']'
> 3810[info]   + '[' -z ']'
> 3811[info]   + '[' -z x ']'
> 3812[info]   + SPARK_CLASSPATH='/opt/spark/conf::/opt/spark/jars/*'
> 3813[info]   + 
> SPARK_CLASSPATH='/opt/spark/conf::/opt/spark/jars/*:/opt/spark/work-dir'
> 3814[info]   + case "$1" in
> 3815[info]   + shift 1
> 3816[info]   + CMD=("$SPARK_HOME/bin/spark-submit" --conf 
> "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --conf 
> "spark.executorEnv.SPARK_DRIVER_POD_IP=$SPARK_DRIVER_BIND_ADDRESS" 
> --deploy-mode client "$@")
> 3817[info]   + exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf 
> spark.driver.bindAddress=10.244.0.45 --conf 
> spark.executorEnv.SPARK_DRIVER_POD_IP=10.244.0.45 --deploy-mode client 
> --properties-file /opt/spark/conf/spark.properties --class 
> org.apache.spark.examples.MiniReadWriteTest 
> local:///opt/spark/examples/jars/spark-examples_2.12-4.0.0-SNAPSHOT.jar 
> /opt/spark/pv-tests/tmp3727659354473892032.txt
> 3818[info]   Files 
> local:///opt/spark/examples/jars/spark-examples_2.12-4.0.0-SNAPSHOT.jar from 
> /opt/spark/examples/jars/spark-examples_2.12-4.0.0-SNAPSHOT.jar to 
> /opt/spark/work-dir/spark-examples_2.12-4.0.0-SNAPSHOT.jar
> 3819[info]   23/07/20 06:15:15 WARN NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> 3820[info]   Performing local word count from 
> /opt/spark/pv-tests/tmp3727659354473892032.txt
> 3821[info]   File contents are List(test PVs)
> 3822[info]   Creating SparkSession
> 3823[info]   23/07/20 06:15:15 INFO SparkContext: Running Spark version 
> 4.0.0-SNAPSHOT
> 3824[info]   23/07/20 06:15:15 INFO SparkContext: OS info Linux, 
> 5.15.0-1041-azure, amd64
> 3825[info]   23/07/20 06:15:15 INFO SparkContext: Java version 17.0.7
> 3826[info]   23/07/20 06:15:15 INFO ResourceUtils: 
> ==
> 3827[info]   23/07/20 06:15:15 INFO ResourceUtils: No custom resources 
> configured for spark.driver.
> 3828[info]   23/07/20 06:15:15 INFO ResourceUtils: 
> ==
> 3829[info]   23/07/20 06:15:15 INFO SparkContext: Submitted application: Mini 
> Read Write Test
> 3830[info]   23/07/20 06:15:16 INFO ResourceProfile: Default ResourceProfile 
> created, executor resources: Map(cores -> name: cores, amount: 1, script: , 
> vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap 
> -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> 
> name: cpus, amount: 1.0) {code}
> The tests have been failing for the past two days.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44497) Show task partition id in Task table

2023-07-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17744987#comment-17744987
 ] 

ASF GitHub Bot commented on SPARK-44497:


User 'cxzl25' has created a pull request for this issue:
https://github.com/apache/spark/pull/42093

> Show task partition id in Task table
> 
>
> Key: SPARK-44497
> URL: https://issues.apache.org/jira/browse/SPARK-44497
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.4.1
>Reporter: dzcxzl
>Priority: Minor
>
> In SPARK-37831, the partition id was added to TaskInfo, but the task partition 
> id cannot be directly seen in the UI.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44497) Show task partition id in Task table

2023-07-20 Thread dzcxzl (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-44497:
---
Description: In SPARK-37831, the partition id was added to TaskInfo, but the 
task partition id cannot be directly seen in the UI.  (was: In 
[SPARK-37831|https://issues.apache.org/jira/browse/SPARK-37831], the partition 
id is added in taskinfo, and the task partition id cannot be directly seen in 
the ui)

> Show task partition id in Task table
> 
>
> Key: SPARK-44497
> URL: https://issues.apache.org/jira/browse/SPARK-44497
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.4.1
>Reporter: dzcxzl
>Priority: Minor
>
> In SPARK-37831, the partition id was added to TaskInfo, but the task partition 
> id cannot be directly seen in the UI.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44497) Show task partition id in Task table

2023-07-20 Thread dzcxzl (Jira)
dzcxzl created SPARK-44497:
--

 Summary: Show task partition id in Task table
 Key: SPARK-44497
 URL: https://issues.apache.org/jira/browse/SPARK-44497
 Project: Spark
  Issue Type: Improvement
  Components: Web UI
Affects Versions: 3.4.1
Reporter: dzcxzl


In [SPARK-37831|https://issues.apache.org/jira/browse/SPARK-37831], the 
partition id was added to TaskInfo, but the task partition id cannot be 
directly seen in the UI.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44484) Add missing json field batchDuration to StreamingQueryProgress

2023-07-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17744978#comment-17744978
 ] 

ASF GitHub Bot commented on SPARK-44484:


User 'WweiL' has created a pull request for this issue:
https://github.com/apache/spark/pull/42077

> Add missing json field batchDuration to StreamingQueryProgress
> --
>
> Key: SPARK-44484
> URL: https://issues.apache.org/jira/browse/SPARK-44484
> Project: Spark
>  Issue Type: New Feature
>  Components: Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Wei Liu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43001) Spark last window dont flush in append mode

2023-07-20 Thread padavan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17744973#comment-17744973
 ] 

padavan commented on SPARK-43001:
-

[~kabhwan]  ? 

> Spark last window dont flush in append mode
> ---
>
> Key: SPARK-43001
> URL: https://issues.apache.org/jira/browse/SPARK-43001
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, Structured Streaming
>Affects Versions: 3.3.2
>Reporter: padavan
>Priority: Major
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The problem is very simple: when you use a *tumbling window* with *append 
> mode*, the window is closed +only when the next message arrives+ (the 
> _watermark_ logic).
> In the current implementation, if *incoming* streaming data *stops*, the 
> *last* window will *never close* and we lose the last window's data.
>  
> Business situation:
> Everything worked correctly until new messages stopped arriving; the next 
> message came in 5 hours later, so the client got the notification after 5 
> hours instead of after the window's 10-second delay.
> !https://user-images.githubusercontent.com/61819835/226478055-dc4a123c-4397-4eb0-b6ed-1e185b6fab76.png|width=707,height=294!
> The current implementation needs to be improved: Spark's internal mechanisms 
> should be able to close windows automatically.
>  
> *What we propose:*
> Add a third parameter to 
> {{DataFrame.withWatermark}}(_eventTime_, _delayThreshold_, 
> _*maxDelayClose*_). The trigger would then execute 
> {code:java}
> // pseudocode: force-close a window once it is maxDelayClose past its upper bound
> if (now - window.upper_bound > maxDelayClose) {
>     window.close().flush();
> }
> {code}
> I assume it can be done in a day. We did not expect that our customers could 
> miss their notifications (the company is in the medical field).
>  
> simple code for the problem:
> {code:python}
> from pyspark.sql.functions import col, from_json, window, sum
>
> # json_schema: schema of the JSON payload, defined elsewhere
> kafka_stream_df = spark \
>     .readStream \
>     .format("kafka") \
>     .option("kafka.bootstrap.servers", KAFKA_BROKER) \
>     .option("subscribe", KAFKA_TOPIC) \
>     .option("includeHeaders", "true") \
>     .load()
> sel = (kafka_stream_df.selectExpr("CAST(key AS STRING)", "CAST(value AS 
> STRING)")
>        .select(from_json(col("value").cast("string"), 
> json_schema).alias("data"))
>        .select("data.*")
>        .withWatermark("dt", "1 seconds")
>        .groupBy(window("dt", "10 seconds"))
>        .agg(sum("price"))
>       )
>  
> console = sel \
>     .writeStream \
>     .trigger(processingTime='10 seconds') \
>     .format("console") \
>     .outputMode("append")\
>     .start()
> {code}
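>  
> A user-side mitigation that is sometimes suggested for this situation (a 
> sketch; it assumes the third-party kafka-python package and reuses the 
> KAFKA_BROKER/KAFKA_TOPIC constants from above) is to emit periodic heartbeat 
> events so event time keeps advancing and the last window can close even when 
> real traffic stops:
> {code:python}
> # Heartbeat producer sketch: periodically sends a dummy event whose `dt` is
> # the current time, so the watermark keeps advancing when real traffic stops.
> # Heartbeat-only windows aggregate to a sum of 0 and can be filtered out.
> import json
> import time
> from datetime import datetime, timezone
> from kafka import KafkaProducer
>
> producer = KafkaProducer(
>     bootstrap_servers=KAFKA_BROKER,
>     value_serializer=lambda v: json.dumps(v).encode("utf-8"))
>
> while True:
>     producer.send(KAFKA_TOPIC, {"dt": datetime.now(timezone.utc).isoformat(),
>                                 "price": 0.0})
>     producer.flush()
>     time.sleep(5)
> {code}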
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44496) Move Interfaces needed by SCSC to sql/api

2023-07-20 Thread Rui Wang (Jira)
Rui Wang created SPARK-44496:


 Summary: Move Interfaces needed by SCSC to sql/api
 Key: SPARK-44496
 URL: https://issues.apache.org/jira/browse/SPARK-44496
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, SQL
Affects Versions: 3.5.0
Reporter: Rui Wang
Assignee: Rui Wang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44495) Resume to use the latest minikube for k8s-it on GitHub Action

2023-07-20 Thread Yang Jie (Jira)
Yang Jie created SPARK-44495:


 Summary: Resume to use the latest minikube for k8s-it on GitHub 
Action
 Key: SPARK-44495
 URL: https://issues.apache.org/jira/browse/SPARK-44495
 Project: Spark
  Issue Type: Task
  Components: Kubernetes, Project Infra, Tests
Affects Versions: 4.0.0
Reporter: Yang Jie






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44494) K8s-it test failed

2023-07-20 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17744935#comment-17744935
 ] 

Yang Jie commented on SPARK-44494:
--

It seems that the tests started to fail after Minikube was upgraded to 1.31.0; 
before that it was v1.30.1.

 

> K8s-it test failed
> --
>
> Key: SPARK-44494
> URL: https://issues.apache.org/jira/browse/SPARK-44494
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Tests
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>
> * [https://github.com/apache/spark/actions/runs/5607397734/jobs/10258527838]
> {code:java}
> [info] - PVs with local hostpath storage on statefulsets *** FAILED *** (3 
> minutes, 11 seconds)
> 3786[info]   The code passed to eventually never returned normally. Attempted 
> 7921 times over 3.000105988813 minutes. Last failure message: "++ id -u
> 3787[info]   + myuid=185
> 3788[info]   ++ id -g
> 3789[info]   + mygid=0
> 3790[info]   + set +e
> 3791[info]   ++ getent passwd 185
> 3792[info]   + uidentry=
> 3793[info]   + set -e
> 3794[info]   + '[' -z '' ']'
> 3795[info]   + '[' -w /etc/passwd ']'
> 3796[info]   + echo '185:x:185:0:anonymous uid:/opt/spark:/bin/false'
> 3797[info]   + '[' -z /opt/java/openjdk ']'
> 3798[info]   + SPARK_CLASSPATH=':/opt/spark/jars/*'
> 3799[info]   + grep SPARK_JAVA_OPT_
> 3800[info]   + sort -t_ -k4 -n
> 3801[info]   + sed 's/[^=]*=\(.*\)/\1/g'
> 3802[info]   + env
> 3803[info]   ++ command -v readarray
> 3804[info]   + '[' readarray ']'
> 3805[info]   + readarray -t SPARK_EXECUTOR_JAVA_OPTS
> 3806[info]   + '[' -n '' ']'
> 3807[info]   + '[' -z ']'
> 3808[info]   + '[' -z ']'
> 3809[info]   + '[' -n '' ']'
> 3810[info]   + '[' -z ']'
> 3811[info]   + '[' -z x ']'
> 3812[info]   + SPARK_CLASSPATH='/opt/spark/conf::/opt/spark/jars/*'
> 3813[info]   + 
> SPARK_CLASSPATH='/opt/spark/conf::/opt/spark/jars/*:/opt/spark/work-dir'
> 3814[info]   + case "$1" in
> 3815[info]   + shift 1
> 3816[info]   + CMD=("$SPARK_HOME/bin/spark-submit" --conf 
> "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --conf 
> "spark.executorEnv.SPARK_DRIVER_POD_IP=$SPARK_DRIVER_BIND_ADDRESS" 
> --deploy-mode client "$@")
> 3817[info]   + exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf 
> spark.driver.bindAddress=10.244.0.45 --conf 
> spark.executorEnv.SPARK_DRIVER_POD_IP=10.244.0.45 --deploy-mode client 
> --properties-file /opt/spark/conf/spark.properties --class 
> org.apache.spark.examples.MiniReadWriteTest 
> local:///opt/spark/examples/jars/spark-examples_2.12-4.0.0-SNAPSHOT.jar 
> /opt/spark/pv-tests/tmp3727659354473892032.txt
> 3818[info]   Files 
> local:///opt/spark/examples/jars/spark-examples_2.12-4.0.0-SNAPSHOT.jar from 
> /opt/spark/examples/jars/spark-examples_2.12-4.0.0-SNAPSHOT.jar to 
> /opt/spark/work-dir/spark-examples_2.12-4.0.0-SNAPSHOT.jar
> 3819[info]   23/07/20 06:15:15 WARN NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> 3820[info]   Performing local word count from 
> /opt/spark/pv-tests/tmp3727659354473892032.txt
> 3821[info]   File contents are List(test PVs)
> 3822[info]   Creating SparkSession
> 3823[info]   23/07/20 06:15:15 INFO SparkContext: Running Spark version 
> 4.0.0-SNAPSHOT
> 3824[info]   23/07/20 06:15:15 INFO SparkContext: OS info Linux, 
> 5.15.0-1041-azure, amd64
> 3825[info]   23/07/20 06:15:15 INFO SparkContext: Java version 17.0.7
> 3826[info]   23/07/20 06:15:15 INFO ResourceUtils: 
> ==
> 3827[info]   23/07/20 06:15:15 INFO ResourceUtils: No custom resources 
> configured for spark.driver.
> 3828[info]   23/07/20 06:15:15 INFO ResourceUtils: 
> ==
> 3829[info]   23/07/20 06:15:15 INFO SparkContext: Submitted application: Mini 
> Read Write Test
> 3830[info]   23/07/20 06:15:16 INFO ResourceProfile: Default ResourceProfile 
> created, executor resources: Map(cores -> name: cores, amount: 1, script: , 
> vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap 
> -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> 
> name: cpus, amount: 1.0) {code}
> The tests have been failing for the past two days.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44494) K8s-it test failed

2023-07-20 Thread Yang Jie (Jira)
Yang Jie created SPARK-44494:


 Summary: K8s-it test failed
 Key: SPARK-44494
 URL: https://issues.apache.org/jira/browse/SPARK-44494
 Project: Spark
  Issue Type: Bug
  Components: Kubernetes, Tests
Affects Versions: 4.0.0
Reporter: Yang Jie


* [https://github.com/apache/spark/actions/runs/5607397734/jobs/10258527838]

{code:java}
[info] - PVs with local hostpath storage on statefulsets *** FAILED *** (3 
minutes, 11 seconds)
3786[info]   The code passed to eventually never returned normally. Attempted 
7921 times over 3.000105988813 minutes. Last failure message: "++ id -u
3787[info]   + myuid=185
3788[info]   ++ id -g
3789[info]   + mygid=0
3790[info]   + set +e
3791[info]   ++ getent passwd 185
3792[info]   + uidentry=
3793[info]   + set -e
3794[info]   + '[' -z '' ']'
3795[info]   + '[' -w /etc/passwd ']'
3796[info]   + echo '185:x:185:0:anonymous uid:/opt/spark:/bin/false'
3797[info]   + '[' -z /opt/java/openjdk ']'
3798[info]   + SPARK_CLASSPATH=':/opt/spark/jars/*'
3799[info]   + grep SPARK_JAVA_OPT_
3800[info]   + sort -t_ -k4 -n
3801[info]   + sed 's/[^=]*=\(.*\)/\1/g'
3802[info]   + env
3803[info]   ++ command -v readarray
3804[info]   + '[' readarray ']'
3805[info]   + readarray -t SPARK_EXECUTOR_JAVA_OPTS
3806[info]   + '[' -n '' ']'
3807[info]   + '[' -z ']'
3808[info]   + '[' -z ']'
3809[info]   + '[' -n '' ']'
3810[info]   + '[' -z ']'
3811[info]   + '[' -z x ']'
3812[info]   + SPARK_CLASSPATH='/opt/spark/conf::/opt/spark/jars/*'
3813[info]   + 
SPARK_CLASSPATH='/opt/spark/conf::/opt/spark/jars/*:/opt/spark/work-dir'
3814[info]   + case "$1" in
3815[info]   + shift 1
3816[info]   + CMD=("$SPARK_HOME/bin/spark-submit" --conf 
"spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --conf 
"spark.executorEnv.SPARK_DRIVER_POD_IP=$SPARK_DRIVER_BIND_ADDRESS" 
--deploy-mode client "$@")
3817[info]   + exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf 
spark.driver.bindAddress=10.244.0.45 --conf 
spark.executorEnv.SPARK_DRIVER_POD_IP=10.244.0.45 --deploy-mode client 
--properties-file /opt/spark/conf/spark.properties --class 
org.apache.spark.examples.MiniReadWriteTest 
local:///opt/spark/examples/jars/spark-examples_2.12-4.0.0-SNAPSHOT.jar 
/opt/spark/pv-tests/tmp3727659354473892032.txt
3818[info]   Files 
local:///opt/spark/examples/jars/spark-examples_2.12-4.0.0-SNAPSHOT.jar from 
/opt/spark/examples/jars/spark-examples_2.12-4.0.0-SNAPSHOT.jar to 
/opt/spark/work-dir/spark-examples_2.12-4.0.0-SNAPSHOT.jar
3819[info]   23/07/20 06:15:15 WARN NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable
3820[info]   Performing local word count from 
/opt/spark/pv-tests/tmp3727659354473892032.txt
3821[info]   File contents are List(test PVs)
3822[info]   Creating SparkSession
3823[info]   23/07/20 06:15:15 INFO SparkContext: Running Spark version 
4.0.0-SNAPSHOT
3824[info]   23/07/20 06:15:15 INFO SparkContext: OS info Linux, 
5.15.0-1041-azure, amd64
3825[info]   23/07/20 06:15:15 INFO SparkContext: Java version 17.0.7
3826[info]   23/07/20 06:15:15 INFO ResourceUtils: 
==
3827[info]   23/07/20 06:15:15 INFO ResourceUtils: No custom resources 
configured for spark.driver.
3828[info]   23/07/20 06:15:15 INFO ResourceUtils: 
==
3829[info]   23/07/20 06:15:15 INFO SparkContext: Submitted application: Mini 
Read Write Test
3830[info]   23/07/20 06:15:16 INFO ResourceProfile: Default ResourceProfile 
created, executor resources: Map(cores -> name: cores, amount: 1, script: , 
vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> 
name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> 
name: cpus, amount: 1.0) {code}
The tests have been failing for the past two days.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44494) K8s-it test failed

2023-07-20 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17744933#comment-17744933
 ] 

Yang Jie commented on SPARK-44494:
--

cc [~yikunkero]  Do you have any suggestions?

 

> K8s-it test failed
> --
>
> Key: SPARK-44494
> URL: https://issues.apache.org/jira/browse/SPARK-44494
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Tests
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>
> * [https://github.com/apache/spark/actions/runs/5607397734/jobs/10258527838]
> {code:java}
> [info] - PVs with local hostpath storage on statefulsets *** FAILED *** (3 
> minutes, 11 seconds)
> 3786[info]   The code passed to eventually never returned normally. Attempted 
> 7921 times over 3.000105988813 minutes. Last failure message: "++ id -u
> 3787[info]   + myuid=185
> 3788[info]   ++ id -g
> 3789[info]   + mygid=0
> 3790[info]   + set +e
> 3791[info]   ++ getent passwd 185
> 3792[info]   + uidentry=
> 3793[info]   + set -e
> 3794[info]   + '[' -z '' ']'
> 3795[info]   + '[' -w /etc/passwd ']'
> 3796[info]   + echo '185:x:185:0:anonymous uid:/opt/spark:/bin/false'
> 3797[info]   + '[' -z /opt/java/openjdk ']'
> 3798[info]   + SPARK_CLASSPATH=':/opt/spark/jars/*'
> 3799[info]   + grep SPARK_JAVA_OPT_
> 3800[info]   + sort -t_ -k4 -n
> 3801[info]   + sed 's/[^=]*=\(.*\)/\1/g'
> 3802[info]   + env
> 3803[info]   ++ command -v readarray
> 3804[info]   + '[' readarray ']'
> 3805[info]   + readarray -t SPARK_EXECUTOR_JAVA_OPTS
> 3806[info]   + '[' -n '' ']'
> 3807[info]   + '[' -z ']'
> 3808[info]   + '[' -z ']'
> 3809[info]   + '[' -n '' ']'
> 3810[info]   + '[' -z ']'
> 3811[info]   + '[' -z x ']'
> 3812[info]   + SPARK_CLASSPATH='/opt/spark/conf::/opt/spark/jars/*'
> 3813[info]   + 
> SPARK_CLASSPATH='/opt/spark/conf::/opt/spark/jars/*:/opt/spark/work-dir'
> 3814[info]   + case "$1" in
> 3815[info]   + shift 1
> 3816[info]   + CMD=("$SPARK_HOME/bin/spark-submit" --conf 
> "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --conf 
> "spark.executorEnv.SPARK_DRIVER_POD_IP=$SPARK_DRIVER_BIND_ADDRESS" 
> --deploy-mode client "$@")
> 3817[info]   + exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf 
> spark.driver.bindAddress=10.244.0.45 --conf 
> spark.executorEnv.SPARK_DRIVER_POD_IP=10.244.0.45 --deploy-mode client 
> --properties-file /opt/spark/conf/spark.properties --class 
> org.apache.spark.examples.MiniReadWriteTest 
> local:///opt/spark/examples/jars/spark-examples_2.12-4.0.0-SNAPSHOT.jar 
> /opt/spark/pv-tests/tmp3727659354473892032.txt
> 3818[info]   Files 
> local:///opt/spark/examples/jars/spark-examples_2.12-4.0.0-SNAPSHOT.jar from 
> /opt/spark/examples/jars/spark-examples_2.12-4.0.0-SNAPSHOT.jar to 
> /opt/spark/work-dir/spark-examples_2.12-4.0.0-SNAPSHOT.jar
> 3819[info]   23/07/20 06:15:15 WARN NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> 3820[info]   Performing local word count from 
> /opt/spark/pv-tests/tmp3727659354473892032.txt
> 3821[info]   File contents are List(test PVs)
> 3822[info]   Creating SparkSession
> 3823[info]   23/07/20 06:15:15 INFO SparkContext: Running Spark version 
> 4.0.0-SNAPSHOT
> 3824[info]   23/07/20 06:15:15 INFO SparkContext: OS info Linux, 
> 5.15.0-1041-azure, amd64
> 3825[info]   23/07/20 06:15:15 INFO SparkContext: Java version 17.0.7
> 3826[info]   23/07/20 06:15:15 INFO ResourceUtils: 
> ==
> 3827[info]   23/07/20 06:15:15 INFO ResourceUtils: No custom resources 
> configured for spark.driver.
> 3828[info]   23/07/20 06:15:15 INFO ResourceUtils: 
> ==
> 3829[info]   23/07/20 06:15:15 INFO SparkContext: Submitted application: Mini 
> Read Write Test
> 3830[info]   23/07/20 06:15:16 INFO ResourceProfile: Default ResourceProfile 
> created, executor resources: Map(cores -> name: cores, amount: 1, script: , 
> vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap 
> -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> 
> name: cpus, amount: 1.0) {code}
> The tests have been failing for the past two days.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44493) Extract pushable predicates from disjunctive predicates

2023-07-20 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-44493:

Attachment: before.png

> Extract pushable predicates from disjunctive predicates
> ---
>
> Key: SPARK-44493
> URL: https://issues.apache.org/jira/browse/SPARK-44493
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>Priority: Major
> Attachments: after.png, before.png
>
>
> Example:
> {code:sql}
> select count(*)
> from
>   db.very_large_table
> where
>   session_start_dt between date_sub('2023-07-15', 1) and 
> date_add('2023-07-16', 1)
>   and type = 'event'
>   and date(event_timestamp) between '2023-07-15' and '2023-07-16'
>   and (
> (
>   page_id in (2627, 2835, 2402999)
>   and -- other predicates
>   and rdt = 0
> ) or (
>   page_id in (2616, 3411350)
>   and rdt = 0
> ) or (
>   page_id = 2403006
> ) or (
>   page_id in (2208336, 2356359)
>   and -- other predicates
>   and rdt = 0
> )
>   )
> {code}
> We can push down {{page_id in(2627, 2835, 2402999, 2616, 3411350, 2403006, 
> 2208336, 2356359)}} to datasource.
> Before:
> After:



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44493) Extract pushable predicates from disjunctive predicates

2023-07-20 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-44493:
---

 Summary: Extract pushable predicates from disjunctive predicates
 Key: SPARK-44493
 URL: https://issues.apache.org/jira/browse/SPARK-44493
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Yuming Wang
 Attachments: after.png, before.png

Example:
{code:sql}
select count(*)
from
  db.very_large_table
where
  session_start_dt between date_sub('2023-07-15', 1) and date_add('2023-07-16', 
1)
  and type = 'event'
  and date(event_timestamp) between '2023-07-15' and '2023-07-16'
  and (
(
  page_id in (2627, 2835, 2402999)
  and -- other predicates
  and rdt = 0
) or (
  page_id in (2616, 3411350)
  and rdt = 0
) or (
  page_id = 2403006
) or (
  page_id in (2208336, 2356359)
  and -- other predicates
  and rdt = 0
)
  )
{code}

We can push down {{page_id in (2627, 2835, 2402999, 2616, 3411350, 2403006, 
2208336, 2356359)}} to the datasource, because every disjunct constrains 
{{page_id}}, so the union of those value sets is implied by the whole filter 
(see the sketch below).
Before:

After:
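
A toy illustration of the extraction idea (plain Python over a simplified 
representation, not Catalyst code): a column is pushable only if every 
disjunct constrains it, and the pushed predicate is the union of the 
per-disjunct value sets.
{code:python}
# Each disjunct is modeled as {column: set of allowed values}. 'rdt' is not
# constrained in the third disjunct, so only 'page_id' can be extracted.
from functools import reduce

disjuncts = [
    {"page_id": {2627, 2835, 2402999}, "rdt": {0}},
    {"page_id": {2616, 3411350}, "rdt": {0}},
    {"page_id": {2403006}},
    {"page_id": {2208336, 2356359}, "rdt": {0}},
]

# columns constrained in every disjunct
common = reduce(set.intersection, (set(d) for d in disjuncts))
# pushable predicate: union of each common column's allowed values
extracted = {c: set.union(*(d[c] for d in disjuncts)) for c in common}
print(extracted)
# {'page_id': {2627, 2835, 2402999, 2616, 3411350, 2403006, 2208336, 2356359}}
{code}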





--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44493) Extract pushable predicates from disjunctive predicates

2023-07-20 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-44493:

Description: 
Example:
{code:sql}
select count(*)
from
  db.very_large_table
where
  session_start_dt between date_sub('2023-07-15', 1) and date_add('2023-07-16', 
1)
  and type = 'event'
  and date(event_timestamp) between '2023-07-15' and '2023-07-16'
  and (
(
  page_id in (2627, 2835, 2402999)
  and -- other predicates
  and rdt = 0
) or (
  page_id in (2616, 3411350)
  and rdt = 0
) or (
  page_id = 2403006
) or (
  page_id in (2208336, 2356359)
  and -- other predicates
  and rdt = 0
)
  )
{code}

We can push down {{page_id in(2627, 2835, 2402999, 2616, 3411350, 2403006, 
2208336, 2356359)}} to datasource.
Before:
 !before.png! 
After:
 !after.png! 



  was:
Example:
{code:sql}
select count(*)
from
  db.very_large_table
where
  session_start_dt between date_sub('2023-07-15', 1) and date_add('2023-07-16', 
1)
  and type = 'event'
  and date(event_timestamp) between '2023-07-15' and '2023-07-16'
  and (
(
  page_id in (2627, 2835, 2402999)
  and -- other predicates
  and rdt = 0
) or (
  page_id in (2616, 3411350)
  and rdt = 0
) or (
  page_id = 2403006
) or (
  page_id in (2208336, 2356359)
  and -- other predicates
  and rdt = 0
)
  )
{code}

We can push down {{page_id in(2627, 2835, 2402999, 2616, 3411350, 2403006, 
2208336, 2356359)}} to datasource.
Before:

After:




> Extract pushable predicates from disjunctive predicates
> ---
>
> Key: SPARK-44493
> URL: https://issues.apache.org/jira/browse/SPARK-44493
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>Priority: Major
> Attachments: after.png, before.png
>
>
> Example:
> {code:sql}
> select count(*)
> from
>   db.very_large_table
> where
>   session_start_dt between date_sub('2023-07-15', 1) and 
> date_add('2023-07-16', 1)
>   and type = 'event'
>   and date(event_timestamp) between '2023-07-15' and '2023-07-16'
>   and (
> (
>   page_id in (2627, 2835, 2402999)
>   and -- other predicates
>   and rdt = 0
> ) or (
>   page_id in (2616, 3411350)
>   and rdt = 0
> ) or (
>   page_id = 2403006
> ) or (
>   page_id in (2208336, 2356359)
>   and -- other predicates
>   and rdt = 0
> )
>   )
> {code}
> We can push down {{page_id in(2627, 2835, 2402999, 2616, 3411350, 2403006, 
> 2208336, 2356359)}} to datasource.
> Before:
>  !before.png! 
> After:
>  !after.png! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44493) Extract pushable predicates from disjunctive predicates

2023-07-20 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-44493:

Attachment: after.png

> Extract pushable predicates from disjunctive predicates
> ---
>
> Key: SPARK-44493
> URL: https://issues.apache.org/jira/browse/SPARK-44493
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>Priority: Major
> Attachments: after.png, before.png
>
>
> Example:
> {code:sql}
> select count(*)
> from
>   db.very_large_table
> where
>   session_start_dt between date_sub('2023-07-15', 1) and 
> date_add('2023-07-16', 1)
>   and type = 'event'
>   and date(event_timestamp) between '2023-07-15' and '2023-07-16'
>   and (
> (
>   page_id in (2627, 2835, 2402999)
>   and -- other predicates
>   and rdt = 0
> ) or (
>   page_id in (2616, 3411350)
>   and rdt = 0
> ) or (
>   page_id = 2403006
> ) or (
>   page_id in (2208336, 2356359)
>   and -- other predicates
>   and rdt = 0
> )
>   )
> {code}
> We can push down {{page_id in(2627, 2835, 2402999, 2616, 3411350, 2403006, 
> 2208336, 2356359)}} to datasource.
> Before:
> After:



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-44475) Relocate DataType and Parser to sql/api

2023-07-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-44475.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41928
[https://github.com/apache/spark/pull/41928]

> Relocate DataType and Parser to sql/api
> ---
>
> Key: SPARK-44475
> URL: https://issues.apache.org/jira/browse/SPARK-44475
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, SQL
>Affects Versions: 3.5.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
> Fix For: 3.5.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-44491) Add `branch-3.5` to `publish_snapshot` GitHub Action job

2023-07-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-44491:


Assignee: BingKun Pan

> Add `branch-3.5` to `publish_snapshot` GitHub Action job
> 
>
> Key: SPARK-44491
> URL: https://issues.apache.org/jira/browse/SPARK-44491
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-44491) Add `branch-3.5` to `publish_snapshot` GitHub Action job

2023-07-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-44491.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 42088
[https://github.com/apache/spark/pull/42088]

> Add `branch-3.5` to `publish_snapshot` GitHub Action job
> 
>
> Key: SPARK-44491
> URL: https://issues.apache.org/jira/browse/SPARK-44491
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44492) Resolve remaining AnalysisException

2023-07-20 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-44492:
---

 Summary: Resolve remaining AnalysisException
 Key: SPARK-44492
 URL: https://issues.apache.org/jira/browse/SPARK-44492
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, Pandas API on Spark
Affects Versions: 3.5.0
Reporter: Haejoon Lee


We addressed most of the AnalysisException cases from SPARK-43611, but there 
are still some remaining tests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org