[jira] [Resolved] (SPARK-45752) Unreferenced CTE should all be checked by CheckAnalysis0

2023-11-08 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-45752.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43614
[https://github.com/apache/spark/pull/43614]

> Unreferenced CTE should all be checked by CheckAnalysis0
> 
>
> Key: SPARK-45752
> URL: https://issues.apache.org/jira/browse/SPARK-45752
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45848) spark-build-info.ps1 missing the docroot property

2023-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45848:
---
Labels: pull-request-available  (was: )

> spark-build-info.ps1 missing the docroot property
> -
>
> Key: SPARK-45848
> URL: https://issues.apache.org/jira/browse/SPARK-45848
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>Priority: Major
>  Labels: pull-request-available
>
> https://github.com/apache/spark/blob/master/build/spark-build-info.ps1#L38-L44
> https://github.com/apache/spark/blob/master/build/spark-build-info#L30-L36






[jira] [Updated] (SPARK-45848) spark-build-info.ps1 missing the docroot property

2023-11-08 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-45848:

Description: 
https://github.com/apache/spark/blob/master/build/spark-build-info.ps1#L38-L44
https://github.com/apache/spark/blob/master/build/spark-build-info#L30-L36

  
was:https://github.com/apache/spark/blob/master/build/spark-build-info.ps1#L38-L44


> spark-build-info.ps1 missing the docroot property
> -
>
> Key: SPARK-45848
> URL: https://issues.apache.org/jira/browse/SPARK-45848
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> https://github.com/apache/spark/blob/master/build/spark-build-info.ps1#L38-L44
> https://github.com/apache/spark/blob/master/build/spark-build-info#L30-L36






[jira] [Created] (SPARK-45848) spark-build-info.ps1 missing the docroot property

2023-11-08 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-45848:
---

 Summary: spark-build-info.ps1 missing the docroot property
 Key: SPARK-45848
 URL: https://issues.apache.org/jira/browse/SPARK-45848
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 4.0.0
Reporter: Yuming Wang


https://github.com/apache/spark/blob/master/build/spark-build-info.ps1#L38-L44






[jira] [Commented] (SPARK-44662) SPIP: Improving performance of BroadcastHashJoin queries with stream side join key on non partition columns

2023-11-08 Thread Asif (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17784282#comment-17784282
 ] 

Asif commented on SPARK-44662:
--

The changes for Iceberg that support broadcast-var pushdown are present in the 
git repo:
[iceberg-repo|https://github.com/ahshahid/iceberg.git]
branch: broadcastvar-push.
The changes in the Iceberg branch are compatible with the latest apache/spark 
master (identified as 3.5 by Iceberg) and were compiled and tested with Scala 2.13.
To get the iceberg-spark-runtime jar for use:

First, locally install the Spark jars using the Spark PR mentioned below
(./build/mvn clean install -Phive -Phive-thriftserver -DskipTests).
Then use the Iceberg branch broadcastvar-push to create the Iceberg Spark 
runtime jar so that it uses the locally installed Spark as a dependency.

If you are interested in evaluating performance, please let me know.

> SPIP: Improving performance of BroadcastHashJoin queries with stream side 
> join key on non partition columns
> ---
>
> Key: SPARK-44662
> URL: https://issues.apache.org/jira/browse/SPARK-44662
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.1
>Reporter: Asif
>Priority: Major
>  Labels: pull-request-available
> Attachments: perf results broadcast var pushdown - Partitioned 
> TPCDS.pdf
>
>
> h2. *Q1. What are you trying to do? Articulate your objectives using 
> absolutely no jargon.*
> Along the lines of DPP, which helps DataSourceV2 relations when the joining 
> key is a partition column, the same concept can be extended to the case where 
> the joining key is not a partition column.
> The keys of a BroadcastHashJoin are already available before the actual 
> evaluation of the stream iterator. These keys can be pushed down to the 
> DataSource as a SortedSet.
> For non-partition columns, DataSources like Iceberg have max/min column stats 
> available at the manifest level, and formats like Parquet have max/min stats 
> at various storage levels. The passed SortedSet can be used to prune by range 
> at both the driver level (manifest files) and the executor level (while 
> actually going through chunks, row groups etc. at the Parquet level).
> If the data is stored in a columnar batch format, then it would not be 
> possible to filter out individual rows at the DataSource level, even though we 
> have the keys.
> But at the scan level (ColumnarToRowExec) it is still possible to filter out 
> as many rows as possible if the query involves nested joins, thus reducing the 
> number of rows to join at the higher join levels.
> Will be adding more details..
> h2. *Q2. What problem is this proposal NOT designed to solve?*
> This can only help BroadcastHashJoin performance if the join is Inner or 
> Left Semi.
> It will also not work if there are nodes like Expand, Generator, or Aggregate 
> (without group by on keys not part of the joining columns, etc.) below the 
> BroadcastHashJoin node being targeted.
> h2. *Q3. How is it done today, and what are the limits of current practice?*
> Currently this sort of pruning at the DataSource level is done using DPP 
> (Dynamic Partition Pruning), and only if one of the join key columns is a 
> partitioning column (so that the cost of the DPP query is justified and far 
> less than the amount of data it filters out by skipping partitions).
> The limitation is that the DPP-type approach is not implemented (intentionally, 
> I believe) when the join column is a non-partition column, because the cost of 
> a "DPP type" query would most likely be high compared to any possible pruning 
> (especially if the column is not stored in a sorted manner).
> h2. *Q4. What is new in your approach and why do you think it will be 
> successful?*
> 1) This allows pruning on non-partition-column-based joins.
> 2) Because it piggybacks on the broadcasted keys, there is no extra cost of a 
> "DPP type" query.
> 3) The data can be used by the DataSource to prune at the driver (possibly) 
> and also at the executor level (as in the case of Parquet, which has max/min 
> stats at various structure levels).
> 4) The big benefit should be seen in multilevel nested join queries. In the 
> current code base, if I am correct, only one join's pruning filter gets 
> pushed at the scan level. Since it is on the partition key, maybe that is 
> sufficient. But if it is a nested join query, possibly involving different 
> columns on the streaming side for each join, each such filter push could do 
> significant pruning. This requires some handling in the case of AQE, as the 
> stream-side iterator (and hence stage evaluation) needs to be delayed till 
> all the available join filters in the nested tree are pushed at their 
> respective target
> 
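The range-pruning idea in Q1/Q3 above can be sketched in a few lines. This is purely illustrative: `prune_row_groups` is a hypothetical helper, not a Spark or Iceberg API. Given the broadcast join keys as a sorted collection, a reader can skip any chunk whose min/max statistics bracket no key:

```python
import bisect

def prune_row_groups(sorted_keys, row_groups):
    """Keep only row groups whose [min, max] stats could contain a join key.

    sorted_keys: sorted stream-side join keys taken from the broadcast.
    row_groups:  list of (min_val, max_val) stats, e.g. from Parquet footers.
    """
    kept = []
    for lo, hi in row_groups:
        # Find the smallest broadcast key >= lo; the group can only match
        # if that key also falls at or below the group's max.
        i = bisect.bisect_left(sorted_keys, lo)
        if i < len(sorted_keys) and sorted_keys[i] <= hi:
            kept.append((lo, hi))
    return kept
```

The same check works at the driver against manifest-level stats and at the executor against row-group or page-level stats, which is the two-level pruning the proposal describes.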

[jira] [Updated] (SPARK-45830) Refactor StorageUtils#bufferCleaner

2023-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45830:
---
Labels: pull-request-available  (was: )

> Refactor StorageUtils#bufferCleaner
> ---
>
> Key: SPARK-45830
> URL: https://issues.apache.org/jira/browse/SPARK-45830
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Assigned] (SPARK-45831) Change to using the collection factory to create an immutable Java collection

2023-11-08 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-45831:
-

Assignee: Yang Jie

> Change to using the collection factory to create an immutable Java collection
> -
>
> Key: SPARK-45831
> URL: https://issues.apache.org/jira/browse/SPARK-45831
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-45831) Change to using the collection factory to create an immutable Java collection

2023-11-08 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-45831.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43709
[https://github.com/apache/spark/pull/43709]

> Change to using the collection factory to create an immutable Java collection
> -
>
> Key: SPARK-45831
> URL: https://issues.apache.org/jira/browse/SPARK-45831
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-45835) Make gitHub labeler more accurate and remove outdated comments

2023-11-08 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-45835:
-

Assignee: BingKun Pan

> Make gitHub labeler more accurate and remove outdated comments
> --
>
> Key: SPARK-45835
> URL: https://issues.apache.org/jira/browse/SPARK-45835
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-45835) Make gitHub labeler more accurate and remove outdated comments

2023-11-08 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-45835.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43716
[https://github.com/apache/spark/pull/43716]

> Make gitHub labeler more accurate and remove outdated comments
> --
>
> Key: SPARK-45835
> URL: https://issues.apache.org/jira/browse/SPARK-45835
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-45842) Refactor Catalog Function APIs to use analyzer

2023-11-08 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-45842.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43720
[https://github.com/apache/spark/pull/43720]

> Refactor Catalog Function APIs to use analyzer
> --
>
> Key: SPARK-45842
> URL: https://issues.apache.org/jira/browse/SPARK-45842
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Yihong He
>Assignee: Yihong He
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-45842) Refactor Catalog Function APIs to use analyzer

2023-11-08 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-45842:
-

Assignee: Yihong He

> Refactor Catalog Function APIs to use analyzer
> --
>
> Key: SPARK-45842
> URL: https://issues.apache.org/jira/browse/SPARK-45842
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Yihong He
>Assignee: Yihong He
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-45846) spark.sql.optimizeNullAwareAntiJoin should respect spark.sql.autoBroadcastJoinThreshold

2023-11-08 Thread Chao Sun (Jira)
Chao Sun created SPARK-45846:


 Summary: spark.sql.optimizeNullAwareAntiJoin should respect 
spark.sql.autoBroadcastJoinThreshold
 Key: SPARK-45846
 URL: https://issues.apache.org/jira/browse/SPARK-45846
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Chao Sun


Normally, broadcast joins can be disabled by setting 
{{spark.sql.autoBroadcastJoinThreshold}} to -1. However, this doesn't apply to 
{{spark.sql.optimizeNullAwareAntiJoin}}:

{code}
case j @ ExtractSingleColumnNullAwareAntiJoin(leftKeys, rightKeys) =>
  Seq(joins.BroadcastHashJoinExec(leftKeys, rightKeys, LeftAnti, BuildRight,
    None, planLater(j.left), planLater(j.right), isNullAwareAntiJoin = true))
{code}
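For context, the "null-aware" part refers to SQL `NOT IN` subquery semantics, which a plain hash anti join cannot express. A rough sketch of those semantics (pure illustration, not Spark code):

```python
def null_aware_anti_join(left, right):
    """SQL `x NOT IN (subquery)` semantics over simple value lists.

    - Empty subquery: the predicate is vacuously true for every left row.
    - Any NULL (None) on the right: the predicate is never TRUE, so the
      result is empty.
    - Otherwise: keep non-NULL left values absent from the right side.
    """
    if not right:
        return list(left)
    if any(v is None for v in right):
        return []
    rset = set(right)
    return [v for v in left if v is not None and v not in rset]
```

A single NULL on the build side empties the whole result, which helps explain why the current strategy always plans a BroadcastHashJoinExec for this pattern; the issue argues it should nevertheless respect the broadcast threshold.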








[jira] [Updated] (SPARK-45845) Streaming UI add number of evicted state rows

2023-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45845:
---
Labels: pull-request-available  (was: )

> Streaming UI add number of evicted state rows
> -
>
> Key: SPARK-45845
> URL: https://issues.apache.org/jira/browse/SPARK-45845
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Wei Liu
>Priority: Major
>  Labels: pull-request-available
>
> The UI is missing this chart, and people always confuse "aggregated number of 
> rows dropped by watermark" with this newly added metric






[jira] [Created] (SPARK-45845) Streaming UI add number of evicted state rows

2023-11-08 Thread Wei Liu (Jira)
Wei Liu created SPARK-45845:
---

 Summary: Streaming UI add number of evicted state rows
 Key: SPARK-45845
 URL: https://issues.apache.org/jira/browse/SPARK-45845
 Project: Spark
  Issue Type: Task
  Components: Structured Streaming
Affects Versions: 4.0.0
Reporter: Wei Liu


The UI is missing this chart, and people always confuse "aggregated number of 
rows dropped by watermark" with this newly added metric






[jira] [Assigned] (SPARK-45843) Support `killAll` in REST Submission API

2023-11-08 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-45843:
-

Assignee: Dongjoon Hyun

> Support `killAll` in REST Submission API
> 
>
> Key: SPARK-45843
> URL: https://issues.apache.org/jira/browse/SPARK-45843
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-45843) Support `killAll` in REST Submission API

2023-11-08 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-45843.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43721
[https://github.com/apache/spark/pull/43721]

> Support `killAll` in REST Submission API
> 
>
> Key: SPARK-45843
> URL: https://issues.apache.org/jira/browse/SPARK-45843
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-45843) Support `killall` in REST Submission API

2023-11-08 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-45843:
--
Summary: Support `killall` in REST Submission API  (was: Support `killAll` 
in REST Submission API)

> Support `killall` in REST Submission API
> 
>
> Key: SPARK-45843
> URL: https://issues.apache.org/jira/browse/SPARK-45843
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-42821) Remove unused parameters in splitFiles methods

2023-11-08 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-42821.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 40454
[https://github.com/apache/spark/pull/40454]

> Remove unused parameters in splitFiles methods
> --
>
> Key: SPARK-42821
> URL: https://issues.apache.org/jira/browse/SPARK-42821
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-42821) Remove unused parameters in splitFiles methods

2023-11-08 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-42821:


Assignee: BingKun Pan

> Remove unused parameters in splitFiles methods
> --
>
> Key: SPARK-42821
> URL: https://issues.apache.org/jira/browse/SPARK-42821
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-45844) Implement case insensitivity for XML

2023-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45844:
---
Labels: pull-request-available  (was: )

> Implement case insensitivity for XML
> 
>
> Key: SPARK-45844
> URL: https://issues.apache.org/jira/browse/SPARK-45844
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Shujing Yang
>Priority: Major
>  Labels: pull-request-available
>
> Currently, we don't follow the case-sensitivity `SQLConf` in XML, which is 
> inconsistent with other file formats. This PR implements the case-insensitive 
> behavior for schema inference and file reads.






[jira] [Created] (SPARK-45844) Implement case insensitivity for XML

2023-11-08 Thread Shujing Yang (Jira)
Shujing Yang created SPARK-45844:


 Summary: Implement case insensitivity for XML
 Key: SPARK-45844
 URL: https://issues.apache.org/jira/browse/SPARK-45844
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.5.0
Reporter: Shujing Yang


Currently, we don't follow the case-sensitivity `SQLConf` in XML, which is 
inconsistent with other file formats. This PR implements the case-insensitive 
behavior for schema inference and file reads.
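The intended resolution behavior can be sketched as a tiny resolver. This is illustrative only: `resolve_field` is a hypothetical helper, not Spark's implementation (the governing setting in Spark is `spark.sql.caseSensitive`):

```python
def resolve_field(schema_fields, name, case_sensitive=False):
    """Resolve a column name against schema field names.

    With case_sensitive=False (the usual default), 'ID', 'id' and 'Id' all
    match the same field; a ValueError is raised if the schema contains
    fields that differ only by case, since the reference is ambiguous.
    """
    if case_sensitive:
        matches = [f for f in schema_fields if f == name]
    else:
        matches = [f for f in schema_fields if f.lower() == name.lower()]
    if len(matches) > 1:
        raise ValueError(f"ambiguous reference to {name!r}: {matches}")
    return matches[0] if matches else None
```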






[jira] [Updated] (SPARK-45827) Add variant data type in Spark

2023-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45827:
---
Labels: pull-request-available  (was: )

> Add variant data type in Spark
> --
>
> Key: SPARK-45827
> URL: https://issues.apache.org/jira/browse/SPARK-45827
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Chenhao Li
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-45843) Support `killAll` in REST Submission API

2023-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45843:
---
Labels: pull-request-available  (was: )

> Support `killAll` in REST Submission API
> 
>
> Key: SPARK-45843
> URL: https://issues.apache.org/jira/browse/SPARK-45843
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-45843) Support `killAll` in REST Submission API

2023-11-08 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-45843:
-

 Summary: Support `killAll` in REST Submission API
 Key: SPARK-45843
 URL: https://issues.apache.org/jira/browse/SPARK-45843
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun









[jira] [Resolved] (SPARK-45639) Support loading Python data sources in DataFrameReader

2023-11-08 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45639.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43630
[https://github.com/apache/spark/pull/43630]

> Support loading Python data sources in DataFrameReader
> --
>
> Key: SPARK-45639
> URL: https://issues.apache.org/jira/browse/SPARK-45639
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Allow users to read from a Python data source using 
> `spark.read.format(...).load()` in PySpark. For example, users can extend the 
> DataSource and DataSourceReader classes to create their own Python data 
> source reader and use it in PySpark:
> {code:java}
> class MyReader(DataSourceReader):
>     def read(self, partition):
>         yield (0, 1)
> class MyDataSource(DataSource):
>     def schema(self):
>         return "id INT, value INT"
>     def reader(self, schema):
>         return MyReader()
> df = spark.read.format("MyDataSource").load()
> df.show()
> +---+-----+
> | id|value|
> +---+-----+
> |  0|    1|
> +---+-----+
> {code}
>  






[jira] [Assigned] (SPARK-45639) Support loading Python data sources in DataFrameReader

2023-11-08 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-45639:


Assignee: Allison Wang

> Support loading Python data sources in DataFrameReader
> --
>
> Key: SPARK-45639
> URL: https://issues.apache.org/jira/browse/SPARK-45639
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>  Labels: pull-request-available
>
> Allow users to read from a Python data source using 
> `spark.read.format(...).load()` in PySpark. For example, users can extend the 
> DataSource and DataSourceReader classes to create their own Python data 
> source reader and use it in PySpark:
> {code:java}
> class MyReader(DataSourceReader):
>     def read(self, partition):
>         yield (0, 1)
> class MyDataSource(DataSource):
>     def schema(self):
>         return "id INT, value INT"
>     def reader(self, schema):
>         return MyReader()
> df = spark.read.format("MyDataSource").load()
> df.show()
> +---+-----+
> | id|value|
> +---+-----+
> |  0|    1|
> +---+-----+
> {code}
>  






[jira] [Resolved] (SPARK-45828) Remove deprecated method in dsl

2023-11-08 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-45828.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43708
[https://github.com/apache/spark/pull/43708]

> Remove deprecated method in dsl
> ---
>
> Key: SPARK-45828
> URL: https://issues.apache.org/jira/browse/SPARK-45828
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Commented] (SPARK-45282) Join loses records for cached datasets

2023-11-08 Thread koert kuipers (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17784115#comment-17784115
 ] 

koert kuipers commented on SPARK-45282:
---

it does look like same issue

and partitioning being the cause makes sense too




> Join loses records for cached datasets
> --
>
> Key: SPARK-45282
> URL: https://issues.apache.org/jira/browse/SPARK-45282
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.1, 3.5.0
> Environment: spark 3.4.1 on apache hadoop 3.3.6 or kubernetes 1.26 or 
> databricks 13.3
>Reporter: koert kuipers
>Priority: Blocker
>  Labels: CorrectnessBug, correctness
>
> we observed this issue on spark 3.4.1 but it is also present on 3.5.0. it is 
> not present on spark 3.3.1.
> it only shows up in distributed environment. i cannot replicate in unit test. 
> however i did get it to show up on hadoop cluster, kubernetes, and on 
> databricks 13.3
> the issue is that records are dropped when two cached dataframes are joined. 
> it seems in spark 3.4.1 in the query plan some Exchanges are dropped as an 
> optimization while in spark 3.3.1 these Exchanges are still present. it seems 
> to be an issue with AQE with canChangeCachedPlanOutputPartitioning=true.
> to reproduce on distributed cluster these settings needed are:
> {code:java}
> spark.sql.adaptive.advisoryPartitionSizeInBytes 33554432
> spark.sql.adaptive.coalescePartitions.parallelismFirst false
> spark.sql.adaptive.enabled true
> spark.sql.optimizer.canChangeCachedPlanOutputPartitioning true {code}
> code using scala to reproduce is:
> {code:java}
> import java.util.UUID
> import org.apache.spark.sql.functions.col
> import spark.implicits._
> val data = (1 to 100).toDS().map(i => 
> UUID.randomUUID().toString).persist()
> val left = data.map(k => (k, 1))
> val right = data.map(k => (k, k)) // if i change this to k => (k, 1) it works!
> println("number of left " + left.count())
> println("number of right " + right.count())
> println("number of (left join right) " +
>   left.toDF("key", "value1").join(right.toDF("key", "value2"), "key").count()
> )
> val left1 = left
>   .toDF("key", "value1")
>   .repartition(col("key")) // comment out this line to make it work
>   .persist()
> println("number of left1 " + left1.count())
> val right1 = right
>   .toDF("key", "value2")
>   .repartition(col("key")) // comment out this line to make it work
>   .persist()
> println("number of right1 " + right1.count())
> println("number of (left1 join right1) " +  left1.join(right1, 
> "key").count()) // this gives incorrect result{code}
> this produces the following output:
> {code:java}
> number of left 100
> number of right 100
> number of (left join right) 100
> number of left1 100
> number of right1 100
> number of (left1 join right1) 859531 {code}
> note that the last number (the incorrect one) actually varies depending on 
> settings and cluster size etc.
>  
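If the regression is indeed tied to AQE changing the output partitioning of cached plans (an inference from the description above, not a confirmed diagnosis), a plausible workaround while staying on 3.4.x/3.5.0 is to turn the implicated flag off, e.g. in spark-defaults.conf:

```
spark.sql.adaptive.enabled true
spark.sql.optimizer.canChangeCachedPlanOutputPartitioning false
```

With the flag disabled, the cached dataframes should keep the Exchanges that 3.3.1 retains, at the cost of losing that partitioning optimization.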



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-45282) Join loses records for cached datasets

2023-11-08 Thread Emil Ejbyfeldt (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17784109#comment-17784109
 ] 

Emil Ejbyfeldt commented on SPARK-45282:


The code reproducing the bug looks quite similar to 
https://issues.apache.org/jira/browse/SPARK-45592. I wonder if the fix for that 
might also have solved this bug, as I could not reproduce the issue on a build 
from the master branch.

> Join loses records for cached datasets
> --
>
> Key: SPARK-45282
> URL: https://issues.apache.org/jira/browse/SPARK-45282
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.1, 3.5.0
> Environment: spark 3.4.1 on apache hadoop 3.3.6 or kubernetes 1.26 or 
> databricks 13.3
>Reporter: koert kuipers
>Priority: Blocker
>  Labels: CorrectnessBug, correctness
>
> we observed this issue on spark 3.4.1 but it is also present on 3.5.0. it is 
> not present on spark 3.3.1.
> it only shows up in distributed environment. i cannot replicate in unit test. 
> however i did get it to show up on hadoop cluster, kubernetes, and on 
> databricks 13.3
> the issue is that records are dropped when two cached dataframes are joined. 
> it seems in spark 3.4.1 in the query plan some Exchanges are dropped as an 
> optimization while in spark 3.3.1 these Exchanges are still present. it seems 
> to be an issue with AQE with canChangeCachedPlanOutputPartitioning=true.
> to reproduce on a distributed cluster the settings needed are:
> {code:java}
> spark.sql.adaptive.advisoryPartitionSizeInBytes 33554432
> spark.sql.adaptive.coalescePartitions.parallelismFirst false
> spark.sql.adaptive.enabled true
> spark.sql.optimizer.canChangeCachedPlanOutputPartitioning true {code}
> code using scala to reproduce is:
> {code:java}
> import java.util.UUID
> import org.apache.spark.sql.functions.col
> import spark.implicits._
> val data = (1 to 100).toDS().map(i => 
> UUID.randomUUID().toString).persist()
> val left = data.map(k => (k, 1))
> val right = data.map(k => (k, k)) // if i change this to k => (k, 1) it works!
> println("number of left " + left.count())
> println("number of right " + right.count())
> println("number of (left join right) " +
>   left.toDF("key", "value1").join(right.toDF("key", "value2"), "key").count()
> )
> val left1 = left
>   .toDF("key", "value1")
>   .repartition(col("key")) // comment out this line to make it work
>   .persist()
> println("number of left1 " + left1.count())
> val right1 = right
>   .toDF("key", "value2")
>   .repartition(col("key")) // comment out this line to make it work
>   .persist()
> println("number of right1 " + right1.count())
> println("number of (left1 join right1) " +  left1.join(right1, 
> "key").count()) // this gives incorrect result{code}
> this produces the following output:
> {code:java}
> number of left 100
> number of right 100
> number of (left join right) 100
> number of left1 100
> number of right1 100
> number of (left1 join right1) 859531 {code}
> note that the last number (the incorrect one) actually varies depending on 
> settings and cluster size etc.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45826) Add a SQL config for extra stack traces in Origin

2023-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45826:
---
Labels: pull-request-available  (was: )

> Add a SQL config for extra stack traces in Origin
> -
>
> Key: SPARK-45826
> URL: https://issues.apache.org/jira/browse/SPARK-45826
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>  Labels: pull-request-available
>
> Add a SQL config to control how many extra stack traces should be captured in 
> the withOrigin method. This should improve user experience in troubleshooting 
> issues.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45842) Refactor Catalog Function APIs to use analyzer

2023-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45842:
---
Labels: pull-request-available  (was: )

> Refactor Catalog Function APIs to use analyzer
> --
>
> Key: SPARK-45842
> URL: https://issues.apache.org/jira/browse/SPARK-45842
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Yihong He
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45842) Refactor Catalog Function APIs to use analyzer

2023-11-08 Thread Yihong He (Jira)
Yihong He created SPARK-45842:
-

 Summary: Refactor Catalog Function APIs to use analyzer
 Key: SPARK-45842
 URL: https://issues.apache.org/jira/browse/SPARK-45842
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 4.0.0
Reporter: Yihong He






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45841) Expose stack trace by DataFrameQueryContext

2023-11-08 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-45841.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43703
[https://github.com/apache/spark/pull/43703]

> Expose stack trace by DataFrameQueryContext
> ---
>
> Key: SPARK-45841
> URL: https://issues.apache.org/jira/browse/SPARK-45841
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Modify DataFrameQueryContext and expose stack traces to users. This should 
> make it easier to troubleshoot issues.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45841) Expose stack trace by DataFrameQueryContext

2023-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45841:
---
Labels: pull-request-available  (was: )

> Expose stack trace by DataFrameQueryContext
> ---
>
> Key: SPARK-45841
> URL: https://issues.apache.org/jira/browse/SPARK-45841
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>  Labels: pull-request-available
>
> Modify DataFrameQueryContext and expose stack traces to users. This should 
> make it easier to troubleshoot issues.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45837) Report underlying error in scala client

2023-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45837:
---
Labels: pull-request-available  (was: )

> Report underlying error in scala client
> ---
>
> Key: SPARK-45837
> URL: https://issues.apache.org/jira/browse/SPARK-45837
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Alice Sayutina
>Priority: Minor
>  Labels: pull-request-available
>
> When there is a retry-worthy error, we need to not just throw RetryException, 
> but also 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45841) Expose stack trace by DataFrameQueryContext

2023-11-08 Thread Max Gekk (Jira)
Max Gekk created SPARK-45841:


 Summary: Expose stack trace by DataFrameQueryContext
 Key: SPARK-45841
 URL: https://issues.apache.org/jira/browse/SPARK-45841
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Max Gekk
Assignee: Max Gekk


Modify DataFrameQueryContext and expose stack traces to users. This should 
make it easier to troubleshoot issues.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45840) Fix these issues in module sql/hive, sql/hive-thriftserver

2023-11-08 Thread Jiaan Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiaan Geng updated SPARK-45840:
---
Summary: Fix these issues in module sql/hive, sql/hive-thriftserver  (was: 
Fix these issues in module sql/hive)

> Fix these issues in module sql/hive, sql/hive-thriftserver
> -
>
> Key: SPARK-45840
> URL: https://issues.apache.org/jira/browse/SPARK-45840
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Jiaan Geng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45840) Fix these issues in module sql/hive

2023-11-08 Thread Jiaan Geng (Jira)
Jiaan Geng created SPARK-45840:
--

 Summary: Fix these issues in module sql/hive
 Key: SPARK-45840
 URL: https://issues.apache.org/jira/browse/SPARK-45840
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 4.0.0
Reporter: Jiaan Geng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45839) Fix these issues in module sql/api

2023-11-08 Thread Jiaan Geng (Jira)
Jiaan Geng created SPARK-45839:
--

 Summary: Fix these issues in module sql/api
 Key: SPARK-45839
 URL: https://issues.apache.org/jira/browse/SPARK-45839
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 4.0.0
Reporter: Jiaan Geng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45838) Fix these issues in module sql/core

2023-11-08 Thread Jiaan Geng (Jira)
Jiaan Geng created SPARK-45838:
--

 Summary: Fix these issues in module sql/core
 Key: SPARK-45838
 URL: https://issues.apache.org/jira/browse/SPARK-45838
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 4.0.0
Reporter: Jiaan Geng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45825) Fix these issues in module sql/catalyst

2023-11-08 Thread Jiaan Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiaan Geng updated SPARK-45825:
---
Summary: Fix these issues in module sql/catalyst  (was: Fix these issues in 
package sql/catalyst)

> Fix these issues in module sql/catalyst
> --
>
> Key: SPARK-45825
> URL: https://issues.apache.org/jira/browse/SPARK-45825
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Jiaan Geng
>Assignee: Jiaan Geng
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45816) Return null when overflowing during casting from timestamp to integers

2023-11-08 Thread Jiaan Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiaan Geng resolved SPARK-45816.

Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43694
[https://github.com/apache/spark/pull/43694]

> Return null when overflowing during casting from timestamp to integers
> --
>
> Key: SPARK-45816
> URL: https://issues.apache.org/jira/browse/SPARK-45816
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.3, 3.4.1, 3.5.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Spark cast works in two modes: ansi and non-ansi. When overflowing during 
> casting, the common behavior under non-ansi mode is to return null. However, 
> casting from Timestamp to Int/Short/Byte returns a wrapping value now. The 
> behavior to silently overflow doesn't make sense.
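To see why a wrapped value appears at all, here is a plain-Scala sketch (an assumption about the mechanism, not Spark source code: the non-ANSI cast derives epoch seconds as a `Long` and narrows it with ordinary JVM `toInt` semantics):

```scala
// A timestamp around the year 2065 has roughly 3,000,000,000 epoch seconds
val micros: Long = 3000000000L * 1000000L  // Spark stores timestamps as microseconds
val seconds: Long = micros / 1000000L      // 3000000000 exceeds Int.MaxValue (2147483647)
val wrapped: Int = seconds.toInt           // JVM narrowing wraps around silently
println(wrapped)                           // prints -1294967296
```

Under the proposed non-ANSI behavior, the cast would instead detect that `seconds` falls outside the `Int` range and return null rather than this wrapped value.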



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45816) Return null when overflowing during casting from timestamp to integers

2023-11-08 Thread Jiaan Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiaan Geng reassigned SPARK-45816:
--

Assignee: L. C. Hsieh

> Return null when overflowing during casting from timestamp to integers
> --
>
> Key: SPARK-45816
> URL: https://issues.apache.org/jira/browse/SPARK-45816
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.3, 3.4.1, 3.5.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>
> Spark cast works in two modes: ansi and non-ansi. When overflowing during 
> casting, the common behavior under non-ansi mode is to return null. However, 
> casting from Timestamp to Int/Short/Byte returns a wrapping value now. The 
> behavior to silently overflow doesn't make sense.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45837) Report underlying error in scala client

2023-11-08 Thread Alice Sayutina (Jira)
Alice Sayutina created SPARK-45837:
--

 Summary: Report underlying error in scala client
 Key: SPARK-45837
 URL: https://issues.apache.org/jira/browse/SPARK-45837
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0
Reporter: Alice Sayutina


When there is a retry-worthy error, we need to not just throw RetryException, but 
also 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45606) Release restrictions on multi-layer runtime filter

2023-11-08 Thread Jiaan Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiaan Geng resolved SPARK-45606.

Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43449
[https://github.com/apache/spark/pull/43449]

> Release restrictions on multi-layer runtime filter
> --
>
> Key: SPARK-45606
> URL: https://issues.apache.org/jira/browse/SPARK-45606
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Jiaan Geng
>Assignee: Jiaan Geng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Before https://issues.apache.org/jira/browse/SPARK-41674, Spark only supported 
> inserting a runtime filter for the application side of a shuffle join on a 
> single layer. Considering it not worth inserting more runtime filters if one 
> side of the shuffle join already has a runtime filter, Spark restricted it.
> After https://issues.apache.org/jira/browse/SPARK-41674, Spark supports 
> inserting runtime filters for one side of any shuffle join on multiple layers, 
> but the restriction on multi-layer runtime filters looks outdated.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45829) The default value of 'spark.executor.logs.rolling.maxSize' on the official website is incorrect

2023-11-08 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao resolved SPARK-45829.
--
Fix Version/s: 3.3.4
   3.5.1
   4.0.0
   3.4.2
   Resolution: Fixed

Issue resolved by pull request 43712
[https://github.com/apache/spark/pull/43712]

> The default value of 'spark.executor.logs.rolling.maxSize' on the official 
> website is incorrect
> ---
>
> Key: SPARK-45829
> URL: https://issues.apache.org/jira/browse/SPARK-45829
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, UI
>Affects Versions: 3.5.0
>Reporter: chenyu
>Assignee: chenyu
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 3.3.4, 3.5.1, 4.0.0, 3.4.2
>
> Attachments: the default value.png, the value on the website.png
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45829) The default value of 'spark.executor.logs.rolling.maxSize' on the official website is incorrect

2023-11-08 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao reassigned SPARK-45829:


Assignee: chenyu

> The default value of 'spark.executor.logs.rolling.maxSize' on the official 
> website is incorrect
> ---
>
> Key: SPARK-45829
> URL: https://issues.apache.org/jira/browse/SPARK-45829
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, UI
>Affects Versions: 3.5.0
>Reporter: chenyu
>Assignee: chenyu
>Priority: Trivial
>  Labels: pull-request-available
> Attachments: the default value.png, the value on the website.png
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45825) Fix these issues in package sql/catalyst

2023-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45825:
---
Labels: pull-request-available  (was: )

> Fix these issues in package sql/catalyst
> ---
>
> Key: SPARK-45825
> URL: https://issues.apache.org/jira/browse/SPARK-45825
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Jiaan Geng
>Assignee: Jiaan Geng
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43341) StructType.toDDL does not pick up on non-nullability of column in nested struct

2023-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-43341:
---
Labels: pull-request-available  (was: )

> StructType.toDDL does not pick up on non-nullability of column in nested 
> struct
> ---
>
> Key: SPARK-43341
> URL: https://issues.apache.org/jira/browse/SPARK-43341
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0, 3.3.1, 3.3.2
>Reporter: Bram Boogaarts
>Priority: Major
>  Labels: pull-request-available
>
> h2. The problem
> When converting a StructType instance containing a nested StructType column 
> which in turn contains a column for which {{nullable = false}} to a DDL 
> string using {{{}.toDDL{}}}, the resulting DDL string does not include this 
> non-nullability. For example:
> {code:java}
> val testschema = StructType(List(
>   StructField("key", IntegerType, false),
>   StructField("value", StringType, true),
>   StructField("nestedCols", StructType(List(
> StructField("nestedKey", IntegerType, false),
> StructField("nestedValue", StringType, true)
>   )), false)
> ))
> println(testschema.toDDL)
> println(StructType.fromDDL(testschema.toDDL)){code}
> gives:
> {code:java}
> key INT NOT NULL,value STRING,nestedCols STRUCT<nestedKey: INT, nestedValue: STRING> NOT NULL
> StructType(
>   StructField(key,IntegerType,false),
>   StructField(value,StringType,true),
>   StructField(nestedCols,StructType(
> StructField(nestedKey,IntegerType,true),
> StructField(nestedValue,StringType,true)
>   ),false)
> ){code}
>  
> This is due to the fact that {{StructType.toDDL}} calls {{StructField.toDDL}} 
> for its fields, which in turn calls {{.sql}} for its {{{}dataType{}}}. If 
> {{dataType}} is a {{{}StructType{}}}, the call to {{.sql}} in turn calls 
> {{.sql}} for all the nested fields, and this last method does not include the 
> nullability of the field in its output.
> h2. Proposed solution
> {{StructField.toDDL}} should call {{dataType.toDDL}} for a 
> {{{}StructType{}}}, since this will include information about nullability of 
> nested columns.
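The recursion the proposal describes can be sketched with a self-contained toy model (simplified stand-ins, not Spark's real `StructType`/`StructField`/`DataType` classes):

```scala
sealed trait DT { def sql: String }                         // stand-in for DataType.sql
case object IntT extends DT { def sql: String = "INT" }
case object StrT extends DT { def sql: String = "STRING" }

final case class Field(name: String, dt: DT, nullable: Boolean)

final case class StructT(fields: List[Field]) extends DT {
  // Current behavior: like DataType.sql, nested nullability is dropped
  def sql: String =
    fields.map(f => s"${f.name}: ${f.dt.sql}").mkString("STRUCT<", ", ", ">")

  // Proposed behavior: recurse through a DDL-producing method so NOT NULL survives
  def ddl: String =
    fields.map { f =>
      val t = f.dt match {
        case s: StructT => s.ddl   // recurse into nested structs
        case other      => other.sql
      }
      s"${f.name}: $t" + (if (f.nullable) "" else " NOT NULL")
    }.mkString("STRUCT<", ", ", ">")
}

val nested = StructT(List(
  Field("nestedKey", IntT, nullable = false),
  Field("nestedValue", StrT, nullable = true)))
println(nested.sql) // STRUCT<nestedKey: INT, nestedValue: STRING>
println(nested.ddl) // STRUCT<nestedKey: INT NOT NULL, nestedValue: STRING>
```

Delegating to the DDL form for nested structs, as the proposal suggests for `StructField.toDDL`, is what would let a round-trip through `StructType.fromDDL` preserve nested non-nullability.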



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45816) Return null when overflowing during casting from timestamp to integers

2023-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-45816:
--

Assignee: Apache Spark

> Return null when overflowing during casting from timestamp to integers
> --
>
> Key: SPARK-45816
> URL: https://issues.apache.org/jira/browse/SPARK-45816
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.3, 3.4.1, 3.5.0
>Reporter: L. C. Hsieh
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
>
> Spark cast works in two modes: ansi and non-ansi. When overflowing during 
> casting, the common behavior under non-ansi mode is to return null. However, 
> casting from Timestamp to Int/Short/Byte returns a wrapping value now. The 
> behavior to silently overflow doesn't make sense.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45816) Return null when overflowing during casting from timestamp to integers

2023-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-45816:
--

Assignee: (was: Apache Spark)

> Return null when overflowing during casting from timestamp to integers
> --
>
> Key: SPARK-45816
> URL: https://issues.apache.org/jira/browse/SPARK-45816
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.3, 3.4.1, 3.5.0
>Reporter: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>
> Spark cast works in two modes: ansi and non-ansi. When overflowing during 
> casting, the common behavior under non-ansi mode is to return null. However, 
> casting from Timestamp to Int/Short/Byte returns a wrapping value now. The 
> behavior to silently overflow doesn't make sense.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-42821) Remove unused parameters in splitFiles methods

2023-11-08 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie reopened SPARK-42821:
--

> Remove unused parameters in splitFiles methods
> --
>
> Key: SPARK-42821
> URL: https://issues.apache.org/jira/browse/SPARK-42821
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42821) Remove unused parameters in splitFiles methods

2023-11-08 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-42821:
-
Affects Version/s: 4.0.0
   (was: 3.5.0)

> Remove unused parameters in splitFiles methods
> --
>
> Key: SPARK-42821
> URL: https://issues.apache.org/jira/browse/SPARK-42821
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45824) Enforce error class in ParseException

2023-11-08 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-45824.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43702
[https://github.com/apache/spark/pull/43702]

> Enforce error class in ParseException
> -
>
> Key: SPARK-45824
> URL: https://issues.apache.org/jira/browse/SPARK-45824
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Make the error class in ParseException mandatory to enforce callers to always 
> set it. This simplifies migration on error classes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42821) Remove unused parameters in splitFiles methods

2023-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-42821:
---
Labels: pull-request-available  (was: )

> Remove unused parameters in splitFiles methods
> --
>
> Key: SPARK-42821
> URL: https://issues.apache.org/jira/browse/SPARK-42821
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45835) Make gitHub labeler more accurate and remove outdated comments

2023-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45835:
---
Labels: pull-request-available  (was: )

> Make gitHub labeler more accurate and remove outdated comments
> --
>
> Key: SPARK-45835
> URL: https://issues.apache.org/jira/browse/SPARK-45835
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org