[jira] [Updated] (SPARK-45467) Replace `Proxy.getProxyClass()` with `Proxy.newProxyInstance().getClass`
[ https://issues.apache.org/jira/browse/SPARK-45467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45467: --- Labels: pull-request-available (was: ) > Replace `Proxy.getProxyClass()` with `Proxy.newProxyInstance().getClass` > > > Key: SPARK-45467 > URL: https://issues.apache.org/jira/browse/SPARK-45467 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > Labels: pull-request-available > > {code:java} > * @deprecated Proxy classes generated in a named module are encapsulated > * and not accessible to code outside its module. > * {@link Constructor#newInstance(Object...) Constructor.newInstance} > * will throw {@code IllegalAccessException} when it is called on > * an inaccessible proxy class. > * Use {@link #newProxyInstance(ClassLoader, Class[], InvocationHandler)} > * to create a proxy instance instead. > * > * @see Package and Module Membership of Proxy Class > * @revised 9 > */ > @Deprecated > @CallerSensitive > public static Class<?> getProxyClass(ClassLoader loader, > Class<?>... interfaces) > throws IllegalArgumentException {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
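The migration the issue title describes can be sketched in plain JDK code. This is a minimal illustration, not the actual Spark change; the `Runnable` interface and the no-op handler are chosen only for the example:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

public class ProxyMigration {
    public static void main(String[] args) {
        // A no-op handler, just so a proxy instance can be built.
        InvocationHandler handler = (proxy, method, methodArgs) -> null;

        // Before (deprecated since JDK 9):
        //   Class<?> proxyClass = Proxy.getProxyClass(loader, Runnable.class);
        // After: create an instance via newProxyInstance and take its class.
        Runnable instance = (Runnable) Proxy.newProxyInstance(
                Runnable.class.getClassLoader(),
                new Class<?>[] {Runnable.class},
                handler);
        Class<?> proxyClass = instance.getClass();

        System.out.println(Proxy.isProxyClass(proxyClass));
    }
}
```

This avoids the `IllegalAccessException` the deprecation note warns about for proxy classes generated in named modules, since the instance is created through the supported entry point.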
[jira] [Updated] (SPARK-45464) [CORE] Fix yarn distribution build
[ https://issues.apache.org/jira/browse/SPARK-45464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45464: --- Labels: pull-request-available (was: ) > [CORE] Fix yarn distribution build > -- > > Key: SPARK-45464 > URL: https://issues.apache.org/jira/browse/SPARK-45464 > Project: Spark > Issue Type: Bug > Components: Spark Core, YARN >Affects Versions: 4.0.0 >Reporter: Hasnain Lakhani >Priority: Major > Labels: pull-request-available > > [https://github.com/apache/spark/pull/43164] introduced a regression in: > > ``` > ./dev/make-distribution.sh --tgz -Phive -Phive-thriftserver -Pyarn > ``` > > This needs to be fixed.
[jira] [Updated] (SPARK-45467) Replace `Proxy.getProxyClass()` with `Proxy.newProxyInstance().getClass`
[ https://issues.apache.org/jira/browse/SPARK-45467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-45467: - Description: {code:java} * @deprecated Proxy classes generated in a named module are encapsulated * and not accessible to code outside its module. * {@link Constructor#newInstance(Object...) Constructor.newInstance} * will throw {@code IllegalAccessException} when it is called on * an inaccessible proxy class. * Use {@link #newProxyInstance(ClassLoader, Class[], InvocationHandler)} * to create a proxy instance instead. * * @see Package and Module Membership of Proxy Class * @revised 9 */ @Deprecated @CallerSensitive public static Class<?> getProxyClass(ClassLoader loader, Class<?>... interfaces) throws IllegalArgumentException {code} > Replace `Proxy.getProxyClass()` with `Proxy.newProxyInstance().getClass` > > > Key: SPARK-45467 > URL: https://issues.apache.org/jira/browse/SPARK-45467 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > > {code:java} > * @deprecated Proxy classes generated in a named module are encapsulated > * and not accessible to code outside its module. > * {@link Constructor#newInstance(Object...) Constructor.newInstance} > * will throw {@code IllegalAccessException} when it is called on > * an inaccessible proxy class. > * Use {@link #newProxyInstance(ClassLoader, Class[], InvocationHandler)} > * to create a proxy instance instead. > * > * @see Package and Module Membership of Proxy Class > * @revised 9 > */ > @Deprecated > @CallerSensitive > public static Class<?> getProxyClass(ClassLoader loader, > Class<?>... interfaces) > throws IllegalArgumentException {code}
[jira] [Created] (SPARK-45467) Replace `Proxy.getProxyClass()` with `Proxy.newProxyInstance().getClass`
Yang Jie created SPARK-45467: Summary: Replace `Proxy.getProxyClass()` with `Proxy.newProxyInstance().getClass` Key: SPARK-45467 URL: https://issues.apache.org/jira/browse/SPARK-45467 Project: Spark Issue Type: Sub-task Components: Spark Core Affects Versions: 4.0.0 Reporter: Yang Jie
[jira] [Updated] (SPARK-45466) VectorAssembler should validate the vector values
[ https://issues.apache.org/jira/browse/SPARK-45466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45466: --- Labels: pull-request-available (was: ) > VectorAssembler should validate the vector values > - > > Key: SPARK-45466 > URL: https://issues.apache.org/jira/browse/SPARK-45466 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major > Labels: pull-request-available >
[jira] [Created] (SPARK-45466) VectorAssembler should validate the vector values
Ruifeng Zheng created SPARK-45466: - Summary: VectorAssembler should validate the vector values Key: SPARK-45466 URL: https://issues.apache.org/jira/browse/SPARK-45466 Project: Spark Issue Type: Improvement Components: ML Affects Versions: 4.0.0 Reporter: Ruifeng Zheng
[jira] [Resolved] (SPARK-45465) Upgrade kubernetes-client to 6.9.0 for K8s 1.28
[ https://issues.apache.org/jira/browse/SPARK-45465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45465. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43284 [https://github.com/apache/spark/pull/43284] > Upgrade kubernetes-client to 6.9.0 for K8s 1.28 > --- > > Key: SPARK-45465 > URL: https://issues.apache.org/jira/browse/SPARK-45465 > Project: Spark > Issue Type: Sub-task > Components: Build, Kubernetes >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Commented] (SPARK-45428) Add Matomo analytics to all released docs pages
[ https://issues.apache.org/jira/browse/SPARK-45428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17773090#comment-17773090 ] BingKun Pan commented on SPARK-45428: - Okay, let me investigate it. > Add Matomo analytics to all released docs pages > --- > > Key: SPARK-45428 > URL: https://issues.apache.org/jira/browse/SPARK-45428 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 3.5.0, 4.0.0 >Reporter: Allison Wang >Priority: Major > > Matomo analytics has been added to some pages of the Spark website. Here is > Sean's initial PR: > [https://github.com/apache/spark-website/pull/479] > You can find analytics for the Spark website here: https://analytics.apache.org > We need to add this to all API pages. This is very important for us to > prioritize documentation improvements and search engine optimization.
[jira] [Commented] (SPARK-44729) Add canonical links to the PySpark docs page
[ https://issues.apache.org/jira/browse/SPARK-44729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17773088#comment-17773088 ] BingKun Pan commented on SPARK-44729: - The historical versions include: 3.1.1 3.1.2 3.1.3 3.2.0 3.2.1 3.2.2 3.2.3 3.2.4 3.3.0 3.3.1 3.3.2 3.3.3 3.4.0 3.4.1 > Add canonical links to the PySpark docs page > > > Key: SPARK-44729 > URL: https://issues.apache.org/jira/browse/SPARK-44729 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: BingKun Pan >Priority: Major > Labels: pull-request-available > Fix For: 3.5.0, 4.0.0 > > > We should add the canonical link to the PySpark docs page > [https://spark.apache.org/docs/latest/api/python/index.html] so that the > search engine can return the latest PySpark docs. > Then, we need to update all released documentation pages to add the canonical > link pointing to the latest Spark documentation of the API (such as group > by). Currently, if you Google pyspark groupby, Google will return the docs > page from 3.1.1, which is not ideal.
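For reference, a canonical link is a single tag in each versioned page's head pointing at the latest docs. A minimal sketch (the target URL here is illustrative of the pattern, not the exact tag added by the PR):

```html
<!-- In a versioned page such as /docs/3.1.1/api/python/index.html -->
<link rel="canonical" href="https://spark.apache.org/docs/latest/api/python/index.html"/>
```

Search engines then consolidate ranking signals onto the `latest` URL instead of the old versioned pages.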
[jira] [Commented] (SPARK-45428) Add Matomo analytics to all released docs pages
[ https://issues.apache.org/jira/browse/SPARK-45428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17773078#comment-17773078 ] Ruifeng Zheng commented on SPARK-45428: --- [~panbingkun] would you mind taking a look? thanks! > Add Matomo analytics to all released docs pages > --- > > Key: SPARK-45428 > URL: https://issues.apache.org/jira/browse/SPARK-45428 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 3.5.0, 4.0.0 >Reporter: Allison Wang >Priority: Major > > Matomo analytics has been added to some pages of the Spark website. Here is > Sean's initial PR: > [https://github.com/apache/spark-website/pull/479] > You can find analytics for the Spark website here: https://analytics.apache.org > We need to add this to all API pages. This is very important for us to > prioritize documentation improvements and search engine optimization.
[jira] [Commented] (SPARK-45443) Revisit TableCacheQueryStage to avoid replicated InMemoryRelation materialization
[ https://issues.apache.org/jira/browse/SPARK-45443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17773074#comment-17773074 ] XiDuo You commented on SPARK-45443: --- > Can this increase probability of concurrent IMR materialization for same IMR > instance? I think they are the same. TableCacheQueryStage is more like a barrier that reports some metrics to the AQE framework. The gap introduced by eager materialization is very small. > For queries using AQE, can introducing TableCacheQueryStage into physical > plan once per unique IMR instance help I did not see the difference. One idea is to introduce something like `ReusedTableCacheQueryStage`. The `ReusedTableCacheQueryStage` only holds an empty future which waits for the first TableCacheQueryStage materialization, so that we can make sure the cached RDD is only executed once. But this idea only works within one query; if there are multiple queries which reference the same cached RDD (e.g., in the thriftserver), the issue still exists. > Revisit TableCacheQueryStage to avoid replicated InMemoryRelation > materialization > - > > Key: SPARK-45443 > URL: https://issues.apache.org/jira/browse/SPARK-45443 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: Eren Avsarogullari >Priority: Major > Attachments: IMR Materialization - Stage 2.png, IMR Materialization - > Stage 3.png > > > TableCacheQueryStage is created per InMemoryTableScanExec by > AdaptiveSparkPlanExec and it materializes InMemoryTableScanExec output > (cached RDD) to provide runtime stats in order to apply AQE optimizations > into remaining physical plan stages. TableCacheQueryStage materializes > InMemoryTableScanExec eagerly by submitting a job per TableCacheQueryStage > instance.
For example, if there are 2 TableCacheQueryStage instances > referencing the same IMR instance (cached RDD) and the first InMemoryTableScanExec's > materialization takes longer, the following logic will return false > (inMemoryTableScan.isMaterialized => false) and this may cause replicated IMR > materialization. This behavior can be more visible when cached RDD size is > high. > [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala#L281] > Would like to get community feedback. Thanks in advance. > cc [~ulysses] [~cloud_fan] > *Sample Query to simulate the problem:* > // Both join legs use the same IMR instance > {code:java} > import spark.implicits._ > val arr = (1 to 12).map { i => { > val index = i % 5 > (index, s"Employee_$index", s"Department_$index") > } > } > val df = arr.toDF("id", "name", "department") > .filter('id >= 0) > .sort("id") > .groupBy('id, 'name, 'department) > .count().as("count") > df.persist() > val df2 = df.sort("count").filter('count <= 2) > val df3 = df.sort("count").filter('count >= 3) > val df4 = df2.join(df3, Seq("id", "name", "department"), "fullouter") > df4.show() {code} > *Physical Plan:* > {code:java} > == Physical Plan == > AdaptiveSparkPlan (31) > +- == Final Plan == > CollectLimit (21) > +- * Project (20) > +- * SortMergeJoin FullOuter (19) > :- * Sort (10) > : +- * Filter (9) > : +- TableCacheQueryStage (8), Statistics(sizeInBytes=210.0 B, > rowCount=5) > : +- InMemoryTableScan (1) > : +- InMemoryRelation (2) > : +- AdaptiveSparkPlan (7) > : +- HashAggregate (6) > : +- Exchange (5) > : +- HashAggregate (4) > : +- LocalTableScan (3) > +- * Sort (18) > +- * Filter (17) > +- TableCacheQueryStage (16), Statistics(sizeInBytes=210.0 B, > rowCount=5) > +- InMemoryTableScan (11) > +- InMemoryRelation (12) > +- AdaptiveSparkPlan (15) > +- HashAggregate (14) > +- Exchange (13) > +- HashAggregate (4) > +- LocalTableScan (3) {code} > *Stages DAGs materializing the same IMR instance:* > !IMR Materialization - Stage 2.png|width=303,height=134! > !IMR Materialization - Stage 3.png|width=303,height=134!
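The `ReusedTableCacheQueryStage` idea from the comment above (a stage that holds no work of its own, only the first stage's future) can be sketched with plain futures. This is a hypothetical illustration of the sharing mechanism, not Spark code; all names are invented for the example:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

public class ReusedStageSketch {
    public static void main(String[] args) {
        // Counts how many times the expensive materialization actually runs.
        AtomicInteger materializations = new AtomicInteger();

        // First stage: kicks off materialization of the cached data once.
        CompletableFuture<String> firstStage = CompletableFuture.supplyAsync(() -> {
            materializations.incrementAndGet();
            return "materialized-cache";
        });

        // "Reused" stage: performs no work of its own, it only holds the
        // first stage's future, so joining it never re-runs the work.
        CompletableFuture<String> reusedStage = firstStage;

        System.out.println(firstStage.join());
        System.out.println(reusedStage.join());
        System.out.println(materializations.get());
    }
}
```

Both joins observe the same result while the supplier body executes exactly once, which is the property the proposed stage would need; as the comment notes, this sharing would still be scoped to a single query.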
[jira] [Updated] (SPARK-44729) Add canonical links to the PySpark docs page
[ https://issues.apache.org/jira/browse/SPARK-44729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-44729: --- Labels: pull-request-available (was: ) > Add canonical links to the PySpark docs page > > > Key: SPARK-44729 > URL: https://issues.apache.org/jira/browse/SPARK-44729 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: BingKun Pan >Priority: Major > Labels: pull-request-available > Fix For: 3.5.0, 4.0.0 > > > We should add the canonical link to the PySpark docs page > [https://spark.apache.org/docs/latest/api/python/index.html] so that the > search engine can return the latest PySpark docs. > Then, we need to update all released documentation pages to add the canonical > link pointing to the latest spark documentation of the API (such as group > by). Currently, if you Google pyspark groupby, Google will return the docs > page from 3.1.1, which is not ideal.
[jira] [Updated] (SPARK-42716) DataSourceV2 cannot report KeyGroupedPartitioning with multiple keys per partition
[ https://issues.apache.org/jira/browse/SPARK-42716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-42716: --- Labels: pull-request-available (was: ) > DataSourceV2 cannot report KeyGroupedPartitioning with multiple keys per > partition > -- > > Key: SPARK-42716 > URL: https://issues.apache.org/jira/browse/SPARK-42716 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0, 3.3.1, 3.3.2, 3.4.0, 3.4.1 >Reporter: Enrico Minack >Priority: Major > Labels: pull-request-available > > From Spark 3.0.0 until 3.2.3, a DataSourceV2 could report its partitioning as > {{KeyGroupedPartitioning}} via {{SupportsReportPartitioning}}, even if > multiple keys belong to a partition. > With SPARK-37377, only if all partitions implement {{HasPartitionKey}}, the > partition information reported through {{SupportsReportPartitioning}} is > considered by catalyst. But this limits the number of keys per partition to 1. > Spark should continue to support the more general situation of > {{KeyGroupedPartitioning}} with multiple keys per partition, like > {{HashPartitioning}}.
[jira] [Resolved] (SPARK-44527) Simplify BinaryComparison if its children contain ScalarSubquery with empty output
[ https://issues.apache.org/jira/browse/SPARK-44527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-44527. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42129 [https://github.com/apache/spark/pull/42129] > Simplify BinaryComparison if its children contain ScalarSubquery with empty > output > -- > > Key: SPARK-44527 > URL: https://issues.apache.org/jira/browse/SPARK-44527 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Updated] (SPARK-44527) Simplify BinaryComparison if its children contain ScalarSubquery with empty output
[ https://issues.apache.org/jira/browse/SPARK-44527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-44527: --- Labels: pull-request-available (was: ) > Simplify BinaryComparison if its children contain ScalarSubquery with empty > output > -- > > Key: SPARK-44527 > URL: https://issues.apache.org/jira/browse/SPARK-44527 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Labels: pull-request-available >
[jira] [Updated] (SPARK-44527) Replace ScalarSubquery with null if its maxRows is 0
[ https://issues.apache.org/jira/browse/SPARK-44527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-44527: -- Summary: Replace ScalarSubquery with null if its maxRows is 0 (was: Simplify BinaryComparison if its children contain ScalarSubquery with empty output) > Replace ScalarSubquery with null if its maxRows is 0 > > > Key: SPARK-44527 > URL: https://issues.apache.org/jira/browse/SPARK-44527 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Assigned] (SPARK-44527) Simplify BinaryComparison if its children contain ScalarSubquery with empty output
[ https://issues.apache.org/jira/browse/SPARK-44527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-44527: - Assignee: Yuming Wang > Simplify BinaryComparison if its children contain ScalarSubquery with empty > output > -- > > Key: SPARK-44527 > URL: https://issues.apache.org/jira/browse/SPARK-44527 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major >
[jira] [Assigned] (SPARK-45465) Upgrade kubernetes-client to 6.9.0 for K8s 1.28
[ https://issues.apache.org/jira/browse/SPARK-45465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45465: - Assignee: Dongjoon Hyun > Upgrade kubernetes-client to 6.9.0 for K8s 1.28 > --- > > Key: SPARK-45465 > URL: https://issues.apache.org/jira/browse/SPARK-45465 > Project: Spark > Issue Type: Sub-task > Components: Build, Kubernetes >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available >
[jira] [Updated] (SPARK-45465) Upgrade kubernetes-client to 6.9.0 for K8s 1.28
[ https://issues.apache.org/jira/browse/SPARK-45465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45465: --- Labels: pull-request-available (was: ) > Upgrade kubernetes-client to 6.9.0 for K8s 1.28 > --- > > Key: SPARK-45465 > URL: https://issues.apache.org/jira/browse/SPARK-45465 > Project: Spark > Issue Type: Improvement > Components: Build, Kubernetes >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available >
[jira] [Updated] (SPARK-45465) Upgrade kubernetes-client to 6.9.0 for K8s 1.28
[ https://issues.apache.org/jira/browse/SPARK-45465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-45465: -- Parent: SPARK-44111 Issue Type: Sub-task (was: Improvement) > Upgrade kubernetes-client to 6.9.0 for K8s 1.28 > --- > > Key: SPARK-45465 > URL: https://issues.apache.org/jira/browse/SPARK-45465 > Project: Spark > Issue Type: Sub-task > Components: Build, Kubernetes >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available >
[jira] [Created] (SPARK-45465) Upgrade kubernetes-client to 6.9.0 for K8s 1.28
Dongjoon Hyun created SPARK-45465: - Summary: Upgrade kubernetes-client to 6.9.0 for K8s 1.28 Key: SPARK-45465 URL: https://issues.apache.org/jira/browse/SPARK-45465 Project: Spark Issue Type: Improvement Components: Build, Kubernetes Affects Versions: 4.0.0 Reporter: Dongjoon Hyun
[jira] [Resolved] (SPARK-45457) Surface sc.setLocalProperty() value=NULL param meaning
[ https://issues.apache.org/jira/browse/SPARK-45457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45457. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43269 [https://github.com/apache/spark/pull/43269] > Surface sc.setLocalProperty() value=NULL param meaning > -- > > Key: SPARK-45457 > URL: https://issues.apache.org/jira/browse/SPARK-45457 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 3.5.0 >Reporter: Khalid Mammadov >Assignee: Khalid Mammadov >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > sc.setLocalProperty() has a special behavior: when the value is null, it > removes the property associated with the key parameter. This is only mentioned in the > Fair Scheduler section of the documentation. > It would be nice to document this on the APIs as well for users.
[jira] [Resolved] (SPARK-45460) Replace `scala.collection.convert.ImplicitConversions` to `scala.jdk.CollectionConverters`
[ https://issues.apache.org/jira/browse/SPARK-45460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45460. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43275 [https://github.com/apache/spark/pull/43275] > Replace `scala.collection.convert.ImplicitConversions` to > `scala.jdk.CollectionConverters` > -- > > Key: SPARK-45460 > URL: https://issues.apache.org/jira/browse/SPARK-45460 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Comment Edited] (SPARK-45443) Revisit TableCacheQueryStage to avoid replicated InMemoryRelation materialization
[ https://issues.apache.org/jira/browse/SPARK-45443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17773050#comment-17773050 ] Eren Avsarogullari edited comment on SPARK-45443 at 10/8/23 8:42 PM: - Hi [~ulysses], Firstly, thanks for the reply. For the above sample query, if the TableCacheQueryStage flow is disabled, IMR materialization will be triggered by ShuffleQueryStage (introduced by Sort's Exchange node). Both ShuffleQueryStage nodes will also need to materialize the same IMR instance in this case, so I believe the same issue may also occur in the previous flow. TableCacheQueryStage materializes IMR eagerly, as different from the previous flow. Can this increase the probability of concurrent IMR materialization for the same IMR instance? I think this behavior is not visible when IMR cached data size is low. However, replicated IMR materialization can be expensive and can introduce potential regression when IMR cached data size is high (e.g., observing this behavior when IMR needs to read high shuffle data size). Also, the queries can have multiple IMR instances by referencing multiple replicated IMR instances; this can also increase the probability of concurrent IMR materialization for the same IMR instance. Thinking about potential solution options (if this makes sense): For queries using AQE, can introducing TableCacheQueryStage into the physical plan once per unique IMR instance help? IMR instances can be compared for equivalence before their TableCacheQueryStage instances are created by AdaptiveSparkPlanExec, and TableCacheQueryStage can materialize each unique IMR instance once. was (Author: erenavsarogullari): Hi [~ulysses], Firstly, thanks for reply. For above sample query, if TableCacheQueryStage flow is disabled, IMR materialization will be triggered by ShuffleQueryStage (introduced by Sort' s Exchange node). Both ShuffleQueryStage nodes will also need to materialize same IMR instance in this case so i believe same issue may also occur in previous flow.
TableCacheQueryStage materializes IMR eagerly as different from previous flow. Can this increase probability of concurrent IMR materialization for same IMR instance? I think this behavior is not visible when IMR cached data size is low. However, replicated IMR materialization can be expensive and can introduce potential regression when IMR cached data size is high (e.g: observing this behavior when IMR needs to read high shuffle data size). Also, the queries can have multiple IMR instances by referencing multiple replicated IMR instances, this can also increase probability of concurrent IMR materialization for same IMR instance. Thinking on potential solutions options (if makes sense): For queries using AQE, can introducing TableCacheQueryStage into PhysicalPlan once per unique IMR instance help? IMR instances can be compared if they are equivalent before TableCacheQueryStage is created by AdaptiveSparkPlanExec and TableCacheQueryStage can materialize unique IMR instance once. > Revisit TableCacheQueryStage to avoid replicated InMemoryRelation > materialization > - > > Key: SPARK-45443 > URL: https://issues.apache.org/jira/browse/SPARK-45443 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: Eren Avsarogullari >Priority: Major > Attachments: IMR Materialization - Stage 2.png, IMR Materialization - > Stage 3.png > > > TableCacheQueryStage is created per InMemoryTableScanExec by > AdaptiveSparkPlanExec and it materializes InMemoryTableScanExec output > (cached RDD) to provide runtime stats in order to apply AQE optimizations > into remaining physical plan stages. TableCacheQueryStage materializes > InMemoryTableScanExec eagerly by submitting job per TableCacheQueryStage > instance. 
For example, if there are 2 TableCacheQueryStage instances > referencing same IMR instance (cached RDD) and first InMemoryTableScanExec' s > materialization takes longer, following logic will return false > (inMemoryTableScan.isMaterialized => false) and this may cause replicated IMR > materialization. This behavior can be more visible when cached RDD size is > high. > [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala#L281] > Would like to get community feedback. Thanks in advance. > cc [~ulysses] [~cloud_fan] > *Sample Query to simulate the problem:* > // Both join legs uses same IMR instance > {code:java} > import spark.implicits._ > val arr = (1 to 12).map { i => { > val index = i % 5 > (index, s"Employee_$index", s"Department_$index") > } > } > val df = arr.toDF("id", "name", "department") > .filter('id >= 0) > .sort("id") > .groupBy('id, 'name, 'department) >
[jira] [Comment Edited] (SPARK-45443) Revisit TableCacheQueryStage to avoid replicated InMemoryRelation materialization
[ https://issues.apache.org/jira/browse/SPARK-45443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17773050#comment-17773050 ] Eren Avsarogullari edited comment on SPARK-45443 at 10/8/23 8:40 PM: - Hi [~ulysses], Firstly, thanks for reply. For above sample query, if TableCacheQueryStage flow is disabled, IMR materialization will be triggered by ShuffleQueryStage (introduced by Sort' s Exchange node). Both ShuffleQueryStage nodes will also need to materialize same IMR instance in this case so i believe same issue may also occur in previous flow. TableCacheQueryStage materializes IMR eagerly as different from previous flow. Can this increase probability of concurrent IMR materialization for same IMR instance? I think this behavior is not visible when IMR cached data size is low. However, replicated IMR materialization can be expensive and can introduce potential regression when IMR cached data size is high (e.g: observing this behavior when IMR needs to read high shuffle data size). Also, the queries can have multiple IMR instances by referencing multiple replicated IMR instances, this can also increase probability of concurrent IMR materialization for same IMR instance. Thinking on potential solutions options (if makes sense): For queries using AQE, can introducing TableCacheQueryStage into PhysicalPlan once per unique IMR instance help? IMR instances can be compared if they are equivalent before TableCacheQueryStage is created by AdaptiveSparkPlanExec and TableCacheQueryStage can materialize unique IMR instance once. was (Author: erenavsarogullari): Hi [~ulysses], Firstly, thanks for reply. For above sample query, if TableCacheQueryStage flow is disabled, IMR materialization will be triggered by ShuffleQueryStage (introduced by Sort' s Exchange node). Both ShuffleQueryStage nodes will also need to materialize same IMR instance in this case so i believe same issue may also occur in previous flow. 
TableCacheQueryStage materializes IMR eagerly as different from previous flow. Can this increase probability of concurrent IMR materialization for same IMR instance? I think this behavior is not visible when IMR cached data size is low. However, replicated IMR materialization can be expensive and can introduce potential regression when IMR cached data size is high (e.g: observing this behavior when IMR needs to read high shuffle data size). Also, the queries can have multiple IMR instances by referencing multiple replicated IMR instances, this can also increase probability of concurrent IMR materialization for same IMR instance. Thinking on potential solutions options if makes sense: For queries using AQE, can introducing TableCacheQueryStage into PhysicalPlan once per unique IMR instance help? IMR instances can be compared if they are equivalent before TableCacheQueryStage is created by AdaptiveSparkPlanExec and TableCacheQueryStage can materialize unique IMR instance once. > Revisit TableCacheQueryStage to avoid replicated InMemoryRelation > materialization > - > > Key: SPARK-45443 > URL: https://issues.apache.org/jira/browse/SPARK-45443 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: Eren Avsarogullari >Priority: Major > Attachments: IMR Materialization - Stage 2.png, IMR Materialization - > Stage 3.png > > > TableCacheQueryStage is created per InMemoryTableScanExec by > AdaptiveSparkPlanExec and it materializes InMemoryTableScanExec output > (cached RDD) to provide runtime stats in order to apply AQE optimizations > into remaining physical plan stages. TableCacheQueryStage materializes > InMemoryTableScanExec eagerly by submitting job per TableCacheQueryStage > instance. 
For example, if there are 2 TableCacheQueryStage instances > referencing same IMR instance (cached RDD) and first InMemoryTableScanExec' s > materialization takes longer, following logic will return false > (inMemoryTableScan.isMaterialized => false) and this may cause replicated IMR > materialization. This behavior can be more visible when cached RDD size is > high. > [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala#L281] > Would like to get community feedback. Thanks in advance. > cc [~ulysses] [~cloud_fan] > *Sample Query to simulate the problem:* > // Both join legs uses same IMR instance > {code:java} > import spark.implicits._ > val arr = (1 to 12).map { i => { > val index = i % 5 > (index, s"Employee_$index", s"Department_$index") > } > } > val df = arr.toDF("id", "name", "department") > .filter('id >= 0) > .sort("id") > .groupBy('id, 'name, 'department) >
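The race the thread describes can be sketched without Spark at all. Below is a minimal, self-contained Java analogy (all names are illustrative, not Spark internals): two threads race to check-then-build the same cached relation, and `ConcurrentHashMap.computeIfAbsent` serves as the "materialize once per instance" guard that a bare `isMaterialized` check alone does not provide.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Standalone analogy (not Spark code): `builds` counts how many times the
// expensive "cache build" actually runs for the same relation key.
public class MaterializeOnce {
    static final AtomicInteger builds = new AtomicInteger(0);
    static final ConcurrentHashMap<String, List<Integer>> cache = new ConcurrentHashMap<>();

    static List<Integer> materialize(String relationKey) {
        // computeIfAbsent runs the mapping function at most once per key,
        // even when two callers arrive concurrently.
        return cache.computeIfAbsent(relationKey, k -> {
            builds.incrementAndGet();          // count expensive builds
            return List.of(1, 2, 3, 4, 5);     // stand-in for cached partitions
        });
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch start = new CountDownLatch(1);
        Runnable task = () -> {
            try { start.await(); } catch (InterruptedException e) { return; }
            materialize("imr-1");
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        start.countDown();                     // release both threads together
        t1.join(); t2.join();
        System.out.println("builds = " + builds.get());  // prints "builds = 1"
    }
}
```

Running `main` always prints `builds = 1`: `computeIfAbsent` is atomic per key, so a second concurrent caller blocks and reuses the first build instead of launching a replicated one.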
[jira] [Created] (SPARK-45464) [CORE] Fix yarn distribution build
Hasnain Lakhani created SPARK-45464: --- Summary: [CORE] Fix yarn distribution build Key: SPARK-45464 URL: https://issues.apache.org/jira/browse/SPARK-45464 Project: Spark Issue Type: Bug Components: Spark Core, YARN Affects Versions: 4.0.0 Reporter: Hasnain Lakhani [https://github.com/apache/spark/pull/43164] introduced a regression in: ``` ./dev/make-distribution.sh --tgz -Phive -Phive-thriftserver -Pyarn ``` this needs to be fixed -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45443) Revisit TableCacheQueryStage to avoid replicated InMemoryRelation materialization
[ https://issues.apache.org/jira/browse/SPARK-45443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17773050#comment-17773050 ] Eren Avsarogullari commented on SPARK-45443: Hi [~ulysses], Firstly, thanks for the reply. For queries using AQE, if the TableCacheQueryStage flow is disabled, IMR materialization will be triggered by ShuffleQueryStage (introduced by Sort's Exchange node). Both ShuffleQueryStage nodes will still need to materialize the same IMR instance in that case, so I believe the same issue may also occur in the previous flow. Unlike the previous flow, TableCacheQueryStage materializes the IMR eagerly. Can this increase the probability of concurrent materialization of the same IMR instance? This behavior is not visible when the IMR cached data size is low. However, replicated IMR materialization can be expensive and can introduce a regression when the IMR cached data size is high (e.g., when the IMR needs to read a large amount of shuffle data). Also, a query can reference multiple replicated IMR instances, which further increases the probability of concurrent materialization of the same IMR instance. A potential solution option, if it makes sense: for queries using AQE, could introducing TableCacheQueryStage into the physical plan once per unique IMR instance help? IMR instances could be compared for equivalence before TableCacheQueryStage is created by AdaptiveSparkPlanExec. 
> Revisit TableCacheQueryStage to avoid replicated InMemoryRelation > materialization > - > > Key: SPARK-45443 > URL: https://issues.apache.org/jira/browse/SPARK-45443 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: Eren Avsarogullari >Priority: Major > Attachments: IMR Materialization - Stage 2.png, IMR Materialization - > Stage 3.png > > > TableCacheQueryStage is created per InMemoryTableScanExec by > AdaptiveSparkPlanExec and it materializes InMemoryTableScanExec output > (cached RDD) to provide runtime stats in order to apply AQE optimizations > into remaining physical plan stages. TableCacheQueryStage materializes > InMemoryTableScanExec eagerly by submitting job per TableCacheQueryStage > instance. For example, if there are 2 TableCacheQueryStage instances > referencing same IMR instance (cached RDD) and first InMemoryTableScanExec' s > materialization takes longer, following logic will return false > (inMemoryTableScan.isMaterialized => false) and this may cause replicated IMR > materialization. This behavior can be more visible when cached RDD size is > high. > [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala#L281] > Would like to get community feedback. Thanks in advance. 
> cc [~ulysses] [~cloud_fan] > *Sample Query to simulate the problem:* > // Both join legs uses same IMR instance > {code:java} > import spark.implicits._ > val arr = (1 to 12).map { i => { > val index = i % 5 > (index, s"Employee_$index", s"Department_$index") > } > } > val df = arr.toDF("id", "name", "department") > .filter('id >= 0) > .sort("id") > .groupBy('id, 'name, 'department) > .count().as("count") > df.persist() > val df2 = df.sort("count").filter('count <= 2) > val df3 = df.sort("count").filter('count >= 3) > val df4 = df2.join(df3, Seq("id", "name", "department"), "fullouter") > df4.show() {code} > *Physical Plan:* > {code:java} > == Physical Plan == > AdaptiveSparkPlan (31) > +- == Final Plan == > CollectLimit (21) > +- * Project (20) > +- * SortMergeJoin FullOuter (19) > :- * Sort (10) > : +- * Filter (9) > : +- TableCacheQueryStage (8), Statistics(sizeInBytes=210.0 B, > rowCount=5) > : +- InMemoryTableScan (1) > : +- InMemoryRelation (2) > : +- AdaptiveSparkPlan (7) > : +- HashAggregate (6) > : +- Exchange (5) > : +- HashAggregate (4) > : +- LocalTableScan (3) > +- * Sort (18) > +- * Filter (17) > +- TableCacheQueryStage (16), Statistics(sizeInBytes=210.0 B, > rowCount=5) > +- InMemoryTableScan (11) > +- InMemoryRelation (12) > +- AdaptiveSparkPlan (15) > +- HashAggregate (14) > +- Exchange (13) > +- HashAggregate (4) > +- LocalTableScan (3) {code} > *Stages DAGs
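The proposed direction above (creating TableCacheQueryStage once per unique IMR instance) amounts to deduplicating stages by a canonical key at planning time. A minimal Java sketch of that idea follows; every name here is hypothetical, not Spark's API — real plan canonicalization is far more involved.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: plan one "cache stage" per unique cached relation,
// keyed by a canonical form, so two scans of the same relation share a
// single materialization stage instead of each getting their own.
public class DedupeCacheStages {
    record CachedRelation(int id, List<String> schema) {
        // Stand-in for plan canonicalization: same id + schema => same cache.
        String canonicalKey() { return id + ":" + String.join(",", schema); }
    }

    record CacheStage(CachedRelation relation) {}

    static Map<String, CacheStage> plan(List<CachedRelation> scans) {
        Map<String, CacheStage> stages = new LinkedHashMap<>();
        for (CachedRelation r : scans) {
            // First scan of a relation creates the stage; later scans reuse it.
            stages.computeIfAbsent(r.canonicalKey(), k -> new CacheStage(r));
        }
        return stages;
    }
}
```

With the sample query above, both join legs would map to the same canonical key, so `plan` would emit one stage for the two `InMemoryTableScan` nodes rather than two racing ones.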
[jira] [Resolved] (SPARK-45461) Introduce a mapper for StorageLevel
[ https://issues.apache.org/jira/browse/SPARK-45461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45461. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43278 [https://github.com/apache/spark/pull/43278] > Introduce a mapper for StorageLevel > --- > > Key: SPARK-45461 > URL: https://issues.apache.org/jira/browse/SPARK-45461 > Project: Spark > Issue Type: Improvement > Components: MLlib, Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Currently, StorageLevel provides fromString to obtain a StorageLevel instance from the StorageLevel's name, so developers and users have to copy the string literal of the StorageLevel's name to get or set an instance. This forces developers to keep those literals consistent by hand; it is error-prone and reduces development efficiency. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
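The "mapper" being introduced can be pictured as a single, typed lookup point for level names. Below is a hedged Java sketch of that idea only; it is illustrative and deliberately not Spark's actual `StorageLevel` API or its set of levels.

```java
// Illustrative sketch (not Spark's API): an enum centralizes the level
// names so callers never hand-copy string literals, and one lookup point
// fails fast on a typo instead of silently diverging.
public class StorageLevels {
    enum Level {
        MEMORY_ONLY, MEMORY_AND_DISK, DISK_ONLY;

        // Mirrors the fromString idea with a clearer error message.
        static Level fromString(String name) {
            try {
                return Level.valueOf(name);
            } catch (IllegalArgumentException e) {
                throw new IllegalArgumentException("Unknown storage level: " + name, e);
            }
        }
    }
}
```

Callers then reference `Level.DISK_ONLY` directly (or round-trip via `fromString(level.name())`), so the name exists in exactly one place.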
[jira] [Assigned] (SPARK-43704) Enable IndexesParityTests.test_to_series
[ https://issues.apache.org/jira/browse/SPARK-43704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-43704: - Assignee: Haejoon Lee > Enable IndexesParityTests.test_to_series > > > Key: SPARK-43704 > URL: https://issues.apache.org/jira/browse/SPARK-43704 > Project: Spark > Issue Type: Sub-task > Components: Connect, Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43704) Enable IndexesParityTests.test_to_series
[ https://issues.apache.org/jira/browse/SPARK-43704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-43704. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43228 [https://github.com/apache/spark/pull/43228] > Enable IndexesParityTests.test_to_series > > > Key: SPARK-43704 > URL: https://issues.apache.org/jira/browse/SPARK-43704 > Project: Spark > Issue Type: Sub-task > Components: Connect, Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45462) Show `Duration` in `ApplicationPage`
[ https://issues.apache.org/jira/browse/SPARK-45462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45462. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43279 [https://github.com/apache/spark/pull/43279] > Show `Duration` in `ApplicationPage` > > > Key: SPARK-45462 > URL: https://issues.apache.org/jira/browse/SPARK-45462 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45462) Show `Duration` in `ApplicationPage`
[ https://issues.apache.org/jira/browse/SPARK-45462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45462: - Assignee: Dongjoon Hyun > Show `Duration` in `ApplicationPage` > > > Key: SPARK-45462 > URL: https://issues.apache.org/jira/browse/SPARK-45462 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45413) Add warning for prepare drop LevelDB support
[ https://issues.apache.org/jira/browse/SPARK-45413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45413. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43217 [https://github.com/apache/spark/pull/43217] > Add warning for prepare drop LevelDB support > > > Key: SPARK-45413 > URL: https://issues.apache.org/jira/browse/SPARK-45413 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Jia Fan >Assignee: Jia Fan >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Add a warning in preparation for dropping LevelDB support -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45413) Add warning for prepare drop LevelDB support
[ https://issues.apache.org/jira/browse/SPARK-45413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45413: - Assignee: Jia Fan > Add warning for prepare drop LevelDB support > > > Key: SPARK-45413 > URL: https://issues.apache.org/jira/browse/SPARK-45413 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Jia Fan >Assignee: Jia Fan >Priority: Major > Labels: pull-request-available > > Add a warning in preparation for dropping LevelDB support -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45454) Set the table's default owner to current_user
[ https://issues.apache.org/jira/browse/SPARK-45454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45454. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43264 [https://github.com/apache/spark/pull/43264] > Set the table's default owner to current_user > - > > Key: SPARK-45454 > URL: https://issues.apache.org/jira/browse/SPARK-45454 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45463) Allow ShuffleDriverComponent to support reliable store with specified executorId
[ https://issues.apache.org/jira/browse/SPARK-45463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45463: --- Labels: pull-request-available (was: ) > Allow ShuffleDriverComponent to support reliable store with specified > executorId > > > Key: SPARK-45463 > URL: https://issues.apache.org/jira/browse/SPARK-45463 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0, 3.5.1 >Reporter: zhoubin >Priority: Major > Labels: pull-request-available > > After SPARK-42689, ShuffleDriverComponent.supportsReliableStorage is > determined globally. > Downstream projects may have different shuffle policies (caused by cluster > loads or columnar support) for different stages, for example Apache Uniffle > with Gluten, or Apache Celeborn. > In this situation, ShuffleDriverComponent should use the mapTrackerMaster to > decide whether reliable storage is supported for the specified executorId -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
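The per-executor decision the ticket proposes can be sketched in plain Scala (class and method names below are hypothetical, not Spark's actual API): rather than one global `supportsReliableStorage` flag, the driver consults a per-executor table, so stages backed by a remote shuffle service (e.g. Celeborn or Uniffle) and stages using local executor disk can coexist in one application.

```scala
import scala.collection.mutable

// Hypothetical sketch: track shuffle-storage reliability per executor
// instead of one application-wide boolean.
class ReliableStorageTracker(defaultReliable: Boolean) {
  private val byExecutor = mutable.Map.empty[String, Boolean]

  def register(executorId: String, reliable: Boolean): Unit =
    byExecutor(executorId) = reliable

  // Unknown executors fall back to the global default, preserving the
  // existing all-or-nothing behavior.
  def isReliablyStored(executorId: String): Boolean =
    byExecutor.getOrElse(executorId, defaultReliable)
}
```

With this shape, loss of an executor whose shuffle output lives in reliable storage would not need to trigger recomputation, while outputs on lost local-disk executors still would.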
[jira] [Updated] (SPARK-45463) Allow ShuffleDriverComponent to support reliable store with specified executorId
[ https://issues.apache.org/jira/browse/SPARK-45463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhoubin updated SPARK-45463: Summary: Allow ShuffleDriverComponent to support reliable store with specified executorId (was: Allow ShuffleDriverComponent to decide whether shuffle data is reliably stored when different stages have different policies) > Allow ShuffleDriverComponent to support reliable store with specified > executorId > > > Key: SPARK-45463 > URL: https://issues.apache.org/jira/browse/SPARK-45463 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0, 3.5.1 >Reporter: zhoubin >Priority: Major > > After SPARK-42689, ShuffleDriverComponent.supportsReliableStorage is > determined globally. > Downstream projects may have different shuffle policies (caused by cluster > loads or columnar support) for different stages, for example Apache Uniffle > with Gluten, or Apache Celeborn. > In this situation, ShuffleDriverComponent should use the mapTrackerMaster to > decide whether reliable storage is supported for the specified executorId -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45463) Allow ShuffleDriverComponent to decide whether shuffle data is reliably stored when different stages have different policies
[ https://issues.apache.org/jira/browse/SPARK-45463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhoubin updated SPARK-45463: Description: After SPARK-42689, ShuffleDriverComponent.supportsReliableStorage is determined globally. Downstream projects may have different shuffle policies (caused by cluster loads or columnar support) for different stages, for example Apache Uniffle with Gluten, or Apache Celeborn. In this situation, ShuffleDriverComponent should use the mapTrackerMaster to decide whether reliable storage is supported for the specified executorId was: After SPARK-42689, ShuffleDriverComponent.supportsReliableStorage is determined globally. Downstream projects may have different shuffle policies to adapt to cluster loads for different stages, for example Apache Uniffle with Gluten, or Apache Celeborn. In this situation, ShuffleDriverComponent should use the mapTrackerMaster to decide whether reliable storage is supported for the specified executorId > Allow ShuffleDriverComponent to decide whether shuffle data is reliably > stored when different stages have different policies > > > Key: SPARK-45463 > URL: https://issues.apache.org/jira/browse/SPARK-45463 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0, 3.5.1 >Reporter: zhoubin >Priority: Major > > After SPARK-42689, ShuffleDriverComponent.supportsReliableStorage is > determined globally. > Downstream projects may have different shuffle policies (caused by cluster > loads or columnar support) for different stages, for example Apache Uniffle > with Gluten, or Apache Celeborn. > In this situation, ShuffleDriverComponent should use the mapTrackerMaster to > decide whether reliable storage is supported for the specified executorId -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45463) Allow ShuffleDriverComponent to decide whether shuffle data is reliably stored when different stages have different policies
zhoubin created SPARK-45463: --- Summary: Allow ShuffleDriverComponent to decide whether shuffle data is reliably stored when different stages have different policies Key: SPARK-45463 URL: https://issues.apache.org/jira/browse/SPARK-45463 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 4.0.0, 3.5.1 Reporter: zhoubin After SPARK-42689, ShuffleDriverComponent.supportsReliableStorage is determined globally. Downstream projects may have different shuffle policies to adapt to cluster loads for different stages, for example Apache Uniffle with Gluten, or Apache Celeborn. In this situation, ShuffleDriverComponent should use the mapTrackerMaster to decide whether reliable storage is supported for the specified executorId -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45462) Show `Duration` in `ApplicationPage`
[ https://issues.apache.org/jira/browse/SPARK-45462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-45462: -- Summary: Show `Duration` in `ApplicationPage` (was: Show `Duration` in ApplicationPage) > Show `Duration` in `ApplicationPage` > > > Key: SPARK-45462 > URL: https://issues.apache.org/jira/browse/SPARK-45462 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45462) Show `Duration` in `ApplicationPage`
[ https://issues.apache.org/jira/browse/SPARK-45462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45462: --- Labels: pull-request-available (was: ) > Show `Duration` in `ApplicationPage` > > > Key: SPARK-45462 > URL: https://issues.apache.org/jira/browse/SPARK-45462 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45462) Show `Duration` in ApplicationPage
Dongjoon Hyun created SPARK-45462: - Summary: Show `Duration` in ApplicationPage Key: SPARK-45462 URL: https://issues.apache.org/jira/browse/SPARK-45462 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45352) Eliminate foldable window partitions
[ https://issues.apache.org/jira/browse/SPARK-45352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhuml updated SPARK-45352: -- Summary: Eliminate foldable window partitions (was: Remove window partition if partition expression are foldable) > Eliminate foldable window partitions > > > Key: SPARK-45352 > URL: https://issues.apache.org/jira/browse/SPARK-45352 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: zhuml >Priority: Major > Labels: pull-request-available > > A foldable partition expression is redundant; removing it not only simplifies > the plan, but also lets other rules, such as `LimitPushDownThroughWindow`, > take effect when all partition expressions are foldable. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
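The rewrite can be illustrated with a toy expression ADT in plain Scala (these are illustrative types, not Catalyst's classes): a partition expression that folds to a constant puts every row in the same partition, so dropping it from the PARTITION BY list cannot change results.

```scala
// Minimal sketch: drop foldable expressions from a window partition spec.
sealed trait Expr { def foldable: Boolean }
final case class Literal(value: Any) extends Expr { val foldable = true }
final case class Column(name: String) extends Expr { val foldable = false }

// PARTITION BY (1, dept) partitions identically to PARTITION BY (dept),
// so the literal can be eliminated.
def eliminateFoldablePartitions(partitionSpec: Seq[Expr]): Seq[Expr] =
  partitionSpec.filterNot(_.foldable)
```

When the spec becomes empty after elimination, the window is effectively unpartitioned, which is exactly the shape rules like `LimitPushDownThroughWindow` can act on.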
[jira] [Assigned] (SPARK-45461) Introduce a mapper for StorageLevel
[ https://issues.apache.org/jira/browse/SPARK-45461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng reassigned SPARK-45461: -- Assignee: Jiaan Geng > Introduce a mapper for StorageLevel > --- > > Key: SPARK-45461 > URL: https://issues.apache.org/jira/browse/SPARK-45461 > Project: Spark > Issue Type: Improvement > Components: MLlib, Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > Labels: pull-request-available > > Currently, StorageLevel provides fromString to obtain a StorageLevel instance > from the level's name, so developers and users have to copy the string > literal of the StorageLevel's name to set or get an instance. This forces > developers to maintain consistency by hand, which is error-prone and reduces > development efficiency. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45461) Introduce a mapper for StorageLevel
[ https://issues.apache.org/jira/browse/SPARK-45461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45461: --- Labels: pull-request-available (was: ) > Introduce a mapper for StorageLevel > --- > > Key: SPARK-45461 > URL: https://issues.apache.org/jira/browse/SPARK-45461 > Project: Spark > Issue Type: Improvement > Components: MLlib, Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Priority: Major > Labels: pull-request-available > > Currently, StorageLevel provides fromString to obtain a StorageLevel instance > from the level's name, so developers and users have to copy the string > literal of the StorageLevel's name to set or get an instance. This forces > developers to maintain consistency by hand, which is error-prone and reduces > development efficiency. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45461) Introduce a mapper for StorageLevel
Jiaan Geng created SPARK-45461: -- Summary: Introduce a mapper for StorageLevel Key: SPARK-45461 URL: https://issues.apache.org/jira/browse/SPARK-45461 Project: Spark Issue Type: Improvement Components: MLlib, Spark Core, SQL Affects Versions: 4.0.0 Reporter: Jiaan Geng Currently, StorageLevel provides fromString to obtain a StorageLevel instance from the level's name, so developers and users have to copy the string literal of the StorageLevel's name to set or get an instance. This forces developers to maintain consistency by hand, which is error-prone and reduces development efficiency. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45459) Remove the last 2 extra spaces in the automatically generated `sql-error-conditions.md` file
[ https://issues.apache.org/jira/browse/SPARK-45459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45459: -- Assignee: Apache Spark > Remove the last 2 extra spaces in the automatically generated > `sql-error-conditions.md` file > > > Key: SPARK-45459 > URL: https://issues.apache.org/jira/browse/SPARK-45459 > Project: Spark > Issue Type: Improvement > Components: Documentation, SQL, Tests >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: Apache Spark >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45459) Remove the last 2 extra spaces in the automatically generated `sql-error-conditions.md` file
[ https://issues.apache.org/jira/browse/SPARK-45459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45459: -- Assignee: (was: Apache Spark) > Remove the last 2 extra spaces in the automatically generated > `sql-error-conditions.md` file > > > Key: SPARK-45459 > URL: https://issues.apache.org/jira/browse/SPARK-45459 > Project: Spark > Issue Type: Improvement > Components: Documentation, SQL, Tests >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45460) Replace `scala.collection.convert.ImplicitConversions` to `scala.jdk.CollectionConverters`
[ https://issues.apache.org/jira/browse/SPARK-45460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45460: --- Labels: pull-request-available (was: ) > Replace `scala.collection.convert.ImplicitConversions` to > `scala.jdk.CollectionConverters` > -- > > Key: SPARK-45460 > URL: https://issues.apache.org/jira/browse/SPARK-45460 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
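The migration this sub-task describes looks like the following: the explicit `.asScala` / `.asJava` extension methods from `scala.jdk.CollectionConverters` replace the implicit conversions in `scala.collection.convert.ImplicitConversions`, making the Java-to-Scala boundary visible at each call site.

```scala
import java.util.{ArrayList => JArrayList}
// Explicit converters; the standard replacement for the deprecated
// implicit conversions in scala.collection.convert.ImplicitConversions.
import scala.jdk.CollectionConverters._

val javaList = new JArrayList[String]()
javaList.add("a")
javaList.add("b")

// The conversion is now spelled out rather than applied implicitly.
val scalaSeq: Seq[String] = javaList.asScala.toSeq
```

Explicit conversion avoids surprising implicit wrapping (and the subtle bugs it can hide, e.g. repeated wrapping in hot paths), which is why the implicit variant was deprecated.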
[jira] [Created] (SPARK-45460) Replace `scala.collection.convert.ImplicitConversions` to `scala.jdk.CollectionConverters`
BingKun Pan created SPARK-45460: --- Summary: Replace `scala.collection.convert.ImplicitConversions` to `scala.jdk.CollectionConverters` Key: SPARK-45460 URL: https://issues.apache.org/jira/browse/SPARK-45460 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: BingKun Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45459) Remove the last 2 extra spaces in the automatically generated `sql-error-conditions.md` file
[ https://issues.apache.org/jira/browse/SPARK-45459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45459: --- Labels: pull-request-available (was: ) > Remove the last 2 extra spaces in the automatically generated > `sql-error-conditions.md` file > > > Key: SPARK-45459 > URL: https://issues.apache.org/jira/browse/SPARK-45459 > Project: Spark > Issue Type: Improvement > Components: Documentation, SQL, Tests >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39600) Enhance pushdown limit through window
[ https://issues.apache.org/jira/browse/SPARK-39600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-39600: --- Labels: pull-request-available (was: ) > Enhance pushdown limit through window > - > > Key: SPARK-39600 > URL: https://issues.apache.org/jira/browse/SPARK-39600 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yuming Wang >Priority: Major > Labels: pull-request-available > > Improve TPC-DS q67 performance. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45459) Remove the last 2 extra spaces in the automatically generated `sql-error-conditions.md` file
BingKun Pan created SPARK-45459: --- Summary: Remove the last 2 extra spaces in the automatically generated `sql-error-conditions.md` file Key: SPARK-45459 URL: https://issues.apache.org/jira/browse/SPARK-45459 Project: Spark Issue Type: Improvement Components: Documentation, SQL, Tests Affects Versions: 4.0.0 Reporter: BingKun Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45455) [SQL][JDBC] Improve the rename interface of Postgres Dialect
[ https://issues.apache.org/jira/browse/SPARK-45455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] 蔡灿材 updated SPARK-45455: Description: Improve the rename interface of pgdialect (was: Improve the rename interface of pgdialect and mysqldialec [MySQL :: MySQL 8.0 Reference Manual :: 13.1.36 RENAME TABLE Statement|https://dev.mysql.com/doc/refman/8.0/en/rename-table.html]) > [SQL][JDBC] Improve the rename interface of Postgres Dialect > > > Key: SPARK-45455 > URL: https://issues.apache.org/jira/browse/SPARK-45455 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: 蔡灿材 >Priority: Minor > Labels: pull-request-available > Fix For: 3.5.0 > > > Improve the rename interface of pgdialect -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45458) Convert IllegalArgumentException to SparkIllegalArgumentException in bitwiseExpressions and add some UT
[ https://issues.apache.org/jira/browse/SPARK-45458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45458: --- Labels: pull-request-available (was: ) > Convert IllegalArgumentException to SparkIllegalArgumentException in > bitwiseExpressions and add some UT > --- > > Key: SPARK-45458 > URL: https://issues.apache.org/jira/browse/SPARK-45458 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45458) Convert IllegalArgumentException to SparkIllegalArgumentException in bitwiseExpressions and add some UT
BingKun Pan created SPARK-45458: --- Summary: Convert IllegalArgumentException to SparkIllegalArgumentException in bitwiseExpressions and add some UT Key: SPARK-45458 URL: https://issues.apache.org/jira/browse/SPARK-45458 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: BingKun Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org