[jira] [Updated] (SPARK-45621) Add feature to evaluate subquery before push down filter Optimizer rule

2023-10-20 Thread Maytas Monsereenusorn (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maytas Monsereenusorn updated SPARK-45621:
--
Summary: Add feature to evaluate subquery before push down filter Optimizer 
rule  (was: Add feature to evaluate subquery before Optimizer rule to push down 
filter)

> Add feature to evaluate subquery before push down filter Optimizer rule
> ---
>
> Key: SPARK-45621
> URL: https://issues.apache.org/jira/browse/SPARK-45621
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.2
>Reporter: Maytas Monsereenusorn
>Priority: Major
>
> Some queries can benefit from having their scalar subqueries in the filter 
> evaluated during planning so that the scalar result (from the subquery) can 
> be pushed down. 
> This adds a new feature (disabled by default to maintain current behavior) 
> that evaluates scalar subqueries in the Optimizer before the filter pushdown 
> rule. 
> For example, a query like 
> {code:java}
> select * from t2 where b > (select max(a) from t1) {code}
> where t1 is a small table and t2 is a very large table can benefit if we 
> first evaluate the subquery and then push the result down into the pushed 
> filters (instead of keeping the subquery in the post-scan filters).






[jira] [Created] (SPARK-45621) Add feature to evaluate subquery before Optimizer rule to push down filter

2023-10-20 Thread Maytas Monsereenusorn (Jira)
Maytas Monsereenusorn created SPARK-45621:
-

 Summary: Add feature to evaluate subquery before Optimizer rule to 
push down filter
 Key: SPARK-45621
 URL: https://issues.apache.org/jira/browse/SPARK-45621
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.3.2
Reporter: Maytas Monsereenusorn


Some queries can benefit from having their scalar subqueries in the filter 
evaluated during planning so that the scalar result (from the subquery) can be 
pushed down.

This adds a new feature (disabled by default to maintain current behavior) 
that evaluates scalar subqueries in the Optimizer before the filter pushdown 
rule.

For example, a query like 
{code:java}
select * from t2 where b > (select max(a) from t1) {code}
where t1 is a small table and t2 is a very large table can benefit if we first 
evaluate the subquery and then push the result down into the pushed filters 
(instead of keeping the subquery in the post-scan filters).
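
For illustration, a minimal sketch of the intended effect, applied by hand 
today (it assumes the tables t1 and t2 from the example exist and that column 
a is a bigint):
{code:java}
// Evaluate the scalar subquery eagerly against the small table t1.
val threshold = spark.sql("SELECT max(a) FROM t1").head().getLong(0)

// The filter is now a literal comparison, which data sources can push
// into the scan of the large table t2 instead of evaluating the subquery
// in a post-scan filter.
spark.sql(s"SELECT * FROM t2 WHERE b > $threshold").explain()
{code}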






[jira] [Updated] (SPARK-43778) RewriteCorrelatedScalarSubquery should handle duplicate attributes

2023-10-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-43778:
---
Labels: pull-request-available  (was: )

> RewriteCorrelatedScalarSubquery should handle duplicate attributes
> --
>
> Key: SPARK-43778
> URL: https://issues.apache.org/jira/browse/SPARK-43778
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Andrey Gubichev
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> This is a correctness problem caused by the fact that the decorrelation rule 
> does not dedup join attributes properly. This leads to the join on (c1 = c1), 
> which is simplified to True and the join becomes a cross product.
>  
> Example query:
>  
> {code:java}
> create view t(c1, c2) as values (0, 1), (0, 2), (1, 2)
> select c1, c2, (select count(*) cnt from t t2 where t1.c1 = t2.c1 having cnt 
> = 0) from t t1
> -- Correct answer: [(0, 1, null), (0, 2, null), (1, 2, null)]
> +---+---+------------------+
> |c1 |c2 |scalarsubquery(c1)|
> +---+---+------------------+
> |0  |1  |null              |
> |0  |1  |null              |
> |0  |2  |null              |
> |0  |2  |null              |
> |1  |2  |null              |
> |1  |2  |null              |
> +---+---+------------------+ {code}
>  
>  
>  
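
For reference, a hand-decorrelated sketch of the example query (the alias 
inner_c1 is illustrative; it reproduces the expected output for this data, 
though a general rewrite must also handle COUNT over an empty group):
{code:java}
// The inner t.c1 is renamed so the join condition compares two distinct
// attributes instead of simplifying (c1 = c1) to true.
spark.sql("""
  SELECT t1.c1, t1.c2, sub.cnt
  FROM t t1
  LEFT OUTER JOIN (
    SELECT c1 AS inner_c1, count(*) AS cnt
    FROM t
    GROUP BY c1
    HAVING count(*) = 0
  ) sub
  ON t1.c1 = sub.inner_c1
""").show()
{code}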






[jira] [Updated] (SPARK-45620) Fix user-facing APIs related to Python UDTF to use camelCase.

2023-10-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45620:
---
Labels: pull-request-available  (was: )

> Fix user-facing APIs related to Python UDTF to use camelCase.
> -
>
> Key: SPARK-45620
> URL: https://issues.apache.org/jira/browse/SPARK-45620
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Takuya Ueshin
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-45620) Fix user-facing APIs related to Python UDTF to use camelCase.

2023-10-20 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-45620:
-

 Summary: Fix user-facing APIs related to Python UDTF to use 
camelCase.
 Key: SPARK-45620
 URL: https://issues.apache.org/jira/browse/SPARK-45620
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Takuya Ueshin









[jira] [Resolved] (SPARK-45523) Return useful error message if UDTF returns None for non-nullable column

2023-10-20 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-45523.
---
Fix Version/s: 4.0.0
 Assignee: Daniel
   Resolution: Fixed

Issue resolved by pull request 43356
https://github.com/apache/spark/pull/43356

> Return useful error message if UDTF returns None for non-nullable column
> 
>
> Key: SPARK-45523
> URL: https://issues.apache.org/jira/browse/SPARK-45523
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Daniel
>Assignee: Daniel
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-45619) Apply the observed metrics to Observation object.

2023-10-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45619:
---
Labels: pull-request-available  (was: )

> Apply the observed metrics to Observation object.
> -
>
> Key: SPARK-45619
> URL: https://issues.apache.org/jira/browse/SPARK-45619
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Takuya Ueshin
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-45619) Apply the observed metrics to Observation object.

2023-10-20 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-45619:
-

 Summary: Apply the observed metrics to Observation object.
 Key: SPARK-45619
 URL: https://issues.apache.org/jira/browse/SPARK-45619
 Project: Spark
  Issue Type: Improvement
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Takuya Ueshin









[jira] [Updated] (SPARK-45617) Upgrade Apache Commons Crypto 1.2.0

2023-10-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45617:
---
Labels: pull-request-available  (was: )

> Upgrade Apache Commons Crypto 1.2.0
> ---
>
> Key: SPARK-45617
> URL: https://issues.apache.org/jira/browse/SPARK-45617
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: L. C. Hsieh
>Priority: Minor
>  Labels: pull-request-available
>
> The currently used 1.1.0 was released more than 3 years ago (2020-08-28). We 
> should upgrade the library to the latest 1.2.0.






[jira] [Updated] (SPARK-45618) Remove BaseErrorHandler

2023-10-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45618:
---
Labels: pull-request-available  (was: )

> Remove BaseErrorHandler
> ---
>
> Key: SPARK-45618
> URL: https://issues.apache.org/jira/browse/SPARK-45618
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: L. C. Hsieh
>Priority: Minor
>  Labels: pull-request-available
>
> We can remove the workaround trait BaseErrorHandler, which was added long 
> ago (SPARK-25535) for CRYPTO-141, an issue that was fixed 5 years ago.






[jira] [Created] (SPARK-45618) Remove BaseErrorHandler

2023-10-20 Thread L. C. Hsieh (Jira)
L. C. Hsieh created SPARK-45618:
---

 Summary: Remove BaseErrorHandler
 Key: SPARK-45618
 URL: https://issues.apache.org/jira/browse/SPARK-45618
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: L. C. Hsieh


We can remove the workaround trait BaseErrorHandler, which was added long ago 
(SPARK-25535) for CRYPTO-141, an issue that was fixed 5 years ago.








[jira] [Created] (SPARK-45617) Upgrade Apache Commons Crypto 1.2.0

2023-10-20 Thread L. C. Hsieh (Jira)
L. C. Hsieh created SPARK-45617:
---

 Summary: Upgrade Apache Commons Crypto 1.2.0
 Key: SPARK-45617
 URL: https://issues.apache.org/jira/browse/SPARK-45617
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 4.0.0
Reporter: L. C. Hsieh


The currently used 1.1.0 was released more than 3 years ago (2020-08-28). We 
should upgrade the library to the latest 1.2.0.








[jira] [Commented] (SPARK-45023) SPIP: Python Stored Procedures

2023-10-20 Thread Allison Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1954#comment-1954
 ] 

Allison Wang commented on SPARK-45023:
--

[~abhinavofficial] This proposal is on hold, given the feedback received on 
the SPIP.

> SPIP: Python Stored Procedures
> --
>
> Key: SPARK-45023
> URL: https://issues.apache.org/jira/browse/SPARK-45023
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Priority: Major
>
> Stored procedures are an extension of the ANSI SQL standard. They play a 
> crucial role in improving the capabilities of SQL by encapsulating complex 
> logic into reusable routines. 
> This proposal aims to extend Spark SQL by introducing support for stored 
> procedures, starting with Python as the procedural language. This addition 
> will allow users to execute procedural programs, leveraging programming 
> constructs of Python to perform tasks with complex logic. Additionally, users 
> can persist these procedural routines in catalogs such as HMS for future 
> reuse. By providing this functionality, we intend to empower Spark users to 
> seamlessly integrate Python routines into their SQL workflows.
> {*}SPIP{*}: 
> [https://docs.google.com/document/d/1ce2EZrf2BxHu7TjfGn4TgToK3TBYYzRkmsIVcfmkNzE/edit?usp=sharing]
>  






[jira] [Resolved] (SPARK-45023) SPIP: Python Stored Procedures

2023-10-20 Thread Allison Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang resolved SPARK-45023.
--
Resolution: Won't Do

> SPIP: Python Stored Procedures
> --
>
> Key: SPARK-45023
> URL: https://issues.apache.org/jira/browse/SPARK-45023
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Priority: Major
>
> Stored procedures are an extension of the ANSI SQL standard. They play a 
> crucial role in improving the capabilities of SQL by encapsulating complex 
> logic into reusable routines. 
> This proposal aims to extend Spark SQL by introducing support for stored 
> procedures, starting with Python as the procedural language. This addition 
> will allow users to execute procedural programs, leveraging programming 
> constructs of Python to perform tasks with complex logic. Additionally, users 
> can persist these procedural routines in catalogs such as HMS for future 
> reuse. By providing this functionality, we intend to empower Spark users to 
> seamlessly integrate Python routines into their SQL workflows.
> {*}SPIP{*}: 
> [https://docs.google.com/document/d/1ce2EZrf2BxHu7TjfGn4TgToK3TBYYzRkmsIVcfmkNzE/edit?usp=sharing]
>  






[jira] [Updated] (SPARK-45616) Usages of ParVector are unsafe because it does not propagate ThreadLocals or SparkSession

2023-10-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45616:
---
Labels: pull-request-available  (was: )

> Usages of ParVector are unsafe because it does not propagate ThreadLocals or 
> SparkSession
> -
>
> Key: SPARK-45616
> URL: https://issues.apache.org/jira/browse/SPARK-45616
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL, Tests
>Affects Versions: 3.5.0
>Reporter: Ankur Dave
>Assignee: Ankur Dave
>Priority: Minor
>  Labels: pull-request-available
>
> CastSuiteBase and ExpressionInfoSuite use ParVector.foreach() to run Spark 
> SQL queries in parallel. They incorrectly assume that each parallel operation 
> will inherit the main thread’s active SparkSession. This is only true when 
> these parallel operations run in freshly-created threads. However, when other 
> code has already run some parallel operations before Spark was started, then 
> there may be existing threads that do not have an active SparkSession. In 
> that case, these tests fail with NullPointerExceptions when creating 
> SparkPlans or running SQL queries.
> The fix is to use the existing method ThreadUtils.parmap(). This method 
> creates fresh threads that inherit the current active SparkSession, and it 
> propagates the Spark ThreadLocals.
> We should also add a scalastyle warning against use of ParVector.






[jira] [Created] (SPARK-45616) Usages of ParVector are unsafe because it does not propagate ThreadLocals or SparkSession

2023-10-20 Thread Ankur Dave (Jira)
Ankur Dave created SPARK-45616:
--

 Summary: Usages of ParVector are unsafe because it does not 
propagate ThreadLocals or SparkSession
 Key: SPARK-45616
 URL: https://issues.apache.org/jira/browse/SPARK-45616
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, SQL, Tests
Affects Versions: 3.5.0
Reporter: Ankur Dave
Assignee: Ankur Dave


CastSuiteBase and ExpressionInfoSuite use ParVector.foreach() to run Spark SQL 
queries in parallel. They incorrectly assume that each parallel operation will 
inherit the main thread’s active SparkSession. This is only true when these 
parallel operations run in freshly-created threads. However, when other code 
has already run some parallel operations before Spark was started, then there 
may be existing threads that do not have an active SparkSession. In that case, 
these tests fail with NullPointerExceptions when creating SparkPlans or running 
SQL queries.

The fix is to use the existing method ThreadUtils.parmap(). This method creates 
fresh threads that inherit the current active SparkSession, and it propagates 
the Spark ThreadLocals.

We should also add a scalastyle warning against use of ParVector.
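
For illustration, a sketch of the replacement (queries and runQuery are 
placeholders, and the exact parmap signature should be treated as an 
assumption):
{code:java}
import org.apache.spark.util.ThreadUtils

// Before: runs on the global ForkJoinPool, whose worker threads may have
// been created before the SparkSession existed and therefore lack it:
//   new ParVector(queries.toVector).foreach(runQuery)

// After: parmap spawns fresh threads that inherit the caller's active
// SparkSession and Spark ThreadLocals:
ThreadUtils.parmap(queries, prefix = "test-worker", maxThreads = 8)(runQuery)
{code}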






[jira] [Resolved] (SPARK-30848) Remove manual backport of Murmur3 MurmurHash3.productHash fix from Scala 2.13

2023-10-20 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie resolved SPARK-30848.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43161
[https://github.com/apache/spark/pull/43161]

> Remove manual backport of Murmur3 MurmurHash3.productHash fix from Scala 2.13
> -
>
> Key: SPARK-30848
> URL: https://issues.apache.org/jira/browse/SPARK-30848
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> SPARK-30847 introduced a manual backport to work around a Scala issue in hash 
> implementation. Once we drop Scala 2.12, we can remove the fix.






[jira] [Assigned] (SPARK-30848) Remove manual backport of Murmur3 MurmurHash3.productHash fix from Scala 2.13

2023-10-20 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie reassigned SPARK-30848:


Assignee: BingKun Pan

> Remove manual backport of Murmur3 MurmurHash3.productHash fix from Scala 2.13
> -
>
> Key: SPARK-30848
> URL: https://issues.apache.org/jira/browse/SPARK-30848
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> SPARK-30847 introduced a manual backport to work around a Scala issue in hash 
> implementation. Once we drop Scala 2.12, we can remove the fix.






[jira] [Updated] (SPARK-30848) Remove manual backport of Murmur3 MurmurHash3.productHash fix from Scala 2.13

2023-10-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-30848:
---
Labels: pull-request-available  (was: )

> Remove manual backport of Murmur3 MurmurHash3.productHash fix from Scala 2.13
> -
>
> Key: SPARK-30848
> URL: https://issues.apache.org/jira/browse/SPARK-30848
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>  Labels: pull-request-available
>
> SPARK-30847 introduced a manual backport to work around a Scala issue in hash 
> implementation. Once we drop Scala 2.12, we can remove the fix.






[jira] [Resolved] (SPARK-45583) Spark SQL returning incorrect values for full outer join on keys with the same name.

2023-10-20 Thread Bruce Robbins (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruce Robbins resolved SPARK-45583.
---
Resolution: Fixed

> Spark SQL returning incorrect values for full outer join on keys with the 
> same name.
> 
>
> Key: SPARK-45583
> URL: https://issues.apache.org/jira/browse/SPARK-45583
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.1
>Reporter: Huw
>Priority: Major
> Fix For: 3.5.0
>
>
> The following query gives the wrong results:
> {code:sql}
> WITH people as (
>   SELECT * FROM (VALUES
>     (1, 'Peter'),
>     (2, 'Homer'),
>     (3, 'Ned'),
>     (3, 'Jenny')
>   ) AS Idiots(id, FirstName)
> ), location as (
>   SELECT * FROM (VALUES
>     (1, 'sample0'),
>     (1, 'sample1'),
>     (2, 'sample2')
>   ) as Locations(id, address)
> )
> SELECT
>   *
> FROM
>   people
> FULL OUTER JOIN
>   location
> ON
>   people.id = location.id {code}
> We find the following table:
> ||id: integer||FirstName: string||id: integer||address: string||
> |2|Homer|2|sample2|
> |null|Ned|null|null|
> |null|Jenny|null|null|
> |1|Peter|1|sample0|
> |1|Peter|1|sample1|
> But clearly the first `id` column is wrong: the nulls should be 3.
> If we rename the id column in (only) the person table to pid we get the 
> correct results:
> ||pid: integer||FirstName: string||id: integer||address: string||
> |2|Homer|2|sample2|
> |3|Ned|null|null|
> |3|Jenny|null|null|
> |1|Peter|1|sample0|
> |1|Peter|1|sample1|
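
On affected versions, a sketch of that rename workaround in DataFrame form 
(it assumes people and location are available as views; names follow the 
example):
{code:java}
// Give the join keys distinct names before the full outer join so the two
// id columns cannot be conflated.
val people   = spark.sql("SELECT id AS pid, FirstName FROM people")
val location = spark.table("location")
people.join(location, people("pid") === location("id"), "full_outer").show()
{code}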






[jira] [Resolved] (SPARK-45602) Replace `s.c.MapOps.filterKeys` with `s.c.MapOps.view.filterKeys`

2023-10-20 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie resolved SPARK-45602.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43445
[https://github.com/apache/spark/pull/43445]

> Replace `s.c.MapOps.filterKeys` with `s.c.MapOps.view.filterKeys`
> -
>
> Key: SPARK-45602
> URL: https://issues.apache.org/jira/browse/SPARK-45602
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes, Spark Core, SQL, YARN
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> {code:java}
> /** Filters this map by retaining only keys satisfying a predicate.
>   *  @param  p   the predicate used to test keys
>   *  @return an immutable map consisting only of those key value pairs of 
> this map where the key satisfies
>   *  the predicate `p`. The resulting map wraps the original map 
> without copying any elements.
>   */
> @deprecated("Use .view.filterKeys(f). A future version will include a strict 
> version of this method (for now, .view.filterKeys(p).toMap).", "2.13.0")
> def filterKeys(p: K => Boolean): MapView[K, V] = new MapView.FilterKeys(this, 
> p) {code}
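
A minimal sketch of the mechanical replacement (the trailing .toMap restores 
the strict Map that existing call sites expect):
{code:java}
val m = Map("a" -> 1, "b" -> 2)

// Deprecated in Scala 2.13; returns a lazy MapView:
//   val kept = m.filterKeys(_ == "a")

// Replacement:
val kept: Map[String, Int] = m.view.filterKeys(_ == "a").toMap
{code}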






[jira] [Resolved] (SPARK-45595) Expose SQLSTATE in error message

2023-10-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-45595.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43438
[https://github.com/apache/spark/pull/43438]

> Expose SQLSTATE in error message
> 
>
> Key: SPARK-45595
> URL: https://issues.apache.org/jira/browse/SPARK-45595
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Serge Rielau
>Assignee: Serge Rielau
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> When using spark.sql.error.messageFormat in MINIMAL or STANDARD mode, the 
> SQLSTATE is exposed.
> We want to extend this to PRETTY mode, now that all errors have SQLSTATEs.
> We propose to trail the SQLSTATE after the text message, so it does not take 
> away from the reading experience of the message, while still being easily 
> found by tooling or humans:
> {noformat}
> [<error class>] <message> SQLSTATE: <SQLSTATE>{noformat}
> Example:
> {noformat}
> [DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor 
> being 0 and return NULL instead. If necessary set "spark.sql.ansi.enabled" 
> to "false" to bypass this error. SQLSTATE: 22013
> == SQL(line 1, position 8) ==
> SELECT 1/0
>        ^^^{noformat}
> Other options considered have been:
> {noformat}
> [DIVIDE_BY_ZERO](22013) Division by zero. Use `try_divide` to tolerate 
> divisor being 0 and return NULL instead. If necessary set 
> "spark.sql.ansi.enabled" to "false" to bypass this error.
> == SQL(line 1, position 8) ==
> SELECT 1/0
>        ^^^{noformat}
> and
> {noformat}
> [DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor 
> being 0 and return NULL instead. If necessary set "spark.sql.ansi.enabled" 
> to "false" to bypass this error.
> == SQL(line 1, position 8) ==
> SELECT 1/0
>        ^^^
> SQLSTATE: 22013{noformat}
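
For context, a sketch of how the SQLSTATE is already reachable 
programmatically via SparkThrowable (it assumes ANSI mode so that the 
division fails):
{code:java}
spark.conf.set("spark.sql.ansi.enabled", "true")
try {
  spark.sql("SELECT 1/0").collect()
} catch {
  // SparkThrowable carries the error class and SQLSTATE; the proposal
  // above additionally appends "SQLSTATE: 22013" to the PRETTY message.
  case e: org.apache.spark.SparkThrowable =>
    println(e.getErrorClass) // DIVIDE_BY_ZERO
    println(e.getSqlState)   // 22013
}
{code}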






[jira] [Assigned] (SPARK-45595) Expose SQLSTATE in error message

2023-10-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-45595:
---

Assignee: Serge Rielau

> Expose SQLSTATE in error message
> 
>
> Key: SPARK-45595
> URL: https://issues.apache.org/jira/browse/SPARK-45595
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Serge Rielau
>Assignee: Serge Rielau
>Priority: Major
>  Labels: pull-request-available
>
> When using spark.sql.error.messageFormat in MINIMAL or STANDARD mode, the 
> SQLSTATE is exposed.
> We want to extend this to PRETTY mode, now that all errors have SQLSTATEs.
> We propose to trail the SQLSTATE after the text message, so it does not take 
> away from the reading experience of the message, while still being easily 
> found by tooling or humans:
> {noformat}
> [<error class>] <message> SQLSTATE: <SQLSTATE>{noformat}
> Example:
> {noformat}
> [DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor 
> being 0 and return NULL instead. If necessary set "spark.sql.ansi.enabled" 
> to "false" to bypass this error. SQLSTATE: 22013
> == SQL(line 1, position 8) ==
> SELECT 1/0
>        ^^^{noformat}
> Other options considered have been:
> {noformat}
> [DIVIDE_BY_ZERO](22013) Division by zero. Use `try_divide` to tolerate 
> divisor being 0 and return NULL instead. If necessary set 
> "spark.sql.ansi.enabled" to "false" to bypass this error.
> == SQL(line 1, position 8) ==
> SELECT 1/0
>        ^^^{noformat}
> and
> {noformat}
> [DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor 
> being 0 and return NULL instead. If necessary set "spark.sql.ansi.enabled" 
> to "false" to bypass this error.
> == SQL(line 1, position 8) ==
> SELECT 1/0
>        ^^^
> SQLSTATE: 22013{noformat}






[jira] [Updated] (SPARK-45509) Investigate the behavior difference in self-join

2023-10-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45509:
---
Labels: pull-request-available  (was: )

> Investigate the behavior difference in self-join
> 
>
> Key: SPARK-45509
> URL: https://issues.apache.org/jira/browse/SPARK-45509
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Allison Wang
>Priority: Major
>  Labels: pull-request-available
>
> SPARK-45220 discovers a behavior difference for a self-join scenario between 
> classic Spark and Spark Connect.
> For instance, here is the query that works without Spark Connect: 
> {code:java}
> df = spark.createDataFrame([Row(name="Alice", age=2), Row(name="Bob", age=5)])
> df2 = spark.createDataFrame([Row(name="Tom", height=80), Row(name="Bob", 
> height=85)]){code}
> {code:java}
> joined = df.join(df2, df.name == df2.name, "outer").sort(sf.desc(df.name)) 
> joined.show(){code}
> But in Spark Connect, it throws this exception:
> {code:java}
> pyspark.errors.exceptions.connect.AnalysisException: 
> [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column, variable, or function parameter 
> with name `name` cannot be resolved. Did you mean one of the following? 
> [`name`, `name`, `age`, `height`].;
> 'Sort ['name DESC NULLS LAST], true
> +- Join FullOuter, (name#64 = name#78)
>:- LocalRelation [name#64, age#65L]
>+- LocalRelation [name#78, height#79L]
>  {code}
>  
> On the other hand, this query fails in classic Spark:
> {code:java}
> df.join(df, df.name == df.name, "outer").select(df.name).show() {code}
> {code:java}
> pyspark.errors.exceptions.captured.AnalysisException: Column name#0 are 
> ambiguous... {code}
>  
> but this query works with Spark Connect.
> We need to investigate the behavior difference and fix it.






[jira] [Updated] (SPARK-45610) Fix "Auto-application to `()` is deprecated."

2023-10-20 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-45610:
-
Summary: Fix "Auto-application to `()` is deprecated."  (was: Handle 
"Auto-application to `()` is deprecated.")

> Fix "Auto-application to `()` is deprecated."
> -
>
> Key: SPARK-45610
> URL: https://issues.apache.org/jira/browse/SPARK-45610
> Project: Spark
>  Issue Type: Sub-task
>  Components: GraphX, MLlib, Spark Core, SQL, Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>
> For the following case, a compile warning will be issued in Scala 2.13:
>  
> {code:java}
> Welcome to Scala 2.13.12 (OpenJDK 64-Bit Server VM, Java 17.0.8).
> Type in expressions for evaluation. Or try :help.
> scala> class Foo {
>      |     def isEmpty(): Boolean = true
>      |     def isTrue(x: Boolean): Boolean = x
>      |   }
> class Foo
> scala> val foo = new Foo
> val foo: Foo = Foo@7061622
> scala> val ret = foo.isEmpty
>                      ^
>        warning: Auto-application to `()` is deprecated. Supply the empty 
> argument list `()` explicitly to invoke method isEmpty,
>        or remove the empty argument list from its definition (Java-defined 
> methods are exempt).
>        In Scala 3, an unapplied method like this will be eta-expanded into a 
> function. [quickfixable]
> val ret: Boolean = true {code}
> But for Scala 3, it is a compile error:
> {code:java}
> Welcome to Scala 3.3.1 (17.0.8, Java OpenJDK 64-Bit Server VM).
> Type in expressions for evaluation. Or try :help.
> scala> class Foo {
>      |     def isEmpty(): Boolean = true
>      |     def isTrue(x: Boolean): Boolean = x
>      |   }
> // defined class Foo
> scala> val foo = new Foo
> val foo: Foo = Foo@591f6f83
> scala> val ret = foo.isEmpty
> -- [E100] Syntax Error: ------------------------------------------------------
> 1 |val ret = foo.isEmpty
>   |          ^^^^^^^^^^^
>   |          method isEmpty in class Foo must be called with () argument
>   |
>   | longer explanation available when compiling with `-explain`
> 1 error found {code}
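
The fix is mechanical: supply the empty argument list at each call site, e.g.:
{code:java}
class Foo {
  def isEmpty(): Boolean = true
}

val foo = new Foo
// val ret = foo.isEmpty   // deprecated in Scala 2.13, an error in Scala 3
val ret = foo.isEmpty()    // supply the empty argument list explicitly
{code}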






[jira] [Created] (SPARK-45615) Remove redundant "Auto-application to `()` is deprecated" compile suppression rules.

2023-10-20 Thread Yang Jie (Jira)
Yang Jie created SPARK-45615:


 Summary: Remove redundant "Auto-application to `()` is deprecated" 
compile suppression rules.
 Key: SPARK-45615
 URL: https://issues.apache.org/jira/browse/SPARK-45615
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 4.0.0
Reporter: Yang Jie


Due to https://github.com/scalatest/scalatest/issues/2297, we need to wait 
until we upgrade to a scalatest version that fixes it before removing these 
suppression rules.

Maybe 3.2.18.






[jira] [Updated] (SPARK-39910) DataFrameReader API cannot read files from hadoop archives (.har)

2023-10-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-39910:
---
Labels: DataFrameReader pull-request-available  (was: DataFrameReader)

> DataFrameReader API cannot read files from hadoop archives (.har)
> -
>
> Key: SPARK-39910
> URL: https://issues.apache.org/jira/browse/SPARK-39910
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.3, 3.1.3, 3.3.0, 3.2.2
>Reporter: Christophe Préaud
>Priority: Minor
>  Labels: DataFrameReader, pull-request-available
>
> Reading a file from an hadoop archive using the DataFrameReader API returns 
> an empty Dataset:
> {code:java}
> scala> val df = 
> spark.read.textFile("har:///user/preaudc/logs/lead/jp/2022/202207.har/20220719")
> df: org.apache.spark.sql.Dataset[String] = [value: string]
> scala> df.count
> res7: Long = 0 {code}
>  
> On the other hand, reading the same file, from the same hadoop archive, but 
> using the RDD API yields the correct result:
> {code:java}
> scala> val df = 
> sc.textFile("har:///user/preaudc/logs/lead/jp/2022/202207.har/20220719").toDF("value")
> df: org.apache.spark.sql.DataFrame = [value: string]
> scala> df.count
> res8: Long = 5589 {code}






[jira] [Assigned] (SPARK-45592) AQE and InMemoryTableScanExec correctness bug

2023-10-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-45592:
--

Assignee: (was: Apache Spark)

> AQE and InMemoryTableScanExec correctness bug
> -
>
> Key: SPARK-45592
> URL: https://issues.apache.org/jira/browse/SPARK-45592
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Emil Ejbyfeldt
>Priority: Major
>  Labels: pull-request-available
>
> The following query should return 100
> {code:java}
> import org.apache.spark.storage.StorageLevel
> val df = spark.range(0, 100, 1, 5).map(l => (l, l))
> val ee = df.select($"_1".as("src"), $"_2".as("dst"))
>   .persist(StorageLevel.MEMORY_AND_DISK)
> ee.count()
> val minNbrs1 = ee
>   .groupBy("src").agg(min(col("dst")).as("min_number"))
>   .persist(StorageLevel.MEMORY_AND_DISK)
> val join = ee.join(minNbrs1, "src")
> join.count(){code}
> but on spark 3.5.0 there is a correctness bug causing it to return `104800` 
> or some other smaller value.






[jira] [Assigned] (SPARK-45592) AQE and InMemoryTableScanExec correctness bug

2023-10-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-45592:
--

Assignee: Apache Spark

> AQE and InMemoryTableScanExec correctness bug
> -
>
> Key: SPARK-45592
> URL: https://issues.apache.org/jira/browse/SPARK-45592
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Emil Ejbyfeldt
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
>
> The following query should return 100
> {code:java}
> import org.apache.spark.storage.StorageLevel
> val df = spark.range(0, 100, 1, 5).map(l => (l, l))
> val ee = df.select($"_1".as("src"), $"_2".as("dst"))
>   .persist(StorageLevel.MEMORY_AND_DISK)
> ee.count()
> val minNbrs1 = ee
>   .groupBy("src").agg(min(col("dst")).as("min_number"))
>   .persist(StorageLevel.MEMORY_AND_DISK)
> val join = ee.join(minNbrs1, "src")
> join.count(){code}
> but on spark 3.5.0 there is a correctness bug causing it to return `104800` 
> or some other smaller value.






[jira] [Resolved] (SPARK-45609) Include SqlState in SparkThrowable proto message

2023-10-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45609.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43457
[https://github.com/apache/spark/pull/43457]

> Include SqlState in SparkThrowable proto message
> 
>
> Key: SPARK-45609
> URL: https://issues.apache.org/jira/browse/SPARK-45609
> Project: Spark
>  Issue Type: Test
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Yihong He
>Assignee: Yihong He
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-45609) Include SqlState in SparkThrowable proto message

2023-10-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-45609:


Assignee: Yihong He

> Include SqlState in SparkThrowable proto message
> 
>
> Key: SPARK-45609
> URL: https://issues.apache.org/jira/browse/SPARK-45609
> Project: Spark
>  Issue Type: Test
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Yihong He
>Assignee: Yihong He
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Commented] (SPARK-43851) Support LCA in grouping expressions

2023-10-20 Thread Yuming Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1591#comment-1591
 ] 

Yuming Wang commented on SPARK-43851:
-

The resolution should be unresolved.

> Support LCA in grouping expressions
> ---
>
> Key: SPARK-43851
> URL: https://issues.apache.org/jira/browse/SPARK-43851
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Yuming Wang
>Priority: Major
>
> Teradata supports it:
> {code:sql}
> create table t1(a int) using  parquet;
> select a + 1 as a1, a1 + 1 as a2 from t1 group by a1, a2;
> {code}
> {noformat}
> [UNSUPPORTED_FEATURE.LATERAL_COLUMN_ALIAS_IN_GROUP_BY] The feature is not 
> supported: Referencing a lateral column alias via GROUP BY alias/ALL is not 
> supported yet.
> {noformat}
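
Until that is supported, a workaround sketch is to repeat the underlying 
expressions in GROUP BY instead of referencing the aliases:
{code:java}
// Equivalent query without lateral column alias references in GROUP BY.
spark.sql("""
  SELECT a + 1 AS a1, (a + 1) + 1 AS a2
  FROM t1
  GROUP BY a + 1, (a + 1) + 1
""").show()
{code}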






[jira] [Reopened] (SPARK-43851) Support LCA in grouping expressions

2023-10-20 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang reopened SPARK-43851:
-
  Assignee: (was: Jia Fan)

> Support LCA in grouping expressions
> ---
>
> Key: SPARK-43851
> URL: https://issues.apache.org/jira/browse/SPARK-43851
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Yuming Wang
>Priority: Major
> Fix For: 3.5.0
>
>
> Teradata supports it:
> {code:sql}
> create table t1(a int) using  parquet;
> select a + 1 as a1, a1 + 1 as a2 from t1 group by a1, a2;
> {code}
> {noformat}
> [UNSUPPORTED_FEATURE.LATERAL_COLUMN_ALIAS_IN_GROUP_BY] The feature is not 
> supported: Referencing a lateral column alias via GROUP BY alias/ALL is not 
> supported yet.
> {noformat}






[jira] [Updated] (SPARK-43851) Support LCA in grouping expressions

2023-10-20 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-43851:

Fix Version/s: (was: 3.5.0)

> Support LCA in grouping expressions
> ---
>
> Key: SPARK-43851
> URL: https://issues.apache.org/jira/browse/SPARK-43851
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Yuming Wang
>Priority: Major
>
> Teradata supports it:
> {code:sql}
> create table t1(a int) using  parquet;
> select a + 1 as a1, a1 + 1 as a2 from t1 group by a1, a2;
> {code}
> {noformat}
> [UNSUPPORTED_FEATURE.LATERAL_COLUMN_ALIAS_IN_GROUP_BY] The feature is not 
> supported: Referencing a lateral column alias via GROUP BY alias/ALL is not 
> supported yet.
> {noformat}






[jira] [Resolved] (SPARK-45613) Expose DeterministicLevel as a DeveloperApi

2023-10-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45613.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43461
[https://github.com/apache/spark/pull/43461]

> Expose DeterministicLevel as a DeveloperApi
> ---
>
> Key: SPARK-45613
> URL: https://issues.apache.org/jira/browse/SPARK-45613
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0, 3.5.0, 4.0.0
>Reporter: Mridul Muralidharan
>Assignee: Mridul Muralidharan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> {{RDD.getOutputDeterministicLevel}} is a {{DeveloperApi}} which users can 
> override to specify the {{DeterministicLevel}} of the {{RDD}}.
> Unfortunately, {{DeterministicLevel}} itself is {{private[spark]}}.
> Expose {{DeterministicLevel}} to allow users to use this method.
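
Once exposed, user code could look like this sketch (UnorderedRdd is 
hypothetical; compute and getPartitions simply delegate to the parent):
{code:java}
import org.apache.spark.{Partition, TaskContext}
import org.apache.spark.rdd.{DeterministicLevel, RDD}

// An RDD that declares its output order as non-deterministic across reruns.
class UnorderedRdd(prev: RDD[Int]) extends RDD[Int](prev) {
  override def compute(split: Partition, context: TaskContext): Iterator[Int] =
    prev.iterator(split, context)

  override protected def getPartitions: Array[Partition] = prev.partitions

  // Only expressible from user code once DeterministicLevel is public:
  override def getOutputDeterministicLevel: DeterministicLevel.Value =
    DeterministicLevel.UNORDERED
}
{code}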






[jira] [Assigned] (SPARK-45613) Expose DeterministicLevel as a DeveloperApi

2023-10-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-45613:


Assignee: Mridul Muralidharan

> Expose DeterministicLevel as a DeveloperApi
> ---
>
> Key: SPARK-45613
> URL: https://issues.apache.org/jira/browse/SPARK-45613
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0, 3.5.0, 4.0.0
>Reporter: Mridul Muralidharan
>Assignee: Mridul Muralidharan
>Priority: Major
>  Labels: pull-request-available
>
> {{RDD.getOutputDeterministicLevel}} is a {{DeveloperApi}} which users can 
> override to specify the {{DeterministicLevel}} of the {{RDD}}.
> Unfortunately, {{DeterministicLevel}} itself is {{private[spark]}}.
> Expose {{DeterministicLevel}} to allow users to use this method.


