[jira] [Updated] (SPARK-46303) Remove unused code in `pyspark.pandas.tests.series.* `

2023-12-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46303:
---
Labels: pull-request-available  (was: )

> Remove unused code in `pyspark.pandas.tests.series.* `
> --
>
> Key: SPARK-46303
> URL: https://issues.apache.org/jira/browse/SPARK-46303
> Project: Spark
>  Issue Type: Test
>  Components: PS, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46303) Remove unused code in `pyspark.pandas.tests.series.* `

2023-12-06 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-46303:
-

 Summary: Remove unused code in `pyspark.pandas.tests.series.* `
 Key: SPARK-46303
 URL: https://issues.apache.org/jira/browse/SPARK-46303
 Project: Spark
  Issue Type: Test
  Components: PS, Tests
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Updated] (SPARK-46302) Fix maven daily testing

2023-12-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46302:
---
Labels: pull-request-available  (was: )

> Fix maven daily testing
> ---
>
> Key: SPARK-46302
> URL: https://issues.apache.org/jira/browse/SPARK-46302
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-46302) Fix maven daily testing

2023-12-06 Thread BingKun Pan (Jira)
BingKun Pan created SPARK-46302:
---

 Summary: Fix maven daily testing
 Key: SPARK-46302
 URL: https://issues.apache.org/jira/browse/SPARK-46302
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 4.0.0
Reporter: BingKun Pan









[jira] [Resolved] (SPARK-46300) Test missing test coverage for Column (pyspark.sql.column)

2023-12-06 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46300.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44228
[https://github.com/apache/spark/pull/44228]

> Test missing test coverage for Column (pyspark.sql.column)
> --
>
> Key: SPARK-46300
> URL: https://issues.apache.org/jira/browse/SPARK-46300
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> https://app.codecov.io/gh/apache/spark/commit/1a651753f4e760643d719add3b16acd311454c76/blob/python/pyspark/sql/column.py






[jira] [Assigned] (SPARK-46300) Test missing test coverage for Column (pyspark.sql.column)

2023-12-06 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46300:


Assignee: Hyukjin Kwon

> Test missing test coverage for Column (pyspark.sql.column)
> --
>
> Key: SPARK-46300
> URL: https://issues.apache.org/jira/browse/SPARK-46300
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> https://app.codecov.io/gh/apache/spark/commit/1a651753f4e760643d719add3b16acd311454c76/blob/python/pyspark/sql/column.py






[jira] [Assigned] (SPARK-46298) Test catalog error classes (pyspark.sql.catalog)

2023-12-06 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46298:


Assignee: Hyukjin Kwon

> Test catalog error classes (pyspark.sql.catalog)
> 
>
> Key: SPARK-46298
> URL: https://issues.apache.org/jira/browse/SPARK-46298
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> See 
> https://app.codecov.io/gh/apache/spark/commit/1a651753f4e760643d719add3b16acd311454c76/blob/python/pyspark/sql/catalog.py






[jira] [Resolved] (SPARK-46298) Test catalog error classes (pyspark.sql.catalog)

2023-12-06 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46298.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44226
[https://github.com/apache/spark/pull/44226]

> Test catalog error classes (pyspark.sql.catalog)
> 
>
> Key: SPARK-46298
> URL: https://issues.apache.org/jira/browse/SPARK-46298
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> See 
> https://app.codecov.io/gh/apache/spark/commit/1a651753f4e760643d719add3b16acd311454c76/blob/python/pyspark/sql/catalog.py






[jira] [Updated] (SPARK-46058) [CORE] Add separate flag for privateKeyPassword

2023-12-06 Thread Mridul Muralidharan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan updated SPARK-46058:

Labels: pull-request-available  (was: pull-request-available releasenotes)

> [CORE] Add separate flag for privateKeyPassword
> ---
>
> Key: SPARK-46058
> URL: https://issues.apache.org/jira/browse/SPARK-46058
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Hasnain Lakhani
>Assignee: Hasnain Lakhani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Right now with config inheritance we support:
>  * JKS with password A, PEM with password B
>  * JKS with no password, PEM with password A
>  * JKS and PEM with no password
>  
> But we do not support the case where the JKS has a password and the PEM does
> not. If keyPassword is set, we will attempt to use it, and
> `spark.ssl.rpc.keyPassword` cannot be set to null. So let's add a separate
> flag as the easiest workaround.
>  
> This was noticed while migrating some existing deployments to RPC SSL
> support, where we use OpenSSL for RPC with a key that has no password.
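
As an illustrative sketch only (the flag name `spark.ssl.rpc.privateKeyPassword` is assumed from the issue title and is not confirmed anywhere in this thread, and all paths and values are made up), the previously unsupported case could then be configured as:

{noformat}
# JKS keystore protected by a password (illustrative values)
spark.ssl.keyStore        /etc/spark/keystore.jks
spark.ssl.keyPassword     jks-key-password

# PEM key for RPC with no password: with a dedicated flag, simply
# omitting it means "no password", instead of inheriting
# spark.ssl.keyPassword through config inheritance
spark.ssl.rpc.enabled     true
spark.ssl.rpc.privateKey  /etc/spark/rpc-key.pem
{noformat}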






[jira] [Updated] (SPARK-46058) [CORE] Add separate flag for privateKeyPassword

2023-12-06 Thread Mridul Muralidharan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan updated SPARK-46058:

Labels: pull-request-available releasenotes  (was: pull-request-available)

> [CORE] Add separate flag for privateKeyPassword
> ---
>
> Key: SPARK-46058
> URL: https://issues.apache.org/jira/browse/SPARK-46058
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Hasnain Lakhani
>Assignee: Hasnain Lakhani
>Priority: Major
>  Labels: pull-request-available, releasenotes
> Fix For: 4.0.0
>
>
> Right now with config inheritance we support:
>  * JKS with password A, PEM with password B
>  * JKS with no password, PEM with password A
>  * JKS and PEM with no password
>  
> But we do not support the case where the JKS has a password and the PEM does
> not. If keyPassword is set, we will attempt to use it, and
> `spark.ssl.rpc.keyPassword` cannot be set to null. So let's add a separate
> flag as the easiest workaround.
>  
> This was noticed while migrating some existing deployments to RPC SSL
> support, where we use OpenSSL for RPC with a key that has no password.






[jira] [Assigned] (SPARK-46301) Support `spark.worker.(initial|max)RegistrationRetries`

2023-12-06 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-46301:
-

Assignee: Dongjoon Hyun

> Support `spark.worker.(initial|max)RegistrationRetries`
> ---
>
> Key: SPARK-46301
> URL: https://issues.apache.org/jira/browse/SPARK-46301
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-46296) Test captured errors (pyspark.errors.exceptions.captured)

2023-12-06 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46296.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44224
[https://github.com/apache/spark/pull/44224]

> Test captured errors (pyspark.errors.exceptions.captured)
> -
>
> Key: SPARK-46296
> URL: https://issues.apache.org/jira/browse/SPARK-46296
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> https://app.codecov.io/gh/apache/spark/commit/1a651753f4e760643d719add3b16acd311454c76/blob/python/pyspark/errors/exceptions/captured.py






[jira] [Created] (SPARK-46301) Support `spark.worker.(initial|max)RegistrationRetries`

2023-12-06 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-46301:
-

 Summary: Support `spark.worker.(initial|max)RegistrationRetries`
 Key: SPARK-46301
 URL: https://issues.apache.org/jira/browse/SPARK-46301
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun









[jira] [Updated] (SPARK-46301) Support `spark.worker.(initial|max)RegistrationRetries`

2023-12-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46301:
---
Labels: pull-request-available  (was: )

> Support `spark.worker.(initial|max)RegistrationRetries`
> ---
>
> Key: SPARK-46301
> URL: https://issues.apache.org/jira/browse/SPARK-46301
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Assigned] (SPARK-46296) Test captured errors (pyspark.errors.exceptions.captured)

2023-12-06 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46296:


Assignee: Hyukjin Kwon

> Test captured errors (pyspark.errors.exceptions.captured)
> -
>
> Key: SPARK-46296
> URL: https://issues.apache.org/jira/browse/SPARK-46296
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> https://app.codecov.io/gh/apache/spark/commit/1a651753f4e760643d719add3b16acd311454c76/blob/python/pyspark/errors/exceptions/captured.py






[jira] [Updated] (SPARK-46300) Test missing test coverage for Column (pyspark.sql.column)

2023-12-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46300:
---
Labels: pull-request-available  (was: )

> Test missing test coverage for Column (pyspark.sql.column)
> --
>
> Key: SPARK-46300
> URL: https://issues.apache.org/jira/browse/SPARK-46300
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> https://app.codecov.io/gh/apache/spark/commit/1a651753f4e760643d719add3b16acd311454c76/blob/python/pyspark/sql/column.py






[jira] [Created] (SPARK-46300) Test missing test coverage for Column (pyspark.sql.column)

2023-12-06 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46300:


 Summary: Test missing test coverage for Column (pyspark.sql.column)
 Key: SPARK-46300
 URL: https://issues.apache.org/jira/browse/SPARK-46300
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


https://app.codecov.io/gh/apache/spark/commit/1a651753f4e760643d719add3b16acd311454c76/blob/python/pyspark/sql/column.py






[jira] [Updated] (SPARK-46299) Make `spark.deploy.recovery*` documentation up-to-date

2023-12-06 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-46299:
--
Summary: Make `spark.deploy.recovery*` documentation up-to-date  (was: Make 
`spark.deploy.recovery*` up-to-date)

> Make `spark.deploy.recovery*` documentation up-to-date
> --
>
> Key: SPARK-46299
> URL: https://issues.apache.org/jira/browse/SPARK-46299
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-45580) Subquery changes the output schema of the outer query

2023-12-06 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-45580:
--
Fix Version/s: 3.3.4

> Subquery changes the output schema of the outer query
> -
>
> Key: SPARK-45580
> URL: https://issues.apache.org/jira/browse/SPARK-45580
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.3, 3.4.1, 3.5.0
>Reporter: Bruce Robbins
>Assignee: Bruce Robbins
>Priority: Blocker
>  Labels: correctness, pull-request-available
> Fix For: 4.0.0, 3.5.1, 3.3.4, 3.4.3
>
>
> A query can have an incorrect output schema because of a subquery.
> Assume this data:
> {noformat}
> create or replace temp view t1(a) as values (1), (2), (3), (7);
> create or replace temp view t2(c1) as values (1), (2), (3);
> create or replace temp view t3(col1) as values (3), (9);
> cache table t1;
> cache table t2;
> cache table t3;
> {noformat}
> When run in {{spark-sql}}, the following query has a superfluous boolean 
> column:
> {noformat}
> select *
> from t1
> where exists (
>   select c1
>   from t2
>   where a = c1
>   or a in (select col1 from t3)
> );
> 1 false
> 2 false
> 3 true
> {noformat}
> The result should be:
> {noformat}
> 1
> 2
> 3
> {noformat}
> When executed via the {{Dataset}} API, you don't see the incorrect result,
> because the Dataset API truncates the right side of the rows based on the
> analyzed plan's schema (it's the optimized plan's schema that goes wrong).
> However, even with the {{Dataset}} API, this query goes wrong:
> {noformat}
> select (
>   select *
>   from t1
>   where exists (
> select c1
> from t2
> where a = c1
> or a in (select col1 from t3)
>   )
>   limit 1
> )
> from range(1);
> java.lang.AssertionError: assertion failed: Expects 1 field, but got 2; 
> something went wrong in analysis
>   at scala.Predef$.assert(Predef.scala:279)
>   at 
> org.apache.spark.sql.execution.ScalarSubquery.updateResult(subquery.scala:88)
>   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$waitForSubqueries$1(SparkPlan.scala:276)
>   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$waitForSubqueries$1$adapted(SparkPlan.scala:275)
>   at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:576)
>   at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:574)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:933)
> ...
> {noformat}
> Other queries that have the wrong schema:
> {noformat}
> select *
> from t1
> where a in (
>   select c1
>   from t2
>   where a in (select col1 from t3)
> );
> {noformat}
> and
> {noformat}
> select *
> from t1
> where not exists (
>   select c1
>   from t2
>   where a = c1
>   or a in (select col1 from t3)
> );
> {noformat}






[jira] [Updated] (SPARK-45580) Subquery changes the output schema of the outer query

2023-12-06 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-45580:
--
Fix Version/s: 3.4.3

> Subquery changes the output schema of the outer query
> -
>
> Key: SPARK-45580
> URL: https://issues.apache.org/jira/browse/SPARK-45580
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.3, 3.4.1, 3.5.0
>Reporter: Bruce Robbins
>Assignee: Bruce Robbins
>Priority: Blocker
>  Labels: correctness, pull-request-available
> Fix For: 4.0.0, 3.5.1, 3.4.3
>
>
> A query can have an incorrect output schema because of a subquery.
> Assume this data:
> {noformat}
> create or replace temp view t1(a) as values (1), (2), (3), (7);
> create or replace temp view t2(c1) as values (1), (2), (3);
> create or replace temp view t3(col1) as values (3), (9);
> cache table t1;
> cache table t2;
> cache table t3;
> {noformat}
> When run in {{spark-sql}}, the following query has a superfluous boolean 
> column:
> {noformat}
> select *
> from t1
> where exists (
>   select c1
>   from t2
>   where a = c1
>   or a in (select col1 from t3)
> );
> 1 false
> 2 false
> 3 true
> {noformat}
> The result should be:
> {noformat}
> 1
> 2
> 3
> {noformat}
> When executed via the {{Dataset}} API, you don't see the incorrect result,
> because the Dataset API truncates the right side of the rows based on the
> analyzed plan's schema (it's the optimized plan's schema that goes wrong).
> However, even with the {{Dataset}} API, this query goes wrong:
> {noformat}
> select (
>   select *
>   from t1
>   where exists (
> select c1
> from t2
> where a = c1
> or a in (select col1 from t3)
>   )
>   limit 1
> )
> from range(1);
> java.lang.AssertionError: assertion failed: Expects 1 field, but got 2; 
> something went wrong in analysis
>   at scala.Predef$.assert(Predef.scala:279)
>   at 
> org.apache.spark.sql.execution.ScalarSubquery.updateResult(subquery.scala:88)
>   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$waitForSubqueries$1(SparkPlan.scala:276)
>   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$waitForSubqueries$1$adapted(SparkPlan.scala:275)
>   at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:576)
>   at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:574)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:933)
> ...
> {noformat}
> Other queries that have the wrong schema:
> {noformat}
> select *
> from t1
> where a in (
>   select c1
>   from t2
>   where a in (select col1 from t3)
> );
> {noformat}
> and
> {noformat}
> select *
> from t1
> where not exists (
>   select c1
>   from t2
>   where a = c1
>   or a in (select col1 from t3)
> );
> {noformat}






[jira] [Resolved] (SPARK-46299) Make `spark.deploy.recovery*` up-to-date

2023-12-06 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-46299.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44227
[https://github.com/apache/spark/pull/44227]

> Make `spark.deploy.recovery*` up-to-date
> 
>
> Key: SPARK-46299
> URL: https://issues.apache.org/jira/browse/SPARK-46299
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-46299) Make `spark.deploy.recovery*` up-to-date

2023-12-06 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-46299:
-

 Summary: Make `spark.deploy.recovery*` up-to-date
 Key: SPARK-46299
 URL: https://issues.apache.org/jira/browse/SPARK-46299
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun









[jira] [Updated] (SPARK-46299) Make `spark.deploy.recovery*` up-to-date

2023-12-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46299:
---
Labels: pull-request-available  (was: )

> Make `spark.deploy.recovery*` up-to-date
> 
>
> Key: SPARK-46299
> URL: https://issues.apache.org/jira/browse/SPARK-46299
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-46298) Test catalog error classes (pyspark.sql.catalog)

2023-12-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46298:
---
Labels: pull-request-available  (was: )

> Test catalog error classes (pyspark.sql.catalog)
> 
>
> Key: SPARK-46298
> URL: https://issues.apache.org/jira/browse/SPARK-46298
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> See 
> https://app.codecov.io/gh/apache/spark/commit/1a651753f4e760643d719add3b16acd311454c76/blob/python/pyspark/sql/catalog.py






[jira] [Assigned] (SPARK-46297) Exclude generated files from the code coverage report

2023-12-06 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46297:


Assignee: Hyukjin Kwon

> Exclude generated files from the code coverage report
> -
>
> Key: SPARK-46297
> URL: https://issues.apache.org/jira/browse/SPARK-46297
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> We should exclude 
> https://app.codecov.io/gh/apache/spark/commit/1a651753f4e760643d719add3b16acd311454c76/tree/python/pyspark/sql/connect/proto






[jira] [Resolved] (SPARK-46297) Exclude generated files from the code coverage report

2023-12-06 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46297.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44225
[https://github.com/apache/spark/pull/44225]

> Exclude generated files from the code coverage report
> -
>
> Key: SPARK-46297
> URL: https://issues.apache.org/jira/browse/SPARK-46297
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> We should exclude 
> https://app.codecov.io/gh/apache/spark/commit/1a651753f4e760643d719add3b16acd311454c76/tree/python/pyspark/sql/connect/proto






[jira] [Created] (SPARK-46298) Test catalog error classes (pyspark.sql.catalog)

2023-12-06 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46298:


 Summary: Test catalog error classes (pyspark.sql.catalog)
 Key: SPARK-46298
 URL: https://issues.apache.org/jira/browse/SPARK-46298
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


See 
https://app.codecov.io/gh/apache/spark/commit/1a651753f4e760643d719add3b16acd311454c76/blob/python/pyspark/sql/catalog.py






[jira] [Updated] (SPARK-46296) Test captured errors (pyspark.errors.exceptions.captured)

2023-12-06 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-46296:
-
Summary: Test captured errors (pyspark.errors.exceptions.captured)  (was: 
Test captured errors of TestResult (pyspark.errors.exceptions.captured))

> Test captured errors (pyspark.errors.exceptions.captured)
> -
>
> Key: SPARK-46296
> URL: https://issues.apache.org/jira/browse/SPARK-46296
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> https://app.codecov.io/gh/apache/spark/commit/1a651753f4e760643d719add3b16acd311454c76/blob/python/pyspark/errors/exceptions/captured.py






[jira] [Updated] (SPARK-46297) Exclude generated files from the code coverage report

2023-12-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46297:
---
Labels: pull-request-available  (was: )

> Exclude generated files from the code coverage report
> -
>
> Key: SPARK-46297
> URL: https://issues.apache.org/jira/browse/SPARK-46297
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> We should exclude 
> https://app.codecov.io/gh/apache/spark/commit/1a651753f4e760643d719add3b16acd311454c76/tree/python/pyspark/sql/connect/proto






[jira] [Created] (SPARK-46297) Exclude generated files from the code coverage report

2023-12-06 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46297:


 Summary: Exclude generated files from the code coverage report
 Key: SPARK-46297
 URL: https://issues.apache.org/jira/browse/SPARK-46297
 Project: Spark
  Issue Type: Sub-task
  Components: Project Infra, PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


We should exclude 
https://app.codecov.io/gh/apache/spark/commit/1a651753f4e760643d719add3b16acd311454c76/tree/python/pyspark/sql/connect/proto






[jira] [Updated] (SPARK-46296) Test captured errors of TestResult (pyspark.errors.exceptions.captured)

2023-12-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46296:
---
Labels: pull-request-available  (was: )

> Test captured errors of TestResult (pyspark.errors.exceptions.captured)
> ---
>
> Key: SPARK-46296
> URL: https://issues.apache.org/jira/browse/SPARK-46296
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> https://app.codecov.io/gh/apache/spark/commit/1a651753f4e760643d719add3b16acd311454c76/blob/python/pyspark/errors/exceptions/captured.py






[jira] [Created] (SPARK-46296) Test captured errors of TestResult (pyspark.errors.exceptions.captured)

2023-12-06 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46296:


 Summary: Test captured errors of TestResult 
(pyspark.errors.exceptions.captured)
 Key: SPARK-46296
 URL: https://issues.apache.org/jira/browse/SPARK-46296
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark, Tests
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


https://app.codecov.io/gh/apache/spark/commit/1a651753f4e760643d719add3b16acd311454c76/blob/python/pyspark/errors/exceptions/captured.py






[jira] [Resolved] (SPARK-46058) [CORE] Add separate flag for privateKeyPassword

2023-12-06 Thread Mridul Muralidharan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan resolved SPARK-46058.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43998
[https://github.com/apache/spark/pull/43998]

> [CORE] Add separate flag for privateKeyPassword
> ---
>
> Key: SPARK-46058
> URL: https://issues.apache.org/jira/browse/SPARK-46058
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Hasnain Lakhani
>Assignee: Hasnain Lakhani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Right now with config inheritance we support:
>  * JKS with password A, PEM with password B
>  * JKS with no password, PEM with password A
>  * JKS and PEM with no password
>  
> But we do not support the case where JKS has a password and PEM does not. If 
> we set keyPassword we will attempt to use it, and cannot set 
> `spark.ssl.rpc.keyPassword` to null. So let's make it a separate flag as the 
> easiest workaround.
>  
> This was noticed while migrating some existing deployments to the RPC SSL 
> support, where we use OpenSSL for RPC and a key with no password.
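The resolution order the ticket argues for can be sketched in a few lines. This is a hypothetical Python sketch assuming a dedicated `privateKeyPassword` option that does not fall back to `keyPassword`; the option names are modeled on the ticket's discussion and are illustrative only (the real implementation is Spark's Scala SSL configuration handling).

```python
# Hypothetical resolution of the PEM key password with a dedicated flag.
# Option names are modeled on the ticket and are illustrative only.
def resolve_pem_password(conf):
    # With a separate flag, "unset" cleanly means "the PEM key has no
    # password", instead of inheriting keyPassword (which cannot be
    # set back to null once configured).
    return conf.get("spark.ssl.rpc.privateKeyPassword")

# JKS keystore has a password, but the PEM private key does not:
conf = {"spark.ssl.rpc.keyPassword": "jks-secret"}
print(resolve_pem_password(conf))  # -> None
```

With the separate flag left unset, the PEM key is treated as password-less even though the JKS `keyPassword` is configured, which is exactly the fourth combination the ticket says is unsupported today.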






[jira] [Assigned] (SPARK-46058) [CORE] Add separate flag for privateKeyPassword

2023-12-06 Thread Mridul Muralidharan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan reassigned SPARK-46058:
---

Assignee: Hasnain Lakhani

> [CORE] Add separate flag for privateKeyPassword
> ---
>
> Key: SPARK-46058
> URL: https://issues.apache.org/jira/browse/SPARK-46058
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Hasnain Lakhani
>Assignee: Hasnain Lakhani
>Priority: Major
>  Labels: pull-request-available
>
> Right now with config inheritance we support:
>  * JKS with password A, PEM with password B
>  * JKS with no password, PEM with password A
>  * JKS and PEM with no password
>  
> But we do not support the case where JKS has a password and PEM does not. If 
> we set keyPassword we will attempt to use it, and cannot set 
> `spark.ssl.rpc.keyPassword` to null. So let's make it a separate flag as the 
> easiest workaround.
>  
> This was noticed while migrating some existing deployments to the RPC SSL 
> support, where we use OpenSSL for RPC and a key with no password.






[jira] [Assigned] (SPARK-46292) Show a summary of workers in MasterPage

2023-12-06 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-46292:
-

Assignee: Dongjoon Hyun

> Show a summary of workers in MasterPage
> ---
>
> Key: SPARK-46292
> URL: https://issues.apache.org/jira/browse/SPARK-46292
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, Web UI
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-46292) Show a summary of workers in MasterPage

2023-12-06 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-46292.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44218
[https://github.com/apache/spark/pull/44218]

> Show a summary of workers in MasterPage
> ---
>
> Key: SPARK-46292
> URL: https://issues.apache.org/jira/browse/SPARK-46292
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, Web UI
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-46290) Change saveMode to overwrite for DataSourceWriter constructor

2023-12-06 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46290:


Assignee: Allison Wang

> Change saveMode to overwrite for DataSourceWriter constructor
> -
>
> Key: SPARK-46290
> URL: https://issues.apache.org/jira/browse/SPARK-46290
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-46290) Change saveMode to overwrite for DataSourceWriter constructor

2023-12-06 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46290.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44216
[https://github.com/apache/spark/pull/44216]

> Change saveMode to overwrite for DataSourceWriter constructor
> -
>
> Key: SPARK-46290
> URL: https://issues.apache.org/jira/browse/SPARK-46290
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-46295) TPCDS q39a and a39b have correctness issues with broadcast hash join and shuffled hash join

2023-12-06 Thread Kazuyuki Tanimura (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuyuki Tanimura updated SPARK-46295:
--
Affects Version/s: 3.4.2
   (was: 3.4.1)

> TPCDS q39a and a39b have correctness issues with broadcast hash join and 
> shuffled hash join
> ---
>
> Key: SPARK-46295
> URL: https://issues.apache.org/jira/browse/SPARK-46295
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.2, 3.5.0, 4.0.0
>Reporter: Kazuyuki Tanimura
>Priority: Major
>  Labels: correctness
>
> {{TPCDSQueryTestSuite}} fails for q39a and a39b with 
> {{broadcastHashJoinConf}} and {{shuffledHashJoinConf}}. It works fine with 
> {{sortMergeJoinConf}}.
> {code}SPARK_TPCDS_DATA= build/sbt "~sql/testOnly 
> *TPCDSQueryTestSuite -- -z q39a"{code}
> {code}
> [info] - q39a *** FAILED *** (19 seconds, 139 milliseconds)
> [info]   java.lang.Exception: Expected "...25 1.022382911080458[8 ..." but 
> got "...25 1.022382911080458[5 ..."
> {code}
> {code}SPARK_TPCDS_DATA= build/sbt "~sql/testOnly 
> *TPCDSQueryTestSuite -- -z q39b"{code}
> {code}
> [info] - q39b *** FAILED *** (19 seconds, 351 milliseconds)
> [info]   java.lang.Exception: Expected "...34  1.563403519178623[3  3  10427  2  381.25  1.0623056061004696
> [info] 3  3315  1  271.75  1.555976998814345  3  3315  2  393.75  1.0196319345405949
> [info] 3  3393  1  260.0  1.5009563026568116  3  3393  2  470.25  1.129275872154205
> [info] 4  16211  1  257.7  1.6381074811154002]  4  16211  2  352.25  1", but got "...34  1.563403519178623[5  3  10427  2  381.25  1.0623056061004696
> [info] 3  3315  1  271.75  1.555976998814345  3  3315  2  393.75  1.0196319345405949
> [info] 3  3393  1  260.0  1.5009563026568118  3  3393  2  470.25  1.129275872154205
> [info] 4  16211  1  257.7  1.6381074811154]  4  16211  2  352.25  1" Result did not match
> {code}






[jira] [Updated] (SPARK-46295) TPCDS q39a and a39b have correctness issues with broadcast hash join and shuffled hash join

2023-12-06 Thread Kazuyuki Tanimura (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuyuki Tanimura updated SPARK-46295:
--
Labels: correctness  (was: )

> TPCDS q39a and a39b have correctness issues with broadcast hash join and 
> shuffled hash join
> ---
>
> Key: SPARK-46295
> URL: https://issues.apache.org/jira/browse/SPARK-46295
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.1, 3.5.0, 4.0.0
>Reporter: Kazuyuki Tanimura
>Priority: Major
>  Labels: correctness
>
> {{TPCDSQueryTestSuite}} fails for q39a and a39b with 
> {{broadcastHashJoinConf}} and {{shuffledHashJoinConf}}. It works fine with 
> {{sortMergeJoinConf}}.
> {code}SPARK_TPCDS_DATA= build/sbt "~sql/testOnly 
> *TPCDSQueryTestSuite -- -z q39a"{code}
> {code}
> [info] - q39a *** FAILED *** (19 seconds, 139 milliseconds)
> [info]   java.lang.Exception: Expected "...25 1.022382911080458[8 ..." but 
> got "...25 1.022382911080458[5 ..."
> {code}
> {code}SPARK_TPCDS_DATA= build/sbt "~sql/testOnly 
> *TPCDSQueryTestSuite -- -z q39b"{code}
> {code}
> [info] - q39b *** FAILED *** (19 seconds, 351 milliseconds)
> [info]   java.lang.Exception: Expected "...34  1.563403519178623[3  3  10427  2  381.25  1.0623056061004696
> [info] 3  3315  1  271.75  1.555976998814345  3  3315  2  393.75  1.0196319345405949
> [info] 3  3393  1  260.0  1.5009563026568116  3  3393  2  470.25  1.129275872154205
> [info] 4  16211  1  257.7  1.6381074811154002]  4  16211  2  352.25  1", but got "...34  1.563403519178623[5  3  10427  2  381.25  1.0623056061004696
> [info] 3  3315  1  271.75  1.555976998814345  3  3315  2  393.75  1.0196319345405949
> [info] 3  3393  1  260.0  1.5009563026568118  3  3393  2  470.25  1.129275872154205
> [info] 4  16211  1  257.7  1.6381074811154]  4  16211  2  352.25  1" Result did not match
> {code}






[jira] [Updated] (SPARK-46294) Clean up initValue vs zeroValue semantics in SQLMetrics

2023-12-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46294:
---
Labels: pull-request-available  (was: )

> Clean up initValue vs zeroValue semantics in SQLMetrics
> ---
>
> Key: SPARK-46294
> URL: https://issues.apache.org/jira/browse/SPARK-46294
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Davin Tjong
>Priority: Minor
>  Labels: pull-request-available
>
> The semantics of initValue and _zeroValue in SQLMetrics are confusing, since 
> they effectively mean the same thing. Changing them to the following would be 
> clearer, especially in terms of defining what an "invalid" metric is.
>  
> proposed definitions:
>  
> initValue is the starting value for a SQLMetric. If a metric has value equal 
> to its initValue, then it should be filtered out before aggregating with 
> SQLMetrics.stringValue().
>  
> zeroValue defines the lowest value considered valid. If a SQLMetric is 
> invalid, it is set to zeroValue upon receiving any updates, and it also 
> reports zeroValue as its value to avoid exposing it to the user 
> programmatically (a concern previously addressed in SPARK-41442).
> For many SQLMetrics, we use initValue = -1 and zeroValue = 0 to indicate that 
> the metric is by default invalid. At the end of a task, we will update the 
> metric making it valid, and the invalid metrics will be filtered out when 
> calculating min, max, etc. as a workaround for SPARK-11013.






[jira] [Created] (SPARK-46295) TPCDS q39a and a39b have correctness issues with broadcast hash join and shuffled hash join

2023-12-06 Thread Kazuyuki Tanimura (Jira)
Kazuyuki Tanimura created SPARK-46295:
-

 Summary: TPCDS q39a and a39b have correctness issues with 
broadcast hash join and shuffled hash join
 Key: SPARK-46295
 URL: https://issues.apache.org/jira/browse/SPARK-46295
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.5.0, 3.4.1, 4.0.0
Reporter: Kazuyuki Tanimura


{{TPCDSQueryTestSuite}} fails for q39a and a39b with {{broadcastHashJoinConf}} 
and {{shuffledHashJoinConf}}. It works fine with {{sortMergeJoinConf}}.

{code}SPARK_TPCDS_DATA= build/sbt "~sql/testOnly 
*TPCDSQueryTestSuite -- -z q39a"{code}
{code}
[info] - q39a *** FAILED *** (19 seconds, 139 milliseconds)
[info]   java.lang.Exception: Expected "...25   1.022382911080458[8 ..." but 
got "...25 1.022382911080458[5 ..."
{code}

{code}SPARK_TPCDS_DATA= build/sbt "~sql/testOnly 
*TPCDSQueryTestSuite -- -z q39b"{code}

{code}
[info] - q39b *** FAILED *** (19 seconds, 351 milliseconds)
[info]   java.lang.Exception: Expected "...34  1.563403519178623[3  3  10427  2  381.25  1.0623056061004696
[info] 3  3315  1  271.75  1.555976998814345  3  3315  2  393.75  1.0196319345405949
[info] 3  3393  1  260.0  1.5009563026568116  3  3393  2  470.25  1.129275872154205
[info] 4  16211  1  257.7  1.6381074811154002]  4  16211  2  352.25  1", but got "...34  1.563403519178623[5  3  10427  2  381.25  1.0623056061004696
[info] 3  3315  1  271.75  1.555976998814345  3  3315  2  393.75  1.0196319345405949
[info] 3  3393  1  260.0  1.5009563026568118  3  3393  2  470.25  1.129275872154205
[info] 4  16211  1  257.7  1.6381074811154]  4  16211  2  352.25  1" Result did not match
{code}






[jira] [Created] (SPARK-46294) Clean up initValue vs zeroValue semantics in SQLMetrics

2023-12-06 Thread Davin Tjong (Jira)
Davin Tjong created SPARK-46294:
---

 Summary: Clean up initValue vs zeroValue semantics in SQLMetrics
 Key: SPARK-46294
 URL: https://issues.apache.org/jira/browse/SPARK-46294
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.5.0
Reporter: Davin Tjong


The semantics of initValue and _zeroValue in SQLMetrics are confusing, since 
they effectively mean the same thing. Changing them to the following would be 
clearer, especially in terms of defining what an "invalid" metric is.
 
proposed definitions:
 
initValue is the starting value for a SQLMetric. If a metric has value equal to 
its initValue, then it should be filtered out before aggregating with 
SQLMetrics.stringValue().
 
zeroValue defines the lowest value considered valid. If a SQLMetric is invalid, 
it is set to zeroValue upon receiving any updates, and it also reports 
zeroValue as its value to avoid exposing it to the user programmatically 
(a concern previously addressed in SPARK-41442).

For many SQLMetrics, we use initValue = -1 and zeroValue = 0 to indicate that 
the metric is by default invalid. At the end of a task, we will update the 
metric making it valid, and the invalid metrics will be filtered out when 
calculating min, max, etc. as a workaround for SPARK-11013.
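The proposed semantics can be sketched in a few lines. This is a hypothetical Python mock of the behavior described above; Spark's actual SQLMetric is a Scala accumulator, and the class and function names here are illustrative only.

```python
# Hypothetical mock of the proposed initValue / zeroValue semantics.
# Spark's real SQLMetric is a Scala accumulator; everything here is
# illustrative only.
class SketchMetric:
    def __init__(self, init_value=-1, zero_value=0):
        self.init_value = init_value  # starting value; equal value => invalid
        self.zero_value = zero_value  # lowest value considered valid
        self._value = init_value

    def is_valid(self):
        return self._value != self.init_value

    def add(self, v):
        # An invalid metric becomes valid (reset to zero_value) on any update.
        if not self.is_valid():
            self._value = self.zero_value
        self._value += v

    @property
    def value(self):
        # Report zero_value instead of the sentinel so users never see -1.
        return self._value if self.is_valid() else self.zero_value


def string_value(metrics):
    # Invalid metrics are filtered out before aggregating min/max, mirroring
    # the workaround for SPARK-11013 described above.
    valid = [m.value for m in metrics if m.is_valid()]
    return (min(valid), max(valid)) if valid else None
```

A never-updated metric stays at its init_value of -1, reports 0 to callers, and is skipped by the aggregation, while a metric updated even once participates normally.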






[jira] [Updated] (SPARK-46294) Clean up initValue vs zeroValue semantics in SQLMetrics

2023-12-06 Thread Davin Tjong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davin Tjong updated SPARK-46294:

Component/s: SQL
 (was: Spark Core)

> Clean up initValue vs zeroValue semantics in SQLMetrics
> ---
>
> Key: SPARK-46294
> URL: https://issues.apache.org/jira/browse/SPARK-46294
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Davin Tjong
>Priority: Minor
>
> The semantics of initValue and _zeroValue in SQLMetrics are confusing, since 
> they effectively mean the same thing. Changing them to the following would be 
> clearer, especially in terms of defining what an "invalid" metric is.
>  
> proposed definitions:
>  
> initValue is the starting value for a SQLMetric. If a metric has value equal 
> to its initValue, then it should be filtered out before aggregating with 
> SQLMetrics.stringValue().
>  
> zeroValue defines the lowest value considered valid. If a SQLMetric is 
> invalid, it is set to zeroValue upon receiving any updates, and it also 
> reports zeroValue as its value to avoid exposing it to the user 
> programmatically (a concern previously addressed in SPARK-41442).
> For many SQLMetrics, we use initValue = -1 and zeroValue = 0 to indicate that 
> the metric is by default invalid. At the end of a task, we will update the 
> metric making it valid, and the invalid metrics will be filtered out when 
> calculating min, max, etc. as a workaround for SPARK-11013.






[jira] [Updated] (SPARK-44976) Preserve full principal user name on executor side

2023-12-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-44976:
---
Labels: pull-request-available  (was: )

> Preserve full principal user name on executor side
> --
>
> Key: SPARK-44976
> URL: https://issues.apache.org/jira/browse/SPARK-44976
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.3, 3.3.3, 3.4.1
>Reporter: YUBI LEE
>Priority: Major
>  Labels: pull-request-available
>
> SPARK-6558 changed the behavior of {{Utils.getCurrentUserName()}} to use the 
> short name instead of the full principal name.
> Due to this, it doesn't respect the {{hadoop.security.auth_to_local}} rule on 
> the non-kerberized HDFS namenode side.
> For example, I use two HDFS clusters: one is kerberized, the other is not.
> I have a rule that adds a prefix to the username on the non-kerberized 
> cluster when someone accesses it from the kerberized cluster.
> {code}
> <property>
>   <name>hadoop.security.auth_to_local</name>
>   <value>
> RULE:[1:$1@$0](.*@EXAMPLE.COM)s/(.+)@.*/_ex_$1/
> RULE:[2:$1@$0](.*@EXAMPLE.COM)s/(.+)@.*/_ex_$1/
> DEFAULT
>   </value>
> </property>
> {code}
> However, if I submit a Spark job with the keytab & principal options, the 
> ownership of HDFS directories and files is not coherent.
> (I have changed some words for privacy.)
> {code}
> $ hdfs dfs -ls hdfs:///user/eub/some/path/20230510/23
> Found 52 items
> -rw-rw-rw-   3 _ex_eub hdfs  0 2023-05-11 00:16 
> hdfs:///user/eub/some/path/20230510/23/_SUCCESS
> -rw-r--r--   3 eub  hdfs  134418857 2023-05-11 00:15 
> hdfs:///user/eub/some/path/20230510/23/part-0-b781be38-9dbc-41da-8d0e-597a7f343649-c000.txt.gz
> -rw-r--r--   3 eub  hdfs  153410049 2023-05-11 00:16 
> hdfs:///user/eub/some/path/20230510/23/part-1-b781be38-9dbc-41da-8d0e-597a7f343649-c000.txt.gz
> -rw-r--r--   3 eub  hdfs  157260989 2023-05-11 00:16 
> hdfs:///user/eub/some/path/20230510/23/part-2-b781be38-9dbc-41da-8d0e-597a7f343649-c000.txt.gz
> -rw-r--r--   3 eub  hdfs  156222760 2023-05-11 00:16 
> hdfs:///user/eub/some/path/20230510/23/part-3-b781be38-9dbc-41da-8d0e-597a7f343649-c000.txt.gz
> {code}
> Another interesting point is that if I submit a Spark job without the keytab 
> and principal options but with Kerberos authentication via {{kinit}}, it does 
> not follow the {{hadoop.security.auth_to_local}} rule at all.
> {code}
> $ hdfs dfs -ls  hdfs:///user/eub/output/
> Found 3 items
> -rw-rw-r--+  3 eub hdfs  0 2023-08-25 12:31 
> hdfs:///user/eub/output/_SUCCESS
> -rw-rw-r--+  3 eub hdfs512 2023-08-25 12:31 
> hdfs:///user/eub/output/part-0.gz
> -rw-rw-r--+  3 eub hdfs574 2023-08-25 12:31 
> hdfs:///user/eub/output/part-1.gz
> {code}
> I finally found that if I submit a Spark job with the {{--principal}} and 
> {{--keytab}} options, the UGI will be different.
> (refer to 
> https://github.com/apache/spark/blob/2583bd2c16a335747895c0843f438d0966f47ecd/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L905).
> Only the file ({{_SUCCESS}}) and the output directory created by the driver 
> (application master side) respect {{hadoop.security.auth_to_local}} on the 
> non-kerberized namenode, and only if the {{--principal}} and {{--keytab}} 
> options are provided.
> No matter whether HDFS files or directories are created by the executor or 
> the driver, they should respect the {{hadoop.security.auth_to_local}} rule 
> and be consistent.
> A workaround is to pass an additional argument that changes {{SPARK_USER}} on 
> the executor side, e.g. {{--conf spark.executorEnv.SPARK_USER=_ex_eub}}.
> {{--conf spark.yarn.appMasterEnv.SPARK_USER=_ex_eub}} causes an error: there 
> is logic that appends environment values using {{:}} (colon) as a separator.
> - 
> https://github.com/apache/spark/blob/4748d858b4478ea7503b792050d4735eae83b3cd/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L893
> - 
> https://github.com/apache/spark/blob/4748d858b4478ea7503b792050d4735eae83b3cd/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala#L52
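For illustration, the first rule quoted in the ticket can be approximated as follows. This is a simplified Python sketch of how such a mapping rewrites a one-component principal; the real translation is performed by Hadoop's KerberosName, whose rule grammar is much richer than this.

```python
import re

# Simplified sketch of the single rule quoted above:
#   RULE:[1:$1@$0](.*@EXAMPLE.COM)s/(.+)@.*/_ex_$1/
# Hadoop's KerberosName implements the full rule grammar; this only
# mirrors that one rule plus DEFAULT for one-component principals.
def apply_rule(principal):
    m = re.match(r"([^/@]+)@([^@]+)$", principal)  # user@REALM, no host part
    if not m:
        return principal  # multi-component principals are out of scope here
    formatted = f"{m.group(1)}@{m.group(2)}"  # the [1:$1@$0] format string
    if re.fullmatch(r".*@EXAMPLE\.COM", formatted):  # the rule's guard
        return re.sub(r"(.+)@.*", r"_ex_\1", formatted)  # s/(.+)@.*/_ex_$1/
    return m.group(1)  # DEFAULT: strip the realm

print(apply_rule("eub@EXAMPLE.COM"))  # -> _ex_eub
print(apply_rule("eub@OTHER.ORG"))    # -> eub
```

The bug report boils down to the executor side never seeing the full `eub@EXAMPLE.COM` principal, so this mapping cannot fire there and the unprefixed short name is used instead.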






[jira] [Updated] (SPARK-46293) Add protobuf to required dependency for Spark Connect

2023-12-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46293:
---
Labels: pull-request-available  (was: )

> Add protobuf to required dependency for Spark Connect
> -
>
> Key: SPARK-46293
> URL: https://issues.apache.org/jira/browse/SPARK-46293
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>
> Add missing required package for docs.






[jira] [Created] (SPARK-46293) Add protobuf to required dependency for Spark Connect

2023-12-06 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-46293:
---

 Summary: Add protobuf to required dependency for Spark Connect
 Key: SPARK-46293
 URL: https://issues.apache.org/jira/browse/SPARK-46293
 Project: Spark
  Issue Type: Bug
  Components: Connect, Documentation, PySpark
Affects Versions: 4.0.0
Reporter: Haejoon Lee


Add missing required package for docs.






[jira] [Updated] (SPARK-45580) Subquery changes the output schema of the outer query

2023-12-06 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-45580:
--
Fix Version/s: 3.5.1

> Subquery changes the output schema of the outer query
> -
>
> Key: SPARK-45580
> URL: https://issues.apache.org/jira/browse/SPARK-45580
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.3, 3.4.1, 3.5.0
>Reporter: Bruce Robbins
>Assignee: Bruce Robbins
>Priority: Blocker
>  Labels: correctness, pull-request-available
> Fix For: 4.0.0, 3.5.1
>
>
> A query can have an incorrect output schema because of a subquery.
> Assume this data:
> {noformat}
> create or replace temp view t1(a) as values (1), (2), (3), (7);
> create or replace temp view t2(c1) as values (1), (2), (3);
> create or replace temp view t3(col1) as values (3), (9);
> cache table t1;
> cache table t2;
> cache table t3;
> {noformat}
> When run in {{spark-sql}}, the following query has a superfluous boolean 
> column:
> {noformat}
> select *
> from t1
> where exists (
>   select c1
>   from t2
>   where a = c1
>   or a in (select col1 from t3)
> );
> 1 false
> 2 false
> 3 true
> {noformat}
> The result should be:
> {noformat}
> 1
> 2
> 3
> {noformat}
> When executed via the {{Dataset}} API, you don't see the incorrect result, 
> because the Dataset API truncates the right-side of the rows based on the 
> analyzed plan's schema (it's the optimized plan's schema that goes wrong).
> However, even with the {{Dataset}} API, this query goes wrong:
> {noformat}
> select (
>   select *
>   from t1
>   where exists (
> select c1
> from t2
> where a = c1
> or a in (select col1 from t3)
>   )
>   limit 1
> )
> from range(1);
> java.lang.AssertionError: assertion failed: Expects 1 field, but got 2; 
> something went wrong in analysis
>   at scala.Predef$.assert(Predef.scala:279)
>   at 
> org.apache.spark.sql.execution.ScalarSubquery.updateResult(subquery.scala:88)
>   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$waitForSubqueries$1(SparkPlan.scala:276)
>   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$waitForSubqueries$1$adapted(SparkPlan.scala:275)
>   at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:576)
>   at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:574)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:933)
> ...
> {noformat}
> Other queries that have the wrong schema:
> {noformat}
> select *
> from t1
> where a in (
>   select c1
>   from t2
>   where a in (select col1 from t3)
> );
> {noformat}
> and
> {noformat}
> select *
> from t1
> where not exists (
>   select c1
>   from t2
>   where a = c1
>   or a in (select col1 from t3)
> );
> {noformat}






[jira] [Assigned] (SPARK-46274) Range operator computeStats() proper long conversions

2023-12-06 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-46274:
---

Assignee: Kelvin Jiang

> Range operator computeStats() proper long conversions
> -
>
> Key: SPARK-46274
> URL: https://issues.apache.org/jira/browse/SPARK-46274
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Kelvin Jiang
>Assignee: Kelvin Jiang
>Priority: Major
>  Labels: pull-request-available
>
> Range operator's `computeStats()` function unsafely casts from `BigInt` to 
> `Long` and causes issues downstream with statistics estimation. This change 
> adds bounds checking to avoid crashing.






[jira] [Resolved] (SPARK-46274) Range operator computeStats() proper long conversions

2023-12-06 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-46274.
-
Fix Version/s: 3.5.1
   4.0.0
   Resolution: Fixed

Issue resolved by pull request 44191
[https://github.com/apache/spark/pull/44191]

> Range operator computeStats() proper long conversions
> -
>
> Key: SPARK-46274
> URL: https://issues.apache.org/jira/browse/SPARK-46274
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Kelvin Jiang
>Assignee: Kelvin Jiang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.1, 4.0.0
>
>
> Range operator's `computeStats()` function unsafely casts from `BigInt` to 
> `Long` and causes issues downstream with statistics estimation. This change 
> adds bounds checking to avoid crashing.
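The bounds checking described above amounts to clamping the arbitrary-precision estimate into the `Long` range rather than letting a narrowing cast wrap around. A minimal sketch of the idea (the actual fix lives in the Scala `computeStats()` of the Range operator):

```python
# Sketch of the bounds checking: clamp a BigInt-like row-count estimate
# into Long range instead of letting an unchecked narrowing cast wrap.
LONG_MIN, LONG_MAX = -(2 ** 63), 2 ** 63 - 1

def to_clamped_long(estimate: int) -> int:
    return max(LONG_MIN, min(LONG_MAX, estimate))

print(to_clamped_long(2 ** 70))  # -> 9223372036854775807 (Long.MaxValue)
```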






[jira] [Created] (SPARK-46292) Show a summary of workers in MasterPage

2023-12-06 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-46292:
-

 Summary: Show a summary of workers in MasterPage
 Key: SPARK-46292
 URL: https://issues.apache.org/jira/browse/SPARK-46292
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core, Web UI
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun









[jira] [Updated] (SPARK-46292) Show a summary of workers in MasterPage

2023-12-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46292:
---
Labels: pull-request-available  (was: )

> Show a summary of workers in MasterPage
> ---
>
> Key: SPARK-46292
> URL: https://issues.apache.org/jira/browse/SPARK-46292
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, Web UI
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-46290) Change saveMode to overwrite for DataSourceWriter constructor

2023-12-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46290:
---
Labels: pull-request-available  (was: )

> Change saveMode to overwrite for DataSourceWriter constructor
> -
>
> Key: SPARK-46290
> URL: https://issues.apache.org/jira/browse/SPARK-46290
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-46291) Koalas Testing Migration

2023-12-06 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-46291:
-
Description: Migrate tests from the Koalas repository to the Spark repository, 
including setting up the testing environment, dependencies, and CI jobs.

> Koalas Testing Migration
> 
>
> Key: SPARK-46291
> URL: https://issues.apache.org/jira/browse/SPARK-46291
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>
> Migrate tests from the Koalas repository to the Spark repository, including 
> setting up the testing environment, dependencies, and CI jobs.






[jira] [Updated] (SPARK-46291) Koalas Testing Migration

2023-12-06 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-46291:
-
Summary: Koalas Testing Migration  (was: Testing migration)

> Koalas Testing Migration
> 
>
> Key: SPARK-46291
> URL: https://issues.apache.org/jira/browse/SPARK-46291
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>







[jira] [Assigned] (SPARK-46291) Testing migration

2023-12-06 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng reassigned SPARK-46291:


Assignee: Xinrong Meng

> Testing migration
> -
>
> Key: SPARK-46291
> URL: https://issues.apache.org/jira/browse/SPARK-46291
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>







[jira] [Resolved] (SPARK-46291) Testing migration

2023-12-06 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng resolved SPARK-46291.
--
Resolution: Done

> Testing migration
> -
>
> Key: SPARK-46291
> URL: https://issues.apache.org/jira/browse/SPARK-46291
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>







[jira] [Updated] (SPARK-34999) Consolidate PySpark testing utils

2023-12-06 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-34999:
-
Parent Issue: SPARK-46291  (was: SPARK-34849)

> Consolidate PySpark testing utils
> -
>
> Key: SPARK-34999
> URL: https://issues.apache.org/jira/browse/SPARK-34999
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.2.0
>
>
> `python/pyspark/pandas/testing` holds test utilities for pandas-on-Spark, and 
> `python/pyspark/testing` contains test utilities for PySpark. Consolidating 
> them makes the code cleaner and easier to maintain.






[jira] [Updated] (SPARK-35012) Port Koalas DataFrame related unit tests into PySpark

2023-12-06 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-35012:
-
Parent Issue: SPARK-46291  (was: SPARK-34849)

> Port Koalas DataFrame related unit tests into PySpark
> -
>
> Key: SPARK-35012
> URL: https://issues.apache.org/jira/browse/SPARK-35012
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.2.0
>
>
> This JIRA aims to port Koalas DataFrame related unit tests to [PySpark 
> tests|https://github.com/apache/spark/tree/master/python/pyspark/tests].






[jira] [Updated] (SPARK-35300) Standardize module name in install.rst

2023-12-06 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-35300:
-
Parent Issue: SPARK-46291  (was: SPARK-34849)

> Standardize module name in install.rst
> --
>
> Key: SPARK-35300
> URL: https://issues.apache.org/jira/browse/SPARK-35300
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.2.0
>
>
> We should use the full names of modules in install.rst.






[jira] [Updated] (SPARK-35034) Port Koalas miscellaneous unit tests into PySpark

2023-12-06 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-35034:
-
Parent Issue: SPARK-46291  (was: SPARK-34849)

> Port Koalas miscellaneous unit tests into PySpark
> -
>
> Key: SPARK-35034
> URL: https://issues.apache.org/jira/browse/SPARK-35034
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.2.0
>
>
> This JIRA aims to port Koalas miscellaneous unit tests to [PySpark 
> tests|https://github.com/apache/spark/tree/master/python/pyspark/tests].






[jira] [Updated] (SPARK-35035) Port Koalas internal implementation unit tests into PySpark

2023-12-06 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-35035:
-
Parent Issue: SPARK-46291  (was: SPARK-34849)

> Port Koalas internal implementation unit tests into PySpark
> ---
>
> Key: SPARK-35035
> URL: https://issues.apache.org/jira/browse/SPARK-35035
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.2.0
>
>
> This JIRA aims to port Koalas internal implementation related unit tests to 
> [PySpark 
> tests|https://github.com/apache/spark/tree/master/python/pyspark/tests].






[jira] [Updated] (SPARK-35040) Remove Spark-version related codes from test codes.

2023-12-06 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-35040:
-
Parent Issue: SPARK-46291  (was: SPARK-34849)

> Remove Spark-version related codes from test codes.
> ---
>
> Key: SPARK-35040
> URL: https://issues.apache.org/jira/browse/SPARK-35040
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.2.0
>
>
> There are several places that check the PySpark version and switch the tests 
> accordingly, but those checks are no longer necessary.
> We should remove them.






[jira] [Updated] (SPARK-35098) Revisit pandas-on-Spark test cases that are disabled because of pandas nondeterministic return values

2023-12-06 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-35098:
-
Parent Issue: SPARK-46291  (was: SPARK-34849)

> Revisit pandas-on-Spark test cases that are disabled because of pandas 
> nondeterministic return values
> -
>
> Key: SPARK-35098
> URL: https://issues.apache.org/jira/browse/SPARK-35098
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.2.0
>
>
> Some test cases have been disabled in the places shown below because of 
> pandas' nondeterministic return values:
>  * pandas returns `None` or `nan` randomly
> python/pyspark/pandas/tests/test_series.py test_astype
>  * pandas returns `True` or `False` randomly
> python/pyspark/pandas/tests/indexes/test_base.py test_monotonic
> We should revisit them later.






[jira] [Updated] (SPARK-35033) Port Koalas plot unit tests into PySpark

2023-12-06 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-35033:
-
Parent Issue: SPARK-46291  (was: SPARK-34849)

> Port Koalas plot unit tests into PySpark
> 
>
> Key: SPARK-35033
> URL: https://issues.apache.org/jira/browse/SPARK-35033
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.2.0
>
>
> This JIRA aims to port Koalas plot unit tests to [PySpark 
> tests|https://github.com/apache/spark/tree/master/python/pyspark/tests].






[jira] [Updated] (SPARK-35032) Port Koalas Index unit tests into PySpark

2023-12-06 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-35032:
-
Parent Issue: SPARK-46291  (was: SPARK-34849)

> Port Koalas Index unit tests into PySpark
> -
>
> Key: SPARK-35032
> URL: https://issues.apache.org/jira/browse/SPARK-35032
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.2.0
>
>
> This JIRA aims to port Koalas Index unit tests to [PySpark 
> tests|https://github.com/apache/spark/tree/master/python/pyspark/tests].






[jira] [Updated] (SPARK-35031) Port Koalas operations on different frames tests into PySpark

2023-12-06 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-35031:
-
Parent Issue: SPARK-46291  (was: SPARK-34849)

> Port Koalas operations on different frames tests into PySpark
> -
>
> Key: SPARK-35031
> URL: https://issues.apache.org/jira/browse/SPARK-35031
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.2.0
>
>
> This JIRA aims to port Koalas operations on different frames related unit 
> tests to [PySpark 
> tests|https://github.com/apache/spark/tree/master/python/pyspark/tests].






[jira] [Updated] (SPARK-34996) Port Koalas Series related unit tests into PySpark

2023-12-06 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-34996:
-
Parent Issue: SPARK-46291  (was: SPARK-34849)

> Port Koalas Series related unit tests into PySpark
> --
>
> Key: SPARK-34996
> URL: https://issues.apache.org/jira/browse/SPARK-34996
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.2.0
>
>
> This JIRA aims to port Koalas Series related unit tests to [PySpark 
> tests|https://github.com/apache/spark/tree/master/python/pyspark/tests].






[jira] [Updated] (SPARK-34887) Port/integrate Koalas dependencies into PySpark

2023-12-06 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-34887:
-
Parent Issue: SPARK-46291  (was: SPARK-34849)

> Port/integrate Koalas dependencies into PySpark
> ---
>
> Key: SPARK-34887
> URL: https://issues.apache.org/jira/browse/SPARK-34887
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Haejoon Lee
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.2.0
>
>
> This JIRA aims to port Koalas dependencies appropriately to PySpark 
> dependencies.






[jira] [Updated] (SPARK-34886) Port/integrate Koalas DataFrame unit test into PySpark

2023-12-06 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-34886:
-
Parent Issue: SPARK-46291  (was: SPARK-34849)

> Port/integrate Koalas DataFrame unit test into PySpark
> --
>
> Key: SPARK-34886
> URL: https://issues.apache.org/jira/browse/SPARK-34886
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Haejoon Lee
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.2.0
>
>
> This JIRA aims to port [Koalas DataFrame 
> test|https://github.com/databricks/koalas/tree/master/databricks/koalas/tests/test_dataframe.py]
>  appropriately to [PySpark 
> tests|https://github.com/apache/spark/tree/master/python/pyspark/tests].






[jira] [Created] (SPARK-46291) Testing migration

2023-12-06 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-46291:


 Summary: Testing migration
 Key: SPARK-46291
 URL: https://issues.apache.org/jira/browse/SPARK-46291
 Project: Spark
  Issue Type: Umbrella
  Components: PySpark
Affects Versions: 3.2.0
Reporter: Xinrong Meng









[jira] [Updated] (SPARK-46275) Protobuf: Permissive mode should return null rather than struct with null fields

2023-12-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46275:
---
Labels: pull-request-available  (was: )

> Protobuf: Permissive mode should return null rather than struct with null 
> fields
> 
>
> Key: SPARK-46275
> URL: https://issues.apache.org/jira/browse/SPARK-46275
> Project: Spark
>  Issue Type: Bug
>  Components: Protobuf, Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Raghu Angadi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0, 3.5.1
>
>
> Consider a protobuf with two fields: {{message Person { string name = 1; int 
> id = 2; }}}
>  * The struct returned by {{from_protobuf("Person")}} looks like this:
>  ** STRUCT
>  * If the underlying binary record fails to deserialize, it results in an 
> exception and the query fails.
>  * But if the option {{mode}} is set to {{PERMISSIVE}}, malformed records are 
> tolerated and {{null}} is returned.
>  ** {*}BUT{*}: the returned struct looks like this: \{"name": null, "id": null}
>  *** This is not convenient for the user.
>  *** *Ideally,* {{from_protobuf()}} *should return* {{null}}*.*
>  ** {{from_protobuf()}} borrowed the current behavior from the {{from_avro()}} 
> implementation. It is not clear what the motivation was.
> I think we should update the implementation to return {{null}} rather than a 
> struct with null fields inside.
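The proposed behavior change can be illustrated with a small, self-contained Python sketch; `decode_record` and `parse_person` are hypothetical stand-ins for the connector's deserialization path, not Spark APIs:

```python
from typing import Callable, Optional

def decode_record(raw: bytes, parse: Callable[[bytes], dict]) -> Optional[dict]:
    """Permissive decode: on a malformed record, return None (the proposed
    behavior) rather than a struct whose fields are all null (the current
    behavior inherited from from_avro)."""
    try:
        return parse(raw)
    except ValueError:
        return None

def parse_person(raw: bytes) -> dict:
    """Toy parser standing in for protobuf deserialization of Person."""
    name, _, id_bytes = raw.partition(b",")
    if not id_bytes.isdigit():
        raise ValueError("malformed record")
    return {"name": name.decode(), "id": int(id_bytes)}
```

A caller can then distinguish "record was malformed" (`None`) from "record decoded with missing fields," which is the convenience the issue asks for.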






[jira] [Resolved] (SPARK-45580) Subquery changes the output schema of the outer query

2023-12-06 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-45580.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

> Subquery changes the output schema of the outer query
> -
>
> Key: SPARK-45580
> URL: https://issues.apache.org/jira/browse/SPARK-45580
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.3, 3.4.1, 3.5.0
>Reporter: Bruce Robbins
>Assignee: Bruce Robbins
>Priority: Blocker
>  Labels: correctness, pull-request-available
> Fix For: 4.0.0
>
>
> A query can have an incorrect output schema because of a subquery.
> Assume this data:
> {noformat}
> create or replace temp view t1(a) as values (1), (2), (3), (7);
> create or replace temp view t2(c1) as values (1), (2), (3);
> create or replace temp view t3(col1) as values (3), (9);
> cache table t1;
> cache table t2;
> cache table t3;
> {noformat}
> When run in {{spark-sql}}, the following query has a superfluous boolean 
> column:
> {noformat}
> select *
> from t1
> where exists (
>   select c1
>   from t2
>   where a = c1
>   or a in (select col1 from t3)
> );
> 1 false
> 2 false
> 3 true
> {noformat}
> The result should be:
> {noformat}
> 1
> 2
> 3
> {noformat}
> When executed via the {{Dataset}} API, you don't see the incorrect result, 
> because the Dataset API truncates the right side of the rows based on the 
> analyzed plan's schema (it's the optimized plan's schema that goes wrong).
> However, even with the {{Dataset}} API, this query goes wrong:
> {noformat}
> select (
>   select *
>   from t1
>   where exists (
> select c1
> from t2
> where a = c1
> or a in (select col1 from t3)
>   )
>   limit 1
> )
> from range(1);
> java.lang.AssertionError: assertion failed: Expects 1 field, but got 2; 
> something went wrong in analysis
>   at scala.Predef$.assert(Predef.scala:279)
>   at 
> org.apache.spark.sql.execution.ScalarSubquery.updateResult(subquery.scala:88)
>   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$waitForSubqueries$1(SparkPlan.scala:276)
>   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$waitForSubqueries$1$adapted(SparkPlan.scala:275)
>   at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:576)
>   at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:574)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:933)
> ...
> {noformat}
> Other queries that have the wrong schema:
> {noformat}
> select *
> from t1
> where a in (
>   select c1
>   from t2
>   where a in (select col1 from t3)
> );
> {noformat}
> and
> {noformat}
> select *
> from t1
> where not exists (
>   select c1
>   from t2
>   where a = c1
>   or a in (select col1 from t3)
> );
> {noformat}






[jira] [Assigned] (SPARK-46230) Migrate RetriesExceeded into PySpark error.

2023-12-06 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-46230:
-

Assignee: Haejoon Lee

> Migrate RetriesExceeded into PySpark error.
> ---
>
> Key: SPARK-46230
> URL: https://issues.apache.org/jira/browse/SPARK-46230
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-46230) Migrate RetriesExceeded into PySpark error.

2023-12-06 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-46230.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44147
[https://github.com/apache/spark/pull/44147]

> Migrate RetriesExceeded into PySpark error.
> ---
>
> Key: SPARK-46230
> URL: https://issues.apache.org/jira/browse/SPARK-46230
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-46270) Use java14 instanceof expressions to replace the java8 instanceof statement

2023-12-06 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-46270.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44187
[https://github.com/apache/spark/pull/44187]

> Use java14 instanceof expressions to replace the java8 instanceof statement
> ---
>
> Key: SPARK-46270
> URL: https://issues.apache.org/jira/browse/SPARK-46270
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Jiaan Geng
>Assignee: Jiaan Geng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-46290) Change saveMode to overwrite for DataSourceWriter constructor

2023-12-06 Thread Allison Wang (Jira)
Allison Wang created SPARK-46290:


 Summary: Change saveMode to overwrite for DataSourceWriter 
constructor
 Key: SPARK-46290
 URL: https://issues.apache.org/jira/browse/SPARK-46290
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Allison Wang









[jira] [Assigned] (SPARK-45580) Subquery changes the output schema of the outer query

2023-12-06 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-45580:
-

Assignee: Bruce Robbins

> Subquery changes the output schema of the outer query
> -
>
> Key: SPARK-45580
> URL: https://issues.apache.org/jira/browse/SPARK-45580
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.3, 3.4.1, 3.5.0
>Reporter: Bruce Robbins
>Assignee: Bruce Robbins
>Priority: Blocker
>  Labels: correctness, pull-request-available
>
> A query can have an incorrect output schema because of a subquery.
> Assume this data:
> {noformat}
> create or replace temp view t1(a) as values (1), (2), (3), (7);
> create or replace temp view t2(c1) as values (1), (2), (3);
> create or replace temp view t3(col1) as values (3), (9);
> cache table t1;
> cache table t2;
> cache table t3;
> {noformat}
> When run in {{spark-sql}}, the following query has a superfluous boolean 
> column:
> {noformat}
> select *
> from t1
> where exists (
>   select c1
>   from t2
>   where a = c1
>   or a in (select col1 from t3)
> );
> 1 false
> 2 false
> 3 true
> {noformat}
> The result should be:
> {noformat}
> 1
> 2
> 3
> {noformat}
> When executed via the {{Dataset}} API, you don't see the incorrect result, 
> because the Dataset API truncates the right side of the rows based on the 
> analyzed plan's schema (it's the optimized plan's schema that goes wrong).
> However, even with the {{Dataset}} API, this query goes wrong:
> {noformat}
> select (
>   select *
>   from t1
>   where exists (
> select c1
> from t2
> where a = c1
> or a in (select col1 from t3)
>   )
>   limit 1
> )
> from range(1);
> java.lang.AssertionError: assertion failed: Expects 1 field, but got 2; 
> something went wrong in analysis
>   at scala.Predef$.assert(Predef.scala:279)
>   at 
> org.apache.spark.sql.execution.ScalarSubquery.updateResult(subquery.scala:88)
>   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$waitForSubqueries$1(SparkPlan.scala:276)
>   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$waitForSubqueries$1$adapted(SparkPlan.scala:275)
>   at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:576)
>   at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:574)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:933)
> ...
> {noformat}
> Other queries that have the wrong schema:
> {noformat}
> select *
> from t1
> where a in (
>   select c1
>   from t2
>   where a in (select col1 from t3)
> );
> {noformat}
> and
> {noformat}
> select *
> from t1
> where not exists (
>   select c1
>   from t2
>   where a = c1
>   or a in (select col1 from t3)
> );
> {noformat}






[jira] [Updated] (SPARK-45580) Subquery changes the output schema of the outer query

2023-12-06 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-45580:
--
Target Version/s: 3.3.4

> Subquery changes the output schema of the outer query
> -
>
> Key: SPARK-45580
> URL: https://issues.apache.org/jira/browse/SPARK-45580
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.3, 3.4.1, 3.5.0
>Reporter: Bruce Robbins
>Priority: Blocker
>  Labels: correctness, pull-request-available
>
> A query can have an incorrect output schema because of a subquery.
> Assume this data:
> {noformat}
> create or replace temp view t1(a) as values (1), (2), (3), (7);
> create or replace temp view t2(c1) as values (1), (2), (3);
> create or replace temp view t3(col1) as values (3), (9);
> cache table t1;
> cache table t2;
> cache table t3;
> {noformat}
> When run in {{spark-sql}}, the following query has a superfluous boolean 
> column:
> {noformat}
> select *
> from t1
> where exists (
>   select c1
>   from t2
>   where a = c1
>   or a in (select col1 from t3)
> );
> 1 false
> 2 false
> 3 true
> {noformat}
> The result should be:
> {noformat}
> 1
> 2
> 3
> {noformat}
> When executed via the {{Dataset}} API, you don't see the incorrect result, 
> because the Dataset API truncates the right side of the rows based on the 
> analyzed plan's schema (it's the optimized plan's schema that goes wrong).
> However, even with the {{Dataset}} API, this query goes wrong:
> {noformat}
> select (
>   select *
>   from t1
>   where exists (
> select c1
> from t2
> where a = c1
> or a in (select col1 from t3)
>   )
>   limit 1
> )
> from range(1);
> java.lang.AssertionError: assertion failed: Expects 1 field, but got 2; 
> something went wrong in analysis
>   at scala.Predef$.assert(Predef.scala:279)
>   at 
> org.apache.spark.sql.execution.ScalarSubquery.updateResult(subquery.scala:88)
>   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$waitForSubqueries$1(SparkPlan.scala:276)
>   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$waitForSubqueries$1$adapted(SparkPlan.scala:275)
>   at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:576)
>   at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:574)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:933)
> ...
> {noformat}
> Other queries that have the wrong schema:
> {noformat}
> select *
> from t1
> where a in (
>   select c1
>   from t2
>   where a in (select col1 from t3)
> );
> {noformat}
> and
> {noformat}
> select *
> from t1
> where not exists (
>   select c1
>   from t2
>   where a = c1
>   or a in (select col1 from t3)
> );
> {noformat}
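Independent of Spark, the expected single-column result above can be cross-checked against another SQL engine. The sketch below uses Python's built-in sqlite3 with the same data and the same EXISTS query; SQLite serves only as a reference for standard correlated-subquery semantics, not as a statement about Spark's behavior:

```python
import sqlite3

# Reproduce the example data from the report in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1(a);    INSERT INTO t1 VALUES (1), (2), (3), (7);
    CREATE TABLE t2(c1);   INSERT INTO t2 VALUES (1), (2), (3);
    CREATE TABLE t3(col1); INSERT INTO t3 VALUES (3), (9);
""")

# The outer query should produce exactly one column: the rows of t1 whose
# `a` satisfies the EXISTS predicate. No extra boolean column should appear.
rows = conn.execute("""
    SELECT * FROM t1
    WHERE EXISTS (
        SELECT c1 FROM t2
        WHERE a = c1 OR a IN (SELECT col1 FROM t3)
    )
""").fetchall()

print(rows)  # -> [(1,), (2,), (3,)]
```

Each result tuple has a single field, matching the expected output (1, 2, 3) in the report; the superfluous boolean column only appears in the buggy Spark plan.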



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45580) Subquery changes the output schema of the outer query

2023-12-06 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-45580:
--
Labels: correctness pull-request-available  (was: pull-request-available)

> Subquery changes the output schema of the outer query
> -
>
> Key: SPARK-45580
> URL: https://issues.apache.org/jira/browse/SPARK-45580
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.3, 3.4.1, 3.5.0
>Reporter: Bruce Robbins
>Priority: Major
>  Labels: correctness, pull-request-available
>






[jira] [Updated] (SPARK-45580) Subquery changes the output schema of the outer query

2023-12-06 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-45580:
--
Priority: Blocker  (was: Major)

> Subquery changes the output schema of the outer query
> -
>
> Key: SPARK-45580
> URL: https://issues.apache.org/jira/browse/SPARK-45580
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.3, 3.4.1, 3.5.0
>Reporter: Bruce Robbins
>Priority: Blocker
>  Labels: correctness, pull-request-available
>






[jira] [Commented] (SPARK-45580) Subquery changes the output schema of the outer query

2023-12-06 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17793889#comment-17793889
 ] 

Dongjoon Hyun commented on SPARK-45580:
---

I've raised this issue to blocker priority for Apache Spark 3.3.4.

> Subquery changes the output schema of the outer query
> -
>
> Key: SPARK-45580
> URL: https://issues.apache.org/jira/browse/SPARK-45580
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.3, 3.4.1, 3.5.0
>Reporter: Bruce Robbins
>Priority: Blocker
>  Labels: correctness, pull-request-available
>






[jira] [Commented] (SPARK-45580) Subquery changes the output schema of the outer query

2023-12-06 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17793888#comment-17793888
 ] 

Dongjoon Hyun commented on SPARK-45580:
---

Thank you, [~bersprockets].

> Subquery changes the output schema of the outer query
> -
>
> Key: SPARK-45580
> URL: https://issues.apache.org/jira/browse/SPARK-45580
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.3, 3.4.1, 3.5.0
>Reporter: Bruce Robbins
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Resolved] (SPARK-46283) Avoid testing the `streaming-kinesis-asl` module in the daily tests of branch-3.x.

2023-12-06 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-46283.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44204
[https://github.com/apache/spark/pull/44204]

> Avoid testing the `streaming-kinesis-asl` module in the daily tests of 
> branch-3.x.
> --
>
> Key: SPARK-46283
> URL: https://issues.apache.org/jira/browse/SPARK-46283
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> After the merge of https://github.com/apache/spark/pull/43736, the master 
> branch began testing the `streaming-kinesis-asl` module. 
> At the same time, because the daily tests reuse `build_and_test.yml`, the 
> daily tests of branch-3.x also began testing `streaming-kinesis-asl`. 
> However, in branch-3.x the env var `ENABLE_KINESIS_TESTS` is hard-coded to 1 
> in `dev/sparktestsupport/modules.py`:
> https://github.com/apache/spark/blob/1321b4e64deaa1e58bf297c25b72319083056568/dev/sparktestsupport/modules.py#L332-L346
> which leads to failures in the daily tests of branch-3.x:
> - branch-3.3: https://github.com/apache/spark/actions/runs/7111246311
> - branch-3.4: https://github.com/apache/spark/actions/runs/7098435892
> - branch-3.5: https://github.com/apache/spark/actions/runs/7099811235
> ```
> [info] 
> org.apache.spark.streaming.kinesis.WithoutAggregationKinesisStreamSuite *** 
> ABORTED *** (1 second, 14 milliseconds)
> [info]   java.lang.Exception: Kinesis tests enabled using environment 
> variable ENABLE_KINESIS_TESTS
> [info] but could not find AWS credentials. Please follow instructions in AWS 
> documentation
> [info] to set the credentials in your system such that the 
> DefaultAWSCredentialsProviderChain
> [info] can find the credentials.
> [info]   at 
> org.apache.spark.streaming.kinesis.KinesisTestUtils$.getAWSCredentials(KinesisTestUtils.scala:258)
> [info]   at 
> org.apache.spark.streaming.kinesis.KinesisTestUtils.kinesisClient$lzycompute(KinesisTestUtils.scala:58)
> [info]   at 
> org.apache.spark.streaming.kinesis.KinesisTestUtils.kinesisClient(KinesisTestUtils.scala:57)
> [info]   at 
> org.apache.spark.streaming.kinesis.KinesisTestUtils.describeStream(KinesisTestUtils.scala:168)
> [info]   at 
> org.apache.spark.streaming.kinesis.KinesisTestUtils.findNonExistentStreamName(KinesisTestUtils.scala:181)
> [info]   at 
> org.apache.spark.streaming.kinesis.KinesisTestUtils.createStream(KinesisTestUtils.scala:84)
> [info]   at 
> org.apache.spark.streaming.kinesis.KinesisStreamTests.$anonfun$beforeAll$1(KinesisStreamSuite.scala:61)
> [info]   at 
> org.apache.spark.streaming.kinesis.KinesisFunSuite.runIfTestsEnabled(KinesisFunSuite.scala:41)
> [info]   at 
> org.apache.spark.streaming.kinesis.KinesisFunSuite.runIfTestsEnabled$(KinesisFunSuite.scala:39)
> [info]   at 
> org.apache.spark.streaming.kinesis.KinesisStreamTests.runIfTestsEnabled(KinesisStreamSuite.scala:42)
> [info]   at 
> org.apache.spark.streaming.kinesis.KinesisStreamTests.beforeAll(KinesisStreamSuite.scala:59)
> [info]   at 
> org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
> [info]   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
> [info]   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
> [info]   at 
> org.apache.spark.streaming.kinesis.KinesisStreamTests.org$scalatest$BeforeAndAfter$$super$run(KinesisStreamSuite.scala:42)
> [info]   at org.scalatest.BeforeAndAfter.run(BeforeAndAfter.scala:273)
> [info]   at org.scalatest.BeforeAndAfter.run$(BeforeAndAfter.scala:271)
> [info]   at 
> org.apache.spark.streaming.kinesis.KinesisStreamTests.run(KinesisStreamSuite.scala:42)
> [info]   at 
> org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:321)
> [info]   at 
> org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:517)
> [info]   at sbt.ForkMain$Run.lambda$runTest$1(ForkMain.java:414)
> [info]   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> [info]   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [info]   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [info]   at java.lang.Thread.run(Thread.java:750)
> [info] Test run 
> org.apache.spark.streaming.kinesis.JavaKinesisInputDStreamBuilderSuite started
> [info] Test 
> org.apache.spark.streaming.kinesis.JavaKinesisInputDStreamBuilderSuite.testJavaKinesisDStreamBuilderOldApi
>  started
> [info] Test 
> 

[jira] [Resolved] (SPARK-46286) Document spark.io.compression.zstd.bufferPool.enabled

2023-12-06 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-46286.
---
Fix Version/s: 3.3.4
   3.4.3
   3.5.1
   4.0.0
   Resolution: Fixed

Issue resolved by pull request 44207
[https://github.com/apache/spark/pull/44207]

> Document spark.io.compression.zstd.bufferPool.enabled
> -
>
> Key: SPARK-46286
> URL: https://issues.apache.org/jira/browse/SPARK-46286
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.4, 3.4.3, 3.5.1, 4.0.0
>
>







[jira] [Assigned] (SPARK-46286) Document spark.io.compression.zstd.bufferPool.enabled

2023-12-06 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-46286:
-

Assignee: Kent Yao

> Document spark.io.compression.zstd.bufferPool.enabled
> -
>
> Key: SPARK-46286
> URL: https://issues.apache.org/jira/browse/SPARK-46286
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-46287) DataFrame.isEmpty should work with all datatypes

2023-12-06 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-46287.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44209
[https://github.com/apache/spark/pull/44209]

> DataFrame.isEmpty should work with all datatypes
> 
>
> Key: SPARK-46287
> URL: https://issues.apache.org/jira/browse/SPARK-46287
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-46288) Remove unused code in `pyspark.pandas.tests.frame.*`

2023-12-06 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-46288:
-

Assignee: Ruifeng Zheng

> Remove unused code in `pyspark.pandas.tests.frame.*`
> 
>
> Key: SPARK-46288
> URL: https://issues.apache.org/jira/browse/SPARK-46288
> Project: Spark
>  Issue Type: Test
>  Components: PS, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-46288) Remove unused code in `pyspark.pandas.tests.frame.*`

2023-12-06 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-46288.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44212
[https://github.com/apache/spark/pull/44212]

> Remove unused code in `pyspark.pandas.tests.frame.*`
> 
>
> Key: SPARK-46288
> URL: https://issues.apache.org/jira/browse/SPARK-46288
> Project: Spark
>  Issue Type: Test
>  Components: PS, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-46273) Support INSERT INTO/OVERWRITE using DSv2 sources

2023-12-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46273:
---
Labels: pull-request-available  (was: )

> Support INSERT INTO/OVERWRITE using DSv2 sources
> 
>
> Key: SPARK-46273
> URL: https://issues.apache.org/jira/browse/SPARK-46273
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-46289) Exception when ordering by UDT in interpreted mode

2023-12-06 Thread Bruce Robbins (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruce Robbins updated SPARK-46289:
--
Affects Version/s: 3.3.3

> Exception when ordering by UDT in interpreted mode
> --
>
> Key: SPARK-46289
> URL: https://issues.apache.org/jira/browse/SPARK-46289
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.3, 3.4.2, 3.5.0
>Reporter: Bruce Robbins
>Priority: Major
>
> In interpreted mode, ordering by a UDT will result in an exception. For 
> example:
> {noformat}
> import org.apache.spark.ml.linalg.{DenseVector, Vector}
> val df = Seq.tabulate(30) { x =>
>   (x, x + 1, x + 2,
>     new DenseVector(Array((x/100.0).toDouble, ((x + 1)/100.0).toDouble, ((x + 3)/100.0).toDouble)))
> }.toDF("id", "c1", "c2", "c3")
> df.createOrReplaceTempView("df")
> // this works
> sql("select * from df order by c3").collect
> sql("set spark.sql.codegen.wholeStage=false")
> sql("set spark.sql.codegen.factoryMode=NO_CODEGEN")
> // this gets an error
> sql("select * from df order by c3").collect
> {noformat}
> The second {{collect}} action results in the following exception:
> {noformat}
> org.apache.spark.SparkIllegalArgumentException: Type 
> UninitializedPhysicalType does not support ordered operations.
>   at 
> org.apache.spark.sql.errors.QueryExecutionErrors$.orderedOperationUnsupportedByDataTypeError(QueryExecutionErrors.scala:348)
>   at 
> org.apache.spark.sql.catalyst.types.UninitializedPhysicalType$.ordering(PhysicalDataType.scala:332)
>   at 
> org.apache.spark.sql.catalyst.types.UninitializedPhysicalType$.ordering(PhysicalDataType.scala:329)
>   at 
> org.apache.spark.sql.catalyst.expressions.InterpretedOrdering.compare(ordering.scala:60)
>   at 
> org.apache.spark.sql.catalyst.expressions.InterpretedOrdering.compare(ordering.scala:39)
>   at 
> org.apache.spark.sql.execution.UnsafeExternalRowSorter$RowComparator.compare(UnsafeExternalRowSorter.java:254)
> {noformat}
> Note: You don't get an error if you use {{show}} rather than {{collect}}. 
> This is because {{show}} will implicitly add a {{limit}}, in which case the 
> ordering is performed by {{TakeOrderedAndProject}} rather than 
> {{UnsafeExternalRowSorter}}.
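The note above about {{show}} vs {{collect}} comes down to which physical operator performs the ordering. As a language-agnostic sketch (plain Python, not Spark code): adding a limit turns a full sort of every row into a top-K selection, a different code path from the full-sort one on which the interpreted ordering raises.

```python
import heapq

rows = [(x, x / 100.0) for x in range(30)]  # stand-in rows, keyed by a sort column

# Full ordering, as `collect` on an ORDER BY triggers: every row is sorted.
# In Spark this is the UnsafeExternalRowSorter path from the stack trace above.
full_sort = sorted(rows, key=lambda r: r[1])

# ORDER BY plus an implicit LIMIT (what `show` adds) only needs the top K rows;
# Spark serves this with TakeOrderedAndProject instead.
top_k = heapq.nsmallest(5, rows, key=lambda r: r[1])

assert top_k == full_sort[:5]
```

Both paths still compare rows, so this only illustrates why the two actions exercise different operators, not why one comparator fails.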






[jira] [Created] (SPARK-46289) Exception when ordering by UDT in interpreted mode

2023-12-06 Thread Bruce Robbins (Jira)
Bruce Robbins created SPARK-46289:
-

 Summary: Exception when ordering by UDT in interpreted mode
 Key: SPARK-46289
 URL: https://issues.apache.org/jira/browse/SPARK-46289
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.5.0, 3.4.2
Reporter: Bruce Robbins


In interpreted mode, ordering by a UDT will result in an exception. For example:
{noformat}
import org.apache.spark.ml.linalg.{DenseVector, Vector}

val df = Seq.tabulate(30) { x =>
  (x, x + 1, x + 2,
    new DenseVector(Array((x/100.0).toDouble, ((x + 1)/100.0).toDouble, ((x + 3)/100.0).toDouble)))
}.toDF("id", "c1", "c2", "c3")

df.createOrReplaceTempView("df")

// this works
sql("select * from df order by c3").collect

sql("set spark.sql.codegen.wholeStage=false")
sql("set spark.sql.codegen.factoryMode=NO_CODEGEN")

// this gets an error
sql("select * from df order by c3").collect
{noformat}
The second {{collect}} action results in the following exception:
{noformat}
org.apache.spark.SparkIllegalArgumentException: Type UninitializedPhysicalType 
does not support ordered operations.
at 
org.apache.spark.sql.errors.QueryExecutionErrors$.orderedOperationUnsupportedByDataTypeError(QueryExecutionErrors.scala:348)
at 
org.apache.spark.sql.catalyst.types.UninitializedPhysicalType$.ordering(PhysicalDataType.scala:332)
at 
org.apache.spark.sql.catalyst.types.UninitializedPhysicalType$.ordering(PhysicalDataType.scala:329)
at 
org.apache.spark.sql.catalyst.expressions.InterpretedOrdering.compare(ordering.scala:60)
at 
org.apache.spark.sql.catalyst.expressions.InterpretedOrdering.compare(ordering.scala:39)
at 
org.apache.spark.sql.execution.UnsafeExternalRowSorter$RowComparator.compare(UnsafeExternalRowSorter.java:254)
{noformat}
Note: You don't get an error if you use {{show}} rather than {{collect}}. This 
is because {{show}} will implicitly add a {{limit}}, in which case the ordering 
is performed by {{TakeOrderedAndProject}} rather than 
{{UnsafeExternalRowSorter}}.






[jira] [Assigned] (SPARK-46173) Skipping trimAll call in stringToDate functions to avoid needless string copy

2023-12-06 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-46173:
---

Assignee: Aleksandar Tomic

> Skipping trimAll call in stringToDate functions to avoid needless string copy
> -
>
> Key: SPARK-46173
> URL: https://issues.apache.org/jira/browse/SPARK-46173
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Aleksandar Tomic
>Assignee: Aleksandar Tomic
>Priority: Major
>  Labels: pull-request-available
>
> In the stringToDate function we currently call trimAll first to remove any 
> whitespace and ISO control characters. Trimming copies the input string, 
> which is not really needed, since we can do all the parsing in place by 
> simply skipping the whitespace/ISO control characters.
> Given that customers have complained about the speed of stringToDate, 
> especially when the input string is long or potentially malformed, the 
> proposal is to skip the trimAll call and parse in place.
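The parse-in-place idea can be sketched in a few lines. This is a Python analogy of the proposal only, not Spark's actual UTF8String/stringToDate code, and the helper name is made up for illustration: two cursors skip the leading and trailing whitespace/control characters, so no trimmed intermediate copy of the input is created.

```python
def parse_digits_skipping_whitespace(s):
    """Hypothetical sketch: parse an integer between two cursors instead of
    trimming first, avoiding the intermediate trimmed copy a trim() makes."""
    i, j = 0, len(s)
    # Advance the left cursor past whitespace / ISO control characters.
    while i < j and (s[i].isspace() or ord(s[i]) < 0x20):
        i += 1
    # Pull the right cursor back past trailing whitespace / controls.
    while j > i and (s[j - 1].isspace() or ord(s[j - 1]) < 0x20):
        j -= 1
    if i == j:
        return None  # nothing but whitespace: malformed input
    # Parse the digits in place between the two cursors.
    value = 0
    for k in range(i, j):
        c = ord(s[k])
        if not 0x30 <= c <= 0x39:  # not '0'..'9'
            return None
        value = value * 10 + (c - 0x30)
    return value

print(parse_digits_skipping_whitespace("  \t 2023 \n"))  # -> 2023
```

For long, malformed inputs the win is that rejection happens without ever allocating a trimmed copy, which is the effect the proposal is after.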






[jira] [Resolved] (SPARK-46173) Skipping trimAll call in stringToDate functions to avoid needless string copy

2023-12-06 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-46173.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44110
[https://github.com/apache/spark/pull/44110]

> Skipping trimAll call in stringToDate functions to avoid needless string copy
> -
>
> Key: SPARK-46173
> URL: https://issues.apache.org/jira/browse/SPARK-46173
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Aleksandar Tomic
>Assignee: Aleksandar Tomic
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>






[jira] [Assigned] (SPARK-45888) Apply error class framework to state data source & state metadata data source

2023-12-06 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim reassigned SPARK-45888:


Assignee: Jungtaek Lim

> Apply error class framework to state data source & state metadata data source
> -
>
> Key: SPARK-45888
> URL: https://issues.apache.org/jira/browse/SPARK-45888
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Blocker
>  Labels: pull-request-available
>
> Intended to be a blocker issue for the release of state data source reader.






[jira] [Resolved] (SPARK-45888) Apply error class framework to state data source & state metadata data source

2023-12-06 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim resolved SPARK-45888.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44025
[https://github.com/apache/spark/pull/44025]

> Apply error class framework to state data source & state metadata data source
> -
>
> Key: SPARK-45888
> URL: https://issues.apache.org/jira/browse/SPARK-45888
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Intended to be a blocker issue for the release of state data source reader.






[jira] [Updated] (SPARK-46288) Remove unused code in `pyspark.pandas.tests.frame.*`

2023-12-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46288:
---
Labels: pull-request-available  (was: )

> Remove unused code in `pyspark.pandas.tests.frame.*`
> 
>
> Key: SPARK-46288
> URL: https://issues.apache.org/jira/browse/SPARK-46288
> Project: Spark
>  Issue Type: Test
>  Components: PS, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-46288) Remove unused code in `pyspark.pandas.tests.frame.*`

2023-12-06 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-46288:
-

 Summary: Remove unused code in `pyspark.pandas.tests.frame.*`
 Key: SPARK-46288
 URL: https://issues.apache.org/jira/browse/SPARK-46288
 Project: Spark
  Issue Type: Test
  Components: PS, Tests
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Updated] (SPARK-45720) Upgrade AWS SDK to v2 for Spark Kinesis connector

2023-12-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45720:
---
Labels: pull-request-available  (was: )

> Upgrade AWS SDK to v2 for Spark Kinesis connector
> -
>
> Key: SPARK-45720
> URL: https://issues.apache.org/jira/browse/SPARK-45720
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect Contrib
>Affects Versions: 3.5.0
>Reporter: Lantao Jin
>Priority: Major
>  Labels: pull-request-available
>
> Sub-task of [SPARK-44124|https://issues.apache.org/jira/browse/SPARK-44124]. 
> In this issue, we focus on the AWS SDK v2 upgrade in Kinesis connector






[jira] [Updated] (SPARK-46287) DataFrame.isEmpty should work with all datatypes

2023-12-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46287:
---
Labels: pull-request-available  (was: )

> DataFrame.isEmpty should work with all datatypes
> 
>
> Key: SPARK-46287
> URL: https://issues.apache.org/jira/browse/SPARK-46287
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>






