[jira] [Updated] (SPARK-46392) In Function DataSourceStrategy.translateFilterWithMapping, we need to transfer the cast expression to the data source for filtering
[ https://issues.apache.org/jira/browse/SPARK-46392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiahong.li updated SPARK-46392: --- Summary: In Function DataSourceStrategy.translateFilterWithMapping, we need to transfer the cast expression to the data source for filtering (was: In Function DataSourceStrategy.translateFilterWithMapping, we need to transfer the cast expression to the data source filter) > In Function DataSourceStrategy.translateFilterWithMapping, we need to transfer > the cast expression to the data source for filtering > - > > Key: SPARK-46392 > URL: https://issues.apache.org/jira/browse/SPARK-46392 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: jiahong.li >Priority: Minor > Labels: pull-request-available > > Consider this situation: > we create a partitioned table using a source that extends > TableProvider. If we select data from specific partitions and the partition > value's data type differs from the table's partition type, the partition > filter cannot be pushed down. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
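For illustration, a hedged sketch of the reported behavior. The table name and partition column below are hypothetical; the point is that a literal whose type differs from the partition column's type makes the analyzer wrap the column in a Cast, which translateFilterWithMapping currently cannot translate, so the predicate is not pushed to the source.

{code:scala}
// Hypothetical DSv2 table `myds_table` with a STRING partition column `dt`
// (assumes a running SparkSession named `spark`).
spark.sql("SELECT * FROM myds_table WHERE dt = 20231213")   // int literal: the column gets wrapped in a Cast, filter is not pushed down
spark.sql("SELECT * FROM myds_table WHERE dt = '20231213'") // matching type: filter is pushed down
{code}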
[jira] [Updated] (SPARK-46402) Add getMessageParameters and getQueryContext support
[ https://issues.apache.org/jira/browse/SPARK-46402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46402: --- Labels: pull-request-available (was: ) > Add getMessageParameters and getQueryContext support > > > Key: SPARK-46402 > URL: https://issues.apache.org/jira/browse/SPARK-46402 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46402) Add getMessageParameters and getQueryContext support
Hyukjin Kwon created SPARK-46402: Summary: Add getMessageParameters and getQueryContext support Key: SPARK-46402 URL: https://issues.apache.org/jira/browse/SPARK-46402 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 4.0.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
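For context, a minimal sketch of what this error API looks like from the classic (non-Connect) Scala client; this ticket is about exposing the same accessors through the Spark Connect client. A running SparkSession named `spark` is assumed, and the query and the error class named in the comment are illustrative assumptions.

{code:scala}
import org.apache.spark.SparkThrowable

// The bad column name simply forces an analysis error that carries
// structured error metadata.
try {
  spark.sql("SELECT nonexistent_col FROM range(1)").collect()
} catch {
  case e: SparkThrowable =>
    println(e.getErrorClass)          // e.g. UNRESOLVED_COLUMN.WITH_SUGGESTION
    println(e.getMessageParameters)   // java.util.Map of message parameters
    e.getQueryContext.foreach(c => println(c.fragment()))
}
{code}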
[jira] [Resolved] (SPARK-46394) spark.catalog.listDatabases() fails if containing schemas with special characters when spark.sql.legacy.keepCommandOutputSchema set to true
[ https://issues.apache.org/jira/browse/SPARK-46394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-46394. - Fix Version/s: 4.0.0 Assignee: Xinyi Yu Resolution: Fixed > spark.catalog.listDatabases() fails if containing schemas with special > characters when spark.sql.legacy.keepCommandOutputSchema set to true > --- > > Key: SPARK-46394 > URL: https://issues.apache.org/jira/browse/SPARK-46394 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Xinyi Yu >Assignee: Xinyi Yu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > When the SQL conf {{spark.sql.legacy.keepCommandOutputSchema}} is set to true: > Before: > // suppose there is a xyyu-db-with-hyphen schema in the catalog > spark.catalog.listDatabases() > [INVALID_IDENTIFIER] The identifier xyyu-db-with-hyphen is invalid. Please, > consider quoting it with back-quotes -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42307) Assign name to _LEGACY_ERROR_TEMP_2232
[ https://issues.apache.org/jira/browse/SPARK-42307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796561#comment-17796561 ] Yang Jie commented on SPARK-42307: -- Sorry, I mistakenly responded to this Jira, please ignore the previous message. > Assign name to _LEGACY_ERROR_TEMP_2232 > -- > > Key: SPARK-42307 > URL: https://issues.apache.org/jira/browse/SPARK-42307 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-44172) Use Jackson API Instead of Json4s
[ https://issues.apache.org/jira/browse/SPARK-44172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796560#comment-17796560 ] Yang Jie edited comment on SPARK-44172 at 12/14/23 5:30 AM: [~hannahkamundson] Thank you very much for your interest in this work, but it would be best to initiate a formal discussion in the dev mailing list before starting work. A long time ago, I submitted a [PR|https://github.com/apache/spark/pull/37604], but didn't get much of a response. I created this Jira, but I'm also not sure if now is the right time to drop the dependency on Json4s. was (Author: luciferyang): [~hannahkamundson] Thank you very much for your interest in this ticket, but it would be best to initiate a formal discussion in the dev mailing list before starting work. A long time ago, I submitted a [PR|https://github.com/apache/spark/pull/37604], but didn't get much of a response. I created this Jira, but I'm also not sure if now is the right time to drop the dependency on Json4s. > Use Jackson API Instead of Json4s > - > > Key: SPARK-44172 > URL: https://issues.apache.org/jira/browse/SPARK-44172 > Project: Spark > Issue Type: Sub-task > Components: Connect, MLlib, Spark Core, SQL, Structured Streaming >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] (SPARK-42307) Assign name to _LEGACY_ERROR_TEMP_2232
[ https://issues.apache.org/jira/browse/SPARK-42307 ] Yang Jie deleted comment on SPARK-42307: -- was (Author: luciferyang): [~hannahkamundson] Thank you very much for your interest in this ticket, but it would be best to initiate a formal discussion in the dev mailing list before starting work. A long time ago, I submitted a [PR|https://github.com/apache/spark/pull/37604], but didn't get much of a response. I created this Jira, but I'm also not sure if now is the right time to drop the dependency on Json4s. > Assign name to _LEGACY_ERROR_TEMP_2232 > -- > > Key: SPARK-42307 > URL: https://issues.apache.org/jira/browse/SPARK-42307 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44172) Use Jackson API Instead of Json4s
[ https://issues.apache.org/jira/browse/SPARK-44172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796560#comment-17796560 ] Yang Jie commented on SPARK-44172: -- [~hannahkamundson] Thank you very much for your interest in this ticket, but it would be best to initiate a formal discussion in the dev mailing list before starting work. A long time ago, I submitted a [PR|https://github.com/apache/spark/pull/37604], but didn't get much of a response. I created this Jira, but I'm also not sure if now is the right time to drop the dependency on Json4s. > Use Jackson API Instead of Json4s > - > > Key: SPARK-44172 > URL: https://issues.apache.org/jira/browse/SPARK-44172 > Project: Spark > Issue Type: Sub-task > Components: Connect, MLlib, Spark Core, SQL, Structured Streaming >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-42307) Assign name to _LEGACY_ERROR_TEMP_2232
[ https://issues.apache.org/jira/browse/SPARK-42307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796559#comment-17796559 ] Yang Jie edited comment on SPARK-42307 at 12/14/23 5:30 AM: [~hannahkamundson] Thank you very much for your interest in this ticket, but it would be best to initiate a formal discussion in the dev mailing list before starting work. A long time ago, I submitted a [PR|https://github.com/apache/spark/pull/37604], but didn't get much of a response. I created this Jira, but I'm also not sure if now is the right time to drop the dependency on Json4s. was (Author: luciferyang): [~hannahkamundson] Thank you very much for your interest in this ticket, but it would be best to initiate a formal discussion in the dev mailing list before starting work. A long time ago, I submitted a [PR|https://github.com/apache/spark/pull/37604], but didn't get much of a response. I created this Jira, but I'm also not sure if now is the right time to drop the dependency on Json4s. > Assign name to _LEGACY_ERROR_TEMP_2232 > -- > > Key: SPARK-42307 > URL: https://issues.apache.org/jira/browse/SPARK-42307 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42307) Assign name to _LEGACY_ERROR_TEMP_2232
[ https://issues.apache.org/jira/browse/SPARK-42307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796559#comment-17796559 ] Yang Jie commented on SPARK-42307: -- [~hannahkamundson] Thank you very much for your interest in this ticket, but it would be best to initiate a formal discussion in the dev mailing list before starting work. A long time ago, I submitted a [PR|https://github.com/apache/spark/pull/37604], but didn't get much of a response. I created this Jira, but I'm also not sure if now is the right time to drop the dependency on Json4s. > Assign name to _LEGACY_ERROR_TEMP_2232 > -- > > Key: SPARK-42307 > URL: https://issues.apache.org/jira/browse/SPARK-42307 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46401) Should use `!isEmpty()` on RoaringBitmap instead of `getCardinality() > 0`
[ https://issues.apache.org/jira/browse/SPARK-46401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46401: --- Labels: pull-request-available (was: ) > Should use `!isEmpty()` on RoaringBitmap instead of `getCardinality() > 0` > -- > > Key: SPARK-46401 > URL: https://issues.apache.org/jira/browse/SPARK-46401 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46401) Should use `!isEmpty()` on RoaringBitmap instead of `getCardinality() > 0`
Yang Jie created SPARK-46401: Summary: Should use `!isEmpty()` on RoaringBitmap instead of `getCardinality() > 0` Key: SPARK-46401 URL: https://issues.apache.org/jira/browse/SPARK-46401 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 4.0.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
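An illustrative sketch of the proposed pattern: `getCardinality()` has to tally set bits across the bitmap's containers, while `isEmpty()` only has to check that no container exists, so `!isEmpty()` is the cheaper emptiness test.

{code:scala}
import org.roaringbitmap.RoaringBitmap

val bitmap = new RoaringBitmap()
bitmap.add(42)
val nonEmptyBefore = bitmap.getCardinality() > 0 // pattern being replaced
val nonEmptyAfter  = !bitmap.isEmpty()           // preferred: avoids counting bits
assert(nonEmptyBefore == nonEmptyAfter)
{code}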
[jira] [Updated] (SPARK-46032) connect: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f
[ https://issues.apache.org/jira/browse/SPARK-46032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46032: --- Labels: pull-request-available (was: ) > connect: cannot assign instance of java.lang.invoke.SerializedLambda to field > org.apache.spark.rdd.MapPartitionsRDD.f > - > > Key: SPARK-46032 > URL: https://issues.apache.org/jira/browse/SPARK-46032 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: Bobby Wang >Priority: Major > Labels: pull-request-available > > I downloaded Spark 3.5 from the official Spark website, and then I started a > Spark Standalone cluster in which both the master and the only worker are on the > same node. > > Then I started the connect server by > {code:java} > start-connect-server.sh \ > --master spark://10.19.183.93:7077 \ > --packages org.apache.spark:spark-connect_2.12:3.5.0 \ > --conf spark.executor.cores=12 \ > --conf spark.task.cpus=1 \ > --executor-memory 30G \ > --conf spark.executor.resource.gpu.amount=1 \ > --conf spark.task.resource.gpu.amount=0.08 \ > --driver-memory 1G{code} > > I can 100% confirm from the web UI that the Spark standalone cluster, the connect server and the Spark > driver are started. > > Finally, I tried to run a very simple Spark job > (spark.range(100).filter("id>2").collect()) from the spark-connect client using > pyspark, but I got the error below. > > pyspark --remote sc://localhost > Python 3.10.0 (default, Mar 3 2022, 09:58:08) [GCC 7.5.0] on linux > Type "help", "copyright", "credits" or "license" for more information. > Welcome to > [Spark ASCII-art banner] version 3.5.0 > > Using Python version 3.10.0 (default, Mar 3 2022 09:58:08) > Client connected to the Spark Connect server at localhost > SparkSession available as 'spark'. > >>> spark.range(100).filter("id > 3").collect() > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > File > "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/dataframe.py", > line 1645, in collect > table, schema = self._session.client.to_table(query) > File > "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", > line 858, in to_table > table, schema, _, _, _ = self._execute_and_fetch(req) > File > "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", > line 1282, in _execute_and_fetch > for response in self._execute_and_fetch_as_iterator(req): > File > "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", > line 1263, in _execute_and_fetch_as_iterator > self._handle_error(error) > File > "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", > line 1502, in _handle_error > self._handle_rpc_error(error) > File > "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", > line 1538, in _handle_rpc_error > raise convert_exception(info, status.message) from None > pyspark.errors.exceptions.connect.SparkConnectGrpcException: > (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in > stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 > (TID 35) (10.19.183.93 executor 0): java.lang.ClassCastException: cannot 
assign instance of java.lang.invoke.SerializedLambda to field > org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance > of org.apache.spark.rdd.MapPartitionsRDD > at > java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2301) > at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1431) > at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2437) > at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) > at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) > at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) > at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) > at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) > at java.io.ObjectInputStream.readOb
[jira] [Updated] (SPARK-46384) Structured Streaming UI doesn't display graph correctly
[ https://issues.apache.org/jira/browse/SPARK-46384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46384: --- Labels: pull-request-available (was: ) > Structured Streaming UI doesn't display graph correctly > --- > > Key: SPARK-46384 > URL: https://issues.apache.org/jira/browse/SPARK-46384 > Project: Spark > Issue Type: Task > Components: Structured Streaming, Web UI >Affects Versions: 4.0.0 >Reporter: Wei Liu >Priority: Major > Labels: pull-request-available > > The Streaming UI is currently broken at Spark master. Running a simple query: > ``` > q = > spark.readStream.format("rate").load().writeStream.format("memory").queryName("test_wei").start() > ``` > makes the Spark UI show an empty graph for "Operation Duration": > [screenshot: empty "Operation Duration" graph] > Here is the error: > [screenshot: JavaScript error shown in the browser console] > > I verified the same query runs fine on Spark 3.5, as in the following graph. > [screenshot: "Operation Duration" graph rendering correctly on Spark 3.5] > > This is likely a problem from the library updates; a potential source of the error: [https://github.com/apache/spark/pull/42879] > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46357) Replace use of setConf with conf.set in docs
[ https://issues.apache.org/jira/browse/SPARK-46357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-46357. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44290 [https://github.com/apache/spark/pull/44290] > Replace use of setConf with conf.set in docs > > > Key: SPARK-46357 > URL: https://issues.apache.org/jira/browse/SPARK-46357 > Project: Spark > Issue Type: Improvement > Components: Documentation, SQL >Affects Versions: 3.5.0 >Reporter: Nicholas Chammas >Assignee: Nicholas Chammas >Priority: Trivial > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46357) Replace use of setConf with conf.set in docs
[ https://issues.apache.org/jira/browse/SPARK-46357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-46357: Assignee: Nicholas Chammas > Replace use of setConf with conf.set in docs > > > Key: SPARK-46357 > URL: https://issues.apache.org/jira/browse/SPARK-46357 > Project: Spark > Issue Type: Improvement > Components: Documentation, SQL >Affects Versions: 3.5.0 >Reporter: Nicholas Chammas >Assignee: Nicholas Chammas >Priority: Trivial > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46400) When there are corrupted files in the local maven repo, retry to skip this cache
[ https://issues.apache.org/jira/browse/SPARK-46400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BingKun Pan updated SPARK-46400: Summary: When there are corrupted files in the local maven repo, retry to skip this cache (was: When there are corrupted files in the local maven repo, try to skip this cache) > When there are corrupted files in the local maven repo, retry to skip this > cache > > > Key: SPARK-46400 > URL: https://issues.apache.org/jira/browse/SPARK-46400 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46400) When there are corrupted files in the local maven repo, try to skip this cache
[ https://issues.apache.org/jira/browse/SPARK-46400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46400: --- Labels: pull-request-available (was: ) > When there are corrupted files in the local maven repo, try to skip this cache > -- > > Key: SPARK-46400 > URL: https://issues.apache.org/jira/browse/SPARK-46400 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46400) When there are corrupted files in the local maven repo, try to skip this cache
BingKun Pan created SPARK-46400: --- Summary: When there are corrupted files in the local maven repo, try to skip this cache Key: SPARK-46400 URL: https://issues.apache.org/jira/browse/SPARK-46400 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 4.0.0 Reporter: BingKun Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
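A conceptual sketch of the retry idea only, not Spark's actual resolution code; the helper and all names in it are invented for illustration.

{code:scala}
import java.nio.file.{Files, Path}

// If an artifact cached in the local Maven repo turns out to be corrupted,
// drop the bad cache entry and retry the resolution once, forcing a fresh
// download instead of failing the whole submission.
def resolveWithRetry(cachedJar: Path)(resolve: () => Array[Byte]): Array[Byte] =
  try resolve()
  catch {
    case _: RuntimeException =>
      Files.deleteIfExists(cachedJar) // discard the corrupted cached file
      resolve()                       // second attempt bypasses the stale cache
  }
{code}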
[jira] [Updated] (SPARK-46399) Add exit status to the Application End event for the use of Spark Listener
[ https://issues.apache.org/jira/browse/SPARK-46399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46399: --- Labels: pull-request-available (was: ) > Add exit status to the Application End event for the use of Spark Listener > -- > > Key: SPARK-46399 > URL: https://issues.apache.org/jira/browse/SPARK-46399 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Reza Safi >Priority: Minor > Labels: pull-request-available > > Currently SparkListenerApplicationEnd only has a timestamp value and there is > no exit status recorded with it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46399) Add exit status to the Application End event for the use of Spark Listener
Reza Safi created SPARK-46399: - Summary: Add exit status to the Application End event for the use of Spark Listener Key: SPARK-46399 URL: https://issues.apache.org/jira/browse/SPARK-46399 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 3.5.0 Reporter: Reza Safi Currently SparkListenerApplicationEnd only has a timestamp value and there is no exit status recorded with it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
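A minimal sketch of a listener consuming the event as it exists today, which shows the gap: SparkListenerApplicationEnd carries only a timestamp, so a listener cannot tell whether the application succeeded or failed.

{code:scala}
import org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationEnd}

class AppEndListener extends SparkListener {
  override def onApplicationEnd(applicationEnd: SparkListenerApplicationEnd): Unit = {
    // Only the end time is available; no exit status field exists yet.
    println(s"Application ended at ${applicationEnd.time}")
  }
}
// Registration, assuming an active SparkContext named `sc`:
// sc.addSparkListener(new AppEndListener())
{code}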
[jira] [Updated] (SPARK-46398) Test rangeBetween window function (pyspark.sql.window)
[ https://issues.apache.org/jira/browse/SPARK-46398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46398: --- Labels: pull-request-available (was: ) > Test rangeBetween window function (pyspark.sql.window) > -- > > Key: SPARK-46398 > URL: https://issues.apache.org/jira/browse/SPARK-46398 > Project: Spark > Issue Type: Sub-task > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46398) Test rangeBetween window function (pyspark.sql.window)
Xinrong Meng created SPARK-46398: Summary: Test rangeBetween window function (pyspark.sql.window) Key: SPARK-46398 URL: https://issues.apache.org/jira/browse/SPARK-46398 Project: Spark Issue Type: Sub-task Components: PySpark, Tests Affects Versions: 4.0.0 Reporter: Xinrong Meng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
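For context, a small example of the behavior this sub-task adds tests for (assuming a SparkSession named `spark`): a range frame covering each row's `id` value up to that value plus 2.

{code:scala}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.sum

val w = Window.orderBy("id").rangeBetween(Window.currentRow, 2L)
spark.range(5).withColumn("rangeSum", sum("id").over(w)).show()
// id = 0 sums ids in [0, 2], id = 1 sums ids in [1, 3], and so on.
{code}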
[jira] [Created] (SPARK-46397) sha2(df.a, 1024) throws a different exception in Spark Connect
Hyukjin Kwon created SPARK-46397: Summary: sha2(df.a, 1024) throws a different exception in Spark Connect Key: SPARK-46397 URL: https://issues.apache.org/jira/browse/SPARK-46397 Project: Spark Issue Type: Bug Components: Connect Affects Versions: 4.0.0 Reporter: Hyukjin Kwon {code} from pyspark.sql import functions as sf spark.range(1).select(sf.sha2(sf.col("id"), 1024)).collect() {code} Non-connect: {code} ... pyspark.errors.exceptions.captured.IllegalArgumentException: requirement failed: numBits 1024 is not in the permitted values (0, 224, 256, 384, 512) {code} Connect: {code} ... pyspark.errors.exceptions.connect.AnalysisException: [DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE] Cannot resolve "sha2(id, 1024)" due to data type mismatch: Parameter 1 requires the "BINARY" type, however "id" has the type "BIGINT". SQLSTATE: 42K09; 'Project [unresolvedalias(sha2(id#1L, 1024))] +- Range (0, 1, step=1, splits=Some(1)) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46396) LegacyFastTimestampFormatter.parseOptional should not throw exception
[ https://issues.apache.org/jira/browse/SPARK-46396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46396: --- Labels: pull-request-available (was: ) > LegacyFastTimestampFormatter.parseOptional should not throw exception > - > > Key: SPARK-46396 > URL: https://issues.apache.org/jira/browse/SPARK-46396 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Labels: pull-request-available > > When setting spark.sql.legacy.timeParserPolicy=LEGACY, Spark will use the > LegacyFastTimestampFormatter to infer potential timestamp columns. The > inference shouldn't throw an exception. > However, when the input is 23012150952, there is an exception: > ``` > For input string: "23012150952" > java.lang.NumberFormatException: For input string: "23012150952" > at > java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:67) > at java.base/java.lang.Integer.parseInt(Integer.java:668) > at java.base/java.lang.Integer.parseInt(Integer.java:786) > at > org.apache.commons.lang3.time.FastDateParser$NumberStrategy.parse(FastDateParser.java:304) > at > org.apache.commons.lang3.time.FastDateParser.parse(FastDateParser.java:1045) > at org.apache.commons.lang3.time.FastDateFormat.parse(FastDateFormat.java:651) > at > org.apache.spark.sql.catalyst.util.LegacyFastTimestampFormatter.parseOptional(TimestampFormatter.scala:418) > ``` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46396) LegacyFastTimestampFormatter.parseOptional should not throw exception
Gengliang Wang created SPARK-46396: -- Summary: LegacyFastTimestampFormatter.parseOptional should not throw exception Key: SPARK-46396 URL: https://issues.apache.org/jira/browse/SPARK-46396 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.5.0 Reporter: Gengliang Wang Assignee: Gengliang Wang When setting spark.sql.legacy.timeParserPolicy=LEGACY, Spark will use the LegacyFastTimestampFormatter to infer potential timestamp columns. The inference shouldn't throw an exception. However, when the input is 23012150952, there is an exception: ``` For input string: "23012150952" java.lang.NumberFormatException: For input string: "23012150952" at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:67) at java.base/java.lang.Integer.parseInt(Integer.java:668) at java.base/java.lang.Integer.parseInt(Integer.java:786) at org.apache.commons.lang3.time.FastDateParser$NumberStrategy.parse(FastDateParser.java:304) at org.apache.commons.lang3.time.FastDateParser.parse(FastDateParser.java:1045) at org.apache.commons.lang3.time.FastDateFormat.parse(FastDateFormat.java:651) at org.apache.spark.sql.catalyst.util.LegacyFastTimestampFormatter.parseOptional(TimestampFormatter.scala:418) ``` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
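A hedged repro sketch assembled from the description (assuming a SparkSession named `spark`): with the LEGACY policy, CSV schema inference probes candidate values through parseOptional, and for an input like "23012150952" it should return None rather than raise NumberFormatException.

{code:scala}
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")
import spark.implicits._
val df = spark.read.option("inferSchema", "true").csv(Seq("23012150952").toDS())
df.printSchema() // expected: the column is inferred as a non-timestamp type, with no exception
{code}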
[jira] [Created] (SPARK-46395) Automatically generate SQL configuration tables for documentation
Nicholas Chammas created SPARK-46395: Summary: Automatically generate SQL configuration tables for documentation Key: SPARK-46395 URL: https://issues.apache.org/jira/browse/SPARK-46395 Project: Spark Issue Type: Improvement Components: Documentation, SQL Affects Versions: 3.5.0 Reporter: Nicholas Chammas -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42307) Assign name to _LEGACY_ERROR_TEMP_2232
[ https://issues.apache.org/jira/browse/SPARK-42307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-42307: --- Labels: pull-request-available (was: ) > Assign name to _LEGACY_ERROR_TEMP_2232 > -- > > Key: SPARK-42307 > URL: https://issues.apache.org/jira/browse/SPARK-42307 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42307) Assign name to _LEGACY_ERROR_TEMP_2232
[ https://issues.apache.org/jira/browse/SPARK-42307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796453#comment-17796453 ] Hannah Amundson commented on SPARK-42307: - I'll work on this > Assign name to _LEGACY_ERROR_TEMP_2232 > -- > > Key: SPARK-42307 > URL: https://issues.apache.org/jira/browse/SPARK-42307 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46393) Classify exceptions in the JDBC table catalog
[ https://issues.apache.org/jira/browse/SPARK-46393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46393: --- Labels: pull-request-available (was: ) > Classify exceptions in the JDBC table catalog > - > > Key: SPARK-46393 > URL: https://issues.apache.org/jira/browse/SPARK-46393 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Labels: pull-request-available > > Handle exceptions from JDBC drivers and convert them to AnalysisException > with error classes. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46385) Test aggregate functions for groups (pyspark.sql.group)
[ https://issues.apache.org/jira/browse/SPARK-46385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46385: Assignee: Xinrong Meng > Test aggregate functions for groups (pyspark.sql.group) > --- > > Key: SPARK-46385 > URL: https://issues.apache.org/jira/browse/SPARK-46385 > Project: Spark > Issue Type: Sub-task > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46385) Test aggregate functions for groups (pyspark.sql.group)
[ https://issues.apache.org/jira/browse/SPARK-46385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46385. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44322 [https://github.com/apache/spark/pull/44322] > Test aggregate functions for groups (pyspark.sql.group) > --- > > Key: SPARK-46385 > URL: https://issues.apache.org/jira/browse/SPARK-46385 > Project: Spark > Issue Type: Sub-task > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46394) spark.catalog.listDatabases() fails if containing schemas with special characters when spark.sql.legacy.keepCommandOutputSchema set to true
[ https://issues.apache.org/jira/browse/SPARK-46394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46394: --- Labels: pull-request-available (was: ) > spark.catalog.listDatabases() fails if containing schemas with special > characters when spark.sql.legacy.keepCommandOutputSchema set to true > --- > > Key: SPARK-46394 > URL: https://issues.apache.org/jira/browse/SPARK-46394 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Xinyi Yu >Priority: Major > Labels: pull-request-available > > When the SQL conf {{spark.sql.legacy.keepCommandOutputSchema}} is set to true: > Before: > // suppose there is a xyyu-db-with-hyphen schema in the catalog > spark.catalog.listDatabases() > [INVALID_IDENTIFIER] The identifier xyyu-db-with-hyphen is invalid. Please, > consider quoting it with back-quotes -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46153) XML: Add TimestampNTZType support
[ https://issues.apache.org/jira/browse/SPARK-46153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46153. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44329 [https://github.com/apache/spark/pull/44329] > XML: Add TimestampNTZType support > - > > Key: SPARK-46153 > URL: https://issues.apache.org/jira/browse/SPARK-46153 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Sandip Agarwala >Assignee: Sandip Agarwala >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46153) XML: Add TimestampNTZType support
[ https://issues.apache.org/jira/browse/SPARK-46153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46153: Assignee: Sandip Agarwala > XML: Add TimestampNTZType support > - > > Key: SPARK-46153 > URL: https://issues.apache.org/jira/browse/SPARK-46153 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Sandip Agarwala >Assignee: Sandip Agarwala >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] (SPARK-38465) Use error classes in org.apache.spark.launcher
[ https://issues.apache.org/jira/browse/SPARK-38465 ] Hannah Amundson deleted comment on SPARK-38465: - was (Author: hannahkamundson): I will work on this since it has been a while since the last comment > Use error classes in org.apache.spark.launcher > -- > > Key: SPARK-38465 > URL: https://issues.apache.org/jira/browse/SPARK-38465 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Bo Zhang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46394) spark.catalog.listDatabases() fails if containing schemas with special characters when spark.sql.legacy.keepCommandOutputSchema set to true
Xinyi Yu created SPARK-46394: Summary: spark.catalog.listDatabases() fails if containing schemas with special characters when spark.sql.legacy.keepCommandOutputSchema set to true Key: SPARK-46394 URL: https://issues.apache.org/jira/browse/SPARK-46394 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 4.0.0 Reporter: Xinyi Yu When the SQL conf {{spark.sql.legacy.keepCommandOutputSchema}} is set to true: Before: // suppose there is a xyyu-db-with-hyphen schema in the catalog spark.catalog.listDatabases() [INVALID_IDENTIFIER] The identifier xyyu-db-with-hyphen is invalid. Please, consider quoting it with back-quotes -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
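A hedged repro sketch assembled from the description (assuming a SparkSession named `spark`):

{code:scala}
spark.conf.set("spark.sql.legacy.keepCommandOutputSchema", "true")
spark.sql("CREATE DATABASE IF NOT EXISTS `xyyu-db-with-hyphen`")
// Before the fix this throws [INVALID_IDENTIFIER] instead of listing the schema:
spark.catalog.listDatabases()
{code}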
[jira] [Commented] (SPARK-38465) Use error classes in org.apache.spark.launcher
[ https://issues.apache.org/jira/browse/SPARK-38465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796412#comment-17796412 ] Hannah Amundson commented on SPARK-38465: - I will work on this since it has been a while since the last comment > Use error classes in org.apache.spark.launcher > -- > > Key: SPARK-38465 > URL: https://issues.apache.org/jira/browse/SPARK-38465 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Bo Zhang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-46373) Create DataFrame Bug
[ https://issues.apache.org/jira/browse/SPARK-46373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796385#comment-17796385 ] Bruce Robbins commented on SPARK-46373: --- Maybe due to this (from [the docs|https://spark.apache.org/docs/3.5.0/]): {quote}Spark runs on Java 8/11/17, Scala 2.12/2.13, Python 3.8+, and R 3.5+.{quote} Scala 3 is not listed as a supported version. > Create DataFrame Bug > > > Key: SPARK-46373 > URL: https://issues.apache.org/jira/browse/SPARK-46373 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Bleibtreu >Priority: Major > > Scala version is 3.3.1 > Spark version is 3.5.0 > I am using spark-core 3.5.1. I am trying to create a DataFrame through the > reflection API, but "No TypeTag available for Person" appears. I have > tried for a long time, but I still don't quite understand why TypeTag cannot > recognize my Person case class. > {code:java} > import sparkSession.implicits._ > import scala.reflect.runtime.universe._ > case class Person(name: String) > val a = List(Person("A"), Person("B"), Person("C")) > val df = sparkSession.createDataFrame(a) > df.show(){code} > [screenshot: "No TypeTag available for Person" error] > I tested it, and it is indeed a problem unique to Scala 3. > There is no problem on Scala 2.13. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42332) Convert `require` in `ComplexTypeMergingExpression.dataTypeCheck` to an error class
[ https://issues.apache.org/jira/browse/SPARK-42332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-42332: --- Labels: pull-request-available (was: ) > Convert `require` in `ComplexTypeMergingExpression.dataTypeCheck` to an error > class > --- > > Key: SPARK-42332 > URL: https://issues.apache.org/jira/browse/SPARK-42332 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Max Gekk >Priority: Major > Labels: pull-request-available > > See https://github.com/apache/spark/pull/39851#pullrequestreview-1280621488 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-46054) SPIP: Proposal to Adopt Google's Spark K8s Operator as Official Spark Operator
[ https://issues.apache.org/jira/browse/SPARK-46054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796381#comment-17796381 ] Soam Acharya commented on SPARK-46054: -- +1. As our recent [AWS re:Invent talk|https://www.youtube.com/watch?v=G9aNXEu_a8k] indicates, Pinterest relies heavily on the Spark Operator for our Spark on EKS platform. We would love to see continued investment in the Spark Operator and are happy to contribute potential changes we make to the codebase back to the community. We believe the proposed setup is a good vehicle for us to keep moving forward with the Spark Operator. > SPIP: Proposal to Adopt Google's Spark K8s Operator as Official Spark Operator > -- > > Key: SPARK-46054 > URL: https://issues.apache.org/jira/browse/SPARK-46054 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 3.5.0 >Reporter: Vara Bonthu >Priority: Minor > > *Description:* > This proposal aims to recommend the adoption of [Google's Spark K8s > Operator|https://github.com/GoogleCloudPlatform/spark-on-k8s-operator] as the > official Spark Operator for the Apache Spark community. The operator has > gained significant traction among many users and organizations and is used > heavily in production environments, but challenges related to maintenance and > governance necessitate this recommendation. > *Background:* > * Google's Spark K8s Operator is currently in use by hundreds of users and > organizations. However, due to maintenance issues, many of these users and > organizations have resorted to forking the repository and implementing their > own fixes. > * The project boasts an impressive user base with 167 contributors, 2.5k > likes, and endorsements from 45 organizations, as documented in the "Who is > using" document. Notably, there are many more organizations using it than the > initially reported 45. > * The primary issue at hand is that this project resides under the > GoogleCloudPlatform GitHub organization and is exclusively moderated by a > Google employee. Concerns have been raised by numerous users and customers > regarding the maintenance of the repository. > * The existing Google maintainers are constrained by limitations in terms of > time and support, which negatively impacts both the project and its user > community. > > *Recent Developments:* > * During KubeCon Chicago 2023, AWS OSS Architects (Vara Bonthu) and the > Apple infrastructure team engaged in discussions with Google's team, > specifically with Marcin Wielgus. They expressed their interest in > contributing the project to either the Kubeflow or Apache Spark community. > * *{color:#00875a}Marcin from Google confirmed their willingness to donate > the project to either of these communities.{color}* > * An adoption process has been initiated by the Kubeflow project under CNCF, > as documented in the following thread: [Link to the > thread|https://github.com/kubeflow/community/issues/648]. > > *Primary Goal:* > * The primary goal is to ensure the collaborative support and adoption of > Google's Spark Operator by the Apache Spark community, thereby avoiding the > development of redundant tools and reducing confusion among users. > *Next Steps:* > * *Meeting with Apache Spark Working Group Maintainers:* We propose > arranging a meeting with the Apache Spark working group maintainers to delve > deeper into this matter, address any questions or concerns they may have, and > collectively work towards a decision. 
> * *Establish a New Working Group:* Upon reaching an agreement, we intend to > create a new working group comprising members from diverse organizations who > are willing to contribute and collaborate on this initiative. > * *Repository Transfer:* Our plan involves transferring the project > repository from Google's organization to either the Apache or Kubeflow > organization, aligning with the chosen community. > * *Roadmap Development:* We will formulate a new roadmap that encompasses > immediate issue resolution and a long-term design strategy aimed at enhancing > performance, scalability, and security for this tool. > > We believe that working towards one Spark Operator will benefit the Apache > Spark community and address the current maintenance challenges. Your feedback > and support in this matter are highly valued. Let's collaborate to ensure a > robust and well-maintained Spark Operator for the Apache Spark community's > benefit. > *Community members are encouraged to leave their comments or give a thumbs-up > to express their support for adopting Google's Spark Operator as the official > Apache Spark operator.* > > *Proposed Authors* > Vara Bonthu (AWS) > Marcin Wielgus (Google) > --
[jira] [Created] (SPARK-46393) Classify exceptions in the JDBC table catalog
Max Gekk created SPARK-46393: Summary: Classify exceptions in the JDBC table catalog Key: SPARK-46393 URL: https://issues.apache.org/jira/browse/SPARK-46393 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Max Gekk Assignee: Max Gekk Handle exceptions from JDBC drivers and convert them to AnalysisException with error classes. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
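A conceptual sketch of the classification idea; the exception class and helper below are invented for illustration and are not Spark's actual API.

{code:scala}
import java.sql.SQLException

// Catch raw driver exceptions at the catalog boundary and rethrow them as a
// Spark-side exception that carries an error class, instead of leaking the
// driver-specific message format to users.
class CatalogException(errorClass: String, cause: Throwable)
  extends RuntimeException(s"[$errorClass] ${cause.getMessage}", cause)

def classifyException[T](errorClass: String)(f: => T): T =
  try f
  catch {
    case e: SQLException => throw new CatalogException(errorClass, e)
  }

// Usage sketch: classifyException("FAILED_JDBC.CREATE_TABLE") { stmt.executeUpdate(ddl) }
{code}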
[jira] [Commented] (SPARK-42332) Convert `require` in `ComplexTypeMergingExpression.dataTypeCheck` to an error class
[ https://issues.apache.org/jira/browse/SPARK-42332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796360#comment-17796360 ] Hannah Amundson commented on SPARK-42332: - I'll work on this! > Convert `require` in `ComplexTypeMergingExpression.dataTypeCheck` to an error > class > --- > > Key: SPARK-42332 > URL: https://issues.apache.org/jira/browse/SPARK-42332 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Max Gekk >Priority: Major > > See https://github.com/apache/spark/pull/39851#pullrequestreview-1280621488 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] (SPARK-45190) XML: StructType schema issue in pyspark connect
[ https://issues.apache.org/jira/browse/SPARK-45190 ] Hannah Amundson deleted comment on SPARK-45190: - was (Author: hannahkamundson): Hello everyone (and [~sandip.agarwala] ), I am a graduate student at the University of Texas (Computer Science). I have a project in my Distributed Systems course to contribute to an open source distributed project. Would it be okay if I worked on this ticket? Thanks for your help, Hannah > XML: StructType schema issue in pyspark connect > --- > > Key: SPARK-45190 > URL: https://issues.apache.org/jira/browse/SPARK-45190 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Sandip Agarwala >Priority: Major > > The following PR added support for from_xml to pyspark. > https://github.com/apache/spark/pull/42938 > > However, StructType schema is resulting in schema parse error for pyspark > connect. > Filing a Jira to track this issue. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] (SPARK-45190) XML: StructType schema issue in pyspark connect
[ https://issues.apache.org/jira/browse/SPARK-45190 ] Hannah Amundson deleted comment on SPARK-45190: - was (Author: hannahkamundson): I have started on this ticket > XML: StructType schema issue in pyspark connect > --- > > Key: SPARK-45190 > URL: https://issues.apache.org/jira/browse/SPARK-45190 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Sandip Agarwala >Priority: Major > > The following PR added support for from_xml to pyspark. > https://github.com/apache/spark/pull/42938 > > However, StructType schema is resulting in schema parse error for pyspark > connect. > Filing a Jira to track this issue. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46392) In Function DataSourceStrategy.translateFilterWithMapping, we need to transfer the cast expression to the data source filter
[ https://issues.apache.org/jira/browse/SPARK-46392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46392: --- Labels: pull-request-available (was: ) > In Function DataSourceStrategy.translateFilterWithMapping, we need to transfer > the cast expression to the data source filter > -- > > Key: SPARK-46392 > URL: https://issues.apache.org/jira/browse/SPARK-46392 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: jiahong.li >Priority: Minor > Labels: pull-request-available > > Consider this situation: > we create a partitioned table using a source that extends > TableProvider. If we select data from specific partitions and the partition > value's data type differs from the table's partition type, the partition > filter cannot be pushed down. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46392) In Function DataSourceStrategy.translateFilterWithMapping, we need to transfer the cast expression to the data source filter
[ https://issues.apache.org/jira/browse/SPARK-46392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiahong.li updated SPARK-46392: --- Description: Consider this situation: we create a partitioned table using a source that extends TableProvider. If we select data from specific partitions and the partition value's data type differs from the table's partition type, the partition filter cannot be pushed down. was: Considering this Situation: We create a partition table that created by source which is extends TableProvider, if we select data from some specific partitions, choose partition dataType differ from table partition type leads partition can not be pushed down > In Function DataSourceStrategy.translateFilterWithMapping, we need to transfer > the cast expression to the data source filter > -- > > Key: SPARK-46392 > URL: https://issues.apache.org/jira/browse/SPARK-46392 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: jiahong.li >Priority: Minor > > Consider this situation: > we create a partitioned table using a source that extends > TableProvider. If we select data from specific partitions and the partition > value's data type differs from the table's partition type, the partition > filter cannot be pushed down. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46392) In Function DataSourceStrategy.translateFilterWithMapping, we need to transfer the cast expression to the data source filter
jiahong.li created SPARK-46392: -- Summary: In Function DataSourceStrategy.translateFilterWithMapping, we need to transfer the cast expression to the data source filter Key: SPARK-46392 URL: https://issues.apache.org/jira/browse/SPARK-46392 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.5.0 Reporter: jiahong.li Consider this situation: we create a partitioned table using a source that extends TableProvider. If we select data from specific partitions and the partition value's data type differs from the table's partition type, the partition filter cannot be pushed down -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46391) Reorganize `ExpandingParityTests`
[ https://issues.apache.org/jira/browse/SPARK-46391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46391: --- Labels: pull-request-available (was: ) > Reorganize `ExpandingParityTests` > - > > Key: SPARK-46391 > URL: https://issues.apache.org/jira/browse/SPARK-46391 > Project: Spark > Issue Type: Test > Components: PS, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46391) Reorganize `ExpandingParityTests`
Ruifeng Zheng created SPARK-46391: - Summary: Reorganize `ExpandingParityTests` Key: SPARK-46391 URL: https://issues.apache.org/jira/browse/SPARK-46391 Project: Spark Issue Type: Test Components: PS, Tests Affects Versions: 4.0.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46388) HiveAnalysis misses pattern guard `query.resolved`
[ https://issues.apache.org/jira/browse/SPARK-46388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-46388. -- Fix Version/s: 3.5.1 4.0.0 Resolution: Fixed Issue resolved by pull request 44326 [https://github.com/apache/spark/pull/44326] > HiveAnalysis misses pattern guard `query.resolved` > -- > > Key: SPARK-46388 > URL: https://issues.apache.org/jira/browse/SPARK-46388 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 3.5.1, 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
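To illustrate what this fix is about, here is a toy sketch — not the actual HiveAnalysis code; the plan nodes are stand-ins. An analyzer-style rule should only fire once the inner query is resolved, otherwise it can operate on a half-analyzed plan:

{code:scala}
// Toy sketch only; real Catalyst nodes and HiveAnalysis are more involved.
sealed trait Plan { def resolved: Boolean }
case object ResolvedQuery extends Plan { val resolved = true }
case object UnresolvedQuery extends Plan { val resolved = false }
case class Insert(query: Plan) extends Plan { def resolved: Boolean = query.resolved }

def hiveAnalysisLike(plan: Plan): Plan = plan match {
  // The `if query.resolved` pattern guard is what the ticket adds: without
  // it, the rule would also fire while the inner query is still unresolved,
  // and the conversion could fail with a confusing error.
  case i @ Insert(query) if query.resolved => i // convert to a Hive insert here
  case other                               => other
}
{code}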
[jira] [Assigned] (SPARK-46388) HiveAnalysis misses pattern guard `query.resolved`
[ https://issues.apache.org/jira/browse/SPARK-46388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-46388: Assignee: Kent Yao > HiveAnalysis misses pattern guard `query.resolved` > -- > > Key: SPARK-46388 > URL: https://issues.apache.org/jira/browse/SPARK-46388 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-46384) Structured Streaming UI doesn't display graph correctly
[ https://issues.apache.org/jira/browse/SPARK-46384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796103#comment-17796103 ] Kent Yao commented on SPARK-46384: -- I will take a look asap > Structured Streaming UI doesn't display graph correctly > --- > > Key: SPARK-46384 > URL: https://issues.apache.org/jira/browse/SPARK-46384 > Project: Spark > Issue Type: Task > Components: Structured Streaming, Web UI >Affects Versions: 4.0.0 >Reporter: Wei Liu >Priority: Major > > The Streaming UI is currently broken on Spark master. Running a simple query: > ``` > q = > spark.readStream.format("rate").load().writeStream.format("memory").queryName("test_wei").start() > ``` > makes the Spark UI show an empty graph for "Operation duration": > [screenshot: empty "Operation duration" graph] > Here is the error: > [screenshot: JavaScript error shown in the browser] > > I verified the same query runs fine on Spark 3.5, as in the following graph. > [screenshot: Spark 3.5 UI rendering the graph correctly]
> > This is likely a problem introduced by the library updates; a potential > source of the error: [https://github.com/apache/spark/pull/42879] > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46381) Migrate sub-classes of AnalysisException to error classes
[ https://issues.apache.org/jira/browse/SPARK-46381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-46381. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44314 [https://github.com/apache/spark/pull/44314] > Migrate sub-classes of AnalysisException to error classes > - > > Key: SPARK-46381 > URL: https://issues.apache.org/jira/browse/SPARK-46381 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Currently, there are a few sub-classes of AnalysisException that haven't been ported to > error classes yet: > - NonEmptyNamespaceException > - UnresolvedException > - ExtendedAnalysisException > This ticket aims to migrate two of them. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
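As a rough illustration of what "porting to error classes" means in general — a sketch under assumed names; the actual constructors and fields of Spark's exception classes differ — the exception carries a stable error-class identifier plus structured message parameters instead of a free-form message string:

{code:scala}
// Sketch of the error-class pattern; class and parameter names are
// illustrative, not Spark's actual API.
class SketchAnalysisException(
    val errorClass: String,
    val messageParameters: Map[String, String])
  extends Exception(
    s"[$errorClass] " +
      messageParameters.map { case (k, v) => s"$k=$v" }.mkString(", "))

object SketchUsage {
  def failNonEmptyNamespace(ns: String): Nothing =
    // Structured and machine-readable, instead of an ad-hoc message string.
    throw new SketchAnalysisException(
      errorClass = "NON_EMPTY_NAMESPACE",
      messageParameters = Map("namespace" -> ns))
}
{code}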
[jira] [Assigned] (SPARK-45807) DataSourceV2: Improve ViewCatalog API
[ https://issues.apache.org/jira/browse/SPARK-45807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45807: -- Assignee: Eduard Tudenhoefner (was: Apache Spark) > DataSourceV2: Improve ViewCatalog API > - > > Key: SPARK-45807 > URL: https://issues.apache.org/jira/browse/SPARK-45807 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Eduard Tudenhoefner >Assignee: Eduard Tudenhoefner >Priority: Major > Labels: pull-request-available > > The goal is to add createOrReplaceView(..) and replaceView(..) methods to the > ViewCatalog API -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
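A tentative sketch of the shape of the proposed additions — the signatures are guesses from the ticket text, not the merged API; the real ViewCatalog methods take richer arguments such as schema and properties:

{code:scala}
// Hypothetical simplification of the proposal; parameter lists are guesses.
trait ViewCatalogSketch {
  def createView(name: String, sql: String): Unit
  // Proposed: replace an existing view with a new definition, failing if it
  // does not exist.
  def replaceView(name: String, sql: String): Unit
  // Proposed: create the view, or replace it if it already exists, so
  // callers need not implement drop-then-create themselves.
  def createOrReplaceView(name: String, sql: String): Unit
}
{code}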
[jira] [Assigned] (SPARK-45807) DataSourceV2: Improve ViewCatalog API
[ https://issues.apache.org/jira/browse/SPARK-45807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45807: -- Assignee: Apache Spark (was: Eduard Tudenhoefner) > DataSourceV2: Improve ViewCatalog API > - > > Key: SPARK-45807 > URL: https://issues.apache.org/jira/browse/SPARK-45807 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Eduard Tudenhoefner >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > The goal is to add createOrReplaceView(..) and replaceView(..) methods to the > ViewCatalog API -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46390) Update the Example page
Ruifeng Zheng created SPARK-46390: - Summary: Update the Example page Key: SPARK-46390 URL: https://issues.apache.org/jira/browse/SPARK-46390 Project: Spark Issue Type: Documentation Components: Documentation Affects Versions: 4.0.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46381) Migrate sub-classes of AnalysisException to error classes
[ https://issues.apache.org/jira/browse/SPARK-46381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-46381: -- Assignee: Apache Spark (was: Max Gekk) > Migrate sub-classes of AnalysisException to error classes > - > > Key: SPARK-46381 > URL: https://issues.apache.org/jira/browse/SPARK-46381 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > Currently, there are a few sub-classes of AnalysisException that haven't been ported to > error classes yet: > - NonEmptyNamespaceException > - UnresolvedException > - ExtendedAnalysisException > This ticket aims to migrate two of them. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45807) DataSourceV2: Improve ViewCatalog API
[ https://issues.apache.org/jira/browse/SPARK-45807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45807: -- Assignee: Apache Spark (was: Eduard Tudenhoefner) > DataSourceV2: Improve ViewCatalog API > - > > Key: SPARK-45807 > URL: https://issues.apache.org/jira/browse/SPARK-45807 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Eduard Tudenhoefner >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > The goal is to add createOrReplaceView(..) and replaceView(..) methods to the > ViewCatalog API -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45959) Abusing DataSet.withColumn can cause huge tree with severe perf degradation
[ https://issues.apache.org/jira/browse/SPARK-45959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45959: -- Assignee: Apache Spark > Abusing DataSet.withColumn can cause huge tree with severe perf degradation > --- > > Key: SPARK-45959 > URL: https://issues.apache.org/jira/browse/SPARK-45959 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.1 >Reporter: Asif >Assignee: Apache Spark >Priority: Minor > Labels: pull-request-available > > Though the documentation clearly recommends adding all columns in a single shot, > in reality it is difficult to expect customers to modify their code: in Spark 2 > the analyzer rules did not do deep tree traversal, while in Spark 3 the plans are > cloned before being handed to the analyzer, optimizer, etc., which was not the > case in Spark 2. > All these things have increased query time from 5 min to 2-3 hrs. > Many times the columns are added to the plan via some for-loop logic which just > keeps adding new computation based on some rule. > So, my suggestion is to do an initial check in the withColumn API before > creating a new projection: if all the existing columns are still being > projected, and the new column's expression depends not on the output of the > top node but on its child, then instead of adding a new Project, the column can > be added to the existing node. > For a start, maybe we can just handle the Project node. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
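A short sketch of the anti-pattern described in SPARK-45959 and the batched alternative the documentation recommends — column names and counts here are illustrative:

{code:scala}
import org.apache.spark.sql.{functions => F, DataFrame}

// Anti-pattern: every withColumn call adds a Project node, so a loop over
// many columns builds a very deep plan that the analyzer re-traverses (and,
// since Spark 3, re-clones) on each iteration.
def addColumnsOneByOne(df: DataFrame): DataFrame =
  (0 until 500).foldLeft(df)((acc, i) => acc.withColumn(s"c$i", F.lit(i)))

// Recommended shape: build all new columns first and add them with a single
// projection, keeping the plan shallow.
def addColumnsInOneShot(df: DataFrame): DataFrame = {
  val newCols = (0 until 500).map(i => F.lit(i).as(s"c$i"))
  df.select(df.columns.map(F.col).toSeq ++ newCols: _*)
}
{code}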
[jira] [Assigned] (SPARK-46246) Support EXECUTE IMMEDIATE syntax in Spark SQL
[ https://issues.apache.org/jira/browse/SPARK-46246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-46246: -- Assignee: Apache Spark > Support EXECUTE IMMEDIATE syntax in Spark SQL > > > Key: SPARK-46246 > URL: https://issues.apache.org/jira/browse/SPARK-46246 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Milan Stefanovic >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > Introducing new EXECUTE IMMEDIATE syntax to support parameterized queries > from within SQL. > This API executes a query passed as a string, with arguments. > Other DBs that support this: > * > [Oracle|https://docs.oracle.com/cd/B13789_01/appdev.101/b10807/13_elems017.htm] > * > [Snowflake|https://docs.snowflake.com/en/sql-reference/sql/execute-immediate] > * > [PgSql|https://www.postgresql.org/docs/current/ecpg-sql-execute-immediate.html#:~:text=Description,statement%2C%20without%20retrieving%20result%20rows.] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
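A tentative illustration of the proposed syntax, modeled on the Snowflake/Oracle forms the ticket links to — the grammar Spark finally adopts may differ, and the table name is made up:

{code:scala}
// Assumes an existing SparkSession; the exact EXECUTE IMMEDIATE grammar is a
// guess based on the databases referenced in the ticket.
val spark: org.apache.spark.sql.SparkSession = ???

// Run a query held in a string, binding a positional parameter.
spark.sql("EXECUTE IMMEDIATE 'SELECT * FROM tbl WHERE id = ?' USING 42").show()
{code}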
[jira] [Assigned] (SPARK-46246) Support EXECUTE IMMEDIATE syntax in Spark SQL
[ https://issues.apache.org/jira/browse/SPARK-46246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-46246: -- Assignee: (was: Apache Spark) > Support EXECUTE IMMEDIATE syntax in Spark SQL > > > Key: SPARK-46246 > URL: https://issues.apache.org/jira/browse/SPARK-46246 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Milan Stefanovic >Priority: Major > Labels: pull-request-available > > Introducing new EXECUTE IMMEDIATE syntax to support parameterized queries > from within SQL. > This API executes a query passed as a string, with arguments. > Other DBs that support this: > * > [Oracle|https://docs.oracle.com/cd/B13789_01/appdev.101/b10807/13_elems017.htm] > * > [Snowflake|https://docs.snowflake.com/en/sql-reference/sql/execute-immediate] > * > [PgSql|https://www.postgresql.org/docs/current/ecpg-sql-execute-immediate.html#:~:text=Description,statement%2C%20without%20retrieving%20result%20rows.] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45959) Abusing DataSet.withColumn can cause huge tree with severe perf degradation
[ https://issues.apache.org/jira/browse/SPARK-45959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45959: -- Assignee: (was: Apache Spark) > Abusing DataSet.withColumn can cause huge tree with severe perf degradation > --- > > Key: SPARK-45959 > URL: https://issues.apache.org/jira/browse/SPARK-45959 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.1 >Reporter: Asif >Priority: Minor > Labels: pull-request-available > > Though the documentation clearly recommends adding all columns in a single shot, > in reality it is difficult to expect customers to modify their code: in Spark 2 > the analyzer rules did not do deep tree traversal, while in Spark 3 the plans are > cloned before being handed to the analyzer, optimizer, etc., which was not the > case in Spark 2. > All these things have increased query time from 5 min to 2-3 hrs. > Many times the columns are added to the plan via some for-loop logic which just > keeps adding new computation based on some rule. > So, my suggestion is to do an initial check in the withColumn API before > creating a new projection: if all the existing columns are still being > projected, and the new column's expression depends not on the output of the > top node but on its child, then instead of adding a new Project, the column can > be added to the existing node. > For a start, maybe we can just handle the Project node. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-46365) Spark 3.5.0 Regression: Window Function Combination Yields Null Values
[ https://issues.apache.org/jira/browse/SPARK-46365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796054#comment-17796054 ] Josh Rosen commented on SPARK-46365: I think that this is a duplicate of SPARK-45543, which is fixed in the forthcoming Spark 3.5.1. I figured this out by running `git bisect` in `branch-3.5` with the above reproduction. > Spark 3.5.0 Regression: Window Function Combination Yields Null Values > --- > > Key: SPARK-46365 > URL: https://issues.apache.org/jira/browse/SPARK-46365 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Boris PEREVALOV >Priority: Major > > When combining two window functions (the first one to get the previous non-null > value, the second one to get the latest rows only), the result is not correct > since version 3.5.0. > > Here is a simple Scala example: > > {code:java} > import org.apache.spark.sql.expressions.Window > import org.apache.spark.sql.functions._ > > case class Event(timestamp: Long, id: Long, value: String) > > val events = Seq( > Event(timestamp = 1702289001, id = 1, value = "non-null value"), > Event(timestamp = 1702289002, id = 1, value = "new non-null value"), > Event(timestamp = 1702289003, id = 1, value = null), > Event(timestamp = 1702289004, id = 2, value = "non-null value"), > Event(timestamp = 1702289005, id = 2, value = null), > ).toDF > > val window = Window.partitionBy("id").orderBy($"timestamp".desc) > > val eventsWithLatestNonNullValue = events > .withColumn( > "value", > first("value", ignoreNulls = true) over > window.rangeBetween(Window.currentRow, Window.unboundedFollowing) > ) > > eventsWithLatestNonNullValue.show > > val latestEvents = eventsWithLatestNonNullValue > .withColumn("n", row_number over window) > .where("n = 1") > .drop("n") > > latestEvents.show > {code} > > > Current result (Spark 3.5.0): > > {code:java} > +----------+---+-----+ > | timestamp| id|value| > +----------+---+-----+ > |1702289003|  1| NULL| > |1702289005|  2| NULL| > +----------+---+-----+ > {code} > > > Expected result (all versions < 3.5.0): > > {code:java} > +----------+---+------------------+ > | timestamp| id|             value| > +----------+---+------------------+ > |1702289003|  1|new non-null value| > |1702289005|  2|    non-null value| > +----------+---+------------------+ > {code} > > Execution plans are different. 
> > Spark 3.5.0: > > {code:java} > == Physical Plan == > AdaptiveSparkPlan isFinalPlan=false > +- Project [timestamp#1856L, id#1857L, value#1867] > +- Filter (n#1887 = 1) > +- Window [first(value#1858, true) windowspecdefinition(id#1857L, > timestamp#1856L DESC NULLS LAST, specifiedwindowframe(RangeFrame, > currentrow$(), unboundedfollowing$())) AS value#1867, row_number() > windowspecdefinition(id#1857L, timestamp#1856L DESC NULLS LAST, > specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS > n#1887], [id#1857L], [timestamp#1856L DESC NULLS LAST] > +- WindowGroupLimit [id#1857L], [timestamp#1856L DESC NULLS LAST], > row_number(), 1, Final > +- Sort [id#1857L ASC NULLS FIRST, timestamp#1856L DESC NULLS > LAST], false, 0 > +- Exchange hashpartitioning(id#1857L, 200), > ENSURE_REQUIREMENTS, [plan_id=326] > +- WindowGroupLimit [id#1857L], [timestamp#1856L DESC NULLS > LAST], row_number(), 1, Partial > +- Sort [id#1857L ASC NULLS FIRST, timestamp#1856L DESC > NULLS LAST], false, 0 > +- LocalTableScan [timestamp#1856L, id#1857L, > value#1858] > {code} > > Spark 3.4.0: > > {code:java} > == Physical Plan == > AdaptiveSparkPlan isFinalPlan=false > +- Project [timestamp#6L, id#7L, value#17] > +- Filter (n#37 = 1) > +- Window [first(value#8, true) windowspecdefinition(id#7L, > timestamp#6L DESC NULLS LAST, specifiedwindowframe(RangeFrame, currentrow$(), > unboundedfollowing$())) AS value#17, row_number() windowspecdefinition(id#7L, > timestamp#6L DESC NULLS LAST, specifiedwindowframe(RowFrame, > unboundedpreceding$(), currentrow$())) AS n#37], [id#7L], [timestamp#6L DESC > NULLS LAST] > +- Sort [id#7L ASC NULLS FIRST, timestamp#6L DESC NULLS LAST], > false, 0 > +- Exchange hashpartitioning(id#7L, 200), ENSURE_REQUIREMENTS, > [plan_id=60] > +- LocalTableScan [timestamp#6L, id#7L, value#8] > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org