[jira] [Resolved] (SPARK-47588) Hive module: Migrate logInfo with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-47588. Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46086 [https://github.com/apache/spark/pull/46086] > Hive module: Migrate logInfo with variables to structured logging framework > --- > > Key: SPARK-47588 > URL: https://issues.apache.org/jira/browse/SPARK-47588 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47882) createTableColumnTypes needs to be mapped to database types instead of being used directly
[ https://issues.apache.org/jira/browse/SPARK-47882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47882: --- Labels: pull-request-available (was: ) > createTableColumnTypes needs to be mapped to database types instead of being > used directly > > > Key: SPARK-47882 > URL: https://issues.apache.org/jira/browse/SPARK-47882 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.2, 4.0.0, 3.5.1 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available >
[jira] [Updated] (SPARK-47361) Improve JDBC data sources
[ https://issues.apache.org/jira/browse/SPARK-47361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-47361: - Affects Version/s: 3.5.1 3.4.2 > Improve JDBC data sources > - > > Key: SPARK-47361 > URL: https://issues.apache.org/jira/browse/SPARK-47361 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.2, 4.0.0, 3.5.1 >Reporter: Dongjoon Hyun >Assignee: Kent Yao >Priority: Major > Labels: releasenotes > Fix For: 4.0.0 > >
[jira] [Updated] (SPARK-47882) createTableColumnTypes needs to be mapped to database types instead of being used directly
[ https://issues.apache.org/jira/browse/SPARK-47882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-47882: - Affects Version/s: 3.5.1 3.4.2 > createTableColumnTypes needs to be mapped to database types instead of being > used directly > > > Key: SPARK-47882 > URL: https://issues.apache.org/jira/browse/SPARK-47882 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.2, 4.0.0, 3.5.1 >Reporter: Kent Yao >Priority: Major >
[jira] [Updated] (SPARK-47361) Improve JDBC data sources
[ https://issues.apache.org/jira/browse/SPARK-47361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-47361: - Affects Version/s: (was: 3.4.2) (was: 3.5.1) > Improve JDBC data sources > - > > Key: SPARK-47361 > URL: https://issues.apache.org/jira/browse/SPARK-47361 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Kent Yao >Priority: Major > Labels: releasenotes > Fix For: 4.0.0 > >
[jira] [Created] (SPARK-47882) createTableColumnTypes needs to be mapped to database types instead of being used directly
Kent Yao created SPARK-47882: Summary: createTableColumnTypes needs to be mapped to database types instead of being used directly Key: SPARK-47882 URL: https://issues.apache.org/jira/browse/SPARK-47882 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Kent Yao
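The idea behind SPARK-47882 is that column type strings supplied via the JDBC writer option `createTableColumnTypes` should be translated through the target database's dialect rather than emitted verbatim into the CREATE TABLE statement. The sketch below models that translation in plain Python; it is an illustration of the mapping step, not Spark's actual implementation, and `ORACLE_TYPE_MAP` is an invented, abbreviated dialect table:

```python
# Hypothetical per-dialect lookup (illustrative only; a real dialect
# defines many more types than this).
ORACLE_TYPE_MAP = {"STRING": "VARCHAR2(255)", "VARCHAR": "VARCHAR2", "BOOLEAN": "NUMBER(1)"}

def map_column_types(create_table_column_types, type_map):
    """Rewrite a 'name type, name type' option string using a dialect's type map."""
    mapped = []
    for col_spec in create_table_column_types.split(","):
        name, _, col_type = col_spec.strip().partition(" ")
        base, _, args = col_type.partition("(")
        # Fall back to the user-supplied type when the dialect has no mapping.
        target = type_map.get(base.upper(), col_type)
        if args and "(" not in target:
            target = f"{target}({args}"  # re-attach length args, e.g. VARCHAR(64)
        mapped.append(f"{name} {target}")
    return ", ".join(mapped)

print(map_column_types("name VARCHAR(64), active BOOLEAN", ORACLE_TYPE_MAP))
# → name VARCHAR2(64), active NUMBER(1)
```

Without a mapping step like this, a type such as `BOOLEAN` would be passed straight to a database that does not support it, which is the failure mode the issue describes.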
[jira] [Resolved] (SPARK-47880) Oracle: Document Mapping Spark SQL Data Types to Oracle
[ https://issues.apache.org/jira/browse/SPARK-47880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-47880. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46092 [https://github.com/apache/spark/pull/46092] > Oracle: Document Mapping Spark SQL Data Types to Oracle > --- > > Key: SPARK-47880 > URL: https://issues.apache.org/jira/browse/SPARK-47880 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-47880) Oracle: Document Mapping Spark SQL Data Types to Oracle
[ https://issues.apache.org/jira/browse/SPARK-47880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-47880: Assignee: Kent Yao > Oracle: Document Mapping Spark SQL Data Types to Oracle > --- > > Key: SPARK-47880 > URL: https://issues.apache.org/jira/browse/SPARK-47880 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available >
[jira] [Resolved] (SPARK-47879) Oracle: Use VARCHAR2 instead of VARCHAR for VarcharType mapping
[ https://issues.apache.org/jira/browse/SPARK-47879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-47879. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46091 [https://github.com/apache/spark/pull/46091] > Oracle: Use VARCHAR2 instead of VARCHAR for VarcharType mapping > > > Key: SPARK-47879 > URL: https://issues.apache.org/jira/browse/SPARK-47879 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Updated] (SPARK-47880) Oracle: Document Mapping Spark SQL Data Types to Oracle
[ https://issues.apache.org/jira/browse/SPARK-47880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47880: --- Labels: pull-request-available (was: ) > Oracle: Document Mapping Spark SQL Data Types to Oracle > --- > > Key: SPARK-47880 > URL: https://issues.apache.org/jira/browse/SPARK-47880 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available >
[jira] [Created] (SPARK-47881) HDFS path not working for hive.metastore.jars.path
jungho.choi created SPARK-47881: --- Summary: HDFS path not working for hive.metastore.jars.path Key: SPARK-47881 URL: https://issues.apache.org/jira/browse/SPARK-47881 Project: Spark Issue Type: Question Components: SQL Affects Versions: 3.4.2 Reporter: jungho.choi I am trying to use Hive Metastore version 3.1.3 with Spark version 3.4.2, but I encounter an error when specifying the path to the metastore JARs on HDFS. Following the official documentation, I specified the path using an HDFS URI: {code:java} spark.sql.hive.metastore.version 3.1.3 spark.sql.hive.metastore.jars path spark.sql.hive.metastore.jars.path hdfs://namespace/spark/hive3_lib/* {code} However, when I tested it, I got an error from HiveClientImpl.scala stating that the URI scheme is not "file": {code:java} Caused by: java.lang.ExceptionInInitializerError: java.lang.IllegalArgumentException: URI scheme is not "file" at org.apache.spark.sql.hive.client.HiveClientImpl$.newHiveConf(HiveClientImpl.scala:1296) at org.apache.spark.sql.hive.client.HiveClientImpl.newState(HiveClientImpl.scala:174) at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:139) at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490) at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:315) at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:517) at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:377) at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:70) at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:69) at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$databaseExists$1(HiveExternalCatalog.scala:223) at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23) at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:101) ... 143 more {code} To resolve this, I changed spark.sql.hive.metastore.jars.path to a local file path instead of an HDFS path, and it worked fine. I think I followed the instructions correctly, but are there any specific configurations or preferences required to use HDFS paths?
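For reference, the failing (HDFS) and working (local-path) configurations from the report can be written side by side. The HDFS path is the reporter's example; the local path shown is a hypothetical placeholder, assuming the Hive 3.1.3 jars have been copied to the same location on every node:

{code:java}
# Fails on 3.4.2: HiveClientImpl requires a "file" URI scheme for the jar path
spark.sql.hive.metastore.version    3.1.3
spark.sql.hive.metastore.jars       path
spark.sql.hive.metastore.jars.path  hdfs://namespace/spark/hive3_lib/*

# Workaround per the report: use a local filesystem path instead
spark.sql.hive.metastore.jars.path  file:///opt/spark/hive3_lib/*
{code}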
[jira] [Updated] (SPARK-47881) HDFS path not working for hive.metastore.jars.path
[ https://issues.apache.org/jira/browse/SPARK-47881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jungho.choi updated SPARK-47881: Description: I am trying to use Hive Metastore version 3.1.3 with Spark version 3.4.2, but I encounter an error when specifying the path to the metastore JARs on HDFS. Following the official documentation, I specified the path using an HDFS URI: {code:java} spark.sql.hive.metastore.version 3.1.3 spark.sql.hive.metastore.jars path spark.sql.hive.metastore.jars.path hdfs://namespace/spark/hive3_lib/* {code} However, when I tested it, I got an error from HiveClientImpl.scala stating that the URI scheme is not "file": {code:java} Caused by: java.lang.ExceptionInInitializerError: java.lang.IllegalArgumentException: URI scheme is not "file" at org.apache.spark.sql.hive.client.HiveClientImpl$.newHiveConf(HiveClientImpl.scala:1296) at org.apache.spark.sql.hive.client.HiveClientImpl.newState(HiveClientImpl.scala:174) at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:139) at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490) at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:315) at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:517) at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:377) at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:70) at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:69) at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$databaseExists$1(HiveExternalCatalog.scala:223) at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23) at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:101) ... 143 more {code} To resolve this, I changed spark.sql.hive.metastore.jars.path to a local file path instead of an HDFS path, and it worked fine. I think I followed the instructions correctly, but are there any specific configurations or preferences required to use HDFS paths?
[jira] [Created] (SPARK-47880) Oracle: Document Mapping Spark SQL Data Types to Oracle
Kent Yao created SPARK-47880: Summary: Oracle: Document Mapping Spark SQL Data Types to Oracle Key: SPARK-47880 URL: https://issues.apache.org/jira/browse/SPARK-47880 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Kent Yao
[jira] [Updated] (SPARK-47879) Oracle: Use VARCHAR2 instead of VARCHAR for VarcharType mapping
[ https://issues.apache.org/jira/browse/SPARK-47879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47879: --- Labels: pull-request-available (was: ) > Oracle: Use VARCHAR2 instead of VARCHAR for VarcharType mapping > > > Key: SPARK-47879 > URL: https://issues.apache.org/jira/browse/SPARK-47879 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available >
[jira] [Created] (SPARK-47879) Oracle: Use VARCHAR2 instead of VARCHAR for VarcharType mapping
Kent Yao created SPARK-47879: Summary: Oracle: Use VARCHAR2 instead of VARCHAR for VarcharType mapping Key: SPARK-47879 URL: https://issues.apache.org/jira/browse/SPARK-47879 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Kent Yao
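The point of SPARK-47879 is that Oracle documents `VARCHAR2` as its variable-length string type and reserves `VARCHAR` for possible future semantic changes, so Spark's Oracle dialect should emit `VARCHAR2(n)` for `VarcharType(n)`. A minimal sketch of such a mapping, in plain Python rather than Spark's actual Scala `OracleDialect` code, with simplified type names as assumptions:

```python
def oracle_jdbc_type(spark_type, length=None):
    """Map a (simplified) Spark SQL type name to an Oracle DDL type.

    Illustrative sketch only; Spark's real dialect works on DataType objects.
    """
    if spark_type == "varchar":
        return f"VARCHAR2({length})"  # was VARCHAR(n) before SPARK-47879
    if spark_type == "string":
        return "VARCHAR2(255)"        # Spark's documented default for StringType on Oracle
    raise ValueError(f"unmapped type: {spark_type}")

print(oracle_jdbc_type("varchar", 100))  # → VARCHAR2(100)
```

The behavior is otherwise unchanged: only the emitted keyword differs, so existing tables are unaffected and new DDL follows Oracle's recommendation.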
[jira] [Commented] (SPARK-47172) Upgrade Transport block cipher mode to GCM
[ https://issues.apache.org/jira/browse/SPARK-47172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837927#comment-17837927 ] Mridul Muralidharan commented on SPARK-47172: - Given we have addressed SPARK-47318, and with Spark supporting TLS from 4.0 - is this a concern? Given it will be backward-incompatible, I am hesitant to expand support for something which is expected to go away "soon". > Upgrade Transport block cipher mode to GCM > -- > > Key: SPARK-47172 > URL: https://issues.apache.org/jira/browse/SPARK-47172 > Project: Spark > Issue Type: Improvement > Components: Security >Affects Versions: 3.4.2, 3.5.0 >Reporter: Steve Weis >Priority: Minor > > The cipher transformation currently used for encrypting RPC calls is an > unauthenticated mode (AES/CTR/NoPadding). This needs to be upgraded to an > authenticated mode (AES/GCM/NoPadding) to prevent ciphertext from being > modified in transit. > The relevant line is here: > [https://github.com/apache/spark/blob/a939a7d0fd9c6b23c879cbee05275c6fbc939e38/common/network-common/src/main/java/org/apache/spark/network/util/TransportConf.java#L220] > GCM is more computationally expensive than CTR and adds a 16-byte > block of authentication tag data to each payload.
[jira] [Resolved] (SPARK-47876) Improve docstring of mapInArrow
[ https://issues.apache.org/jira/browse/SPARK-47876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng resolved SPARK-47876. -- Resolution: Done Resolved by https://github.com/apache/spark/pull/46088 > Improve docstring of mapInArrow > --- > > Key: SPARK-47876 > URL: https://issues.apache.org/jira/browse/SPARK-47876 > Project: Spark > Issue Type: Documentation > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Labels: pull-request-available > > Improve docstring of mapInArrow
[jira] [Resolved] (SPARK-46375) Add documentation for Python data source API
[ https://issues.apache.org/jira/browse/SPARK-46375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46375. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46089 [https://github.com/apache/spark/pull/46089] > Add documentation for Python data source API > > > Key: SPARK-46375 > URL: https://issues.apache.org/jira/browse/SPARK-46375 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Add documentation (user guide) for the Python data source API. > > Note the documentation should clarify the required dependency: pyarrow
[jira] [Assigned] (SPARK-46375) Add documentation for Python data source API
[ https://issues.apache.org/jira/browse/SPARK-46375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46375: Assignee: Allison Wang > Add documentation for Python data source API > > > Key: SPARK-46375 > URL: https://issues.apache.org/jira/browse/SPARK-46375 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > > > Add documentation (user guide) for the Python data source API. > > Note the documentation should clarify the required dependency: pyarrow
[jira] [Updated] (SPARK-47763) Reenable Protobuf function doctests
[ https://issues.apache.org/jira/browse/SPARK-47763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47763: --- Labels: pull-request-available (was: ) > Reenable Protobuf function doctests > --- > > Key: SPARK-47763 > URL: https://issues.apache.org/jira/browse/SPARK-47763 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Resolved] (SPARK-47877) Speed up test_parity_listener
[ https://issues.apache.org/jira/browse/SPARK-47877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47877. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46072 [https://github.com/apache/spark/pull/46072] > Speed up test_parity_listener > - > > Key: SPARK-47877 > URL: https://issues.apache.org/jira/browse/SPARK-47877 > Project: Spark > Issue Type: New Feature > Components: Connect, SS >Affects Versions: 4.0.0 >Reporter: Wei Liu >Assignee: Wei Liu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Resolved] (SPARK-47760) Reenable Avro function doctests
[ https://issues.apache.org/jira/browse/SPARK-47760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47760. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46055 [https://github.com/apache/spark/pull/46055] > Reenable Avro function doctests > --- > > Key: SPARK-47760 > URL: https://issues.apache.org/jira/browse/SPARK-47760 > Project: Spark > Issue Type: Sub-task > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Resolved] (SPARK-47875) Remove `spark.deploy.recoverySerializer`
[ https://issues.apache.org/jira/browse/SPARK-47875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47875. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46087 [https://github.com/apache/spark/pull/46087] > Remove `spark.deploy.recoverySerializer` > > > Key: SPARK-47875 > URL: https://issues.apache.org/jira/browse/SPARK-47875 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Resolved] (SPARK-47763) Reenable Protobuf function doctests
[ https://issues.apache.org/jira/browse/SPARK-47763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47763. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46055 [https://github.com/apache/spark/pull/46055] > Reenable Protobuf function doctests > --- > > Key: SPARK-47763 > URL: https://issues.apache.org/jira/browse/SPARK-47763 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 4.0.0 > >
[jira] [Resolved] (SPARK-47868) Recursion Limit Error in SparkSession and SparkConnectPlanner
[ https://issues.apache.org/jira/browse/SPARK-47868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-47868. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46075 [https://github.com/apache/spark/pull/46075] > Recursion Limit Error in SparkSession and SparkConnectPlanner > - > > Key: SPARK-47868 > URL: https://issues.apache.org/jira/browse/SPARK-47868 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 4.0.0 >Reporter: Tom van Bussel >Assignee: Tom van Bussel >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Updated] (SPARK-46375) Add documentation for Python data source API
[ https://issues.apache.org/jira/browse/SPARK-46375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46375: --- Labels: pull-request-available (was: ) > Add documentation for Python data source API > > > Key: SPARK-46375 > URL: https://issues.apache.org/jira/browse/SPARK-46375 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Priority: Major > Labels: pull-request-available > > Add documentation (user guide) for the Python data source API. > > Note the documentation should clarify the required dependency: pyarrow
[jira] [Updated] (SPARK-47877) Speed up test_parity_listener
[ https://issues.apache.org/jira/browse/SPARK-47877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47877: --- Labels: pull-request-available (was: ) > Speed up test_parity_listener > - > > Key: SPARK-47877 > URL: https://issues.apache.org/jira/browse/SPARK-47877 > Project: Spark > Issue Type: New Feature > Components: Connect, SS >Affects Versions: 4.0.0 >Reporter: Wei Liu >Priority: Major > Labels: pull-request-available >
[jira] [Created] (SPARK-47877) Speed up test_parity_listener
Wei Liu created SPARK-47877: --- Summary: Speed up test_parity_listener Key: SPARK-47877 URL: https://issues.apache.org/jira/browse/SPARK-47877 Project: Spark Issue Type: New Feature Components: Connect, SS Affects Versions: 4.0.0 Reporter: Wei Liu
[jira] [Created] (SPARK-47876) Improve docstring of mapInArrow
Xinrong Meng created SPARK-47876: Summary: Improve docstring of mapInArrow Key: SPARK-47876 URL: https://issues.apache.org/jira/browse/SPARK-47876 Project: Spark Issue Type: Documentation Components: Documentation, PySpark Affects Versions: 4.0.0 Reporter: Xinrong Meng Improve docstring of mapInArrow
[jira] [Created] (SPARK-47875) Remove `spark.deploy.recoverySerializer`
Dongjoon Hyun created SPARK-47875: - Summary: Remove `spark.deploy.recoverySerializer` Key: SPARK-47875 URL: https://issues.apache.org/jira/browse/SPARK-47875 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 4.0.0 Reporter: Dongjoon Hyun
[jira] [Resolved] (SPARK-47590) Hive-thriftserver: Migrate logWarn with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-47590. Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45923 [https://github.com/apache/spark/pull/45923] > Hive-thriftserver: Migrate logWarn with variables to structured logging > framework > - > > Key: SPARK-47590 > URL: https://issues.apache.org/jira/browse/SPARK-47590 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Updated] (SPARK-47588) Hive module: Migrate logInfo with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47588: --- Labels: pull-request-available (was: ) > Hive module: Migrate logInfo with variables to structured logging framework > --- > > Key: SPARK-47588 > URL: https://issues.apache.org/jira/browse/SPARK-47588 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Labels: pull-request-available >
[jira] [Assigned] (SPARK-47594) Connector module: Migrate logInfo with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang reassigned SPARK-47594: -- Assignee: BingKun Pan > Connector module: Migrate logInfo with variables to structured logging > framework > > > Key: SPARK-47594 > URL: https://issues.apache.org/jira/browse/SPARK-47594 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Assignee: BingKun Pan >Priority: Major > Labels: pull-request-available >
[jira] [Resolved] (SPARK-47594) Connector module: Migrate logInfo with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-47594. Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46022 [https://github.com/apache/spark/pull/46022] > Connector module: Migrate logInfo with variables to structured logging > framework > > > Key: SPARK-47594 > URL: https://issues.apache.org/jira/browse/SPARK-47594 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Assignee: BingKun Pan >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Created] (SPARK-47874) Multiple bugs with map operations in combination with collations
Nikola Mandic created SPARK-47874: - Summary: Multiple bugs with map operations in combination with collations Key: SPARK-47874 URL: https://issues.apache.org/jira/browse/SPARK-47874 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 4.0.0 Reporter: Nikola Mandic The following two queries produce different results (the first succeeds, the second throws an exception): {code:java} select map('a', 1, 'A' collate utf8_binary_lcase, 2); -- success select map('a' collate utf8_binary_lcase, 1, 'A', 2); -- exception{code} The following query results in 1: {code:java} select cast(map('a', 1, 'A', 2) as map)['A' collate utf8_binary_lcase]; -- 1{code}
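The inconsistency above is easier to see with a small model of collation-aware map keys. The sketch below is hypothetical Python, not Spark code: it models UTF8_BINARY as plain key equality and UTF8_BINARY_LCASE as equality after `str.casefold`, which is what should make `'a'` and `'A'` collide under the lowercase collation regardless of which operand carries the collation.

```python
# Minimal model of map key lookup under different collations
# (hypothetical sketch; Spark's actual implementation differs).

class CollatedMap:
    """A map whose key equality is defined by a collation's key normalizer."""

    def __init__(self, normalize=lambda s: s):
        self._normalize = normalize
        self._data = {}

    def put(self, key, value):
        self._data[self._normalize(key)] = value

    def get(self, key):
        return self._data.get(self._normalize(key))

# UTF8_BINARY behaves like plain string equality; UTF8_BINARY_LCASE
# compares strings lowercased (modeled here with str.casefold).
binary = CollatedMap()
lcase = CollatedMap(normalize=str.casefold)

for m in (binary, lcase):
    m.put('a', 1)
    m.put('A', 2)
```

Under the lowercase collation the two keys collide, so the second insert overwrites the first; under the binary collation they stay distinct. The bug reported here is that Spark applies this rule asymmetrically depending on which argument is collated.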
[jira] [Resolved] (SPARK-47871) Oracle: Map TimestampType to TIMESTAMP WITH LOCAL TIME ZONE
[ https://issues.apache.org/jira/browse/SPARK-47871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47871. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46080 [https://github.com/apache/spark/pull/46080] > Oracle: Map TimestampType to TIMESTAMP WITH LOCAL TIME ZONE > --- > > Key: SPARK-47871 > URL: https://issues.apache.org/jira/browse/SPARK-47871 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-47871) Oracle: Map TimestampType to TIMESTAMP WITH LOCAL TIME ZONE
[ https://issues.apache.org/jira/browse/SPARK-47871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-47871: - Assignee: Kent Yao > Oracle: Map TimestampType to TIMESTAMP WITH LOCAL TIME ZONE > --- > > Key: SPARK-47871 > URL: https://issues.apache.org/jira/browse/SPARK-47871 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available >
[jira] [Assigned] (SPARK-47417) Ascii, Chr, Base64, UnBase64, Decode, StringDecode, Encode, ToBinary, FormatNumber, Sentences (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47417: --- Assignee: Nikola Mandic > Ascii, Chr, Base64, UnBase64, Decode, StringDecode, Encode, ToBinary, > FormatNumber, Sentences (all collations) > -- > > Key: SPARK-47417 > URL: https://issues.apache.org/jira/browse/SPARK-47417 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Nikola Mandic >Priority: Major > Labels: pull-request-available >
[jira] [Updated] (SPARK-47745) Add License to Spark Operator
[ https://issues.apache.org/jira/browse/SPARK-47745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47745: -- Affects Version/s: kubernetes-operator-0.1.0 (was: 4.0.0) > Add License to Spark Operator > - > > Key: SPARK-47745 > URL: https://issues.apache.org/jira/browse/SPARK-47745 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: kubernetes-operator-0.1.0 >Reporter: Zhou JIANG >Assignee: Zhou JIANG >Priority: Major > Labels: pull-request-available > Fix For: kubernetes-operator-0.1.0 > > > Add license to the recently established operator repository.
[jira] [Resolved] (SPARK-47417) Ascii, Chr, Base64, UnBase64, Decode, StringDecode, Encode, ToBinary, FormatNumber, Sentences (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47417. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45933 [https://github.com/apache/spark/pull/45933] > Ascii, Chr, Base64, UnBase64, Decode, StringDecode, Encode, ToBinary, > FormatNumber, Sentences (all collations) > -- > > Key: SPARK-47417 > URL: https://issues.apache.org/jira/browse/SPARK-47417 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Nikola Mandic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Updated] (SPARK-47745) Add License to Spark Operator
[ https://issues.apache.org/jira/browse/SPARK-47745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47745: -- Fix Version/s: kubernetes-operator-0.1.0 (was: 4.0.0) > Add License to Spark Operator > - > > Key: SPARK-47745 > URL: https://issues.apache.org/jira/browse/SPARK-47745 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 4.0.0 >Reporter: Zhou JIANG >Assignee: Zhou JIANG >Priority: Major > Labels: pull-request-available > Fix For: kubernetes-operator-0.1.0 > > > Add license to the recently established operator repository.
[jira] [Resolved] (SPARK-47745) Add License to Spark Operator
[ https://issues.apache.org/jira/browse/SPARK-47745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47745. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 3 [https://github.com/apache/spark-kubernetes-operator/pull/3] > Add License to Spark Operator > - > > Key: SPARK-47745 > URL: https://issues.apache.org/jira/browse/SPARK-47745 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 4.0.0 >Reporter: Zhou JIANG >Assignee: Zhou JIANG >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Add license to the recently established operator repository.
[jira] [Assigned] (SPARK-47745) Add License to Spark Operator
[ https://issues.apache.org/jira/browse/SPARK-47745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-47745: - Assignee: Zhou JIANG > Add License to Spark Operator > - > > Key: SPARK-47745 > URL: https://issues.apache.org/jira/browse/SPARK-47745 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 4.0.0 >Reporter: Zhou JIANG >Assignee: Zhou JIANG >Priority: Major > Labels: pull-request-available > > Add license to the recently established operator repository.
[jira] [Updated] (SPARK-33822) TPCDS Q5 fails if spark.sql.adaptive.enabled=true
[ https://issues.apache.org/jira/browse/SPARK-33822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-33822: --- Labels: pull-request-available (was: ) > TPCDS Q5 fails if spark.sql.adaptive.enabled=true > - > > Key: SPARK-33822 > URL: https://issues.apache.org/jira/browse/SPARK-33822 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0, 3.0.1, 3.1.0, 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Takeshi Yamamuro >Priority: Blocker > Labels: pull-request-available > Fix For: 3.0.2, 3.1.0 > > > **PROBLEM STATEMENT** > {code} > >>> tables = ['call_center', 'catalog_page', 'catalog_returns', > >>> 'catalog_sales', 'customer', 'customer_address', 'customer_demographics', > >>> 'date_dim', 'household_demographics', 'income_band', 'inventory', 'item', > >>> 'promotion', 'reason', 'ship_mode', 'store', 'store_returns', > >>> 'store_sales', 'time_dim', 'warehouse', 'web_page', 'web_returns', > >>> 'web_sales', 'web_site'] > >>> for t in tables: > ... spark.sql("CREATE TABLE %s USING PARQUET LOCATION > '/Users/dongjoon/data/10g/%s'" % (t, t)) > >>> spark.sql(spark.sparkContext.wholeTextFiles("/Users/dongjoon/data/query/q5.sql").take(1)[0][1]).show(1) > +---++-+---+-+ > |channel| id|sales|returns| profit| > +---++-+---+-+ > | null|null|1143646603.07|30617460.71|-317540732.87| > |catalog channel|null| 393609478.06| 9451732.79| -44801262.72| > |catalog channel|catalog_pageA...| 0.00| 39037.48|-25330.29| > ... 
> +---++-+---+-+ > >>> sql("set spark.sql.adaptive.enabled=true") > >>> spark.sql(spark.sparkContext.wholeTextFiles("/Users/dongjoon/data/query/q5.sql").take(1)[0][1]).show(1) > Traceback (most recent call last): > File "", line 1, in > File > "/Users/dongjoon/APACHE/spark-release/spark-3.0.1-bin-hadoop3.2/python/pyspark/sql/dataframe.py", > line 440, in show > print(self._jdf.showString(n, 20, vertical)) > File > "/Users/dongjoon/APACHE/spark-release/spark-3.0.1-bin-hadoop3.2/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", > line 1305, in __call__ > File > "/Users/dongjoon/APACHE/spark-release/spark-3.0.1-bin-hadoop3.2/python/pyspark/sql/utils.py", > line 128, in deco > return f(*a, **kw) > File > "/Users/dongjoon/APACHE/spark-release/spark-3.0.1-bin-hadoop3.2/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", > line 328, in get_return_value > py4j.protocol.Py4JJavaError: An error occurred while calling o160.showString. > : java.lang.UnsupportedOperationException: BroadcastExchange does not support > the execute() code path. 
> at > org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecute(BroadcastExchangeExec.scala:190) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:175) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:213) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:210) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:171) > at > org.apache.spark.sql.execution.exchange.ReusedExchangeExec.doExecute(Exchange.scala:61) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:175) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:213) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:210) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:171) > at > org.apache.spark.sql.execution.adaptive.QueryStageExec.doExecute(QueryStageExec.scala:115) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:175) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:213) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:210) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:171) > at > org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:316) > at > org.apache.spark.sql.execution.SparkPlan.executeCollectIterator(SparkPlan.scala:392) >
[jira] [Resolved] (SPARK-47356) Add support for ConcatWs & Elt (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47356. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46061 [https://github.com/apache/spark/pull/46061] > Add support for ConcatWs & Elt (all collations) > --- > > Key: SPARK-47356 > URL: https://issues.apache.org/jira/browse/SPARK-47356 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Mihailo Milosevic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-47356) Add support for ConcatWs & Elt (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47356: --- Assignee: Mihailo Milosevic > Add support for ConcatWs & Elt (all collations) > --- > > Key: SPARK-47356 > URL: https://issues.apache.org/jira/browse/SPARK-47356 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Mihailo Milosevic >Priority: Major > Labels: pull-request-available >
[jira] [Updated] (SPARK-47873) Write collated strings to hive as regular strings
[ https://issues.apache.org/jira/browse/SPARK-47873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47873: --- Labels: pull-request-available (was: ) > Write collated strings to hive as regular strings > - > > Key: SPARK-47873 > URL: https://issues.apache.org/jira/browse/SPARK-47873 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Stefan Kandic >Priority: Major > Labels: pull-request-available > > As Hive doesn't support collations, we should write collated strings with a > regular string type but keep the collation in table metadata to properly read > them back.
[jira] [Created] (SPARK-47873) Write collated strings to hive as regular strings
Stefan Kandic created SPARK-47873: - Summary: Write collated strings to hive as regular strings Key: SPARK-47873 URL: https://issues.apache.org/jira/browse/SPARK-47873 Project: Spark Issue Type: Improvement Components: Spark Core, SQL Affects Versions: 4.0.0 Reporter: Stefan Kandic As Hive doesn't support collations, we should write collated strings with a regular string type but keep the collation in table metadata to properly read them back.
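The proposed write path can be sketched as a schema round trip. Everything below is a hypothetical Python model: the `string collate <NAME>` textual type and the helper names are illustrative, not Spark's actual metadata format. Hive sees a plain string column, and the collation is recorded separately so a read can restore it.

```python
# Hypothetical model of the SPARK-47873 proposal: Hive stores a plain string
# column while the collation survives only in table metadata.

def to_hive_schema(spark_schema):
    """Strip collations for Hive, recording them in a metadata dict."""
    hive_cols, collations = [], {}
    for name, dtype in spark_schema:
        if dtype.startswith("string collate "):
            collations[name] = dtype[len("string collate "):]
            dtype = "string"  # Hive only understands plain strings
        hive_cols.append((name, dtype))
    return hive_cols, collations

def from_hive_schema(hive_cols, collations):
    """Re-apply recorded collations when reading the table back."""
    return [(name, f"string collate {collations[name]}" if name in collations else dtype)
            for name, dtype in hive_cols]

spark_schema = [("id", "int"), ("name", "string collate UTF8_BINARY_LCASE")]
hive_cols, collations = to_hive_schema(spark_schema)
```

The round trip is lossless as long as the metadata travels with the table, which is exactly the property the ticket asks for.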
[jira] [Updated] (SPARK-47418) Optimize string predicate expressions for UTF8_BINARY_LCASE collation
[ https://issues.apache.org/jira/browse/SPARK-47418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47418: --- Labels: pull-request-available (was: ) > Optimize string predicate expressions for UTF8_BINARY_LCASE collation > - > > Key: SPARK-47418 > URL: https://issues.apache.org/jira/browse/SPARK-47418 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > Labels: pull-request-available > > Implement {*}contains{*}, {*}startsWith{*}, and *endsWith* built-in string > Spark functions using optimized lowercase comparison approach introduced by > [~nikolamand-db] in [https://github.com/apache/spark/pull/45816]. Refer to > the latest design and code structure imposed by [~uros-db] in > https://issues.apache.org/jira/browse/SPARK-47410 to understand how collation > support is introduced for Spark SQL expressions. In addition, review previous > Jira tickets under the current parent in order to understand how > *StringPredicate* expressions are currently used and tested in Spark: > * [SPARK-47131|https://issues.apache.org/jira/browse/SPARK-47131] > * [SPARK-47248|https://issues.apache.org/jira/browse/SPARK-47248] > * [SPARK-47295|https://issues.apache.org/jira/browse/SPARK-47295] > These tickets should help you understand what changes were introduced in > order to enable collation support for these functions. Lastly, feel free to > use your chosen Spark SQL Editor to play around with the existing functions > and learn more about how they work. > > The goal for this Jira ticket is to improve the UTF8_BINARY_LCASE > implementation for the {*}contains{*}, {*}startsWith{*}, and *endsWith* > functions so that they use optimized lowercase comparison approach (following > the general logic in Nikola's PR), and benchmark the results accordingly. 
As > for testing, the currently existing unit test cases and end-to-end tests > should already fully cover the expected behaviour of *StringPredicate* > expressions for all collation types. In other words, the objective of this > ticket is only to enhance the internal implementation, without introducing > any user-facing changes to Spark SQL API. > > Finally, feel free to refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].
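The optimization this ticket asks for — case-insensitive matching without materializing lowercased copies of both strings — can be sketched in plain Python. This is an illustrative model only: Spark's implementation works on UTF8String, and full Unicode case folding can expand characters (e.g. 'ß' folds to 'ss'), which a per-character comparison like this does not handle.

```python
def lowercase_starts_with(s: str, prefix: str) -> bool:
    """Case-insensitive startsWith, folding one character at a time
    instead of allocating s.lower() and prefix.lower() up front."""
    if len(prefix) > len(s):
        return False
    # zip stops at the shorter sequence, i.e. after len(prefix) characters
    return all(a.casefold() == b.casefold() for a, b in zip(s, prefix))

def lowercase_ends_with(s: str, suffix: str) -> bool:
    if len(suffix) > len(s):
        return False
    return all(a.casefold() == b.casefold()
               for a, b in zip(reversed(s), reversed(suffix)))

def lowercase_contains(s: str, sub: str) -> bool:
    # Slide a window over s; each probe folds at most len(sub) characters,
    # so short-circuiting avoids folding the whole of s on a mismatch.
    return any(lowercase_starts_with(s[i:], sub)
               for i in range(len(s) - len(sub) + 1))
```

The win over `s.lower().startswith(prefix.lower())` is that nothing is allocated and comparison stops at the first differing character, which is the general idea behind the optimized lowercase comparison referenced in the ticket.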
[jira] [Updated] (SPARK-47353) TBD
[ https://issues.apache.org/jira/browse/SPARK-47353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-47353: - Summary: TBD (was: regexp_count & regexp_substr (binary & lowercase collation only)) > TBD > --- > > Key: SPARK-47353 > URL: https://issues.apache.org/jira/browse/SPARK-47353 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major >
[jira] [Closed] (SPARK-47872) Add the option to initially hide a CustomMetric from the Spark UI
[ https://issues.apache.org/jira/browse/SPARK-47872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Dillitz closed SPARK-47872. -- > Add the option to initially hide a CustomMetric from the Spark UI > - > > Key: SPARK-47872 > URL: https://issues.apache.org/jira/browse/SPARK-47872 > Project: Spark > Issue Type: Improvement > Components: UI >Affects Versions: 4.0.0 >Reporter: Robert Dillitz >Priority: Major > Labels: Metrics, UI > > There is currently no way to have experimental CustomMetrics that are > initially hidden in the Spark UI. Add this option.
[jira] [Resolved] (SPARK-47872) Add the option to initially hide a CustomMetric from the Spark UI
[ https://issues.apache.org/jira/browse/SPARK-47872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Dillitz resolved SPARK-47872. Resolution: Works for Me > Add the option to initially hide a CustomMetric from the Spark UI > - > > Key: SPARK-47872 > URL: https://issues.apache.org/jira/browse/SPARK-47872 > Project: Spark > Issue Type: Improvement > Components: UI >Affects Versions: 4.0.0 >Reporter: Robert Dillitz >Priority: Major > Labels: Metrics, UI > > There is currently no way to have experimental CustomMetrics that are > initially hidden in the Spark UI. Add this option.
[jira] [Commented] (SPARK-46841) Language support for collations
[ https://issues.apache.org/jira/browse/SPARK-46841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837668#comment-17837668 ] Nikola Mandic commented on SPARK-46841: --- Working on it. > Language support for collations > --- > > Key: SPARK-46841 > URL: https://issues.apache.org/jira/browse/SPARK-46841 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Aleksandar Tomic >Priority: Major >
[jira] [Created] (SPARK-47872) Add the option to initially hide a CustomMetric from the Spark UI
Robert Dillitz created SPARK-47872: -- Summary: Add the option to initially hide a CustomMetric from the Spark UI Key: SPARK-47872 URL: https://issues.apache.org/jira/browse/SPARK-47872 Project: Spark Issue Type: Improvement Components: UI Affects Versions: 4.0.0 Reporter: Robert Dillitz There is currently no way to have experimental CustomMetrics that are initially hidden in the Spark UI. Add this option.
[jira] [Resolved] (SPARK-47739) Avro does not register custom logical types on spark startup
[ https://issues.apache.org/jira/browse/SPARK-47739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47739. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45895 [https://github.com/apache/spark/pull/45895] > Avro does not register custom logical types on spark startup > > > Key: SPARK-47739 > URL: https://issues.apache.org/jira/browse/SPARK-47739 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Milan Stefanovic >Assignee: Milan Stefanovic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > It can happen that we resolve an Avro schema before we register Avro logical > types, leading to two consecutive calls to resolve the Avro schema providing wrong > results. > > Example: > !image-2024-04-05-16-27-05-489.png!
[jira] [Resolved] (SPARK-46574) Upgrade maven plugin to latest version
[ https://issues.apache.org/jira/browse/SPARK-46574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-46574. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46043 [https://github.com/apache/spark/pull/46043] > Upgrade maven plugin to latest version > -- > > Key: SPARK-46574 > URL: https://issues.apache.org/jira/browse/SPARK-46574 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Updated] (SPARK-47871) Oracle: Map TimestampType to TIMESTAMP WITH LOCAL TIME ZONE
[ https://issues.apache.org/jira/browse/SPARK-47871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47871: --- Labels: pull-request-available (was: ) > Oracle: Map TimestampType to TIMESTAMP WITH LOCAL TIME ZONE > --- > > Key: SPARK-47871 > URL: https://issues.apache.org/jira/browse/SPARK-47871 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available >
[jira] [Commented] (SPARK-45859) Make UDF objects in ml.functions lazy
[ https://issues.apache.org/jira/browse/SPARK-45859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837644#comment-17837644 ] Kent Yao commented on SPARK-45859: -- I removed 3.0 and 3.1 from *Affects Version/s:* here because I manually cleaned them in the background. > Make UDF objects in ml.functions lazy > - > > Key: SPARK-45859 > URL: https://issues.apache.org/jira/browse/SPARK-45859 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 3.2.0, 3.3.0, 3.4.0, 3.5.0, 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Updated] (SPARK-45859) Make UDF objects in ml.functions lazy
[ https://issues.apache.org/jira/browse/SPARK-45859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-45859: - Affects Version/s: (was: 3.0) (was: 3.1) > Make UDF objects in ml.functions lazy > - > > Key: SPARK-45859 > URL: https://issues.apache.org/jira/browse/SPARK-45859 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 3.2.0, 3.3.0, 3.4.0, 3.5.0, 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Updated] (SPARK-47556) [K8] Spark App ID collision resulting in deleting wrong resources
[ https://issues.apache.org/jira/browse/SPARK-47556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-47556: - Affects Version/s: 3.1.0 (was: 3.1) > [K8] Spark App ID collision resulting in deleting wrong resources > - > > Key: SPARK-47556 > URL: https://issues.apache.org/jira/browse/SPARK-47556 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Spark Core >Affects Versions: 3.1.0 >Reporter: Sundeep K >Priority: Major > Labels: pull-request-available > > h3. Issue: > We noticed that sometimes K8s executor pods go in a crash loop. Reason being > 'Error: MountVolume.SetUp failed for volume "spark-conf-volume-exec"'. Upon > investigation we noticed that there are two Spark jobs launched with the same > application ID, and when one of them finishes first it deletes all its > resources and the resources of the other job too. > -> The Spark application ID is created using this > [code|https://github.com/apache/spark/blob/36126a5c1821b4418afd5788963a939ea7f64078/core/src/main/scala/org/apache/spark/scheduler/TaskScheduler.scala#L38] > "spark-application-" + System.currentTimeMillis > This means that if two applications launch in the same millisecond they can > end up with the same app ID > -> The > [spark-app-selector|https://github.com/apache/spark/blob/93f98c0a61ddb66eb777c3940fbf29fc58e2d79b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Constants.scala#L23] > label is added to all resources created by the driver and its value is the > application ID. The Kubernetes scheduler backend deletes all the resources with the same > [label|https://github.com/apache/spark/blob/2a8bb5cdd3a5a2d63428b82df5e5066a805ce878/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala#L162C1-L172C6] > upon termination. 
> This results in deletion of the config map and executor pods of the job that's still > running; the driver tries to relaunch the executor pods, but the config map is not > present, so it's in a crash loop > h3. Context > We are using [Spark on Kubernetes|https://spark.apache.org/docs/latest/running-on-kubernetes.html] and launch > our Spark jobs using PySpark. We launch multiple Spark jobs within a given > k8s namespace. Each Spark job can be launched from different pods or from > different processes in a pod. Every time a job is launched it has a unique > app name. Here is how the job is launched (omitting irrelevant details): > {code:java} > # spark_conf has settings required for spark on k8s > sp = SparkSession.builder \ > .config(conf=spark_conf) \ > .appName('testapp') > sp.master(f'k8s://{kubernetes_host}') > session = sp.getOrCreate() > with session: > session.sql('SELECT 1'){code} > h3. Repro > Set the same app id in the Spark config, run 2 different jobs, one that finishes > fast, one that runs slow. The slower job goes into a crash loop > {code:java} > "spark.app.id": ""{code} > h3. Workaround > Set a unique spark.app.id for all the jobs that run on k8s > e.g.: > {code:java} > "spark.app.id": f'{AppName}-{CurrTimeInMilliSecs}-{UUId}'[:63]{code} > h3. Fix > Add a unique hash at the end of the application ID: > [https://github.com/apache/spark/pull/45712] >
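The collision and the workaround's ID shape can be sketched in plain Python (hypothetical helper names; the `[:63]` truncation mirrors the Kubernetes label value length limit used in the workaround above):

```python
import time
import uuid

def legacy_app_id() -> str:
    # Mirrors "spark-application-" + System.currentTimeMillis: two drivers
    # that start in the same millisecond produce the same ID.
    return f"spark-application-{int(time.time() * 1000)}"

def unique_app_id(app_name: str = "spark-application") -> str:
    # Workaround shape: timestamp plus a random suffix, truncated to 63
    # characters because the ID is stored as a Kubernetes label value.
    millis = int(time.time() * 1000)
    return f"{app_name}-{millis}-{uuid.uuid4().hex}"[:63]

# Even when generated back-to-back in the same millisecond, the random
# suffix keeps the IDs distinct.
ids = {unique_app_id() for _ in range(10_000)}
```

With the legacy scheme, any two of these 10,000 back-to-back calls landing in the same millisecond would collide; the suffixed form keeps enough randomness (31 hex characters survive the truncation here) to make that practically impossible.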
[jira] [Commented] (SPARK-5159) Thrift server does not respect hive.server2.enable.doAs=true
[ https://issues.apache.org/jira/browse/SPARK-5159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837634#comment-17837634 ] Goutam Ghosh commented on SPARK-5159: - Can the patch by [~angerszhuuu] be verified? > Thrift server does not respect hive.server2.enable.doAs=true > > > Key: SPARK-5159 > URL: https://issues.apache.org/jira/browse/SPARK-5159 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.2.0 >Reporter: Andrew Ray >Priority: Major > Labels: bulk-closed > Attachments: spark_thrift_server_log.txt > > > I'm currently testing the spark sql thrift server on a kerberos secured > cluster in YARN mode. Currently any user can access any table regardless of > HDFS permissions as all data is read as the hive user. In HiveServer2 the > property hive.server2.enable.doAs=true causes all access to be done as the > submitting user. We should do the same.
[jira] [Created] (SPARK-47871) Oracle: Map TimestampType to TIMESTAMP WITH LOCAL TIME ZONE
Kent Yao created SPARK-47871: Summary: Oracle: Map TimestampType to TIMESTAMP WITH LOCAL TIME ZONE Key: SPARK-47871 URL: https://issues.apache.org/jira/browse/SPARK-47871 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] (SPARK-47353) regexp_count & regexp_substr (binary & lowercase collation only)
[ https://issues.apache.org/jira/browse/SPARK-47353 ] Milan Dankovic deleted comment on SPARK-47353: was (Author: JIRAUSER304529): I am working on this > regexp_count & regexp_substr (binary & lowercase collation only) > > > Key: SPARK-47353 > URL: https://issues.apache.org/jira/browse/SPARK-47353 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47476) StringReplace (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47476: -- Assignee: (was: Apache Spark) > StringReplace (all collations) > -- > > Key: SPARK-47476 > URL: https://issues.apache.org/jira/browse/SPARK-47476 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > Labels: pull-request-available > > Enable collation support for the *StringReplace* built-in string function in > Spark. First confirm what the expected behaviour for this function is when > given collated strings, and then move on to implementation and testing. One > way to go about this is to consider using {_}StringSearch{_}, an efficient > ICU service for string matching. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other open-source DBMS, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringReplace* function so > it supports all collation types currently supported in Spark. To understand > what changes were introduced in order to enable full collation support for > other existing functions in Spark, take a look at the Spark PRs and Jira > tickets for completed tasks in this parent (for example: Contains, > StartsWith, EndsWith). 
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
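As a rough, non-authoritative illustration of what lowercase-collation matching means for a replace function (Spark's actual implementation goes through ICU's StringSearch, not Python's str.lower()), a sketch of case-insensitive replace; the function name is hypothetical:

```python
def lcase_replace(src: str, search: str, replace: str) -> str:
    """Replace every occurrence of `search` in `src`, matching
    case-insensitively but splicing the original `src` text.

    Illustrative sketch of UTF8_LCASE-style matching only; a real
    collation-aware implementation must use a collator (e.g. ICU).
    """
    if not search:
        return src
    out, i = [], 0
    low_src, low_search = src.lower(), search.lower()
    while True:
        j = low_src.find(low_search, i)
        if j < 0:
            out.append(src[i:])        # no more matches: keep the tail
            return "".join(out)
        out.append(src[i:j])           # unmatched prefix, original casing
        out.append(replace)
        i = j + len(low_search)        # skip past the matched region
```

Under this matching, `lcase_replace("Hello World", "WORLD", "Spark")` yields `"Hello Spark"`, which is the behaviour a lowercase collation implies for StringReplace.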
[jira] [Assigned] (SPARK-47476) StringReplace (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47476: -- Assignee: Apache Spark > StringReplace (all collations) > -- > > Key: SPARK-47476 > URL: https://issues.apache.org/jira/browse/SPARK-47476 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > Enable collation support for the *StringReplace* built-in string function in > Spark. First confirm what the expected behaviour for this function is when > given collated strings, and then move on to implementation and testing. One > way to go about this is to consider using {_}StringSearch{_}, an efficient > ICU service for string matching. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other open-source DBMS, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringReplace* function so > it supports all collation types currently supported in Spark. To understand > what changes were introduced in order to enable full collation support for > other existing functions in Spark, take a look at the Spark PRs and Jira > tickets for completed tasks in this parent (for example: Contains, > StartsWith, EndsWith). 
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47858) Refactoring the structure for DataFrame error context
[ https://issues.apache.org/jira/browse/SPARK-47858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47858: -- Assignee: (was: Apache Spark) > Refactoring the structure for DataFrame error context > - > > Key: SPARK-47858 > URL: https://issues.apache.org/jira/browse/SPARK-47858 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > > The current implementation for PySpark DataFrame error context could be more > flexible by addressing some hacky spots. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47858) Refactoring the structure for DataFrame error context
[ https://issues.apache.org/jira/browse/SPARK-47858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47858: -- Assignee: Apache Spark > Refactoring the structure for DataFrame error context > - > > Key: SPARK-47858 > URL: https://issues.apache.org/jira/browse/SPARK-47858 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > The current implementation for PySpark DataFrame error context could be more > flexible by addressing some hacky spots. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47741) Handle stack overflow when parsing query
[ https://issues.apache.org/jira/browse/SPARK-47741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47741: -- Assignee: (was: Apache Spark) > Handle stack overflow when parsing query > > > Key: SPARK-47741 > URL: https://issues.apache.org/jira/browse/SPARK-47741 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.1 >Reporter: Milan Stefanovic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Parsing complex queries can lead to stack overflow. > We need to catch this exception and convert it to a proper parser exception > with an error class. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47741) Handle stack overflow when parsing query
[ https://issues.apache.org/jira/browse/SPARK-47741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47741: -- Assignee: Apache Spark > Handle stack overflow when parsing query > > > Key: SPARK-47741 > URL: https://issues.apache.org/jira/browse/SPARK-47741 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.1 >Reporter: Milan Stefanovic >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Parsing complex queries can lead to stack overflow. > We need to catch this exception and convert it to a proper parser exception > with an error class. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
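The conversion described in this ticket — catching the raw stack overflow and rethrowing it as a classed parser error — can be sketched in Python (Spark's parser is Scala/ANTLR-based; the exception type and error-class name below are hypothetical stand-ins):

```python
class ParseException(Exception):
    """Hypothetical stand-in for an error-class based parser exception."""
    def __init__(self, error_class: str, message: str):
        super().__init__(f"[{error_class}] {message}")
        self.error_class = error_class

def parse_nested(expr: str) -> int:
    # Toy recursive-descent step: count the nesting depth of parentheses.
    if expr.startswith("(") and expr.endswith(")"):
        return 1 + parse_nested(expr[1:-1])
    return 0

def safe_parse(expr: str) -> int:
    try:
        return parse_nested(expr)
    except RecursionError:
        # Convert the uncontrolled stack overflow into a proper,
        # classed parser error that callers can handle.
        raise ParseException("FAILED_TO_PARSE_TOO_COMPLEX",
                             "The statement was too complex to parse.")
```

A query nested deeper than the interpreter's recursion limit now surfaces as a `ParseException` with a stable error class instead of a bare `RecursionError`.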
[jira] [Assigned] (SPARK-47411) StringInstr, FindInSet (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47411: -- Assignee: Apache Spark > StringInstr, FindInSet (all collations) > --- > > Key: SPARK-47411 > URL: https://issues.apache.org/jira/browse/SPARK-47411 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > Enable collation support for the *StringInstr* and *FindInSet* built-in > string functions in Spark. First confirm what the expected behaviour for > these functions is when given collated strings, and then move on to > implementation and testing. One way to go about this is to consider using > {_}StringSearch{_}, an efficient ICU service for string matching. Implement > the corresponding unit tests (CollationStringExpressionsSuite) and E2E tests > (CollationSuite) to reflect how these functions should be used with collation > in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment > with the existing functions to learn more about how they work. In addition, > look into the possible use-cases and implementation of similar functions > within other open-source DBMS, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringInstr* and > *FindInSet* functions so that they support all collation types currently > supported in Spark. To understand what changes were introduced in order to > enable full collation support for other existing functions in Spark, take a > look at the Spark PRs and Jira tickets for completed tasks in this parent > (for example: Contains, StartsWith, EndsWith). 
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
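As a small, non-authoritative sketch of what FindInSet means under a lowercase collation (again, Spark's real implementation would use a collator rather than str.lower(); the function name is hypothetical):

```python
def find_in_set_lcase(needle: str, str_list: str) -> int:
    """FIND_IN_SET with case-insensitive matching (sketch).

    Returns the 1-based position of `needle` in the comma-separated
    `str_list`, or 0 if it is absent. Following SQL FIND_IN_SET
    semantics, a needle that itself contains a comma never matches.
    """
    if "," in needle:
        return 0
    target = needle.lower()
    for pos, item in enumerate(str_list.split(","), start=1):
        if item.lower() == target:
            return pos
    return 0
```

So `find_in_set_lcase("b", "abc,B,ab,c")` returns 2: the element `B` matches `b` once comparisons are case-insensitive, which is exactly the behavioural question this ticket asks implementers to pin down first.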
[jira] [Assigned] (SPARK-47411) StringInstr, FindInSet (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47411: -- Assignee: (was: Apache Spark) > StringInstr, FindInSet (all collations) > --- > > Key: SPARK-47411 > URL: https://issues.apache.org/jira/browse/SPARK-47411 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > Labels: pull-request-available > > Enable collation support for the *StringInstr* and *FindInSet* built-in > string functions in Spark. First confirm what the expected behaviour for > these functions is when given collated strings, and then move on to > implementation and testing. One way to go about this is to consider using > {_}StringSearch{_}, an efficient ICU service for string matching. Implement > the corresponding unit tests (CollationStringExpressionsSuite) and E2E tests > (CollationSuite) to reflect how these functions should be used with collation > in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment > with the existing functions to learn more about how they work. In addition, > look into the possible use-cases and implementation of similar functions > within other open-source DBMS, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringInstr* and > *FindInSet* functions so that they support all collation types currently > supported in Spark. To understand what changes were introduced in order to > enable full collation support for other existing functions in Spark, take a > look at the Spark PRs and Jira tickets for completed tasks in this parent > (for example: Contains, StartsWith, EndsWith). 
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47596) Streaming: Migrate logWarn with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47596: --- Labels: pull-request-available (was: ) > Streaming: Migrate logWarn with variables to structured logging framework > - > > Key: SPARK-47596 > URL: https://issues.apache.org/jira/browse/SPARK-47596 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47869) Upgrade built in hive to Hive-4.0
[ https://issues.apache.org/jira/browse/SPARK-47869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simhadri Govindappa updated SPARK-47869: Description: Hive 4.0 has been released. It brings in a lot of new features, bug fixes and performance improvements. We would like to update the version of hive used in spark to hive-4.0 [https://lists.apache.org/thread/2jqpvsx8n801zb5pmlhb8f4zloq27p82] was: Hive 4.0 has been released. It brings in a lot of new features, bug fixes and performance improvements. We would like to update the version of hive used in spark to hive-4.0 > Upgrade built in hive to Hive-4.0 > - > > Key: SPARK-47869 > URL: https://issues.apache.org/jira/browse/SPARK-47869 > Project: Spark > Issue Type: Task > Components: Spark Core >Affects Versions: 3.5.1 >Reporter: Simhadri Govindappa >Priority: Major > > Hive 4.0 has been released. It brings in a lot of new features, bug fixes and > performance improvements. > We would like to update the version of hive used in spark to hive-4.0 > [https://lists.apache.org/thread/2jqpvsx8n801zb5pmlhb8f4zloq27p82] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47869) Upgrade built in hive to Hive-4.0
Simhadri Govindappa created SPARK-47869: --- Summary: Upgrade built in hive to Hive-4.0 Key: SPARK-47869 URL: https://issues.apache.org/jira/browse/SPARK-47869 Project: Spark Issue Type: Task Components: Spark Core Affects Versions: 3.5.1 Reporter: Simhadri Govindappa Hive 4.0 has been released. It brings in a lot of new features, bug fixes and performance improvements. We would like to update the version of hive used in spark to hive-4.0 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47414) Regexp expressions (binary & lowercase collation only)
[ https://issues.apache.org/jira/browse/SPARK-47414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47414: --- Labels: pull-request-available (was: ) > Regexp expressions (binary & lowercase collation only) > -- > > Key: SPARK-47414 > URL: https://issues.apache.org/jira/browse/SPARK-47414 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47416) Add benchmark for stringpredicate expressions
[ https://issues.apache.org/jira/browse/SPARK-47416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47416: --- Labels: pull-request-available (was: ) > Add benchmark for stringpredicate expressions > - > > Key: SPARK-47416 > URL: https://issues.apache.org/jira/browse/SPARK-47416 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47416) Add benchmark for stringpredicate expressions
[ https://issues.apache.org/jira/browse/SPARK-47416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-47416: - Summary: Add benchmark for stringpredicate expressions (was: TBD) > Add benchmark for stringpredicate expressions > - > > Key: SPARK-47416 > URL: https://issues.apache.org/jira/browse/SPARK-47416 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45387) Optimize hive partition filter when the comparison dataType does not match
[ https://issues.apache.org/jira/browse/SPARK-45387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] TianyiMa updated SPARK-45387: - Affects Version/s: 3.5.1 > Optimize hive partition filter when the comparison dataType does not match > - > > Key: SPARK-45387 > URL: https://issues.apache.org/jira/browse/SPARK-45387 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.1, 3.1.2, 3.3.0, 3.4.0, 3.5.0, 3.5.1 >Reporter: TianyiMa >Priority: Critical > Labels: pull-request-available > Attachments: PruneFileSourcePartitions.diff > > > Suppose we have a partitioned table `table_pt` with a partition column `dt` > that is StringType, and the table metadata is managed by the Hive Metastore. If > we filter partitions by dt = '123', this filter can be pushed down to the data > source directly, but if the filter condition is a number, e.g. dt = 123, Spark > will not know which partition should be pushed down. Thus, during > physical plan optimization, Spark will pull all of that table's partition > metadata to the client side to decide which partition filter should be pushed > down to the data source. This performs poorly if the table has > thousands of partitions and increases the risk of a Hive Metastore OOM. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45387) Optimize hive partition filter when the comparison dataType does not match
[ https://issues.apache.org/jira/browse/SPARK-45387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] TianyiMa updated SPARK-45387: - Affects Version/s: 3.5.0 > Optimize hive partition filter when the comparison dataType does not match > - > > Key: SPARK-45387 > URL: https://issues.apache.org/jira/browse/SPARK-45387 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.1, 3.1.2, 3.3.0, 3.4.0, 3.5.0 >Reporter: TianyiMa >Priority: Critical > Labels: pull-request-available > Attachments: PruneFileSourcePartitions.diff > > > Suppose we have a partitioned table `table_pt` with a partition column `dt` > that is StringType, and the table metadata is managed by the Hive Metastore. If > we filter partitions by dt = '123', this filter can be pushed down to the data > source directly, but if the filter condition is a number, e.g. dt = 123, Spark > will not know which partition should be pushed down. Thus, during > physical plan optimization, Spark will pull all of that table's partition > metadata to the client side to decide which partition filter should be pushed > down to the data source. This performs poorly if the table has > thousands of partitions and increases the risk of a Hive Metastore OOM. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
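The pruning problem in SPARK-45387 can be illustrated with a toy pushdown check (this is a deliberately simplified model, not Spark's optimizer code): a partition filter is only pushed to the metastore when the literal's type matches the partition column's type, so `dt = 123` against a string column `dt` forces a full partition-metadata fetch, while `dt = '123'` prunes.

```python
def prunable(partition_type: type, literal) -> bool:
    # Toy model: a filter can be pushed down to the metastore only
    # when the literal's type matches the partition column's type.
    return type(literal) is partition_type

# Table partitioned by a string column `dt` (mirrors the ticket).
partitions = {"dt": str}

# dt = '123' -> pushed down and pruned at the metastore.
# dt = 123   -> type mismatch: Spark must pull all partition metadata
#               to the client before it can decide what to prune.
```

Casting the literal on the client side (`dt = str(123)` in this toy model, or an explicit CAST in SQL) restores pruning, which is the spirit of the proposed optimization.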
[jira] [Updated] (SPARK-47355) TBD
[ https://issues.apache.org/jira/browse/SPARK-47355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-47355: - Summary: TBD (was: regexp_replace & regexp_instr (binary & lowercase collation only)) > TBD > --- > > Key: SPARK-47355 > URL: https://issues.apache.org/jira/browse/SPARK-47355 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47354) TBD
[ https://issues.apache.org/jira/browse/SPARK-47354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-47354: - Summary: TBD (was: regexp_extract & regexp_extract_all (binary & lowercase collation only)) > TBD > --- > > Key: SPARK-47354 > URL: https://issues.apache.org/jira/browse/SPARK-47354 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47868) Recursion Limit Error in SparkSession and SparkConnectPlanner
[ https://issues.apache.org/jira/browse/SPARK-47868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47868: --- Labels: pull-request-available (was: ) > Recursion Limit Error in SparkSession and SparkConnectPlanner > - > > Key: SPARK-47868 > URL: https://issues.apache.org/jira/browse/SPARK-47868 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 4.0.0 >Reporter: Tom van Bussel >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47351) TBD
[ https://issues.apache.org/jira/browse/SPARK-47351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-47351: - Summary: TBD (was: ilike & rlike (binary & lowercase collation only)) > TBD > --- > > Key: SPARK-47351 > URL: https://issues.apache.org/jira/browse/SPARK-47351 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47414) Regexp expressions (binary & lowercase collation only)
[ https://issues.apache.org/jira/browse/SPARK-47414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-47414: - Summary: Regexp expressions (binary & lowercase collation only) (was: Regexp expressions) > Regexp expressions (binary & lowercase collation only) > -- > > Key: SPARK-47414 > URL: https://issues.apache.org/jira/browse/SPARK-47414 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47350) TBD
[ https://issues.apache.org/jira/browse/SPARK-47350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-47350: - Summary: TBD (was: like (binary & lowercase collation only)) > TBD > --- > > Key: SPARK-47350 > URL: https://issues.apache.org/jira/browse/SPARK-47350 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47414) Regexp expressions
[ https://issues.apache.org/jira/browse/SPARK-47414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-47414: - Summary: Regexp expressions (was: TBD) > Regexp expressions > -- > > Key: SPARK-47414 > URL: https://issues.apache.org/jira/browse/SPARK-47414 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47818) Introduce plan cache in SparkConnectPlanner to improve performance of Analyze requests
[ https://issues.apache.org/jira/browse/SPARK-47818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-47818: Assignee: Xi Lyu > Introduce plan cache in SparkConnectPlanner to improve performance of Analyze > requests > -- > > Key: SPARK-47818 > URL: https://issues.apache.org/jira/browse/SPARK-47818 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Xi Lyu >Assignee: Xi Lyu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > While building a DataFrame step by step, each step generates a new DataFrame > with an empty schema, which is lazily computed on access. However, > if a user's code frequently accesses the schema of these new DataFrames using > methods such as `df.columns`, it will result in a large number of Analyze > requests to the server. Each time, the entire plan needs to be reanalyzed, > leading to poor performance, especially when constructing highly complex > plans. > Now, by introducing a plan cache in SparkConnectPlanner, we aim to reduce the > overhead of repeated analysis during this process. This is achieved by saving > significant computation if the resolved logical plan of a subtree can be > cached. > A minimal example of the problem: > {code:java} > import pyspark.sql.functions as F > df = spark.range(10) > for i in range(200): >     if str(i) not in df.columns:  # <-- The df.columns call causes a new Analyze request in every iteration >         df = df.withColumn(str(i), F.col("id") + i) > df.show() {code} > With this patch, the performance of the above code improved from ~110s to ~5s. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47818) Introduce plan cache in SparkConnectPlanner to improve performance of Analyze requests
[ https://issues.apache.org/jira/browse/SPARK-47818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47818. -- Resolution: Fixed Issue resolved by pull request 46012 [https://github.com/apache/spark/pull/46012] > Introduce plan cache in SparkConnectPlanner to improve performance of Analyze > requests > -- > > Key: SPARK-47818 > URL: https://issues.apache.org/jira/browse/SPARK-47818 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Xi Lyu >Assignee: Xi Lyu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > While building a DataFrame step by step, each step generates a new DataFrame > with an empty schema, which is lazily computed on access. However, > if a user's code frequently accesses the schema of these new DataFrames using > methods such as `df.columns`, it will result in a large number of Analyze > requests to the server. Each time, the entire plan needs to be reanalyzed, > leading to poor performance, especially when constructing highly complex > plans. > Now, by introducing a plan cache in SparkConnectPlanner, we aim to reduce the > overhead of repeated analysis during this process. This is achieved by saving > significant computation if the resolved logical plan of a subtree can be > cached. > A minimal example of the problem: > {code:java} > import pyspark.sql.functions as F > df = spark.range(10) > for i in range(200): >     if str(i) not in df.columns:  # <-- The df.columns call causes a new Analyze request in every iteration >         df = df.withColumn(str(i), F.col("id") + i) > df.show() {code} > With this patch, the performance of the above code improved from ~110s to ~5s. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
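The mechanism behind SPARK-47818 — memoizing analysis results for already-resolved plan subtrees — can be sketched as a tiny cache keyed by a hashable plan fingerprint (the class and the tuple representation are hypothetical; the real cache lives server-side in SparkConnectPlanner):

```python
class PlanCache:
    """Toy sketch of an analysis cache keyed by a plan fingerprint."""

    def __init__(self):
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def analyze(self, plan: tuple):
        # `plan` is a hashable tuple standing in for a logical plan
        # subtree; real plans would need a stable fingerprint.
        if plan in self._cache:
            self.hits += 1
            return self._cache[plan]
        self.misses += 1
        resolved = ("resolved", plan)  # stand-in for expensive analysis
        self._cache[plan] = resolved
        return resolved

cache = PlanCache()
base = ("range", 10)
for _ in range(3):
    cache.analyze(base)  # repeated schema accesses hit the cache
```

The first `analyze` pays for the resolution; the subsequent ones are cache hits, which is why repeated `df.columns` accesses stop triggering full reanalysis under the patch.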
[jira] [Created] (SPARK-47868) Recursion Limit Error in SparkSession and SparkConnectPlanner
Tom van Bussel created SPARK-47868: -- Summary: Recursion Limit Error in SparkSession and SparkConnectPlanner Key: SPARK-47868 URL: https://issues.apache.org/jira/browse/SPARK-47868 Project: Spark Issue Type: Bug Components: Connect Affects Versions: 4.0.0 Reporter: Tom van Bussel
[jira] [Updated] (SPARK-45387) Optimize Hive partition filter when the comparison dataType does not match
[ https://issues.apache.org/jira/browse/SPARK-45387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45387: --- Labels: pull-request-available (was: ) > Optimize Hive partition filter when the comparison dataType does not match > - > > Key: SPARK-45387 > URL: https://issues.apache.org/jira/browse/SPARK-45387 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 3.1.1, 3.1.2, 3.3.0, 3.4.0 > Reporter: TianyiMa > Priority: Critical > Labels: pull-request-available > Attachments: PruneFileSourcePartitions.diff > > > Suppose we have a partitioned table `table_pt` with a partition column `dt` of StringType, whose table metadata is managed by the Hive Metastore. If we filter partitions by dt = '123', this filter can be pushed down to the data source directly, but if the filter value is a number, e.g. dt = 123, Spark will not know which partitions should be pruned. Thus, during physical plan optimization, Spark pulls all of the table's partition metadata to the client side to decide which partition filters can be pushed down to the data source. This performs poorly if the table has thousands of partitions and increases the risk of a Hive Metastore OOM.
[jira] [Updated] (SPARK-45387) Optimize Hive partition filter when the comparison dataType does not match
[ https://issues.apache.org/jira/browse/SPARK-45387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] TianyiMa updated SPARK-45387: - Summary: Optimize Hive partition filter when the comparison dataType does not match (was: Partition key filter cannot be pushed down when using cast) > Optimize Hive partition filter when the comparison dataType does not match > - > > Key: SPARK-45387 > URL: https://issues.apache.org/jira/browse/SPARK-45387 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 3.1.1, 3.1.2, 3.3.0, 3.4.0 > Reporter: TianyiMa > Priority: Critical > Attachments: PruneFileSourcePartitions.diff > > > Suppose we have a partitioned table `table_pt` with a partition column `dt` of StringType, whose table metadata is managed by the Hive Metastore. If we filter partitions by dt = '123', this filter can be pushed down to the data source directly, but if the filter value is a number, e.g. dt = 123, Spark will not know which partitions should be pruned. Thus, during physical plan optimization, Spark pulls all of the table's partition metadata to the client side to decide which partition filters can be pushed down to the data source. This performs poorly if the table has thousands of partitions and increases the risk of a Hive Metastore OOM.
[jira] [Updated] (SPARK-45387) Partition key filter cannot be pushed down when using cast
[ https://issues.apache.org/jira/browse/SPARK-45387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] TianyiMa updated SPARK-45387: - Description: Suppose we have a partitioned table `table_pt` with a partition column `dt` of StringType, whose table metadata is managed by the Hive Metastore. If we filter partitions by dt = '123', this filter can be pushed down to the data source directly, but if the filter value is a number, e.g. dt = 123, Spark will not know which partitions should be pruned. Thus, during physical plan optimization, Spark pulls all of the table's partition metadata to the client side to decide which partition filters can be pushed down to the data source. This performs poorly if the table has thousands of partitions and increases the risk of a Hive Metastore OOM. (was: Suppose we have a partitioned table `table_pt` with a partition column `dt` of StringType, whose table metadata is managed by the Hive Metastore. If we filter partitions by dt = '123', this filter can be pushed down to the data source, but if the filter value is a number, e.g. dt = 123, it cannot be pushed down, causing Spark to pull all of that table's partition metadata to the client. This performs poorly if the table has thousands of partitions and increases the risk of a Hive Metastore OOM.) > Partition key filter cannot be pushed down when using cast > -- > > Key: SPARK-45387 > URL: https://issues.apache.org/jira/browse/SPARK-45387 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 3.1.1, 3.1.2, 3.3.0, 3.4.0 > Reporter: TianyiMa > Priority: Critical > Attachments: PruneFileSourcePartitions.diff > > > Suppose we have a partitioned table `table_pt` with a partition column `dt` of StringType, whose table metadata is managed by the Hive Metastore. If we filter partitions by dt = '123', this filter can be pushed down to the data source directly, but if the filter value is a number, e.g. dt = 123, Spark will not know which partitions should be pruned. Thus, during physical plan optimization, Spark pulls all of the table's partition metadata to the client side to decide which partition filters can be pushed down to the data source. This performs poorly if the table has thousands of partitions and increases the risk of a Hive Metastore OOM.
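The pruning problem described in SPARK-45387 can be sketched without Spark or a real metastore. This is an illustrative model only: `prune_with_string_filter`, `prune_with_numeric_filter`, and the in-memory `partitions` list are invented for the example and are not Spark or Hive APIs. It shows the cost asymmetry: a type-matching filter can be sent to the metastore, while a type-mismatched one forces fetching every partition and evaluating the cast on the client.

```python
# Model of the SPARK-45387 pruning asymmetry (all names hypothetical).
# Partition values are stored as strings in the "metastore".

partitions = [str(d) for d in range(20240101, 20240111)]  # dt=20240101..20240110

metastore_requests = []  # record what we asked the "metastore" for


def prune_with_string_filter(value):
    # Filter type matches the partition column (string): the predicate can
    # be pushed to the metastore, which returns only matching partitions.
    metastore_requests.append(("filtered", value))
    return [p for p in partitions if p == value]


def prune_with_numeric_filter(value):
    # Type mismatch (dt = 123): the predicate cannot be pushed, so ALL
    # partition metadata is fetched and cast(dt as int) = value is
    # evaluated client-side -- expensive with thousands of partitions.
    metastore_requests.append(("all", None))
    fetched = list(partitions)  # the entire partition list crosses the wire
    return [p for p in fetched if int(p) == value]


matched_str = prune_with_string_filter("20240105")
matched_num = prune_with_numeric_filter(20240105)
```

Both calls select the same single partition, but only the second one drags the full partition list to the client first, which is the performance and metastore-OOM risk the issue describes.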
[jira] [Updated] (SPARK-47352) Fix Upper, Lower, InitCap collation awareness
[ https://issues.apache.org/jira/browse/SPARK-47352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-47352: - Summary: Fix Upper, Lower, InitCap collation awareness (was: TBD) > Fix Upper, Lower, InitCap collation awareness > - > > Key: SPARK-47352 > URL: https://issues.apache.org/jira/browse/SPARK-47352 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 4.0.0 > Reporter: Uroš Bojanić > Priority: Major