[jira] [Resolved] (SPARK-47588) Hive module: Migrate logInfo with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-47588. Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46086 [https://github.com/apache/spark/pull/46086] > Hive module: Migrate logInfo with variables to structured logging framework > --- > > Key: SPARK-47588 > URL: https://issues.apache.org/jira/browse/SPARK-47588 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47882) createTableColumnTypes needs to be mapped to database types instead of being used directly
[ https://issues.apache.org/jira/browse/SPARK-47882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47882: --- Labels: pull-request-available (was: ) > createTableColumnTypes needs to be mapped to database types instead of being > used directly > > > Key: SPARK-47882 > URL: https://issues.apache.org/jira/browse/SPARK-47882 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.2, 4.0.0, 3.5.1 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available >
[jira] [Updated] (SPARK-47361) Improve JDBC data sources
[ https://issues.apache.org/jira/browse/SPARK-47361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-47361: - Affects Version/s: 3.5.1 3.4.2 > Improve JDBC data sources > - > > Key: SPARK-47361 > URL: https://issues.apache.org/jira/browse/SPARK-47361 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.2, 4.0.0, 3.5.1 >Reporter: Dongjoon Hyun >Assignee: Kent Yao >Priority: Major > Labels: releasenotes > Fix For: 4.0.0 > >
[jira] [Updated] (SPARK-47882) createTableColumnTypes needs to be mapped to database types instead of being used directly
[ https://issues.apache.org/jira/browse/SPARK-47882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-47882: - Affects Version/s: 3.5.1 3.4.2 > createTableColumnTypes needs to be mapped to database types instead of being > used directly > > > Key: SPARK-47882 > URL: https://issues.apache.org/jira/browse/SPARK-47882 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.2, 4.0.0, 3.5.1 >Reporter: Kent Yao >Priority: Major >
[jira] [Updated] (SPARK-47361) Improve JDBC data sources
[ https://issues.apache.org/jira/browse/SPARK-47361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-47361: - Affects Version/s: (was: 3.4.2) (was: 3.5.1) > Improve JDBC data sources > - > > Key: SPARK-47361 > URL: https://issues.apache.org/jira/browse/SPARK-47361 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Kent Yao >Priority: Major > Labels: releasenotes > Fix For: 4.0.0 > >
[jira] [Created] (SPARK-47882) createTableColumnTypes needs to be mapped to database types instead of being used directly
Kent Yao created SPARK-47882: Summary: createTableColumnTypes needs to be mapped to database types instead of being used directly Key: SPARK-47882 URL: https://issues.apache.org/jira/browse/SPARK-47882 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Kent Yao
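The idea behind SPARK-47882 is that column type strings supplied via the JDBC writer option `createTableColumnTypes` should be translated through the target database's dialect rather than emitted verbatim into the CREATE TABLE statement. The sketch below models that translation in plain Python; it is an illustration of the mapping step, not Spark's actual implementation, and `ORACLE_TYPE_MAP` is an invented, abbreviated dialect table:

```python
# Hypothetical per-dialect lookup (illustrative only; a real dialect
# defines many more types than this).
ORACLE_TYPE_MAP = {"STRING": "VARCHAR2(255)", "VARCHAR": "VARCHAR2", "BOOLEAN": "NUMBER(1)"}

def map_column_types(create_table_column_types, type_map):
    """Rewrite a 'name type, name type' option string using a dialect's type map."""
    mapped = []
    for col_spec in create_table_column_types.split(","):
        name, _, col_type = col_spec.strip().partition(" ")
        base, _, args = col_type.partition("(")
        # Fall back to the user-supplied type when the dialect has no mapping.
        target = type_map.get(base.upper(), col_type)
        if args and "(" not in target:
            target = f"{target}({args}"  # re-attach length args, e.g. VARCHAR(64)
        mapped.append(f"{name} {target}")
    return ", ".join(mapped)

print(map_column_types("name VARCHAR(64), active BOOLEAN", ORACLE_TYPE_MAP))
# → name VARCHAR2(64), active NUMBER(1)
```

Without a mapping step like this, a type such as `BOOLEAN` would be passed straight to a database that does not support it, which is the failure mode the issue describes.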
[jira] [Resolved] (SPARK-47880) Oracle: Document Mapping Spark SQL Data Types to Oracle
[ https://issues.apache.org/jira/browse/SPARK-47880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-47880. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46092 [https://github.com/apache/spark/pull/46092] > Oracle: Document Mapping Spark SQL Data Types to Oracle > --- > > Key: SPARK-47880 > URL: https://issues.apache.org/jira/browse/SPARK-47880 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-47880) Oracle: Document Mapping Spark SQL Data Types to Oracle
[ https://issues.apache.org/jira/browse/SPARK-47880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-47880: Assignee: Kent Yao > Oracle: Document Mapping Spark SQL Data Types to Oracle > --- > > Key: SPARK-47880 > URL: https://issues.apache.org/jira/browse/SPARK-47880 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available >
[jira] [Resolved] (SPARK-47879) Oracle: Use VARCHAR2 instead of VARCHAR for VarcharType mapping
[ https://issues.apache.org/jira/browse/SPARK-47879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-47879. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46091 [https://github.com/apache/spark/pull/46091] > Oracle: Use VARCHAR2 instead of VARCHAR for VarcharType mapping > > > Key: SPARK-47879 > URL: https://issues.apache.org/jira/browse/SPARK-47879 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Updated] (SPARK-47880) Oracle: Document Mapping Spark SQL Data Types to Oracle
[ https://issues.apache.org/jira/browse/SPARK-47880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47880: --- Labels: pull-request-available (was: ) > Oracle: Document Mapping Spark SQL Data Types to Oracle > --- > > Key: SPARK-47880 > URL: https://issues.apache.org/jira/browse/SPARK-47880 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available >
[jira] [Created] (SPARK-47881) HDFS path not working for hive.metastore.jars.path
jungho.choi created SPARK-47881: --- Summary: HDFS path not working for hive.metastore.jars.path Key: SPARK-47881 URL: https://issues.apache.org/jira/browse/SPARK-47881 Project: Spark Issue Type: Question Components: SQL Affects Versions: 3.4.2 Reporter: jungho.choi I am trying to use Hive Metastore version 3.1.3 with Spark version 3.4.2, but I encounter an error when specifying the path to the metastore JARs on HDFS. Following the official documentation, I specified the path using an HDFS URI: {code:java} spark.sql.hive.metastore.version 3.1.3 spark.sql.hive.metastore.jars path spark.sql.hive.metastore.jars.path hdfs://namespace/spark/hive3_lib/* {code} However, when I tested it, I got an error from HiveClientImpl.scala stating that the URI scheme is not "file": {code:java} Caused by: java.lang.ExceptionInInitializerError: java.lang.IllegalArgumentException: URI scheme is not "file" at org.apache.spark.sql.hive.client.HiveClientImpl$.newHiveConf(HiveClientImpl.scala:1296) at org.apache.spark.sql.hive.client.HiveClientImpl.newState(HiveClientImpl.scala:174) at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:139) at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490) at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:315) at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:517) at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:377) at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:70) at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:69) at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$databaseExists$1(HiveExternalCatalog.scala:223) at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23) at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:101) ... 143 more {code} To resolve this, I changed spark.sql.hive.metastore.jars.path to a local file path instead of an HDFS path, and it worked fine. I think I followed the instructions correctly, but are there any specific configurations or preferences required to use HDFS paths?
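For reference, the failing (HDFS) and working (local-path) configurations from the report can be written side by side. The HDFS path is the reporter's example; the local path shown is a hypothetical placeholder, assuming the Hive 3.1.3 jars have been copied to the same location on every node:

{code:java}
# Fails on 3.4.2: HiveClientImpl requires a "file" URI scheme for the jar path
spark.sql.hive.metastore.version    3.1.3
spark.sql.hive.metastore.jars       path
spark.sql.hive.metastore.jars.path  hdfs://namespace/spark/hive3_lib/*

# Workaround per the report: use a local filesystem path instead
spark.sql.hive.metastore.jars.path  file:///opt/spark/hive3_lib/*
{code}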
[jira] [Updated] (SPARK-47881) HDFS path not working for hive.metastore.jars.path
[ https://issues.apache.org/jira/browse/SPARK-47881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jungho.choi updated SPARK-47881: Description: I am trying to use Hive Metastore version 3.1.3 with Spark version 3.4.2, but I encounter an error when specifying the path to the metastore JARs on HDFS. Following the official documentation, I specified the path using an HDFS URI: {code:java} spark.sql.hive.metastore.version 3.1.3 spark.sql.hive.metastore.jars path spark.sql.hive.metastore.jars.path hdfs://namespace/spark/hive3_lib/* {code} However, when I tested it, I got an error from HiveClientImpl.scala stating that the URI scheme is not "file": {code:java} Caused by: java.lang.ExceptionInInitializerError: java.lang.IllegalArgumentException: URI scheme is not "file" at org.apache.spark.sql.hive.client.HiveClientImpl$.newHiveConf(HiveClientImpl.scala:1296) at org.apache.spark.sql.hive.client.HiveClientImpl.newState(HiveClientImpl.scala:174) at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:139) at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490) at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:315) at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:517) at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:377) at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:70) at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:69) at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$databaseExists$1(HiveExternalCatalog.scala:223) at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23) at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:101) ... 143 more {code} To resolve this, I changed spark.sql.hive.metastore.jars.path to a local file path instead of an HDFS path, and it worked fine. I think I followed the instructions correctly, but are there any specific configurations or preferences required to use HDFS paths?
[jira] [Created] (SPARK-47880) Oracle: Document Mapping Spark SQL Data Types to Oracle
Kent Yao created SPARK-47880: Summary: Oracle: Document Mapping Spark SQL Data Types to Oracle Key: SPARK-47880 URL: https://issues.apache.org/jira/browse/SPARK-47880 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Kent Yao
[jira] [Updated] (SPARK-47879) Oracle: Use VARCHAR2 instead of VARCHAR for VarcharType mapping
[ https://issues.apache.org/jira/browse/SPARK-47879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47879: --- Labels: pull-request-available (was: ) > Oracle: Use VARCHAR2 instead of VARCHAR for VarcharType mapping > > > Key: SPARK-47879 > URL: https://issues.apache.org/jira/browse/SPARK-47879 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available >
[jira] [Created] (SPARK-47879) Oracle: Use VARCHAR2 instead of VARCHAR for VarcharType mapping
Kent Yao created SPARK-47879: Summary: Oracle: Use VARCHAR2 instead of VARCHAR for VarcharType mapping Key: SPARK-47879 URL: https://issues.apache.org/jira/browse/SPARK-47879 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Kent Yao
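The point of SPARK-47879 is that Oracle documents `VARCHAR2` as its variable-length string type and reserves `VARCHAR` for possible future semantic changes, so Spark's Oracle dialect should emit `VARCHAR2(n)` for `VarcharType(n)`. A minimal sketch of such a mapping, in plain Python rather than Spark's actual Scala `OracleDialect` code, with simplified type names as assumptions:

```python
def oracle_jdbc_type(spark_type, length=None):
    """Map a (simplified) Spark SQL type name to an Oracle DDL type.

    Illustrative sketch only; Spark's real dialect works on DataType objects.
    """
    if spark_type == "varchar":
        return f"VARCHAR2({length})"  # was VARCHAR(n) before SPARK-47879
    if spark_type == "string":
        return "VARCHAR2(255)"        # Spark's documented default for StringType on Oracle
    raise ValueError(f"unmapped type: {spark_type}")

print(oracle_jdbc_type("varchar", 100))  # → VARCHAR2(100)
```

The behavior is otherwise unchanged: only the emitted keyword differs, so existing tables are unaffected and new DDL follows Oracle's recommendation.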
[jira] [Commented] (SPARK-47172) Upgrade Transport block cipher mode to GCM
[ https://issues.apache.org/jira/browse/SPARK-47172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837927#comment-17837927 ] Mridul Muralidharan commented on SPARK-47172: - Given we have addressed SPARK-47318, and with Spark supporting TLS from 4.0 - is this a concern? Given it will be backward-incompatible, I am hesitant to expand support for something which is expected to go away "soon". > Upgrade Transport block cipher mode to GCM > -- > > Key: SPARK-47172 > URL: https://issues.apache.org/jira/browse/SPARK-47172 > Project: Spark > Issue Type: Improvement > Components: Security >Affects Versions: 3.4.2, 3.5.0 >Reporter: Steve Weis >Priority: Minor > > The cipher transformation currently used for encrypting RPC calls is an > unauthenticated mode (AES/CTR/NoPadding). This needs to be upgraded to an > authenticated mode (AES/GCM/NoPadding) to prevent ciphertext from being > modified in transit. > The relevant line is here: > [https://github.com/apache/spark/blob/a939a7d0fd9c6b23c879cbee05275c6fbc939e38/common/network-common/src/main/java/org/apache/spark/network/util/TransportConf.java#L220] > GCM is more computationally expensive than CTR and adds a 16-byte > block of authentication tag data to each payload.
[jira] [Resolved] (SPARK-47876) Improve docstring of mapInArrow
[ https://issues.apache.org/jira/browse/SPARK-47876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng resolved SPARK-47876. -- Resolution: Done Resolved by https://github.com/apache/spark/pull/46088 > Improve docstring of mapInArrow > --- > > Key: SPARK-47876 > URL: https://issues.apache.org/jira/browse/SPARK-47876 > Project: Spark > Issue Type: Documentation > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Labels: pull-request-available > > Improve docstring of mapInArrow
[jira] [Resolved] (SPARK-46375) Add documentation for Python data source API
[ https://issues.apache.org/jira/browse/SPARK-46375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46375. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46089 [https://github.com/apache/spark/pull/46089] > Add documentation for Python data source API > > > Key: SPARK-46375 > URL: https://issues.apache.org/jira/browse/SPARK-46375 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Add documentation (user guide) for the Python data source API. > > Note the documentation should clarify the required dependency: pyarrow
[jira] [Assigned] (SPARK-46375) Add documentation for Python data source API
[ https://issues.apache.org/jira/browse/SPARK-46375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46375: Assignee: Allison Wang > Add documentation for Python data source API > > > Key: SPARK-46375 > URL: https://issues.apache.org/jira/browse/SPARK-46375 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > > > Add documentation (user guide) for the Python data source API. > > Note the documentation should clarify the required dependency: pyarrow
[jira] [Updated] (SPARK-47763) Reenable Protobuf function doctests
[ https://issues.apache.org/jira/browse/SPARK-47763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47763: --- Labels: pull-request-available (was: ) > Reenable Protobuf function doctests > --- > > Key: SPARK-47763 > URL: https://issues.apache.org/jira/browse/SPARK-47763 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Resolved] (SPARK-47877) Speed up test_parity_listener
[ https://issues.apache.org/jira/browse/SPARK-47877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47877. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46072 [https://github.com/apache/spark/pull/46072] > Speed up test_parity_listener > - > > Key: SPARK-47877 > URL: https://issues.apache.org/jira/browse/SPARK-47877 > Project: Spark > Issue Type: New Feature > Components: Connect, SS >Affects Versions: 4.0.0 >Reporter: Wei Liu >Assignee: Wei Liu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Resolved] (SPARK-47760) Reenable Avro function doctests
[ https://issues.apache.org/jira/browse/SPARK-47760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47760. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46055 [https://github.com/apache/spark/pull/46055] > Reenable Avro function doctests > --- > > Key: SPARK-47760 > URL: https://issues.apache.org/jira/browse/SPARK-47760 > Project: Spark > Issue Type: Sub-task > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Resolved] (SPARK-47875) Remove `spark.deploy.recoverySerializer`
[ https://issues.apache.org/jira/browse/SPARK-47875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47875. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46087 [https://github.com/apache/spark/pull/46087] > Remove `spark.deploy.recoverySerializer` > > > Key: SPARK-47875 > URL: https://issues.apache.org/jira/browse/SPARK-47875 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Resolved] (SPARK-47763) Reenable Protobuf function doctests
[ https://issues.apache.org/jira/browse/SPARK-47763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47763. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46055 [https://github.com/apache/spark/pull/46055] > Reenable Protobuf function doctests > --- > > Key: SPARK-47763 > URL: https://issues.apache.org/jira/browse/SPARK-47763 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 4.0.0 > >
[jira] [Resolved] (SPARK-47868) Recursion Limit Error in SparkSession and SparkConnectPlanner
[ https://issues.apache.org/jira/browse/SPARK-47868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-47868. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46075 [https://github.com/apache/spark/pull/46075] > Recursion Limit Error in SparkSession and SparkConnectPlanner > - > > Key: SPARK-47868 > URL: https://issues.apache.org/jira/browse/SPARK-47868 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 4.0.0 >Reporter: Tom van Bussel >Assignee: Tom van Bussel >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Updated] (SPARK-46375) Add documentation for Python data source API
[ https://issues.apache.org/jira/browse/SPARK-46375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46375: --- Labels: pull-request-available (was: ) > Add documentation for Python data source API > > > Key: SPARK-46375 > URL: https://issues.apache.org/jira/browse/SPARK-46375 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Priority: Major > Labels: pull-request-available > > Add documentation (user guide) for the Python data source API. > > Note the documentation should clarify the required dependency: pyarrow
[jira] [Updated] (SPARK-47877) Speed up test_parity_listener
[ https://issues.apache.org/jira/browse/SPARK-47877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47877: --- Labels: pull-request-available (was: ) > Speed up test_parity_listener > - > > Key: SPARK-47877 > URL: https://issues.apache.org/jira/browse/SPARK-47877 > Project: Spark > Issue Type: New Feature > Components: Connect, SS >Affects Versions: 4.0.0 >Reporter: Wei Liu >Priority: Major > Labels: pull-request-available >
[jira] [Created] (SPARK-47877) Speed up test_parity_listener
Wei Liu created SPARK-47877: --- Summary: Speed up test_parity_listener Key: SPARK-47877 URL: https://issues.apache.org/jira/browse/SPARK-47877 Project: Spark Issue Type: New Feature Components: Connect, SS Affects Versions: 4.0.0 Reporter: Wei Liu
[jira] [Created] (SPARK-47876) Improve docstring of mapInArrow
Xinrong Meng created SPARK-47876: Summary: Improve docstring of mapInArrow Key: SPARK-47876 URL: https://issues.apache.org/jira/browse/SPARK-47876 Project: Spark Issue Type: Documentation Components: Documentation, PySpark Affects Versions: 4.0.0 Reporter: Xinrong Meng Improve docstring of mapInArrow
[jira] [Created] (SPARK-47875) Remove `spark.deploy.recoverySerializer`
Dongjoon Hyun created SPARK-47875: - Summary: Remove `spark.deploy.recoverySerializer` Key: SPARK-47875 URL: https://issues.apache.org/jira/browse/SPARK-47875 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 4.0.0 Reporter: Dongjoon Hyun
[jira] [Resolved] (SPARK-47590) Hive-thriftserver: Migrate logWarn with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-47590. Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45923 [https://github.com/apache/spark/pull/45923] > Hive-thriftserver: Migrate logWarn with variables to structured logging > framework > - > > Key: SPARK-47590 > URL: https://issues.apache.org/jira/browse/SPARK-47590 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Updated] (SPARK-47588) Hive module: Migrate logInfo with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47588: --- Labels: pull-request-available (was: ) > Hive module: Migrate logInfo with variables to structured logging framework > --- > > Key: SPARK-47588 > URL: https://issues.apache.org/jira/browse/SPARK-47588 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Labels: pull-request-available >
[jira] [Assigned] (SPARK-47594) Connector module: Migrate logInfo with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang reassigned SPARK-47594: -- Assignee: BingKun Pan > Connector module: Migrate logInfo with variables to structured logging > framework > > > Key: SPARK-47594 > URL: https://issues.apache.org/jira/browse/SPARK-47594 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Assignee: BingKun Pan >Priority: Major > Labels: pull-request-available >
[jira] [Resolved] (SPARK-47594) Connector module: Migrate logInfo with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-47594. Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46022 [https://github.com/apache/spark/pull/46022] > Connector module: Migrate logInfo with variables to structured logging > framework > > > Key: SPARK-47594 > URL: https://issues.apache.org/jira/browse/SPARK-47594 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Assignee: BingKun Pan >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Created] (SPARK-47874) Multiple bugs with map operations in combination with collations
Nikola Mandic created SPARK-47874: - Summary: Multiple bugs with map operations in combination with collations Key: SPARK-47874 URL: https://issues.apache.org/jira/browse/SPARK-47874 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 4.0.0 Reporter: Nikola Mandic The following two queries produce different results (the first succeeds, the second throws an exception): {code:java} select map('a', 1, 'A' collate utf8_binary_lcase, 2); -- success select map('a' collate utf8_binary_lcase, 1, 'A', 2); -- exception{code} The following query results in 1: {code:java} select cast(map('a', 1, 'A', 2) as map)['A' collate utf8_binary_lcase]; -- 1{code}
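The inconsistency above is easier to see with a small model of collation-aware map keys. The sketch below is hypothetical Python, not Spark code: it models UTF8_BINARY as plain key equality and UTF8_BINARY_LCASE as equality after `str.casefold`, which is what should make `'a'` and `'A'` collide under the lowercase collation regardless of which operand carries the collation.

```python
# Minimal model of map key lookup under different collations
# (hypothetical sketch; Spark's actual implementation differs).

class CollatedMap:
    """A map whose key equality is defined by a collation's key normalizer."""

    def __init__(self, normalize=lambda s: s):
        self._normalize = normalize
        self._data = {}

    def put(self, key, value):
        self._data[self._normalize(key)] = value

    def get(self, key):
        return self._data.get(self._normalize(key))

# UTF8_BINARY behaves like plain string equality; UTF8_BINARY_LCASE
# compares strings lowercased (modeled here with str.casefold).
binary = CollatedMap()
lcase = CollatedMap(normalize=str.casefold)

for m in (binary, lcase):
    m.put('a', 1)
    m.put('A', 2)
```

Under the lowercase collation the two keys collide, so the second insert overwrites the first; under the binary collation they stay distinct. The bug reported here is that Spark applies this rule asymmetrically depending on which argument is collated.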
[jira] [Resolved] (SPARK-47871) Oracle: Map TimestampType to TIMESTAMP WITH LOCAL TIME ZONE
[ https://issues.apache.org/jira/browse/SPARK-47871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47871. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46080 [https://github.com/apache/spark/pull/46080] > Oracle: Map TimestampType to TIMESTAMP WITH LOCAL TIME ZONE > --- > > Key: SPARK-47871 > URL: https://issues.apache.org/jira/browse/SPARK-47871 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-47871) Oracle: Map TimestampType to TIMESTAMP WITH LOCAL TIME ZONE
[ https://issues.apache.org/jira/browse/SPARK-47871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-47871: - Assignee: Kent Yao > Oracle: Map TimestampType to TIMESTAMP WITH LOCAL TIME ZONE > --- > > Key: SPARK-47871 > URL: https://issues.apache.org/jira/browse/SPARK-47871 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available >
[jira] [Assigned] (SPARK-47417) Ascii, Chr, Base64, UnBase64, Decode, StringDecode, Encode, ToBinary, FormatNumber, Sentences (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47417: --- Assignee: Nikola Mandic > Ascii, Chr, Base64, UnBase64, Decode, StringDecode, Encode, ToBinary, > FormatNumber, Sentences (all collations) > -- > > Key: SPARK-47417 > URL: https://issues.apache.org/jira/browse/SPARK-47417 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Nikola Mandic >Priority: Major > Labels: pull-request-available >
[jira] [Updated] (SPARK-47745) Add License to Spark Operator
[ https://issues.apache.org/jira/browse/SPARK-47745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47745: -- Affects Version/s: kubernetes-operator-0.1.0 (was: 4.0.0) > Add License to Spark Operator > - > > Key: SPARK-47745 > URL: https://issues.apache.org/jira/browse/SPARK-47745 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: kubernetes-operator-0.1.0 >Reporter: Zhou JIANG >Assignee: Zhou JIANG >Priority: Major > Labels: pull-request-available > Fix For: kubernetes-operator-0.1.0 > > > Add license to the recently established operator repository.
[jira] [Resolved] (SPARK-47417) Ascii, Chr, Base64, UnBase64, Decode, StringDecode, Encode, ToBinary, FormatNumber, Sentences (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47417. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45933 [https://github.com/apache/spark/pull/45933] > Ascii, Chr, Base64, UnBase64, Decode, StringDecode, Encode, ToBinary, > FormatNumber, Sentences (all collations) > -- > > Key: SPARK-47417 > URL: https://issues.apache.org/jira/browse/SPARK-47417 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Nikola Mandic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Updated] (SPARK-47745) Add License to Spark Operator
[ https://issues.apache.org/jira/browse/SPARK-47745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47745: -- Fix Version/s: kubernetes-operator-0.1.0 (was: 4.0.0) > Add License to Spark Operator > - > > Key: SPARK-47745 > URL: https://issues.apache.org/jira/browse/SPARK-47745 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 4.0.0 >Reporter: Zhou JIANG >Assignee: Zhou JIANG >Priority: Major > Labels: pull-request-available > Fix For: kubernetes-operator-0.1.0 > > > Add license to the recently established operator repository.
[jira] [Resolved] (SPARK-47745) Add License to Spark Operator
[ https://issues.apache.org/jira/browse/SPARK-47745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47745. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 3 [https://github.com/apache/spark-kubernetes-operator/pull/3] > Add License to Spark Operator > - > > Key: SPARK-47745 > URL: https://issues.apache.org/jira/browse/SPARK-47745 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 4.0.0 >Reporter: Zhou JIANG >Assignee: Zhou JIANG >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Add license to the recently established operator repository.
[jira] [Assigned] (SPARK-47745) Add License to Spark Operator
[ https://issues.apache.org/jira/browse/SPARK-47745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-47745: - Assignee: Zhou JIANG > Add License to Spark Operator > - > > Key: SPARK-47745 > URL: https://issues.apache.org/jira/browse/SPARK-47745 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 4.0.0 >Reporter: Zhou JIANG >Assignee: Zhou JIANG >Priority: Major > Labels: pull-request-available > > Add license to the recently established operator repository.
[jira] [Updated] (SPARK-33822) TPCDS Q5 fails if spark.sql.adaptive.enabled=true
[ https://issues.apache.org/jira/browse/SPARK-33822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-33822: --- Labels: pull-request-available (was: ) > TPCDS Q5 fails if spark.sql.adaptive.enabled=true > - > > Key: SPARK-33822 > URL: https://issues.apache.org/jira/browse/SPARK-33822 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0, 3.0.1, 3.1.0, 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Takeshi Yamamuro >Priority: Blocker > Labels: pull-request-available > Fix For: 3.0.2, 3.1.0 > > > **PROBLEM STATEMENT** > {code} > >>> tables = ['call_center', 'catalog_page', 'catalog_returns', > >>> 'catalog_sales', 'customer', 'customer_address', 'customer_demographics', > >>> 'date_dim', 'household_demographics', 'income_band', 'inventory', 'item', > >>> 'promotion', 'reason', 'ship_mode', 'store', 'store_returns', > >>> 'store_sales', 'time_dim', 'warehouse', 'web_page', 'web_returns', > >>> 'web_sales', 'web_site'] > >>> for t in tables: > ... spark.sql("CREATE TABLE %s USING PARQUET LOCATION > '/Users/dongjoon/data/10g/%s'" % (t, t)) > >>> spark.sql(spark.sparkContext.wholeTextFiles("/Users/dongjoon/data/query/q5.sql").take(1)[0][1]).show(1) > +---++-+---+-+ > |channel| id|sales|returns| profit| > +---++-+---+-+ > | null|null|1143646603.07|30617460.71|-317540732.87| > |catalog channel|null| 393609478.06| 9451732.79| -44801262.72| > |catalog channel|catalog_pageA...| 0.00| 39037.48|-25330.29| > ... 
> +---++-+---+-+ > >>> sql("set spark.sql.adaptive.enabled=true") > >>> spark.sql(spark.sparkContext.wholeTextFiles("/Users/dongjoon/data/query/q5.sql").take(1)[0][1]).show(1) > Traceback (most recent call last): > File "", line 1, in > File > "/Users/dongjoon/APACHE/spark-release/spark-3.0.1-bin-hadoop3.2/python/pyspark/sql/dataframe.py", > line 440, in show > print(self._jdf.showString(n, 20, vertical)) > File > "/Users/dongjoon/APACHE/spark-release/spark-3.0.1-bin-hadoop3.2/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", > line 1305, in __call__ > File > "/Users/dongjoon/APACHE/spark-release/spark-3.0.1-bin-hadoop3.2/python/pyspark/sql/utils.py", > line 128, in deco > return f(*a, **kw) > File > "/Users/dongjoon/APACHE/spark-release/spark-3.0.1-bin-hadoop3.2/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", > line 328, in get_return_value > py4j.protocol.Py4JJavaError: An error occurred while calling o160.showString. > : java.lang.UnsupportedOperationException: BroadcastExchange does not support > the execute() code path. 
> at > org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecute(BroadcastExchangeExec.scala:190) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:175) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:213) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:210) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:171) > at > org.apache.spark.sql.execution.exchange.ReusedExchangeExec.doExecute(Exchange.scala:61) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:175) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:213) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:210) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:171) > at > org.apache.spark.sql.execution.adaptive.QueryStageExec.doExecute(QueryStageExec.scala:115) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:175) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:213) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:210) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:171) > at > org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:316) > at > org.apache.spark.sql.execution.SparkPlan.executeCollectIterator(SparkPlan.scala:392) >
[jira] [Resolved] (SPARK-47356) Add support for ConcatWs & Elt (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47356. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46061 [https://github.com/apache/spark/pull/46061] > Add support for ConcatWs & Elt (all collations) > --- > > Key: SPARK-47356 > URL: https://issues.apache.org/jira/browse/SPARK-47356 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Mihailo Milosevic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-47356) Add support for ConcatWs & Elt (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47356: --- Assignee: Mihailo Milosevic > Add support for ConcatWs & Elt (all collations) > --- > > Key: SPARK-47356 > URL: https://issues.apache.org/jira/browse/SPARK-47356 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Mihailo Milosevic >Priority: Major > Labels: pull-request-available >
[jira] [Updated] (SPARK-47873) Write collated strings to hive as regular strings
[ https://issues.apache.org/jira/browse/SPARK-47873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47873: --- Labels: pull-request-available (was: ) > Write collated strings to hive as regular strings > - > > Key: SPARK-47873 > URL: https://issues.apache.org/jira/browse/SPARK-47873 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Stefan Kandic >Priority: Major > Labels: pull-request-available > > As Hive doesn't support collations, we should write collated strings with a > regular string type but keep the collation in table metadata to properly read > them back.
[jira] [Created] (SPARK-47873) Write collated strings to hive as regular strings
Stefan Kandic created SPARK-47873: - Summary: Write collated strings to hive as regular strings Key: SPARK-47873 URL: https://issues.apache.org/jira/browse/SPARK-47873 Project: Spark Issue Type: Improvement Components: Spark Core, SQL Affects Versions: 4.0.0 Reporter: Stefan Kandic As Hive doesn't support collations, we should write collated strings with a regular string type but keep the collation in table metadata to properly read them back.
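The proposed write path can be sketched as a schema round trip. Everything below is a hypothetical Python model: the `string collate <NAME>` textual type and the helper names are illustrative, not Spark's actual metadata format. Hive sees a plain string column, and the collation is recorded separately so a read can restore it.

```python
# Hypothetical model of the SPARK-47873 proposal: Hive stores a plain string
# column while the collation survives only in table metadata.

def to_hive_schema(spark_schema):
    """Strip collations for Hive, recording them in a metadata dict."""
    hive_cols, collations = [], {}
    for name, dtype in spark_schema:
        if dtype.startswith("string collate "):
            collations[name] = dtype[len("string collate "):]
            dtype = "string"  # Hive only understands plain strings
        hive_cols.append((name, dtype))
    return hive_cols, collations

def from_hive_schema(hive_cols, collations):
    """Re-apply recorded collations when reading the table back."""
    return [(name, f"string collate {collations[name]}" if name in collations else dtype)
            for name, dtype in hive_cols]

spark_schema = [("id", "int"), ("name", "string collate UTF8_BINARY_LCASE")]
hive_cols, collations = to_hive_schema(spark_schema)
```

The round trip is lossless as long as the metadata travels with the table, which is exactly the property the ticket asks for.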
[jira] [Updated] (SPARK-47418) Optimize string predicate expressions for UTF8_BINARY_LCASE collation
[ https://issues.apache.org/jira/browse/SPARK-47418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47418: --- Labels: pull-request-available (was: ) > Optimize string predicate expressions for UTF8_BINARY_LCASE collation > - > > Key: SPARK-47418 > URL: https://issues.apache.org/jira/browse/SPARK-47418 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > Labels: pull-request-available > > Implement {*}contains{*}, {*}startsWith{*}, and *endsWith* built-in string > Spark functions using optimized lowercase comparison approach introduced by > [~nikolamand-db] in [https://github.com/apache/spark/pull/45816]. Refer to > the latest design and code structure imposed by [~uros-db] in > https://issues.apache.org/jira/browse/SPARK-47410 to understand how collation > support is introduced for Spark SQL expressions. In addition, review previous > Jira tickets under the current parent in order to understand how > *StringPredicate* expressions are currently used and tested in Spark: > * [SPARK-47131|https://issues.apache.org/jira/browse/SPARK-47131] > * [SPARK-47248|https://issues.apache.org/jira/browse/SPARK-47248] > * [SPARK-47295|https://issues.apache.org/jira/browse/SPARK-47295] > These tickets should help you understand what changes were introduced in > order to enable collation support for these functions. Lastly, feel free to > use your chosen Spark SQL Editor to play around with the existing functions > and learn more about how they work. > > The goal for this Jira ticket is to improve the UTF8_BINARY_LCASE > implementation for the {*}contains{*}, {*}startsWith{*}, and *endsWith* > functions so that they use optimized lowercase comparison approach (following > the general logic in Nikola's PR), and benchmark the results accordingly. 
As > for testing, the currently existing unit test cases and end-to-end tests > should already fully cover the expected behaviour of *StringPredicate* > expressions for all collation types. In other words, the objective of this > ticket is only to enhance the internal implementation, without introducing > any user-facing changes to Spark SQL API. > > Finally, feel free to refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].
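The optimization this ticket asks for — case-insensitive matching without materializing lowercased copies of both strings — can be sketched in plain Python. This is an illustrative model only: Spark's implementation works on UTF8String, and full Unicode case folding can expand characters (e.g. 'ß' folds to 'ss'), which a per-character comparison like this does not handle.

```python
def lowercase_starts_with(s: str, prefix: str) -> bool:
    """Case-insensitive startsWith, folding one character at a time
    instead of allocating s.lower() and prefix.lower() up front."""
    if len(prefix) > len(s):
        return False
    # zip stops at the shorter sequence, i.e. after len(prefix) characters
    return all(a.casefold() == b.casefold() for a, b in zip(s, prefix))

def lowercase_ends_with(s: str, suffix: str) -> bool:
    if len(suffix) > len(s):
        return False
    return all(a.casefold() == b.casefold()
               for a, b in zip(reversed(s), reversed(suffix)))

def lowercase_contains(s: str, sub: str) -> bool:
    # Slide a window over s; each probe folds at most len(sub) characters,
    # so short-circuiting avoids folding the whole of s on a mismatch.
    return any(lowercase_starts_with(s[i:], sub)
               for i in range(len(s) - len(sub) + 1))
```

The win over `s.lower().startswith(prefix.lower())` is that nothing is allocated and comparison stops at the first differing character, which is the general idea behind the optimized lowercase comparison referenced in the ticket.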
[jira] [Updated] (SPARK-47353) TBD
[ https://issues.apache.org/jira/browse/SPARK-47353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-47353: - Summary: TBD (was: regexp_count & regexp_substr (binary & lowercase collation only)) > TBD > --- > > Key: SPARK-47353 > URL: https://issues.apache.org/jira/browse/SPARK-47353 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major >
[jira] [Closed] (SPARK-47872) Add the option to initially hide a CustomMetric from the Spark UI
[ https://issues.apache.org/jira/browse/SPARK-47872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Dillitz closed SPARK-47872. -- > Add the option to initially hide a CustomMetric from the Spark UI > - > > Key: SPARK-47872 > URL: https://issues.apache.org/jira/browse/SPARK-47872 > Project: Spark > Issue Type: Improvement > Components: UI >Affects Versions: 4.0.0 >Reporter: Robert Dillitz >Priority: Major > Labels: Metrics, UI > > There is currently no way to have experimental CustomMetrics that are > initially hidden in the Spark UI. Add this option.
[jira] [Resolved] (SPARK-47872) Add the option to initially hide a CustomMetric from the Spark UI
[ https://issues.apache.org/jira/browse/SPARK-47872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Dillitz resolved SPARK-47872. Resolution: Works for Me > Add the option to initially hide a CustomMetric from the Spark UI > - > > Key: SPARK-47872 > URL: https://issues.apache.org/jira/browse/SPARK-47872 > Project: Spark > Issue Type: Improvement > Components: UI >Affects Versions: 4.0.0 >Reporter: Robert Dillitz >Priority: Major > Labels: Metrics, UI > > There is currently no way to have experimental CustomMetrics that are > initially hidden in the Spark UI. Add this option.
[jira] [Commented] (SPARK-46841) Language support for collations
[ https://issues.apache.org/jira/browse/SPARK-46841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837668#comment-17837668 ] Nikola Mandic commented on SPARK-46841: --- Working on it. > Language support for collations > --- > > Key: SPARK-46841 > URL: https://issues.apache.org/jira/browse/SPARK-46841 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Aleksandar Tomic >Priority: Major >
[jira] [Created] (SPARK-47872) Add the option to initially hide a CustomMetric from the Spark UI
Robert Dillitz created SPARK-47872: -- Summary: Add the option to initially hide a CustomMetric from the Spark UI Key: SPARK-47872 URL: https://issues.apache.org/jira/browse/SPARK-47872 Project: Spark Issue Type: Improvement Components: UI Affects Versions: 4.0.0 Reporter: Robert Dillitz There is currently no way to have experimental CustomMetrics that are initially hidden in the Spark UI. Add this option.
[jira] [Resolved] (SPARK-47739) Avro does not register custom logical types on spark startup
[ https://issues.apache.org/jira/browse/SPARK-47739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47739. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45895 [https://github.com/apache/spark/pull/45895] > Avro does not register custom logical types on spark startup > > > Key: SPARK-47739 > URL: https://issues.apache.org/jira/browse/SPARK-47739 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Milan Stefanovic >Assignee: Milan Stefanovic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > It can happen that we resolve an Avro schema before we register Avro logical > types, leading to two consecutive calls to resolve the Avro schema providing wrong > results. > > Example: > !image-2024-04-05-16-27-05-489.png!
[jira] [Resolved] (SPARK-46574) Upgrade maven plugin to latest version
[ https://issues.apache.org/jira/browse/SPARK-46574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-46574. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46043 [https://github.com/apache/spark/pull/46043] > Upgrade maven plugin to latest version > -- > > Key: SPARK-46574 > URL: https://issues.apache.org/jira/browse/SPARK-46574 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Updated] (SPARK-47871) Oracle: Map TimestampType to TIMESTAMP WITH LOCAL TIME ZONE
[ https://issues.apache.org/jira/browse/SPARK-47871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47871: --- Labels: pull-request-available (was: ) > Oracle: Map TimestampType to TIMESTAMP WITH LOCAL TIME ZONE > --- > > Key: SPARK-47871 > URL: https://issues.apache.org/jira/browse/SPARK-47871 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available >
[jira] [Commented] (SPARK-45859) Make UDF objects in ml.functions lazy
[ https://issues.apache.org/jira/browse/SPARK-45859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837644#comment-17837644 ] Kent Yao commented on SPARK-45859: -- I removed 3.0 and 3.1 from *Affects Version/s:* here because I manually cleaned them in the background. > Make UDF objects in ml.functions lazy > - > > Key: SPARK-45859 > URL: https://issues.apache.org/jira/browse/SPARK-45859 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 3.2.0, 3.3.0, 3.4.0, 3.5.0, 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Updated] (SPARK-45859) Make UDF objects in ml.functions lazy
[ https://issues.apache.org/jira/browse/SPARK-45859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-45859: - Affects Version/s: (was: 3.0) (was: 3.1) > Make UDF objects in ml.functions lazy > - > > Key: SPARK-45859 > URL: https://issues.apache.org/jira/browse/SPARK-45859 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 3.2.0, 3.3.0, 3.4.0, 3.5.0, 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Updated] (SPARK-47556) [K8] Spark App ID collision resulting in deleting wrong resources
[ https://issues.apache.org/jira/browse/SPARK-47556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-47556: - Affects Version/s: 3.1.0 (was: 3.1) > [K8] Spark App ID collision resulting in deleting wrong resources > - > > Key: SPARK-47556 > URL: https://issues.apache.org/jira/browse/SPARK-47556 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Spark Core >Affects Versions: 3.1.0 >Reporter: Sundeep K >Priority: Major > Labels: pull-request-available > > h3. Issue: > We noticed that sometimes K8s executor pods go in a crash loop. Reason being > 'Error: MountVolume.SetUp failed for volume "spark-conf-volume-exec"'. Upon > investigation we noticed that there are two Spark jobs launched with the same > application ID, and when one of them finishes first it deletes all its > resources and the resources of the other job too. > -> The Spark application ID is created using this > [code|https://github.com/apache/spark/blob/36126a5c1821b4418afd5788963a939ea7f64078/core/src/main/scala/org/apache/spark/scheduler/TaskScheduler.scala#L38] > "spark-application-" + System.currentTimeMillis > This means that if two applications launch in the same millisecond they can > end up with the same app ID > -> The > [spark-app-selector|https://github.com/apache/spark/blob/93f98c0a61ddb66eb777c3940fbf29fc58e2d79b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Constants.scala#L23] > label is added to all resources created by the driver and its value is the > application ID. The Kubernetes scheduler backend deletes all the resources with the same > [label|https://github.com/apache/spark/blob/2a8bb5cdd3a5a2d63428b82df5e5066a805ce878/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala#L162C1-L172C6] > upon termination. 
> This results in deletion of the config map and executor pods of the job that's still > running; the driver tries to relaunch the executor pods, but the config map is not > present, so it's in a crash loop > h3. Context > We are using [Spark on Kubernetes|https://spark.apache.org/docs/latest/running-on-kubernetes.html] and launch > our Spark jobs using PySpark. We launch multiple Spark jobs within a given > k8s namespace. Each Spark job can be launched from different pods or from > different processes in a pod. Every time a job is launched it has a unique > app name. Here is how the job is launched (omitting irrelevant details): > {code:java} > # spark_conf has settings required for spark on k8s > sp = SparkSession.builder \ > .config(conf=spark_conf) \ > .appName('testapp') > sp.master(f'k8s://{kubernetes_host}') > session = sp.getOrCreate() > with session: > session.sql('SELECT 1'){code} > h3. Repro > Set the same app id in the Spark config, run 2 different jobs, one that finishes > fast, one that runs slow. The slower job goes into a crash loop > {code:java} > "spark.app.id": ""{code} > h3. Workaround > Set a unique spark.app.id for all the jobs that run on k8s > e.g.: > {code:java} > "spark.app.id": f'{AppName}-{CurrTimeInMilliSecs}-{UUId}'[:63]{code} > h3. Fix > Add a unique hash at the end of the application ID: > [https://github.com/apache/spark/pull/45712] >
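The collision and the workaround's ID shape can be sketched in plain Python (hypothetical helper names; the `[:63]` truncation mirrors the Kubernetes label value length limit used in the workaround above):

```python
import time
import uuid

def legacy_app_id() -> str:
    # Mirrors "spark-application-" + System.currentTimeMillis: two drivers
    # that start in the same millisecond produce the same ID.
    return f"spark-application-{int(time.time() * 1000)}"

def unique_app_id(app_name: str = "spark-application") -> str:
    # Workaround shape: timestamp plus a random suffix, truncated to 63
    # characters because the ID is stored as a Kubernetes label value.
    millis = int(time.time() * 1000)
    return f"{app_name}-{millis}-{uuid.uuid4().hex}"[:63]

# Even when generated back-to-back in the same millisecond, the random
# suffix keeps the IDs distinct.
ids = {unique_app_id() for _ in range(10_000)}
```

With the legacy scheme, any two of these 10,000 back-to-back calls landing in the same millisecond would collide; the suffixed form keeps enough randomness (31 hex characters survive the truncation here) to make that practically impossible.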
[jira] [Commented] (SPARK-5159) Thrift server does not respect hive.server2.enable.doAs=true
[ https://issues.apache.org/jira/browse/SPARK-5159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837634#comment-17837634 ] Goutam Ghosh commented on SPARK-5159: - Can the patch by [~angerszhuuu] be verified? > Thrift server does not respect hive.server2.enable.doAs=true > > > Key: SPARK-5159 > URL: https://issues.apache.org/jira/browse/SPARK-5159 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.2.0 >Reporter: Andrew Ray >Priority: Major > Labels: bulk-closed > Attachments: spark_thrift_server_log.txt > > > I'm currently testing the spark sql thrift server on a kerberos secured > cluster in YARN mode. Currently any user can access any table regardless of > HDFS permissions as all data is read as the hive user. In HiveServer2 the > property hive.server2.enable.doAs=true causes all access to be done as the > submitting user. We should do the same.
[jira] [Created] (SPARK-47871) Oracle: Map TimestampType to TIMESTAMP WITH LOCAL TIME ZONE
Kent Yao created SPARK-47871: Summary: Oracle: Map TimestampType to TIMESTAMP WITH LOCAL TIME ZONE Key: SPARK-47871 URL: https://issues.apache.org/jira/browse/SPARK-47871 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] (SPARK-47353) regexp_count & regexp_substr (binary & lowercase collation only)
[ https://issues.apache.org/jira/browse/SPARK-47353 ] Milan Dankovic deleted comment on SPARK-47353: was (Author: JIRAUSER304529): I am working on this > regexp_count & regexp_substr (binary & lowercase collation only) > > > Key: SPARK-47353 > URL: https://issues.apache.org/jira/browse/SPARK-47353 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47476) StringReplace (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47476: -- Assignee: (was: Apache Spark) > StringReplace (all collations) > -- > > Key: SPARK-47476 > URL: https://issues.apache.org/jira/browse/SPARK-47476 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > Labels: pull-request-available > > Enable collation support for the *StringReplace* built-in string function in > Spark. First confirm what the expected behaviour for this function is when > given collated strings, and then move on to implementation and testing. One > way to go about this is to consider using {_}StringSearch{_}, an efficient > ICU service for string matching. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other open-source DBMS, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringReplace* function so > it supports all collation types currently supported in Spark. To understand > what changes were introduced in order to enable full collation support for > other existing functions in Spark, take a look at the Spark PRs and Jira > tickets for completed tasks in this parent (for example: Contains, > StartsWith, EndsWith). 
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
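As a rough, non-authoritative illustration of what lowercase-collation matching means for a replace function (Spark's actual implementation goes through ICU's StringSearch, not Python's str.lower()), a sketch of case-insensitive replace; the function name is hypothetical:

```python
def lcase_replace(src: str, search: str, replace: str) -> str:
    """Replace every occurrence of `search` in `src`, matching
    case-insensitively but splicing the original `src` text.

    Illustrative sketch of UTF8_LCASE-style matching only; a real
    collation-aware implementation must use a collator (e.g. ICU).
    """
    if not search:
        return src
    out, i = [], 0
    low_src, low_search = src.lower(), search.lower()
    while True:
        j = low_src.find(low_search, i)
        if j < 0:
            out.append(src[i:])        # no more matches: keep the tail
            return "".join(out)
        out.append(src[i:j])           # unmatched prefix, original casing
        out.append(replace)
        i = j + len(low_search)        # skip past the matched region
```

Under this matching, `lcase_replace("Hello World", "WORLD", "Spark")` yields `"Hello Spark"`, which is the behaviour a lowercase collation implies for StringReplace.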
[jira] [Assigned] (SPARK-47476) StringReplace (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47476: -- Assignee: Apache Spark > StringReplace (all collations) > -- > > Key: SPARK-47476 > URL: https://issues.apache.org/jira/browse/SPARK-47476 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > Enable collation support for the *StringReplace* built-in string function in > Spark. First confirm what the expected behaviour for this function is when > given collated strings, and then move on to implementation and testing. One > way to go about this is to consider using {_}StringSearch{_}, an efficient > ICU service for string matching. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other open-source DBMS, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringReplace* function so > it supports all collation types currently supported in Spark. To understand > what changes were introduced in order to enable full collation support for > other existing functions in Spark, take a look at the Spark PRs and Jira > tickets for completed tasks in this parent (for example: Contains, > StartsWith, EndsWith). 
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47858) Refactoring the structure for DataFrame error context
[ https://issues.apache.org/jira/browse/SPARK-47858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47858: -- Assignee: (was: Apache Spark) > Refactoring the structure for DataFrame error context > - > > Key: SPARK-47858 > URL: https://issues.apache.org/jira/browse/SPARK-47858 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > > The current implementation for PySpark DataFrame error context could be more > flexible by addressing some hacky spots. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47858) Refactoring the structure for DataFrame error context
[ https://issues.apache.org/jira/browse/SPARK-47858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47858: -- Assignee: Apache Spark > Refactoring the structure for DataFrame error context > - > > Key: SPARK-47858 > URL: https://issues.apache.org/jira/browse/SPARK-47858 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > The current implementation for PySpark DataFrame error context could be more > flexible by addressing some hacky spots. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47741) Handle stack overflow when parsing query
[ https://issues.apache.org/jira/browse/SPARK-47741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47741: -- Assignee: (was: Apache Spark) > Handle stack overflow when parsing query > > > Key: SPARK-47741 > URL: https://issues.apache.org/jira/browse/SPARK-47741 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.1 >Reporter: Milan Stefanovic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Parsing complex queries can lead to stack overflow. > We need to catch this exception and convert it to a proper parser exception > with an error class. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47741) Handle stack overflow when parsing query
[ https://issues.apache.org/jira/browse/SPARK-47741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47741: -- Assignee: Apache Spark > Handle stack overflow when parsing query > > > Key: SPARK-47741 > URL: https://issues.apache.org/jira/browse/SPARK-47741 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.1 >Reporter: Milan Stefanovic >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Parsing complex queries can lead to stack overflow. > We need to catch this exception and convert it to a proper parser exception > with an error class. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
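The conversion described in this ticket — catching the raw stack overflow and rethrowing it as a classed parser error — can be sketched in Python (Spark's parser is Scala/ANTLR-based; the exception type and error-class name below are hypothetical stand-ins):

```python
class ParseException(Exception):
    """Hypothetical stand-in for an error-class based parser exception."""
    def __init__(self, error_class: str, message: str):
        super().__init__(f"[{error_class}] {message}")
        self.error_class = error_class

def parse_nested(expr: str) -> int:
    # Toy recursive-descent step: count the nesting depth of parentheses.
    if expr.startswith("(") and expr.endswith(")"):
        return 1 + parse_nested(expr[1:-1])
    return 0

def safe_parse(expr: str) -> int:
    try:
        return parse_nested(expr)
    except RecursionError:
        # Convert the uncontrolled stack overflow into a proper,
        # classed parser error that callers can handle.
        raise ParseException("FAILED_TO_PARSE_TOO_COMPLEX",
                             "The statement was too complex to parse.")
```

A query nested deeper than the interpreter's recursion limit now surfaces as a `ParseException` with a stable error class instead of a bare `RecursionError`.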
[jira] [Assigned] (SPARK-47411) StringInstr, FindInSet (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47411: -- Assignee: Apache Spark > StringInstr, FindInSet (all collations) > --- > > Key: SPARK-47411 > URL: https://issues.apache.org/jira/browse/SPARK-47411 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > Enable collation support for the *StringInstr* and *FindInSet* built-in > string functions in Spark. First confirm what the expected behaviour for > these functions is when given collated strings, and then move on to > implementation and testing. One way to go about this is to consider using > {_}StringSearch{_}, an efficient ICU service for string matching. Implement > the corresponding unit tests (CollationStringExpressionsSuite) and E2E tests > (CollationSuite) to reflect how these functions should be used with collation > in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment > with the existing functions to learn more about how they work. In addition, > look into the possible use-cases and implementation of similar functions > within other open-source DBMS, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringInstr* and > *FindInSet* functions so that they support all collation types currently > supported in Spark. To understand what changes were introduced in order to > enable full collation support for other existing functions in Spark, take a > look at the Spark PRs and Jira tickets for completed tasks in this parent > (for example: Contains, StartsWith, EndsWith). 
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
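As a small, non-authoritative sketch of what FindInSet means under a lowercase collation (again, Spark's real implementation would use a collator rather than str.lower(); the function name is hypothetical):

```python
def find_in_set_lcase(needle: str, str_list: str) -> int:
    """FIND_IN_SET with case-insensitive matching (sketch).

    Returns the 1-based position of `needle` in the comma-separated
    `str_list`, or 0 if it is absent. Following SQL FIND_IN_SET
    semantics, a needle that itself contains a comma never matches.
    """
    if "," in needle:
        return 0
    target = needle.lower()
    for pos, item in enumerate(str_list.split(","), start=1):
        if item.lower() == target:
            return pos
    return 0
```

So `find_in_set_lcase("b", "abc,B,ab,c")` returns 2: the element `B` matches `b` once comparisons are case-insensitive, which is exactly the behavioural question this ticket asks implementers to pin down first.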
[jira] [Assigned] (SPARK-47411) StringInstr, FindInSet (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47411: -- Assignee: (was: Apache Spark) > StringInstr, FindInSet (all collations) > --- > > Key: SPARK-47411 > URL: https://issues.apache.org/jira/browse/SPARK-47411 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > Labels: pull-request-available > > Enable collation support for the *StringInstr* and *FindInSet* built-in > string functions in Spark. First confirm what the expected behaviour for > these functions is when given collated strings, and then move on to > implementation and testing. One way to go about this is to consider using > {_}StringSearch{_}, an efficient ICU service for string matching. Implement > the corresponding unit tests (CollationStringExpressionsSuite) and E2E tests > (CollationSuite) to reflect how these functions should be used with collation > in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment > with the existing functions to learn more about how they work. In addition, > look into the possible use-cases and implementation of similar functions > within other open-source DBMS, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringInstr* and > *FindInSet* functions so that they support all collation types currently > supported in Spark. To understand what changes were introduced in order to > enable full collation support for other existing functions in Spark, take a > look at the Spark PRs and Jira tickets for completed tasks in this parent > (for example: Contains, StartsWith, EndsWith). 
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47596) Streaming: Migrate logWarn with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47596: --- Labels: pull-request-available (was: ) > Streaming: Migrate logWarn with variables to structured logging framework > - > > Key: SPARK-47596 > URL: https://issues.apache.org/jira/browse/SPARK-47596 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47869) Upgrade built in hive to Hive-4.0
[ https://issues.apache.org/jira/browse/SPARK-47869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simhadri Govindappa updated SPARK-47869: Description: Hive 4.0 has been released. It brings in a lot of new features, bug fixes and performance improvements. We would like to update the version of hive used in spark to hive-4.0 [https://lists.apache.org/thread/2jqpvsx8n801zb5pmlhb8f4zloq27p82] was: Hive 4.0 has been released. It brings in a lot of new features, bug fixes and performance improvements. We would like to update the version of hive used in spark to hive-4.0 > Upgrade built in hive to Hive-4.0 > - > > Key: SPARK-47869 > URL: https://issues.apache.org/jira/browse/SPARK-47869 > Project: Spark > Issue Type: Task > Components: Spark Core >Affects Versions: 3.5.1 >Reporter: Simhadri Govindappa >Priority: Major > > Hive 4.0 has been released. It brings in a lot of new features, bug fixes and > performance improvements. > We would like to update the version of hive used in spark to hive-4.0 > [https://lists.apache.org/thread/2jqpvsx8n801zb5pmlhb8f4zloq27p82] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47869) Upgrade built in hive to Hive-4.0
Simhadri Govindappa created SPARK-47869: --- Summary: Upgrade built in hive to Hive-4.0 Key: SPARK-47869 URL: https://issues.apache.org/jira/browse/SPARK-47869 Project: Spark Issue Type: Task Components: Spark Core Affects Versions: 3.5.1 Reporter: Simhadri Govindappa Hive 4.0 has been released. It brings in a lot of new features, bug fixes and performance improvements. We would like to update the version of hive used in spark to hive-4.0 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47414) Regexp expressions (binary & lowercase collation only)
[ https://issues.apache.org/jira/browse/SPARK-47414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47414: --- Labels: pull-request-available (was: ) > Regexp expressions (binary & lowercase collation only) > -- > > Key: SPARK-47414 > URL: https://issues.apache.org/jira/browse/SPARK-47414 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47416) Add benchmark for stringpredicate expressions
[ https://issues.apache.org/jira/browse/SPARK-47416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47416: --- Labels: pull-request-available (was: ) > Add benchmark for stringpredicate expressions > - > > Key: SPARK-47416 > URL: https://issues.apache.org/jira/browse/SPARK-47416 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47416) Add benchmark for stringpredicate expressions
[ https://issues.apache.org/jira/browse/SPARK-47416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-47416: - Summary: Add benchmark for stringpredicate expressions (was: TBD) > Add benchmark for stringpredicate expressions > - > > Key: SPARK-47416 > URL: https://issues.apache.org/jira/browse/SPARK-47416 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45387) Optimize hive partition filter when the comparison dataType does not match
[ https://issues.apache.org/jira/browse/SPARK-45387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] TianyiMa updated SPARK-45387: - Affects Version/s: 3.5.1 > Optimize hive partition filter when the comparison dataType does not match > - > > Key: SPARK-45387 > URL: https://issues.apache.org/jira/browse/SPARK-45387 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.1, 3.1.2, 3.3.0, 3.4.0, 3.5.0, 3.5.1 >Reporter: TianyiMa >Priority: Critical > Labels: pull-request-available > Attachments: PruneFileSourcePartitions.diff > > > Suppose we have a partitioned table `table_pt` with a partition column `dt` > that is StringType, and the table metadata is managed by the Hive Metastore. If > we filter partitions by dt = '123', this filter can be pushed down to the data > source directly, but if the filter condition is a number, e.g. dt = 123, Spark > will not know which partition should be pushed down. Thus, during > physical plan optimization, Spark will pull all of that table's partition > metadata to the client side to decide which partition filter should be pushed > down to the data source. This performs poorly if the table has > thousands of partitions and increases the risk of a Hive Metastore OOM. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45387) Optimize hive partition filter when the comparison dataType does not match
[ https://issues.apache.org/jira/browse/SPARK-45387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] TianyiMa updated SPARK-45387: - Affects Version/s: 3.5.0 > Optimize hive partition filter when the comparison dataType does not match > - > > Key: SPARK-45387 > URL: https://issues.apache.org/jira/browse/SPARK-45387 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.1, 3.1.2, 3.3.0, 3.4.0, 3.5.0 >Reporter: TianyiMa >Priority: Critical > Labels: pull-request-available > Attachments: PruneFileSourcePartitions.diff > > > Suppose we have a partitioned table `table_pt` with a partition column `dt` > that is StringType, and the table metadata is managed by the Hive Metastore. If > we filter partitions by dt = '123', this filter can be pushed down to the data > source directly, but if the filter condition is a number, e.g. dt = 123, Spark > will not know which partition should be pushed down. Thus, during > physical plan optimization, Spark will pull all of that table's partition > metadata to the client side to decide which partition filter should be pushed > down to the data source. This performs poorly if the table has > thousands of partitions and increases the risk of a Hive Metastore OOM. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
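The pruning problem in SPARK-45387 can be illustrated with a toy pushdown check (this is a deliberately simplified model, not Spark's optimizer code): a partition filter is only pushed to the metastore when the literal's type matches the partition column's type, so `dt = 123` against a string column `dt` forces a full partition-metadata fetch, while `dt = '123'` prunes.

```python
def prunable(partition_type: type, literal) -> bool:
    # Toy model: a filter can be pushed down to the metastore only
    # when the literal's type matches the partition column's type.
    return type(literal) is partition_type

# Table partitioned by a string column `dt` (mirrors the ticket).
partitions = {"dt": str}

# dt = '123' -> pushed down and pruned at the metastore.
# dt = 123   -> type mismatch: Spark must pull all partition metadata
#               to the client before it can decide what to prune.
```

Casting the literal on the client side (`dt = str(123)` in this toy model, or an explicit CAST in SQL) restores pruning, which is the spirit of the proposed optimization.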
[jira] [Updated] (SPARK-47355) TBD
[ https://issues.apache.org/jira/browse/SPARK-47355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-47355: - Summary: TBD (was: regexp_replace & regexp_instr (binary & lowercase collation only)) > TBD > --- > > Key: SPARK-47355 > URL: https://issues.apache.org/jira/browse/SPARK-47355 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47354) TBD
[ https://issues.apache.org/jira/browse/SPARK-47354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-47354: - Summary: TBD (was: regexp_extract & regexp_extract_all (binary & lowercase collation only)) > TBD > --- > > Key: SPARK-47354 > URL: https://issues.apache.org/jira/browse/SPARK-47354 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47868) Recursion Limit Error in SparkSession and SparkConnectPlanner
[ https://issues.apache.org/jira/browse/SPARK-47868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47868: --- Labels: pull-request-available (was: ) > Recursion Limit Error in SparkSession and SparkConnectPlanner > - > > Key: SPARK-47868 > URL: https://issues.apache.org/jira/browse/SPARK-47868 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 4.0.0 >Reporter: Tom van Bussel >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47351) TBD
[ https://issues.apache.org/jira/browse/SPARK-47351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-47351: - Summary: TBD (was: ilike & rlike (binary & lowercase collation only)) > TBD > --- > > Key: SPARK-47351 > URL: https://issues.apache.org/jira/browse/SPARK-47351 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47414) Regexp expressions (binary & lowercase collation only)
[ https://issues.apache.org/jira/browse/SPARK-47414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-47414: - Summary: Regexp expressions (binary & lowercase collation only) (was: Regexp expressions) > Regexp expressions (binary & lowercase collation only) > -- > > Key: SPARK-47414 > URL: https://issues.apache.org/jira/browse/SPARK-47414 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47350) TBD
[ https://issues.apache.org/jira/browse/SPARK-47350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-47350: - Summary: TBD (was: like (binary & lowercase collation only)) > TBD > --- > > Key: SPARK-47350 > URL: https://issues.apache.org/jira/browse/SPARK-47350 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47414) Regexp expressions
[ https://issues.apache.org/jira/browse/SPARK-47414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-47414: - Summary: Regexp expressions (was: TBD) > Regexp expressions > -- > > Key: SPARK-47414 > URL: https://issues.apache.org/jira/browse/SPARK-47414 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47818) Introduce plan cache in SparkConnectPlanner to improve performance of Analyze requests
[ https://issues.apache.org/jira/browse/SPARK-47818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-47818: Assignee: Xi Lyu > Introduce plan cache in SparkConnectPlanner to improve performance of Analyze > requests > -- > > Key: SPARK-47818 > URL: https://issues.apache.org/jira/browse/SPARK-47818 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Xi Lyu >Assignee: Xi Lyu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > While building a DataFrame step by step, each step generates a new DataFrame > with an empty schema, which is lazily computed on access. However, > if a user's code frequently accesses the schema of these new DataFrames using > methods such as `df.columns`, it will result in a large number of Analyze > requests to the server. Each time, the entire plan needs to be reanalyzed, > leading to poor performance, especially when constructing highly complex > plans. > Now, by introducing a plan cache in SparkConnectPlanner, we aim to reduce the > overhead of repeated analysis during this process. This is achieved by saving > significant computation if the resolved logical plan of a subtree can be > cached. > A minimal example of the problem: > {code:java} > import pyspark.sql.functions as F > df = spark.range(10) > for i in range(200): >     if str(i) not in df.columns:  # <-- The df.columns call causes a new Analyze request in every iteration >         df = df.withColumn(str(i), F.col("id") + i) > df.show() {code} > With this patch, the performance of the above code improved from ~110s to ~5s. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47818) Introduce plan cache in SparkConnectPlanner to improve performance of Analyze requests
[ https://issues.apache.org/jira/browse/SPARK-47818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47818. -- Resolution: Fixed Issue resolved by pull request 46012 [https://github.com/apache/spark/pull/46012] > Introduce plan cache in SparkConnectPlanner to improve performance of Analyze > requests > -- > > Key: SPARK-47818 > URL: https://issues.apache.org/jira/browse/SPARK-47818 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Xi Lyu >Assignee: Xi Lyu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > While building a DataFrame step by step, each step generates a new DataFrame > with an empty schema, which is lazily computed on access. However, > if a user's code frequently accesses the schema of these new DataFrames using > methods such as `df.columns`, it will result in a large number of Analyze > requests to the server. Each time, the entire plan needs to be reanalyzed, > leading to poor performance, especially when constructing highly complex > plans. > Now, by introducing a plan cache in SparkConnectPlanner, we aim to reduce the > overhead of repeated analysis during this process. This is achieved by saving > significant computation if the resolved logical plan of a subtree can be > cached. > A minimal example of the problem: > {code:java} > import pyspark.sql.functions as F > df = spark.range(10) > for i in range(200): >     if str(i) not in df.columns:  # <-- The df.columns call causes a new Analyze request in every iteration >         df = df.withColumn(str(i), F.col("id") + i) > df.show() {code} > With this patch, the performance of the above code improved from ~110s to ~5s. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
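The mechanism behind SPARK-47818 — memoizing analysis results for already-resolved plan subtrees — can be sketched as a tiny cache keyed by a hashable plan fingerprint (the class and the tuple representation are hypothetical; the real cache lives server-side in SparkConnectPlanner):

```python
class PlanCache:
    """Toy sketch of an analysis cache keyed by a plan fingerprint."""

    def __init__(self):
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def analyze(self, plan: tuple):
        # `plan` is a hashable tuple standing in for a logical plan
        # subtree; real plans would need a stable fingerprint.
        if plan in self._cache:
            self.hits += 1
            return self._cache[plan]
        self.misses += 1
        resolved = ("resolved", plan)  # stand-in for expensive analysis
        self._cache[plan] = resolved
        return resolved

cache = PlanCache()
base = ("range", 10)
for _ in range(3):
    cache.analyze(base)  # repeated schema accesses hit the cache
```

The first `analyze` pays for the resolution; the subsequent ones are cache hits, which is why repeated `df.columns` accesses stop triggering full reanalysis under the patch.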
[jira] [Created] (SPARK-47868) Recursion Limit Error in SparkSession and SparkConnectPlanner
Tom van Bussel created SPARK-47868: -- Summary: Recursion Limit Error in SparkSession and SparkConnectPlanner Key: SPARK-47868 URL: https://issues.apache.org/jira/browse/SPARK-47868 Project: Spark Issue Type: Bug Components: Connect Affects Versions: 4.0.0 Reporter: Tom van Bussel
[jira] [Updated] (SPARK-45387) Optimize Hive partition filter when the comparison dataType does not match
[ https://issues.apache.org/jira/browse/SPARK-45387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45387: --- Labels: pull-request-available (was: ) > Optimize Hive partition filter when the comparison dataType does not match > - > > Key: SPARK-45387 > URL: https://issues.apache.org/jira/browse/SPARK-45387 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 3.1.1, 3.1.2, 3.3.0, 3.4.0 > Reporter: TianyiMa > Priority: Critical > Labels: pull-request-available > Attachments: PruneFileSourcePartitions.diff > > > Suppose we have a partitioned table `table_pt` with a partition column `dt` of StringType, whose table metadata is managed by the Hive Metastore. If we filter partitions by dt = '123', this filter can be pushed down to the data source directly, but if the filter value is a number, e.g. dt = 123, Spark will not know which partitions should be pruned. Thus, during physical plan optimization, Spark pulls all of the table's partition metadata to the client side to decide which partition filters can be pushed down to the data source. This performs poorly if the table has thousands of partitions and increases the risk of a Hive Metastore OOM.
[jira] [Updated] (SPARK-45387) Optimize Hive partition filter when the comparison dataType does not match
[ https://issues.apache.org/jira/browse/SPARK-45387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] TianyiMa updated SPARK-45387: - Summary: Optimize Hive partition filter when the comparison dataType does not match (was: Partition key filter cannot be pushed down when using cast) > Optimize Hive partition filter when the comparison dataType does not match > - > > Key: SPARK-45387 > URL: https://issues.apache.org/jira/browse/SPARK-45387 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 3.1.1, 3.1.2, 3.3.0, 3.4.0 > Reporter: TianyiMa > Priority: Critical > Attachments: PruneFileSourcePartitions.diff > > > Suppose we have a partitioned table `table_pt` with a partition column `dt` of StringType, whose table metadata is managed by the Hive Metastore. If we filter partitions by dt = '123', this filter can be pushed down to the data source directly, but if the filter value is a number, e.g. dt = 123, Spark will not know which partitions should be pruned. Thus, during physical plan optimization, Spark pulls all of the table's partition metadata to the client side to decide which partition filters can be pushed down to the data source. This performs poorly if the table has thousands of partitions and increases the risk of a Hive Metastore OOM.
[jira] [Updated] (SPARK-45387) Partition key filter cannot be pushed down when using cast
[ https://issues.apache.org/jira/browse/SPARK-45387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] TianyiMa updated SPARK-45387: - Description: Suppose we have a partitioned table `table_pt` with a partition column `dt` of StringType, whose table metadata is managed by the Hive Metastore. If we filter partitions by dt = '123', this filter can be pushed down to the data source directly, but if the filter value is a number, e.g. dt = 123, Spark will not know which partitions should be pruned. Thus, during physical plan optimization, Spark pulls all of the table's partition metadata to the client side to decide which partition filters can be pushed down to the data source. This performs poorly if the table has thousands of partitions and increases the risk of a Hive Metastore OOM. (was: Suppose we have a partitioned table `table_pt` with a partition column `dt` of StringType, whose table metadata is managed by the Hive Metastore. If we filter partitions by dt = '123', this filter can be pushed down to the data source, but if the filter value is a number, e.g. dt = 123, it cannot be pushed down, causing Spark to pull all of that table's partition metadata to the client. This performs poorly if the table has thousands of partitions and increases the risk of a Hive Metastore OOM.) > Partition key filter cannot be pushed down when using cast > -- > > Key: SPARK-45387 > URL: https://issues.apache.org/jira/browse/SPARK-45387 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 3.1.1, 3.1.2, 3.3.0, 3.4.0 > Reporter: TianyiMa > Priority: Critical > Attachments: PruneFileSourcePartitions.diff > > > Suppose we have a partitioned table `table_pt` with a partition column `dt` of StringType, whose table metadata is managed by the Hive Metastore. If we filter partitions by dt = '123', this filter can be pushed down to the data source directly, but if the filter value is a number, e.g. dt = 123, Spark will not know which partitions should be pruned. Thus, during physical plan optimization, Spark pulls all of the table's partition metadata to the client side to decide which partition filters can be pushed down to the data source. This performs poorly if the table has thousands of partitions and increases the risk of a Hive Metastore OOM.
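The pruning problem described in SPARK-45387 can be sketched without Spark or a real metastore. This is an illustrative model only: `prune_with_string_filter`, `prune_with_numeric_filter`, and the in-memory `partitions` list are invented for the example and are not Spark or Hive APIs. It shows the cost asymmetry: a type-matching filter can be sent to the metastore, while a type-mismatched one forces fetching every partition and evaluating the cast on the client.

```python
# Model of the SPARK-45387 pruning asymmetry (all names hypothetical).
# Partition values are stored as strings in the "metastore".

partitions = [str(d) for d in range(20240101, 20240111)]  # dt=20240101..20240110

metastore_requests = []  # record what we asked the "metastore" for


def prune_with_string_filter(value):
    # Filter type matches the partition column (string): the predicate can
    # be pushed to the metastore, which returns only matching partitions.
    metastore_requests.append(("filtered", value))
    return [p for p in partitions if p == value]


def prune_with_numeric_filter(value):
    # Type mismatch (dt = 123): the predicate cannot be pushed, so ALL
    # partition metadata is fetched and cast(dt as int) = value is
    # evaluated client-side -- expensive with thousands of partitions.
    metastore_requests.append(("all", None))
    fetched = list(partitions)  # the entire partition list crosses the wire
    return [p for p in fetched if int(p) == value]


matched_str = prune_with_string_filter("20240105")
matched_num = prune_with_numeric_filter(20240105)
```

Both calls select the same single partition, but only the second one drags the full partition list to the client first, which is the performance and metastore-OOM risk the issue describes.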
[jira] [Updated] (SPARK-47352) Fix Upper, Lower, InitCap collation awareness
[ https://issues.apache.org/jira/browse/SPARK-47352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-47352: - Summary: Fix Upper, Lower, InitCap collation awareness (was: TBD) > Fix Upper, Lower, InitCap collation awareness > - > > Key: SPARK-47352 > URL: https://issues.apache.org/jira/browse/SPARK-47352 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 4.0.0 > Reporter: Uroš Bojanić > Priority: Major