[jira] [Resolved] (SPARK-46522) Block Python data source registration with name conflicts
[ https://issues.apache.org/jira/browse/SPARK-46522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46522. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44507 [https://github.com/apache/spark/pull/44507] > Block Python data source registration with name conflicts > - > > Key: SPARK-46522 > URL: https://issues.apache.org/jira/browse/SPARK-46522 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Users should not be allowed to register Python data sources with names that > are the same as builtin or existing Scala/Java data sources. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
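The intended behavior can be illustrated with a toy sketch. This is not Spark's implementation; the `DataSourceRegistry` class and its methods are hypothetical stand-ins for the check SPARK-46522 adds:

```python
# Toy sketch of the check described in SPARK-46522: reject registering a
# Python data source under a name already claimed by a built-in or
# Scala/Java data source. All names here are illustrative, not Spark APIs.

class DataSourceRegistry:
    def __init__(self, static_names):
        # Names claimed by built-in / statically registered sources.
        self._static = set(static_names)
        self._python = {}

    def register(self, name, source):
        if name in self._static:
            raise ValueError(
                f"Cannot register Python data source '{name}': the name "
                "conflicts with a built-in or Scala/Java data source"
            )
        self._python[name] = source

registry = DataSourceRegistry(static_names={"parquet", "json", "csv"})
registry.register("my_source", object())    # fine: no conflict
try:
    registry.register("parquet", object())  # conflicts with a built-in
except ValueError as e:
    print(e)
```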
[jira] [Assigned] (SPARK-46522) Block Python data source registration with name conflicts
[ https://issues.apache.org/jira/browse/SPARK-46522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46522: Assignee: Allison Wang > Block Python data source registration with name conflicts > - > > Key: SPARK-46522 > URL: https://issues.apache.org/jira/browse/SPARK-46522 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > > Users should not be allowed to register Python data sources with names that > are the same as builtin or existing Scala/Java data sources.
[jira] [Created] (SPARK-46616) Disallow re-registration of statically registered data sources
Allison Wang created SPARK-46616: Summary: Disallow re-registration of statically registered data sources Key: SPARK-46616 URL: https://issues.apache.org/jira/browse/SPARK-46616 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 4.0.0 Reporter: Allison Wang This is a follow-up for SPARK-46522. Currently, Spark allows the re-registration of both statically and dynamically registered Python data sources. However, for (statically) registered Java/Scala data sources, Spark currently throws an exception if users try to register a data source with the same name. We should make this behavior consistent: either allow re-registration of all statically loaded data sources, or disallow it.
[jira] [Updated] (SPARK-46615) Support s.c.immutable.ArraySeq in ArrowDeserializers
[ https://issues.apache.org/jira/browse/SPARK-46615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46615: --- Labels: pull-request-available (was: ) > Support s.c.immutable.ArraySeq in ArrowDeserializers > > > Key: SPARK-46615 > URL: https://issues.apache.org/jira/browse/SPARK-46615 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available >
[jira] [Created] (SPARK-46615) Support s.c.immutable.ArraySeq in ArrowDeserializers
BingKun Pan created SPARK-46615: --- Summary: Support s.c.immutable.ArraySeq in ArrowDeserializers Key: SPARK-46615 URL: https://issues.apache.org/jira/browse/SPARK-46615 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 4.0.0 Reporter: BingKun Pan
[jira] [Resolved] (SPARK-46611) Remove ThreadLocal by replace SimpleDateFormat with DateTimeFormatter
[ https://issues.apache.org/jira/browse/SPARK-46611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-46611. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44613 [https://github.com/apache/spark/pull/44613] > Remove ThreadLocal by replace SimpleDateFormat with DateTimeFormatter > - > > Key: SPARK-46611 > URL: https://issues.apache.org/jira/browse/SPARK-46611 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Reopened] (SPARK-46437) Remove unnecessary cruft from SQL built-in functions docs
[ https://issues.apache.org/jira/browse/SPARK-46437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reopened SPARK-46437: -- Assignee: (was: Nicholas Chammas) Reverted at https://github.com/apache/spark/commit/a88c64e7dbdd813fa0a9df85a0ce9f1db6706ede > Remove unnecessary cruft from SQL built-in functions docs > - > > Key: SPARK-46437 > URL: https://issues.apache.org/jira/browse/SPARK-46437 > Project: Spark > Issue Type: Improvement > Components: Documentation, SQL >Affects Versions: 3.5.0 >Reporter: Nicholas Chammas >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Updated] (SPARK-46437) Remove unnecessary cruft from SQL built-in functions docs
[ https://issues.apache.org/jira/browse/SPARK-46437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-46437: - Fix Version/s: (was: 4.0.0) > Remove unnecessary cruft from SQL built-in functions docs > - > > Key: SPARK-46437 > URL: https://issues.apache.org/jira/browse/SPARK-46437 > Project: Spark > Issue Type: Improvement > Components: Documentation, SQL >Affects Versions: 3.5.0 >Reporter: Nicholas Chammas >Priority: Minor > Labels: pull-request-available >
[jira] [Updated] (SPARK-46608) Restore backward compatibility of JdbcDialect.classifyException
[ https://issues.apache.org/jira/browse/SPARK-46608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46608: --- Labels: pull-request-available (was: ) > Restore backward compatibility of JdbcDialect.classifyException > --- > > Key: SPARK-46608 > URL: https://issues.apache.org/jira/browse/SPARK-46608 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Labels: pull-request-available > > Restore backward compatibility of JdbcDialect.classifyException and support > both the variant with an error class and the variant without it. This should avoid breaking > existing JDBC dialects.
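The compatibility pattern here can be sketched in a few lines. The real API is Scala's `JdbcDialect.classifyException`; the Python class and method below are only a hedged analogy for keeping a legacy entry point working while adding an error-class-aware variant:

```python
# Hedged sketch (not Spark's actual Scala API): a legacy call path that
# never passed an error class keeps working alongside a newer call path
# that does. Class and field names here are illustrative only.

class JdbcDialectLike:
    def classify_exception(self, message, error_class=None):
        """Old callers pass only a message; new callers also pass an
        error class. Both continue to work."""
        if error_class is None:
            # Legacy behavior: plain error with just a message.
            return {"message": message}
        # New behavior: structured error with an error class attached.
        return {"message": message, "errorClass": error_class}

d = JdbcDialectLike()
legacy = d.classify_exception("duplicate key")
modern = d.classify_exception("duplicate key", error_class="DUPLICATE_KEY")
```

Existing third-party dialects that override only the old signature are the callers such a dual entry point is meant to protect.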
[jira] [Updated] (SPARK-38488) Spark doc build not work on Mac OS M1
[ https://issues.apache.org/jira/browse/SPARK-38488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-38488: --- Labels: pull-request-available (was: ) > Spark doc build not work on Mac OS M1 > - > > Key: SPARK-38488 > URL: https://issues.apache.org/jira/browse/SPARK-38488 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.3.0, 3.4.0 >Reporter: Yikun Jiang >Assignee: Yikun Jiang >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 >
{code:java}
diff --git a/docs/.bundle/config b/docs/.bundle/config
index b13821f801..68c1ee493a 100644
--- a/docs/.bundle/config
+++ b/docs/.bundle/config
@@ -1,2 +1,3 @@
 ---
 BUNDLE_PATH: ".local_ruby_bundle"
+BUNDLE_BUILD__FFI: "--enable-libffi-alloc"
diff --git a/docs/Gemfile b/docs/Gemfile
index f991622708..6c35201296 100644
--- a/docs/Gemfile
+++ b/docs/Gemfile
@@ -17,6 +17,7 @@
 source "https://rubygems.org"
+gem "ffi", "1.15.5"
 gem "jekyll", "4.2.1"
 gem "rouge", "3.26.0"
 gem "jekyll-redirect-from", "0.16.0"
{code}
> After applying the patch above, re-run `bundle install` and the doc build works; you can use this as a reference if you hit the same issue. > Will take a deeper look to find a proper fix. > > related: https://github.com/ffi/ffi/issues/864
[jira] [Created] (SPARK-46614) Refine docstring `make_timestamp/make_timestamp_ltz/make_timestamp_ntz/make_ym_interval`
BingKun Pan created SPARK-46614: --- Summary: Refine docstring `make_timestamp/make_timestamp_ltz/make_timestamp_ntz/make_ym_interval` Key: SPARK-46614 URL: https://issues.apache.org/jira/browse/SPARK-46614 Project: Spark Issue Type: Sub-task Components: Documentation Affects Versions: 4.0.0 Reporter: BingKun Pan
[jira] [Resolved] (SPARK-46613) Log full exception when failed to lookup Python Data Sources
[ https://issues.apache.org/jira/browse/SPARK-46613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46613. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44617 [https://github.com/apache/spark/pull/44617] > Log full exception when failed to lookup Python Data Sources > > > Key: SPARK-46613 > URL: https://issues.apache.org/jira/browse/SPARK-46613 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > See https://github.com/apache/spark/pull/44617
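The pattern SPARK-46613 asks for, logging the full exception rather than only its message when a lookup fails, looks like this in plain Python. The `lookup_python_data_source` function and the registry dict are hypothetical; only the `logging` usage is real stdlib API:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("datasource.lookup")

def lookup_python_data_source(name, registry):
    # Hypothetical lookup; the point is the except branch.
    try:
        return registry[name]
    except Exception:
        # logging.exception logs at ERROR level *with* the full
        # traceback attached, instead of swallowing it or printing
        # only the message.
        log.exception("Failed to look up Python data source %r", name)
        return None

print(lookup_python_data_source("csv", {"csv": "ok"}))  # found
lookup_python_data_source("missing", {})                # logs traceback
```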
[jira] [Assigned] (SPARK-46613) Log full exception when failed to lookup Python Data Sources
[ https://issues.apache.org/jira/browse/SPARK-46613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46613: Assignee: Hyukjin Kwon > Log full exception when failed to lookup Python Data Sources > > > Key: SPARK-46613 > URL: https://issues.apache.org/jira/browse/SPARK-46613 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > > See https://github.com/apache/spark/pull/44617
[jira] [Assigned] (SPARK-46603) Refine docstring of `parse_url/url_encode/url_decode`
[ https://issues.apache.org/jira/browse/SPARK-46603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46603: Assignee: Yang Jie > Refine docstring of `parse_url/url_encode/url_decode` > - > > Key: SPARK-46603 > URL: https://issues.apache.org/jira/browse/SPARK-46603 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available >
[jira] [Resolved] (SPARK-46603) Refine docstring of `parse_url/url_encode/url_decode`
[ https://issues.apache.org/jira/browse/SPARK-46603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46603. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44604 [https://github.com/apache/spark/pull/44604] > Refine docstring of `parse_url/url_encode/url_decode` > - > > Key: SPARK-46603 > URL: https://issues.apache.org/jira/browse/SPARK-46603 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Resolved] (SPARK-46606) Refine docstring of `convert_timezone/make_dt_interval/make_interval`
[ https://issues.apache.org/jira/browse/SPARK-46606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46606. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44610 [https://github.com/apache/spark/pull/44610] > Refine docstring of `convert_timezone/make_dt_interval/make_interval` > - > > Key: SPARK-46606 > URL: https://issues.apache.org/jira/browse/SPARK-46606 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-46606) Refine docstring of `convert_timezone/make_dt_interval/make_interval`
[ https://issues.apache.org/jira/browse/SPARK-46606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46606: Assignee: BingKun Pan > Refine docstring of `convert_timezone/make_dt_interval/make_interval` > - > > Key: SPARK-46606 > URL: https://issues.apache.org/jira/browse/SPARK-46606 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available >
[jira] [Updated] (SPARK-46613) Log full exception when failed to lookup Python Data Sources
[ https://issues.apache.org/jira/browse/SPARK-46613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-46613: - Description: See https://github.com/apache/spark/pull/44617 > Log full exception when failed to lookup Python Data Sources > > > Key: SPARK-46613 > URL: https://issues.apache.org/jira/browse/SPARK-46613 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > > See https://github.com/apache/spark/pull/44617
[jira] [Created] (SPARK-46613) Log full exception when failed to lookup Python Data Sources
Hyukjin Kwon created SPARK-46613: Summary: Log full exception when failed to lookup Python Data Sources Key: SPARK-46613 URL: https://issues.apache.org/jira/browse/SPARK-46613 Project: Spark Issue Type: Sub-task Components: PySpark, SQL Affects Versions: 4.0.0 Reporter: Hyukjin Kwon
[jira] [Updated] (SPARK-46613) Log full exception when failed to lookup Python Data Sources
[ https://issues.apache.org/jira/browse/SPARK-46613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46613: --- Labels: pull-request-available (was: ) > Log full exception when failed to lookup Python Data Sources > > > Key: SPARK-46613 > URL: https://issues.apache.org/jira/browse/SPARK-46613 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available >
[jira] [Updated] (SPARK-46612) Clickhouse's JDBC throws `java.lang.IllegalArgumentException: Unknown data type: string` when write array string with Apache Spark scala
[ https://issues.apache.org/jira/browse/SPARK-46612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nguyen Phan Huy updated SPARK-46612: Description: Issue is also reported on ClickHouse's GitHub: [https://github.com/ClickHouse/clickhouse-java/issues/1505]
h3. Bug description
When using Spark (Scala) to write an array of strings to ClickHouse, the driver throws a {{java.lang.IllegalArgumentException: Unknown data type: string}} exception. The exception is thrown by: [https://github.com/ClickHouse/clickhouse-java/blob/aa3870eadb1a2d3675fd5119714c85851800f076/clickhouse-data/src/main/java/com/clickhouse/data/ClickHouseDataType.java#L238] This is caused by Spark's JDBC utils lower-casing the type name ({{String}} -> {{string}}): [https://github.com/apache/spark/blob/6b931530d75cb4f00236f9c6283de8ef450963ad/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L639]
h3. Steps to reproduce
# Create a ClickHouse table with a String array field ([https://clickhouse.com/]).
# Write data to the table with Scala Spark via ClickHouse's JDBC driver ([https://github.com/ClickHouse/clickhouse-java]).
{code:java}
// Excerpt; requires a Scala Spark job with the ClickHouse JDBC driver on the classpath.
val clickHouseSchema = StructType(
  Seq(
    StructField("str_array", ArrayType(StringType))
  )
)
val data = Seq(
  Row(
    Seq("a", "b")
  )
)
val clickHouseDf = spark.createDataFrame(sc.parallelize(data), clickHouseSchema)

val props = new Properties
props.put("user", "default")
clickHouseDf.write
  .mode(SaveMode.Append)
  .option("driver", "com.clickhouse.jdbc.ClickHouseDriver")
  .jdbc("jdbc:clickhouse://localhost:8123/foo", table = "bar", props)
{code}
h2. Fix
- [https://github.com/apache/spark/pull/44459]
> Clickhouse's JDBC throws `java.lang.IllegalArgumentException: Unknown data > type: string` when write array string with Apache Spark scala > > > Key: SPARK-46612 > URL: https://issues.apache.org/jira/browse/SPARK-46612 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: Nguyen Phan Huy >Priority: Major >
[jira] [Created] (SPARK-46612) Clickhouse's JDBC throws `java.lang.IllegalArgumentException: Unknown data type: string` when write array string with Apache Spark scala
Nguyen Phan Huy created SPARK-46612: --- Summary: Clickhouse's JDBC throws `java.lang.IllegalArgumentException: Unknown data type: string` when write array string with Apache Spark scala Key: SPARK-46612 URL: https://issues.apache.org/jira/browse/SPARK-46612 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.5.0 Reporter: Nguyen Phan Huy
h3. Bug description
When using Spark (Scala) to write an array of strings to ClickHouse, the driver throws a {{java.lang.IllegalArgumentException: Unknown data type: string}} exception. The exception is thrown by: [https://github.com/ClickHouse/clickhouse-java/blob/aa3870eadb1a2d3675fd5119714c85851800f076/clickhouse-data/src/main/java/com/clickhouse/data/ClickHouseDataType.java#L238] This is caused by Spark's JDBC utils lower-casing the type name ({{String}} -> {{string}}): [https://github.com/apache/spark/blob/6b931530d75cb4f00236f9c6283de8ef450963ad/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L639]
h3. Steps to reproduce
# Create a ClickHouse table with a String array field (https://clickhouse.com/).
# Write data to the table with Scala Spark via ClickHouse's JDBC driver ([https://github.com/ClickHouse/clickhouse-java]).
{code:java}
// Excerpt; requires a Scala Spark job with the ClickHouse JDBC driver on the classpath.
val clickHouseSchema = StructType(
  Seq(
    StructField("str_array", ArrayType(StringType))
  )
)
val data = Seq(
  Row(
    Seq("a", "b")
  )
)
val clickHouseDf = spark.createDataFrame(sc.parallelize(data), clickHouseSchema)

val props = new Properties
props.put("user", "default")
clickHouseDf.write
  .mode(SaveMode.Append)
  .option("driver", "com.clickhouse.jdbc.ClickHouseDriver")
  .jdbc("jdbc:clickhouse://localhost:8123/foo", table = "bar", props)
{code}
h2. Fix
- [https://github.com/apache/spark/pull/44459]
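The failure mode can be modeled without ClickHouse or Spark at all: a case-sensitive type table ("String", not "string") fed a lower-cased name. Everything below is a toy stand-in, not either project's real API:

```python
# Toy model of the SPARK-46612 mismatch: ClickHouse's type names are
# case-sensitive, while Spark's JDBC utils lower-case the type name
# they build. The type table and resolver here are illustrative only.

CLICKHOUSE_TYPES = {"String", "Int32", "Array(String)"}

def resolve_type(name):
    if name not in CLICKHOUSE_TYPES:
        raise ValueError(f"Unknown data type: {name}")
    return name

print(resolve_type("Array(String)"))  # exact case: resolves fine
try:
    # What arrives after lower-casing: the lookup fails.
    resolve_type("array(string)")
except ValueError as e:
    print(e)
```

Preserving the original case of the driver-reported type name, as the linked PR does, makes the lookup succeed.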
[jira] [Assigned] (SPARK-46248) Support ignoreCorruptFiles and ignoreMissingFiles options in XML
[ https://issues.apache.org/jira/browse/SPARK-46248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46248: Assignee: Shujing Yang > Support ignoreCorruptFiles and ignoreMissingFiles options in XML > > > Key: SPARK-46248 > URL: https://issues.apache.org/jira/browse/SPARK-46248 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Shujing Yang >Assignee: Shujing Yang >Priority: Major > Labels: pull-request-available > > This PR corrects the handling of corrupt or missing multiline XML files by > respecting user-specified options.
[jira] [Assigned] (SPARK-46607) Check the testing mode
[ https://issues.apache.org/jira/browse/SPARK-46607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46607: Assignee: Ruifeng Zheng > Check the testing mode > -- > > Key: SPARK-46607 > URL: https://issues.apache.org/jira/browse/SPARK-46607 > Project: Spark > Issue Type: Sub-task > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available >
[jira] [Resolved] (SPARK-46248) Support ignoreCorruptFiles and ignoreMissingFiles options in XML
[ https://issues.apache.org/jira/browse/SPARK-46248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46248. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44163 [https://github.com/apache/spark/pull/44163] > Support ignoreCorruptFiles and ignoreMissingFiles options in XML > > > Key: SPARK-46248 > URL: https://issues.apache.org/jira/browse/SPARK-46248 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Shujing Yang >Assignee: Shujing Yang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > This PR corrects the handling of corrupt or missing multiline XML files by > respecting user-specified options.
[jira] [Resolved] (SPARK-46607) Check the testing mode
[ https://issues.apache.org/jira/browse/SPARK-46607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46607. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44611 [https://github.com/apache/spark/pull/44611] > Check the testing mode > -- > > Key: SPARK-46607 > URL: https://issues.apache.org/jira/browse/SPARK-46607 > Project: Spark > Issue Type: Sub-task > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >