[jira] [Resolved] (SPARK-46522) Block Python data source registration with name conflicts

2024-01-07 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46522.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44507
[https://github.com/apache/spark/pull/44507]

> Block Python data source registration with name conflicts
> -
>
> Key: SPARK-46522
> URL: https://issues.apache.org/jira/browse/SPARK-46522
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Users should not be allowed to register Python data sources with names that 
> are the same as builtin or existing Scala/Java data sources.
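The guard described above can be sketched as a small registry check; the class name and the reserved names below are illustrative only, not Spark's actual internals:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the conflict check: registering a Python data source
// under a name already claimed by a builtin or Scala/Java source is rejected.
class PythonDataSourceRegistry {
    // Names claimed by builtin or statically registered Scala/Java sources.
    private static final Set<String> RESERVED = Set.of("parquet", "json", "csv");

    private final Map<String, Object> pythonSources = new HashMap<>();

    void register(String name, Object source) {
        if (RESERVED.contains(name)) {
            throw new IllegalArgumentException(
                "Data source '" + name + "' already exists as a builtin or Scala/Java source");
        }
        pythonSources.put(name, source);
    }

    boolean contains(String name) {
        return pythonSources.containsKey(name);
    }
}
```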



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46522) Block Python data source registration with name conflicts

2024-01-07 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46522:


Assignee: Allison Wang

> Block Python data source registration with name conflicts
> -
>
> Key: SPARK-46522
> URL: https://issues.apache.org/jira/browse/SPARK-46522
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>  Labels: pull-request-available
>
> Users should not be allowed to register Python data sources with names that 
> are the same as builtin or existing Scala/Java data sources.






[jira] [Created] (SPARK-46616) Disallow re-registration of statically registered data sources

2024-01-07 Thread Allison Wang (Jira)
Allison Wang created SPARK-46616:


 Summary: Disallow re-registration of statically registered data 
sources
 Key: SPARK-46616
 URL: https://issues.apache.org/jira/browse/SPARK-46616
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Allison Wang


This is a follow-up to SPARK-46522. Currently, Spark allows the 
re-registration of both statically and dynamically registered Python data 
sources. However, for (statically) registered Java/Scala data sources, Spark 
throws an exception if users try to register a data source with the same name.

We should make this behavior consistent: either allow re-registration of all 
statically loaded data sources, or disallow it.
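The two candidate behaviors can be sketched with a toy registry (illustrative only, not Spark's code): re-registration either replaces the existing entry, as Python sources currently do, or throws, as Java/Scala sources currently do.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the two re-registration behaviors described above.
class StaticSourceRegistry {
    private final Map<String, Object> sources = new HashMap<>();
    private final boolean allowReRegistration;

    StaticSourceRegistry(boolean allowReRegistration) {
        this.allowReRegistration = allowReRegistration;
    }

    void register(String name, Object source) {
        if (sources.containsKey(name) && !allowReRegistration) {
            // Current Java/Scala behavior: registering the same name again throws.
            throw new IllegalStateException("Data source '" + name + "' is already registered");
        }
        // Current Python behavior: silently replace the existing source.
        sources.put(name, source);
    }

    Object get(String name) {
        return sources.get(name);
    }
}
```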






[jira] [Updated] (SPARK-46615) Support s.c.immutable.ArraySeq in ArrowDeserializers

2024-01-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46615:
---
Labels: pull-request-available  (was: )

> Support s.c.immutable.ArraySeq in ArrowDeserializers
> 
>
> Key: SPARK-46615
> URL: https://issues.apache.org/jira/browse/SPARK-46615
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-46615) Support s.c.immutable.ArraySeq in ArrowDeserializers

2024-01-07 Thread BingKun Pan (Jira)
BingKun Pan created SPARK-46615:
---

 Summary: Support s.c.immutable.ArraySeq in ArrowDeserializers
 Key: SPARK-46615
 URL: https://issues.apache.org/jira/browse/SPARK-46615
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 4.0.0
Reporter: BingKun Pan









[jira] [Resolved] (SPARK-46611) Remove ThreadLocal by replacing SimpleDateFormat with DateTimeFormatter

2024-01-07 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie resolved SPARK-46611.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44613
[https://github.com/apache/spark/pull/44613]

> Remove ThreadLocal by replacing SimpleDateFormat with DateTimeFormatter
> -
>
> Key: SPARK-46611
> URL: https://issues.apache.org/jira/browse/SPARK-46611
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Jiaan Geng
>Assignee: Jiaan Geng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
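The motivation is a thread-safety difference: SimpleDateFormat is mutable, so sharing it across threads requires a per-thread copy via ThreadLocal, while DateTimeFormatter is immutable and can be shared directly. A minimal before/after sketch:

```java
import java.text.SimpleDateFormat;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Before: SimpleDateFormat is mutable and not thread-safe, so sharing it
// across threads requires a per-thread copy.
// After: DateTimeFormatter is immutable and thread-safe, so a single shared
// instance suffices and the ThreadLocal can be dropped.
class Formatters {
    static final ThreadLocal<SimpleDateFormat> OLD =
        ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd HH:mm:ss"));

    static final DateTimeFormatter NEW =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    static String format(LocalDateTime t) {
        return NEW.format(t);
    }
}
```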







[jira] [Reopened] (SPARK-46437) Remove unnecessary cruft from SQL built-in functions docs

2024-01-07 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reopened SPARK-46437:
--
  Assignee: (was: Nicholas Chammas)

Reverted at 
https://github.com/apache/spark/commit/a88c64e7dbdd813fa0a9df85a0ce9f1db6706ede

> Remove unnecessary cruft from SQL built-in functions docs
> -
>
> Key: SPARK-46437
> URL: https://issues.apache.org/jira/browse/SPARK-46437
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, SQL
>Affects Versions: 3.5.0
>Reporter: Nicholas Chammas
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-46437) Remove unnecessary cruft from SQL built-in functions docs

2024-01-07 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-46437:
-
Fix Version/s: (was: 4.0.0)

> Remove unnecessary cruft from SQL built-in functions docs
> -
>
> Key: SPARK-46437
> URL: https://issues.apache.org/jira/browse/SPARK-46437
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, SQL
>Affects Versions: 3.5.0
>Reporter: Nicholas Chammas
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-46608) Restore backward compatibility of JdbcDialect.classifyException

2024-01-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46608:
---
Labels: pull-request-available  (was: )

> Restore backward compatibility of JdbcDialect.classifyException
> ---
>
> Key: SPARK-46608
> URL: https://issues.apache.org/jira/browse/SPARK-46608
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>  Labels: pull-request-available
>
> Restore backward compatibility of JdbcDialect.classifyException by supporting 
> both the variant with an error class and the one without it. This avoids 
> breaking existing JDBC dialects.
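One conventional way to restore such compatibility (a sketch of the pattern, not the actual Spark change) is to give the new error-class-aware overload a default that delegates to the legacy method, so dialects that only override the old signature keep working unchanged:

```java
// Sketch of the compatibility pattern: the new error-class-aware overload
// defaults to delegating to the legacy method, so pre-existing dialects
// compiled against the old signature continue to behave as before.
abstract class JdbcDialectSketch {
    // Legacy signature: pre-existing dialects override this.
    public RuntimeException classifyException(String message, Throwable cause) {
        return new RuntimeException(message, cause);
    }

    // New signature carrying an error class; the default delegates to the
    // legacy method for backward compatibility.
    public RuntimeException classifyException(Throwable cause, String errorClass,
                                              String message) {
        return classifyException("[" + errorClass + "] " + message, cause);
    }
}
```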






[jira] [Updated] (SPARK-38488) Spark doc build does not work on macOS M1

2024-01-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-38488:
---
Labels: pull-request-available  (was: )

> Spark doc build does not work on macOS M1
> -
>
> Key: SPARK-38488
> URL: https://issues.apache.org/jira/browse/SPARK-38488
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.3.0, 3.4.0
>Reporter: Yikun Jiang
>Assignee: Yikun Jiang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
>  
> {code:java}
> diff --git a/docs/.bundle/config b/docs/.bundle/config
> index b13821f801..68c1ee493a 100644
> --- a/docs/.bundle/config
> +++ b/docs/.bundle/config
> @@ -1,2 +1,3 @@
>  ---
>  BUNDLE_PATH: ".local_ruby_bundle"
> +BUNDLE_BUILD__FFI: "--enable-libffi-alloc"
> diff --git a/docs/Gemfile b/docs/Gemfile
> index f991622708..6c35201296 100644
> --- a/docs/Gemfile
> +++ b/docs/Gemfile
> @@ -17,6 +17,7 @@
>  source "https://rubygems.org"
> +gem "ffi", "1.15.5"
>  gem "jekyll", "4.2.1"
>  gem "rouge", "3.26.0"
>  gem "jekyll-redirect-from", "0.16.0"
> {code}
> After applying the patch above, rerun `bundle install` and the build works; 
> use this as a reference if you hit the same issue. 
> I will take a deeper look to solve this properly.
>  
> related: https://github.com/ffi/ffi/issues/864
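The `BUNDLE_BUILD__FFI` entry in the patch corresponds to a per-gem build flag; assuming Bundler's `bundle config build.<gem>` mechanism, it can equivalently be set from the command line:

```shell
# Equivalent of the BUNDLE_BUILD__FFI line in docs/.bundle/config:
# pass a build flag to the ffi gem's native extension, then reinstall.
cd docs
bundle config build.ffi --enable-libffi-alloc
bundle install
```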






[jira] [Created] (SPARK-46614) Refine docstring `make_timestamp/make_timestamp_ltz/make_timestamp_ntz/make_ym_interval`

2024-01-07 Thread BingKun Pan (Jira)
BingKun Pan created SPARK-46614:
---

 Summary: Refine docstring 
`make_timestamp/make_timestamp_ltz/make_timestamp_ntz/make_ym_interval`
 Key: SPARK-46614
 URL: https://issues.apache.org/jira/browse/SPARK-46614
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation
Affects Versions: 4.0.0
Reporter: BingKun Pan









[jira] [Resolved] (SPARK-46613) Log full exception when failed to lookup Python Data Sources

2024-01-07 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46613.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44617
[https://github.com/apache/spark/pull/44617]

> Log full exception when failed to lookup Python Data Sources
> 
>
> Key: SPARK-46613
> URL: https://issues.apache.org/jira/browse/SPARK-46613
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> See https://github.com/apache/spark/pull/44617
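The change amounts to logging the exception itself rather than only its message; a minimal illustration (hypothetical helper, not Spark's actual logging code):

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Logging only e.getMessage() hides the exception type and stack trace;
// rendering the full exception preserves the root cause for debugging.
class LookupLogging {
    static String messageOnly(Throwable e) {
        return "Python data source lookup failed: " + e.getMessage();
    }

    static String fullException(Throwable e) {
        StringWriter sw = new StringWriter();
        e.printStackTrace(new PrintWriter(sw, true));
        return "Python data source lookup failed:\n" + sw;
    }
}
```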






[jira] [Assigned] (SPARK-46613) Log full exception when failed to lookup Python Data Sources

2024-01-07 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46613:


Assignee: Hyukjin Kwon

> Log full exception when failed to lookup Python Data Sources
> 
>
> Key: SPARK-46613
> URL: https://issues.apache.org/jira/browse/SPARK-46613
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
>  Labels: pull-request-available
>
> See https://github.com/apache/spark/pull/44617






[jira] [Assigned] (SPARK-46603) Refine docstring of `parse_url/url_encode/url_decode`

2024-01-07 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46603:


Assignee: Yang Jie

> Refine docstring of `parse_url/url_encode/url_decode`
> -
>
> Key: SPARK-46603
> URL: https://issues.apache.org/jira/browse/SPARK-46603
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-46603) Refine docstring of `parse_url/url_encode/url_decode`

2024-01-07 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46603.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44604
[https://github.com/apache/spark/pull/44604]

> Refine docstring of `parse_url/url_encode/url_decode`
> -
>
> Key: SPARK-46603
> URL: https://issues.apache.org/jira/browse/SPARK-46603
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-46606) Refine docstring of `convert_timezone/make_dt_interval/make_interval`

2024-01-07 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46606.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44610
[https://github.com/apache/spark/pull/44610]

> Refine docstring of `convert_timezone/make_dt_interval/make_interval`
> -
>
> Key: SPARK-46606
> URL: https://issues.apache.org/jira/browse/SPARK-46606
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-46606) Refine docstring of `convert_timezone/make_dt_interval/make_interval`

2024-01-07 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46606:


Assignee: BingKun Pan

> Refine docstring of `convert_timezone/make_dt_interval/make_interval`
> -
>
> Key: SPARK-46606
> URL: https://issues.apache.org/jira/browse/SPARK-46606
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-46613) Log full exception when failed to lookup Python Data Sources

2024-01-07 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-46613:
-
Description: See https://github.com/apache/spark/pull/44617

> Log full exception when failed to lookup Python Data Sources
> 
>
> Key: SPARK-46613
> URL: https://issues.apache.org/jira/browse/SPARK-46613
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>  Labels: pull-request-available
>
> See https://github.com/apache/spark/pull/44617






[jira] [Created] (SPARK-46613) Log full exception when failed to lookup Python Data Sources

2024-01-07 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46613:


 Summary: Log full exception when failed to lookup Python Data 
Sources
 Key: SPARK-46613
 URL: https://issues.apache.org/jira/browse/SPARK-46613
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark, SQL
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon









[jira] [Updated] (SPARK-46613) Log full exception when failed to lookup Python Data Sources

2024-01-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46613:
---
Labels: pull-request-available  (was: )

> Log full exception when failed to lookup Python Data Sources
> 
>
> Key: SPARK-46613
> URL: https://issues.apache.org/jira/browse/SPARK-46613
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-46612) ClickHouse's JDBC throws `java.lang.IllegalArgumentException: Unknown data type: string` when writing an array of strings with Apache Spark Scala

2024-01-07 Thread Nguyen Phan Huy (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nguyen Phan Huy updated SPARK-46612:

Description: 
Issue is also reported on ClickHouse's GitHub: 
[https://github.com/ClickHouse/clickhouse-java/issues/1505] 
h3. Bug description

When using Scala Spark to write an array of strings to ClickHouse, the driver 
throws a {{java.lang.IllegalArgumentException: Unknown data type: string}} 
exception.

The exception is thrown here: 
[https://github.com/ClickHouse/clickhouse-java/blob/aa3870eadb1a2d3675fd5119714c85851800f076/clickhouse-data/src/main/java/com/clickhouse/data/ClickHouseDataType.java#L238]

It is caused by Spark's JDBC utils lower-casing the type name 
({{String}} -> {{string}}):
[https://github.com/apache/spark/blob/6b931530d75cb4f00236f9c6283de8ef450963ad/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L639]
h3. Steps to reproduce
 # Create a ClickHouse table with a String Array field ([https://clickhouse.com/]).
 # Write data to the table with Scala Spark via ClickHouse's JDBC driver 
([https://github.com/ClickHouse/clickhouse-java]).
{code:scala}
// Excerpt; run inside a Scala Spark job with the ClickHouse JDBC driver on the classpath.
import java.util.Properties

import org.apache.spark.sql.{Row, SaveMode}
import org.apache.spark.sql.types.{ArrayType, StringType, StructField, StructType}

val clickHouseSchema = StructType(
  Seq(
    StructField("str_array", ArrayType(StringType))
  )
)
val data = Seq(
  Row(
    Seq("a", "b")
  )
)

val clickHouseDf = spark.createDataFrame(sc.parallelize(data), clickHouseSchema)

val props = new Properties
props.put("user", "default")
clickHouseDf.write
  .mode(SaveMode.Append)
  .option("driver", "com.clickhouse.jdbc.ClickHouseDriver")
  .jdbc("jdbc:clickhouse://localhost:8123/foo", table = "bar", props)
{code}
h2. Fix

 - [https://github.com/apache/spark/pull/44459] 

  was:
h3. Bug description

 

When using Scala Spark to write an array of strings to ClickHouse, the driver 
throws a {{java.lang.IllegalArgumentException: Unknown data type: string}} 
exception.

The exception is thrown here: 
[https://github.com/ClickHouse/clickhouse-java/blob/aa3870eadb1a2d3675fd5119714c85851800f076/clickhouse-data/src/main/java/com/clickhouse/data/ClickHouseDataType.java#L238]

It is caused by Spark's JDBC utils lower-casing the type name 
({{String}} -> {{string}}):
[https://github.com/apache/spark/blob/6b931530d75cb4f00236f9c6283de8ef450963ad/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L639]
h3. Steps to reproduce

 # Create a ClickHouse table with a String Array field ([https://clickhouse.com/]).
 # Write data to the table with Scala Spark via ClickHouse's JDBC driver 
([https://github.com/ClickHouse/clickhouse-java]).
{code:scala}
// Excerpt; run inside a Scala Spark job with the ClickHouse JDBC driver on the classpath.
import java.util.Properties

import org.apache.spark.sql.{Row, SaveMode}
import org.apache.spark.sql.types.{ArrayType, StringType, StructField, StructType}

val clickHouseSchema = StructType(
  Seq(
    StructField("str_array", ArrayType(StringType))
  )
)
val data = Seq(
  Row(
    Seq("a", "b")
  )
)

val clickHouseDf = spark.createDataFrame(sc.parallelize(data), clickHouseSchema)

val props = new Properties
props.put("user", "default")
clickHouseDf.write
  .mode(SaveMode.Append)
  .option("driver", "com.clickhouse.jdbc.ClickHouseDriver")
  .jdbc("jdbc:clickhouse://localhost:8123/foo", table = "bar", props)
{code}
h2. Fix
- [https://github.com/apache/spark/pull/44459] 


> ClickHouse's JDBC throws `java.lang.IllegalArgumentException: Unknown data 
> type: string` when writing an array of strings with Apache Spark Scala
> 
>
> Key: SPARK-46612
> URL: https://issues.apache.org/jira/browse/SPARK-46612
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Nguyen Phan Huy
>Priority: Major
>
> Issue is also reported on ClickHouse's GitHub: 
> [https://github.com/ClickHouse/clickhouse-java/issues/1505] 
> h3. Bug description
> When using Scala Spark to write an array of strings to ClickHouse, the driver 
> throws a {{java.lang.IllegalArgumentException: Unknown data type: string}} 
> exception.
> The exception is thrown here: 
> [https://github.com/ClickHouse/clickhouse-java/blob/aa3870eadb1a2d3675fd5119714c85851800f076/clickhouse-data/src/main/java/com/clickhouse/data/ClickHouseDataType.java#L238]
> It is caused by Spark's JDBC utils lower-casing the type name 
> ({{String}} -> {{string}}):
> [https://github.com/apache/spark/blob/6b931530d75cb4f00236f9c6283de8ef450963ad/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L639]
> h3. Steps to reproduce
>  # Create Clickhouse table with String Array field 
> ([https://clickhouse.com/]).
>  # Write data to the 

[jira] [Created] (SPARK-46612) ClickHouse's JDBC throws `java.lang.IllegalArgumentException: Unknown data type: string` when writing an array of strings with Apache Spark Scala

2024-01-07 Thread Nguyen Phan Huy (Jira)
Nguyen Phan Huy created SPARK-46612:
---

 Summary: ClickHouse's JDBC throws 
`java.lang.IllegalArgumentException: Unknown data type: string` when writing 
an array of strings with Apache Spark Scala
 Key: SPARK-46612
 URL: https://issues.apache.org/jira/browse/SPARK-46612
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.5.0
Reporter: Nguyen Phan Huy


h3. Bug description

 

When using Scala Spark to write an array of strings to ClickHouse, the driver 
throws a {{java.lang.IllegalArgumentException: Unknown data type: string}} 
exception.

The exception is thrown here: 
[https://github.com/ClickHouse/clickhouse-java/blob/aa3870eadb1a2d3675fd5119714c85851800f076/clickhouse-data/src/main/java/com/clickhouse/data/ClickHouseDataType.java#L238]

It is caused by Spark's JDBC utils lower-casing the type name 
({{String}} -> {{string}}):
[https://github.com/apache/spark/blob/6b931530d75cb4f00236f9c6283de8ef450963ad/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L639]
h3. Steps to reproduce

 # Create a ClickHouse table with a String Array field ([https://clickhouse.com/]).
 # Write data to the table with Scala Spark via ClickHouse's JDBC driver 
([https://github.com/ClickHouse/clickhouse-java]).
{code:scala}
// Excerpt; run inside a Scala Spark job with the ClickHouse JDBC driver on the classpath.
import java.util.Properties

import org.apache.spark.sql.{Row, SaveMode}
import org.apache.spark.sql.types.{ArrayType, StringType, StructField, StructType}

val clickHouseSchema = StructType(
  Seq(
    StructField("str_array", ArrayType(StringType))
  )
)
val data = Seq(
  Row(
    Seq("a", "b")
  )
)

val clickHouseDf = spark.createDataFrame(sc.parallelize(data), clickHouseSchema)

val props = new Properties
props.put("user", "default")
clickHouseDf.write
  .mode(SaveMode.Append)
  .option("driver", "com.clickhouse.jdbc.ClickHouseDriver")
  .jdbc("jdbc:clickhouse://localhost:8123/foo", table = "bar", props)
{code}
h2. Fix
- [https://github.com/apache/spark/pull/44459] 
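The failure mode can be reproduced in miniature with a case-sensitive type table (a simplified, hypothetical stand-in for the driver's lookup): the original name resolves, but the lower-cased name Spark produces does not.

```java
import java.util.Map;

// Simplified, hypothetical stand-in for the driver's case-sensitive type lookup.
class TypeLookup {
    private static final Map<String, String> TYPES = Map.of("String", "ClickHouse String");

    static String of(String name) {
        String t = TYPES.get(name);
        if (t == null) {
            throw new IllegalArgumentException("Unknown data type: " + name);
        }
        return t;
    }
}
```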






[jira] [Assigned] (SPARK-46248) Support ignoreCorruptFiles and ignoreMissingFiles options in XML

2024-01-07 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46248:


Assignee: Shujing Yang

> Support ignoreCorruptFiles and ignoreMissingFiles options in XML
> 
>
> Key: SPARK-46248
> URL: https://issues.apache.org/jira/browse/SPARK-46248
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Shujing Yang
>Assignee: Shujing Yang
>Priority: Major
>  Labels: pull-request-available
>
> This change corrects the handling of corrupt or missing multiline XML files by 
> respecting the user-specified options.
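The option semantics can be sketched with a toy reader (a model of the described behavior, not the actual XML data source): a corrupt file either fails the read or is skipped, depending on the option.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: files whose name ends in ".corrupt" fail to parse.
class CorruptFileHandling {
    static List<String> parse(String file) {
        if (file.endsWith(".corrupt")) {
            throw new RuntimeException("Malformed XML in " + file);
        }
        return List.of(file + ":row");
    }

    static List<String> readAll(List<String> files, boolean ignoreCorruptFiles) {
        List<String> rows = new ArrayList<>();
        for (String f : files) {
            try {
                rows.addAll(parse(f));
            } catch (RuntimeException corrupt) {
                if (!ignoreCorruptFiles) {
                    throw corrupt;  // default: the read fails
                }
                // option enabled: skip the corrupt file and continue
            }
        }
        return rows;
    }
}
```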






[jira] [Assigned] (SPARK-46607) Check the testing mode

2024-01-07 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46607:


Assignee: Ruifeng Zheng

> Check the testing mode
> --
>
> Key: SPARK-46607
> URL: https://issues.apache.org/jira/browse/SPARK-46607
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-46248) Support ignoreCorruptFiles and ignoreMissingFiles options in XML

2024-01-07 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46248.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44163
[https://github.com/apache/spark/pull/44163]

> Support ignoreCorruptFiles and ignoreMissingFiles options in XML
> 
>
> Key: SPARK-46248
> URL: https://issues.apache.org/jira/browse/SPARK-46248
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Shujing Yang
>Assignee: Shujing Yang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> This change corrects the handling of corrupt or missing multiline XML files by 
> respecting the user-specified options.






[jira] [Resolved] (SPARK-46607) Check the testing mode

2024-01-07 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46607.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44611
[https://github.com/apache/spark/pull/44611]

> Check the testing mode
> --
>
> Key: SPARK-46607
> URL: https://issues.apache.org/jira/browse/SPARK-46607
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>



