[ 
https://issues.apache.org/jira/browse/SPARK-50656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17954953#comment-17954953
 ] 

Jie Han edited comment on SPARK-50656 at 5/29/25 2:24 PM:
----------------------------------------------------------

We should provide a Trino implementation of the {{JdbcDialect}} trait so that 
we can parse the complex type names of array/map types.
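A minimal sketch of the type-name parsing such a dialect would need, assuming Trino-style names like {{array(row(city varchar, state varchar))}}. This is plain Scala with an illustrative ADT, not the Catalyst API; a real fix would override {{getCatalystType}} in a {{TrinoDialect}} and map these onto {{ArrayType}}/{{MapType}}/{{StructType}}:

```scala
// Illustrative type model -- stands in for Catalyst's DataType hierarchy.
sealed trait TrinoType
case class Primitive(name: String) extends TrinoType
case class ArrayT(element: TrinoType) extends TrinoType
case class MapT(key: TrinoType, value: TrinoType) extends TrinoType
case class RowT(fields: Seq[(String, TrinoType)]) extends TrinoType

object TrinoTypeParser {
  // Split a comma-separated list, but only at the top nesting level.
  private def splitTop(s: String): Seq[String] = {
    val parts = scala.collection.mutable.ArrayBuffer.empty[String]
    var depth = 0
    val cur = new StringBuilder
    for (c <- s) c match {
      case '(' => depth += 1; cur += c
      case ')' => depth -= 1; cur += c
      case ',' if depth == 0 => parts += cur.toString.trim; cur.clear()
      case _ => cur += c
    }
    if (cur.nonEmpty) parts += cur.toString.trim
    parts.toSeq
  }

  // Recursive descent over Trino's "array(...)", "map(k, v)", "row(...)" names.
  // Simplified: assumes unquoted row-field names separated by a single space.
  def parse(typeName: String): TrinoType = {
    val t = typeName.trim
    if (t.startsWith("array(") && t.endsWith(")"))
      ArrayT(parse(t.substring(6, t.length - 1)))
    else if (t.startsWith("map(") && t.endsWith(")")) {
      val parts = splitTop(t.substring(4, t.length - 1))
      MapT(parse(parts(0)), parse(parts(1)))
    } else if (t.startsWith("row(") && t.endsWith(")")) {
      val fields = splitTop(t.substring(4, t.length - 1)).map { f =>
        val idx = f.indexOf(' ')
        (f.substring(0, idx), parse(f.substring(idx + 1)))
      }
      RowT(fields)
    } else Primitive(t)
  }
}
```

Keeping the parser recursive lets it handle arbitrary nesting such as {{map(varchar, array(row(...)))}} without special cases.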


was (Author: JIRAUSER285788):
*In JDBC, there's no standardized array or map type.* To access their type 
metadata, particularly the element type we care most about, we have to rely on 
the {{typeName}}. However, {{typeName}} varies by implementation and is 
database-dialect specific. *We should delegate the {{getCatalystType}} request 
to the {{JdbcDialect}} trait and provide a Trino implementation.*

> JDBC Reader Fails to Handle Complex Types (Array, Map) from Trino
> -----------------------------------------------------------------
>
>                 Key: SPARK-50656
>                 URL: https://issues.apache.org/jira/browse/SPARK-50656
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, Spark Shell
>    Affects Versions: 3.5.4
>         Environment: {*}Environment{*}:
>  * Spark version: 3.5.x 
>  * Trino version: 457
>  * JDBC Driver: {{io.trino.jdbc.TrinoDriver}}
>            Reporter: Narayan Bhawar
>            Priority: Major
>              Labels: Trino, complextype, jdbc, spark
>
> {*}Description{*}:
> I am encountering an issue when using Spark to read data from a Trino 
> instance via JDBC. Specifically, when querying complex types such as 
> {{ARRAY}} or {{MAP}} from Trino, Spark throws an error indicating that it 
> cannot recognize these SQL types. Below is the context:
> {*}Code Example{*}:
>  
> {code:java}
> val sourceDF = spark.read
>   .format("jdbc")
>   .option("driver", "io.trino.jdbc.TrinoDriver")
>   .option("url", "jdbc:trino://localhost:8181")
>   .option("query", "select address from minio.qa.nbcheck1")
>   .load(){code}
>  
>  
> *Error Message:*
>  
> {code:java}
> 2/04 03:49:59 INFO SparkContext: SparkContext already stopped.
> Exception in thread "main" org.apache.spark.SparkSQLException: 
> [UNRECOGNIZED_SQL_TYPE] Unrecognized SQL type - name: array (row(city 
> varchar, state varchar)), id: ARRAY.
> at 
> org.apache.spark.sql.errors.QueryExecutionErrors$.unrecognizedSqlTypeError(QueryExecutionErrors.scala:992){code}
>  
>  
> {*}Root Cause{*}:
> The error seems to be occurring because Spark's JDBC data source does not 
> recognize complex SQL types like {{ARRAY}} or {{MAP}} from Trino by default. 
> This is confirmed by the following relevant section of Spark's code:
> [https://github.com/apache/spark/blob/v3.5.4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala]
> {code:java}
> private def getCatalystType(
>   sqlType: Int,
>   typeName: String,
>   precision: Int,
>   scale: Int,
>   signed: Boolean,
>   isTimestampNTZ: Boolean): DataType = sqlType match {
>     ...
>     case _ =>
>       // For unmatched types:
>       // including java.sql.Types.ARRAY, DATALINK, DISTINCT, JAVA_OBJECT, 
> NULL, OTHER, REF_CURSOR,
>       // TIME_WITH_TIMEZONE, TIMESTAMP_WITH_TIMEZONE, and others.
>       val jdbcType = classOf[JDBCType].getEnumConstants()
>         .find(_.getVendorTypeNumber == sqlType)
>         .map(_.getName)
>         .getOrElse(sqlType.toString)
>       throw QueryExecutionErrors.unrecognizedSqlTypeError(jdbcType, 
> typeName){code}
> As you can see, the method for translating JDBC types to Spark Catalyst types 
> doesn't currently handle ARRAY or MAP, among other types, leading to the 
> error. The JDBC schema translation fails when complex types such as ARRAY or 
> MAP are present.
> {*}Expected Behavior{*}:
> Spark should not fail when encountering complex types like {{ARRAY}} or 
> {{MAP}} from a Trino JDBC source. Instead, it should either:
>  # Convert these complex types into a serialized string format (e.g., JSON) 
> over the wire.
>  # Provide an option for users to manually handle such complex types after 
> loading them into a DataFrame.
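Until a dialect lands, a common workaround in the spirit of option 1 above is to push the serialization down to Trino: project the complex column as a JSON string so the JDBC driver reports a plain {{varchar}}. A sketch, assuming the driver/URL/table from the report; the {{asJsonString}} helper is hypothetical, and {{json_format}} is Trino's JSON-to-string function:

```scala
// Hypothetical helper: rewrite a complex-typed column into a JSON-string
// projection that Trino serves over JDBC as a plain varchar.
def asJsonString(column: String): String =
  s"json_format(cast($column as json)) as $column"

// Usage with Spark (not run here; requires a live Trino endpoint):
// val sourceDF = spark.read
//   .format("jdbc")
//   .option("driver", "io.trino.jdbc.TrinoDriver")
//   .option("url", "jdbc:trino://localhost:8181")
//   .option("query", s"select ${asJsonString("address")} from minio.qa.nbcheck1")
//   .load()
// The string column can then be decoded with from_json and an explicit
// schema, e.g. from_json($"address", "array<struct<city:string,state:string>>").
```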



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
