[ https://issues.apache.org/jira/browse/SPARK-50656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17954953#comment-17954953 ]
Jie Han edited comment on SPARK-50656 at 5/29/25 2:24 PM:
----------------------------------------------------------

We should provide a Trino implementation of the {{JdbcDialect}} trait so that we can parse the composite type names of array/map types.

was (Author: JIRAUSER285788):

*In JDBC, there is no standardized array or map type.* To access their type metadata, particularly the element type we care most about, we have to rely on the {{typeName}}. However, {{typeName}} varies by implementation and is database-dialect specific.

*We should delegate the {{getCatalystType}} request to the {{JdbcDialect}} trait and provide a Trino implementation.*

> JDBC Reader Fails to Handle Complex Types (Array, Map) from Trino
> -----------------------------------------------------------------
>
>                 Key: SPARK-50656
>                 URL: https://issues.apache.org/jira/browse/SPARK-50656
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, Spark Shell
>    Affects Versions: 3.5.4
>        Environment: *Environment*:
> * Spark version: 3.5.x
> * Trino version: 457
> * JDBC Driver: {{io.trino.jdbc.TrinoDriver}}
>            Reporter: Narayan Bhawar
>            Priority: Major
>              Labels: Trino, complextype, jdbc, spark
>
> *Description*:
> I am encountering an issue when using Spark to read data from a Trino instance via JDBC. Specifically, when querying complex types such as {{ARRAY}} or {{MAP}} from Trino, Spark throws an error indicating that it cannot recognize these SQL types. Below is the context:
>
> *Code Example*:
> {code:java}
> val sourceDF = spark.read
>   .format("jdbc")
>   .option("driver", "io.trino.jdbc.TrinoDriver")
>   .option("url", "jdbc:trino://localhost:8181")
>   .option("query", "select address from minio.qa.nbcheck1")
>   .load(){code}
>
> *Error Message:*
> {code:java}
> 2/04 03:49:59 INFO SparkContext: SparkContext already stopped.
> Exception in thread "main" org.apache.spark.SparkSQLException: [UNRECOGNIZED_SQL_TYPE] Unrecognized SQL type - name: array (row(city varchar, state varchar)), id: ARRAY.
>   at org.apache.spark.sql.errors.QueryExecutionErrors$.unrecognizedSqlTypeError(QueryExecutionErrors.scala:992){code}
>
> *Root Cause*:
> The error occurs because Spark's JDBC data source does not recognize complex SQL types such as {{ARRAY}} or {{MAP}} from Trino by default. This is confirmed by the relevant section of Spark's code:
> [https://github.com/apache/spark/blob/v3.5.4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala]
> {code:java}
> private def getCatalystType(
>     sqlType: Int,
>     typeName: String,
>     precision: Int,
>     scale: Int,
>     signed: Boolean,
>     isTimestampNTZ: Boolean): DataType = sqlType match {
>   ...
>   case _ =>
>     // For unmatched types:
>     // including java.sql.Types.ARRAY, DATALINK, DISTINCT, JAVA_OBJECT, NULL, OTHER, REF_CURSOR,
>     // TIME_WITH_TIMEZONE, TIMESTAMP_WITH_TIMEZONE, and others.
>     val jdbcType = classOf[JDBCType].getEnumConstants()
>       .find(_.getVendorTypeNumber == sqlType)
>       .map(_.getName)
>       .getOrElse(sqlType.toString)
>     throw QueryExecutionErrors.unrecognizedSqlTypeError(jdbcType, typeName){code}
> As you can see, the method that translates JDBC types to Spark Catalyst types does not currently handle {{ARRAY}} or {{MAP}}, among other types, which leads to the error: JDBC schema translation fails whenever such complex types are present.
>
> *Expected Behavior*:
> Spark should not fail when encountering complex types like {{ARRAY}} or {{MAP}} from a Trino JDBC source. Instead, it should either:
> # Convert these complex types into a serialized string format (e.g., JSON) over the wire.
> # Provide an option for users to manually handle such complex types after loading them into a DataFrame.
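The dialect-based fix proposed in the edited comment above hinges on parsing Trino's composite {{typeName}} strings (e.g. {{array(row(city varchar, state varchar))}}), since the JDBC type id alone ({{ARRAY}}) carries no element-type information. The following is a minimal, self-contained sketch of such a parser. It is an illustration of the parsing step only: the class name and the string output are assumptions, not Spark's actual {{JdbcDialect}} API, and a real implementation inside a Trino dialect would return Catalyst {{DataType}} objects (e.g. {{ArrayType}}, {{MapType}}, {{StructType}}) instead of strings, with a complete scalar-type mapping.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of how a Trino dialect could parse composite
// typeName strings reported by the Trino JDBC driver. All names here
// are illustrative; this is not Spark's JdbcDialect API.
public class TrinoTypeNameParser {

    // Maps a Trino type name such as "array(row(city varchar, state varchar))"
    // to a Catalyst-style type string such as
    // "array<struct<city:string,state:string>>".
    static String toCatalyst(String trino) {
        String t = trino.trim();
        if (t.startsWith("array(") && t.endsWith(")")) {
            return "array<" + toCatalyst(inner(t)) + ">";
        }
        if (t.startsWith("map(") && t.endsWith(")")) {
            List<String> kv = splitTopLevel(inner(t));
            return "map<" + toCatalyst(kv.get(0)) + "," + toCatalyst(kv.get(1)) + ">";
        }
        if (t.startsWith("row(") && t.endsWith(")")) {
            StringBuilder sb = new StringBuilder("struct<");
            List<String> fields = splitTopLevel(inner(t));
            for (int i = 0; i < fields.size(); i++) {
                String f = fields.get(i).trim();
                int sp = f.indexOf(' ');          // "name type" -> split on first space
                if (i > 0) sb.append(',');
                sb.append(f, 0, sp).append(':').append(toCatalyst(f.substring(sp + 1)));
            }
            return sb.append('>').toString();
        }
        // Scalar types: a deliberately incomplete mapping, for illustration only.
        if (t.startsWith("varchar")) return "string";
        if (t.equals("integer"))     return "int";
        if (t.equals("bigint"))      return "bigint";
        if (t.equals("double"))      return "double";
        return t; // unknown scalar: pass the name through unchanged
    }

    // Text between the first '(' and the matching final ')'.
    static String inner(String t) {
        return t.substring(t.indexOf('(') + 1, t.length() - 1);
    }

    // Split on commas that are not nested inside parentheses.
    static List<String> splitTopLevel(String s) {
        List<String> out = new ArrayList<>();
        int depth = 0, start = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(') depth++;
            else if (c == ')') depth--;
            else if (c == ',' && depth == 0) {
                out.add(s.substring(start, i));
                start = i + 1;
            }
        }
        out.add(s.substring(start));
        return out;
    }

    public static void main(String[] args) {
        // The type from the error message in this ticket.
        // prints array<struct<city:string,state:string>>
        System.out.println(toCatalyst("array(row(city varchar, state varchar))"));
    }
}
```

A dialect built around this idea would hook the parsing into its {{getCatalystType}} override for {{java.sql.Types.ARRAY}} (and Trino's map/row cases), which is exactly the delegation point the comment proposes.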
--
This message was sent by Atlassian Jira
(v8.20.10#820010)