[ https://issues.apache.org/jira/browse/SPARK-32992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Peng Cheng updated SPARK-32992:
-------------------------------
Description:

For most JDBC drivers, Spark maps the ROWID SQL type to the Catalyst LongType:

(in org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils)

{code:java}
private def getCatalystType(
    sqlType: Int,
    precision: Int,
    scale: Int,
    signed: Boolean): DataType = {
  val answer = sqlType match {
    // scalastyle:off
    ...
    case java.sql.Types.ROWID => LongType
    ...
    case _ =>
      throw new SQLException("Unrecognized SQL type " + sqlType)
    // scalastyle:on
  }

  if (answer == null) {
    throw new SQLException("Unsupported type " + JDBCType.valueOf(sqlType).getName)
  }
  answer
}
{code}

Oracle JDBC drivers (all versions) are a rare exception: their ROWID accessor only supports extracting the value as a String:

(in oracle.jdbc.driver.RowidAccessor, decompiled bytecode)

{code:java}
...
String getString(int var1) throws SQLException {
  return this.isNull(var1) ? null : this.rowData.getString(this.getOffset(var1),
      this.getLength(var1),
      this.statement.connection.conversion.getCharacterSet((short)1));
}

Object getObject(int var1) throws SQLException {
  return this.getROWID(var1);
}
...
{code}

This caused an exception to be thrown when importing datasets from an Oracle DB, as reported in [https://stackoverflow.com/questions/52244492/spark-jdbc-dataframereader-fails-to-read-oracle-table-with-datatype-as-rowid]:

{code:java}
18/09/08 11:38:17 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 5.0 (TID 23, gbrdsr000002985.intranet.barcapint.com, executor 21): java.sql.SQLException: Invalid column type: getLong not implemented for class oracle.jdbc.driver.T4CRowidAccessor
  at oracle.jdbc.driver.GeneratedAccessor.getLong(GeneratedAccessor.java:440)
  at oracle.jdbc.driver.GeneratedStatement.getLong(GeneratedStatement.java:228)
  at oracle.jdbc.driver.GeneratedScrollableResultSet.getLong(GeneratedScrollableResultSet.java:620)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$8.apply(JdbcUtils.scala:365)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$8.apply(JdbcUtils.scala:364)
{code}

Therefore, the default SQL type => Catalyst type conversion rule should be overridden in OracleDialect. Specifically, the following rule should be added:

{code:java}
case Types.ROWID => Some(StringType)
{code}
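Until OracleDialect itself is patched, one workaround is to register a custom dialect carrying just this one rule. Below is a minimal sketch, not an actual Spark patch: the object name OracleRowIdDialect is made up for illustration, while JdbcDialect, JdbcDialects.registerDialect, and the getCatalystType signature are Spark's public developer API:

{code:scala}
import java.sql.Types

import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}
import org.apache.spark.sql.types.{DataType, MetadataBuilder, StringType}

// Hypothetical dialect that only fixes the ROWID mapping; every other
// SQL type falls through (None) to the default Oracle handling.
object OracleRowIdDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean =
    url.toLowerCase.startsWith("jdbc:oracle")

  override def getCatalystType(
      sqlType: Int,
      typeName: String,
      size: Int,
      md: MetadataBuilder): Option[DataType] = sqlType match {
    case Types.ROWID => Some(StringType) // the proposed rule
    case _ => None // defer to the default mapping
  }
}

// Register before creating the DataFrameReader:
JdbcDialects.registerDialect(OracleRowIdDialect)
{code}

Since registered dialects take precedence over the built-in ones, the ROWID column should then be read through getString and surface as a plain string column in the resulting DataFrame.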
> In OracleDialect, "RowID" SQL type should be converted into "String" Catalyst type
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-32992
>                 URL: https://issues.apache.org/jira/browse/SPARK-32992
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.7, 3.1.0
>            Reporter: Peng Cheng
>            Priority: Minor
>              Labels: jdbc, jdbc_connector