[ https://issues.apache.org/jira/browse/SPARK-17195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15432314#comment-15432314 ]
Jason Moore commented on SPARK-17195:
-------------------------------------

That's right. The JDBC API has ResultSetMetaData.isNullable returning:

* ResultSetMetaData.columnNoNulls (= 0), meaning the column does not allow NULL values
* ResultSetMetaData.columnNullable (= 1), meaning the column allows NULL values
* ResultSetMetaData.columnNullableUnknown (= 2), meaning the nullability of the column's values is unknown

In Spark we take this result and do as you've described: anything not reported as non-null is treated as nullable. See the first link in the ticket description above.

> Dealing with JDBC column nullability when it is not reliable
> -------------------------------------------------------------
>
>                 Key: SPARK-17195
>                 URL: https://issues.apache.org/jira/browse/SPARK-17195
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Jason Moore
>
> Starting with Spark 2.0.0, the column "nullable" property must be correct
> for code generation to work properly. Marking a column as nullable = false
> used to (< 2.0.0) still allow null values to be operated on, but now this
> results in:
> {noformat}
> Caused by: java.lang.NullPointerException
> 	at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter.write(UnsafeRowWriter.java:210)
> 	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
> 	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> 	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
> 	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> {noformat}
> I'm all for the change towards more rigid behavior (enforcing correct
> input). But the problem I'm facing now is that when I use JDBC to read from
> a Teradata server, the column nullability is often not correct (particularly
> when sub-queries are involved).
> This is the line in question:
> https://github.com/apache/spark/blob/v2.0.0/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala#L140
> I'm trying to work out the way forward for me on this. I know it's really
> the fault of the Teradata database server for not returning the correct
> schema, but I'll need to make Spark itself or my application resilient to
> this behavior.
> One of the Teradata JDBC Driver tech leads has told me that "when the
> rsmd.getSchemaName and rsmd.getTableName methods return an empty zero-length
> string, then the other metadata values may not be completely accurate" - so
> one option could be to treat the nullability (at least) the same way as the
> "unknown" case (as nullable = true). For reference, see the rest of our
> discussion here:
> http://forums.teradata.com/forum/connectivity/teradata-jdbc-driver-returns-the-wrong-schema-column-nullability
> Any other thoughts?
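
For illustration, a minimal sketch of the nullability mapping described in the comment above, using plain JDBC from Scala (the connection URL, credentials, and table name are hypothetical placeholders; only the isNullable comparison reflects what Spark actually does at the JDBCRDD.scala line linked in the ticket):

{noformat}
import java.sql.{DriverManager, ResultSetMetaData}

// Hypothetical Teradata connection details, for illustration only.
val conn = DriverManager.getConnection(
  "jdbc:teradata://host/DATABASE=mydb", "user", "pass")
val rs = conn.createStatement().executeQuery("SELECT * FROM some_table")
val rsmd = rs.getMetaData

for (i <- 1 to rsmd.getColumnCount) {
  // A column is treated as nullable unless the driver explicitly reports
  // columnNoNulls; both columnNullable and columnNullableUnknown map to true.
  val nullable = rsmd.isNullable(i) != ResultSetMetaData.columnNoNulls
  println(s"${rsmd.getColumnName(i)}: nullable = $nullable")
}
{noformat}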
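And one possible application-side workaround, sketched under the assumption that forcing every column to nullable = true is acceptable; the round-trip through df.rdd below is my own illustration, not an API pattern Spark prescribes for this, and the URL and dbtable are placeholders:

{noformat}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.StructType

val spark = SparkSession.builder().appName("nullable-workaround").getOrCreate()

// Hypothetical connection properties; a sub-query is used as the dbtable
// since that is where Teradata's reported nullability tends to be wrong.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:teradata://host/DATABASE=mydb")
  .option("user", "user")
  .option("password", "pass")
  .option("dbtable", "(SELECT col1, col2 FROM some_table) t")
  .load()

// Force every field to nullable = true, mirroring how the "unknown" case
// is treated, then rebuild the DataFrame with the relaxed schema.
val relaxedSchema = StructType(df.schema.map(_.copy(nullable = true)))
val safeDf = spark.createDataFrame(df.rdd, relaxedSchema)
{noformat}

Rebuilding the DataFrame this way gives up any non-null optimizations, but it sidesteps the generated UnsafeRowWriter's non-null assumption when the driver's metadata can't be trusted.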