[ https://issues.apache.org/jira/browse/SPARK-17195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15432129#comment-15432129 ]

Jason Moore commented on SPARK-17195:
-------------------------------------

> I think it might be sensible to support this by implementing 
> `SchemaRelationProvider`.

That was one of the thoughts I had too.
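
To make that concrete: if the JDBC source implemented SchemaRelationProvider, a
user-supplied schema (with the nullability corrected) could override the
driver's metadata.  Hypothetical usage only, since JDBCRelation currently
rejects user-specified schemas; the column names and types are illustrative:
{noformat}
// Hypothetical: .schema(...) on the JDBC source would only be honored if the
// source implemented SchemaRelationProvider.
import org.apache.spark.sql.types._

val schema = StructType(Seq(
  StructField("id", LongType, nullable = true),
  StructField("name", StringType, nullable = true)))

val df = spark.read
  .schema(schema)               // user-corrected nullability would win
  .format("jdbc")
  .option("url", url)
  .option("dbtable", "mytable")
  .load()
{noformat}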

> it should be fixed in Teradata

I more or less agree with this, but I'm not certain their JDBC driver is being 
provided with everything it needs from the server to decide that it should 
report "columnNullableUnknown".  Maybe I'll shoot some more questions their way 
on this.
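
For what it's worth, a quick way to see what the driver actually reports per
column (plain JDBC, nothing Spark-specific; url, user, password, and table are
placeholders):
{noformat}
// Diagnostic sketch: dump each column's reported nullability alongside the
// schema/table names whose emptiness supposedly signals unreliable metadata.
import java.sql.{DriverManager, ResultSetMetaData}

val conn = DriverManager.getConnection(url, user, password)
val rs = conn.createStatement().executeQuery(s"SELECT * FROM $table WHERE 1=0")
val rsmd = rs.getMetaData
for (i <- 1 to rsmd.getColumnCount) {
  val nullability = rsmd.isNullable(i) match {
    case ResultSetMetaData.columnNoNulls         => "columnNoNulls"
    case ResultSetMetaData.columnNullable        => "columnNullable"
    case ResultSetMetaData.columnNullableUnknown => "columnNullableUnknown"
    case other                                   => s"unexpected($other)"
  }
  println(s"${rsmd.getColumnLabel(i)}: $nullability " +
    s"(schemaName='${rsmd.getSchemaName(i)}', tableName='${rsmd.getTableName(i)}')")
}
rs.close(); conn.close()
{noformat}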

> Dealing with JDBC column nullability when it is not reliable
> ------------------------------------------------------------
>
>                 Key: SPARK-17195
>                 URL: https://issues.apache.org/jira/browse/SPARK-17195
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Jason Moore
>
> Starting with Spark 2.0.0, the column "nullable" property needs to be correct 
> for code generation to work properly.  Marking a column as nullable = false 
> used to (< 2.0.0) allow null values to be operated on, but now this results 
> in:
> {noformat}
> Caused by: java.lang.NullPointerException
>         at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter.write(UnsafeRowWriter.java:210)
>         at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
>         at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>         at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
>         at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> {noformat}
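> A minimal sketch of that failure mode (illustrative, not from an actual run): 
> declare a column non-nullable while the data contains a null, and the 
> generated code trips where the null checks were elided:
> {noformat}
> // Sketch only: nullable = false with a null value in the data exercises the
> // unsafe-row writer path with null checks removed.
> import org.apache.spark.sql.Row
> import org.apache.spark.sql.types._
>
> val schema = StructType(Seq(StructField("s", StringType, nullable = false)))
> val rows = spark.sparkContext.parallelize(Seq(Row(null)))
> spark.createDataFrame(rows, schema).collect()  // NPE from the generated code
> {noformat}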
> I'm all for the change towards more rigid behavior (enforcing correct 
> input).  But the problem I'm facing now is that when I use JDBC to read from 
> a Teradata server, the column nullability is often not correct (particularly 
> when sub-queries are involved).
> This is the line in question:
> https://github.com/apache/spark/blob/v2.0.0/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala#L140
> I'm trying to work out what would be the way forward for me on this.  I know 
> that it's really the fault of the Teradata database server not returning the 
> correct schema, but I'll need to make Spark itself or my application 
> resilient to this behavior.
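> One application-side workaround I've been considering is to simply stop 
> trusting the driver and force every column to nullable after the read, along 
> these lines (a sketch; it assumes the scan itself survives, otherwise the fix 
> has to happen inside Spark before the unsafe-row conversion):
> {noformat}
> // Workaround sketch: rebuild the DataFrame with every column marked
> // nullable, discarding the driver-reported nullability. url, table, and
> // connectionProperties are placeholders.
> import org.apache.spark.sql.types.StructType
>
> val df = spark.read.jdbc(url, table, connectionProperties)
> val forcedSchema = StructType(df.schema.map(_.copy(nullable = true)))
> val safeDf = spark.createDataFrame(df.rdd, forcedSchema)
> {noformat}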
> One of the Teradata JDBC Driver tech leads has told me that "when the 
> rsmd.getSchemaName and rsmd.getTableName methods return an empty zero-length 
> string, then the other metadata values may not be completely accurate" - so 
> one option could be to treat the nullability (at least) the same way as the 
> "unknown" case (as nullable = true).  For reference, see the rest of our 
> discussion here: 
> http://forums.teradata.com/forum/connectivity/teradata-jdbc-driver-returns-the-wrong-schema-column-nullability
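> In code, that heuristic could look something like this (a sketch against 
> java.sql.ResultSetMetaData, roughly what the schema resolution could apply 
> per column):
> {noformat}
> // Heuristic sketch: if the driver returns empty schema/table names, its
> // other metadata may be inaccurate, so fall back to nullable = true --
> // the same treatment as columnNullableUnknown.
> import java.sql.ResultSetMetaData
>
> def nullableFor(rsmd: ResultSetMetaData, col: Int): Boolean = {
>   val metadataReliable =
>     rsmd.getSchemaName(col).nonEmpty && rsmd.getTableName(col).nonEmpty
>   if (!metadataReliable) true  // treat like the "unknown" case
>   else rsmd.isNullable(col) != ResultSetMetaData.columnNoNulls
> }
> {noformat}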
> Any other thoughts?


