Hi Spark Users,

We do a lot of processing in Spark using data that is in MS SQL server.
Today, I created a DataFrame against a table in SQL Server using the
following:

val dfSql = spark.read.jdbc(connectionString, table, props)

I noticed that every column in the DataFrame showed as *nullable=true*, even
though many of them are required (NOT NULL) in SQL Server.
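
For reference, this is roughly how I'm checking it -- iterating over the
resolved schema shows every field as nullable:

dfSql.schema.fields.foreach { f =>
  // Every field prints nullable = true, even the NOT NULL columns
  println(s"${f.name}: ${f.dataType.simpleString}, nullable = ${f.nullable}")
}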

I went hunting in the code, and I found that in JDBCRDD, when it resolves
the schema of a table, it passes in *alwaysNullable=true* to JdbcUtils,
which forces all columns to resolve as nullable.

https://github.com/apache/spark/blob/branch-2.3/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala#L62
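
The only workaround I can think of is to re-apply the correct nullability by
hand after the read, along these lines (requiredCols is just a placeholder for
the columns that are actually NOT NULL in SQL Server):

import org.apache.spark.sql.types.StructType

// Placeholder: columns declared NOT NULL in the SQL Server table
val requiredCols = Set("id", "created_at")

// Rebuild the schema with the real nullability and re-apply it
val fixedSchema = StructType(dfSql.schema.fields.map { f =>
  if (requiredCols.contains(f.name)) f.copy(nullable = false) else f
})
val dfFixed = spark.createDataFrame(dfSql.rdd, fixedSchema)

That feels heavy-handed for something the JDBC metadata already reports, though.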

I don't see a way to override that behavior. Is this by design, or could
it be a bug?

Thanks!
Subhash
