cfmcgrady commented on a change in pull request #33146: URL: https://github.com/apache/spark/pull/33146#discussion_r661945052
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
##########
@@ -406,19 +406,21 @@ abstract class CastBase extends UnaryExpression with TimeZoneAwareExpression wit
         if (row.numFields > 0) {
           val st = fields.map(_.dataType)
           val toUTF8StringFuncs = st.map(castToString)
-          if (row.isNullAt(0)) {
+          if (fields(0).nullable && row.isNullAt(0)) {

Review comment:
If a user creates a DataFrame with `spark.internalCreateDataFrame()`, `row.isNullAt()` may return true even though the schema declares the field as non-nullable. For instance:

```scala
// internalCreateDataFrame is private[sql], so this must be compiled
// inside the org.apache.spark.sql package (e.g. in a test suite).
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.types._

val schema = StructType(Seq(
  StructField("x", StructType(Seq(
    StructField("y", IntegerType, true),
    StructField("z", IntegerType, false)  // declared non-nullable
  )))))
// The inner row carries a null in the non-nullable field "z".
val rdd = spark.sparkContext.parallelize(Seq(InternalRow(InternalRow(1, null))))
val df = spark.internalCreateDataFrame(rdd, schema)
df.show
// current master branch output
// +---------+
// |        x|
// +---------+
// |{1, null}|
// +---------+
```

Although `spark.internalCreateDataFrame()` is an API private to the sql package, `spark.read.json()` and `spark.read.csv()` call it without handling null values (see the example in the PR description).
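
To make the concern concrete, here is a minimal sketch in plain Scala with no Spark dependency. `Field`, `renderByData`, and `renderBySchema` are hypothetical stand-ins invented for illustration, not Spark's classes or the actual `Cast` code. It only shows that once the null check is gated on the schema's `nullable` flag, a null sitting in a field declared non-nullable skips the check and reaches the conversion. What Spark itself would do at that point depends on the row implementation and is not modeled here.

```scala
// Hypothetical stand-in for a schema field; not Spark's StructField.
final case class Field(name: String, nullable: Boolean)

object NullGuardSketch {
  // Null check driven by the data alone (mirrors `row.isNullAt(0)`).
  def renderByData(value: Any): String =
    if (value == null) "null" else value.toString

  // Null check gated on the schema (mirrors `fields(0).nullable && row.isNullAt(0)`).
  // A null in a field declared non-nullable falls through to the conversion.
  def renderBySchema(field: Field, value: Any): String =
    if (field.nullable && value == null) "null" else value.toString

  def main(args: Array[String]): Unit = {
    val z = Field("z", nullable = false) // schema claims "never null"
    val bad: Any = null                  // the data disagrees

    println(renderByData(bad))           // prints "null"

    // The gated check is skipped, so the conversion runs on null and
    // throws a NullPointerException in this sketch.
    val outcome = scala.util.Try(renderBySchema(z, bad))
    println(outcome.isFailure)           // prints "true"
  }
}
```

When the data agrees with the schema, both variants behave the same, so the difference only shows up for inputs that violate their declared nullability, which is the case described above.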