cfmcgrady commented on a change in pull request #33146:
URL: https://github.com/apache/spark/pull/33146#discussion_r661945052



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
##########
@@ -406,19 +406,21 @@ abstract class CastBase extends UnaryExpression with TimeZoneAwareExpression wit
         if (row.numFields > 0) {
           val st = fields.map(_.dataType)
           val toUTF8StringFuncs = st.map(castToString)
-          if (row.isNullAt(0)) {
+          if (fields(0).nullable && row.isNullAt(0)) {

Review comment:
   If a user creates a DataFrame via `spark.internalCreateDataFrame()`, 
`row.isNullAt()` may return true even though the schema declares the field as non-nullable.
   For instance:
   ```scala
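     import org.apache.spark.sql.catalyst.InternalRow
     import org.apache.spark.sql.types._

     // Field "z" is declared non-nullable, but the row below stores null in it.
     val schema = StructType(Seq(
       StructField("x",
         StructType(Seq(
           StructField("y", IntegerType, true),
           StructField("z", IntegerType, false)
         )))))
     val rdd = spark.sparkContext.parallelize(Seq(InternalRow(InternalRow(1, null))))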
     val df = spark.internalCreateDataFrame(rdd, schema)
     df.show
     // current master branch output
     //  +---------+
     //  |        x|
     //  +---------+
     //  |{1, null}|
     //  +---------+
   ```
   
   Although `spark.internalCreateDataFrame()` is a private API of the sql package, 
`spark.read.json()` and `spark.read.csv()` call it without handling null values 
(see the example in the PR description).
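
   To make the mismatch concrete, here is a minimal plain-Scala sketch of how the two checks in the diff can disagree. `SketchField`, `SketchRow`, and `NullCheckSketch` are hypothetical stand-ins invented for illustration, not Spark classes, and the sketch only compares the two predicates; it does not model what `Cast` does afterwards.

   ```scala
   // Hypothetical, simplified stand-ins for Spark's StructField and InternalRow.
   // They exist only for this sketch and are not the real Spark classes.
   final case class SketchField(name: String, nullable: Boolean)

   final case class SketchRow(values: Vector[Any]) {
     def isNullAt(i: Int): Boolean = values(i) == null
   }

   object NullCheckSketch {
     // Old check from the diff: looks only at the data.
     def dataOnlyIsNull(row: SketchRow, i: Int): Boolean =
       row.isNullAt(i)

     // New check from the diff: consults the data only if the schema
     // declares the field nullable.
     def schemaGatedIsNull(fields: Seq[SketchField], row: SketchRow, i: Int): Boolean =
       fields(i).nullable && row.isNullAt(i)

     // Mirrors the inner struct above: "y" is nullable, "z" is not,
     // yet the row stores null in the slot for "z".
     val fields: Seq[SketchField] = Seq(
       SketchField("y", nullable = true),
       SketchField("z", nullable = false))
     val row: SketchRow = SketchRow(Vector(1, null))
   }
   ```

   For field `z` (index 1), `dataOnlyIsNull` returns true while `schemaGatedIsNull` returns false, so the two versions of the condition take different branches on this row.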
   



