Github user logannc commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18945#discussion_r140412745

    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -1761,12 +1761,37 @@ def toPandas(self):
                     raise ImportError("%s\n%s" % (e.message, msg))
             else:
                 dtype = {}
    +            columns_with_null_int = {}
    +            def null_handler(rows, columns_with_null_int):
    +                for row in rows:
    +                    row = row.asDict()
    +                    for column in columns_with_null_int:
    +                        val = row[column]
    +                        dt = dtype[column]
    +                        if val is not None:
    --- End diff --

    If `pandas_type in (np.int8, np.int16, np.int32) and field.nullable` and there are ANY non-null values, the dtype of the column is changed to `np.float32` or `np.float64`, both of which properly handle `None` values. That said, if the entire column were `None`, it would fail. Therefore I have preemptively changed the type on line 1787 to `np.float32`; per `null_handler`, it may still be widened to `np.float64` if needed.
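    A minimal standalone sketch (pandas/NumPy only, independent of the Spark patch; the variable names here are illustrative, not from `dataframe.py`) of the behavior this comment relies on: NumPy integer dtypes have no representation for a missing value, so a nullable int column must fall back to a float dtype, with `None` stored as `NaN`.

    ```python
    import numpy as np
    import pandas as pd

    values = [1, None, 3]

    # Forcing an integer dtype on data that contains None raises an error,
    # because NumPy ints cannot encode a missing value.
    int_cast_failed = False
    try:
        pd.Series(values, dtype=np.int32)
    except (TypeError, ValueError):
        int_cast_failed = True

    # A float dtype accepts the same data, representing None as NaN.
    s = pd.Series(values, dtype=np.float32)

    print(int_cast_failed)    # the integer cast was rejected
    print(s.dtype)            # float32
    print(s.isna().tolist())  # the None became NaN
    ```

    This is why the patch converts nullable int columns to `np.float32` up front and only widens to `np.float64` when the values require it.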