Github user logannc commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18945#discussion_r140412857
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -1761,12 +1761,37 @@ def toPandas(self):
                     raise ImportError("%s\n%s" % (e.message, msg))
             else:
                 dtype = {}
    +            columns_with_null_int = {}
    +            def null_handler(rows, columns_with_null_int):
    +                for row in rows:
    +                    row = row.asDict()
    +                    for column in columns_with_null_int:
    +                        val = row[column]
    +                        dt = dtype[column]
    +                        if val is not None:
    --- End diff --
    
    If `pandas_type in (np.int8, np.int16, np.int32) and field.nullable` and 
there are ANY non-null values, the dtype of the column is changed to 
`np.float32` or `np.float64`, both of which properly handle `None` values.
    
    That said, if the entire column were `None`, the conversion would fail. 
Therefore I have preemptively changed the dtype on line 1787 to `np.float32`; 
per `null_handler`, it may still be widened to `np.float64` if needed.
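
    To make the promotion behavior concrete, here is a minimal pandas sketch 
(my illustration, not code from the PR) of the two cases described above: a 
column with some `None` values is promoted to a float dtype, and an all-`None` 
column can still be cast to `np.float32` explicitly.

    ```python
    import numpy as np
    import pandas as pd

    # Any None in an otherwise-int column becomes NaN, which np.int32 cannot
    # represent, so pandas promotes the column to float64.
    s = pd.Series([1, None, 3])
    assert s.dtype == np.float64

    # Casting such a column back to an integer dtype fails because of the NaN.
    try:
        s.astype(np.int32)
        raised = False
    except (ValueError, TypeError):
        raised = True
    assert raised

    # An all-None column comes back as object dtype, but casting it to
    # np.float32 succeeds (every value becomes NaN) -- which is why
    # preemptively using np.float32 keeps the all-None case from failing.
    all_null = pd.Series([None, None])
    assert all_null.dtype == object
    assert all_null.astype(np.float32).dtype == np.float32
    ```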


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
