Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18732#discussion_r142243923
  
    --- Diff: python/pyspark/sql/types.py ---
    @@ -1624,6 +1624,34 @@ def toArrowType(dt):
         return arrow_type
     
     
    +def from_pandas_type(dt):
    +    """ Convert pandas data type to Spark data type
    +    """
    +    import pandas as pd
    +    import numpy as np
    +    if dt == np.int32:
    +        return IntegerType()
    +    elif dt == np.int64:
    +        return LongType()
    +    elif dt == np.float32:
    +        return FloatType()
    +    elif dt == np.float64:
    +        return DoubleType()
    +    elif dt == np.object:
    +        return StringType()
    --- End diff --
    
    Aren't there other types that are plain `object`s besides strings?  I think 
it would be better to use Arrow to map the pandas dtype to an Arrow type, then 
have `def from_arrow_type(t)` map Arrow to Spark.  That would be easier to 
support, and we have a similar type conversion on the Scala side. 

