Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18732#discussion_r142243923

    --- Diff: python/pyspark/sql/types.py ---
    @@ -1624,6 +1624,34 @@ def toArrowType(dt):
         return arrow_type


    +def from_pandas_type(dt):
    +    """ Convert pandas data type to Spark data type
    +    """
    +    import pandas as pd
    +    import numpy as np
    +    if dt == np.int32:
    +        return IntegerType()
    +    elif dt == np.int64:
    +        return LongType()
    +    elif dt == np.float32:
    +        return FloatType()
    +    elif dt == np.float64:
    +        return DoubleType()
    +    elif dt == np.object:
    +        return StringType()
    --- End diff --

    Aren't there other types that are plain `object`s besides strings? I think it would be better to use Arrow to map the pandas dtype to an Arrow type, and then have a `def from_arrow_type(t)` that maps Arrow types to Spark types. This would be easier to maintain, and we have a similar type conversion in Scala.
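A quick NumPy-only check of the reviewer's first point: many non-string Python values all surface as the same `object` dtype, so mapping `object` straight to `StringType()` would mislabel them. This is a minimal illustration, not code from the PR:

```python
import datetime
import numpy as np

# All three arrays report dtype('O') ("object"), yet none of them
# holds string data -- a dtype == np.object check cannot tell them apart.
dicts = np.array([{"a": 1}, {"b": 2}])                        # dicts
dates = np.array([datetime.date(2017, 1, 1)], dtype=object)   # date objects
mixed = np.array(["x", 3, None], dtype=object)                # mixed values

assert dicts.dtype == np.dtype(object)
assert dates.dtype == np.dtype(object)
assert mixed.dtype == np.dtype(object)
```

This is why routing the conversion through Arrow (which inspects the actual values, not just the dtype) gives a more reliable pandas-to-Spark mapping than matching on the NumPy dtype alone.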