Haejoon Lee created SPARK-47543:
-----------------------------------

             Summary: Inferring `dict` as `MapType` from Pandas DataFrame to allow DataFrame creation.
                 Key: SPARK-47543
                 URL: https://issues.apache.org/jira/browse/SPARK-47543
             Project: Spark
          Issue Type: Bug
          Components: Connect, PySpark
    Affects Versions: 4.0.0
            Reporter: Haejoon Lee


Currently, PyArrow infers a pandas column of Python dicts as a struct (StructType) instead of a map (MapType), so Spark can't handle the schema properly:
{code:python}
>>> import pandas as pd
>>> import pyarrow as pa
>>> pdf = pd.DataFrame({"str_col": ['second'], "dict_col": [{'first': 0.7, 'second': 0.3}]})
>>> pa.Schema.from_pandas(pdf)
str_col: string
dict_col: struct<first: double, second: double>
  child 0, first: double
  child 1, second: double
{code}
We cannot handle this case because Spark relies on PyArrow to infer the schema when creating a DataFrame from a pandas DataFrame.
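The idea behind the fix can be illustrated with a small PyArrow-only sketch. It is not Spark's implementation. The helper {{table_with_maps}} is hypothetical. It keeps PyArrow's inference for ordinary columns, but when PyArrow infers a struct for a dict-valued column, it builds a {{map<string, V>}} column instead. This assumes that every dict key is a string and that PyArrow can infer one common type {{V}} for all dict values in the column.

{code:python}
import pandas as pd
import pyarrow as pa


def table_with_maps(pdf: pd.DataFrame) -> pa.Table:
    """Hypothetical helper: convert dict-valued columns to Arrow map<string, V>.

    Assumes dict keys are strings and that PyArrow can infer a single common
    type for all dict values in a column. Other columns keep PyArrow's inference.
    """
    inferred = pa.Schema.from_pandas(pdf, preserve_index=False)
    fields, arrays = [], []
    for name in inferred.names:
        field = inferred.field(name)
        col = pdf[name]
        if pa.types.is_struct(field.type):
            # PyArrow inferred a struct because the column holds dicts; instead,
            # infer one value type from every value of every non-null dict.
            values = [v for d in col if d is not None for v in d.values()]
            map_type = pa.map_(pa.string(), pa.array(values).type)
            # pa.array builds map values from lists of (key, value) tuples.
            items = [None if d is None else list(d.items()) for d in col]
            field = pa.field(name, map_type)
            arrays.append(pa.array(items, type=map_type))
        else:
            arrays.append(pa.array(col, type=field.type))
        fields.append(field)
    return pa.Table.from_arrays(arrays, schema=pa.schema(fields))
{code}

With the DataFrame from the example above, {{table_with_maps(pdf).schema}} reports {{dict_col}} as {{map<string, double>}} rather than a struct. A map type keeps the keys as data, so rows with different key sets share one schema, which a struct with fixed field names cannot do.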



--
This message was sent by Atlassian Jira
(v8.20.10#820010)