[jira] [Commented] (SPARK-52266) Arrow fails to infer the schema with string and int column when creating a DataFrame

Xinrong Meng (Jira) Thu, 22 May 2025 11:58:24 -0700


    [ https://issues.apache.org/jira/browse/SPARK-52266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17953529#comment-17953529 ]


Xinrong Meng commented on SPARK-52266:
--------------------------------------

The NumPy version might be the cause: the failing environment uses NumPy 2.0.2 with PyArrow 15.0.2 and pandas 2.2.3.
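To check whether another environment has the same combination, a small stdlib-only sketch like the one below could collect the relevant versions. The helper names (`installed_versions`, `major`, `looks_like_numpy2_mismatch`) are hypothetical and not part of Spark. The threshold `min_pyarrow_major=16` is an assumption about which PyArrow line first targets NumPy 2.x, not something confirmed in this ticket, so please verify it against the PyArrow release notes.

```python
from importlib import metadata


def installed_versions(packages=("numpy", "pandas", "pyarrow")):
    """Return {package: version string, or None if the package is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def major(version):
    """Leading integer of a version string such as '2.0.2'; None if unavailable."""
    if version is None:
        return None
    head = version.split(".", 1)[0]
    return int(head) if head.isdigit() else None


def looks_like_numpy2_mismatch(versions, min_pyarrow_major=16):
    # Assumption: min_pyarrow_major=16 is a guess at the first PyArrow line
    # built for NumPy 2.x; adjust it if the release notes say otherwise.
    np_major = major(versions.get("numpy"))
    pa_major = major(versions.get("pyarrow"))
    if np_major is None or pa_major is None:
        return False  # cannot tell without both packages installed
    return np_major >= 2 and pa_major < min_pyarrow_major


if __name__ == "__main__":
    found = installed_versions()
    print(found)
    print("possible NumPy 2 / PyArrow mismatch:", looks_like_numpy2_mismatch(found))
```

A `True` result only flags the combination as worth investigating (for example by retrying with NumPy 1.x or a newer PyArrow); it does not prove that the version pairing causes the `ArrowTypeError`.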

> Arrow fails to infer the schema with string and int column when creating a DataFrame
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-52266
>                 URL: https://issues.apache.org/jira/browse/SPARK-52266
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PS
>    Affects Versions: 4.1.0
>            Reporter: Xinrong Meng
>            Priority: Major
>
> {code:java}
> >>> pdf = pd.DataFrame({"a": ["x"], "b": [0]})
> >>> pdf
>    a  b
> 0  x  0
> >>> psdf = ps.from_pandas(pdf)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/Users/xinrong.meng/spark/python/pyspark/pandas/namespace.py", line 187, in from_pandas
>     return DataFrame(pobj)
>   File "/Users/xinrong.meng/spark/python/pyspark/pandas/frame.py", line 573, in __init__
>     internal = InternalFrame.from_pandas(pdf)
>   File "/Users/xinrong.meng/spark/python/pyspark/pandas/internal.py", line 1480, in from_pandas
>     ) = InternalFrame.prepare_pandas_frame(pdf, prefer_timestamp_ntz=prefer_timestamp_ntz)
>   File "/Users/xinrong.meng/spark/python/pyspark/pandas/internal.py", line 1581, in prepare_pandas_frame
>     spark_type = infer_pd_series_spark_type(reset_index[col], dtype, prefer_timestamp_ntz)
>   File "/Users/xinrong.meng/spark/python/pyspark/pandas/typedef/typehints.py", line 368, in infer_pd_series_spark_type
>     return from_arrow_type(pa.Array.from_pandas(pser).type, prefer_timestamp_ntz)
>   File "pyarrow/array.pxi", line 1115, in pyarrow.lib.Array.from_pandas
>   File "pyarrow/array.pxi", line 339, in pyarrow.lib.array
>   File "pyarrow/array.pxi", line 85, in pyarrow.lib._ndarray_to_array
>   File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
> pyarrow.lib.ArrowTypeError: Input object was not a NumPy array
> >>>  
> {code}
>  
> {code:java}
> >>> pd.__version__
> '2.2.3'
> >>> pa.__version__
> '15.0.2'
> >>> np.__version__
> '2.0.2'
> {code}



