[ 
https://issues.apache.org/jira/browse/SPARK-34544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17292989#comment-17292989
 ] 

Maciej Szymkiewicz commented on SPARK-34544:
--------------------------------------------

Overall:

* {{DataFrameLike}} should be replaced with {{pandas.core.frame.DataFrame}}
once we have a stable source of annotations from pandas-dev. That was always
the intention.
* While I am not enthusiastic about keeping up with Pandas API changes, we
could update the protocol by re-exporting Pandas annotations with stubgen.
* Alternatively, we can try to use third-party annotations and provide a setup
guide for users.
* I'd be against removing {{DataFrameLike}} without having a working
alternative in place. It won't make the end-user experience better.
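
As a rough sketch of the first point, and assuming pandas annotations are visible to the type checker, the return annotation could be imported only during type checking so that pandas stays optional at runtime. The class {{SparkFrameSketch}} and the alias {{PandasDataFrame}} are hypothetical names used for illustration; this is not PySpark code.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated only by type checkers; pandas is not imported at runtime.
    from pandas.core.frame import DataFrame as PandasDataFrame


class SparkFrameSketch:
    """Hypothetical stand-in for a Spark DataFrame; not the real PySpark class."""

    def toPandas(self) -> "PandasDataFrame":
        # The string annotation is never evaluated at runtime, so this module
        # imports cleanly even when pandas is not installed.
        raise NotImplementedError("sketch only: conversion logic omitted")
```

With this shape, the type a user sees depends entirely on which pandas annotations the checker can find, which is why a stable source of them matters.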

> pyspark toPandas() should return pd.DataFrame
> ---------------------------------------------
>
>                 Key: SPARK-34544
>                 URL: https://issues.apache.org/jira/browse/SPARK-34544
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.1.1
>            Reporter: Rafal Wojdyla
>            Assignee: Maciej Szymkiewicz
>            Priority: Major
>
> Right now {{toPandas()}} returns {{DataFrameLike}}, which is an incomplete
> "view" of pandas {{DataFrame}}. This leads to cases where mypy reports that
> certain pandas methods are not present in {{DataFrameLike}}, even though
> those methods are valid on pandas {{DataFrame}}, which is the actual type
> of the object. Working around this requires type ignore comments or asserts.
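
The situation described in the issue can be reproduced in miniature without pandas. In the sketch below, {{DataFrameLike}} is a deliberately cut-down, hypothetical protocol (the real one declares more members), and {{FullFrame}}, {{melt}} and {{to_pandas}} are stand-ins invented for illustration. It shows why a type checker rejects a method that exists at runtime, and the assert and cast workarounds.

```python
from typing import Protocol, cast


class DataFrameLike(Protocol):
    """Hypothetical, cut-down protocol; the real one declares more members."""

    def head(self, n: int = ...) -> "DataFrameLike": ...


class FullFrame:
    """Stand-in for pandas.DataFrame so the sketch runs without pandas."""

    def __init__(self, rows: list) -> None:
        self.rows = rows

    def head(self, n: int = 5) -> "FullFrame":
        return FullFrame(self.rows[:n])

    def melt(self) -> list:
        # Present on the concrete class but not declared by the protocol.
        return [value for row in self.rows for value in row]


def to_pandas() -> DataFrameLike:
    # The runtime object is a FullFrame, but callers only see the protocol.
    return FullFrame([[1, 2], [3, 4]])


frame = to_pandas()
# frame.melt()  # a type checker rejects this: the protocol has no "melt"

# Workaround 1: narrow the type with an assert (checked at runtime).
assert isinstance(frame, FullFrame)
melted = frame.melt()

# Workaround 2: cast (no runtime check; the caller vouches for the type).
melted_again = cast(FullFrame, to_pandas()).melt()
```

Both workarounds put the burden on the caller, which is the inconvenience the issue asks to remove.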



