[ 
https://issues.apache.org/jira/browse/SPARK-34544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17292977#comment-17292977
 ] 

Maciej Szymkiewicz commented on SPARK-34544:
--------------------------------------------

The problem we're dealing with here is that Pandas is not PEP 561 compliant 
(there is a longer discussion about this on the [Pandas issue 
tracker|https://github.com/pandas-dev/pandas/issues/28142]). 

{{DataFrameLike}} is basically a Band-Aid: it allows us to type check things 
without type ignoring the pandas package (which in turn would treat Pandas 
objects as a wildcard / {{Any}}, which caused some nasty problems in the past). 
It was included on a provisional basis until Pandas officially exposes its 
annotations, and it is more a dev utility than a user-facing feature.
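
To illustrate the protocol approach, here is a minimal sketch. It is not the 
actual PySpark definition: the two protocol methods are an arbitrary trimmed 
selection, and {{FakeFrame}} and {{to_pandas_like}} are hypothetical stand-ins 
for {{pandas.DataFrame}} and {{toPandas()}}, so that the example runs without 
pandas installed.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class DataFrameLike(Protocol):
    """Hypothetical, heavily trimmed protocol; not the real PySpark one."""

    def head(self, n: int = 5) -> "DataFrameLike": ...

    def to_dict(self) -> dict: ...


class FakeFrame:
    """Stand-in for pandas.DataFrame (assumption, for illustration only)."""

    def __init__(self, rows: list) -> None:
        self.rows = rows

    def head(self, n: int = 5) -> "FakeFrame":
        return FakeFrame(self.rows[:n])

    def to_dict(self) -> dict:
        return {i: row for i, row in enumerate(self.rows)}

    def describe(self) -> str:
        # Present on the concrete class, but NOT declared in the protocol.
        return f"{len(self.rows)} rows"


def to_pandas_like(rows: list) -> DataFrameLike:
    """Stand-in for toPandas(): annotated with the protocol, not the class."""
    return FakeFrame(rows)


result = to_pandas_like([{"a": 1}, {"a": 2}, {"a": 3}])

# Declared in the protocol, so a type checker accepts this call.
first_two = result.head(2)

# describe() is not part of the protocol: calling it on `result` directly
# works at runtime but would be rejected by a type checker. Narrowing with
# an assert makes the call acceptable to the checker as well.
assert isinstance(result, FakeFrame)
summary = result.describe()
```

Because the protocol is structural, the concrete class never has to import or 
inherit from it. The trade-off is that any method missing from the protocol is 
invisible to the type checker until the caller narrows the type.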

As far as I am aware, removing it doesn't resolve any of the problems described 
here, and it makes maintaining annotations harder. For project development 
needs we could probably use 
[microsoft/python-type-stubs|https://github.com/microsoft/python-type-stubs] 
(it wasn't public when I came up with the protocol approach), but it is another 
(and non-trivial) dependency for the end users and still doesn't address the 
versioning issue.

> pyspark toPandas() should return pd.DataFrame
> ---------------------------------------------
>
>                 Key: SPARK-34544
>                 URL: https://issues.apache.org/jira/browse/SPARK-34544
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.1.1
>            Reporter: Rafal Wojdyla
>            Assignee: Maciej Szymkiewicz
>            Priority: Major
>
> Right now {{toPandas()}} returns {{DataFrameLike}}, which is an incomplete 
> "view" of pandas {{DataFrame}}. This leads to cases like mypy reporting that 
> certain pandas methods are not present in {{DataFrameLike}}, even though those 
> methods are valid methods on pandas {{DataFrame}}, which is the actual type 
> of the object. This requires type ignore comments or asserts.



