[ https://issues.apache.org/jira/browse/SPARK-34544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17292977#comment-17292977 ]
Maciej Szymkiewicz edited comment on SPARK-34544 at 3/1/21, 3:49 PM:
---------------------------------------------------------------------

The problem we're dealing with here is that Pandas is not PEP 561 compliant (there is a longer discussion about this [on the Pandas issue tracker|https://github.com/pandas-dev/pandas/issues/28142]). {{DataFrameLike}} is basically a Band-Aid: it allows us to type check things without type-ignoring the pandas package (which in turn would treat Pandas objects as a wildcard / {{Any}}, which caused some nasty problems in the past). It was included on a provisional basis until Pandas officially exposes its annotations, and it is more a dev utility than a user-facing feature. As far as I am aware, removing it doesn't resolve any of the problems described here, and it makes maintaining annotations harder.

For project development needs we could probably use [microsoft/python-type-stubs|https://github.com/microsoft/python-type-stubs] (it wasn't public when I came up with the protocol approach), but it is another (and non-trivial) dependency for end users and still doesn't address the versioning issue.

was (Author: zero323):

The problem we're dealing with here is, that Pandas is not PEP 561 (there is longer discussion about this issue [on Pandas issue tracker|https://github.com/pandas-dev/pandas/issues/28142]). {{DataFrameLike}} is basically a Band-Aid ‒ it allows us to type check things without type ignoring the pandas package (which in turn would treat Pandas object as wildcard / {{Any}}, which caused some nasty problems in the past). It was included on provisional basis until Pandas officially exposes their annotations and it is more a dev utility than user a facing feature. As far as I am aware removing it doesn't resolve any of the problems described here and it makes maintaining annotations harder.
For project development needs we could probably use [microsoft/python-type-stubs|https://github.com/microsoft/python-type-stubs] (it wasn't public one I came up with protocol approach), but it is another (and non-trivial) dependency for the end users and still doesn't address versioning issue.

> pyspark toPandas() should return pd.DataFrame
> ---------------------------------------------
>
>                 Key: SPARK-34544
>                 URL: https://issues.apache.org/jira/browse/SPARK-34544
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.1.1
>            Reporter: Rafal Wojdyla
>            Assignee: Maciej Szymkiewicz
>            Priority: Major
>
> Right now {{toPandas()}} returns {{DataFrameLike}}, which is an incomplete
> "view" of pandas {{DataFrame}}. This leads to cases where mypy reports that
> certain pandas methods are not present in {{DataFrameLike}}, even though those
> methods are valid methods on pandas {{DataFrame}}, which is the actual type
> of the object. This requires type ignore comments or asserts.
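
For readers unfamiliar with the trade-off discussed above, here is a minimal sketch of how a protocol-typed return value behaves under a type checker, and of the two workarounds the issue description mentions (asserts and casts). Everything in it is an assumption made for illustration: {{FakeDataFrame}} stands in for pandas {{DataFrame}} so the snippet runs without pandas, the {{DataFrameLike}} protocol here is a hypothetical, deliberately incomplete one rather than PySpark's actual definition, and {{to_pandas_sketch}} is not a PySpark function.

```python
from typing import Any, Protocol, cast


class DataFrameLike(Protocol):
    """Hypothetical, intentionally incomplete structural view of a data frame."""

    def head(self, n: int = 5) -> Any: ...


class FakeDataFrame:
    """Stand-in for pandas.DataFrame so the sketch runs without pandas."""

    def __init__(self, rows: list) -> None:
        self.rows = rows

    def head(self, n: int = 5) -> "FakeDataFrame":
        return FakeDataFrame(self.rows[:n])

    def describe(self) -> str:
        # Exists on the concrete class but is not declared on the protocol.
        return f"{len(self.rows)} rows"


def to_pandas_sketch() -> DataFrameLike:
    # Annotated with the protocol; the runtime object is the concrete class.
    return FakeDataFrame([1, 2, 3])


df = to_pandas_sketch()
# Calling df.describe() here works at runtime, but a type checker would
# reject it because describe() is not declared on DataFrameLike.

# Workaround 1: narrow with an assert (also checked at runtime).
assert isinstance(df, FakeDataFrame)
summary_via_assert = df.describe()

# Workaround 2: cast (no runtime check; the caller vouches for the type).
summary_via_cast = cast(FakeDataFrame, to_pandas_sketch()).describe()
```

Both workarounds keep the code type checked, in contrast to a blanket type ignore on the import, which would make every object from the package {{Any}}.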