[ https://issues.apache.org/jira/browse/SPARK-34544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17292987#comment-17292987 ]
Rafal Wojdyla commented on SPARK-34544:
---------------------------------------

👋 [~zero323]

> it is more a dev utility than user a facing feature.

We use mypy to type check our codebase, and we hit this issue as users. For an example, see SPARK-34540 (which is just one case). By the way, I could not find any documentation for the pyspark typing contributions (for example, in which cases new symbols should be added to public protocols, and why the protocols are incomplete). I probably missed it; could you please point me to it?

> As far as I am aware removing it doesn't resolve any of the problems
> described here

Removing {{DataFrameLike}} as the return type of {{toPandas}} would make mypy stop complaining about missing symbols (which are not part of {{DataFrameLike}}, but are in fact valid methods of pandas' {{DataFrame}}). This is obviously suboptimal, since the return type then just becomes {{Any}}. An alternative is to add the missing symbols to {{DataFrameLike}}, as in SPARK-34540. But until the next pyspark release, how would we monkey patch that change in our projects?

So in the end it sounds like we have a bunch of suboptimal ideas. How should we proceed?

> pyspark toPandas() should return pd.DataFrame
> ---------------------------------------------
>
>                 Key: SPARK-34544
>                 URL: https://issues.apache.org/jira/browse/SPARK-34544
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.1.1
>            Reporter: Rafal Wojdyla
>            Assignee: Maciej Szymkiewicz
>            Priority: Major
>
> Right now {{toPandas()}} returns {{DataFrameLike}}, which is an incomplete
> "view" of pandas {{DataFrame}}. This leads to cases like mypy reporting that
> certain pandas methods are not present in {{DataFrameLike}}, even though those
> methods are valid methods on pandas {{DataFrame}}, which is the actual type
> of the object. This requires type ignore comments or asserts.
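
For illustration, the workarounds mentioned in the description (type ignore comments or asserts, plus a cast) might look roughly like the sketch below. It deliberately avoids pyspark and pandas: {{FrameLike}}, {{Frame}} and {{to_frame}} are hypothetical stand-ins for {{DataFrameLike}}, pandas {{DataFrame}} and {{toPandas()}}, not the real definitions from the pyspark stubs.

```python
from typing import Dict, List, Protocol, cast


class FrameLike(Protocol):
    """Hypothetical incomplete protocol: declares only one method."""

    def columns(self) -> List[str]: ...


class Frame:
    """Hypothetical concrete class, standing in for pandas DataFrame."""

    def __init__(self, data: Dict[str, List[int]]) -> None:
        self._data = data

    def columns(self) -> List[str]:
        return list(self._data)

    def row_count(self) -> int:
        # Valid at runtime, but NOT declared on FrameLike.
        return len(next(iter(self._data.values()), []))


def to_frame(data: Dict[str, List[int]]) -> FrameLike:
    # Stand-in for toPandas(): annotated with the incomplete protocol,
    # although the object returned is always the concrete Frame.
    return Frame(data)


result = to_frame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Workaround 1: silence the type checker on the offending line.
n_ignore = result.row_count()  # type: ignore[attr-defined]

# Workaround 2: an assert narrows the static type to the concrete class.
assert isinstance(result, Frame)
n_assert = result.row_count()

# Workaround 3: an explicit cast (no runtime check is performed).
n_cast = cast(Frame, to_frame({"a": [1]})).row_count()
```

All three variants behave identically at runtime; they differ only in how much checking is given up. The ignore comment hides any error on that line, the assert adds a real runtime check, and the cast is purely a static annotation.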
--
This message was sent by Atlassian Jira (v8.3.4#803005)