[ https://issues.apache.org/jira/browse/SPARK-34544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17293013#comment-17293013 ]
Rafal Wojdyla commented on SPARK-34544:
---------------------------------------

[~zero323] I appreciate your prompt answers.

Re:
{quote}
For in-house deployments the easiest way is to actually patch Spark to mark return type as pandas.core.frame.DataFrame and either patch Pandas (https://github.com/pandas-dev/pandas/pull/28831) or put extracted stubs in MYPYPATH.
{quote}
So that would require that we build our own pyspark? That is certainly "doable", but I hope you see that it's not very user-friendly. Are there any other options?

Re:
{quote}
If there are popular methods which didn't get into protocol I'd probably add these as a temporary fix.
{quote}
Some examples we have hit (this list is not complete): {{head}}, {{convert_dtypes}}. Wdyt?

Re:
{quote}
Looking forward to Spark 3.2 we can closely monitor Pandas progress ‒ if they become PEP 561 we simply drop the protocol. Otherwise we can give Microsoft stubs a shot.
{quote}
What would be the timeline for that (roughly)?

> pyspark toPandas() should return pd.DataFrame
> ---------------------------------------------
>
>                 Key: SPARK-34544
>                 URL: https://issues.apache.org/jira/browse/SPARK-34544
>             Project: Spark
>          Issue Type: Bug
>      Components: PySpark
>    Affects Versions: 3.1.1
>            Reporter: Rafal Wojdyla
>            Assignee: Maciej Szymkiewicz
>            Priority: Major
>
> Right now {{toPandas()}} returns {{DataFrameLike}}, which is an incomplete
> "view" of pandas {{DataFrame}}. This leads to cases where mypy reports that
> certain pandas methods are not present in {{DataFrameLike}}, even though those
> methods are valid methods on pandas {{DataFrame}}, which is the actual type
> of the object. This requires type-ignore comments or asserts.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
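As an illustrative aside, the workarounds the description mentions (type-ignore comments or asserts) can be sketched as below. This is a minimal, hypothetical example: {{to_pandas_like()}} stands in for {{DataFrame.toPandas()}} so the sketch runs without a Spark installation; only the pandas calls ({{head}}, {{convert_dtypes}}) are real APIs.

```python
from typing import cast

import pandas as pd


def to_pandas_like() -> object:
    # Hypothetical stand-in for pyspark's toPandas(): its declared return
    # type ("object" here, DataFrameLike in pyspark) is narrower than the
    # actual runtime type, pd.DataFrame.
    return pd.DataFrame({"a": [1, 2, 3]})


# Workaround 1: cast() is a runtime no-op that tells the type checker the
# real type, so methods missing from the declared type (e.g. head,
# convert_dtypes) type-check.
pdf = cast(pd.DataFrame, to_pandas_like())
print(pdf.head(2))

# Workaround 2: an isinstance assert narrows the type for mypy and also
# verifies the assumption at runtime.
obj = to_pandas_like()
assert isinstance(obj, pd.DataFrame)
print(obj.convert_dtypes().dtypes)
```

Both approaches add boilerplate at every call site, which is the ergonomic cost the issue describes.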