[ 
https://issues.apache.org/jira/browse/SPARK-34544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17293013#comment-17293013
 ] 

Rafal Wojdyla commented on SPARK-34544:
---------------------------------------

[~zero323] I appreciate your prompt answers.

 Re:
{quote}
For in-house deployments the easiest way is to actually patch Spark to mark 
return type as pandas.core.frame.DataFrame and either patch Pandas 
(https://github.com/pandas-dev/pandas/pull/28831) or put extracted stubs in 
MYPYPATH.
{quote}

So that would require building our own PySpark? That is certainly "doable",
but I hope you see it's not very user-friendly. Are there any other options?
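For reference, my understanding of the MYPYPATH route you mention (directory name and layout hypothetical ‒ just stubs extracted from that pandas PR dropped into a local folder):

```shell
# Hypothetical layout: pandas .pyi stubs extracted into ./stubs/pandas/
# mypy consults MYPYPATH before installed packages when resolving stubs.
export MYPYPATH="$PWD/stubs"
mypy my_project/
```

If that is roughly what you meant, it still has to be repeated per project/CI environment, which is part of my user-friendliness concern.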

Re:
{quote}
If there are popular methods which didn't get into protocol I'd probably add 
these as a temporary fix.
{quote}

Some examples we have hit (this list is not complete): {{head}}, 
{{convert_dtypes}}. wdyt?
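To make the failure mode concrete, here is a minimal self-contained sketch (all names hypothetical: {{DataFrameLike}} stands in for the protocol in the stubs, {{FullFrame}} for the real pandas {{DataFrame}}):

```python
from typing import Protocol, cast

class DataFrameLike(Protocol):
    # Hypothetical subset mirroring the incomplete protocol in the stubs.
    def head(self, n: int = 5) -> list: ...

class FullFrame:
    # Stand-in for the real runtime class, which has more methods
    # than the protocol declares.
    def __init__(self, rows: list) -> None:
        self.rows = rows

    def head(self, n: int = 5) -> list:
        return self.rows[:n]

    def convert_dtypes(self) -> "FullFrame":
        # Present on the real class but absent from the protocol.
        return FullFrame(self.rows)

def to_pandas() -> DataFrameLike:
    # Annotated with the protocol, as toPandas() is today.
    return FullFrame([1, 2, 3])

df = to_pandas()
df.head(2)                 # OK: declared on the protocol
# df.convert_dtypes()      # mypy: "DataFrameLike" has no attribute "convert_dtypes"
full = cast(FullFrame, df)  # current workaround: cast (or a "type: ignore")
full.convert_dtypes()       # type-checks only after the cast
```

Every method missing from the protocol forces a cast/ignore like the last two lines, which is the noise we are trying to avoid.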

Re:
{quote}
Looking forward to Spark 3.2 we can closely monitor Pandas progress ‒ if they 
become PEP 561 we simply drop the protocol. Otherwise we can give Microsoft 
stubs a shot.
{quote}

What would be the timeline for that (roughly)?

> pyspark toPandas() should return pd.DataFrame
> ---------------------------------------------
>
>                 Key: SPARK-34544
>                 URL: https://issues.apache.org/jira/browse/SPARK-34544
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.1.1
>            Reporter: Rafal Wojdyla
>            Assignee: Maciej Szymkiewicz
>            Priority: Major
>
> Right now {{toPandas()}} returns {{DataFrameLike}}, which is an incomplete 
> "view" of the pandas {{DataFrame}}. This leads to cases where mypy reports 
> that certain pandas methods are not present on {{DataFrameLike}}, even 
> though those methods are valid on {{DataFrame}}, the actual runtime type of 
> the object. This requires type-ignore comments or asserts.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
