[ 
https://issues.apache.org/jira/browse/SPARK-34544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17297752#comment-17297752
 ] 

Maciej Szymkiewicz commented on SPARK-34544:
--------------------------------------------

{quote} so that would require that we build our own pyspark? That is certainly 
"doable", but I hope you see that it's not very user friendly. Are there any 
other options?{quote}

To be honest, I am not sure. Maybe {{cast}} could help, if you can put more 
comprehensive pandas annotations on your path. 



{code:python}
from typing import cast

import pandas as pd

# df is the object returned by toPandas(), annotated as DataFrameLike
df_ = cast(pd.DataFrame, df)
...
df_.head()
{code}

Or, if you don't mind suppressing all errors with {{Any}}:

{code:python}
from typing import cast, Any

# Any lives in typing, not in pandas; casting to it disables checks on df_
df_ = cast(Any, df)
...
df_.head()
{code}
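
For reference, here is a minimal, dependency-free sketch of why both workarounds type-check. The {{DataFrameLike}} protocol, the {{FullFrame}} class and the {{to_pandas}} function below are hypothetical stand-ins (assumptions for illustration, not the actual pyspark or pandas definitions). The sketch also shows the assert-based alternative mentioned in the issue description.

```python
from typing import Any, Protocol, cast


class DataFrameLike(Protocol):
    """Hypothetical stand-in for the narrow type toPandas() is annotated with."""

    def head(self, n: int = 5) -> Any: ...


class FullFrame:
    """Hypothetical stand-in for pandas.DataFrame, with more methods than the protocol."""

    def __init__(self, rows: list) -> None:
        self.rows = rows

    def head(self, n: int = 5) -> "FullFrame":
        return FullFrame(self.rows[:n])

    def describe(self) -> str:
        return f"{len(self.rows)} rows"


def to_pandas() -> DataFrameLike:
    # Annotated with the narrow protocol, but returns the full object at runtime.
    return FullFrame([1, 2, 3])


df = to_pandas()
# df.describe()  # a type checker would reject this: not part of DataFrameLike

# cast() only informs the type checker; at runtime it returns its argument unchanged.
df_cast = cast(FullFrame, df)
summary_from_cast = df_cast.describe()

# An assert narrows the type as well, and also verifies it at runtime.
assert isinstance(df, FullFrame)
summary_from_assert = df.describe()
```

The trade-off is that {{cast}} is never verified, so a wrong cast goes unnoticed until something fails later, while the {{isinstance}} assert costs a runtime check but fails immediately if the assumption is wrong.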


{quote} What would be the timeline for that (roughly)? {quote}

On the Spark side, 
[Sean|http://apache-spark-developers-list.1001551.n3.nabble.com/Apache-Spark-3-2-Expectation-td30826.html]
 suggests July or so. I can definitely check the viability of 3rd-party stubs by then. 

As for pandas progress, I am not really qualified to say. They're taking this 
slowly (which is quite understandable given the size and complexity of the project), 
and marking the package as PEP 561 compatible was previously rejected as premature.



> pyspark toPandas() should return pd.DataFrame
> ---------------------------------------------
>
>                 Key: SPARK-34544
>                 URL: https://issues.apache.org/jira/browse/SPARK-34544
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 3.1.1
>            Reporter: Rafal Wojdyla
>            Assignee: Maciej Szymkiewicz
>            Priority: Major
>
> Right now {{toPandas()}} returns {{DataFrameLike}}, which is an incomplete 
> "view" of pandas {{DataFrame}}. This leads to cases where mypy reports that 
> certain pandas methods are not present in {{DataFrameLike}}, even though those 
> methods are valid on pandas {{DataFrame}}, which is the actual type 
> of the object. Working around this requires type ignore comments or asserts.



