[ 
https://issues.apache.org/jira/browse/SPARK-39199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-39199:
---------------------------------
    Description: 
pandas API on Spark aims to make pandas code work on Spark clusters without any 
changes, so full API coverage has been one of our major goals. Currently, most 
pandas functions are implemented, but some of them have incomplete 
parameter support.

Some common parameters were missing (now resolved):
 * How to handle NAs
 * Filter data types
 * Control result length
 * Reindex result

The remaining missing parameters still need to be implemented (see the doc below).

See the design and the current status at 
[https://docs.google.com/document/d/1H6RXL6oc-v8qLJbwKl6OEqBjRuMZaXcTYmrZb9yNm5I/edit?usp=sharing].

  was:
pandas API on Spark aims to achieve full pandas API coverage. Currently, most 
pandas functions are supported in pandas API on Spark with parameters missing.

There are some common parameters missing:
- how to do with NAs: `skipna`, `dropna`
- filter data types: `numeric_only`, `bool_only`
- filter result length: `keep`
- reindex result: `ignore_index`

They support common use cases and should be prioritized.
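
For reference, the behavior these parameters control can be sketched in plain pandas, which is the semantics pandas API on Spark is meant to match. This is only an illustration: the sample data is made up, and the specific methods shown are examples chosen here, not a list taken from the issue. Whether a given method accepts the parameter in pyspark.pandas depends on the Spark version.

```python
import pandas as pd

s = pd.Series([1.0, None, 3.0, 3.0])

# How to handle NAs: `skipna`, `dropna`
total_skipping_na = s.sum(skipna=True)        # NaN is ignored
total_with_na = s.sum(skipna=False)           # NaN propagates into the result
counts_with_na = s.value_counts(dropna=False) # NaN gets its own bucket

# Filter data types: `numeric_only`, `bool_only`
mixed = pd.DataFrame({"a": [3, 1, 2], "b": ["x", "y", "z"]})
means = mixed.mean(numeric_only=True)         # string column "b" is left out

flags = pd.DataFrame({"a": [3, 1, 2], "c": [True, False, True]})
all_true = flags.all(bool_only=True)          # only the boolean column "c" is checked

# Control result length: `keep`
top_with_ties = s.nlargest(1, keep="all")     # both rows tied at 3.0 are returned

# Reindex result: `ignore_index`
sorted_df = mixed.sort_values("a", ignore_index=True)  # index reset to 0..n-1
```

With pandas API on Spark, the same calls would be written against `pyspark.pandas` objects instead of `pandas` ones, provided the parameter is implemented for that method.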



> Implement pandas API missing parameters
> ---------------------------------------
>
>                 Key: SPARK-39199
>                 URL: https://issues.apache.org/jira/browse/SPARK-39199
>             Project: Spark
>          Issue Type: Umbrella
>          Components: Pandas API on Spark, PySpark
>    Affects Versions: 3.3.0, 3.4.0, 3.3.1
>            Reporter: Xinrong Meng
>            Priority: Major
>
> pandas API on Spark aims to make pandas code work on Spark clusters without 
> any changes, so full API coverage has been one of our major goals. Currently, 
> most pandas functions are implemented, but some of them have 
> incomplete parameter support.
> Some common parameters were missing (now resolved):
>  * How to handle NAs
>  * Filter data types
>  * Control result length
>  * Reindex result
> The remaining missing parameters still need to be implemented (see the doc below).
> See the design and the current status at 
> [https://docs.google.com/document/d/1H6RXL6oc-v8qLJbwKl6OEqBjRuMZaXcTYmrZb9yNm5I/edit?usp=sharing].



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

