[ 
https://issues.apache.org/jira/browse/SPARK-32082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-32082:
---------------------------------
    Priority: Critical  (was: Major)

> Project Zen: Improving Python usability
> ---------------------------------------
>
>                 Key: SPARK-32082
>                 URL: https://issues.apache.org/jira/browse/SPARK-32082
>             Project: Spark
>          Issue Type: Epic
>          Components: PySpark
>    Affects Versions: 3.1.0
>            Reporter: Hyukjin Kwon
>            Assignee: Hyukjin Kwon
>            Priority: Critical
>
> The importance of Python and PySpark has grown radically in the last few 
> years. The number of PySpark downloads reached [more than 1.3 million _every 
> week_|https://pypistats.org/packages/pyspark] when we count them _only_ in 
> PyPI. Nevertheless, PySpark is still not very Pythonic. For example, it 
> exposes many JVM error messages, and the API documentation is poorly written.
> This epic ticket aims to improve the usability of PySpark and make it more 
> Pythonic. More specifically, this JIRA targets the four areas below, each 
> listed with examples:
>  * Being Pythonic
>  ** Pandas UDF enhancements and type hints
>  ** Avoid dynamic function definitions, for example in {{functions.py}}, 
> which IDEs are unable to detect.
>  * Better and easier usability in PySpark
>  ** User-facing error messages and warnings
>  ** Documentation
>  ** User guide
>  ** Better examples and API documentation, modeled on e.g. 
> [Koalas|https://koalas.readthedocs.io/en/latest/] and 
> [pandas|https://pandas.pydata.org/docs/]
>  * Better interoperability with other Python libraries
>  ** Visualization and plotting
>  ** Potentially better interface by leveraging Arrow
>  ** Compatibility with other libraries, such as NumPy universal functions or 
> pandas, possibly by leveraging Koalas
>  * PyPI Installation
>  ** PySpark with Hadoop 3 support on PyPI
>  ** Better error handling



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
