[ https://issues.apache.org/jira/browse/SPARK-32082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon updated SPARK-32082:
---------------------------------
    Priority: Critical  (was: Major)

> Project Zen: Improving Python usability
> ----------------------------------------
>
>                 Key: SPARK-32082
>                 URL: https://issues.apache.org/jira/browse/SPARK-32082
>             Project: Spark
>          Issue Type: Epic
>          Components: PySpark
>    Affects Versions: 3.1.0
>            Reporter: Hyukjin Kwon
>            Assignee: Hyukjin Kwon
>            Priority: Critical
>
> The importance of Python and PySpark has grown radically in the last few
> years. PySpark now reaches [more than 1.3 million downloads _every
> week_|https://pypistats.org/packages/pyspark] counting PyPI alone.
> Nevertheless, PySpark is still not very Pythonic: it exposes raw JVM
> error messages, for example, and its API documentation is poorly written.
> This epic ticket aims to improve PySpark's usability and make it more
> Pythonic. Concretely, it targets the four areas below, each with
> examples:
> * Being Pythonic
> ** Pandas UDF enhancements and type hints (see the first sketch after
> this message)
> ** Avoiding dynamic function definitions, e.g. in {{functions.py}},
> which IDEs cannot detect (see the second sketch after this message)
> * Better and easier usability in PySpark
> ** User-facing error messages and warnings
> ** Documentation
> ** User guide
> ** Better examples and API documentation, e.g.
> [Koalas|https://koalas.readthedocs.io/en/latest/] and
> [pandas|https://pandas.pydata.org/docs/]
> * Better interoperability with other Python libraries
> ** Visualization and plotting
> ** A potentially better interface by leveraging Arrow (see the third
> sketch after this message)
> ** Compatibility with other libraries such as NumPy universal functions
> or pandas, possibly by leveraging Koalas
> * PyPI installation
> ** PySpark with Hadoop 3 support on PyPI
> ** Better error handling
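First sketch: the pandas UDF item above refers to the type-hint-driven UDF style introduced in Spark 3.0. A minimal sketch follows; the column and data are illustrative only, not part of the epic.

{code:python}
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf

spark = SparkSession.builder.getOrCreate()

# Spark 3.0+ infers the UDF variant (here, Series -> Series) from the
# Python type hints instead of an explicit functionType argument.
@pandas_udf("long")
def plus_one(s: pd.Series) -> pd.Series:
    return s + 1

df = spark.range(3)               # a single "id" column of type long
df.select(plus_one("id")).show()
{code}

Compared with the Spark 2.x {{PandasUDFType}} style, the type hints double as documentation that IDEs and type checkers can verify.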
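Second sketch: the {{functions.py}} item refers to functions generated at import time and injected via {{globals()}}, which static analysis cannot see. The snippet below is a simplified illustration of that pattern and its explicit replacement, not the actual PySpark source.

{code:python}
# Dynamic style: names are created at import time, so IDE autocomplete,
# go-to-definition, and type checkers cannot resolve them.
def _create_function(name, doc=""):
    def _(col):
        return col  # placeholder body for illustration
    _.__name__ = name
    _.__doc__ = doc
    return _

for _name in ["sqrt", "abs"]:
    globals()[_name] = _create_function(_name)

# Explicit style the epic favors: a plain definition that all static
# tooling can see directly.
def sqrt(col):
    """Computes the square root of the given column."""
    return col  # placeholder body for illustration
{code}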
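Third sketch: for the Arrow item, a minimal sketch of the existing Arrow-backed conversion path that better interoperability would build on. The configuration key is the Spark 3.0 name; the data is illustrative.

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Use Arrow record batches for Spark <-> pandas conversion (key as of
# Spark 3.0; older releases used spark.sql.execution.arrow.enabled).
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")

df = spark.range(10)
pdf = df.toPandas()               # Spark -> pandas via Arrow
sdf = spark.createDataFrame(pdf)  # pandas -> Spark also uses Arrow
{code}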