This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new 92492df5f18 [SPARK-39295][PYTHON][DOCS] Improve documentation of pandas API support list
92492df5f18 is described below

commit 92492df5f1843ee192580e3955b2410ba012303f
Author: beobest2 <clea...@naver.com>
AuthorDate: Thu Jun 2 15:08:08 2022 +0900

    [SPARK-39295][PYTHON][DOCS] Improve documentation of pandas API support list

    ### What changes were proposed in this pull request?

    The description provided in the supported pandas API list document or the code comment needs improvement.

    Also, there are cases where the link of the function property provided in the document is not connected, so it needs to be corrected.

    ### Why are the changes needed?

    To improve document readability for users and to link to the correct API document.

    ### Does this PR introduce _any_ user-facing change?

    Yes, the "Supported pandas APIs" page has changed as below.

    <img width="1026" alt="Screen Shot 2022-05-30 at 10 51 12 PM" src="https://user-images.githubusercontent.com/7010554/171085952-9ba07017-f0f7-46bc-88d5-f39a84b21f1a.png">

    ### How was this patch tested?

    Manually check the links in the documents & the existing doc build should be passed.

    Closes #36729 from beobest2/SPARK-39295.

    Authored-by: beobest2 <clea...@naver.com>
    Signed-off-by: Hyukjin Kwon <gurwls...@apache.org>
---
 python/pyspark/pandas/supported_api_gen.py | 36 ++++++++++++++----------------
 1 file changed, 17 insertions(+), 19 deletions(-)

diff --git a/python/pyspark/pandas/supported_api_gen.py b/python/pyspark/pandas/supported_api_gen.py
index f4dadcce2e0..392b5408020 100644
--- a/python/pyspark/pandas/supported_api_gen.py
+++ b/python/pyspark/pandas/supported_api_gen.py
@@ -42,32 +42,30 @@ MODULE_GROUP_MATCH = [(pd, ps), (pdw, psw), (pdg, psg)]
 
 RST_HEADER = """
 =====================
-Supported pandas APIs
+Supported pandas API
 =====================
 
 .. currentmodule:: pyspark.pandas
 
 The following table shows the pandas APIs that implemented or non-implemented from pandas API on
-Spark.
+Spark. Some pandas API do not implement full parameters, so the third column shows missing
+parameters for each API.
 
-Some pandas APIs do not implement full parameters, so the third column shows missing parameters for
-each API.
+* 'Y' in the second column means it's implemented including its whole parameter.
+* 'N' means it's not implemented yet.
+* 'P' means it's partially implemented with the missing of some parameters.
 
-'Y' in the second column means it's implemented including its whole parameter.
-'N' means it's not implemented yet.
-'P' means it's partially implemented with the missing of some parameters.
+All API in the list below computes the data with distributed execution except the ones that require
+the local execution by design. For example, `DataFrame.to_numpy() <https://spark.apache.org/docs/
+latest/api/python/reference/pyspark.pandas/api/pyspark.pandas.DataFrame.to_numpy.html>`__
+requires to collect the data to the driver side.
 
 If there is non-implemented pandas API or parameter you want, you can create an `Apache Spark
-JIRA <https://issues.apache.org/jira/projects/SPARK/summary>`__ to request or to contribute by your
-own.
+JIRA <https://issues.apache.org/jira/projects/SPARK/summary>`__ to request or to contribute by
+your own.
 
-The API list is updated based on the `latest pandas official API
-reference <https://pandas.pydata.org/docs/reference/index.html#>`__.
-
-All implemented APIs listed here are distributed except the ones that requires the local
-computation by design. For example, `DataFrame.to_numpy() <https://spark.apache.org
-/docs/latest/api/python/reference/pyspark.pandas/api/pyspark.pandas.DataFrame.
-to_numpy.html>`__ requires to collect the data to the driver side.
+The API list is updated based on the `latest pandas official API reference
+<https://pandas.pydata.org/docs/reference/index.html#>`__.
 """
 
 
@@ -81,7 +79,7 @@ class Implemented(Enum):
 
 class SupportedStatus(NamedTuple):
     """
-    Defines a supported status for a specific pandas API
+    Defines a supported status for specific pandas API
     """
 
     implemented: str
@@ -91,6 +89,7 @@ class SupportedStatus(NamedTuple):
 def generate_supported_api(output_rst_file_path: str) -> None:
     """
     Generate supported APIs status dictionary.
+
     Parameters
     ----------
     output_rst_file_path : str
@@ -300,12 +299,11 @@ def _write_table(
     Write table by using Sphinx list-table directive.
     """
     lines = []
-    lines.append("Supported ")
     if module_name:
         lines.append(module_name)
     else:
         lines.append("General Function")
-    lines.append(" APIs\n")
+    lines.append(" API\n")
     lines.append("-" * 100)
     lines.append("\n")
     lines.append(".. currentmodule:: %s" % module_path)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
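
The 'Y' / 'N' / 'P' statuses described in the RST header above can be illustrated with a small standalone sketch. This is not the code in supported_api_gen.py. It assumes a status can be derived by comparing the parameter names of a reference function with those of its counterpart, and every name in it (classify_support and the stand-in functions) is hypothetical.

```python
import inspect
from typing import Callable, Optional, Tuple


def classify_support(
    reference: Callable, candidate: Optional[Callable]
) -> Tuple[str, Tuple[str, ...]]:
    """Return a status letter and the parameter names the candidate lacks.

    Hypothetical helper, not part of pyspark:
    'Y' = every reference parameter is present,
    'P' = the function exists but some parameters are missing,
    'N' = there is no candidate function at all.
    """
    if candidate is None:
        return "N", ()
    ref_params = inspect.signature(reference).parameters
    cand_params = inspect.signature(candidate).parameters
    # Keep the reference's parameter order so the result is stable.
    missing = tuple(name for name in ref_params if name not in cand_params)
    return ("P", missing) if missing else ("Y", ())


# Hypothetical stand-ins for a reference API and its counterpart.
def reference_head(self, n=5): ...
def candidate_head(self, n=5): ...
def reference_sort(self, by, ascending=True, kind="quicksort"): ...
def candidate_sort(self, by, ascending=True): ...


print(classify_support(reference_head, candidate_head))  # ('Y', ())
print(classify_support(reference_sort, candidate_sort))  # ('P', ('kind',))
print(classify_support(reference_head, None))            # ('N', ())
```

The second element of the result plays the role of the table's third column, which lists the missing parameters for a partially implemented API.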