[spark] branch master updated: [SPARK-39295][PYTHON][DOCS] Improve documentation of pandas API support list

gurwls223 Wed, 01 Jun 2022 23:08:26 -0700

This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/master by this push:
     new 92492df5f18 [SPARK-39295][PYTHON][DOCS] Improve documentation of pandas API support list
92492df5f18 is described below

commit 92492df5f1843ee192580e3955b2410ba012303f
Author: beobest2 <clea...@naver.com>
AuthorDate: Thu Jun 2 15:08:08 2022 +0900

    [SPARK-39295][PYTHON][DOCS] Improve documentation of pandas API support list
    
    ### What changes were proposed in this pull request?
    
    The descriptions in the supported pandas API list document and in the related code comments need improvement.
    Also, some of the function links in the document are broken and need to be corrected.
    
    ### Why are the changes needed?
    
    To improve the readability of the document for users and to link to the correct API documentation.
    
    ### Does this PR introduce _any_ user-facing change?
    
    Yes, the "Supported pandas APIs" page has changed as below.
    <img width="1026" alt="Screen Shot 2022-05-30 at 10 51 12 PM" src="https://user-images.githubusercontent.com/7010554/171085952-9ba07017-f0f7-46bc-88d5-f39a84b21f1a.png">
    
    ### How was this patch tested?
    
    Manually checked the links in the document; the existing doc build should also pass.
    
    Closes #36729 from beobest2/SPARK-39295.
    
    Authored-by: beobest2 <clea...@naver.com>
    Signed-off-by: Hyukjin Kwon <gurwls...@apache.org>
---
 python/pyspark/pandas/supported_api_gen.py | 36 ++++++++++++++----------------
 1 file changed, 17 insertions(+), 19 deletions(-)

diff --git a/python/pyspark/pandas/supported_api_gen.py b/python/pyspark/pandas/supported_api_gen.py
index f4dadcce2e0..392b5408020 100644
--- a/python/pyspark/pandas/supported_api_gen.py
+++ b/python/pyspark/pandas/supported_api_gen.py
@@ -42,32 +42,30 @@ MODULE_GROUP_MATCH = [(pd, ps), (pdw, psw), (pdg, psg)]
 
 RST_HEADER = """
 =====================
-Supported pandas APIs
+Supported pandas API
 =====================
 
 .. currentmodule:: pyspark.pandas
 
 The following table shows the pandas APIs that implemented or non-implemented from pandas API on
-Spark.
+Spark. Some pandas API do not implement full parameters, so the third column shows missing
+parameters for each API.
 
-Some pandas APIs do not implement full parameters, so the third column shows missing parameters for
-each API.
+* 'Y' in the second column means it's implemented including its whole parameter.
+* 'N' means it's not implemented yet.
+* 'P' means it's partially implemented with the missing of some parameters.
 
-'Y' in the second column means it's implemented including its whole parameter.
-'N' means it's not implemented yet.
-'P' means it's partially implemented with the missing of some parameters.
+All API in the list below computes the data with distributed execution except the ones that require
+the local execution by design. For example, `DataFrame.to_numpy() <https://spark.apache.org/docs/
+latest/api/python/reference/pyspark.pandas/api/pyspark.pandas.DataFrame.to_numpy.html>`__
+requires to collect the data to the driver side.
 
 If there is non-implemented pandas API or parameter you want, you can create an `Apache Spark
-JIRA <https://issues.apache.org/jira/projects/SPARK/summary>`__ to request or to contribute by your
-own.
+JIRA <https://issues.apache.org/jira/projects/SPARK/summary>`__ to request or to contribute by
+your own.
 
-The API list is updated based on the `latest pandas official API
-reference <https://pandas.pydata.org/docs/reference/index.html#>`__.
-
-All implemented APIs listed here are distributed except the ones that requires the local
-computation by design. For example, `DataFrame.to_numpy() <https://spark.apache.org
-/docs/latest/api/python/reference/pyspark.pandas/api/pyspark.pandas.DataFrame.
-to_numpy.html>`__ requires to collect the data to the driver side.
+The API list is updated based on the `latest pandas official API reference
+<https://pandas.pydata.org/docs/reference/index.html#>`__.
 
 """
 
@@ -81,7 +79,7 @@ class Implemented(Enum):
 
 class SupportedStatus(NamedTuple):
     """
-    Defines a supported status for a specific pandas API
+    Defines a supported status for specific pandas API
     """
 
     implemented: str
@@ -91,6 +89,7 @@ class SupportedStatus(NamedTuple):
 def generate_supported_api(output_rst_file_path: str) -> None:
     """
     Generate supported APIs status dictionary.
+
     Parameters
     ----------
     output_rst_file_path : str
@@ -300,12 +299,11 @@ def _write_table(
     Write table by using Sphinx list-table directive.
     """
     lines = []
-    lines.append("Supported ")
     if module_name:
         lines.append(module_name)
     else:
         lines.append("General Function")
-    lines.append(" APIs\n")
+    lines.append(" API\n")
     lines.append("-" * 100)
     lines.append("\n")
     lines.append(".. currentmodule:: %s" % module_path)
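
The 'Y' / 'N' / 'P' convention described in the RST header above can be illustrated with a small sketch. This is not the logic used in `supported_api_gen.py`, which this diff does not show; the function `classify_support` and the stand-in functions below are hypothetical names introduced only for illustration. The sketch assumes that "partially implemented" means the candidate function lacks some parameter names present in the reference signature.

```python
import inspect
from typing import Callable, List, Optional, Tuple


def classify_support(
    reference_func: Callable, candidate_func: Optional[Callable]
) -> Tuple[str, List[str]]:
    """Hypothetical helper: return a status letter and the missing parameter names.

    'Y' = every reference parameter is present in the candidate,
    'N' = no candidate implementation exists,
    'P' = the candidate exists but lacks some reference parameters.
    """
    if candidate_func is None:
        return "N", []
    reference_params = inspect.signature(reference_func).parameters
    candidate_params = inspect.signature(candidate_func).parameters
    # Keep the reference order so the "missing parameters" column is stable.
    missing = [name for name in reference_params if name not in candidate_params]
    return ("P", missing) if missing else ("Y", [])


# Stand-in functions (assumptions, not real pandas or pyspark.pandas APIs).
def reference_head(self, n=5):
    pass


def full_head(self, n=5):
    pass


def reference_sort(self, by, ascending=True, kind="quicksort"):
    pass


def partial_sort(self, by, ascending=True):
    pass


if __name__ == "__main__":
    print(classify_support(reference_head, full_head))
    print(classify_support(reference_sort, partial_sort))
    print(classify_support(reference_head, None))
```

Comparing parameter names only is a deliberate simplification: it ignores defaults, annotations, and behavioral differences, which a real support table may also need to account for.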


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
