Hyukjin Kwon created SPARK-40005:
------------------------------------

             Summary: Self-contained examples with parameter descriptions in PySpark documentation
                 Key: SPARK-40005
                 URL: https://issues.apache.org/jira/browse/SPARK-40005
             Project: Spark
          Issue Type: Umbrella
          Components: Documentation, PySpark
    Affects Versions: 3.4.0
            Reporter: Hyukjin Kwon


This JIRA aims to improve PySpark documentation in:
- {{pyspark}}
- {{pyspark.ml}}
- {{pyspark.sql}}
- {{pyspark.sql.streaming}}

We should:
- Make the examples self-contained, e.g., 
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html
- Document {{Parameters}}, e.g., 
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html#pandas.DataFrame.pivot
 Many PySpark APIs are missing parameter descriptions, e.g., 
[DataFrame.union|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.union.html#pyspark.sql.DataFrame.union]
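
As a rough illustration of the two goals above, the sketch below shows a numpydoc-style docstring with a Parameters section and an example that builds its own input. The function {{union_rows}} is a hypothetical pure-Python stand-in (not a PySpark API), chosen so the docstring example can be run with the standard-library doctest module without a Spark session; the exact section layout PySpark adopts may differ.

```python
import doctest


def union_rows(left, other):
    """Return the rows of ``left`` followed by the rows of ``other``.

    Hypothetical stand-in used only to illustrate the docstring layout;
    it is not part of PySpark and does not model Spark semantics.

    Parameters
    ----------
    left : list of tuple
        Rows of the first dataset.
    other : list of tuple
        Rows to append. Duplicate rows are kept.

    Returns
    -------
    list of tuple
        A new list; neither input is modified.

    Examples
    --------
    The example creates its own input, so it runs without prior setup.

    >>> left = [(1, "a"), (2, "b")]
    >>> other = [(2, "b"), (3, "c")]
    >>> union_rows(left, other)
    [(1, 'a'), (2, 'b'), (2, 'b'), (3, 'c')]
    """
    return list(left) + list(other)


if __name__ == "__main__":
    # Execute the docstring example as a test when run as a script.
    doctest.testmod()
```

Because the example defines {{left}} and {{other}} itself, a reader can paste it into a fresh interpreter, and the documented output is checked whenever the doctests run.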

If a file is large, e.g., dataframe.py, we should split the work on it into 
several subtasks and improve the documentation in each.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

