[jira] [Updated] (SPARK-39456) Fix broken function links in the auto-generated pandas API support list documentation.

2022-06-13 Thread Hyunwoo Park (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyunwoo Park updated SPARK-39456:
-
Summary: Fix broken function links in the auto-generated pandas API support 
list documentation.  (was: Fix broken function links in the auto-generated 
pandas API support list documetation.)

> Fix broken function links in the auto-generated pandas API support list 
> documentation.
> --
>
> Key: SPARK-39456
> URL: https://issues.apache.org/jira/browse/SPARK-39456
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Hyunwoo Park
>Priority: Major
>
> In the auto-generated pandas API support list documentation, some function
> links are broken and need to be fixed.
> The current 'supported API generation' function dynamically compares the
> modules of `pyspark.pandas` and `pandas` to find the differences. In doing
> so, it also collects methods that a class inherits from its parent class, and
> the links for those methods are not generated correctly because they do not
> match the pattern of the corresponding API reference pages. For example,
> `CategoricalIndex.all()` is implemented internally by inheriting `Index.all()`.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39456) Fix broken function links in the auto-generated pandas API support list documetation.

2022-06-13 Thread Hyunwoo Park (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyunwoo Park updated SPARK-39456:
-
Summary: Fix broken function links in the auto-generated pandas API support 
list documetation.  (was: Fix broken function links in the auto-generated 
documetation on pandas API support list.)

> Fix broken function links in the auto-generated pandas API support list 
> documetation.
> -
>
> Key: SPARK-39456
> URL: https://issues.apache.org/jira/browse/SPARK-39456
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Hyunwoo Park
>Priority: Major
>
> In the auto-generated pandas API support list documentation, some function
> links are broken and need to be fixed.
> The current 'supported API generation' function dynamically compares the
> modules of `pyspark.pandas` and `pandas` to find the differences. In doing
> so, it also collects methods that a class inherits from its parent class, and
> the links for those methods are not generated correctly because they do not
> match the pattern of the corresponding API reference pages. For example,
> `CategoricalIndex.all()` is implemented internally by inheriting `Index.all()`.






[jira] [Commented] (SPARK-39456) Fix broken function links in the auto-generated documetation on pandas API support list.

2022-06-13 Thread Hyunwoo Park (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553638#comment-17553638
 ] 

Hyunwoo Park commented on SPARK-39456:
--

Related: [https://github.com/apache/spark/pull/36729#issuecomment-1141632078]

> Fix broken function links in the auto-generated documetation on pandas API 
> support list.
> 
>
> Key: SPARK-39456
> URL: https://issues.apache.org/jira/browse/SPARK-39456
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Hyunwoo Park
>Priority: Major
>
> In the auto-generated pandas API support list documentation, some function
> links are broken and need to be fixed.
> The current 'supported API generation' function dynamically compares the
> modules of `pyspark.pandas` and `pandas` to find the differences. In doing
> so, it also collects methods that a class inherits from its parent class, and
> the links for those methods are not generated correctly because they do not
> match the pattern of the corresponding API reference pages. For example,
> `CategoricalIndex.all()` is implemented internally by inheriting `Index.all()`.






[jira] [Created] (SPARK-39456) Fix broken function links in the auto-generated documetation on pandas API support list.

2022-06-13 Thread Hyunwoo Park (Jira)
Hyunwoo Park created SPARK-39456:


 Summary: Fix broken function links in the auto-generated 
documetation on pandas API support list.
 Key: SPARK-39456
 URL: https://issues.apache.org/jira/browse/SPARK-39456
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 3.4.0
Reporter: Hyunwoo Park


In the auto-generated pandas API support list documentation, some function
links are broken and need to be fixed.

The current 'supported API generation' function dynamically compares the
modules of `pyspark.pandas` and `pandas` to find the differences. In doing so,
it also collects methods that a class inherits from its parent class, and the
links for those methods are not generated correctly because they do not match
the pattern of the corresponding API reference pages. For example,
`CategoricalIndex.all()` is implemented internally by inheriting `Index.all()`.
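The inherited-method problem described above can be illustrated with a small, self-contained sketch. It assumes that the correct link target for an inherited method is the class that actually defines it (here `Index.all` rather than `CategoricalIndex.all`). `Index` and `CategoricalIndex` below are hypothetical stand-ins, not the real pyspark.pandas classes, and `defining_class` / `link_targets` are illustrative helpers, not functions from the Spark code base; this is not necessarily how SPARK-39456 was fixed.

```python
from inspect import getmembers, isfunction


class Index:
    """Hypothetical stand-in for pyspark.pandas.Index."""

    def all(self):
        return True


class CategoricalIndex(Index):
    """Hypothetical stand-in: defines its own method, inherits all() from Index."""

    def add_categories(self):
        return self


def defining_class(cls: type, method_name: str) -> type:
    """Return the first class in cls's MRO whose own namespace defines method_name."""
    for klass in cls.__mro__:
        if method_name in vars(klass):
            return klass
    raise AttributeError(f"{cls.__name__} has no attribute {method_name!r}")


def link_targets(cls: type) -> dict:
    """Map each public method name of cls to 'OwnerClass.method'.

    getmembers() also returns inherited methods, so each name is resolved to
    the class that defines it instead of the class currently being listed.
    """
    return {
        name: f"{defining_class(cls, name).__name__}.{name}"
        for name, _ in getmembers(cls, isfunction)
        if not name.startswith("_")
    }


print(link_targets(CategoricalIndex))
# {'add_categories': 'CategoricalIndex.add_categories', 'all': 'Index.all'}
```

An alternative design would be to skip inherited methods entirely when listing a subclass; resolving them to the defining class keeps them visible while pointing at a page that exists.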






[jira] [Commented] (SPARK-39295) Improve documentation of pandas API support list.

2022-05-25 Thread Hyunwoo Park (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17542285#comment-17542285
 ] 

Hyunwoo Park commented on SPARK-39295:
--

I am working on it.

> Improve documentation of pandas API support list.
> -
>
> Key: SPARK-39295
> URL: https://issues.apache.org/jira/browse/SPARK-39295
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Hyunwoo Park
>Priority: Major
>
> The descriptions in the supported pandas API list document and in the
> related code comments need improvement. In addition, some function links in
> the document are broken and need to be fixed.






[jira] [Commented] (SPARK-39295) Improve documentation of pandas API support list.

2022-05-25 Thread Hyunwoo Park (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17542284#comment-17542284
 ] 

Hyunwoo Park commented on SPARK-39295:
--

Related: https://github.com/apache/spark/pull/36509#discussion_r881581978

> Improve documentation of pandas API support list.
> -
>
> Key: SPARK-39295
> URL: https://issues.apache.org/jira/browse/SPARK-39295
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Hyunwoo Park
>Priority: Major
>
> The descriptions in the supported pandas API list document and in the
> related code comments need improvement. In addition, some function links in
> the document are broken and need to be fixed.






[jira] [Created] (SPARK-39295) Improve documentation of pandas API support list.

2022-05-25 Thread Hyunwoo Park (Jira)
Hyunwoo Park created SPARK-39295:


 Summary: Improve documentation of pandas API support list.
 Key: SPARK-39295
 URL: https://issues.apache.org/jira/browse/SPARK-39295
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.4.0
Reporter: Hyunwoo Park


The descriptions in the supported pandas API list document and in the related
code comments need improvement. In addition, some function links in the
document are broken and need to be fixed.






[jira] [Created] (SPARK-39170) ImportError when creating pyspark.pandas document "Supported APIs" if pandas version is low.

2022-05-12 Thread Hyunwoo Park (Jira)
Hyunwoo Park created SPARK-39170:


 Summary: ImportError when creating pyspark.pandas document 
"Supported APIs" if pandas version is low.
 Key: SPARK-39170
 URL: https://issues.apache.org/jira/browse/SPARK-39170
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.4.0
Reporter: Hyunwoo Park


The pyspark.pandas "Supported APIs" documentation will be auto-generated
([SPARK-38961|https://issues.apache.org/jira/browse/SPARK-38961]).

For this, we need to check the installed pandas version, because generating
the page with an older pandas raises an ImportError. The check can be applied
once the Docker image used in GitHub Actions is upgraded and republished at
https://github.com/dongjoon-hyun/ApacheSparkGitHubActionImage.
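A minimal sketch of such a pandas version check, which fails early with an explicit message instead of an opaque ImportError. The minimum version `1.4.0` is a hypothetical placeholder (the issue does not state the required version), and `parse_version` / `require_pandas_version` are illustrative helper names, not existing Spark functions.

```python
import re
from typing import Tuple

# Hypothetical threshold; the issue does not say which pandas version is needed.
MINIMUM_PANDAS_VERSION = "1.4.0"


def parse_version(version: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from strings such as '1.4.2' or '2.0.0rc1'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def require_pandas_version(installed: str, minimum: str = MINIMUM_PANDAS_VERSION) -> None:
    """Raise ImportError with a clear message if the installed pandas is too old."""
    if parse_version(installed) < parse_version(minimum):
        raise ImportError(
            f"Generating the 'Supported APIs' page requires pandas >= {minimum}, "
            f"but pandas {installed} is installed."
        )


# Usage in the documentation build (assumes pandas is installed):
#   import pandas as pd
#   require_pandas_version(pd.__version__)
```

Comparing integer tuples avoids the pitfall of comparing version strings lexicographically, where "1.10.0" would sort before "1.4.0".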

Related: https://github.com/apache/spark/pull/36509






[jira] [Comment Edited] (SPARK-38961) Enhance to automatically generate the pandas API support list

2022-05-09 Thread Hyunwoo Park (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17534012#comment-17534012
 ] 

Hyunwoo Park edited comment on SPARK-38961 at 5/9/22 9:01 PM:
--

[~hyukjin.kwon] Sure! I'm working on this. I'll check Sphinx auto-generation :)

[~yikunkero] Thank you. I will utilize it.


was (Author: beobest2):
[~hyukjin.kwon] Sure! I'm working on this :)

> Enhance to automatically generate the pandas API support list
> -
>
> Key: SPARK-38961
> URL: https://issues.apache.org/jira/browse/SPARK-38961
> Project: Spark
>  Issue Type: Test
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Currently, the supported pandas API list is maintained manually, so it would
> be better to generate the list automatically to reduce the maintenance cost.






[jira] [Commented] (SPARK-38961) Enhance to automatically generate the pandas API support list

2022-05-09 Thread Hyunwoo Park (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17534012#comment-17534012
 ] 

Hyunwoo Park commented on SPARK-38961:
--

[~hyukjin.kwon] Sure! I'm working on this :)

> Enhance to automatically generate the pandas API support list
> -
>
> Key: SPARK-38961
> URL: https://issues.apache.org/jira/browse/SPARK-38961
> Project: Spark
>  Issue Type: Test
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Currently, the supported pandas API list is maintained manually, so it would
> be better to generate the list automatically to reduce the maintenance cost.






[jira] [Commented] (SPARK-38961) Enhance to automatically generate the pandas API support list

2022-05-09 Thread Hyunwoo Park (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17533699#comment-17533699
 ] 

Hyunwoo Park commented on SPARK-38961:
--

How about this approach?

{code:python}
from inspect import getmembers, isclass, isfunction

import pandas as pd
from pyspark import pandas as ps

# List the methods of every class exposed by pyspark.pandas; this is the raw
# material for an automatically generated API list.
ps_classes = [name for name, _ in getmembers(ps, isclass)]
for ps_class in ps_classes:
    for method, _ in getmembers(getattr(ps, ps_class), isfunction):
        print(f"{ps_class}.{method}")

# It is also possible to build the list of missing APIs automatically:
# first take the classes exposed by both pandas and pyspark.pandas ...
common_classes = {name for name, _ in getmembers(pd, isclass)} & {
    name for name, _ in getmembers(ps, isclass)
}
print(common_classes)
# e.g. {'Series', 'DataFrame', 'MultiIndex', 'DatetimeIndex', 'NamedAgg', 'Index',
#       'Int64Index', 'TimedeltaIndex', 'CategoricalIndex', 'Float64Index'}
# (the exact set depends on the installed pandas and PySpark versions)

# ... then, for each common class, report the pandas methods that
# pyspark.pandas does not implement.
for _class in sorted(common_classes):
    pd_methods = {name for name, _ in getmembers(getattr(pd, _class), isfunction)}
    ps_methods = {name for name, _ in getmembers(getattr(ps, _class), isfunction)}
    not_implemented = pd_methods - ps_methods

    print(f"class: {_class}")
    print(f"not_implemented: {not_implemented}")
{code}
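Building on the comparison in the snippet above, the differences can be turned into rows of a support table. This is a sketch under assumptions: `PandasSeries` and `PandasOnSparkSeries` are hypothetical stand-ins so that it runs without pandas or PySpark, `support_rows` is an illustrative helper, and the two-column layout (API name, implemented Y/N) is an assumption rather than the format of the generated Spark documentation.

```python
from inspect import getmembers, isfunction
from typing import List, Tuple


class PandasSeries:
    """Hypothetical stand-in for pandas.Series."""

    def explode(self):
        return self

    def head(self):
        return self

    def tail(self):
        return self


class PandasOnSparkSeries:
    """Hypothetical stand-in for pyspark.pandas.Series (no explode())."""

    def head(self):
        return self

    def tail(self):
        return self


def support_rows(class_name: str, pd_class: type, ps_class: type) -> List[Tuple[str, str]]:
    """Return (API name, 'Y' or 'N') for every public method of pd_class."""
    ps_methods = {name for name, _ in getmembers(ps_class, isfunction)}
    return [
        (f"{class_name}.{name}", "Y" if name in ps_methods else "N")
        for name, _ in getmembers(pd_class, isfunction)
        if not name.startswith("_")  # skip private and dunder methods
    ]


for api, implemented in support_rows("Series", PandasSeries, PandasOnSparkSeries):
    print(f"{api:<20} {implemented}")
```

Because `getmembers()` returns members sorted by name, the rows come out in a stable alphabetical order, which keeps the generated page diff-friendly between builds.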

> Enhance to automatically generate the pandas API support list
> -
>
> Key: SPARK-38961
> URL: https://issues.apache.org/jira/browse/SPARK-38961
> Project: Spark
>  Issue Type: Test
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Currently, the supported pandas API list is maintained manually, so it would
> be better to generate the list automatically to reduce the maintenance cost.


