[jira] [Updated] (SPARK-39456) Fix broken function links in the auto-generated pandas API support list documentation.
[ https://issues.apache.org/jira/browse/SPARK-39456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyunwoo Park updated SPARK-39456: - Summary: Fix broken function links in the auto-generated pandas API support list documentation. (was: Fix broken function links in the auto-generated pandas API support list documetation.) > Fix broken function links in the auto-generated pandas API support list documentation. > Key: SPARK-39456 > URL: https://issues.apache.org/jira/browse/SPARK-39456 > Project: Spark > Issue Type: Improvement > Components: PySpark > Affects Versions: 3.4.0 > Reporter: Hyunwoo Park > Priority: Major > In the auto-generated pandas API support list documentation, some function links are broken and need to be fixed. The current "supported API generation" function dynamically compares the `pyspark.pandas` and `pandas` modules to find the differences between them. In doing so, it also collects methods that a class inherits from its parent class. The links for those methods are generated incorrectly because they do not match the pattern used for each API document (for example, `CategoricalIndex.all()` is actually `Index.all()`, inherited from `Index`). -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39456) Fix broken function links in the auto-generated pandas API support list documetation.
[ https://issues.apache.org/jira/browse/SPARK-39456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyunwoo Park updated SPARK-39456: - Summary: Fix broken function links in the auto-generated pandas API support list documetation. (was: Fix broken function links in the auto-generated documetation on pandas API support list.)
[jira] [Commented] (SPARK-39456) Fix broken function links in the auto-generated documetation on pandas API support list.
[ https://issues.apache.org/jira/browse/SPARK-39456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553638#comment-17553638 ] Hyunwoo Park commented on SPARK-39456: -- Related: [https://github.com/apache/spark/pull/36729#issuecomment-1141632078]
[jira] [Created] (SPARK-39456) Fix broken function links in the auto-generated documetation on pandas API support list.
Hyunwoo Park created SPARK-39456: Summary: Fix broken function links in the auto-generated documetation on pandas API support list. Key: SPARK-39456 URL: https://issues.apache.org/jira/browse/SPARK-39456 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 3.4.0 Reporter: Hyunwoo Park In the auto-generated pandas API support list documentation, some function links are broken and need to be fixed. The current "supported API generation" function dynamically compares the `pyspark.pandas` and `pandas` modules to find the differences between them. In doing so, it also collects methods that a class inherits from its parent class. The links for those methods are generated incorrectly because they do not match the pattern used for each API document (for example, `CategoricalIndex.all()` is actually `Index.all()`, inherited from `Index`).
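The inherited-method problem described in this issue can be reproduced without Spark. The following is a minimal sketch that uses hypothetical stand-in classes `Index` and `CategoricalIndex`, which are not the real pyspark.pandas classes, together with hypothetical helpers `defining_class` and `link_targets`. It shows that `inspect.getmembers` reports `all` on `CategoricalIndex` even though `all` is defined only on `Index`, and that walking the MRO recovers the class that actually defines it. The issue does not say whether the real fix links such entries to the parent class's page, so treat that mapping as an assumption.

```python
from inspect import getmembers, isfunction


class Index:
    """Hypothetical stand-in for pyspark.pandas.Index."""

    def all(self):
        return True


class CategoricalIndex(Index):
    """Hypothetical stand-in for pyspark.pandas.CategoricalIndex."""

    def add_categories(self, new_categories):
        return self


def defining_class(cls, name):
    """Return the first class in cls's MRO whose own namespace defines `name`."""
    for klass in cls.__mro__:
        if name in vars(klass):
            return klass
    raise AttributeError(name)


def link_targets(cls):
    """Map each public function visible on cls to the 'Class.method' it is defined on."""
    targets = {}
    # getmembers also returns inherited functions, which is the source of the bad links.
    for name, _ in getmembers(cls, isfunction):
        if name.startswith("_"):
            continue  # skip private and dunder methods
        owner = defining_class(cls, name)
        targets[f"{cls.__name__}.{name}"] = f"{owner.__name__}.{name}"
    return targets


print(link_targets(CategoricalIndex))
# {'CategoricalIndex.add_categories': 'CategoricalIndex.add_categories', 'CategoricalIndex.all': 'Index.all'}
```

Walking `__mro__` and checking `vars(klass)` finds the class that defines a name, not just a class where the name is visible. This is the distinction the link generator needs.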
[jira] [Commented] (SPARK-39295) Improve documentation of pandas API support list.
[ https://issues.apache.org/jira/browse/SPARK-39295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17542285#comment-17542285 ] Hyunwoo Park commented on SPARK-39295: -- I am working on it. > Improve documentation of pandas API support list. > Key: SPARK-39295 > URL: https://issues.apache.org/jira/browse/SPARK-39295 > Project: Spark > Issue Type: Sub-task > Components: PySpark > Affects Versions: 3.4.0 > Reporter: Hyunwoo Park > Priority: Major > The descriptions in the supported pandas API list document and in the related code comments need improvement. In addition, some function links in the document are broken and need to be fixed.
[jira] [Commented] (SPARK-39295) Improve documentation of pandas API support list.
[ https://issues.apache.org/jira/browse/SPARK-39295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17542284#comment-17542284 ] Hyunwoo Park commented on SPARK-39295: -- Related: https://github.com/apache/spark/pull/36509#discussion_r881581978
[jira] [Created] (SPARK-39295) Improve documentation of pandas API support list.
Hyunwoo Park created SPARK-39295: Summary: Improve documentation of pandas API support list. Key: SPARK-39295 URL: https://issues.apache.org/jira/browse/SPARK-39295 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.4.0 Reporter: Hyunwoo Park The descriptions in the supported pandas API list document and in the related code comments need improvement. In addition, some function links in the document are broken and need to be fixed.
[jira] [Created] (SPARK-39170) ImportError when creating pyspark.pandas document "Supported APIs" if pandas version is low.
Hyunwoo Park created SPARK-39170: Summary: ImportError when creating pyspark.pandas document "Supported APIs" if pandas version is low. Key: SPARK-39170 URL: https://issues.apache.org/jira/browse/SPARK-39170 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.4.0 Reporter: Hyunwoo Park The "Supported APIs" page of the pyspark.pandas documentation will be auto-generated ([SPARK-38961|https://issues.apache.org/jira/browse/SPARK-38961]). When it is generated, we need to check the installed pandas version, because an older pandas version causes an ImportError. This check can be applied after the Docker image used in GitHub Actions is upgraded and republished at https://github.com/dongjoon-hyun/ApacheSparkGitHubActionImage. Related: https://github.com/apache/spark/pull/36509
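This issue asks for a pandas version check before the "Supported APIs" page is generated. The following is a hedged sketch using only the standard library. `MIN_PANDAS_VERSION`, `parse_version`, and `check_pandas_version` are hypothetical names. The minimum version is a placeholder, because the issue does not say which pandas release the generation actually requires.

```python
import re

# Hypothetical placeholder: the issue does not state the required pandas version.
MIN_PANDAS_VERSION = (1, 4, 0)


def parse_version(version):
    """Turn '1.3.5', '1.4' or '1.5.0rc1' into a 3-tuple of ints, e.g. (1, 5, 0)."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component
        parts.append(int(match.group()))
    parts += [0] * (3 - len(parts))  # pad so that '1.4' compares equal to (1, 4, 0)
    return tuple(parts)


def check_pandas_version(version, minimum=MIN_PANDAS_VERSION):
    """Fail early with a clear message instead of a later, obscure ImportError."""
    if parse_version(version) < minimum:
        raise ImportError(
            "Generating the 'Supported APIs' page requires pandas >= "
            f"{'.'.join(map(str, minimum))}, but pandas {version} is installed."
        )


# Usage, assuming pandas is importable:
# import pandas as pd
# check_pandas_version(pd.__version__)
```

Raising early is one option. A doc build could instead skip generation and log a warning, and the issue does not specify which behavior is intended.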
[jira] [Comment Edited] (SPARK-38961) Enhance to automatically generate the pandas API support list
[ https://issues.apache.org/jira/browse/SPARK-38961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17534012#comment-17534012 ] Hyunwoo Park edited comment on SPARK-38961 at 5/9/22 9:01 PM: -- [~hyukjin.kwon] Sure! I'm working on this. I'll check Sphinx auto-generation :) [~yikunkero] Thank you. I will utilize it. was (Author: beobest2): [~hyukjin.kwon] Sure! I'm working on this :) > Enhance to automatically generate the pandas API support list > Key: SPARK-38961 > URL: https://issues.apache.org/jira/browse/SPARK-38961 > Project: Spark > Issue Type: Test > Components: PySpark > Affects Versions: 3.4.0 > Reporter: Haejoon Lee > Priority: Major > Currently, the supported pandas API list is maintained manually, so it would be better to generate the list automatically to reduce the maintenance cost.
[jira] [Commented] (SPARK-38961) Enhance to automatically generate the pandas API support list
[ https://issues.apache.org/jira/browse/SPARK-38961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17534012#comment-17534012 ] Hyunwoo Park commented on SPARK-38961: -- [~hyukjin.kwon] Sure! I'm working on this :)
[jira] [Commented] (SPARK-38961) Enhance to automatically generate the pandas API support list
[ https://issues.apache.org/jira/browse/SPARK-38961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17533699#comment-17533699 ] Hyunwoo Park commented on SPARK-38961: -- How about this way?
{code:python}
from inspect import getmembers, isclass, isfunction

import pandas as pd
from pyspark import pandas as ps

# Automatically generate the list of pyspark.pandas APIs:
# every function defined on every class exposed by pyspark.pandas.
ps_classes = [name for name, _ in getmembers(ps, isclass)]
for ps_class in ps_classes:
    for method, _ in getmembers(getattr(ps, ps_class), isfunction):
        print(f"{ps_class}.{method}")

# It is also possible to create the list of missing APIs automatically.
# First, find the classes that exist in both pandas and pyspark.pandas.
common_classes = {name for name, _ in getmembers(pd, isclass)} & {
    name for name, _ in getmembers(ps, isclass)
}
print(common_classes)
# {'Series', 'DataFrame', 'MultiIndex', 'DatetimeIndex', 'NamedAgg', 'Index', 'Int64Index', 'TimedeltaIndex', 'CategoricalIndex', 'Float64Index'}

# Then, for each common class, list the pandas functions that pyspark.pandas lacks.
for _class in common_classes:
    pd_methods = {name for name, _ in getmembers(getattr(pd, _class), isfunction)}
    ps_methods = {name for name, _ in getmembers(getattr(ps, _class), isfunction)}
    not_implemented = pd_methods - ps_methods
    print(f"class: {_class}")
    print(f"not_implemented: {not_implemented}")
{code}
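The `isfunction` filter in the snippet above has one caveat. It matches only plain functions, so properties such as pandas' `DataFrame.shape` are not listed. The following is a minimal sketch of collecting both methods and properties. It uses a hypothetical `ToyFrame` class and a hypothetical `public_api` helper, not the real pandas or pyspark.pandas classes.

```python
from inspect import getmembers, isfunction


class ToyFrame:
    """Hypothetical stand-in for a DataFrame-like class."""

    @property
    def shape(self):
        return (0, 0)

    def head(self, n=5):
        return self


def public_api(cls):
    """Return the public method and property names visible on cls."""
    methods = {name for name, _ in getmembers(cls, isfunction)}
    # Looking up a property on the class (not an instance) returns the property object.
    properties = {
        name for name, _ in getmembers(cls, lambda member: isinstance(member, property))
    }
    return {name for name in methods | properties if not name.startswith("_")}


# isfunction alone sees only the method, not the property.
print([name for name, _ in getmembers(ToyFrame, isfunction)])  # ['head']
print(sorted(public_api(ToyFrame)))  # ['head', 'shape']
```

The same property check could replace `isfunction` in both the "supported" and "not implemented" comparisons above, so that both lists account for properties consistently.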