[jira] [Commented] (SPARK-38961) Enhance to automatically generate the pandas API support list

2022-09-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601358#comment-17601358
 ] 

Apache Spark commented on SPARK-38961:
--

User 'Yikun' has created a pull request for this issue:
https://github.com/apache/spark/pull/37820

> Enhance to automatically generate the pandas API support list
> -
>
> Key: SPARK-38961
> URL: https://issues.apache.org/jira/browse/SPARK-38961
> Project: Spark
>  Issue Type: Documentation
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Hyunwoo Park
>Priority: Major
> Fix For: 3.4.0
>
>
> Currently, the supported pandas API list is manually maintained, so it would 
> be better to make the list automatically generated to reduce the maintenance 
> cost.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38961) Enhance to automatically generate the pandas API support list

2022-09-07 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601359#comment-17601359
 ] 

Apache Spark commented on SPARK-38961:
--

User 'Yikun' has created a pull request for this issue:
https://github.com/apache/spark/pull/37820

> Enhance to automatically generate the pandas API support list
> -
>
> Key: SPARK-38961
> URL: https://issues.apache.org/jira/browse/SPARK-38961
> Project: Spark
>  Issue Type: Documentation
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Hyunwoo Park
>Priority: Major
> Fix For: 3.4.0
>
>
> Currently, the supported pandas API list is manually maintained, so it would 
> be better to make the list automatically generated to reduce the maintenance 
> cost.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38961) Enhance to automatically generate the pandas API support list

2022-08-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581871#comment-17581871
 ] 

Apache Spark commented on SPARK-38961:
--

User 'Yikun' has created a pull request for this issue:
https://github.com/apache/spark/pull/37583

> Enhance to automatically generate the pandas API support list
> -
>
> Key: SPARK-38961
> URL: https://issues.apache.org/jira/browse/SPARK-38961
> Project: Spark
>  Issue Type: Documentation
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Hyunwoo Park
>Priority: Major
> Fix For: 3.4.0
>
>
> Currently, the supported pandas API list is manually maintained, so it would 
> be better to make the list automatically generated to reduce the maintenance 
> cost.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38961) Enhance to automatically generate the pandas API support list

2022-08-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581869#comment-17581869
 ] 

Apache Spark commented on SPARK-38961:
--

User 'Yikun' has created a pull request for this issue:
https://github.com/apache/spark/pull/37583

> Enhance to automatically generate the pandas API support list
> -
>
> Key: SPARK-38961
> URL: https://issues.apache.org/jira/browse/SPARK-38961
> Project: Spark
>  Issue Type: Documentation
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Hyunwoo Park
>Priority: Major
> Fix For: 3.4.0
>
>
> Currently, the supported pandas API list is manually maintained, so it would 
> be better to make the list automatically generated to reduce the maintenance 
> cost.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38961) Enhance to automatically generate the pandas API support list

2022-06-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17545296#comment-17545296
 ] 

Apache Spark commented on SPARK-38961:
--

User 'beobest2' has created a pull request for this issue:
https://github.com/apache/spark/pull/36748

> Enhance to automatically generate the pandas API support list
> -
>
> Key: SPARK-38961
> URL: https://issues.apache.org/jira/browse/SPARK-38961
> Project: Spark
>  Issue Type: Documentation
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Hyunwoo Park
>Priority: Major
> Fix For: 3.4.0
>
>
> Currently, the supported pandas API list is manually maintained, so it would 
> be better to make the list automatically generated to reduce the maintenance 
> cost.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38961) Enhance to automatically generate the pandas API support list

2022-06-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17545295#comment-17545295
 ] 

Apache Spark commented on SPARK-38961:
--

User 'beobest2' has created a pull request for this issue:
https://github.com/apache/spark/pull/36748

> Enhance to automatically generate the pandas API support list
> -
>
> Key: SPARK-38961
> URL: https://issues.apache.org/jira/browse/SPARK-38961
> Project: Spark
>  Issue Type: Documentation
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Hyunwoo Park
>Priority: Major
> Fix For: 3.4.0
>
>
> Currently, the supported pandas API list is manually maintained, so it would 
> be better to make the list automatically generated to reduce the maintenance 
> cost.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38961) Enhance to automatically generate the pandas API support list

2022-05-11 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17534795#comment-17534795
 ] 

Apache Spark commented on SPARK-38961:
--

User 'beobest2' has created a pull request for this issue:
https://github.com/apache/spark/pull/36509

> Enhance to automatically generate the pandas API support list
> -
>
> Key: SPARK-38961
> URL: https://issues.apache.org/jira/browse/SPARK-38961
> Project: Spark
>  Issue Type: Test
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Currently, the supported pandas API list is manually maintained, so it would 
> be better to make the list automatically generated to reduce the maintenance 
> cost.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38961) Enhance to automatically generate the pandas API support list

2022-05-09 Thread Hyunwoo Park (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17534012#comment-17534012
 ] 

Hyunwoo Park commented on SPARK-38961:
--

[~hyukjin.kwon] Sure! I'm working on this :)

> Enhance to automatically generate the pandas API support list
> -
>
> Key: SPARK-38961
> URL: https://issues.apache.org/jira/browse/SPARK-38961
> Project: Spark
>  Issue Type: Test
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Currently, the supported pandas API list is manually maintained, so it would 
> be better to make the list automatically generated to reduce the maintenance 
> cost.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38961) Enhance to automatically generate the pandas API support list

2022-05-09 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17533733#comment-17533733
 ] 

Hyukjin Kwon commented on SPARK-38961:
--

[~beobest2] please go ahead with a PR! Maybe we can just put the codes in a 
comment at the .rst file.

> Enhance to automatically generate the pandas API support list
> -
>
> Key: SPARK-38961
> URL: https://issues.apache.org/jira/browse/SPARK-38961
> Project: Spark
>  Issue Type: Test
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Currently, the supported pandas API list is manually maintained, so it would 
> be better to make the list automatically generated to reduce the maintenance 
> cost.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38961) Enhance to automatically generate the pandas API support list

2022-05-09 Thread Yikun Jiang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17533705#comment-17533705
 ] 

Yikun Jiang commented on SPARK-38961:
-

See also here: 
https://github.com/apache/spark/pull/36083#issuecomment-1101966359

> Enhance to automatically generate the pandas API support list
> -
>
> Key: SPARK-38961
> URL: https://issues.apache.org/jira/browse/SPARK-38961
> Project: Spark
>  Issue Type: Test
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Currently, the supported pandas API list is manually maintained, so it would 
> be better to make the list automatically generated to reduce the maintenance 
> cost.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38961) Enhance to automatically generate the pandas API support list

2022-05-09 Thread Hyunwoo Park (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17533699#comment-17533699
 ] 

Hyunwoo Park commented on SPARK-38961:
--

How about this way?




{code:python}

from inspect import getmembers, isclass, isfunction
import pandas as pd
from pyspark import pandas as ps

# automatically generated pyspark.pandas APIs
ps_classes = tuple(map(lambda x: x[0], getmembers(ps, isclass)))
for ps_class in ps_classes:
    for method, _ in getmembers(getattr(ps, ps_class), isfunction):
        print(f"{ps_class}.{method}")

# also it is possible to automatically create a missing list
common_classes = set(map(lambda x: x[0], getmembers(pd, isclass))) & \
                 set(map(lambda x: x[0], getmembers(ps, isclass)))
print(common_classes)
# {'Series', 'DataFrame', 'MultiIndex', 'DatetimeIndex', 'NamedAgg', 'Index', 
'Int64Index', 'TimedeltaIndex', 'CategoricalIndex', 'Float64Index'}

for _class in common_classes:
    not_implemented = set(
        map(lambda x: x[0], getmembers(getattr(pd, _class), isfunction))
    ) - set(
        map(lambda x: x[0], getmembers(getattr(ps, _class), isfunction))
    )

    print(f"class: {_class}")
    print(f"not_implemented: {not_implemented}")

{code}

> Enhance to automatically generate the pandas API support list
> -
>
> Key: SPARK-38961
> URL: https://issues.apache.org/jira/browse/SPARK-38961
> Project: Spark
>  Issue Type: Test
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Currently, the supported pandas API list is manually maintained, so it would 
> be better to make the list automatically generated to reduce the maintenance 
> cost.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org