[ https://issues.apache.org/jira/browse/SPARK-33952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17257973#comment-17257973 ]
Hyukjin Kwon commented on SPARK-33952:
--------------------------------------

How/where is the output string used?

> Python-friendly dtypes for pyspark dataframes
> ---------------------------------------------
>
>                 Key: SPARK-33952
>                 URL: https://issues.apache.org/jira/browse/SPARK-33952
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 3.2.0
>            Reporter: Marc de Lignie
>            Priority: Minor
>
> The pyspark.sql.DataFrame.dtypes attribute contains string representations of
> the column datatypes in terms of JVM datatypes. For a Python user, however, it
> is a significant mental step to translate these to the corresponding Python
> types encountered in UDFs and collected dataframes. This holds in particular
> for nested composite datatypes (array, map and struct). It is proposed to
> provide Python-friendly dtypes in pyspark (as an addition, not a replacement)
> in which array<>, map<> and struct<> are translated to [], {} and Row().
> Sample code, including tests, is available as a [gist on
> github|https://gist.github.com/vtslab/81ded1a7af006100e00bf2a4a70a8147]. More
> explanation is provided at:
> [https://yaaics.blogspot.com/2020/12/python-friendly-dtypes-for-pyspark.html]
> If this proposal finds sufficient support, I can provide a PR.
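
For illustration, a minimal sketch of the kind of translation the proposal describes. This is not the code from the gist above; the helper name python_friendly and the exact output strings are assumptions made only to show the idea of mapping array<>, map<> and struct<> onto [], {} and Row():

{code:python}
# Sketch only: recursively render a Spark SQL DataType as a Python-flavoured
# dtype string. Leaf types keep their usual simpleString() names.
from pyspark.sql.types import (
    ArrayType, DataType, MapType, StructType, StructField, LongType, StringType
)

def python_friendly(dt: DataType) -> str:
    """Return a Python-friendly string for a Spark DataType (illustrative)."""
    if isinstance(dt, ArrayType):
        return f"[{python_friendly(dt.elementType)}]"
    if isinstance(dt, MapType):
        return f"{{{python_friendly(dt.keyType)}: {python_friendly(dt.valueType)}}}"
    if isinstance(dt, StructType):
        fields = ", ".join(
            f"{f.name}: {python_friendly(f.dataType)}" for f in dt.fields
        )
        return f"Row({fields})"
    return dt.simpleString()

# Example: compare the JVM-style dtype strings with the proposed form.
schema = StructType([
    StructField("ids", ArrayType(LongType())),
    StructField("tags", MapType(StringType(), StringType())),
])
for field in schema.fields:
    print(field.dataType.simpleString(), "->", python_friendly(field.dataType))
# array<bigint> -> [bigint]
# map<string,string> -> {string: string}
{code}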