[ https://issues.apache.org/jira/browse/SPARK-35638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon reassigned SPARK-35638:
------------------------------------

    Assignee: Takuya Ueshin

> Introduce InternalField to manage dtypes and StructFields.
> ----------------------------------------------------------
>
>                 Key: SPARK-35638
>                 URL: https://issues.apache.org/jira/browse/SPARK-35638
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 3.2.0
>            Reporter: Takuya Ueshin
>            Assignee: Takuya Ueshin
>            Priority: Major
>
> Currently there are some performance issues in the pandas-on-Spark layer.
> One of them is accessing the Java DataFrame and running the analysis phase
> too many times, especially just to retrieve the current column names or
> data types.
> We should reduce the amount of unnecessary access.
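The idea in the description (keep column names and data types on the Python side so that reading them does not trigger another analysis pass) can be illustrated with a minimal sketch. This is not the Spark implementation: `FieldSketch`, `ExpensiveFrame`, and `CachedFrame` are hypothetical names invented for illustration, and the "analysis" is simulated with a counter rather than a real JVM call.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class FieldSketch:
    """Hypothetical holder pairing a pandas-style dtype with Spark-side field info."""
    name: str
    dtype: str           # pandas-style dtype name, e.g. "int64"
    spark_type: str      # Spark-side type name, e.g. "bigint"
    nullable: bool = True


class ExpensiveFrame:
    """Hypothetical stand-in for a JVM-backed DataFrame.

    Every call to schema() is counted, simulating a costly analysis pass.
    """

    def __init__(self, fields: List[FieldSketch]) -> None:
        self._fields = list(fields)
        self.analysis_count = 0

    def schema(self) -> List[FieldSketch]:
        self.analysis_count += 1
        return list(self._fields)


class CachedFrame:
    """Hypothetical wrapper that answers metadata queries from cached fields."""

    def __init__(self, frame: ExpensiveFrame,
                 fields: Optional[List[FieldSketch]] = None) -> None:
        self._frame = frame
        self._fields = fields  # may be supplied up front to avoid any lookup

    @property
    def fields(self) -> List[FieldSketch]:
        # Ask the underlying frame at most once, then reuse the result.
        if self._fields is None:
            self._fields = self._frame.schema()
        return self._fields

    @property
    def columns(self) -> List[str]:
        return [f.name for f in self.fields]

    @property
    def dtypes(self) -> Dict[str, str]:
        return {f.name: f.dtype for f in self.fields}
```

In this sketch, repeated reads of `columns` or `dtypes` on a `CachedFrame` reuse the stored fields, so the simulated analysis counter on the underlying frame advances at most once.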