[ https://issues.apache.org/jira/browse/SPARK-41276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean R. Owen reassigned SPARK-41276:
------------------------------------

    Assignee: Yang Jie

> Optimize constructor use of `StructType`
> ----------------------------------------
>
>                 Key: SPARK-41276
>                 URL: https://issues.apache.org/jira/browse/SPARK-41276
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib, SQL
>    Affects Versions: 3.4.0
>            Reporter: Yang Jie
>            Assignee: Yang Jie
>            Priority: Minor
>
> There are two main ways to construct a `StructType`:
> - The primary constructor, which takes an `Array`:
> ```scala
> case class StructType(fields: Array[StructField])
> ```
> - The companion `apply` method, which takes a `Seq`:
> ```scala
> def apply(fields: Seq[StructField]): StructType = StructType(fields.toArray)
> ```
> Both are widely used in Spark, but the latter requires an additional
> collection conversion.
> This PR changes the following three scenarios to use the primary
> constructor, saving one collection conversion in each:
> 1. Where the input `Seq` is created manually, create an `Array` manually
> instead, for example:
> https://github.com/apache/spark/blob/bcf03fe3f86a7230fd977c059b73a58554370d5d/mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala#L55-L63
> 2. Where `toSeq` was added to build the input for compatibility with
> Scala 2.13, call `toArray` directly instead, for example:
> https://github.com/apache/spark/blob/bcf03fe3f86a7230fd977c059b73a58554370d5d/connector/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala#L108-L113
> 3. Where the input is already an `Array`, remove the redundant `toSeq`,
> for example:
> https://github.com/apache/spark/blob/bcf03fe3f86a7230fd977c059b73a58554370d5d/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala#L587-L592
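
The three scenarios in the issue description can be sketched roughly as follows. This is a minimal illustration, not the code changed by the PR: it assumes `spark-sql` is on the classpath, and the object name, field names, and values (`StructTypeConstruction`, `id`, `name`, `a`, `b`) are hypothetical. Only `StructType`, `StructField`, and the two data types come from Spark.

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Hypothetical example object; none of these names appear in the Spark PR.
object StructTypeConstruction {
  val idField: StructField = StructField("id", IntegerType, nullable = false)
  val nameField: StructField = StructField("name", StringType)

  // Before: the Seq-based apply copies the Seq into an Array internally.
  val viaSeq: StructType = StructType(Seq(idField, nameField))

  // Scenario 1: build the Array by hand and hit the primary constructor.
  val viaArray: StructType = StructType(Array(idField, nameField))

  // Scenario 2: a mapped collection is converted once with toArray,
  // instead of toSeq followed by the implicit toArray inside apply.
  val columnNames: List[String] = List("a", "b")
  val viaToArray: StructType =
    StructType(columnNames.map(n => StructField(n, StringType)).toArray)

  // Scenario 3: the input is already an Array, so no toSeq is needed.
  val nullableOnly: StructType = StructType(viaArray.fields.filter(_.nullable))

  def main(args: Array[String]): Unit = {
    // Both construction paths should yield equal schemas.
    assert(viaSeq == viaArray)
    println(viaToArray.fieldNames.mkString(","))
    println(nullableOnly.fieldNames.mkString(","))
  }
}
```

The resulting schemas are the same either way; the difference is only that the `Array` path skips the intermediate `Seq` and its copy.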