[ https://issues.apache.org/jira/browse/SPARK-36733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-36733:
------------------------------------

    Assignee: Apache Spark

> Perf issue in SchemaPruning when a struct has many fields
> ---------------------------------------------------------
>
>                 Key: SPARK-36733
>                 URL: https://issues.apache.org/jira/browse/SPARK-36733
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.2
>            Reporter: Kohki Nishio
>            Assignee: Apache Spark
>            Priority: Major
>
> Query processing slows down significantly when a table contains a very
> large number of fields (>10K).
> Here is the stack trace captured while processing a query:
> {code:java}
> java.lang.Thread.State: RUNNABLE
> java.lang.Thread.State: RUNNABLE
>   at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:285)
>   at scala.collection.TraversableLike$$Lambda$296/874023329.apply(Unknown Source)
>   at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
>   at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:285)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:278)
>   at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
>   at org.apache.spark.sql.types.StructType.fieldNames(StructType.scala:108)
>   at org.apache.spark.sql.catalyst.expressions.SchemaPruning$.$anonfun$sortLeftFieldsByRight$1(SchemaPruning.scala:70)
>   at org.apache.spark.sql.catalyst.expressions.SchemaPruning$.$anonfun$sortLeftFieldsByRight$1$adapted(SchemaPruning.scala:70)
>   at org.apache.spark.sql.catalyst.expressions.SchemaPruning$$$Lambda$3963/249742655.apply(Unknown Source)
>   at scala.collection.TraversableLike.$anonfun$filterImpl$1(TraversableLike.scala:303)
>   at scala.collection.TraversableLike$$Lambda$403/465534593.apply(Unknown Source)
>   at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
>   at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
>   at scala.collection.TraversableLike.filterImpl(TraversableLike.scala:302)
>   at scala.collection.TraversableLike.filterImpl$(TraversableLike.scala:296)
>   at scala.collection.mutable.ArrayOps$ofRef.filterImpl(ArrayOps.scala:198)
>   at scala.collection.TraversableLike.filter(TraversableLike.scala:394)
>   at scala.collection.TraversableLike.filter$(TraversableLike.scala:394)
>   at scala.collection.mutable.ArrayOps$ofRef.filter(ArrayOps.scala:198)
>   at org.apache.spark.sql.catalyst.expressions.SchemaPruning$.sortLeftFieldsByRight(SchemaPruning.scala:70)
>   at org.apache.spark.sql.catalyst.expressions.SchemaPruning$.$anonfun$sortLeftFieldsByRight$3(SchemaPruning.scala:75)
>   at org.apache.spark.sql.catalyst.expressions.SchemaPruning$$$Lambda$3965/461314749.apply(Unknown Source)
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
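
In the trace above, StructType.fieldNames (which itself runs a map over the fields) is reached from inside the filter lambda at SchemaPruning.scala:70. One plausible reading is that the name array is rebuilt and scanned once per filtered element, which would make the work grow quadratically with the number of fields. The sketch below only illustrates that general pattern and a hoisted alternative. It is not Spark's code and not necessarily the fix adopted for this issue: the `Struct` class, `filterNaive`, `filterHoisted`, and `names` are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FieldNameLookupSketch {
    // Hypothetical stand-in for a struct schema; this is NOT Spark's StructType.
    static final class Struct {
        final String[] fields;
        long fieldNamesCalls = 0; // counts how often the name array is rebuilt

        Struct(String[] fields) {
            this.fields = fields;
        }

        // Rebuilds the array on every call, mimicking a fieldNames that maps over all fields.
        String[] fieldNames() {
            fieldNamesCalls++;
            return Arrays.copyOf(fields, fields.length);
        }
    }

    // Quadratic pattern: fieldNames() is re-evaluated and linearly scanned per candidate.
    static List<String> filterNaive(Struct left, Struct right) {
        List<String> kept = new ArrayList<>();
        for (String name : right.fields) {
            if (Arrays.asList(left.fieldNames()).contains(name)) {
                kept.add(name);
            }
        }
        return kept;
    }

    // Hoisted pattern: build the name set once, then do constant-time lookups.
    static List<String> filterHoisted(Struct left, Struct right) {
        Set<String> leftNames = new HashSet<>(Arrays.asList(left.fieldNames()));
        List<String> kept = new ArrayList<>();
        for (String name : right.fields) {
            if (leftNames.contains(name)) {
                kept.add(name);
            }
        }
        return kept;
    }

    // Generates n field names: f<start>, f<start+1>, ...
    static String[] names(int n, int start) {
        String[] out = new String[n];
        for (int i = 0; i < n; i++) {
            out[i] = "f" + (start + i);
        }
        return out;
    }

    public static void main(String[] args) {
        Struct right = new Struct(names(10_000, 5_000));

        Struct leftA = new Struct(names(10_000, 0));
        List<String> naive = filterNaive(leftA, right);

        Struct leftB = new Struct(names(10_000, 0));
        List<String> hoisted = filterHoisted(leftB, right);

        System.out.println("same result: " + naive.equals(hoisted));
        System.out.println("fieldNames() calls, naive:   " + leftA.fieldNamesCalls);
        System.out.println("fieldNames() calls, hoisted: " + leftB.fieldNamesCalls);
    }
}
```

Both methods return the same fields in the same order; the difference is only in how often the name array is materialized and how each membership check is performed.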