Github user bdrillard commented on the issue: https://github.com/apache/spark/pull/19811

As some context, I had initially found array initializations necessary because the number of `init` methods created to do line-by-line variable initializations for large test cases was still triggering constant pool errors, even after the data had been compacted into arrays. A loop reduced the number of expressions needed to initialize that array state, but to ensure that a single loop could initialize whole groups of variables, it became necessary to add extra state holding the matching init code and the array length.

I think @mgaido91's work in SPARK-22226 obviates that original issue with the way it redistributes the init method calls.

As a further benefit, removing the requirement that state be initialized in loops would also let us compact more complicated state than could previously be handled, like the [`UnsafeRowWriter`](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala#L78), which can appear as many times as struct columns appear in the dataset. Since their initialization depends on varying arguments, no single loop could initialize all of them, but inline statements could, allowing us to potentially compact them (or any other prevalent object type that is not simply assigned).
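To make the distinction concrete, here is a minimal hand-written sketch (not actual Spark codegen output; class and field names are illustrative only) contrasting the two initialization styles. Uniform state can be filled by one loop, while state constructed with per-slot arguments, as `UnsafeRowWriter` is, would instead be compacted into an array and assigned with one inline statement per slot:

```java
import java.util.ArrayList;

// Illustrative sketch of the two initialization styles discussed above.
public class CompactionSketch {
    static final int N = 4;
    Object[] uniformState = new Object[N];
    Object[] varyingState = new Object[N];

    // Identical no-arg constructions: a single loop covers every slot,
    // so the init method costs a constant number of expressions.
    void initUniform() {
        for (int i = 0; i < N; i++) {
            uniformState[i] = new ArrayList<Object>();
        }
    }

    // Constructions whose arguments differ per slot (as a writer's field
    // count differs per struct column) cannot share one loop; the
    // generated code would emit one inline assignment per slot instead.
    void initVarying() {
        varyingState[0] = new StringBuilder(8);   // e.g. 8-field struct
        varyingState[1] = new StringBuilder(16);  // e.g. 16-field struct
        varyingState[2] = new StringBuilder(2);
        varyingState[3] = new StringBuilder(32);
    }

    public static void main(String[] args) {
        CompactionSketch s = new CompactionSketch();
        s.initUniform();
        s.initVarying();
        System.out.println(s.uniformState[0].getClass().getSimpleName());
        System.out.println(((StringBuilder) s.varyingState[1]).capacity());
    }
}
```

Either way the state lives in one array slot rather than one named field per instance, which is what keeps the constant pool small; only the shape of the init code differs.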