GitHub user viirya commented on the issue:

    https://github.com/apache/spark/pull/19229

    Yeah, I think that fix should work for the strategy `Imputer.mean`, because `Imputer.mean` now aggregates many columns at once, and the generated code for that aggregation can become too large. The strategy `Imputer.median` uses `approxQuantile`, which calls the RDD `aggregate` API, so I don't think codegen affects that part.