GitHub user squito commented on the issue:

https://github.com/apache/spark/pull/15505

@witgo @kayousterhout where do we stand on this and https://github.com/apache/spark/pull/16053? Are both still viable alternatives?

https://github.com/apache/spark/pull/16053 is still missing performance benchmarks, and since the entire purpose here is performance, I think we need to wait for those metrics. But https://github.com/apache/spark/pull/16053 is a much smaller change. I actually think this version is a little clearer overall, because all of the serialization happens in one place ... but I'm also biased toward the smaller change if there isn't much difference between them.

I also feel like we're missing a clear description of the overall flow of serialization. It's rather complicated: the task binary broadcast, the task, the task description, where it all happens, etc. (That goes for both versions. Really it's an existing problem; this just seems like the right time to address it.)