Github user squito commented on the issue:

    https://github.com/apache/spark/pull/15505
  
    @witgo @kayousterhout where do we stand on this and 
https://github.com/apache/spark/pull/16053?  Both still viable alternatives?
    
    https://github.com/apache/spark/pull/16053 is still missing performance 
benchmarks, and given the entire purpose here is performance, I think we need 
to wait for those metrics.
    
    But https://github.com/apache/spark/pull/16053 is a much smaller change.  I 
actually think it's a little clearer overall in this version, that serialization 
all happens in one place ... but I'm also biased to go for the smaller change 
if there isn't really much difference.
    
    I also feel like we're missing a clear description of the overall flow of 
serialization -- it's rather complicated, between the task binary broadcast, the 
task, the task description, where it all happens, etc.  (That goes for both 
versions -- really it's an existing problem, this just seems like the right time 
to address it.)

