Github user ConeyLiu commented on the issue: https://github.com/apache/spark/pull/19135

Firstly, serialization did not take a long time. You can see this in the screenshot below: <img width="848" alt="untitled" src="https://user-images.githubusercontent.com/12733256/30067330-1596eb1e-928d-11e7-818a-4a292e601a26.png">

Secondly, I do not think every executor in a distributed system should be configured with very few cores and little memory. More processes also mean more communication between processes, which in turn means more data serialization and deserialization.

Thirdly, thread synchronization only causes performance problems when there are enough concurrent threads. On our servers we have 70 to 80 cores, and the number of concurrent tasks exceeds that. This change is really small, and its share of the whole task is also very small, so the impact on the total time is not big, but in this test case it still improved by 5%.
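To illustrate the executor-sizing trade-off described above, here is a sketch using real `spark-submit` flags; the executor counts, core counts, and memory sizes are hypothetical values chosen for illustration, not settings from this PR's benchmark.

```shell
# Many small executors: more JVM processes, so more cross-process shuffle
# traffic and therefore more serialization/deserialization.
spark-submit \
  --num-executors 40 \
  --executor-cores 2 \
  --executor-memory 4g \
  your-app.jar   # hypothetical application jar

# Fewer large executors on the same total resources: tasks running in the
# same JVM can share data without serializing it across process boundaries.
spark-submit \
  --num-executors 5 \
  --executor-cores 16 \
  --executor-memory 32g \
  your-app.jar
```

The trade-off cuts both ways: very large executors increase the number of concurrent threads per JVM, which is where the synchronization cost discussed in this thread starts to matter.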