GitHub user ConeyLiu commented on the issue:

    https://github.com/apache/spark/pull/19135
  
    First, serialization did not take long, as the screenshot below shows:
    <img width="848" alt="untitled" 
src="https://user-images.githubusercontent.com/12733256/30067330-1596eb1e-928d-11e7-818a-4a292e601a26.png">
    
    Second, I do not think every executor in a distributed system should be 
configured with very few cores and very little memory. More executor processes 
mean more communication between processes, and therefore more data 
serialization and deserialization.
    
    Third, thread synchronization only becomes a performance problem when 
enough threads run concurrently. Our server has 70 to 80 cores, and the number 
of concurrent tasks is higher than that.
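
    To make the synchronization point concrete, here is a minimal sketch in 
Python. It is not the code from this pull request: the `run` helper, the 
shared counter, and the batch size of 100 are all hypothetical, chosen only to 
show how many times 80 threads acquire a shared lock when they synchronize on 
every record versus once per batch.

```python
import threading


def run(num_threads, records_per_thread, batch_size):
    """Hypothetical helper: returns (total, lock_acquisitions).

    Each thread counts its records locally and only takes the shared
    lock once per `batch_size` records (plus once for any remainder).
    """
    lock = threading.Lock()
    state = {"total": 0, "acquisitions": 0}

    def flush(pending):
        # The only synchronized section: publish the local count.
        with lock:
            state["total"] += pending
            state["acquisitions"] += 1

    def worker():
        pending = 0
        for _ in range(records_per_thread):
            pending += 1
            if pending == batch_size:
                flush(pending)
                pending = 0
        if pending:
            flush(pending)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["total"], state["acquisitions"]


# 80 threads, assumed here to mirror a server with 70 to 80 cores.
per_record = run(num_threads=80, records_per_thread=1000, batch_size=1)
batched = run(num_threads=80, records_per_thread=1000, batch_size=100)
print(per_record)  # (80000, 80000)
print(batched)     # (80000, 800)
```

    Both runs count the same 80000 records, but the batched run enters the 
synchronized section 100 times less often. The sketch counts lock acquisitions 
rather than measuring time, so it says nothing about the actual speedup.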
    
    This change is small, and the code it touches accounts for only a small 
share of the whole task, so its effect on total time is limited. Even so, 
performance in this test case still improved by 5%.


