zjf2012 opened a new issue, #615:
URL: https://github.com/apache/incubator-uniffle/issues/615

   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   
   
   ### Search before asking
   
   - [X] I have searched in the [issues](https://github.com/apache/incubator-uniffle/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### What would you like to be improved?
   
   Both map and reduce tasks reference RssShuffleHandle, which wraps 'partitionToServers', a map that is usually far larger than the original task binary. For example, for a shuffle with 10,000 partitions, 'partitionToServers' can easily reach 250,000 bytes, assuming each map entry takes 25 bytes.
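The estimate above is a linear size model: every partition adds one entry to the map, so the payload carried by each task grows with the partition count. A minimal sketch of that arithmetic, assuming the issue's figure of 25 bytes per entry (the real serialized size depends on the serializer and on server address lengths):

```python
# Rough size model for the 'partitionToServers' map carried in every task binary.
# Assumption: a fixed 25 bytes per map entry, as stated in the issue.

def partition_map_bytes(num_partitions: int, bytes_per_entry: int = 25) -> int:
    """Estimated size of a partition -> servers map; grows linearly with partitions."""
    return num_partitions * bytes_per_entry


for partitions in (1_000, 10_000, 100_000):
    print(f"{partitions:>7} partitions -> ~{partition_map_bytes(partitions):,} bytes")
```

With 10,000 partitions this gives 250,000 bytes, the figure quoted above, and that cost is paid by every serialized task.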
   
   A large task binary slows task serialization and delays task launch. We could replace the map with something more compact, such as a mapping function from partitions to shuffle servers.
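One way to picture a "mapping function" is a deterministic rule that both map and reduce tasks can evaluate locally, so the handle only needs to carry the server list and a replica count, whose size does not depend on the number of partitions. The sketch below uses a round-robin rule; `servers_for_partition` and the round-robin scheme are hypothetical illustrations, not Uniffle's actual assignment logic:

```python
from typing import List, Sequence


def servers_for_partition(partition_id: int, servers: Sequence[str], replicas: int = 1) -> List[str]:
    """Hypothetical deterministic mapping: partition -> `replicas` distinct servers.

    Round-robin: start at partition_id mod len(servers) and take the next
    `replicas` servers, wrapping around the list.
    """
    if not servers:
        raise ValueError("no shuffle servers available")
    if partition_id < 0:
        raise ValueError("partition_id must be non-negative")
    if not 1 <= replicas <= len(servers):
        raise ValueError("replicas must be between 1 and len(servers)")
    start = partition_id % len(servers)
    return [servers[(start + i) % len(servers)] for i in range(replicas)]


servers = ["ss-0:19999", "ss-1:19999", "ss-2:19999"]  # hypothetical addresses
print(servers_for_partition(4, servers, replicas=2))
```

This only works if writers and readers agree on the same ordered server list and the same rule; if the real assignment cannot be expressed as a function like this, the mapping would need a different compact encoding.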
   
   
   ### How should we improve?
   
   Replace 'partitionToServers' with something like a mapping function that maps a partition ID to its shuffle servers. Each executor would fetch the shuffle servers only once, in the first task of a shuffle, and cache them for later tasks with the same shuffle ID.
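The per-executor caching step can be sketched as a thread-safe map from shuffle ID to servers that calls a fetch function only on the first lookup. `ShuffleServerCache` and the `fetch` callback are hypothetical names; `fetch` stands in for whatever RPC would actually return the servers for a shuffle:

```python
import threading
from typing import Callable, Dict, Sequence, Tuple


class ShuffleServerCache:
    """Hypothetical per-executor cache: shuffle ID -> shuffle servers.

    The servers for a shuffle ID are fetched at most once (by the first task
    that asks) and reused by later tasks of the same shuffle on this executor.
    """

    def __init__(self, fetch: Callable[[int], Sequence[str]]) -> None:
        self._fetch = fetch  # assumption: stands in for the real server lookup RPC
        self._servers: Dict[int, Tuple[str, ...]] = {}
        self._lock = threading.Lock()

    def get(self, shuffle_id: int) -> Tuple[str, ...]:
        with self._lock:
            servers = self._servers.get(shuffle_id)
            if servers is None:
                # First task of this shuffle on this executor: fetch and remember.
                servers = tuple(self._fetch(shuffle_id))
                self._servers[shuffle_id] = servers
            return servers

    def unregister(self, shuffle_id: int) -> None:
        """Drop the entry when the shuffle is removed so the cache does not grow unbounded."""
        with self._lock:
            self._servers.pop(shuffle_id, None)
```

Holding one lock during the fetch keeps the sketch simple but serializes first lookups across different shuffles; a real implementation might lock per shuffle ID instead.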
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!

