xianjingfeng opened a new issue, #339:
URL: https://github.com/apache/incubator-uniffle/issues/339

   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   
   
   ### Search before asking
   
   - [X] I have searched in the 
[issues](https://github.com/apache/incubator-uniffle/issues?q=is%3Aissue) and 
found no similar issues.
   
   
   ### What would you like to be improved?
   
   Currently, 
`org.apache.uniffle.client.impl.grpc.ShuffleServerGrpcClient#sendShuffleData` 
keeps retrying against a single shuffle server for a long time and only fails 
once `rss.client.send.check.timeout.ms` is reached. The exception is as follows:
   
   `Timeout: Task[2852_0] failed because 200 blocks can't be sent to shuffle 
server in 600000 ms.`
   
   As a result, the client never tries to send the data to other servers.
   
   ### How should we improve?
   
   1. Don't retry in `requirePreAllocation`; retry only at the upper level.
   2. Set the default value of `rss.client.send.check.timeout.ms` to a smaller 
value, such as 10.
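
A rough sketch of what "retry only at the upper level" could look like, to make the first suggestion concrete. This is not Uniffle code: `FailoverSendSketch`, `sendWithFailover`, `maxRounds`, and the `trySendOnce` callback are hypothetical names used for illustration. The callback stands in for one non-retrying attempt (pre-allocation plus send) against a given server.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

/** Hypothetical sketch; none of these names come from the Uniffle code base. */
final class FailoverSendSketch {

    /**
     * Walks the server list up to maxRounds times and returns the first
     * server that accepted the data, or Optional.empty() if none did.
     *
     * trySendOnce represents a single attempt with no internal retry loop,
     * so a busy server costs one attempt before the next server is tried.
     */
    static Optional<String> sendWithFailover(
            List<String> servers, int maxRounds, Predicate<String> trySendOnce) {
        for (int round = 0; round < maxRounds; round++) {
            for (String server : servers) {
                if (trySendOnce.test(server)) {
                    return Optional.of(server);
                }
            }
        }
        return Optional.empty();
    }
}
```

With this shape, the lower-level call fails fast and the caller decides whether to move on to another server or give up, instead of spending the whole timeout on one server.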
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!

