jerqi commented on issue #955:
URL: https://github.com/apache/incubator-uniffle/issues/955#issuecomment-1594507361

   > > It's enough for a several-TB shuffle with 1000 executors to use 9 shuffle servers.
   > 
   > Very impressive. How much storage (disk) do you typically attach per shuffle server? We have some jobs that shuffle almost 150TB of data. One could argue that the job needs to be rewritten, but as a platform we mostly have no control over when the job gets fixed to reduce the shuffle.
   
   For a 150TB shuffle, I would recommend some config options.
   First, we would recommend MEMORY_LOCALFILE_HDFS as the storage type, because HDFS has more IO resources and more disk space.
   Second, use more shuffle servers, maybe more than 20.
   Third, use `single.buffer.limit` and `rss.server.max.concurrency.of.per-partition.write` (see the config sketch below). When a reduce partition reaches the configured size, it will be flushed to HDFS, and `rss.server.max.concurrency.of.per-partition.write` allows multiple threads to write the HDFS data.
   This feature will have a better effect after https://github.com/apache/incubator-uniffle/pull/775.
   Fourth, we haven't supported S3 yet, because the community doesn't have enough people, although we have a similar plan. S3 is different from HDFS and will need more optimization.
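   A rough sketch of how these shuffle-server settings might look. The key names `rss.server.single.buffer.flush.enabled` and `rss.server.single.buffer.flush.threshold`, the base paths, and all values are illustrative assumptions based on the notes above, not a verified config; please check the Uniffle server docs for the exact keys and defaults:
   
   ```properties
   # Uniffle shuffle server conf (illustrative sketch, values are assumptions)
   
   # Storage chain: memory -> local file -> HDFS, since HDFS has more IO
   # resource and disk space for a 150TB-scale shuffle.
   rss.storage.type MEMORY_LOCALFILE_HDFS
   rss.storage.basePath /data1/rss,/data2/rss
   
   # Flush a single reduce partition to HDFS once its buffer reaches a size
   # threshold (the `single.buffer.limit` idea above), and let multiple
   # threads write that partition's HDFS data concurrently.
   rss.server.single.buffer.flush.enabled true
   rss.server.single.buffer.flush.threshold 128m
   rss.server.max.concurrency.of.per-partition.write 4
   ```
   
   The threshold of 128m and the concurrency of 4 are placeholders; tune them against your actual partition sizes and HDFS write throughput.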
   

