jerqi commented on issue #186:
URL: https://github.com/apache/incubator-uniffle/issues/186#issuecomment-1225260090

   There are many things to consider when allocating HDFS clusters.
   First, the scale of the HDFS cluster. The more DataNodes a cluster has, the more IO capability it can provide.
   Second, the remaining space of the HDFS cluster. If a shuffle will use a lot of space, we should give it an HDFS cluster with enough room, but note that shuffle data is temporary: we delete it after it has been used. Shuffle data usually doesn't require as much space as input or output data.
   Third, if we share HDFS with other users, we also need to care about the stability of the HDFS cluster. If an HDFS cluster sees too many retries, we should allocate fewer applications to it.
   Fourth, we can't forecast how big a shuffle will be when we allocate an HDFS cluster to it, so we can only assume that an application with a big shuffle behaves the same as one with a small shuffle, which is clearly wrong in a production cluster. I don't have any ideas about this yet.
   Finally, it's OK for me to add a new strategy, but we should separate the mechanism from the strategy, and we should gather data from production environments to improve the strategy's effectiveness.
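   As a rough illustration of separating mechanism from strategy, the sketch below keeps the assignment mechanism ignorant of how a cluster is chosen. The interface and class names are made up for this comment, not existing Uniffle APIs; any concrete strategy (round-robin, the scoring above, etc.) could be plugged in and swapped without touching the mechanism.

```java
import java.util.List;

/** Strategy: the pluggable policy for choosing a cluster (hypothetical name). */
interface ClusterSelectionStrategy {
    String select(List<String> clusterIds);
}

/** Mechanism: applies whatever strategy it is given and records the result. */
class RemoteStorageAssigner {
    private final ClusterSelectionStrategy strategy;

    RemoteStorageAssigner(ClusterSelectionStrategy strategy) {
        this.strategy = strategy;
    }

    String assign(String appId, List<String> clusterIds) {
        String chosen = strategy.select(clusterIds);
        // ... persist the appId -> chosen cluster mapping here ...
        return chosen;
    }
}

/** Example strategy: simple round-robin over the candidate clusters. */
class RoundRobinStrategy implements ClusterSelectionStrategy {
    private int next = 0;

    @Override
    public String select(List<String> clusterIds) {
        String chosen = clusterIds.get(next % clusterIds.size());
        next++;
        return chosen;
    }
}
```

   With this split, improving the strategy (e.g. adopting score-based selection once production data supports it) requires no change to the assignment mechanism itself.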
   

