Re: [Gluster-devel] Feature: Rebalance completion time estimation

2016-11-13 Thread Shyam

On 11/11/2016 05:46 AM, Susant Palai wrote:

Hello All,
   We have been receiving many requests from users for a "rebalance completion
time estimation". This email is to gather ideas and feedback from the community on
this. We have one proposal, but nothing is concrete yet. Please feel free to share your
input on this problem.

A brief description of the rebalance operation:
- The rebalance process is used to rebalance data across the cluster, most likely
after an add-brick or remove-brick. Rebalance is spawned on each node. The job
of the process is to read directories and fix their layout to include the newly
added brick, then read the child files of each directory (only those that reside
on local bricks) and migrate them if the new layout requires it.


Here is one solution, pitched by Manoj Pillai.

Assumptions for this idea:
 - Files are of similar size.
 - At most 40% of the total files will be migrated.

1- Do a statfs on the local bricks. Say the total size is St.


Why not use f_files from statfs, which reports the inode count, possibly
together with f_ffree, to determine how many inodes are in use, and then
use the crawl to track how many we have visited and how many are
pending, to determine rebalance progress?


I am not sure whether the local FS (XFS, say) populates these fields, but
if it does, this may provide a better estimate.



2- Based on the size of the first file, say Sf, assume the number of files on the
local system to be Nt.
3- So the time estimate would be: (Nt * migration time for one file) * 40%.
4- Rebalance will keep updating this estimate as more files are crawled and
will try to give a fair estimate.
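
The inode-count suggestion above (f_files and f_ffree from statfs) could be
sketched roughly as follows. This is only an illustration in Python, not
rebalance code: the function names and the "visited" counter are hypothetical,
and it assumes the brick's filesystem actually reports inode counts (some
filesystems return 0 for f_files). The count also covers every inode on the
filesystem, not only the files the crawl will visit, so the result is an
approximation.

```python
import os


def progress_from_counts(f_files, f_ffree, visited):
    """Fraction of in-use inodes already visited by the crawl.

    f_files and f_ffree are the statfs fields named in the thread.
    'visited' is a hypothetical counter kept by the crawler.
    Returns None when the filesystem does not report inode counts.
    """
    used = f_files - f_ffree  # inodes currently in use
    if used <= 0:
        return None
    # Clamp to 1.0 because files may be created while the crawl runs.
    return min(visited / used, 1.0)


def brick_progress(brick_path, visited):
    """Same calculation, reading the counts from a brick's filesystem."""
    st = os.statvfs(brick_path)
    return progress_from_counts(st.f_files, st.f_ffree, visited)
```

For example, with f_files = 1000, f_ffree = 400 and 150 entries visited, 600
inodes are in use and the progress is 150 / 600 = 0.25.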

Problem with this approach: this method assumes that file sizes are roughly
similar. For a cluster with variable file sizes, the estimate will be wrong.

So this is one initial idea. Please give your suggestions/ideas/feedback on 
this.


Thanks,
Susant








___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel




[Gluster-devel] Feature: Rebalance completion time estimation

2016-11-11 Thread Susant Palai
Hello All,
   We have been receiving many requests from users for a "rebalance
completion time estimation". This email is to gather ideas and feedback from
the community on this. We have one proposal, but nothing is concrete yet.
Please feel free to share your input on this problem.

A brief description of the rebalance operation:
- The rebalance process is used to rebalance data across the cluster, most likely
after an add-brick or remove-brick. Rebalance is spawned on each node. The job
of the process is to read directories and fix their layout to include the newly
added brick, then read the child files of each directory (only those that reside
on local bricks) and migrate them if the new layout requires it.


Here is one solution, pitched by Manoj Pillai.

Assumptions for this idea:
 - Files are of similar size.
 - At most 40% of the total files will be migrated.

1- Do a statfs on the local bricks. Say the total size is St.
2- Based on the size of the first file, say Sf, assume the number of files on the
local system to be Nt.
3- So the time estimate would be: (Nt * migration time for one file) * 40%.
4- Rebalance will keep updating this estimate as more files are crawled and
will try to give a fair estimate.
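
Steps 1 to 4 above could be sketched roughly as follows. This is an
illustration in Python under stated assumptions, not the rebalance
implementation: the class and method names are hypothetical, Nt is taken to
be St / Sf (the proposal does not spell this out), and Sf and the per-file
migration time are refined as running averages over the files seen so far.

```python
class RunningEstimate:
    """Running estimate of total rebalance time for one node (a sketch)."""

    def __init__(self, total_bytes, fraction=0.40):
        self.total_bytes = total_bytes  # St, from statfs on the local bricks
        self.fraction = fraction        # assumed share of files migrated (40%)
        self.crawled = 0
        self.crawled_bytes = 0
        self.migrated = 0
        self.migrate_secs = 0.0

    def saw_file(self, size_bytes):
        """Record a file found by the crawl."""
        self.crawled += 1
        self.crawled_bytes += size_bytes

    def migrated_file(self, seconds):
        """Record how long one file migration took."""
        self.migrated += 1
        self.migrate_secs += seconds

    def total_seconds(self):
        """(Nt * migration time for one file) * fraction, or None if unknown."""
        if self.migrated == 0 or self.crawled_bytes == 0:
            return None
        sf = self.crawled_bytes / self.crawled  # average file size so far (Sf)
        nt = self.total_bytes / sf              # assumed: Nt = St / Sf
        per_file = self.migrate_secs / self.migrated
        return nt * per_file * self.fraction
```

With St = 1000 bytes, a first file of 10 bytes that takes 2 seconds to migrate
gives Nt = 100 and an estimate of about 80 seconds. If the crawl then sees a
30-byte file, Sf becomes 20, Nt becomes 50, and the estimate drops to about 40
seconds.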

Problem with this approach: this method assumes that file sizes are roughly
similar. For a cluster with variable file sizes, the estimate will be wrong.

So this is one initial idea. Please give your suggestions/ideas/feedback on 
this.


Thanks,
Susant






 
