Here is one test performed on a 300GB data set; it showed roughly a 100%
improvement, i.e. the rebalance finished in about half the time (longest
per-node run time ~1743s with the new changes vs. ~3329s with the old model).

[root@gprfs031 ~]# gluster v i
 
Volume Name: rbperf
Type: Distribute
Volume ID: 35562662-337e-4923-b862-d0bbb0748003
Status: Started
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: gprfs029-10ge:/bricks/gprfs029/brick1
Brick2: gprfs030-10ge:/bricks/gprfs030/brick1
Brick3: gprfs031-10ge:/bricks/gprfs031/brick1
Brick4: gprfs032-10ge:/bricks/gprfs032/brick1


Added server gprfs032 to the volume and started a rebalance with force.
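For reference, the commands would have been along these lines (the exact
invocation is my assumption; the brick path is taken from the volume info
above):

    # gluster volume add-brick rbperf gprfs032-10ge:/bricks/gprfs032/brick1
    # gluster volume rebalance rbperf start force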

Rebalance status with the new changes:
[root@gprfs031 ~]# gluster v rebalance rbperf status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost            74639        36.1GB        297319             0             0            completed            1743.00
                            172.17.40.30            67512        33.5GB        269187             0             0            completed            1395.00
                           gprfs029-10ge            79095        38.8GB        284105             0             0            completed            1559.00
                           gprfs032-10ge                0        0Bytes             0             0             0            completed             402.00
volume rebalance: rbperf: success:

Rebalance status with the old model:
[root@gprfs031 ~]# gluster v rebalance rbperf status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost            86493        42.0GB        634302             0             0            completed            3329.00
                           gprfs029-10ge            94115        46.2GB        687852             0             0            completed            3328.00
                           gprfs030-10ge            74314        35.9GB        651943             0             0            completed            3072.00
                           gprfs032-10ge                0        0Bytes        594166             0             0            completed            1943.00
volume rebalance: rbperf: success:


Best regards,
Susant

----- Original Message -----
From: "Vijay Bellur" <vbel...@redhat.com>
To: "Susant Palai" <spa...@redhat.com>, "Gluster Devel" 
<gluster-devel@gluster.org>
Sent: Monday, 6 April, 2015 7:39:27 PM
Subject: Re: [Gluster-devel] Rebalance improvement design

On 03/31/2015 01:49 PM, Susant Palai wrote:
> Hi,
>     Posted patch for rebalance improvement here: 
> http://review.gluster.org/#/c/9657/ .
> You can find the feature page here: 
> http://www.gluster.org/community/documentation/index.php/Features/improve_rebalance_performance
>
> The current patch addresses two parts of the proposed design.
> 1. Rebalance multiple files in parallel
> 2. Crawl only bricks that belong to the current node
>
> Brief design explanation for the above two points.
>
> 1. Rebalance multiple files in parallel:
>     -------------------------------------
>
>          The existing rebalance engine is single threaded. Hence, multiple
> threads have been introduced that run in parallel with the crawler.
>          The current rebalance migration is converted to a
> "Producer-Consumer" framework,
>          where Producer is : Crawler
>                Consumer is : Migrating Threads
>
>          Crawler: The crawler is the main thread. Its job is now limited to
> fixing the layout of each directory and adding the files that are eligible
> for migration to a global queue. Hence, the crawler is not "blocked" by the
> migration process.
>
>          Consumer: The migration threads monitor the global queue. Whenever
> a file is added to this queue, a thread dequeues that entry and migrates the
> file. Currently 15 migration threads are spawned at the beginning of the
> rebalance process, so multiple file migrations happen in parallel.
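For illustration, here is a minimal sketch of that producer-consumer scheme
using POSIX threads. All identifiers below are made up for the example and
are not the names used in the patch:

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define NR_MIGRATE_THREADS 15  /* matches the 15 threads mentioned above */

    struct mig_entry {                 /* one file queued for migration */
            char              path[4096];
            struct mig_entry *next;
    };

    static struct mig_entry *mq_head;  /* global queue (LIFO for brevity) */
    static pthread_mutex_t   mq_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t    mq_cond = PTHREAD_COND_INITIALIZER;
    static int                crawl_done;

    /* Producer: the crawler calls this for each eligible file and moves
     * on immediately, so the crawl is never blocked by migration. */
    static void
    mq_push (const char *path)
    {
            struct mig_entry *e = calloc (1, sizeof (*e));

            if (!e)
                    return;
            strncpy (e->path, path, sizeof (e->path) - 1);
            pthread_mutex_lock (&mq_lock);
            e->next = mq_head;
            mq_head = e;
            pthread_cond_signal (&mq_cond);
            pthread_mutex_unlock (&mq_lock);
    }

    /* Consumer: each migration thread runs this loop, dequeuing entries
     * and migrating files, so several migrations proceed in parallel. */
    static void *
    migrate_worker (void *arg)
    {
            struct mig_entry *e;

            (void) arg;
            for (;;) {
                    pthread_mutex_lock (&mq_lock);
                    while (!mq_head && !crawl_done)
                            pthread_cond_wait (&mq_cond, &mq_lock);
                    if (!mq_head) {            /* crawl over, queue drained */
                            pthread_mutex_unlock (&mq_lock);
                            break;
                    }
                    e = mq_head;
                    mq_head = e->next;
                    pthread_mutex_unlock (&mq_lock);

                    printf ("migrating %s\n", e->path);  /* stand-in for the
                                                            actual data move */
                    free (e);
            }
            return NULL;
    }

When the crawl finishes, the main thread would set crawl_done under the lock
and pthread_cond_broadcast(&mq_cond) so the workers drain the queue and exit.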
>
>
> 2. Crawl only bricks that belong to the current node:
>     --------------------------------------------------
>
>             A rebalance process is spawned per node, and it migrates only
> the files that belong to its own node for the sake of load balancing. But
> it also reads entries from the whole cluster, which is unnecessary, as
> readdir then hits other nodes.
>
>       New Design:
>             As part of the new design, the rebalancer determines the subvols
> that are local to its node by checking the node-uuid of the root directory
> before the crawl starts. Hence, readdir does not hit the whole cluster,
> since the rebalancer already has the list of local subvols, and the
> node-uuid request for each file can be avoided. This makes the rebalance
> process "more scalable".
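A rough sketch of that subvol selection, with structures and names invented
for the example (in GlusterFS the uuid would, as I understand it, come from a
getxattr of the virtual "trusted.glusterfs.node-uuid" xattr on the root of
each subvolume):

    #include <string.h>

    struct subvol {
            const char *name;
            char        node_uuid[37]; /* uuid of the node hosting the brick */
    };

    /* Keep only the subvols whose root-directory node-uuid matches this
     * rebalancer's own uuid.  The crawler is then pointed at just these,
     * so readdir stays on the local node and no per-file node-uuid
     * lookup is needed. */
    static int
    pick_local_subvols (struct subvol *all, int n_all, const char *my_uuid,
                        struct subvol **local, int max_local)
    {
            int i, n = 0;

            for (i = 0; i < n_all && n < max_local; i++)
                    if (strcmp (all[i].node_uuid, my_uuid) == 0)
                            local[n++] = &all[i];
            return n;   /* number of local subvols to crawl */
    }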
>
>

The approaches outlined do look good.

Do you have rebalance comparison numbers before and after this patchset?

Thanks,
Vijay
_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
