On Sunday 12 June 2011 07:00 PM, François Thiebolt wrote:
Hello,

To make things clear, what I've done is:
- deploying GlusterFS on 2, 4, 8, 16, 32, 64 and 128 nodes
- running a variant of the MAB benchmark (essentially a compilation of
openssl-1.0.0) on those same 2, 4, 8, 16, 32, 64 and 128 nodes
- using 'pdsh -f 512' to start MAB on all nodes at the same time (see the
sketch after this list)
- in each experiment, running MAB on each node in a dedicated directory within the
GlusterFS global namespace (e.g. nodeA used <gluster global namespace>/nodeA/<mab files>)
to avoid a metadata storm on the parent directory inode
- between experiments, destroying and redeploying a completely new GlusterFS
setup (and also destroying everything within each brick, i.e. the exported storage
dir)
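For reference, a minimal sketch of the kind of launch used; the mount point
(/mnt/glusterfs), the host range (node[001-128]) and the run_mab.sh wrapper
are placeholders, not the exact names from the setup:

  # fan out to all nodes at once; each node works in its own directory
  # under the GlusterFS mount to avoid contention on a shared parent dir
  pdsh -f 512 -w node[001-128] \
    'mkdir -p /mnt/glusterfs/$(hostname -s) && cd /mnt/glusterfs/$(hostname -s) && ~/run_mab.sh'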

I then compare the average compilation time against the number of nodes ... and it
increases due to the round-robin scheduler that dispatches files across all the
bricks:
nodes : Phase_V avg (s)
    2 : 249.9332121175
    4 : 262.808117374
    8 : 293.572061537875
   16 : 351.436554833375
   32 : 546.503069517844
   64 : 1010.61019479478
(phase V is related to the compilation itself, previous phases are about 
metadata ops)
You can also try compiling a Linux kernel yourself; it is pretty much the same
kind of workload.

Thanks much for your detailed description.
Is phase_V the only phase where you are seeing reduced performance?

With regards to your problem, since you are using the bricks also as clients, you have a NUMA-like scenario. In the case of two bricks (and hence two clients), during compilation ~50% of the files will be available locally to the client, with minimal latency, while the other 50% will suffer additional latency. As you increase the number of nodes, this asymmetry is seen for a larger share of the files. So the problem is not really the introduction of more servers, but the degree of asymmetry your application is seeing. Your numbers for 2 nodes might not be a good indicator of the average performance. Try the same experiment with the clients separated from the servers. If you still see performance degrading as bricks/clients are added, we can investigate further.
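A minimal sketch of that control experiment, assuming a hypothetical
client-only machine and mount point (any server of the trusted pool can be
named in the mount command):

  # on a machine that exports no brick:
  mkdir -p /mnt/glusterfs
  mount -t glusterfs node001:/myVolume /mnt/glusterfs
  # then run the same MAB workload from such client-only machines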

Pavan


Now regarding the GlusterFS setup: yes, you're right, there is no replication,
so this is simple striping on a per-file basis (i.e. a pure distribute setup).
Each time, I create a GlusterFS volume featuring one brick, then I add bricks
(one by one) until I reach the target number of nodes ... and after that, I start
the volume.
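In shell terms, the sequence looks roughly like the sketch below (host names
and the /storage brick path are placeholders; the loop simply mirrors the
one-by-one add-brick workflow described above):

  gluster volume create myVolume transport tcp node001:/storage
  for i in $(seq -w 2 128); do
      gluster volume add-brick myVolume node$i:/storage
  done
  gluster volume start myVolume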
Now regarding the 128-brick case: it is when I start the volume that I get a random
error telling me that <brickX> does not respond, and the failing brick changes
every time I retry starting the volume.
So far, I haven't tested with a number of nodes between 64 and 128.

François

On Friday, June 10, 2011 16:38 CEST, Pavan T C <t...@gluster.com> wrote:

On Wednesday 08 June 2011 06:10 PM, Francois THIEBOLT wrote:
Hello,

I'm running some experiments on Grid'5000 with GlusterFS 3.2 and, as a
first point, I've been unable to start a volume featuring 128 bricks (64 bricks work fine).

Then, due to the round-robin scheduler, as the number of nodes increases
(every node is also a brick), the performance of an application on an
individual node decreases!

I would like to understand what you mean by "increase of nodes". You
have 64 bricks and each brick also acts as a client. So, where is the
increase in the number of nodes? Are you referring to the mounts that
you are doing?

What is your gluster configuration? I mean, is it a distribute-only setup, or
a distributed-replicate one? [From your command sequence, it should be a pure
distribute, but I just want to be sure.]

What is your application like? Is it mostly I/O intensive? It will help
if you provide a brief description of typical operations done by your
application.

How are you measuring the performance? What parameter determines that
you are experiencing a decrease in performance with an increase in the
number of nodes?

Pavan

So my question is: how do I STOP the round-robin distribution of files
over the bricks within a volume?

*** Setup ***
- I'm using GlusterFS 3.2 built from source
- every node is both a client node and a brick (storage)
Commands:
- gluster peer probe <each of the 128 nodes>
- gluster volume create myVolume transport tcp <128 bricks:/storage>
- gluster volume start myVolume (fails with 128 bricks!)
- mount -t glusterfs ...... on all nodes
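For completeness, the peer and volume state can be checked with the standard
gluster CLI commands (a sketch; output details may vary by version):

  gluster peer status            # every peer should show up as connected
  gluster volume info myVolume   # verify the brick list before starting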

Feel free to tell me how to improve things

François


_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
