Re: [Gluster-users] [Gluster3.2@Grid5000] 128 nodes failure and rr scheduler question

2011-06-14 Thread François Thiebolt
Hello,

To make things clear, what I've done is:
- deploy GlusterFS on 2, 4, 8, 16, 32, 64, 128 nodes
- run a variant of the MAB benchmark (it's all about compiling
openssl-1.0.0) on 2, 4, 8, 16, 32, 64, 128 nodes
- use 'pdsh -f 512' to start MAB on all nodes at the same time (see the
sketch below)
- in each experiment, each node ran MAB in its own dedicated directory
within the GlusterFS global namespace (e.g. nodeA used <gluster global
namespace>/nodeA/mab) to avoid a metadata storm on the parent directory
inode
- between experiments, I destroy and redeploy a completely new GlusterFS
setup (and I also destroy everything within each brick, i.e. the exported
storage dir)
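
For reference, a minimal sketch of such a launch, assuming a hosts file
nodes.txt, a mount point /mnt/glusterfs, and a run-mab.sh driver script
(all three names are hypothetical, not the exact Grid'5000 setup):

  # start MAB simultaneously on all nodes (fanout 512); each node works
  # in its own per-hostname directory inside the GlusterFS namespace
  pdsh -f 512 -w ^nodes.txt \
    'mkdir -p /mnt/glusterfs/$(hostname) && cd /mnt/glusterfs/$(hostname) && ./run-mab.sh'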

I then compare the average compilation time vs. the number of nodes ... and
it increases, due to the round-robin scheduler that dispatches files over
all the bricks:
nodes : Phase_V avg (s)
    2 : 249.9332121175
    4 : 262.808117374
    8 : 293.572061537875
   16 : 351.436554833375
   32 : 546.503069517844
   64 : 1010.61019479478
(Phase V is the compilation itself; the previous phases are about metadata
ops.)
You can also try compiling a Linux kernel on your own; it is pretty much
the same workload.

Now regarding the GlusterFS setup: yes, you're right, there is no
replication, so this is a simple striping (on a whole-file basis) setup.
Each time, I create a glusterfs volume featuring one brick, then I add
bricks (one by one) until I reach the target number of nodes ... and after
that, I start the volume (a concrete sketch follows).
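
As a minimal sketch, that build-up could look like the following, assuming
hostnames node1..node128 and an export directory /storage on each node
(both hypothetical):

  # create the volume with a single brick, then grow it one brick at a time
  gluster volume create myVolume transport tcp node1:/storage
  for i in $(seq 2 128); do
    gluster volume add-brick myVolume node$i:/storage
  done
  gluster volume start myVolume   # the step that fails at 128 bricks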
Now regarding the 128-brick case: it is when I start the volume that I get
a random error telling me that brick X does not respond, and the failing
brick changes every time I retry to start the volume.
So far, I haven't tested with a number of nodes between 64 and 128.

François
 


Re: [Gluster-users] [Gluster3.2@Grid5000] 128 nodes failure and rr scheduler question

2011-06-12 Thread Pavan T C

On Sunday 12 June 2011 07:00 PM, François Thiebolt wrote:


Thanks much for your detailed description.
Is phase_V the only phase where you are seeing reduced performance?

With regard to your problem: since you are using the bricks also as
clients, you have a NUMA kind of scenario. In the case of two bricks (and
hence two clients), during compilation ~50% of the files will be available
locally to a client, with minimal latencies, while the other 50% will
suffer additional network latencies. As you increase the number of nodes,
this asymmetry is seen for a larger fraction of the files.
So, the problem is not really the introduction of more servers, but the
degree of asymmetry your application is seeing. Your numbers for 2 nodes
might not be a good indicator of the average performance. Try the same
experiment with the clients separated from the servers. If you still see
reverse-linear performance as the bricks/clients increase, we can
investigate further.
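
(To quantify the asymmetry, assuming files are placed uniformly over N
equal bricks: each client finds only ~1/N of the files on its local brick,
so the fraction served remotely is (N-1)/N, i.e. 1/2 at 2 bricks but
63/64 ≈ 98% at 64 bricks, consistent with the Phase_V trend above.)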


Pavan





[Gluster-users] [Gluster3.2@Grid5000] 128 nodes failure and rr scheduler question

2011-06-10 Thread Francois THIEBOLT

Hello,

I'm driving some experiments on Grid'5000 with GlusterFS 3.2 and, as a
first point, I've been unable to start a volume featuring 128 bricks
(64 works fine).

Then, due to the round-robin scheduler, as the number of nodes increases
(every node is also a brick), the performance of an application on an
individual node decreases!
So my question is: how do I STOP the round-robin distribution of files
over the bricks within a volume?


*** Setup ***
- I'm using glusterfs 3.2 built from source
- every node is both a client node and a brick (storage)
Commands (sketched concretely below):
- gluster peer probe <each of the 128 nodes>
- gluster volume create myVolume transport tcp <the 128 bricks>:/storage
- gluster volume start myVolume (fails with 128 bricks!)
- mount -t glusterfs .. on all nodes
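
Concretely, the sequence could look like this sketch, assuming hostnames
node1..node128, an export directory /storage, and a mount point
/mnt/glusterfs (all hypothetical):

  # from one node: probe the 127 other peers
  for i in $(seq 2 128); do gluster peer probe node$i; done
  # create a pure-distribute volume over all 128 bricks (bash brace expansion)
  gluster volume create myVolume transport tcp node{1..128}:/storage
  gluster volume start myVolume        # this is the step that fails
  # then, on every node:
  mount -t glusterfs node1:/myVolume /mnt/glusterfs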

Feel free to tell me how to improve things

François

-- 
THIEBOLT Francois \ Your computer seems overloaded?
UPS Toulouse III  \ - Check that nobody's asked for tea!
thieb...@irit.fr  \ The Hitchhiker's Guide to the Galaxy, D. Adams



Re: [Gluster-users] [Gluster3.2@Grid5000] 128 nodes failure and rr scheduler question

2011-06-10 Thread Amar Tumballi
Hi Francois,

Answers inline.

On Wed, Jun 8, 2011 at 6:10 PM, Francois THIEBOLT thieb...@irit.fr wrote:

 Hello,

 I'm driving some experiments on Grid'5000 with GlusterFS 3.2 and, as a
 first point, I've been unable to start a volume featuring 128 bricks
 (64 works fine).

This looks similar to the bug http://bugs.gluster.com/show_bug.cgi?id=2941.
The fix should be available with the 3.2.1 release, which should be out
very soon.

Also, we are working on the scalability of 'glusterd', the glusterfs
management daemon, after which this should work fine. One workaround for
now is to create the volume with 64 bricks and then 'add-brick' the other
64 bricks; that should work fine (see the sketch below).
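
A minimal sketch of that workaround, again assuming hostnames
node1..node128 and an export directory /storage (hypothetical names):

  # create the volume with the first 64 bricks only (bash brace expansion)
  gluster volume create myVolume transport tcp node{1..64}:/storage
  # then pull in the remaining 64 bricks in a second step
  gluster volume add-brick myVolume node{65..128}:/storage
  gluster volume start myVolume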



 Then, due to the round-robin scheduler, as the number of nodes increases
 (every node is also a brick), the performance of an application on an
 individual node decreases!
 So my question is: how do I STOP the round-robin distribution of files
 over the bricks within a volume?


There is no 'scheduler' in the picture here with GlusterFS 3.2.x (for that
matter, since the 3.0.x releases), hence there is no option to stop the
scheduler.

Regards,
Amar


Re: [Gluster-users] [Gluster3.2@Grid5000] 128 nodes failure and rr scheduler question

2011-06-10 Thread Pavan T C

On Wednesday 08 June 2011 06:10 PM, Francois THIEBOLT wrote:


I would like to understand what you mean by "increase of nodes". You have
64 bricks and each brick also acts as a client. So, where is the increase
in the number of nodes? Are you referring to the mounts that you are doing?

What is your gluster configuration - I mean, is it distribute-only, or is
it a distributed-replicate setup? [From your command sequence, it should
be a pure distribute, but I just want to be sure.]

What is your application like? Is it mostly I/O intensive? It will help if
you provide a brief description of the typical operations done by your
application.

How are you measuring the performance? What parameter tells you that you
are experiencing a decrease in performance as the number of nodes
increases?


Pavan

