[Gluster-users] Write performance in a replicated/distributed setup with KVM?

2012-03-04 Thread Harald Hannelius


This has probably been discussed before, but since I'm new on the list I hope
you have patience with me.


I have a four-brick distributed/replicated setup. The machines are multi-core
with 16 GB of memory and 2 x 2.0 TB SATA disks in local RAID 1. The nodes are
connected by 1 Gb/s Ethernet. All nodes have glusterfs 3.3beta2 installed and
run Debian 6 (64-bit). The underlying filesystems are XFS.


I have set up a volume like so:

gluster volume create virtuals replica 2 transport tcp \
  adraste:/data/brick alcippe:/data/brick aethra:/data/brick helen:/data/brick

This resulted in a nice volume:

# gluster volume info virtuals

Volume Name: virtuals
Type: Distributed-Replicate
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: adraste:/data/brick
Brick2: alcippe:/data/brick
Brick3: aethra:/data/brick
Brick4: helen:/data/brick

All seems OK so far, but write performance is very slow. When writing to
localhost:/virtuals I get single-digit MB/s, which isn't what I had expected.
I know that each write has to go to (at least?) two nodes at the same time,
but still?


A single scp of a 1 GB file from one node to another gives something like
~100 MB/s.
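If I understand AFR correctly (this is an assumption on my part, not something I've verified in the code), the FUSE client writes to all replicas itself, so a single 1 Gb/s NIC gets shared across the replica set. A back-of-the-envelope sketch:

```shell
# Back-of-the-envelope model (assumption: client-side replication over
# one shared 1 Gb/s NIC, ~117 MB/s line rate). The client pushes each
# write to every replica, so the NIC bandwidth divides by replica count.
wire_mbs=117   # ~1 Gb/s expressed in MB/s
replicas=2
echo "expected write ceiling: $(( wire_mbs / replicas )) MB/s"
```

That would put the ceiling for a two-way replica somewhere near 58 MB/s; it does not, on its own, explain dropping into single digits.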


A copy of a virtual image took 17 minutes:

# time cp debtest.raw /gluster/debtest.img

real    17m36.727s
user    0m1.832s
sys     0m14.081s

# ls -lah /gluster/debtest.img
-rw------- 1 root root 20G Mar  1 12:35 /gluster/debtest.img
# du -ah /gluster/debtest.img
4.5G    /gluster/debtest.img
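The ls/du mismatch above (20G apparent, ~4.5G allocated) means the image is sparse. A quick way to confirm that, plus a cp flag that may help (hedged: `--sparse=always` is GNU cp; whether holes survive on the FUSE mount depends on the backend filesystem):

```shell
# %s = apparent size; %b blocks of %B bytes = what is actually allocated.
# A sparse file shows %b*%B much smaller than %s.
stat -c 'apparent=%s B  allocated=%b blocks x %B B' /gluster/debtest.img

# GNU cp can be told to always punch holes in the destination, so the
# copy moves ~4.5 GB of data instead of 20 GB including zeros:
cp --sparse=always debtest.raw /gluster/debtest.img
```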

I noticed that the process list shows that direct-io-mode is disabled.
Shouldn't the default be on?
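As far as I can tell (an assumption based on the 3.x FUSE client, worth double-checking), direct-io-mode is a mount-time client option and disabled is the normal default, since enabling it bypasses the kernel page cache. It can be toggled at mount time:

```shell
# Mount the volume with direct I/O explicitly enabled (or =disable);
# hostname and mountpoint as used elsewhere in this thread.
mount -t glusterfs -o direct-io-mode=enable localhost:/virtuals /gluster
```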


Any help is really appreciated!

--

Harald Hannelius | harald.hannelius/a\arcada.fi | +358 50 594 1020
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Write performance in a replicated/distributed setup with KVM?

2012-03-03 Thread Harald Hannelius


On Fri, 2 Mar 2012, Bryan Whitehead wrote:


I'd try putting all hostnames in /etc/hosts. Also, can you post ping times
between each host ?


They are in /etc/hosts.

# ping6 -c3 alcippe
PING alcippe(alcippe) 56 data bytes
64 bytes from alcippe: icmp_seq=1 ttl=64 time=0.160 ms
64 bytes from alcippe: icmp_seq=2 ttl=64 time=0.088 ms
64 bytes from alcippe: icmp_seq=3 ttl=64 time=0.150 ms

--- alcippe ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1998ms
rtt min/avg/max/mdev = 0.088/0.132/0.160/0.034 ms

 # ping6 -c3 aethra
PING aethra(aethra.arcada.fi) 56 data bytes
64 bytes from aethra.arcada.fi: icmp_seq=1 ttl=64 time=0.154 ms
64 bytes from aethra.arcada.fi: icmp_seq=2 ttl=64 time=0.158 ms
64 bytes from aethra.arcada.fi: icmp_seq=3 ttl=64 time=0.164 ms

--- aethra ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1998ms
rtt min/avg/max/mdev = 0.154/0.158/0.164/0.015 ms

# ping6 -c3 adraste
PING adraste(adraste) 56 data bytes
64 bytes from adraste: icmp_seq=1 ttl=255 time=0.165 ms
64 bytes from adraste: icmp_seq=2 ttl=255 time=0.155 ms
64 bytes from adraste: icmp_seq=3 ttl=255 time=0.187 ms

--- adraste ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1998ms
rtt min/avg/max/mdev = 0.155/0.169/0.187/0.013 ms

As I said before, I don't think there's a problem with the LAN. Trust me, I
would know about it :)





On Fri, Mar 2, 2012 at 8:55 AM, Harald Hannelius wrote:

  On Fri, 2 Mar 2012, Brian Candler wrote:

  On Fri, Mar 02, 2012 at 05:25:18PM +0200, Harald Hannelius
  wrote:
I'll have to test with just a two-way replica, and see if I get
better performance out of that. I'm gonna lose the capability to
have one node at the other site then


  Ah... are these nodes separated by a WAN? Synchronous
  replication is pretty
  sensitive to latency.  You might want to look at
  geo-replication instead
  (which I've not tested)


No, it's a 1 Gbps LAN. The other "site" is within LAN-range.









Re: [Gluster-users] Write performance in a replicated/distributed setup with KVM?

2012-03-02 Thread Harald Hannelius


On Fri, 2 Mar 2012, Brian Candler wrote:


On Fri, Mar 02, 2012 at 05:25:18PM +0200, Harald Hannelius wrote:

I'll have to test with just a two-way replica, and see if I get
better performance out of that. I'm gonna lose the capability to
have one node at the other site then


Ah... are these nodes separated by a WAN? Synchronous replication is pretty
sensitive to latency.  You might want to look at geo-replication instead
(which I've not tested)


No, it's a 1 Gbps LAN. The other "site" is within LAN-range.



Re: [Gluster-users] Write performance in a replicated/distributed setup with KVM?

2012-03-02 Thread Harald Hannelius


On Fri, 2 Mar 2012, Brian Candler wrote:


On Fri, Mar 02, 2012 at 03:33:19PM +0200, Harald Hannelius wrote:

The pattern for me starts to look like this;

  max-write-speed ~= (single-node write speed) / nodes.


This is most odd. If you are using a regular replicated+distributed (not
striped) volume, then each file operation will be directed to one pair of
servers. The dd should just hit two servers and the other two will be idle.
So I don't see why your 4-node setup should perform any differently to a
2-node one.


I'll have to test with just a two-way replica, and see if I get better
performance out of that. I'm gonna lose the capability to have one node at
the other site then, but write performance is more important right now.


It might be a good idea to have a separate private Ethernet interconnect
between the nodes as well, I suppose?
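One way to sketch that without touching the volume definition, assuming the bricks were created against hostnames (as they were here): give each node a second NIC on a private subnet and resolve the brick hostnames to those addresses in /etc/hosts. Addresses below are purely illustrative:

```shell
# On every node: point the brick hostnames at a private replication
# subnet, so inter-brick traffic leaves via the second NIC.
cat >> /etc/hosts <<'EOF'
10.10.0.1  adraste
10.10.0.2  alcippe
10.10.0.3  aethra
10.10.0.4  helen
EOF
```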


Hopefully 10Gbps will get cheaper soon.



Re: [Gluster-users] Write performance in a replicated/distributed setup with KVM?

2012-03-02 Thread Harald Hannelius


On Fri, 2 Mar 2012, Samuli Heinonen wrote:


On 2.3.2012 15:33, Harald Hannelius wrote:

The pattern for me starts to look like this;

max-write-speed ~= (single-node write speed) / nodes.


Have you tried tuning performance.io-thread-count setting? More information 
about that can be found at 
http://docs.redhat.com/docs/en-US/Red_Hat_Storage_Software_Appliance/3.2/html/User_Guide/chap-User_Guide-Managing_Volumes.html


Yes, as in a previous post;

# gluster volume info

Volume Name: virtuals
Type: Distributed-Replicate
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: adraste:/data/brick
Brick2: alcippe:/data/brick
Brick3: aethra:/data/brick
Brick4: helen:/data/brick
Options Reconfigured:
cluster.data-self-heal-algorithm: diff
cluster.self-heal-window-size: 1
performance.io-thread-count: 64
performance.cache-size: 536870912
performance.write-behind-window-size: 16777216
performance.flush-behind: on
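One way to rule the tuning in or out (assuming `gluster volume reset` behaves on 3.3beta2 as documented for 3.2, which I haven't verified) is to go back to defaults and reapply one option per test run:

```shell
# Reset all reconfigured options back to their defaults, then confirm
# that the "Options Reconfigured" section is gone:
gluster volume reset virtuals
gluster volume info virtuals

# Then reapply a single option before each dd run, e.g.:
gluster volume set virtuals performance.io-thread-count 32
```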




Re: [Gluster-users] Write performance in a replicated/distributed setup with KVM?

2012-03-02 Thread Harald Hannelius


On Fri, 2 Mar 2012, Brian Candler wrote:


On Fri, Mar 02, 2012 at 02:41:30PM +0200, Harald Hannelius wrote:

So next is back to the four-node setup you had before. I would expect that
to perform about the same.


So would I expect too. But;

# time dd if=/dev/zero bs=1M count=20000 of=/gluster/testfile
20000+0 records in
20000+0 records out
20971520000 bytes (21 GB) copied, 1058.22 s, 19.8 MB/s

real    17m38.357s
user    0m0.040s
sys     0m12.501s


Right, so we know:

- replic of aethra and alcippe is fast
- distrib/replic across all four nodes is slow

So chopping further, what about:

- replic of adraste and helen?


The pattern for me starts to look like this;

  max-write-speed ~= (single-node write speed) / nodes.

Volume Name: test
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: adraste:/data/single
Brick2: helen:/data/single

# time dd if=/dev/zero bs=1M count=10000 of=/mnt/testfile
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 195.816 s, 53.5 MB/s

real    3m15.821s
user    0m0.016s
sys     0m8.169s



This would show whether one of these nodes is at fault.


At least I got double-digit readings this time. Sometimes I get
write speeds of 5-6 MB/s.
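Some of that run-to-run variance may be local page-cache effects rather than the cluster itself. A sketch of a more repeatable measurement (paths and sizes are illustrative; `conv=fsync` makes dd flush before reporting its rate, so the number reflects what actually reached the bricks):

```shell
# Three runs, each forcing a final flush so dd's reported MB/s is not
# inflated by data still sitting in the page cache.
for run in 1 2 3; do
  dd if=/dev/zero of=/mnt/testfile bs=1M count=1000 conv=fsync 2>&1 | tail -n1
  rm -f /mnt/testfile
done
```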


Well, I'm a bit lost when you start talking about VMs. Is this a production
environment, and you are doing these dd/cp tests *in addition* to the
production load of VM traffic?  Or are you doing tests on an unloaded
system?


I have some systems running in the background, yes. They are not really
production machines.



Note: mail servers have a nasty habit of doing fsync() all the time, for
every single received message.


It looks like OpenLDAP's slapadd uses some kind of sync as well. The load
average on the KVM host was up at 9.00 while slapadd was running.



Tools which might be useful to observe the production load:

 iostat 1
 # shows the count of I/O requests and KB read/written per second


iotop is handy too.


 btrace /dev/sdb | grep ' [DC] '
 # shows the actual I/O operations dispatched (D) and completed (C)
 # to the drive

There are also gluster-layer tools but I've not tried them:
http://download.gluster.com/pub/gluster/glusterfs/3.2/Documentation/AG/html/chap-Gluster_Administration_Guide-Monitor_Workload.html
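From my reading of that guide (the 3.2 docs, so verify against the 3.3beta), the volume-profile workflow is the most direct way to see where the time goes, per brick:

```shell
# Collect per-brick latency statistics while a test write is running:
gluster volume profile virtuals start
# ... run the dd test in another shell ...
gluster volume profile virtuals info   # per-brick FOP latency histogram
gluster volume profile virtuals stop
```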

Regards,

Brian.






Re: [Gluster-users] Write performance in a replicated/distributed setup with KVM?

2012-03-02 Thread Harald Hannelius


On Fri, 2 Mar 2012, Brian Candler wrote:

On Fri, Mar 02, 2012 at 01:02:39PM +0200, Harald Hannelius wrote:

If both are fast: then retest using a two-node replicated volume.


gluster volume create test replica 2 transport tcp \
  aethra:/data/single alcippe:/data/single

Volume Name: test
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: aethra:/data/single
Brick2: alcippe:/data/single

# time dd if=/dev/zero bs=1M count=20000 of=/mnt/testfile
20000+0 records in
20000+0 records out
20971520000 bytes (21 GB) copied, 426.62 s, 49.2 MB/s

real    7m6.625s
user    0m0.040s
sys     0m12.293s

As expected, roughly half of the single-node setup. I could live
with that too.


So next is back to the four-node setup you had before. I would expect that
to perform about the same.


So would I expect too. But;

# time dd if=/dev/zero bs=1M count=20000 of=/gluster/testfile
20000+0 records in
20000+0 records out
20971520000 bytes (21 GB) copied, 1058.22 s, 19.8 MB/s

real    17m38.357s
user    0m0.040s
sys     0m12.501s

# gluster volume info

Volume Name: virtuals
Type: Distributed-Replicate
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: adraste:/data/brick
Brick2: alcippe:/data/brick
Brick3: aethra:/data/brick
Brick4: helen:/data/brick
Options Reconfigured:
cluster.data-self-heal-algorithm: diff
cluster.self-heal-window-size: 1
performance.io-thread-count: 64
performance.cache-size: 536870912
performance.write-behind-window-size: 16777216
performance.flush-behind: on

At the same time Nagios tries to empty my cell phone battery when virtual
hosts stop responding to ping. That virtual host is a mail server and it
receives e-mail. I guess sendmail+procmail+imapd generates some I/O.


At least I got double-digit readings this time. Sometimes I get write
speeds of 5-6 MB/s.



If you have problems with high levels of concurrency, this might be a
problem with the number of I/O threads which gluster creates. You actually
only get about log2 of the number of outstanding requests in the queue.

I made a (stupid, non-production) patch which got around this problem in
my benchmarking:
http://gluster.org/pipermail/gluster-users/2012-February/009590.html

IMO it would be better to be able to configure the *minimum* number of I/O
threads to spawn. You can configure the maximum, but it will almost never
be reached.

Regards,

Brian.





