Re: [Gluster-users] Performance optimization tips Gluster 3.3? (small files / directory listings)

2012-09-21 Thread Alex
Hi Brian, I'm just wondering if you had any luck with figuring out performance
limitations of your setup. I'm testing a similar configuration, so any tips or
recommendations would be much appreciated. Thanks, --Alex


___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Performance optimization tips Gluster 3.3? (small files / directory listings)

2012-06-09 Thread Brian Candler
On Fri, Jun 08, 2012 at 09:30:19PM +0100, Brian Candler wrote:
 ubuntu@lucidtest:~$ dd if=/dev/zero of=/var/tmp/test.zeros2 bs=1024k count=100
 100+0 records in
 100+0 records out
 104857600 bytes (105 MB) copied, 14.5182 s, 7.2 MB/s
 
 And this is after live-migrating the VM to dev-storage2:
 
 ubuntu@lucidtest:~$ dd if=/dev/zero of=/var/tmp/test.zeros3 bs=1024k count=100
 100+0 records in
 100+0 records out
 104857600 bytes (105 MB) copied, 4.17285 s, 25.1 MB/s

I did some more timings after converting the qcow2 image to a raw file. 
Note that you have to be careful: qemu-img convert -O raw will give you a
sparse file, not actually allocating space on disk.  So I had to flatten it
with dd (which incidentally showed a reasonable write throughput of
~350MB/sec to the 12-disk RAID10 array, and was the same writing locally or
writing to a single-brick gluster volume).
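
For reference, the convert-and-flatten steps were along these lines (the image
paths here are illustrative, not the exact ones used):

  qemu-img convert -O raw lucidtest.qcow2 lucidtest.raw   # output is a sparse raw file
  dd if=lucidtest.raw of=lucidtest-flat.raw bs=1M         # rewriting every block allocates real space
  mv lucidtest-flat.raw lucidtest.raw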

Tests:

1. VM using a single-brick gluster volume as backend. The brick is on the
same node as KVM is running.  (Actually the second cluster node was powered
off for all these tests)

ubuntu@lucidtest:~$ dd if=/dev/zero of=/var/tmp/test.zeros4 bs=1024k count=500
500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 55.9581 s, 9.4 MB/s

(Strangely this is lower than the 25MB/s I got before)

2. VM image stored directly on the RAID10 array - no gluster.

ubuntu@lucidtest:~$ dd if=/dev/zero of=/var/tmp/test.zeros4 bs=1024k count=500
500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 10.6027 s, 49.4 MB/s

3. Same VM instance after test 2, but this time with option cache='none'
(which doesn't work with glusterfs)

ubuntu@lucidtest:~$ dd if=/dev/zero of=/var/tmp/test.zeros5 bs=1024k count=500
500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 2.29959 s, 228 MB/s

That's more like it :-)

So clearly cache='none' (O_DIRECT) makes a big difference when using a
local filesystem, so I'd very much like to be able to test it with gluster.
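
A quick way to check whether a given mount point accepts O_DIRECT at all,
without involving KVM, is a direct-I/O dd - just a sketch, not one of the tests
above; the path assumes the single-brick volume mounted at /gluster/single1:

  dd if=/dev/zero of=/gluster/single1/direct-test bs=1M count=10 oflag=direct

If the filesystem rejects O_DIRECT, this fails with "Invalid argument".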

I'd also very much look forward to having libglusterfs integrated directly
into KVM, which I believe is on the cards at some point:
http://www.mail-archive.com/users@ovirt.org/msg01812.html

Regards,

Brian.

P.S. for those who haven't seen it yet, there's a very nice Red Hat
presentation on KVM performance tuning here.
http://www.linux-kvm.org/wiki/images/5/59/Kvm-forum-2011-performance-improvements-optimizations-D.pdf
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Performance optimization tips Gluster 3.3? (small files / directory listings)

2012-06-09 Thread Brian Candler
On Sat, Jun 09, 2012 at 09:53:05AM +0100, Brian Candler wrote:
 So clearly cache='none' (O_DIRECT) makes a big difference when using a
 local filesystem, so I'd very much like to be able to test it with gluster.

Aha, O_DIRECT is in 3.4+:
http://comments.gmane.org/gmane.comp.file-systems.gluster.user/8916
http://lwn.net/Articles/476978/

So I upgraded this box to a mainline 3.4.0 kernel from
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.4-precise/

After this:

* KVM *does* boot with the cache='none' option :-)
* However performance is pretty much unchanged :-(

ubuntu@lucidtest:~$ dd if=/dev/zero of=/var/tmp/test.zeros5 bs=1024k count=500
500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 10.184 s, 51.5 MB/s

(As a reminder: that's KVM talking to a single-brick gluster volume,
FUSE-mounted on the same node.  Other tests showed 248MB/s with KVM guest
talking to the RAID10 array directly, and 350MB/s with the host talking to
the RAID10 array directly)
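
As an aside, dd from /dev/zero is quite cache-sensitive; a more controlled
measurement inside the guest could be done with something like fio (shown only
as a sketch, not one of the tests above):

  fio --name=seqwrite --rw=write --bs=1M --size=500M --direct=1 \
      --filename=/var/tmp/fio.test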

Regards,

Brian.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Performance optimization tips Gluster 3.3? (small files / directory listings)

2012-06-09 Thread Brian Candler
Final point. I tried remounting the volume using an undocumented setting I
saw in another posting:

mount -o direct-io-mode=enable -t glusterfs dev-storage1:/single1 
/gluster/single1

But with that, and KVM also using cache=none, the VM simply hung on startup.
This looks like a bug to me.

With this same mount I was able to restart the VM without cache=none, but
then performance was terrible:

ubuntu@lucidtest:~$ dd if=/dev/zero of=/var/tmp/test.zeros6 bs=1024k
count=500
500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 122.493 s, 4.3 MB/s

Regards,

Brian.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Performance optimization tips Gluster 3.3? (small files / directory listings)

2012-06-08 Thread Brian Candler
On Fri, Jun 08, 2012 at 12:19:58AM -0400, olav johansen wrote:
# mount -t glusterfs fs1:/data-storage /storage
I've copied over my data to it again and doing a ls several times,
takes ~0.5 seconds:
[@web1 files]# time ls -all|wc -l

Like I said before, please also try without the -l flags and compare the
results.

My guess is that ls -al or ls -alR are not representative of the *real*
workload you are going to ask of your system (i.e. scan all the files in
this directory, sequentially, and perform a stat() call on each one in
turn) - but please contradict me if I'm wrong.

However you need to measure how much cost that -l is giving you.

Doing the same thing on the raw os files on one node takes 0.021s
[@fs2 files]# time ls -all|wc -l
1989
real    0m0.021s
user    0m0.007s
sys     0m0.015s

In that case it's probably all coming from cache. If you wanted to test
actual disk performance then you would do

echo 3 > /proc/sys/vm/drop_caches

before each test (on both client and server, if they are different
machines).
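
A full cold-cache run would then look something like this (the mount point and
directory are illustrative):

  sync
  echo 3 > /proc/sys/vm/drop_caches
  time ls -alR /storage/files | wc -l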

But from what you say, it sounds like you are actually more interested in
the cached answers anyway.

Just as crazy reference, on another single server with SSD's (Raid 10)
drives I get:
files# time ls -alR|wc -l
2260484
real    0m15.761s
user    0m5.170s
sys     0m7.670s
For the same operation. (this server even have more files...)

You are not comparing like-for-like. A replicated volume behaves very
differently from a single brick or distributed volume, as explained before.

If you compared a two-brick (HD) setup with an identical two-brick (SSD)
setup then that would be meaningful.  I would expect that if everything is
cacheable then you'd get the same results for both.  In that case, what
you'd show is that the latency for open/stat and heal is the cause of the
delay.

Like I said before, I expect that adding the -l flag to ls is giving you
lots of cumulative latency.

This means that the server is actually idle for a lot of the time, while
it's waiting for the next request. So the server has spare capacity for
handling other clients.

In other words: if your real workload is actually lots of clients accessing
the system concurrently, you'll get a much better total throughput than the
simple tests you are doing, which are a single client performing single
operations one after the other.

If I added two more bricks to the cluster / replicated, would this
double read speed?

Definitely not. The latency would be the same, it's just that some requests
would go to bricks A and B, and other requests would go to bricks C and D.
The other two bricks would be idle, and would not speed things up.

However, if you had concurrent accesses from multiple clients, the extra
bricks would give extra capacity so that the total *throughput* would be
higher when there are multiple clients active.

So I repeat my advice before. If you really want to understand where the
performance issues are coming from, these two tests may highlight them:

* Compare the same 2-brick replicated volume,
  using ls -aR versus ls -laR

* Compare a 2-brick replicated volume to a 2-brick distributed volume,
  using ls -laR on both
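
For the second comparison, the distributed volume can be created alongside the
existing replicated one with the standard CLI (volume and brick names below are
just placeholders):

  gluster volume create test-dist fs1:/bricks/dist1 fs2:/bricks/dist1
  gluster volume start test-dist
  mount -t glusterfs fs1:/test-dist /mnt/test-dist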

Regards,

Brian.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Performance optimization tips Gluster 3.3? (small files / directory listings)

2012-06-08 Thread Brian Candler
On Thu, Jun 07, 2012 at 02:36:26PM +0100, Brian Candler wrote:
 I'm interested in understanding this, especially the split-brain scenarios
 (better to understand them *before* you're stuck in a problem :-)
 
 BTW I'm in the process of building a 2-node 3.3 test cluster right now.

FYI, I have got KVM working with a glusterfs 3.3.0 replicated volume as the
image store.

There are two nodes, both running as glusterfs storage and as KVM hosts.

I built a 10.04 ubuntu image using vmbuilder, stored on the replicated
glusterfs volume:

vmbuilder kvm ubuntu --hostname lucidtest --mem 512 --debug --rootsize 
20480 --dest /gluster/safe/images/lucidtest

I was able to fire it up (virsh start lucidtest), ssh into it, and then
live-migrate it to another host:

brian@dev-storage1:~$ virsh migrate --live lucidtest 
qemu+ssh://dev-storage2/system
brian@dev-storage2's password: 

brian@dev-storage1:~$ virsh list
 Id Name State
--

brian@dev-storage1:~$ 

And I live-migrated it back again, all without the ssh session being
interrupted.

I then rebooted the second storage server. While it was rebooting I did
some work in the VM which grew its image. When the second storage server
came back, it resynchronised the image immediately and automatically. Here
is the relevant entry from /var/log/glusterfs/glustershd.log on the first
(non-rebooted) machine:

[2012-06-08 17:08:40.817893] E [socket.c:1715:socket_connect_finish] 
0-safe-client-1: connection to 10.0.1.2:24009 failed (Connection timed out)
[2012-06-08 17:09:10.698272] I 
[client-handshake.c:1636:select_server_supported_programs] 0-safe-client-1: 
Using Program GlusterFS 3.3.0, Num (1298437), Version (330)
[2012-06-08 17:09:10.700197] I 
[client-handshake.c:1433:client_setvolume_cbk] 0-safe-client-1: Connected to 
10.0.1.2:24009, attached to remote volume '/disk/storage2/safe'.
[2012-06-08 17:09:10.700234] I 
[client-handshake.c:1445:client_setvolume_cbk] 0-safe-client-1: Server and 
Client lk-version numbers are not same, reopening the fds
[2012-06-08 17:09:10.701901] I 
[client-handshake.c:453:client_set_lk_version_cbk] 0-safe-client-1: Server lk 
version = 1
[2012-06-08 17:09:14.699571] I 
[afr-common.c:1189:afr_detect_self_heal_by_iatt] 0-safe-replicate-0: size 
differs for gfid:1f080b06-46f1-468e-b21a-12bf4a7c81ff 
[2012-06-08 17:09:14.699616] I [afr-common.c:1340:afr_launch_self_heal] 
0-safe-replicate-0: background  data self-heal triggered. path: 
gfid:1f080b06-46f1-468e-b21a-12bf4a7c81ff, reason: lookup detected pending 
operations
[2012-06-08 17:09:18.230855] I 
[afr-self-heal-algorithm.c:122:sh_loop_driver_done] 0-safe-replicate-0: diff 
self-heal on gfid:1f080b06-46f1-468e-b21a-12bf4a7c81ff: completed. (19 blocks 
of 3299 were different (0.58%))
[2012-06-08 17:09:18.232520] I 
[afr-self-heal-common.c:2159:afr_self_heal_completion_cbk] 0-safe-replicate-0: 
background  data self-heal completed on 
gfid:1f080b06-46f1-468e-b21a-12bf4a7c81ff

So at first glance this is extremely impressive. It's also very new and
shiny, and I wonder how many edge cases remain to be debugged in live use,
but I can't argue that it's very neat indeed!

Performance-wise:

(1) on the storage/VM host, which has the replicated volume mounted via FUSE:

root@dev-storage1:~# dd if=/dev/zero of=/gluster/safe/test.zeros bs=1024k 
count=500
500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 2.7086 s, 194 MB/s

(The bricks have a 12-disk md RAID10 array, far-2 layout, and there's
probably scope for some performance tweaking here)
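
For anyone reproducing this, a 12-disk far-2 md RAID10 of that kind would
typically be created with something like the following (device names and chunk
size are made up here):

  mdadm --create /dev/md0 --level=10 --layout=f2 --chunk=256 \
        --raid-devices=12 /dev/sd[b-m]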

(2) however from within the VM guest, performance was very poor (2.2MB/s).

I tried my usual tuning options:

<driver name='qemu' type='qcow2' io='native' cache='none'/>
...
<target dev='vda' bus='virtio'/>
<!-- delete <address type='drive' controller='0' bus='0' unit='0'/> -->

but glusterfs objected to the cache='none' option (possibly this opens the
file with O_DIRECT?)

# virsh start lucidtest
error: Failed to start domain lucidtest
error: internal error process exited while connecting to monitor: char 
device redirected to /dev/pts/0
kvm: -drive 
file=/gluster/safe/images/lucidtest/tmpaJqTD9.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,aio=native:
 could not open disk image /gluster/safe/images/lucidtest/tmpaJqTD9.qcow2: 
Invalid argument

The VM boots with io='native' and bus='virtio', but performance is still
very poor:

ubuntu@lucidtest:~$ dd if=/dev/zero of=/var/tmp/test.zeros bs=1024k 
count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 17.4095 s, 6.0 MB/s

This will need some further work.

The guest is lucid (10.04) only because for some reason I cannot get a 12.04
image built with vmbuilder to work (it spins at 100% CPU).  This is not
related to glusterfs and something I need to debug separately. Maybe a
12.04 guest will also 

Re: [Gluster-users] Performance optimization tips Gluster 3.3? (small files / directory listings)

2012-06-08 Thread Brian Candler
On Fri, Jun 08, 2012 at 05:46:42PM +0100, Brian Candler wrote:
 but glusterfs objected to the cache='none' option (possibly this opens the
 file with O_DIRECT?)

Yes that's definitely the problem, as I can see if I strace the kvm process:

stat("/gluster/safe/images/lucidtest/tmpaJqTD9.qcow2", {st_mode=S_IFREG|0644, 
st_size=774307840, ...}) = 0
open("/gluster/safe/images/lucidtest/tmpaJqTD9.qcow2", 
O_RDWR|O_DIRECT|O_CLOEXEC) = -1 EINVAL (Invalid argument)

I found http://gluster.org/pipermail/gluster-users/2012-March/009936.html
and tried remounting with '-o direct-io-mode=enable', but that didn't make
a difference.  Also, 'mount' output doesn't show this option anyway.

That page also talked about adding 'option o-direct enable' to the posix
translator, but I'd rather not mess with that directly as I have not yet
found any documentation about how to modify translator options while still
using CLI/glusterd to manage the configuration.
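
For anyone curious what that would involve: the posix translator stanza in a
brick volfile looks roughly like the snippet below (volume name and brick
directory are invented for illustration; hand-editing volfiles behind
glusterd's back is exactly what I'd prefer to avoid):

  volume safe-posix
      type storage/posix
      option directory /disk/storage1/safe
      option o-direct enable
  end-volume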

Regards,

Brian.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Performance optimization tips Gluster 3.3? (small files / directory listings)

2012-06-08 Thread Fernando Frediani (Qube)
Thanks for sharing that Brian,

I wonder if the cause of the problem when trying to power up VMware ESXi VMs is 
the same.

Fernando

-Original Message-
From: Brian Candler [mailto:b.cand...@pobox.com] 
Sent: 08 June 2012 17:47
To: Pranith Kumar Karampuri
Cc: olav johansen; gluster-users@gluster.org; Fernando Frediani (Qube)
Subject: Re: [Gluster-users] Performance optimization tips Gluster 3.3? (small 
files / directory listings)

On Thu, Jun 07, 2012 at 02:36:26PM +0100, Brian Candler wrote:
 I'm interested in understanding this, especially the split-brain 
 scenarios (better to understand them *before* you're stuck in a 
 problem :-)
 
 BTW I'm in the process of building a 2-node 3.3 test cluster right now.

FYI, I have got KVM working with a glusterfs 3.3.0 replicated volume as the 
image store.

There are two nodes, both running as glusterfs storage and as KVM hosts.

I build a 10.04 ubuntu image using vmbuilder, stored on the replicated 
glusterfs volume:

vmbuilder kvm ubuntu --hostname lucidtest --mem 512 --debug --rootsize 
20480 --dest /gluster/safe/images/lucidtest

I was able to fire it up (virsh start lucidtest), ssh into it, and then 
live-migrate it to another host:

brian@dev-storage1:~$ virsh migrate --live lucidtest 
qemu+ssh://dev-storage2/system
brian@dev-storage2's password: 

brian@dev-storage1:~$ virsh list
 Id Name State
--

brian@dev-storage1:~$ 

And I live-migrated it back again, all without the ssh session being 
interrupted.

I then rebooted the second storage server. While it was rebooting I did some 
work in the VM which grew its image. When the second storage server came back, 
it resynchronised the image immediately and automatically. Here is the relevant 
entry from /var/log/glusterfs/glustershd.log on the first
(non-rebooted) machine:

[2012-06-08 17:08:40.817893] E [socket.c:1715:socket_connect_finish] 
0-safe-client-1: connection to 10.0.1.2:24009 failed (Connection timed out)
[2012-06-08 17:09:10.698272] I 
[client-handshake.c:1636:select_server_supported_programs] 0-safe-client-1: 
Using Program GlusterFS 3.3.0, Num (1298437), Version (330)
[2012-06-08 17:09:10.700197] I 
[client-handshake.c:1433:client_setvolume_cbk] 0-safe-client-1: Connected to 
10.0.1.2:24009, attached to remote volume '/disk/storage2/safe'.
[2012-06-08 17:09:10.700234] I 
[client-handshake.c:1445:client_setvolume_cbk] 0-safe-client-1: Server and 
Client lk-version numbers are not same, reopening the fds
[2012-06-08 17:09:10.701901] I 
[client-handshake.c:453:client_set_lk_version_cbk] 0-safe-client-1: Server lk 
version = 1
[2012-06-08 17:09:14.699571] I 
[afr-common.c:1189:afr_detect_self_heal_by_iatt] 0-safe-replicate-0: size 
differs for gfid:1f080b06-46f1-468e-b21a-12bf4a7c81ff 
[2012-06-08 17:09:14.699616] I [afr-common.c:1340:afr_launch_self_heal] 
0-safe-replicate-0: background  data self-heal triggered. path: 
gfid:1f080b06-46f1-468e-b21a-12bf4a7c81ff, reason: lookup detected pending 
operations
[2012-06-08 17:09:18.230855] I 
[afr-self-heal-algorithm.c:122:sh_loop_driver_done] 0-safe-replicate-0: diff 
self-heal on gfid:1f080b06-46f1-468e-b21a-12bf4a7c81ff: completed. (19 blocks 
of 3299 were different (0.58%))
[2012-06-08 17:09:18.232520] I 
[afr-self-heal-common.c:2159:afr_self_heal_completion_cbk] 0-safe-replicate-0: 
background  data self-heal completed on 
gfid:1f080b06-46f1-468e-b21a-12bf4a7c81ff

So at first glance this is extremely impressive. It's also very new and shiny, 
and I wonder how many edge cases remain to be debugged in live use, but I can't 
argue that it's very neat indeed!

Performance-wise:

(1) on the storage/VM host, which has the replicated volume mounted via FUSE:

root@dev-storage1:~# dd if=/dev/zero of=/gluster/safe/test.zeros bs=1024k 
count=500
500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 2.7086 s, 194 MB/s

(The bricks have a 12-disk md RAID10 array, far-2 layout, and there's probably 
scope for some performance tweaking here)

(2) however from within the VM guest, performance was very poor (2.2MB/s).

I tried my usual tuning options:

<driver name='qemu' type='qcow2' io='native' cache='none'/>
...
<target dev='vda' bus='virtio'/>
<!-- delete <address type='drive' controller='0' bus='0' unit='0'/> -->

but glusterfs objected to the cache='none' option (possibly this opens the file 
with O_DIRECT?)

# virsh start lucidtest
error: Failed to start domain lucidtest
error: internal error process exited while connecting to monitor: char 
device redirected to /dev/pts/0
kvm: -drive 
file=/gluster/safe/images/lucidtest/tmpaJqTD9.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,aio=native:
 could not open disk image /gluster/safe/images/lucidtest/tmpaJqTD9.qcow2: 
Invalid argument

The VM boots with io='native' and bus='virtio', but performance is still very

Re: [Gluster-users] Performance optimization tips Gluster 3.3? (small files / directory listings)

2012-06-08 Thread olav johansen
Hi Brian,

This is a single thread trying to process a sequential task, where the latency
really becomes a problem.  With ls -aR I get similar speed:

[@web1 files]# time ls -aR|wc -l
1968316
real    27m23.432s
user    0m5.523s
sys     0m35.369s
[@web1 files]# time ls -aR|wc -l
1968316
real    26m2.728s
user    0m5.529s
sys     0m33.779s


I understand ls -alR isn't truly our use case, but we use similar functions:
the application we're supporting uses opendir() / file_exists() a lot in PHP.
Ideally we wouldn't need either, but that is not the situation I have.  We have
been pushing NFS to its limits, we're looking for better / scalable
performance, and looking for feedback / suggestions on this.

Also, to rsync the folders to backup servers we hit the same issue as ls -alR
in terms of speed.  (I understand in this case I could use the raw /data/
folder)

Going from a single server to a replicated Gluster cluster, what slowdown do
others see compared to NFS?

Don't get me wrong, Gluster rocks but in our current case latency is
killing us, and I'm looking for help on solving this.

One idea I haven't had a chance to try in terms of latency is to split the
6x1TB RAID 10 on each brick into 3x (2x1TB RAID 1) - not sure if gluster can
even do this.  (A1-B1, A2-B2, A3-B3 as one volume)



Any ideas / suggestions are very appreciated.

Thanks again,


On Fri, Jun 8, 2012 at 4:20 AM, Brian Candler b.cand...@pobox.com wrote:

 On Fri, Jun 08, 2012 at 12:19:58AM -0400, olav johansen wrote:
 # mount -t glusterfs fs1:/data-storage /storage
 I've copied over my data to it again and doing a ls several times,
 takes ~0.5 seconds:
 [@web1 files]# time ls -all|wc -l

 Like I said before, please also try without the -l flags and compare the
 results.

 My guess is that ls -al or ls -alR are not representative of the *real*
 workload you are going to ask of your system (i.e. scan all the files in
 this directory, sequentially, and perform a stat() call on each one in
 turn) - but please contradict me if I'm wrong.

 However you need to measure how much cost that -l is giving you.

 Doing the same thing on the raw os files on one node takes 0.021s
 [@fs2 files]# time ls -all|wc -l
 1989
 real    0m0.021s
 user    0m0.007s
 sys     0m0.015s

 In that case it's probably all coming from cache. If you wanted to test
 actual disk performance then you would do

 echo 3 > /proc/sys/vm/drop_caches

 before each test (on both client and server, if they are different
 machines).

 But from what you say, it sounds like you are actually more interested in
 the cached answers anyway.

 Just as crazy reference, on another single server with SSD's (Raid 10)
 drives I get:
 files# time ls -alR|wc -l
 2260484
 real    0m15.761s
 user    0m5.170s
 sys     0m7.670s
 For the same operation. (this server even have more files...)

 You are not comparing like-for-like. A replicated volume behaves very
 differently from a single brick or distributed volume, as explained before.

 If you compared a two-brick (HD) setup with an identical two-brick (SSD)
 setup then that would be meaningful.  I would expect that if everything is
 cacheable then you'd get the same results for both.  In that case, what
 you'd show is that the latency for open/stat and heal is the cause of the
 delay.

 Like I said before, I expect that adding the -l flag to ls is giving you
 lots of cumulative latency.

 This means that the server is actually idle for a lot of the time, while
 it's waiting for the next request. So the server has spare capacity for
 handling other clients.

 In other words: if your real workload is actually lots of clients accessing
 the system concurrently, you'll get a much better total throughput than the
 simple tests you are doing, which are a single client performing single
 operations one after the other.

 If I added two more bricks to the cluster / replicated, would this
 double read speed?

 Definitely not. The latency would be the same, it's just that some requests
 would go to bricks A and B, and other requests would go to bricks C and D.
 The other two bricks would be idle, and would not speed things up.

 However, if you had concurrent accesses from multiple clients, the extra
 bricks would give extra capacity so that the total *throughput* would be
 higher when there are multiple clients active.

 So I repeat my advice before. If you really want to understand where the
 performance issues are coming from, these two tests may highlight them:

 * Compare the same 2-brick replicated volume,
  using ls -aR versus ls -laR

 * Compare a 2-brick replicated volume to a 2-brick distributed volume,
  using ls -laR on both

 Regards,

 Brian.

___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Performance optimization tips Gluster 3.3? (small files / directory listings)

2012-06-08 Thread Brian Candler
On Fri, Jun 08, 2012 at 02:23:57PM -0400, olav johansen wrote:
This is a single thread trying to process a sequential task where the
latency really becomes a problem with ls -aR I get similar speed:

That's interesting.

[@web1 files]# time ls -aR|wc -l
1968316
real    27m23.432s
user    0m5.523s
sys     0m35.369s
[@web1 files]# time ls -aR|wc -l
1968316
real    26m2.728s
user    0m5.529s
sys     0m33.779s

That's an average of 0.8ms per file, which isn't too bad if you're also
getting similar times with ls -laR.

If you're getting much better figures with NFS then it may be down to
something like client-side caching as you suggested.  You may need to look
more directly at what's happening, e.g. with strace, to be sure.
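
For example, a per-syscall time summary for the whole listing could be
collected with something along these lines (not a command from this thread):

  strace -c ls -laR /storage/files > /dev/null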

Don't get me wrong, Gluster rocks but in our current case latency is
killing us, and I'm looking for help on solving this.
One idea I haven't had a chance to try in terms of latency is to split
the 6x1TB raid 10 on each brick to 3x (2x1TB RAID 1)  not sure if
gluster can even do this.  (A1-B1, A2-B2,A3-B3 as one volume)

Sure it can do that - it's called a distributed replicated volume. It
doesn't care if the bricks are on the same node.  I very much doubt it will
make any difference in latency, but feel free to test.
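
Something like the following would give the A1-B1, A2-B2, A3-B3 layout you
describe (volume and brick names are placeholders; replica pairs are formed
from consecutive bricks on the command line):

  gluster volume create data-storage2 replica 2 \
      fs1:/bricks/a1 fs2:/bricks/b1 \
      fs1:/bricks/a2 fs2:/bricks/b2 \
      fs1:/bricks/a3 fs2:/bricks/b3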

If the latency is in the network then you could try using 10GE (but use SFP+
with fibre or direct-attach cables; don't use 10GE over CAT6 because that
has an even longer latency than 1GE), or Infiniband.

Regards,

Brian.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Performance optimization tips Gluster 3.3? (small files / directory listings)

2012-06-08 Thread Brian Candler
On Fri, Jun 08, 2012 at 05:46:42PM +0100, Brian Candler wrote:
 The VM boots with io='native' and bus='virtio', but performance is still
 very poor:
 
 ubuntu@lucidtest:~$ dd if=/dev/zero of=/var/tmp/test.zeros bs=1024k 
 count=100
 100+0 records in
 100+0 records out
 104857600 bytes (105 MB) copied, 17.4095 s, 6.0 MB/s
 
 This will need some further work.

And for comparison, it's not the replication which is causing the delay,
because I get very similar performance if I copy the image to a distributed
volume instead.

This is where the VM is running on dev-storage1 but the distributed image
happens to reside on dev-storage2:

ubuntu@lucidtest:~$ dd if=/dev/zero of=/var/tmp/test.zeros2 bs=1024k count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 14.5182 s, 7.2 MB/s

And this is after live-migrating the VM to dev-storage2:

ubuntu@lucidtest:~$ dd if=/dev/zero of=/var/tmp/test.zeros3 bs=1024k count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 4.17285 s, 25.1 MB/s

Clearly network latency has a part to play - this is 10GE on CAT6 (yes I know
that's a poor choice for latency, but they're the NICs I happened to have
spare).  Given that the dd is writing large blocks, I'd hope that large ranges
of blocks get flushed to disk too.

Of course, 25.1 MB/s is not exactly stellar either.

Maybe using a qcow2 (growable) image is part of the problem - I'll need to
convert to raw and retest.

Regards,

Brian.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Performance optimization tips Gluster 3.3? (small files / directory listings)

2012-06-07 Thread Fernando Frediani (Qube)
Hi,
Sorry this reply won't be of any help to your problem, but I am too curious to 
understand how it can be even slower when mounting using the Gluster client, which I 
would expect to always be quicker than NFS or anything else.
If you find the reason, please report it back to the list and share with us.  I 
think this directory index issue has been reported already for systems with 
many files.

Regards,

Fernando

From: gluster-users-boun...@gluster.org 
[mailto:gluster-users-boun...@gluster.org] On Behalf Of olav johansen
Sent: 07 June 2012 03:32
To: gluster-users@gluster.org
Subject: [Gluster-users] Performance optimization tips Gluster 3.3? (small 
files / directory listings)

Hi,

I'm using Gluster 3.3.0-1.el6.x86_64, on two storage nodes, replicated mode 
(fs1, fs2)
Node specs: CentOS 6.2 Intel Quad Core 2.8Ghz, 4Gb ram, 3ware raid, 2x500GB 
sata 7200rpm (RAID1 for os), 6x1TB sata 7200rpm (RAID10 for /data), 1Gbit 
network

I've mounted the data partition to web1, a Dual Quad 2.8Ghz, 8Gb ram, using 
glusterfs. (also tried NFS -> Gluster mount)

We have 50Gb of files, ~800'000 files in 3 levels of directories (max 2000 
directories in one folder)

My main problem is the speed of directory indexes: ls -alR on the gluster mount 
takes 23 minutes every time.

It doesn't seem like there is any directory listing information cache; with 
regular NFS (not gluster) between web1 and fs1, this takes 6m13s the first 
time, and 5m13s thereafter.

A Gluster mount is 4+ times slower for directory indexing performance vs pure NFS 
to a single server - is this as expected?
I understand there are a lot more calls involved checking both nodes, but I'm 
just looking for a reality check regarding this.

Any suggestions of how I can speed this up?

Thanks,

___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Performance optimization tips Gluster 3.3? (small files / directory listings)

2012-06-07 Thread Brian Candler
On Thu, Jun 07, 2012 at 10:10:03AM +, Fernando Frediani (Qube) wrote:
Sorry this reply won’t be of any help to your problem, but I am too
curious to understand how it can be even slower if mounting using
Gluster client which I would expect always be quicker than NFS or
anything else.

(1) Try it with ls -aR or find . instead of ls -alR

(2) Try it on a gluster non-replicated volume (for fair comparison with
direct NFS access)

With a replicated volume, many accesses involve sending queries to *both*
servers to check they are in sync - even read accesses.  This in turn can
cause disk seeks on both machines, so the latency you'll get is the larger
of the two.  If you are doing lots of accesses sequentially then the
latencies will all add up.

A stat() is one of those accesses which touches both machines, and ls -l
forces a stat() of each file found.

In fact, a quick test suggests ls -l does stat, lstat, getxattr and
lgetxattr:

$ strace ls -laR . >/dev/null 2>ert; cut -f1 -d'(' ert | sort | uniq -c
 13 access
  1 arch_prctl
  5 brk
395 close
  4 connect
  1 execve
  1 exit_group
  2 fcntl
391 fstat
  3 futex
702 getdents
  1 getrlimit
   1719 getxattr
  3 ioctl
   1721 lgetxattr
  9 lseek
   1721 lstat
 58 mmap
 24 mprotect
 12 munmap
424 open
 19 read
  2 readlink
  2 rt_sigaction
  1 rt_sigprocmask
  1 set_robust_list
  1 set_tid_address
  4 socket
   1719 stat
  1 statfs
 29 write

Looking at the detail in the strace output, I see these are actually

lstat(<target-file>, ...)
lgetxattr(<target-file>, "security.selinux", ...)
getxattr(<target-file>, "system.posix_acl_access", ...)
stat("/etc/localtime", ...)

Compare without -l:

$ strace ls -aR . >/dev/null 2>ert; cut -f1 -d'(' ert | sort | uniq -c
  9 access
  1 arch_prctl
  4 brk
377 close
  1 execve
  1 exit_group
  1 fcntl
376 fstat
  3 futex
702 getdents
  1 getrlimit
  3 ioctl
 39 mmap
 16 mprotect
  4 munmap
388 open
 11 read
  2 rt_sigaction
  1 rt_sigprocmask
  1 set_robust_list
  1 set_tid_address
  1 stat
  1 statfs
  9 write

Regards,

Brian.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Performance optimization tips Gluster 3.3? (small files / directory listings)

2012-06-07 Thread Christian Meisinger
Hello there.


That's really interesting, because we are thinking about using GlusterFS too
with a similar setup/scenario.

I read about a really strange setup with GlusterFS native client mount on
the web servers and NFS mount on top of that so you get GlusterFS failover +
NFS caching.
Can't find the link right now.


- Original Message -
From: olav johansen luxis2...@gmail.com
To: gluster-users@gluster.org
Sent: Thursday, June 7, 2012 8:02:14 AM
Subject: [Gluster-users] Performance optimization tips Gluster 3.3? (small
files / directory listings)


Hi, 

I'm using Gluster 3.3.0-1.el6.x86_64, on two storage nodes, replicated mode
(fs1, fs2) Node specs: CentOS 6.2 Intel Quad Core 2.8Ghz, 4Gb ram, 3ware
raid, 2x500GB sata 7200rpm (RAID1 for os), 6x1TB sata 7200rpm (RAID10 for
/data), 1Gbit network 

I've mounted the data partition to web1, a Dual Quad 2.8Ghz, 8Gb ram, using
glusterfs. (also tried NFS -> Gluster mount)

We have 50Gb of files, ~800'000 files in 3 levels of directories (max 2000
directories in one folder)

My main problem is the speed of directory indexes: ls -alR on the gluster mount
takes 23 minutes every time.

It doesn't seem like there is any directory listing information cache; with
regular NFS (not gluster) between web1 and fs1, this takes 6m13s the first
time, and 5m13s thereafter.

A Gluster mount is 4+ times slower for directory indexing performance vs pure
NFS to a single server - is this as expected?
I understand there are a lot more calls involved checking both nodes, but I'm
just looking for a reality check regarding this.

Any suggestions of how I can speed this up? 

Thanks, 



___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Performance optimization tips Gluster 3.3? (small files / directory listings)

2012-06-07 Thread Gerald Brandt
Here's the link:

http://community.gluster.org/a/nfs-performance-with-fuse-client-redundancy/

Sent again with a reply to all.

Gerald


- Original Message -
 From: Christian Meisinger em_got...@gmx.net
 To: olav johansen luxis2...@gmail.com
 Cc: gluster-users@gluster.org
 Sent: Thursday, June 7, 2012 7:00:14 AM
 Subject: Re: [Gluster-users] Performance optimization tips Gluster3.3?
 (small  files / directory listings)
 
 Hello there.
 
 
 That's really interesting, because we think about using GlusterFS too
 with a
 similar setup/scenario.
 
 I read about a really strange setup with GlusterFS native client
 mount on
 the web servers and NFS mount on top of that so you get GlusterFS
 failover +
 NFS caching.
 Can't find the link right now.
 
 
 - Original Message -
 From: olav johansen luxis2...@gmail.com
 To: gluster-users@gluster.org
 Sent: Thursday, June 7, 2012 8:02:14 AM
 Subject: [Gluster-users] Performance optimization tips Gluster 3.3?
 (small
 files / directory listings)
 
 
 Hi,
 
 I'm using Gluster 3.3.0-1.el6.x86_64, on two storage nodes,
 replicated mode
 (fs1, fs2) Node specs: CentOS 6.2 Intel Quad Core 2.8Ghz, 4Gb ram,
 3ware
 raid, 2x500GB sata 7200rpm (RAID1 for os), 6x1TB sata 7200rpm (RAID10
 for
 /data), 1Gbit network
 
 I've mounted the data partition to web1, a Dual Quad 2.8Ghz, 8Gb ram, using
 glusterfs. (also tried NFS -> Gluster mount)
 
 We have 50Gb of files, ~800'000 files in 3 levels of directories (max 2000
 directories in one folder)
 
 My main problem is the speed of directory indexes: ls -alR on the gluster
 mount takes 23 minutes every time.
 
 It doesn't seem like there is any directory listing information cache; with
 regular NFS (not gluster) between web1 and fs1, this takes 6m13s the first
 time, and 5m13s thereafter.
 
 A Gluster mount is 4+ times slower for directory indexing performance vs
 pure NFS to a single server - is this as expected?
 I understand there are a lot more calls involved checking both nodes, but
 I'm just looking for a reality check regarding this.
 
 Any suggestions of how I can speed this up?
 
 Thanks,
 
 
 
 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
 
 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
 
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Performance optimization tips Gluster 3.3? (small files / directory listings)

2012-06-07 Thread Pranith Kumar Karampuri
Brian,
  Small correction: 'sending queries to *both* servers to check they are in 
sync - even read accesses.' Read fops like stat/getxattr etc are sent to only 
one brick.

Pranith.
- Original Message -
From: Brian Candler b.cand...@pobox.com
To: Fernando Frediani (Qube) fernando.fredi...@qubenet.net
Cc: olav johansen luxis2...@gmail.com, gluster-users@gluster.org 
gluster-users@gluster.org
Sent: Thursday, June 7, 2012 4:24:37 PM
Subject: Re: [Gluster-users] Performance optimization tips Gluster 3.3? (small  
files / directory listings)

On Thu, Jun 07, 2012 at 10:10:03AM +, Fernando Frediani (Qube) wrote:
Sorry this reply won’t be of any help to your problem, but I am too
curious to understand how it can be even slower if mounting using
Gluster client which I would expect always be quicker than NFS or
anything else.

(1) Try it with ls -aR or find . instead of ls -alR

(2) Try it on a gluster non-replicated volume (for fair comparison with
direct NFS access)

With a replicated volume, many accesses involve sending queries to *both*
servers to check they are in sync - even read accesses.  This in turn can
cause disk seeks on both machines, so the latency you'll get is the larger
of the two.  If you are doing lots of accesses sequentially then the
latencies will all add up.

A stat() is one of those accesses which touches both machines, and ls -l
forces a stat() of each file found.

In fact, a quick test suggests ls -l does stat, lstat, getxattr and
lgetxattr:

$ strace ls -laR . >/dev/null 2>ert; cut -f1 -d'(' ert | sort | uniq -c
 13 access
  1 arch_prctl
  5 brk
395 close
  4 connect
  1 execve
  1 exit_group
  2 fcntl
391 fstat
  3 futex
702 getdents
  1 getrlimit
   1719 getxattr
  3 ioctl
   1721 lgetxattr
  9 lseek
   1721 lstat
 58 mmap
 24 mprotect
 12 munmap
424 open
 19 read
  2 readlink
  2 rt_sigaction
  1 rt_sigprocmask
  1 set_robust_list
  1 set_tid_address
  4 socket
   1719 stat
  1 statfs
 29 write

Looking at the detail in the strace output, I see these are actually

lstat(<target-file>, ...)
lgetxattr(<target-file>, "security.selinux", ...)
getxattr(<target-file>, "system.posix_acl_access", ...)
stat("/etc/localtime", ...)

Compare without -l:

$ strace ls -aR . >/dev/null 2>ert; cut -f1 -d'(' ert | sort | uniq -c
  9 access
  1 arch_prctl
  4 brk
377 close
  1 execve
  1 exit_group
  1 fcntl
376 fstat
  3 futex
702 getdents
  1 getrlimit
  3 ioctl
 39 mmap
 16 mprotect
  4 munmap
388 open
 11 read
  2 rt_sigaction
  1 rt_sigprocmask
  1 set_robust_list
  1 set_tid_address
  1 stat
  1 statfs
  9 write

Regards,

Brian.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Performance optimization tips Gluster 3.3? (small files / directory listings)

2012-06-07 Thread Brian Candler
On Thu, Jun 07, 2012 at 08:34:56AM -0400, Pranith Kumar Karampuri wrote:
 Brian,
   Small correction: 'sending queries to *both* servers to check they are in 
 sync - even read accesses.' Read fops like stat/getxattr etc are sent to only 
 one brick.

Is that new behaviour for 3.3? My understanding was that stat() was a
healing operation.
http://gluster.org/community/documentation/index.php/Gluster_3.2:_Triggering_Self-Heal_on_Replicate
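
For reference, the crawl that page describes amounts to a forced stat of every
file, something like the following (mount point is a placeholder), which is why
I'd assumed stat touches both replicas:

  find /mnt/gluster -noleaf -print0 | xargs --null stat >/dev/null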

If this is no longer true, then I'd like to understand what happens after a
node has been down and comes up again.  I understand there's a self-healing
daemon in 3.3, but what if you try to access a file which has not yet been
healed?

I'm interested in understanding this, especially the split-brain scenarios
(better to understand them *before* you're stuck in a problem :-)

BTW I'm in the process of building a 2-node 3.3 test cluster right now.

Cheers,

Brian.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Performance optimization tips Gluster 3.3? (small files / directory listings)

2012-06-07 Thread Pranith Kumar Karampuri
hi Brian,
The 'stat' command comes to the gluster mount as the fop (file operation) 'lookup', 
which triggers self-heal. So the behaviour is still the same.
I was referring to the fop 'stat', which will be performed on only one of the 
bricks.
Unfortunately most of the commands and fops have the same name.
The following are some examples of read fops:
.access
.stat
.fstat
.readlink
.getxattr
.fgetxattr
.readv

Pranith.
- Original Message -
From: Brian Candler b.cand...@pobox.com
To: Pranith Kumar Karampuri pkara...@redhat.com
Cc: olav johansen luxis2...@gmail.com, gluster-users@gluster.org, Fernando 
Frediani (Qube) fernando.fredi...@qubenet.net
Sent: Thursday, June 7, 2012 7:06:26 PM
Subject: Re: [Gluster-users] Performance optimization tips Gluster 3.3? (small  
files / directory listings)

On Thu, Jun 07, 2012 at 08:34:56AM -0400, Pranith Kumar Karampuri wrote:
 Brian,
   Small correction: 'sending queries to *both* servers to check they are in 
 sync - even read accesses.' Read fops like stat/getxattr etc are sent to only 
 one brick.

Is that new behaviour for 3.3? My understanding was that stat() was a
healing operation.
http://gluster.org/community/documentation/index.php/Gluster_3.2:_Triggering_Self-Heal_on_Replicate

If this is no longer true, then I'd like to understand what happens after a
node has been down and comes up again.  I understand there's a self-healing
daemon in 3.3, but what if you try to access a file which has not yet been
healed?

I'm interested in understanding this, especially the split-brain scenarios
(better to understand them *before* you're stuck in a problem :-)

BTW I'm in the process of building a 2-node 3.3 test cluster right now.

Cheers,

Brian.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Performance optimization tips Gluster 3.3? (small files / directory listings)

2012-06-07 Thread olav johansen
:select_server_supported_programs]
0-data-storage-client-0: Using Program GlusterFS 3.3.0, Num (1298437),
Version (330)
[2012-06-07 20:47:49.595099] I
[client-handshake.c:1636:select_server_supported_programs]
0-data-storage-client-1: Using Program GlusterFS 3.3.0, Num (1298437),
Version (330)
[2012-06-07 20:47:49.608455] I
[client-handshake.c:1433:client_setvolume_cbk] 0-data-storage-client-0:
Connected to 10.1.80.81:24009, attached to remote volume '/data/storage'.
[2012-06-07 20:47:49.608489] I
[client-handshake.c:1445:client_setvolume_cbk] 0-data-storage-client-0:
Server and Client lk-version numbers are not same, reopening the fds
[2012-06-07 20:47:49.608572] I [afr-common.c:3627:afr_notify]
0-data-storage-replicate-0: Subvolume 'data-storage-client-0' came back up;
going online.
[2012-06-07 20:47:49.608837] I
[client-handshake.c:453:client_set_lk_version_cbk] 0-data-storage-client-0:
Server lk version = 1
[2012-06-07 20:47:49.616381] I
[client-handshake.c:1433:client_setvolume_cbk] 0-data-storage-client-1:
Connected to 10.1.80.82:24009, attached to remote volume '/data/storage'.
[2012-06-07 20:47:49.616434] I
[client-handshake.c:1445:client_setvolume_cbk] 0-data-storage-client-1:
Server and Client lk-version numbers are not same, reopening the fds
[2012-06-07 20:47:49.621808] I [fuse-bridge.c:4193:fuse_graph_setup]
0-fuse: switched to graph 0
[2012-06-07 20:47:49.622793] I
[client-handshake.c:453:client_set_lk_version_cbk] 0-data-storage-client-1:
Server lk version = 1
[2012-06-07 20:47:49.622873] I [fuse-bridge.c:3376:fuse_init]
0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel
7.13
[2012-06-07 20:47:49.623440] I
[afr-common.c:1964:afr_set_root_inode_on_first_lookup]
0-data-storage-replicate-0: added root inode

 End storage.log

On Thu, Jun 7, 2012 at 9:46 AM, Pranith Kumar Karampuri pkara...@redhat.com
 wrote:

 hi Brian,
'stat' command comes as fop (File-operation) 'lookup' to the gluster
 mount which triggers self-heal. So the behavior is still same.
 I was referring to the fop 'stat' which will be performed only on one of
 the bricks.
 Unfortunately most of the commands and fops have same name.
 Following are some of the examples of read-fops:
.access
.stat
.fstat
.readlink
.getxattr
.fgetxattr
.readv

 Pranith.
 - Original Message -
 From: Brian Candler b.cand...@pobox.com
 To: Pranith Kumar Karampuri pkara...@redhat.com
 Cc: olav johansen luxis2...@gmail.com, gluster-users@gluster.org,
 Fernando Frediani (Qube) fernando.fredi...@qubenet.net
 Sent: Thursday, June 7, 2012 7:06:26 PM
 Subject: Re: [Gluster-users] Performance optimization tips Gluster 3.3?
 (small  files / directory listings)

 On Thu, Jun 07, 2012 at 08:34:56AM -0400, Pranith Kumar Karampuri wrote:
  Brian,
Small correction: 'sending queries to *both* servers to check they are
 in sync - even read accesses.' Read fops like stat/getxattr etc are sent to
 only one brick.

 Is that new behaviour for 3.3? My understanding was that stat() was a
 healing operation.

 http://gluster.org/community/documentation/index.php/Gluster_3.2:_Triggering_Self-Heal_on_Replicate

 If this is no longer true, then I'd like to understand what happens after a
 node has been down and comes up again.  I understand there's a self-healing
 daemon in 3.3, but what if you try to access a file which has not yet been
 healed?

 I'm interested in understanding this, especially the split-brain scenarios
 (better to understand them *before* you're stuck in a problem :-)

 BTW I'm in the process of building a 2-node 3.3 test cluster right now.

 Cheers,

 Brian.

___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Performance optimization tips Gluster 3.3? (small files / directory listings)

2012-06-06 Thread Pranith Kumar Karampuri
Could you post the logs of the mount process so that we can analyse what is 
going on.
Did you have data on bricks before you created the volume? Did you upgrade from 
3.2?

Pranith
- Original Message -
From: olav johansen luxis2...@gmail.com
To: gluster-users@gluster.org
Sent: Thursday, June 7, 2012 8:02:14 AM
Subject: [Gluster-users] Performance optimization tips Gluster 3.3? (small  
files / directory listings)


Hi, 

I'm using Gluster 3.3.0-1.el6.x86_64, on two storage nodes, replicated mode 
(fs1, fs2) 
Node specs: CentOS 6.2 Intel Quad Core 2.8Ghz, 4Gb ram, 3ware raid, 2x500GB 
sata 7200rpm (RAID1 for os), 6x1TB sata 7200rpm (RAID10 for /data), 1Gbit 
network 

I've mounted the data partition to web1, a Dual Quad 2.8Ghz, 8Gb ram, using 
glusterfs. (also tried NFS -> Gluster mount) 

We have 50Gb of files, ~800'000 files in 3 levels of directories (max 2000 
directories in one folder) 

My main problem is the speed of directory indexes: ls -alR on the gluster mount 
takes 23 minutes every time. 

It doesn't seem like there is any directory listing information cache; with 
regular NFS (not gluster) between web1 and fs1, this takes 6m13s the first 
time, and 5m13s thereafter. 

A Gluster mount is 4+ times slower for directory indexing performance vs pure NFS 
to a single server - is this as expected? 
I understand there are a lot more calls involved checking both nodes, but I'm 
just looking for a reality check regarding this. 

Any suggestions of how I can speed this up? 

Thanks, 



___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users