Re: [Gluster-users] Gluster-users Digest, Vol 86, Issue 1 - Message 5: client load high using FUSE mount
- Original Message -
From: gluster-users-requ...@gluster.org
To: gluster-users@gluster.org
Sent: Monday, June 1, 2015 8:00:01 AM
Subject: Gluster-users Digest, Vol 86, Issue 1

Message: 5
Date: Mon, 01 Jun 2015 13:11:13 +0200
From: Mitja Mihelič mitja.mihe...@arnes.si
To: gluster-users@gluster.org
Subject: [Gluster-users] Client load high (300) using fuse mount

> Hi! I am trying to set up a WordPress cluster using GlusterFS for storage. Web nodes will access the same WordPress install on a volume mounted via FUSE from a 3-peer GlusterFS trusted storage pool (TSP). I started with one web node and WordPress on local storage. The load average was constantly around 5, and iotop showed disk reads of about 300 kB/s or less; the load average stayed below 6. When I mounted the GlusterFS volume on the web node, the 1-minute load average went over 300. Each of the 3 peers is transmitting about 10 MB/s to my web node regardless of the load. The TSP peers are on 10-Gbit NICs and the web node is on a 1-Gbit NIC.

30 MB/s is about 1/3 of line speed for a 1-Gbps NIC port. It sounds like network latency and lack of client-side caching might be your bottleneck; you might want to put a 10-Gbps NIC port on your client. You did disable client-side caching (the md-cache and io-cache translators) below -- was that your intent? Also, the defaults for these translators are very conservative. If you have only 1 client, you may want to increase the time data is cached on the client using the FUSE mount options entry-timeout=30 and attribute-timeout=30. Unlike non-distributed Linux filesystems, Gluster is very conservative about client-side caching, in order to avoid cache coherency issues.

> I'm out of ideas here... Could it be the network? What should I look at for optimizing the network stack on the client?
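A sketch of how those FUSE timeout options could be applied at mount time -- the host, volume, and mountpoint names here are invented for illustration, not taken from the thread:

```shell
# entry-timeout/attribute-timeout are in seconds; only safe to raise
# when a single client accesses the volume, per the caveat above.
mount -t glusterfs \
  -o entry-timeout=30,attribute-timeout=30 \
  peer1:/wpvol /var/www/wordpress
```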
> Options set on TSP:
> Options Reconfigured:
> performance.cache-size: 4GB
> network.ping-timeout: 15
> cluster.quorum-type: auto
> network.remote-dio: on
> cluster.eager-lock: on
> performance.stat-prefetch: off
> performance.io-cache: off
> performance.read-ahead: off
> performance.quick-read: off
> performance.cache-refresh-timeout: 4
> performance.io-thread-count: 32
> nfs.disable: on

Too many tunings -- what are these intended to do? The "gluster volume reset" command allows you to undo them. In Gluster 3.7, the "gluster volume get your-volume all" command lets you see what the defaults are.

> Regards, Mitja
> --
> Mitja Mihelič
> ARNES, Tehnološki park 18, p.p. 7, SI-1001 Ljubljana, Slovenia
> tel: +386 1 479 8877, fax: +386 1 479 88 78

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Gluster-users Digest, Vol 85, Issue 22 - 9. Re: seq read performance comparison between libgfapi and fuse
Paul, I don't check this list every day. I would expect you to get more than half of the minimum of network line speed or storage block device speed using a single libgfapi sequential read thread. I did not see any throughput calculation or file size in your e-mail. HTH, inline below... -ben e

- Original Message -
From: gluster-users-requ...@gluster.org
To: gluster-users@gluster.org
Sent: Friday, May 22, 2015 8:00:02 AM
Subject: Gluster-users Digest, Vol 85, Issue 22

Message: 8
Date: Fri, 22 May 2015 18:50:40 +0800
From: Paul Guo bigpaul...@foxmail.com
To: gluster-users@gluster.org
Subject: [Gluster-users] seq read performance comparison between libgfapi and fuse

> Hello, I wrote two simple single-process sequential read test cases to compare libgfapi and FUSE. The logic looks like this:
>
>     char buf[32768];
>     while (1) {
>         cnt = read(fd, buf, sizeof(buf));
>         if (cnt == 0)
>             break;
>         else if (cnt > 0)
>             total += cnt;
>         /* No cnt < 0 was seen during testing. */
>     }
>
> Following is the time needed to finish reading a large file:
>
>                     fuse    libgfapi
>     direct io:      40s     51s
>     non direct io:  40s     47s
>
> The version is 3.6.3 on CentOS 6.5. The result shows that libgfapi is obviously slower than the FUSE interface, although a lot of CPU cycles were saved during libgfapi testing. Each test was run after cleaning up all kernel pagecache/inode/dentry caches and stopping and then restarting glusterd/gluster processes (to clear Gluster's own caches).

So if you use libgfapi in a single-threaded app, you may need to tune the gluster volume parameter read-ahead-page-count (defaults to 4). The default is intended to trade off single-thread performance for better aggregate performance and response time. Here is an example of how to tune it for a single-thread use case; don't do this all the time.
    # gluster v set your-volume performance.read-ahead-page-count 16

As a debugging tool, you can try disabling the read-ahead translator altogether:

    # gluster v set your-volume read-ahead off

To reset these parameters to their defaults:

    # gluster v reset your-volume read-ahead
    # gluster v reset your-volume read-ahead-page-count

I have a benchmark for libgfapi testing in case this is useful to you: https://github.com/bengland2/parallel-libgfapi -- please e-mail me directly if you have problems with it.

> I tested direct io because I suspected that fuse kernel readahead helped more than the read optimization solutions in gluster. I searched a lot but I did not find much about the comparison between fuse and libgfapi. Has anyone seen this, and does anyone know why?

If you use O_DIRECT you may be bypassing the read-ahead translator in Gluster, and this may account for your problem. Try NOT using O_DIRECT, and try the above tuning. Or if you really need O_DIRECT on the client, try this command, which disables O_DIRECT on the server side but not the client -- it's the equivalent of NFS behavior:

    # gluster v set your-volume network.remote-dio on

Also try turning off the io-cache translator, which will not help you here:

    # gluster v set your-volume io-cache off

Also, O_DIRECT is passed all the way to the server by Gluster, so your disk reads will ALSO use O_DIRECT -- this is terrible for performance. You want block device readahead when doing this test. I suggest setting it to at least 4096 KB for block devices used for Gluster brick mountpoints.
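A sketch of the cache-dropping procedure the poster describes, which makes repeated runs comparable (the mount path and filename are invented):

```shell
sync                                  # flush dirty pages first
echo 3 > /proc/sys/vm/drop_caches     # drop kernel pagecache, dentries, and inodes
dd if=/mnt/glusterfs/bigfile of=/dev/null bs=32k   # same 32 KiB buffer size as the test program
```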
Re: [Gluster-users] [Gluster-devel] High CPU Usage - Glusterfsd
Renchu, I didn't see anything about average file size or read/write mix. One way to observe both of these, as well as latency and throughput -- on the server, run these commands:

    # gluster volume profile your-volume start
    # gluster volume profile your-volume info > /tmp/dontcare
    # sleep 60
    # gluster volume profile your-volume info > profile-for-last-minute.log

There is also a "gluster volume top" command that may be of use to you in understanding what your users are doing with Gluster. Also, you may want to run "top -H" and see whether any threads in either glusterfsd or smbd are at or near 100% CPU -- if so, you really are hitting a CPU bottleneck. Looking at per-process CPU utilization can be deceptive, since a process may contain multiple threads. "sar -n DEV 2" will show you network utilization, and "iostat -mdx /dev/sd? 2" on your server will show block device queue depth (the latter two tools require the sysstat rpm). Together these can help you understand what kind of bottleneck you are seeing.

I don't see how many bricks are in your Gluster volume, but it sounds like you have only one glusterfsd per server. If you have idle cores on your servers, you can harness more CPU power by using multiple bricks per server, which results in multiple glusterfsd processes on each server, allowing greater parallelism. For example, you can do this by presenting individual disk drives as bricks rather than RAID volumes. Let us know if these suggestions helped. -ben england

- Original Message -
From: Renchu Mathew ren...@cracknell.com
To: gluster-users@gluster.org
Cc: gluster-de...@gluster.org
Sent: Sunday, February 22, 2015 7:09:09 AM
Subject: [Gluster-devel] High CPU Usage - Glusterfsd

> Dear all, I have implemented glusterfs storage at my company -- 2 servers with replicate. But glusterfsd shows more than 100% CPU utilization most of the time, so it is very slow to access the gluster volume. My setup is two glusterfs servers with replication.
> The gluster volume (almost 10TB of data) is mounted on another server (glusterfs native client) and shared via Samba for the network users to access those files. Is there any way to reduce the processor usage on these servers? Please give a solution ASAP since the users are complaining about the poor performance. I am using glusterfs version 3.6.
>
> Regards, Renchu Mathew | Sr. IT Administrator
> CRACKNELL DUBAI | P.O. Box 66231 | United Arab Emirates | T +971 4 3445417 | F +971 4 3493675 | M +971 50 7386484
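The observation commands suggested above, gathered into one quick pass (the /dev/sd? glob is a guess; adjust it to your brick devices):

```shell
top -b -H -n 1 | head -n 20   # per-thread CPU; look for glusterfsd/smbd threads near 100%
sar -n DEV 2 3                # three 2-second samples of network utilization
iostat -mdx /dev/sd? 2 3      # block device throughput and queue depth (needs sysstat)
```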
Re: [Gluster-users] Gluster-users Digest, Vol 77, Issue 2
Message: 9
Date: Tue, 2 Sep 2014 17:17:25 +0800
From: Jaden Liang jaden1...@gmail.com
To: gluster-de...@gluster.org, gluster-users@gluster.org
Subject: [Gluster-users] [Gluster-devel] Regarding write performance in a replica 1 volume on 1Gbps Ethernet: about 50MB/s writing a single file

> Hello, gluster-devel and gluster-users teams, We are running a performance test on a replica 1 volume and find that single-file sequential write performance only reaches about 50MB/s on 1Gbps Ethernet. However, if we test sequential writes to multiple files, write performance goes up to 120MB/s, which is the top speed of the network.

Not sure what you mean -- are you writing multiple files concurrently or one at a time? With FUSE, this matters; I typically see best throughput with more than one file being transferred at the same time.

> We also tried to use the stat xlator to find out where the bottleneck of single-file write performance is. Here is the stat data:
>
> Client-side: .. vs_vol_rep1-client-8.latency.WRITE=total:21834371.00us, mean:2665.328491us, count:8192, max:4063475, min:1849 ..
>
> Server-side: .. /data/sdb1/brick1.latency.WRITE=total:6156857.00us, mean:751.569458us, count:8192, max:230864, min:611 ..

What's your write transfer size? With FUSE, this matters a lot, since FUSE does not aggregate writes, so each write has to travel from the application to the glusterfs mountpoint process, resulting in slow performance for small transfer sizes. In general, it's a good idea to supply the details of your workload generator and how it was run, so we can compare with other known workloads and results.

> Note that the test writes a single 1GB file sequentially to a replica 1 volume over a 1Gbps Ethernet network.

So for example try:

    # dd if=/dev/zero of=/mnt/glusterfs/your-file.dd bs=1024k count=1k

and see whether your throughput is still 50 MB/s.
> On the client side, we can see there are 8192 write requests in total. Each request writes 128KB of data. Total elapsed time is 21834371us, about 21 seconds. The mean time per request is 2665us, about 2.6ms, which means it can only serve about 380 requests per second. There are other time consumers such as statfs and lookup, but those are not the major reasons. On the server side, the mean time per request is 751us, including writing data to the HDD, so we think that is not the major reason. We also modified some code to measure system epoll elapsed time; it only took about 20us from enqueueing data to finishing the send. Now we are heading into the rpc mechanism in glusterfs. Still, we think this issue may have been encountered before by the gluster-devel or gluster-users teams, so any suggestions would be appreciated. Does anyone know of such an issue?
>
> Best regards, Jaden Liang 9/2/2014
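The client-side numbers above can be sanity-checked with shell arithmetic (all figures come from the quoted stats; the integer division slightly understates the rates):

```shell
count=8192            # write requests reported by the client xlator
size_kb=128           # KB written per request
total_us=21834371     # total WRITE latency in microseconds
mean_us=$((total_us / count))              # mean latency per request
reqs_per_sec=$((1000000 / mean_us))        # requests servable per second
mb_per_sec=$((reqs_per_sec * size_kb / 1024))
echo "${mean_us} us/req, ${reqs_per_sec} req/s, ~${mb_per_sec} MB/s"
# prints: 2665 us/req, 375 req/s, ~46 MB/s  -- consistent with the observed ~50MB/s
```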
Re: [Gluster-users] Gluster-users Digest, Vol 76, Issue 18 - Re: reading not distributed across bricks
Message: 1
Date: Mon, 11 Aug 2014 09:53:30 -0400 (EDT)
From: Justin Clift jus...@gluster.org
To: Pranith Kumar Karampuri pkara...@redhat.com
Cc: gluster-users@gluster.org, Ray Mannings manningsr...@gmail.com
Subject: Re: [Gluster-users] Reading not distributed across bricks

- Original Message -
> hi Ray, Reads are served from the bricks which respond the fastest at the moment. They are not load-balanced.

Maybe a good feature for 3.7? :)

Ray, there already is such a feature. From "gluster volume set help":

    Option: cluster.read-hash-mode
    Description: inode-read fops happen only on one of the bricks in replicate. AFR will prefer the one computed using the method specified using this option.
    0 = first responder, 1 = hash by GFID of file (all clients use same subvolume), 2 = hash by GFID of file and client PID

This is particularly useful for benchmark tests, where the system may not have enough response-time data to load-balance properly; I have seen all the clients select the same replica using the default value of 0. The value 2 is nice because if many clients are reading the same file, the load is distributed across bricks. -ben
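Applied to a volume, the setting described above would look like this (the volume name is a placeholder, not something from the thread):

```shell
# 2 = hash by GFID of file and client PID, so different clients pick different replicas
gluster volume set your-volume cluster.read-hash-mode 2
```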
Re: [Gluster-users] Gluster-users Digest, Vol 75, Issue 25 - striped volume x8, poor sequential read performance
Sergey, comments inline... Is your intended workload really single-client, single-thread? Or is it more MPI-like -- for example, do you have many clients reading from different parts of the same large file? If the latter, perhaps IOR would be a better benchmark for you. Sorry, I'm not familiar with the striping translator.

- Original Message -
From: gluster-users-requ...@gluster.org
To: gluster-users@gluster.org
Sent: Tuesday, July 22, 2014 7:21:56 AM
Subject: Gluster-users Digest, Vol 75, Issue 25

Message: 9
Date: Mon, 21 Jul 2014 21:35:15 +0100 (BST)
From: Sergey Koposov kopo...@ast.cam.ac.uk
To: gluster-users@gluster.org
Subject: [Gluster-users] glusterfs, striped volume x8, poor sequential read performance, good write performance

> Hi, I have an HPC installation with 8 nodes. Each node has a software RAID1 using two NL-SAS disks, and the disks from the 8 nodes are combined into a large shared striped 20TB glusterfs partition, which seems to show abnormally slow sequential read performance, with good write performance. Basically what I see is that write performance is very decent, ~500MB/s (tested using dd):
>
>     [root@ bigstor]# dd if=/dev/zero of=test2 bs=1M count=100000
>     100000+0 records in
>     100000+0 records out
>     104857600000 bytes (105 GB) copied, 186.393 s, 563 MB/s
>
> And all this is not just sitting in the cache of each node, as I see the data being flushed to disk at approximately the right speed. At the same time, read performance (tested using dd after dropping the caches beforehand) is really bad:
>
>     [root@ bigstor]# dd if=/data/bigstor/test of=/dev/null bs=1M count=10000
>     10000+0 records in
>     10000+0 records out
>     10485760000 bytes (10 GB) copied, 309.821 s, 33.8 MB/s
>
> While doing this, the glusterfs processes only take ~10-15% of CPU at most, so it isn't CPU starvation.
> The underlying devices do not seem to be loaded at all:
>
>     Device: rrqm/s wrqm/s   r/s  w/s   rkB/s  wkB/s avgrq-sz avgqu-sz await svctm %util
>     sda       0.00   0.00 73.00 0.00 9344.00   0.00   256.00     0.11  1.48  1.47 10.70
>
> To check that the disks are not the problem, I did a separate test of the read speed of the RAIDed disks on all machines, and they have read speeds of ~180MB/s (uncached). So they aren't the problem.

Gluster has a read-ahead-page-count setting; I'd try setting it up to 16 (as high as it will go) -- the default is 4. Writes are different because a write to a brick can complete before the data hits the disk (in other words, as soon as the data reaches server memory), but with reads, if the data is not cached in memory then your only solution is to get all bricks reading at the same time. Contrast this with a single-brick 12-disk RAID6 volume (with 32-MB readahead) that can hit 800 MB/s on read. Clearly it isn't the rest of Gluster that's holding you back; it's probably the stripe translator's behavior. Does the stripe translator support parallel reads to different subvolumes in the stripe? Can you post a protocol trace that shows the on-the-wire behavior (collect with tcpdump, display with wireshark)? You could try running a re-read test without the stripe translator; I suspect it will perform better, based on my own experience.

> I also tried to increase the readahead on the RAID disks:
>
>     echo 2048 > /sys/block/md126/queue/read_ahead_kb
>
> but that doesn't seem to help at all.

To prove this, try re-reading a file that fits in the Linux buffer cache on the servers -- block device readahead is then irrelevant since there is no disk I/O at all. You are then doing a network test with Gluster. Also, try doing a dd read from the brick (subvolume) directly.

> Does anyone have any advice on what to do here? What knobs to adjust?
> To me it looks like a bug, to be honest, but I would be happy if there is a magic switch I forgot to turn on :)

Second, if you are using IPoIB, try the jumbo frame settings MTU=65520 and MODE=connected (in ifcfg-ib0) to reduce Infiniband interrupts on the client side. Try the FUSE mount option -o gid-timeout=2. What is the stripe width of the Gluster volume in KB? It looks like it's the default -- I forget what that is, but you probably want it to be something like 128 KB x 8. A very large stripe size will prevent Gluster from utilizing more than 1 brick at the same time.

> Here are more details about my system:
>
> OS: CentOS 6.5
> glusterfs: 3.4.4
> Kernel: 2.6.32-431.20.3.el6.x86_64
>
> mount options and df output:
>
>     [root@ bigstor]# cat /etc/mtab
>     /dev/md126p4 /data/glvol/brick1 xfs rw 0 0
>     node1:/glvol /data/bigstor fuse.glusterfs rw,default_permissions,allow_other,max_read=131072 0 0
>     [root@ bigstor]# df
>     Filesystem   1K-blocks       Used Available Use% Mounted on
>     /dev/md126p4 2516284988 2356820844
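As a rough ceiling check using the figures from the post (real striped-read throughput depends on whether the stripe translator issues reads to the bricks in parallel):

```shell
bricks=8        # striped nodes in the volume
per_brick=180   # MB/s measured per RAID1 pair, uncached
echo "$((bricks * per_brick)) MB/s theoretical aggregate"
# prints: 1440 MB/s theoretical aggregate -- versus the observed 33.8 MB/s
```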
Re: [Gluster-users] Gluster-users Digest, Vol 59, Issue 15 - GlusterFS performance
- Original Message -
From: gluster-users-requ...@gluster.org
To: gluster-users@gluster.org
Sent: Friday, March 1, 2013 4:03:13 PM
Subject: Gluster-users Digest, Vol 59, Issue 15

Message: 2
Date: Fri, 01 Mar 2013 10:22:21 -0800
From: Joe Julian j...@julianfamily.org
To: gluster-users@gluster.org
Subject: Re: [Gluster-users] GlusterFS performance

> The kernel developers introduced a bug into ext4 that has yet to be fixed. If you use xfs you won't have those hangs.
>
> On 03/01/2013 01:30 AM, Nikita A Kardashin wrote:
>> Hello again! I have completely rebuilt my storage. As a base: ext4 over mdadm RAID1. Gluster volume in distributed-replicated mode with these settings:
>>
>> Options Reconfigured:
>> performance.cache-size: 1024MB
>> nfs.disable: on
>> performance.write-behind-window-size: 4MB
>> performance.io-thread-count: 64
>> features.quota: off
>> features.quota-timeout: 1800
>> performance.io-cache: on
>> performance.write-behind: on
>> performance.flush-behind: on
>> performance.read-ahead: on
>>
>> As a result, I get write performance of about 80MB/s on: dd if=/dev/zero of=testfile.bin bs=100M count=10

Make sure your network and storage bricks are performing as you expect them to; Gluster is only as good as the underlying hardware. What happens with reads? What happens when you do multiple threads doing writes?

    for n in `seq 1 4` ; do
        dd if=/dev/zero of=testfile$n.bin bs=100M count=10 &
    done
    time wait

>> If I try to execute the above command inside a virtual machine (KVM), the first time everything goes right -- about 900MB/s (cache effect, I think) -- but if I run this test again on the existing file, the dd task hangs and can be stopped only by Ctrl+C.

In future, post the qemu process command line (from ps awux). Are you writing to a local filesystem inside the virtual disk image, or are you mounting Gluster from inside the VM? If you are going through /dev/vda, then are you using KVM qemu cache=writeback?
You could try cache=writethrough or cache=none; see comments below for cache=none. Also, try io=threads, not io=native.

>> Overall virtual system latency is poor too. For example, "apt-get upgrade" upgrades the system very, very slowly, freezing on "Unpacking replacement" and other io-related steps.

If you don't have a fast connection to storage, the Linux VM will buffer write data in the kernel buffer cache until it runs out of memory for that (vm.dirty_ratio), then it will freeze any process that issues writes. If your VM has a lot of memory relative to storage speed, this can result in very long delays. Try reducing vm.dirty_background_ratio to get writes going sooner, and vm.dirty_ratio so that the freezes don't last as long. You can even reduce the VM's block device queue depth. But most of all, make sure that gluster writes are performing near typical local block device speed.

>> Does glusterfs have any tuning options that can help me?

If your workload is strictly large-file, try this volume tuning:

    storage.linux-aio: off (default)
    cluster.eager-lock: enable (default is disabled)
    network.remote-dio: on (default is off)
    performance.write-behind-window-size: 1MB (default)

For a pure single-thread sequential read workload, you can tune the read-ahead translator to be more aggressive. This will help single-thread reads, but don't do it for other workloads, such as virtual machine images in the Gluster volume (which appear to Gluster as more of a random I/O workload):

    performance.read-ahead-page-count: 16 (default is 4 128-KB prefetched buffers)

http://community.gluster.org/a/linux-kernel-tuning-for-glusterfs/

The Red Hat Storage distribution will help tune the Linux block device for better performance on many workloads.
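The dirty-page tuning above might look like this in practice; the values here are illustrative guesses, not recommendations from the thread -- appropriate numbers depend on RAM size and storage speed:

```shell
# Sketch only: start background writeback sooner, cap dirty memory lower.
sysctl -w vm.dirty_background_ratio=5
sysctl -w vm.dirty_ratio=20
# To persist across reboots, add the same keys to /etc/sysctl.conf.
```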
Re: [Gluster-users] Gluster-users Digest, Vol 53, Issue 56 -- GlusterFS performance (Steve Thompson)
Steve, try glusterfs 3.3 and look at: http://community.gluster.org/a/linux-kernel-tuning-for-glusterfs/ -- there will be more optimizations in the next Gluster release. Take advantage of the translators that Gluster supplies, including the read-ahead translator and the quick-read translator. Red Hat does offer support for Red Hat Storage based on Gluster, and it has a pre-packaged tuning profile built into it. We test with 10-GbE networks, and Gluster 3.3 does have reasonably good performance for large-file sequential workloads (and it's scalable).
Re: [Gluster-users] Gluster-users Digest, Vol 51, Issue 49
Message: 4
Date: Fri, 27 Jul 2012 15:29:41 -0700
From: Harry Mangalam hjmanga...@gmail.com
Subject: [Gluster-users] Change NFS parameters post-start
To: gluster-users gluster-users@gluster.org

> In trying to convert clients from the gluster native client to an NFS client, I'm trying to get the gluster volume mounted on a test mount point on the same client where the native client has mounted the volume. The client refuses with this error:
>
>     mount -t nfs bs1:/gl /mnt/glnfs
>     mount: bs1:/gl failed, reason given by server: No such file or directory

Harry, have you tried:

    # mount -t nfs -o nfsvers=3,tcp bs1:/gl /mnt/glnfs

Also, there is an /etc/sysconfig/nfs file that may let you remove RDMA as a mount option for NFS.
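If the nfsvers=3,tcp mount works, a matching /etc/fstab entry would be a sketch like the following (same host and path as above; verify the options against your mount man page):

```
bs1:/gl   /mnt/glnfs   nfs   nfsvers=3,tcp   0 0
```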
Re: [Gluster-users] Gluster-users Digest, Vol 48, Issue 18 - Horrible Gluster Performance
Philip, what parts of your system perform well? Can you give a specific example of your workload (what you are asking the system to do)? If it's a mixture of different workloads, that's important too. What versions of Gluster and Linux are you using? My suggestions would be:

a) reset all your gluster tuning parameters to their default values unless you are sure that they actually improve performance;

b) try to isolate your performance problem to as simple a workload as possible before you try to fix it, and try to determine what workloads DO work well in your configuration -- this will make it easier for others to help;

c) if latency spikes are the issue, this sounds like it could be related to writes being excessively buffered by the Linux kernel and then flushed all at once, which can block reads. Use "iostat -kx /dev/sd? 5" or equivalent to observe this. You can throttle back dirty pages in the kernel and avoid buffering dirty pages for long periods to avoid these spikes. http://community.gluster.org/a/linux-kernel-tuning-for-glusterfs/ provides some suggestions that may be relevant to your problem; my recommendations are in a comment there.

Message: 9
Date: Fri, 13 Apr 2012 11:25:58 +0200
From: Philip flip...@googlemail.com
Subject: [Gluster-users] Horrible Gluster Performance
To: gluster-users@gluster.org

> I have a small GlusterFS cluster providing a replicated volume. Each server has 2 SAS disks for the OS and logs, and 22 SATA disks for the actual data, striped together as a RAID10 using a MegaRAID SAS 9280-4i4e with this configuration: http://pastebin.com/2xj4401J
>
> Connected to this cluster are a few other servers with the native client, running nginx to serve files stored on it on the order of 3-10MB. Right now a storage server has outgoing bandwidth of 300Mbit/s, and the busy rate of the raid array is 30-40%.
> There are also strange side effects: sometimes the io-latency skyrockets and no access is possible on the raid for 10 seconds. This happens at 300Mbit/s or 1000Mbit/s of outgoing bandwidth. The filesystem used is xfs, and it has been tuned to match the raid stripe size. I've tested all sorts of gluster settings, but none seem to have any effect, so I've reset the volume configuration and it is using the default one. Does anyone have an idea what could be the reason for such bad performance? 22 disks in a RAID10 should deliver *way* more throughput.
[Gluster-users] RDMA/Ethernet with RoCEE - failed to modify QP to RTR
Did any RDMA-over-Ethernet users see this Gluster error? If so, do you know what caused it and how to fix it? If you haven't seen it, what RPMs and configuration do you use specific to RDMA/Ethernet?

    [2011-11-10 10:30:20.595801] C [rdma.c:2417:rdma_connect_qp] 0-rpc-transport/rdma: Failed to modify QP to RTR
    [2011-11-10 10:30:20.595930] E [rdma.c:4159:rdma_handshake_pollin] 0-rpc-transport/rdma: rdma.management: failed to connect with remote QP

I see this when I run RDMA over Ethernet using the RoCEE RPMs, but when I run over Infiniband on RHEL 6.2, it runs fine. On the same Ethernet configuration, Gluster/TCP runs fine, NFS/RDMA runs fine, and so does an AMQP app. But the qperf and rping utilities fail in the same way. The firmware on the HCAs is not the latest -- is it worth the risk to upgrade? I went into the debugger and found the line where qperf fails; it's near line 2056 in rdma.c in the qperf sources (qperf-debuginfo; I modified the Makefile):

    (gdb) 2088    } else if (dev->trans == IBV_QPT_RC) {
    (gdb) 2090        flags = IBV_QP_STATE | ...
    (gdb) 2097        if (ibv_modify_qp(dev->qp, &rtr_attr, flags) != 0)
    (gdb) 2098            error(SYS, "failed to modify QP to RTR");
    (gdb)

Gluster fails in rdma_connect_qp() calling the same routine, but perhaps with different parameters.
[Gluster-users] Gluster/RDMA
To Harry Mangalam, about Gluster/RDMA: make sure these modules are loaded:

    # modprobe -v rdma_ucm
    # modprobe -v ib_uverbs
    # modprobe -v ib_ucm

To run the subnet manager:

    # modprobe -v ib_umad

Make sure the libibverbs and (libmlx4 or libmthca) RPMs are installed. I don't understand why the appropriate modules aren't loaded automatically. Could we put something in /etc/modprobe.d/ to make this happen, maybe? Infiniband should not require troubleshooting after 5-10 years of development; it should just work.
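One way to make those modules load at boot on RHEL-family systems -- a sketch, assuming the /etc/sysconfig/modules convention applies to your distro (adjust the module list to your HCA):

```shell
# Hypothetical boot-time loader script, not from the original thread.
cat > /etc/sysconfig/modules/rdma.modules <<'EOF'
#!/bin/sh
modprobe rdma_ucm
modprobe ib_uverbs
modprobe ib_ucm
modprobe ib_umad
EOF
chmod +x /etc/sysconfig/modules/rdma.modules
```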