On Tue, Jul 14, 2015 at 11:30 AM, Simion Rad <simion....@yardi.com> wrote:
> Hi ,
>
> The output of ceph -s :
>
> cluster 50961297-815c-4598-8efe-5e08203f9fea
>      health HEALTH_OK
>      monmap e5: 5 mons at 
> {pshn05=10.71.13.5:6789/0,pshn06=10.71.13.6:6789/0,pshn13=10.71.13.13:6789/0,psosctl111=10.71.13.111:6789/0,psosctl112=10.71.13.112:6789/0},
>  election epoch 258, quorum 0,1,2,3,4 
> pshn05,pshn06,pshn13,psosctl111,psosctl112
>      mdsmap e173: 1/1/1 up {0=pshn17=up:active}, 4 up:standby
>      osdmap e21319: 16 osds: 16 up, 16 in
>       pgmap v3301189: 384 pgs, 3 pools, 4906 GB data, 3794 kobjects
>             9940 GB used, 10170 GB / 21187 GB avail
>                  384 active+clean
>
> I don't use any Ceph client (kernel or FUSE) on the same nodes that run
> OSD/MON/MDS daemons.
> Yes, I see slow operation warnings from time to time when I'm looking at
> ceph -w.

Yeah, I think this is just it — especially if you've got some OSDs
which are 9 times larger than others, the load will disproportionately
go to them and they probably can't take it.
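
If you want to sanity-check that, a rough way to see how the data is spread
across the differently sized OSDs (exact output formats vary a bit between
releases) is something like:

    ceph osd tree
    ceph pg dump osds

The first shows each OSD's CRUSH weight, which is roughly proportional to its
size and therefore to how many PGs land on it; the second shows how full each
OSD actually is.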

The next time things get stuck you can look at the admin socket on the
ceph-fuse machines, run dump_ops_in_flight, and see whether any of the ops
are very old and which OSDs they're targeted at. (You can get similar
information out of the kernel clients by cat'ing the files in
/sys/kernel/debug/ceph/*/.)
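
As a concrete sketch (assuming the default admin socket location under
/var/run/ceph; adjust the glob for your client names), that would look
something like:

    # see which commands the client socket supports, then dump the in-flight ops
    ceph --admin-daemon /var/run/ceph/ceph-client.*.asok help
    ceph --admin-daemon /var/run/ceph/ceph-client.*.asok dump_ops_in_flight

    # kernel clients expose similar state via debugfs
    cat /sys/kernel/debug/ceph/*/osdc

If the old ops all point at the same one or two OSDs, that would fit the
overload theory.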
-Greg

> The number of IOPS on the servers isn't that high, and I think the write-back
> cache of the RAID controller should be able to help with the journal ops.
>
> Simion Rad.
> ________________________________________
> From: Gregory Farnum [g...@gregs42.com]
> Sent: Tuesday, July 14, 2015 12:38
> To: Simion Rad
> Cc: ceph-us...@ceph.com
> Subject: Re: [ceph-users] ceph daemons stucked in FUTEX_WAIT syscall
>
> On Mon, Jul 13, 2015 at 11:00 PM, Simion Rad <simion....@yardi.com> wrote:
>> Hi ,
>>
>> I'm running a small CephFS cluster (21 TB, 16 OSDs with different sizes
>> ranging from 400 GB to 3.5 TB) that is used as a file warehouse (both small
>> and big files).
>> Every day there are times when a lot of processes running on the client
>> servers (using either the FUSE or the kernel client) become stuck in D state,
>> and when I strace them I see them waiting in the FUTEX_WAIT syscall.
>> I see the same issue on all the OSD daemons.
>> The Ceph version I'm running is Firefly 0.80.10, both on the clients and on
>> the server daemons.
>> I use ext4 as the OSD filesystem.
>> Operating system on servers: Ubuntu 14.04 with kernel 3.13.
>> Operating system on clients: Ubuntu 12.04 LTS with the HWE option (kernel 3.13).
>> The OSD daemons are using RAID5 virtual disks (6 x 300 GB 10K RPM disks on a
>> Dell PERC H700 RAID controller with a 512 MB BBU in write-back mode).
>> The servers that the Ceph daemons run on are also hosting KVM VMs
>> (OpenStack Nova).
>> Because of this unfortunate setup the performance is really bad, but at
>> least I shouldn't see as many locking issues (or should I?).
>> The only thing that temporarily improves the performance is restarting
>> every OSD. After such a restart I see some processes on the client machines
>> resume I/O, but only for a couple of hours, and then the whole process must
>> be repeated.
>> I cannot afford to run a setup without RAID because there isn't enough RAM
>> left for a couple more OSD daemons.
>>
>> The ceph.conf settings I use:
>>
>> auth cluster required = cephx
>> auth service required = cephx
>> auth client required = cephx
>> filestore xattr use omap = true
>> osd pool default size = 2
>> osd pool default min size = 1
>> osd pool default pg num = 128
>> osd pool default pgp num = 128
>> public network = 10.71.13.0/24
>> cluster network = 10.71.12.0/24
>>
>> Has anyone else experienced this kind of behaviour (processes stuck in the
>> FUTEX_WAIT syscall) when running the Firefly release on Ubuntu 14.04?
>
> What's the output of "ceph -s" on your cluster?
> When your clients get stuck, is the cluster complaining about stuck
> ops on the OSDs?
> Are you running kernel clients on the same boxes as your OSDs?
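>
> (A quick way to check whether the OSDs are reporting stuck/slow ops, assuming
> a fairly default setup, is something along the lines of:
>
>     ceph health detail
>     ceph -w | grep -i "slow request"
>
> which should surface any slow/blocked request warnings in the cluster log.)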
>
> If I were to guess I'd imagine that you might just have overloaded
> your cluster and the FUTEX_WAIT is the clients waiting for writes to
> get acknowledged, but if restarting the OSDs brings everything back up
> for a few hours that might not be the case.
> -Greg
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
