Re: [ceph-users] failing to respond to cache pressure

Mark Nelson Mon, 16 May 2016 07:44:19 -0700

FWIW, when we tested CephFS at ORNL a couple of years ago we were doing about 4-6 GB/s on relatively non-optimal hardware (pretty much maxing the hardware out on writes, though only about 50-60% on reads). What you are experiencing isn't necessarily reflective of how a healthy cluster will perform. Certainly if you have degraded PGs you are going to see lower performance than you would if your cluster was healthy. The first step is probably figuring out why your cluster is unhealthy.


Mark


On 05/16/2016 09:11 AM, Andrus, Brian Contractor wrote:

Both client and server are Jewel 10.2.0

"All kinds of issues" includes that EVERY node ended up with the cache pressure
message, even nodes that had done no access at all.
I ended up with some 200 degraded PGs, quite a few of them also showing other
'standard' errors such as stuck waiting. I ended up disconnecting all
mounted clients and waiting about 45 minutes for it to clear. I couldn't
effectively do any writes until I let it clear.
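Mark's suggestion to start by finding out why the cluster is unhealthy can be made concrete by counting PGs per state. Below is a minimal Python sketch, assuming the JSON printed by `ceph status --format json` carries a `pgmap.pgs_by_state` list whose entries have `state_name` and `count` fields, as Jewel-era releases do; check the layout on your own version. The function names and the sample input are hypothetical.

```python
import json
from collections import Counter


def summarize_pg_states(status: dict) -> Counter:
    """Tally PGs per individual state flag from parsed `ceph status --format json` output.

    Assumes status["pgmap"]["pgs_by_state"] is a list of
    {"state_name": "active+clean", "count": N} entries.
    """
    flags: Counter = Counter()
    for entry in status.get("pgmap", {}).get("pgs_by_state", []):
        # A combined state such as "active+undersized+degraded" counts toward each flag.
        for flag in entry["state_name"].split("+"):
            flags[flag] += entry["count"]
    return flags


def unhealthy_pg_count(status: dict) -> int:
    """Number of PGs whose combined state is anything other than exactly active+clean."""
    total = 0
    for entry in status.get("pgmap", {}).get("pgs_by_state", []):
        if entry["state_name"] != "active+clean":
            total += entry["count"]
    return total


if __name__ == "__main__":
    # Made-up sample; on a real cluster, json.load() the output of `ceph status --format json`.
    sample = json.loads("""
    {"pgmap": {"num_pgs": 1024, "pgs_by_state": [
        {"state_name": "active+clean", "count": 900},
        {"state_name": "active+undersized+degraded", "count": 100},
        {"state_name": "active+recovery_wait+degraded", "count": 24}
    ]}}
    """)
    print("PGs not active+clean:", unhealthy_pg_count(sample))
    print("degraded:", summarize_pg_states(sample)["degraded"])
```

Splitting the combined state names lets you see at a glance whether the problem is mostly `degraded`, `undersized`, `stuck`-style recovery waits, or something else, which is the question to answer before looking at performance.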

I am watching my write speeds, and while I can get them to peak at a couple
hundred MB/s, they are usually below 10 MB/s and often below 1 MB/s.
That isn't the kind of performance I would expect from a parallel file system,
hence my questioning whether it should be used in my environment.
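Peak-versus-typical figures like these are easier to compare across runs, and against other people's numbers, when they are measured the same way each time. Below is a minimal Python sketch, assuming a single client writing one stream of incompressible data into a directory on the CephFS mount; it times the writes plus a final `fsync` and reports decimal MB/s. The function name and its defaults are illustrative, not part of any Ceph tool.

```python
import os
import tempfile
import time


def measure_write_mb_per_s(directory: str, total_mb: int = 256, block_kb: int = 4096) -> float:
    """Write total_mb of random data to a temporary file under `directory`,
    fsync it, delete it, and return the throughput in MB/s (1 MB = 10**6 bytes).

    The fsync is inside the timed region so the result reflects data that has
    left the client page cache, not just buffered writes.
    """
    block = os.urandom(block_kb * 1024)  # random bytes defeat any compression along the path
    total_bytes = total_mb * 10**6
    fd, path = tempfile.mkstemp(dir=directory, prefix="writetest-")
    written = 0
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            while written < total_bytes:
                # The last write may be a partial block.
                chunk = block[: min(len(block), total_bytes - written)]
                f.write(chunk)
                written += len(chunk)
            f.flush()
            os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    return written / 10**6 / elapsed
```

For example, `measure_write_mb_per_s("/mnt/cephfs/scratch")`, where that path is a hypothetical directory on the mount. A single stream from one client measures that client's path to the cluster, not the aggregate bandwidth a parallel file system delivers to many clients at once.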


Brian Andrus
ITACS/Research Computing
Naval Postgraduate School
Monterey, California
voice: 831-656-6238




-----Original Message-----
From: John Spray [mailto:jsp...@redhat.com]
Sent: Monday, May 16, 2016 2:28 AM
To: Andrus, Brian Contractor
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] failing to respond to cache pressure

On Mon, May 16, 2016 at 5:42 AM, Andrus, Brian Contractor <bdand...@nps.edu> wrote:

So this ‘production ready’ CephFS in Jewel seems a little not quite….



Currently I have a single system mounting CephFS and merely scp-ing
data to it.

The CephFS mount has 168 TB used, 345 TB / 514 TB avail.



Every so often, I get a HEALTH_WARN message, "mds0: Client failing to
respond to cache pressure".


What client, what version?
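One way to answer "what client, what version?" is to ask the MDS which sessions hold the most capabilities, since the cache pressure warning is roughly about a client not releasing caps when the MDS asks it to. Below is a minimal Python sketch, assuming you have already loaded the JSON from `ceph daemon mds.<name> session ls` (run on the MDS host, via its admin socket) with `json.load`. The `id`, `num_caps` and `client_metadata` fields (`hostname`, plus `ceph_version` for ceph-fuse or `kernel_version` for the kernel client) reflect that output as we understand it and should be checked against your release; the function name is hypothetical.

```python
from typing import Any


def rank_sessions_by_caps(sessions: list[dict[str, Any]], top: int = 5) -> list[tuple[int, str, str, int]]:
    """Return (session id, hostname, client version, num_caps) tuples, largest num_caps first.

    Field names are assumed from `ceph daemon mds.<name> session ls` output;
    missing fields fall back to -1, "unknown" or 0.
    """
    rows = []
    for s in sessions:
        meta = s.get("client_metadata", {})
        # ceph-fuse clients report ceph_version; kernel clients report kernel_version.
        version = meta.get("ceph_version") or meta.get("kernel_version") or "unknown"
        rows.append((s.get("id", -1), meta.get("hostname", "unknown"), version, s.get("num_caps", 0)))
    rows.sort(key=lambda r: r[3], reverse=True)
    return rows[:top]
```

The hostname and version at the top of that list are the details a bug report needs, and a client holding far more caps than its peers is the natural first suspect.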

Even if I stop the scp, it will not go away until I umount/remount the
filesystem.



For testing, I had CephFS mounted on about 50 systems, and when updatedb
started on them, I ran into all kinds of issues.


All kinds of issues...?  Need more specific bug reports than that to fix things.

John

I figured having updatedb run on a few systems would be a good 'see
what happens' test of a fair amount of access to it.
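Assuming the indexer in question is mlocate's `updatedb(8)`, it crawls every mounted filesystem whose type is not listed in `PRUNEFS` in `/etc/updatedb.conf`, so many nodes indexing a shared CephFS mount at once would plausibly produce a burst of metadata traffic and cached inodes on every client. Below is a minimal Python sketch that checks whether the kernel client type (`ceph`) and the ceph-fuse type (`fuse.ceph-fuse`) are already excluded. The parsing is a simplified assumption about the file's `KEY="value"` format (if `PRUNEFS` is assigned more than once, this sketch keeps the last assignment), and the function names are hypothetical.

```python
import shlex


def pruned_filesystems(conf_text: str) -> set[str]:
    """Return the lower-cased filesystem types listed in PRUNEFS in updatedb.conf text."""
    pruned: set[str] = set()
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "PRUNEFS":
            # PRUNEFS="nfs nfs4 ..." -> shlex removes the quotes, split() separates types.
            # mlocate compares filesystem types case-insensitively, hence lower().
            pruned = {fs.lower() for token in shlex.split(value) for fs in token.split()}
    return pruned


def cephfs_is_pruned(conf_text: str) -> dict[str, bool]:
    """Report whether the kernel ('ceph') and ceph-fuse ('fuse.ceph-fuse') mount types are excluded."""
    pruned = pruned_filesystems(conf_text)
    return {fs: fs in pruned for fs in ("ceph", "fuse.ceph-fuse")}
```

For example, `cephfs_is_pruned(open("/etc/updatedb.conf").read())`. If either entry comes back `False`, adding `ceph fuse.ceph-fuse` to the `PRUNEFS` list (or the mount point to `PRUNEPATHS`) keeps updatedb off the shared mount.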



So, should I not even be considering CephFS as a large storage
mount for a compute cluster? Is there a sweet spot for what CephFS
would be good for?





_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
