Nick,
I have noticed that dropping the page cache sometimes helps; I was hitting an
Ubuntu page cache compaction issue (I shared it with the community some time
back). If that is the problem, perf top should show compaction-related stack
traces. Setting the sysctl vm option min_free_kbytes to a large value (5 or
10 GB in my 64 GB RAM setup) may help. But if it is the same issue, you will
hit it again after a while unless you set that option properly.
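
min_free_kbytes is given in kilobytes (KiB), not GB, so the sizes above need
converting before they go into sysctl. Below is a minimal sketch of that
arithmetic. It assumes the "5/10 GB" figures mean GiB. The 5 and 10 GiB values
are Somnath's rule of thumb for a 64 GB box, not a general recommendation.

```python
KIB_PER_GIB = 1024 * 1024  # vm.min_free_kbytes is expressed in KiB


def min_free_kbytes(gib):
    """Convert a reserve size in GiB to the value vm.min_free_kbytes expects."""
    return int(gib * KIB_PER_GIB)


def sysctl_line(gib):
    """Render a line suitable for /etc/sysctl.conf or a file in /etc/sysctl.d/."""
    return "vm.min_free_kbytes = %d" % min_free_kbytes(gib)


def fraction_of_ram(gib, ram_gib):
    """Share of physical RAM the kernel will try to keep free."""
    return gib / ram_gib
```

The computed value can be applied at runtime with
`sysctl -w vm.min_free_kbytes=<value>`. Keep in mind that memory held free this
way is also unavailable to the page cache.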

Regarding your second problem:
If you enable the op tracker, there are a bunch of counters you can dump via
the admin socket. But if performance improves when the data is served from the
page cache, the problem is unlikely to be within the OSD. Then again, the same
disks serving other RBDs are giving you good numbers (maybe part of a disk is
causing the problem?).
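
One way to triage those dumps is to rank tracked ops by duration. The helper
below is a sketch that works on the JSON printed by
`ceph daemon osd.<id> dump_historic_ops` with osd_enable_op_tracker on. The
field layout is an assumption: ops appear to be listed under "Ops" in older
releases and "ops" in newer ones, each with "description" and "duration"
fields. Check against the output of your own version.

```python
import json


def slowest_ops(dump_json, top=5):
    """Return (duration_s, description) tuples for the slowest tracked ops.

    ASSUMPTION: the op list sits under "Ops" or "ops", and each op carries
    "duration" (seconds) and "description" fields.
    """
    doc = json.loads(dump_json)
    ops = doc.get("Ops", doc.get("ops", []))
    ranked = sorted(
        ((float(op.get("duration", 0.0)), op.get("description", "")) for op in ops),
        reverse=True,  # longest duration first
    )
    return ranked[:top]
```

Feed it the admin-socket output captured from the OSD while fio is running. The
op descriptions of the slowest entries show which objects were being read.
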
BTW, do you see anything wrong in the logs after raising the OSD and filestore
debug levels to, say, 20?
If you can identify which PGs are slowing things down (from the logs or
counters), you can run similar fio reads directly against the drives holding
the primary OSDs for those PGs.
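
Once slow reads have been tied to PGs, `ceph pg map <pgid>` prints each PG's up
and acting sets. The primary is normally the first OSD in the acting set. The
sketch below has two parts: it ranks PGs by mean latency, then extracts the
primary. The (pgid, latency) samples are hypothetical, for example pulled from
debug logs. The regex assumes the `osdmap eN pg X (X) -> up [...] acting [...]`
line format.

```python
import re
from collections import defaultdict
from statistics import mean

# ASSUMPTION: `ceph pg map <pgid>` prints a line such as
#   osdmap e123 pg 2.3f (2.3f) -> up [4,11,27] acting [4,11,27]
PG_MAP_RE = re.compile(
    r"pg (?P<pgid>\S+) \(\S+\) -> up \[(?P<up>[\d,\s]*)\] "
    r"acting \[(?P<acting>[\d,\s]*)\]"
)


def primary_osd(pg_map_line):
    """Return the primary OSD id, taken as the first entry of the acting set."""
    m = PG_MAP_RE.search(pg_map_line)
    if m is None:
        raise ValueError("unrecognised pg map line: %r" % pg_map_line)
    acting = [int(x) for x in m.group("acting").split(",") if x.strip()]
    if not acting:
        raise ValueError("empty acting set in: %r" % pg_map_line)
    return acting[0]


def slowest_pgs(samples, top=3):
    """Rank PGs by mean read latency.

    `samples` is an iterable of (pgid, latency_ms) pairs from a hypothetical
    source such as OSD debug logs or op tracker dumps.
    """
    by_pg = defaultdict(list)
    for pgid, latency_ms in samples:
        by_pg[pgid].append(latency_ms)
    ranked = sorted(
        ((pgid, mean(lat)) for pgid, lat in by_pg.items()),
        key=lambda item: item[1],
        reverse=True,  # slowest first
    )
    return ranked[:top]
```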

Thanks & Regards
Somnath

-----Original Message-----
From: Nick Fisk [mailto:[email protected]]
Sent: Thursday, June 04, 2015 2:12 PM
To: 'Gregory Farnum'; Somnath Roy
Cc: [email protected]
Subject: RE: [ceph-users] Old vs New pool on same OSDs - Performance Difference

> -----Original Message-----
> From: ceph-users [mailto:[email protected]] On Behalf
> Of Gregory Farnum
> Sent: 04 June 2015 21:22
> To: Nick Fisk
> Cc: [email protected]
> Subject: Re: [ceph-users] Old vs New pool on same OSDs - Performance
> Difference
>
> On Thu, Jun 4, 2015 at 6:31 AM, Nick Fisk <[email protected]> wrote:
> >
> > Hi All,
> >
> > I have 2 pools, both on the same set of OSDs. The 1st is the default rbd
> > pool created at installation 3 months ago; the other was created just
> > recently to verify performance problems.
> >
> > As mentioned, both pools are on the same set of OSDs with the same crush
> > ruleset, and the RBDs on both are identical in size, version and order.
> > The only real difference I can think of is that the existing pool has
> > around 5 million objects on it.
> >
> > Testing with RBD-enabled fio, I see the newly created pool get the
> > expected random read performance of around 60 IOPS. The existing pool
> > only gets around half of this: random read latency is ~15ms on the new
> > pool and ~35ms on the old pool.
> >
> > There is no other IO going on in the cluster while these tests run.
> >
> > XFS fragmentation is low, around 1-2% on most of the disks. The only
> > difference I can think of is that the existing pool has data on it,
> > whereas the new one is empty apart from the test RBD. Should this make
> > a difference?
> >
> > Any ideas?
> >
> > Any hints on what I can check to see why latency is so high for the
> > existing pool?
> >
> > Nick
>
> Apart from what Somnath said, depending on your PG counts and
> configuration setup you might also have put enough objects into the
> cluster that you have a multi-level PG folder hierarchy in the old
> pool. I wouldn't expect that to make a difference because those
> folders should be cached in RAM, but if somehow they're not, that would
> require more disk accesses.
>
> But more likely it's as Somnath suggests: since most of the objects
> don't exist for images in the new pool, it's able to return ENOENT on
> those accesses much more quickly.
> -Greg
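
Greg's folder-hierarchy point can be made concrete with a rough model.
FileStore splits a PG subdirectory into 16 children once it holds more than
filestore_split_multiple * abs(filestore_merge_threshold) * 16 objects. With
the commonly cited defaults of 2 and 10, that threshold is 320. The sketch
below assumes objects hash evenly across subdirectories and ignores merging.
The pg_num in the usage note is hypothetical, since the thread does not give it.

```python
def split_threshold(split_multiple=2, merge_threshold=10):
    """Objects a FileStore PG subdirectory may hold before it is split in 16.

    Defaults are the commonly cited filestore_split_multiple and
    filestore_merge_threshold values; check your own configuration.
    """
    return split_multiple * abs(merge_threshold) * 16


def estimated_depth(objects_in_pg, threshold):
    """Levels of subdirectories needed so no leaf exceeds `threshold` objects.

    Simplified model: objects hash evenly and every split fans out 16 ways.
    """
    depth = 0
    per_dir = float(objects_in_pg)
    while per_dir > threshold:
        per_dir /= 16
        depth += 1
    return depth
```

Take 5 million objects and a hypothetical pg_num of 1024. Each PG then holds
about 4,900 objects, which this model places one directory level deep. Every
extra level is another lookup that has to hit the disk if the dentries are not
cached.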

Thanks for the replies guys.

I had previously written both test RBDs completely, until full. Strangely, I
have just written to them both again and then dropped caches on all OSD nodes,
and now both perform the same, at the speed of the faster pool.
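
Greg's ENOENT explanation only covers ranges whose backing objects were never
created. To check which objects a given read touches, the sketch below maps
byte offsets to RBD object names. The object name can then be passed to
`ceph osd map <pool> <object>` to find its PG and OSDs.

The sketch rests on several assumptions:
- a format 2 image, with data objects named rbd_data.<image id>.<16-digit hex object number>
- the default order 22 (4 MiB objects)
- no custom striping
- the image id in the test is hypothetical

```python
def object_size(order=22):
    """RBD object size in bytes: 2**order (order 22 -> 4 MiB)."""
    return 1 << order


def object_name(image_id, offset, order=22):
    """Name of the format 2 data object holding byte `offset` of the image."""
    objno = offset // object_size(order)
    return "rbd_data.%s.%016x" % (image_id, objno)


def objects_for_read(image_id, offset, length, order=22):
    """All object names touched by a read of `length` bytes at `offset`."""
    if length <= 0:
        return []
    size = object_size(order)
    first = offset // size
    last = (offset + length - 1) // size
    return ["rbd_data.%s.%016x" % (image_id, n) for n in range(first, last + 1)]
```

A 64k read that straddles a 4 MiB boundary touches two objects, and possibly
two PGs.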

I then pointed fio at another existing RBD on the old pool, and the results
are awful: under 10 IOPS on average for 64k random reads at QD=1.
Unfortunately this RBD has live data on it, so I can't overwrite it.
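
At QD=1 each read must complete before the next is issued, so IOPS and latency
are simply reciprocals (Little's law with one op in flight). Here is a quick
consistency check of the figures quoted in this thread:

```python
def iops_at(latency_ms, queue_depth=1):
    """Little's law: throughput = ops in flight / time per op."""
    return queue_depth * 1000.0 / latency_ms


def latency_at(iops, queue_depth=1):
    """Average per-op latency (ms) implied by an IOPS figure."""
    return queue_depth * 1000.0 / iops


def bandwidth_kib_s(iops, block_kib):
    """Throughput in KiB/s for a given IOPS and block size."""
    return iops * block_kib


# Figures quoted in this thread (QD=1 random reads).
new_pool = iops_at(15)              # ~15 ms latency on the new pool
old_pool = iops_at(35)              # ~35 ms latency on the old pool
worst = latency_at(10)              # "under 10 IOPS" means more than this many ms per read
worst_bw = bandwidth_kib_s(10, 64)  # 64k blocks
```

This gives roughly 67 and 29 IOPS for the new and old pools, in line with the
~60 and "half" reported. It also puts the live RBD at more than 100 ms per 64k
read, i.e. under 640 KiB/s.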

Something seems to be up with RBDs (or their underlying objects) that had
data written to them a while back. If I make sure the data is in the page
cache, I get really great performance, so it must be something to do with
reading data off the disk, but I'm lost as to what it might be.

iostat doesn't really show anything interesting, but I'm guessing a
single-threaded read spread over 40 disks wouldn't show much anyway. Are there
any counters I could look at that would help break down the steps the OSD
goes through to do a read, to determine where the slowdown comes from?
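
Here is rough arithmetic on why iostat stays quiet. It assumes the QD=1 reads
are spread evenly over the 40 disks, with each read served by one primary
OSD's disk. The 100 ms service time comes from the under-10-IOPS result and is
an upper bound, since it includes network and OSD time as well as the disk.

```python
def per_disk_load(total_iops, disks, service_ms):
    """Return (ops/s per disk, % utilisation per disk) for an evenly spread load."""
    ops_per_disk = total_iops / disks
    # Busy fraction = ops/s * seconds per op; as a percentage that is
    # ops/s * milliseconds per op / 10.
    util_pct = ops_per_disk * service_ms / 10.0
    return ops_per_disk, util_pct


ops, util = per_disk_load(total_iops=10, disks=40, service_ms=100)
```

That works out to a quarter of an op per second and at most about 2.5% busy per
disk, which per-device iostat averages would barely register.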


> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com