Hi,

Just to add, there’s also a collectd plugin at
https://github.com/rochaporto/collectd-ceph.

Things to check when you have slow read performance:

*) How fragmented are those xfs partitions? With some workloads you get high
values pretty quickly. To check:
for osd in $(grep 'osd/ceph' /etc/mtab | cut -d ' ' -f 1); do sudo xfs_db -c frag -r $osd; done
*) 32/48GB RAM on the OSD nodes could be increased. Since XFS is used and all the
objects are files, Ceph benefits from the Linux page cache.
If your data set pretty much fits into that cache, you can gain _a lot_ of read
performance, since there are almost no reads from the drives. We’re at 128GB
per OSD host right now. Compared with the other options at hand this can be a cheap
way of increasing performance. It won’t help you during deep-scrubs or recovery though.
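If you lean on the page cache like that, it might also be worth keeping
inodes/dentries cached more aggressively (just a suggestion, test before relying on it):
sudo sysctl -w vm.vfs_cache_pressure=10
free -g is a quick way to see how much RAM actually ends up as cache.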
*) Turn off logging:
[global]
        debug_lockdep = 0/0
        debug_context = 0/0
        debug_crush = 0/0
        debug_buffer = 0/0
        debug_timer = 0/0
        debug_filer = 0/0
        debug_objecter = 0/0
        debug_rados = 0/0
        debug_rbd = 0/0
        debug_journaler = 0/0
        debug_objectcacher = 0/0
        debug_client = 0/0
        debug_osd = 0/0
        debug_optracker = 0/0
        debug_objclass = 0/0
        debug_filestore = 0/0
        debug_journal = 0/0
        debug_ms = 0/0
        debug_monc = 0/0
        debug_tp = 0/0
        debug_auth = 0/0
        debug_finisher = 0/0
        debug_heartbeatmap = 0/0
        debug_perfcounter = 0/0
        debug_asok = 0/0
        debug_throttle = 0/0
        debug_mon = 0/0
        debug_paxos = 0/0
        debug_rgw = 0/0
[osd]
       debug lockdep = 0/0
       debug context = 0/0
       debug crush = 0/0
       debug buffer = 0/0
       debug timer = 0/0
       debug journaler = 0/0
       debug osd = 0/0
       debug optracker = 0/0
       debug objclass = 0/0
       debug filestore = 0/0
       debug journal = 0/0
       debug ms = 0/0
       debug monc = 0/0
       debug tp = 0/0
       debug auth = 0/0
       debug finisher = 0/0
       debug heartbeatmap = 0/0
       debug perfcounter = 0/0
       debug asok = 0/0
       debug throttle = 0/0
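
If you don’t want to restart the daemons for the new ceph.conf to take effect, the
same settings can (as far as I remember) be injected at runtime, e.g.:
sudo ceph tell osd.* injectargs '--debug_osd 0/0 --debug_ms 0/0 --debug_filestore 0/0'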

*) Run htop or vmstat/iostat to determine whether it’s the CPU that’s getting
maxed out or not.
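For example, iostat -x 5 shows per-device utilisation and await, which makes it
fairly obvious whether the disks or the CPUs are the bottleneck.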
*) Double check the throughput and latencies on the network (do it for both low
and high MTU, just to make sure; it’s easy to spend a lot of time optimising
elsewhere and then get bitten by the network ;)
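
For a quick sanity check (host name below is just an example; -s 8972 assumes a
9000 byte MTU minus IP/ICMP headers):
ping -M do -s 8972 -c 10 osd-node-2   # verifies jumbo frames really pass end to end
iperf3 -c osd-node-2                  # raw TCP throughput between two nodes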

2) (resetting the perf counters) I don’t see anything about it in the help output:
sudo ceph --admin-daemon /var/run/ceph/ceph-osd.$osd.asok help
An easy way of getting the osd ids if you want to check or change something on all of them:
for osd in $(grep 'osd/ceph' /etc/mtab | cut -d ' ' -f 2 | cut -d '-' -f 2); do echo $osd; done
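Or, combining the two, you can run something against every osd admin socket on the
host directly (written from memory, double-check it first):
for sock in /var/run/ceph/ceph-osd.*.asok; do echo $sock; sudo ceph --admin-daemon $sock perf dump; done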

3) This is from one of the OSDs, about the same size as yours but with SATA drives
for backing (a bit more CPU and memory though):

sudo ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok perf dump | grep -A 1 -e 
op_latency -e op_[rw]_latency -e op_[rw]_process_latency -e journal_latency
      "journal_latency": { "avgcount": 406051353,
          "sum": 230178.927806000},
--
      "op_latency": { "avgcount": 272537987,
          "sum": 4337608.211040000},
--
      "op_r_latency": { "avgcount": 111672059,
          "sum": 758059.732591000},
--
      "op_w_latency": { "avgcount": 9308193,
          "sum": 174762.139637000},
--
      "subop_latency": { "avgcount": 273742609,
          "sum": 1084598.823585000},
--
      "subop_w_latency": { "avgcount": 273742609,
          "sum": 1084598.823585000},

Cheers
Josef

> On 20 May 2015, at 10:20, Межов Игорь Александрович <me...@yuterra.ru> wrote:
> 
> Hi!
> 
> 1. Use it at your own risk. I'm not responsible for any damage you may cause by
> running this script.
> 
> 2. What is it for.
> A Ceph osd daemon has a so-called 'admin socket' - a unix socket local to the osd
> host that we can use to issue commands to that osd. The script connects to a list
> of osd hosts (currently hardcoded in the source code, but easily changeable) by ssh,
> lists all admin sockets from /var/run/ceph, greps the socket names for osd numbers,
> and issues the 'perf dump' command to all osds. The json output is parsed with
> standard python libs and some latency parameters are extracted from it. They are
> coded in json as tuples containing the total amount of time (in seconds) and the
> count of events, so dividing time by count gives the average latency for one or
> more ceph operations. The min/max/avg are computed for every host and for the
> whole cluster, and the latency of every osd is compared to the minimal value of
> the cluster (or host) and colorized to make too-high values easy to spot.
> You can check a usage example in the comments at the top of the script and change
> the hardcoded values, which are also gathered at the top.
> 
> 3. I use the script on Ceph Firefly 0.80.7, but I think it will work on any
> release that supports the admin socket connection to the osd, the 'perf dump'
> command, and the same json output structure.
> 
> 4. As it connects to the osd hosts by ssh one by one, the script is slow,
> especially when you have many osd hosts. Also, all osds from a host are output
> in one row, so if you have >12 osds per host, the output gets slightly messy.
> 
> PS: This is my first python script, so suggestions and improvements are 
> welcome ;)
> 
> 
> Megov Igor
> CIO, Yuterra
> 
> ________________________________________
> From: Michael Kuriger <mk7...@yp.com>
> Sent: 19 May 2015 18:51
> To: Межов Игорь Александрович
> Subject: Re: [ceph-users] How to improve latencies and per-VM performance and latencies
> 
> Awesome!  I would be interested in doing this as well.  Care to share how
> your script works?
> 
> Thanks!
> 
> 
> 
> 
> Michael Kuriger
> Sr. Unix Systems Engineer
> mk7...@yp.com | 818-649-7235
> 
> 
> 
> 
> 
> On 5/19/15, 6:31 AM, "Межов Игорь Александрович" <me...@yuterra.ru> wrote:
> 
>> Hi!
>> 
>> Seeking performance improvements in our cluster (Firefly 0.80.7 on Wheezy,
>> 5 nodes, 58 osds), I wrote a small python script that walks through the ceph
>> nodes and issues the 'perf dump' command on the osd admin sockets. It extracts
>> the *_latency tuples, calculates min/max/avg, compares osd perf metrics with
>> the min/avg of the whole cluster or the same host, and displays the result in
>> table form. The goal is to find where most of the latency is.
>> 
>> The hardware is not new and shiny:
>> - 5 nodes * 10-12 OSDs each
>> - Intel E5520@2.26/32-48Gb DDR3-1066 ECC
>> - 10Gbit X520DA interconnect
>> - Intel DC3700 200Gb as a system volume + journals, connected to sata2
>> onboard in ahci mode
>> - Intel RS2MB044 / RS2BL080 SAS RAID in RAID0 per drive mode, WT, disk
>> cache disabled
>> - bunch of 1Tb or 2Tb various WD Black drives, 58 disks, 76Tb total
>> - replication = 3, filestore on xfs
>> - shared client and cluster 10Gbit network
>> - cluster used as rbd storage for VMs
>> - rbd_cache is on via 'cache=writeback' in libvirt (I suppose that it is
>> true ;))
>> - no special tuning in ceph.conf:
>> 
>>> osd mount options xfs = rw,noatime,inode64
>>> osd disk threads = 2
>>> osd op threads = 8
>>> osd max backfills = 2
>>> osd recovery max active = 2
>> 
>> I get rather slow read performance from within the VMs, especially with QD=1,
>> so many VMs are running slowly.
>> I think this HW config can perform better, as I got 10-12k iops
>> with QD=32 from time to time.
>> 
>> So I have some questions:
>> 1. Am I right that the osd perf counters are cumulative and count up from OSD
>> start?
>> 2. Is there any way to reset the perf counters without restarting the OSD daemon?
>> Maybe a command through the admin socket?
>> 3. What latencies should I expect from my config, or what latencies do you
>> have on your clusters?
>> Just as an example or a reference to compare with my values. I'm
>> interested mostly in
>> - 'op_latency',
>> - 'op_[r|w]_latency',
>> - 'op_[r|w]_process_latency'
>> - 'journal_latency'
>> But other parameters, like 'apply_latency' or
>> 'queue_transaction_latency_avg', are also interesting to compare.
>> 4. Where should I look first if I need to improve QD=1 (i.e.
>> per-VM) performance?
>> 
>> Thanks!
>> 
>> Megov Igor
>> CIO, Yuterra
> 
> <getosdstat.py.gz>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
