Re: [ceph-users] Ceph monitor load, low performance
Hello Ladies and Gentlemen ;-)

The reason for the problem was the lack of a battery-backed cache. After we installed it, the load is even across all OSDs.

Thanks
Pawel

---
Paweł Orzechowski
pawel.orzechow...@budikom.net

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph monitor load, low performance
On 09/03/2014 04:34 PM, pawel.orzechow...@budikom.net wrote:
> Hello Ladies and Gentlemen ;-)
> The reason for the problem was the lack of a battery-backed cache. After we installed it, the load is even across all OSDs.

Glad to hear it was that simple! :)

Mark
Re: [ceph-users] Ceph monitor load, low performance
Irrelevant, but I need to say this: Cephers aren't only men, you know... :-)

Cheers,
Patrycja

2014-08-26 12:58 GMT+02:00 pawel.orzechow...@budikom.net:
> Hello Gentlemen :-)
> Let me point out one important aspect of this low-performance problem: of all 4 nodes in our Ceph cluster, only one node shows bad metrics, that is, very high latency on its OSDs (200-600 ms), while the other three nodes behave normally, with OSD latencies between 1-10 ms.
> So, putting the journals on SSDs is something we are looking at, but we think we have a general problem with that particular node, which affects the whole cluster.
> Can the number of hosts (4) be a reason for that? Any other hints?
> Thanks
> Pawel
Re: [ceph-users] Ceph monitor load, low performance
Move the logs to SSD and performance will immediately increase; you lose about 50% of performance to them. Also, for three replicas more than 5 hosts are recommended.

2014-08-26 12:17 GMT+04:00 Mateusz Skała mateusz.sk...@budikom.net:
> Hi, thanks for the reply.
>
>> From the top of my head, it is recommended to use 3 mons in production. Also, for the 22 OSDs your number of PGs looks a bit low, you should look at that.
>
> I got it from http://ceph.com/docs/master/rados/operations/placement-groups/
> (22 OSDs * 100) / 3 replicas = 733, rounded up to ~1024 PGs.
> Please correct me if I'm wrong. There will be 5 mons (on 6 hosts), but first we must migrate some data from the servers in use.
>
>> "The performance of the cluster is poor" - this is too vague. What is your current performance, what benchmarks have you tried, what is your data workload and, most importantly, how is your cluster set up: what disks, SSDs, network, RAM, etc.? Please provide more information so that people can help you.
>> Andrei
>
> Hardware information:
>
> ceph15:
> RAM: 4GB
> Network: 4x 1Gb NIC
> OSD disks: 2x SATA Seagate ST31000524NS, 2x SATA WDC WD1003FBYX-18Y7B0
>
> ceph25:
> RAM: 16GB
> Network: 4x 1Gb NIC
> OSD disks: 2x SATA WDC WD7500BPKX-7, 2x SATA WDC WD7500BPKX-2, 2x SATA SSHD ST1000LM014-1EJ164
>
> ceph30:
> RAM: 16GB
> Network: 4x 1Gb NIC
> OSD disks: 6x SATA SSHD ST1000LM014-1EJ164
>
> ceph35:
> RAM: 16GB
> Network: 4x 1Gb NIC
> OSD disks: 6x SATA SSHD ST1000LM014-1EJ164
>
> All journals are on the OSDs. 2 NICs are for the backend network (10.20.4.0/22) and 2 NICs are for the frontend (10.20.8.0/22). We use this cluster as the storage backend for 100 VMs on KVM. I haven't run benchmarks, but all the VMs were migrated from Xen+GlusterFS (NFS); before the migration every VM ran fine, now each VM hangs for a few seconds from time to time, and apps installed on the VMs take much longer to load. GlusterFS was running on 2 servers with 1x 1Gb NIC and 2x 8 WDC WD7500BPKX-7 disks. I did one recovery test: if a disk is marked out, recovery I/O is 150-200MB/s, but all VMs hang until recovery ends.
> The biggest load is on ceph35: IOPS on each disk are near 150, CPU load ~4-5. On the other hosts CPU load is ~2, with 120-130 IOPS.
>
> Our ceph.conf:
>
> [global]
> fsid = a9d17295-62f2-46f6-8325-1cad7724e97f
> mon initial members = ceph35, ceph30, ceph25, ceph15
> mon host = 10.20.8.35, 10.20.8.30, 10.20.8.25, 10.20.8.15
> public network = 10.20.8.0/22
> cluster network = 10.20.4.0/22
> osd journal size = 1024
> filestore xattr use omap = true
> osd pool default size = 3
> osd pool default min size = 1
> osd pool default pg num = 1024
> osd pool default pgp num = 1024
> osd crush chooseleaf type = 1
> auth cluster required = cephx
> auth service required = cephx
> auth client required = cephx
> rbd default format = 2
>
> ##ceph35 osds
> [osd.0]
> cluster addr = 10.20.4.35
> [osd.1]
> cluster addr = 10.20.4.35
> [osd.2]
> cluster addr = 10.20.4.35
> [osd.3]
> cluster addr = 10.20.4.36
> [osd.4]
> cluster addr = 10.20.4.36
> [osd.5]
> cluster addr = 10.20.4.36
>
> ##ceph25 osds
> [osd.6]
> cluster addr = 10.20.4.25
> public addr = 10.20.8.25
> [osd.7]
> cluster addr = 10.20.4.25
> public addr = 10.20.8.25
> [osd.8]
> cluster addr = 10.20.4.25
> public addr = 10.20.8.25
> [osd.9]
> cluster addr = 10.20.4.26
> public addr = 10.20.8.26
> [osd.10]
> cluster addr = 10.20.4.26
> public addr = 10.20.8.26
> [osd.11]
> cluster addr = 10.20.4.26
> public addr = 10.20.8.26
>
> ##ceph15 osds
> [osd.12]
> cluster addr = 10.20.4.15
> public addr = 10.20.8.15
> [osd.13]
> cluster addr = 10.20.4.15
> public addr = 10.20.8.15
> [osd.14]
> cluster addr = 10.20.4.15
> public addr = 10.20.8.15
> [osd.15]
> cluster addr = 10.20.4.16
> public addr = 10.20.8.16
>
> ##ceph30 osds
> [osd.16]
> cluster addr = 10.20.4.30
> public addr = 10.20.8.30
> [osd.17]
> cluster addr = 10.20.4.30
> public addr = 10.20.8.30
> [osd.18]
> cluster addr = 10.20.4.30
> public addr = 10.20.8.30
> [osd.19]
> cluster addr = 10.20.4.31
> public addr = 10.20.8.31
> [osd.20]
> cluster addr = 10.20.4.31
> public addr = 10.20.8.31
> [osd.21]
> cluster addr = 10.20.4.31
> public addr = 10.20.8.31
>
> [mon.ceph35]
> host = ceph35
> mon addr = 10.20.8.35:6789
> [mon.ceph30]
> host = ceph30
> mon addr = 10.20.8.30:6789
> [mon.ceph25]
> host = ceph25
> mon addr = 10.20.8.25:6789
> [mon.ceph15]
> host = ceph15
> mon addr = 10.20.8.15:6789
>
> Regards,
> Mateusz

--
Best regards,
Фасихов Ирек Нургаязович
Mob.: +79229045757
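The PG rule of thumb cited above (OSDs × 100 / replicas, rounded up to the next power of two) can be sketched in a few lines of Python; the target of ~100 PGs per OSD is the figure from the placement-groups documentation linked in the message:

```python
import math

def recommended_pg_count(num_osds, replicas, target_pgs_per_osd=100):
    """Rule-of-thumb PG count: OSDs * target / replicas,
    rounded up to the next power of two."""
    raw = num_osds * target_pgs_per_osd / replicas
    return 2 ** math.ceil(math.log2(raw))

print(recommended_pg_count(22, 3))  # (22 * 100) / 3 = 733.3 -> 1024
```

This matches the 1024 used for `osd pool default pg num` in the posted ceph.conf.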
Re: [ceph-users] Ceph monitor load, low performance
You mean move /var/log/ceph/* to an SSD disk?
Re: [ceph-users] Ceph monitor load, low performance
I'm sorry, of course I meant the journals :)

2014-08-26 13:16 GMT+04:00 Mateusz Skała mateusz.sk...@budikom.net:
> You mean move /var/log/ceph/* to an SSD disk?

--
Best regards,
Фасихов Ирек Нургаязович
Mob.: +79229045757
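For filestore OSDs, the journal location is set per OSD in ceph.conf. A minimal sketch of what the change could look like, assuming a dedicated SSD partition (the device path and the 5 GB size are placeholders, not values from this thread):

```ini
# Hypothetical sketch: point an OSD's filestore journal at an SSD partition.
[osd.0]
osd journal = /dev/sdg1   ; placeholder SSD partition, adjust per host
osd journal size = 5120   ; larger journal than the 1024 MB in the posted config
```

The OSD has to be stopped and its existing journal flushed (`ceph-osd -i 0 --flush-journal`) and recreated on the new device (`ceph-osd -i 0 --mkjournal`) before restarting, and setting `noout` first avoids unnecessary rebalancing during the swap.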
[ceph-users] Ceph monitor load, low performance
Hello Gentlemen :-)

Let me point out one important aspect of this low-performance problem: of all 4 nodes in our Ceph cluster, only one node shows bad metrics, that is, very high latency on its OSDs (200-600 ms), while the other three nodes behave normally, with OSD latencies between 1-10 ms.

So, putting the journals on SSDs is something we are looking at, but we think we have a general problem with that particular node, which affects the whole cluster.

Can the number of hosts (4) be a reason for that? Any other hints?

Thanks
Pawel
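One way to spot an outlier node like this is `ceph osd perf`, which reports per-OSD commit/apply latency. A minimal sketch for flagging slow OSDs, assuming the JSON layout of `ceph osd perf -f json` from this era (the field names are an assumption; check your version's actual output):

```python
import json

def worst_osds(perf_json, threshold_ms=100):
    """Return (osd_id, commit_latency_ms) pairs above a latency
    threshold, worst first."""
    data = json.loads(perf_json)
    hits = []
    for info in data.get("osd_perf_infos", []):
        lat = info["perf_stats"]["commit_latency_ms"]
        if lat >= threshold_ms:
            hits.append((info["id"], lat))
    return sorted(hits, key=lambda t: -t[1])

# Fabricated sample mirroring the latencies described in the thread.
sample = json.dumps({"osd_perf_infos": [
    {"id": 0, "perf_stats": {"commit_latency_ms": 450, "apply_latency_ms": 500}},
    {"id": 6, "perf_stats": {"commit_latency_ms": 4, "apply_latency_ms": 6}},
]})
print(worst_osds(sample))  # [(0, 450)]
```

Run against live output, OSDs 0-5 (the ceph35 host in this cluster) clustering at the top would confirm the single-node pattern described above.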
Re: [ceph-users] Ceph monitor load, low performance
I had a similar problem once. I traced it to a failed battery on my RAID card, which disabled write caching. One of the many things I need to add to monitoring.

On Tue, Aug 26, 2014 at 3:58 AM, pawel.orzechow...@budikom.net wrote:
> Hello Gentlemen :-)
> Let me point out one important aspect of this low-performance problem: of all 4 nodes in our Ceph cluster, only one node shows bad metrics, that is, very high latency on its OSDs (200-600 ms), while the other three nodes behave normally, with OSD latencies between 1-10 ms.
> So, putting the journals on SSDs is something we are looking at, but we think we have a general problem with that particular node, which affects the whole cluster.
> Can the number of hosts (4) be a reason for that? Any other hints?
> Thanks
> Pawel
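For monitoring the RAID battery, one option is to parse the controller tool's BBU status. A minimal sketch, assuming an LSI controller and the `MegaCli -AdpBbuCmd -GetBbuStatus -aALL` output format (both the command and the "Battery State" field name are assumptions about that tool; other controllers need different tooling):

```python
import re

def bbu_healthy(megacli_output):
    """True only if every reported 'Battery State' line says Optimal."""
    states = re.findall(r"Battery State\s*:\s*(\S+)", megacli_output)
    return bool(states) and all(s == "Optimal" for s in states)

# Fabricated sample of the assumed output format.
sample = """BBU status for Adapter: 0
Battery State : Optimal
Charging Status : None
"""
print(bbu_healthy(sample))                      # True
print(bbu_healthy("Battery State : Failed"))    # False
```

Wired into a cron job or Nagios check, a `False` here would have caught the disabled write cache before it showed up as OSD latency.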