Re: [ceph-users] Ceph monitor load, low performance

2014-09-03 Thread pawel.orzechowski
 

Hello Ladies and Gentlemen;-) 

The reason for the problem was the lack of a battery-backed cache. After
we installed it, the load is even across all OSDs. 

Thanks 

Pawel 

---

Paweł Orzechowski
pawel.orzechow...@budikom.net

 ___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph monitor load, low performance

2014-09-03 Thread Mark Nelson

On 09/03/2014 04:34 PM, pawel.orzechow...@budikom.net wrote:

Hello Ladies and Gentlemen;-)

The reason for the problem was the lack of a battery-backed cache. After
we installed it, the load is even across all OSDs.


Glad to hear it was that simple! :)

Mark



Thanks

Pawel

---

Paweł Orzechowski
pawel.orzechow...@budikom.net






___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph monitor load, low performance

2014-08-27 Thread Patrycja Szabłowska
Irrelevant, but I need to say this: Cephers aren't only men, you know... :-)


Cheers,

Patrycja

2014-08-26 12:58 GMT+02:00  pawel.orzechow...@budikom.net:
 Hello Gentlemen :-)

 Let me point out one important aspect of this low-performance problem: of
 all 4 nodes in our Ceph cluster, only one node shows bad metrics, that is,
 very high latency on its OSDs (200-600 ms), while the other three nodes
 behave normally, i.e. their OSD latency is between 1-10 ms.

 So the idea of putting the journals on SSDs is something we are looking at,
 but we think we have a more general problem with that particular node,
 which affects the whole cluster.

 So can the number of hosts (4) be a reason for that? Any other hints?

 Thanks

 Pawel



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph monitor load, low performance

2014-08-26 Thread Irek Fasikhov
Move the logs onto SSDs and you will immediately increase performance; you
are losing about 50% of your performance on the logs. Also, with three
replicas, more than 5 hosts are recommended.
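
(For reference, what that change looks like in ceph.conf: a minimal sketch
assuming dedicated SSD partitions; the device paths below are hypothetical
and need to match your own layout.)

# journals moved off the data disks onto SSD partitions (example paths)
[osd.0]
osd journal = /dev/disk/by-partlabel/journal-osd0
[osd.1]
osd journal = /dev/disk/by-partlabel/journal-osd1

For an existing OSD the journal has to be flushed and recreated before it is
restarted (stop the OSD, run "ceph-osd -i <id> --flush-journal", move the
journal, then "ceph-osd -i <id> --mkjournal").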


2014-08-26 12:17 GMT+04:00 Mateusz Skała mateusz.sk...@budikom.net:


 Hi, thanks for the reply.



  Off the top of my head, it is recommended to use 3 mons in
 production. Also, for 22 OSDs your number of PGs looks a bit low;
 you should look at that.

 I got it from
 http://ceph.com/docs/master/rados/operations/placement-groups/

 (22 OSDs * 100) / 3 replicas = 733, rounded up to the next power of two: ~1024 PGs.
 Please correct me if I'm wrong.
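
(That rule of thumb as a quick sketch, assuming the 100-PGs-per-OSD target
from the linked docs and rounding up to the next power of two:)

# suggested PG count = (num_osds * 100) / replicas, rounded up to a power of 2
def suggested_pg_count(num_osds, replicas, pgs_per_osd=100):
    raw = num_osds * pgs_per_osd / float(replicas)
    power = 1
    while power < raw:
        power *= 2
    return power

print(suggested_pg_count(22, 3))  # (22 * 100) / 3 = 733.3 -> 1024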

 It will be 5 mons (on 6 hosts), but first we must migrate some data off the
 servers that are currently in use.




 The performance of the cluster is poor - this is too vague. What is
 your current performance, what benchmarks have you tried, what is your
 data workload and, most importantly, how is your cluster set up: what
 disks, SSDs, network, RAM, etc.?

 Please provide more information so that people can help you.

 Andrei


 Hardware informations:
 ceph15:
 RAM: 4GB
 Network: 4x 1GbE NICs
 OSD disks:
 2x SATA Seagate ST31000524NS
 2x SATA WDC WD1003FBYX-18Y7B0

 ceph25:
 RAM: 16GB
 Network: 4x 1GbE NICs
 OSD disks:
 2x SATA WDC WD7500BPKX-7
 2x SATA WDC WD7500BPKX-2
 2x SATA SSHD ST1000LM014-1EJ164

 ceph30
 RAM: 16GB
 Network: 4x 1GbE NICs
 OSD disks:
 6x SATA SSHD ST1000LM014-1EJ164

 ceph35:
 RAM: 16GB
 Network: 4x 1GbE NICs
 OSD disks:
 6x SATA SSHD ST1000LM014-1EJ164


 All journals are on the OSD disks. 2 NICs are for the backend network
 (10.20.4.0/22) and 2 NICs are for the frontend (10.20.8.0/22).

 We use this cluster as the storage backend for 100 VMs on KVM. I haven't run
 benchmarks, but all VMs were migrated from Xen+GlusterFS (NFS). Before the
 migration every VM ran fine; now each VM hangs for a few seconds from time to
 time, and applications installed on the VMs take much longer to load. GlusterFS
 was running on 2 servers with 1x 1GbE NIC and 2x8 WDC WD7500BPKX-7 disks.

 I ran one recovery test: when a disk is marked out, recovery I/O is
 150-200 MB/s, but all VMs hang until the recovery ends.
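
(A minimal sketch of the recovery/backfill throttling options that are often
used to reduce the impact of recovery on client I/O; the values below are
illustrative, not a tuned recommendation for this cluster.)

[osd]
osd max backfills = 1          # fewer concurrent backfills per OSD
osd recovery max active = 1    # fewer concurrent recovery ops per OSD
osd recovery op priority = 1   # deprioritize recovery vs. client requests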

 The biggest load is on ceph35: IOPS on each disk are near 150, CPU load ~4-5.
 On the other hosts, CPU load is ~2 and 120-130 IOPS.

 Our ceph.conf

 ===
 [global]

 fsid = a9d17295-62f2-46f6-8325-1cad7724e97f
 mon initial members = ceph35, ceph30, ceph25, ceph15
 mon host = 10.20.8.35, 10.20.8.30, 10.20.8.25, 10.20.8.15
 public network = 10.20.8.0/22
 cluster network = 10.20.4.0/22
 osd journal size = 1024
 filestore xattr use omap = true
 osd pool default size = 3
 osd pool default min size = 1
 osd pool default pg num = 1024
 osd pool default pgp num = 1024
 osd crush chooseleaf type = 1
 auth cluster required = cephx
 auth service required = cephx
 auth client required = cephx
 rbd default format = 2

 ##ceph35 osds
 [osd.0]
 cluster addr = 10.20.4.35
 [osd.1]
 cluster addr = 10.20.4.35
 [osd.2]
 cluster addr = 10.20.4.35
 [osd.3]
 cluster addr = 10.20.4.36
 [osd.4]
 cluster addr = 10.20.4.36
 [osd.5]
 cluster addr = 10.20.4.36

 ##ceph25 osds
 [osd.6]
 cluster addr = 10.20.4.25
 public addr = 10.20.8.25
 [osd.7]
 cluster addr = 10.20.4.25
 public addr = 10.20.8.25
 [osd.8]
 cluster addr = 10.20.4.25
 public addr = 10.20.8.25
 [osd.9]
 cluster addr = 10.20.4.26
 public addr = 10.20.8.26
 [osd.10]
 cluster addr = 10.20.4.26
 public addr = 10.20.8.26
 [osd.11]
 cluster addr = 10.20.4.26
 public addr = 10.20.8.26

 ##ceph15 osds
 [osd.12]
 cluster addr = 10.20.4.15
 public addr = 10.20.8.15
 [osd.13]
 cluster addr = 10.20.4.15
 public addr = 10.20.8.15
 [osd.14]
 cluster addr = 10.20.4.15
 public addr = 10.20.8.15
 [osd.15]
 cluster addr = 10.20.4.16
 public addr = 10.20.8.16

 ##ceph30 osds
 [osd.16]
 cluster addr = 10.20.4.30
 public addr = 10.20.8.30
 [osd.17]
 cluster addr = 10.20.4.30
 public addr = 10.20.8.30
 [osd.18]
 cluster addr = 10.20.4.30
 public addr = 10.20.8.30
 [osd.19]
 cluster addr = 10.20.4.31
 public addr = 10.20.8.31
 [osd.20]
 cluster addr = 10.20.4.31
 public addr = 10.20.8.31
 [osd.21]
 cluster addr = 10.20.4.31
 public addr = 10.20.8.31

 [mon.ceph35]
 host = ceph35
 mon addr = 10.20.8.35:6789
 [mon.ceph30]
 host = ceph30
 mon addr = 10.20.8.30:6789
 [mon.ceph25]
 host = ceph25
 mon addr = 10.20.8.25:6789
 [mon.ceph15]
 host = ceph15
 mon addr = 10.20.8.15:6789
 

 Regards,

 Mateusz






-- 
Best regards, Irek Fasikhov
Mob.: +79229045757
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph monitor load, low performance

2014-08-26 Thread Mateusz Skała

You mean to move /var/log/ceph/* to an SSD disk?

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph monitor load, low performance

2014-08-26 Thread Irek Fasikhov
I'm sorry, of course I meant the journals :)


2014-08-26 13:16 GMT+04:00 Mateusz Skała mateusz.sk...@budikom.net:

 You mean to move /var/log/ceph/* to an SSD disk?






-- 
Best regards, Irek Fasikhov
Mob.: +79229045757
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph monitor load, low performance

2014-08-26 Thread pawel.orzechowski
 

Hello Gentlemen :-) 

Let me point out one important aspect of this low-performance problem:
of all 4 nodes in our Ceph cluster, only one node shows bad metrics,
that is, very high latency on its OSDs (200-600 ms), while the other
three nodes behave normally, i.e. their OSD latency is between
1-10 ms. 

So the idea of putting the journals on SSDs is something we are looking
at, but we think we have a more general problem with that particular
node, which affects the whole cluster. 

So can the number of hosts (4) be a reason for that? Any other hints? 
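
(To narrow down which OSDs are slow, a small diagnostic sketch; it assumes
"ceph osd perf" is available on this release and that the output has the
usual osd / commit-latency / apply-latency columns, so adjust the parsing
and the 100 ms threshold as needed.)

# dump per-OSD commit/apply latency and flag outliers
import subprocess

out = subprocess.check_output(["ceph", "osd", "perf"], universal_newlines=True)
print(out)
for line in out.splitlines():
    parts = line.split()
    # expected data rows: <osd id> <fs_commit_latency(ms)> <fs_apply_latency(ms)>
    if len(parts) == 3 and parts[0].isdigit():
        osd_id, commit_ms, apply_ms = parts
        if int(commit_ms) > 100 or int(apply_ms) > 100:
            print("slow: osd.%s commit=%sms apply=%sms" % (osd_id, commit_ms, apply_ms))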

Thanks 

Pawel

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph monitor load, low performance

2014-08-26 Thread Craig Lewis
I had a similar problem once.  I traced my problem to a failed battery
on my RAID card, which had disabled write caching.  One of the many things I
need to add to monitoring.
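
(For the monitoring part, a minimal sketch assuming an LSI MegaRAID
controller with the MegaCli utility installed; the binary name and the
"Battery State" line are what I have seen on some firmwares, so treat both
as assumptions and adapt them to your controller.)

# alert when the RAID controller's battery-backed cache is not healthy
import subprocess, sys

def bbu_state():
    out = subprocess.check_output(
        ["MegaCli64", "-AdpBbuCmd", "-GetBbuStatus", "-aALL"],
        universal_newlines=True)
    for line in out.splitlines():
        if line.strip().startswith("Battery State"):
            return line.split(":", 1)[1].strip()
    return "unknown"

state = bbu_state()
print("BBU state: %s" % state)
sys.exit(0 if state.lower() in ("optimal", "operational") else 1)  # non-zero -> alert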



On Tue, Aug 26, 2014 at 3:58 AM, pawel.orzechow...@budikom.net wrote:

  Hello Gentlemen :-)

 Let me point out one important aspect of this low-performance problem: of
 all 4 nodes in our Ceph cluster, only one node shows bad metrics, that is,
 very high latency on its OSDs (200-600 ms), while the other three nodes
 behave normally, i.e. their OSD latency is between 1-10 ms.

 So the idea of putting the journals on SSDs is something we are looking at,
 but we think we have a more general problem with that particular node,
 which affects the whole cluster.

 So can the number of hosts (4) be a reason for that? Any other hints?

 Thanks

 Pawel



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com