1. No packet drop found in system log. 2. "ceph health detail" shows:
# ceph health detail HEALTH_WARN mon.bj-ceph10 addr 10.10.11.23:6789/0 has 43% avail disk space -- store is getting too big! 77364 MB >= 40960 MB mon.bj-ceph12 addr 10.10.11.25:6789/0 has 43% avail disk space -- store is getting too big! 77071 MB >= 40960 MB mon.bj-ceph13 addr 10.10.11.26:6789/0 has 42% avail disk space -- store is getting too big! 78403 MB >= 40960 MB mon.bj-ceph14 addr 10.10.11.27:6789/0 has 43% avail disk space -- store is getting too big! 78006 MB >= 40960 MB I am checking out what does this mean exactly. 3. by "run out", I mean: # free -m total used free shared buffers cached Mem: 64376 63766 609 0 123 47974 -/+ buffers/cache: 15669 48707 Swap: 22831 2319 20512 "top" shows memory are mainly used by ceph-osd process. 4. Cluster configuration, for a single Ceph node CPU: Intel(R) Xeon(R) CPU E5-2603 v2 @ 1.80GHz memory: 64GB Data disk: 1TB HDD * 24 (do not know vendor now) Journal disk: 800TB SSD * 2 (do not know vendor now) We have run 24 OSDs on one node! I think this is why the memory in shortage (also CPU may not afford in highload recovery or reblance, and 2 SSD for 24 OSD journal is just not enough) and slow OSD write logged, if reduced to 16 or 12, it would be much better. 5. about "ceph osd tree". It seems some nodes have served, but lately OSDs on them are all removed: ...... 156 0.91 osd.156 up 1 157 0.91 osd.157 up 1 158 0.91 osd.158 up 1 -16 0 host bj-ceph20 -17 0 host bj-ceph15 -18 0 host bj-ceph11 -19 21.84 host bj-ceph08 219 0.91 osd.219 up 1 220 0.91 osd.220 up 1 221 0.91 osd.221 up 1 208 0.91 osd.208 up 1 207 0.91 osd.207 up 1 209 0.91 osd.209 up 1 215 0.91 osd.215 up 1 211 0.91 osd.211 up 1 ..... Seems I have a lot of work to do... 2015-05-13 22:13 GMT+08:00 Somnath Roy <somnath....@sandisk.com>: > If you are seeing ceph –s or ceph osd tree is showing all up , but in > reality it is not. I think the heart bit traffic is getting lost may be > due to network problem you indicated. > > Also, the point that your node memory is running out means probably it is > trying to do recovery. OSDs are flapping. > > > > Give us the following information. > > > > 1. Are you seeing any packet drop in the system log ? > > > > 2. Also, how many host serving this OSDs ? Host cpu/memory ? > > > > 3. Is this all HDD cluster ? > > > > 4. Yeah, it is showing health warning, but, not saying anything ! Run, > ‘ceph health detail’ to see if it gives extra details or not. > > > > Thanks & Regards > > Somnath > > > > *From:* changqian zuo [mailto:dummyhacke...@gmail.com] > *Sent:* Wednesday, May 13, 2015 3:10 AM > *To:* Christian Balzer; Somnath Roy > *Cc:* ceph-users@lists.ceph.com > *Subject:* Re: [ceph-users] How to debug a ceph read performance problem? > > > > Thanks for your time, > > > > I write a Python script and re-analysing client messaging log, list all > OSD ip and the time used for the read request to this OSD, and find slow > replies are all from node 10.10.11.12. So I do a network test, and there is > problem: > > > > # iperf3 -c 10.10.11.15 -t 60 -i 1 > > Connecting to host 10.10.11.15, port 5201 > > [ 4] local 10.10.11.12 port 53944 connected to 10.10.11.15 port 5201 > > [ ID] Interval Transfer Bandwidth Retr Cwnd > > [ 4] 0.00-1.00 sec 123 KBytes 1.01 Mbits/sec 20 4.24 KBytes > > > [ 4] 1.00-2.00 sec 67.9 KBytes 556 Kbits/sec 16 2.83 KBytes > > > [ 4] 2.00-3.00 sec 192 KBytes 1.58 Mbits/sec 23 4.24 KBytes > > > [ 4] 3.00-4.00 sec 22.6 KBytes 185 Kbits/sec 11 4.24 KBytes > > > [ 4] 4.00-5.00 sec 76.4 KBytes 626 Kbits/sec 18 4.24 KBytes > > > [ 4] 5.00-6.00 sec 25.5 KBytes 209 Kbits/sec 10 4.24 KBytes > > > [ 4] 6.00-7.00 sec 107 KBytes 880 Kbits/sec 15 2.83 KBytes > > > [ 4] 7.00-8.00 sec 127 KBytes 1.04 Mbits/sec 23 2.83 KBytes > > > [ 4] 8.00-9.00 sec 130 KBytes 1.07 Mbits/sec 17 4.24 KBytes > > > [ 4] 9.00-10.00 sec 119 KBytes 973 Kbits/sec 19 4.24 KBytes > > > > I've asked ops to tweak this yesterday, now it seems this is not > completely done. I am still waiting for them to handle this and do another > test. This may explain why the io rate started high and suddenly goes down. > But there may still be other issue. > > > > Our cluster have been not very well maintained, "ceph -s" shows: > > > > [root@controller fio-rbd]# ceph -s > > cluster 6d2bb752-db69-48ff-9df4-7d85703f322e > > health HEALTH_WARN > > monmap e29: 5 mons at {bj-ceph09= > 10.10.11.22:6789/0,bj-ceph10=10.10.11.23:6789/0,bj-ceph12=10.10.11.25:6789/0,bj-ceph13=10.10.11.26:6789/0,bj-ceph14=10.10.11.27:6789/0}, > election epoch 180736, quorum 0,1,2,3,4 > bj-ceph09,bj-ceph10,bj-ceph12,bj-ceph13,bj-ceph14 > > osdmap e240303: 216 osds: 216 up, 216 in > > pgmap v31819901: 10048 pgs, 3 pools, 28274 GB data, 4736 kobjects > > 85880 GB used, 111 TB / 195 TB avail > > 10048 active+clean > > client io 208 kB/s rd, 18808 kB/s wr, 1463 op/s > > > > health is HEALTH_WARN, but can not tell what goes wrong at first glance. > > > > "ceph osd tree" shows all OSD is up, but there are ceph node without any > OSD up and running. And I do see memory in many Ceph storage nodes are > almost or been run out. > > > > I think I will work on these issues soon. > > > > > > > > > > 2015-05-13 15:17 GMT+08:00 Christian Balzer <ch...@gol.com>: > > > Hello, > > in addition to what Somnath wrote, if you're seeing this kind of blocking > reads _and_ have slow write warnings in the logs, your cluster is likely > either unhealthy and/or underpowered for it's current load. > > If your cluster is healthy, you may want to investigate what's busy, my > guess is the OSDs/HDDs. > Also any scrubs my drag your performance down especially deep-scrubs. > > However this doesn't really explain any difference between Juno and > Havanna, as both should suffer from a sickly Ceph cluster. > > Christian > > > On Wed, 13 May 2015 06:51:56 +0000 Somnath Roy wrote: > > > Can you give some more insight about the ceph cluster you are running ? > > It seems IO started and then no response..cur MB/s is becoming 0s.. > > What is ‘ceph –s’ output ? > > Hope all the OSDs are up and running.. > > > > Thanks & Regards > > Somnath > > > > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > > changqian zuo Sent: Tuesday, May 12, 2015 9:00 PM > > To: ceph-users@lists.ceph.com > > Subject: [ceph-users] How to debug a ceph read performance problem? > > > > Hi, guys, > > > > We have been running an OpenStack Havana environment with Ceph 0.72.2 as > > block storage backend. Recently we were trying to upgrade OpenStack to > > Juno. For testing, we deployed a Juno all-in-one node, this node share > > the same Cinder volume rbd pool and Glance image rbd pool with the old > > Havana. > > > > And after some test, we found a serious read performance problem in Juno > > client (write is just OK), something like: > > > > # rados bench -p test 30 seq > > sec Cur ops started finished avg MB/s cur MB/s last lat avg > > lat 0 0 0 0 0 0 - 0 > > 1 16 100 84 335.843 336 0.020221 > > 0.0393582 2 16 100 84 167.944 0 - > > 0.0393582 3 16 100 84 111.967 0 - > > 0.0393582 4 16 100 84 83.9769 0 - > > 0.0393582 5 16 100 84 67.1826 0 - > > 0.0393582 6 16 100 84 55.9863 0 - > > 0.0393582 7 16 100 84 47.9886 0 - > > 0.0393582 8 16 100 84 41.9905 0 - > > 0.0393582 9 16 100 84 37.3249 0 - > > 0.0393582 10 16 100 84 33.5926 0 - > > 0.0393582 11 16 100 84 30.5388 0 - > > 0.0393582 12 16 100 84 27.9938 0 - > > 0.0393582 13 16 100 84 25.8405 0 - > > 0.0393582 14 16 100 84 23.9948 0 - > > 0.0393582 15 16 100 84 22.3952 0 - > > 0.0393582 > > > > And when testing RBD image with fio (bs=512k read), there are: > > > > # grep 12067 ceph.client.log | grep read > > 2015-05-11 16:19:36.649554 7ff9949d5a00 1 -- > > > 10.10.11.15:0/2012449<http://10.10.11.15:0/2012449> --> > > 10.10.11.21:6835/45746<http://10.10.11.21:6835/45746> -- > > osd_op(client.3772684.0:12067 rbd_data.262a6e7bf17801.0000000000000003 > > [sparse-read 2621440~524288] 7.c43a3ae3 e240302) v4 -- ?+0 > > 0x7ff9967c5fb0 con 0x7ff99a41c420 2015-05-11 16:20:07.709915 > > 7ff94bfff700 1 -- 10.10.11.15:0/2012449<http://10.10.11.15:0/2012449> > > <== osd.218 10.10.11.21:6835/45746<http://10.10.11.21:6835/45746> 111 > > ==== osd_op_reply(12067 rbd_data.262a6e7bf17801.0000000000000003 > > [sparse-read 2621440~524288] v0'0 uv3803266 ondisk = 0) v6 ==== > > 199+0+524312 (3484234903 0 0) 0x7ff3a4002ba0 con 0x7ff99a41c420 > > > > Some operation takes more an minute. > > > > I checked OSD log (default logging level, ceph.com<http://ceph.com> said > > > when a request takes too long, it will complain in log), and do see some > > slow 4k write request, but no read. > > > > We have tested Giant, Firefly, and self-built Emperor client, same sad > > results. > > > > The network between OSD and all-in-one node is 10Gb network, this is > > from client to OSD: > > > > # iperf3 -c 10.10.11.25 -t 60 -i 1 > > Connecting to host 10.10.11.25, port 5201 > > [ 4] local 10.10.11.15 port 41202 connected to 10.10.11.25 port 5201 > > [ ID] Interval Transfer Bandwidth Retr Cwnd > > [ 4] 0.00-1.00 sec 1.09 GBytes 9.32 Gbits/sec 11 2.02 MBytes > > [ 4] 1.00-2.00 sec 1.09 GBytes 9.35 Gbits/sec 34 1.53 MBytes > > [ 4] 2.00-3.00 sec 1.09 GBytes 9.35 Gbits/sec 11 1.14 MBytes > > [ 4] 3.00-4.00 sec 1.09 GBytes 9.37 Gbits/sec 0 1.22 MBytes > > [ 4] 4.00-5.00 sec 1.09 GBytes 9.34 Gbits/sec 0 1.27 MBytes > > > > and OSD to client (there may be some problem in client interface > > bonding, 10Gb could not by reached): > > > > # iperf3 -c 10.10.11.15 -t 60 -i 1 > > Connecting to host 10.10.11.15, port 5201 > > [ 4] local 10.10.11.25 port 43934 connected to 10.10.11.15 port 5201 > > [ ID] Interval Transfer Bandwidth Retr Cwnd > > [ 4] 0.00-1.00 sec 400 MBytes 3.35 Gbits/sec 1 337 KBytes > > [ 4] 1.00-2.00 sec 553 MBytes 4.63 Gbits/sec 1 341 KBytes > > [ 4] 2.00-3.00 sec 390 MBytes 3.27 Gbits/sec 1 342 KBytes > > [ 4] 3.00-4.00 sec 395 MBytes 3.32 Gbits/sec 0 342 KBytes > > [ 4] 4.00-5.00 sec 541 MBytes 4.54 Gbits/sec 0 346 KBytes > > [ 4] 5.00-6.00 sec 405 MBytes 3.40 Gbits/sec 0 358 KBytes > > [ 4] 6.00-7.00 sec 728 MBytes 6.11 Gbits/sec 1 370 KBytes > > [ 4] 7.00-8.00 sec 741 MBytes 6.22 Gbits/sec 0 355 KBytes > > > > Ceph cluster is shared by this Juno and old Havana (as mentioned, they > > use exactly same rbd pool), and IO on Havana just goes fine. Any > > suggestion or advice? So that we can make sure it is an issue of client, > > network, or ceph cluster and then go on. I am new to Ceph, need some > > help. > > > > Thanks > > > > > > > > > > ________________________________ > > > > PLEASE NOTE: The information contained in this electronic mail message > > is intended only for the use of the designated recipient(s) named above. > > If the reader of this message is not the intended recipient, you are > > hereby notified that you have received this message in error and that > > any review, dissemination, distribution, or copying of this message is > > strictly prohibited. If you have received this communication in error, > > please notify the sender by telephone or e-mail (as shown above) > > immediately and destroy any and all copies of this message in your > > possession (whether hard copies or electronically stored copies). > > > > -- > Christian Balzer Network/Systems Engineer > ch...@gol.com Global OnLine Japan/Fusion Communications > http://www.gol.com/ > > > > ------------------------------ > > PLEASE NOTE: The information contained in this electronic mail message is > intended only for the use of the designated recipient(s) named above. If > the reader of this message is not the intended recipient, you are hereby > notified that you have received this message in error and that any review, > dissemination, distribution, or copying of this message is strictly > prohibited. If you have received this communication in error, please notify > the sender by telephone or e-mail (as shown above) immediately and destroy > any and all copies of this message in your possession (whether hard copies > or electronically stored copies). > >
_______________________________________________ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com