[ceph-users] how to get ceph daemon debug info from ceph-rest-api ?
Hi all,

I want to use ceph-rest-api to view some debug details from ceph daemons. On the Linux shell I can get this output:

# ceph daemon osd.0 dump_ops_in_flight | python -m json.tool
{
    "num_ops": 0,
    "ops": []
}

My question: can I get this output from ceph-rest-api? So far I have tried a few methods (curl, python-cephclient) but did not get the right response. Can someone help me?
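For comparison, plain mon-level commands do work over the REST interface. A minimal sketch, assuming ceph-rest-api is running with its defaults (port 5000, API base /api/v0.1; adjust host and port for your setup):

# mon-level commands, returned as JSON when the Accept header asks for it
curl -H "Accept: application/json" http://localhost:5000/api/v0.1/health
curl -H "Accept: application/json" http://localhost:5000/api/v0.1/status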
Re: [ceph-users] how to get ceph daemon debug info from ceph-rest-api ?
It doesn't currently support that. ceph-rest-api only wraps commands that are sent to the mon cluster, whereas the "ceph daemon" operations use the local admin socket (/var/run/ceph/*.asok) of the service. There has been some discussion of enabling calls to admin socket operations via the mon, though.

John

On Thu, Jul 24, 2014 at 9:20 AM, zhu qiang zhu_qiang...@foxmail.com wrote:
[quoted message snipped]
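Until that lands, the admin socket is only reachable on the host running the daemon. A common workaround (a sketch, not part of ceph-rest-api; the hostname is an example) is to run the same query locally, wrapped in ssh from a management node:

# same output as "ceph daemon osd.0 dump_ops_in_flight", addressing the socket path explicitly
ssh osd-host 'ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok dump_ops_in_flight'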
Re: [ceph-users] HW recommendations for OSD journals?
I found this article very interesting: http://techreport.com/review/26523/the-ssd-endurance-experiment-casualties-on-the-way-to-a-petabyte

I've got Samsung 840 Pros, and while I wouldn't go with them again, I am interested in the fact that (in this anecdotal experiment) they seemed to last much longer than the wear-leveling indicator would have suggested.

On a side note, if anyone is having performance issues with these drives, I've found that this produced a drastic speed-up: https://wiki.archlinux.org/index.php/SSD_Memory_Cell_Clearing
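For reference, the procedure behind that link is an ATA secure erase. A sketch with hdparm (WARNING: this destroys all data on the drive; /dev/sdX is a placeholder, and the drive must not be "frozen", which a suspend/resume cycle usually fixes):

# check the security state; look for "not frozen" in the output
hdparm -I /dev/sdX
# set a temporary user password, then issue the erase with it
hdparm --user-master u --security-set-pass p /dev/sdX
hdparm --user-master u --security-erase p /dev/sdX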
Re: [ceph-users] MON segfaulting when setting a crush ruleset to a pool (firefly 0.80.4)
Hi Joao,

In the meanwhile I have done the following things:

$ ceph osd crush move ceph-osd15 rack=rack1-pdu1
moved item id -17 name 'ceph-osd15' to location {rack=rack1-pdu1} in crush map
$ ceph osd crush rm rack2-pdu3
removed item id -23 name 'rack2-pdu3' from crush map

But it does not solve the problem either. I saw in the documentation that restarting the OSDs where the PGs are stuck could help. I did restart all the OSDs, but it leads to the following status:

    cluster 4a8669b9-b379-43b2-9488-7fca6e1366bc
     health HEALTH_WARN 80 pgs degraded; 152 pgs peering; 411 pgs stale;
            166 pgs stuck inactive; 411 pgs stuck stale; 620 pgs stuck unclean;
            recovery 51106/694410 objects degraded (7.360%)
     monmap e2: 3 mons at {ceph-mon0=10.1.2.1:6789/0,ceph-mon1=10.1.2.2:6789/0,ceph-mon2=10.1.2.3:6789/0},
            election epoch 68, quorum 0,1,2 ceph-mon0,ceph-mon1,ceph-mon2
     osdmap e1825: 16 osds: 16 up, 16 in
      pgmap v301798: 712 pgs, 5 pools, 1350 GB data, 338 kobjects
            2763 GB used, 5615 GB / 8379 GB avail
            51106/694410 objects degraded (7.360%)
                 152 stale+peering
                  73 stale+active+remapped
                  80 stale+active+degraded+remapped
                  92 stale+active+clean
                 301 active+remapped
                  14 stale

You'll find my crush map here: http://pastebin.com/F9aFjcjm

Cheers,
Olivier.

----- Original Message -----
From: Joao Eduardo Luis joao.l...@inktank.com
To: Olivier DELHOMME olivier.delho...@mines-paristech.fr, ceph-users@lists.ceph.com
Sent: Wednesday, July 23, 2014 19:39:52
Subject: Re: [ceph-users] MON segfaulting when setting a crush ruleset to a pool (firefly 0.80.4)

Hey Olivier,

On 07/23/2014 02:06 PM, Olivier DELHOMME wrote:
Hello,

I'm running a test cluster (mons and OSDs are Debian 7 with the 3.2.57-3+deb7u2 kernel). The client is Debian 7 with a 3.15.4 kernel that I compiled myself. The cluster has 3 monitors and 16 OSD servers. I created a pool (periph), used it a bit, and then decided to create some buckets and move the hosts into them:

Can you share your crush map?

Cheers!
-Joao

$ ceph osd crush add-bucket rack1-pdu1 rack
$ ceph osd crush add-bucket rack1-pdu2 rack
$ ceph osd crush add-bucket rack1-pdu3 rack
$ ceph osd crush add-bucket rack2-pdu1 rack
$ ceph osd crush add-bucket rack2-pdu2 rack
$ ceph osd crush add-bucket rack2-pdu3 rack
$ ceph osd crush move ceph-osd0 rack=rack1-pdu1
$ ceph osd crush move ceph-osd1 rack=rack1-pdu1
$ ceph osd crush move ceph-osd2 rack=rack1-pdu1
$ ceph osd crush move ceph-osd3 rack=rack1-pdu2
$ ceph osd crush move ceph-osd4 rack=rack1-pdu2
$ ceph osd crush move ceph-osd5 rack=rack1-pdu2
$ ceph osd crush move ceph-osd6 rack=rack1-pdu3
$ ceph osd crush move ceph-osd7 rack=rack1-pdu3
$ ceph osd crush move ceph-osd8 rack=rack1-pdu3
$ ceph osd crush move ceph-osd9 rack=rack2-pdu1
$ ceph osd crush move ceph-osd10 rack=rack2-pdu1
$ ceph osd crush move ceph-osd11 rack=rack2-pdu1
$ ceph osd crush move ceph-osd12 rack=rack2-pdu2
$ ceph osd crush move ceph-osd13 rack=rack2-pdu2
$ ceph osd crush move ceph-osd14 rack=rack2-pdu2
$ ceph osd crush move ceph-osd15 rack=rack2-pdu3

It did well:

$ ceph osd tree
# id    weight  type name               up/down reweight
-23     0.91    rack rack2-pdu3
-17     0.91        host ceph-osd15
15      0.91            osd.15          up      1
-22     1.81    rack rack2-pdu2
-14     0.45        host ceph-osd12
12      0.45            osd.12          up      1
-15     0.45        host ceph-osd13
13      0.45            osd.13          up      1
-16     0.91        host ceph-osd14
14      0.91            osd.14          up      1
-21     1.35    rack rack2-pdu1
-11     0.45        host ceph-osd9
9       0.45            osd.9           up      1
-12     0.45        host ceph-osd10
10      0.45            osd.10          up      1
-13     0.45        host ceph-osd11
11      0.45            osd.11          up      1
-20     1.35    rack rack1-pdu3
-8      0.45        host ceph-osd6
6       0.45            osd.6           up      1
-9      0.45        host ceph-osd7
7       0.45            osd.7           up      1
-10     0.45        host ceph-osd8
8       0.45            osd.8           up      1
-19     1.35    rack rack1-pdu2
-5      0.45        host ceph-osd3
3       0.45            osd.3           up      1
-6      0.45        host ceph-osd4
4       0.45            osd.4           up      1
-7      0.45        host ceph-osd5
5       0.45            osd.5           up      1
-18     1.35    rack rack1-pdu1
-2      0.45        host ceph-osd0
0       0.45            osd.0           up      1
-3      0.45        host ceph-osd1
1       0.45            osd.1           up      1
-4      0.45        host ceph-osd2
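As a starting point for debugging stuck PGs like the ones in the status above, the standard CLI can list and inspect them directly (a sketch using stock ceph commands; the PG id is a placeholder):

# list PGs stuck in a given state, then query one of them for details
ceph pg dump_stuck stale
ceph pg dump_stuck inactive
ceph pg <pgid> query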
Re: [ceph-users] slow read speeds from kernel rbd (Firefly 0.80.4)
Hi Steve,

I'm also looking for improvements in single-thread reads. Somewhat higher values (perhaps twice as high?) should be possible with your config. I have 5 nodes with 60 4-TB hdds and got the following:

rados -p test bench -b 4194304 60 seq -t 1 --no-cleanup
Total time run:        60.066934
Total reads made:      863
Read size:             4194304
Bandwidth (MB/sec):    57.469
Average Latency:       0.0695964
Max latency:           0.434677
Min latency:           0.016444

In my case I had some OSDs (xfs) with a high fragmentation (20%). Changing the mount options and defragmenting helped slightly. Performance changes:

[client]
rbd cache = true
rbd cache writethrough until flush = true

[osd]
osd mount options xfs = rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M
osd_op_threads = 4
osd_disk_threads = 4

But I would expect much more speed for a single thread...

Udo

On 23.07.2014 22:13, Steve Anthony wrote:
Ah, ok. That makes sense. With one concurrent operation I see numbers more in line with the read speeds I'm seeing from the filesystems on the rbd images.

# rados -p bench bench 300 seq --no-cleanup -t 1
Total time run:        300.114589
Total reads made:      2795
Read size:             4194304
Bandwidth (MB/sec):    37.252
Average Latency:       0.10737
Max latency:           0.968115
Min latency:           0.039754

# rados -p bench bench 300 rand --no-cleanup -t 1
Total time run:        300.164208
Total reads made:      2996
Read size:             4194304
Bandwidth (MB/sec):    39.925
Average Latency:       0.100183
Max latency:           1.04772
Min latency:           0.039584

I really wish I could find my data on read speeds from a couple weeks ago. It's possible that they've always been in this range, but I remember one of my test users saturating his 1GbE link over NFS while copying from the rbd client to his workstation. Of course, it's also possible that the data set he was using was cached in RAM when he was testing, masking the lower rbd speeds. It just seems counterintuitive to me that read speeds would be so much slower than writes at the filesystem layer in practice. With images in the 10-100TB range, reading data at 20-60MB/s isn't going to be pleasant.

Can you suggest any tunables or other approaches to investigate to improve these speeds, or are they in line with what you'd expect? Thanks for your help!

-Steve
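For anyone wanting to check their own OSDs for the fragmentation Udo mentions, a sketch with the stock xfs tools (device and mount-point names are examples; run xfs_fsr during a quiet period, as it adds I/O load):

# report the fragmentation factor of an xfs OSD partition (read-only, works while mounted)
xfs_db -c frag -r /dev/sdb1
# defragment the filesystem, here via the mount point of one OSD
xfs_fsr /var/lib/ceph/osd/ceph-0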
Re: [ceph-users] slow read speeds from kernel rbd (Firefly 0.80.4)
Hi again,

forgot to say: I'm still on 0.72.2!

Udo
Re: [ceph-users] slow read speeds from kernel rbd (Firefly 0.80.4)
What is your kernel version? On kernels >= 3.11,

sysctl -w net.ipv4.tcp_window_scaling=0

seems to improve the situation a lot. It also helped a lot to mitigate processes going (and sticking) in 'D' state.

On 24/07/2014 22:08, Udo Lembke wrote:
[quoted message snipped]

--
Jean-Tiare, shared-hosting team
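If that setting helps, it can be made persistent across reboots with standard sysctl mechanics (nothing ceph-specific about this):

# apply now
sysctl -w net.ipv4.tcp_window_scaling=0
# persist: append to /etc/sysctl.conf (or drop a file into /etc/sysctl.d/)
echo 'net.ipv4.tcp_window_scaling = 0' >> /etc/sysctl.conf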
[ceph-users] Ceph Berlin MeetUp 28.7.
Hi,

the next Ceph MeetUp in Berlin, Germany, happens on July 28.
http://www.meetup.com/Ceph-Berlin/events/195107422/

Regards
--
Robert Sander
Heinlein Support GmbH
Linux: Akademie - Support - Hosting
http://www.heinlein-support.de
Tel: 030-405051-43  Fax: 030-405051-19
Mandatory disclosures per §35a GmbHG: HRB 93818 B / Amtsgericht Berlin-Charlottenburg, Managing Director: Peer Heinlein, registered office: Berlin
[ceph-users] use sas drives for journal?
Hello.

In this setup:

PowerEdge R720
RAID: PERC H710 eight-port, 6Gb/s
OSD drives: qty 4: Seagate Constellation ES.3 ST2000NM0023 2TB 7200 RPM 128MB Cache SAS 6Gb/s

a: would it make sense to use these good SAS drives in RAID-1 for the journal?
   Western Digital XE WD3001BKHG 300GB 10000 RPM 32MB Cache SAS 6Gb/s 2.5"
Or would it make sense to
b: put the journal on the OSDs
c: get 2 SSDs?

I'm trying to find a good use for the WD 300GB drives. We are using some for the OS RAID-1, but we've got a few more to use up.

best regards,
Rob Fantini
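If the journals do end up on the SAS RAID-1, pointing the OSDs at partitions on it is just a ceph.conf setting. A sketch only (the partition-label scheme and journal size are made-up examples, not a recommendation):

[osd]
# one partition on the RAID-1 per OSD, e.g. labelled journal-0 .. journal-3;
# $id expands to the OSD number
osd journal = /dev/disk/by-partlabel/journal-$id
osd journal size = 10240   # MB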
Re: [ceph-users] slow read speeds from kernel rbd (Firefly 0.80.4)
Thanks for the information! Based on my reading of http://ceph.com/docs/next/rbd/rbd-config-ref I was under the impression that the rbd cache options wouldn't apply, since presumably the kernel is handling the caching. I'll have to toggle some of those values and see if they make a difference in my setup.

I did some additional testing today. If I limit the write benchmark to 1 concurrent operation I see a lower bandwidth number, as I expected. However, when writing to the XFS filesystem on an rbd image I see transfer rates closer to 400MB/s.

# rados -p bench bench 300 write --no-cleanup -t 1
Total time run:         300.105945
Total writes made:      1992
Write size:             4194304
Bandwidth (MB/sec):     26.551
Stddev Bandwidth:       5.69114
Max bandwidth (MB/sec): 40
Min bandwidth (MB/sec): 0
Average Latency:        0.15065
Stddev Latency:         0.0732024
Max latency:            0.617945
Min latency:            0.097339

# time cp -a /mnt/local/climate /mnt/ceph_test1
real    2m11.083s
user    0m0.440s
sys     1m11.632s

# du -h --max-depth=1 /mnt/local
53G     /mnt/local/climate

This seems to imply that there is more than one concurrent operation when writing into the filesystem on top of the rbd image. However, given that the filesystem read speeds and the rados benchmark read speeds are much closer in reported bandwidth, it's as if reads are occurring as a single operation.

# time cp -a /mnt/ceph_test2/isos /mnt/local/
real    36m2.129s
user    0m1.572s
sys     3m23.404s

# du -h --max-depth=1 /mnt/ceph_test2/
68G     /mnt/ceph_test2/isos

Is this apparent single-thread read and multi-thread write with the rbd kernel module the expected mode of operation? If so, could someone explain the reason for this limitation? Based on the information on data striping in http://ceph.com/docs/next/architecture/#data-striping I would assume that a format 1 image would stripe a file larger than the 4MB object size over multiple objects, and that those objects would be distributed over multiple OSDs. This would seem to indicate that reading a file back would be much faster, since even though Ceph is only reading the primary replica, the read is still distributed over multiple OSDs. At worst I would expect something near the read bandwidth of a single OSD, which would still be much higher than 30-40MB/s.

-Steve

On 07/24/2014 04:07 PM, Udo Lembke wrote:
[quoted message snipped]
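One tunable worth trying for the single-threaded kernel-rbd read case is block-layer readahead on the client (a sketch; rbd0 and the 4096 value are examples, and this is generic block-device tuning rather than anything rbd-specific):

# check the current readahead (in KB) for the mapped rbd device
cat /sys/block/rbd0/queue/read_ahead_kb
# raise it so sequential reads keep more requests in flight against the OSDs
echo 4096 > /sys/block/rbd0/queue/read_ahead_kb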