[ceph-users] All SSD storage and journals
Hello,

As others have reported in the past, and having now tested things here myself, there really is no point in having journals for SSD-backed OSDs on other SSDs. It is a zero-sum game, because:

a) Using that journal SSD as another OSD with an integrated journal will yield the same overall result performance-wise, if all SSDs are the same. In addition, its capacity will be made available for actual storage.

b) If the journal SSD is faster than the OSD SSDs, it tends to be priced accordingly. For example, the DC P3700 400GB is about twice as fast (write) and twice as expensive as the DC S3700 400GB.

Things _may_ be different if one looks at IOPS rather than bandwidth (though certainly not in the near future, in regard to Ceph actually getting SSDs busy), but even there the difference is negligible when comparing, for example, the Intel S and P models in write performance. Reads are another matter, but nobody cares about those for journals. ^o^

Obvious things that come to mind in this context would be the ability to disable journals (difficult, I know; not touching BTRFS, thank you) and probably the K/V store in the future.

Regards,

Christian
--
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Fusion Communications
http://www.gol.com/
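For reference, a minimal sketch of what the integrated-journal layout in (a) looks like in ceph.conf - the path and size below are illustrative assumptions, not taken from Christian's setup:

[osd]
    ; journal lives as a file on the OSD's own SSD (the default location)
    osd journal = /var/lib/ceph/osd/$cluster-$id/journal
    osd journal size = 10240    ; MB

With ceph-disk or ceph-deploy, simply omitting a separate journal device gives the same colocation.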
[ceph-users] can we deploy multi-rgw on one ceph cluster?
Hi Yehuda,

1. Can we deploy multiple RGWs on one Ceph cluster? If so, does it bring us any problems?
2. What is the major difference between Apache and civetweb? What is civetweb's advantage?

Thanks
[ceph-users] get/put files with radosgw once MDS crash
Dear cephers,

Today I use the MDS to put/get files from the Ceph storage cluster, as it is very easy to use for every side of a company. But the Ceph MDS is not very stable, so my question: is it possible to get the file names and contents from the OSDs with radosgw once the MDS crashes, and how?
[ceph-users] Continuous OSD crash with kv backend (firefly)
Hi,

During recovery testing on a latest firefly with the LevelDB backend, we found that the OSDs on a selected host may crash all at once, leaving the attached backtrace. Otherwise, recovery goes more or less smoothly for hours. The timestamps show how the issue is correlated between different processes on the same node:

core.ceph-osd.25426.node01.1414148261
core.ceph-osd.25734.node01.1414148263
core.ceph-osd.25566.node01.1414148345

The question is about the kv backend state in firefly - is it considered stable enough to run a production test against, or should we better move to giant/master for this?

Thanks!

GNU gdb (GDB) 7.4.1-debian
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>
Reading symbols from /usr/bin/ceph-osd...Reading symbols from /usr/lib/debug/usr/bin/ceph-osd...done.
done.
[~135 "New LWP nnnnn" lines omitted]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/ceph-osd -i 1 --pid-file /var/run/ceph/osd.1.pid -c /etc/ceph/ceph.con'.
Program terminated with signal 6, Aborted.
#0  0x7ff9ad91eb7b in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb)

Thread 135 (Thread 0x7ff99a492700 (LWP 19064)):
#0  0x7ff9ad91ad84 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00c496da in Wait (mutex=..., this=0x108cd110) at ./common/Cond.h:55
#2  Pipe::writer (this=0x108ccf00) at msg/Pipe.cc:1730
#3  0x00c5485d in Pipe::Writer::entry (this=<optimized out>) at msg/Pipe.h:61
#4  0x7ff9ad916e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#5  0x7ff9ac4a43dd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#6  0x in ?? ()

Thread 134 (Thread 0x7ff975e1d700 (LWP 10737)):
#0  0x7ff9ac498a13 in poll () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00c3e73c in Pipe::tcp_read_wait (this=this@entry=0x4a53180) at msg/Pipe.cc:2282
#2  0x00c3e9d0 in Pipe::tcp_read (this=this@entry=0x4a53180, buf=<optimized out>, buf@entry=0x7ff975e1cccf "\377", len=len@entry=1) at msg/Pipe.cc:2255
#3  0x00c5095f in Pipe::reader (this=0x4a53180) at msg/Pipe.cc:1421
#4  0x00c5497d in Pipe::Reader::entry (this=<optimized out>) at msg/Pipe.h:49
#5  0x7ff9ad916e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#6  0x7ff9ac4a43dd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#7  0x in ?? ()

Thread 133 (Thread 0x7ff972dda700 (LWP
Re: [ceph-users] Continuous OSD crash with kv backend (firefly)
The kvstore is not stable in firefly. But on the master branch there should be no existing/known bugs.

On Fri, Oct 24, 2014 at 7:41 PM, Andrey Korolyov and...@xdel.ru wrote:

Hi,

During recovery testing on a latest firefly with the LevelDB backend, we found that the OSDs on a selected host may crash all at once, leaving the attached backtrace. Otherwise, recovery goes more or less smoothly for hours. The timestamps show how the issue is correlated between different processes on the same node:

core.ceph-osd.25426.node01.1414148261
core.ceph-osd.25734.node01.1414148263
core.ceph-osd.25566.node01.1414148345

The question is about the kv backend state in firefly - is it considered stable enough to run a production test against, or should we better move to giant/master for this?

Thanks!

--
Best Regards,
Wheat
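For anyone who wants to try the kv backend on master as suggested: the object store is selected per OSD in ceph.conf. A rough sketch - the option value shown ("keyvaluestore-dev") is what firefly-era builds used and is an assumption to verify against your branch:

[osd]
    osd objectstore = keyvaluestore-dev
    ; the default remains filestore; LevelDB is the default kv store behind it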
Re: [ceph-users] mds isn't working anymore after osd's running full
Hello Greg and John,

I used the patch on the Ceph cluster and tried it again:

/usr/bin/ceph-mds -i th1-mon001 -c /etc/ceph/ceph.conf --cluster ceph --undump-journal 0 journaldumptgho-mon001
undump journaldumptgho-mon001
start 9483323613 len 134213311
writing header 200.
 writing 9483323613~1048576
 writing 9484372189~1048576
 writing 9614395613~1048576
 writing 9615444189~1048576
 writing 9616492765~1044159
done.

It went well, without errors, and after that I restarted the MDS. The status went from up:replay to up:reconnect to up:rejoin (lagged or crashed). In the log there is an error about trim_to > trimming_pos, and it is as Greg mentioned: maybe the dump file needs to be truncated to the proper length before resetting and undumping again. How can I truncate the dumped file to the correct length?

The MDS log during the undumping and starting of the MDS: http://pastebin.com/y14pSvM0

Kind Regards,

Jasper

From: john.sp...@inktank.com [john.sp...@inktank.com] on behalf of John Spray [john.sp...@redhat.com]
Sent: Thursday, October 16, 2014 12:23
To: Jasper Siero
CC: Gregory Farnum; ceph-users
Subject: Re: [ceph-users] mds isn't working anymore after osd's running full

Following up: the firefly fix for undump is: https://github.com/ceph/ceph/pull/2734

Jasper: if you still need to try undumping on this existing firefly cluster, then you can download ceph-mds packages built from the wip-firefly-undump branch from http://gitbuilder.ceph.com/ceph-deb-precise-x86_64-basic/ref/

Cheers,
John

On Wed, Oct 15, 2014 at 8:15 PM, John Spray john.sp...@redhat.com wrote:

Sadly undump has been broken for quite some time (it was fixed in giant as part of creating cephfs-journal-tool). If there's a one-line fix for this then it's probably worth putting in firefly, since it's a long-term supported branch -- I'll do that now.

John

On Wed, Oct 15, 2014 at 8:23 AM, Jasper Siero jasper.si...@target-holding.nl wrote:

Hello Greg,

The dump and reset of the journal were successful:

[root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 --pid-file /var/run/ceph/mds.th1-mon001.pid -c /etc/ceph/ceph.conf --cluster ceph --dump-journal 0 journaldumptgho-mon001
journal is 9483323613~134215459
read 134213311 bytes at offset 9483323613
wrote 134213311 bytes at offset 9483323613 to journaldumptgho-mon001
NOTE: this is a _sparse_ file; you can
  $ tar cSzf journaldumptgho-mon001.tgz journaldumptgho-mon001
to efficiently compress it while preserving sparseness.

[root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 --pid-file /var/run/ceph/mds.th1-mon001.pid -c /etc/ceph/ceph.conf --cluster ceph --reset-journal 0
old journal was 9483323613~134215459
new journal start will be 9621733376 (4194304 bytes past old end)
writing journal head
writing EResetJournal entry
done

Undumping the journal was not successful, and looking into the error, client_lock.is_locked() shows up several times. The MDS is not running when I start the undumping, so maybe I have forgotten something?

[root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 --pid-file /var/run/ceph/mds.th1-mon001.pid -c /etc/ceph/ceph.conf --cluster ceph --undump-journal 0 journaldumptgho-mon001
undump journaldumptgho-mon001
start 9483323613 len 134213311
writing header 200.
osdc/Objecter.cc: In function 'ceph_tid_t Objecter::op_submit(Objecter::Op*)' thread 7fec3e5ad7a0 time 2014-10-15 09:09:32.020287
osdc/Objecter.cc: 1225: FAILED assert(client_lock.is_locked())
 ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)
 1: /usr/bin/ceph-mds() [0x80f15e]
 2: (Dumper::undump(char const*)+0x65d) [0x56c7ad]
 3: (main()+0x1632) [0x569c62]
 4: (__libc_start_main()+0xfd) [0x7fec3ca68d5d]
 5: /usr/bin/ceph-mds() [0x567d99]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

[the same assert and backtrace are printed twice more in the output]
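On the truncation question above: the dump header says the journal region is "start 9483323613 len 134213311", and the dumped file is sparse with the data written at that same offset, so the proper total length should be start + len. A rough sketch with GNU coreutils - the arithmetic assumes those two numbers from the log apply to your dump, so verify them first:

truncate --size $((9483323613 + 134213311)) journaldumptgho-mon001

then reset and undump again as before.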
[ceph-users] Object Storage Statistics
Hi list,

We're using the object storage in production and billing people based on their usage, much like S3. We're also trying to produce things like hourly bandwidth graphs for our clients.

We're having some issues with the API not returning the correct statistics. I can see that there is a --sync-stats option for the command-line radosgw-admin, but there doesn't appear to be anything similar for the admin REST API. Is there an equivalent feature for the API that hasn't been documented by chance?

Thanks

Dane
[ceph-users] librados crash in nova-compute
Hey folks,

I am trying to enable OpenStack to use RBD as the image backend:
https://bugs.launchpad.net/nova/+bug/1226351

For some reason, nova-compute segfaults due to a librados crash:

./log/SubsystemMap.h: In function 'bool ceph::log::SubsystemMap::should_gather(unsigned int, int)' thread 7f1b477fe700 time 2014-10-24 03:20:17.382769
./log/SubsystemMap.h: 62: FAILED assert(sub < m_subsys.size())
 ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)
 1: (()+0x42785) [0x7f1b4c4db785]
 2: (ObjectCacher::flusher_entry()+0xfda) [0x7f1b4c53759a]
 3: (ObjectCacher::FlusherThread::entry()+0xd) [0x7f1b4c54a16d]
 4: (()+0x6b50) [0x7f1b6ea93b50]
 5: (clone()+0x6d) [0x7f1b6df3e0ed]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
terminate called after throwing an instance of 'ceph::FailedAssertion'
Aborted

I feel that there is some concurrency issue, since this sometimes happens before and sometimes after this line:
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/rbd_utils.py#L208

Any idea what the potential causes of the crash are? Thanks.
-Simon
[ceph-users] Lost monitors in a multi mon cluster
Hello,

I was running a multi-mon (3) Ceph cluster and, in a migration move, I reinstalled 2 of the 3 monitor nodes without properly deleting them from the cluster.

So there is only one monitor left, which is stuck in the probing phase, and the cluster is down. As I can only connect to the mon socket, I don't know if it's possible to add a monitor, or to get and edit the monmap. This cluster is running Ceph version 0.67.1.

Is there a way to force my last monitor into a leader state, or to rebuild a lost monitor so it passes the probe and election phases?

Thank you,
Re: [ceph-users] Extremely slow small files rewrite performance
Any update?

On Tue, Oct 21, 2014 at 3:32 PM, Sergey Nazarov nataraj...@gmail.com wrote:

Ouch, I think the client log is missing. Here it goes:
https://www.dropbox.com/s/650mjim2ldusr66/ceph-client.admin.log.gz?dl=0

On Tue, Oct 21, 2014 at 3:22 PM, Sergey Nazarov nataraj...@gmail.com wrote:

I enabled logging and performed the same tests. Here is the link to the archive with logs; they are only from one node (the node where the active MDS was sitting):
https://www.dropbox.com/s/80axovtoofesx5e/logs.tar.gz?dl=0

Rados bench results:

# rados bench -p test 10 write
 Maintaining 16 concurrent writes of 4194304 bytes for up to 10 seconds or 0 objects
 Object prefix: benchmark_data_atl-fs11_4630
   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
     0       0         0         0         0         0         -         0
     1      16        46        30   119.967       120  0.201327  0.348463
     2      16        88        72   143.969       168  0.132983  0.353677
     3      16       124       108   143.972       144  0.930837  0.383018
     4      16       155       139   138.976       124  0.899468  0.426396
     5      16       203       187   149.575       192  0.236534  0.400806
     6      16       243       227   151.309       160  0.835213  0.397673
     7      16       276       260   148.549       132  0.905989  0.406849
     8      16       306       290   144.978       120  0.353279  0.422106
     9      16       335       319   141.757       116   1.12114  0.428268
    10      16       376       360    143.98       164  0.418921   0.43351
    11      16       377       361   131.254         4  0.499769  0.433693
 Total time run:         11.206306
Total writes made:      377
Write size:             4194304
Bandwidth (MB/sec):     134.567
Stddev Bandwidth:       60.0232
Max bandwidth (MB/sec): 192
Min bandwidth (MB/sec): 0
Average Latency:        0.474923
Stddev Latency:         0.376038
Max latency:            1.82171
Min latency:            0.060877

# rados bench -p test 10 seq
   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
     0       0         0         0         0         0         -         0
     1      16        61        45   179.957       180  0.010405   0.25243
     2      16       109        93   185.962       192  0.908263  0.284303
     3      16       151       135   179.965       168  0.255312  0.297283
     4      16       191       175    174.97       160  0.836727  0.330659
     5      16       236       220   175.971       180  0.009995  0.330832
     6      16       275       259   172.639       156   1.06855  0.345418
     7      16       311       295   168.545       144  0.907648  0.361689
     8      16       351       335   167.474       160  0.947688  0.363552
     9      16       390       374   166.196       156  0.140539  0.369057
 Total time run:        9.755367
Total reads made:      401
Read size:             4194304
Bandwidth (MB/sec):    164.422
Average Latency:       0.387705
Max latency:           1.33852
Min latency:           0.008064

# rados bench -p test 10 rand
   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
     0       0         0         0         0         0         -         0
     1      16        55        39   155.938       156  0.773716  0.257267
     2      16        93        77   153.957       152  0.006573  0.339199
     3      16       135       119   158.629       168  0.009851  0.359675
     4      16       171       155   154.967       144  0.892027  0.359015
     5      16       209       193   154.369       152   1.13945  0.378618
     6      16       256       240    159.97       188  0.009965  0.368439
     7      16       295       279     159.4       156  0.195812  0.371259
     8      16       343       327   163.472       192  0.880587  0.370759
     9      16       380       364    161.75       148  0.113111  0.377983
    10      16       424       408   163.173       176  0.772274  0.379497
 Total time run:        10.518482
Total reads made:      425
Read size:             4194304
Bandwidth (MB/sec):    161.620
Average Latency:       0.393978
Max latency:           1.36572
Min latency:           0.006448

On Tue, Oct 21, 2014 at 2:03 PM, Gregory Farnum g...@inktank.com wrote:

Can you enable debugging on the client (debug ms = 1, debug client = 20) and mds (debug ms = 1, debug mds = 20), run this test again, and post them somewhere for me to look at? While you're at it, can you try rados bench and see what sort of results you get?
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

On Tue, Oct 21, 2014 at 10:57 AM, Sergey Nazarov nataraj...@gmail.com wrote:

It is CephFS mounted via ceph-fuse. I am getting the same results regardless of how many other clients have this fs mounted, and of their activity. The cluster is running Debian Wheezy, kernel
Re: [ceph-users] RGW Federated Gateways and Apache 2.4 problems
On Thu, Oct 23, 2014 at 3:51 PM, Craig Lewis cle...@centraldesktop.com wrote:

I'm having a problem getting RadosGW replication to work after upgrading to Apache 2.4 on my primary test cluster. Upgrading the secondary cluster to Apache 2.4 doesn't cause any problems. Both Ceph's apache packages and Ubuntu's packages cause the same problem. I'm pretty sure I'm missing something obvious, but I'm not seeing it. Has anybody else upgraded their federated gateways to Apache 2.4?

My setup: 2 VMs, each running their own ceph cluster with replication=1.
test0-ceph.cdlocal is the primary zone, named us-west.
test1-ceph.cdlocal is the secondary zone, named us-central.

Before I start, replication works, and I'm running:
Ubuntu 14.04 LTS
Emperor (0.72.2-1precise, retained using apt-hold)
Apache 2.2 (2.2.22-2precise.ceph, retained using apt-hold)

As soon as I upgrade Apache to 2.4 in the primary cluster, replication gets permission errors. radosgw-agent.log:

2014-10-23T15:13:43.022 31106:ERROR:radosgw_agent.worker:failed to sync object bucket3/test6.jpg: state is error

The access logs from the primary say (using the vhost_combined log format):

test0-ceph.cdlocal:80 172.16.205.1 - - [23/Oct/2014:15:16:51 -0700] "PUT /test6.jpg HTTP/1.1" 200 209 "-" "-"
- - [23/Oct/2014:13:24:18 -0700] "GET /?delimiter=/ HTTP/1.1" 200 1254 "-" "-" bucket3.test0-ceph.cdlocal
<snip>
test0-ceph.cdlocal:80 172.16.205.144 - - [23/Oct/2014:15:17:34 -0700] "GET /admin/log?marker=089.89.3&type=bucket-index&bucket-instance=bucket3%3Aus-west.5697.2&max-entries=1000 HTTP/1.1" 200 398 "-" "Boto/2.20.1 Python/2.7.6 Linux/3.13.0-37-generic"
test0-ceph.cdlocal:80 172.16.205.144 - - [23/Oct/2014:15:17:34 -0700] "GET /bucket3/test6.jpg?rgwx-uid=us-central&rgwx-region=us&rgwx-prepend-metadata=us HTTP/1.1" 403 249 "-" "-"

172.16.205.143 is the primary cluster, .144 is the secondary cluster, and .1 is my workstation.

The access logs on the secondary show:

test1-ceph.cdlocal:80 172.16.205.144 - - [23/Oct/2014:15:18:07 -0700] "GET /admin/replica_log?bounds&type=bucket-index&bucket-instance=bucket3%3Aus-west.5697.2 HTTP/1.1" 200 643 "-" "Boto/2.20.1 Python/2.7.6 Linux/3.13.0-37-generic"
test1-ceph.cdlocal:80 172.16.205.144 - - [23/Oct/2014:15:18:07 -0700] "PUT /bucket3/test6.jpg?rgwx-op-id=test1-ceph0.cdlocal%3A6484%3A3&rgwx-source-zone=us-west&rgwx-client-id=radosgw-agent HTTP/1.1" 403 286 "-" "Boto/2.20.1 Python/2.7.6 Linux/3.13.0-37-generic"
test1-ceph.cdlocal:80 172.16.205.144 - - [23/Oct/2014:15:18:07 -0700] "GET /admin/opstate?client-id=radosgw-agent&object=bucket3%2Ftest6.jpg&op-id=test1-ceph0.cdlocal%3A6484%3A3 HTTP/1.1" 200 355 "-" "Boto/2.20.1 Python/2.7.6 Linux/3.13.0-37-generic"

If I crank up radosgw debugging, it tells me that the calculated digest is correct for the /admin/* requests, but fails for the object GET:

/admin/log
2014-10-23 15:44:29.257688 7fa6fcfb9700 15 calculated digest=6Tt13P6naWJEc0mJmYyDj6NzBS8=
2014-10-23 15:44:29.257690 7fa6fcfb9700 15 auth_sign=6Tt13P6naWJEc0mJmYyDj6NzBS8=
2014-10-23 15:44:29.257691 7fa6fcfb9700 15 compare=0
2014-10-23 15:44:29.257693 7fa6fcfb9700 20 system request
<snip>
/bucket3/test6.jpg
2014-10-23 15:44:29.411572 7fa6fc7b8700 15 calculated digest=pYWIOwRxCh4/bZ/D7b9RnS7RT1U=
2014-10-23 15:44:29.411573 7fa6fc7b8700 15 auth_sign=Gv398QNc6gLig9/0QbdO+1UZUq0=
2014-10-23 15:44:29.411574 7fa6fc7b8700 15 compare=-41
2014-10-23 15:44:29.411577 7fa6fc7b8700 10 failed to authorize request

That explains the 403 responses.

So I have metadata replication working, but the data replication is failing with permission problems. I verified that I can create users and buckets in the primary, and have them replicate to the secondary.

A similar situation was posted to the list before. That time, the problem was that the system users weren't correctly deployed to both the primary and secondary clusters. I verified that both users exist in both clusters, with the same access and secret. Just to test, I used s3cmd. I can read and write to both clusters using both system users' credentials.

Anybody have any ideas?

You're hitting issue #9206. Apache 2.4 filters out certain HTTP headers because they use underscores instead of dashes. There's a fix for that for firefly, although it hasn't made it to an officially released version.

Yehuda
Re: [ceph-users] Lost monitors in a multi mon cluster
Hi,

October 24 2014 5:28 PM, HURTEVENT VINCENT vincent.hurtev...@univ-lyon1.fr wrote:

Hello,

I was running a multi-mon (3) Ceph cluster and, in a migration move, I reinstalled 2 of the 3 monitor nodes without properly deleting them from the cluster.

So there is only one monitor left, which is stuck in the probing phase, and the cluster is down. As I can only connect to the mon socket, I don't know if it's possible to add a monitor, or to get and edit the monmap. This cluster is running Ceph version 0.67.1.

Is there a way to force my last monitor into a leader state, or to rebuild a lost monitor to pass the probe and election phases?

Did you already try to remake one of the lost monitors? Assuming your ceph.conf has the addresses of the mons, and the keyrings are in place, maybe this will work:

ceph-mon --mkfs -i <previous name>

then start the process? I've never been in this situation before, so I don't know if it will work.

Cheers, Dan
Re: [ceph-users] Lost monitors in a multi mon cluster
Bonjour,

Maybe http://ceph.com/docs/giant/rados/troubleshooting/troubleshooting-mon/ can help? Joao wrote that a few months ago and it covers a number of scenarios.

Cheers

On 24/10/2014 08:27, HURTEVENT VINCENT wrote:

Hello,

I was running a multi-mon (3) Ceph cluster and, in a migration move, I reinstalled 2 of the 3 monitor nodes without properly deleting them from the cluster.

So there is only one monitor left, which is stuck in the probing phase, and the cluster is down. As I can only connect to the mon socket, I don't know if it's possible to add a monitor, or to get and edit the monmap. This cluster is running Ceph version 0.67.1.

Is there a way to force my last monitor into a leader state, or to rebuild a lost monitor to pass the probe and election phases?

Thank you,

--
Loïc Dachary, Artisan Logiciel Libre
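For reference, a rough sketch of the monmap surgery that document describes for this situation - the monitor names (mon01 as the survivor, mon02/mon03 as the reinstalled ones) and the temp path are illustrative assumptions:

# with the surviving mon stopped, extract its view of the monmap
ceph-mon -i mon01 --extract-monmap /tmp/monmap
# remove the two reinstalled monitors so the survivor can form a quorum of one
monmaptool /tmp/monmap --rm mon02 --rm mon03
# inject the edited map and start the survivor again
ceph-mon -i mon01 --inject-monmap /tmp/monmap

Check the exact flags against the 0.67.x documentation before running.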
Re: [ceph-users] Extremely slow small files rewrite performance
On Fri, Oct 24, 2014 at 8:47 AM, Sergey Nazarov nataraj...@gmail.com wrote:

Any update?

The short answer is that when the command is executed a second time, the MDS needs to truncate the file to zero length. The speed of truncating a file is limited by OSD speed. (Creating a file and writing data to it are async operations, but truncating a file is a sync operation.)

Regards
Yan, Zheng

[earlier quotes and rados bench results snipped; see the previous messages in this thread]
Re: [ceph-users] RGW Federated Gateways and Apache 2.4 problems
Thanks! I'll continue with Apache 2.2 until the next release.

On Fri, Oct 24, 2014 at 8:58 AM, Yehuda Sadeh yeh...@redhat.com wrote:

[quoted problem report snipped; see the earlier messages in this thread]

You're hitting issue #9206. Apache 2.4 filters out certain HTTP headers because they use underscores instead of dashes. There's a fix for that for firefly, although it hasn't made it to an officially released version.

Yehuda
Re: [ceph-users] Fio rbd stalls during 4M reads
There's an issue in the master branch temporarily that makes rbd reads greater than the cache size hang (if the cache was on). This might be that. (Jason is working on it: http://tracker.ceph.com/issues/9854)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

On Thu, Oct 23, 2014 at 5:09 PM, Mark Kirkwood mark.kirkw...@catalyst.net.nz wrote:

I'm doing some fio tests on Giant using the fio rbd driver to measure performance on a new ceph cluster. However with block sizes > 1M (initially noticed with 4M) I am seeing absolutely no IOPS for *reads* - and the fio process becomes non-interruptible (needs kill -9):

$ ceph -v
ceph version 0.86-467-g317b83d (317b831a917f70838870b31931a79bdd4dd0)
$ fio --version
fio-2.1.11-20-g9a44
$ fio read-busted.fio
env-read-4M: (g=0): rw=read, bs=4M-4M/4M-4M/4M-4M, ioengine=rbd, iodepth=32
fio-2.1.11-20-g9a44
Starting 1 process
rbd engine: RBD version: 0.1.8
Jobs: 1 (f=1): [R(1)] [inf% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 1158050441d:06h:58m:03s]

This appears to be a pure fio rbd driver issue, as I can attach the relevant rbd volume to a vm and dd from it using 4M blocks no problem.

Any ideas?

Cheers

Mark
Re: [ceph-users] Fio rbd stalls during 4M reads
FWIW the specific fio read problem appears to have started after 0.86 and before commit 42bcabf.

Mark

On 10/24/2014 12:56 PM, Gregory Farnum wrote:

There's an issue in the master branch temporarily that makes rbd reads greater than the cache size hang (if the cache was on). This might be that. (Jason is working on it: http://tracker.ceph.com/issues/9854)
-Greg

[original report snipped; see the first message in this thread]
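For anyone bisecting this further, a minimal fio job along the lines of the read-busted.fio referenced above - the pool, image, and client names are illustrative assumptions:

[env-read-4M]
ioengine=rbd
clientname=admin
pool=rbd
rbdname=testimg
rw=read
bs=4M
iodepth=32

Run it with "fio read-busted.fio"; with rbd cache enabled and bs larger than the cache size, this is the shape of job that hangs.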
Re: [ceph-users] librados crash in nova-compute
On 10/24/2014 08:21 AM, Xu (Simon) Chen wrote:

Hey folks,

I am trying to enable OpenStack to use RBD as the image backend:
https://bugs.launchpad.net/nova/+bug/1226351

For some reason, nova-compute segfaults due to a librados crash:

./log/SubsystemMap.h: In function 'bool ceph::log::SubsystemMap::should_gather(unsigned int, int)' thread 7f1b477fe700 time 2014-10-24 03:20:17.382769
./log/SubsystemMap.h: 62: FAILED assert(sub < m_subsys.size())
 ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)
 1: (()+0x42785) [0x7f1b4c4db785]
 2: (ObjectCacher::flusher_entry()+0xfda) [0x7f1b4c53759a]
 3: (ObjectCacher::FlusherThread::entry()+0xd) [0x7f1b4c54a16d]
 4: (()+0x6b50) [0x7f1b6ea93b50]
 5: (clone()+0x6d) [0x7f1b6df3e0ed]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
terminate called after throwing an instance of 'ceph::FailedAssertion'
Aborted

I feel that there is some concurrency issue, since this sometimes happens before and sometimes after this line:
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/rbd_utils.py#L208

Any idea what the potential causes of the crash are? Thanks.
-Simon

This is http://tracker.ceph.com/issues/8912, fixed in the latest firefly and dumpling releases.
Re: [ceph-users] Object Storage Statistics
On Fri, Oct 24, 2014 at 8:17 AM, Dane Elwell dane.elw...@gmail.com wrote:

Hi list,

We're using the object storage in production and billing people based on their usage, much like S3. We're also trying to produce things like hourly bandwidth graphs for our clients.

We're having some issues with the API not returning the correct statistics. I can see that there is a --sync-stats option for the command-line radosgw-admin, but there doesn't appear to be anything similar for the admin REST API. Is there an equivalent feature for the API that hasn't been documented by chance?

There are two different statistics that are collected. One is the 'usage' information, which collects data about the actual operations clients perform in a period of time; this information can be accessed through the admin API. The other is the user stats info that is part of the user quota system, which at the moment is not hooked into a REST interface.

Yehuda
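For the 'usage' side, a rough sketch of pulling it over the admin ops API - the host, uid, and cap values are assumptions about your deployment, and the request must be signed S3-style:

# the calling user needs usage read caps first
radosgw-admin caps add --uid=admin --caps="usage=read"

# then, as a signed request against the admin entry point:
GET /admin/usage?uid=johndoe&show-summary=True HTTP/1.1
Host: rgw.example.com

The CLI equivalent is "radosgw-admin usage show --uid=johndoe".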
[ceph-users] Ceph and hadoop
Hi,

Given that HDFS is far from ideal for small files, I am examining the possibility of using Hadoop on top of Ceph. I found mainly one online resource about it: https://ceph.com/docs/v0.79/cephfs/hadoop/. I am wondering whether there is any reference implementation or blog post you are aware of about Hadoop on top of Ceph. Likewise, happy to have any pointers about why _not_ to attempt just that.

Thanks!
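For reference, the page above amounts to pointing Hadoop's core-site.xml at the CephFS bindings. A rough sketch for the Hadoop 1.x plugin - property names follow that doc, and the monitor address is an illustrative assumption:

<property>
  <name>fs.default.name</name>
  <value>ceph://mon-host:6789/</value>
</property>
<property>
  <name>fs.ceph.impl</name>
  <value>org.apache.hadoop.fs.ceph.CephFileSystem</value>
</property>
<property>
  <name>ceph.conf.file</name>
  <value>/etc/ceph/ceph.conf</value>
</property>

plus the cephfs-hadoop jar and the libcephfs JNI library on the Hadoop classpath.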
[ceph-users] How to recover Incomplete PGs from lost time symptom?
I have a number of PGs which are marked as incomplete. I'm at a loss for how to go about recovering these PGs, and believe they're suffering from the "lost time" symptom. How do I recover these PGs? I'd settle for sacrificing the lost time and just going with what I've got.

I've lost the ability to mount the RBD within this pool, and I'm afraid that unless I can resolve this I'll have lost all my data.

A query from one of my incomplete PGs: http://pastebin.com/raw.php?i=AJ3RMjz6
My CRUSH map: http://pastebin.com/raw.php?i=gWtJuhsy
Re: [ceph-users] get/put files with radosgw once MDS crash
No, MDS and RadosGW store their data in different pools. There's no way for them to access each other's data.

All of the data is stored in RADOS, and can be accessed via the rados CLI. It's not easy, and you'd probably have to spend a lot of time reading the source code to do it.

On Fri, Oct 24, 2014 at 1:49 AM, 廖建锋 de...@f-club.cn wrote:

Dear cephers,

Today I use the MDS to put/get files from the Ceph storage cluster, as it is very easy to use for every side of a company. But the Ceph MDS is not very stable, so my question: is it possible to get the file names and contents from the OSDs with radosgw once the MDS crashes, and how?
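If you do want to poke at the raw objects, a minimal sketch with the rados CLI - "data" is the default CephFS data pool, and the object name is illustrative (CephFS stores each file as objects named <inode-hex>.<stripe-index>):

rados -p data ls | head
rados -p data get 10000000005.00000000 /tmp/chunk0

Reassembling a file means concatenating its stripe objects in order, and the filename-to-inode mapping lives in the metadata pool, which is why this is painful without a working MDS.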
Re: [ceph-users] can we deploy multi-rgw on one ceph cluster?
You can deploy multiple RadosGWs in a single cluster. You'll need to set up zones (see http://ceph.com/docs/master/radosgw/federated-config/). Most people seem to be using zones for geo-replication, but local replication works even better. Multiple zones don't have to be replicated either; you could use them for tiered services, for example a service with 4x replication on pure SSDs, and a cheaper service with 2x replication on HDDs. If you do have separate zones in a single cluster, you'll want to configure different OSDs to serve the different zones; you want fault isolation between the zones. The problems this brings are mostly management of the extra complexity.

CivetWeb is embedded into the RadosGW daemon, whereas Apache talks to RadosGW using FastCGI. Overall, CivetWeb should be simpler to set up and manage, since it doesn't require Apache, its configuration, or its overhead. I don't know if CivetWeb is considered production ready. Giant has a bunch of fixes for CivetWeb, so I'm leaning towards "not on Firefly" unless somebody more knowledgeable tells me otherwise.

On Thu, Oct 23, 2014 at 11:04 PM, yuelongguang fasts...@163.com wrote:

Hi Yehuda,

1. Can we deploy multiple RGWs on one Ceph cluster? If so, does it bring us any problems?
2. What is the major difference between Apache and civetweb? What is civetweb's advantage?

Thanks
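For the CivetWeb route, a minimal sketch of the rgw section - the instance name, port, and keyring path are illustrative assumptions, and "rgw frontends" needs a build that ships civetweb (0.80.x does):

[client.radosgw.gateway]
    host = gw1
    keyring = /etc/ceph/ceph.client.radosgw.keyring
    rgw frontends = civetweb port=7480

radosgw then listens directly on port 7480 and Apache/FastCGI drops out of the picture entirely.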
Re: [ceph-users] Can't start osd- one osd alway be down.
It looks like you're running into http://tracker.ceph.com/issues/5699

You're running 0.80.7, which has a fix for that bug. From my reading of the code, I believe the fix only prevents the issue from occurring; it doesn't work around or repair bad snapshots created on older versions of Ceph.

Were any of the snapshots you're removing created on older versions of Ceph? If they were all created on Firefly, then you should open a new tracker issue and try to get some help on IRC or the developers' mailing list.

On Thu, Oct 23, 2014 at 10:21 PM, Ta Ba Tuan tua...@vccloud.vn wrote:

Dear everyone,

I can't start osd.21 (log file attached), and some pgs can't be repaired. I'm using replicate 3 for my data pool. It seems some objects in those pgs are corrupted. I tried to delete some data related to the objects above, but osd.21 still won't start. I removed osd.21, but then other osds went down as well (e.g. osd.86 down and won't start).

Guide me to debug it, please! Thanks!

--
Tuan
Ha Noi - VietNam
Re: [ceph-users] Fio rbd stalls during 4M reads
Yeah, looks like it. If I disable the rbd cache:

$ tail /etc/ceph/ceph.conf
...
[client]
rbd cache = false

then the 2-4M reads work fine (no invalid reads in valgrind either). I'll let the fio guys know.

Cheers

Mark

On 25/10/14 06:56, Gregory Farnum wrote:

There's an issue in the master branch temporarily that makes rbd reads greater than the cache size hang (if the cache was on). This might be that. (Jason is working on it: http://tracker.ceph.com/issues/9854)
-Greg

[original report snipped; see the first message in this thread]
Re: [ceph-users] librados crash in nova-compute
Thanks. I found the commit on git and confirmed that 0.80.7 fixes the issue.

On Friday, October 24, 2014, Josh Durgin josh.dur...@inktank.com wrote:

[quoted message snipped; see earlier in this thread]

This is http://tracker.ceph.com/issues/8912, fixed in the latest firefly and dumpling releases.
Re: [ceph-users] librados crash in nova-compute
I am actually curious about one more thing. In the image -> rbd case, is the rbd_secret_uuid config option really used? I am running nova-compute as a non-root user, so virsh secrets shouldn't be accessible unless we go via rootwrap. I had to make the ceph keyring file readable to the nova-compute user for the whole thing to work...

On Friday, October 24, 2014, Xu (Simon) Chen xche...@gmail.com wrote:

Thanks. I found the commit on git and confirmed that 0.80.7 fixes the issue.

[remainder of quoted thread snipped]
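For context, a rough sketch of how rbd_secret_uuid is normally wired up, so the key lives in libvirt rather than in a keyring the nova user reads - the uuid and the client.cinder user are illustrative assumptions:

cat > secret.xml <<EOF
<secret ephemeral='no' private='no'>
  <uuid>457eb676-33da-42ec-9a8c-9293d545c337</uuid>
  <usage type='ceph'>
    <name>client.cinder secret</name>
  </usage>
</secret>
EOF
virsh secret-define --file secret.xml
virsh secret-set-value --secret 457eb676-33da-42ec-9a8c-9293d545c337 \
    --base64 $(ceph auth get-key client.cinder)

With that in place, qemu (run via libvirt, not as the nova-compute user) fetches the key at attach time; the keyring file only needs to be readable where librbd is driven directly, e.g. by the image-handling code path.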
[ceph-users] journals relabeled by OS, symlinks broken
Hello,

I was having problems with a node in my cluster (Ceph v0.80.7/Debian Wheezy/Kernel 3.12), so I rebooted it, and the disks were relabeled when it came back up. Now all the symlinks to the journals are broken. The SSDs are now sda, sdb, and sdc, but the journals were on sdc, sdd, and sde:

root@ceph17:~# ls -l /var/lib/ceph/osd/ceph-*/journal
lrwxrwxrwx 1 root root 9 Oct 20 16:47 /var/lib/ceph/osd/ceph-150/journal -> /dev/sde1
lrwxrwxrwx 1 root root 9 Oct 20 16:53 /var/lib/ceph/osd/ceph-157/journal -> /dev/sdd1
lrwxrwxrwx 1 root root 9 Oct 21 08:31 /var/lib/ceph/osd/ceph-164/journal -> /dev/sdc1
lrwxrwxrwx 1 root root 9 Oct 21 16:33 /var/lib/ceph/osd/ceph-171/journal -> /dev/sde2
lrwxrwxrwx 1 root root 9 Oct 22 10:50 /var/lib/ceph/osd/ceph-178/journal -> /dev/sdc2
lrwxrwxrwx 1 root root 9 Oct 22 15:48 /var/lib/ceph/osd/ceph-184/journal -> /dev/sdd2
lrwxrwxrwx 1 root root 9 Oct 23 10:46 /var/lib/ceph/osd/ceph-191/journal -> /dev/sde3
lrwxrwxrwx 1 root root 9 Oct 23 15:22 /var/lib/ceph/osd/ceph-195/journal -> /dev/sdc3
lrwxrwxrwx 1 root root 9 Oct 23 16:59 /var/lib/ceph/osd/ceph-201/journal -> /dev/sdd3
lrwxrwxrwx 1 root root 9 Oct 24 21:32 /var/lib/ceph/osd/ceph-214/journal -> /dev/sde4
lrwxrwxrwx 1 root root 9 Oct 24 21:33 /var/lib/ceph/osd/ceph-215/journal -> /dev/sdd4

Any way to fix this without just removing all the OSDs and re-adding them? I thought about recreating the symlinks to point at the new SSD labels, but I figured I'd check here first. Thanks!

-Steve

--
Steve Anthony
LTS HPC Support Specialist
Lehigh University
sma...@lehigh.edu
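Re-pointing the symlinks does work, but use stable device names so another reboot can't reshuffle them. A rough sketch - the OSD id and partition UUID are illustrative, and if the OSDs were built with ceph-disk, each data directory should contain a journal_uuid file naming the right partition:

service ceph stop osd.150
ln -sf /dev/disk/by-partuuid/<journal-partition-uuid> /var/lib/ceph/osd/ceph-150/journal
service ceph start osd.150

The journal contents themselves are intact; only the /dev/sdX letters moved, so nothing needs to be rewritten.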
Re: [ceph-users] Can't start osd- one osd alway be down.
Hi Craig,

Thanks for replying. When I started that osd, the Ceph log from ceph -w warned that pgs 7.9d8, 23.596, 23.9c6, and 23.63c can't recover, as in the log pasted below. Those pgs are in the active+degraded state.

# ceph pg map 7.9d8
osdmap e102808 pg 7.9d8 (7.9d8) -> up [93,49] acting [93,49]

(When I start osd.21, pg 7.9d8 and the three remaining pgs change to the active+recovering state.) osd.21 still goes down after the following logs:

2014-10-25 10:57:48.415920 osd.21 [WRN] slow request 30.835731 seconds old, received at 2014-10-25 10:57:17.580013: MOSDPGPush(*7.9d8* 102803 [PushOp(e13589d8/rbd_data.4b843b2ae8944a.0c00/head//6, version: 102798'7794851, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(e13589d8/rbd_data.4b843b2ae8944a.0c00/head//6@102798'7794851, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 currently no flag points reached

2014-10-25 10:57:48.415927 osd.21 [WRN] slow request 30.275588 seconds old, received at 2014-10-25 10:57:18.140156: MOSDPGPush(*23.596* 102803 [PushOp(4ca76d96/rbd_data.5dd32f2ae8944a.0385/head//24, version: 102798'295732, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(4ca76d96/rbd_data.5dd32f2ae8944a.0385/head//24@102798'295732, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 currently no flag points reached

2014-10-25 10:57:48.415910 osd.21 [WRN] slow request 30.860696 seconds old, received at 2014-10-25 10:57:17.555048: MOSDPGPush(*23.9c6* 102803 [PushOp(efdde9c6/rbd_data.5b64062ae8944a.0b15/head//24, version: 102798'66056, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(efdde9c6/rbd_data.5b64062ae8944a.0b15/head//24@102798'66056, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 currently no flag points reached

2014-10-25 10:57:58.418847 osd.21 [WRN] 26 slow requests, 1 included below; oldest blocked for 54.967456 secs

2014-10-25 10:57:58.418859 osd.21 [WRN] slow request 30.967294 seconds old, received at 2014-10-25 10:57:27.451488: MOSDPGPush(*23.63c* 102803 [PushOp(40e4b63c/rbd_data.57ed612ae8944a.0c00/head//24, version: 102748'145637, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(40e4b63c/rbd_data.57ed612ae8944a.0c00/head//24@102748'145637, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 currently no flag points reached

Thanks!

--
Tuan
HaNoi-VietNam

On 10/25/2014 05:07 AM, Craig Lewis wrote:

[quoted message snipped; see Craig's reply earlier in this thread]