Re: [ceph-users] Managing OSDs on twin machines
Hi Pierre —

You can manipulate your CRUSH map to make use of a ‘chassis’ bucket type in addition to the default ‘host’ type. I’ve done this with FatTwin and FatTwin^2 boxes with great success.

For more reading take a look at:
http://ceph.com/docs/master/rados/operations/crush-map/

In particular the ‘Move a Bucket’ section:
http://ceph.com/docs/master/rados/operations/crush-map/#move-a-bucket

./JRH

On Aug 18, 2014, at 2:57 PM, Pierre Jaury pie...@jaury.eu wrote:

> Hello guys,
>
> I just acquired some brand new machines I would like to rely upon for a storage cluster (and some virtualization). These machines are, however, « twin servers », i.e. each blade (1U) holds two distinct machines behind a single PSU.
>
> I think two replicas would be enough for the intended purpose, yet I cannot guarantee that the replicas of a given object are stored on two different blades. I basically have N blades; each blade has 2 distinct machines sharing a single PSU, and each machine has 2 hard drives.
>
> Is it possible to configure mutual exclusion between OSDs, so that the replicas of a single object are never stored on the same blade?
>
> Regards
> --
> Pierre Jaury @ kaiyou
> http://kaiyou.fr/contact.html
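Concretely, the ‘Move a Bucket’ steps map onto CLI commands roughly like the following. This is a sketch, not your exact cluster: the host names (host-a, host-b), the blade bucket name (blade1), the pool name, and the ruleset id are placeholders; the real ruleset id is whatever ‘ceph osd crush rule dump’ reports for the new rule:

    # create a chassis bucket per blade and move both of its hosts into it
    ceph osd crush add-bucket blade1 chassis
    ceph osd crush move blade1 root=default
    ceph osd crush move host-a chassis=blade1
    ceph osd crush move host-b chassis=blade1

    # create a rule whose failure domain is the chassis, then point a pool at it
    ceph osd crush rule create-simple rep-by-chassis default chassis
    ceph osd pool set rbd crush_ruleset 3

With such a rule in place, the two replicas of an object land in two different chassis buckets, i.e. on two different blades.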
[ceph-users] mon: leveldb checksum mismatch
Hi list —

I’ve got a small dev cluster: 3 OSD nodes with 6 disks/OSDs each, and a single monitor (this, it seems, was my mistake). The monitor node went down hard and it looks like the monitor’s db is in a funny state. Running ‘ceph-mon’ manually with ‘debug_mon 20’ and ‘debug_ms 20’ gave the following:

/usr/bin/ceph-mon -i monhost --mon-data /var/lib/ceph/mon/ceph-monhost --debug_mon 20 --debug_ms 20 -d
2014-07-03 23:20:55.800512 7f973918e7c0  0 ceph version 0.67.7 (d7ab4244396b57aac8b7e80812115bbd079e6b73), process ceph-mon, pid 24930
Corruption: checksum mismatch
Corruption: checksum mismatch
2014-07-03 23:20:56.455797 7f973918e7c0 -1 failed to create new leveldb store

I attempted to make use of the leveldb Python library’s ‘RepairDB’ function, which just moved enough files into ‘lost’ that, when running the monitor again, I’m asked if I ran mkcephfs.

Any insight into resolving these two checksum mismatches so I can access my OSD data would be greatly appreciated.

Thanks,
./JRH

p.s. I’m assuming that without the maps from the monitor, my OSD data is unrecoverable as well.
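For reference, the repair attempt was essentially this one-liner against the mon’s backing store; the store.db path is the one under my --mon-data directory above, so adjust it for your own mon’s name:

    # py-leveldb's RepairDB, pointed at the monitor's leveldb store
    python -c "import leveldb; leveldb.RepairDB('/var/lib/ceph/mon/ceph-monhost/store.db')"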
Re: [ceph-users] mon: leveldb checksum mismatch
Hi Joao,

On Jul 3, 2014, at 7:57 PM, Joao Eduardo Luis joao.l...@inktank.com wrote:

> We don't have a way to repair leveldb. Having multiple monitors usually helps with such tricky situations.

I know this, but for this small dev cluster I wasn’t thinking about corruption of my mon’s backing store. Silly me :)

> According to this [1] the python bindings you're using may not be linked against snappy, which we were using (mistakenly, until recently) to compress data as it goes into leveldb. Not having those snappy bindings may be what's causing all those files to be moved to 'lost' instead.

I found the same posting, and confirmed that the ‘leveldb.so’ that ships with the ‘python-leveldb’ package on Ubuntu 13.10 links against ‘snappy’.

> The suggestion that the thread in [1] offers is to have the repair functionality directly in the 'application' itself. We could do this by adding a repair option to ceph-kvstore-tool -- which could help. I'll be happy to get that into ceph-kvstore-tool tomorrow and push a branch for you to compile and test.

I would be more than happy to try this out. Without fixing these checksums, I think I’m reinitializing my cluster. :\

Thank you,
./JRH
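For anyone else chasing the snappy question: the check is a quick ldd against the extension module. The .so path below is where the Ubuntu 13.10 package dropped it on my machine and may differ on yours:

    # locate the extension module installed by python-leveldb, then check its linkage
    dpkg -L python-leveldb | grep '\.so$'
    ldd /usr/lib/python2.7/dist-packages/leveldb.so | grep snappy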
Re: [ceph-users] REST API and uWSGI?
On Jun 16, 2014, at 8:52 PM, Wido den Hollander w...@42on.com wrote:

> On 16 Jun 2014, at 19:23, Jason Harley jhar...@redmind.ca wrote:
>
>> Howdy —
>>
>> I’d like to run the ceph REST API behind nginx, and uWSGI and UNIX sockets seem like a smart way to do this. Has anyone attempted to get this setup working? I’ve tried writing a uWSGI wrapper as well as just telling ‘uwsgi’ to call the ‘ceph_rest_api’ module, without luck.
>
> Not using uwsgi, but via mod_wsgi. I have a Gist online:
> https://gist.github.com/wido/8bf032e5f482bfef949c
>
> Hope that helps!

This was tremendously helpful. Thank you, Wido!

It seems my biggest issue was attempting to use ‘uwsgi’ with the master process manager: the process would just hang trying to speak to the admin socket for some reason. With your wsgi file from the Gist above I was able to get the REST API working with upstart, uwsgi and nginx. I’ve shared my configs here:

https://gist.github.com/jharley/e23034135966bb7b438e

./JRH
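For the archives, the shape of the working invocation is roughly the following; the socket path and wsgi-file location are from my environment rather than anything canonical, and the wsgi file itself is the one from Wido’s Gist:

    # run the REST API as a uWSGI app over a UNIX socket; note the absence of
    # --master, since the master process manager hung against the admin socket
    uwsgi --plugins python \
          --wsgi-file /var/www/ceph-rest-api.wsgi \
          --socket /run/uwsgi/ceph-rest-api.sock \
          --chmod-socket=660 \
          --processes 1

nginx then proxies to that socket with a plain ‘uwsgi_pass unix:/run/uwsgi/ceph-rest-api.sock;’ location block.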
[ceph-users] REST API and uWSGI?
Howdy —

I’d like to run the ceph REST API behind nginx, and uWSGI and UNIX sockets seem like a smart way to do this. Has anyone attempted to get this setup working? I’ve tried writing a uWSGI wrapper as well as just telling ‘uwsgi’ to call the ‘ceph_rest_api’ module, without luck.

./JRH
Re: [ceph-users] PG Recovery: HEALTH_ERR to HEALTH_OK
Just wanted to close this open loop: I gave up attempting to recover pool 4, as it was just test data and the PGs with unfound objects were localized to that pool. After I destroyed and recreated the pool, things were fine.

Thank you for your help, Florian.

./JRH

On Jun 3, 2014, at 6:30 PM, Jason Harley jhar...@redmind.ca wrote:

> [quoted thread and ‘ceph pg query 4.ff3’ output snipped; identical to the messages below]
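For completeness, “destroyed and recreated the pool” was just the standard pair of commands. The pool name below is a placeholder, and the pg count is an inference (with PG ids running up to 4.ffe, the pool had 4096 PGs):

    # drop the damaged test pool, then recreate it empty
    ceph osd pool delete mypool mypool --yes-i-really-really-mean-it
    ceph osd pool create mypool 4096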
[ceph-users] PG Recovery: HEALTH_ERR to HEALTH_OK
Howdy —

I’ve had a failure on a small Dumpling (0.67.4) cluster running on Ubuntu 13.10 machines. I had three OSD nodes (running 6 OSDs each) and lost two of them in a beautiful failure. One of these nodes even went so far as to scramble the XFS filesystems of my OSD disks (I’m curious if it has some bad DIMMs).

Anyway, the thing is: I’m okay with losing the data. This was a test setup, and I want to take this opportunity to learn from the recovery process. I’m now stuck in ‘HEALTH_ERR’ and want to get back to ‘HEALTH_OK’ without just reinitializing the cluster.

My OSD map seems correct, and I’ve done scrubs (deep and normal) at the PG and OSD levels. ‘ceph -s’ shows that I still have 47 unfound objects after I told ceph to ‘mark_unfound_lost’. The PGs holding those 47 unfound objects tell me that they “haven't probed all sources, not marking lost”. Two days have passed at this point, and I’d just like to get my cluster back to working and deal with the object loss (which seems localized to a single pool). How do I move forward from here, if at all? Do I ‘force_create_pg’ the PGs containing my unfound objects?

# ceph health detail | grep unfound | grep ^pg
pg 4.ffe is active+recovering, acting [7,26], 3 unfound
pg 4.feb is active+recovering, acting [10,23], 1 unfound
pg 4.fa6 is active+recovery_wait, acting [11,25], 2 unfound
pg 4.f61 is active+recovering, acting [9,26], 1 unfound
pg 4.f2d is active+recovering, acting [8,22], 1 unfound
pg 4.ef5 is active+recovering, acting [6,22], 1 unfound
pg 4.e9c is active+recovering, acting [7,24], 1 unfound
pg 4.e12 is active+recovering, acting [7,22], 1 unfound
pg 4.e0e is active+recovering, acting [9,24], 1 unfound
pg 4.ddc is active+recovering, acting [10,26], 1 unfound
pg 4.d95 is active+recovering, acting [10,25], 1 unfound
pg 4.ccf is active+recovering, acting [10,24], 1 unfound
pg 4.c84 is active+recovering, acting [6,22], 2 unfound
pg 4.c4e is active+recovering, acting [10,23], 1 unfound
pg 4.bca is active+recovering, acting [6,26], 1 unfound
pg 4.bbf is active+recovering, acting [8,26], 1 unfound
pg 4.b5e is active+recovering, acting [6,26], 1 unfound
pg 4.ae1 is active+recovering, acting [8,26], 1 unfound
pg 4.a9c is active+recovering, acting [7,23], 1 unfound
pg 4.a39 is active+recovering, acting [10,24], 1 unfound
pg 4.85f is active+recovering, acting [9,25], 1 unfound
pg 4.83b is active+recovering, acting [10,25], 1 unfound
pg 4.7b8 is active+recovering, acting [7,26], 1 unfound
pg 4.758 is active+recovering, acting [8,23], 1 unfound
pg 4.740 is active+recovery_wait, acting [11,21], 1 unfound
pg 4.6f6 is active+recovering, acting [10,26], 2 unfound
pg 4.68d is active+recovering, acting [8,24], 2 unfound
pg 4.635 is active+recovery_wait, acting [11,22], 1 unfound
pg 4.60f is active+recovering, acting [6,22], 1 unfound
pg 4.603 is active+recovering, acting [9,24], 1 unfound
pg 4.5e1 is active+recovering, acting [10,25], 1 unfound
pg 4.579 is active+recovering, acting [9,21], 1 unfound
pg 4.56a is active+recovering, acting [7,24], 1 unfound
pg 4.519 is active+recovering, acting [9,21], 1 unfound
pg 4.435 is active+recovering, acting [6,26], 1 unfound
pg 4.42e is active+recovering, acting [7,22], 1 unfound
pg 4.30b is active+recovering, acting [7,25], 2 unfound
pg 4.1bb is active+recovery_wait, acting [11,22], 1 unfound
pg 4.178 is active+recovering, acting [8,23], 1 unfound
pg 4.43 is active+recovering, acting [9,23], 1 unfound

Any help would be greatly appreciated.
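In case it helps the next person, these are the two commands I had been weighing, both taken per-pgid (4.ffe below is just the first PG from the list above). My understanding is that ‘mark_unfound_lost revert’ rolls unfound objects back to a prior version where one exists, while ‘force_create_pg’ recreates the PG empty and abandons its data entirely:

    # give up on a PG's unfound objects
    ceph pg 4.ffe mark_unfound_lost revert

    # last resort: recreate the PG empty
    ceph pg force_create_pg 4.ffe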
./JRH
Re: [ceph-users] PG Recovery: HEALTH_ERR to HEALTH_OK
[start of ‘ceph osd tree’ output truncated in the archive]
 11  0.91  osd.11  up  1
 -4  5.46  host r-F9CBF5C8C5
 21  0.91  osd.21  up  1
 22  0.91  osd.22  up  1
 23  0.91  osd.23  up  1
 24  0.91  osd.24  up  1
 25  0.91  osd.25  up  1
 26  0.91  osd.26  up  1

./JRH

On Jun 3, 2014, at 4:00 PM, Smart Weblications GmbH - Florian Wiessner f.wiess...@smart-weblications.de wrote:

> Hi,
>
> On 03.06.2014 21:46, Jason Harley wrote:
>> [original question snipped; see the message above]
>
> what is the output of:
>
> ceph pg query 4.ffe
>
> --
> Kind regards,
>
> Florian Wiessner
>
> Smart Weblications GmbH
> Martinsberger Str. 1
> D-95119 Naila
>
> fon.: +49 9282 9638 200
> fax.: +49 9282 9638 205
> 24/7: +49 900 144 000 00 - 0.99 EUR/min*
> http://www.smart-weblications.de
>
> --
> Registered office: Naila
> Managing director: Florian Wiessner
> Commercial register: HRB 3840, Amtsgericht Hof
>
> *from German landlines; prices from mobile networks may differ
Re: [ceph-users] PG Recovery: HEALTH_ERR to HEALTH_OK
On Jun 3, 2014, at 5:58 PM, Smart Weblications GmbH - Florian Wiessner f.wiess...@smart-weblications.de wrote:

> I think it would be less painful if you had removed and then immediately recreated the corrupted OSDs, to avoid 'holes' in the OSD ids. It should work with your configuration anyhow, though.

I agree with you… I learned about ‘lost’ after removing OSDs :\

> You should check the other pgs with 'ceph pg query' and look out for:
>
> recovery_state: [
>   { name: Started/Primary/Active,
>     enter_time: 2014-06-03 18:27:58.473736,
>     might_have_unfound: [
>       { osd: 2,  status: already probed },
>       { osd: 3,  status: already probed },
>       { osd: 12, status: osd is down },
>       { osd: 14, status: osd is down },
>       { osd: 19, status: osd is down },
>       { osd: 23, status: querying },
>       { osd: 26, status: already probed }],
>
> and restart the osd that has status 'querying'.

Thank you, I will go through the other pgs and try this approach.

> What do you get if you do 'ceph pg query 4.ff3' now?

# ceph pg query 4.ff3
{ state: active+clean,
  epoch: 1650,
  up: [23, 4],
  acting: [23, 4],
  info: { pgid: 4.ff3,
    last_update: 337'1080,
    last_complete: 337'1080,
    log_tail: 0'0,
    last_backfill: MAX,
    purged_snaps: [1~9],
    history: { epoch_created: 3,
      last_epoch_started: 1646,
      last_epoch_clean: 1646,
      last_epoch_split: 0,
      same_up_since: 1645,
      same_interval_since: 1645,
      same_primary_since: 1645,
      last_scrub: 337'1080,
      last_scrub_stamp: 2014-06-03 16:19:28.591026,
      last_deep_scrub: 337'32,
      last_deep_scrub_stamp: 2014-05-29 20:28:58.517432,
      last_clean_scrub_stamp: 2014-06-03 16:19:28.591026},
    stats: { version: 337'1080,
      reported_seq: 1102,
      reported_epoch: 1650,
      state: active+clean,
      last_fresh: 2014-06-03 21:13:31.949714,
      last_change: 2014-06-03 20:56:41.466837,
      last_active: 2014-06-03 21:13:31.949714,
      last_clean: 2014-06-03 21:13:31.949714,
      last_became_active: 0.00,
      last_unstale: 2014-06-03 21:13:31.949714,
      mapping_epoch: 1643,
      log_start: 0'0,
      ondisk_log_start: 0'0,
      created: 3,
      last_epoch_clean: 1646,
      parent: 0.0,
      parent_split_bits: 0,
      last_scrub: 337'1080,
      last_scrub_stamp: 2014-06-03 16:19:28.591026,
      last_deep_scrub: 337'32,
      last_deep_scrub_stamp: 2014-05-29 20:28:58.517432,
      last_clean_scrub_stamp: 2014-06-03 16:19:28.591026,
      log_size: 1080,
      ondisk_log_size: 1080,
      stats_invalid: 0,
      stat_sum: { num_bytes: 25165824,
        num_objects: 3,
        num_object_clones: 0,
        num_object_copies: 0,
        num_objects_missing_on_primary: 0,
        num_objects_degraded: 0,
        num_objects_unfound: 0,
        num_read: 3205,
        num_read_kb: 12615,
        num_write: 1086,
        num_write_kb: 88685,
        num_scrub_errors: 0,
        num_shallow_scrub_errors: 0,
        num_deep_scrub_errors: 0,
        num_objects_recovered: 9,
        num_bytes_recovered: 75497472,
        num_keys_recovered: 0},
      stat_cat_sum: {},
      up: [23, 4],
      acting: [23, 4]},
    empty: 0,
    dne: 0,
    incomplete: 0,
    last_epoch_started: 1646},
  recovery_state: [
    { name: Started/Primary/Active,
      enter_time: 2014-06-03 20:56:41.232146,
      might_have_unfound: [],
      recovery_progress: { backfill_target: -1,
        waiting_on_backfill: 0,
        backfill_pos: 0//0//-1,
        backfill_info: { begin: 0//0//-1, end: 0//0//-1, objects: []},
        peer_backfill_info: { begin: 0//0//-1, end: 0//0//-1, objects: []},
        backfills_in_flight: [],
        pull_from_peer: [],
        pushing: []},
      scrub: { scrubber.epoch_start: 0,
        scrubber.active: 0,
        scrubber.block_writes: 0,
        scrubber.finalizing: 0,
        scrubber.waiting_on: 0,
        scrubber.waiting_on_whom: []}},
    { name: Started,
      enter_time: 2014-06-03 20:56:40.300108}]}

Thank you for your help so far. I will respond with progress tomorrow.

./JRH
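As for “restart the osd that has status querying”: on these upstart-managed Ubuntu 13.10 boxes that works out to something like the following. The grep is only a rough filter over the pretty-printed query output, and osd.23 is just the peer from Florian’s example:

    # see which peer a PG is still querying, then bounce that OSD on its node
    ceph pg query 4.ffe | grep -B1 querying
    restart ceph-osd id=23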