Sam,

It is for a valid pool; however, the up and acting sets for 2.14 both show OSDs 8 and 7. I'll take a look at 7 and 8 and see if they are good. If so, it seems like its presence on osd.3 could be an artifact from previous topologies, and I could mv it off osd.3.

Thanks very much for the assistance!

Berant

On Tuesday, May 19, 2015, Samuel Just <sj...@redhat.com> wrote:
> If 2.14 is part of a non-existent pool, you should be able to rename it
> out of current/ in the osd directory to prevent the osd from seeing it on
> startup.
> -Sam
>
> ----- Original Message -----
> From: "Berant Lemmenes" <ber...@lemmenes.com>
> To: "Samuel Just" <sj...@redhat.com>
> Cc: ceph-users@lists.ceph.com
> Sent: Tuesday, May 19, 2015 12:58:30 PM
> Subject: Re: [ceph-users] OSD unable to start (giant -> hammer)
>
> Hello,
>
> Here are the steps I performed and where I sit now.
>
> Step 1) Used 'ceph-objectstore-tool list' to create a list of all PGs not
> associated with the 3 pools (rbd, data, metadata) that are actually in use
> on this cluster.
>
> Step 2) Did a 'ceph-objectstore-tool remove' of those PGs.
>
> When starting the OSD, it would then complain about PGs that were NOT in the
> list from 'ceph-objectstore-tool list' but WERE present on the filesystem of
> the OSD in question.
>
> Step 3) Iterating over all of the PGs that were on disk with
> 'ceph-objectstore-tool info', I made a list of all PGs that returned ENOENT.
>
> Step 4) Used 'ceph-objectstore-tool remove' to remove all of those as well.
>
> Now when starting osd.3 I get an "unable to load metadata" error for a PG
> that, according to 'ceph pg 2.14 query', is not present (and shouldn't be) on
> osd.3.
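[Editor's note: the cleanup in steps 1–4 above can be sketched as a small shell pipeline. This is an illustrative sketch only, not the exact commands from the thread: the data/journal paths and the set of in-use pool ids (0 = data, 1 = metadata, 2 = rbd, consistent with pg 2.14 belonging to a valid pool) are assumptions, and ceph-objectstore-tool must only be run with the OSD daemon stopped.]

```shell
# Sketch only: paths and pool ids are assumptions for this cluster.
OSD=/var/lib/ceph/osd/ceph-3
JRN=$OSD/journal

# Keep only pgids whose pool id is NOT one of the in-use pools (0, 1, 2).
stale_pgs() {
    grep -Ev '^(0|1|2)\.'
}

# The real run would feed the tool's PG listing through the filter:
#   ceph-objectstore-tool --data-path "$OSD" --journal-path "$JRN" --op list-pgs | stale_pgs
# Demo with a fixed list:
printf '%s\n' 2.14 5.0 2.3f6 4.1a | stale_pgs
# prints: 5.0 and 4.1a

# ...and each surviving pgid would then be removed (OSD stopped!):
#   stale_pgs < pg_list | while read -r pgid; do
#       ceph-objectstore-tool --data-path "$OSD" --journal-path "$JRN" --op remove --pgid "$pgid"
#   done
```

The same `stale_pgs` filter, inverted, reproduces step 3's sweep: iterate the on-disk PG directories and keep whichever pgids `--op info` reports as ENOENT.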
Shown below with OSD debugging at 20: > > <snip> > > -23> 2015-05-19 15:15:12.712036 7fb079a20780 20 read_log 39533'174051 > (39533'174050) modify 49277412/rb.0.100f.2ae8944a.000000029945/head//2 by > client.18119.0:2811937 2015-05-18 07:18:42.859501 > > -22> 2015-05-19 15:15:12.712066 7fb079a20780 20 read_log 39533'174052 > (39533'174051) modify 49277412/rb.0.100f.2ae8944a.000000029945/head//2 by > client.18119.0:2812374 2015-05-18 07:33:21.973157 > > -21> 2015-05-19 15:15:12.712096 7fb079a20780 20 read_log 39533'174053 > (39533'174052) modify 49277412/rb.0.100f.2ae8944a.000000029945/head//2 by > client.18119.0:2812861 2015-05-18 07:48:23.098343 > > -20> 2015-05-19 15:15:12.712127 7fb079a20780 20 read_log 39533'174054 > (39533'174053) modify 49277412/rb.0.100f.2ae8944a.000000029945/head//2 by > client.18119.0:2813371 2015-05-18 08:03:54.226512 > > -19> 2015-05-19 15:15:12.712157 7fb079a20780 20 read_log 39533'174055 > (39533'174054) modify 49277412/rb.0.100f.2ae8944a.000000029945/head//2 by > client.18119.0:2813922 2015-05-18 08:18:20.351421 > > -18> 2015-05-19 15:15:12.712187 7fb079a20780 20 read_log 39533'174056 > (39533'174055) modify 49277412/rb.0.100f.2ae8944a.000000029945/head//2 by > client.18119.0:2814396 2015-05-18 08:33:56.476035 > > -17> 2015-05-19 15:15:12.712221 7fb079a20780 20 read_log 39533'174057 > (39533'174056) modify 49277412/rb.0.100f.2ae8944a.000000029945/head//2 by > client.18119.0:2814971 2015-05-18 08:48:22.605674 > > -16> 2015-05-19 15:15:12.712252 7fb079a20780 20 read_log 39533'174058 > (39533'174057) modify 49277412/rb.0.100f.2ae8944a.000000029945/head//2 by > client.18119.0:2815407 2015-05-18 09:02:48.720181 > > -15> 2015-05-19 15:15:12.712282 7fb079a20780 20 read_log 39533'174059 > (39533'174058) modify 49277412/rb.0.100f.2ae8944a.000000029945/head//2 by > client.18119.0:2815434 2015-05-18 09:03:43.727839 > > -14> 2015-05-19 15:15:12.712312 7fb079a20780 20 read_log 39533'174060 > (39533'174059) modify 
49277412/rb.0.100f.2ae8944a.000000029945/head//2 by > client.18119.0:2815889 2015-05-18 09:17:49.846406 > > -13> 2015-05-19 15:15:12.712342 7fb079a20780 20 read_log 39533'174061 > (39533'174060) modify 49277412/rb.0.100f.2ae8944a.000000029945/head//2 by > client.18119.0:2816358 2015-05-18 09:32:50.969457 > > -12> 2015-05-19 15:15:12.712372 7fb079a20780 20 read_log 39533'174062 > (39533'174061) modify 49277412/rb.0.100f.2ae8944a.000000029945/head//2 by > client.18119.0:2816840 2015-05-18 09:47:52.091524 > > -11> 2015-05-19 15:15:12.712403 7fb079a20780 20 read_log 39533'174063 > (39533'174062) modify 49277412/rb.0.100f.2ae8944a.000000029945/head//2 by > client.18119.0:2816861 2015-05-18 09:48:22.096309 > > -10> 2015-05-19 15:15:12.712433 7fb079a20780 20 read_log 39533'174064 > (39533'174063) modify 49277412/rb.0.100f.2ae8944a.000000029945/head//2 by > client.18119.0:2817714 2015-05-18 10:02:53.222749 > > -9> 2015-05-19 15:15:12.713130 7fb079a20780 10 read_log done > > -8> 2015-05-19 15:15:12.713550 7fb079a20780 10 osd.3 pg_epoch: 39533 > pg[2.12( v 39533'174064 (37945'171063,39533'174064] local-les=39529 n=101 > ec=1 les/c 39529/39529 39526/39526/39526) [9,3,10] r=1 lpr=0 > pi=37959-39525/7 crt=39533'174062 lcod 0'0 inactive] handle_loaded > > -7> 2015-05-19 15:15:12.713570 7fb079a20780 5 osd.3 pg_epoch: 39533 > pg[2.12( v 39533'174064 (37945'171063,39533'174064] local-les=39529 n=101 > ec=1 les/c 39529/39529 39526/39526/39526) [9,3,10] r=1 lpr=0 > pi=37959-39525/7 crt=39533'174062 lcod 0'0 inactive NOTIFY] exit Initial > 0.097986 0 0.000000 > > -6> 2015-05-19 15:15:12.713587 7fb079a20780 5 osd.3 pg_epoch: 39533 > pg[2.12( v 39533'174064 (37945'171063,39533'174064] local-les=39529 n=101 > ec=1 les/c 39529/39529 39526/39526/39526) [9,3,10] r=1 lpr=0 > pi=37959-39525/7 crt=39533'174062 lcod 0'0 inactive NOTIFY] enter Reset > > -5> 2015-05-19 15:15:12.713601 7fb079a20780 20 osd.3 pg_epoch: 39533 > pg[2.12( v 39533'174064 (37945'171063,39533'174064] local-les=39529 n=101 
> ec=1 les/c 39529/39529 39526/39526/39526) [9,3,10] r=1 lpr=0 > pi=37959-39525/7 crt=39533'174062 lcod 0'0 inactive NOTIFY] > set_last_peering_reset 39533 > > -4> 2015-05-19 15:15:12.713614 7fb079a20780 10 osd.3 pg_epoch: 39533 > pg[2.12( v 39533'174064 (37945'171063,39533'174064] local-les=39529 n=101 > ec=1 les/c 39529/39529 39526/39526/39526) [9,3,10] r=1 lpr=39533 > pi=37959-39525/7 crt=39533'174062 lcod 0'0 inactive NOTIFY] Clearing > blocked outgoing recovery messages > > -3> 2015-05-19 15:15:12.713629 7fb079a20780 10 osd.3 pg_epoch: 39533 > pg[2.12( v 39533'174064 (37945'171063,39533'174064] local-les=39529 n=101 > ec=1 les/c 39529/39529 39526/39526/39526) [9,3,10] r=1 lpr=39533 > pi=37959-39525/7 crt=39533'174062 lcod 0'0 inactive NOTIFY] Not blocking > outgoing recovery messages > > -2> 2015-05-19 15:15:12.713643 7fb079a20780 10 osd.3 39533 load_pgs > loaded pg[2.12( v 39533'174064 (37945'171063,39533'174064] local-les=39529 > n=101 ec=1 les/c 39529/39529 39526/39526/39526) [9,3,10] r=1 lpr=39533 > pi=37959-39525/7 crt=39533'174062 lcod 0'0 inactive NOTIFY] > log((37945'171063,39533'174064], crt=39533'174062) > > -1> 2015-05-19 15:15:12.713658 7fb079a20780 10 osd.3 39533 pgid 2.14 > coll 2.14_head > > 0> 2015-05-19 15:15:12.716475 7fb079a20780 -1 osd/PG.cc: In function > 'static epoch_t PG::peek_map_epoch(ObjectStore*, spg_t, ceph::bufferlist*)' > thread 7fb079a20780 time 2015-05-19 15:15:12.715425 > > osd/PG.cc: 2860: FAILED assert(0 == "unable to open pg metadata") > > > ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff) > > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char > const*)+0x7f) [0xb1784f] > > 2: (PG::peek_map_epoch(ObjectStore*, spg_t, ceph::buffer::list*)+0xb28) > [0x793dd8] > > 3: (OSD::load_pgs()+0x147f) [0x683dff] > > 4: (OSD::init()+0x1448) [0x6930b8] > > 5: (main()+0x26b9) [0x62fd89] > > 6: (__libc_start_main()+0xed) [0x7fb07767876d] > > 7: ceph-osd() [0x635679] > > NOTE: a copy of the executable, or `objdump 
-rdS <executable>` is needed > to interpret this. > > > --- logging levels --- > > 0/ 5 none > > 0/ 1 lockdep > > 0/ 1 context > > 1/ 1 crush > > 1/ 5 mds > > 1/ 5 mds_balancer > > 1/ 5 mds_locker > > 1/ 5 mds_log > > 1/ 5 mds_log_expire > > 1/ 5 mds_migrator > > 0/ 1 buffer > > 0/ 1 timer > > 0/ 1 filer > > 0/ 1 striper > > 0/ 1 objecter > > 0/ 5 rados > > 0/ 5 rbd > > 0/ 5 rbd_replay > > 0/ 5 journaler > > 0/ 5 objectcacher > > 0/ 5 client > > 20/20 osd > > 0/ 5 optracker > > 0/ 5 objclass > > 1/ 3 filestore > > 1/ 3 keyvaluestore > > 1/ 3 journal > > 0/ 5 ms > > 1/ 5 mon > > 0/10 monc > > 1/ 5 paxos > > 0/ 5 tp > > 1/ 5 auth > > 1/ 5 crypto > > 1/ 1 finisher > > 1/ 5 heartbeatmap > > 1/ 5 perfcounter > > 1/ 5 rgw > > 1/10 civetweb > > 1/ 5 javaclient > > 1/ 5 asok > > 1/ 1 throttle > > 0/ 0 refs > > 1/ 5 xio > > -2/-2 (syslog threshold) > > 99/99 (stderr threshold) > > max_recent 10000 > > max_new 1000 > > log_file > > --- end dump of recent events --- > > terminate called after throwing an instance of 'ceph::FailedAssertion' > > *** Caught signal (Aborted) ** > > in thread 7fb079a20780 > > ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff) > > 1: ceph-osd() [0xa1fe55] > > 2: (()+0xfcb0) [0x7fb078a60cb0] > > 3: (gsignal()+0x35) [0x7fb07768d0d5] > > 4: (abort()+0x17b) [0x7fb07769083b] > > 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fb077fde69d] > > 6: (()+0xb5846) [0x7fb077fdc846] > > 7: (()+0xb5873) [0x7fb077fdc873] > > 8: (()+0xb596e) [0x7fb077fdc96e] > > 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char > const*)+0x259) [0xb17a29] > > 10: (PG::peek_map_epoch(ObjectStore*, spg_t, ceph::buffer::list*)+0xb28) > [0x793dd8] > > 11: (OSD::load_pgs()+0x147f) [0x683dff] > > 12: (OSD::init()+0x1448) [0x6930b8] > > 13: (main()+0x26b9) [0x62fd89] > > 14: (__libc_start_main()+0xed) [0x7fb07767876d] > > 15: ceph-osd() [0x635679] > > 2015-05-19 15:15:12.812704 7fb079a20780 -1 *** Caught signal (Aborted) ** > > in thread 7fb079a20780 > 
> > ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff) > > 1: ceph-osd() [0xa1fe55] > > 2: (()+0xfcb0) [0x7fb078a60cb0] > > 3: (gsignal()+0x35) [0x7fb07768d0d5] > > 4: (abort()+0x17b) [0x7fb07769083b] > > 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fb077fde69d] > > 6: (()+0xb5846) [0x7fb077fdc846] > > 7: (()+0xb5873) [0x7fb077fdc873] > > 8: (()+0xb596e) [0x7fb077fdc96e] > > 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char > const*)+0x259) [0xb17a29] > > 10: (PG::peek_map_epoch(ObjectStore*, spg_t, ceph::buffer::list*)+0xb28) > [0x793dd8] > > 11: (OSD::load_pgs()+0x147f) [0x683dff] > > 12: (OSD::init()+0x1448) [0x6930b8] > > 13: (main()+0x26b9) [0x62fd89] > > 14: (__libc_start_main()+0xed) [0x7fb07767876d] > > 15: ceph-osd() [0x635679] > > NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed > to interpret this. > > > --- begin dump of recent events --- > > 0> 2015-05-19 15:15:12.812704 7fb079a20780 -1 *** Caught signal > (Aborted) ** > > in thread 7fb079a20780 > > > ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff) > > 1: ceph-osd() [0xa1fe55] > > 2: (()+0xfcb0) [0x7fb078a60cb0] > > 3: (gsignal()+0x35) [0x7fb07768d0d5] > > 4: (abort()+0x17b) [0x7fb07769083b] > > 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fb077fde69d] > > 6: (()+0xb5846) [0x7fb077fdc846] > > 7: (()+0xb5873) [0x7fb077fdc873] > > 8: (()+0xb596e) [0x7fb077fdc96e] > > 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char > const*)+0x259) [0xb17a29] > > 10: (PG::peek_map_epoch(ObjectStore*, spg_t, ceph::buffer::list*)+0xb28) > [0x793dd8] > > 11: (OSD::load_pgs()+0x147f) [0x683dff] > > 12: (OSD::init()+0x1448) [0x6930b8] > > 13: (main()+0x26b9) [0x62fd89] > > 14: (__libc_start_main()+0xed) [0x7fb07767876d] > > 15: ceph-osd() [0x635679] > > NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed > to interpret this. 
> > > --- logging levels --- > > 0/ 5 none > > 0/ 1 lockdep > > 0/ 1 context > > 1/ 1 crush > > 1/ 5 mds > > 1/ 5 mds_balancer > > 1/ 5 mds_locker > > 1/ 5 mds_log > > 1/ 5 mds_log_expire > > 1/ 5 mds_migrator > > 0/ 1 buffer > > 0/ 1 timer > > 0/ 1 filer > > 0/ 1 striper > > 0/ 1 objecter > > 0/ 5 rados > > 0/ 5 rbd > > 0/ 5 rbd_replay > > 0/ 5 journaler > > 0/ 5 objectcacher > > 0/ 5 client > > 20/20 osd > > 0/ 5 optracker > > 0/ 5 objclass > > 1/ 3 filestore > > 1/ 3 keyvaluestore > > 1/ 3 journal > > 0/ 5 ms > > 1/ 5 mon > > 0/10 monc > > 1/ 5 paxos > > 0/ 5 tp > > 1/ 5 auth > > 1/ 5 crypto > > 1/ 1 finisher > > 1/ 5 heartbeatmap > > 1/ 5 perfcounter > > 1/ 5 rgw > > 1/10 civetweb > > 1/ 5 javaclient > > 1/ 5 asok > > 1/ 1 throttle > > 0/ 0 refs > > 1/ 5 xio > > -2/-2 (syslog threshold) > > 99/99 (stderr threshold) > > max_recent 10000 > > max_new 1000 > > log_file > > --- end dump of recent events --- > > > Here is the PG info for 2.14 > > ceph pg 2.14 query > > { "state": "active+undersized+degraded", > > "snap_trimq": "[]", > > "epoch": 39556, > > "up": [ > > 8, > > 7], > > "acting": [ > > 8, > > 7], > > "actingbackfill": [ > > "7", > > "8"], > > "info": { "pgid": "2.14", > > "last_update": "39533'175859", > > "last_complete": "39533'175859", > > "log_tail": "36964'172858", > > "last_user_version": 175859, > > "last_backfill": "MAX", > > "purged_snaps": "[]", > > "history": { "epoch_created": 1, > > "last_epoch_started": 39536, > > "last_epoch_clean": 39536, > > "last_epoch_split": 0, > > "same_up_since": 39534, > > "same_interval_since": 39534, > > "same_primary_since": 39527, > > "last_scrub": "39533'175859", > > "last_scrub_stamp": "2015-05-18 05:23:02.952523", > > "last_deep_scrub": "39533'175859", > > "last_deep_scrub_stamp": "2015-05-18 05:23:02.952523", > > "last_clean_scrub_stamp": "2015-05-18 05:23:02.952523"}, > > "stats": { "version": "39533'175859", > > "reported_seq": "281883", > > "reported_epoch": "39556", > > "state": 
"active+undersized+degraded", > > "last_fresh": "2015-05-19 06:41:09.002111", > > "last_change": "2015-05-18 10:19:22.277851", > > "last_active": "2015-05-19 06:41:09.002111", > > "last_clean": "2015-05-18 06:41:38.906417", > > "last_became_active": "2013-05-07 04:23:31.972742", > > "last_unstale": "2015-05-19 06:41:09.002111", > > "last_undegraded": "2015-05-18 10:18:37.449550", > > "last_fullsized": "2015-05-18 10:18:37.449550", > > "mapping_epoch": 39527, > > "log_start": "36964'172858", > > "ondisk_log_start": "36964'172858", > > "created": 1, > > "last_epoch_clean": 39536, > > "parent": "0.0", > > "parent_split_bits": 0, > > "last_scrub": "39533'175859", > > "last_scrub_stamp": "2015-05-18 05:23:02.952523", > > "last_deep_scrub": "39533'175859", > > "last_deep_scrub_stamp": "2015-05-18 05:23:02.952523", > > "last_clean_scrub_stamp": "2015-05-18 05:23:02.952523", > > "log_size": 3001, > > "ondisk_log_size": 3001, > > "stats_invalid": "0", > > "stat_sum": { "num_bytes": 441982976, > > "num_objects": 106, > > "num_object_clones": 0, > > "num_object_copies": 318, > > "num_objects_missing_on_primary": 0, > > "num_objects_degraded": 106, > > "num_objects_misplaced": 0, > > "num_objects_unfound": 0, > > "num_objects_dirty": 11, > > "num_whiteouts": 0, > > "num_read": 61399, > > "num_read_kb": 1285319, > > "num_write": 135192, > > "num_write_kb": 2422029, > > "num_scrub_errors": 0, > > "num_shallow_scrub_errors": 0, > > "num_deep_scrub_errors": 0, > > "num_objects_recovered": 79, > > "num_bytes_recovered": 329883648, > > "num_keys_recovered": 0, > > "num_objects_omap": 0, > > "num_objects_hit_set_archive": 0, > > "num_bytes_hit_set_archive": 0}, > > "stat_cat_sum": {}, > > "up": [ > > 8, > > 7], > > "acting": [ > > 8, > > 7], > > "blocked_by": [], > > "up_primary": 8, > > "acting_primary": 8}, > > "empty": 0, > > "dne": 0, > > "incomplete": 0, > > "last_epoch_started": 39536, > > "hit_set_history": { "current_last_update": "0'0", > > "current_last_stamp": "0.000000", 
> > "current_info": { "begin": "0.000000", > > "end": "0.000000", > > "version": "0'0"}, > > "history": []}}, > > "peer_info": [ > > { "peer": "7", > > "pgid": "2.14", > > "last_update": "39533'175859", > > "last_complete": "39533'175859", > > "log_tail": "36964'172858", > > "last_user_version": 175859, > > "last_backfill": "MAX", > > "purged_snaps": "[]", > > "history": { "epoch_created": 1, > > "last_epoch_started": 39536, > > "last_epoch_clean": 39536, > > "last_epoch_split": 0, > > "same_up_since": 39534, > > "same_interval_since": 39534, > > "same_primary_since": 39527, > > "last_scrub": "39533'175859", > > "last_scrub_stamp": "2015-05-18 05:23:02.952523", > > "last_deep_scrub": "39533'175859", > > "last_deep_scrub_stamp": "2015-05-18 05:23:02.952523", > > "last_clean_scrub_stamp": "2015-05-18 05:23:02.952523"}, > > "stats": { "version": "39533'175858", > > "reported_seq": "281598", > > "reported_epoch": "39533", > > "state": "active+clean", > > "last_fresh": "2015-05-13 21:58:43.553887", > > "last_change": "2015-05-12 22:50:16.011917", > > "last_active": "2015-05-13 21:58:43.553887", > > "last_clean": "2015-05-13 21:58:43.553887", > > "last_became_active": "2013-05-07 04:23:31.972742", > > "last_unstale": "2015-05-13 21:58:43.553887", > > "last_undegraded": "2015-05-13 21:58:43.553887", > > "last_fullsized": "2015-05-13 21:58:43.553887", > > "mapping_epoch": 39527, > > "log_start": "36964'172857", > > "ondisk_log_start": "36964'172857", > > "created": 1, > > "last_epoch_clean": 39529, > > "parent": "0.0", > > "parent_split_bits": 0, > > "last_scrub": "39533'175857", > > "last_scrub_stamp": "2015-05-12 22:50:16.011867", > > "last_deep_scrub": "39533'175856", > > "last_deep_scrub_stamp": "2015-05-10 10:30:24.933431", > > "last_clean_scrub_stamp": "2015-05-12 22:50:16.011867", > > "log_size": 3001, > > "ondisk_log_size": 3001, > > "stats_invalid": "0", > > "stat_sum": { "num_bytes": 441982976, > > "num_objects": 106, > > "num_object_clones": 0, > > 
"num_object_copies": 315, > > "num_objects_missing_on_primary": 0, > > "num_objects_degraded": 0, > > "num_objects_misplaced": 0, > > "num_objects_unfound": 0, > > "num_objects_dirty": 11, > > "num_whiteouts": 0, > > "num_read": 61157, > > "num_read_kb": 1281187, > > "num_write": 135192, > > "num_write_kb": 2422029, > > "num_scrub_errors": 0, > > "num_shallow_scrub_errors": 0, > > "num_deep_scrub_errors": 0, > > "num_objects_recovered": 79, > > "num_bytes_recovered": 329883648, > > "num_keys_recovered": 0, > > "num_objects_omap": 0, > > "num_objects_hit_set_archive": 0, > > "num_bytes_hit_set_archive": 0}, > > "stat_cat_sum": {}, > > "up": [ > > 8, > > 7], > > "acting": [ > > 8, > > 7], > > "blocked_by": [], > > "up_primary": 8, > > "acting_primary": 8}, > > "empty": 0, > > "dne": 0, > > "incomplete": 0, > > "last_epoch_started": 39536, > > "hit_set_history": { "current_last_update": "0'0", > > "current_last_stamp": "0.000000", > > "current_info": { "begin": "0.000000", > > "end": "0.000000", > > "version": "0'0"}, > > "history": []}}], > > "recovery_state": [ > > { "name": "Started\/Primary\/Active", > > "enter_time": "2015-05-18 10:18:37.449561", > > "might_have_unfound": [], > > "recovery_progress": { "backfill_targets": [], > > "waiting_on_backfill": [], > > "last_backfill_started": "0\/\/0\/\/-1", > > "backfill_info": { "begin": "0\/\/0\/\/-1", > > "end": "0\/\/0\/\/-1", > > "objects": []}, > > "peer_backfill_info": [], > > "backfills_in_flight": [], > > "recovering": [], > > "pg_backend": { "pull_from_peer": [], > > "pushing": []}}, > > "scrub": { "scrubber.epoch_start": "39527", > > "scrubber.active": 0, > > "scrubber.block_writes": 0, > > "scrubber.waiting_on": 0, > > "scrubber.waiting_on_whom": []}}, > > { "name": "Started", > > "enter_time": "2015-05-18 10:18:05.335040"}], > > "agent_state": {}} > > On Mon, May 18, 2015 at 2:34 PM, Berant Lemmenes <ber...@lemmenes.com > <javascript:;>> > wrote: > > > Sam, > > > > Thanks for taking a look. 
> > It does seem to fit my issue. Would just
> > removing the 5.0_head directory be appropriate, or would using
> > ceph-objectstore-tool be better?
> >
> > Thanks,
> > Berant
> >
> > On Mon, May 18, 2015 at 1:47 PM, Samuel Just <sj...@redhat.com> wrote:
> >
> >> You have most likely hit http://tracker.ceph.com/issues/11429. There
> >> are some workarounds in the bugs marked as duplicates of that bug, or you
> >> can wait for the next hammer point release.
> >> -Sam
> >>
> >> ----- Original Message -----
> >> From: "Berant Lemmenes" <ber...@lemmenes.com>
> >> To: ceph-users@lists.ceph.com
> >> Sent: Monday, May 18, 2015 10:24:38 AM
> >> Subject: [ceph-users] OSD unable to start (giant -> hammer)
> >>
> >> Hello all,
> >>
> >> I've encountered a problem when upgrading my single-node home cluster
> >> from giant to hammer, and I would greatly appreciate any insight.
> >>
> >> I upgraded the packages as normal, then restarted the mon and,
> >> once that came back, restarted the first OSD (osd.3). However, it
> >> subsequently won't start and crashes with the following failed assertion:
> >>
> >> osd/OSD.h: 716: FAILED assert(ret)
> >>
> >> ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
> >>
> >> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >> const*)+0x7f) [0xb1784f]
> >>
> >> 2: (OSD::load_pgs()+0x277b) [0x6850fb]
> >>
> >> 3: (OSD::init()+0x1448) [0x6930b8]
> >>
> >> 4: (main()+0x26b9) [0x62fd89]
> >>
> >> 5: (__libc_start_main()+0xed) [0x7f2345bc976d]
> >>
> >> 6: ceph-osd() [0x635679]
> >>
> >> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> >> to interpret this.
> >> > >> > >> > >> > >> --- logging levels --- > >> > >> 0/ 5 none > >> > >> 0/ 1 lockdep > >> > >> 0/ 1 context > >> > >> 1/ 1 crush > >> > >> 1/ 5 mds > >> > >> 1/ 5 mds_balancer > >> > >> 1/ 5 mds_locker > >> > >> 1/ 5 mds_log > >> > >> 1/ 5 mds_log_expire > >> > >> 1/ 5 mds_migrator > >> > >> 0/ 1 buffer > >> > >> 0/ 1 timer > >> > >> 0/ 1 filer > >> > >> 0/ 1 striper > >> > >> 0/ 1 objecter > >> > >> 0/ 5 rados > >> > >> 0/ 5 rbd > >> > >> 0/ 5 rbd_replay > >> > >> 0/ 5 journaler > >> > >> 0/ 5 objectcacher > >> > >> 0/ 5 client > >> > >> 0/ 5 osd > >> > >> 0/ 5 optracker > >> > >> 0/ 5 objclass > >> > >> 1/ 3 filestore > >> > >> 1/ 3 keyvaluestore > >> > >> 1/ 3 journal > >> > >> 0/ 5 ms > >> > >> 1/ 5 mon > >> > >> 0/10 monc > >> > >> 1/ 5 paxos > >> > >> 0/ 5 tp > >> > >> 1/ 5 auth > >> > >> 1/ 5 crypto > >> > >> 1/ 1 finisher > >> > >> 1/ 5 heartbeatmap > >> > >> 1/ 5 perfcounter > >> > >> 1/ 5 rgw > >> > >> 1/10 civetweb > >> > >> 1/ 5 javaclient > >> > >> 1/ 5 asok > >> > >> 1/ 1 throttle > >> > >> 0/ 0 refs > >> > >> 1/ 5 xio > >> > >> -2/-2 (syslog threshold) > >> > >> 99/99 (stderr threshold) > >> > >> max_recent 10000 > >> > >> max_new 1000 > >> > >> log_file > >> > >> --- end dump of recent events --- > >> > >> terminate called after throwing an instance of 'ceph::FailedAssertion' > >> > >> *** Caught signal (Aborted) ** > >> > >> in thread 7f2347f71780 > >> > >> ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff) > >> > >> 1: ceph-osd() [0xa1fe55] > >> > >> 2: (()+0xfcb0) [0x7f2346fb1cb0] > >> > >> 3: (gsignal()+0x35) [0x7f2345bde0d5] > >> > >> 4: (abort()+0x17b) [0x7f2345be183b] > >> > >> 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f234652f69d] > >> > >> 6: (()+0xb5846) [0x7f234652d846] > >> > >> 7: (()+0xb5873) [0x7f234652d873] > >> > >> 8: (()+0xb596e) [0x7f234652d96e] > >> > >> 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >> const*)+0x259) [0xb17a29] > >> > >> 10: (OSD::load_pgs()+0x277b) [0x6850fb] 
> >> > >> 11: (OSD::init()+0x1448) [0x6930b8] > >> > >> 12: (main()+0x26b9) [0x62fd89] > >> > >> 13: (__libc_start_main()+0xed) [0x7f2345bc976d] > >> > >> 14: ceph-osd() [0x635679] > >> > >> 2015-05-18 13:02:33.643064 7f2347f71780 -1 *** Caught signal (Aborted) > ** > >> > >> in thread 7f2347f71780 > >> > >> > >> > >> > >> ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff) > >> > >> 1: ceph-osd() [0xa1fe55] > >> > >> 2: (()+0xfcb0) [0x7f2346fb1cb0] > >> > >> 3: (gsignal()+0x35) [0x7f2345bde0d5] > >> > >> 4: (abort()+0x17b) [0x7f2345be183b] > >> > >> 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f234652f69d] > >> > >> 6: (()+0xb5846) [0x7f234652d846] > >> > >> 7: (()+0xb5873) [0x7f234652d873] > >> > >> 8: (()+0xb596e) [0x7f234652d96e] > >> > >> 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >> const*)+0x259) [0xb17a29] > >> > >> 10: (OSD::load_pgs()+0x277b) [0x6850fb] > >> > >> 11: (OSD::init()+0x1448) [0x6930b8] > >> > >> 12: (main()+0x26b9) [0x62fd89] > >> > >> 13: (__libc_start_main()+0xed) [0x7f2345bc976d] > >> > >> 14: ceph-osd() [0x635679] > >> > >> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed > >> to interpret this. 
> >> > >> > >> > >> > >> --- begin dump of recent events --- > >> > >> 0> 2015-05-18 13:02:33.643064 7f2347f71780 -1 *** Caught signal > (Aborted) > >> ** > >> > >> in thread 7f2347f71780 > >> > >> > >> > >> > >> ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff) > >> > >> 1: ceph-osd() [0xa1fe55] > >> > >> 2: (()+0xfcb0) [0x7f2346fb1cb0] > >> > >> 3: (gsignal()+0x35) [0x7f2345bde0d5] > >> > >> 4: (abort()+0x17b) [0x7f2345be183b] > >> > >> 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f234652f69d] > >> > >> 6: (()+0xb5846) [0x7f234652d846] > >> > >> 7: (()+0xb5873) [0x7f234652d873] > >> > >> 8: (()+0xb596e) [0x7f234652d96e] > >> > >> 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char > >> const*)+0x259) [0xb17a29] > >> > >> 10: (OSD::load_pgs()+0x277b) [0x6850fb] > >> > >> 11: (OSD::init()+0x1448) [0x6930b8] > >> > >> 12: (main()+0x26b9) [0x62fd89] > >> > >> 13: (__libc_start_main()+0xed) [0x7f2345bc976d] > >> > >> 14: ceph-osd() [0x635679] > >> > >> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed > >> to interpret this. 
> >> > >> > >> > >> > >> --- logging levels --- > >> > >> 0/ 5 none > >> > >> 0/ 1 lockdep > >> > >> 0/ 1 context > >> > >> 1/ 1 crush > >> > >> 1/ 5 mds > >> > >> 1/ 5 mds_balancer > >> > >> 1/ 5 mds_locker > >> > >> 1/ 5 mds_log > >> > >> 1/ 5 mds_log_expire > >> > >> 1/ 5 mds_migrator > >> > >> 0/ 1 buffer > >> > >> 0/ 1 timer > >> > >> 0/ 1 filer > >> > >> 0/ 1 striper > >> > >> 0/ 1 objecter > >> > >> 0/ 5 rados > >> > >> 0/ 5 rbd > >> > >> 0/ 5 rbd_replay > >> > >> 0/ 5 journaler > >> > >> 0/ 5 objectcacher > >> > >> 0/ 5 client > >> > >> 0/ 5 osd > >> > >> 0/ 5 optracker > >> > >> 0/ 5 objclass > >> > >> 1/ 3 filestore > >> > >> 1/ 3 keyvaluestore > >> > >> 1/ 3 journal > >> > >> 0/ 5 ms > >> > >> 1/ 5 mon > >> > >> 0/10 monc > >> > >> 1/ 5 paxos > >> > >> 0/ 5 tp > >> > >> 1/ 5 auth > >> > >> 1/ 5 crypto > >> > >> 1/ 1 finisher > >> > >> 1/ 5 heartbeatmap > >> > >> 1/ 5 perfcounter > >> > >> 1/ 5 rgw > >> > >> 1/10 civetweb > >> > >> 1/ 5 javaclient > >> > >> 1/ 5 asok > >> > >> 1/ 1 throttle > >> > >> 0/ 0 refs > >> > >> 1/ 5 xio > >> > >> -2/-2 (syslog threshold) > >> > >> 99/99 (stderr threshold) > >> > >> max_recent 10000 > >> > >> max_new 1000 > >> > >> log_file > >> > >> --- end dump of recent events --- > >> > >> > >> I've included a 'ceph osd dump' here: > >> http://pastebin.com/RKbaY7nv > >> > >> ceph osd tree: > >> > >> > >> ceph osd tree > >> > >> ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY > >> > >> -1 24.14000 root default > >> > >> -3 0 rack unknownrack > >> > >> -2 0 host ceph-test > >> > >> -4 24.14000 host ceph01 > >> > >> 0 1.50000 osd.0 down 0 1.00000 > >> > >> 2 1.50000 osd.2 down 0 1.00000 > >> > >> 3 1.50000 osd.3 down 1.00000 1.00000 > >> > >> 5 2.00000 osd.5 up 1.00000 1.00000 > >> > >> 6 2.00000 osd.6 up 1.00000 1.00000 > >> > >> 7 2.00000 osd.7 up 1.00000 1.00000 > >> > >> 8 2.00000 osd.8 up 1.00000 1.00000 > >> > >> 9 2.00000 osd.9 up 1.00000 1.00000 > >> > >> 10 2.00000 osd.10 up 1.00000 1.00000 > >> > >> 4 4.00000 
osd.4 up 1.00000 1.00000
> >>
> >> 1 3.64000 osd.1 up 1.00000 1.00000
> >>
> >> Note that osd.0 and osd.2 were down prior to the upgrade and the cluster
> >> was healthy (these are failed disks that have been out for some time, just
> >> not removed from CRUSH).
> >>
> >> I've also included a log with OSD debugging set to 20 here:
> >>
> >> https://dl.dropboxusercontent.com/u/1043493/osd.3.log.gz
> >>
> >> Looking through that file, it appears the last pg that it loads
> >> successfully is 2.3f6; then it moves to 5.0:
> >>
> >> -3> 2015-05-18 12:25:24.292091 7f6f407f9780 10 osd.3 39533 load_pgs
> >> loaded pg[2.3f6( v 39533'289849 (37945'286848,39533'289849] local-les=39532
> >> n=99 ec=1 les/c 39532/39532 39531/39531/39523) [5,4,3] r=2 lpr=39533
> >> pi=34961-39530/34 crt=39533'289846 lcod 0'0 inactive NOTIFY]
> >> log((37945'286848,39533'289849], crt=39533'289846)
> >>
> >> -2> 2015-05-18 12:25:24.292100 7f6f407f9780 10 osd.3 39533 pgid 5.0 coll
> >> 5.0_head
> >>
> >> -1> 2015-05-18 12:25:24.570188 7f6f407f9780 20 osd.3 0 get_map 34144 -
> >> loading and decoding 0x411fd80
> >>
> >> 0> 2015-05-18 12:26:02.758914 7f6f407f9780 -1 osd/OSD.h: In function
> >> 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f6f407f9780 time
> >> 2015-05-18 12:25:24.620468
> >>
> >> osd/OSD.h: 716: FAILED assert(ret)
> >>
> >> [snip]
> >>
> >> I don't see 5.0 in a pg dump.
> >>
> >> Thanks in advance!
> >>
> >> Berant
> >>
> >> _______________________________________________
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
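[Editor's note: the plan stated at the top of the thread — confirm that osd.3 is not in pg 2.14's up/acting sets (which the `ceph pg 2.14 query` output shows as [8, 7]) before moving its copy aside — can be sketched in shell. The acting set below is taken from the thread; the `is_stray` helper is a made-up name for illustration, and the final `mv` (Sam's suggestion of renaming the PG directory out of current/) should only be done with the OSD stopped and a copy kept.]

```shell
# Sketch only: acting set copied from the thread's `ceph pg 2.14 query` output.
acting="8 7"    # "acting": [8, 7]
candidate=3     # osd.3 holds the copy suspected to be a stale artifact

# Succeeds when $2 does not appear in the space-separated list $1.
is_stray() {
    case " $1 " in
        *" $2 "*) return 1 ;;
        *)        return 0 ;;
    esac
}

if is_stray "$acting" "$candidate"; then
    echo "osd.$candidate is not in the acting set for 2.14; its copy looks stray"
    # With the OSD stopped, move the directory out of current/ rather than deleting it:
    #   mv /var/lib/ceph/osd/ceph-3/current/2.14_head /root/stray-pgs/
fi
```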