Sam,

It is for a valid pool; however, the up and acting sets for 2.14 both show
OSDs 8 and 7. I'll take a look at 7 and 8 and see if they are good.

If so, it seems like its presence on osd.3 could be an artifact of
previous topologies, and I could mv it off osd.3.
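For reference, a minimal sketch of the rename Sam suggested (the data path and pgid below are assumptions for illustration, and the demo uses a scratch directory standing in for the real osd data dir; on a live node, stop the OSD first):

```shell
# Scratch directory standing in for /var/lib/ceph/osd/ceph-3 (assumption).
OSD_PATH=$(mktemp -d)
PGID=2.14
mkdir -p "$OSD_PATH/current/${PGID}_head"   # the stray PG directory

# With the OSD stopped, move the PG dir out of current/ so load_pgs()
# no longer sees it on startup; keep it around in case it's needed later.
mkdir -p "$OSD_PATH/removed"
mv "$OSD_PATH/current/${PGID}_head" "$OSD_PATH/removed/"
ls "$OSD_PATH/removed"   # -> 2.14_head
```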

Thanks very much for the assistance!

Berant

On Tuesday, May 19, 2015, Samuel Just <sj...@redhat.com> wrote:

> If 2.14 is part of a non-existent pool, you should be able to rename it
> out of current/ in the osd directory to prevent the osd from seeing it on
> startup.
> -Sam
>
> ----- Original Message -----
> From: "Berant Lemmenes" <ber...@lemmenes.com>
> To: "Samuel Just" <sj...@redhat.com>
> Cc: ceph-users@lists.ceph.com
> Sent: Tuesday, May 19, 2015 12:58:30 PM
> Subject: Re: [ceph-users] OSD unable to start (giant -> hammer)
>
> Hello,
>
> So here are the steps I performed and where I sit now.
>
> Step 1) Using 'ceph-objectstore-tool list', I created a list of all PGs not
> associated with the 3 pools (rbd, data, metadata) that are actually in use
> on this cluster.
>
> Step 2) I then did a 'ceph-objectstore-tool remove' of those PGs.
>
> Then, when starting the OSD, it would complain about PGs that were NOT in
> the 'ceph-objectstore-tool list' output but WERE present on the filesystem
> of the OSD in question.
>
> Step 3) Iterating over all of the PGs that were on disk and using
> 'ceph-objectstore-tool info', I made a list of all PGs that returned ENOENT.
>
> Step 4) I then used 'ceph-objectstore-tool remove' to remove all of those as
> well.
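The selection in steps 1–4 is just a partition of PG ids by pool number (a PG id is "<pool>.<seed>"). A sketch of that logic, assuming the default pool ids 0 (data), 1 (metadata), and 2 (rbd) are the live pools on this cluster; the sample PG ids here are made up for illustration:

```python
# Pools actually in use on the cluster (assumption: default pool ids).
valid_pools = {0, 1, 2}

def pool_of(pgid: str) -> int:
    # A PG id like "2.3f6" is "<pool>.<seed>"; the pool part is decimal.
    return int(pgid.split(".")[0])

# Hypothetical output of 'ceph-objectstore-tool list' on the OSD.
listed = ["2.14", "2.3f6", "5.0", "17.2a"]

# PGs belonging to nonexistent pools are the removal candidates.
to_remove = [pg for pg in listed if pool_of(pg) not in valid_pools]
print(to_remove)  # -> ['5.0', '17.2a']
```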
>
> Now when starting osd.3 I get an 'unable to open pg metadata' error for a PG
> that, according to 'ceph pg 2.14 query', is not present (and shouldn't be) on
> osd.3. Shown below with OSD debugging at 20:
>
> <snip>
>
>    -23> 2015-05-19 15:15:12.712036 7fb079a20780 20 read_log 39533'174051
> (39533'174050) modify   49277412/rb.0.100f.2ae8944a.000000029945/head//2 by
> client.18119.0:2811937 2015-05-18 07:18:42.859501
>
>    -22> 2015-05-19 15:15:12.712066 7fb079a20780 20 read_log 39533'174052
> (39533'174051) modify   49277412/rb.0.100f.2ae8944a.000000029945/head//2 by
> client.18119.0:2812374 2015-05-18 07:33:21.973157
>
>    -21> 2015-05-19 15:15:12.712096 7fb079a20780 20 read_log 39533'174053
> (39533'174052) modify   49277412/rb.0.100f.2ae8944a.000000029945/head//2 by
> client.18119.0:2812861 2015-05-18 07:48:23.098343
>
>    -20> 2015-05-19 15:15:12.712127 7fb079a20780 20 read_log 39533'174054
> (39533'174053) modify   49277412/rb.0.100f.2ae8944a.000000029945/head//2 by
> client.18119.0:2813371 2015-05-18 08:03:54.226512
>
>    -19> 2015-05-19 15:15:12.712157 7fb079a20780 20 read_log 39533'174055
> (39533'174054) modify   49277412/rb.0.100f.2ae8944a.000000029945/head//2 by
> client.18119.0:2813922 2015-05-18 08:18:20.351421
>
>    -18> 2015-05-19 15:15:12.712187 7fb079a20780 20 read_log 39533'174056
> (39533'174055) modify   49277412/rb.0.100f.2ae8944a.000000029945/head//2 by
> client.18119.0:2814396 2015-05-18 08:33:56.476035
>
>    -17> 2015-05-19 15:15:12.712221 7fb079a20780 20 read_log 39533'174057
> (39533'174056) modify   49277412/rb.0.100f.2ae8944a.000000029945/head//2 by
> client.18119.0:2814971 2015-05-18 08:48:22.605674
>
>    -16> 2015-05-19 15:15:12.712252 7fb079a20780 20 read_log 39533'174058
> (39533'174057) modify   49277412/rb.0.100f.2ae8944a.000000029945/head//2 by
> client.18119.0:2815407 2015-05-18 09:02:48.720181
>
>    -15> 2015-05-19 15:15:12.712282 7fb079a20780 20 read_log 39533'174059
> (39533'174058) modify   49277412/rb.0.100f.2ae8944a.000000029945/head//2 by
> client.18119.0:2815434 2015-05-18 09:03:43.727839
>
>    -14> 2015-05-19 15:15:12.712312 7fb079a20780 20 read_log 39533'174060
> (39533'174059) modify   49277412/rb.0.100f.2ae8944a.000000029945/head//2 by
> client.18119.0:2815889 2015-05-18 09:17:49.846406
>
>    -13> 2015-05-19 15:15:12.712342 7fb079a20780 20 read_log 39533'174061
> (39533'174060) modify   49277412/rb.0.100f.2ae8944a.000000029945/head//2 by
> client.18119.0:2816358 2015-05-18 09:32:50.969457
>
>    -12> 2015-05-19 15:15:12.712372 7fb079a20780 20 read_log 39533'174062
> (39533'174061) modify   49277412/rb.0.100f.2ae8944a.000000029945/head//2 by
> client.18119.0:2816840 2015-05-18 09:47:52.091524
>
>    -11> 2015-05-19 15:15:12.712403 7fb079a20780 20 read_log 39533'174063
> (39533'174062) modify   49277412/rb.0.100f.2ae8944a.000000029945/head//2 by
> client.18119.0:2816861 2015-05-18 09:48:22.096309
>
>    -10> 2015-05-19 15:15:12.712433 7fb079a20780 20 read_log 39533'174064
> (39533'174063) modify   49277412/rb.0.100f.2ae8944a.000000029945/head//2 by
> client.18119.0:2817714 2015-05-18 10:02:53.222749
>
>     -9> 2015-05-19 15:15:12.713130 7fb079a20780 10 read_log done
>
>     -8> 2015-05-19 15:15:12.713550 7fb079a20780 10 osd.3 pg_epoch: 39533
> pg[2.12( v 39533'174064 (37945'171063,39533'174064] local-les=39529 n=101
> ec=1 les/c 39529/39529 39526/39526/39526) [9,3,10] r=1 lpr=0
> pi=37959-39525/7 crt=39533'174062 lcod 0'0 inactive] handle_loaded
>
>     -7> 2015-05-19 15:15:12.713570 7fb079a20780  5 osd.3 pg_epoch: 39533
> pg[2.12( v 39533'174064 (37945'171063,39533'174064] local-les=39529 n=101
> ec=1 les/c 39529/39529 39526/39526/39526) [9,3,10] r=1 lpr=0
> pi=37959-39525/7 crt=39533'174062 lcod 0'0 inactive NOTIFY] exit Initial
> 0.097986 0 0.000000
>
>     -6> 2015-05-19 15:15:12.713587 7fb079a20780  5 osd.3 pg_epoch: 39533
> pg[2.12( v 39533'174064 (37945'171063,39533'174064] local-les=39529 n=101
> ec=1 les/c 39529/39529 39526/39526/39526) [9,3,10] r=1 lpr=0
> pi=37959-39525/7 crt=39533'174062 lcod 0'0 inactive NOTIFY] enter Reset
>
>     -5> 2015-05-19 15:15:12.713601 7fb079a20780 20 osd.3 pg_epoch: 39533
> pg[2.12( v 39533'174064 (37945'171063,39533'174064] local-les=39529 n=101
> ec=1 les/c 39529/39529 39526/39526/39526) [9,3,10] r=1 lpr=0
> pi=37959-39525/7 crt=39533'174062 lcod 0'0 inactive NOTIFY]
> set_last_peering_reset 39533
>
>     -4> 2015-05-19 15:15:12.713614 7fb079a20780 10 osd.3 pg_epoch: 39533
> pg[2.12( v 39533'174064 (37945'171063,39533'174064] local-les=39529 n=101
> ec=1 les/c 39529/39529 39526/39526/39526) [9,3,10] r=1 lpr=39533
> pi=37959-39525/7 crt=39533'174062 lcod 0'0 inactive NOTIFY] Clearing
> blocked outgoing recovery messages
>
>     -3> 2015-05-19 15:15:12.713629 7fb079a20780 10 osd.3 pg_epoch: 39533
> pg[2.12( v 39533'174064 (37945'171063,39533'174064] local-les=39529 n=101
> ec=1 les/c 39529/39529 39526/39526/39526) [9,3,10] r=1 lpr=39533
> pi=37959-39525/7 crt=39533'174062 lcod 0'0 inactive NOTIFY] Not blocking
> outgoing recovery messages
>
>     -2> 2015-05-19 15:15:12.713643 7fb079a20780 10 osd.3 39533 load_pgs
> loaded pg[2.12( v 39533'174064 (37945'171063,39533'174064] local-les=39529
> n=101 ec=1 les/c 39529/39529 39526/39526/39526) [9,3,10] r=1 lpr=39533
> pi=37959-39525/7 crt=39533'174062 lcod 0'0 inactive NOTIFY]
> log((37945'171063,39533'174064], crt=39533'174062)
>
>     -1> 2015-05-19 15:15:12.713658 7fb079a20780 10 osd.3 39533 pgid 2.14
> coll 2.14_head
>
>      0> 2015-05-19 15:15:12.716475 7fb079a20780 -1 osd/PG.cc: In function
> 'static epoch_t PG::peek_map_epoch(ObjectStore*, spg_t, ceph::bufferlist*)'
> thread 7fb079a20780 time 2015-05-19 15:15:12.715425
>
> osd/PG.cc: 2860: FAILED assert(0 == "unable to open pg metadata")
>
>
>  ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
>
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x7f) [0xb1784f]
>
>  2: (PG::peek_map_epoch(ObjectStore*, spg_t, ceph::buffer::list*)+0xb28)
> [0x793dd8]
>
>  3: (OSD::load_pgs()+0x147f) [0x683dff]
>
>  4: (OSD::init()+0x1448) [0x6930b8]
>
>  5: (main()+0x26b9) [0x62fd89]
>
>  6: (__libc_start_main()+0xed) [0x7fb07767876d]
>
>  7: ceph-osd() [0x635679]
>
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> to interpret this.
>
>
> --- logging levels ---
>
>    0/ 5 none
>
>    0/ 1 lockdep
>
>    0/ 1 context
>
>    1/ 1 crush
>
>    1/ 5 mds
>
>    1/ 5 mds_balancer
>
>    1/ 5 mds_locker
>
>    1/ 5 mds_log
>
>    1/ 5 mds_log_expire
>
>    1/ 5 mds_migrator
>
>    0/ 1 buffer
>
>    0/ 1 timer
>
>    0/ 1 filer
>
>    0/ 1 striper
>
>    0/ 1 objecter
>
>    0/ 5 rados
>
>    0/ 5 rbd
>
>    0/ 5 rbd_replay
>
>    0/ 5 journaler
>
>    0/ 5 objectcacher
>
>    0/ 5 client
>
>   20/20 osd
>
>    0/ 5 optracker
>
>    0/ 5 objclass
>
>    1/ 3 filestore
>
>    1/ 3 keyvaluestore
>
>    1/ 3 journal
>
>    0/ 5 ms
>
>    1/ 5 mon
>
>    0/10 monc
>
>    1/ 5 paxos
>
>    0/ 5 tp
>
>    1/ 5 auth
>
>    1/ 5 crypto
>
>    1/ 1 finisher
>
>    1/ 5 heartbeatmap
>
>    1/ 5 perfcounter
>
>    1/ 5 rgw
>
>    1/10 civetweb
>
>    1/ 5 javaclient
>
>    1/ 5 asok
>
>    1/ 1 throttle
>
>    0/ 0 refs
>
>    1/ 5 xio
>
>   -2/-2 (syslog threshold)
>
>   99/99 (stderr threshold)
>
>   max_recent     10000
>
>   max_new         1000
>
>   log_file
>
> --- end dump of recent events ---
>
> terminate called after throwing an instance of 'ceph::FailedAssertion'
>
> *** Caught signal (Aborted) **
>
>  in thread 7fb079a20780
>
>  ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
>
>  1: ceph-osd() [0xa1fe55]
>
>  2: (()+0xfcb0) [0x7fb078a60cb0]
>
>  3: (gsignal()+0x35) [0x7fb07768d0d5]
>
>  4: (abort()+0x17b) [0x7fb07769083b]
>
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fb077fde69d]
>
>  6: (()+0xb5846) [0x7fb077fdc846]
>
>  7: (()+0xb5873) [0x7fb077fdc873]
>
>  8: (()+0xb596e) [0x7fb077fdc96e]
>
>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x259) [0xb17a29]
>
>  10: (PG::peek_map_epoch(ObjectStore*, spg_t, ceph::buffer::list*)+0xb28)
> [0x793dd8]
>
>  11: (OSD::load_pgs()+0x147f) [0x683dff]
>
>  12: (OSD::init()+0x1448) [0x6930b8]
>
>  13: (main()+0x26b9) [0x62fd89]
>
>  14: (__libc_start_main()+0xed) [0x7fb07767876d]
>
>  15: ceph-osd() [0x635679]
>
> 2015-05-19 15:15:12.812704 7fb079a20780 -1 *** Caught signal (Aborted) **
>
>  in thread 7fb079a20780
>
>
>  ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
>
>  1: ceph-osd() [0xa1fe55]
>
>  2: (()+0xfcb0) [0x7fb078a60cb0]
>
>  3: (gsignal()+0x35) [0x7fb07768d0d5]
>
>  4: (abort()+0x17b) [0x7fb07769083b]
>
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fb077fde69d]
>
>  6: (()+0xb5846) [0x7fb077fdc846]
>
>  7: (()+0xb5873) [0x7fb077fdc873]
>
>  8: (()+0xb596e) [0x7fb077fdc96e]
>
>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x259) [0xb17a29]
>
>  10: (PG::peek_map_epoch(ObjectStore*, spg_t, ceph::buffer::list*)+0xb28)
> [0x793dd8]
>
>  11: (OSD::load_pgs()+0x147f) [0x683dff]
>
>  12: (OSD::init()+0x1448) [0x6930b8]
>
>  13: (main()+0x26b9) [0x62fd89]
>
>  14: (__libc_start_main()+0xed) [0x7fb07767876d]
>
>  15: ceph-osd() [0x635679]
>
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> to interpret this.
>
>
> --- begin dump of recent events ---
>
>      0> 2015-05-19 15:15:12.812704 7fb079a20780 -1 *** Caught signal
> (Aborted) **
>
>  in thread 7fb079a20780
>
>
>  ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
>
>  1: ceph-osd() [0xa1fe55]
>
>  2: (()+0xfcb0) [0x7fb078a60cb0]
>
>  3: (gsignal()+0x35) [0x7fb07768d0d5]
>
>  4: (abort()+0x17b) [0x7fb07769083b]
>
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fb077fde69d]
>
>  6: (()+0xb5846) [0x7fb077fdc846]
>
>  7: (()+0xb5873) [0x7fb077fdc873]
>
>  8: (()+0xb596e) [0x7fb077fdc96e]
>
>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x259) [0xb17a29]
>
>  10: (PG::peek_map_epoch(ObjectStore*, spg_t, ceph::buffer::list*)+0xb28)
> [0x793dd8]
>
>  11: (OSD::load_pgs()+0x147f) [0x683dff]
>
>  12: (OSD::init()+0x1448) [0x6930b8]
>
>  13: (main()+0x26b9) [0x62fd89]
>
>  14: (__libc_start_main()+0xed) [0x7fb07767876d]
>
>  15: ceph-osd() [0x635679]
>
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> to interpret this.
>
>
> --- logging levels ---
>
>    0/ 5 none
>
>    0/ 1 lockdep
>
>    0/ 1 context
>
>    1/ 1 crush
>
>    1/ 5 mds
>
>    1/ 5 mds_balancer
>
>    1/ 5 mds_locker
>
>    1/ 5 mds_log
>
>    1/ 5 mds_log_expire
>
>    1/ 5 mds_migrator
>
>    0/ 1 buffer
>
>    0/ 1 timer
>
>    0/ 1 filer
>
>    0/ 1 striper
>
>    0/ 1 objecter
>
>    0/ 5 rados
>
>    0/ 5 rbd
>
>    0/ 5 rbd_replay
>
>    0/ 5 journaler
>
>    0/ 5 objectcacher
>
>    0/ 5 client
>
>   20/20 osd
>
>    0/ 5 optracker
>
>    0/ 5 objclass
>
>    1/ 3 filestore
>
>    1/ 3 keyvaluestore
>
>    1/ 3 journal
>
>    0/ 5 ms
>
>    1/ 5 mon
>
>    0/10 monc
>
>    1/ 5 paxos
>
>    0/ 5 tp
>
>    1/ 5 auth
>
>    1/ 5 crypto
>
>    1/ 1 finisher
>
>    1/ 5 heartbeatmap
>
>    1/ 5 perfcounter
>
>    1/ 5 rgw
>
>    1/10 civetweb
>
>    1/ 5 javaclient
>
>    1/ 5 asok
>
>    1/ 1 throttle
>
>    0/ 0 refs
>
>    1/ 5 xio
>
>   -2/-2 (syslog threshold)
>
>   99/99 (stderr threshold)
>
>   max_recent     10000
>
>   max_new         1000
>
>   log_file
>
> --- end dump of recent events ---
>
>
> Here is the PG info for 2.14
>
> ceph pg 2.14 query
>
> { "state": "active+undersized+degraded",
>
>   "snap_trimq": "[]",
>
>   "epoch": 39556,
>
>   "up": [
>
>         8,
>
>         7],
>
>   "acting": [
>
>         8,
>
>         7],
>
>   "actingbackfill": [
>
>         "7",
>
>         "8"],
>
>   "info": { "pgid": "2.14",
>
>       "last_update": "39533'175859",
>
>       "last_complete": "39533'175859",
>
>       "log_tail": "36964'172858",
>
>       "last_user_version": 175859,
>
>       "last_backfill": "MAX",
>
>       "purged_snaps": "[]",
>
>       "history": { "epoch_created": 1,
>
>           "last_epoch_started": 39536,
>
>           "last_epoch_clean": 39536,
>
>           "last_epoch_split": 0,
>
>           "same_up_since": 39534,
>
>           "same_interval_since": 39534,
>
>           "same_primary_since": 39527,
>
>           "last_scrub": "39533'175859",
>
>           "last_scrub_stamp": "2015-05-18 05:23:02.952523",
>
>           "last_deep_scrub": "39533'175859",
>
>           "last_deep_scrub_stamp": "2015-05-18 05:23:02.952523",
>
>           "last_clean_scrub_stamp": "2015-05-18 05:23:02.952523"},
>
>       "stats": { "version": "39533'175859",
>
>           "reported_seq": "281883",
>
>           "reported_epoch": "39556",
>
>           "state": "active+undersized+degraded",
>
>           "last_fresh": "2015-05-19 06:41:09.002111",
>
>           "last_change": "2015-05-18 10:19:22.277851",
>
>           "last_active": "2015-05-19 06:41:09.002111",
>
>           "last_clean": "2015-05-18 06:41:38.906417",
>
>           "last_became_active": "2013-05-07 04:23:31.972742",
>
>           "last_unstale": "2015-05-19 06:41:09.002111",
>
>           "last_undegraded": "2015-05-18 10:18:37.449550",
>
>           "last_fullsized": "2015-05-18 10:18:37.449550",
>
>           "mapping_epoch": 39527,
>
>           "log_start": "36964'172858",
>
>           "ondisk_log_start": "36964'172858",
>
>           "created": 1,
>
>           "last_epoch_clean": 39536,
>
>           "parent": "0.0",
>
>           "parent_split_bits": 0,
>
>           "last_scrub": "39533'175859",
>
>           "last_scrub_stamp": "2015-05-18 05:23:02.952523",
>
>           "last_deep_scrub": "39533'175859",
>
>           "last_deep_scrub_stamp": "2015-05-18 05:23:02.952523",
>
>           "last_clean_scrub_stamp": "2015-05-18 05:23:02.952523",
>
>           "log_size": 3001,
>
>           "ondisk_log_size": 3001,
>
>           "stats_invalid": "0",
>
>           "stat_sum": { "num_bytes": 441982976,
>
>               "num_objects": 106,
>
>               "num_object_clones": 0,
>
>               "num_object_copies": 318,
>
>               "num_objects_missing_on_primary": 0,
>
>               "num_objects_degraded": 106,
>
>               "num_objects_misplaced": 0,
>
>               "num_objects_unfound": 0,
>
>               "num_objects_dirty": 11,
>
>               "num_whiteouts": 0,
>
>               "num_read": 61399,
>
>               "num_read_kb": 1285319,
>
>               "num_write": 135192,
>
>               "num_write_kb": 2422029,
>
>               "num_scrub_errors": 0,
>
>               "num_shallow_scrub_errors": 0,
>
>               "num_deep_scrub_errors": 0,
>
>               "num_objects_recovered": 79,
>
>               "num_bytes_recovered": 329883648,
>
>               "num_keys_recovered": 0,
>
>               "num_objects_omap": 0,
>
>               "num_objects_hit_set_archive": 0,
>
>               "num_bytes_hit_set_archive": 0},
>
>           "stat_cat_sum": {},
>
>           "up": [
>
>                 8,
>
>                 7],
>
>           "acting": [
>
>                 8,
>
>                 7],
>
>           "blocked_by": [],
>
>           "up_primary": 8,
>
>           "acting_primary": 8},
>
>       "empty": 0,
>
>       "dne": 0,
>
>       "incomplete": 0,
>
>       "last_epoch_started": 39536,
>
>       "hit_set_history": { "current_last_update": "0'0",
>
>           "current_last_stamp": "0.000000",
>
>           "current_info": { "begin": "0.000000",
>
>               "end": "0.000000",
>
>               "version": "0'0"},
>
>           "history": []}},
>
>   "peer_info": [
>
>         { "peer": "7",
>
>           "pgid": "2.14",
>
>           "last_update": "39533'175859",
>
>           "last_complete": "39533'175859",
>
>           "log_tail": "36964'172858",
>
>           "last_user_version": 175859,
>
>           "last_backfill": "MAX",
>
>           "purged_snaps": "[]",
>
>           "history": { "epoch_created": 1,
>
>               "last_epoch_started": 39536,
>
>               "last_epoch_clean": 39536,
>
>               "last_epoch_split": 0,
>
>               "same_up_since": 39534,
>
>               "same_interval_since": 39534,
>
>               "same_primary_since": 39527,
>
>               "last_scrub": "39533'175859",
>
>               "last_scrub_stamp": "2015-05-18 05:23:02.952523",
>
>               "last_deep_scrub": "39533'175859",
>
>               "last_deep_scrub_stamp": "2015-05-18 05:23:02.952523",
>
>               "last_clean_scrub_stamp": "2015-05-18 05:23:02.952523"},
>
>           "stats": { "version": "39533'175858",
>
>               "reported_seq": "281598",
>
>               "reported_epoch": "39533",
>
>               "state": "active+clean",
>
>               "last_fresh": "2015-05-13 21:58:43.553887",
>
>               "last_change": "2015-05-12 22:50:16.011917",
>
>               "last_active": "2015-05-13 21:58:43.553887",
>
>               "last_clean": "2015-05-13 21:58:43.553887",
>
>               "last_became_active": "2013-05-07 04:23:31.972742",
>
>               "last_unstale": "2015-05-13 21:58:43.553887",
>
>               "last_undegraded": "2015-05-13 21:58:43.553887",
>
>               "last_fullsized": "2015-05-13 21:58:43.553887",
>
>               "mapping_epoch": 39527,
>
>               "log_start": "36964'172857",
>
>               "ondisk_log_start": "36964'172857",
>
>               "created": 1,
>
>               "last_epoch_clean": 39529,
>
>               "parent": "0.0",
>
>               "parent_split_bits": 0,
>
>               "last_scrub": "39533'175857",
>
>               "last_scrub_stamp": "2015-05-12 22:50:16.011867",
>
>               "last_deep_scrub": "39533'175856",
>
>               "last_deep_scrub_stamp": "2015-05-10 10:30:24.933431",
>
>               "last_clean_scrub_stamp": "2015-05-12 22:50:16.011867",
>
>               "log_size": 3001,
>
>               "ondisk_log_size": 3001,
>
>               "stats_invalid": "0",
>
>               "stat_sum": { "num_bytes": 441982976,
>
>                   "num_objects": 106,
>
>                   "num_object_clones": 0,
>
>                   "num_object_copies": 315,
>
>                   "num_objects_missing_on_primary": 0,
>
>                   "num_objects_degraded": 0,
>
>                   "num_objects_misplaced": 0,
>
>                   "num_objects_unfound": 0,
>
>                   "num_objects_dirty": 11,
>
>                   "num_whiteouts": 0,
>
>                   "num_read": 61157,
>
>                   "num_read_kb": 1281187,
>
>                   "num_write": 135192,
>
>                   "num_write_kb": 2422029,
>
>                   "num_scrub_errors": 0,
>
>                   "num_shallow_scrub_errors": 0,
>
>                   "num_deep_scrub_errors": 0,
>
>                   "num_objects_recovered": 79,
>
>                   "num_bytes_recovered": 329883648,
>
>                   "num_keys_recovered": 0,
>
>                   "num_objects_omap": 0,
>
>                   "num_objects_hit_set_archive": 0,
>
>                   "num_bytes_hit_set_archive": 0},
>
>               "stat_cat_sum": {},
>
>               "up": [
>
>                     8,
>
>                     7],
>
>               "acting": [
>
>                     8,
>
>                     7],
>
>               "blocked_by": [],
>
>               "up_primary": 8,
>
>               "acting_primary": 8},
>
>           "empty": 0,
>
>           "dne": 0,
>
>           "incomplete": 0,
>
>           "last_epoch_started": 39536,
>
>           "hit_set_history": { "current_last_update": "0'0",
>
>               "current_last_stamp": "0.000000",
>
>               "current_info": { "begin": "0.000000",
>
>                   "end": "0.000000",
>
>                   "version": "0'0"},
>
>               "history": []}}],
>
>   "recovery_state": [
>
>         { "name": "Started\/Primary\/Active",
>
>           "enter_time": "2015-05-18 10:18:37.449561",
>
>           "might_have_unfound": [],
>
>           "recovery_progress": { "backfill_targets": [],
>
>               "waiting_on_backfill": [],
>
>               "last_backfill_started": "0\/\/0\/\/-1",
>
>               "backfill_info": { "begin": "0\/\/0\/\/-1",
>
>                   "end": "0\/\/0\/\/-1",
>
>                   "objects": []},
>
>               "peer_backfill_info": [],
>
>               "backfills_in_flight": [],
>
>               "recovering": [],
>
>               "pg_backend": { "pull_from_peer": [],
>
>                   "pushing": []}},
>
>           "scrub": { "scrubber.epoch_start": "39527",
>
>               "scrubber.active": 0,
>
>               "scrubber.block_writes": 0,
>
>               "scrubber.waiting_on": 0,
>
>               "scrubber.waiting_on_whom": []}},
>
>         { "name": "Started",
>
>           "enter_time": "2015-05-18 10:18:05.335040"}],
>
>   "agent_state": {}}
>
> On Mon, May 18, 2015 at 2:34 PM, Berant Lemmenes <ber...@lemmenes.com>
> wrote:
>
> > Sam,
> >
> > Thanks for taking a look. It does seem to fit my issue. Would just
> > removing the 5.0_head directory be appropriate or would using
> > ceph-objectstore-tool be better?
> >
> > Thanks,
> > Berant
> >
> > On Mon, May 18, 2015 at 1:47 PM, Samuel Just <sj...@redhat.com> wrote:
> >
> >> You have most likely hit http://tracker.ceph.com/issues/11429.  There
> >> are some workarounds in the bugs marked as duplicates of that bug, or
> >> you can wait for the next hammer point release.
> >> -Sam
> >>
> >> ----- Original Message -----
> >> From: "Berant Lemmenes" <ber...@lemmenes.com>
> >> To: ceph-users@lists.ceph.com
> >> Sent: Monday, May 18, 2015 10:24:38 AM
> >> Subject: [ceph-users] OSD unable to start (giant -> hammer)
> >>
> >> Hello all,
> >>
> >> I've encountered a problem when upgrading my single node home cluster
> >> from giant to hammer, and I would greatly appreciate any insight.
> >>
> >> I upgraded the packages like normal, then proceeded to restart the mon
> >> and once that came back restarted the first OSD (osd.3). However it
> >> subsequently won't start and crashes with the following failed
> assertion:
> >>
> >>
> >>
> >> osd/OSD.h: 716: FAILED assert(ret)
> >>
> >> ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
> >>
> >> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >> const*)+0x7f) [0xb1784f]
> >>
> >> 2: (OSD::load_pgs()+0x277b) [0x6850fb]
> >>
> >> 3: (OSD::init()+0x1448) [0x6930b8]
> >>
> >> 4: (main()+0x26b9) [0x62fd89]
> >>
> >> 5: (__libc_start_main()+0xed) [0x7f2345bc976d]
> >>
> >> 6: ceph-osd() [0x635679]
> >>
> >> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> >> to interpret this.
> >>
> >>
> >>
> >>
> >> --- logging levels ---
> >>
> >> 0/ 5 none
> >>
> >> 0/ 1 lockdep
> >>
> >> 0/ 1 context
> >>
> >> 1/ 1 crush
> >>
> >> 1/ 5 mds
> >>
> >> 1/ 5 mds_balancer
> >>
> >> 1/ 5 mds_locker
> >>
> >> 1/ 5 mds_log
> >>
> >> 1/ 5 mds_log_expire
> >>
> >> 1/ 5 mds_migrator
> >>
> >> 0/ 1 buffer
> >>
> >> 0/ 1 timer
> >>
> >> 0/ 1 filer
> >>
> >> 0/ 1 striper
> >>
> >> 0/ 1 objecter
> >>
> >> 0/ 5 rados
> >>
> >> 0/ 5 rbd
> >>
> >> 0/ 5 rbd_replay
> >>
> >> 0/ 5 journaler
> >>
> >> 0/ 5 objectcacher
> >>
> >> 0/ 5 client
> >>
> >> 0/ 5 osd
> >>
> >> 0/ 5 optracker
> >>
> >> 0/ 5 objclass
> >>
> >> 1/ 3 filestore
> >>
> >> 1/ 3 keyvaluestore
> >>
> >> 1/ 3 journal
> >>
> >> 0/ 5 ms
> >>
> >> 1/ 5 mon
> >>
> >> 0/10 monc
> >>
> >> 1/ 5 paxos
> >>
> >> 0/ 5 tp
> >>
> >> 1/ 5 auth
> >>
> >> 1/ 5 crypto
> >>
> >> 1/ 1 finisher
> >>
> >> 1/ 5 heartbeatmap
> >>
> >> 1/ 5 perfcounter
> >>
> >> 1/ 5 rgw
> >>
> >> 1/10 civetweb
> >>
> >> 1/ 5 javaclient
> >>
> >> 1/ 5 asok
> >>
> >> 1/ 1 throttle
> >>
> >> 0/ 0 refs
> >>
> >> 1/ 5 xio
> >>
> >> -2/-2 (syslog threshold)
> >>
> >> 99/99 (stderr threshold)
> >>
> >> max_recent 10000
> >>
> >> max_new 1000
> >>
> >> log_file
> >>
> >> --- end dump of recent events ---
> >>
> >> terminate called after throwing an instance of 'ceph::FailedAssertion'
> >>
> >> *** Caught signal (Aborted) **
> >>
> >> in thread 7f2347f71780
> >>
> >> ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
> >>
> >> 1: ceph-osd() [0xa1fe55]
> >>
> >> 2: (()+0xfcb0) [0x7f2346fb1cb0]
> >>
> >> 3: (gsignal()+0x35) [0x7f2345bde0d5]
> >>
> >> 4: (abort()+0x17b) [0x7f2345be183b]
> >>
> >> 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f234652f69d]
> >>
> >> 6: (()+0xb5846) [0x7f234652d846]
> >>
> >> 7: (()+0xb5873) [0x7f234652d873]
> >>
> >> 8: (()+0xb596e) [0x7f234652d96e]
> >>
> >> 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >> const*)+0x259) [0xb17a29]
> >>
> >> 10: (OSD::load_pgs()+0x277b) [0x6850fb]
> >>
> >> 11: (OSD::init()+0x1448) [0x6930b8]
> >>
> >> 12: (main()+0x26b9) [0x62fd89]
> >>
> >> 13: (__libc_start_main()+0xed) [0x7f2345bc976d]
> >>
> >> 14: ceph-osd() [0x635679]
> >>
> >> 2015-05-18 13:02:33.643064 7f2347f71780 -1 *** Caught signal (Aborted)
> **
> >>
> >> in thread 7f2347f71780
> >>
> >>
> >>
> >>
> >> ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
> >>
> >> 1: ceph-osd() [0xa1fe55]
> >>
> >> 2: (()+0xfcb0) [0x7f2346fb1cb0]
> >>
> >> 3: (gsignal()+0x35) [0x7f2345bde0d5]
> >>
> >> 4: (abort()+0x17b) [0x7f2345be183b]
> >>
> >> 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f234652f69d]
> >>
> >> 6: (()+0xb5846) [0x7f234652d846]
> >>
> >> 7: (()+0xb5873) [0x7f234652d873]
> >>
> >> 8: (()+0xb596e) [0x7f234652d96e]
> >>
> >> 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >> const*)+0x259) [0xb17a29]
> >>
> >> 10: (OSD::load_pgs()+0x277b) [0x6850fb]
> >>
> >> 11: (OSD::init()+0x1448) [0x6930b8]
> >>
> >> 12: (main()+0x26b9) [0x62fd89]
> >>
> >> 13: (__libc_start_main()+0xed) [0x7f2345bc976d]
> >>
> >> 14: ceph-osd() [0x635679]
> >>
> >> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> >> to interpret this.
> >>
> >>
> >>
> >>
> >> --- begin dump of recent events ---
> >>
> >> 0> 2015-05-18 13:02:33.643064 7f2347f71780 -1 *** Caught signal
> (Aborted)
> >> **
> >>
> >> in thread 7f2347f71780
> >>
> >>
> >>
> >>
> >> ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
> >>
> >> 1: ceph-osd() [0xa1fe55]
> >>
> >> 2: (()+0xfcb0) [0x7f2346fb1cb0]
> >>
> >> 3: (gsignal()+0x35) [0x7f2345bde0d5]
> >>
> >> 4: (abort()+0x17b) [0x7f2345be183b]
> >>
> >> 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f234652f69d]
> >>
> >> 6: (()+0xb5846) [0x7f234652d846]
> >>
> >> 7: (()+0xb5873) [0x7f234652d873]
> >>
> >> 8: (()+0xb596e) [0x7f234652d96e]
> >>
> >> 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >> const*)+0x259) [0xb17a29]
> >>
> >> 10: (OSD::load_pgs()+0x277b) [0x6850fb]
> >>
> >> 11: (OSD::init()+0x1448) [0x6930b8]
> >>
> >> 12: (main()+0x26b9) [0x62fd89]
> >>
> >> 13: (__libc_start_main()+0xed) [0x7f2345bc976d]
> >>
> >> 14: ceph-osd() [0x635679]
> >>
> >> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> >> to interpret this.
> >>
> >>
> >>
> >>
> >> --- logging levels ---
> >>
> >> 0/ 5 none
> >>
> >> 0/ 1 lockdep
> >>
> >> 0/ 1 context
> >>
> >> 1/ 1 crush
> >>
> >> 1/ 5 mds
> >>
> >> 1/ 5 mds_balancer
> >>
> >> 1/ 5 mds_locker
> >>
> >> 1/ 5 mds_log
> >>
> >> 1/ 5 mds_log_expire
> >>
> >> 1/ 5 mds_migrator
> >>
> >> 0/ 1 buffer
> >>
> >> 0/ 1 timer
> >>
> >> 0/ 1 filer
> >>
> >> 0/ 1 striper
> >>
> >> 0/ 1 objecter
> >>
> >> 0/ 5 rados
> >>
> >> 0/ 5 rbd
> >>
> >> 0/ 5 rbd_replay
> >>
> >> 0/ 5 journaler
> >>
> >> 0/ 5 objectcacher
> >>
> >> 0/ 5 client
> >>
> >> 0/ 5 osd
> >>
> >> 0/ 5 optracker
> >>
> >> 0/ 5 objclass
> >>
> >> 1/ 3 filestore
> >>
> >> 1/ 3 keyvaluestore
> >>
> >> 1/ 3 journal
> >>
> >> 0/ 5 ms
> >>
> >> 1/ 5 mon
> >>
> >> 0/10 monc
> >>
> >> 1/ 5 paxos
> >>
> >> 0/ 5 tp
> >>
> >> 1/ 5 auth
> >>
> >> 1/ 5 crypto
> >>
> >> 1/ 1 finisher
> >>
> >> 1/ 5 heartbeatmap
> >>
> >> 1/ 5 perfcounter
> >>
> >> 1/ 5 rgw
> >>
> >> 1/10 civetweb
> >>
> >> 1/ 5 javaclient
> >>
> >> 1/ 5 asok
> >>
> >> 1/ 1 throttle
> >>
> >> 0/ 0 refs
> >>
> >> 1/ 5 xio
> >>
> >> -2/-2 (syslog threshold)
> >>
> >> 99/99 (stderr threshold)
> >>
> >> max_recent 10000
> >>
> >> max_new 1000
> >>
> >> log_file
> >>
> >> --- end dump of recent events ---
> >>
> >>
> >> I've included a 'ceph osd dump' here:
> >> http://pastebin.com/RKbaY7nv
> >>
> >> ceph osd tree:
> >>
> >>
> >> ceph osd tree
> >>
> >> ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
> >>
> >> -1 24.14000 root default
> >>
> >> -3 0 rack unknownrack
> >>
> >> -2 0 host ceph-test
> >>
> >> -4 24.14000 host ceph01
> >>
> >> 0 1.50000 osd.0 down 0 1.00000
> >>
> >> 2 1.50000 osd.2 down 0 1.00000
> >>
> >> 3 1.50000 osd.3 down 1.00000 1.00000
> >>
> >> 5 2.00000 osd.5 up 1.00000 1.00000
> >>
> >> 6 2.00000 osd.6 up 1.00000 1.00000
> >>
> >> 7 2.00000 osd.7 up 1.00000 1.00000
> >>
> >> 8 2.00000 osd.8 up 1.00000 1.00000
> >>
> >> 9 2.00000 osd.9 up 1.00000 1.00000
> >>
> >> 10 2.00000 osd.10 up 1.00000 1.00000
> >>
> >> 4 4.00000 osd.4 up 1.00000 1.00000
> >>
> >> 1 3.64000 osd.1 up 1.00000 1.00000
> >>
> >>
> >>
> >>
> >> Note that osd.0 and osd.2 were down prior to the upgrade and the cluster
> >> was healthy (these are failed disks that have been out for some time,
> >> just not removed from CRUSH).
> >>
> >> I've also included a log with OSD debugging set to 20 here:
> >>
> >> https://dl.dropboxusercontent.com/u/1043493/osd.3.log.gz
> >>
> >>
> >> Looking through that file, it appears the last pg that it loads
> >> successfully is 2.3f6, and then it moves to 5.0:
> >>
> >> -3> 2015-05-18 12:25:24.292091 7f6f407f9780 10 osd.3 39533 load_pgs
> >> loaded pg[2.3f6( v 39533'289849 (37945'286848,39533'289849]
> local-les=39532
> >> n=99 ec=1 les/c 39532/39532 39531/39531/39523) [5,4,3] r=2 lpr=39533
> >> pi=34961-39530/34 crt=39533'289846 lcod 0'0 inactive NOTIFY]
> >> log((37945'286848,39533'289849], crt=39533'289846)
> >>
> >> -2> 2015-05-18 12:25:24.292100 7f6f407f9780 10 osd.3 39533 pgid 5.0 coll
> >> 5.0_head
> >>
> >> -1> 2015-05-18 12:25:24.570188 7f6f407f9780 20 osd.3 0 get_map 34144 -
> >> loading and decoding 0x411fd80
> >>
> >> 0> 2015-05-18 12:26:02.758914 7f6f407f9780 -1 osd/OSD.h: In function
> >> 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f6f407f9780 time
> >> 2015-05-18 12:25:24.620468
> >>
> >>
> >>
> >> osd/OSD.h: 716: FAILED assert(ret)
> >>
> >> [snip]
> >>
> >> However, I don't see 5.0 in a pg dump.
> >>
> >>
> >>
> >>
> >> Thanks in advance!
> >>
> >> Berant
> >>
> >>
> >>
> >> _______________________________________________
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>
> >
> >
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com