Re: [ceph-users] PG's incomplete after OSD failure

2014-11-12 Thread Chad Seys
Would love to hear if you discover a way to zap incomplete PGs!

Perhaps this is a common enough problem to be worth opening a tracker issue?

Chad.


Re: [ceph-users] PG's incomplete after OSD failure

2014-11-11 Thread Matthew Anderson
Thanks for your reply Sage!

I've tested with 8.6ae and no luck, I'm afraid. Steps taken were as follows
(the actual commands are sketched after the list) -
Stop osd.117
Export 8.6ae from osd.117
Remove 8.6ae from osd.117
Start osd.117
Restart osd.190 after the PG still showed incomplete
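
For reference, the export and remove were done roughly like this, following
Sage's example (data/journal paths assumed to be the defaults, and the
--file name is just an example):

# with osd.117 stopped
ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-117 \
--journal-path /var/lib/ceph/osd/ceph-117/journal \
--op export --pgid 8.6ae --file osd.117.8.6ae.export

ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-117 \
--journal-path /var/lib/ceph/osd/ceph-117/journal \
--op remove --pgid 8.6ae
# then start osd.117 again and restart osd.190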

After this the PG was still showing incomplete and ceph pg dump_stuck
inactive shows -
pg_stat objects mip degr misp unf bytes log disklog state state_stamp
v reported up up_primary acting acting_primary last_scrub scrub_stamp
last_deep_scrub deep_scrub_stamp
8.6ae 0 0 0 0 0 0 0 0 incomplete 2014-11-11 17:34:27.168078 0'0
161425:40 [117,190] 117 [117,190] 117 86424'389748 2013-09-09
16:52:58.796650 86424'389748 2013-09-09 16:52:58.796650

I then tried an export from osd.190 to osd.117 (commands sketched below the
list) by doing -
Stop osd.190 and osd.117
Export pg 8.6ae from osd.190
Import the file generated in the previous step into osd.117
Boot both osd.190 and osd.117
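
In terms of commands, the export/import was roughly the following (same
default paths assumed; the export file name is just an example):

# with osd.190 and osd.117 both stopped
ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-190 \
--journal-path /var/lib/ceph/osd/ceph-190/journal \
--op export --pgid 8.6ae --file 8.6ae.osd190.export

ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-117 \
--journal-path /var/lib/ceph/osd/ceph-117/journal \
--op import --file 8.6ae.osd190.export
# then boot both OSDs again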

When osd.117 attempts to start, it generates a failed assert; the full log
is here http://pastebin.com/S4CXrTAL
-1 2014-11-11 17:25:15.130509 7f9f44512900  0 osd.117 161404 load_pgs
 0 2014-11-11 17:25:18.604696 7f9f44512900 -1 osd/OSD.h: In
function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f9f44512900
time 2014-11-11 17:25:18.602626
osd/OSD.h: 715: FAILED assert(ret)

 ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x8b) [0xb8231b]
 2: (OSDService::get_map(unsigned int)+0x3f) [0x6eea2f]
 3: (OSD::load_pgs()+0x1b78) [0x6aae18]
 4: (OSD::init()+0x71f) [0x6abf5f]
 5: (main()+0x252c) [0x638cfc]
 6: (__libc_start_main()+0xf5) [0x7f9f41650ec5]
 7: /usr/bin/ceph-osd() [0x651027]

I also attempted the same steps with 8.ca and got the same results.
Below is the current state of the PG with it removed from osd.111:
pg_stat objects mip degr misp unf bytes log disklog state state_stamp
v reported up up_primary acting acting_primary last_scrub scrub_stamp
last_deep_scrub deep_scrub_stamp
8.ca 2440 0 0 0 0 10219748864 9205 9205 incomplete 2014-11-11
17:39:28.570675 160435'959618 161425:6071759 [190,111] 190 [190,111]
190 86417'207324 2013-09-09 12:58:10.749001 86229'196887 2013-09-02
12:57:58.162789

Any idea where I can go from here?
One thought I had was setting osd.111 and osd.117 out of the cluster; once
the data has moved I could shut them down and mark them as lost, which would
make osd.190 the only replica available for those PG's.
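
Something along these lines is what I had in mind (untested, just a sketch):

ceph osd out 111
ceph osd out 117
# wait for backfill to finish moving the data off them, then stop the daemons
ceph osd lost 111 --yes-i-really-mean-it
ceph osd lost 117 --yes-i-really-mean-it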

Thanks again

On Tue, Nov 11, 2014 at 1:10 PM, Sage Weil sw...@redhat.com wrote:
 On Tue, 11 Nov 2014, Matthew Anderson wrote:
 Just an update: it appears that no data actually exists for those PG's
 on osd.117 and osd.111, but they're showing as incomplete anyway.

 So for the 8.ca PG, osd.111 has only an empty directory but osd 190 is
 filled with data.
 For 8.6ae, osd.117 has no data in the pg directory and osd.190 is
 filled with data as before.

 Since all of the required data is on OSD.190, would there be a way to
 make osd.111 and osd.117 forget they have ever seen the two incomplete
 PG's and therefore restart backfilling?

 Ah, that's good news.  You should know that the copy on osd.190 is
 slightly out of date, but it is much better than losing the entire
 contents of the PG.  More specifically, for 8.6ae the latest version was
 1935986 but the copy on osd.190 is at 1935747, about 200 writes in the past.
 You'll need to fsck the RBD images after this is all done.

 I don't think we've tested this recovery scenario, but I think you'll be
 able to recover with ceph_objectstore_tool, which has an import/export
 function and a delete function.  First, try removing the newer version of
 the pg on osd.117, but export it first for good measure (even though it's
 empty):

 stop the osd

 ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-117  \
 --journal-path /var/lib/ceph/osd/ceph-117/journal \
 --op export --pgid 8.6ae --file osd.117.8.6ae

 ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-117  \
 --journal-path /var/lib/ceph/osd/ceph-117/journal \
 --op remove --pgid 8.6ae

 and restart.  If that doesn't peer, you can also try exporting the pg from
 osd.190 and importing it into osd.117.  I think just removing the
 newer empty pg on osd.117 will do the trick, though...

 sage





 On Tue, Nov 11, 2014 at 10:37 AM, Matthew Anderson
 manderson8...@gmail.com wrote:
  Hi All,
 
  We've had a string of very unfortunate failures and need a hand fixing
  the incomplete PG's that we're now left with. We're configured with 3
  replicas over different hosts with 5 in total.
 
  The timeline goes -
  -1 week  :: A full server goes offline with a failed backplane. Still
  not working
  -1 day  ::  OSD 190 fails
  -1 day + 3 minutes :: OSD 121, in a different server, fails, taking
  out several PG's and blocking IO
  Today  :: The first failed osd (osd.190) was cloned to a good drive
  with xfsdump | xfsrestore and now boots fine. The last failed osd
  (osd.121) is completely unrecoverable and was marked as lost.
 
  What we're left with now is 2 incomplete PG's that are preventing RBD
  images from booting.

Re: [ceph-users] PG's incomplete after OSD failure

2014-11-11 Thread Matthew Anderson
I've done a bit more work tonight and managed to get some more data
back. Osd.121, which was previously completely dead, has made it
through an XFS repair with a more fault tolerant HBA firmware and I
was able to export both of the placement groups required using
ceph_objectstore_tool. The osd would probably boot if I hadn't already
marked it as lost :(
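
The exports were along these lines (default paths assumed for osd.121, and
the file names are just examples):

ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-121 \
--journal-path /var/lib/ceph/osd/ceph-121/journal \
--op export --pgid 8.ca --file 8.ca.osd121.export

ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-121 \
--journal-path /var/lib/ceph/osd/ceph-121/journal \
--op export --pgid 8.6ae --file 8.6ae.osd121.export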

I've basically got it down to two options.

I can import the exported data from osd.121 into osd.190, which would
complete the PG, but this fails with a filestore feature mismatch
because the sharded objects feature is missing on the target osd:
Export has incompatible features set
compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo
object,3=object
locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,11=sharded
objects,12=transaction hints}
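
The import that triggers the error above was roughly (default paths assumed):

# with osd.190 stopped
ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-190 \
--journal-path /var/lib/ceph/osd/ceph-190/journal \
--op import --file 8.ca.osd121.export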

The second option would be to run a ceph pg force_create_pg on each of
the problem PG's to reset them back to empty and then import the data
using ceph_objectstore_tool import-rados. Unfortunately this has
failed as well: when I tested ceph pg force_create_pg on an incomplete
PG in another pool, the PG got set to creating but then went back to
incomplete after a few minutes.
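
For reference, the second option was tested roughly like this (the
import-rados invocation is my reading of the 0.87 tool and the pool name is
a placeholder):

ceph pg force_create_pg 8.6ae
# the PG sits in 'creating' briefly, then falls back to incomplete;
# if it had gone clean, the plan was then:
ceph_objectstore_tool import-rados <poolname> 8.6ae.osd121.export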

I've trawled the mailing list for solutions but have come up empty;
neither problem appears to have been resolved before.

On Tue, Nov 11, 2014 at 5:54 PM, Matthew Anderson
manderson8...@gmail.com wrote:
 Thanks for your reply Sage!

 I've tested with 8.6ae and no luck I'm afraid. Steps taken were -
 Stop osd.117
 Export 8.6ae from osd.117
 Remove 8.6ae from osd.117
 start osd.117
 restart osd.190 after still showing incomplete

 After this the PG was still showing incomplete and ceph pg dump_stuck
 inactive shows -
 pg_stat objects mip degr misp unf bytes log disklog state state_stamp
 v reported up up_primary acting acting_primary last_scrub scrub_stamp
 last_deep_scrub deep_scrub_stamp
 8.6ae 0 0 0 0 0 0 0 0 incomplete 2014-11-11 17:34:27.168078 0'0
 161425:40 [117,190] 117 [117,190] 117 86424'389748 2013-09-09
 16:52:58.796650 86424'389748 2013-09-09 16:52:58.796650

 I then tried an export from OSD 190 to OSD 117 by doing -
 Stop osd.190 and osd.117
 Export pg 8.6ae from osd.190
 Import from file generated in previous step into osd.117
 Boot both osd.190 and osd.117

 When osd.117 attempts to start, it generates a failed assert; the full log
 is here http://pastebin.com/S4CXrTAL
 -1 2014-11-11 17:25:15.130509 7f9f44512900  0 osd.117 161404 load_pgs
  0 2014-11-11 17:25:18.604696 7f9f44512900 -1 osd/OSD.h: In
 function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f9f44512900
 time 2014-11-11 17:25:18.602626
 osd/OSD.h: 715: FAILED assert(ret)

  ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
 const*)+0x8b) [0xb8231b]
  2: (OSDService::get_map(unsigned int)+0x3f) [0x6eea2f]
  3: (OSD::load_pgs()+0x1b78) [0x6aae18]
  4: (OSD::init()+0x71f) [0x6abf5f]
  5: (main()+0x252c) [0x638cfc]
  6: (__libc_start_main()+0xf5) [0x7f9f41650ec5]
  7: /usr/bin/ceph-osd() [0x651027]

 I also attempted the same steps with 8.ca and got the same results.
 The below is the current state of the pg with it removed from osd.111
 -
 pg_stat objects mip degr misp unf bytes log disklog state state_stamp
 v reported up up_primary acting acting_primary last_scrub scrub_stamp
 last_deep_scrub deep_scrub_stamp
 8.ca 2440 0 0 0 0 10219748864 9205 9205 incomplete 2014-11-11
 17:39:28.570675 160435'959618 161425:6071759 [190,111] 190 [190,111]
 190 86417'207324 2013-09-09 12:58:10.749001 86229'196887 2013-09-02
 12:57:58.162789

 Any idea of where I can go from here?
 One thought I had was setting osd.111 and osd.117 out of the cluster
 and once the data is moved I can shut them down and mark them as lost
 which would make osd.190 the only replica available for those PG's.

 Thanks again

 On Tue, Nov 11, 2014 at 1:10 PM, Sage Weil sw...@redhat.com wrote:
 On Tue, 11 Nov 2014, Matthew Anderson wrote:
 Just an update: it appears that no data actually exists for those PG's
 on osd.117 and osd.111, but they're showing as incomplete anyway.

 So for the 8.ca PG, osd.111 has only an empty directory but osd 190 is
 filled with data.
 For 8.6ae, osd.117 has no data in the pg directory and osd.190 is
 filled with data as before.

 Since all of the required data is on OSD.190, would there be a way to
 make osd.111 and osd.117 forget they have ever seen the two incomplete
 PG's and therefore restart backfilling?

 Ah, that's good news.  You should know that the copy on osd.190 is
 slightly out of date, but it is much better than losing the entire
 contents of the PG.  More specifically, for 8.6ae the latest version was
 1935986 but the copy on osd.190 is at 1935747, about 200 writes in the past.
 You'll need to fsck the RBD images after this is all done.

 I don't think we've tested this recovery scenario, but I think you'll be
 able to recover with ceph_objectstore_tool, which has an import/export
 function and a delete function.

[ceph-users] PG's incomplete after OSD failure

2014-11-10 Thread Matthew Anderson
Hi All,

We've had a string of very unfortunate failures and need a hand fixing
the incomplete PG's that we're now left with. We're configured with 3
replicas across different hosts, with 5 hosts in total.

The timeline goes -
-1 week  :: A full server goes offline with a failed backplane. Still
not working
-1 day  ::  OSD 190 fails
-1 day + 3 minutes :: OSD 121, in a different server, fails, taking
out several PG's and blocking IO
Today  :: The first failed osd (osd.190) was cloned to a good drive
with xfsdump | xfsrestore and now boots fine. The last failed osd
(osd.121) is completely unrecoverable and was marked as lost.

What we're left with now is 2 incomplete PG's that are preventing RBD
images from booting.

# ceph pg dump_stuck inactive
ok
pg_stat objects mip degr misp unf bytes log disklog state state_stamp
v reported up up_primary acting acting_primary last_scrub scrub_stamp
last_deep_scrub deep_scrub_stamp
8.ca 2440 0 0 0 0 10219748864 9205 9205 incomplete 2014-11-11
10:29:04.910512 160435'959618 161358:6071679 [190,111] 190 [190,111]
190 86417'207324 2013-09-09 12:58:10.749001 86229'196887 2013-09-02
12:57:58.162789
8.6ae 0 0 0 0 0 0 3176 3176 incomplete 2014-11-11 10:24:07.000373
160931'1935986 161358:267 [117,190] 117 [117,190] 117 86424'389748
2013-09-09 16:52:58.796650 86424'389748 2013-09-09 16:52:58.796650

We've tried doing a pg revert, but it reports 'no missing objects' and then
does nothing. I've also done the usual scrub, deep-scrub, and pg and osd
repairs... so far nothing has helped.
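
The commands tried were along these lines (and similarly for 8.ca):

ceph pg 8.6ae mark_unfound_lost revert
ceph pg scrub 8.6ae
ceph pg deep-scrub 8.6ae
ceph pg repair 8.6ae
ceph osd repair 190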

I think it could be a similar situation to this post [
http://www.spinics.net/lists/ceph-users/msg11461.html ], where one of
the osd's is holding a slightly newer but incomplete version of the PG
which needs to be removed. Is anyone able to shed some light on how I
might be able to use the objectstore tool to check whether this is the
case?
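
For what it's worth, I was imagining something along these lines to compare
what each OSD thinks it has (assuming --op info is the right operation, and
with the OSD stopped):

ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-117 \
--journal-path /var/lib/ceph/osd/ceph-117/journal \
--op info --pgid 8.6ae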

If anyone has any suggestions, it would be greatly appreciated.
Likewise, if you need any more information about my problem, just let me
know.

Thanks all
-Matt


Re: [ceph-users] PG's incomplete after OSD failure

2014-11-10 Thread Matthew Anderson
Just an update: it appears that no data actually exists for those PG's
on osd.117 and osd.111, but they're showing as incomplete anyway.

So for the 8.ca PG, osd.111 has only an empty directory but osd.190 is
filled with data.
For 8.6ae, osd.117 has no data in the pg directory and osd.190 is
filled with data as before.

Since all of the required data is on OSD.190, would there be a way to
make osd.111 and osd.117 forget they have ever seen the two incomplete
PG's and therefore restart backfilling?


On Tue, Nov 11, 2014 at 10:37 AM, Matthew Anderson
manderson8...@gmail.com wrote:
 Hi All,

 We've had a string of very unfortunate failures and need a hand fixing
 the incomplete PG's that we're now left with. We're configured with 3
 replicas over different hosts with 5 in total.

 The timeline goes -
 -1 week  :: A full server goes offline with a failed backplane. Still
 not working
 -1 day  ::  OSD 190 fails
 -1 day + 3 minutes :: OSD 121, in a different server, fails, taking
 out several PG's and blocking IO
 Today  :: The first failed osd (osd.190) was cloned to a good drive
 with xfsdump | xfsrestore and now boots fine. The last failed osd
 (osd.121) is completely unrecoverable and was marked as lost.

 What we're left with now is 2 incomplete PG's that are preventing RBD
 images from booting.

 # ceph pg dump_stuck inactive
 ok
 pg_stat objects mip degr misp unf bytes log disklog state state_stamp
 v reported up up_primary acting acting_primary last_scrub scrub_stamp
 last_deep_scrub deep_scrub_stamp
 8.ca 2440 0 0 0 0 10219748864 9205 9205 incomplete 2014-11-11
 10:29:04.910512 160435'959618 161358:6071679 [190,111] 190 [190,111]
 190 86417'207324 2013-09-09 12:58:10.749001 86229'196887 2013-09-02
 12:57:58.162789
 8.6ae 0 0 0 0 0 0 3176 3176 incomplete 2014-11-11 10:24:07.000373
 160931'1935986 161358:267 [117,190] 117 [117,190] 117 86424'389748
 2013-09-09 16:52:58.796650 86424'389748 2013-09-09 16:52:58.796650

 We've tried doing a pg revert, but it reports 'no missing objects' and
 then does nothing. I've also done the usual scrub, deep-scrub, and pg and
 osd repairs... so far nothing has helped.

 I think it could be a similar situation to this post [
 http://www.spinics.net/lists/ceph-users/msg11461.html ], where one of
 the osd's is holding a slightly newer but incomplete version of the PG
 which needs to be removed. Is anyone able to shed some light on how I
 might be able to use the objectstore tool to check whether this is the
 case?

 If anyone has any suggestions it would be greatly appreciated.
 Likewise if you need any more information about my problem just let me
 know

 Thanks all
 -Matt


Re: [ceph-users] PG's incomplete after OSD failure

2014-11-10 Thread Sage Weil
On Tue, 11 Nov 2014, Matthew Anderson wrote:
 Just an update: it appears that no data actually exists for those PG's
 on osd.117 and osd.111, but they're showing as incomplete anyway.
 
 So for the 8.ca PG, osd.111 has only an empty directory but osd 190 is
 filled with data.
 For 8.6ae, osd.117 has no data in the pg directory and osd.190 is
 filled with data as before.
 
 Since all of the required data is on OSD.190, would there be a way to
 make osd.111 and osd.117 forget they have ever seen the two incomplete
 PG's and therefore restart backfilling?

Ah, that's good news.  You should know that the copy on osd.190 is
slightly out of date, but it is much better than losing the entire
contents of the PG.  More specifically, for 8.6ae the latest version was
1935986 but the copy on osd.190 is at 1935747, about 200 writes in the past.
You'll need to fsck the RBD images after this is all done.
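
For the fsck step, something like the following should do once the clients
using the image are stopped (pool/image names are placeholders and the rbd
device number may differ):

rbd map <pool>/<image>
fsck -n /dev/rbd0     # read-only check first
rbd unmap /dev/rbd0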

I don't think we've tested this recovery scenario, but I think you'll be
able to recover with ceph_objectstore_tool, which has an import/export
function and a delete function.  First, try removing the newer version of
the pg on osd.117, but export it first for good measure (even though it's
empty):

stop the osd

ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-117  \
--journal-path /var/lib/ceph/osd/ceph-117/journal \
--op export --pgid 8.6ae --file osd.117.8.6ae

ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-117  \
--journal-path /var/lib/ceph/osd/ceph-117/journal \
--op remove --pgid 8.6ae

and restart.  If that doesn't peer, you can also try exporting the pg from 
osd.190 and importing it into osd.117.  I think just removing the 
newer empty pg on osd.117 will do the trick, though...

sage



 
 
 On Tue, Nov 11, 2014 at 10:37 AM, Matthew Anderson
 manderson8...@gmail.com wrote:
  Hi All,
 
  We've had a string of very unfortunate failures and need a hand fixing
  the incomplete PG's that we're now left with. We're configured with 3
  replicas over different hosts with 5 in total.
 
  The timeline goes -
  -1 week  :: A full server goes offline with a failed backplane. Still
  not working
  -1 day  ::  OSD 190 fails
  -1 day + 3 minutes :: OSD 121, in a different server, fails, taking
  out several PG's and blocking IO
  Today  :: The first failed osd (osd.190) was cloned to a good drive
  with xfsdump | xfsrestore and now boots fine. The last failed osd
  (osd.121) is completely unrecoverable and was marked as lost.
 
  What we're left with now is 2 incomplete PG's that are preventing RBD
  images from booting.
 
  # ceph pg dump_stuck inactive
  ok
  pg_stat objects mip degr misp unf bytes log disklog state state_stamp
  v reported up up_primary acting acting_primary last_scrub scrub_stamp
  last_deep_scrub deep_scrub_stamp
  8.ca 2440 0 0 0 0 10219748864 9205 9205 incomplete 2014-11-11
  10:29:04.910512 160435'959618 161358:6071679 [190,111] 190 [190,111]
  190 86417'207324 2013-09-09 12:58:10.749001 86229'196887 2013-09-02
  12:57:58.162789
  8.6ae 0 0 0 0 0 0 3176 3176 incomplete 2014-11-11 10:24:07.000373
  160931'1935986 161358:267 [117,190] 117 [117,190] 117 86424'389748
  2013-09-09 16:52:58.796650 86424'389748 2013-09-09 16:52:58.796650
 
  We've tried doing a pg revert, but it reports 'no missing objects' and
  then does nothing. I've also done the usual scrub, deep-scrub, and pg and
  osd repairs... so far nothing has helped.
 
  I think it could be a similar situation to this post [
  http://www.spinics.net/lists/ceph-users/msg11461.html ], where one of
  the osd's is holding a slightly newer but incomplete version of the PG
  which needs to be removed. Is anyone able to shed some light on how I
  might be able to use the objectstore tool to check whether this is the
  case?
 
  If anyone has any suggestions it would be greatly appreciated.
  Likewise if you need any more information about my problem just let me
  know
 
  Thanks all
  -Matt
 
 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com