During a reorganization of the Ceph cluster, which included an updated
CRUSH map and a move to btrfs, some PGs became stuck incomplete+remapped.
Before that was resolved, a restart of osd.1 failed while the OSD was
creating a btrfs snapshot.  Running 'ceph-osd -i 1 --flush-journal' fails
with the same error; the log is pasted below.
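
For reference, this is roughly the failing sequence (the restart
invocation below is just how I start OSDs here; the exact init syntax
may differ per distro):

    # restarting osd.1 -- dies while creating a btrfs snapshot
    service ceph restart osd.1
    # flushing the journal by hand fails the same way (log below)
    ceph-osd -i 1 --flush-journal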

This is a Bad Thing, because two PGs are now stuck down+peering.  A
'ceph pg 2.74 query' shows they had been stuck on osd.1 before the
btrfs problem, despite what the 'last acting' field shows in the
'ceph health detail' output below.
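
In case it helps, these are the commands behind the output pasted
below (run from a node holding the admin keyring):

    ceph health detail       # source of the pg listing below
    ceph pg 2.74 query       # peering history for one stuck pg
    ceph pg 3.73 query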

Is there any way to recover from this?  Judging from Google searches
of the list archives, nobody has run into this problem before, so I'm
quite worried that this spells restore-from-backup exercises for the
next few days.

Related question:  Are outright OSD crashes the reason btrfs is
discouraged for production use?

Thanks-

        John



pg 2.74 is stuck inactive since forever, current state down+peering, last acting [3,7,0,6]
pg 3.73 is stuck inactive since forever, current state down+peering, last acting [3,7,0,6]
pg 2.74 is stuck unclean since forever, current state down+peering, last acting [3,7,0,6]
pg 3.73 is stuck unclean since forever, current state down+peering, last acting [3,7,0,6]
pg 2.74 is down+peering, acting [3,7,0,6]
pg 3.73 is down+peering, acting [3,7,0,6]


2014-08-26 22:36:12.641585 7f5b38e507a0  0 ceph version 0.67.10 (9d446bd416c52cd785ccf048ca67737ceafcdd7f), process ceph-osd, pid 10281
2014-08-26 22:36:12.717100 7f5b38e507a0  0 filestore(/ceph/osd.1) mount FIEMAP ioctl is supported and appears to work
2014-08-26 22:36:12.717121 7f5b38e507a0  0 filestore(/ceph/osd.1) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
2014-08-26 22:36:12.717434 7f5b38e507a0  0 filestore(/ceph/osd.1) mount detected btrfs
2014-08-26 22:36:12.717471 7f5b38e507a0  0 filestore(/ceph/osd.1) mount btrfs CLONE_RANGE ioctl is supported
2014-08-26 22:36:12.765009 7f5b38e507a0  0 filestore(/ceph/osd.1) mount btrfs SNAP_CREATE is supported
2014-08-26 22:36:12.765335 7f5b38e507a0  0 filestore(/ceph/osd.1) mount btrfs SNAP_DESTROY is supported
2014-08-26 22:36:12.765541 7f5b38e507a0  0 filestore(/ceph/osd.1) mount btrfs START_SYNC is supported (transid 3118)
2014-08-26 22:36:12.789600 7f5b38e507a0  0 filestore(/ceph/osd.1) mount btrfs WAIT_SYNC is supported
2014-08-26 22:36:12.808287 7f5b38e507a0  0 filestore(/ceph/osd.1) mount btrfs SNAP_CREATE_V2 is supported
2014-08-26 22:36:12.834144 7f5b38e507a0  0 filestore(/ceph/osd.1) mount syscall(SYS_syncfs, fd) fully supported
2014-08-26 22:36:12.834377 7f5b38e507a0  0 filestore(/ceph/osd.1) mount found snaps <6009082,6009083>
2014-08-26 22:36:12.834427 7f5b38e507a0 -1 filestore(/ceph/osd.1) FileStore::mount: error removing old current subvol: (22) Invalid argument
2014-08-26 22:36:12.861045 7f5b38e507a0 -1 filestore(/ceph/osd.1) mount initial op seq is 0; something is wrong
2014-08-26 22:36:12.861428 7f5b38e507a0 -1  ** ERROR: error converting store /ceph/osd.1: (22) Invalid argument
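
For completeness, the snapshots the OSD trips over can be listed
directly; assuming /ceph/osd.1 is the btrfs mount point:

    # expect the snap_* subvolumes matching the 'found snaps
    # <6009082,6009083>' line above, plus the current subvolume
    btrfs subvolume list /ceph/osd.1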