We recently had a few inconsistent PGs crop up on one of our clusters, and I wanted to describe the process used to repair them for review and perhaps to help someone in the future.
Our state roughly matched the one David described in his comment here: http://tracker.ceph.com/issues/21388#note-1

However, we were missing the object entirely on the primary OSD. This may have been due to previous manual repair attempts, but the exact cause of the missing object is unclear.

To get the PG into a state consistent with David's comment, I exported the perceived "good" copy of the PG using ceph-objectstore-tool and imported it to the primary OSD. At this point, a repair would consistently leave "rados list-inconsistent-obj" with an empty listing (with the PG still flagged inconsistent), and a deep-scrub would make the "list-inconsistent-obj" output appear as David described.

However, "rados get" on the object resulted in I/O errors. I again used ceph-objectstore-tool, this time with the "get-bytes" option, to dump the object contents to a file, and then wrote that file back with "rados put".

It seemed to work, and the customer's VM hasn't noticed anything awry yet... but then again it hadn't prior to this either. The right data appears to be in place, and the PG is consistent after a deep-scrub.

Pretty standard stuff, but it might help as an alternative way of dumping byte data in the future, as long as others don't see an issue with this. I see at least one other person with the same I/O error on the bug.

--
Brian Andrus | Cloud Systems Engineer | DreamHost
brian.and...@dreamhost.com | www.dreamhost.com
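For anyone retracing this, the sequence described above might look roughly like the sketch below. It is not a record of the exact commands that were run. The PG id, OSD ids, pool, object name, and paths are all placeholders, the OSD data path assumes the default /var/lib/ceph/osd layout, and taking "get-bytes" from the OSD holding the good copy (during the same downtime as the export) is an assumption made to keep the sketch short. By default the script only prints the commands; set DRY_RUN=0 to actually execute them.

```shell
#!/bin/sh
# Sketch only: every id, name, and path below is a placeholder (assumption).
DRY_RUN="${DRY_RUN:-1}"            # 1 = print commands only, 0 = execute them
PGID="${PGID:-1.2f}"               # hypothetical inconsistent PG
GOOD_OSD="${GOOD_OSD:-12}"         # hypothetical OSD with the "good" copy
PRIMARY_OSD="${PRIMARY_OSD:-7}"    # hypothetical primary OSD missing the object
POOL="${POOL:-rbd}"                # hypothetical pool name
OBJECT="${OBJECT:-rbd_data.example.0000000000000000}"  # hypothetical object
WORKDIR="${WORKDIR:-/tmp/pg-repair}"

# Print each command; run it only when DRY_RUN=0, aborting on the first failure.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@" || { echo "command failed: $*" >&2; exit 1; }
    fi
}

# Assumes the default OSD data directory layout.
osd_path() { echo "/var/lib/ceph/osd/ceph-$1"; }

repair_sequence() {
    run mkdir -p "$WORKDIR"
    run ceph osd set noout

    # ceph-objectstore-tool needs the OSD stopped while it reads the store.
    run systemctl stop "ceph-osd@$GOOD_OSD"
    run ceph-objectstore-tool --data-path "$(osd_path "$GOOD_OSD")" \
        --pgid "$PGID" --op export --file "$WORKDIR/$PGID.export"
    run ceph-objectstore-tool --data-path "$(osd_path "$GOOD_OSD")" \
        --pgid "$PGID" "$OBJECT" get-bytes "$WORKDIR/object.bin"
    run systemctl start "ceph-osd@$GOOD_OSD"

    # Import on the primary. If the PG already exists there, the import is
    # expected to be refused, so the old copy may need to be backed up and
    # removed first (not shown here).
    run systemctl stop "ceph-osd@$PRIMARY_OSD"
    run ceph-objectstore-tool --data-path "$(osd_path "$PRIMARY_OSD")" \
        --op import --file "$WORKDIR/$PGID.export"
    run systemctl start "ceph-osd@$PRIMARY_OSD"

    # Scrubs are asynchronous; wait for them to finish before trusting the listing.
    run ceph pg deep-scrub "$PGID"
    run rados list-inconsistent-obj "$PGID" --format=json-pretty

    # Write the dumped bytes back through RADOS, then re-check.
    run rados -p "$POOL" put "$OBJECT" "$WORKDIR/object.bin"
    run ceph pg deep-scrub "$PGID"
    run rados list-inconsistent-obj "$PGID" --format=json-pretty

    run ceph osd unset noout
}

repair_sequence
```

Running it as-is just prints the command list for review. On FileStore OSDs, ceph-objectstore-tool may also need --journal-path, which is left out here.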
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com