Hi community,

10 months ago we discovered an issue after removing a cache tier from a healthy cluster, and started an email thread; as a result, a new bug was created on the tracker by Samuel Just: http://tracker.ceph.com/issues/12738
Since then I have been looking for a good moment to upgrade (after the fix was backported to 0.94.7), and yesterday I did the upgrade on my production cluster. Out of 28 scrub errors, only 5 remain, so I need to fix them with the ceph-objectstore-tool remove-clone-metadata subcommand. I tried to do it, but without real results... Can you please give me advice on what I'm doing wrong?

My flow was the following:

1. Identify the problem PGs:

ceph health detail | grep inco | grep -v HEALTH | cut -d " " -f 2

2. Start a repair for them, to collect info about the errors into the logs:

ceph pg repair <pg_id>

After this, for example, I received the following records in the logs:

2016-07-20 00:32:10.650061 osd.56 10.12.2.5:6800/1985741 25 : cluster [INF] 2.c4 repair starts
2016-07-20 00:33:06.405136 osd.56 10.12.2.5:6800/1985741 26 : cluster [ERR] repair 2.c4 2/22ca30c4/rbd_data.e846e25a70bf7.0000000000000307/snapdir expected clone 2/22ca30c4/rbd_data.e846e25a70bf7.0000000000000307/14d
2016-07-20 00:33:06.405323 osd.56 10.12.2.5:6800/1985741 27 : cluster [ERR] repair 2.c4 2/22ca30c4/rbd_data.e846e25a70bf7.0000000000000307/snapdir expected clone 2/22ca30c4/rbd_data.e846e25a70bf7.0000000000000307/138
2016-07-20 00:33:06.405385 osd.56 10.12.2.5:6800/1985741 28 : cluster [INF] repair 2.c4 2/22ca30c4/rbd_data.e846e25a70bf7.0000000000000307/snapdir 1 missing clone(s)
2016-07-20 00:40:42.457657 osd.56 10.12.2.5:6800/1985741 29 : cluster [ERR] 2.c4 repair 2 errors, 0 fixed

So I tried to fix it with the following commands:

stop ceph-osd id=56
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-56/ --journal-path /var/lib/ceph/osd/ceph-56/journal rbd_data.e846e25a70bf7.0000000000000307 remove-clone-metadata 138
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-56/ --journal-path /var/lib/ceph/osd/ceph-56/journal rbd_data.e846e25a70bf7.0000000000000307 remove-clone-metadata 14d
start ceph-osd id=56

The strange fact is that after I ran these commands I did not receive the messages that, according to the sources, should be printed:

cout << "Removal of clone " << cloneid << " complete" << std::endl;
cout << "Use pg repair after OSD restarted to correct stat information" << std::endl;

Instead the commands were silent: no output at all, and each command took about 30-35 minutes to execute...

Of course, I started pg repair again after these actions, but the result is the same - the errors still exist... So possibly I misunderstand the input format for ceph-objectstore-tool... Please help with this... :)

Thank you in advance!
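P.S. For what it's worth, below is a sketch of the invocation I now suspect may be needed. I am making two assumptions here and I am not sure about either of them: that the object should be addressed by the full JSON spec printed by "--op list" (restricted to the PG with --pgid) rather than by its bare name, and that the clone ids in the repair log (138, 14d) are hexadecimal, so they would have to be passed as 312 and 333 in decimal.

stop ceph-osd id=56

First, check that the tool can find the object at all and print its JSON spec:

ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-56/ --journal-path /var/lib/ceph/osd/ceph-56/journal --pgid 2.c4 --op list rbd_data.e846e25a70bf7.0000000000000307

Then feed the printed JSON spec back in as the object argument, for example:

ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-56/ --journal-path /var/lib/ceph/osd/ceph-56/journal '<JSON spec from --op list>' remove-clone-metadata 312

start ceph-osd id=56

Does that look closer to the intended usage?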