Hi community, 10 months ago we discovered an issue after removing a cache tier
from a healthy cluster, and started an email thread; as a result, a new bug was
created on the tracker by Samuel Just:
http://tracker.ceph.com/issues/12738

Since then, I have been waiting for a good moment to upgrade (after the fix was
backported to 0.94.7), and yesterday I upgraded my production cluster.

Of the 28 scrub errors, only 5 remain, so I need to fix them with the
ceph-objectstore-tool remove-clone-metadata subcommand.

I tried to do this, but without any real results... Can you please advise me
on what I'm doing wrong?

My workflow was the following:

1. Identify the problem PGs: ceph health detail | grep inco | grep -v HEALTH | cut -d " " -f 2
2. Start a repair on each of them, to collect information about the errors in
the logs: ceph pg repair <pg_id> (a small wrapper for both steps is sketched below)
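
For completeness, this is roughly how the two steps can be combined; it is
only a sketch built from the two commands above, nothing more:

# Sketch: repair every PG currently reported as inconsistent,
# using the same "ceph health detail" parsing as the one-liner above.
for pg in $(ceph health detail | grep inco | grep -v HEALTH | cut -d " " -f 2); do
    echo "starting repair for ${pg}"
    ceph pg repair "${pg}"
done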

After this, for example, I received the following records in the logs:

2016-07-20 00:32:10.650061 osd.56 10.12.2.5:6800/1985741 25 : cluster [INF]
2.c4 repair starts

2016-07-20 00:33:06.405136 osd.56 10.12.2.5:6800/1985741 26 : cluster [ERR]
repair 2.c4 2/22ca30c4/rbd_data.e846e25a70bf7.0000000000000307/snapdir
expected clone 2/22ca30c4/rbd_data.e846e25a70bf7.0000000000000307/14d

2016-07-20 00:33:06.405323 osd.56 10.12.2.5:6800/1985741 27 : cluster [ERR]
repair 2.c4 2/22ca30c4/rbd_data.e846e25a70bf7.0000000000000307/snapdir
expected clone 2/22ca30c4/rbd_data.e846e25a70bf7.0000000000000307/138

2016-07-20 00:33:06.405385 osd.56 10.12.2.5:6800/1985741 28 : cluster [INF]
repair 2.c4 2/22ca30c4/rbd_data.e846e25a70bf7.0000000000000307/snapdir 1
missing clone(s)

2016-07-20 00:40:42.457657 osd.56 10.12.2.5:6800/1985741 29 : cluster [ERR]
2.c4 repair 2 errors, 0 fixed

So I tried to fix it with the following commands:

stop ceph-osd id=56
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-56/ --journal-path
/var/lib/ceph/osd/ceph-56/journal rbd_data.e846e25a70bf7.0000000000000307
remove-clone-metadata 138
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-56/ --journal-path
/var/lib/ceph/osd/ceph-56/journal rbd_data.e846e25a70bf7.0000000000000307
remove-clone-metadata 14d
start ceph-osd id=56
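
Would the right next step be to first confirm that the tool actually sees this
object on the OSD? My assumption (please correct me if the object name is
passed differently) is something like:

# Assumption on my side: "--op list <object>" should print the
# head/snapdir entry for this object if it exists on this OSD.
stop ceph-osd id=56
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-56/ \
    --journal-path /var/lib/ceph/osd/ceph-56/journal \
    --op list rbd_data.e846e25a70bf7.0000000000000307
start ceph-osd id=56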

The strange thing is that after I ran these commands, I did not receive the
message I expected (according to the sources...):

cout << "Removal of clone " << cloneid << " complete" << std::endl;
cout << "Use pg repair after OSD restarted to correct stat information" <<
std::endl;

Instead, the commands were completely silent (no output at all), and each one
took about 30-35 minutes to execute...

Of course, I started a pg repair again after these actions, but the result is
the same: the errors still exist...

So possibly I misunderstand the input format for ceph-objectstore-tool...
Please help me with this... :)

Thank you in advance!