Re: [ceph-users] PG repair failing when object missing

2013-10-25 Thread Harry Harrington
Thanks Greg


Re: [ceph-users] PG repair failing when object missing

2013-10-24 Thread Inktank
I also created a ticket to try and handle this particular instance of bad 
behavior:
http://tracker.ceph.com/issues/6629
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


Re: [ceph-users] PG repair failing when object missing

2013-10-24 Thread Greg Farnum
I was also able to reproduce this, guys, but I believe it’s specific to the
mode of testing rather than to anything being wrong with the OSD. In
particular, after restarting the OSD whose file I removed and then running
repair, the repair completed successfully.
The OSD has an “fd cacher” which caches open file handles, and we believe this 
is what causes the observed behavior: if the removed object is among the most 
recent objects touched, the FileStore (an OSD subsystem) has an open fd
cached, so when manually deleting the file the FileStore now has a deleted file 
open. When the repair happens, it finds that open file descriptor and applies 
the repair to it — which of course doesn’t help put it back into place!
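
(A minimal sketch of that workaround, assuming the missing copy was removed from osd.2 and the affected PG is 0.b as in Harry's output below, and a sysvinit-style ceph init script; adjust the restart command to your init system.)

# service ceph restart osd.2   # restart so the FileStore drops its cached fd for the deleted file
# ceph pg repair 0.b           # repair can now copy the object back from a healthy replica
# ceph pg scrub 0.b            # a follow-up scrub should report no errors
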
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


Re: [ceph-users] PG repair failing when object missing

2013-10-24 Thread Matt Thompson
To add -- I thought I was running 0.67.4 on my test cluster (fc 19), but I
appear to be running 0.69.  Not sure how that happened as my yum config is
still pointing to dumpling.  :)
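
(A quick way to cross-check which version and repository are actually in play on an RPM-based install, for example:)

# rpm -q ceph                            # version of the installed package
# ceph --version                         # version the running binaries report
# yum repolist enabled | grep -i ceph    # which Ceph repository yum is pulling from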




Re: [ceph-users] PG repair failing when object missing

2013-10-24 Thread Matt Thompson
Hi Harry,

I was able to replicate this.

What does appear to work (for me) is to do an osd scrub followed by a pg
repair.  I've tried this 2x now and in each case the deleted file gets
copied over to the OSD from where it was removed.  However, I've tried a
few pg scrub / pg repairs after manually deleting a file and have yet to
see the file get copied back to the OSD on which it was deleted.  Like you
said, the pg repair sets the health of the PG back to active+clean, but
then re-running the pg scrub detects the file as missing again and sets it
back to active+clean+inconsistent.
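
(For reference, a sketch of the sequence that worked here, assuming osd.2 is the OSD the file was deleted from and 0.b is the inconsistent PG, as in Harry's output below:)

# ceph osd scrub 2     # scrub the whole OSD holding the missing copy (osd.2)
# ceph pg repair 0.b   # then repair the PG; the deleted file gets copied back
# ceph pg scrub 0.b    # re-scrub to confirm the PG stays active+clean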

Regards,
Matt




[ceph-users] PG repair failing when object missing

2013-10-23 Thread Harry Harrington
Hi,

I've been taking a look at the repair functionality in Ceph. As I understand it,
the OSDs should try to copy an object from another member of the PG if it is
missing. I have been attempting to test this by manually removing a file from
one of the OSDs; however, each time the repair completes the file has not been
restored. If I run another scrub on the PG it gets flagged as inconsistent. See
below for the output from my testing. I assume I'm missing something obvious;
any insight into this process would be greatly appreciated.

Thanks,
Harry

# ceph --version
ceph version 0.67.4 (ad85b8bfafea6232d64cb7ba76a8b6e8252fa0c7)
# ceph status
  cluster a4e417fe-0386-46a5-4475-ca7e10294273
   health HEALTH_OK
   monmap e1: 1 mons at {ceph1=1.2.3.4:6789/0}, election epoch 2, quorum 0 ceph1
   osdmap e13: 3 osds: 3 up, 3 in
    pgmap v232: 192 pgs: 192 active+clean; 44 bytes data, 15465 MB used, 164 GB 
/ 179 GB avail
   mdsmap e1: 0/0/1 up

file removed from osd.2
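
(For reference, a sketch of how the object file was removed; the directory and filename are assumptions based on the default FileStore layout under /var/lib/ceph and on the object name reported in the scrub log, so the exact path on your system may differ.)

# ls /var/lib/ceph/osd/ceph-2/current/0.b_head/                             # locate the file backing testfile1
# rm /var/lib/ceph/osd/ceph-2/current/0.b_head/testfile1__head_3A643FCB__0  # remove the replica on osd.2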

# ceph pg scrub 0.b
instructing pg 0.b on osd.1 to scrub

# ceph status
  cluster a4e417fe-0386-46a5-4475-ca7e10294273
   health HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
   monmap e1: 1 mons at {ceph1=1.2.3.4:6789/0}, election epoch 2, quorum 0 ceph1
   osdmap e13: 3 osds: 3 up, 3 in
    pgmap v233: 192 pgs: 191 active+clean, 1 active+clean+inconsistent; 44 
bytes data, 15465 MB used, 164 GB / 179 GB avail
   mdsmap e1: 0/0/1 up
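
(If the affected PG id isn't already known, ceph health detail lists the inconsistent PGs behind the HEALTH_ERR; here that would be pg 0.b.)

# ceph health detail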

# ceph pg repair 0.b
instructing pg 0.b on osd.1 to repair

# ceph status
  cluster a4e417fe-0386-46a5-4475-ca7e10294273
   health HEALTH_OK
   monmap e1: 1 mons at {ceph1=1.2.3.4:6789/0}, election epoch 2, quorum 0 ceph1
   osdmap e13: 3 osds: 3 up, 3 in
    pgmap v234: 192 pgs: 192 active+clean; 44 bytes data, 15465 MB used, 164 GB 
/ 179 GB avail
   mdsmap e1: 0/0/1 up

# ceph pg scrub 0.b
instructing pg 0.b on osd.1 to scrub

# ceph status
  cluster a4e417fe-0386-46a5-4475-ca7e10294273
   health HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
   monmap e1: 1 mons at {ceph1=1.2.3.4:6789/0}, election epoch 2, quorum 0 ceph1
   osdmap e13: 3 osds: 3 up, 3 in
    pgmap v236: 192 pgs: 191 active+clean, 1 active+clean+inconsistent; 44 
bytes data, 15465 MB used, 164 GB / 179 GB avail
   mdsmap e1: 0/0/1 up



The logs from osd.1:
2013-10-23 14:12:31.188281 7f02a5161700  0 log [ERR] : 0.b osd.2 missing 
3a643fcb/testfile1/head//0
2013-10-23 14:12:31.188312 7f02a5161700  0 log [ERR] : 0.b scrub 1 missing, 0 
inconsistent objects
2013-10-23 14:12:31.188319 7f02a5161700  0 log [ERR] : 0.b scrub 1 errors
2013-10-23 14:13:03.197802 7f02a5161700  0 log [ERR] : 0.b osd.2 missing 
3a643fcb/testfile1/head//0
2013-10-23 14:13:03.197837 7f02a5161700  0 log [ERR] : 0.b repair 1 missing, 0 
inconsistent objects
2013-10-23 14:13:03.197850 7f02a5161700  0 log [ERR] : 0.b repair 1 errors, 1 
fixed
2013-10-23 14:14:47.232953 7f02a5161700  0 log [ERR] : 0.b osd.2 missing 
3a643fcb/testfile1/head//0
2013-10-23 14:14:47.232985 7f02a5161700  0 log [ERR] : 0.b scrub 1 missing, 0 
inconsistent objects
2013-10-23 14:14:47.232991 7f02a5161700  0 log [ERR] : 0.b scrub 1 errors