.... also, I should point out that if you've already upgraded to
Luminous, you can just use the new RBD caps profiles (i.e. mon
'profile rbd', osd 'profile rbd') [1]. The explicit blacklist caps
mentioned in the upgrade guide are only required because pre-Luminous
clusters didn't support the RBD caps profiles.

[1] 
http://docs.ceph.com/docs/master/rbd/rbd-openstack/#setup-ceph-client-authentication
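
For example, updating an existing OpenStack client to the profile-based
caps might look like the following (a sketch only: the client name and
pool names below come from the OpenStack guide and will likely differ
on your cluster):

    # client.cinder and the pool names are illustrative; use your own
    ceph auth caps client.cinder \
        mon 'profile rbd' \
        osd 'profile rbd pool=volumes, profile rbd pool=vms, profile rbd-read-only pool=images'

The rbd profile includes the blacklist permission, so no explicit
"osd blacklist" command cap is needed once you're on Luminous.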

On Thu, May 10, 2018 at 10:11 AM, Jason Dillaman <jdill...@redhat.com> wrote:
> It only bites you if you have a hard failure of a VM (i.e. the RBD
> image wasn't cleanly closed and the lock wasn't cleanly released). In
> that case, the next librbd client to attempt to acquire the lock will
> notice the dead lock owner and will attempt to blacklist it from the
> cluster to ensure it cannot write to the image.
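>
> A quick way to check whether that blacklisting actually happened
> (assuming a Luminous cluster) is to list the current blacklist
> entries:
>
>     ceph osd blacklist ls
>
> If the client's caps are insufficient, the blacklist attempt fails,
> which matches the I/O errors described below.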
>
> On Thu, May 10, 2018 at 10:08 AM, Jonathan Proulx <j...@csail.mit.edu> wrote:
>> On Thu, May 10, 2018 at 09:55:15AM -0700, Jason Dillaman wrote:
>> :My immediate guess is that your caps are incorrect for your OpenStack
>> :Ceph user. Please refer to step 6 from the Luminous upgrade guide to
>> :ensure your RBD users have permission to blacklist dead peers [1]
>> :
>> :[1] http://docs.ceph.com/docs/master/releases/luminous/#upgrade-from-jewel-or-kraken
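>> :
>> :For reference, the caps update from that step looks roughly like this
>> :(client.<id> and the existing OSD caps are placeholders to fill in
>> :from your own 'ceph auth list' output):
>> :
>> :    ceph auth caps client.<id> \
>> :        mon 'allow r, allow command "osd blacklist"' \
>> :        osd '<existing OSD user capabilities>'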
>>
>> Good spotting!  Thanks for the fast reply.  The next question is why
>> this took so long to bite us; we've been on Luminous for 6 months.
>> Not going to worry too much about that last question, though.
>>
>> Hopefully that was the problem (it definitely was a problem).
>>
>> Thanks,
>> -Jon
>>
>> :On Thu, May 10, 2018 at 9:49 AM, Jonathan Proulx <j...@csail.mit.edu> wrote:
>> :> Hi All,
>> :>
>> :> recently I saw a number of RBD-backed VMs in my OpenStack cloud fail
>> :> to reboot after a hypervisor crash, with errors similar to:
>> :>
>> :> [    5.279393] blk_update_request: I/O error, dev vda, sector 2048
>> :> [    5.281427] Buffer I/O error on dev vda1, logical block 0, lost async page write
>> :> [    5.284114] Buffer I/O error on dev vda1, logical block 1, lost async page write
>> :> [    5.286600] Buffer I/O error on dev vda1, logical block 2, lost async page write
>> :> [    5.289022] Buffer I/O error on dev vda1, logical block 3, lost async page write
>> :> [    5.291515] Buffer I/O error on dev vda1, logical block 4, lost async page write
>> :> [    5.338981] blk_update_request: I/O error, dev vda, sector 3088
>> :>
>> :> for many blocks and sectors. I was able to export the RBD images and
>> :> they seemed fine; also, 'rbd flatten' made them boot again with no
>> :> errors.
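>> :>
>> :> For the record, that was roughly the following (pool and image
>> :> names here are examples, not the real ones):
>> :>
>> :>     rbd export volumes/<image-id> /tmp/image-backup.raw
>> :>     rbd flatten volumes/<image-id>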
>> :>
>> :> I found this puzzling and concerning but given the crash and limited
>> :> time didn't really follow up.
>> :>
>> :> Today I intentionally rebooted a VM on a healthy hypervisor and had it
>> :> land in the same condition; now I'm really worried.
>> :>
>> :> running:
>> :> Ubuntu16.04
>> :> ceph version 12.2.2 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous (stable) (on hypervisor)
>> :> {
>> :>     "mon": {
>> :>         "ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) 
>> luminous (stable)": 3
>> :>     },
>> :>     "mgr": {
>> :>         "ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) 
>> luminous (stable)": 3
>> :>     },
>> :>     "osd": {
>> :>         "ceph version 12.2.2 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) 
>> luminous (stable)": 102,
>> :>         "ceph version 12.2.3 (2dab17a455c09584f2a85e6b10888337d1ec8949) 
>> luminous (stable)": 10,
>> :>         "ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) 
>> luminous (stable)": 62
>> :>     }
>> :> }
>> :> libvirt-bin    1.3.1-1ubuntu10.21
>> :> qemu-system    1:2.5+dfsg-5ubuntu10.24
>> :> OpenStack Mitaka
>> :>
>> :> Anyone seen anything like this, or have suggestions on where to look for more details?
>> :>
>> :> -Jon
>> :
>> :
>> :
>> :--
>> :Jason
>>
>> --
>
>
>
> --
> Jason



-- 
Jason