> That doesn't appear to be an error -- that's just stating that it found a 
> dead client that was holding the exclusive-lock, so it broke the dead 
> client's lock on the image (by blacklisting the client).

As there is only one RBD client in this test, does it mean the RBD client 
process keeps failing?
On a freshly booted RBD client, doing some basic operations also gives the warning:

---------------- cut here ----------------
# rbd -n client.acapp1 map 4copy/foo
/dev/rbd0
# mount /dev/rbd0 /4copy
# cd /4copy; ls


# tail /var/log/messages
Jan 28 14:23:39 acapp1 kernel: Key type ceph registered
Jan 28 14:23:39 acapp1 kernel: libceph: loaded (mon/osd proto 15/24)
Jan 28 14:23:39 acapp1 kernel: rbd: loaded (major 252)
Jan 28 14:23:39 acapp1 kernel: libceph: mon2 192.168.1.156:6789 session established
Jan 28 14:23:39 acapp1 kernel: libceph: client80624 fsid cc795498-5d16-4b84-9584-1788d0458be9
Jan 28 14:23:39 acapp1 kernel: rbd: rbd0: capacity 10737418240 features 0x5
Jan 28 14:23:44 acapp1 kernel: XFS (rbd0): Mounting V5 Filesystem
Jan 28 14:23:44 acapp1 kernel: rbd: rbd0: client80621 seems dead, breaking lock        <--
Jan 28 14:23:45 acapp1 kernel: XFS (rbd0): Starting recovery (logdev: internal)
Jan 28 14:23:45 acapp1 kernel: XFS (rbd0): Ending recovery (logdev: internal)

---------------- cut here ----------------

Is this normal?
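
For reference, the lock and watcher state can be checked from the client side with 
something like this (a sketch; same image and user as in the example above):

---------------- cut here ----------------
# rbd -n client.acapp1 status 4copy/foo     # list current watchers of the image
# rbd -n client.acapp1 lock ls 4copy/foo    # show the exclusive-lock holder, if any
---------------- cut here ----------------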



Besides, I repeated the testing:
* Map and mount the RBD device; read/write OK.
* Unmount all RBD devices, then reboot: no problem.
* Reboot hangs if the RBD devices are not unmounted before rebooting:

---------------- cut here ----------------
Jan 28 14:13:12 acapp1 kernel: rbd: rbd0: client80531 seems dead, breaking lock
Jan 28 14:13:13 acapp1 kernel: XFS (rbd0): Ending clean mount                   <-- Reboot hangs here
Jan 28 14:14:06 acapp1 systemd: Stopping Session 1 of user root.                <-- pressing power reset
Jan 28 14:14:06 acapp1 systemd: Stopped target Multi-User System.
---------------- cut here ----------------

Is it necessary to unmount all RBD devices before rebooting the client host?
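
For reference, the cleanup done in the case that reboots cleanly is simply this 
(a sketch; mount point and device taken from the example above):

---------------- cut here ----------------
# umount /4copy          # release the XFS filesystem
# rbd unmap /dev/rbd0    # unmap the block device so the client releases its lock/watch
---------------- cut here ----------------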

Thanks a lot.
/st

-----Original Message-----
From: Jason Dillaman <jdill...@redhat.com> 
Sent: Friday, January 25, 2019 10:04 PM
To: ST Wong (ITSC) <s...@itsc.cuhk.edu.hk>
Cc: dilla...@redhat.com; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] RBD client hangs

That doesn't appear to be an error -- that's just stating that it found a dead 
client that was holding the exclusive-lock, so it broke the dead client's lock 
on the image (by blacklisting the client).

On Fri, Jan 25, 2019 at 5:09 AM ST Wong (ITSC) <s...@itsc.cuhk.edu.hk> wrote:
>
> Oops, while I can map and mount the filesystem, I still see the error below, 
> and rebooting the client machine freezes, requiring a power reset.
>
> Jan 25 17:57:30 acapp1 kernel: XFS (rbd0): Mounting V5 Filesystem
> Jan 25 17:57:30 acapp1 kernel: rbd: rbd0: client74700 seems dead, breaking lock   <--
> Jan 25 17:57:30 acapp1 kernel: XFS (rbd0): Starting recovery (logdev: internal)
> Jan 25 17:57:30 acapp1 kernel: XFS (rbd0): Ending recovery (logdev: internal)
> Jan 25 17:58:07 acapp1 kernel: rbd: rbd1: capacity 10737418240 features 0x5
> Jan 25 17:58:14 acapp1 kernel: XFS (rbd1): Mounting V5 Filesystem
> Jan 25 17:58:14 acapp1 kernel: rbd: rbd1: client74700 seems dead, breaking lock   <--
> Jan 25 17:58:15 acapp1 kernel: XFS (rbd1): Starting recovery (logdev: internal)
> Jan 25 17:58:15 acapp1 kernel: XFS (rbd1): Ending recovery (logdev: internal)
>
> Would you help ?   Thanks.
> /st
>
> -----Original Message-----
> From: ceph-users <ceph-users-boun...@lists.ceph.com> On Behalf Of ST 
> Wong (ITSC)
> Sent: Friday, January 25, 2019 5:58 PM
> To: dilla...@redhat.com
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] RBD client hangs
>
> Hi,  It works.  Thanks a lot.
>
> /st
>
> -----Original Message-----
> From: Jason Dillaman <jdill...@redhat.com>
> Sent: Tuesday, January 22, 2019 9:29 PM
> To: ST Wong (ITSC) <s...@itsc.cuhk.edu.hk>
> Cc: Ilya Dryomov <idryo...@gmail.com>; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] RBD client hangs
>
> Your "mon" cap should be "profile rbd" instead of "allow r" [1].
>
> [1] http://docs.ceph.com/docs/master/rbd/rados-rbd-cmds/#create-a-block-device-user
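>
> For example, something along these lines should work (a sketch; note that "ceph 
> auth caps" replaces all of the user's caps, so the unchanged ones from your 
> keyring are restated):
>
> # ceph auth caps client.acapp1 \
>       mon 'profile rbd' \
>       mds 'allow r' \
>       mgr 'allow r' \
>       osd 'allow rwx pool=2copy, allow rwx pool=4copy'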
>
> On Mon, Jan 21, 2019 at 9:05 PM ST Wong (ITSC) <s...@itsc.cuhk.edu.hk> wrote:
> >
> > Hi,
> >
> > > Is this an upgraded or a fresh cluster?
> > It's a fresh cluster.
> >
> > > Does client.acapp1 have the permission to blacklist other clients?  You 
> > > can check with "ceph auth get client.acapp1".
> >
> > No, it's our first Ceph cluster with a basic setup for testing, without any 
> > blacklist permissions configured.
> >
> > --------------- cut here -----------
> > # ceph auth get client.acapp1
> > exported keyring for client.acapp1
> > [client.acapp1]
> >         key = <key here>
> >         caps mds = "allow r"
> >         caps mgr = "allow r"
> >         caps mon = "allow r"
> >         caps osd = "allow rwx pool=2copy, allow rwx pool=4copy"
> > --------------- cut here -----------
> >
> > Thanks a lot.
> > /st
> >
> >
> >
> > -----Original Message-----
> > From: Ilya Dryomov <idryo...@gmail.com>
> > Sent: Monday, January 21, 2019 7:33 PM
> > To: ST Wong (ITSC) <s...@itsc.cuhk.edu.hk>
> > Cc: ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] RBD client hangs
> >
> > On Mon, Jan 21, 2019 at 11:43 AM ST Wong (ITSC) <s...@itsc.cuhk.edu.hk> 
> > wrote:
> > >
> > > Hi, we’re trying mimic on a VM farm.  It consists of 4 OSD hosts (8 OSDs) 
> > > and 3 MONs.     We tried mounting as RBD and CephFS (fuse and kernel 
> > > mount) on different clients without problems.
> >
> > Is this an upgraded or a fresh cluster?
> >
> > >
> > > Then one day we performed a failover test and stopped one of the OSDs.  Not 
> > > sure if it's related, but after that test the RBD client freezes when 
> > > trying to mount the rbd device.
> > >
> > >
> > >
> > > Steps to reproduce:
> > >
> > >
> > >
> > > # modprobe rbd
> > >
> > >
> > >
> > > (dmesg)
> > >
> > > [  309.997587] Key type dns_resolver registered
> > >
> > > [  310.043647] Key type ceph registered
> > >
> > > [  310.044325] libceph: loaded (mon/osd proto 15/24)
> > >
> > > [  310.054548] rbd: loaded
> > >
> > >
> > >
> > > # rbd -n client.acapp1 map 4copy/foo
> > >
> > > /dev/rbd0
> > >
> > >
> > >
> > > # rbd showmapped
> > >
> > > id pool  image snap device
> > >
> > > 0  4copy foo   -    /dev/rbd0
> > >
> > >
> > >
> > >
> > >
> > > Then it hangs if I try to mount or reboot the server after rbd map.   
> > > There are a lot of errors in dmesg, e.g.
> > >
> > >
> > >
> > > Jan 20 03:43:32 acapp1 kernel: rbd: rbd0: blacklist of client74700 failed: -13
> > >
> > > Jan 20 03:43:32 acapp1 kernel: rbd: rbd0: failed to acquire lock: -13
> > >
> > > Jan 20 03:43:32 acapp1 kernel: rbd: rbd0: no lock owners detected
> > >
> > > Jan 20 03:43:32 acapp1 kernel: rbd: rbd0: client74700 seems dead, breaking lock
> > >
> > > Jan 20 03:43:32 acapp1 kernel: rbd: rbd0: blacklist of client74700 failed: -13
> > >
> > > Jan 20 03:43:32 acapp1 kernel: rbd: rbd0: failed to acquire lock: -13
> > >
> > > Jan 20 03:43:32 acapp1 kernel: rbd: rbd0: no lock owners detected
> >
> > Does client.acapp1 have the permission to blacklist other clients?  You can 
> > check with "ceph auth get client.acapp1".  If not, follow step 6 of 
> > http://docs.ceph.com/docs/master/releases/luminous/#upgrade-from-jewel-or-kraken.
> >
> > Thanks,
> >
> >                 Ilya
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> --
> Jason
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



--
Jason
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
