Hi Cory,

Thanks for identifying the bug and creating a PR to fix it. We'll do a retrospective on this issue so that we can catch and avoid such regressions in the future. In the meantime, we will go ahead with a minimal 16.2.9 release for this fix.
Thanks,
Neha

On Tue, May 17, 2022 at 5:03 AM Cory Snyder <csny...@iland.com> wrote:
>
> Yep, sorry about that. Thanks for the correction, Dan!
>
> On Tue, May 17, 2022 at 7:44 AM Dan van der Ster <dvand...@gmail.com> wrote:
> >
> > On Tue, May 17, 2022 at 1:14 PM Cory Snyder <csny...@iland.com> wrote:
> > >
> > > Hi all,
> > >
> > > Unfortunately, we experienced some issues with the upgrade to 16.2.8
> > > on one of our larger clusters. Within a few hours of the upgrade, all
> > > 5 of our managers had become unavailable. We found that they were all
> > > deadlocked due to (what appears to be) a regression in GIL and mutex
> > > handling. See https://tracker.ceph.com/issues/39264 and
> > > https://github.com/ceph/ceph/pull/38677 for context on previous
> > > manifestations of the issue.
> > >
> > > I discovered some mistakes within a recent Pacific backport that seem
> > > to be responsible. Here is the tracker for the regression:
> > > https://tracker.ceph.com/issues/55687. Here is an open PR that should
> > > resolve the problem: https://github.com/ceph/ceph/pull/38677.
> >
> > I guess you mean https://github.com/ceph/ceph/pull/46302 ?
> >
> > Thanks
> >
> > .. dan
> >
> > > Note that this is a sort of race condition, and the issue tends to
> > > manifest more frequently on larger clusters. Enabling certain
> > > modules may also make it more likely to occur. On our cluster, MGRs
> > > were consistently deadlocking within about an hour.
> > >
> > > Hopefully this is useful to others who are considering an upgrade!
> > >
> > > Thanks,
> > >
> > > Cory Snyder
> > >
> > > On Mon, May 16, 2022 at 3:46 PM David Galloway <dgall...@redhat.com> wrote:
> > > >
> > > > We're happy to announce the 8th backport release in the Pacific series.
> > > > We recommend that users update to this release.
> > > > For detailed release notes with links and a changelog, please refer
> > > > to the official blog entry at
> > > > https://ceph.io/en/news/blog/2022/v16-2-8-pacific-released
> > > >
> > > > Notable Changes
> > > > ---------------
> > > >
> > > > * MON/MGR: Pools can now be created with the `--bulk` flag. Any pool
> > > > created with `--bulk` will use a profile of the `pg_autoscaler` that
> > > > provides more performance from the start. Pools created without the
> > > > `--bulk` flag will retain the old default behavior. For more details,
> > > > see: https://docs.ceph.com/en/latest/rados/operations/placement-groups/
> > > >
> > > > * MGR: The pg_autoscaler can now be turned `on` and `off` globally
> > > > with the `noautoscale` flag. By default this flag is unset and the
> > > > default pg_autoscale mode remains the same. For more details, see:
> > > > https://docs.ceph.com/en/latest/rados/operations/placement-groups/
> > > >
> > > > * A health warning will now be reported if the ``require-osd-release``
> > > > flag is not set to the appropriate release after a cluster upgrade.
> > > >
> > > > * CephFS: When upgrading Ceph Metadata Servers with multiple active
> > > > MDSs, ensure that there are no pending stray entries that are
> > > > directories for any active rank other than rank 0. See
> > > > https://docs.ceph.com/en/latest/releases/pacific/#upgrading-from-octopus-or-nautilus.
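The flag changes above map to CLI commands roughly as follows. This is an operational sketch against a running Pacific (16.2.8+) cluster; the pool name `testpool` is illustrative:

```shell
# Create a pool with the bulk profile so the pg_autoscaler starts it
# with a larger complement of PGs from the outset.
ceph osd pool create testpool --bulk
ceph osd pool get testpool bulk          # confirm the flag on the pool

# Toggle the pg_autoscaler globally via the new noautoscale flag.
ceph osd pool set noautoscale            # pause autoscaling cluster-wide
ceph osd pool get noautoscale            # report the current state
ceph osd pool unset noautoscale          # resume autoscaling

# After completing an upgrade, set the release flag to clear the new
# health warning about require-osd-release.
ceph osd require-osd-release pacific
```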
> > > >
> > > > Getting Ceph
> > > > ------------
> > > > * Git at git://github.com/ceph/ceph.git
> > > > * Tarball at https://download.ceph.com/tarballs/ceph-16.2.8.tar.gz
> > > > * Containers at https://quay.io/repository/ceph/ceph
> > > > * For packages, see https://docs.ceph.com/docs/master/install/get-packages/
> > > > * Release git sha1: 209e51b856505df4f2f16e54c0d7a9e070973185
> > > >
> > > > _______________________________________________
> > > > ceph-users mailing list -- ceph-users@ceph.io
> > > > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> > --
> > Cory Snyder
> > Staff Software Engineer, iland
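The GIL/mutex deadlock class Cory describes can be modeled in a few lines of plain Python: one thread holds lock A and blocks on lock B while another holds B and blocks on A. The sketch below is illustrative only — it uses ordinary `threading.Lock` objects to stand in for the GIL and the mgr's mutex, and is not ceph-mgr code — and shows the safe pattern of never holding one lock while waiting for the other:

```python
import threading

# Illustrative model of the deadlock class discussed in this thread:
# "gil" stands in for the Python GIL, "mutex" for a native mutex in the mgr.
# A deadlock arises when one thread holds gil and blocks on mutex while
# another holds mutex and blocks on gil. The safe pattern below never
# holds one lock while waiting for the other, so no lock-order cycle exists.
gil = threading.Lock()
mutex = threading.Lock()
results = []

def worker(name: str) -> None:
    with gil:
        pass  # Python-side work done entirely under the "GIL"
    # "GIL" released before touching the shared mutex.
    with mutex:
        results.append(name)  # shared-state work done under the mutex

threads = [threading.Thread(target=worker, args=(n,)) for n in ("t1", "t2")]
for t in threads:
    t.start()
for t in threads:
    t.join(timeout=5)

print(sorted(results))  # ['t1', 't2'] -- both threads complete
```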