Re: FAS issue (was Re: Another mass rebuild blocker: glibc qsort regression)

2024-02-12 Thread Tomasz Torcz
On Tue, Jan 23, 2024 at 09:04:53AM -0600, Chris Adams wrote:
> Once upon a time, Richard W.M. Jones  said:
> > The authentication issue being this one:
> > 
> > https://pagure.io/fedora-infrastructure/issue/11733
> 
> I'd be interested in an after-action report on this one... as someone who has
> managed FreeIPA, I'd like to know how this happened (so I can file away
> how to NOT do the same thing in my own setups).

From the comment
(https://pagure.io/fedora-infrastructure/issue/11733#comment-892793)
it seems that the new requirement for users to have SIDs caught Fedora's
FreeIPA off guard.
There's an `ipa config-mod` invocation to add SIDs to users, but you must
make sure you have ID ranges defined covering all your UIDs and GIDs.
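To make the ID-range caveat concrete, here is a minimal Python sketch that lists any UIDs/GIDs falling outside every defined range. It assumes each range is described the way `ipa idrange-add` describes one, by a base ID and a range size, so a range covers `base_id` up to but not including `base_id + range_size`. The `IdRange` class, the `uncovered_ids` function and the sample numbers are all hypothetical; in a real setup you would fill them in from `ipa idrange-find` and your actual user and group IDs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdRange:
    """Hypothetical model of an ID range: covers [base_id, base_id + range_size)."""
    name: str
    base_id: int
    range_size: int

    def contains(self, id_number: int) -> bool:
        # The upper bound is exclusive: base_id + range_size is outside the range.
        return self.base_id <= id_number < self.base_id + self.range_size


def uncovered_ids(ids, ranges):
    """Return the sorted, de-duplicated IDs that no defined range covers."""
    return sorted({i for i in ids if not any(r.contains(i) for r in ranges)})


if __name__ == "__main__":
    # Illustrative values only, not Fedora's real configuration.
    ranges = [IdRange("EXAMPLE.ORG_id_range", base_id=100000, range_size=200000)]
    user_and_group_ids = [100001, 150000, 299999, 300000, 5000]
    print(uncovered_ids(user_and_group_ids, ranges))
```

With the sample data it prints `[5000, 300000]`: 5000 sits below the range and 300000 is exactly `base_id + range_size`, which the exclusive upper bound leaves out. Any ID reported this way would need a range defined before adding SIDs.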


-- 
Tomasz Torcz        Only gods can safely risk perfection,
to...@pipebreaker.pl it's a dangerous thing for a man.  — Alia
--
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
Do not reply to spam, report it: 
https://pagure.io/fedora-infrastructure/new_issue


Re: FAS issue (was Re: Another mass rebuild blocker: glibc qsort regression)

2024-01-23 Thread Chris Adams
Once upon a time, kevin  said:
> On Tue, Jan 23, 2024 at 09:04:53AM -0600, Chris Adams wrote:
> > Once upon a time, Richard W.M. Jones  said:
> > > The authentication issue being this one:
> > > 
> > > https://pagure.io/fedora-infrastructure/issue/11733
> > 
> > I'd be interested in an after-action report on this one... as someone who has
> > managed FreeIPA, I'd like to know how this happened (so I can file away
> > how to NOT do the same thing in my own setups).
> 
> Sadly, it seemed to be a number of things at once, as is often the case.
> We took a cluster member down and reinstalled RHEL 9 on it (to start
> upgrading the cluster), but then the replication agreements for all
> nodes were accidentally removed. That might have been easily
> recoverable, but we also hit the fact that our cluster was installed a
> long time ago and didn't have SIDs, which became mandatory in the most
> recent version as part of a CVE fix. And then we also hit some old
> cruft left over from when our cluster was in another datacenter long
> ago. ;( 

Tech debt always wins, doesn't it? It's not always due to a lack of
effort or anything, but it does seem to rear its head at the worst times.

> Many kudos to everyone who worked on this. Especially the IPA folks.
> They have been calm and understanding and really helped us track
> things down and get back working.

Thanks to all who worked on this for getting it back into a serviceable
state!  Hope the path to fully finishing is smooth.

-- 
Chris Adams 


Re: FAS issue (was Re: Another mass rebuild blocker: glibc qsort regression)

2024-01-23 Thread kevin
On Tue, Jan 23, 2024 at 09:04:53AM -0600, Chris Adams wrote:
> Once upon a time, Richard W.M. Jones  said:
> > The authentication issue being this one:
> > 
> > https://pagure.io/fedora-infrastructure/issue/11733
> 
> I'd be interested in an after-action report on this one... as someone who has
> managed FreeIPA, I'd like to know how this happened (so I can file away
> how to NOT do the same thing in my own setups).

Sadly, it seemed to be a number of things at once, as is often the case.
We took a cluster member down and reinstalled RHEL 9 on it (to start
upgrading the cluster), but then the replication agreements for all
nodes were accidentally removed. That might have been easily
recoverable, but we also hit the fact that our cluster was installed a
long time ago and didn't have SIDs, which became mandatory in the most
recent version as part of a CVE fix. And then we also hit some old
cruft left over from when our cluster was in another datacenter long
ago. ;( 

Many kudos to everyone who worked on this. Especially the IPA folks.
They have been calm and understanding and really helped us track
things down and get back working.

> Certainly not bothering anybody while there's still an outage (or while
> they're recovering from dealing with it), but when things like this
> happen, it's good for everybody to document how it happened - NOT to
> cast blame or anything like that (sooner or later, we all do something
> that breaks in wildly unexpected ways), but so we can all learn from the
> mistake.

Absolutely. 

Things are not fully back to normal yet, but everything should be up from
the user's perspective. We will be working to get the cluster back to a
normal state in the next few days; then we can look at a retrospective.

kevin




FAS issue (was Re: Another mass rebuild blocker: glibc qsort regression)

2024-01-23 Thread Chris Adams
Once upon a time, Richard W.M. Jones  said:
> The authentication issue being this one:
> 
> https://pagure.io/fedora-infrastructure/issue/11733

I'd be interested in an after-action report on this one... as someone who has
managed FreeIPA, I'd like to know how this happened (so I can file away
how to NOT do the same thing in my own setups).

Certainly not bothering anybody while there's still an outage (or while
they're recovering from dealing with it), but when things like this
happen, it's good for everybody to document how it happened - NOT to
cast blame or anything like that (sooner or later, we all do something
that breaks in wildly unexpected ways), but so we can all learn from the
mistake.

-- 
Chris Adams 