Hello Folks, Thanks for Your inspiration; and I made some progress (found a way to avoid the issue).
The issue is most likely not related to postgres. Ron Johnson said: >> A configuration problem on the machine(s) can be ruled out, > Famous last words. Trust me. :) > Is there a way to test pmc authentication via some other tool, like psql? Sure, that works. The problem is contained inside the running application program(s), everything else doesn't show it. > If *only *the application changed, then by definition it can't be a > database problem. *Something* in the application changed; you just haven't > found it. Obviousely, yes. But then, such a change might expose an undesired behaviour elsewhere. > Specifically, I'd read the Discourse 2.3.0 and 2.3.1 release notes. Correction: it is actually 3.2.0 and 3.3.1. I finally went the way of bisecting, and, it's not really a problem in Discourse either. It comes from a feature I had enabled in the course of migrating, a filesystem change monitor based on kqueue: https://man.freebsd.org/cgi/man.cgi?query=kqueue Removing that feature solves the issue for now. I have still no idea how that tool might lead to mishandled sockets elsewhere; it might somehow have to do with the async processing of the DB connect. That would need a thorough look into the code where this is done. Tom Lane wrote: >The TCP trace looks like the client side is timing out too quickly >in the unsuccessful case. It's not clear to me how the different >Discourse version would lead to the Kerberos library applying a >different timeout. It's not a timeout; a timeout would close the socket. It seems to rather forget the socket. >Still, it seems like most of the moving parts >here are outside of Postgres' control --- I don't think that libpq >itself has much involvement in the KDC communication. Kerberos is weird. It goes into libgssapi, but libgssapi doesn't do much on it's own, it just maps so-called "mech"s, which then point to the actual kerberos code - which in the case of FreeBSD is very ancient (but work should be underway to modernize it). It's one of the most creepy pieces of code I've looked into. > I concur with looking at the Discourse release notes and maybe asking > some questions in that community. They only support that app to run in a certain containerization on a specific brand of Linux. They don't like my questions and might just delete them. Anyway, I have a lead now to either avoid the problem or where to look more closely. And it has not directly to do with postgres, but rather with genuine socket mishandling and/or maybe some flaw in FreeBSD. cheers, PMc