Hello Folks,

  Thanks for your inspiration; I have made some progress (found
a way to avoid the issue).

The issue is most likely not related to postgres.

Ron Johnson said:

>> A configuration problem on the machine(s) can be ruled out,
> Famous last words.

Trust me. :)

> Is there a way to test pmc authentication via some other tool, like psql?

Sure, that works. The problem is confined to the running
application program(s); nothing else shows it.

> If *only* the application changed, then by definition it can't be a
> database problem.  *Something* in the application changed; you just haven't
> found it.

Obviously, yes. But then, such a change might expose an undesired
behaviour elsewhere.

> Specifically, I'd read the Discourse 2.3.0 and 2.3.1 release notes.

Correction: it is actually 3.2.0 and 3.3.1.

I finally resorted to bisecting, and it's not really a problem in
Discourse either. It comes from a feature I had enabled in the course
of migrating, a filesystem change monitor based on kqueue:
   https://man.freebsd.org/cgi/man.cgi?query=kqueue
Removing that feature solves the issue for now.

I still have no idea how that tool might lead to mishandled sockets
elsewhere; it might somehow have to do with the async processing of
the DB connect. That would need a thorough look into the code where
this is done.

Tom Lane wrote:

>The TCP trace looks like the client side is timing out too quickly
>in the unsuccessful case. It's not clear to me how the different
>Discourse version would lead to the Kerberos library applying a
>different timeout.

It's not a timeout; a timeout would close the socket. Rather, it
seems to forget about the socket.
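
To make that distinction concrete, here is a minimal Python sketch. It
is not taken from Discourse, libpq or the Kerberos code; the names
wait_for_reply and demo are hypothetical, and a local socketpair stands
in for the real connection. It only illustrates the assumption that a
timeout ends with the descriptor closed, while a "forgotten" socket
stays open with the reply sitting unread because nothing polls it.

```python
import selectors
import socket


def wait_for_reply(sock, seconds):
    """Timeout path: wait for data, and close the socket if none arrives."""
    with selectors.DefaultSelector() as sel:
        sel.register(sock, selectors.EVENT_READ)
        if sel.select(timeout=seconds):
            return sock.recv(4096)
    sock.close()  # a real timeout tears the connection down
    return None


def demo():
    # Case 1: timeout. The peer never answers, so the client closes.
    client, peer = socket.socketpair()
    reply = wait_for_reply(client, 0.1)
    timed_out_closed = reply is None and client.fileno() == -1
    peer.close()

    # Case 2: "forgotten". The peer answers, but nothing polls the client,
    # so the descriptor simply stays open with the reply unread.
    client, peer = socket.socketpair()
    peer.sendall(b"reply")
    forgotten_still_open = client.fileno() != -1

    # The reply is only seen once something services the descriptor again.
    late_reply = wait_for_reply(client, 0.1)
    client.close()
    peer.close()
    return timed_out_closed, forgotten_still_open, late_reply
```

Calling demo() exercises both cases. On the wire, the second case
matches what I see: the client side goes silent without ever closing
the connection.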

>Still, it seems like most of the moving parts
>here are outside of Postgres' control --- I don't think that libpq
>itself has much involvement in the KDC communication.

Kerberos is weird. It goes into libgssapi, but libgssapi doesn't
do much on its own; it just maps so-called "mech"s, which then point
to the actual Kerberos code, which in the case of FreeBSD is
ancient (but work should be underway to modernize it). It's one of
the creepiest pieces of code I've looked into.

> I concur with looking at the Discourse release notes and maybe asking
> some questions in that community.

They only support running that app in a particular container setup
on a specific brand of Linux. They don't like my questions and
might just delete them.

Anyway, I now have a lead on how to avoid the problem and on where to
look more closely. And it has nothing directly to do with postgres, but
rather with genuine socket mishandling and/or maybe some flaw in
FreeBSD.

cheers,
PMc

