> On Dec 10, 2025, at 05:00, Jacob Champion <[email protected]> > wrote: > > Hi everybody, > > We introduced the libpq-oauth module late in the cycle for PG18, and > to put it mildly, its interface isn't great. The original > implementation depended on libpq internals, and we had to make sure > that we didn't start crashing during major or minor version upgrades. > So there were a bunch of compromises made to keep things safe, > including the restriction that the module name has to change every > major release. > > Separately, but closely related: PG18's OAuth support allows you to > customize the client flow via a libpq hook function. Third-party > applications can make use of that, but our own utilities can still > only use the builtin device flow. That's annoying. > > I've been working to replace the internal ABI with a stable one, > hopefully to solve both problems at the same time. A dlopen() is a > pretty clear seam for other people to use to modify and extend. > Unfortunately my first attempt (not pictured) ended up in a rabbit > hole, because I tried to tackle the third-party use case first. My > second attempt, attached, focuses on the ABI stabilization instead, > which I think is more likely to succeed. > > (This took enough thinking that I'm really glad we didn't try this for > PG18. Thanks for letting me take on some technical debt for a bit.) > > = Design = > > Here's the train of thought behind the core changes (which are in patch 0004): > > The builtin-flow code in fe-auth-oauth.c is similar to the custom-flow > code, but it's just ever-so-slightly different. I'd like to unify the > two, so I want libpq-oauth to make use of the public > PGoauthBearerRequest API, and that means that almost all of the > injections made in the PG18 ABI need to be replaced. > > Most of those injections are simply subsumed by the hook API > (hooray!). A couple of others can be replaced by PQconninfo(). Four > are left over: > - pgthreadlock_t > - libpq_gettext > - conn->errorMessage > - conn->oauth_issuer_id > > I think we should keep injecting libpq_gettext; no third-party > implementations would be able to use that. And application hooks are > presumably capable of figuring out top-level locking already, since > the application has to have called PQregisterThreadLock() if it needed > to coordinate with libpq. > > That leaves error messages and the issuer identifier. I think both > would be useful for hooks to have, too, so I'd like to add them to > PQAUTHDATA_OAUTH_BEARER_TOKEN. > > = PQAUTHDATA_OAUTH_BEARER_TOKEN, version 2 = > > My original plan for authdata extensions was to add new members to the > end of the structs that libpq passes into the hook. Applications would > gate on a feature macro during compilation to see whether they could > use the new members. That should work fine for an application hook; > you're not allowed to downgrade libpq past the version that your > applications are compiled against, lest you lose symbols (or other > feature-flag functionality) you're relying on. > > Plugins, unfortunately, can't rely on the feature macro. As we found > out during the libpq-oauth split [1], we have to handle a long-running > application with an old libpq that loads an upgraded libpq-oauth, even > if the OS package dependencies suggest otherwise. (A plugin > architecture flips the direction of the runtime dependency arrow.) > > There are a couple ways we could handle this. For this draft, I've > implemented what I think is a middle ground between verbosity and > type-safety: introduce a new V2 struct that "inherits" the V1 struct > and can be down-cast in the callbacks, kinda similar to our Node > hierarchy. We could go even more verbose, and duplicate the entire > PGoauthBearerToken struct -- matching the callback parameter types for > maximum safety -- but I'm not convinced that this would be a good use > of maintenance effort. The ability to cast between the two means we > can share v1 and v2 implementations in our tests. > > We could also just add the new members at the end, say that you're > only allowed to use them if the V2 hook type is in use, and scribble > on them in V1 hooks to try to get misbehaving implementations to crash > outright. This arguably has the same amount of type-safety as the > downcast, and the resulting code looks nicer. (The libcurl API we use > does something similar with curl_version_info().) But it is definitely > more "magic". > > Also of note: this adds a PQExpBuffer to libpq-fe.h. Technically, that > type is "internal". But... is it really, though? It doesn't seem > possible for us to make incompatible changes there without crashing > earlier psqls, in which case I would like to make use of it too. Maybe > this deserves its own minithread. > > Okay, on to the full patchset. > > = Roadmap: Prep = > > The first few patches are bugfixes I intend to backpatch to 18. > > - 0001: I stomped on the SOCKTYPE name in libpq-fe.h, but that's not > in our namespace and it's conceivable that it might collide with > someone else. (It collided with my own test code during my work on > this.) > - 0002 fixes a copy-paste bug in meson.build, which luckily hadn't > caused problems yet. > - 0003 ports Tom's debug2 fix for Test::Cluster::connect_fails() over > to 002_client.pl. (Currently, log checks in this test aren't made > after connection failures, but I don't really want to chase that down > later after I've forgotten about it.) > > = Roadmap: Implementation = > > Next three patches are the core implementation, which stabilizes the > ABI for libpq-oauth. I feel fairly confident that this, or something > close to it, could land in PG19. > > - 0004 introduces the new PQAUTHDATA_OAUTH_BEARER_TOKEN_V2 API. > > As described above. > > - 0005 makes use of the new API in libpq-oauth. > > This removes more code than it introduces, which is exciting. > > Now we can rename libpq-oauth-19.so to just libpq-oauth.so, since we > no longer depend on anything that could change between major versions. > (We still need lock and locale support from libpq, as mentioned above, > so they continue to be decoupled via injection at init time.) > > Some of the code in this patch overlaps with the translation fixes > thread [2], which I need to get to first. I'm hoping some additional > simplifications can be made after those localization bugs are fixed. > > I think I'd also like to get the threadlock into the module API (but > not the hook API). A third-party flow might need to perform its own > one-time initialization, and if it relies on an old version of Curl > (or worse, Kerberos [3]), it'll need to use the same lock that the > top-level application has registered for libpq. So I imagine I'll need > to break out an initialization function here. Alternatively, I could > introduce an API into libpq to retrieve the threadlock function in > use? > > - 0006 removes a potential ABI-compatibility pitfall for future devs. > > libpq-oauth needs to use the shared-library variant of libpgcommon, > but it can no longer assume that the encoding support exported by > libpq is compatible. So it must not accidentally link against those > functions (see [4]). I don't imagine anyone will try adding code that > does this in practice; we're pretty UTF8-centric in OAuth. But just to > be safe, define USE_PRIVATE_ENCODING_FUNCS so that anyone who tries > will fail the build. > > = Roadmap: Plugins? = > > So now we have a stable ABI, which technically means that any > enterprising developer who wants to play games with LD_LIBRARY_PATH > could implement their own version of an OAuth flow, and have our > utilities make use of it into the future. > > That brings us to patch 0007, which experimentally promotes the stable > API to a public header, and introduces a really janky environment > variable so that people don't have to play games. It will be obvious > from the code that this is not well-baked yet. > > I hope the ability to dlopen() a custom flow can make it for 19; I > just don't know that this envvar approach is any good. The ideal > situation, IMO, is for a flow to be selected in the connection string. > But we have to lock that down, similarly to how we protect > local_preload_libraries, to prevent horrible exploits. At which point > we'll have essentially designed a generic libpq plugin system. Not > necessarily a terrible thing, but I don't think I have time to take it > on for PG19. > > WDYT? > --Jacob > > [1] https://postgr.es/m/aAkJnDQq3mOUvmQV%40msg.df7cb.de > [2] > https://postgr.es/m/TY4PR01MB1690746DB91991D1E9A47F57E94CDA%40TY4PR01MB16907.jpnprd01.prod.outlook.com > [3] https://postgr.es/m/aSSp03wmNMngi/Oe%40ubby > [4] https://postgr.es/c/b6c7cfac8 > <0001-libpq-fe.h-Don-t-claim-SOCKTYPE-in-the-global-namesp.patch><0002-libpq-oauth-use-correct-c_args-in-meson.build.patch><0003-oauth_validator-Avoid-races-in-log_check.patch><0004-libpq-Introduce-PQAUTHDATA_OAUTH_BEARER_TOKEN_V2.patch><0005-libpq-oauth-Use-the-PGoauthBearerRequestV2-API.patch><0006-libpq-oauth-Never-link-against-libpq-s-encoding-func.patch><0007-WIP-Introduce-third-party-OAuth-flow-plugins.patch> Hi Jacob, This is a solid patch set. Only a few small comments: 1 - 0001 ``` /* for PGoauthBearerRequest.async() */ #ifdef _WIN32 -#define SOCKTYPE uintptr_t /* avoids depending on winsock2.h for SOCKET */ +#define PQ_SOCKTYPE uintptr_t /* avoids depending on winsock2.h for SOCKET */ #else -#define SOCKTYPE int +#define PQ_SOCKTYPE int #endif ``` The commit message has explained why SOCKTYPE is temporary and the reason why adding prefix “PG_” is to avoid collisions. But I don’t think code readers will always read commit messages, given the macro is a local and temporary, why adding a prefix starting with a underscore, like “_PQ_SOCKTYPE”, which explicitly indicates the macro is kinda private. === 0002 & 0003 Looks good. === 2 - 0004 ``` + * Helper for handling user flow failures. If the implementation put anything + * into request->error, it's added to conn->errorMessage here. ``` Typo: put -> puts 3 - 0004 ``` +# Make sure the v1 hook continues to work. */ +test( ``` “*/“ in the end of the comment line seems a typo. 4 - 0005 ``` + PQAUTHDATA_OAUTH_BEARER_TOKEN, /* server requests an OAuth Bearer token + * (prefer v2, below, instead) */ ``` "(prefer v2, below, instead)" looks confusing to me, though I can understand what it means. Maybe make it clearer, like “(v2 is preferred; see below)" === 0006 Looks good. === === Not reviewing 0007 as it marks with WIP. === Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/
