Hi,

At 2024.pgconf.dev, Heikki did a session on multithreading PostgreSQL which I was unfortunately unable to attend due to my involvement with another session, and then we had an unconference discussion which I was able to attend and at which I volunteered to have a look at a couple of tasks, including "Extension Marking System (marking extensions as thread-safe)". So in this email I'd like to (1) say a few things about multithreading for PostgreSQL in general, (2) spell out my understanding of the extension compatibility problem specifically, and then (3) discuss possible solutions to that problem. See also https://wiki.postgresql.org/wiki/Multithreading

== Multithreading Generally ==

I believe there is a consensus in the PostgreSQL developer community, or at least among committers, that a multi-threaded programming model would be superior to a multi-process programming model as we have now. I won't be surprised if a few people disagree with that as a general statement, and others may view it as better in theory but so difficult in practice as to be not worth doing, but I believe that the consensus is otherwise. I do understand that switching to threads introduces some new stability risks, which are not to be taken lightly, but it also opens the door to various performance improvements, and even functionality, that are not feasible today.

I do not believe that it would be necessary, as has been alleged previously, to halt all other development for a lengthy period of time while such a conversion is undertaken, nor do I believe that the community would or should accept such a solution were someone to propose it. I do believe that there are some difficult problems to be solved in order to make it work at all, and I believe even more strongly that a good deal of follow-up work will be necessary to reap the potential benefits of such a change.

I also believe that it's absolutely necessary that both models coexist side by side for a period of time. I think we will eventually want to abandon the multi-process model, because I think over time the benefits of using threads will accumulate until they are overwhelming and the process model will end up appearing to be an obstacle to progress. However, I don't think we'll be able to do that particularly soon, because I think it's going to take a while to fully stabilize the thread model even as far as the core code is concerned, and extensions will take even longer to catch up. I realize Heikki in particular is hoping for a quick transition; I don't see that as feasible, but like everything else about this, opinions are going to vary.

Obligatory disclaimer: Everything above (and below) is just a statement of what I believe, and everyone is free to dispute it. As always, I cannot speak to objective truth, but I can tell you what I think.

== The Extension Compatibility Problem ==

I don't know yet whether we're going to end up with a system where the same build of PostgreSQL can produce processes or threads depending on configuration or whether it's going to be a build option, but I'm guessing the latter is more likely. Certainly, if an extension is assuming that its global variables are session-local and they suddenly become global to the cluster, chaos will ensue. The same is true for the core code, and will need to be solved by annotating global variables so that the appropriate ones can be made thread-local and the others can be given whatever treatment is appropriate considering how they are used. The details of how this annotation system will work are TBD, but the point for this email is that extension global variables, including file-level globals, will need the same kinds of annotations that we use in the core code in order to work. Other adjustments may also be needed.

I think there are two severable problems here. One is that, if an extension is built for use with a non-threaded PostgreSQL, we shouldn't permit it to be used with a threaded PostgreSQL, even if the major version and other details are compatible. Hence, threading or the lack of it must become part of the data set up by PG_MODULE_MAGIC. Maybe this problem goes away if we decide that threads-vs-processes is a configuration option rather than a build-time option, but even then, we might still end up with a build-time option indicating whether threads are even a possibility, so I think it's pretty likely we need this in some form. If or when the process model eventually dies, then we can take this out again.

The other problem is that we probably want a way for extensions to signal that they are believed to work with threading.
It's a little bit debatable whether this is a good idea, because (1) some people are going to blindly state that their extension works fine with threading even if they haven't actually made the necessary changes and (2) one could simply declare that making an extension thread-ready is part of supporting whatever PostgreSQL release adds threading as an option and (3) one could also declare that extension authors should just document what they do or don't support rather than doing anything in code. However, I think it makes sense to try to make extensions fail to compile against a threaded PostgreSQL unless the extension declares that it supports such builds of PostgreSQL. I think that by doing this, we'll make it a LOT easier for packagers to find out what extensions still need updating. A packager could possibly do light testing of an extension and miss the fact that the extension doesn't actually work properly against a threaded PostgreSQL, but you can't fail to notice a compile failure. There's still going to be some chaos because of (1), but I think we can mitigate that with good messaging: documentation, wiki pages, and blog posts explaining that this is coming and how to adapt to it can help a lot, IMHO.

== Extension Compatibility Solutions ==

The attached patch is a sketch of one possible approach: PostgreSQL signals whether it is multithreaded by defining or not defining PG_MULTITHREADING in pg_config_manual.h, and an extension signals thread-readiness by defining PG_THREADSAFE_EXTENSION before including any PostgreSQL headers other than postgres.h.
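
To make the mechanism concrete, here is a minimal standalone mock-up of the two checks described above: a compile-time gate and a threading flag recorded in the module's magic data. It is not the contents of the attached patch and includes no PostgreSQL headers. PG_MULTITHREADING and PG_THREADSAFE_EXTENSION are the names used in this email; everything prefixed Demo, DEMO_ or demo_ is invented purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for what pg_config_manual.h would do in a threaded build. */
#define PG_MULTITHREADING 1

/* The extension opts in BEFORE the header that defines the magic macro. */
#define PG_THREADSAFE_EXTENSION 1

/* ---- contents of a hypothetical fmgr.h-like header start here ---- */
#ifdef PG_MULTITHREADING
#define DEMO_SERVER_THREADED 1
#else
#define DEMO_SERVER_THREADED 0
#endif

#ifdef PG_THREADSAFE_EXTENSION
#define DEMO_EXT_THREADSAFE 1
#else
#define DEMO_EXT_THREADSAFE 0
#endif

/* Hypothetical magic block that also records the threading mode. */
typedef struct DemoMagicData
{
    int  len;       /* sizeof(DemoMagicData), as a basic sanity check */
    bool threaded;  /* was this module compiled against a threaded server? */
} DemoMagicData;

/*
 * Refuses to compile when the server is threaded and the extension has
 * not opted in; otherwise emits a function exposing the magic data.
 * The trailing extern declaration only absorbs the caller's semicolon.
 */
#define DEMO_MODULE_MAGIC \
    static_assert(!DEMO_SERVER_THREADED || DEMO_EXT_THREADSAFE, \
                  "must define PG_THREADSAFE_EXTENSION or use unthreaded PostgreSQL"); \
    const DemoMagicData *demo_magic_func(void) \
    { \
        static const DemoMagicData data = { \
            (int) sizeof(DemoMagicData), DEMO_SERVER_THREADED != 0 \
        }; \
        return &data; \
    } \
    extern int demo_no_such_variable
/* ---- end of hypothetical header ---- */

/* ---- extension source file ---- */
DEMO_MODULE_MAGIC;

/* Load-time check a server could apply to the module's magic data. */
bool
demo_module_is_loadable(const DemoMagicData *m, bool server_threaded)
{
    assert(m != NULL);
    return m->len == (int) sizeof(DemoMagicData) &&
           m->threaded == server_threaded;
}
```

In this mock-up, deleting the PG_THREADSAFE_EXTENSION line makes the file fail to compile at DEMO_MODULE_MAGIC. Moving that line below the "header" section has the same effect, because DEMO_EXT_THREADSAFE has already been fixed at 0 by then, so the order of the define matters.
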
If PostgreSQL is built multithreaded and the extension does not signal thread-safety, you get something like this:

../pgsql/src/test/modules/dummy_seclabel/dummy_seclabel.c:20:1: error: static assertion failed due to requirement '1 == 0': must define PG_THREADSAFE_EXTENSION or use unthreaded PostgreSQL
PG_MODULE_MAGIC;

I'm not entirely happy with this solution because the results are confusing if PG_THREADSAFE_EXTENSION is declared after including fmgr.h. Perhaps this can be adequately handled by documenting and demonstrating the right pattern, or maybe somebody has a better idea.

Another idea I considered was to replace the PG_MODULE_MAGIC; declaration with something that allows for arguments, like PG_MODULE_MAGIC(.process_model = false, .thread_model = true). But on further reflection, that seems like the wrong thing. AFAICS, that's going to tell you at runtime about something that you really want to know at compile time. But this kind of idea might need more thought if we decide that the *same* build of PostgreSQL can either launch processes or threads per session, because then we'd need to know which extensions were available in whichever mode applied to the current session.

That's all I've got for today.

--
Robert Haas
EDB: http://www.enterprisedb.com
v1-0001-POC-Extension-API-for-multithreading.patch