On 06.06.2023 5:13 PM, Robert Haas wrote:
On Tue, Jun 6, 2023 at 9:40 AM Robert Haas <robertmh...@gmail.com> wrote:
I'm not sure that there's a strong consensus, but I do think it's a good idea.
Let me elaborate on this a bit.
Not all databases have this problem, and PostgreSQL isn't going to be
able to stop having it without some kind of major architectural
change. Changing from a process model to a threaded model might be
insufficient, because while I think that threads consume fewer OS
resources than processes, what is really needed, in all likelihood, is
the ability to have idle connections have neither a process nor a
thread associated with them until they cease being idle. That's a huge
project and I'm not volunteering to do it, but if we want to have the
same kind of scalability as some competing products, that is probably
a place to which we ultimately need to go. Getting out of the current
model where every backend has an arbitrarily large amount of state
hanging off of random global variables, not all of which are even
known to any central system, is a critical step in that journey.
This looks like a built-in connection pooler, doesn't it?
Actually, a built-in connection pooler has a lot in common with
multithreaded Postgres.
It also needs to keep session context.
The main difference is that there is no need to place all Postgres
global/static variables in it, because the lifetime of most of them is
shorter than a transaction. So it is really enough to place the
variables that outlive a transaction in a single struct.
This is how the built-in connection pooler was implemented in PgPro.
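
To make the "single struct" idea concrete, here is a minimal sketch. It is
not taken from the PgPro code: SessionContext, its fields, and the
session_* functions are hypothetical names chosen for illustration. The
assumption is that everything which must survive between transactions is
reachable through one pointer, so a pooler can switch sessions at a
transaction boundary by swapping that pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: only the state that must outlive a transaction. */
typedef struct SessionContext
{
    char search_path[64];      /* example of a session-level setting */
    int  work_mem_kb;          /* another session-level setting */
    int  n_prepared_stmts;     /* stands in for prepared statements etc. */
} SessionContext;

/* The single global through which the backend sees the session state. */
SessionContext *current_session = NULL;

/* Fill a context with initial values; the string is always terminated. */
void session_init(SessionContext *s, const char *search_path, int work_mem_kb)
{
    memset(s, 0, sizeof(*s));
    strncpy(s->search_path, search_path, sizeof(s->search_path) - 1);
    s->work_mem_kb = work_mem_kb;
}

/* Bind a session to this backend; meant to be called between transactions. */
void session_attach(SessionContext *s)
{
    assert(s != NULL);
    assert(current_session == NULL);   /* one session per backend at a time */
    current_session = s;
}

/* Unbind the session so that the backend can serve another one. */
SessionContext *session_detach(void)
{
    SessionContext *s = current_session;

    current_session = NULL;
    return s;
}
```

In this sketch a detached SessionContext needs neither a process nor a
thread while the connection is idle; it is just memory waiting to be
attached again.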
Reading all the concerns about multithreading Postgres makes me think
that it may be reasonable to combine the two approaches:
still have processes (backends), but be able to spawn multiple threads
inside a process (for example, for parallel query execution).
One could argue that such an approach only increases the complexity of
the implementation and combines the drawbacks of both approaches.
But actually such an approach allows us to:
1. Support old (external, non-reentrant) extensions: they will be
executed by dedicated backends.
2. Simplify parallel query execution and make it more efficient.
3. Use multithreaded PLs (like JVM-based ones) most efficiently. Since
there will be no single VM for all connections, but only one for some
group of them (for example, those belonging to one user), most complaints
about sharing a VM between different connections can be avoided.
4. Avoid or minimize problems with OOM and memory fragmentation.
5. Combine it with a connection pooler (saving the state of an inactive
connection without keeping a process or thread for it).
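
As an illustration of point 2, here is a rough sketch of what "threads
inside a backend" could look like for a trivially parallel scan. This is
not Postgres code: WorkerSlice, scan_slice, and parallel_sum are
hypothetical names, and a plain int array stands in for the rows of a
relation. The only point is that worker threads read the backend's memory
directly and hand back partial results through ordinary structs, with no
separate shared memory segment or serialization of state.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_WORKERS 8

/* Hypothetical: the part of the input one worker thread scans. */
typedef struct WorkerSlice
{
    const int *rows;        /* the backend's own memory, read directly */
    size_t     begin;       /* first row of the slice */
    size_t     end;         /* one past the last row of the slice */
    long       partial_sum; /* result written by the worker */
} WorkerSlice;

/* Thread entry point: aggregate one slice. */
static void *scan_slice(void *arg)
{
    WorkerSlice *slice = arg;
    long sum = 0;

    for (size_t i = slice->begin; i < slice->end; i++)
        sum += slice->rows[i];
    slice->partial_sum = sum;
    return NULL;
}

/* Sum rows[0..nrows) with nworkers threads; 0 on success, -1 on failure. */
int parallel_sum(const int *rows, size_t nrows, int nworkers, long *result)
{
    pthread_t   threads[MAX_WORKERS];
    WorkerSlice slices[MAX_WORKERS];
    int         started = 0;
    int         rc = 0;
    long        total = 0;

    assert(result != NULL);
    if (nworkers < 1 || nworkers > MAX_WORKERS)
        return -1;

    size_t chunk = nrows / (size_t) nworkers;

    for (int i = 0; i < nworkers; i++)
    {
        slices[i].rows = rows;
        slices[i].begin = (size_t) i * chunk;
        /* the last worker also takes the remainder */
        slices[i].end = (i == nworkers - 1) ? nrows : (size_t) (i + 1) * chunk;
        slices[i].partial_sum = 0;
        if (pthread_create(&threads[i], NULL, scan_slice, &slices[i]) != 0)
        {
            rc = -1;
            break;
        }
        started++;
    }

    /* Join every thread that was started, even if a later one failed. */
    for (int i = 0; i < started; i++)
    {
        pthread_join(threads[i], NULL);
        total += slices[i].partial_sum;
    }

    if (rc == 0)
        *result = total;
    return rc;
}
```

A real executor would of course have to deal with snapshots, memory
contexts, and error handling inside the threads, which is exactly where
the non-reentrant global state discussed above gets in the way.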