Re: Modernizing our GUC infrastructure
I wrote: >> Attached is a patch series that attempts to modernize our GUC >> infrastructure, in particular removing the performance bottlenecks >> it has when there are lots of GUC variables. > Rebased over 0a20ff54f. Here's a v3 rebased up to HEAD. The only real change is that I added a couple of "Assert(GetMemoryChunkContext(ptr) == GUCMemoryContext)" checks in hopes of improving detection of not-updated code that is still using malloc/free where it should be using guc_malloc/guc_free. This is per the nearby discussion of whether the mcxt.c infrastructure could recognize that [1]. I experimented a bit with leaving out parts of the 0002 patch to simulate such mistakes, and at least on a Linux box that seems to produce fairly intelligible errors now. In the case of free'ing a palloc'd pointer, what you get is a message from glibc followed by abort(), so their error detection is pretty solid too. I'm feeling pretty good about this patchset now. Does anyone want to review it further? regards, tom lane [1] https://postgr.es/m/2910981.1665080361%40sss.pgh.pa.us commit 1b48708e28c00df94df1911548e43aaeec65ad76 Author: Tom Lane Date: Fri Oct 7 14:39:07 2022 -0400 Preliminary improvements in memory-context infrastructure. We lack a version of repalloc() that supports MCXT_ALLOC_NO_OOM semantics, so invent repalloc_extended(). repalloc_huge() becomes a legacy wrapper for that. Also, fix dynahash.c so that it can support HASH_ENTER_NULL requests when using the default palloc-based allocator. The only reason it was like that was the lack of the MCXT_ALLOC_NO_OOM option when that code was written, ages ago. While here, simplify a few overcomplicated tests in mcxt.c. diff --git a/src/backend/utils/hash/dynahash.c b/src/backend/utils/hash/dynahash.c index 3babde8d70..4f62958883 100644 --- a/src/backend/utils/hash/dynahash.c +++ b/src/backend/utils/hash/dynahash.c @@ -289,7 +289,8 @@ static void * DynaHashAlloc(Size size) { Assert(MemoryContextIsValid(CurrentDynaHashCxt)); - return MemoryContextAlloc(CurrentDynaHashCxt, size); + return MemoryContextAllocExtended(CurrentDynaHashCxt, size, + MCXT_ALLOC_NO_OOM); } @@ -939,9 +940,7 @@ calc_bucket(HASHHDR *hctl, uint32 hash_val) * * HASH_ENTER will normally ereport a generic "out of memory" error if * it is unable to create a new entry. The HASH_ENTER_NULL operation is - * the same except it will return NULL if out of memory. Note that - * HASH_ENTER_NULL cannot be used with the default palloc-based allocator, - * since palloc internally ereports on out-of-memory. + * the same except it will return NULL if out of memory. * * If foundPtr isn't NULL, then *foundPtr is set true if we found an * existing entry in the table, false otherwise. This is needed in the @@ -1084,12 +1083,8 @@ hash_search_with_hash_value(HTAB *hashp, } return NULL; - case HASH_ENTER_NULL: - /* ENTER_NULL does not work with palloc-based allocator */ - Assert(hashp->alloc != DynaHashAlloc); - /* FALL THRU */ - case HASH_ENTER: + case HASH_ENTER_NULL: /* Return existing element if found, else create one */ if (currBucket != NULL) return (void *) ELEMENTKEY(currBucket); diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c index b1a3c74830..012517be5c 100644 --- a/src/backend/utils/mmgr/mcxt.c +++ b/src/backend/utils/mmgr/mcxt.c @@ -1114,8 +1114,8 @@ MemoryContextAllocExtended(MemoryContext context, Size size, int flags) AssertArg(MemoryContextIsValid(context)); AssertNotInCriticalSection(context); - if (((flags & MCXT_ALLOC_HUGE) != 0 && !AllocHugeSizeIsValid(size)) || - ((flags & MCXT_ALLOC_HUGE) == 0 && !AllocSizeIsValid(size))) + if (!((flags & MCXT_ALLOC_HUGE) != 0 ? AllocHugeSizeIsValid(size) : + AllocSizeIsValid(size))) elog(ERROR, "invalid memory alloc request size %zu", size); context->isReset = false; @@ -1269,8 +1269,8 @@ palloc_extended(Size size, int flags) AssertArg(MemoryContextIsValid(context)); AssertNotInCriticalSection(context); - if (((flags & MCXT_ALLOC_HUGE) != 0 && !AllocHugeSizeIsValid(size)) || - ((flags & MCXT_ALLOC_HUGE) == 0 && !AllocSizeIsValid(size))) + if (!((flags & MCXT_ALLOC_HUGE) != 0 ? AllocHugeSizeIsValid(size) : + AllocSizeIsValid(size))) elog(ERROR, "invalid memory alloc request size %zu", size); context->isReset = false; @@ -1351,6 +1351,50 @@ repalloc(void *pointer, Size size) return ret; } +/* + * repalloc_extended + * Adjust the size of a previously allocated chunk, + * with HUGE and NO_OOM options. + */ +void * +repalloc_extended(void *pointer, Size size, int flags) +{ +#if defined(USE_ASSERT_CHECKING) || defined(USE_VALGRIND) + MemoryContext context = GetMemoryChunkContext(pointer); +#endif + void *ret; + + if (!((flags & MCXT_ALLOC_HUGE) != 0 ? AllocHugeSizeIsValid(size) : + AllocSizeIsValid(size))) + elog(ERROR, "invalid memory alloc reque
Re: Modernizing our GUC infrastructure
I wrote: > Attached is a patch series that attempts to modernize our GUC > infrastructure, in particular removing the performance bottlenecks > it has when there are lots of GUC variables. Rebased over 0a20ff54f. regards, tom lane commit d8b8742e80ef6f1e6546ddb9767bd846e8ff761c Author: Tom Lane Date: Tue Sep 13 11:36:56 2022 -0400 Preliminary improvements in memory-context infrastructure. We lack a version of repalloc() that supports MCXT_ALLOC_NO_OOM semantics, so invent repalloc_extended(). repalloc_huge() becomes a legacy wrapper for that. Also, fix dynahash.c so that it can support HASH_ENTER_NULL requests when using the default palloc-based allocator. The only reason it was like that was the lack of the MCXT_ALLOC_NO_OOM option when that code was written, ages ago. While here, simplify a few overcomplicated tests in mcxt.c. diff --git a/src/backend/utils/hash/dynahash.c b/src/backend/utils/hash/dynahash.c index 3babde8d70..4f62958883 100644 --- a/src/backend/utils/hash/dynahash.c +++ b/src/backend/utils/hash/dynahash.c @@ -289,7 +289,8 @@ static void * DynaHashAlloc(Size size) { Assert(MemoryContextIsValid(CurrentDynaHashCxt)); - return MemoryContextAlloc(CurrentDynaHashCxt, size); + return MemoryContextAllocExtended(CurrentDynaHashCxt, size, + MCXT_ALLOC_NO_OOM); } @@ -939,9 +940,7 @@ calc_bucket(HASHHDR *hctl, uint32 hash_val) * * HASH_ENTER will normally ereport a generic "out of memory" error if * it is unable to create a new entry. The HASH_ENTER_NULL operation is - * the same except it will return NULL if out of memory. Note that - * HASH_ENTER_NULL cannot be used with the default palloc-based allocator, - * since palloc internally ereports on out-of-memory. + * the same except it will return NULL if out of memory. * * If foundPtr isn't NULL, then *foundPtr is set true if we found an * existing entry in the table, false otherwise. This is needed in the @@ -1084,12 +1083,8 @@ hash_search_with_hash_value(HTAB *hashp, } return NULL; - case HASH_ENTER_NULL: - /* ENTER_NULL does not work with palloc-based allocator */ - Assert(hashp->alloc != DynaHashAlloc); - /* FALL THRU */ - case HASH_ENTER: + case HASH_ENTER_NULL: /* Return existing element if found, else create one */ if (currBucket != NULL) return (void *) ELEMENTKEY(currBucket); diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c index 115a64cfe4..80f99d51fc 100644 --- a/src/backend/utils/mmgr/mcxt.c +++ b/src/backend/utils/mmgr/mcxt.c @@ -1060,8 +1060,8 @@ MemoryContextAllocExtended(MemoryContext context, Size size, int flags) AssertArg(MemoryContextIsValid(context)); AssertNotInCriticalSection(context); - if (((flags & MCXT_ALLOC_HUGE) != 0 && !AllocHugeSizeIsValid(size)) || - ((flags & MCXT_ALLOC_HUGE) == 0 && !AllocSizeIsValid(size))) + if (!((flags & MCXT_ALLOC_HUGE) != 0 ? AllocHugeSizeIsValid(size) : + AllocSizeIsValid(size))) elog(ERROR, "invalid memory alloc request size %zu", size); context->isReset = false; @@ -1215,8 +1215,8 @@ palloc_extended(Size size, int flags) AssertArg(MemoryContextIsValid(context)); AssertNotInCriticalSection(context); - if (((flags & MCXT_ALLOC_HUGE) != 0 && !AllocHugeSizeIsValid(size)) || - ((flags & MCXT_ALLOC_HUGE) == 0 && !AllocSizeIsValid(size))) + if (!((flags & MCXT_ALLOC_HUGE) != 0 ? AllocHugeSizeIsValid(size) : + AllocSizeIsValid(size))) elog(ERROR, "invalid memory alloc request size %zu", size); context->isReset = false; @@ -1297,6 +1297,50 @@ repalloc(void *pointer, Size size) return ret; } +/* + * repalloc_extended + * Adjust the size of a previously allocated chunk, + * with HUGE and NO_OOM options. + */ +void * +repalloc_extended(void *pointer, Size size, int flags) +{ +#if defined(USE_ASSERT_CHECKING) || defined(USE_VALGRIND) + MemoryContext context = GetMemoryChunkContext(pointer); +#endif + void *ret; + + if (!((flags & MCXT_ALLOC_HUGE) != 0 ? AllocHugeSizeIsValid(size) : + AllocSizeIsValid(size))) + elog(ERROR, "invalid memory alloc request size %zu", size); + + AssertNotInCriticalSection(context); + + /* isReset must be false already */ + Assert(!context->isReset); + + ret = MCXT_METHOD(pointer, realloc) (pointer, size); + if (unlikely(ret == NULL)) + { + if ((flags & MCXT_ALLOC_NO_OOM) == 0) + { + MemoryContext cxt = GetMemoryChunkContext(pointer); + + MemoryContextStats(TopMemoryContext); + ereport(ERROR, + (errcode(ERRCODE_OUT_OF_MEMORY), + errmsg("out of memory"), + errdetail("Failed on request of size %zu in memory context \"%s\".", + size, cxt->name))); + } + return NULL; + } + + VALGRIND_MEMPOOL_CHANGE(context, pointer, ret, size); + + return ret; +} + /* * MemoryContextAllocHuge * Allocate (possibly-expansive) space within the specified context. @@ -1340,35 +1384,8 @@ MemoryContextAllocHuge(MemoryContext context, Size size)
Re: Modernizing our GUC infrastructure
On Tue, Sep 6, 2022 at 10:05 AM Tom Lane wrote: > > I haven't looked at that patch at all, but I would assume that > > variables would have SQL types, and that we would never add GUCs with > > SQL types, which seems like a pretty major semantic difference. > > Yeah, I do not think we'd want to extend GUCs beyond the existing > bool/int/float/string cases, since they have to be readable under > non-transactional circumstances. Having said that, that covers > an awful lot of practical territory. Schema variables of > arbitrary SQL types sound cool, sure, but how many real use cases > are there that can't be met with the GUC types? Well, if you use an undefined custom GUC, you're just going to get a string data type, I believe, which is pretty well equivalent to not having any type checking at all. You could extend that in some way to allow users to create dummy GUCs of any type supported by the mechanism, but I think that's mostly stacking one hack on top of another. I believe there's good evidence that users want variables based on SQL data types, whereas I can't see any reason why users would variables based on GUC data types. It is of course true that the GUC data types cover the cases people are mostly likely to want, but that's just because it covers the most generally useful data types. If you can want to pass an integer between one part of your application and another, why can't you want to pass a numeric or a bytea? I think you can, and I think people do. This is not really an endorsement of the SQL variables patch, which I haven't studied and which for all I know may have lots of problems, either as to design or as to implementation. But I think it's a little crazy to pretend that the ability to store strings - or even values of any GUC type - into a fictional GUC is an adequate substitute for SQL variables. Honestly, the fact that you can do that in the first place seems more like an undesirable wart necessitated by the way loadable modules interact with the GUC system than a feature -- but even if it were a feature, it's not the same feature. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Modernizing our GUC infrastructure
Hi > I think a large part of the reason the schema-variables patch > has gone sideways for so many years is that it's an ambitious > overdesign. > Last two weeks this patch is shorter and shorter. I removed a large part related to check of type consistency, because I can do this check more easily - and other work is done by dependencies. Big thanks to Julien - it does a lot of work and he shows me a lot of issues and possibilities on how to fix it. With Julien work this patch moved forward. Years before it was just a prototype. This patch is not too complex - important part is session_variable.c with 1500 lines , and it is almost simple code - store value to hashtab, and cleaning hash tab on sinval or on transaction end or abort + debug routine. [pavel@localhost commands]$ cloc session_variable.c 1 text file. 1 unique file. 0 files ignored. github.com/AlDanial/cloc v 1.90 T=0.02 s (50.0 files/s, 77011.1 lines/s) --- Language files blankcomment code --- C1257463 820 --- In other files there are +/- mechanical code > > regards, tom lane > > >
Re: Modernizing our GUC infrastructure
Robert Haas writes: > On Tue, Sep 6, 2022 at 1:43 AM Tom Lane wrote: >> I think there's a good analogy here to temporary tables. The SQL >> spec says that temp-table schemas are persistent and database-wide, >> but what we actually have is that they are session-local. > Well, I've thought about doing this a few times, but it's a real pain > in the neck, primarily because we store metadata that needs to be > per-instantiation in the catalog rows: relfrozenxid, relminmxid, and > the relation statistics. So I'm not sure "no one has bothered" is > quite the right way to characterize it. "no one has been able to > adequately untangle the mess" might be more accurate. I could agree on "no one has thought it was worth the work". It could be made to happen if we were sufficiently motivated, but we aren't. I believe a big chunk of the reason is that the SQL semantics are not obviously better than what we have. And some of the advantages they do have, like less catalog thrashing, wouldn't apply in the session variable case. > I haven't looked at that patch at all, but I would assume that > variables would have SQL types, and that we would never add GUCs with > SQL types, which seems like a pretty major semantic difference. Yeah, I do not think we'd want to extend GUCs beyond the existing bool/int/float/string cases, since they have to be readable under non-transactional circumstances. Having said that, that covers an awful lot of practical territory. Schema variables of arbitrary SQL types sound cool, sure, but how many real use cases are there that can't be met with the GUC types? I think a large part of the reason the schema-variables patch has gone sideways for so many years is that it's an ambitious overdesign. regards, tom lane
Re: Modernizing our GUC infrastructure
On Tue, Sep 6, 2022 at 1:43 AM Tom Lane wrote: > I think there's a good analogy here to temporary tables. The SQL > spec says that temp-table schemas are persistent and database-wide, > but what we actually have is that they are session-local. People > occasionally propose that we implement the SQL semantics for that, > but in the last twenty-plus years no one has bothered to write a > committable patch to support it ... much less remove the existing > behavior in favor of that, which I'm pretty sure no one would think > is a good idea. Well, I've thought about doing this a few times, but it's a real pain in the neck, primarily because we store metadata that needs to be per-instantiation in the catalog rows: relfrozenxid, relminmxid, and the relation statistics. So I'm not sure "no one has bothered" is quite the right way to characterize it. "no one has been able to adequately untangle the mess" might be more accurate. > So, is it actually a good idea to have persistent metadata for > session variables? I'd say that the issue is at best debatable, > and at worst proven wrong by a couple of decades of experience. > In what way are session variables less mutable than temp tables? I haven't looked at that patch at all, but I would assume that variables would have SQL types, and that we would never add GUCs with SQL types, which seems like a pretty major semantic difference. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Modernizing our GUC infrastructure
út 6. 9. 2022 v 7:42 odesílatel Tom Lane napsal: > Pavel Stehule writes: > > út 6. 9. 2022 v 6:32 odesílatel Pavel Stehule > > napsal: > >> 1. Session variables can be persistent - so the usage of session > variables > >> can be checked by static analyze like plpgsql_check > > > more precious - metadata of session variables are persistent > > Right ... so the question is, is that a feature or a bug? > > I think there's a good analogy here to temporary tables. The SQL > spec says that temp-table schemas are persistent and database-wide, > but what we actually have is that they are session-local. People > occasionally propose that we implement the SQL semantics for that, > but in the last twenty-plus years no one has bothered to write a > committable patch to support it ... much less remove the existing > behavior in favor of that, which I'm pretty sure no one would think > is a good idea. > > So, is it actually a good idea to have persistent metadata for > session variables? I'd say that the issue is at best debatable, > and at worst proven wrong by a couple of decades of experience. > In what way are session variables less mutable than temp tables? > The access pattern is very different. The session variable is like the temp table with exactly one row. It reduces a lot of overheads with storage (for reading, for writing). For example, the minimum size of an empty temp table is 8KB. You can store all "like" session values to one temp table, but then there will be brutal overhead with reading. > > Still, this discussion would be better placed on the other thread. > sure - faster GUC is great - there are a lot of applications that overuse GUC, because there are no other solutions now. But I don't think so it is good solution when somebody need some like global variables in procedural code. And the design of session variables is more wide. Regards Pavel > > regards, tom lane >
Re: Modernizing our GUC infrastructure
Pavel Stehule writes: > út 6. 9. 2022 v 6:32 odesílatel Pavel Stehule > napsal: >> 1. Session variables can be persistent - so the usage of session variables >> can be checked by static analyze like plpgsql_check > more precious - metadata of session variables are persistent Right ... so the question is, is that a feature or a bug? I think there's a good analogy here to temporary tables. The SQL spec says that temp-table schemas are persistent and database-wide, but what we actually have is that they are session-local. People occasionally propose that we implement the SQL semantics for that, but in the last twenty-plus years no one has bothered to write a committable patch to support it ... much less remove the existing behavior in favor of that, which I'm pretty sure no one would think is a good idea. So, is it actually a good idea to have persistent metadata for session variables? I'd say that the issue is at best debatable, and at worst proven wrong by a couple of decades of experience. In what way are session variables less mutable than temp tables? Still, this discussion would be better placed on the other thread. regards, tom lane
Re: Modernizing our GUC infrastructure
Pavel Stehule writes: > The bad performance is not the main reason for implementing session > variables (and in almost all cases the performance of GUC is not a problem, > because it is not a bottleneck, and in some terrible cases, I can save the > GUC to a variable). There are other differences: Well, yeah, the schema-variables patch offers a bunch of other features. What I'm not sure about is whether there's actually much field demand for those. I think if we fix guc.c's performance issues and add some simple features on top of that, like the ability to declare bool, int, float data types not just string for a user-defined GUC, we'd have exactly what a lot of people want --- not least because it'd be upwards-compatible with what they are already doing. However, that's probably a debate to have on the other thread not here. This patch doesn't foreclose pushing forward with the schema-variables patch, if people want that. regards, tom lane
Re: Modernizing our GUC infrastructure
Hi, On Tue, Sep 06, 2022 at 06:32:21AM +0200, Pavel Stehule wrote: > Hi > > út 6. 9. 2022 v 0:28 odesílatel Tom Lane napsal: > > > Attached is a patch series that attempts to modernize our GUC > > infrastructure, in particular removing the performance bottlenecks > > it has when there are lots of GUC variables. I wrote this because > > I am starting to question the schema-variables patch [1] --- that's > > getting to be quite a large patch and I grow less and less sure > > that it's solving a problem our users want solved. I think what > > people actually want is better support of the existing mechanism > > for ad-hoc session variables, namely abusing custom GUCs for that > > purpose. One of the big reasons we have been resistant to formally > > supporting that is fear of the poor performance guc.c would have > > with lots of variables. But we already have quite a lot of them: > > > > > The bad performance is not the main reason for implementing session > variables (and in almost all cases the performance of GUC is not a problem, > because it is not a bottleneck, and in some terrible cases, I can save the > GUC to a variable). There are other differences: > > 1. Session variables metadata can be persistent - so the usage of session > variables can be checked by static analyze like plpgsql_check > > 2. Session variables supports not atomic data types - so the work with row > types or arrays is much more comfortable and faster, because there is no > conversion string <-> binary > > 3. Session variables allows to set access rights > > 4. Session variables are nullable and allowed to specify default values. > > I don't think so users have ten thousand GUC and the huge count of GUC is > the main performance problem. The source of the performance problem is > storing the value only as string. I think we can also mention those differences with the proposed schema variables: - schema variables have normal SQL integration, having to use current_setting() isn't ideal (on top of only supporting text) and doesn't really play nice with pg_stat_statements for instance - schema variables implement stability in a single SQL statement (not in plpgsql), while current_setting always report the latest set value. This one may or may not be wanted, and maybe the discrepancy with procedural languages would be too problematic, but it's still something proposed
Re: Modernizing our GUC infrastructure
út 6. 9. 2022 v 6:32 odesílatel Pavel Stehule napsal: > Hi > > út 6. 9. 2022 v 0:28 odesílatel Tom Lane napsal: > >> Attached is a patch series that attempts to modernize our GUC >> infrastructure, in particular removing the performance bottlenecks >> it has when there are lots of GUC variables. I wrote this because >> I am starting to question the schema-variables patch [1] --- that's >> getting to be quite a large patch and I grow less and less sure >> that it's solving a problem our users want solved. I think what >> people actually want is better support of the existing mechanism >> for ad-hoc session variables, namely abusing custom GUCs for that >> purpose. One of the big reasons we have been resistant to formally >> supporting that is fear of the poor performance guc.c would have >> with lots of variables. But we already have quite a lot of them: >> >> > The bad performance is not the main reason for implementing session > variables (and in almost all cases the performance of GUC is not a problem, > because it is not a bottleneck, and in some terrible cases, I can save the > GUC to a variable). There are other differences: > > 1. Session variables can be persistent - so the usage of session variables > can be checked by static analyze like plpgsql_check > more precious - metadata of session variables are persistent > > 2. Session variables supports not atomic data types - so the work with row > types or arrays is much more comfortable and faster, because there is no > conversion string <-> binary > > 3. Session variables allows to set access rights > > 4. Session variables are nullable and allowed to specify default values. > > I don't think so users have ten thousand GUC and the huge count of GUC is > the main performance problem. The source of the performance problem is > storing the value only as string. > > Regards > > Pavel > > >
Re: Modernizing our GUC infrastructure
Hi út 6. 9. 2022 v 0:28 odesílatel Tom Lane napsal: > Attached is a patch series that attempts to modernize our GUC > infrastructure, in particular removing the performance bottlenecks > it has when there are lots of GUC variables. I wrote this because > I am starting to question the schema-variables patch [1] --- that's > getting to be quite a large patch and I grow less and less sure > that it's solving a problem our users want solved. I think what > people actually want is better support of the existing mechanism > for ad-hoc session variables, namely abusing custom GUCs for that > purpose. One of the big reasons we have been resistant to formally > supporting that is fear of the poor performance guc.c would have > with lots of variables. But we already have quite a lot of them: > > The bad performance is not the main reason for implementing session variables (and in almost all cases the performance of GUC is not a problem, because it is not a bottleneck, and in some terrible cases, I can save the GUC to a variable). There are other differences: 1. Session variables can be persistent - so the usage of session variables can be checked by static analyze like plpgsql_check 2. Session variables supports not atomic data types - so the work with row types or arrays is much more comfortable and faster, because there is no conversion string <-> binary 3. Session variables allows to set access rights 4. Session variables are nullable and allowed to specify default values. I don't think so users have ten thousand GUC and the huge count of GUC is the main performance problem. The source of the performance problem is storing the value only as string. Regards Pavel
Re: Modernizing our GUC infrastructure
ah, yes, that makes sense ;) On Tue, Sep 6, 2022 at 10:48 AM Tom Lane wrote: > > Junwang Zhao writes: > > /* > > - * Create table with 20% slack > > + * Create hash table with 20% slack > > */ > > size_vars = num_vars + num_vars / 4; > > > Should we change 20% to 25%, I thought that might be > > a typo. > > No ... 20% of the allocated space is spare. > > regards, tom lane -- Regards Junwang Zhao
Re: Modernizing our GUC infrastructure
Junwang Zhao writes: > /* > - * Create table with 20% slack > + * Create hash table with 20% slack > */ > size_vars = num_vars + num_vars / 4; > Should we change 20% to 25%, I thought that might be > a typo. No ... 20% of the allocated space is spare. regards, tom lane
Re: Modernizing our GUC infrastructure
Hi Tom, @@ -5836,74 +5865,106 @@ build_guc_variables(void) } /* - * Create table with 20% slack + * Create hash table with 20% slack */ size_vars = num_vars + num_vars / 4; Should we change 20% to 25%, I thought that might be a typo. On Tue, Sep 6, 2022 at 6:28 AM Tom Lane wrote: > > Attached is a patch series that attempts to modernize our GUC > infrastructure, in particular removing the performance bottlenecks > it has when there are lots of GUC variables. I wrote this because > I am starting to question the schema-variables patch [1] --- that's > getting to be quite a large patch and I grow less and less sure > that it's solving a problem our users want solved. I think what > people actually want is better support of the existing mechanism > for ad-hoc session variables, namely abusing custom GUCs for that > purpose. One of the big reasons we have been resistant to formally > supporting that is fear of the poor performance guc.c would have > with lots of variables. But we already have quite a lot of them: > > regression=# select count(*) from pg_settings; > count > --- >353 > (1 row) > > and more are getting added all the time. I think this patch series > could likely be justified just in terms of positive effect on core > performance, never mind user-added GUCs. > > 0001 and 0002 below are concerned with converting guc.c to store its > data in a dedicated memory context (GUCMemoryContext) instead of using > raw malloc(). This is not directly a performance improvement, and > I'm prepared to drop the idea if there's a lot of pushback, but I > think it would be a good thing to do. The only hard reason for using > malloc() there was the lack of ability to avoid throwing elog(ERROR) > on out-of-memory in palloc(). But mcxt.c grew that ability years ago. > Switching to a dedicated context would greatly improve visibility and > accountability of GUC-related allocations. Also, the 0003 patch will > switch guc.c to relying on a palloc-based hashtable, and it seems a > bit silly to have part of the data structure in palloc and part in > malloc. However 0002 is a bit invasive, in that it forces code > changes in GUC check callbacks, if they want to reallocate the new > value or create an "extra" data structure. My feeling is that not > enough external modules use those facilities for this to pose a big > problem. However, the ones that are subject to it will have a > non-fun time tracking down why their code is crashing. (The recent > context-header changes mean that you get a very obscure failure when > trying to pfree() a malloc'd chunk -- for me, that typically ends > in an assertion failure in generation.c. Can we make that less > confusing?) > > 0003 replaces guc.c's bsearch-a-sorted-array lookup infrastructure > with a dynahash hash table. (I also looked at simplehash, but it > has no support for not elog'ing on OOM, and it increases the size > of guc.o by 10KB or so.) This fixes the worse-than-O(N^2) time > needed to create N new GUCs, as in > > do $$ > begin > for i in 1..1 loop > perform set_config('foo.bar' || i::text, i::text, false); > end loop; > end $$; > > On my machine, this takes about 4700 ms in HEAD, versus 23 ms > with this patch. However, the places that were linearly scanning > the array now need to use hash_seq_search, so some other things > like transaction shutdown (AtEOXact_GUC) get slower. > > To address that, 0004 adds some auxiliary lists that link together > just the variables that are interesting for specific purposes. > This is helpful even without considering the possibility of a > lot of user-added GUCs: in a typical session, for example, not > many of those 353 GUCs have non-default values, and even fewer > get modified in any one transaction (typically, anyway). > > As an example of the speedup from 0004, these DO loops: > > create or replace function func_with_set(int) returns int > strict immutable language plpgsql as > $$ begin return $1; end $$ > set enable_seqscan = false; > > do $$ > begin > for i in 1..10 loop > perform func_with_set(i); > end loop; > end $$; > > do $$ > begin > for i in 1..10 loop > begin > perform func_with_set(i); > exception when others then raise; > end; > end loop; > end $$; > > take about 260 and 320 ms respectively for me, in HEAD with > just the stock set of variables. But after creating 1 > GUCs with the previous DO loop, they're up to about 3200 ms. > 0004 brings that back down to being indistinguishable from the > speed with few GUCs. > > So I think this is good cleanup in its own right, plus it > removes one major objection to considering user-defined GUCs > as a supported feature. > > regards, tom lane > > [1] > https://www.postgresql.org/message-id/flat/CAFj8pRD053CY_N4%3D6SvPe7ke6xPbh%3DK50LUAOwjC3jm8Me9Obg%40mail.gmail.com > -- Regards Junwang Zhao
Re: Modernizing our GUC infrastructure
Andres Freund writes: > It's only half related, but since we're talking about renovating guc.c: I > think it'd be good if we split the list of GUCs from the rest of the guc > machinery. Both for humans and compilers it's getting pretty large. And > commonly one either wants to edit the definition of GUCs or wants to edit the > GUC machinery. I don't mind doing that, but it seems like an independent patch. regards, tom lane
Re: Modernizing our GUC infrastructure
Hi, > I wrote this because I am starting to question the schema-variables patch > [1] --- that's getting to be quite a large patch and I grow less and less > sure that it's solving a problem our users want solved. --- that's getting > to be quite a large patch and I grow less and less sure that it's solving a > problem our users want solved. I think what people actually want is better > support of the existing mechanism for ad-hoc session variables, namely > abusing custom GUCs for that purpose. I don't really have an opinion on the highlevel directional question, yet anyway. But the stuff you're talking about changing in guc.c seem like a good idea independently. On 2022-09-05 18:27:46 -0400, Tom Lane wrote: > 0001 and 0002 below are concerned with converting guc.c to store its > data in a dedicated memory context (GUCMemoryContext) instead of using > raw malloc(). This is not directly a performance improvement, and > I'm prepared to drop the idea if there's a lot of pushback, but I > think it would be a good thing to do. +1 - I've been annoyed at this a couple times, even just because it makes it harder to identify memory leaks etc. > The only hard reason for using > malloc() there was the lack of ability to avoid throwing elog(ERROR) > on out-of-memory in palloc(). But mcxt.c grew that ability years ago. > Switching to a dedicated context would greatly improve visibility and > accountability of GUC-related allocations. Also, the 0003 patch will > switch guc.c to relying on a palloc-based hashtable, and it seems a > bit silly to have part of the data structure in palloc and part in > malloc. However 0002 is a bit invasive, in that it forces code > changes in GUC check callbacks, if they want to reallocate the new > value or create an "extra" data structure. My feeling is that not > enough external modules use those facilities for this to pose a big > problem. However, the ones that are subject to it will have a > non-fun time tracking down why their code is crashing. That sucks, but I think it's a bullet we're going to have to bite at some point. Perhaps we could do something like checking MemoryContextContains() and assert if not allocated in the right context? That way the crash is at least obvious. Or perhaps even transparently reallocate in that case? It does look like MemoryContextContains() currently is broken, I've raised that in the other thread. > (The recent context-header changes mean that you get a very obscure failure > when trying to pfree() a malloc'd chunk -- for me, that typically ends in an > assertion failure in generation.c. Can we make that less confusing?) Hm. We can do better in assert builds, but I'm not sure we want to add the overhead of explicit checks in normal builds, IIRC David measured the overhead of additional branches in pfree, and it was noticable. > 0003 replaces guc.c's bsearch-a-sorted-array lookup infrastructure > with a dynahash hash table. (I also looked at simplehash, but it > has no support for not elog'ing on OOM, and it increases the size > of guc.o by 10KB or so.) Dynahash seems reasonable here. Hard to believe raw lookup speed is a relevant bottleneck and due to the string names the key would be pretty wide (could obviously just be done via pointer, but then the locality benefits aren't as big). > However, the places that were linearly scanning the array now need to use > hash_seq_search, so some other things like transaction shutdown > (AtEOXact_GUC) get slower. > > To address that, 0004 adds some auxiliary lists that link together > just the variables that are interesting for specific purposes. Seems sane. It's only half related, but since we're talking about renovating guc.c: I think it'd be good if we split the list of GUCs from the rest of the guc machinery. Both for humans and compilers it's getting pretty large. And commonly one either wants to edit the definition of GUCs or wants to edit the GUC machinery. Greetings, Andres Freund