Re: Modernizing our GUC infrastructure

2022-10-07 Thread Tom Lane
I wrote:
>> Attached is a patch series that attempts to modernize our GUC
>> infrastructure, in particular removing the performance bottlenecks
>> it has when there are lots of GUC variables.

> Rebased over 0a20ff54f.

Here's a v3 rebased up to HEAD.  The only real change is that I added
a couple of "Assert(GetMemoryChunkContext(ptr) == GUCMemoryContext)"
checks in hopes of improving detection of not-updated code that is
still using malloc/free where it should be using guc_malloc/guc_free.
This is per the nearby discussion of whether the mcxt.c infrastructure
could recognize that [1].  I experimented a bit with leaving out parts
of the 0002 patch to simulate such mistakes, and at least on a Linux
box that seems to produce fairly intelligible errors now.  In the case
of free'ing a palloc'd pointer, what you get is a message from glibc
followed by abort(), so their error detection is pretty solid too.

I'm feeling pretty good about this patchset now.  Does anyone want
to review it further?

regards, tom lane

[1] https://postgr.es/m/2910981.1665080361%40sss.pgh.pa.us

commit 1b48708e28c00df94df1911548e43aaeec65ad76
Author: Tom Lane 
Date:   Fri Oct 7 14:39:07 2022 -0400

Preliminary improvements in memory-context infrastructure.

We lack a version of repalloc() that supports MCXT_ALLOC_NO_OOM
semantics, so invent repalloc_extended().  repalloc_huge()
becomes a legacy wrapper for that.

Also, fix dynahash.c so that it can support HASH_ENTER_NULL
requests when using the default palloc-based allocator.
The only reason it was like that was the lack of the
MCXT_ALLOC_NO_OOM option when that code was written, ages ago.

While here, simplify a few overcomplicated tests in mcxt.c.

diff --git a/src/backend/utils/hash/dynahash.c b/src/backend/utils/hash/dynahash.c
index 3babde8d70..4f62958883 100644
--- a/src/backend/utils/hash/dynahash.c
+++ b/src/backend/utils/hash/dynahash.c
@@ -289,7 +289,8 @@ static void *
 DynaHashAlloc(Size size)
 {
 	Assert(MemoryContextIsValid(CurrentDynaHashCxt));
-	return MemoryContextAlloc(CurrentDynaHashCxt, size);
+	return MemoryContextAllocExtended(CurrentDynaHashCxt, size,
+	  MCXT_ALLOC_NO_OOM);
 }
 
 
@@ -939,9 +940,7 @@ calc_bucket(HASHHDR *hctl, uint32 hash_val)
  *
  * HASH_ENTER will normally ereport a generic "out of memory" error if
  * it is unable to create a new entry.  The HASH_ENTER_NULL operation is
- * the same except it will return NULL if out of memory.  Note that
- * HASH_ENTER_NULL cannot be used with the default palloc-based allocator,
- * since palloc internally ereports on out-of-memory.
+ * the same except it will return NULL if out of memory.
  *
  * If foundPtr isn't NULL, then *foundPtr is set true if we found an
  * existing entry in the table, false otherwise.  This is needed in the
@@ -1084,12 +1083,8 @@ hash_search_with_hash_value(HTAB *hashp,
 			}
 			return NULL;
 
-		case HASH_ENTER_NULL:
-			/* ENTER_NULL does not work with palloc-based allocator */
-			Assert(hashp->alloc != DynaHashAlloc);
-			/* FALL THRU */
-
 		case HASH_ENTER:
+		case HASH_ENTER_NULL:
 			/* Return existing element if found, else create one */
 			if (currBucket != NULL)
 return (void *) ELEMENTKEY(currBucket);
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index b1a3c74830..012517be5c 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -1114,8 +1114,8 @@ MemoryContextAllocExtended(MemoryContext context, Size size, int flags)
 	AssertArg(MemoryContextIsValid(context));
 	AssertNotInCriticalSection(context);
 
-	if (((flags & MCXT_ALLOC_HUGE) != 0 && !AllocHugeSizeIsValid(size)) ||
-		((flags & MCXT_ALLOC_HUGE) == 0 && !AllocSizeIsValid(size)))
+	if (!((flags & MCXT_ALLOC_HUGE) != 0 ? AllocHugeSizeIsValid(size) :
+		  AllocSizeIsValid(size)))
 		elog(ERROR, "invalid memory alloc request size %zu", size);
 
 	context->isReset = false;
@@ -1269,8 +1269,8 @@ palloc_extended(Size size, int flags)
 	AssertArg(MemoryContextIsValid(context));
 	AssertNotInCriticalSection(context);
 
-	if (((flags & MCXT_ALLOC_HUGE) != 0 && !AllocHugeSizeIsValid(size)) ||
-		((flags & MCXT_ALLOC_HUGE) == 0 && !AllocSizeIsValid(size)))
+	if (!((flags & MCXT_ALLOC_HUGE) != 0 ? AllocHugeSizeIsValid(size) :
+		  AllocSizeIsValid(size)))
 		elog(ERROR, "invalid memory alloc request size %zu", size);
 
 	context->isReset = false;
@@ -1351,6 +1351,50 @@ repalloc(void *pointer, Size size)
 	return ret;
 }
 
+/*
+ * repalloc_extended
+ *		Adjust the size of a previously allocated chunk,
+ *		with HUGE and NO_OOM options.
+ */
+void *
+repalloc_extended(void *pointer, Size size, int flags)
+{
+#if defined(USE_ASSERT_CHECKING) || defined(USE_VALGRIND)
+	MemoryContext context = GetMemoryChunkContext(pointer);
+#endif
+	void	   *ret;
+
+	if (!((flags & MCXT_ALLOC_HUGE) != 0 ? AllocHugeSizeIsValid(size) :
+		  AllocSizeIsValid(size)))
+		elog(ERROR, "invalid memory alloc reque

Re: Modernizing our GUC infrastructure

2022-09-13 Thread Tom Lane
I wrote:
> Attached is a patch series that attempts to modernize our GUC
> infrastructure, in particular removing the performance bottlenecks
> it has when there are lots of GUC variables.

Rebased over 0a20ff54f.

regards, tom lane

commit d8b8742e80ef6f1e6546ddb9767bd846e8ff761c
Author: Tom Lane 
Date:   Tue Sep 13 11:36:56 2022 -0400

Preliminary improvements in memory-context infrastructure.

We lack a version of repalloc() that supports MCXT_ALLOC_NO_OOM
semantics, so invent repalloc_extended().  repalloc_huge()
becomes a legacy wrapper for that.

Also, fix dynahash.c so that it can support HASH_ENTER_NULL
requests when using the default palloc-based allocator.
The only reason it was like that was the lack of the
MCXT_ALLOC_NO_OOM option when that code was written, ages ago.

While here, simplify a few overcomplicated tests in mcxt.c.

diff --git a/src/backend/utils/hash/dynahash.c b/src/backend/utils/hash/dynahash.c
index 3babde8d70..4f62958883 100644
--- a/src/backend/utils/hash/dynahash.c
+++ b/src/backend/utils/hash/dynahash.c
@@ -289,7 +289,8 @@ static void *
 DynaHashAlloc(Size size)
 {
 	Assert(MemoryContextIsValid(CurrentDynaHashCxt));
-	return MemoryContextAlloc(CurrentDynaHashCxt, size);
+	return MemoryContextAllocExtended(CurrentDynaHashCxt, size,
+	  MCXT_ALLOC_NO_OOM);
 }
 
 
@@ -939,9 +940,7 @@ calc_bucket(HASHHDR *hctl, uint32 hash_val)
  *
  * HASH_ENTER will normally ereport a generic "out of memory" error if
  * it is unable to create a new entry.  The HASH_ENTER_NULL operation is
- * the same except it will return NULL if out of memory.  Note that
- * HASH_ENTER_NULL cannot be used with the default palloc-based allocator,
- * since palloc internally ereports on out-of-memory.
+ * the same except it will return NULL if out of memory.
  *
  * If foundPtr isn't NULL, then *foundPtr is set true if we found an
  * existing entry in the table, false otherwise.  This is needed in the
@@ -1084,12 +1083,8 @@ hash_search_with_hash_value(HTAB *hashp,
 			}
 			return NULL;
 
-		case HASH_ENTER_NULL:
-			/* ENTER_NULL does not work with palloc-based allocator */
-			Assert(hashp->alloc != DynaHashAlloc);
-			/* FALL THRU */
-
 		case HASH_ENTER:
+		case HASH_ENTER_NULL:
 			/* Return existing element if found, else create one */
 			if (currBucket != NULL)
 return (void *) ELEMENTKEY(currBucket);
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 115a64cfe4..80f99d51fc 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -1060,8 +1060,8 @@ MemoryContextAllocExtended(MemoryContext context, Size size, int flags)
 	AssertArg(MemoryContextIsValid(context));
 	AssertNotInCriticalSection(context);
 
-	if (((flags & MCXT_ALLOC_HUGE) != 0 && !AllocHugeSizeIsValid(size)) ||
-		((flags & MCXT_ALLOC_HUGE) == 0 && !AllocSizeIsValid(size)))
+	if (!((flags & MCXT_ALLOC_HUGE) != 0 ? AllocHugeSizeIsValid(size) :
+		  AllocSizeIsValid(size)))
 		elog(ERROR, "invalid memory alloc request size %zu", size);
 
 	context->isReset = false;
@@ -1215,8 +1215,8 @@ palloc_extended(Size size, int flags)
 	AssertArg(MemoryContextIsValid(context));
 	AssertNotInCriticalSection(context);
 
-	if (((flags & MCXT_ALLOC_HUGE) != 0 && !AllocHugeSizeIsValid(size)) ||
-		((flags & MCXT_ALLOC_HUGE) == 0 && !AllocSizeIsValid(size)))
+	if (!((flags & MCXT_ALLOC_HUGE) != 0 ? AllocHugeSizeIsValid(size) :
+		  AllocSizeIsValid(size)))
 		elog(ERROR, "invalid memory alloc request size %zu", size);
 
 	context->isReset = false;
@@ -1297,6 +1297,50 @@ repalloc(void *pointer, Size size)
 	return ret;
 }
 
+/*
+ * repalloc_extended
+ *		Adjust the size of a previously allocated chunk,
+ *		with HUGE and NO_OOM options.
+ */
+void *
+repalloc_extended(void *pointer, Size size, int flags)
+{
+#if defined(USE_ASSERT_CHECKING) || defined(USE_VALGRIND)
+	MemoryContext context = GetMemoryChunkContext(pointer);
+#endif
+	void	   *ret;
+
+	if (!((flags & MCXT_ALLOC_HUGE) != 0 ? AllocHugeSizeIsValid(size) :
+		  AllocSizeIsValid(size)))
+		elog(ERROR, "invalid memory alloc request size %zu", size);
+
+	AssertNotInCriticalSection(context);
+
+	/* isReset must be false already */
+	Assert(!context->isReset);
+
+	ret = MCXT_METHOD(pointer, realloc) (pointer, size);
+	if (unlikely(ret == NULL))
+	{
+		if ((flags & MCXT_ALLOC_NO_OOM) == 0)
+		{
+			MemoryContext cxt = GetMemoryChunkContext(pointer);
+
+			MemoryContextStats(TopMemoryContext);
+			ereport(ERROR,
+	(errcode(ERRCODE_OUT_OF_MEMORY),
+	 errmsg("out of memory"),
+	 errdetail("Failed on request of size %zu in memory context \"%s\".",
+			   size, cxt->name)));
+		}
+		return NULL;
+	}
+
+	VALGRIND_MEMPOOL_CHANGE(context, pointer, ret, size);
+
+	return ret;
+}
+
 /*
  * MemoryContextAllocHuge
  *		Allocate (possibly-expansive) space within the specified context.
@@ -1340,35 +1384,8 @@ MemoryContextAllocHuge(MemoryContext context, Size size)

Re: Modernizing our GUC infrastructure

2022-09-06 Thread Robert Haas
On Tue, Sep 6, 2022 at 10:05 AM Tom Lane  wrote:
> > I haven't looked at that patch at all, but I would assume that
> > variables would have SQL types, and that we would never add GUCs with
> > SQL types, which seems like a pretty major semantic difference.
>
> Yeah, I do not think we'd want to extend GUCs beyond the existing
> bool/int/float/string cases, since they have to be readable under
> non-transactional circumstances.  Having said that, that covers
> an awful lot of practical territory.  Schema variables of
> arbitrary SQL types sound cool, sure, but how many real use cases
> are there that can't be met with the GUC types?

Well, if you use an undefined custom GUC, you're just going to get a
string data type, I believe, which is pretty well equivalent to not
having any type checking at all. You could extend that in some way to
allow users to create dummy GUCs of any type supported by the
mechanism, but I think that's mostly stacking one hack on top of
another. I believe there's good evidence that users want variables
based on SQL data types, whereas I can't see any reason why users
would variables based on GUC data types. It is of course true that the
GUC data types cover the cases people are mostly likely to want, but
that's just because it covers the most generally useful data types. If
you can want to pass an integer between one part of your application
and another, why can't you want to pass a numeric or a bytea? I think
you can, and I think people do.

This is not really an endorsement of the SQL variables patch, which I
haven't studied and which for all I know may have lots of problems,
either as to design or as to implementation. But I think it's a little
crazy to pretend that the ability to store strings - or even values of
any GUC type - into a fictional GUC is an adequate substitute for SQL
variables. Honestly, the fact that you can do that in the first place
seems more like an undesirable wart necessitated by the way loadable
modules interact with the GUC system than a feature -- but even if it
were a feature, it's not the same feature.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Modernizing our GUC infrastructure

2022-09-06 Thread Pavel Stehule
Hi


> I think a large part of the reason the schema-variables patch
> has gone sideways for so many years is that it's an ambitious
> overdesign.
>

Last two  weeks this patch is shorter and shorter. I removed a large part
related to check of type consistency, because I can do this check more
easily - and other work is done by dependencies.

Big thanks to Julien - it does a lot of work and he shows me a lot of
issues and possibilities on how to fix it. With Julien work this patch
moved forward. Years before it was just a prototype.

This patch is not too complex - important part is session_variable.c with
1500 lines , and it is almost simple code - store value to hashtab, and
cleaning hash tab on sinval or on transaction end or abort + debug routine.

[pavel@localhost commands]$ cloc session_variable.c
   1 text file.
   1 unique file.
   0 files ignored.

github.com/AlDanial/cloc v 1.90  T=0.02 s (50.0 files/s, 77011.1 lines/s)
---
Language files  blankcomment
code
---
C1257463
 820
---

In other files there are +/- mechanical code





>
> regards, tom lane
>
>
>


Re: Modernizing our GUC infrastructure

2022-09-06 Thread Tom Lane
Robert Haas  writes:
> On Tue, Sep 6, 2022 at 1:43 AM Tom Lane  wrote:
>> I think there's a good analogy here to temporary tables.  The SQL
>> spec says that temp-table schemas are persistent and database-wide,
>> but what we actually have is that they are session-local.

> Well, I've thought about doing this a few times, but it's a real pain
> in the neck, primarily because we store metadata that needs to be
> per-instantiation in the catalog rows: relfrozenxid, relminmxid, and
> the relation statistics. So I'm not sure "no one has bothered" is
> quite the right way to characterize it. "no one has been able to
> adequately untangle the mess" might be more accurate.

I could agree on "no one has thought it was worth the work".  It could
be made to happen if we were sufficiently motivated, but we aren't.
I believe a big chunk of the reason is that the SQL semantics are not
obviously better than what we have.  And some of the advantages they
do have, like less catalog thrashing, wouldn't apply in the session
variable case.

> I haven't looked at that patch at all, but I would assume that
> variables would have SQL types, and that we would never add GUCs with
> SQL types, which seems like a pretty major semantic difference.

Yeah, I do not think we'd want to extend GUCs beyond the existing
bool/int/float/string cases, since they have to be readable under
non-transactional circumstances.  Having said that, that covers
an awful lot of practical territory.  Schema variables of
arbitrary SQL types sound cool, sure, but how many real use cases
are there that can't be met with the GUC types?

I think a large part of the reason the schema-variables patch
has gone sideways for so many years is that it's an ambitious
overdesign.

regards, tom lane




Re: Modernizing our GUC infrastructure

2022-09-06 Thread Robert Haas
On Tue, Sep 6, 2022 at 1:43 AM Tom Lane  wrote:
> I think there's a good analogy here to temporary tables.  The SQL
> spec says that temp-table schemas are persistent and database-wide,
> but what we actually have is that they are session-local.  People
> occasionally propose that we implement the SQL semantics for that,
> but in the last twenty-plus years no one has bothered to write a
> committable patch to support it ... much less remove the existing
> behavior in favor of that, which I'm pretty sure no one would think
> is a good idea.

Well, I've thought about doing this a few times, but it's a real pain
in the neck, primarily because we store metadata that needs to be
per-instantiation in the catalog rows: relfrozenxid, relminmxid, and
the relation statistics. So I'm not sure "no one has bothered" is
quite the right way to characterize it. "no one has been able to
adequately untangle the mess" might be more accurate.

> So, is it actually a good idea to have persistent metadata for
> session variables?  I'd say that the issue is at best debatable,
> and at worst proven wrong by a couple of decades of experience.
> In what way are session variables less mutable than temp tables?

I haven't looked at that patch at all, but I would assume that
variables would have SQL types, and that we would never add GUCs with
SQL types, which seems like a pretty major semantic difference.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Modernizing our GUC infrastructure

2022-09-05 Thread Pavel Stehule
út 6. 9. 2022 v 7:42 odesílatel Tom Lane  napsal:

> Pavel Stehule  writes:
> > út 6. 9. 2022 v 6:32 odesílatel Pavel Stehule 
> > napsal:
> >> 1. Session variables can be persistent - so the usage of session
> variables
> >> can be checked by static analyze like plpgsql_check
>
> > more precious - metadata of session variables are persistent
>
> Right ... so the question is, is that a feature or a bug?
>
> I think there's a good analogy here to temporary tables.  The SQL
> spec says that temp-table schemas are persistent and database-wide,
> but what we actually have is that they are session-local.  People
> occasionally propose that we implement the SQL semantics for that,
> but in the last twenty-plus years no one has bothered to write a
> committable patch to support it ... much less remove the existing
> behavior in favor of that, which I'm pretty sure no one would think
> is a good idea.
>
> So, is it actually a good idea to have persistent metadata for
> session variables?  I'd say that the issue is at best debatable,
> and at worst proven wrong by a couple of decades of experience.
> In what way are session variables less mutable than temp tables?
>

The access pattern is very different. The session variable is like the temp
table with exactly one row. It reduces a lot of overheads with storage (for
reading, for writing).

For example, the minimum size of an empty temp table is 8KB. You can store
all "like" session values to one temp table, but then there will be brutal
overhead with reading.


>
> Still, this discussion would be better placed on the other thread.
>

sure - faster GUC is great - there are a lot of applications that overuse
GUC, because there are no other solutions now.  But I don't think so it is
good solution when somebody need some like global variables in procedural
code. And the design of session variables is more wide.

Regards

Pavel

>
> regards, tom lane
>


Re: Modernizing our GUC infrastructure

2022-09-05 Thread Tom Lane
Pavel Stehule  writes:
> út 6. 9. 2022 v 6:32 odesílatel Pavel Stehule 
> napsal:
>> 1. Session variables can be persistent - so the usage of session variables
>> can be checked by static analyze like plpgsql_check

> more precious - metadata of session variables are persistent

Right ... so the question is, is that a feature or a bug?

I think there's a good analogy here to temporary tables.  The SQL
spec says that temp-table schemas are persistent and database-wide,
but what we actually have is that they are session-local.  People
occasionally propose that we implement the SQL semantics for that,
but in the last twenty-plus years no one has bothered to write a
committable patch to support it ... much less remove the existing
behavior in favor of that, which I'm pretty sure no one would think
is a good idea.

So, is it actually a good idea to have persistent metadata for
session variables?  I'd say that the issue is at best debatable,
and at worst proven wrong by a couple of decades of experience.
In what way are session variables less mutable than temp tables?

Still, this discussion would be better placed on the other thread.

regards, tom lane




Re: Modernizing our GUC infrastructure

2022-09-05 Thread Tom Lane
Pavel Stehule  writes:
> The bad performance is not the main reason for implementing session
> variables (and in almost all cases the performance of GUC is not a problem,
> because it is not a bottleneck, and in some terrible cases, I can save the
> GUC to a variable). There are other differences:

Well, yeah, the schema-variables patch offers a bunch of other features.
What I'm not sure about is whether there's actually much field demand
for those.  I think if we fix guc.c's performance issues and add some
simple features on top of that, like the ability to declare bool, int,
float data types not just string for a user-defined GUC, we'd have
exactly what a lot of people want --- not least because it'd be
upwards-compatible with what they are already doing.

However, that's probably a debate to have on the other thread not here.
This patch doesn't foreclose pushing forward with the schema-variables
patch, if people want that.

regards, tom lane




Re: Modernizing our GUC infrastructure

2022-09-05 Thread Julien Rouhaud
Hi,

On Tue, Sep 06, 2022 at 06:32:21AM +0200, Pavel Stehule wrote:
> Hi
>
> út 6. 9. 2022 v 0:28 odesílatel Tom Lane  napsal:
>
> > Attached is a patch series that attempts to modernize our GUC
> > infrastructure, in particular removing the performance bottlenecks
> > it has when there are lots of GUC variables.  I wrote this because
> > I am starting to question the schema-variables patch [1] --- that's
> > getting to be quite a large patch and I grow less and less sure
> > that it's solving a problem our users want solved.  I think what
> > people actually want is better support of the existing mechanism
> > for ad-hoc session variables, namely abusing custom GUCs for that
> > purpose.  One of the big reasons we have been resistant to formally
> > supporting that is fear of the poor performance guc.c would have
> > with lots of variables.  But we already have quite a lot of them:
> >
> >
> The bad performance is not the main reason for implementing session
> variables (and in almost all cases the performance of GUC is not a problem,
> because it is not a bottleneck, and in some terrible cases, I can save the
> GUC to a variable). There are other differences:
>
> 1. Session variables metadata can be persistent - so the usage of session
> variables can be checked by static analyze like plpgsql_check
>
> 2. Session variables supports not atomic data types - so the work with row
> types or arrays is much more comfortable and faster, because there is no
> conversion string <-> binary
>
> 3. Session variables allows to set access rights
>
> 4. Session variables are nullable and allowed to specify default values.
>
> I don't think so users have ten thousand GUC and the huge count of GUC is
> the main performance problem. The source of the performance problem is
> storing the value only as string.

I think we can also mention those differences with the proposed schema
variables:

- schema variables have normal SQL integration, having to use current_setting()
  isn't ideal (on top of only supporting text) and doesn't really play nice
  with pg_stat_statements for instance

- schema variables implement stability in a single SQL statement (not in
  plpgsql), while current_setting always report the latest set value.  This one
  may or may not be wanted, and maybe the discrepancy with procedural languages
  would be too problematic, but it's still something proposed




Re: Modernizing our GUC infrastructure

2022-09-05 Thread Pavel Stehule
út 6. 9. 2022 v 6:32 odesílatel Pavel Stehule 
napsal:

> Hi
>
> út 6. 9. 2022 v 0:28 odesílatel Tom Lane  napsal:
>
>> Attached is a patch series that attempts to modernize our GUC
>> infrastructure, in particular removing the performance bottlenecks
>> it has when there are lots of GUC variables.  I wrote this because
>> I am starting to question the schema-variables patch [1] --- that's
>> getting to be quite a large patch and I grow less and less sure
>> that it's solving a problem our users want solved.  I think what
>> people actually want is better support of the existing mechanism
>> for ad-hoc session variables, namely abusing custom GUCs for that
>> purpose.  One of the big reasons we have been resistant to formally
>> supporting that is fear of the poor performance guc.c would have
>> with lots of variables.  But we already have quite a lot of them:
>>
>>
> The bad performance is not the main reason for implementing session
> variables (and in almost all cases the performance of GUC is not a problem,
> because it is not a bottleneck, and in some terrible cases, I can save the
> GUC to a variable). There are other differences:
>
> 1. Session variables can be persistent - so the usage of session variables
> can be checked by static analyze like plpgsql_check
>

more precious - metadata of session variables are persistent


>
> 2. Session variables supports not atomic data types - so the work with row
> types or arrays is much more comfortable and faster, because there is no
> conversion string <-> binary
>
> 3. Session variables allows to set access rights
>
> 4. Session variables are nullable and allowed to specify default values.
>
> I don't think so users have ten thousand GUC and the huge count of GUC is
> the main performance problem. The source of the performance problem is
> storing the value only as string.
>
> Regards
>
> Pavel
>
>
>


Re: Modernizing our GUC infrastructure

2022-09-05 Thread Pavel Stehule
Hi

út 6. 9. 2022 v 0:28 odesílatel Tom Lane  napsal:

> Attached is a patch series that attempts to modernize our GUC
> infrastructure, in particular removing the performance bottlenecks
> it has when there are lots of GUC variables.  I wrote this because
> I am starting to question the schema-variables patch [1] --- that's
> getting to be quite a large patch and I grow less and less sure
> that it's solving a problem our users want solved.  I think what
> people actually want is better support of the existing mechanism
> for ad-hoc session variables, namely abusing custom GUCs for that
> purpose.  One of the big reasons we have been resistant to formally
> supporting that is fear of the poor performance guc.c would have
> with lots of variables.  But we already have quite a lot of them:
>
>
The bad performance is not the main reason for implementing session
variables (and in almost all cases the performance of GUC is not a problem,
because it is not a bottleneck, and in some terrible cases, I can save the
GUC to a variable). There are other differences:

1. Session variables can be persistent - so the usage of session variables
can be checked by static analyze like plpgsql_check

2. Session variables supports not atomic data types - so the work with row
types or arrays is much more comfortable and faster, because there is no
conversion string <-> binary

3. Session variables allows to set access rights

4. Session variables are nullable and allowed to specify default values.

I don't think so users have ten thousand GUC and the huge count of GUC is
the main performance problem. The source of the performance problem is
storing the value only as string.

Regards

Pavel


Re: Modernizing our GUC infrastructure

2022-09-05 Thread Junwang Zhao
ah, yes, that makes sense ;)

On Tue, Sep 6, 2022 at 10:48 AM Tom Lane  wrote:
>
> Junwang Zhao  writes:
> >   /*
> > - * Create table with 20% slack
> > + * Create hash table with 20% slack
> >   */
> >   size_vars = num_vars + num_vars / 4;
>
> > Should we change 20% to 25%, I thought that might be
> > a typo.
>
> No ... 20% of the allocated space is spare.
>
> regards, tom lane



-- 
Regards
Junwang Zhao




Re: Modernizing our GUC infrastructure

2022-09-05 Thread Tom Lane
Junwang Zhao  writes:
>   /*
> - * Create table with 20% slack
> + * Create hash table with 20% slack
>   */
>   size_vars = num_vars + num_vars / 4;

> Should we change 20% to 25%, I thought that might be
> a typo.

No ... 20% of the allocated space is spare.

regards, tom lane




Re: Modernizing our GUC infrastructure

2022-09-05 Thread Junwang Zhao
Hi Tom,

@@ -5836,74 +5865,106 @@ build_guc_variables(void)
  }

  /*
- * Create table with 20% slack
+ * Create hash table with 20% slack
  */
  size_vars = num_vars + num_vars / 4;

Should we change 20% to 25%, I thought that might be
a typo.

On Tue, Sep 6, 2022 at 6:28 AM Tom Lane  wrote:
>
> Attached is a patch series that attempts to modernize our GUC
> infrastructure, in particular removing the performance bottlenecks
> it has when there are lots of GUC variables.  I wrote this because
> I am starting to question the schema-variables patch [1] --- that's
> getting to be quite a large patch and I grow less and less sure
> that it's solving a problem our users want solved.  I think what
> people actually want is better support of the existing mechanism
> for ad-hoc session variables, namely abusing custom GUCs for that
> purpose.  One of the big reasons we have been resistant to formally
> supporting that is fear of the poor performance guc.c would have
> with lots of variables.  But we already have quite a lot of them:
>
> regression=# select count(*) from pg_settings;
>  count
> ---
>353
> (1 row)
>
> and more are getting added all the time.  I think this patch series
> could likely be justified just in terms of positive effect on core
> performance, never mind user-added GUCs.
>
> 0001 and 0002 below are concerned with converting guc.c to store its
> data in a dedicated memory context (GUCMemoryContext) instead of using
> raw malloc().  This is not directly a performance improvement, and
> I'm prepared to drop the idea if there's a lot of pushback, but I
> think it would be a good thing to do.  The only hard reason for using
> malloc() there was the lack of ability to avoid throwing elog(ERROR)
> on out-of-memory in palloc().  But mcxt.c grew that ability years ago.
> Switching to a dedicated context would greatly improve visibility and
> accountability of GUC-related allocations.  Also, the 0003 patch will
> switch guc.c to relying on a palloc-based hashtable, and it seems a
> bit silly to have part of the data structure in palloc and part in
> malloc.  However 0002 is a bit invasive, in that it forces code
> changes in GUC check callbacks, if they want to reallocate the new
> value or create an "extra" data structure.  My feeling is that not
> enough external modules use those facilities for this to pose a big
> problem.  However, the ones that are subject to it will have a
> non-fun time tracking down why their code is crashing.  (The recent
> context-header changes mean that you get a very obscure failure when
> trying to pfree() a malloc'd chunk -- for me, that typically ends
> in an assertion failure in generation.c.  Can we make that less
> confusing?)
>
> 0003 replaces guc.c's bsearch-a-sorted-array lookup infrastructure
> with a dynahash hash table.  (I also looked at simplehash, but it
> has no support for not elog'ing on OOM, and it increases the size
> of guc.o by 10KB or so.)  This fixes the worse-than-O(N^2) time
> needed to create N new GUCs, as in
>
> do $$
> begin
>   for i in 1..1 loop
> perform set_config('foo.bar' || i::text, i::text, false);
>   end loop;
> end $$;
>
> On my machine, this takes about 4700 ms in HEAD, versus 23 ms
> with this patch.  However, the places that were linearly scanning
> the array now need to use hash_seq_search, so some other things
> like transaction shutdown (AtEOXact_GUC) get slower.
>
> To address that, 0004 adds some auxiliary lists that link together
> just the variables that are interesting for specific purposes.
> This is helpful even without considering the possibility of a
> lot of user-added GUCs: in a typical session, for example, not
> many of those 353 GUCs have non-default values, and even fewer
> get modified in any one transaction (typically, anyway).
>
> As an example of the speedup from 0004, these DO loops:
>
> create or replace function func_with_set(int) returns int
> strict immutable language plpgsql as
> $$ begin return $1; end $$
> set enable_seqscan = false;
>
> do $$
> begin
>   for i in 1..10 loop
> perform func_with_set(i);
>   end loop;
> end $$;
>
> do $$
> begin
>   for i in 1..10 loop
> begin
>   perform func_with_set(i);
> exception when others then raise;
> end;
>   end loop;
> end $$;
>
> take about 260 and 320 ms respectively for me, in HEAD with
> just the stock set of variables.  But after creating 1
> GUCs with the previous DO loop, they're up to about 3200 ms.
> 0004 brings that back down to being indistinguishable from the
> speed with few GUCs.
>
> So I think this is good cleanup in its own right, plus it
> removes one major objection to considering user-defined GUCs
> as a supported feature.
>
> regards, tom lane
>
> [1] 
> https://www.postgresql.org/message-id/flat/CAFj8pRD053CY_N4%3D6SvPe7ke6xPbh%3DK50LUAOwjC3jm8Me9Obg%40mail.gmail.com
>


-- 
Regards
Junwang Zhao




Re: Modernizing our GUC infrastructure

2022-09-05 Thread Tom Lane
Andres Freund  writes:
> It's only half related, but since we're talking about renovating guc.c: I
> think it'd be good if we split the list of GUCs from the rest of the guc
> machinery. Both for humans and compilers it's getting pretty large. And
> commonly one either wants to edit the definition of GUCs or wants to edit the
> GUC machinery.

I don't mind doing that, but it seems like an independent patch.

regards, tom lane




Re: Modernizing our GUC infrastructure

2022-09-05 Thread Andres Freund
Hi,

> I wrote this because I am starting to question the schema-variables patch
> [1] --- that's getting to be quite a large patch and I grow less and less
> sure that it's solving a problem our users want solved. --- that's getting
> to be quite a large patch and I grow less and less sure that it's solving a
> problem our users want solved.  I think what people actually want is better
> support of the existing mechanism for ad-hoc session variables, namely
> abusing custom GUCs for that purpose.

I don't really have an opinion on the highlevel directional question, yet
anyway. But the stuff you're talking about changing in guc.c seem like a good
idea independently.


On 2022-09-05 18:27:46 -0400, Tom Lane wrote:
> 0001 and 0002 below are concerned with converting guc.c to store its
> data in a dedicated memory context (GUCMemoryContext) instead of using
> raw malloc().  This is not directly a performance improvement, and
> I'm prepared to drop the idea if there's a lot of pushback, but I
> think it would be a good thing to do.

+1 - I've been annoyed at this a couple times, even just because it makes it
harder to identify memory leaks etc.


> The only hard reason for using
> malloc() there was the lack of ability to avoid throwing elog(ERROR)
> on out-of-memory in palloc().  But mcxt.c grew that ability years ago.
> Switching to a dedicated context would greatly improve visibility and
> accountability of GUC-related allocations.  Also, the 0003 patch will
> switch guc.c to relying on a palloc-based hashtable, and it seems a
> bit silly to have part of the data structure in palloc and part in
> malloc.  However 0002 is a bit invasive, in that it forces code
> changes in GUC check callbacks, if they want to reallocate the new
> value or create an "extra" data structure.  My feeling is that not
> enough external modules use those facilities for this to pose a big
> problem.  However, the ones that are subject to it will have a
> non-fun time tracking down why their code is crashing.

That sucks, but I think it's a bullet we're going to have to bite at some
point.

Perhaps we could do something like checking MemoryContextContains() and assert
if not allocated in the right context? That way the crash is at least
obvious. Or perhaps even transparently reallocate in that case?  It does look
like MemoryContextContains() currently is broken, I've raised that in the
other thread.


> (The recent context-header changes mean that you get a very obscure failure
> when trying to pfree() a malloc'd chunk -- for me, that typically ends in an
> assertion failure in generation.c.  Can we make that less confusing?)

Hm. We can do better in assert builds, but I'm not sure we want to add the
overhead of explicit checks in normal builds, IIRC David measured the overhead
of additional branches in pfree, and it was noticable.


> 0003 replaces guc.c's bsearch-a-sorted-array lookup infrastructure
> with a dynahash hash table.  (I also looked at simplehash, but it
> has no support for not elog'ing on OOM, and it increases the size
> of guc.o by 10KB or so.)

Dynahash seems reasonable here. Hard to believe raw lookup speed is a relevant
bottleneck and due to the string names the key would be pretty wide (could
obviously just be done via pointer, but then the locality benefits aren't as
big).


> However, the places that were linearly scanning the array now need to use
> hash_seq_search, so some other things like transaction shutdown
> (AtEOXact_GUC) get slower.
>
> To address that, 0004 adds some auxiliary lists that link together
> just the variables that are interesting for specific purposes.

Seems sane.


It's only half related, but since we're talking about renovating guc.c: I
think it'd be good if we split the list of GUCs from the rest of the guc
machinery. Both for humans and compilers it's getting pretty large. And
commonly one either wants to edit the definition of GUCs or wants to edit the
GUC machinery.


Greetings,

Andres Freund