Re: Extract numeric filed in JSONB more effectively

2023-08-03 Thread Andy Fan
Hi Tom:

On Fri, Aug 4, 2023 at 3:13 AM Tom Lane  wrote:

> Andy Fan  writes:
> >> If you use explicit cast, then the code should not be hard, in the
> >> rewrite stage all information should be known.
>
> > Can you point to me where the code is for the XML stuff?
>
> I think Pavel means XMLTABLE, which IMO is an ugly monstrosity of
> syntax --- but count on the SQL committee to do it that way :-(.
>

Thanks for this input!


>
> As far as the current discussion goes, I'm strongly against
> introducing new functions or operators to do the same things that
> we already have perfectly good syntax for.  "There's more than one
> way to do it" isn't necessarily a virtue, and for sure it isn't a
> virtue if people have to rewrite their existing queries to make use
> of your optimization.
>

I agree; this is the best (and really the only) kind of reason I'd accept.


>
>
> I do like the idea of attaching a Simplify planner support function
> to jsonb_numeric (and any other ones that seem worth optimizing)
>

I studied planner support functions today; they look great, and I don't
think we need much work to reach our goal. That's amazing.

For everyone interested in this topic: I will post a planner support
function soon, and you can review it then.
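
For readers following along, Tom's suggestion would be shaped roughly like the sketch below. This is an illustrative, uncompiled sketch only: SupportRequestSimplify and F_JSONB_OBJECT_FIELD are real PostgreSQL symbols, but the fused-function idea and the transformation details are my assumptions about the eventual patch, not its actual contents.

```c
/* Hypothetical planner support function for jsonb_numeric() (sketch only) */
Datum
jsonb_numeric_support(PG_FUNCTION_ARGS)
{
	Node	   *rawreq = (Node *) PG_GETARG_POINTER(0);
	Node	   *ret = NULL;

	if (IsA(rawreq, SupportRequestSimplify))
	{
		SupportRequestSimplify *req = (SupportRequestSimplify *) rawreq;
		Node	   *arg = (Node *) linitial(req->fcall->args);

		/*
		 * If the argument is itself jsonb_object_field(col, 'key'), rewrite
		 * jsonb_numeric(jsonb_object_field(col, 'key')) into a single fused
		 * call that returns the numeric directly, skipping the intermediate
		 * jsonb datum and its copy.
		 */
		if (IsA(arg, FuncExpr) &&
			((FuncExpr *) arg)->funcid == F_JSONB_OBJECT_FIELD)
		{
			/* build the replacement FuncExpr here (details omitted) */
		}
	}

	PG_RETURN_POINTER(ret);
}
```

The support function would be attached via the prosupport column of jsonb_numeric's pg_proc entry, so existing queries using the cast syntax benefit without being rewritten.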

-- 
Best Regards
Andy Fan


Re: Improve join_search_one_level readibilty (one line change)

2023-08-03 Thread Richard Guo
On Fri, Aug 4, 2023 at 10:36 AM David Rowley  wrote:

> The whole lnext() stuff all feels a bit old now that Lists are arrays.
> I think we'd be better adjusting the code to pass the List index where
> we start from rather than the ListCell to start from.  That way we can
> use for_each_from() to iterate rather than for_each_cell().  What's
> there today feels a bit crufty and there's some element of danger that
> the given ListCell does not even belong to the given List.


I think we can go even further and do the same for the 'bushy plans' case,
as in the attached.

Thanks
Richard


v2-0001-join_search_one_level-tidyup.patch
Description: Binary data


pgbench: allow to exit immediately when any client is aborted

2023-08-03 Thread Yugo NAGATA
Hi,

I would like to propose adding an option to pgbench so that a benchmark
run can quit immediately when any client is aborted. Currently, when a
client is aborted due to some error, for example network trouble, the
other clients continue their run until the number of transactions
specified by -t is reached or the time specified by -T has expired. At the
end, the results are printed, but they are not useful, as the message
"Run was aborted; the above results are incomplete" shows.

For precise benchmarking purposes, we would not want to wait for such
incomplete results; rather, we would like to learn promptly that trouble
happened so we can retry quickly. Therefore, it would be nice to add an
option that makes pgbench exit, instead of letting the other clients
continue, when any client is aborted. I think adding an option is better
than changing the behaviour wholesale, because users who run pgbench just
to put stress on backends for testing purposes, rather than for
benchmarking, might not want pgbench to stop when a client is aborted.

Attached is a patch to add the option --exit-on-abort.
If this option is specified, pgbench quits immediately by calling exit(2)
when any client is aborted.

What do you think about it?

Regards,
Yugo Nagata
-- 
Yugo NAGATA 
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 539c2795e2..6fae5a43e1 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -765,6 +765,8 @@ static int64 total_weight = 0;
 
 static bool verbose_errors = false; /* print verbose messages of all errors */
 
+static bool exit_on_abort = false;	/* exit when any client is aborted */
+
 /* Builtin test scripts */
 typedef struct BuiltinScript
 {
@@ -911,6 +913,7 @@ usage(void)
 		   "  -T, --time=NUM   duration of benchmark test in seconds\n"
 		   "  -v, --vacuum-all vacuum all four standard tables before tests\n"
 		   "  --aggregate-interval=NUM aggregate data over NUM seconds\n"
+		   "  --exit-on-abort  exit when any client is aborted\n"
 		   "  --failures-detailed  report the failures grouped by basic types\n"
 		   "  --log-prefix=PREFIX  prefix for transaction time log file\n"
 		   "   (default: \"pgbench_log\")\n"
@@ -6612,6 +6615,7 @@ main(int argc, char **argv)
 		{"failures-detailed", no_argument, NULL, 13},
 		{"max-tries", required_argument, NULL, 14},
 		{"verbose-errors", no_argument, NULL, 15},
+		{"exit-on-abort", no_argument, NULL, 16},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -6945,6 +6949,10 @@ main(int argc, char **argv)
 benchmarking_option_set = true;
 verbose_errors = true;
 break;
+			case 16:			/* exit-on-abort */
+benchmarking_option_set = true;
+exit_on_abort = true;
+break;
 			default:
 /* getopt_long already emitted a complaint */
 pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -7553,11 +7561,13 @@ threadRun(void *arg)
 
 			advanceConnectionState(thread, st, );
 
+			if (exit_on_abort && st->state == CSTATE_ABORTED)
+goto done;
 			/*
 			 * If advanceConnectionState changed client to finished state,
 			 * that's one fewer client that remains.
 			 */
-			if (st->state == CSTATE_FINISHED || st->state == CSTATE_ABORTED)
+			else if (st->state == CSTATE_FINISHED || st->state == CSTATE_ABORTED)
 remains--;
 		}
 
@@ -7592,6 +7602,15 @@ threadRun(void *arg)
 done:
 	disconnect_all(state, nstate);
 
+	for (int i = 0; i < nstate; i++)
+	{
+		if (state[i].state != CSTATE_FINISHED)
+		{
+			pg_log_error("Run was aborted due to an error in thread %d", thread->tid);
+			exit(2);
+		}
+	}
+
 	if (thread->logfile)
 	{
 		if (agg_interval > 0)


Re: Extract numeric filed in JSONB more effectively

2023-08-03 Thread Andy Fan
Hi:

>
> can confirm the patch's jsonb_object_field_numeric is faster than
> pg_catalog."numeric"(jsonb).
>

Thanks for the confirmation.


>
> This function is not easy to find out...
>
> select jsonb_numeric(jsonb'{"a":11}'->'a'); --fail
>

jsonb_numeric is a prosrc rather than a proname; that's why you
can't call it directly:

select * from pg_proc where prosrc = 'jsonb_numeric';
select * from pg_proc where proname = 'jsonb_numeric';

It is bound to the "numeric"(jsonb) cast, so we can call it as
(a->'a')::numeric.

select numeric('{"a":11}'->'a'); --fail.

> select "numeric"('{"a":11}'::jsonb->'a'); --ok
>

The double quotes look weird to me, but it seems to be a common situation:

select numeric('1'::int); -- failed.
select "numeric"('1'::int); -- ok.

-- 
Best Regards
Andy Fan


Re: Extract numeric filed in JSONB more effectively

2023-08-03 Thread Pavel Stehule
čt 3. 8. 2023 v 16:27 odesílatel Andy Fan  napsal:

> Hi:
>
>
>>  If you use explicit cast, then the code should not be hard, in the
>> rewrite stage all information should be known.
>>
>
> Can you point to me where the code is for the XML stuff?  I thought
> this is a bad idea but I may accept it if some existing code does
> such a thing already.   "such thing"  is  typeA:typeB is
> converted something else but user can't find out an entry in
> pg_cast for typeA to typeB.
>

For XML there is src/backend/utils/adt/xml.c, the XmlTableGetValue
routine. It is an internal transformation from the XML type to something
else.

You can look at the parser (parse_expr.c, parse_func.c) and follow the
lifecycle of the :: operator; there are transformations of nodes into
different nodes.

You can also look at the patches related to SQL/JSON (not fully committed
yet) and json_table.




>
>
>> It would be cool but still I didn't see a way to do that without making
>>> something else complex.
>>>
>>
>>  The custom @-> operator you can implement in your own custom extension.
>> Builtin solutions should be generic as it is possible.
>>
>
> I agree, but actually I think there is no clean way to do it, at least I
> dislike the conversion of typeA to typeB in a cast syntax but there
> is no entry in pg_cast for it.  Are you saying something like this
> or I misunderstood you?
>

There is no possibility at the user level. The conversions should be
supported by a cast from pg_cast where that is possible. When it is
impossible, you can raise an exception in some strict mode, or you can do
an I/O cast. But this is not the hard part.

You need to teach the parser to push type information about the expected
result deeper into some nodes.

(2023-08-04 05:28:36) postgres=# select ('{"a":2, "b":"nazdar"}'::jsonb)['a']::numeric;
┌─────────┐
│ numeric │
╞═════════╡
│       2 │
└─────────┘
(1 row)


(2023-08-04 05:28:41) postgres=# select ('{"a":2, "b":"nazdar"}'::jsonb)['a']::int;
┌──────┐
│ int4 │
╞══════╡
│    2 │
└──────┘
(1 row)

When the parser iterates over the expression, it crosses the ::type node
first, so you have information about the target type. Currently this
information is used when the parser is going back: when the source type is
the same as the target type, the cast can be ignored. Probably some flag
needs to be added to operators that are able to use this. Maybe it can be
a new third argument with the expected type, so new kinds of op functions
could look like opfx("any", "any", anyelement) returns anyelement. Maybe
you will find another possibility that is invisible to me (or to you) now.

It is much more work, but the benefit will be generic. I think this is an
important part of supporting container types, so a partial fix is not
good; it requires a systematic solution. Performance is important, but
without a generic solution the complexity increases, and that is a much
bigger problem.

Regards

Pavel




>
> --
> Best Regards
> Andy Fan
>


Re: Extract numeric filed in JSONB more effectively

2023-08-03 Thread jian he
On Thu, Aug 3, 2023 at 6:04 PM Matthias van de Meent
 wrote:
>
>
> Is it? Detoasting only happens if the argument was toasted, and I have
> serious doubts that the result of (a->'a') will be toasted in our
> current system. Sure, we do need to allocate an intermediate result,
> but that's in a temporary memory context that should be trivially
> cheap to free.
>
> > /*
> >  * v.val.numeric points into jsonb body, so we need to make a copy to
> >  * return
> >  */
> > retValue = DatumGetNumericCopy(NumericGetDatum(v.val.numeric));
> >
> > At last this method needs 1 extra FuncExpr than my method, this would
> > cost some expression execution effort. I'm not saying we need to avoid
> > expression execution generally, but extracting numeric fields from jsonb
> > looks a reasonable case.
>
> What's tb here?
>
>
> Kind regards,
>
> Matthias van de Meent
> Neon (https://neon.tech)
>
>

can confirm the patch's jsonb_object_field_numeric is faster than
pg_catalog."numeric"(jsonb).
It also works correctly whether the jsonb is stored inline in the page or
in TOAST chunks. I don't understand why we need the intermediate-result
allocation, since you cannot directly update a jsonb value field.

This function is not easy to find:
select numeric('{"a":11}'->'a'); --fail.
select jsonb_numeric(jsonb'{"a":11}'->'a'); --fail
select "numeric"('{"a":11}'::jsonb->'a'); --ok




Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

2023-08-03 Thread Peter Smith
FWIW, I confirmed that my review comments for v22* have all been
addressed in the latest v26* patches.

Thanks!

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Improve join_search_one_level readibilty (one line change)

2023-08-03 Thread David Rowley
On Tue, 1 Aug 2023 at 01:48, Julien Rouhaud  wrote:
> Apart from that +1 from me for the patch, I think it helps focusing the
> attention on what actually matters here.

I think it's worth doing something to improve this code. However, I
think we should go a bit further than what the proposed patch does.

In 1cff1b95a, Tom changed the signature of make_rels_by_clause_joins
to pass the List that the given ListCell belongs to.  This was done
because lnext(), as of that commit, requires the owning List to be
passed to the function, where previously when Lists were linked lists,
that wasn't required.

The whole lnext() stuff all feels a bit old now that Lists are arrays.
I think we'd be better adjusting the code to pass the List index where
we start from rather than the ListCell to start from.  That way we can
use for_each_from() to iterate rather than for_each_cell().  What's
there today feels a bit crufty and there's some element of danger that
the given ListCell does not even belong to the given List.

Doing this seems to shrink down the assembly a bit:

$ wc -l joinrels*
  3344 joinrels_tidyup.s
  3363 joinrels_unpatched.s

I also see a cmovne in joinrels_tidyup.s, which means there are fewer
jumps, making things a little easier on the branch predictor. I doubt
this is going to be performance critical, but it's a nice extra bonus to
go along with the cleanup.

old:
cmpl $2, 24(%rsp)
je .L616

new:
cmpl $2, 16(%rsp)
cmovne %edx, %eax

David


join_search_one_level_tidyup.patch
Description: Binary data


Re: Fix incorrect start up costs for WindowAgg paths (bug #17862)

2023-08-03 Thread Tom Lane
David Rowley  writes:
> Thank you for the report. I've just pushed a patch which I'm hoping will fix it.

Passes now on my VM.

regards, tom lane




Re: Using defines for protocol characters

2023-08-03 Thread Dave Cramer
On Thu, 3 Aug 2023 at 16:59, Tatsuo Ishii  wrote:

> > On Thu, 3 Aug 2023 at 13:25, Tatsuo Ishii  wrote:
> >
> >> > Greetings,
> >> >
> >> > Attached is a patch which introduces a file protocol.h. Instead of
> using
> >> > the actual characters everywhere in the code this patch names the
> >> > characters and removes the comments beside each usage.
> >>
> >> > +#define DESCRIBE_PREPARED   'S'
> >> > +#define DESCRIBE_PORTAL 'P'
> >>
> >> You use these for Close message as well. I don't like the idea because
> >> Close is different message from Describe message.
> >>
> >> What about adding following for Close too use them instead?
> >>
> >> #define CLOSE_PREPARED   'S'
> >> #define CLOSE_PORTAL 'P'
> >>
> >
> > Good catch.
> > I recall when writing this it was a bit hacky.
> > What do you think of PREPARED_SUB_COMMAND   and PORTAL_SUB_COMMAND
> instead
> > of duplicating them ?
>
> Nice. Looks good to me.
>
New patch attached, which uses PREPARED_SUB_COMMAND and PORTAL_SUB_COMMAND
instead of duplicating the defines, and uses PQMSG_REQ_* and PQMSG_RESP_*
as per Alvaro's suggestion.

Dave Cramer


0001-Created-protocol.h.patch
Description: Binary data


Re: Use of additional index columns in rows filtering

2023-08-03 Thread Peter Geoghegan
On Thu, Aug 3, 2023 at 3:04 PM Tomas Vondra
 wrote:
> Because my patch is all about reducing the heap pages, which are usually
> the expensive part of the index scan. But you're right the "index scan"
> with index filter may access more index pages, because it has fewer
> "access predicates".

It's not so much the unnecessary index page accesses that bother me.
At least I didn't push that aspect very far when I constructed my
adversarial test case -- index pages were only a small part of the
overall problem. (I mean the problem at runtime, in the executor. The
planner expected to save a small number of leaf page accesses, which
was kinda, sorta the problem there -- though the planner might have
technically still been correct about that, and can't have been too far
wrong in any case.)

The real problem that my adversarial case seemed to highlight was a
problem of extra heap page accesses. The tenk1 table's VM is less than
one page in size, so how could it have been VM buffer hits? Sure,
there were no "table filters" involved -- only "index filters". But
even "index filters" require heap access when the page isn't marked
all-visible in the VM.

That problem just cannot happen with a similar plan that eliminates
the same index tuples within the index AM proper (the index quals
don't even have to be "access predicates" for this to apply, either).
Of course, we never need to check the visibility of index tuples just
to be able to consider eliminating them via nbtree search scan
keys/index quals -- and so there is never any question of heap/VM
access for tuples that don't pass index quals. Not so for "index
filters", where there is at least some chance of accessing the heap
proper just to be able to eliminate non-matches.

While I think that it makes sense to assume that "index filters" are
strictly better than "table filters" (assuming they're directly
equivalent in that they contain the same clause), they're not
*reliably* any better. So "index filters" are far from being anywhere
near as good as an equivalent index qual (AFAICT we should always
assume that that's true). This is true of index quals generally --
this advantage is *not* limited to "access predicate index quals". (It
is most definitely not okay for "index filters" to displace equivalent
"access predicate index quals", but it's also not really okay to allow
them to displace equivalent "index filter predicate index quals" --
the latter case is less bad, but AFAICT they both basically aren't
acceptable "substitutions".)

-- 
Peter Geoghegan




Re: Fix incorrect start up costs for WindowAgg paths (bug #17862)

2023-08-03 Thread David Rowley
On Fri, 4 Aug 2023 at 11:54, Nathan Bossart  wrote:
> I'm seeing some reliable test failures for 32-bit builds on cfbot [0].  At
> a glance, it looks like the relations are swapped in the plan.

Thank you for the report. I've just pushed a patch which I'm hoping will fix it.

David




Re: Fix incorrect start up costs for WindowAgg paths (bug #17862)

2023-08-03 Thread Tom Lane
Nathan Bossart  writes:
> On Fri, Aug 04, 2023 at 09:28:51AM +1200, David Rowley wrote:
>> Thank you for reviewing.  I've pushed the patch to master only.

> I'm seeing some reliable test failures for 32-bit builds on cfbot [0].  At
> a glance, it looks like the relations are swapped in the plan.

Yeah, I got the same result in a 32-bit FreeBSD VM.  Probably, the two
plans are of effectively-identical estimated cost, and there's some
roundoff effect in those estimates that differs between machines with
4-byte and 8-byte MAXALIGN.

You could likely stabilize the plan choice by joining two tables that
aren't of identical size -- maybe add an additional WHERE constraint
on one of the tables?

regards, tom lane




Re: Oversight in reparameterize_path_by_child leading to executor crash

2023-08-03 Thread Richard Guo
On Thu, Aug 3, 2023 at 7:20 PM Ashutosh Bapat 
wrote:

> However, if reparameterization can *not* happen at the planning time, we
> have chosen a plan which can not be realised meeting a dead end. So as long
> as we can ensure that the reparameterization is possible we can delay
> actual translations. I think it should be possible to decide whether
> reparameterization is possible based on the type of path alone. So seems
> doable.
>

That has been done in v2 patch.  See the new added function
path_is_reparameterizable_by_child().

Thanks
Richard


Re: Use of additional index columns in rows filtering

2023-08-03 Thread Peter Geoghegan
On Thu, Aug 3, 2023 at 3:04 PM Tomas Vondra
 wrote:
> When you say "index page accesses" do you mean accesses to index pages,
> or accesses to heap pages from the index scan?

Yes, that's exactly what I mean. Note that that's the dominant cost
for the original BitmapOr plan.

As I said upthread, the original BitmapOr plan has 7 buffer hits. The
breakdown is 1 single heap page access, 3 root page accesses, and 3
leaf page accesses. There is only 1 heap page access because only 1
out of the 3 index scans that feed into the BitmapOr actually end up
finding any matching rows in the index.

In short, the dominant cost here is index page accesses. It's a
particularly good case for my SAOP patch!

> Because my patch is all about reducing the heap pages, which are usually
> the expensive part of the index scan. But you're right the "index scan"
> with index filter may access more index pages, because it has fewer
> "access predicates".

The fact is that your patch correctly picks the cheapest plan, which
is kinda like a risky version of the plan that my SAOP patch would
pick -- it is cheaper for the very same reason. I understand that
that's not your intention at all, but this is clearly what happened.
That's what I meant by "weird second order effects".

To me, it really does kinda look like your patch accidentally
discovered a plan that's fairly similar to the plan that my SAOP patch
would have found by design! Perhaps I should have been clearer on this
point earlier. (If you're only now seeing this for yourself for the
first time, then...oops. No wonder you were confused about which
patch it was I was going on about!)

> I don't quite see that with the tenk1 query we've been discussing (the
> extra buffers were due to non-allvisible heap pages), but I guess that's
> possible.

The extra buffer hits occur because I made them occur by inserting new
tuples where thousand = 42. Obviously, I did it that way because I had
a point that I wanted to make. Obviously, there wouldn't have been any
notable regression from your patch at all if I had (say) inserted
tuples where thousand = 43 instead. (Not for the original "42" query,
at least.)

That's part of the problem, as I see it. Local changes like that can
have outsized impact on individual queries, even though there is no
inherent reason to expect it. How can statistics reliably guide the
planner here? Statistics are supposed to be a summary of the whole
attribute, that allow us to make various generalizations during
planning. But this plan leaves us sensitive to relatively small
changes in one particular "thousand" grouping, with potentially
outsized impact. And, this can happen very suddenly, because it's so
"local".

Making this plan perform robustly just doesn't seem to be one of the
things that statistics can be expected to help us with very much.

--
Peter Geoghegan




Re: Fix incorrect start up costs for WindowAgg paths (bug #17862)

2023-08-03 Thread Nathan Bossart
On Fri, Aug 04, 2023 at 09:28:51AM +1200, David Rowley wrote:
> Thank you for reviewing.  I've pushed the patch to master only.

I'm seeing some reliable test failures for 32-bit builds on cfbot [0].  At
a glance, it looks like the relations are swapped in the plan.

[0] 
https://api.cirrus-ci.com/v1/artifact/task/5728127981191168/testrun/build-32/testrun/regress/regress/regression.diffs

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Adding argument names to aggregate functions

2023-08-03 Thread Nathan Bossart
On Wed, Jul 19, 2023 at 09:38:12PM +0200, Daniel Gustafsson wrote:
> Great, thanks!  I had a quick look at this while rebasing (as well as your
> updated patch) and it seems like a good idea to add this.  Unless there are
> objections I will look at getting this in.

Hey Daniel, are you still planning on committing this?  I can pick it up if
you are busy.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Use of additional index columns in rows filtering

2023-08-03 Thread Peter Geoghegan
On Thu, Aug 3, 2023 at 2:46 PM Tomas Vondra
 wrote:
> Sure, having more choices means a risk of making mistakes. But does
> simply postponing the choices to runtime actually solves this?

It solves that one problem, yes. This is particularly important in
cases where we would otherwise get truly pathological performance --
not just mediocre or bad performance. Most of the time, mediocre
performance isn't such a big deal.

Having a uniform execution strategy for certain kinds of index scans
is literally guaranteed to beat a static strategy in some cases. For
example, with some SAOP scans (with my patch), we'll have to skip lots
of the index, and then scan lots of the index -- just because of a
bunch of idiosyncratic details that are almost impossible to predict
using statistics. Such an index scan really shouldn't be considered
"moderately skippy". It is not the average of two opposite things --
it is more like two separate things that are opposites.

It's true that even this "moderately skippy" case needs to be costed.
But if we can entirely eliminate variation that really looks like
noise, it should be more likely that we'll get the cheapest plan.
Costing may not be any easier, but getting the cheapest plan might be.

> > 1. An "access predicate" is always strictly better than an equivalent
> > "index filter predicate" (for any definition of "index filter
> > predicate" you can think of).
>
> Yes, probably.
>
> > 2. An "Index Filter: " is always strictly better than an equivalent
> > "Filter: " (i.e. table filter).
>
> Not sure about this. As I explained earlier, I think it needs to
> consider the cost/selectivity of the predicate, and fraction of
> allvisible pages. But yes, it's a static decision.

What I said is limited to "equivalent" predicates. If it's just not
possible to get an "access predicate" at all, then my point 1 doesn't
apply. Similarly, if it just isn't possible to get an "Index Filter"
(only a table filter), then my point #2 doesn't apply.

This does mean that there could still be competition between multiple
index paths for the same composite index, but I have no objections to
that -- it makes sense to me because it isn't duplicative in the way
that I'm concerned about. It just isn't possible to delay anything
until run time in this scenario, so nothing that I've said should
apply.

> (I didn't say it explicitly, but this assumes those paths are not for
> the same index. If they were, then PATH #3 would have to exist too.)

That changes everything, then. If they're completely different indexes
then nothing I've said should apply. I can't think of a way to avoid
making an up-front commitment to that in the planner (I'm thinking of
far more basic things than that).

> I feel a bit like the rubber duck from [1], but I'm OK with that ;-)

Not from my point of view. Besides, even when somebody says that they
just don't understand what I'm saying at all (which wasn't ever fully
the case here), that is often useful feedback in itself.

-- 
Peter Geoghegan




Re: Using defines for protocol characters

2023-08-03 Thread Tatsuo Ishii
> On Thu, 3 Aug 2023 at 13:25, Tatsuo Ishii  wrote:
> 
>> > Greetings,
>> >
>> > Attached is a patch which introduces a file protocol.h. Instead of using
>> > the actual characters everywhere in the code this patch names the
>> > characters and removes the comments beside each usage.
>>
>> > +#define DESCRIBE_PREPARED   'S'
>> > +#define DESCRIBE_PORTAL 'P'
>>
>> You use these for Close message as well. I don't like the idea because
>> Close is different message from Describe message.
>>
>> What about adding following for Close too use them instead?
>>
>> #define CLOSE_PREPARED   'S'
>> #define CLOSE_PORTAL 'P'
>>
> 
> Good catch.
> I recall when writing this it was a bit hacky.
> What do you think of PREPARED_SUB_COMMAND   and PORTAL_SUB_COMMAND instead
> of duplicating them ?

Nice. Looks good to me.

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp




Re: Use of additional index columns in rows filtering

2023-08-03 Thread Tomas Vondra



On 8/3/23 21:21, Peter Geoghegan wrote:
> On Thu, Aug 3, 2023 at 11:17 AM Tomas Vondra
>  wrote:
>> Not sure. I'm a bit confused about what exactly is so risky on the plan
>> produced with the patch.
> 
> It's all about the worst case. In the scenarios that I'm concerned
> about, we can be quite sure that the saving from not using a BitmapOr
> will be fairly low -- the cost of not having to repeat the same index
> page accesses across several similar index scans is, at best, some
> small multiple of the would-be number of index scans that the BitmapOr
> plan gets. We can be certain that the possible benefits are fixed and
> low. This is always true; presumably the would-be BitmapOr plan can
> never have all that many index scans. And we always know how many
> index scans a BitmapOr plan would use up front.
> 

When you say "index page accesses" do you mean accesses to index pages,
or accesses to heap pages from the index scan?

Because my patch is all about reducing the heap pages, which are usually
the expensive part of the index scan. But you're right the "index scan"
with index filter may access more index pages, because it has fewer
"access predicates".

I don't quite see that with the tenk1 query we've been discussing (the
extra buffers were due to non-allvisible heap pages), but I guess that's
possible.

> On the other hand, the possible downsides have no obvious limit. So
> even if we're almost certain to win on average, we only have to be
> unlucky once to lose all we gained before that point. As a general
> rule, we want the index AM to have all the context required to
> terminate its scan at the earliest possible opportunity. This is
> enormously important in the worst case.
> 

Yeah, I agree there's some asymmetry in the risk/benefit. It's not
unlike e.g. seqscan vs. index scan, where the index scan can't really
save more than what the seqscan costs, but it can get (almost)
arbitrarily expensive.

> It's easier for me to make this argument because I know that we don't
> really need to make any trade-off here. But even if that wasn't the
> case, I'd probably arrive at the same general conclusion.
> 
> Importantly, it isn't possible to make a similar argument that works
> in the opposite direction -- IMV that's the difference between this
> flavor of riskiness, and the inevitable riskiness that comes with any
> planner change. In other words, your patch isn't going to win by an
> unpredictably high amount. Not in the specific scenarios that I'm
> focussed on here, with a BitmapOr + multiple index scans getting
> displaced.
> 

True. It probably can't beat BitmapOr plan if it means moving access
predicate into index filter (or even worse a table filter).

> The certainty about the upside is just as important as the uncertainty
> about the downside. The huge asymmetry matters, and is fairly
> atypical. If, somehow, there was less certainty about the possible
> upside, then my argument wouldn't really work.
> 


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Use of additional index columns in rows filtering

2023-08-03 Thread Tomas Vondra



On 8/3/23 20:50, Peter Geoghegan wrote:
> On Thu, Aug 3, 2023 at 4:57 AM Tomas Vondra
>  wrote:
>>> I understand that that's how the patch is structured. It is
>>> nevertheless possible (as things stand) that the patch will make the
>>> planner shift from a plan that uses "Access Predicates" to the maximum
>>> extent possible when scanning a composite index, to a similar plan
>>> that has a similar index scan, for the same index, but with fewer
>>> "Access Predicates" in total. In effect, the patched planner will swap
>>> one type of predicate for another type because doing so enables the
>>> executor to scan an index once instead of scanning it several times.
>>>
>>
>> That seems very much like something the costing is meant to handle, no?
>> I mean, surely "access predicate" and "filter" should affect the cost
>> differently, with "filter" being more expensive (and table filter being
>> more expensive than index filter).
> 
> I'm not 100% sure that it's not a costing issue, but intuitively it
> doesn't seem like one.
> 
> As Goetz Graefe once said, "choice is confusion". It seems desirable
> to have fewer, better index paths. This is possible whenever there is
> a way to avoid the index paths that couldn't possibly be better in the
> first place. Though we must also make sure that there is no real
> downside -- possibly by teaching the executor to behave adaptively
> instead of needlessly trusting what the planner says. Turning a plan
> time decision into a runtime decision seems strictly better.
> 

Sure, having more choices means a risk of making mistakes. But does
simply postponing the choices to runtime actually solve this? Even if
you're able to make a perfect choice at that point, it only works for
that particular path type (e.g. index scan). You still need to cost it
somehow, to decide which path type to pick ...

Perhaps my mental model of what you intend to do is wrong?

> Obviously the planner will always need to be trusted to a significant
> degree (especially for basic things like join order), but why not
> avoid it when we can avoid it without any real downsides? Having lots
> of slightly different index paths with slightly different types of
> logically equivalent predicates seems highly undesirable, and quite
> avoidable.
> 
> ISTM that it should be possible to avoid generating some of these
> index paths based on static rules that assume that:
> 
> 1. An "access predicate" is always strictly better than an equivalent
> "index filter predicate" (for any definition of "index filter
> predicate" you can think of).

Yes, probably.

> 2. An "Index Filter: " is always strictly better than an equivalent
> "Filter: " (i.e. table filter).

Not sure about this. As I explained earlier, I think it needs to
consider the cost/selectivity of the predicate, and the fraction of
all-visible pages. But yes, it's a static decision.

> 
> The first item is what I've been going on about, of course. The second
> item is the important principle behind your patch -- and one that I
> also agree with. I don't see any contradictions here -- these two
> principles are compatible. I think that we can have it both ways.
> 
>> AFAICS the assumption is that path #1 should be better, as it has two
>> proper access predicates. But maybe if we add another condition C, it
>> might end up like this:
>>
>>   PATH #1: access predicates (A,B), table filter C
>>   PATH #2: access predicate A, index filter (B,C)
>>
>> and #2 will end up winning.
> 
> Why wouldn't we expect there to also be this path:
> 
> PATH #3: access predicates (A,B), index filter C
> 

Why? Maybe the index doesn't have all the columns needed for condition
C, in which case it has to be evaluated as a table filter.

(I didn't say it explicitly, but this assumes those paths are not for
the same index. If they were, then PATH #3 would have to exist too.)

> And why wouldn't we also expect this other path to always be better?
> So much better that we don't even need to bother generating PATH #1
> and PATH #2 in the first place, even?
> 

Yes, I agree that path would likely be superior to the other paths.

> Right now there are weird reasons why it might not be so -- strange
> interactions with things like BitmapOr nodes that could make either
> PATH #1 or PATH #2 look slightly cheaper. But that doesn't seem
> particularly fundamental to me. We should probably just avoid those
> plan shapes that have the potential to make PATH #1 and PATH #2
> slightly cheaper, due only to perverse interactions.
> 
>> I like the discussion, but it feels a bit abstract (and distant from
>> what the patch aimed to do) and I have trouble turning it into something
>> actionable.
> 
> I think that I have gotten a lot out of this discussion -- it has made
> my thinking about this stuff more rigorous. I really appreciate that.
> 

I feel a bit like the rubber duck from [1], but I'm OK with that ;-)

>> Does this apply to the index scan vs. index-only scans too? That is, do
>> you think we 

Re: Using defines for protocol characters

2023-08-03 Thread Dave Cramer
On Thu, 3 Aug 2023 at 15:22, Dave Cramer  wrote:

>
>
> On Thu, 3 Aug 2023 at 13:25, Tatsuo Ishii  wrote:
>
>> > Greetings,
>> >
>> > Attached is a patch which introduces a file protocol.h. Instead of using
>> > the actual characters everywhere in the code this patch names the
>> > characters and removes the comments beside each usage.
>>
>> > +#define DESCRIBE_PREPARED   'S'
>> > +#define DESCRIBE_PORTAL 'P'
>>
>> You use these for the Close message as well. I don't like the idea
>> because Close is a different message from Describe.
>>
>> What about adding the following for Close and using them instead?
>>
>> #define CLOSE_PREPARED   'S'
>> #define CLOSE_PORTAL 'P'
>>
>
> Good catch.
> I recall when writing this it was a bit hacky.
> What do you think of PREPARED_SUB_COMMAND and PORTAL_SUB_COMMAND instead
> of duplicating them?
>

While reviewing this I found a number of mistakes where I used
DATA_ROW_RESPONSE instead of DESCRIBE_REQUEST.

I can provide a patch now or wait until we resolve the above


>
> Dave
>


Re: Fix incorrect start up costs for WindowAgg paths (bug #17862)

2023-08-03 Thread David Rowley
On Fri, 4 Aug 2023 at 02:02, Andy Fan  wrote:
> I have checked the updated patch and LGTM.

Thank you for reviewing.  I've pushed the patch to master only.

David




Re: Using defines for protocol characters

2023-08-03 Thread Dave Cramer
On Thu, 3 Aug 2023 at 13:25, Tatsuo Ishii  wrote:

> > Greetings,
> >
> > Attached is a patch which introduces a file protocol.h. Instead of using
> > the actual characters everywhere in the code this patch names the
> > characters and removes the comments beside each usage.
>
> > +#define DESCRIBE_PREPARED   'S'
> > +#define DESCRIBE_PORTAL 'P'
>
> You use these for the Close message as well. I don't like the idea
> because Close is a different message from Describe.
>
> What about adding the following for Close and using them instead?
>
> #define CLOSE_PREPARED   'S'
> #define CLOSE_PORTAL 'P'
>

Good catch.
I recall when writing this it was a bit hacky.
What do you think of PREPARED_SUB_COMMAND and PORTAL_SUB_COMMAND instead
of duplicating them?

Dave


Re: Track Oldest Initialized WAL Buffer Page

2023-08-03 Thread Daniel Gustafsson
> On 3 Jul 2023, at 15:27, Heikki Linnakangas  wrote:

> But I don't see the point of tracking OldestInitializedPage. It seems cheap 
> enough that we could, if there's a need for it, but I don't see the need.

Based on the above comments, and the thread stalling, I am marking this
returned with feedback.  Please feel free to continue the discussion here and
re-open a new entry in a future CF if there is a new version of the patch.

--
Daniel Gustafsson





Re: Introduce "log_connection_stages" setting.

2023-08-03 Thread Daniel Gustafsson
> On 3 Jul 2023, at 15:57, Daniel Gustafsson  wrote:
> 
>> On 16 May 2023, at 20:51, Sergey Dudoladov  
>> wrote:
> 
>> I have attached the fourth version of the patch.
> 
> This version fails the ldap_password test on all platforms, with pg_ctl
> failing to start:
> 
> [14:48:10.544] --- stdout ---
> [14:48:10.544] # executing test in 
> /tmp/cirrus-ci-build/build/testrun/ldap_password_func/001_mutated_bindpasswd 
> group ldap_password_func test 001_mutated_bindpasswd
> [14:48:10.544] # waiting for slapd to accept requests...
> [14:48:10.544] # setting up PostgreSQL instance
> [14:48:10.544] Bail out! pg_ctl start failed
> [14:48:10.544] # test failed
> 
> From skimming this, updating
> src/test/modules/ldap_password_func/t/001_mutated_bindpasswd.pl with
> the new GUC might solve the problem.
> 
> Please send a fixed version, I'm marking the patch Waiting on Author in the
> meantime.

With the patch failing tests and the thread stalled with no update I am marking
this returned with feedback. Please feel free to resubmit to a future CF.

--
Daniel Gustafsson





Re: Configurable FP_LOCK_SLOTS_PER_BACKEND

2023-08-03 Thread Tomas Vondra
On 8/3/23 01:51, Matt Smiley wrote:
> I thought it might be helpful to share some more details from one of the
> case studies behind Nik's suggestion.
> 
> Bursty contention on lock_manager lwlocks recently became a recurring
> cause of query throughput drops for GitLab.com, and we got to study the
> behavior via USDT and uprobe instrumentation along with more
> conventional observations (see
> https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/2301
> ). 
> This turned up some interesting finds, and I thought sharing some of
> that research might be helpful.
> 

The analysis in the linked gitlab issue is pretty amazing. I wasn't
planning to argue against the findings anyway, but plenty of data
supporting the conclusions is good.

I'm not an expert on locking, so some of the stuff I say may be
trivially obvious - it's just me thinking about ...

I wonder what's the rough configuration of those systems, though. Both
the hardware and PostgreSQL side. How many cores / connections, etc.?


> Results so far suggest that increasing FP_LOCK_SLOTS_PER_BACKEND would
> have a much larger positive impact than any other mitigation strategy we
> have evaluated.  Rather than reducing hold duration or collision rate,
> adding fastpath slots reduces the frequency of even having to acquire
> those lock_manager lwlocks.  I suspect this would be helpful for many
> other workloads, particularly those having high frequency queries whose
> tables collectively have more than about 16 or indexes.
> 

Yes, I agree with that. Partitioning makes this issue worse, I guess.
Schemas with indexes on every column are disturbingly common these days
as well, and they hit this issue too ...

> Lowering the lock_manager lwlock acquisition rate means lowering its
> contention rate (and probably also its contention duration, since
> exclusive mode forces concurrent lockers to queue).
> 
> I'm confident this would help our workload, and I strongly suspect it
> would be generally helpful by letting queries use fastpath locking more
> often.
> 

OK

>> However, the lmgr/README says this is meant to alleviate contention on
>> the lmgr partition locks. Wouldn't it be better to increase the number
>> of those locks, without touching the PGPROC stuff?
> 
> That was my first thought too, but growing the lock_manager lwlock
> tranche isn't nearly as helpful.
> 
> On the slowpath, each relation's lock tag deterministically hashes onto
> a specific lock_manager lwlock, so growing the number of lock_manager
> lwlocks just makes it less likely for two or more frequently locked
> relations to hash onto the same lock_manager.
> 

Hmmm, so if we have a query that joins 16 tables, or a couple tables
with indexes, all backends running this will acquire exactly the same
partition locks. And we're likely acquiring them in exactly the same
order (we lock objects in a consistent order to avoid deadlocks), making
the issue worse.
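The deterministic-collision dynamic can be sketched with a toy model
(this uses an arbitrary multiplicative hash and made-up OIDs, not
PostgreSQL's actual lock-tag hash):

```python
# Toy model: each relation's lock tag hashes deterministically onto one
# of N lock_manager partition lwlocks.  The hash is an arbitrary
# stand-in (Knuth multiplicative), not PostgreSQL's real lock-tag hash.

def partition_for(relation_oid, num_partitions):
    return (relation_oid * 2654435761) % num_partitions

def partitions_touched(relation_oids, num_partitions):
    return {partition_for(oid, num_partitions) for oid in relation_oids}

# A hot query whose table plus indexes exceed the 16 fastpath slots,
# forcing slowpath locking on every execution (hypothetical OIDs):
hot_rels = list(range(16384, 16404))

# Every backend running this query maps onto exactly the same set of
# partitions, run after run, so they all contend with each other:
assert partitions_touched(hot_rels, 16) == partitions_touched(hot_rels, 16)
print("partitions (tranche=16):  ", sorted(partitions_touched(hot_rels, 16)))
print("partitions (tranche=4096):", sorted(partitions_touched(hot_rels, 4096)))
```

Growing the tranche only spreads *different* relation sets apart; the
hot query's backends still collide on the very same partitions, which
is the argument above for more fastpath slots rather than a bigger
tranche.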

> In contrast, growing the number of fastpath slots completely avoids
> calls to the slowpath (i.e. no need to acquire a lock_manager lwlock).
> 
> The saturation condition we'd like to solve is heavy contention on one
> or more of the lock_manager lwlocks.  Since that is driven by the
> slowpath acquisition rate of heavyweight locks, avoiding the slowpath is
> better than just moderately reducing the contention on the slowpath.
> 
> To be fair, increasing the number of lock_manager locks definitely can
> help to a certain extent, but it doesn't cover an important general
> case.  As a thought experiment, suppose we increase the lock_manager
> tranche to some arbitrarily large size, larger than the number of
> relations in the db.  This unrealistically large size means we have the
> best case for avoiding collisions -- each relation maps uniquely onto
> its own lock_manager lwlock.  That helps a lot in the case where the
> workload is spread among many non-overlapping sets of relations.  But it
> doesn't help a workload where any one table is accessed frequently via
> slowpath locking.
> 

Understood.

> Continuing the thought experiment, if that frequently queried table has
> 16 or more indexes, or if it is joined to other tables that collectively
> add up to over 16 relations, then each of those queries is guaranteed to
> have to use the slowpath and acquire the deterministically associated
> lock_manager lwlocks.
> 
> So growing the tranche of lock_manager lwlocks would help for some
> workloads, while other workloads would not be helped much at all.  (As a
> concrete example, a workload at GitLab has several frequently queried
> tables with over 16 indexes that consequently always use at least some
> slowpath locks.)
> 
> For additional context:
> 
> https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/2301#what-influences-lock_manager-lwlock-acquisition-rate
>  
> 

Re: [PoC] Implementation of distinct in Window Aggregates

2023-08-03 Thread Daniel Gustafsson
> On 11 Jul 2023, at 01:06, Andreas Karlsson  wrote:
> 
> On 3/12/23 09:17, Ankit Kumar Pandey wrote:
>> Attaching updated patch with a fix for an issue in window function.
>> I have also fixed naming convention of patch as last patch had incompatible 
>> name.
> 
> Hi,
> 
> This patch does not apply to master. Could you rebase it and submit it as one 
> patch which applies directly to master? Maybe I am wrong but the latest 
> version looks like it only applies on top of one of your previous patches 
> which makes it hard for the reviewer.

Since no update was posted, the patch was considered a PoC, and the thread has
stalled, so I will mark this returned with feedback.  Please feel free to reopen a
new CF entry when there is a new patch available which addresses Andreas'
feedback on patch structure.

--
Daniel Gustafsson





Re: POC, WIP: OR-clause support for indexes

2023-08-03 Thread Alena Rybakina

On 02.08.2023 21:58, Peter Geoghegan wrote:

On Wed, Aug 2, 2023 at 8:58 AM Alena Rybakina  wrote:

No, I haven't thought about it yet. I studied the example and it would
really be nice to add optimization here. I didn't notice any problems
with its implementation. I also have an obvious example with the "or"
operator, for example:

select * from multi_test where (a, b) = (1, 1) or (a, b) = (2, 1);

Although I think such a case will be used less often.

Right. As I said, I don't particularly care about the row constructor
syntax -- it's not essential.

In my experience patches like this one that ultimately don't succeed
usually *don't* have specific problems that cannot be fixed. The real
problem tends to be ambiguity about the general high level design. So
more than anything else, ambiguity is the thing that you need to
minimize to be successful here. This is the #1 practical problem, by
far. This may be the only thing about your patch that I feel 100% sure
of.

In my experience it can actually be easier to expand the scope of a
project, and to come up with a more general solution:

https://en.wikipedia.org/wiki/Inventor%27s_paradox

I'm not trying to make your work more difficult by expanding its
scope. I'm actually trying to make your work *easier* by expanding its
scope. I don't claim to know what the specific scope of your patch
should be at all. Just that it might be possible to get a much clearer
picture of what the ideal scope really is by *trying* to generalize it
further -- that understanding is what we lack right now. Even if this
exercise fails in some way, it won't really have been a failure. The
reasons why it fails will still be interesting and practically
relevant.

As I explained to Jim, I am trying to put things in this area on a
more rigorous footing. For example, I have said that the way that the
nbtree code executes SAOP quals is equivalent to DNF. That is
basically true, but it's also my own slightly optimistic
interpretation of history and of the design. That's a good start, but
it's not enough on its own.

My interpretation might still be wrong in some subtle way, that I have
yet to discover. That's really what I'm concerned about with your
patch, too. I'm currently trying to solve a problem that I don't yet
fully understand, so for me "getting a properly working flow of
information" seems like a good practical exercise. I'm trying to
generalize the design of my own patch as far as I can, to see what
breaks, and why it breaks. My intuition is that this will help me with
my own patch by forcing me to gain a truly rigorous understanding of
the problem.

My suggestion about generalizing your approach to cover RowCompareExpr
cases is what I would do, if I were you, and this was my patch. That's
almost exactly what I'm doing with my own patch already, in fact.

It's all right. I understand your position)

I also agree to try to find other optimization cases and generalize them.

I read the wiki article, and as I understand it, in such a situation we 
see a difficult problem with finding expressions that need to be 
converted into a logically correct expression and simplify execution for 
the executor. For example, this is a ROW type. It can have a simpler 
expression with AND and OR operations, besides we can exclude 
duplicates. But some of these transformations may be incorrect or they 
will have a more complex representation. We can try to find the 
problematic expressions, combine them into groups, and finally find a 
solution for each group or, conversely, discover that an existing 
transformation is incorrect. If I understand correctly, we 
should first start searching for "ROW" expressions (define a group for 
them) and think about a solution for the group.

It seems to me the most difficult thing is to notice problematic cases
where the transformations are incorrect, but I think it can be implemented.

Right. To be clear, I am sure that it won't be practical to come up
with a 100% theoretically pure approach. If for no other reason than
this: normalizing to CNF in all cases will run into problems with very
complex predicates. It might even be computationally intractable
(could just be very slow). So there is a clear need to keep
theoretical purity in balance with practical considerations. There is
a need for some kind of negotiation between those two things. Probably
some set of heuristics will ultimately be required to keep costs and
benefits in balance.

I don't expect you to determine what set of heuristics will ultimately
be required to determine when and how to perform CNF conversions, in
the general case. But having at least some vague idea seems like it
might increase confidence in your design.
I agree, but I think this will be the second step after solutions are 
found.

Do you think that this problem is just an accidental side-effect? It
isn't necessarily the responsibility of your patch to fix such things.
If it's even possible for selectivity 

Re: Using defines for protocol characters

2023-08-03 Thread Tatsuo Ishii
> Greetings,
> 
> Attached is a patch which introduces a file protocol.h. Instead of using
> the actual characters everywhere in the code this patch names the
> characters and removes the comments beside each usage.

> +#define DESCRIBE_PREPARED   'S'
> +#define DESCRIBE_PORTAL 'P'

You use these for the Close message as well. I don't like the idea
because Close is a different message from Describe.

What about adding the following for Close and using them instead?

#define CLOSE_PREPARED   'S'
#define CLOSE_PORTAL 'P'

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp




Re: Use of additional index columns in rows filtering

2023-08-03 Thread Peter Geoghegan
On Thu, Aug 3, 2023 at 11:17 AM Tomas Vondra
 wrote:
> Not sure. I'm a bit confused about what exactly is so risky on the plan
> produced with the patch.

It's all about the worst case. In the scenarios that I'm concerned
about, we can be quite sure that the saving from not using a BitmapOr
will be fairly low -- the cost of not having to repeat the same index
page accesses across several similar index scans is, at best, some
small multiple of the would-be number of index scans that the BitmapOr
plan gets. We can be certain that the possible benefits are fixed and
low. This is always true; presumably the would-be BitmapOr plan can
never have all that many index scans. And we always know how many
index scans a BitmapOr plan would use up front.

On the other hand, the possible downsides have no obvious limit. So
even if we're almost certain to win on average, we only have to be
unlucky once to lose all we gained before that point. As a general
rule, we want the index AM to have all the context required to
terminate its scan at the earliest possible opportunity. This is
enormously important in the worst case.

It's easier for me to make this argument because I know that we don't
really need to make any trade-off here. But even if that wasn't the
case, I'd probably arrive at the same general conclusion.

Importantly, it isn't possible to make a similar argument that works
in the opposite direction -- IMV that's the difference between this
flavor of riskiness, and the inevitable riskiness that comes with any
planner change. In other words, your patch isn't going to win by an
unpredictably high amount. Not in the specific scenarios that I'm
focussed on here, with a BitmapOr + multiple index scans getting
displaced.

The certainty about the upside is just as important as the uncertainty
about the downside. The huge asymmetry matters, and is fairly
atypical. If, somehow, there was less certainty about the possible
upside, then my argument wouldn't really work.

-- 
Peter Geoghegan




Re: Extract numeric filed in JSONB more effectively

2023-08-03 Thread Tom Lane
Andy Fan  writes:
>> If you use explicit cast, then the code should not be hard, in the
>> rewrite stage all information should be known.

> Can you point to me where the code is for the XML stuff?

I think Pavel means XMLTABLE, which IMO is an ugly monstrosity of
syntax --- but count on the SQL committee to do it that way :-(.

As far as the current discussion goes, I'm strongly against
introducing new functions or operators to do the same things that
we already have perfectly good syntax for.  "There's more than one
way to do it" isn't necessarily a virtue, and for sure it isn't a
virtue if people have to rewrite their existing queries to make use
of your optimization.

Also, why stop at optimizing "(jsonbval->'fld')::sometype"?  There are
many other extraction cases that people might wish were faster, such
as json array indexing, nested fields, etc.  It certainly won't make
sense to introduce yet another set of functions for each pattern you
want to optimize --- or at least, we won't want to ask users to change
their queries to invoke those functions explicitly.

I do like the idea of attaching a Simplify planner support function
to jsonb_numeric (and any other ones that seem worth optimizing)
that can convert a stack of jsonb transformations into a bundled
operation that avoids unnecessary conversions.  Then you get the
speedup without any need for people to change their existing queries.
We'd still have functions like jsonb_field_as_numeric() under the
hood, but there's not an expectation that users call them explicitly.
(Alternatively, the output of this Simplify could be a new kind of
expression node that bundles one or more jsonb extractions with a
type conversion.  I don't have an opinion yet on which way is better.)
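As a rough illustration of that bundling idea (purely a toy model --
the node names here are invented and this is not the actual planner
support function API), the rewrite amounts to collapsing a
cast-over-extraction tree into a single node:

```python
# Illustrative-only sketch: a "simplify" hook that rewrites
# cast(jsonb_field(j, 'f')) into one bundled extraction node, so the
# executor never materializes the intermediate jsonb datum.  All node
# names are hypothetical; this is not PostgreSQL's support-function API.

from dataclasses import dataclass

@dataclass
class FieldExtract:          # jsonbval -> 'fld'
    source: str
    field: str

@dataclass
class NumericCast:           # (...)::numeric
    arg: object

@dataclass
class FieldExtractNumeric:   # bundled: extract 'fld' directly as numeric
    source: str
    field: str

def simplify(node):
    """Collapse a numeric cast over a field extraction into one node."""
    if isinstance(node, NumericCast) and isinstance(node.arg, FieldExtract):
        return FieldExtractNumeric(node.arg.source, node.arg.field)
    return node                     # no rewrite applies; leave untouched

expr = NumericCast(FieldExtract("jsonbval", "fld"))
assert simplify(expr) == FieldExtractNumeric("jsonbval", "fld")
```

The appeal of doing this in a support function (or in a new expression
node type) is exactly what Tom describes: users keep writing
`(jsonbval->'fld')::numeric` and get the speedup for free.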

regards, tom lane




Re: Using defines for protocol characters

2023-08-03 Thread Nathan Bossart
On Thu, Aug 03, 2023 at 12:07:21PM -0600, Dave Cramer wrote:
> On Thu, 3 Aug 2023 at 11:59, Alvaro Herrera  wrote:
>> I don't really like the name pattern you've chosen though; I think we
>> need to have a common prefix in the defines.  Maybe prepending PQMSG_ to
>> each name would be enough.  And maybe turn the _RESPONSE and _REQUEST
>> suffixes you added into prefixes as well, so instead of PARSE_REQUEST
>> you could make it PQMSG_REQ_PARSE, PQMSG_RESP_BIND_COMPLETE and so
>> on.
>>
> That becomes trivial to do now that the names are defined. I presumed
> someone would object to the names.
> I'm fine with the names you propose, but I suggest we wait to see if anyone
> objects.

I'm okay with the proposed names as well.
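For reference, Alvaro's suggested convention applied to a few of the
characters under discussion might look like this (the byte values are
the existing wire-protocol codes; the names are only this thread's
proposal, not a committed API):

```python
# Sketch of the proposed PQMSG_ naming scheme, with REQ/RESP as
# prefixes.  Byte values match the documented frontend/backend
# protocol; the symbolic names are hypothetical pending the patch.

PQMSG_REQ_PARSE           = b'P'
PQMSG_REQ_BIND            = b'B'
PQMSG_REQ_DESCRIBE        = b'D'
PQMSG_REQ_CLOSE           = b'C'
PQMSG_RESP_PARSE_COMPLETE = b'1'
PQMSG_RESP_BIND_COMPLETE  = b'2'
PQMSG_RESP_DATA_ROW       = b'D'

# Sub-command bytes shared by Describe and Close -- the case where the
# thread debates one shared pair of names versus two per-message pairs:
PQMSG_SUB_PREPARED = b'S'
PQMSG_SUB_PORTAL   = b'P'

# The same byte means different things per direction and message, which
# is exactly why symbolic names beat raw character literals in the code:
assert PQMSG_REQ_DESCRIBE == PQMSG_RESP_DATA_ROW == b'D'
assert PQMSG_REQ_PARSE == PQMSG_SUB_PORTAL == b'P'
```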

> + * src/include/protocol.h

Could we put these definitions in an existing header such as
src/include/libpq/pqcomm.h?  I see that's where the authentication request
codes live today.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Use of additional index columns in rows filtering

2023-08-03 Thread Peter Geoghegan
On Thu, Aug 3, 2023 at 4:57 AM Tomas Vondra
 wrote:
> > I understand that that's how the patch is structured. It is
> > nevertheless possible (as things stand) that the patch will make the
> > planner shift from a plan that uses "Access Predicates" to the maximum
> > extent possible when scanning a composite index, to a similar plan
> > that has a similar index scan, for the same index, but with fewer
> > "Access Predicates" in total. In effect, the patched planner will swap
> > one type of predicate for another type because doing so enables the
> > executor to scan an index once instead of scanning it several times.
> >
>
> That seems very much like something the costing is meant to handle, no?
> I mean, surely "access predicate" and "filter" should affect the cost
> differently, with "filter" being more expensive (and table filter being
> more expensive than index filter).

I'm not 100% sure that it's not a costing issue, but intuitively it
doesn't seem like one.

As Goetz Graefe once said, "choice is confusion". It seems desirable
to have fewer, better index paths. This is possible whenever there is
a way to avoid the index paths that couldn't possibly be better in the
first place. Though we must also make sure that there is no real
downside -- possibly by teaching the executor to behave adaptively
instead of needlessly trusting what the planner says. Turning a plan
time decision into a runtime decision seems strictly better.

Obviously the planner will always need to be trusted to a significant
degree (especially for basic things like join order), but why not
avoid it when we can avoid it without any real downsides? Having lots
of slightly different index paths with slightly different types of
logically equivalent predicates seems highly undesirable, and quite
avoidable.

ISTM that it should be possible to avoid generating some of these
index paths based on static rules that assume that:

1. An "access predicate" is always strictly better than an equivalent
"index filter predicate" (for any definition of "index filter
predicate" you can think of).
2. An "Index Filter: " is always strictly better than an equivalent
"Filter: " (i.e. table filter).

The first item is what I've been going on about, of course. The second
item is the important principle behind your patch -- and one that I
also agree with. I don't see any contradictions here -- these two
principles are compatible. I think that we can have it both ways.
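One way to read these two rules together is as a dominance ordering
over otherwise-identical paths: a path can be discarded when another
path for the same index handles every clause at least as well. A toy
sketch (the representation is invented purely for illustration):

```python
# Toy dominance check over the ordering access > index filter > table
# filter, for paths on the same index evaluating the same clauses.
# This is only a sketch of the pruning rule, not planner code.

RANK = {"access": 2, "index_filter": 1, "table_filter": 0}

def dominates(a, b):
    """a dominates b if it is never worse for any clause and strictly
    better for at least one; paths on different clause sets are
    incomparable."""
    if set(a) != set(b):
        return False
    no_worse = all(RANK[a[c]] >= RANK[b[c]] for c in a)
    better   = any(RANK[a[c]] >  RANK[b[c]] for c in a)
    return no_worse and better

path1 = {"A": "access", "B": "access",       "C": "table_filter"}
path2 = {"A": "access", "B": "index_filter", "C": "index_filter"}
path3 = {"A": "access", "B": "access",       "C": "index_filter"}

assert dominates(path3, path1)       # index filter beats table filter
assert dominates(path3, path2)       # access predicate beats index filter
assert not dominates(path1, path2)   # incomparable: neither dominates
assert not dominates(path2, path1)
```

On these rules, PATH #1 and PATH #2 from the earlier example need not
be generated at all whenever a PATH #3 exists, which is the "fewer,
better index paths" outcome.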

> AFAICS the assumption is that path #1 should be better, as it has two
> proper access predicates. But maybe if we add another condition C, it
> might end up like this:
>
>   PATH #1: access predicates (A,B), table filter C
>   PATH #2: access predicate A, index filter (B,C)
>
> and #2 will end up winning.

Why wouldn't we expect there to also be this path:

PATH #3: access predicates (A,B), index filter C

And why wouldn't we also expect this other path to always be better?
So much better that we don't even need to bother generating PATH #1
and PATH #2 in the first place, even?

Right now there are weird reasons why it might not be so -- strange
interactions with things like BitmapOr nodes that could make either
PATH #1 or PATH #2 look slightly cheaper. But that doesn't seem
particularly fundamental to me. We should probably just avoid those
plan shapes that have the potential to make PATH #1 and PATH #2
slightly cheaper, due only to perverse interactions.

> I like the discussion, but it feels a bit abstract (and distant from
> what the patch aimed to do) and I have trouble turning it into something
> actionable.

I think that I have gotten a lot out of this discussion -- it has made
my thinking about this stuff more rigorous. I really appreciate that.

> Does this apply to the index scan vs. index-only scans too? That is, do
> you think we should we have just one index-scan path, doing index-only
> stuff when possible?

I think so, yes. But index-only scans don't appear under BitmapOr
nodes, so right now I can't think of an obvious way of demonstrating
that this is true. Maybe it accidentally doesn't come up with
index-only scans in practice, but the same underlying principles
should be just as true.

> If we can form some sort of plan for what needs to be done (both for my
> patch and for the SAOP patch), I'm willing to work on it ... But it's
> not quite clear to me what the requirements are.

I do hope to have more concrete proposals soon. Thanks for being patient.

For what it's worth, I actually think that there is a good chance that
I'll end up relying on what you've done here to make certain things I
want to do with the SAOP patch okay. It would be rather convenient to
be able to handle some of the SAOP safety issues without needing any
table filters (just index filters), in some corner cases. I think that
what you're doing here makes a lot of sense. FWIW, I am already
personally invested in the success of your patch.

-- 
Peter Geoghegan




Re: Use of additional index columns in rows filtering

2023-08-03 Thread Tomas Vondra
On 8/3/23 18:47, Peter Geoghegan wrote:
> On Thu, Aug 3, 2023 at 4:20 AM Tomas Vondra
>  wrote:
>> Which is just the 7 buffers ...
>>
>> Did I do something wrong?
> 
> I think that it might have something to do with your autovacuum
> settings. Note that the plan that you've shown for the master branch
> isn't the same one that appears in
> src/test/regress/expected/create_index.out for the master branch --
> that plan (the BitmapOr plan) was my baseline case for master.
> 
> That said, I am a little surprised that you could ever get the plan
> that you showed for master (without somehow unnaturally forcing it).
> It's almost the same plan that your patch gets, but much worse. Your
> patch can use an index filter, but master uses a table filter instead.
> 

Well I did force it - I thought we're talking about regular index scans,
so I disabled bitmap scans. Without doing that I get the BitmapOr plan
like you.

However, with the patch I get this behavior (starting from a "fresh"
state right after "make installcheck")

QUERY PLAN

 Index Scan using tenk1_thous_tenthous on tenk1
   (cost=0.29..8.38 rows=1 width=244)
   (actual time=0.033..0.036 rows=1 loops=1)
   Index Cond: (thousand = 42)
   Index Filter: ((tenthous = 1) OR (tenthous = 3) OR (tenthous = 42))
   Rows Removed by Index Recheck: 9
   Filter: ((tenthous = 1) OR (tenthous = 3) OR (tenthous = 42))
   Buffers: shared read=4
 Planning:
   Buffers: shared hit=119 read=32
 Planning Time: 0.673 ms
 Execution Time: 0.116 ms
(10 rows)

insert into tenk1 (thousand, tenthous) select 42, i from
generate_series(43, 1000) i;

QUERY PLAN

 Index Scan using tenk1_thous_tenthous on tenk1
   (cost=0.29..8.38 rows=1 width=244)
   (actual time=0.038..0.605 rows=1 loops=1)
   Index Cond: (thousand = 42)
   Index Filter: ((tenthous = 1) OR (tenthous = 3) OR (tenthous = 42))
   Rows Removed by Index Recheck: 967
   Filter: ((tenthous = 1) OR (tenthous = 3) OR (tenthous = 42))
   Buffers: shared hit=336
 Planning Time: 0.114 ms
 Execution Time: 0.632 ms
(8 rows)

analyze tenk1;

QUERY PLAN

 Bitmap Heap Scan on tenk1  (cost=12.89..16.91 rows=1 width=244)
(actual time=0.016..0.019 rows=1 loops=1)
   Recheck Cond: (((thousand = 42) AND (tenthous = 1)) OR
  ((thousand = 42) AND (tenthous = 3)) OR
  ((thousand = 42) AND (tenthous = 42)))
   Heap Blocks: exact=1
   Buffers: shared hit=7
   ->  BitmapOr  ...
 Buffers: shared hit=6
 ->  Bitmap Index Scan on tenk1_thous_tenthous ...
   Index Cond: ((thousand = 42) AND (tenthous = 1))
   Buffers: shared hit=2
 ->  Bitmap Index Scan on tenk1_thous_tenthous ...
   Index Cond: ((thousand = 42) AND (tenthous = 3))
   Buffers: shared hit=2
 ->  Bitmap Index Scan on tenk1_thous_tenthous ...
   Index Cond: ((thousand = 42) AND (tenthous = 42))
   Buffers: shared hit=2
 Planning Time: 0.344 ms
 Execution Time: 0.044 ms
(19 rows)

vacuum analyze tenk1;

QUERY PLAN

 Bitmap Heap Scan on tenk1  (cost=12.89..16.91 rows=1 width=244)
(actual time=0.017..0.019 rows=1 loops=1)
   Recheck Cond: (((thousand = 42) AND (tenthous = 1)) OR
  ((thousand = 42) AND (tenthous = 3)) OR
  ((thousand = 42) AND (tenthous = 42)))
   Heap Blocks: exact=1
   Buffers: shared hit=7
   ->  BitmapOr  ...
 Buffers: shared hit=6
 ->  Bitmap Index Scan on tenk1_thous_tenthous  ...
   Index Cond: ((thousand = 42) AND (tenthous = 1))
   Buffers: shared hit=2
 ->  Bitmap Index Scan on tenk1_thous_tenthous  ...
   Index Cond: ((thousand = 42) AND (tenthous = 3))
   Buffers: shared hit=2
 ->  Bitmap Index Scan on tenk1_thous_tenthous  ...
   Index Cond: ((thousand = 42) AND (tenthous = 42))
   Buffers: shared hit=2
 Planning Time: 0.277 ms
 Execution Time: 0.046 ms
(19 rows)

set enable_bitmapscan = off;

QUERY PLAN

 Index Scan using tenk1_thous_tenthous on tenk1
 (cost=0.29..23.57 rows=1 width=244)
 (actual time=0.042..0.235 rows=1 loops=1)
   Index Cond: (thousand = 42)
   Index Filter: ((tenthous = 1) OR (tenthous = 3) OR (tenthous = 42))
   Rows Removed by Index Recheck: 967
   Filter: ((tenthous = 1) OR (tenthous = 3) OR (tenthous = 42))
   Buffers: shared hit=7
 Planning Time: 0.119 ms
 Execution Time: 0.261 ms

Re: Fix incorrect start up costs for WindowAgg paths (bug #17862)

2023-08-03 Thread Tim Palmer
On Thu, 3 Aug 2023 at 05:50, David Rowley  wrote:

> I'm currently thinking it's
> a bad idea to backpatch this but I'd consider it more if someone else
> thought it was a good idea or if more people came along complaining
> about poor plan choice in plans containing WindowAggs. Currently, it
> seems better not to destabilise plans in the back branches. (CC'd Tim,
> who reported #17862, as he may have an opinion on this)
>

I agree it's better not to destabilise plans in the back branches.

Tim


Re: Using defines for protocol characters

2023-08-03 Thread Dave Cramer
On Thu, 3 Aug 2023 at 11:59, Alvaro Herrera  wrote:

> On 2023-Aug-03, Dave Cramer wrote:
>
> > Greetings,
> >
> > Attached is a patch which introduces a file protocol.h. Instead of using
> > the actual characters everywhere in the code this patch names the
> > characters and removes the comments beside each usage.
>
> I saw this one last week.  I think it's a very good idea (and in fact I had
> thought of doing it when I last messed with libpq code).
>
> I don't really like the name pattern you've chosen though; I think we
> need to have a common prefix in the defines.  Maybe prepending PQMSG_ to
> each name would be enough.  And maybe turn the _RESPONSE and _REQUEST
> suffixes you added into prefixes as well, so instead of PARSE_REQUEST
> you could make it PQMSG_REQ_PARSE, PQMSG_RESP_BIND_COMPLETE and so
> on.
>
That becomes trivial to do now that the names are defined. I presumed
someone would object to the names.
I'm fine with the names you propose, but I suggest we wait to see if anyone
objects.

Dave

>
> --
> Álvaro Herrera   48°01'N 7°57'E  —
> https://www.EnterpriseDB.com/
>  It does it in a really, really complicated way
>  why does it need to be complicated?
>  Because it's MakeMaker.
>


Re: Using defines for protocol characters

2023-08-03 Thread Alvaro Herrera
On 2023-Aug-03, Dave Cramer wrote:

> Greetings,
> 
> Attached is a patch which introduces a file protocol.h. Instead of using
> the actual characters everywhere in the code this patch names the
> characters and removes the comments beside each usage.

I saw this one last week.  I think it's a very good idea (and in fact I had
thought of doing it when I last messed with libpq code).

I don't really like the name pattern you've chosen though; I think we
need to have a common prefix in the defines.  Maybe prepending PQMSG_ to
each name would be enough.  And maybe turn the _RESPONSE and _REQUEST
suffixes you added into prefixes as well, so instead of PARSE_REQUEST
you could make it PQMSG_REQ_PARSE, PQMSG_RESP_BIND_COMPLETE and so
on.
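
To make the suggestion concrete, here is a sketch of what the proposed naming scheme could look like. The byte values are the protocol's real message types, but the macro names and the demo function are illustrative only, not taken from the patch:

```c
#include <stdio.h>
#include <string.h>
#include <assert.h>

/*
 * Hypothetical message-type defines in the PQMSG_ naming scheme suggested
 * above.  The byte values are the real protocol message types; the macro
 * names are illustrative only and not taken from the actual patch.
 */
#define PQMSG_REQ_PARSE             'P'
#define PQMSG_REQ_BIND              'B'
#define PQMSG_REQ_EXECUTE           'E'
#define PQMSG_RESP_PARSE_COMPLETE   '1'
#define PQMSG_RESP_BIND_COMPLETE    '2'

/*
 * With named constants, a dispatch switch documents itself and the
 * per-usage comments beside each bare character literal can go away.
 */
static const char *
describe_message(char msgtype)
{
	switch (msgtype)
	{
		case PQMSG_REQ_PARSE:           return "Parse";
		case PQMSG_REQ_BIND:            return "Bind";
		case PQMSG_REQ_EXECUTE:         return "Execute";
		case PQMSG_RESP_PARSE_COMPLETE: return "ParseComplete";
		case PQMSG_RESP_BIND_COMPLETE:  return "BindComplete";
		default:                        return "unknown";
	}
}
```

With this, `describe_message('P')` yields "Parse" without any comment needed at the call site.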

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
 It does it in a really, really complicated way
 why does it need to be complicated?
 Because it's MakeMaker.




Using defines for protocol characters

2023-08-03 Thread Dave Cramer
Greetings,

Attached is a patch which introduces a file protocol.h. Instead of using
the actual characters everywhere in the code this patch names the
characters and removes the comments beside each usage.


Dave Cramer


0001-Created-protocol.h.patch
Description: Binary data


Re: [RFC] Clang plugin for catching suspicious typedef casting

2023-08-03 Thread Tristan Partin
This is the first I am learning about clang plugins. Interesting 
concept. Did you give any thought to using libclang instead of 
a compiler plugin? I am kind of doing similar work, but I went with 
libclang instead of a plugin.


--
Tristan Partin
Neon (https://neon.tech)




[RFC] Clang plugin for catching suspicious typedef casting

2023-08-03 Thread Dmitry Dolgov
Hi,

In commit [1], one ancillary change was to fix mixed-up usage of
BlockNumber and Buffer variables. This was a one-off manual effort, and
indeed it hardly seems possible to get compilation warnings for such
code, as long as the underlying types could be converted in a standard
conforming way. As far as I see such conversions are acceptable and
used every now and then in the project, but they may hide some subtle
issues.

One interesting way I found to address this was to benefit from clang
plugin system [2]. A clang plugin allows you to run some frontend
actions when compiling code with clang, and this approach is used in
some large open source projects (e.g. I've got my inspiration largely
from libreoffice [3]). After a little experimenting I could teach such a
plugin to warn about situations similar to the one described above. What
it does is:

* It visits all the function call expressions
* Verify if the function arguments are defined via typedef
* Compare the actual argument with the function declaration
* Consult with the suppression rules to minimize false positives

In the end the plugin catches the error mentioned above, and we get
something like this:

ginget.c:143:41: warning: Typedef check: Expected 'BlockNumber'
(aka 'unsigned int'), got 'Buffer' (aka 'int') in PredicateLockPage
    PredicateLockPage(btree->index, stack->buffer, snapshot);

Of course the plugin produces more warnings, and I haven't checked all of
them yet -- some probably would have to be ignored as well. But in the
long run I'm pretty confident it's possible to fine tune this logic and
suppression rules to get minimum noise.

As I see it, there are advantages of using plugins in such way:

* Helps automatically detect some issues
* Other types of plugins could be useful to support large undertakings,
  where a lot of code transformation is involved.

And of course disadvantages as well:

* It has to be fine-tuned to be useful
* It's compiler dependent, not clear how to always exercise it

I would like to get your opinion, folks. Does it sound interesting
enough for the community, is it worth it to pursue the idea?

Some final notes about infrastructure bits. Never mind cmake in there --
it was just for a quick start, I'm going to convert it to something
else. There are some changes needed to tell the compiler to actually
load the plugin, those of course could be done much better as well. I
didn't do anything with meson here, because it turns out meson tends to
enclose the plugin file with '-Wl,--start-group ... -Wl,--end-group' and
it breaks the plugin discovery.

[1]: 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=126552c85c1cfb6ce6445159b8024cfa5631f33e
[2]: https://clang.llvm.org/docs/ClangPlugins.html
[3]: https://git.libreoffice.org/core/+/refs/heads/master/compilerplugins/clang/
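
The comparison at the heart of the plugin can be modeled outside of clang. The sketch below is a toy stand-in for the AST visitor: each call argument is reduced to the typedef name it was declared with, and a call is flagged when the declared parameter typedef and the actual argument typedef differ even though the underlying types convert silently. The struct and function names are illustrative, not the plugin's real data structures.

```c
#include <string.h>
#include <assert.h>

/*
 * Toy stand-in for what the plugin extracts from the AST for one call
 * argument: the typedef name the parameter was declared with, and the
 * typedef name of the actual argument expression (NULL when the value
 * is not typedef-declared).
 */
struct arg_check
{
	const char *param_typedef;	/* from the function declaration */
	const char *arg_typedef;	/* from the call site */
};

/*
 * Flag the suspicious case: both sides carry a typedef, the names differ,
 * and the compiler stays silent because the underlying types convert
 * implicitly -- the BlockNumber-vs-Buffer situation from ginget.c above.
 */
static int
typedef_mismatch(const struct arg_check *c)
{
	return c->param_typedef != NULL &&
		   c->arg_typedef != NULL &&
		   strcmp(c->param_typedef, c->arg_typedef) != 0;
}
```

The real plugin does the same comparison after pulling the two names out of the CallExpr and the callee's declaration, plus the suppression rules on top.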
>From 01e645f456f9476c60943f12d794465567f2d265 Mon Sep 17 00:00:00 2001
From: Dmitrii Dolgov <9erthali...@gmail.com>
Date: Thu, 3 Aug 2023 15:51:26 +0200
Subject: [PATCH] Typedef-check clang plugin

Add a clang plugin to warn about suspicious type casting for types
defined via typedef, e.g. BlockNumber and Buffer.
---
 configure |  36 
 configure.ac  |  12 ++
 src/tools/clang/typedef_check/CMakeLists.txt  |  31 +++
 .../clang/typedef_check/TypedefCheck.cpp  | 183 ++
 4 files changed, 262 insertions(+)
 create mode 100644 src/tools/clang/typedef_check/CMakeLists.txt
 create mode 100644 src/tools/clang/typedef_check/TypedefCheck.cpp

diff --git a/configure b/configure
index 2e518c8007d..71da30779de 100755
--- a/configure
+++ b/configure
@@ -767,6 +767,7 @@ enable_coverage
 GENHTML
 LCOV
 GCOV
+enable_typedef_check
 enable_debug
 enable_rpath
 default_port
@@ -835,6 +836,7 @@ enable_rpath
 enable_spinlocks
 enable_atomics
 enable_debug
+enable_typedef_check
 enable_profiling
 enable_coverage
 enable_dtrace
@@ -1528,6 +1530,7 @@ Optional Features:
   --disable-spinlocks do not use spinlocks
   --disable-atomics   do not use atomic operations
   --enable-debug  build with debugging symbols (-g)
+  --enable-typedef-check  build with typedef-check plugin
   --enable-profiling  build with profiling enabled
   --enable-coverage   build with coverage testing instrumentation
   --enable-dtrace build with DTrace support
@@ -3344,6 +3347,34 @@ fi
 
 
 
+#
+# --enable-typedef-check adds clang typedef-check plugin
+#
+
+
+# Check whether --enable-typedef-check was given.
+if test "${enable_typedef_check+set}" = set; then :
+  enableval=$enable_typedef_check;
+  case $enableval in
+yes)
+  :
+  ;;
+no)
+  :
+  ;;
+*)
+  as_fn_error $? "no argument expected for --enable-typedef-check option" "$LINENO" 5
+  ;;
+  esac
+
+else
+  enable_typedef_check=no
+
+fi
+
+
+
+
 #
 # --enable-profiling enables gcc profiling
 #
@@ -7828,6 +7859,11 @@ if test "$enable_debug" = yes && test 

Re: Use of additional index columns in rows filtering

2023-08-03 Thread Peter Geoghegan
On Thu, Aug 3, 2023 at 4:20 AM Tomas Vondra
 wrote:
> Which is just the 7 buffers ...
>
> Did I do something wrong?

I think that it might have something to do with your autovacuum
settings. Note that the plan that you've shown for the master branch
isn't the same one that appears in
src/test/regress/expected/create_index.out for the master branch --
that plan (the BitmapOr plan) was my baseline case for master.

That said, I am a little surprised that you could ever get the plan
that you showed for master (without somehow unnaturally forcing it).
It's almost the same plan that your patch gets, but much worse. Your
patch can use an index filter, but master uses a table filter instead.

While the plan used by the patch is risky in the way that I described,
the plan you saw for master is just horrible. I mean, it's not even
risky -- it seems almost certain to lose. Whereas at least the plan
from the patch really is cheaper than the BitmapOr plan (the master
branch plan from create_index.out) on average.

--
Peter Geoghegan




Re: Improve const use in zlib-using code

2023-08-03 Thread Tristan Partin

Both patches look good to me.

--
Tristan Partin
Neon (https://neon.tech)




Re: Extract numeric filed in JSONB more effectively

2023-08-03 Thread Andy Fan
Hi:


>  If you use explicit cast, then the code should not be hard, in the
> rewrite stage all information should be known.
>

Can you point me to where the code is for the XML stuff?  I thought
this was a bad idea, but I may accept it if some existing code does
such a thing already.  "Such a thing" being: typeA::typeB is
converted to something else, but the user can't find an entry in
pg_cast for typeA to typeB.


> It would be cool but still I didn't see a way to do that without making
>> something else complex.
>>
>
>  The custom @-> operator you can implement in your own custom extension.
> Builtin solutions should be as generic as possible.
>

I agree, but actually I think there is no clean way to do it; at least I
dislike converting typeA to typeB with cast syntax when there
is no entry in pg_cast for it.  Are you saying something like this,
or did I misunderstand you?

>

-- 
Best Regards
Andy Fan


Re: Fix incorrect start up costs for WindowAgg paths (bug #17862)

2023-08-03 Thread Andy Fan
On Thu, Aug 3, 2023 at 7:29 PM David Rowley  wrote:

> Thanks for having a look at this.
>
> On Thu, 3 Aug 2023 at 18:49, Andy Fan  wrote:
> > 1. ORDER BY or PARTITION BY
> >
> > select *, count(two) over (order by unique1) from tenk1 limit 1;
> > DEBUG:  startup_tuples = 1
> > DEBUG:  startup_tuples = 1
> >
> > select *, count(two) over (partition by unique1) from tenk1 limit 1;
> > DEBUG:  startup_tuples = 1
> > DEBUG:  startup_tuples = 1
> >
> > Due to the Executor of nodeWindowAgg, we have to fetch the next tuple
> > until it mismatches with the current one, then we can calculate the
> > WindowAgg function. In the current patch, we didn't count the
> > mismatched tuple. I verified my thought by breaking at the IndexNext
> > function and seeing that it is called twice, so in the above case the
> > startup_tuples should be 2?
>
> You're probably right here. I'd considered that it wasn't that
> critical and aimed to attempt to keep the estimate close to the number
> of rows that'll be aggregated. I think that's probably not the best
> thing to do as if you consider the EXCLUDE options, those just exclude
> tuples from aggregation, it does not mean we read fewer tuples from
> the subnode. I've updated the patch accordingly.
>

Thanks.


>
> > 2. ORDER BY and PARTITION BY
> >
> > select two, hundred,
> > count(two) over (partition by ten order by hundred)
> > from tenk1 limit 1;
> >
> > DEBUG:  startup_tuples = 10
> >  two | hundred | count
> > -+-+---
> >0 |   0 |   100
> >
> > If we consider the mismatched tuples, it should be 101?
>
> I don't really see how we could do better with the current level of
> statistics.  The stats don't know that there are only 10 distinct
> "hundred" values for rows which have ten=1. All we have is n_distinct
> on tenk1.hundred, which is 100.
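
The shape of the estimate under discussion can be sketched in isolation. The toy function below (not the patch's actual formula) divides the input rows by the combined number of distinct partition/order values taken from per-column statistics, plus one tuple to detect the peer-group boundary. With 10000 input rows and 10 * 100 = 1000 combined distinct values it returns 11, while the real answer in the example is 101, precisely because per-column n_distinct cannot see the cross-column correlation.

```c
#include <assert.h>

/*
 * Toy sketch of the startup-row estimate discussed above (not the
 * patch's actual formula): a WindowAgg with ORDER BY must read one whole
 * peer group before producing its first row, plus one extra tuple to
 * notice that the group has ended.  combined_n_distinct is the product
 * of the per-column distinct counts from the statistics, which is
 * exactly why cross-column correlation is invisible to the estimate.
 */
static double
window_startup_tuples(double input_tuples, double combined_n_distinct)
{
	double		peer_group_rows = input_tuples / combined_n_distinct;

	return peer_group_rows + 1.0;	/* +1 for the mismatched tuple */
}
```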


Yes,  actually I didn't figure it out before / after my posting.

>


> > 3. As we can see the log for startup_tuples is logged twice sometimes,
> > the reason is because it is used in cost_windowagg, so it is calculated
> > for every create_one_window_path. I think the startup_tuples should be
> > independent with the physical path, maybe we can cache it somewhere to
> > save some planning cycles?
>
> I wondered about that too but I ended up writing off the idea of
> caching because the input_tuple count comes from the Path and the
> extra calls are coming from other Paths, which could well have some
> completely different value for input_tuples.
>
>
Looks reasonable.

I have checked the updated patch and LGTM.

-- 
Best Regards
Andy Fan


Re: Extract numeric filed in JSONB more effectively

2023-08-03 Thread Pavel Stehule
Hi

On Thu, 3 Aug 2023 at 15:23, Andy Fan  wrote:

> Hi:
>
>
>> More, I believe a lot of people use the more common syntax, and then this
>> syntax should have good performance - for jsonb, (val->'op')::numeric
>> works, and there should not be a performance penalty, because this
>> syntax will be used 99% of the time.
>>
>
> This looks like a valid opinion IMO,  but to rescue it, we have to do
> something like "internal structure" and remove the existing cast.
> But even if we pay the effort, it still breaks some common knowledge,
> since xx::numeric is not a cast.  It is an "internal structure"!
>

I didn't study the jsonb functions, but there is an xml function that
extracts a value and then casts it to some target type. It does what is
expected - for known types it uses hard-coded casts, for others it asks the
system catalog for a cast function or does an I/O cast. This code is used
for the XMLTABLE function. The JSON_TABLE function is not implemented yet,
but there should be similar code. If you use an explicit cast, then the
code should not be hard; in the rewrite stage all information should be
known.



>
>>> I don't think "Black magic" is a proper word here, since it is not much
>>> different from ->> return a text.  If you argue text can be cast to
>>> most-of-types,  that would be a reason, but I doubt this difference
>>> should generate a "black magic".
>>>
>>
>> I used the term black magic, because nobody without reading documentation
>> can find this operator.
>>
>
> I think this is what the documentation is for..
>
>
>> It is used just for this special case, and the functionality is the same
>> as using cast (only with different performance).
>>
>
> This is not good, but I didn't see a better choice so far; see my first
> paragraph.
>
>
>>
>> The operator ->> is more widely used. But if we have some possibility to
>> work without it, then the usage for a lot of users will be more simple.
>> More if the target types can be based on context
>>
>
> It would be cool but still I didn't see a way to do that without making
> something else complex.
>

sure - it is significantly more work, but it should be usable for all types
and just use common syntax. The custom @-> operator you can implement in
your own custom extension. Builtin solutions should be as generic as
possible.

Things should be as simple as possible - mainly for users who are missing
that knowledge; every extra way of doing some task just increases
their confusion. It is nice if users find one solution on Stack Overflow
and that solution is great for performance too. It is worse if users
find more solutions, but it is not too bad if these solutions have similar
performance. It is really bad if one solution has great performance and the
others do not. Users have no internal knowledge, and so don't understand
why they should sometimes use a special operator instead of the common syntax.


>
>
 Maybe we can introduce some *internal operator* "extract to type", and
 in the rewrite stage we can transform the pattern (x->'field')::type
 to OP(x, 'field', typid)

>>>
>>> Not sure what the OP should be?  If it is a function, what is the
>>> return value?  It looks to me like it is hard to do in c language?
>>>
>>
>> It should be internal structure - it can be similar like COALESCE or IS
>> operator
>>
>
> It may work, but see my answer in the first paragraph.
>

>
>>
>>
>>>
>>> After all,  if we really care about the number of operators, I'm OK
>>> with just letting users use the function directly, like
>>>
>>> jsonb_field_as_numeric(jsonb, 'fieldname')
>>> jsonb_field_as_timestamp(jsonb, 'fieldname');
>>> jsonb_field_as_timestamptz(jsonb, 'fieldname');
>>> jsonb_field_as_date(jsonb, 'fieldname');
>>>
>>> it can save an operator and solves the readability issue.
>>>
>>
>> I don't like it too much, but it is better than introduction new operator
>>
>
> Good to know.  Naming operators is a complex task if we add four.
>
>
>> We already have the jsonb_extract_path and jsonb_extract_path_text
>> function.
>>
>
> I can't follow this.  jsonb_extract_path returns a jsonb, which is far
> away from our goal: returning a numeric efficiently?
>

I proposed `jsonb_extract_path_type` that is of anyelement type.

Regards

Pavel



> I can imagine using the "anyelement" type too, something like
>> `jsonb_extract_path_type(jsonb, anyelement, variadic text[])`
>>
>
> Can you elaborate this please?
>
> --
> Best Regards
> Andy Fan
>


Re: Extract numeric filed in JSONB more effectively

2023-08-03 Thread Andy Fan
Hi:

On Thu, Aug 3, 2023 at 8:34 PM Chapman Flack  wrote:

> On 2023-08-03 03:53, Andy Fan wrote:
> > I didn't realize time types are binary compatible with SQL,
> > so maybe we can have some similar optimization as well.
> > (It is a pity that timestamp(tz) are not binary, or else we may
> > just need one operator).
>
> Not to veer from the thread, but something about that paragraph
> has been hard for me to parse/follow.
>

I don't think this is a key conflict so far, but I'll explain this in more
detail. If timestamp -> timestamptz or timestamptz -> timestamp were
binary compatible, we could have just one operator returning a timestamp;
then when we cast it to timestamptz, it would be a no-op at runtime.
However, the cast between timestamp and timestamptz is not binary
compatible: its castmethod is 'f'.



>
> >> Maybe we can introduce some *internal operator* "extract to type", and
> >> in
> >> rewrite stage we can the pattern (x->'field')::type transform to OP(x,
> >> 'field', typid)
> >
> > Not sure what the OP should be?  If it is a function, what is the
> > return value?  It looks to me like it is hard to do in c language?
>
> Now I am wondering about the 'planner support function' available
> in CREATE FUNCTION since PG 12. I've never played with that yet.
> Would that make it possible to have some, rather generic, extract
> from JSON operator that can look at the surrounding expression
> and replace itself sometimes with something  efficient?
>

I didn't realize this before; 'planner support function' looks
amazing and SupportRequestSimplify looks promising. I will look into it
more.
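
For readers unfamiliar with the mechanism: a planner support function registered for jsonb_numeric would receive a SupportRequestSimplify call at planning time and could hand back a replacement expression. The standalone toy below models that rewrite with made-up node types instead of PostgreSQL's real node system; all names here are illustrative.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Miniature, made-up expression nodes standing in for PostgreSQL's node
 * system (FuncExpr and friends); everything here is illustrative.
 */
typedef enum
{
	NODE_VAR,				/* a plain jsonb column reference */
	NODE_FIELD_EXTRACT,		/* x -> 'field', returning jsonb */
	NODE_NUMERIC_CAST,		/* (...)::numeric */
	NODE_FIELD_AS_NUMERIC	/* fused: fetch the field as numeric directly */
} NodeTag;

typedef struct Expr
{
	NodeTag		tag;
	struct Expr *arg;		/* input expression, if any */
	const char *field;		/* key name for field extraction */
} Expr;

/*
 * Toy equivalent of what a SupportRequestSimplify handler on the
 * jsonb-to-numeric cast could do: when the cast's argument is a field
 * extraction, i.e. (x->'field')::numeric, collapse the two nodes into
 * one that fetches the numeric datum directly, skipping the
 * intermediate jsonb value.  Returning the input unchanged means
 * "no rewrite possible".
 */
static Expr *
simplify_numeric_cast(Expr *cast)
{
	if (cast->tag == NODE_NUMERIC_CAST &&
		cast->arg != NULL &&
		cast->arg->tag == NODE_FIELD_EXTRACT)
	{
		Expr	   *fused = calloc(1, sizeof(Expr));

		fused->tag = NODE_FIELD_AS_NUMERIC;
		fused->arg = cast->arg->arg;	/* the original jsonb input */
		fused->field = cast->arg->field;
		return fused;
	}
	return cast;
}
```

The attraction of this approach is exactly what was said upthread: users keep writing the common (x->'field')::numeric syntax, and the planner quietly swaps in the efficient form.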

-- 
Best Regards
Andy Fan


Re: Show WAL write and fsync stats in pg_stat_io

2023-08-03 Thread Nazir Bilal Yavuz
Hi,

Thanks for the review!

Current status of the patch is:
- 'WAL read' stats in xlogrecovery.c are added to pg_stat_io.
- IOCONTEXT_INIT is added to count 'WAL init'. 'WAL init' stats are
added to pg_stat_io.
- pg_stat_io shows different op_bytes for the IOOBJECT_WAL operations.
- Working on which 'BackendType / IOContext / IOOp' should be banned
in pg_stat_io.
- Working on adding 'WAL read' to the xlogreader.c and walsender.c.
- PendingWalStats.wal_sync and
PendingWalStats.wal_write_time/PendingWalStats.wal_sync_time are moved
to pgstat_count_io_op_n()/pgstat_count_io_op_time() respectively.

TODOs:
- Documentation.
- Thinking about how to interpret the different IOOps within the two
IOContexts and discussing what would be useful to count.
- Decide which 'BackendType / IOContext / IOOp' should not be tracked.
- Adding 'WAL read' to xlogreader.c and walsender.c. (This could
be a separate patch.)
- Adding WAIT_EVENT_WAL_COPY_* operations to pg_stat_io if needed.
(This could be a separate patch.)

On Sat, 22 Jul 2023 at 01:30, Melanie Plageman
 wrote:
> I think it would be good to count WAL reads even though they are not
> currently represented in pg_stat_wal. Here is a thread discussing this
> [1].

I used the same implementation in the thread link [1]. I added 'WAL
read' to only xlogrecovery.c for now. I didn't add 'WAL read' to
xlogreader.c and walsender.c because they cause some failures on:
'!pgStatLocal.shmem->is_shutdown' asserts. I will spend more time on
these. Also, I added Bharath to CC. I have a question about 'WAL
read':
1. There are two places where 'WAL read' happens.
a. In WALRead() in xlogreader.c, it reads 'count' bytes, most of the
time count is equal to XLOG_BLCKSZ but there are some cases it is not.
For example
- in XLogSendPhysical() in walsender.c WALRead() is called by nbytes
- in WALDumpReadPage() in pg_waldump.c WALRead() is called by count
These nbytes and count variables could be different from XLOG_BLCKSZ.

b. in XLogPageRead() in xlogreader.c, it reads exactly XLOG_BLCKSZ bytes:
pg_pread(readFile, readBuf, XLOG_BLCKSZ, (off_t) readOff);

So, what should op_bytes be set to for 'WAL read' operations?

> Eventually, the docs will need an update as well. You can wait until a
> later version of the patch to do this, but I would include it in a list
> of the remaining TODOs in your next version.

Done. I shared TODOs at the top.

> I think we will also want to add an IOContext for WAL initialization.
> Then we can track how long is spent doing 'WAL init' (including filling
> the WAL file with zeroes). XLogFileInitInternal() is likely where we
> would want to add it. And op_bytes for this would likely be
> wal_segment_size. I thought I heard about someone proposing adding WAL
> init to pg_stat_wal, but I can't find the thread.

Done. I created a new IOCONTEXT_INIT IOContext for the 'WAL init'. I
have a question there:
1. Some WAL processing happens at initdb (standalone backend
IOCONTEXT_NORMAL/(IOOP_READ & IOOP_WRITE) and
IOCONTEXT_INIT/(IOOP_WRITE & IOOP_FSYNC)). Since this happens at the
initdb, AFAIK there is no way to set 'track_wal_io_timing' and
'track_io_timing' variables there. So, their timings appear as 0.
Should I use IsBootstrapProcessingMode() to enable WAL io timings at
the initdb or are they not that much important?

> I think there is also an argument for counting WAL files recycled as
> IOOP_REUSES. We should start thinking about how to interpret the
> different IOOps within the two IOContexts and discussing what would be
> useful to count. For example, should removing a logfile count as an
> IOOP_EVICT? Maybe it is not directly related to "IO" enough or even an
> interesting statistic, but we should think about what kinds of
> IO-related WAL statistics we want to track.

I added that to TODOs.

> Any that we decide not to count for now should be "banned" in
> pgstat_tracks_io_op() for clarity. For example, if we create a separate
> IOContext for WAL file init, I'm not sure what would count as an
> IOOP_EXTEND in IOCONTEXT_NORMAL for IOOBJECT_WAL.
>
> Also, I think there are some backend types which will not generate WAL
> and we should determine which those are and skip those rows in
> pgstat_tracks_io_object().

I agree, I am working on this. I have a couple of questions:
1. Can client backend and background worker do IOCONTEXT_NORMAL/IOOP_READ?
2. Is there an easy way to check if 'BackendType / IOOBJECT_WAL' does
specific IOOp operations?

> diff --git a/src/backend/access/transam/xlog.c
> b/src/backend/access/transam/xlog.c
> index 8b0710abe6..2ee6c21398 100644
> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> @@ -2207,6 +2207,10 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID
> tli, bool flexible)
>
> I think we should likely follow the pattern of using
> pgstat_prepare_io_time() and pgstat_count_io_op_time() as it is done
> elsewhere. You could pass the IOObject as a parameter to
> pgstat_prepare_io_time() in order 

Re: [PoC] Reducing planning time when tables have many partitions

2023-08-03 Thread Ashutosh Bapat
On Wed, Aug 2, 2023 at 12:11 PM Yuya Watari  wrote:

> Hello,
>
> I really appreciate sharing very useful scripts and benchmarking results.
>
> On Fri, Jul 28, 2023 at 6:51 PM Ashutosh Bapat
>  wrote:
> > Given that most of the developers run assert enabled builds it would
> > be good to bring down the degradation there while keeping the
> > excellent speedup in non-assert builds.
>
> From my observation, this degradation in assert-enabled builds is
> caused by verifying the iteration results of EquivalenceMembers. My
> patch uses Bitmapset-based indexes to speed up the iteration. When
> assertions are enabled, we verify that the result of the iteration is
> the same with and without the indexes. This verification results in
> executing a similar loop three times, which causes the degradation. I
> measured planning time by using your script without this verification.
> The results are as follows:
>
> Master: 144.55 ms
> Patched (v19): 529.85 ms
> Patched (v19) without verification: 78.84 ms
> (*) All runs are with assertions.
>
> As seen from the above, verifying iteration results was the cause of
> the performance degradation. I agree that we should avoid such
> degradation because it negatively affects the development of
> PostgreSQL. Removing the verification when committing this patch is
> one possible option.
>

If you think that the verification is important to catch bugs, you may want
to encapsulate it with an #ifdef .. #endif such that the block within is
not compiled by default. See OPTIMIZER_DEBUG for example.
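
A minimal sketch of that pattern (the VERIFY_EC_ITERATION macro name below is illustrative; OPTIMIZER_DEBUG is the existing precedent):

```c
#include <assert.h>
#include <stdio.h>

/*
 * Two toy iteration strategies standing in for the indexed and
 * non-indexed EquivalenceMember scans; illustrative only.
 */
static int iterate_with_index(void)    { return 42; }
static int iterate_without_index(void) { return 42; }

/*
 * The expensive cross-check is compiled in only when explicitly
 * requested, in the style of the existing OPTIMIZER_DEBUG precedent.
 * VERIFY_EC_ITERATION is an illustrative macro name: define it at build
 * time (e.g. with -DVERIFY_EC_ITERATION) to enable the verification.
 */
static int
iterate_members(void)
{
	int			result = iterate_with_index();

#ifdef VERIFY_EC_ITERATION
	if (result != iterate_without_index())
		fprintf(stderr, "indexed iteration diverged from full scan!\n");
#endif

	return result;
}
```

Default builds then pay nothing for the check, while developers chasing a bug in the index machinery can still turn it on.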


>
> > Queries on partitioned tables eat a lot of memory anyways, increasing
> > that further should be avoided.
> >
> > I have not studied the patches. But I think the memory increase has to
> > do with our Bitmapset structure. It's space inefficient when there are
> > thousands of partitions involved. See my comment at [2]
>
> Thank you for pointing this out. I have never considered the memory
> usage impact of this patch. As you say, the Bitmapset structure caused
> this increase. I will try to look into this further.
>
>
Do you think that the memory measurement patch I have shared in those
threads is useful in itself? If so, I will start another proposal to
address it.
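
A toy size calculation shows why a Bitmapset gets expensive with thousands of partitions: words are allocated densely up to the highest member index, so even a sparse set pays for the whole prefix. The layout below mirrors PostgreSQL's one-bit-per-index scheme; the 8-byte header size is an assumption for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_BITMAPWORD 64

/*
 * Approximate bytes needed for a Bitmapset whose largest member index is
 * max_member: one 64-bit word per 64 possible indexes up to the maximum,
 * whether or not the set is sparse, plus a small header (the 8-byte
 * header size is an assumption for illustration).
 */
static size_t
bitmapset_bytes(int max_member)
{
	int			nwords = max_member / BITS_PER_BITMAPWORD + 1;

	return 8 + (size_t) nwords * sizeof(uint64_t);
}
```

So a set that contains only partition number 8191 still needs about a kilobyte, and a planner that keeps many such sets per EquivalenceClass multiplies that quickly.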

-- 
Best Wishes,
Ashutosh Bapat


Re: Extract numeric filed in JSONB more effectively

2023-08-03 Thread Andy Fan
Hi:


> More, I believe a lot of people use the more common syntax, and then this
> syntax should have good performance - for jsonb, (val->'op')::numeric
> works, and there should not be a performance penalty, because this
> syntax will be used 99% of the time.
>

This looks like a valid opinion IMO,  but to rescue it, we have to do
something like "internal structure" and remove the existing cast.
But even if we pay the effort, it still breaks some common knowledge,
since xx::numeric is not a cast.  It is an "internal structure"!


>> I don't think "Black magic" is a proper word here, since it is not much
>> different from ->> return a text.  If you argue text can be cast to
>> most-of-types,  that would be a reason, but I doubt this difference
>> should generate a "black magic".
>>
>
> I used the term black magic, because nobody without reading documentation
> can find this operator.
>

I think this is what the documentation is for..


> It is used just for this special case, and the functionality is the same
> as using cast (only with different performance).
>

This is not good, but I didn't see a better choice so far; see my first
paragraph.


>
> The operator ->> is more widely used. But if we have some possibility to
> work without it, then the usage for a lot of users will be more simple.
> More if the target types can be based on context
>

It would be cool but still I didn't see a way to do that without making
something else complex.


>>> Maybe we can introduce some *internal operator* "extract to type", and
>>> in the rewrite stage we can transform the pattern (x->'field')::type
>>> to OP(x, 'field', typid)
>>>
>>
>> Not sure what the OP should be?  If it is a function, what is the
>> return value?  It looks to me like it is hard to do in c language?
>>
>
> It should be internal structure - it can be similar like COALESCE or IS
> operator
>

It may work, but see my answer in the first paragraph.


>
>
>>
>> After all,  if we really care about the number of operators, I'm OK
>> with just letting users use the function directly, like
>>
>> jsonb_field_as_numeric(jsonb, 'fieldname')
>> jsonb_field_as_timestamp(jsonb, 'fieldname');
>> jsonb_field_as_timestamptz(jsonb, 'fieldname');
>> jsonb_field_as_date(jsonb, 'fieldname');
>>
>> it can save an operator and solves the readability issue.
>>
>
> I don't like it too much, but it is better than introduction new operator
>

Good to know.  Naming operators is a complex task if we add four.


> We already have the jsonb_extract_path and jsonb_extract_path_text
> function.
>

I can't follow this.  jsonb_extract_path returns a jsonb, which is far
away from our goal: returning a numeric efficiently?

I can imagine using the "anyelement" type too, something like
> `jsonb_extract_path_type(jsonb, anyelement, variadic text[])`
>

Can you elaborate this please?

-- 
Best Regards
Andy Fan


Re: Adding a pg_servername() function

2023-08-03 Thread Christoph Moench-Tegeder
## Laetitia Avrot (laetitia.av...@gmail.com):

> For my customer,  their use case is to be able from an SQL client to double
> check they're on the right host before doing things that could become a
> production disaster.

Why not use cluster_name for that? Even if it may not be relevant
for their setup, there might be multiple clusters on the same host
(multiple clusters with the same hostname) or you database could be
hiding behind some failover/loadbalancer mechanism (multiple hostnames
in one cluster), thus using the hostname to identify the cluster is
totally not universal (the same goes for the monitoring/metrics
stuff). And there's of course inet_server_addr(), if you really
need to double-check if you're connected to the right machine.

Regards,
Christoph

-- 
Spare Space.




Re: Extract numeric filed in JSONB more effectively

2023-08-03 Thread Chapman Flack

On 2023-08-03 03:53, Andy Fan wrote:

I didn't realize timetime types are binary compatible with SQL,
so maybe we can have some similar optimization as well.
(It is a pity that timestamp(tz) are not binary, or else we may
just need one operator).


Not to veer from the thread, but something about that paragraph
has been hard for me to parse/follow.

Maybe we can introduce some *internal operator* "extract to type", and in
the rewrite stage we can transform the pattern (x->'field')::type to
OP(x, 'field', typid)


Not sure what the OP should be?  If it is a function, what is the
return value?  It looks to me like it is hard to do in c language?


Now I am wondering about the 'planner support function' available
in CREATE FUNCTION since PG 12. I've never played with that yet.
Would that make it possible to have some, rather generic, extract
from JSON operator that can look at the surrounding expression
and replace itself sometimes with something more efficient?

Regards,
-Chap




Re: Extract numeric filed in JSONB more effectively

2023-08-03 Thread Andy Fan
Hi:


>
> Yes, it's not great, but that's just how this works. We can't
> pre-specialize all possible operations that one might want to do in
> PostgreSQL - that'd be absurdly expensive for binary and initial
> database sizes.
>

Is anyone saying we should pre-specialize all possible operators?
I doubt that adding a few operators would be expensive for binary and
initial database sizes.  If it is, how much per operator, and how many
operators, are you expecting?


> > Secondly, because of the same reason above, we use PG_GETARG_JSONB_P(0),
> > which may detoast a value so we need to free it with PG_FREE_IF_COPY.
> > then this looks like another potential wastage.
>
> Is it? Detoasting only happens if the argument was toasted, and I have
> serious doubts that the result of (a->'a') will be toasted in our
> current system. Sure, we do need to allocate an intermediate result,
> but that's in a temporary memory context that should be trivially
> cheap to free.


If you look back at my context, I listed this as the second factor for the
current strategy, and it is a side effect of factor 1.  FWIW, that cost
is paid for every jsonb object, not just once at database initialization.


> > As a comparison, cast to other data types like
> > int2/int4 may be not needed since they are not binary compatible.
>
> Yet there are casts from jsonb to and back from int2, int4 and int8. I
> don't see a very good reason to add this, for the same reasons
> mentioned by Pavel.
>

Who is insisting on adding such an operator in your opinion?


> *If* we were to add this operator, I would want this patch to also
> include a #-variant for text[]-based deep access (c.q. #> / #>>), and
> equivalent operators for the json type to keep the current access
> operator parity.
>
> > Here is the performance comparison (with -O3, my previous post is -O0).
> >
> > select 1 from tb where (a->'a')::numeric = 2;  31ms.
> > select 1 from tb where (a@->'a') = 2;  15ms
>
> What's tb here?
>

This was in my first post.  Copying it here again.

create table tb (a jsonb);
insert into tb select '{"a": 1}'::jsonb from generate_series(1, 10)i;

-- 
Best Regards
Andy Fan


Re: Adding a pg_servername() function

2023-08-03 Thread Laetitia Avrot
On Thu, Aug 3, 2023 at 14:20, <066ce...@free.fr> wrote:

>
>
> > Agreed, depending on how hosts and DNS are set, it can be useless.
> > But normally, companies have host naming standards to avoid that.
>
>
> BTW you already have it :
>
> select setting from pg_settings where name = 'listen_addresses'
>
> No ?
>
>
Hello,

No, this is not the same thing. Most of the time, we will find either the
wildcard '*' or an IP address or a list of IP addresses, which is not what
we want in this particular use case. But I agree that there are some use
cases where we will find the hostname there; from my experience, though,
that's not the majority of the use cases.

For example, you could use a VIP in listen_addresses to ensure that
remote connections go through that VIP and you always end up
on the right server even though it might be another physical instance
(after a switchover or failover).

Lætitia


Re: Use of additional index columns in rows filtering

2023-08-03 Thread Tomas Vondra



On 8/3/23 03:32, Peter Geoghegan wrote:
> On Wed, Aug 2, 2023 at 6:48 AM Tomas Vondra
>  wrote:
>> How come we don't know that until the execution time? Surely when
>> building the paths/plans, we match the clauses to the index keys, no? Or
>> are you saying that just having a scan key is not enough for it to be
>> "access predicate"?
> 
> In principle we can and probably should recognize the difference
> between "Access Predicates" and "Index Filter Predicates" much earlier
> on, during planning. Probably when index paths are first built.
> 
> It seems quite likely that there will turn out to be at least 2 or 3
> reasons to do this. The EXPLAIN usability issue might be enough of a
> reason on its own, though.
> 

OK

>> Anyway, this patch is mostly about "Index Cond" mixing two types of
>> predicates. But the patch is really about "Filter" predicates - moving
>> some of them from table to index. So quite similar to the "index filter
>> predicates" except that those are handled by the index AM.
> 
> I understand that that's how the patch is structured. It is
> nevertheless possible (as things stand) that the patch will make the
> planner shift from a plan that uses "Access Predicates" to the maximum
> extent possible when scanning a composite index, to a similar plan
> that has a similar index scan, for the same index, but with fewer
> "Access Predicates" in total. In effect, the patched planner will swap
> one type of predicate for another type because doing so enables the
> executor to scan an index once instead of scanning it several times.
> 

That seems very much like something the costing is meant to handle, no?
I mean, surely "access predicate" and "filter" should affect the cost
differently, with "filter" being more expensive (and table filter being
more expensive than index filter).

I was not sure why the patch would affect the plan choice, as it does
not really affect the index predicates (passed to the AM) at all. But I
think I get the point now - the thing is in having two different index
paths, like:

  PATH #1: access predicates (A,B)
  PATH #2: access predicate A, index filter B

AFAICS the assumption is that path #1 should be better, as it has two
proper access predicates. But maybe if we add another condition C, it
might end up like this:

  PATH #1: access predicates (A,B), table filter C
  PATH #2: access predicate A, index filter (B,C)

and #2 will end up winning.

I still think this seems more like a costing issue (and I'd guess we may
already have similar cases for index-only scans).

IMO we can either consider the different predicate types during costing.
Sure, then we have the usual costing risks, but that's expected.

Or we could just ignore this during costing entirely (and ditch the
costing from the patch). Then the cost doesn't change, and we don't have
any new risks.
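To make the trade-off concrete, here is a toy cost model (the weights and selectivities below are invented for illustration and are nothing like the planner's real formulas) in which trading an access predicate for an index filter wins:

```python
# Toy cost model: an index scan pays a small per-index-tuple CPU cost and a
# much larger per-heap-fetch cost for every tuple surviving the index-side
# predicates.  (Invented weights, not PostgreSQL's actual cost parameters.)
CPU_INDEX_TUPLE_COST = 0.005
HEAP_FETCH_COST = 1.0

def scan_cost(index_tuples_read, heap_rows_fetched):
    return (index_tuples_read * CPU_INDEX_TUPLE_COST
            + heap_rows_fetched * HEAP_FETCH_COST)

# PATH #1: access predicates (A,B), table filter C.
# A,B narrow the index scan to 100 tuples, but C is checked only after the
# heap fetch, so all 100 rows must be fetched from the table.
path1 = scan_cost(index_tuples_read=100, heap_rows_fetched=100)

# PATH #2: access predicate A, index filter (B,C).
# Only A narrows the scan (1000 index tuples), but B and C are applied
# before the heap fetch, so only the 10 finally-matching rows are fetched.
path2 = scan_cost(index_tuples_read=1000, heap_rows_fetched=10)

print(path1, path2)  # -> 100.5 15.0, so path #2 wins on cost
```

Whether path #2 actually wins depends on the real selectivities; the point is only that costing can already distinguish the predicate types, provided filter rows are charged appropriately.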

> I don't dispute the fact that this can only happen when the planner
> believes (with good reason) that the expected cost will be lower. But
> I maintain that there is a novel risk to be concerned about, which is
> meaningfully distinct from the general risk of regressions that comes
> from making just about any change to the planner. The important
> principle here is that we should have a strong bias in the direction
> of making quals into true "Access Predicates" whenever practical.
> 
> Yeah, technically the patch doesn't directly disrupt how existing
> index paths get generated. But it does have the potential to disrupt
> it indirectly, by providing an alternative very-similar index path
> that's likely to outcompete the existing one in these cases. I think
> that we should have just one index path that does everything well
> instead.
> 

Yeah, I think that's the scenario I described above ...

>> But differentiating between access and filter predicates (at the index
>> AM level) seems rather independent of what this patch aims to do.
> 
> My concern is directly related to the question of "access predicates
> versus filter predicates", and the planner's current ignorance on
> which is which. The difference may not matter too much right now, but
> ISTM that your patch makes it matter a lot more. And so in that
> indirect sense it does seem relevant.
> 

I'm not sure my patch makes it matter a lot more. Yes, moving a filter
from the table to the index may lower the scan cost, but that can happen
for a lot of other reasons ...

> The planner has always had a strong bias in the direction of making
> clauses that can be index quals into index quals, rather than filter
> predicates. It makes sense to do that even when they aren't very
> selective. This is a similar though distinct principle.
> 
> It's awkward to discuss the issue, since we don't really have official
> names for these things just yet (I'm just going with what Markus calls
> them for now). And because there is more than one type of "index
> filter predicate" in play with the patch (namely those in the index
> AM, and those in the index scan 

Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

2023-08-03 Thread Melih Mutlu
Hi,

Amit Kapila , on Thu, 3 Aug 2023 at 09:22, wrote:

> On Thu, Aug 3, 2023 at 9:35 AM Peter Smith  wrote:
> > I checked the latest patch v25-0001.
> >
> > LGTM.
> >
>
> Thanks, I have pushed 0001. Let's focus on the remaining patches.
>

Thanks!

Peter Smith , on Thu, 3 Aug 2023 at 12:06, wrote:

> Just to clarify my previous post, I meant we will need new v26* patches
>

Right. I attached the v26 as you asked.

Thanks,
-- 
Melih Mutlu
Microsoft


v26-0001-Reuse-Tablesync-Workers.patch
Description: Binary data


v26-0002-Reuse-connection-when-tablesync-workers-change-t.patch
Description: Binary data


Re: Fix incorrect start up costs for WindowAgg paths (bug #17862)

2023-08-03 Thread David Rowley
Thanks for having a look at this.

On Thu, 3 Aug 2023 at 18:49, Andy Fan  wrote:
> 1. ORDER BY or PARTITION BY
>
> select *, count(two) over (order by unique1) from tenk1 limit 1;
> DEBUG:  startup_tuples = 1
> DEBUG:  startup_tuples = 1
>
> select *, count(two) over (partition by unique1) from tenk1 limit 1;
> DEBUG:  startup_tuples = 1
> DEBUG:  startup_tuples = 1
>
> Due to the Executor of nodeWindowAgg, we have to fetch the next tuple
> until it mismatches with the current one, then we can calculate the
> WindowAgg function. In the current patch, we didn't count the
> mismatched tuple. I verified my thought with 'break at IndexNext'
> function and see IndexNext is called twice, so in the above case the
> startup_tuples should be 2?

You're probably right here. I'd considered that it wasn't that
critical and aimed to keep the estimate close to the number
of rows that'll be aggregated. I think that's probably not the best
thing to do: if you consider the EXCLUDE options, those just exclude
tuples from aggregation; they do not mean we read fewer tuples from
the subnode. I've updated the patch accordingly.

> 2. ORDER BY and PARTITION BY
>
> select two, hundred,
> count(two) over (partition by ten order by hundred)
> from tenk1 limit 1;
>
> DEBUG:  startup_tuples = 10
>  two | hundred | count
> -+-+---
>0 |   0 |   100
>
> If we consider the mismatched tuples, it should be 101?

I don't really see how we could do better with the current level of
statistics.  The stats don't know that there are only 10 distinct
"hundred" values for rows which have ten=1. All we have is n_distinct
on tenk1.hundred, which is 100.
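A sketch of the kind of uniformity-based arithmetic those stats support (a deliberately simplified model, not the patch's actual code):

```python
def estimate_startup_tuples(total_rows, n_distinct_partition, n_distinct_order):
    """Rows a WindowAgg must read before emitting its first row, assuming
    uniformly distributed, independent values -- the best n_distinct-based
    stats can do."""
    # rows in the first PARTITION BY group
    partition_rows = total_rows / n_distinct_partition
    # rows in the first ORDER BY peer group within that partition, plus the
    # one mismatched tuple read to detect the end of the peer group
    return partition_rows / n_distinct_order + 1

# tenk1: 10000 rows, "ten" has 10 distinct values, "hundred" has 100.
# The stats-based answer is ~11 even though the true count is 101, because
# n_distinct cannot see that ten=1 co-occurs with only 10 distinct values
# of hundred.
print(estimate_startup_tuples(10000, 10, 100))  # -> 11.0
```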

> 3. As we can see the log for startup_tuples is logged twice sometimes,
> the reason is because it is used in cost_windowagg, so it is calculated
> for every create_one_window_path. I think the startup_tuples should be
> independent with the physical path, maybe we can cache it somewhere to
> save some planning cycles?

I wondered about that too but I ended up writing off the idea of
caching because the input_tuple count comes from the Path and the
extra calls are coming from other Paths, which could well have some
completely different value for input_tuples.

David


fix_bug_17862_v4.patch
Description: Binary data


Re: Use of additional index columns in rows filtering

2023-08-03 Thread Tomas Vondra
On 8/3/23 07:26, Peter Geoghegan wrote:
> On Wed, Aug 2, 2023 at 6:32 PM Peter Geoghegan  wrote:
>> I don't dispute the fact that this can only happen when the planner
>> believes (with good reason) that the expected cost will be lower. But
>> I maintain that there is a novel risk to be concerned about, which is
>> meaningfully distinct from the general risk of regressions that comes
>> from making just about any change to the planner. The important
>> principle here is that we should have a strong bias in the direction
>> of making quals into true "Access Predicates" whenever practical.
>>

OK, preference for access predicates sounds like a reasonable principle.

>> Yeah, technically the patch doesn't directly disrupt how existing
>> index paths get generated. But it does have the potential to disrupt
>> it indirectly, by providing an alternative very-similar index path
>> that's likely to outcompete the existing one in these cases. I think
>> that we should have just one index path that does everything well
>> instead.
> 
> You can see this for yourself, quite easily. Start by running the
> relevant query from the regression tests, which is:
> 
> SELECT * FROM tenk1 WHERE thousand = 42 AND (tenthous = 1 OR tenthous
> = 3 OR tenthous = 42);
> 
> EXPLAIN (ANALYZE, BUFFERS) confirms that the patch makes the query
> slightly faster, as expected. I see 7 buffer hits for the bitmap index
> scan plan on master, versus only 4 buffer hits for the patch's index
> scan. Obviously, this is because we go from multiple index scans
> (multiple bitmap index scans) to only one.
> 
> But, if I run this insert statement and try the same thing again,
> things look very different:
> 
> insert into tenk1 (thousand, tenthous) select 42, i from
> generate_series(43, 1000) i;
> 
> (Bear in mind that we've inserted rows that don't actually need to be
> selected by the query in question.)
> 
> Now the master branch's plan works in just the same way as before --
> it has exactly the same overhead (7 buffer hits). Whereas the patch
> still gets the same risky plan -- which now blows up. The plan now
> loses by far more than it could ever really hope to win by: 336 buffer
> hits. (It could be a lot higher than this, even, but you get the
> picture.)

Are you sure? Because if I try this on master (62e9af4c without any
patches), I get this:

regression=# explain (verbose, analyze, buffers) SELECT * FROM tenk1
WHERE thousand = 42 AND (tenthous = 1 OR tenthous = 3 OR tenthous = 42);

  QUERY PLAN

 Index Scan using tenk1_thous_tenthous on public.tenk1
(cost=0.29..1416.32 rows=1 width=244) (actual time=0.078..1.361 rows=1
loops=1)
   Output: unique1, unique2, two, four, ten, twenty, hundred, thousand,
twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
   Index Cond: (tenk1.thousand = 42)
   Filter: ((tenk1.tenthous = 1) OR (tenk1.tenthous = 3) OR
(tenk1.tenthous = 42))
   Rows Removed by Filter: 967
   Buffers: shared hit=335
 Planning Time: 0.225 ms
 Execution Time: 1.430 ms
(8 rows)

So I'm not sure about the claim that master works fine as before. OTOH, the
patched branch (with "my" patch from 2023/07/16, just to be clear) does this:

  QUERY PLAN

 Index Scan using tenk1_thous_tenthous on public.tenk1
(cost=0.29..23.57 rows=1 width=244) (actual time=0.077..0.669 rows=1
loops=1)
   Output: unique1, unique2, two, four, ten, twenty, hundred, thousand,
twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
   Index Cond: (tenk1.thousand = 42)
   Index Filter: ((tenk1.tenthous = 1) OR (tenk1.tenthous = 3) OR
(tenk1.tenthous = 42))
   Rows Removed by Index Recheck: 967
   Filter: ((tenk1.tenthous = 1) OR (tenk1.tenthous = 3) OR
(tenk1.tenthous = 42))
   Buffers: shared hit=7
 Planning Time: 0.211 ms
 Execution Time: 0.722 ms
(9 rows)

Which is just the 7 buffers ...

Did I do something wrong?

> 
> Sure, it's difficult to imagine a very general model that captures
> this sort of risk, in the general case. But you don't need a degree in
> actuarial science to understand that it's inherently a bad idea to
> juggle chainsaws -- no matter what your general risk tolerance happens
> to be. Some things you just don't do.
> 

Oh come on. I've been juggling chainsaws (lit on fire!) since I was a
little boy! There's nothing risky about it.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Oversight in reparameterize_path_by_child leading to executor crash

2023-08-03 Thread Ashutosh Bapat
On Thu, Aug 3, 2023 at 4:14 PM Richard Guo  wrote:

>
> On Thu, Aug 3, 2023 at 12:38 PM Ashutosh Bapat <
> ashutosh.bapat@gmail.com> wrote:
>


> Sometimes it would help to avoid the translation at all if we postpone
> the reparameterization until createplan.c, such as in the case that a
> non-parameterized path wins at last.  For instance, for the query below
>
> regression=# explain (costs off)
> select * from prt1 t1 join prt1 t2 on t1.a = t2.a;
>  QUERY PLAN
> 
>  Append
>->  Hash Join
>  Hash Cond: (t1_1.a = t2_1.a)
>  ->  Seq Scan on prt1_p1 t1_1
>  ->  Hash
>->  Seq Scan on prt1_p1 t2_1
>->  Hash Join
>  Hash Cond: (t1_2.a = t2_2.a)
>  ->  Seq Scan on prt1_p2 t1_2
>  ->  Hash
>->  Seq Scan on prt1_p2 t2_2
>->  Hash Join
>  Hash Cond: (t1_3.a = t2_3.a)
>  ->  Seq Scan on prt1_p3 t1_3
>  ->  Hash
>->  Seq Scan on prt1_p3 t2_3
> (16 rows)
>
> Our current code would reparameterize each inner paths although at last
> we choose the non-parameterized paths.  If we do the reparameterization
> in createplan.c, we would be able to avoid all the translations.
>
>
I agree. The costs do not change because of reparameterization. The process
of creating paths is to estimate costs of different plans. So I think it
makes sense to delay anything which doesn't contribute to costing till the
final plan is decided.

However, if the reparameterization turns out *not* to be possible at
planning time, we have chosen a plan which cannot be realised and we hit a
dead end. So as long as we can ensure that the reparameterization is
possible, we can delay the actual translations. I think it should be
possible to decide whether reparameterization is possible based on the
type of path alone, so this seems doable.
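Abstractly, the proposal amounts to something like this (hypothetical dict-based "paths", not the real planner structures): since reparameterization does not change costs, only the winning path ever needs translating.

```python
def choose_and_finalize(paths, reparameterize):
    """Pick the cheapest path during costing, then translate only the
    winner in the 'createplan' phase.  Paths are hypothetical dicts."""
    best = min(paths, key=lambda p: p["cost"])   # costing phase
    if best.get("needs_reparam"):                # createplan phase
        best = reparameterize(best)
    return best

translations = []
def reparam(path):
    translations.append(path["name"])
    return {**path, "needs_reparam": False}

paths = [
    {"name": "param_nestloop", "cost": 120.0, "needs_reparam": True},
    {"name": "hash_join",      "cost": 80.0,  "needs_reparam": False},
]
best = choose_and_finalize(paths, reparam)
# The non-parameterized hash join wins, so no translation happens at all.
print(best["name"], translations)  # -> hash_join []
```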

-- 
Best Wishes,
Ashutosh Bapat


Re: Synchronizing slots from primary to standby

2023-08-03 Thread shveta malik
On Thu, Aug 3, 2023 at 12:28 AM Bharath Rupireddy
 wrote:
>
> On Tue, Aug 1, 2023 at 5:01 PM shveta malik  wrote:
> >
> > > The work division amongst the sync workers can
> > > be simple, the logical replication launcher builds a shared memory
> > > structure based on number of slots to sync and starts the sync workers
> > > dynamically, and each sync worker picks {dboid, slot name, conninfo}
> > > from the shared memory, syncs it and proceeds with other slots.
> >
> > Do you mean  the logical replication launcher builds a shared memory
> > structure based
> > on the number of 'dbs' to sync as I understood from your initial comment?
>
> Yes. I haven't looked at the 0003 patch posted upthread. However, the
> standby must do the following at a minimum:
>
> - Make GUCs synchronize_slot_names and max_slot_sync_workers of
> PGC_POSTMASTER type needing postmaster restart when changed as they
> affect the number of slot sync workers.

I agree that max_slot_sync_workers should be allowed to change only
during startup, but I strongly feel that synchronize_slot_names should
be runtime modifiable. We should give the user that flexibility.

> - LR (logical replication) launcher connects to primary to fetch the
> logical slots specified in synchronize_slot_names. This is a one-time
> task.

If synchronize_slot_names='*', we need to fetch the slots' info at regular
intervals even if the GUC is not runtime modifiable. For the runtime
modifiable case, it is obvious we must refetch it at regular intervals.

> - LR launcher prepares a dynamic shared memory (created via
> dsm_create) with some state like locks for IPC and an array of
> {slot_name, dboid_associated_with_slot, is_sync_in_progress} - maximum
> number of elements in the array is the number of slots specified in
> synchronize_slot_names. This is a one-time task.


Yes, we need dynamic shared memory, but it is not a one-time allocation.
If it were a one-time allocation, there would be no need for DSM; a plain
shared memory allocation would have been enough. It is not a one-time
allocation in any of the designs. If it is a slot-based design, the slots
may keep varying for the '*' case, and if it is a DB-based design, the
number of DBs may go beyond the memory initially allocated, so we may need
reallocation and a relaunch of the worker, hence the need for DSM.

> - LR launcher decides the *best* number of slot sync workers - (based
> on some perf numbers) it can just launch, say, one worker per 2 or 4
> or 8 etc. slots.
> - Each slot sync worker then picks up a slot from the DSM, connects to
> primary using primary conn info, syncs it, and moves to another slot.
>

A design based on slots, i.e. the launcher dividing the slots among the
available workers, could prove beneficial over DB-based division for a
case where the number of slots per DB varies widely and we would otherwise
end up assigning all the DBs with fewer slots to one worker
while all the heavily loaded DBs go to another. But other than this, I see
a lot of pain points:

1) Since we are going to do slot-based syncing, query construction
will be complex. We will have a query with a long 'where' clause:
where slot_name in (slot1, slot2, ...).

2) The number of pings to the primary will be higher, as we are pinging it
per slot instead of per DB. So the information which we could have
fetched collectively in one query (if it were DB based) is now split
across multiple queries, given that there could be cases where slots
belonging to the same DB end up split among different
workers.

3) If the number of slots < the max number of workers, how are we going to
assign the workers? One slot per worker, or all in one worker? If it is
one slot per worker, it will again not be that efficient, as it will
result in more network traffic. This needs more thought and a design that
varies case by case.
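To illustrate point 2: with a naive slot-based round-robin split (a hypothetical sketch, not the patch's actual assignment logic), slots of the same DB can land on different workers, multiplying the queries to the primary compared to a DB-based split.

```python
from collections import defaultdict

def split_slots_round_robin(slots, n_workers):
    """Slot-based division: assign (db, slot) pairs to workers round-robin,
    ignoring which database each slot belongs to."""
    workers = defaultdict(list)
    for i, (db, slot) in enumerate(slots):
        workers[i % n_workers].append((db, slot))
    return dict(workers)

slots = [("db1", "s1"), ("db1", "s2"), ("db2", "s3"), ("db2", "s4")]
per_worker = split_slots_round_robin(slots, 2)
# Worker 0 gets db1/s1 and db2/s3; worker 1 gets db1/s2 and db2/s4.  Both
# workers now need to ask the primary about both databases, doubling the
# "WHERE slot_name IN (...)" round trips a DB-based split would need.
print(per_worker)
```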


> Not having the capability of on-demand stop/launch of slot sync
> workers makes the above design simple IMO.
>

We anyway need to relaunch workers when the DSM is reallocated, in case
the DBs (or, say, slots) exceed some initial allocation limit.

thanks
Shveta




Re: Adding a pg_servername() function

2023-08-03 Thread 066ce286



> Agreed, depending on how hosts and DNS are set, it can be useless.
> But normally, companies have host naming standards to avoid that.


BTW you already have it : 

select setting from pg_settings where name = 'listen_addresses' 

No ?




Re: Oversight in reparameterize_path_by_child leading to executor crash

2023-08-03 Thread Richard Guo
On Thu, Aug 3, 2023 at 12:38 PM Ashutosh Bapat 
wrote:

> On Tue, Aug 1, 2023 at 6:50 PM Tom Lane  wrote:
> > Alternatively, could we postpone the reparameterization until
> > createplan.c?  Having to build reparameterized expressions we might
> > not use seems expensive, and it would also contribute to the memory
> > bloat being complained of in nearby threads.
>
> As long as the translation happens only once, it should be fine. It's
> the repeated translations which should be avoided.


Sometimes it would help to avoid the translation at all if we postpone
the reparameterization until createplan.c, such as in the case that a
non-parameterized path wins at last.  For instance, for the query below

regression=# explain (costs off)
select * from prt1 t1 join prt1 t2 on t1.a = t2.a;
 QUERY PLAN

 Append
   ->  Hash Join
 Hash Cond: (t1_1.a = t2_1.a)
 ->  Seq Scan on prt1_p1 t1_1
 ->  Hash
   ->  Seq Scan on prt1_p1 t2_1
   ->  Hash Join
 Hash Cond: (t1_2.a = t2_2.a)
 ->  Seq Scan on prt1_p2 t1_2
 ->  Hash
   ->  Seq Scan on prt1_p2 t2_2
   ->  Hash Join
 Hash Cond: (t1_3.a = t2_3.a)
 ->  Seq Scan on prt1_p3 t1_3
 ->  Hash
   ->  Seq Scan on prt1_p3 t2_3
(16 rows)

Our current code would reparameterize each inner paths although at last
we choose the non-parameterized paths.  If we do the reparameterization
in createplan.c, we would be able to avoid all the translations.

Thanks
Richard


Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-03 Thread Amit Kapila
On Wed, Aug 2, 2023 at 1:43 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> Thank you for giving comments! PSA new version patchset.
>
> > 1. Do we really need 0001 patch after the latest change proposed by
> > Vignesh in the 0004 patch?
>
> I removed 0001 patch and revived old patch which serializes slots at shutdown.
> This is because the problem which slots are not serialized to disk still 
> remain [1]
> and then confirmed_flush becomes behind, even if we implement the approach.
>

So, IIUC, you are talking about a patch with the below commit message.
[PATCH v18 2/4] Always persist to disk logical slots during a
 shutdown checkpoint.

It's entirely possible for a logical slot to have a confirmed_flush_lsn higher
than the last value saved on disk while not being marked as dirty.  It's
currently not a problem to lose that value during a clean shutdown / restart
cycle, but a later patch adding support for pg_upgrade of publications and
logical slots will rely on that value being properly persisted to disk.


As per this commit message, this patch should be numbered as 1 but you
have placed it as 2 after the main upgrade patch?


> > 2.
> > + if (dopt.logical_slots_only)
> > + {
> > + if (!dopt.binary_upgrade)
> > + pg_fatal("options --logical-replication-slots-only requires option
> > --binary-upgrade");
> > +
> > + if (dopt.dataOnly)
> > + pg_fatal("options --logical-replication-slots-only and
> > -a/--data-only cannot be used together");
> > +
> > + if (dopt.schemaOnly)
> > + pg_fatal("options --logical-replication-slots-only and
> > -s/--schema-only cannot be used together");
> >
> > Can you please explain why the patch imposes these restrictions? I
> > guess the binary_upgrade is because you want this option to be used
> > for the upgrade. Do we want to avoid giving any other option with
> > logical_slots, if so, are the above checks sufficient and why?
>
> Regarding the --binary-upgrade, the motivation is same as you expected. I 
> covered
> up the --logical-replication-slots-only option from users, so it should not be
> used not for upgrade. Additionaly, this option is not shown in help and 
> document.
>
> As for -{data|schema}-only options, I removed restrictions.
> Firstly I set as excluded because it may be confused - as discussed at [2], 
> slots
> must be dumped after all the pg_resetwal is done and at that time all the 
> definitions
> are already dumped. to avoid duplicated definitions, we must ensure only 
> slots are
> written in the output file. I thought this requirement contradict 
> descirptions of
> these options (Dump only the A, not B).
> But after considering more, I thought this might not be needed because it was 
> not
> opened to users - no one would be confused by using both them.
> (Restriction for -c is also removed for the same motivation)
>

I see inconsistent behavior here with the patch. If I use "pg_dump.exe
--schema-only --logical-replication-slots-only --binary-upgrade
postgres" then I get only a dump of slots without any schema. When I
use "pg_dump.exe --data-only --logical-replication-slots-only
--binary-upgrade postgres" then I get neither table data nor slots. When I
use "pg_dump.exe --create --logical-replication-slots-only
--binary-upgrade postgres" then it returns the error "pg_dump: error:
role with OID 10 does not exist".

Now, I tried using --binary-upgrade with some other option like
"pg_dump.exe --create --binary-upgrade postgres" and then I got a dump
with all required objects with support for binary-upgrade.

I think your thought here is that this new option won't be usable
directly with pg_dump, but we should study whether to allow other
options to be supported with --binary-upgrade for in-place upgrade
utilities other than pg_upgrade.

>
> > 4.
> > + /*
> > + * Check that all logical replication slots have reached the current WAL
> > + * position.
> > + */
> > + res = executeQueryOrDie(conn,
> > + "SELECT slot_name FROM pg_catalog.pg_replication_slots "
> > + "WHERE (SELECT count(record_type) "
> > + " FROM pg_catalog.pg_get_wal_records_content(confirmed_flush_lsn,
> > pg_catalog.pg_current_wal_insert_lsn()) "
> > + " WHERE record_type != 'CHECKPOINT_SHUTDOWN') <> 0 "
> > + "AND temporary = false AND wal_status IN ('reserved', 'extended');");
> >
> > I think this can unnecessarily lead to reading a lot of WAL data if
> > the confirmed_flush_lsn for a slot is too much behind. Can we think of
> > improving this by passing the number of records to read which in this
> > case should be 1?
>
> I checked and pg_wal_record_info() seemed to be used for the purpose. I tried 
> to
> move the functionality to core.
>

But I don't see how it addresses my concern about reading too many
records. If the confirmed_flush_lsn is too much behind, it will also
try to read all the remaining WAL for such slots.

> But this function raise an ERROR when there is no valid record after the 
> specified
> lsn. This means that the pg_upgrade fails if logical slots has caught up the 
> current
> WAL 

Re: Adding a pg_servername() function

2023-08-03 Thread Laetitia Avrot
On Thu, Aug 3, 2023, at 11:31, Matthias van de Meent <
boekewurm+postg...@gmail.com> wrote:

> On Thu, 3 Aug 2023 at 10:37, Laetitia Avrot 
> wrote:
> >
> > Dear Hackers,
> >
> > One of my customers suggested creating a function that could return the
> server's hostname.
>
> Mostly for my curiosity: What would be their use case?
>

Thank you for showing interest in that patch.

>
For my customer,  their use case is to be able from an SQL client to double
check they're on the right host before doing things that could become a
production disaster.

I also see another use case: being able to identify Postgres metrics in a
monitoring tool. Look at the hack pg_statviz uses here:
https://github.com/vyruss/pg_statviz/blob/7cd0c694cea40f780fb8b76275c6097b5d210de6/src/pg_statviz/libs/info.py#L30

Those are the use cases I can think of.

I only see limited usability, considering that the local user's
> hostname can be very different from the hostname used in the url that
> connects to that PostgreSQL instance.
>

Agreed, depending on how hosts and DNS are set, it can be useless. But
normally, companies have host naming standards to avoid that.

Have a nice day,

Lætitia


> Kind regards,
>
> Matthias van de Meent
> Neon (https://neon.tech)
>


Re: Extract numeric filed in JSONB more effectively

2023-08-03 Thread Matthias van de Meent
On Wed, 2 Aug 2023 at 03:05, Andy Fan  wrote:
>
> Hi Matthias:
>
> On Wed, Aug 2, 2023 at 7:33 AM Andy Fan  wrote:
>>
>>
>>
>> On Tue, Aug 1, 2023 at 7:03 PM Matthias van de Meent 
>>  wrote:
>>>
>>> On Tue, 1 Aug 2023 at 06:39, Andy Fan  wrote:
>>> >
>>> > Hi:
>>> >
>>> > Currently if we want to extract a numeric field in jsonb, we need to use
>>> > the following expression:  cast (a->>'a' as numeric). It will turn a 
>>> > numeric
>>> > to text first and then turn the text to numeric again.
>>>
>>> Why wouldn't you use cast(a->'a' as numeric), or ((a->'a')::numeric)?
>>
>>
>> Thanks for this information! I didn't realize we have this function
>> already at [1].
>>
>> https://www.postgresql.org/docs/15/functions-json.html
>
>
> Hi:
>
> I just found ((a->'a')::numeric) is not as effective as I expected.
>
> First in the above expression we used jsonb_object_field which
> returns a jsonb (see JsonbValueToJsonb), and then we convert jsonb
> to jsonbValue in jsonb_numeric (see JsonbExtractScalar). This
> looks like a wastage.

Yes, it's not great, but that's just how this works. We can't
pre-specialize all possible operations that one might want to do in
PostgreSQL - that'd be absurdly expensive for binary and initial
database sizes.

> Secondly, because of the same reason above, we use PG_GETARG_JSONB_P(0),
> which may detoast a value so we need to free it with PG_FREE_IF_COPY.
> then this looks like another potential wastage.

Is it? Detoasting only happens if the argument was toasted, and I have
serious doubts that the result of (a->'a') will be toasted in our
current system. Sure, we do need to allocate an intermediate result,
but that's in a temporary memory context that should be trivially
cheap to free.

> /*
>  * v.val.numeric points into jsonb body, so we need to make a copy to
>  * return
>  */
> retValue = DatumGetNumericCopy(NumericGetDatum(v.val.numeric));
>
> At last this method needs 1 extra FuncExpr than my method, this would
> cost some expression execution effort. I'm not saying we need to avoid
> expression execution generally, but extracting numeric fields from jsonb
> looks a reasonable case.

But we don't have special cases for the other jsonb types  - the one
that is available (text) is lossy and doesn't work reliably without
making sure the field we're accessing is actually a string, and not
any other type of value.

> As a comparison, cast to other data types like
> int2/int4 may be not needed since they are not binary compatible.

Yet there are casts from jsonb to and back from int2, int4 and int8. I
don't see a very good reason to add this, for the same reasons
mentioned by Pavel.

*If* we were to add this operator, I would want this patch to also
include a #-variant for text[]-based deep access (c.q. #> / #>>), and
equivalent operators for the json type to keep the current access
operator parity.

> Here is the performance comparison (with -O3, my previous post is -O0).
>
> select 1 from tb where (a->'a')::numeric = 2;  31ms.
> select 1 from tb where (a@->'a') = 2;  15ms

What's tb here?


Kind regards,

Matthias van de Meent
Neon (https://neon.tech)




Re: Extract numeric field in JSONB more effectively

2023-08-03 Thread Pavel Stehule
Hi

On Thu, Aug 3, 2023 at 9:53 AM, Andy Fan wrote:

> Hi Pavel:
>
> Thanks for the feedback.
>
> I don't like this solution because it is bloating  operators and it is not
>> extra readable.
>>
>
> If we support it with a cast, could we say we are bloating CAST?  It is true
> that it is not extra readable; if so, how about a->>'a' returning text?
> Actually, I can't guess the meaning of any of the existing jsonb operators
> without documentation.
>

Yes, it can bloat CAST, but it reuses syntax we already have, and these
casts are already cooked in:

(2023-08-03 11:04:51) postgres=# select castfunc::regprocedure from pg_cast
where castsource = 'jsonb'::regtype;
┌──┐
│ castfunc │
╞══╡
│ -│
│ bool(jsonb)  │
│ "numeric"(jsonb) │
│ int2(jsonb)  │
│ int4(jsonb)  │
│ int8(jsonb)  │
│ float4(jsonb)│
│ float8(jsonb)│
└──┘
(8 rows)

The operator ->> was a special case: the text type is special in Postgres
as the most convertible type, and when you want to visualise or display a
value, you convert it to text.

I can live with that because it is just one, but your proposal opens the
door to implementing tens of similar operators, which I think is bad.
Using ::target_type is common syntax and doesn't require reading the
documentation.

Moreover, I believe a lot of people use the more common syntax, and that
syntax should have good performance. For jsonb, (val->'op')::numeric works,
and there should not be a performance penalty, because this syntax will be
used in 99% of cases.

Usage of cast is self documented.


> For completeness you should implement cast for date, int, boolean too.
>> Next, the same problem is with XML or hstore type (probably with any types
>> that are containers).
>>
>
> I am not sure completeness is a golden rule we should obey every time;
> for example, we have functions like int24le to avoid an unnecessary
> cast, but we only avoid casting for special types for performance
> reasons, not for all of them. At the same time, `int2/int4/int8` don't
> have a binary-compatible type in jsonb, and the serialization
> /deserialization for boolean is pretty cheap.
>
> I didn't realize datetime types are binary compatible with SQL,
> so maybe we can have a similar optimization as well.
> (It is a pity that timestamp(tz) is not binary, or else we may
> just need one operator).
>
>
>>
>> I don't like the idea so using a special operator is 2x faster than
>> common syntax for casting. It is a signal, so there is a space for
>> optimization. Black magic with special operators is not user friendly for
>> relatively common problems.
>>
>
> I don't think "black magic" is the proper term here, since it is not much
> different from ->> returning text.  If you argue text can be cast to
> most types, that would be a reason, but I doubt this difference
> amounts to "black magic".
>

I used the term "black magic" because nobody can find this operator without
reading the documentation. It is used just for this special case, and the
functionality is the same as using a cast (only with different performance).

The operator ->> is more widely used. But if we have some way to work
without it, then usage will be simpler for a lot of users, even more so if
the target type can be inferred from context.

It would be nice to use something like `EXTRACT(YEAR FROM val->'field')`
instead of `EXTRACT(YEAR FROM (val->>'field')::date)`.


>
>>
>> Maybe we can introduce some *internal operator* "extract to type", and in
>> rewrite stage we can the pattern (x->'field')::type transform to OP(x,
>> 'field', typid)
>>
>
> I am not sure what the OP should be.  If it is a function, what is the
> return value?  It looks to me like it is hard to do in C.
>

It should be an internal structure; it could be similar to the COALESCE or
IS operators.


>
> After all, if we really care about the number of operators, I'm OK
> with just letting users use the function directly, like
>
> jsonb_field_as_numeric(jsonb, 'fieldname')
> jsonb_field_as_timestamp(jsonb, 'fieldname');
> jsonb_field_as_timestamptz(jsonb, 'fieldname');
> jsonb_field_as_date(jsonb, 'fieldname');
>
> It saves an operator and solves the readability issue.
>

I don't like it much, but it is better than introducing a new operator.

We already have the jsonb_extract_path and jsonb_extract_path_text
functions.

I can imagine using the "anyelement" type too, something like
`jsonb_extract_path_type(jsonb, anyelement, variadic text[])`.






> --
> Best Regards
> Andy Fan
>


Re: [BUG] Fix DETACH with FK pointing to a partitioned table fails

2023-08-03 Thread tender wang
I think the code that determines whether an FK of a partition is inherited
or not is not sufficient.
For example, in this case, foreign key r_1_p_id_fkey1 is not inherited
from the parent.

If conform->conparentid (in the DetachPartitionFinalize function) is valid,
we should recheck the confrelid (pg_constraint) field.

I try to fix this problem in the attached patch.
Any thoughts?

Alvaro Herrera wrote on Thursday, Aug 3, 2023 at 17:02:

> On 2023-Aug-03, tender wang wrote:
>
> > I think the old "sub-FK" should not be dropped; that would violate the
> > foreign key constraint.
>
> Yeah, I've been playing more with the patch and it is definitely not
> doing the right things.  Just eyeballing the contents of pg_trigger and
> pg_constraint for partitions added by ALTER...ATTACH shows that the
> catalog contents are inconsistent with those added by CREATE TABLE
> PARTITION OF.
>
> --
> Álvaro Herrera PostgreSQL Developer  —
> https://www.EnterpriseDB.com/
>
From d84395c7321c201a78661de0b41b76e71ab10678 Mon Sep 17 00:00:00 2001
From: "tender.wang" 
Date: Thu, 3 Aug 2023 17:23:06 +0800
Subject: [PATCH] Recheck foreign key of a partition is inherited from parent.

Previously, an FK was considered inherited if conparentid (pg_constraint)
was valid. That is not enough; we should also compare the confrelid field
to determine whether the FK is inherited.
---
 src/backend/commands/tablecmds.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 727f151750..1447433109 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -18556,6 +18556,8 @@ DetachPartitionFinalize(Relation rel, Relation partRel, 
bool concurrent,
ForeignKeyCacheInfo *fk = lfirst(cell);
HeapTuple   contup;
Form_pg_constraint conform;
+   HeapTuple parentTup;
+   Form_pg_constraint parentForm;
Constraint *fkconstraint;
Oid insertTriggerOid,
updateTriggerOid;
@@ -18573,6 +18575,23 @@ DetachPartitionFinalize(Relation rel, Relation 
partRel, bool concurrent,
continue;
}
 
+   /* recheck confrelid field */
+   if (OidIsValid(conform->conparentid))
+   {
+   parentTup = SearchSysCache1(CONSTROID, 
ObjectIdGetDatum(conform->conparentid));
+   if (!HeapTupleIsValid(parentTup))
+   elog(ERROR, "cache lookup failed for constraint 
%u", conform->conparentid);
+   parentForm = (Form_pg_constraint) GETSTRUCT(parentTup);
+   /* It is not inherited foreign keys */
+   if (parentForm->confrelid != conform->confrelid)
+   {
+   ReleaseSysCache(contup);
+   ReleaseSysCache(parentTup);
+   continue;
+   }
+   ReleaseSysCache(parentTup);
+   }
+
/* unset conparentid and adjust conislocal, coninhcount, etc. */
ConstraintSetParentConstraint(fk->conoid, InvalidOid, 
InvalidOid);
 
-- 
2.25.1



Re: Adding a pg_servername() function

2023-08-03 Thread Matthias van de Meent
On Thu, 3 Aug 2023 at 10:37, Laetitia Avrot  wrote:
>
> Dear Hackers,
>
> One of my customers suggested creating a function that could return the 
> server's hostname.

Mostly for my curiosity: What would be their use case?
I only see limited usability, considering that the local user's
hostname can be very different from the hostname used in the url that
connects to that PostgreSQL instance.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)




RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-03 Thread Hayato Kuroda (Fujitsu)
Dear Amit,

> I see your point related to WALAVAIL_REMOVED status of the slot but
> did you test the scenario I have explained in my comment? Basically, I
> want to know whether it can impact the user in some way. So, please
> check whether the corresponding subscriptions will be allowed to drop.
> You can test it both before and after the upgrade.

Yeah, this is a real issue. I have tested and confirmed the expected
behavior. Even if the status of the slot is 'lost', it may still be needed
for dropping subscriptions properly.

* Before upgrading, the subscription which refers to the lost slot could be dropped.
* After upgrading, the subscription could not be dropped as-is;
  users must run ALTER SUBSCRIPTION sub SET (slot_name = NONE) first.

The following are the steps I took:

## Setup

1. constructed a logical replication system
2. disabled the subscriber once
3. consumed many WALs so that the status of slot became 'lost'

```
publisher=# SELECT slot_name, wal_status FROM pg_replication_slots ;
slot_name | wal_status 
---+
sub   | lost
(1 row)
```

# testcase a - try to drop sub. before upgrading

a-1. enabled the subscriber again.
  At that time following messages are shown on subscriber log:
```
ERROR:  could not start WAL streaming: ERROR:  can no longer get changes from 
replication slot "sub"
DETAIL:  This slot has been invalidated because it exceeded the maximum 
reserved size.
```

a-2. did DROP SUBSCRIPTION ...
a-3. succeeded.

```
subscriber=# DROP SUBSCRIPTION sub;
NOTICE:  dropped replication slot "sub" on publisher
DROP SUBSCRIPTION
```

# testcase b - try to drop sub. after upgrading

b-1. did pg_upgrade command
b-2. enabled the subscriber. From that point an apply worker connected to new 
node...
b-3. did DROP SUBSCRIPTION ...
b-4. failed with the message:

```
subscriber=# DROP SUBSCRIPTION sub;
ERROR:  could not drop replication slot "sub" on publisher: ERROR:  replication 
slot "sub" does not exist
```

The workaround was to disassociate the slot, as described in the documentation.

```
subscriber =# ALTER SUBSCRIPTION sub DISABLE;
ALTER SUBSCRIPTION
subscriber =# ALTER SUBSCRIPTION sub SET (slot_name = NONE);
ALTER SUBSCRIPTION
subscriber =# DROP SUBSCRIPTION sub;
DROP SUBSCRIPTION
```

PSA the script for emulating the above tests.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



test_0803.sh
Description: test_0803.sh


Re: On disable_cost

2023-08-03 Thread Jian Guo
Hi hackers,

I have written an initial patch to retire the disable_cost GUC; it sets a 
flag on the Path struct instead of adding a big cost that is hard to 
estimate. Though it involves tons of plan changes in the regression tests, I 
have tested some simple cases, such as eagerly generating two-stage 
aggregate plans, and it worked. Could someone help review it?


regards,

Jian

From: Euler Taveira 
Sent: Friday, November 1, 2019 22:48
To: Zhenghua Lyu 
Cc: PostgreSQL Hackers 
Subject: Re: On disable_cost

On Fri, Nov 1, 2019 at 03:42, Zhenghua Lyu wrote:
>
> My issue: I did some spikes and tests on TPCDS 1TB Bytes data. For query 
> 104, it generates
>  nestloop join even with enable_nestloop set off. And the final plan's total 
> cost is very huge (about 1e24). But If I enlarge the disable_cost to 1e30, 
> then, planner will generate hash join.
>
> So I guess that disable_cost is not large enough for huge amount of data.
>
> It is tricky to set disable_cost to a huge number. Can we come up with a 
> better solution?
>
Isn't it a case for a GUC disable_cost? As Thomas suggested, DBL_MAX
upper limit should be sufficient.

> The following thoughts are from Heikki:
>>
>> Aside from not having a large enough disable cost, there's also the fact 
>> that the high cost might affect the rest of the plan, if we have to use a 
>> plan type that's disabled. For example, if a table doesn't have any indexes, 
>> but enable_seqscan is off, we might put the unavoidable Seq Scan on 
>> different side of a join than we we would with enable_seqscan=on, because of 
>> the high cost estimate.
>
>
>>
>> I think a more robust way to disable forbidden plan types would be to handle 
>> the disabling in add_path(). Instead of having a high disable cost on the 
>> Path itself, the comparison add_path() would always consider disabled paths 
>> as more expensive than others, regardless of the cost.
>
I'm afraid it is not as cheap as using disable_cost as a node cost. Are
you proposing to add a new boolean variable in the Path struct to handle
those cases in compare_path_costs_fuzzily?


--
   Euler Taveira   Timbira - http://www.timbira.com.br/
   PostgreSQL: Consulting, Development, 24x7 Support and Training





From baf0143438a91c8739534c7d85b9d121a7bb560e Mon Sep 17 00:00:00 2001
From: Jian Guo 
Date: Thu, 3 Aug 2023 17:03:49 +0800
Subject: [PATCH] Retire disable_cost, introduce new flag is_disabled into Path
 struct.

Signed-off-by: Jian Guo 
---
 src/backend/optimizer/path/costsize.c | 30 +++---
 src/backend/optimizer/util/pathnode.c | 58 +++
 src/include/nodes/pathnodes.h |  1 +
 3 files changed, 73 insertions(+), 16 deletions(-)

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index ef475d95a1..0814604c49 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -274,7 +274,7 @@ cost_seqscan(Path *path, PlannerInfo *root,
 		path->rows = baserel->rows;
 
 	if (!enable_seqscan)
-		startup_cost += disable_cost;
+		path->is_disabled = true;
 
 	/* fetch estimated page cost for tablespace containing table */
 	get_tablespace_page_costs(baserel->reltablespace,
@@ -463,7 +463,7 @@ cost_gather_merge(GatherMergePath *path, PlannerInfo *root,
 		path->path.rows = rel->rows;
 
 	if (!enable_gathermerge)
-		startup_cost += disable_cost;
+		path->path.is_disabled = true;
 
 	/*
 	 * Add one to the number of workers to account for the leader.  This might
@@ -576,7 +576,7 @@ cost_index(IndexPath *path, PlannerInfo *root, double loop_count,
 	}
 
 	if (!enable_indexscan)
-		startup_cost += disable_cost;
+		path->path.is_disabled = true;
 	/* we don't need to check enable_indexonlyscan; indxpath.c does that */
 
 	/*
@@ -1011,7 +1011,7 @@ cost_bitmap_heap_scan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
 		path->rows = baserel->rows;
 
 	if (!enable_bitmapscan)
-		startup_cost += disable_cost;
+		path->is_disabled = true;
 
 	pages_fetched = compute_bitmap_pages(root, baserel, bitmapqual,
 		 loop_count, ,
@@ -1279,11 +1279,10 @@ cost_tidscan(Path *path, PlannerInfo *root,
 	 */
 	if (isCurrentOf)
 	{
-		Assert(baserel->baserestrictcost.startup >= disable_cost);
-		startup_cost -= disable_cost;
+		path->is_disabled = false;
 	}
 	else if (!enable_tidscan)
-		startup_cost 

Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

2023-08-03 Thread Peter Smith
Just to clarify my previous post, I meant we will need new v26* patches

v24-0001 -> not needed because v25-0001 pushed
v24-0002 -> v26-0001
v24-0003 -> v26-0002

On Thu, Aug 3, 2023 at 6:19 PM Peter Smith  wrote:
>
> Hi Melih,
>
> Now that v25-0001 has been pushed, can you please rebase the remaining 
> patches?
>
> --
> Kind Regards,
> Peter Smith.
> Fujitsu Australia




Re: [BUG] Fix DETACH with FK pointing to a partitioned table fails

2023-08-03 Thread Alvaro Herrera
On 2023-Aug-03, tender wang wrote:

> I think the old "sub-FK" should not be dropped; that would violate the
> foreign key constraint.

Yeah, I've been playing more with the patch and it is definitely not
doing the right things.  Just eyeballing the contents of pg_trigger and
pg_constraint for partitions added by ALTER...ATTACH shows that the
catalog contents are inconsistent with those added by CREATE TABLE
PARTITION OF.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: Missing comments/docs about custom scan path

2023-08-03 Thread Etsuro Fujita
On Mon, Jul 31, 2023 at 7:05 PM Etsuro Fujita  wrote:
> While working on [1], I noticed $SUBJECT: commit e7cb7ee14 failed to
> update comments for the CustomPath struct in pathnodes.h, and commit
> f49842d1e failed to update docs about custom scan path callbacks in
> custom-scan.sgml, IIUC.  Attached are patches for updating these,
> which I created separately for ease of review (patch
> update-custom-scan-path-comments.patch for the former and patch
> update-custom-scan-path-docs.patch for the latter).  In the second
> patch I used almost the same text as for the
> ReparameterizeForeignPathByChild callback function in fdwhandler.sgml.

There seem to be no objections, so I pushed both.

Best regards,
Etsuro Fujita




Re: Adding a LogicalRepWorker type field

2023-08-03 Thread Ashutosh Bapat
On Wed, Aug 2, 2023 at 12:14 PM Peter Smith  wrote:

>
> We can't use the same names for both with/without-parameter functions
> because there is no function overloading in C. In patch v3-0001 I've
> replaced the "dual set of macros", with a single inline function of a
> different name, and one set of space-saving macros.
>

I think it's a good idea to add worker type field. Trying to deduce it
based on the contents of the structure isn't good. RelOptInfo, for example,
has RelOptKind. But RelOptInfo has become really large with members
required by all RelOptKinds crammed under the same structure. If we can
avoid that here at the beginning itself, that will be great. Maybe we
should create a union of type-specific members while we are introducing the
type. This will also provide some protection against unrelated members
being (mis)used in the future.

-- 
Best Wishes,
Ashutosh Bapat


Adding a pg_servername() function

2023-08-03 Thread Laetitia Avrot
Dear Hackers,

One of my customers suggested creating a function that could return the
server's hostname.

After a quick search, we found [this Wiki page](
https://wiki.postgresql.org/wiki/Pg_gethostname) referring to [that
extension](https://github.com/theory/pg-hostname/) from David E. Wheeler.

I shamelessly used the idea and created a working proof of concept:

- the function takes no argument and returns the hostname or a null value
if any error occurs
- the function was added to the network.c file because it makes sense to me
to have that near the inet_server_addr() and inet_server_port() functions.
- I didn't add any test as the inet functions are not tested either.

If you think my design is good enough, I'll go ahead and port/test that
function for Windows.

Have a nice day,

Lætitia


pg_servername_function_v1.patch
Description: Binary data


Re: Inquiry about Functionality Availability in PostgreSQL

2023-08-03 Thread Ashutosh Bapat
Hi Sultan,
Thanks for your email and interest in PostgreSQL.

On Wed, Aug 2, 2023 at 8:41 AM Sultan Berentaev 
wrote:
>
>
> Dear PostgreSQL Development Team,
>
> I am inquiring about the availability of certain functionalities in the
standard PostgreSQL database. Could you please confirm if the following
functionalities are currently available in PostgreSQL:
>

It might help if you describe each of these functionalities in a sentence
or two.

> 1. Enforcement of Security Attribute Expiry

This looks like a broad term. Each attribute (certificates, passwords, etc.)
may have a different mechanism of expiry. Some attributes may not have
expiry enforcement. Some may have expiry controlled by external agents. Can
you please provide more details about what you are looking for?

> 2. Restricted Data Audit Management Access

I have vague understanding of what this could be but I think more details
will be helpful.

> 3. Limit on Concurrent Sessions

If it's just the number of client connections PostgreSQL is supposed to
handle at a time, max_connections is all you need.
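
For example, in postgresql.conf (the value is illustrative; changing this
setting requires a server restart):

```
max_connections = 100
```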

> 4. Access History for PostgreSQL

More details on what kind of access you are looking for will be helpful.
PostgreSQL has GUCs that enable/disable logging of different kinds of
access to the server error log.

>
> If any of these functionalities are not present in the standard
PostgreSQL database, could you kindly let me know and provide any relevant
insights about their potential implementation in future versions?
>

Core PostgreSQL may not provide some of these functionalities by itself,
but extensions like pgaudit may provide them. You may want to look for such
extensions.

-- 
Best Wishes,
Ashutosh Bapat


Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

2023-08-03 Thread Peter Smith
Hi Melih,

Now that v25-0001 has been pushed, can you please rebase the remaining patches?

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Extract numeric field in JSONB more effectively

2023-08-03 Thread Andy Fan
Hi Pavel:

Thanks for the feedback.

I don't like this solution because it is bloating  operators and it is not
> extra readable.
>

If we support it with a cast, could we say we are bloating CAST?  It is true
that it is not extra readable; if so, how about a->>'a' returning text?
Actually, I can't guess the meaning of any of the existing jsonb operators
without documentation.

For completeness you should implement cast for date, int, boolean too.
> Next, the same problem is with XML or hstore type (probably with any types
> that are containers).
>

I am not sure completeness is a golden rule we should obey every time;
for example, we have functions like int24le to avoid an unnecessary
cast, but we only avoid casting for special types for performance
reasons, not for all of them. At the same time, `int2/int4/int8` don't
have a binary-compatible type in jsonb, and the serialization
/deserialization for boolean is pretty cheap.

I didn't realize datetime types are binary compatible with SQL,
so maybe we can have a similar optimization as well.
(It is a pity that timestamp(tz) is not binary, or else we may
just need one operator).


>
> I don't like the idea so using a special operator is 2x faster than common
> syntax for casting. It is a signal, so there is a space for optimization.
> Black magic with special operators is not user friendly for relatively
> common problems.
>

I don't think "black magic" is the proper term here, since it is not much
different from ->> returning text.  If you argue text can be cast to
most types, that would be a reason, but I doubt this difference
amounts to "black magic".


>
> Maybe we can introduce some *internal operator* "extract to type", and in
> rewrite stage we can the pattern (x->'field')::type transform to OP(x,
> 'field', typid)
>

I am not sure what the OP should be.  If it is a function, what is the
return value?  It looks to me like it is hard to do in C.

After all, if we really care about the number of operators, I'm OK
with just letting users use the function directly, like

jsonb_field_as_numeric(jsonb, 'fieldname')
jsonb_field_as_timestamp(jsonb, 'fieldname');
jsonb_field_as_timestamptz(jsonb, 'fieldname');
jsonb_field_as_date(jsonb, 'fieldname');

It saves an operator and solves the readability issue.

-- 
Best Regards
Andy Fan


Re: Improve const use in zlib-using code

2023-08-03 Thread Peter Eisentraut

On 02.08.23 16:39, Tristan Partin wrote:
I like the idea. Though the way you have it implemented at the moment 
seems like a trap in that any time zlib.h is included someone also has 
to remember to add this define. I would recommend adding the define to 
the build systems instead.


Ok, moved to c.h.

Since you put in the work to find the version of zlib that added this, 
You might also want to add `required: '>= 1.2.5.2'` to the 
`dependency('zlib')` call in the main meson.build.


Well, it's not a hard requirement, so I think this is not necessary.

I am wondering if we 
could remove the `z_streamp` check too. The version in which it was added 
isn't obvious.


Yeah, that appears to be very obsolete.  I have made an additional patch 
to remove that.
From 81f053c68109d85ae657f6d9f652d90f1d55fb2a Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 3 Aug 2023 08:47:02 +0200
Subject: [PATCH v2 1/2] Improve const use in zlib-using code

If we define ZLIB_CONST before including zlib.h, zlib augments some
interfaces with const decorations.  By doing that we can keep our own
interfaces cleaner and can remove some unconstify calls.

ZLIB_CONST was introduced in zlib 1.2.5.2 (17 Dec 2011).

XXX CentOS 6 has zlib-1.2.3-29.el6.x86_64
---
 contrib/pgcrypto/pgp-compress.c | 2 +-
 src/bin/pg_basebackup/bbstreamer_gzip.c | 2 +-
 src/bin/pg_basebackup/walmethods.c  | 5 ++---
 src/include/c.h | 5 +
 4 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/contrib/pgcrypto/pgp-compress.c b/contrib/pgcrypto/pgp-compress.c
index 086bec31ae..961cf21e74 100644
--- a/contrib/pgcrypto/pgp-compress.c
+++ b/contrib/pgcrypto/pgp-compress.c
@@ -113,7 +113,7 @@ compress_process(PushFilter *next, void *priv, const uint8 
*data, int len)
/*
 * process data
 */
-   st->stream.next_in = unconstify(uint8 *, data);
+   st->stream.next_in = data;
st->stream.avail_in = len;
while (st->stream.avail_in > 0)
{
diff --git a/src/bin/pg_basebackup/bbstreamer_gzip.c 
b/src/bin/pg_basebackup/bbstreamer_gzip.c
index 3bdbfa0bc4..fb25fef150 100644
--- a/src/bin/pg_basebackup/bbstreamer_gzip.c
+++ b/src/bin/pg_basebackup/bbstreamer_gzip.c
@@ -269,7 +269,7 @@ bbstreamer_gzip_decompressor_content(bbstreamer *streamer,
mystreamer = (bbstreamer_gzip_decompressor *) streamer;
 
zs = >zstream;
-   zs->next_in = (uint8 *) data;
+   zs->next_in = (const uint8 *) data;
zs->avail_in = len;
 
/* Process the current chunk */
diff --git a/src/bin/pg_basebackup/walmethods.c 
b/src/bin/pg_basebackup/walmethods.c
index 376ddf72b7..965d49dec4 100644
--- a/src/bin/pg_basebackup/walmethods.c
+++ b/src/bin/pg_basebackup/walmethods.c
@@ -705,7 +705,7 @@ typedef struct TarMethodData
 
 #ifdef HAVE_LIBZ
 static bool
-tar_write_compressed_data(TarMethodData *tar_data, void *buf, size_t count,
+tar_write_compressed_data(TarMethodData *tar_data, const void *buf, size_t 
count,
  bool flush)
 {
tar_data->zp->next_in = buf;
@@ -782,8 +782,7 @@ tar_write(Walfile *f, const void *buf, size_t count)
 #ifdef HAVE_LIBZ
else if (f->wwmethod->compression_algorithm == PG_COMPRESSION_GZIP)
{
-   if (!tar_write_compressed_data(tar_data, unconstify(void *, 
buf),
-  
count, false))
+   if (!tar_write_compressed_data(tar_data, buf, count, false))
return -1;
f->currpos += count;
return count;
diff --git a/src/include/c.h b/src/include/c.h
index f69d739be5..82f8e9d4c7 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -75,6 +75,11 @@
 #include 
 #endif
 
+/* Define before including zlib.h to add const decorations to zlib API. */
+#ifdef HAVE_LIBZ
+#define ZLIB_CONST
+#endif
+
 
 /* 
  * Section 1: compiler characteristics
-- 
2.41.0

From 76fb004fa151746d584816befb3a070570697298 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 3 Aug 2023 08:52:30 +0200
Subject: [PATCH v2 2/2] Remove check for z_streamp

This is surely obsolete.  zlib version 1.0.4, which includes
z_streamp, was released 1996-07-24.  When this check was put in in
2001 (19c97b8579), it was already labeling that release ancient.
---
 configure| 15 ---
 configure.ac |  9 -
 meson.build  |  8 
 3 files changed, 32 deletions(-)

diff --git a/configure b/configure
index 2e518c8007..963fbbcf1e 100755
--- a/configure
+++ b/configure
@@ -15162,21 +15162,6 @@ _ACEOF
 fi
 
 
-if test "$with_zlib" = yes; then
-  # Check that  defines z_streamp (versions before about 1.0.4
-  # did not).  While we could work around the lack of z_streamp, it
-  # seems unwise to encourage people to use such old zlib versions...
-  ac_fn_c_check_type "$LINENO" 

Re: [BUG] Fix DETACH with FK pointing to a partitioned table fails

2023-08-03 Thread tender wang
I think the old "sub-FK" should not be dropped; that would violate the
foreign key constraint. For example:
postgres=# insert into r values(1,1);
INSERT 0 1
postgres=# ALTER TABLE r DETACH PARTITION r_1;
ALTER TABLE
postgres=# delete from p_1 where id = 1;
DELETE 1
postgres=# select * from r_1;
 id | p_id
+--
  1 |1
(1 row)

If I run the above SQL on PG 12.12, it reports the error below:
postgres=# delete from p_1 where id = 1;
ERROR:  update or delete on table "p_1" violates foreign key constraint
"r_1_p_id_fkey1" on table "r_1"
DETAIL:  Key (id)=(1) is still referenced from table "r_1".

Alvaro Herrera wrote on Monday, Jul 31, 2023 at 20:58:

> On 2023-Jul-05, Jehan-Guillaume de Rorthais wrote:
>
> >   ALTER TABLE r ATTACH PARTITION r_1 FOR VALUES IN (1);
> >
> > The old sub-FKs (below 18289) created in this table to enforce the action
> > triggers on referenced partitions are not deleted when the table becomes
> a
> > partition. Because of this, we have additional and useless triggers on
> the
> > referenced partitions and we can not DETACH this partition on the
> referencing
> > side anymore:
>
> Oh, hm, interesting.  Thanks for the report and patch.  I found a couple
> of minor issues with it (most serious one: nkeys should be 3, not 2;
> also sysscan should use conrelid index), but I'll try and complete it so
> that it's ready for 2023-08-10's releases.
>
> Regards
>
> --
> Álvaro Herrera   48°01'N 7°57'E  —
> https://www.EnterpriseDB.com/
>
>
>


Re: Faster "SET search_path"

2023-08-03 Thread Jeff Davis
On Tue, 2023-08-01 at 21:52 -0700, Nathan Bossart wrote:
> I typically see this done with two separate lists and forboth().

Agreed, done.

> 
> Any reason not to use hash_combine() here?

Thank you, fixed.

> > I changed it to move the hook so that it's called after retrieving
> > from
> > the cache.
...
> I think you are right.  This adds some complexity, but I don't have
> anything else to propose at the moment.

I reworked it a bit to avoid allocations in most cases, and it only
reprocesses the oidlist and runs the hooks when necessary (and even
then, still benefits from the cached OIDs that have already passed the
ACL checks).

Now I'm seeing timings around 7.1s, which is starting to look really
nice.

> 
> I'm not following why this logic was moved.

There was previously a comment:

"We want to detect the case where the effective value of the base
search path variables didn't change.  As long as we're doing so, we can
avoid copying the OID list unnecessarily."

But with v2 the copy had already happened by that point. In v3, there
is no copy at all.

Regards,
Jeff Davis


From d9a3d4364731a152096d4a8c8cf6a81c1e86f8ab Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Thu, 27 Jul 2023 15:11:42 -0700
Subject: [PATCH v3 1/3] Transform proconfig for faster execution.

Store function config settings as a list of (name, value) pairs to
avoid the need to parse for each function invocation.
---
 src/backend/utils/fmgr/fmgr.c | 29 +++---
 src/backend/utils/misc/guc.c  | 45 ---
 src/include/utils/guc.h   |  2 ++
 3 files changed, 59 insertions(+), 17 deletions(-)

diff --git a/src/backend/utils/fmgr/fmgr.c b/src/backend/utils/fmgr/fmgr.c
index 9208c31fe0..6b2d7d7be3 100644
--- a/src/backend/utils/fmgr/fmgr.c
+++ b/src/backend/utils/fmgr/fmgr.c
@@ -612,7 +612,8 @@ struct fmgr_security_definer_cache
 {
 	FmgrInfo	flinfo;			/* lookup info for target function */
 	Oid			userid;			/* userid to set, or InvalidOid */
-	ArrayType  *proconfig;		/* GUC values to set, or NULL */
+	List	   *configNames;	/* GUC names to set, or NIL */
+	List	   *configValues;	/* GUC values to set, or NIL */
 	Datum		arg;			/* passthrough argument for plugin modules */
 };
 
@@ -634,6 +635,8 @@ fmgr_security_definer(PG_FUNCTION_ARGS)
 	FmgrInfo   *save_flinfo;
 	Oid			save_userid;
 	int			save_sec_context;
+	ListCell   *lc1;
+	ListCell   *lc2;
 	volatile int save_nestlevel;
 	PgStat_FunctionCallUsage fcusage;
 
@@ -666,8 +669,11 @@ fmgr_security_definer(PG_FUNCTION_ARGS)
 								&isnull);
 		if (!isnull)
 		{
+			ArrayType *array;
 			oldcxt = MemoryContextSwitchTo(fcinfo->flinfo->fn_mcxt);
-			fcache->proconfig = DatumGetArrayTypePCopy(datum);
+			array = DatumGetArrayTypeP(datum);
+			TransformGUCArray(array, &fcache->configNames,
+							  &fcache->configValues);
 			MemoryContextSwitchTo(oldcxt);
 		}
 
@@ -680,7 +686,7 @@ fmgr_security_definer(PG_FUNCTION_ARGS)
 
 	/* GetUserIdAndSecContext is cheap enough that no harm in a wasted call */
 	GetUserIdAndSecContext(&save_userid, &save_sec_context);
-	if (fcache->proconfig)		/* Need a new GUC nesting level */
+	if (fcache->configNames != NIL)		/* Need a new GUC nesting level */
 		save_nestlevel = NewGUCNestLevel();
 	else
 		save_nestlevel = 0;		/* keep compiler quiet */
@@ -689,12 +695,17 @@ fmgr_security_definer(PG_FUNCTION_ARGS)
 		SetUserIdAndSecContext(fcache->userid,
 			   save_sec_context | SECURITY_LOCAL_USERID_CHANGE);
 
-	if (fcache->proconfig)
+	forboth(lc1, fcache->configNames, lc2, fcache->configValues)
 	{
-		ProcessGUCArray(fcache->proconfig,
-		(superuser() ? PGC_SUSET : PGC_USERSET),
-		PGC_S_SESSION,
-		GUC_ACTION_SAVE);
+		GucContext	 context = superuser() ? PGC_SUSET : PGC_USERSET;
+		GucSource	 source	 = PGC_S_SESSION;
+		GucAction	 action	 = GUC_ACTION_SAVE;
+		char		*name	 = lfirst(lc1);
+		char		*value	 = lfirst(lc2);
+
+		(void) set_config_option(name, value,
+ context, source,
+ action, true, 0, false);
 	}
 
 	/* function manager hook */
@@ -737,7 +748,7 @@ fmgr_security_definer(PG_FUNCTION_ARGS)
 
 	fcinfo->flinfo = save_flinfo;
 
-	if (fcache->proconfig)
+	if (fcache->configNames != NIL)
 		AtEOXact_GUC(true, save_nestlevel);
 	if (OidIsValid(fcache->userid))
 		SetUserIdAndSecContext(save_userid, save_sec_context);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 5308896c87..fa96b79192 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -6213,16 +6213,13 @@ ParseLongOption(const char *string, char **name, char **value)
 			*cp = '_';
 }
 
-
 /*
- * Handle options fetched from pg_db_role_setting.setconfig,
- * pg_proc.proconfig, etc.  Caller must specify proper context/source/action.
- *
- * The array parameter must be an array of TEXT (it must not be NULL).
+ * Transform array of GUC settings into a list of (name, value) pairs. The
+ * list is faster to process in cases where the settings must be applied
+ * repeatedly (e.g. for each 

Re: Fix incorrect start up costs for WindowAgg paths (bug #17862)

2023-08-03 Thread Andy Fan
Hi David:

Sorry for the last-minute feedback! I studied the patch and found the
following cases.

1. ORDER BY or PARTITION BY

select *, count(two) over (order by unique1) from tenk1 limit 1;
DEBUG:  startup_tuples = 1
DEBUG:  startup_tuples = 1

select *, count(two) over (partition by unique1) from tenk1 limit 1;
DEBUG:  startup_tuples = 1
DEBUG:  startup_tuples = 1

Because of how the nodeWindowAgg executor works, we have to fetch the next
tuple until one mismatches the current one before we can evaluate the
WindowAgg function. The current patch doesn't count that mismatched tuple.
I verified this by setting a breakpoint in the IndexNext function and saw
that IndexNext is called twice, so in the above cases shouldn't
startup_tuples be 2?


2. ORDER BY and PARTITION BY

select two, hundred,
count(two) over (partition by ten order by hundred)
from tenk1 limit 1;

DEBUG:  startup_tuples = 10
 two | hundred | count
-+-+---
   0 |   0 |   100

If we count the mismatched tuple, shouldn't it be 101?

3. As we can see, startup_tuples is sometimes logged twice. The reason is
that it is computed in cost_windowagg, so it is recalculated for every
create_one_window_path call. Since startup_tuples should be independent
of the physical path, maybe we can cache it somewhere to save some
planning cycles?

Thanks for the patch!

-- 
Best Regards
Andy Fan


Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-03 Thread Amit Kapila
On Wed, Aug 2, 2023 at 1:43 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> Thank you for giving comments! PSA new version patchset.
>
> > 3.
> > + /*
> > + * Get replication slots.
> > + *
> > + * XXX: Which information must be extracted from old node? Currently three
> > + * attributes are extracted because they are used by
> > + * pg_create_logical_replication_slot().
> > + */
> > + appendPQExpBufferStr(query,
> > + "SELECT slot_name, plugin, two_phase "
> > + "FROM pg_catalog.pg_replication_slots "
> > + "WHERE database = current_database() AND temporary = false "
> > + "AND wal_status IN ('reserved', 'extended');");
> >
> > Why are we ignoring the slots that have wal status as WALAVAIL_REMOVED
> > or WALAVAIL_UNRESERVED? I think the slots where wal status is
> > WALAVAIL_REMOVED, the corresponding slots are invalidated at some
> > point. I think such slots can't be used for decoding but these will be
> > dropped along with the subscription or when a user does it manually.
> > So, if we don't copy such slots after the upgrade then there could be
> > a problem in dropping the corresponding subscription. If we don't want
> > to copy over such slots then we need to provide instructions on what
> > users should do in such cases. OTOH, if we want to copy over such
> > slots then we need to find a way to invalidate such slots after copy.
> > Either way, this needs more analysis.
>
> I considered this again. At least WALAVAIL_UNRESERVED should be supported,
> because the slot is still usable and can return to reserved or extended.
>
> As for WALAVAIL_REMOVED, I don't think it should be, so I added a
> description to the document.
>
> This feature re-creates slots which have the same name/plugin as the old
> ones; it does not replicate their state. So if we copy such slots as-is,
> they become usable again. If subscribers refer to the slot and then
> reconnect at that time, changes made while it was 'WALAVAIL_REMOVED'
> may be lost.
>
> Based on the above, slots must be copied as WALAVAIL_REMOVED, but as you
> said, we do not have a way to control that. The status is calculated from
> restart_lsn, but there is no function to modify it directly.
>
> One approach is adding an SQL function which sets restart_lsn to an
> arbitrary value (or 0/0, invalid), but that seems dangerous.
>

I see your point about the WALAVAIL_REMOVED status of the slot, but
did you test the scenario I explained in my comment? Basically, I
want to know whether it can impact the user in some way. So, please
check whether the corresponding subscriptions can still be dropped.
You can test it both before and after the upgrade.

-- 
With Regards,
Amit Kapila.




Re: ALTER COLUMN ... SET EXPRESSION to alter stored generated column's expression

2023-08-03 Thread jian he
On Thu, Aug 3, 2023 at 1:23 PM Amul Sul  wrote:
>
>
> That is not expected & acceptable. But, somehow, I am not able to reproduce
> this behavior. Could you please retry this experiment by adding "table_schema"
> in your output query?
>
> Thank you.
>
> Regards,
> Amul
>

Sorry, my mistake. I created these partitions in both the public schema
and a test schema, but ignored table_schema when querying.
Now the patch works as expected.




Re: Exposing the lock manager's WaitForLockers() to SQL

2023-08-03 Thread Will Mortensen
Updated docs a bit. I'll see about adding this to the next CF in hopes
of attracting a reviewer. :-)


v3-0001-Add-WAIT-ONLY-option-to-LOCK-command.patch
Description: Binary data


Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

2023-08-03 Thread Amit Kapila
On Thu, Aug 3, 2023 at 9:35 AM Peter Smith  wrote:
>
> On Wed, Aug 2, 2023 at 11:19 PM Amit Kapila  wrote:
> >
> > On Wed, Aug 2, 2023 at 4:09 PM Melih Mutlu  wrote:
> > >
> > > PFA an updated version with some of the earlier reviews addressed.
> > > Forgot to include them in the previous email.
> > >
> >
> > It is always better to explicitly tell which reviews are addressed but
> > anyway, I have done some minor cleanup in the 0001 patch including
> > removing includes which didn't seem necessary, modified a few
> > comments, and ran pgindent. I also thought of modifying some variable
> > names based on suggestions by Peter Smith in an email [1] but didn't
> > find many of them any better than the current ones so modified just a
> > few of those. If you guys are okay with this then let's commit it and
> > then we can focus more on the remaining patches.
> >
>
> I checked the latest patch v25-0001.
>
> LGTM.
>

Thanks, I have pushed 0001. Let's focus on the remaining patches.


-- 
With Regards,
Amit Kapila.




Re: [PoC] Reducing planning time when tables have many partitions

2023-08-03 Thread Yuya Watari
Hello,

On Wed, Aug 2, 2023 at 6:43 PM Andrey Lepikhov
 wrote:
> You introduced list_ptr_cmp as an extern List function, but it is only
> used under a USE_ASSERT_CHECKING ifdef.
> Maybe hide it under USE_ASSERT_CHECKING, or remove all of it?

Thank you for your quick reply and for pointing that out. If we remove
the verification code when committing this patch, we should also
remove the list_ptr_cmp() function, because nobody will use it. If we
keep the verification, whether to hide the function behind
USE_ASSERT_CHECKING is a difficult question. list_ptr_cmp() is generic
and can be helpful even without assertions, so not hiding it is one
option. However, I understand that it is not pretty to compile the
function when assertions are disabled even though nothing references
it. As you say, hiding it behind USE_ASSERT_CHECKING is also a
possible solution.

-- 
Best regards,
Yuya Watari