Re: PANIC: wrong buffer passed to visibilitymap_clear

2021-04-10 Thread Tom Lane
I've managed to reproduce this locally, by dint of running the
src/bin/scripts tests over and over and tweaking the timing by
trying different "taskset" parameters to vary the number of CPUs
available.  I find that I duplicated the report from spurfowl,
particularly

(gdb) bt
#0  0x7f67bb6807d5 in raise () from /lib64/libc.so.6
#1  0x7f67bb669895 in abort () from /lib64/libc.so.6
#2  0x0094ce37 in errfinish (filename=<optimized out>, 
    lineno=<optimized out>, 
    funcname=0x9ac120 <__func__.1> "visibilitymap_clear") at elog.c:680
#3  0x00488b8c in visibilitymap_clear (rel=rel@entry=0x7f67b2837330, 
    heapBlk=<optimized out>, buf=buf@entry=0, flags=flags@entry=3 '\003')
                             ^^^
    at visibilitymap.c:155
#4  0x0055cd87 in heap_update (relation=0x7f67b2837330, 
    otid=0x7f67b274744c, newtup=0x7f67b2747448, cid=<optimized out>, 
    crosscheck=<optimized out>, wait=<optimized out>, tmfd=0x7ffecf4d5700, 
    lockmode=0x7ffecf4d56fc) at heapam.c:3993
#5  0x0055dd61 in simple_heap_update (
relation=relation@entry=0x7f67b2837330, otid=otid@entry=0x7f67b274744c, 
tup=tup@entry=0x7f67b2747448) at heapam.c:4211
#6  0x005e531c in CatalogTupleUpdate (heapRel=0x7f67b2837330, 
otid=0x7f67b274744c, tup=0x7f67b2747448) at indexing.c:309
#7  0x006420f9 in update_attstats (relid=1255, inh=false, 
natts=natts@entry=30, vacattrstats=vacattrstats@entry=0x19c9fc0)
at analyze.c:1758
#8  0x006430dd in update_attstats (vacattrstats=0x19c9fc0, natts=30, 
    inh=false, relid=<optimized out>) at analyze.c:1646
#9  do_analyze_rel (onerel=<optimized out>, params=0x7ffecf4d5e50, 
    va_cols=0x0, acquirefunc=<optimized out>, relpages=86, 
    inh=<optimized out>, in_outer_xact=false, elevel=13) at analyze.c:589
#10 0x006447a1 in analyze_rel (relid=<optimized out>, 
    relation=<optimized out>, params=params@entry=0x7ffecf4d5e50, va_cols=0x0, 
    in_outer_xact=<optimized out>, bstrategy=<optimized out>) at analyze.c:261
#11 0x006a5718 in vacuum (relations=0x19c8158, params=0x7ffecf4d5e50, 
    bstrategy=<optimized out>, isTopLevel=<optimized out>) at vacuum.c:478
#12 0x006a5c94 in ExecVacuum (pstate=pstate@entry=0x1915970, 
vacstmt=vacstmt@entry=0x18ed5c8, isTopLevel=isTopLevel@entry=true)
at vacuum.c:254
#13 0x0083c32c in standard_ProcessUtility (pstmt=0x18ed918, 
queryString=0x18eca20 "ANALYZE pg_catalog.pg_proc;", 
context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, 
dest=0x18eda08, qc=0x7ffecf4d61c0) at utility.c:826

I'd not paid much attention to that point before, but now it
seems there is no question that heap_update is reaching line 3993

        visibilitymap_clear(relation, BufferGetBlockNumber(buffer),
                            vmbuffer, VISIBILITYMAP_VALID_BITS);

without having changed "vmbuffer" from its initial value of
InvalidBuffer.  It looks that way both at frame 3 and frame 4:

(gdb) f 4
#4  0x0055cd87 in heap_update (relation=0x7f67b2837330, 
    otid=0x7f67b274744c, newtup=0x7f67b2747448, cid=<optimized out>, 
    crosscheck=<optimized out>, wait=<optimized out>, tmfd=0x7ffecf4d5700, 
    lockmode=0x7ffecf4d56fc) at heapam.c:3993
3993            visibilitymap_clear(relation, BufferGetBlockNumber(buffer),
(gdb) i locals
...
vmbuffer = 0
vmbuffer_new = 0
...

It is also hard to doubt that somebody broke this in the last-minute
commit blizzard.  There are no reports of this PANIC in the buildfarm for
the last month, but we're now up to four (last I checked) since Thursday.

While the first thing that comes to mind is a logic bug in heap_update
itself, that code doesn't seem to have changed much in the last few days.
Moreover, why is it that this only seems to be happening within
do_analyze_rel -> update_attstats?  (We only have two stack traces
positively showing that, but all the buildfarm reports look like the
failure is happening within manual or auto analyze of a system catalog.
Fishy as heck.)

Just eyeing the evidence on hand, I'm wondering if something has decided
it can start setting the page-all-visible bit without adequate lock,
perhaps only in system catalogs.  heap_update is clearly assuming that
that flag won't change underneath it, and if it did, it's clear how this
symptom would ensue.

Too tired to take it further tonight ... discuss among yourselves.

regards, tom lane




Re: TRUNCATE on foreign table

2021-04-10 Thread Justin Pryzby
On Thu, Apr 08, 2021 at 10:14:17PM +0900, Fujii Masao wrote:
> On 2021/04/08 22:02, Kohei KaiGai wrote:
> > > Anyway, attached is the updated version of the patch. This is still based 
> > > on the latest Kazutaka-san's patch. That is, extra list for ONLY is still 
> > > passed to FDW. What about committing this version at first? Then we can 
> > > continue the discussion and change the behavior later if necessary.
> 
> Pushed! Thank all involved in this development!!
> For record, I attached the final patch I committed.

Find attached language fixes.

I'm also proposing to convert an if/else to a switch(), since I don't like
"if/else if" without an "else", and since the compiler can warn about missing
enum values in switch cases.  You could also write:
| Assert(behavior == DROP_RESTRICT || behavior == DROP_CASCADE)

Also, you currently test:
> + if (extra & TRUNCATE_REL_CONTEXT_ONLY)

but the TRUNCATE_REL_ values aren't independent bits, so they shouldn't be tested with "&".

src/include/commands/tablecmds.h-#define TRUNCATE_REL_CONTEXT_NORMAL    1 /* specified without ONLY clause */
src/include/commands/tablecmds.h-#define TRUNCATE_REL_CONTEXT_ONLY      2 /* specified with ONLY clause */
src/include/commands/tablecmds.h:#define TRUNCATE_REL_CONTEXT_CASCADING 3 /* not specified but truncated
                                                                           * due to dependency (e.g.,
                                                                           * partition table) */

> +++ b/contrib/postgres_fdw/deparse.c
> @@ -2172,6 +2173,43 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
>   deparseRelation(buf, rel);
>  }
>  
> +/*
> + * Construct a simple "TRUNCATE rel" statement
> + */
> +void
> +deparseTruncateSql(StringInfo buf,
> +				   List *rels,
> +				   List *rels_extra,
> +				   DropBehavior behavior,
> +				   bool restart_seqs)
> +{
> +	ListCell   *lc1,
> +			   *lc2;
> +
> +	appendStringInfoString(buf, "TRUNCATE ");
> +
> +	forboth(lc1, rels, lc2, rels_extra)
> +	{
> +		Relation	rel = lfirst(lc1);
> +		int			extra = lfirst_int(lc2);
> +
> +		if (lc1 != list_head(rels))
> +			appendStringInfoString(buf, ", ");
> +		if (extra & TRUNCATE_REL_CONTEXT_ONLY)
> +			appendStringInfoString(buf, "ONLY ");
> +
> +		deparseRelation(buf, rel);
> +	}
> +
> +	appendStringInfo(buf, " %s IDENTITY",
> +					 restart_seqs ? "RESTART" : "CONTINUE");
> +
> +	if (behavior == DROP_RESTRICT)
> +		appendStringInfoString(buf, " RESTRICT");
> +	else if (behavior == DROP_CASCADE)
> +		appendStringInfoString(buf, " CASCADE");
> +}
From 7bbd9312464899dfc2c70fdc64c95a298ac01594 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 8 Apr 2021 10:10:58 -0500
Subject: [PATCH 1/3] WIP doc review: Allow TRUNCATE command to truncate
 foreign tables.

8ff1c94649f5c9184ac5f07981d8aea9dfd7ac19
---
 doc/src/sgml/fdwhandler.sgml   | 43 +-
 doc/src/sgml/postgres-fdw.sgml |  2 +-
 doc/src/sgml/ref/truncate.sgml |  2 +-
 3 files changed, 23 insertions(+), 24 deletions(-)

diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 98882ddab8..5a76dede24 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -1075,40 +1075,39 @@ ExecForeignTruncate(List *rels, List *rels_extra,
 DropBehavior behavior, bool restart_seqs);
 
 
- Truncate a set of foreign tables specified in rels.
+ Truncate foreign tables.
  This function is called when  is executed
- on foreign tables.  rels is the list of
- Relation data structure that indicates
- a foreign table to truncate.  rels_extra the list of
- int value, which delivers extra information about
- a foreign table to truncate.  Possible values are
+ on a foreign table.  rels is a list of
+ Relation data structures for the
+ foreign tables to be truncated.  rels_extra is a list of
+ corresponding int values which provide extra information about
+ the foreign tables.  Each element of rels_extra may have the value
  TRUNCATE_REL_CONTEXT_NORMAL, which means that
- the foreign table is specified WITHOUT ONLY clause
+ the foreign table is specified WITHOUT the ONLY clause
  in TRUNCATE,
  TRUNCATE_REL_CONTEXT_ONLY, which means that
- the foreign table is specified WITH ONLY clause,
- and TRUNCATE_REL_CONTEXT_CASCADING,
+ the foreign table is specified WITH the ONLY clause,
+ or TRUNCATE_REL_CONTEXT_CASCADING,
  which means that the foreign table is not specified explicitly,
- but will be truncated due to dependency (for example, pa

Re: Add header support to text format and matching feature

2021-04-10 Thread Zhihong Yu
On Sat, Apr 10, 2021 at 4:17 PM Rémi Lapeyre wrote:

>
> >
> > Michael, since the issue of duplicated options has been fixed do either
> of these patches look like they are ready for commit?
> >
>
> Here’s a rebased version of the patch.
>
>
> Cheers,
> Rémi
>
>
> > Regards,
> > --
> > -David
> > da...@pgmasters.net
>
>
Hi,

>> sure it matches what is expected and exit immediatly if it does not.

Typo: immediately

+CREATE FOREIGN TABLE header_dont_match (a int, foo text) SERVER file_server

nit: since header is singular, you can name the table header_doesnt_match

+  from the one expected, or the name or case do not match, the copy
will

For 'the name or case do not match', either use the plural for the subjects or
change 'do' to 'doesn't'.

-   opts_out->header_line = defGetBoolean(defel);
+   opts_out->header_line = DefGetCopyHeader(defel);

Existing method starts with lower case d, I wonder why the new method
starts with upper case D.

+   if (fldct < list_length(cstate->attnumlist))
+   ereport(ERROR,
+   (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+errmsg("missing header")));

The message seems to be inaccurate: the header may be there - it just
misses some fields.

+ * Represents whether the header must be absent, present or present and
match.

present and match: it seems "present" is redundant - if the header is absent, how
can it match?

Cheers


Re: pgsql: autovacuum: handle analyze for partitioned tables

2021-04-10 Thread Justin Pryzby
On Thu, Apr 08, 2021 at 04:11:49PM -0400, Alvaro Herrera wrote:
> On 2021-Apr-08, Tom Lane wrote:
> 
> > > So I tend to think that my initial instinct was the better direction: we
> > > should not be doing any find_all_inheritors() here at all, but instead
> > > rely on pg_class.reltuples to be set for the partitioned table.
> > 
> > +1
> 
> This patch does that.

|commit 0e69f705cc1a3df273b38c9883fb5765991e04fe
|Author: Alvaro Herrera 
|Date:   Fri Apr 9 11:29:08 2021 -0400
|
|Set pg_class.reltuples for partitioned tables
|
|When commit 0827e8af70f4 added auto-analyze support for partitioned
|tables, it included code to obtain reltuples for the partitioned table
|as a number of catalog accesses to read pg_class.reltuples for each
|partition.  That's not only very inefficient, but also problematic
|because autovacuum doesn't hold any locks on any of those tables -- and
|doesn't want to.  Replace that code with a read of pg_class.reltuples
|for the partitioned table, and make sure ANALYZE and TRUNCATE properly
|maintain that value.
|
|I found no code that would be affected by the change of relpages from
|zero to non-zero for partitioned tables, and no other code that should
|be maintaining it, but if there is, hopefully it'll be an easy fix.

+	else if (onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+	{
+		/*
+		 * Partitioned tables don't have storage, so we don't set any fields in
+		 * their pg_class entries except for relpages, which is necessary for
+		 * auto-analyze to work properly.
+		 */
+		vac_update_relstats(onerel, -1, totalrows,
+							0, false, InvalidTransactionId,
+							InvalidMultiXactId,
+							in_outer_xact);
+	}

This refers to "relpages", but I think it means "reltuples".

src/include/commands/vacuum.h:extern void vac_update_relstats(Relation relation,
src/include/commands/vacuum.h-                                BlockNumber num_pages,
src/include/commands/vacuum.h-                                double num_tuples,
src/include/commands/vacuum.h-                                BlockNumber num_all_visible_pages,

I'm adding it for the next round of "v14docs" patch if you don't want to make a
separate commit for that.

-- 
Justin




Re: PL/R regression on windows, but not linux with master.

2021-04-10 Thread Dave Cramer
On Sat, 10 Apr 2021 at 20:56, Tomas Vondra wrote:

> On 4/11/21 2:38 AM, Dave Cramer wrote:
> >
> >
> >
> >
> > On Sat, 10 Apr 2021 at 20:34, Tom Lane wrote:
> >
> > Dave Cramer <davecra...@gmail.com> writes:
> > > On Sat, 10 Apr 2021 at 20:24, Tom Lane wrote:
> > >> That's quite bizarre.  What is the actual error level according to
> > >> the source code, and where is the error being thrown exactly?
> >
> > > Well it really is an ERROR, and is being downgraded on windows to
> > WARNING.
> >
> > That seems quite awful.
> >
> > However, now that I think about it, the elog.h error-level constants
> > were renumbered not so long ago.  Maybe you've failed to recompile
> > everything for v14?
> >
> >
> > We see this on a CI with a fresh pull from master.
> >
> > As I said I will dig into it and figure it out.
> >
>
> Well, plr.h does this:
>
> #define WARNING 19
> #define ERROR   20
>
> which seems a bit weird, because elog.h does this (since 1f9158ba481):
>
> #define WARNING 19
> #define WARNING_CLIENT_ONLY 20
> #define ERROR   21
>
> Not sure why this would break Windows but not Linux, though.
>
>
Thanks, I think ERROR is redefined in Windows as well for some strange
reason.

Dave

>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: PL/R regression on windows, but not linux with master.

2021-04-10 Thread Tomas Vondra
On 4/11/21 2:38 AM, Dave Cramer wrote:
> 
> 
> 
> 
> On Sat, 10 Apr 2021 at 20:34, Tom Lane wrote:
> 
> Dave Cramer <davecra...@gmail.com> writes:
> > On Sat, 10 Apr 2021 at 20:24, Tom Lane wrote:
> >> That's quite bizarre.  What is the actual error level according to
> >> the source code, and where is the error being thrown exactly?
> 
> > Well it really is an ERROR, and is being downgraded on windows to
> WARNING.
> 
> That seems quite awful.
> 
> However, now that I think about it, the elog.h error-level constants
> were renumbered not so long ago.  Maybe you've failed to recompile
> everything for v14?
> 
> 
> We see this on a CI with a fresh pull from master.
> 
> As I said I will dig into it and figure it out. 
> 

Well, plr.h does this:

#define WARNING 19
#define ERROR   20

which seems a bit weird, because elog.h does this (since 1f9158ba481):

#define WARNING 19
#define WARNING_CLIENT_ONLY 20
#define ERROR   21

Not sure why this would break Windows but not Linux, though.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: PL/R regression on windows, but not linux with master.

2021-04-10 Thread Dave Cramer
On Sat, 10 Apr 2021 at 20:34, Tom Lane  wrote:

> Dave Cramer  writes:
> > On Sat, 10 Apr 2021 at 20:24, Tom Lane  wrote:
> >> That's quite bizarre.  What is the actual error level according to
> >> the source code, and where is the error being thrown exactly?
>
> > Well it really is an ERROR, and is being downgraded on windows to
> WARNING.
>
> That seems quite awful.
>
> However, now that I think about it, the elog.h error-level constants
> were renumbered not so long ago.  Maybe you've failed to recompile
> everything for v14?
>

We see this on a CI with a fresh pull from master.

As I said I will dig into it and figure it out.

Cheers,

Dave

>
> regards, tom lane
>


Re: PL/R regression on windows, but not linux with master.

2021-04-10 Thread Tom Lane
Dave Cramer  writes:
> On Sat, 10 Apr 2021 at 20:24, Tom Lane  wrote:
>> That's quite bizarre.  What is the actual error level according to
>> the source code, and where is the error being thrown exactly?

> Well it really is an ERROR, and is being downgraded on windows to WARNING.

That seems quite awful.

However, now that I think about it, the elog.h error-level constants
were renumbered not so long ago.  Maybe you've failed to recompile
everything for v14?

regards, tom lane




Re: PL/R regression on windows, but not linux with master.

2021-04-10 Thread Dave Cramer
On Sat, 10 Apr 2021 at 20:24, Tom Lane  wrote:

> Dave Cramer  writes:
> > One of our tests purposely throws an error which returns
> > "ERROR:  R interpreter parse error" on linux
> > and
> > "WARNING:  R interpreter parse error" on windows.
>
> That's quite bizarre.  What is the actual error level according to
> the source code, and where is the error being thrown exactly?
>
> I recall that elog.c has some code to force ERROR up to FATAL or
> PANIC in some cases, but it shouldn't ever promote a non-error to
> an ERROR.
>

Well it really is an ERROR, and is being downgraded on windows to WARNING.

I was hoping someone familiar with the code could remember something before
I dig into this tomorrow.

Thanks,
Dave

>
> regards, tom lane
>


Re: PL/R regression on windows, but not linux with master.

2021-04-10 Thread Tom Lane
Dave Cramer  writes:
> One of our tests purposely throws an error which returns
> "ERROR:  R interpreter parse error" on linux
> and
> "WARNING:  R interpreter parse error" on windows.

That's quite bizarre.  What is the actual error level according to
the source code, and where is the error being thrown exactly?

I recall that elog.c has some code to force ERROR up to FATAL or
PANIC in some cases, but it shouldn't ever promote a non-error to
an ERROR.

regards, tom lane




PL/R regression on windows, but not linux with master.

2021-04-10 Thread Dave Cramer
One of our tests purposely throws an error which returns

"ERROR:  R interpreter parse error" on linux
and

"WARNING:  R interpreter parse error" on windows.

I'm hoping someone can point me to the code that may be responsible? Was
there a change in the error handling that might be attributed to this?

Dave Cramer


Re: Add header support to text format and matching feature

2021-04-10 Thread Rémi Lapeyre

> 
> Michael, since the issue of duplicated options has been fixed do either of 
> these patches look like they are ready for commit?
> 

Here’s a rebased version of the patch.


Cheers,
Rémi


> Regards,
> -- 
> -David
> da...@pgmasters.net




v8-0001-Add-header-support-to-COPY-TO-text-format.patch
Description: Binary data


v8-0002-Add-header-matching-mode-to-COPY-FROM.patch
Description: Binary data


Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays

2021-04-10 Thread Tomas Vondra



On 4/11/21 12:03 AM, David Rowley wrote:
> On Sat, 10 Apr 2021 at 10:32, Tomas Vondra
>  wrote:
>> But I think the puzzle is not so much about v5 vs v6, but more about v5
>> vs. master. I still don't understand how v5 managed to be faster than
>> master, but maybe I'm missing something.
> 
> Well, v5 wrapped ScalarArrayOpExpr inside HashedScalarArrayOpExpr, but
> didn't add cases for HashedScalarArrayOpExpr in all locations where it
> should have. For example, fix_expr_common() has a case for
> ScalarArrayOpExpr, but if we gave it a HashedScalarArrayOpExpr it
> would have very little work to do.
> 
> I've not gone and proved that's the exact reason why the planner
> became faster, but I really don't see any other reason.
> 
>> Maybe the right solution is to rely on the estimates, but then also
>> enable the hashing if we significantly cross the threshold during
>> execution. So for example we might get estimate 10 rows, and calculate
>> that the hashing would start winning at 100 rows, so we start without
>> hashing. But then at execution if we get 200 rows, we build the hash
>> table and start using it.
> 
> To do that, we'd need to store the number of evaluations of the
> function somewhere. I'm not really sure that would be a good idea as I
> imagine we'd need to store that in ExprEvalStep.  I imagine if that
> was a good idea then we'd have done the same for JIT.
> 

Sure, we'd need to track the number of lookups, but I'd imagine that's
fairly cheap and we can stop once we switch to hash mode.

I'm not sure "JIT does not do that" really proves it's a bad idea.
My guess is it simply wasn't considered back then, and the current
heuristic is the simplest possible one. So maybe it's the other way
around, and we should consider doing the same thing for JIT?

FWIW if we look at what JIT does, I'd argue it supports the approach of
trusting the estimates. Because if we under-estimate stuff, the cost won't
exceed the "jit_above_cost" threshold, and we won't use JIT.

> 
>> On 4/9/21 1:21 AM, David Rowley wrote:
>>> time values are in seconds:
>>>
>>> pg-master 28.07
>>> prepared 11.28
>>> simple 16.79
>>>
>>> pg-v6 15.86
>>> prepared 5.23
>>> simple 10.63
>>>
>>> So, the overall result when applying the total time is 177%.
>>>
>>
>> Not sure how you got those numbers, or how it explains the results.
> 
> I got it by doing "1 / tps * 1000" to get the time it would take to
> execute 1000 transactions of each of your tests. I then grouped by
> patch, mode and took the sum of the calculated number.  My point was
> that overall the patch is significantly faster.  I was trying to
> highlight that the 0 and 1 row test take up very little time and the
> overhead of building the hash table is only showing up because the
> query executes so quickly.
> 

Ah, I see. TBH I don't think combining the results gives us a very
meaningful value - those cases were quite arbitrary, and summing them
together like this assumes the workload has about 25% of each. If
your workload is exclusively 0/1/5 rows, it's going to be hurt.

> FWIW, I think executing a large IN clause on a table that has 0 rows
> is likely not that interesting a case to optimise for.  That's not the
> same as a query that just returns 0 rows due to filtering out hundreds
> or thousands of rows during execution.  The overhead of building the
> hash table is not going to show up very easily in that sort of case.
> 

Yeah, it's probably true that queries with long IN lists are probably
dealing with many input rows. And you're right we don't really care
about how many rows are ultimately produced by the query (or even the
step with the IN list) - if we spent a lot of time to filter the rows
before applying the IN list, the time to initialize the hash table is
probably just noise.

I wonder what's the relationship between the length of the IN list and
the minimum number of rows needed for the hash to start winning.

>> I understand why the prepared mode got slower. I don't understand how
>> the simple mode got faster.
> 
> I very much imagine v5 was faster at planning due to the unfinished
> nature of the patch. I'd not added support for HashedScalarArrayOpExpr
> in all the places I should have. That would result in the planner
> skipping lots of work that it needs to do.  The way I got it to work
> was to add it, then just add enough cases in the planner to handle
> HashedScalarArrayOpExpr so I didn't get any errors.  I stopped after
> that just to show the idea.  Lack of errors does not mean it was
> correct. At least setrefs.c was not properly handling
> HashedScalarArrayOpExpr.
> 
> I really think it would be best if we just ignore the performance of
> v5. Looking at the performance of a patch that was incorrectly
> skipping a bunch of required work does not seem that fair.
> 

Aha! I was assuming v5 was correct, but if that assumption is incorrect
then the whole "v5 speedup" is just an illusion, and you're right we
should simply ignore it. Thanks for the explanation!

regards

Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays

2021-04-10 Thread David Rowley
On Sat, 10 Apr 2021 at 10:32, Tomas Vondra wrote:
> But I think the puzzle is not so much about v5 vs v6, but more about v5
> vs. master. I still don't understand how v5 managed to be faster than
> master, but maybe I'm missing something.

Well, v5 wrapped ScalarArrayOpExpr inside HashedScalarArrayOpExpr, but
didn't add cases for HashedScalarArrayOpExpr in all locations where it
should have. For example, fix_expr_common() has a case for
ScalarArrayOpExpr, but if we gave it a HashedScalarArrayOpExpr it
would have very little work to do.

I've not gone and proved that's the exact reason why the planner
became faster, but I really don't see any other reason.

> Maybe the right solution is to rely on the estimates, but then also
> enable the hashing if we significantly cross the threshold during
> execution. So for example we might get estimate 10 rows, and calculate
> that the hashing would start winning at 100 rows, so we start without
> hashing. But then at execution if we get 200 rows, we build the hash
> table and start using it.

To do that, we'd need to store the number of evaluations of the
function somewhere. I'm not really sure that would be a good idea as I
imagine we'd need to store that in ExprEvalStep.  I imagine if that
was a good idea then we'd have done the same for JIT.


> On 4/9/21 1:21 AM, David Rowley wrote:
> > time values are in seconds:
> >
> > pg-master 28.07
> > prepared 11.28
> > simple 16.79
> >
> > pg-v6 15.86
> > prepared 5.23
> > simple 10.63
> >
> > So, the overall result when applying the total time is 177%.
> >
>
> Not sure how you got those numbers, or how it explains the results.

I got it by doing "1 / tps * 1000" to get the time it would take to
execute 1000 transactions of each of your tests. I then grouped by
patch, mode and took the sum of the calculated number.  My point was
that overall the patch is significantly faster.  I was trying to
highlight that the 0 and 1 row test take up very little time and the
overhead of building the hash table is only showing up because the
query executes so quickly.

FWIW, I think executing a large IN clause on a table that has 0 rows
is likely not that interesting a case to optimise for.  That's not the
same as a query that just returns 0 rows due to filtering out hundreds
or thousands of rows during execution.  The overhead of building the
hash table is not going to show up very easily in that sort of case.

> I understand why the prepared mode got slower. I don't understand how
> the simple mode got faster.

I very much imagine v5 was faster at planning due to the unfinished
nature of the patch. I'd not added support for HashedScalarArrayOpExpr
in all the places I should have. That would result in the planner
skipping lots of work that it needs to do.  The way I got it to work
was to add it, then just add enough cases in the planner to handle
HashedScalarArrayOpExpr so I didn't get any errors.  I stopped after
that just to show the idea.  Lack of errors does not mean it was
correct. At least setrefs.c was not properly handling
HashedScalarArrayOpExpr.

I really think it would be best if we just ignore the performance of
v5. Looking at the performance of a patch that was incorrectly
skipping a bunch of required work does not seem that fair.

David




Re: Reference Leak with type

2021-04-10 Thread Tom Lane
Here's a proposed patch for this problem.

The core problem in this test case is that the refcount is logged in the
Portal resowner, which is a child of the initial transaction's resowner,
so it goes away in the COMMIT (after warning of a resource leak); but
the expression tree is still there and still thinks it has a refcount.
By chance a new ResourceOwner is created in the same place where the old
one was, so that when the expression tree is finally destroyed at the
end of the DO block, we see an error about "this refcount isn't logged
here" rather than a crash.  Unrelated-looking code changes could turn
that into a real crash, of course.

I spent quite a bit of time fruitlessly trying to fix it by manipulating
which resowner the tupledesc refcount is logged in, specifically by
running plpgsql "simple expressions" with the simple_eval_resowner as
CurrentResourceOwner.  But this just causes other problems to appear,
because then that resowner becomes responsible for more stuff than
just the plancache refcounts that plpgsql is expecting it to hold.
Some of that stuff needs to be released at subtransaction abort,
which is problematic because most of what plpgsql wants it to deal
in needs to survive until end of main transaction --- in particular,
the plancache refcounts need to live that long, and so do the tupdesc
refcounts we're concerned with here, because those are associated with
"simple expression" trees that are supposed to have that lifespan.
It's possible that we could make this approach work, but at minimum
it'd require creating and destroying an additional resowner per
subtransaction; and maybe we'd have to give up on sharing "simple
expression" trees across subtransactions.  So the potential performance
hit is pretty bad, and I'm not even 100% sure it'd work at all.

So the alternative proposed in the attached is to give up on associating
a long-lived tupdesc refcount with these expression nodes at all.
Instead, we can use a method that plpgsql has been using for a few
years, which is to rely on the fact that typcache entries never go
away once made, and just save a pointer into the typcache.  We can
detect possible changes in the cache entry by watching for changes
in its tupDesc_identifier counter.

This infrastructure exists as far back as v11, so using it doesn't
present any problems for back-patchability.  It is slightly
nervous-making that we have to change some fields in struct ExprEvalStep
--- but the overall struct size isn't changing, and I can't really
see a reason why extensions would be interested in the contents of
these particular subfield types.

regards, tom lane

diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 74fcfbea56..77c9d785d9 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -1375,7 +1375,7 @@ ExecInitExprRec(Expr *node, ExprState *state,
 scratch.opcode = EEOP_FIELDSELECT;
 scratch.d.fieldselect.fieldnum = fselect->fieldnum;
 scratch.d.fieldselect.resulttype = fselect->resulttype;
-scratch.d.fieldselect.argdesc = NULL;
+scratch.d.fieldselect.rowcache.cacheptr = NULL;
 
 ExprEvalPushStep(state, &scratch);
 break;
@@ -1385,7 +1385,7 @@ ExecInitExprRec(Expr *node, ExprState *state,
 			{
 FieldStore *fstore = (FieldStore *) node;
 TupleDesc	tupDesc;
-TupleDesc  *descp;
+ExprEvalRowtypeCache *rowcachep;
 Datum	   *values;
 bool	   *nulls;
 int			ncolumns;
@@ -1401,9 +1401,9 @@ ExecInitExprRec(Expr *node, ExprState *state,
 values = (Datum *) palloc(sizeof(Datum) * ncolumns);
 nulls = (bool *) palloc(sizeof(bool) * ncolumns);
 
-/* create workspace for runtime tupdesc cache */
-descp = (TupleDesc *) palloc(sizeof(TupleDesc));
-*descp = NULL;
+/* create shared composite-type-lookup cache struct */
+rowcachep = palloc(sizeof(ExprEvalRowtypeCache));
+rowcachep->cacheptr = NULL;
 
 /* emit code to evaluate the composite input value */
 ExecInitExprRec(fstore->arg, state, resv, resnull);
@@ -1411,7 +1411,7 @@ ExecInitExprRec(Expr *node, ExprState *state,
 /* next, deform the input tuple into our workspace */
 scratch.opcode = EEOP_FIELDSTORE_DEFORM;
 scratch.d.fieldstore.fstore = fstore;
-scratch.d.fieldstore.argdesc = descp;
+scratch.d.fieldstore.rowcache = rowcachep;
 scratch.d.fieldstore.values = values;
 scratch.d.fieldstore.nulls = nulls;
 scratch.d.fieldstore.ncolumns = ncolumns;
@@ -1469,7 +1469,7 @@ ExecInitExprRec(Expr *node, ExprState *state,
 /* finally, form result tuple */
 scratch.opcode = EEOP_FIELDSTORE_FORM;
 scratch.d.fieldstore.fstore = fstore;
-scratch.d.fieldstore.argdesc = descp;
+scratch.d.fieldstore.rowcache = rowcachep;
 scratch.d.fieldstore.values = values;
 scratch.d.fieldstore.nulls = nulls;
 scratch.d.fieldstore.ncolumns = ncolumns;
@@ -1615,17 +1615,24 @@ ExecInitExprRec(Expr *node, ExprState *state

Re: pgsql: autovacuum: handle analyze for partitioned tables

2021-04-10 Thread Tomas Vondra
On 4/10/21 12:29 AM, Justin Pryzby wrote:
> On Fri, Apr 09, 2021 at 06:16:59PM -0400, Alvaro Herrera wrote:
>> On 2021-Apr-09, Justin Pryzby wrote:
>>
>>> One data point: we do DETACH/ATTACH tables during normal operation, before
>>> type-promoting ALTERs, to avoid worst-case disk use, and to avoid locking 
>>> the
>>> table for a long time.  It'd be undesirable (but maybe of no great 
>>> consequence)
>>> to trigger an ALTER when we DETACH them, since we'll re-ATTACH it shortly
>>> afterwards.
>>
>> You mean to trigger an ANALYZE, not to trigger an ALTER, right?
> 
> Oops, right.  It's slightly undesirable for a DETACH to cause an ANALYZE.
> 
>> I think I agree with Tomas: we should do it by default, and offer some
>> way to turn that off.  I suppose a new reloptions, solely for
>> partitioned tables, would be the way to do it.
>>
>>> However, I think DROP should be handled ?
>>
>> DROP of a partition? ... I would think it should do the same as DETACH,
>> right?  Inform that however many rows the partition had, are now changed
>> in ancestors.
> 
> Yes, drop of an (attached) partition.  The case for DROP is clear, since it
> was clearly meant to go away forever.  The case for DETACH seems somewhat less
> clear.
> 
> The current behavior of pg_dump/restore (since 33a53130a) is to CREATE+ATTACH,
> so there's an argument that if DROPping the partition counts towards the
> parent's analyze, then so should CREATE+ATTACH.
> 

I think it's tricky to "optimize" the behavior after ATTACH/DETACH. I'd
argue that in principle we should aim to keep accurate statistics, so
ATTACH should be treated as an insert of all rows, and DETACH should be
treated as a delete of all rows. So for the purpose of ANALYZE, we should
propagate reltuples as changes_since_analyze after ATTACH/DETACH.

Yes, it may result in more frequent ANALYZE on the parent, but I think
that's necessary. Repeated attach/detach of the same partition may bloat
the value, but I guess that's an example of "If it hurts don't do it."

What I think we might do is offer some light-weight ANALYZE variant,
e.g. based on merging of statistics (I posted a PoC patch a couple of
days ago). That would make the ANALYZEs on the parent much cheaper, so
those "unnecessary" analyzes would not be an issue.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Reference Leak with type

2021-04-10 Thread Zhihong Yu
On Fri, Apr 9, 2021 at 10:31 AM Tom Lane  wrote:

> Michael Paquier  writes:
> > On Tue, Apr 06, 2021 at 11:09:13AM +0530, Rohit Bhogate wrote:
> >> I found the below reference leak on master.
>
> > Thanks for the report.  This is indeed a new problem as of HEAD,
>
> Just for the record, it's not new.  The issue is (I think) that
> the tupledesc refcount created by get_cached_rowtype is being
> logged in the wrong ResourceOwner.  Other cases that use
> get_cached_rowtype, such as IS NOT NULL on a composite value,
> reproduce the same type of failure back to v11:
>
> create type float_rec_typ as (i float8);
>
> do $$
>  declare
>   f float_rec_typ := row(42);
>   r bool;
>  begin
>   r := f is not null;
>   commit;
>  end
> $$;
>
> WARNING:  TupleDesc reference leak: TupleDesc 0x7f5f549809d8 (53719,-1)
> still referenced
> ERROR:  tupdesc reference 0x7f5f549809d8 is not owned by resource owner
> TopTransaction
>
> Still poking at a suitable fix.
>
> regards, tom lane

Hi,
I think I have some idea about the cause for the 'resource owner' error.

When commit results in calling exec_stmt_commit(), the ResourceOwner
switches to a new one.
Later, when ResourceOwnerForgetTupleDesc() is called, we get the error
since owner->tupdescarr doesn't carry the TupleDesc to be forgotten.

One potential fix is to add the following to resowner.c
/*
 * Transfer resources from resarr1 to resarr2
 */
static void
ResourceArrayTransfer(ResourceArray *resarr1, ResourceArray *resarr2)
{
}

In exec_stmt_commit(), we save a reference to the old ResourceOwner before
calling SPI_commit() (line 4824).
Then after the return from SPI_start_transaction(), ResourceArrayTransfer()
is called to transfer remaining items in tupdescarr from old ResourceOwner
to the current ResourceOwner.

I want to get some opinion on the feasibility of this route.

It seems ResourceOwner is opaque inside exec_stmt_commit(), and no
ResourceArrayXX call exists in pl_exec.c.
So I am still looking for the proper structure of the solution.

Cheers


Re: SQL-standard function body

2021-04-10 Thread Noah Misch
On Sat, Apr 10, 2021 at 10:52:15AM -0400, Tom Lane wrote:
> Noah Misch  writes:
> > On Fri, Apr 09, 2021 at 12:09:43PM -0400, Tom Lane wrote:
> >> The real value of 0003 of course would be to get an error cursor at
> >> runtime
> 
> > A key benefit of $SUBJECT is the function body following DDL renames:
> 
> Agreed.  But ...
> 
> > After the rename, any stored prosrc is obsolete.  To show accurate error
> > cursors, deparse prosqlbody and use that in place of prosrc.
> 
> ... I'm not sure this conclusion follows.  There are two problems with it:
> 
> 1. I don't see an acceptably low-overhead way to mechanize it.
> Deparsing prosqlbody is unlikely to be safe in a post-error transaction,
> but surely we'd not want to expend that cost in advance on every use
> of a SQL function.  Even ignoring that, the act of deparsing would not
> in itself tell you what offset to use.  Should we deparse and then
> re-parse to get a new node tree with corrected token locations?

If you really want those error cursors, yes.  (I feel we can continue to live
without them; their absence is no more important than it was ten years ago.)
One can envision several ways to cache that high-overhead work.  Otherwise,
the usual PostgreSQL answer would be to omit an error cursor, not show one
that reflects an obsolete sense of the function.

If the original CREATE FUNCTION query text were so valuable, I'd be arguing to
preserve it across dump/reload.

> 2. The reason we can get away with showing a fragment of a large query
> (or function body) in an error message is that the user is supposed to
> be able to correlate the display with what she wrote.  That assumption
> falls to the ground if the display is based on a deconstruction that is
> virtually certain to have line breaks in different places, not to mention
> that the details of what is shown may be substantially different from the
> original.

Preferences on this matter will be situation-dependent.  If I do CREATE
FUNCTION f() ...; SELECT f() all in one sitting, then it's fine for an error
in the SELECT to show the function I wrote.  If I'm calling a function defined
years ago, I'm likely to compare the error report to "\sf foo" and not likely
to compare it to a years-old record of the SQL statement.  I think it's fine
to expect users to consult "\sf foo" when in doubt.

> Still, I take your point that the original text may be less useful
> for this purpose than I was supposing.




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-04-10 Thread Justin Pryzby
On Sun, Mar 21, 2021 at 03:01:15PM -0300, Alvaro Herrera wrote:
> > But note that it doesn't check if an existing constraint "implies" the new
> > constraint - maybe it should.
> 
> Hm, I'm not sure I want to do that, because that means that if I later
> have to attach the partition again with the same partition bounds, then
> I might have to incur a scan to recheck the constraint.  I think we want
> to make the new constraint be as tight as possible ...

If it *implies* the partition constraint, then it's at least as tight (and
maybe tighter), yes?

I think you're concerned with the case that someone has a partition with
"tight" bounds like (a>=200 and a<300) and a check constraint that's "less
tight" like (a>=100 AND a<400).  In that case, the loose check constraint
doesn't imply the tighter partition constraint, so your patch would add a
non-redundant constraint.

I'm interested in the case that someone has a check constraint that almost but
not exactly matches the partition constraint, like (a<300 AND a>=200).  In that
case, your patch adds a redundant constraint.

I wrote a patch which seems to effect my preferred behavior - please check.

-- 
Justin
>From 1ecc8cb17192d13f7432fe582a43ab8597b15c00 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 25 Mar 2021 21:24:10 -0500
Subject: [PATCH] DETACH CONCURRENTLY: avoid creation of redundant constraint

---
 src/backend/commands/tablecmds.c  | 29 ++-
 src/test/regress/expected/alter_table.out | 20 
 src/test/regress/sql/alter_table.sql  |  6 +
 3 files changed, 39 insertions(+), 16 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 7d0bdc9ac4..62398b6882 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -17873,31 +17873,28 @@ ATExecDetachPartitionFinalize(Relation rel, RangeVar *name)
  * DetachAddConstraintIfNeeded
  *		Subroutine for ATExecDetachPartition.  Create a constraint that
  *		takes the place of the partition constraint, but avoid creating
- *		a dupe if an equivalent constraint already exists.
+ *		a dupe if a constraint already exists which implies the needed
+ *		constraint.
  */
 static void
 DetachAddConstraintIfNeeded(List **wqueue, Relation partRel)
 {
 	AlteredTableInfo *tab;
-	Expr	   *constraintExpr;
-	TupleDesc	td = RelationGetDescr(partRel);
+	List	   *constraintExpr;
 	Constraint *n;
 
-	constraintExpr = make_ands_explicit(RelationGetPartitionQual(partRel));
+	constraintExpr = (List *) eval_const_expressions(NULL,
+			(Node *) RelationGetPartitionQual(partRel));
 
-	/* If an identical constraint exists, we don't need to create one */
-	if (td->constr && td->constr->num_check > 0)
-	{
-		for (int i = 0; i < td->constr->num_check; i++)
-		{
-			Node	*thisconstr;
+	constraintExpr = (List *) make_ands_explicit(constraintExpr);
 
-			thisconstr = stringToNode(td->constr->check[i].ccbin);
-
-			if (equal(constraintExpr, thisconstr))
-return;
-		}
-	}
+	/*
+	 * Avoid adding a new constraint if the needed constraint is implied by an
+	 * existing constraint
+	 */
+	if (PartConstraintImpliedByRelConstraint(partRel,
+list_make1(constraintExpr)))
+		return;
 
 	tab = ATGetQueueEntry(wqueue, partRel);
 
diff --git a/src/test/regress/expected/alter_table.out b/src/test/regress/expected/alter_table.out
index ec14b060a6..f81bdf513b 100644
--- a/src/test/regress/expected/alter_table.out
+++ b/src/test/regress/expected/alter_table.out
@@ -4191,6 +4191,26 @@ ALTER TABLE range_parted2 DETACH PARTITION part_rp CONCURRENTLY;
 Partition key: RANGE (a)
 Number of partitions: 0
 
+-- constraint should be created
+\d part_rp
+  Table "public.part_rp"
+ Column |  Type   | Collation | Nullable | Default 
++-+---+--+-
+ a  | integer |   |  | 
+Check constraints:
+"part_rp_a_check" CHECK (a IS NOT NULL AND a >= 0 AND a < 100)
+
+CREATE TABLE part_rp100 PARTITION OF range_parted2 (CHECK (a>=123 AND a<133 AND a IS NOT NULL)) FOR VALUES FROM (100) to (200);
+ALTER TABLE range_parted2 DETACH PARTITION part_rp100 CONCURRENTLY;
+-- redundant constraint should not be created
+\d part_rp100
+ Table "public.part_rp100"
+ Column |  Type   | Collation | Nullable | Default 
++-+---+--+-
+ a  | integer |   |  | 
+Check constraints:
+"part_rp100_a_check" CHECK (a >= 123 AND a < 133 AND a IS NOT NULL)
+
 DROP TABLE range_parted2;
 -- Check ALTER TABLE commands for partitioned tables and partitions
 -- cannot add/drop column to/from *only* the parent
diff --git a/src/test/regress/sql/alter_table.sql b/src/test/regress/sql/alter_table.sql
index 7a9c9252dc..dc0200adcb 100644
--- a/src/test/regress/sql/alter_table.sql
+++ b/src/test/regress/sql/alter_table.sql
@@ -2696,6 +2696,12 @@ DROP TABLE part_rpd;
 -- works fine
 ALTER TABLE range_parted2 DETACH PARTITION part_rp CONCUR

Re: truncating timestamps on arbitrary intervals

2021-04-10 Thread Peter Eisentraut

On 10.04.21 14:53, John Naylor wrote:


On Sat, Apr 10, 2021 at 7:43 AM Peter Eisentraut wrote:

 >
 > On 30.03.21 18:06, John Naylor wrote:
 > > Currently, when the origin is after the input, the result is the
 > > timestamp at the end of the bin, rather than the beginning as expected.
 > > The attached puts the result consistently at the beginning of the bin.
 >
 > In the patch
 >
 > +   if (origin > timestamp && stride_usecs > 1)
 > +       tm_delta -= stride_usecs;
 >
 > is the condition stride_usecs > 1 really necessary?  My assessment is
 > that it's not, in which case it would be better to omit it.

> Without the condition, the case of 1 microsecond will fail to be a
> no-op. This case has no practical use, but it still must work
> correctly, just as date_trunc('microsecond', input) does.


Ah yes, the tests cover that.  Committed.




Re: check_function_bodies: At least the description seems wrong, since we have prodedures

2021-04-10 Thread Tom Lane
"Daniel Westermann (DWE)"  writes:
>> +1 for updating the description though.  We could s/function/routine/
>> where space is tight.

> Thanks for your inputs. Attached a proposal which updates the description.

I changed config.sgml's description similarly, and pushed this.

regards, tom lane




Re: Truncate in synchronous logical replication failed

2021-04-10 Thread Japin Li


On Thu, 08 Apr 2021 at 19:20, Japin Li  wrote:
> On Wed, 07 Apr 2021 at 16:34, tanghy.f...@fujitsu.com 
>  wrote:
>> On Wednesday, April 7, 2021 5:28 PM Amit Kapila  
>> wrote
>>
>>>Can you please check if the behavior is the same for PG-13? This is
>>>just to ensure that we have not introduced any bug in PG-14.
>>
>> Yes, same failure happens at PG-13, too.
>>
>
> I found that when we truncate a table in synchronous logical replication,
> LockAcquireExtended() [1] will try to take a lock via fast path and it
> failed (FastPathStrongRelationLocks->count[fasthashcode] = 1).
> However, it can acquire the lock when in asynchronous logical replication.
> I'm not familiar with the locks, any suggestions? What the difference
> between sync and async logical replication for locks?
>

After some analysis, I found that when the TRUNCATE finishes, it calls
SyncRepWaitForLSN(); for asynchronous logical replication, it will exit
early and then call ResourceOwnerRelease(RESOURCE_RELEASE_LOCKS) to
release the locks, so the walsender can acquire the lock.

But for synchronous logical replication, SyncRepWaitForLSN() will wait
for the specified LSN to be confirmed, so it cannot release the lock,
while the walsender tries to acquire that lock.  Obviously, the
walsender cannot acquire it, because the lock is held by the process
performing the TRUNCATE command.  This is why TRUNCATE in synchronous
logical replication is blocked.


I don't know if it makes sense to fix this, and if so, how to fix it?
Thoughts?

--
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




Re: SQL-standard function body

2021-04-10 Thread Tom Lane
Noah Misch  writes:
> On Fri, Apr 09, 2021 at 12:09:43PM -0400, Tom Lane wrote:
>> The real value of 0003 of course would be to get an error cursor at
>> runtime

> A key benefit of $SUBJECT is the function body following DDL renames:

Agreed.  But ...

> After the rename, any stored prosrc is obsolete.  To show accurate error
> cursors, deparse prosqlbody and use that in place of prosrc.

... I'm not sure this conclusion follows.  There are two problems with it:

1. I don't see an acceptably low-overhead way to mechanize it.
Deparsing prosqlbody is unlikely to be safe in a post-error transaction,
but surely we'd not want to expend that cost in advance on every use
of a SQL function.  Even ignoring that, the act of deparsing would not
in itself tell you what offset to use.  Should we deparse and then
re-parse to get a new node tree with corrected token locations?

2. The reason we can get away with showing a fragment of a large query
(or function body) in an error message is that the user is supposed to
be able to correlate the display with what she wrote.  That assumption
falls to the ground if the display is based on a deconstruction that is
virtually certain to have line breaks in different places, not to mention
that the details of what is shown may be substantially different from the
original.

Still, I take your point that the original text may be less useful
for this purpose than I was supposing.

regards, tom lane




Re: Is it worth to optimize VACUUM/ANALYZE by combining duplicate rel instances into single rel instance?

2021-04-10 Thread Tom Lane
Bharath Rupireddy  writes:
> I'm reading the code for vacuum/analyze and it looks like currently we
> call vacuum_rel/analyze_rel for each relation specified. Which means
> that if a relation is specified more than once, then we simply
> vacuum/analyze it that many times. Do we gain any advantage by
> vacuuming/analyzing a relation back-to-back within a single command? I
> strongly feel no. I'm thinking we could do a simple optimization here,

This really is not something to expend cycles and code complexity on.
If the user wrote the same table more than once, that's their choice.

regards, tom lane




Re: Replication slot stats misgivings

2021-04-10 Thread vignesh C
On Sat, Apr 10, 2021 at 6:24 PM Amit Kapila  wrote:
>
> On Sat, Apr 10, 2021 at 1:06 PM vignesh C  wrote:
> >
> > Thanks Amit for your Patch. I have merged your changes into my
> > patchset. I did not find any issues in my testing.
> > Thoughts?
> >
>
> 0001
> --
>   PgStat_Counter m_stream_bytes;
> + PgStat_Counter m_total_txns;
> + PgStat_Counter m_total_bytes;
>  } PgStat_MsgReplSlot;
>
> ..
> ..
>
> + PgStat_Counter total_txns;
> + PgStat_Counter total_bytes;
>   TimestampTz stat_reset_timestamp;
>  } PgStat_ReplSlotStats;
>
> Doesn't this change belong to the second patch?

Missed it while splitting the patches; it is fixed in the attached patch.

Regards,
Vignesh
From 7b9cbe94ee1398b149ccd20a10d97c6fd7920442 Mon Sep 17 00:00:00 2001
From: vignesh 
Date: Sat, 10 Apr 2021 08:14:52 +0530
Subject: [PATCH v6 1/5] Changed char datatype to NameData datatype for
 slotname.

Changed char datatype to NameData datatype for slotname.
---
 src/backend/postmaster/pgstat.c   | 32 +++
 src/backend/replication/logical/logical.c | 13 ++---
 src/backend/replication/slot.c|  7 -
 src/backend/utils/adt/pgstatfuncs.c   |  2 +-
 src/include/pgstat.h  | 11 +++-
 5 files changed, 36 insertions(+), 29 deletions(-)

diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index f4467625f7..666ce95d08 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -64,6 +64,7 @@
 #include "storage/pg_shmem.h"
 #include "storage/proc.h"
 #include "storage/procsignal.h"
+#include "utils/builtins.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
@@ -1539,7 +1540,7 @@ pgstat_reset_replslot_counter(const char *name)
 		if (SlotIsPhysical(slot))
 			return;
 
-		strlcpy(msg.m_slotname, name, NAMEDATALEN);
+		namestrcpy(&msg.m_slotname, name);
 		msg.clearall = false;
 	}
 	else
@@ -1812,10 +1813,7 @@ pgstat_report_tempfile(size_t filesize)
  * --
  */
 void
-pgstat_report_replslot(const char *slotname, PgStat_Counter spilltxns,
-	   PgStat_Counter spillcount, PgStat_Counter spillbytes,
-	   PgStat_Counter streamtxns, PgStat_Counter streamcount,
-	   PgStat_Counter streambytes)
+pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat)
 {
 	PgStat_MsgReplSlot msg;
 
@@ -1823,14 +1821,14 @@ pgstat_report_replslot(const char *slotname, PgStat_Counter spilltxns,
 	 * Prepare and send the message
 	 */
 	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_REPLSLOT);
-	strlcpy(msg.m_slotname, slotname, NAMEDATALEN);
+	namestrcpy(&msg.m_slotname, NameStr(repSlotStat->slotname));
 	msg.m_drop = false;
-	msg.m_spill_txns = spilltxns;
-	msg.m_spill_count = spillcount;
-	msg.m_spill_bytes = spillbytes;
-	msg.m_stream_txns = streamtxns;
-	msg.m_stream_count = streamcount;
-	msg.m_stream_bytes = streambytes;
+	msg.m_spill_txns = repSlotStat->spill_txns;
+	msg.m_spill_count = repSlotStat->spill_count;
+	msg.m_spill_bytes = repSlotStat->spill_bytes;
+	msg.m_stream_txns = repSlotStat->stream_txns;
+	msg.m_stream_count = repSlotStat->stream_count;
+	msg.m_stream_bytes = repSlotStat->stream_bytes;
 	pgstat_send(&msg, sizeof(PgStat_MsgReplSlot));
 }
 
@@ -1846,7 +1844,7 @@ pgstat_report_replslot_drop(const char *slotname)
 	PgStat_MsgReplSlot msg;
 
 	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_REPLSLOT);
-	strlcpy(msg.m_slotname, slotname, NAMEDATALEN);
+	namestrcpy(&msg.m_slotname, slotname);
 	msg.m_drop = true;
 	pgstat_send(&msg, sizeof(PgStat_MsgReplSlot));
 }
@@ -5202,7 +5200,7 @@ pgstat_recv_resetreplslotcounter(PgStat_MsgResetreplslotcounter *msg,
 	else
 	{
 		/* Get the index of replication slot statistics to reset */
-		idx = pgstat_replslot_index(msg->m_slotname, false);
+		idx = pgstat_replslot_index(NameStr(msg->m_slotname), false);
 
 		/*
 		 * Nothing to do if the given slot entry is not found.  This could
@@ -5538,7 +5536,7 @@ pgstat_recv_replslot(PgStat_MsgReplSlot *msg, int len)
 	 * Get the index of replication slot statistics.  On dropping, we don't
 	 * create the new statistics.
 	 */
-	idx = pgstat_replslot_index(msg->m_slotname, !msg->m_drop);
+	idx = pgstat_replslot_index(NameStr(msg->m_slotname), !msg->m_drop);
 
 	/*
 	 * The slot entry is not found or there is no space to accommodate the new
@@ -5763,7 +5761,7 @@ pgstat_replslot_index(const char *name, bool create_it)
 	Assert(nReplSlotStats <= max_replication_slots);
 	for (i = 0; i < nReplSlotStats; i++)
 	{
-		if (strcmp(replSlotStats[i].slotname, name) == 0)
+		if (namestrcmp(&replSlotStats[i].slotname, name) == 0)
 			return i;			/* found */
 	}
 
@@ -5776,7 +5774,7 @@ pgstat_replslot_index(const char *name, bool create_it)
 
 	/* Register new slot */
 	memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));
-	strlcpy(replSlotStats[nReplSlotStats].slotname, name, NAMEDATALEN);
+	namestrcpy(&replSlotStats[nReplSlotStats].slotname, name);
 
 	return nReplSlotStats++;
 }
diff --git 

Re: Replication slot stats misgivings

2021-04-10 Thread Amit Kapila
On Sat, Apr 10, 2021 at 1:06 PM vignesh C  wrote:
>
> Thanks Amit for your Patch. I have merged your changes into my
> patchset. I did not find any issues in my testing.
> Thoughts?
>

0001
--
  PgStat_Counter m_stream_bytes;
+ PgStat_Counter m_total_txns;
+ PgStat_Counter m_total_bytes;
 } PgStat_MsgReplSlot;

..
..

+ PgStat_Counter total_txns;
+ PgStat_Counter total_bytes;
  TimestampTz stat_reset_timestamp;
 } PgStat_ReplSlotStats;

Doesn't this change belong to the second patch?

-- 
With Regards,
Amit Kapila.




Re: truncating timestamps on arbitrary intervals

2021-04-10 Thread John Naylor
On Sat, Apr 10, 2021 at 7:43 AM Peter Eisentraut <
peter.eisentr...@enterprisedb.com> wrote:
>
> On 30.03.21 18:06, John Naylor wrote:
> > Currently, when the origin is after the input, the result is the
> > timestamp at the end of the bin, rather than the beginning as expected.
> > The attached puts the result consistently at the beginning of the bin.
>
> In the patch
>
> +   if (origin > timestamp && stride_usecs > 1)
> +   tm_delta -= stride_usecs;
>
> is the condition stride_usecs > 1 really necessary?  My assessment is
> that it's not, in which case it would be better to omit it.

Without the condition, the case of 1 microsecond will fail to be a no-op.
This case has no practical use, but it still must work correctly, just as
date_trunc('microsecond', input) does.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Replication slot stats misgivings

2021-04-10 Thread Amit Kapila
On Sat, Apr 10, 2021 at 7:24 AM Masahiko Sawada  wrote:
>
> IIUC there are two problems in the case where the drop message is lost:
>
> 1. Writing beyond the end of replSlotStats.
> This can happen if after restarting the number of slots whose stats
> are stored in the stats file exceeds max_replication_slots. Vignesh's
> patch addresses this problem.
>
> 2. The stats for the new slot are not recorded.
> If the stats for already-dropped slots remain in replSlotStats, the
> stats for the new slot cannot be registered due to the full of
> replSlotStats. This can happen even when after restarting the number
> of slots whose stats are stored in the stat file does NOT exceed
> max_replication_slots as well as even during the server running. The
> patch doesn’t address this problem. (If this happens, we will have to
> reset all slot stats since pg_stat_reset_replication_slot() cannot
> remove the slot stats with the non-existing name).
>
> I think we can use HTAB to store slot stats and have
> pg_stat_get_replication_slot() inquire about stats by the slot name,
> resolving both problems. By using HTAB we're no longer concerned about
> the problem of writing stats beyond the end of the replSlotStats
> array. Instead, we have to consider how and when to clean up the stats
> for already-dropped slots. We can have the startup process send slot
> names at startup time, which borrows the idea proposed by Amit. But
> maybe we need to consider the case again where the message from the
> startup process is lost? Another idea would be to have
> pgstat_vacuum_stat() check the existing slots and call
> pgstat_report_replslot_drop() if the slot in the stats file doesn't
> exist. That way, we can continuously check the stats for
> already-dropped slots.
>

Agreed, I think checking periodically via pgstat_vacuum_stat is a
better idea than sending once at startup time. I also think using
slot_name is better than using 'idx' (index in
ReplicationSlotCtl->replication_slots) in this scheme, because even if
'idx' changes after startup we will be able to drop the dead slot.

> I've written a PoC patch for the above idea; using HTAB and cleaning
> up slot stats at pgstat_vacuum_stat(). The patch can be applied on top
> of 0001 patch Vignesh proposed before[1].
>

It seems Vignesh has changed patches based on the latest set of
comments so you might want to rebase.

> Please note that this cannot resolve the problem of ending up
> accumulating the stats to the old slot if the slot is re-created with
> the same name and the drop message is lost. To deal with this problem
> I think we would need to use something unique identifier for each slot
> instead of slot name.
>

Right, we can probably write it in comments and or docs about this
caveat and the user can probably use pg_stat_reset_replication_slot
for such slots.

-- 
With Regards,
Amit Kapila.




Re: truncating timestamps on arbitrary intervals

2021-04-10 Thread Peter Eisentraut

On 30.03.21 18:06, John Naylor wrote:
> Currently, when the origin is after the input, the result is the
> timestamp at the end of the bin, rather than the beginning as expected.
> The attached puts the result consistently at the beginning of the bin.


In the patch

+   if (origin > timestamp && stride_usecs > 1)
+   tm_delta -= stride_usecs;

is the condition stride_usecs > 1 really necessary?  My assessment is 
that it's not, in which case it would be better to omit it.





Re: check_function_bodies: At least the description seems wrong, since we have prodedures

2021-04-10 Thread Daniel Westermann (DWE)
>> It's possible the parameter name also appears in documentation for
>> out-of-tree PLs, as each PL's validator function determines what
>> "check_function_bodies" really means in that setting.

>That parameter is also set explicitly in pg_dump output, so we
>can't rename it without breaking existing dump files.

>Admittedly, guc.c does have provisions for substituting new names
>if we rename some parameter.  But I'm not in a hurry to create
>more instances of that behavior; the potential for confusion
>seems to outweigh any benefit.

>+1 for updating the description though.  We could s/function/routine/
>where space is tight.

Thanks for your inputs. Attached a proposal which updates the description.

Regards
Daniel

diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index ca378bd6af..d0a51b507d 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1759,7 +1759,7 @@ static struct config_bool ConfigureNamesBool[] =
 	},
 	{
 		{"check_function_bodies", PGC_USERSET, CLIENT_CONN_STATEMENT,
-			gettext_noop("Check function bodies during CREATE FUNCTION."),
+			gettext_noop("Check routine bodies during CREATE FUNCTION and CREATE PROCEDURE."),
 			NULL
 		},
 		&check_function_bodies,


Is it worth to optimize VACUUM/ANALYZE by combining duplicate rel instances into single rel instance?

2021-04-10 Thread Bharath Rupireddy
Hi,

I'm reading the code for vacuum/analyze, and it looks like currently we
call vacuum_rel/analyze_rel for each relation specified.  This means
that if a relation is specified more than once, we simply
vacuum/analyze it that many times.  Do we gain any advantage by
vacuuming/analyzing a relation back-to-back within a single command?  I
strongly feel no.  I'm thinking we could do a simple optimization here,
by transforming the following VACUUM/ANALYZE commands to:
1) VACUUM t1, t2, t1, t2, t1; transform to -->
VACUUM t1, t2;
2) VACUUM ANALYZE t1(a1), t2(a2), t1(b1), t2(b2), t1(c1);
transform to --> VACUUM ANALYZE t1(a1, b1, c1), t2(a2, b2)
3) ANALYZE t1, t2, t1, t2, t1; transform to -->
ANALYZE t1, t2;
4) ANALYZE t1(a1), t2(a2), t1(b1), t2(b2), t1(c1);
transform to --> ANALYZE t1(a1, b1, c1), t2(a2, b2)

The above use cases may look pretty much unsound, and we could think of
disallowing use cases (1) and (3) with an error, but use cases (2) and
(4) are quite possible in customer scenarios.  Please feel free to add
any other use cases you may think of.

The main advantage of this optimization is that the commands can
become a bit faster because we avoid extra processing.  I would like to
hear opinions on this.  I'm not sure if this optimization was already
given a thought and rejected for some specific reasons.  If so, it
would be great if someone could point me to those discussions.  Or it
could be that I'm badly missing something in my understanding of the
current vacuum/analyze code; feel free to correct me.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Replication slot stats misgivings

2021-04-10 Thread vignesh C
On Sat, Apr 10, 2021 at 9:50 AM Amit Kapila  wrote:
>
> On Fri, Apr 9, 2021 at 4:13 PM Amit Kapila  wrote:
> >
> > 2.
> > @@ -2051,6 +2054,17 @@ ReorderBufferProcessTXN(ReorderBuffer *rb,
> > ReorderBufferTXN *txn,
> >   rb->begin(rb, txn);
> >   }
> >
> > + /*
> > + * Update total transaction count and total transaction bytes, if
> > + * transaction is streamed or spilled it will be updated while the
> > + * transaction gets spilled or streamed.
> > + */
> > + if (!rb->streamBytes && !rb->spillBytes)
> > + {
> > + rb->totalTxns++;
> > + rb->totalBytes += rb->size;
> > + }
> >
> > I think this will skip a transaction if it is interleaved between a
> > streaming transaction. Assume, two transactions t1 and t2. t1 sends
> > changes in multiple streams and t2 sends all changes in one go at
> > commit time. So, now, if t2 is interleaved between multiple streams
> > then I think the above won't count t2.
> >
> > 3.
> > @@ -3524,9 +3538,11 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb,
> > ReorderBufferTXN *txn)
> >   {
> >   rb->spillCount += 1;
> >   rb->spillBytes += size;
> > + rb->totalBytes += size;
> >
> >   /* don't consider already serialized transactions */
> >   rb->spillTxns += (rbtxn_is_serialized(txn) ||
> > rbtxn_is_serialized_clear(txn)) ? 0 : 1;
> > + rb->totalTxns += (rbtxn_is_serialized(txn) ||
> > rbtxn_is_serialized_clear(txn)) ? 0 : 1;
> >   }
> >
> > We do serialize each subtransaction separately. So totalTxns will
> > include subtransaction count as well when serialized, otherwise not.
> > The description of totalTxns also says that it doesn't include
> > subtransactions. So, I think updating rb->totalTxns here is wrong.
> >
>
> The attached patch should fix both of the above comments. I think it
> is sufficient to update the stats just once, after processing the
> TXN; we need to ensure that we don't count streamed transactions
> multiple times. I have not tested the attached patch, so please
> review/test it and include it in your next patch set if you agree
> with this change.
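For what it's worth, the count-once-after-processing idea can be sketched in miniature. The names below (Txn, Stats, update_totals) are illustrative only, not ReorderBuffer's actual layout; the point is simply that incrementing the counters once per top-level transaction, after it has been processed, avoids double-counting bytes that were already streamed or spilled:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified model of the accounting discussed above. */
typedef struct
{
	int64_t size;        /* bytes of this txn still in memory at commit */
	int64_t streamBytes; /* bytes sent out earlier via streaming */
	int64_t spillBytes;  /* bytes serialized to disk earlier */
} Txn;

typedef struct
{
	int64_t totalTxns;
	int64_t totalBytes;
} Stats;

static void
update_totals(Stats *stats, const Txn *txn)
{
	/* Count every top-level transaction exactly once, at the end. */
	stats->totalTxns++;

	/*
	 * Streamed/spilled bytes left memory before commit, so rb->size alone
	 * would undercount them; summing all three buckets here means no path
	 * has to bump the totals earlier, and nothing is counted twice.
	 */
	stats->totalBytes += txn->size + txn->streamBytes + txn->spillBytes;
}
```

With this shape, even a transaction interleaved between another transaction's streams is counted, because the update is tied to processing the transaction rather than to the streaming/spilling paths.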

Thanks, Amit, for your patch. I have merged your changes into my
patchset and did not find any issues in my testing.
Thoughts?

Regards,
Vignesh
From bb176d2399f4f550ceb0b733ee23d513cd835db9 Mon Sep 17 00:00:00 2001
From: vignesh 
Date: Sat, 10 Apr 2021 08:14:52 +0530
Subject: [PATCH v5 1/5] Changed char datatype to NameData datatype for
 slotname.

Changed char datatype to NameData datatype for slotname.
---
 src/backend/postmaster/pgstat.c   | 32 +++
 src/backend/replication/logical/logical.c | 13 ++---
 src/backend/replication/slot.c|  7 -
 src/backend/utils/adt/pgstatfuncs.c   |  2 +-
 src/include/pgstat.h  | 15 ++-
 5 files changed, 40 insertions(+), 29 deletions(-)

diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index f4467625f7..1becff09d0 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -64,6 +64,7 @@
 #include "storage/pg_shmem.h"
 #include "storage/proc.h"
 #include "storage/procsignal.h"
+#include "utils/builtins.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
@@ -1539,7 +1540,7 @@ pgstat_reset_replslot_counter(const char *name)
 		if (SlotIsPhysical(slot))
 			return;
 
-		strlcpy(msg.m_slotname, name, NAMEDATALEN);
+		namestrcpy(&msg.m_slotname, name);
 		msg.clearall = false;
 	}
 	else
@@ -1812,10 +1813,7 @@ pgstat_report_tempfile(size_t filesize)
  * --
  */
 void
-pgstat_report_replslot(const char *slotname, PgStat_Counter spilltxns,
-	   PgStat_Counter spillcount, PgStat_Counter spillbytes,
-	   PgStat_Counter streamtxns, PgStat_Counter streamcount,
-	   PgStat_Counter streambytes)
+pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat)
 {
 	PgStat_MsgReplSlot msg;
 
@@ -1823,14 +1821,14 @@ pgstat_report_replslot(const char *slotname, PgStat_Counter spilltxns,
 	 * Prepare and send the message
 	 */
 	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_REPLSLOT);
-	strlcpy(msg.m_slotname, slotname, NAMEDATALEN);
+	namestrcpy(&msg.m_slotname, NameStr(repSlotStat->slotname));
 	msg.m_drop = false;
-	msg.m_spill_txns = spilltxns;
-	msg.m_spill_count = spillcount;
-	msg.m_spill_bytes = spillbytes;
-	msg.m_stream_txns = streamtxns;
-	msg.m_stream_count = streamcount;
-	msg.m_stream_bytes = streambytes;
+	msg.m_spill_txns = repSlotStat->spill_txns;
+	msg.m_spill_count = repSlotStat->spill_count;
+	msg.m_spill_bytes = repSlotStat->spill_bytes;
+	msg.m_stream_txns = repSlotStat->stream_txns;
+	msg.m_stream_count = repSlotStat->stream_count;
+	msg.m_stream_bytes = repSlotStat->stream_bytes;
 	pgstat_send(&msg, sizeof(PgStat_MsgReplSlot));
 }
 
@@ -1846,7 +1844,7 @@ pgstat_report_replslot_drop(const char *slotname)
 	PgStat_MsgReplSlot msg;
 
 	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_REPLSLOT);
-	strlcpy(msg.m_slotname, slotname, NAMEDATALEN);
+	namestrcpy(&msg.m_slotname, slotname);
 	msg.m_drop = tr
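
As an aside for readers following the slotname change: the patch swaps a bare char array copied with strlcpy() for a NameData field filled by namestrcpy(). A standalone sketch of that contract (a hypothetical name_copy() standing in for namestrcpy(); this is not PostgreSQL's actual code) might look like:

```c
#include <assert.h>
#include <string.h>

#define NAMEDATALEN 64		/* matches PostgreSQL's default */

/* Fixed-width, NUL-padded name field, as NameData is. */
typedef struct
{
	char data[NAMEDATALEN];
} NameData;

static void
name_copy(NameData *name, const char *str)
{
	/* Zero the whole buffer first so the tail is always NUL-padded. */
	memset(name->data, 0, NAMEDATALEN);
	/* Copy at most NAMEDATALEN - 1 bytes, truncating longer names. */
	strncpy(name->data, str, NAMEDATALEN - 1);
}
```

The practical benefit is that callers pass around a self-describing NameData value instead of remembering to supply NAMEDATALEN at every strlcpy() call site.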

Re: Lots of incorrect comments in nodeFuncs.c

2021-04-10 Thread David Rowley
On Fri, 9 Apr 2021 at 23:22, Tom Lane  wrote:
> LGTM.

Thanks. Pushed.

David