Re: poc - possibility to write window function in PL languages
Hi

On Sat, Jan 16, 2021 at 0:09, Tom Lane wrote:
> Pavel Stehule writes:
> > [ plpgsql-window-functions-20210104.patch.gz ]
>
> I spent some time looking at this patch. It would certainly be
> appealing to have some ability to write custom window functions
> without descending into C; but I'm not very happy about the details.
>
> I'm okay with the idea of having a special variable of a new pseudotype.
> That's not exactly pretty, but it descends directly from how we handle
> the arguments of trigger functions, so at least there's precedent.
> What's bugging me though is the "typedvalue" stuff. That seems like a
> conceptual mess, a performance loss, and a permanent maintenance time
> sink. To avoid performance complaints, eventually this hard-wired set
> of conversions would have to bloom to cover every built-in cast, and
> as for extension types, you're just out of luck.

I invented typed values with an idea of larger usability. With this type we can implement dynamic iteration over records better than now, when the fields of records have to be cast to text or json before the operation. With this type I can hold a typed value for a longer time and do something like:

DECLARE var typedvalue;
  var := fx(..);
  IF var IS OF integer THEN
    var_int := CAST(var AS int);
  ELSEIF var IS OF date THEN
    var_date := CAST(var AS date);
  ELSE
    var_text := CAST(var AS text);
  END;

Sometimes (when you process some external data) this late (lazy) cast can be better and allows you to use typed values. When I read external data, sometimes I don't know the types of these data before reading. I would like to inject a possibility of more dynamic work with values and variables (but still cleanly and safely). It should be safer and faster than now, when people have to use the "text" type.

But I understand and I agree with your objections. Probably a lot of people would use this type badly.

> One way to avoid that would be to declare the argument-fetching
> functions as polymorphics with a dummy argument that just provides
> the expected result type. So users would write something like
>
>   create function pl_lag(x numeric)
>   ...
>   v := get_input_value_in_partition(windowobject, x, 1, -1,
>                                     'seek_current', false);
>
> where the argument-fetching function is declared
>
>   get_input_value_in_partition(windowobject, anyelement, int, ...)
>   returns anyelement
>
> and internally it could verify that the n'th window function argument
> matches the type of its second argument. While this could be made
> to work, it's kind of unsatisfying because the argument number "1" is
> so obviously redundant with the reference to "x". Ideally one should
> only have to write "x". I don't quite see how to make that work,
> but maybe there's a way?
>
> On the whole though, I think your original idea of bespoke plpgsql
> syntax is better, ie let's write something like
>
>   GET WINDOW VALUE v := x AT PARTITION CURRENT(-1);
>
> and hide all the mechanism behind that. The reference to "x" is enough
> to provide the argument number and type, and the window object doesn't
> have to be explicitly visible at all.

Yes, this syntax looks good.

The second question is how to work with the partition context value. There should be only one such value per function, but it can be of any type. In this case we cannot use GET statements. I had an idea of enhancing the declaration, something like:

DECLARE pcx PARTITION CONTEXT (int);  -- read partition context
BEGIN
  pcx := 10;  -- set partition context

What do you think about it?
Regards

Pavel

> Yeah, this will mean that anybody who wants to provide equivalent
> functionality in some other PL will have to do more work. But it's
> not like it was going to be zero effort for them before. Furthermore,
> it's not clear to me that other PLs would want to adopt your current
> design anyway. For example, I bet PL/R would like to somehow make
> window arguments map into vectors on the R side, but there's no chance
> of that with this SQL layer in between.
>
> regards, tom lane
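As a concrete illustration for readers skimming the thread: a hypothetical PL/pgSQL lag()-style window function written with the GET WINDOW VALUE syntax proposed above. None of this runs on current servers; the WINDOW marker on a plpgsql function and the GET WINDOW VALUE statement are part of the proposal, and the body is only a sketch.

create function pl_lag(x numeric)
returns numeric
language plpgsql window as $$
declare
  v numeric;
begin
  -- fetch argument "x" from the previous row of the current partition,
  -- per the proposed statement discussed above (hypothetical syntax)
  GET WINDOW VALUE v := x AT PARTITION CURRENT(-1);
  return v;
end;
$$;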
[bug?] EXPLAIN outputs 0 for rows and width in cost estimate for update nodes
Hello,

While investigating problems with parallel DML on another thread, I encountered a fishy behavior of EXPLAIN on HEAD. Is this a bug?

As shown below, the rows and width values of the Update node are 0. These were 1 and 10 respectively in versions 9.4.26 and 10.12 at hand.

postgres=# create table a (c int);
CREATE TABLE
postgres=# insert into a values(1);
INSERT 0 1
postgres=# analyze a;
ANALYZE
postgres=# begin;
BEGIN
postgres=*# explain analyze update a set c=2;
                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
 Update on a  (cost=0.00..1.01 rows=0 width=0) (actual time=0.189..0.191 rows=0 loops=1)
   ->  Seq Scan on a  (cost=0.00..1.01 rows=1 width=10) (actual time=0.076..0.079 rows=1 loops=1)
 Planning Time: 0.688 ms
 Execution Time: 0.494 ms
(4 rows)

With RETURNING, the values are not 0, as follows.

postgres=*# rollback;
ROLLBACK
postgres=# begin;
BEGIN
postgres=# explain analyze update a set c=2 returning *;
                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
 Update on a  (cost=0.00..1.01 rows=1 width=10) (actual time=0.271..0.278 rows=1 loops=1)
   ->  Seq Scan on a  (cost=0.00..1.01 rows=1 width=10) (actual time=0.080..0.082 rows=1 loops=1)
 Planning Time: 0.308 ms
 Execution Time: 0.392 ms
(4 rows)

The above holds true for Insert and Delete nodes as well. In the manual, they are not 0.

https://www.postgresql.org/docs/devel/using-explain.html
--
EXPLAIN ANALYZE UPDATE tenk1 SET hundred = hundred + 1 WHERE unique1 < 100;

                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Update on tenk1  (cost=5.07..229.46 rows=101 width=250) (actual time=14.628..14.628 rows=0 loops=1)
   ->  Bitmap Heap Scan on tenk1  (cost=5.07..229.46 rows=101 width=250) (actual time=0.101..0.439 rows=100 loops=1)
...
--

This behavior may possibly be considered intended, on the grounds that Update/Insert/Delete nodes don't output rows without RETURNING. Is this a bug or correct behavior?

Regards
Takayuki Tsunakawa
Re: [bug?] EXPLAIN outputs 0 for rows and width in cost estimate for update nodes
On Wed, Jan 20, 2021 at 9:12 PM tsunakawa.ta...@fujitsu.com wrote:
> This behavior may possibly be considered as an intended behavior for the
> reason that Update/Insert/Delete nodes don't output rows without RETURNING.
> Is this a bug or a correct behavior?

Hi Tsunakawa-san,

This was a change made deliberately. Do you see a problem?

commit f0f13a3a08b2757997410f3a1c38bdc22973c525
Author: Thomas Munro
Date:   Mon Oct 12 20:41:16 2020 +1300

    Fix estimates for ModifyTable paths without RETURNING.

    In the past, we always estimated that a ModifyTable node would emit
    the same number of rows as its subpaths.  Without a RETURNING clause,
    the correct estimate is zero.  Fix, in preparation for a proposed
    parallel write patch that is sensitive to that number.

    A remaining problem is that for RETURNING queries, the estimated width
    is based on subpath output rather than the RETURNING tlist.

    Reviewed-by: Greg Nancarrow
    Discussion: https://postgr.es/m/CAJcOf-cXnB5cnMKqWEp2E2z7Mvcd04iLVmV%3DqpFJrR3AcrTS3g%40mail.gmail.com
RE: [bug?] EXPLAIN outputs 0 for rows and width in cost estimate for update nodes
Hi Thomas-san,

From: Thomas Munro
> This was a change made deliberately. Do you see a problem?

Thank you, I was surprised at your very quick response. I just wanted to confirm that I can believe the EXPLAIN output. Then the remaining problem is the sample output in the manual. The fix is attached.

Regards
Takayuki Tsunakawa

Attachment: 0001-Fix-sample-output-of-EXPLAIN-ANALYZE.patch
Re: Wrong usage of RelationNeedsWAL
At Tue, 19 Jan 2021 01:31:52 -0800, Noah Misch wrote in
> On Tue, Jan 19, 2021 at 01:48:31PM +0900, Kyotaro Horiguchi wrote:
> > I understand that you are suggesting that at least
> > TransactionIdLimitedForOldSnapshots should follow not only relation
> > persistence but RelationNeedsWAL, based on the discussion on the
> > check on LSN of TestForOldSnapshot().
> >
> > I still don't think that LSN in the WAL-skipping-created relfilenode
> > harm. ALTER TABLE SET TABLESPACE always makes a dead copy of every
> > block (except checksum) including page LSN regardless of wal_level. In
> > the scenario above, the last select at H correctly reads page LSN set
> > by E then copied by G, which is larger than the snapshot LSN at B. So
> > doesn't go the fast-return path before actual check performed by
> > RelationAllowsEarlyPruning.
>
> I agree the above procedure doesn't have a problem under your patch versions.
> See https://postgr.es/m/20210116043816.ga1644...@rfd.leadboat.com for a
> different test case. In more detail:
>
> A> setup: CREATE TABLE t AS SELECT ...;
> B> xact1: BEGIN; DELETE FROM t;
> C> xact2: BEGIN ISOLATION LEVEL REPEATABLE READ; SELECT 1; -- take snapshot
> D> xact1: COMMIT;
> (plenty of time passes, rows of t are now eligible for early pruning)
> E> xact3: BEGIN;
> F> xact3: ALTER TABLE t SET TABLESPACE something; -- start skipping WAL
> G> xact3: SELECT count(*) FROM t; -- early pruning w/o WAL, w/o LSN changes
> H> xact3: COMMIT;
> J> xact2: SELECT count(*) FROM t; -- no error, wanted "snapshot too old"
>
> I expect that shows no bug for git master, but I expect it does show a bug

I didn't see "snapshot too old" at J with the patch, but at the same time the SELECT at G didn't prune tuples in almost all of the trials. (The last SELECT returns the correct answer.) I saw it get pruned only once among many trials, but I'm not sure how I got into that situation and haven't succeeded in reproducing it.

The reason that the SELECT at G doesn't prune is that limited_xmin in heap_page_prune_opt at G is limited up to the oldest xid among active sessions. In this case the xmin of session 2 is the same as the xid of session 1, so the xmin of session 3 at G is the same as the xmin of session 2, which is the same as the xid of session 1. So pruning doesn't happen. However, that is a very fragile basis for asserting that pruning won't happen. In any case, it seems genuinely dangerous to cause pruning on a WAL-skipped relation.

> with your patch versions. Could you try implementing both test procedures in
> src/test/modules/snapshot_too_old? There's no need to make the test use
> wal_level=minimal explicitly; it's sufficient to catch these bugs when
> wal_level=minimal is in the TEMP_CONFIG file.

In the attached, TestForOldSnapshot() considers XLogIsNeeded(), instead of moving the condition on LSN to TestForOldSnapshot_impl, for performance.

I'll add the test part in the next version.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

>From 5976674e48fdce3c6e911f0ae63485d92941bfd8 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Mon, 18 Jan 2021 14:47:21 +0900
Subject: [PATCH v4] Do not use RelationNeedsWAL to identify relation
 persistence

RelationNeedsWAL() may return false for a permanent relation when
wal_level=minimal and the relation is created or truncated in the
current transaction. Directly examine relpersistence instead of using
the function to know relation persistence.
---
 src/backend/catalog/pg_publication.c |  2 +-
 src/backend/optimizer/util/plancat.c |  3 ++-
 src/include/storage/bufmgr.h         |  8 +++-
 src/include/utils/rel.h              |  2 +-
 src/include/utils/snapmgr.h          |  2 +-
 5 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/src/backend/catalog/pg_publication.c b/src/backend/catalog/pg_publication.c
index 5f8e1c64e1..84d2efcfd2 100644
--- a/src/backend/catalog/pg_publication.c
+++ b/src/backend/catalog/pg_publication.c
@@ -67,7 +67,7 @@ check_publication_add_relation(Relation targetrel)
 				 errdetail("System tables cannot be added to publications.")));
 
 	/* UNLOGGED and TEMP relations cannot be part of publication. */
-	if (!RelationNeedsWAL(targetrel))
+	if (targetrel->rd_rel->relpersistence != RELPERSISTENCE_PERMANENT)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("table \"%s\" cannot be replicated",
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index da322b453e..177e6e336a 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -126,7 +126,8 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	relation = table_open(relationObjectId, NoLock);
 
 	/* Temporary and unlogged relations are inaccessible during recovery. */
-	if (!RelationNeedsWAL(relation) && RecoveryInProgress())
+	if (relation->rd_rel->relpersistence != RELPERSISTENCE_PERMANENT &&
+		RecoveryInProgress())
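For convenience, here is the A-J procedure quoted above transcribed as SQL in one place, with the separate sessions marked in comments. This is only a sketch: it assumes a cluster configured with old_snapshot_threshold > 0 and wal_level = minimal, the setup query and the target tablespace are illustrative, and the steps must be run interleaved across three sessions as marked.

-- setup (step A); the original elides the query, any row-producing SELECT works
CREATE TABLE t AS SELECT g FROM generate_series(1, 1000) AS g;

-- session 1:
BEGIN;
DELETE FROM t;                             -- step B
-- session 2:
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT 1;                                  -- step C: take the snapshot
-- session 1:
COMMIT;                                    -- step D
-- (plenty of time passes; rows of t become eligible for early pruning)
-- session 3:
BEGIN;                                     -- step E
ALTER TABLE t SET TABLESPACE pg_default;   -- step F: starts skipping WAL under wal_level=minimal (tablespace name illustrative)
SELECT count(*) FROM t;                    -- step G: early pruning w/o WAL, w/o LSN changes
COMMIT;                                    -- step H
-- session 2:
SELECT count(*) FROM t;                    -- step J: "snapshot too old" is wanted here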
Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
On Wed, Jan 20, 2021 at 11:53 AM Fujii Masao wrote:
> So, furthermore, we can use hash_search() to find the target cached
> connection, instead of using hash_seq_search(), when the server name
> is given. This would simplify the code a bit more? Of course,
> hash_seq_search() is necessary when closing all the connections, though.

Note that the cache entry key is the user mapping oid, and to use hash_search() we need that oid. In postgres_fdw_disconnect we can get the server oid, and we can also get the user mapping using GetUserMapping(), but that requires GetUserId()/CurrentUserId as an input; I'm not sure whether we would have problems if CurrentUserId changes when the current user of the session changes. Also, a user mapping may have been dropped while its connection still exists because it's in use; in that case GetUserMapping() fails with a cache lookup error. And yes, disconnecting all connections requires hash_seq_search().

Keeping the above in mind, I feel we can do hash_seq_search(), as we do currently, even when the server name is given as input. This way, we don't need to bother much about the above points.

Thoughts?

> + * 2) If no input argument is provided, then it tries to disconnect all the
> + *    connections.
>
> I'm concerned that users can easily forget to specify the argument and
> accidentally discard all the connections. So, IMO, to alleviate this situation,
> what about changing the function name (only when closing all the connections)
> to something like postgres_fdw_disconnect_all(), like we have
> pg_advisory_unlock_all() against pg_advisory_unlock()?

+1. We will have two functions: postgres_fdw_disconnect(server name) and postgres_fdw_disconnect_all().

> +        if (result)
> +        {
> +            /* We closed at least one connection, others are in use. */
> +            ereport(WARNING,
> +                    (errmsg("cannot close all connections because some of them are still in use")));
> +        }
>
> Sorry if this was already discussed upthread. Isn't it more helpful to
> emit a warning for every connection that fails to be closed? For example,
>
> WARNING:  cannot close connection for server "loopback1" because it is still in use
> WARNING:  cannot close connection for server "loopback2" because it is still in use
> WARNING:  cannot close connection for server "loopback3" because it is still in use
> ...
>
> This enables us to identify easily which server connections cannot be
> closed for now.

+1. Looks like pg_advisory_unlock is doing that. Given the fact that still-in-use connections are possible only in explicit transactions, we might not have many still-in-use connections in real-world use cases, so I'm okay with changing it that way.

I will address all these comments and post an updated patch set soon.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
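For readers skimming the thread, a quick sketch of how the two functions settled on above would be used once the patch is in (server names follow the thread's loopback examples; exact signatures were still under review at this point):

-- close the cached connection to one foreign server, if it is not in use
SELECT postgres_fdw_disconnect('loopback1');

-- close every cached connection that is not currently in use
SELECT postgres_fdw_disconnect_all();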
Re: [bug?] EXPLAIN outputs 0 for rows and width in cost estimate for update nodes
On Wed, 2021-01-20 at 08:35 +, tsunakawa.ta...@fujitsu.com wrote: > > This was a change made deliberately. Do you see a problem? > > Thank you, I was surprised at your very quick response. > I just wanted to confirm I can believe EXPLAIN output. > Then the problem is the sample output in the manual. > The fix is attached. +1. That was obviously an oversight. Yours, Laurenz Albe
Re: Rethinking plpgsql's assignment implementation
On Tue, Jan 19, 2021 at 19:21, Pavel Stehule wrote:
> Hi
>
> Now I am testing the jsonb subscripting feature, and I found one issue
> that is not supported by the parser.
>
> When the target is a scalar, then all is ok. But we can have a plpgsql
> array of jsonb values.
>
> postgres=# do $$
> declare j jsonb[];
> begin
>   j[1] = '{"b":"Ahoj"}';
>   raise notice '%', j;
>   raise notice '%', (j[1])['b'];
> end
> $$;
> NOTICE:  {"{\"b\": \"Ahoj\"}"}
> NOTICE:  "Ahoj"
> DO
>
> Parentheses work well in expressions, but are not supported on the left
> side of an assignment.
>
> postgres=# do $$
> declare j jsonb[];
> begin
>   (j[1])['b'] = '"Ahoj"';
>   raise notice '%', j;
>   raise notice '%', j[1]['b'];
> end
> $$;
> ERROR:  syntax error at or near "("
> LINE 4: (j[1])['b'] = '"Ahoj"';
>         ^

Assignment for nested composite types works better, although there is some inconsistency there too:

create type t_inner as (x int, y int);
create type t_outer as (a t_inner, b t_inner);

do $$
declare v t_outer;
begin
  v.a.x := 10;  -- parentheses not allowed here, but not required
  raise notice '%', v;
  raise notice '%', (v).a.x;  -- parentheses are required here
end;
$$;

Regards

Pavel

> Regards
>
> Pavel
Re: should INSERT SELECT use a BulkInsertState?
On Sat, Dec 05, 2020 at 01:59:41PM -0600, Justin Pryzby wrote:
> On Thu, Dec 03, 2020 at 10:59:34AM +0530, Bharath Rupireddy wrote:
> > On Wed, Dec 2, 2020 at 10:24 PM Justin Pryzby wrote:
> > >
> > > One loose end in this patch is how to check for volatile default expressions.
> >
> > I think we should be doing all the necessary checks in the planner and
> > have a flag in the planned stmt to indicate whether to go with multi
> > insert or not. For the required checks, we can have a look at how the
> > existing COPY decides to go with either CIM_MULTI or CIM_SINGLE.
>
> Yes, you can see that I've copied the checks from copy.
> Like copy, some checks are done once, in ExecInitModifyTable, outside of the
> ExecModifyTable "loop".
>
> This squishes some commits together.
> And uses bistate for ON CONFLICT.
> And attempts to use memory context for tuple size.

Rebased on 9dc718bdf2b1a574481a45624d42b674332e2903

I guess my patch should/may be subsumed by this other one - I'm fine with that.
https://commitfest.postgresql.org/31/2871/

Note that my interest here is just in bistate, to avoid leaving behind many dirty buffers, not improved performance of COPY.

--
Justin

>From 65fd9d9352634e4d0ad49685a988c5e4e157b9d8 Mon Sep 17 00:00:00 2001
From: Justin Pryzby
Date: Fri, 8 May 2020 02:17:32 -0500
Subject: [PATCH v9 1/4] INSERT SELECT to use BulkInsertState and multi_insert

Renames structures;
Move MultipleInsert functions from copyfrom.c to (tentatively) nodeModifyTable.h;
Move into MultiInsertInfo: transition_capture and cur_lineno (via cstate->miinfo);
Dynamically switch to multi-insert mode based on the number of insertions.

This is intended to accomodate 1) the original use case of INSERT using a
small ring buffer to avoid leaving behind dirty buffers; and, 2) Automatically
using multi-inserts for batch operations; 3) allow the old behavior of leaving
behind dirty buffers, which might allow INSERT to run more quickly, at the
cost of leaving behind many dirty buffers which other backends may have to
write out.

XXX: for (1), the bulk-insert state is used even if not multi-insert,
including for a VALUES.

TODO: use cstate->miinfo.cur_lineno++ instead of mtstate->miinfo->ntuples
---
 src/backend/commands/copyfrom.c          | 394 +--
 src/backend/commands/copyfromparse.c     |  10 +-
 src/backend/executor/execMain.c          |   2 +-
 src/backend/executor/execPartition.c     |   2 +-
 src/backend/executor/nodeModifyTable.c   | 196 +--
 src/backend/utils/misc/guc.c             |  10 +
 src/include/commands/copyfrom_internal.h |   5 +-
 src/include/executor/nodeModifyTable.h   | 371 +
 src/include/nodes/execnodes.h            |  16 +-
 src/test/regress/expected/insert.out     |  43 +++
 src/test/regress/sql/insert.sql          |  20 ++
 src/tools/pgindent/typedefs.list         |   4 +-
 12 files changed, 658 insertions(+), 415 deletions(-)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index c39cc736ed..8221a2c5d3 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -46,54 +46,6 @@
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
-/*
- * No more than this many tuples per CopyMultiInsertBuffer
- *
- * Caution: Don't make this too big, as we could end up with this many
- * CopyMultiInsertBuffer items stored in CopyMultiInsertInfo's
- * multiInsertBuffers list.  Increasing this can cause quadratic growth in
- * memory requirements during copies into partitioned tables with a large
- * number of partitions.
- */
-#define MAX_BUFFERED_TUPLES		1000
-
-/*
- * Flush buffers if there are >= this many bytes, as counted by the input
- * size, of tuples stored.
- */
-#define MAX_BUFFERED_BYTES		65535
-
-/* Trim the list of buffers back down to this number after flushing */
-#define MAX_PARTITION_BUFFERS	32
-
-/* Stores multi-insert data related to a single relation in CopyFrom. */
-typedef struct CopyMultiInsertBuffer
-{
-	TupleTableSlot *slots[MAX_BUFFERED_TUPLES]; /* Array to store tuples */
-	ResultRelInfo *resultRelInfo;	/* ResultRelInfo for 'relid' */
-	BulkInsertState bistate;	/* BulkInsertState for this rel */
-	int			nused;			/* number of 'slots' containing tuples */
-	uint64		linenos[MAX_BUFFERED_TUPLES];	/* Line # of tuple in copy
-												 * stream */
-} CopyMultiInsertBuffer;
-
-/*
- * Stores one or many CopyMultiInsertBuffers and details about the size and
- * number of tuples which are stored in them.  This allows multiple buffers to
- * exist at once when COPYing into a partitioned table.
- */
-typedef struct CopyMultiInsertInfo
-{
-	List	   *multiInsertBuffers; /* List of tracked CopyMultiInsertBuffers */
-	int			bufferedTuples; /* number of tuples buffered over all buffers */
-	int			bufferedBytes;	/* number of bytes from all buffered tuples */
-	CopyFromState cstate;		/* Copy state for this CopyMultiInsertInfo */
-	EState	   *estate;			/* Executor state used for COPY */
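As a reference point for what the patch above is aiming at, a trivial example (not taken from the patch's regression tests; the table and data are made up) of an INSERT ... SELECT that the patch would route through a BulkInsertState/multi-insert path instead of dirtying shared buffers one tuple at a time:

CREATE TABLE dst (a int, b text);

-- a bulk load written as INSERT ... SELECT rather than COPY; with the patch,
-- enough buffered rows switch execution into multi-insert mode
INSERT INTO dst
SELECT g, 'row ' || g
FROM generate_series(1, 1000000) AS g;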
Re: pg_stat_statements oddity with track = all
Hi,

On Tue, Jan 19, 2021 at 4:55 PM Masahiro Ikeda wrote:
>
> Thanks for making the patch to add "toplevel" column in
> pg_stat_statements.
> This is a review comment.

Thanks a lot for the thorough review!

> I tested the "update" command can work.
> postgres=# ALTER EXTENSION pg_stat_statements UPDATE TO '1.10';
>
> Although the "toplevel" column of all queries which already stored is
> 'false', we have to decide the default. I think 'false' is ok.

Mmm, I'm not sure that I understand this result. The "toplevel" value is tracked by the C code loaded at startup, so it should have a correct value even if you used to have the extension in a previous version; it's just that you can't access the toplevel field before doing the ALTER EXTENSION pg_stat_statements UPDATE. There's also a change in the magic number, so you can't use the stored stats file from a version before this patch.

I also fail to reproduce this behavior. I did the following:

- start postgres with pg_stat_statements v1.10 (so with this patch) in shared_preload_libraries
- CREATE EXTENSION pg_stat_statements VERSION '1.9';
- execute a few queries
- ALTER EXTENSION pg_stat_statements UPDATE;
- SELECT * FROM pg_stat_statements reports the query with toplevel set to TRUE

Can you share a way to reproduce your findings?

> 2. usability review
>
> [...]
> Although one concern is whether only top-level is enough or not,
> I couldn't come up with any use-case to use nested level, so I think
> it's ok.

I agree, I don't see how tracking statistics per nesting level would really help. The only additional use case I see would be tracking trigger/FK query execution, but the nesting level won't help with that.

> 3. coding review
> =================
> [...]
> B. add an argument to pg_stat_statements_reset().
>
> Now, pg_stat_statements supports a flexible reset feature.
>
> pg_stat_statements_reset(userid Oid, dbid Oid, queryid bigint)
>
> Although I wondered whether we need to add a top-level flag to the arguments,
> I couldn't come up with any use-case to reset only top-level queries or
> not top-level queries.

Ah, I didn't think of the reset function. I also fail to see a reasonable use case for providing such a switch in the reset function.

> 4. others
> ==========
>
> These comments are not related to this patch.
>
> A. about the topic of commitfests
>
> Since I think this feature is for monitoring,
> it's better to change the topic from "System Administration"
> to "Monitoring & Control".

I agree, thanks for the change!

> B. check logic whether a query is top-level or not in pg_stat_kcache
>
> This patch uses only exec_nested_level to check whether a query is
> top-level or not.
> The reason is nested_level is less useful and I agree.

Do you mean plan_nest_level is less useful?

> But, pg_stat_kcache uses plan_nested_level too.
> Although the check result is the same, it's better to change it
> corresponding to this patch after it's committed.

I agree, we should be consistent here. I'll take care of the needed changes if and when this patch is committed!
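To make the column under review concrete, a small query one could run once the patched extension is installed; the toplevel column comes from the patch, everything else is ordinary pg_stat_statements usage:

-- statements issued directly by clients versus statements executed underneath
-- another statement (for example from inside a PL/pgSQL function)
SELECT queryid, calls, toplevel, left(query, 40) AS query
FROM pg_stat_statements
ORDER BY toplevel DESC, calls DESC
LIMIT 10;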
Re: [bug?] EXPLAIN outputs 0 for rows and width in cost estimate for update nodes
On Wed, Jan 20, 2021 at 9:42 PM Laurenz Albe wrote: > On Wed, 2021-01-20 at 08:35 +, tsunakawa.ta...@fujitsu.com wrote: > > > This was a change made deliberately. Do you see a problem? > > > > Thank you, I was surprised at your very quick response. > > I just wanted to confirm I can believe EXPLAIN output. > > Then the problem is the sample output in the manual. > > The fix is attached. > > +1. That was obviously an oversight. Pushed. Thanks.
Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
On 2021/01/20 17:41, Bharath Rupireddy wrote: On Wed, Jan 20, 2021 at 11:53 AM Fujii Masao wrote: So, furthermore, we can use hash_search() to find the target cached connection, instead of using hash_seq_search(), when the server name is given. This would simplify the code a bit more? Of course, hash_seq_search() is necessary when closing all the connections, though. Note that the cache entry key is user mapping oid and to use hash_search() we need the user mapping oid. But in postgres_fdw_disconnect we can get server oid and we can also get user mapping id using GetUserMapping, but it requires GetUserId()/CurrentUserId as an input, I doubt we will have problems if CurrentUserId is changed somehow with the change of current user in the session. And user mapping may be dropped but still the connection can exist if it's in use, in that case GetUserMapping fails in cache lookup. And yes, disconnecting all connections requires hash_seq_search(). Keeping above in mind, I feel we can do hash_seq_search(), as we do currently, even when the server name is given as input. This way, we don't need to bother much on the above points. Thoughts? Thanks for explaining this! You're right. I'd withdraw my suggestion. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: Parallel INSERT (INTO ... SELECT ...)
On Fri, Jan 8, 2021 at 8:16 PM Amit Kapila wrote: > > > - if (pcxt->nworkers_launched > 0) > + if (pcxt->nworkers_launched > 0 && !(isParallelModifyLeader && > !isParallelModifyWithReturning)) > { > > I think this check could be simplified to if (pcxt->nworkers_launched > > 0 && isParallelModifyWithReturning) or something like that. > Not quite. The existing check is correct, because it needs to account for existing Parallel SELECT functionality (not just new Parallel INSERT). But I will re-write the test as an equivalent expression, so it's hopefully more readable (taking into account Antonin's suggested variable-name changes): if (pcxt->nworkers_launched > 0 && (!isModify || isModifyWithReturning)) > > > > @@ -252,6 +252,7 @@ set_plan_references(PlannerInfo *root, Plan *plan) > > PlannerGlobal *glob = root->glob; > > int rtoffset = list_length(glob->finalrtable); > > ListCell *lc; > > + Plan *finalPlan; > > > > /* > > * Add all the query's RTEs to the flattened rangetable. The live > > ones > > @@ -302,7 +303,17 @@ set_plan_references(PlannerInfo *root, Plan *plan) > > } > > > > /* Now fix the Plan tree */ > > - return set_plan_refs(root, plan, rtoffset); > > + finalPlan = set_plan_refs(root, plan, rtoffset); > > + if (finalPlan != NULL && IsA(finalPlan, Gather)) > > + { > > + Plan *subplan = outerPlan(finalPlan); > > + > > + if (IsA(subplan, ModifyTable) && castNode(ModifyTable, > > subplan)->returningLists != NULL) > > + { > > + finalPlan->targetlist = > > copyObject(subplan->targetlist); > > + } > > + } > > + return finalPlan; > > } > > > > I'm not sure if the problem of missing targetlist should be handled here > > (BTW, > > NIL is the constant for an empty list, not NULL). Obviously this is a > > consequence of the fact that the ModifyTable node has no regular targetlist. > > > > I think it is better to add comments along with this change. In this > form, this looks quite hacky to me. > The targetlist on the ModifyTable node has been setup correctly, but it hasn't been propagated to the Gather node. Of course, I have previously tried to elegantly fix this, but struck various problems, using different approaches. Perhaps this code could just be moved into set_plan_refs(). For the moment, I'll just keep the current code, but I'll add a FIXME comment for this. I'll investigate Antonin's suggestions as a lower-priority side-task. Regards, Greg Nancarrow Fujitsu Australia
Re: a misbehavior of partition row movement (?)
On Wed, Jan 20, 2021 at 4:13 PM Peter Eisentraut wrote:
> On 2021-01-08 09:54, Amit Langote wrote:
> >>> I don't quite recall if the decision to implement it like this was
> >>> based on assuming that this is what users would like to see happen in
> >>> this case or the perceived difficulty of implementing it the other way
> >>> around, that is, of firing AFTER UPDATE triggers in this case.
> >> I tried to look that up, but I couldn't find any discussion about this. Do
> >> you have any ideas in which thread that was handled?
> > It was discussed here:
> >
> > https://www.postgresql.org/message-id/flat/CAJ3gD9do9o2ccQ7j7%2BtSgiE1REY65XRiMb%3DyJO3u3QhyP8EEPQ%40mail.gmail.com
> >
> > It's a huge discussion, so you'll have to ctrl+f "trigger" to spot
> > relevant emails.  You might notice that the developers who
> > participated in that discussion gave various opinions and what we have
> > today got there as a result of a majority of them voting for the
> > current approach.  Someone also said this during the discussion:
> > "Regarding the trigger issue, I can't claim to have a terribly strong
> > opinion on this.  I think that practically anything we do here might
> > upset somebody, but probably any halfway-reasonable thing we choose to
> > do will be OK for most people."  So what we've got is that
> > "halfway-reasonable" thing, YMMV. :)
>
> Could you summarize here what you are trying to do with respect to what
> was decided before?  I'm a bit confused, looking through the patches you
> have posted.  The first patch you posted hard-coded FK trigger OIDs
> specifically, other patches talk about foreign key triggers in general
> or special case internal triggers or talk about all triggers.

The original problem statement is this: the way we generally fire row-level triggers of a partitioned table can lead to some unexpected behaviors of the foreign keys pointing to that partitioned table during its cross-partition updates.

Let's start with an example that shows how triggers are fired during a cross-partition update:

create table p (a numeric primary key) partition by list (a);
create table p1 partition of p for values in (1);
create table p2 partition of p for values in (2);

create or replace function report_trig_details() returns trigger as $$
begin
  raise notice '% % on %', tg_when, tg_op, tg_relname;
  if tg_op = 'DELETE' then
    return old;
  end if;
  return new;
end;
$$ language plpgsql;

create trigger trig1 before update on p for each row execute function report_trig_details();
create trigger trig2 after update on p for each row execute function report_trig_details();
create trigger trig3 before delete on p for each row execute function report_trig_details();
create trigger trig4 after delete on p for each row execute function report_trig_details();
create trigger trig5 before insert on p for each row execute function report_trig_details();
create trigger trig6 after insert on p for each row execute function report_trig_details();

insert into p values (1);

update p set a = 2;
NOTICE:  BEFORE UPDATE on p1
NOTICE:  BEFORE DELETE on p1
NOTICE:  BEFORE INSERT on p2
NOTICE:  AFTER DELETE on p1
NOTICE:  AFTER INSERT on p2
UPDATE 1

(AR update triggers are not fired.)

For contrast, here is an intra-partition update:

update p set a = a;
NOTICE:  BEFORE UPDATE on p2
NOTICE:  AFTER UPDATE on p2
UPDATE 1

Now, the trigger machinery makes no distinction between user-defined and internal triggers, which has implications for the foreign key enforcing triggers on partitions.

Consider the following example:

create table q (a bigint references p);
insert into q values (2);

update p set a = 1;
NOTICE:  BEFORE UPDATE on p2
NOTICE:  BEFORE DELETE on p2
NOTICE:  BEFORE INSERT on p1
ERROR:  update or delete on table "p2" violates foreign key constraint "q_a_fkey2" on table "q"
DETAIL:  Key (a)=(2) is still referenced from table "q".

So the RI delete trigger (NOT update) on p2 prevents the DELETE that occurs as part of the row movement. One might make the updates cascade and expect that to prevent the error:

drop table q;
create table q (a bigint references p on update cascade);
insert into q values (2);

update p set a = 1;
NOTICE:  BEFORE UPDATE on p2
NOTICE:  BEFORE DELETE on p2
NOTICE:  BEFORE INSERT on p1
ERROR:  update or delete on table "p2" violates foreign key constraint "q_a_fkey2" on table "q"
DETAIL:  Key (a)=(2) is still referenced from table "q".

No luck, because again it's the RI delete trigger on p2 that gets fired. If you make deletes cascade too, an even worse thing happens:

drop table q;
create table q (a bigint references p on update cascade on delete cascade);
insert into q values (2);

update p set a = 1;
NOTICE:  BEFORE UPDATE on p2
NOTICE:  BEFORE DELETE on p2
NOTICE:  BEFORE INSERT on p1
NOTICE:  AFTER DELETE on p2
NOTICE:  AFTER INSERT on p1
UPDATE 1

select * from q;
 a
---
(0 rows)

The RI delete trigger deleted 2 from q, whereas the expected result would've been for q to be updated to change 2
Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
On Wed, Jan 20, 2021 at 3:24 PM Fujii Masao wrote:
> > Keeping above in mind, I feel we can do hash_seq_search(), as we do
> > currently, even when the server name is given as input. This way, we
> > don't need to bother much on the above points.
> >
> > Thoughts?
>
> Thanks for explaining this! You're right. I'd withdraw my suggestion.

Attaching the v14 patch set with the review comments addressed. Please review it further.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachments:
- v14-0001-postgres_fdw-function-to-discard-cached-connecti.patch
- v14-0002-postgres_fdw-add-keep_connections-GUC-to-not-cac.patch
- v14-0003-postgres_fdw-server-level-option-keep_connection.patch
Re: pg_stat_statements oddity with track = all
On 2021-01-20 18:14, Julien Rouhaud wrote: On Tue, Jan 19, 2021 at 4:55 PM Masahiro Ikeda wrote: I tested the "update" command can work. postgres=# ALTER EXTENSION pg_stat_statements UPDATE TO '1.10'; Although the "toplevel" column of all queries which already stored is 'false', we have to decide the default. I think 'false' is ok. Mmm, I'm not sure that I understand this result. The "toplevel" value is tracked by the C code loaded at startup, so it should have a correct value even if you used to have the extension in a previous version, it's just that you can't access the toplevel field before doing the ALTER EXTENSION pg_stat_statements UPDATE. There's also a change in the magic number, so you can't use the stored stat file from a version before this patch. I also fail to reproduce this behavior. I did the following: - start postgres with pg_stat_statements v1.10 (so with this patch) in shared_preload_libraries - CREATE EXTENSION pg_stat_statements VERSION '1.9'; - execute a few queries - ALTER EXTENSION pg_stat_statements UPDATE; - SELECT * FROM pg_stat_statements reports the query with toplvel to TRUE Can you share a way to reproduce your findings? Sorry for making you confused. I can't reproduce although I tried now. I think my procedure was something wrong. So, please ignore this comment, sorry about that. B. check logic whether a query is top-level or not in pg_stat_kcache This patch uses the only exec_nested_level to check whether a query is top-level or not. The reason is nested_level is less useful and I agree. Do you mean plan_nest_level is less useful? I think so. Anyway, it's important to correspond core's implementation because pg_stat_statements and pg_stat_kcache are used at the same time. Regards, -- Masahiro Ikeda NTT DATA CORPORATION
Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
On Mon, Jan 18, 2021 at 02:37:57AM -0600, Justin Pryzby wrote:
> Attached. I will re-review these myself tomorrow.

I have begun looking at 0001 and 0002...

+/*
+ * This is mostly duplicating ATExecSetTableSpaceNoStorage,
+ * which should maybe be factored out to a library function.
+ */

Wouldn't it be better to do first the refactoring of 0002 and then 0001, so as REINDEX can use the new routine, instead of putting that into a comment?

+      This specifies that indexes will be rebuilt on a new tablespace.
+      Cannot be used with "mapped" relations. If SCHEMA,
+      DATABASE or SYSTEM is specified, then
+      all unsuitable relations will be skipped and a single WARNING
+      will be generated.

What is an unsuitable relation? How can the end user know that?

This is missing ACL checks when moving the index into a new location, so this requires some pg_tablespace_aclcheck() calls, and the other patches share the same issue.

+   else if (partkind == RELKIND_PARTITIONED_TABLE)
+   {
+       Relation    rel = table_open(partoid, ShareLock);
+       List       *indexIds = RelationGetIndexList(rel);
+       ListCell   *lc;
+
+       table_close(rel, NoLock);
+       foreach (lc, indexIds)
+       {
+           Oid         indexid = lfirst_oid(lc);
+           (void) set_rel_tablespace(indexid, params->tablespaceOid);
+       }
+   }

This is really a good question. ReindexPartitions() would trigger one transaction per leaf to work on. Changing the tablespace of the partitioned table(s) before doing any work has the advantage of telling any new partition to use the new tablespace. Now, I see a struggling point here: what should we do if the processing fails in the middle of the move, leaving a portion of the leaves in the previous tablespace? On a follow-up reindex with the same command, should the command force a reindex even on the partitions that have been moved? Or could there be a point in skipping the partitions that are already on the new tablespace and only processing the ones on the previous tablespace? It seems to me that the first scenario makes the most sense, as currently a REINDEX works on all the relations defined, though there could be use cases for the second one. This should be documented, I think.

There are no tests for partitioned tables, aka we'd want to make sure that the new partitioned index is on the correct tablespace, as well as all its leaves. It may be better to have at least two levels of partitioned tables, as well as a partitioned table with no leaves in the cases dealt with.

+    *
+    * Even if a table's indexes were moved to a new tablespace, the index
+    * on its toast table is not normally moved.
+    */

Still, REINDEX (TABLESPACE) TABLE should move all of them to be consistent with ALTER TABLE SET TABLESPACE, but that's not the case with this code, no? This requires proper test coverage, but there is nothing of the kind in this patch.
--
Michael
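For readers unfamiliar with the patch set being reviewed here, it adds a TABLESPACE option to REINDEX. A hedged sketch of the intended usage, following the syntax discussed in this thread (object and tablespace names are made up, and the grammar may still change):

-- rebuild one table's indexes, writing the rebuilt indexes into a new tablespace
REINDEX (TABLESPACE new_ts, VERBOSE) TABLE my_table;

-- schema-wide variant; per the review above, unsuitable (e.g. mapped) relations
-- are expected to be skipped with a WARNING
REINDEX (TABLESPACE new_ts) SCHEMA my_schema;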
Re: Stronger safeguard for archive recovery not to miss data
On Wed, 2021-01-20 at 13:10 +0900, Fujii Masao wrote:
> +        errhint("Run recovery again from a new base backup taken after setting wal_level higher than minimal")));
>
> Isn't it impossible to do this in normal archive recovery case? In that case,
> since the server crashed and the database got corrupted, probably
> we cannot take a new base backup.
>
> Originally even when users accidentally set wal_level to minimal, they could
> start the server from the backup by disabling hot_standby and salvage the data.
> But with the patch, we lost the way to do that. Right? I was wondering that
> WARNING was used intentionally there for that case.

I would argue that if you set your "wal_level" to minimal, do some work, reset it to "replica" and recover past that, two things could happen:

1. Since "archive_mode" was off, you are missing some WAL segments and cannot recover past that point anyway.

2. You are missing some relevant WAL records, and your recovered database is corrupted. This is very likely, because you probably switched to "minimal" with the intention of generating less WAL.

Now I see your point that a corrupted database may be better than no database at all, but wouldn't you agree that a warning in the log (that nobody reads) is too little information? Normally we don't take such a relaxed attitude towards data corruption.

Perhaps there could be a GUC "recovery_allow_data_corruption" to override this check, but I'd say that a warning is too little.

>     if (ControlFile->wal_level < WAL_LEVEL_REPLICA)
>         ereport(ERROR,
>                 (errmsg("hot standby is not possible because wal_level was not set to \"replica\" or higher on the primary server"),
>                  errhint("Either set wal_level to \"replica\" on the primary, or turn off hot_standby here.")));
>
> With the patch, we never reach the above code?

Right, that would have to go. I didn't notice that.

Yours,
Laurenz Albe
Re: TOAST condition for column size
On 2021-01-19 19:32, Amit Kapila wrote:
> On Mon, Jan 18, 2021 at 7:53 PM torikoshia wrote:
>
> Because no benefit is to be expected by compressing it. The size will
> be mostly the same. Also, even if we somehow try to fit this data via
> toast, I think reading speed will be slower because for all such
> columns an extra fetch from toast would be required. Another thing is
> you or others can still face the same problem with 17-byte column
> data. I don't think this is the right way to fix it. I don't have many
> good ideas, but I think you can try (a) increasing the block size during
> configure, (b) reducing the number of columns, (c) creating char columns
> of somewhat bigger size, say greater than 24 bytes, to accommodate your
> case. I know none of these are good workarounds, but at this moment I
> can't think of better alternatives.

Thanks for your explanation and the workarounds!

On 2021-01-20 00:40, Tom Lane wrote:
> Dilip Kumar writes:
> > On Tue, 19 Jan 2021 at 6:28 PM, Amit Kapila wrote:
> > > Won't it be safe because we don't align individual attrs of type
> > > varchar where length is less than equal to 127?
> > Yeah right, I just missed that point.
>
> Yeah, the minimum on biggest_size has nothing to do with alignment
> decisions. It's just a filter to decide whether it's worth trying to
> toast anything.
>
> Having said that, I'm pretty skeptical of this patch: I think its most
> likely real-world effect is going to be to waste cycles (and create
> TOAST-table bloat) on the way to failing anyway. I do not think that
> toasting a 20-byte field down to 18 bytes is likely to be a productive
> thing to do in typical situations. The given example looks like a
> cherry-picked edge case rather than a useful case to worry about.

I agree with you; it seems to only help when there are many columns with 19 ~ 23 bytes of data, and that's not a normal case. I'm not sure, but a rare exception might be some geographic data. That's the situation in which I heard the problem happened.

Regards,

--
Atsushi Torikoshi
Re: Added schema level support for publication.
Hi Vignesh,

> I have handled the above scenario (drop schema should automatically
> remove the schema entry from publication schema relation) and the
> addition of tests in the new v2 patch attached.
> Thoughts?

Please see some initial comments:

1. I think there should be more tests to show that the schema data is actually replicated to the subscriber. Currently, I am not seeing the data being replicated when I use FOR SCHEMA.

2. How does replication behave when a table is added to or removed from a subscribed schema using ALTER TABLE SET SCHEMA?

3. Can we have a default schema like public or the current schema that gets replicated in case the user didn't specify one? This can be handy for replicating the current schema's tables.

4.
+   The fourth, fifth and sixth variants change which schemas are part of the
+   publication. The SET TABLE clause will replace the list
+   of schemas in the publication with the specified one. The ADD

There is a typo above: s/SET TABLE/SET SCHEMA

Thank you,
Rahila Syed
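For context, a hedged sketch of the commands this patch is meant to enable, based on the FOR SCHEMA syntax discussed above (the feature is still under development, so the exact grammar may change; schema and publication names are illustrative):

-- publish every table in one schema
CREATE PUBLICATION sales_pub FOR SCHEMA sales;

-- adjust which schemas are part of the publication later on
ALTER PUBLICATION sales_pub ADD SCHEMA archive;
ALTER PUBLICATION sales_pub SET SCHEMA sales;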
Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
On 2021/01/20 19:17, Bharath Rupireddy wrote:
> On Wed, Jan 20, 2021 at 3:24 PM Fujii Masao wrote:
> > > Keeping above in mind, I feel we can do hash_seq_search(), as we do
> > > currently, even when the server name is given as input. This way, we
> > > don't need to bother much on the above points.
> > >
> > > Thoughts?
> >
> > Thanks for explaining this! You're right. I'd withdraw my suggestion.
>
> Attaching v14 patch set with review comments addressed. Please review it further.

Thanks for updating the patch!

+ * It checks if the cache has a connection for the given foreign server that's
+ * not being used within current transaction, then returns true. If the
+ * connection is in use, then it emits a warning and returns false.

The comment also should mention the case where no open connection for the given server is found? What about rewriting this to the following?

-------------------------------------------------------------------------
If the cached connection for the given foreign server is found and has not been used within current transaction yet, close the connection and return true. Even when it's found, if it's already used, keep the connection, emit a warning and return false. If it's not found, return false.
-------------------------------------------------------------------------

+ * It returns true, if it closes at least one connection, otherwise false.
+ *
+ * It returns false, if the cache doesn't exit.

The above second comment looks redundant.

+       if (ConnectionHash)
+               result = disconnect_cached_connections(0, true);

Isn't it smarter to make disconnect_cached_connections() check ConnectionHash and return false if it's NULL? If we do that, we can simplify the code of postgres_fdw_disconnect() and _all().

+ * current transaction are disconnected. Otherwise, the unused entries with the
+ * given hashvalue are disconnected.

In the above second comment, a singular form should be used instead? Because there must be no multiple entries with the given hashvalue.

+               server = GetForeignServer(entry->serverid);

This seems to cause an error "cache lookup failed" if postgres_fdw_disconnect_all() is called when there is a connection in use but its server is dropped. To avoid this error, GetForeignServerExtended() with FSV_MISSING_OK should be used instead, like postgres_fdw_get_connections() does?

+               if (entry->server_hashvalue == hashvalue &&
+                       (entry->xact_depth > 0 || result))
+               {
+                       hash_seq_term(&scan);
+                       break;

entry->server_hashvalue can be 0? If yes, since postgres_fdw_disconnect_all() specifies 0 as hashvalue, ISTM that the above condition can be true unexpectedly. Can we replace this condition with just "if (!all)"?

+-- Closes loopback connection, returns true and issues a warning as loopback2
+-- connection is still in use and can not be closed.
+SELECT * FROM postgres_fdw_disconnect_all();
+WARNING:  cannot close connection for server "loopback2" because it is still in use
+ postgres_fdw_disconnect_all
+-----------------------------
+ t
+(1 row)

After the above test, isn't it better to call postgres_fdw_get_connections() to check that loopback is not output?

+WARNING:  cannot close connection for server "loopback" because it is still in use
+WARNING:  cannot close connection for server "loopback2" because it is still in use

Just in case, please let me confirm that the order of these warning messages is always stable?

+
+postgres_fdw_disconnect(IN servername text) returns boolean

I think that "IN" of "IN servername text" is not necessary. I'd like to replace "servername" with "server_name" because postgres_fdw_get_connections() uses "server_name" as the output column name.

+
+
+  When called in local session with foreign server name as input, it
+  discards the unused open connection previously made to the foreign server
+  and returns true.

"unused open connection" sounds confusing to me. What about the following?

-------------------------------------------------------------------------
This function discards the open connection that postgres_fdw established from the local session to the foreign server with the given name if it's not used in the current local transaction yet, and then returns true. If it's already used, the function doesn't discard the connection, emits a warning and then returns false. If there is no open connection to the given foreign server, false is returned. If no foreign server with the given name is found, an error is emitted.

Example usage of the function:
-------------------------------------------------------------------------

+postgres=# SELECT * FROM postgres_fdw_disconnect('loopback1');

"SELECT postgres_fdw_disconnect('loopback1')" is more common?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Re: Deleting older versions in unique indexes to avoid page splits
On Wed, Jan 20, 2021 at 10:58 AM Peter Geoghegan wrote:
>
> On Tue, Jan 19, 2021 at 7:54 PM Amit Kapila wrote:
> > The worst cases could be (a) when there is just one such duplicate
> > (indexval logically unchanged) on the page and that happens to be the
> > last item and others are new insertions, (b) same as (a) and along
> > with it lets say there is an open transaction due to which we can't
> > remove even that duplicate version. Have we checked the overhead or
> > results by simulating such workloads?
>
> There is no such thing as a workload that has page splits caused by
> non-HOT updaters, but almost no actual version churn from the same
> non-HOT updaters. It's possible that a small number of individual page
> splits will work out like that, of course, but they'll be extremely
> rare, and impossible to see in any kind of consistent way.
>
> That just leaves long running transactions. Of course it's true that
> eventually a long-running transaction will make it impossible to
> perform any cleanup, for the usual reasons. And at that point this
> mechanism is bound to fail (which costs additional cycles -- the
> wasted access to a single heap page, some CPU cycles). But it's still
> a bargain to try. Even with a long running transactions there will be
> a great many bottom-up deletion passes that still succeed earlier on
> (because at least some of the dups are deletable, and we can still
> delete those that became garbage right before the long running
> snapshot was acquired).

How many of the dups are deletable till there is an open long-running transaction in the system before the transaction that has performed an update? I tried a simple test to check this.

create table t1(c1 int, c2 int, c3 char(10));
create index idx_t1 on t1(c1);
create index idx_t2 on t1(c2);

insert into t1 values(generate_series(1,5000),1,'aa');
update t1 set c2 = 2;

The update will try to remove the tuples via the bottom-up cleanup mechanism for index 'idx_t1' and won't be able to remove any tuple because the duplicates are from the same transaction.

> Victor independently came up with a benchmark that ran over several
> hours, with cleanup consistently held back by ~5 minutes by a long
> running transaction:

AFAICS, the long-running transaction used in the test is below:

SELECT abalance, pg_sleep(300) FROM pgbench_accounts
WHERE mtime > now() - INTERVAL '15min' ORDER BY aid LIMIT 1;

This shouldn't generate a transaction id, so would it be sufficient to hold back the clean-up?

> https://www.postgresql.org/message-id/cagnebogatzs1mwmvx8fzzhmxzudecb10anvwwhctxtibpg3...@mail.gmail.com
>
> This was actually one of the most favorable cases of all for the patch
> -- the patch prevented logically unchanged indexes from growing (this
> is a mix of inserts, updates, and deletes, not just updates, so it was
> less than perfect -- we did see the indexes grow by a half of one
> percent). Whereas without the patch each of the same 3 indexes grew by
> 30% - 60%.
>
> So yes, I did think about long running transactions, and no, the
> possibility of wasting one heap block access here and there when the
> database is melting down anyway doesn't seem like a big deal to me.

First, it is not clear to me if that has properly simulated the long-running case, but even if it has, what I intended to say was to have an open long-running transaction possibly for the entire duration of the test. If we do that, we will come to know whether there is any overhead and, if so, how much.

> > I feel unlike LP_DEAD optimization this new bottom-up scheme can cost
> > us extra CPU and I/O because there seems to be not much consideration
> > given to the fact that we might not be able to delete any item (or
> > very few) due to long-standing open transactions except that we limit
> > ourselves when we are not able to remove even one tuple from any
> > particular heap page.
>
> There was plenty of consideration given to that. It was literally
> central to the design, and something I poured over at length. Why
> don't you go read some of that now? Or, why don't you demonstrate an
> actual regression using a tool like pgbench?
>
> I do not appreciate being accused of having acted carelessly. You
> don't have a single shred of evidence.

I think you have done a good job and I am just trying to see if there are any loose ends which we can tighten up. Anyway, here are results from some simple performance tests:

Test with 2 un-modified indexes
===============================
create table t1(c1 int, c2 int, c3 int, c4 char(10));
create index idx_t1 on t1(c1);
create index idx_t2 on t1(c2);
create index idx_t3 on t1(c3);

insert into t1 values(generate_series(1,500), 1, 10, 'aa');
update t1 set c2 = 2;

Without nbtree mod (without commit d168b66682)
==============================================
postgres=# update t1 set c2 = 2;
UPDATE 500
Time: 46533.530 ms (00:46.534)

With HEAD
=========
postgres=# update t1 set c2 = 2;
UPDATE 500
Time: 525
Re: Deleting older versions in unique indexes to avoid page splits
On Wed, Jan 20, 2021 at 7:03 PM Amit Kapila wrote: > > On Wed, Jan 20, 2021 at 10:58 AM Peter Geoghegan wrote: > > > > On Tue, Jan 19, 2021 at 7:54 PM Amit Kapila wrote: > > > The worst cases could be (a) when there is just one such duplicate > > > (indexval logically unchanged) on the page and that happens to be the > > > last item and others are new insertions, (b) same as (a) and along > > > with it lets say there is an open transaction due to which we can't > > > remove even that duplicate version. Have we checked the overhead or > > > results by simulating such workloads? > > > > There is no such thing as a workload that has page splits caused by > > non-HOT updaters, but almost no actual version churn from the same > > non-HOT updaters. It's possible that a small number of individual page > > splits will work out like that, of course, but they'll be extremely > > rare, and impossible to see in any kind of consistent way. > > > > That just leaves long running transactions. Of course it's true that > > eventually a long-running transaction will make it impossible to > > perform any cleanup, for the usual reasons. And at that point this > > mechanism is bound to fail (which costs additional cycles -- the > > wasted access to a single heap page, some CPU cycles). But it's still > > a bargain to try. Even with a long running transactions there will be > > a great many bottom-up deletion passes that still succeed earlier on > > (because at least some of the dups are deletable, and we can still > > delete those that became garbage right before the long running > > snapshot was acquired). > > > > How many ... > Typo. /many/any -- With Regards, Amit Kapila.
Re: Printing backtrace of postgres processes
On Wed, Jan 20, 2021 at 2:52 AM Tom Lane wrote: > > Robert Haas writes: > > On Tue, Jan 19, 2021 at 12:50 PM Tom Lane wrote: > >> I think it's got security hazards as well. If we restricted the > >> feature to cause a trace of only one process at a time, and required > >> that process to be logged in as the same database user as the > >> requestor (or at least someone with the privs of that user, such > >> as a superuser), that'd be less of an issue. > > > I am not sure I see a security hazard here. I think the checks for > > this should have the same structure as pg_terminate_backend() and > > pg_cancel_backend(); whatever is required there should be required > > here, too, but not more, unless we have a real clear reason for such > > an inconsistency. > > Yeah, agreed. The "broadcast" option seems inconsistent with doing > things that way, but I don't have a problem with being able to send > a trace signal to the same processes you could terminate. > The current patch supports both getting the trace for all the processes of that instance and getting the trace for a particular process, I'm understanding the concern here with broadcasting to all the connected backends, I will remove the functionality to get trace for all the processes. And I will also change the trace of a single process in such a way that the user can get the trace of only the processes which that user could terminate. > >> I share your estimate that there'll be small-fraction-of-a-percent > >> hazards that could still add up to dangerous instability if people > >> go wild with this. > > > Right. I was more concerned about whether we could, for example, crash > > while inside the function that generates the backtrace, on some > > platforms or in some scenarios. That would be super-sad. I assume that > > the people who wrote the code tried to make sure that didn't happen > > but I don't really know what's reasonable to expect. > > Recursion is scary, but it should (I think) not be possible if this > is driven off CHECK_FOR_INTERRUPTS. There will certainly be none > of those in libbacktrace. > > One point here is that it might be a good idea to suppress elog.c's > calls to functions in the error context stack. As we saw in another > connection recently, allowing that to happen makes for a *very* > large increase in the footprint of code that you are expecting to > work at any random CHECK_FOR_INTERRUPTS call site. > > BTW, it also looks like the patch is doing nothing to prevent the > backtrace from being sent to the connected client. I'm not sure > what I think about whether it'd be okay from a security standpoint > to do that on the connection that requested the trace, but I sure > as heck don't want it to happen on connections that didn't. Also, > whatever you think about security concerns, it seems like there'd be > pretty substantial reentrancy hazards if the interrupt occurs > anywhere in code dealing with normal client communication. > > Maybe, given all of these things, we should forget using elog at > all and just emit the trace with fprintf(stderr). That seems like > it would decrease the odds of trouble by about an order of magnitude. > It would make it more painful to collect the trace in some logging > setups, of course. I would prefer if the trace appears in the log file rather on the console, as you rightly pointed out it will be difficult to collect the trace of fprint(stderr). Let me know if I misunderstood. I think you are concerned about the problem where elog logs the trace to the client also. 
Can we use the LOG_SERVER_ONLY log level, which would prevent it from being logged to the client? That way we can keep the elog implementation itself. Thoughts? Regards, Vignesh EnterpriseDB: http://www.enterprisedb.com
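For illustration, a minimal sketch of what the LOG_SERVER_ONLY route could look like (this is only a sketch, not the actual patch; the helper name is made up, while errbacktrace() and errhidestmt() are existing elog.c facilities):

#include "postgres.h"
#include "miscadmin.h"

/*
 * Hypothetical helper, invoked from the CHECK_FOR_INTERRUPTS-driven
 * handler.  LOG_SERVER_ONLY keeps elog.c's formatting and routing but
 * never sends the message to the connected client; errbacktrace()
 * attaches the backtrace and errhidestmt() keeps the current statement
 * out of the log line.
 */
static void
emit_backtrace_to_server_log(void)
{
	ereport(LOG_SERVER_ONLY,
			(errmsg("backtrace of PID %d", MyProcPid),
			 errbacktrace(),
			 errhidestmt(true)));
}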
Re: Deleting older versions in unique indexes to avoid page splits
On Wed, Jan 20, 2021 at 10:50 AM Andres Freund wrote: > > Hi, > > On 2021-01-20 09:24:35 +0530, Amit Kapila wrote: > > I feel extending the deletion mechanism based on the number of LP_DEAD > > items sounds more favorable than giving preference to duplicate > > items. Sure, it will give equally good or better results if there are > > no long-standing open transactions. > > There's a lot of workloads that never set LP_DEAD because all scans are > bitmap index scans. And there's no obvious way to address that. So I > don't think it's wise to purely rely on LP_DEAD. > Right, I understand this point. The point I was trying to make was that in this new technique we might not be able to delete any tuple (or maybe very few) if there are long-running open transactions in the system and still incur a CPU and I/O cost. I am completely in favor of this technique and patch, so don't get me wrong. As mentioned in my reply to Peter, I am just trying to see if there are more ways we can use this optimization and reduce the chances of regression (if there is any). -- With Regards, Amit Kapila.
Re: Parallel INSERT (INTO ... SELECT ...)
On Wed, Jan 20, 2021 at 3:27 PM Greg Nancarrow wrote: > > On Fri, Jan 8, 2021 at 8:16 PM Amit Kapila wrote: > > > > > > I'm not sure if the problem of missing targetlist should be handled here > > > (BTW, > > > NIL is the constant for an empty list, not NULL). Obviously this is a > > > consequence of the fact that the ModifyTable node has no regular > > > targetlist. > > > > > > > I think it is better to add comments along with this change. In this > > form, this looks quite hacky to me. > > > > The targetlist on the ModifyTable node has been setup correctly, but > it hasn't been propagated to the Gather node. > Of course, I have previously tried to elegantly fix this, but struck > various problems, using different approaches. > Perhaps this code could just be moved into set_plan_refs(). > For the moment, I'll just keep the current code, but I'll add a FIXME > comment for this. > I'll investigate Antonin's suggestions as a lower-priority side-task. > +1, that sounds like a good approach because anyway this has to be dealt for the second patch and at this point, we should make the first patch ready. -- With Regards, Amit Kapila.
Re: Discarding DISCARD ALL
I hope to do further review of the patch later this week, but I wanted to at least comment on this piece: On Wed, Jan 20, 2021 at 2:48 AM Peter Eisentraut wrote: > > On 2020-12-23 15:33, Simon Riggs wrote: > > Poolers such as pgbouncer would then be able to connect transaction > > mode pools by setting transaction_cleanup=on at time of connection, > > avoiding any need to issue a server_reset_query, removing the DISCARD > > ALL command from the normal execution path, while still achieving the > > same thing. > > PgBouncer does not send DISCARD ALL in transaction mode. There is a > separate setting to do that, but it's not the default, and it's more of > a workaround for bad client code. So I don't know if this feature would > be of much use for PgBouncer. Other connection poolers might have other > opinions. Yes, to have server_reset_query apply in transaction pooling mode you have to additionally configure pgbouncer with server_reset_query_always enabled. I'd mildly take issue with "a workaround for bad client code". Yes, clients in transaction pooling mode shouldn't issue (for example) `SET ...`, but there's no way I'm aware of in Postgres to prevent session-specific items like those GUCs from being set by a given user, so I view it more like a safeguard than a workaround. In our setup we have server_reset_query_always=1 as such a safeguard, because it's too easy for application code to update, for example, statement_timeout to disastrous results. But we also work to make sure those don't happen (or get cleaned up if they happen to slip in). An alternative approach that occurred to me while typing this reply: a setting in Postgres that would disallow setting session level GUCs (i.e., enforce `SET LOCAL` transaction level usage instead) would remove a large chunk of our need to set server_reset_query_always=1 (and more interestingly it'd highlight when broken code gets pushed). But even with that, I see some value in the proposed setting since there is additional session state beyond GUCs. James
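To make the SET vs. SET LOCAL distinction concrete, here is a minimal illustration of the hazard described above (statement_timeout is just an example GUC):

-- Session-level SET survives COMMIT, so under transaction pooling it leaks
-- onto whichever client next reuses this server connection.
BEGIN;
SET statement_timeout = 0;
COMMIT;

-- SET LOCAL reverts at COMMIT/ROLLBACK, so it stays confined to this
-- transaction even when the server connection is shared.
BEGIN;
SET LOCAL statement_timeout = '30s';
COMMIT;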
Re: Discarding DISCARD ALL
On Wed, 20 Jan 2021 at 14:21, James Coleman wrote: > An alternative approach that occurred to me while typing this reply: a > setting in Postgres that would disallow setting session level GUCs > (i.e., enforce `SET LOCAL` transaction level usage instead) would > remove a large chunk of our need to set server_reset_query_always=1 > (and more interestingly it'd highlight when broken code gets pushed). > But even with that, I see some value in the proposed setting since > there is additional session state beyond GUCs. With transaction_cleanup=on we could force all SETs to be SET LOCAL. The point is that if we declare ahead of time that the transaction will be reset then we can act differently and more easily for various circumstances, for SETs, for Temp tables and others. -- Simon Riggshttp://www.EnterpriseDB.com/
Re: [HACKERS] [PATCH] Generic type subscripting
On Tue Jan 19, 2021 at 1:42 PM EST, Pavel Stehule wrote: > Hi > > I found minor issues. > > Doc - missing tag > > and three whitespaces issues > > see attached patch > > Following sentence is hard to read due long nested example > > If the > + path contradicts structure of modified jsonb for any > individual > + value (e.g. path val['a']['b']['c'] assumes keys > + 'a' and 'b' have object values > + assigned to them, but if val['a'] or > + val['b'] is null, a string, or a number, then the > path > + contradicts with the existing structure), an error is raised even if > other > + values do conform. > > It can be divided into two sentences - predicate, and example. > > Regards > > Pavel Here's a full editing pass on the documentation, with v45 and Pavel's doc-whitespaces-fix.patch applied. I also corrected a typo in one of the added hints. From 086a34ca860e8513484d829db1cb3f0c17c4ec1e Mon Sep 17 00:00:00 2001 From: Dian M Fay Date: Tue, 19 Jan 2021 23:44:23 -0500 Subject: [PATCH] Revise jsonb subscripting documentation --- doc/src/sgml/json.sgml | 101 +--- src/backend/utils/adt/jsonbsubs.c | 2 +- src/test/regress/expected/jsonb.out | 2 +- 3 files changed, 49 insertions(+), 56 deletions(-) diff --git a/doc/src/sgml/json.sgml b/doc/src/sgml/json.sgml index 4e19fe4fb8..eb3952193a 100644 --- a/doc/src/sgml/json.sgml +++ b/doc/src/sgml/json.sgml @@ -605,97 +605,90 @@ SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @> '{"tags": ["qu jsonb Subscripting - jsonb data type supports array-style subscripting expressions - to extract or update particular elements. It's possible to use multiple - subscripting expressions to extract nested values. In this case, a chain of - subscripting expressions follows the same rules as the - path argument in jsonb_set function, - e.g. in case of arrays it is a 0-based operation or that negative integers - that appear in path count from the end of JSON arrays. - The result of subscripting expressions is always jsonb data type. + The jsonb data type supports array-style subscripting expressions + to extract and modify elements. Nested values can be indicated by chaining + subscripting expressions, following the same rules as the path + argument in the jsonb_set function. If a jsonb + value is an array, numeric subscripts start at zero, and negative integers count + backwards from the last element of the array. Slice expressions are not supported. + The result of a subscripting expression is always of the jsonb data type. UPDATE statements may use subscripting in the - SET clause to modify jsonb values. Every - affected value must conform to the path defined by the subscript(s). If the - path contradicts structure of modified jsonb for any individual - value (e.g. path val['a']['b']['c'] assumes keys - 'a' and 'b' have object values - assigned to them, but if val['a'] or - val['b'] is null, a string, or a number, then the path - contradicts with the existing structure), an error is raised even if other - values do conform. + SET clause to modify jsonb values. Object + values being traversed must exist as specified by the subscript path. For + instance, the path val['a']['b']['c'] assumes that + val, val['a'], and val['a']['b'] + are all objects in every record being updated (val['a']['b'] + may or may not contain a field named c, as long as it's an + object). If any individual val, val['a'], + or val['a']['b'] is a non-object such as a string, a number, + or NULL, an error is raised even if other values do conform. 
+ Array values are not subject to this restriction, as detailed below. - + An example of subscripting syntax: + --- Extract value by key +-- Extract object value by key SELECT ('{"a": 1}'::jsonb)['a']; --- Extract nested value by key path +-- Extract nested object value by key path SELECT ('{"a": {"b": {"c": 1}}}'::jsonb)['a']['b']['c']; --- Extract element by index +-- Extract array element by index SELECT ('[1, "2", null]'::jsonb)[1]; --- Update value by key, note the single quotes - the assigned value --- needs to be of jsonb type as well +-- Update object value by key. Note the quotes around '1': the assigned +-- value must be of the jsonb type as well UPDATE table_name SET jsonb_field['key'] = '1'; --- This will raise an error if jsonb_field is {"a": 1} +-- This will raise an error if any record's jsonb_field['a']['b'] is something +-- other than an object. For example, the value {"a": 1} has no 'b' key. UPDATE table_name SET jsonb_field['a']['b']['c'] = '1'; --- Select records using where clause with subscripting. Since the result of --- subscripting is jsonb and we basically want to compare two jsonb objects, we --- need to put the value in double quotes to be able to convert it to jsonb. +-- Filter records using a WHERE clause with subscripting. Since the result of +-- subscripting
Re: Wrong usage of RelationNeedsWAL
At Wed, 20 Jan 2021 17:34:44 +0900 (JST), Kyotaro Horiguchi wrote in > Anyway, it seems actually dangerous that cause pruning on wal-skipped > relation. > > > with your patch versions. Could you try implementing both test procedures > > in > > src/test/modules/snapshot_too_old? There's no need to make the test use > > wal_level=minimal explicitly; it's sufficient to catch these bugs when > > wal_level=minimal is in the TEMP_CONFIG file. > > In the attached, TestForOldSnapshot() considers XLogIsNeeded(), > instead of moving the condition on LSN to TestForOldSnapshot_impl for > performance. > > I'll add the test part in the next version. This is it. However, with the previous patch, two existing tests sto_using_cursor and sto_using_select behaves differently from the master. That change is coming from the omission of actual LSN check in TestForOldSnapshot while wal_level=minimal. So we have no choice other than actually updating page LSN. In the scenario under discussion there are two places we need to do that. one is heap_page_prune, and the other is RelationCopyStorge. As a PoC, I used gistXLogAssignLSN() as is for thie purpose. See the attached third file. If it is the right direction, I will move XLOG_GIST_ASSIGN_LSN to XLOG_ASSIGN_LSN and move gistXLogAssignLSN() to XLogAdvanceLSN() or XLogNop() or such. With the third patch, the test succedds both wal_level = replica and minimal. regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From 332ac10844befb8a9c00df9f68993eb3696f0a85 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Wed, 20 Jan 2021 20:57:03 +0900 Subject: [PATCH v5 1/3] Test for snapshot too old and wal_level=minimal --- src/test/modules/snapshot_too_old/Makefile| 12 - .../expected/sto_wal_optimized.out| 24 ++ .../input_sto/sto_wal_optimized.spec | 44 +++ src/test/modules/snapshot_too_old/sto.conf| 3 ++ 4 files changed, 82 insertions(+), 1 deletion(-) create mode 100644 src/test/modules/snapshot_too_old/expected/sto_wal_optimized.out create mode 100644 src/test/modules/snapshot_too_old/input_sto/sto_wal_optimized.spec diff --git a/src/test/modules/snapshot_too_old/Makefile b/src/test/modules/snapshot_too_old/Makefile index dfb4537f63..1e0b0b6efc 100644 --- a/src/test/modules/snapshot_too_old/Makefile +++ b/src/test/modules/snapshot_too_old/Makefile @@ -4,7 +4,7 @@ # we have to clean those result files explicitly EXTRA_CLEAN = $(pg_regress_clean_files) -ISOLATION = sto_using_cursor sto_using_select sto_using_hash_index +ISOLATION = sto_using_cursor sto_using_select sto_using_hash_index sto_wal_optimized ISOLATION_OPTS = --temp-config $(top_srcdir)/src/test/modules/snapshot_too_old/sto.conf # Disabled because these tests require "old_snapshot_threshold" >= 0, which @@ -22,6 +22,16 @@ include $(top_builddir)/src/Makefile.global include $(top_srcdir)/contrib/contrib-global.mk endif +.PHONY: tablespace-setup specfile-setup +tablespace-setup: + rm -rf ./testtablespace + mkdir ./testtablespace + +specfile-setup: input_sto/*.spec + sed 's!@srcdir@!$(realpath $(top_srcdir))!g' $? > specs/$(notdir $?) + +check: tablespace-setup specfile-setup + # But it can nonetheless be very helpful to run tests on preexisting # installation, allow to do so, but only if requested explicitly. 
installcheck-force: diff --git a/src/test/modules/snapshot_too_old/expected/sto_wal_optimized.out b/src/test/modules/snapshot_too_old/expected/sto_wal_optimized.out new file mode 100644 index 00..4038d0a713 --- /dev/null +++ b/src/test/modules/snapshot_too_old/expected/sto_wal_optimized.out @@ -0,0 +1,24 @@ +Parsed test spec with 3 sessions + +starting permutation: s1a1 s1a2 s2a1 s2a2 s1b1 a3-0 a3-1 s3-2 s3-3 s3-4 s2b1 +step s1a1: BEGIN; +step s1a2: DELETE FROM t; +step s2a1: BEGIN ISOLATION LEVEL REPEATABLE READ; +step s2a2: SELECT 1; +?column? + +1 +step s1b1: COMMIT; +step a3-0: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold'; +settingpg_sleep + +0 +step a3-1: BEGIN; +step s3-2: ALTER TABLE t SET TABLESPACE tsp1; +step s3-3: SELECT count(*) FROM t; +count + +0 +step s3-4: COMMIT; +step s2b1: SELECT count(*) FROM t; +ERROR: snapshot too old diff --git a/src/test/modules/snapshot_too_old/input_sto/sto_wal_optimized.spec b/src/test/modules/snapshot_too_old/input_sto/sto_wal_optimized.spec new file mode 100644 index 00..02adb50581 --- /dev/null +++ b/src/test/modules/snapshot_too_old/input_sto/sto_wal_optimized.spec @@ -0,0 +1,44 @@ +# This test provokes a "snapshot too old" error +# +# The sleep is needed because with a threshold of zero a statement could error +# on changes it made. With more normal settings no external delay is needed, +# but we don't want these tests to run long enough to see that, since +# granularity is in minutes. +# +# Since results depend on the valu
Re: pg_stat_statements oddity with track = all
On Wed, Jan 20, 2021 at 6:15 PM Julien Rouhaud wrote: > > Hi, > > On Tue, Jan 19, 2021 at 4:55 PM Masahiro Ikeda > wrote: > > > > Thanks for making the patch to add "toplevel" column in > > pg_stat_statements. > > This is a review comment. > > Thanks a lot for the thorough review! > > > I tested the "update" command can work. > > postgres=# ALTER EXTENSION pg_stat_statements UPDATE TO '1.10'; > > > > Although the "toplevel" column of all queries which already stored is > > 'false', > > we have to decide the default. I think 'false' is ok. > > Mmm, I'm not sure that I understand this result. The "toplevel" value > is tracked by the C code loaded at startup, so it should have a > correct value even if you used to have the extension in a previous > version, it's just that you can't access the toplevel field before > doing the ALTER EXTENSION pg_stat_statements UPDATE. There's also a > change in the magic number, so you can't use the stored stat file from > a version before this patch. > > I also fail to reproduce this behavior. I did the following: > > - start postgres with pg_stat_statements v1.10 (so with this patch) in > shared_preload_libraries > - CREATE EXTENSION pg_stat_statements VERSION '1.9'; > - execute a few queries > - ALTER EXTENSION pg_stat_statements UPDATE; > - SELECT * FROM pg_stat_statements reports the query with toplvel to TRUE > > Can you share a way to reproduce your findings? > > > 2. usability review > > > > [...] > > Although one concern is whether only top-level is enough or not, > > I couldn't come up with any use-case to use nested level, so I think > > it's ok. > > I agree, I don't see how tracking statistics per nesting level would > really help. The only additional use case I see would tracking > triggers/FK query execution, but the nesting level won't help with > that. > > > 3. coding review > > = > > [...] > > B. add a argument of the pg_stat_statements_reset(). > > > > Now, pg_stat_statements supports a flexible reset feature. > > > pg_stat_statements_reset(userid Oid, dbid Oid, queryid bigint) > > > > Although I wondered whether we need to add a top-level flag to the > > arguments, > > I couldn't come up with any use-case to reset only top-level queries or > > not top-level queries. > > Ah, I didn't think of the reset function. I also fail to see a > reasonable use case to provide a switch for the reset function. > > > 4. others > > == > > > > These comments are not related to this patch. > > > > A. about the topic of commitfests > > > > Since I think this feature is for monitoring, > > it's better to change the topic from "System Administration" > > to "Monitoring & Control". > > I agree, thanks for the change! I've changed the topic accordingly. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
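For reference, Julien's reproduction steps above amount to roughly the following, assuming the patched pg_stat_statements (v1.10) is already loaded via shared_preload_libraries:

CREATE EXTENSION pg_stat_statements VERSION '1.9';
SELECT count(*) FROM pg_class;              -- execute a few queries
ALTER EXTENSION pg_stat_statements UPDATE;  -- now at 1.10
-- per Julien, toplevel is expected to report true for these queries
SELECT query, calls, toplevel FROM pg_stat_statements;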
Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
On 2021-Jan-20, Michael Paquier wrote: > +/* > + * This is mostly duplicating ATExecSetTableSpaceNoStorage, > + * which should maybe be factored out to a library function. > + */ > Wouldn't it be better to do first the refactoring of 0002 and then > 0001 so as REINDEX can use the new routine, instead of putting that > into a comment? I think merging 0001 and 0002 into a single commit is a reasonable approach. I don't oppose an initial refactoring commit if you want to do that, but it doesn't seem necessary. -- Álvaro Herrera39°49'30"S 73°17'W
Re: Discarding DISCARD ALL
On Wed, Jan 20, 2021 at 9:58 AM Simon Riggs wrote: > > On Wed, 20 Jan 2021 at 14:21, James Coleman wrote: > > > An alternative approach that occurred to me while typing this reply: a > > setting in Postgres that would disallow setting session level GUCs > > (i.e., enforce `SET LOCAL` transaction level usage instead) would > > remove a large chunk of our need to set server_reset_query_always=1 > > (and more interestingly it'd highlight when broken code gets pushed). > > But even with that, I see some value in the proposed setting since > > there is additional session state beyond GUCs. > > With transaction_cleanup=on we could force all SETs to be SET LOCAL. > > The point is that if we declare ahead of time that the transaction > will be reset then we can act differently and more easily for various > circumstances, for SETs, for Temp tables and others. Right, I agree it's independently useful. My "alternative" is a subset of that functionality and doesn't cover as many cases. James
Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
On 2021-Jan-20, Alvaro Herrera wrote: > On 2021-Jan-20, Michael Paquier wrote: > > > +/* > > + * This is mostly duplicating ATExecSetTableSpaceNoStorage, > > + * which should maybe be factored out to a library function. > > + */ > > Wouldn't it be better to do first the refactoring of 0002 and then > > 0001 so as REINDEX can use the new routine, instead of putting that > > into a comment? > > I think merging 0001 and 0002 into a single commit is a reasonable > approach. ... except it doesn't make a lot of sense to have set_rel_tablespace in either indexcmds.c or index.c. I think tablecmds.c is a better place for it. (I would have thought catalog/storage.c, but that one's not the right abstraction level it seems.) But surely ATExecSetTableSpaceNoStorage should be using this new routine. (I first thought 0002 was doing that, since that commit is calling itself a "refactoring", but now that I look closer, it's not.) -- Álvaro Herrera39°49'30"S 73°17'W "On the other flipper, one wrong move and we're Fatal Exceptions" (T.U.X.: Term Unit X - http://www.thelinuxreview.com/TUX/)
Re: [HACKERS] [PATCH] Generic type subscripting
> On Tue Jan 19, 2021 at 1:42 PM EST, Pavel Stehule wrote: > > I found minor issues. > > Doc - missing tag > > and three whitespaces issues > > see attached patch Thanks, I need to remember to not skipp doc building for testing process even for such small changes. Hope now I didn't forget anything. > On Wed, Jan 20, 2021 at 09:58:43AM -0500, Dian M Fay wrote: > Here's a full editing pass on the documentation, with v45 and Pavel's > doc-whitespaces-fix.patch applied. I also corrected a typo in one of the > added hints. Great! I've applied almost all of it, except: + A jsonb value will accept assignments to nonexistent subscript + paths as long as the nonexistent elements being traversed are all arrays. Maybe I've misunderstood the intention, but there is no requirement about arrays for creating such an empty path. I've formulated it as: + A jsonb value will accept assignments to nonexistent subscript + paths as long as the last existing path key is an object or an array. >From a4037c651a0cfd2448f38b6c8c932b5a09136b0a Mon Sep 17 00:00:00 2001 From: Dmitrii Dolgov <9erthali...@gmail.com> Date: Fri, 18 Dec 2020 17:19:51 +0100 Subject: [PATCH v46 1/3] Subscripting for jsonb Subscripting implementation for jsonb. It does not support slices, does not have a limit for number of subscripts and for assignment expects a replace value to be of jsonb type. There is also one functional difference in assignment via subscripting from jsonb_set, when an original jsonb container is NULL, subscripting replaces it with an empty jsonb and proceed with assignment. For the sake of code reuse, some parts of jsonb functionality were rearranged to allow use the same functions for jsonb_set and assign subscripting operation. The original idea belongs to Oleg Bartunov. Reviewed-by: Tom Lane, Arthur Zakirov, Pavel Stehule, Dian M Fay --- doc/src/sgml/json.sgml | 51 src/backend/utils/adt/Makefile | 1 + src/backend/utils/adt/jsonb_util.c | 76 - src/backend/utils/adt/jsonbsubs.c | 413 src/backend/utils/adt/jsonfuncs.c | 180 ++-- src/include/catalog/pg_proc.dat | 4 + src/include/catalog/pg_type.dat | 3 +- src/include/utils/jsonb.h | 6 +- src/test/regress/expected/jsonb.out | 272 +- src/test/regress/sql/jsonb.sql | 84 +- 10 files changed, 985 insertions(+), 105 deletions(-) create mode 100644 src/backend/utils/adt/jsonbsubs.c diff --git a/doc/src/sgml/json.sgml b/doc/src/sgml/json.sgml index 5b9a5557a4..3ace5e444b 100644 --- a/doc/src/sgml/json.sgml +++ b/doc/src/sgml/json.sgml @@ -602,6 +602,57 @@ SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @> '{"tags": ["qu + + jsonb Subscripting + + The jsonb data type supports array-style subscripting expressions + to extract and modify elements. Nested values can be indicated by chaining + subscripting expressions, following the same rules as the path + argument in the jsonb_set function. If a jsonb + value is an array, numeric subscripts start at zero, and negative integers count + backwards from the last element of the array. Slice expressions are not supported. + The result of a subscripting expression is always of the jsonb data type. + + + + An example of subscripting syntax: + + +-- Extract object value by key +SELECT ('{"a": 1}'::jsonb)['a']; + +-- Extract nested object value by key path +SELECT ('{"a": {"b": {"c": 1}}}'::jsonb)['a']['b']['c']; + +-- Extract array element by index +SELECT ('[1, "2", null]'::jsonb)[1]; + +-- Update object value by key. 
Note the quotes around '1': the assigned +-- value must be of the jsonb type as well +UPDATE table_name SET jsonb_field['key'] = '1'; + +-- Filter records using a WHERE clause with subscripting. Since the result of +-- subscripting is jsonb, the value we compare it against must also be jsonb. +-- The double quotes make "value" also a valid jsonb string. +SELECT * FROM table_name WHERE jsonb_field['key'] = '"value"'; + + + jsonb assignment via subscripting handles a few edge cases + differently from jsonb_set. When a source jsonb + is NULL, assignment via subscripting will proceed as if + it was an empty JSON object: + + +-- Where jsonb_field was NULL, it is now {"a": 1} +UPDATE table_name SET jsonb_field['a'] = '1'; + +-- Where jsonb_field was NULL, it is now [1] +UPDATE table_name SET jsonb_field[0] = '1'; + + + + + Transforms diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile index 82732146d3..279ff15ade 100644 --- a/src/backend/utils/adt/Makefile +++ b/src/backend/utils/adt/Makefile @@ -50,6 +50,7 @@ OBJS = \ jsonb_op.o \ jsonb_util.o \ jsonfuncs.o \ + jsonbsubs.o \ jsonpath.o \ jsonpath_exec.o \ jsonpath_gram.o \ diff --git a/src/backend/utils/adt/jsonb_util.c b/src/backend/utils/adt/jsonb_util.c index 4eeffa1424..41a1c1f9bb 100644 --- a/src/backend/utils/adt/jsonb_util.c +++ b/src/backend/uti
Getting column names/types from select query?
Hi all, I am interested in figuring out how to get the names and types of the columns from an arbitrary query. Essentially, I want to be able to take a query like: CREATE TABLE foo( bar bigserial, baz varchar(256) ); SELECT * FROM foo WHERE bar = 42; and figure out programmatically that the select will return a column "bar" of type bigserial, and a column "baz" of type varchar(256). I would like this to work for more complex queries as well (joins, CTEs, etc). I've found https://wiki.postgresql.org/wiki/Query_Parsing, which talks about related ways to hook into postgres, but that seems to only talk about the parse tree — a lot more detail and processing seems to be required in order to figure out the output types. It seems like there should be somewhere I can hook into in postgres that will get me this information, but I have no familiarity with the codebase, so I don't know the best way to get this. How would you recommend that I approach this? I'm comfortable patching postgres if needed, although if there's a solution that doesn't require that, I'd prefer that. Thanks, :w
Re: [HACKERS] [PATCH] Generic type subscripting
On Wed Jan 20, 2021 at 11:22 AM EST, Dmitry Dolgov wrote: > > On Tue Jan 19, 2021 at 1:42 PM EST, Pavel Stehule wrote: > > > > I found minor issues. > > > > Doc - missing tag > > > > and three whitespaces issues > > > > see attached patch > > Thanks, I need to remember to not skipp doc building for testing process > even for such small changes. Hope now I didn't forget anything. > > > On Wed, Jan 20, 2021 at 09:58:43AM -0500, Dian M Fay wrote: > > > Here's a full editing pass on the documentation, with v45 and Pavel's > > doc-whitespaces-fix.patch applied. I also corrected a typo in one of the > > added hints. > > Great! I've applied almost all of it, except: > > + A jsonb value will accept assignments to nonexistent > subscript > + paths as long as the nonexistent elements being traversed are all > arrays. > > Maybe I've misunderstood the intention, but there is no requirement > about arrays for creating such an empty path. I've formulated it as: > > + A jsonb value will accept assignments to nonexistent > subscript > + paths as long as the last existing path key is an object or an array. My intention there was to highlight the difference between: * SET obj['a']['b']['c'] = '"newvalue"' * SET arr[0][0][3] = '"newvalue"' obj has to conform to {"a": {"b": {...}}} in order to receive the assignment of the nested c. If it doesn't, that's the error case we discussed earlier. But arr can be null, [], and so on, and any missing structure [[[null, null, null, "newvalue"]]] will be created. Take 2: A jsonb value will accept assignments to nonexistent subscript paths as long as object key subscripts can be traversed as described above. The final subscript is not traversed and, if it describes a missing object key, will be created. Nested arrays will always be created and NULL-padded according to the path until the value can be placed appropriately.
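Putting the two cases side by side as statements may help (a sketch only; the table and column names are made up):

-- Assume a table t(obj jsonb, arr jsonb).

-- Object case: every key traversed before the last must already lead to an
-- object, i.e. obj must conform to {"a": {"b": {...}}}, otherwise an error
-- is raised.
UPDATE t SET obj['a']['b']['c'] = '"newvalue"';

-- Array case: missing structure is created and NULL-padded, so a NULL or
-- empty arr ends up as [[[null, null, null, "newvalue"]]].
UPDATE t SET arr[0][0][3] = '"newvalue"';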
Re: Getting column names/types from select query?
"Wesley Aptekar-Cassels" writes: > I am interested in figuring out how to get the names and types of the > columns from an arbitrary query. Where do you need this information? Usually the easiest way is to prepare (plan) the query and then extract metadata, for instance PQprepare and PQdescribePrepared if using libpq. regards, tom lane
Re: Odd, intermittent failure in contrib/pageinspect
Michael Paquier writes: > On Tue, Jan 19, 2021 at 05:03:49PM -0500, Tom Lane wrote: >> In short I propose the attached patch, which also gets rid of >> that duplicate query. > Agreed, +1. Pushed, thanks for looking at it. regards, tom lane
strange error reporting
I just made the mistake of trying to run pgbench without first running createdb and got this: pgbench: error: connection to database "" failed: could not connect to socket "/tmp/.s.PGSQL.5432": FATAL: database "rhaas" does not exist This looks pretty bogus because (1) I was not attempting to connect to a database whose name is the empty string and (2) saying that it couldn't connect to the socket is wrong, else it would not also be showing a server message. I haven't investigated why this is happening; apologies if this is a known issue. -- Robert Haas EDB: http://www.enterprisedb.com
Jsonpath ** vs lax mode
Hi! We have a bug report which says that jsonpath ** operator behaves strangely in the lax mode [1]. Naturally, the result of this query looks counter-intuitive. # select jsonb_path_query_array('[{"a": 1, "b": [{"a": 2}]}]', 'lax $.**.a'); jsonb_path_query_array [1, 1, 2, 2] (1 row) But actually, everything works as designed. ** operator reports both objects and wrapping arrays, while object key accessor automatically unwraps arrays. # select x, jsonb_path_query_array(x, '$.a') from jsonb_path_query('[{"a": 1, "b": [{"a": 2}]}]', 'lax $.**') x; x | jsonb_path_query_array -+ [{"a": 1, "b": [{"a": 2}]}] | [1] {"a": 1, "b": [{"a": 2}]} | [1] 1 | [] [{"a": 2}] | [2] {"a": 2}| [2] 2 | [] (6 rows) At first sight, we may just say that lax mode just sucks and counter-intuitive results are expected. But at the second sight, the lax mode is used by default and current behavior may look too surprising. My proposal is to make everything after the ** operator use strict mode (patch attached). I think this shouldn't be backpatched, just applied to the v14. Other suggestions? Links 1. https://www.postgresql.org/message-id/16828-2b0229babfad2d8c%40postgresql.org -- Regards, Alexander Korotkov jsonpath_double_star_strict_mode.patch Description: Binary data
Re: strange error reporting
Robert Haas writes: > I just made the mistake of trying to run pgbench without first running > createdb and got this: > pgbench: error: connection to database "" failed: could not connect to > socket "/tmp/.s.PGSQL.5432": FATAL: database "rhaas" does not exist > This looks pretty bogus because (1) I was not attempting to connect to > a database whose name is the empty string and (2) saying that it > couldn't connect to the socket is wrong, else it would not also be > showing a server message. I'm not sure about the empty DB name in the first part (presumably that's from pgbench, so what was your pgbench command exactly?). But the 'could not connect to socket' part is a consequence of my recent fiddling with libpq's connection failure reporting, see 52a10224e. We could discuss exactly how that ought to be spelled, but the idea is to consistently identify the host that we were trying to connect to. If you have a multi-host connection string, it's conceivable that "rhaas" exists on some of those hosts and not others, so I do not think the info is irrelevant. Just looking at this, I wonder if we ought to drop pgbench's contribution to the message entirely; it seems like libpq's message is now fairly freestanding. regards, tom lane
Re: Phrase search vs. multi-lexeme tokens
On Thu, Jan 7, 2021 at 6:36 AM Alexander Korotkov wrote: > > > I read your patch over quickly and it seems like a reasonable > > approach (but sadly underdocumented). Can we extend the idea > > to fix the to_tsquery case? > > Sure, I'll provide a revised patch. The next version of the patch is attached. Now, it just makes to_tsquery() and websearch_to_tsquery() use phrase operator to connect multiple lexemes of the same tsquery token. I leave plainto_tsquery() aside because it considers the whole argument as a single token. Changing it would make it an equivalent of phraseto_tsquery(). -- Regards, Alexander Korotkov tsquery_phrase_fix.path Description: Binary data
Re: strange error reporting
On Wed, Jan 20, 2021 at 12:19 PM Tom Lane wrote: > Robert Haas writes: > > I just made the mistake of trying to run pgbench without first running > > createdb and got this: > > > pgbench: error: connection to database "" failed: could not connect to > > socket "/tmp/.s.PGSQL.5432": FATAL: database "rhaas" does not exist > > > This looks pretty bogus because (1) I was not attempting to connect to > > a database whose name is the empty string and (2) saying that it > > couldn't connect to the socket is wrong, else it would not also be > > showing a server message. > > I'm not sure about the empty DB name in the first part (presumably > that's from pgbench, so what was your pgbench command exactly?). I think it was just 'pgbench -i 40'. For sure, I didn't specify a database name. > But the 'could not connect to socket' part is a consequence of my > recent fiddling with libpq's connection failure reporting, see > 52a10224e. We could discuss exactly how that ought to be spelled, > but the idea is to consistently identify the host that we were trying > to connect to. If you have a multi-host connection string, it's > conceivable that "rhaas" exists on some of those hosts and not others, > so I do not think the info is irrelevant. I'm not saying that which socket I used is totally irrelevant although in most cases it's going to be a lot of detail. I'm just saying that, at least for me, when you say you can't connect to a socket, I at least think about the return value of connect(2), which was clearly 0 here. > Just looking at this, I wonder if we ought to drop pgbench's > contribution to the message entirely; it seems like libpq's > message is now fairly freestanding. Maybe it would be better if it said: connection to database at socket "/tmp/.s.PGSQL.5432" failed: FATAL: database "rhaas" does not exist -- Robert Haas EDB: http://www.enterprisedb.com
Re: Getting column names/types from select query?
> Where do you need this information? I'm writing some code that takes a given query, and generates type-safe bindings for it, so people can write SQL queries and get structs (or vectors of structs) out the other end. So I'm pretty flexible about where I get it, given that it'll be part of my build/codegen process. I hadn't seen libpq yet, I'll look into that — thanks!
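For what it's worth, Tom's PQprepare/PQdescribePrepared suggestion would look roughly like this in libpq (a minimal sketch with no error handling; the connection string and query are placeholders):

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	PGconn	   *conn = PQconnectdb("dbname=postgres");
	PGresult   *res;

	/* Plan the query without executing it... */
	res = PQprepare(conn, "stmt", "SELECT * FROM foo WHERE bar = 42", 0, NULL);
	PQclear(res);

	/* ...then ask the server to describe the result columns. */
	res = PQdescribePrepared(conn, "stmt");
	for (int i = 0; i < PQnfields(res); i++)
		printf("column %s: type oid %u, typmod %d\n",
			   PQfname(res, i), PQftype(res, i), PQfmod(res, i));

	PQclear(res);
	PQfinish(conn);
	return 0;
}

The type OID and typmod pair can then be turned into a familiar name such as character varying(256) with the SQL function format_type().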
Re: strange error reporting
Robert Haas writes: > On Wed, Jan 20, 2021 at 12:19 PM Tom Lane wrote: >> But the 'could not connect to socket' part is a consequence of my >> recent fiddling with libpq's connection failure reporting, see >> 52a10224e. We could discuss exactly how that ought to be spelled, >> but the idea is to consistently identify the host that we were trying >> to connect to. If you have a multi-host connection string, it's >> conceivable that "rhaas" exists on some of those hosts and not others, >> so I do not think the info is irrelevant. > I'm not saying that which socket I used is totally irrelevant although > in most cases it's going to be a lot of detail. I'm just saying that, > at least for me, when you say you can't connect to a socket, I at > least think about the return value of connect(2), which was clearly 0 > here. Fair. One possibility, which'd take a few more cycles in libpq but likely not anything significant, is to replace "could not connect to ..." with "while connecting to ..." once we're past the connect() per se. > Maybe it would be better if it said: > connection to database at socket "/tmp/.s.PGSQL.5432" failed: FATAL: > database "rhaas" does not exist I'd be inclined to spell it "connection to server at ... failed", but that sort of wording is surely also possible. regards, tom lane
Re: strange error reporting
On Wed, Jan 20, 2021 at 12:47 PM Tom Lane wrote: > Fair. One possibility, which'd take a few more cycles in libpq but > likely not anything significant, is to replace "could not connect to ..." > with "while connecting to ..." once we're past the connect() per se. Yeah. I think this is kind of a client-side version of errcontext(), except we don't really have that context formally, so we're trying to figure out how to fake it in specific cases. > > Maybe it would be better if it said: > > > connection to database at socket "/tmp/.s.PGSQL.5432" failed: FATAL: > > database "rhaas" does not exist > > I'd be inclined to spell it "connection to server at ... failed", > but that sort of wording is surely also possible. "connection to server" rather than "connection to database" works for me; in fact, I think I like it slightly better. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
On 2021-01-20 18:54, Alvaro Herrera wrote: On 2021-Jan-20, Alvaro Herrera wrote: On 2021-Jan-20, Michael Paquier wrote: > +/* > + * This is mostly duplicating ATExecSetTableSpaceNoStorage, > + * which should maybe be factored out to a library function. > + */ > Wouldn't it be better to do first the refactoring of 0002 and then > 0001 so as REINDEX can use the new routine, instead of putting that > into a comment? I think merging 0001 and 0002 into a single commit is a reasonable approach. ... except it doesn't make a lot of sense to have set_rel_tablespace in either indexcmds.c or index.c. I think tablecmds.c is a better place for it. (I would have thought catalog/storage.c, but that one's not the right abstraction level it seems.) I did a refactoring of ATExecSetTableSpaceNoStorage() in 0001. The new function SetRelTablespace() is placed in tablecmds.c, and the following 0002 makes use of it. Is that close to what you and Michael suggested? But surely ATExecSetTableSpaceNoStorage should be using this new routine. (I first thought 0002 was doing that, since that commit is calling itself a "refactoring", but now that I look closer, it's not.) Yeah, this 'refactoring' initially referred to the refactoring of what Justin added to one of the previous versions of 0001. It was meant to be merged with 0001 once agreed, but we got distracted by other stuff. I have not yet addressed Michael's concerns regarding reindex of partitions. I am going to look closer at it tomorrow. Regards -- Alexey Kondratov Postgres Professional https://www.postgrespro.com Russian Postgres Company
Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
On 2021-01-20 21:08, Alexey Kondratov wrote: On 2021-01-20 18:54, Alvaro Herrera wrote: On 2021-Jan-20, Alvaro Herrera wrote: On 2021-Jan-20, Michael Paquier wrote: > +/* > + * This is mostly duplicating ATExecSetTableSpaceNoStorage, > + * which should maybe be factored out to a library function. > + */ > Wouldn't it be better to do first the refactoring of 0002 and then > 0001 so as REINDEX can use the new routine, instead of putting that > into a comment? I think merging 0001 and 0002 into a single commit is a reasonable approach. ... except it doesn't make a lot of sense to have set_rel_tablespace in either indexcmds.c or index.c. I think tablecmds.c is a better place for it. (I would have thought catalog/storage.c, but that one's not the right abstraction level it seems.) I did a refactoring of ATExecSetTableSpaceNoStorage() in the 0001. New function SetRelTablesapce() is placed into the tablecmds.c. Following 0002 gets use of it. Is it close to what you and Michael suggested? Ugh, forgot to attach the patches. Here they are. -- AlexeyFrom 2c3876f99bc8ebdd07c532619992e7ec3093e50a Mon Sep 17 00:00:00 2001 From: Alexey Kondratov Date: Mon, 23 Mar 2020 21:10:29 +0300 Subject: [PATCH v2 2/2] Allow REINDEX to change tablespace REINDEX already does full relation rewrite, this patch adds a possibility to specify a new tablespace where new relfilenode will be created. --- doc/src/sgml/ref/reindex.sgml | 22 + src/backend/catalog/index.c | 72 ++- src/backend/commands/indexcmds.c | 68 ++- src/bin/psql/tab-complete.c | 4 +- src/include/catalog/index.h | 2 + src/test/regress/input/tablespace.source | 53 +++ src/test/regress/output/tablespace.source | 102 ++ 7 files changed, 318 insertions(+), 5 deletions(-) diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml index 627b36300c..4f84060c4d 100644 --- a/doc/src/sgml/ref/reindex.sgml +++ b/doc/src/sgml/ref/reindex.sgml @@ -27,6 +27,7 @@ REINDEX [ ( option [, ...] ) ] { IN CONCURRENTLY [ boolean ] VERBOSE [ boolean ] +TABLESPACE new_tablespace @@ -187,6 +188,19 @@ REINDEX [ ( option [, ...] ) ] { IN + +TABLESPACE + + + This specifies that indexes will be rebuilt on a new tablespace. + Cannot be used with "mapped" relations. If SCHEMA, + DATABASE or SYSTEM is specified, then + all unsuitable relations will be skipped and a single WARNING + will be generated. + + + + VERBOSE @@ -210,6 +224,14 @@ REINDEX [ ( option [, ...] ) ] { IN + +new_tablespace + + + The tablespace where indexes will be rebuilt. + + + diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c index b8cd35e995..ed98b17483 100644 --- a/src/backend/catalog/index.c +++ b/src/backend/catalog/index.c @@ -57,6 +57,7 @@ #include "commands/event_trigger.h" #include "commands/progress.h" #include "commands/tablecmds.h" +#include "commands/tablespace.h" #include "commands/trigger.h" #include "executor/executor.h" #include "miscadmin.h" @@ -1394,9 +1395,13 @@ index_update_collation_versions(Oid relid, Oid coll) * Create concurrently an index based on the definition of the one provided by * caller. The index is inserted into catalogs and needs to be built later * on. This is called during concurrent reindex processing. + * + * "tablespaceOid" is the new tablespace to use for this index. If + * InvalidOid, use the tablespace in-use instead. 
*/ Oid -index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId, const char *newName) +index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId, + Oid tablespaceOid, const char *newName) { Relation indexRelation; IndexInfo *oldInfo, @@ -1526,7 +1531,8 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId, const char newInfo, indexColNames, indexRelation->rd_rel->relam, - indexRelation->rd_rel->reltablespace, + OidIsValid(tablespaceOid) ? +tablespaceOid : indexRelation->rd_rel->reltablespace, indexRelation->rd_indcollation, indclass->values, indcoloptions->values, @@ -3591,6 +3597,8 @@ IndexGetRelation(Oid indexId, bool missing_ok) /* * reindex_index - This routine is used to recreate a single index + * + * See comments of reindex_relation() for details about "tablespaceOid". */ void reindex_index(Oid indexId, bool skip_constraint_checks, char persistence, @@ -3603,6 +3611,7 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence, volatile bool skipped_constraint = false; PGRUsage ru0; bool progress = ((params->options & REINDEXOPT_REPORT_PROGRESS) != 0); + bool set_tablespace = OidIsValid(params->tablespaceOid); pg_rusage_init(&ru0);
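For context, the synopsis in the documentation diff above implies usage along these lines once the patch is applied (a sketch only; the table and tablespace names are made up):

-- Rebuild a table's indexes, placing the new index files in another tablespace.
REINDEX (VERBOSE, TABLESPACE fast_ssd) TABLE my_table;

-- The same for a whole schema; per the added documentation, unsuitable
-- (e.g. mapped) relations are skipped with a single WARNING.
REINDEX (TABLESPACE fast_ssd) SCHEMA public;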
Re: Jsonpath ** vs lax mode
On 2021-Jan-20, Alexander Korotkov wrote: > My proposal is to make everything after the ** operator use strict mode > (patch attached). I think this shouldn't be backpatched, just applied to > the v14. Other suggestions? I think changing the mode midway through the operation is strange. What do you think of requiring strict mode whenever ** is used? That is, if ** is used and the mode is lax, an error is thrown. Thanks -- Álvaro Herrera Valdivia, Chile
Re: strange error reporting
On 2021-Jan-20, Robert Haas wrote: > On Wed, Jan 20, 2021 at 12:19 PM Tom Lane wrote: > > Robert Haas writes: > > > I just made the mistake of trying to run pgbench without first running > > > createdb and got this: > > > > > pgbench: error: connection to database "" failed: could not connect to > > > socket "/tmp/.s.PGSQL.5432": FATAL: database "rhaas" does not exist > > > > > This looks pretty bogus because (1) I was not attempting to connect to > > > a database whose name is the empty string [...] > > > > I'm not sure about the empty DB name in the first part (presumably > > that's from pgbench, so what was your pgbench command exactly?). > > I think it was just 'pgbench -i 40'. For sure, I didn't specify a database > name. That's because pgbench reports its dbname input argument, but since you didn't specify anything, PQconnectdbParams() falls back to libpq's default behavior. I think we'd have to use PQdb() instead. -- Álvaro Herrera Valdivia, Chile
Re: strange error reporting
On Wed, Jan 20, 2021 at 1:25 PM Alvaro Herrera wrote: > That's because pgbench reports the input argument dbname, but since you > didn't specify anything, then PQconnectdbParams() uses the libpq > behavior. I think we'd have to use PQdb() instead. I figured it was something like that. I don't know whether the right thing is to use something like PQdb() to get the correct database name, or whether we should go with Tom's suggestion and omit that detail altogether, but I think showing the empty string when the user relied on the default is too confusing. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
On 2021-Jan-20, Alexey Kondratov wrote: > On 2021-01-20 21:08, Alexey Kondratov wrote: > > > > I did a refactoring of ATExecSetTableSpaceNoStorage() in the 0001. New > > function SetRelTablesapce() is placed into the tablecmds.c. Following > > 0002 gets use of it. Is it close to what you and Michael suggested? > > Ugh, forgot to attach the patches. Here they are. Yeah, looks reasonable. > + /* No work if no change in tablespace. */ > + oldTablespaceOid = rd_rel->reltablespace; > + if (tablespaceOid != oldTablespaceOid || > + (tablespaceOid == MyDatabaseTableSpace && > OidIsValid(oldTablespaceOid))) > + { > + /* Update the pg_class row. */ > + rd_rel->reltablespace = (tablespaceOid == MyDatabaseTableSpace) > ? > + InvalidOid : tablespaceOid; > + CatalogTupleUpdate(pg_class, &tuple->t_self, tuple); > + > + changed = true; > + } > + > + if (changed) > + /* Record dependency on tablespace */ > + changeDependencyOnTablespace(RelationRelationId, > + > reloid, rd_rel->reltablespace); Why have a separate "if (changed)" block here instead of merging with the above? -- Álvaro Herrera39°49'30"S 73°17'W
Re: strange error reporting
On 2021-Jan-20, Robert Haas wrote: > On Wed, Jan 20, 2021 at 1:25 PM Alvaro Herrera > wrote: > > That's because pgbench reports the input argument dbname, but since you > > didn't specify anything, then PQconnectdbParams() uses the libpq > > behavior. I think we'd have to use PQdb() instead. > > I figured it was something like that. I don't know whether the right > thing is to use something like PQdb() to get the correct database > name, or whether we should go with Tom's suggestion and omit that > detail altogether, but I think showing the empty string when the user > relied on the default is too confusing. Well, the patch seems small enough, and I don't think it'll be in any way helpful to omit that detail. -- Álvaro Herrera39°49'30"S 73°17'W "Having your biases confirmed independently is how scientific progress is made, and hence made our great society what it is today" (Mary Gardiner) diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c index f7da3e1f62..30fde9e9ec 100644 --- a/src/bin/pgbench/pgbench.c +++ b/src/bin/pgbench/pgbench.c @@ -1226,7 +1226,7 @@ doConnect(void) if (PQstatus(conn) == CONNECTION_BAD) { pg_log_error("connection to database \"%s\" failed: %s", - dbName, PQerrorMessage(conn)); + PQdb(conn), PQerrorMessage(conn)); PQfinish(conn); return NULL; }
Re: Deleting older versions in unique indexes to avoid page splits
On Wed, Jan 20, 2021 at 5:33 AM Amit Kapila wrote: > > Victor independently came up with a benchmark that ran over several > > hours, with cleanup consistently held back by ~5 minutes by a long > > running transaction: > > > > AFAICS, the long-running transaction used in the test is below: > SELECT abalance, pg_sleep(300) FROM pgbench_accounts WHERE mtime > > now() - INTERVAL '15min' ORDER BY aid LIMIT 1; > > This shouldn't generate a transaction id so would it be sufficient to > hold back the clean-up? It will hold back clean-up because it holds open a snapshot. Whether or not the long running transaction has been allocated a true XID (not just a VXID) is irrelevant. Victor's test case is perfectly valid. In general there are significant benefits for cases with long-running transactions, which will be quite apparent if you do something simple like run pgbench (a script with non-HOT updates) while a REPEATABLE READ transaction runs in psql (and has actually acquired a snapshot by running a simple query -- the xact snapshot is acquired lazily). I understand that this will be surprising if you believe that the problem in these cases is strictly that there are too many "recently dead" versions that still need to be stored (i.e. versions that cleanup simply isn't able to remove, given the xmin horizon invariant). But it's now clear that that's not what really happens in most cases with a long running transaction. What we actually see is that local page-level inefficiencies in cleanup were (and perhaps to some degree still are) a much bigger problem than the inherent inability of cleanup to remove even one or two tuples. This is at least true until the bloat problem becomes a full-blown disaster (because cleanup really is inherently restricted by the global xmin horizon, and therefore hopelessly behind). In reality there are seldom that many individual logical rows that get updated more than a few times in (say) any given one hour period. This is true even with skewed updates -- the skew is almost never visible at the level of an individual leaf page. The behavior we see now that we have bottom-up index deletion is much closer to the true optimal behavior for the general approach Postgres takes to cleanup of garbage tuples, since it tends to make the number of versions for any given logical row as low as possible (within the confines of the global xmin horizon limitations for cleanup). Of course it would also be helpful to have something like zheap -- some mechanism that can store "recently dead" versions some place where they at least don't bloat up the main relation structures. But that's only one part of the big picture for Postgres MVCC garbage. We should not store garbage tuples (i.e. those that are dead rather than just recently dead) *anywhere*. > First, it is not clear to me if that has properly simulated the > long-running test but even if it is what I intend to say was to have > an open long-running transaction possibly for the entire duration of > the test? If we do that, we will come to know if there is any overhead > and if so how much? I am confident that you are completely wrong about regressing cases with long-running transactions, except perhaps in some very narrow sense that is of little practical relevance. 
Victor's test case did result in a small loss of throughput, for example, but that's a small price to pay to not have your indexes explode (note also that most of the indexes weren't even used for reads, so in the real world it would probably also improve throughput even in the short term). FWIW the small drop in TPS probably had little to do with the cost of visiting the heap for visibility information. Workloads must be made to "live within their means". You can call that a regression if you like, but I think that almost anybody else would take issue with that characterization. Slowing down non-HOT updaters in these extreme cases may actually be a good thing, even when bottom-up deletion finally becomes ineffective. It can be thought of as backpressure. I am not worried about slowing down something that is already hopelessly inefficient and unsustainable. I'd go even further than that, in fact -- I now wonder if we should *deliberately* slow them down some more! > Test with 2 un-modified indexes > === > create table t1(c1 int, c2 int, c3 int, c4 char(10)); > create index idx_t1 on t1(c1); > create index idx_t2 on t1(c2); > create index idx_t3 on t1(c3); > > insert into t1 values(generate_series(1,500), 1, 10, 'aa'); > update t1 set c2 = 2; > postgres=# update t1 set c2 = 2; > UPDATE 500 > Time: 46533.530 ms (00:46.534) > > With HEAD > == > postgres=# update t1 set c2 = 2; > UPDATE 500 > Time: 52529.839 ms (00:52.530) > > I have dropped and recreated the table after each update in the test. Good thing that you remembered to drop and recreate the table, since otherwise bottom-up index deletion would look really good! Besides,
Re: strange error reporting
Alvaro Herrera writes: > On 2021-Jan-20, Robert Haas wrote: >> I figured it was something like that. I don't know whether the right >> thing is to use something like PQdb() to get the correct database >> name, or whether we should go with Tom's suggestion and omit that >> detail altogether, but I think showing the empty string when the user >> relied on the default is too confusing. > Well, the patch seems small enough, and I don't think it'll be in any > way helpful to omit that detail. I'm +1 for applying and back-patching that. I still think we might want to just drop the phrase altogether in HEAD, but we wouldn't do that in the back branches, and the message is surely misleading as-is. regards, tom lane
catalogs.sgml documentation ambiguity
Some catalog tables have references to pg_attribute.attnum. In the documentation, it only says "(references pg_attribute.attnum)" but not which oid column to include in the two-column "foreign key". This would not be a problem if there would only be one reference to pg_class.oid, but some catalog tables have multiple columns that references pg_class.oid. For instance, pg_constraint has two columns (conkey, confkey) referencing pg_attribute, and three columns (conrelid, conindid, confrelid) referencing pg_class. A user might wonder: - Which one of these three columns should be used in combination with the conkey/confkey elements to join pg_attribute? If we would have array foreign key support, I would guess the "foreign keys" should be: FOREIGN KEY (confrelid, EACH ELEMENT OF confkey) REFERENCES pg_catalog.pg_attribute (attrelid, attnum) FOREIGN KEY (conrelid, EACH ELEMENT OF conkey) REFERENCES pg_catalog.pg_attribute (attrelid, attnum) It's of course harder to guess for a machine though, which would need a separate human-produced lookup-table. Could it be meaningful to clarify these multi-key relations in the documentation? As a bonus, machines could then parse the information out of catalogs.sgml. Here is a list of catalogs referencing pg_attribute and with multiple pg_class references: table_name | array_agg --+--- pg_constraint| {confrelid,conindid,conrelid} pg_index | {indexrelid,indrelid} pg_partitioned_table | {partdefid,partrelid} pg_trigger | {tgconstrindid,tgconstrrelid,tgrelid} (4 rows) Produced using query: SELECT b.table_name, array_agg(DISTINCT b.column_name) FROM pit.oid_joins AS a JOIN pit.oid_joins AS b ON b.table_name = a.table_name WHERE a.ref_table_name = 'pg_attribute' AND b.ref_table_name = 'pg_class' GROUP BY b.table_name HAVING cardinality(array_agg(DISTINCT b.column_name)) > 1 ;
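As a concrete illustration, the implicit two-column join for pg_constraint.conkey would look like this (the relation in the WHERE clause is a made-up example):

SELECT c.conname, a.attname
FROM pg_constraint AS c
JOIN pg_attribute AS a
  ON a.attrelid = c.conrelid
 AND a.attnum = ANY (c.conkey)
WHERE c.conrelid = 'some_table'::regclass;

The confkey elements would join the same way, but through confrelid rather than conrelid.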
Re: Allow matching whole DN from a client certificate
On Mon, 2021-01-18 at 11:23 +0100, Daniel Gustafsson wrote: > + /* use commas instead of slashes */ > + X509_NAME_print_ex(bio, x509name, 0, XN_FLAG_SEP_COMMA_PLUS); > I don't have strong opinions on whether we shold use slashes or commas, but I > think it needs to be documented that commas are required since slashes is the > more common way to print a DN. There's also a XN_FLAG_RFC2253 flag, which claims to print in an RFC- compatible escape system. (It includes XN_FLAG_SEP_COMMA_PLUS.) I think NSS uses a similar RFC-based escape scheme, but I haven't checked to see whether it's fully compatible. I think you'll want to be careful to specify the format as much as possible, both to make sure that other backend TLS implementations can actually use the same escaping system and to ensure that user regexes don't suddenly start matching different things at some point in the future. As a cautionary tale, nginx is stuck with two versions of their similar feature (ssl_client_s_dn_legacy vs ssl_client_s_dn) after their switch to an RFC-compatible system [1]. Even when using RFC 2253 (now 4514) escaping, there are things left to the implementation, such as the order of AVAs inside multivalued RDNs. The RFC warns not to consider these representations to be canonical forms, so it's possible that switching (or upgrading) the TLS library in use could force users to change their regexes at some point in the future. I'm going to test this patch with some UTF-8 DNs later today; I'll share my findings. I'm also going to read up on [2] a bit more. --Jacob [1] https://serverfault.com/questions/829754/why-did-the-format-of-nginx-ssl-client-i-dn-suddenly-change [2] https://frasertweedale.github.io/blog-redhat/posts/2019-05-28-a-dn-is-not-a-string.html
Re: [HACKERS] [PATCH] Generic type subscripting
> On Wed, Jan 20, 2021 at 11:34:16AM -0500, Dian M Fay wrote: > > Thanks, I need to remember to not skipp doc building for testing process > > even for such small changes. Hope now I didn't forget anything. > > > > > On Wed, Jan 20, 2021 at 09:58:43AM -0500, Dian M Fay wrote: > > > > > Here's a full editing pass on the documentation, with v45 and Pavel's > > > doc-whitespaces-fix.patch applied. I also corrected a typo in one of the > > > added hints. > > > > Great! I've applied almost all of it, except: > > > > + A jsonb value will accept assignments to nonexistent > > subscript > > + paths as long as the nonexistent elements being traversed are all > > arrays. > > > > Maybe I've misunderstood the intention, but there is no requirement > > about arrays for creating such an empty path. I've formulated it as: > > > > + A jsonb value will accept assignments to nonexistent > > subscript > > + paths as long as the last existing path key is an object or an array. > > My intention there was to highlight the difference between: > > * SET obj['a']['b']['c'] = '"newvalue"' > * SET arr[0][0][3] = '"newvalue"' > > obj has to conform to {"a": {"b": {...}}} in order to receive the > assignment of the nested c. If it doesn't, that's the error case we > discussed earlier. But arr can be null, [], and so on, and any missing > structure [[[null, null, null, "newvalue"]]] will be created. If arr is 'null', or any other scalar value, such subscripting will work only one level deep because they represented internally as an array of one element. If arr is '[]' the path will comply by definition. So it's essentially the same as for objects with no particular difference. If such a quirk about scalars being treated like arrays is bothering, we could also bend it in this case as well (see the attached version). >From a4037c651a0cfd2448f38b6c8c932b5a09136b0a Mon Sep 17 00:00:00 2001 From: Dmitrii Dolgov <9erthali...@gmail.com> Date: Fri, 18 Dec 2020 17:19:51 +0100 Subject: [PATCH v48 1/3] Subscripting for jsonb Subscripting implementation for jsonb. It does not support slices, does not have a limit for number of subscripts and for assignment expects a replace value to be of jsonb type. There is also one functional difference in assignment via subscripting from jsonb_set, when an original jsonb container is NULL, subscripting replaces it with an empty jsonb and proceed with assignment. For the sake of code reuse, some parts of jsonb functionality were rearranged to allow use the same functions for jsonb_set and assign subscripting operation. The original idea belongs to Oleg Bartunov. Reviewed-by: Tom Lane, Arthur Zakirov, Pavel Stehule, Dian M Fay --- doc/src/sgml/json.sgml | 51 src/backend/utils/adt/Makefile | 1 + src/backend/utils/adt/jsonb_util.c | 76 - src/backend/utils/adt/jsonbsubs.c | 413 src/backend/utils/adt/jsonfuncs.c | 180 ++-- src/include/catalog/pg_proc.dat | 4 + src/include/catalog/pg_type.dat | 3 +- src/include/utils/jsonb.h | 6 +- src/test/regress/expected/jsonb.out | 272 +- src/test/regress/sql/jsonb.sql | 84 +- 10 files changed, 985 insertions(+), 105 deletions(-) create mode 100644 src/backend/utils/adt/jsonbsubs.c diff --git a/doc/src/sgml/json.sgml b/doc/src/sgml/json.sgml index 5b9a5557a4..3ace5e444b 100644 --- a/doc/src/sgml/json.sgml +++ b/doc/src/sgml/json.sgml @@ -602,6 +602,57 @@ SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @> '{"tags": ["qu + + jsonb Subscripting + + The jsonb data type supports array-style subscripting expressions + to extract and modify elements. 
Nested values can be indicated by chaining + subscripting expressions, following the same rules as the path + argument in the jsonb_set function. If a jsonb + value is an array, numeric subscripts start at zero, and negative integers count + backwards from the last element of the array. Slice expressions are not supported. + The result of a subscripting expression is always of the jsonb data type. + + + + An example of subscripting syntax: + + +-- Extract object value by key +SELECT ('{"a": 1}'::jsonb)['a']; + +-- Extract nested object value by key path +SELECT ('{"a": {"b": {"c": 1}}}'::jsonb)['a']['b']['c']; + +-- Extract array element by index +SELECT ('[1, "2", null]'::jsonb)[1]; + +-- Update object value by key. Note the quotes around '1': the assigned +-- value must be of the jsonb type as well +UPDATE table_name SET jsonb_field['key'] = '1'; + +-- Filter records using a WHERE clause with subscripting. Since the result of +-- subscripting is jsonb, the value we compare it against must also be jsonb. +-- The double quotes make "value" also a valid jsonb string. +SELECT * FROM table_name WHERE jsonb_field['key'] = '"value"'; + + + jsonb assignment via subscripting handles a few edge cases + differently from jsonb_set. When a source jsonb + is NULL, assig
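A small sketch of the NULL-source behavior described above, under the patch's semantics as discussed in this thread (the table t is hypothetical and jdoc starts out NULL):

-- Assuming a table t(id int, jdoc jsonb) where jdoc IS NULL for id = 1:
UPDATE t SET jdoc['a'][0] = '"x"' WHERE id = 1;
-- Per the patch, the NULL source is replaced by an empty jsonb and the
-- missing structure along the path is created, giving roughly {"a": ["x"]}.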
Re: Calculation of relids (pull_varnos result) for PlaceHolderVars
I wrote: > ... > 2. pull_varnos() is not passed the planner "root" data structure, > so it can't get at the PlaceHolderInfo list. We can change its > API of course, but that propagates to dozens of places. > ... > The 0001 patch attached goes ahead and makes those API changes. > I think this is perfectly reasonable to do in HEAD, but it most > likely is an unacceptable API/ABI break for the back branches. > ... > A third way is to preserve the existing pull_varnos() API in > the back branches, changing all the internal calls to use a > new function that has the additional "root" parameter. This > seems feasible but I've not attempted to code it yet. Here's a proposed fix that does it like that. The 0001 patch is the same as before, and then 0002 is a delta to be applied only in the back branches. What I did there was install a layer of macros in the relevant .c files that cause calls that look like the HEAD versions to be redirected to the "xxx_new" functions. The idea is to keep the actual code in sync with HEAD, for readability and to minimize back-patching pain. It could be argued that this is too cute and the internal references should just go to the "new" functions in the back branches. I did not bother to preserve ABI for these two functions: indexcol_is_bool_constant_for_query() build_implied_join_equality() because I judged it highly unlikely that any extensions are calling them. If anybody thinks differently, we could hack those in the same way. regards, tom lane diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c index 2f2d4d171c..48c2f23ae0 100644 --- a/contrib/postgres_fdw/postgres_fdw.c +++ b/contrib/postgres_fdw/postgres_fdw.c @@ -5709,7 +5709,8 @@ foreign_grouping_ok(PlannerInfo *root, RelOptInfo *grouped_rel, * RestrictInfos, so we must make our own. */ Assert(!IsA(expr, RestrictInfo)); - rinfo = make_restrictinfo(expr, + rinfo = make_restrictinfo(root, + expr, true, false, false, diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c index 4f5b870d1b..d263ecf082 100644 --- a/src/backend/optimizer/path/clausesel.c +++ b/src/backend/optimizer/path/clausesel.c @@ -227,7 +227,7 @@ clauselist_selectivity_ext(PlannerInfo *root, } else { -ok = (NumRelids(clause) == 1) && +ok = (NumRelids(root, clause) == 1) && (is_pseudo_constant_clause(lsecond(expr->args)) || (varonleft = false, is_pseudo_constant_clause(linitial(expr->args; @@ -609,7 +609,7 @@ bms_is_subset_singleton(const Bitmapset *s, int x) * restriction or join estimator. Subroutine for clause_selectivity(). */ static inline bool -treat_as_join_clause(Node *clause, RestrictInfo *rinfo, +treat_as_join_clause(PlannerInfo *root, Node *clause, RestrictInfo *rinfo, int varRelid, SpecialJoinInfo *sjinfo) { if (varRelid != 0) @@ -643,7 +643,7 @@ treat_as_join_clause(Node *clause, RestrictInfo *rinfo, if (rinfo) return (bms_membership(rinfo->clause_relids) == BMS_MULTIPLE); else - return (NumRelids(clause) > 1); + return (NumRelids(root, clause) > 1); } } @@ -860,7 +860,7 @@ clause_selectivity_ext(PlannerInfo *root, OpExpr *opclause = (OpExpr *) clause; Oid opno = opclause->opno; - if (treat_as_join_clause(clause, rinfo, varRelid, sjinfo)) + if (treat_as_join_clause(root, clause, rinfo, varRelid, sjinfo)) { /* Estimate selectivity for a join clause. 
*/ s1 = join_selectivity(root, opno, @@ -896,7 +896,7 @@ clause_selectivity_ext(PlannerInfo *root, funcclause->funcid, funcclause->args, funcclause->inputcollid, - treat_as_join_clause(clause, rinfo, + treat_as_join_clause(root, clause, rinfo, varRelid, sjinfo), varRelid, jointype, @@ -907,7 +907,7 @@ clause_selectivity_ext(PlannerInfo *root, /* Use node specific selectivity calculation function */ s1 = scalararraysel(root, (ScalarArrayOpExpr *) clause, - treat_as_join_clause(clause, rinfo, + treat_as_join_clause(root, clause, rinfo, varRelid, sjinfo), varRelid, jointype, diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c index 380336518f..aab06c7d21 100644 --- a/src/backend/optimizer/path/costsize.c +++ b/src/backend/optimizer/path/costsize.c @@ -1858,7 +1858,7 @@ cost_incremental_sort(Path *path, * Check if the expression contains Var with "varno 0" so that we * don't call estimate_num_groups in that case. */ - if (bms_is_member(0, pull_varnos((Node *) member->em_expr))) + if (bms_is_member(0, pull_varnos(root, (Node *) member->em_expr))) { unknown_varno = true; break; diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c index ab6eaaead1..0188c1e9a1 100644 --- a/s
Re: strange error reporting
On Wed, Jan 20, 2021 at 1:54 PM Tom Lane wrote: > Alvaro Herrera writes: > > On 2021-Jan-20, Robert Haas wrote: > >> I figured it was something like that. I don't know whether the right > >> thing is to use something like PQdb() to get the correct database > >> name, or whether we should go with Tom's suggestion and omit that > >> detail altogether, but I think showing the empty string when the user > >> relied on the default is too confusing. > > > Well, the patch seems small enough, and I don't think it'll be in any > > way helpful to omit that detail. > > I'm +1 for applying and back-patching that. I still think we might > want to just drop the phrase altogether in HEAD, but we wouldn't do > that in the back branches, and the message is surely misleading as-is. Sure, that makes sense. -- Robert Haas EDB: http://www.enterprisedb.com
Re: poc - possibility to write window function in PL languages
Pavel Stehule writes: > The second question is work with partition context value. This should be > only one value, and of only one but of any type per function. In this case > we cannot use GET statements. I had an idea of enhancing declaration. Some > like > DECLARE > pcx PARTITION CONTEXT (int); -- read partition context > BEGIN > pcx := 10; -- set partition context > What do you think about it? Uh, what? I don't understand what this "partition context" is. regards, tom lane
Re: poc - possibility to write window function in PL languages
st 20. 1. 2021 v 21:07 odesílatel Tom Lane napsal: > Pavel Stehule writes: > > The second question is work with partition context value. This should be > > only one value, and of only one but of any type per function. In this > case > > we cannot use GET statements. I had an idea of enhancing declaration. > Some > > like > > > DECLARE > > pcx PARTITION CONTEXT (int); -- read partition context > > BEGIN > > pcx := 10; -- set partition context > > > What do you think about it? > > Uh, what? I don't understand what this "partition context" is. > It was my name for an access to window partition local memory - WinGetPartitionLocalMemory We need some interface for this cache Regards Pavel > > regards, tom lane >
Re: poc - possibility to write window function in PL languages
Pavel Stehule writes: > st 20. 1. 2021 v 21:07 odesílatel Tom Lane napsal: >> Uh, what? I don't understand what this "partition context" is. > It was my name for an access to window partition local memory - > WinGetPartitionLocalMemory Ah. > We need some interface for this cache I'm not convinced we need to expose that, or that it'd be very satisfactory to plpgsql users if we did. The fact that it's fixed-size and initializes to zeroes are both things that are okay for C programmers but might be awkward to deal with in plpgsql code. At the very least it would greatly constrain what data types you could usefully store. So I'd be inclined to leave that out, at least for the first version. regards, tom lane
Re: poc - possibility to write window function in PL languages
st 20. 1. 2021 v 21:32 odesílatel Tom Lane napsal:
> Pavel Stehule writes:
> > st 20. 1. 2021 v 21:07 odesílatel Tom Lane napsal:
> >> Uh, what? I don't understand what this "partition context" is.
>
> > It was my name for an access to window partition local memory -
> > WinGetPartitionLocalMemory
>
> Ah.
>
> > We need some interface for this cache
>
> I'm not convinced we need to expose that, or that it'd be very
> satisfactory to plpgsql users if we did. The fact that it's fixed-size
> and initializes to zeroes are both things that are okay for C programmers
> but might be awkward to deal with in plpgsql code. At the very least it
> would greatly constrain what data types you could usefully store.
>
> So I'd be inclined to leave that out, at least for the first version.

I think this functionality is relatively important. If somebody tries to implement their own window function, they usually start with some variation of row_number(). We could support only fixed-length types to begin with.

Regards

Pavel

> regards, tom lane
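To make that concrete, here is a rough sketch of a row_number()-style function written with the syntax proposed in this thread (the PARTITION CONTEXT declaration); none of this is committed syntax, so the details are only an assumption of how it might look:

CREATE FUNCTION pl_row_number() RETURNS bigint
LANGUAGE plpgsql WINDOW
AS $$
DECLARE
    cnt PARTITION CONTEXT (bigint);  -- per-partition memory, zero-initialized
BEGIN
    cnt := cnt + 1;  -- a fixed-length type, per the restriction mentioned above
    RETURN cnt;
END;
$$;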
Re: Deleting older versions in unique indexes to avoid page splits
On Wed, Jan 20, 2021 at 10:53 AM Peter Geoghegan wrote:
> This patch is unusual in that you really need to think about emergent
> behaviors to understand it. That is certainly a difficult thing to do,
> and it's understandable that even an expert might not grok it without
> considering it carefully.

I happened to stumble upon a recent blog post that seems like a light, approachable introduction to some of the key concepts here:

https://jessitron.com/2021/01/18/when-costs-are-nonlinear-keep-it-small/

Bottom-up index deletion enhances a complex system whose maintenance costs are *dramatically* nonlinear, at least in many important cases. If you apply linear thinking to such a system then you'll probably end up with a bad design. The system as a whole is made efficient by making sure that we're lazy when that makes sense, while also making sure that we're eager when that makes sense. So it almost *has* to be structured as a bottom-up, reactive mechanism -- no other approach is able to ramp up or down in exactly the right way. Talking about small cost differences (things that can easily be empirically measured, perhaps with a microbenchmark) is almost irrelevant to the big picture. It's even irrelevant to the "medium picture".

What's more, it's basically a mistake to think of heap page accesses that don't yield any deletable index tuples as wasted effort (even though that's how I describe them myself!). Here's why: we have to access the heap page to learn that it has nothing for us in the first place! If we somehow knew ahead of time that some useless-to-us heap block was useless, then the whole system wouldn't be bottom-up (by definition). In other words, failing to get any index tuple deletes from an entire heap page *is itself a form of feedback* at the local level -- it guides the entire system's behavior over time. Why should we expect to get that information at zero cost?

This is somehow both simple and complicated, which creates huge potential for miscommunications. I tried to describe this in various ways at various points. Perhaps I could have done better with that.

-- 
Peter Geoghegan
Re: [PATCH 1/1] Fix detection of pwritev support for OSX.
On Tue, Jan 19, 2021 at 6:37 PM Tom Lane wrote: > > James Hilliard writes: > > Actually, this looks path looks wrong in general, the value for > > "xcrun --sdk macosx --show-sdk-path" should take precedence over > > "xcrun --show-sdk-path" as the latter may be used for IOS potentially. > > What is "potentially"? Well I'm not sure the SDK parameter always defaults to macos although I guess it probably does as I couldn't figure out a way to change it: $ xcodebuild -showsdks iOS SDKs: iOS 14.3 -sdk iphoneos14.3 iOS Simulator SDKs: Simulator - iOS 14.3 -sdk iphonesimulator14.3 macOS SDKs: DriverKit 20.2-sdk driverkit.macosx20.2 macOS 11.1-sdk macosx11.1 tvOS SDKs: tvOS 14.3 -sdk appletvos14.3 tvOS Simulator SDKs: Simulator - tvOS 14.3 -sdk appletvsimulator14.3 watchOS SDKs: watchOS 7.2 -sdk watchos7.2 watchOS Simulator SDKs: Simulator - watchOS 7.2 -sdk watchsimulator7.2 > I've found no direct means to control the > SDK path at all, but so far it appears that "xcrun --show-sdk-path" > agrees with the compiler's default -isysroot path as seen in the > compiler's -v output. I suspect that this isn't coincidental, > but reflects xcrun actually being used in the compiler launch > process. If it were to flip over to using a IOS SDK, that would > mean that bare "cc" would generate nonfunctional executables, > which just about any onlooker would agree is broken. So there's some more weirdness involved here, whether or not you have the command line install seems to affect the output of the "xcrun --show-sdk-path" command, but not the "xcrun --sdk macosx --show-sdk-path" command. This is what I get without the command line tools: $ xcrun --show-sdk-path /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk $ xcrun --sdk macosx --show-sdk-path /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.1.sdk this last one is just a symlink to the other path. With command line tools this is different however: $ xcrun --show-sdk-path /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk $ xcrun --sdk macosx --show-sdk-path /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.1.sdk Note that the /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk is different from the normal SDK and doesn't seem to be able to generate binaries that target a 11.0 deployment target on my 10.15 system, however I am unsure if this behavior can be relied upon. So in terms of what works best, the newer normal SDK has the most flexibility as it can produce both 10.15 target binaries and 11.0 target binaries depending on the MACOSX_DEPLOYMENT_TARGET while the command line tools SDK can only produce 10.15 target binaries it would appear. Note that with my patch the binaries will always be compatible with the host system by default, even if the SDK is capable of producing binaries that are incompatible so building postgres works with and without the command line tools SDK. So I think "xcrun --sdk macosx --show-sdk-path" is probably preferable but either should work as long as we can properly detect deployment target symbol availability, regardless this SDK sysroot selection issue is effectively an entirely different issue from the feature detection not properly respecting the configured deployment target. > > I'm really not excited about trying to make the build work with > a non-native SDK as you are proposing. 
I think that's just going > to lead to a continuing stream of problems, because of Apple's > opinions about how cross-version compatibility should work. Well the minimum required target version is pretty much strictly based on MACOSX_DEPLOYMENT_TARGET so our feature detection still needs to use that, otherwise cross target compilation for newer or older targets will not work correctly. >From my understanding the reason AC_REPLACE_FUNCS does not throw an error for deployment target incompatible functions is that it only checks if the function exists and not if it is actually useable, this is why I had to add an explicit AC_LANG_PROGRAM compile test to properly trigger a compile failure if the function is not usable for a particular deployment target version, merely checking if the function exists in the header is not sufficient. > It also seems like unnecessary complexity, because there is always > (AFAICS) a native SDK version available. We just need to find it. Best I can tell this is not true, it is some(most?) of the time but it's not something we can rely upon as systems may only contain a newer SDK, but this newer SDK is still capable of producing binaries that can run on the build host system so this shouldn't be an issue as long as we can do target feature detection properly. > > regards, tom lane
Re: list of extended statistics on psql
On 1/20/21 7:41 AM, Tatsuro Yamada wrote:

Hi Tomas,

On 2021/01/20 11:35, Tatsuro Yamada wrote:

Apologies for all the extra work - I haven't realized this flaw when pushing for showing more stuff :-(

Don't worry about it. We didn't notice the problem even when viewed by multiple people on -hackers. Let's keep moving forward. :-D

I'll send a patch including a regression test on the next patch.

I created patches and my test results on PG10, 11, 12, and 14 are fine.

0001:
- Fix query to use pg_statistic_ext only
- Replace statuses "required" and "built" with "defined"
- Remove the size columns
- Fix document
- Add schema name as a filter condition on the query

0002:
- Fix all results of \dX
- Add new testcase by non-superuser

Please find attached files. :-D

Thanks, I've pushed this. I had to tweak the regression tests a bit, for two reasons:

1) to change user in regression tests, don't use \connect, but SET ROLE and RESET ROLE

2) roles in regression tests should use names with regress_ prefix

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
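As a quick usage sketch of the committed feature (object names here are made up): extended statistics are created as usual and then listed with the new psql command.

-- Define extended statistics on a pair of correlated columns.
CREATE TABLE t1 (a int, b int);
CREATE STATISTICS stts_1 (ndistinct, dependencies) ON a, b FROM t1;
ANALYZE t1;
-- In psql, \dX now lists the extended statistics objects, showing their
-- definition and which statistics kinds (ndistinct, dependencies, MCV)
-- are defined.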
Re: pg_upgrade fails with non-standard ACL
On 03.01.2021 14:29, Noah Misch wrote: On Thu, Jun 11, 2020 at 07:58:43PM +0300, Anastasia Lubennikova wrote: On 08.06.2020 19:31, Alvaro Herrera wrote: I'm thinking what's a good way to have a test that's committable. Maybe it would work to add a TAP test to pg_upgrade that runs initdb, does a few GRANTS as per your attachment, then runs pg_upgrade? Currently the pg_upgrade tests are not TAP ... we'd have to revive https://postgr.es/m/20180126080026.gi17...@paquier.xyz (Some progress was already made on the buildfarm front to permit this) I would be glad to add some test, but it seems to me that the infrastructure changes for cross-version pg_upgrade test is much more complicated task than this modest bugfix. Agreed. Thank you for the review. New version of the patch is attached, though I haven't tested it properly yet. I am planning to do in a couple of days. --- a/src/bin/pg_upgrade/check.c +++ b/src/bin/pg_upgrade/check.c +static void +check_for_changed_signatures(void) +{ + snprintf(output_path, sizeof(output_path), "revoke_objects.sql"); If an upgrade deleted function pg_last_wal_replay_lsn, upgrading a database in which "REVOKE ALL ON FUNCTION pg_last_wal_replay_lsn FROM public" happened requires a GRANT. Can you use a file name that will still make sense if someone enhances pg_upgrade to generate such GRANT statements? I changed the name to 'fix_system_objects_ACL.sql'. Does it look good to you? + if (script == NULL && (script = fopen_priv(output_path, "w")) == NULL) + pg_fatal("could not open file \"%s\": %s\n", +output_path, strerror(errno)); Use %m instead of passing sterror(errno) to %s. Done. + } + + /* Handle columns separately */ + if (strstr(aclinfo->obj_type, "column") != NULL) + { + char *pdot = last_dot_location(aclinfo->obj_ident); I'd write strrchr(aclinfo->obj_ident, '.') instead of introducing last_dot_location(). This assumes column names don't contain '.'; how difficult would it be to remove that assumption? If it's difficult, a cheap approach would be to ignore any entry containing too many dots. We're not likely to create such column names at initdb time, but v13 does allow: ALTER VIEW pg_locks RENAME COLUMN objid TO "a.b"; GRANT SELECT ("a.b") ON pg_locks TO backup; After which revoke_objects.sql has: psql:./revoke_objects.sql:9: ERROR: syntax error at or near "") ON pg_catalog.pg_locks."" LINE 1: REVOKE ALL (b") ON pg_catalog.pg_locks."a FROM "backup"; While that ALTER VIEW probably should have required allow_system_table_mods, we need to assume such databases exist. I've fixed it by using pg_identify_object_as_address(). Now we don't have to parse obj_identity. Overall, this patch predicts a subset of cases where pg_dump will emit a failing GRANT or REVOKE that targets a pg_catalog object. Can you write a code comment stating what it does and does not detect? I think it's okay to not predict every failure, but let's record the intended coverage. Here are a few examples I know; there may be more. The above query only finds GRANTs to non-pinned roles. pg_dump dumps the following, but the above query doesn't find them: REVOKE ALL ON FUNCTION pg_last_wal_replay_lsn FROM public; GRANT EXECUTE ON FUNCTION pg_reload_conf() TO pg_signal_backend; The above query should test refclassid. This function should not run queries against servers older than 9.6, because pg_dump emits GRANT/REVOKE for pg_catalog objects only when targeting 9.6 or later. An upgrade from 8.4 to master is failing on the above query: Fixed. 
The patch adds many lines wider than 78 columns, this being one example. In general, break such lines. (Don't break string literal arguments of ereport(), though.) Fixed. nm -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c index 43fc297eb6..429e4468f2 100644 --- a/src/bin/pg_upgrade/check.c +++ b/src/bin/pg_upgrade/check.c @@ -16,6 +16,7 @@ static void check_new_cluster_is_empty(void); static void check_databases_are_compatible(void); +static void check_for_changed_signatures(void); static void check_locale_and_encoding(DbInfo *olddb, DbInfo *newdb); static bool equivalent_locale(int category, const char *loca, const char *locb); static void check_is_install_user(ClusterInfo *cluster); @@ -151,6 +152,8 @@ check_and_dump_old_cluster(bool live_check) if (GET_MAJOR_VERSION(old_cluster.major_version) <= 804) new_9_0_populate_pg_largeobject_metadata(&old_cluster, true); + get_non_default_acl_infos(&old_cluster); + /* * While not a check option, we do this now because this
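For context, the kind of setup this check targets looks roughly like the following (the role name is made up). On the old cluster, say 9.6:

REVOKE EXECUTE ON FUNCTION pg_catalog.pg_last_xlog_replay_location() FROM public;
GRANT EXECUTE ON FUNCTION pg_catalog.pg_last_xlog_replay_location() TO backup_role;

In PostgreSQL 10 that function was renamed to pg_last_wal_replay_lsn(), so the GRANT/REVOKE emitted by pg_dump for it would fail during pg_upgrade; a check like the one in this patch reports such objects and writes the REVOKEs needed beforehand into a script.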
Re: ResourceOwner refactoring
On 19/01/2021 11:09, Heikki Linnakangas wrote: On 18/01/2021 18:11, Robert Haas wrote: On Mon, Jan 18, 2021 at 11:11 AM Robert Haas wrote: On Mon, Jan 18, 2021 at 10:19 AM Heikki Linnakangas wrote: On 18/01/2021 16:34, Alvaro Herrera wrote: So according to your performance benchmark, we're willing to accept a 30% performance loss on an allegedly common operation -- numkeep=0 numsnaps=10 becomes 49.8ns from 37.6ns. That seems a bit shocking. Maybe you can claim that these operations aren't exactly hot spots, and so the fact that we remain in the same power-of-ten is sufficient. Is that the argument? That's right. The fast path is fast, and that's important. The slow path becomes 30% slower, but that's acceptable. I don't know whether a 30% slowdown will hurt anybody, but it seems like kind of a lot, and I'm not sure I understand what corresponding benefit we're getting. The benefit is to make it easy for extensions to use resource owners to track whatever resources they need to track. And arguably, the new mechanism is nicer for built-in code, too. I'll see if I can optimize the slow paths, to make it more palatable. Ok, here's a new set of patches, and new test results. I replaced the hash function with a cheaper one. I also added the missing wrappers that Alvaro and Hayato Kuroda commented on, and fixed the typos that Michael Paquier pointed out. In the test script, I increased the number of iterations used in the tests, to make them run longer and produce more stable results. There is still a fair amount of jitter in the results, so take any particular number with a grain of salt, but the overall trend is repeatable. The results now look like this: Unpatched - postgres=# \i snaptest.sql numkeep | numsnaps | lifo_time_ns | fifo_time_ns -+--+--+-- 0 |1 | 32.7 | 32.3 0 |5 | 33.0 | 32.9 0 | 10 | 34.1 | 35.5 0 | 60 | 32.5 | 38.3 0 | 70 | 47.6 | 48.9 0 | 100 | 47.9 | 49.3 0 | 1000 | 52.9 | 52.7 0 |1 | 61.7 | 62.4 9 | 10 | 38.4 | 37.6 9 | 100 | 42.3 | 42.3 9 | 1000 | 43.9 | 45.0 9 |1 | 62.2 | 62.5 65 | 70 | 42.4 | 42.9 65 | 100 | 43.2 | 43.9 65 | 1000 | 44.0 | 45.1 65 |1 | 62.4 | 62.6 (16 rows) Patched --- postgres=# \i snaptest.sql numkeep | numsnaps | lifo_time_ns | fifo_time_ns -+--+--+-- 0 |1 | 32.8 | 34.2 0 |5 | 33.8 | 32.2 0 | 10 | 37.2 | 40.2 0 | 60 | 40.2 | 45.1 0 | 70 | 40.9 | 48.4 0 | 100 | 41.9 | 45.2 0 | 1000 | 49.0 | 55.6 0 |1 | 62.4 | 67.2 9 | 10 | 33.6 | 33.0 9 | 100 | 39.6 | 39.7 9 | 1000 | 42.2 | 45.0 9 |1 | 60.7 | 63.9 65 | 70 | 32.5 | 33.6 65 | 100 | 37.5 | 39.7 65 | 1000 | 42.3 | 46.3 65 |1 | 61.9 | 64.8 (16 rows) For easier side-by-side comparison, here are the same numbers with the patched and unpatched results side by side, and their ratio (ratio < 1 means the patched version is faster): LIFO tests: numkeep | numsnaps | unpatched | patched | ratio -+--+---+-+--- 0 |1 | 32.7 |32.8 | 1.00 0 |5 | 33.0 |33.8 | 1.02 0 | 10 | 34.1 |37.2 | 1.09 0 | 60 | 32.5 |40.2 | 1.24 0 | 70 | 47.6 |40.9 | 0.86 0 | 100 | 47.9 |41.9 | 0.87 0 | 1000 | 52.9 |49.0 | 0.93 0 |1 | 61.7 |62.4 | 1.01 9 | 10 | 38.4 |33.6 | 0.88 9 | 100 | 42.3 |39.6 | 0.94 9 | 1000 | 43.9 |42.2 | 0.96 9 |1 | 62.2 |60.7 | 0.98 65 | 70 | 42.4 |32.5 | 0.77 65 | 100 | 43.2 |37.5 | 0.87 65 | 1000 | 44.0 |42.3 | 0.96 65 |1 | 62.4 |61.9 | 0.99 (16 rows) FIFO tests: numkeep | numsnaps | unpatched | patched | ratio -+--+---+-+--- 0 |1 | 32.3 |34.2 | 1.06 0 |5 | 32.9 |32.2 | 0.98 0 | 10 | 35.5 |40.2 | 1.13
Re: POC: postgres_fdw insert batching
OK, pushed after a little bit of additional polishing (mostly comments). Thanks everyone! -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
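For anyone wanting to try the committed feature, a minimal usage sketch (the server and table names are made up; batch_size is the option this patch adds):

-- Enable batched inserts for a whole foreign server, or per foreign table.
ALTER SERVER loopback OPTIONS (ADD batch_size '10');
ALTER FOREIGN TABLE ft1 OPTIONS (ADD batch_size '100');

-- Inserts into ft1 are now shipped to the remote server in batches of up to
-- 100 rows at a time instead of a single row per round trip.
INSERT INTO ft1 SELECT i, i FROM generate_series(1, 1000) AS i;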
Re: [PATCH 1/1] Fix detection of pwritev support for OSX.
James Hilliard writes: > On Tue, Jan 19, 2021 at 6:37 PM Tom Lane wrote: >> I've found no direct means to control the >> SDK path at all, but so far it appears that "xcrun --show-sdk-path" >> agrees with the compiler's default -isysroot path as seen in the >> compiler's -v output. I suspect that this isn't coincidental, >> but reflects xcrun actually being used in the compiler launch >> process. If it were to flip over to using a IOS SDK, that would >> mean that bare "cc" would generate nonfunctional executables, >> which just about any onlooker would agree is broken. > So there's some more weirdness involved here, whether or not you > have the command line install seems to affect the output of the > "xcrun --show-sdk-path" command, but not the > "xcrun --sdk macosx --show-sdk-path" command. Yeah, that's what we discovered in the other thread. It seems that with "--sdk macosx" you'll always get a pointer to the (solitary) SDK under /Applications/Xcode.app, but with the short "xcrun --show-sdk-path" command you might get either that or a pointer to something under /Library/Developer/CommandLineTools. I now believe what is actually happening with the short command is that it's iterating through the available SDKs (according to some not very clear search path) and picking the first one it finds that matches the host system version. That matches the ktrace evidence that shows it reading the SDKSettings.plist file in each SDK directory. The fact that it can seize on either an actual directory or an equivalent symlink might be due to chance ordering of directory entries. (It'd be interesting to see "ls -f" output for your /Library/Developer/CommandLineTools/SDKs directory ... though if you've been experimenting with deinstall/reinstall, there's no reason to suppose the entry order is still the same.) I'm not sure that the case of not having the "command line tools" installed is interesting for our purposes. AFAIK you have to have that in order to have access to required tools like bison and gmake. (That reminds me, I was intending to add something to our docs about how-to-build-from-source to say that you need to install those.) > Note that with my patch the binaries will always be compatible with the > host system by default, even if the SDK is capable of producing binaries > that are incompatible so building postgres works with and without the > command line tools SDK. Yeah. I don't see that as a benefit actually. Adding the -no_weak_imports linker switch (or the other one you're suggesting) means that you *cannot* cross-compile for a newer macOS version, even if you set PG_SYSROOT and/or MACOSX_DEPLOYMENT_TARGET with the intention of doing so. You'll still get a build that reflects the set of kernel calls available on the host system. Admittedly, this is a case that's not likely to be of interest to very many people, but I don't see why a method with that restriction is superior to picking a default SDK that matches the host system (and can be overridden). > So I think "xcrun --sdk macosx --show-sdk-path" is probably preferable > but either should work as long as we can properly detect deployment > target symbol availability, regardless this SDK sysroot selection issue is > effectively an entirely different issue from the feature detection not > properly > respecting the configured deployment target. No, I think it's pretty much equivalent. If we pick the right SDK then we'll get the build we want. regards, tom lane
Re: Allow matching whole DN from a client certificate
On Wed, 2021-01-20 at 19:07 +, Jacob Champion wrote:
> I think you'll want to be careful to specify the format as much as
> possible, both to make sure that other backend TLS implementations can
> actually use the same escaping system and to ensure that user regexes
> don't suddenly start matching different things at some point in the
> future.

Along those lines: the current implementation doesn't escape commas in fields, which means you can inject them to force a bad regex match. For instance, when using the usermap that's in the patch:

    dn   "/^.*OU=Testing,.*$"   username

if I create a certificate with the Organizational Unit name "Testing, or something", then that will also match.

Switching to RFC 2253/4514 quoting fixes comma injection (and reverses the order of the RDNs, which requires a change to the regex patterns). But I think that the regex as supplied still isn't strong enough to prevent problems. For example, the equals sign isn't a delimiter and therefore isn't quoted. So if I get my CA to sign a certificate with some arbitrary field value of "HEY YOU=Testing", then that will also match the above usermap. You'd need to write the regex with extreme attention to detail and a full understanding of the escaping scheme to get around that -- assuming that the scheme is generally parsable with regexes to begin with.

> I'm going to test this patch with some UTF-8 DNs later today; I'll share my
> findings.

UTF-8 has the opposite issue; it's escaped in a way that makes it unusable in a regex match. For example, say I have a (simple for the sake of example, but broken as noted above) usermap of

    dn   "/^CN=([^,]*).*$"   \1

which is supposed to emulate the functionality of the "clientname=CN" mode, and two users named "postgres" and "οδυσσέας". The "CN=postgres" user will work just fine, but the UTF-8 CN of "οδυσσέας" will be escaped into "\U03BF\U03B4\U03C5\U03C3\U03C3\U03AD\U03B1\U03C2" and fail to match the internal user. (I'm not seeing an RFC describe the "\U" escaping scheme; maybe it's OpenSSL-specific?)

--Jacob
Re: [PATCH 1/1] Fix detection of pwritev support for OSX.
On Wed, Jan 20, 2021 at 4:07 PM Tom Lane wrote: > > James Hilliard writes: > > On Tue, Jan 19, 2021 at 6:37 PM Tom Lane wrote: > >> I've found no direct means to control the > >> SDK path at all, but so far it appears that "xcrun --show-sdk-path" > >> agrees with the compiler's default -isysroot path as seen in the > >> compiler's -v output. I suspect that this isn't coincidental, > >> but reflects xcrun actually being used in the compiler launch > >> process. If it were to flip over to using a IOS SDK, that would > >> mean that bare "cc" would generate nonfunctional executables, > >> which just about any onlooker would agree is broken. > > > So there's some more weirdness involved here, whether or not you > > have the command line install seems to affect the output of the > > "xcrun --show-sdk-path" command, but not the > > "xcrun --sdk macosx --show-sdk-path" command. > > Yeah, that's what we discovered in the other thread. It seems that > with "--sdk macosx" you'll always get a pointer to the (solitary) > SDK under /Applications/Xcode.app, but with the short "xcrun > --show-sdk-path" command you might get either that or a pointer to > something under /Library/Developer/CommandLineTools. > > I now believe what is actually happening with the short command is > that it's iterating through the available SDKs (according to some not > very clear search path) and picking the first one it finds that > matches the host system version. That matches the ktrace evidence > that shows it reading the SDKSettings.plist file in each SDK > directory. The fact that it can seize on either an actual directory > or an equivalent symlink might be due to chance ordering of directory > entries. (It'd be interesting to see "ls -f" output for your > /Library/Developer/CommandLineTools/SDKs directory ... though if Well at the moment I completely deleted that directory...and the build works fine with my patch still. > you've been experimenting with deinstall/reinstall, there's no > reason to suppose the entry order is still the same.) > > I'm not sure that the case of not having the "command line tools" > installed is interesting for our purposes. AFAIK you have to have > that in order to have access to required tools like bison and gmake. > (That reminds me, I was intending to add something to our docs > about how-to-build-from-source to say that you need to install those.) Yeah, not 100% sure but I was able to build just fine after deleting my command line tools. I think it just switched to using the normal SDK toolchain, I guess that's the fallback logic doing that. It would be pretty annoying to have to install an outdated SDK just to build postgres for no other reason than the autoconf feature detection being broken. > > > Note that with my patch the binaries will always be compatible with the > > host system by default, even if the SDK is capable of producing binaries > > that are incompatible so building postgres works with and without the > > command line tools SDK. > > Yeah. I don't see that as a benefit actually. Adding the > -no_weak_imports linker switch (or the other one you're suggesting) > means that you *cannot* cross-compile for a newer macOS version, > even if you set PG_SYSROOT and/or MACOSX_DEPLOYMENT_TARGET with the > intention of doing so. Best I can tell this isn't true, I was able to cross compile for a newer MACOSX_DEPLOYMENT_TARGET than my build host just fine. The binary fails with a "Symbol not found: _pwritev" error when I try to run it on the system that built it. 
In regards to the -no_weak_imports switch...that is something different from my understanding as it just strips the weak imports forcing the fallback code paths to be taken instead, essentially functioning as if the weak symbols are never available. It's largely separate from the deployment target from my understanding as weak symbols are feature that lets you use newer syscalls while still providing backwards compatible fallbacks for older systems. > You'll still get a build that reflects the set > of kernel calls available on the host system. Admittedly, this is a > case that's not likely to be of interest to very many people, but > I don't see why a method with that restriction is superior to picking > a default SDK that matches the host system (and can be overridden). But to fix the build when using a newer SDK overriding the SDK location does not help, you would have to override the broken feature detection. > > > So I think "xcrun --sdk macosx --show-sdk-path" is probably preferable > > but either should work as long as we can properly detect deployment > > target symbol availability, regardless this SDK sysroot selection issue is > > effectively an entirely different issue from the feature detection not > > properly > > respecting the configured deployment target. > > No, I think it's pretty much equivalent. If we pick the right SDK > then we'll get the build we want. Generally any recent SDK installed should
Re: POC: postgres_fdw insert batching
Hmm, seems that florican doesn't like this :-( https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=florican&dt=2021-01-20%2023%3A08%3A15 It's a i386 machine running FreeBSD, so not sure what exactly it's picky about. But when I tried running this under valgrind, I get some strange failures in the new chunk in ExecInitModifyTable: /* * Determine if the FDW supports batch insert and determine the batch * size (a FDW may support batching, but it may be disabled for the * server/table). */ if (!resultRelInfo->ri_usesFdwDirectModify && operation == CMD_INSERT && resultRelInfo->ri_FdwRoutine != NULL && resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize && resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert) resultRelInfo->ri_BatchSize = resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo); else resultRelInfo->ri_BatchSize = 1; Assert(resultRelInfo->ri_BatchSize >= 1); It seems as if the resultRelInfo is not initialized, or something like that. I wouldn't be surprised if the 32-bit machine was pickier and failing because of that. A sample of the valgrind log is attached. It's pretty much just repetitions of these three reports. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company ==186005== Invalid read of size 1 ==186005==at 0x759C12: ExecInitModifyTable (nodeModifyTable.c:2801) ==186005==by 0x720B35: ExecInitNode (execProcnode.c:174) ==186005==by 0x716F99: InitPlan (execMain.c:939) ==186005==by 0x715F47: standard_ExecutorStart (execMain.c:266) ==186005==by 0x715CD3: ExecutorStart (execMain.c:148) ==186005==by 0x9495B1: ProcessQuery (pquery.c:155) ==186005==by 0x94AF24: PortalRunMulti (pquery.c:1267) ==186005==by 0x94A4C8: PortalRun (pquery.c:779) ==186005==by 0x94438E: exec_simple_query (postgres.c:1240) ==186005==by 0x9485F5: PostgresMain (postgres.c:4394) ==186005==by 0x88D1D4: BackendRun (postmaster.c:4484) ==186005==by 0x88CB53: BackendStartup (postmaster.c:4206) ==186005==by 0x889205: ServerLoop (postmaster.c:1730) ==186005==by 0x888AD0: PostmasterMain (postmaster.c:1402) ==186005==by 0x78D466: main (main.c:209) ==186005== Address 0xe594f78 is 1,864 bytes inside a block of size 8,192 alloc'd ==186005==at 0x483A809: malloc (vg_replace_malloc.c:307) ==186005==by 0xAFEFE6: AllocSetContextCreateInternal (aset.c:468) ==186005==by 0x7298F9: CreateExprContextInternal (execUtils.c:252) ==186005==by 0x7299DD: CreateExprContext (execUtils.c:302) ==186005==by 0x729C5E: ExecAssignExprContext (execUtils.c:481) ==186005==by 0x75D24D: ExecInitSeqScan (nodeSeqscan.c:147) ==186005==by 0x720BEF: ExecInitNode (execProcnode.c:207) ==186005==by 0x716F99: InitPlan (execMain.c:939) ==186005==by 0x715F47: standard_ExecutorStart (execMain.c:266) ==186005==by 0x715CD3: ExecutorStart (execMain.c:148) ==186005==by 0x949E64: PortalStart (pquery.c:505) ==186005==by 0x9442A2: exec_simple_query (postgres.c:1201) ==186005==by 0x9485F5: PostgresMain (postgres.c:4394) ==186005==by 0x88D1D4: BackendRun (postmaster.c:4484) ==186005==by 0x88CB53: BackendStartup (postmaster.c:4206) ==186005==by 0x889205: ServerLoop (postmaster.c:1730) ==186005==by 0x888AD0: PostmasterMain (postmaster.c:1402) ==186005==by 0x78D466: main (main.c:209) ==186005== { Memcheck:Addr1 fun:ExecInitModifyTable fun:ExecInitNode fun:InitPlan fun:standard_ExecutorStart fun:ExecutorStart fun:ProcessQuery fun:PortalRunMulti fun:PortalRun fun:exec_simple_query fun:PostgresMain fun:BackendRun fun:BackendStartup fun:ServerLoop fun:PostmasterMain fun:main } ==186005== Invalid write of size 
4 ==186005==at 0x759C74: ExecInitModifyTable (nodeModifyTable.c:2809) ==186005==by 0x720B35: ExecInitNode (execProcnode.c:174) ==186005==by 0x716F99: InitPlan (execMain.c:939) ==186005==by 0x715F47: standard_ExecutorStart (execMain.c:266) ==186005==by 0x715CD3: ExecutorStart (execMain.c:148) ==186005==by 0x9495B1: ProcessQuery (pquery.c:155) ==186005==by 0x94AF24: PortalRunMulti (pquery.c:1267) ==186005==by 0x94A4C8: PortalRun (pquery.c:779) ==186005==by 0x94438E: exec_simple_query (postgres.c:1240) ==186005==by 0x9485F5: PostgresMain (postgres.c:4394) ==186005==by 0x88D1D4: BackendRun (postmaster.c:4484) ==186005==by 0x88CB53: BackendStartup (postmaster.c:4206) ==186005==by 0x889205: ServerLoop (postmaster.c:1730) ==186005==by 0x888AD0: PostmasterMain (postmaster.c:1402) ==186005==by 0x78D466: main (main.c:209) ==186005== Address 0xe594f80 is 1,872 bytes inside a block of size 8,192 alloc'd ==186005==at 0x483A809: malloc (vg_replace_malloc.c:307) ==186005==by 0xAFEFE6: AllocSetContextCreateInternal (aset.c:468) ==186005==by 0x7298F9: CreateExprContextInternal (execUtils.c:252) ==186005==by
Re: list of extended statistics on psql
Hi Tomas and hackers,

On 2021/01/21 7:00, Tomas Vondra wrote:

I created patches and my test results on PG10, 11, 12, and 14 are fine.

0001:
- Fix query to use pg_statistic_ext only
- Replace statuses "required" and "built" with "defined"
- Remove the size columns
- Fix document
- Add schema name as a filter condition on the query

0002:
- Fix all results of \dX
- Add new testcase by non-superuser

Please find attached files. :-D

Thanks, I've pushed this. I had to tweak the regression tests a bit, for two reasons:

1) to change user in regression tests, don't use \connect, but SET ROLE and RESET ROLE

2) roles in regression tests should use names with regress_ prefix

Thanks for reviewing many times and committing the feature! I understood 1) and 2), and I'll keep them in mind for the next patch I develop.

Also, if possible, could you add Justin to the commit message as a reviewer? I revised the documentation partly based on his comments.

Finally, as extended statistics get used more widely, this feature becomes more useful. I hope it helps DBAs. :-D

Thanks,
Tatsuro Yamada
Re: POC: postgres_fdw insert batching
Tomas Vondra writes: > OK, pushed after a little bit of additional polishing (mostly comments). > Thanks everyone! florican reports this is seriously broken on 32-bit hardware: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=florican&dt=2021-01-20%2023%3A08%3A15 First guess is incorrect memory-allocation computations ... regards, tom lane
Re: POC: postgres_fdw insert batching
Hi, The assignment to resultRelInfo is done when junk_filter_needed is true: if (junk_filter_needed) { resultRelInfo = mtstate->resultRelInfo; Should the code for determining batch size access mtstate->resultRelInfo directly ? diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c index 9c36860704..a6a814454d 100644 --- a/src/backend/executor/nodeModifyTable.c +++ b/src/backend/executor/nodeModifyTable.c @@ -2798,17 +2798,17 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags) * size (a FDW may support batching, but it may be disabled for the * server/table). */ -if (!resultRelInfo->ri_usesFdwDirectModify && +if (!mtstate->resultRelInfo->ri_usesFdwDirectModify && operation == CMD_INSERT && -resultRelInfo->ri_FdwRoutine != NULL && -resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize && -resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert) -resultRelInfo->ri_BatchSize = - resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo); +mtstate->resultRelInfo->ri_FdwRoutine != NULL && +mtstate->resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize && +mtstate->resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert) +mtstate->resultRelInfo->ri_BatchSize = + mtstate->resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(mtstate->resultRelInfo); else -resultRelInfo->ri_BatchSize = 1; +mtstate->resultRelInfo->ri_BatchSize = 1; -Assert(resultRelInfo->ri_BatchSize >= 1); +Assert(mtstate->resultRelInfo->ri_BatchSize >= 1); /* * Lastly, if this is not the primary (canSetTag) ModifyTable node, add it Cheers On Wed, Jan 20, 2021 at 3:52 PM Tomas Vondra wrote: > Hmm, seems that florican doesn't like this :-( > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=florican&dt=2021-01-20%2023%3A08%3A15 > > It's a i386 machine running FreeBSD, so not sure what exactly it's picky > about. But when I tried running this under valgrind, I get some strange > failures in the new chunk in ExecInitModifyTable: > >/* > * Determine if the FDW supports batch insert and determine the batch > * size (a FDW may support batching, but it may be disabled for the > * server/table). > */ >if (!resultRelInfo->ri_usesFdwDirectModify && >operation == CMD_INSERT && >resultRelInfo->ri_FdwRoutine != NULL && >resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize && >resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert) >resultRelInfo->ri_BatchSize = > > resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo); >else >resultRelInfo->ri_BatchSize = 1; > >Assert(resultRelInfo->ri_BatchSize >= 1); > > It seems as if the resultRelInfo is not initialized, or something like > that. I wouldn't be surprised if the 32-bit machine was pickier and > failing because of that. > > A sample of the valgrind log is attached. It's pretty much just > repetitions of these three reports. > > regards > > -- > Tomas Vondra > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company >
Heap's backwards scan scans the incorrect pages with heap_setscanlimits()
Hackers, It looks like both heapgettup() and heapgettup_pagemode() are coded incorrectly when setting the page to start the scan on for a backwards scan when heap_setscanlimits() has been used. It looks like the code was not updated during 7516f5259. The current code is: /* start from last page of the scan */ if (scan->rs_startblock > 0) page = scan->rs_startblock - 1; else page = scan->rs_nblocks - 1; Where rs_startblock is either the sync scan start location, or the start page set by heap_setscanlimits(). rs_nblocks is the number of blocks in the relation. Let's say we have a 100 block relation and we want to scan blocks 10 to 30 in a backwards order. We'll do heap_setscanlimits(scan, 10, 21); to indicate that we want to scan 21 blocks starting at page 10 and finishing after scanning page 30. What the code above does is wrong. Since rs_startblock is > 0 we'll execute: page = scan->rs_nblocks - 1; i.e. 99. then proceed to scan blocks all blocks down to 78. Oops. Not quite the 10 to 30 that we asked for. Now, it does not appear that there are any live bugs here, in core at least. The only usage I see of heap_setscanlimits() is in heapam_index_build_range_scan() to which I see the scan is a forward scan. I only noticed the bug as I'm in the middle of fixing up [1] to implement backwards TID Range scans. Proposed patch attached. Since this is not a live bug, is it worth a backpatch? I guess some extensions could suffer from this, I'm just not sure how likely that is as if anyone is doing backwards scanning with a setscanlimits set, then they'd surely have noticed this already!? David [1] https://postgr.es/m/CAMyN-kB-nFTkF=va_jpwfno08s0d-yk0f741s2b7ldmyai8...@mail.gmail.com diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index faffbb1865..ddd214b7af 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -603,11 +603,15 @@ heapgettup(HeapScanDesc scan, * forward scanners. */ scan->rs_base.rs_flags &= ~SO_ALLOW_SYNC; - /* start from last page of the scan */ - if (scan->rs_startblock > 0) - page = scan->rs_startblock - 1; + + /* +* Start from last page of the scan. Ensure we take into account +* rs_numblocks if it's been adjusted by heap_setscanlimits(). +*/ + if (scan->rs_numblocks != InvalidBlockNumber) + page = (scan->rs_startblock + scan->rs_numblocks - 1) % scan->rs_nblocks; else - page = scan->rs_nblocks - 1; + page = (scan->rs_startblock + scan->rs_nblocks - 1) % scan->rs_nblocks; heapgetpage((TableScanDesc) scan, page); } else @@ -918,11 +922,15 @@ heapgettup_pagemode(HeapScanDesc scan, * forward scanners. */ scan->rs_base.rs_flags &= ~SO_ALLOW_SYNC; - /* start from last page of the scan */ - if (scan->rs_startblock > 0) - page = scan->rs_startblock - 1; + + /* +* Start from last page of the scan. Ensure we take into account +* rs_numblocks if it's been adjusted by heap_setscanlimits(). +*/ + if (scan->rs_numblocks != InvalidBlockNumber) + page = (scan->rs_startblock + scan->rs_numblocks - 1) % scan->rs_nblocks; else - page = scan->rs_nblocks - 1; + page = (scan->rs_startblock + scan->rs_nblocks - 1) % scan->rs_nblocks; heapgetpage((TableScanDesc) scan, page); } else
Re: POC: postgres_fdw insert batching
On 1/21/21 12:52 AM, Tomas Vondra wrote: Hmm, seems that florican doesn't like this :-( https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=florican&dt=2021-01-20%2023%3A08%3A15 It's a i386 machine running FreeBSD, so not sure what exactly it's picky about. But when I tried running this under valgrind, I get some strange failures in the new chunk in ExecInitModifyTable: /* * Determine if the FDW supports batch insert and determine the batch * size (a FDW may support batching, but it may be disabled for the * server/table). */ if (!resultRelInfo->ri_usesFdwDirectModify && operation == CMD_INSERT && resultRelInfo->ri_FdwRoutine != NULL && resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize && resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert) resultRelInfo->ri_BatchSize = resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo); else resultRelInfo->ri_BatchSize = 1; Assert(resultRelInfo->ri_BatchSize >= 1); It seems as if the resultRelInfo is not initialized, or something like that. I wouldn't be surprised if the 32-bit machine was pickier and failing because of that. A sample of the valgrind log is attached. It's pretty much just repetitions of these three reports. OK, it's definitely accessing uninitialized memory, because the resultRelInfo (on line 2801, i.e. the "if" condition) looks like this: (gdb) p resultRelInfo $1 = (ResultRelInfo *) 0xe595988 (gdb) p *resultRelInfo $2 = {type = 2139062142, ri_RangeTableIndex = 2139062143, ri_RelationDesc = 0x7f7f7f7f7f7f7f7f, ri_NumIndices = 2139062143, ri_IndexRelationDescs = 0x7f7f7f7f7f7f7f7f, ri_IndexRelationInfo = 0x7f7f7f7f7f7f7f7f, ri_TrigDesc = 0x7f7f7f7f7f7f7f7f, ri_TrigFunctions = 0x7f7f7f7f7f7f7f7f, ri_TrigWhenExprs = 0x7f7f7f7f7f7f7f7f, ri_TrigInstrument = 0x7f7f7f7f7f7f7f7f, ri_ReturningSlot = 0x7f7f7f7f7f7f7f7f, ri_TrigOldSlot = 0x7f7f7f7f7f7f7f7f, ri_TrigNewSlot = 0x7f7f7f7f7f7f7f7f, ri_FdwRoutine = 0x7f7f7f7f7f7f7f7f, ri_FdwState = 0x7f7f7f7f7f7f7f7f, ri_usesFdwDirectModify = 127, ri_NumSlots = 2139062143, ri_BatchSize = 2139062143, ri_Slots = 0x7f7f7f7f7f7f7f7f, ri_PlanSlots = 0x7f7f7f7f7f7f7f7f, ri_WithCheckOptions = 0x7f7f7f7f7f7f7f7f, ri_WithCheckOptionExprs = 0x7f7f7f7f7f7f7f7f, ri_ConstraintExprs = 0x7f7f7f7f7f7f7f7f, ri_GeneratedExprs = 0x7f7f7f7f7f7f7f7f, ri_NumGeneratedNeeded = 2139062143, ri_junkFilter = 0x7f7f7f7f7f7f7f7f, ri_returningList = 0x7f7f7f7f7f7f7f7f, ri_projectReturning = 0x7f7f7f7f7f7f7f7f, ri_onConflictArbiterIndexes = 0x7f7f7f7f7f7f7f7f, ri_onConflict = 0x7f7f7f7f7f7f7f7f, ri_PartitionCheckExpr = 0x7f7f7f7f7f7f7f7f, ri_PartitionRoot = 0x7f7f7f7f7f7f7f7f, ri_RootToPartitionMap = 0x8, ri_PartitionTupleSlot = 0x8, ri_ChildToRootMap = 0xe5952b0, ri_CopyMultiInsertBuffer = 0xe596740} (gdb) I may be wrong, but the most likely explanation seems to be this is due to the junk filter initialization, which simply moves past the end of the mtstate->resultRelInfo array. It kinda seems the GetForeignModifyBatchSize call should happen before that block. The attached patch fixes this for me (i.e. regression tests pass with no valgrind reports. Or did I get that wrong? 
-- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c index 9c36860704..2ac4999dc8 100644 --- a/src/backend/executor/nodeModifyTable.c +++ b/src/backend/executor/nodeModifyTable.c @@ -2672,6 +2672,23 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags) } } + /* + * Determine if the FDW supports batch insert and determine the batch + * size (a FDW may support batching, but it may be disabled for the + * server/table). + */ + if (!resultRelInfo->ri_usesFdwDirectModify && + operation == CMD_INSERT && + resultRelInfo->ri_FdwRoutine != NULL && + resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize && + resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert) + resultRelInfo->ri_BatchSize = + resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo); + else + resultRelInfo->ri_BatchSize = 1; + + Assert(resultRelInfo->ri_BatchSize >= 1); + /* select first subplan */ mtstate->mt_whichplan = 0; subplan = (Plan *) linitial(node->plans); @@ -2793,23 +2810,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags) } } - /* - * Determine if the FDW supports batch insert and determine the batch - * size (a FDW may support batching, but it may be disabled for the - * server/table). - */ - if (!resultRelInfo->ri_usesFdwDirectModify && - operation == CMD_INSERT && - resultRelInfo->ri_FdwRoutine != NULL && - resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize && - resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert) - resultRelInfo->ri_BatchSize = - resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo
Re: Printing LSN made easy
At Wed, 20 Jan 2021 16:40:59 +0900, Michael Paquier wrote in > On Wed, Jan 20, 2021 at 07:25:37AM +0100, Peter Eisentraut wrote: > > It looks like we are not getting any consensus on this approach. One > > reduced version I would consider is just the second part, so you'd write > > something like > > > > snprintf(lsnchar, sizeof(lsnchar), "%X/%X", > > LSN_FORMAT_ARGS(lsn)); > > > > This would still reduce notational complexity quite a bit but avoid any > > funny business with the format strings. > > That seems reasonable to me. So +1. That seems in the good balance. +1, too. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
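A minimal sketch of the reduced proposal, under the assumption that the macro simply splits a 64-bit LSN into the two 32-bit halves that the existing "%X/%X" convention expects:

    /* assumed definition; the point is that the format string stays "%X/%X" */
    #define LSN_FORMAT_ARGS(lsn)  ((uint32) ((lsn) >> 32)), ((uint32) (lsn))

    char    lsnchar[64];

    snprintf(lsnchar, sizeof(lsnchar), "%X/%X", LSN_FORMAT_ARGS(lsn));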
Re: POC: postgres_fdw insert batching
On 1/21/21 12:59 AM, Tom Lane wrote: Tomas Vondra writes: OK, pushed after a little bit of additional polishing (mostly comments). Thanks everyone! florican reports this is seriously broken on 32-bit hardware: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=florican&dt=2021-01-20%2023%3A08%3A15 First guess is incorrect memory-allocation computations ... I know, although it seems more like an access to uninitialized memory. I've already posted a patch that resolves that for me on 64-bit (per valgrind, I suppose it's the same issue). I'm working on reproducing it on 32-bit; hopefully it won't take long. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: POC: postgres_fdw insert batching
On 1/21/21 1:17 AM, Zhihong Yu wrote: Hi, The assignment to resultRelInfo is done when junk_filter_needed is true: if (junk_filter_needed) { resultRelInfo = mtstate->resultRelInfo; Should the code for determining batch size access mtstate->resultRelInfo directly ? IMO the issue is that code iterates over all plans and moves to the next for each one: resultRelInfo++; so it ends up pointing past the last element, hence the failures. So yeah, either the code needs to move before the loop (per my patch), or we need to access mtstate->resultRelInfo directly. I'm pretty amazed this did not crash during any of the many regression runs I did recently. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
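For clarity, the shape of the code being described is roughly this (simplified; names follow the snippets quoted above):

    /* junk-filter initialization walks the per-subplan result rels ... */
    resultRelInfo = mtstate->resultRelInfo;
    foreach(l, subplans)
    {
        /* ... set up the junk filter for this result rel ... */
        resultRelInfo++;
    }

    /*
     * ... so after the loop resultRelInfo points one element past the end
     * of the mtstate->resultRelInfo array, and the batch-size block must
     * not dereference it.
     */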
Re: POC: postgres_fdw insert batching
Hi, Tomas: In my opinion, my patch is a little better. Suppose one of the conditions in the if block changes in between the start of loop and the end of the loop: * Determine if the FDW supports batch insert and determine the batch * size (a FDW may support batching, but it may be disabled for the * server/table). My patch would reflect that change. I guess this was the reason the if / else block was placed there in the first place. Cheers On Wed, Jan 20, 2021 at 4:56 PM Tomas Vondra wrote: > > > On 1/21/21 1:17 AM, Zhihong Yu wrote: > > Hi, > > The assignment to resultRelInfo is done when junk_filter_needed is true: > > > > if (junk_filter_needed) > > { > > resultRelInfo = mtstate->resultRelInfo; > > > > Should the code for determining batch size access mtstate->resultRelInfo > > directly ? > > > > IMO the issue is that code iterates over all plans and moves to the next > for each one: > > resultRelInfo++; > > so it ends up pointing past the last element, hence the failures. So > yeah, either the code needs to move before the loop (per my patch), or > we need to access mtstate->resultRelInfo directly. > > I'm pretty amazed this did not crash during any of the many regression > runs I did recently. > > regards > > -- > Tomas Vondra > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company >
Re: POC: postgres_fdw insert batching
On 1/21/21 2:02 AM, Zhihong Yu wrote: Hi, Tomas: In my opinion, my patch is a little better. Suppose one of the conditions in the if block changes in between the start of loop and the end of the loop: * Determine if the FDW supports batch insert and determine the batch * size (a FDW may support batching, but it may be disabled for the * server/table). My patch would reflect that change. I guess this was the reason the if / else block was placed there in the first place. But can it change? All the loop does is extracting junk attributes from the plans, it does not modify anything related to the batching. Or maybe I just don't understand what you mean. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: POC: postgres_fdw insert batching
Hi, Do we need to consider how this part of code inside ExecInitModifyTable() would evolve ? I think placing the compound condition toward the end of ExecInitModifyTable() is reasonable because it checks the latest information. Regards On Wed, Jan 20, 2021 at 5:11 PM Tomas Vondra wrote: > > > On 1/21/21 2:02 AM, Zhihong Yu wrote: > > Hi, Tomas: > > In my opinion, my patch is a little better. > > Suppose one of the conditions in the if block changes in between the > > start of loop and the end of the loop: > > > > * Determine if the FDW supports batch insert and determine the > batch > > * size (a FDW may support batching, but it may be disabled for the > > * server/table). > > > > My patch would reflect that change. I guess this was the reason the if / > > else block was placed there in the first place. > > > > But can it change? All the loop does is extracting junk attributes from > the plans, it does not modify anything related to the batching. Or maybe > I just don't understand what you mean. > > > regards > > -- > Tomas Vondra > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company >
Re: POC: postgres_fdw insert batching
Tomas Vondra writes: > I may be wrong, but the most likely explanation seems to be this is due > to the junk filter initialization, which simply moves past the end of > the mtstate->resultRelInfo array. resultRelInfo is certainly pointing at garbage at that point. > It kinda seems the GetForeignModifyBatchSize call should happen before > that block. The attached patch fixes this for me (i.e. regression tests > pass with no valgrind reports. > Or did I get that wrong? Don't we need to initialize ri_BatchSize for *each* resultrelinfo, not merely the first one? That is, this new code needs to be somewhere inside a loop over the result rels. regards, tom lane
Re: POC: postgres_fdw insert batching
On Thu, Jan 21, 2021 at 9:56 AM Tomas Vondra wrote: > On 1/21/21 1:17 AM, Zhihong Yu wrote: > > Hi, > > The assignment to resultRelInfo is done when junk_filter_needed is true: > > > > if (junk_filter_needed) > > { > > resultRelInfo = mtstate->resultRelInfo; > > > > Should the code for determining batch size access mtstate->resultRelInfo > > directly ? > > > > IMO the issue is that code iterates over all plans and moves to the next > for each one: > > resultRelInfo++; > > so it ends up pointing past the last element, hence the failures. So > yeah, either the code needs to move before the loop (per my patch), or > we need to access mtstate->resultRelInfo directly. Accessing mtstate->resultRelInfo directly would do. The only constraint on where this block should be placed is that ri_projectReturning must be valid as of calling GetForeignModifyBatchSize(), as Tsunakawa-san pointed out upthread. So, after this block in ExecInitModifyTable: /* * Initialize RETURNING projections if needed. */ if (node->returningLists) { /* * Build a projection for each result rel. */ resultRelInfo = mtstate->resultRelInfo; foreach(l, node->returningLists) { List *rlist = (List *) lfirst(l); resultRelInfo->ri_returningList = rlist; resultRelInfo->ri_projectReturning = ExecBuildProjectionInfo(rlist, econtext, slot, &mtstate->ps, resultRelInfo->ri_RelationDesc->rd_att); resultRelInfo++; } } -- Amit Langote EDB: http://www.enterprisedb.com
Re: strange error reporting
Robert Haas writes: >>> Maybe it would be better if it said: >>> connection to database at socket "/tmp/.s.PGSQL.5432" failed: FATAL: >>> database "rhaas" does not exist >> I'd be inclined to spell it "connection to server at ... failed", >> but that sort of wording is surely also possible. > "connection to server" rather than "connection to database" works for > me; in fact, I think I like it slightly better. If I don't hear any other opinions, I'll change these messages to "connection to server at socket \"%s\" failed: " "connection to server at \"%s\" (%s), port %s failed: " (or maybe "server on socket"? "at" sounds right for the IP address case, but it feels a little off in the socket pathname case.) regards, tom lane
Re: Printing backtrace of postgres processes
On Wed, 20 Jan 2021 at 01:31, Robert Haas wrote: > On Sat, Jan 16, 2021 at 3:21 PM Tom Lane wrote: > > I'd argue that backtraces for those processes aren't really essential, > > and indeed that trying to make the syslogger report its own backtrace > > is damn dangerous. > > I agree. Ideally I'd like to be able to use the same mechanism > everywhere and include those processes too, but surely regular > backends and parallel workers are going to be the things that come up > most often. > > > (Personally, I think this whole patch fails the safety-vs-usefulness > > tradeoff, but I expect I'll get shouted down.) > > You and I are frequently on opposite sides of these kinds of > questions, but I think this is a closer call than many cases. I'm > convinced that it's useful, but I'm not sure whether it's safe. On the > usefulness side, backtraces are often the only way to troubleshoot > problems that occur on production systems. I wish we had better > logging and tracing tools instead of having to ask for this sort of > thing, but we don't. Agreed. In theory we should be able to do this sort of thing using external trace and diagnostic tools like perf, systemtap, etc. In practice, these tools tend to be quite version-sensitive and hard to get right without multiple rounds of back-and-forth to deal with specifics of the site's setup, installed debuginfo or lack thereof, specific tool versions, etc. It's quite common to have to fall back on attaching gdb with a breakpoint on a function in the export symbol table (so it works w/o debuginfo), saving a core, and then analysing the core on a separate system on which debuginfo is available for all the loaded modules. It's a major pain. The ability to get a basic bt from within Pg is strongly desirable. IIRC gdb's basic unwinder works without external debuginfo, if not especially well. libunwind produces much better results, but that didn't pass the extra-dependency bar when backtracing support was introduced to core postgres. On a side note, to help get better diagnostics I've also been meaning to look into building --enable-debug with -ggdb3 so we can embed macro info, and using dwz to deduplicate+compress the debuginfo so we can encourage people to install it by default on production. I also want to start exporting pointers to all the important data symbols for diagnostic use, even if we do so in a separate ELF section just for debug use.
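As a rough sketch of what the in-server mechanism amounts to, using the same libc facilities core postgres already uses for backtrace_functions (error handling trimmed, and the function name here is purely illustrative):

    #include <execinfo.h>

    static void
    emit_my_backtrace(void)
    {
        void   *frames[100];
        int     nframes = backtrace(frames, lengthof(frames));
        char  **symbols = backtrace_symbols(frames, nframes);

        if (symbols != NULL)
        {
            /* one log line per frame; addresses only, no debuginfo needed */
            for (int i = 0; i < nframes; i++)
                elog(LOG, "%s", symbols[i]);
            free(symbols);
        }
    }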
RE: POC: postgres_fdw insert batching
From: Zhihong Yu > Do we need to consider how this part of code inside ExecInitModifyTable() > would evolve ? > I think placing the compound condition toward the end of > ExecInitModifyTable() is reasonable because it checks the latest information. +1 for Zaihong-san's idea. But instead of rewriting every resultRelInfo to mtstate->resultRelInfo, which makes it a bit harder to read, I'd like to suggest just adding "resultRelInfo = mtstate->resultRelInfo;" immediately before the if block. Thanks a lot, all, for helping to solve the problem quickly! Regards Takayuki Tsunakawa
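In other words, the suggestion is just a one-line reset ahead of the existing block, something like the following sketch:

    /*
     * Re-point at the first (and, today, only) result rel immediately before
     * the batch-size block, so the block itself can stay where it is.
     */
    resultRelInfo = mtstate->resultRelInfo;
    /* ... existing "Determine if the FDW supports batch insert" block ... */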
Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
On Wed, Jan 20, 2021 at 03:34:39PM -0300, Alvaro Herrera wrote:
> On 2021-Jan-20, Alexey Kondratov wrote:
>> Ugh, forgot to attach the patches. Here they are.
>
> Yeah, looks reasonable.

Patch 0002 still has a whole set of issues as I pointed out a couple of
hours ago, but if we agree on 0001 as being useful if done independently,
I'd rather get that done first. This way or just merging both things in a
single commit is not a big deal seeing the amount of code, so I am fine
with any approach. It may be possible that 0001 requires more changes
depending on the work to-be-done for 0002 though?

>> +    /* No work if no change in tablespace. */
>> +    oldTablespaceOid = rd_rel->reltablespace;
>> +    if (tablespaceOid != oldTablespaceOid ||
>> +        (tablespaceOid == MyDatabaseTableSpace && OidIsValid(oldTablespaceOid)))
>> +    {
>> +        /* Update the pg_class row. */
>> +        rd_rel->reltablespace = (tablespaceOid == MyDatabaseTableSpace) ?
>> +            InvalidOid : tablespaceOid;
>> +        CatalogTupleUpdate(pg_class, &tuple->t_self, tuple);
>> +
>> +        changed = true;
>> +    }
>> +
>> +    if (changed)
>> +        /* Record dependency on tablespace */
>> +        changeDependencyOnTablespace(RelationRelationId,
>> +                                     reloid, rd_rel->reltablespace);
>
> Why have a separate "if (changed)" block here instead of merging with
> the above?

Yep.

+    if (SetRelTablespace(reloid, newTableSpace))
+        /* Make sure the reltablespace change is visible */
+        CommandCounterIncrement();

At quick glance, I am wondering why you just don't do a CCI within
SetRelTablespace().
--
Michael
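For concreteness, folding the dependency update into the preceding branch, as Alvaro suggests, could look roughly like this (a sketch against the quoted patch, not a tested change):

    /* No work if no change in tablespace. */
    oldTablespaceOid = rd_rel->reltablespace;
    if (tablespaceOid != oldTablespaceOid ||
        (tablespaceOid == MyDatabaseTableSpace && OidIsValid(oldTablespaceOid)))
    {
        /* Update the pg_class row. */
        rd_rel->reltablespace = (tablespaceOid == MyDatabaseTableSpace) ?
            InvalidOid : tablespaceOid;
        CatalogTupleUpdate(pg_class, &tuple->t_self, tuple);

        /* Record dependency on tablespace in the same branch. */
        changeDependencyOnTablespace(RelationRelationId,
                                     reloid, rd_rel->reltablespace);

        changed = true;
    }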
Re: POC: postgres_fdw insert batching
On 1/21/21 2:24 AM, Amit Langote wrote:
> On Thu, Jan 21, 2021 at 9:56 AM Tomas Vondra wrote:
>> On 1/21/21 1:17 AM, Zhihong Yu wrote:
>>> Hi,
>>> The assignment to resultRelInfo is done when junk_filter_needed is true:
>>>
>>>     if (junk_filter_needed)
>>>     {
>>>         resultRelInfo = mtstate->resultRelInfo;
>>>
>>> Should the code for determining batch size access mtstate->resultRelInfo
>>> directly ?
>>
>> IMO the issue is that code iterates over all plans and moves to the next
>> for each one:
>>
>>     resultRelInfo++;
>>
>> so it ends up pointing past the last element, hence the failures. So
>> yeah, either the code needs to move before the loop (per my patch), or
>> we need to access mtstate->resultRelInfo directly.
>
> Accessing mtstate->resultRelInfo directly would do. The only
> constraint on where this block should be placed is that
> ri_projectReturning must be valid as of calling
> GetForeignModifyBatchSize(), as Tsunakawa-san pointed out upthread.
> So, after this block in ExecInitModifyTable:
>
>     /*
>      * Initialize RETURNING projections if needed.
>      */
>     if (node->returningLists)
>     {
>         /*
>          * Build a projection for each result rel.
>          */
>         resultRelInfo = mtstate->resultRelInfo;
>         foreach(l, node->returningLists)
>         {
>             List       *rlist = (List *) lfirst(l);
>
>             resultRelInfo->ri_returningList = rlist;
>             resultRelInfo->ri_projectReturning =
>                 ExecBuildProjectionInfo(rlist, econtext, slot,
>                                         &mtstate->ps,
>                                         resultRelInfo->ri_RelationDesc->rd_att);
>             resultRelInfo++;
>         }
>     }

Right. But I think Tom is right this should initialize ri_BatchSize for
all the resultRelInfo elements, not just the first one. Per the attached
patch, which resolves the issue both on x86_64 and armv7l for me.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 9c36860704..10febcae8a 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -2798,17 +2798,23 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	 * size (a FDW may support batching, but it may be disabled for the
 	 * server/table).
 	 */
-	if (!resultRelInfo->ri_usesFdwDirectModify &&
-		operation == CMD_INSERT &&
-		resultRelInfo->ri_FdwRoutine != NULL &&
-		resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
-		resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
-		resultRelInfo->ri_BatchSize =
-			resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
-	else
-		resultRelInfo->ri_BatchSize = 1;
+	resultRelInfo = mtstate->resultRelInfo;
+	for (i = 0; i < nplans; i++)
+	{
+		if (!resultRelInfo->ri_usesFdwDirectModify &&
+			operation == CMD_INSERT &&
+			resultRelInfo->ri_FdwRoutine != NULL &&
+			resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
+			resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
+			resultRelInfo->ri_BatchSize =
+				resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
+		else
+			resultRelInfo->ri_BatchSize = 1;
+
+		Assert(resultRelInfo->ri_BatchSize >= 1);
 
-	Assert(resultRelInfo->ri_BatchSize >= 1);
+		resultRelInfo++;
+	}
 
 	/*
 	 * Lastly, if this is not the primary (canSetTag) ModifyTable node, add it
Re: Printing backtrace of postgres processes
On Wed, 20 Jan 2021 at 05:23, Tom Lane wrote:
> Recursion is scary, but it should (I think) not be possible if this
> is driven off CHECK_FOR_INTERRUPTS.  There will certainly be none
> of those in libbacktrace.

We can also hold interrupts for the call, and it might be wise to do so.

> One point here is that it might be a good idea to suppress elog.c's
> calls to functions in the error context stack.  As we saw in another
> connection recently, allowing that to happen makes for a *very*
> large increase in the footprint of code that you are expecting to
> work at any random CHECK_FOR_INTERRUPTS call site.

I strongly agree. Treat it as errhidecontext().

> BTW, it also looks like the patch is doing nothing to prevent the
> backtrace from being sent to the connected client.  I'm not sure
> what I think about whether it'd be okay from a security standpoint
> to do that on the connection that requested the trace, but I sure
> as heck don't want it to happen on connections that didn't.

I don't see a good reason to send a bt to a client. Even though these
backtraces won't be analysing debuginfo and populating args, locals, etc,
it should still just go to the server log.

> Maybe, given all of these things, we should forget using elog at
> all and just emit the trace with fprintf(stderr).

That causes quite a lot of pain with MemoryContextStats() already, as it's
frequently difficult to actually locate the output given the variations
that exist in customer logging configurations. Sometimes stderr goes to a
separate file or to journald. It's also much harder to locate the desired
output since there's no log_line_prefix. I have a WIP patch floating around
somewhere that tries to teach MemoryContextStats to write to the ereport
channel when not called during an actual out-of-memory situation for that
reason; an early version is somewhere in the archives. This is one of those
"ok in development, painful in production" situations. So I'm not a big fan
of pushing it to stderr, though I'd rather have that than not have the
ability at all.
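A sketch of the interrupt-holding idea; the trace-emitting helper name is invented for illustration, standing in for whatever the patch ends up calling:

    /*
     * If the trace is emitted from a CHECK_FOR_INTERRUPTS() call site,
     * holding interrupts for the duration prevents the backtrace code
     * from being re-entered.  emit_process_backtrace() is hypothetical.
     */
    HOLD_INTERRUPTS();
    emit_process_backtrace();
    RESUME_INTERRUPTS();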
Re: POC: postgres_fdw insert batching
On 1/21/21 2:22 AM, Tom Lane wrote: Tomas Vondra writes: I may be wrong, but the most likely explanation seems to be this is due to the junk filter initialization, which simply moves past the end of the mtstate->resultRelInfo array. resultRelInfo is certainly pointing at garbage at that point. Yup. It's pretty amazing the x86 machines seem to be mostly OK with it. It kinda seems the GetForeignModifyBatchSize call should happen before that block. The attached patch fixes this for me (i.e. regression tests pass with no valgrind reports. Or did I get that wrong? Don't we need to initialize ri_BatchSize for *each* resultrelinfo, not merely the first one? That is, this new code needs to be somewhere inside a loop over the result rels. Yeah, I think you're right. That's an embarrassing oversight :-( regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
RE: Parallel INSERT (INTO ... SELECT ...)
> > Thanks for the feedback. > Posting an updated set of patches. Changes are based on feedback, as detailed > below: Hi, It seems there are some previous comments [1][2] that were not addressed in the current patch. Just making sure they are not missed. [1] https://www.postgresql.org/message-id/77e1c06ffb2240838e5fc94ec8dcb7d3%40G08CNEXMBPEKD05.g08.fujitsu.local [2] https://www.postgresql.org/message-id/CAA4eK1LMmz58ej5BgVLJ8VsUGd%3D%2BKcaA8X%3DkStORhxpfpODOxg%40mail.gmail.com Best regards, houzj
Re: POC: postgres_fdw insert batching
Hi, Takayuki-san: My first name is Zhihong. You can call me Ted if you want to save some typing :-) Cheers On Wed, Jan 20, 2021 at 5:37 PM tsunakawa.ta...@fujitsu.com < tsunakawa.ta...@fujitsu.com> wrote: > From: Zhihong Yu > > > Do we need to consider how this part of code inside > ExecInitModifyTable() would evolve ? > > > > > I think placing the compound condition toward the end of > ExecInitModifyTable() is reasonable because it checks the latest > information. > > > > +1 for Zaihong-san's idea. But instead of rewriting every relsultRelInfo > to mtstate->resultRelInfo, which makes it a bit harder to read, I'd like to > suggest just adding "resultRelInfo = mtstate->resultRelInfo;" immediately > before the if block. > > > > Thanks a lot, all for helping to solve the problem quickly! > > > > > > Regards > > Takayuki Tsunakawa > > > >
RE: POC: postgres_fdw insert batching
From: Tomas Vondra > Right. But I think Tom is right this should initialize ri_BatchSize for all > the resultRelInfo elements, not just the first one. Per the attached patch, which > resolves the issue both on x86_64 and armv7l for me. I think your patch is perfect in the sense that it's ready for the future multi-target DML support. +1 Just for learning, could anyone tell me what this loop is for? I thought current Postgres DML supports a single target table, so it's enough to handle the first element of mtstate->resultRelInfo. In that sense, Amit-san and I agreed not to put the if block in the for loop yet. Regards Takayuki Tsunakawa
Re: POC: postgres_fdw insert batching
On Thu, Jan 21, 2021 at 10:42 AM Tomas Vondra wrote: > On 1/21/21 2:24 AM, Amit Langote wrote: > > On Thu, Jan 21, 2021 at 9:56 AM Tomas Vondra > > wrote: > >> On 1/21/21 1:17 AM, Zhihong Yu wrote: > >>> Hi, > >>> The assignment to resultRelInfo is done when junk_filter_needed is true: > >>> > >>> if (junk_filter_needed) > >>> { > >>> resultRelInfo = mtstate->resultRelInfo; > >>> > >>> Should the code for determining batch size access mtstate->resultRelInfo > >>> directly ? > >>> > >> > >> IMO the issue is that code iterates over all plans and moves to the next > >> for each one: > >> > >> resultRelInfo++; > >> > >> so it ends up pointing past the last element, hence the failures. So > >> yeah, either the code needs to move before the loop (per my patch), or > >> we need to access mtstate->resultRelInfo directly. > > > > Accessing mtstate->resultRelInfo directly would do. The only > > constraint on where this block should be placed is that > > ri_projectReturning must be valid as of calling > > GetForeignModifyBatchSize(), as Tsunakawa-san pointed out upthread. > > So, after this block in ExecInitModifyTable: > > > > /* > > * Initialize RETURNING projections if needed. > > */ > > if (node->returningLists) > > { > > > > /* > > * Build a projection for each result rel. > > */ > > resultRelInfo = mtstate->resultRelInfo; > > foreach(l, node->returningLists) > > { > > List *rlist = (List *) lfirst(l); > > > > resultRelInfo->ri_returningList = rlist; > > resultRelInfo->ri_projectReturning = > > ExecBuildProjectionInfo(rlist, econtext, slot, > > &mtstate->ps, > > > > resultRelInfo->ri_RelationDesc->rd_att); > > resultRelInfo++; > > } > > } > > > > Right. But I think Tom is right this should initialize ri_BatchSize for > all the resultRelInfo elements, not just the first one. Per the attached > patch, which resolves the issue both on x86_64 and armv7l for me. +1 in general. To avoid looping uselessly in the case of UPDATE/DELETE where batching can't be used today, I'd suggest putting if (operation == CMD_INSERT) around the loop. -- Amit Langote EDB: http://www.enterprisedb.com
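Folding that guard into the loop from the earlier patch might look like this (sketch only; variable names follow the ExecInitModifyTable code quoted above):

    /* batching only applies to INSERT, so skip the loop entirely otherwise */
    if (operation == CMD_INSERT)
    {
        resultRelInfo = mtstate->resultRelInfo;
        for (i = 0; i < nplans; i++)
        {
            if (!resultRelInfo->ri_usesFdwDirectModify &&
                resultRelInfo->ri_FdwRoutine != NULL &&
                resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
                resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
                resultRelInfo->ri_BatchSize =
                    resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
            else
                resultRelInfo->ri_BatchSize = 1;

            Assert(resultRelInfo->ri_BatchSize >= 1);
            resultRelInfo++;
        }
    }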
Re: Printing backtrace of postgres processes
Craig Ringer writes: > On Wed, 20 Jan 2021 at 05:23, Tom Lane wrote: >> BTW, it also looks like the patch is doing nothing to prevent the >> backtrace from being sent to the connected client. > I don't see a good reason to send a bt to a client. Even though these > backtraces won't be analysing debuginfo and populating args, locals, etc, > it should still just go to the server log. Yeah. That's easier than I was thinking, we just need to s/LOG/LOG_SERVER_ONLY/. >> Maybe, given all of these things, we should forget using elog at >> all and just emit the trace with fprintf(stderr). > That causes quite a lot of pain with MemoryContextStats() already True. Given the changes discussed in the last couple messages, I don't see any really killer reasons why we can't ship the trace through elog. We can always try that first, and back off to fprintf if we do find reasons why it's too unstable. regards, tom lane
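A sketch of what the elog path might then look like, assuming the formatted trace has already been collected into a StringInfo named buf (that name and the message wording are assumptions):

    /* server log only; suppress statement and error-context decoration */
    ereport(LOG_SERVER_ONLY,
            (errmsg_internal("backtrace of postgres process %d:\n%s",
                             MyProcPid, buf.data),
             errhidestmt(true),
             errhidecontext(true)));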