Re: Re: parallel distinct union and aggregate support patch

2020-11-07 Thread Dilip Kumar
On Tue, Nov 3, 2020 at 6:06 PM Dilip Kumar  wrote:
>
> On Thu, Oct 29, 2020 at 12:53 PM bu...@sohu.com  wrote:
> >
> > > 1) It's better to always include the whole patch series - including the
> > > parts that have not changed. Otherwise people have to scavenge the
> > > thread and search for all the pieces, which may be a source of issues.
> > > Also, it confuses the patch tester [1] which tries to apply patches from
> > > a single message, so it will fail for this one.
> >  Patches 3 and 4 do not rely on 1 and 2 in the code.
> >  However, applying patches 3 and 4 directly will fail, because they
> >  were written on top of 1 and 2.
> >  I can generate a new single patch if you need one.
> >
> > > 2) I suggest you try to describe the goal of these patches, using some
> > > example queries, explain output etc. Right now the reviewers have to
> > > reverse engineer the patches and deduce what the intention was, which
> > > may be causing unnecessary confusion etc. If this was my patch, I'd try
> > > to create a couple examples (CREATE TABLE + SELECT + EXPLAIN) showing
> > > how the patch changes the query plan, showing speedup etc.
> >  I have written some example queries into the regression tests, covering
> >  "unique", "union", "group by", and "group by grouping sets".
> >  Here are my tests; they are not in the regression suite:
> > ```sql
> > begin;
> > create table gtest(id integer, txt text);
> > insert into gtest select t1.id,'txt'||t1.id from (select generate_series(1,1000*1000) id) t1,(select generate_series(1,10) id) t2;
> > analyze gtest;
> > commit;
> > set jit = off;
> > \timing on
> > ```
> > normal aggregate times
> > ```
> > set enable_batch_hashagg = off;
> > explain (costs off,analyze,verbose)
> > select sum(id),txt from gtest group by txt;
> >                                     QUERY PLAN
> > ----------------------------------------------------------------------------
> >  Finalize GroupAggregate (actual time=6469.279..8947.024 rows=100 loops=1)
> >    Output: sum(id), txt
> >    Group Key: gtest.txt
> >    ->  Gather Merge (actual time=6469.245..8165.930 rows=158 loops=1)
> >          Output: txt, (PARTIAL sum(id))
> >          Workers Planned: 2
> >          Workers Launched: 2
> >          ->  Sort (actual time=6356.471..7133.832 rows=53 loops=3)
> >                Output: txt, (PARTIAL sum(id))
> >                Sort Key: gtest.txt
> >                Sort Method: external merge  Disk: 11608kB
> >                Worker 0:  actual time=6447.665..7349.431 rows=317512 loops=1
> >                  Sort Method: external merge  Disk: 10576kB
> >                Worker 1:  actual time=6302.882..7061.157 rows=01 loops=1
> >                  Sort Method: external merge  Disk: 2kB
> >                ->  Partial HashAggregate (actual time=2591.487..4430.437 rows=53 loops=3)
> >                      Output: txt, PARTIAL sum(id)
> >                      Group Key: gtest.txt
> >                      Batches: 17  Memory Usage: 4241kB  Disk Usage: 113152kB
> >                      Worker 0:  actual time=2584.345..4486.407 rows=317512 loops=1
> >                        Batches: 17  Memory Usage: 4241kB  Disk Usage: 101392kB
> >                      Worker 1:  actual time=2584.369..4393.244 rows=01 loops=1
> >                        Batches: 17  Memory Usage: 4241kB  Disk Usage: 112832kB
> >                      ->  Parallel Seq Scan on public.gtest (actual time=0.691..603.990 rows=333 loops=3)
> >                            Output: id, txt
> >                            Worker 0:  actual time=0.104..607.146 rows=3174970 loops=1
> >                            Worker 1:  actual time=0.100..603.951 rows=3332785 loops=1
> >  Planning Time: 0.226 ms
> >  Execution Time: 9021.058 ms
> > (29 rows)
> >
> > Time: 9022.251 ms (00:09.022)
> >
> > set enable_batch_hashagg = on;
> > explain (costs off,analyze,verbose)
> > select sum(id),txt from gtest group by txt;
> >                                  QUERY PLAN
> > ---------------------------------------------------------------------------
> >  Gather (actual time=3116.666..5740.826 rows=100 loops=1)
> >    Output: (sum(id)), txt
> >    Workers Planned: 2
> >    Workers Launched: 2
> >    ->  Parallel BatchHashAggregate (actual time=3103.181..5464.948 rows=33 loops=3)
> >          Output: sum(id), txt
> >          Group Key: gtest.txt
> >          Worker 0:  actual time=3094.550..5486.992 rows=326082 loops=1
> >          Worker 1:  actual time=3099.562..5480.111 rows=324729 loops=1
> >          ->  Parallel Seq Scan on public.gtest (actual time=0.791..656.601 rows=333 loops=3)
> >                Output: id, txt
> >                Worker 0:  actual time=0.080..646.053 rows=3057680 loops=1
> >                Worker 1:  actual time=0.070..662.754 rows=3034370 loops=1
> >  Planning Time: 0.243 ms
> >  Execution Time: 5788.981 ms
> > ```

Re: Yet another (minor) fix in BRIN

2020-11-07 Thread Tom Lane
Alvaro Herrera  writes:
> On 2020-Nov-08, Tomas Vondra wrote:
>> While rebasing some of the BRIN patches, I noticed some of the code in
>> brin_memtuple_initialize is duplicated. This happened in 8bf74967dab
>> which moved some of the code from brin_new_memtuple, not removing the
>> shared pieces. In practice this is benign, of course.
>> 
>> Barring objections I'll get the attached fix committed and backpatched.

> LGTM, thanks for noticing.

The weekend before stable-branch releases is probably not the best
time to be pushing "minor" fixes into those branches.  I got my
fingers burned today, and so did Peter.  Don't follow our example ;-)

regards, tom lane




Re: Yet another (minor) fix in BRIN

2020-11-07 Thread Alvaro Herrera
On 2020-Nov-08, Tomas Vondra wrote:

> While rebasing some of the BRIN patches, I noticed some of the code in
> brin_memtuple_initialize is duplicated. This happened in 8bf74967dab
> which moved some of the code from brin_new_memtuple, not removing the
> shared pieces. In practice this is benign, of course.
> 
> Barring objections I'll get the attached fix committed and backpatched.

LGTM, thanks for noticing.





Yet another (minor) fix in BRIN

2020-11-07 Thread Tomas Vondra
Hi,

While rebasing some of the BRIN patches, I noticed some of the code in
brin_memtuple_initialize is duplicated. This happened in 8bf74967dab
which moved some of the code from brin_new_memtuple, not removing the
shared pieces. In practice this is benign, of course.

Barring objections I'll get the attached fix committed and backpatched.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From d97e3160921caa6d2191b7e8539d4d4f4816219c Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Sun, 8 Nov 2020 02:11:45 +0100
Subject: [PATCH] Remove duplicate code in brin_memtuple_initialize

Commit 8bf74967dab moved some of the code from brin_new_memtuple to
brin_memtuple_initialize, but this resulted in some of the code being
duplicated. Fix by removing the duplicate lines and backpatch to 10.

Author: Tomas Vondra
Backpatch-through: 10
Discussion: TBD
---
 src/backend/access/brin/brin_tuple.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/src/backend/access/brin/brin_tuple.c b/src/backend/access/brin/brin_tuple.c
index 6774f597a4..17e50de530 100644
--- a/src/backend/access/brin/brin_tuple.c
+++ b/src/backend/access/brin/brin_tuple.c
@@ -491,9 +491,6 @@ brin_memtuple_initialize(BrinMemTuple *dtuple, BrinDesc *brdesc)
  sizeof(BrinValues) * brdesc->bd_tupdesc->natts);
 	for (i = 0; i < brdesc->bd_tupdesc->natts; i++)
 	{
-		dtuple->bt_columns[i].bv_allnulls = true;
-		dtuple->bt_columns[i].bv_hasnulls = false;
-
 		dtuple->bt_columns[i].bv_attno = i + 1;
 		dtuple->bt_columns[i].bv_allnulls = true;
 		dtuple->bt_columns[i].bv_hasnulls = false;
-- 
2.26.2



Re: First-draft release notes for back branches are up

2020-11-07 Thread Tomas Vondra
On 11/7/20 11:21 PM, Tom Lane wrote:
> Tomas Vondra  writes:
>> On 11/7/20 3:52 AM, Tom Lane wrote:
>>> Do you have some suggested text?
> 
>> I think this might work:
> 
> I dunno ... given that we have zero field reports, I doubt this is
> something we need to tell every BRIN user to do.  The text I put in
> earlier today just recommends reindexing if you see the error.
> 

Sorry, I hadn't noticed that you had already written something :-( I agree
it's enough to recommend reindexing only when there's an error.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Rethinking LOCK TABLE's behavior on views

2020-11-07 Thread Noah Misch
On Sat, Nov 07, 2020 at 11:57:20AM -0500, Tom Lane wrote:
> The problems discussed in bug #16703 [1] show that pg_dump needs a
> version of LOCK TABLE that behaves differently for views than
> what we have now.  Since v11, LOCK TABLE on a view recurses to all
> tables and views named in the view, and it does so using the view
> owner's permissions, meaning that a view that would have permissions
> failures if executed will also have permissions failures when locked.
> That's probably fine for ordinary usage, but it's disastrous for
> pg_dump --- even a superuser can't lock such a view.
> 
> Moreover, pg_dump doesn't really need the recursive behavior.  It just
> needs the view's definition to hold still; and in any case, a typical
> pg_dump run would have independently acquired locks on all the other
> relations anyway.  The recursion is buying us nothing, except perhaps
> an increased risk of deadlocks against concurrent DDL operations.

The getTables() locking aims to take the locks that will be taken later.  That
avoids failing after expensive work.  For views, the later lock-taker is
pg_get_viewdef(), which locks more than just the view but less than[2] LOCK
TABLE.  Recursion buys us more than nothing for "pg_dump --table=viewname", so
abandoning recursion unconditionally is a step in the wrong direction.  I
don't expect --table to be as excellent as complete dumps, but a change that
makes it worse does lose points.  I want to keep the recursion.

> (I'm not quite sure if that's significant, given that pg_dump pays
> no attention to the order in which it locks things.  But it sure as
> heck isn't *decreasing* the risk; and it's a behavior that we could
> not hope to improve with more smarts about pg_dump's lock ordering.)

Reordering to avoid deadlocks would be best-effort, so it's fine not to have
full control over the order.

> Closely related to this is whether pg_dump ought to be using ONLY for
> locking regular tables too.  I tend to think that it should be, again
> on the grounds that any child tables we may be interested in will get
> locked separately, so that we're not doing anything by recursing except
> expending extra cycles and perhaps increasing the chance of a deadlock.

Agreed.  "pg_dump --table=inheritance_parent" never queries inheritance
children, so it's nice not to lock them.
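
As a minimal illustration of the difference (a hypothetical session; the
object names are made up to match the example above):

```sql
-- Sketch: LOCK TABLE ONLY leaves inheritance children untouched,
-- while plain LOCK TABLE recurses to them.
create table inheritance_parent (a int);
create table inheritance_child () inherits (inheritance_parent);

begin;
lock table only inheritance_parent in access share mode;  -- parent only
-- versus: lock table inheritance_parent;  -- would also lock the child
rollback;
```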

> A completely different approach we could consider is to weaken the
> permissions requirements for LOCK on a view, say "allow it if either
> the calling user or the view owner has the needed permission".  This
> seems generally pretty messy and so I don't much like it, but we
> should consider as many solutions as we can think of.

This is the best of what you've listed by a strong margin, and I don't know of
better options you've not listed.  +1 for it.  Does it work for you?  I think
the mess arises from LOCK TABLE serving "get locks sufficient for $ACTIONS" as
a family of use cases.  For views only, different $ACTIONS want different
behavior.  $ACTIONS==SELECT wants today's behavior; pg_get_viewdef() wants
shallower recursion and caller permissions; DROP VIEW wants no recursion.


> [1] 
> https://www.postgresql.org/message-id/flat/16703-e348f58aab3cf6cc%40postgresql.org

[2] For example, pg_get_viewdef('pg_user') locks pg_shadow, but "LOCK TABLE
pg_user" additionally locks pg_authid and pg_db_role_setting.




Re: PG13: message style changes

2020-11-07 Thread Alvaro Herrera
Thanks for looking!  Pushed.





Re: First-draft release notes for back branches are up

2020-11-07 Thread Tom Lane
Tomas Vondra  writes:
> On 11/7/20 3:52 AM, Tom Lane wrote:
>> Do you have some suggested text?

> I think this might work:

I dunno ... given that we have zero field reports, I doubt this is
something we need to tell every BRIN user to do.  The text I put in
earlier today just recommends reindexing if you see the error.

regards, tom lane




Re: redundant error messages

2020-11-07 Thread Peter Eisentraut

On 2020-11-05 13:27, Peter Eisentraut wrote:

> A few client tools duplicate error messages already provided by libpq,
> such as
>
> pg_rewind: fatal: could not connect to server: could not connect to server: No such file or directory
>
> pg_basebackup: error: could not connect to server: could not connect to server: No such file or directory
>
> psql: error: could not connect to server: could not connect to server: No such file or directory
>
> The psql case is actually a regression introduced in PG12, but the other
> two appear to be ancient.


I have committed fixes for these.




Re: First-draft release notes for back branches are up

2020-11-07 Thread Tomas Vondra
Hi,

On 11/7/20 3:52 AM, Tom Lane wrote:
> Tomas Vondra  writes:
>> We should probably include instructions what to do about the BRIN
>> data corruption fixed by 7577dd8480 - a query to list might be
>> affected by the bug and should be rebuilt.
> 
> Do you have some suggested text?
> 

I think this might work:

Fix handling of toasted values in BRIN indexes

This mistake can result in BRIN indexes referencing toasted values,
which may unexpectedly disappear after a cleanup of the table, leading
to errors due to missing chunks:

  ERROR:  missing chunk number 0 for toast value 16433 in pg_toast_16426

Fortunately, this issue only affects BRIN indexes on columns with
toastable data types. While no index corruption due to this bug is
known to have occurred in the field, it is recommended that production
installations REINDEX all BRIN indexes at a convenient time after
upgrading to X.X.

The list of potentially corrupted BRIN indexes may be obtained using
this query:

  select pg_get_indexdef(oid)
    from pg_class
   where relkind = 'i'
     and relam = 3580
     and exists (select 1 from pg_attribute
                  where attrelid = pg_class.oid
                    and attlen = -1
                    and attstorage in ('x', 'e'))

It might be better to propose CREATE INDEX CONCURRENTLY, but I don't
think there is a function to generate that SQL.
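
A possible workaround sketch for v12 and later (not something I'd put in
the notes as-is): REINDEX ... CONCURRENTLY exists there, and its command
text is trivial to generate, unlike a full CREATE INDEX CONCURRENTLY
statement:

```sql
-- Sketch for v12+: generate REINDEX INDEX CONCURRENTLY commands for the
-- potentially affected BRIN indexes, using the same filter as above.
select format('REINDEX INDEX CONCURRENTLY %s;', oid::regclass)
  from pg_class
 where relkind = 'i'
   and relam = 3580                  -- BRIN's access method OID
   and exists (select 1 from pg_attribute
                where attrelid = pg_class.oid
                  and attlen = -1    -- varlena attribute...
                  and attstorage in ('x', 'e'));  -- ...with toastable storage
```

Branches before v12 would still need plain REINDEX, with its stronger lock.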

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: errors when there is a bit literal in ecpg

2020-11-07 Thread Tom Lane
"Wang, Shenhao"  writes:
> I ran into the errors below when using ecpg:

> a.pgc:5: ERROR: invalid bit string literal
> a.pgc:5: ERROR: internal error: unreachable state; please report this to 

Indeed.  This has apparently been broken for a very long time (though
the "unreachable state" part is fairly new).

> I tried to fix this bug by deleting 'addlitchar('b');' from the source. I also
> added a test case to test all const strings in ecpg.

I thought that a whole new test case was overkill when we could just add a
couple of lines to an existing test.  Other than that, looks good, pushed.

regards, tom lane




Re: Additional Chapter for Tutorial

2020-11-07 Thread Erik Rijkers

On 2020-11-07 13:24, Jürgen Purtz wrote:

> Because there have been no more comments in the last days I created a
> consolidated patch. It contains Erik's suggestion and some tweaks for
> the text size within graphics.
>
> [0011-architecture.patch]


Hi,

I went through architecture.sgml once more; some proposed changes 
attached.


And in some .svg files I noticed 'jungest' which should be 'youngest', I 
suppose.

I did not change them, but below is the file list from grep -l 'jung'.

./doc/src/sgml/images/freeze-ink.svg
./doc/src/sgml/images/freeze-ink-svgo.svg
./doc/src/sgml/images/freeze-raw.svg
./doc/src/sgml/images/wraparound-ink.svg
./doc/src/sgml/images/wraparound-ink-svgo.svg
./doc/src/sgml/images/wraparound-raw.svg


Thanks,

Erik


--- doc/src/sgml/architecture.sgml.orig	2020-11-07 14:05:50.188396026 +0100
+++ doc/src/sgml/architecture.sgml	2020-11-07 20:04:27.890983873 +0100
@@ -24,7 +24,7 @@
 This leads to other activities (file access, WAL, vacuum, ...) of the
 Instance. The
 Instance is a group of server-side processes acting on a common
-Shared Memory. PostgreSQL does not utilize threading.
+Shared Memory. PostgreSQL does not use threading.

 

@@ -78,7 +78,7 @@
 cache, located in Shared Memory (for details see:
 ) for the benefit of all processes
 in the instance. Writes also use this cache, in addition
-to a journal, called a write-ahead-log or WAL.
+to a journal, called the write-ahead-log or WAL.

 

@@ -90,20 +90,20 @@
 must be written to disk. This happens regularly by the
 Checkpointer and the Background Writer processes to ensure
 that the disk version of the pages are up-to-date.
-The synchronisation from RAM to disk consists of two steps.
+The synchronization from RAM to disk consists of two steps.

 

 First, whenever the content of a page changes, a
 WAL record
-is created out of the delta-information (difference between the
+is created from the delta-information (difference between the
 old and the new content) and stored in another area of
 Shared Memory. The parallel running WAL Writer process
 reads them and appends them to the end of the current
 WAL file.
 Such sequential writes are faster than writes to random
 positions of heap and index files. All WAL records created
-out of one dirty page must be transferred to disk before the
+from one dirty page must be transferred to disk before the
 dirty page itself can be transferred to disk in the second step.

 
@@ -123,22 +123,22 @@
 Checkpoints.
 A Checkpoint is a point in time when all older dirty buffers,
 all older WAL records, and finally a special Checkpoint record
-are written and flushed to disk. Heap and index files
-on the one hand and WAL files on the other hand are in sync.
-Previous WAL is no longer required. In other words,
+are written and flushed to disk. Heap and index files,
+and WAL files are now in sync.
+Older WAL is no longer required. In other words,
 a possibly occurring recovery, which integrates the delta
 information of WAL into heap and index files, will happen
-by replaying only WAL past the last recorded checkpoint.
-This limits the amount of WAL which needs to be replayed
+by replaying only WAL past the last-recorded checkpoint.
+This limits the amount of WAL to be replayed
 during recovery in the event of a crash.

 

-While the Checkpointer ensures that the database system can crash
-and restart itself in a valid state, the administrator needs
+While the Checkpointer ensures that the database system can,
+after a crash, restart itself in a valid state, the administrator needs
 to handle the case where the heap or other files become
 corrupted (and possibly the locally written WAL, though that is
-less common). The options and details are covered extensively
+less common). Options and details are covered
 in the backup and restore section ().
 For our purposes here, just note that the WAL Archiver process
 can be enabled and configured to run a script on filled WAL
@@ -153,7 +153,7 @@
 

 The Logger writes text lines about serious and less serious
-events which can happen during database access, e.g., wrong
+events that may happen during database access, e.g., wrong
 password, no permission, long-running queries, etc.

 
@@ -262,7 +262,7 @@
 all tablespace names, and all
 role names are automatically
 available throughout the cluster, independent from
-the database or schema in which they where defined originally.
+the database or schema in which they were defined originally.
 
 shows the relation between the object types.

@@ -410,13 +410,11 @@

 

-A first approach to implement protections against concurrent
-access to the same data may be the locking of critical rows.
-PostgreSQL

Re: pgbench stopped supporting large number of client connections on Windows

2020-11-07 Thread Ranier Vilela
On Sat, Nov 7, 2020 at 14:55, Marina Polyakova <m.polyak...@postgrespro.ru> wrote:

> On 2020-11-06 23:54, Ranier Vilela wrote:
> > Hi Marina,
>
> Hello!
>
> 1) If you mean the function pgwin32_select in the file
> src/backend/port/win32/socket.c, IIUC it is only used in the backend,
> see src/include/port/win32_port.h:
>
> #ifndef FRONTEND
> <...>
> #define select(n, r, w, e, timeout) pgwin32_select(n, r, w, e, timeout)
> <...>
> #endif  /* FRONTEND */
>
Yes, my mistake; you are right here.


> 2) It looks like FD_SETSIZE does not set a limit on the socket value on
> Windows, see
>
> https://docs.microsoft.com/en-us/windows/win32/winsock/maximum-number-of-sockets-supported-2
> :
>
> The maximum number of sockets that a Windows Sockets application can use
> is not affected by the manifest constant FD_SETSIZE. This value defined
> in the Winsock2.h header file is used in constructing the FD_SET
> structures used with select function.
>
Correct.
It seems the limit can be set at compile time, before the inclusion of
Winsock2.h.
Have you tried defining -DFD_SETSIZE=2048?

best regards,
Ranier Vilela


Re: pgbench stopped supporting large number of client connections on Windows

2020-11-07 Thread Marina Polyakova

On 2020-11-06 23:54, Ranier Vilela wrote:

> Hi Marina,

Hello!

> Nice catch.

Thank you!

> src/bin/pgbench/pgbench.c, the function add_socket_to_set:
>
> if (fd < 0 || fd >= FD_SETSIZE)
> {
>     /*
>      * Doing a hard exit here is a bit grotty, but it doesn't seem worth
>      * complicating the API to make it less grotty.
>      */
>     pg_log_fatal("too many client connections for select()");
>     exit(1);
> }
>
> It seems to me that the limit is hardcoded in
> src/backend/port/win32/socket.c:
>
> FD_SETSIZE * 2
>
> which would be 2048?


1) If you mean the function pgwin32_select in the file 
src/backend/port/win32/socket.c, IIUC it is only used in the backend, 
see src/include/port/win32_port.h:


#ifndef FRONTEND
<...>
#define select(n, r, w, e, timeout) pgwin32_select(n, r, w, e, timeout)
<...>
#endif  /* FRONTEND */

2) It looks like FD_SETSIZE does not set a limit on the socket value on 
Windows, see 
https://docs.microsoft.com/en-us/windows/win32/winsock/maximum-number-of-sockets-supported-2 
:


The maximum number of sockets that a Windows Sockets application can use 
is not affected by the manifest constant FD_SETSIZE. This value defined 
in the Winsock2.h header file is used in constructing the FD_SET 
structures used with select function.


--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




RE: pgbench: option delaying queries till connections establishment?

2020-11-07 Thread Fabien COELHO



Hello,


>> Indeed. I took your next patch with an added explanation. I'm unclear
>> whether proceeding makes much sense though, that is some threads would
>> run the test and others would have aborted. Hmmm.

> Your comment looks good, thanks. In the previous version pgbench starts
> benchmarking even if some connections fail. And users can notice the
> connection failure by stderr output. Hence the current specification may
> be enough.

Usually I run many pgbench instances through scripts, so I'm probably not
there to check a lone stderr failure at the beginning if performance
figures are actually reported.

> If you agree, please remove the following lines:
>
> ```
> + * It is unclear whether it is worth doing anything rather than
> + * coldly exiting with an error message.
> ```

I can remove the lines, but I strongly believe that reporting performance
figures when some client connection has failed, so that the bench could
not run as prescribed, is bad behavior. The good news is that it is
probably quite unlikely. So I'd prefer to keep them, and possibly submit a
patch to change the behavior.

>> ISTM that there is another patch in the queue which needs barriers to
>> delay some initialization so as to fix a corner case bug, in which case
>> the behavior would be mandatory. The current submission could add an
>> option to skip the barrier synchronization, but I'm not enthusiastic to
>> add an option and remove it shortly later.

> Could you tell me which patch you mean? Basically I agree with what you
> say, but I want to check it.

Should be this one: https://commitfest.postgresql.org/30/2624/,

--
Fabien.




Rethinking LOCK TABLE's behavior on views

2020-11-07 Thread Tom Lane
The problems discussed in bug #16703 [1] show that pg_dump needs a
version of LOCK TABLE that behaves differently for views than
what we have now.  Since v11, LOCK TABLE on a view recurses to all
tables and views named in the view, and it does so using the view
owner's permissions, meaning that a view that would have permissions
failures if executed will also have permissions failures when locked.
That's probably fine for ordinary usage, but it's disastrous for
pg_dump --- even a superuser can't lock such a view.
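
To make the failure mode concrete, here is a hypothetical reproduction
(role and object names invented for illustration):

```sql
-- Sketch: a view whose owner lacks privileges on the underlying table
-- cannot be locked even by a superuser, because since v11 the recursive
-- lock is checked with the view owner's permissions.
create role view_owner;
create table base_tbl (a int);
create view v as select a from base_tbl;
alter view v owner to view_owner;   -- view_owner has no rights on base_tbl

begin;
lock table v in access share mode;  -- run as superuser
-- ERROR:  permission denied for table base_tbl
rollback;
```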

Moreover, pg_dump doesn't really need the recursive behavior.  It just
needs the view's definition to hold still; and in any case, a typical
pg_dump run would have independently acquired locks on all the other
relations anyway.  The recursion is buying us nothing, except perhaps
an increased risk of deadlocks against concurrent DDL operations.
(I'm not quite sure if that's significant, given that pg_dump pays
no attention to the order in which it locks things.  But it sure as
heck isn't *decreasing* the risk; and it's a behavior that we could
not hope to improve with more smarts about pg_dump's lock ordering.)

So we need a way to turn that off.  What I proposed in that thread
was

> (1) Make pg_dump use LOCK TABLE ONLY, not LOCK TABLE.
> (2) Make LOCK TABLE ONLY on a view not recurse to the view's dependencies.

which would each be quite trivial to do.  An objection to this approach
is that ONLY typically controls recursion to a table's inheritance or
partitioning children, which is a different animal from recursion to
a view's dependencies.  That argument would lead to wanting some other
syntax to control this.  I do not find that argument compelling enough
to justify making pg_dump deal with two different commands depending
on the relation's relkind, but it is a plausible concern.

Closely related to this is whether pg_dump ought to be using ONLY for
locking regular tables too.  I tend to think that it should be, again
on the grounds that any child tables we may be interested in will get
locked separately, so that we're not doing anything by recursing except
expending extra cycles and perhaps increasing the chance of a deadlock.

A completely different approach we could consider is to weaken the
permissions requirements for LOCK on a view, say "allow it if either
the calling user or the view owner has the needed permission".  This
seems generally pretty messy and so I don't much like it, but we
should consider as many solutions as we can think of.

Thoughts?

regards, tom lane

[1] 
https://www.postgresql.org/message-id/flat/16703-e348f58aab3cf6cc%40postgresql.org




Re: Yet another fast GiST build

2020-11-07 Thread Andrey Borodin


> On Nov 5, 2020, at 22:20, Justin Pryzby wrote:
> 
> On Thu, Nov 05, 2020 at 10:11:52PM +0500, Andrey Borodin wrote:
>> To test that functions are actually called for sorting build we should 
>> support directive sorting build like "CREATE INDEX ON A USING GIST(B) 
>> WITH(sorting=surely,and fail if not)".
> 
> Maybe you could add a DEBUG1 message for that, and include that in regression
> tests, which would then fail if sorting wasn't used.

That's a good idea. Thanks!
> 
> Maybe you'd need to make it consistent by setting GUCs like work_mem /
> max_parallel_maintenance_workers / ??
> 
> Similar to this
> 
> postgres=# SET client_min_messages =debug;
> postgres=# CREATE INDEX ON A USING GIST(i) WITH(buffering=off);
> DEBUG:  building index "a_i_idx2" on table "a" serially
> CREATE INDEX

Currently, only B-tree uses parallel build, so there is no need to tweak
any GUCs other than client_min_messages.
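
For the archives, such a regression check might look like this (only a
sketch: the test table and especially the DEBUG message wording are
hypothetical, pending the actual patch):

```sql
-- Sketch of the discussed regression check: raise verbosity so the
-- build-method message is emitted, then build a GiST index and look for
-- the sorted-build message in the expected output.
set client_min_messages = debug1;
create table gist_sort_check (p point);
create index on gist_sort_check using gist (p);
-- hypothetical: DEBUG:  building index "gist_sort_check_p_idx" using sorted method
reset client_min_messages;
```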
Before these tests, ~20% of the opclasses were actually not working as
expected, even though I had checked each one by hand.

PFA patch with fixed comments from Heikki.

Thanks!

Best regards, Andrey Borodin.


v4-0001-Sortsupport-for-sorting-GiST-build-for-gist_btree.patch
Description: Binary data


Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module

2020-11-07 Thread Bharath Rupireddy
On Fri, Nov 6, 2020 at 11:00 PM Fujii Masao  wrote:
>
> >> I'm not quite sure to replace all the places in the walreceiver
> >> process, for instance in WalRcvForceReply() we are using spinlock to
> >> acquire the latch pointer and . Others may have better thoughts on
> >> this.
> >
> > The SIGTERM part looks good. The only difference between
> > WalRcvSigHupHandler and SignalHandlerForConfigReload is whether latch
> > is set or not.  I don't think it's a problem that config-reload causes
> > an extra wakeup.  Couldn't we do the same thing for SIGHUP?
>
> I agree that we can use even standard SIGHUP handler in walreceiver.
> I'm not sure why SetLatch() was not called in walreceiver's SIGHUP
> handler so far.
>

The main reason for having SetLatch() in
SignalHandlerForConfigReload() is to wake up the calling process, if it
is waiting in WaitLatchOrSocket() or WaitLatch(), so that it reloads the
new config file and uses the reloaded config variables. Maybe we should
think through the scenarios in which the walreceiver process waits, and
what happens if the latch is set when SIGHUP is received.

Also, I think it's worth having a look at commit 40f908bdcdc7, which
introduced WalRcvSigHupHandler(), and 597a87ccc, which replaced the
custom latch with procLatch.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel copy

2020-11-07 Thread vignesh C
On Thu, Nov 5, 2020 at 6:33 PM Hou, Zhijie  wrote:
>
> Hi
>
> >
> > my $bytes = $ARGV[0];
> > for(my $i = 0; $i < $bytes; $i+=8){
> >  print "longdata";
> > }
> > print "\n";
> > 
> >
> > postgres=# copy longdata from program 'perl /tmp/longdata.pl 1'
> > with (parallel 2);
> >
> > This gets stuck forever (or at least I didn't have the patience to wait
> > for it to finish). Both worker processes are consuming 100% of CPU.
>
> I had a look over this problem.
>
> the ParallelCopyDataBlock has size limit:
> uint8   skip_bytes;
> chardata[DATA_BLOCK_SIZE];  /* data read from file */
>
> It seems the input line is so long that the leader process runs out of the
> shared memory shared among the parallel copy workers,
> and the leader process keeps waiting for a free block.
>
> The worker process waits until line_state becomes LINE_LEADER_POPULATED,
> but the leader process won't set the line_state unless it has read the
> whole line.
>
> So it is stuck forever.
> Maybe we should reconsider this situation.
>
> The stack is as follows:
>
> Leader stack:
> #3  0x0075f7a1 in WaitLatch (latch=, 
> wakeEvents=wakeEvents@entry=41, timeout=timeout@entry=1, 
> wait_event_info=wait_event_info@entry=150994945) at latch.c:411
> #4  0x005a9245 in WaitGetFreeCopyBlock 
> (pcshared_info=pcshared_info@entry=0x7f26d2ed3580) at copyparallel.c:1546
> #5  0x005a98ce in SetRawBufForLoad (cstate=cstate@entry=0x2978a88, 
> line_size=67108864, copy_buf_len=copy_buf_len@entry=65536, 
> raw_buf_ptr=raw_buf_ptr@entry=65536,
> copy_raw_buf=copy_raw_buf@entry=0x7fff4cdc0e18) at copyparallel.c:1572
> #6  0x005a1963 in CopyReadLineText (cstate=cstate@entry=0x2978a88) at 
> copy.c:4058
> #7  0x005a4e76 in CopyReadLine (cstate=cstate@entry=0x2978a88) at 
> copy.c:3863
>
> Worker stack:
> #0  GetLinePosition (cstate=cstate@entry=0x29e1f28) at copyparallel.c:1474
> #1  0x005a8aa4 in CacheLineInfo (cstate=cstate@entry=0x29e1f28, 
> buff_count=buff_count@entry=0) at copyparallel.c:711
> #2  0x005a8e46 in GetWorkerLine (cstate=cstate@entry=0x29e1f28) at 
> copyparallel.c:885
> #3  0x005a4f2e in NextCopyFromRawFields 
> (cstate=cstate@entry=0x29e1f28, fields=fields@entry=0x7fff4cdc0b48, 
> nfields=nfields@entry=0x7fff4cdc0b44) at copy.c:3615
> #4  0x005a50af in NextCopyFrom (cstate=cstate@entry=0x29e1f28, 
> econtext=econtext@entry=0x2a358d8, values=0x2a42068, nulls=0x2a42070) at 
> copy.c:3696
> #5  0x005a5b90 in CopyFrom (cstate=cstate@entry=0x29e1f28) at 
> copy.c:2985
>

Thanks for providing your thoughts. I have analyzed this issue and am
working on a fix; I will post a patch shortly.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




hard readable explain due JIT

2020-11-07 Thread Pavel Stehule
Hi

I had to investigate a slowdown caused by very slow JIT preparation, and
found a very unpleasant situation where some subplan was strangely slow:

 ->  Hash Anti Join  (cost=11.35..19.41 rows=143 width=3011) (actual time=5039.022..5039.105 rows=203 loops=1)
       Hash Cond: (term_p.id_i = term_d_1.id_i)
       ->  Seq Scan on public.term_p  (cost=0.00..6.03 rows=203 width=24) (actual time=5038.980..5039.014 rows=203 loops=1)
             Filter: (term_p.to_h IS NULL)
       ->  Hash  (cost=10.60..10.60 rows=60 width=16) (actual time=0.008..0.009 rows=0 loops=1)
             Buckets: 1024  Batches: 1  Memory Usage: 8kB
             ->  Seq Scan on public.term_d term_d_1  (cost=0.00..10.60 rows=60 width=16) (actual time=0.008..0.008 rows=0 loops=1)

It looks very strange - why does a scan of 203 rows need 5 sec?

There is JIT overhead reported, but in a different place of the EXPLAIN output:

JIT:
  Functions: 416
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 61.002 ms, Inlining 67.897 ms, Optimization 2853.125 ms, Emission 2116.233 ms, Total 5098.258 ms

With some deduction we can infer that the slow seq scan is probably due to JIT.

But then tools like https://explain.depesz.com display very strange
statistics, so the JIT overhead should be attributed to the related plan
nodes too.
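
One quick way to confirm the suspicion (a sketch; these GUCs exist since
v11) is to disable the expensive JIT phases and re-run the query:

```sql
-- Sketch: skip the costly JIT phases and re-run the same query; if the
-- apparently slow Seq Scan speeds up, the time was spent compiling.
set jit_inline_above_cost = -1;    -- skip the Inlining phase
set jit_optimize_above_cost = -1;  -- skip the Optimization phase
-- re-run the EXPLAIN (ANALYZE, VERBOSE) query from above here

-- or, for a clean baseline, disable JIT entirely:
set jit = off;
```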

Regards

Pavel


Re: Move catalog toast table and index declarations

2020-11-07 Thread Peter Eisentraut

On 2020-11-05 12:59, John Naylor wrote:
> And yes, this doesn't materially change the patch, it's just nitpicking
> :-) . Materially, I believe it's fine.


OK, committed.

--
Peter Eisentraut
2ndQuadrant, an EDB company
https://www.2ndquadrant.com/