Re: Re: parallel distinct union and aggregate support patch
On Tue, Nov 3, 2020 at 6:06 PM Dilip Kumar wrote: > > On Thu, Oct 29, 2020 at 12:53 PM bu...@sohu.com wrote: > > > > > 1) It's better to always include the whole patch series - including the > > > parts that have not changed. Otherwise people have to scavenge the > > > thread and search for all the pieces, which may be a source of issues. > > > Also, it confuses the patch tester [1] which tries to apply patches from > > > a single message, so it will fail for this one. > > Patches 3 and 4 do not rely on 1 and 2 in code. > > But, it will fail when you apply patches 3 and 4 directly, because > > they are written after 1 and 2. > > I can generate a new single patch if you need. > > > > > 2) I suggest you try to describe the goal of these patches, using some > > > example queries, explain output etc. Right now the reviewers have to > > > reverse engineer the patches and deduce what the intention was, which > > > may be causing unnecessary confusion etc. If this was my patch, I'd try > > > to create a couple examples (CREATE TABLE + SELECT + EXPLAIN) showing > > > how the patch changes the query plan, showing speedup etc. > > I wrote some example queries into the regression tests, including "unique", "union", > > "group by" and "group by grouping sets". 
> > here is my tests, they are not in regress > > ```sql > > begin; > > create table gtest(id integer, txt text); > > insert into gtest select t1.id,'txt'||t1.id from (select > > generate_series(1,1000*1000) id) t1,(select generate_series(1,10) id) t2; > > analyze gtest; > > commit; > > set jit = off; > > \timing on > > ``` > > normal aggregate times > > ``` > > set enable_batch_hashagg = off; > > explain (costs off,analyze,verbose) > > select sum(id),txt from gtest group by txt; > > QUERY PLAN > > - > > Finalize GroupAggregate (actual time=6469.279..8947.024 rows=100 > > loops=1) > >Output: sum(id), txt > >Group Key: gtest.txt > >-> Gather Merge (actual time=6469.245..8165.930 rows=158 loops=1) > > Output: txt, (PARTIAL sum(id)) > > Workers Planned: 2 > > Workers Launched: 2 > > -> Sort (actual time=6356.471..7133.832 rows=53 loops=3) > >Output: txt, (PARTIAL sum(id)) > >Sort Key: gtest.txt > >Sort Method: external merge Disk: 11608kB > >Worker 0: actual time=6447.665..7349.431 rows=317512 loops=1 > > Sort Method: external merge Disk: 10576kB > >Worker 1: actual time=6302.882..7061.157 rows=01 loops=1 > > Sort Method: external merge Disk: 2kB > >-> Partial HashAggregate (actual time=2591.487..4430.437 > > rows=53 loops=3) > > Output: txt, PARTIAL sum(id) > > Group Key: gtest.txt > > Batches: 17 Memory Usage: 4241kB Disk Usage: 113152kB > > Worker 0: actual time=2584.345..4486.407 rows=317512 > > loops=1 > >Batches: 17 Memory Usage: 4241kB Disk Usage: > > 101392kB > > Worker 1: actual time=2584.369..4393.244 rows=01 > > loops=1 > >Batches: 17 Memory Usage: 4241kB Disk Usage: > > 112832kB > > -> Parallel Seq Scan on public.gtest (actual > > time=0.691..603.990 rows=333 loops=3) > >Output: id, txt > >Worker 0: actual time=0.104..607.146 > > rows=3174970 loops=1 > >Worker 1: actual time=0.100..603.951 > > rows=3332785 loops=1 > > Planning Time: 0.226 ms > > Execution Time: 9021.058 ms > > (29 rows) > > > > Time: 9022.251 ms (00:09.022) > > > > set enable_batch_hashagg 
= on; > > explain (costs off,analyze,verbose) > > select sum(id),txt from gtest group by txt; > >QUERY PLAN > > - > > Gather (actual time=3116.666..5740.826 rows=100 loops=1) > >Output: (sum(id)), txt > >Workers Planned: 2 > >Workers Launched: 2 > >-> Parallel BatchHashAggregate (actual time=3103.181..5464.948 > > rows=33 loops=3) > > Output: sum(id), txt > > Group Key: gtest.txt > > Worker 0: actual time=3094.550..5486.992 rows=326082 loops=1 > > Worker 1: actual time=3099.562..5480.111 rows=324729 loops=1 > > -> Parallel Seq Scan on public.gtest (actual time=0.791..656.601 > > rows=333 loops=3) > >Output: id, txt > >Worker 0: actual time=0.080..646.053 rows=3057680 loops=1 > >Worker 1: actual time=0.070..662.754 rows=3034370 loops=1 > > Planning Time: 0.243 ms > > Execution Time: 5788.981
Re: Yet another (minor) fix in BRIN
Alvaro Herrera writes: > On 2020-Nov-08, Tomas Vondra wrote: >> While rebasing some of the BRIN patches, I noticed some of the code in >> brin_memtuple_initialize is duplicated. This happened in 8bf74967dab >> which moved some of the code from brin_new_memtuple, not removing the >> shared pieces. In practice this is benign, of course. >> >> Barring objections I'll get the attached fix committed and backpatched. > LGTM, thanks for noticing. The weekend before stable-branch releases is probably not the best time to be pushing "minor" fixes into those branches. I got my fingers burned today, and so did Peter. Don't follow our example ;-) regards, tom lane
Re: Yet another (minor) fix in BRIN
On 2020-Nov-08, Tomas Vondra wrote: > While rebasing some of the BRIN patches, I noticed some of the code in > brin_memtuple_initialize is duplicated. This happened in 8bf74967dab > which moved some of the code from brin_new_memtuple, not removing the > shared pieces. In practice this is benign, of course. > > Barring objections I'll get the attached fix committed and backpatched. LGTM, thanks for noticing.
Yet another (minor) fix in BRIN
Hi, While rebasing some of the BRIN patches, I noticed some of the code in brin_memtuple_initialize is duplicated. This happened in 8bf74967dab which moved some of the code from brin_new_memtuple, not removing the shared pieces. In practice this is benign, of course. Barring objections I'll get the attached fix committed and backpatched. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company >From d97e3160921caa6d2191b7e8539d4d4f4816219c Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Sun, 8 Nov 2020 02:11:45 +0100 Subject: [PATCH] Remove duplicate code in brin_memtuple_initialize Commit 8bf74967dab moved some of the code from brin_new_memtuple to brin_memtuple_initialize, but this resulted in some of the code being duplicate. Fix by removing the duplicate lines and backpatch to 10. Author: Tomas Vondra Backpatch-through: 10 Discussion: TBD --- src/backend/access/brin/brin_tuple.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/src/backend/access/brin/brin_tuple.c b/src/backend/access/brin/brin_tuple.c index 6774f597a4..17e50de530 100644 --- a/src/backend/access/brin/brin_tuple.c +++ b/src/backend/access/brin/brin_tuple.c @@ -491,9 +491,6 @@ brin_memtuple_initialize(BrinMemTuple *dtuple, BrinDesc *brdesc) sizeof(BrinValues) * brdesc->bd_tupdesc->natts); for (i = 0; i < brdesc->bd_tupdesc->natts; i++) { - dtuple->bt_columns[i].bv_allnulls = true; - dtuple->bt_columns[i].bv_hasnulls = false; - dtuple->bt_columns[i].bv_attno = i + 1; dtuple->bt_columns[i].bv_allnulls = true; dtuple->bt_columns[i].bv_hasnulls = false; -- 2.26.2
Re: First-draft release notes for back branches are up
On 11/7/20 11:21 PM, Tom Lane wrote: > Tomas Vondra writes: >> On 11/7/20 3:52 AM, Tom Lane wrote: >>> Do you have some suggested text? > >> I think this might work: > > I dunno ... given that we have zero field reports, I doubt this is > something we need to tell every BRIN user to do. The text I put in > earlier today just recommends reindexing if you see the error. > Sorry, I hadn't noticed you had already written something :-( I agree it's enough to recommend reindexing only when there's an error. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Rethinking LOCK TABLE's behavior on views
On Sat, Nov 07, 2020 at 11:57:20AM -0500, Tom Lane wrote: > The problems discussed in bug #16703 [1] show that pg_dump needs a > version of LOCK TABLE that behaves differently for views than > what we have now. Since v11, LOCK TABLE on a view recurses to all > tables and views named in the view, and it does so using the view > owner's permissions, meaning that a view that would have permissions > failures if executed will also have permissions failures when locked. > That's probably fine for ordinary usage, but it's disastrous for > pg_dump --- even a superuser can't lock such a view. > > Moreover, pg_dump doesn't really need the recursive behavior. It just > needs the view's definition to hold still; and in any case, a typical > pg_dump run would have independently acquired locks on all the other > relations anyway. The recursion is buying us nothing, except perhaps > an increased risk of deadlocks against concurrent DDL operations. The getTables() locking aims to take the locks that will be taken later. That avoids failing after expensive work. For views, the later lock-taker is pg_get_viewdef(), which locks more than just the view but less than[2] LOCK TABLE. Recursion buys us more than nothing for "pg_dump --table=viewname", so abandoning recursion unconditionally is a step in the wrong direction. I don't expect --table to be as excellent as complete dumps, but a change that makes it worse does lose points. I want to keep the recursion. > (I'm not quite sure if that's significant, given that pg_dump pays > no attention to the order in which it locks things. But it sure as > heck isn't *decreasing* the risk; and it's a behavior that we could > not hope to improve with more smarts about pg_dump's lock ordering.) Reordering to avoid deadlocks would be best-effort, so it's fine not to have full control over the order. > Closely related to this is whether pg_dump ought to be using ONLY for > locking regular tables too. 
I tend to think that it should be, again > on the grounds that any child tables we may be interested in will get > locked separately, so that we're not doing anything by recursing except > expending extra cycles and perhaps increasing the chance of a deadlock. Agreed. "pg_dump --table=inheritance_parent" never queries inheritance children, so it's nice not to lock them. > A completely different approach we could consider is to weaken the > permissions requirements for LOCK on a view, say "allow it if either > the calling user or the view owner has the needed permission". This > seems generally pretty messy and so I don't much like it, but we > should consider as many solutions as we can think of. This is the best of what you've listed by a strong margin, and I don't know of better options you've not listed. +1 for it. Does it work for you? I think the mess arises from LOCK TABLE serving "get locks sufficient for $ACTIONS" as a family of use cases. For views only, different $ACTIONS want different behavior. $ACTIONS==SELECT wants today's behavior; pg_get_viewdef() wants shallower recursion and caller permissions; DROP VIEW wants no recursion. > [1] > https://www.postgresql.org/message-id/flat/16703-e348f58aab3cf6cc%40postgresql.org [2] For example, pg_get_viewdef('pg_user') locks pg_shadow, but "LOCK TABLE pg_user" additionally locks pg_authid and pg_db_role_setting.
Re: PG13: message style changes
Thanks for looking! Pushed.
Re: First-draft release notes for back branches are up
Tomas Vondra writes: > On 11/7/20 3:52 AM, Tom Lane wrote: >> Do you have some suggested text? > I think this might work: I dunno ... given that we have zero field reports, I doubt this is something we need to tell every BRIN user to do. The text I put in earlier today just recommends reindexing if you see the error. regards, tom lane
Re: redundant error messages
On 2020-11-05 13:27, Peter Eisentraut wrote: A few client tools duplicate error messages already provided by libpq, such as pg_rewind: fatal: could not connect to server: could not connect to server: No such file or directory pg_basebackup: error: could not connect to server: could not connect to server: No such file or directory psql: error: could not connect to server: could not connect to server: No such file or directory The psql case is actually a regression introduced in PG12, but the other two appear to be ancient. I have committed fixes for these.
Re: First-draft release notes for back branches are up
Hi, On 11/7/20 3:52 AM, Tom Lane wrote: > Tomas Vondra writes: >> We should probably include instructions on what to do about the BRIN >> data corruption fixed by 7577dd8480 - a query to list indexes that might be >> affected by the bug and should be rebuilt. > > Do you have some suggested text? > I think this might work: Fix handling of toasted values in BRIN indexes This mistake can result in BRIN indexes referencing toasted values, which may unexpectedly disappear after a cleanup of the table, leading to errors due to missing chunks: ERROR: missing chunk number 0 for toast value 16433 in pg_toast_16426 Fortunately, this issue only affects BRIN indexes on columns with toastable data types. While no index corruption due to this bug is known to have occurred in the field, it is recommended that production installations REINDEX all BRIN indexes at a convenient time after upgrading to X.X. The list of potentially corrupted BRIN indexes may be obtained using this query: select pg_get_indexdef(oid) from pg_class where (relkind = 'i') and relam = 3580 and exists (select 1 from pg_attribute where attrelid = pg_class.oid and attlen = -1 and attstorage in ('x', 'e')) It might be better to propose CREATE INDEX CONCURRENTLY, but I don't think there is a function to generate that SQL. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
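If generating the commands directly is acceptable, the same catalog query can be wrapped in format() to emit REINDEX statements. This is only an illustrative sketch (and REINDEX ... CONCURRENTLY requires v12 or later, so older branches would drop the keyword):

```sql
-- Illustrative only: emit one REINDEX command per potentially
-- affected BRIN index (same predicate as the query above).
SELECT format('REINDEX INDEX CONCURRENTLY %s;', oid::regclass)
FROM pg_class
WHERE relkind = 'i'
  AND relam = 3580            -- the BRIN access method
  AND EXISTS (SELECT 1 FROM pg_attribute
              WHERE attrelid = pg_class.oid
                AND attlen = -1
                AND attstorage IN ('x', 'e'));
```

The output could then be fed back with \gexec in psql.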
Re: errors when there is a bit literal in ecpg
"Wang, Shenhao" writes: > I met an error as below when I used ecpg: > a.pgc:5: ERROR: invalid bit string literal > a.pgc:5: ERROR: internal error: unreachable state; please report this to > Indeed. This has apparently been broken for a very long time (though the "unreachable state" part is fairly new). > I tried to fix this bug by deleting 'addlitchar('b');' from the source. I also added a > test case to test all const strings in ecpg. I thought that a whole new test case was overkill when we could just add a couple of lines to an existing test. Other than that, looks good, pushed. regards, tom lane
Re: Additional Chapter for Tutorial
On 2020-11-07 13:24, Jürgen Purtz wrote: Because there have been no more comments in the last days I created a consolidated patch. It contains Erik's suggestion and some tweaks for the text size within graphics. [0011-architecture.patch] Hi, I went through architecture.sgml once more; some proposed changes attached. And in some .svg files I noticed 'jungest' which should be 'youngest', I suppose. I did not change them but below is filelist of grep -l 'jung'. ./doc/src/sgml/images/freeze-ink.svg ./doc/src/sgml/images/freeze-ink-svgo.svg ./doc/src/sgml/images/freeze-raw.svg ./doc/src/sgml/images/wraparound-ink.svg ./doc/src/sgml/images/wraparound-ink-svgo.svg ./doc/src/sgml/images/wraparound-raw.svg Thanks, Erik --- doc/src/sgml/architecture.sgml.orig 2020-11-07 14:05:50.188396026 +0100 +++ doc/src/sgml/architecture.sgml 2020-11-07 20:04:27.890983873 +0100 @@ -24,7 +24,7 @@ This leads to other activities (file access, WAL, vacuum, ...) of the Instance. The Instance is a group of server-side processes acting on a common -Shared Memory. PostgreSQL does not utilize threading. +Shared Memory. PostgreSQL does not use threading. @@ -78,7 +78,7 @@ cache, located in Shared Memory (for details see: ) for the benefit of all processes in the instance. Writes also use this cache, in addition -to a journal, called a write-ahead-log or WAL. +to a journal, called the write-ahead-log or WAL. @@ -90,20 +90,20 @@ must be written to disk. This happens regularly by the Checkpointer and the Background Writer processes to ensure that the disk version of the pages are up-to-date. -The synchronisation from RAM to disk consists of two steps. +The synchronization from RAM to disk consists of two steps. First, whenever the content of a page changes, a WAL record -is created out of the delta-information (difference between the +is created from the delta-information (difference between the old and the new content) and stored in another area of Shared Memory. 
The parallel running WAL Writer process reads them and appends them to the end of the current WAL file. Such sequential writes are faster than writes to random positions of heap and index files. All WAL records created -out of one dirty page must be transferred to disk before the +from one dirty page must be transferred to disk before the dirty page itself can be transferred to disk in the second step. @@ -123,22 +123,22 @@ Checkpoints. A Checkpoint is a point in time when all older dirty buffers, all older WAL records, and finally a special Checkpoint record -are written and flushed to disk. Heap and index files -on the one hand and WAL files on the other hand are in sync. -Previous WAL is no longer required. In other words, +are written and flushed to disk. Heap and index files, +and WAL files are now in sync. +Older WAL is no longer required. In other words, a possibly occurring recovery, which integrates the delta information of WAL into heap and index files, will happen -by replaying only WAL past the last recorded checkpoint. -This limits the amount of WAL which needs to be replayed +by replaying only WAL past the last-recorded checkpoint. +This limits the amount of WAL to be replayed during recovery in the event of a crash. -While the Checkpointer ensures that the database system can crash -and restart itself in a valid state, the administrator needs +While the Checkpointer ensures that the database system can, +after a crash, restart itself in a valid state, the administrator needs to handle the case where the heap or other files become corrupted (and possibly the locally written WAL, though that is -less common). The options and details are covered extensively +less common). Options and details are covered in the backup and restore section (). 
For our purposes here, just note that the WAL Archiver process can be enabled and configured to run a script on filled WAL @@ -153,7 +153,7 @@ The Logger writes text lines about serious and less serious -events which can happen during database access, e.g., wrong +events that may happen during database access, e.g., wrong password, no permission, long-running queries, etc. @@ -262,7 +262,7 @@ all tablespace names, and all role names are automatically available throughout the cluster, independent from -the database or schema in which they where defined originally. +the database or schema in which they were defined originally. shows the relation between the object types. @@ -410,13 +410,11 @@ -A first approach to implement protections against concurrent -access to the same data may be the locking of critical rows. -PostgreSQL
Re: pgbench stopped supporting large number of client connections on Windows
On Sat, Nov 7, 2020 at 14:55, Marina Polyakova <m.polyak...@postgrespro.ru> wrote: > On 2020-11-06 23:54, Ranier Vilela wrote: > > Hi Marina, > > Hello! > > 1) If you mean the function pgwin32_select in the file > src/backend/port/win32/socket.c, IIUC it is only used in the backend, > see src/include/port/win32_port.h: > > #ifndef FRONTEND > <...> > #define select(n, r, w, e, timeout) pgwin32_select(n, r, w, e, timeout) > <...> > #endif /* FRONTEND */ > Yes. My mistake, you are right here. > 2) It looks like FD_SETSIZE does not set a limit on the socket value on > Windows, see > > https://docs.microsoft.com/en-us/windows/win32/winsock/maximum-number-of-sockets-supported-2 > : > > The maximum number of sockets that a Windows Sockets application can use > is not affected by the manifest constant FD_SETSIZE. This value defined > in the Winsock2.h header file is used in constructing the FD_SET > structures used with select function. > Correct. It seems that the limit will be defined at compile time, before the inclusion of Winsock2.h. Have you tried defining -DFD_SETSIZE=2048? best regards, Ranier Vilela
Re: pgbench stopped supporting large number of client connections on Windows
On 2020-11-06 23:54, Ranier Vilela wrote: Hi Marina, Hello! Nice catch. Thank you! src/bin/pgbench/pgbench.c, the function add_socket_to_set: if (fd < 0 || fd >= FD_SETSIZE) { /* * Doing a hard exit here is a bit grotty, but it doesn't seem worth * complicating the API to make it less grotty. */ pg_log_fatal("too many client connections for select()"); exit(1); } It seems to me that the limit is hardcoded in src/backend/port/win32/socket.c FD_SETSIZE * 2 that would be 2048? 1) If you mean the function pgwin32_select in the file src/backend/port/win32/socket.c, IIUC it is only used in the backend, see src/include/port/win32_port.h: #ifndef FRONTEND <...> #define select(n, r, w, e, timeout) pgwin32_select(n, r, w, e, timeout) <...> #endif /* FRONTEND */ 2) It looks like FD_SETSIZE does not set a limit on the socket value on Windows, see https://docs.microsoft.com/en-us/windows/win32/winsock/maximum-number-of-sockets-supported-2 : The maximum number of sockets that a Windows Sockets application can use is not affected by the manifest constant FD_SETSIZE. This value defined in the Winsock2.h header file is used in constructing the FD_SET structures used with select function. -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
RE: pgbench: option delaying queries till connections establishment?
Hello, Indeed. I took your next patch with an added explanation. I'm unclear whether proceeding makes much sense though, that is, some threads would run the test while others would have aborted. Hmmm. Your comment looks good, thanks. In the previous version pgbench starts benchmarking even if some connections fail. And users can notice the connection failure by stderr output. Hence the current specification may be enough. Usually I run many pgbench through scripts, so I'm probably not there to check a lone stderr failure at the beginning if performance figures are actually reported. If you agree, please remove the following lines: ``` +* It is unclear whether it is worth doing anything rather than +* coldly exiting with an error message. ``` I can remove the line, but I strongly believe that reporting performance figures when some client connection failed, so that the bench could not run as prescribed, is bad behavior. The good news is that it is probably quite unlikely. So I'd prefer to keep it and possibly submit a patch to change the behavior. ISTM that there is another patch in the queue which needs barriers to delay some initialization so as to fix a corner case bug, in which case the behavior would be mandatory. The current submission could add an option to skip the barrier synchronization, but I'm not enthusiastic about adding an option only to remove it shortly afterwards. Could you tell me which patch you mention? Basically I agree with what you say, but I want to check it. Should be this one: https://commitfest.postgresql.org/30/2624/ -- Fabien.
Rethinking LOCK TABLE's behavior on views
The problems discussed in bug #16703 [1] show that pg_dump needs a version of LOCK TABLE that behaves differently for views than what we have now. Since v11, LOCK TABLE on a view recurses to all tables and views named in the view, and it does so using the view owner's permissions, meaning that a view that would have permissions failures if executed will also have permissions failures when locked. That's probably fine for ordinary usage, but it's disastrous for pg_dump --- even a superuser can't lock such a view. Moreover, pg_dump doesn't really need the recursive behavior. It just needs the view's definition to hold still; and in any case, a typical pg_dump run would have independently acquired locks on all the other relations anyway. The recursion is buying us nothing, except perhaps an increased risk of deadlocks against concurrent DDL operations. (I'm not quite sure if that's significant, given that pg_dump pays no attention to the order in which it locks things. But it sure as heck isn't *decreasing* the risk; and it's a behavior that we could not hope to improve with more smarts about pg_dump's lock ordering.) So we need a way to turn that off. What I proposed in that thread was > (1) Make pg_dump use LOCK TABLE ONLY, not LOCK TABLE. > (2) Make LOCK TABLE ONLY on a view not recurse to the view's dependencies. which would each be quite trivial to do. An objection to this approach is that ONLY typically controls recursion to a table's inheritance or partitioning children, which is a different animal from recursion to a view's dependencies. That argument would lead to wanting some other syntax to control this. I do not find that argument compelling enough to justify making pg_dump deal with two different commands depending on the relation's relkind, but it is a plausible concern. Closely related to this is whether pg_dump ought to be using ONLY for locking regular tables too. 
I tend to think that it should be, again on the grounds that any child tables we may be interested in will get locked separately, so that we're not doing anything by recursing except expending extra cycles and perhaps increasing the chance of a deadlock. A completely different approach we could consider is to weaken the permissions requirements for LOCK on a view, say "allow it if either the calling user or the view owner has the needed permission". This seems generally pretty messy and so I don't much like it, but we should consider as many solutions as we can think of. Thoughts? regards, tom lane [1] https://www.postgresql.org/message-id/flat/16703-e348f58aab3cf6cc%40postgresql.org
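To make the failure mode concrete, here is a schematic reproduction (all names invented; the ONLY form at the end is the *proposed* behavior from (1)/(2), not something that works today):

```sql
-- As role alice (view owner):
CREATE TABLE secret_data (x int);
CREATE VIEW v AS SELECT x FROM secret_data;
REVOKE ALL ON secret_data FROM alice;  -- alice can no longer SELECT it

-- Since v11, LOCK on the view recurses to secret_data using the view
-- owner's permissions, so this fails even when run by a superuser:
LOCK TABLE v IN ACCESS SHARE MODE;

-- Under proposals (1)+(2), ONLY would lock just the view itself,
-- which is all pg_dump needs to keep the view definition stable:
LOCK TABLE ONLY v IN ACCESS SHARE MODE;
```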
Re: Yet another fast GiST build
> On Nov 5, 2020, at 22:20, Justin Pryzby wrote: > > On Thu, Nov 05, 2020 at 10:11:52PM +0500, Andrey Borodin wrote: >> To test that functions are actually called for sorting build we should >> support directive sorting build like "CREATE INDEX ON A USING GIST(B) >> WITH(sorting=surely,and fail if not)". > > Maybe you could add a DEBUG1 message for that, and include that in regression > tests, which would then fail if sorting wasn't used. That's a good idea. Thanks! > > Maybe you'd need to make it consistent by setting GUCs like work_mem / > max_parallel_maintenance_workers / ?? > > Similar to this > > postgres=# SET client_min_messages =debug; > postgres=# CREATE INDEX ON A USING GIST(i) WITH(buffering=off); > DEBUG: building index "a_i_idx2" on table "a" serially > CREATE INDEX Currently, only B-tree uses parallel build, so there is no need to tweak GUCs other than client_min_messages. Before these tests, actually, ~20% of opclasses were not working as expected, even though I had checked each one by hand. PFA a patch with Heikki's comments addressed. Thanks! Best regards, Andrey Borodin. v4-0001-Sortsupport-for-sorting-GiST-build-for-gist_btree.patch
Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module
On Fri, Nov 6, 2020 at 11:00 PM Fujii Masao wrote: > > >> I'm not quite sure about replacing all the places in the walreceiver > >> process; for instance, in WalRcvForceReply() we are using a spinlock to > >> acquire the latch pointer. Others may have better thoughts on > >> this. > > > > The SIGTERM part looks good. The only difference between > > WalRcvSigHupHandler and SignalHandlerForConfigReload is whether the latch > > is set or not. I don't think it's a problem that config-reload causes > > an extra wakeup. Couldn't we do the same thing for SIGHUP? > > I agree that we can use even the standard SIGHUP handler in walreceiver. > I'm not sure why SetLatch() was not called in walreceiver's SIGHUP > handler so far. > The main reason for having SetLatch() in SignalHandlerForConfigReload() is to wake up the calling process if it is waiting in WaitLatchOrSocket() or WaitLatch(), so that it reloads the new config file and uses the reloaded config variables. Maybe we should give some thought to the scenarios in which the walreceiver process waits, and what happens when the latch is set as SIGHUP is received. And also, I think it's worth having a look at commit 40f908bdcdc7, which introduced WalRcvSigHupHandler(), and 597a87ccc, which replaced the custom latch with procLatch. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: Parallel copy
On Thu, Nov 5, 2020 at 6:33 PM Hou, Zhijie wrote: > > Hi > > > > > my $bytes = $ARGV[0]; > > for(my $i = 0; $i < $bytes; $i+=8){ > > print "longdata"; > > } > > print "\n"; > > > > > > postgres=# copy longdata from program 'perl /tmp/longdata.pl 1' > > with (parallel 2); > > > > This gets stuck forever (or at least I didn't have the patience to wait > > for it to finish). Both worker processes are consuming 100% of CPU. > > I had a look over this problem. > > The ParallelCopyDataBlock has a size limit: > uint8 skip_bytes; > char data[DATA_BLOCK_SIZE]; /* data read from file */ > > It seems the input line is so long that the leader process runs out of the > shared memory shared among the parallel copy workers, > and the leader process keeps waiting for a free block. > > The worker process waits until line_state becomes LINE_LEADER_POPULATED, > but the leader process won't set the line_state unless it has read the whole line. > > So it is stuck forever. > Maybe we should reconsider this situation. > > The stack is as follows: > > Leader stack: > #3 0x0075f7a1 in WaitLatch (latch=, > wakeEvents=wakeEvents@entry=41, timeout=timeout@entry=1, > wait_event_info=wait_event_info@entry=150994945) at latch.c:411 > #4 0x005a9245 in WaitGetFreeCopyBlock > (pcshared_info=pcshared_info@entry=0x7f26d2ed3580) at copyparallel.c:1546 > #5 0x005a98ce in SetRawBufForLoad (cstate=cstate@entry=0x2978a88, > line_size=67108864, copy_buf_len=copy_buf_len@entry=65536, > raw_buf_ptr=raw_buf_ptr@entry=65536, > copy_raw_buf=copy_raw_buf@entry=0x7fff4cdc0e18) at copyparallel.c:1572 > #6 0x005a1963 in CopyReadLineText (cstate=cstate@entry=0x2978a88) at > copy.c:4058 > #7 0x005a4e76 in CopyReadLine (cstate=cstate@entry=0x2978a88) at > copy.c:3863 > > Worker stack: > #0 GetLinePosition (cstate=cstate@entry=0x29e1f28) at copyparallel.c:1474 > #1 0x005a8aa4 in CacheLineInfo (cstate=cstate@entry=0x29e1f28, > buff_count=buff_count@entry=0) at copyparallel.c:711 > #2 0x005a8e46 in GetWorkerLine (cstate=cstate@entry=0x29e1f28) at 
> copyparallel.c:885 > #3 0x005a4f2e in NextCopyFromRawFields > (cstate=cstate@entry=0x29e1f28, fields=fields@entry=0x7fff4cdc0b48, > nfields=nfields@entry=0x7fff4cdc0b44) at copy.c:3615 > #4 0x005a50af in NextCopyFrom (cstate=cstate@entry=0x29e1f28, > econtext=econtext@entry=0x2a358d8, values=0x2a42068, nulls=0x2a42070) at > copy.c:3696 > #5 0x005a5b90 in CopyFrom (cstate=cstate@entry=0x29e1f28) at > copy.c:2985 > Thanks for providing your thoughts. I have analyzed this issue and I'm working on the fix for this, I will be posting a patch for this shortly. Regards, Vignesh EnterpriseDB: http://www.enterprisedb.com
hard readable explain due JIT
Hi, I had to investigate a query that was slow due to very slow JIT preparation. I found a very unpleasant situation where some subplan was strangely slow: -> Hash Anti Join (cost=11.35..19.41 rows=143 width=3011) (actual time=5039.022..5039.105 rows=203 loops=1) Hash Cond: (term_p.id_i = term_d_1.id_i) -> Seq Scan on public.term_p (cost=0.00..6.03 rows=203 width=24) (actual time=5038.980..5039.014 rows=203 loops=1) Filter: (term_p.to_h IS NULL) -> Hash (cost=10.60..10.60 rows=60 width=16) (actual time=0.008..0.009 rows=0 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 8kB -> Seq Scan on public.term_d term_d_1 (cost=0.00..10.60 rows=60 width=16) (actual time=0.008..0.008 rows=0 loops=1) It looks very strange - why does a scan of 203 rows need 5 sec? There is JIT overhead, but it is reported in a different place of the EXPLAIN output: JIT: Functions: 416 Options: Inlining true, Optimization true, Expressions true, Deforming true Timing: Generation 61.002 ms, Inlining 67.897 ms, Optimization 2853.125 ms, Emission 2116.233 ms, Total 5098.258 ms With some deduction we can tell that the slow seq scan is probably due to JIT. But tools like https://explain.depesz.com then display very strange statistics - so the JIT overhead should be displayed in the related plan nodes too. Regards Pavel
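For anyone hitting this, one way to confirm the attribution (a sketch; the SELECT is stand-in text for the anti-join query above) is to rerun with JIT disabled, or with only its expensive passes turned off, and compare the node timings:

```sql
-- Rerun without JIT and compare the Seq Scan's actual time:
SET jit = off;
EXPLAIN (ANALYZE, VERBOSE) SELECT ...;

-- Or keep JIT but disable the costly inlining/optimization passes:
SET jit = on;
SET jit_inline_above_cost = -1;      -- -1 disables inlining
SET jit_optimize_above_cost = -1;    -- -1 disables expensive optimization
EXPLAIN (ANALYZE, VERBOSE) SELECT ...;
```

If the 5-second startup of the scan disappears, the time was JIT compilation being charged to the first node that needed the compiled expressions.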
Re: Move catalog toast table and index declarations
On 2020-11-05 12:59, John Naylor wrote: And yes, this doesn't materially change the patch, it's just nitpicking :-) . Materially, I believe it's fine. OK, committed. -- Peter Eisentraut 2ndQuadrant, an EDB company https://www.2ndquadrant.com/