Re: [HACKERS] Oddity in error handling of constraint violation in ExecConstraints for partitioned tables
Hi Amit,

On 2017/07/24 14:09, Amit Khandekar wrote:
>>> On 2017/07/10 14:15, Etsuro Fujita wrote:
>>> Another thing I noticed is the error handling in ExecWithCheckOptions; it
>>> doesn't take any care for partition tables, so the column description in
>>> the error message for WCO_VIEW_CHECK is built using the partition table's
>>> rowtype, not the root table's.  But I think it'd be better to build that
>>> using the root table's rowtype, like ExecConstraints, because that would
>>> make the column description easier to understand since the parent view(s)
>>> (from which WithCheckOptions evaluated there are created) would have been
>>> defined on the root table.  This seems independent from the above issue,
>>> so I created a separate patch, which I'm attaching.  What do you think
>>> about that?
>
> +    if (map != NULL)
> +    {
> +        tuple = do_convert_tuple(tuple, map);
> +        ExecStoreTuple(tuple, slot, InvalidBuffer, false);
> +    }
>
> Above, the tuple descriptor also needs to be set, since the parent and
> child partitions can have different column ordering.
>
> Due to this, the following testcase crashes:
>
> CREATE TABLE range_parted (a text, b int, c int) partition by range (b);
> CREATE VIEW upview AS SELECT * FROM range_parted WHERE (select c > 120)
> WITH CHECK OPTION;
> create table part_a_1_a_10 (b int, c int, a text);
> alter table range_parted attach partition part_a_1_a_10 for values
> from (1) to (10);
> insert into upview values ('a', 2, 100);

Good catch.  Thanks for creating the patch.

There are other places with similar code, viz. those in ExecConstraints(), that would need the same fix.  Without it, the following causes a crash (building on your example):

alter table range_parted add constraint check_c check (c > 120);
insert into range_parted values ('a', 2, 100);
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
Attached is the updated version of your patch.  A test is also added in insert.sql, along the lines of the above example.

Will add this to the PG 10 open items list.

Thanks,
Amit

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index b22de78516..040e9a916a 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1956,6 +1956,7 @@ ExecConstraints(ResultRelInfo *resultRelInfo,
 			if (map != NULL)
 			{
 				tuple = do_convert_tuple(tuple, map);
+				ExecSetSlotDescriptor(slot, tupdesc);
 				ExecStoreTuple(tuple, slot, InvalidBuffer, false);
 			}
 		}
@@ -2003,6 +2004,7 @@ ExecConstraints(ResultRelInfo *resultRelInfo,
 			if (map != NULL)
 			{
 				tuple = do_convert_tuple(tuple, map);
+				ExecSetSlotDescriptor(slot, tupdesc);
 				ExecStoreTuple(tuple, slot, InvalidBuffer, false);
 			}
 		}
@@ -2112,6 +2114,7 @@ ExecWithCheckOptions(WCOKind kind, ResultRelInfo *resultRelInfo,
 			if (map != NULL)
 			{
 				tuple = do_convert_tuple(tuple, map);
+				ExecSetSlotDescriptor(slot, tupdesc);
 				ExecStoreTuple(tuple, slot, InvalidBuffer, false);
 			}
 		}
diff --git a/src/test/regress/expected/insert.out b/src/test/regress/expected/insert.out
index c608ce377f..0eaa47fb2b 100644
--- a/src/test/regress/expected/insert.out
+++ b/src/test/regress/expected/insert.out
@@ -410,6 +410,14 @@ with ins (a, b, c) as
  mlparted4 | 1 |  30 | 39
 (5 rows)
 
+alter table mlparted add c text;
+create table mlparted5 (c text, a int not null, b int not null);
+alter table mlparted attach partition mlparted5 for values from (1, 40) to (1, 50);
+alter table mlparted add constraint check_b check (a = 1 and b < 45);
+insert into mlparted values (1, 45, 'bah');
+ERROR:  new row for relation "mlparted5" violates check constraint "check_b"
+DETAIL:  Failing row contains (1, 45, bah).
+drop table mlparted5;
 -- check that message shown after failure to find a partition shows the
 -- appropriate key description (or none) in various situations
 create table key_desc (a int, b int) partition by list ((a+0));
diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/upd
Re: [HACKERS] pl/perl extension fails on Windows
On Wed, Jul 19, 2017 at 05:01:31PM -0400, Tom Lane wrote:
> Ashutosh Sharma writes:
>> Here are the list of macros and variables from 'intrpvar.h' file that
>> are just defined in perl module but not in plperl on Windows,
>>
>> #ifdef PERL_USES_PL_PIDSTATUS
>> PERLVAR(I, pidstatus, HV *) /* pid-to-status mappings for waitpid */
>> #endif
>>
>> #ifdef PERL_SAWAMPERSAND
>> PERLVAR(I, sawampersand, U8)    /* must save all match strings */
>> #endif
>
> I am really suspicious that this means your libperl was built in an unsafe
> fashion, that is, by injecting configuration choices as random -D switches
> in the build process rather than making sure the choices were recorded in
> perl's config.h.  As an example, looking at the perl 5.24.1 headers on
> a Fedora box, it looks to me like PERL_SAWAMPERSAND could only get defined
> if PERL_COPY_ON_WRITE were not defined, and the only way that that can
> happen is if PERL_NO_COW is defined, and there are no references to the
> latter anyplace except in this particular #if defined test in perl.h.
>
> Where did your perl installation come from, anyway?  Are you sure the .h
> files match up with the executables?

I see corresponding symptoms with the following Perl distributions:

strawberry-perl-5.26.0.1-64bit.msi:
  src/pl/plperl/Util.c: loadable library and perl binaries are mismatched
  (got handshake key 11800080, needed 11c00080)

ActivePerl-5.24.1.2402-MSWin32-x64-401627.exe:
  src/pl/plperl/Util.c: loadable library and perl binaries are mismatched
  (got handshake key 11500080, needed 11900080)

So, this affects each of the two prominent families of Perl Windows binaries.
Notes for anyone trying to reproduce: - Both of those Perl distributions require the hacks described here: https://www.postgresql.org/message-id/flat/CABcP5fjEjgOsh097cWnQrsK9yCswo4DZxp-V47DKCH-MxY9Gig%40mail.gmail.com - Add PERL_USE_UNSAFE_INC=1 to the environment until we update things to cope with this Perl 5.26 change: http://search.cpan.org/~xsawyerx/perl-5.26.0/pod/perldelta.pod#Removal_of_the_current_directory_(%22.%22)_from_@INC -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] segfault in HEAD when too many nested functions call
On Tue, Jul 18, 2017 at 01:04:10PM -0700, Andres Freund wrote:
> Ok, I'll flesh out the patch till Thursday.  But I do think we're going
> to have to do something about the back branches, too.

This PostgreSQL 10 open item is past due for your status update.  Kindly send a status update within 24 hours, and include a date for your subsequent status update.  Refer to the policy on open item ownership:
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
Re: [HACKERS] UPDATE of partition key
On 13 July 2017 at 22:39, Amit Khandekar wrote:
> Attached is a WIP patch (make_resultrels_ordered.patch) that generates
> the result rels in canonical order. This patch is kept separate from
> the update-partition-key patch, and can be applied on master branch.

Attached update-partition-key_v13.patch now contains the make_resultrels_ordered.patch changes.

Now that the per-subplan result rels and the leaf partition OIDs generated for tuple routing are both known to have the same (canonical) order, in ExecSetupPartitionTupleRouting() we can look up the per-subplan result rels without a hash table.  Instead, we iterate over the leaf partition OIDs while keeping a position in the per-subplan result-rel list, advancing that position whenever the result rel there is found among the leaf partitions.  Since the two lists are in the same order, we never have to rescan the portion of the lists that has already been scanned.

I considered whether the issue behind this recent commit might be relevant for update tuple routing as well:

commit f81a91db4d1c2032632aa5df9fc14be24f5fe5ec
Author: Robert Haas
Date:   Mon Jul 17 21:29:45 2017 -0400

    Use a real RT index when setting up partition tuple routing.

Since we know that using a dummy 1 value for tuple-routing result rels is not correct, I am checking another possibility.  In the latest patch, the tuple-routing partitions are a mix of a) existing update result rels and b) new partition result rels.  The 'b' result rels have the RT index of nominalRelation, but the existing 'a' result rels have their own different RT indexes.  I suspect this might surface an issue similar to the one fixed by the above commit, e.g. with a WITH query whose UPDATE subqueries do tuple routing.  Will check that.

This patch also has Robert's changes in the planner to decide whether to do update tuple routing.
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

update-partition-key_v13.patch
Description: Binary data
Re: [HACKERS] Oddity in error handling of constraint violation in ExecConstraints for partitioned tables
>> On 2017/07/10 14:15, Etsuro Fujita wrote:
>> Another thing I noticed is the error handling in ExecWithCheckOptions; it
>> doesn't take any care for partition tables, so the column description in
>> the error message for WCO_VIEW_CHECK is built using the partition table's
>> rowtype, not the root table's.  But I think it'd be better to build that
>> using the root table's rowtype, like ExecConstraints, because that would
>> make the column description easier to understand since the parent view(s)
>> (from which WithCheckOptions evaluated there are created) would have been
>> defined on the root table.  This seems independent from the above issue,
>> so I created a separate patch, which I'm attaching.  What do you think
>> about that?

+    if (map != NULL)
+    {
+        tuple = do_convert_tuple(tuple, map);
+        ExecStoreTuple(tuple, slot, InvalidBuffer, false);
+    }

Above, the tuple descriptor also needs to be set, since the parent and child partitions can have different column ordering.

Due to this, the following testcase crashes:

CREATE TABLE range_parted (a text, b int, c int) partition by range (b);
CREATE VIEW upview AS SELECT * FROM range_parted WHERE (select c > 120) WITH CHECK OPTION;
create table part_a_1_a_10 (b int, c int, a text);
alter table range_parted attach partition part_a_1_a_10 for values from (1) to (10);
insert into upview values ('a', 2, 100);

Attached is a patch that sets the tuple descriptor.

Also, in the patch, I have added a varchar column to one of the partitioned tables used in updatable_views.sql, so as to cover this scenario.  Without setting the tuple descriptor, the output of the patched updatable_views.sql shows a junk value in one of the columns of the row in the error message emitted for the WithCheckOption violation.
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

set_slot_descriptor.patch
Description: Binary data
Re: [HACKERS] PgFDW connection invalidation by ALTER SERVER/ALTER USER MAPPING
On Fri, Jul 21, 2017 at 10:39 PM, Tom Lane wrote:
> Ashutosh Bapat writes:
>> On Fri, Jul 21, 2017 at 10:55 AM, Kyotaro HORIGUCHI wrote:
>>> The attached patch differs only in this point.
>>
>> +1. The patch looks good to me.
>
> Pushed with a couple additional changes: we'd all missed that the header
> comment for GetConnection was obsoleted by this change, and the arguments
> for GetSysCacheHashValue really need to be coerced to Datum.  (I think
> OID to Datum is the same as what the compiler would do anyway, but best
> not to assume that.)

Thanks, and sorry for not noticing the prologue.

> Back-patching was more exciting than I could wish.  It seems that
> before 9.6, we did not have struct UserMapping storing the OID of the
> pg_user_mapping row it had been made from.  I changed GetConnection to
> re-look-up that row and get the OID.  But that's ugly, and there's a
> race condition: if user mappings are being added or deleted meanwhile,
> we might locate a per-user mapping when we're really using a PUBLIC
> mapping or vice versa, causing the ConnCacheEntry to be labeled with
> the wrong hash value so that it might not get invalidated properly later.
> Still, it's significantly better than it was, and that corner case seems
> unlikely to get hit in practice --- for one thing, you'd have to then
> revert the mapping addition/deletion before the ConnCacheEntry would be
> found and used again.  I don't want to take the risk of modifying struct
> UserMapping in stable branches, so it's hard to see a way to make that
> completely bulletproof before 9.6.

+1.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] pg_stop_backup(wait_for_archive := true) on standby server
On Sat, Jul 22, 2017 at 8:04 AM, Stephen Frost wrote:
> Masahiko, all,
>
> * Masahiko Sawada (sawada.m...@gmail.com) wrote:
>> On Tue, Jul 18, 2017 at 1:28 PM, Stephen Frost wrote:
>>> Masahiko, Michael,
>>>
>>> * Masahiko Sawada (sawada.m...@gmail.com) wrote:
>>>>> This is beginning to shape.
>>>>
>>>> Sorry, I missed lots of typo in the last patch. All comments from you
>>>> are incorporated into the attached latest patch and I've checked it
>>>> whether there is other typos. Please review it.
>>>
>>> I've taken an initial look through the patch and it looks pretty
>>> reasonable.  I need to go over it in more detail and work through
>>> testing it myself next.
>>>
>>> I expect to commit this (with perhaps some minor tweaks primairly around
>>> punctuation/wording), barring any issues found, on Wednesday or Thursday
>>> of this week.
>>
>> I understood. Thank you for looking at this!
>
> I started discussing this with David off-list and he'd like a chance to
> review it in a bit more detail (he's just returning from being gone for
> a few weeks).  That review will be posted to this thread on Monday, and
> I'll wait until then to move forward with the patch.
>
> Next update will be before Tuesday, July 25th, COB.

Thank you!  I'll start to revise the patch as soon as I get the review comments.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] Bug in ExecModifyTable function and trigger issues for foreign tables
On 2017/07/21 19:16, Etsuro Fujita wrote:
> On 2017/07/20 11:21, Etsuro Fujita wrote:
>> On 2017/07/19 23:36, Tom Lane wrote:
>>> Please put the responsibility of doing const-expression simplification
>>> in these cases somewhere closer to where the problem is being created.
>>
>> It would be reasonable that it's the FDW's responsibility to do that
>> const-simplification if necessary?
>
> There seems to be no objections, so I removed the const-expression
> simplification from the patch and I added the note to the docs for
> AddForeignUpdateTargets.  Attached is an updated version of the patch.

I cleaned up the patch a bit.  PFA a new version of the patch.

Best regards,
Etsuro Fujita

*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***************
*** 6962,6967 **** update bar set f2 = f2 + 100 returning *;
--- 6962,7026 ----
   7 | 277
 (6 rows)
 
+ -- Test that UPDATE/DELETE with inherited target works with row-level triggers
+ CREATE TRIGGER trig_row_before
+ BEFORE UPDATE OR DELETE ON bar2
+ FOR EACH ROW EXECUTE PROCEDURE trigger_data(23,'skidoo');
+ CREATE TRIGGER trig_row_after
+ AFTER UPDATE OR DELETE ON bar2
+ FOR EACH ROW EXECUTE PROCEDURE trigger_data(23,'skidoo');
+ explain (verbose, costs off)
+ update bar set f2 = f2 + 100;
+                                      QUERY PLAN
+ --------------------------------------------------------------------------------------
+  Update on public.bar
+    Update on public.bar
+    Foreign Update on public.bar2
+      Remote SQL: UPDATE public.loct2 SET f2 = $2 WHERE ctid = $1 RETURNING f1, f2, f3
+    ->  Seq Scan on public.bar
+          Output: bar.f1, (bar.f2 + 100), bar.ctid
+    ->  Foreign Scan on public.bar2
+          Output: bar2.f1, (bar2.f2 + 100), bar2.f3, bar2.ctid, bar2.*
+          Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct2 FOR UPDATE
+ (9 rows)
+ 
+ update bar set f2 = f2 + 100;
+ NOTICE:  trig_row_before(23, skidoo) BEFORE ROW UPDATE ON bar2
+ NOTICE:  OLD: (3,333,33),NEW: (3,433,33)
+ NOTICE:  trig_row_before(23, skidoo) BEFORE ROW UPDATE ON bar2
+ NOTICE:  OLD: (4,344,44),NEW: (4,444,44)
+ NOTICE:  trig_row_before(23, skidoo) BEFORE ROW UPDATE ON bar2
+ NOTICE:  OLD: (7,277,77),NEW: (7,377,77)
+ NOTICE:  trig_row_after(23, skidoo) AFTER ROW UPDATE ON bar2
+ NOTICE:  OLD: (3,333,33),NEW: (3,433,33)
+ NOTICE:  trig_row_after(23, skidoo) AFTER ROW UPDATE ON bar2
+ NOTICE:  OLD: (4,344,44),NEW: (4,444,44)
+ NOTICE:  trig_row_after(23, skidoo) AFTER ROW UPDATE ON bar2
+ NOTICE:  OLD: (7,277,77),NEW: (7,377,77)
+ explain (verbose, costs off)
+ delete from bar where f2 < 400;
+                                    QUERY PLAN
+ ----------------------------------------------------------------------------------
+  Delete on public.bar
+    Delete on public.bar
+    Foreign Delete on public.bar2
+      Remote SQL: DELETE FROM public.loct2 WHERE ctid = $1 RETURNING f1, f2, f3
+    ->  Seq Scan on public.bar
+          Output: bar.ctid
+          Filter: (bar.f2 < 400)
+    ->  Foreign Scan on public.bar2
+          Output: bar2.ctid, bar2.*
+          Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct2 WHERE ((f2 < 400)) FOR UPDATE
+ (10 rows)
+ 
+ delete from bar where f2 < 400;
+ NOTICE:  trig_row_before(23, skidoo) BEFORE ROW DELETE ON bar2
+ NOTICE:  OLD: (7,377,77)
+ NOTICE:  trig_row_after(23, skidoo) AFTER ROW DELETE ON bar2
+ NOTICE:  OLD: (7,377,77)
+ -- cleanup
+ DROP TRIGGER trig_row_before ON bar2;
+ DROP TRIGGER trig_row_after ON bar2;
  drop table foo cascade;
  NOTICE:  drop cascades to foreign table foo2
  drop table bar cascade;
*** a/contrib/postgres_fdw/sql/postgres_fdw.sql
--- b/contrib/postgres_fdw/sql/postgres_fdw.sql
***************
*** 1632,1637 **** explain (verbose, costs off)
--- 1632,1657 ----
  update bar set f2 = f2 + 100 returning *;
  update bar set f2 = f2 + 100 returning *;
  
+ -- Test that UPDATE/DELETE with inherited target works with row-level triggers
+ CREATE TRIGGER trig_row_before
+ BEFORE UPDATE OR DELETE ON bar2
+ FOR EACH ROW EXECUTE PROCEDURE trigger_data(23,'skidoo');
+ 
+ CREATE TRIGGER trig_row_after
+ AFTER UPDATE OR DELETE ON bar2
+ FOR EACH ROW EXECUTE PROCEDURE trigger_data(23,'skidoo');
+ 
+ explain (verbose, costs off)
+ update bar set f2 = f2 + 100;
+ update bar set f2 = f2 + 100;
+ 
+ explain (verbose, costs off)
+ delete from bar where f2 < 400;
+ delete from bar where f2 < 400;
+ 
+ -- cleanup
+ DROP TRIGGER trig_row_before ON bar2;
+ DROP TRIGGER trig_row_after ON bar2;
  drop table foo cascade;
  drop table bar cascade;
  drop table loct1;
*** a/doc/src/sgml/fdwhandler.sgml
--- b/doc/src/sgml/fdwhandler.sgml
***************
*** 428,438 **** AddForeignUpdateTargets (Query *parsetree,
      Avoid using names matching ctidN, wholerow, or wholerowN, as the core system can
!     generate junk columns of these names.
Re: [HACKERS] [WIP] [B-Tree] Keep indexes sorted by heap physical location
On Fri, Jul 21, 2017 at 10:31 PM, Peter Geoghegan wrote:
> On Wed, Aug 17, 2016 at 7:54 PM, Claudio Freire wrote:
>> The attached patch tries to maintain the initial status of B-Tree
>> indexes, which are created with equal-key runs in physical order,
>> during the whole life of the B-Tree, and make key-tid pairs
>> efficiently searchable in the process.
>
> I don't see an entry for this in the next CF. Do you have a plan for it?
>
> BTW, I did post some remarks on this patch on another thread recently
> [1]. Not sure if any of what I said there is news to you at this
> point.
>
> [1] postgr.es/m/CAH2-Wzn=6Lc3OVA88x=E6SKG72ojNUE6ut6RZAqNnQx-YLcw=q...@mail.gmail.com
> --
> Peter Geoghegan

I plan to restart work on it soonishly, but ATM most of my time is dedicated to the vacuum patch, which is almost done.
Re: [HACKERS] Buildfarm failure and dubious coding in predicate.c
On Sun, Jul 23, 2017 at 8:32 AM, Tom Lane wrote:
> Meanwhile, it's still pretty unclear what happened yesterday on
> culicidae.

That failure is indeed baffling.  The only code that inserts (HASH_ENTER[_NULL]) into PredicateLockTargetHash:

1.  CreatePredicateLock().  It would be a bug if that ever tried to insert a { 0, 0, 0, 0 } tag, and in any case it holds SerializablePredicateLockListLock in LW_SHARED.

2.  TransferPredicateLocksToNewTarget(), which removes and restores the scratch entry and also explicitly inserts a transferred entry.  It asserts that it holds SerializablePredicateLockListLock and is called only by PredicateLockPageSplit(), which acquires it in LW_EXCLUSIVE.

3.  DropAllPredicateLocksFromTable(), which removes and restores the scratch entry and also explicitly inserts a transferred entry.  Acquires SerializablePredicateLockListLock in LW_EXCLUSIVE.

I wondered if DropAllPredicateLocksFromTable() had itself inserted a tag that accidentally looks like the scratch tag in between removing and restoring it, perhaps because the relation passed in had a bogus 0 DB OID etc., but it constructs a tag with SET_PREDICATELOCKTARGETTAG_RELATION(heaptargettag, dbId, heapId), which sets locktag_field3 to InvalidBlockNumber == -1, not 0, so that can't explain it.

I wondered if a concurrent PredicateLockPageSplit() called TransferPredicateLocksToNewTarget() using a newtargettag built from a bogus Relation with DB OID 0, rel OID 0 and newblkno 0, but that doesn't help because SerializablePredicateLockListLock is acquired in LW_EXCLUSIVE, so it can't run concurrently.

It looks a bit like something at a lower level must be broken (GCC 6.3 was released 6 months ago; maybe it interacts badly with some clever memory-model-dependent code of ours?) or something is trashing memory.
Here's the set of tests that ran concurrently with select_into, whose backtrace we see ("DROP SCHEMA selinto_schema CASCADE;"):

parallel group (20 tests): select_distinct_on delete select_having random btree_index select_distinct namespace update case hash_index select_implicit subselect select_into arrays prepared_xacts transactions portals aggregates join union

Of those, I see that prepared_xacts, portals and transactions explicitly use SERIALIZABLE (which may or may not be important).  I wonder if the thing to do here is to run select_into (or maybe just its setup and tear-down, "DROP SCHEMA ...") concurrently with those others in tight loops and burn some CPU.

--
Thomas Munro
http://www.enterprisedb.com
Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise
Hello,

I changed the test to run for 6 hours at a time regardless of the number of transactions.  I also changed the du command to only look at the database (previously WAL logs were included).  This is the clearest indication of the problem I have been able to produce.  Again, this is with 128 clients and 500 warehouses.  The first test is a clean test: everything dropped, vacuumed, etc.  Each subsequent test is just starting the test again, to have breakpoints.

-------------------------------------+-------
 autovacuum                          | on
 autovacuum_analyze_scale_factor     | 0.1
 autovacuum_analyze_threshold        | 50
 autovacuum_freeze_max_age           | 2
 autovacuum_max_workers              | 12
 autovacuum_multixact_freeze_max_age | 4
 autovacuum_naptime                  | 10
 autovacuum_vacuum_cost_delay        | 0
 autovacuum_vacuum_cost_limit        | 5000
 autovacuum_vacuum_scale_factor      | 0.1
 autovacuum_vacuum_threshold         | 50
 autovacuum_work_mem                 | -1
 log_autovacuum_min_duration         | -1
 max_wal_size                        | 640
 checkpoint_timeout                  | 86400
 checkpoint_completion_target        | 0.5

Starting base metric:
50G     /srv/main/base

Test 1:  90G  /srv/main/base   TPS: 838
Test 2: 121G  /srv/main/base   TPS: 725
Test 3: 146G  /srv/main/base   TPS: 642
Test 4: 171G  /srv/main/base   TPS: 549
Test 5: 189G  /srv/main/base   TPS: 489
Test 6: 208G  /srv/main/base   TPS: 454

As you can see, even with aggressive vacuuming, over a period of 36 hours life gets increasingly miserable.
The largest table is:

postgres=# select pg_size_pretty(pg_total_relation_size('bmsql_order_line'));
 pg_size_pretty
----------------
 148 GB
(1 row)

postgres=# \d bmsql_order_line
              Table "public.bmsql_order_line"
     Column     |            Type             | Modifiers
----------------+-----------------------------+-----------
 ol_w_id        | integer                     | not null
 ol_d_id        | integer                     | not null
 ol_o_id        | integer                     | not null
 ol_number      | integer                     | not null
 ol_i_id        | integer                     | not null
 ol_delivery_d  | timestamp without time zone |
 ol_amount      | numeric(6,2)                |
 ol_supply_w_id | integer                     |
 ol_quantity    | integer                     |
 ol_dist_info   | character(24)               |
Indexes:
    "bmsql_order_line_pkey" PRIMARY KEY, btree (ol_w_id, ol_d_id, ol_o_id, ol_number)
Foreign-key constraints:
    "ol_order_fkey" FOREIGN KEY (ol_w_id, ol_d_id, ol_o_id) REFERENCES bmsql_oorder(o_w_id, o_d_id, o_id)
    "ol_stock_fkey" FOREIGN KEY (ol_supply_w_id, ol_i_id) REFERENCES bmsql_stock(s_w_id, s_i_id)

With the PK being:

postgres=# select pg_size_pretty(pg_relation_size('bmsql_order_line_pkey'));
 pg_size_pretty
----------------
 48 GB
(1 row)

I tried to see how much data we are dealing with here:

postgres=# select count(*) from bmsql_order_line;
   count
-----------
 910324839
(1 row)

Time: 503965.767 ms

And just to show that we were pushing to get these numbers:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.38    0.00    2.20    1.98    0.00   93.44

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb            2027.40       239.99         0.05       1199          0
sda               0.80         0.00         0.01          0          0

So we have 910M rows, and it took 8.39941667 minutes to count them at 240MB/s.  I know this is a lot of data and, as I said previously, I'm happy to let anyone look at it.  However, we clearly have something deeper to look into.

Thanks in advance,

JD

--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc
PostgreSQL Centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://pgconf.us
* Unless otherwise stated, opinions are my own. *
Re: [HACKERS] Improve performance for index search ANY(ARRAY[]) condition with single item
I wrote:
> Dima Pavlov writes:
>> That's my first patch so I will be grateful for constructive criticism.
>
> I think it would be better to do this in the planner, see specifically
> eval_const_expressions.

BTW, currently eval_const_expressions doesn't know anything about ScalarArrayOpExpr, but there's a patch pending to improve that:
https://commitfest.postgresql.org/14/1136/

You should probably build your revised patch as a follow-on to that one, else we're going to have merge conflicts.

regards, tom lane
Re: [HACKERS] Improve performance for index search ANY(ARRAY[]) condition with single item
Dima Pavlov writes:
> The problem was discussed on stackoverflow:
> https://stackoverflow.com/questions/45061966/index-usage-with-single-item-anyarray

Usually, we're not very happy with submissions that reference external pages for justification; we like to have all the relevant material in our mail archives.

> That's my first patch so I will be grateful for constructive criticism.

I think it would be better to do this in the planner, see specifically eval_const_expressions.  That would allow the transformation to succeed in more cases, like where the array is coming from a parameter rather than an ARRAY[] construct.  That is, you should be able to do this not only for ARRAY[x] but for any single-element constant array.

regards, tom lane
Re: [HACKERS] Testlib.pm vs msys
Andrew Dunstan writes:
> It turns out I was wrong about the problem jacana has been having with
> the pg_ctl tests hanging.  The problem was not the use of select as a
> timeout mechanism, although I think the change to using
> Time::HiRes::usleep() is correct and shouldn't be reverted.
> The problem is command_like's use of redirection to strings.  Why this
> should be a problem for this particular use is a matter of speculation.
> I suspect it's to do with the fact that in this instance pg_ctl is
> leaving behind some child processes (i.e. postmaster and children) after
> it exits, and so on this particular path IPC::Run isn't detecting the
> exit properly.  The workaround I have found to work is to redirect
> command_like's output instead to a couple of files and then slurp in
> those files and delete them.  A bit hacky, I know, so I'm open to other
> suggestions.

Yeah, I'd been eyeing that behavior of IPC::Run a month or so back, though from the opposite direction.  If you are reading either stdout or stderr of the executed command into Perl, then it detects command completion by waiting till it gets EOF on those stream(s).  If you are reading neither, then it goes into this wonky backoff behavior where it sleeps a bit and then checks waitpid(WNOHANG), with the value of "a bit" continually increasing until it reaches a fairly large value, half a second or a second (I forget).  So you have potentially some sizable fraction of a second that's just wasted after command termination.

I'd been able to make a small but noticeable improvement in the runtime of some of our TAP test suites by forcing the first behavior, ie reading stdout even if we were going to throw it away.  So I'm not really that excited about going in the other direction ;-).  It shouldn't matter much time-wise for short-lived commands, but it's disturbing if the EOF technique fails entirely for some cases.
I looked at jacana's two recent pg_ctlCheck failures, and they both seem to have failed on this:

command_like([ 'pg_ctl', 'start', '-D', "$tempdir/data", '-l',
		"$TestLib::log_path/001_start_stop_server.log" ],
	qr/done.*server started/s,
	'pg_ctl start');

That is redirecting the postmaster's stdout/stderr into a file, for sure, so the child processes shouldn't impact EOF detection AFAICS.  It's also hard to explain this way why it only fails some of the time.  I think we need to look at what the recent changes were in this area and try to form a better theory of why it's started to fail here.

regards, tom lane
[HACKERS] Improve performance for index search ANY(ARRAY[]) condition with single item
Hello,

The problems I tried to solve here:

1. Improve performance of index searches for an ANY(ARRAY[...]) condition with a single item.

2. I saw tons of user code like:

if len(array) == 1:
    sql += '{}'.format(array[0])
else:
    sql += 'ANY(ARRAY[{}])'.format(array)

So there will be fewer lines of code and it will be clearer.

3. It is confusing that "IN" works well with a single item while "ANY(ARRAY[])" doesn't, without any real reason.

The problem was discussed on stackoverflow:
https://stackoverflow.com/questions/45061966/index-usage-with-single-item-anyarray

That's my first patch, so I will be grateful for constructive criticism.

---
CREATE TABLE public.t (id serial, a integer, b integer);
INSERT INTO t(a, b) SELECT round(random()*1000), round(random()*1000) FROM generate_series(1, 100);
CREATE INDEX "i_1" ON public.t USING btree (a, b);
CREATE INDEX "i_2" ON public.t USING btree (b);
---

With "a = 50" in the first query, everything is ok, and the appropriate index "i_1" is used:

SELECT * FROM t WHERE a = 50 ORDER BY b LIMIT 1

"Limit (cost=0.42..4.03 rows=1 width=12) (actual time=0.085..0.085 rows=1 loops=1)"
"  Buffers: shared hit=1 read=3"
"  -> Index Scan using i_1 on t (cost=0.42..4683.12 rows=1300 width=12) (actual time=0.084..0.084 rows=1 loops=1)"
"       Index Cond: (a = 50)"
"       Buffers: shared hit=1 read=3"
"Planning time: 0.637 ms"
"Execution time: 0.114 ms"

---
With "a IN (50)" the result is the same:

SELECT * FROM t WHERE a IN (50) ORDER BY b LIMIT 1

"Limit (cost=0.42..4.03 rows=1 width=12) (actual time=0.058..0.058 rows=1 loops=1)"
"  Buffers: shared hit=4"
"  -> Index Scan using i_1 on t (cost=0.42..4683.12 rows=1300 width=12) (actual time=0.056..0.056 rows=1 loops=1)"
"       Index Cond: (a = 50)"
"       Buffers: shared hit=4"
"Planning time: 0.287 ms"
"Execution time: 0.105 ms"

---
The problem appears when I try to use "a = ANY(ARRAY[50])".
The wrong index "i_2" is used instead of "i_1", and execution time becomes 25x longer:

SELECT * FROM t WHERE a = ANY(ARRAY[50]) ORDER BY b LIMIT 1

"Limit  (cost=0.42..38.00 rows=1 width=12) (actual time=2.591..2.591 rows=1 loops=1)"
"  Buffers: shared hit=491 read=4"
"  ->  Index Scan using i_2 on t  (cost=0.42..48853.65 rows=1300 width=12) (actual time=2.588..2.588 rows=1 loops=1)"
"        Filter: (a = ANY ('{50}'::integer[]))"
"        Rows Removed by Filter: 520"
"        Buffers: shared hit=491 read=4"
"Planning time: 0.251 ms"
"Execution time: 2.627 ms"

improve-single-item-array.patch
Description: Binary data
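The client-side pattern described in point 2 above (unwrapping a single-element array before building the query, so the planner sees a plain equality) can be sketched as follows. This is a minimal illustration, not the patch itself; the helper name `build_condition` and its parameter style are hypothetical:

```python
# Sketch of the client-side workaround for the planner quirk shown above:
# collapse a one-element list to a plain equality so the btree index on
# (a, b) can be chosen. Names are illustrative, not from the patch.

def build_condition(column, values):
    """Return an SQL fragment and its parameter list for 'column matches values'."""
    if len(values) == 1:
        # "a = %s" lets the planner use the (a, b) index directly.
        return "{} = %s".format(column), [values[0]]
    # "a = ANY(%s)" with a list parameter; before the patch this may be
    # planned as a filter rather than an index condition.
    return "{} = ANY(%s)".format(column), [list(values)]

frag, params = build_condition("a", [50])
print(frag, params)  # a = %s [50]
```

This is exactly the boilerplate the patch aims to make unnecessary by teaching the planner to treat a single-element ANY(ARRAY[...]) like a plain equality.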
[HACKERS] [GSOC][weekly report 7] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions
In the last week, I focused on:

1) Creating an independent skip list data structure and related interfaces. For now it has only two levels, so that I didn't have to modify too much existing code, but it can easily be extended to any number of levels if necessary. Unfortunately, its performance is not good: in some cases it is only half that of the original version. This reminded me that even though conflict tracking didn't consume too much CPU time, it is on the critical path and wrapped by a pair of lock acquire/release calls. Slower conflict tracking results in more lock contention, which makes performance drop quickly.

2) Using some tricks to improve its performance. For example, I found that if the length of the conflict list is smaller than 10, the original linked list is faster than the skip list. So I used them in a hybrid way: if there are more than 10 conflicts in total, use the skip list; otherwise use the linked list. Now the performance is approximately equal to the original version in different benchmarks, but I haven't found a case in which the new version is much faster. The patch is attached.

So far, I have tried:
1) using a hash table for conflict tracking;
2) reducing the global lock contention;
3) using a skip list for conflict tracking.

But none of them improved the performance, so I'm a little confused now about what to do next. Could you please give me any suggestions?

--
Sincerely

Mengxing Liu

skip-list-for-conflict-tracking.patch
Description: Binary data
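The hybrid approach in point 2 — a plain linked list below a size threshold, a more scalable ordered structure above it — can be illustrated in miniature. This sketch is in Python rather than the C of the actual patch; the threshold of 10 comes from the measurements above, everything else (class and method names, using a sorted array in place of the skip list) is illustrative:

```python
# Toy illustration of the hybrid conflict-tracking idea: keep a plain list
# while the set is small (linear scan has cheap constant factors), and
# switch to a sorted structure once it grows past THRESHOLD. The real
# patch does this in C with an actual skip list; bisect stands in for it.

import bisect

THRESHOLD = 10  # below this, a linear scan of a plain list wins

class HybridConflictSet:
    def __init__(self):
        self.items = []          # unsorted list while small
        self.sorted_mode = False

    def add(self, x):
        if not self.sorted_mode:
            if x not in self.items:
                self.items.append(x)
            if len(self.items) > THRESHOLD:
                self.items.sort()        # one-time conversion
                self.sorted_mode = True
        elif not self.contains(x):
            bisect.insort(self.items, x)  # O(log n) search to place it

    def contains(self, x):
        if not self.sorted_mode:
            return x in self.items       # O(n) scan, fine for n <= 10
        i = bisect.bisect_left(self.items, x)
        return i < len(self.items) and self.items[i] == x

s = HybridConflictSet()
for v in range(25):
    s.add(v)
print(s.sorted_mode, s.contains(7), s.contains(99))  # True True False
```

The design point matches the observation in the report: the asymptotically better structure only pays off once the conflict list is long enough to amortize its per-operation overhead.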
[HACKERS] Testlib.pm vs msys
It turns out I was wrong about the problem jacana has been having with the pg_ctl tests hanging. The problem was not the use of select as a timeout mechanism, although I think the change to using Time::HiRes::usleep() is correct and shouldn't be reverted.

The problem is command_like's use of redirection to strings. Why this should be a problem for this particular use is a matter of speculation. I suspect it has to do with the fact that in this instance pg_ctl leaves behind some child processes (i.e. postmaster and children) after it exits, and so on this particular path IPC::Run isn't detecting the exit properly.

The workaround I have found to work is to redirect command_like's output instead to a couple of files, and then slurp in and delete those files. A bit hacky, I know, so I'm open to other suggestions.

cheers

andrew

--
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services