Re: [HACKERS] [TRAP: FailedAssertion] causing server to crash
> On 19 July 2017 at 20:26, Neha Sharma wrote:
>> Hi,
>>
>> I am getting FailedAssertion while executing the attached script. However,
>> I am not able to produce the core dump for the same; the script runs in
>> background and takes around a day to produce the mentioned error.
>>
>> "TRAP: FailedAssertion("!(TransactionIdPrecedesOrEquals(oldestXact,
>> ShmemVariableCache->oldestXid))", File: "clog.c", Line: 683)
>> 2017-07-19 01:16:51.973 GMT [27873] LOG: server process (PID 28084) was
>> terminated by signal 6: Aborted
>> 2017-07-19 01:16:51.973 GMT [27873] DETAIL: Failed process was running:
>> autovacuum: VACUUM pg_toast.pg_toast_13029 (to prevent wraparound)"

What are the starting conditions of your postgres instance? Does your script assume a newly initdb'd instance with no custom configuration? If not, what setup steps/configuration precede your script run?

> well short of the 2-million mark.

Er, billion.

-- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] <> join selectivity estimate question
On Fri, Jun 2, 2017 at 4:16 AM, Tom Lane wrote:
> I don't think it does really. The thing about a <> semijoin is that it
> will succeed unless *every* join key value from the inner query is equal
> to the outer key value (or is null). That's something we should consider
> to be of very low probability typically, so that the <> selectivity should
> be estimated as nearly 1.0. If the regular equality selectivity
> approaches 1.0, or when there are expected to be very few rows out of the
> inner query, then maybe the <> estimate should start to drop off from 1.0,
> but it surely doesn't move linearly with the equality selectivity.

OK, here I go like a bull in a china shop: please find attached a draft patch. Is this getting warmer?

In the comment for JOIN_SEMI I mentioned a couple of refinements I thought of, but my intuition was that we don't go for such sensitive and discontinuous treatment of stats, so I made the simplifying assumption that the RHS always has more than 1 distinct value in it.

An anti-join <> returns all the nulls from the LHS, and then it only returns other LHS rows if there is exactly one distinct non-null value in the RHS and it happens to be that one. But if we make the same assumption I described above, namely that there are always at least 2 distinct values on the RHS, then the join selectivity is just nullfrac.

-- Thomas Munro http://www.enterprisedb.com

neqjoinsel-fix-v1.patch Description: Binary data

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
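For reference, the plan shapes in question can be produced like this (table names invented; it is only the selectivity estimates, not the shapes, that the patch changes):

CREATE TABLE outer_t (x integer);
CREATE TABLE inner_t (y integer);

-- <> semijoin: succeeds unless every inner y equals o.x (or o.x is null),
-- so its selectivity should be estimated as nearly 1.0.
EXPLAIN SELECT * FROM outer_t o
WHERE EXISTS (SELECT 1 FROM inner_t i WHERE o.x <> i.y);

-- <> anti-join: passes the LHS nulls, plus non-null rows only when the
-- RHS has exactly one distinct non-null value and it equals o.x, so its
-- selectivity should come out close to nullfrac.
EXPLAIN SELECT * FROM outer_t o
WHERE NOT EXISTS (SELECT 1 FROM inner_t i WHERE o.x <> i.y);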
Re: [HACKERS] [TRAP: FailedAssertion] causing server to crash
On 19 July 2017 at 20:26, Neha Sharma wrote:
> Hi,
>
> I am getting FailedAssertion while executing the attached script. However,
> I am not able to produce the core dump for the same; the script runs in
> background and takes around a day to produce the mentioned error.
>
> "TRAP: FailedAssertion("!(TransactionIdPrecedesOrEquals(oldestXact,
> ShmemVariableCache->oldestXid))", File: "clog.c", Line: 683)
> 2017-07-19 01:16:51.973 GMT [27873] LOG: server process (PID 28084) was
> terminated by signal 6: Aborted
> 2017-07-19 01:16:51.973 GMT [27873] DETAIL: Failed process was running:
> autovacuum: VACUUM pg_toast.pg_toast_13029 (to prevent wraparound)"

It's not much of a stretch to assume this was likely introduced by ea42cc18c:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=ea42cc18c35381f639d45628d792e790ff39e271

so I'll have a look to see if I can spot how it could happen.

Please show your "SELECT version()" and, if using a build from git, the "git rev-parse --short HEAD" for the build you're running.

It's interesting and probably significant that your Pg was nearing wraparound. Just a tip for that - if you want to create a situation where you near wraparound quickly and easily, for testing, you can, on a test instance you don't care about:

- select txid_current();
- stop pg
- pg_resetxlog -x $(( $the_xid + (1<<31) - 1001000 ))
- try to start pg, note the missing clog segment it complains about
- dd if=/dev/zero bs=1 count=262144 of=datadir/pg_clog/$MISSINGSEGNAME
- start Pg

That should put you about 1000 txns from the 1 million xid stop limit, assuming I got my maths right (don't assume that), and assuming your starting xid is well short of the 2-million mark.

-- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
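To watch how close an instance is getting while a test like this runs, the standard catalog query does it:

SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;

Autovacuum switches to anti-wraparound vacuuming as age(datfrozenxid) approaches autovacuum_freeze_max_age.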
Re: [HACKERS] Dealing with logical replication
On 19 July 2017 at 16:34, Tatsuo Ishii wrote:
> Now that we are going to have logical replication in PostgreSQL 10, I
> have started thinking how Pgpool-II can deal with it. For example, the
> logical replication does not replicate DDLs. Isn't it convenient for
> users to do it automatically in Pgpool-II? Or even doing it for
> TRUNCATE?
>
> Or are they against the design philosophy of the logical replication?

(Disclaimer - Petr Jelinek and Peter Eisentraut were the main ones working on in-core logical rep, and I haven't had time to play with it much).

TL;DR: a pooler can only solve a limited subset of the problem, fairly unreliably, and this is really something that needs further work in core.

Not replicating TRUNCATE and schema changes is more an implementation limitation than anything else. It's desirable to get to the point where logical replication can transparently replicate DDL. There are some hurdles for doing so, but it'll be possible to get something in place that's good enough when time/release progress permits. Similarly, with TRUNCATE, AFAIK support just didn't make the cut for pg10.

A pooler could well help in the mean time, but you have to consider ordering with care. For example, given "U" upstream, "D" downstream:

U: CREATE TABLE x (a integer, b integer);
D: CREATE TABLE x (a integer, b integer);  -- user script syncs
U: INSERT INTO x (a,b) VALUES (1,0);
D: [applies INSERT 1,0]
U: INSERT INTO x (a,b) VALUES (2,0);
U: ALTER TABLE x DROP COLUMN b;
D: ALTER TABLE x DROP COLUMN b;            -- user script syncs
U: INSERT INTO x (a) VALUES (3);
D: [ERROR on INSERT of 2,0: no column 'b']

Because the DDL here is transported out of band vs the row data, you can easily create situations where the schema change is applied before the last committed-but-not-yet-replicated data from the upstream that was based on the old schema.

To achieve correct ordering, the simplest approach is to record DDL in a table when you perform it on the upstream, and replay it when you see rows in that table appear on the downstream; you then know it's safe to replay it (see the sketch below). This is the essence of what BDR and pglogical do with their DDL replication, but they handle DDL in the middle of transactions that also make row data changes by intercepting writes to the queue table and performing the DDL at the exact point in the transaction where it happened on the upstream. I don't think that's possible with the in-core logical replication yet, and certainly not something a pooler can do. To do it externally, you have to take note of when a schema change happened on the upstream and apply it on the downstream at or after the point where the downstream has replayed and confirmed up to the upstream lsn where the schema change happened. Then apply the schema change.

A pooler trying to help here must also be very aware of the impact of multi-statements. If you send a single simple query message with a mixture of schema change commands and normal DML, you probably don't want to repeat the DML on downstream nodes or you'll get duplicate rows etc. But unless it embeds transaction control commands, a simple query message executes in a single implicit transaction, so if you extract just the DDL you'll again have ordering issues of upstream vs downstream.
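To make the queue-table idea concrete, here's a minimal sketch, assuming single-statement DDL only (the table, function and trigger names are invented, and current_query() returns the whole client query string, so multi-statements would need the byte-range handling mentioned below):

CREATE TABLE replicated_ddl (
    id  bigserial PRIMARY KEY,
    ddl text NOT NULL
);

CREATE FUNCTION queue_ddl() RETURNS event_trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- Capture the statement text that fired this ddl_command_end event.
    INSERT INTO replicated_ddl (ddl) VALUES (current_query());
END;
$$;

CREATE EVENT TRIGGER capture_ddl
    ON ddl_command_end
    EXECUTE PROCEDURE queue_ddl();

If replicated_ddl is part of the publication, its rows arrive on the downstream in the correct order relative to the other row changes, and a consumer there - say, a row trigger marked ENABLE ALWAYS so it fires during apply - could EXECUTE the arriving text. Whether that's safe inside the apply worker is exactly the sort of open question discussed below.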
There are even a variety of difficulties to overcome with doing it in core: event triggers don't capture DDL command text and have no facility to turn the internal command representation back into SQL command text, nor do we have any way to turn the internal representation back into a parsenode tree for execution on a downstream's standard_ProcessUtility.

However, we can capture raw command text with ProcessUtility_hook now that we have byte-ranges for the query-parts of a multi-part query (yay!), and that works well enough if you also capture the active search_path and apply with the same search_path. It can match the wrong object if extra objects with the same name are present earlier on the search_path on the downstream than on the upstream, so it's not ideal, but that's a weird corner case.

If we had a hook in the logical apply worker's insert or wal-message routines it'd be possible to write an extension to do this for pg10, but AFAICS we don't.

So schema changes in logical replication currently require more care than in physical replication.

-- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] psql's \r broken since e984ef5861d
On 20/07/2017 04:24, Tom Lane wrote:
> I wrote:
>> Ah. I don't feel like trawling the archives for the discussion right now,
>> but I believe this was an intentional change to make the behavior more
>> consistent.
>
> Oh ... a quick look in the commit log finds the relevant discussion:
> https://www.postgresql.org/message-id/flat/9b4ea968-753f-4b5f-b46c-d7d3bf7c8f90%40manitou-mail.org

Oh, I see. Thanks a lot, sorry for the noise.

-- Julien Rouhaud

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] pageinspect function to decode infomasks
On Wed, Jul 19, 2017 at 8:33 PM, Craig Ringer wrote:
> That's silly, so here's a patch to teach pageinspect how to decode
> infomasks to a human readable array of flag names.
>
> Example:
>
> SELECT t_infomask, t_infomask2, flags
> FROM heap_page_items(get_raw_page('test1', 0)),
>      LATERAL heap_infomask_flags(t_infomask, t_infomask2, true) m(flags);
>  t_infomask | t_infomask2 | flags
> ------------+-------------+----------------------------------------------------------------------------
>        2816 |           2 | {HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID,HEAP_XMAX_INVALID,HEAP_XMIN_FROZEN}
> (1 row)

Seems like a good idea to me.

-- Peter Geoghegan

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] pageinspect function to decode infomasks
On 20 July 2017 at 11:33, Craig Ringer wrote:
> Hi
>
> Whenever I'm debugging some kind of corruption incident, possible
> visibility bug, etc, I always land up staring at integer infomasks or
> using a SQL helper function to decode them.
>
> That's silly, so here's a patch to teach pageinspect how to decode
> infomasks to a human readable array of flag names.
>
> Example:
>
> SELECT t_infomask, t_infomask2, flags
> FROM heap_page_items(get_raw_page('test1', 0)),
>      LATERAL heap_infomask_flags(t_infomask, t_infomask2, true) m(flags);
>  t_infomask | t_infomask2 | flags
> ------------+-------------+----------------------------------------------------------------------------
>        2816 |           2 | {HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID,HEAP_XMAX_INVALID,HEAP_XMIN_FROZEN}
> (1 row)
>
> To decode individual mask integers you can just call it directly. It's
> strict, so pass 0 for the other mask if you don't have both, e.g.
>
> SELECT heap_infomask_flags(2816, 0);
>
> The patch backports easily to older pageinspect versions for when you're
> debugging something old.
>
> BTW, I used text[] not enums. That costs a fair bit of memory, but it
> doesn't seem worth worrying too much about in this context.
>
> For convenience it also tests and reports HEAP_LOCKED_UPGRADED and
> HEAP_XMAX_IS_LOCKED_ONLY as pseudo-flags.
>
> I decided not to filter out
> HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID,HEAP_XMAX_INVALID
> when HEAP_XMIN_FROZEN is set

Er, decided not to filter out HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID. Obviously wouldn't filter out HEAP_XMAX_INVALID; that was a copy-paste'o.

I wonder if it's worth dropping the HEAP_ prefix. Meh, anyway, usable as-is.

-- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] [PATCH] pageinspect function to decode infomasks
Hi

Whenever I'm debugging some kind of corruption incident, possible visibility bug, etc, I always land up staring at integer infomasks or using a SQL helper function to decode them.

That's silly, so here's a patch to teach pageinspect how to decode infomasks to a human readable array of flag names.

Example:

SELECT t_infomask, t_infomask2, flags
FROM heap_page_items(get_raw_page('test1', 0)),
     LATERAL heap_infomask_flags(t_infomask, t_infomask2, true) m(flags);
 t_infomask | t_infomask2 | flags
------------+-------------+----------------------------------------------------------------------------
       2816 |           2 | {HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID,HEAP_XMAX_INVALID,HEAP_XMIN_FROZEN}
(1 row)

To decode individual mask integers you can just call it directly. It's strict, so pass 0 for the other mask if you don't have both, e.g.

SELECT heap_infomask_flags(2816, 0);

The patch backports easily to older pageinspect versions for when you're debugging something old.

BTW, I used text[] not enums. That costs a fair bit of memory, but it doesn't seem worth worrying too much about in this context.

For convenience it also tests and reports HEAP_LOCKED_UPGRADED and HEAP_XMAX_IS_LOCKED_ONLY as pseudo-flags.

I decided not to filter out HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID,HEAP_XMAX_INVALID when HEAP_XMIN_FROZEN is set; that doesn't make sense when we examine HEAP_XMAX_IS_LOCKED_ONLY or HEAP_LOCKED_UPGRADED, and filtering them out could be just as confusing as leaving them in.

The infomask2 natts mask is ignored. You can bitwise-and it out in SQL pretty easily if needed. I could output it here as a constructed text datum, but it seems mostly pointless.

-- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services

From 488a1f69b8082258d508ba681a4f4a5f6fce2267 Mon Sep 17 00:00:00 2001
From: Craig Ringer
Date: Thu, 20 Jul 2017 11:20:21 +0800
Subject: [PATCH v1] Introduce heap_infomask_flags to decode infomask and infomask2

---
 contrib/pageinspect/Makefile                  |   3 +-
 contrib/pageinspect/expected/page.out         |  25 ++
 contrib/pageinspect/heapfuncs.c               | 120 ++
 contrib/pageinspect/pageinspect--1.6--1.7.sql |   9 ++
 contrib/pageinspect/pageinspect.control       |   2 +-
 contrib/pageinspect/sql/page.sql              |  14 +++
 doc/src/sgml/pageinspect.sgml                 |  32 +++
 7 files changed, 203 insertions(+), 2 deletions(-)
 create mode 100644 contrib/pageinspect/pageinspect--1.6--1.7.sql

diff --git a/contrib/pageinspect/Makefile b/contrib/pageinspect/Makefile
index 0a3cbee..de114c7 100644
--- a/contrib/pageinspect/Makefile
+++ b/contrib/pageinspect/Makefile
@@ -5,7 +5,8 @@ OBJS = rawpage.o heapfuncs.o btreefuncs.o fsmfuncs.o \
 	brinfuncs.o ginfuncs.o hashfuncs.o $(WIN32RES)
 
 EXTENSION = pageinspect
-DATA = pageinspect--1.5.sql pageinspect--1.5--1.6.sql \
+DATA = pageinspect--1.6--1.7.sql \
+	pageinspect--1.5.sql pageinspect--1.5--1.6.sql \
 	pageinspect--1.4--1.5.sql pageinspect--1.3--1.4.sql \
 	pageinspect--1.2--1.3.sql pageinspect--1.1--1.2.sql \
 	pageinspect--1.0--1.1.sql pageinspect--unpackaged--1.0.sql
diff --git a/contrib/pageinspect/expected/page.out b/contrib/pageinspect/expected/page.out
index 8e15947..054c69d 100644
--- a/contrib/pageinspect/expected/page.out
+++ b/contrib/pageinspect/expected/page.out
@@ -82,6 +82,31 @@ SELECT * FROM fsm_page_contents(get_raw_page('test1', 'fsm', 0));
 (1 row)
 
+-- If we freeze the only tuple on test1, the infomask should
+-- always be the same in all test runs.
+VACUUM FREEZE test1;
+SELECT t_infomask, t_infomask2, flags
+FROM heap_page_items(get_raw_page('test1', 0)),
+     LATERAL heap_infomask_flags(t_infomask, t_infomask2, true) m(flags);
+ t_infomask | t_infomask2 | flags
+------------+-------------+----------------------------------------------------------------------------
+       2816 |           2 | {HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID,HEAP_XMAX_INVALID,HEAP_XMIN_FROZEN}
+(1 row)
+
+SELECT t_infomask, t_infomask2, flags
+FROM heap_page_items(get_raw_page('test1', 0)),
+     LATERAL heap_infomask_flags(t_infomask, t_infomask2, false) m(flags);
+ t_infomask | t_infomask2 | flags
+------------+-------------+------------------------------------------------------------
+       2816 |           2 | {HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID,HEAP_XMAX_INVALID}
+(1 row)
+
+SELECT heap_infomask_flags(2816, 0);
+                    heap_infomask_flags
+------------------------------------------------------------
+ {HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID,HEAP_XMAX_INVALID}
+(1 row)
+
 DROP TABLE test1;
 -- check that using any of these functions with a partitioned table would fail
 create table
Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables
On Thu, Jul 20, 2017 at 2:02 PM, Robert Haas wrote:
> On Wed, Jul 19, 2017 at 7:45 PM, Thomas Munro wrote:
>> Isn't this the same as the issue reported here?
>>
>> https://www.postgresql.org/message-id/flat/CAEepm%3D270ze2hVxWkJw-5eKzc3AB4C9KpH3L2kih75R5pdSogg%40mail.gmail.com
>
> Hmm, possibly. But why would that affect the partition-wise join case only?

It doesn't. From Rafia's part_reg.zip we see a bunch of rows=1 that turn out to be wrong by several orders of magnitude:

21_nopart_head.out:  Hash Semi Join (cost=5720107.25..9442574.55 rows=1 width=50)
21_part_head.out:    Hash Semi Join (cost=5423094.06..8847638.36 rows=1 width=38)
21_part_patched.out: Hash Semi Join (cost=309300.53..491665.60 rows=1 width=12)

My guess is that the consequences of that bad estimate are sensitive to arbitrary other parameters moving around, as you can see from the big jump in execution time I showed in that message, measured on unpatched master of the day:

4 workers = 9.5s
3 workers = 39.7s

That's why both parallel hash join and partition-wise join are showing regressions on Q21: it's just flip-flopping between various badly costed plans. Note that even without parallelism, the fix that Tom Lane suggested gives a much better plan:

https://www.postgresql.org/message-id/CAEepm%3D11BiYUkgXZNzMtYhXh4S3a9DwUP8O%2BF2_ZPeGzzJFPbw%40mail.gmail.com

-- Thomas Munro http://www.enterprisedb.com

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise
On Wed, Jul 19, 2017 at 7:57 PM, Tom Lane wrote:
> Peter Geoghegan writes:
>> My argument for the importance of index bloat to the more general
>> bloat problem is simple: any bloat that accumulates, that cannot be
>> cleaned up, will probably accumulate until it impacts performance
>> quite noticeably.
>
> But that just begs the question: *does* it accumulate indefinitely, or
> does it eventually reach a more-or-less steady state?

Yes, I believe it does reach a more-or-less steady state. It saturates when there is a lot of contention, because then you actually can reuse the bloat. If it didn't saturate, and instead became arbitrarily bad, then we'd surely have heard about that before now. The bloat is not entirely wasted, because it actually prevents you from getting even more bloat in that part of the keyspace.

> The traditional
> wisdom about btrees, for instance, is that no matter how full you pack
> them to start with, the steady state is going to involve something like
> 1/3rd free space. You can call that bloat if you want, but it's not
> likely that you'll be able to reduce the number significantly without
> paying exorbitant costs.

For the purposes of this discussion, I'm mostly talking about duplicates within a page on a unique index. If the keyspace owned by an int4 unique index page only covers 20 distinct values, it will only ever cover 20 distinct values, now and forever, despite the fact that there is room for about 400 (a 90/10 split leaves you with 366 items + 1 high key). I don't know if I should really even call this bloat, since the term is so overloaded, although this is what other database systems call index bloat. I like to think of it as "damage to the keyspace", although that terminology seems unlikely to catch on.

-- Peter Geoghegan

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
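For eyeballing this on a live system, pageinspect can already show per-page item counts and free space; something along these lines works (the index name and block range are placeholders, and the blocks must exist; block 0 is the metapage):

CREATE EXTENSION IF NOT EXISTS pageinspect;

SELECT s.blkno, s.live_items, s.dead_items, s.avg_item_size, s.free_size
FROM generate_series(1, 10) AS g(blk),
     LATERAL bt_page_stats('pgbench_accounts_pkey', g.blk) AS s;

Watching live_items, dead_items and free_size for the same leaf pages over time gives a rough picture of whether space in that part of the keyspace is being reused or just accumulating.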
Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise
Peter Geoghegan writes:
> My argument for the importance of index bloat to the more general
> bloat problem is simple: any bloat that accumulates, that cannot be
> cleaned up, will probably accumulate until it impacts performance
> quite noticeably.

But that just begs the question: *does* it accumulate indefinitely, or does it eventually reach a more-or-less steady state? The traditional wisdom about btrees, for instance, is that no matter how full you pack them to start with, the steady state is going to involve something like 1/3rd free space. You can call that bloat if you want, but it's not likely that you'll be able to reduce the number significantly without paying exorbitant costs.

I'm not claiming that we don't have any problems, but I do think it's important to draw a distinction between bloat and normal operating overhead.

regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise
On Thu, Jul 20, 2017 at 8:24 AM, Stephen Frost wrote:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> "Joshua D. Drake" writes:
>>> At PGConf US Philly last week I was talking with Jim and Jan about
>>> performance. One of the items that came up is that PostgreSQL can't run
>>> full throttle for long periods of time. The long and short is that no
>>> matter what, autovacuum can't keep up. This is what I have done:
>>
>> Try reducing autovacuum_vacuum_cost_delay more, and/or increasing
>> autovacuum_vacuum_cost_limit.
>
> Or get rid of the cost delay entirely and let autovacuum actually go as
> fast as it can when it's run. The assertion that it can't keep up is
> still plausible, but configuring autovacuum to sleep regularly and then
> complaining that it's not able to keep up doesn't make sense.
>
> Reducing the nap time might also be helpful if autovacuum is going as
> fast as it can and it's able to clear a table in less than a minute.
>
> There have been discussions on this list about parallel vacuum of a
> particular table as well; to address this issue I'd encourage reviewing
> those discussions and looking at writing a patch to implement that
> feature as that would address the case where the table is large enough
> that autovacuum simply can't get through all of it before the other
> backends have used all space available and then substantially increased
> the size of the relation (leading to vacuum on the table running for
> longer).

Yeah, the parallel vacuum of a particular table might help this issue, unless disk I/O is the bottleneck. I'm planning to work on that.

Regards,

-- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bug in ExecModifyTable function and trigger issues for foreign tables
On 2017/07/19 23:36, Tom Lane wrote:
> Etsuro Fujita writes:
>> * Modified rewrite_targetlist(), which is a new function added to
>> preptlist.c, so that we do const-simplification to junk TLEs that
>> AddForeignUpdateTargets() added, as that API allows the FDW to add junk
>> TLEs containing non-Var expressions to the query's targetlist.
>
> This does not seem like a good idea to me. eval_const_expressions is not
> a cheap thing, and for most use-cases those cycles will be wasted, and it
> has never been the responsibility of preprocess_targetlist to do this
> sort of thing.

Hm, I added that const-simplification to that function so that any existing FDWs that append junk TLEs needing const-simplification (I don't know whether any really exist) would work with this fix without changes, but I agree on that point.

> Please put the responsibility of doing const-expression simplification in
> these cases somewhere closer to where the problem is being created.

Would it be reasonable to make it the FDW's responsibility to do that const-simplification if necessary?

Best regards, Etsuro Fujita

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql's \r broken since e984ef5861d
I wrote:
> Ah. I don't feel like trawling the archives for the discussion right now,
> but I believe this was an intentional change to make the behavior more
> consistent.

Oh ... a quick look in the commit log finds the relevant discussion:

https://www.postgresql.org/message-id/flat/9b4ea968-753f-4b5f-b46c-d7d3bf7c8f90%40manitou-mail.org

regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise
On Wed, Jul 19, 2017 at 3:54 PM, Alvaro Herrera wrote:
>> Index bloat is a general problem that B-Trees have in all other major
>> systems, but I think that PostgreSQL has a tendency to allow indexes
>> to become progressively more bloated over time, in a way that it often
>> can never recover from [1].
>
> Interesting assertion.

I don't pretend to understand the complicated feedback loops that may exist for workloads that are very reliant on pruning, kill_prior_tuples cleanup, very aggressive vacuuming, etc. I'll just say that it seems very reasonable to suppose that they exist.

My argument for the importance of index bloat to the more general bloat problem is simple: any bloat that accumulates, that cannot be cleaned up, will probably accumulate until it impacts performance quite noticeably.

Index bloat may not seem as important as I suggest to some. The ideas I'm talking about were received somewhat skeptically at pgCon, when I brought them up informally. Maybe this is a more cogent explanation: if your indexes were somehow magically never bloated, but the heap could become just as bloated, then it might not matter that much, simply because the heap pages wouldn't be accessed by index scans. Heap page accesses may demonstrably be the bottleneck today, without that magic in place, but perhaps only because indexes point to the bloat in the first place. It could be a bit like the situation with bloating UNDO segments in Oracle; those are not the worst thing to have to bloat. And, the kill_prior_tuples stuff is only as strong as the weakest link (oldest snapshot).

> Many years ago I set out to implement btree page
> merging[1] from ideas in a 1996 paper[2], though that work never saw the
> light of day. Maybe it can be valuable now.

Lots of other database systems don't implement B-Tree page merging, because it's hard to make work with techniques like Lehman & Yao B-Link trees, and the average case space utilization still ends up being good enough. It may be more worthwhile for us, though. Apparently Kevin has some ideas here.

If I'm right about this index bloat issue, then I'd sooner tackle the problem by preventing bloat in unique indexes in the first place, by using some additional indirection, a topic that I've said plenty about recently. I think that you can sometimes, though fairly rarely, see cases that get *really* out of hand, with things like in-database queues that have unusually aggressive update patterns where a great many duplicates are generated in indexes [1]. Our handling of duplicates in unique indexes [2] is surely a problem there.

> Another thing to consider is indirect indexes, wherein you get less heap
> bloat because more updates can become HOT.

I think that the stuff I'm talking about, about having indirection for the primary key and making sure unique indexes actually disallow even physical duplicates, actually enables indirect indexes. Remember how I pointed out issues with unique indexes and VACUUM when you presented on it at the pgCon unconference? I think that those problems may be solvable through centralizing everything, by making duplicates within leaf pages physically impossible for unique indexes (not just logically impossible). One chokepoint for dealing with bloat cleanup for every index is the entire point of your indirect index design, but that requires that you actually have no ambiguity about what every logical pointer (in every secondary index) points to.
> If we also allow heap to be pruned from line pointers by walking indexes
> to remove specific pointers, instead of requiring indexes to be scanned
> whole for the removal, as proposed by Andres, we could get further
> along.

Yeah. I talked to Robert about doing that just today. That seems very worthwhile, no matter how it ends up being used (whether it's by VACUUM, something closer to a synchronous deletion, or whatever).

[1] https://brandur.org/postgres-queues
[2] https://wiki.postgresql.org/wiki/Key_normalization#Avoiding_unnecessary_unique_index_enforcement

-- Peter Geoghegan

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql's \r broken since e984ef5861d
Julien Rouhaud writes:
> On 20/07/2017 03:34, Tom Lane wrote:
>> Works for me. Please describe exactly what misbehavior you're seeing.
> Here's a simple test case; the last \p still shows the query buffer:

Ah. I don't feel like trawling the archives for the discussion right now, but I believe this was an intentional change to make the behavior more consistent. Prior versions did things weirdly differently depending on whether you'd typed anything, e.g. modifying your example slightly:

regression=# select version();
                        version
------------------------------------------------------------------------------------------------------
 PostgreSQL 9.6.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-18), 64-bit
(1 row)

regression=# \p
select version();
regression=# mistake
regression-# \r
Query buffer reset (cleared).
regression=# \p
select version();
regression=# \g
                        version
------------------------------------------------------------------------------------------------------
 PostgreSQL 9.6.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-18), 64-bit
(1 row)

I think we felt that throwing away the previous-query buffer when we didn't have to was generally to be avoided, so we wanted to standardize on this behavior, not the other one. Do you think differently?

I have some recollection that there were also cases where \p would print something different than what \g would execute, which of course is quite nasty.

regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables
On Wed, Jul 19, 2017 at 7:45 PM, Thomas Munro wrote:
> On Thu, Jul 20, 2017 at 7:00 AM, Robert Haas wrote:
>> I think the problem is that the row count estimates for the child
>> joins seem to be totally bogus:
>>
>> -> Hash Semi Join (cost=309300.53..491665.60 rows=1 width=12)
>>    (actual time=10484.422..15945.851 rows=1523493 loops=3)
>>      Hash Cond: (l1.l_orderkey = l2.l_orderkey)
>>      Join Filter: (l2.l_suppkey <> l1.l_suppkey)
>>      Rows Removed by Join Filter: 395116
>>
>> That's clearly wrong. In the un-partitioned plan, the join to l2
>> produces about as many rows of output as the number of rows that were
>> input (998433 vs. 962909); but here, a child join with a million rows
>> as input is estimated to produce only 1 row of output. I bet the
>> problem is that the child-join's row count estimate isn't getting
>> initialized at all, but then something is clamping it to 1 row instead
>> of 0.
>>
>> So this looks like a bug in Ashutosh's patch.
>
> Isn't this the same as the issue reported here?
>
> https://www.postgresql.org/message-id/flat/CAEepm%3D270ze2hVxWkJw-5eKzc3AB4C9KpH3L2kih75R5pdSogg%40mail.gmail.com

Hmm, possibly. But why would that affect the partition-wise join case only?

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql's \r broken since e984ef5861d
On 20/07/2017 03:34, Tom Lane wrote:
> Julien Rouhaud writes:
>> Unless I miss something, \r isn't working anymore,
>
> Works for me. Please describe exactly what misbehavior you're seeing.
> What libreadline or libedit version are you using?

I have libreadline 7.0_p3. Here's a simple test case; the last \p still shows the query buffer:

psql -X postgres

postgres=# select version();
                        version
--------------------------------------------------------------------------------------------------------------
 PostgreSQL 10beta2@decb08ebdf on x86_64-pc-linux-gnu, compiled by gcc (Gentoo 4.9.3 p1.5, pie-0.6.4) 4.9.3, 64-bit
(1 row)

postgres=# \p
select version();
postgres=# \r
Query buffer reset (cleared).
postgres=# \p
select version();

On a 9.6:

postgres=# select version();
                        version
------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.6.3@3c017a545f on x86_64-pc-linux-gnu, compiled by gcc (Gentoo 4.9.3 p1.5, pie-0.6.4) 4.9.3, 64-bit
(1 row)

postgres=# \p
select version();
postgres=# \r
Query buffer reset (cleared).
postgres=# \p
Query buffer is empty.

-- Julien Rouhaud

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql's \r broken since e984ef5861d
Julien Rouhaud writes:
> Unless I miss something, \r isn't working anymore,

Works for me. Please describe exactly what misbehavior you're seeing. What libreadline or libedit version are you using?

regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] psql's \r broken since e984ef5861d
Hello,

Unless I miss something, \r isn't working anymore, since exec_command_print() falls back to displaying previous_buf if query_buf has been freed. Trivial patch to fix the issue (reset both buffers in exec_command_reset()) attached.

Regards.

-- Julien Rouhaud

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 14c64208ca..4087532052 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -109,7 +109,7 @@ static backslashResult exec_command_prompt(PsqlScanState scan_state, bool active
 static backslashResult exec_command_pset(PsqlScanState scan_state, bool active_branch);
 static backslashResult exec_command_quit(PsqlScanState scan_state, bool active_branch);
 static backslashResult exec_command_reset(PsqlScanState scan_state, bool active_branch,
-			PQExpBuffer query_buf);
+			PQExpBuffer query_buf, PQExpBuffer previous_buf);
 static backslashResult exec_command_s(PsqlScanState scan_state, bool active_branch);
 static backslashResult exec_command_set(PsqlScanState scan_state, bool active_branch);
 static backslashResult exec_command_setenv(PsqlScanState scan_state, bool active_branch,
@@ -369,7 +369,7 @@ exec_command(const char *cmd,
 	else if (strcmp(cmd, "q") == 0 || strcmp(cmd, "quit") == 0)
 		status = exec_command_quit(scan_state, active_branch);
 	else if (strcmp(cmd, "r") == 0 || strcmp(cmd, "reset") == 0)
-		status = exec_command_reset(scan_state, active_branch, query_buf);
+		status = exec_command_reset(scan_state, active_branch, query_buf, previous_buf);
 	else if (strcmp(cmd, "s") == 0)
 		status = exec_command_s(scan_state, active_branch);
 	else if (strcmp(cmd, "set") == 0)
@@ -2060,11 +2060,12 @@ exec_command_quit(PsqlScanState scan_state, bool active_branch)
  */
 static backslashResult
 exec_command_reset(PsqlScanState scan_state, bool active_branch,
-		   PQExpBuffer query_buf)
+		   PQExpBuffer query_buf, PQExpBuffer previous_buf)
 {
 	if (active_branch)
 	{
 		resetPQExpBuffer(query_buf);
+		resetPQExpBuffer(previous_buf);
 		psql_scan_reset(scan_state);
 		if (!pset.quiet)
 			puts(_("Query buffer reset (cleared)."));

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables
On Thu, Jul 20, 2017 at 7:00 AM, Robert Haas wrote:
> I think the problem is that the row count estimates for the child
> joins seem to be totally bogus:
>
> -> Hash Semi Join (cost=309300.53..491665.60 rows=1 width=12)
>    (actual time=10484.422..15945.851 rows=1523493 loops=3)
>      Hash Cond: (l1.l_orderkey = l2.l_orderkey)
>      Join Filter: (l2.l_suppkey <> l1.l_suppkey)
>      Rows Removed by Join Filter: 395116
>
> That's clearly wrong. In the un-partitioned plan, the join to l2
> produces about as many rows of output as the number of rows that were
> input (998433 vs. 962909); but here, a child join with a million rows
> as input is estimated to produce only 1 row of output. I bet the
> problem is that the child-join's row count estimate isn't getting
> initialized at all, but then something is clamping it to 1 row instead
> of 0.
>
> So this looks like a bug in Ashutosh's patch.

Isn't this the same as the issue reported here?

https://www.postgresql.org/message-id/flat/CAEepm%3D270ze2hVxWkJw-5eKzc3AB4C9KpH3L2kih75R5pdSogg%40mail.gmail.com

-- Thomas Munro http://www.enterprisedb.com

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> "Joshua D. Drake" writes:
>> At PGConf US Philly last week I was talking with Jim and Jan about
>> performance. One of the items that came up is that PostgreSQL can't run
>> full throttle for long periods of time. The long and short is that no
>> matter what, autovacuum can't keep up. This is what I have done:
>
> Try reducing autovacuum_vacuum_cost_delay more, and/or increasing
> autovacuum_vacuum_cost_limit.

Or get rid of the cost delay entirely and let autovacuum actually go as fast as it can when it's run. The assertion that it can't keep up is still plausible, but configuring autovacuum to sleep regularly and then complaining that it's not able to keep up doesn't make sense.

Reducing the nap time might also be helpful if autovacuum is going as fast as it can and it's able to clear a table in less than a minute.

There have been discussions on this list about parallel vacuum of a particular table as well; to address this issue I'd encourage reviewing those discussions and looking at writing a patch to implement that feature, as that would address the case where the table is large enough that autovacuum simply can't get through all of it before the other backends have used all available space and then substantially increased the size of the relation (leading to vacuum on the table running for longer).

Thanks!

Stephen

signature.asc Description: Digital signature
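For the record, the no-throttling configuration suggested above is just this (values illustrative; both settings only need a reload, not a restart):

ALTER SYSTEM SET autovacuum_vacuum_cost_delay = 0;  -- no cost-based sleeping
ALTER SYSTEM SET autovacuum_naptime = '10s';        -- wake the launcher more often
SELECT pg_reload_conf();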
Re: [HACKERS] JSONB - JSONB operator feature request
On Tue, Jul 18, 2017 at 12:49 PM, David Fetter wrote:
> On Tue, Jul 18, 2017 at 01:36:32PM +0200, david.tu...@linuxbox.cz wrote:
>> Hi,
>>
>> some users and me used hstore - hstore, for example storing only changed
>> rows in a trigger like:
>>
>> hstore(NEW) - hstore(OLD)
>>
>> There isn't the same operator/function for JSON/JSONB. We can only remove
>> keys from JSONB, but not equal key-value pairs. Is there any chance to
>> have the same feature with JSON/JSONB in postgres core?
>
> Here's one slightly modified from
> http://coussej.github.io/2016/05/24/A-Minus-Operator-For-PostgreSQLs-JSONB/
>
> CREATE OR REPLACE FUNCTION jsonb_minus ( arg1 jsonb, arg2 jsonb )
> RETURNS jsonb
> LANGUAGE sql
> AS $$
> SELECT
>     COALESCE(json_object_agg(
>         key,
>         CASE
>             -- if the value is an object and the value of the second
>             -- argument is not null, we do a recursion
>             WHEN jsonb_typeof(value) = 'object' AND arg2 -> key IS NOT NULL
>                 THEN jsonb_minus(value, arg2 -> key)
>             -- for all the other types, we just return the value
>             ELSE value
>         END
>     ), '{}')::jsonb
> FROM
>     jsonb_each(arg1)
> WHERE
>     arg1 -> key IS DISTINCT FROM arg2 -> key
> $$;
>
> CREATE OPERATOR - (
>     PROCEDURE = jsonb_minus,
>     LEFTARG   = jsonb,
>     RIGHTARG  = jsonb
> );
>
> I suspect that there's a faster way to do the jsonb_minus function
> internally.

Yes, please! I also sorely miss the hstore 'slice' function, which is very similar. The main remaining disadvantage of jsonb relative to hstore is that you can't do the simple retransformations these operations allow. Too often you end up doing multiple '->' operations against the same object followed by a rebundling, which is a real performance killer. I understand that there are more edge cases due to the flexible json structure, but I'd be quite happy returning NULL or erroring when you can't arrive at a sensible extraction.

merlin

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
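With the function and operator above in place, a quick sanity check looks like this (the output shown is what I'd expect from tracing the recursion, not an authoritative result):

SELECT '{"a":1, "b":2, "c":{"d":3, "e":4}}'::jsonb
     - '{"a":1, "c":{"d":3}}'::jsonb;
-- expected: {"b": 2, "c": {"e": 4}}

The "a" pair is dropped because it matches exactly, "b" survives untouched, and the nested object under "c" recurses so that only the differing "e" pair remains.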
Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise
On 07/19/2017 03:29 PM, Tom Lane wrote:
> "Joshua D. Drake" writes:
>> At PGConf US Philly last week I was talking with Jim and Jan about
>> performance. One of the items that came up is that PostgreSQL can't run
>> full throttle for long periods of time. The long and short is that no
>> matter what, autovacuum can't keep up. This is what I have done:
>
> Try reducing autovacuum_vacuum_cost_delay more, and/or increasing
> autovacuum_vacuum_cost_limit.

I would try autovacuum_vacuum_cost_delay = 0 and, for any tables > 1 million rows:

autovacuum_vacuum_scale_factor: 0
autovacuum_vacuum_threshold: 10 (perhaps even smaller)

Joe

-- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development

signature.asc Description: OpenPGP digital signature
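The per-table form of those settings looks like this (the table name is a placeholder, and the values mirror the suggestion above):

ALTER TABLE big_table SET (
    autovacuum_vacuum_scale_factor = 0,
    autovacuum_vacuum_threshold = 10
);

With scale_factor 0, the threshold alone decides when autovacuum fires, so the table is vacuumed as soon as that many rows are dead, regardless of its size.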
Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise
Peter Geoghegan wrote:
> Index bloat is a general problem that B-Trees have in all other major
> systems, but I think that PostgreSQL has a tendency to allow indexes
> to become progressively more bloated over time, in a way that it often
> can never recover from [1].

Interesting assertion. Many years ago I set out to implement btree page merging[1] from ideas in a 1996 paper[2], though that work never saw the light of day. Maybe it can be valuable now.

Another thing to consider is indirect indexes, wherein you get less heap bloat because more updates can become HOT.

If we also allow heap to be pruned from line pointers by walking indexes to remove specific pointers, instead of requiring indexes to be scanned whole for the removal, as proposed by Andres, we could get further along.

[1] https://www.postgresql.org/message-id/20020912235429.4714071a.alvhe...@atentus.com
[2] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.47.9961

-- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise
"Joshua D. Drake"writes: > At PGConf US Philly last week I was talking with Jim and Jan about > performance. One of the items that came up is that PostgreSQL can't run > full throttle for long periods of time. The long and short is that no > matter what, autovacuum can't keep up. This is what I have done: Try reducing autovacuum_vacuum_cost_delay more, and/or increasing autovacuum_vacuum_cost_limit. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise
On Wed, Jul 19, 2017 at 3:11 PM, Joshua D. Drake wrote:
> The good news is, PostgreSQL is not doing half bad against 128 connections
> with only 16vCPU. The bad news is we more than doubled our disk size
> without getting reuse or bloat under control. The concern here is that
> under heavy write loads that are persistent, we will eventually bloat out
> and have to vacuum full, no matter what. I know that Jan has done some
> testing and the best he could get is something like 8 days before
> PostgreSQL became unusable (but don't quote me on that).
>
> I am open to comments, suggestions, running multiple tests with different
> parameters or just leaving this in the archive for people to reference.

Did you see my blog post on planet PostgreSQL from last night?

https://pgeoghegan.blogspot.com/2017/07/postgresql-index-bloat-microscope.html

Perhaps you could use my query to instrument an interesting index, to see what that turns up. I would really like to get a better sense of how often and to what extent index bloat is a problem that VACUUM is just inherently incapable of keeping under control over time. The timeline for performance to degrade with real production workloads is very interesting to me. It's really hard to simulate certain types of problems that you will see frequently in production.

Index bloat is a general problem that B-Trees have in all other major systems, but I think that PostgreSQL has a tendency to allow indexes to become progressively more bloated over time, in a way that it often can never recover from [1]. This may be a particular problem with unique indexes, where many physical duplicates accumulate in pages. Duplicates that are theoretically reclaimable, but due to how the keyspace is split up, will never actually be reclaimed [2].

[1] postgr.es/m/CAH2-Wzmf6intNY1ggiNzOziiO5Eq=DsXfeptODGxO=2j-i1...@mail.gmail.com
[2] https://wiki.postgresql.org/wiki/Key_normalization#VACUUM_and_nbtree_page_deletion

-- Peter Geoghegan

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] autovacuum can't keep up, bloat just continues to rise
Hello,

At PGConf US Philly last week I was talking with Jim and Jan about performance. One of the items that came up is that PostgreSQL can't run full throttle for long periods of time. The long and short is that no matter what, autovacuum can't keep up. This is what I have done:

Machine:
  16vCPU
  59G Memory
  10G SSD (/)
  500G SSD /srv/main/9.6 (PGDATA): 240MB/s sustained with 15k IOPS
  * Yes, we really got 240MB/s sustained performance

I used benchmarksql, which is a TPC-C benchmark similar to pgbench but supposedly more thorough:

https://sourceforge.net/projects/benchmarksql/

PostgreSQL 9.6 on Ubuntu 16.04 x64.

postgresql.conf:
  max_connections: 1000 (just to keep it out of the way)
  shared_buffers: 32G (Awesome work Haas)
  work_mem: 32M
  maintenance_work_mem: 2G
  effective_io_concurrency: 1
    * Before anybody suggests increasing this: on GCE, over a dozen tests, anything but disabling this appears to be a performance hit of ~10% (I can reproduce those tests if you like on another thread).
  synchronous_commit: off
  checkpoint_timeout: 60min
  max_wal_size: 5G
  random_page_cost: 1
  effective_cache_size: 32GB
    * this probably should be more like 50 but still
  autovacuum_max_workers: 12
    * One for each table + a couple for system tables
  autovacuum_vacuum_scale_factor: 0.1
  autovacuum_cost_delay: 5ms

Here are the benchmarksql settings for all 4 runs:

17:07:54,268 [main] INFO jTPCC : Term-00, warehouses=500
17:07:54,269 [main] INFO jTPCC : Term-00, terminals=128
17:07:54,272 [main] INFO jTPCC : Term-00, runTxnsPerTerminal=10
17:07:54,273 [main] INFO jTPCC : Term-00, limitTxnsPerMin=30
17:07:54,273 [main] INFO jTPCC : Term-00, terminalWarehouseFixed=false
17:07:54,274 [main] INFO jTPCC : Term-00,
17:07:54,274 [main] INFO jTPCC : Term-00, newOrderWeight=45
17:07:54,274 [main] INFO jTPCC : Term-00, paymentWeight=43
17:07:54,274 [main] INFO jTPCC : Term-00, orderStatusWeight=4
17:07:54,275 [main] INFO jTPCC : Term-00, deliveryWeight=4
17:07:54,275 [main] INFO jTPCC : Term-00, stockLevelWeight=4

For run 0, I started with:

vacuumdb -U postgres -fz; ./runBenchmark.sh my_postgres.properties

And then for each subsequent run, I just ran the benchmark without the vacuum full so that PostgreSQL could prove us wrong. It didn't. Here is the breakdown of the results:

 RUN | START DISK SIZE | END DISK SIZE | TPS/Terminal
-----+-----------------+---------------+--------------
   0 |              54 |            78 |  868.6796875
   1 |              78 |            91 |  852.4765625
   2 |              91 |           103 |  741.4609375
   3 |             103 |           116 |  686.125

The good news is, PostgreSQL is not doing half bad against 128 connections with only 16vCPU. The bad news is we more than doubled our disk size without getting reuse or bloat under control. The concern here is that under heavy write loads that are persistent, we will eventually bloat out and have to vacuum full, no matter what. I know that Jan has done some testing and the best he could get is something like 8 days before PostgreSQL became unusable (but don't quote me on that).

I am open to comments, suggestions, running multiple tests with different parameters or just leaving this in the archive for people to reference.

Thanks in advance,

JD

-- Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc PostgreSQL Centered full stack support, consulting and development. Advocate: @amplifypostgres || Learn: https://pgconf.us * Unless otherwise stated, opinions are my own. *

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Authentication mechanisms categorization
On 14/07/17 13:29, Michael Paquier wrote:
> On Fri, Jul 14, 2017 at 12:16 PM, Álvaro Hernández Tortosa wrote:
>> If the parameter authmethod would rather be "authmethods", i.e., a list,
>> I think it would be significantly more flexible.
>
> Yes, but the handling of a list becomes messier if there are some other
> connection parameters that are dependent on the authentication method.
> Say if a list made of scram-sha-256 and scram-sha-3 as methods is sent,
> and a parameter named saslchannel lists scram-sha-256-plus is used, this
> becomes unusable with scram-sha-3. Using individual names for a parameter
> makes interactions with other parameters easier to handle and less
> bug-prone. That's also by experience more flexible for the application.
>
>> I agree with a list of methods, and all the values already existing for
>> sslmode might be more than enough, especially if the channel binding
>> SCRAM mechanisms would get a different authmethod than their non-channel
>> binding partners (like scram-sha-256-plus). This makes the list argument
>> for the authmethods, in my opinion, stronger.
>
> For the channel binding patch, I have actually implemented saslname to
> enforce the name of the SASL mechanism to use (SCRAM-SHA-256 or its
> -PLUS) as well as saslchannelbinding to enforce the channel binding type.
> That's very handy, and at the end I saw that having a list does not add
> much value in a feature that should be as simple as possible, as the
> client will use one match at the end for authentication, and let the
> client know if it failed or succeeded (yes, I hate sslmode=prefer, which
> does up to two attempts at once). But that's as far as my opinion stands.
> It is not possible to know the future, but we cannot discard as well the
> fact that a future authentication method, say hoge, could as well support
> scram-sha-256, in which case a list like "authmethods=hoge,sasl
> authmechanisms=scram-sha-256" would mean that scram-sha-256 needs to be
> enforced for both things, but the dependency handling makes things
> unnecessarily complicated in libpq. My argument here is crazy though.

Hi Michael. I'm mostly convinced by the power of all the parameters that already exist, given that you added both saslname and saslchannelbinding to the already existing sslmode. That's great, and allows for very fine choosing of the auth method. So it would be great if (non-libpq) driver implementations would expose the same parameter names to the users. I will study this for JDBC.

My only fear is that this could become very complicated for the user, and could end up looking like the authentication algorithm lists for SSL, which are very hard to digest for the non-expert. Handling a list of auth methods spread through three parameters seems to me to go a bit in this direction. A categorization like the one proposed, while maybe difficult to do initially, and maybe to maintain too, is precisely meant to take this burden off the user and expose a simpler setting to them.

Álvaro

-- Álvaro Hernández Tortosa --- 8Kdata

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pl/perl extension fails on Windows
Ashutosh Sharma writes:
> Here is the list of macros and variables from the 'intrpvar.h' file that
> are defined in the perl module but not in plperl on Windows:
>
> #ifdef PERL_USES_PL_PIDSTATUS
> PERLVAR(I, pidstatus, HV *)   /* pid-to-status mappings for waitpid */
> #endif
>
> #ifdef PERL_SAWAMPERSAND
> PERLVAR(I, sawampersand, U8)  /* must save all match strings */
> #endif

I am really suspicious that this means your libperl was built in an unsafe fashion, that is, by injecting configuration choices as random -D switches in the build process rather than making sure the choices were recorded in perl's config.h. As an example, looking at the perl 5.24.1 headers on a Fedora box, it looks to me like PERL_SAWAMPERSAND could only get defined if PERL_COPY_ON_WRITE were not defined, and the only way that that can happen is if PERL_NO_COW is defined, and there are no references to the latter anyplace except in this particular #if defined test in perl.h.

Where did your perl installation come from, anyway? Are you sure the .h files match up with the executables?

regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC 2017: Foreign Key Arrays
On Wed, Jul 19, 2017 at 10:08 PM, Alvaro Herrera wrote:
> So let's step back a bit,
> get a patch that works for the case where the types match on both sides
> of the FK, then we review that patch; if all is well, we can discuss the
> other problem as a stretch goal.

Agreed. This should be a future improvement. I think the next step should be testing the performance before/after the modifications.
[HACKERS] pgbench minor doc typo
Alik Khilazhev is submitting a patch about a zipfian random function for pgbench, and noticed a typo in the documentation about random_exponential. Attached is a fix extracted from his patch submission, which could be applied to head/10/9.6.

-- Fabien.

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 64b043b..03e1212 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -1045,7 +1045,7 @@
 f(x) = exp(-parameter * (x - min) / (max - min + 1)) / (1 - exp(-parameter))
 
        Then value i between min and max inclusive is drawn
        with probability:
-       f(x) - f(x + 1).
+       f(i) - f(i + 1).

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
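For anyone wanting to double-check the corrected formula, it's easy to evaluate in SQL (parameter, min and max here are picked arbitrarily; the per-value probabilities f(i) - f(i + 1) should sum to 1):

WITH params(param, lo, hi) AS (VALUES (1.0, 1, 10))
SELECT i,
       exp(-param * (i - lo) / (hi - lo + 1)) / (1 - exp(-param))
     - exp(-param * (i + 1 - lo) / (hi - lo + 1)) / (1 - exp(-param)) AS prob
FROM params, generate_series(1, 10) AS i;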
Re: [HACKERS] Something for the TODO list: deprecating abstime and friends
Michael Paquier writes:
> On Wed, Jul 19, 2017 at 8:56 PM, Tom Lane wrote:
>> I wonder if it'd be worth the trouble to stick something like this into
>> xlog.c:
>>
>> StaticAssertStmt(sizeof(ControlFileData) <= 512,
>>                  "pg_control is too large for atomic disk writes");
>
> +1. Even if it just gets triggered in 20 years by some hacker, that's
> a good reminder about assumptions behind the update logic.

Done. I found the size checks had metastasized into pg_resetwal as well, and probably should have been in pg_rewind since it also rewrites pg_control, so it ended up a slightly larger patch than I first thought. Seems cleaner than before though.

regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC 2017: Foreign Key Arrays
Mark Rofail wrote:
> On Tue, Jul 18, 2017 at 11:14 PM, Alvaro Herrera wrote:
>> Why did we add an operator and not a support
>> procedure?
>
> I thought the support procedures were constant within an opclass.

Uhh ... I apologize, but I think I was barking up the wrong tree. I was thinking that it mattered that the opclass mechanism was able to determine whether some array @>> some element, but that's not true: it's the queries in ri_triggers.c, which have no idea about opclasses. (I think we would have wanted to use opclasses in order to find out what operator to use in the first place, if ri_triggers.c was already using that general idea; but in reality it's already using hardcoded operator names, so it doesn't matter.)

I'm not entirely sure what's the best way to deal with the polymorphic problem, but on the other hand, as Robert says downthread, maybe we shouldn't be solving it at this stage anyway. So let's step back a bit: get a patch that works for the case where the types match on both sides of the FK, then we review that patch; if all is well, we can discuss the other problem as a stretch goal.

-- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [TRAP: FailedAssertion] causing server to crash
Robert Haas wrote: > On Wed, Jul 19, 2017 at 8:26 AM, Neha Sharma wrote: > > I am getting FailedAssertion while executing the attached script. However, I > > am not able to produce the core dump for the same, the script runs in > > background and takes around a day's time to produce the mentioned error. > > Maybe you should run it for another day with core dumps enabled (try > the "-c" option to pg_ctl) and then pull a backtrace if you get one. > This is a pretty generic script so it's going to be hard to guess > where the problem is otherwise. Also, please state the server version used. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New partitioning - some feedback
On Tue, Jul 18, 2017 at 2:26 AM, Vik Fearing wrote: > On 07/07/2017 02:02 AM, Mark Kirkwood wrote: >> I'd prefer *not* to see a table and its partitions all intermixed in the >> same display (especially with nothing indicating which are partitions) - >> as this will make for unwieldy long lists when tables have many >> partitions. Also it would be good if the 'main' partitioned table and >> its 'partitions' showed up as a different type in some way. > > I've just read through this thread, and I'm wondering why we can't just > have something like \set SHOW_PARTITIONS true or something, and that > would default to false. We could, and that would have the advantage of letting people set a default. On the other hand, if you want to override the default behavior just once, adding a modifier character is a lot less typing than issuing \set, retyping your command, and issuing \set again to change it back. So I don't know which is better. My main point is that it's too late to be making changes upon which we do not have a clear consensus. I reject the argument that v11 will be too late to make this change. Now that we have partitioning, I believe there will be zillions of things that need to be done to improve it further; several of those things already have proposed patches; this can be another one of those things. If we rush something in now and it turns out that it isn't well-liked, we may well end up with one behavior for v<10, another behavior for v=10, and a third behavior for v>10. Better to wait and make the change later when we have a few more data points. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
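For concreteness, a sketch of how the suggested toggle might look at the psql prompt (the SHOW_PARTITIONS variable is purely hypothetical; neither spelling is implemented):

\set SHOW_PARTITIONS true
\dt   -- would list partitions alongside ordinary tables
\set SHOW_PARTITIONS false
\dt   -- default behavior: partitions hidden again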
Re: [HACKERS] Something for the TODO list: deprecating abstime and friends
On Wed, Jul 19, 2017 at 3:26 PM, Mark Dilger wrote: >> Typically, when you try to store an out-of-range value in PostgreSQL, >> you get an ERROR, and that's one of the selling points of PostgreSQL. >> PostgreSQL users regularly beat up other projects for, say, allowing >> 0000-00-00 to be considered a valid date, or any similar perceived >> laxity in enforcing data consistency. I don't like the idea that we >> can just deviate from that principle whenever adhering to it is too >> much work. > > I don't see the relevance of this statement. I am not planning to allow > abstime data that is outside the range of the new epoch. Attempts to > cast strings like '1962-07-07 01:02:03' to abstime would draw an > exception with a suitably informative message. > > Now, the objection to having on-disk data automagically change meaning > is concerning, ... I see those things as very similar issues, but YMMV. > ... and I was particularly unsatisfied with the idea that > NOSTART_ABSTIME and NOEND_ABSTIME would suddenly be > reinterpreted as seconds in the year 2068, which is why I made mention > of it upthread. I was less concerned with dates prior to 1970 being > reinterpreted, though it is hard to justify why that bothers me less. I think any change in the meaning of bytes already on disk is a concern. It wouldn't bother me to say "these are second-class types, so if you have columns of this type, pg_upgrade will fail and tell you that you can't upgrade a database with those columns from a catversion < X to a catversion > X". But having it silently work and silently corrupt your data doesn't seem OK to me. > Those who feel otherwise might like to also argue for dropping > float4 because float8 does all the same stuff better. I don't think that's the same thing, but given my notorious hatred of floating point arithmetic, it would also bother me less than you might think. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Something for the TODO list: deprecating abstime and friends
> On Jul 19, 2017, at 9:53 AM, Robert Haas wrote: > > On Mon, Jul 17, 2017 at 6:12 PM, Tom Lane wrote: >> So, thinking about how that would actually work ... the thing to do in >> order to preserve existing on-disk values is to alternate between signed >> and unsigned interpretations of abstimes. That is, right now, abstime >> is signed with origin 1970. The conversion I'm arguing we should make >> real soon now is to unsigned with origin 1970. If the project lives >> so long, in circa 70 years we'd switch it to signed with origin 2106. >> Yadda yadda. > > Doesn't this plan amount to breaking pg_upgrade compatibility and > hoping that nobody notices? Existing values will be silently > reinterpreted according to the new rules. If we had two actually > separate types and let people convert columns from the old type to the > new type with just a validation scan, that would be engineering at the > level of quality that I've come to associate with this project. This is what I have done in my fork. I repurposed the type as "secondstamp" since I think of it as a timestamp down to second precision (truncating fractional seconds.) I changed the Oids assigned to the catalog entries associated with the type, and considered myself somewhat immune to the project dropping the abstime type, which the documentation warned would happen eventually. > If > this type is so marginal that we don't want to do that kind of work, > then I think we should just rip it out; that doesn't preclude someone > maintaining it in their own fork, or even adding it back as a new type > in a contrib module with a #define for the base year. Silently > changing the interpretation of the same data in the core code, though, > seems both far too clever and not clever enough. I would be happy to contribute the "secondstamp" type as part of a patch that removes the abstime type. I can add a catalog table that holds the epoch, and add documentation for the type stating that every time the epoch changes, their data will be automatically reinterpreted, and as such they should only use the datatype if that is ok. Under this plan, users with abstime columns who upgrade to postgres 11 will not have an easy time, because the type will be removed. But that is the same and no worse than what they would get if we just remove the abstime type in postgres 11 without any replacement. I'm not implementing any of this yet, as I expect further objections. Thoughts? mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Something for the TODO list: deprecating abstime and friends
> On Jul 19, 2017, at 10:23 AM, Robert Haas wrote: > > On Wed, Jul 19, 2017 at 1:12 PM, Tom Lane wrote: >>> Doesn't this plan amount to breaking pg_upgrade compatibility and >>> hoping that nobody notices? >> >> Well, what we'd need to do is document that the type is only meant to be >> used to store dates within say +/- 30 years from current time. As long >> as people adhere to that use-case, the proposal would work conveniently >> long into the future ... > > Typically, when you try to store an out-of-range value in PostgreSQL, > you get an ERROR, and that's one of the selling points of PostgreSQL. > PostgreSQL users regularly beat up other projects for, say, allowing > 0000-00-00 to be considered a valid date, or any similar perceived > laxity in enforcing data consistency. I don't like the idea that we > can just deviate from that principle whenever adhering to it is too > much work. I don't see the relevance of this statement. I am not planning to allow abstime data that is outside the range of the new epoch. Attempts to cast strings like '1962-07-07 01:02:03' to abstime would draw an exception with a suitably informative message. Now, the objection to having on-disk data automagically change meaning is concerning, and I was particularly unsatisfied with the idea that NOSTART_ABSTIME and NOEND_ABSTIME would suddenly be reinterpreted as seconds in the year 2068, which is why I made mention of it upthread. I was less concerned with dates prior to 1970 being reinterpreted, though it is hard to justify why that bothers me less. >> I'd definitely be on board with just dropping the type altogether despite >> Mark's concern. > > Then I vote for that option. I was somewhat surprised when Tom was onboard with the idea of keeping abstime around (for my benefit). My original post was in response to his statement that, right offhand, he couldn't think of any use for abstime that wasn't handled better by timestamptz (paraphrase), and so I said that improving storage efficiency was such a use. I maintain that position. The abstime type is a good and useful type, and we will lose that use when we discard it. Those who feel otherwise might like to also argue for dropping float4 because float8 does all the same stuff better. mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump
On 20 July 2017 at 05:08, Michael Paquier wrote: > On Wed, Jul 19, 2017 at 8:59 PM, Fabrízio de Royes Mello wrote: >> You should add the proper SGML docs for this pg_dumpall change also. > Tests of pg_dump go to src/bin/pg_dump/t/ and tests for objects in > extensions are in src/test/modules/test_pg_dump, but you just care > about the former with this patch. And if you implement some new tests, > look at the other tests and base your work on that. Thanks Michael / Fabrízio. Updated patch (attached) additionally adds SGML changes for pg_dumpall. (I'll try to work on the tests, but sending this nonetheless.) - robins pgdump_nocomments_v3.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] More flexible LDAP auth search filters?
On Sun, Jul 16, 2017 at 7:23 PM, Stephen Frost wrote: >> Refusing to improve LDAP for the users who have no choice seems like a very >> unfriendly thing to do. > > I'm fine with improving LDAP in general, but, as I tried to point out, > having a way to make it easier to integrate PG into an AD environment > would be better. It's not my intent to stop this patch but rather to > point out the issues with LDAP auth that far too frequently are not > properly understood. Then it's off-topic for this thread. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump
On Wed, Jul 19, 2017 at 8:59 PM, Fabrízio de Royes Mello wrote: > On Wed, Jul 19, 2017 at 3:54 PM, Robins Tharakan wrote: >> You may want to consider this patch (attached) which additionally has the >> pg_dumpall changes. >> It would be great if you could help with the tests though; I am unsure how >> to go about them. > > You should add the proper SGML docs for this pg_dumpall change also. Tests of pg_dump go to src/bin/pg_dump/t/ and tests for objects in extensions are in src/test/modules/test_pg_dump, but you just care about the former with this patch. And if you implement some new tests, look at the other tests and base your work on that. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Something for the TODO list: deprecating abstime and friends
On Wed, Jul 19, 2017 at 8:56 PM, Tom Lane wrote: > I wonder if it'd be worth the trouble to stick something like this into > xlog.c: > > /* > * For reliability's sake, it's critical that pg_control updates > * be atomic writes. That generally means the active data can't > * be more than one disk sector, which is 512 bytes on common > * hardware. Be very careful about raising this limit. > */ > StaticAssertStmt(sizeof(ControlFileData) <= 512, > "pg_control is too large for atomic disk writes"); +1. Even if it just gets triggered in 20 years by some hacker, that's a good reminder about assumptions behind the update logic. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC 2017: Foreign Key Arrays
On Wed, Jul 19, 2017 at 2:29 PM, Mark Rofail wrote: > On Wed, Jul 19, 2017 at 7:28 PM, Robert Haas wrote: >> >> Why do we have to solve that limitation? > > Since the regress test labeled element_foreing_key fails now that I made the > RI queries utilise @>(anyarray, anyelement), that means it's not functioning > as it is meant to. Well, if this is a new test introduced by the patch, you could also just change the test. Off-hand, I'm not sure that it's very important to make the case work where the types don't match between the referenced table and the referencing table, which is what you seem to be talking about here. But maybe I'm misunderstanding the situation. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables
On Wed, Jul 19, 2017 at 12:24 AM, Rafia Sabih wrote: > On testing this patch for the TPC-H (scale factor 20) benchmark I found a > regression for Q21; on head it was taking some 600 seconds and with this > patch it is taking 3200 seconds. This comparison is on the same partitioned > database, one using the partition-wise join patch and the other without it. > The execution time of Q21 on unpartitioned head is some 300 seconds. The > explain analyse output for each of these cases is attached. Interesting. > This suggests that partitioning is not a suitable strategy for this query, > but then maybe partition-wise join should not be picked in such a case, so as > not to aggravate the performance issue. In the unpartitioned case, and in the partitioned case on head, the join order is l1-(nation-supplier)-l2-orders-l3. In the patched case, the join order changes to l1-l2-supplier-orders-nation-l3. If the planner used the former join order, it wouldn't be able to do a partition-wise join at all, so it must think that the l1-l2 join gets much cheaper when done partitionwise, thus justifying a change in the overall join order to be able to use partition-wise join. But it doesn't work out. I think the problem is that the row count estimates for the child joins seem to be totally bogus:

-> Hash Semi Join (cost=309300.53..491665.60 rows=1 width=12) (actual time=10484.422..15945.851 rows=1523493 loops=3)
   Hash Cond: (l1.l_orderkey = l2.l_orderkey)
   Join Filter: (l2.l_suppkey <> l1.l_suppkey)
   Rows Removed by Join Filter: 395116

That's clearly wrong. In the un-partitioned plan, the join to l2 produces about as many rows of output as the number of rows that were input (998433 vs. 962909); but here, a child join with a million rows as input is estimated to produce only 1 row of output. I bet the problem is that the child-join's row count estimate isn't getting initialized at all, but then something is clamping it to 1 row instead of 0. So this looks like a bug in Ashutosh's patch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump
On Wed, Jul 19, 2017 at 3:54 PM, Robins Tharakan wrote: > > > On 18 July 2017 at 23:55, David Fetter wrote: >> >> Excellent point about pg_dumpall. I'll see what I can draft up in the >> next day or two and report back. > > > > Hi David, > > You may want to consider this patch (attached) which additionally has the pg_dumpall changes. > It would be great if you could help with the tests though; I am unsure how to go about them. > You should add the proper SGML docs for this pg_dumpall change also. Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL >> Timbira: http://www.timbira.com.br >> Blog: http://fabriziomello.github.io >> Linkedin: http://br.linkedin.com/in/fabriziomello >> Twitter: http://twitter.com/fabriziomello >> Github: http://github.com/fabriziomello
Re: [HACKERS] Something for the TODO list: deprecating abstime and friends
I wrote: > Hm. Currently sizeof(struct ControlFileData) = 296, at least on my > machine. Letting it grow past 512 would be problematic. It's hard > to see getting to that any time soon, though; we don't add fields > there often. I wonder if it'd be worth the trouble to stick something like this into xlog.c:

/*
 * For reliability's sake, it's critical that pg_control updates
 * be atomic writes. That generally means the active data can't
 * be more than one disk sector, which is 512 bytes on common
 * hardware. Be very careful about raising this limit.
 */
StaticAssertStmt(sizeof(ControlFileData) <= 512,
                 "pg_control is too large for atomic disk writes");

regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump
On 18 July 2017 at 23:55, David Fetter wrote: > > Excellent point about pg_dumpall. I'll see what I can draft up in the > next day or two and report back. Hi David, You may want to consider this patch (attached) which additionally has the pg_dumpall changes. It would be great if you could help with the tests though; I am unsure how to go about them. - robins pgdump_nocomments_v2.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pluggable storage
On Wed, Jul 19, 2017 at 10:56 AM, Robert Haas wrote: >> I strongly agree. I simply don't understand how you can adopt UNDO for >> MVCC, and yet expect to get a benefit commensurate with the effort >> without also implementing "retail index tuple deletion" first. > > I agree that we need retail index tuple deletion. I liked Claudio's > idea at > http://postgr.es/m/cagtbqpz-ktrqiaa13xg1gne461yowra-s-yccqptyfrpkta...@mail.gmail.com > -- that seems indispensable to making retail index tuple deletion > reasonably efficient. Is anybody going to work on getting that > committed? I will do review work on it. IMV the main problems are:

* The way a "header" is added at the PageAddItemExtended() level, rather than making heap TID something much closer to a conventional attribute that perhaps only nbtree and indextuple.c have special knowledge of, strikes me as the wrong way to go.

* It's simply not acceptable to add overhead to *all* internal items. That kills fan-in. We're going to need suffix truncation for the common case where the user-visible attributes for a split point/new high key at the leaf level sufficiently distinguish what belongs on either side. IOW, you should only see internal items with a heap TID in the uncommon case where you have so many duplicates at the leaf level that you have no choice but to use a split point that's right in the middle of many duplicates. Fortunately, if we confine ourselves to making heap TID part of the keyspace, the code can be far simpler than what would be needed to get my preferred, all-encompassing design for suffix truncation [1] to work. I think we could just stash the number of attributes participating in a comparison within internal pages' unused item pointer offset. I've talked about this before, in the context of Anastasia's INCLUDED columns patch. If we can have a variable number of attributes for heap tuples, we can do so for index tuples, too.

* We might also have problems with changing the performance characteristics for the worse in some cases by some measures. This will probably technically increase the amount of bloat for some indexes with sparse deletion patterns. I think that that will be well worth it, but I don't expect a slam dunk.

A nice benefit of this work is that it lets us kill the hack that adds randomness to the search for free space among duplicates, and may let us follow the Lehman & Yao algorithm more closely. [1] https://wiki.postgresql.org/wiki/Key_normalization#Suffix_truncation_of_normalized_keys -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Something for the TODO list: deprecating abstime and friends
Robert Haas writes: > On Wed, Jul 19, 2017 at 1:35 PM, Tom Lane wrote: >> Alternatively, we could turn the origin point for abstime into a >> pg_control field, and regard changing it as a reason for a database >> not being pg_upgrade'able unless it lacks any abstime columns. > I would be OK with that, too, but is there any danger that we're going > to grow pg_control to a size where reads and writes can no longer be > assumed atomic, if we keep adding things? Hm. Currently sizeof(struct ControlFileData) = 296, at least on my machine. Letting it grow past 512 would be problematic. It's hard to see getting to that any time soon, though; we don't add fields there often. Note that I'm not seriously pushing for this solution. I'm just trying to make sure that we've considered all the reasonable options. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC 2017: Foreign Key Arrays
On Wed, Jul 19, 2017 at 7:28 PM, Robert Haas wrote: > Why do we have to solve that limitation? Since the regress test labeled element_foreing_key fails now that I made the RI queries utilise @>(anyarray, anyelement), that means it's not functioning as it is meant to.
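A hedged illustration of the failing case; the @>(anyarray, anyelement) operator shown here is the one the patch adds, not something in stock PostgreSQL:

SELECT ARRAY[1,2,3] @> 4::smallint;
-- ERROR: operator does not exist: integer[] @> smallint
SELECT ARRAY[1,2,3] @> 4::integer;
-- fine once both sides agree on the element type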
Re: [HACKERS] Using non-sequential timelines in order to help with possible collisions
On Wed, Jul 19, 2017 at 8:05 PM, Tom Lane wrote: > Michael Paquier writes: >> On Wed, Jul 19, 2017 at 7:00 PM, Robert Haas wrote: >>> It's interesting that you bring this up. I've also wondered why we >>> don't use random TLIs. I suppose I'm internally assuming that it's >>> because the people who wrote the code are far more brilliant and >>> knowledgeable of this area than I could ever be and that doing >>> anything else would create some kind of awful problem, but maybe >>> that's not so. > >> I am not the only one who worked on that, but the resulting code is a tad >> simpler, as it is possible to more easily guess a hierarchy for >> the timelines, of course with the history files at hand. > > Yeah, right now you have the ability to guess that, say, timeline 42 > is a descendant of 41, which you couldn't assume with random TLIs. > Also, the values are only 32 bits, which is not wide enough to allow > imagining that random() could be relied on to produce non-duplicate > values. pg_backend_random() perhaps? If any new code uses random(), it would be slashed quickly at review. > If we had separate database identifiers for slave installations, which > AFAIR we don't, it'd be possible to consider incorporating part of > the server ID into timeline IDs it creates, which would alleviate > Brian's issue I think. That is, instead of 1, 2, 3, ..., a server > might create 1xyz, 2xyz, 3xyz, ... where "xyz" are random digits > associated with the particular installation. This is obviously > not bulletproof since you could have collisions of the xyz's, but > it would help. Also you could imagine allowing DBAs to assign > distinct xyz codes to every slave in a given community. I am not much into any concept of complicating the timeline name, to be honest :) Having a unique identifier per node has value for other purposes, like clustering, and we would have the same information by adding to the history file the ID of the node that generated the new timeline. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Something for the TODO list: deprecating abstime and friends
On Wed, Jul 19, 2017 at 1:35 PM, Tom Lane wrote: > Robert Haas writes: >> On Wed, Jul 19, 2017 at 1:12 PM, Tom Lane wrote: >>> I'd definitely be on board with just dropping the type altogether despite >>> Mark's concern. > >> Then I vote for that option. > > BTW, another possible compromise is to move abstime into a contrib > module; we've always accepted that contrib modules can be held to a > lower standard than core features. I'm not volunteering to do the > work for that, but it's worth contemplating. I would be OK with that, provided the documentation calls out the hazard. > Alternatively, we could turn the origin point for abstime into a > pg_control field, and regard changing it as a reason for a database > not being pg_upgrade'able unless it lacks any abstime columns. I would be OK with that, too, but is there any danger that we're going to grow pg_control to a size where reads and writes can no longer be assumed atomic, if we keep adding things? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PG10b2: column-list UPDATE syntax fails with single column
Justin Pryzby writes: > ts=# begin; UPDATE eric_enodeb_201707 SET > (pmhdelayvarbest50pct,pmlicconnecteduserstimecong)=(0,0) ; > BEGIN > UPDATE 3 > ts=# begin; UPDATE eric_enodeb_201707 SET (pmhdelayvarbest50pct)=(0) ; > BEGIN > ERROR: source for a multiple-column UPDATE item must be a sub-SELECT or > ROW() expression Hm. It's kind of unfortunate that this isn't backwards-compatible, but it was only accidental that that case was accepted before. It's really wrong, because the source for a parenthesized UPDATE target list ought to be a row value even when there's only one item in the list. That is, the correct standards-compliant spelling would be UPDATE eric_enodeb_201707 SET (pmhdelayvarbest50pct) = ROW(0); Now, it's true that "(1,2)" is fully equivalent to "ROW(1,2)", but "(0)" is *not* equivalent to "ROW(0)"; it's just a scalar 0. So your existing code is non-spec-compliant and was really being accepted in error. We could maybe hack up some weird action-at-a-distance kluge that would make this case work like before, but I'm afraid it would just introduce other inconsistencies. > It may be that our use was wrong (?) or unintuitive (I'm in the middle of > changing it), but wondered if it was intentional or otherwise if the release > notes should mention that old syntax is no longer accepted. Not sure if it's worth getting into in the release notes. Using the parenthesis notation for single target columns seems rather weird. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
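Side by side, the two accepted spellings for the single-column case, using the table from the report:

UPDATE eric_enodeb_201707 SET pmhdelayvarbest50pct = 0;          -- plain scalar assignment
UPDATE eric_enodeb_201707 SET (pmhdelayvarbest50pct) = ROW(0);   -- spec-compliant row form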
Re: [HACKERS] pl/perl extension fails on Windows
Ashutosh Sharma writes: > On Wed, Jul 19, 2017 at 9:42 PM, Tom Lane wrote: >> I imagine the route to a solution is to fix things so that the relevant >> macros are all defined correctly in both cases. But why they aren't >> already is certainly an interesting question. Have you identified just >> which fields are added or missing relative to what libperl thinks? > Here is the list of macros and variables from the 'intrpvar.h' file that > are defined in the perl module but not in plperl on Windows:
> #ifdef PERL_USES_PL_PIDSTATUS
> PERLVAR(I, pidstatus, HV *) /* pid-to-status mappings for waitpid */
> #endif
> #ifdef PERL_SAWAMPERSAND
> PERLVAR(I, sawampersand, U8) /* must save all match strings */
> #endif
> #ifdef FCRYPT
> PERLVARI(I, cryptseen, bool, FALSE) /* has fast crypt() been initialized? */
> #else
> /* One byte hole in the interpreter structure. */
> #endif
> #ifdef USE_REENTRANT_API
> PERLVAR(I, reentrant_buffer, REENTR *) /* here we store the _r buffers */
> #endif
> #ifdef PERL_GLOBAL_STRUCT_PRIVATE
> PERLVARI(I, my_cxt_keys, const char **, NULL) /* per-module array of pointers to MY_CXT_KEY constants */
> # endif
> #ifdef DEBUG_LEAKING_SCALARS_FORK_DUMP
> /* File descriptor to talk to the child which dumps scalars. */
> PERLVARI(I, dumper_fd, int, -1)
> #endif
> #ifdef DEBUG_LEAKING_SCALARS
> PERLVARI(I, sv_serial, U32, 0) /* SV serial number, used in sv.c */
> #endif
> #ifdef PERL_TRACE_OPS
> PERLVARA(I, op_exec_cnt, OP_max+2, UV)
> #endif
Huh. So those seem like symbols that ought to be exposed somewhere in Perl's headers. Perhaps we're failing to #include some "perl_config.h" or equivalent file that records these ABI options? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Using non-sequential timelines in order to help with possible collisions
Michael Paquier writes: > On Wed, Jul 19, 2017 at 7:00 PM, Robert Haas wrote: >> It's interesting that you bring this up. I've also wondered why we >> don't use random TLIs. I suppose I'm internally assuming that it's >> because the people who wrote the code are far more brilliant and >> knowledgeable of this area than I could ever be and that doing >> anything else would create some kind of awful problem, but maybe >> that's not so. > I am not the only one who worked on that, but the resulting code is a tad > simpler, as it is possible to more easily guess a hierarchy for > the timelines, of course with the history files at hand. Yeah, right now you have the ability to guess that, say, timeline 42 is a descendant of 41, which you couldn't assume with random TLIs. Also, the values are only 32 bits, which is not wide enough to allow imagining that random() could be relied on to produce non-duplicate values. If we had separate database identifiers for slave installations, which AFAIR we don't, it'd be possible to consider incorporating part of the server ID into timeline IDs it creates, which would alleviate Brian's issue I think. That is, instead of 1, 2, 3, ..., a server might create 1xyz, 2xyz, 3xyz, ... where "xyz" are random digits associated with the particular installation. This is obviously not bulletproof since you could have collisions of the xyz's, but it would help. Also you could imagine allowing DBAs to assign distinct xyz codes to every slave in a given community. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pluggable storage
On Sat, Jul 15, 2017 at 8:58 PM, Peter Geoghegan wrote: > To repeat myself, for emphasis: *Not all bloat is equal*. +1. > I strongly agree. I simply don't understand how you can adopt UNDO for > MVCC, and yet expect to get a benefit commensurate with the effort > without also implementing "retail index tuple deletion" first. I agree that we need retail index tuple deletion. I liked Claudio's idea at http://postgr.es/m/cagtbqpz-ktrqiaa13xg1gne461yowra-s-yccqptyfrpkta...@mail.gmail.com -- that seems indispensable to making retail index tuple deletion reasonably efficient. Is anybody going to work on getting that committed? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pluggable storage
On Sat, Jul 15, 2017 at 6:36 PM, Alexander Korotkov wrote: > I think in general there are two ways of dealing with our index AM API > limitation. One of them is to extend the index AM API. That's pretty much what I have in mind. I think it's likely that if we end up with, say, 3 kinds of heap and 12 kinds of index, there will be some compatibility matrix. Some index AMs will be compatible with some heap AMs, and others won't be. For example, if somebody makes an IOT-type heap using the API proposed here or some other one, BRIN probably won't work at all. btree, on the other hand, could probably be made to work, perhaps with some greater or lesser degree of modification. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] PG10b2: column-list UPDATE syntax fails with single column
In testing our application with PG10beta2, I discovered that our use of UPDATE broke, apparently by this commit:

|commit 906bfcad7ba7cb3863fe0e2a7810be8e3cd84fbd
|Author: Tom Lane
|Date: Tue Nov 22 15:19:57 2016 -0500
|
|Improve handling of "UPDATE ... SET (column_list) = row_constructor".

ERROR ON QUERY: UPDATE eric_enodeb_cell_metrics SET (pmActiveDrbDlSum)=(%s) WHERE sect_id=%s AND subnetwork_id=%s AND start_time=%s AND site_id=%s AND interval_seconds=%s AND nedn=%s AND mecontext=%s AND EUtranCellxDD=%s AND EUtranCell=%s AND subnetwork=%s AND device_id=%s -- Query Error: ERROR: source for a multiple-column UPDATE item must be a sub-SELECT or ROW() expression

This still works for multiple columns but in pg10b2 no longer works for a single-column update with "column_list" syntax.

ts=# begin; UPDATE eric_enodeb_201707 SET (pmhdelayvarbest50pct,pmlicconnecteduserstimecong)=(0,0) ;
BEGIN
UPDATE 3
ts=# begin; UPDATE eric_enodeb_201707 SET (pmhdelayvarbest50pct)=(0) ;
BEGIN
ERROR: source for a multiple-column UPDATE item must be a sub-SELECT or ROW() expression

It may be that our use was wrong (?) or unintuitive (I'm in the middle of changing it), but wondered if it was intentional or otherwise if the release notes should mention that the old syntax is no longer accepted. Either way - thanks in advance. Justin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pl/perl extension fails on Windows
On Wed, Jul 19, 2017 at 9:42 PM, Tom Lane wrote: > > Ashutosh Sharma writes: > > Actually the function used for generation of the handshake key, i.e. HS_KEYp(), > > considers 'sizeof(PerlInterpreter)' to generate the key, and somehow the > > sizeof PerlInterpreter is not uniform in the plperl and perl modules in case of > > Windows, but on Linux it remains the same in both modules. > > Yipes. So actually, this is catching a live ABI problem, which presumably > we've escaped seeing bad effects from only through sheer good luck. > I suppose that the discrepancies in the struct contents only occur after > the last field that plperl happens to touch directly --- but that is > unlikely to be true forever. > > > typedef struct interpreter PerlInterpreter;
> > struct interpreter {
> > #include "intrpvar.h"
> > };
> > where intrpvar.h has different variables defined inside it and most of the > > variable definitions are protected with various macros. And there are some > > macros that are just defined in perl but not in the plperl module, which means > > the sizeof(PerlInterpreter) on the two modules is going to be different > > and thereby results in a different key. > > I imagine the route to a solution is to fix things so that the relevant > macros are all defined correctly in both cases. But why they aren't > already is certainly an interesting question. Have you identified just > which fields are added or missing relative to what libperl thinks? Here is the list of macros and variables from the 'intrpvar.h' file that are defined in the perl module but not in plperl on Windows:

#ifdef PERL_USES_PL_PIDSTATUS
PERLVAR(I, pidstatus, HV *) /* pid-to-status mappings for waitpid */
#endif
#ifdef PERL_SAWAMPERSAND
PERLVAR(I, sawampersand, U8) /* must save all match strings */
#endif
#ifdef FCRYPT
PERLVARI(I, cryptseen, bool, FALSE) /* has fast crypt() been initialized? */
#else
/* One byte hole in the interpreter structure. */
#endif
#ifdef USE_REENTRANT_API
PERLVAR(I, reentrant_buffer, REENTR *) /* here we store the _r buffers */
#endif
#ifdef PERL_GLOBAL_STRUCT_PRIVATE
PERLVARI(I, my_cxt_keys, const char **, NULL) /* per-module array of pointers to MY_CXT_KEY constants */
# endif
#ifdef DEBUG_LEAKING_SCALARS_FORK_DUMP
/* File descriptor to talk to the child which dumps scalars. */
PERLVARI(I, dumper_fd, int, -1)
#endif
#ifdef DEBUG_LEAKING_SCALARS
PERLVARI(I, sv_serial, U32, 0) /* SV serial number, used in sv.c */
#endif
#ifdef PERL_TRACE_OPS
PERLVARA(I, op_exec_cnt, OP_max+2, UV)
#endif

-- With Regards, Ashutosh Sharma EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Using non-sequential timelines in order to help with possible collisions
On Wed, Jul 19, 2017 at 7:00 PM, Robert Haas wrote: > On Wed, Jul 19, 2017 at 11:23 AM, Brian Faherty > wrote: >> I was working with replication and recovery the other day and noticed that >> there were scenarios where I could cause multiple servers to enter the same >> timeline while possibly having divergent data. One such scenario is Master A >> and Replica B are both on timeline 1. There is an event that causes Replica >> B to be promoted, which changes it to timeline 2. Following this, you >> perform a restore on Master A to a point before the event happened. Once >> Postgres completes this recovery on Master A, it will switch over to >> timeline 2. There are now WAL files that have been written to timeline 2 >> from both servers. >> >> From this scenario, I would like to suggest considering using non-sequential >> timelines. From what I have investigated so far, I believe the *.history >> files in the WAL directory already have all the timeline id's in them and >> are in order. If we could make those timeline ids a bit more >> unique/random, and still rely on the ordering in the *.history file, I think >> this would help prevent multiple servers on the same timeline with divergent >> data. It seems to me that you are missing one piece here: the history files generated at the moment of the timeline bump. When recovery finishes, an instance scans the archives, or the instances it is streaming from, for history files, and chooses a timeline number that does not match existing ones. So you are trying to avoid a problem that can easily be solved with a proper archive, for example. >> I was hoping to begin a conversation on whether or not non-sequential >> timelines are a good idea before I looked at the code around timelines. > It's interesting that you bring this up. I've also wondered why we > don't use random TLIs. I suppose I'm internally assuming that it's > because the people who wrote the code are far more brilliant and > knowledgeable of this area than I could ever be and that doing > anything else would create some kind of awful problem, but maybe > that's not so. I am not the only one who worked on that, but the resulting code is a tad simpler, as it is possible to more easily guess a hierarchy for the timelines. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Something for the TODO list: deprecating abstime and friends
Robert Haas writes: > On Wed, Jul 19, 2017 at 1:12 PM, Tom Lane wrote: >> I'd definitely be on board with just dropping the type altogether despite >> Mark's concern. > Then I vote for that option. BTW, another possible compromise is to move abstime into a contrib module; we've always accepted that contrib modules can be held to a lower standard than core features. I'm not volunteering to do the work for that, but it's worth contemplating. Alternatively, we could turn the origin point for abstime into a pg_control field, and regard changing it as a reason for a database not being pg_upgrade'able unless it lacks any abstime columns. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cache lookup errors with functions manipulation object addresses
On Wed, Jul 19, 2017 at 2:25 AM, Michael Paquier wrote: > Would we want to improve the error handling of such objects? +1 for such an improvement. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC 2017: Foreign Key Arrays
On Wed, Jul 19, 2017 at 8:08 AM, Mark Rofail wrote: > To summarise, the options we have to solve the limitation of the @>(anyarray, anyelement) where it produces the following error: operator does not exist: integer[] @> smallint Why do we have to solve that limitation? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Something for the TODO list: deprecating abstime and friends
On Wed, Jul 19, 2017 at 1:12 PM, Tom Lane wrote: >> Doesn't this plan amount to breaking pg_upgrade compatibility and >> hoping that nobody notices? > > Well, what we'd need to do is document that the type is only meant to be > used to store dates within say +/- 30 years from current time. As long > as people adhere to that use-case, the proposal would work conveniently > long into the future ... Typically, when you try to store an out-of-range value in PostgreSQL, you get an ERROR, and that's one of the selling points of PostgreSQL. PostgreSQL users regularly beat up other projects for, say, allowing 0000-00-00 to be considered a valid date, or any similar perceived laxity in enforcing data consistency. I don't like the idea that we can just deviate from that principle whenever adhering to it is too much work. > I'd definitely be on board with just dropping the type altogether despite > Mark's concern. Then I vote for that option. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Something for the TODO list: deprecating abstime and friends
On Wed, Jul 19, 2017 at 01:12:02PM -0400, Tom Lane wrote: > Robert Haas writes: > > On Mon, Jul 17, 2017 at 6:12 PM, Tom Lane wrote: > >> So, thinking about how that would actually work ... the thing to do in > >> order to preserve existing on-disk values is to alternate between signed > >> and unsigned interpretations of abstimes. That is, right now, abstime > >> is signed with origin 1970. The conversion I'm arguing we should make > >> real soon now is to unsigned with origin 1970. If the project lives > >> so long, in circa 70 years we'd switch it to signed with origin 2106. > >> Yadda yadda. > > > Doesn't this plan amount to breaking pg_upgrade compatibility and > > hoping that nobody notices? > > Well, what we'd need to do is document that the type is only meant to be > used to store dates within say +/- 30 years from current time. As long > as people adhere to that use-case, the proposal would work conveniently > long into the future ... > > > If we had two actually > > separate types and let people convert columns from the old type to the > > new type with just a validation scan, that would be engineering at the > > level of quality that I've come to associate with this project. > > ... whereas that would be seriously INconvenient. It's not just the > ALTER TABLE pushups, which presumably would be something you could do > and forget. It's that the new type name would be something you'd have > to change your applications to know about, and then you (or more likely > your successor) would have to do it over again decades later. > > I'd definitely be on board with just dropping the type altogether despite > Mark's concern. But I am not sure that the way you are proposing would > please anybody except pedants. +1 for just dropping the types, preferably modifying the contrib extensions that depend on them; less preferably, dropping those, too. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Something for the TODO list: deprecating abstime and friends
Robert Haas writes: > On Mon, Jul 17, 2017 at 6:12 PM, Tom Lane wrote: >> So, thinking about how that would actually work ... the thing to do in >> order to preserve existing on-disk values is to alternate between signed >> and unsigned interpretations of abstimes. That is, right now, abstime >> is signed with origin 1970. The conversion I'm arguing we should make >> real soon now is to unsigned with origin 1970. If the project lives >> so long, in circa 70 years we'd switch it to signed with origin 2106. >> Yadda yadda. > Doesn't this plan amount to breaking pg_upgrade compatibility and > hoping that nobody notices? Well, what we'd need to do is document that the type is only meant to be used to store dates within say +/- 30 years from current time. As long as people adhere to that use-case, the proposal would work conveniently long into the future ... > If we had two actually > separate types and let people convert columns from the old type to the > new type with just a validation scan, that would be engineering at the > level of quality that I've come to associate with this project. ... whereas that would be seriously INconvenient. It's not just the ALTER TABLE pushups, which presumably would be something you could do and forget. It's that the new type name would be something you'd have to change your applications to know about, and then you (or more likely your successor) would have to do it over again decades later. I'd definitely be on board with just dropping the type altogether despite Mark's concern. But I am not sure that the way you are proposing would please anybody except pedants. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [TRAP: FailedAssertion] causing server to crash
On Wed, Jul 19, 2017 at 8:26 AM, Neha Sharma wrote: > I am getting FailedAssertion while executing the attached script. However, I > am not able to produce the core dump for the same, the script runs in > background and takes around a day's time to produce the mentioned error. Maybe you should run it for another day with core dumps enabled (try the "-c" option to pg_ctl) and then pull a backtrace if you get one. This is a pretty generic script so it's going to be hard to guess where the problem is otherwise. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Using non-sequential timelines in order to help with possible collisions
On Wed, Jul 19, 2017 at 11:23 AM, Brian Faherty wrote: > Hey hackers, > I was working with replication and recovery the other day and noticed that > there were scenarios where I could cause multiple servers to enter the same > timeline while possibly having divergent data. One such scenario is Master A > and Replica B are both on timeline 1. There is an event that causes Replica > B to be promoted, which changes it to timeline 2. Following this, you > perform a restore on Master A to a point before the event happened. Once > Postgres completes this recovery on Master A, it will switch over to > timeline 2. There are now WAL files that have been written to timeline 2 > from both servers. > > From this scenario, I would like to suggest considering using non-sequential > timelines. From what I have investigated so far, I believe the *.history > files in the WAL directory already have all the timeline id's in them and > are in order. If we could make those timeline ids a bit more > unique/random, and still rely on the ordering in the *.history file, I think > this would help prevent multiple servers on the same timeline with divergent > data. > > I was hoping to begin a conversation on whether or not non-sequential > timelines are a good idea before I looked at the code around timelines. It's interesting that you bring this up. I've also wondered why we don't use random TLIs. I suppose I'm internally assuming that it's because the people who wrote the code are far more brilliant and knowledgeable of this area than I could ever be and that doing anything else would create some kind of awful problem, but maybe that's not so. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] new function for tsquery creation
On Wed, Jul 19, 2017 at 12:43 PM, Victor Drobny wrote: > Let me introduce a new function for full text search query creation (which is > called 'queryto_tsquery'). It takes a 'google like' query string and > translates it to tsquery. I haven't looked at the code, but that sounds like a neat idea. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pl/perl extension fails on Windows
On Wed, Jul 19, 2017 at 12:12 PM, Tom Lane wrote: > Ashutosh Sharma writes: >> Actually the function used for generation of the handshake key, i.e. HS_KEYp(), >> considers 'sizeof(PerlInterpreter)' to generate the key, and somehow the >> sizeof PerlInterpreter is not uniform in the plperl and perl modules in case of >> Windows, but on Linux it remains the same in both modules. > > Yipes. +1 for "yipes". It sounds like we should really try to fix the underlying problem, rather than just working around the check. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Something for the TODO list: deprecating abstime and friends
On Mon, Jul 17, 2017 at 6:12 PM, Tom Lane wrote: > So, thinking about how that would actually work ... the thing to do in > order to preserve existing on-disk values is to alternate between signed > and unsigned interpretations of abstimes. That is, right now, abstime > is signed with origin 1970. The conversion I'm arguing we should make > real soon now is to unsigned with origin 1970. If the project lives > so long, in circa 70 years we'd switch it to signed with origin 2106. > Yadda yadda. Doesn't this plan amount to breaking pg_upgrade compatibility and hoping that nobody notices? Existing values will be silently reinterpreted according to the new rules. If we had two actually separate types and let people convert columns from the old type to the new type with just a validation scan, that would be engineering at the level of quality that I've come to associate with this project. If this type is so marginal that we don't want to do that kind of work, then I think we should just rip it out; that doesn't preclude someone maintaining it in their own fork, or even adding it back as a new type in a contrib module with a #define for the base year. Silently changing the interpretation of the same data in the core code, though, seems both far too clever and not clever enough. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
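For what it's worth, the column conversion being described is already expressible with plain DDL; a minimal sketch with hypothetical table and column names (note this rewrites the table rather than doing only the validation scan asked for above):

ALTER TABLE events
  ALTER COLUMN created_at TYPE timestamptz
  USING created_at::timestamptz;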
[HACKERS] new function for tsquery creation
Dear all, Postgres currently has a few functions to create tsqueries for full text search. The main one is the to_tsquery function, which allows building a query with any operator, but to build a correct query all of the operators must be specified explicitly. To make this easier, Postgres has functions like plainto_tsquery and phraseto_tsquery, which build tsqueries from plain strings, but they are not flexible enough. Let me introduce a new function for full text search query creation (called 'queryto_tsquery'). It takes a 'google like' query string and translates it to a tsquery. The main features are the following:
- All text inside double quotes is treated as a phrase ("a b c" -> 'a <-> b <-> c').
- A new operator AROUND(N): it matches if the distance between words (or phrases) is less than or equal to N.
- An alias for ! ('-rat' is the same as '!rat').
- An alias for | ('dog OR cat' is the same as 'dog | cat').
Like plainto_tsquery and phraseto_tsquery it will fill in operators by itself, but operators already in place are not ignored, which allows combining the two approaches. In the attachment you can find a patch with the new features, tests and documentation. What do you think about it? Thank you very much for the attention! -- Victor Drobny Postgres Professional: http://www.postgrespro.com The Russian Postgres Company

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e073f7b..d6fb4ce 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -9494,6 +9494,18 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple + queryto_tsquery + + queryto_tsquery( config regconfig , query text) + +tsquery +produce tsquery from google like query +queryto_tsquery('english', 'The Fat Rats') +'fat' 'rat' + + + + querytree querytree(query tsquery)
diff --git a/doc/src/sgml/textsearch.sgml b/doc/src/sgml/textsearch.sgml
index fe630a6..999e4ad 100644
--- a/doc/src/sgml/textsearch.sgml
+++ b/doc/src/sgml/textsearch.sgml
@@ -797,13 +797,15 @@ UPDATE tt SET ti = PostgreSQL provides the functions to_tsquery, -plainto_tsquery, and -phraseto_tsquery +plainto_tsquery, +phraseto_tsquery and +queryto_tsquery for converting a query to the tsquery data type. to_tsquery offers access to more features than either plainto_tsquery or phraseto_tsquery, but it is less forgiving -about its input. +about its input. queryto_tsquery provides a +different, Google like syntax to create tsquery.
@@ -960,8 +962,68 @@ SELECT phraseto_tsquery('english', 'The Fat Rats:C'); - 'fat' - 'rat' - 'c' + + + +queryto_tsquery( config regconfig, querytext text) returns tsquery + + + +queryto_tsquery creates a tsquery from unformatted text. +But unlike plainto_tsquery and phraseto_tsquery it won't +ignore already placed operations. This function supports the following operators: + + + + '"some text"' - any text inside quote signs will be treated as a phrase and will be +processed as in phraseto_tsquery. + + + + + 'OR' - standard logical operator. It is just an alias for the '|' sign. + + + + + 'terma AROUND(N) termb' - this operation will match if the distance between + terma and termb is less than N. + + + + + '-' - standard logical negation sign. It is an alias for the '!' sign. + + + +Other missing operators will be replaced by AND, as in plainto_tsquery.
+ +Examples: + + select queryto_tsquery('The fat rats'); + queryto_tsquery + - + 'fat' & 'rat' + (1 row) + + + select queryto_tsquery('"supernovae stars" AND -crab'); + queryto_tsquery + -- + 'supernova' <-> 'star' & !'crab' +(1 row) + + + select queryto_tsquery('-run AROUND(5) "gnu debugger" OR "I like bananas"'); +queryto_tsquery + --- + !'run' AROUND(5) 'gnu' <-> 'debugg' | 'like' <-> 'banana' + (1 row) + + + diff --git a/src/backend/tsearch/to_tsany.c b/src/backend/tsearch/to_tsany.c index 18368d1..10fd8c3 100644 --- a/src/backend/tsearch/to_tsany.c +++ b/src/backend/tsearch/to_tsany.c @@ -414,7 +414,8 @@ add_to_tsvector(void *_state, char *elem_value, int elem_len) * and different variants are ORed together. */ static void -pushval_morph(Datum opaque, TSQueryParserState state, char *strval, int lenval, int16 weight, bool prefix) +pushval_morph(Datum opaque, TSQueryParserState state, char *strval, int lenval, +
Re: [HACKERS] tupconvert.c API change in v10 release notes
Justin Pryzby writes:
> FYI, I happened across this commit comment:
> 3f902354b08ac788600f0ae54fcbfc1d4e3ea765
> | So, let's accept the removal of the guarantee about
> | the output tuple's rowtype marking, recognizing that this is a API change
> | that could conceivably break third-party callers of tupconvert.c. (So,
> | let's remember to mention it in the v10 release notes.)
> ..but couldn't see that the commit or change is so referenced.

Yeah, I see nothing about 3f902354b in release-10.sgml either. We've had varying policies over the years about whether to mention internal API changes in the release notes or not, but this one I think does belong there, since it's a silent API break rather than one that would easily be caught due to compiler errors. Bruce, did you have any specific reasoning for leaving it out?

regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] tupconvert.c API change in v10 release notes
FYI, I happened across this commit comment: 3f902354b08ac788600f0ae54fcbfc1d4e3ea765 | So, let's accept the removal of the guarantee about | the output tuple's rowtype marking, recognizing that this is a API change | that could conceivably break third-party callers of tupconvert.c. (So, | let's remember to mention it in the v10 release notes.) ..but couldn't see that the commit or change is so referenced. Justin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pl/perl extension fails on Windows
Ashutosh Sharma writes:
> Actually the function used for generation of handshake Key i.e HS_KEYp()
> considers 'sizeof(PerlInterpreter)' to generate the key and somehow the
> sizeof PerlInterpreter is not uniform in plperl and perl modules in case of
> Windows but on Linux it remains same in both the modules.

Yipes. So actually, this is catching a live ABI problem, which presumably we've escaped seeing bad effects from only through sheer good luck. I suppose that the discrepancies in the struct contents only occur after the last field that plperl happens to touch directly --- but that is unlikely to be true forever.

> typedef struct interpreter PerlInterpreter;
> struct interpreter {
> #  include "intrpvar.h"
> };
> where intrpvar.h has different variables defined inside it and most of the
> variables definition are protected with various macros. And there are some
> macros that are just defined in perl but not in plperl module which means
> the sizeof(PerlInterpreter) on the two modules are going to be different
> and thereby resulting in a different key.

I imagine the route to a solution is to fix things so that the relevant macros are all defined correctly in both cases. But why they aren't already is certainly an interesting question. Have you identified just which fields are added or missing relative to what libperl thinks?

regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Using non-sequential timelines in order to help with possible collisions
Hey hackers,

I was working with replication and recovery the other day and noticed that there were scenarios where I could cause multiple servers to enter the same timeline while possibly having divergent data. One such scenario: Master A and Replica B are both on timeline 1. There is an event that causes Replica B to become promoted, which changes it to timeline 2. Following this, you perform a restore on Master A to a point before the event happened. Once Postgres completes this recovery on Master A, it will switch over to timeline 2. There are now WAL files that have been written to timeline 2 from both servers.

From this scenario, I would like to suggest considering using non-sequential timelines. From what I have investigated so far, I believe the *.history files in the WAL directory already have all the timeline IDs in them, in order. If we could make those timeline IDs a bit more unique/random, and still rely on the ordering in the *.history file, I think this would help prevent multiple servers on the same timeline with divergent data.

I was hoping to begin a conversation on whether or not non-sequential timelines are a good idea before I looked at the code around timelines.

-- Brian Faherty
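For reference on the ordering point: a timeline history file already records the lineage as ordered lines of (parent timeline ID, switch point, reason). A hypothetical 00000003.history, with invented values, might contain:

    1   0/30000D8   no recovery target specified
    2   0/5000090   at restore point "before_event"

As long as lookups walk this file rather than assume consecutive integers, the IDs themselves would be free to be non-sequential.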
Re: [HACKERS] pl/perl extension fails on Windows
On Thu, Jul 13, 2017 at 10:30 PM, Tom Lane wrote:
> Andrew Dunstan writes:
>> It would be nice to get to the bottom of why we're getting a version
>> mismatch on Windows, since we're clearly not getting one on Linux.
>
> Yeah, that's what's bothering me: as long as that remains unexplained,
> I don't have any confidence that we're fixing the right thing.

Okay, I tried to explore this a bit and my findings are as follows. On Linux, the handshake key generated in plperl code, i.e. inside XS_EXTERNAL() in Util.c (generated from Util.xs at build time), and the one generated in perl code (inside the Perl_xs_handshake function) are the same, which means the following condition inside Perl_xs_handshake() becomes true and therefore there is no mismatch error.

    got = INT2PTR(void*, (UV)(key & HSm_KEY_MATCH));
    need = (void *)(HS_KEY(FALSE, FALSE, "", "") & HSm_KEY_MATCH);
    if (UNLIKELY(got != need))
        goto bad_handshake;

However, on Windows, the handshake keys generated in plperl code inside XS_EXTERNAL() and in perl code inside Perl_xs_handshake() are different, thereby resulting in a mismatch error. Actually, the function used for generation of the handshake key, i.e. HS_KEYp(), considers 'sizeof(PerlInterpreter)' to generate the key, and somehow the sizeof(PerlInterpreter) is not uniform between the plperl and perl modules in case of Windows, but on Linux it remains the same in both modules. This is how PerlInterpreter is defined in the Perl source:

    typedef struct interpreter PerlInterpreter;
    struct interpreter {
    #  include "intrpvar.h"
    };

where intrpvar.h has different variables defined inside it, and most of the variable definitions are protected by various macros. There are some macros that are defined in the perl module but not in the plperl module, which means the sizeof(PerlInterpreter) in the two modules is going to be different, thereby resulting in a different key. But, then, the point is: why is no such difference observed on Linux? Well, as of now, I haven't found the reason behind it and I am still investigating.

-- With Regards, Ashutosh Sharma EnterpriseDB: http://www.enterprisedb.com

> regards, tom lane
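To illustrate the mechanism in a self-contained way (the struct, macro, and key names below are invented stand-ins, not Perl's real ones): a key derived from sizeof() bakes each module's struct layout into that module at compile time, so differing macro sets become a detectable key mismatch.

    #include <stdint.h>
    #include <stdio.h>

    /* Stand-in for PerlInterpreter: its size depends on build-time macros,
     * the way the intrpvar.h fields are guarded in the real struct. */
    struct fake_interp {
        long always_present;
    #ifdef SOME_ABI_MACRO       /* imagine: defined when building libperl,
                                 * but not when building the plperl DLL */
        long only_sometimes;
    #endif
    };

    /* In the spirit of HS_KEYp(): fold the struct size into the key. */
    #define MY_HS_KEY ((uintptr_t) sizeof(struct fake_interp))

    int
    main(void)
    {
        /* Build this once with -DSOME_ABI_MACRO and once without: the two
         * binaries print different keys. When one side is the library and
         * the other the loadable module, that difference is exactly what
         * sends Perl_xs_handshake() down the bad_handshake path. */
        printf("my handshake key: %lu\n", (unsigned long) MY_HS_KEY);
        return 0;
    }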
Re: [HACKERS] JSONB - JSONB operator feature request
Hi,

hstore has only key-value pairs, but JSON can have the same behavior - only equal key-value pairs are removed:

SELECT '{"a": 1, "b": {"c": 2}}'::JSONB - '{"a": 1, "b": {"c": 3}}'::JSONB
'{"b": {"c": 2}}'

SELECT '{"a": 1, "b": {"c": 2}}'::JSONB - '{"a": 2, "b": {"c": 2}}'::JSONB
'{"a": 1}'

David

-- Ing. David TUROŇ LinuxBox.cz, s.r.o. 28. rijna 168, 709 01 Ostrava tel.: +420 591 166 224 fax: +420 596 621 273 mobil: +420 732 589 152 www.linuxbox.cz mobile service: +420 737 238 656 email service: ser...@linuxbox.cz

From: David Fetter
To: david.tu...@linuxbox.cz
Cc: pgsql-hackers@postgresql.org
Date: 18. 07. 2017 18:24
Subject: Re: [HACKERS] JSONB - JSONB operator feature request

On Tue, Jul 18, 2017 at 01:36:32PM +0200, david.tu...@linuxbox.cz wrote:
> Hi,
>
> some users and me used hstore - hstore for example storing only changed
> rows in trigger like:
>
> hstore(NEW) - hstore(OLD)
>
> There isn't same operator/function in JSON/JSONB. We can only remove keys
> from JSONB, but not equal key-value pairs. Is there any chance to have
> same feature with JSON/JSONB in postgres core?

What would - mean precisely for JSON[B]? For example, what would you expect

SELECT '{"a": 1, "b": {"c": 2}}'::JSONB - '{"b": 1, "b": {"c": 3}}'::JSONB

to yield?

Best, David.

-- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
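For what it's worth, the top-level behavior described above can already be sketched in plain SQL today (the function name jsonb_minus is hypothetical; this is an illustration, not a proposed core implementation):

    CREATE FUNCTION jsonb_minus(a jsonb, b jsonb) RETURNS jsonb AS $$
        -- Keep only the pairs of "a" whose exact value does not appear
        -- under the same key in "b" (top level only, like hstore - hstore).
        SELECT COALESCE(jsonb_object_agg(key, value), '{}'::jsonb)
        FROM jsonb_each(a)
        WHERE b -> key IS DISTINCT FROM value;
    $$ LANGUAGE sql IMMUTABLE;

Applied to the two examples above, this yields '{"b": {"c": 2}}' and '{"a": 1}' respectively.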
Re: [HACKERS] merge psql ef/ev sf/sv handling functions
>> While reviewing Corey's \if patch, I complained that there was some amount
>> of copy-paste in "psql/command.c".
>>
>> Here is an attempt at merging some functions which removes 160 lines of code.
>
> Thank you for the patch. Is this an item for PG11?

Yep.

-- Fabien.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bug in ExecModifyTable function and trigger issues for foreign tables
Etsuro Fujita writes:
> * Modified rewrite_targetlist(), which is a new function added to
> preptlist.c, so that we do const-simplification to junk TLEs that
> AddForeignUpdateTargets() added, as that API allows the FDW to add junk
> TLEs containing non-Var expressions to the query's targetlist.

This does not seem like a good idea to me. eval_const_expressions is not a cheap thing, and for most use-cases those cycles will be wasted, and it has never been the responsibility of preprocess_targetlist to do this sort of thing. Please put the responsibility of doing const-expression simplification in these cases somewhere closer to where the problem is being created.

regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] Zipfian distribution in pgbench
Hello Alik,

> I am attaching patch v3. Among other things I fixed a small typo in the
> description of the random_exponential function in the pgbench.sgml file.

Ok. Probably this typo should be committed separately and independently.

A few comments about v3:

Patch applies cleanly, compiles, works.

About the maths:

As already said, I'm not at ease with a random_zipfian function which does not display a (good) zipfian distribution. At the minimum the documentation should be clear about the approximations implied depending on the parameter value.

In the literature the theta parameter seems to be often called alpha or s (eg see https://en.wikipedia.org/wiki/Zipf%27s_law). I would suggest sticking to "s" instead of "theta"?

About the code: looks simpler than the previous version, which is good.

Double constants should be used when the underlying type is a double, instead of relying on implicit int to double promotion (0 -> 0.0, 1 -> 1.0).

Function zipfZeta(n, theta) does not really compute the zeta(n) function, so I think that a better name should be chosen. It seems to compute H_{n,theta}, the generalized harmonic number. Idem "thetan" field in struct.

The handling of cache overflow by randomly removing one entry looks like a strange idea. Rather remove the oldest entry? ISTM that it should print a warning once if the cache array overflows, as performance would drop heavily.

If the zipf cache is constant size, there is no point in using dynamic allocation, just declare an array...

Comments need updating: eg "theta parameter of previous execution", which dates back to when there was only one value. There should be a comment to explain that the structure is in the thread for thread safety.

There should be non regression tests somehow. If the "improve pgbench tap test infrastructure" patch gets through, things should be added there.

-- Fabien.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
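For reference, a hedged sketch of the quantity in question (the name generalized_harmonic is a suggestion, not what the patch uses):

    #include <math.h>
    #include <stdint.h>

    /* H(n, s) = sum over i = 1..n of 1 / i^s, the generalized harmonic
     * number; zeta(s) is only its limit as n goes to infinity, hence the
     * renaming suggestion above. */
    static double
    generalized_harmonic(int64_t n, double s)
    {
        double h = 0.0;
        for (int64_t i = n; i >= 1; i--)   /* smallest terms first, for accuracy */
            h += pow((double) i, -s);
        return h;
    }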
Re: [HACKERS] Proposal for CSN based snapshots
> That's not actually a problem as I am reusing an older v4 from Heikki now
> for the future, but I wanted to let you know about that first.

Thank you, I'll look into that.

-- Alexander Kuzmenkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Something for the TODO list: deprecating abstime and friends
Mark Dilger writes:
>> On Jul 18, 2017, at 9:13 PM, Mark Dilger wrote:
>> There was not much conversation about this, so I went ahead with what
>> I think makes a logical first step. The attached patch removes the tinterval
>> datatype from the sources.

> As predicted, this second patch (which should be applied *after* the prior
> tinterval_abatement patch) removes the reltime datatype from the sources.

Seems like a good plan. Please be sure to add these patches to the next commitfest, so we remember them when the time comes.

regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC 2017: Foreign Key Arrays
To summarise, these are the options we have to solve the limitation of @>(anyarray, anyelement), where it produces the following error:

operator does not exist: integer[] @> smallint

Option 1: Multiple operators
Have separate operators for every combination of datatypes instead of a single polymorphic definition (i.e. int4[] @>> int8, int4[] @>> int4, int4[] @>> int2, int4[] @>> numeric); a sketch follows below.
Drawback: high maintenance.

Option 2: Explicit casting
Compare the datatypes of the two operands and apply the appropriate cast.
Drawback: figuring out the appropriate cast may require considerable computation.

Option 3: Unsafe polymorphic datatypes
This is a little out there, but since @>(anyarray, anyelement) has to resolve both arguments to the same datatype, how about defining new datatypes without this constraint, where we handle the datatypes ourselves? It would be something like @>(unsafeAnyarray, unsafeAnyelement).
Drawback: a lot of defensive programming has to be implemented to guard against any exception.

Another thing: until this is settled, I also have to go through performance testing, to provide evidence that all we did actually enhances the performance of the RI checks. How can I go about this?

Best Regards, Mark Rofail
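For illustration, here is a hedged sketch of what one per-type operator from option 1 could look like (only the @>> name comes from the thread; the support function and its body are invented, and the body incidentally uses option 2's explicit cast by widening the array rather than the element):

    CREATE FUNCTION int4array_contains_int8(int4[], int8) RETURNS boolean
        AS $$ SELECT $2 = ANY ($1::int8[]) $$ LANGUAGE sql IMMUTABLE;

    CREATE OPERATOR @>> (
        LEFTARG   = int4[],
        RIGHTARG  = int8,
        PROCEDURE = int4array_contains_int8
    );

    -- SELECT '{1,2,3}'::int4[] @>> 2::int8;   yields true

The "high maintenance" drawback is visible immediately: each listed type combination needs its own such function/operator pair.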
Re: [HACKERS] GSoC 2017: Foreign Key Arrays
On Tue, Jul 18, 2017 at 11:14 PM, Alvaro Herrera wrote:
> Why did we add an operator and not a support
> procedure?

I thought the support procedures were constant within an opclass: they implement the mandatory functions required of an opclass. I don't see why we would need to implement new ones, since they already deal with the left-hand operand (the referencing column), which is always an array, so anyarray would suffice. Also, the support procedures don't interact with the left and right operands simultaneously. And we want to target the combinations of int4[] @>> int8, int4[] @>> int4, int4[] @>> int2, int4[] @>> numeric. So I think implementing operators is the way to go.

Best Regards, Mark Rofail.
Re: [HACKERS] Bug in ExecModifyTable function and trigger issues for foreign tables
On 2017/07/13 21:10, Etsuro Fujita wrote:
> Attached is an updated version of the patch.

Here is an updated version of the patch. Changes are:

* Modified rewrite_targetlist(), which is a new function added to preptlist.c, so that we do const-simplification to junk TLEs that AddForeignUpdateTargets() added, as that API allows the FDW to add junk TLEs containing non-Var expressions to the query's targetlist.

* Updated docs in fdwhandler.sgml.

Best regards, Etsuro Fujita

[Attachment: updated patch; the inlined diff is truncated in the archive. The recoverable parts show that it (1) adds regression tests to contrib/postgres_fdw (sql/postgres_fdw.sql and expected/postgres_fdw.out) creating row-level BEFORE/AFTER UPDATE and DELETE triggers on foreign table bar2 and exercising UPDATE/DELETE with an inherited target, and (2) rewords the AddForeignUpdateTargets() documentation in doc/src/sgml/fdwhandler.sgml, replacing "This function is called in the rewriter, not the planner" with text beginning "Although this function is called in the planner, the" (cut off at that point in the archive).]
[HACKERS] Dealing with logical replication
Now that we are going to have logical replication in PostgreSQL 10, I have started thinking about how Pgpool-II can deal with it. For example, logical replication does not replicate DDLs. Wouldn't it be convenient for users to have Pgpool-II do that automatically? Or even do it for TRUNCATE? Or are these against the design philosophy of logical replication? Comments are welcome.

-- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Cache lookup errors with functions manipulation object addresses
Hi all,

Per an offline report from Moshe Jacobson, it is possible to easily trigger cache lookup errors by calling pg_describe_object() with invalid object IDs. I had a closer look at things in this area, and noticed that there are other user-facing failures, as many things use the same interface:

=# select pg_identify_object('pg_class'::regclass, 0::oid, 0);
ERROR: XX000: cache lookup failed for relation 0
=# select pg_describe_object('pg_class'::regclass, 0::oid, 0);
ERROR: XX000: cache lookup failed for relation 0
=# select pg_identify_object_as_address('pg_class'::regclass, 0::oid, 0);
ERROR: XX000: cache lookup failed for relation 0

As in my previous opinion on the matter in https://www.postgresql.org/message-id/87wpxfygg9@credativ.de, I still think that "cache lookup" failures should not be things a user is able to trigger at will, and that those errors should be replaced by proper NULL results. That's clearly not an item for PG10, but we could consider improving things for PG11. Still, we are talking about adding NULLness handling in getObjectDescription(), which goes into low-level functions to grab the names of some objects, and some of those functions have their own way to deal with incorrect objects (format_type_be, for example, returns "???"). Would we want to improve the error handling of such objects? Or is that not worth the effort? Álvaro, what's your take on the matter, as you worked a lot on that?

Thoughts,
-- Michael

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] merge psql ef/ev sf/sv handling functions
On Wed, Jul 19, 2017 at 2:41 PM, Fabien COELHO wrote:
>>> While reviewing Corey's \if patch, I complained that there was some
>>> amount of copy-paste in "psql/command.c".
>>>
>>> Here is an attempt at merging some functions which removes 160 lines of
>>> code.
>>
>> Thank you for the patch. Is this an item for PG11?
>
> Yes, as it is submitted to CF 2017-09.

Thanks! It is already registered in the next CF. I missed it, sorry.

Regards,

-- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pluggable storage
On Sat, Jul 15, 2017 at 12:30 PM, Robert Haas wrote:
> On Fri, Jul 14, 2017 at 8:35 AM, Haribabu Kommi wrote:
>> To replace tuple with slot, I took trigger and SPI calls as the first step
>> in modifying from tuple to slot. Here I attached a WIP patch. The notable changes are,
>>
>> 1. Replace most of the HeapTuple with Slot in SPI interface functions.
>> 2. In SPITupleTable, instead of HeapTuple, it is changed to TupleTableSlot.
>> But this change may not be a proper approach, because a duplicate copy of
>> TupleTableSlot is generated and stored.
>> 3. Changed all trigger interfaces to accept TupleTableSlot instead of HeapTuple.
>> 4. ItemPointerData is added as a member to the TupleTableSlot structure.
>> 5. Modified ExecInsert and others to work directly on TupleTableSlot instead
>> of tuple (not completely).
>
> What problem are you trying to solve with these changes? I'm not
> saying that it's a bad idea, but I think you should spell out the
> motivation a little more explicitly.

Sorry for not providing complete details. I am trying these experiments to find out the best way to return the tuple from storage methods by designing a proper API. The changes that I am doing are meant to reduce the dependency on the HeapTuple format in favor of a values/nulls array. If there is no dependency on the HeapTuple format in the upper layers of PostgreSQL storage, then we can define the storage API to return the values/nulls array directly instead of one big chunk of tuple data like HeapTuple or StorageTuple (void *).

I am finding that eliminating the HeapTuple usage in the upper layers needs some major changes. How about not changing anything in the upper layers of storage for now and just supporting the pluggable tuple with one of the following approaches for a first version?

1. Design an API that returns a values/nulls array and convert that into a HeapTuple whenever it is required in the upper layers. All the existing heap form/deform functions are used for every tuple, with some adjustments. (A sketch of this option follows below.)

2. Design an API that returns a StorageTuple (void *) whose first member represents the TYPE of the storage, so that the corresponding registered functions can be called to form/deform the tuple whenever a tuple is needed.

3. Design an API that returns a StorageTuple (void *) where the necessary format information of that tuple can be obtained from the tupledesc; wherever a tuple is present, a tupledesc exists in most cases. How about adding some kind of information to the tupledesc to find out the tuple format and call the necessary functions?

4. Design an API that always returns a StorageTuple and converts it to a HeapTuple with a function hook if one gets registered (for heap storage this is not required, because it is already in HeapTuple format). This function hook would be placed in the heap form/deform functions.

Any other better ideas for a first version?

Regards, Hari Babu Fujitsu Australia
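As a hedged sketch of what option 1's API shape could look like (every name and the signature below are invented for illustration, and the usual postgres.h types are assumed):

    /* Option 1: the storage AM hands back decomposed columns; upper layers
     * that still need a HeapTuple re-form one only at that point. */
    typedef struct StorageScanDescData *StorageScanDesc;    /* opaque, per-AM */

    extern bool storage_getnext(StorageScanDesc scan,
                                Datum *values,  /* caller-allocated, natts entries */
                                bool *isnull);  /* likewise */

    /* An upper layer that insists on a HeapTuple would then do:
     *
     *     while (storage_getnext(scan, values, isnull))
     *     {
     *         HeapTuple tup = heap_form_tuple(tupdesc, values, isnull);
     *         ...
     *     }
     */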