Re: Deduplicate logicalrep_read_tuple()
On Fri, Mar 3, 2023 at 10:04 PM Amit Kapila wrote:
> On Fri, Mar 3, 2023 at 4:13 PM Bharath Rupireddy wrote:
> > On Thu, Jan 19, 2023 at 8:36 AM Peter Smith wrote:
> > > On Wed, Jan 18, 2023 at 6:26 PM Bharath Rupireddy wrote:
> > > > Hi,
> > > >
> > > > logicalrep_read_tuple() duplicates code for LOGICALREP_COLUMN_TEXT and
> > > > LOGICALREP_COLUMN_BINARY introduced by commit 9de77b5. While it
> > > > doesn't hurt anyone, deduplication makes the code a bit leaner by 57
> > > > bytes [1]. I've attached a patch for $SUBJECT.
> > > >
> > > > Thoughts?
> > >
> > > The code looks the same but there is a subtle comment difference where
> > > previously only the LOGICALREP_COLUMN_BINARY case said:
> > >   /* not strictly necessary but per StringInfo practice */
> > >
> > > So if you de-duplicate the code then should that comment be modified to say
> > >   /* not strictly necessary for LOGICALREP_COLUMN_BINARY but per
> > >      StringInfo practice */
> >
> > Thanks. Done so in the attached v2.
>
> LGTM. Unless Peter or someone has any comments on this, I'll push this
> early next week.

No more comments. Patch v2 LGTM.

--
Kind Regards,
Peter Smith.
Fujitsu Australia
Re: Request for comment on setting binary format output per session
On Sat, 4 Mar 2023 at 19:06, Tom Lane wrote:
> Jeff Davis writes:
> > On Sat, 2023-03-04 at 18:04 -0500, Dave Cramer wrote:
> >> Most of the clients know how to decode the builtin types. I'm not
> >> sure there is a use case for binary encode types that the clients
> >> don't have a priori knowledge of.
>
> > The client could, in theory, have a priori knowledge of a non-builtin
> > type.
>
> I don't see what's "in theory" about that. There seems plenty of
> use for binary I/O of, say, PostGIS types. Even for built-in types,
> do we really want to encourage people to hard-wire their OIDs into
> applications?

How does a client read these? I'm pretty narrowly focussed. The JDBC API
doesn't really have a way to read a non-built-in type. There is a
facility to read a UDT, but the user would have to provide that
transcoder. I guess I'm curious how other clients read binary UDTs?

> I don't see a big problem with driving this off a GUC, but I think
> it should be a list of type names not OIDs. We already have plenty
> of precedent for dealing with that sort of thing; see search_path
> for the canonical example. IIRC, there's similar caching logic
> for temp_tablespaces.

I have no issue with allowing names. OIDs were compact, but we could
easily support both.

Dave
Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment
Hi,

I was debugging a planner problem on Postgres 14.4 the other day - and the
involved "bad" plan was including Memoize - though I don't necessarily
think that Memoize is to blame (and this isn't any of the problems
recently fixed in Memoize costing).

However, what I noticed whilst trying different ways to fix the plan, is
that the Memoize output was a bit hard to reason about - especially since
the plan involving Memoize was expensive to run, and so I was mostly
running EXPLAIN without ANALYZE to look at the costing.

Here is an example of the output I was looking at:

->  Nested Loop  (cost=1.00..971672.56 rows=119623 width=0)
      ->  Index Only Scan using table1_idx on table1  (cost=0.43..372676.50 rows=23553966 width=8)
      ->  Memoize  (cost=0.57..0.61 rows=1 width=8)
            Cache Key: table1.table2_id
            Cache Mode: logical
            ->  Index Scan using table2_idx on table2  (cost=0.56..0.60 rows=1 width=8)
                  Index Cond: (id = table1.table2_id)

The other plan I was comparing with (that I wanted the planner to choose
instead) had a total cost of 1,451,807.35 -- and so I was trying to figure
out why the Nested Loop was costed at 971,672.56. Simple math makes me
expect the Nested Loop should roughly have a total cost of 14,740,595.76
here (372,676.50 + 23,553,966 * 0.61), ignoring a lot of the smaller
costs. Thus, in this example, it appears Memoize made the plan cost
significantly cheaper (roughly 6% of the regular cost).

Essentially this comes down to the "cost reduction" performed by Memoize
only being implicitly visible in the Nested Loop's total cost - with
nothing useful on the Memoize node itself - since the rescan costs are
not shown.

I think explicitly adding the estimated cache hit ratio for Memoize nodes
might make this easier to reason about, like this:

->  Memoize  (cost=0.57..0.61 rows=1 width=8)
      Cache Key: table1.table2_id
      Cache Mode: logical
      Cache Hit Ratio Estimated: 0.94

Alternatively (or in addition) we could consider showing the "ndistinct"
value that is calculated in cost_memoize_rescan - since that's the most
significant contributor to the cache hit ratio (and you can influence
that directly by improving the ndistinct statistics).

See attached a patch that implements showing the cache hit ratio as a
discussion starter. I'll park this in the July commitfest for now.

Thanks,
Lukas

--
Lukas Fittl

v1-0001-Add-Estimated-Cache-Hit-Ratio-for-Memoize-plan-no.patch
Description: Binary data
Re: Request for comment on setting binary format output per session
On Sat, Mar 4, 2023 at 5:07 PM Tom Lane wrote:
> Jeff Davis writes:
> > On Sat, 2023-03-04 at 18:04 -0500, Dave Cramer wrote:
> >> Most of the clients know how to decode the builtin types. I'm not
> >> sure there is a use case for binary encode types that the clients
> >> don't have a priori knowledge of.
>
> > The client could, in theory, have a priori knowledge of a non-builtin
> > type.
>
> I don't see what's "in theory" about that. There seems plenty of
> use for binary I/O of, say, PostGIS types. Even for built-in types,
> do we really want to encourage people to hard-wire their OIDs into
> applications?
>
> I don't see a big problem with driving this off a GUC, but I think
> it should be a list of type names not OIDs. We already have plenty
> of precedent for dealing with that sort of thing; see search_path
> for the canonical example. IIRC, there's similar caching logic
> for temp_tablespaces.

This seems slightly different since types depend upon schemas whereas
search_path is top-level and tablespaces are global. But I agree that
names should be accepted, maybe in addition to OIDs, the latter, for
core types in particular, being a way to not have to worry about masking
in user-space.

David J.
Re: Add standard collation UNICODE
Jeff Davis writes:
> On Sun, 2023-03-05 at 08:27 +1300, Thomas Munro wrote:
>> It's created for UTF-8 only, and UTF-8 sorts the same way as the
>> encoded code points, when interpreted as a sequence of unsigned char
>> by memcmp(), strcmp() etc. Seems right?

> Right, makes sense.

> Though in principle, shouldn't someone using another encoding also be
> able to use ucs_basic? I'm not sure if that's a practical problem or
> not; I'm just curious. Does ICU provide a locale for sorting by code
> point?

ISTM we could trivially allow it in LATIN1 encoding as well; strcmp
would still have the effect of sorting by unicode code points. Given
the complete lack of field demand for making it work in other
encodings, I'm unexcited about spending more effort than that.

			regards, tom lane
Re: Request for comment on setting binary format output per session
Jeff Davis writes:
> On Sat, 2023-03-04 at 18:04 -0500, Dave Cramer wrote:
>> Most of the clients know how to decode the builtin types. I'm not
>> sure there is a use case for binary encode types that the clients
>> don't have a priori knowledge of.

> The client could, in theory, have a priori knowledge of a non-builtin
> type.

I don't see what's "in theory" about that. There seems plenty of
use for binary I/O of, say, PostGIS types. Even for built-in types,
do we really want to encourage people to hard-wire their OIDs into
applications?

I don't see a big problem with driving this off a GUC, but I think
it should be a list of type names not OIDs. We already have plenty
of precedent for dealing with that sort of thing; see search_path
for the canonical example. IIRC, there's similar caching logic
for temp_tablespaces.

			regards, tom lane
Re: Request for comment on setting binary format output per session
On Sat, 2023-03-04 at 18:04 -0500, Dave Cramer wrote:
> Most of the clients know how to decode the builtin types. I'm not
> sure there is a use case for binary encode types that the clients
> don't have a priori knowledge of.

The client could, in theory, have a priori knowledge of a non-builtin
type. I don't have a great solution for that, though. Maybe it's only
practical for builtin types.

Regards,
Jeff Davis
Re: Add standard collation UNICODE
On Sun, 2023-03-05 at 08:27 +1300, Thomas Munro wrote:
> It's created for UTF-8 only, and UTF-8 sorts the same way as the
> encoded code points, when interpreted as a sequence of unsigned char
> by memcmp(), strcmp() etc. Seems right?

Right, makes sense.

Though in principle, shouldn't someone using another encoding also be
able to use ucs_basic? I'm not sure if that's a practical problem or
not; I'm just curious. Does ICU provide a locale for sorting by code
point?

Regards,
Jeff Davis
Re: Date-Time dangling unit fix
On Sat, Mar 4, 2023 at 4:05 PM Tom Lane wrote:
> I started to look at this, and soon noticed that while we have test cases
> matching this sort of date input, there is no documentation for it. The
> code claims it's an "ISO" (presumably ISO 8601) format, and maybe it is
> because it looks a lot like the ISO 8601 format for intervals (durations).
> But I don't have a copy of ISO 8601, and some googling fails to find any
> indication that anybody else believes this is a valid datetime format.
> Wikipedia for example documents a lot of variants of ISO 8601 [1],
> but nothing that looks like this.
>
> I wonder if we should just rip this code out instead of fixing it.
> I suspect its real-world usage is not different from zero. We'd
> have to keep the "Jnnn" Julian-date case, though, so maybe there's
> little to be saved.
>
> If we do keep it, there's documentation work to be done. But the
> first bit of doco I'd want to see is a pointer to a standard.

I also don't have a copy of ISO 8601 and wasn't able to find anything
about this variant on Google. I did find this comment in datetime.c

    /*
     * Was this an "ISO date" with embedded field labels? An
     * example is "y2001m02d04" - thomas 2001-02-04
     */

which comes from this commit [1], which was authored by Thomas Lockhart
(presumably the same thomas from the comment). I've CC'ed Thomas in
case the email still exists and they happen to remember. The commit
message mentions ISO, but not the variant mentioned in the comment. The
mailing list thread can be found here [2], but it doesn't provide much
more information.

I also found the following thread [3], which happens to have you in it
in case you remember it, which seemed to be the motivation for commit
[1]. It only contains the following line about ISO:

> o support for "ISO variants" on input, including embedded "T"
> preceeding the time fields

All that seems to imply the "y2001m02d04" ISO variant was never really
discussed in much detail and it's probably fine to remove it. Though,
it has been around for 22 years, which makes it a bit scary to remove.

- Joe Koshakow

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=6f58115dddfa8ca63004c4784f57ef660422861d
[2] https://www.postgresql.org/message-id/flat/3BB433D5.3CB4164E%40fourpalms.org
[3] https://www.postgresql.org/message-id/flat/3B970FF8.B9990807%40fourpalms.org#c57d83c80d295bfa19887c92122369c3
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
Andres Freund writes:
> Just pushed the actual pg_stat_io view, the splitting of the tablespace test,
> and the pg_stat_io tests.

One of the test cases is flapping a bit:

diff -U3 /home/pg/build-farm-15/buildroot/HEAD/pgsql.build/src/test/regress/expected/stats.out /home/pg/build-farm-15/buildroot/HEAD/pgsql.build/src/test/regress/results/stats.out
--- /home/pg/build-farm-15/buildroot/HEAD/pgsql.build/src/test/regress/expected/stats.out	2023-03-04 21:30:05.891579466 +0100
+++ /home/pg/build-farm-15/buildroot/HEAD/pgsql.build/src/test/regress/results/stats.out	2023-03-04 21:34:26.745552661 +0100
@@ -1201,7 +1201,7 @@
 SELECT :io_sum_shared_after_reads > :io_sum_shared_before_reads;
  ?column?
 ----------
- t
+ f
 (1 row)
 
 DROP TABLE test_io_shared;

There are two instances of this today [1][2], and I've seen it before
but failed to note down where.

			regards, tom lane

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grison&dt=2023-03-04%2021%3A19%3A39
[2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mule&dt=2023-03-04%2020%3A30%3A05
Re: Allow tests to pass in OpenSSL FIPS mode
Peter Eisentraut writes:
> [ v2-0001-Remove-incidental-md5-function-uses-from-main-reg.patch ]

I've gone through this and have a modest suggestion: let's invent some
wrapper functions around encode(sha256()) to reduce the cosmetic diffs
and consequent need for closer study of patch changes. In the attached
I called them "notmd5()", but I'm surely not wedded to that name. This
also accounts for some relatively recent additions to stats_ext.sql
that introduced yet more uses of md5().

This passes for me on a FIPS-enabled Fedora system, with the exception
of md5.sql and password.sql. I agree that the right thing for md5.sql
is just to add a variant expected-file. password.sql could perhaps use
some refactoring so that we don't have two large expected-files to
manage.

The only other place that perhaps needs discussion is rowsecurity.sql,
which has some surprisingly large changes: not only do the random
strings change, but there are rowcount differences in some results. I
believe this is because there are RLS policy checks and view conditions
that actually examine the contents of the "md5" strings, eg

CREATE POLICY p1 ON s1 USING (a in (select x from s2 where y like '%2f%'));

My recommendation is to just accept those changes as OK and move on. I
doubt that anybody checked the existing results line-by-line either.

So, once we've done something about md5.sql and password.sql, I think
this is committable.
			regards, tom lane

diff --git a/src/test/regress/expected/arrays.out b/src/test/regress/expected/arrays.out
index 0ff54a18de..1c46da1e0b 100644
--- a/src/test/regress/expected/arrays.out
+++ b/src/test/regress/expected/arrays.out
@@ -2303,14 +2303,14 @@ insert into src
 create type textandtext as (c1 text, c2 text);
 create temp table dest (f1 textandtext[]);
 insert into dest select array[row(f1,f1)::textandtext] from src;
-select length(md5((f1[1]).c2)) from dest;
+select length(notmd5((f1[1]).c2)) from dest;
  length
--------
     32
 (1 row)
 
 delete from src;
-select length(md5((f1[1]).c2)) from dest;
+select length(notmd5((f1[1]).c2)) from dest;
  length
--------
     32
@@ -2318,7 +2318,7 @@ select length(md5((f1[1]).c2)) from dest;
 truncate table src;
 drop table src;
-select length(md5((f1[1]).c2)) from dest;
+select length(notmd5((f1[1]).c2)) from dest;
  length
--------
     32
diff --git a/src/test/regress/expected/brin.out b/src/test/regress/expected/brin.out
index 73fa38396e..b3aac15ecc 100644
--- a/src/test/regress/expected/brin.out
+++ b/src/test/regress/expected/brin.out
@@ -530,7 +530,7 @@ EXPLAIN (COSTS OFF) SELECT * FROM brin_test WHERE b = 1;
 CREATE TABLE brintest_3 (a text, b text, c text, d text);
 -- long random strings (~2000 chars each, so ~6kB for min/max on two
 -- columns) to trigger toasting
-WITH rand_value AS (SELECT string_agg(md5(i::text),'') AS val FROM generate_series(1,60) s(i))
+WITH rand_value AS (SELECT string_agg(notmd5(i::text),'') AS val FROM generate_series(1,60) s(i))
 INSERT INTO brintest_3 SELECT val, val, val, val FROM rand_value;
 CREATE INDEX brin_test_toast_idx ON brintest_3 USING brin (b, c);
@@ -545,7 +545,7 @@ VACUUM brintest_3;
 -- retry insert with a different random-looking (but deterministic) value
 -- the value is different, and so should replace either min or max in the
 -- brin summary
-WITH rand_value AS (SELECT string_agg(md5((-i)::text),'') AS val FROM generate_series(1,60) s(i))
+WITH rand_value AS (SELECT string_agg(notmd5((-i)::text),'') AS val FROM generate_series(1,60) s(i))
 INSERT INTO brintest_3 SELECT val, val, val, val FROM rand_value;
 -- now try some queries, accessing the brin index
diff --git a/src/test/regress/expected/brin_multi.out b/src/test/regress/expected/brin_multi.out
index f3309f433f..28d136f59c 100644
--- a/src/test/regress/expected/brin_multi.out
+++ b/src/test/regress/expected/brin_multi.out
@@ -29,7 +29,7 @@ INSERT INTO brintest_multi SELECT
 	(four + 1.0)/(hundred+1),
 	odd::float8 / (tenthous + 1),
 	format('%s:00:%s:00:%s:00', to_hex(odd), to_hex(even), to_hex(hundred))::macaddr,
-	substr(md5(unique1::text), 1, 16)::macaddr8,
+	substr(notmd5(unique1::text), 1, 16)::macaddr8,
 	inet '10.2.3.4/24' + tenthous,
 	cidr '10.2.3/24' + tenthous,
 	date '1995-08-15' + tenthous,
@@ -179,7 +179,7 @@ INSERT INTO brinopers_multi VALUES
 	('macaddr8col', 'macaddr8',
 	 '{>, >=, =, <=, <}',
 	 '{b1:d1:0e:7b:af:a4:42:12, d9:35:91:bd:f7:86:0e:1e, 72:8f:20:6c:2a:01:bf:57, 23:e8:46:63:86:07:ad:cb, 13:16:8e:6a:2e:6c:84:b4}',
-	 '{33, 15, 1, 13, 6}'),
+	 '{31, 17, 1, 11, 4}'),
 	('inetcol', 'inet',
 	 '{=, <, <=, >, >=}',
 	 '{10.2.14.231/24, 255.255.255.255, 255.255.255.255, 0.0.0.0, 0.0.0.0}',
@@ -327,7 +327,7 @@ INSERT INTO brintest_multi SELECT
 	(four + 1.0)/(hundred+1),
 	odd::float8 / (tenthous + 1),
 	format('%s:00:%s:00:%s:00', to_hex(odd), to_hex(even), to_hex(hundred))::macaddr,
-	substr(md5(unique1::text), 1, 16)::macaddr8,
+	substr(notmd5(unique1::text), 1, 16)::macaddr8,
Re: Request for comment on setting binary format output per session
Dave Cramer

On Sat, 4 Mar 2023 at 11:35, Jeff Davis wrote:
> On Thu, 2023-03-02 at 09:13 -0500, Dave Cramer wrote:
> > I'd like to open up this discussion again so that we can
> > move forward. I prefer the GUC as it is relatively simple and as
> > Peter mentioned it works, but I'm not married to the idea.
>
> It's not very friendly to extensions, where the types are not
> guaranteed to have stable OIDs. Did you consider any proposals that
> work with type names?

I had not. Most of the clients know how to decode the builtin types.
I'm not sure there is a use case for binary encode types that the
clients don't have a priori knowledge of.

Dave

> Regards,
> Jeff Davis
Re: Date-Time dangling unit fix
Joseph Koshakow writes:
> On Mon, Dec 12, 2022 at 10:55 AM Joseph Koshakow wrote:
>> I just found another class of this bug that the submitted patch does
>> not fix. If the units are at the beginning of the string, then they are
>> also ignored. For example, `date 'm d y2020m11d3'` is also valid. I
>> think the fix here is to check and make sure that ptype is 0 before
>> reassigning the value to a non-zero number. I'll send an updated patch
>> with this tonight.

> Attached is the described patch.

I started to look at this, and soon noticed that while we have test
cases matching this sort of date input, there is no documentation for
it. The code claims it's an "ISO" (presumably ISO 8601) format, and
maybe it is because it looks a lot like the ISO 8601 format for
intervals (durations). But I don't have a copy of ISO 8601, and some
googling fails to find any indication that anybody else believes this
is a valid datetime format. Wikipedia for example documents a lot of
variants of ISO 8601 [1], but nothing that looks like this.

I wonder if we should just rip this code out instead of fixing it.
I suspect its real-world usage is not different from zero. We'd
have to keep the "Jnnn" Julian-date case, though, so maybe there's
little to be saved.

If we do keep it, there's documentation work to be done. But the
first bit of doco I'd want to see is a pointer to a standard.

			regards, tom lane

[1] https://en.wikipedia.org/wiki/ISO_8601
Re: Date-time extraneous fields with reserved keywords
On Sat, Mar 4, 2023 at 2:48 PM Tom Lane wrote:
> Right. So really we ought to move the ValidateDate call as
> well as the next half-dozen lines about "mer" down into
> the subsequent "do additional checking" stanza. It's all
> only relevant to normal date specs.
>
> BTW, looking at the set of RESERV tokens in datetktbl[],
> it looks to me like this change renders the final "default:"
> case unreachable, so probably we could just make that an error.

Please see the attached patch with these changes.

- Joe Koshakow

From 64a71ed287aa9611c22eaa6e2cbb7e080d93be79 Mon Sep 17 00:00:00 2001
From: Joseph Koshakow
Date: Sun, 11 Dec 2022 16:08:43 -0500
Subject: [PATCH] Handle extraneous fields in date-time input

DecodeDateTime sometimes allowed extraneous fields to be included with
reserved keywords. For example `date '1995-08-06 epoch'` would be
parsed successfully, but the date was ignored. This commit fixes the
issue so an error is returned instead.
---
 src/backend/utils/adt/datetime.c          | 35 ++-
 src/test/regress/expected/date.out        | 33 +
 src/test/regress/expected/timestamp.out   | 33 +
 src/test/regress/expected/timestamptz.out | 33 +
 src/test/regress/sql/date.sql             | 10 +++
 src/test/regress/sql/timestamp.sql        | 10 +++
 src/test/regress/sql/timestamptz.sql      | 10 +++
 7 files changed, 150 insertions(+), 14 deletions(-)

diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index 01660637a2..0c1207223c 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -1431,8 +1431,15 @@ DecodeDateTime(char **field, int *ftype, int nf,
 					*tzp = 0;
 					break;
 
-				default:
+				case DTK_EPOCH:
+				case DTK_LATE:
+				case DTK_EARLY:
+					tmask = (DTK_DATE_M | DTK_TIME_M | DTK_M(TZ));
 					*dtype = val;
+					break;
+
+				default:
+					return DTERR_BAD_FORMAT;
 			}
 
 			break;
@@ -1567,22 +1574,22 @@ DecodeDateTime(char **field, int *ftype, int nf,
 		fmask |= tmask;
 	}							/* end loop over fields */
 
-	/* do final checking/adjustment of Y/M/D fields */
-	dterr = ValidateDate(fmask, isjulian, is2digits, bc, tm);
-	if (dterr)
-		return dterr;
-
-	/* handle AM/PM */
-	if (mer != HR24 && tm->tm_hour > HOURS_PER_DAY / 2)
-		return DTERR_FIELD_OVERFLOW;
-	if (mer == AM && tm->tm_hour == HOURS_PER_DAY / 2)
-		tm->tm_hour = 0;
-	else if (mer == PM && tm->tm_hour != HOURS_PER_DAY / 2)
-		tm->tm_hour += HOURS_PER_DAY / 2;
-
 	/* do additional checking for full date specs... */
 	if (*dtype == DTK_DATE)
 	{
+		/* do final checking/adjustment of Y/M/D fields */
+		dterr = ValidateDate(fmask, isjulian, is2digits, bc, tm);
+		if (dterr)
+			return dterr;
+
+		/* handle AM/PM */
+		if (mer != HR24 && tm->tm_hour > HOURS_PER_DAY / 2)
+			return DTERR_FIELD_OVERFLOW;
+		if (mer == AM && tm->tm_hour == HOURS_PER_DAY / 2)
+			tm->tm_hour = 0;
+		else if (mer == PM && tm->tm_hour != HOURS_PER_DAY / 2)
+			tm->tm_hour += HOURS_PER_DAY / 2;
+
 		if ((fmask & DTK_DATE_M) != DTK_DATE_M)
 		{
 			if ((fmask & DTK_TIME_M) == DTK_TIME_M)
diff --git a/src/test/regress/expected/date.out b/src/test/regress/expected/date.out
index f5949f3d17..c874f06546 100644
--- a/src/test/regress/expected/date.out
+++ b/src/test/regress/expected/date.out
@@ -1532,3 +1532,36 @@ select make_time(10, 55, 100.1);
 ERROR:  time field value out of range: 10:55:100.1
 select make_time(24, 0, 2.1);
 ERROR:  time field value out of range: 24:00:2.1
+-- test errors with reserved keywords
+SELECT date '1995-08-06 epoch';
+ERROR:  invalid input syntax for type date: "1995-08-06 epoch"
+LINE 1: SELECT date '1995-08-06 epoch';
+                    ^
+SELECT date '1995-08-06 infinity';
+ERROR:  invalid input syntax for type date: "1995-08-06 infinity"
+LINE 1: SELECT date '1995-08-06 infinity';
+                    ^
+SELECT date '1995-08-06 -infinity';
+ERROR:  invalid input syntax for type date: "1995-08-06 -infinity"
+LINE 1: SELECT date '1995-08-06 -infinity';
+                    ^
+SELECT date 'epoch 1995-08-06';
+ERROR:  invalid input syntax for type date: "epoch 1995-08-06"
+LINE 1: SELECT date 'epoch 1995-08-06';
+                    ^
+SELECT date 'infinity 1995-08-06';
+ERROR:  invalid input syntax for type date: "infinity 1995-08-06"
+LINE 1: SELECT date 'infinity 1995-08-06';
+                    ^
+SELECT date '-infinity 1995-08-06';
+ERROR:  invalid input syntax for type date: "-infinity 1995-08-06"
+LINE 1: SELECT date '-infinity 1995-08-06';
+                    ^
+SELECT date 'now infinity';
+ERROR:  invalid input syntax for type date: "now infinity"
+LINE 1: SELECT date 'now infinity';
+                    ^
+SELECT date '-infinity infinity';
+ERROR:  invalid input syntax for type date: "-infinity infinity"
+LINE 1: SELECT date '-infinity infinity';
+                    ^
diff --git
Re: Latches vs lwlock contention
On Sat, Jan 28, 2023 at 3:39 AM vignesh C wrote:
> On Tue, 1 Nov 2022 at 16:40, Thomas Munro wrote:
> > Here's an attempt at that. There aren't actually any cases of uses of
> > this stuff in critical sections here, so perhaps I shouldn't bother
> > with that part. The part I'd most like some feedback on is the
> > heavyweight lock bits. I'll add this to the commitfest.
>
> The patch does not apply on top of HEAD as in [1], please post a rebased
> patch:

Rebased. I dropped the CV patch for now.

From d678e740c61ed5408ad254d2ca5f8e2fff7d2cd1 Mon Sep 17 00:00:00 2001
From: Thomas Munro
Date: Tue, 1 Nov 2022 23:29:38 +1300
Subject: [PATCH v3 1/6] Allow palloc_extended(NO_OOM) in critical sections.

Commit 4a170ee9e0e banned palloc() and similar in critical sections,
because an allocation failure would produce a panic. Make an exception
for allocation with NULL on failure, for code that has a backup plan.

Discussion: https://postgr.es/m/CA%2BhUKGJHudZMG_vh6GiPB61pE%2BGgiBk5jxzd7inijqx5nEZLCw%40mail.gmail.com
---
 src/backend/utils/mmgr/mcxt.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 0b00802df7..eb18b89368 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -1123,7 +1123,8 @@ MemoryContextAllocExtended(MemoryContext context, Size size, int flags)
 	void	   *ret;
 
 	Assert(MemoryContextIsValid(context));
-	AssertNotInCriticalSection(context);
+	if ((flags & MCXT_ALLOC_NO_OOM) == 0)
+		AssertNotInCriticalSection(context);
 
 	if (!((flags & MCXT_ALLOC_HUGE) != 0 ? AllocHugeSizeIsValid(size) :
 		  AllocSizeIsValid(size)))
@@ -1278,7 +1279,8 @@ palloc_extended(Size size, int flags)
 	MemoryContext context = CurrentMemoryContext;
 
 	Assert(MemoryContextIsValid(context));
-	AssertNotInCriticalSection(context);
+	if ((flags & MCXT_ALLOC_NO_OOM) == 0)
+		AssertNotInCriticalSection(context);
 
 	if (!((flags & MCXT_ALLOC_HUGE) != 0 ? AllocHugeSizeIsValid(size) :
 		  AllocSizeIsValid(size)))
@@ -1509,7 +1511,8 @@ repalloc_extended(void *pointer, Size size, int flags)
 		  AllocSizeIsValid(size)))
 		elog(ERROR, "invalid memory alloc request size %zu", size);
 
-	AssertNotInCriticalSection(context);
+	if ((flags & MCXT_ALLOC_NO_OOM) == 0)
+		AssertNotInCriticalSection(context);
 
 	/* isReset must be false already */
 	Assert(!context->isReset);
-- 
2.39.2

From a3917505d7bd7cec7e98010d05bc016ddcd0dbac Mon Sep 17 00:00:00 2001
From: Thomas Munro
Date: Wed, 26 Oct 2022 15:51:45 +1300
Subject: [PATCH v3 2/6] Provide SetLatches() for batched deferred latches.

If we have a way to buffer a set of wakeup targets and process them at
a later time, we can:

* move SetLatch() system calls out from under LWLocks, so that locks
  can be released faster; this is especially interesting in cases where
  the target backends will immediately try to acquire the same lock, or
  generally when the lock is heavily contended

* possibly gain some micro-optimization from issuing only two memory
  barriers for the whole batch of latches, not two for each latch to be
  set

* prepare for future IPC mechanisms which might allow us to wake
  multiple backends in a single system call

Individual users of this facility will follow in separate patches.

Discussion: https://postgr.es/m/CA%2BhUKGKmO7ze0Z6WXKdrLxmvYa%3DzVGGXOO30MMktufofVwEm1A%40mail.gmail.com
---
 src/backend/storage/ipc/latch.c  | 214 ---
 src/include/storage/latch.h      | 13 ++
 src/tools/pgindent/typedefs.list | 1 +
 3 files changed, 154 insertions(+), 74 deletions(-)

diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index f4123e7de7..96dd47575a 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -591,6 +591,85 @@ WaitLatchOrSocket(Latch *latch, int wakeEvents, pgsocket sock,
 	return ret;
 }
 
+/*
+ * Set multiple latches at the same time.
+ * Note: modifies input array.
+ */
+static void
+SetLatchV(Latch **latches, int nlatches)
+{
+	/* Flush any other changes out to main memory just once. */
+	pg_memory_barrier();
+
+	/* Keep only latches that are not already set, and set them. */
+	for (int i = 0; i < nlatches; ++i)
+	{
+		Latch	   *latch = latches[i];
+
+		if (!latch->is_set)
+			latch->is_set = true;
+		else
+			latches[i] = NULL;
+	}
+
+	pg_memory_barrier();
+
+	/* Wake the remaining latches that might be sleeping. */
+	for (int i = 0; i < nlatches; ++i)
+	{
+		Latch	   *latch = latches[i];
+
+		/*
+		 * See if anyone's waiting for the latch. It can be the current
+		 * process if we're in a signal handler. We use the self-pipe or
+		 * SIGURG to ourselves to wake up WaitEventSetWaitBlock() without
+		 * races in that case. If it's another process, send a signal.
+		 *
+		 * Fetch owner_pid only once, in case the latch is concurrently
+		 * getting owned or disowned. XXX: This assumes that pid_t is atomic,
+		 * which isn't guaranteed
Re: Date-time extraneous fields with reserved keywords
Joseph Koshakow writes:
> On Sat, Mar 4, 2023 at 1:56 PM Tom Lane wrote:
>> Why do you want to skip ValidateDate in some cases? If we've not
>> had to do that before, I don't see why it's a good idea now.

> This goes back to the abstraction break of setting tmask without
> updating tm. Certain validations will check that if a field is set in
> fmask (which is an accumulation of tmask from every iteration) then
> it's value in tm is valid.

Ah. Another way could be to fill tm with something that would satisfy
ValidateDate, but that seems pretty silly.

> As far as I can tell dtype always equals DTK_DATE except when the
> timestamp/date is 'epoch', 'infinity', '-infinity', and none of the
> validations apply to those date/timestamps.

Right. So really we ought to move the ValidateDate call as well as the
next half-dozen lines about "mer" down into the subsequent "do
additional checking" stanza. It's all only relevant to normal date
specs.

BTW, looking at the set of RESERV tokens in datetktbl[], it looks to me
like this change renders the final "default:" case unreachable, so
probably we could just make that an error.

			regards, tom lane
Re: Date-time extraneous fields with reserved keywords
On Sat, Mar 4, 2023 at 1:56 PM Tom Lane wrote:
> I think we should tread very carefully about disallowing inputs that
> have been considered acceptable for 25 years. I agree with disallowing
> numeric fields along with 'epoch' and 'infinity', but for example
> this seems perfectly useful and sensible:
>
> # select timestamptz 'today 12:34';
>       timestamptz
> ------------------------
>  2023-03-04 12:34:00-05
> (1 row)

Yeah, that makes sense. I'll leave it as is with the explicit case for
'epoch', 'infinity', and '-infinity'.

> Why do you want to skip ValidateDate in some cases? If we've not
> had to do that before, I don't see why it's a good idea now.

This goes back to the abstraction break of setting tmask without
updating tm. Certain validations will check that if a field is set in
fmask (which is an accumulation of tmask from every iteration) then its
value in tm is valid. For example:

    if (fmask & DTK_M(YEAR))
    {
        // ...
        else
        {
            /* there is no year zero in AD/BC notation */
            if (tm->tm_year <= 0)
                return DTERR_FIELD_OVERFLOW;
        }
    }

As far as I can tell dtype always equals DTK_DATE except when the
timestamp/date is 'epoch', 'infinity', '-infinity', and none of the
validations apply to those date/timestamps.

Though, I think you're right this is probably not a good idea. I'll try
and brainstorm a different approach, unless you have some ideas.
Re: Add standard collation UNICODE
On Sun, Mar 5, 2023 at 7:30 AM Jeff Davis wrote:
> Sorting by codepoint should be encoding-independent (i.e. decode to
> codepoint first); but the C collation is just strcmp, which is
> encoding-dependent. So is UCS_BASIC wrong today?

It's created for UTF-8 only, and UTF-8 sorts the same way as the
encoded code points, when interpreted as a sequence of unsigned char
by memcmp(), strcmp() etc. Seems right?
Re: [Proposal] Allow pg_dump to include all child tables with the root table
Gilles Darold writes: > But I disagree the use of --table-with-childs and > --exclude-table-with-childs because we already have the --table and > --exclude-table, and it will add lot of code where we just need a switch > to include children tables. I quite dislike the idea of a separate --with-whatever switch, because it will (presumably) apply to all of your --table and --exclude-table switches, where you may need it to apply to just some of them. Spelling considerations aside, attaching the property to the individual switches seems far superior. And I neither believe that this would add a lot of code, nor accept that as an excuse even if it's true. As noted, "childs" is bad English and "partitions" is flat out wrong (unless you change it to recurse only to partitions, which doesn't seem like a better definition). We could go with --[exclude-]table-and-children, or maybe --[exclude-]table-and-child-tables, but those are getting into carpal-tunnel-syndrome-inducing territory :-(. I lack a better naming suggestion offhand. regards, tom lane
Re: Date-time extraneous fields with reserved keywords
Joseph Koshakow writes:
> - I'm not sure if we should hard code in those three specific
> reserved keywords or set tmask in the default case.

I think we should tread very carefully about disallowing inputs that
have been considered acceptable for 25 years. I agree with disallowing
numeric fields along with 'epoch' and 'infinity', but for example this
seems perfectly useful and sensible:

# select timestamptz 'today 12:34';
      timestamptz
------------------------
 2023-03-04 12:34:00-05
(1 row)

> Any thoughts?

Why do you want to skip ValidateDate in some cases? If we've not had
to do that before, I don't see why it's a good idea now.

regards, tom lane
Re: Add standard collation UNICODE
On Wed, 2023-03-01 at 11:09 +0100, Peter Eisentraut wrote:
> When collation support was added to PostgreSQL, we added UCS_BASIC,
> since that could easily be mapped to the C locale.

Sorting by codepoint should be encoding-independent (i.e. decode to
codepoint first); but the C collation is just strcmp, which is
encoding-dependent. So is UCS_BASIC wrong today?

(Aside: I wonder whether we should differentiate between the libc
provider, which uses strcoll(), and the provider of non-localized
comparisons that just use strcmp(). That would be a better reflection
of what the code actually does.)

> With ICU support, we can provide the UNICODE collation, since it's
> just the root locale.

+1

> I suppose one hesitation was that ICU was not a standard feature, so
> this would create variations in the default catalog contents, or
> something like that.

It looks like the way you've handled this is by inserting the collation
with collprovider=icu even if built without ICU support. I think that's
a new case, so we need to make sure it throws reasonable user-facing
errors. I do like your approach though because, if someone is using a
standard collation, I think "not built with ICU" (feature not
supported) is a better error than "collation doesn't exist". It also
effectively reserves the name "unicode".

-- 
Jeff Davis
PostgreSQL Contributor Team - AWS
Re: Missing free_var() at end of accum_sum_final()?
On Fri, Mar 3, 2023, at 16:11, Dean Rasheed wrote: > So IMO the results just don't justify such an extensive set of > changes, and I think we should abandon this fixed buffer approach. I agree. I was hoping it would be possible to reduce the invasiveness, but I think it's difficult and probably not worth it. > Having said that, I think the results from the COPY test are worth > looking at more closely. Your results seem to suggest around a 5% > improvement. On my machine it was only around 3%, but that still might > be worth having, if it didn't involve such invasive changes throughout > the rest of the code. > > As an experiment, I took another look at my earlier patch, making > make_result() construct the result using the same allocated memory as > the variable's digit buffer (if allocated). That eliminates around a > third of all free_var() calls from numeric.c, and for most functions, > it saves both a palloc() and a pfree(). In the case of numeric_in(), I > realised that it's possible to go further, by reusing the decimal > digits buffer for the NumericVar's digits, and then later for the > final Numeric result. Also, by carefully aligning things, it's > possible to arrange it so that the final make_result() doesn't need to > copy/move the digits at all. With that I get something closer to a 15% > improvement in the COPY test, which is definitely worth having. Nice! Patch LGTM. > I didn't do all the tests that you did though, so it would be > interesting to see how it fares in those. 
SELECT count(*) FROM generate_series(1::numeric, 1000::numeric);
Time: 1196.801 ms (00:01.197) -- HEAD
Time: 1278.376 ms (00:01.278) -- make-result-using-vars-buf-v2.patch

TRUNCATE foo;
COPY foo FROM '/tmp/random-numerics.csv';
Time: 8450.551 ms (00:08.451) -- HEAD
Time: 7176.838 ms (00:07.177) -- make-result-using-vars-buf-v2.patch

SELECT SUM(n1+n2+n3+n4) FROM t;
Time: 643.961 ms -- HEAD
Time: 620.303 ms -- make-result-using-vars-buf-v2.patch

SELECT max(a + b + '17'::numeric + c) FROM
(SELECT generate_series(1::numeric, 1000::numeric)) aa(a),
(SELECT generate_series(1::numeric, 100::numeric)) bb(b),
(SELECT generate_series(1::numeric, 10::numeric)) cc(c);
Time: 141.070 ms -- HEAD
Time: 139.562 ms -- make-result-using-vars-buf-v2.patch

SELECT SUM(amount*1.25 + 0.5) FROM ledger;
Time: 933.461 ms -- HEAD
Time: 862.619 ms -- make-result-using-vars-buf-v2.patch

Looks like a win in all cases except the first one. Great work!

/Joel
Re: Date-time extraneous fields with reserved keywords
Attached is the described patch. I have two notes after implementing it:

- It feels like a bit of an abstraction break to set tmask without
  actually setting any fields in tm.
- I'm not sure if we should hard code in those three specific reserved
  keywords or set tmask in the default case.

Any thoughts?

- Joe Koshakow

From 78d8f39db8df68502369ffd9edd6f6e38f4dadb8 Mon Sep 17 00:00:00 2001
From: Joseph Koshakow
Date: Sun, 11 Dec 2022 16:08:43 -0500
Subject: [PATCH] Handle extraneous fields in date-time input

DecodeDateTime sometimes allowed extraneous fields to be included with
reserved keywords. For example `date '1995-08-06 epoch'` would be
parsed successfully, but the date was ignored. This commit fixes the
issue so an error is returned instead.
---
 src/backend/utils/adt/datetime.c          | 18 ++---
 src/test/regress/expected/date.out        | 33 +++
 src/test/regress/expected/timestamp.out   | 33 +++
 src/test/regress/expected/timestamptz.out | 33 +++
 src/test/regress/sql/date.sql             | 10 +++
 src/test/regress/sql/timestamp.sql        | 10 +++
 src/test/regress/sql/timestamptz.sql      | 10 +++
 7 files changed, 143 insertions(+), 4 deletions(-)

diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index 01660637a2..6f82465fd1 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -1431,6 +1431,13 @@ DecodeDateTime(char **field, int *ftype, int nf,
 				*tzp = 0;
 				break;
 
+			case DTK_EPOCH:
+			case DTK_LATE:
+			case DTK_EARLY:
+				tmask = (DTK_DATE_M | DTK_TIME_M | DTK_M(TZ));
+				*dtype = val;
+				break;
+
 			default:
 				*dtype = val;
 			}
@@ -1567,10 +1574,13 @@ DecodeDateTime(char **field, int *ftype, int nf,
 		fmask |= tmask;
 	}							/* end loop over fields */
 
-	/* do final checking/adjustment of Y/M/D fields */
-	dterr = ValidateDate(fmask, isjulian, is2digits, bc, tm);
-	if (dterr)
-		return dterr;
+	if (*dtype == DTK_DATE)
+	{
+		/* do final checking/adjustment of Y/M/D fields */
+		dterr = ValidateDate(fmask, isjulian, is2digits, bc, tm);
+		if (dterr)
+			return dterr;
+	}
 
 	/* handle AM/PM */
 	if (mer != HR24 && tm->tm_hour > HOURS_PER_DAY / 2)
diff --git a/src/test/regress/expected/date.out b/src/test/regress/expected/date.out
index f5949f3d17..c874f06546 100644
--- a/src/test/regress/expected/date.out
+++ b/src/test/regress/expected/date.out
@@ -1532,3 +1532,36 @@ select make_time(10, 55, 100.1);
 ERROR:  time field value out of range: 10:55:100.1
 select make_time(24, 0, 2.1);
 ERROR:  time field value out of range: 24:00:2.1
+-- test errors with reserved keywords
+SELECT date '1995-08-06 epoch';
+ERROR:  invalid input syntax for type date: "1995-08-06 epoch"
+LINE 1: SELECT date '1995-08-06 epoch';
+                    ^
+SELECT date '1995-08-06 infinity';
+ERROR:  invalid input syntax for type date: "1995-08-06 infinity"
+LINE 1: SELECT date '1995-08-06 infinity';
+                    ^
+SELECT date '1995-08-06 -infinity';
+ERROR:  invalid input syntax for type date: "1995-08-06 -infinity"
+LINE 1: SELECT date '1995-08-06 -infinity';
+                    ^
+SELECT date 'epoch 1995-08-06';
+ERROR:  invalid input syntax for type date: "epoch 1995-08-06"
+LINE 1: SELECT date 'epoch 1995-08-06';
+                    ^
+SELECT date 'infinity 1995-08-06';
+ERROR:  invalid input syntax for type date: "infinity 1995-08-06"
+LINE 1: SELECT date 'infinity 1995-08-06';
+                    ^
+SELECT date '-infinity 1995-08-06';
+ERROR:  invalid input syntax for type date: "-infinity 1995-08-06"
+LINE 1: SELECT date '-infinity 1995-08-06';
+                    ^
+SELECT date 'now infinity';
+ERROR:  invalid input syntax for type date: "now infinity"
+LINE 1: SELECT date 'now infinity';
+                    ^
+SELECT date '-infinity infinity';
+ERROR:  invalid input syntax for type date: "-infinity infinity"
+LINE 1: SELECT date '-infinity infinity';
+                    ^
diff --git a/src/test/regress/expected/timestamp.out b/src/test/regress/expected/timestamp.out
index c64bcb7c12..c2159c2cec 100644
--- a/src/test/regress/expected/timestamp.out
+++ b/src/test/regress/expected/timestamp.out
@@ -2125,3 +2125,36 @@ select * from generate_series('2020-01-01 00:00'::timestamp,
   '2020-01-02 03:00'::timestamp,
   '0 hour'::interval);
 ERROR:  step size cannot equal zero
+-- test errors with reserved keywords
+SELECT timestamp '1995-08-06 01:01:01 epoch';
+ERROR:  invalid input syntax for type timestamp: "1995-08-06 01:01:01 epoch"
+LINE 1: SELECT timestamp '1995-08-06 01:01:01 epoch';
+                         ^
+SELECT timestamp '1995-08-06 01:01:01 infinity';
+ERROR:  invalid input syntax for type timestamp: "1995-08-06 01:01:01 infinity"
+LINE 1: SELECT timestamp '1995-08-06 01:01:01 infinity';
+
Re: libpq-fe.h should compile *entirely* standalone
Andrew Dunstan writes: > On 2023-03-03 Fr 13:46, Tom Lane wrote: >> This is actually moving the inclusion-check goalposts quite far, >> but HEAD seems to pass cleanly, and again we can always adjust later. >> Any objections? > LGTM Pushed, thanks for looking. regards, tom lane
Re: zstd compression for pg_dump
On Fri, Mar 03, 2023 at 01:38:05PM -0800, Jacob Champion wrote:
> > > With this particular dataset, I don't see much improvement with
> > > zstd:long.
> >
> > Yeah. I think this could be because either 1) you already got very
> > good compression without looking at more data; and/or 2) the
> > neighboring data is already very similar, maybe equally or more
> > similar, than the further data, from which there's nothing to gain.
>
> What kinds of improvements do you see with your setup? I'm wondering
> when we would suggest that people use it.

On customer data, I see small improvements - below 10%. But on my
first two tries, I made synthetic data sets where it's a lot:

$ ./src/bin/pg_dump/pg_dump -d pryzbyj -Fp -Z zstd:long |wc -c
286107
$ ./src/bin/pg_dump/pg_dump -d pryzbyj -Fp -Z zstd:long=0 |wc -c
1709695

That's just 6 identical tables like:

pryzbyj=# CREATE TABLE t1 AS SELECT generate_series(1,99);

In this case, "custom" format doesn't see that benefit, because the
greatest similarity is across tables, which don't share compressor
state. But I think the note that I wrote in the docs about that should
be removed - custom format could see a big benefit, as long as the
table is big enough, and there's more similarity/repetition at longer
distances.

Here's one where custom format *does* benefit, due to long-distance
repetition within a single table. The data is contrived, but the
schema of ID => data is not. What's notable isn't how compressible the
data is, but how much *more* compressible it is with long-distance
matching.

pryzbyj=# CREATE TABLE t1 AS SELECT i,array_agg(j) FROM generate_series(1,444)i,generate_series(1,9)j GROUP BY 1;

$ ./src/bin/pg_dump/pg_dump -d pryzbyj -Fc -Z zstd:long=1 |wc -c
82023
$ ./src/bin/pg_dump/pg_dump -d pryzbyj -Fc -Z zstd:long=0 |wc -c
1048267

-- 
Justin
Re: Request for comment on setting binary format output per session
On Thu, 2023-03-02 at 09:13 -0500, Dave Cramer wrote: > I'd like to open up this discussion again so that we can > move forward. I prefer the GUC as it is relatively simple and as > Peter mentioned it works, but I'm not married to the idea. It's not very friendly to extensions, where the types are not guaranteed to have stable OIDs. Did you consider any proposals that work with type names? Regards, Jeff Davis
Re: Date-time extraneous fields with reserved keywords
On Sat, Mar 4, 2023 at 11:23 AM Keisuke Kuroda wrote:
>Good catch.
>Of the reserved words that are special values of type Date/Time,
>'now', 'today', 'tomorrow', 'yesterday', and 'allballs',
>I get an error even before applying the patch.

Thanks for pointing this out. After taking a look at the code, 'now',
'today', 'tomorrow', 'yesterday', and 'allballs' all set the
appropriate tmask field, which is what causes them to error.

    case DTK_NOW:
        tmask = (DTK_DATE_M | DTK_TIME_M | DTK_M(TZ));
    case DTK_YESTERDAY:
        tmask = DTK_DATE_M;
    case DTK_TODAY:
        tmask = DTK_DATE_M;
    case DTK_TOMORROW:
        tmask = DTK_DATE_M;
    case DTK_ZULU:
        tmask = (DTK_TIME_M | DTK_M(TZ));

while 'epoch', 'infinity', and '-infinity' do not set tmask (note the
default below handles all of these fields)

    default:
        *dtype = val;

So I think a better fix here would be to also set tmask for those
three reserved keywords.

>One thing I noticed is that the following SQL
>returns normal results even after applying the patch.
>
>postgres=# select timestamp 'epoch 01:01:01';
>      timestamp
>---------------------
> 1970-01-01 00:00:00
>(1 row)

>When 'epoch','infinity','-infinity' and time are specified together,
>the time specified in the SQL is not included in result.
>I think it might be better to assume that this pattern is also an error.
>What do you think?

I agree this pattern should also be an error. I think that the tmask
approach will cause an error for this pattern as well.

Thanks,
Joe Koshakow
Re: [HACKERS] make async slave to wait for lsn to be replayed
Here is a new patch for the feature discussed above.

WAIT FOR procedure - waits for a certain lsn while on pause

== Synopsis ==
SELECT pg_wait_lsn('LSN', timeout) returns boolean

Where timeout = 0, it will wait indefinitely without a timeout.
And if timeout = 1, then it just checks whether the lsn was replayed.

== How to use it ==

Greg Stark wrote:
> That said, I'm not a fan of the specific function names. Remember that
> we have polymorphic functions so you could probably just have an
> option argument:

If you have any examples, I would be glad to see them. My searches
have not been fruitful.

Michael Paquier wrote:
> While looking at all the patches proposed, I have noticed that all
> the approaches proposed force a wakeup of the waiters in the redo
> loop of the startup process for each record, before reading the next
> record. It strikes me that there is some interaction with custom
> resource managers here, where it is possible to poke at the waiters
> not for each record, but after reading some specific records.
> Something out-of-core would not be as responsive as the per-record
> approach, still responsive enough that the waiters wait on input for
> an acceptable amount of time, depending on the frequency of the
> records generated by a primary to wake them up. Just something that
> popped into my mind while looking a bit at the threads.

I'll work on this idea to have less impact on the redo system.

On 2023-03-02 13:33, Peter Eisentraut wrote:
> But I wonder how a client is going to get the LSN. How would all of
> this be used by a client?

As I wrote earlier, the main purpose of the feature is to achieve
read-your-writes consistency, while using an async replica for reads
and the primary for writes. In that case the lsn of the last
modification is stored inside the application.

> I'm tempted to think this could be a protocol-layer facility. Every
> query automatically returns the current LSN, and every query can also
> send along an LSN to wait for, and the client library would just keep
> track of the LSN for (what it thinks of as) the connection. So you
> get some automatic serialization without having to modify your client
> code.

Yes, it sounds very tempting. But I think the community will be
against it.

-- 
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index dbe9394762..422bb1ed82 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -43,6 +43,7 @@
 #include "backup/basebackup.h"
 #include "catalog/pg_control.h"
 #include "commands/tablespace.h"
+#include "commands/wait.h"
 #include "common/file_utils.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -1752,6 +1753,15 @@ PerformWalRecovery(void)
 				break;
 			}
 
+			/*
+			 * If we replayed an LSN that someone was waiting for,
+			 * set latches in shared memory array to notify the waiter.
+			 */
+			if (XLogRecoveryCtl->lastReplayedEndRecPtr >= GetMinWaitedLSN())
+			{
+				WaitSetLatch(XLogRecoveryCtl->lastReplayedEndRecPtr);
+			}
+
 			/* Else, try to fetch the next WAL record */
 			record = ReadRecord(xlogprefetcher, LOG, false, replayTLI);
 		} while (record != NULL);
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index 48f7348f91..d8f6965d8c 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -61,6 +61,7 @@ OBJS = \
 	vacuum.o \
 	vacuumparallel.o \
 	variable.o \
-	view.o
+	view.o \
+	wait.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/wait.c b/src/backend/commands/wait.c
new file mode 100644
index 00..3465673f0a
--- /dev/null
+++ b/src/backend/commands/wait.c
@@ -0,0 +1,275 @@
+/*-------------------------------------------------------------------------
+ *
+ * wait.c
+ *	  Implements wait lsn, which allows waiting for events such as
+ *	  LSN having been replayed on replica.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2020, Regents of PostgresPro
+ *
+ * IDENTIFICATION
+ *	  src/backend/commands/wait.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include
+
+#include "access/xact.h"
+#include "access/xlogrecovery.h"
+#include "access/xlogdefs.h"
+#include "commands/wait.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "storage/backendid.h"
+#include "storage/pmsignal.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "storage/sinvaladt.h"
+#include "storage/spin.h"
+#include "utils/builtins.h"
+#include "utils/pg_lsn.h"
+#include "utils/timestamp.h"
+
+/* Add to shared memory array */
+static void AddWaitedLSN(XLogRecPtr lsn_to_wait);
+
+/* Shared memory structure */
+typedef struct
+{
+	int			backend_maxid;
+	pg_atomic_uint64 min_lsn;	/* XLogRecPtr of minimal waited for LSN */
+	slock_t		mutex;
+	/* LSNs that
Re: libpq-fe.h should compile *entirely* standalone
On 2023-03-03 Fr 13:46, Tom Lane wrote:
> I wrote:
>> We can easily do better, as attached, but I wonder which other
>> headers should get the same treatment.
>
> After a bit of further research I propose the attached. I'm not sure
> exactly what subset of ECPG headers is meant to be exposed to
> clients, but we can adjust these patterns if new info emerges.
>
> This is actually moving the inclusion-check goalposts quite far, but
> HEAD seems to pass cleanly, and again we can always adjust later.
> Any objections?

LGTM

cheers

andrew

-- 
Andrew Dunstan
EDB: https://www.enterprisedb.com
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
On Fri, Mar 3, 2023 at 1:09 PM Önder Kalacı wrote:
>>
>> 5.
>>
>> + /*
>> + * If index scans are disabled, use a sequential scan.
>> + *
>> + * Note that we do not use index scans below when enable_indexscan is
>> + * false. Allowing primary key or replica identity even when index scan is
>> + * disabled is the legacy behaviour. So we hesitate to move the below
>> + * enable_indexscan check to be done earlier in this function.
>> + */
>> + if (!enable_indexscan)
>> + return InvalidOid;
>>
>> Since the document of enable_indexscan says "Enables or disables the query
>> planner's use of index-scan plan types. The default is on.", and we don't use
>> planner here, so I am not sure should we allow/disallow index scan in apply
>> worker based on this GUC.
>
> Given Amit's suggestion on [1], I'm planning to drop this check
> altogether, and rely on table storage parameters.

This still seems to be present in the latest version. I think we can
just remove this and then add the additional check as suggested by you
as part of the second patch.

Few other comments on latest version:
====================================
1.
+/*
+ * Returns true if the index is usable for replica identity full. For details,
+ * see FindUsableIndexForReplicaIdentityFull.
+ */
+bool
+IsIndexUsableForReplicaIdentityFull(IndexInfo *indexInfo)
+{
+	bool		is_btree = (indexInfo->ii_Am == BTREE_AM_OID);
+	bool		is_partial = (indexInfo->ii_Predicate != NIL);
+	bool		is_only_on_expression = IsIndexOnlyOnExpression(indexInfo);
+
+	if (is_btree && !is_partial && !is_only_on_expression)
+	{
+		return true;
...
...
+/*
+ * Returns the oid of an index that can be used via the apply worker. The index
+ * should be btree, non-partial and have at least one column reference (e.g.,
+ * should not consist of only expressions). The limitations arise from
+ * RelationFindReplTupleByIndex(), which is designed to handle PK/RI and these
+ * limitations are inherent to PK/RI.

By these two, the patch infers that it picks an index that adheres to
the limitations of PK/RI. Apart from unique, the other properties of
RI are "not partial, not deferrable, and include only columns marked
NOT NULL". See ATExecReplicaIdentity() for corresponding checks. We
don't try to ensure the last two from the list. It is fine to do so if
we document the reasons for the same in comments or we can even try to
enforce the remaining restrictions as well. For example, it should be
okay to allow NULL column values because we anyway compare the entire
tuple after getting the value from the index.

2.
+	{
+		/*
+		 * This attribute is an expression, and
+		 * FindUsableIndexForReplicaIdentityFull() was called earlier
+		 * when the index for subscriber was selected. There, the indexes
+		 * comprising *only* expressions have already been eliminated.
+		 *
+		 * Also, because PK/RI can't include expressions we
+		 * sanity check the index is neither of those kinds.
+		 */
+		Assert(!IdxIsRelationIdentityOrPK(rel, idxrel->rd_id));

This comment doesn't make much sense after you have moved the
corresponding Assert in RelationFindReplTupleByIndex(). Either we
should move or remove this Assert as well or at least update the
comments to reflect the latest code.

3.
When FindLogicalRepUsableIndex() is invoked from
logicalrep_partition_open(), the current memory context would be
LogicalRepPartMapContext which would be a long-lived context and we
allocate memory for indexes in FindLogicalRepUsableIndex() which can
accumulate over a period of time. So, I think it would be better to
switch to the old context in logicalrep_partition_open() before
invoking FindLogicalRepUsableIndex() provided that is not a long-lived
context.

-- 
With Regards,
Amit Kapila.
Re: Fix comments in gistxlogDelete, xl_heap_freeze_page and xl_btree_delete
Hi,

On 3/3/23 6:53 PM, Robert Haas wrote:
> On Fri, Mar 3, 2023 at 11:28 AM Drouvot, Bertrand wrote:
>> Thanks for having looked at it! +1.
>
> Committed.

Thanks!

Not a big deal, but the commit message that has been used is not 100%
accurate. Indeed, for gistxlogDelete, it's the other way around (as
compared to what the commit message says).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com