Re: Refactoring SysCacheGetAttr to know when attr cannot be NULL
Justin Pryzby writes:
> On Mon, Mar 13, 2023 at 02:19:07PM +0100, Daniel Gustafsson wrote:
>> +			elog(ERROR,
>> +				 "unexpected NULL value in cached tuple for pg_catalog.%s.%s",
>> +				 get_rel_name(cacheinfo[cacheId].reloid),

> Question: Is it safe to be making catalog access inside an error
> handler, when one of the most likely reasons for hitting the error is
> catalog corruption?

I don't see a big problem here.  If there were catalog corruption
preventing fetching the catalog's pg_class row, it's highly unlikely
that you'd have managed to retrieve a catalog row to complain about.
(That is, corruption in this particular catalog entry probably does not
extend to the metadata about the catalog containing it.)

> Maybe the answer is that it's not "safe" but "safe enough"

Right.  If we got concerned about this we could dodge the extra catalog
access by adding the catalog's name to CatCache entries.  I doubt it's
worth it though.  We can always re-evaluate if we see actual evidence
of problems.

			regards, tom lane
Re: Refactoring SysCacheGetAttr to know when attr cannot be NULL
On Mon, Mar 13, 2023 at 02:19:07PM +0100, Daniel Gustafsson wrote:
> > On 2 Mar 2023, at 15:44, Tom Lane wrote:
> >
> > Peter Eisentraut writes:
> >> I think an error message like
> >> "unexpected null value in system cache %d column %d"
> >> is sufficient.  Since these are "can't happen" errors, we don't need to
> >> spend too much extra effort to make it prettier.
> >
> > I'd at least like to see it give the catalog's OID.  That's easily
> > convertible to a name, and it doesn't tend to move around across PG
> > versions, neither of which are true for syscache IDs.
> >
> > Also, I'm fairly unconvinced that it's a "can't happen" --- this
> > would be very likely to fire as a result of catalog corruption,
> > so it would be good if it's at least minimally interpretable
> > by a non-expert.  Given that we'll now have just one copy of the
> > code, ISTM there's a good case for doing the small extra work
> > to report catalog and column by name.
>
> Rebased v3 on top of recent conflicting ICU changes causing the patch
> to not apply anymore.  Also took another look around the tree to see if
> there were missed callsites but found none new.

+++ b/src/backend/utils/cache/syscache.c
@@ -77,6 +77,7 @@
 #include "catalog/pg_user_mapping.h"
 #include "lib/qunique.h"
 #include "utils/catcache.h"
+#include "utils/lsyscache.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
@@ -1099,6 +1100,32 @@ SysCacheGetAttr(int cacheId, HeapTuple tup,
+		elog(ERROR,
+			 "unexpected NULL value in cached tuple for pg_catalog.%s.%s",
+			 get_rel_name(cacheinfo[cacheId].reloid),

Question: Is it safe to be making catalog access inside an error
handler, when one of the most likely reasons for hitting the error is
catalog corruption?

Maybe the answer is that it's not "safe" but "safe enough" - IOW, if
you're willing to throw an assertion, it's good enough to try to show
the table name, and if the error report crashes the server, that's "not
much worse" than having Asserted().

--
Justin
Re: Request for comment on setting binary format output per session
Dave Cramer

On Sat, 25 Mar 2023 at 19:06, Jeff Davis wrote:
> On Thu, 2023-03-23 at 15:37 -0400, Dave Cramer wrote:
> > So where do we go from here?
> >
> > I can implement using type names as well as OIDs
>
> My current thought is that you should use the protocol extension and
> make type names work (in addition to OIDs) at least for fully-qualified
> type names.  I don't really like the GUC -- perhaps I could be convinced
> it's OK, but until we find a problem with protocol extensions, it looks
> like a protocol extension is the way to go here.

Well, as I said, if I use any external pool that shares connections with
multiple clients, some of which may not know how to decode binary data,
then we have to have a way to set the binary format after the connection
is established.  I did float the idea of using the cancel key as a
unique identifier which, passed with the parameter, would allow setting
the parameter after connection establishment.

> I like Peter's idea for some kind of global identifier, but we can do
> that independently at a later time.
>
> If search path works fine and we're all happy with it, we could also
> support unqualified type names.  It feels slightly off to me to use
> search_path for something like that, though.
>
> There's still the problem about the connection pools.  pgbouncer could
> consider the binary formats to be an important parameter like the
> database name, where the connection pooler would not mingle connections
> with different settings for binary_formats.  That would work, but it
> would be weird because if a new version of a driver adds new binary
> format support, it could cause worse connection pooling performance
> until all the other drivers also support that binary format.  I'm not
> sure if that's a major problem or not.
>
> Another idea would be for the connection pooler to also have a
> binary_formats config, and it would do some checking to make sure all
> connecting clients understand some minimal set of binary formats, so
> that it could still mingle the connections.  Either way, I think this
> is solvable by the connection pooler.

Well, that means that connection poolers all have to be fixed.  There
are more than just pgbouncer.  It seems rather harsh that a new feature
breaks a connection pooler or makes the pooler unusable.

Dave
Re: Request for comment on setting binary format output per session
On Thu, 2023-03-23 at 15:37 -0400, Dave Cramer wrote:
> So where do we go from here?
>
> I can implement using type names as well as OIDs

My current thought is that you should use the protocol extension and
make type names work (in addition to OIDs) at least for fully-qualified
type names.  I don't really like the GUC -- perhaps I could be convinced
it's OK, but until we find a problem with protocol extensions, it looks
like a protocol extension is the way to go here.

I like Peter's idea for some kind of global identifier, but we can do
that independently at a later time.

If search path works fine and we're all happy with it, we could also
support unqualified type names.  It feels slightly off to me to use
search_path for something like that, though.

There's still the problem about the connection pools.  pgbouncer could
consider the binary formats to be an important parameter like the
database name, where the connection pooler would not mingle connections
with different settings for binary_formats.  That would work, but it
would be weird because if a new version of a driver adds new binary
format support, it could cause worse connection pooling performance
until all the other drivers also support that binary format.  I'm not
sure if that's a major problem or not.  Another idea would be for the
connection pooler to also have a binary_formats config, and it would do
some checking to make sure all connecting clients understand some
minimal set of binary formats, so that it could still mingle the
connections.  Either way, I think this is solvable by the connection
pooler.

Regards,
Jeff Davis
Re: Request for comment on setting binary format output per session
On Fri, 2023-03-24 at 07:52 -0500, Merlin Moncure wrote:
> I think so.  Dave's idea puts a lot of flexibility into the client
> side, and that's good.  Search path mechanics are really well
> understood and well integrated with extensions already (create
> extension ..schema) assuming that the precise time UDT are looked up
> in an unqualified way is very clear to- or invoked via- the client
> code.  I'll say it again though; OIDs really ought to be considered a
> transient cache of type information rather than a permanent
> identifier.

I'm not clear on what proposal you are making and/or endorsing?

> Regarding UDT, lots of common and useful scenarios (containers, enum,
> range, etc), do not require special knowledge to parse beyond the
> kind of type it is.  Automatic type creation from tables is one of
> the most genius things about postgres and directly wiring client
> structures to them through binary is really nifty.

Perhaps not special knowledge, but you need to know the structure.  If
you have a query like "SELECT '...'::sometable", you still need to know
the structure of sometable to parse it.

> This undermines the case that binary parsing requires special
> knowledge IMO, UDT might in fact be the main long term target...I
> could see scenarios where java classes might be glued directly to
> postgres tables by the driver...this would be a lot more efficient
> than using json which is how everyone does it today.  Then again,
> maybe *I* might be overthinking this.

Wouldn't that only work if someone is doing a "SELECT *"?

Regards,
Jeff Davis
Re: Refactoring SysCacheGetAttr to know when attr cannot be NULL
> On 14 Mar 2023, at 08:00, Peter Eisentraut wrote:
> I prefer to use "null value" for SQL null values, and NULL for the C
> symbol.

That's a fair point, I agree with that.

> I'm a bit hesitant about hardcoding pg_catalog here.  That happens to
> be true, of course, but isn't actually enforced, I think.  I think that
> could be left off.  It's not like people will be confused about which
> schema "pg_class.relname" is in.
>
> Also, the cached tuple isn't really for the attribute, so maybe split
> that up a bit, like
>
> "unexpected null value in cached tuple for catalog %s column %s"

No objections, so changed to that wording.  With these changes and a
pgindent run across it per David's comment downthread, I've pushed this
now.  Thanks for the review!

I'm keeping a watchful eye on the buildfarm; francolin has errored in
recoveryCheck, which I'm looking into, but at first glance I don't think
it's related (other animals have since passed it and it works locally,
but I'll keep digging at it to make sure).

--
Daniel Gustafsson
Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode
On Tue, Mar 21, 2023 at 6:03 AM Ants Aasma wrote:
> On Mon, 20 Mar 2023 at 00:59, Melanie Plageman wrote:
> > On Wed, Mar 15, 2023 at 6:46 AM Ants Aasma wrote:
> > > On Wed, 15 Mar 2023 at 02:29, Melanie Plageman wrote:
> > > > As for routine vacuuming and the other buffer access strategies, I think
> > > > there is an argument for configurability based on operator knowledge --
> > > > perhaps your workload will use the data you are COPYing as soon as the
> > > > COPY finishes, so you might as well disable a buffer access strategy or
> > > > use a larger fraction of shared buffers.  Also, the ring sizes were
> > > > selected sixteen years ago and average server memory and data set sizes
> > > > have changed.
> > >
> > > To be clear I'm not at all arguing against configurability.  I was
> > > thinking that dynamic use could make the configuration simpler by self
> > > tuning to use no more buffers than is useful.
> >
> > Yes, but I am struggling with how we would define "useful".
>
> For copy and vacuum, the only reason I can see for keeping visited
> buffers around is to avoid flushing WAL or at least doing it in larger
> batches.  Once the ring is big enough that WAL doesn't need to be
> flushed on eviction, making it bigger only wastes space that could be
> used by something that is not going to be evicted soon.

Well, I think if you know you will use the data you are COPYing right
away in your normal workload, it could be useful to have the ring be
large or to disable use of the ring.  And, for vacuum, if you need to
get it done as quickly as possible, again, it could be useful to have
the ring be large or disable use of the ring.

> > > > StrategyRejectBuffer() will allow bulkreads to, as you say, use more
> > > > buffers than the original ring size, since it allows them to kick
> > > > dirty buffers out of the ring and claim new shared buffers.
> > > >
> > > > Bulkwrites and vacuums, however, will inevitably dirty buffers and
> > > > require flushing the buffer (and thus flushing the associated WAL) when
> > > > reusing them.  Bulkwrites and vacuum do not kick dirtied buffers out of
> > > > the ring, since dirtying buffers is their common case.  A dynamic
> > > > resizing like the one you suggest would likely devolve to vacuum and
> > > > bulkwrite strategies always using the max size.
> > >
> > > I think it should self stabilize around the point where the WAL is
> > > either flushed by other commit activity, WAL writer or WAL buffers
> > > filling up.  Writing out their own dirtied buffers will still happen,
> > > just the associated WAL flushes will be in larger chunks and possibly
> > > done by other processes.
> >
> > They will have to write out any WAL associated with modifications to
> > the dirty buffer before flushing it, so I'm not sure I understand how
> > this would work.
>
> By the time the dirty buffer needs eviction the WAL associated with it
> can already be written out by concurrent commits, WAL writer or by WAL
> buffers filling up.  The bigger the ring is, the higher the chance that
> one of these will happen before we loop around.

Ah, I think I understand the idea now.

So, I think it is an interesting idea to try and find the goldilocks
size for the ring buffer.  It is especially interesting to me in the
case in which we are enlarging the ring.  However, given concurrent
workload variability, machine I/O latency fluctuations, etc, we will
definitely have to set a max value of some kind anyway for the ring
size.  So, this seems more like a complementary feature to
vacuum_buffer_usage_limit.  If we added some kind of adaptive sizing in
a later version, we could emphasize in the guidance for setting
vacuum_buffer_usage_limit that it is the *maximum* size you would like
to allow vacuum to use.  And, of course, there are the other operations
which use buffer access strategies.

> > > > As for decreasing the ring size, buffers are only "added" to the ring
> > > > lazily and, technically, as it is now, buffers which have been added
> > > > to the ring can always be reclaimed by the clocksweep (as long as
> > > > they are not pinned).  The buffer access strategy is more of a
> > > > self-imposed restriction than it is a reservation.  Since the ring is
> > > > small and the buffers are being frequently reused, odds are the usage
> > > > count will be 1 and we will be the one who set it to 1, but there is no
> > > > guarantee.  If, when attempting to reuse the buffer, its usage count is
> > > > > 1 (or it is pinned), we also will kick it out of the ring and go look
> > > > for a replacement buffer.
> > >
> > > Right, but while the buffer is actively used by the ring it is
> > > unlikely that clocksweep will find it at usage 0 as the ring buffer
> > > should cycle more often than the clocksweep.  Whereas if the ring
> > > stops using a buffer, clocksweep will eventually come and reclaim it.
> > > And if the ring shrinking decision turns out to be wrong before the
> > >
Re: what should install-world do when docs are not available?
Hi,

On 2023-03-25 16:40:03 -0400, Tom Lane wrote:
> Andres Freund writes:
> > Right now if one does install-world with autoconf, without having
> > xmllint or xsltproc available, one gets an error:
> > ERROR: `xmllint' is missing on your system.
>
> > Is that good? Should meson behave the same?
>
> Since install-world is defined to install documentation, it should
> do so or fail trying.  Maybe we could skip the xmllint step, but you'd
> still need xsltproc so I'm not sure that that moves the bar very far.

xmllint is the more commonly installed tool (it's part of libxml, which
libxslt depends on), so that wouldn't help much - and we now depend on
xmllint to build the input to xsltproc anyway...

> > If that's what we decide to do, perhaps "docs" should be split
> > further? The dependencies for pdf generation are a lot more
> > heavyweight.
>
> Yeah.  Personally I think "docs" should just build/install the HTML
> docs, but maybe I'm too narrow-minded.

Sorry, I meant docs as a meson option, not as a build target.  The
'docs' target builds just the html docs (as with autoconf), and
install-doc installs both html and manpages (also as with autoconf).

I am basically wondering if we should make it so that if you say
-Ddocs=enabled and xmllint or xsltproc aren't available, you get an
error at configure time.  And if -Ddocs=auto, the summary at the end of
configure will tell you if the necessary tools to build the docs are
available, but not error out.

The extension to that could be to have a separate -Ddoc_pdf option,
which'd mirror the above.

Greetings,

Andres Freund
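[Editorial note: as a sketch of the behavior being discussed — not code from the actual tree — a meson "feature" option gives exactly the enabled/auto/disabled semantics Andres describes.  The option name and description here are hypothetical:]

```meson
# Hypothetical meson_options.txt entry.  A 'feature' option is tristate:
# 'enabled' makes a required-tool lookup fail at configure time,
# 'auto' merely records the result in the configure summary,
# 'disabled' skips the lookup entirely.
option('docs', type: 'feature', value: 'auto',
       description: 'HTML documentation (requires xmllint and xsltproc)')
```

In meson.build one would then pass the option to the tool lookups, e.g. `find_program('xsltproc', required: get_option('docs'))`, so `meson setup -Ddocs=enabled` errors out at configure time when xsltproc is missing, while `-Ddocs=auto` does not.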
Re: Parallel Full Hash Join
On Sat, Mar 25, 2023 at 09:21:34AM +1300, Thomas Munro wrote:
> * reuse the same umatched_scan_{chunk,idx} variables as above
> * rename the list of chunks to scan to work_queue
> * fix race/memory leak if we see PHJ_BATCH_SCAN when we attach (it
>   wasn't OK to just fall through)

ah, good catch.

> I don't love the way that both ExecHashTableDetachBatch() and
> ExecParallelPrepHashTableForUnmatched() duplicate logic relating to
> the _SCAN/_FREE protocol, but I'm struggling to find a better idea.
> Perhaps I just need more coffee.

I'm not sure if I have strong feelings either way.  To confirm I
understand, though: in ExecHashTableDetachBatch(), the call to
BarrierArriveAndDetachExceptLast() serves only to advance the barrier
phase through _SCAN, right?  It doesn't really matter if this worker is
the last worker since BarrierArriveAndDetach() handles that for us.
There isn't another barrier function to do this (and I mostly think it
is fine), but I did have to think on it for a bit.

Oh, and, unrelated, but it is maybe worth updating the BarrierAttach()
function comment to mention BarrierArriveAndDetachExceptLast().

> I think your idea of opportunistically joining the scan if it's
> already running makes sense to explore for a later step, ie to make
> multi-batch PHFJ fully fair, and I think that should be a fairly easy
> code change, and I put in some comments where changes would be needed.

makes sense.

I have some very minor pieces of feedback, mainly about extraneous
commas that made me uncomfortable ;)

> From 8b526377eb4a4685628624e75743aedf37dd5bfe Mon Sep 17 00:00:00 2001
> From: Thomas Munro
> Date: Fri, 24 Mar 2023 14:19:07 +1300
> Subject: [PATCH v12 1/2] Scan for unmatched hash join tuples in memory order.
>
> In a full/right outer join, we need to scan every tuple in the hash
> table to find the ones that were not matched while probing, so that we

Given how you are using the word "so" here, I think the comma before it
is not needed.
> @@ -2083,58 +2079,45 @@ bool
> ExecScanHashTableForUnmatched(HashJoinState *hjstate, ExprContext *econtext)
> {
> 	HashJoinTable hashtable = hjstate->hj_HashTable;
> -	HashJoinTuple hashTuple = hjstate->hj_CurTuple;
> +	HashMemoryChunk chunk;
>
> -	for (;;)
> +	while ((chunk = hashtable->unmatched_scan_chunk))
> 	{
> -		/*
> -		 * hj_CurTuple is the address of the tuple last returned from the
> -		 * current bucket, or NULL if it's time to start scanning a new
> -		 * bucket.
> -		 */
> -		if (hashTuple != NULL)
> -			hashTuple = hashTuple->next.unshared;
> -		else if (hjstate->hj_CurBucketNo < hashtable->nbuckets)
> -		{
> -			hashTuple = hashtable->buckets.unshared[hjstate->hj_CurBucketNo];
> -			hjstate->hj_CurBucketNo++;
> -		}
> -		else if (hjstate->hj_CurSkewBucketNo < hashtable->nSkewBuckets)
> +		while (hashtable->unmatched_scan_idx < chunk->used)
> 		{
> -			int			j = hashtable->skewBucketNums[hjstate->hj_CurSkewBucketNo];
> +			HashJoinTuple hashTuple = (HashJoinTuple)
> +			(HASH_CHUNK_DATA(hashtable->unmatched_scan_chunk) +
> +			 hashtable->unmatched_scan_idx);
>
> -			hashTuple = hashtable->skewBucket[j]->tuples;
> -			hjstate->hj_CurSkewBucketNo++;
> -		}
> -		else
> -			break;				/* finished all buckets */
> +			MinimalTuple tuple = HJTUPLE_MINTUPLE(hashTuple);
> +			int			hashTupleSize = (HJTUPLE_OVERHEAD + tuple->t_len);
>
> -		while (hashTuple != NULL)
> -		{
> -			if (!HeapTupleHeaderHasMatch(HJTUPLE_MINTUPLE(hashTuple)))
> -			{
> -				TupleTableSlot *inntuple;
> +			/* next tuple in this chunk */
> +			hashtable->unmatched_scan_idx += MAXALIGN(hashTupleSize);
>
> -				/* insert hashtable's tuple into exec slot */
> -				inntuple = ExecStoreMinimalTuple(HJTUPLE_MINTUPLE(hashTuple),
> -												 hjstate->hj_HashTupleSlot,
> -												 false);	/* do not pfree */
> -				econtext->ecxt_innertuple = inntuple;
> +			if (HeapTupleHeaderHasMatch(HJTUPLE_MINTUPLE(hashTuple)))
> +				continue;
>
> -				/*
> -				 * Reset temp memory each time; although this function doesn't
> -
Re: what should install-world do when docs are not available?
Andres Freund writes:
> Right now if one does install-world with autoconf, without having
> xmllint or xsltproc available, one gets an error:
> ERROR: `xmllint' is missing on your system.

> Is that good? Should meson behave the same?

Since install-world is defined to install documentation, it should
do so or fail trying.  Maybe we could skip the xmllint step, but you'd
still need xsltproc so I'm not sure that that moves the bar very far.

> If that's what we decide to do, perhaps "docs" should be split further?
> The dependencies for pdf generation are a lot more heavyweight.

Yeah.  Personally I think "docs" should just build/install the HTML
docs, but maybe I'm too narrow-minded.

> We should probably also generate a useful error when the stylesheets
> aren't available.

Maybe, but we haven't had that in the autoconf case either, and there
have not been too many complaints.  Not sure it's worth putting extra
effort into.

			regards, tom lane
Re: Infinite Interval
On Sat, 25 Mar 2023 at 15:59, Tom Lane wrote:
> Joseph Koshakow writes:
> > In terms of adding/subtracting infinities, the IEEE standard is
> > paywalled and I don't have a copy.  I tried finding information online
> > but I also wasn't able to find anything useful.  I additionally
> > checked the results of C++, C, and Java, and they all match, which
> > increases my confidence that we're doing the right thing.  Does anyone
> > happen to have a copy of the standard who can confirm?
>
> I think you can take it as read that simple C test programs on modern
> platforms will exhibit IEEE-compliant handling of float infinities.

Additionally, the Java language specification claims to follow IEEE 754:

https://docs.oracle.com/javase/specs/jls/se11/html/jls-15.html#jls-15.18.2

So either C and Java agree with each other and with the spec, or they
disagree in the same way even while at least one of them explicitly
claims to be following the spec.  I think you're on pretty firm ground.
what should install-world do when docs are not available?
Hi,

A question by Justin [1] made me wonder what the right behaviour for
world / install-world should be when the docs tools aren't available.
I'm wondering from the angle of meson, but it also seems like something
we possibly should think about for autoconf.

Right now if one does install-world with autoconf, without having
xmllint or xsltproc available, one gets an error:
ERROR: `xmllint' is missing on your system.

Is that good? Should meson behave the same?

I wonder if, for meson, the best behaviour would be to make 'docs' a
feature set to auto.  If docs is set to enabled, and the necessary tools
are not available, fail at that time, instead of doing so while
building.

If that's what we decide to do, perhaps "docs" should be split further?
The dependencies for pdf generation are a lot more heavyweight.

We should probably also generate a useful error when the stylesheets
aren't available.  Right now we just generate a long error:

/usr/bin/xsltproc --nonet --path . --stringparam pg.version '16devel' /home/andres/src/postgresql/doc/src/sgml/stylesheet.xsl postgres-full.xml
I/O error : Attempt to load network entity http://docbook.sourceforge.net/release/xsl/current/xhtml/chunk.xsl
warning: failed to load external entity "http://docbook.sourceforge.net/release/xsl/current/xhtml/chunk.xsl"
compilation error: file /home/andres/src/postgresql/doc/src/sgml/stylesheet.xsl line 6 element import
xsl:import : unable to load http://docbook.sourceforge.net/release/xsl/current/xhtml/chunk.xsl
I/O error : Attempt to load network entity http://docbook.sourceforge.net/release/xsl/current/common/entities.ent
/home/andres/src/postgresql/doc/src/sgml/stylesheet-html-common.xsl:4: warning: failed to load external entity "http://docbook.sourceforge.net/release/xsl/current/common/entities.ent"
%common.entities;
^
/home/andres/src/postgresql/doc/src/sgml/stylesheet-html-common.xsl:116: parser error : Entity 'primary' not defined
translate(substring(, 1, 1), ...

Greetings,

Andres Freund

[1] https://www.postgresql.org/message-id/20230325180310.o6drykb3uz4u4x4r%40awork3.anarazel.de
Re: windows/meson cfbot warnings
On Sun, Feb 26, 2023 at 09:54:25AM -0800, Andres Freund wrote:
> On 2023-02-25 16:45:38 -0600, Justin Pryzby wrote:
> > Unrelated, but something else changed and now there's this.
> >
> > https://cirrus-ci.com/task/6202242768830464
> >
> > [20:10:34.310] c:\cirrus>call sh -c 'if grep ": warning " build.txt; then exit 1; fi; exit 0'
> > [20:10:34.397] C:\python\Include\pyconfig.h(117): warning C4005: 'MS_WIN64': macro redefinition
> > [20:10:34.397] C:\python\Include\pyconfig.h(117): warning C4005: 'MS_WIN64': macro redefinition
> > [20:10:34.397] C:\python\Include\pyconfig.h(117): warning C4005: 'MS_WIN64': macro redefinition
> > [20:10:34.397] C:\python\Include\pyconfig.h(117): warning C4005: 'MS_WIN64': macro redefinition
>
> Hm, odd.
>
> There's a bit more context about the warning in the output:
>
> [21:43:58.782] [1509/2165] Compiling C object src/pl/plpython/plpython3.dll.p/plpy_exec.c.obj
> [21:43:58.782] C:\python\Include\pyconfig.h(117): warning C4005: 'MS_WIN64': macro redefinition
> [21:43:58.924] src/pl/plpython/plpython3.dll.p/meson_pch-c.c: note: see previous definition of 'MS_WIN64'

I don't remember right now what I did to get the output, but it's like:

[18:33:22.351] c:\python\Include\pyconfig.h(117): warning C4005: 'MS_WIN64': macro redefinition
[18:33:22.351] c:\python\Include\pyconfig.h(117): note: 'MS_WIN64' previously declared on the command line

I found that this is caused by changes in meson between 1.0.0 and 1.0.1.
Probably one of these.

https://github.com/mesonbuild/meson/commit/aa69cf04484309f82d2da64c433539d2f6f2fa82
https://github.com/mesonbuild/meson/commit/9bf718fcee0d9e30b5de2d6a2f154aa417aa8d4c

--
Justin
Re: Infinite Interval
Joseph Koshakow writes:
> In terms of adding/subtracting infinities, the IEEE standard is
> paywalled and I don't have a copy.  I tried finding information online
> but I also wasn't able to find anything useful.  I additionally checked
> the results of C++, C, and Java, and they all match, which increases my
> confidence that we're doing the right thing.  Does anyone happen to
> have a copy of the standard who can confirm?

I think you can take it as read that simple C test programs on modern
platforms will exhibit IEEE-compliant handling of float infinities.

			regards, tom lane
Re: Progress report of CREATE INDEX for nested partitioned tables
I pushed 0001 with some cosmetic changes (for instance, trying to make
the style of the doc entries for partitions_total/partitions_done match
the rest of their table).

I'm not touching 0002 or 0003, because I think they're fundamentally a
bad idea.  Progress reporting is inherently inexact, because it's so
hard to predict the amount of work to be done in advance -- have you
ever seen a system anywhere whose progress bars reliably advance at a
uniform rate?  I think adding assertions that the estimates are
error-free is just going to cause headaches.

As an example, I added a comment pointing out that the current fix won't
crash and burn if the caller failed to lock all the child tables in
advance: the find_all_inheritors call should be safe anyway, so the
worst consequence would be an imprecise partitions_total estimate.  But
that argument falls down if we're going to add assertions that
partitions_total isn't in error.

			regards, tom lane
Re: Non-superuser subscription owners
On Fri, 2023-03-24 at 09:24 -0400, Robert Haas wrote:
> I certainly agree that the security model isn't in a reasonable place
> right now.  However, I feel that:
>
> (1) adding an extra predefined role
> (2) even adding the connection string security stuff

I don't see how these points are related to the question of whether you
should commit your non-superuser-subscription-owners patch or
logical-repl-as-table-owner patch first.

My perspective is that logical replication is an unfinished feature with
an incomplete design.  As I said earlier, that's why I backed away from
trying to do non-superuser subscriptions as a documented feature: it
feels like we need to settle some of the underlying pieces first.

There are some big issues, like the security model for replaying
changes.  And some smaller issues like feature gaps (RLS doesn't work,
if I remember correctly, and maybe something with partitioning).  There
are potential clashes with other proposals, like CREATE SUBSCRIPTION
... SERVER, which I hope can be sorted out later.  And I don't feel like
I have a good handle on the publisher security model and threats, which
hopefully is just a matter of documenting some best practices.

Each time we dig into one of these issues I learn something, and I think
others do, too.  If we skip past that process and start adding new
features on top of this unfinished design, then I think we are setting
ourselves up for trouble that is going to be harder to fix later.

I don't mean to say all of the above issues are blockers or that they
should all be resolved in my favor.  But there are enough issues, and
some of those issues are serious enough, that I feel it's premature to
just go ahead with non-superuser subscriptions and the predefined role.
There are already users, which complicates things.  And you make a good
point that some important users may already be working around the flaws.

But there's already a patch and discussion going on for some security
model improvements (thanks to you), so let's try to get that one in
first.  If we can't, it's probably because we learned something
important.

Regards,
Jeff Davis
Re: Should vacuum process config file reload more often
On Thu, Mar 23, 2023 at 8:27 PM Melanie Plageman wrote: > On Thu, Mar 23, 2023 at 2:09 AM Masahiko Sawada wrote: > > > And, I was wondering if it was worth trying to split up the part that > > > reloads the config file and all of the autovacuum stuff. The reloading > > > of the config file by itself won't actually result in autovacuum workers > > > having updated cost delays because of them overwriting it with > > > wi_cost_delay, but it will allow VACUUM to have those updated values. > > > > It makes sense to me to have changes for overhauling the rebalance > > mechanism in a separate patch. > > > > Looking back at the original concern you mentioned[1]: > > > > speed up long-running vacuum of a large table by > > decreasing autovacuum_vacuum_cost_delay/vacuum_cost_delay, however the > > config file is only reloaded between tables (for autovacuum) or after > > the statement (for explicit vacuum). > > > > does it make sense to have autovac_balance_cost() update workers' > > wi_cost_delay too? Autovacuum launcher already reloads the config file > > and does the rebalance. So I thought autovac_balance_cost() can update > > the cost_delay as well, and this might be a minimal change to deal > > with your concern. This doesn't have the effect for manual VACUUM but > > since vacuum delay is disabled by default it won't be a big problem. > > As for manual VACUUMs, we would need to reload the config file in > > vacuum_delay_point() as the part of your patch does. Overhauling the > > rebalance mechanism would be another patch to improve it further. > > So, we can't do this without acquiring an access shared lock on every > call to vacuum_delay_point() because cost delay is a double. > > I will work on a patchset with separate commits for reloading the config > file, though (with autovac not benefitting in the first commit). So, I realized we could actually do as you say and have autovac workers update their wi_cost_delay and keep the balance changes in a separate commit. 
I've done this in attached v8. Workers take the exclusive lock to update their wi_cost_delay and wi_cost_limit only when there is a config reload. So, there is one commit that implements this behavior and a separate commit to revise the worker rebalancing. Note that we must have the workers also update wi_cost_limit_base and then call autovac_balance_cost() when they reload the config file (instead of waiting for launcher to call autovac_balance_cost()) to avoid potentially calculating the sleep with a new value of cost delay and an old value of cost limit. In the commit which revises the worker rebalancing, I'm still wondering if wi_dobalance should be an atomic flag -- probably not worth it, right? On Fri, Mar 24, 2023 at 1:27 PM Melanie Plageman wrote: > I realized nworkers_for_balance should be initialized to 0 and not 1 -- > 1 is misleading since there are often 0 autovac workers. We just never > want to use nworkers_for_balance when it is 0. But, workers put a floor > of 1 on the number when they divide limit/nworkers_for_balance (since > they know there must be at least one worker right now since they are a > worker). I thought about whether or not they should call > autovac_balance_cost() if they find that nworkers_for_balance is 0 when > updating their own limit, but I'm not sure. I've gone ahead and updated this. I haven't made the workers call autovac_balance_cost() if they find that nworkers_for_balance is 0 when they try and use it when updating their limit because I'm not sure if this can happen. I would be interested in input here. I'm also still interested in feedback on the variable name av_nworkers_for_balance. > On Fri, Mar 24, 2023 at 1:21 AM Masahiko Sawada wrote: > > > > I need another few readthroughs to figure out of VacuumFailsafeActive > > > > does what > > > > I think it does, and should be doing, but in general I think this is a > > > > good > > > > idea and a patch in good condition close to being committable. 
> > > > Another approach would be to make VacuumCostActive a ternary value: > > on, off, and never. When we trigger the failsafe mode we switch it to > > never, meaning that it never becomes active even after reloading the > > config file. A good point is that we don't need to add a new global > > variable, but I'm not sure it's better than the current approach. > > Hmm, this is interesting. I don't love the word "never" since it kind of > implies a duration longer than the current table being vacuumed. But we > could find a different word or just document it well. For clarity, we > might want to call it failsafe_mode or something. > > I wonder if the primary drawback to converting > LVRelState->failsafe_active to a global VacuumFailsafeActive is just the > general rule of limiting scope to the minimum needed. Okay, so I've changed my mind about this. I like having a ternary for VacuumCostActive and keeping failsafe_active in LVRelState. What I didn't like was having non-vacuum code have to care
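For concreteness, a minimal sketch of the ternary idea debated above — the enum, field names, and reload hook here are illustrative assumptions, not what any posted patch actually contains:

```c
#include <stdbool.h>

/* Hypothetical three-state replacement for the boolean VacuumCostActive. */
typedef enum VacuumCostStatus
{
	VACUUM_COST_INACTIVE_LOCKED,	/* the "never" state: failsafe fired,
									 * stays off for the rest of this vacuum */
	VACUUM_COST_INACTIVE,			/* off, but a config reload may turn it on */
	VACUUM_COST_ACTIVE
} VacuumCostStatus;

static VacuumCostStatus VacuumCost = VACUUM_COST_ACTIVE;

/*
 * Called on config reload: honor the config file unless the failsafe has
 * latched the state to "never".
 */
static void
reload_cost_active(bool enabled_in_config)
{
	if (VacuumCost == VACUUM_COST_INACTIVE_LOCKED)
		return;					/* "never" wins over the config file */
	VacuumCost = enabled_in_config ? VACUUM_COST_ACTIVE : VACUUM_COST_INACTIVE;
}
```

The attraction is that no extra global is needed; the drawback, as noted above, is that "never" really means "for the remainder of the current table", which whatever name is chosen should make clear.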
Re: hio.c does visibilitymap_pin()/IO while holding buffer lock
On Sat, Mar 25, 2023 at 11:17 AM Andres Freund wrote:
> Thinking more about this, I think there's no inherent deadlock danger with
> reading the VM while holding a buffer lock, "just" an efficiency issue. If we
> avoid needing to do IO nearly all the time, by trying to pin the right page
> before extending, it's probably good enough.

Uh, it was quite possible for lazy_vacuum_heap_page() to do that up until very recently (it was fixed by my commit 980ae17310), since it would call visibilitymap_get_status() with an exclusive buffer lock on the heap page, which sometimes had to change the VM page. It potentially did an IO at that point, to read in a later VM page, replacing the caller's initially-pinned one.

In other words, up until recently there was a strange idiom used by lazy_vacuum_heap_page/lazy_vacuum_heap_rel, where we'd abuse visibilitymap_get_status() as a replacement for calling visibilitymap_pin() right before acquiring a heap page buffer lock. But now the second heap pass does it the same way as the first heap pass. (Even still, I have no reason to believe that the previous approach was all that bad; it was just a bit ugly.)

There are still a few visibilitymap_get_status()-with-buffer-lock calls in vacuumlazy.c, FWIW, but they don't run the risk of needing to change the vmbuffer we have pinned while the heap page buffer lock is held.

-- Peter Geoghegan
Re: hio.c does visibilitymap_pin()/IO while holding buffer lock
Hi,

On 2023-03-25 12:57:17 -0400, Tom Lane wrote:
> Andres Freund writes:
> > On 2023-03-25 14:34:25 +0100, Tomas Vondra wrote:
> >> Can't we ensure we actually lock the vm buffer too in ReadBufferBI,
> >> before calling ReadBufferExtended? Or am I confused and that's not
> >> possible for some reason?
>
> > Note that this is using P_NEW. I.e. we don't know the buffer location yet.
>
> Maybe the relation-extension logic needs to include the ability to get
> the relevant vm page?

I don't see how that's easily possible with the current lock ordering rules, at least without giving up using RBM_ZERO_AND_LOCK for extending, or stuffing even more things to happen with the extension lock held, which I don't think we want. I don't think INSERT_FROZEN is worth that price.

Perhaps we should just try to heuristically pin the right VM buffer before trying to extend?

Thinking more about this, I think there's no inherent deadlock danger with reading the VM while holding a buffer lock, "just" an efficiency issue. If we avoid needing to do IO nearly all the time, by trying to pin the right page before extending, it's probably good enough.

Greetings,

Andres Freund
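The "heuristically pin the right VM buffer before extending" idea works because the heap-block-to-VM-block mapping is pure arithmetic, so the candidate VM page can be computed (and pinned) before any heap buffer lock is taken. A toy version of that mapping — the header-size constant here is simplified; PostgreSQL derives the real HEAPBLOCKS_PER_PAGE from the actual page header size:

```c
#include <stdint.h>

/* Two visibility-map bits (all-visible, all-frozen) per heap block. */
#define BLCKSZ				8192
#define PAGE_HEADER_SZ		24		/* simplified stand-in, an assumption */
#define MAPSIZE				(BLCKSZ - PAGE_HEADER_SZ)
#define BITS_PER_HEAPBLOCK	2
#define HEAPBLOCKS_PER_PAGE	(MAPSIZE * 8 / BITS_PER_HEAPBLOCK)

/* Which VM block covers a given heap block - computable without any lock. */
static inline uint32_t
vm_block_for_heap_block(uint32_t heap_blk)
{
	return heap_blk / HEAPBLOCKS_PER_PAGE;
}
```

Since each VM page covers tens of thousands of heap blocks, a guess based on the current relation size will almost always pin the correct VM page, making the IO-under-buffer-lock case rare.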
Re: Bug with ICU for merge join
On Fri, 2023-03-24 at 15:45 -0400, Jeff Janes wrote: > Ever since 27b62377b47f9e7bf58613, I have been getting "ERROR: > mergejoin input data is out of order" for the attached reproducer. Thank you for the report! And the simple repro. Fixed in 81a6d57e33. Regards, Jeff Davis
Re: Progress report of CREATE INDEX for nested partitioned tables
Justin Pryzby writes: > On Sat, Mar 25, 2023 at 11:55:13AM -0400, Tom Lane wrote: >> That's why I wanted list_length() not list_length() - 1. We are >> doing *something* at the top partitioned table, it just doesn't >> involve a table scan, so I don't find this totally unreasonable. >> If you agree we are doing work at intermediate partitioned tables, >> how are we not doing work at the top one? > What you're proposing would redefine the meaning of > PARTITIONS_DONE/TOTAL, even in the absence of intermediate partitioned > tables. Which might be okay, but the scope of this thread/patch was to > fix the behavior involving intermediate partitioned tables. I'm a little skeptical of that argument, because this patch is already redefining the meaning of PARTITIONS_TOTAL. The fact that the existing documentation is vague enough to be read either way doesn't make it not a change. Still, in the interests of getting something done I'll drop the issue. regards, tom lane
Re: hio.c does visibilitymap_pin()/IO while holding buffer lock
Andres Freund writes: > On 2023-03-25 14:34:25 +0100, Tomas Vondra wrote: >> Can't we ensure we actually lock the vm buffer too in ReadBufferBI, >> before calling ReadBufferExtended? Or am I confused and that's not >> possible for some reason? > Note that this is using P_NEW. I.e. we don't know the buffer location yet. Maybe the relation-extension logic needs to include the ability to get the relevant vm page? regards, tom lane
Re: hio.c does visibilitymap_pin()/IO while holding buffer lock
Hi, On 2023-03-25 14:34:25 +0100, Tomas Vondra wrote: > On 3/25/23 03:57, Andres Freund wrote: > > 2) Change relevant code so that we only return a valid vmbuffer if we could > > do > >so without blocking / IO and, obviously, skip updating the VM if we > >couldn't get the buffer. > > > > I don't recall the exact details about the vm locking/pinning, but can't > we just ensure we actually follow the proper locking order? I mean, this > only deals with new pages, requested at line ~624: > > buffer = ReadBufferBI(relation, P_NEW, RBM_ZERO_AND_LOCK, bistate); > > Can't we ensure we actually lock the vm buffer too in ReadBufferBI, > before calling ReadBufferExtended? Or am I confused and that's not > possible for some reason? Note that this is using P_NEW. I.e. we don't know the buffer location yet. Greetings, Andres Freund
Re: meson/msys2 fails with plperl/Strawberry
Hi,

On 2023-03-25 08:46:42 -0400, Andrew Dunstan wrote:
> config/perl.m4 contains this:
>
>    AC_MSG_CHECKING(for flags to link embedded Perl)
>    if test "$PORTNAME" = "win32" ; then
>        perl_lib=`basename $perl_archlibexp/CORE/perl[[5-9]]*.lib .lib`
>        if test -e "$perl_archlibexp/CORE/$perl_lib.lib"; then
>            perl_embed_ldflags="-L$perl_archlibexp/CORE -l$perl_lib"
>        else
>            perl_lib=`basename $perl_archlibexp/CORE/libperl[[5-9]]*.a .a | sed 's/^lib//'`
>            if test -e "$perl_archlibexp/CORE/lib$perl_lib.a"; then
>                perl_embed_ldflags="-L$perl_archlibexp/CORE -l$perl_lib"
>            fi
>        fi
>    else
>        pgac_tmp1=`$PERL -MExtUtils::Embed -e ldopts`
>        pgac_tmp2=`$PERL -MConfig -e 'print "$Config{ccdlflags} $Config{ldflags}"'`
>        perl_embed_ldflags=`echo X"$pgac_tmp1" | sed -e "s/^X//" -e "s%$pgac_tmp2%%"`
>    fi
>    AC_SUBST(perl_embed_ldflags)dnl
>
> I don't see any equivalent in meson.build of the win32 logic, and thus I am
> getting a setup failure on fairywren when trying to move it to meson, while
> it will happily build with autoconf.

I did not try to build with strawberry perl using mingw - it doesn't seem like a very interesting thing, given that mingw has a much more reasonable perl than strawberry - but with the mingw perl it works well.

The above logic actually did *not* work well with mingw for me, because the names are not actually what configure expects, and it seems like a seriously bad idea to encode that much knowledge about library naming and locations.
https://cirrus-ci.com/task/6421536551206912 [16:32:28.997] Has header "perl.h" : YES [16:32:28.997] Message: CCFLAGS recommended by perl: -DWIN32 -DWIN64 -DPERL_TEXTMODE_SCRIPTS -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -DUSE_PERLIO -D__USE_MINGW_ANSI_STDIO -fno-strict-aliasing -mms-bitfields [16:32:28.997] Message: CCFLAGS for embedding perl: -IC:\msys64\ucrt64\lib\perl5\core_perl/CORE -DWIN32 -DWIN64 -DPERL_TEXTMODE_SCRIPTS -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -DUSE_PERLIO -DPLPERL_HAVE_UID_GID [16:32:28.997] Message: LDFLAGS recommended by perl: "-s -L"C:\msys64\ucrt64\lib\perl5\core_perl\CORE" -L"C:\msys64\ucrt64\lib" "C:\msys64\ucrt64\lib\perl5\core_perl\CORE\libperl532.a" "C:\msys64\ucrt64\lib\libmoldname.a" "C:\msys64\ucrt64\lib\libkernel32.a" "C:\msys64\ucrt64\lib\libuser32.a" "C:\msys64\ucrt64\lib\libgdi32.a" "C:\msys64\ucrt64\lib\libwinspool.a" "C:\msys64\ucrt64\lib\libcomdlg32.a" "C:\msys64\ucrt64\lib\libadvapi32.a" "C:\msys64\ucrt64\lib\libshell32.a" "C:\msys64\ucrt64\lib\libole32.a" "C:\msys64\ucrt64\lib\liboleaut32.a" "C:\msys64\ucrt64\lib\libnetapi32.a" "C:\msys64\ucrt64\lib\libuuid.a" "C:\msys64\ucrt64\lib\libws2_32.a" "C:\msys64\ucrt64\lib\libmpr.a" "C:\msys64\ucrt64\lib\libwinmm.a" "C:\msys64\ucrt64\lib\libversion.a" "C:\msys64\ucrt64\lib\libodbc32.a" "C:\msys64\ucrt64\lib\libodbccp32.a" "C:\msys64\ucrt64\lib\libcomctl32.a"" [16:32:28.997] Message: LDFLAGS for embedding perl: "C:\msys64\ucrt64\lib\perl5\core_perl\CORE\libperl532.a C:\msys64\ucrt64\lib\libmoldname.a C:\msys64\ucrt64\lib\libkernel32.a C:\msys64\ucrt64\lib\libuser32.a C:\msys64\ucrt64\lib\libgdi32.a C:\msys64\ucrt64\lib\libwinspool.a C:\msys64\ucrt64\lib\libcomdlg32.a C:\msys64\ucrt64\lib\libadvapi32.a C:\msys64\ucrt64\lib\libshell32.a C:\msys64\ucrt64\lib\libole32.a C:\msys64\ucrt64\lib\liboleaut32.a C:\msys64\ucrt64\lib\libnetapi32.a C:\msys64\ucrt64\lib\libuuid.a C:\msys64\ucrt64\lib\libws2_32.a C:\msys64\ucrt64\lib\libmpr.a C:\msys64\ucrt64\lib\libwinmm.a 
C:\msys64\ucrt64\lib\libversion.a C:\msys64\ucrt64\lib\libodbc32.a C:\msys64\ucrt64\lib\libodbccp32.a C:\msys64\ucrt64\lib\libcomctl32.a"
[16:32:28.997] Checking if "libperl" : links: YES

> I would expect the ld flags to be "-LC:/STRAWB~1/perl/lib/CORE -lperl532"

You didn't say what they ended up as?

> (Off topic peeve - one of the things I dislike about meson is that the
> meson.build files are written in YA bespoke language).

I don't really disagree. However, all the general-purpose-language-using build tools I found were awful. And meson's language is a heck of a lot nicer than e.g. cmake's...

Greetings,

Andres Freund
Re: Progress report of CREATE INDEX for nested partitioned tables
On Sat, Mar 25, 2023 at 11:55:13AM -0400, Tom Lane wrote: > Justin Pryzby writes: > > On Thu, Mar 23, 2023 at 04:35:46PM -0400, Tom Lane wrote: > >> Furthermore, as things stand it's not hard > >> for PARTITIONS_TOTAL to be zero --- there's at least one such case > >> in the regression tests --- and that seems just weird to me. > > > I don't know why it'd seem weird. postgres doesn't create partitions > > automatically, so by default there are none. If we create a table but > > never load any data, it'll have no partitions. > > My problem with it is that it's not clear how to tell "no partitioned > index creation in progress" from "partitioned index creation in progress, > but total = 0". Maybe there's some out-of-band way to tell that in the > stats reporting system, but still it's a weird corner case. > > > Also, the TOTAL=0 case > > won't go away just because we start counting intermediate partitions. > > That's why I wanted list_length() not list_length() - 1. We are > doing *something* at the top partitioned table, it just doesn't > involve a table scan, so I don't find this totally unreasonable. > If you agree we are doing work at intermediate partitioned tables, > how are we not doing work at the top one? What you're proposing would redefine the meaning of PARTITIONS_DONE/TOTAL, even in the absence of intermediate partitioned tables. Which might be okay, but the scope of this thread/patch was to fix the behavior involving intermediate partitioned tables. It's somewhat weird to me that find_all_inheritors(rel) returns the rel itself. But it's an internal function, and evidently that's what's needed/desirable to do, so that's fine. However, "PARTITIONS_TOTAL" has a certain user-facing definition, and "Number of partitions" is easier to explain than "Number of partitions plus the rel itself", and IMO an easier definition is a better one. 
Your complaint seems similar to something I've said a few times before: it's weird to expose macroscopic progress reporting of partitioned tables in the same view and in the same *row* as microscopic progress of its partitions. But changing that is a job for another patch. I won't be opposed to it if someone were to propose a patch to remove partitions_{done,total}. See also: https://www.postgresql.org/message-id/flat/YCy5ZMt8xAyoOMmv%40paquier.xyz#b20d1be226a93dacd3fd40b402315105 -- Justin
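The disagreement above reduces to whether the root is counted: find_all_inheritors(root) returns the root plus every descendant, so the two candidate PARTITIONS_TOTAL definitions differ by exactly one. A toy illustration (the tree-walking code is mine, not PostgreSQL's):

```c
#include <stddef.h>

/* A toy inheritance tree: each node may have child partitions. */
typedef struct PartNode
{
	int					nchildren;
	struct PartNode   **children;
} PartNode;

/* Count a node plus all of its descendants, like find_all_inheritors(). */
static int
count_inheritors(const PartNode *node)
{
	int			total = 1;		/* the node itself */

	for (int i = 0; i < node->nchildren; i++)
		total += count_inheritors(node->children[i]);
	return total;
}
```

With a root, one intermediate partitioned table, and two leaves, count_inheritors(root) is 4: list_length() would report PARTITIONS_TOTAL = 4, while list_length() - 1 reports 3.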
Re: Infinite Interval
In terms of adding/subtracting infinities, the IEEE standard is paywalled and I don't have a copy. I tried finding information online but I also wasn't able to find anything useful. I additionally checked the results of C++, C, and Java, and they all match, which increases my confidence that we're doing the right thing. Does anyone happen to have a copy of the standard and can confirm?

- Joe Koshakow
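For what it's worth, the behavior in question is easy to spot-check from C: on any platform whose doubles follow IEEE 754 (effectively all mainstream ones), addition and subtraction involving infinities behave as below. This is a spot-check of an implementation, not a substitute for the standard's text:

```c
#include <math.h>

/* inf + inf is positive infinity. */
static int
inf_plus_inf_is_pos_inf(void)
{
	double		r = INFINITY + INFINITY;

	return isinf(r) && r > 0;
}

/* inf - inf is an invalid operation, producing NaN. */
static int
inf_minus_inf_is_nan(void)
{
	return isnan(INFINITY - INFINITY);
}

/* inf + any finite value is still infinity. */
static int
inf_plus_finite_is_inf(void)
{
	double		r = INFINITY + 42.0;

	return isinf(r) && r > 0;
}
```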
Re: running logical replication as the subscription owner
On Fri, Mar 24, 2023 at 10:00:36AM -0400, Robert Haas wrote: > On Fri, Mar 24, 2023 at 3:59 AM Jeff Davis wrote: > > That's a little confusing, why not just always use the > > SECURITY_RESTRICTED_OPERATION? Is there a use case I'm missing? > > Some concern was expressed -- not sure exactly where the email is > exactly, and it might've been on pgsql-security -- that doing that > categorically might break things that are currently working. The > people who were concerned included Andres and I forget who else. My SECURITY_RESTRICTED_OPERATION blocks deferred triggers and CREATE TEMP TABLE. If you create a DEFERRABLE INITIALLY DEFERRED fk constraint and replicate to the constraint's table in a SECURITY_RESTRICTED_OPERATION, I expect an error. > gut reaction was the same as yours, just do it always and don't worry > about it. But if people think that users are likely to run afoul of Hard to know. It's the sort of thing that I model as creating months of work for epsilon users to adapt their applications. Epsilon could be zero. Most users don't notice. > the SECURITY_RESTRICTED_OPERATION restrictions in practice, then this > is better, and the implementation complexity isn't high. We could even > think of extending this kind of logic to other places where > SECURITY_RESTRICTED_OPERATION is enforced, if desired. Firing a trigger in an index expression or materialized view query is not reasonable, so today's uses of SECURITY_RESTRICTED_OPERATION would not benefit from this approach. Being able to create temp tables in those places has some value, but I would allow temp tables in SECURITY_RESTRICTED_OPERATION instead of proliferating the check on the ability to SET ROLE. (One might allow temp tables by introducing NewTempSchemaNestLevel(), called whenever we call NewGUCNestLevel(). The transaction would then proceed as though it has no temp schema, allocating an additional schema if creating a temp object.)
Re: Progress report of CREATE INDEX for nested partitioned tables
Justin Pryzby writes: > On Thu, Mar 23, 2023 at 04:35:46PM -0400, Tom Lane wrote: >> Furthermore, as things stand it's not hard >> for PARTITIONS_TOTAL to be zero --- there's at least one such case >> in the regression tests --- and that seems just weird to me. > I don't know why it'd seem weird. postgres doesn't create partitions > automatically, so by default there are none. If we create a table but > never load any data, it'll have no partitions. My problem with it is that it's not clear how to tell "no partitioned index creation in progress" from "partitioned index creation in progress, but total = 0". Maybe there's some out-of-band way to tell that in the stats reporting system, but still it's a weird corner case. > Also, the TOTAL=0 case > won't go away just because we start counting intermediate partitions. That's why I wanted list_length() not list_length() - 1. We are doing *something* at the top partitioned table, it just doesn't involve a table scan, so I don't find this totally unreasonable. If you agree we are doing work at intermediate partitioned tables, how are we not doing work at the top one? regards, tom lane
Re: Infinite Interval
On Fri, Mar 24, 2023 at 9:43 AM Ashutosh Bapat wrote:
>
> You don't need to do this, but looks like we can add DAYS_PER_WEEK macro and
> use it here.

I've attached a patch with this new macro. There's probably tons of places it can be used instead of hardcoding the number 7, but I'll save that for a future patch.

- Joe Koshakow

From 41fa5de65c757d72331aff6bb626fab76390e9db Mon Sep 17 00:00:00 2001
From: Joseph Koshakow
Date: Sat, 18 Mar 2023 12:26:28 -0400
Subject: [PATCH 1/2] Move integer helper function to int.h

---
 src/backend/utils/adt/datetime.c | 25 -
 src/include/common/int.h         | 13 +
 2 files changed, 17 insertions(+), 21 deletions(-)

diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index be2e55bb29..64f28a85b0 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -51,7 +51,6 @@
 static int	DecodeDate(char *str, int fmask, int *tmask, bool *is2digits,
 					   struct pg_tm *tm);
 static char *AppendSeconds(char *cp, int sec, fsec_t fsec,
 						   int precision, bool fillzeros);
-static bool int64_multiply_add(int64 val, int64 multiplier, int64 *sum);
 static bool AdjustFractMicroseconds(double frac, int64 scale,
 									struct pg_itm_in *itm_in);
 static bool AdjustFractDays(double frac, int scale,
@@ -515,22 +514,6 @@ AppendTimestampSeconds(char *cp, struct pg_tm *tm, fsec_t fsec)
 	return AppendSeconds(cp, tm->tm_sec, fsec, MAX_TIMESTAMP_PRECISION, true);
 }
-
-/*
- * Add val * multiplier to *sum.
- * Returns true if successful, false on overflow.
- */
-static bool
-int64_multiply_add(int64 val, int64 multiplier, int64 *sum)
-{
-	int64		product;
-
-	if (pg_mul_s64_overflow(val, multiplier, &product) ||
-		pg_add_s64_overflow(*sum, product, sum))
-		return false;
-	return true;
-}
-
 /*
  * Multiply frac by scale (to produce microseconds) and add to itm_in->tm_usec.
  * Returns true if successful, false if itm_in overflows.
@@ -621,7 +604,7 @@ AdjustMicroseconds(int64 val, double fval, int64 scale,
 				   struct pg_itm_in *itm_in)
 {
 	/* Handle the integer part */
-	if (!int64_multiply_add(val, scale, &itm_in->tm_usec))
+	if (pg_mul_add_s64_overflow(val, scale, &itm_in->tm_usec))
 		return false;
 	/* Handle the float part */
 	return AdjustFractMicroseconds(fval, scale, itm_in);
@@ -2701,9 +2684,9 @@ DecodeTimeForInterval(char *str, int fmask, int range,
 		return dterr;
 	itm_in->tm_usec = itm.tm_usec;
-	if (!int64_multiply_add(itm.tm_hour, USECS_PER_HOUR, &itm_in->tm_usec) ||
-		!int64_multiply_add(itm.tm_min, USECS_PER_MINUTE, &itm_in->tm_usec) ||
-		!int64_multiply_add(itm.tm_sec, USECS_PER_SEC, &itm_in->tm_usec))
+	if (pg_mul_add_s64_overflow(itm.tm_hour, USECS_PER_HOUR, &itm_in->tm_usec) ||
+		pg_mul_add_s64_overflow(itm.tm_min, USECS_PER_MINUTE, &itm_in->tm_usec) ||
+		pg_mul_add_s64_overflow(itm.tm_sec, USECS_PER_SEC, &itm_in->tm_usec))
 		return DTERR_FIELD_OVERFLOW;
 	return 0;

diff --git a/src/include/common/int.h b/src/include/common/int.h
index 450800894e..81726c65f7 100644
--- a/src/include/common/int.h
+++ b/src/include/common/int.h
@@ -254,6 +254,19 @@ pg_mul_s64_overflow(int64 a, int64 b, int64 *result)
 #endif
 }
+/*
+ * Add val * multiplier to *sum.
+ * Returns false if successful, true on overflow.
+ */
+static inline bool
+pg_mul_add_s64_overflow(int64 val, int64 multiplier, int64 *sum)
+{
+	int64		product;
+
+	return pg_mul_s64_overflow(val, multiplier, &product) ||
+		pg_add_s64_overflow(*sum, product, sum);
+}
+
 /*
  * Overflow routines for unsigned integers
  *
--
2.34.1

From 242ffd232bef606c9c948f0ee9980152fb9e3bec Mon Sep 17 00:00:00 2001
From: Joseph Koshakow
Date: Sat, 18 Mar 2023 12:38:58 -0400
Subject: [PATCH 2/2] Check for overflow in make_interval

---
 src/backend/utils/adt/timestamp.c      | 24 +++-
 src/include/common/int.h               | 13 +
 src/include/datatype/timestamp.h       |  1 +
 src/test/regress/expected/interval.out |  5 +
 src/test/regress/sql/interval.sql      |  4
 5 files changed, 42 insertions(+), 5 deletions(-)

diff --git a/src/backend/utils/adt/timestamp.c b/src/backend/utils/adt/timestamp.c
index aaadc68ae6..ccf0019a3c 100644
--- a/src/backend/utils/adt/timestamp.c
+++ b/src/backend/utils/adt/timestamp.c
@@ -1517,13 +1517,27 @@ make_interval(PG_FUNCTION_ARGS)
 				 errmsg("interval out of range")));
 	result = (Interval *) palloc(sizeof(Interval));
-	result->month = years * MONTHS_PER_YEAR + months;
-	result->day = weeks * 7 + days;
+	result->month = months;
+	if (pg_mul_add_s32_overflow(years, MONTHS_PER_YEAR, &result->month))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
+				 errmsg("interval out of range")));
+	result->day = days;
+	if (pg_mul_add_s32_overflow(weeks, DAYS_PER_WEEK, &result->day))
+		ereport(ERROR,
+
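The helper the patch moves into int.h is a multiply-then-add with overflow propagation. A standalone equivalent using the GCC/Clang builtins (which is what pg_mul_s64_overflow/pg_add_s64_overflow expand to when the compiler provides them) — note the inverted convention the patch adopts, returning true on overflow:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Add val * multiplier to *sum.  Returns true on overflow, false on
 * success (mirroring the patch's pg_mul_add_s64_overflow convention).
 */
static bool
mul_add_overflow64(int64_t val, int64_t multiplier, int64_t *sum)
{
	int64_t		product;

	return __builtin_mul_overflow(val, multiplier, &product) ||
		__builtin_add_overflow(*sum, product, sum);
}
```

Either the multiply or the subsequent add can overflow; short-circuit evaluation reports the first failure without touching *sum on the multiply-overflow path.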
Re: About the constant-TRUE clause in reconsider_outer_join_clauses
Richard Guo writes: > Should we instead mark the constant-TRUE clause with required_relids > plus the OJ relid? I do not think it matters. > Even if the join does become clauseless, it will end up being an > unqualified nestloop. I think the join ordering algorithm will force > this join to be formed when necessary. We would find *some* valid plan, but not necessarily a *good* plan. The point of the dummy clause is to ensure that the join is considered as soon as possible. That might not be the ideal join order of course, but we'll consider it among other join orders and arrive at a cost-based decision. With no dummy clause, the join order heuristics would always delay this join as long as possible; so even if another ordering is better, we'd not find it. regards, tom lane
Re: hio.c does visibilitymap_pin()/IO while holding buffer lock
On 3/25/23 03:57, Andres Freund wrote: > Hi, > > Starting with > > commit 7db0cd2145f2bce84cac92402e205e4d2b045bf2 > Author: Tomas Vondra > Date: 2021-01-17 22:11:39 +0100 > > Set PD_ALL_VISIBLE and visibility map bits in COPY FREEZE > That's a bummer :-( > RelationGetBufferForTuple does > > /* >* The page is empty, pin vmbuffer to set all_frozen bit. >*/ > if (options & HEAP_INSERT_FROZEN) > { > Assert(PageGetMaxOffsetNumber(BufferGetPage(buffer)) == 0); > visibilitymap_pin(relation, BufferGetBlockNumber(buffer), > vmbuffer); > } > > while holding a buffer lock. visibilitymap_pin() reads pages, if vmbuffer > doesn't already point to the right block. > > > The lock ordering rules are to lock VM pages *before* locking heap pages. > > > I think the reason this hasn't yet bitten us badly, is that INSERT_FROZEN > effectively requires that the relation is access exclusive locked. There > shouldn't be other backends locking multiple buffers in the relation (bgwriter > / checkpointer can lock a single buffer at a time, but that's it). > Right. Still, it seems a bit fragile ... > > I see roughly two ways forward: > > 1) We add a comment explaining why it's safe to violate lock ordering rules in >this one situation > Possible, although I feel uneasy about just documenting a broken rule. Would be better to maintain the locking order. > 2) Change relevant code so that we only return a valid vmbuffer if we could do >so without blocking / IO and, obviously, skip updating the VM if we >couldn't get the buffer. > I don't recall the exact details about the vm locking/pinning, but can't we just ensure we actually follow the proper locking order? I mean, this only deals with new pages, requested at line ~624: buffer = ReadBufferBI(relation, P_NEW, RBM_ZERO_AND_LOCK, bistate); Can't we ensure we actually lock the vm buffer too in ReadBufferBI, before calling ReadBufferExtended? Or am I confused and that's not possible for some reason? 
regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
meson/msys2 fails with plperl/Strawberry
Hi,

config/perl.m4 contains this:

    AC_MSG_CHECKING(for flags to link embedded Perl)
    if test "$PORTNAME" = "win32" ; then
        perl_lib=`basename $perl_archlibexp/CORE/perl[[5-9]]*.lib .lib`
        if test -e "$perl_archlibexp/CORE/$perl_lib.lib"; then
            perl_embed_ldflags="-L$perl_archlibexp/CORE -l$perl_lib"
        else
            perl_lib=`basename $perl_archlibexp/CORE/libperl[[5-9]]*.a .a | sed 's/^lib//'`
            if test -e "$perl_archlibexp/CORE/lib$perl_lib.a"; then
                perl_embed_ldflags="-L$perl_archlibexp/CORE -l$perl_lib"
            fi
        fi
    else
        pgac_tmp1=`$PERL -MExtUtils::Embed -e ldopts`
        pgac_tmp2=`$PERL -MConfig -e 'print "$Config{ccdlflags} $Config{ldflags}"'`
        perl_embed_ldflags=`echo X"$pgac_tmp1" | sed -e "s/^X//" -e "s%$pgac_tmp2%%"`
    fi
    AC_SUBST(perl_embed_ldflags)dnl

I don't see any equivalent in meson.build of the win32 logic, and thus I am getting a setup failure on fairywren when trying to move it to meson, while it will happily build with autoconf.

I would expect the ld flags to be "-LC:/STRAWB~1/perl/lib/CORE -lperl532"

(Off topic peeve - one of the things I dislike about meson is that the meson.build files are written in YA bespoke language).

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Re: [PATCH] Add XMLText function (SQL/XML X038)
On 25.03.23 12:53, Pavel Stehule wrote:

> On Sat, 25 Mar 2023 at 12:49, Jim Jones wrote:
>> Hi,
>>
>> This small patch proposes the implementation of the standard SQL/XML
>> function XMLText (X038). It basically converts a text parameter into an
>> xml text node. It uses the libxml2 function xmlEncodeSpecialChars[1] to
>> escape possible predefined entities.
>>
>> This patch also contains documentation and regression tests.
>>
>> Any thoughts?
>
> +1
>
> Pavel

Thanks! I just realized that I forgot to add a few examples to my last message :D

postgres=# SELECT xmltext('foo ´/[({bar?})]\`');
      xmltext
 foo ´/[({bar?})]\`
(1 row)

postgres=# SELECT xmltext('foo & ');
 xmltext
---
 foo bar
(1 row)
Re: [PATCH] Add XMLText function (SQL/XML X038)
On Sat, 25 Mar 2023 at 12:49, Jim Jones wrote:

> Hi,
>
> This small patch proposes the implementation of the standard SQL/XML
> function XMLText (X038). It basically converts a text parameter into an
> xml text node. It uses the libxml2 function xmlEncodeSpecialChars[1] to
> escape possible predefined entities.
>
> This patch also contains documentation and regression tests.
>
> Any thoughts?

+1

Pavel

> Best,
> Jim
>
> 1 -
> https://gnome.pages.gitlab.gnome.org/libxml2/devhelp/libxml2-entities.html#xmlEncodeSpecialChars
[PATCH] Add XMLText function (SQL/XML X038)
Hi,

This small patch proposes the implementation of the standard SQL/XML function XMLText (X038). It basically converts a text parameter into an xml text node. It uses the libxml2 function xmlEncodeSpecialChars[1] to escape possible predefined entities.

This patch also contains documentation and regression tests.

Any thoughts?

Best,
Jim

1 - https://gnome.pages.gitlab.gnome.org/libxml2/devhelp/libxml2-entities.html#xmlEncodeSpecialChars

From 84d6e026724d7e3869f28b09c686e2fc4da67873 Mon Sep 17 00:00:00 2001
From: Jim Jones
Date: Sat, 25 Mar 2023 12:24:49 +0100
Subject: [PATCH v1] Add XMLText function (SQL/XML X038)

This function implements the standard XMLText function, which converts text
into xml text nodes. It uses the libxml2 function xmlEncodeSpecialChars to
escape predefined entities (&"<>), so that those do not cause any conflict
when concatenating the text node output with existing xml documents.

This patch includes also documentation and regression tests.
---
 doc/src/sgml/func.sgml               | 30 +++
 src/backend/catalog/sql_features.txt |  2 +-
 src/backend/utils/adt/xml.c          | 30 +++
 src/include/catalog/pg_proc.dat      |  3 +++
 src/test/regress/expected/xml.out    | 36
 src/test/regress/expected/xml_1.out  | 23 ++
 src/test/regress/expected/xml_2.out  | 36
 src/test/regress/sql/xml.sql         |  7 ++
 8 files changed, 166 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index a3a13b895f..45195cf05b 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14001,6 +14001,36 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
     documents for processing in client applications.

+    xmltext
+
+    xmltext
+
+    xmltext ( text ) xml
+
+    The function xmltext returns an XML value with a single
+    text node containing the input argument as its content. Predefined entities
+    like ampersand (&), left and right angle brackets
+    (< and >), and quotation marks (")
+    are escaped.
+
+
+
+    Example:
+
+
+
+
    xmlcomment

diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index bb4c135a7f..82e6ab8258 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -595,7 +595,7 @@ X034	XMLAgg	YES
 X035	XMLAgg: ORDER BY option	YES
 X036	XMLComment	YES
 X037	XMLPI	YES
-X038	XMLText	NO
+X038	XMLText	YES
 X040	Basic table mapping	YES
 X041	Basic table mapping: null absent	YES
 X042	Basic table mapping: null as nil	YES

diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 15adbd6a01..6621d69219 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -47,6 +47,7 @@
 #ifdef USE_LIBXML
 #include
+#include
 #include
 #include
 #include
@@ -505,6 +506,10 @@ xmlcomment(PG_FUNCTION_ARGS)
 	appendStringInfoText(&buf, arg);
 	appendStringInfoString(&buf, "-->");
+
+
+
+
 	PG_RETURN_XML_P(stringinfo_to_xmltype(&buf));
 #else
 	NO_XML_SUPPORT();
@@ -5006,3 +5011,28 @@ XmlTableDestroyOpaque(TableFuncScanState *state)
 	NO_XML_SUPPORT();
 #endif							/* not USE_LIBXML */
 }
+
+Datum
+xmltext(PG_FUNCTION_ARGS)
+{
+#ifdef USE_LIBXML
+
+	text	   *arg = PG_GETARG_TEXT_PP(0);
+	text	   *result;
+	xmlChar    *xmlbuf = NULL;
+
+	xmlbuf = xmlEncodeSpecialChars(NULL, xml_text2xmlChar(arg));
+
+	Assert(xmlbuf);
+
+	result = cstring_to_text_with_len((const char *) xmlbuf, xmlStrlen(xmlbuf));
+
+	xmlFree(xmlbuf);
+
+	PG_RETURN_XML_P(result);
+
+#else
+	NO_XML_SUPPORT();
+	return 0;
+#endif
+}
\ No newline at end of file

diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 5cf87aeb2c..293ae66adc 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8772,6 +8772,9 @@
 { oid => '2922', descr => 'serialize an XML value to a character string',
   proname => 'text', prorettype => 'text', proargtypes => 'xml',
   prosrc => 'xmltotext' },
+{ oid => '3813', descr => 'generate XML text node',
+  proname => 'xmltext', proisstrict => 't', prorettype => 'xml',
+
+  proargtypes => 'text', prosrc => 'xmltext' },
 { oid => '2923', descr => 'map table contents to XML',
   proname => 'table_to_xml', procost => '100', provolatile => 's',

diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 398345ca67..93cbe4edf6 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -1785,3 +1785,39 @@ SELECT * FROM XMLTABLE('.' PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH
  | foo/
 (1 row)
 
+SELECT xmltext(NULL);
+ xmltext
+-
+
+(1 row)
+
+SELECT xmltext('');
+ xmltext
+-
+
+(1 row)
+
+SELECT xmltext(' ');
+ xmltext
+-
+
+(1 row)
+
+SELECT xmltext('foo
Re: Disable vacuuming to provide data history
On Sat, 25 Feb 2023 at 03:11 +0100, Vik Fearing wrote:
> On 2/24/23 22:06, Corey Huinker wrote:
> > The main design blocker for me is how to handle dump/restore.
>
> The standard does not bother thinking about that.

That would be a little difficult. Most probably you would need to operate on the history view to dump/restore.

Best regards,
Marek Mosiewicz
Re: Data is copied twice when specifying both child and parent table in publication
On Fri, Mar 24, 2023 at 2:36 PM wangw.f...@fujitsu.com wrote:
>
> On Fri, Mar 24, 2023 at 14:17 PM Amit Kapila wrote:
>
> And I found there is a problem in the three back-branch patches
> (HEAD_v21_0002*, REL15_* and REL14_*):
> In the function fetch_table_list, we use pg_partition_ancestors to get the
> list of tables from the publisher. But pg_partition_ancestors was introduced
> in v12, which means that if the publisher is v11 and the subscriber is v14+,
> this will cause an error.
>

Yeah, I am also not sure how to fix this for the back-branches. I didn't see any field report of this problem, so I am hesitant to make any complicated changes in the back-branches that would deviate them from HEAD. Let's try to fix it for HEAD at this stage.

I have slightly modified the attached patch. The changes are: (a) I have removed the retail pfrees added in pg_get_publication_tables(), as that memory will anyway be freed when we call SRF_RETURN_DONE(). It is also inconsistent to sometimes do a retail pfree and other times not in the same function. I have also referred to a few similar functions and didn't find them doing retail pfrees. (b) Changed the comments in a few places.

The patch looks good to me, so I am planning to push it sometime early next week unless there are more suggestions or comments.

--
With Regards,
Amit Kapila.

v23-0001-Avoid-syncing-data-twice-for-the-publish_via_par.patch
Description: Binary data
Re: Commitfest 2023-03 starting tomorrow!
On Fri, Mar 24, 2023 at 5:42 AM Greg Stark wrote:
>
> * asynchronous execution support for Custom Scan
>   - This is a pretty small isolated feature.

I was planning to review this patch, but unfortunately I did not have time for that, and I do not think I will for v16. So if anyone wants to work on this, please do so; if not, I want to in the next development cycle, for v17.

Best regards,
Etsuro Fujita
Re: running logical replication as the subscription owner
> As things stand today, if you think somebody's going to send you
> changes to tables other than the ones you intend to replicate, you
> could handle that by making sure that the user that owns the
> subscription only has permission to write to the tables that are
> expected to receive replicated data. It's a bit awkward to set up
> because you have to initially make the subscription owner a superuser
> and then later remove the superuser bit, so I think this is another
> argument for the pg_create_subscription patch posted elsewhere, but if
> you're a superuser yourself, you can do it. However, with this patch,
> that wouldn't work any more, because the permissions checks don't
> happen until after we've switched to the target role.

For my purposes I always trust the publisher; what I don't trust is the table owners. But I can indeed imagine scenarios where that's the other way around, and indeed you can protect against that currently, but not with your new patch. That seems fairly easily solvable though.

+	if (!member_can_set_role(context->save_userid, userid))
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 errmsg("role \"%s\" cannot SET ROLE to \"%s\"",
+						GetUserNameFromId(context->save_userid, false),
+						GetUserNameFromId(userid, false))));

If we don't throw an error here, but instead simply return, then the current behaviour is preserved and people can manually configure permissions to protect against an untrusted publisher. This would still mean that the table owners can escalate privileges to the subscription owner, but if that subscription owner actually has fewer privileges than the table owner, then you don't have that issue.
About the constant-TRUE clause in reconsider_outer_join_clauses
I happened to notice a constant-TRUE clause whose is_pushed_down flag is true while its required_relids do not include the OJ being formed, which seems abnormal to me. It turns out that this clause comes from reconsider_outer_join_clauses(), as a dummy replacement when we've generated a derived clause. The comment explains this as:

 * If we do generate a derived clause,
 * however, the outer-join clause is redundant.  We must still put some
 * clause into the regular processing, because otherwise the join will be
 * seen as a clauseless join and avoided during join order searching.
 * We handle this by generating a constant-TRUE clause that is marked with
 * required_relids that make it a join between the correct relations.

Should we instead mark the constant-TRUE clause with required_relids plus the OJ relid?

Besides, I think 'otherwise the join will be seen as a clauseless join' is not necessarily true, because the join may have other join clauses that do not have any match. As an example, consider

    select * from a left join b on a.i = b.i and a.j = b.j where a.i = 2;

So should we use 'may' rather than 'will' here?

Even if the join does become clauseless, it will end up being an unqualified nestloop. I think the join ordering algorithm will force this join to be formed when necessary. So I begin to wonder if it's really necessary to generate this dummy constant-TRUE clause.

Thanks
Richard