Re: Server crash on RHEL 9/s390x platform against PG16
Few more details on this:

(gdb) p val
$1 = 0
(gdb) p i
$2 = 3
(gdb) f 3
#3  0x01a1ef70 in ExecCopySlotMinimalTuple (slot=0x202e4f8) at ../../../../src/include/executor/tuptable.h:472
472             return slot->tts_ops->copy_minimal_tuple(slot);
(gdb) p *slot
$3 = {type = T_TupleTableSlot, tts_flags = 16, tts_nvalid = 8, tts_ops = 0x1b6dcc8, tts_tupleDescriptor = 0x202e0e8, tts_values = 0x202e540, tts_isnull = 0x202e580, tts_mcxt = 0x1f54550, tts_tid = {ip_blkid = {bi_hi = 65535, bi_lo = 65535}, ip_posid = 0}, tts_tableOid = 0}
(gdb) p *slot->tts_tupleDescriptor
$2 = {natts = 8, tdtypeid = 2249, tdtypmod = -1, tdrefcount = -1, constr = 0x0, attrs = 0x202cd28}
(gdb) p slot.tts_values[3]
$4 = 0
(gdb) p slot.tts_values[2]
$5 = 1
(gdb) p slot.tts_values[1]
$6 = 34027556

As per the result slot, it has a 0 value for the third attribute (column "label"). I'm testing this in a Docker container and am facing some issues with gdb, hence I was not able to debug it further.

Here is the explain plan:

postgres=# explain (verbose, costs off) SELECT * FROM rm32044_t1 LEFT JOIN rm32044_t2 ON rm32044_t1.pkey = rm32044_t2.pkey, rm32044_t3 LEFT JOIN rm32044_t4 ON rm32044_t3.pkey = rm32044_t4.pkey order by rm32044_t1.pkey,label,hidden;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Incremental Sort
   Output: rm32044_t1.pkey, rm32044_t1.val, rm32044_t2.pkey, rm32044_t2.label, rm32044_t2.hidden, rm32044_t3.pkey, rm32044_t3.val, rm32044_t4.pkey
   Sort Key: rm32044_t1.pkey, rm32044_t2.label, rm32044_t2.hidden
   Presorted Key: rm32044_t1.pkey
   ->  Merge Left Join
         Output: rm32044_t1.pkey, rm32044_t1.val, rm32044_t2.pkey, rm32044_t2.label, rm32044_t2.hidden, rm32044_t3.pkey, rm32044_t3.val, rm32044_t4.pkey
         Merge Cond: (rm32044_t1.pkey = rm32044_t2.pkey)
         ->  Sort
               Output: rm32044_t3.pkey, rm32044_t3.val, rm32044_t4.pkey, rm32044_t1.pkey, rm32044_t1.val
               Sort Key: rm32044_t1.pkey
               ->  Nested Loop
                     Output: rm32044_t3.pkey, rm32044_t3.val, rm32044_t4.pkey, rm32044_t1.pkey, rm32044_t1.val
                     ->  Merge Left Join
                           Output: rm32044_t3.pkey, rm32044_t3.val, rm32044_t4.pkey
                           Merge Cond: (rm32044_t3.pkey = rm32044_t4.pkey)
                           ->  Sort
                                 Output: rm32044_t3.pkey, rm32044_t3.val
                                 Sort Key: rm32044_t3.pkey
                                 ->  Seq Scan on public.rm32044_t3
                                       Output: rm32044_t3.pkey, rm32044_t3.val
                           ->  Sort
                                 Output: rm32044_t4.pkey
                                 Sort Key: rm32044_t4.pkey
                                 ->  Seq Scan on public.rm32044_t4
                                       Output: rm32044_t4.pkey
                     ->  Materialize
                           Output: rm32044_t1.pkey, rm32044_t1.val
                           ->  Seq Scan on public.rm32044_t1
                                 Output: rm32044_t1.pkey, rm32044_t1.val
         ->  Sort
               Output: rm32044_t2.pkey, rm32044_t2.label, rm32044_t2.hidden
               Sort Key: rm32044_t2.pkey
               ->  Seq Scan on public.rm32044_t2
                     Output: rm32044_t2.pkey, rm32044_t2.label, rm32044_t2.hidden
(34 rows)

It seems like while building the inner slot for the merge join, the value for attnum 1 is not getting fetched correctly.

On Tue, Sep 12, 2023 at 3:27 PM Suraj Kharage <suraj.khar...@enterprisedb.com> wrote:
> Hi,
>
> Found server crash on RHEL 9/s390x platform with below test case -
>
> Machine details:
>
> [edb@9428da9d2137 postgres]$ cat /etc/redhat-release
> AlmaLinux release 9.2 (Turquoise Kodkod)
> [edb@9428da9d2137 postgres]$ lscpu
> Architecture:        s390x
>   CPU op-mode(s):    32-bit, 64-bit
>   Address sizes:     39 bits physical, 48 bits virtual
>   Byte Order:        Big Endian
>
> Configure command:
> ./configure --prefix=/home/edb/postgres/ --with-lz4 --with-zstd --with-llvm --with-perl --with-python --with-tcl --with-openssl --enable-nls --with-libxml --with-libxslt --with-systemd --with-libcurl --without-icu --enable-debug --enable-cassert --with-pgport=5414
>
> Test case:
> CREATE TABLE rm32044_t1
> (
>     pkey integer,
>     val text
> );
> CREATE TABLE rm32044_t2
> (
>     pkey integer,
>     label text,
>     hidden boolean
> );
> CREATE TABLE rm32044_t3
> (
>     pkey integer,
>     val integer
> );
> CREATE TABLE rm32044_t4
> (
>     pkey integer
> );
> insert into rm32044_t1 values ( 1 , 'row1');
> insert into rm32044_t1 values ( 2 , 'row2');
> insert into rm32044_t2 values ( 1 , 'hidden', true);
> insert into rm32044_t2 values ( 2 , 'visible', false);
> insert into rm32044_t3 values (1 , 1);
Re: Speed up transaction completion faster after many relations are accessed in a transaction
On Fri, 15 Sept 2023 at 22:37, Heikki Linnakangas wrote:
> > I've added a call to LockAssertNoneHeld(false) in there.
>
> I don't see it in the patch?

hmm. I must've git format-patch before committing that part. I'll try that again... see attached.

David

v8-0001-wip-resowner-lock-release-all.patch
Description: Binary data
Re: Add 'worker_type' to pg_stat_subscription
On Mon, Sep 18, 2023 at 6:10 AM Peter Smith wrote:
>
> IIUC some future feature syncing of sequences is likely to share a lot
> of the tablesync worker code (maybe it is only differentiated by the
> relid being for a RELKIND_SEQUENCE?).
>
> The original intent of this stats worker-type patch was to be able to
> easily know the type of the process without having to dig through
> other attributes (like relid etc.) to infer it.
>

That makes sense and I think it will probably be helpful in debugging. For example, I am not sure the following and similar changes in the patch are a good idea:

 if (am_tablesync_worker())
     ereport(LOG,
-            (errmsg("logical replication table synchronization worker for subscription \"%s\", table \"%s\" has started",
+            (errmsg("logical replication synchronization worker for subscription \"%s\", table \"%s\" has started",

I think it would sometimes be helpful in debugging to know the type of sync worker, so keeping the type in the above message would be helpful.

> If you feel
> differentiating kinds of syncing processes won't be of interest to
> users then just generically calling it "synchronization" is fine by
> me. OTOH, if users might care what 'kind' of syncing it is, perhaps
> leaving the stats attribute as "table synchronization" (and some
> future patch would add "sequence synchronization") is better.
>

Earlier, I thought it would be better to keep it generic but after seeing your point and the latest changes in the patch it seems differentiating between types of sync workers would be a good idea.

-- 
With Regards,
Amit Kapila.
Re: make add_paths_to_append_rel aware of startup cost
On Mon, 18 Sept 2023 at 01:42, Andy Fan wrote:
> On Fri, Sep 15, 2023 at 3:15 PM David Rowley wrote:
>> Instead of doing that, why don't you just create a completely new
>> AppendPath containing all the cheapest_startup_paths and add that to
>> the append rel. You can optimise this and only do it when
>> rel->consider_startup is true.
>>
>> Does the attached do anything less than what your patch does?
>
> We can work like this, but there is one difference from what
> my current patch does. It is cheapest_startup_path vs cheapest
> fraction path. For example if we have the following 3 paths with
> all of the estimated rows is 100 and the tuple_fraction is 10.

Yeah, it's true that the version I wrote didn't consider the fractional part, but I didn't really see it as worse than what you did. It looks like you're assuming that every append child will have the same number of tuples read from it, but it seems to me that it would only be valid to use the fractional part for the first child. The path chosen for subsequent child paths would, if done correctly, need to account for the estimated rows from the previous child paths. It's not valid here to copy the code in generate_orderedappend_paths() as MergeAppend won't necessarily exhaust the first child subpath first like Append will.

> Path 1: startup_cost = 60, total_cost = 80 -- cheapest total cost.
> Path 2: startup_cost = 10, total_cost = 1000 -- cheapest startup cost
> Path 3: startup_cost = 20, total_cost = 90 -- cheapest fractional cost
>
> So with the patch you propose, Path 1 & Path 3 are chosen to build
> append path. but with my current patch, Only path 3 is kept. IIUC,
> path 3 should be the best one in this case.

I assume you mean mine would build AppendPaths for 1+2, not 1+3.

You mentioned:
> I just find it is hard to build the case for Identifier 2/3/4.
I wonder if this is because generate_orderedappend_paths() handles startup paths and most cases that need a cheap startup plan will require some sort of pathkeys. The example you mentioned of:

(select * from tenk1 order by hundred) UNION ALL (select * from tenk1 order by hundred) limit 3;

I don't find this to be a compellingly real-world case. The planner is under no obligation to output rows from the 1st branch of the UNION ALL before the 2nd one. If the user cared about that then they'd have instead added a top-level ORDER BY, in which case the planner seems happy to use the index scan:

regression=# explain (costs off) (select * from tenk1) UNION ALL (select * from tenk1) order by hundred limit 3;
                     QUERY PLAN
-----------------------------------------------------
 Limit
   ->  Merge Append
         Sort Key: tenk1.hundred
         ->  Index Scan using tenk1_hundred on tenk1
         ->  Index Scan using tenk1_hundred on tenk1 tenk1_1

It would be good if you could provide a bit more detail on the cases you want to improve here. For example, in your #4 case, you have "WHERE xxx". I don't know if "xxx" is just a base qual or if there's a correlation to the outer query in there.

Another concern I have with your patch is that it seems to be under the impression that there being further joins to evaluate at this query level is the only reason that we would have to pull more than the tuple fraction number of rows from the query. What gives you the confidence that's the only reason we may want to pull more than the tuple fraction of tuples from the append child?

David
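For concreteness, the fractional-cost arithmetic behind the three-path example above can be sketched with a simplified model (this is an illustrative Python sketch, not the planner's actual C code; it assumes fetching a fraction of a path's rows costs the startup cost plus a linear share of the run cost, loosely analogous to the planner's fractional-cost comparisons):

```python
# Simplified cost model: fetching tuples_wanted of a path's estimated rows
# costs startup_cost plus a proportional share of the run cost.
def fractional_cost(startup, total, rows, tuples_wanted):
    f = min(tuples_wanted / rows, 1.0)
    return startup + f * (total - startup)

# The example upthread: 100 estimated rows, tuple_fraction of 10.
paths = {
    "path1": (60, 80),    # cheapest total cost
    "path2": (10, 1000),  # cheapest startup cost
    "path3": (20, 90),    # cheapest fractional cost
}

costs = {name: fractional_cost(s, t, rows=100, tuples_wanted=10)
         for name, (s, t) in paths.items()}
print(costs)                      # {'path1': 62.0, 'path2': 109.0, 'path3': 27.0}
print(min(costs, key=costs.get))  # path3
```

Under this model, Path 3 is indeed the cheapest for fetching 10 of the 100 rows, even though it is neither the cheapest-startup nor the cheapest-total path.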
Re: Add 'worker_type' to pg_stat_subscription
On Sun, Sep 17, 2023 at 2:10 AM Nathan Bossart wrote:
>
> On Sat, Sep 16, 2023 at 06:09:48PM +0530, Amit Kapila wrote:
>
> This still leaves the possibility for confusion with the documentation's
> use of "leader apply worker", but I haven't touched that for now.
>

We may want to fix that separately but as you have raised here, I found the following two places in docs which could be a bit confusing.

"Specifies maximum number of logical replication workers. This includes leader apply workers, parallel apply workers, and table synchronization"

"OID of the relation that the worker is synchronizing; NULL for the leader apply worker and parallel apply workers"

One simple idea to reduce confusion could be to use (leader) in the above two places. Do you see any other place which could be confusing and what do you suggest to fix it?

-- 
With Regards,
Amit Kapila.
Re: SQL:2011 application time
On Fri, Sep 15, 2023 at 12:11 AM Paul Jungwirth wrote:
>
> I'll keep working on a patch to support multiple range keys, but I
> wanted to work through the rest of the feedback first. Also there is
> some fixing to do with partitions I believe, and then I'll finish the
> PERIOD support. So this v14 patch is just some minor fixes & tweaks from
> September feedback.
>

Small issues I found so far in v14:

The IndexInfo struct definition comment still has a Temporal-related comment; it should be removed.

catalog-pg-index.html has no indperiod doc entry; also in table pg_index, column indperiod is a junk value now. I think in UpdateIndexRelation, you need to add indperiod when building the pg_index tuple, similar to what you did in CreateConstraintEntry.

It seems that to make the following query work, we need to bring btree_gist related code into core?

CREATE TABLE temporal_fk_rng2rng22 (id int8, valid_at int4range, unique (id, valid_at WITHOUT OVERLAPS));

/*
 * pg_period definition.  cpp turns this into
 * typedef struct FormData_pg_period
 */
CATALOG(pg_period,8000,PeriodRelationId)
{
    Oid       oid;            /* OID of the period */
    NameData  pername;        /* name of period */
    Oid       perrelid;       /* OID of relation containing this period */
    int16     perstart;       /* column for start value */
    int16     perend;         /* column for end value */
    int16     perrange;       /* column for range value */
    Oid       perconstraint;  /* OID of (start < end) constraint */
} FormData_pg_period;

No idea what the above comment "cpp" refers to. The comment on the sixth field in FormData_pg_period, perrange, conflicts with catalogs.sgml:

>> perrngtype oid (references pg_type.oid)
>> The OID of the range type associated with this period

create table pt (id integer, ds date, de date, period for p (ds, de));

SELECT table_name, column_name, column_default, is_nullable, is_generated, generation_expression
FROM information_schema.columns
WHERE table_name = 'pt'
ORDER BY 1, 2;

The hidden generated column (p) is_nullable returns NO, but ds, de is_nullable both return YES.
So column p is_nullable should return YES?
Re: to_regtype() Raises Error
On Sunday, September 17, 2023, Chapman Flack wrote:
>
> In this one, both identifiers are part of the type name, and the
> separator a little more flamboyant.
>
> select to_regtype('character /* hi!
> am I part of the type name? /* what, me too? */ ok! */ -- huh!
> varying');
>     to_regtype
> -------------------
>  character varying
>

So, maybe we should be saying:

    Parses a string of text, extracts a potential type name from it, and
    translates that name into an OID. Failure to extract a valid potential
    type name results in an error while a failure to determine that the
    extracted name is known to the system results in a null output.

I take specific exception to describing your example as a "textual type name".

David J.
Re: to_regtype() Raises Error
On 2023-09-17 21:58, David G. Johnston wrote:
> There is some ambiguity possible when doing that though:
> create type "interval second" as (x int, y int);
> select to_regtype('interval second'); --> interval

Not ambiguity really: that composite type you just made was named with a single <delimited identifier>, which is one token. (Also, being delimited makes it case-sensitive, and always distinct from an SQL keyword; consider the different types char and "char". Ah, that SQL committee!)

The argument to regtype there is a single, case-insensitive, <regular identifier>, a <separator>, and another <regular identifier>, where in this case the first identifier happens to name a type, the second one happens to be a typmod, and the separator is rather simple as <separator>s go.

In this one, both identifiers are part of the type name, and the separator a little more flamboyant.

select to_regtype('character /* hi!
am I part of the type name? /* what, me too? */ ok! */ -- huh!
varying');
    to_regtype
-------------------
 character varying

As the backend already has one parser that knows all those lexical and grammar productions, I don't imagine it would be very appealing to have a second implementation of some of them. Obviously, to_regtype could add some simplifying requirements (like "only whitespace for the separator please"), but as you see above, it currently doesn't.

Regards,
-Chap
Re: to_regtype() Raises Error
On Sun, Sep 17, 2023 at 6:25 PM Chapman Flack wrote: > On 2023-09-17 20:58, David G. Johnston wrote: > > Put differently, there is no syntax involved when the value being > > provided > > is the text literal name of a type as it is stored in pg_type.typname, > > so > > the presence of a syntax error is wrong. > > Well, the situation is a little weirder than that, because of the > existence > of SQL standard types with multiple-token names; when you provide the > value 'character varying', you are not providing a name found in > pg_type.typname (while, if you call the same type 'varchar', you are). > For 'character varying', the parser is necessarily involved. > Why don't we just populate pg_type with these standard mandated names too? > > The case with 'interval second' is similar, but even different still; > that isn't a multiple-token type name, but a type name with a > standard-specified bespoke way of writing a typmod. Another place > the parser is necessarily involved, doing another job. (AFAICT, > to_regtype is happy with a typmod attached to the input, and > happily ignores it, so to_regtype('interval second') gives you > interval, to_regtype('character varying(20)') gives you > character varying, and so on.) > > Seems doable to teach the lookup code that suffixes of the form (n) should be ignored when matching the base type, plus maybe some kind of special case for standard mandated typmods on their specific types. There is some ambiguity possible when doing that though: create type "interval second" as (x int, y int); select to_regtype('interval second'); --> interval Or just write a function that deals with the known forms dictated by the standard and delegate checking for valid names against that first before consulting pg_type? It might be some code duplication but it isn't like it's a quickly moving target and I have to imagine it would be faster and allow us to readily implement the soft-error contract. David J.
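The "ignore trailing (n) suffixes, then delegate known standard spellings" idea above can be roughly illustrated as follows (an illustrative Python sketch only; the real implementation would live in the backend's C lookup code, and the helper name, the `STANDARD_NAMES` table, and the `KNOWN_TYPES` stand-in for pg_type are all hypothetical):

```python
import re

# Hypothetical matching rules: strip a trailing "(n)" or "(n,m)" typmod,
# fold case, collapse whitespace separators, and map standard-mandated
# multi-token spellings to their canonical names before consulting the
# catalog (represented here by a simple set).
STANDARD_NAMES = {
    "character varying": "varchar",
    "interval second": "interval",  # standard typmod-like suffix on interval
}

KNOWN_TYPES = {"varchar", "interval", "int4", "text"}  # stand-in for pg_type

def to_regtype_sketch(name):
    # Ignore a trailing typmod, e.g. "character varying(20)"
    base = re.sub(r"\(\s*\d+(\s*,\s*\d+)?\s*\)\s*$", "", name).strip().lower()
    base = re.sub(r"\s+", " ", base)          # only whitespace as separator
    base = STANDARD_NAMES.get(base, base)     # standard-mandated spellings
    return base if base in KNOWN_TYPES else None  # NULL instead of error

print(to_regtype_sketch("character varying(20)"))  # varchar
print(to_regtype_sketch("interval second"))        # interval
print(to_regtype_sketch("clam bake"))              # None
```

Note how unknown or malformed input falls through to None rather than raising, which is the soft-error contract being discussed.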
Re: to_regtype() Raises Error
On 2023-09-17 20:58, David G. Johnston wrote:
> Put differently, there is no syntax involved when the value being
> provided is the text literal name of a type as it is stored in
> pg_type.typname, so the presence of a syntax error is wrong.

Well, the situation is a little weirder than that, because of the existence of SQL standard types with multiple-token names; when you provide the value 'character varying', you are not providing a name found in pg_type.typname (while, if you call the same type 'varchar', you are). For 'character varying', the parser is necessarily involved.

The case with 'interval second' is similar, but even different still; that isn't a multiple-token type name, but a type name with a standard-specified bespoke way of writing a typmod. Another place the parser is necessarily involved, doing another job. (AFAICT, to_regtype is happy with a typmod attached to the input, and happily ignores it, so to_regtype('interval second') gives you interval, to_regtype('character varying(20)') gives you character varying, and so on.)

Regards,
-Chap
Re: to_regtype() Raises Error
On Sun, Sep 17, 2023 at 5:34 PM Erik Wienhold wrote: > On 18/09/2023 00:57 CEST Vik Fearing wrote: > > > On 9/18/23 00:41, Erik Wienhold wrote: > > > On 18/09/2023 00:13 CEST David E. Wheeler > wrote: > > > > > >> david=# select to_regtype('inteval second'); > > >> ERROR: syntax error at or near "second" > > >> LINE 1: select to_regtype('inteval second'); > > >> ^ > > >> CONTEXT: invalid type name "inteval second” > > > > > > Probably a typo and you meant 'interval second' which works. > > > > No, that is precisely the point. The result should be null instead of > > an error. > > Well, the docs say "return NULL rather than throwing an error if the name > is > not found". > To me "name is not found" implies that it has to be valid syntax > first to even have a name that can be looked up. > Except there is nothing in the typed literal value that is actually a syntactical problem from the perspective of the user. IOW, the following work just fine: select to_regtype('character varying'), to_regtype('interval second'); No need for quotes and the space doesn't produce an issue (and in fact adding double quotes to the above causes them to not match since the quoting is taken literally and not syntactically) The failure to return NULL exposes an implementation detail that we shouldn't be exposing. As Tom said, maybe doing better is too hard to be worthwhile, but that doesn't mean our current behavior is somehow correct. Put differently, there is no syntax involved when the value being provided is the text literal name of a type as it is stored in pg_type.typname, so the presence of a syntax error is wrong. David J.
Re: Add 'worker_type' to pg_stat_subscription
IIUC some future feature syncing of sequences is likely to share a lot of the tablesync worker code (maybe it is only differentiated by the relid being for a RELKIND_SEQUENCE?).

The original intent of this stats worker-type patch was to be able to easily know the type of the process without having to dig through other attributes (like relid etc.) to infer it.

If you feel differentiating kinds of syncing processes won't be of interest to users then just generically calling it "synchronization" is fine by me. OTOH, if users might care what 'kind' of syncing it is, perhaps leaving the stats attribute as "table synchronization" (and some future patch would add "sequence synchronization") is better.

TBH, I am not sure which option is best, so I am happy to go with the consensus.

-- 
Kind Regards,
Peter Smith.
Fujitsu Australia
Re: to_regtype() Raises Error
On Mon, 2023-09-18 at 00:57 +0200, Vik Fearing wrote:
> On 9/18/23 00:41, Erik Wienhold wrote:
> > On 18/09/2023 00:13 CEST David E. Wheeler wrote:
> > > The docs for `to_regtype()` say, "this function will return NULL rather
> > > than throwing an error if the name is not found." And it's true most of
> > > the time:
> > >
> > > david=# select to_regtype('foo'), to_regtype('clam');
> > >  to_regtype | to_regtype
> > > ------------+------------
> > >  [null]     | [null]
> > >
> > > But not others:
> > >
> > > david=# select to_regtype('inteval second');
> > > ERROR:  syntax error at or near "second"
> > > LINE 1: select to_regtype('inteval second');
> > >                           ^
> > > CONTEXT:  invalid type name "inteval second"
> >
> > Probably a typo and you meant 'interval second' which works.
>
> No, that is precisely the point. The result should be null instead of
> an error.

Right. I debugged into this, and found this comment to typeStringToTypeName():

 * If the string cannot be parsed as a type, an error is raised,
 * unless escontext is an ErrorSaveContext node, in which case we may
 * fill that and return NULL.  But note that the ErrorSaveContext option
 * is mostly aspirational at present: errors detected by the main
 * grammar, rather than here, will still be thrown.

"escontext" is an ErrorSaveContext node, and it is the parser failing. Not sure if we can do anything about that or if it is worth the effort. Perhaps the documentation could reflect the implementation.

Yours,
Laurenz Albe
Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan
On Wed, Jul 26, 2023 at 6:41 AM Peter Geoghegan wrote: > > MDAM seems to require exponential storage for "scan key operations" > > for conditions on N columns (to be precise, the product of the number > > of distinct conditions on each column); e.g. an index on mytable > > (a,b,c,d,e,f,g,h) with conditions "a IN (1, 2) AND b IN (1, 2) AND ... > > AND h IN (1, 2)" would require 2^8 entries. > What you describe is a problem in theory, but I doubt that it's a > problem in practice. You don't actually have to materialize the > predicates up-front, or at all. Plus you can skip over them using the > next index tuple. So skipping works both ways. Attached is v2, which makes all array key advancement take place using the "next index tuple" approach (using binary searches to find array keys using index tuple values). This approach was necessary for fairly mundane reasons (it limits the amount of work required while holding a buffer lock), but it also solves quite a few other problems that I find far more interesting. It's easy to imagine the state machine from v2 of the patch being extended for skip scan. My approach "abstracts away" the arrays. For skip scan, it would more or less behave as if the user had written a query "WHERE a in () AND b = 5 ... " -- without actually knowing what the so-called array keys for the high-order skipped column are (not up front, at least). We'd only need to track the current "array key" for the scan key on the skipped column, "a". The state machine would notice when the scan had reached the next-greatest "a" value in the index (whatever that might be), and then make that the current value. Finally, the state machine would effectively instruct its caller to consider repositioning the scan via a new descent of the index. In other words, almost everything for skip scan would work just like regular SAOPs -- and any differences would be well encapsulated. But it's not just skip scan. 
This approach also enables thinking of SAOP index scans (using nbtree) as just another type of indexable clause, without any special restrictions (compared to true indexable operators such as "=", say). Particularly in the planner. That was always the general thrust of teaching nbtree about SAOPs, from the start. But it's something that should be totally embraced IMV. That's just what the patch proposes to do. In particular, the patch now:

1. Entirely removes the long-standing restriction on generating path keys for index paths with SAOPs, even when there are inequalities on a high order column present. You can mix SAOPs together with other clause types, arbitrarily, and everything still works and works efficiently.

For example, the regression test expected output for this query/test (from bugfix commit 807a40c5) is updated by the patch, as shown here:

explain (costs off) SELECT thousand, tenthous FROM tenk1 WHERE thousand < 2 AND tenthous IN (1001,3000) ORDER BY thousand;

-                                  QUERY PLAN
--------------------------------------------------------------------------------
- Sort
-   Sort Key: thousand
-   ->  Index Scan using tenk1_thous_tenthous on tenk1
-         Index Cond: ((thousand < 2) AND (tenthous = ANY ('{1001,3000}'::integer[])))
-(4 rows)
+                                  QUERY PLAN
+--------------------------------------------------------------------------------
+ Index Scan using tenk1_thous_tenthous on tenk1
+   Index Cond: ((thousand < 2) AND (tenthous = ANY ('{1001,3000}'::integer[])))
+(2 rows)

We don't need a sort node anymore -- even though the leading column here (thousand) uses an inequality, a particularly tricky case. Now it's an index scan, much like any other, with no particular restrictions caused by using a SAOP.

2. Adds an nbtree strategy for non-required equality array scan keys, which is built on the same state machine, with only minor differences to deal with column values "appearing out of key space order".

3. Simplifies the optimizer side of things by consistently avoiding filter quals (except when it's truly unavoidable).
The optimizer doesn't even consider alternative index paths with filter quals for lower-order SAOP columns, because they have no possible advantage anymore. On the other hand, as we saw already, upthread, filter quals have huge disadvantages. By always using true index quals, we automatically avoid any question of getting excessive amounts of heap page accesses just to eliminate non-matching rows. AFAICT we don't need to make a trade-off here. The first version of the patch added some crufty code to the optimizer, to account for various restrictions on sort order. This revised version actually removes existing cruft from the same place (indxpath.c) instead. Items 1, 2, and 3 are all closely related. Take the query I've shown for item 1. Bugfix commit 807a40c5 (which added the test query in question) dealt with an oversight in the then-recent original nbtree SAOP patch (commit 9e8da0f7): when nbtree
Re: to_regtype() Raises Error
On Sep 17, 2023, at 19:28, Tom Lane wrote:

>> No, that is precisely the point. The result should be null instead of
>> an error.
>
> Yeah, ideally so, but the cost/benefit of making it happen seems
> pretty unattractive for now. See the soft-errors thread at [1],
> particularly [2] (but searching in that thread for references to
> regclassin, regtypein, and to_reg* will find additional detail).

For my purposes I'm wrapping it in an exception-catching PL/pgSQL function, but it might be worth noting in the docs the conditions in which it *will* raise an error.

Best,

David

signature.asc
Description: Message signed with OpenPGP
Re: to_regtype() Raises Error
On 18/09/2023 00:57 CEST Vik Fearing wrote:

> On 9/18/23 00:41, Erik Wienhold wrote:
> > On 18/09/2023 00:13 CEST David E. Wheeler wrote:
> >
> >> david=# select to_regtype('inteval second');
> >> ERROR:  syntax error at or near "second"
> >> LINE 1: select to_regtype('inteval second');
> >>                           ^
> >> CONTEXT:  invalid type name "inteval second"
> >
> > Probably a typo and you meant 'interval second' which works.
>
> No, that is precisely the point. The result should be null instead of
> an error.

Well, the docs say "return NULL rather than throwing an error if the name is not found". To me "name is not found" implies that it has to be valid syntax first to even have a name that can be looked up. String 'inteval second' is a syntax error when interpreted as a type name. The same when I want to create a table with that typo:

test=# create table t (a inteval second);
ERROR:  syntax error at or near "second"
LINE 1: create table t (a inteval second);

And a custom function is always an option:

create function to_regtype_lax(name text) returns regtype
language plpgsql as $$
begin
    return to_regtype(name);
exception
    when others then
        return null;
end
$$;

test=# select to_regtype_lax('inteval second');
 to_regtype_lax
----------------

(1 row)

-- 
Erik
Re: to_regtype() Raises Error
Vik Fearing writes: > No, that is precisely the point. The result should be null instead of > an error. Yeah, ideally so, but the cost/benefit of making it happen seems pretty unattractive for now. See the soft-errors thread at [1], particularly [2] (but searching in that thread for references to regclassin, regtypein, and to_reg* will find additional detail). regards, tom lane [1] https://www.postgresql.org/message-id/flat/30df-7382-bf87-9737-340ba096e034%40postgrespro.ru [2] https://www.postgresql.org/message-id/3342239.1671988406%40sss.pgh.pa.us
Re: to_regtype() Raises Error
On 9/18/23 00:41, Erik Wienhold wrote:
> On 18/09/2023 00:13 CEST David E. Wheeler wrote:
> > The docs for `to_regtype()` say, "this function will return NULL rather
> > than throwing an error if the name is not found." And it's true most of
> > the time:
> >
> > david=# select to_regtype('foo'), to_regtype('clam');
> >  to_regtype | to_regtype
> > ------------+------------
> >  [null]     | [null]
> >
> > But not others:
> >
> > david=# select to_regtype('inteval second');
> > ERROR:  syntax error at or near "second"
> > LINE 1: select to_regtype('inteval second');
> >                           ^
> > CONTEXT:  invalid type name "inteval second"
>
> Probably a typo and you meant 'interval second' which works.

No, that is precisely the point. The result should be null instead of an error.

-- 
Vik Fearing
Re: to_regtype() Raises Error
On 18/09/2023 00:13 CEST David E. Wheeler wrote:
> The docs for `to_regtype()` say, "this function will return NULL rather than
> throwing an error if the name is not found." And it's true most of the time:
>
> david=# select to_regtype('foo'), to_regtype('clam');
>  to_regtype | to_regtype
> ------------+------------
>  [null]     | [null]
>
> But not others:
>
> david=# select to_regtype('inteval second');
> ERROR:  syntax error at or near "second"
> LINE 1: select to_regtype('inteval second');
>                           ^
> CONTEXT:  invalid type name "inteval second"

Probably a typo and you meant 'interval second' which works.

> I presume this has something to do with not catching errors from the parser?
>
> david=# select to_regtype('clam bake');
> ERROR:  syntax error at or near "bake"
> LINE 1: select to_regtype('clam bake');
>                           ^
> CONTEXT:  invalid type name "clam bake"

Double-quoting the type name to treat it as an identifier works:

test=# select to_regtype('"clam bake"');
 to_regtype
------------

(1 row)

So it's basically a matter of keywords vs. identifiers.

-- 
Erik
to_regtype() Raises Error
The docs for `to_regtype()` say, “this function will return NULL rather than throwing an error if the name is not found.” And it’s true most of the time:

david=# select to_regtype('foo'), to_regtype('clam');
 to_regtype | to_regtype
------------+------------
 [null]     | [null]

But not others:

david=# select to_regtype('inteval second');
ERROR:  syntax error at or near "second"
LINE 1: select to_regtype('inteval second');
                          ^
CONTEXT:  invalid type name "inteval second"

I presume this has something to do with not catching errors from the parser?

david=# select to_regtype('clam bake');
ERROR:  syntax error at or near "bake"
LINE 1: select to_regtype('clam bake');
                          ^
CONTEXT:  invalid type name "clam bake"

Best,

David

signature.asc
Description: Message signed with OpenPGP
Re: JSON Path and GIN Questions
On Sep 17, 2023, at 12:20, Tom Lane wrote: > After thinking about it for awhile, I think we need some more > discursive explanation of what's allowed, perhaps along the lines > of the attached. (I still can't shake the feeling that this is > duplicative; but I can't find anything comparable until you get > into the weeds in Section V.) > > I put the new text at the end of section 11.1, but perhaps it > belongs a little further up in that section; it seems more > important than some of the preceding paras. I think this is useful, but also that it’s worth calling out explicitly that functions do not count as indexable operators. True by definition, of course, but I at least had assumed that since an operator is, in a sense, syntax sugar for a function call, they are in some sense the same thing. A header might be useful, something like “What Counts as an indexable expression”. Best, David signature.asc Description: Message signed with OpenPGP
Re: Fix output of zero privileges in psql
On 17/09/2023 21:31 CEST Erik Wienhold wrote:

> This affects the following meta commands:
>
>     \db+ \dD+ \df+ \dL+ \dl+ \dn+ \dp \dT+ \l

also \des+ and \dew+

--
Erik
Fix output of zero privileges in psql
I wrote a patch to change psql's display of zero privileges after a
user's reported confusion with the psql output for zero vs. default
privileges [1]. Admittedly, zero privileges is a rare use case [2] but I
think psql should not confuse the user in the off chance that this
happens.

With this change psql now prints "(none)" for zero privileges instead of
nothing. This affects the following meta commands:

    \db+ \dD+ \df+ \dL+ \dl+ \dn+ \dp \dT+ \l

Default privileges start as NULL::aclitem[] in various catalog columns
but revoking the default privileges leaves an empty aclitem array. Using
\pset null '(null)' as a workaround to spot default privileges does not
work because the meta commands ignore this setting.

The privileges shown by \dconfig+ and \ddp as well as the column
privileges shown by \dp are not affected by this change because those
privileges are reset to NULL instead of leaving empty arrays.

Commands \des+ and \dew+ are not covered in src/test/regress because no
foreign data wrapper is available at this point to create a foreign
server.

[1] https://www.postgresql.org/message-id/efdd465d-a795-6188-7f71-7cdb4b2be031%40mtneva.com
[2] https://www.postgresql.org/message-id/31246.1693337238%40sss.pgh.pa.us

--
Erik

From 16af995acb4c0e5fb5981308610a76b7e87c3358 Mon Sep 17 00:00:00 2001
From: Erik Wienhold
Date: Sun, 17 Sep 2023 20:54:48 +0200
Subject: [PATCH v1] Fix output of zero privileges in psql

Print "(none)" for zero privileges instead of nothing so that the user
is able to distinguish zero from default privileges. This affects the
following meta commands:

    \db+ \dD+ \des+ \dew+ \df+ \dL+ \dl+ \dn+ \dp \dT+ \l

Default privileges start as NULL::aclitem[] in various catalog columns
but revoking the default privileges leaves an empty aclitem array. Using
\pset null '(null)' as a workaround to spot default privileges does not
work because the meta commands ignore this setting.
The privileges shown by \dconfig+ and \ddp as well as the column
privileges shown by \dp are not affected by this change because those
privileges are reset to NULL instead of leaving empty arrays.

Commands \des+ and \dew+ are not covered in src/test/regress because no
foreign data wrapper is available at this point to create a foreign
server.
---
 src/bin/psql/describe.c                  |  6 +-
 src/test/regress/expected/privileges.out | 93 ++++
 src/test/regress/sql/privileges.sql      | 46 ++++
 3 files changed, 144 insertions(+), 1 deletion(-)

diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index bac94a338c..154d244d97 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -6718,7 +6718,11 @@ static void
 printACLColumn(PQExpBuffer buf, const char *colname)
 {
 	appendPQExpBuffer(buf,
-					  "pg_catalog.array_to_string(%s, E'\\n') AS \"%s\"",
+					  "CASE pg_catalog.cardinality(%s)\n"
+					  "  WHEN 0 THEN '%s'\n"
+					  "  ELSE pg_catalog.array_to_string(%s, E'\\n')\n"
+					  "END AS \"%s\"",
+					  colname, gettext_noop("(none)"),
 					  colname, gettext_noop("Access privileges"));
 }

diff --git a/src/test/regress/expected/privileges.out b/src/test/regress/expected/privileges.out
index c1e610e62f..b9eb52f7b1 100644
--- a/src/test/regress/expected/privileges.out
+++ b/src/test/regress/expected/privileges.out
@@ -2895,3 +2895,96 @@ DROP SCHEMA regress_roleoption;
 DROP ROLE regress_roleoption_protagonist;
 DROP ROLE regress_roleoption_donor;
 DROP ROLE regress_roleoption_recipient;
+-- Test zero privileges.
+BEGIN;
+-- Create an owner for tested objects because output contains owner info.
+-- Must be superuser to be owner of tablespace.
+CREATE ROLE regress_zeropriv_owner SUPERUSER;
+SET LOCAL ROLE regress_zeropriv_owner;
+ALTER TABLESPACE regress_tblspace OWNER TO CURRENT_USER;
+REVOKE ALL ON TABLESPACE regress_tblspace FROM CURRENT_USER;
+\db+ regress_tblspace
+                                               List of tablespaces
+       Name       |         Owner          |    Location     | Access privileges | Options |  Size   | Description
+------------------+------------------------+-----------------+-------------------+---------+---------+-------------
+ regress_tblspace | regress_zeropriv_owner | pg_tblspc/16385 | (none)            |         | 0 bytes |
+(1 row)
+
+CREATE DOMAIN regress_zeropriv_domain AS int;
+REVOKE ALL ON DOMAIN regress_zeropriv_domain FROM CURRENT_USER, PUBLIC;
+\dD+ regress_zeropriv_domain
+                                                 List of domains
+ Schema |          Name           |  Type   | Collation | Nullable | Default | Check | Access privileges | Description
+--------+-------------------------+---------+-----------+----------+---------+-------+-------------------+-------------
+ public | regress_zeropriv_domain | integer |           |          |         |       | (none)            |
+(1 row)
+
+CREATE PROCEDURE regress_zeropriv_proc() LANGUAGE sql AS '';
+REVOKE ALL ON PROCEDURE
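The distinction the patch encodes can be sketched in a few lines of Python. This is a toy model of the new CASE pg_catalog.cardinality(...) expression in printACLColumn, not psql itself: default privileges are NULL::aclitem[] in the catalogs (modeled here as None), while revoking them all leaves an empty array.

```python
def format_acl(acl):
    # Sketch of the patched printACLColumn behavior (a model, not psql):
    # - None models NULL::aclitem[], i.e. default privileges
    # - []   models an empty aclitem array, i.e. zero privileges
    if acl is None:
        return ""          # default privileges: still shown blank
    if len(acl) == 0:
        return "(none)"    # zero privileges: now distinguishable
    return "\n".join(acl)  # one aclitem per line, as before
```

Before the patch both the None and [] cases rendered as an empty cell, which is exactly the confusion the patch addresses.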
Re: JSON Path and GIN Questions
"David E. Wheeler" writes:
> On Sep 15, 2023, at 20:36, Tom Lane wrote:
>> I think that that indicates that you're putting the info in the
>> wrong place.  Perhaps the right answer is to insert something
>> more explicit in section 11.2, which is the first place where
>> we really spend any effort discussing what can be indexed.

> Fair enough. How ’bout this?

After thinking about it for awhile, I think we need some more
discursive explanation of what's allowed, perhaps along the lines
of the attached.  (I still can't shake the feeling that this is
duplicative; but I can't find anything comparable until you get
into the weeds in Section V.)

I put the new text at the end of section 11.1, but perhaps it
belongs a little further up in that section; it seems more
important than some of the preceding paras.

			regards, tom lane

diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml
index 55122129d5..1a0b003fb0 100644
--- a/doc/src/sgml/indices.sgml
+++ b/doc/src/sgml/indices.sgml
@@ -109,6 +109,39 @@ CREATE INDEX test1_id_index ON test1 (id);
    Therefore indexes that are seldom or never used in queries
    should be removed.
+
+   In general, PostgreSQL indexes can be used
+   to optimize queries that contain one or more WHERE
+   or JOIN clauses of the form
+
+indexed-column indexable-operator comparison-value
+
+   Here, the indexed-column is whatever
+   column or expression the index has been defined on.
+   The indexable-operator is an operator that
+   is a member of the index's operator class for
+   the indexed column.  (More details about that appear below.)
+   And the comparison-value can be any
+   expression that is not volatile and does not reference the index's
+   table.
+
+   In some cases the query planner can extract an indexable clause of
+   this form from another SQL construct.
+   A simple example is that if
+   the original clause was
+
+comparison-value operator indexed-column
+
+   then it can be flipped around into indexable form if the
+   original operator has a commutator
+   operator that is a member of the index's operator class.
+
@@ -120,7 +153,7 @@ CREATE INDEX test1_id_index ON test1 (id);
    B-tree, Hash, GiST, SP-GiST, GIN, BRIN, and the extension
    bloom.
    Each index type uses a different
-   algorithm that is best suited to different types of queries.
+   algorithm that is best suited to different types of indexable clauses.
    By default, the CREATE INDEX
    command creates B-tree indexes, which fit the most common situations.
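The flipping rule the doc text describes can be sketched as a toy model. This is illustrative only: the real planner works on parse trees with operator OIDs and pg_operator.oprcom, not on strings, and the commutator table here is a small hand-picked sample.

```python
# Toy model of the doc text: a clause is indexable when it has the shape
#   indexed-column  indexable-operator  comparison-value
# and a clause written the other way around can be flipped when the
# operator has a commutator. (Illustrative only; not planner code.)
COMMUTATORS = {"=": "=", "<": ">", ">": "<", "<=": ">=", ">=": "<="}

def to_indexable_form(lhs, op, rhs, indexed_column):
    if lhs == indexed_column:
        return (lhs, op, rhs)               # already in indexable form
    if rhs == indexed_column and op in COMMUTATORS:
        return (rhs, COMMUTATORS[op], lhs)  # flip via the commutator
    return None                             # not indexable as written
```

For example, a clause written as 40 < id flips into id > 40, which matches the indexable shape.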
Re: make add_paths_to_append_rel aware of startup cost
Hi David,

Thanks for taking a look at this!

On Fri, Sep 15, 2023 at 3:15 PM David Rowley wrote:

> On Thu, 7 Sept 2023 at 04:37, Andy Fan wrote:
>> Currently add_paths_to_append_rel overlooked the startup cost for creating
>> append path, so it may have lost some optimization chances. After a glance,
>> the following 4 identifiers can be impacted.
>>
>> - We shouldn't do the optimization if there are still more tables to join,
>>   the reason is similar to has_multiple_baserels(root) in
>>   set_subquery_pathlist. But the trouble here is we may inherit multiple
>>   levels to build an appendrel, so I have to keep the 'top_relids' all the
>>   time and compare it with PlannerInfo.all_baserels. If they are the same,
>>   then it is the case we want to optimize.
>
> I think you've likely gone to the trouble of trying to determine if
> there are any joins pending because you're considering using a cheap
> startup path *instead* of the cheapest total path and you don't want
> to do that when some join will cause all the rows to be read thus
> making the plan more expensive if a cheap startup path was picked.

Yes, that's true. However it is not something we can't resolve; one of the
solutions is just like what I did in the patch. But currently the main thing
that confuses me is whether it is the right thing to give up the optimization
when there are more tables to join (just like set_subquery_pathlist did).

> Instead of doing that, why don't you just create a completely new
> AppendPath containing all the cheapest_startup_paths and add that to
> the append rel. You can optimise this and only do it when
> rel->consider_startup is true.
>
> Does the attached do anything less than what your patch does?

We can work like this, but there is one difference from what my current
patch does: it is cheapest_startup_path vs. cheapest fractional path. For
example, suppose we have the following 3 paths, all with an estimated 100
rows and a tuple_fraction of 10.
Path 1: startup_cost = 60, total_cost = 80    -- cheapest total cost
Path 2: startup_cost = 10, total_cost = 1000  -- cheapest startup cost
Path 3: startup_cost = 20, total_cost = 90    -- cheapest fractional cost

So with the patch you propose, Path 1 and Path 2 are chosen to build the
append path, but with my current patch only Path 3 is kept. IIUC, Path 3
should be the best one in this case.

We might also argue about why Path 3 is kept in the first place (at the
children level); I think pathkeys might be one reason. And even if Path 3
is discarded somehow, ideally we should keep it only if it is the best one.

Another tiny factor is that your proposal isn't consistent with
set_subquery_pathlist, which uses the cheapest fractional cost, and my
proposal isn't consistent with plain rels, which use the cheapest startup
cost. We can't say which one is better, though.

If my above analysis is correct, I think the best way to handle this is: if
there are no more tables to join, use the cheapest fractional cost for all
kinds of relations, including plain relations, append rels, subqueries and
so on. If there are more tables to join, use the cheapest startup cost.

On the implementation side, I want to use RelOptInfo.tuple_fraction instead
of RelOptInfo.consider_startup:

    tuple_fraction = -1 means startup cost should not be considered.
    tuple_fraction =  0 means cheapest startup cost should be used.
    tuple_fraction >  0 means cheapest fractional cost should be used.

I still haven't paid enough attention to consider_param_startup in
RelOptInfo. My feeling is that the above strategy will not add too much
overhead to the planner while sometimes providing a better plan.

--
Best Regards
Andy Fan
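The arithmetic behind the three-path example can be sketched as follows. This is a simplified model: the fractional cost is taken as startup cost plus a proportional share of the run cost, which mirrors how fractional path costs are compared, but it is not a claim about the exact planner code.

```python
def fractional_cost(startup: float, total: float, fraction: float) -> float:
    # Cost to fetch the first `fraction` of the rows: full startup cost
    # plus a proportional share of the run cost (a simplification).
    return startup + fraction * (total - startup)

# The three example paths from the mail, all estimating 100 rows,
# with a tuple_fraction of 10 rows, i.e. 10% of the output.
paths = {
    "Path 1": (60, 80),    # cheapest total cost
    "Path 2": (10, 1000),  # cheapest startup cost
    "Path 3": (20, 90),    # cheapest fractional cost
}
fraction = 10 / 100

best = min(paths, key=lambda name: fractional_cost(*paths[name], fraction))
# Path 1 costs 62.0, Path 2 costs 109.0, Path 3 costs 27.0
```

Under this model Path 3 wins by a wide margin for the first 10 rows, which is the point of preferring the cheapest fractional path over either the cheapest-total or cheapest-startup path alone.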
Re: Have better wording for snapshot file reading failure
On 9/14/23 20:33, Andres Freund wrote:

> I'd probably just go for something like "snapshot \"%s\" does not exist",
> similar to what we report for unknown tables etc. Arguably changing the
> errcode to ERRCODE_UNDEFINED_OBJECT would make this more precise?

+1, a more informative message compared to the original patch.

--
Kind Regards,
Yogesh Sharma
PostgreSQL, Linux, and Networking Expert
Open Source Enthusiast and Advocate
Re: remaining sql/json patches
On 9/14/23 at 10:14 Amit Langote wrote:

Hi Amit,

Just now I built a v14-patched server and I found this crash:

select json_query(jsonb '
  { "arr": [
      {"arr": [2,3]}
    , {"arr": [4,5]}
  ] }'
, '$.arr[*].arr ? (@ <= 3)' returning anyarray WITH WRAPPER) --crash
;
server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
connection to server was lost

Can you have a look?

Thanks,

Erik