Re: Is it possible to index "deep" into a JSONB column?
> shaheedha...@gmail.com wrote: > > Suppose I have a JSONB field called "snapshot". I can create a GIN > index on it like this: > > create index idx1 on mytable using gin (snapshot); > > In principle, I believe this allows index-assisted access to keys and > values nested in arrays and inner objects but in practice, it seems > the planner "often" decides to ignore the index in favour of a table > scan. (As discussed elsewhere, this is influenced by the number of > rows, and possibly other criteria too). > > Now, I know it is possible to index inner objects, so that is snapshot > looks like this: > > { >"stuff": {}, >"more other stuff": {}, >"employee": { > "1234": {"date_of_birth": "1970-01-01"}, > "56B789": {"date_of_birth": "1971-02-02"}, >} > } > > I can say: > > create index idx2 on mytable using gin ((snapshot -> 'employee')); > > But what is the syntax to index only on date_of_birth? I assume a > btree would work since it is a primitive value, but WHAT GOES HERE in > this: > > create index idx3 on mytable using btree ((snapshot ->'employee' -> > WHAT GOES HERE -> 'date_of_birth')); > > I believe an asterisk "*" would work if 'employee' was an array, but > here it is nested object with keys. If it helps, the keys are > invariably numbers (in quoted string form, as per JSON). Try this: snapshot -> ‘employee’->>’date_of_birth’
Re: psql 15beta1 does not print notices on the console until transaction completes
On 5/29/22 18:00, Adrian Klaver wrote: On 5/29/22 15:03, Tom Lane wrote: Adrian Klaver writes: On 5/29/22 13:59, Alastair McKinley wrote: I think Tom was able to reproduce this by the sounds of his response? I have not received that post yet. I do see it in the archives. I re-addressed it to pgsql-bugs, maybe you are not subscribed to that? I am not, still I see it here: https://www.postgresql.org/list/pgsql-general/since/20220529/ And when I click on the message: https://www.postgresql.org/message-id/3662994.1653856025%40sss.pgh.pa.us Lists: pgsql-bugs pgsql-general I will admit that the mailing list software is a mystery to me, so it is entirely possible I am misreading the above. FWIW, I didn't see Tom's first (bugs) post either. Nor am I subscribed to bugs
Re: psql 15beta1 does not print notices on the console until transaction completes
On 5/29/22 15:03, Tom Lane wrote: Adrian Klaver writes: On 5/29/22 13:59, Alastair McKinley wrote: I think Tom was able to reproduce this by the sounds of his response? I have not received that post yet. I do see it in the archives. I re-addressed it to pgsql-bugs, maybe you are not subscribed to that? I am not, still I see it here: https://www.postgresql.org/list/pgsql-general/since/20220529/ And when I click on the message: https://www.postgresql.org/message-id/3662994.1653856025%40sss.pgh.pa.us Lists: pgsql-bugs pgsql-general I will admit that the mailing list software is a mystery to me, so it is entirely possible I am misreading the above. regards, tom lane -- Adrian Klaver adrian.kla...@aklaver.com
Re: psql 15beta1 does not print notices on the console until transaction completes
Adrian Klaver writes: > On 5/29/22 13:59, Alastair McKinley wrote: >> I think Tom was able to reproduce this by the sounds of his response? > I have not received that post yet. I do see it in the archives. I re-addressed it to pgsql-bugs, maybe you are not subscribed to that? regards, tom lane
Re: psql 15beta1 does not print notices on the console until transaction completes
On 5/29/22 13:59, Alastair McKinley wrote: > > From: Adrian Klaver > Sent: 29 May 2022 21:47To: Alastair McKinley Hi Adrian, I am running the function "select test_notice();" from the psql console with psql/server versions 15beta1. In psql 15beta1, the notice appears only after the function returns/transaction completes. If I execute the same function from psql 14.3 and server 15beta1, the notice appears on the psql console immediately as expected, before the pg_sleep() completes. It's reproducible for me, I just double checked it. Yeah, I see the same thing after using the new psql. I think Tom was able to reproduce this by the sounds of his response? I have not received that post yet. I do see it in the archives. I had already tried: \set SHOW_ALL_RESULTS off which is supposed to restore to previous behavior, but it did not lead to the 14.3 result. Best regards, Alastair -- Adrian Klaver adrian.kla...@aklaver.com
Re: psql 15beta1 does not print notices on the console until transaction completes
> > From: Adrian Klaver > Sent: 29 May 2022 21:47To: Alastair McKinley > ; pgsql-general@lists.postgresql.org > Subject: Re: psql 15beta1 does not print > notices on the console until transaction completes > > On 5/29/22 13:11, Alastair McKinley wrote: > > Hi all, > > > > I notice this change in behaviour with psql in 15beta1 when testing an > > existing codebase. > > > > I didn't see any mention of this change in the release notes and it > > surprised me. > > > > Using this test function: > > > > create or replace function test_notice() returns void as > > $$ > > begin > > raise notice 'hello'; > > perform pg_sleep(10); > > end; $$ language plpgsql; > > > > In psql 15beta1, the "hello" message only appears on the console when the > > transaction completes. > > I am not seeing that. > > Can you provide more information about how you are running test_notice()? > Hi Adrian, I am running the function "select test_notice();" from the psql console with psql/server versions 15beta1. In psql 15beta1, the notice appears only after the function returns/transaction completes. If I execute the same function from psql 14.3 and server 15beta1, the notice appears on the psql console immediately as expected, before the pg_sleep() completes. It's reproducible for me, I just double checked it. I think Tom was able to reproduce this by the sounds of his response? Best regards, Alastair > > > > in psql 14.3, it appears immediately as I would have expected. > > > > Is there a way to change psql behaviour to display notices immediately as > > in versions < 15? > > > > Best regards, > > > > Alastair > > > > > -- > Adrian Klaver > adrian.kla...@aklaver.com
Re: psql 15beta1 does not print notices on the console until transaction completes
On 5/29/22 13:47, Adrian Klaver wrote: On 5/29/22 13:11, Alastair McKinley wrote: Hi all, In psql 15beta1, the "hello" message only appears on the console when the transaction completes. I am not seeing that. I take that back, I was using psql 14.3 to connect to the 15 instance. When I changed to psql 15 I saw the same thing. Best regards, Alastair -- Adrian Klaver adrian.kla...@aklaver.com
Is it possible to index "deep" into a JSONB column?
Suppose I have a JSONB field called "snapshot". I can create a GIN index on it like this: create index idx1 on mytable using gin (snapshot); In principle, I believe this allows index-assisted access to keys and values nested in arrays and inner objects but in practice, it seems the planner "often" decides to ignore the index in favour of a table scan. (As discussed elsewhere, this is influenced by the number of rows, and possibly other criteria too). Now, I know it is possible to index inner objects, so that is snapshot looks like this: { "stuff": {}, "more other stuff": {}, "employee": { "1234": {"date_of_birth": "1970-01-01"}, "56B789": {"date_of_birth": "1971-02-02"}, } } I can say: create index idx2 on mytable using gin ((snapshot -> 'employee')); But what is the syntax to index only on date_of_birth? I assume a btree would work since it is a primitive value, but WHAT GOES HERE in this: create index idx3 on mytable using btree ((snapshot ->'employee' -> WHAT GOES HERE -> 'date_of_birth')); I believe an asterisk "*" would work if 'employee' was an array, but here it is nested object with keys. If it helps, the keys are invariably numbers (in quoted string form, as per JSON). Thanks, Shaheed
Re: psql 15beta1 does not print notices on the console until transaction completes
On 5/29/22 13:11, Alastair McKinley wrote: Hi all, I notice this change in behaviour with psql in 15beta1 when testing an existing codebase. I didn't see any mention of this change in the release notes and it surprised me. Using this test function: create or replace function test_notice() returns void as $$ begin raise notice 'hello'; perform pg_sleep(10); end; $$ language plpgsql; In psql 15beta1, the "hello" message only appears on the console when the transaction completes. I am not seeing that. Can you provide more information about how you are running test_notice()? in psql 14.3, it appears immediately as I would have expected. Is there a way to change psql behaviour to display notices immediately as in versions < 15? Best regards, Alastair -- Adrian Klaver adrian.kla...@aklaver.com
psql 15beta1 does not print notices on the console until transaction completes
Hi all, I notice this change in behaviour with psql in 15beta1 when testing an existing codebase. I didn't see any mention of this change in the release notes and it surprised me. Using this test function: create or replace function test_notice() returns void as $$ begin raise notice 'hello'; perform pg_sleep(10); end; $$ language plpgsql; In psql 15beta1, the "hello" message only appears on the console when the transaction completes. in psql 14.3, it appears immediately as I would have expected. Is there a way to change psql behaviour to display notices immediately as in versions < 15? Best regards, Alastair
Re: Function definition regression in 15beta1 when specific parameter name (string) is used
> From: Tom Lane > Sent: 29 May 2022 18:43 > To: Alastair McKinley > Cc: Andrew Dunstan ; pgsql-general@lists.postgresql.org > > Subject: Re: Function definition regression in 15beta1 when specific > parameter name (string) is used > > Alastair McKinley writes: > > The following function definition fails in 15beta1 (ok in 14.3): > > > create or replace function regexp_match_test(string text,pattern text) > > returns text[] as > > $$ > > select regexp_match(string,pattern); > > $$ language sql; > > Commit 1a36bc9db seems to have defined STRING as a type_func_name_keyword, > which strikes me as a pretty horrible trampling on user namespace. That > means you can't have tables or columns named "string" anymore either, and > I'll bet money the latter restriction is going to bite a lot of people. > Yes I would agree, could this potentially break a lot of upgrades? I checked the release notes and CTRL-F'd for "string" to check in case it had become reserved or become an alias for text, but there is nothing in the release notes at the minute. > In a quick experiment here, I don't see any bison complaints if I > back it down to unreserved_keyword, so this seems easily fixable. > I wonder though if we don't need more review of patches that add > partially- or fully-reserved keywords. > > regards, tom lane
Re: Function definition regression in 15beta1 when specific parameter name (string) is used
Alastair McKinley writes: > The following function definition fails in 15beta1 (ok in 14.3): > create or replace function regexp_match_test(string text,pattern text) > returns text[] as > $$ > select regexp_match(string,pattern); > $$ language sql; Commit 1a36bc9db seems to have defined STRING as a type_func_name_keyword, which strikes me as a pretty horrible trampling on user namespace. That means you can't have tables or columns named "string" anymore either, and I'll bet money the latter restriction is going to bite a lot of people. In a quick experiment here, I don't see any bison complaints if I back it down to unreserved_keyword, so this seems easily fixable. I wonder though if we don't need more review of patches that add partially- or fully-reserved keywords. regards, tom lane
Re: Function definition regression in 15beta1 when specific parameter name (string) is used
On 5/29/22 10:29, Adrian Klaver wrote: On 5/29/22 09:46, Alastair McKinley wrote: Hi all, Postgres 15: https://www.postgresql.org/docs/15/sql-keywords-appendix.html STRING reserved (can be function or type) non-reserved Postgres 14: https://www.postgresql.org/docs/14/sql-keywords-appendix.html STRING non-reserved I don't have a 15 instance available, but I would double quoting ^ think "string" would work. Alastair -- Adrian Klaver adrian.kla...@aklaver.com
Re: Function definition regression in 15beta1 when specific parameter name (string) is used
On 5/29/22 09:46, Alastair McKinley wrote: Hi all, I was testing an existing codebase with 15beta1 and ran into this issue. I reduced the test case to an example with works in 14.3, and fails in 15beta1. The following function definition fails in 15beta1 (ok in 14.3): create or replace function regexp_match_test(string text,pattern text) returns text[] as $$ select regexp_match(string,pattern); $$ language sql; The error message is: ERROR: syntax error at or near "," LINE 3: select regexp_match(string,pattern); ^ Changing the first parameter name from string to anything else (e.g. strin or string1) resolves the issue. The issue also occurs with the "string" parameter name if this is used in a plpgsql function like this: create or replace function regexp_match_test(string text,pattern text) returns text[] as $$ begin return (select regexp_match(string,pattern)); end; $$ language plpgsql; Best regards, Postgres 15: https://www.postgresql.org/docs/15/sql-keywords-appendix.html STRING reserved (can be function or type) non-reserved Postgres 14: https://www.postgresql.org/docs/14/sql-keywords-appendix.html STRING non-reserved I don't have a 15 instance available, but I would double quoting "string" would work. Alastair -- Adrian Klaver adrian.kla...@aklaver.com
Function definition regression in 15beta1 when specific parameter name (string) is used
Hi all, I was testing an existing codebase with 15beta1 and ran into this issue. I reduced the test case to an example with works in 14.3, and fails in 15beta1. The following function definition fails in 15beta1 (ok in 14.3): create or replace function regexp_match_test(string text,pattern text) returns text[] as $$ select regexp_match(string,pattern); $$ language sql; The error message is: ERROR: syntax error at or near "," LINE 3: select regexp_match(string,pattern); ^ Changing the first parameter name from string to anything else (e.g. strin or string1) resolves the issue. The issue also occurs with the "string" parameter name if this is used in a plpgsql function like this: create or replace function regexp_match_test(string text,pattern text) returns text[] as $$ begin return (select regexp_match(string,pattern)); end; $$ language plpgsql; Best regards, Alastair
Re: JSONB index not in use, but is TOAST the real cause of slow query?
On Sun, 29 May 2022, 15:58 Tom Lane, wrote: > Shaheed Haque writes: > > Unfortunately, the real query which I think should behave very > > similarly is still at the several-seconds level despite using the > > index. ... > > > -> Bitmap Heap Scan on paiyroll_payrun (cost=26.88..30.91 rows=1 > > width=4) (actual time=32.488..2258.891 rows=62 loops=1) > > Recheck Cond: ((company_id = 173) AND ((snapshot -> > > 'employee'::text) ? '16376'::text)) > > Filter: (((snapshot #> > > '{employee,16376,last_run_of_employment}'::text[]) <> 'true'::jsonb) > > OR ((snapshot #> '{employee,16376,pay_graph}'::text[]) <> '0'::jsonb) > > OR ((snapshot #> '{employee,16376,state,employment,-1,2}'::text[]) <= > > '0'::jsonb)) > > Heap Blocks: exact=5 > > -> BitmapAnd (cost=26.88..26.88 rows=1 width=0) (actual > > time=0.038..0.039 rows=0 loops=1) > > -> Bitmap Index Scan on paiyroll_payrun_company_id_ce341888 > > (cost=0.00..6.56 rows=304 width=0) (actual time=0.016..0.016 rows=304 > > loops=1) > > Index Cond: (company_id = 173) > > -> Bitmap Index Scan on idx1 (cost=0.00..20.07 rows=9 > > width=0) (actual time=0.021..0.021 rows=62 loops=1) > > Index Cond: ((snapshot -> 'employee'::text) ? > '16376'::text) > > > IIUC, at the bottom, the indices are doing their thing, but a couple > > of layers up, the "Bitmap Heap Scan" jumps from ~30ms to 2200ms. But I > > cannot quite see why. > > I suppose it's the execution of that "Filter" condition, which will > require perhaps as many as three fetches of the "snapshot" column. > Thanks, that's clearly in the frame. You really need to rethink that data structure. Sure, you can store tons > of unorganized data in a jsonb column, but you pay for that convenience > with slow access. Normalizing the bits you need frequently into a more > traditional relational schema is the route to better-performing queries. > Ack. Indeed, the current design works very well for all of the access patterns other than this one, which only recently came into view as a problem. Ahead of contemplating a design change I have been looking at how to optimise this bit. I'm currently mired in a crash course on SQL syntax as pertains to JSONB, jsonpath et. al. And the equally mysterious side effects of "?" and "@>" and so on in terms of the amount of data being fetched etc. (and all wrapped in a dose of ORM for good measure). I'll write separately with more specific questions if needed on those details. Thanks again for the kind help. Shaheed > regards, tom lane >
Re: autovacuum on primary blocking queries on replica?
On Fri, 2022-05-27 at 14:00 -0500, Don Seiler wrote: > * PostgreSQL 12.9 - PGDG Ubuntu 18.04 image > * Streaming physical replication > * hot_standby_feedback = on > We use a read replica to offload a lot of (what should be) quick queries. > This morning we had an incident > where these queries were all blocking on AccessShareLock waits, written to > the log as: > > 2022-05-27 15:23:53.476 UTC [8185] foo@foo_all - myapp LOG: process 8185 > still waiting for AccessShareLock on relation 16834 of database 16401 after > 1000.228 ms at character 204 > 2022-05-27 15:23:53.476 UTC [8185] foo@foo_all - myapp DETAIL: Process > holding the lock: 10822. Wait queue: 32373, 8185, 13782, [...] > This went on for 30 seconds (the value of max_standby_streaming_delay) until > PG killed the blocking process: > 2022-05-27 15:24:22.474 UTC [10822] foo@foo_all - anotherapp FATAL: > terminating connection due to conflict with recovery > 2022-05-27 15:24:22.474 UTC [10822] foo@foo_all - anotherapp DETAIL: User > was holding a relation lock for too long. > > I'm trying to find the root cause of why this started. We did see an UPDATE > [...] An UPDATE cannot be the problem. > 15 seconds later we then see an aggressive autovacuum on this table: > > 2022-05-27 15:23:52.507 UTC [30513] LOG: automatic aggressive vacuum of > table "foo_all.public.industry": index scans: 1 > pages: 252 removed, 323 remain, 0 skipped due to pins, 0 skipped > frozen > tuples: 8252 removed, 8252 remain, 0 are dead but not yet removable, > oldest xmin: 1670999292 > buffer usage: 12219 hits, 137 misses, 54 dirtied > avg read rate: 2.372 MB/s, avg write rate: 0.935 MB/s > system usage: CPU: user: 0.14 s, system: 0.00 s, elapsed: 0.45 s > > and less than a second after that is when we see the first AccessShareLock > message on the replica. > > I've been reading tales of autovacuum taking an AccessExclusiveLock when > truncating empty pages at the end of a table. > I'm imagining that updating every row of a table and then rolling back would > leave all of those rows empty at the end > and qualify for truncation and lead to the scenario I saw this morning. > > I'm still not entirely satisfied since that table in question was so small > (only 8252 rows) so I wouldn't imagine it would > hold things up as long as it did. Although the blocking session on the > replica was an application session, > not any background/recovery process. I think you are on the right trail. VACUUM will truncate trailing pages if it can get a (very short) ACCESS EXCLUSIVE lock on the table. Now that lock and the truncation is replicated, and they can create a replication conflict just like you describe. Even if the lock is held for a very short time, replaying it will conflict with any query on that table on the standby. You can disable autovacuum truncation on the table with ALTER TABLE foo_all.public.industry SET (vacuum_truncate = off); if you know that you can do without autovacuum truncation for that query. Yours, Laurenz Albe -- Cybertec | https://www.cybertec-postgresql.com
Re: JSONB index not in use, but is TOAST the real cause of slow query?
Shaheed Haque writes: > Unfortunately, the real query which I think should behave very > similarly is still at the several-seconds level despite using the > index. ... > -> Bitmap Heap Scan on paiyroll_payrun (cost=26.88..30.91 rows=1 > width=4) (actual time=32.488..2258.891 rows=62 loops=1) > Recheck Cond: ((company_id = 173) AND ((snapshot -> > 'employee'::text) ? '16376'::text)) > Filter: (((snapshot #> > '{employee,16376,last_run_of_employment}'::text[]) <> 'true'::jsonb) > OR ((snapshot #> '{employee,16376,pay_graph}'::text[]) <> '0'::jsonb) > OR ((snapshot #> '{employee,16376,state,employment,-1,2}'::text[]) <= > '0'::jsonb)) > Heap Blocks: exact=5 > -> BitmapAnd (cost=26.88..26.88 rows=1 width=0) (actual > time=0.038..0.039 rows=0 loops=1) > -> Bitmap Index Scan on paiyroll_payrun_company_id_ce341888 > (cost=0.00..6.56 rows=304 width=0) (actual time=0.016..0.016 rows=304 > loops=1) > Index Cond: (company_id = 173) > -> Bitmap Index Scan on idx1 (cost=0.00..20.07 rows=9 > width=0) (actual time=0.021..0.021 rows=62 loops=1) > Index Cond: ((snapshot -> 'employee'::text) ? '16376'::text) > IIUC, at the bottom, the indices are doing their thing, but a couple > of layers up, the "Bitmap Heap Scan" jumps from ~30ms to 2200ms. But I > cannot quite see why. I suppose it's the execution of that "Filter" condition, which will require perhaps as many as three fetches of the "snapshot" column. You really need to rethink that data structure. Sure, you can store tons of unorganized data in a jsonb column, but you pay for that convenience with slow access. Normalizing the bits you need frequently into a more traditional relational schema is the route to better-performing queries. regards, tom lane
Re: Showing alternative query planner plans with explain ?
Danny Shemesh writes: > A tool I seem to be missing, and I wondered if such exists, is to have the > planner output alternative plans for a given query, i.e. to say, give me > the x top plans sorted by cost - I believe this would help shed some light > on the internal state machine and subsequent tinkering less > trial-and-error-ish. This does not exist, because the planner only carries one plan to completion, for reasons of speed and memory consumption. I have seen people hack things to print out info about paths (plan fragments) as they are considered, but I consider that approach pretty useless: the output is voluminous, not very readable, and mostly not interesting. You can get some of the effect by successively disabling the believed-cheapest plan choice with the "enable_xxx" parameters and seeing what the next choice is. Fooling around with the cost parameters can also provide useful insight. If the concern you have is about join order and not the details of the individual join types, another trick is to set join_collapse_limit to 1 and then write the FROM clause as a manually-parenthesized JOIN nest. That setting will force the planner to join in exactly the syntactic join order. regards, tom lane
Showing alternative query planner plans with explain ?
Hey all ! I'm currently optimizing queries and indices on a relatively large dataset; one of the frequent questions I seem to ask myself is why the planner chooses plan A over B. Reading the docs, blogs, stack exchange posts, wiki, ... helps in trying to tinker with the query or indices in a way that either A will be discouraged, or B will be favoured, so I'd be more informed on why one was chosen over the other and which is empirically better for a given dataset. A tool I seem to be missing, and I wondered if such exists, is to have the planner output alternative plans for a given query, i.e. to say, give me the x top plans sorted by cost - I believe this would help shed some light on the internal state machine and subsequent tinkering less trial-and-error-ish. Is there any way to achieve the above ? Thanks a ton, Danny