Re: Strange behavior of limit clause in complex query
On Wed, Jun 08, 2022 at 09:44:08AM +0100, Paulo Silva wrote: > But if I add an ORDER BY and a LIMIT something goes very wrong (Q2): A somewhat common problem. A common workaround is to change "ORDER BY a" to something like "ORDER BY a+0" (if your framework will allow it). > An EXPLAIN (ANALYZE, BUFFERS) for Q2 returns this: ... >-> Index Scan Backward using ix_ng_content_date on ng_content > "Extent1" (cost=0.43..40616715.85 rows=2231839 width=12) (actual > time=11027.808..183839.289 rows=5 loops=1) > Filter: ((2 = id_status) AND (date_from <= > LOCALTIMESTAMP) AND (date_to >= LOCALTIMESTAMP) AND (SubPlan 1)) > Rows Removed by Filter: 4685618 > Buffers: shared hit=15414533 read=564480 written=504 I'm not sure if it would help your original issue, but the rowcount estimate here is bad - overestimating 2231839 rows instead of 5. Could you try to determine which of those conditions (id_status, date_from, date_to, or SubPlan) causes the mis-estimate, or if the estimate is only wrong when they're combined ? -- Justin
Re: Adding non-selective key to jsonb query @> reduces performance?
Marcin Krupowicz writes: > However this one, is slow: > Q2 > select count(*) from tbl where row @> '{"SELECTIVE_COL": > "SearchValue", "DATE": "20220606", "NON_SELECTIVE_COL": "Abc"}'::jsonb > It takes 17ms > Note that the only difference is adding one more - not very unique - > key. If in Q2 I replaced NON_SELECTIVE_COL with another selective > column, it's becoming fast again. This doesn't surprise me a whole lot based on what I know of GIN. It's going to store sets of TIDs associated with each key or value mentioned in the data, and then a query will have to AND the sets of TIDs for keys/values mentioned in the query. That will take longer when some of those sets are big. It might be worth experimenting with an index built using the non-default jsonb_path_ops opclass [1]. I'm not sure if that'd be faster for this scenario, but it seems worth trying. regards, tom lane [1] https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING
Re: Strange behavior of limit clause in complex query
Hi, The problem is that the query is generated by the framework, I'm not sure if I can change anything on it. Any other way to influence planner? Regards Ranier Vilela escreveu no dia quarta, 8/06/2022 à(s) 12:40: > Em qua., 8 de jun. de 2022 às 05:44, Paulo Silva > escreveu: > >> Hi, >> >> I'm using PostgreSQL 14.3 and I'm getting strange behavior in a complex >> query generated by the Entity Framework. >> >> The inner (complex) query has a quick execution time: >> >> # SELECT "Extent1"."id", CAST ("Extent1"."date" AS timestamp) AS "C3" >> FROM "dbo"."ng_content" AS "Extent1" >> INNER JOIN "dbo"."ng_path_content" AS "Extent2" ON "Extent1"."id" >> = "Extent2"."id_content" >> WHERE "Extent1"."date_from" <= CAST (LOCALTIMESTAMP AS timestamp) >> AND "Extent1"."date_to" >= CAST (LOCALTIMESTAMP AS >> timestamp) >> AND 2 = "Extent1"."id_status" >> AND EXISTS ( >> SELECT 1 AS "C1" >> FROM ( >> SELECT "Extent3"."TagId" FROM >> "dbo"."ngx_tag_content" AS "Extent3" >> WHERE "Extent1"."id" = >> "Extent3"."ContentId" >> ) AS "Project1" >> WHERE EXISTS ( >> SELECT 1 AS "C1" FROM (SELECT 1 AS "C") >> AS "SingleRowTable1" >> WHERE "Project1"."TagId" = 337139) >> ) >> AND ("Extent2"."id_path" IN (27495,27554,27555) >> AND NOT EXISTS (SELECT 1 AS "C1" FROM (SELECT 1 AS "C") AS >> "SingleRowTable2" WHERE TRUE = FALSE) >> ); >> id| C3 >> --+- >> 13505155 | 2021-03-27 12:01:00 >> 13505187 | 2021-03-27 12:03:00 >> 13505295 | 2021-03-27 12:06:00 >> 13505348 | 2021-03-27 12:09:00 >> 13505552 | 2021-03-27 12:11:00 >> (5 rows) >> >> *Time: 481.826 ms* >> >> If I run the same query as a nested select I get similar results (Q1): >> >> >> *SELECT "Project5".idFROM (* >> SELECT "Extent1"."id", CAST ("Extent1"."date" AS timestamp) AS "C3" >> FROM "dbo"."ng_content" AS "Extent1" >> INNER JOIN "dbo"."ng_path_content" AS "Extent2" ON "Extent1"."id" >> = "Extent2"."id_content" >> WHERE "Extent1"."date_from" <= CAST (LOCALTIMESTAMP AS timestamp) >> AND "Extent1"."date_to" >= CAST (LOCALTIMESTAMP AS >> timestamp) >> AND 2 = "Extent1"."id_status" >> AND EXISTS ( >> SELECT 1 AS "C1" >> FROM ( >> SELECT "Extent3"."TagId" FROM >> "dbo"."ngx_tag_content" AS "Extent3" >> WHERE "Extent1"."id" = >> "Extent3"."ContentId" >> ) AS "Project1" >> WHERE EXISTS ( >> SELECT 1 AS "C1" FROM (SELECT 1 AS "C") >> AS "SingleRowTable1" >> WHERE "Project1"."TagId" = 337139) >> ) >> AND ("Extent2"."id_path" IN (27495,27554,27555) >> AND NOT EXISTS (SELECT 1 AS "C1" FROM (SELECT 1 AS "C") AS >> "SingleRowTable2" WHERE TRUE = FALSE) >> ) >> *) AS "Project5";* >> id >> -- >> 13505155 >> 13505187 >> 13505295 >> 13505348 >> 13505552 >> (5 rows) >> >> *Time: 486.174 ms* >> >> But if I add an ORDER BY and a LIMIT something goes very wrong (Q2): >> >> # SELECT "Project5".id >> FROM ( >> SELECT "Extent1"."id", CAST ("Extent1"."date" AS timestamp) AS "C3" >> FROM "dbo"."ng_content" AS "Extent1" >> INNER JOIN "dbo"."ng_path_content" AS "Extent2" ON "Extent1"."id" >> = "Extent2"."id_content" >> WHERE "Extent1"."date_from" <= CAST (LOCALTIMESTAMP AS timestamp) >> AND "Extent1"."date_to" >= CAST (LOCALTIMESTAMP AS >> timestamp) >> AND 2 = "Extent1"."id_status" >> AND EXISTS ( >> SELECT 1 AS "C1" >> FROM ( >> SELECT "Extent3"."TagId" FROM >> "dbo"."ngx_tag_content" AS "Extent3" >> WHERE "Extent1"."id" = >> "Extent3"."ContentId" >> ) AS "Project1" >> WHERE EXISTS ( >> SELECT 1 AS "C1" FROM (SELECT 1 AS "C") >> AS "SingleRowTable1" >> WHERE "Project1"."TagId" = 337139) >> ) >> AND ("Extent2"."id_path" IN (27495,27554,27555) >> AND NOT EXISTS (SELECT 1 AS "C1" FROM (SELECT 1 AS "C") AS >> "SingleRowTable2" WHERE TRUE = FALSE) >> ) >> ) AS "Project5" *ORDER BY "Project5"."C3" DESC LIMIT 6*; >> > I think that LIMIT is confusing the planner. > Forcing a path that in the end is not faster. > > Can you try something similar to this? > > WITH q AS ( > SELECT "Project5".id > FROM ( > SELECT "Extent1"."id",
Re: Strange behavior of limit clause in complex query
Em qua., 8 de jun. de 2022 às 05:44, Paulo Silva escreveu: > Hi, > > I'm using PostgreSQL 14.3 and I'm getting strange behavior in a complex > query generated by the Entity Framework. > > The inner (complex) query has a quick execution time: > > # SELECT "Extent1"."id", CAST ("Extent1"."date" AS timestamp) AS "C3" > FROM "dbo"."ng_content" AS "Extent1" > INNER JOIN "dbo"."ng_path_content" AS "Extent2" ON "Extent1"."id" > = "Extent2"."id_content" > WHERE "Extent1"."date_from" <= CAST (LOCALTIMESTAMP AS timestamp) > AND "Extent1"."date_to" >= CAST (LOCALTIMESTAMP AS > timestamp) > AND 2 = "Extent1"."id_status" > AND EXISTS ( > SELECT 1 AS "C1" > FROM ( > SELECT "Extent3"."TagId" FROM > "dbo"."ngx_tag_content" AS "Extent3" > WHERE "Extent1"."id" = > "Extent3"."ContentId" > ) AS "Project1" > WHERE EXISTS ( > SELECT 1 AS "C1" FROM (SELECT 1 AS "C") AS > "SingleRowTable1" > WHERE "Project1"."TagId" = 337139) > ) > AND ("Extent2"."id_path" IN (27495,27554,27555) > AND NOT EXISTS (SELECT 1 AS "C1" FROM (SELECT 1 AS "C") AS > "SingleRowTable2" WHERE TRUE = FALSE) > ); > id| C3 > --+- > 13505155 | 2021-03-27 12:01:00 > 13505187 | 2021-03-27 12:03:00 > 13505295 | 2021-03-27 12:06:00 > 13505348 | 2021-03-27 12:09:00 > 13505552 | 2021-03-27 12:11:00 > (5 rows) > > *Time: 481.826 ms* > > If I run the same query as a nested select I get similar results (Q1): > > > *SELECT "Project5".idFROM (* > SELECT "Extent1"."id", CAST ("Extent1"."date" AS timestamp) AS "C3" > FROM "dbo"."ng_content" AS "Extent1" > INNER JOIN "dbo"."ng_path_content" AS "Extent2" ON "Extent1"."id" > = "Extent2"."id_content" > WHERE "Extent1"."date_from" <= CAST (LOCALTIMESTAMP AS timestamp) > AND "Extent1"."date_to" >= CAST (LOCALTIMESTAMP AS > timestamp) > AND 2 = "Extent1"."id_status" > AND EXISTS ( > SELECT 1 AS "C1" > FROM ( > SELECT "Extent3"."TagId" FROM > "dbo"."ngx_tag_content" AS "Extent3" > WHERE "Extent1"."id" = > "Extent3"."ContentId" > ) AS "Project1" > WHERE EXISTS ( > SELECT 1 AS "C1" FROM (SELECT 1 AS "C") AS > "SingleRowTable1" > WHERE "Project1"."TagId" = 337139) > ) > AND ("Extent2"."id_path" IN (27495,27554,27555) > AND NOT EXISTS (SELECT 1 AS "C1" FROM (SELECT 1 AS "C") AS > "SingleRowTable2" WHERE TRUE = FALSE) > ) > *) AS "Project5";* > id > -- > 13505155 > 13505187 > 13505295 > 13505348 > 13505552 > (5 rows) > > *Time: 486.174 ms* > > But if I add an ORDER BY and a LIMIT something goes very wrong (Q2): > > # SELECT "Project5".id > FROM ( > SELECT "Extent1"."id", CAST ("Extent1"."date" AS timestamp) AS "C3" > FROM "dbo"."ng_content" AS "Extent1" > INNER JOIN "dbo"."ng_path_content" AS "Extent2" ON "Extent1"."id" > = "Extent2"."id_content" > WHERE "Extent1"."date_from" <= CAST (LOCALTIMESTAMP AS timestamp) > AND "Extent1"."date_to" >= CAST (LOCALTIMESTAMP AS > timestamp) > AND 2 = "Extent1"."id_status" > AND EXISTS ( > SELECT 1 AS "C1" > FROM ( > SELECT "Extent3"."TagId" FROM > "dbo"."ngx_tag_content" AS "Extent3" > WHERE "Extent1"."id" = > "Extent3"."ContentId" > ) AS "Project1" > WHERE EXISTS ( > SELECT 1 AS "C1" FROM (SELECT 1 AS "C") AS > "SingleRowTable1" > WHERE "Project1"."TagId" = 337139) > ) > AND ("Extent2"."id_path" IN (27495,27554,27555) > AND NOT EXISTS (SELECT 1 AS "C1" FROM (SELECT 1 AS "C") AS > "SingleRowTable2" WHERE TRUE = FALSE) > ) > ) AS "Project5" *ORDER BY "Project5"."C3" DESC LIMIT 6*; > I think that LIMIT is confusing the planner. Forcing a path that in the end is not faster. Can you try something similar to this? WITH q AS ( SELECT "Project5".id FROM ( SELECT "Extent1"."id", CAST ("Extent1"."date" AS timestamp) AS "C3" FROM "dbo"."ng_content" AS "Extent1" INNER JOIN "dbo"."ng_path_content" AS "Extent2" ON "Extent1"."id" = "Extent2"."id_content" WHERE "Extent1"."date_from" <= CAST (LOCALTIMESTAMP AS timestamp) AND "Extent1"."date_to" >= CAST (LOCALTIMESTAMP AS timestamp)
Adding non-selective key to jsonb query @> reduces performance?
Hi, I have a table "tbl" with a couple of columns. One of them is "row" jsonb. It has a GIN index as per below. The table isn't particularly large, in lower tens of GB. Each "row" has maybe 5-20 keys, nothing crazy. Now, when I query it with @> operator I get very different performance depending on the selection of keys I want to look for. The queries below return the same result set (just a few rows). I think I have narrowed the problem down to uniqueness of a given key. For example this query is fast: Q1 select count(*) from tbl where row @> '{"SELECTIVE_COL": "SearchValue", "DATE": "20220606"}'::jsonb It takes about 0.6ms execution time However this one, is slow: Q2 select count(*) from tbl where row @> '{"SELECTIVE_COL": "SearchValue", "DATE": "20220606", "NON_SELECTIVE_COL": "Abc"}'::jsonb It takes 17ms Note that the only difference is adding one more - not very unique - key. If in Q2 I replaced NON_SELECTIVE_COL with another selective column, it's becoming fast again. Here are the query plans: Q1: https://explain.depesz.com/s/qxU8 Q2: https://explain.depesz.com/s/oIW3 Both look very similar, apart from a very different number of shared buffers hit. Index on "row": "tbl_row_idx" gin ("row" jsonb_path_ops) WITH (fastupdate=off) WHERE upper_inf(effective_range) AND NOT deleted PG Version: 14.3, work_mem 512MB What are my options? Why is the second query so much slower? I changed Q2 to conjunction of conditions on single columns (row @> '..' and row @> ...) and it was fast, even with the NON_SELECTIVE_COL included. Sadly it will be difficult for me do to this in my code without using dynamic SQL. Many thanks, -- Marcin
Strange behavior of limit clause in complex query
Hi, I'm using PostgreSQL 14.3 and I'm getting strange behavior in a complex query generated by the Entity Framework. The inner (complex) query has a quick execution time: # SELECT "Extent1"."id", CAST ("Extent1"."date" AS timestamp) AS "C3" FROM "dbo"."ng_content" AS "Extent1" INNER JOIN "dbo"."ng_path_content" AS "Extent2" ON "Extent1"."id" = "Extent2"."id_content" WHERE "Extent1"."date_from" <= CAST (LOCALTIMESTAMP AS timestamp) AND "Extent1"."date_to" >= CAST (LOCALTIMESTAMP AS timestamp) AND 2 = "Extent1"."id_status" AND EXISTS ( SELECT 1 AS "C1" FROM ( SELECT "Extent3"."TagId" FROM "dbo"."ngx_tag_content" AS "Extent3" WHERE "Extent1"."id" = "Extent3"."ContentId" ) AS "Project1" WHERE EXISTS ( SELECT 1 AS "C1" FROM (SELECT 1 AS "C") AS "SingleRowTable1" WHERE "Project1"."TagId" = 337139) ) AND ("Extent2"."id_path" IN (27495,27554,27555) AND NOT EXISTS (SELECT 1 AS "C1" FROM (SELECT 1 AS "C") AS "SingleRowTable2" WHERE TRUE = FALSE) ); id| C3 --+- 13505155 | 2021-03-27 12:01:00 13505187 | 2021-03-27 12:03:00 13505295 | 2021-03-27 12:06:00 13505348 | 2021-03-27 12:09:00 13505552 | 2021-03-27 12:11:00 (5 rows) *Time: 481.826 ms* If I run the same query as a nested select I get similar results (Q1): *SELECT "Project5".idFROM (* SELECT "Extent1"."id", CAST ("Extent1"."date" AS timestamp) AS "C3" FROM "dbo"."ng_content" AS "Extent1" INNER JOIN "dbo"."ng_path_content" AS "Extent2" ON "Extent1"."id" = "Extent2"."id_content" WHERE "Extent1"."date_from" <= CAST (LOCALTIMESTAMP AS timestamp) AND "Extent1"."date_to" >= CAST (LOCALTIMESTAMP AS timestamp) AND 2 = "Extent1"."id_status" AND EXISTS ( SELECT 1 AS "C1" FROM ( SELECT "Extent3"."TagId" FROM "dbo"."ngx_tag_content" AS "Extent3" WHERE "Extent1"."id" = "Extent3"."ContentId" ) AS "Project1" WHERE EXISTS ( SELECT 1 AS "C1" FROM (SELECT 1 AS "C") AS "SingleRowTable1" WHERE "Project1"."TagId" = 337139) ) AND ("Extent2"."id_path" IN (27495,27554,27555) AND NOT EXISTS (SELECT 1 AS "C1" FROM (SELECT 1 AS "C") AS "SingleRowTable2" WHERE TRUE = FALSE) ) *) AS "Project5";* id -- 13505155 13505187 13505295 13505348 13505552 (5 rows) *Time: 486.174 ms* But if I add an ORDER BY and a LIMIT something goes very wrong (Q2): # SELECT "Project5".id FROM ( SELECT "Extent1"."id", CAST ("Extent1"."date" AS timestamp) AS "C3" FROM "dbo"."ng_content" AS "Extent1" INNER JOIN "dbo"."ng_path_content" AS "Extent2" ON "Extent1"."id" = "Extent2"."id_content" WHERE "Extent1"."date_from" <= CAST (LOCALTIMESTAMP AS timestamp) AND "Extent1"."date_to" >= CAST (LOCALTIMESTAMP AS timestamp) AND 2 = "Extent1"."id_status" AND EXISTS ( SELECT 1 AS "C1" FROM ( SELECT "Extent3"."TagId" FROM "dbo"."ngx_tag_content" AS "Extent3" WHERE "Extent1"."id" = "Extent3"."ContentId" ) AS "Project1" WHERE EXISTS ( SELECT 1 AS "C1" FROM (SELECT 1 AS "C") AS "SingleRowTable1" WHERE "Project1"."TagId" = 337139) ) AND ("Extent2"."id_path" IN (27495,27554,27555) AND NOT EXISTS (SELECT 1 AS "C1" FROM (SELECT 1 AS "C") AS "SingleRowTable2" WHERE TRUE = FALSE) ) ) AS "Project5" *ORDER BY "Project5"."C3" DESC LIMIT 6*; id -- 13505552 13505348 13505295 13505187 13505155 (5 rows) *Time: 389375.374 ms (06:29.375)* An EXPLAIN (ANALYZE, BUFFERS) for Q1 returns this: QUERY PLAN --- Result (cost=661.07..826757.96 rows=27943 width=4) (actual time=446.767..492.874 rows=5 loops=1) One-Time Filter: (NOT $1) Buffers: shared hit=344618 read=17702 written=349 InitPlan 2 (returns $1) -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.003..0.003 rows=0 loops=1) One-Time Filter: false -> Nested Loop (cost=661.07..826757.96 rows=27943 width=4) (actual time=267.061..313.166 rows=5 loops=1) Buffers: shared hit=34461