On Mon, 2026-02-16 at 14:43 +0530, yudhi s wrote: > On Mon, Feb 16, 2026 at 2:29 PM Laurenz Albe <[email protected]> wrote: > > On Mon, 2026-02-16 at 00:34 +0530, yudhi s wrote: > > > It's postgres version 17. We are having a critical UI query which runs > > > for ~7 seconds+. The requirement is to bring down the response time > > > within ~1 sec. Now in this plan , If i read this correctly, the below > > > section is consuming a significant amount of resources and should be > > > addressed. i.e. "Full scan of table "orders" and Nested loop with > > > event_audit_log table". > > > > > > Below is the query and its complete plan:- > > > https://gist.github.com/databasetech0073/f564ac23ee35d1f0413980fe4d00efa9 > > > > > > I am a bit new to the indexing strategy in postgres. My question is, what > > > suitable index should we create to cater these above? > > > > > > 1)For table event_audit_log:- Should we create composite Index on column > > > (request_id,created_at,event_comment_text) or should we create the > > > covering index i.e. just on two column (request_id,created_at) with > > > "include" clause for "event_comment_text". How and when the covering > > > index indexes should be used here in postgres. Want to understand from > > > experts? > > > 2)Similarly for table orders:- Should we create a covering index on > > > column (entity_id,due_date,order_type) with include clause > > > (firm_dspt_case_id). Or just a composite index > > > (entity_id,due_date,order_type). > > > 3)Whether the column used as range operator (here created_at or due_date) > > > should be used as leading column in the composite index or is it fine to > > > keep it as non leading? > > > > > > -> Nested Loop (cost=50.06..2791551.71 rows=3148 width=19) (actual > > > time=280.735..7065.313 rows=57943 loops=3) > > > Buffers: shared hit=10014901 > > > -> Hash Join (cost=49.49..1033247.35 rows=36729 width=8) (actual > > > time=196.407..3805.755 rows=278131 loops=3) > > > Hash Cond: ((ord.entity_id)::numeric = e.entity_id) > > > Buffers: shared hit=755352 > > > -> Parallel Seq Scan on orders ord (cost=0.00..1022872.54 rows=3672860 > > > width=16) (actual time=139.883..3152.627 rows=2944671 loops=3) > > > Filter: ((due_date >= '2024-01-01'::date) AND (due_date <= > > > '2024-04-01'::date) AND (order_type = ANY ('{TYPE_A,TYPE_B}'::text[]))) > > > Rows Removed by Filter: 6572678 > > > Buffers: shared hit=755208 > > > > You are selecting a lot of rows, so the query will never be really cheap. > > But I agree that an index scan should be a win. > > > > If the condition on "order_type" is always the same, a partial index is > > ideal: > > > > CREATE INDEX ON orders (due_date) WHERE order_type IN ('TYPE_A', > > 'TYPE_B'); > > > > Otherwise, I'd create two indexes: one on "order_type" and one on > > "due_date". > > Version is 17.7. Below is the table definitions as i pulled from Dbeaver > tool:- > > https://gist.github.com/databasetech0073/f22d95de18dc3f1fa54af13e7fd2ce9e > > The Order_type will be TYPE_A and TYPE_B in most of the cases. And below is > the distribution. > So , it looks like the index on this column will not help much. Correct me if > I'm wrong. > > TYPE_A 25 Million > TYPE_B 2 Million > TYPE_C 700K > TYPE_D 200K > TYPE_E 6k
No, you are right about that. > I am wondering why the already existing index on column "due_date" of table > "order" is not > getting used by the optimizer? Should we also add the column "entity_id" to > the index too? Seeing that your execution plan is incomplete, it is hard to say anything about that. The scans of "entities" are missing, as is the UNION. > And, Yes there are differences in data types of the "entity_id" for columns > of table "order" > and "entity". We need to fix that after analyzing the data. > > Also the highlighted Nested loop above shows ~10M shared hits (which will be > ~70GB+ if we > consider one hit as an 8K block). So does that mean , apart from the Full > scan on the "order" > table , the main resource consuming factor here is the scanning of > "event_audit_log". Correct. Yours, Laurenz Albe
