On Mon, Dec 6, 2021 at 6:21 PM Francisco Olarte <fola...@peoplecall.com> wrote:
> On Mon, 6 Dec 2021 at 18:03, Alan Hodgson <ahodg...@lists.simkin.ca> wrote:
> ...
> > The table has nearly 29 million records. 5069 of them match
> > shipment_import_id = 5090609. There is an index on shipment_import_id,
> > which the planner happily uses without the LIMIT. Yet with it the
> > query planner will always do something like:
> >
> > # explain SELECT "shipment_import_records".* FROM "shipment_import_records"
> >     WHERE "shipment_import_records"."shipment_import_id" = 5090609
> >     ORDER BY "shipment_import_records"."id" ASC LIMIT 1;
> >                                                          QUERY PLAN
> > -----------------------------------------------------------------------------------------------------------------------------
> >  Limit  (cost=0.44..873.35 rows=1 width=243)
> >    ->  Index Scan using shipment_import_records_pkey on shipment_import_records  (cost=0.44..5122227.70 rows=5868 width=243)
> >          Filter: (shipment_import_id = 5090609)
> >
> > ... which takes minutes.
>
> Can you post an explain analyze? To me it seems like the planner
> thinks shipment_import_id is randomly distributed and the table is
> well correlated with its PK, so scanning it for the first id should
> be fast.
>
> But from the names of the fields you may have a correlation between
> shipment_import_id and id hidden somewhere (e.g. they are two serials
> growing together, you query for the latest shipment ids, and it ends up
> scanning nearly the whole table). An explain analyze should show that
> (or three of them: one for that shipment_import_id, one for 1, and one
> for a really big one).
>
> > Just wondering if there's a knob I can turn to make these more likely
> > to work without constantly implementing workarounds?
>
> You may try a composite index.

+1 These issues can be solved by a composite index. A low LIMIT deforms
the cost estimates, and when the data are not really random, the index
scan can run far too long.

> Francisco Olarte.
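
To make that concrete, a minimal sketch using the table and column names
from the quoted query (the index name is illustrative, and the plan should
be re-checked with EXPLAIN ANALYZE as Francisco suggested):

    -- One index that serves both the WHERE clause and the ORDER BY:
    CREATE INDEX shipment_import_records_import_id_id_idx
        ON shipment_import_records (shipment_import_id, id);

    -- With the composite index the planner can read the matching rows
    -- already sorted by id and stop after the first one, instead of
    -- walking the whole primary key in id order and filtering:
    EXPLAIN ANALYZE
    SELECT * FROM shipment_import_records
     WHERE shipment_import_id = 5090609
     ORDER BY id ASC
     LIMIT 1;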