> They are sorted by order_id only within sets of the same shipping_date, which 
> is not good enough.  

Ah yes, that totally makes sense for the general case.

> so maybe you can just tweak that application to test if the start and end 
> dates are the same and use equality when they are.

I definitely can.

But now I have a follow-up question, which probably should have been a
separate question all along. I have modified the example a bit to get a more
natural date distribution: I got rid of the weird shipping_date condition and
made the dates actually different, so the index order is out of the picture.
I also added statistics so Postgres knows about the relationship between the
columns.

https://dbfiddle.uk/?rdbms=postgres_14&fiddle=54c7774432e896e3c0e89d8084c4b194
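
For context, the setup is roughly the following (a simplified sketch; the
exact column definitions, data distribution and statistics definition are in
the fiddle):

CREATE TABLE orders_test (
    order_id      int GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    shipping_date date NOT NULL
);

CREATE INDEX orders_test_shipping_date_idx ON orders_test (shipping_date);

-- roughly two million rows spread over about a year of shipping dates
INSERT INTO orders_test (shipping_date)
SELECT date '2022-01-01' + (random() * 365)::int
FROM generate_series(1, 2000000);

-- extended statistics on the column pair
CREATE STATISTICS orders_test_stats ON order_id, shipping_date
    FROM orders_test;

ANALYZE orders_test;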

After inserting more rows, Postgres still chooses a scan on the primary key
index instead of the shipping_date index.

Limit  (cost=0.43..296.63 rows=50 width=4) (actual time=1052.692..1052.737 rows=50 loops=1)
  ->  Index Scan using orders_test_pkey on orders_test  (cost=0.43..71149.43 rows=12010 width=4) (actual time=1052.690..1052.728 rows=50 loops=1)
        Filter: ((shipping_date >= '2022-04-30'::date) AND (shipping_date <= '2022-05-01'::date))
        Rows Removed by Filter: 1998734
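
The query itself is essentially this (simplified; the exact statement is in
the fiddle):

SELECT order_id
FROM orders_test
WHERE shipping_date BETWEEN '2022-04-30' AND '2022-05-01'
ORDER BY order_id
LIMIT 50;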

By setting the CPU costs to 0 (last block in the fiddle) I can force the
previous plan, and as I already suspected, it is much better:

Limit  (cost=101.00..101.00 rows=50 width=4) (actual time=4.835..4.843 rows=50 loops=1)
  ->  Sort  (cost=101.00..101.00 rows=12010 width=4) (actual time=4.833..4.837 rows=50 loops=1)
        Sort Key: order_id
        Sort Method: top-N heapsort  Memory: 27kB
        ->  Index Scan using orders_test_shipping_date_idx on orders_test  (cost=0.00..101.00 rows=12010 width=4) (actual time=0.026..3.339 rows=11266 loops=1)
              Index Cond: ((shipping_date >= '2022-04-30'::date) AND (shipping_date <= '2022-05-01'::date))
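
"Setting the CPU costs to 0" means something along these lines (the exact
statements are the last block in the fiddle):

SET cpu_tuple_cost = 0;
SET cpu_index_tuple_cost = 0;
SET cpu_operator_cost = 0;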

Is the planner overestimating the cost of the sort?



