Re: Improved Cost Calculation for IndexOnlyScan

Hamid Akhtar Tue, 29 Sep 2020 01:50:41 -0700

On Tue, Sep 29, 2020 at 1:08 PM Heikki Linnakangas <hlinn...@iki.fi> wrote:


> On 29/09/2020 10:06, Hamid Akhtar wrote:
> > In one of my earlier emails [1], I mentioned that there seems to be a
> > problem with how the cost for index only scans is being calculated.
> > [1]
> >
> https://www.postgresql.org/message-id/CANugjhsnh0OBMOYc7qKcC%2BZsVvAXCeF7QiidLuFvg6zmHy1C7A%40mail.gmail.com
> >
> > My concern is that there seems to be a bigger disconnect between the
> > cost of index only scan and the execution time. Having tested this on 3
> > different systems, docker, laptop and a server with RAID 5 SSD
> > configured, at the threshold where index only scan cost exceeds that of
> > sequential scan, index only is still around 30% faster than the
> > sequential scan.
>
> A 30% discrepancy doesn't sound too bad, to be honest. The exact
> threshold depends on so many factors.
>
> > My initial hunch was that perhaps we need to consider a different
> > approach when considering cost for index only scan. However, the
> > solution seems somewhat simple.
> >
> > cost_index function in costsize.c, in case of indexonlyscan, multiplies
> > the number of pages fetched by a factor of (1.0 - baserel->allvisfrac)
> > which is then used to calculate the max_IO_cost and min_IO_cost.
> >
> > This is very similar to the cost estimate methods for indexes internally
> > call genericostesimate function. This function primarily gets the number
> > of pages for the indexes and multiplies that with random page cost
> > (spc_random_page_cost) to get the total disk access cost.
> >
> > I believe that in case of index only scan, we should adjust the
> > spc_random_page_cost in context of baserel->allvisfrac so that it
> > accounts for random pages for only the fraction that needs to be read
> > for the relation and excludes that the index page fetches.
>
> That doesn't sound right to me. The genericcostestimate() function
> calculates the number of *index* pages fetched. It makes no difference
> if it's an Index Scan or an Index Only Scan.
>
> genericcostestimate() could surely be made smarter. Currently, it
> multiplies the number of index pages fetched with random_page_cost, even
> though a freshly created index is mostly physically ordered by the keys.
> seq_page_cost with some fudge factor might be more appropriate, whether
> or not it's an Index Only Scan. Not sure what the exact formula should
> be, just replacing random_page_cost with seq_page_cost is surely not
> right either.
>
> - Heikki
>

So, not actually random replacement here, rather a change with
baserel->allvisfrac taken into consideration (as given below):
----
index_random_page_cost = Min(spc_seq_page_cost + spc_random_page_cost *
(1.0 - baserel->allvisfrac), spc_random_page_cost);
----

Does this make sense?

-- 
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL：+923335449950  EMAIL: mailto:hamid.akh...@highgo.ca
SKYPE: engineeredvirus

Re: Improved Cost Calculation for IndexOnlyScan

Reply via email to