On Tue, Sep 29, 2020 at 1:08 PM Heikki Linnakangas <hlinn...@iki.fi> wrote:
> On 29/09/2020 10:06, Hamid Akhtar wrote: > > In one of my earlier emails [1], I mentioned that there seems to be a > > problem with how the cost for index only scans is being calculated. > > [1] > > > https://www.postgresql.org/message-id/CANugjhsnh0OBMOYc7qKcC%2BZsVvAXCeF7QiidLuFvg6zmHy1C7A%40mail.gmail.com > > > > My concern is that there seems to be a bigger disconnect between the > > cost of index only scan and the execution time. Having tested this on 3 > > different systems, docker, laptop and a server with RAID 5 SSD > > configured, at the threshold where index only scan cost exceeds that of > > sequential scan, index only is still around 30% faster than the > > sequential scan. > > A 30% discrepancy doesn't sound too bad, to be honest. The exact > threshold depends on so many factors. > > > My initial hunch was that perhaps we need to consider a different > > approach when considering cost for index only scan. However, the > > solution seems somewhat simple. > > > > cost_index function in costsize.c, in case of indexonlyscan, multiplies > > the number of pages fetched by a factor of (1.0 - baserel->allvisfrac) > > which is then used to calculate the max_IO_cost and min_IO_cost. > > > > This is very similar to the cost estimate methods for indexes internally > > call genericostesimate function. This function primarily gets the number > > of pages for the indexes and multiplies that with random page cost > > (spc_random_page_cost) to get the total disk access cost. > > > > I believe that in case of index only scan, we should adjust the > > spc_random_page_cost in context of baserel->allvisfrac so that it > > accounts for random pages for only the fraction that needs to be read > > for the relation and excludes that the index page fetches. > > That doesn't sound right to me. The genericcostestimate() function > calculates the number of *index* pages fetched. It makes no difference > if it's an Index Scan or an Index Only Scan. > > genericcostestimate() could surely be made smarter. Currently, it > multiplies the number of index pages fetched with random_page_cost, even > though a freshly created index is mostly physically ordered by the keys. > seq_page_cost with some fudge factor might be more appropriate, whether > or not it's an Index Only Scan. Not sure what the exact formula should > be, just replacing random_page_cost with seq_page_cost is surely not > right either. > > - Heikki > So, not actually random replacement here, rather a change with baserel->allvisfrac taken into consideration (as given below): ---- index_random_page_cost = Min(spc_seq_page_cost + spc_random_page_cost * (1.0 - baserel->allvisfrac), spc_random_page_cost); ---- Does this make sense? -- Highgo Software (Canada/China/Pakistan) URL : www.highgo.ca ADDR: 10318 WHALLEY BLVD, Surrey, BC CELL:+923335449950 EMAIL: mailto:hamid.akh...@highgo.ca SKYPE: engineeredvirus