On Wed, Mar 12, 2014 at 8:29 PM, Heikki Linnakangas <hlinnakan...@vmware.com > wrote:
> On 03/12/2014 12:09 AM, Tomas Vondra wrote: > >> Hi all, >> >> a quick question that just occured to me - do you plan to tweak the cost >> estimation fot GIN indexes, in this patch? >> >> IMHO it would be appropriate, given the improvements and gains, but it >> seems to me gincostestimate() was not touched by this patch. >> > > Good point. We have done two major changes to GIN in this release cycle: > changed the data page format and made it possible to skip items without > fetching all the keys ("fast scan"). gincostestimate doesn't know about > either change. > > Adjusting gincostestimate for the more compact data page format seems > easy. When I hacked on that, I assumed all along that gincostestimate > doesn't need to be changed as the index will just be smaller, which will be > taken into account automatically. But now that I look at gincostestimate, > it assumes that the size of one item on a posting tree page is a constant 6 > bytes (SizeOfIptrData), which is no longer true. I'll go fix that. > > Adjusting for the effects of skipping is harder. gincostestimate needs to > do the same preparation steps as startScanKey: sort the query keys by > frequency, and call consistent function to split the keys intao "required" > and "additional" sets. And then model that the "additional" entries only > need to be fetched when the other keys match. That's doable in principle, > but requires a bunch of extra code. > > Alexander, any thoughts on that? It's getting awfully late to add new code > for that, but it sure would be nice somehow take fast scan into account. Preparation we do in startScanKey requires knowledge of estimate size of posting lists/trees. We do this estimate by traversal to leaf pages. I think gincostestimate is expected to be way more cheap. So, we probably need so more rough estimate there, don't we? ------ With best regards, Alexander Korotkov.