On Fri, 29 Mar 2019 at 15:29, Andres Freund <and...@anarazel.de> wrote:
> On 2019-03-29 09:37:11 +0000, Simon Riggs wrote:
> > While trying to understand this, I see there is an even better way to
> > optimize this. Since we are removing dead index tuples, we could alter the
> > killed index tuple interface so that it returns the xmax of the tuple
> > being marked as killed, rather than just a boolean to say it is dead.
>
> Wouldn't that quite possibly result in additional and unnecessary
> conflicts? Right now the page level horizon is computed whenever the
> page is actually reused, rather than when an item is marked as
> deleted. As it stands right now, the computed horizons are commonly very
> "old", because of that delay, leading to lower rates of conflicts.

I wasn't suggesting we change when the horizon is calculated, so no change
there. The idea was to cache the data for later use, replacing the hint bit
with a hint xid. That won't change the rate of conflicts, up or down - but
it does avoid I/O.

> > Indexes can then mark the killed tuples with the xmax that killed them
> > rather than just a hint bit. This is possible since the index tuples
> > are dead and cannot be used to follow the htid to the heap, so the
> > htid is redundant and so the block number of the tid could be
> > overwritten with the xmax, zeroing the itemid. Each killed item we
> > mark with its xmax means one less heap fetch we need to perform when
> > we delete the page - it's possible we optimize that away completely by
> > doing this.
>
> That's far from a trivial feature imo. It seems quite possible that we'd
> end up with increased overhead, because the current logic can get away
> with only doing hint bit style writes - but would that be true if we
> started actually replacing the item pointers? Because I don't see any
> guarantee they couldn't cross a page boundary etc? So I think we'd need
> to do WAL logging during index searches, which seems prohibitively
> expensive.

I don't see that.
I was talking about reusing the first 4 bytes of an index tuple's
ItemPointerData, which is the first field of an index tuple. Index tuples
are MAXALIGNed, so I can't see how that would ever cross a page boundary.

> And I'm also doubtful it's worth it because:
>
> > Since this point of the code is clearly going to be a performance issue
> > it seems like something we should do now.
>
> I've tried quite a bit to find a workload where this matters, but after
> avoiding redundant buffer accesses by sorting, and prefetching I was
> unable to do so. What workload do you see where this would be really be
> bad? Without the performance optimization I'd found a very minor
> regression by trying to force the heap visits to happen in a pretty
> random order, but after sorting that went away. I'm sure it's possible
> to find a case on overloaded rotational disks where you'd find a small
> regression, but I don't think it'd be particularly bad.

The code can do literally hundreds of random I/Os per 8192-byte block.
What happens with 16kB or 32kB blocks? "Small regression"?

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services