On Fri, 29 Mar 2019 at 16:32, Andres Freund <and...@anarazel.de> wrote:

> On 2019-03-29 16:20:54 +0000, Simon Riggs wrote:
> > On Fri, 29 Mar 2019 at 16:12, Andres Freund <and...@anarazel.de> wrote:
> >
> >
> > > On 2019-03-29 15:58:14 +0000, Simon Riggs wrote:
> > > > On Fri, 29 Mar 2019 at 15:29, Andres Freund <and...@anarazel.de>
> wrote:
> > > > > That's far from a trivial feature imo. It seems quite possible that
> > > we'd
> > > > > end up with increased overhead, because the current logic can get
> away
> > > > > with only doing hint bit style writes - but would that be true if
> we
> > > > > started actually replacing the item pointers? Because I don't see
> any
> > > > > guarantee they couldn't cross a page boundary etc? So I think we'd
> need
> > > > > to do WAL logging during index searches, which seems prohibitively
> > > > > expensive.
> > > > >
> > > >
> > > > Don't see that.
> > > >
> > > > I was talking about reusing the first 4 bytes of an index tuple's
> > > > ItemPointerData,
> > > > which is the first field of an index tuple. Index tuples are
> MAXALIGNed,
> > > so
> > > > I can't see how that would ever cross a page boundary.
> > >
> > > They're 8 bytes, and MAXALIGN often is 4 bytes:
> > >
> >
> > xids are 4 bytes, so we're good.
>
> I literally went on to explain why that's not sufficient? You can't
> *just* replace the block number with an xid. You *also* need to set a
> flag denoting that it's an xid and dead now. Which can't fit in the same
> 4 bytes.  You either have to set it in the IndexTuple's t_info, or or in
> the ItemIdData's lp_flags. And both can be on a different sectors.  If
> the flag persists, and the xid doesn't you're going to interpret a block
> number as an xid - not great; but even worse, if the xid survives, but
> the flag doesn't, you're going to access the xid as a block - definitely
> not ok, because you're going to return wrong results.
>

Yes, I agree, I was thinking the same thing after my last comment, but was
afk. The idea requires the atomic update of at least 4 bytes plus at least
1 bit and so would require at least 8byte MAXALIGN to be useful. Your other
points suggesting that multiple things all need to be set accurately for
this to work aren't correct. The idea was that we would write a hint that
would avoid later I/O, so the overall idea is still viable.

Anyway, thinking some more, I think the whole idea of generating
lastRemovedXid is moot and there are better ways in the future of doing
this that would avoid a performance issue altogether. Clearly not PG12.

The main issue relates to the potential overhead of moving this to the
master. I agree its the right thing to do, but we should have some way of
checking it is not a performance issue in practice.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply via email to