Re: POC: Cleaning up orphaned files using undo logs

Amit Kapila Sat, 14 Nov 2020 21:55:22 -0800

On Sun, Nov 15, 2020 at 11:24 AM Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> On Fri, Nov 13, 2020 at 6:02 PM Antonin Houska <a...@cybertec.at> wrote:
> >
> > Amit Kapila <amit.kapil...@gmail.com> wrote:
> >
> > > On Thu, Nov 12, 2020 at 2:45 PM Antonin Houska <a...@cybertec.at> wrote:
> > > >
> > > >
> > > > No background undo
> > > > ------------------
> > > >
> > > > Reduced complexity of the patch seems to be the priority at the moment. 
> > > > Amit
> > > > suggested that cleanup of an orphaned relation file is simple enough to 
> > > > be
> > > > done on foreground and I agree.
> > > >
> > >
> > > Yeah, I think we should try and see if we can make it work but I
> > > noticed that there are few places like AbortOutOfAnyTransaction where
> > > we have the assumption that undo will be executed in the background.
> > > We need to deal with it.
> >
> > I think this is o.k. if we always check for unapplied undo during startup.
> >
>
> Hmm, how it is ok to leave undo (and rely on startup) unless it is a
> PANIC error. IIRC, this path is invoked in non-panic errors as well.
> Basically, we won't be able to discard such an undo which doesn't seem
> like a good idea.
>
> > > > "undo worker" is still there, but it only processes undo requests after 
> > > > server
> > > > restart because relation data can only be changed in a transaction - it 
> > > > seems
> > > > cleaner to launch a background worker for this than to hack the startup
> > > > process.
> > > >
> > >
> > > But, I see there are still multiple undoworkers that are getting
> > > launched and I am not sure if that works correctly because a
> > > particular undoworker is connected to a database and then it starts
> > > processing all the pending undo.
> >
> > Each undo worker applies only transactions for its own database, see
> > ProcessExistingUndoRequests():
> >
> >         /* We can only process undo of the database we are connected to. */
> >         if (xact_hdr.dboid != MyDatabaseId)
> >                 continue;
> >
> > Nevertheless, as I've just mentioned in my response to Thomas, I admit that 
> > we
> > should try to live w/o the undo worker altogether.
> >
>
> Okay, but keep in mind that there could be a large amount of undo
> (unlike redo which has some limit as we can replay it from the last
> checkpoint) which needs to be processed but it might be okay to live
> with that for now. Another thing is that it seems we need to connect
> to the database to perform it which might appear a bit odd that we
> don't allow users to connect to the database but internally we are
> connecting it. These are just some points to consider while finalizing
> the solution to this.
>
> > > > Discarding
> > > > ----------
> > > >
> > > > There's no discard worker for the URS infrastructure yet. I thought 
> > > > about
> > > > discarding the undo log during checkpoint, but checkpoint should 
> > > > probably do
> > > > more straightforward tasks than the calculation of a new discard 
> > > > pointer for
> > > > each undo log, so a background worker is needed. A few notes on that:
> > > >
> > > >   * until the zheap AM gets added, only the transaction that creates 
> > > > the undo
> > > >     records needs to access them. This assumption should make the 
> > > > discarding
> > > >     algorithm a bit simpler. Note that with zheap, the other 
> > > > transactions need
> > > >     to look for old versions of tuples, so the concept of 
> > > > oldestXidHavingUndo
> > > >     variable is needed there.
> > > >
> > > >   * it's rather simple to pass pointer the URS pointer to the discard 
> > > > worker
> > > >     when transaction either committed or the undo has been executed.
> > > >
> > >
> > > Why can't we have a separate discard worker which keeps on scanning
> > > the undorecords and discard accordingly? Giving the onus of foreground
> > > process might be tricky because say discard worker is not up to speed
> > > and we ran out of space to pass such information for each commit/abort
> > > request.
> >
> > Sure, there should be a discard worker. The question is how to make its work
> > efficient. The initial run after restart probably needs to scan everything
> > between 'discard' and 'insert' pointers,
> >
>
> Yeah, such an initial scan would be helpful to identify pending aborts
> and allow them to be processed.
>
> > but then it should process only the
> > parts created by individual transactions.
> >
>
> Yeah, it needs to process transaction-by-transaction to see which all
> we can discard. Also, note that in Single-User mode we need to discard
> undo after commit. I think we also need to maintain
> oldestXidHavingUndo for CLOG truncation and transaction-wraparound. We
> can't allow CLOG truncation for the transaction whose undo is not
> discarded as that could be required by some other transaction. For
> similar reasons, we can't allow transaction-wraparound and we need to
> integrate this into the existing xid-allocation mechanism. I have
> found one of the old patch
> (Allow-execution-and-discard-of-undo-by-background-wo) attached
>


oops, forgot to attach the patch, doing now.

-- 
With Regards,
Amit Kapila.

Allow-execution-and-discard-of-undo-by-background-wo.patch
Description: Binary data

Re: POC: Cleaning up orphaned files using undo logs

Reply via email to