Matteo Beccati wrote:
> Tom, Alvaro
>
> >>The remaining question for me is, how do we sleep until the correct
> >>offset has been stored?
> >
> >I was thinking of just pg_usleep for some nominal time (1ms maybe)
> >and try to read the offsets page again. This is a corner case so
> >the performance doesn't have to be great.
>
> Let me know if you need to test some other patches.
Ok. I had hoped to reproduce the problem with pristine sources, in
order to verify that I was able to show it not appearing with my patch.
However I have been unable to create a situation in which the problem
appears. So I attach the patch that I came up with. Please test it.
I added a loop counter, to verify that we don't loop indefinitely. I'm
not sure that it's the best way to do it, but I'm too coward to leave it
without any check.
--
Alvaro Herrera Developer, http://www.PostgreSQL.org
"La soledad es compañía"
Index: src/backend/access/transam/multixact.c
===================================================================
RCS file: /home/alvherre/cvs/pgsql/src/backend/access/transam/multixact.c,v
retrieving revision 1.9
diff -c -r1.9 multixact.c
*** src/backend/access/transam/multixact.c 15 Oct 2005 02:49:09 -0000
1.9
--- src/backend/access/transam/multixact.c 27 Oct 2005 21:39:04 -0000
***************
*** 798,804 ****
--- 798,807 ----
ExtendMultiXactMember(*offset, nxids);
+ /* Advance the offset counter, but don't leave it at 0. */
MultiXactState->nextOffset += nxids;
+ if (MultiXactState->nextOffset == 0)
+ MultiXactState->nextOffset = 1;
LWLockRelease(MultiXactGenLock);
***************
*** 829,834 ****
--- 832,838 ----
MultiXactId tmpMXact;
MultiXactOffset nextOffset;
TransactionId *ptr;
+ int j = 0;
debug_elog3(DEBUG2, "GetMembers: asked for %u", multi);
***************
*** 922,932 ****
--- 926,975 ----
pageno = MultiXactIdToOffsetPage(tmpMXact);
entryno = MultiXactIdToOffsetEntry(tmpMXact);
+ retry:
if (pageno != prev_pageno)
slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno,
tmpMXact);
offptr = (MultiXactOffset *)
MultiXactOffsetCtl->shared->page_buffer[slotno];
offptr += entryno;
+
+ /*
+ * Note a possible race condition: when we create a new
MultiXact and
+ * store its info in MultiXactState, we release the
MultiXactGenLock
+ * before storing the offset in the SLRU area. It's thus
possible that
+ * we just got the offset that some other backend has been
assigned,
+ * but hasn't written on the SLRU page yet.
+ *
+ * One way to close this hole is to make the creating backend
hold
+ * MultiXactGenLock until the offset is stored. That would be
too bad
+ * a performance hit, however, so instead we choose to check
for this
+ * situation here: if we read a zero offset, sleep and retry,
until the
+ * other backend has had a chance to write the true offset.
+ *
+ * Because of this, we have to make sure offset 0 is never used.
+ */
+ if (*offptr == 0)
+ {
+ LWLockRelease(MultiXactOffsetControlLock);
+ pg_usleep(1000);
+
+ /*
+ * Note that since we released the OffsetControlLock,
we cannot be
+ * sure that the page we read is still on the buffer,
so we must
+ * force it to be read again.
+ */
+ prev_pageno = -1;
+ /*
+ * We are not sure that there aren't other bugs in this
code, so
+ * we refuse to iterate more than a minute's worth.
+ */
+ #define MAX_OFFSET_ITERATIONS (60 * 1000)
+ if (j++ > MAX_OFFSET_ITERATIONS)
+ elog(PANIC, "too many GetMultiXactIdMembers
iterations");
+
+ LWLockAcquire(MultiXactOffsetControlLock, LW_EXCLUSIVE);
+ goto retry;
+ }
length = *offptr - offset;
}
***************
*** 1200,1205 ****
--- 1243,1254 ----
/* Make sure we zero out the per-backend state */
MemSet(MultiXactState, 0, SHARED_MULTIXACT_STATE_SIZE);
+
+ /*
+ * zero is not a valid offset, so skip it. See notes in
+ * GetMultiXactIdMembers.
+ */
+ MultiXactState->nextOffset = 1;
}
else
Assert(found);
---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend