Hi, (sorry for not being active here, I am still catching up after being away for some family issues)
On 16/11/17 21:12, Robert Haas wrote: > On Thu, Nov 16, 2017 at 2:41 PM, Andres Freund <and...@anarazel.de> wrote: >>> To me, it seems like SnapBuildWaitSnapshot() is fundamentally >>> misdesigned >> >> Maybe I'm confused, but why is it fundamentally misdesigned? It's not >> such an absurd idea to wait for an xid in a WAL record. I get that >> there's a race condition here, which obviously bad, but I don't really >> see as evidence of the above claim. >> >> I actually think this code used to be safe because ProcArrayLock used to >> be held while generating and logging the running snapshots record. That >> was removed when fixing some other bug, but perhaps that shouldn't have >> been done... > > OK. Well, I might be overstating the case. My comment about > fundamental misdesign was really just based on the assumption that > XactLockTableWait() could be used to wait for an XID the instant it > was generated. That was never gonna work and there's no obvious clean > workaround for the problem. Getting snapshot building to work > properly seems to be Hard (TM). > No kidding, we've been at it since 9.4. But anyway, while sure we have race condition because XactLockTableWait() finishes too early, all that should mean is we call LogStandbySnapshot() too early and as a result taking snapshot (and hence slot creation) will take longer as we'll wait for next call of LogStandbySnapshot() from some other caller (because the tx we care about will still be running). The whole SnapBuildWaitSnapshot() is just optimization to make slot creation take less time (and also to be able to write tests). What I don't understand is how it leads to crash (and I could not reproduce it using the pgbench file attached in this thread either) and moreover how it leads to 0 xid being logged. The only explanation I can come up is that some kind of similar race has to be in LogStandbySnapshot() but we explicitly check for 0 xid value there. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services