On Mon, Nov 21, 2022 at 6:17 PM Maxim Orlov <orlo...@gmail.com> wrote:
>
> PROBLEM
>
> After some investigation, I think, the problem is in the snapbuild.c (commit
> 272248a0c1b1, see [0]). We do allocate InitialRunningXacts
> array in the context of builder->context, but for the time when we call
> SnapBuildPurgeOlderTxn this context may be already free'd.
>
I think you are seeing it freed in SnapBuildPurgeOlderTxn when we finish and restart decoding in the same session. After the first decoding finishes, the decoding context is freed, but we forgot to reset NInitialRunningXacts and the InitialRunningXacts array. So, the next time we start decoding in the same session without restoring a serialized snapshot, NInitialRunningXacts (and the InitialRunningXacts array) won't have sane values, which can lead to the problem you are seeing. This can happen in the catalog_change_snapshot test because it has multiple permutations, and they reuse the same session across a restart of decoding.

>
> Simple fix like:
> @@ -1377,7 +1379,7 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr lsn, xl_running_xacts *runn
>          * changes. See SnapBuildXidSetCatalogChanges.
>          */
>         NInitialRunningXacts = nxacts;
> -       InitialRunningXacts = MemoryContextAlloc(builder->context, sz);
> +       InitialRunningXacts = MemoryContextAlloc(TopMemoryContext, sz);
>         memcpy(InitialRunningXacts, running->xids, sz);
>         qsort(InitialRunningXacts, nxacts, sizeof(TransactionId), xidComparator);
>
> seems to solve the described problem, but I'm not in the context of [0] and
> why array is allocated in builder->context.
>

That would leak the memory for InitialRunningXacts. Instead, we need to reset NInitialRunningXacts (and InitialRunningXacts) as mentioned above.

Thank you for the report and initial analysis. I have added Sawada-San to get his views, as he was the primary author of this work.

-- 
With Regards,
Amit Kapila.