Amit Kapila <amit.kapil...@gmail.com> writes:
> So, can we assume that the current code can only cause the problem in
> CCA builds but not in any practical scenario because after having a
> lock on relation probably there shouldn't be any invalidation which
> leads to this problem?

No.  The reason we expend so much time and effort on CCA testing is
that cache flushes are unpredictable, and they can happen even when
you have a lock on the object(s) in question.  In particular, the
easiest real-world case that could cause the described problem is an
sinval queue overrun that we detect during the GetSubscriptionRelState
call at the bottom of logicalrep_rel_open.  In that case we'll come
back to the caller with the LogicalRepRelMapEntry marked as already
needing revalidation, because the cache inval callback will have
marked *all* of them that way.  That's actually a harmless condition,
because we have a lock on the rel so nothing really changed ... but if
we blew away localreloid and the caller needs to use that, kaboom.

We could imagine marking the entry valid at the very bottom of
logicalrep_rel_open, but that just moves the problem somewhere else.
Any caller that does *any* catalog access while holding open a
LogicalRepRelMapEntry would not be able to rely on its localreloid
staying valid.  That's a recipe for irreproducible bugs, and it's
unnecessary.  In practice the entry is good as long as we continue to
hold a lock on the local relation.  So we should mark
LogicalRepRelMapEntries as potentially-needing-revalidation in a way
that doesn't interfere with active users of the entry.

In short: the value of CCA testing is to model sinval overruns
happening at any point where they could happen.  The real-world odds
of one happening at any given instant are low, but they're never zero.

			regards, tom lane
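
To make the distinction concrete, here is a toy model in standalone C.
It is only a sketch and not the PostgreSQL source: ToyRelMapEntry, the
needs_recheck and lock_held fields, and all toy_* functions are
hypothetical names made up for illustration, and the "sinval overrun"
is simulated by calling an invalidation callback at the bottom of the
open routine.  Only the name localreloid is taken from the discussion
above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* Hypothetical stand-in for a LogicalRepRelMapEntry. */
typedef struct ToyRelMapEntry
{
    Oid   localreloid;     /* OID of the local relation */
    bool  needs_recheck;   /* hypothetical flag: revalidate at next open */
    bool  lock_held;       /* toy model of "we hold a lock on the rel" */
} ToyRelMapEntry;

#define TOY_MAP_SIZE 4
static ToyRelMapEntry toy_map[TOY_MAP_SIZE];

/* Problematic style: the callback clobbers data active users rely on. */
static void
toy_inval_destructive(void)
{
    for (size_t i = 0; i < TOY_MAP_SIZE; i++)
        toy_map[i].localreloid = InvalidOid;
}

/* Safer style: only note that every entry should be rechecked later. */
static void
toy_inval_flag_only(void)
{
    for (size_t i = 0; i < TOY_MAP_SIZE; i++)
        toy_map[i].needs_recheck = true;
}

/*
 * Open an entry: take the (toy) lock, rebuild the entry if it is flagged
 * or empty, then simulate a cache flush arriving during the last catalog
 * access before returning to the caller.
 */
static ToyRelMapEntry *
toy_rel_open(size_t idx, Oid looked_up_oid, void (*inval_cb)(void))
{
    ToyRelMapEntry *entry;

    assert(idx < TOY_MAP_SIZE);
    entry = &toy_map[idx];
    entry->lock_held = true;

    if (entry->needs_recheck || entry->localreloid == InvalidOid)
    {
        entry->localreloid = looked_up_oid;
        entry->needs_recheck = false;
    }

    inval_cb();                 /* simulated flush at the worst moment */
    return entry;
}

/* Close an entry: drop the (toy) lock; the recheck flag is left alone. */
static void
toy_rel_close(ToyRelMapEntry *entry)
{
    entry->lock_held = false;
}
```

With toy_inval_destructive, the caller of toy_rel_open gets back an
entry whose localreloid is already InvalidOid.  With
toy_inval_flag_only, the caller gets an entry that is flagged for
recheck but whose localreloid is still usable while the lock is held;
the flag is only acted on at the next toy_rel_open.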