On Tue, Jun 2, 2015 at 8:56 AM, Andres Freund <and...@anarazel.de> wrote:
> But what *definitely* looks wrong to me is that a TruncateMultiXact() in
> this scenario now (since a couple weeks ago) does a
> SimpleLruReadPage_ReadOnly() in the members slru via
> find_multixact_start(). That just won't work acceptably when we're not
> yet consistent. There very well could not be a valid members segment at
> that point? Am I missing something?
Yes: that code isn't new. TruncateMultiXact() called
SimpleLruReadPage_ReadOnly() directly in 9.3.0 and every subsequent
release until 9.3.7/9.4.2. The only thing that's changed is that we've
moved that logic into a function called find_multixact_start() instead
of having it directly inside TruncateMultiXact(). We did that because
we needed to use the same logic in some other places. The reason
9.3.7/9.4.2 are causing problems for people that they didn't have
previously is that those new, additional call sites were poorly chosen
and didn't include adequate protection against calling that function
with an invalid input value. What this patch is about is getting back
to the situation that we were in from 9.3.0 - 9.3.6 and 9.4.0 - 9.4.1,
where TruncateMultiXact() did the thing that you're complaining about
here but no one else did.

From my point of view, I think that you are absolutely right to
question what's going on in TruncateMultiXact(). It's kooky, and there
may well be bugs buried there. But I don't think fixing that should be
the priority right now, because we have zero reports of problems
attributable to that logic. I think the priority should be on undoing
the damage that we did in 9.3.7/9.4.2, when we made other places do
the same thing. We started getting trouble reports attributable to
those changes *almost immediately*, which means that whether or not
TruncateMultiXact() is broken, these new call sites definitely are. I
think we really need to fix those new places ASAP.

> I think at the very least we'll have to skip this step while not yet
> consistent. That really sucks, because we'll possibly end up with
> multixacts that are completely filled by the time we've reached
> consistency.

That would be a departure from the behavior of every existing release
that includes this code, based on, to my knowledge, zero trouble
reports.
-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company