On Mon, Nov 13, 2017 at 5:07 PM, Masahiko Sawada <sawada.m...@gmail.com> wrote: >> I've been suspicious of that commit (and related commits) for a while >> now [1]. I think that it explains a couple of different problem >> reports that we have seen. > > Yeah, the problem here is that vacuum and analyze don't acquire a > heavy weight lock for meta page using properly function. it seems not > relevant with that problem.
One thing that really bothers me about commit e2c79e14 is that LockPage() is called, not LockBuffer(). GIN had no LockPage() calls before that commit, and is now the only code in the entire system that calls LockPage()/ConditionalLockPage() (the hash am no longer uses page heavyweight locks following recent work there). Historically, the only reason that an AM would ever call LockPage() instead of LockBuffer() (at least after LWLocks were introduced many years ago) was that there was a small though acceptable risk of deadlock for concurrent inserters. It would hardly ever happen, but the possibility could not be ruled out, so deadlock detection was required. This was definitely true of hash, which is one reason why it was a second class index am for so long. I think this was even true of nbtree, up until about 15 years ago. The high level question that I have to ask about e2c79e14 is: can it deadlock? If not, why was LockPage() used at all? It seems like a bad sign that none of this is explained in the code. My guess is that bugs in this area have caused data corruption (not just undetectable deadlock issues), which was reported and commenting on elsewhere [1]; maybe this proposed fix of yours could have prevented that. But we clearly still need to do a careful analysis of e2c79e14, because it seems like it probably has fundamental design problems (it's not *just* buggy). [1] https://postgr.es/m/CAH2-WznBt2+7qc65btjxNNwa9BW+jKEBgVjb=f+26iuqhmy...@mail.gmail.com -- Peter Geoghegan