(Redirected from Bugs to Hackers, as it seems not to be the same bug as the original one.)
On Mon, Jun 9, 2014 at 3:50 PM, Alvaro Herrera <alvhe...@2ndquadrant.com> wrote: > Jeff Janes wrote: > >> This patch seems to solve a problem I've also been having with non-existent >> "pg_multixact/members/13D35" files in my testing of torn page write and fk >> locks again recent 9.4. However, sometimes I've found the problem even >> before multi wrap around occured, and my reading of this thread suggests >> that the current issue would not show up in that case, so I don't know how >> much comfort to take in the knowledge that the problem seems to have gone >> away--as they say, things that go away by themselves without explanation >> also tend to come back by themselves. > > I think this problem is rather related to the code touched by commit > 722acf51a0d074d19782ad7e97ebe3fdb63fbb87, which is about removing > unneeded files in pg_multixact/members. I checked your files, and it > seems that pg_multixact/members files that should still exist are gone; > for instance, pg_controldata says that the oldest multi is 2586314407, > but if I run > select * from pg_get_multixact_members('that value') > I get an error saying that pg_multixact/members/218B does not exist. > > It seems that there are a lot of files that were improperly removed. > I expect that existing files should be sequential, but there are a lot > of holes in a listing of pg_multixact/members. I think the holes are because a truncation was in progress and interrupted by a 'pg_ctl stop -mi' before it completed. The files are considered in an apparently random order, so if it is interrupted midway through it will leave a collection of holes that would have been closed up had it been allowed to continue. I instrumented the code to log file deletion (attached patch) and I see that the it starts complaining about the absence of 5183 just a fraction of a second after it was deleted, so the deletion must have been inappropriate. 12538 2014-06-17 12:10:02.925 PDT:LOG: JJ deleting 0xb66b20 5183 12498 UPDATE 2014-06-17 12:10:03.188 PDT:DETAIL: Could not open file "pg_multixact/members/5183": No such file or directory. 12561 UPDATE 2014-06-17 12:10:04.473 PDT:DETAIL: Could not open file "pg_multixact/members/5183": No such file or directory. 12572 UPDATE 2014-06-17 12:10:04.475 PDT:DETAIL: Could not open file "pg_multixact/members/5183": No such file or directory. This problem was initially fairly easy to reproduce, but since I started adding instrumentation specifically to catch it, it has become devilishly hard to reproduce. I think my next step will be to also log each of the values which goes into the complex if (...) expression that decides on the deletion. Cheers, Jeff
member_delete_log.patch
Description: Binary data
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers