On Tue, Aug 23, 2016 at 10:05 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> On Wed, Aug 24, 2016 at 2:37 AM, Jeff Janes <jeff.ja...@gmail.com> wrote:
>
> >
> > After an intentionally created crash, I get an Assert triggering:
> >
> > TRAP: FailedAssertion("!(((freep)[(bitmapbit)/32] &
> > (1<<((bitmapbit)%32))))", File: "hashovfl.c", Line: 553)
> >
> > freep[0] is zero and bitmapbit is 16.
> >
>
> Here what is happening is that when it tries to clear the bitmapbit,
> it expects it to be set. Now, I think the reason for why it didn't
> find the bit as set could be that after the new overflow page is added
> and the bit corresponding to it is set, you might have crashed the
> system and the replay would not have set the bit. Then while freeing
> the overflow page it can hit the Assert as mentioned by you. I think
> the problem here could be that I am using REGBUF_STANDARD to log the
> bitmap page updates which seems to be causing the issue. As bitmap
> page doesn't follow the standard page layout, it would have omitted
> the actual contents while taking full page image and then during
> replay, it would not have set the bit, because page doesn't need REDO.
> I think here the fix is to use REGBUF_NO_IMAGE as we use for vm
> buffers.
>
> If you can send me the detailed steps for how you have produced the
> problem, then I can verify after fixing whether you are seeing the
> same problem or something else.
>

The test is rather awkward, so it might be easier to just have me test it.
But I've attached it anyway.

There is a patch that needs to be applied and compiled (alongside your
patches, of course) to inject the crashes. There is also a Perl script that
creates the schema and does the updates, and a shell script that sets up the
cluster with the appropriate parameters and then calls the Perl script in a
loop.

The top of the shell script has some hard-coded paths to the binaries and to
the test data directory (which is automatically deleted).

I run it like "sh do.sh >& do.err &"

It gives two different types of assertion failures:

$ fgrep TRAP: do.err |sort|uniq -c
     21 TRAP: FailedAssertion("!(((freep)[(bitmapbit)/32] & (1<<((bitmapbit)%32))))", File: "hashovfl.c", Line: 553)
     32 TRAP: FailedAssertion("!(RefCountErrors == 0)", File: "bufmgr.c", Line: 2506)

The second one is related to the intentional crashes, and so is not
relevant to you.

Cheers,

Jeff
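For readers following the first TRAP: the condition it reports can be reproduced in isolation. The following is only a minimal sketch, not the code from hashovfl.c. The function names (bit_is_set, set_bit, clear_bit, try_free_bit) are hypothetical, and the 32-bit word size is an assumption taken from the /32 and %32 in the assertion text. It shows why freep[0] == 0 with bitmapbit == 16 fails the check: the bit that the free path expects to clear was never set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 32-bit bitmap words, matching the /32 and %32 in the TRAP text. */
#define BITS_PER_WORD 32

/* Return true if bit number `bitmapbit` is set in the word array `freep`. */
static inline bool bit_is_set(const uint32_t *freep, uint32_t bitmapbit)
{
    return (freep[bitmapbit / BITS_PER_WORD] &
            ((uint32_t) 1 << (bitmapbit % BITS_PER_WORD))) != 0;
}

/* Mark a page as in use, as would happen when an overflow page is added. */
static inline void set_bit(uint32_t *freep, uint32_t bitmapbit)
{
    freep[bitmapbit / BITS_PER_WORD] |=
        (uint32_t) 1 << (bitmapbit % BITS_PER_WORD);
}

/* Mark a page as free again. */
static inline void clear_bit(uint32_t *freep, uint32_t bitmapbit)
{
    freep[bitmapbit / BITS_PER_WORD] &=
        ~((uint32_t) 1 << (bitmapbit % BITS_PER_WORD));
}

/*
 * Hypothetical stand-in for the free path: the bit must already be set
 * before it is cleared. Returns false in the situation where the server
 * code would hit its Assert, instead of aborting.
 */
static inline bool try_free_bit(uint32_t *freep, uint32_t bitmapbit)
{
    assert(freep != NULL);
    if (!bit_is_set(freep, bitmapbit))
        return false;           /* the reported case: bit was never set */
    clear_bit(freep, bitmapbit);
    return true;
}
```

With a zeroed one-word bitmap, try_free_bit(freep, 16) returns false, which corresponds to the reported state. After set_bit(freep, 16) the same call succeeds and leaves the word zero again. This is consistent with the explanation quoted above, under which replay never set the bit after the crash.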
count.pl
Description: Binary data
crash_REL10.patch
Description: Binary data
do.sh
Description: Bourne shell script