On Tue, Aug 23, 2016 at 10:05 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:

> On Wed, Aug 24, 2016 at 2:37 AM, Jeff Janes <jeff.ja...@gmail.com> wrote:
>
> >
> > After an intentionally created crash, I get an Assert triggering:
> >
> > TRAP: FailedAssertion("!(((freep)[(bitmapbit)/32] & (1<<((bitmapbit)%32))))", File: "hashovfl.c", Line: 553)
> >
> > freep[0] is zero and bitmapbit is 16.
> >
>
> What is happening here is that when it tries to clear bitmapbit, it
> expects the bit to be set.  Now, I think the reason it didn't find
> the bit set could be that after the new overflow page was added and
> its corresponding bit was set, you crashed the system, and replay did
> not set the bit.  Then, while freeing the overflow page, it can hit
> the Assert you mentioned.  I think the problem is that I am using
> REGBUF_STANDARD to log the bitmap page updates.  As the bitmap page
> doesn't follow the standard page layout, the actual contents would
> have been omitted when taking the full page image, and then during
> replay the bit would not have been set, because the page doesn't
> need REDO.  I think the fix here is to use REGBUF_NO_IMAGE, as we do
> for vm buffers.
>
> If you can send me detailed steps for how you produced the problem,
> I can verify after the fix whether you are seeing the same problem
> or something else.
>


The test is rather awkward; it might be easier to just have me test it.
But I've attached it.

There is a patch that needs to be applied and compiled (alongside your
patches, of course) to inject the crashes, a Perl script which creates
the schema and does the updates, and a shell script which sets up the
cluster with the appropriate parameters and then calls the Perl script in
a loop.

The top of the shell script has some hard-coded paths to the binaries and
to the test data directory (which is automatically deleted).

I run it as "sh do.sh >& do.err &".

It gives two different types of assertion failures:

$ fgrep TRAP: do.err |sort|uniq -c

     21 TRAP: FailedAssertion("!(((freep)[(bitmapbit)/32] & (1<<((bitmapbit)%32))))", File: "hashovfl.c", Line: 553)
     32 TRAP: FailedAssertion("!(RefCountErrors == 0)", File: "bufmgr.c", Line: 2506)

The second one is related to the intentional crashes and so is not
relevant to you.

Cheers,

Jeff

Attachment: count.pl
Description: Binary data

Attachment: crash_REL10.patch
Description: Binary data

Attachment: do.sh
Description: Bourne shell script
