Re: [GENERAL] 2 node bdr setup gives error in replication slots

Nikhil Tue, 14 Jun 2016 00:47:05 -0700

I think it's caused by hard reboots (maybe the hypervisor itself was
rebooted!). Is there any setting which can reduce such problems?


On Tue, Jun 7, 2016 at 5:30 PM, Craig Ringer <cr...@2ndquadrant.com> wrote:

> On 7 June 2016 at 18:24, Nikhil <nikhilsme...@gmail.com> wrote:
>
>> I am getting the error below in my 2-node BDR setup, and Postgres is
>> going down. Any idea?
>>
>> <35382016-06-07 10:16:59 GMT%LOG:  database system was interrupted; last
>> known up at 2016-06-07 09:06:44 GMT
>> <35382016-06-07 10:16:59 GMT%PANIC:  replication slot file
>> "pg_replslot/bdr_16389_6293051490331141125_2_16389__/state" has
>> wrong magic 4522536 instead of 17112225
>> <35352016-06-07 10:16:59 GMT%LOG:  startup process (PID 3538) was
>> terminated by signal 6: Abort trap
>> <35352016-06-07 10:16:59 GMT%LOG:  aborting startup due to startup
>> process failure
>>
>
> That suggests that there was a write failure on the replication slot file.
>
> A simple write error shouldn't be possible because we write the slot file
> to a tempfile, then replace the old slot file with the new one. Filesystem
> issues are possible, or memory corruption in the application that caused a
> bad write. Or a bug, but it's hard to see how we could write the wrong slot
> magic number here.
>
> With the slot corrupted all you can really do is part one of the nodes
> then join a new one.
>
> If you're able to reproduce this I'd really like to see how it came about.
>
> --
>  Craig Ringer                   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>
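For reference, the write-to-tempfile-then-rename pattern described in the quoted reply, together with a magic-number check like the one that raised the PANIC, can be sketched roughly as follows. This is only an illustration, not PostgreSQL's actual slot code: the function names are made up, and it assumes the magic is stored as the first 4 bytes of the file as an unsigned 32-bit integer in native byte order. The value 17112225 is taken from the log message above.

```python
import os
import struct
import tempfile

# Expected magic from the PANIC message (17112225 == 0x1051CA1).
SLOT_MAGIC = 17112225


def write_state_atomically(path, payload):
    """Hypothetical helper: write magic + payload to a temp file in the
    same directory, fsync it, then rename it over the old file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix="state.", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(struct.pack("=I", SLOT_MAGIC) + payload)
            f.flush()
            os.fsync(f.fileno())  # data must reach disk before the rename
        os.replace(tmp_path, path)  # atomic replacement on POSIX filesystems
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # fsync the directory so the rename itself survives a crash (POSIX only).
    if hasattr(os, "O_DIRECTORY"):
        dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)


def read_magic(path):
    """Return the first 4 bytes of the file as an unsigned 32-bit integer."""
    with open(path, "rb") as f:
        header = f.read(4)
    if len(header) < 4:
        raise ValueError("file too short to contain a magic number")
    return struct.unpack("=I", header)[0]


def has_expected_magic(path):
    """True if the file starts with the expected magic number."""
    return read_magic(path) == SLOT_MAGIC
```

With this pattern a reader should see either the complete old file or the complete new one, which is why a wrong magic number points at something below the application (filesystem, storage or hypervisor caching, lost fsyncs on hard reboot) rather than at a simple interrupted write.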
