Re: [HACKERS] 9.4 logical decoding assertion

2014-08-15 Thread Andres Freund
On 2014-08-14 16:03:08 -0400, Steve Singer wrote:
> I hit the following on 9.4 testing logical decoding.
>
> TRAP: FailedAssertion(!(prev_first_lsn < cur_txn->first_lsn), File:
> reorderbuffer.c, Line: 618)
> LOG:  server process (PID 3801) was terminated by signal 6: Aborted

I saw that recently while hacking around, but I thought it was because
of stuff I'd added. But apparently not.

Hm. I think I see how that might happen. It might be possible (and
harmless) if two subxacts of the same toplevel xact have the same
first_lsn. But if it's not just <= vs < it'd be worse.
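
To make the <= vs < distinction concrete, here is a minimal sketch of such an ordering check. It is not the code from reorderbuffer.c; the names SketchLsn, SketchTxn and sketch_lsn_order_ok are hypothetical and exist only for illustration. It assumes the check walks a list of transactions and compares each first_lsn with the previous one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an LSN: a plain 64-bit WAL position. */
typedef uint64_t SketchLsn;

/* Hypothetical, stripped-down entry: only the field the check reads. */
typedef struct SketchTxn
{
    SketchLsn first_lsn;
} SketchTxn;

/*
 * Returns true if each entry's first_lsn is greater than its predecessor's
 * (strict == true), or greater than or equal to it (strict == false).
 * An empty or single-entry list is trivially ordered.
 */
static bool
sketch_lsn_order_ok(const SketchTxn *txns, size_t ntxns, bool strict)
{
    for (size_t i = 1; i < ntxns; i++)
    {
        SketchLsn prev = txns[i - 1].first_lsn;
        SketchLsn cur = txns[i].first_lsn;

        if (strict ? !(prev < cur) : !(prev <= cur))
            return false;
    }
    return true;
}
```

With two entries sharing the same first_lsn, the strict variant reports a violation while the non-strict one does not. A list that is actually out of order fails both, which would be the worse case.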

> Unfortunately I don't have a core file and I haven't been able to reproduce
> this.

Any information about the workload? Any chance you still have the data
directory around?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.4 logical decoding assertion

2014-08-15 Thread Steve Singer

On 08/15/2014 09:42 AM, Andres Freund wrote:

> On 2014-08-14 16:03:08 -0400, Steve Singer wrote:
>> I hit the following on 9.4 testing logical decoding.
>>
>> TRAP: FailedAssertion(!(prev_first_lsn < cur_txn->first_lsn), File:
>> reorderbuffer.c, Line: 618)
>> LOG:  server process (PID 3801) was terminated by signal 6: Aborted
>
> I saw that recently while hacking around, but I thought it was because
> of stuff I'd added. But apparently not.
>
> Hm. I think I see how that might happen. It might be possible (and
> harmless) if two subxacts of the same toplevel xact have the same
> first_lsn. But if it's not just <= vs < it'd be worse.
>
>> Unfortunately I don't have a core file and I haven't been able to reproduce
>> this.
>
> Any information about the workload? Any chance you still have the data
> directory around?


I was running the slony regression tests, but I ran the same test 
script a number of times afterwards and the problem didn't reproduce 
itself.


The last thing the tests did before the crash was part of the slony 
failover process.


I am doing my testing with all 5 nodes/databases running under the same 
postmaster (giving something like 20 replication slots open).


A few milliseconds before the crash, one of the connections had just done a
START_REPLICATION SLOT slon_4_2 LOGICAL 0/32721A58

and then that connection reported the socket being closed,

but because so much was going on concurrently I can't say for sure if 
that connection experienced the assert or was closed because another 
backend asserted.



I haven't done an initdb since, so I still have the data directory, but 
I've dropped and recreated all of my slots many times since, so the WAL 
files are long gone.




> Greetings,
>
> Andres Freund







[HACKERS] 9.4 logical decoding assertion

2014-08-14 Thread Steve Singer

I hit the following on 9.4 testing logical decoding.


TRAP: FailedAssertion(!(prev_first_lsn < cur_txn->first_lsn), File: 
reorderbuffer.c, Line: 618)

LOG:  server process (PID 3801) was terminated by signal 6: Aborted

Unfortunately I don't have a core file and I haven't been able to 
reproduce this.









