On 09/27/2013 05:18 PM, Andres Freund wrote:
Hi Steve,

On 2013-09-27 17:06:59 -0400, Steve Singer wrote:
I've determined that in this test the walsender seems to hit this while
it is decoding the transactions behind the slonik commands that add
tables to replication (set add table, set add sequence).
This is before the SUBSCRIBE SET is submitted.

I've also noticed something else that is strange (but might be unrelated).
If I stop my slon process and restart it I get messages like:

WARNING:  Starting logical replication from 0/a9321360
ERROR:  cannot stream from 0/A9321360, minimum is 0/A9320B00

Where 0/A9321360 was sent in the last packet my slon received from the
walsender before the restart.
Uh, that looks like I fumbled some comparison. Let me check.
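
For reference, a minimal sketch of how the two positions in that error compare. It assumes the usual textual LSN format (two hex halves separated by '/'); parse_lsn is a hypothetical helper written for this illustration, not a PostgreSQL function.

```python
def parse_lsn(text: str) -> int:
    """Convert a textual LSN such as '0/A9321360' into a 64-bit integer.

    Hypothetical helper: the part before '/' is taken as the high
    32 bits and the part after it as the low 32 bits, both in hex.
    """
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


# Values copied from the ERROR line quoted above.
requested = parse_lsn("0/A9321360")
minimum = parse_lsn("0/A9320B00")

print(requested >= minimum)   # True
print(requested - minimum)    # 2144
```

The requested start point is 2144 bytes past the reported minimum, so on these numbers alone the request does not look like it is behind the minimum, which would be consistent with a comparison going the wrong way.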

I've further narrowed this down to something (or some combination of
things) that the _disorder_replica.altertableaddTriggers(1) stored
function does (that is, @SLONYNAMESPACE@.altertableaddTriggers(int)).

That function essentially does the following:
* Get an exclusive lock on sl_config_lock
* Get an exclusive lock on the user table in question
* Create a trigger (the deny access trigger)
* Create a truncate trigger
* Create a deny truncate trigger

I am not yet able to replicate the error by issuing the same SQL commands
from psql, but I must be missing something.

I can replicate this when just using the test_decoding plugin.
Thanks. That should get me started with debugging. Unless it's possibly
already fixed in the latest version? One bug fixed there might cause
something like this if the moon stands exactly right.

The latest version has NOT fixed the problem.

Also, I was a bit inaccurate in my previous descriptions. To clarify:

1. I am sometimes getting that 'unexpected duplicate' error.
2. The 'set add table' command, which triggers the functions that create and configure triggers, is actually causing the walsender to hit the following assertion:
#2  0x0000000000773d47 in ExceptionalCondition (
    conditionName=conditionName@entry=0x8cf400 "!(ent->cmin == change->tuplecid.cmin)",
    errorType=errorType@entry=0x7ab830 "FailedAssertion",
    fileName=fileName@entry=0x8cecc3 "reorderbuffer.c",
    lineNumber=lineNumber@entry=1162) at assert.c:54
#3  0x0000000000665480 in ReorderBufferBuildTupleCidHash (txn=0x1b6e610,
    rb=<optimized out>) at reorderbuffer.c:1162
#4  ReorderBufferCommit (rb=0x1b6e4f8, xid=<optimized out>,
    commit_lsn=3461001952, end_lsn=<optimized out>) at reorderbuffer.c:1285
#5  0x000000000065f0f7 in DecodeCommit (xid=<optimized out>,
    nsubxacts=<optimized out>, sub_xids=<optimized out>, ninval_msgs=16,
    msgs=0x1b637c0, buf=0x7fff54d01530, buf=0x7fff54d01530, ctx=0x1adb928,
    ctx=0x1adb928) at decode.c:477
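
To line the backtrace up with the LSNs in the server log, the decimal commit_lsn that gdb prints in frame #4 can be converted to the hi/lo hex notation. This is a minimal sketch; format_lsn is a hypothetical helper, and it assumes the log uses the conventional upper-case hex form with the high and low 32 bits separated by '/'.

```python
def format_lsn(value: int) -> str:
    """Render a 64-bit WAL position in hi/lo hex notation.

    Hypothetical helper: high 32 bits, a slash, then low 32 bits,
    both printed as upper-case hex without padding.
    """
    return f"{value >> 32:X}/{value & 0xFFFFFFFF:X}"


# commit_lsn as printed by gdb in frame #4 of the backtrace above.
commit_lsn = 3461001952

print(format_lsn(commit_lsn))   # 0/CE4AB2E0
```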


I had added an assert(false) to the code where the 'unknown duplicate' error is logged to make this easier to spot, but yesterday I didn't double-check whether I was hitting the assertion I added or this other one. I can't yet say whether these are two unrelated issues or whether I'd reach the 'unknown duplicate' message immediately afterwards.




Greetings,

Andres Freund



