On 10/28/2014 01:31 PM, Andres Freund wrote:
On 2014-10-25 18:18:07 -0400, Steve Singer wrote:
My logical decoding plugin is occasionally getting this error
"could not resolve cmin/cmax of catalog tuple"
I get this when my output plugin is trying to read one of the user defined
catalog tables (user_catalog_table=true)
Hm. That should obviously not happen.
Could you describe how that table is modified? Does that bug happen
initially, or only after a while?
It doesn't happen right away; in this case it was maybe 4 minutes after
creating the slot.
The error also doesn't always happen when I run this test workload
but it is reproducible with some trying.
I don't do anything special to that table; it gets created, then I do
inserts on it. I don't do an alter table or anything fancy like that.
I was running the slony failover test (all nodes under the same
postmaster) which involves the occasional dropping and recreating of
databases along with normal query load + replication.
I'll send you a tar of the data directory off list with things in this state.
Do you have a testcase that would allow me to easily reproduce the
problem?
I don't have an isolated test case that does this. The test that I'm
hitting this with does lots of stuff and doesn't even always hit this.
I am not sure if this is a bug in the time-travel support in logical
decoding or if I'm just using it wrong (i.e. not getting a sufficient
lock on the relation or something).
I don't know yet...
This is the interesting part of the stack trace
#4 0x000000000091bbc8 in HeapTupleSatisfiesHistoricMVCC (htup=0x7fffcf42a900,
    snapshot=0x7f786ffe92d8, buffer=10568) at tqual.c:1631
#5 0x00000000004aedf3 in heapgetpage (scan=0x28d7080, page=0) at heapam.c:399
#6 0x00000000004b0182 in heapgettup_pagemode (scan=0x28d7080,
    dir=ForwardScanDirection, nkeys=0, key=0x0) at heapam.c:747
#7 0x00000000004b1ba6 in heap_getnext (scan=0x28d7080,
    direction=ForwardScanDirection) at heapam.c:1475
#8 0x00007f787002dbfb in lookupSlonyInfo (tableOid=91754, ctx=0x2826118,
    origin_id=0x7fffcf42ab8c, table_id=0x7fffcf42ab88, set_id=0x7fffcf42ab84)
    at slony_logical.c:663
#9 0x00007f787002b7a3 in pg_decode_change (ctx=0x2826118, txn=0x28cbec0,
    relation=0x7f787a3446a8, change=0x7f786ffe3268) at slony_logical.c:237
#10 0x00000000007497d4 in change_cb_wrapper (cache=0x28cbda8, txn=0x28cbec0,
    relation=0x7f787a3446a8, change=0x7f786ffe3268) at logical.c:704
Here is what the code in lookupSlonyInfo is doing
------------------
sltable_oid = get_relname_relid("sl_table",slony_namespace);
sltable_rel = relation_open(sltable_oid,AccessShareLock);
tupdesc=RelationGetDescr(sltable_rel);
scandesc=heap_beginscan(sltable_rel,
GetCatalogSnapshot(sltable_oid),0,NULL);
reloid_attnum = get_attnum(sltable_oid,"tab_reloid");
if(reloid_attnum == InvalidAttrNumber)
elog(ERROR,"sl_table does not have a tab_reloid column");
set_attnum = get_attnum(sltable_oid,"tab_set");
if(set_attnum == InvalidAttrNumber)
elog(ERROR,"sl_table does not have a tab_set column");
tableid_attnum = get_attnum(sltable_oid, "tab_id");
if(tableid_attnum == InvalidAttrNumber)
elog(ERROR,"sl_table does not have a tab_id column");
while( (tuple = heap_getnext(scandesc,ForwardScanDirection) ))
(Except missing spaces ;)) I don't see anything obviously wrong with
this.
Greetings,
Andres Freund