On 10/28/2014 01:31 PM, Andres Freund wrote:
On 2014-10-25 18:18:07 -0400, Steve Singer wrote:
My logical decoding plugin is occasionally getting this error:

"could not resolve cmin/cmax of catalog tuple"

I get this when my output plugin is trying to read one of the user-defined
catalog tables (user_catalog_table=true)

Hm. That should obviously not happen.

Could you describe how that table is modified? Does that bug happen
initially, or only after a while?

It doesn't happen right away; in this case it was maybe 4 minutes after creating the slot. The error also doesn't always happen when I run this test workload, but it is reproducible with some trying. I don't do anything special to that table: it gets created, then I do inserts on it. I don't do an ALTER TABLE or anything fancy like that. I was running the Slony failover test (all nodes under the same postmaster), which involves the occasional dropping and recreating of databases along with normal query load + replication.

I'll send you a tar of the data directory off-list with things in this state.

Do you have a testcase that would allow me to easily reproduce the
problem?

I don't have an isolated test case that does this. The test that I'm hitting this with does lots of stuff and doesn't even always hit this.

I am not sure if this is a bug in the time-travel support in the logical
decoding support or if I'm just using it wrong (i.e. not getting a sufficient
lock on the relation or something).

I don't know yet...

This is the interesting part of the stack trace

#4  0x000000000091bbc8 in HeapTupleSatisfiesHistoricMVCC (htup=0x7fffcf42a900,
     snapshot=0x7f786ffe92d8, buffer=10568) at tqual.c:1631
#5  0x00000000004aedf3 in heapgetpage (scan=0x28d7080, page=0) at heapam.c:399
#6  0x00000000004b0182 in heapgettup_pagemode (scan=0x28d7080,
     dir=ForwardScanDirection, nkeys=0, key=0x0) at heapam.c:747
#7  0x00000000004b1ba6 in heap_getnext (scan=0x28d7080,
     direction=ForwardScanDirection) at heapam.c:1475
#8  0x00007f787002dbfb in lookupSlonyInfo (tableOid=91754, ctx=0x2826118,
     origin_id=0x7fffcf42ab8c, table_id=0x7fffcf42ab88, set_id=0x7fffcf42ab84)
     at slony_logical.c:663
#9  0x00007f787002b7a3 in pg_decode_change (ctx=0x2826118, txn=0x28cbec0,
     relation=0x7f787a3446a8, change=0x7f786ffe3268) at slony_logical.c:237
#10 0x00000000007497d4 in change_cb_wrapper (cache=0x28cbda8, txn=0x28cbec0,
     relation=0x7f787a3446a8, change=0x7f786ffe3268) at logical.c:704



Here is what the code in lookupSlonyInfo is doing
------------------

   sltable_oid = get_relname_relid("sl_table",slony_namespace);

   sltable_rel = relation_open(sltable_oid,AccessShareLock);
   tupdesc=RelationGetDescr(sltable_rel);
   scandesc=heap_beginscan(sltable_rel,
                           GetCatalogSnapshot(sltable_oid),0,NULL);
   reloid_attnum = get_attnum(sltable_oid,"tab_reloid");

   if(reloid_attnum == InvalidAttrNumber)
          elog(ERROR,"sl_table does not have a tab_reloid column");
   set_attnum = get_attnum(sltable_oid,"tab_set");

   if(set_attnum == InvalidAttrNumber)
          elog(ERROR,"sl_table does not have a tab_set column");
   tableid_attnum = get_attnum(sltable_oid, "tab_id");

   if(tableid_attnum == InvalidAttrNumber)
          elog(ERROR,"sl_table does not have a tab_id column");

   while( (tuple = heap_getnext(scandesc,ForwardScanDirection) ))

(Except for the missing spaces ;)) I don't see anything obviously wrong with
this.

Greetings,

Andres Freund



