On Wed, Dec 04, 2019 at 05:36:16PM -0800, Jeremy Schneider wrote:
On 9/8/19 14:01, Tom Lane wrote:
Fix RelationIdGetRelation calls that weren't bothering with error checks.

...

Details
-------
https://git.postgresql.org/pg/commitdiff/69f883fef14a3fc5849126799278abcc43f40f56

We had two different databases this week (with the same schema) both
independently hit the condition of this recent commit from Tom. It's on
11.5 so we're actually segfaulting and restarting rather than just
causing the walsender process to ERROR, but regardless there's still
some underlying bug here.

We have core files and we're still working to see if we can figure out
what's going on, but I thought I'd report now in case anyone has extra
ideas or suggestions.  The segfault is on line 3034 of reorderbuffer.c.

https://github.com/postgres/postgres/blob/REL_11_5/src/backend/replication/logical/reorderbuffer.c#L3034

3033     toast_rel = RelationIdGetRelation(relation->rd_rel->reltoastrelid);
3034     toast_desc = RelationGetDescr(toast_rel);

We'll keep looking; let me know any feedback! Would love to track down
whatever bug is in the logical decoding code, if that's what it is.

==========

backtrace showing the call stack...

Core was generated by `postgres: walsender <NAME-REDACTED>
<DNS-REDACTED>(31712)'.
Program terminated with signal 11, Segmentation fault.
#0  ReorderBufferToastReplace (rb=0x3086af0, txn=0x3094a78,
relation=0x2b79177249c8, relation=0x2b79177249c8, change=0x30ac938)
   at reorderbuffer.c:3034
3034    reorderbuffer.c: No such file or directory.
...
(gdb) #0  ReorderBufferToastReplace (rb=0x3086af0, txn=0x3094a78,
relation=0x2b79177249c8, relation=0x2b79177249c8, change=0x30ac938)
   at reorderbuffer.c:3034
#1  ReorderBufferCommit (rb=0x3086af0, xid=xid@entry=1358809,
commit_lsn=9430473346032, end_lsn=<optimized out>,
   commit_time=commit_time@entry=628712466364268,
origin_id=origin_id@entry=0, origin_lsn=origin_lsn@entry=0) at
reorderbuffer.c:1584
#2  0x0000000000716248 in DecodeCommit (xid=1358809,
parsed=0x7ffc4ce123f0, buf=0x7ffc4ce125b0, ctx=0x3068f70) at decode.c:637
#3  DecodeXactOp (ctx=0x3068f70, buf=buf@entry=0x7ffc4ce125b0) at
decode.c:245
#4  0x000000000071655a in LogicalDecodingProcessRecord (ctx=0x3068f70,
record=0x3069208) at decode.c:117
#5  0x0000000000727150 in XLogSendLogical () at walsender.c:2886
#6  0x0000000000729192 in WalSndLoop (send_data=send_data@entry=0x7270f0
<XLogSendLogical>) at walsender.c:2249
#7  0x0000000000729f91 in StartLogicalReplication (cmd=0x30485a0) at
walsender.c:1111
#8  exec_replication_command (
   cmd_string=cmd_string@entry=0x2f968b0 "START_REPLICATION SLOT
\"<NAME-REDACTED>\" LOGICAL 893/38002B98 (proto_version '1',
publication_names '\"<NAME-REDACTED>\"')") at walsender.c:1628
#9  0x000000000076e939 in PostgresMain (argc=<optimized out>,
argv=argv@entry=0x2fea168, dbname=0x2fea020 "<NAME-REDACTED>",
   username=<optimized out>) at postgres.c:4182
#10 0x00000000004bdcb5 in BackendRun (port=0x2fdec50) at postmaster.c:4410
#11 BackendStartup (port=0x2fdec50) at postmaster.c:4082
#12 ServerLoop () at postmaster.c:1759
#13 0x00000000007062f9 in PostmasterMain (argc=argc@entry=7,
argv=argv@entry=0x2f92540) at postmaster.c:1432
#14 0x00000000004be73b in main (argc=7, argv=0x2f92540) at main.c:228

==========

Some additional context...

# select * from pg_publication_rel;
prpubid | prrelid
---------+---------
  71417 |   16453
  71417 |   54949
(2 rows)

(gdb) print toast_rel
$4 = (struct RelationData *) 0x0

(gdb) print *relation->rd_rel
$11 = {relname = {data = "<NAME-REDACTED>", '\000' <repeats 44 times>},
relnamespace = 16402, reltype = 16430, reloftype = 0,
relowner = 16393, relam = 0, relfilenode = 16428, reltablespace = 0,
relpages = 0, reltuples = 0, relallvisible = 0, reltoastrelid = 0,

Hmmm, so reltoastrelid = 0, i.e. the relation does not have a TOAST
relation. Yet we're calling ReorderBufferToastReplace on the decoded
record ... interesting.

Can you share structure of the relation causing the issue?


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply via email to