On 10/11/2014 10:20 PM, Bruce Momjian wrote:
Uh, was this fixed. I see a cleanup commit for this C file, but this
report is from June:
commit 07a4a93a0e35a778c77ffbbbc18de29e859e18f0
Author: Heikki Linnakangas <heikki.linnakan...@iki.fi>
Date: Fri May 16 09:47:50 2014 +0300
Initialize tsId and dbId fields in WAL record of COMMIT PREPARED.
Commit dd428c79 added dbId and tsId to the xl_xact_commit struct
but missed
that prepared transaction commits reuse that struct. Fix that.
Because those fields were left unitialized, replaying a commit
prepared WAL
record in a hot standby node would fail to remove the relcache init
file.
That can lead to "could not open file" errors on the standby.
Relcache init
file only needs to be removed when a system table/index is
rewritten in the
transaction using two phase commit, so that should be rare in
practice. In
HEAD, the incorrect dbId/tsId values are also used for filtering in
logical
replication code, causing the transaction to always be filtered out.
Analysis and fix by Andres Freund. Backpatch to 9.0 where hot
standby was
introduced.
No, that was a different issue.
(more below)
On Mon, Jun 30, 2014 at 11:58:59AM +0200, Andres Freund wrote:
Hi,
I've just rerun valgrind for the first time in a while and saw the
following splat. My guess is it exists since bb38fb0d43c, but that's
blindly guessing:
==2049== Use of uninitialised value of size 8
==2049== at 0x4FE66D: EndPrepare (twophase.c:1063)
==2049== by 0x4F231B: PrepareTransaction (xact.c:2217)
==2049== by 0x4F2A38: CommitTransactionCommand (xact.c:2676)
==2049== by 0x79013E: finish_xact_command (postgres.c:2408)
==2049== by 0x78DE97: exec_simple_query (postgres.c:1062)
==2049== by 0x791FDD: PostgresMain (postgres.c:4010)
==2049== by 0x71B13B: BackendRun (postmaster.c:4113)
==2049== by 0x71A86D: BackendStartup (postmaster.c:3787)
==2049== by 0x71714C: ServerLoop (postmaster.c:1566)
==2049== by 0x716804: PostmasterMain (postmaster.c:1219)
==2049== by 0x679405: main (main.c:219)
==2049== Uninitialised value was created by a stack allocation
==2049== at 0x4FE16C: StartPrepare (twophase.c:942)
==2049==
==2049== Syscall param write(buf) points to uninitialised byte(s)
==2049== at 0x5C69640: __write_nocancel (syscall-template.S:81)
==2049== by 0x4FE6AE: EndPrepare (twophase.c:1064)
==2049== by 0x4F231B: PrepareTransaction (xact.c:2217)
==2049== by 0x4F2A38: CommitTransactionCommand (xact.c:2676)
==2049== by 0x79013E: finish_xact_command (postgres.c:2408)
==2049== by 0x78DE97: exec_simple_query (postgres.c:1062)
==2049== by 0x791FDD: PostgresMain (postgres.c:4010)
==2049== by 0x71B13B: BackendRun (postmaster.c:4113)
==2049== by 0x71A86D: BackendStartup (postmaster.c:3787)
==2049== by 0x71714C: ServerLoop (postmaster.c:1566)
==2049== by 0x716804: PostmasterMain (postmaster.c:1219)
==2049== by 0x679405: main (main.c:219)
==2049== Address 0x64694ed is 1,389 bytes inside a block of size 8,192 alloc'd
==2049== at 0x4C27B8F: malloc (vg_replace_malloc.c:298)
==2049== by 0x8E766E: AllocSetAlloc (aset.c:853)
==2049== by 0x8E8E04: MemoryContextAllocZero (mcxt.c:627)
==2049== by 0x8A54D3: AtStart_Inval (inval.c:704)
==2049== by 0x4F1DFC: StartTransaction (xact.c:1841)
==2049== by 0x4F28D1: StartTransactionCommand (xact.c:2529)
==2049== by 0x7900A7: start_xact_command (postgres.c:2383)
==2049== by 0x78DAF4: exec_simple_query (postgres.c:860)
==2049== by 0x791FDD: PostgresMain (postgres.c:4010)
==2049== by 0x71B13B: BackendRun (postmaster.c:4113)
==2049== by 0x71A86D: BackendStartup (postmaster.c:3787)
==2049== by 0x71714C: ServerLoop (postmaster.c:1566)
==2049== Uninitialised value was created by a stack allocation
==2049== at 0x4FE16C: StartPrepare (twophase.c:942)
It's probably just padding - twophase.c:1063 is the CRC32 computation of
the record data.
Yeah. The padding bytes in TwoPhaseFileHeader were not initialized.
That's simple enough to fix, but when I run valgrind, I get a lot whole
bunch of similar messages. A few are from pgstat: the padding bytes in
the pgstat messages are not initialized. One comes from
write_relcache_init_file(); again I believe it's padding bytes being
uninitialized (in FormData_pg_attribute). And one from the XLogInsert
from heap_insert; there's an uninitialized padding byte in
xl_heap_insert. And so forth.. Is it worthwhile to hunt down all of
these? If there aren't many more than these, it probably is worth it,
but I fear this might be an endless effort. Have we been clean of these
warnings at any point in the past?
- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers