On Mon, Aug 26, 2019 at 01:09:19PM +1200, Thomas Munro wrote:
> On Sun, Aug 25, 2019 at 3:15 PM Peter Geoghegan <p...@bowt.ie> wrote:
> > I was reminded of this issue from last year, which also appeared to
> > involve BufFileClose() and a double-free:
> >
> > https://postgr.es/m/87y3hmee19....@news-spur.riddles.org.uk
> >
> > That was a BufFile that was under the control of a tuplestore, so it
> > was similar to but different from your case. I suspect it's related.
>
> Hmm. tuplestore.c follows the same coding pattern as nodeHashjoin.c:
> it always nukes its pointer after calling BufFileFlush(), so it
> shouldn't be capable of calling it twice for the same pointer, unless
> we have two copies of that pointer somehow.
>
> Merlin's reported a double-free apparently in ExecHashJoin(), not
> ExecHashJoinNewBatch() like this report. Unfortunately that tells us
> very little.
>
> On Sun, Aug 25, 2019 at 2:25 PM Justin Pryzby <pry...@telsasoft.com> wrote:
> > #4 0x00000039ff678dd0 in _int_free (av=0x39ff98e120, p=0x1d40b090,
> > have_lock=0) at malloc.c:4846
> > #5 0x00000000006269e5 in ExecHashJoinNewBatch (pstate=0x2771218) at
> > nodeHashjoin.c:1058
>
> Can you reproduce this or was it a one-off crash?

The query was one of our large reports, and this job runs every 15min
against recently-loaded data; in the immediate case, between
2019-08-24 08:00:00 and 2019-08-24 09:00:00.

I can rerun it fine, and I ran it in a loop for a while last night with
no issues.

time psql ts -f tmp/sql-2019-08-24.1 |wc
   5416  779356 9793941

Since it was asked in the other thread Peter mentioned:

ts=# SHOW work_mem;
work_mem | 128MB

ts=# SHOW shared_buffers ;
shared_buffers | 1536MB

> might be some obscure path somewhere, possibly through a custom
> operator or suchlike, that leaves us in a strange memory context, or
> something like that? But then I feel like we'd have received
> reproducible reports and a test case by now.

No custom operator in sight. Just NATURAL JOIN on integers, and WHERE on
timestamp, some plpgsql and int[].

Justin