Hi, On 2019-12-27 20:13:26 +0300, Teodor Sigaev wrote: > Found crash on production instance, assert-enabled build crashes in pfree() > call, with default config. v11, v12 and head are affected, but, seems, you > need to be a bit lucky. > > The bug is comparing old and new aggregate pass-by-ref values only by > pointer value itself, despite on null flag. Any function which returns null > doesn't worry about actual returned Datum value, so that comparison isn't > enough. Test case shows bug with ExecInterpExpr() but there several similar > places (thanks Nikita Glukhov for help). > Attached patch adds check of null flag.
Hm. I don't understand the problem here. Why do we need to reparent in that case? What freed the relevant value? Nor do I really understand why v10 wouldn't be affected if this actually is a problem. The relevant code is also only guarded by DatumGetPointer(newVal) != DatumGetPointer(pergroupstate->transValue)) > > Backtrace from v12 (note, newValue and oldValue are differ on current call, > but oldValue points into pfreed memory) : > #0 0x0000000000c8405a in GetMemoryChunkContext (pointer=0x80a808250) at > ../../../../src/include/utils/memutils.h:130 > 130 AssertArg(MemoryContextIsValid(context)); > (gdb) bt > #0 0x0000000000c8405a in GetMemoryChunkContext (pointer=0x80a808250) at > ../../../../src/include/utils/memutils.h:130 > #1 0x0000000000c85ae5 in pfree (pointer=0x80a808250) at mcxt.c:1058 > #2 0x000000000080475e in ExecAggTransReparent (aggstate=0x80a806370, > pertrans=0x80a87e830, newValue=34535940744, newValueIsNull=false, > oldValue=34535932496, oldValueIsNull=false) > at execExprInterp.c:4209 > #3 0x00000000007ff51f in ExecInterpExpr (state=0x80a87f4d8, > econtext=0x80a8065a8, isnull=0x7fffffffd7b7) at execExprInterp.c:1747 > #4 0x000000000082c12b in ExecEvalExprSwitchContext (state=0x80a87f4d8, > econtext=0x80a8065a8, isNull=0x7fffffffd7b7) at > ../../../src/include/executor/executor.h:308 > #5 0x000000000082bc0f in advance_aggregates (aggstate=0x80a806370) at > nodeAgg.c:679 > #6 0x000000000082b8a6 in agg_retrieve_direct (aggstate=0x80a806370) at > nodeAgg.c:1847 > #7 0x0000000000828782 in ExecAgg (pstate=0x80a806370) at nodeAgg.c:1572 > #8 0x000000000080e712 in ExecProcNode (node=0x80a806370) at > ../../../src/include/executor/executor.h:240 > How to reproduce: > http://sigaev.ru/misc/xdump.sql.bz2 > bzcat xdump.sql.bz2 | psql postgres && psql postgres < x.sql It should be possible to create a smaller reproducer... It'd be good if a bug fix for this were committed with a regression test. > diff --git a/src/backend/executor/execExprInterp.c > b/src/backend/executor/execExprInterp.c > index 034970648f3..3b5333716d4 100644 > --- a/src/backend/executor/execExprInterp.c > +++ b/src/backend/executor/execExprInterp.c > @@ -1743,7 +1743,8 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, > bool *isnull) > * expanded object that is already a child of the > aggcontext, > * assume we can adopt that value without copying it. > */ > - if (DatumGetPointer(newVal) != > DatumGetPointer(pergroup->transValue)) > + if (DatumGetPointer(newVal) != > DatumGetPointer(pergroup->transValue) || > + fcinfo->isnull != pergroup->transValueIsNull) > newVal = ExecAggTransReparent(aggstate, > pertrans, > > newVal, fcinfo->isnull, > > pergroup->transValue, I'd really like to avoid adding additional branches to these paths. Greetings, Andres Freund