Barry Lind wrote:

I also have the test case (in java) down to the bare minimum that
generated the following output (that test case is attached).  (Note that
if the FETCH in the test case is not executed then the backend crashes;
with the FETCH you get an error: "ERROR: unrecognized node type: 0")

I narrowed this down to:

  while (true) {
    l_stmtDeclare.execute();
  }

producing:

 FE=> Parse(stmt=S_1,query="BEGIN",oids={})
 FE=> Bind(stmt=S_1,portal=null)
 FE=> Execute(portal=null,limit=0)
 FE=> Parse(stmt=S_2,query="DECLARE CUR CURSOR FOR SELECT 1",oids={})
 FE=> Bind(stmt=S_2,portal=null)
 FE=> Describe(portal=null)
 FE=> Execute(portal=null,limit=0)
 FE=> Sync
 <=BE ParseComplete [S_1]
 <=BE BindComplete [null]
 <=BE CommandStatus(BEGIN)
 <=BE ParseComplete [S_2]
 <=BE BindComplete [null]
 <=BE NoData
 <=BE CommandStatus(DECLARE CURSOR)
 <=BE ReadyForQuery(T)
simple execute, [EMAIL PROTECTED], maxRows=0, fetchSize=0, flags=0
 FE=> Bind(stmt=S_2,portal=null)
 FE=> Describe(portal=null)
 FE=> Execute(portal=null,limit=0)
 FE=> Sync
 <=BE BindComplete [null]
 <=BE NoData
 <=BE ErrorMessage(ERROR: unrecognized node type: 2139062143
  Location: File: clauses.c, Routine: expression_tree_mutator, Line: 3237
  Server SQLState: XX000)

Valgrind says this is the culprit:

==26451== Invalid read of size 4
==26451==    at 0x8185C86: eval_const_expressions_mutator (clauses.c:1185)
==26451==    by 0x8185C32: eval_const_expressions (clauses.c:1152)
==26451==    by 0x817D1A6: preprocess_expression (planner.c:415)
==26451==    by 0x817CEBF: subquery_planner (planner.c:240)
==26451==    by 0x817CD59: planner (planner.c:129)
==26451==    by 0x810DF03: PerformCursorOpen (portalcmds.c:87)
==26451==    by 0x81C1402: PortalRunUtility (pquery.c:934)
==26451==    by 0x81C1762: PortalRunMulti (pquery.c:1001)
==26451==    by 0x81C0D8E: PortalRun (pquery.c:617)
==26451==    by 0x81BDDA7: exec_execute_message (postgres.c:1673)
==26451==    by 0x81BF6E1: PostgresMain (postgres.c:3035)
==26451==    by 0x818FC39: BackendRun (postmaster.c:2817)
==26451==    by 0x818F642: BackendStartup (postmaster.c:2453)
==26451==    by 0x818D989: ServerLoop (postmaster.c:1198)
==26451==    by 0x818CDBA: PostmasterMain (postmaster.c:917)
==26451==    by 0x81570F4: main (main.c:268)
==26451==  Address 0x1BBBF704 is 260 bytes inside a block of size 1024 free'd
==26451==    at 0x1B905460: free (vg_replace_malloc.c:153)
==26451==    by 0x8245706: AllocSetDelete (aset.c:466)
==26451==    by 0x82468B8: MemoryContextDelete (mcxt.c:193)
==26451==    by 0x8247BCF: PortalDrop (portalmem.c:384)
==26451==    by 0x82475B5: CreatePortal (portalmem.c:179)
==26451==    by 0x81BD735: exec_bind_message (postgres.c:1369)
==26451==    by 0x81BF4EF: PostgresMain (postgres.c:3023)
==26451==    by 0x818FC39: BackendRun (postmaster.c:2817)
==26451==    by 0x818F642: BackendStartup (postmaster.c:2453)
==26451==    by 0x818D989: ServerLoop (postmaster.c:1198)
==26451==    by 0x818CDBA: PostmasterMain (postmaster.c:917)
==26451==    by 0x81570F4: main (main.c:268)

With a bit of gdb work, I think what is happening is this:

The first Execute of S_2, running in portal context, calls the planner on the query contained in S_2's DeclareCursorStmt. The planner modifies the query tree in the course of planning it (specifically, it modifies parse->targetList). Memory allocated for the modified query comes from the portal context.

The portal context is freed implicitly by the second Bind of S_2 (second stack trace above).

The second Execute of S_2 then tries to use parse->targetList when planning (first stack trace above), but that's now pointing to freed memory. Boom.

Perhaps PerformCursorOpen should copy the query tree before planning, or plan in a different memory context?

-O

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Reply via email to