Re: [HACKERS] help to identify the reason that extension's C function returns array get segmentation fault

2017-02-28 Thread
Thanks for your clues.

The system I have used to debug the code is x86 64bit based, Ubuntu 1404
and postgres 9.3.13, I have revised the code and it looks like as following:

Datum vquery(PG_FUNCTION_ARGS) {

int array_len = PG_GETARG_INT32(0);
int64 * node_ids;
ArrayType * retarr;
Datum * vals ;

SPI_connect();

//some code to retrieve data from various tables
// node_ids are allocated and filled up

vals = SPI_palloc(array_len * sizeof(Datum));
memset (vals, 0, array_len * sizeof(Datum));

// fill the vals up
for (i = 0 ; i < array_len ; i++)
  vals[i] = Int64GetDatum((node_ids[i]));

retarr = construct_array(vals, retcnt, INT8OID, sizeof(int64), true, 'd');

SPI_finish();

PG_RETURN_ARRAYTYPE_P(retarr);
}

It seems to solve the problem,  I have tested the code for a while and no
more segmentation faults are reported.

I have built Postgresql with --enable-debug and --enable-cassert, but use
the binary with gdb and get no symbol file loaded. I will take further
researches and use it to facilitate debug. Thanks.

On Tue, Feb 28, 2017 at 12:54 PM, Tom Lane  wrote:

> =?UTF-8?B?6ZKx5paw5p6X?=  writes:
> > I have written an extension to manage openstreetmap data. There is a C
> > function to perform spatial top k query on several  tables and return an
> > array of int8 type as result. The code skeleton of this function is as
> > follows:
>
> There are a remarkable lot of bugs in this code fragment.  Many of them
> would not bite you as long as you are running on 64-bit Intel hardware,
> but that doesn't make them not bugs.
>
> > Datum vquery(PG_FUNCTION_ARGS) {
>
> > int array_len = PG_GETARG_INT32(0);
> > long * node_ids;
>
> > SPI_connect();
>
> > //some code to retrieve data from various tables
> > // node_ids are allocated and filled up
>
> > ArrayType * retarr;
> > Datum * vals ;
>
> > vals = palloc0(array_len * sizeof(long));
>
> Datum is not necessarily the same as "long".
>
> > // fill the vals up
> > for (i = 0 ; i < array_len ; i++)
> >   vals[i] = Int64GetDatum((node_ids[i]));
>
> int64 is not necessarily the same as "long", either.
>
> > retarr = construct_array(vals, retcnt, INT8OID, sizeof(long), true, 'i');
>
> Again, INT8 is not the same size as "long", and it's not necessarily
> pass-by-val, and it's *certainly* not integer alignment.
>
> > SPI_finish();
>
> > PG_RETURN_ARRAYTYPE_P(retarr);
>
> But I think what's really biting you, probably, is that construct_array()
> made the array in CurrentMemoryContext which at that point was the SPI
> execution context; which would be deleted by SPI_finish.  So you're
> returning a dangling pointer.  You need to do something to either copy
> the array value out to the caller's context, or build it there in the
> first place.
>
> BTW, this failure would be a lot less intermittent if you were testing
> in a CLOBBER_FREED_MEMORY build.  I would go so far as to say you should
> *never* develop or test C code for the Postgres backend without using
> the --enable-cassert configure option for your build.  You're simply
> tossing away a whole lot of debug support if you don't.
>
> regards, tom lane
>


[HACKERS] help to identify the reason that extension's C function returns array get segmentation fault

2017-02-27 Thread
I have written an extension to manage openstreetmap data. There is a C
function to perform spatial top k query on several  tables and return an
array of int8 type as result. The code skeleton of this function is as
follows:

Datum vquery(PG_FUNCTION_ARGS) {

int array_len = PG_GETARG_INT32(0);
long * node_ids;

SPI_connect();

//some code to retrieve data from various tables
// node_ids are allocated and filled up

ArrayType * retarr;
Datum * vals ;

vals = palloc0(array_len * sizeof(long));

// fill the vals up
for (i = 0 ; i < array_len ; i++)
  vals[i] = Int64GetDatum((node_ids[i]));

retarr = construct_array(vals, retcnt, INT8OID, sizeof(long), true, 'i');

SPI_finish();

PG_RETURN_ARRAYTYPE_P(retarr);
}

the function runs smoothly when called using relatively small parameter,
such as select(unnest(vquery(1000))) ;  but when called with large
parameter, such as select(unnest(vquery(5))), sometimes it runs
normally, sometimes it runs into "Segmentation Fault" error. the larger the
parameter is, the more likely to run into segmentation fault.

back trace of the process as followings:

Program received signal SIGSEGV, Segmentation fault.
pg_detoast_datum (datum=0x55d4e7e43bc0) at
/build/postgresql-9.3-xceQkK/postgresql-9.3-9.3.16/build/../src/backend/utils/fmgr/fmgr.c:2241
2241 
/build/postgresql-9.3-xceQkK/postgresql-9.3-9.3.16/build/../src/backend/utils/fmgr/fmgr.c:
No such file or directory.
(gdb) backtrace full
#0  pg_detoast_datum (datum=0x55d4e7e43bc0) at
/build/postgresql-9.3-xceQkK/postgresql-9.3-9.3.16/build/../src/backend/utils/fmgr/fmgr.c:2241
No locals.
#1  0x55d4e485a29f in array_out (fcinfo=0x7ffd0fdb9f30) at
/build/postgresql-9.3-xceQkK/postgresql-9.3-9.3.16/build/../src/backend/utils/adt/arrayfuncs.c:958
v = 
element_type = 
typlen = 
typbyval = 
typalign = 
typdelim = 
p = 
tmp = 
retval = 
values = 
dims_str =
"\346\263\346U\000\000\000\000f\021\352\312\342\177\000\000
%q\003\000\000\000\000\270\347\026\313\342\177\000\000\000\000\002\000\000\000\000\000\211O\343\312\342\177\000\000(?\a\000\000\000\000\000\371\336\342\312\342\177\000\000\001\000\000\000\000\000\000\000f\021\352\312\342\177\000\000\200\204\004\001\000\000\000\000\270\347\026\313\342\177\000\000\000\000\002\000\000\000\000\000\211O\343\312\342\177\000\000`1\354\352\324U\000\000\371\336\342\312\342\177\000\000\001\000\000\000\000\000\000\000\000\200\355\347\324U\000\000\300;\344\347\324U\000\000\200\373\350\346\324U\000\000\200\204\004\001\000\000\000\000`\347\026\313\342\177\000\000\200;\344\347\324U\000\000\200D\t\000\000\000\000\000\001\000\000\000\000\000\000"
bitmap = 
bitmask = 
needquotes = 
needdims = 
nitems = 
overall_length = 
i = 
j = 
k = 
indx = {125, 0, 1638826752, 1007657037, 266051136, 32765}
ndim = 
dims = 
lb = 
my_extra = 
#2  0x55d4e491bf77 in FunctionCall1Coll
(flinfo=flinfo@entry=0x55d4e6281608,
collation=collation@entry=0, arg1=arg1@entry=94372911922112)
at
/build/postgresql-9.3-xceQkK/postgresql-9.3-9.3.16/build/../src/backend/utils/fmgr/fmgr.c:1301
fcinfo = {flinfo = 0x55d4e6281608, context = 0x0, resultinfo = 0x0,
fncollation = 0, isnull = 0 '\000', nargs = 1, arg = {94372911922112, 16,
94372911922112,
94372881863664, 81000, 140612075225152, 140724869504928,
94372856290490, 20, 94372911922112, 140724869504960, 94372855395365, 59362,
59362,
140724869505552, 140611821448277, 140724869505184,
140724869505104, 140724869505056, 25309696688, 4630392398271114606,
463877674369024,
4628742354541509959, 4638236535588651008, 140612075225152,
140724869505184, 140724869505264, 140724869505344, 8192, 1340029796386,
94372903805648,
94372905445328, 4412211000755930201, 4295079941117417898,
4212081119735560672, 94372856202527, 2087976960, 94372882833728,
94372882836912, 1,
140724869505296, 4327854021138088704, 3599182594146, 1,
140724869505248, 94372856077425, 1016, 94372882392848, 94372860652912,
140612076116712,
140724869505680, 94372856082219, 94372882407728,
140724869505360, 140724869505343, 3886087214688, 140612076116712,
140724869505344, 140728326873992,
94372882407736, 94372882817344, 94372882817312,
1125891316908032, 0, 94372855675584, 281483566645432, 2, 0, 94372881959504,
0, 1016, 0, 0, 8192,
18446603348840046049, 513, 128, 176, 140724869505568, 16,
459561500672, 2, 0, 511101108336, 0, 140724869505567, 0, 0, 124, 0, 0, 0,
0, 0, 0,
140612046612320, 8192, 1024, 1024, 1072},
  argnull = "\000