Re: [SQL] Does anyone know of any issues around ARRAY UNNEST

2011-11-27 Thread Belinda Cussen
Hi Pavel,
Thanks for your help.
The code works ok on my database too when I call the procedure only once
sequentially. I hit the segmentation fault consistently when I try to call
the proc concurrently.

This is the actual code I am calling from within the procedure:

UPDATE activity
SET media_uri = a.media_uri
   ,record_last_updated_tstamp = CURRENT_TIMESTAMP
FROM (SELECT col1.source_id[gs.ser] AS source_id
            ,col2.media_uri[gs.ser] AS media_uri
      FROM (SELECT v_activity_source_id_list) AS col1(source_id)
          ,(SELECT v_large_media_uri_list) AS col2(media_uri)
          ,generate_series(1, v_uri_count) AS gs(ser)) a
WHERE activity_source_id_value = a.source_id
AND activity_source_key = v_source_key;



-- v_large_media_uri_list and v_activity_source_id_list (both TEXT ARRAY)
-- are passed into the proc.
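For reference, the derived table `a` pairs the two arrays element-by-element via the shared `generate_series` index. A minimal sketch of the same trick with literal arrays (the values here are illustrative only):

```sql
SELECT (ARRAY['s1','s2','s3'])[gs.ser] AS source_id,
       (ARRAY['u1','u2','u3'])[gs.ser] AS media_uri
FROM generate_series(1, 3) AS gs(ser);
```

This yields one row per index: ('s1','u1'), ('s2','u2'), ('s3','u3').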

Again this code works fine when I am calling it only once. I hit the
problem when it is called twice at the same time.

The previous code snippet also causes a seg fault. Interestingly, when I write:

UPDATE activity SET activity_recency_timestamp = CURRENT_TIMESTAMP WHERE
venue_id IN (SELECT UNNEST(v_venue_id_list));

It takes ages to run - as you point out, it's not efficient code :) - but it
doesn't cause a crash.
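For the record, Pavel's earlier suggestion below avoids the subquery entirely and is typically planned much better:

```sql
UPDATE activity
SET activity_recency_timestamp = CURRENT_TIMESTAMP
WHERE venue_id = ANY(v_venue_id_list);
```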

Any ideas?
regards
Belinda


On 24 November 2011 21:57, Pavel Stehule  wrote:

> Hello
>
> This is not a known bug - there may be a bug in PostgreSQL, or your database
> (data files) may be corrupted.
>
> 2011/11/24 Belinda Cussen 
>
>> Hi there,
>> We're having segmentation faults on our postgres 9.1.1 db. It seems to
>> happen when we use ARRAY unnesting eg:
>>
>>  UPDATE activity SET activity_recency_timestamp = CURRENT_TIMESTAMP WHERE
>> venue_id IN (SELECT venue_id FROM UNNEST(v_venue_id_list) venue_id);
>>
>>
> This is not efficient code
>
> try to use
>
> UPDATE activity SET activity_recency_timestamp = CURRENT_TIMESTAMP WHERE
> venue_id = ANY(v_venue_id_list)
>
> Regards
>
> Pavel Stehule
>
> p.s. It works on my machine
>
> postgres=# CREATE TABLE foo (id int primary key, t timestamp, l int[]);
> NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "foo_pkey"
> for table "foo"
> CREATE TABLE
> postgres=# INSERT INTO foo(id, l) SELECT i, ARRAY(SELECT * FROM
> generate_series(i-10, i)) FROM generate_series(1,1000) g(i);
> INSERT 0 1000
> postgres=# UPDATE foo SET t = CURRENT_TIMESTAMP WHERE id = ANY(l);
> UPDATE 1000
> postgres=# UPDATE foo SET t = CURRENT_TIMESTAMP WHERE id IN (SELECT x FROM
> unnest(l) x);
> UPDATE 1000
>
> Regards
>
> Pavel Stehule
>
>
>> We are working on getting a core dump but I was just wondering if there
>> are any known issues around this construct - especially the aliasing?
>> Alternatively could there be an issue trying to write or access tmp files?
>>
>>
>> FYI:
>> v_venue_id_list is an array passed in to the procedure containing 100,000
>> INTEGER elements
>> ? IS THIS TOO MANY ELEMENTS TO PASS?
>>
>> table activity has around 3,000,000 rows
>> CREATE TABLE activity
>> (
>>   activity_id serial NOT NULL,
>>   activity_type_key integer NOT NULL,
>>   media_type_key integer NOT NULL,
>>   activity_source_key integer NOT NULL,
>>   venue_id integer NOT NULL,
>>   poster_id integer NOT NULL,
>>   event_id integer,
>>   activity_source_id_value text NOT NULL,
>>   uri text,
>>   media_uri text,
>>   activity_comment text,
>>   posted_dttm timestamp with time zone,
>>   photo_format_code character varying(10),
>>   video_format_code character varying(10),
>>   public_yn character varying(1),
>>   content_reported_yn character varying(1),
>>   last_scored_tstamp timestamp with time zone,
>>   record_expiry_tstamp timestamp with time zone,
>>   record_created_tstamp timestamp with time zone DEFAULT now(),
>>   record_last_updated_tstamp timestamp with time zone DEFAULT now(),
>>   initial_broadcast_to_text text,
>>   image_id integer,
>>   large_media_uri text,
>>   CONSTRAINT activity_pkey PRIMARY KEY (activity_id ),
>>   CONSTRAINT activity_activity_source_key_activity_source_id_value_key
>> UNIQUE (activity_source_key , activity_source_id_value )
>> );
>>
>>
>> CREATE INDEX activity_poster_ie
>>   ON activity  (poster_id );
>>
>>
>> CREATE INDEX activity_venue_ie
>>   ON activity  (venue_id );
>>
>>
>> --
>> *Belinda Cussen* | Servian Pty Ltd |
>> *m:* 0466 309 169 | *t:* 02 9376 0700 | *f:* 02 9376 0730
>>
>
>




Re: [SQL] Does anyone know of any issues around ARRAY UNNEST

2011-11-27 Thread Belinda Cussen
Hi Pavel,
I've just tried to reproduce the bug using your setup script and it doesn't
cause the seg fault.

We have copied one table: activity, from our code base on to a brand new db
(on a laptop so independent of any hardware issues), and we are using that
to test with.

I'm thinking there may be some corrupted data - or could it be the table
size?

Again, thanks for your help.
regards
Belinda

On 25 November 2011 08:09, Belinda Cussen wrote:

> Hi Pavel,
> Thanks for your help.
> The code works ok on my database too when I call the procedure only once
> sequentially. I hit the segmentation fault consistently when I try to call
> the proc concurrently.
>
> This is the actual code I am calling from within the procedure:
>
> UPDATE activity
> SET media_uri = a.media_uri
>  ,record_last_updated_tstamp = CURRENT_TIMESTAMP
> FROM (SELECT  col1.SOURCE_ID[gs.ser] source_id
>  ,col2.MEDIA_URI[gs.ser] media_uri
> FROM (SELECT v_activity_source_id_list) As COL1(source_id)
>  ,(select v_large_media_uri_list) AS COL2(media_uri)
> ,generate_series(1,v_uri_count) AS gs(ser)) a
>
>
> WHERE activity_source_id_value = a.source_id
> AND activity_source_key = v_source_key;
>
>
>
>  -- v_large_media_uri_list and v_activity_source_id_list - both (TEXT
> ARRAY)  are passed into the proc.
>
> Again this code works fine when I am calling it only once. I hit the
> problem when it is called twice at the same time.
>
> The previous code snippet also causes a seg fault. Interestingly, when I
> write:
>
> UPDATE activity SET activity_recency_timestamp = CURRENT_TIMESTAMP WHERE
> venue_id IN (SELECT UNNEST(v_venue_id_list));
>
> It takes ages to run - as you point out it's not efficient code :) - but
> it doesn't cause a crash.
>
> Any ideas?
> regards
> Belinda
>
>
> On 24 November 2011 21:57, Pavel Stehule  wrote:
>
>> Hello
>>
>> This is not a known bug - there may be a bug in PostgreSQL, or your
>> database (data files) may be corrupted.
>>
>> 2011/11/24 Belinda Cussen 
>>
>>> Hi there,
>>> We're having segmentation faults on our postgres 9.1.1 db. It seems to
>>> happen when we use ARRAY unnesting eg:
>>>
>>>  UPDATE activity SET activity_recency_timestamp = CURRENT_TIMESTAMP
>>> WHERE venue_id IN (SELECT venue_id FROM UNNEST(v_venue_id_list) venue_id);
>>>
>>>
>> This is not efficient code
>>
>> try to use
>>
>> UPDATE activity SET activity_recency_timestamp = CURRENT_TIMESTAMP WHERE
>> venue_id = ANY(v_venue_id_list)
>>
>> Regards
>>
>> Pavel Stehule
>>
>> p.s. It works on my machine
>>
>> postgres=# CREATE TABLE foo (id int primary key, t timestamp, l int[]);
>> NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "foo_pkey"
>> for table "foo"
>> CREATE TABLE
>> postgres=# INSERT INTO foo(id, l) SELECT i, ARRAY(SELECT * FROM
>> generate_series(i-10, i)) FROM generate_series(1,1000) g(i);
>> INSERT 0 1000
>> postgres=# UPDATE foo SET t = CURRENT_TIMESTAMP WHERE id = ANY(l);
>> UPDATE 1000
>> postgres=# UPDATE foo SET t = CURRENT_TIMESTAMP WHERE id IN (SELECT x
>> FROM unnest(l) x);
>> UPDATE 1000
>>
>> Regards
>>
>> Pavel Stehule
>>
>>
>>> We are working on getting a core dump but I was just wondering if
>>> there are any known issues around this construct - especially the aliasing?
>>> Alternatively could there be an issue trying to write or access tmp
>>> files?
>>>
>>>
>>> FYI:
>>> v_venue_id_list is an array passed in to the procedure containing
>>> 100,000 INTEGER elements
>>> ? IS THIS TOO MANY ELEMENTS TO PASS?
>>>
>>> table activity has around 3,000,000 rows
>>> CREATE TABLE activity
>>> (
>>>   activity_id serial NOT NULL,
>>>   activity_type_key integer NOT NULL,
>>>   media_type_key integer NOT NULL,
>>>   activity_source_key integer NOT NULL,
>>>   venue_id integer NOT NULL,
>>>   poster_id integer NOT NULL,
>>>   event_id integer,
>>>   activity_source_id_value text NOT NULL,
>>>   uri text,
>>>   media_uri text,
>>>   activity_comment text,
>>>   posted_dttm timestamp with time zone,
>>>   photo_format_code character varying(10),
>>>   video_format_code character varying(10),
>>>   public_yn character varying(1),
>>>   content_reported_yn character varying(1),
>>>   last_scored_tstamp timestamp with time zone,
>>>   record_expiry_tstamp timestamp with time zone,
>>>   record_created_tstamp timestamp with time zone DEFAULT now(),
>>>   record_last_updated_tstamp timestamp with time zone DEFAULT now(),
>>>   initial_broadcast_to_text text,
>>>   image_id integer,
>>>   large_media_uri text,
>>>   CONSTRAINT activity_pkey PRIMARY KEY (activity_id ),
>>>   CONSTRAINT activity_activity_source_key_activity_source_id_value_key
>>> UNIQUE (activity_source_key , activity_source_id_value )
>>> );
>>>
>>>
>>> CREATE INDEX activity_poster_ie
>>>   ON activity  (poster_id );
>>>
>>>
>>> CREATE INDEX activity_venue_ie
>>>   ON activity  (venue_id );
>>>
>>>

Re: [SQL] Does anyone know of any issues around ARRAY UNNEST

2011-11-27 Thread Tom Lane
Belinda Cussen  writes:
> The code works ok on my database too when I call the procedure only once
> sequentially. I hit the segmentation fault consistently when I try to call
> the proc concurrently.

What do you mean by "concurrently"?  There is no multithreading within
Postgres backends, and it's really hard to believe that unnest would be
affected by what's happening in other server processes.

regards, tom lane

-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] Does anyone know of any issues around ARRAY UNNEST

2011-11-27 Thread Tom Lane
Belinda Cussen  writes:
> I've managed to produce this fault consistently now.
> Below is the simplified code:

> CREATE TABLE foo_1 (id int primary key,media_uri TEXT);
> INSERT INTO foo_1(id) SELECT i FROM generate_series(1,100) g(i);

> CREATE OR REPLACE FUNCTION bb_crash_db_5 () RETURNS TEXT AS $$
> DECLARE
> v_activity_id_list INTEGER ARRAY;
> BEGIN

> SELECT ARRAY(SELECT id FROM foo_1 ORDER BY  id  LIMIT 10) INTO
> v_activity_id_list;
>  UPDATE foo_1
> SET media_uri = 'a'
>  WHERE id IN (SELECT activity_id FROM UNNEST (v_activity_id_list)
> activity_id)
>  ;
> return 'success';

> END;
> $$ LANGUAGE plpgsql;

> I then open 2 command lines and run:
> select bb_crash_db_5();

Thanks, I was able to reproduce it with this test case.  It turns out
not to have anything directly to do with UNNEST, but with the code that
deals with concurrent row updates.

I've committed a fix, which will appear in next week's updates.
Thanks for the report and test case!

regards, tom lane



Re: [SQL] Does anyone know of any issues around ARRAY UNNEST

2011-11-27 Thread Belinda Cussen
Hi there,
I've managed to produce this fault consistently now.

I have a procedure which passes in an array as a variable - rather than
using an array within a table column as in your example, Pavel.
I then update a table based on the ids passed in via this array.

I have created a simplified script which can be called concurrently (i.e. we
have Python jobs which call the procedure at the same time - or, as in this
test, we can open separate command-line sessions to call the procedure).
It consistently causes a segfault, and it doesn't seem to matter how big the
array is. We have created a core dump and that too points to the array
datatype.

Below is the simplified code:

CREATE TABLE foo_1 (id int primary key,media_uri TEXT);
INSERT INTO foo_1(id) SELECT i FROM generate_series(1,100) g(i);



CREATE OR REPLACE FUNCTION bb_crash_db_5() RETURNS TEXT AS $$
DECLARE
    v_activity_id_list INTEGER ARRAY;
BEGIN
    SELECT ARRAY(SELECT id FROM foo_1 ORDER BY id LIMIT 10)
      INTO v_activity_id_list;

    UPDATE foo_1
    SET media_uri = 'a'
    WHERE id IN (SELECT activity_id
                 FROM UNNEST(v_activity_id_list) activity_id);

    RETURN 'success';
END;
$$ LANGUAGE plpgsql;

--
I then open 2 command lines and run:
select bb_crash_db_5();

--
I think the issue may have to do with trying to update the same row at the
same time. (In the production system we generate the list of ids to update
outside of the procedure and assume that they are unique.) I would expect a
lock or a deadlock rather than a seg fault, though?
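The expected behavior for concurrent updates of the same row is blocking, not crashing. A sketch of what two overlapping sessions should normally see:

```sql
-- Session 1:
BEGIN;
UPDATE foo_1 SET media_uri = 'a' WHERE id = 1;   -- takes a row-level lock

-- Session 2 (blocks here until session 1 commits or rolls back):
UPDATE foo_1 SET media_uri = 'b' WHERE id = 1;

-- Session 1:
COMMIT;   -- session 2's update then re-checks the row version and proceeds
```

Under READ COMMITTED, the blocked update re-evaluates its WHERE clause against the newly committed row version - the EvalPlanQual code path that appears in the core dump below.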

We're a bit nervous of arrays now. Are they safe to use as variables?

Below is the (simplified) production procedure:

CREATE OR REPLACE FUNCTION bb_fn (v_activity_source_id_list TEXT ARRAY,
v_large_media_uri_list TEXT ARRAY ) RETURNS TEXT AS $$

DECLARE

v_source_id_count INTEGER;
v_uri_count INTEGER;
v_source_key INTEGER;

BEGIN


---
-- SET large_media_uri
-- using generate_series to match array indices
---
UPDATE activity
SET media_uri = a.media_uri
   ,record_last_updated_tstamp = CURRENT_TIMESTAMP
FROM (SELECT col1.source_id[gs.ser] AS source_id
            ,col2.media_uri[gs.ser] AS media_uri
      FROM (SELECT v_activity_source_id_list) AS col1(source_id)
          ,(SELECT v_large_media_uri_list) AS col2(media_uri)
          ,generate_series(1, v_uri_count) AS gs(ser)) a
WHERE activity_source_id_value = a.source_id;


RETURN 'success';


END;
$$ LANGUAGE plpgsql;
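As an aside, later PostgreSQL releases offer a more direct way to pair two arrays. A sketch only - multi-argument unnest requires PostgreSQL 9.4 or later, so this would not run on the 9.1.1 server discussed here:

```sql
UPDATE activity a
SET media_uri = p.media_uri
   ,record_last_updated_tstamp = CURRENT_TIMESTAMP
FROM unnest(v_activity_source_id_list, v_large_media_uri_list)
     AS p(source_id, media_uri)
WHERE a.activity_source_id_value = p.source_id;
```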


core dump:
---

root@ip-10-94-47-103:/tmp# gdb /usr/lib/postgresql/9.1/bin/postgres 13295
GNU gdb (GDB) 7.1-ubuntu
Reading symbols from /usr/lib/postgresql/9.1/bin/postgres...Reading symbols
from /usr/lib/debug/usr/lib/postgresql/9.1/bin/postgres...done.
done.
Attaching to program: /usr/lib/postgresql/9.1/bin/postgres, process 13295
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols
found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
[Thread debugging using libthread_db enabled]
0x7f970f452472 in recv () from /lib/libc.so.6
(gdb) set pagination off
(gdb) set logging file debuglog.txt
(gdb) set logging on
Copying output to debuglog.txt.
(gdb) handle SIGUSR1 nostop
SignalStop Print Pass to program Description
SIGUSR1   No Yes Yes User defined signal 1
(gdb) handle SIGUSR2 nostop
SignalStop Print Pass to program Description
SIGUSR2   No Yes Yes User defined signal 2
(gdb) cont
Continuing.

Program received signal SIGSEGV, Segmentation fault.
pg_detoast_datum (datum=0xb20) at fmgr.c:2233
2233 fmgr.c: No such file or directory.
in fmgr.c
(gdb) bt
#0  pg_detoast_datum (datum=0xb20) at fmgr.c:2233
#1  0x7f97113c7269 in EvalPlanQualFetchRowMarks
(epqstate=0x7f9713a0ebd0) at execMain.c:2100
#2  0x7f97113c76bc in EvalPlanQual (estate=0x7f9713a0e870,
epqstate=0x7f9713a0ebd0, relation=, rti=1,
tid=0x7fff72e23af0, priorXmax=) at execMain.c:1721
#3  0x7f97113e03ca in ExecUpdate (node=0x7f9713a0eb30) at
nodeModifyTable.c:587
#4  ExecModifyTable (node=0x7f9713a0eb30) at nodeModifyTable.c:838
#5  0x7f97113c98e8 in ExecProcNode (node=0x7f9713a0eb30) at
execProcnode.c:371
#6  0x7f97113c8712 in ExecutePlan (queryDesc=0x7f97139ea7a0,
direction=24, count=0) at execMain.c:1439
#7  standard_ExecutorRun (queryDesc=0x7f97139ea7a0, direction=24, count=0)
at execMain.c:313
#8  0x7f97113eae49 in _SPI_pquery (plan=,
paramLI=, snapshot=,
crosscheck_snapshot=, read_only=0 '\000',
fire_triggers=, tcount=) at
spi.c:2110
#9  _SPI_execute_plan (plan=, paramLI=, snapshot=, crosscheck_snapshot=, read_only=0 '\000', fire_triggers=,
tcount=) at spi.c:1922
#10 0x7f97113eb20c in SPI_execute_plan_with_paramlist
(plan=0x7f97139fe030, params=, read_only=0 '\000',
tcount=) at

Re: [SQL] Does anyone know of any issues around ARRAY UNNEST

2011-11-27 Thread Belinda Cussen
Thanks!

On 28 November 2011 14:30, Tom Lane  wrote:

> Belinda Cussen  writes:
> > I've managed to produce this fault consistently now.
> > Below is the simplified code:
>
> > CREATE TABLE foo_1 (id int primary key,media_uri TEXT);
> > INSERT INTO foo_1(id) SELECT i FROM generate_series(1,100) g(i);
>
> > CREATE OR REPLACE FUNCTION bb_crash_db_5 () RETURNS TEXT AS $$
> > DECLARE
> > v_activity_id_list INTEGER ARRAY;
> > BEGIN
>
> > SELECT ARRAY(SELECT id FROM foo_1 ORDER BY  id  LIMIT 10) INTO
> > v_activity_id_list;
> >  UPDATE foo_1
> > SET media_uri = 'a'
> >  WHERE id IN (SELECT activity_id FROM UNNEST (v_activity_id_list)
> > activity_id)
> >  ;
> > return 'success';
>
> > END;
> > $$ LANGUAGE plpgsql;
>
> > I then open 2 command lines and run:
> > select bb_crash_db_5();
>
> Thanks, I was able to reproduce it with this test case.  It turns out
> not to have anything directly to do with UNNEST, but with the code that
> deals with concurrent row updates.
>
> I've committed a fix, which will appear in next week's updates.
> Thanks for the report and test case!
>
>regards, tom lane
>


