Re: [GENERAL] Query Using Massive Temp Space

2017-11-20 Thread legrand legrand
Hi,
there are many Hash and Merge joins that may generate a lot of temp space;
it could also be a problem of a forgotten column in the join.

Could you also provide the index definitions (PK, UK and others)
along with the EXPLAIN (ANALYZE, BUFFERS) output?

(You can limit the scope of the query, to help it finish, by limiting the
number of rows from candidates:
...
FROM match_candidate_address WHERE account_id NOT IN (:exclude_ids)
LIMIT 100
)

Regards
PAscal








Re: [GENERAL] Query Using Massive Temp Space

2017-11-20 Thread Semler Miloslav
Hello,
I have one question: why are you using such a huge number of grouping columns?
Is there some reason for it? It is definitely not a fast method.  I would
prefer to first do a named query grouped by the ids (account_id, candidate_id,
parent_id) and then left join the candidates table (to get the rest of the
requested columns) to this named query on those ids, if that is possible.
You would also avoid hashing these nine columns, so you will save a certain
amount of memory.
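
A minimal sketch of that restructuring (the column names on candidates are
assumptions for illustration, not from the original query):

WITH grouped AS (
    SELECT account_id, candidate_id, parent_id   -- group on the ids only
    FROM match_candidate_address
    GROUP BY account_id, candidate_id, parent_id
)
SELECT g.account_id, g.candidate_id, g.parent_id,
       c.first_name, c.last_name                 -- hypothetical extra columns
FROM grouped g
LEFT JOIN candidates c ON c.id = g.candidate_id;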

Miloslav

From: pgsql-general-ow...@postgresql.org 
[mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Cory Tucker
Sent: Monday, November 20, 2017 7:17 AM
To: pgsql-general@postgresql.org
Subject: [GENERAL] Query Using Massive Temp Space

Hello,

I have a query that is using a tremendous amount of temp disk space given the 
overall size of the dataset.  I'd love for someone to try to explain what PG is 
doing and why it's using so much space for the query.

First off, the system is PG 9.6 on Ubuntu with 4 cores and 28 GB of RAM.  The 
query in question is a fairly large join of several tables (6) including some 
aggregations.  The overall dataset size of the 6 tables in question is about 
20GB and the largest table is about 15M rows.  The query is essentially a dump 
of most of the data from these tables joined together to be used in another 
system.

When the query runs it begins to use an aggressive amount of temp space on the 
volume over the course of many hours, until it reaches about 95% capacity and 
then tapers off.  Never completes though.  The temp space it uses is around 
1.5TB out of a 2TB volume.  Again, the total size of the relations in question 
is only 20GB.

Can anyone explain how the query could possibly use so much temp space?

Query and query plan are attached.  Please let me know if any more info is 
needed.

thanks
--Cory


Re: [GENERAL] Query Using Massive Temp Space

2017-11-19 Thread Laurenz Albe
Cory Tucker wrote:
> I have a query that is using a tremendous amount of temp disk space given the 
> overall size of the dataset.
> I'd love for someone to try to explain what PG is doing and why its using so 
> much space for the query.

It could be a sort or a hash operation.

To determine what is going on, EXPLAIN (ANALYZE, BUFFERS) output
would be useful.
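
You can also watch the cumulative temp-file counters while reproducing the
problem (a sketch; pg_stat_database has had these columns since 9.2):

SELECT temp_files, pg_size_pretty(temp_bytes) AS temp_used
FROM pg_stat_database
WHERE datname = current_database();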

Yours,
Laurenz Albe




[GENERAL] Query Using Massive Temp Space

2017-11-19 Thread Cory Tucker
Hello,

I have a query that is using a tremendous amount of temp disk space given
the overall size of the dataset.  I'd love for someone to try to explain
what PG is doing and why it's using so much space for the query.

First off, the system is PG 9.6 on Ubuntu with 4 cores and 28 GB of RAM.
The query in question is a fairly large join of several tables (6)
including some aggregations.  The overall dataset size of the 6 tables in
question is about 20GB and the largest table is about 15M rows.  The query
is essentially a dump of *most* of the data from these tables joined
together to be used in another system.

When the query runs it begins to use an aggressive amount of temp space on
the volume over the course of many hours, until it reaches about 95%
capacity and then tapers off.  Never completes though.  The temp space it
uses is around 1.5TB out of a 2TB volume.  Again, the *total* size of the
relations in question is only 20GB.

Can anyone explain how the query could possibly use so much temp space?

Query and query plan are attached.  Please let me know if any more info is
needed.

thanks
--Cory


query
Description: Binary data


query_plan
Description: Binary data



Re: [GENERAL] Query on pg_settings view

2017-11-15 Thread Stephen Frost
Greetings,

* Laurenz Albe (laurenz.a...@cybertec.at) wrote:
> Ronen Nofar wrote:
> > I have a weird case when running a query on the pg_settings view.
> > I have two users: the first one is the default user - postgres, which is
> > a superuser - and another one is a role which I had created; I called it
> > test_role and it's not a superuser.
> > When I run a select on pg_settings with these two users I receive
> > different results.
> 
> That is intentional, because some settings should only be
> visible for superusers, for example everything that has to
> do with the operating system (location of configuration file
> or socket directories).

This isn't quite correct any longer - with PG10, we have a default role
called 'pg_read_all_settings' which can be GRANT'd to other roles to
allow viewing of all settings, even those previously restricted to
superuser-only.
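
For example (on PostgreSQL 10 or later; the role name is taken from the
case described up-thread):

GRANT pg_read_all_settings TO test_role;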

Thanks!

Stephen




Re: [GENERAL] Query on pg_settings view

2017-11-15 Thread Laurenz Albe
Ronen Nofar wrote:
> I have a weird case when running a query on the pg_settings view.
> I have two users: the first one is the default user - postgres, which is
> a superuser - and another one is a role which I had created; I called it
> test_role and it's not a superuser.
> When I run a select on pg_settings with these two users I receive
> different results.

That is intentional, because some settings should only be
visible for superusers, for example everything that has to
do with the operating system (location of configuration file
or socket directories).

Yours,
Laurenz Albe




[GENERAL] Query on pg_settings view

2017-11-15 Thread Ronen Nofar
Hi,

I have a weird case when running a query on the pg_settings view.
I have two users: the first one is the default user - postgres, which is a
superuser - and another one is a role which I had created; I called it
test_role and it's not a superuser.
When I run a select on pg_settings with these two users I receive different
results.
Running the query 'select * from pg_settings' with the 'postgres' user returns
269 rows, while running the same query with 'test_role' returns 254 rows.
For example, the following query 'select lower(setting) from pg_settings
where lower(name) = 'config_file'' returns null when I connect with
'test_role', but with the 'postgres' user I receive one row
(/var/lib/pgsql/9.6/data/postgresql.conf).

I have granted all privileges on pg_settings to 'test_role' but it didn't
change the results.

I don't have any idea what is wrong here.

The PostgreSQL version is PostgreSQL 9.6.1 on x86_64-pc-linux-gnu

Can you please help me with this issue?

Thanks in advance,
Ronen


[GENERAL] Query Improvement??

2017-11-05 Thread Bret Stern

Goal is to return all vendors which exist in all three companies

I think I got lucky figuring this out. Is there an obviously better way?

combined_item_master looks like this:
company_code character varying(10) NOT NULL,
primary_vendor_no character varying(7)
..more fields

data looks like this:

company_code | primary_vendor_no
AAA          | 003
BBB          | 004
CCC          | 001
CCC          | 004
AAA          | 123
BBB          | 123
CCC          | 123
BBB          | 003

Query returns all primary_vendor_no (as vendor_locations) which exist in
all three companies
results:
vendor_locations
123


Here's the query

select primary_vendor_no, count(primary_vendor_no) as vendor_locations
from
(
  select distinct primary_vendor_no, company_code
  from combined_item_master
) as a
group by primary_vendor_no
having count(primary_vendor_no) = 3
order by vendor_locations desc, primary_vendor_no
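
An equivalent, slightly simpler form (a sketch; like the original it assumes
exactly these three companies exist) folds the inner DISTINCT away by
counting distinct companies directly:

select primary_vendor_no,
       count(distinct company_code) as vendor_locations
from combined_item_master
group by primary_vendor_no
having count(distinct company_code) = 3
order by primary_vendor_no;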


Thanks
Bret





Re: [GENERAL] Query plan for Merge Semi Join

2017-11-01 Thread Laurenz Albe
Peter J. Holzer wrote:
> [PostgreSQL 9.5.9 on x86_64-pc-linux-gnu, compiled by gcc (Debian
> 4.9.2-10) 4.9.2, 64-bit]
> 
> While investigating a performance issue, I found this query plan:
> 
> wds=> explain analyze
>   select facttablename, columnname, term, concept_id, t.hidden, language, 
> register, c.id, canonicalname, description, parent, type, r.sortorder
>   from term t, concept c, relation r
>   where facttablename='facttable_kon_eh' and columnname='arbeitsvolumen'
> and exists (select 1 from facttable_kon_eh f where 
> f.arbeitsvolumen=t.term  and thema in (values (E'E')))
> and c.id=concept_id and r.child=concept_id;
> [EXPLAIN ANALYZE output quoted in full; it is identical to the plan in
> the original message below]

[GENERAL] Query plan for Merge Semi Join

2017-11-01 Thread Peter J. Holzer
[PostgreSQL 9.5.9 on x86_64-pc-linux-gnu, compiled by gcc (Debian
4.9.2-10) 4.9.2, 64-bit]

While investigating a performance issue, I found this query plan:

wds=> explain analyze
  select facttablename, columnname, term, concept_id, t.hidden, language, 
register, c.id, canonicalname, description, parent, type, r.sortorder
  from term t, concept c, relation r
  where facttablename='facttable_kon_eh' and columnname='arbeitsvolumen'
and exists (select 1 from facttable_kon_eh f where 
f.arbeitsvolumen=t.term  and thema in (values (E'E')))
and c.id=concept_id and r.child=concept_id;
                                                              QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=22984.13..241822.98 rows=1 width=169) (actual time=667.608..3275.037 rows=6 loops=1)
   ->  Nested Loop  (cost=22983.70..241822.50 rows=1 width=154) (actual time=667.587..3274.972 rows=6 loops=1)
         ->  Merge Semi Join  (cost=22983.27..241819.04 rows=1 width=87) (actual time=667.559..3274.877 rows=6 loops=1)
               Merge Cond: ((t.term)::text = (f.arbeitsvolumen)::text)
               ->  Index Scan using term_term_idx on term t  (cost=0.56..206841.09 rows=18 width=87) (actual time=667.467..2929.430 rows=7 loops=1)
                     Filter: (((facttablename)::text = 'facttable_kon_eh'::text) AND ((columnname)::text = 'arbeitsvolumen'::text))
                     Rows Removed by Filter: 3874190
               ->  Materialize  (cost=0.43..399167.10 rows=3548798 width=7) (actual time=0.086..295.708 rows=184791 loops=1)
                     ->  Nested Loop Semi Join  (cost=0.43..390295.10 rows=3548798 width=7) (actual time=0.076..278.006 rows=184791 loops=1)
                           Join Filter: ((f.thema)::text = ('E'::text))
                           ->  Index Scan using facttable_kon_eh_arbeitsvolumen_idx on facttable_kon_eh f  (cost=0.43..337063.11 rows=3548798 width=9) (actual time=0.052..199.733 rows=184791 loops=1)
                           ->  Materialize  (cost=0.00..0.03 rows=1 width=32) (actual time=0.000..0.000 rows=1 loops=184791)
                                 ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
         ->  Index Scan using concept_pkey on concept c  (cost=0.43..3.45 rows=1 width=67) (actual time=0.011..0.012 rows=1 loops=6)
               Index Cond: (id = t.concept_id)
   ->  Index Scan using relation_child_idx on relation r  (cost=0.43..0.47 rows=1 width=19) (actual time=0.008..0.009 rows=1 loops=6)
         Index Cond: (child = c.id)
 Planning time: 15.624 ms
 Execution time: 3275.341 ms

Re: [GENERAL] query not scaling

2017-10-31 Thread Rob Sargent

On 10/31/2017 03:12 AM, Laurenz Albe wrote:
> Rob Sargent wrote:
>
> I think your biggest problem is the join condition
>    on m.basepos between s.startbase and s.endbase
>
> That forces a nested loop join, which cannot be performed efficiently.

Agree! 800,000 * 4,000 = 3,200,000,000.  It's just that I thought I had
corralled that problem with indexing, but apparently not.  I was hoping
some kind soul might point out a missing index or similar correction.  I
have completely reworked the process, but am not sure yet if it's correct.
(The slow answer is correct, once it comes in.)

> You can create indexes that are useful for this query:
>
> ON sui.segment(chrom, markerset_id)
> ON sui.probandset(people_id)
>
> But that probably won't make a big difference, because the sequential
> scans take only a small fraction of your query time.
>
> A little less than half of the query time is spent in the nested loop
> join, and a little more than half of the time is spent doing the
> GROUP BY.
>
> Perhaps the biggest improvement you can easily make would be to
> get rid of "numeric" for the computation.  I suspect that this is
> where a lot of time is spent, since the hash aggregate is over
> less than 15000 rows.
>
> Unless you really need the precision of "numeric", try
>
> CREATE OR REPLACE FUNCTION pv(l bigint, e bigint, g bigint, o int)
> RETURNS double precision LANGUAGE sql AS
> $$SELECT (g+e+o)::double precision / (l+e+g+o)::double precision$$;
>
> Yours,
> Laurenz Albe
In practice markersets are always aligned with one chromosome, so I would
not expect this to have an effect.  There's no constraint on this,
however, and there can be more than one markerset per chromosome.  I
have played with indexing on segment.markerset_id.

In all the data sets used in the examples (runtimes, explains etc.) there
has been a single people_id across the existing segment data.  Down
the road this of course will not be the case, and I can see the value of
an index on probandset.people_id eventually.  I can certainly add it now
for a test.  I'm currently writing a probandset loader hoping to get a
test case for the problem with gin indexing mentioned up-thread.
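
For reference, a concrete form of the indexes suggested above (a sketch;
the index names are illustrative, not from the thread):

CREATE INDEX segment_chrom_markerset_idx ON sui.segment (chrom, markerset_id);
CREATE INDEX probandset_people_id_idx ON sui.probandset (people_id);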


I think I'm most surprised at the notion that the arithmetic is the
problem, and will happily test your suggestion to force floating point
values.  The value can get small (10^-12 on a good day!) but we don't
need many digits of precision.


Thanks





Re: [GENERAL] query not scaling

2017-10-30 Thread Laurenz Albe
On Thu, 2017-10-26 at 19:07 -0600, Rob Sargent wrote:
>    ->  Nested Loop  (cost=3799.40..44686205.23 rows=1361304413 width=40)
>            (actual time=55.443..89684.451 rows=75577302 loops=1)
>
>      ->  Hash Join  (cost=3798.98..43611.56 rows=823591 width=32)
>            (actual time=55.393..1397.509 rows=823591 loops=1)
>
>      ->  Index Scan using marker_chrom_basepos_idx on base.marker m
>            (cost=0.42..37.67 rows=1653 width=20)
>            (actual time=0.010..0.075 rows=92 loops=823591)
>          Index Cond: ((m.chrom = 22) AND (m.basepos >= s.startbase) AND
>                       (m.basepos <= s.endbase))

I think your biggest problem is the join condition
   on m.basepos between s.startbase and s.endbase

That forces a nested loop join, which cannot be performed efficiently.
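
For reference, the range/GiST variant Rob alludes to trying with "<@"
(a sketch with assumed column types and names; the thread reports it
changed the plan but not the runtime):

CREATE INDEX segment_baserange_gist
    ON segment USING gist (int8range(startbase, endbase, '[]'));

-- the join condition then becomes:
--   m.basepos::bigint <@ int8range(s.startbase, s.endbase, '[]')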

Yours,
Laurenz Albe




Re: [GENERAL] query not scaling

2017-10-27 Thread Merlin Moncure
On Thu, Oct 26, 2017 at 10:01 AM, Tom Lane  wrote:
> Laurenz Albe  writes:
>> Also, to have PostgreSQL inline the function, which would be good
>> for performance, it should be declared IMMUTABLE.
>
> Actually, if you hope to have a SQL function be inlined, it's better
> not to decorate it at all --- not with IMMUTABLE, and not with STRICT
> either.  Both of those restrict the parser's ability to inline unless
> it can prove the contained expression is equally immutable/strict.
> With the default attributes of volatile/not strict, there's nothing
> to prove.

This is extremely obnoxious.  Is it possible to raise a warning on
function creation?

> (In any case, it's usually easy enough to tell from EXPLAIN output
> whether inlining has happened.)

No it isn't.  The explain syntax is arcane and inlining as a general
concept is only very indirectly expressed.  I really think we ought to
do better here; I was not able to find any treatment of inlining given
in the 'Performance Tips' or the 'Functions and Operators' section, or
anywhere really (except the wiki).  This is really a disservice to the
users, I think.

merlin




Re: [GENERAL] query not scaling

2017-10-26 Thread Rob Sargent


On 10/26/2017 09:01 AM, Tom Lane wrote:

> Laurenz Albe writes:
>> Also, to have PostgreSQL inline the function, which would be good
>> for performance, it should be declared IMMUTABLE.
>
> Actually, if you hope to have a SQL function be inlined, it's better
> not to decorate it at all --- not with IMMUTABLE, and not with STRICT
> either.  Both of those restrict the parser's ability to inline unless
> it can prove the contained expression is equally immutable/strict.
> With the default attributes of volatile/not strict, there's nothing
> to prove.
>
> (In any case, it's usually easy enough to tell from EXPLAIN output
> whether inlining has happened.)
>
> regards, tom lane


In another instance of the same schema, in the same database as the
original slow execution, I've loaded 823591 segments (though in this case
all of them are on one chromosome, one markerset) and 65K probandsets,
using the same marker table as the slow(est) query.  In the fastest run,
there are only 46K segments for the given markerset.



                                 QUERY PLAN
------------------------------------------------------------------------------
 HashAggregate  (cost=82122076.59..8215.35 rows=14876 width=48) (actual time=208203.091..208210.348 rows=14645 loops=1)
   Output: m.id, min(((1.0 * ((s.events_greater + s.events_equal) + 0))::numeric / (((s.events_less + s.events_equal) + s.events_greater) + 0)::numeric))
   Group Key: m.id
   Buffers: shared hit=43209090
   ->  Nested Loop  (cost=3799.40..44686205.23 rows=1361304413 width=40) (actual time=55.443..89684.451 rows=75577302 loops=1)
         Output: m.id, s.events_greater, s.events_equal, s.events_less
         Buffers: shared hit=43209090
         ->  Hash Join  (cost=3798.98..43611.56 rows=823591 width=32) (actual time=55.393..1397.509 rows=823591 loops=1)
               Output: s.events_greater, s.events_equal, s.events_less, s.startbase, s.endbase
               Inner Unique: true
               Hash Cond: (s.probandset_id = p.id)
               Buffers: shared hit=19222
               ->  Seq Scan on sui.segment s  (cost=0.00..29414.86 rows=823591 width=48) (actual time=0.017..669.915 rows=823591 loops=1)
                     Output: s.id, s.chrom, s.markerset_id, s.probandset_id, s.startbase, s.endbase, s.firstmarker, s.lastmarker, s.events_less, s.events_equal, s.events_greater
                     Filter: ((s.chrom = 22) AND (s.markerset_id = 'edf95066-24d2-4ca1-bad6-aa850cc82fef'::uuid))
                     Buffers: shared hit=17061
               ->  Hash  (cost=2979.99..2979.99 rows=65519 width=16) (actual time=55.272..55.272 rows=65519 loops=1)
                     Output: p.id
                     Buckets: 65536  Batches: 1  Memory Usage: 3584kB
                     Buffers: shared hit=2161
                     ->  Seq Scan on sui.probandset p  (cost=0.00..2979.99 rows=65519 width=16) (actual time=0.007..33.582 rows=65519 loops=1)
                           Output: p.id
                           Filter: (p.people_id = '9b2308b1-9659-4a2c-91ae-8f95cd0a90b3'::uuid)
                           Buffers: shared hit=2161
         ->  Index Scan using marker_chrom_basepos_idx on base.marker m  (cost=0.42..37.67 rows=1653 width=20) (actual time=0.010..0.075 rows=92 loops=823591)
               Output: m.id, m.name, m.chrom, m.basepos, m.alleles
               Index Cond: ((m.chrom = 22) AND (m.basepos >= s.startbase) AND (m.basepos <= s.endbase))
               Buffers: shared hit=43189868
 Planning time: 0.764 ms
 Execution time: 208214.816 ms
(30 rows)



Re: [GENERAL] query not scaling

2017-10-26 Thread Rob Sargent



On 10/26/2017 09:01 AM, Tom Lane wrote:

> Laurenz Albe writes:
>> Also, to have PostgreSQL inline the function, which would be good
>> for performance, it should be declared IMMUTABLE.
>
> Actually, if you hope to have a SQL function be inlined, it's better
> not to decorate it at all --- not with IMMUTABLE, and not with STRICT
> either.  Both of those restrict the parser's ability to inline unless
> it can prove the contained expression is equally immutable/strict.
> With the default attributes of volatile/not strict, there's nothing
> to prove.
>
> (In any case, it's usually easy enough to tell from EXPLAIN output
> whether inlining has happened.)
>
> regards, tom lane

As to the explain analyze:

   could not receive data from server: Connection timed out
   Time: 7877340.565 ms

for the second time.  I had presumed at first that this had occurred
during a network burp.  I'll try running it directly on the pg host.




Re: [GENERAL] query not scaling

2017-10-26 Thread Tom Lane
Laurenz Albe  writes:
> Also, to have PostgreSQL inline the function, which would be good
> for performance, it should be declared IMMUTABLE.

Actually, if you hope to have a SQL function be inlined, it's better
not to decorate it at all --- not with IMMUTABLE, and not with STRICT
either.  Both of those restrict the parser's ability to inline unless
it can prove the contained expression is equally immutable/strict.
With the default attributes of volatile/not strict, there's nothing
to prove.

(In any case, it's usually easy enough to tell from EXPLAIN output
whether inlining has happened.)
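
A minimal illustration of the undecorated style, reusing the pv() function
from this thread (the double precision variant suggested up-thread, just
without IMMUTABLE or STRICT, so there is nothing the parser must prove):

-- inline-friendly: default volatile / not strict attributes
CREATE OR REPLACE FUNCTION pv(l bigint, e bigint, g bigint, o int)
RETURNS double precision LANGUAGE sql AS
$$SELECT (g + e + o)::double precision / (l + e + g + o)::double precision$$;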

regards, tom lane




Re: [GENERAL] query not scaling

2017-10-26 Thread Rob Sargent


> On Oct 26, 2017, at 1:02 AM, Laurenz Albe  wrote:
> 
> Rob Sargent wrote:
>> I have a query I cannot tame and I'm wondering if there's an alternative
>> to the "between" clause I'm using.  Perhaps a custom type could do
>> better?  I've tried the "<@" operator and that changes the query plan
>> significantly but the execution cost/time is not improved.
>> 
>> Any suggestion or pointers much appreciated.
> 
> You really need EXPLAIN (ANALYZE, BUFFERS) output to be able to
> undersrand what is going on.
> 
> A couple of simple things to check:
> 
> - Have all tables been ANALYZED beforehand?
> - Are all optimizer database parameters identical?
> 
> Also, to have PostgreSQL inline the function, which would be good
> for performance, it should be declared IMMUTABLE.
> 
> Yours,
> Laurenz Albe
The explain analyze was (maybe is) still running, but without buffers.
Thought of the immutable bit; will be doing that test.
All tables are vacuumed and analyzed with each structural change.
Thanks




Re: [GENERAL] query not scaling

2017-10-26 Thread Laurenz Albe
Rob Sargent wrote:
> I have a query I cannot tame and I'm wondering if there's an alternative
> to the "between" clause I'm using.  Perhaps a custom type could do
> better?  I've tried the "<@" operator and that changes the query plan
> significantly but the execution cost/time is not improved.
> 
> Any suggestions or pointers much appreciated.

You really need EXPLAIN (ANALYZE, BUFFERS) output to be able to
understand what is going on.

A couple of simple things to check:

- Have all tables been ANALYZED beforehand?
- Are all optimizer database parameters identical?

Also, to have PostgreSQL inline the function, which would be good
for performance, it should be declared IMMUTABLE.

Yours,
Laurenz Albe




[GENERAL] query not scaling

2017-10-25 Thread Rob Sargent

I have a query I cannot tame and I'm wondering if there's an alternative
to the "between" clause I'm using.  Perhaps a custom type could do
better?  I've tried the "<@" operator and that changes the query plan
significantly but the execution cost/time is not improved.

Any suggestions or pointers much appreciated.

Environment: Using a virtual CentOS Linux release 7.4.1708 (Core), 4
cores (2.3GHz), 8G RAM and postgres 10.0 (beta3); shared_buffers = 1GB,
work_mem = 2GB

Domain: (TL/DR) A "segment" is defined by a particular set of people
(probandset.id) plus a subset of markers (markerset.id, startmarker,
endmarker).  I need the minimum p-value for each marker in the set
across all segments matching the set and a specific set of people.  So a
given segment says "I cover all the markers from startbase to endbase",
and each marker has a specific base position (relative to a chromosome).
I'm after the smallest p-value for each marker across the set of
segments which include that marker (from the 'between' clause).

Context: I have the query in a function, so the ids of all the
players are available to the following sql:

select m.id as mkrid
     , min(pv(s.events_less, s.events_equal, s.events_greater, 0)) as optval
from marker m join segment s on m.basepos between s.startbase and s.endbase
     and m.chrom = 1
     and s.chrom = 1
     and s.markerset_id = suppliedMarkersetId
join probandset r on s.probandset_id = r.id
     and r.people_id = suppliedPeopleId
group by m.id

where the pv function is

create or replace function pv(l bigint, e bigint, g bigint, o int)
returns numeric
as
$$
select 1.0*(g+e+o)/(l+e+g+o);
$$
language sql
;


I have the identical schema in two databases (same pg instance) and the
table definitions involved are below.  In one schema there are 1.7M
records in segment and in the other there are 40M rows.  The marker
tables are much more similar, with 600K and 900K respectively.  The third
table, probandset, has 60 and 600 respectively.  On average, 0.8M and 1.8M
segments per markerset_id.

The explains: fast (12sec), then slow (hours).  The part which sticks
out to me is where the "between" gets used.  (I'm betting that
probandset is too small to matter.)  The slower explain plan is very
similar to what I saw originally in the now "fast" data set, and the
current indexing strategy comes largely from that performance work.

It looks like I'm getting a Cartesian product between the number of markers
in a set and the number of segments found: ten zeros at least.
                                 QUERY PLAN
------------------------------------------------------------------------------
 HashAggregate  (cost=291472.27..292040.58 rows=56831 width=48)
   Group Key: m.id
   ->  Nested Loop  (cost=3752.33..167295.52 rows=4515518 width=40)
         ->  Nested Loop  (cost=3751.90..17906.25 rows=715 width=32)
               ->  Seq Scan on probandset r  (cost=0.00..2.77 rows=4 width=16)
                     Filter: (people_id = '4e3b9829-43a8-4f84-9df6-f120dc5b1a7e'::uuid)
               ->  Bitmap Heap Scan on segment s  (cost=3751.90..4473.96 rows=191 width=48)
                     Recheck Cond: ((probandset_id = r.id) AND (chrom = 1) AND (markerset_id = '61a7e5cb-b81d-42e4-9e07-6bd9c2fbe6d1'::uuid))
                     ->  BitmapAnd  (cost=3751.90..3751.90 rows=191 width=0)
                           ->  Bitmap Index Scan on useg  (cost=0.00..72.61 rows=2418 width=0)
                                 Index Cond: ((probandset_id = r.id) AND (chrom = 1))
                           ->  Bitmap Index Scan on segment_markerset_id_idx  (cost=0.00..3676.23 rows=140240 width=0)
                                 Index Cond: (markerset_id = '61a7e5cb-b81d-42e4-9e07-6bd9c2fbe6d1'::uuid)
         ->  Index Scan using marker_chrom_basepos_idx on marker m  (cost=0.42..145.79 rows=6315 width=20)
               Index Cond: ((chrom = 1) AND (basepos >= s.startbase) AND (basepos <= s.endbase))
(15 rows)


                                 QUERY PLAN
------------------------------------------------------------------------------
 HashAggregate  (cost=83131331.81..83132151.44 rows=81963 width=48)
   Group Key: m.id
   ->  Nested Loop  (cost=1907.38..70802659.35 rows=448315362 width=40)
         Join Filter: ((m.basepos >= s.startbase) AND (m.basepos <= s.endbase))
         ->  Bitmap Heap Scan on marker m  (cost=1883.64..11009.18 rows=81963 width=20)
               Recheck Cond: (chrom = 1)
               ->  Bitmap Index Scan on marker_chrom_basepos_idx  (cost=0.00..1863.15 rows=81963 width=0)
                     Index Cond: (chrom = 1)
         ->  Materialize  (cost=23.74..181468.38 rows=49228 width=32)

Re: [GENERAL] query planner placement of sort/limit w.r.t. joins

2017-04-28 Thread David Rowley
On 29 April 2017 at 11:37, David G. Johnston  wrote:
>> > Perhaps there are reasons why this optimization is not safe that I
>> > haven't
>> > thought about?
>>
>> Yeah, I think so. What happens if an A row cannot find a match in B or
>> C? This version of the query will end up returning fewer rows due to
>> that, but the original version would consider other rows with a higher
>> rank.
>>
>> We've danced around a bit with using foreign keys as proofs that rows
>> will exist for other optimisations in the past, but it's tricky ground
>> since foreign keys are not updated immediately, so there are windows
>> where they may not actually hold true to their word.
>
>
> I read this query as having a relation cardinality of one-to-one mandatory -
> which precludes the scenario described.

What makes you say so?

It's pretty easy to show how the queries are not the same.

create table a (
  id int primary key,
  b_id int not null,
  val int not null,
  rank int not null
);

create table b (
  id int primary key,
  c_id int not null,
  val int not null
);

create table c (
  id int primary key,
  val int not null
);
insert into a select x,x,x,x from generate_series(1,150) x;
insert into b select x,x,x from generate_series(51,150) x;
insert into c select x,x from generate_series(51,150) x;

SELECT A.val, B.val, C.val FROM A
   JOIN B ON A.b_id = B.id
   JOIN C ON B.c_id = C.id
   ORDER BY A.rank
   LIMIT 100; -- returns 100 rows

 SELECT D.val, B.val, C.val FROM
   (SELECT * FROM A ORDER BY A.rank LIMIT 100) AS D
   JOIN B ON D.b_id = B.id
   JOIN C ON B.c_id = C.id
   LIMIT 100; -- returns 50 rows


> Is the above saying that, today, there is no planning benefit to setting up
> two deferrable references constraints to enforce the non-optional
> requirement?

There is no place in the planner where a foreign key is used as a
proof that a joined row must exist, with the exception of row
estimations for statistics.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [GENERAL] query planner placement of sort/limit w.r.t. joins

2017-04-28 Thread David G. Johnston
On Fri, Apr 28, 2017 at 3:24 PM, David Rowley 
wrote:

> On 29 April 2017 at 07:59, Dave Vitek  wrote:
> > Is what I want in the query planner's vocabulary?  It would need to
> exploit
> > the fact that the _id columns are not nullable, and either exploit the
> > uniqueness of the id columns or do an extra LIMIT step after the join.  I
> > think I want it to effectively give the same result I expect (haven't
> > tested) it would give for:
>
> Unfortunately, it's not a plan that the current planner will consider.
>
> > SELECT D.val, B.val, C.val FROM
> >(SELECT * FROM A ORDER BY A.rank LIMIT 100) AS D
> >JOIN B ON D.b_id = B.id
> >JOIN C ON B.c_id = C.id
> >LIMIT 100;
> >
> > Perhaps there are reasons why this optimization is not safe that I
> haven't
> > thought about?
>
> Yeah, I think so. What happens if an A row cannot find a match in B or
> C? This version of the query will end up returning fewer rows due to
> that, but the original version would consider other rows with a higher
> rank.
>
> We've danced around a bit with using foreign keys as proofs that rows
> will exist for other optimisations in the past, but it's tricky ground
> since foreign keys are not updated immediately, so there are windows
> where they may not actually hold true to their word.
>

I read this query as having a relation cardinality of one-to-one
mandatory - which precludes the scenario described.

Is the above saying that, today, there is no planning benefit to setting up
two deferrable references constraints to enforce the non-optional
requirement?

I know I'm guilty of not enforcing the non-optional part of the
constraint.  Mostly due to not really realizing it, but also having to
deal with the added syntax to perform inserts.  ORMs, I suspect, would
generally be unaccommodating here as well...

David J.


Re: [GENERAL] query planner placement of sort/limit w.r.t. joins

2017-04-28 Thread David Rowley
On 29 April 2017 at 07:59, Dave Vitek  wrote:
> Is what I want in the query planner's vocabulary?  It would need to exploit
> the fact that the _id columns are not nullable, and either exploit the
> uniqueness of the id columns or do an extra LIMIT step after the join.  I
> think I want it to effectively give the same result I expect (haven't
> tested) it would give for:

Unfortunately, it's not a plan that the current planner will consider.

> SELECT D.val, B.val, C.val FROM
>(SELECT * FROM A ORDER BY A.rank LIMIT 100) AS D
>JOIN B ON D.b_id = B.id
>JOIN C ON B.c_id = C.id
>LIMIT 100;
>
> Perhaps there are reasons why this optimization is not safe that I haven't
> thought about?

Yeah, I think so. What happens if an A row cannot find a match in B or
C? This version of the query will end up returning fewer rows due to
that, but the original version would consider other rows with a higher
rank.

We've danced around a bit with using foreign keys as proofs that rows
will exist for other optimisations in the past, but it's tricky ground
since foreign keys are not updated immediately, so there are windows
where they may not actually hold true to their word.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[GENERAL] query planner placement of sort/limit w.r.t. joins

2017-04-28 Thread Dave Vitek

Hi all,

I have a query I'd like to speed up.  I am wondering whether the query 
planner is capable of coming up with a certain kind of plan for this 
query, and if I might tickle it into doing so, or if I have to 
explicitly use subqueries to get what I want.


Imagine we have tables A, B, and C each having a one to one relationship.

SELECT A.val, B.val, C.val FROM A
   JOIN B ON A.b_id = B.id
   JOIN C ON B.c_id = C.id
   ORDER BY A.rank
   LIMIT 100;


Assume there are btree indices for all the "id" columns, but there is no 
index on "rank."  None of the fields are nullable.  id columns are 
unique.  Assume that there are about 1M tuples in the result set without 
the LIMIT.  ANALYZE has run recently.


In this case, I want postgres to do a sequential scan and a top-N sort 
against A since the ORDER BY only depends on columns of A, and then do 
100 index scans to implement the joins.


What I'm observing is that it is doing sequential scans on A, B, and C, 
joining, and then sorting.  The cost of the sequential scans of B and C 
is large enough to be painful.


Is what I want in the query planner's vocabulary?  It would need to 
exploit the fact that the _id columns are not nullable, and either 
exploit the uniqueness of the id columns or do an extra LIMIT step after 
the join.  I think I want it to effectively give the same result I 
expect (haven't tested) it would give for:


SELECT D.val, B.val, C.val FROM
   (SELECT * FROM A ORDER BY A.rank LIMIT 100) AS D
   JOIN B ON D.b_id = B.id
   JOIN C ON B.c_id = C.id
   LIMIT 100;

Perhaps there are reasons why this optimization is not safe that I 
haven't thought about?


The query is being generated for use by an ORM, so changing it to use a 
subquery for A and the ORDER BY+LIMIT is not so easy (plus, with 
different user input, it might need B or C in the ORDER BY).



- Dave





Re: [GENERAL] Query which "never" terminates

2017-04-27 Thread Viktor Fougstedt

Hello,

and thanks for your reply.

SET from_collapse_limit=1; 

did no difference, I’m afraid.

I ran both VACUUM and ANALYZE before I posted, and there was no noticeable 
difference from either of them.


Regards,
/Viktor

> On 27 apr 2017, at 15:54, pinker  wrote:
> 
> Try to change from_collapse_limit values at first; for instance, run SET
> from_collapse_limit = 1; and then your query in the same session.
> Have you run ANALYZE lately?
> 
> 
> 




Re: [GENERAL] Query which "never" terminates

2017-04-27 Thread pinker
Try to change from_collapse_limit values at first; for instance, run SET
from_collapse_limit = 1; and then your query in the same session.
Have you run ANALYZE lately?







[GENERAL] Query which "never" terminates

2017-04-27 Thread Viktor Fougstedt
Dear sirs.

I have an issue which I am at a loss to solve. Because of the behaviour I am 
not certain that this is a “slow query” problem, so I send my question here.

My question regards a query using deep subqueries. EXPLAIN gives a maximum cost 
of about 2.000 (details below), but the query runs "forever" (at least 3 
hours). It takes 100% CPU on the machine while running. It is expected to 
produce 0 rows. I have auto-vacuum enabled, but have also manually VACUUM:ed 
and ANALYZE:d the database.

I am using "PostgreSQL 9.4.9 on x86_64-apple-darwin14.5.0, compiled by Apple 
LLVM version 7.0.2 (clang-700.1.81), 64-bit)".

The query has worked as expected before, but stopped working. See below for 
query, EXPLAIN output, definitions of all tables, the tables' sizes, and a gdb 
backtrace of the "hung" process. I apologize for the SQL, which is horribly 
ugly. It is in part automatically generated, and therefore looks like . I 
have tried to format it in as readable a form as possible.

I have manually executed the innermost subquery ("SELECT n.account, ..."), and 
it produces 1 row in negligible time. The next levels out ("SELECT s3.nid, ...", 
"SELECT s4.nid, ..." and "SELECT i.assignment, ...") all produce 0 rows and run 
fine.

I have run "CREATE TABLE tmptbl AS SELECT i.assignment, ...", running the 
topmost subquery and all its subqueries. They produce 0 rows in negligible 
time. If I replace the subquery with a select from that temporary table, 
"SELECT s6.nid AS orgunit, i.tags_point AS tags_point FROM tmptbl, ...", that 
query runs fine, producing 0 rows in notime.

It is only when I use the complete query that it hangs - the individual parts 
all run.

A "select * from pg_stat_activity where waiting='t'” while the query is running 
produces 0 rows.

I also think that locking problems are unlikely, since the query works if I 
split it in two and use a table to store
intermediate output (which is 0 rows, BTW). Unless an inner query can hang on a 
lock held by an outer one.

The EXPLAIN (since it does not terminate I cannot run EXPLAIN ANALYZE): 
https://explain.depesz.com/s/Ex2z

See below for all the ugly details.

Thank you in advance, for all and any help.
/Viktor


SELECT s6.nid AS orgunit, i.tags_point AS tags_point
  FROM (SELECT i.assignment AS assignment, i.tags_point AS tags_point
  FROM (SELECT s4.nid AS assignment, i.tags_point AS tags_point
  FROM (SELECT s3.nid AS person, i.tags_point AS tags_point
  FROM (SELECT n.account AS account, tp.point AS 
tags_point
  FROM "pdb4_ad"."account_name" n,
   UNNEST(ARRAY[0]) AS injected_points,
   tags_point tp
 WHERE injected_points = tp.point
   AND int8range(n._r_from, n._r_to) @> 
tp.rev
   AND (n._value = '__bootstrap__')
   AND tsrange(n._start, n._stop) @> tp.time
   ) i,
   tags_point AS tp,
   "pdb4_ad"."account" AS s1,
   "pdb4_ad"."identifies" AS s2,
   "pdb4_person"."person" AS s3
 WHERE i.tags_point = tp.point
   AND i.account = s1.nid
   AND s1.nid = s2."account"
   AND s2."person" = s3.nid
   AND INT8RANGE(s1._r_from, s1._r_to) @> tp.rev
   AND TSRANGE(s1._start, s1._stop) @> tp.time
   AND INT8RANGE(s2._r_from, s2._r_to) @> tp.rev
   AND TSRANGE(s2._start, s2._stop) @> tp.time
   AND INT8RANGE(s3._r_from, s3._r_to) @> tp.rev
   AND TSRANGE(s3._start, s3._stop) @> tp.time
   ) i,
   tags_point AS tp,
   "pdb4_person"."person" AS s1,
   "pdb4_org"."assignment" AS s2,
   "pdb4_org"."assignment_active" AS s3,
   "pdb4_org"."assignment" AS s4
WHERE i.tags_point = tp.point
  AND i.person = s1.nid
  AND s1.nid = s2."person"
  AND s2.nid = s3."assignment"
  AND s3."assignment" = s4.nid
  AND INT8RANGE(s1._r_from, s1._r_to) @> tp.rev
  AND TSRANGE(s1._start, s1._stop) @> tp.time
  AND INT8RANGE(s2._r_from, s2._r_to) @> tp.rev
  AND TSRANGE(s2._start, s2._stop) @> tp.time
  AND INT8RANGE(s3._r_from, s3._r_to) @> tp.rev
  AND TSRANGE(s3._start, s3._stop) @> tp.time
  AND 

Re: [GENERAL] Query never completes with low work_mem (at least not within one hour)

2017-04-05 Thread Daniel Westermann
Daniel Westermann  writes: 
>> Thank you, Merlin. As said I know that "not in" is not a good choice in this 
>> case but I still do not get what is going on here. Why does the server 
>> repeatedly search for NULL values when I decrease work_mem, and why not when 
>> increasing work_mem? 

>The core point is that one plan is using a hashed subplan and the other is 
>not, because the planner estimated that the hashtable wouldn't fit into 
>work_mem. With a hashtable you'll have one probe into the hashtable per 
>outer row, and each probe is O(1) unless you are unlucky about data 
>distributions, so the runtime is more or less linear. Without a 
>hashtable, the inner table is rescanned for each outer row, so the 
>runtime is O(N^2) which gets pretty bad pretty fast. "Materializing" 
>the inner table doesn't really help: it gets rid of per-inner-row 
>visibility checks and some buffer locking overhead, so it cuts the 
>constant factor some, but the big-O situation is still disastrous. 

Thanks, Tom 


Re: [GENERAL] Query never completes with low work_mem (at least not within one hour)

2017-04-05 Thread Tom Lane
Daniel Westermann  writes:
> Thank you, Merlin. As said I know that "not in" is not a good choice in this 
> case but I still do not get what is going on here. Why does the server 
> repeatedly search for NULL values when I decrease work_mem and why not when 
> increasing work_mem? 

The core point is that one plan is using a hashed subplan and the other is
not, because the planner estimated that the hashtable wouldn't fit into
work_mem.  With a hashtable you'll have one probe into the hashtable per
outer row, and each probe is O(1) unless you are unlucky about data
distributions, so the runtime is more or less linear.  Without a
hashtable, the inner table is rescanned for each outer row, so the
runtime is O(N^2) which gets pretty bad pretty fast.  "Materializing"
the inner table doesn't really help: it gets rid of per-inner-row
visibility checks and some buffer locking overhead, so it cuts the
constant factor some, but the big-O situation is still disastrous.
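
For completeness, the anti-join rewrite Pavel suggests elsewhere in this
thread sidesteps the problem entirely (a sketch; note that NOT EXISTS
treats NULLs differently from NOT IN, which only matters if ids.id can be
NULL):

SELECT count(user_id)
FROM users u
WHERE NOT EXISTS (SELECT 1 FROM ids i WHERE i.id = u.user_id);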

regards, tom lane




Re: [GENERAL] Query never completes with low work_mem (at least not within one hour)

2017-04-05 Thread Daniel Westermann
2017-04-05 10:33 GMT+02:00 Daniel Westermann < 
daniel.westerm...@dbi-services.com > : 



2017-04-05 10:13 GMT+02:00 Daniel Westermann < 
daniel.westerm...@dbi-services.com > : 


2017-04-05 9:28 GMT+02:00 Daniel Westermann < 
daniel.westerm...@dbi-services.com > : 


>>what is result of EXPLAIN statement for slow and fast cases? 
>> 
>>regards 
>> 
>>Pavel 

For work_mem=32MB 

explain (analyze,verbose,buffers) select count(user_id) from users where 
user_id not in ( select id from ids); 
QUERY PLAN 
--
 
Aggregate (cost=83175.01..83175.02 rows=1 width=8) (actual 
time=7945.592..7945.593 rows=1 loops=1) 
Output: count(users.user_id) 
Buffers: shared read=29425 
-> Seq Scan on public.users (cost=16925.01..79425.01 rows=150 width=4) 
(actual time=1928.665..5888.645 rows=249 loops=1) 
Output: users.user_id, users.username 
Filter: (NOT (hashed SubPlan 1)) 
Rows Removed by Filter: 51 
Buffers: shared read=29425 
SubPlan 1 
-> Seq Scan on public.ids (cost=0.00..14425.01 rows=101 width=4) (actual 
time=11.111..868.382 rows=101 loops=1) 
Output: ids.id 
Buffers: shared read=4425 
Planning time: 187.396 ms 
Execution time: 7948.108 ms 
(14 rows) 

Time: 8244.493 ms 

For work_mem='16MB' it does not complete with analyze in one hour. For explain 
only: 

explain (verbose) select count(user_id) from users where user_id not in ( 
select id from ids); 
QUERY PLAN 

 
Aggregate (cost=38748092500.00..38748092500.01 rows=1 width=8) 
Output: count(users.user_id) 
-> Seq Scan on public.users (cost=0.00..38748088750.00 rows=150 width=4) 
Output: users.user_id, users.username 
Filter: (NOT (SubPlan 1)) 
SubPlan 1 
-> Materialize (cost=0.00..23332.01 rows=101 width=4) 
Output: ids.id 
-> Seq Scan on public.ids (cost=0.00..14425.01 rows=101 width=4) 
Output: ids.id 
(10 rows) 




>There is a materialize op more 
> 
>do you have an index on ids.id? 

Yes: 

\d ids 
     Table "public.ids" 
 Column |  Type   | Modifiers 
--------+---------+----------- 
 id     | integer | 
Indexes: 
    "i_ids" UNIQUE, btree (id) 





>>hmm .. NOT IN is just bad :( 
>> 
>>The second is slow because table ids is stored in a temp file, and it is 
>>repeatedly read from the file. In the first case, the ids table is stored in memory. 
>> 
>>SELECT count(user_id) FROM users WHERE NOT EXISTS (SELECT id from ids 
>>where id = user_id) 
>> 

Yes, really bad :) ... and I still do not get it. Even when reading from the 
tempfile all the time it should at least complete within one hour, shouldn't 
it? The tables are not so big: 

select * from pg_size_pretty ( pg_relation_size ('ids' )); 
pg_size_pretty 
 
35 MB 
(1 row) 
select * from pg_size_pretty ( pg_relation_size ('users' )); 
pg_size_pretty 
 
195 MB 
(1 row) 



>> 150 * few ms ~ big time 

Ok, got it. 
Thanks, Pavel 



Re: [GENERAL] Query never completes with low work_mem (at least not within one hour)

2017-04-05 Thread Pavel Stehule
2017-04-05 10:33 GMT+02:00 Daniel Westermann <
daniel.westerm...@dbi-services.com>:

> 2017-04-05 10:13 GMT+02:00 Daniel Westermann  services.com>:
>
>> 2017-04-05 9:28 GMT+02:00 Daniel Westermann > services.com>:
>>
>>> >>what is result of EXPLAIN statement for slow and fast cases?
>>> >>
>>> >>regards
>>> >>
>>> >>Pavel
>>>
>>> For work_mem=32MB
>>>
>>> explain (analyze,verbose,buffers)  select count(user_id) from users
>>> where user_id not in ( select id from ids);
>>>   QUERY
>>> PLAN
>>> 
>>> 
>>> --
>>>  Aggregate  (cost=83175.01..83175.02 rows=1 width=8) (actual
>>> time=7945.592..7945.593 rows=1 loops=1)
>>>Output: count(users.user_id)
>>>Buffers: shared read=29425
>>>->  Seq Scan on public.users  (cost=16925.01..79425.01 rows=150
>>> width=4) (actual time=1928.665..5888.645 rows=249 loops=1)
>>>  Output: users.user_id, users.username
>>>  Filter: (NOT (hashed SubPlan 1))
>>>  Rows Removed by Filter: 51
>>>  Buffers: shared read=29425
>>>  SubPlan 1
>>>->  Seq Scan on public.ids  (cost=0.00..14425.01 rows=101
>>> width=4) (actual time=11.111..868.382 rows=101 loops=1)
>>>  Output: ids.id
>>>  Buffers: shared read=4425
>>>  Planning time: 187.396 ms
>>>  Execution time: 7948.108 ms
>>> (14 rows)
>>>
>>> Time: 8244.493 ms
>>>
>>> For work_mem='16MB' it does not complete with analyze in one hour. For
>>> explain only:
>>>
>>> explain (verbose)  select count(user_id) from users where user_id not in
>>> ( select id from ids);
>>>QUERY
>>> PLAN
>>> 
>>> 
>>>  Aggregate  (cost=38748092500.00..38748092500.01 rows=1 width=8)
>>>Output: count(users.user_id)
>>>->  Seq Scan on public.users  (cost=0.00..38748088750.00 rows=150
>>> width=4)
>>>  Output: users.user_id, users.username
>>>  Filter: (NOT (SubPlan 1))
>>>  SubPlan 1
>>>->  Materialize  (cost=0.00..23332.01 rows=101 width=4)
>>>  Output: ids.id
>>>  ->  Seq Scan on public.ids  (cost=0.00..14425.01
>>> rows=101 width=4)
>>>Output: ids.id
>>> (10 rows)
>>>
>>
>> >There is a materialize op more
>> >
>> >do you have an index on ids.id?
>>
>> Yes:
>>
>> \d ids
>>   Table "public.ids"
>>  Column |  Type   | Modifiers
>> +-+---
>>  id | integer |
>> Indexes:
>> "i_ids" UNIQUE, btree (id)
>>
>>
>
> >>hmm .. NOT IN is just bad :(
> >>
> >>The second is slow because table ids is stored in a temp file, and it is
> repeatedly read from the file. In the first case, the ids table is stored in memory.
> >>
> >>SELECT count(user_id) FROM users WHERE NOT EXISTS (SELECT id from
> ids where id = user_id)
> >>
>
> Yes, really bad :) ... and I still do not get it. Even when reading from
> the tempfile all the time it should at least complete within one hour,
> shouldn't it? The tables are not so big:
>
> select * from pg_size_pretty ( pg_relation_size ('ids' ));
>  pg_size_pretty
> 
>  35 MB
> (1 row)
> select * from pg_size_pretty ( pg_relation_size ('users' ));
>  pg_size_pretty
> 
>  195 MB
> (1 row)
>
>

150 * few ms ~ big time


Re: [GENERAL] Query never completes with low work_mem (at least not within one hour)

2017-04-05 Thread Daniel Westermann
2017-04-05 10:13 GMT+02:00 Daniel Westermann < 
daniel.westerm...@dbi-services.com > : 



2017-04-05 9:28 GMT+02:00 Daniel Westermann < 
daniel.westerm...@dbi-services.com > : 


>>what is result of EXPLAIN statement for slow and fast cases? 
>> 
>>regards 
>> 
>>Pavel 

For work_mem=32MB 

explain (analyze,verbose,buffers) select count(user_id) from users where 
user_id not in ( select id from ids); 
QUERY PLAN 
--
 
Aggregate (cost=83175.01..83175.02 rows=1 width=8) (actual 
time=7945.592..7945.593 rows=1 loops=1) 
Output: count(users.user_id) 
Buffers: shared read=29425 
-> Seq Scan on public.users (cost=16925.01..79425.01 rows=150 width=4) 
(actual time=1928.665..5888.645 rows=249 loops=1) 
Output: users.user_id, users.username 
Filter: (NOT (hashed SubPlan 1)) 
Rows Removed by Filter: 51 
Buffers: shared read=29425 
SubPlan 1 
-> Seq Scan on public.ids (cost=0.00..14425.01 rows=101 width=4) (actual 
time=11.111..868.382 rows=101 loops=1) 
Output: ids.id 
Buffers: shared read=4425 
Planning time: 187.396 ms 
Execution time: 7948.108 ms 
(14 rows) 

Time: 8244.493 ms 

For work_mem='16MB' it does not complete with analyze in one hour. For explain 
only: 

explain (verbose) select count(user_id) from users where user_id not in ( 
select id from ids); 
QUERY PLAN 

 
Aggregate (cost=38748092500.00..38748092500.01 rows=1 width=8) 
Output: count(users.user_id) 
-> Seq Scan on public.users (cost=0.00..38748088750.00 rows=150 width=4) 
Output: users.user_id, users.username 
Filter: (NOT (SubPlan 1)) 
SubPlan 1 
-> Materialize (cost=0.00..23332.01 rows=101 width=4) 
Output: ids.id 
-> Seq Scan on public.ids (cost=0.00..14425.01 rows=101 width=4) 
Output: ids.id 
(10 rows) 




>There is a materialize op more 
> 
>do you have an index on ids.id? 

Yes: 

\d ids 
     Table "public.ids" 
 Column |  Type   | Modifiers 
--------+---------+----------- 
 id     | integer | 
Indexes: 
    "i_ids" UNIQUE, btree (id) 





>>hmm .. NOT IN is just bad :( 
>> 
>>The second is slow because table ids is stored in a temp file, and it is 
>>repeatedly read from the file. In the first case, the ids table is stored in memory. 
>> 
>>SELECT count(user_id) FROM users WHERE NOT EXISTS (SELECT id from ids 
>>where id = user_id) 
>> 

Yes, really bad :) ... and I still do not get it. Even when reading from the 
tempfile all the time it should at least complete within one hour, shouldn't 
it? The tables are not so big: 

select * from pg_size_pretty ( pg_relation_size ('ids' )); 
pg_size_pretty 
 
35 MB 
(1 row) 
select * from pg_size_pretty ( pg_relation_size ('users' )); 
pg_size_pretty 
 
195 MB 
(1 row) 




Re: [GENERAL] Query never completes with low work_mem (at least not within one hour)

2017-04-05 Thread Pavel Stehule
2017-04-05 10:13 GMT+02:00 Daniel Westermann <
daniel.westerm...@dbi-services.com>:

> 2017-04-05 9:28 GMT+02:00 Daniel Westermann  services.com>:
>
>> >>what is result of EXPLAIN statement for slow and fast cases?
>> >>
>> >>regards
>> >>
>> >>Pavel
>>
>> For work_mem=32MB
>>
>> explain (analyze,verbose,buffers)  select count(user_id) from users where
>> user_id not in ( select id from ids);
>>   QUERY
>> PLAN
>> 
>> 
>> --
>>  Aggregate  (cost=83175.01..83175.02 rows=1 width=8) (actual
>> time=7945.592..7945.593 rows=1 loops=1)
>>Output: count(users.user_id)
>>Buffers: shared read=29425
>>->  Seq Scan on public.users  (cost=16925.01..79425.01 rows=150
>> width=4) (actual time=1928.665..5888.645 rows=249 loops=1)
>>  Output: users.user_id, users.username
>>  Filter: (NOT (hashed SubPlan 1))
>>  Rows Removed by Filter: 51
>>  Buffers: shared read=29425
>>  SubPlan 1
>>->  Seq Scan on public.ids  (cost=0.00..14425.01 rows=101
>> width=4) (actual time=11.111..868.382 rows=101 loops=1)
>>  Output: ids.id
>>  Buffers: shared read=4425
>>  Planning time: 187.396 ms
>>  Execution time: 7948.108 ms
>> (14 rows)
>>
>> Time: 8244.493 ms
>>
> For work_mem='16MB' it does not complete with analyze in one hour. For
>> explain only:
>>
>> explain (verbose)  select count(user_id) from users where user_id not in
>> ( select id from ids);
>>QUERY
>> PLAN
>> 
>> 
>>  Aggregate  (cost=38748092500.00..38748092500.01 rows=1 width=8)
>>Output: count(users.user_id)
>>->  Seq Scan on public.users  (cost=0.00..38748088750.00 rows=150
>> width=4)
>>  Output: users.user_id, users.username
>>  Filter: (NOT (SubPlan 1))
>>  SubPlan 1
>>->  Materialize  (cost=0.00..23332.01 rows=101 width=4)
>>  Output: ids.id
>>  ->  Seq Scan on public.ids  (cost=0.00..14425.01
>> rows=101 width=4)
>>Output: ids.id
>> (10 rows)
>>
>
> >There is a materialize op more
> >
> >do you have an index on ids.id?
>
> Yes:
>
> \d ids
>   Table "public.ids"
>  Column |  Type   | Modifiers
> +-+---
>  id | integer |
> Indexes:
> "i_ids" UNIQUE, btree (id)
>
>

hmm .. NOT IN is just bad :(

The second is slow because the ids table is stored in a temp file and it is
repeatedly read from that file. In the first case, the ids table is stored in memory.

SELECT count(user_id) FROM users WHERE NOT EXISTS (SELECT id FROM
ids WHERE id = user_id)
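
An equivalent anti-join spelling, as a sketch (either form lets the planner
use a hash anti join, which degrades gracefully when work_mem is small):

SELECT count(u.user_id)
FROM users u
LEFT JOIN ids i ON i.id = u.user_id   -- anti-join: keep users with no match
WHERE i.id IS NULL;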

Regards

Pavel


Re: [GENERAL] Query never completes with low work_mem (at least not within one hour)

2017-04-05 Thread Daniel Westermann
2017-04-05 9:28 GMT+02:00 Daniel Westermann < 
daniel.westerm...@dbi-services.com > : 



>>what is result of EXPLAIN statement for slow and fast cases? 
>> 
>>regards 
>> 
>>Pavel 

For work_mem=32MB 

explain (analyze,verbose,buffers) select count(user_id) from users where 
user_id not in ( select id from ids); 
                                 QUERY PLAN
----------------------------------------------------------------------------
Aggregate (cost=83175.01..83175.02 rows=1 width=8) (actual 
time=7945.592..7945.593 rows=1 loops=1) 
Output: count(users.user_id) 
Buffers: shared read=29425 
-> Seq Scan on public.users (cost=16925.01..79425.01 rows=150 width=4) 
(actual time=1928.665..5888.645 rows=249 loops=1) 
Output: users.user_id, users.username 
Filter: (NOT (hashed SubPlan 1)) 
Rows Removed by Filter: 51 
Buffers: shared read=29425 
SubPlan 1 
-> Seq Scan on public.ids (cost=0.00..14425.01 rows=101 width=4) (actual 
time=11.111..868.382 rows=101 loops=1) 
Output: ids.id 
Buffers: shared read=4425 
Planning time: 187.396 ms 
Execution time: 7948.108 ms 
(14 rows) 

Time: 8244.493 ms 

For work_mem='16MB' it does not complete with analyze within one hour. For explain
only: 

explain (verbose) select count(user_id) from users where user_id not in ( 
select id from ids); 
                         QUERY PLAN
--------------------------------------------------------------------
Aggregate (cost=38748092500.00..38748092500.01 rows=1 width=8) 
Output: count(users.user_id) 
-> Seq Scan on public.users (cost=0.00..38748088750.00 rows=150 width=4) 
Output: users.user_id, users.username 
Filter: (NOT (SubPlan 1)) 
SubPlan 1 
-> Materialize (cost=0.00..23332.01 rows=101 width=4) 
Output: ids.id 
-> Seq Scan on public.ids (cost=0.00..14425.01 rows=101 width=4) 
Output: ids.id 
(10 rows) 




>There is an extra Materialize op this time 
> 
>do you have an index on ids.id ? 

Yes: 

\d ids 
Table "public.ids" 
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer |
Indexes: 
"i_ids" UNIQUE, btree (id) 



Re: [GENERAL] Query never completes with low work_mem (at least not within one hour)

2017-04-05 Thread Pavel Stehule
2017-04-05 9:28 GMT+02:00 Daniel Westermann <
daniel.westerm...@dbi-services.com>:

> >>what is result of EXPLAIN statement for slow and fast cases?
> >>
> >>regards
> >>
> >>Pavel
>
> For work_mem=32MB
>
> explain (analyze,verbose,buffers)  select count(user_id) from users where
> user_id not in ( select id from ids);
>                                  QUERY PLAN
> ----------------------------------------------------------------------------
>  Aggregate  (cost=83175.01..83175.02 rows=1 width=8) (actual
> time=7945.592..7945.593 rows=1 loops=1)
>Output: count(users.user_id)
>Buffers: shared read=29425
>->  Seq Scan on public.users  (cost=16925.01..79425.01 rows=150
> width=4) (actual time=1928.665..5888.645 rows=249 loops=1)
>  Output: users.user_id, users.username
>  Filter: (NOT (hashed SubPlan 1))
>  Rows Removed by Filter: 51
>  Buffers: shared read=29425
>  SubPlan 1
>->  Seq Scan on public.ids  (cost=0.00..14425.01 rows=101
> width=4) (actual time=11.111..868.382 rows=101 loops=1)
>  Output: ids.id
>  Buffers: shared read=4425
>  Planning time: 187.396 ms
>  Execution time: 7948.108 ms
> (14 rows)
>
> Time: 8244.493 ms
>
> For work_mem='16MB' it does not complete with analyze within one hour. For
> explain only:
>
> explain (verbose)  select count(user_id) from users where user_id not in (
> select id from ids);
>                          QUERY PLAN
> --------------------------------------------------------------------
>  Aggregate  (cost=38748092500.00..38748092500.01 rows=1 width=8)
>Output: count(users.user_id)
>->  Seq Scan on public.users  (cost=0.00..38748088750.00 rows=150
> width=4)
>  Output: users.user_id, users.username
>  Filter: (NOT (SubPlan 1))
>  SubPlan 1
>->  Materialize  (cost=0.00..23332.01 rows=101 width=4)
>  Output: ids.id
>  ->  Seq Scan on public.ids  (cost=0.00..14425.01
> rows=101 width=4)
>Output: ids.id
> (10 rows)
>

There is an extra Materialize op this time

do you have an index on ids.id?

Pavel


Re: [GENERAL] Query never completes with low work_mem (at least not within one hour)

2017-04-05 Thread Daniel Westermann
>>what is result of EXPLAIN statement for slow and fast cases? 
>> 
>>regards 
>> 
>>Pavel 

For work_mem=32MB 

explain (analyze,verbose,buffers) select count(user_id) from users where 
user_id not in ( select id from ids); 
                                 QUERY PLAN
----------------------------------------------------------------------------
Aggregate (cost=83175.01..83175.02 rows=1 width=8) (actual 
time=7945.592..7945.593 rows=1 loops=1) 
Output: count(users.user_id) 
Buffers: shared read=29425 
-> Seq Scan on public.users (cost=16925.01..79425.01 rows=150 width=4) 
(actual time=1928.665..5888.645 rows=249 loops=1) 
Output: users.user_id, users.username 
Filter: (NOT (hashed SubPlan 1)) 
Rows Removed by Filter: 51 
Buffers: shared read=29425 
SubPlan 1 
-> Seq Scan on public.ids (cost=0.00..14425.01 rows=101 width=4) (actual 
time=11.111..868.382 rows=101 loops=1) 
Output: ids.id 
Buffers: shared read=4425 
Planning time: 187.396 ms 
Execution time: 7948.108 ms 
(14 rows) 

Time: 8244.493 ms 

For work_mem='16MB' it does not complete with analyze within one hour. For explain
only: 

explain (verbose) select count(user_id) from users where user_id not in ( 
select id from ids); 
                         QUERY PLAN
--------------------------------------------------------------------
Aggregate (cost=38748092500.00..38748092500.01 rows=1 width=8) 
Output: count(users.user_id) 
-> Seq Scan on public.users (cost=0.00..38748088750.00 rows=150 width=4) 
Output: users.user_id, users.username 
Filter: (NOT (SubPlan 1)) 
SubPlan 1 
-> Materialize (cost=0.00..23332.01 rows=101 width=4) 
Output: ids.id 
-> Seq Scan on public.ids (cost=0.00..14425.01 rows=101 width=4) 
Output: ids.id 
(10 rows) 


Re: [GENERAL] Query never completes with low work_mem (at least not within one hour)

2017-04-05 Thread Pavel Stehule
2017-04-05 8:57 GMT+02:00 Daniel Westermann <
daniel.westerm...@dbi-services.com>:

> >> I have set work_mem to a very low value intentionally for demonstration
> >> purposes:
> >>
> >> postgres=# show work_mem;
> >>  work_mem
> >> ----------
> >>  16MB
> >> (1 row)
> >>
> >> postgres=# show shared_buffers ;
> >>  shared_buffers
> >> ----------------
> >>  128MB
> >> (1 row)
> >>
> >>
> >> When I run the following query ( I know that "not in" is not a good
> choice
> >> here ):
> >>
> >> postgres=# select count(user_id) from users where user_id not in (
> select id
> >> from ids);
>
> >"NOT IN" where the predate is a table column can lead to very poor
> >query plans especially where the haystack is not provably known (at
> >plan time) to contain only not null values.  By reducing work_mem, the
> >server has decided has to repeatedly search the table to search for
> >the presence of null values.  Try converting the query to NOT EXISTS.
>
> Thank you, Merlin. As said, I know that "not in" is not a good choice in
> this case, but I still do not get what is going on here. Why does the server
> repeatedly search for NULL values when I decrease work_mem, and why not when
> increasing work_mem?
>

what is result of EXPLAIN statement for slow and fast cases?

regards

Pavel


>
>
> Regards
> Daniel
>


Re: [GENERAL] Query never completes with low work_mem (at least not within one hour)

2017-04-05 Thread Daniel Westermann
>> I have set work_mem to a very low value intentionally for demonstration 
>> purposes: 
>> 
>> postgres=# show work_mem; 
>> work_mem 
>> ----------
>> 16MB 
>> (1 row) 
>> 
>> postgres=# show shared_buffers ; 
>> shared_buffers 
>> ----------------
>> 128MB 
>> (1 row) 
>> 
>> 
>> When I run the following query ( I know that "not in" is not a good choice 
>> here ): 
>> 
>> postgres=# select count(user_id) from users where user_id not in ( select id 
>> from ids); 

>"NOT IN" where the predate is a table column can lead to very poor 
>query plans especially where the haystack is not provably known (at 
>plan time) to contain only not null values. By reducing work_mem, the 
>server has decided has to repeatedly search the table to search for 
>the presence of null values. Try converting the query to NOT EXISTS. 

Thank you, Merlin. As said, I know that "not in" is not a good choice in this 
case, but I still do not get what is going on here. Why does the server repeatedly 
search for NULL values when I decrease work_mem, and why not when increasing 
work_mem? 


Regards 
Daniel 


Re: [GENERAL] Query never completes with low work_mem (at least not within one hour)

2017-04-04 Thread Merlin Moncure
On Tue, Apr 4, 2017 at 8:20 AM, Daniel Westermann
 wrote:
> Hi,
>
> PostgreSQL 9.6.2 on CentOS 7.3 x64.
>
> This is my data set:
>
> drop table if exists users;
> drop table if exists ids;
> create table users ( user_id int
>, username varchar(50)
>);
> with generator as
> ( select a.*
> from generate_series (1,300) a
>order by random()
> )
> insert into users ( user_id
>   , username
>   )
>select a
> , md5(a::varchar)
>  from generator;
> create unique index i_users on users ( user_id );
> create table ids ( id int );
> insert into ids (id) values ( generate_series ( 250, 350 ) );
> create unique index i_ids on ids ( id );
> analyze users;
> analyze ids;
>
> I have set work_mem to a very low value intentionally for demonstration
> purposes:
>
> postgres=# show work_mem;
>  work_mem
> ----------
>  16MB
> (1 row)
>
> postgres=# show shared_buffers ;
>  shared_buffers
> ----------------
>  128MB
> (1 row)
>
>
> When I run the following query ( I know that "not in" is not a good choice
> here ):
>
> postgres=# select count(user_id) from users where user_id not in ( select id
> from ids);

"NOT IN" where the predate is a table column can lead to very poor
query plans especially where the haystack is not provably known (at
plan time) to contain only not null values.  By reducing work_mem, the
server has decided has to repeatedly search the table to search for
the presence of null values.  Try converting the query to NOT EXISTS.
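
A minimal sketch of that conversion for the query in question:

SELECT count(user_id)
FROM users u
WHERE NOT EXISTS (SELECT 1 FROM ids i WHERE i.id = u.user_id);

Unlike NOT IN, NOT EXISTS carries no special NULL semantics to preserve,
so the planner is free to use an anti join regardless of work_mem.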

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Query never completes with low work_mem (at least not within one hour)

2017-04-04 Thread Daniel Westermann
Hi, 

PostgreSQL 9.6.2 on CentOS 7.3 x64. 

This is my data set: 

drop table if exists users; 
drop table if exists ids; 
create table users ( user_id int 
, username varchar(50) 
); 
with generator as 
( select a.* 
from generate_series (1,300) a 
order by random() 
) 
insert into users ( user_id 
, username 
) 
select a 
, md5(a::varchar) 
from generator; 
create unique index i_users on users ( user_id ); 
create table ids ( id int ); 
insert into ids (id) values ( generate_series ( 250, 350 ) ); 
create unique index i_ids on ids ( id ); 
analyze users; 
analyze ids; 

I have set work_mem to a very low value intentionally for demonstration 
purposes: 

postgres=# show work_mem; 
work_mem 
----------
16MB 
(1 row) 

postgres=# show shared_buffers ; 
shared_buffers 
----------------
128MB 
(1 row) 


When I run the following query ( I know that "not in" is not a good choice here 
): 

postgres=# select count(user_id) from users where user_id not in ( select id 
from ids); 

... this seems to never complete (at least not within one hour). 

Setting work_mem to 32MB and all is fine. 

top shows the session at 100% CPU. No waits are listed in pg_stat_activity: 

postgres=# select pid,wait_event_type,wait_event,state,query from 
pg_stat_activity ; 
  pid  | wait_event_type | wait_event | state  | query
-------+-----------------+------------+--------+-------
17817 | | | active | select count(user_id) from users where user_id not in ( 
select id from ids); 
17847 | | | active | select pid,wait_event_type,wait_event,state,query from 
pg_stat_activity ; 
(2 rows) 

strace more or less always shows this, so something is happening: 

read(14, 
"\0\0\1\0\0\t\30\0\351G1\0\16\0\0\0\1\0\0\t\30\0\352G1\0\16\0\0\0\1\0"..., 
8192) = 8192 
read(14, 
"\1\0\0\t\30\0002J1\0\16\0\0\0\1\0\0\t\30\0003J1\0\16\0\0\0\1\0\0\t"..., 8192) 
= 8192 
read(14, "\0\t\30\0{L1\0\16\0\0\0\1\0\0\t\30\0|L1\0\16\0\0\0\1\0\0\t\30\0"..., 
8192) = 8192 
read(14, 
"\30\0\304N1\0\16\0\0\0\1\0\0\t\30\0\305N1\0\16\0\0\0\1\0\0\t\30\0\306N"..., 
8192) = 8192 
read(14, "\rQ1\0\16\0\0\0\1\0\0\t\30\0\16Q1\0\16\0\0\0\1\0\0\t\30\0\17Q1\0"..., 
8192) = 8192^C 

postgres@pgbox:/u02/pgdata/PG962/ [PG962] ls -la /proc/17817/fd/ 
total 0 
dr-x------. 2 postgres postgres  0 Apr 4 14:45 . 
dr-xr-xr-x. 9 postgres postgres  0 Apr 4 14:34 .. 
lr-x------. 1 postgres postgres 64 Apr 4 14:45 0 -> /dev/null 
l-wx------. 1 postgres postgres 64 Apr 4 14:45 1 -> pipe:[58121] 
lrwx------. 1 postgres postgres 64 Apr 4 14:45 10 -> socket:[58881] 
lr-x------. 1 postgres postgres 64 Apr 4 14:45 11 -> pipe:[58882] 
l-wx------. 1 postgres postgres 64 Apr 4 14:45 12 -> pipe:[58882] 
lrwx------. 1 postgres postgres 64 Apr 4 14:45 13 -> /u02/pgdata/PG962/base/13323/16516 
lrwx------. 1 postgres postgres 64 Apr 4 14:45 14 -> /u02/pgdata/PG962/base/pgsql_tmp/pgsql_tmp17817.1 
l-wx------. 1 postgres postgres 64 Apr 4 14:45 2 -> pipe:[58121] 
lrwx------. 1 postgres postgres 64 Apr 4 14:45 3 -> anon_inode:[eventpoll] 
lrwx------. 1 postgres postgres 64 Apr 4 14:45 4 -> /u02/pgdata/PG962/base/13323/2601 
lrwx------. 1 postgres postgres 64 Apr 4 14:45 5 -> /u02/pgdata/PG962/base/13323/16517 
lr-x------. 1 postgres postgres 64 Apr 4 14:45 6 -> pipe:[58120] 
lrwx------. 1 postgres postgres 64 Apr 4 14:45 7 -> /u02/pgdata/PG962/base/13323/16520 
lrwx------. 1 postgres postgres 64 Apr 4 14:45 8 -> /u02/pgdata/PG962/base/13323/16513 
lrwx------. 1 postgres postgres 64 Apr 4 14:45 9 -> socket:[58126] 

The size of the temp file does not change over time: 

postgres@pgbox:/u02/pgdata/PG962/ [PG962] ls -lha 
/u02/pgdata/PG962/base/pgsql_tmp/pgsql_tmp17817.2 
-rw-------. 1 postgres postgres 14M Apr 4 14:48 
/u02/pgdata/PG962/base/pgsql_tmp/pgsql_tmp17817.2 

What am I missing here? Shouldn't this complete with 16MB work_mem as well, even 
if slower, in less than one hour? Or is this expected? 

Thanks for your help 
Daniel 


Re: [GENERAL] Query with type char

2017-02-16 Thread Christoph Moench-Tegeder
## Egon Frerich (e...@frerich.eu):

> Why are there 0 rows? I expect 3 rows:

From your results I guess that sp3 IS NULL on the three rows where
it is not 'j'. You should use the correct comparison predicate for
this case, i.e. IS DISTINCT FROM instead of <>.
https://www.postgresql.org/docs/9.6/static/functions-comparison.html
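
A quick sketch against the sample_char table from the original post:

SELECT * FROM sample_char WHERE sp3 IS DISTINCT FROM 'j';
-- unlike <>, IS DISTINCT FROM treats NULL as a comparable value,
-- so the three rows where sp3 IS NULL are returned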

Regards,
Christoph

-- 
Spare Space.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Query with type char

2017-02-16 Thread Egon Frerich

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Why are there 0 rows? I expect 3 rows:

> mydb=# SELECT * FROM sample_char;
>  sp1 |   sp2   | sp3
> -+-+-
>1 | Bremen  |
>2 | Hamburg |
>4 | Toronto |
>3 | Bern| j
> (4 rows)
>
> mydb=# SELECT * FROM sample_char WHERE sp3 = 'j';
>  sp1 |   sp2   | sp3
> -+-+-
>3 | Bern| j
> (1 row)
>
> mydb=# SELECT * FROM sample_char WHERE sp3 <> 'j';
>  sp1 | sp2 | sp3
> -+-+-
> (0 rows)

Egon


-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.22 (GNU/Linux)

iQIcBAEBAgAGBQJYpXtvAAoJECc7nbY+bg5uu38P/jXtMHU6AeNwNY5X+B3B9XnD
Wy6AChGzxl1z3pQLpKLrIHPI+y0FFx68vV6TBxpzW/wvpYE7rTM2UfLFyUmcflim
nEhQPUgK/YW3A5yKbjlRQasaosiBrvTvqlP8dSyhIASC1Jxwa152xaTrYOw9+ANO
Z/prP4X1GF7m29P0ULFxthpJlA3FOgtueKcwCVkDQVb2eRISFX8pzhuWKHFHukjx
avMhTo7s0n2unZGmtqrC1m59MR5WYoxbc7Dmwg82/jLqEf4ZYCc4tniqiysl3Tzc
i55xH/aqzVAXhC9PHCxoTU8ey2vyGogRObHo5tCqUpMPNoz0zbeSFrmb829fi3xV
XOKkerx2xXOnxkZkYfCH/a50eWdt9+Nw/ktuHU7G2/UIJQH3DxPi91nuY6trpO5e
nOSqi64F9gm12trCe14Jk8z/Ea7NZCCL5GNrQKXj4zkJF1AagG+bzsefZRz/fkK3
m9Nc2AS9J27mk0jx2SuCc5hHx6o8IdsemZacAamjFa/TYvOau+c6fbHKVvU79uhE
Wxdchse2Zo0/wC6o8nlQqTKmX4xfYKA5Oqk6gmOXC5fqLNOzIYb04IIF6zZAw9Sk
CELmOHyP0ZVyQhAVipqrepgbWhIrRVZV0+a+h5IO/s4QoenFjNV+pKnnwj9BCjw7
duk+MVwoHi6fZk6z601b
=LHt3
-END PGP SIGNATURE-



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Query with large in clauses uses a lot of memory

2016-11-28 Thread greigwise
Wow.  Thanks for the prompt answer.

As a follow-up I was wondering if maybe there would be a way to tell it to
NOT try to plan/execute the query (and instead throw an error) if the memory
usage exceeded X.

Thanks again.

Greig



--
View this message in context: 
http://postgresql.nabble.com/Query-with-large-in-clauses-uses-a-lot-of-memory-tp5931716p5932279.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] query locks up when run concurrently

2016-11-26 Thread azhwkd
Greetings!

I tried running the application using the race detector but no data races
were detected.
To further narrow this down I removed the call to the function in question
from the pool
and instead run it one by one from a queue. This seems to have helped and
the query didn't lock up anymore.
I took this one step further and left the locked up queries sitting there
for a day and they eventually finished. Who would have thought a ~2 second
query could blow up to ~20hrs.


rob stone  schrieb am Fr., 25. Nov. 2016 um 02:34 Uhr:


On Thu, 2016-11-24 at 22:23 +0100, azhwkd wrote:
> It should not be possible because a group does not return to the
> update pool before the update has finished.
> I watched the queries in a postgres client and there was no overlap I
> could see.
> I don't really know what to make of this behavior; sometimes when I
> start the application a few updates go through and eventually it will
> lock up completely and sometimes it locks up immediately - always
> with
> heap_hot_search_buffer using ~20% of all CPU time on the system.
>
>
Hello Sebastian,
You stated that the application is written using go (www.golang.org).
Are you able to run the application with the -race flag?

HTH,
Rob

From wikipedia:-

Lack of race condition safety

There are no restrictions on how goroutines access shared data, making
race conditions possible. Specifically, unless a program explicitly
synchronizes via channels or other means, writes from one goroutine
might be partly, entirely, or not at all visible to another, often with
no guarantees about ordering of writes. Furthermore, Go's internal data
structures like interface values, slice headers, hash tables, and
string headers are not immune to race conditions, so type and memory
safety can be violated in multithreaded programs that modify shared
instances of those types without synchronization.

Instead of language support, safe concurrent programming thus relies on
conventions; for example, Chisnall recommends an idiom called "aliases
xor mutable", meaning that passing a mutable value (or pointer) over a
channel signals a transfer of ownership over the value to its receiver.


Re: [GENERAL] Query regarding deadlock

2016-11-25 Thread Jeff Janes
On Thu, Nov 24, 2016 at 5:44 PM, Yogesh Sharma <
yogesh1.sha...@nectechnologies.in> wrote:

> Dear All,
>
> Thanks in advance.
> I found below deadlock in postgresql logs.
> I cannot change calling of REINDEX and insert query sequence because it is
> execute automatically through some cron script.
>
> ERROR:  deadlock detected
>  DETAIL:  Process 2234 waits for AccessShareLock on relation 16459 of
> database 16385; blocked by process 4111.
>  Process 4111 waits for ShareLock on relation 16502 of database 16385;
> blocked by process 2234.
>  Process 2234: INSERT INTO table1 ( id , unique_id )VALUES( '1', '4')
>  Process 4111: REINDEX TABLE table1
>
> Could you please provide any solution to resolve this deadlock.
>

What are tables 16459 and 16502?  Are they related to each other through
triggers or FK constraints?

Are you reindexing multiple tables in the same transaction?  If not, I
don't see why these should deadlock.  One should win, and the other should
block.

If you are reindexing multiple tables in the same transaction, why are you
doing that?  I can't think of a situation where you couldn't use separate
transactions per table.
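
That is, something like this sketch (the second table name is hypothetical):

REINDEX TABLE table1;
REINDEX TABLE table2;  -- run as separate statements, each in its own transaction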

Cheers,

Jeff


Re: [GENERAL] query locks up when run concurrently

2016-11-25 Thread Adrian Klaver

On 11/24/2016 02:14 PM, azhwkd wrote:


Adrian Klaver schrieb am Do., 24. Nov. 2016 um
22:34 Uhr:

On 11/24/2016 01:23 PM, azhwkd wrote:
> It should not be possible because a group does not return to the
> update pool before the update has finished.

So what is this 'update pool' and what is driving/using it?

In other words how is the determination of the parameters done?

To be more specific, the implication is that a group id can be reused so
what determines that?


The application is written in Go. Every group ID has its own goroutine
and the routine is blocked until the SQL statement returns.
The update process starts with a check to an external API endpoint and
if there is new data available the routine is downloading it, parsing it
and inserting the data into 2 tables. Once that is done, the routine
continues to execute the statement in question using the data it
inserted before for the calculation. Only once this finishes will the
routine start over again.



> I watched the queries in a postgres client and there was no
overlap I could see.

Was this a visual inspection or did you dump the results of the various
query/parameter combinations into tables and do an SQL comparison?


I inspected it visually and also dumped all variables into a file
directly from the application.



> I don't really know what to make of this behavior; sometimes when I
> start the application a few updates go through and eventually it will
> lock up completely and sometimes it locks up immediately - always with

Is there a common thread with regard to the parameters in use when
things lock up?


Do you mean if it always locks on the same parameters? If so then it
does not, sadly



Yes, that would have been too easy. I'm out of ideas for the moment. Rob 
Stone's post looks promising though.



--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Query regarding deadlock

2016-11-24 Thread Tom Lane
Yogesh Sharma  writes:
> ERROR:  deadlock detected
>  DETAIL:  Process 2234 waits for AccessShareLock on relation 16459 of 
> database 16385; blocked by process 4111.
>  Process 4111 waits for ShareLock on relation 16502 of database 16385; 
> blocked by process 2234.
>  Process 2234: INSERT INTO table1 ( id , unique_id )VALUES( '1', '4')
>  Process 4111: REINDEX TABLE table1

Offhand I would not expect those two commands to deadlock in isolation.
Are they parts of larger transactions that take additional locks?

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Query regarding deadlock

2016-11-24 Thread Sameer Kumar
On Fri, 25 Nov 2016, 10:07 a.m. Yogesh Sharma, <
yogesh1.sha...@nectechnologies.in> wrote:

> Dear John,
>
> Thanks for your support.
> I mean to say, the REINDEX runs hourly and the insert query executes every
> minute.
> So, it might be a race condition where these queries run at the same time.
>

Reindex every hour? How did you end up with that? It looks like you got
another problem while fixing one with a dirty solution. Why do you need to
reindex every hour?


> Is there any solution, like adding some check before the REINDEX
> operation is performed, if that is possible?
>
> Regards,
> Yogesh
>
> -Original Message-
> From: pgsql-general-ow...@postgresql.org [mailto:
> pgsql-general-ow...@postgresql.org] On Behalf Of John R Pierce
> Sent: Friday, November 25, 2016 10:55 AM
> To: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Query regarding deadlock
>
> On 11/24/2016 5:44 PM, Yogesh Sharma wrote:
> > I cannot change calling of REINDEX and insert query sequence because it
> is execute automatically through some cron script.
>
> any cron scripts are your own doing, so this statement makes no sense at
> all.
>
>
> --
> john r pierce, recycling bits in santa cruz
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>
-- 

-- 

Best Regards,

*Sameer Kumar | DB Solution Architect*

*ASHNIK PTE. LTD.*

101 Cecil Street, #11-11 Tong Eng Building, Singapore 069533

T: +65 6438 3504 | www.ashnik.com

Skype: sameer.ashnik |   T: +65 8110 0350




Re: [GENERAL] Query regarding deadlock

2016-11-24 Thread Michael Paquier
On Fri, Nov 25, 2016 at 11:14 AM, Sameer Kumar  wrote:
> Does this mean that you reindex quite often based on a schedule? Personally I 
> don't prefer that. To me it is like you are trying to fix something that is 
> not broken.
>
> Ideally reindex only what needs to be reindexed. I would not want to reindex 
> a table in OLTP env.

Like VACUUM FULL, don't forget that REINDEX needs an exclusive lock
when working. As that's very intrusive, usually you take measures on
your database to be sure that you *never* require it, particularly if
this is a production instance.
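
One common workaround on these releases is to rebuild the index side by
side with CREATE INDEX CONCURRENTLY; a sketch with hypothetical index and
column names (indexes backing constraints need extra steps):

CREATE INDEX CONCURRENTLY table1_uid_new ON table1 (unique_id);
DROP INDEX table1_uid;                         -- the bloated old index
ALTER INDEX table1_uid_new RENAME TO table1_uid;
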
-- 
Michael


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Query regarding deadlock

2016-11-24 Thread Jan de Visser

On 2016-11-24 9:06 PM, Yogesh Sharma wrote:


Dear John,

Thanks for your support.
I mean to say, the REINDEX runs hourly and the insert query executes every minute.
So, it might be a race condition where these queries run at the same time.


Why do you need to run REINDEX every hour? That sounds like a stopgap 
solution for another problem.

Is there any solution, like adding some check before the REINDEX operation
is performed, if that is possible?


Try to find out why the cron script is there in the first place. Then go 
from there - eliminate it, or do it only in down periods.
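
If the hourly REINDEX really has to stay, one option is to serialize the
two jobs with an advisory lock. This is only a sketch, and it only helps
if *both* jobs take the same lock; 12345 is an arbitrary application key:

-- in the REINDEX job
SELECT pg_advisory_lock(12345);
REINDEX TABLE table1;
SELECT pg_advisory_unlock(12345);

-- in the inserting job
SELECT pg_advisory_lock(12345);
INSERT INTO table1 ( id , unique_id ) VALUES ( '1', '4');
SELECT pg_advisory_unlock(12345);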




Regards,
Yogesh

-Original Message-
From: pgsql-general-ow...@postgresql.org 
[mailto:pgsql-general-ow...@postgresql.org] On Behalf Of John R Pierce
Sent: Friday, November 25, 2016 10:55 AM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Query regarding deadlock

On 11/24/2016 5:44 PM, Yogesh Sharma wrote:

I cannot change calling of REINDEX and insert query sequence because it is 
execute automatically through some cron script.

any cron scripts are your own doing, so this statement makes no sense at all.






--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Query regarding deadlock

2016-11-24 Thread Sameer Kumar
On Fri, 25 Nov 2016, 9:45 a.m. Yogesh Sharma, <
yogesh1.sha...@nectechnologies.in> wrote:

> Dear All,
>
> Thanks in advance.
> I found below deadlock in postgresql logs.
> I cannot change calling of REINDEX and insert query sequence because it is
> execute automatically through some cron script.
>

Does this mean that you reindex quite often based on a schedule? Personally
I don't prefer that. To me it is like you are trying to fix something that
is not broken.

Ideally reindex only what needs to be reindexed. I would not want to
reindex a table in OLTP env.


> ERROR:  deadlock detected
>  DETAIL:  Process 2234 waits for AccessShareLock on relation 16459 of
> database 16385; blocked by process 4111.
>  Process 4111 waits for ShareLock on relation 16502 of database 16385;
> blocked by process 2234.
>  Process 2234: INSERT INTO table1 ( id , unique_id )VALUES( '1', '4')
>  Process 4111: REINDEX TABLE table1
>



> Could you please provide any solution to resolve this deadlock.
>
> Regards,
> Yogesh
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>
-- 

-- 

Best Regards,

*Sameer Kumar | DB Solution Architect*

*ASHNIK PTE. LTD.*

101 Cecil Street, #11-11 Tong Eng Building, Singapore 069533

T: +65 6438 3504 | www.ashnik.com

Skype: sameer.ashnik |   T: +65 8110 0350



Re: [GENERAL] Query regarding deadlock

2016-11-24 Thread Yogesh Sharma
Dear John,

Thanks for your support.
I mean to say, the REINDEX runs hourly and the insert query executes every minute.
So, it might be a race condition where these queries run at the same time.

Is there any solution, like adding some check before the REINDEX operation
is performed, if that is possible?

Regards,
Yogesh

-Original Message-
From: pgsql-general-ow...@postgresql.org 
[mailto:pgsql-general-ow...@postgresql.org] On Behalf Of John R Pierce
Sent: Friday, November 25, 2016 10:55 AM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Query regarding deadlock

On 11/24/2016 5:44 PM, Yogesh Sharma wrote:
> I cannot change calling of REINDEX and insert query sequence because it is 
> execute automatically through some cron script.

any cron scripts are your own doing, so this statement makes no sense at all.


-- 
john r pierce, recycling bits in santa cruz



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Query regarding deadlock

2016-11-24 Thread John R Pierce

On 11/24/2016 5:44 PM, Yogesh Sharma wrote:

I cannot change calling of REINDEX and insert query sequence because it is 
execute automatically through some cron script.


any cron scripts are your own doing, so this statement makes no sense at 
all.



--
john r pierce, recycling bits in santa cruz



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Query regarding deadlock

2016-11-24 Thread Yogesh Sharma
Dear All,

Thanks in advance.
I found below deadlock in postgresql logs.
I cannot change calling of REINDEX and insert query sequence because it is 
execute automatically through some cron script.

ERROR:  deadlock detected
 DETAIL:  Process 2234 waits for AccessShareLock on relation 16459 of database 
16385; blocked by process 4111.
 Process 4111 waits for ShareLock on relation 16502 of database 16385; blocked 
by process 2234.
 Process 2234: INSERT INTO table1 ( id , unique_id )VALUES( '1', '4')
 Process 4111: REINDEX TABLE table1

Could you please provide any solution to resolve this deadlock.

Regards,
Yogesh

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] query locks up when run concurrently

2016-11-24 Thread rob stone

On Thu, 2016-11-24 at 22:23 +0100, azhwkd wrote:
> It should not be possible because a group does not return to the
> update pool before the update has finished.
> I watched the queries in a postgres client and there was no overlap I
> could see.
> I don't really know what to make of this behavior; sometimes when I
> start the application a few updates go through and eventually it will
> lock up completely and sometimes it locks up immediately - always
> with
> heap_hot_search_buffer using ~20% of all CPU time on the system.
> 
> 
Hello Sebastian,
You stated that the application is written using go (www.golang.org).
Are you able to run the application with the -race flag?

HTH,
Rob

From wikipedia:-

Lack of race condition safety

There are no restrictions on how goroutines access shared data, making
race conditions possible. Specifically, unless a program explicitly
synchronizes via channels or other means, writes from one goroutine
might be partly, entirely, or not at all visible to another, often with
no guarantees about ordering of writes. Furthermore, Go's internal data
structures like interface values, slice headers, hash tables, and
string headers are not immune to race conditions, so type and memory
safety can be violated in multithreaded programs that modify shared
instances of those types without synchronization.

Instead of language support, safe concurrent programming thus relies on
conventions; for example, Chisnall recommends an idiom called "aliases
xor mutable", meaning that passing a mutable value (or pointer) over a
channel signals a transfer of ownership over the value to its receiver.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] query locks up when run concurrently

2016-11-24 Thread azhwkd
Adrian Klaver  schrieb am Do., 24. Nov. 2016 um
22:34 Uhr:

> On 11/24/2016 01:23 PM, azhwkd wrote:
> > It should not be possible because a group does not return to the
> > update pool before the update has finished.
>
> So what is this 'update pool' and what is driving/using it?
>
> In other words how is the determination of the parameters done?
>
> To be more specific, the implication is that a group id can be reused so
> what determines that?
>

The application is written in Go. Every group ID has its own goroutine and
the routine is blocked until the SQL statement returns.
The update process starts with a check to an external API endpoint and if
there is new data available the routine is downloading it, parsing it and
inserting the data into 2 tables. Once that is done, the routine continues
to execute the statement in question using the data it inserted before for
the calculation. Only once this finishes will the routine start over again.


>
> > I watched the queries in a postgres client and there was no overlap I
> could see.
>
> Was this a visual inspection or did you dump the results of the various
> query/parameter combinations into tables and do an SQL comparison?
>

I inspected it visually and also dumped all variables into a file directly
from the application.


>
> > I don't really know what to make of this behavior; sometimes when I
> > start the application a few updates go through and eventually it will
> > lock up completely and sometimes it locks up immediately - always with
>
> Is there a common thread with regard to the parameters in use when
> things lock up?
>

Do you mean if it always locks on the same parameters? If so then it does
not, sadly


>
> > heap_hot_search_buffer using ~20% of all CPU time on the system.
> >
> > 2016-11-24 19:14 GMT+01:00 Adrian Klaver :
> >> On 11/23/2016 10:41 PM, azhwkd wrote:
> >>>
> >>> The group ID is part of the primary key of the group_history table. My
> >>> understanding is that two INSERTs with different group IDs should not
> >>> collide in this case, or am I wrong in thinking this?
> >>
>
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>


Re: [GENERAL] query locks up when run concurrently

2016-11-24 Thread Adrian Klaver

On 11/24/2016 01:23 PM, azhwkd wrote:

It should not be possible because a group does not return to the
update pool before the update has finished.


So what is this 'update pool' and what is driving/using it?

In other words how is the determination of the parameters done?

To be more specific, the implication is that a group id can be reused so 
what determines that?



I watched the queries in a postgres client and there was no overlap I could see.


Was this a visual inspection or did you dump the results of the various 
query/parameter combinations into tables and do an SQL comparison?



I don't really know what to make of this behavior; sometimes when I
start the application a few updates go through and eventually it will
lock up completely and sometimes it locks up immediately - always with


Is there a common thread with regard to the parameters in use when 
things lock up?



heap_hot_search_buffer using ~20% of all CPU time on the system.

2016-11-24 19:14 GMT+01:00 Adrian Klaver :

On 11/23/2016 10:41 PM, azhwkd wrote:


The group ID is part of the primary key of the group_history table. My
understanding is that two INSERTs with different group IDs should not
collide in this case, or am I wrong in thinking this?






--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] query locks up when run concurrently

2016-11-24 Thread azhwkd
It should not be possible because a group does not return to the
update pool before the update has finished.
I watched the queries in a postgres client and there was no overlap I could see.
I don't really know what to make of this behavior; sometimes when I
start the application a few updates go through and eventually it will
lock up completely and sometimes it locks up immediately - always with
heap_hot_search_buffer using ~20% of all CPU time on the system.

2016-11-24 19:14 GMT+01:00 Adrian Klaver :
> On 11/23/2016 10:41 PM, azhwkd wrote:
>>
>> The group ID is part of the primary key of the group_history table. My
>> understanding is that two INSERTs with different group IDs should not
>> collide in this case, or am I wrong in thinking this?
>
>
> After fresh coffee:
>
> In your first post you had this:
> "(In the application this
> query is run up to 10 times in parallel with different parameters)"
>
> where the parameters look to be a timestamp and a group id.
>
> Is it possible that the combination of parameters over the 10 different
> queries is not actually selecting rows in each result set that are unique
> over (group,id,sub_category,"date","hour") between result sets?
>
>
>
>
>>
>> The table definition for group_history is the following:
>>
>> CREATE TABLE public.group_history (
>> group int4 NOT NULL,
>> id int4 NOT NULL,
>> sub_category int4 NOT NULL,
>> "date" date NOT NULL,
>> aa int8 NOT NULL,
>> ab int8 NOT NULL,
>> bb int8 NOT NULL,
>> ba int8 NOT NULL,
>> quantity int8 NOT NULL,
>> "hour" int2 NOT NULL,
>> CONSTRAINT group_history_pk PRIMARY KEY
>> (group,id,sub_category,"date","hour")
>> )
>> WITH (
>> OIDS=FALSE
>> );
>>
>> Kind regards,
>> Sebastian
>>
>
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] query locks up when run concurrently

2016-11-24 Thread Adrian Klaver

On 11/23/2016 10:41 PM, azhwkd wrote:

The group ID is part of the primary key of the group_history table. My
understanding is that two INSERTs with different group IDs should not
collide in this case, or am I wrong in thinking this?


After fresh coffee:

In your first post you had this:
"(In the application this
query is run up to 10 times in parallel with different parameters)"

where the parameters look to be a timestamp and a group id.

Is it possible that the combination of parameters over the 10 different 
queries is not actually selecting rows in each result set that are 
unique over (group,id,sub_category,"date","hour") between result sets?






The table definition for group_history is the following:

CREATE TABLE public.group_history (
group int4 NOT NULL,
id int4 NOT NULL,
sub_category int4 NOT NULL,
"date" date NOT NULL,
aa int8 NOT NULL,
ab int8 NOT NULL,
bb int8 NOT NULL,
ba int8 NOT NULL,
quantity int8 NOT NULL,
"hour" int2 NOT NULL,
CONSTRAINT group_history_pk PRIMARY KEY (group,id,sub_category,"date","hour")
)
WITH (
OIDS=FALSE
);

Kind regards,
Sebastian





--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] query locks up when run concurrently

2016-11-24 Thread Adrian Klaver

On 11/23/2016 10:41 PM, azhwkd wrote:

The group ID is part of the primary key of the group_history table. My
understanding is that two INSERTs with different group IDs should not
collide in this case, or am I wrong in thinking this?

The table definition for group_history is the following:

CREATE TABLE public.group_history (
group int4 NOT NULL,
id int4 NOT NULL,
sub_category int4 NOT NULL,
"date" date NOT NULL,
aa int8 NOT NULL,
ab int8 NOT NULL,
bb int8 NOT NULL,
ba int8 NOT NULL,
quantity int8 NOT NULL,
"hour" int2 NOT NULL,
CONSTRAINT group_history_pk PRIMARY KEY (group,id,sub_category,"date","hour")
)
WITH (
OIDS=FALSE
);


Me slaps head

Windy night last night + lack of sleep + insufficient caffeine = Not 
seeing the table definition before hitting Send on previous post.




Kind regards,
Sebastian

2016-11-24 0:28 GMT+01:00 Adrian Klaver :

On 11/23/2016 01:52 PM, azhwkd wrote:


Greetings!

The parallel calls should not be working on the same row. Each query
services a different group ID on its own and there is no overlap.



Except the INSERT query in the trigger function is working on dates not
group ids.



Kind regards,
Sebastian


Tom Lane schrieb am Mi.,
23. Nov. 2016 um 17:47 Uhr:

azh...@gmail.com  writes:
> I have a query which if run alone usually completes in about 300ms.
> When run in my application this query constantly locks up and bogs
> down all connections of the connection pool (In the application this
> query is run up to 10 times in parallel with different parameters).
> What's really weird is that I can re-run one of the hung queries
from
> the command line while it's hung and it will complete as expected
> while the hung queries continue to use 100% CPU time.

Judging from the EXPLAIN timing, most of the work is in the trigger,
which leads me to wonder if the parallel calls are likely to be
fighting
over inserting/updating the same row in the group_history partition
tables.  Or are you certain that they should be hitting different
rows?

regards, tom lane




--
Adrian Klaver
adrian.kla...@aklaver.com





--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] query locks up when run concurrently

2016-11-24 Thread Adrian Klaver

On 11/23/2016 10:41 PM, azhwkd wrote:

The group ID is part of the primary key of the group_history table. My
understanding is that two INSERTs with different group IDs should not
collide in this case, or am I wrong in thinking this?


The suspicion is that this:

insert into group_history ("group", id, sub_category, "date", aa, ab,
bb, ba, quantity, "hour") 

is not the problem. That this (from the trigger function) is:

execute 'INSERT INTO public.' || quote_ident(_tablename) || ' VALUES ($1.*)
on conflict ("group", id, sub_category, "when", "hour") do




where _tablename is:

_tablename := 'group_history_'||_startdate;


It would be nice to see the schema definitions for group_history and at 
least one of the group_history_startdate tables.
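
For reference, the general shape of such a partition-routing trigger
function is roughly the following sketch; the date-to-suffix format and
the ON CONFLICT action are assumptions, not the poster's actual code, and
each child table must have a matching unique constraint:

CREATE OR REPLACE FUNCTION group_history_partition_function()
RETURNS trigger AS $$
DECLARE
    _startdate text;
    _tablename text;
BEGIN
    _startdate := to_char(NEW."date", 'YYYY_MM_DD');  -- hypothetical suffix
    _tablename := 'group_history_' || _startdate;
    EXECUTE 'INSERT INTO public.' || quote_ident(_tablename)
         || ' SELECT ($1).*'
         || ' ON CONFLICT ("group", id, sub_category, "date", "hour")'
         || ' DO NOTHING'      -- stand-in; the real trigger presumably updates
    USING NEW;
    RETURN NULL;  -- row has been routed to the child; skip insert into parent
END;
$$ LANGUAGE plpgsql;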




The table definition for group_history is the following:

CREATE TABLE public.group_history (
group int4 NOT NULL,
id int4 NOT NULL,
sub_category int4 NOT NULL,
"date" date NOT NULL,
aa int8 NOT NULL,
ab int8 NOT NULL,
bb int8 NOT NULL,
ba int8 NOT NULL,
quantity int8 NOT NULL,
"hour" int2 NOT NULL,
CONSTRAINT group_history_pk PRIMARY KEY (group,id,sub_category,"date","hour")
)
WITH (
OIDS=FALSE
);

Kind regards,
Sebastian





--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] query locks up when run concurrently

2016-11-24 Thread Adrian Klaver

On 11/23/2016 10:26 PM, azhwkd wrote:

I'm sorry. I worded this quite poorly. I meant to say that there were
no log lines added to the postgres logfile at the time.
I hope these are the settings you were referring to (I did not change
anything in the config files vs. the default)

log_destination = 'stderr'
logging_collector = off
log_directory = 'pg_log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_file_mode = 0600
log_truncate_on_rotation = off
log_rotation_age = 1d
log_rotation_size = 10MB
syslog_facility = 'LOCAL0'
syslog_ident = 'postgres'
event_source = 'PostgreSQL'
client_min_messages = notice
log_min_messages = warning
log_min_error_statement = error
log_min_duration_statement = -1
debug_print_parse = off
debug_print_rewritten = off
debug_print_plan = off
debug_pretty_print = on
log_checkpoints = off
log_connections = off
log_disconnections = off
log_duration = off
log_error_verbosity = default
log_hostname = off
log_line_prefix = '%t '
log_lock_waits = off
log_statement = 'none'
log_temp_files = -1
log_timezone = 'UTC'


While troubleshooting this I would enable log_connections and 
log_disconnections and change log_statement to 'mod'. This will help you 
see when and what is happening when you run the concurrent queries. FYI, 
you need to reload the server for the changes to take effect.
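
For example, the relevant postgresql.conf lines would be (a sketch):

log_connections = on
log_disconnections = on
log_statement = 'mod'          # logs INSERT/UPDATE/DELETE statements

followed by a reload, e.g. SELECT pg_reload_conf();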




I'm sorry, it seems like I copied the trigger definition from the
wrong table (item_history is a test table I created yesterday while
trying things which is exactly the same as group_history just a
different name).
The trigger on the group_history table is exactly the same though,
except for the table name.

Kind regards,
Sebastian




--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] query locks up when run concurrently

2016-11-23 Thread azhwkd
The group ID is part of the primary key of the group_history table. My
understanding is that two INSERTs with different group IDs should not
collide in this case, or am I wrong in thinking this?

The table definition for group_history is the following:

CREATE TABLE public.group_history (
group int4 NOT NULL,
id int4 NOT NULL,
sub_category int4 NOT NULL,
"date" date NOT NULL,
aa int8 NOT NULL,
ab int8 NOT NULL,
bb int8 NOT NULL,
ba int8 NOT NULL,
quantity int8 NOT NULL,
"hour" int2 NOT NULL,
CONSTRAINT group_history_pk PRIMARY KEY (group,id,sub_category,"date","hour")
)
WITH (
OIDS=FALSE
);

Kind regards,
Sebastian

2016-11-24 0:28 GMT+01:00 Adrian Klaver :
> On 11/23/2016 01:52 PM, azhwkd wrote:
>>
>> Greetings!
>>
>> The parallel calls should not be working on the same row. Each query
>> services a different group ID on its own and there is no overlap.
>
>
> Except the INSERT query in the trigger function is working on dates not
> group ids.
>
>>
>> Kind regards,
>> Sebastian
>>
>>
>> Tom Lane schrieb am Mi.,
>> 23. Nov. 2016 um 17:47 Uhr:
>>
>> azh...@gmail.com  writes:
>> > I have a query which if run alone usually completes in about 300ms.
>> > When run in my application this query constantly locks up and bogs
>> > down all connections of the connection pool (In the application this
>> > query is run up to 10 times in parallel with different parameters).
>> > What's really weird is that I can re-run one of the hung queries
>> from
>> > the command line while it's hung and it will complete as expected
>> > while the hung queries continue to use 100% CPU time.
>>
>> Judging from the EXPLAIN timing, most of the work is in the trigger,
>> which leads me to wonder if the parallel calls are likely to be
>> fighting
>> over inserting/updating the same row in the group_history partition
>> tables.  Or are you certain that they should be hitting different
>> rows?
>>
>> regards, tom lane
>>
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] query locks up when run concurrently

2016-11-23 Thread azhwkd
I'm sorry. I worded this quite poorly. I meant to say that there were
no log lines added to the postgres logfile at the time.
I hope these are the settings you were referring to (I did not change
anything in the config files vs. the default)

log_destination = 'stderr'
logging_collector = off
log_directory = 'pg_log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_file_mode = 0600
log_truncate_on_rotation = off
log_rotation_age = 1d
log_rotation_size = 10MB
syslog_facility = 'LOCAL0'
syslog_ident = 'postgres'
event_source = 'PostgreSQL'
client_min_messages = notice
log_min_messages = warning
log_min_error_statement = error
log_min_duration_statement = -1
debug_print_parse = off
debug_print_rewritten = off
debug_print_plan = off
debug_pretty_print = on
log_checkpoints = off
log_connections = off
log_disconnections = off
log_duration = off
log_error_verbosity = default
log_hostname = off
log_line_prefix = '%t '
log_lock_waits = off
log_statement = 'none'
log_temp_files = -1
log_timezone = 'UTC'

I'm sorry, it seems like I copied the trigger definition from the
wrong table (item_history is a test table I created yesterday while
trying things which is exactly the same as group_history just a
different name).
The trigger on the group_history table is exactly the same though,
except for the table name.

Kind regards,
Sebastian

2016-11-24 0:26 GMT+01:00 Adrian Klaver :
> On 11/23/2016 01:47 PM, azhwkd wrote:
>>
>> Greetings!
>>
>> Yes I had a look at the logfiles but there is not a single logfile
>> generated when I'm reproducing this issue.
>
>
> We are talking about the Postgres logfile, correct?
>
> I have to believe a log file exists, but maybe no entries at that time.
>
> What are you logging settings?
>
>> Concerning locks I used these queries
>> at https://wiki.postgresql.org/wiki/Lock_Monitoring and they came up
>> empty.
>> The group_history table and its sub-tables do not have any foreign keys
>> as they only contain calculated data and the source data is removed
>> after some time.
>> The trigger which calls the partition function below is defined as
>> follows:
>>
>> create
>> trigger group_history_trigger before insert
>> on
>> item_history for each row execute procedure
>> group_history_partition_function()
>
>
> This trigger is on the table item_history not group history, is that
> correct?
>
> If so how does the item_history table fit into this?
>
>>
>> Kind regards,
>> Sebastian
>
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] query locks up when run concurrently

2016-11-23 Thread Adrian Klaver

On 11/23/2016 01:52 PM, azhwkd wrote:

Greetings!

The parallel calls should not be working on the same row. Each query
services a different group ID on its own and there is no overlap.


Except the INSERT query in the trigger function is working on dates not 
group ids.




Kind regards,
Sebastian


Tom Lane schrieb am Mi.,
23. Nov. 2016 um 17:47 Uhr:

azh...@gmail.com  writes:
> I have a query which if run alone usually completes in about 300ms.
> When run in my application this query constantly locks up and bogs
> down all connections of the connection pool (In the application this
> query is run up to 10 times in parallel with different parameters).
> What's really weird is that I can re-run one of the hung queries from
> the command line while it's hung and it will complete as expected
> while the hung queries continue to use 100% CPU time.

Judging from the EXPLAIN timing, most of the work is in the trigger,
which leads me to wonder if the parallel calls are likely to be fighting
over inserting/updating the same row in the group_history partition
tables.  Or are you certain that they should be hitting different rows?

regards, tom lane




--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] query locks up when run concurrently

2016-11-23 Thread Adrian Klaver

On 11/23/2016 01:47 PM, azhwkd wrote:

Greetings!

Yes I had a look at the logfiles but there is not a single logfile
generated when I'm reproducing this issue.


We are talking about the Postgres logfile, correct?

I have to believe a log file exists, but maybe no entries at that time.

What are you logging settings?


Concerning locks I used these queries
at https://wiki.postgresql.org/wiki/Lock_Monitoring and they came up empty.
The group_history table and its sub-tables do not have any foreign keys
as they only contain calculated data and the source data is removed
after some time.
The trigger which calls the partition function below is defined as follows:

create
trigger group_history_trigger before insert
on
item_history for each row execute procedure
group_history_partition_function()


This trigger is on the table item_history not group history, is that 
correct?


If so how does the item_history table fit into this?



Kind regards,
Sebastian



--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Query with large in clauses uses a lot of memory

2016-11-23 Thread Tom Lane
greigwise  writes:
> So, I decided to try an experiment.  I wrote 2 queries as follows:
> 1 ) select pg_sleep(100) ;
> 2 ) with q (s1, s2) as (select pg_sleep(100), 1)
> select * from q where s2 in ( 1, <lots of comma delimited numbers> )
>
> It looks to me like the connection running the big query is using about 2GB
> more memory than the other one.  I could see why it might use *some* more
> (like 28MB more?), but 2GB more seems excessive. 

Don't hold your breath waiting for that to get better.  Depending on what
I assume about the widths of your numbers, you've got something like 3
million Const parse nodes in that query, so the system is eating something
like 600-700 bytes per Const, which is not all that many copies because
one Const node plus List overhead is probably 100 bytes on a 64-bit
server.  OK, it's not exactly frugal perhaps, but it would not be hard to
get to that at all if you're running the query in a way that requires
keeping a plancache entry for it.  It would take significant work (and
probably some performance sacrifices) to make much of a dent in the
space consumption, and even if we put in the work, I'd only expect to
be able to dent it a bit --- an order-of-magnitude reduction is not in
the cards.  Queries with that many parse elements in them are just not
cheap.

Now, that WHERE condition will eventually get folded to the form

s2 = ANY ('{1,2,...}'::integer[])

and that constant array is a *lot* less space-wasteful, only 4 bytes
per element (or 8 bytes if we're talking bigints).  So the approach I'd
advise is trying to send the query with a constant array to begin with
--- either write it like that, or like

s2 = ANY ($1::integer[])

and send the array as an out-of-line parameter.
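
For example, a sketch using a prepared statement (the table and statement
names here are illustrative only):

PREPARE pick_rows(integer[]) AS
    SELECT * FROM some_table WHERE s2 = ANY ($1);
EXECUTE pick_rows('{1,2,3,4}'::integer[]);

Driver-level bind parameters achieve the same thing without an explicit
PREPARE.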

Don't know how hard it might be to arm-wrestle ActiveRecord into doing
it like that :-(

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Query with large in clauses uses a lot of memory

2016-11-23 Thread greigwise
I had an issue today where the OOM killer terminated one of my postgres
processes.   
On my server I have 8 GB of RAM, shared_buffers is 1 GB and work_mem is
24MB.
I have connection pooling which limits us to 25 connections.  Even if I'm
maxed out there, I'm still only using 1.6 GB of RAM of my 8, which seems like
it shouldn't be a problem.

Looking through my postgres logs, I noticed that right about the time of the
OOM incident, I had some queries running with pretty massive in clauses
(thank you ruby/ActiveRecord).  One of the queries was about 28MB in size.

So, I decided to try an experiment.  I wrote 2 queries as follows:
1 ) select pg_sleep(100) ;
2 ) with q (s1, s2) as (select pg_sleep(100), 1)
select * from q where s2 in ( 1, <lots of comma delimited numbers> )

I ran those queries via psql and did this:

-sh-4.1$ ps aux | grep -i -E "local|COMMAND" | grep -v ruby
USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
postgres 20896 27.0 28.2 3416812 2132112 ? Ss   21:18   0:02 postgres:
hireology hireology [local] SELECT
postgres 20899  0.0  0.0 1281368 4800 ?Ss   21:18   0:00 postgres:
hireology hireology [local] SELECT

It looks to me like the connection running the big query is using about 2GB
more memory than the other one.  I could see why it might use *some* more
(like 28MB more?), but 2GB more seems excessive. 

So, the question is why does it use so much more memory.  And is there
anything I can do to limit this problem other than fixing the silly queries?

Thanks in advance for any help,
Greig Wise



--
View this message in context: 
http://postgresql.nabble.com/Query-with-large-in-clauses-uses-a-lot-of-memory-tp5931716.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] query locks up when run concurrently

2016-11-23 Thread azhwkd
Greetings!

The parallel calls should not be working on the same row. Each query
services a different group ID on its own and there is no overlap.

Kind regards,
Sebastian


Tom Lane wrote on Wed., 23. Nov. 2016 at 17:47:

> azh...@gmail.com writes:
> > I have a query which if run alone usually completes in about 300ms.
> > When run in my application this query constantly locks up and bogs
> > down all connections of the connection pool (In the application this
> > query is run up to 10 times in parallel with different parameters).
> > What's really weird is that I can re-run one of the hung queries from
> > the command line while it's hung and it will complete as expected
> > while the hung queries continue to use 100% CPU time.
>
> Judging from the EXPLAIN timing, most of the work is in the trigger,
> which leads me to wonder if the parallel calls are likely to be fighting
> over inserting/updating the same row in the group_history partition
> tables.  Or are you certain that they should be hitting different rows?
>
> regards, tom lane
>


Re: [GENERAL] query locks up when run concurrently

2016-11-23 Thread azhwkd
Greetings!

Yes, I had a look at the logfiles, but not a single log entry is
generated when I'm reproducing this issue.
Concerning locks, I used the queries at
https://wiki.postgresql.org/wiki/Lock_Monitoring and they came up empty.
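
(For reference, a minimal form of such a check, listing only the
ungranted locks, is:

SELECT pid, locktype, mode, relation::regclass
FROM pg_locks
WHERE NOT granted;
)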
The group_history table and its sub-tables do not have any foreign keys as
they only contain calculated data and the source data is removed after some
time.
The trigger which calls the partition function below is defined as follows:

create trigger group_history_trigger
  before insert on item_history
  for each row execute procedure group_history_partition_function();

Kind regards,
Sebastian

Adrian Klaver wrote on Wed., 23. Nov. 2016 at 16:25:

> On 11/22/2016 11:41 PM, azh...@gmail.com wrote:
> > Greetings!
> >
> > I'm using postgres version 9.5.5 on a ubuntu 16.04.1 server
> > installation which was installed through apt-get.
> >
> > I have a query which if run alone usually completes in about 300ms.
> > When run in my application this query constantly locks up and bogs
> > down all connections of the connection pool (In the application this
> > query is run up to 10 times in parallel with different parameters).
> > What's really weird is that I can re-run one of the hung queries from
> > the command line while it's hung and it will complete as expected
> > while the hung queries continue to use 100% CPU time.
>
> Have you looked in the Postgres logs to see if there are any relevant
> messages?
>
> Best guess is that running in parallel is creating a lock problem where
> one of the queries is holding a lock that is keeping the others from
> completing.
>
> What relationships do the group_history and group_history_* tables have
> to other tables?
>
> What is the definition of the trigger that fires the trigger function
> below?
>
> >
> > The query in question is this:
> >
> > insert into group_history ("group", id, sub_category, "date", aa, ab,
> > bb, ba, quantity, "hour")
> > (select
> > a."group",
> > a.id,
> > b.sub_category,
> > to_timestamp($2)::date as "date",
> > max(a.aa / a.quantity) as aa,
> > min(a.aa / a.quantity) as ab,
> > max(a.bb / a.quantity) as bb,
> > min(a.bb/ a.quantity) as ba,
> > sum(a.quantity) as quantity,
> > extract('hour' from to_timestamp($2)) as "hour"
> > from tbla a
> > join tblb b on a.id = b.id
> > where a."group" = $1 and b."group" = $1
> > group by a."group", a.id, b.sub_category
> > );
> >
> > When I'm running a perf on the system it looks like this while running
> > the query 10 times:
> >
> > Samples: 4M of event 'cpu-clock', Event count (approx.): 18972107951
> > Overhead Shared Object Symbol
> > 17.95% postgres [.] heap_hot_search_buffer
> > 5.64% postgres [.] heap_page_prune_opt
> > 4.62% postgres [.] hash_search_with_hash_value
> > 3.80% postgres [.] LWLockRelease
> > 3.73% postgres [.] 0x002f420d
> > 2.50% postgres [.] _bt_checkkeys
> > 2.48% postgres [.] hash_any
> > 2.45% postgres [.] 0x002f41e7
> > 2.10% postgres [.] slot_getattr
> > 1.80% postgres [.] ResourceOwnerForgetBuffer
> > 1.58% postgres [.] LWLockAcquire
> > 1.58% postgres [.] ReadBufferExtended
> > 1.54% postgres [.] index_fetch_heap
> > 1.47% postgres [.] MemoryContextReset
> > 1.43% postgres [.] btgettuple
> > 1.38% postgres [.] 0x002d710c
> > 1.36% postgres [.] 0x002d70a5
> > 1.35% postgres [.] ExecQual
> >
> > Explain (Analyze, Verbose) Output
> >
> > QUERY PLAN
> >
> 
> >
> -
> > Insert on public.group_history (cost=10254.36..10315.16 rows=2432
> > width=62) (actual time=1833.967..1833.967 rows=0 loops=1)
> > -> Subquery Scan on "*SELECT*" (cost=10254.36..10315.16 rows=2432
> > width=62) (actual time=353.880..376.490 rows=6139 loops=1)
> > Output: "*SELECT*"."group", "*SELECT*".id,
> > "*SELECT*".sub_category, "*SELECT*"."when", "*SELECT*".aa,
> > "*SELECT*".ab, "*SELECT*".bb, "*SELECT*".ba, "*SELECT*".quantity,
> > "*SELECT*"."hour"
> > -> HashAggregate (cost=10254.36..10278.68 rows=2432 width=28)
> > (actual time=353.871..367.144 rows=6139 loops=1)
> > Output: a."group", a.id, b.sub_category, '2016-11-20'::date,
> > max((a.aa / a.quantity)), min((a.aa / a.quantity)), max((a.bb /
> > a.quantity)), min((a.bb / a.quantity)), sum(a.quantity), '21'::double
> > precision
> > Group Key: a."group", a.id, b.sub_category
> > -> Hash Join (cost=5558.64..10181.40 rows=2432 width=28)
> > (actual time=193.949..294.106 rows=30343 loops=1)
> > Output: a."group", a.id, a.aa, a.quantity, a.bb, b.sub_category
> > Hash Cond: (b.id = a.id)
> > -> Bitmap Heap Scan on public.auctions_extra b
> > (cost=685.19..4719.06 rows=30550 width=8) (actual time=56.678..111.038
> > rows=30343 loops=1)
> > Output: b.sub_category, b.id
> > Recheck Cond: (b."group" = 7)
> > Heap Blocks: exact=289
> 

[GENERAL] query locks up when run concurrently

2016-11-23 Thread azhwkd
Greetings!

I'm using postgres version 9.5.5 on a ubuntu 16.04.1 server
installation which was installed through apt-get.

I have a query which if run alone usually completes in about 300ms.
When run in my application this query constantly locks up and bogs
down all connections of the connection pool (In the application this
query is run up to 10 times in parallel with different parameters).
What's really weird is that I can re-run one of the hung queries from
the command line while it's hung and it will complete as expected
while the hung queries continue to use 100% CPU time.

The query in question is this:

insert into group_history ("group", id, sub_category, "date", aa, ab,
bb, ba, quantity, "hour")
(select
a."group",
a.id,
b.sub_category,
to_timestamp($2)::date as "date",
max(a.aa / a.quantity) as aa,
min(a.aa / a.quantity) as ab,
max(a.bb / a.quantity) as bb,
min(a.bb/ a.quantity) as ba,
sum(a.quantity) as quantity,
extract('hour' from to_timestamp($2)) as "hour"
from tbla a
join tblb b on a.id = b.id
where a."group" = $1 and b."group" = $1
group by a."group", a.id, b.sub_category
);

When I'm running a perf on the system it looks like this while running
the query 10 times:

Samples: 4M of event 'cpu-clock', Event count (approx.): 18972107951
Overhead  Shared Object Symbol
  17.95%  postgres  [.] heap_hot_search_buffer
   5.64%  postgres  [.] heap_page_prune_opt
   4.62%  postgres  [.] hash_search_with_hash_value
   3.80%  postgres  [.] LWLockRelease
   3.73%  postgres  [.] 0x002f420d
   2.50%  postgres  [.] _bt_checkkeys
   2.48%  postgres  [.] hash_any
   2.45%  postgres  [.] 0x002f41e7
   2.10%  postgres  [.] slot_getattr
   1.80%  postgres  [.] ResourceOwnerForgetBuffer
   1.58%  postgres  [.] LWLockAcquire
   1.58%  postgres  [.] ReadBufferExtended
   1.54%  postgres  [.] index_fetch_heap
   1.47%  postgres  [.] MemoryContextReset
   1.43%  postgres  [.] btgettuple
   1.38%  postgres  [.] 0x002d710c
   1.36%  postgres  [.] 0x002d70a5
   1.35%  postgres  [.] ExecQual

Explain (Analyze, Verbose) Output

QUERY PLAN

   
-
Insert on public.group_history  (cost=10254.36..10315.16 rows=2432
width=62) (actual time=1833.967..1833.967 rows=0 loops=1)
  ->  Subquery Scan on "*SELECT*"  (cost=10254.36..10315.16 rows=2432
width=62) (actual time=353.880..376.490 rows=6139 loops=1)
Output: "*SELECT*"."group", "*SELECT*".id,
"*SELECT*".sub_category, "*SELECT*"."when", "*SELECT*".aa,
"*SELECT*".ab, "*SELECT*".bb, "*SELECT*".ba, "*SELECT*".quantity,
"*SELECT*"."hour"
->  HashAggregate  (cost=10254.36..10278.68 rows=2432 width=28)
(actual time=353.871..367.144 rows=6139 loops=1)
  Output: a."group", a.id, b.sub_category, '2016-11-20'::date,
max((a.aa / a.quantity)), min((a.aa / a.quantity)), max((a.bb /
a.quantity)), min((a.bb / a.quantity)), sum(a.quantity), '21'::double
precision
  Group Key: a."group", a.id, b.sub_category
  ->  Hash Join  (cost=5558.64..10181.40 rows=2432 width=28)
(actual time=193.949..294.106 rows=30343 loops=1)
Output: a."group", a.id, a.aa, a.quantity, a.bb, b.sub_category
Hash Cond: (b.id = a.id)
->  Bitmap Heap Scan on public.auctions_extra b
(cost=685.19..4719.06 rows=30550 width=8) (actual time=56.678..111.038
rows=30343 loops=1)
  Output: b.sub_category, b.id
  Recheck Cond: (b."group" = 7)
  Heap Blocks: exact=289
  ->  Bitmap Index Scan on auction_extra_pk
(cost=0.00..677.55 rows=30550 width=0) (actual time=55.966..55.966
rows=30343 loops=1)
Index Cond: (b."group" = 7)
->  Hash  (cost=4280.62..4280.62 rows=30627 width=28)
(actual time=137.160..137.160 rows=30343 loops=1)
  Output: a."group", a.id, a.aa, a.quantity, a.bb, a.id
  Buckets: 16384  Batches: 4  Memory Usage: 638kB
  ->  Bitmap Heap Scan on public.tbla a
(cost=689.78..4280.62 rows=30627 width=28) (actual
time=58.530..117.064 rows=30343 loops=1)
Output: a."group", a.id, a.aa, a.quantity,
a.bb, a.id
Recheck Cond: (a."group" = 7)
Heap Blocks: exact=254
->  Bitmap Index Scan on 

Re: [GENERAL] query locks up when run concurrently

2016-11-23 Thread Tom Lane
azh...@gmail.com writes:
> I have a query which if run alone usually completes in about 300ms.
> When run in my application this query constantly locks up and bogs
> down all connections of the connection pool (In the application this
> query is run up to 10 times in parallel with different parameters).
> What's really weird is that I can re-run one of the hung queries from
> the command line while it's hung and it will complete as expected
> while the hung queries continue to use 100% CPU time.

Judging from the EXPLAIN timing, most of the work is in the trigger,
which leads me to wonder if the parallel calls are likely to be fighting
over inserting/updating the same row in the group_history partition
tables.  Or are you certain that they should be hitting different rows?

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] query locks up when run concurrently

2016-11-23 Thread Adrian Klaver

On 11/22/2016 11:41 PM, azh...@gmail.com wrote:

Greetings!

I'm using postgres version 9.5.5 on a ubuntu 16.04.1 server
installation which was installed through apt-get.

I have a query which if run alone usually completes in about 300ms.
When run in my application this query constantly locks up and bogs
down all connections of the connection pool (In the application this
query is run up to 10 times in parallel with different parameters).
What's really weird is that I can re-run one of the hung queries from
the command line while it's hung and it will complete as expected
while the hung queries continue to use 100% CPU time.


Have you looked in the Postgres logs to see if there are any relevant 
messages?


Best guess is that running in parallel is creating a lock problem where 
one of the queries is holding a lock that is keeping the others from 
completing.


What relationships do the group_history and group_history_* tables have 
to other tables?


What is the definition of the trigger that fires the trigger function below?



The query in question is this:

insert into group_history ("group", id, sub_category, "date", aa, ab,
bb, ba, quantity, "hour")
(select
a."group",
a.id,
b.sub_category,
to_timestamp($2)::date as "date",
max(a.aa / a.quantity) as aa,
min(a.aa / a.quantity) as ab,
max(a.bb / a.quantity) as bb,
min(a.bb/ a.quantity) as ba,
sum(a.quantity) as quantity,
extract('hour' from to_timestamp($2)) as "hour"
from tbla a
join tblb b on a.id = b.id
where a."group" = $1 and b."group" = $1
group by a."group", a.id, b.sub_category
);

When I'm running a perf on the system it looks like this while running
the query 10 times:

Samples: 4M of event 'cpu-clock', Event count (approx.): 18972107951
Overhead Shared Object Symbol
17.95% postgres [.] heap_hot_search_buffer
5.64% postgres [.] heap_page_prune_opt
4.62% postgres [.] hash_search_with_hash_value
3.80% postgres [.] LWLockRelease
3.73% postgres [.] 0x002f420d
2.50% postgres [.] _bt_checkkeys
2.48% postgres [.] hash_any
2.45% postgres [.] 0x002f41e7
2.10% postgres [.] slot_getattr
1.80% postgres [.] ResourceOwnerForgetBuffer
1.58% postgres [.] LWLockAcquire
1.58% postgres [.] ReadBufferExtended
1.54% postgres [.] index_fetch_heap
1.47% postgres [.] MemoryContextReset
1.43% postgres [.] btgettuple
1.38% postgres [.] 0x002d710c
1.36% postgres [.] 0x002d70a5
1.35% postgres [.] ExecQual

Explain (Analyze, Verbose) Output

QUERY PLAN

-
Insert on public.group_history (cost=10254.36..10315.16 rows=2432
width=62) (actual time=1833.967..1833.967 rows=0 loops=1)
-> Subquery Scan on "*SELECT*" (cost=10254.36..10315.16 rows=2432
width=62) (actual time=353.880..376.490 rows=6139 loops=1)
Output: "*SELECT*"."group", "*SELECT*".id,
"*SELECT*".sub_category, "*SELECT*"."when", "*SELECT*".aa,
"*SELECT*".ab, "*SELECT*".bb, "*SELECT*".ba, "*SELECT*".quantity,
"*SELECT*"."hour"
-> HashAggregate (cost=10254.36..10278.68 rows=2432 width=28)
(actual time=353.871..367.144 rows=6139 loops=1)
Output: a."group", a.id, b.sub_category, '2016-11-20'::date,
max((a.aa / a.quantity)), min((a.aa / a.quantity)), max((a.bb /
a.quantity)), min((a.bb / a.quantity)), sum(a.quantity), '21'::double
precision
Group Key: a."group", a.id, b.sub_category
-> Hash Join (cost=5558.64..10181.40 rows=2432 width=28)
(actual time=193.949..294.106 rows=30343 loops=1)
Output: a."group", a.id, a.aa, a.quantity, a.bb, b.sub_category
Hash Cond: (b.id = a.id)
-> Bitmap Heap Scan on public.auctions_extra b
(cost=685.19..4719.06 rows=30550 width=8) (actual time=56.678..111.038
rows=30343 loops=1)
Output: b.sub_category, b.id
Recheck Cond: (b."group" = 7)
Heap Blocks: exact=289
-> Bitmap Index Scan on auction_extra_pk
(cost=0.00..677.55 rows=30550 width=0) (actual time=55.966..55.966
rows=30343 loops=1)
Index Cond: (b."group" = 7)
-> Hash (cost=4280.62..4280.62 rows=30627 width=28)
(actual time=137.160..137.160 rows=30343 loops=1)
Output: a."group", a.id, a.aa, a.quantity, a.bb, a.id
Buckets: 16384 Batches: 4 Memory Usage: 638kB
-> Bitmap Heap Scan on public.tbla a
(cost=689.78..4280.62 rows=30627 width=28) (actual
time=58.530..117.064 rows=30343 loops=1)
Output: a."group", a.id, a.aa, a.quantity,
a.bb, a.id
Recheck Cond: (a."group" = 7)
Heap Blocks: exact=254
-> Bitmap Index Scan on tbla_pk
(cost=0.00..682.12 rows=30627 width=0) (actual time=57.801..57.801
rows=30343 loops=1)
Index Cond: (a."group" = 7)
Planning time: 0.475 ms
Trigger group_history_trigger: time=1442.561 calls=6139
Execution time: 1834.119 ms


group_history_trigger:

CREATE OR REPLACE FUNCTION public.group_history_partition_function()
RETURNS trigger
LANGUAGE plpgsql
AS $function$
declare
_new_date timestamptz;
_tablename text;
_startdate text;
begin
-- Takes the current inbound "when" 

[GENERAL] query locks up when run concurrently

2016-11-22 Thread azhwkd
Greetings!

I'm using postgres version 9.5.5 on a ubuntu 16.04.1 server
installation which was installed through apt-get.

I have a query which if run alone usually completes in about 300ms.
When run in my application this query constantly locks up and bogs
down all connections of the connection pool (In the application this
query is run up to 10 times in parallel with different parameters).
What's really weird is that I can re-run one of the hung queries from
the command line while it's hung and it will complete as expected
while the hung queries continue to use 100% CPU time.

The query in question is this:

insert into group_history ("group", id, sub_category, "date", aa, ab,
bb, ba, quantity, "hour")
(select
a."group",
a.id,
b.sub_category,
to_timestamp($2)::date as "date",
max(a.aa / a.quantity) as aa,
min(a.aa / a.quantity) as ab,
max(a.bb / a.quantity) as bb,
min(a.bb/ a.quantity) as ba,
sum(a.quantity) as quantity,
extract('hour' from to_timestamp($2)) as "hour"
from tbla a
join tblb b on a.id = b.id
where a."group" = $1 and b."group" = $1
group by a."group", a.id, b.sub_category
);

When I'm running a perf on the system it looks like this while running
the query 10 times:

Samples: 4M of event 'cpu-clock', Event count (approx.): 18972107951
Overhead Shared Object Symbol
17.95% postgres [.] heap_hot_search_buffer
5.64% postgres [.] heap_page_prune_opt
4.62% postgres [.] hash_search_with_hash_value
3.80% postgres [.] LWLockRelease
3.73% postgres [.] 0x002f420d
2.50% postgres [.] _bt_checkkeys
2.48% postgres [.] hash_any
2.45% postgres [.] 0x002f41e7
2.10% postgres [.] slot_getattr
1.80% postgres [.] ResourceOwnerForgetBuffer
1.58% postgres [.] LWLockAcquire
1.58% postgres [.] ReadBufferExtended
1.54% postgres [.] index_fetch_heap
1.47% postgres [.] MemoryContextReset
1.43% postgres [.] btgettuple
1.38% postgres [.] 0x002d710c
1.36% postgres [.] 0x002d70a5
1.35% postgres [.] ExecQual

Explain (Analyze, Verbose) Output

QUERY PLAN

-
Insert on public.group_history (cost=10254.36..10315.16 rows=2432
width=62) (actual time=1833.967..1833.967 rows=0 loops=1)
-> Subquery Scan on "*SELECT*" (cost=10254.36..10315.16 rows=2432
width=62) (actual time=353.880..376.490 rows=6139 loops=1)
Output: "*SELECT*"."group", "*SELECT*".id,
"*SELECT*".sub_category, "*SELECT*"."when", "*SELECT*".aa,
"*SELECT*".ab, "*SELECT*".bb, "*SELECT*".ba, "*SELECT*".quantity,
"*SELECT*"."hour"
-> HashAggregate (cost=10254.36..10278.68 rows=2432 width=28)
(actual time=353.871..367.144 rows=6139 loops=1)
Output: a."group", a.id, b.sub_category, '2016-11-20'::date,
max((a.aa / a.quantity)), min((a.aa / a.quantity)), max((a.bb /
a.quantity)), min((a.bb / a.quantity)), sum(a.quantity), '21'::double
precision
Group Key: a."group", a.id, b.sub_category
-> Hash Join (cost=5558.64..10181.40 rows=2432 width=28)
(actual time=193.949..294.106 rows=30343 loops=1)
Output: a."group", a.id, a.aa, a.quantity, a.bb, b.sub_category
Hash Cond: (b.id = a.id)
-> Bitmap Heap Scan on public.auctions_extra b
(cost=685.19..4719.06 rows=30550 width=8) (actual time=56.678..111.038
rows=30343 loops=1)
Output: b.sub_category, b.id
Recheck Cond: (b."group" = 7)
Heap Blocks: exact=289
-> Bitmap Index Scan on auction_extra_pk
(cost=0.00..677.55 rows=30550 width=0) (actual time=55.966..55.966
rows=30343 loops=1)
Index Cond: (b."group" = 7)
-> Hash (cost=4280.62..4280.62 rows=30627 width=28)
(actual time=137.160..137.160 rows=30343 loops=1)
Output: a."group", a.id, a.aa, a.quantity, a.bb, a.id
Buckets: 16384 Batches: 4 Memory Usage: 638kB
-> Bitmap Heap Scan on public.tbla a
(cost=689.78..4280.62 rows=30627 width=28) (actual
time=58.530..117.064 rows=30343 loops=1)
Output: a."group", a.id, a.aa, a.quantity,
a.bb, a.id
Recheck Cond: (a."group" = 7)
Heap Blocks: exact=254
-> Bitmap Index Scan on tbla_pk
(cost=0.00..682.12 rows=30627 width=0) (actual time=57.801..57.801
rows=30343 loops=1)
Index Cond: (a."group" = 7)
Planning time: 0.475 ms
Trigger group_history_trigger: time=1442.561 calls=6139
Execution time: 1834.119 ms


group_history_trigger:

CREATE OR REPLACE FUNCTION public.group_history_partition_function()
RETURNS trigger
LANGUAGE plpgsql
AS $function$
declare
_new_date timestamptz;
_tablename text;
_startdate text;
begin
-- Takes the current inbound "when" value and determines when
midnight is for the given date
_new_date := date_trunc('day', new."when");
_startdate := to_char(_new_date, 'YYYY_MM_DD');  -- assumed format, e.g. 2016_11_20
_tablename := 'group_history_'||_startdate;

-- Insert the current record into the correct partition
execute 'INSERT INTO public.' || quote_ident(_tablename) || ' VALUES ($1.*)
on conflict ("group", id, sub_category, "when", "hour") do
update set aa = excluded.aa,
ab = excluded.ab,
bb = excluded.bb,
ba = 

Re: [GENERAL] Query generator

2016-10-16 Thread Vinicius Segalin
2016-10-16 8:54 GMT-02:00 Andreas Seltenreich :

> Vinicius Segalin  writes:
>
> > 2016-09-29 16:32 GMT-03:00 Julien Rouhaud :
> >
> > >  You should try sqlsmith (https://github.com/anse1/sqlsmith), which
> works
> > >  very well.
> >
> > I had found this one before, but all I could get was queries using
> > "standard" tables, like the ones from pg_catalog and
> > information_schema. It didn't generate queries from the tables I've
> > created. Was I doing something wrong?
>
> It should pick your relations with the same likelihood it picks the
> catalog tables.  You could verify this with, e.g.,
>
> sqlsmith --dry-run | grep your_table_name
>
> Thanks to Julien, sqlsmith's master branch now has an option
> --exclude-catalog that inhibits use of catalog relations.  If you are
> building from github, you might want to give it another chance with this
> option.  I'll probably do a proper release in the next two weeks.
>
> Feedback welcome :-)
>
> regards,
> Andreas
>

Yes, I was talking in private with him and he came up with this solution.
Thank you again, Julien.
I need to try it again, did a quick test on friday, but it looks like it
works perfectly! :)


Re: [GENERAL] Query generator

2016-10-16 Thread Andreas Seltenreich
Vinicius Segalin  writes:

> 2016-09-29 16:32 GMT-03:00 Julien Rouhaud :
>
> >  You should try sqlsmith (https://github.com/anse1/sqlsmith), which works
> >  very well.
>
> I had found this one before, but all I could get was queries using
> "standard" tables, like the ones from pg_catalog and
> information_schema. It didn't generate queries from the tables I've
> created. Was I doing something wrong?

It should pick your relations with the same likelihood it picks the
catalog tables.  You could verify this with, e.g.,

sqlsmith --dry-run | grep your_table_name

Thanks to Julien, sqlsmith's master branch now has an option
--exclude-catalog that inhibits use of catalog relations.  If you are
building from github, you might want to give it another chance with this
option.  I'll probably do a proper release in the next two weeks.
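
A typical invocation would then be something like (connection string
hypothetical):

sqlsmith --exclude-catalog --target="host=localhost dbname=mydb"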

Feedback welcome :-)

regards,
Andreas


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Query help

2016-10-04 Thread Daniel Caldeweyher
Try this:

select distinct vendor_no, vendor_name
from ap_vendors
where vendor_no in (
   select vendor_no from ap_vendors
   group by vendor_no
   having array_agg(company_code) @> ARRAY['BUR','EBC','SNJ'])
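
And for the complement (vendors missing from at least one of the three
showrooms), a sketch along the same lines:

select vendor_no, array_agg(distinct company_code) as showrooms
from ap_vendors
group by vendor_no
having not (array_agg(company_code) @> ARRAY['BUR','EBC','SNJ']);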


On Wed, Oct 5, 2016 at 1:31 PM, Bret Stern  wrote:

> Good evening,
> I'm curious about a way to ask the following question of my vendors
> table.
>
> pseudo1 "select all vendors which exist in BUR and EBC and SNJ"
>
> and
> pseudo2 "select all vendors which DO NOT exist in all three showrooms"
>
>
> The data is from a Sage accounting system which I pull out and place in a
> pg db. What we are trying to do is identify which vendors are defined in
> all three of our showrooms, and vice-versa, which ones are not.
>
> ap_vendors table
>   company_code character varying(10) NOT NULL,
>   ap_division_no character varying(2) NOT NULL,
>   vendor_no character varying(7) NOT NULL,
>   terms_code character varying(2),
>   vendor_name character varying(30),
>   address_line1 character varying(30),
>   address_line2 character varying(30),
>   address_line3 character varying(30),
>   city character varying(20),
>   state character varying(2),
>   zip_code character varying(10),
>   telephone_no character varying(17),
>   fax_no character varying(17),
>   CONSTRAINT ap_vendors_pk PRIMARY KEY (company_code, ap_division_no,
> vendor_no)
>
> sample records:
> "BUR";"00";"ADEXU";"30";"ADEX USA";"''";"''";"''";"''";"''";
> "92831";"''";"''"
> "BUR";"00";"AGORA";"30";"Agora Natural Surfaces";"''";"''";"''";"''";
> "''";"90248";"''";"''"
> "BUR";"00";"AKDOP";"30";"AKDO Pacific";"''";"''";"''";"''";"
> ''";"94545";"''";"''"
> "EBC";"00";"ADEXU";"30";"ADEX USA";"''";"''";"''";"''";"''";
> "92831";"''";"''"
> "EBC";"00";"AGORA";"30";"Agora Natural Surfaces";"''";"''";"''";"''";
> "''";"90248";"''";"''"
> "EBC";"00";"ARIZ01";"30";"Arizona Tile";"''";"''";"''";"''";"''"
> ;"94550";"''";"''"
> "SNJ";"00";"AKDOP";"30";"AKDO Pacific";"''";"''";"''";"''";"
> ''";"94545";"''";"''"
> "SNJ";"00";"AGORA";"30";"Agora Natural Surfaces";"''";"''";"''";"''";
> "''";"90248";"''";"''"
> "SNJ";"00";"ADEXU";"30";"ADEX USA";"''";"''";"''";"''";"''";
> "92831";"''";"''"
>
> What I need is a query which I can modify to return only vendors which
> exist
> in all three company_code columns ( BUR, EBC, SNJ) (they represent
> showroom location)
>
> eg; exists in BUR, EBC, SNJ
> ADEXU
> AGORA
>
> OR
>
> modify the query to return only the vendors which DO NOT exist in all
> three showrooms based on the first column company_code
>
> eg;
> AKDOP only exists in BUR and SNJ
> ARIZ01 only exists in EBC
>
> Thanks
> Bret
>
>
>


Re: [GENERAL] Query help

2016-10-04 Thread Rob Sargent

> On Oct 4, 2016, at 9:31 PM, Bret Stern  
> wrote:
> 
> Good evening,
> I'm curious about a way to ask the following question of my vendors
> table.
> 
> pseudo1 "select all vendors which exist in BUR and EBC and SNJ"
> 
> and
> pseudo2 "select all vendors which DO NOT exist in all three showrooms"
> 
> 
> The data is from a Sage accounting system which I pull out and place in a
> pg db. What we are trying to do is identify which vendors are defined in
> all three of our showrooms, and vice-versa, which ones are not.
> 
> ap_vendors table
>   company_code character varying(10) NOT NULL,
>   ap_division_no character varying(2) NOT NULL,
>   vendor_no character varying(7) NOT NULL,
>   terms_code character varying(2),
>   vendor_name character varying(30),
>   address_line1 character varying(30),
>   address_line2 character varying(30),
>   address_line3 character varying(30),
>   city character varying(20),
>   state character varying(2),
>   zip_code character varying(10),
>   telephone_no character varying(17),
>   fax_no character varying(17),
>   CONSTRAINT ap_vendors_pk PRIMARY KEY (company_code, ap_division_no, 
> vendor_no)
> 
> sample records:
> "BUR";"00";"ADEXU";"30";"ADEX USA";"''";"''";"''";"''";"''";"92831";"''";"''"
> "BUR";"00";"AGORA";"30";"Agora Natural 
> Surfaces";"''";"''";"''";"''";"''";"90248";"''";"''"
> "BUR";"00";"AKDOP";"30";"AKDO 
> Pacific";"''";"''";"''";"''";"''";"94545";"''";"''"
> "EBC";"00";"ADEXU";"30";"ADEX USA";"''";"''";"''";"''";"''";"92831";"''";"''"
> "EBC";"00";"AGORA";"30";"Agora Natural 
> Surfaces";"''";"''";"''";"''";"''";"90248";"''";"''"
> "EBC";"00";"ARIZ01";"30";"Arizona 
> Tile";"''";"''";"''";"''";"''";"94550";"''";"''"
> "SNJ";"00";"AKDOP";"30";"AKDO 
> Pacific";"''";"''";"''";"''";"''";"94545";"''";"''"
> "SNJ";"00";"AGORA";"30";"Agora Natural 
> Surfaces";"''";"''";"''";"''";"''";"90248";"''";"''"
> "SNJ";"00";"ADEXU";"30";"ADEX USA";"''";"''";"''";"''";"''";"92831";"''";"''"
> 
> What I need is a query which I can modify to return only vendors which exist
> in all three company_code columns ( BUR, EBC, SNJ) (they represent showroom 
> location)
> 
> eg; exists in BUR, EBC, SNJ
> ADEXU
> AGORA
> 
> OR
> 
> modify the query to return only the vendors which DO NOT exist in all
> three showrooms based on the first column company_code
> 
> eg; 
> AKDOP only exists in BUR and SNJ
> ARIZ01 only exists in EBC
> 
> Thanks
> Bret
> 
> 
Not sure I like the schema but 
select vendor_no, count(*) from ap_vendors group by vendor_no having count(*) = 3;

and maybe count(*) < 3 is your second answer.
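
(If a vendor can repeat within one showroom, e.g. under different
ap_division_no values, counting distinct showrooms is the safer sketch:

select vendor_no
from ap_vendors
group by vendor_no
having count(distinct company_code) < 3;
)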



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Query help

2016-10-04 Thread Bret Stern
Good evening,
I'm curious about a way to ask the following question of my vendors
table.

pseudo1 "select all vendors which exist in BUR and EBC and SNJ"

and
pseudo2 "select all vendors which DO NOT exist in all three showrooms"


The data is from a Sage accounting system which I pull out and place in
a
pg db. What we are trying to do is identify which vendors are defined in
all three of our showrooms, and vice-versa, which ones are not.

ap_vendors table
  company_code character varying(10) NOT NULL,
  ap_division_no character varying(2) NOT NULL,
  vendor_no character varying(7) NOT NULL,
  terms_code character varying(2),
  vendor_name character varying(30),
  address_line1 character varying(30),
  address_line2 character varying(30),
  address_line3 character varying(30),
  city character varying(20),
  state character varying(2),
  zip_code character varying(10),
  telephone_no character varying(17),
  fax_no character varying(17),
  CONSTRAINT ap_vendors_pk PRIMARY KEY (company_code, ap_division_no,
vendor_no)

sample records:
"BUR";"00";"ADEXU";"30";"ADEX
USA";"''";"''";"''";"''";"''";"92831";"''";"''"
"BUR";"00";"AGORA";"30";"Agora Natural
Surfaces";"''";"''";"''";"''";"''";"90248";"''";"''"
"BUR";"00";"AKDOP";"30";"AKDO
Pacific";"''";"''";"''";"''";"''";"94545";"''";"''"
"EBC";"00";"ADEXU";"30";"ADEX
USA";"''";"''";"''";"''";"''";"92831";"''";"''"
"EBC";"00";"AGORA";"30";"Agora Natural
Surfaces";"''";"''";"''";"''";"''";"90248";"''";"''"
"EBC";"00";"ARIZ01";"30";"Arizona
Tile";"''";"''";"''";"''";"''";"94550";"''";"''"
"SNJ";"00";"AKDOP";"30";"AKDO
Pacific";"''";"''";"''";"''";"''";"94545";"''";"''"
"SNJ";"00";"AGORA";"30";"Agora Natural
Surfaces";"''";"''";"''";"''";"''";"90248";"''";"''"
"SNJ";"00";"ADEXU";"30";"ADEX
USA";"''";"''";"''";"''";"''";"92831";"''";"''"

What I need is a query which I can modify to return only vendors which
exist
in all three company_code columns ( BUR, EBC, SNJ) (they represent
showroom location)

eg; exists in BUR, EBC, SNJ
ADEXU
AGORA

OR

modify the query to return only the vendors which DO NOT exist in all
three showrooms based on the first column company_code

eg; 
AKDOP   only exists in BUR and SNJ
ARIZ01  only exists in EBC

Thanks
Bret




Re: [GENERAL] Query killed with Out of memory

2016-10-04 Thread Chris Mair

With a heavy query, when the result grows over 600k rows the query hangs
with an out-of-memory error.

Here is the explain analyze:
[...]
work_mem is 512MB, shared_buffers 3084MB and system RAM 10GB. Postgres
version is 8.4.8 and for some months I cannot upgrade.

Is there a way to solve the problem?


Hi,

a few ideas:

 - what goes out of memory? The client? If you query from a programming
   language you should set the fetch/batch size to some value so that it
   won't fetch the whole 600k rows into memory (see the cursor sketch
   after this list)... for the psql client you can do
   \set FETCH_COUNT 1000

 - work_mem 512 MB is high-ish unless you have a small value for
   max_connections...

 - 8.4.8 was released in 2011, the latest 8.4 release is 8.4.22, you're
   missing lots of patches (and 8.4 was EOLed more than two years ago)
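
As to the first point, the same batching can be done from application
code with a server-side cursor; a minimal sketch (table name
hypothetical):

BEGIN;
DECLARE big_cur CURSOR FOR SELECT * FROM big_table;  -- the heavy query
FETCH 1000 FROM big_cur;  -- repeat until it returns no rows
CLOSE big_cur;
COMMIT;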

Bye,
Chris.










--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Query killed with Out of memory

2016-10-04 Thread Job
Hello,

With a heavy query, when the result grows over 600k rows the query hangs
with an out-of-memory error.

Here is the explain analyze:

CTE Scan on lista  (cost=25066.66..47721.23 rows=3678 width=260)

   CTE lista

 ->  Unique  (cost=24956.32..25066.66 rows=3678 width=512)

   ->  Sort  (cost=24956.32..24965.52 rows=3678 width=512)

 Sort Key: "*SELECT* 1".id, "*SELECT* 1".data_log, "*SELECT* 
1".type_log, "*SELECT* 1".ip, "*SELECT* 1".log_id, "*SELECT* 1".url_dominio, 
"*SELECT* 1".porta, "*SELECT* 1".action_bind, "*SELECT* 1".action, 
('DNS_DENIED/403'

::text), "*SELECT* 1".array_dominio

     ->  Append  (cost=905.76..24738.50 rows=3678 width=512)

   ->  Subquery Scan "*SELECT* 1"  (cost=905.76..12423.64 
rows=3652 width=512)

 ->  Hash Anti Join  (cost=905.76..12387.12 
rows=3652 width=512)

   Hash Cond: 
(((public.webtraffic_bind.ip)::text = (wb.ip)::text) AND 
((public.webtraffic_bind.log_id)::text = (wb.log_id)::text) AND 
(public.webtraffic_bind.url_dominio = wb.url_dominio) AND ((public.webt

raffic_bind.porta)::text = (wb.porta)::text))

   Join Filter: ((wb.data_log >= 
public.webtraffic_bind.data_log) AND (wb.data_log < 
(public.webtraffic_bind.data_log + '00:02:00'::interval)))

   ->  Bitmap Heap Scan on webtraffic_bind  
(cost=269.23..11638.68 rows=3678 width=512)

 Recheck Cond: ((type_log)::text = 
'queries'::text)

 ->  Bitmap Index Scan on 
type_log_wbidx  (cost=0.00..268.31 rows=3678 width=0)

   Index Cond: ((type_log)::text = 
'queries'::text)

   ->  Hash  (cost=636.17..636.17 rows=18 
width=274)

 ->  Bitmap Heap Scan on 
webtraffic_bind wb  (cost=564.94..636.17 rows=18 width=274)

   Recheck Cond: (((type_log)::text 
= 'security'::text) AND ((action_bind)::text = 'approved'::text))

   ->  BitmapAnd  
(cost=564.94..564.94 rows=18 width=0)

 ->  Bitmap Index Scan on 
type_log_wbidx  (cost=0.00..268.31 rows=3678 width=0)

   Index Cond: 
((type_log)::text = 'security'::text)

 ->  Bitmap Index Scan on 
action_bind_wbidx  (cost=0.00..296.37 rows=3678 width=0)

   Index Cond: 
((action_bind)::text = 'approved'::text)

   ->  Subquery Scan "*SELECT* 2"  (cost=905.76..12314.86 
rows=26 width=512)

 ->  Hash Semi Join  (cost=905.76..12314.60 rows=26 
width=512)

   Hash Cond: 
(((public.webtraffic_bind.ip)::text = (wb.ip)::text) AND 
((public.webtraffic_bind.log_id)::text = (wb.log_id)::text) AND 
(public.webtraffic_bind.url_dominio = wb.url_dominio) AND ((public.webt

raffic_bind.porta)::text = (wb.porta)::text))

   Join Filter: ((wb.data_log >= 
public.webtraffic_bind.data_log) AND (wb.data_log < 
(public.webtraffic_bind.data_log + '00:02:00'::interval)))

   ->  Bitmap Heap Scan on webtraffic_bind  
(cost=269.23..11638.68 rows=3678 width=512)

 Recheck Cond: ((type_log)::text = 
'queries'::text)

 ->  Bitmap Index Scan on 
type_log_wbidx  (cost=0.00..268.31 rows=3678 width=0)

   Index Cond: ((type_log)::text = 
'queries'::text)

   ->  Hash  (cost=636.17..636.17 rows=18 
width=274)

 ->  Bitmap Heap Scan on 
webtraffic_bind wb  (cost=564.94..636.17 rows=18 width=274)

   Recheck Cond: (((type_log)::text 
= 'security'::text) AND ((action_bind)::text = 'approved'::text))

   ->  BitmapAnd  
(cost=564.94..564.94 rows=18 width=0)

 ->  Bitmap Index Scan on 
type_log_wbidx  (cost=0.00..268.31 rows=3678 width=0)

   Index Cond: 
((type_log)::text = 'security'::text)

 ->  Bitmap Index Scan on 
action_bind_wbidx  (cost=0.00..296.37 rows=3678 width=0)

   Index Cond: 
((action_bind)::text = 'approved'::text)

   SubPlan 2

 ->  Index Scan using stpestensioni_domini_idx on stpestensioni_domini  
(cost=0.01..12.18 rows=2 width=0)

   Index Cond: ((estensione)::text = 

Re: [GENERAL] Query generator

2016-10-02 Thread Julien Rouhaud
On 29/09/2016 23:23, Vinicius Segalin wrote:
> 2016-09-29 16:32 GMT-03:00 Julien Rouhaud  >:
> You should try sqlsmith (https://github.com/anse1/sqlsmith), which works
> very well.
> 
> 
> I had found this one before, but all I could get was queries using
> "standard" tables, like the ones from pg_catalog and information_schema.
> It didn't generate queries from the tables I've created. Was I doing
> something wrong?
> 

I don't think so, unless you specified an empty database with the
--target option. I never had this issue myself, so I can only advise you
to open an issue on the project repository if you still have this problem.

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Query generator

2016-09-29 Thread Vinicius Segalin
2016-09-29 16:32 GMT-03:00 Julien Rouhaud :

> On 29/09/2016 21:27, Vinicius Segalin wrote:
> > Hi everyone,
> >
>
> Hello,
>
> > Does anyone know a random query generator for Postgres? Something that
> > gets my schema and, based on that, generates hundreds of different
> > queries with join, group by, etc.
> >
>
> You should try sqlsmith (https://github.com/anse1/sqlsmith), which works
> very well.


I had found this one before, but all I could get was queries using
"standard" tables, like the ones from pg_catalog and information_schema. It
didn't generate queries from the tables I've created. Was I doing something
wrong?


Re: [GENERAL] Query generator

2016-09-29 Thread Julien Rouhaud
On 29/09/2016 21:27, Vinicius Segalin wrote:
> Hi everyone,
> 

Hello,

> Does anyone know a random query generator for Postgres? Something that
> gets my schema and, based on that, generates hundreds of different
> queries with join, group by, etc.
> 

You should try sqlsmith (https://github.com/anse1/sqlsmith), which works
very well.

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Query generator

2016-09-29 Thread Vinicius Segalin
Hi everyone,

Does anyone know a random query generator for Postgres? Something that gets
my schema and, based on that, generates hundreds of different queries with
join, group by, etc.

Thank you.


[GENERAL] Query planner using hash join when merge join seems orders of magnitude faster

2016-07-30 Thread Branden Visser
Hi all, I seem to be in a bit of a jam with the query planner choosing
a significantly less optimal route for a query.

First, a common check list I've seen around when researching these
types of issues on list:

1. pgsql version is PostgreSQL 9.4.7 running on Amazon RDS
2. Vacuum analyze was run
3. default_statistics_target was increased to 10,000 for the whole
test database and re-analyzed

Code and info referenced is in a github gist [1]. The files are:

[Q-Report.sql] The actual SQL query I'm trying to optimize, runs a
report on a large set of data
[QP-Report-Normal.txt] The query plan of Q-Report.sql without any
modification of the query plan
[QP-Report-DisableHashJoin.txt] The query plan of Q-Report.sql after
`set enable_hashjoin = false`
[Q-CC-Underestimated.sql] An isolated query that shows 1 of
potentially 2 significant row count underestimations
[QP-CC-Underestimated.txt] The query plan of Q-CC-Underestimated.sql

Essentially, my report query has a hash join that is taking about 70%
of the query time (45s), whereas a merge join seems to take less than
a second.

It looks like there are probably 2 significant row underestimations
which may be contributing to this (first of which is the Q-CC query),
but I haven't been able to validate that this is specifically the
reason the query planner picks a hash join.

My usage on this schema are mostly queries that get very small slices
of data, however this particular query is a background workload that
generates denormalized reports. That said, any kind of global
configuration being added that would jeopardize the performance of the
small-slice queries is less desirable.

My question is, what would be the right way to go about diagnosing and
optimizing this kind of issue? I'm trying to think of ways to
potentially alter the structure of the query to reduce impact of the
planning estimates, but I'm not coming up with any good ideas there.
As mentioned, stats target was increased to 10,000 for testing and
while it changed the row estimates, it was only by about 12 rows --
did I maybe do something wrong there?
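
(One thing worth noting: the statistics target can also be raised for
just the relevant columns instead of database-wide; a sketch, with table
and column names hypothetical:

ALTER TABLE some_table ALTER COLUMN some_col SET STATISTICS 10000;
ANALYZE some_table;
)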

With this particular kind of query, I feel that I'd be willing to
trade out for a vast "overestimation" scenario for the analyzer, but I
can't find anything that allows me to tweak this for a particular
query. I've come across pg_hint_plan but it doesn't seem like it would
support my version of PostgreSQL, and I'd prefer to be able to stay on RDS if
possible.
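
(The closest thing to a per-query knob that I know of is scoping the
planner setting to a single transaction with SET LOCAL, so the
small-slice queries are untouched; a sketch:

BEGIN;
SET LOCAL enable_hashjoin = off;
-- run the report query here
COMMIT;
)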

I came across a pending patch for multivariate statistics, but that
seems a little bit far out at this point.

Any advice for workarounds or solutions would be greatly appreciated!

Thanks in advance,
Branden

[1] https://gist.github.com/mrvisser/a22dddbdaa64ae1aa3dbc9637617b70d


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Query from two tables return error

2016-04-01 Thread arnaud gaboury
On Fri, Apr 1, 2016 at 4:04 PM, Adrian Klaver 
wrote:

> On 04/01/2016 06:52 AM, arnaud gaboury wrote:
>
>>
>>
>> On Fri, Apr 1, 2016 at 3:33 PM, Sándor Daku >
>
>
>
>> One of the many difficulties with computers is that they do what you tell
>> them to do, not what you think you are saying. :)
>> Let's see:
>> SELECT d.home_dir
>> FROM email.mail_dir d, email.mailusers u  <- make a join between
>> mail_dir and mailusers = join every(!) record from the first table
>> with every(!) record from the second table
>> WHERE u.username='arnaud.gaboury'; <- but I need only those from the
>> joined records where the username is arnaud.gaboury
>>
>> And there, you have it.
>>
>> You can simply redefine the view.
>>
>>   SELECT *,((mailusers.domain_name || '/'::text) ||
>> mailusers.username) || '/'::text AS home_dir
>> FROM email.mailusers;
>>
>>
>> Thank you so much. This way I get all needed info in one view.
>>
>
I even added a new column in my view to catch the email address:


SELECT *,((mailusers.domain_name || '/'::text) || mailusers.username) ||
'/'::text AS home_dir,
username||'@'||domain_name AS email_address
FROM email.mailusers;

Now, I am wondering if there is a best practice between:
- one large and complete table
- a lighter table with some views



>
> This might help understand what is going on:
>
> http://www.postgresql.org/docs/9.5/interactive/tutorial-join.html


Thank you for the link.

>
>
>
>>
>> Notice the * after the SELECT statement.
>> So you have all the data plus the homedir.
>>
>> You can leave out the whole view thing and incorporate the home_dir
>> expression right into your select.
>> Or you can write a function which does this for you with the username
>> as argument.
>>
>> Regards,
>> Sándor
>>
>>
>>
>>
>>
>> --
>>
>> google.com/+arnaudgabourygabx
>>
>>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>



-- 

google.com/+arnaudgabourygabx



Re: [GENERAL] Query from two tables return error

2016-04-01 Thread Adrian Klaver

On 04/01/2016 06:52 AM, arnaud gaboury wrote:



On Fri, Apr 1, 2016 at 3:33 PM, Sándor Daku 




--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Query from two tables return error

2016-04-01 Thread arnaud gaboury
On Fri, Apr 1, 2016 at 3:33 PM, Sándor Daku  wrote:

>
> On 1 April 2016 at 11:33, arnaud gaboury  wrote:
>
>> I have one table email.mail.mailusers :
>>
>>Table "email.mailusers"
>>Column|   Type   |
>> Modifiers  | Storage
>>   | Stats target | Description
>>
>> -+--+-+
>> --+--+-
>>  username| text | not
>> null| extende
>> d |  |
>>  password| text | not
>> null| extende
>> d |  |
>>  domain_name | text | not null default '
>> thetradinghall.com'::text | extende
>> d |  |
>>  created | timestamp with time zone | not null default
>> now()  | plain
>>   |  |
>> Indexes:
>> "mailusers_pkey" PRIMARY KEY, btree (username, domain_name)
>> Foreign-key constraints:
>> "mailusers_domain_fk" FOREIGN KEY (domain_name) REFERENCES
>> email.domainlist(domain_name)
>>
>> username|password  |domain_name |  created
>>
>> +--+---
>>  arnaud.gaboury | YY | thetradinghall.com | 2016-02-04
>> 09:48:58.834774+01
>>
>>  admin  |XXXYY | thetradinghall.com | 2016-03-29
>> 09:58:14.599743+02
>>
>>  postmaster | XXXYY | thetradinghall.com | 2016-03-31
>> 16:36:18.96176+02
>>
>> --
>>
>> I have one view email.mail_dir :
>>
>> View "email.mail_dir"
>>   Column  | Type | Modifiers | Storage  | Description
>> --+--+---+--+-
>>  home_dir | text |   | extended |
>> View definition:
>>  SELECT ((mailusers.domain_name || '/'::text) || mailusers.username) ||
>> '/'::text AS home_dir
>>FROM email.mailusers;
>>
>>   home_dir
>> 
>>  thetradinghall.com/arnaud.gaboury/
>>  thetradinghall.com/admin/
>>  thetradinghall.com/postmaster/
>>
>> -
>>
>> What I am trying to do: I want the home_dir to be returned for
>> username u.
>>
>
> One of the many difficulties with computers is that they do what you tell
> them to do, not what you think you are saying. :)
> Let's see:
>
> SELECT d.home_dir
> FROM email.mail_dir d, email.mailusers u  <- make a join between mail_dir
> and mailusers = join every(!) record from the first table with every(!)
> record from the second table
> WHERE u.username='arnaud.gaboury'; <- but I need only those from the
> joined records where the username is arnaud.gaboury
>
> And there, you have it.
>
>
> You can simply redefine the view.
>
>  SELECT *,((mailusers.domain_name || '/'::text) || mailusers.username) ||
> '/'::text AS home_dir
>FROM email.mailusers;
>

Thank you so much. This way I get all needed info in one view.

>
> Notice the * after the SELECT statement.
> So you have all the data plus the homedir.
>
> You can leave out the whole view thing and incorporate the home_dir
> expression right into your select.
> Or you can write a function which does this for you with the username as
> argument.
>
> Regards,
> Sándor
>
>
>


-- 

google.com/+arnaudgabourygabx



Re: [GENERAL] Query from two tables return error

2016-04-01 Thread Sándor Daku
On 1 April 2016 at 11:33, arnaud gaboury  wrote:

> I have one table email.mail.mailusers :
>
>Table "email.mailusers"
>Column|   Type   |
> Modifiers  | Storage
>   | Stats target | Description
>
> -+--+-+
> --+--+-
>  username| text | not
> null| extende
> d |  |
>  password| text | not
> null| extende
> d |  |
>  domain_name | text | not null default '
> thetradinghall.com'::text | extende
> d |  |
>  created | timestamp with time zone | not null default
> now()  | plain
>   |  |
> Indexes:
> "mailusers_pkey" PRIMARY KEY, btree (username, domain_name)
> Foreign-key constraints:
> "mailusers_domain_fk" FOREIGN KEY (domain_name) REFERENCES
> email.domainlist(domain_name)
>
> username|password  |domain_name |  created
>
> +--+---
>  arnaud.gaboury | YY | thetradinghall.com | 2016-02-04
> 09:48:58.834774+01
>
>  admin  |XXXYY | thetradinghall.com | 2016-03-29
> 09:58:14.599743+02
>
>  postmaster | XXXYY | thetradinghall.com | 2016-03-31
> 16:36:18.96176+02
>
> --
>
> I have one view email.mail_dir :
>
> View "email.mail_dir"
>   Column  | Type | Modifiers | Storage  | Description
> --+--+---+--+-
>  home_dir | text |   | extended |
> View definition:
>  SELECT ((mailusers.domain_name || '/'::text) || mailusers.username) ||
> '/'::text AS home_dir
>FROM email.mailusers;
>
>   home_dir
> 
>  thetradinghall.com/arnaud.gaboury/
>  thetradinghall.com/admin/
>  thetradinghall.com/postmaster/
>
> -
>
> What I am trying to do: I want the home_dir to be returned for username u.
>

One of the many difficulties with computers is that they do what you tell
them to do, not what you think you are saying. :)
Let's see:

SELECT d.home_dir
FROM email.mail_dir d, email.mailusers u  <- make a join between mail_dir
and mailusers = join every(!) record from the first table with every(!)
record from the second table
WHERE u.username='arnaud.gaboury'; <- but I need only those from the joined
records where the username is arnaud.gaboury

And there, you have it.


> But it returns:
>   home_dir
> 
>  thetradinghall.com/arnaud.gaboury/
>  thetradinghall.com/admin/
>  thetradinghall.com/postmaster/
> (3 rows)
>

You would have seen the problem if you had used * instead of d.home_dir.
And there is no solution to your problem in the given circumstances.
You need the username field in the view as well, and then you can:

SELECT d.home_dir
FROM email.mail_dir d, email.mailusers u
WHERE u.username='arnaud.gaboury'
and u.username=d.username;

But it's completely unnecessary.

You can simply redefine the view.

 SELECT *,((mailusers.domain_name || '/'::text) || mailusers.username) ||
'/'::text AS home_dir
   FROM email.mailusers;

Notice the * after the SELECT statement.
So you have all the data plus the homedir.

You can leave out the whole view thing and incorporate the home_dir
expression right into your select.
Or you can write a function which does this for you with the username as
argument.
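
For example (sketches only; the function name is hypothetical, and note
that username alone is not unique if it appears under several domains):

SELECT domain_name || '/' || username || '/' AS home_dir
FROM email.mailusers
WHERE username = 'arnaud.gaboury';

CREATE FUNCTION email.home_dir(p_username text) RETURNS text AS $$
  SELECT domain_name || '/' || username || '/'
  FROM email.mailusers
  WHERE username = p_username;
$$ LANGUAGE sql STABLE;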

Regards,
Sándor


Re: [GENERAL] Query from two tables return error

2016-04-01 Thread arnaud gaboury
On Fri, Apr 1, 2016 at 11:33 AM, arnaud gaboury 
wrote:

> I have one table email.mail.mailusers :
>
>Table "email.mailusers"
>Column|   Type   |
> Modifiers  | Storage
>   | Stats target | Description
>
> -+--+-+
> --+--+-
>  username| text | not
> null| extende
> d |  |
>  password| text | not
> null| extende
> d |  |
>  domain_name | text | not null default '
> thetradinghall.com'::text | extende
> d |  |
>  created | timestamp with time zone | not null default
> now()  | plain
>   |  |
> Indexes:
> "mailusers_pkey" PRIMARY KEY, btree (username, domain_name)
> Foreign-key constraints:
> "mailusers_domain_fk" FOREIGN KEY (domain_name) REFERENCES
> email.domainlist(domain_name)
>
> username|password  |domain_name |  created
>
> +--+---
>  arnaud.gaboury | YY | thetradinghall.com | 2016-02-04
> 09:48:58.834774+01
>
>  admin  |XXXYY | thetradinghall.com | 2016-03-29
> 09:58:14.599743+02
>
>  postmaster | XXXYY | thetradinghall.com | 2016-03-31
> 16:36:18.96176+02
>
> --
>
> I have one view email.mail_dir :
>
> View "email.mail_dir"
>   Column  | Type | Modifiers | Storage  | Description
> --+--+---+--+-
>  home_dir | text |   | extended |
> View definition:
>  SELECT ((mailusers.domain_name || '/'::text) || mailusers.username) ||
> '/'::text AS home_dir
>FROM email.mailusers;
>
>   home_dir
> 
>  thetradinghall.com/arnaud.gaboury/
>  thetradinghall.com/admin/
>  thetradinghall.com/postmaster/
>
> -
>
> What I am trying to do: I want the home_dir to be returned for username u.
>
> The query :
> SELECT d.home_dir
> FROM email.mail_dir d, email.mailusers u
> WHERE u.username='arnaud.gaboury';
>
> But it returns:
>   home_dir
> 
>  thetradinghall.com/arnaud.gaboury/
>  thetradinghall.com/admin/
>  thetradinghall.com/postmaster/
> (3 rows)
>
> It is obviously not what I expect. I am expecting one answer in this
> case:  thetradinghall.com/arnaud.gaboury/
>
> Thank you for any help. I have been trying with no success. Maybe one
> solution could be to create a new view with home_dir | username as
> columns?
>
>
>
> EDIT : I found a solution with creating a new view this way:

SELECT domain_name||'/'||username||'/' AS home_dir,username
FROM email.mailusers;

  home_dir  |username
+
 thetradinghall.com/arnaud.gaboury/ | arnaud.gaboury
 thetradinghall.com/admin/  | admin
 thetradinghall.com/postmaster/ | postmaster


Now I can pick up the home_dir for username u.
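
For example (assuming the redefined view keeps the name email.mail_dir):

SELECT home_dir FROM email.mail_dir WHERE username = 'arnaud.gaboury';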


[GENERAL] Query from two tables return error

2016-04-01 Thread arnaud gaboury
I have one table, email.mailusers:

                                Table "email.mailusers"
   Column    |           Type           |                  Modifiers                   | Storage
-------------+--------------------------+----------------------------------------------+----------
 username    | text                     | not null                                     | extended
 password    | text                     | not null                                     | extended
 domain_name | text                     | not null default 'thetradinghall.com'::text  | extended
 created     | timestamp with time zone | not null default now()                       | plain

Indexes:
    "mailusers_pkey" PRIMARY KEY, btree (username, domain_name)
Foreign-key constraints:
    "mailusers_domain_fk" FOREIGN KEY (domain_name) REFERENCES email.domainlist(domain_name)

    username    | password |    domain_name     |            created
----------------+----------+--------------------+--------------------------------
 arnaud.gaboury | YY       | thetradinghall.com | 2016-02-04 09:48:58.834774+01
 admin          | XXXYY    | thetradinghall.com | 2016-03-29 09:58:14.599743+02
 postmaster     | XXXYY    | thetradinghall.com | 2016-03-31 16:36:18.96176+02

--

I have one view email.mail_dir :

View "email.mail_dir"
  Column  | Type | Modifiers | Storage  | Description
--+--+---+--+-
 home_dir | text |   | extended |
View definition:
 SELECT ((mailusers.domain_name || '/'::text) || mailusers.username) ||
'/'::text AS home_dir
   FROM email.mailusers;

  home_dir

 thetradinghall.com/arnaud.gaboury/
 thetradinghall.com/admin/
 thetradinghall.com/postmaster/

-

What I am trying to do: I want the home_dir to be returned for username u.

The query :
SELECT d.home_dir
FROM email.mail_dir d, email.mailusers u
WHERE u.username='arnaud.gaboury';

But it returns:
  home_dir

 thetradinghall.com/arnaud.gaboury/
 thetradinghall.com/admin/
 thetradinghall.com/postmaster/
(3 rows)

It is obviously not what I expect. I am expecting one answer in this case:
thetradinghall.com/arnaud.gaboury/

Thank you for any help. I have been trying with no success. Maybe one
solution could be to create a new view with home_dir | username as
columns?

-- 

google.com/+arnaudgabourygabx



[GENERAL] query reboot pgsql 9.5.1

2016-03-07 Thread MOLINA BRAVO FELIPE DE JESUS

Hi!!!

I try to explain my problem...sorry for my english :(


In pgsql 9.5.1 I have a two tables with the next structure:

1. Unlogged table «public._gc_cat»

   Column          | Type         | Modifiers
   ----------------+--------------+-----------
   idppicat        | integer      |
   idprodxintegrar | integer      |
   tipo            | character(1) |
   valor           | numeric      |
   estado          | character(1) |
   idsll           | text         |
   idsfte          | text         |
   arama           | text[]       |
   ne_arama        | integer      |
   rama            | text         |
   rvar            | text         |
   nodec           | integer      |

   Indexes:
   "_gc_cat_arama" btree (ne_arama)
   "_gc_cat_arama_gin" gin (arama)

2. Unlogged table «public._gc_tb»

   Column   | Type    | Modifiers
   ---------+---------+-----------
   idb2     | integer |
   idc1     | integer |
   rama     | text    |
   arama    | text[]  |
   ne_arama | integer |

   Indexes:
   "_gc_tb_arama" btree (ne_arama)
   "_gc_tb_arama_gin" gin (arama)
   "_gc_tb_idb2idc1" btree (idb2, idc1)


The table _gc_cat has 91932 records and _gc_tb has 120130 records; when I
run the following query:

SELECT idprodxintegrar
  FROM _gc_tb a
  LEFT join _gc_cat b
    on ( b.arama <@ a.arama and a.arama <@ b.arama )

psql prints the following message (after three minutes approx.):
      Terminated (killed)

and I have to reboot my "guest server".

Now I execute the same in pgsql 9.4.5 and all is fine!!!

The EXPLAINs are:

- pgsql 9.5.1:

  Nested Loop Left Join  (cost=0.03..492944.81 rows=276095 width=4)
->  Seq Scan on _gc_tb a  (cost=0.00..3321.30 rows=120130 width=66)
  ->  Bitmap Heap Scan on _gc_cat b  (cost=0.03..4.06 rows=2 width=70)
Recheck Cond: ((arama <@ a.arama) AND (a.arama <@ arama))
->  Bitmap Index Scan on _gc_cat_arama_gin
(cost=0.00..0.03 rows=2 width=0)
   Index Cond: ((arama <@ a.arama) AND
   (a.arama <@ arama))


- pgsql 9.4.5:
  Nested Loop Left Join  (cost=0.03..492944.81 rows=276095 width=4)
->  Seq Scan on _gc_tb a  (cost=0.00..3321.30 rows=120130 width=66)
   ->  Bitmap Heap Scan on _gc_cat b  (cost=0.03..4.06 rows=2 width=70)
  Recheck Cond: ((arama <@ a.arama) AND (a.arama <@ arama))
   ->  Bitmap Index Scan on _gc_cat_arama_gin
   (cost=0.00..0.03 rows=2 width=0)
  Index Cond: ((arama <@ a.arama) AND
  (a.arama <@ arama))

If i change the query as:
   SELECT idprodxintegrar
 FROM _gc_tb a
 LEFT  join
  _gc_cat b
  on ( a.ne_arama = b.ne_arama and a.arama <@ b.arama )

In pgsql 9.5.1 it finishes after 450708.112 ms.

In pgsql 9.4.5 it finishes after 17996.756 ms (very fast!!!).

The EXPLAINs are:
- pgsql 9.5.1
  Nested Loop Left Join  (cost=3.49..1915550.34 rows=41825277 width=4)
    ->  Seq Scan on _gc_tb a  (cost=0.00..3321.30 rows=120130 width=70)
    ->  Bitmap Heap Scan on _gc_cat b  (cost=3.49..14.39 rows=153 width=74)
          Recheck Cond: (a.arama <@ arama)
          Filter: (a.ne_arama = ne_arama)
          ->  Bitmap Index Scan on _gc_cat_arama_gin  (cost=0.00..3.45 rows=460 width=0)
                Index Cond: (a.arama <@ arama)

- pgsql 9.4.5
  Nested Loop Left Join  (cost=3.48..1868759.71 rows=42284738 width=4)
    ->  Seq Scan on _gc_tb a  (cost=0.00..3321.30 rows=120130 width=70)
    ->  Bitmap Heap Scan on _gc_cat b  (cost=3.48..14.38 rows=115 width=74)
          Recheck Cond: (a.arama <@ arama)
          Filter: (a.ne_arama = ne_arama)
          ->  Bitmap Index Scan on _gc_cat_arama_gin  (cost=0.00..3.45 rows=460 width=0)
                Index Cond: (a.arama <@ arama)


shared_buffers and work_mem are the same in both versions of pgsql (128MB and
4MB respectively).
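
Both values can be confirmed and varied per session, without editing
postgresql.conf — a quick sketch (note that the nested-loop plans above are
not expected to be work_mem-sensitive, so this mainly rules the settings
out):

SHOW shared_buffers;
SHOW work_mem;
SET work_mem = '64MB';   -- session-local; any test value works
EXPLAIN SELECT idprodxintegrar
  FROM _gc_tb a
  LEFT JOIN _gc_cat b ON (b.arama <@ a.arama AND a.arama <@ b.arama);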

I am doing this test on a laptop with the following characteristics:

- HP ProBook with 8 GB RAM, SATA disk, AMD A8-5550M
- OS Linux (fedora 23)
- lxc containers


Dumps of the databases are available at the following links:

  http://sipre.inegi.org.mx/pgsql/dump_pgsql_9_4_5.dump

  http://sipre.inegi.org.mx/pgsql/dump_pgsql_9_5_1.dump


A similar post can be found at:

http://www.postgresql.org/message-id/CALrs2KPMowV6juLdOkMRq_P3MA5VkUmhdM4Q1OD0vCf2qimFfA@mail.gmail.com

thanks in advance!

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] query reboot pgsql 9.5.1

2016-03-04 Thread Alvaro Herrera
David G. Johnston wrote:

> To be more clear, you need to run a query that will complete in our
> lifetime (and without an OOM error) with all three of ANALYZE, BUFFERS, and
> TIMING specified for the EXPLAIN.

I think the problem is pretty clear.  The plan is sensible yet the
result doesn't seem to be.  Why do you think using up all the memory is
a sensible result here?

Jaime Casanova suggested that maybe the @> operator has memory leaks.
Or perhaps the GIN index machinery that's using them.
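
If the leak really is in the GIN path, one crude session-level test is to
steer the planner away from that index — a sketch; whether the fallback plan
finishes in reasonable time is a separate question:

SET enable_bitmapscan = off;  -- session-local; avoids the Bitmap Index Scan on _gc_cat_arama_gin
EXPLAIN SELECT idprodxintegrar
  FROM _gc_tb a
  LEFT JOIN _gc_cat b ON (b.arama <@ a.arama AND a.arama <@ b.arama);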

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] query reboot pgsql 9.5.1

2016-03-04 Thread David G. Johnston
On Fri, Mar 4, 2016 at 3:09 PM, David G. Johnston <
david.g.johns...@gmail.com> wrote:

>
>> 2016-03-04 15:58 GMT-06:00 David G. Johnston:
>>
>>> On Fri, Mar 4, 2016 at 2:56 PM, Felipe de Jesús Molina Bravo <
>>> fjmolinabr...@gmail.com> wrote:
>>>
 When I run without ANALYZE, the output is:

 pba=# EXPLAIN ( TIMING, BUFFERS) SELECT idprodxintegrar FROM _gc_tb a
 LEFT  join _gc_cat b on ( b.arama <@ a.arama  and a.arama <@ b.arama );
 ERROR:  EXPLAIN option BUFFERS requires ANALYZE
 pba=# EXPLAIN ( TIMING) SELECT idprodxintegrar FROM _gc_tb a LEFT  join
 _gc_cat b on ( b.arama <@ a.arama  and a.arama <@ b.arama );
 ERROR:  EXPLAIN option TIMING requires ANALYZE


>>> What is it about those error messages that is confusing you?
>>>
>>
>
> On Fri, Mar 4, 2016 at 3:06 PM, Felipe de Jesús Molina Bravo <
> fjmolinabr...@gmail.com> wrote:
>
>> No, these messages do not confuse me; I just wanted to show the output
>>
>
> Then why don't you add "ANALYZE" to those commands and show meaningful
> output instead of showing us useless errors?
>

To be more clear, you need to run a query that will complete in our
lifetime (and without an OOM error) with all three of ANALYZE, BUFFERS, and
TIMING specified for the EXPLAIN.
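
Spelled out, the requested invocation is the following (a sketch; a LIMIT
would bound the runtime, at the cost of possibly changing the plan):

EXPLAIN (ANALYZE, BUFFERS, TIMING)
SELECT idprodxintegrar
  FROM _gc_tb a
  LEFT JOIN _gc_cat b ON (b.arama <@ a.arama AND a.arama <@ b.arama);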

David J.


Re: [GENERAL] query reboot pgsql 9.5.1

2016-03-04 Thread David G. Johnston
>
>
> 2016-03-04 15:58 GMT-06:00 David G. Johnston:
>
>> On Fri, Mar 4, 2016 at 2:56 PM, Felipe de Jesús Molina Bravo <
>> fjmolinabr...@gmail.com> wrote:
>>
>>> When I run without ANALYZE, the output is:
>>>
>>> pba=# EXPLAIN ( TIMING, BUFFERS) SELECT idprodxintegrar FROM _gc_tb a
>>> LEFT  join _gc_cat b on ( b.arama <@ a.arama  and a.arama <@ b.arama );
>>> ERROR:  EXPLAIN option BUFFERS requires ANALYZE
>>> pba=# EXPLAIN ( TIMING) SELECT idprodxintegrar FROM _gc_tb a LEFT  join
>>> _gc_cat b on ( b.arama <@ a.arama  and a.arama <@ b.arama );
>>> ERROR:  EXPLAIN option TIMING requires ANALYZE
>>>
>>>
>> What is it about those error messages that is confusing you?
>>
>
On Fri, Mar 4, 2016 at 3:06 PM, Felipe de Jesús Molina Bravo <
fjmolinabr...@gmail.com> wrote:

> No, these messages do not confuse me; I just wanted to show the output
>

Then why don't you add "ANALYZE" to those commands and show meaningful
output instead of showing us useless errors?

David J.


Re: [GENERAL] query reboot pgsql 9.5.1

2016-03-04 Thread Felipe de Jesús Molina Bravo
No, these messages do not confuse me; I just wanted to show the output

2016-03-04 15:58 GMT-06:00 David G. Johnston:

> On Fri, Mar 4, 2016 at 2:56 PM, Felipe de Jesús Molina Bravo <
> fjmolinabr...@gmail.com> wrote:
>
>> When I run without ANALYZE, the output is:
>>
>> pba=# EXPLAIN ( TIMING, BUFFERS) SELECT idprodxintegrar FROM _gc_tb a
>> LEFT  join _gc_cat b on ( b.arama <@ a.arama  and a.arama <@ b.arama );
>> ERROR:  EXPLAIN option BUFFERS requires ANALYZE
>> pba=# EXPLAIN ( TIMING) SELECT idprodxintegrar FROM _gc_tb a LEFT  join
>> _gc_cat b on ( b.arama <@ a.arama  and a.arama <@ b.arama );
>> ERROR:  EXPLAIN option TIMING requires ANALYZE
>>
>>
> What is it about those error messages that is confusing you?
>
> David J.
>
>


Re: [GENERAL] query reboot pgsql 9.5.1

2016-03-04 Thread David G. Johnston
On Fri, Mar 4, 2016 at 2:56 PM, Felipe de Jesús Molina Bravo <
fjmolinabr...@gmail.com> wrote:

> When I run without ANALYZE, the output is:
>
> pba=# EXPLAIN ( TIMING, BUFFERS) SELECT idprodxintegrar FROM _gc_tb a
> LEFT  join _gc_cat b on ( b.arama <@ a.arama  and a.arama <@ b.arama );
> ERROR:  EXPLAIN option BUFFERS requires ANALYZE
> pba=# EXPLAIN ( TIMING) SELECT idprodxintegrar FROM _gc_tb a LEFT  join
> _gc_cat b on ( b.arama <@ a.arama  and a.arama <@ b.arama );
> ERROR:  EXPLAIN option TIMING requires ANALYZE
>
>
What is it about those error messages that is confusing you?

David J.


Re: [GENERAL] query reboot pgsql 9.5.1

2016-03-04 Thread Felipe de Jesús Molina Bravo
When I run without ANALYZE, the output is:

pba=# EXPLAIN ( TIMING, BUFFERS) SELECT idprodxintegrar FROM _gc_tb a LEFT
join _gc_cat b on ( b.arama <@ a.arama  and a.arama <@ b.arama );
ERROR:  EXPLAIN option BUFFERS requires ANALYZE
pba=# EXPLAIN ( TIMING) SELECT idprodxintegrar FROM _gc_tb a LEFT  join
_gc_cat b on ( b.arama <@ a.arama  and a.arama <@ b.arama );
ERROR:  EXPLAIN option TIMING requires ANALYZE



If I run with only EXPLAIN, the output is:

                                QUERY PLAN
---------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.03..492944.81 rows=276095 width=4)
   ->  Seq Scan on _gc_tb a  (cost=0.00..3321.30 rows=120130 width=66)
   ->  Bitmap Heap Scan on _gc_cat b  (cost=0.03..4.06 rows=2 width=70)
         Recheck Cond: ((arama <@ a.arama) AND (a.arama <@ arama))
         ->  Bitmap Index Scan on _gc_cat_arama_gin  (cost=0.00..0.03 rows=2 width=0)
               Index Cond: ((arama <@ a.arama) AND (a.arama <@ arama))


Re: [GENERAL] query reboot pgsql 9.5.1

2016-03-04 Thread Alvaro Herrera
Felipe de Jesús Molina Bravo wrote:

> pba=# \i tablas.sql
> DROP TABLE
> SELECT 120130
> CREATE INDEX
> CREATE INDEX
> CREATE INDEX
> DROP TABLE
> SELECT 91932
> CREATE INDEX
> CREATE INDEX
> pba=# EXPLAIN (ANALYZE, TIMING, BUFFERS) SELECT idprodxintegrar FROM _gc_tb
> a LEFT  join _gc_cat b on ( b.arama <@ a.arama  and a.arama <@ b.arama )
> ;
> Killed

It used up so much memory that the OOM-killer terminated it.  That
wouldn't happen with the plan you previously showed, so please do the
same without the ANALYZE option to see what plan it is trying to
execute.
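
Plain EXPLAIN only plans the query and never executes it, so it is safe even
when execution itself exhausts memory:

EXPLAIN SELECT idprodxintegrar
  FROM _gc_tb a
  LEFT JOIN _gc_cat b ON (b.arama <@ a.arama AND a.arama <@ b.arama);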

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] query reboot pgsql 9.5.1

2016-03-04 Thread Felipe de Jesús Molina Bravo
Sorry... I made a mistake... my tables are unlogged,
and in the last test they were wiped... :(

When I create the tables again (with all records), the result is:

pba=# \i tablas.sql
DROP TABLE
SELECT 120130
CREATE INDEX
CREATE INDEX
CREATE INDEX
DROP TABLE
SELECT 91932
CREATE INDEX
CREATE INDEX
pba=# EXPLAIN (ANALYZE, TIMING, BUFFERS) SELECT idprodxintegrar FROM _gc_tb
a LEFT  join _gc_cat b on ( b.arama <@ a.arama  and a.arama <@ b.arama )
;
Killed


:(



2016-03-04 15:30 GMT-06:00 David G. Johnston:

> On Fri, Mar 4, 2016 at 2:16 PM, Felipe de Jesús Molina Bravo <
> fjmolinabr...@gmail.com> wrote:
>
>> the output is:
>>
>> pba=# EXPLAIN (ANALYZE, TIMING, BUFFERS) SELECT idprodxintegrar FROM
>> _gc_tb a LEFT  join _gc_cat b on ( b.arama <@ a.arama  and a.arama <@
>> b.arama )
>> ;
>>                               QUERY PLAN
>> ------------------------------------------------------------------------
>>  Nested Loop Left Join  (cost=0.00..0.01 rows=1 width=4) (actual time=0.003..0.003 rows=0 loops=1)
>>    Join Filter: ((b.arama <@ a.arama) AND (a.arama <@ b.arama))
>>    ->  Seq Scan on _gc_tb a  (cost=0.00..0.00 rows=1 width=66) (actual time=0.002..0.002 rows=0 loops=1)
>>    ->  Seq Scan on _gc_cat b  (cost=0.00..0.00 rows=1 width=70) (never executed)
>>  Planning time: 0.206 ms
>>  Execution time: 0.074 ms
>> (6 rows)
>>
>>
> OK, so this _gc_tb is empty, which means that an extremely fast execution
> time is not surprising.  If the other version actually contains data, I
> would expect it to take considerably longer...
>
> David J.
>
>
>

