Re: [GENERAL] Query Using Massive Temp Space

2017-11-20 Thread legrand legrand
Hi,
there are many Hash and Merge joins that may generate a lot of temp space;
it could also be a problem of a forgotten column in the join.

Could you also provide the index definitions (PK, UK and others)
along with the EXPLAIN (ANALYZE, BUFFERS) output?

(You can limit the scope of the query, to help it finish, by limiting the
number of rows from candidates:
...
FROM match_candidate_address WHERE account_id NOT IN (:exclude_ids)
LIMIT 100
)

Regards
PAscal








Re: [GENERAL] Query Using Massive Temp Space

2017-11-20 Thread Semler Miloslav
Hello,
I have one question: why are you using such a huge number of grouping columns?
Is there some reason for it? It is definitely not a fast method.  I would
prefer to first do a named query grouped by the ids (account_id, candidate_id,
parent_id) and then left join the candidates table (to get the rest of the
requested columns) to this named query on those ids, if that is possible.
You would also avoid hashing these nine columns, so you will save a certain
amount of memory.
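
A minimal sketch of that restructuring (the column names on candidates are
assumptions for illustration, not from the original query):

WITH grouped AS (
    SELECT account_id, candidate_id, parent_id   -- group on the ids only
    FROM match_candidate_address
    GROUP BY account_id, candidate_id, parent_id
)
SELECT g.account_id, g.candidate_id, g.parent_id,
       c.first_name, c.last_name                 -- hypothetical extra columns
FROM grouped g
LEFT JOIN candidates c ON c.id = g.candidate_id;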

Miloslav

From: pgsql-general-ow...@postgresql.org 
[mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Cory Tucker
Sent: Monday, November 20, 2017 7:17 AM
To: pgsql-general@postgresql.org
Subject: [GENERAL] Query Using Massive Temp Space

Hello,

I have a query that is using a tremendous amount of temp disk space given the 
overall size of the dataset.  I'd love for someone to try to explain what PG is 
doing and why it's using so much space for the query.

First off, the system is PG 9.6 on Ubuntu with 4 cores and 28 GB of RAM.  The 
query in question is a fairly large join of several tables (6) including some 
aggregations.  The overall dataset size of the 6 tables in question is about 
20GB and the largest table is about 15M rows.  The query is essentially a dump 
of most of the data from these tables joined together to be used in another 
system.

When the query runs it begins to use an aggressive amount of temp space on the 
volume over the course of many hours, until it reaches about 95% capacity and 
then tapers off.  Never completes though.  The temp space it uses is around 
1.5TB out of a 2TB volume.  Again, the total size of the relations in question 
is only 20GB.

Can anyone explain how the query could possibly use so much temp space?

Query and query plan are attached.  Please let me know if any more info is 
needed.

thanks
--Cory


Re: [GENERAL] Query Using Massive Temp Space

2017-11-19 Thread Laurenz Albe
Cory Tucker wrote:
> I have a query that is using a tremendous amount of temp disk space given the 
> overall size of the dataset.
> I'd love for someone to try to explain what PG is doing and why its using so 
> much space for the query.

It could be a sort or a hash operation.

To determine what is going on, EXPLAIN (ANALYZE, BUFFERS) output
would be useful.
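
You can also watch the cumulative temp-file counters while reproducing the
problem (a sketch; pg_stat_database has had these columns since 9.2):

SELECT temp_files, pg_size_pretty(temp_bytes) AS temp_used
FROM pg_stat_database
WHERE datname = current_database();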

Yours,
Laurenz Albe




[GENERAL] Query Using Massive Temp Space

2017-11-19 Thread Cory Tucker
Hello,

I have a query that is using a tremendous amount of temp disk space given
the overall size of the dataset.  I'd love for someone to try to explain
what PG is doing and why it's using so much space for the query.

First off, the system is PG 9.6 on Ubuntu with 4 cores and 28 GB of RAM.
The query in question is a fairly large join of several tables (6)
including some aggregations.  The overall dataset size of the 6 tables in
question is about 20GB and the largest table is about 15M rows.  The query
is essentially a dump of *most* of the data from these tables joined
together to be used in another system.

When the query runs it begins to use an aggressive amount of temp space on
the volume over the course of many hours, until it reaches about 95%
capacity and then tapers off.  Never completes though.  The temp space it
uses is around 1.5TB out of a 2TB volume.  Again, the *total* size of the
relations in question is only 20GB.

Can anyone explain how the query could possibly use so much temp space?

Query and query plan are attached.  Please let me know if any more info is
needed.

thanks
--Cory


query
Description: Binary data


query_plan
Description: Binary data



Re: [GENERAL] Query on pg_settings view

2017-11-15 Thread Stephen Frost
Greetings,

* Laurenz Albe (laurenz.a...@cybertec.at) wrote:
> Ronen Nofar wrote:
> > I have a weird case when running a query on the pg_settings view.
> > I have two users: the first one is the default user - postgres, which is
> > a superuser - and another one is a role which I had created; I called it
> > test_role and it's not a superuser.
> > When I run a select on pg_settings with these two users I receive
> > different results.
> 
> That is intentional, because some settings should only be
> visible for superusers, for example everything that has to
> do with the operating system (location of configuration file
> or socket directories).

This isn't quite correct any longer - with PG10, we have a default role
called 'pg_read_all_settings' which can be GRANT'd to other roles to
allow viewing of all settings, even those previously restricted to
superuser-only.
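
For example (on PostgreSQL 10 or later; the role name is taken from the
case described up-thread):

GRANT pg_read_all_settings TO test_role;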

Thanks!

Stephen




Re: [GENERAL] Query on pg_settings view

2017-11-15 Thread Laurenz Albe
Ronen Nofar wrote:
> I have a weird case when running a query on the pg_settings view.
> I have two users: the first one is the default user - postgres, which is
> a superuser - and another one is a role which I had created; I called it
> test_role and it's not a superuser.
> When I run a select on pg_settings with these two users I receive
> different results.

That is intentional, because some settings should only be
visible for superusers, for example everything that has to
do with the operating system (location of configuration file
or socket directories).

Yours,
Laurenz Albe




[GENERAL] Query on pg_settings view

2017-11-15 Thread Ronen Nofar
Hi,

I have a weird case when running a query on the pg_settings view.
I have two users: the first one is the default user - postgres, which is a
superuser - and another one is a role which I had created; I called it
test_role and it's not a superuser.
When I run a select on pg_settings with these two users I receive different
results.
Running the query 'select * from pg_settings' with the 'postgres' user returns
269 rows, while running the same query with 'test_role' returns 254 rows.
For example, the following query 'select lower(setting) from pg_settings
where lower(name) = 'config_file'' returns null when I connect with
'test_role', but with the 'postgres' user I receive one row
(/var/lib/pgsql/9.6/data/postgresql.conf).

I have granted all privileges on pg_settings to 'test_role' but it didn't
change the results.

I don't have any idea what is wrong here.

The PostgreSQL version is PostgreSQL 9.6.1 on x86_64-pc-linux-gnu

Can you please help me with this issue?

Thanks in advance,
Ronen


[GENERAL] Query Improvement??

2017-11-05 Thread Bret Stern

Goal is to return all vendors which exist in all three companies

I think I got lucky figuring this out. Is there an obviously better way?

combined_item_master looks like this:
company_code character varying(10) NOT NULL,
primary_vendor_no character varying(7)
..more fields

data looks like this:

company_code | primary_vendor_no
AAA          | 003
BBB          | 004
CCC          | 001
CCC          | 004
AAA          | 123
BBB          | 123
CCC          | 123
BBB          | 003

Query returns all primary_vendor_no (as vendor_locations) which exist in
all three companies
results:
vendor_locations
123


Here's the query

select primary_vendor_no, count(primary_vendor_no) as vendor_locations
from
(
  select distinct primary_vendor_no, company_code
  from combined_item_master
) as a
group by primary_vendor_no
having count(primary_vendor_no) = 3
order by vendor_locations desc, primary_vendor_no
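
An equivalent, slightly simpler form (a sketch; like the original it assumes
exactly these three companies exist) folds the inner DISTINCT away by
counting distinct companies directly:

select primary_vendor_no,
       count(distinct company_code) as vendor_locations
from combined_item_master
group by primary_vendor_no
having count(distinct company_code) = 3
order by primary_vendor_no;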


Thanks
Bret





Re: [GENERAL] Query plan for Merge Semi Join

2017-11-01 Thread Laurenz Albe
Peter J. Holzer wrote:
> [PostgreSQL 9.5.9 on x86_64-pc-linux-gnu, compiled by gcc (Debian
> 4.9.2-10) 4.9.2, 64-bit]
> 
> While investigating a performance issue, I found this query plan:
> 
> wds=> explain analyze
>   select facttablename, columnname, term, concept_id, t.hidden, language, 
> register, c.id, canonicalname, description, parent, type, r.sortorder
>   from term t, concept c, relation r
>   where facttablename='facttable_kon_eh' and columnname='arbeitsvolumen'
> and exists (select 1 from facttable_kon_eh f where 
> f.arbeitsvolumen=t.term  and thema in (values (E'E')))
> and c.id=concept_id and r.child=concept_id;
> [EXPLAIN ANALYZE output quoted in full; it is identical to the plan in
> the original message below]

[GENERAL] Query plan for Merge Semi Join

2017-11-01 Thread Peter J. Holzer
[PostgreSQL 9.5.9 on x86_64-pc-linux-gnu, compiled by gcc (Debian
4.9.2-10) 4.9.2, 64-bit]

While investigating a performance issue, I found this query plan:

wds=> explain analyze
  select facttablename, columnname, term, concept_id, t.hidden, language, 
register, c.id, canonicalname, description, parent, type, r.sortorder
  from term t, concept c, relation r
  where facttablename='facttable_kon_eh' and columnname='arbeitsvolumen'
and exists (select 1 from facttable_kon_eh f where 
f.arbeitsvolumen=t.term  and thema in (values (E'E')))
and c.id=concept_id and r.child=concept_id;
                                                              QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=22984.13..241822.98 rows=1 width=169) (actual time=667.608..3275.037 rows=6 loops=1)
   ->  Nested Loop  (cost=22983.70..241822.50 rows=1 width=154) (actual time=667.587..3274.972 rows=6 loops=1)
         ->  Merge Semi Join  (cost=22983.27..241819.04 rows=1 width=87) (actual time=667.559..3274.877 rows=6 loops=1)
               Merge Cond: ((t.term)::text = (f.arbeitsvolumen)::text)
               ->  Index Scan using term_term_idx on term t  (cost=0.56..206841.09 rows=18 width=87) (actual time=667.467..2929.430 rows=7 loops=1)
                     Filter: (((facttablename)::text = 'facttable_kon_eh'::text) AND ((columnname)::text = 'arbeitsvolumen'::text))
                     Rows Removed by Filter: 3874190
               ->  Materialize  (cost=0.43..399167.10 rows=3548798 width=7) (actual time=0.086..295.708 rows=184791 loops=1)
                     ->  Nested Loop Semi Join  (cost=0.43..390295.10 rows=3548798 width=7) (actual time=0.076..278.006 rows=184791 loops=1)
                           Join Filter: ((f.thema)::text = ('E'::text))
                           ->  Index Scan using facttable_kon_eh_arbeitsvolumen_idx on facttable_kon_eh f  (cost=0.43..337063.11 rows=3548798 width=9) (actual time=0.052..199.733 rows=184791 loops=1)
                           ->  Materialize  (cost=0.00..0.03 rows=1 width=32) (actual time=0.000..0.000 rows=1 loops=184791)
                                 ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
         ->  Index Scan using concept_pkey on concept c  (cost=0.43..3.45 rows=1 width=67) (actual time=0.011..0.012 rows=1 loops=6)
               Index Cond: (id = t.concept_id)
   ->  Index Scan using relation_child_idx on relation r  (cost=0.43..0.47 rows=1 width=19) (actual time=0.008..0.009 rows=1 loops=6)
         Index Cond: (child = c.id)
 Planning time: 15.624 ms
 Execution time: 3275.341 ms

Re: [GENERAL] query not scaling

2017-10-31 Thread Rob Sargent

On 10/31/2017 03:12 AM, Laurenz Albe wrote:
> Rob Sargent wrote:
>
> I think your biggest problem is the join condition
>    on m.basepos between s.startbase and s.endbase
>
> That forces a nested loop join, which cannot be performed efficiently.

Agree! 800,000 * 4,000 = 3,200,000,000.  It's just that I thought I had
corralled that problem with indexing, but apparently not.  I was hoping
some kind soul might point out a missing index or similar correction.  I
have completely reworked the process, but am not sure yet if it's correct.
(The slow answer is correct, once it comes in.)

> You can create indexes that are useful for this query:
>
> ON sui.segment(chrom, markerset_id)
> ON sui.probandset(people_id)
>
> But that probably won't make a big difference, because the sequential
> scans take only a small fraction of your query time.
>
> A little less than half of the query time is spent in the nested loop
> join, and a little more than half of the time is spent doing the
> GROUP BY.
>
> Perhaps the biggest improvement you can easily make would be to
> get rid of "numeric" for the computation.  I suspect that this is
> where a lot of time is spent, since the hash aggregate is over
> less than 15000 rows.
>
> Unless you really need the precision of "numeric", try
>
> CREATE OR REPLACE FUNCTION pv(l bigint, e bigint, g bigint, o int)
> RETURNS double precision LANGUAGE sql AS
> $$SELECT (g+e+o)::double precision / (l+e+g+o)::double precision$$;
>
> Yours,
> Laurenz Albe
In practice markersets are always aligned with one chromosome, so I would
not expect this to have an effect.  There's no constraint on this,
however, and there can be more than one markerset per chromosome.  I
have played with indexing on segment.markerset_id.

In all the data sets used in the examples (runtimes, explains etc.) there
has been a single people_id across the existing segment data.  Down
the road this of course will not be the case, and I can see the value of
an index on probandset.people_id eventually.  I can certainly add it now
for a test.  I'm currently writing a probandset loader hoping to get a
test case for the problem with gin indexing mentioned up-thread.
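
For reference, a concrete form of the indexes suggested above (a sketch;
the index names are illustrative, not from the thread):

CREATE INDEX segment_chrom_markerset_idx ON sui.segment (chrom, markerset_id);
CREATE INDEX probandset_people_id_idx ON sui.probandset (people_id);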


I think I'm most surprised at the notion that the arithmetic is the
problem, and will happily test your suggestion to force floating point
values.  The value can get small (10^-12 on a good day!) but we don't
need many digits of precision.


Thanks





Re: [GENERAL] query not scaling

2017-10-30 Thread Laurenz Albe
On Thu, 2017-10-26 at 19:07 -0600, Rob Sargent wrote:
>    ->  Nested Loop  (cost=3799.40..44686205.23 rows=1361304413 width=40)
>            (actual time=55.443..89684.451 rows=75577302 loops=1)
>
>      ->  Hash Join  (cost=3798.98..43611.56 rows=823591 width=32)
>            (actual time=55.393..1397.509 rows=823591 loops=1)
>
>      ->  Index Scan using marker_chrom_basepos_idx on base.marker m
>            (cost=0.42..37.67 rows=1653 width=20)
>            (actual time=0.010..0.075 rows=92 loops=823591)
>          Index Cond: ((m.chrom = 22) AND (m.basepos >= s.startbase) AND
>                       (m.basepos <= s.endbase))

I think your biggest problem is the join condition
   on m.basepos between s.startbase and s.endbase

That forces a nested loop join, which cannot be performed efficiently.
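
For reference, the range/GiST variant Rob alludes to trying with "<@"
(a sketch with assumed column types and names; the thread reports it
changed the plan but not the runtime):

CREATE INDEX segment_baserange_gist
    ON segment USING gist (int8range(startbase, endbase, '[]'));

-- the join condition then becomes:
--   m.basepos::bigint <@ int8range(s.startbase, s.endbase, '[]')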

Yours,
Laurenz Albe




Re: [GENERAL] query not scaling

2017-10-27 Thread Merlin Moncure
On Thu, Oct 26, 2017 at 10:01 AM, Tom Lane  wrote:
> Laurenz Albe  writes:
>> Also, to have PostgreSQL inline the function, which would be good
>> for performance, it should be declared IMMUTABLE.
>
> Actually, if you hope to have a SQL function be inlined, it's better
> not to decorate it at all --- not with IMMUTABLE, and not with STRICT
> either.  Both of those restrict the parser's ability to inline unless
> it can prove the contained expression is equally immutable/strict.
> With the default attributes of volatile/not strict, there's nothing
> to prove.

This is extremely obnoxious.  Is it possible to raise a warning on
function creation?

> (In any case, it's usually easy enough to tell from EXPLAIN output
> whether inlining has happened.)

No it isn't.  The explain syntax is arcane and inlining as a general
concept is only very indirectly expressed.  I really think we ought to
do better here; I was not able to find any treatment of inlining given
in the 'Performance Tips' or the 'Functions and Operators' section, or
anywhere really (except the wiki).  This is really a disservice to the
users, I think.

merlin




Re: [GENERAL] query not scaling

2017-10-26 Thread Rob Sargent


On 10/26/2017 09:01 AM, Tom Lane wrote:

> Laurenz Albe writes:
>> Also, to have PostgreSQL inline the function, which would be good
>> for performance, it should be declared IMMUTABLE.
>
> Actually, if you hope to have a SQL function be inlined, it's better
> not to decorate it at all --- not with IMMUTABLE, and not with STRICT
> either.  Both of those restrict the parser's ability to inline unless
> it can prove the contained expression is equally immutable/strict.
> With the default attributes of volatile/not strict, there's nothing
> to prove.
>
> (In any case, it's usually easy enough to tell from EXPLAIN output
> whether inlining has happened.)
>
> regards, tom lane


In another instance of the same schema, in the same database as the
original slow execution, I've loaded 823591 segments (though in this case
all of them are on one chromosome, one markerset) and 65K probandsets,
using the same marker table as the slow(est) query.  In the fastest run,
there are only 46K segments for the given markerset.



                                 QUERY PLAN
------------------------------------------------------------------------------
 HashAggregate  (cost=82122076.59..8215.35 rows=14876 width=48) (actual time=208203.091..208210.348 rows=14645 loops=1)
   Output: m.id, min(((1.0 * ((s.events_greater + s.events_equal) + 0))::numeric / (((s.events_less + s.events_equal) + s.events_greater) + 0)::numeric))
   Group Key: m.id
   Buffers: shared hit=43209090
   ->  Nested Loop  (cost=3799.40..44686205.23 rows=1361304413 width=40) (actual time=55.443..89684.451 rows=75577302 loops=1)
         Output: m.id, s.events_greater, s.events_equal, s.events_less
         Buffers: shared hit=43209090
         ->  Hash Join  (cost=3798.98..43611.56 rows=823591 width=32) (actual time=55.393..1397.509 rows=823591 loops=1)
               Output: s.events_greater, s.events_equal, s.events_less, s.startbase, s.endbase
               Inner Unique: true
               Hash Cond: (s.probandset_id = p.id)
               Buffers: shared hit=19222
               ->  Seq Scan on sui.segment s  (cost=0.00..29414.86 rows=823591 width=48) (actual time=0.017..669.915 rows=823591 loops=1)
                     Output: s.id, s.chrom, s.markerset_id, s.probandset_id, s.startbase, s.endbase, s.firstmarker, s.lastmarker, s.events_less, s.events_equal, s.events_greater
                     Filter: ((s.chrom = 22) AND (s.markerset_id = 'edf95066-24d2-4ca1-bad6-aa850cc82fef'::uuid))
                     Buffers: shared hit=17061
               ->  Hash  (cost=2979.99..2979.99 rows=65519 width=16) (actual time=55.272..55.272 rows=65519 loops=1)
                     Output: p.id
                     Buckets: 65536  Batches: 1  Memory Usage: 3584kB
                     Buffers: shared hit=2161
                     ->  Seq Scan on sui.probandset p  (cost=0.00..2979.99 rows=65519 width=16) (actual time=0.007..33.582 rows=65519 loops=1)
                           Output: p.id
                           Filter: (p.people_id = '9b2308b1-9659-4a2c-91ae-8f95cd0a90b3'::uuid)
                           Buffers: shared hit=2161
         ->  Index Scan using marker_chrom_basepos_idx on base.marker m  (cost=0.42..37.67 rows=1653 width=20) (actual time=0.010..0.075 rows=92 loops=823591)
               Output: m.id, m.name, m.chrom, m.basepos, m.alleles
               Index Cond: ((m.chrom = 22) AND (m.basepos >= s.startbase) AND (m.basepos <= s.endbase))
               Buffers: shared hit=43189868
 Planning time: 0.764 ms
 Execution time: 208214.816 ms
(30 rows)



Re: [GENERAL] query not scaling

2017-10-26 Thread Rob Sargent



On 10/26/2017 09:01 AM, Tom Lane wrote:

> Laurenz Albe writes:
>> Also, to have PostgreSQL inline the function, which would be good
>> for performance, it should be declared IMMUTABLE.
>
> Actually, if you hope to have a SQL function be inlined, it's better
> not to decorate it at all --- not with IMMUTABLE, and not with STRICT
> either.  Both of those restrict the parser's ability to inline unless
> it can prove the contained expression is equally immutable/strict.
> With the default attributes of volatile/not strict, there's nothing
> to prove.
>
> (In any case, it's usually easy enough to tell from EXPLAIN output
> whether inlining has happened.)
>
> regards, tom lane

As to the explain analyze:

   could not receive data from server: Connection timed out
   Time: 7877340.565 ms

for the second time.  I had presumed at first that this had occurred
during a network burp.  I'll try running it directly on the pg host.




Re: [GENERAL] query not scaling

2017-10-26 Thread Tom Lane
Laurenz Albe  writes:
> Also, to have PostgreSQL inline the function, which would be good
> for performance, it should be declared IMMUTABLE.

Actually, if you hope to have a SQL function be inlined, it's better
not to decorate it at all --- not with IMMUTABLE, and not with STRICT
either.  Both of those restrict the parser's ability to inline unless
it can prove the contained expression is equally immutable/strict.
With the default attributes of volatile/not strict, there's nothing
to prove.

(In any case, it's usually easy enough to tell from EXPLAIN output
whether inlining has happened.)
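
A minimal illustration of the undecorated style, reusing the pv() function
from this thread (the double precision variant suggested up-thread, just
without IMMUTABLE or STRICT, so there is nothing the parser must prove):

-- inline-friendly: default volatile / not strict attributes
CREATE OR REPLACE FUNCTION pv(l bigint, e bigint, g bigint, o int)
RETURNS double precision LANGUAGE sql AS
$$SELECT (g + e + o)::double precision / (l + e + g + o)::double precision$$;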

regards, tom lane




Re: [GENERAL] query not scaling

2017-10-26 Thread Rob Sargent


> On Oct 26, 2017, at 1:02 AM, Laurenz Albe  wrote:
> 
> Rob Sargent wrote:
>> I have a query I cannot tame and I'm wondering if there's an alternative
>> to the "between" clause I'm using.  Perhaps a custom type could do
>> better?  I've tried the "<@" operator and that changes the query plan
>> significantly but the execution cost/time is not improved.
>> 
>> Any suggestion or pointers much appreciated.
> 
> You really need EXPLAIN (ANALYZE, BUFFERS) output to be able to
> undersrand what is going on.
> 
> A couple of simple things to check:
> 
> - Have all tables been ANALYZED beforehand?
> - Are all optimizer database parameters identical?
> 
> Also, to have PostgreSQL inline the function, which would be good
> for performance, it should be declared IMMUTABLE.
> 
> Yours,
> Laurenz Albe
The explain analyze was (maybe is) still running, but without buffers.
Thought of the immutable bit; will be doing that test.
All tables are vacuumed and analyzed with each structural change.
Thanks




Re: [GENERAL] query not scaling

2017-10-26 Thread Laurenz Albe
Rob Sargent wrote:
> I have a query I cannot tame and I'm wondering if there's an alternative
> to the "between" clause I'm using.  Perhaps a custom type could do
> better?  I've tried the "<@" operator and that changes the query plan
> significantly but the execution cost/time is not improved.
> 
> Any suggestions or pointers much appreciated.

You really need EXPLAIN (ANALYZE, BUFFERS) output to be able to
understand what is going on.

A couple of simple things to check:

- Have all tables been ANALYZED beforehand?
- Are all optimizer database parameters identical?

Also, to have PostgreSQL inline the function, which would be good
for performance, it should be declared IMMUTABLE.

Yours,
Laurenz Albe




[GENERAL] query not scaling

2017-10-25 Thread Rob Sargent

I have a query I cannot tame and I'm wondering if there's an alternative
to the "between" clause I'm using.  Perhaps a custom type could do
better?  I've tried the "<@" operator and that changes the query plan
significantly but the execution cost/time is not improved.

Any suggestions or pointers much appreciated.

Environment: Using a virtual CentOS Linux release 7.4.1708 (Core), 4
cores (2.3GHz), 8G RAM and postgres 10.0 (beta3); shared_buffers = 1GB,
work_mem = 2GB

Domain: (TL/DR) A "segment" is defined by a particular set of people
(probandset.id) plus a subset of markers (markerset.id, startmarker,
endmarker).  I need the minimum p-value for each marker in the set
across all segments matching the set and a specific set of people.  So a
given segment says "I cover all the markers from startbase to endbase",
and each marker has a specific base position (relative to a chromosome).
I'm after the smallest p-value for each marker across the set of
segments which include that marker (from the 'between' clause).

Context: I have the query in a function, so the ids of all the
players are available to the following sql:

select m.id as mkrid
     , min(pv(s.events_less, s.events_equal, s.events_greater, 0)) as optval
from marker m join segment s on m.basepos between s.startbase and s.endbase
     and m.chrom = 1
     and s.chrom = 1
     and s.markerset_id = suppliedMarkersetId
join probandset r on s.probandset_id = r.id
     and r.people_id = suppliedPeopleId
group by m.id

where the pv function is

create or replace function pv(l bigint, e bigint, g bigint, o int)
returns numeric
as
$$
select 1.0*(g+e+o)/(l+e+g+o);
$$
language sql
;


I have the identical schema in two databases (same pg instance) and the
table definitions involved are below.  In one schema there are 1.7M
records in segment and in the other there are 40M rows.  The marker
tables are much more similar, with 600K and 900K respectively.  The third
table, probandset, has 60 and 600 respectively.  On average, 0.8M and 1.8M
segments per markerset_id.

The explains: fast (12sec), then slow (hours).  The part which sticks
out to me is where the "between" gets used.  (I'm betting that
probandset is too small to matter.)  The slower explain plan is very
similar to what I saw originally in the now "fast" data set, and the
current indexing strategy comes largely from that performance work.

It looks like I'm getting a Cartesian product between the number of markers
in a set and the number of segments found: ten zeros at least.
                                 QUERY PLAN
------------------------------------------------------------------------------
 HashAggregate  (cost=291472.27..292040.58 rows=56831 width=48)
   Group Key: m.id
   ->  Nested Loop  (cost=3752.33..167295.52 rows=4515518 width=40)
         ->  Nested Loop  (cost=3751.90..17906.25 rows=715 width=32)
               ->  Seq Scan on probandset r  (cost=0.00..2.77 rows=4 width=16)
                     Filter: (people_id = '4e3b9829-43a8-4f84-9df6-f120dc5b1a7e'::uuid)
               ->  Bitmap Heap Scan on segment s  (cost=3751.90..4473.96 rows=191 width=48)
                     Recheck Cond: ((probandset_id = r.id) AND (chrom = 1) AND (markerset_id = '61a7e5cb-b81d-42e4-9e07-6bd9c2fbe6d1'::uuid))
                     ->  BitmapAnd  (cost=3751.90..3751.90 rows=191 width=0)
                           ->  Bitmap Index Scan on useg  (cost=0.00..72.61 rows=2418 width=0)
                                 Index Cond: ((probandset_id = r.id) AND (chrom = 1))
                           ->  Bitmap Index Scan on segment_markerset_id_idx  (cost=0.00..3676.23 rows=140240 width=0)
                                 Index Cond: (markerset_id = '61a7e5cb-b81d-42e4-9e07-6bd9c2fbe6d1'::uuid)
         ->  Index Scan using marker_chrom_basepos_idx on marker m  (cost=0.42..145.79 rows=6315 width=20)
               Index Cond: ((chrom = 1) AND (basepos >= s.startbase) AND (basepos <= s.endbase))
(15 rows)


                                 QUERY PLAN
------------------------------------------------------------------------------
 HashAggregate  (cost=83131331.81..83132151.44 rows=81963 width=48)
   Group Key: m.id
   ->  Nested Loop  (cost=1907.38..70802659.35 rows=448315362 width=40)
         Join Filter: ((m.basepos >= s.startbase) AND (m.basepos <= s.endbase))
         ->  Bitmap Heap Scan on marker m  (cost=1883.64..11009.18 rows=81963 width=20)
               Recheck Cond: (chrom = 1)
               ->  Bitmap Index Scan on marker_chrom_basepos_idx  (cost=0.00..1863.15 rows=81963 width=0)
                     Index Cond: (chrom = 1)
         ->  Materialize  (cost=23.74..181468.38 rows=49228 width=32)

Re: [GENERAL] query planner placement of sort/limit w.r.t. joins

2017-04-28 Thread David Rowley
On 29 April 2017 at 11:37, David G. Johnston  wrote:
>> > Perhaps there are reasons why this optimization is not safe that I
>> > haven't
>> > thought about?
>>
>> Yeah, I think so. What happens if an A row cannot find a match in B or
>> C? This version of the query will end up returning fewer rows due to
>> that, but the original version would consider other rows with a higher
>> rank.
>>
>> We've danced around a bit with using foreign keys as proofs that rows
>> will exist for other optimisations in the past, but it's tricky ground
>> since foreign keys are not updated immediately, so there are windows
>> where they may not actually hold true to their word.
>
>
> I read this query as having a relation cardinality of one-to-one mandatory -
> which precludes the scenario described.

What makes you say so?

It's pretty easy to show how the queries are not the same.

create table a (
  id int primary key,
  b_id int not null,
  val int not null,
  rank int not null
);

create table b (
  id int primary key,
  c_id int not null,
  val int not null
);

create table c (
  id int primary key,
  val int not null
);
insert into a select x,x,x,x from generate_series(1,150) x;
insert into b select x,x,x from generate_series(51,150) x;
insert into c select x,x from generate_series(51,150) x;

SELECT A.val, B.val, C.val FROM A
   JOIN B ON A.b_id = B.id
   JOIN C ON B.c_id = C.id
   ORDER BY A.rank
   LIMIT 100; -- returns 100 rows

 SELECT D.val, B.val, C.val FROM
   (SELECT * FROM A ORDER BY A.rank LIMIT 100) AS D
   JOIN B ON D.b_id = B.id
   JOIN C ON B.c_id = C.id
   LIMIT 100; -- returns 50 rows


> Is the above saying that, today, there is no planning benefit to setting up
> two deferrable references constraints to enforce the non-optional
> requirement?

There is no place in the planner where a foreign key is used as a
proof that a joined row must exist, with the exception of row
estimations for statistics.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [GENERAL] query planner placement of sort/limit w.r.t. joins

2017-04-28 Thread David G. Johnston
On Fri, Apr 28, 2017 at 3:24 PM, David Rowley 
wrote:

> On 29 April 2017 at 07:59, Dave Vitek  wrote:
> > Is what I want in the query planner's vocabulary?  It would need to
> exploit
> > the fact that the _id columns are not nullable, and either exploit the
> > uniqueness of the id columns or do an extra LIMIT step after the join.  I
> > think I want it to effectively give the same result I expect (haven't
> > tested) it would give for:
>
> Unfortunately, it's not a plan that the current planner will consider.
>
> > SELECT D.val, B.val, C.val FROM
> >(SELECT * FROM A ORDER BY A.rank LIMIT 100) AS D
> >JOIN B ON D.b_id = B.id
> >JOIN C ON B.c_id = C.id
> >LIMIT 100;
> >
> > Perhaps there are reasons why this optimization is not safe that I
> haven't
> > thought about?
>
> Yeah, I think so. What happens if an A row cannot find a match in B or
> C? This version of the query will end up returning fewer rows due to
> that, but the original version would consider other rows with a higher
> rank.
>
> We've danced around a bit with using foreign keys as proofs that rows
> will exist for other optimisations in the past, but it's tricky ground
> since foreign keys are not updated immediately, so there are windows
> where they may not actually hold true to their word.
>

I read this query as having a relation cardinality of one-to-one
mandatory - which precludes the scenario described.

Is the above saying that, today, there is no planning benefit to setting up
two deferrable references constraints to enforce the non-optional
requirement?

I know I'm guilty of not enforcing the non-optional part of the
constraint.  Mostly due to not really realizing it, but also having to
deal with the added syntax to perform inserts.  ORMs, I suspect, would
generally be unaccommodating here as well...

David J.


Re: [GENERAL] query planner placement of sort/limit w.r.t. joins

2017-04-28 Thread David Rowley
On 29 April 2017 at 07:59, Dave Vitek  wrote:
> Is what I want in the query planner's vocabulary?  It would need to exploit
> the fact that the _id columns are not nullable, and either exploit the
> uniqueness of the id columns or do an extra LIMIT step after the join.  I
> think I want it to effectively give the same result I expect (haven't
> tested) it would give for:

Unfortunately, it's not a plan that the current planner will consider.

> SELECT D.val, B.val, C.val FROM
>(SELECT * FROM A ORDER BY A.rank LIMIT 100) AS D
>JOIN B ON D.b_id = B.id
>JOIN C ON B.c_id = C.id
>LIMIT 100;
>
> Perhaps there are reasons why this optimization is not safe that I haven't
> thought about?

Yeah, I think so. What happens if an A row cannot find a match in B or
C? This version of the query will end up returning fewer rows due to
that, but the original version would consider other rows with a higher
rank.

We've danced around a bit with using foreign keys as proofs that rows
will exist for other optimisations in the past, but it's tricky ground
since foreign keys are not updated immediately, so there are windows
where they may not actually hold true to their word.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[GENERAL] query planner placement of sort/limit w.r.t. joins

2017-04-28 Thread Dave Vitek

Hi all,

I have a query I'd like to speed up.  I am wondering whether the query 
planner is capable of coming up with a certain kind of plan for this 
query, and if I might tickle it into doing so, or if I have to 
explicitly use subqueries to get what I want.


Imagine we have tables A, B, and C each having a one to one relationship.

SELECT A.val, B.val, C.val FROM A
   JOIN B ON A.b_id = B.id
   JOIN C ON B.c_id = C.id
   ORDER BY A.rank
   LIMIT 100;


Assume there are btree indices for all the "id" columns, but there is no 
index on "rank."  None of the fields are nullable.  id columns are 
unique.  Assume that there are about 1M tuples in the result set without 
the LIMIT.  ANALYZE has run recently.


In this case, I want postgres to do a sequential scan and a top-N sort 
against A since the ORDER BY only depends on columns of A, and then do 
100 index scans to implement the joins.


What I'm observing is that it is doing sequential scans on A, B, and C, 
joining, and then sorting.  The cost of the sequential scans of B and C 
is large enough to be painful.


Is what I want in the query planner's vocabulary?  It would need to 
exploit the fact that the _id columns are not nullable, and either 
exploit the uniqueness of the id columns or do an extra LIMIT step after 
the join.  I think I want it to effectively give the same result I 
expect (haven't tested) it would give for:


SELECT D.val, B.val, C.val FROM
   (SELECT * FROM A ORDER BY A.rank LIMIT 100) AS D
   JOIN B ON D.b_id = B.id
   JOIN C ON B.c_id = C.id
   LIMIT 100;

Perhaps there are reasons why this optimization is not safe that I 
haven't thought about?


The query is being generated for use by an ORM, so changing it to use a 
subquery for A and the ORDER BY+LIMIT is not so easy (plus, with 
different user input, it might need B or C in the ORDER BY).



- Dave





Re: [GENERAL] Query which "never" terminates

2017-04-27 Thread Viktor Fougstedt

Hello,

and thanks for your reply.

SET from_collapse_limit=1; 

did no difference, I’m afraid.

I ran both VACUUM and ANALYZE before I posted, and there was no noticeable 
difference from either of them.


Regards,
/Viktor

> On 27 apr 2017, at 15:54, pinker  wrote:
> 
> Try to change from_collapse_limit values at first; for instance, run SET
> from_collapse_limit = 1; and then your query in the same session.
> Have you run ANALYZE lately?
> 
> 
> 




Re: [GENERAL] Query which "never" terminates

2017-04-27 Thread pinker
Try to change from_collapse_limit values at first; for instance, run SET
from_collapse_limit = 1; and then your query in the same session.
Have you run ANALYZE lately?







[GENERAL] Query which "never" terminates

2017-04-27 Thread Viktor Fougstedt
Dear sirs.

I have an issue which I am at a loss to solve. Because of the behaviour I am 
not certain that this is a “slow query” problem, so I send my question here.

My question regards a query using deep subqueries. EXPLAIN gives a maximum cost 
of about 2.000 (details below), but the query runs "forever" (at least 3 
hours). It takes 100% CPU on the machine while running. It is expected to 
produce 0 rows. I have auto-vacuum enabled, but have also manually VACUUM:ed 
and ANALYZE:d the database.

I am using "PostgreSQL 9.4.9 on x86_64-apple-darwin14.5.0, compiled by Apple 
LLVM version 7.0.2 (clang-700.1.81), 64-bit)".

The query has worked as expected before, but stopped working. See below for 
query, EXPLAIN output, definitions of all tables, the tables' sizes, and a gdb 
backtrace of the "hung" process. I apologize for the SQL, which is horribly 
ugly. It is in part automatically generated, and therefore looks like . I 
have tried to format it in as readable a form as possible.

I have manually executed the innermost subquery ("SELECT n.account, ..."), and 
it produces 1 row in negligible time. The next levels out ("SELECT s3.nid, ...", 
"SELECT s4.nid, ..." and "SELECT i.assignment, ...") all produce 0 rows and run 
fine.

I have run "CREATE TABLE tmptbl AS SELECT i.assignment, ...", running the 
topmost subquery and all its subqueries. They produce 0 rows in negligible 
time. If I replace the subquery with a select from that temporary table, 
"SELECT s6.nid AS orgunit, i.tags_point AS tags_point FROM tmptbl, ...", that 
query runs fine, producing 0 rows in notime.

It is only when I use the complete query that it hangs - the individual parts 
all run.

A "select * from pg_stat_activity where waiting='t'” while the query is running 
produces 0 rows.

I also think that locking problems are unlikely, since the query works if I 
split it in two and use a table to store
intermediate output (which is 0 rows, BTW). Unless an inner query can hang on a 
lock held by an outer one.

The EXPLAIN (since it does not terminate I cannot run EXPLAIN ANALYZE): 
https://explain.depesz.com/s/Ex2z

See below for all the ugly details.

Thank you in advance, for all and any help.
/Viktor


SELECT s6.nid AS orgunit, i.tags_point AS tags_point
  FROM (SELECT i.assignment AS assignment, i.tags_point AS tags_point
  FROM (SELECT s4.nid AS assignment, i.tags_point AS tags_point
  FROM (SELECT s3.nid AS person, i.tags_point AS tags_point
  FROM (SELECT n.account AS account, tp.point AS 
tags_point
  FROM "pdb4_ad"."account_name" n,
   UNNEST(ARRAY[0]) AS injected_points,
   tags_point tp
 WHERE injected_points = tp.point
   AND int8range(n._r_from, n._r_to) @> 
tp.rev
   AND (n._value = '__bootstrap__')
   AND tsrange(n._start, n._stop) @> tp.time
   ) i,
   tags_point AS tp,
   "pdb4_ad"."account" AS s1,
   "pdb4_ad"."identifies" AS s2,
   "pdb4_person"."person" AS s3
 WHERE i.tags_point = tp.point
   AND i.account = s1.nid
   AND s1.nid = s2."account"
   AND s2."person" = s3.nid
   AND INT8RANGE(s1._r_from, s1._r_to) @> tp.rev
   AND TSRANGE(s1._start, s1._stop) @> tp.time
   AND INT8RANGE(s2._r_from, s2._r_to) @> tp.rev
   AND TSRANGE(s2._start, s2._stop) @> tp.time
   AND INT8RANGE(s3._r_from, s3._r_to) @> tp.rev
   AND TSRANGE(s3._start, s3._stop) @> tp.time
   ) i,
   tags_point AS tp,
   "pdb4_person"."person" AS s1,
   "pdb4_org"."assignment" AS s2,
   "pdb4_org"."assignment_active" AS s3,
   "pdb4_org"."assignment" AS s4
WHERE i.tags_point = tp.point
  AND i.person = s1.nid
  AND s1.nid = s2."person"
  AND s2.nid = s3."assignment"
  AND s3."assignment" = s4.nid
  AND INT8RANGE(s1._r_from, s1._r_to) @> tp.rev
  AND TSRANGE(s1._start, s1._stop) @> tp.time
  AND INT8RANGE(s2._r_from, s2._r_to) @> tp.rev
  AND TSRANGE(s2._start, s2._stop) @> tp.time
  AND INT8RANGE(s3._r_from, s3._r_to) @> tp.rev
  AND TSRANGE(s3._start, s3._stop) @> tp.time
  AND 

Re: [GENERAL] Query never completes with low work_mem (at least not within one hour)

2017-04-05 Thread Daniel Westermann
Daniel Westermann  writes: 
>> Thank you, Merlin. As said I know that "not in" is not a good choice in this 
>> case but I still do not get what is going on here. Why does the server 
>> repeatedly search for NULL values when I decrease work_mem, and why not when 
>> increasing work_mem? 

>The core point is that one plan is using a hashed subplan and the other is 
>not, because the planner estimated that the hashtable wouldn't fit into 
>work_mem. With a hashtable you'll have one probe into the hashtable per 
>outer row, and each probe is O(1) unless you are unlucky about data 
>distributions, so the runtime is more or less linear. Without a 
>hashtable, the inner table is rescanned for each outer row, so the 
>runtime is O(N^2) which gets pretty bad pretty fast. "Materializing" 
>the inner table doesn't really help: it gets rid of per-inner-row 
>visibility checks and some buffer locking overhead, so it cuts the 
>constant factor some, but the big-O situation is still disastrous. 

Thanks, Tom 


Re: [GENERAL] Query never completes with low work_mem (at least not within one hour)

2017-04-05 Thread Tom Lane
Daniel Westermann  writes:
> Thank you, Merlin. As said I know that "not in" is not a good choice in this 
> case but I still do not get what is going on here. Why does the server 
> repeatedly search for NULL values when I decrease work_mem and why not when 
> increasing work_mem? 

The core point is that one plan is using a hashed subplan and the other is
not, because the planner estimated that the hashtable wouldn't fit into
work_mem.  With a hashtable you'll have one probe into the hashtable per
outer row, and each probe is O(1) unless you are unlucky about data
distributions, so the runtime is more or less linear.  Without a
hashtable, the inner table is rescanned for each outer row, so the
runtime is O(N^2) which gets pretty bad pretty fast.  "Materializing"
the inner table doesn't really help: it gets rid of per-inner-row
visibility checks and some buffer locking overhead, so it cuts the
constant factor some, but the big-O situation is still disastrous.
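
For completeness, the anti-join rewrite Pavel suggests elsewhere in this
thread sidesteps the problem entirely (a sketch; note that NOT EXISTS
treats NULLs differently from NOT IN, which only matters if ids.id can be
NULL):

SELECT count(user_id)
FROM users u
WHERE NOT EXISTS (SELECT 1 FROM ids i WHERE i.id = u.user_id);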

regards, tom lane




Re: [GENERAL] Query never completes with low work_mem (at least not within one hour)

2017-04-05 Thread Daniel Westermann
2017-04-05 10:33 GMT+02:00 Daniel Westermann < 
daniel.westerm...@dbi-services.com > : 



2017-04-05 10:13 GMT+02:00 Daniel Westermann < 
daniel.westerm...@dbi-services.com > : 


2017-04-05 9:28 GMT+02:00 Daniel Westermann < 
daniel.westerm...@dbi-services.com > : 


>>what is result of EXPLAIN statement for slow and fast cases? 
>> 
>>regards 
>> 
>>Pavel 

For work_mem=32MB 

explain (analyze,verbose,buffers) select count(user_id) from users where 
user_id not in ( select id from ids); 
QUERY PLAN 
--
 
Aggregate (cost=83175.01..83175.02 rows=1 width=8) (actual 
time=7945.592..7945.593 rows=1 loops=1) 
Output: count(users.user_id) 
Buffers: shared read=29425 
-> Seq Scan on public.users (cost=16925.01..79425.01 rows=150 width=4) 
(actual time=1928.665..5888.645 rows=249 loops=1) 
Output: users.user_id, users.username 
Filter: (NOT (hashed SubPlan 1)) 
Rows Removed by Filter: 51 
Buffers: shared read=29425 
SubPlan 1 
-> Seq Scan on public.ids (cost=0.00..14425.01 rows=101 width=4) (actual 
time=11.111..868.382 rows=101 loops=1) 
Output: ids.id 
Buffers: shared read=4425 
Planning time: 187.396 ms 
Execution time: 7948.108 ms 
(14 rows) 

Time: 8244.493 ms 

For work_mem='16MB' it does not complete with analyze in one hour. For explain 
only: 

explain (verbose) select count(user_id) from users where user_id not in ( 
select id from ids); 
QUERY PLAN 

 
Aggregate (cost=38748092500.00..38748092500.01 rows=1 width=8) 
Output: count(users.user_id) 
-> Seq Scan on public.users (cost=0.00..38748088750.00 rows=150 width=4) 
Output: users.user_id, users.username 
Filter: (NOT (SubPlan 1)) 
SubPlan 1 
-> Materialize (cost=0.00..23332.01 rows=101 width=4) 
Output: ids.id 
-> Seq Scan on public.ids (cost=0.00..14425.01 rows=101 width=4) 
Output: ids.id 
(10 rows) 




>There is a materialize op more 
> 
>do you have an index on ids.id? 

Yes: 

\d ids 
     Table "public.ids" 
 Column |  Type   | Modifiers 
--------+---------+----------- 
 id     | integer | 
Indexes: 
    "i_ids" UNIQUE, btree (id) 





>>hmm .. NOT IN is just bad :( 
>> 
>>The second is slow because table ids is stored in a temp file, and it is 
>>repeatedly read from the file. In the first case, the ids table is stored in memory. 
>> 
>>SELECT count(user_id) FROM users WHERE NOT EXISTS (SELECT id from ids 
>>where id = user_id) 
>> 

Yes, really bad :) ... and I still do not get it. Even when reading from the 
tempfile all the time it should at least complete within one hour, shouldn't 
it? The tables are not so big: 

select * from pg_size_pretty ( pg_relation_size ('ids' )); 
pg_size_pretty 
 
35 MB 
(1 row) 
select * from pg_size_pretty ( pg_relation_size ('users' )); 
pg_size_pretty 
 
195 MB 
(1 row) 



>> 150 * few ms ~ big time 

Ok, got it. 
Thanks, Pavel 



Re: [GENERAL] Query never completes with low work_mem (at least not within one hour)

2017-04-05 Thread Pavel Stehule
2017-04-05 10:33 GMT+02:00 Daniel Westermann <
daniel.westerm...@dbi-services.com>:

> 2017-04-05 10:13 GMT+02:00 Daniel Westermann  services.com>:
>
>> 2017-04-05 9:28 GMT+02:00 Daniel Westermann > services.com>:
>>
>>> >>what is result of EXPLAIN statement for slow and fast cases?
>>> >>
>>> >>regards
>>> >>
>>> >>Pavel
>>>
>>> For work_mem=32MB
>>>
>>> explain (analyze,verbose,buffers)  select count(user_id) from users
>>> where user_id not in ( select id from ids);
>>>   QUERY
>>> PLAN
>>> 
>>> 
>>> --
>>>  Aggregate  (cost=83175.01..83175.02 rows=1 width=8) (actual
>>> time=7945.592..7945.593 rows=1 loops=1)
>>>Output: count(users.user_id)
>>>Buffers: shared read=29425
>>>->  Seq Scan on public.users  (cost=16925.01..79425.01 rows=150
>>> width=4) (actual time=1928.665..5888.645 rows=249 loops=1)
>>>  Output: users.user_id, users.username
>>>  Filter: (NOT (hashed SubPlan 1))
>>>  Rows Removed by Filter: 51
>>>  Buffers: shared read=29425
>>>  SubPlan 1
>>>->  Seq Scan on public.ids  (cost=0.00..14425.01 rows=101
>>> width=4) (actual time=11.111..868.382 rows=101 loops=1)
>>>  Output: ids.id
>>>  Buffers: shared read=4425
>>>  Planning time: 187.396 ms
>>>  Execution time: 7948.108 ms
>>> (14 rows)
>>>
>>> Time: 8244.493 ms
>>>
>>> For work_mem='16MB' it does not complete with analyze in one hour. For
>>> explain only:
>>>
>>> explain (verbose)  select count(user_id) from users where user_id not in
>>> ( select id from ids);
>>>QUERY
>>> PLAN
>>> 
>>> 
>>>  Aggregate  (cost=38748092500.00..38748092500.01 rows=1 width=8)
>>>Output: count(users.user_id)
>>>->  Seq Scan on public.users  (cost=0.00..38748088750.00 rows=150
>>> width=4)
>>>  Output: users.user_id, users.username
>>>  Filter: (NOT (SubPlan 1))
>>>  SubPlan 1
>>>->  Materialize  (cost=0.00..23332.01 rows=101 width=4)
>>>  Output: ids.id
>>>  ->  Seq Scan on public.ids  (cost=0.00..14425.01
>>> rows=101 width=4)
>>>Output: ids.id
>>> (10 rows)
>>>
>>
>> >There is a materialize op more
>> >
>> >do you have an index on ids.id?
>>
>> Yes:
>>
>> \d ids
>>   Table "public.ids"
>>  Column |  Type   | Modifiers
>> +-+---
>>  id | integer |
>> Indexes:
>> "i_ids" UNIQUE, btree (id)
>>
>>
>
> >>hmm .. NOT IN is just bad :(
> >>
> >>The second is slow because table ids is stored in a temp file, and it is
> repeatedly read from the file. In the first case, the ids table is stored in memory.
> >>
> >>SELECT count(user_id) FROM users WHERE NOT EXISTS (SELECT id from
> ids where id = user_id)
> >>
>
> Yes, really bad :) ... and I still do not get it. Even when reading from
> the tempfile all the time it should at least complete within one hour,
> shouldn't it? The tables are not so big:
>
> select * from pg_size_pretty ( pg_relation_size ('ids' ));
>  pg_size_pretty
> 
>  35 MB
> (1 row)
> select * from pg_size_pretty ( pg_relation_size ('users' ));
>  pg_size_pretty
> 
>  195 MB
> (1 row)
>
>

150 * few ms ~ big time


Re: [GENERAL] Query never completes with low work_mem (at least not within one hour)

2017-04-05 Thread Daniel Westermann
2017-04-05 10:13 GMT+02:00 Daniel Westermann < 
daniel.westerm...@dbi-services.com > : 



2017-04-05 9:28 GMT+02:00 Daniel Westermann < 
daniel.westerm...@dbi-services.com > : 


>>what is result of EXPLAIN statement for slow and fast cases? 
>> 
>>regards 
>> 
>>Pavel 

For work_mem=32MB 

explain (analyze,verbose,buffers) select count(user_id) from users where 
user_id not in ( select id from ids); 
QUERY PLAN 
--
 
Aggregate (cost=83175.01..83175.02 rows=1 width=8) (actual 
time=7945.592..7945.593 rows=1 loops=1) 
Output: count(users.user_id) 
Buffers: shared read=29425 
-> Seq Scan on public.users (cost=16925.01..79425.01 rows=150 width=4) 
(actual time=1928.665..5888.645 rows=249 loops=1) 
Output: users.user_id, users.username 
Filter: (NOT (hashed SubPlan 1)) 
Rows Removed by Filter: 51 
Buffers: shared read=29425 
SubPlan 1 
-> Seq Scan on public.ids (cost=0.00..14425.01 rows=101 width=4) (actual 
time=11.111..868.382 rows=101 loops=1) 
Output: ids.id 
Buffers: shared read=4425 
Planning time: 187.396 ms 
Execution time: 7948.108 ms 
(14 rows) 

Time: 8244.493 ms 

For work_mem='16MB' it does not complete with analyze in one hour. For explain 
only: 

explain (verbose) select count(user_id) from users where user_id not in ( 
select id from ids); 
QUERY PLAN 

 
Aggregate (cost=38748092500.00..38748092500.01 rows=1 width=8) 
Output: count(users.user_id) 
-> Seq Scan on public.users (cost=0.00..38748088750.00 rows=150 width=4) 
Output: users.user_id, users.username 
Filter: (NOT (SubPlan 1)) 
SubPlan 1 
-> Materialize (cost=0.00..23332.01 rows=101 width=4) 
Output: ids.id 
-> Seq Scan on public.ids (cost=0.00..14425.01 rows=101 width=4) 
Output: ids.id 
(10 rows) 




>There is a materialize op more 
> 
>do you have an index on ids.id? 

Yes: 

\d ids 
     Table "public.ids" 
 Column |  Type   | Modifiers 
--------+---------+----------- 
 id     | integer | 
Indexes: 
    "i_ids" UNIQUE, btree (id) 





>>hmm .. NOT IN is just bad :( 
>> 
>>The second is slow because table ids is stored in a temp file, and it is 
>>repeatedly read from the file. In the first case, the ids table is stored in memory. 
>> 
>>SELECT count(user_id) FROM users WHERE NOT EXISTS (SELECT id from ids 
>>where id = user_id) 
>> 

Yes, really bad :) ... and I still do not get it. Even when reading from the 
tempfile all the time it should at least complete within one hour, shouldn't 
it? The tables are not so big: 

select * from pg_size_pretty ( pg_relation_size ('ids' )); 
pg_size_pretty 
 
35 MB 
(1 row) 
select * from pg_size_pretty ( pg_relation_size ('users' )); 
pg_size_pretty 
 
195 MB 
(1 row) 




Re: [GENERAL] Query never completes with low work_mem (at least not within one hour)

2017-04-05 Thread Pavel Stehule
2017-04-05 10:13 GMT+02:00 Daniel Westermann <
daniel.westerm...@dbi-services.com>:

> 2017-04-05 9:28 GMT+02:00 Daniel Westermann  services.com>:
>
>> >>what is result of EXPLAIN statement for slow and fast cases?
>> >>
>> >>regards
>> >>
>> >>Pavel
>>
>> For work_mem=32MB
>>
>> explain (analyze,verbose,buffers)  select count(user_id) from users where
>> user_id not in ( select id from ids);
>>   QUERY
>> PLAN
>> 
>> 
>> --
>>  Aggregate  (cost=83175.01..83175.02 rows=1 width=8) (actual
>> time=7945.592..7945.593 rows=1 loops=1)
>>Output: count(users.user_id)
>>Buffers: shared read=29425
>>->  Seq Scan on public.users  (cost=16925.01..79425.01 rows=150
>> width=4) (actual time=1928.665..5888.645 rows=249 loops=1)
>>  Output: users.user_id, users.username
>>  Filter: (NOT (hashed SubPlan 1))
>>  Rows Removed by Filter: 51
>>  Buffers: shared read=29425
>>  SubPlan 1
>>->  Seq Scan on public.ids  (cost=0.00..14425.01 rows=101
>> width=4) (actual time=11.111..868.382 rows=101 loops=1)
>>  Output: ids.id
>>  Buffers: shared read=4425
>>  Planning time: 187.396 ms
>>  Execution time: 7948.108 ms
>> (14 rows)
>>
>> Time: 8244.493 ms
>>
> For work_mem='16MB' it does not complete with analyze in one hour. For
>> explain only:
>>
>> explain (verbose)  select count(user_id) from users where user_id not in
>> ( select id from ids);
>>QUERY
>> PLAN
>> 
>> 
>>  Aggregate  (cost=38748092500.00..38748092500.01 rows=1 width=8)
>>Output: count(users.user_id)
>>->  Seq Scan on public.users  (cost=0.00..38748088750.00 rows=150
>> width=4)
>>  Output: users.user_id, users.username
>>  Filter: (NOT (SubPlan 1))
>>  SubPlan 1
>>->  Materialize  (cost=0.00..23332.01 rows=101 width=4)
>>  Output: ids.id
>>  ->  Seq Scan on public.ids  (cost=0.00..14425.01
>> rows=101 width=4)
>>Output: ids.id
>> (10 rows)
>>
>
> >There is a materialize op more
> >
> >do you have an index on ids.id?
>
> Yes:
>
> \d ids
>   Table "public.ids"
>  Column |  Type   | Modifiers
> +-+---
>  id | integer |
> Indexes:
> "i_ids" UNIQUE, btree (id)
>
>

hmm .. NOT IN is just bad :(

The second is slow because the ids table is stored in a temp file and it is
repeatedly read from that file. In the first case, the ids table is stored in memory.

SELECT count(user_id) FROM users WHERE NOT EXISTS (SELECT id FROM
ids WHERE id = user_id)
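
An equivalent anti-join spelling, as a sketch (either form lets the planner
use a hash anti join, which degrades gracefully when work_mem is small):

SELECT count(u.user_id)
FROM users u
LEFT JOIN ids i ON i.id = u.user_id   -- anti-join: keep users with no match
WHERE i.id IS NULL;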

Regards

Pavel


Re: [GENERAL] Query never completes with low work_mem (at least not within one hour)

2017-04-05 Thread Daniel Westermann
2017-04-05 9:28 GMT+02:00 Daniel Westermann < 
daniel.westerm...@dbi-services.com > : 



>>what is result of EXPLAIN statement for slow and fast cases? 
>> 
>>regards 
>> 
>>Pavel 

For work_mem=32MB 

explain (analyze,verbose,buffers) select count(user_id) from users where 
user_id not in ( select id from ids); 
                                 QUERY PLAN
----------------------------------------------------------------------------
Aggregate (cost=83175.01..83175.02 rows=1 width=8) (actual 
time=7945.592..7945.593 rows=1 loops=1) 
Output: count(users.user_id) 
Buffers: shared read=29425 
-> Seq Scan on public.users (cost=16925.01..79425.01 rows=150 width=4) 
(actual time=1928.665..5888.645 rows=249 loops=1) 
Output: users.user_id, users.username 
Filter: (NOT (hashed SubPlan 1)) 
Rows Removed by Filter: 51 
Buffers: shared read=29425 
SubPlan 1 
-> Seq Scan on public.ids (cost=0.00..14425.01 rows=101 width=4) (actual 
time=11.111..868.382 rows=101 loops=1) 
Output: ids.id 
Buffers: shared read=4425 
Planning time: 187.396 ms 
Execution time: 7948.108 ms 
(14 rows) 

Time: 8244.493 ms 

For work_mem='16MB' it does not complete with analyze within one hour. For explain
only: 

explain (verbose) select count(user_id) from users where user_id not in ( 
select id from ids); 
                         QUERY PLAN
--------------------------------------------------------------------
Aggregate (cost=38748092500.00..38748092500.01 rows=1 width=8) 
Output: count(users.user_id) 
-> Seq Scan on public.users (cost=0.00..38748088750.00 rows=150 width=4) 
Output: users.user_id, users.username 
Filter: (NOT (SubPlan 1)) 
SubPlan 1 
-> Materialize (cost=0.00..23332.01 rows=101 width=4) 
Output: ids.id 
-> Seq Scan on public.ids (cost=0.00..14425.01 rows=101 width=4) 
Output: ids.id 
(10 rows) 




>There is an extra Materialize op this time 
> 
>do you have an index on ids.id ? 

Yes: 

\d ids 
Table "public.ids" 
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer |
Indexes: 
"i_ids" UNIQUE, btree (id) 



Re: [GENERAL] Query never completes with low work_mem (at least not within one hour)

2017-04-05 Thread Pavel Stehule
2017-04-05 9:28 GMT+02:00 Daniel Westermann <
daniel.westerm...@dbi-services.com>:

> >>what is result of EXPLAIN statement for slow and fast cases?
> >>
> >>regards
> >>
> >>Pavel
>
> For work_mem=32MB
>
> explain (analyze,verbose,buffers)  select count(user_id) from users where
> user_id not in ( select id from ids);
>                                  QUERY PLAN
> ----------------------------------------------------------------------------
>  Aggregate  (cost=83175.01..83175.02 rows=1 width=8) (actual
> time=7945.592..7945.593 rows=1 loops=1)
>Output: count(users.user_id)
>Buffers: shared read=29425
>->  Seq Scan on public.users  (cost=16925.01..79425.01 rows=150
> width=4) (actual time=1928.665..5888.645 rows=249 loops=1)
>  Output: users.user_id, users.username
>  Filter: (NOT (hashed SubPlan 1))
>  Rows Removed by Filter: 51
>  Buffers: shared read=29425
>  SubPlan 1
>->  Seq Scan on public.ids  (cost=0.00..14425.01 rows=101
> width=4) (actual time=11.111..868.382 rows=101 loops=1)
>  Output: ids.id
>  Buffers: shared read=4425
>  Planning time: 187.396 ms
>  Execution time: 7948.108 ms
> (14 rows)
>
> Time: 8244.493 ms
>
> For work_mem='16MB' it does not complete with analyze within one hour. For
> explain only:
>
> explain (verbose)  select count(user_id) from users where user_id not in (
> select id from ids);
>                          QUERY PLAN
> --------------------------------------------------------------------
>  Aggregate  (cost=38748092500.00..38748092500.01 rows=1 width=8)
>Output: count(users.user_id)
>->  Seq Scan on public.users  (cost=0.00..38748088750.00 rows=150
> width=4)
>  Output: users.user_id, users.username
>  Filter: (NOT (SubPlan 1))
>  SubPlan 1
>->  Materialize  (cost=0.00..23332.01 rows=101 width=4)
>  Output: ids.id
>  ->  Seq Scan on public.ids  (cost=0.00..14425.01
> rows=101 width=4)
>Output: ids.id
> (10 rows)
>

There is an extra Materialize op this time

do you have an index on ids.id?

Pavel


Re: [GENERAL] Query never completes with low work_mem (at least not within one hour)

2017-04-05 Thread Daniel Westermann
>>what is result of EXPLAIN statement for slow and fast cases? 
>> 
>>regards 
>> 
>>Pavel 

For work_mem=32MB 

explain (analyze,verbose,buffers) select count(user_id) from users where 
user_id not in ( select id from ids); 
                                 QUERY PLAN
----------------------------------------------------------------------------
Aggregate (cost=83175.01..83175.02 rows=1 width=8) (actual 
time=7945.592..7945.593 rows=1 loops=1) 
Output: count(users.user_id) 
Buffers: shared read=29425 
-> Seq Scan on public.users (cost=16925.01..79425.01 rows=150 width=4) 
(actual time=1928.665..5888.645 rows=249 loops=1) 
Output: users.user_id, users.username 
Filter: (NOT (hashed SubPlan 1)) 
Rows Removed by Filter: 51 
Buffers: shared read=29425 
SubPlan 1 
-> Seq Scan on public.ids (cost=0.00..14425.01 rows=101 width=4) (actual 
time=11.111..868.382 rows=101 loops=1) 
Output: ids.id 
Buffers: shared read=4425 
Planning time: 187.396 ms 
Execution time: 7948.108 ms 
(14 rows) 

Time: 8244.493 ms 

For work_mem='16MB' it does not complete with analyze within one hour. For explain
only: 

explain (verbose) select count(user_id) from users where user_id not in ( 
select id from ids); 
                         QUERY PLAN
--------------------------------------------------------------------
Aggregate (cost=38748092500.00..38748092500.01 rows=1 width=8) 
Output: count(users.user_id) 
-> Seq Scan on public.users (cost=0.00..38748088750.00 rows=150 width=4) 
Output: users.user_id, users.username 
Filter: (NOT (SubPlan 1)) 
SubPlan 1 
-> Materialize (cost=0.00..23332.01 rows=101 width=4) 
Output: ids.id 
-> Seq Scan on public.ids (cost=0.00..14425.01 rows=101 width=4) 
Output: ids.id 
(10 rows) 


Re: [GENERAL] Query never completes with low work_mem (at least not within one hour)

2017-04-05 Thread Pavel Stehule
2017-04-05 8:57 GMT+02:00 Daniel Westermann <
daniel.westerm...@dbi-services.com>:

> >> I have set work_mem to a very low value intentionally for demonstration
> >> purposes:
> >>
> >> postgres=# show work_mem;
> >>  work_mem
> >> ----------
> >>  16MB
> >> (1 row)
> >>
> >> postgres=# show shared_buffers ;
> >>  shared_buffers
> >> ----------------
> >>  128MB
> >> (1 row)
> >>
> >>
> >> When I run the following query ( I know that "not in" is not a good
> choice
> >> here ):
> >>
> >> postgres=# select count(user_id) from users where user_id not in (
> select id
> >> from ids);
>
> >"NOT IN" where the predate is a table column can lead to very poor
> >query plans especially where the haystack is not provably known (at
> >plan time) to contain only not null values.  By reducing work_mem, the
> >server has decided has to repeatedly search the table to search for
> >the presence of null values.  Try converting the query to NOT EXISTS.
>
> Thank you, Merlin. As said, I know that "not in" is not a good choice in
> this case, but I still do not get what is going on here. Why does the server
> repeatedly search for NULL values when I decrease work_mem, and why not when
> increasing work_mem?
>

what is result of EXPLAIN statement for slow and fast cases?

regards

Pavel


>
>
> Regards
> Daniel
>


Re: [GENERAL] Query never completes with low work_mem (at least not within one hour)

2017-04-05 Thread Daniel Westermann
>> I have set work_mem to a very low value intentionally for demonstration 
>> purposes: 
>> 
>> postgres=# show work_mem; 
>> work_mem 
>> ----------
>> 16MB 
>> (1 row) 
>> 
>> postgres=# show shared_buffers ; 
>> shared_buffers 
>> ----------------
>> 128MB 
>> (1 row) 
>> 
>> 
>> When I run the following query ( I know that "not in" is not a good choice 
>> here ): 
>> 
>> postgres=# select count(user_id) from users where user_id not in ( select id 
>> from ids); 

>"NOT IN" where the predate is a table column can lead to very poor 
>query plans especially where the haystack is not provably known (at 
>plan time) to contain only not null values. By reducing work_mem, the 
>server has decided has to repeatedly search the table to search for 
>the presence of null values. Try converting the query to NOT EXISTS. 

Thank you, Merlin. As said, I know that "not in" is not a good choice in this 
case, but I still do not get what is going on here. Why does the server repeatedly 
search for NULL values when I decrease work_mem, and why not when increasing 
work_mem? 


Regards 
Daniel 


Re: [GENERAL] Query never completes with low work_mem (at least not within one hour)

2017-04-04 Thread Merlin Moncure
On Tue, Apr 4, 2017 at 8:20 AM, Daniel Westermann
 wrote:
> Hi,
>
> PostgreSQL 9.6.2 on CentOS 7.3 x64.
>
> This is my data set:
>
> drop table if exists users;
> drop table if exists ids;
> create table users ( user_id int
>, username varchar(50)
>);
> with generator as
> ( select a.*
> from generate_series (1,300) a
>order by random()
> )
> insert into users ( user_id
>   , username
>   )
>select a
> , md5(a::varchar)
>  from generator;
> create unique index i_users on users ( user_id );
> create table ids ( id int );
> insert into ids (id) values ( generate_series ( 250, 350 ) );
> create unique index i_ids on ids ( id );
> analyze users;
> analyze ids;
>
> I have set work_mem to a very low value intentionally for demonstration
> purposes:
>
> postgres=# show work_mem;
>  work_mem
> ----------
>  16MB
> (1 row)
>
> postgres=# show shared_buffers ;
>  shared_buffers
> ----------------
>  128MB
> (1 row)
>
>
> When I run the following query ( I know that "not in" is not a good choice
> here ):
>
> postgres=# select count(user_id) from users where user_id not in ( select id
> from ids);

"NOT IN" where the predate is a table column can lead to very poor
query plans especially where the haystack is not provably known (at
plan time) to contain only not null values.  By reducing work_mem, the
server has decided has to repeatedly search the table to search for
the presence of null values.  Try converting the query to NOT EXISTS.
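
A minimal sketch of that conversion for the query in question:

SELECT count(user_id)
FROM users u
WHERE NOT EXISTS (SELECT 1 FROM ids i WHERE i.id = u.user_id);

Unlike NOT IN, NOT EXISTS carries no special NULL semantics to preserve,
so the planner is free to use an anti join regardless of work_mem.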

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Query never completes with low work_mem (at least not within one hour)

2017-04-04 Thread Daniel Westermann
Hi, 

PostgreSQL 9.6.2 on CentOS 7.3 x64. 

This is my data set: 

drop table if exists users; 
drop table if exists ids; 
create table users ( user_id int 
, username varchar(50) 
); 
with generator as 
( select a.* 
from generate_series (1,300) a 
order by random() 
) 
insert into users ( user_id 
, username 
) 
select a 
, md5(a::varchar) 
from generator; 
create unique index i_users on users ( user_id ); 
create table ids ( id int ); 
insert into ids (id) values ( generate_series ( 250, 350 ) ); 
create unique index i_ids on ids ( id ); 
analyze users; 
analyze ids; 

I have set work_mem to a very low value intentionally for demonstration 
purposes: 

postgres=# show work_mem; 
work_mem 
----------
16MB 
(1 row) 

postgres=# show shared_buffers ; 
shared_buffers 
----------------
128MB 
(1 row) 


When I run the following query ( I know that "not in" is not a good choice here 
): 

postgres=# select count(user_id) from users where user_id not in ( select id 
from ids); 

... this seems to never complete (at least not within one hour). 

Setting work_mem to 32MB and all is fine. 

top shows the session at 100% CPU. No waits are listed in pg_stat_activity: 

postgres=# select pid,wait_event_type,wait_event,state,query from 
pg_stat_activity ; 
  pid  | wait_event_type | wait_event | state  | query
-------+-----------------+------------+--------+-------
17817 | | | active | select count(user_id) from users where user_id not in ( 
select id from ids); 
17847 | | | active | select pid,wait_event_type,wait_event,state,query from 
pg_stat_activity ; 
(2 rows) 

strace more or less always shows this, so something is happening: 

read(14, 
"\0\0\1\0\0\t\30\0\351G1\0\16\0\0\0\1\0\0\t\30\0\352G1\0\16\0\0\0\1\0"..., 
8192) = 8192 
read(14, 
"\1\0\0\t\30\0002J1\0\16\0\0\0\1\0\0\t\30\0003J1\0\16\0\0\0\1\0\0\t"..., 8192) 
= 8192 
read(14, "\0\t\30\0{L1\0\16\0\0\0\1\0\0\t\30\0|L1\0\16\0\0\0\1\0\0\t\30\0"..., 
8192) = 8192 
read(14, 
"\30\0\304N1\0\16\0\0\0\1\0\0\t\30\0\305N1\0\16\0\0\0\1\0\0\t\30\0\306N"..., 
8192) = 8192 
read(14, "\rQ1\0\16\0\0\0\1\0\0\t\30\0\16Q1\0\16\0\0\0\1\0\0\t\30\0\17Q1\0"..., 
8192) = 8192^C 

postgres@pgbox:/u02/pgdata/PG962/ [PG962] ls -la /proc/17817/fd/ 
total 0 
dr-x------. 2 postgres postgres  0 Apr 4 14:45 . 
dr-xr-xr-x. 9 postgres postgres  0 Apr 4 14:34 .. 
lr-x------. 1 postgres postgres 64 Apr 4 14:45 0 -> /dev/null 
l-wx------. 1 postgres postgres 64 Apr 4 14:45 1 -> pipe:[58121] 
lrwx------. 1 postgres postgres 64 Apr 4 14:45 10 -> socket:[58881] 
lr-x------. 1 postgres postgres 64 Apr 4 14:45 11 -> pipe:[58882] 
l-wx------. 1 postgres postgres 64 Apr 4 14:45 12 -> pipe:[58882] 
lrwx------. 1 postgres postgres 64 Apr 4 14:45 13 -> /u02/pgdata/PG962/base/13323/16516 
lrwx------. 1 postgres postgres 64 Apr 4 14:45 14 -> /u02/pgdata/PG962/base/pgsql_tmp/pgsql_tmp17817.1 
l-wx------. 1 postgres postgres 64 Apr 4 14:45 2 -> pipe:[58121] 
lrwx------. 1 postgres postgres 64 Apr 4 14:45 3 -> anon_inode:[eventpoll] 
lrwx------. 1 postgres postgres 64 Apr 4 14:45 4 -> /u02/pgdata/PG962/base/13323/2601 
lrwx------. 1 postgres postgres 64 Apr 4 14:45 5 -> /u02/pgdata/PG962/base/13323/16517 
lr-x------. 1 postgres postgres 64 Apr 4 14:45 6 -> pipe:[58120] 
lrwx------. 1 postgres postgres 64 Apr 4 14:45 7 -> /u02/pgdata/PG962/base/13323/16520 
lrwx------. 1 postgres postgres 64 Apr 4 14:45 8 -> /u02/pgdata/PG962/base/13323/16513 
lrwx------. 1 postgres postgres 64 Apr 4 14:45 9 -> socket:[58126] 

The size of the temp file does not change over time: 

postgres@pgbox:/u02/pgdata/PG962/ [PG962] ls -lha 
/u02/pgdata/PG962/base/pgsql_tmp/pgsql_tmp17817.2 
-rw-------. 1 postgres postgres 14M Apr 4 14:48 
/u02/pgdata/PG962/base/pgsql_tmp/pgsql_tmp17817.2 

What am I missing here? Shouldn't this complete with 16MB work_mem as well, even 
if slower, in less than one hour? Or is this expected? 

Thanks for your help 
Daniel 


Re: [GENERAL] Query with type char

2017-02-16 Thread Christoph Moench-Tegeder
## Egon Frerich (e...@frerich.eu):

> Why are there 0 rows? I expect 3 rows:

From your results I guess that sp3 IS NULL on the three rows where
it is not 'j'. You should use the correct comparison predicate for
this case, i.e. IS DISTINCT FROM instead of <>.
https://www.postgresql.org/docs/9.6/static/functions-comparison.html
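
A quick sketch against the sample_char table from the original post:

SELECT * FROM sample_char WHERE sp3 IS DISTINCT FROM 'j';
-- unlike <>, IS DISTINCT FROM treats NULL as a comparable value,
-- so the three rows where sp3 IS NULL are returned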

Regards,
Christoph

-- 
Spare Space.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Query with type char

2017-02-16 Thread Egon Frerich

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Why are there 0 rows? I expect 3 rows:

> mydb=# SELECT * FROM sample_char;
>  sp1 |   sp2   | sp3
> -+-+-
>1 | Bremen  |
>2 | Hamburg |
>4 | Toronto |
>3 | Bern| j
> (4 rows)
>
> mydb=# SELECT * FROM sample_char WHERE sp3 = 'j';
>  sp1 |   sp2   | sp3
> -+-+-
>3 | Bern| j
> (1 row)
>
> mydb=# SELECT * FROM sample_char WHERE sp3 <> 'j';
>  sp1 | sp2 | sp3
> -+-+-
> (0 rows)

Egon


-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.22 (GNU/Linux)

iQIcBAEBAgAGBQJYpXtvAAoJECc7nbY+bg5uu38P/jXtMHU6AeNwNY5X+B3B9XnD
Wy6AChGzxl1z3pQLpKLrIHPI+y0FFx68vV6TBxpzW/wvpYE7rTM2UfLFyUmcflim
nEhQPUgK/YW3A5yKbjlRQasaosiBrvTvqlP8dSyhIASC1Jxwa152xaTrYOw9+ANO
Z/prP4X1GF7m29P0ULFxthpJlA3FOgtueKcwCVkDQVb2eRISFX8pzhuWKHFHukjx
avMhTo7s0n2unZGmtqrC1m59MR5WYoxbc7Dmwg82/jLqEf4ZYCc4tniqiysl3Tzc
i55xH/aqzVAXhC9PHCxoTU8ey2vyGogRObHo5tCqUpMPNoz0zbeSFrmb829fi3xV
XOKkerx2xXOnxkZkYfCH/a50eWdt9+Nw/ktuHU7G2/UIJQH3DxPi91nuY6trpO5e
nOSqi64F9gm12trCe14Jk8z/Ea7NZCCL5GNrQKXj4zkJF1AagG+bzsefZRz/fkK3
m9Nc2AS9J27mk0jx2SuCc5hHx6o8IdsemZacAamjFa/TYvOau+c6fbHKVvU79uhE
Wxdchse2Zo0/wC6o8nlQqTKmX4xfYKA5Oqk6gmOXC5fqLNOzIYb04IIF6zZAw9Sk
CELmOHyP0ZVyQhAVipqrepgbWhIrRVZV0+a+h5IO/s4QoenFjNV+pKnnwj9BCjw7
duk+MVwoHi6fZk6z601b
=LHt3
-END PGP SIGNATURE-



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Query with large in clauses uses a lot of memory

2016-11-28 Thread greigwise
Wow.  Thanks for the prompt answer.

As a follow-up I was wondering if maybe there would be a way to tell it to
NOT try to plan/execute the query (and instead throw an error) if the memory
usage exceeded X.

Thanks again.

Greig



--
View this message in context: 
http://postgresql.nabble.com/Query-with-large-in-clauses-uses-a-lot-of-memory-tp5931716p5932279.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] query locks up when run concurrently

2016-11-26 Thread azhwkd
Greetings!

I tried running the application using the race detector but no data races
were detected.
To further narrow this down I removed the call to the function in question
from the pool
and instead run it one by one from a queue. This seems to have helped and
the query didn't lock up anymore.
I took this one step further and left the locked up queries sitting there
for a day and they eventually finished. Who would have thought a ~2 second
query could blow up to ~20hrs.


rob stone  schrieb am Fr., 25. Nov. 2016 um 02:34 Uhr:


On Thu, 2016-11-24 at 22:23 +0100, azhwkd wrote:
> It should not be possible because a group does not return to the
> update pool before the update has finished.
> I watched the queries in a postgres client and there was no overlap I
> could see.
> I don't really know what to make of this behavior; sometimes when I
> start the application a few updates go through and eventually it will
> lock up completely and sometimes it locks up immediately - always
> with
> heap_hot_search_buffer using ~20% of all CPU time on the system.
>
>
Hello Sebastian,
You stated that the application is written using go (www.golang.org).
Are you able to run the application with the -race flag?

HTH,
Rob

From wikipedia:-

Lack of race condition safety

There are no restrictions on how goroutines access shared data, making
race conditions possible. Specifically, unless a program explicitly
synchronizes via channels or other means, writes from one goroutine
might be partly, entirely, or not at all visible to another, often with
no guarantees about ordering of writes. Furthermore, Go's internal data
structures like interface values, slice headers, hash tables, and
string headers are not immune to race conditions, so type and memory
safety can be violated in multithreaded programs that modify shared
instances of those types without synchronization.

Instead of language support, safe concurrent programming thus relies on
conventions; for example, Chisnall recommends an idiom called "aliases
xor mutable", meaning that passing a mutable value (or pointer) over a
channel signals a transfer of ownership over the value to its receiver.


Re: [GENERAL] Query regarding deadlock

2016-11-25 Thread Jeff Janes
On Thu, Nov 24, 2016 at 5:44 PM, Yogesh Sharma <
yogesh1.sha...@nectechnologies.in> wrote:

> Dear All,
>
> Thanks in advance.
> I found below deadlock in postgresql logs.
> I cannot change calling of REINDEX and insert query sequence because it is
> execute automatically through some cron script.
>
> ERROR:  deadlock detected
>  DETAIL:  Process 2234 waits for AccessShareLock on relation 16459 of
> database 16385; blocked by process 4111.
>  Process 4111 waits for ShareLock on relation 16502 of database 16385;
> blocked by process 2234.
>  Process 2234: INSERT INTO table1 ( id , unique_id )VALUES( '1', '4')
>  Process 4111: REINDEX TABLE table1
>
> Could you please provide any solution to resolve this deadlock.
>

What are tables 16459 and 16502?  Are they related to each other through
triggers or FK constraints?

Are you reindexing multiple tables in the same transaction?  If not, I
don't see why these should deadlock.  One should win, and the other should
block.

If you are reindexing multiple tables in the same transaction, why are you
doing that?  I can't think of a situation where you couldn't use separate
transactions per table.
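
That is, something like this sketch (the second table name is hypothetical):

REINDEX TABLE table1;
REINDEX TABLE table2;  -- run as separate statements, each in its own transaction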

Cheers,

Jeff


Re: [GENERAL] query locks up when run concurrently

2016-11-25 Thread Adrian Klaver

On 11/24/2016 02:14 PM, azhwkd wrote:


Adrian Klaver schrieb am Do., 24. Nov. 2016 um
22:34 Uhr:

On 11/24/2016 01:23 PM, azhwkd wrote:
> It should not be possible because a group does not return to the
> update pool before the update has finished.

So what is this 'update pool' and what is driving/using it?

In other words how is the determination of the parameters done?

To be more specific, the implication is that a group id can be reused so
what determines that?


The application is written in Go. Every group ID has its own goroutine
and the routine is blocked until the SQL statement returns.
The update process starts with a check to an external API endpoint and
if there is new data available the routine is downloading it, parsing it
and inserting the data into 2 tables. Once that is done, the routine
continues to execute the statement in question using the data it
inserted before for the calculation. Only once this finishes will the
routine start over again.



> I watched the queries in a postgres client and there was no
overlap I could see.

Was this a visual inspection or did you dump the results of the various
query/parameter combinations into tables and do an SQL comparison?


I inspected it visually and also dumped all variables into a file
directly from the application.



> I don't really know what to make of this behavior; sometimes when I
> start the application a few updates go through and eventually it will
> lock up completely and sometimes it locks up immediately - always with

Is there a common thread with regard to the parameters in use when
things lock up?


Do you mean if it always locks on the same parameters? If so then it
does not, sadly



Yes, that would have been too easy. I'm out of ideas for the moment. Rob 
Stone's post looks promising though.



--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Query regarding deadlock

2016-11-24 Thread Tom Lane
Yogesh Sharma  writes:
> ERROR:  deadlock detected
>  DETAIL:  Process 2234 waits for AccessShareLock on relation 16459 of 
> database 16385; blocked by process 4111.
>  Process 4111 waits for ShareLock on relation 16502 of database 16385; 
> blocked by process 2234.
>  Process 2234: INSERT INTO table1 ( id , unique_id )VALUES( '1', '4')
>  Process 4111: REINDEX TABLE table1

Offhand I would not expect those two commands to deadlock in isolation.
Are they parts of larger transactions that take additional locks?

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Query regarding deadlock

2016-11-24 Thread Sameer Kumar
On Fri, 25 Nov 2016, 10:07 a.m. Yogesh Sharma, <
yogesh1.sha...@nectechnologies.in> wrote:

> Dear John,
>
> Thanks for your support.
> I mean to say, the REINDEX runs hourly and the insert query executes every
> minute.
> So, it might be a race condition where these queries run at the same time.
>

Reindex every hour? How did you end up with that? It looks like you got
another problem while fixing one with a dirty solution. Why do you need to
reindex every hour?


> Is there any solution, like adding some check before the REINDEX
> operation is performed, if that is possible?
>
> Regards,
> Yogesh
>
> -Original Message-
> From: pgsql-general-ow...@postgresql.org [mailto:
> pgsql-general-ow...@postgresql.org] On Behalf Of John R Pierce
> Sent: Friday, November 25, 2016 10:55 AM
> To: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Query regarding deadlock
>
> On 11/24/2016 5:44 PM, Yogesh Sharma wrote:
> > I cannot change calling of REINDEX and insert query sequence because it
> is execute automatically through some cron script.
>
> any cron scripts are your own doing, so this statement makes no sense at
> all.
>
>
> --
> john r pierce, recycling bits in santa cruz
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>
-- 

-- 

Best Regards,

*Sameer Kumar | DB Solution Architect*

*ASHNIK PTE. LTD.*

101 Cecil Street, #11-11 Tong Eng Building, Singapore 069533

T: +65 6438 3504 | www.ashnik.com

Skype: sameer.ashnik |   T: +65 8110 0350




Re: [GENERAL] Query regarding deadlock

2016-11-24 Thread Michael Paquier
On Fri, Nov 25, 2016 at 11:14 AM, Sameer Kumar  wrote:
> Does this mean that you reindex quite often based on a schedule? Personally I 
> don't prefer that. To me it is like you are trying to fix something that is 
> not broken.
>
> Ideally reindex only what needs to be reindexed. I would not want to reindex 
> a table in OLTP env.

Like VACUUM FULL, don't forget that REINDEX needs an exclusive lock
when working. As that's very intrusive, usually you take measures on
your database to be sure that you *never* require it, particularly if
this is a production instance.
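
One common workaround on these releases is to rebuild the index side by
side with CREATE INDEX CONCURRENTLY; a sketch with hypothetical index and
column names (indexes backing constraints need extra steps):

CREATE INDEX CONCURRENTLY table1_uid_new ON table1 (unique_id);
DROP INDEX table1_uid;                         -- the bloated old index
ALTER INDEX table1_uid_new RENAME TO table1_uid;
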
-- 
Michael


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Query regarding deadlock

2016-11-24 Thread Jan de Visser

On 2016-11-24 9:06 PM, Yogesh Sharma wrote:


Dear John,

Thanks for your support.
I mean to say, the REINDEX runs hourly and the insert query executes every minute.
So, it might be a race condition where these queries run at the same time.


Why do you need to run REINDEX every hour? That sounds like a stopgap 
solution for another problem.

Is there any solution, like adding some check before the REINDEX operation
is performed, if that is possible?


Try to find out why the cron script is there in the first place. Then go 
from there - eliminate it, or do it only in down periods.
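
If the hourly REINDEX really has to stay, one option is to serialize the
two jobs with an advisory lock. This is only a sketch, and it only helps
if *both* jobs take the same lock; 12345 is an arbitrary application key:

-- in the REINDEX job
SELECT pg_advisory_lock(12345);
REINDEX TABLE table1;
SELECT pg_advisory_unlock(12345);

-- in the inserting job
SELECT pg_advisory_lock(12345);
INSERT INTO table1 ( id , unique_id ) VALUES ( '1', '4');
SELECT pg_advisory_unlock(12345);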




Regards,
Yogesh

-Original Message-
From: pgsql-general-ow...@postgresql.org 
[mailto:pgsql-general-ow...@postgresql.org] On Behalf Of John R Pierce
Sent: Friday, November 25, 2016 10:55 AM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Query regarding deadlock

On 11/24/2016 5:44 PM, Yogesh Sharma wrote:

I cannot change calling of REINDEX and insert query sequence because it is 
execute automatically through some cron script.

any cron scripts are your own doing, so this statement makes no sense at all.






--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Query regarding deadlock

2016-11-24 Thread Sameer Kumar
On Fri, 25 Nov 2016, 9:45 a.m. Yogesh Sharma, <
yogesh1.sha...@nectechnologies.in> wrote:

> Dear All,
>
> Thanks in advance.
> I found below deadlock in postgresql logs.
> I cannot change calling of REINDEX and insert query sequence because it is
> execute automatically through some cron script.
>

Does this mean that you reindex quite often based on a schedule? Personally
I don't prefer that. To me it is like you are trying to fix something that
is not broken.

Ideally reindex only what needs to be reindexed. I would not want to
reindex a table in OLTP env.


> ERROR:  deadlock detected
>  DETAIL:  Process 2234 waits for AccessShareLock on relation 16459 of
> database 16385; blocked by process 4111.
>  Process 4111 waits for ShareLock on relation 16502 of database 16385;
> blocked by process 2234.
>  Process 2234: INSERT INTO table1 ( id , unique_id )VALUES( '1', '4')
>  Process 4111: REINDEX TABLE table1
>



> Could you please provide any solution to resolve this deadlock.
>
> Regards,
> Yogesh
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>
-- 

-- 

Best Regards,

*Sameer Kumar | DB Solution Architect*

*ASHNIK PTE. LTD.*

101 Cecil Street, #11-11 Tong Eng Building, Singapore 069533

T: +65 6438 3504 | www.ashnik.com

Skype: sameer.ashnik |   T: +65 8110 0350



Re: [GENERAL] Query regarding deadlock

2016-11-24 Thread Yogesh Sharma
Dear John,

Thanks for your support.
I mean to say, the REINDEX runs hourly and the insert query executes every minute.
So, it might be a race condition where these queries run at the same time.

Is there any solution, like adding some check before the REINDEX operation
is performed, if that is possible?

Regards,
Yogesh

-Original Message-
From: pgsql-general-ow...@postgresql.org 
[mailto:pgsql-general-ow...@postgresql.org] On Behalf Of John R Pierce
Sent: Friday, November 25, 2016 10:55 AM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Query regarding deadlock

On 11/24/2016 5:44 PM, Yogesh Sharma wrote:
> I cannot change calling of REINDEX and insert query sequence because it is 
> execute automatically through some cron script.

any cron scripts are your own doing, so this statement makes no sense at all.


-- 
john r pierce, recycling bits in santa cruz



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Query regarding deadlock

2016-11-24 Thread John R Pierce

On 11/24/2016 5:44 PM, Yogesh Sharma wrote:

I cannot change calling of REINDEX and insert query sequence because it is 
execute automatically through some cron script.


any cron scripts are your own doing, so this statement makes no sense at 
all.



--
john r pierce, recycling bits in santa cruz



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Query regarding deadlock

2016-11-24 Thread Yogesh Sharma
Dear All,

Thanks in advance.
I found below deadlock in postgresql logs.
I cannot change calling of REINDEX and insert query sequence because it is 
execute automatically through some cron script.

ERROR:  deadlock detected
 DETAIL:  Process 2234 waits for AccessShareLock on relation 16459 of database 
16385; blocked by process 4111.
 Process 4111 waits for ShareLock on relation 16502 of database 16385; blocked 
by process 2234.
 Process 2234: INSERT INTO table1 ( id , unique_id )VALUES( '1', '4')
 Process 4111: REINDEX TABLE table1

Could you please provide any solution to resolve this deadlock.

Regards,
Yogesh

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] query locks up when run concurrently

2016-11-24 Thread rob stone

On Thu, 2016-11-24 at 22:23 +0100, azhwkd wrote:
> It should not be possible because a group does not return to the
> update pool before the update has finished.
> I watched the queries in a postgres client and there was no overlap I
> could see.
> I don't really know what to make of this behavior; sometimes when I
> start the application a few updates go through and eventually it will
> lock up completely and sometimes it locks up immediately - always
> with
> heap_hot_search_buffer using ~20% of all CPU time on the system.
> 
> 
Hello Sebastian,
You stated that the application is written using go (www.golang.org).
Are you able to run the application with the -race flag?

HTH,
Rob

From wikipedia:-

Lack of race condition safety

There are no restrictions on how goroutines access shared data, making
race conditions possible. Specifically, unless a program explicitly
synchronizes via channels or other means, writes from one goroutine
might be partly, entirely, or not at all visible to another, often with
no guarantees about ordering of writes. Furthermore, Go's internal data
structures like interface values, slice headers, hash tables, and
string headers are not immune to race conditions, so type and memory
safety can be violated in multithreaded programs that modify shared
instances of those types without synchronization.

Instead of language support, safe concurrent programming thus relies on
conventions; for example, Chisnall recommends an idiom called "aliases
xor mutable", meaning that passing a mutable value (or pointer) over a
channel signals a transfer of ownership over the value to its receiver.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] query locks up when run concurrently

2016-11-24 Thread azhwkd
Adrian Klaver  schrieb am Do., 24. Nov. 2016 um
22:34 Uhr:

> On 11/24/2016 01:23 PM, azhwkd wrote:
> > It should not be possible because a group does not return to the
> > update pool before the update has finished.
>
> So what is this 'update pool' and what is driving/using it?
>
> In other words how is the determination of the parameters done?
>
> To be more specific, the implication is that a group id can be reused so
> what determines that?
>

The application is written in Go. Every group ID has its own goroutine and
the routine is blocked until the SQL statement returns.
The update process starts with a check to an external API endpoint and if
there is new data available the routine is downloading it, parsing it and
inserting the data into 2 tables. Once that is done, the routine continues
to execute the statement in question using the data it inserted before for
the calculation. Only once this finishes will the routine start over again.


>
> > I watched the queries in a postgres client and there was no overlap I
> could see.
>
> Was this a visual inspection or did you dump the results of the various
> query/parameter combinations into tables and do an SQL comparison?
>

I inspected it visually and also dumped all variables into a file directly
from the application.


>
> > I don't really know what to make of this behavior; sometimes when I
> > start the application a few updates go through and eventually it will
> > lock up completely and sometimes it locks up immediately - always with
>
> Is there a common thread with regard to the parameters in use when
> things lock up?
>

Do you mean if it always locks on the same parameters? If so then it does
not, sadly


>
> > heap_hot_search_buffer using ~20% of all CPU time on the system.
> >
> > 2016-11-24 19:14 GMT+01:00 Adrian Klaver :
> >> On 11/23/2016 10:41 PM, azhwkd wrote:
> >>>
> >>> The group ID is part of the primary key of the group_history table. My
> >>> understanding is that two INSERTs with different group IDs should not
> >>> collide in this case, or am I wrong in thinking this?
> >>
>
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>


Re: [GENERAL] query locks up when run concurrently

2016-11-24 Thread Adrian Klaver

On 11/24/2016 01:23 PM, azhwkd wrote:

It should not be possible because a group does not return to the
update pool before the update has finished.


So what is this 'update pool' and what is driving/using it?

In other words how is the determination of the parameters done?

To be more specific, the implication is that a group id can be reused so 
what determines that?



I watched the queries in a postgres client and there was no overlap I could see.


Was this a visual inspection or did you dump the results of the various 
query/parameter combinations into tables and do an SQL comparison?



I don't really know what to make of this behavior; sometimes when I
start the application a few updates go through and eventually it will
lock up completely and sometimes it locks up immediately - always with


Is there a common thread with regard to the parameters in use when 
things lock up?



heap_hot_search_buffer using ~20% of all CPU time on the system.

2016-11-24 19:14 GMT+01:00 Adrian Klaver :

On 11/23/2016 10:41 PM, azhwkd wrote:


The group ID is part of the primary key of the group_history table. My
understanding is that two INSERTs with different group IDs should not
collide in this case, or am I wrong in thinking this?






--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] query locks up when run concurrently

2016-11-24 Thread azhwkd
It should not be possible because a group does not return to the
update pool before the update has finished.
I watched the queries in a postgres client and there was no overlap I could see.
I don't really know what to make of this behavior; sometimes when I
start the application a few updates go through and eventually it will
lock up completely and sometimes it locks up immediately - always with
heap_hot_search_buffer using ~20% of all CPU time on the system.

2016-11-24 19:14 GMT+01:00 Adrian Klaver :
> On 11/23/2016 10:41 PM, azhwkd wrote:
>>
>> The group ID is part of the primary key of the group_history table. My
>> understanding is that two INSERTs with different group IDs should not
>> collide in this case, or am I wrong in thinking this?
>
>
> After fresh coffee:
>
> In your first post you had this:
> "(In the application this
> query is run up to 10 times in parallel with different parameters)"
>
> where the parameters look to be a timestamp and a group id.
>
> Is it possible that the combination of parameters over the 10 different
> queries is not actually selecting rows in each result set that are unique
> over (group,id,sub_category,"date","hour") between result sets?
>
>
>
>
>>
>> The table definition for group_history is the following:
>>
>> CREATE TABLE public.group_history (
>> group int4 NOT NULL,
>> id int4 NOT NULL,
>> sub_category int4 NOT NULL,
>> "date" date NOT NULL,
>> aa int8 NOT NULL,
>> ab int8 NOT NULL,
>> bb int8 NOT NULL,
>> ba int8 NOT NULL,
>> quantity int8 NOT NULL,
>> "hour" int2 NOT NULL,
>> CONSTRAINT group_history_pk PRIMARY KEY
>> (group,id,sub_category,"date","hour")
>> )
>> WITH (
>> OIDS=FALSE
>> );
>>
>> Kind regards,
>> Sebastian
>>
>
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] query locks up when run concurrently

2016-11-24 Thread Adrian Klaver

On 11/23/2016 10:41 PM, azhwkd wrote:

The group ID is part of the primary key of the group_history table. My
understanding is that two INSERTs with different group IDs should not
collide in this case, or am I wrong in thinking this?


After fresh coffee:

In your first post you had this:
"(In the application this
query is run up to 10 times in parallel with different parameters)"

where the parameters look to be a timestamp and a group id.

Is it possible that the combination of parameters over the 10 different 
queries is not actually selecting rows in each result set that are 
unique over (group,id,sub_category,"date","hour") between result sets?






The table definition for group_history is the following:

CREATE TABLE public.group_history (
group int4 NOT NULL,
id int4 NOT NULL,
sub_category int4 NOT NULL,
"date" date NOT NULL,
aa int8 NOT NULL,
ab int8 NOT NULL,
bb int8 NOT NULL,
ba int8 NOT NULL,
quantity int8 NOT NULL,
"hour" int2 NOT NULL,
CONSTRAINT group_history_pk PRIMARY KEY (group,id,sub_category,"date","hour")
)
WITH (
OIDS=FALSE
);

Kind regards,
Sebastian





--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] query locks up when run concurrently

2016-11-24 Thread Adrian Klaver

On 11/23/2016 10:41 PM, azhwkd wrote:

The group ID is part of the primary key of the group_history table. My
understanding is that two INSERTs with different group IDs should not
collide in this case, or am I wrong in thinking this?

The table definition for group_history is the following:

CREATE TABLE public.group_history (
group int4 NOT NULL,
id int4 NOT NULL,
sub_category int4 NOT NULL,
"date" date NOT NULL,
aa int8 NOT NULL,
ab int8 NOT NULL,
bb int8 NOT NULL,
ba int8 NOT NULL,
quantity int8 NOT NULL,
"hour" int2 NOT NULL,
CONSTRAINT group_history_pk PRIMARY KEY (group,id,sub_category,"date","hour")
)
WITH (
OIDS=FALSE
);


Me slaps head

Windy night last night + lack of sleep + insufficient caffeine = Not 
seeing the table definition before hitting Send on previous post.




Kind regards,
Sebastian

2016-11-24 0:28 GMT+01:00 Adrian Klaver :

On 11/23/2016 01:52 PM, azhwkd wrote:


Greetings!

The parallel calls should not be working on the same row. Each query
services a different group ID on its own and there is no overlap.



Except the INSERT query in the trigger function is working on dates not
group ids.



Kind regards,
Sebastian


Tom Lane schrieb am Mi.,
23. Nov. 2016 um 17:47 Uhr:

azh...@gmail.com  writes:
> I have a query which if run alone usually completes in about 300ms.
> When run in my application this query constantly locks up and bogs
> down all connections of the connection pool (In the application this
> query is run up to 10 times in parallel with different parameters).
> What's really weird is that I can re-run one of the hung queries
from
> the command line while it's hung and it will complete as expected
> while the hung queries continue to use 100% CPU time.

Judging from the EXPLAIN timing, most of the work is in the trigger,
which leads me to wonder if the parallel calls are likely to be
fighting
over inserting/updating the same row in the group_history partition
tables.  Or are you certain that they should be hitting different
rows?

regards, tom lane




--
Adrian Klaver
adrian.kla...@aklaver.com





--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] query locks up when run concurrently

2016-11-24 Thread Adrian Klaver

On 11/23/2016 10:41 PM, azhwkd wrote:

The group ID is part of the primary key of the group_history table. My
understanding is that two INSERTs with different group IDs should not
collide in this case, or am I wrong in thinking this?


The suspicion is that this:

insert into group_history ("group", id, sub_category, "date", aa, ab,
bb, ba, quantity, "hour") 

is not the problem. That this (from the trigger function) is:

execute 'INSERT INTO public.' || quote_ident(_tablename) || ' VALUES ($1.*)
on conflict ("group", id, sub_category, "when", "hour") do




where _tablename is:

_tablename := 'group_history_'||_startdate;


It would be nice to see the schema definitions for group_history and at 
least one of the group_history_startdate tables.
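
For reference, the general shape of such a partition-routing trigger
function is roughly the following sketch; the date-to-suffix format and
the ON CONFLICT action are assumptions, not the poster's actual code, and
each child table must have a matching unique constraint:

CREATE OR REPLACE FUNCTION group_history_partition_function()
RETURNS trigger AS $$
DECLARE
    _startdate text;
    _tablename text;
BEGIN
    _startdate := to_char(NEW."date", 'YYYY_MM_DD');  -- hypothetical suffix
    _tablename := 'group_history_' || _startdate;
    EXECUTE 'INSERT INTO public.' || quote_ident(_tablename)
         || ' SELECT ($1).*'
         || ' ON CONFLICT ("group", id, sub_category, "date", "hour")'
         || ' DO NOTHING'      -- stand-in; the real trigger presumably updates
    USING NEW;
    RETURN NULL;  -- row has been routed to the child; skip insert into parent
END;
$$ LANGUAGE plpgsql;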




The table definition for group_history is the following:

CREATE TABLE public.group_history (
group int4 NOT NULL,
id int4 NOT NULL,
sub_category int4 NOT NULL,
"date" date NOT NULL,
aa int8 NOT NULL,
ab int8 NOT NULL,
bb int8 NOT NULL,
ba int8 NOT NULL,
quantity int8 NOT NULL,
"hour" int2 NOT NULL,
CONSTRAINT group_history_pk PRIMARY KEY (group,id,sub_category,"date","hour")
)
WITH (
OIDS=FALSE
);

Kind regards,
Sebastian





--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] query locks up when run concurrently

2016-11-24 Thread Adrian Klaver

On 11/23/2016 10:26 PM, azhwkd wrote:

I'm sorry. I worded this quite poorly. I meant to say that there were
no log lines added to the postgres logfile at the time.
I hope these are the settings you were referring to (I did not change
anything in the config files vs. the default)

log_destination = 'stderr'
logging_collector = off
log_directory = 'pg_log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_file_mode = 0600
log_truncate_on_rotation = off
log_rotation_age = 1d
log_rotation_size = 10MB
syslog_facility = 'LOCAL0'
syslog_ident = 'postgres'
event_source = 'PostgreSQL'
client_min_messages = notice
log_min_messages = warning
log_min_error_statement = error
log_min_duration_statement = -1
debug_print_parse = off
debug_print_rewritten = off
debug_print_plan = off
debug_pretty_print = on
log_checkpoints = off
log_connections = off
log_disconnections = off
log_duration = off
log_error_verbosity = default
log_hostname = off
log_line_prefix = '%t '
log_lock_waits = off
log_statement = 'none'
log_temp_files = -1
log_timezone = 'UTC'


While troubleshooting this I would enable log_connections and 
log_disconnections and change log_statement to 'mod'. This will help you 
see when and what is happening when you run the concurrent queries. FYI, 
you need to reload the server for the changes to take effect.
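
For example, the relevant postgresql.conf lines would be (a sketch):

log_connections = on
log_disconnections = on
log_statement = 'mod'          # logs INSERT/UPDATE/DELETE statements

followed by a reload, e.g. SELECT pg_reload_conf();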




I'm sorry, it seems like I copied the trigger definition from the
wrong table (item_history is a test table I created yesterday while
trying things which is exactly the same as group_history just a
different name).
The trigger on the group_history table is exactly the same though,
except for the table name.

Kind regards,
Sebastian




--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] query locks up when run concurrently

2016-11-23 Thread azhwkd
The group ID is part of the primary key of the group_history table. My
understanding is that two INSERTs with different group IDs should not
collide in this case, or am I wrong in thinking this?

The table definition for group_history is the following:

CREATE TABLE public.group_history (
group int4 NOT NULL,
id int4 NOT NULL,
sub_category int4 NOT NULL,
"date" date NOT NULL,
aa int8 NOT NULL,
ab int8 NOT NULL,
bb int8 NOT NULL,
ba int8 NOT NULL,
quantity int8 NOT NULL,
"hour" int2 NOT NULL,
CONSTRAINT group_history_pk PRIMARY KEY (group,id,sub_category,"date","hour")
)
WITH (
OIDS=FALSE
);

Kind regards,
Sebastian

2016-11-24 0:28 GMT+01:00 Adrian Klaver :
> On 11/23/2016 01:52 PM, azhwkd wrote:
>>
>> Greetings!
>>
>> The parallel calls should not be working on the same row. Each query
>> services a different group ID on its own and there is no overlap.
>
>
> Except the INSERT query in the trigger function is working on dates not
> group ids.
>
>>
>> Kind regards,
>> Sebastian
>>
>>
>> Tom Lane schrieb am Mi.,
>> 23. Nov. 2016 um 17:47 Uhr:
>>
>> azh...@gmail.com  writes:
>> > I have a query which if run alone usually completes in about 300ms.
>> > When run in my application this query constantly locks up and bogs
>> > down all connections of the connection pool (In the application this
>> > query is run up to 10 times in parallel with different parameters).
>> > What's really weird is that I can re-run one of the hung queries
>> from
>> > the command line while it's hung and it will complete as expected
>> > while the hung queries continue to use 100% CPU time.
>>
>> Judging from the EXPLAIN timing, most of the work is in the trigger,
>> which leads me to wonder if the parallel calls are likely to be
>> fighting
>> over inserting/updating the same row in the group_history partition
>> tables.  Or are you certain that they should be hitting different
>> rows?
>>
>> regards, tom lane
>>
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] query locks up when run concurrently

2016-11-23 Thread azhwkd
I'm sorry. I worded this quite poorly. I meant to say that there were
no log lines added to the postgres logfile at the time.
I hope these are the settings you were referring to (I did not change
anything in the config files vs. the default)

log_destination = 'stderr'
logging_collector = off
log_directory = 'pg_log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_file_mode = 0600
log_truncate_on_rotation = off
log_rotation_age = 1d
log_rotation_size = 10MB
syslog_facility = 'LOCAL0'
syslog_ident = 'postgres'
event_source = 'PostgreSQL'
client_min_messages = notice
log_min_messages = warning
log_min_error_statement = error
log_min_duration_statement = -1
debug_print_parse = off
debug_print_rewritten = off
debug_print_plan = off
debug_pretty_print = on
log_checkpoints = off
log_connections = off
log_disconnections = off
log_duration = off
log_error_verbosity = default
log_hostname = off
log_line_prefix = '%t '
log_lock_waits = off
log_statement = 'none'
log_temp_files = -1
log_timezone = 'UTC'

I'm sorry, it seems like I copied the trigger definition from the
wrong table (item_history is a test table I created yesterday while
trying things which is exactly the same as group_history just a
different name).
The trigger on the group_history table is exactly the same though,
except for the table name.

Kind regards,
Sebastian

2016-11-24 0:26 GMT+01:00 Adrian Klaver :
> On 11/23/2016 01:47 PM, azhwkd wrote:
>>
>> Greetings!
>>
>> Yes I had a look at the logfiles but there is not a single logfile
>> generated when I'm reproducing this issue.
>
>
> We are talking about the Postgres logfile, correct?
>
> I have to believe a log file exists, but maybe no entries at that time.
>
> What are you logging settings?
>
>> Concerning locks I used these queries
>> at https://wiki.postgresql.org/wiki/Lock_Monitoring and they came up
>> empty.
>> The group_history table and its sub-tables do not have any foreign keys
>> as they only contain calculated data and the source data is removed
>> after some time.
>> The trigger which calls the partition function below is defined as
>> follows:
>>
>> create
>> trigger group_history_trigger before insert
>> on
>> item_history for each row execute procedure
>> group_history_partition_function()
>
>
> This trigger is on the table item_history not group history, is that
> correct?
>
> If so how does the item_history table fit into this?
>
>>
>> Kind regards,
>> Sebastian
>
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] query locks up when run concurrently

2016-11-23 Thread Adrian Klaver

On 11/23/2016 01:52 PM, azhwkd wrote:

Greetings!

The parallel calls should not be working on the same row. Each query
services a different group ID on its own and there is no overlap.


Except the INSERT query in the trigger function is working on dates not 
group ids.




Kind regards,
Sebastian


Tom Lane schrieb am Mi.,
23. Nov. 2016 um 17:47 Uhr:

azh...@gmail.com  writes:
> I have a query which if run alone usually completes in about 300ms.
> When run in my application this query constantly locks up and bogs
> down all connections of the connection pool (In the application this
> query is run up to 10 times in parallel with different parameters).
> What's really weird is that I can re-run one of the hung queries from
> the command line while it's hung and it will complete as expected
> while the hung queries continue to use 100% CPU time.

Judging from the EXPLAIN timing, most of the work is in the trigger,
which leads me to wonder if the parallel calls are likely to be fighting
over inserting/updating the same row in the group_history partition
tables.  Or are you certain that they should be hitting different rows?

regards, tom lane




--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] query locks up when run concurrently

2016-11-23 Thread Adrian Klaver

On 11/23/2016 01:47 PM, azhwkd wrote:

Greetings!

Yes I had a look at the logfiles but there is not a single logfile
generated when I'm reproducing this issue.


We are talking about the Postgres logfile, correct?

I have to believe a log file exists, but maybe no entries at that time.

What are you logging settings?


Concerning locks I used these queries
at https://wiki.postgresql.org/wiki/Lock_Monitoring and they came up empty.
The group_history table and its sub-tables do not have any foreign keys
as they only contain calculated data and the source data is removed
after some time.
The trigger which calls the partition function below is defined as follows:

create
trigger group_history_trigger before insert
on
item_history for each row execute procedure
group_history_partition_function()


This trigger is on the table item_history not group history, is that 
correct?


If so how does the item_history table fit into this?



Kind regards,
Sebastian



--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Query with large in clauses uses a lot of memory

2016-11-23 Thread Tom Lane
greigwise  writes:
> So, I decided to try an experiment.  I wrote 2 queries as follows:
> 1 ) select pg_sleep(100) ;
> 2 ) with q (s1, s2) as (select pg_sleep(100), 1)
> select * from q where s2 in ( 1, <lots of comma delimited numbers> )
>
> It looks to me like the connection running the big query is using about 2GB
> more memory than the other one.  I could see why it might use *some* more
> (like 28MB more?), but 2GB more seems excessive. 

Don't hold your breath waiting for that to get better.  Depending on what
I assume about the widths of your numbers, you've got something like 3
million Const parse nodes in that query, so the system is eating something
like 600-700 bytes per Const, which is not all that many copies because
one Const node plus List overhead is probably 100 bytes on a 64-bit
server.  OK, it's not exactly frugal perhaps, but it would not be hard to
get to that at all if you're running the query in a way that requires
keeping a plancache entry for it.  It would take significant work (and
probably some performance sacrifices) to make much of a dent in the
space consumption, and even if we put in the work, I'd only expect to
be able to dent it a bit --- an order-of-magnitude reduction is not in
the cards.  Queries with that many parse elements in them are just not
cheap.

Now, that WHERE condition will eventually get folded to the form

s2 = ANY ('{1,2,...}'::integer[])

and that constant array is a *lot* less space-wasteful, only 4 bytes
per element (or 8 bytes if we're talking bigints).  So the approach I'd
advise is trying to send the query with a constant array to begin with
--- either write it like that, or like

s2 = ANY ($1::integer[])

and send the array as an out-of-line parameter.
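
For example, a sketch using a prepared statement (the table and statement
names here are illustrative only):

PREPARE pick_rows(integer[]) AS
    SELECT * FROM some_table WHERE s2 = ANY ($1);
EXECUTE pick_rows('{1,2,3,4}'::integer[]);

Driver-level bind parameters achieve the same thing without an explicit
PREPARE.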

Don't know how hard it might be to arm-wrestle ActiveRecord into doing
it like that :-(

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Query with large in clauses uses a lot of memory

2016-11-23 Thread greigwise
I had an issue today where the OOM killer terminated one of my postgres
processes.   
On my server I have 8 GB of RAM, shared_buffers is 1 GB and work_mem is
24MB.
I have connection pooling which limits us to 25 connections.  Even if I'm
maxed out there, I'm still only using 1.6 GB of RAM of my 8, which seems like
it shouldn't be a problem.

Looking through my postgres logs, I noticed that right about the time of the
OOM incident, I had some queries running with pretty massive in clauses
(thank you ruby/ActiveRecord).  One of the queries was about 28MB in size.

So, I decided to try an experiment.  I wrote 2 queries as follows:
1 ) select pg_sleep(100) ;
2 ) with q (s1, s2) as (select pg_sleep(100), 1)
select * from q where s2 in ( 1, <lots of comma delimited numbers> )

I ran those queries via psql and did this:

-sh-4.1$ ps aux | grep -i -E "local|COMMAND" | grep -v ruby
USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
postgres 20896 27.0 28.2 3416812 2132112 ? Ss   21:18   0:02 postgres:
hireology hireology [local] SELECT
postgres 20899  0.0  0.0 1281368 4800 ?Ss   21:18   0:00 postgres:
hireology hireology [local] SELECT

It looks to me like the connection running the big query is using about 2GB
more memory than the other one.  I could see why it might use *some* more
(like 28MB more?), but 2GB more seems excessive. 

So, the question is why does it use so much more memory.  And is there
anything I can do to limit this problem other than fixing the silly queries?

Thanks in advance for any help,
Greig Wise



--
View this message in context: 
http://postgresql.nabble.com/Query-with-large-in-clauses-uses-a-lot-of-memory-tp5931716.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] query locks up when run concurrently

2016-11-23 Thread azhwkd
Greetings!

The parallel calls should not be working on the same row. Each query
services a different group ID on its own and there is no overlap.

Kind regards,
Sebastian


Tom Lane wrote on Wed., 23. Nov. 2016 at 17:47:

> azh...@gmail.com writes:
> > I have a query which if run alone usually completes in about 300ms.
> > When run in my application this query constantly locks up and bogs
> > down all connections of the connection pool (In the application this
> > query is run up to 10 times in parallel with different parameters).
> > What's really weird is that I can re-run one of the hung queries from
> > the command line while it's hung and it will complete as expected
> > while the hung queries continue to use 100% CPU time.
>
> Judging from the EXPLAIN timing, most of the work is in the trigger,
> which leads me to wonder if the parallel calls are likely to be fighting
> over inserting/updating the same row in the group_history partition
> tables.  Or are you certain that they should be hitting different rows?
>
> regards, tom lane
>


Re: [GENERAL] query locks up when run concurrently

2016-11-23 Thread azhwkd
Greetings!

Yes, I had a look at the logfiles, but not a single log entry is
generated when I'm reproducing this issue.
Concerning locks, I used the queries at
https://wiki.postgresql.org/wiki/Lock_Monitoring and they came up empty.
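
(For reference, a minimal form of such a check, listing only the
ungranted locks, is:

SELECT pid, locktype, mode, relation::regclass
FROM pg_locks
WHERE NOT granted;
)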
The group_history table and its sub-tables do not have any foreign keys as
they only contain calculated data and the source data is removed after some
time.
The trigger which calls the partition function below is defined as follows:

create trigger group_history_trigger
  before insert on item_history
  for each row execute procedure group_history_partition_function();

Kind regards,
Sebastian

Adrian Klaver wrote on Wed., 23. Nov. 2016 at 16:25:

> On 11/22/2016 11:41 PM, azh...@gmail.com wrote:
> > Greetings!
> >
> > I'm using postgres version 9.5.5 on a ubuntu 16.04.1 server
> > installation which was installed through apt-get.
> >
> > I have a query which if run alone usually completes in about 300ms.
> > When run in my application this query constantly locks up and bogs
> > down all connections of the connection pool (In the application this
> > query is run up to 10 times in parallel with different parameters).
> > What's really weird is that I can re-run one of the hung queries from
> > the command line while it's hung and it will complete as expected
> > while the hung queries continue to use 100% CPU time.
>
> Have you looked in the Postgres logs to see if there are any relevant
> messages?
>
> Best guess is that running in parallel is creating a lock problem where
> one of the queries is holding a lock that is keeping the others from
> completing.
>
> What relationships do the group_history and group_history_* tables have
> to other tables?
>
> What is the definition of the trigger that fires the trigger function
> below?
>
> >
> > The query in question is this:
> >
> > insert into group_history ("group", id, sub_category, "date", aa, ab,
> > bb, ba, quantity, "hour")
> > (select
> > a."group",
> > a.id,
> > b.sub_category,
> > to_timestamp($2)::date as "date",
> > max(a.aa / a.quantity) as aa,
> > min(a.aa / a.quantity) as ab,
> > max(a.bb / a.quantity) as bb,
> > min(a.bb/ a.quantity) as ba,
> > sum(a.quantity) as quantity,
> > extract('hour' from to_timestamp($2)) as "hour"
> > from tbla a
> > join tblb b on a.id = b.id
> > where a."group" = $1 and b."group" = $1
> > group by a."group", a.id, b.sub_category
> > );
> >
> > When I'm running a perf on the system it looks like this while running
> > the query 10 times:
> >
> > Samples: 4M of event 'cpu-clock', Event count (approx.): 18972107951
> > Overhead Shared Object Symbol
> > 17.95% postgres [.] heap_hot_search_buffer
> > 5.64% postgres [.] heap_page_prune_opt
> > 4.62% postgres [.] hash_search_with_hash_value
> > 3.80% postgres [.] LWLockRelease
> > 3.73% postgres [.] 0x002f420d
> > 2.50% postgres [.] _bt_checkkeys
> > 2.48% postgres [.] hash_any
> > 2.45% postgres [.] 0x002f41e7
> > 2.10% postgres [.] slot_getattr
> > 1.80% postgres [.] ResourceOwnerForgetBuffer
> > 1.58% postgres [.] LWLockAcquire
> > 1.58% postgres [.] ReadBufferExtended
> > 1.54% postgres [.] index_fetch_heap
> > 1.47% postgres [.] MemoryContextReset
> > 1.43% postgres [.] btgettuple
> > 1.38% postgres [.] 0x002d710c
> > 1.36% postgres [.] 0x002d70a5
> > 1.35% postgres [.] ExecQual
> >
> > Explain (Analyze, Verbose) Output
> >
> > QUERY PLAN
> >
> 
> >
> -
> > Insert on public.group_history (cost=10254.36..10315.16 rows=2432
> > width=62) (actual time=1833.967..1833.967 rows=0 loops=1)
> > -> Subquery Scan on "*SELECT*" (cost=10254.36..10315.16 rows=2432
> > width=62) (actual time=353.880..376.490 rows=6139 loops=1)
> > Output: "*SELECT*"."group", "*SELECT*".id,
> > "*SELECT*".sub_category, "*SELECT*"."when", "*SELECT*".aa,
> > "*SELECT*".ab, "*SELECT*".bb, "*SELECT*".ba, "*SELECT*".quantity,
> > "*SELECT*"."hour"
> > -> HashAggregate (cost=10254.36..10278.68 rows=2432 width=28)
> > (actual time=353.871..367.144 rows=6139 loops=1)
> > Output: a."group", a.id, b.sub_category, '2016-11-20'::date,
> > max((a.aa / a.quantity)), min((a.aa / a.quantity)), max((a.bb /
> > a.quantity)), min((a.bb / a.quantity)), sum(a.quantity), '21'::double
> > precision
> > Group Key: a."group", a.id, b.sub_category
> > -> Hash Join (cost=5558.64..10181.40 rows=2432 width=28)
> > (actual time=193.949..294.106 rows=30343 loops=1)
> > Output: a."group", a.id, a.aa, a.quantity, a.bb, b.sub_category
> > Hash Cond: (b.id = a.id)
> > -> Bitmap Heap Scan on public.auctions_extra b
> > (cost=685.19..4719.06 rows=30550 width=8) (actual time=56.678..111.038
> > rows=30343 loops=1)
> > Output: b.sub_category, b.id
> > Recheck Cond: (b."group" = 7)
> > Heap Blocks: exact=289
> 

[GENERAL] query locks up when run concurrently

2016-11-23 Thread azhwkd
Greetings!

I'm using postgres version 9.5.5 on a ubuntu 16.04.1 server
installation which was installed through apt-get.

I have a query which if run alone usually completes in about 300ms.
When run in my application this query constantly locks up and bogs
down all connections of the connection pool (In the application this
query is run up to 10 times in parallel with different parameters).
What's really weird is that I can re-run one of the hung queries from
the command line while it's hung and it will complete as expected
while the hung queries continue to use 100% CPU time.

The query in question is this:

insert into group_history ("group", id, sub_category, "date", aa, ab,
bb, ba, quantity, "hour")
(select
a."group",
a.id,
b.sub_category,
to_timestamp($2)::date as "date",
max(a.aa / a.quantity) as aa,
min(a.aa / a.quantity) as ab,
max(a.bb / a.quantity) as bb,
min(a.bb/ a.quantity) as ba,
sum(a.quantity) as quantity,
extract('hour' from to_timestamp($2)) as "hour"
from tbla a
join tblb b on a.id = b.id
where a."group" = $1 and b."group" = $1
group by a."group", a.id, b.sub_category
);

When I'm running a perf on the system it looks like this while running
the query 10 times:

Samples: 4M of event 'cpu-clock', Event count (approx.): 18972107951
Overhead  Shared Object Symbol
  17.95%  postgres  [.] heap_hot_search_buffer
   5.64%  postgres  [.] heap_page_prune_opt
   4.62%  postgres  [.] hash_search_with_hash_value
   3.80%  postgres  [.] LWLockRelease
   3.73%  postgres  [.] 0x002f420d
   2.50%  postgres  [.] _bt_checkkeys
   2.48%  postgres  [.] hash_any
   2.45%  postgres  [.] 0x002f41e7
   2.10%  postgres  [.] slot_getattr
   1.80%  postgres  [.] ResourceOwnerForgetBuffer
   1.58%  postgres  [.] LWLockAcquire
   1.58%  postgres  [.] ReadBufferExtended
   1.54%  postgres  [.] index_fetch_heap
   1.47%  postgres  [.] MemoryContextReset
   1.43%  postgres  [.] btgettuple
   1.38%  postgres  [.] 0x002d710c
   1.36%  postgres  [.] 0x002d70a5
   1.35%  postgres  [.] ExecQual

Explain (Analyze, Verbose) Output

QUERY PLAN

   
-
Insert on public.group_history  (cost=10254.36..10315.16 rows=2432
width=62) (actual time=1833.967..1833.967 rows=0 loops=1)
  ->  Subquery Scan on "*SELECT*"  (cost=10254.36..10315.16 rows=2432
width=62) (actual time=353.880..376.490 rows=6139 loops=1)
Output: "*SELECT*"."group", "*SELECT*".id,
"*SELECT*".sub_category, "*SELECT*"."when", "*SELECT*".aa,
"*SELECT*".ab, "*SELECT*".bb, "*SELECT*".ba, "*SELECT*".quantity,
"*SELECT*"."hour"
->  HashAggregate  (cost=10254.36..10278.68 rows=2432 width=28)
(actual time=353.871..367.144 rows=6139 loops=1)
  Output: a."group", a.id, b.sub_category, '2016-11-20'::date,
max((a.aa / a.quantity)), min((a.aa / a.quantity)), max((a.bb /
a.quantity)), min((a.bb / a.quantity)), sum(a.quantity), '21'::double
precision
  Group Key: a."group", a.id, b.sub_category
  ->  Hash Join  (cost=5558.64..10181.40 rows=2432 width=28)
(actual time=193.949..294.106 rows=30343 loops=1)
Output: a."group", a.id, a.aa, a.quantity, a.bb, b.sub_category
Hash Cond: (b.id = a.id)
->  Bitmap Heap Scan on public.auctions_extra b
(cost=685.19..4719.06 rows=30550 width=8) (actual time=56.678..111.038
rows=30343 loops=1)
  Output: b.sub_category, b.id
  Recheck Cond: (b."group" = 7)
  Heap Blocks: exact=289
  ->  Bitmap Index Scan on auction_extra_pk
(cost=0.00..677.55 rows=30550 width=0) (actual time=55.966..55.966
rows=30343 loops=1)
Index Cond: (b."group" = 7)
->  Hash  (cost=4280.62..4280.62 rows=30627 width=28)
(actual time=137.160..137.160 rows=30343 loops=1)
  Output: a."group", a.id, a.aa, a.quantity, a.bb, a.id
  Buckets: 16384  Batches: 4  Memory Usage: 638kB
  ->  Bitmap Heap Scan on public.tbla a
(cost=689.78..4280.62 rows=30627 width=28) (actual
time=58.530..117.064 rows=30343 loops=1)
Output: a."group", a.id, a.aa, a.quantity,
a.bb, a.id
Recheck Cond: (a."group" = 7)
Heap Blocks: exact=254
->  Bitmap Index Scan on 

Re: [GENERAL] query locks up when run concurrently

2016-11-23 Thread Tom Lane
azh...@gmail.com writes:
> I have a query which if run alone usually completes in about 300ms.
> When run in my application this query constantly locks up and bogs
> down all connections of the connection pool (In the application this
> query is run up to 10 times in parallel with different parameters).
> What's really weird is that I can re-run one of the hung queries from
> the command line while it's hung and it will complete as expected
> while the hung queries continue to use 100% CPU time.

Judging from the EXPLAIN timing, most of the work is in the trigger,
which leads me to wonder if the parallel calls are likely to be fighting
over inserting/updating the same row in the group_history partition
tables.  Or are you certain that they should be hitting different rows?

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] query locks up when run concurrently

2016-11-23 Thread Adrian Klaver

On 11/22/2016 11:41 PM, azh...@gmail.com wrote:

Greetings!

I'm using postgres version 9.5.5 on a ubuntu 16.04.1 server
installation which was installed through apt-get.

I have a query which if run alone usually completes in about 300ms.
When run in my application this query constantly locks up and bogs
down all connections of the connection pool (In the application this
query is run up to 10 times in parallel with different parameters).
What's really weird is that I can re-run one of the hung queries from
the command line while it's hung and it will complete as expected
while the hung queries continue to use 100% CPU time.


Have you looked in the Postgres logs to see if there are any relevant 
messages?


Best guess is that running in parallel is creating a lock problem where 
one of the queries is holding a lock that is keeping the others from 
completing.


What relationships do the group_history and group_history_* tables have 
to other tables?


What is the definition of the trigger that fires the trigger function below?



The query in question is this:

insert into group_history ("group", id, sub_category, "date", aa, ab,
bb, ba, quantity, "hour")
(select
a."group",
a.id,
b.sub_category,
to_timestamp($2)::date as "date",
max(a.aa / a.quantity) as aa,
min(a.aa / a.quantity) as ab,
max(a.bb / a.quantity) as bb,
min(a.bb/ a.quantity) as ba,
sum(a.quantity) as quantity,
extract('hour' from to_timestamp($2)) as "hour"
from tbla a
join tblb b on a.id = b.id
where a."group" = $1 and b."group" = $1
group by a."group", a.id, b.sub_category
);

When I'm running a perf on the system it looks like this while running
the query 10 times:

Samples: 4M of event 'cpu-clock', Event count (approx.): 18972107951
Overhead Shared Object Symbol
17.95% postgres [.] heap_hot_search_buffer
5.64% postgres [.] heap_page_prune_opt
4.62% postgres [.] hash_search_with_hash_value
3.80% postgres [.] LWLockRelease
3.73% postgres [.] 0x002f420d
2.50% postgres [.] _bt_checkkeys
2.48% postgres [.] hash_any
2.45% postgres [.] 0x002f41e7
2.10% postgres [.] slot_getattr
1.80% postgres [.] ResourceOwnerForgetBuffer
1.58% postgres [.] LWLockAcquire
1.58% postgres [.] ReadBufferExtended
1.54% postgres [.] index_fetch_heap
1.47% postgres [.] MemoryContextReset
1.43% postgres [.] btgettuple
1.38% postgres [.] 0x002d710c
1.36% postgres [.] 0x002d70a5
1.35% postgres [.] ExecQual

Explain (Analyze, Verbose) Output

QUERY PLAN

-
Insert on public.group_history (cost=10254.36..10315.16 rows=2432
width=62) (actual time=1833.967..1833.967 rows=0 loops=1)
-> Subquery Scan on "*SELECT*" (cost=10254.36..10315.16 rows=2432
width=62) (actual time=353.880..376.490 rows=6139 loops=1)
Output: "*SELECT*"."group", "*SELECT*".id,
"*SELECT*".sub_category, "*SELECT*"."when", "*SELECT*".aa,
"*SELECT*".ab, "*SELECT*".bb, "*SELECT*".ba, "*SELECT*".quantity,
"*SELECT*"."hour"
-> HashAggregate (cost=10254.36..10278.68 rows=2432 width=28)
(actual time=353.871..367.144 rows=6139 loops=1)
Output: a."group", a.id, b.sub_category, '2016-11-20'::date,
max((a.aa / a.quantity)), min((a.aa / a.quantity)), max((a.bb /
a.quantity)), min((a.bb / a.quantity)), sum(a.quantity), '21'::double
precision
Group Key: a."group", a.id, b.sub_category
-> Hash Join (cost=5558.64..10181.40 rows=2432 width=28)
(actual time=193.949..294.106 rows=30343 loops=1)
Output: a."group", a.id, a.aa, a.quantity, a.bb, b.sub_category
Hash Cond: (b.id = a.id)
-> Bitmap Heap Scan on public.auctions_extra b
(cost=685.19..4719.06 rows=30550 width=8) (actual time=56.678..111.038
rows=30343 loops=1)
Output: b.sub_category, b.id
Recheck Cond: (b."group" = 7)
Heap Blocks: exact=289
-> Bitmap Index Scan on auction_extra_pk
(cost=0.00..677.55 rows=30550 width=0) (actual time=55.966..55.966
rows=30343 loops=1)
Index Cond: (b."group" = 7)
-> Hash (cost=4280.62..4280.62 rows=30627 width=28)
(actual time=137.160..137.160 rows=30343 loops=1)
Output: a."group", a.id, a.aa, a.quantity, a.bb, a.id
Buckets: 16384 Batches: 4 Memory Usage: 638kB
-> Bitmap Heap Scan on public.tbla a
(cost=689.78..4280.62 rows=30627 width=28) (actual
time=58.530..117.064 rows=30343 loops=1)
Output: a."group", a.id, a.aa, a.quantity,
a.bb, a.id
Recheck Cond: (a."group" = 7)
Heap Blocks: exact=254
-> Bitmap Index Scan on tbla_pk
(cost=0.00..682.12 rows=30627 width=0) (actual time=57.801..57.801
rows=30343 loops=1)
Index Cond: (a."group" = 7)
Planning time: 0.475 ms
Trigger group_history_trigger: time=1442.561 calls=6139
Execution time: 1834.119 ms


group_history_trigger:

CREATE OR REPLACE FUNCTION public.group_history_partition_function()
RETURNS trigger
LANGUAGE plpgsql
AS $function$
declare
_new_date timestamptz;
_tablename text;
_startdate text;
begin
-- Takes the current inbound "when" 

[GENERAL] query locks up when run concurrently

2016-11-22 Thread azhwkd
Greetings!

I'm using postgres version 9.5.5 on a ubuntu 16.04.1 server
installation which was installed through apt-get.

I have a query which if run alone usually completes in about 300ms.
When run in my application this query constantly locks up and bogs
down all connections of the connection pool (In the application this
query is run up to 10 times in parallel with different parameters).
What's really weird is that I can re-run one of the hung queries from
the command line while it's hung and it will complete as expected
while the hung queries continue to use 100% CPU time.

The query in question is this:

insert into group_history ("group", id, sub_category, "date", aa, ab,
bb, ba, quantity, "hour")
(select
a."group",
a.id,
b.sub_category,
to_timestamp($2)::date as "date",
max(a.aa / a.quantity) as aa,
min(a.aa / a.quantity) as ab,
max(a.bb / a.quantity) as bb,
min(a.bb/ a.quantity) as ba,
sum(a.quantity) as quantity,
extract('hour' from to_timestamp($2)) as "hour"
from tbla a
join tblb b on a.id = b.id
where a."group" = $1 and b."group" = $1
group by a."group", a.id, b.sub_category
);

When I'm running a perf on the system it looks like this while running
the query 10 times:

Samples: 4M of event 'cpu-clock', Event count (approx.): 18972107951
Overhead Shared Object Symbol
17.95% postgres [.] heap_hot_search_buffer
5.64% postgres [.] heap_page_prune_opt
4.62% postgres [.] hash_search_with_hash_value
3.80% postgres [.] LWLockRelease
3.73% postgres [.] 0x002f420d
2.50% postgres [.] _bt_checkkeys
2.48% postgres [.] hash_any
2.45% postgres [.] 0x002f41e7
2.10% postgres [.] slot_getattr
1.80% postgres [.] ResourceOwnerForgetBuffer
1.58% postgres [.] LWLockAcquire
1.58% postgres [.] ReadBufferExtended
1.54% postgres [.] index_fetch_heap
1.47% postgres [.] MemoryContextReset
1.43% postgres [.] btgettuple
1.38% postgres [.] 0x002d710c
1.36% postgres [.] 0x002d70a5
1.35% postgres [.] ExecQual

Explain (Analyze, Verbose) Output

QUERY PLAN

-
Insert on public.group_history (cost=10254.36..10315.16 rows=2432
width=62) (actual time=1833.967..1833.967 rows=0 loops=1)
-> Subquery Scan on "*SELECT*" (cost=10254.36..10315.16 rows=2432
width=62) (actual time=353.880..376.490 rows=6139 loops=1)
Output: "*SELECT*"."group", "*SELECT*".id,
"*SELECT*".sub_category, "*SELECT*"."when", "*SELECT*".aa,
"*SELECT*".ab, "*SELECT*".bb, "*SELECT*".ba, "*SELECT*".quantity,
"*SELECT*"."hour"
-> HashAggregate (cost=10254.36..10278.68 rows=2432 width=28)
(actual time=353.871..367.144 rows=6139 loops=1)
Output: a."group", a.id, b.sub_category, '2016-11-20'::date,
max((a.aa / a.quantity)), min((a.aa / a.quantity)), max((a.bb /
a.quantity)), min((a.bb / a.quantity)), sum(a.quantity), '21'::double
precision
Group Key: a."group", a.id, b.sub_category
-> Hash Join (cost=5558.64..10181.40 rows=2432 width=28)
(actual time=193.949..294.106 rows=30343 loops=1)
Output: a."group", a.id, a.aa, a.quantity, a.bb, b.sub_category
Hash Cond: (b.id = a.id)
-> Bitmap Heap Scan on public.auctions_extra b
(cost=685.19..4719.06 rows=30550 width=8) (actual time=56.678..111.038
rows=30343 loops=1)
Output: b.sub_category, b.id
Recheck Cond: (b."group" = 7)
Heap Blocks: exact=289
-> Bitmap Index Scan on auction_extra_pk
(cost=0.00..677.55 rows=30550 width=0) (actual time=55.966..55.966
rows=30343 loops=1)
Index Cond: (b."group" = 7)
-> Hash (cost=4280.62..4280.62 rows=30627 width=28)
(actual time=137.160..137.160 rows=30343 loops=1)
Output: a."group", a.id, a.aa, a.quantity, a.bb, a.id
Buckets: 16384 Batches: 4 Memory Usage: 638kB
-> Bitmap Heap Scan on public.tbla a
(cost=689.78..4280.62 rows=30627 width=28) (actual
time=58.530..117.064 rows=30343 loops=1)
Output: a."group", a.id, a.aa, a.quantity,
a.bb, a.id
Recheck Cond: (a."group" = 7)
Heap Blocks: exact=254
-> Bitmap Index Scan on tbla_pk
(cost=0.00..682.12 rows=30627 width=0) (actual time=57.801..57.801
rows=30343 loops=1)
Index Cond: (a."group" = 7)
Planning time: 0.475 ms
Trigger group_history_trigger: time=1442.561 calls=6139
Execution time: 1834.119 ms


group_history_trigger:

CREATE OR REPLACE FUNCTION public.group_history_partition_function()
RETURNS trigger
LANGUAGE plpgsql
AS $function$
declare
_new_date timestamptz;
_tablename text;
_startdate text;
begin
-- Takes the current inbound "when" value and determines when
midnight is for the given date
_new_date := date_trunc('day', new."when");
_startdate := to_char(_new_date, 'YYYY_MM_DD');  -- assumed format, e.g. 2016_11_20
_tablename := 'group_history_'||_startdate;

-- Insert the current record into the correct partition
execute 'INSERT INTO public.' || quote_ident(_tablename) || ' VALUES ($1.*)
on conflict ("group", id, sub_category, "when", "hour") do
update set aa = excluded.aa,
ab = excluded.ab,
bb = excluded.bb,
ba = 

Re: [GENERAL] Query generator

2016-10-16 Thread Vinicius Segalin
2016-10-16 8:54 GMT-02:00 Andreas Seltenreich :

> Vinicius Segalin  writes:
>
> > 2016-09-29 16:32 GMT-03:00 Julien Rouhaud :
> >
> > >  You should try sqlsmith (https://github.com/anse1/sqlsmith), which
> works
> > >  very well.
> >
> > I had found this one before, but all I could get was queries using
> > "standard" tables, like the ones from pg_catalog and
> > information_schema. It didn't generate queries from the tables I've
> > created. Was I doing something wrong?
>
> It should pick your relations with the same likelihood it picks the
> catalog tables.  You could verify this with, e.g.,
>
> sqlsmith --dry-run | grep your_table_name
>
> Thanks to Julien, sqlsmith's master branch now has an option
> --exclude-catalog that inhibits use of catalog relations.  If you are
> building from github, you might want to give it another chance with this
> option.  I'll probably do a proper release in the next two weeks.
>
> Feedback welcome :-)
>
> regards,
> Andreas
>

Yes, I was talking in private with him and he came up with this solution.
Thank you again, Julien.
I need to try it again, did a quick test on friday, but it looks like it
works perfectly! :)


Re: [GENERAL] Query generator

2016-10-16 Thread Andreas Seltenreich
Vinicius Segalin  writes:

> 2016-09-29 16:32 GMT-03:00 Julien Rouhaud :
>
> >  You should try sqlsmith (https://github.com/anse1/sqlsmith), which works
> >  very well.
>
> I had found this one before, but all I could get was queries using
> "standard" tables, like the ones from pg_catalog and
> information_schema. It didn't generate queries from the tables I've
> created. Was I doing something wrong?

It should pick your relations with the same likelihood it picks the
catalog tables.  You could verify this with, e.g.,

sqlsmith --dry-run | grep your_table_name

Thanks to Julien, sqlsmith's master branch now has an option
--exclude-catalog that inhibits use of catalog relations.  If you are
building from github, you might want to give it another chance with this
option.  I'll probably do a proper release in the next two weeks.
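
A typical invocation would then be something like (connection string
hypothetical):

sqlsmith --exclude-catalog --target="host=localhost dbname=mydb"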

Feedback welcome :-)

regards,
Andreas


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Query help

2016-10-04 Thread Daniel Caldeweyher
Try this:

select distinct vendor_no, vendor_name
from ap_vendors
where vendor_no in (
   select vendor_no from ap_vendors
   group by vendor_no
   having array_agg(company_code) @> ARRAY['BUR','EBC','SNJ'])
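
And for the complement (vendors missing from at least one of the three
showrooms), a sketch along the same lines:

select vendor_no, array_agg(distinct company_code) as showrooms
from ap_vendors
group by vendor_no
having not (array_agg(company_code) @> ARRAY['BUR','EBC','SNJ']);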


On Wed, Oct 5, 2016 at 1:31 PM, Bret Stern  wrote:

> Good evening,
> I'm curious about a way to ask the following question of my vendors
> table.
>
> pseudo1 "select all vendors which exist in BUR and EBC and SNJ"
>
> and
> pseudo2 "select all vendors which DO NOT exist in all three showrooms"
>
>
> The data is from a Sage accounting system which I pull out and place in a
> pg db. What we are trying to do is identify which vendors are defined in
> all three of our showrooms, and vice-versa, which ones are not.
>
> ap_vendors table
>   company_code character varying(10) NOT NULL,
>   ap_division_no character varying(2) NOT NULL,
>   vendor_no character varying(7) NOT NULL,
>   terms_code character varying(2),
>   vendor_name character varying(30),
>   address_line1 character varying(30),
>   address_line2 character varying(30),
>   address_line3 character varying(30),
>   city character varying(20),
>   state character varying(2),
>   zip_code character varying(10),
>   telephone_no character varying(17),
>   fax_no character varying(17),
>   CONSTRAINT ap_vendors_pk PRIMARY KEY (company_code, ap_division_no,
> vendor_no)
>
> sample records:
> "BUR";"00";"ADEXU";"30";"ADEX USA";"''";"''";"''";"''";"''";
> "92831";"''";"''"
> "BUR";"00";"AGORA";"30";"Agora Natural Surfaces";"''";"''";"''";"''";
> "''";"90248";"''";"''"
> "BUR";"00";"AKDOP";"30";"AKDO Pacific";"''";"''";"''";"''";"
> ''";"94545";"''";"''"
> "EBC";"00";"ADEXU";"30";"ADEX USA";"''";"''";"''";"''";"''";
> "92831";"''";"''"
> "EBC";"00";"AGORA";"30";"Agora Natural Surfaces";"''";"''";"''";"''";
> "''";"90248";"''";"''"
> "EBC";"00";"ARIZ01";"30";"Arizona Tile";"''";"''";"''";"''";"''"
> ;"94550";"''";"''"
> "SNJ";"00";"AKDOP";"30";"AKDO Pacific";"''";"''";"''";"''";"
> ''";"94545";"''";"''"
> "SNJ";"00";"AGORA";"30";"Agora Natural Surfaces";"''";"''";"''";"''";
> "''";"90248";"''";"''"
> "SNJ";"00";"ADEXU";"30";"ADEX USA";"''";"''";"''";"''";"''";
> "92831";"''";"''"
>
> What I need is a query which I can modify to return only vendors which
> exist
> in all three company_code columns ( BUR, EBC, SNJ) (they represent
> showroom location)
>
> eg; exists in BUR, EBC, SNJ
> ADEXU
> AGORA
>
> OR
>
> modify the query to return only the vendors which DO NOT exist in all
> three showrooms based on the first column company_code
>
> eg;
> AKDOP only exists in BUR and SNJ
> ARIZ01 only exists in EBC
>
> Thanks
> Bret
>
>
>


Re: [GENERAL] Query help

2016-10-04 Thread Rob Sargent

> On Oct 4, 2016, at 9:31 PM, Bret Stern  
> wrote:
> 
> Good evening,
> I'm curious about a way to ask the following question of my vendors
> table.
> 
> pseudo1 "select all vendors which exist in BUR and EBC and SNJ"
> 
> and
> pseudo2 "select all vendors which DO NOT exist in all three showrooms"
> 
> 
> The data is from a Sage accounting system which I pull out and place in a
> pg db. What we are trying to do is identify which vendors are defined in
> all three of our showrooms, and vice-versa, which ones are not.
> 
> ap_vendors table
>   company_code character varying(10) NOT NULL,
>   ap_division_no character varying(2) NOT NULL,
>   vendor_no character varying(7) NOT NULL,
>   terms_code character varying(2),
>   vendor_name character varying(30),
>   address_line1 character varying(30),
>   address_line2 character varying(30),
>   address_line3 character varying(30),
>   city character varying(20),
>   state character varying(2),
>   zip_code character varying(10),
>   telephone_no character varying(17),
>   fax_no character varying(17),
>   CONSTRAINT ap_vendors_pk PRIMARY KEY (company_code, ap_division_no, 
> vendor_no)
> 
> sample records:
> "BUR";"00";"ADEXU";"30";"ADEX USA";"''";"''";"''";"''";"''";"92831";"''";"''"
> "BUR";"00";"AGORA";"30";"Agora Natural 
> Surfaces";"''";"''";"''";"''";"''";"90248";"''";"''"
> "BUR";"00";"AKDOP";"30";"AKDO 
> Pacific";"''";"''";"''";"''";"''";"94545";"''";"''"
> "EBC";"00";"ADEXU";"30";"ADEX USA";"''";"''";"''";"''";"''";"92831";"''";"''"
> "EBC";"00";"AGORA";"30";"Agora Natural 
> Surfaces";"''";"''";"''";"''";"''";"90248";"''";"''"
> "EBC";"00";"ARIZ01";"30";"Arizona 
> Tile";"''";"''";"''";"''";"''";"94550";"''";"''"
> "SNJ";"00";"AKDOP";"30";"AKDO 
> Pacific";"''";"''";"''";"''";"''";"94545";"''";"''"
> "SNJ";"00";"AGORA";"30";"Agora Natural 
> Surfaces";"''";"''";"''";"''";"''";"90248";"''";"''"
> "SNJ";"00";"ADEXU";"30";"ADEX USA";"''";"''";"''";"''";"''";"92831";"''";"''"
> 
> What I need is a query which I can modify to return only vendors which exist
> in all three company_code columns ( BUR, EBC, SNJ) (they represent showroom 
> location)
> 
> eg; exists in BUR, EBC, SNJ
> ADEXU
> AGORA
> 
> OR
> 
> modify the query to return only the vendors which DO NOT exist in all
> three showrooms based on the first column company_code
> 
> eg; 
> AKDOP only exists in BUR and SNJ
> ARIZ01 only exists in EBC
> 
> Thanks
> Bret
> 
> 
Not sure I like the schema but 
select vendor_no, count(*) from ap_vendors group by vendor_no having count(*) = 3;

and maybe count(*) < 3 is your second answer.
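
(If a vendor can repeat within one showroom, e.g. under different
ap_division_no values, counting distinct showrooms is the safer sketch:

select vendor_no
from ap_vendors
group by vendor_no
having count(distinct company_code) < 3;
)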



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Query help

2016-10-04 Thread Bret Stern
Good evening,
I'm curious about a way to ask the following question of my vendors
table.

pseudo1 "select all vendors which exist in BUR and EBC and SNJ"

and
pseudo2 "select all vendors which DO NOT exist in all three showrooms"


The data is from a Sage accounting system which I pull out and place in
a
pg db. What we are trying to do is identify which vendors are defined in
all three of our showrooms, and vice-versa, which ones are not.

ap_vendors table
  company_code character varying(10) NOT NULL,
  ap_division_no character varying(2) NOT NULL,
  vendor_no character varying(7) NOT NULL,
  terms_code character varying(2),
  vendor_name character varying(30),
  address_line1 character varying(30),
  address_line2 character varying(30),
  address_line3 character varying(30),
  city character varying(20),
  state character varying(2),
  zip_code character varying(10),
  telephone_no character varying(17),
  fax_no character varying(17),
  CONSTRAINT ap_vendors_pk PRIMARY KEY (company_code, ap_division_no,
vendor_no)

sample records:
"BUR";"00";"ADEXU";"30";"ADEX
USA";"''";"''";"''";"''";"''";"92831";"''";"''"
"BUR";"00";"AGORA";"30";"Agora Natural
Surfaces";"''";"''";"''";"''";"''";"90248";"''";"''"
"BUR";"00";"AKDOP";"30";"AKDO
Pacific";"''";"''";"''";"''";"''";"94545";"''";"''"
"EBC";"00";"ADEXU";"30";"ADEX
USA";"''";"''";"''";"''";"''";"92831";"''";"''"
"EBC";"00";"AGORA";"30";"Agora Natural
Surfaces";"''";"''";"''";"''";"''";"90248";"''";"''"
"EBC";"00";"ARIZ01";"30";"Arizona
Tile";"''";"''";"''";"''";"''";"94550";"''";"''"
"SNJ";"00";"AKDOP";"30";"AKDO
Pacific";"''";"''";"''";"''";"''";"94545";"''";"''"
"SNJ";"00";"AGORA";"30";"Agora Natural
Surfaces";"''";"''";"''";"''";"''";"90248";"''";"''"
"SNJ";"00";"ADEXU";"30";"ADEX
USA";"''";"''";"''";"''";"''";"92831";"''";"''"

What I need is a query which I can modify to return only vendors which
exist
in all three company_code columns ( BUR, EBC, SNJ) (they represent
showroom location)

eg; exists in BUR, EBC, SNJ
ADEXU
AGORA

OR

modify the query to return only the vendors which DO NOT exist in all
three showrooms based on the first column company_code

eg; 
AKDOP   only exists in BUR and SNJ
ARIZ01  only exists in EBC

Thanks
Bret




Re: [GENERAL] Query killed with Out of memory

2016-10-04 Thread Chris Mair

With a heavy query, when the result grows over 600k rows the query hangs
with an out-of-memory error.

Here is the explain analyze:
[...]
work_mem is 512MB, shared_buffers 3084MB and system RAM 10GB. Postgres
version is 8.4.8 and for some months I cannot upgrade.

Is there a way to solve the problem?


Hi,

a few ideas:

 - what goes out of memory? The client? If you query from a programming
   language you should set the fetch/batch size to some value so that it
   won't fetch the whole 600k rows into memory (see the cursor sketch
   after this list)... for the psql client you can do
   \set FETCH_COUNT 1000

 - work_mem 512 MB is high-ish unless you have a small value for
   max_connections...

 - 8.4.8 was released in 2011, the latest 8.4 release is 8.4.22, you're
   missing lots of patches (and 8.4 was EOLed more than two years ago)
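
As to the first point, the same batching can be done from application
code with a server-side cursor; a minimal sketch (table name
hypothetical):

BEGIN;
DECLARE big_cur CURSOR FOR SELECT * FROM big_table;  -- the heavy query
FETCH 1000 FROM big_cur;  -- repeat until it returns no rows
CLOSE big_cur;
COMMIT;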

Bye,
Chris.










--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Query killed with Out of memory

2016-10-04 Thread Job
Hello,

With a heavy query, when the result grows over 600k rows the query hangs
with an out-of-memory error.

Here is the explain analyze:

CTE Scan on lista  (cost=25066.66..47721.23 rows=3678 width=260)

   CTE lista

 ->  Unique  (cost=24956.32..25066.66 rows=3678 width=512)

   ->  Sort  (cost=24956.32..24965.52 rows=3678 width=512)

 Sort Key: "*SELECT* 1".id, "*SELECT* 1".data_log, "*SELECT* 
1".type_log, "*SELECT* 1".ip, "*SELECT* 1".log_id, "*SELECT* 1".url_dominio, 
"*SELECT* 1".porta, "*SELECT* 1".action_bind, "*SELECT* 1".action, 
('DNS_DENIED/403'

::text), "*SELECT* 1".array_dominio

     ->  Append  (cost=905.76..24738.50 rows=3678 width=512)

   ->  Subquery Scan "*SELECT* 1"  (cost=905.76..12423.64 
rows=3652 width=512)

 ->  Hash Anti Join  (cost=905.76..12387.12 
rows=3652 width=512)

   Hash Cond: 
(((public.webtraffic_bind.ip)::text = (wb.ip)::text) AND 
((public.webtraffic_bind.log_id)::text = (wb.log_id)::text) AND 
(public.webtraffic_bind.url_dominio = wb.url_dominio) AND ((public.webt

raffic_bind.porta)::text = (wb.porta)::text))

   Join Filter: ((wb.data_log >= 
public.webtraffic_bind.data_log) AND (wb.data_log < 
(public.webtraffic_bind.data_log + '00:02:00'::interval)))

   ->  Bitmap Heap Scan on webtraffic_bind  
(cost=269.23..11638.68 rows=3678 width=512)

 Recheck Cond: ((type_log)::text = 
'queries'::text)

 ->  Bitmap Index Scan on 
type_log_wbidx  (cost=0.00..268.31 rows=3678 width=0)

   Index Cond: ((type_log)::text = 
'queries'::text)

   ->  Hash  (cost=636.17..636.17 rows=18 
width=274)

 ->  Bitmap Heap Scan on 
webtraffic_bind wb  (cost=564.94..636.17 rows=18 width=274)

   Recheck Cond: (((type_log)::text 
= 'security'::text) AND ((action_bind)::text = 'approved'::text))

   ->  BitmapAnd  
(cost=564.94..564.94 rows=18 width=0)

 ->  Bitmap Index Scan on 
type_log_wbidx  (cost=0.00..268.31 rows=3678 width=0)

   Index Cond: 
((type_log)::text = 'security'::text)

 ->  Bitmap Index Scan on 
action_bind_wbidx  (cost=0.00..296.37 rows=3678 width=0)

   Index Cond: 
((action_bind)::text = 'approved'::text)

   ->  Subquery Scan "*SELECT* 2"  (cost=905.76..12314.86 
rows=26 width=512)

 ->  Hash Semi Join  (cost=905.76..12314.60 rows=26 
width=512)

   Hash Cond: 
(((public.webtraffic_bind.ip)::text = (wb.ip)::text) AND 
((public.webtraffic_bind.log_id)::text = (wb.log_id)::text) AND 
(public.webtraffic_bind.url_dominio = wb.url_dominio) AND ((public.webt

raffic_bind.porta)::text = (wb.porta)::text))

   Join Filter: ((wb.data_log >= 
public.webtraffic_bind.data_log) AND (wb.data_log < 
(public.webtraffic_bind.data_log + '00:02:00'::interval)))

   ->  Bitmap Heap Scan on webtraffic_bind  
(cost=269.23..11638.68 rows=3678 width=512)

 Recheck Cond: ((type_log)::text = 
'queries'::text)

 ->  Bitmap Index Scan on 
type_log_wbidx  (cost=0.00..268.31 rows=3678 width=0)

   Index Cond: ((type_log)::text = 
'queries'::text)

   ->  Hash  (cost=636.17..636.17 rows=18 
width=274)

 ->  Bitmap Heap Scan on 
webtraffic_bind wb  (cost=564.94..636.17 rows=18 width=274)

   Recheck Cond: (((type_log)::text 
= 'security'::text) AND ((action_bind)::text = 'approved'::text))

   ->  BitmapAnd  
(cost=564.94..564.94 rows=18 width=0)

 ->  Bitmap Index Scan on 
type_log_wbidx  (cost=0.00..268.31 rows=3678 width=0)

   Index Cond: 
((type_log)::text = 'security'::text)

 ->  Bitmap Index Scan on 
action_bind_wbidx  (cost=0.00..296.37 rows=3678 width=0)

   Index Cond: 
((action_bind)::text = 'approved'::text)

   SubPlan 2

 ->  Index Scan using stpestensioni_domini_idx on stpestensioni_domini  
(cost=0.01..12.18 rows=2 width=0)

   Index Cond: ((estensione)::text = 

Re: [GENERAL] Query generator

2016-10-02 Thread Julien Rouhaud
On 29/09/2016 23:23, Vinicius Segalin wrote:
> 2016-09-29 16:32 GMT-03:00 Julien Rouhaud  >:
> You should try sqlsmith (https://github.com/anse1/sqlsmith), which works
> very well.
> 
> 
> I had found this one before, but all I could get was queries using
> "standard" tables, like the ones from pg_catalog and information_schema.
> It didn't generate queries from the tables I've created. Was I doing
> something wrong?
> 

I don't think so, unless you specified an empty database with the
--target option. I never had this issue myself, so I can only advise you
to open an issue on the project repository if you still have this problem.

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Query generator

2016-09-29 Thread Vinicius Segalin
2016-09-29 16:32 GMT-03:00 Julien Rouhaud :

> On 29/09/2016 21:27, Vinicius Segalin wrote:
> > Hi everyone,
> >
>
> Hello,
>
> > Does anyone know a random query generator for Postgres? Something that
> > gets my schema and, based on that, generates hundreds of different
> > queries with join, group by, etc.
> >
>
> You should try sqlsmith (https://github.com/anse1/sqlsmith), which works
> very well.


I had found this one before, but all I could get was queries using
"standard" tables, like the ones from pg_catalog and information_schema. It
didn't generate queries from the tables I've created. Was I doing something
wrong?


Re: [GENERAL] Query generator

2016-09-29 Thread Julien Rouhaud
On 29/09/2016 21:27, Vinicius Segalin wrote:
> Hi everyone,
> 

Hello,

> Does anyone know a random query generator for Postgres? Something that
> gets my schema and, based on that, generates hundreds of different
> queries with join, group by, etc.
> 

You should try sqlsmith (https://github.com/anse1/sqlsmith), which works
very well.

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Query generator

2016-09-29 Thread Vinicius Segalin
Hi everyone,

Does anyone know a random query generator for Postgres? Something that gets
my schema and, based on that, generates hundreds of different queries with
join, group by, etc.

Thank you.


[GENERAL] Query planner using hash join when merge join seems orders of magnitude faster

2016-07-30 Thread Branden Visser
Hi all, I seem to be in a bit of a jam with the query planner choosing
a significantly less optimal route for a query.

First, a common check list I've seen around when researching these
types of issues on list:

1. pgsql version is PostgreSQL 9.4.7 running on Amazon RDS
2. Vacuum analyze was run
3. default_statistics_target was increased to 10,000 for the whole
test database and re-analyzed

Code and info referenced is in a github gist [1]. The files are:

[Q-Report.sql] The actual SQL query I'm trying to optimize, runs a
report on a large set of data
[QP-Report-Normal.txt] The query plan of Q-Report.sql without any
modification of the query plan
[QP-Report-DisableHashJoin.txt] The query plan of Q-Report.sql after
`set enable_hashjoin = false`
[Q-CC-Underestimated.sql] An isolated query that shows 1 of
potentially 2 significant row count underestimations
[QP-CC-Underestimated.txt] The query plan of Q-CC-Underestimated.sql

Essentially, my report query has a hash join that is taking about 70%
of the query time (45s), whereas a merge join seems to take less than
a second.

It looks like there are probably 2 significant row underestimations
which may be contributing to this (first of which is the Q-CC query),
but I haven't been able to validate that this is specifically the
reason the query planner picks a hash join.

My usage on this schema are mostly queries that get very small slices
of data, however this particular query is a background workload that
generates denormalized reports. That said, any kind of global
configuration being added that would jeopardize the performance of the
small-slice queries is less desirable.

My question is, what would be the right way to go about diagnosing and
optimizing this kind of issue? I'm trying to think of ways to
potentially alter the structure of the query to reduce impact of the
planning estimates, but I'm not coming up with any good ideas there.
As mentioned, stats target was increased to 10,000 for testing and
while it changed the row estimates, it was only by about 12 rows --
did I maybe do something wrong there?
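
(One thing worth noting: the statistics target can also be raised for
just the relevant columns instead of database-wide; a sketch, with table
and column names hypothetical:

ALTER TABLE some_table ALTER COLUMN some_col SET STATISTICS 10000;
ANALYZE some_table;
)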

With this particular kind of query, I feel that I'd be willing to
trade out for a vast "overestimation" scenario for the analyzer, but I
can't find anything that allows me to tweak this for a particular
query. I've come across pg_hint_plan but it doesn't seem like it would
support my version of PostgreSQL, and I'd prefer to be able to stay on RDS if
possible.
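
(The closest thing to a per-query knob that I know of is scoping the
planner setting to a single transaction with SET LOCAL, so the
small-slice queries are untouched; a sketch:

BEGIN;
SET LOCAL enable_hashjoin = off;
-- run the report query here
COMMIT;
)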

I came across a pending patch for multivariate statistics, but that
seems a little bit far out at this point.

Any advice for workarounds or solutions would be greatly appreciated!

Thanks in advance,
Branden

[1] https://gist.github.com/mrvisser/a22dddbdaa64ae1aa3dbc9637617b70d


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Query from two tables return error

2016-04-01 Thread arnaud gaboury
On Fri, Apr 1, 2016 at 4:04 PM, Adrian Klaver 
wrote:

> On 04/01/2016 06:52 AM, arnaud gaboury wrote:
>
>>
>>
>> On Fri, Apr 1, 2016 at 3:33 PM, Sándor Daku >
>
>
>
>> One of the many difficulties with computers is that they do what you tell
>> them to do, not what you think you are saying. :)
>> Let's see:
>> SELECT d.home_dir
>> FROM email.mail_dir d, email.mailusers u  <- make a join between
>> mail_dir and mailusers = join every(!) record from the first table
>> with every(!) record from the second table
>> WHERE u.username='arnaud.gaboury'; <- but I need only those from the
>> joined records where the username is arnaud.gaboury
>>
>> And there, you have it.
>>
>> You can simply redefine the view.
>>
>>   SELECT *,((mailusers.domain_name || '/'::text) ||
>> mailusers.username) || '/'::text AS home_dir
>> FROM email.mailusers;
>>
>>
>> Thank you so much. This way I get all needed info in one view.
>>
>
I even added a new column in my view to catch the email address:


SELECT *,((mailusers.domain_name || '/'::text) || mailusers.username) ||
'/'::text AS home_dir,
username||'@'||domain_name AS email_address
FROM email.mailusers;

Now, I am wondering if there is a best practice between:
- one large and complete table
- a lighter table with some views



>
> This might help understand what is going on:
>
> http://www.postgresql.org/docs/9.5/interactive/tutorial-join.html


Thank you for the link.

>
>
>
>>
>> Notice the * after the SELECT statement.
>> So you have all the data plus the homedir.
>>
>> You can leave out the whole view thing and incorporate the home_dir
>> expression right into your select.
>> Or you can write a function which does this for you with the username
>> as argument.
>>
>> Regards,
>> Sándor
>>
>>
>>
>>
>>
>> --
>>
>> google.com/+arnaudgabourygabx
>>
>>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>



-- 

google.com/+arnaudgabourygabx



Re: [GENERAL] Query from two tables return error

2016-04-01 Thread Adrian Klaver

On 04/01/2016 06:52 AM, arnaud gaboury wrote:



On Fri, Apr 1, 2016 at 3:33 PM, Sándor Daku 




--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Query from two tables return error

2016-04-01 Thread arnaud gaboury
On Fri, Apr 1, 2016 at 3:33 PM, Sándor Daku  wrote:

>
> On 1 April 2016 at 11:33, arnaud gaboury  wrote:
>
>> I have one table email.mail.mailusers :
>>
>>Table "email.mailusers"
>>Column|   Type   |
>> Modifiers  | Storage
>>   | Stats target | Description
>>
>> -+--+-+
>> --+--+-
>>  username| text | not
>> null| extende
>> d |  |
>>  password| text | not
>> null| extende
>> d |  |
>>  domain_name | text | not null default '
>> thetradinghall.com'::text | extende
>> d |  |
>>  created | timestamp with time zone | not null default
>> now()  | plain
>>   |  |
>> Indexes:
>> "mailusers_pkey" PRIMARY KEY, btree (username, domain_name)
>> Foreign-key constraints:
>> "mailusers_domain_fk" FOREIGN KEY (domain_name) REFERENCES
>> email.domainlist(domain_name)
>>
>> username|password  |domain_name |  created
>>
>> +--+---
>>  arnaud.gaboury | YY | thetradinghall.com | 2016-02-04
>> 09:48:58.834774+01
>>
>>  admin  |XXXYY | thetradinghall.com | 2016-03-29
>> 09:58:14.599743+02
>>
>>  postmaster | XXXYY | thetradinghall.com | 2016-03-31
>> 16:36:18.96176+02
>>
>> --
>>
>> I have one view email.mail_dir :
>>
>> View "email.mail_dir"
>>   Column  | Type | Modifiers | Storage  | Description
>> --+--+---+--+-
>>  home_dir | text |   | extended |
>> View definition:
>>  SELECT ((mailusers.domain_name || '/'::text) || mailusers.username) ||
>> '/'::text AS home_dir
>>FROM email.mailusers;
>>
>>   home_dir
>> 
>>  thetradinghall.com/arnaud.gaboury/
>>  thetradinghall.com/admin/
>>  thetradinghall.com/postmaster/
>>
>> -
>>
>> What I am trying to do: I want the home_dir to be returned for
>> username u.
>>
>
> One of the many difficulties with computers is that they do what you tell
> them to do, not what you think you are saying. :)
> Let's see:
>
> SELECT d.home_dir
> FROM email.mail_dir d, email.mailusers u  <- make a join between mail_dir
> and mailusers = join every(!) record from the first table with every(!)
> record from the second table
> WHERE u.username='arnaud.gaboury'; <- but I need only those from the
> joined records where the username is arnaud.gaboury
>
> And there, you have it.
>
>
> You can simply redefine the view.
>
>  SELECT *,((mailusers.domain_name || '/'::text) || mailusers.username) ||
> '/'::text AS home_dir
>FROM email.mailusers;
>

Thank you so much. This way I get all needed info in one view.

>
> Notice the * after the SELECT statement.
> So you have all the data plus the homedir.
>
> You can leave out the whole view thing and incorporate the home_dir
> expression right into your select.
> Or you can write a function which does this for you with the username as
> argument.
>
> Regards,
> Sándor
>
>
>


-- 

google.com/+arnaudgabourygabx



Re: [GENERAL] Query from two tables return error

2016-04-01 Thread Sándor Daku
On 1 April 2016 at 11:33, arnaud gaboury  wrote:

> I have one table email.mail.mailusers :
>
>Table "email.mailusers"
>Column|   Type   |
> Modifiers  | Storage
>   | Stats target | Description
>
> -+--+-+
> --+--+-
>  username| text | not
> null| extende
> d |  |
>  password| text | not
> null| extende
> d |  |
>  domain_name | text | not null default '
> thetradinghall.com'::text | extende
> d |  |
>  created | timestamp with time zone | not null default
> now()  | plain
>   |  |
> Indexes:
> "mailusers_pkey" PRIMARY KEY, btree (username, domain_name)
> Foreign-key constraints:
> "mailusers_domain_fk" FOREIGN KEY (domain_name) REFERENCES
> email.domainlist(domain_name)
>
> username|password  |domain_name |  created
>
> +--+---
>  arnaud.gaboury | YY | thetradinghall.com | 2016-02-04
> 09:48:58.834774+01
>
>  admin  |XXXYY | thetradinghall.com | 2016-03-29
> 09:58:14.599743+02
>
>  postmaster | XXXYY | thetradinghall.com | 2016-03-31
> 16:36:18.96176+02
>
> --
>
> I have one view email.mail_dir :
>
> View "email.mail_dir"
>   Column  | Type | Modifiers | Storage  | Description
> --+--+---+--+-
>  home_dir | text |   | extended |
> View definition:
>  SELECT ((mailusers.domain_name || '/'::text) || mailusers.username) ||
> '/'::text AS home_dir
>FROM email.mailusers;
>
>   home_dir
> 
>  thetradinghall.com/arnaud.gaboury/
>  thetradinghall.com/admin/
>  thetradinghall.com/postmaster/
>
> -
>
> What I am trying to do: I want the home_dir to be returned for username u.
>

One of the many difficulties with computers is that they do what you tell
them to do, not what you think you are saying. :)
Let's see:

SELECT d.home_dir
FROM email.mail_dir d, email.mailusers u  <- make a join between mail_dir
and mailusers = join every(!) record from the first table with every(!)
record from the second table
WHERE u.username='arnaud.gaboury'; <- but I need only those from the joined
records where the username is arnaud.gaboury

And there, you have it.


> But it returns:
>   home_dir
> 
>  thetradinghall.com/arnaud.gaboury/
>  thetradinghall.com/admin/
>  thetradinghall.com/postmaster/
> (3 rows)
>

You would have seen the problem if you had used * instead of d.home_dir.
And there is no solution to your problem in the given circumstances.
You need the username field in the view as well, and then you can:

SELECT d.home_dir
FROM email.mail_dir d, email.mailusers u
WHERE u.username='arnaud.gaboury'
and u.username=d.username;

But it's completely unnecessary.

You can simply redefine the view.

 SELECT *,((mailusers.domain_name || '/'::text) || mailusers.username) ||
'/'::text AS home_dir
   FROM email.mailusers;

Notice the * after the SELECT statement.
So you have all the data plus the homedir.

You can leave out the whole view thing and incorporate the home_dir
expression right into your select.
Or you can write a function which does this for you with the username as
argument.
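
For example (sketches only; the function name is hypothetical, and note
that username alone is not unique if it appears under several domains):

SELECT domain_name || '/' || username || '/' AS home_dir
FROM email.mailusers
WHERE username = 'arnaud.gaboury';

CREATE FUNCTION email.home_dir(p_username text) RETURNS text AS $$
  SELECT domain_name || '/' || username || '/'
  FROM email.mailusers
  WHERE username = p_username;
$$ LANGUAGE sql STABLE;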

Regards,
Sándor


Re: [GENERAL] Query from two tables return error

2016-04-01 Thread arnaud gaboury
On Fri, Apr 1, 2016 at 11:33 AM, arnaud gaboury 
wrote:

> I have one table email.mail.mailusers :
>
>Table "email.mailusers"
>Column|   Type   |
> Modifiers  | Storage
>   | Stats target | Description
>
> -+--+-+
> --+--+-
>  username| text | not
> null| extende
> d |  |
>  password| text | not
> null| extende
> d |  |
>  domain_name | text | not null default '
> thetradinghall.com'::text | extende
> d |  |
>  created | timestamp with time zone | not null default
> now()  | plain
>   |  |
> Indexes:
> "mailusers_pkey" PRIMARY KEY, btree (username, domain_name)
> Foreign-key constraints:
> "mailusers_domain_fk" FOREIGN KEY (domain_name) REFERENCES
> email.domainlist(domain_name)
>
> username|password  |domain_name |  created
>
> +--+---
>  arnaud.gaboury | YY | thetradinghall.com | 2016-02-04
> 09:48:58.834774+01
>
>  admin  |XXXYY | thetradinghall.com | 2016-03-29
> 09:58:14.599743+02
>
>  postmaster | XXXYY | thetradinghall.com | 2016-03-31
> 16:36:18.96176+02
>
> --
>
> I have one view email.mail_dir :
>
> View "email.mail_dir"
>   Column  | Type | Modifiers | Storage  | Description
> --+--+---+--+-
>  home_dir | text |   | extended |
> View definition:
>  SELECT ((mailusers.domain_name || '/'::text) || mailusers.username) ||
> '/'::text AS home_dir
>FROM email.mailusers;
>
>   home_dir
> 
>  thetradinghall.com/arnaud.gaboury/
>  thetradinghall.com/admin/
>  thetradinghall.com/postmaster/
>
> -
>
> What I am trying to do: I want the home_dir to be returned for username u.
>
> The query :
> SELECT d.home_dir
> FROM email.mail_dir d, email.mailusers u
> WHERE u.username='arnaud.gaboury';
>
> But it returns:
>   home_dir
> 
>  thetradinghall.com/arnaud.gaboury/
>  thetradinghall.com/admin/
>  thetradinghall.com/postmaster/
> (3 rows)
>
> It is obviously not what I expect. I am expecting one answer in this
> case:  thetradinghall.com/arnaud.gaboury/
>
> Thank you for any help. I have been trying with no success. Maybe one
> solution could be to create a new view with home_dir | username as
> columns?
>
>
>
> EDIT : I found a solution with creating a new view this way:

SELECT domain_name||'/'||username||'/' AS home_dir,username
FROM email.mailusers;

  home_dir  |username
+
 thetradinghall.com/arnaud.gaboury/ | arnaud.gaboury
 thetradinghall.com/admin/  | admin
 thetradinghall.com/postmaster/ | postmaster


Now I can pick up the home_dir for username u.
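
For example (assuming the redefined view keeps the name email.mail_dir):

SELECT home_dir FROM email.mail_dir WHERE username = 'arnaud.gaboury';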


[GENERAL] Query from two tables return error

2016-04-01 Thread arnaud gaboury
I have one table, email.mailusers:

                                Table "email.mailusers"
   Column    |           Type           |                  Modifiers                   | Storage
-------------+--------------------------+----------------------------------------------+----------
 username    | text                     | not null                                     | extended
 password    | text                     | not null                                     | extended
 domain_name | text                     | not null default 'thetradinghall.com'::text  | extended
 created     | timestamp with time zone | not null default now()                       | plain

Indexes:
    "mailusers_pkey" PRIMARY KEY, btree (username, domain_name)
Foreign-key constraints:
    "mailusers_domain_fk" FOREIGN KEY (domain_name) REFERENCES email.domainlist(domain_name)

    username    | password |    domain_name     |            created
----------------+----------+--------------------+--------------------------------
 arnaud.gaboury | YY       | thetradinghall.com | 2016-02-04 09:48:58.834774+01
 admin          | XXXYY    | thetradinghall.com | 2016-03-29 09:58:14.599743+02
 postmaster     | XXXYY    | thetradinghall.com | 2016-03-31 16:36:18.96176+02

--

I have one view email.mail_dir :

View "email.mail_dir"
  Column  | Type | Modifiers | Storage  | Description
--+--+---+--+-
 home_dir | text |   | extended |
View definition:
 SELECT ((mailusers.domain_name || '/'::text) || mailusers.username) ||
'/'::text AS home_dir
   FROM email.mailusers;

  home_dir

 thetradinghall.com/arnaud.gaboury/
 thetradinghall.com/admin/
 thetradinghall.com/postmaster/

-

What I am trying to do: I want the home_dir to be returned for username u.

The query :
SELECT d.home_dir
FROM email.mail_dir d, email.mailusers u
WHERE u.username='arnaud.gaboury';

But it returns:
  home_dir

 thetradinghall.com/arnaud.gaboury/
 thetradinghall.com/admin/
 thetradinghall.com/postmaster/
(3 rows)

It is obviously not what I expect. I am expecting one answer in this case:
thetradinghall.com/arnaud.gaboury/

Thank you for any help. I have been trying with no success. Maybe one
solution could be to create a new view with home_dir | username as
columns?

-- 

google.com/+arnaudgabourygabx



[GENERAL] query reboot pgsql 9.5.1

2016-03-07 Thread MOLINA BRAVO FELIPE DE JESUS

Hi!!!

I try to explain my problem...sorry for my english :(


In pgsql 9.5.1 I have a two tables with the next structure:

1. Unlogged table «public._gc_cat»

   Column          | Type         | Modifiers
   ----------------+--------------+-----------
   idppicat        | integer      |
   idprodxintegrar | integer      |
   tipo            | character(1) |
   valor           | numeric      |
   estado          | character(1) |
   idsll           | text         |
   idsfte          | text         |
   arama           | text[]       |
   ne_arama        | integer      |
   rama            | text         |
   rvar            | text         |
   nodec           | integer      |

   Indexes:
   "_gc_cat_arama" btree (ne_arama)
   "_gc_cat_arama_gin" gin (arama)

2. Unlogged table «public._gc_tb»

   Column   | Type    | Modifiers
   ---------+---------+-----------
   idb2     | integer |
   idc1     | integer |
   rama     | text    |
   arama    | text[]  |
   ne_arama | integer |

   Indexes:
   "_gc_tb_arama" btree (ne_arama)
   "_gc_tb_arama_gin" gin (arama)
   "_gc_tb_idb2idc1" btree (idb2, idc1)


The table _gc_cat has 91932 records and _gc_tb has 120130 records; when I
run the following query:

SELECT idprodxintegrar
  FROM _gc_tb a
  LEFT join _gc_cat b
    on ( b.arama <@ a.arama and a.arama <@ b.arama )

psql prints the following message (after three minutes approx.):
      Terminated (killed)

and I have to reboot my "guest server".

Now I execute the same in pgsql 9.4.5 and all is fine!!!

The EXPLAINs are:

- pgsql 9.5.1:

  Nested Loop Left Join  (cost=0.03..492944.81 rows=276095 width=4)
->  Seq Scan on _gc_tb a  (cost=0.00..3321.30 rows=120130 width=66)
  ->  Bitmap Heap Scan on _gc_cat b  (cost=0.03..4.06 rows=2 width=70)
Recheck Cond: ((arama <@ a.arama) AND (a.arama <@ arama))
->  Bitmap Index Scan on _gc_cat_arama_gin
(cost=0.00..0.03 rows=2 width=0)
   Index Cond: ((arama <@ a.arama) AND
   (a.arama <@ arama))


- pgsql 9.4.5:
  Nested Loop Left Join  (cost=0.03..492944.81 rows=276095 width=4)
->  Seq Scan on _gc_tb a  (cost=0.00..3321.30 rows=120130 width=66)
   ->  Bitmap Heap Scan on _gc_cat b  (cost=0.03..4.06 rows=2 width=70)
  Recheck Cond: ((arama <@ a.arama) AND (a.arama <@ arama))
   ->  Bitmap Index Scan on _gc_cat_arama_gin
   (cost=0.00..0.03 rows=2 width=0)
  Index Cond: ((arama <@ a.arama) AND
  (a.arama <@ arama))

If i change the query as:
   SELECT idprodxintegrar
 FROM _gc_tb a
 LEFT  join
  _gc_cat b
  on ( a.ne_arama = b.ne_arama and a.arama <@ b.arama )

In pgsql 9.5.1 it finishes after 450708.112 ms.

In pgsql 9.4.5 it finishes after 17996.756 ms (very fast!!!).

The EXPLAINs are:
- pgsql 9.5.1
  Nested Loop Left Join  (cost=3.49..1915550.34 rows=41825277 width=4)
    ->  Seq Scan on _gc_tb a  (cost=0.00..3321.30 rows=120130 width=70)
    ->  Bitmap Heap Scan on _gc_cat b  (cost=3.49..14.39 rows=153 width=74)
          Recheck Cond: (a.arama <@ arama)
          Filter: (a.ne_arama = ne_arama)
          ->  Bitmap Index Scan on _gc_cat_arama_gin  (cost=0.00..3.45 rows=460 width=0)
                Index Cond: (a.arama <@ arama)

- pgsql 9.4.5
  Nested Loop Left Join  (cost=3.48..1868759.71 rows=42284738 width=4)
    ->  Seq Scan on _gc_tb a  (cost=0.00..3321.30 rows=120130 width=70)
    ->  Bitmap Heap Scan on _gc_cat b  (cost=3.48..14.38 rows=115 width=74)
          Recheck Cond: (a.arama <@ arama)
          Filter: (a.ne_arama = ne_arama)
          ->  Bitmap Index Scan on _gc_cat_arama_gin  (cost=0.00..3.45 rows=460 width=0)
                Index Cond: (a.arama <@ arama)


shared_buffers and work_mem are the same in both versions of pgsql (128MB and
4MB respectively).
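
Both values can be confirmed and varied per session, without editing
postgresql.conf — a quick sketch (note that the nested-loop plans above are
not expected to be work_mem-sensitive, so this mainly rules the settings
out):

SHOW shared_buffers;
SHOW work_mem;
SET work_mem = '64MB';   -- session-local; any test value works
EXPLAIN SELECT idprodxintegrar
  FROM _gc_tb a
  LEFT JOIN _gc_cat b ON (b.arama <@ a.arama AND a.arama <@ b.arama);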

I am doing this test on a laptop with the following characteristics:

- HP ProBook with 8 GB RAM, SATA disk, AMD A8-5550M
- OS Linux (fedora 23)
- lxc containers


Dumps of the databases are available at the following links:

  http://sipre.inegi.org.mx/pgsql/dump_pgsql_9_4_5.dump

  http://sipre.inegi.org.mx/pgsql/dump_pgsql_9_5_1.dump


A similar post can be found at:

http://www.postgresql.org/message-id/CALrs2KPMowV6juLdOkMRq_P3MA5VkUmhdM4Q1OD0vCf2qimFfA@mail.gmail.com

thanks in advance!

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] query reboot pgsql 9.5.1

2016-03-04 Thread Alvaro Herrera
David G. Johnston wrote:

> To be more clear, you need to run a query that will complete in our
> lifetime (and without an OOM error) with all three of ANALYZE, BUFFERS, and
> TIMING specified for the EXPLAIN.

I think the problem is pretty clear.  The plan is sensible yet the
result doesn't seem to be.  Why do you think using up all the memory is
a sensible result here?

Jaime Casanova suggested that maybe the @> operator has memory leaks.
Or perhaps the GIN index machinery that's using them.
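
If the leak really is in the GIN path, one crude session-level test is to
steer the planner away from that index — a sketch; whether the fallback plan
finishes in reasonable time is a separate question:

SET enable_bitmapscan = off;  -- session-local; avoids the Bitmap Index Scan on _gc_cat_arama_gin
EXPLAIN SELECT idprodxintegrar
  FROM _gc_tb a
  LEFT JOIN _gc_cat b ON (b.arama <@ a.arama AND a.arama <@ b.arama);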

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] query reboot pgsql 9.5.1

2016-03-04 Thread David G. Johnston
On Fri, Mar 4, 2016 at 3:09 PM, David G. Johnston <
david.g.johns...@gmail.com> wrote:

>
>> 2016-03-04 15:58 GMT-06:00 David G. Johnston:
>>
>>> On Fri, Mar 4, 2016 at 2:56 PM, Felipe de Jesús Molina Bravo <
>>> fjmolinabr...@gmail.com> wrote:
>>>
 When I run without ANALYZE, the output is:

 pba=# EXPLAIN ( TIMING, BUFFERS) SELECT idprodxintegrar FROM _gc_tb a
 LEFT  join _gc_cat b on ( b.arama <@ a.arama  and a.arama <@ b.arama );
 ERROR:  EXPLAIN option BUFFERS requires ANALYZE
 pba=# EXPLAIN ( TIMING) SELECT idprodxintegrar FROM _gc_tb a LEFT  join
 _gc_cat b on ( b.arama <@ a.arama  and a.arama <@ b.arama );
 ERROR:  EXPLAIN option TIMING requires ANALYZE


>>> What is it about those error messages that is confusing you?
>>>
>>
>
> On Fri, Mar 4, 2016 at 3:06 PM, Felipe de Jesús Molina Bravo <
> fjmolinabr...@gmail.com> wrote:
>
>> No, these messages do not confuse me; I just wanted to show the output
>>
>
> Then why don't you add "ANALYZE" to those commands and show meaningful
> output instead of showing us useless errors?
>

To be more clear, you need to run a query that will complete in our
lifetime (and without an OOM error) with all three of ANALYZE, BUFFERS, and
TIMING specified for the EXPLAIN.
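
Spelled out, the requested invocation is the following (a sketch; a LIMIT
would bound the runtime, at the cost of possibly changing the plan):

EXPLAIN (ANALYZE, BUFFERS, TIMING)
SELECT idprodxintegrar
  FROM _gc_tb a
  LEFT JOIN _gc_cat b ON (b.arama <@ a.arama AND a.arama <@ b.arama);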

David J.


Re: [GENERAL] query reboot pgsql 9.5.1

2016-03-04 Thread David G. Johnston
>
>
> 2016-03-04 15:58 GMT-06:00 David G. Johnston:
>
>> On Fri, Mar 4, 2016 at 2:56 PM, Felipe de Jesús Molina Bravo <
>> fjmolinabr...@gmail.com> wrote:
>>
>>> When I run without ANALYZE, the output is:
>>>
>>> pba=# EXPLAIN ( TIMING, BUFFERS) SELECT idprodxintegrar FROM _gc_tb a
>>> LEFT  join _gc_cat b on ( b.arama <@ a.arama  and a.arama <@ b.arama );
>>> ERROR:  EXPLAIN option BUFFERS requires ANALYZE
>>> pba=# EXPLAIN ( TIMING) SELECT idprodxintegrar FROM _gc_tb a LEFT  join
>>> _gc_cat b on ( b.arama <@ a.arama  and a.arama <@ b.arama );
>>> ERROR:  EXPLAIN option TIMING requires ANALYZE
>>>
>>>
>> What is it about those error messages that is confusing you?
>>
>
On Fri, Mar 4, 2016 at 3:06 PM, Felipe de Jesús Molina Bravo <
fjmolinabr...@gmail.com> wrote:

> No, these messages do not confuse me; I just wanted to show the output
>

Then why don't you add "ANALYZE" to those commands and show meaningful
output instead of showing us useless errors?

David J.


Re: [GENERAL] query reboot pgsql 9.5.1

2016-03-04 Thread Felipe de Jesús Molina Bravo
No, these messages do not confuse me; I just wanted to show the output

2016-03-04 15:58 GMT-06:00 David G. Johnston:

> On Fri, Mar 4, 2016 at 2:56 PM, Felipe de Jesús Molina Bravo <
> fjmolinabr...@gmail.com> wrote:
>
>> When I run without ANALYZE, the output is:
>>
>> pba=# EXPLAIN ( TIMING, BUFFERS) SELECT idprodxintegrar FROM _gc_tb a
>> LEFT  join _gc_cat b on ( b.arama <@ a.arama  and a.arama <@ b.arama );
>> ERROR:  EXPLAIN option BUFFERS requires ANALYZE
>> pba=# EXPLAIN ( TIMING) SELECT idprodxintegrar FROM _gc_tb a LEFT  join
>> _gc_cat b on ( b.arama <@ a.arama  and a.arama <@ b.arama );
>> ERROR:  EXPLAIN option TIMING requires ANALYZE
>>
>>
> What is it about those error messages that is confusing you?
>
> David J.
>
>


Re: [GENERAL] query reboot pgsql 9.5.1

2016-03-04 Thread David G. Johnston
On Fri, Mar 4, 2016 at 2:56 PM, Felipe de Jesús Molina Bravo <
fjmolinabr...@gmail.com> wrote:

> When I run without ANALYZE, the output is:
>
> pba=# EXPLAIN ( TIMING, BUFFERS) SELECT idprodxintegrar FROM _gc_tb a
> LEFT  join _gc_cat b on ( b.arama <@ a.arama  and a.arama <@ b.arama );
> ERROR:  EXPLAIN option BUFFERS requires ANALYZE
> pba=# EXPLAIN ( TIMING) SELECT idprodxintegrar FROM _gc_tb a LEFT  join
> _gc_cat b on ( b.arama <@ a.arama  and a.arama <@ b.arama );
> ERROR:  EXPLAIN option TIMING requires ANALYZE
>
>
What is it about those error messages that is confusing you?

David J.


Re: [GENERAL] query reboot pgsql 9.5.1

2016-03-04 Thread Felipe de Jesús Molina Bravo
When I run without ANALYZE, the output is:

pba=# EXPLAIN ( TIMING, BUFFERS) SELECT idprodxintegrar FROM _gc_tb a LEFT
join _gc_cat b on ( b.arama <@ a.arama  and a.arama <@ b.arama );
ERROR:  EXPLAIN option BUFFERS requires ANALYZE
pba=# EXPLAIN ( TIMING) SELECT idprodxintegrar FROM _gc_tb a LEFT  join
_gc_cat b on ( b.arama <@ a.arama  and a.arama <@ b.arama );
ERROR:  EXPLAIN option TIMING requires ANALYZE



If I run with only EXPLAIN, the output is:

                                QUERY PLAN
---------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.03..492944.81 rows=276095 width=4)
   ->  Seq Scan on _gc_tb a  (cost=0.00..3321.30 rows=120130 width=66)
   ->  Bitmap Heap Scan on _gc_cat b  (cost=0.03..4.06 rows=2 width=70)
         Recheck Cond: ((arama <@ a.arama) AND (a.arama <@ arama))
         ->  Bitmap Index Scan on _gc_cat_arama_gin  (cost=0.00..0.03 rows=2 width=0)
               Index Cond: ((arama <@ a.arama) AND (a.arama <@ arama))


Re: [GENERAL] query reboot pgsql 9.5.1

2016-03-04 Thread Alvaro Herrera
Felipe de Jesús Molina Bravo wrote:

> pba=# \i tablas.sql
> DROP TABLE
> SELECT 120130
> CREATE INDEX
> CREATE INDEX
> CREATE INDEX
> DROP TABLE
> SELECT 91932
> CREATE INDEX
> CREATE INDEX
> pba=# EXPLAIN (ANALYZE, TIMING, BUFFERS) SELECT idprodxintegrar FROM _gc_tb
> a LEFT  join _gc_cat b on ( b.arama <@ a.arama  and a.arama <@ b.arama )
> ;
> Killed

It used up so much memory that the OOM-killer terminated it.  That
wouldn't happen with the plan you previously showed, so please do the
same without the ANALYZE option to see what plan it is trying to
execute.
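
Plain EXPLAIN only plans the query and never executes it, so it is safe even
when execution itself exhausts memory:

EXPLAIN SELECT idprodxintegrar
  FROM _gc_tb a
  LEFT JOIN _gc_cat b ON (b.arama <@ a.arama AND a.arama <@ b.arama);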

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] query reboot pgsql 9.5.1

2016-03-04 Thread Felipe de Jesús Molina Bravo
Sorry... I made a mistake... my tables are unlogged,
and in the last test they were wiped... :(

When I create the tables again (with all records), the result is:

pba=# \i tablas.sql
DROP TABLE
SELECT 120130
CREATE INDEX
CREATE INDEX
CREATE INDEX
DROP TABLE
SELECT 91932
CREATE INDEX
CREATE INDEX
pba=# EXPLAIN (ANALYZE, TIMING, BUFFERS) SELECT idprodxintegrar FROM _gc_tb
a LEFT  join _gc_cat b on ( b.arama <@ a.arama  and a.arama <@ b.arama )
;
Killed


:(



2016-03-04 15:30 GMT-06:00 David G. Johnston:

> On Fri, Mar 4, 2016 at 2:16 PM, Felipe de Jesús Molina Bravo <
> fjmolinabr...@gmail.com> wrote:
>
>> the output is:
>>
>> pba=# EXPLAIN (ANALYZE, TIMING, BUFFERS) SELECT idprodxintegrar FROM
>> _gc_tb a LEFT  join _gc_cat b on ( b.arama <@ a.arama  and a.arama <@
>> b.arama )
>> ;
>>                               QUERY PLAN
>> ------------------------------------------------------------------------
>>  Nested Loop Left Join  (cost=0.00..0.01 rows=1 width=4) (actual time=0.003..0.003 rows=0 loops=1)
>>    Join Filter: ((b.arama <@ a.arama) AND (a.arama <@ b.arama))
>>    ->  Seq Scan on _gc_tb a  (cost=0.00..0.00 rows=1 width=66) (actual time=0.002..0.002 rows=0 loops=1)
>>    ->  Seq Scan on _gc_cat b  (cost=0.00..0.00 rows=1 width=70) (never executed)
>>  Planning time: 0.206 ms
>>  Execution time: 0.074 ms
>> (6 rows)
>>
>>
> OK, so this _gc_tb is empty, which means that an extremely fast execution
> time is not surprising.  If the other version actually contains data, I
> would expect it to take considerably longer...
>
> David J.
>
>
>

