[PERFORM] same query in high number of times

2009-06-21 Thread Peter Alban
Hey folks !


Still kind of analyzing the situation , I realized that I do have a
reasonably high shared_memory and effective_cache_size , though if the same
query is being run in a number of times ~100-200 concurrent connection it is
not being cached .

Should PG realize that if the table data is same should the query result set
also be the same ? Instead each query takes up to 1-2 seconds .

Where do I see what the PG does ? I can see now the query's that take long
time ,but do not have information about what the optimizer does neither when
the DB decides about to table scan or cache ?

cheers,
Peter


Re: [PERFORM] same query in high number of times

2009-06-21 Thread Justin Graf

With out knowing how much memory for each of those settings and how much 
work_mem for each connection its kinda hard to tell what is going. 
Also need version for PG, OS, how big the tables are, Also would be nice to see 
the query itself with explain and analyze 

PG does not cache the results from a query but the tables itself. 

The table could be completely cached but there may be some nasty Nested loops 
causing the problem.

What are you expecting the query time to be??

check out http://wiki.postgresql.org/wiki/Performance_Optimization there is 
allot of info on how to tune, and diagnose problem queries  

 Message from mailto:peter.alb...@gmail.com Peter Alban 
peter.alb...@gmail.com at 06-21-2009 12:54:40 PM --

Hey folks ! 


Still kind of analyzing the situation , I realized that I do have a reasonably 
high shared_memory and effective_cache_size , though if the same query is being 
run in a number of times ~100-200 concurrent connection it is not being cached 
. 

Should PG realize that if the table data is same should the query result set 
also be the same ? Instead each query takes up to 1-2 seconds . 

Where do I see what the PG does ? I can see now the query's that take long time 
,but do not have information about what the optimizer does neither when the DB 
decides about to table scan or cache ?

cheers,
Peter






Re: [PERFORM] same query in high number of times

2009-06-21 Thread Robert Haas
On Sun, Jun 21, 2009 at 6:54 AM, Peter Albanpeter.alb...@gmail.com wrote:
 Should PG realize that if the table data is same should the query result set
 also be the same ?

No.  That's not so easy to implement as you might think.  Saving the
results of each previous query in case someone issues the same query
again without having changed anything in the meantime would probably
cost more in performance on average that you'd get out of it.

 Where do I see what the PG does ? I can see now the query's that take long
 time ,but do not have information about what the optimizer does neither when
 the DB decides about to table scan or cache ?

Can't you get this from EXPLAIN and EXPLAIN ANALYZE?

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] same query in high number of times

2009-06-21 Thread Peter Alban
Hi,

Here is the query  :
*duration: 2533.734 ms  statement: *

*SELECT news.url_text,news.title, comments.name, comments.createdate,
comments.user_id, comments.comment FROM news, comments WHERE comments.cid=
news.id  AND comments.published='1' GROUP BY news.url_text,news.title
comments.name, comments.createdate, comments.user_id, comments.comment ORDER
BY comments.createdate DESC LIMIT 3
*

And here is the query plan :
  QUERY
PLAN

 Limit  (cost=4313.54..4313.55 rows=3 width=595) (actual
time=288.525..288.528 rows=3 loops=1)
   -  Sort  (cost=4313.54..4347.26 rows=13486 width=595) (actual
time=288.523..288.523 rows=3 loops=1)
 Sort Key: comments.createdate
 -  HashAggregate  (cost=3253.60..3388.46 rows=13486 width=595)
(actual time=137.521..148.132 rows=13415 loops=1)
   -  Hash Join  (cost=1400.73..3051.31 rows=13486 width=595)
(actual time=14.298..51.049 rows=13578 loops=1)
 Hash Cond: (outer.cid = inner.id)
 -  Seq Scan on comments  (cost=0.00..1178.72
rows=13480 width=522) (actual time=0.012..17.434 rows=13418 loops=1)
   Filter: (published = 1)
 -  Hash  (cost=1391.18..1391.18 rows=3818 width=81)
(actual time=14.268..14.268 rows=3818 loops=1)
   -  Seq Scan on news  (cost=0.00..1391.18
rows=3818 width=81) (actual time=0.021..10.072 rows=3818 loops=1)

The same is being requested from different sessions . So why is it not being
cached .
*
postgresq.conf --current --
shared_buffers = 41 # min 16 or
max_connections*2, 8KB each
temp_buffers = 11000# min 100, 8KB each
#max_prepared_transactions = 5  # can be 0 or more
# note: increasing max_prepared_transactions costs ~600 bytes of shared
memory
# per transaction slot, plus lock space (see max_locks_per_transaction).
work_mem = 51024# min 64, size in KB
#maintenance_work_mem = 16384   # min 1024, size in KB
#max_stack_depth = 2048 # min 100, size in KB
#---
# QUERY TUNING
#---

# - Planner Method Configuration -

#enable_bitmapscan = on
#enable_hashagg = on
#enable_hashjoin = on
#enable_indexscan = on
#enable_mergejoin = on
#enable_nestloop = on
#enable_seqscan = on
#enable_sort = on
#enable_tidscan = on

# - Planner Cost Constants -

effective_cache_size = 692674   # typically 8KB each
#random_page_cost = 4   # units are one sequential page
fetch
# cost
#cpu_tuple_cost = 0.01  # (same)
#cpu_index_tuple_cost = 0.001   # (same)
#cpu_operator_cost = 0.0025 # (same)

# - Genetic Query Optimizer -

#geqo = on
#geqo_threshold = 12
#geqo_effort = 5# range 1-10
#geqo_pool_size = 0 # selects default based on effort
#geqo_generations = 0   # selects default based on effort
#geqo_selection_bias = 2.0  # range 1.5-2.0

# - Other Planner Options -

#default_statistics_target = 10 # range 1-1000*

cheers,
Peter
On Sun, Jun 21, 2009 at 7:42 PM, Robert Haas robertmh...@gmail.com wrote:

 On Sun, Jun 21, 2009 at 6:54 AM, Peter Albanpeter.alb...@gmail.com
 wrote:
  Should PG realize that if the table data is same should the query result
 set
  also be the same ?

 No.  That's not so easy to implement as you might think.  Saving the
 results of each previous query in case someone issues the same query
 again without having changed anything in the meantime would probably
 cost more in performance on average that you'd get out of it.

  Where do I see what the PG does ? I can see now the query's that take
 long
  time ,but do not have information about what the optimizer does neither
 when
  the DB decides about to table scan or cache ?

 Can't you get this from EXPLAIN and EXPLAIN ANALYZE?

 ...Robert



Re: [PERFORM] same query in high number of times

2009-06-21 Thread Justin Graf
 Peter Alban wrote: 

duration: 2533.734 ms statement: 
  
  SELECT news.url_text,news.title, http://comments.name comments.name, 
comments.createdate, comments.user_id, comments.comment FROM news, comments 
WHERE comments.cid=http://news.id news.id AND comments.published='1' GROUP BY 
news.url_text,news.title http://comments.name comments.name, 
comments.createdate, comments.user_id, comments.comment ORDER BY 
comments.createdate DESC LIMIT 3
  
  
And here is the query plan : 
 QUERY PLAN 

Limit (cost=4313.54..4313.55 rows=3 width=595) (actual time=288.525..288.528 
rows=3 loops=1)
 - Sort (cost=4313.54..4347.26 rows=13486 width=595) (actual 
time=288.523..288.523 rows=3 loops=1)
 Sort Key: comments.createdate
 - HashAggregate (cost=3253.60..3388.46 rows=13486 width=595) (actual 
time=137.521..148.132 rows=13415 loops=1)
 - Hash Join (cost=1400.73..3051.31 rows=13486 width=595) (actual 
time=14.298..51.049 rows=13578 loops=1)
 Hash Cond: (outer.cid = inner.id)
 - Seq Scan on comments (cost=0.00..1178.72 rows=13480 width=522) (actual 
time=0.012..17.434 rows=13418 loops=1)
 Filter: (published = 1)
 - Hash (cost=1391.18..1391.18 rows=3818 width=81) (actual time=14.268..14.268 
rows=3818 loops=1)
 - Seq Scan on news (cost=0.00..1391.18 rows=3818 width=81) (actual 
time=0.021..10.072 rows=3818 loops=1)
  
The same is being requested from different sessions . So why is it not being 
cached .


Because the query results are not cached only the RAW tables are. The query is 
rerun every time it is requested. 

What is the group by clause accomplishing??? 
The sorting and hash Aggregate is eating up all the time



work_mem = 51024 # min 64, size in KB
   
Thats allot memory dedicated to work mem if you have 30 connections open this 
could eat up 1.5gigs pushing the data out of cache. 










Re: [PERFORM] same query in high number of times

2009-06-21 Thread Grzegorz Jaśkiewicz
On Sun, Jun 21, 2009 at 9:01 PM, Justin Grafjus...@emproshunts.com wrote:
 work_mem = 51024    # min 64, size in KB

 Thats allot memory dedicated to work mem if you have 30 connections open
 this could eat up 1.5gigs pushing the data out of cache.

I thought work memory is max memory that can be allocated per
connection for sorting, etc. I think it is not allocated when
connection is opened, but only on 'if needed' basis.


-- 
GJ

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] same query in high number of times

2009-06-21 Thread Justin Graf

 Message from mailto:gryz...@gmail.com Grzegorz Jaśkiewicz 
gryz...@gmail.com at 06-21-2009 09:36:01 PM --

On Sun, Jun 21, 2009 at 9:01 PM, Justin grafjus...@emproshunts.com wrote:
 work_mem = 51024 # min 64, size in KB

 Thats allot memory dedicated to work mem if you have 30 connections open
 this could eat up 1.5gigs pushing the data out of cache.

I thought work memory is max memory that can be allocated per
connection for sorting, etc. I think it is not allocated when
connection is opened, but only on 'if needed' basis.




Re: [PERFORM] same query in high number of times

2009-06-21 Thread Scott Marlowe
On Sun, Jun 21, 2009 at 12:28 PM, Peter Albanpeter.alb...@gmail.com wrote:
 Hi,

 Here is the query  :
 duration: 2533.734 ms  statement:

SNIP

  Limit  (cost=4313.54..4313.55 rows=3 width=595) (actual
 time=288.525..288.528 rows=3 loops=1)

According to this query plan, your query is taking up 288
milliseconds.  I'm guessing the rest of the time is actually is spent
transferring data.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] same query in high number of times

2009-06-21 Thread Justin Graf
 
 Message from mailto:peter.alb...@gmail.com Peter Alban 
peter.alb...@gmail.com at 06-21-2009 10:59:49 PM -- 

On Sun, Jun 21, 2009 at 10:01 PM, Justin Graf mailto:jus...@emproshunts.com  
wrote:
  
  



Peter Alban wrote: 

duration: 2533.734 ms statement: 
  
  
Limit (cost=4313.54..4313.55 rows=3 width=595) (actual time=288.525..288.528 
rows=3 loops=1)
 

 Because the query results are not cached only the RAW tables are. The 
query is rerun every time it is requested. 

What is the group by clause accomplishing??? 
The sorting and hash Aggregate is eating up all the time  

  So this should mean that having say a 5 mb table in memory doing such query 
above takes 2 secs in memory ? 
  
Assuming that, we probably have really slow memory :) 
  
Besides , the query makes less sense to me , but I dont write the queries (yet) 
simply looking at the server side .
So do you suggest to tune the queries or shall I rather look for other 
monitoring tools ? 
cheers,
Peter
  

Thats a really tiny table it should be processed in sub milliseconds something 
else is going on. The actual time in the explain of the query states 288 
millisecond not the 2533.734 you state from above. 

You have not told us the version of PG or the OS its running on. 

Is there anything else running on the server???





Re: [PERFORM] same query in high number of times

2009-06-21 Thread Robert Haas
On Sun, Jun 21, 2009 at 4:59 PM, Peter Albanpeter.alb...@gmail.com wrote:


 On Sun, Jun 21, 2009 at 10:01 PM, Justin Graf jus...@emproshunts.com
 wrote:

 Peter Alban wrote:

 duration: 2533.734 ms  statement:

 SELECT news.url_text,news.title, comments.name, comments.createdate,
 comments.user_id, comments.comment FROM news, comments WHERE
 comments.cid=news.id  AND comments.published='1' GROUP BY
 news.url_text,news.title comments.name, comments.createdate,
 comments.user_id, comments.comment ORDER BY comments.createdate DESC LIMIT 3


 And here is the query plan :
   QUERY
 PLAN

 
  Limit  (cost=4313.54..4313.55 rows=3 width=595) (actual
 time=288.525..288.528 rows=3 loops=1)
    -  Sort  (cost=4313.54..4347.26 rows=13486 width=595) (actual
 time=288.523..288.523 rows=3 loops=1)
  Sort Key: comments.createdate
  -  HashAggregate  (cost=3253.60..3388.46 rows=13486 width=595)
 (actual time=137.521..148.132 rows=13415 loops=1)
    -  Hash Join  (cost=1400.73..3051.31 rows=13486 width=595)
 (actual time=14.298..51.049 rows=13578 loops=1)
  Hash Cond: (outer.cid = inner.id)
  -  Seq Scan on comments  (cost=0.00..1178.72
 rows=13480 width=522) (actual time=0.012..17.434 rows=13418 loops=1)
    Filter: (published = 1)
  -  Hash  (cost=1391.18..1391.18 rows=3818 width=81)
 (actual time=14.268..14.268 rows=3818 loops=1)
    -  Seq Scan on news  (cost=0.00..1391.18
 rows=3818 width=81) (actual time=0.021..10.072 rows=3818 loops=1)

 The same is being requested from different sessions . So why is it not
 being cached .


 Because the query results are not cached only the RAW tables are.   The
 query is rerun every time it is requested.

 What is the group by clause accomplishing???
 The sorting and hash Aggregate is eating up all the time

 So this should mean that having say a 5 mb table in memory doing such query
 above takes 2 secs in memory ?

Nope.  But as others have pointed out, you need to figure out why it's
taking 2.5 s but EXPLAIN ANALYZE is only saying 300 ms.

There's other things you can do to optimize this query; for example:

1. Try creating an index on comments (createdate), and don't forget to
ANALYZE the table afterward, or

2. Modify the query to remove the probably-unnecessary GROUP BY.

But figuring out the times may be the first thing.  My guess is that
the 2.5 s time is a time from your logs, maybe at a time when the
system was busy, and the 300 ms time was what you got it when you ran
it some other time.  But maybe there's some other explanation.  You
should try to figure it out.

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance