Re: [PERFORM] same query in high number of times
On Sun, Jun 21, 2009 at 4:59 PM, Peter Alban wrote: > > > On Sun, Jun 21, 2009 at 10:01 PM, Justin Graf > wrote: >> >> Peter Alban wrote: >> >> duration: 2533.734 ms statement: >> >> SELECT news.url_text,news.title, comments.name, comments.createdate, >> comments.user_id, comments.comment FROM news, comments WHERE >> comments.cid=news.id AND comments.published='1' GROUP BY >> news.url_text,news.title comments.name, comments.createdate, >> comments.user_id, comments.comment ORDER BY comments.createdate DESC LIMIT 3 >> >> >> And here is the query plan : >> QUERY >> PLAN >> >> >> Limit (cost=4313.54..4313.55 rows=3 width=595) (actual >> time=288.525..288.528 rows=3 loops=1) >> -> Sort (cost=4313.54..4347.26 rows=13486 width=595) (actual >> time=288.523..288.523 rows=3 loops=1) >> Sort Key: comments.createdate >> -> HashAggregate (cost=3253.60..3388.46 rows=13486 width=595) >> (actual time=137.521..148.132 rows=13415 loops=1) >> -> Hash Join (cost=1400.73..3051.31 rows=13486 width=595) >> (actual time=14.298..51.049 rows=13578 loops=1) >> Hash Cond: ("outer".cid = "inner".id) >> -> Seq Scan on comments (cost=0.00..1178.72 >> rows=13480 width=522) (actual time=0.012..17.434 rows=13418 loops=1) >> Filter: (published = 1) >> -> Hash (cost=1391.18..1391.18 rows=3818 width=81) >> (actual time=14.268..14.268 rows=3818 loops=1) >> -> Seq Scan on news (cost=0.00..1391.18 >> rows=3818 width=81) (actual time=0.021..10.072 rows=3818 loops=1) >> >> The same is being requested from different sessions . So why is it not >> being cached . >> >> >> Because the query results are not cached only the RAW tables are. The >> query is rerun every time it is requested. >> >> What is the group by clause accomplishing??? >> The sorting and hash Aggregate is eating up all the time > > So this should mean that having say a 5 mb table in memory doing such query > above takes 2 secs in memory ? Nope. But as others have pointed out, you need to figure out why it's taking 2.5 s but EXPLAIN ANALYZE is only saying 300 ms. There's other things you can do to optimize this query; for example: 1. Try creating an index on comments (createdate), and don't forget to ANALYZE the table afterward, or 2. Modify the query to remove the probably-unnecessary GROUP BY. But figuring out the times may be the first thing. My guess is that the 2.5 s time is a time from your logs, maybe at a time when the system was busy, and the 300 ms time was what you got it when you ran it some other time. But maybe there's some other explanation. You should try to figure it out. ...Robert -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] same query in high number of times
Message from mailto:peter.alb...@gmail.com Peter Alban peter.alb...@gmail.com at 06-21-2009 10:59:49 PM -- On Sun, Jun 21, 2009 at 10:01 PM, Justin Graf mailto:jus...@emproshunts.com wrote: Peter Alban wrote: duration: 2533.734 ms statement: Limit (cost=4313.54..4313.55 rows=3 width=595) (actual time=288.525..288.528 rows=3 loops=1) Because the query results are not cached only the RAW tables are. The query is rerun every time it is requested. What is the group by clause accomplishing??? The sorting and hash Aggregate is eating up all the time So this should mean that having say a 5 mb table in memory doing such query above takes 2 secs in memory ? Assuming that, we probably have really slow memory :) Besides , the query makes less sense to me , but I dont write the queries (yet) simply looking at the server side . So do you suggest to tune the queries or shall I rather look for other monitoring tools ? cheers, Peter Thats a really tiny table it should be processed in sub milliseconds something else is going on. The actual time in the explain of the query states 288 millisecond not the 2533.734 you state from above. You have not told us the version of PG or the OS its running on. Is there anything else running on the server???
Re: [PERFORM] same query in high number of times
On Sun, Jun 21, 2009 at 12:28 PM, Peter Alban wrote: > Hi, > > Here is the query : > duration: 2533.734 ms statement: SNIP > Limit (cost=4313.54..4313.55 rows=3 width=595) (actual > time=288.525..288.528 rows=3 loops=1) According to this query plan, your query is taking up 288 milliseconds. I'm guessing the rest of the time is actually is spent transferring data. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] same query in high number of times
On Sun, Jun 21, 2009 at 10:01 PM, Justin Graf wrote: > Peter Alban wrote: > > *duration: 2533.734 ms statement: * > > *SELECT news.url_text,news.title, comments.name, comments.createdate, > comments.user_id, comments.comment FROM news, comments WHERE comments.cid= > news.id AND comments.published='1' GROUP BY news.url_text,news.title > comments.name, comments.createdate, comments.user_id, comments.comment > ORDER BY comments.createdate DESC LIMIT 3 > * > > And here is the query plan : > QUERY > PLAN > > > Limit (cost=4313.54..4313.55 rows=3 width=595) (actual > time=288.525..288.528 rows=3 loops=1) >-> Sort (cost=4313.54..4347.26 rows=13486 width=595) (actual > time=288.523..288.523 rows=3 loops=1) > Sort Key: comments.createdate > -> HashAggregate (cost=3253.60..3388.46 rows=13486 width=595) > (actual time=137.521..148.132 rows=13415 loops=1) >-> Hash Join (cost=1400.73..3051.31 rows=13486 width=595) > (actual time=14.298..51.049 rows=13578 loops=1) > Hash Cond: ("outer".cid = "inner".id) > -> Seq Scan on comments (cost=0.00..1178.72 > rows=13480 width=522) (actual time=0.012..17.434 rows=13418 loops=1) >Filter: (published = 1) > -> Hash (cost=1391.18..1391.18 rows=3818 width=81) > (actual time=14.268..14.268 rows=3818 loops=1) >-> Seq Scan on news (cost=0.00..1391.18 > rows=3818 width=81) (actual time=0.021..10.072 rows=3818 loops=1) > > The same is being requested from different sessions . So why is it not > being cached . > > > > Because the query results are not cached only the RAW tables are. The > query is rerun every time it is requested. > > What is the group by clause accomplishing??? > The sorting and hash Aggregate is eating up all the time > *So this should mean that having say a 5 mb table in memory doing such query above takes 2 secs in memory ? * Assuming that, we probably have really slow memory :) Besides , the query makes less sense to me , but I dont write the queries (yet) simply looking at the server side . So do you suggest to tune the queries or shall I rather look for other monitoring tools ? cheers, Peter > > > *work_mem = 51024# min 64, size in KB > * > > > Thats allot memory dedicated to work mem if you have 30 connections open > this could eat up 1.5gigs pushing the data out of cache. > > > > > > >
Re: [PERFORM] same query in high number of times
Message from mailto:gryz...@gmail.com Grzegorz Jaśkiewicz gryz...@gmail.com at 06-21-2009 09:36:01 PM -- On Sun, Jun 21, 2009 at 9:01 PM, Justin grafjus...@emproshunts.com wrote: work_mem = 51024 # min 64, size in KB Thats allot memory dedicated to work mem if you have 30 connections open this could eat up 1.5gigs pushing the data out of cache. I thought work memory is max memory that can be allocated per connection for sorting, etc. I think it is not allocated when connection is opened, but only on 'if needed' basis.
Re: [PERFORM] same query in high number of times
On Sun, Jun 21, 2009 at 9:01 PM, Justin Graf wrote: > work_mem = 51024 # min 64, size in KB > > Thats allot memory dedicated to work mem if you have 30 connections open > this could eat up 1.5gigs pushing the data out of cache. I thought work memory is max memory that can be allocated per connection for sorting, etc. I think it is not allocated when connection is opened, but only on 'if needed' basis. -- GJ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] same query in high number of times
Peter Alban wrote: duration: 2533.734 ms statement: SELECT news.url_text,news.title, http://comments.name comments.name, comments.createdate, comments.user_id, comments.comment FROM news, comments WHERE comments.cid=http://news.id news.id AND comments.published='1' GROUP BY news.url_text,news.title http://comments.name comments.name, comments.createdate, comments.user_id, comments.comment ORDER BY comments.createdate DESC LIMIT 3 And here is the query plan : QUERY PLAN Limit (cost=4313.54..4313.55 rows=3 width=595) (actual time=288.525..288.528 rows=3 loops=1) - Sort (cost=4313.54..4347.26 rows=13486 width=595) (actual time=288.523..288.523 rows=3 loops=1) Sort Key: comments.createdate - HashAggregate (cost=3253.60..3388.46 rows=13486 width=595) (actual time=137.521..148.132 rows=13415 loops=1) - Hash Join (cost=1400.73..3051.31 rows=13486 width=595) (actual time=14.298..51.049 rows=13578 loops=1) Hash Cond: ("outer".cid = "inner".id) - Seq Scan on comments (cost=0.00..1178.72 rows=13480 width=522) (actual time=0.012..17.434 rows=13418 loops=1) Filter: (published = 1) - Hash (cost=1391.18..1391.18 rows=3818 width=81) (actual time=14.268..14.268 rows=3818 loops=1) - Seq Scan on news (cost=0.00..1391.18 rows=3818 width=81) (actual time=0.021..10.072 rows=3818 loops=1) The same is being requested from different sessions . So why is it not being cached . Because the query results are not cached only the RAW tables are. The query is rerun every time it is requested. What is the group by clause accomplishing??? The sorting and hash Aggregate is eating up all the time work_mem = 51024 # min 64, size in KB Thats allot memory dedicated to work mem if you have 30 connections open this could eat up 1.5gigs pushing the data out of cache.
Re: [PERFORM] same query in high number of times
Hi, Here is the query : *duration: 2533.734 ms statement: * *SELECT news.url_text,news.title, comments.name, comments.createdate, comments.user_id, comments.comment FROM news, comments WHERE comments.cid= news.id AND comments.published='1' GROUP BY news.url_text,news.title comments.name, comments.createdate, comments.user_id, comments.comment ORDER BY comments.createdate DESC LIMIT 3 * And here is the query plan : QUERY PLAN Limit (cost=4313.54..4313.55 rows=3 width=595) (actual time=288.525..288.528 rows=3 loops=1) -> Sort (cost=4313.54..4347.26 rows=13486 width=595) (actual time=288.523..288.523 rows=3 loops=1) Sort Key: comments.createdate -> HashAggregate (cost=3253.60..3388.46 rows=13486 width=595) (actual time=137.521..148.132 rows=13415 loops=1) -> Hash Join (cost=1400.73..3051.31 rows=13486 width=595) (actual time=14.298..51.049 rows=13578 loops=1) Hash Cond: ("outer".cid = "inner".id) -> Seq Scan on comments (cost=0.00..1178.72 rows=13480 width=522) (actual time=0.012..17.434 rows=13418 loops=1) Filter: (published = 1) -> Hash (cost=1391.18..1391.18 rows=3818 width=81) (actual time=14.268..14.268 rows=3818 loops=1) -> Seq Scan on news (cost=0.00..1391.18 rows=3818 width=81) (actual time=0.021..10.072 rows=3818 loops=1) The same is being requested from different sessions . So why is it not being cached . * postgresq.conf --current -- shared_buffers = 41 # min 16 or max_connections*2, 8KB each temp_buffers = 11000# min 100, 8KB each #max_prepared_transactions = 5 # can be 0 or more # note: increasing max_prepared_transactions costs ~600 bytes of shared memory # per transaction slot, plus lock space (see max_locks_per_transaction). work_mem = 51024# min 64, size in KB #maintenance_work_mem = 16384 # min 1024, size in KB #max_stack_depth = 2048 # min 100, size in KB #--- # QUERY TUNING #--- # - Planner Method Configuration - #enable_bitmapscan = on #enable_hashagg = on #enable_hashjoin = on #enable_indexscan = on #enable_mergejoin = on #enable_nestloop = on #enable_seqscan = on #enable_sort = on #enable_tidscan = on # - Planner Cost Constants - effective_cache_size = 692674 # typically 8KB each #random_page_cost = 4 # units are one sequential page fetch # cost #cpu_tuple_cost = 0.01 # (same) #cpu_index_tuple_cost = 0.001 # (same) #cpu_operator_cost = 0.0025 # (same) # - Genetic Query Optimizer - #geqo = on #geqo_threshold = 12 #geqo_effort = 5# range 1-10 #geqo_pool_size = 0 # selects default based on effort #geqo_generations = 0 # selects default based on effort #geqo_selection_bias = 2.0 # range 1.5-2.0 # - Other Planner Options - #default_statistics_target = 10 # range 1-1000* cheers, Peter On Sun, Jun 21, 2009 at 7:42 PM, Robert Haas wrote: > On Sun, Jun 21, 2009 at 6:54 AM, Peter Alban > wrote: > > Should PG realize that if the table data is same should the query result > set > > also be the same ? > > No. That's not so easy to implement as you might think. Saving the > results of each previous query in case someone issues the same query > again without having changed anything in the meantime would probably > cost more in performance on average that you'd get out of it. > > > Where do I see what the PG does ? I can see now the query's that take > long > > time ,but do not have information about what the optimizer does neither > when > > the DB decides about to table scan or cache ? > > Can't you get this from EXPLAIN and EXPLAIN ANALYZE? > > ...Robert >
Re: [PERFORM] same query in high number of times
On Sun, Jun 21, 2009 at 6:54 AM, Peter Alban wrote: > Should PG realize that if the table data is same should the query result set > also be the same ? No. That's not so easy to implement as you might think. Saving the results of each previous query in case someone issues the same query again without having changed anything in the meantime would probably cost more in performance on average that you'd get out of it. > Where do I see what the PG does ? I can see now the query's that take long > time ,but do not have information about what the optimizer does neither when > the DB decides about to table scan or cache ? Can't you get this from EXPLAIN and EXPLAIN ANALYZE? ...Robert -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] same query in high number of times
With out knowing how much memory for each of those settings and how much work_mem for each connection its kinda hard to tell what is going. Also need version for PG, OS, how big the tables are, Also would be nice to see the query itself with explain and analyze PG does not cache the results from a query but the tables itself. The table could be completely cached but there may be some nasty Nested loops causing the problem. What are you expecting the query time to be?? check out http://wiki.postgresql.org/wiki/Performance_Optimization there is allot of info on how to tune, and diagnose problem queries Message from mailto:peter.alb...@gmail.com Peter Alban peter.alb...@gmail.com at 06-21-2009 12:54:40 PM -- Hey folks ! Still kind of analyzing the situation , I realized that I do have a reasonably high shared_memory and effective_cache_size , though if the same query is being run in a number of times ~100-200 concurrent connection it is not being cached . Should PG realize that if the table data is same should the query result set also be the same ? Instead each query takes up to 1-2 seconds . Where do I see what the PG does ? I can see now the query's that take long time ,but do not have information about what the optimizer does neither when the DB decides about to table scan or cache ? cheers, Peter
[PERFORM] same query in high number of times
Hey folks ! Still kind of analyzing the situation , I realized that I do have a reasonably high shared_memory and effective_cache_size , though if the same query is being run in a number of times ~100-200 concurrent connection it is not being cached . Should PG realize that if the table data is same should the query result set also be the same ? Instead each query takes up to 1-2 seconds . Where do I see what the PG does ? I can see now the query's that take long time ,but do not have information about what the optimizer does neither when the DB decides about to table scan or cache ? cheers, Peter