Re: [PERFORM] Slow query with planner row strange estimation
Hello,

Before the weekend I tried to change the index, but even with mono-column indexes on the different columns, the estimated number of rows from dwhinv is still 1. Does anyone have a suggestion? What can I check? thx

damien hostin wrote:

Hello,

I am trying to make a query run quicker, but I don't really know how to give hints to the planner. We are using PostgreSQL 8.4.3 64-bit on Ubuntu 9.10 Server. The hardware is ten 15k SAS drives in a single RAID 10 array, with 8 GB RAM. Queries come from a J2EE application (OLAP cube), but running them in pgAdmin performs the same way. I made a short example that shows what I think is the problem. The real query is much longer, but with only one join it already causes problems. Here is the short example:

select rfoadv_8.rfoadvsup as c8,
       sum(dwhinv.dwhinvqte) as m0
from dwhinv as dwhinv,
     rfoadv as rfoadv_8
where (dwhinv.dwhinv___rforefide = 'HPLUS'
  and (dwhinv.dwhinv___rfodomide = 'PMSI'
       and dwhinv.dwhinv___rfoindrvs = '1'
       and dwhinv.dwhinv___rfoindide = 'recN3_BB_reel'))
  and dwhinv.dwhinv_p2rfodstide = rfoadv_8.rfoadvinf
  and rfoadv_8.rfoadvsup = 'ACTI'
group by rfoadv_8.rfoadvsup

dwhinv is a table with almost 6,000,000 records. rfoadv is a view with 800,000 records; rfoadv is based on rfoade, which has 50,000 records.

Here is the EXPLAIN ANALYZE:

GroupAggregate (cost=0.00..16.56 rows=1 width=13) (actual time=2028.452..2028.453 rows=1 loops=1)
  -> Nested Loop (cost=0.00..16.54 rows=1 width=13) (actual time=0.391..1947.432 rows=42664 loops=1)
       Join Filter: (((ade2.rfoadegch)::text = (ade1.rfoadegch)::text) AND ((ade2.rfoadedrt)::text = (ade1.rfoadedrt)::text))
       -> Nested Loop (cost=0.00..12.54 rows=1 width=214) (actual time=0.304..533.281 rows=114350 loops=1)
            -> Index Scan using dwhinv_rdi_idx on dwhinv (cost=0.00..4.87 rows=1 width=12) (actual time=0.227..16.827 rows=6360 loops=1)
                 Index Cond: (((dwhinv___rforefide)::text = 'HPLUS'::text) AND ((dwhinv___rfodomide)::text = 'PMSI'::text) AND ((dwhinv___rfoindide)::text = 'recN3_BB_reel'::text) AND (dwhinv___rfoindrvs = 1))
            -> Index Scan using rfoade_dsi_idx on rfoade ade2 (cost=0.00..7.63 rows=3 width=213) (actual time=0.007..0.037 rows=18 loops=6360)
                 Index Cond: ((ade2.rfoade_i_rfodstide)::text = (dwhinv.dwhinv_p2rfodstide)::text)
       -> Index Scan using rfoade_pk on rfoade ade1 (cost=0.00..3.98 rows=1 width=213) (actual time=0.008..0.009 rows=0 loops=114350)
            Index Cond: (((ade1.rfoade___rforefide)::text = (ade2.rfoade___rforefide)::text) AND ((ade1.rfoade_i_rfodstide)::text = 'ACTI'::text) AND ((ade1.rfoade___rfovdeide)::text = (ade2.rfoade___rfovdeide)::text) AND (ade1.rfoadervs = ade2.rfoadervs))

We can see that the planner thinks that accessing dwhinv through the dwhinv_rdi_idx index will return 1 row, when in fact it returns 6360. So the inner index scans are not done with 1 loop but 6360. With only one join, the query runs in about 1.5 sec, which is not really long, but with 8 joins the same mistake is repeated 8 times and the query runs for 30-60 sec. If I disable nested loops, hash joins and merge joins are done instead; the example query runs in 0.2-0.5 sec, and the real query in no more than 1 sec, which is great.
Here is the execution plan with nested loops off:

GroupAggregate (cost=12.56..2453.94 rows=1 width=13) (actual time=817.306..817.307 rows=1 loops=1)
  -> Hash Join (cost=12.56..2453.93 rows=1 width=13) (actual time=42.583..720.746 rows=42664 loops=1)
       Hash Cond: (((ade2.rfoade___rforefide)::text = (ade1.rfoade___rforefide)::text) AND ((ade2.rfoade___rfovdeide)::text = (ade1.rfoade___rfovdeide)::text) AND (ade2.rfoadervs = ade1.rfoadervs))
       Join Filter: (((ade2.rfoadegch)::text = (ade1.rfoadegch)::text) AND ((ade2.rfoadedrt)::text = (ade1.rfoadedrt)::text))
       -> Hash Join (cost=4.88..2446.21 rows=1 width=214) (actual time=42.168..411.962 rows=114350 loops=1)
            Hash Cond: ((ade2.rfoade_i_rfodstide)::text = (dwhinv.dwhinv_p2rfodstide)::text)
            -> Seq Scan on rfoade ade2 (cost=0.00..2262.05 rows=47805 width=213) (actual time=0.057..78.988 rows=47805 loops=1)
            -> Hash (cost=4.87..4.87 rows=1 width=12) (actual time=41.632..41.632 rows=6360 loops=1)
                 -> Index Scan using dwhinv_rdi_idx on dwhinv (cost=0.00..4.87 rows=1 width=12) (actual time=0.232..28.199 rows=6360 loops=1)
                      Index Cond: (((dwhinv___rforefide)::text = 'HPLUS'::text) AND ((dwhinv___rfodomide)::text = 'PMSI'::text) AND ((dwhinv___rfoindide)::text = 'recN3_BB_reel'::text) AND (dwhinv___rfoindrvs = 1))
       -> Hash (cost=7.63..7.63 rows=3 width=213) (actual time=0.347..0.347 rows=11 loops=1)
            -> Index Scan using rfoade_dsi_idx on rfoade ade1 (cost=0.00..7.63 rows=3 width=213) (actual time=0.095..0.307 rows=11 loops=1)
                 Index Cond:
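[Two things worth trying here, sketched below. The statistics-target value is illustrative, not tuned; also note that the 8.4 planner multiplies per-column selectivities, so if the filtered dwhinv columns are strongly correlated the estimate may stay at 1 row no matter how detailed the per-column statistics are, and the session-level enable_nestloop workaround may remain the practical answer:]

```sql
-- Give the planner more detail on the filtered columns, then re-analyze:
ALTER TABLE dwhinv ALTER COLUMN dwhinv___rfoindide SET STATISTICS 1000;
ALTER TABLE dwhinv ALTER COLUMN dwhinv___rfodomide SET STATISTICS 1000;
ANALYZE dwhinv;

-- If the estimate stays at 1 row, discourage nested loops for this
-- session only, rather than disabling them server-wide:
SET enable_nestloop = off;
-- ... run the OLAP query ...
RESET enable_nestloop;
```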
[PERFORM] Two equivalent WITH RECURSIVE queries, one of them slow.
Hello. I have a tree-like table with a three-field PK (name, date, id) and one parent field. It has 5k to 6k records as of now, but it will hold about 1 million records. I am trying the following WITH RECURSIVE query:

WITH RECURSIVE t AS (
    SELECT par.id AS tid, par.name, par.date, par.id, par.text, par.h_title, par.h_name, par.parent
    FROM _books.par
  UNION
    SELECT t.tid AS pid, p.name, p.date, p.id, p.text, p.h_title, p.h_name, p.parent
    FROM t, _books.par p
    WHERE p.name = t.name AND p.date = t.date AND t.id = p.parent
)
SELECT t.tid, t.name, t.date, t.id, t.text, t.h_title, t.h_name, t.parent
FROM t
WHERE name = 'cfx' AND date = '2009-08-19' AND tid = '28340';

... which takes 2547.503 ms.

However, if I try the same query but add the same WHERE clause to the non-recursive term, I get much better results:

WITH RECURSIVE t AS (
    SELECT par.id AS tid, par.name, par.date, par.id, par.text, par.h_title, par.h_name, par.parent
    FROM _books.par
    WHERE name = 'cfx' AND date = '2009-08-19' AND par.id = '28340'
  UNION
    SELECT t.tid AS pid, p.name, p.date, p.id, p.text, p.h_title, p.h_name, p.parent
    FROM t, _books.par p
    WHERE p.name = t.name AND p.date = t.date AND t.id = p.parent
)
SELECT t.tid, t.name, t.date, t.id, t.text, t.h_title, t.h_name, t.parent
FROM t
WHERE name = 'cfx' AND date = '2009-08-19' AND tid = '28340';

... which takes 0.221 ms.

I am being forced to use the slow query because I want to define it as a view, leaving the WHERE clause to the application. I fail to see where the two queries might be different, or what cases the slow one considers that the fast one doesn't, so as to get a clue on how to work around this. I have taken the EXPLAIN ANALYZE output for both queries. It looks like the slow one is processing all records (read: not pushing the WHERE clause into the non-recursive term).
QUERY PLAN
--
CTE Scan on t (cost=96653.20..96820.57 rows=1 width=144) (actual time=32.931..2541.792 rows=1 loops=1)
  Filter: (((name)::text = 'cfx'::text) AND (date = '2009-08-19'::date) AND (tid = 28340))
  CTE t
    -> Recursive Union (cost=0.00..96653.20 rows=6086 width=212) (actual time=0.017..2442.655 rows=33191 loops=1)
         -> Seq Scan on par (cost=0.00..237.96 rows=5996 width=208) (actual time=0.011..5.591 rows=5996 loops=1)
         -> Merge Join (cost=8909.74..9629.35 rows=9 width=212) (actual time=225.979..254.727 rows=3022 loops=9)
              Merge Cond: (((t.name)::text = (p.name)::text) AND (t.date = p.date) AND (t.id = p.parent))
              -> Sort (cost=7700.54..7850.44 rows=59960 width=44) (actual time=58.163..59.596 rows=3685 loops=9)
                   Sort Key: t.name, t.date, t.id
                   Sort Method: quicksort Memory: 17kB
                   -> WorkTable Scan on t (cost=0.00..1199.20 rows=59960 width=44) (actual time=0.027..3.486 rows=3688 loops=9)
              -> Materialize (cost=1209.20..1284.15 rows=5996 width=208) (actual time=163.062..177.415 rows=5810 loops=9)
                   -> Sort (cost=1209.20..1224.19 rows=5996 width=208) (actual time=163.054..172.543 rows=5810 loops=9)
                        Sort Key: p.name, p.date, p.parent
                        Sort Method: external merge Disk: 1304kB
                        -> Seq Scan on par p (cost=0.00..237.96 rows=5996 width=208) (actual time=0.015..3.330 rows=5996 loops=9)
Total runtime: 2547.503 ms
(17 rows)

QUERY PLAN
--
CTE Scan on t (cost=927.80..928.10 rows=1 width=144) (actual time=0.036..0.132 rows=1 loops=1)
  Filter: (((name)::text = 'cfx'::text) AND (date = '2009-08-19'::date) AND (tid = 28340))
  CTE t
    -> Recursive Union (cost=0.00..927.80 rows=11 width=212) (actual time=0.030..0.124 rows=1 loops=1)
         -> Index Scan using par_id on par (cost=0.00..8.27 rows=1 width=208) (actual time=0.024..0.026 rows=1 loops=1)
              Index Cond: (id = 28340)
              Filter: (((name)::text = 'cfx'::text) AND (date = '2009-08-19'::date))
         -> Nested Loop (cost=0.00..91.93 rows=1 width=212) (actual time=0.091..0.091 rows=0 loops=1)
              Join Filter: (((t.name)::text = (p.name)::text) AND (t.date = p.date))
              -> WorkTable Scan on t (cost=0.00..0.20 rows=10 width=44) (actual
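[In 8.4 a CTE is an optimization fence, so the outer WHERE clause of a view is never pushed into the non-recursive term. A common workaround is to wrap the recursive query in a set-returning SQL function so the keys reach the non-recursive term directly. A minimal sketch, assuming the function name and signature (the tid alias column is dropped here for brevity):]

```sql
CREATE OR REPLACE FUNCTION get_par_tree(p_name text, p_date date, p_id integer)
RETURNS SETOF _books.par AS $$
    WITH RECURSIVE t AS (
        -- the parameters land in the non-recursive term, as in the fast query
        SELECT par.* FROM _books.par
        WHERE par.name = p_name AND par.date = p_date AND par.id = p_id
      UNION
        SELECT p.* FROM t, _books.par p
        WHERE p.name = t.name AND p.date = t.date AND p.parent = t.id
    )
    SELECT * FROM t;
$$ LANGUAGE sql STABLE;

-- The application would then call, e.g.:
-- SELECT * FROM get_par_tree('cfx', '2009-08-19', 28340);
```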
Re: [PERFORM] how to (temporarily) disable/minimize benefits of disk block cache or postgresql shared buffer
On Fri, Jul 2, 2010 at 8:30 AM, Craig Ringer cr...@postnewspapers.com.au wrote:
> Yeah, if you're in a weird virtualized environment like that you're likely to have problems...

On Sat, 3 Jul 2010, Rajesh Kumar Mallah wrote:
> Thanks for thinking about it. I do not understand why you feel OpenVz is weird; at most it's just not very popular.

It's not OpenVz that is weird, but virtualisation in general. If you are running in a virtual machine, then all sorts of things will not run as well as expected.

Matthew

--
Contrary to popular belief, Unix is user friendly. It just happens to be very selective about who its friends are. -- Kyle Hearn

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] SeqScans on boolean values / How to speed this up?
Hello everyone,

We've recently finished developing a bigger web application, and we are about to put it online. I ran some load tests yesterday, and configured 'slow query' logging beforehand, so I could see if there might be a performance bottleneck in PG. While I discovered no real problems, the log file analysis using pgFouine revealed two queries which are executed often and take quite a bit of time. I'm just curious whether there is any way to improve the performance of those queries. I'm seeing seq scans in the EXPLAIN ANALYZE output, but nothing I have done yet has removed those.

The statements and query plans are:

--- Query 1 ---

explain analyze SELECT n.name_short, n.flag, n.nation_id, n.urlidentifier, count(p.person_id) as athletes
from nations n
left join persons p on n.nation_id = p.nation_id
left join efclicences e on p.person_id = e.person_id
where continent = 'eu' and p.deleted = false and p.inactive = false and e.fencer = true
group by n.name_short, n.flag, n.nation_id, n.urlidentifier
order by n.name_short;

QUERY PLAN
-
Sort (cost=9997.21..9997.32 rows=44 width=33) (actual time=872.000..872.000 rows=44 loops=1)
  Sort Key: n.name_short
  Sort Method: quicksort Memory: 28kB
  -> HashAggregate (cost=9995.45..9996.01 rows=44 width=33) (actual time=872.000..872.000 rows=44 loops=1)
       -> Hash Join (cost=5669.49..9611.83 rows=30690 width=33) (actual time=332.000..720.000 rows=142240 loops=1)
            Hash Cond: (e.person_id = p.person_id)
            -> Seq Scan on efclicences e (cost=0.00..2917.29 rows=143629 width=8) (actual time=0.000..80.000 rows=143629 loops=1)
                 Filter: fencer
            -> Hash (cost=5285.87..5285.87 rows=30690 width=33) (actual time=332.000..332.000 rows=142240 loops=1)
                 -> Hash Join (cost=7.10..5285.87 rows=30690 width=33) (actual time=0.000..256.000 rows=142240 loops=1)
                      Hash Cond: (p.nation_id = n.nation_id)
                      -> Seq Scan on persons p (cost=0.00..4438.29 rows=142288 width=16) (actual time=0.000..112.000 rows=142418 loops=1)
                           Filter: ((NOT deleted) AND (NOT inactive))
                      -> Hash (cost=6.55..6.55 rows=44 width=25) (actual time=0.000..0.000 rows=44 loops=1)
                           -> Seq Scan on nations n (cost=0.00..6.55 rows=44 width=25) (actual time=0.000..0.000 rows=44 loops=1)
                                Filter: ((continent)::text = 'eu'::text)
Total runtime: 880.000 ms
(17 rows)

--- Query 2 ---

explain analyze SELECT persons.person_id AS persons_person_id
FROM persons
LEFT OUTER JOIN indexing_persons ON persons.person_id = indexing_persons.person_id
WHERE indexing_persons.person_id IS NULL
   OR persons.modified > indexing_persons.indexed
ORDER BY persons.modified DESC LIMIT 1000;

QUERY PLAN
-
Limit (cost=17755.23..17757.73 rows=1000 width=16) (actual time=372.000..372.000 rows=0 loops=1)
  -> Sort (cost=17755.23..17994.61 rows=95753 width=16) (actual time=372.000..372.000 rows=0 loops=1)
       Sort Key: persons.modified
       Sort Method: quicksort Memory: 25kB
       -> Hash Left Join (cost=4313.44..12505.20 rows=95753 width=16) (actual time=372.000..372.000 rows=0 loops=1)
            Hash Cond: (persons.person_id = indexing_persons.person_id)
            Filter: ((indexing_persons.person_id IS NULL) OR (persons.modified > indexing_persons.indexed))
            -> Seq Scan on persons (cost=0.00..4438.29 rows=143629 width=16) (actual time=0.000..56.000 rows=143629 loops=1)
            -> Hash (cost=2534.86..2534.86 rows=142286 width=16) (actual time=140.000..140.000 rows=143629 loops=1)
                 -> Seq Scan on indexing_persons (cost=0.00..2534.86 rows=142286 width=16) (actual time=0.000..72.000 rows=143629 loops=1)
Total runtime: 372.000 ms
(11 rows)

--- Table definitions ---

\d persons
Table public.persons
       Column       |          Type          |                          Modifiers
--------------------+------------------------+-------------------------------------------------------------
 person_id          | bigint                 | not null default nextval('persons_person_id_seq'::regclass)
 givenname          | character varying(100) | not null
 surname            | character varying(100) | not null
 name_display_short | character varying(20)  | not null
 name_display_long  | character varying(50)  | not null
 title              | character varying(50)  |
Re: [PERFORM] SeqScans on boolean values / How to speed this up?
Jens,

* Jens Hoffrichter (jens.hoffrich...@gmail.com) wrote:
> I'm just curious if there is any way to improve the performance of those queries. I'm seeing SeqScans in the EXPLAIN ANALYZE, but nothing I have done yet has removed those.

SeqScans aren't necessarily bad. Also, providing your postgresql.conf parameters would be useful in doing any kind of analysis work like this.

For starters, why are you using left joins for these queries? When you use a left join and then have a filter on the right-hand table that requires it to be non-NULL, you're causing it to be an inner join anyway. Fixing that might change/improve the plans you're getting.

> The statements and query plans are:
>
> --- Query 1 ---
>
> explain analyze SELECT n.name_short, n.flag, n.nation_id, n.urlidentifier, count(p.person_id) as athletes
> from nations n
> left join persons p on n.nation_id = p.nation_id
> left join efclicences e on p.person_id = e.person_id
> where continent = 'eu' and p.deleted = false and p.inactive = false and e.fencer = true
> group by n.name_short, n.flag, n.nation_id, n.urlidentifier
> order by n.name_short;

Alright, for this one, you're processing 144k rows from persons up into the aggregate; how big is that table? If it's anything less than 1M rows, seqscanning it is almost certainly the fastest way. You could *test* that theory by disabling seqscans and running the query again for the timing. If it's faster, then you probably need to adjust some PG parameters (eg: effective_cache_size, maybe random_page_cost) for your system.
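[The left-join point above can be made concrete. Because the WHERE clause rejects NULL-extended rows from persons and efclicences, the query is equivalent to the following inner-join form, which gives the planner more freedom to reorder the joins; a sketch using the column names from the post:]

```sql
SELECT n.name_short, n.flag, n.nation_id, n.urlidentifier,
       count(p.person_id) AS athletes
FROM nations n
JOIN persons p     ON n.nation_id = p.nation_id
JOIN efclicences e ON p.person_id = e.person_id
WHERE n.continent = 'eu'
  AND NOT p.deleted
  AND NOT p.inactive
  AND e.fencer
GROUP BY n.name_short, n.flag, n.nation_id, n.urlidentifier
ORDER BY n.name_short;
```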
> QUERY PLAN
> -
> Sort (cost=9997.21..9997.32 rows=44 width=33) (actual time=872.000..872.000 rows=44 loops=1)
>   Sort Key: n.name_short
>   Sort Method: quicksort Memory: 28kB
>   -> HashAggregate (cost=9995.45..9996.01 rows=44 width=33) (actual time=872.000..872.000 rows=44 loops=1)
>        -> Hash Join (cost=5669.49..9611.83 rows=30690 width=33) (actual time=332.000..720.000 rows=142240 loops=1)
>             Hash Cond: (e.person_id = p.person_id)
>             -> Seq Scan on efclicences e (cost=0.00..2917.29 rows=143629 width=8) (actual time=0.000..80.000 rows=143629 loops=1)
>                  Filter: fencer
>             -> Hash (cost=5285.87..5285.87 rows=30690 width=33) (actual time=332.000..332.000 rows=142240 loops=1)
>                  -> Hash Join (cost=7.10..5285.87 rows=30690 width=33) (actual time=0.000..256.000 rows=142240 loops=1)
>                       Hash Cond: (p.nation_id = n.nation_id)
>                       -> Seq Scan on persons p (cost=0.00..4438.29 rows=142288 width=16) (actual time=0.000..112.000 rows=142418 loops=1)
>                            Filter: ((NOT deleted) AND (NOT inactive))
>                       -> Hash (cost=6.55..6.55 rows=44 width=25) (actual time=0.000..0.000 rows=44 loops=1)
>                            -> Seq Scan on nations n (cost=0.00..6.55 rows=44 width=25) (actual time=0.000..0.000 rows=44 loops=1)
>                                 Filter: ((continent)::text = 'eu'::text)
> Total runtime: 880.000 ms
> (17 rows)
>
> --- Query 2 ---
>
> explain analyze SELECT persons.person_id AS persons_person_id
> FROM persons
> LEFT OUTER JOIN indexing_persons ON persons.person_id = indexing_persons.person_id
> WHERE indexing_persons.person_id IS NULL
>    OR persons.modified > indexing_persons.indexed
> ORDER BY persons.modified DESC LIMIT 1000;

For this one, you might try indexing persons.modified and indexing_persons.indexed and see if that changes things.
> Limit (cost=17755.23..17757.73 rows=1000 width=16) (actual time=372.000..372.000 rows=0 loops=1)
>   -> Sort (cost=17755.23..17994.61 rows=95753 width=16) (actual time=372.000..372.000 rows=0 loops=1)
>        Sort Key: persons.modified
>        Sort Method: quicksort Memory: 25kB
>        -> Hash Left Join (cost=4313.44..12505.20 rows=95753 width=16) (actual time=372.000..372.000 rows=0 loops=1)
>             Hash Cond: (persons.person_id = indexing_persons.person_id)
>             Filter: ((indexing_persons.person_id IS NULL) OR (persons.modified > indexing_persons.indexed))
>             -> Seq Scan on persons (cost=0.00..4438.29 rows=143629 width=16) (actual time=0.000..56.000 rows=143629 loops=1)
>             -> Hash (cost=2534.86..2534.86 rows=142286 width=16) (actual time=140.000..140.000 rows=143629 loops=1)
>                  -> Seq Scan on indexing_persons (cost=0.00..2534.86 rows=142286 width=16) (actual time=0.000..72.000 rows=143629 loops=1)
> Total runtime: 372.000 ms
> (11 rows)

Thanks,

Stephen
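[The indexing suggestion for query 2, sketched in SQL; the index names are made up:]

```sql
CREATE INDEX persons_modified_idx ON persons (modified);
CREATE INDEX indexing_persons_indexed_idx ON indexing_persons (indexed);
ANALYZE persons;
ANALYZE indexing_persons;
```

[Whether the planner uses them depends on the selectivity of the OR condition; with the "rows=0" result shown above, almost every pair passes the anti-filter, so a seq scan may still win.]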
Re: [PERFORM] using dbt2 postgresql 8.4 - rampup time issue
A clarification of terms may help to start. The terminals per warehouse in the scripts correlates to the number of terminals emulated. An emulated terminal is tied to a warehouse's district. In other words, the number of terminals translates to the number of districts in a warehouse across the entire database. To increase the terminals per warehouse implies you have scaled the database differently, which I'm assuming is not the case here.

> Scale the database … Can you please elaborate?
>
> To increase terminals per warehouse I added only one option (i.e. -t for dbt2-run-workload) to a normal dbt2 test, i.e.:
>
> ./dbt2-pgsql-create-db
> ./dbt2-pgsql-build-db -d $DBDATA -g -r -w $WAREHOUSES
> ./dbt2-run-workload -a pgsql -c $DB_CONNECTIONS -d $REGRESS_DURATION_SEC -w $WAREHOUSES -o $OUTPUT_DIR -t $TERMINAL_PER_WAREHOUSE
> ./dbt2-pgsql-stop-db
>
> Is this change enough or am I missing something?

1. Settings:

DATABASE CONNECTIONS: 50
TERMINALS PER WAREHOUSE: 10
SCALE FACTOR (WAREHOUSES): 200
DURATION OF TEST (in sec): 7200

Result:

                         Response Time (s)
 Transaction      %    Average : 90th %     Total    Rollbacks      %
------------  -----  -----------------  --------  -----------  -----
    Delivery   3.96      0.285 : 0.023     26883            0   0.00
   New Order  45.26      0.360 : 0.010    307335         3082   1.01
Order Status   3.98      0.238 : 0.003     27059            0   0.00
     Payment  42.82      0.233 : 0.003    290802            0   0.00
 Stock Level   3.97      0.245 : 0.002     26970            0   0.00
------------  -----  -----------------  --------  -----------  -----

2508.36 new-order transactions per minute (NOTPM)
120.1 minute duration
0 total unknown errors
2000 second(s) ramping up
2. Settings:

DATABASE CONNECTIONS: 50
TERMINALS PER WAREHOUSE: 40
SCALE FACTOR (WAREHOUSES): 200
DURATION OF TEST (in sec): 7200

Result:

                         Response Time (s)
 Transaction      %    Average : 90th %     Total    Rollbacks      %
------------  -----  -----------------  --------  -----------  -----
    Delivery   3.95      8.123 : 4.605     43672            0   0.00
   New Order  45.19     12.205 : 2.563    499356         4933   1.00
Order Status   4.00      7.385 : 3.314     44175            0   0.00
     Payment  42.89      7.221 : 1.920    473912            0   0.00
 Stock Level   3.97      7.093 : 1.887     43868            0   0.00
------------  -----  -----------------  --------  -----------  -----

7009.40 new-order transactions per minute (NOTPM)
69.8 minute duration
0 total unknown errors
8016 second(s) ramping up

8016 (actual rampup time) + (69.8 * 60) = 12204
5010 (estimated rampup time) + 7200 (estimated steady state time) = 12210

3. Settings:

DATABASE CONNECTIONS: 50
TERMINALS PER WAREHOUSE: 40
SCALE FACTOR (WAREHOUSES): 200
DURATION OF TEST (in sec): 7200

Result:

                         Response Time (s)
 Transaction      %    Average : 90th %     Total    Rollbacks      %
------------  -----  -----------------  --------  -----------  -----
    Delivery   3.98     9.095 : 16.103     15234            0   0.00
   New Order  45.33     7.896 : 14.794    173539         1661   0.97
Order Status   3.96     8.165 : 13.989     15156            0   0.00
     Payment  42.76     7.295 : 12.470    163726            0   0.00
 Stock Level   3.97     7.198 : 12.520     15198            0   0.00
------------  -----  -----------------  --------  -----------  -----

10432.09 new-order transactions per minute (NOTPM)
16.3 minute duration
0 total unknown errors
11227 second(s) ramping up

11227 (actual rampup time) + (16.3 * 60) = 12205
5010 (estimated rampup time) + 7200 (estimated steady state time) = 12210

These results show that the dbt2 test actually did not run for 2 hours; the duration shrinks as the TERMINALS PER WAREHOUSE value increases, i.e. 1st run (120.1 minute duration), 2nd run (69.8 minute duration), and 3rd run (16.3 minute duration).

The ramp-up times are actually as expected (explained below). What you are witnessing is more likely that the driver is crashing because the values are out of range for the scale of the database. You have effectively told the driver that there are more than 10 districts per warehouse, and have likely not built the database that way.
I'm actually surprised the driver ramped up completely at all. I ran the dbt2 test with the following configuration, i.e.:

WAREHOUSES=100
DB_CONNECTIONS=20
REGRESS_DURATION=7200 #HOURS
Re: [PERFORM] big data - slow select (speech search)
On Thu, Jul 1, 2010 at 6:34 PM, Michal Fapso michal.fa...@gmail.com wrote:
> It took about 4.5 seconds. If I rerun it, it takes less than 2 milliseconds, but that is because of the cache. I need to optimize the first run.
>
> laptop ASUS, CPU dual core T2300 1.66GHz, 1.5G RAM
>
> EXPLAIN ANALYZE SELECT h1.docid FROM hyps AS h1 WHERE h1.wordid=65658;
>
> Bitmap Heap Scan on hyps h1 (cost=10.97..677.09 rows=171 width=4) (actual time=62.106..4416.864 rows=343 loops=1)
>   Recheck Cond: (wordid = 65658)
>   -> Bitmap Index Scan on hyps_wordid_index (cost=0.00..10.92 rows=171 width=0) (actual time=42.969..42.969 rows=343 loops=1)
>        Index Cond: (wordid = 65658)
> Total runtime: 4432.015 ms
>
> If I run the same query in the Lucene search engine, it takes 0.105 seconds on the same data, which is quite a huge difference.

So PostgreSQL is reading 343 rows from disk in 4432 ms, or about 12 ms/row. I'm not an expert on seek times, but that might not really be that unreasonable, considering that those rows may be scattered all over the table and thus it may be basically random I/O. Have you tried clustering hyps on hyps_wordid_index? If you had a more sophisticated disk subsystem you could try increasing effective_io_concurrency, but that's not going to help with only one spindle.

If you run the same query in Lucene and it takes only 0.105 s, then Lucene is obviously doing a lot less I/O. I doubt that any amount of tuning of your existing schema is going to produce that kind of result on PostgreSQL. Using the full-text search stuff, or a GIN index of some kind, might get you closer, but it's hard to beat a special-purpose engine that implements exactly the right algorithm for your use case.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
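[The clustering suggestion above, sketched in SQL. CLUSTER physically rewrites the table in index order, so the ~343 rows for a given wordid end up on adjacent pages and the first (uncached) run does mostly sequential I/O. Note that CLUSTER takes an exclusive lock for the duration of the rewrite and the ordering decays as the table is updated, so it must be re-run periodically:]

```sql
CLUSTER hyps USING hyps_wordid_index;
ANALYZE hyps;
```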
Re: [PERFORM] Performance issues with postgresql-8.4.0: Query gets stuck sometimes
On Fri, Jul 2, 2010 at 1:40 AM, Sachin Kumar sachin...@globallogic.com wrote:
> At times we have observed that postgres stops responding for several minutes, and we couldn't even fetch the number of entries in a particular table. One such instance happens when we execute the following steps:

Sounds sort of like a checkpoint spike.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
Re: [PERFORM] SeqScans on boolean values / How to speed this up?
On 05/07/10 19:36, Jens Hoffrichter wrote:
> Hello everyone,
>
> We've recently finished developing a bigger web application, and we are about to put it online. I ran some load tests yesterday, and configured 'slow query' logging beforehand, so I could see if there might be a performance bottleneck in the PG. While I discovered no real problems, the log file analysis using pgFouine revealed two queries, which are executed often, and take quite a bit of time.

It might be worth looking at which queries have results that change infrequently or don't have to be up-to-the-minute accurate, so they're candidates for caching. Memcached is an incredibly handy tool for taking load off your database.

--
Craig Ringer
Re: [PERFORM] SeqScans on boolean values / How to speed this up?
On Mon, Jul 5, 2010 at 5:36 AM, Jens Hoffrichter jens.hoffrich...@gmail.com wrote:
> Hello everyone,
>
> We've recently finished developing a bigger web application, and we are about to put it online.

If you're checking for bools, and 99.99% of the result is just true or just false, look at creating partial indexes on the .01% part:

create index (boolfield) where boolfield is true (or is false)
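[A concrete sketch of that partial-index idea, using the persons table from the thread and assuming, for illustration, that inactive = true is the rare side. Only the rows matching the predicate are stored, so the index stays tiny, and queries whose WHERE clause implies the predicate can use it:]

```sql
CREATE INDEX persons_inactive_only_idx
    ON persons (person_id)
 WHERE inactive;

-- e.g. this query can use the partial index:
-- SELECT person_id FROM persons WHERE inactive;
```

[Note this helps queries that look for the rare value; the plans earlier in the thread filter on NOT deleted AND NOT inactive, which match nearly every row, so a seq scan remains the right choice there.]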