Re: [PERFORM] analyzing intermediate query
Scott, Often times, switching an inner subselect that requires a distinct to a group by on that column yields better results. In this case, the IN should be equivalent, so it probably will not help. This would look like: SELECT dok.* FROM dok JOIN (SELECT dokumnr FROM temptbl GROUP BY dokumnr ) x USING(dokumnr); Thank you. This may be great idea. I changed my query to use GROUP BY instead of DISTINCT Whether that hepls depends on how big dokumnr is and where the query bottleneck is. I'm wondering how this can solve the issue when there is single or few dokumnr columns. Planner still thinks that temptbl projection contains 1000 rows and uses seq scan instead of using bitmap index on dok table. I tried SELECT dok.* FROM dok JOIN (SELECT dokumnr FROM temptbl GROUP BY dokumnr ANALYZE ) x USING(dokumnr); but got error. Note there are subtle differences between DISTINCT and GROUP BY with respect to nulls. dokumnr is int type and is not null always. Andrus. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] analyzing intermediate query
Oh, I just thought about something, I don't remember in which version it was added, but : EXPLAIN ANALYZE SELECT sum(column1) FROM (VALUES ...a million ntegers... ) AS v Postgres is perfectly happy with that ; it's either a bit slow (about 1 second) or very fast depending on how you view things... I tried in 8.1.4 select * from (values (0)) xx but got ERROR: syntax error at or near ")" SQL state: 42601 Character: 26 Even if this works this may be not solution: I need to apply distinct to temporary table. Temporary table may contain duplicate values and without DISTINCT join produces invalid result. Temporary table itself is created from data from server tables, it is not generated from list. I can use SELECT dok.* FROM dok WHERE dokumnr IN (SELECT dokumnr FROM temptbl) but this seems never use bitmap index scan in 8.1.4 Sadly, creating second temporary table from first temporary table specially for this query seems to be only solution. When materialized row count will be added so that statistics is exact and select count(*) from tbl runs fast ? Andrus. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] analyzing intermediate query
My list can contain 1 .. 10 records and table contains 300 records and is growing. Ah. No IN(), then ;) Temp table + ANALYZE seems your only option... In 8.3 or 8.4 I think that IN() or temp table produce exactly the same result. Andrus. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] analyzing intermediate query
Generally if you know your temptbl will always contains a few rows (say, generally a few and never more than a few thousands) it is better to use something like that : - get list of items - SELECT * FROM table WHERE id IN (...) My list can contain 1 .. 10 records and table contains 300 records and is growing. As discussed here few time ago, IN (...) forces seq scan over 300 rows and maybe stack overflow exception also occurs (stack overflow occurs in 8.0, maybe it is fixed in 8.1). Using temp table + ANALYZE enables bitmap index scan for this query and is thus a lot faster. I formerly used IN (...) but changed this to use temp table + primary key on temp table + analyze this temp table. Using 8.1.4 I can switch this to temp table also if it helps. This requires some special logic to generate temp table name since there may be a number of such tables in single transaction, so is would be major appl rewrite. Andrus. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] analyzing intermediate query
I noticed that query SELECT dok.* FROM dok JOIN (SELECT DISTINCT dokumnr FROM temptbl ) x USING(dokumnr); is slow in 8.1.4 I cannot use explain analyze since this query uses results from temporary table temptbl which is not available. Sometimes innter table returns only 1 row so maybe seq scan is selected instead of single row index access becauses expected count is 1000 As I understand, PostgreSql requires manually running ANALYZE for temporary tables if their row count is different from 1000 How to force PostgreSql to analyze inner table in this query or use other way to get index using query plan if inner query returns single row ? How -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Seq scan over 3.3 millions of rows instead of using date and pattern indexes
explain analyze SELECT sum(1) FROM dok JOIN rid USING (dokumnr) WHERE dok.kuupaev>='2008-05-01' and ( ( dok.doktyyp IN ('V','G','Y','K','I','T','D','N','H','M','E','B','A','R','C','F','J','Q') AND CASE WHEN NOT dok.objrealt OR dok.doktyyp='I' THEN dok.yksus ELSE rid.kuluobjekt END LIKE 'AEGVIIDU%' ) OR ( dok.doktyyp IN ('O','S','I','U','D','P') AND CASE WHEN dok.objrealt THEN rid.kuluobjekt ELSE dok.sihtyksus END LIKE 'AEGVIIDU%' ) ) "Aggregate (cost=369240.67..369240.68 rows=1 width=0) (actual time=41135.557..41135.560 rows=1 loops=1)" " -> Hash Join (cost=96614.24..369229.39 rows=4508 width=0) (actual time=5859.704..40912.979 rows=59390 loops=1)" "Hash Cond: ("outer".dokumnr = "inner".dokumnr)" "Join Filter: "inner".doktyyp = 'V'::bpchar) OR ("inner".doktyyp = 'G'::bpchar) OR ("inner".doktyyp = 'Y'::bpchar) OR ("inner".doktyyp = 'K'::bpchar) OR ("inner".doktyyp = 'I'::bpchar) OR ("inner".doktyyp = 'T'::bpchar) OR ("inner".doktyyp = 'D'::bpchar) OR ("inner".doktyyp = 'N'::bpchar) OR ("inner".doktyyp = 'H'::bpchar) OR ("inner".doktyyp = 'M'::bpchar) OR ("inner".doktyyp = 'E'::bpchar) OR ("inner".doktyyp = 'B'::bpchar) OR ("inner".doktyyp = 'A'::bpchar) OR ("inner".doktyyp = 'R'::bpchar) OR ("inner".doktyyp = 'C'::bpchar) OR ("inner".doktyyp = 'F'::bpchar) OR ("inner".doktyyp = 'J'::bpchar) OR ("inner".doktyyp = 'Q'::bpchar)) AND (CASE WHEN ((NOT ("inner".objrealt)::boolean) OR ("inner".doktyyp = 'I'::bpchar)) THEN "inner".yksus ELSE "outer".kuluobjekt END ~~ 'AEGVIIDU%'::text)) OR ((("inner".doktyyp = 'O'::bpchar) OR ("inner".doktyyp = 'S'::bpchar) OR ("inner".doktyyp = 'I'::bpchar) OR ("inner".doktyyp = 'U'::bpchar) OR ("inner".doktyyp = 'D'::bpchar) OR ("inner".doktyyp = 'P'::bpchar)) AND (CASE WHEN ("inner".objrealt)::boolean THEN "outer".kuluobjekt ELSE "inner".sihtyksus END ~~ 'AEGVIIDU%'::text)))" "-> Seq Scan on rid (cost=0.00..129911.53 rows=3299853 width=18) (actual time=0.039..17277.888 rows=3299777 loops=1)" "-> Hash (cost=92983.97..92983.97 rows=336110 width=38) (actual time=3965.478..3965.478 rows=337455 loops=1)" " -> Bitmap Heap Scan on dok (cost=1993.66..92983.97 rows=336110 width=38) (actual time=135.810..2389.703 rows=337455 loops=1)" "Recheck Cond: (kuupaev >= '2008-05-01'::date)" "Filter: ((doktyyp = 'V'::bpchar) OR (doktyyp = 'G'::bpchar) OR (doktyyp = 'Y'::bpchar) OR (doktyyp = 'K'::bpchar) OR (doktyyp = 'I'::bpchar) OR (doktyyp = 'T'::bpchar) OR (doktyyp = 'D'::bpchar) OR (doktyyp = 'N'::bpchar) OR (doktyyp = 'H'::bpchar) OR (doktyyp = 'M'::bpchar) OR (doktyyp = 'E'::bpchar) OR (doktyyp = 'B'::bpchar) OR (doktyyp = 'A'::bpchar) OR (doktyyp = 'R'::bpchar) OR (doktyyp = 'C'::bpchar) OR (doktyyp = 'F'::bpchar) OR (doktyyp = 'J'::bpchar) OR (doktyyp = 'Q'::bpchar) OR (doktyyp = 'O'::bpchar) OR (doktyyp = 'S'::bpchar) OR (doktyyp = 'I'::bpchar) OR (doktyyp = 'U'::bpchar) OR (doktyyp = 'D'::bpchar) OR (doktyyp = 'P'::bpchar))" "-> Bitmap Index Scan on dok_kuupaev_idx (cost=0.00..1993.66 rows=347618 width=0) (actual time=97.881..97.881 rows=337770 loops=1)" " Index Cond: (kuupaev >= '2008-05-01'::date)" "Total runtime: 41136.348 ms" 8.1.4 Db is analyzed, default_statistics_target is 40. PostgreSql still choices seq scan over rid. This query can optimized as follows: 1. kuupaev >= '2008-05-01' index can reduce number of scanned rows 10 times (to 33) 2. AEGVIIDU% can reduce number of rows 6 times (to 6) How to force pg to use indexes for those conditions ? This query can be executed against different shops groups (int this case there is other value than AEGVIIDU) and for different date. There are 6 different shop groups containing roughly same number or records each. So using index on AEGVIIDU% can decrease number of scanned rows 6 times. Usually 90% of dok records contain 'Y' in dok.doktyyp column and dok.objrealt is false for those records. Is it possible to use come functional index or other method to speed it ? rid.kuluobjekt, dok.yksus and dok.sihtyksus types are char(10). There are indexes dok(yksus bpchar_pattern_ops) dok(sihtyksus bpchar_pattern_ops) Is it possible to re-write query that it uses those indexes or create some other indexes? Using 8.1.4, us-en locale, utf-8 db encoding. select column list is removed from sample. Andrus. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Increasing GROUP BY CHAR columns speed
I'm still not sure why the planner chose to sort rather than hash with oversized work_mem (is there an implied order in the query results I missed?). Group by contains decimal column exchrate. Maybe pg is not capable to use hash with numeric datatype. My guess is that this query can still get much faster if a hash is possible on the last part. It looks like the gain so far has more to do with sorting purely in memory which reduced the number of compares required. But that is just a guess. I fixed this by adding cast to :::float bilkaib.exchrate:::float In this case query is much faster. Hopefully this will not affect to result since numeric(13,8) can casted to float without data loss. Andrus. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Increasing GROUP BY CHAR columns speed
I it seems that slowness is caused by grouping by column exchrate numeric(13,8) exchrate has different values in few rows. It has usually value 0 In this sample query it is always 0. I tried not change exchrate with nullif( bilkaib.EXCHRATE,0) but this does not up speed query, no idea why. Andrus. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Increasing GROUP BY CHAR columns speed
I it seems that slowness is caused by grouping by column exchrate numeric(13,8) if this column is excluded, query takes 12 seconds if this column in present, query takes 27 (!) seconds. How to fix this ? Andrus. set search_path to firma2,public; SET work_mem = 2097151 ; explain analyze SELECT CASE WHEN bilkaib.raha='EEK' THEN 0 ELSE bilkaib.id END, bilkaib.DB, bilkaib.CR, bilkaib.RAHA, -- 12 sek bilkaib.EXCHRATE, -- 27 sec SUM(bilkaib.summa)::numeric(14,2) AS summa from BILKAIB join KONTO CRKONTO ON bilkaib.cr=crkonto.kontonr AND crkonto.iseloom='A' join KONTO DBKONTO ON bilkaib.db=dbkonto.kontonr AND dbkonto.iseloom='A' left join klient on bilkaib.klient=klient.kood WHERE --(bilkaib.cr LIKE ''||'%' OR bilkaib.db LIKE ''||'%') bilkaib.kuupaev BETWEEN '2008-01-01' AND '2008-12-31' GROUP BY 1,2,3,4,5 "GroupAggregate (cost=71338.72..79761.05 rows=240638 width=58) (actual time=24570.085..27382.022 rows=217 loops=1)" " -> Sort (cost=71338.72..71940.31 rows=240638 width=58) (actual time=24566.700..25744.006 rows=322202 loops=1)" "Sort Key: CASE WHEN (bilkaib.raha = 'EEK'::bpchar) THEN 0 ELSE bilkaib.id END, bilkaib.db, bilkaib.cr, bilkaib.raha, bilkaib.exchrate" "-> Hash Left Join (cost=193.31..49829.89 rows=240638 width=58) (actual time=17.072..9901.578 rows=322202 loops=1)" " Hash Cond: ("outer".klient = "inner".kood)" " -> Hash Join (cost=42.18..45624.00 rows=240638 width=74) (actual time=4.715..7151.111 rows=322202 loops=1)" "Hash Cond: ("outer".cr = "inner".kontonr)" "-> Hash Join (cost=21.09..41803.63 rows=278581 width=74) (actual time=2.306..4598.703 rows=322202 loops=1)" " Hash Cond: ("outer".db = "inner".kontonr)" " -> Seq Scan on bilkaib (cost=0.00..37384.19 rows=322507 width=74) (actual time=0.075..1895.027 rows=322202 loops=1)" "Filter: ((kuupaev >= '2008-01-01'::date) AND (kuupaev <= '2008-12-31'::date))" " -> Hash (cost=20.49..20.49 rows=241 width=14) (actual time=2.193..2.193 rows=241 loops=1)" "-> Seq Scan on konto dbkonto (cost=0.00..20.49 rows=241 width=14) (actual time=0.011..1.189 rows=241 loops=1)" " Filter: (iseloom = 'A'::bpchar)" "-> Hash (cost=20.49..20.49 rows=241 width=14) (actual time=2.386..2.386 rows=241 loops=1)" " -> Seq Scan on konto crkonto (cost=0.00..20.49 rows=241 width=14) (actual time=0.020..1.394 rows=241 loops=1)" "Filter: (iseloom = 'A'::bpchar)" " -> Hash (cost=147.90..147.90 rows=1290 width=16) (actual time=12.319..12.319 rows=1290 loops=1)" "-> Seq Scan on klient (cost=0.00..147.90 rows=1290 width=16) (actual time=0.032..6.979 rows=1290 loops=1)" "Total runtime: 27434.724 ms" set search_path to firma2,public; SET work_mem = 2097151 ; explain analyze SELECT CASE WHEN bilkaib.raha='EEK' THEN 0 ELSE bilkaib.id END, bilkaib.DB, bilkaib.CR, bilkaib.RAHA, SUM(bilkaib.summa)::numeric(14,2) AS summa from BILKAIB join KONTO CRKONTO ON bilkaib.cr=crkonto.kontonr AND crkonto.iseloom='A' join KONTO DBKONTO ON bilkaib.db=dbkonto.kontonr AND dbkonto.iseloom='A' left join klient on bilkaib.klient=klient.kood WHERE bilkaib.kuupaev BETWEEN '2008-01-01' AND '2008-12-31' GROUP BY 1,2,3,4 "HashAggregate (cost=52837.86..57049.03 rows=240638 width=50) (actual time=11744.137..11745.578 rows=215 loops=1)" " -> Hash Left Join (cost=193.31..49829.89 rows=240638 width=50) (actual time=17.330..9826.549 rows=322202 loops=1)" "Hash Cond: ("outer".klient = "inner".kood)" "-> Hash Join (cost=42.18..45624.00 rows=240638 width=66) (actual time=4.804..7141.983 rows=322202 loops=1)" " Hash Cond: ("outer".cr = "inner".kontonr)" " -> Hash Join (cost=21.09..41803.63 rows=278581 width=66) (actual time=2.343..4600.683 rows=322202 loops=1)" "Hash Cond: ("outer".db = "inner".kontonr)" "-> Seq Scan on bilkaib (cost=0.00..37384.19 rows=322507 width=66) (actual time=0.081..1939.376 rows=322202 loops=1)" "
Re: [PERFORM] Increasing GROUP BY CHAR columns speed
Application should work with any server starting at 8.1 with any RAM size (probably starting at 1 GB). How to find safe value which does not cause error in SET work_mem command ? If I use 2 GB maybe this can cause excaption when running in server with 1 GB RAM where this setting may be not allowed. Andrus. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Increasing GROUP BY CHAR columns speed
Scott, Thank you. The below query is spending most of its time in the sort, or perhaps the complicated check condition before it. The explain has a 8 second gap in time between the 2.8 seconds after the Hash Left Join and before the Sort. I'm guessing its hidden in the sort. You can get the planner to switch from a sort to a hash aggregate with a large work_mem. Try calling SET work_mem = '100MB'; before this query first. It may not help that much if the check time is as expensive as it looks in the plan below, but its very easy to try. If it does help, you may want to temporarily increase that value only for this query rather than making it a default in the config file. SET work_mem = 2097151 (this is max allowed value) or SET work_mem = 97151 decreases query time from 12 seconds to 9 seconds. My application may ran in servers with 1 GB RAM only. I'm afraid than in those servers 2097151 will cause error and abort query. Is it reasonable to add SET work_mem = 97151 before this query and SET work_mem TO DEFAULT after this query ? Or should I use max value in cases where there are much more data ? This query may return a much more data for longer period and more accounts. CASE WHEN dbkonto.objekt1='+' THEN bilkaib.DBOBJEKT ELSE null END:: CHAR(10) AS dbobjekt is ugly. I tried to rewrite it using NullIfNot(dbkonto.objekt1, '+') AS dbobjekt, bot got error ERROR: function nullifnot(character, "unknown") does not exist How to re-write this in nicer and faster way ? For most of rows checks WHEN objektn='+' will fail: objektn values are usually rarely equal to '+': they are empty or null mostly. Maybe this can be used to optimize the query. Btw. Tom Lane's reply from earlier discussion about this query speed (then there were '' instead of NULL in group columns) some years ago: "I think the problem is probably that you're sorting two dozen CHAR columns, and that in many of the rows all these entries are '' forcing the sort code to compare all two dozen columns (not so)? So the sort ends up doing lots and lots and lots of CHAR comparisons. Which can be slow, especially in non-C locales." locale specific check is not nessecary for those CHAR(10) columns. How to force PostgreSql to use binary check for grouping ? Some dbms allow to mark columns as C locale. I havent found this nor chartobin() function in PostgreSql. Will creating BinaryNullIfNot(dbkonto.objekt1, '+') function solve this ? Andrus. New testcase: set search_path to firma2,public; SET work_mem = 2097151; -- 9 seconds -- SET work_mem = 1097151; -- 9 seconds --SET work_mem to default; -- 12 seconds explain analyze SELECT CASE WHEN bilkaib.raha='EEK' THEN 0 ELSE bilkaib.id END, bilkaib.DB, CASE WHEN dbkonto.objekt1='+' THEN bilkaib.DBOBJEKT ELSE null END:: CHAR(10) AS dbobjekt, CASE WHEN dbkonto.objekt2='+' THEN bilkaib.DB2OBJEKT ELSE null END:: CHAR(10) AS db2objekt, CASE WHEN dbkonto.objekt3='+' THEN bilkaib.DB3OBJEKT ELSE null END:: CHAR(10) AS db3objekt, CASE WHEN dbkonto.objekt4='+' THEN bilkaib.DB4OBJEKT ELSE null END:: CHAR(10) AS db4objekt, CASE WHEN dbkonto.objekt5='+' THEN bilkaib.DB5OBJEKT ELSE null END:: CHAR(10) AS db5objekt, CASE WHEN dbkonto.objekt6='+' THEN bilkaib.DB6OBJEKT ELSE null END:: CHAR(10) AS db6objekt, CASE WHEN dbkonto.objekt7='+' THEN bilkaib.DB7OBJEKT ELSE null END:: CHAR(10) AS db7objekt, CASE WHEN dbkonto.objekt8='+' THEN bilkaib.DB8OBJEKT ELSE null END:: CHAR(10) AS db8objekt, CASE WHEN dbkonto.objekt9='+' THEN bilkaib.DB9OBJEKT ELSE null END:: CHAR(10) AS db9objekt, bilkaib.CR, CASE WHEN crkonto.objekt1='+' THEN bilkaib.crOBJEKT ELSE null END:: CHAR(10) AS crobjekt, CASE WHEN crkonto.objekt2='+' THEN bilkaib.cr2OBJEKT ELSE null END:: CHAR(10) AS cr2objekt, CASE WHEN crkonto.objekt3='+' THEN bilkaib.cr3OBJEKT ELSE null END:: CHAR(10) AS cr3objekt, CASE WHEN crkonto.objekt4='+' THEN bilkaib.cr4OBJEKT ELSE null END:: CHAR(10) AS cr4objekt, CASE WHEN crkonto.objekt5='+' THEN bilkaib.cr5OBJEKT ELSE null END:: CHAR(10) AS cr5objekt, CASE WHEN crkonto.objekt6='+' THEN bilkaib.cr6OBJEKT ELSE null END:: CHAR(10) AS cr6objekt, CASE WHEN crkonto.objekt7='+' THEN bilkaib.cr7OBJEKT ELSE null END:: CHAR(10) AS cr7objekt, CASE WHEN crkonto.objekt8='+' THEN bilkaib.cr8OBJEKT ELSE null END:: CHAR(10) AS cr8objekt, CASE WHEN crkonto.objekt9='+' THEN bilkaib.cr9OBJEKT ELSE null END:: CHAR(10) AS cr9objekt, bilkaib.RAHA, CASE WHEN crkonto.klienkaupa OR dbkonto.klienkaupa OR crkonto.tyyp IN ('K','I') OR dbkonto.tyyp IN ('K','I') THEN bilkaib.KLIENT ELSE NULL END AS klient, bilkaib.EXCHRATE, CASE WHEN crkonto.klienkaupa OR dbkonto.klienkaupa OR crkonto.ty
[PERFORM] Increasing GROUP BY CHAR columns speed
Group by using CHAR columns takes abnormally big time. How to speed it ? Andrus. 8.1.4, cluster locale is en-us, db encoding is utf-8 set search_path to firma2,public; explain analyze SELECT CASE WHEN bilkaib.raha='EEK' THEN 0 ELSE bilkaib.id END, bilkaib.DB, CASE WHEN dbkonto.objekt1='+' THEN bilkaib.DBOBJEKT ELSE null END:: CHAR(10) AS dbobjekt, CASE WHEN dbkonto.objekt2='+' THEN bilkaib.DB2OBJEKT ELSE null END:: CHAR(10) AS db2objekt, CASE WHEN dbkonto.objekt3='+' THEN bilkaib.DB3OBJEKT ELSE null END:: CHAR(10) AS db3objekt, CASE WHEN dbkonto.objekt4='+' THEN bilkaib.DB4OBJEKT ELSE null END:: CHAR(10) AS db4objekt, CASE WHEN dbkonto.objekt5='+' THEN bilkaib.DB5OBJEKT ELSE null END:: CHAR(10) AS db5objekt, CASE WHEN dbkonto.objekt6='+' THEN bilkaib.DB6OBJEKT ELSE null END:: CHAR(10) AS db6objekt, CASE WHEN dbkonto.objekt7='+' THEN bilkaib.DB7OBJEKT ELSE null END:: CHAR(10) AS db7objekt, CASE WHEN dbkonto.objekt8='+' THEN bilkaib.DB8OBJEKT ELSE null END:: CHAR(10) AS db8objekt, CASE WHEN dbkonto.objekt9='+' THEN bilkaib.DB9OBJEKT ELSE null END:: CHAR(10) AS db9objekt, bilkaib.CR, CASE WHEN crkonto.objekt1='+' THEN bilkaib.crOBJEKT ELSE null END:: CHAR(10) AS crobjekt, CASE WHEN crkonto.objekt2='+' THEN bilkaib.cr2OBJEKT ELSE null END:: CHAR(10) AS cr2objekt, CASE WHEN crkonto.objekt3='+' THEN bilkaib.cr3OBJEKT ELSE null END:: CHAR(10) AS cr3objekt, CASE WHEN crkonto.objekt4='+' THEN bilkaib.cr4OBJEKT ELSE null END:: CHAR(10) AS cr4objekt, CASE WHEN crkonto.objekt5='+' THEN bilkaib.cr5OBJEKT ELSE null END:: CHAR(10) AS cr5objekt, CASE WHEN crkonto.objekt6='+' THEN bilkaib.cr6OBJEKT ELSE null END:: CHAR(10) AS cr6objekt, CASE WHEN crkonto.objekt7='+' THEN bilkaib.cr7OBJEKT ELSE null END:: CHAR(10) AS cr7objekt, CASE WHEN crkonto.objekt8='+' THEN bilkaib.cr8OBJEKT ELSE null END:: CHAR(10) AS cr8objekt, CASE WHEN crkonto.objekt9='+' THEN bilkaib.cr9OBJEKT ELSE null END:: CHAR(10) AS cr9objekt, bilkaib.RAHA, CASE WHEN crkonto.klienkaupa OR dbkonto.klienkaupa OR crkonto.tyyp IN ('K','I') OR dbkonto.tyyp IN ('K','I') THEN bilkaib.KLIENT ELSE NULL END AS klient, bilkaib.EXCHRATE, CASE WHEN crkonto.klienkaupa OR dbkonto.klienkaupa OR crkonto.tyyp IN ('K','I') OR dbkonto.tyyp IN ('K','I') THEN klient.nimi ELSE NULL END AS kliendinim, -- 24. CAST(CASE WHEN crkonto.arvekaupa OR dbkonto.arvekaupa OR (bilkaib.cr<>'00' AND crkonto.tyyp='K') OR (bilkaib.db<>'00' AND dbkonto.tyyp='K') THEN bilkaib.doknr ELSE NULL END AS CHAR(25)) AS doknr ,bilkaib.ratediffer ,CASE WHEN bilkaib.raha='EEK' THEN DATE'20070101' ELSE bilkaib.kuupaev END AS kuupaev ,SUM(bilkaib.summa)::numeric(14,2) AS summa from BILKAIB join KONTO CRKONTO ON bilkaib.cr=crkonto.kontonr AND crkonto.iseloom='A' join KONTO DBKONTO ON bilkaib.db=dbkonto.kontonr AND dbkonto.iseloom='A' left join klient on bilkaib.klient=klient.kood where ( bilkaib.cr LIKE '112'||'%' OR bilkaib.db LIKE '112'||'%' ) AND bilkaib.kuupaev BETWEEN '2007-01-01' AND '2008-11-26' GROUP BY 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28 "GroupAggregate (cost=52316.23..61434.48 rows=41923 width=838) (actual time=10771.337..11372.135 rows=577 loops=1)" " -> Sort (cost=52316.23..52421.03 rows=41923 width=838) (actual time=10770.529..11012.651 rows=52156 loops=1)" "Sort Key: CASE WHEN (bilkaib.raha = 'EEK'::bpchar) THEN 0 ELSE bilkaib.id END, bilkaib.db, (CASE WHEN (dbkonto.objekt1 = '+'::bpchar) THEN bilkaib.dbobjekt ELSE NULL::bpchar END)::character(10), (CASE WHEN (dbkonto.objekt2 = '+'::bpchar) THEN bilkaib.db2objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN (dbkonto.objekt3 = '+'::bpchar) THEN bilkaib.db3objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN (dbkonto.objekt4 = '+'::bpchar) THEN bilkaib.db4objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN (dbkonto.objekt5 = '+'::bpchar) THEN bilkaib.db5objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN (dbkonto.objekt6 = '+'::bpchar) THEN bilkaib.db6objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN (dbkonto.objekt7 = '+'::bpchar) THEN bilkaib.db7objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN (dbkonto.objekt8 = '+'::bpchar) THEN bilkaib.db8objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN (dbkonto.objekt9 = '+'::bpchar) THEN bilkaib.db9objekt ELSE NULL::bpchar END)::character(10), bilkaib.cr, (CASE WHEN (crkonto.objekt1
Re: [PERFORM] Increasing pattern index query speed
har) OR (doktyyp = 'E'::bpchar) OR (doktyyp = 'B'::bpchar) OR (doktyyp = 'A'::bpchar) OR (doktyyp = 'R'::bpchar) OR (doktyyp = 'C'::bpchar) OR (doktyyp = 'F'::bpchar) OR (doktyyp = 'J'::bpchar) OR (doktyyp = 'Q'::bpchar) OR (doktyyp = 'O'::bpchar) OR (doktyyp = 'S'::bpchar) OR (doktyyp = 'I'::bpchar) OR (doktyyp = 'U'::bpchar) OR (doktyyp = 'D'::bpchar) OR (doktyyp = 'P'::bpchar))" "-> Bitmap Index Scan on dok_kuupaev_idx (cost=0.00..414.75 rows=72500 width=0) (actual time=20.049..20.049 rows=72664 loops=1)" " Index Cond: (kuupaev >= '2008-10-01'::date)" "-> Sort (cost=1775.54..1809.10 rows=13423 width=24) (actual time=223.235..457.888 rows=59876 loops=1)" " Sort Key: toode.toode" " -> Seq Scan on toode (cost=0.00..855.23 rows=13423 width=24) (actual time=0.046..63.783 rows=13427 loops=1)" "Total runtime: 32807.767 ms" How to speed this up ? 'RIISIPERE%' is shop group code. Using this condition can limit scan to 6 times less documentes since there are 6 shops. Mayber is it possible to create indexes or other way to force index search for condition dok.doktyyp IN ('V','G','Y','K','I','T','D','N','H','M','E','B','A','R','C','F','J','Q') AND CASE WHEN NOT dok.objrealt OR dok.doktyyp='I' THEN dok.yksus ELSE rid.kuluobjekt END LIKE 'RIISIPERE%' ) OR ( dok.doktyyp IN ('O','S','I','U','D','P') AND CASE WHEN dok.objrealt THEN rid.kuluobjekt ELSE dok.sihtyksus END LIKE 'RIISIPERE%' ) or is it possible to re-write this condition so that it uses existing pattern indexes CREATE INDEX dok_sihtyksus_pattern_idx ON firma2.dok (sihtyksus bpchar_pattern_ops); CREATE INDEX dok_yksus_pattern_idx ON firma2.dok (yksus bpchar_pattern_ops); without changing tables structureˇ? Andrus. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Increasing pattern index query speed
Richard, Results are provided in bottom of the message to which you replied. No - the explains there were contrasting a date test BETWEEN versus =. I changed rid.toode statitics target to 100: ALTER TABLE firma2.rid ALTER COLUMN toode SET STATISTICS 100; analyze firma2.rid; Analyze takes 3 seconds and testcase rans fast. I'm planning to monitor results by looking log file for queries which take longer than 10 seconds. Do you still need results ? If yes, which query and how many times should I run? Ah, I think I understand. The test case was *missing* this clause. I added this clause to testcase. Also added char(70) colums containing padding characters to all three tables. Cannot still reproduce this issue in testcase in fast devel 8.3 notebook. In testcase order_products contains product_id values in a very regular order, maybe this affects the results. No idea how to use random() to generate random products for every order. Andrus. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Increasing pattern index query speed
Is it OK to run ALTER TABLE rid ALTER COLUMN toode SET STATISTICS 1000 in prod database or should I try to decrease 1000 to smaller value ? rid is big increasing table and is changed frequently, mostly by adding rows. pgAdmin shows default_statistic_target value has its default value 10 in postgresql.conf file Andrus. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Increasing pattern index query speed
Richard, And the results were? Results are provided in bottom of the message to which you replied. One problem at a time. Let's get the pattern-matching speed problems on your live server sorted, then we can look at different queries. First message in this thread described the issue with query having additional condition AND dok.kuupaev BETWEEN '2008-11-21' AND '2008-11-21' It seems that this problem occurs when pattern matching and BETWEEN conditions are used in same query. According to Scott Garey great recommendation I added ALTER TABLE rid ALTER COLUMN toode SET STATISTICS 1000; This fixes testcase in live server, see my other message. Is it OK to run ALTER TABLE rid ALTER COLUMN toode SET STATISTICS 1000 in prod database or should I try to decrease 1000 to smaller value ? rid is big increasing table and is changed frequently, mostly by adding rows. Andrus. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Increasing pattern index query speed
Scott, My first thought on the query where a pattern being faster than the query with an exact value is that the planner does not have good enough statistics on that column. Without looking at the explain plans further, I would suggest trying something simple. The fact that it is fasster on 8.3 but slower on 8.1 may have to do with changes between versions, or may simply be due to luck in the statistics sampling. See if increasing the statistics target on that column significantly does anything: EXPLAIN (your query); ALTER TABLE orders_products ALTER COLUMN product_id SET STATISTICS 2000; ANALYZE orders_products; EXPLAIN (your query); 2000 is simply a guess of mine for a value much larger than the default. This will generally make query planning slower but the system will have a lot more data about that column and the distribution of data in it. This should help stabilize the query performance. If this has an effect, the query plans will change. Your question below really boils down to something more simple: --Why is the most optimal query plan not chosen? This is usually due to either insufficient statistics or quirks in how the query planner works on a specific data >set or with certain configuration options. Thank you very much. I found that AND dok.kuupaev = date'2008-11-21' runs fast but AND dok.kuupaev BETWEEN date'2008-11-21' AND date'2008-11-21' runs very slow. explain SELECT sum(1) FROM dok JOIN rid USING (dokumnr) JOIN toode USING (toode) WHERE rid.toode like '9910%' plan with default statistics: "Aggregate (cost=17.86..17.87 rows=1 width=0)" " -> Nested Loop (cost=0.00..17.85 rows=1 width=0)" "-> Nested Loop (cost=0.00..11.84 rows=1 width=24)" " Join Filter: ("outer".dokumnr = "inner".dokumnr)" " -> Index Scan using dok_kuupaev_idx on dok (cost=0.00..5.81 rows=1 width=4)" "Index Cond: ((kuupaev >= '2008-11-21'::date) AND (kuupaev <= '2008-11-21'::date))" " -> Index Scan using rid_toode_pattern_idx on rid (cost=0.00..6.01 rows=1 width=28)" "Index Cond: ((toode ~>=~ '9910'::bpchar) AND (toode ~<~ '9911'::bpchar))" "Filter: (toode ~~ '9910%'::text)" "-> Index Scan using toode_pkey on toode (cost=0.00..6.00 rows=1 width=24)" " Index Cond: ("outer".toode = toode.toode)" after statistics is changed query runs fast ( 70 ... 1000 ms) ALTER TABLE rid ALTER COLUMN toode SET STATISTICS 1000; analyze rid; explain analyze SELECT sum(1) FROM dok JOIN rid USING (dokumnr) JOIN toode USING (toode) WHERE rid.toode like '9910%' AND dok.kuupaev BETWEEN date'2008-11-21' AND date'2008-11-21' "Aggregate (cost=27.04..27.05 rows=1 width=0) (actual time=44.830..44.834 rows=1 loops=1)" " -> Nested Loop (cost=0.00..27.04 rows=1 width=0) (actual time=0.727..44.370 rows=108 loops=1)" "-> Nested Loop (cost=0.00..21.02 rows=1 width=24) (actual time=0.688..40.519 rows=108 loops=1)" " -> Index Scan using dok_kuupaev_idx on dok (cost=0.00..5.81 rows=1 width=4) (actual time=0.027..8.094 rows=1678 loops=1)" "Index Cond: ((kuupaev >= '2008-11-21'::date) AND (kuupaev <= '2008-11-21'::date))" " -> Index Scan using rid_dokumnr_idx on rid (cost=0.00..15.20 rows=1 width=28) (actual time=0.011..0.011 rows=0 loops=1678)" "Index Cond: ("outer".dokumnr = rid.dokumnr)" "Filter: (toode ~~ '9910%'::text)" "-> Index Scan using toode_pkey on toode (cost=0.00..6.00 rows=1 width=24) (actual time=0.016..0.020 rows=1 loops=108)" " Index Cond: ("outer".toode = toode.toode)" "Total runtime: 45.050 ms" It seems that you are genius. I used 1000 since doc wrote that max value is 1000 Rid table contains 3.5millions rows, will increase 1 millions of rows per year and is updated frequently, mostly by adding. Is it OK to leave SET STATISTICS 1000; setting for this table this column or should I try to decrease it ? Andrus. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Increasing pattern index query speed
Richard and Mario, You can't use xxx_pattern_ops indexes for non-pattern tests. I missed regular index. Sorry for that. Now issue with testcase is solved. Thank you very much. I researched issue in live 8.1.4 db a bit more. Performed vacuum and whole db reindex. Tried several times to run two same pattern queries in quiet db. additonal condition AND dok.kuupaev BETWEEN date'2008-11-21' AND date'2008-11-21' takes 239 seconds to run. additonal condition AND dok.kuupaev = date'2008-11-21' takes 1 seconds. Both query conditions are logically the same. How to make BETWEEN query fast (real queries are running as between queries over some date range)? P.S. VACUUM issues warning that free space map 15 is not sufficient, 16 nodes reqired. Two days ago after vacuum full there were 6 used enties in FSM. No idea why this occurs. Andrus. set search_path to firma2,public; explain analyze SELECT sum(1) FROM dok JOIN rid USING (dokumnr) JOIN toode USING (toode) WHERE rid.toode like '9910%' AND dok.kuupaev BETWEEN date'2008-11-21' AND date'2008-11-21' "Aggregate (cost=17.86..17.87 rows=1 width=0) (actual time=239346.647..239346.651 rows=1 loops=1)" " -> Nested Loop (cost=0.00..17.85 rows=1 width=0) (actual time=3429.715..239345.923 rows=108 loops=1)" "-> Nested Loop (cost=0.00..11.84 rows=1 width=24) (actual time=3429.666..239339.687 rows=108 loops=1)" " Join Filter: ("outer".dokumnr = "inner".dokumnr)" " -> Index Scan using dok_kuupaev_idx on dok (cost=0.00..5.81 rows=1 width=4) (actual time=0.028..13.341 rows=1678 loops=1)" "Index Cond: ((kuupaev >= '2008-11-21'::date) AND (kuupaev <= '2008-11-21'::date))" " -> Index Scan using rid_toode_pattern_idx on rid (cost=0.00..6.01 rows=1 width=28) (actual time=0.025..86.156 rows=15402 loops=1678)" "Index Cond: ((toode ~>=~ '9910'::bpchar) AND (toode ~<~ '9911'::bpchar))" "Filter: (toode ~~ '9910%'::text)" "-> Index Scan using toode_pkey on toode (cost=0.00..6.00 rows=1 width=24) (actual time=0.032..0.037 rows=1 loops=108)" " Index Cond: ("outer".toode = toode.toode)" "Total runtime: 239347.132 ms" explain analyze SELECT sum(1) FROM dok JOIN rid USING (dokumnr) JOIN toode USING (toode) WHERE rid.toode like '9910%' AND dok.kuupaev = date'2008-11-21' "Aggregate (cost=17.86..17.87 rows=1 width=0) (actual time=707.028..707.032 rows=1 loops=1)" " -> Nested Loop (cost=0.00..17.85 rows=1 width=0) (actual time=60.890..706.460 rows=108 loops=1)" "-> Nested Loop (cost=0.00..11.84 rows=1 width=24) (actual time=60.848..701.908 rows=108 loops=1)" " -> Index Scan using rid_toode_pattern_idx on rid (cost=0.00..6.01 rows=1 width=28) (actual time=0.120..247.636 rows=15402 loops=1)" "Index Cond: ((toode ~>=~ '9910'::bpchar) AND (toode ~<~ '9911'::bpchar))" "Filter: (toode ~~ '9910%'::text)" " -> Index Scan using dok_dokumnr_idx on dok (cost=0.00..5.81 rows=1 width=4) (actual time=0.021..0.021 rows=0 loops=15402)" "Index Cond: (dok.dokumnr = "outer".dokumnr)" "Filter: (kuupaev = '2008-11-21'::date)" "-> Index Scan using toode_pkey on toode (cost=0.00..6.00 rows=1 width=24) (actual time=0.021..0.026 rows=1 loops=108)" " Index Cond: ("outer".toode = toode.toode)" "Total runtime: 707.250 ms" vmstat 5 output during running slower query: procs ---memory-- ---swap-- -io --system-- cpu r b swpd free buff cache si sobibo incs us sy id wa 2 0332 738552 0 126483200 4 1111 6 1 83 10 1 0332 738520 0 126483200 0 135 25934 24 76 0 0 1 0332 738488 0 126483200 0 112 26342 24 76 0 0 1 0332 738504 0 126483200 013 25219 23 77 0 0 1 0332 738528 0 126483200 031 25526 26 74 0 0 1 0332 738528 0 126483200 0 6 25118 27 73 0 0 1 0332 738544 0 126485600 522 25425 27 73 0 0 1 0332 737908 0 126485600 013 25222 27 73 0 0 1 0332 737932 0 126485600 0 2 25118 23 77 0 0 1 0332 737932
Re: [PERFORM] limit clause produces wrong query plan
it was veery fast. To be honest I do not know what is happening?! This is really weird. It seems that PostgreSql OFFSET / LIMIT are not optimized and thus typical paging queries SELECT ... FROM bigtable ORDER BY intprimarykey OFFSET pageno*100 LIMIT 100 or even first page query SELECT ... FROM bigtable ORDER BY intprimarykey OFFSET 0 LIMIT 100 cannot be used in PostgreSql at all for big tables. Do you have any idea how to fix this ? Andrus. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Hash join on int takes 8..114 seconds
9 2 0 98 0 1 0216 124664 0 189924400 0 7 26662 2 0 97 0 0 0216 124788 0 189924800 0 8 27373 3 1 96 0 0 0216 124656 0 189925200 0 2 26554 2 0 97 0 1 0216 124656 0 189925200 022 26863 14 39 48 0 < start of query 1 0216 124656 0 189925200 062 26761 25 75 0 0 1 0216 124656 0 189925200 0 2 26655 28 72 0 0 1 0216 124664 0 189925600 0 5 26556 26 74 0 0 1 0216 124788 0 189925600 010 27167 25 75 0 0 1 0216 124664 0 189925600 0 3 26554 25 75 0 0 1 0216 124160 0 189926000 0 1 26557 28 72 0 0 1 0216 125020 0 189926000 021 27260 28 72 0 0 1 0216 124020 0 189926400 0 0 27173 29 71 0 0 0 0216 124260 0 189926800 0 3 26661 19 59 22 0 <-- end of query 1 0216 125268 0 189926000 0 9 26859 2 0 97 0 0 0216 124912 0 189926800 0 5 27059 3 0 96 0 0 0216 124656 0 189927200 0 5 26764 2 0 98 0 0 0216 124664 0 189927200 0 0 26350 2 0 98 0 top shows single postmaster process activity: top - 23:07:49 up 27 days, 4:20, 1 user, load average: 0.25, 0.22, 0.12 Tasks: 50 total, 3 running, 47 sleeping, 0 stopped, 0 zombie Cpu(s): 29.3% us, 70.7% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si Mem: 2075828k total, 1951604k used, 124224k free,0k buffers Swap: 3911816k total, 216k used, 3911600k free, 1899348k cached PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ COMMAND 29687 postgres 25 0 144m 124m 121m R 96.3 6.2 8:18.04 postmaster 8147 root 16 0 4812 1628 1316 S 1.0 0.1 0:00.03 sshd 8141 root 15 0 5080 1892 1528 S 0.7 0.1 0:00.02 sshd 8145 sshd 15 0 4816 1220 912 S 0.3 0.1 0:00.01 sshd 8151 sshd 15 0 4812 1120 816 S 0.3 0.1 0:00.01 sshd 1 root 16 0 1480 508 444 S 0.0 0.0 0:01.89 init 2 root 34 19 000 S 0.0 0.0 0:00.01 ksoftirqd/0 3 root 10 -5 000 S 0.0 0.0 0:00.00 events/0 4 root 10 -5 000 S 0.0 0.0 0:00.63 khelper 5 root 10 -5 000 S 0.0 0.0 0:00.00 kthread 7 root 10 -5 000 S 0.0 0.0 2:21.73 kblockd/0 8 root 20 -5 000 S 0.0 0.0 0:00.00 kacpid 115 root 13 -5 000 S 0.0 0.0 0:00.00 aio/0 114 root 15 0 000 S 0.0 0.0 9:21.41 kswapd0 116 root 10 -5 000 S 0.0 0.0 0:12.06 xfslogd/0 117 root 10 -5 000 S 0.0 0.0 1:36.43 xfsdatad/0 706 root 10 -5 000 S 0.0 0.0 0:00.00 kseriod 723 root 13 -5 000 S 0.0 0.0 0:00.00 kpsmoused 738 root 11 -5 000 S 0.0 0.0 0:00.00 ata/0 740 root 11 -5 000 S 0.0 0.0 0:00.00 scsi_eh_0 741 root 11 -5 000 S 0.0 0.0 0:00.00 scsi_eh_1 742 root 11 -5 000 S 0.0 0.0 0:00.00 scsi_eh_2 743 root 11 -5 000 S 0.0 0.0 0:00.00 scsi_eh_3 762 root 10 -5 000 S 0.0 0.0 0:18.64 xfsbufd 763 root 10 -5 000 S 0.0 0.0 0:00.79 xfssyncd 963 root 16 -4 1712 528 336 S 0.0 0.0 0:00.24 udevd 6677 root 15 0 1728 572 400 S 0.0 0.0 0:10.14 syslog-ng 7183 root 15 0 3472 828 672 S 0.0 0.0 0:08.50 sshd 7222 root 16 0 1736 672 556 S 0.0 0.0 0:00.03 cron 7237 root 16 0 1620 712 608 S 0.0 0.0 0:00.00 agetty 7238 root 17 0 1616 712 608 S 0.0 0.0 0:00.00 agetty 7239 root 16 0 1616 712 608 S 0.0 0.0 0:00.00 agetty 7240 root 16 0 1616 708 608 S 0.0 0.0 0:00.00 agetty 7241 root 16 0 1616 708 608 S 0.0 0.0 0:00.00 agetty 31873 root 15 0 1616 712 608 S 0.0 0.0 0:00.00 agetty 14908 postgres 16 0 141m 10m 9936 S 0.0 0.5 0:01.44 postmaster 14915 postgres 16 0 8468 1360 896 S 0.0 0.1 0:00.36 postmaster Andrus. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Increasing pattern index query speed
Richard, These are the same but the times are different. I'd be very surprised if you can reproduce these times reliably. I re-tried today again and got same results: in production database pattern query is many times slower that equality query. toode and rid base contain only single product starting with 9910 So both queries should scan exactly same numbers of rows. Can I give you some wider-ranging suggestions Andrus? 1. Fix the vacuuming issue in your hash-join question. I have ran VACUUM FULL VERBOSE ANALYSE and set max_fsm_pages=15 So issue is fixed before those tests. 2. Monitor the system to make sure you know if/when disk activity is high. I optimized this system. Now there are short (some seconds) sales queries about after every 5 - 300 seconds which cause few disk activity and add few new rows to some tables. I havent seen that this activity affects to this test result. 3. *Then* start to profile individual queries and look into their plans. Change the queries one at a time and monitor again. How to change pattern matching query to faster ? Andrus. Btw. I tried to reproduce this big difference in test server in 8.3 using sample data script below and got big difference but in opposite direction. explain analyze SELECT sum(1) FROM orders JOIN orders_products USING (order_id) JOIN products USING (product_id) WHERE orders.order_date>'2006-01-01' and ... different where clauses produce different results: AND orders_products.product_id = '3370' -- 880 .. 926 ms AND orders_products.product_id like '3370%' -- 41 ..98 ms So patter index is 10 .. 20 times (!) faster always. No idea why. Test data creation script: begin; CREATE OR REPLACE FUNCTION Counter() RETURNS int IMMUTABLE AS $_$ SELECT 350; $_$ LANGUAGE SQL; CREATE TEMP TABLE orders (order_id INTEGER NOT NULL, order_date DATE NOT NULL); CREATE TEMP TABLE products (product_id CHAR(20) NOT NULL, product_name char(70) NOT NULL, quantity numeric(12,2) default 1); CREATE TEMP TABLE orders_products (order_id INTEGER NOT NULL, product_id CHAR(20), id serial, price numeric(12,2) default 1 ); INSERT INTO products SELECT (n*power( 10,13))::INT8::CHAR(20), 'product number ' || n::TEXT FROM generate_series(0,13410) AS n; INSERT INTO orders SELECT n,'2005-01-01'::date + (4000.0 * n/Counter() * '1 DAY'::interval) FROM generate_series(0, Counter()/3 ) AS n; SET work_mem TO 2097151; INSERT INTO orders_products SELECT generate_series/3 as order_id, ( (1+ (generate_series % 13410))*power( 10,13))::INT8::CHAR(20) AS product_id FROM generate_series(1, Counter()); ALTER TABLE orders ADD PRIMARY KEY (order_id); ALTER TABLE products ADD PRIMARY KEY (product_id); ALTER TABLE orders_products ADD PRIMARY KEY (id); ALTER TABLE orders_products ADD FOREIGN KEY (product_id) REFERENCES products(product_id); ALTER TABLE orders_products ADD FOREIGN KEY (order_id) REFERENCES orders(order_id) ON DELETE CASCADE; CREATE INDEX orders_date ON orders( order_date ); CREATE INDEX order_product_pattern_idx ON orders_products( product_id bpchar_pattern_ops ); COMMIT; SET work_mem TO DEFAULT; ANALYZE; -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] limit clause produces wrong query plan
Scott, And how exactly should it be optimized? If a query is even moderately interesting, with a few joins and a where clause, postgresql HAS to create the rows that come before your offset in order to assure that it's giving you the right rows. SELECT ... FROM bigtable ORDER BY intprimarykey OFFSET 100 LIMIT 100 It should scan primary key in index order for 200 first keys and skipping first 100 keys. SELECT ... FROM bigtable ORDER BY intprimarykey OFFSET 0 LIMIT 100 That should be plenty fast. The example which I posted shows that SELECT ... FROM bigtable ORDER BY intprimarykey LIMIT 100 this is extremely *slow*: seq scan is performed over whole bigtable. A standard workaround is to use some kind of sequential, or nearly so, id field, and then use between on that field. select * from table where idfield between x and x+100; Users can delete and insert any rows in table. This appoarch requires updating x in every row in big table after each insert, delete or order column change and is thus extremely slow. So I do'nt understand how this can be used for large tables. Andrus. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Hash join on int takes 8..114 seconds
Tomas, Let's suppose you set a reasonable value (say 8096) instead of 2GB. That gives about 160MB. Anyway this depends - if you have a lot of slow queries caused by on-disk sorts / hash tables, use a higher value. Otherwise leave it as it is. Probably product orders table is frequently joined which product table. currently there was work_memory = 512 in conf file. I changed it to work_memory = 8096 If it is all cached in memory, you may want to ensure that your shared_buffers is a reasonalbe size so that there is less shuffling of data from the kernel to postgres and back. Generally, shared_buffers works best between 5% and 25% of system memory. currently shared_buffers = 15000 That's 120MB, i.e. about 6% of the memory. Might be a little bit higher, but seems reasonable. I changed it to 2 Given the fact that the performance issues are caused by bloated tables and / or slow I/O subsystem, moving to a similar system won't help I guess. I have ran VACUUM FULL ANALYZE VERBOSE and set MAX_FSM_PAGES = 15 So there is no any bloat except pg_shdepend indexes which should not affect to query speed. Andrus. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Hash join on int takes 8..114 seconds
2 shows only cpu activity when queries are running: procs ---memory-- ---swap-- -io --system-- cpu r b swpd free buff cache si sobibo incs us sy id wa ... 0 0232 123692 0 189104400 0 0 25217 0 0 100 0 0 0232 123692 0 189104400 0 0 25217 0 0 100 0 0 0232 123684 0 189104400 0 162 25422 0 0 100 0 0 0232 123684 0 189104400 0 0 25218 0 0 100 0 1 0232 123056 0 189144400 013 25421 62 5 34 0 < start of slower query 1 0232 102968 0 191106000 016 25218 26 75 0 0 1 0232 77424 0 193699600 0 0 25218 37 63 0 0 1 0232 71464 0 194192800 073 26034 38 62 0 0 0 0232 123420 0 189104400 032 25731 8 15 77 0 < end of slower query 0 0232 123420 0 189104400 025 25524 0 0 100 0 0 0232 123420 0 189104400 028 25527 0 0 100 0 Is it safe to set work_mem = 2097151 in postgresql.conf file ? First, demonstrate that it is all or mostly in memory -- use iostat or other tools to ensure that there is not much disk activity during the query. If your system doesn't have iostat installed, it should be installed. It is a very useful tool. # iostat bash: iostat: command not found # locate iostat /usr/src/linux-2.6.16-gentoo-r9/Documentation/iostats.txt I have few experience in Linux. No idea how to install or compile iostat in this system. If it is all cached in memory, you may want to ensure that your shared_buffers is a reasonalbe size so that there is less shuffling of data from the kernel to postgres and back. Generally, shared_buffers works best between 5% and 25% of system memory. currently shared_buffers = 15000 If it is completely CPU bound then the work done for the query has to be reduced by altering the plan to a more optimal one or making the work it has to do at each step easier. Most of the ideas in this thread revolve around those things. When running on loaded server even after VACUUM FULL, response time for original work_mem is longer probably because it must fetch blocks from disk. Based on the time it took to do the vacuum, I suspect your disk subsystem is a bit slow. If it can be determined that there is much disk I/O in your use cases, there are generally several things that can be done to tune Linux I/O. The main ones in my experience are the 'readahead' value for each disk which helps sequential reads significantly, and trying out the linux 'deadline' scheduler and comparing it to the more commonly used 'cfq' scheduler. If the system is configured with the anticipatory scheduler, absolutely switch to cfq or deadline as the anticipatory scheduler will perform horribly poorly for a database. This is 3 year old cheap server. No idea what to config. There is also other similar server which as 1.2 GB more usable memory. No idea is it worth to switch into it. After some years sales data will still exceed this more memory. Andrus. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Hash join on int takes 8..114 seconds
I guess you have backups - take them, restore the database on a different machine (preferably with the same / similar hw config) and tune the queries on it. After restoring all the tables / indexes will be 'clean' (not bloated), so you'll see if performing VACUUM FULL / CLUSTER is the right solution or if you have to change the application internals. Sure, the times will be slightly different but the performance problems should remain the same. VACUUM FULL has My test computer has PostgreSql 8.3, 4 GB RAM, SSD disks, Intel X2Extreme CPU So it is much faster than this prod server. No idea how to emulate this environment. I can create new db in prod server as old copy but this can be used in late night only. Where to find script which clones some database in server? Something like CREATE DATABASE newdb AS SELECT * FROM olddb; It would be more convenient to run db cloning script from pgadmin command window. Only way I found is to use SSH with pg_dup/pg_restore. This requires SSH access to server and SSH port opening to public internet. Or probably try to run CLUSTER command in prod server. Hopefully clustering by product id cannot make things slow too much. Andrus. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] limit clause produces wrong query plan
Adding limit clause causes very slow query: explain analyze select * from firma2.dok where doktyyp='J' order by dokumnr limit 100 "Limit (cost=0.00..4371.71 rows=100 width=1107) (actual time=33189.971..33189.971 rows=0 loops=1)" " -> Index Scan using dok_dokumnr_idx on dok (cost=0.00..278740.01 rows=6376 width=1107) (actual time=33189.959..33189.959 rows=0 loops=1)" "Filter: (doktyyp = 'J'::bpchar)" "Total runtime: 33190.103 ms" Without limit is is fast: explain analyze select * from firma2.dok where doktyyp='J' order by dokumnr "Sort (cost=7061.80..7077.74 rows=6376 width=1107) (actual time=0.119..0.119 rows=0 loops=1)" " Sort Key: dokumnr" " -> Index Scan using dok_doktyyp on dok (cost=0.00..3118.46 rows=6376 width=1107) (actual time=0.101..0.101 rows=0 loops=1)" "Index Cond: (doktyyp = 'J'::bpchar)" "Total runtime: 0.245 ms" How to fix this without dropping dok_doktyyp index so that limit can safely used for paged data access ? indexes: dok_doktyyp: dok(doktyyp) dok_dokumnr_idx: dok(dokumnr) types: dokumnr int primary key doktyyp char(1) Andrus. Using 8.1.4 -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Hash join on int takes 8..114 seconds
You could try writing a plpgsql function which would generate the data set. Or you could use your existing data set. Creating 3.5 mln rows using stored proc is probably slow. Probably it would be better and faster to use some random() and generate_series() trick. In this case others can try it and dataset generation is faster. By the way, a simple way to de-bloat your big table without blocking would be this : - stop all inserts and updates - begin - create table new like old table - insert into new select * from old (order by perhaps) - create indexes - rename new into old - commit If this is just a reporting database where you insert a batch of new data every day, for instance, that's very easy to do. If it's OLTP, then, no. Those are orders and order_products tables. I ran vacuum full analyze verbose last night. Now database has 4832 MB size, including 1 GB pg_shdepend bloated indexes. I added max_fsm_pages=15 and re-booted. Query below and other queries are still too slow set search_path to firma2,public; explain analyze SELECT sum(1) FROM dok JOIN rid USING (dokumnr) JOIN toode USING (toode) WHERE rid.toode='X05' AND dok.kuupaev>='2008-09-01' "Aggregate (cost=181795.13..181795.14 rows=1 width=0) (actual time=23678.265..23678.268 rows=1 loops=1)" " -> Nested Loop (cost=73999.44..181733.74 rows=24555 width=0) (actual time=18459.230..23598.956 rows=21476 loops=1)" "-> Index Scan using toode_pkey on toode (cost=0.00..6.01 rows=1 width=24) (actual time=0.134..0.145 rows=1 loops=1)" " Index Cond: ('X05'::bpchar = toode)" "-> Hash Join (cost=73999.44..181482.18 rows=24555 width=24) (actual time=18459.076..23441.098 rows=21476 loops=1)" " Hash Cond: ("outer".dokumnr = "inner".dokumnr)" " -> Bitmap Heap Scan on rid (cost=4082.88..101779.03 rows=270252 width=28) (actual time=9337.782..12720.365 rows=278182 loops=1)" "Recheck Cond: (toode = 'X05'::bpchar)" "-> Bitmap Index Scan on rid_toode_idx (cost=0.00..4082.88 rows=270252 width=0) (actual time=9330.634..9330.634 rows=278183 loops=1)" " Index Cond: (toode = 'X05'::bpchar)" " -> Hash (cost=69195.13..69195.13 rows=112573 width=4) (actual time=8894.465..8894.465 rows=109890 loops=1)" "-> Bitmap Heap Scan on dok (cost=1492.00..69195.13 rows=112573 width=4) (actual time=1618.763..8404.847 rows=109890 loops=1)" " Recheck Cond: (kuupaev >= '2008-09-01'::date)" " -> Bitmap Index Scan on dok_kuupaev_idx (cost=0.00..1492.00 rows=112573 width=0) (actual time=1612.177..1612.177 rows=110484 loops=1)" "Index Cond: (kuupaev >= '2008-09-01'::date)" "Total runtime: 23678.790 ms" Here is a list of untried recommendations from this thread: 1. CLUSTER rid ON rid_toode_pkey ; CLUSTER dok ON dok_kuupaev_idx - In 8.1.4 provided form of CLUSTER causes syntax error, no idea what syntax to use. Risky to try in prod server. Requires creating randomly distributed product_id testcase to measure difference. 2. Change CHAR(20) product index to int index by adding update trigger. Risky to try in prod server. Requires creating randomly distributed product_id testcase to measure difference. 3. Denormalization of sale date to order_producs table by adding update trigger. Risky to try in prod server. Requires creating randomly distributed product_id testcase to measure difference. 4. Check on the performance of the RAID: Does it leverage NCQ appropriately when running queries in parallel ? No idea how. 5. Materialized views. I need date granularity so it is possible to sum only one days sales. http://www.pgcon.org/2008/schedule/events/69.en.html Seems to be major appl re-write, no idea how. Appoaches which probably does not change perfomance: 6. Upgrade to 8.4 or to 8.3.5 7. run server on standalone mode and recover 1 GB pg_shdepend bloated index. 8. tune some conf file parameters: work_mem = 512 I'd consider increasing this value a little - 0.5 MB seems too low to me (but not necessarily). effective_cache_size= 7 Well, your server has 2GB of RAM and usually it's recommended to set this value to about 60-70% of your RAM, so using 540MB (25%) seems quite low. Data size is nearly the same as RAM size. It is unpleasant surprise that queries take so long time. What should I do next? Andrus. 140926 firma2.rid1737 MB 240595 firma2.dok1632 MB 3 1214 pg_catalog.pg_shdepend
Re: [PERFORM] seq scan over 3.3 million rows instead of single keyindex access
An index-scan makes only sense if rid contains considerable more than 300 rows. I'm sorry, I meant using index to get the row. Andrus. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] seq scan over 3.3 million rows instead of single key index access
Gregory, I would suggest running ANALYZE idtellUued at some point before the problematic query. Thank you. After adding analyze all is OK. Is analyze command required in 8.3 also ? Or is it better better to specify some hint at create temp table time since I know the number of rows before running query ? Andrus. set search_path to firma2,public; CREATE TEMP TABLE idtellUued(dokumnr INT) ON COMMIT DROP; INSERT INTO idtellUued VALUES(1249228); analyze idtelluued; explain analyze select 1 from dok JOIN rid USING(dokumnr) JOIN idtellUued USING(dokumnr) "Nested Loop (cost=0.00..275.18 rows=3 width=0) (actual time=87.266..87.388 rows=8 loops=1)" " -> Nested Loop (cost=0.00..6.95 rows=1 width=8) (actual time=36.613..36.636 rows=1 loops=1)" "-> Seq Scan on idtelluued (cost=0.00..1.01 rows=1 width=4) (actual time=0.009..0.015 rows=1 loops=1)" "-> Index Scan using dok_dokumnr_idx on dok (cost=0.00..5.93 rows=1 width=4) (actual time=36.585..36.590 rows=1 loops=1)" " Index Cond: (dok.dokumnr = "outer".dokumnr)" " -> Index Scan using rid_dokumnr_idx on rid (cost=0.00..267.23 rows=80 width=4) (actual time=50.635..50.672 rows=8 loops=1)" "Index Cond: ("outer".dokumnr = rid.dokumnr)" "Total runtime: 87.586 ms" -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Increasing pattern index query speed
Both queries return same result (19) and return same data. Pattern query is a much slower (93 sec) than equality check (13 sec). How to fix this ? Using 8.1.4, utf-8 encoding, et-EE locale. Andrus. SELECT sum(1) FROM dok JOIN rid USING (dokumnr) JOIN toode USING (toode) WHERE rid.toode = '9910' AND dok.kuupaev BETWEEN '2008-11-21' AND '2008-11-21' AND dok.yksus LIKE 'ORISSAARE%' "Aggregate (cost=43.09..43.10 rows=1 width=0) (actual time=12674.675..12674.679 rows=1 loops=1)" " -> Nested Loop (cost=29.57..43.08 rows=1 width=0) (actual time=2002.045..12673.645 rows=19 loops=1)" "-> Nested Loop (cost=29.57..37.06 rows=1 width=24) (actual time=2001.922..12672.344 rows=19 loops=1)" " -> Index Scan using dok_kuupaev_idx on dok (cost=0.00..3.47 rows=1 width=4) (actual time=342.812..9810.627 rows=319 loops=1)" "Index Cond: ((kuupaev >= '2008-11-21'::date) AND (kuupaev <= '2008-11-21'::date))" "Filter: (yksus ~~ 'ORISSAARE%'::text)" " -> Bitmap Heap Scan on rid (cost=29.57..33.58 rows=1 width=28) (actual time=8.948..8.949 rows=0 loops=319)" "Recheck Cond: (("outer".dokumnr = rid.dokumnr) AND (rid.toode = '9910'::bpchar))" "-> BitmapAnd (cost=29.57..29.57 rows=1 width=0) (actual time=8.930..8.930 rows=0 loops=319)" " -> Bitmap Index Scan on rid_dokumnr_idx (cost=0.00..2.52 rows=149 width=0) (actual time=0.273..0.273 rows=2 loops=319)" "Index Cond: ("outer".dokumnr = rid.dokumnr)" " -> Bitmap Index Scan on rid_toode_idx (cost=0.00..26.79 rows=1941 width=0) (actual time=8.596..8.596 rows=15236 loops=319)" "Index Cond: (toode = '9910'::bpchar)" "-> Index Scan using toode_pkey on toode (cost=0.00..6.01 rows=1 width=24) (actual time=0.043..0.048 rows=1 loops=19)" " Index Cond: ('9910'::bpchar = toode)" "Total runtime: 12675.191 ms" explain analyze SELECT sum(1) FROM dok JOIN rid USING (dokumnr) JOIN toode USING (toode) WHERE rid.toode like '9910%' AND dok.kuupaev BETWEEN '2008-11-21' AND '2008-11-21' AND dok.yksus LIKE 'ORISSAARE%' "Aggregate (cost=15.52..15.53 rows=1 width=0) (actual time=92966.501..92966.505 rows=1 loops=1)" " -> Nested Loop (cost=0.00..15.52 rows=1 width=0) (actual time=24082.032..92966.366 rows=19 loops=1)" "-> Nested Loop (cost=0.00..9.50 rows=1 width=24) (actual time=24081.919..92965.116 rows=19 loops=1)" " Join Filter: ("outer".dokumnr = "inner".dokumnr)" " -> Index Scan using dok_kuupaev_idx on dok (cost=0.00..3.47 rows=1 width=4) (actual time=0.203..13924.324 rows=319 loops=1)" "Index Cond: ((kuupaev >= '2008-11-21'::date) AND (kuupaev <= '2008-11-21'::date))" "Filter: (yksus ~~ 'ORISSAARE%'::text)" " -> Index Scan using rid_toode_pattern_idx on rid (cost=0.00..6.01 rows=1 width=28) (actual time=0.592..166.778 rows=15235 loops=319)" "Index Cond: ((toode ~>=~ '9910'::bpchar) AND (toode ~<~ '9911'::bpchar))" "Filter: (toode ~~ '9910%'::text)" "-> Index Scan using toode_pkey on toode (cost=0.00..6.01 rows=1 width=24) (actual time=0.041..0.046 rows=1 loops=19)" " Index Cond: ("outer".toode = toode.toode)" "Total runtime: 92967.512 ms" -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] seq scan over 3.3 million rows instead of single key index access
There are indexes on rid(dokumnr) and dok(dokumnr) and dokumnr is int. Instead of using single key index, 8.1.4 scans over whole rid table. Sometimes idtelluued can contain more than single row so replacing join with equality is not possible. How to fix ? Andrus. CREATE TEMP TABLE idtellUued(dokumnr INT) ON COMMIT DROP; INSERT INTO idtellUued VALUES(1249228); explain analyze select 1 from dok JOIN rid USING(dokumnr) JOIN idtellUued USING(dokumnr) "Hash Join (cost=7483.22..59.77 rows=5706 width=0) (actual time=14905.981..27065.903 rows=8 loops=1)" " Hash Cond: ("outer".dokumnr = "inner".dokumnr)" " -> Seq Scan on rid (cost=0.00..198240.33 rows=3295833 width=4) (actual time=0.036..15021.641 rows=3280576 loops=1)" " -> Hash (cost=7477.87..7477.87 rows=2140 width=8) (actual time=0.114..0.114 rows=1 loops=1)" "-> Nested Loop (cost=0.00..7477.87 rows=2140 width=8) (actual time=0.076..0.099 rows=1 loops=1)" " -> Seq Scan on idtelluued (cost=0.00..31.40 rows=2140 width=4) (actual time=0.006..0.011 rows=1 loops=1)" " -> Index Scan using dok_dokumnr_idx on dok (cost=0.00..3.47 rows=1 width=4) (actual time=0.051..0.058 rows=1 loops=1)" "Index Cond: (dok.dokumnr = "outer".dokumnr)" "Total runtime: 27066.080 ms" -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Hash join on int takes 8..114 seconds
You could perhaps run a little check on the performance of the RAID, is it better than linux software RAID ? Does it leverage NCQ appropriately when running queries in parallel ? I was told that this RAID is software RAID. I have no experience what to check. This HP server was installed 3 years ago and in this time it was not high perfomance server. explain analyze SELECT sum(1) FROM dok JOIN rid USING (dokumnr) JOIN toode USING (toode) LEFT JOIN artliik using(grupp,liik) WHERE rid.toode='X05' AND dok.kuupaev>='2008-09-01' By the way, note that the presence of the toode table in the query above is not required at all, unless you use columns of toode in your aggregates. In real query, SELECT column list contains data form sales table dok (sale date and time) and sales detail table rid (quantity, price) WHERE clause may contain additional filters from product table (product category, supplier). Let's play with that, after all, it's friday night. Thank you very much for great sample. I tried to create testcase from this to match production db: 1.2 million orders 3.5 million order details 13400 products with char(20) as primary keys containing ean-13 codes mostly 3 last year data every order has usually 1..3 detail lines same product can appear multiple times in order products are queried by start of code This sample does not distribute products randomly between orders. How to change this so that every order contains 3 (or 1..6 ) random products? I tried to use random row sample from http://www.pgsql.cz/index.php/PostgreSQL_SQL_Tricks-i but in this case constant product is returned always. It seems than query containing randon() is executed only once. Andrus. begin; CREATE TEMP TABLE orders (order_id INTEGER NOT NULL, order_date DATE NOT NULL); CREATE TEMP TABLE products (product_id CHAR(20) NOT NULL, product_name char(70) NOT NULL, quantity numeric(12,2) default 1); CREATE TEMP TABLE orders_products (order_id INTEGER NOT NULL, product_id CHAR(20),padding1 char(70), id serial, price numeric(12,2) default 1 ); INSERT INTO products SELECT (n*power( 10,13))::INT8::CHAR(20), 'product number ' || n::TEXT FROM generate_series(0,13410) AS n; INSERT INTO orders SELECT n,'2005-01-01'::date + (4000.0 * n/350.0 * '1 DAY'::interval) FROM generate_series(0,350/3) AS n; SET work_mem TO 2097151; -- 1048576; INSERT INTO orders_products SELECT generate_series/3 as order_id, ( ((generate_series/350.0)*13410.0)::int*power( 10,13))::INT8::CHAR(20) FROM generate_series(1,350) where generate_series/3>0; ALTER TABLE orders ADD PRIMARY KEY (order_id); ALTER TABLE products ADD PRIMARY KEY (product_id); ALTER TABLE orders_products ADD PRIMARY KEY (id); ALTER TABLE orders_products ADD FOREIGN KEY (product_id) REFERENCES products(product_id); ALTER TABLE orders_products ADD FOREIGN KEY (order_id) REFERENCES orders(order_id) ON DELETE CASCADE; CREATE INDEX orders_date ON orders( order_date ); CREATE INDEX order_product_pattern_idx ON orders_products( product_id bpchar_pattern_ops ); COMMIT; SET work_mem TO DEFAULT; ANALYZE; SELECT sum(quantity*price) FROM orders JOIN orders_products USING (order_id) JOIN products USING (product_id) WHERE orders.order_date>='2008-01-17' and orders_products.product_id like '130%' -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Hash join on int takes 8..114 seconds
If it's not a million rows, then the table is bloated. Try (as postgres or some other db superuser) "vacuum full pg_shdepend" and a "reindex pg_shdepend". reindex table pg_shdepend causes error ERROR: shared table "pg_shdepend" can only be reindexed in stand-alone mode vacuum full verbose pg_shdepend seems to work but indexes are still bloated. How to remove index bloat ? sizes after vacuum full are below. pg_shdepend size 1234 MB includes its index sizes, so indexes are 100% bloated. 4 1214 pg_catalog.pg_shdepend1234 MB 6 1232 pg_catalog.pg_shdepend_depender_index 795 MB 7 1233 pg_catalog.pg_shdepend_reference_index439 MB Andrus. vacuum full verbose pg_shdepend; INFO: vacuuming "pg_catalog.pg_shdepend" INFO: "pg_shdepend": found 254 removable, 3625 nonremovable row versions in 131517 pages DETAIL: 0 dead row versions cannot be removed yet. Nonremovable row versions range from 49 to 49 bytes long. There were 16115259 unused item pointers. Total free space (including removable row versions) is 1010091872 bytes. 131456 pages are or will become empty, including 8 at the end of the table. 131509 pages containing 1010029072 free bytes are potential move destinations. CPU 2.08s/0.92u sec elapsed 63.51 sec. INFO: index "pg_shdepend_depender_index" now contains 3625 row versions in 101794 pages DETAIL: 254 index row versions were removed. 101611 index pages have been deleted, 2 are currently reusable. CPU 0.87s/0.28u sec elapsed 25.44 sec. INFO: index "pg_shdepend_reference_index" now contains 3625 row versions in 56139 pages DETAIL: 254 index row versions were removed. 56076 index pages have been deleted, 2 are currently reusable. CPU 0.51s/0.15u sec elapsed 23.10 sec. INFO: "pg_shdepend": moved 1518 row versions, truncated 131517 to 25 pages DETAIL: CPU 5.26s/2.39u sec elapsed 89.93 sec. INFO: index "pg_shdepend_depender_index" now contains 3625 row versions in 101794 pages DETAIL: 1518 index row versions were removed. 101609 index pages have been deleted, 2 are currently reusable. CPU 0.94s/0.28u sec elapsed 24.61 sec. INFO: index "pg_shdepend_reference_index" now contains 3625 row versions in 56139 pages DETAIL: 1518 index row versions were removed. 56088 index pages have been deleted, 2 are currently reusable. CPU 0.54s/0.14u sec elapsed 21.11 sec. Query returned successfully with no result in 253356 ms -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Hash join on int takes 8..114 seconds
Alvaro, 1. vacuum_cost_delay does not affect vacuum full 2. vacuum full is always blocking, regardless of settings So only way is to disable other database acces if vacuum full is required. So I gather you're not doing any vacuuming, eh? Log files for every day are full of garbage messages below. So I hope that vacuum is running well, isn't it ? Andrus. 2008-11-19 00:00:48 EET11728 1 LOG: autovacuum: processing database "postgres" 2008-11-19 00:01:48 EET11729 1 LOG: autovacuum: processing database "mydb1" 2008-11-19 00:02:48 EET11730 1 LOG: autovacuum: processing database "emydb1" 2008-11-19 00:03:48 EET11731 1 LOG: autovacuum: processing database "template1" 2008-11-19 00:04:48 EET11732 1 LOG: autovacuum: processing database "testmydb1" 2008-11-19 00:05:48 EET11733 1 LOG: autovacuum: processing database "mydb3" 2008-11-19 00:06:48 EET11734 1 LOG: autovacuum: processing database "postgres" 2008-11-19 00:07:48 EET11735 1 LOG: autovacuum: processing database "mydb1" 2008-11-19 00:08:48 EET11736 1 LOG: autovacuum: processing database "emydb1" 2008-11-19 00:09:48 EET11737 1 LOG: autovacuum: processing database "template1" 2008-11-19 00:10:48 EET11750 1 LOG: autovacuum: processing database "testmydb1" 2008-11-19 00:11:48 EET11751 1 LOG: autovacuum: processing database "mydb3" 2008-11-19 00:12:48 EET11752 1 LOG: autovacuum: processing database "postgres" 2008-11-19 00:13:48 EET11753 1 LOG: autovacuum: processing database "mydb1" 2008-11-19 00:14:48 EET11754 1 LOG: autovacuum: processing database "emydb1" 2008-11-19 00:15:48 EET11755 1 LOG: autovacuum: processing database "template1" 2008-11-19 00:16:48 EET11756 1 LOG: autovacuum: processing database "testmydb1" 2008-11-19 00:17:48 EET11757 1 LOG: autovacuum: processing database "mydb3" ... -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Hash join on int takes 8..114 seconds
Alvaro, Are you really using vacuum_cost_delay=2000? If so, therein lies your problem. That's a silly value to use for that variable. Useful values are in the 20-40 range probably, or maybe 10-100 being extremely generous. Thank you. My 8.1.4 postgresql.conf does not contain such option. So vacuum_cost_delay is off probably. Since doc does not recommend any value, I planned to use 2000 Will value of 30 allow other clients to work when VACUUM FULL is running ? Uncommented relevant values in postgresql.conf file are: shared_buffers = 15000 work_mem = 512 maintenance_work_mem = 131072 fsync = on effective_cache_size= 7 log_min_duration_statement= 30000 Andrus. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Hash join on int takes 8..114 seconds
How to vacuum full pg_shdepend automatically so that other users can work at same time ? Your table is horribly bloated. You must use VACUUM FULL + REINDEX (as superuser) on it, however unfortunately, it is blocking. Therefore, you should wait for sunday night to do this, when noone will notice. Shops are closed late night for a short time, including sunday night. This time may be shorter than time required to complete VACUUM command. I discovered vacuum_cost_delay=2000 option. Will this remove blocking issue and allow vacuum full to work ? Meanwhile, you can always VACUUM it (as superuser) and REINDEX it. I expect that autovacuum does this automatically. And while you're at it, VACUUM FULL + reindex the entire database. To avoid such annoyances in the future, you should ensure that autovacuum runs properly ; you should investigate this. If you use a cron'ed VACUUM that does not run as superuser, then it will not be able to VACUUM the system catalogs, and the problem will come back. autovacuum is turned on in postgresql.conf file log file shows a lot of messages every day that database is vacuumed. I assume that it is running as user postgres. I do'nt understand how autovacuum can avoid this: it does not perform vacuum full so pg_shdepend ja my tables become bloated again and again. Andrus. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Hash join on int takes 8..114 seconds
Thomas, Thank you. Just the most important points: 1) "dok" table contains 1235086 row versions in 171641 pages (with 8kB pages this means 1.4GB MB of data), but there are 1834279 unused item pointers (i.e. about 60% of the space is wasted) 2) "rid" table contains 3275189 roiws in 165282 (with 8kB pages this means about 1.3GB of data), but there are 1878923 unused item pointers (i.e. about 30% of the space is wasted) 3) don't forget to execute analyze after vacuuming (or vacuum analyze) autovacuum is running. So if I understand properly, I must ran VACUUM FULL ANALYZE dok; VACUUM FULL ANALYZE rid; Those commands cause server probably to stop responding to other client like vacuum full pg_shdepend did. Should vacuum_cost_delay = 2000 allow other users to work when running those commands ? 4) I'm not sure why the sizes reported by you (for example 2.3GB vs 1.5GB for "doc" table) - the difference seems too large for me. I used pg_total_relation_size(). So 2.3 GB includes indexes also: 844286 dok_tasudok_idx245 MB 1044283 dok_klient_idx 142 MB 1844288 dok_tasumata_idx 91 MB 1944289 dok_tellimus_idx 89 MB 2044284dok_krdokumnr_idx 89 MB 2144285 dok_kuupaev_idx84 MB 2243531 makse_pkey 77 MB 2343479 dok_pkey 74 MB 2444282 dok_dokumnr_idx74 MB 26 18663923 dok_yksus_pattern_idx 43 MB 27 18801591 dok_sihtyksus_pattern_idx 42 MB Anyway the amount of wasted rows seems significant to me - I'd try to solve this first. Either by VACUUM FULL or by CLUSTER. The CLUSTER will lock the table exclusively, but the results may be better (when sorting by a well chosen index). Don't forget to run ANALYZE afterwards. How to invoke those commands so that other clients can continue work? I'm using 8.1.4. Log files show that autovacuum is running. I'm planning the following solution: 1. Set vacuum_cost_delay=2000 2. Run the following commands periodically in this order: VACUUM FULL; vacuum full pg_shdepend; CLUSTER rid on (toode); CLUSTER dok on (kuupaev); REINDEX DATABASE mydb; REINDEX SYSTEM mydb; ANALYZE; Are all those command required or can something leaved out ? Several other things to consider: 1) Regarding the toode column - why are you using CHAR(20) when the values are actually shorter? This may significantly increase the amount of space required. There may be some products whose codes may be up to 20 characters. PostgreSQL does not hold trailing spaces in db, so this does *not* affect to space. 2) I've noticed the CPU used is Celeron, which may negatively affect the speed of hash computation. I'd try to replace it by something faster - say INTEGER as an artificial primary key of the "toode" table and using it as a FK in other tables. This might improve the "Bitmap Heap Scan on rid" part, but yes - it's just a minor improvement compared to the "Hash Join" part of the query. Natural key Toode CHAR(20) is used widely in different queries. Replacing it with INT surrogate key requires major application rewrite. Should I add surrogate index INT columns to toode and rid table and measure test query speed in this case? Materialized views seem like a good idea to me, but maybe I'm not seeing something. What do you mean by "reports are different"? If there is a lot of rows for a given product / day, then creating an aggregated table with (product code / day) as a primary key is quite simple. It may require a lot of disk space, but it'll remove the hash join overhead. But if the queries are very different, then it may be difficult to build such materialized view(s). log file seems that mostly only those queries are slow: SELECT ... FROM dok JOIN rid USING (dokumnr) JOIN ProductId USING (ProductId) WHERE rid.ProductId LIKE :p1 || '%' AND dok.SaleDate>=:p2 :p1 and :p2 are parameters different for different queries. dok contains several years of data. :p2 is usually only few previous months or last year ago. SELECT column list contains fixed list of known columns from all tables. How to create index or materialized view to optimize this types of queries ? Andrus. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Hash join on int takes 8..114 seconds
Richard, Thank you. Try "SELECT count(*) FROM pg_shdepend". This query returns 3625 and takes 35 seconds to run. If it's not a million rows, then the table is bloated. Try (as postgres or some other db superuser) "vacuum full pg_shdepend" and a "reindex pg_shdepend". vacuum full verbose pg_shdepend INFO: vacuuming "pg_catalog.pg_shdepend" INFO: "pg_shdepend": found 16103561 removable, 3629 nonremovable row versions in 131425 pages DETAIL: 0 dead row versions cannot be removed yet. Nonremovable row versions range from 49 to 49 bytes long. There were 0 unused item pointers. Total free space (including removable row versions) is 1009387632 bytes. 131363 pages are or will become empty, including 0 at the end of the table. 131425 pages containing 1009387632 free bytes are potential move destinations. CPU 2.12s/1.69u sec elapsed 52.66 sec. INFO: index "pg_shdepend_depender_index" now contains 3629 row versions in 101794 pages DETAIL: 16103561 index row versions were removed. 101311 index pages have been deleted, 2 are currently reusable. CPU 20.12s/14.52u sec elapsed 220.66 sec. After 400 seconds of run I got phone calls that server does not respond to other clients. So I was forced to cancel "vacuum full verbose pg_shdepend " command. How to run it so that other users can use database at same time ? If it is a million rows, you'll need to find out why. Do you have a lot of temporary tables that aren't being dropped or something similar? Application creates temporary tables in many places. Every sales operation probably creates some temporary tables. Should I change something in configuration or change application (Only single POS application which is used to access this db) or is only solution to manully run vacuum full pg_shdepend reindex pg_shdepend periodically ? How to vacuum full pg_shdepend automatically so that other users can work at same time ? Hopefully this table size does not affect to query speed. Andrus. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Hash join on int takes 8..114 seconds
38331 index row versions were removed. 2290 index pages have been deleted, 2290 are currently reusable. CPU 1.39s/1.58u sec elapsed 333.82 sec. INFO: index "rid_rtellimus_idx" now contains 3275230 row versions in 7390 pages DETAIL: 18591 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU 0.18s/0.66u sec elapsed 1.78 sec. INFO: index "rid_toode_pattern_idx" now contains 3275230 row versions in 16310 pages DETAIL: 17800 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU 0.44s/1.04u sec elapsed 6.55 sec. INFO: "rid": removed 38331 row versions in 3090 pages DETAIL: CPU 0.20s/0.10u sec elapsed 5.49 sec. INFO: "rid": found 38331 removable, 3275189 nonremovable row versions in 165282 pages DETAIL: 0 dead row versions cannot be removed yet. There were 1878923 unused item pointers. 0 pages are entirely empty. CPU 5.06s/7.27u sec elapsed 607.59 sec. Query returned successfully with no result in 1058319 ms. - what's the size of the dataset relative to the RAM ? Db size is 7417 MB relevant table sizes in desc by size order: 140595 dok 2345 MB 2 1214 pg_shdepend 2259 MB 340926 rid 2057 MB 6 1232 pg_shdepend_depender_index 795 MB 7 1233 pg_shdepend_reference_index 438 MB 844286 dok_tasudok_idx 245 MB 944299 rid_toode_idx 243 MB 1044283 dok_klient_idx 142 MB 11 19103791 rid_toode_pattern_idx 127 MB 1444298 rid_inpdokumnr_idx 118 MB 1544297 rid_dokumnr_idx 110 MB 1643573 rid_pkey109 MB 1844288 dok_tasumata_idx91 MB 1944289 dok_tellimus_idx89 MB 2044284 dok_krdokumnr_idx 89 MB 2144285 dok_kuupaev_idx 84 MB 2343479 dok_pkey74 MB 2444282 dok_dokumnr_idx 74 MB 25 19076304 rid_rtellimus_idx 58 MB 26 18663923 dok_yksus_pattern_idx 43 MB 27 18801591 dok_sihtyksus_pattern_idx 42 MB 29 18774881 dok_doktyyp 30 MB 4640967 toode 13 MB server is HP Proliant DL320 G3 http://h18000.www1.hp.com/products/quickspecs/12169_ca/12169_ca.HTML CPU is 2.93Ghz Celeron 256kb cache. Server has 2 GB RAM. It has SATA RAID 0,1 integrated controller (1.5Gbps) and SAMSUNG HD160JJ mirrored disks. Now let's look more closely at the query : explain analyze SELECT sum(1) FROM dok JOIN rid USING (dokumnr) JOIN toode USING (toode) LEFT JOIN artliik using(grupp,liik) WHERE rid.toode='X05' AND dok.kuupaev>='2008-09-01' I presume doing the query without artliik changes nothing to the runtime, yes ? Yes. After removing artkliik from join I got response times 65 and 50 seconds, so this does not make difference. Your problem here is that, no matter what, postgres will have to examine - all rows where dok.kuupaev>='2008-09-01', - and all rows where rid.toode = 'X05'. If you use dok.kuupaev>='2007-09-01' (note : 2007) it will probably have to scan many, many more rows. Probably yes, since then it reads one year more sales data. If you perform this query often you could CLUSTER rid on (toode) and dok on (kuupaev), but this can screw other queries. Some reports are by sales date (dok.kuupaev) and customers. CLUSTER rid on (toode) slows them down. Also autovacuum cannot do clustering. What is the meaning of the columns ? This is typical sales data: -- Receipt headers: DOK ( dokumnr INT SERIAL PRIMARY KEY, kuupaev DATE --- sales date ) -- Receipt details RID ( dokumnr INT, toode CHAR(20), -- item code CONSTRAINT rid_dokumnr_fkey FOREIGN KEY (dokumnr) REFERENCES dok (dokumnr), CONSTRAINT rid_toode_fkey FOREIGN KEY (toode) REFERENCES firma2.toode (toode) ) -- Products TOODE ( toode CHAR(20) PRIMARY KEY ) To make this type of query faster I would tend to think about : - denormalization (ie adding a column in one of your tables and a multicolumn index) For this query it is possible to duplicate kuupaev column to rid table. However most of the this seems to go to scanning rid table, so I suspect that this will help. - materialized views - materialized summary tables (ie. summary of sales for last month, for instance) There are about 1000 items and reports are different. Andrus. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Hash join on int takes 8..114 seconds
Richard, In addition to "top" below, you'll probably find "vmstat 5" useful. Thank you. During this query run (65 sec), vmstat 5 shows big values in bi,cs and wa columns: procs ---memory-- ---swap-- -io --system-- cpu r b swpd free buff cache si sobibo incs us sy id wa 1 1 88 51444 0 18544040017 6 1513 5 1 83 10 0 1 88 52140 0 185430400 362695 562 784 15 38 0 47 0 1 92 51608 0 185566800 14116 103 1382 2294 4 8 0 88 0 1 92 51620 0 185725601 1525831 1210 1975 4 8 0 88 0 2 92 50448 0 185918800 1311819 1227 1982 3 7 0 90 0 1 92 51272 0 185908800 769153 828 1284 14 4 0 82 0 1 92 52472 0 185879200 10691 9 758 968 3 7 0 89 0 2 92 51240 0 185859600 8204 7407 717 1063 2 5 0 93 0 1 92 51648 0 186038800 20622 121 1118 2229 12 9 0 79 2 1 92 50564 0 186139600 20994 3277 969 1681 15 8 0 76 1 0 92 52180 0 186019200 1854236 802 1276 36 12 0 51 0 0 92 91872 0 182094801 1528547 588 774 9 12 32 47 0 0 92 91904 0 182094800 0 4 25118 0 0 100 0 0 0 92 92044 0 182094800 0 0 25017 0 0 100 0 0 0 92 91668 0 1821156002793 27266 5 0 92 3 0 0 92 91668 0 182115600 064 26038 0 0 100 0 0 0 92 91636 0 182115600 0 226 27771 0 0 100 0 0 0 92 91676 0 182115600 026 25523 0 0 100 0 Here you're stuck waiting for disks (91.0% wa). Check out vmstat and iostat to see what's happening. typing iostat returns bash: iostat: command not found It seems that this is not installed in this gentoo. Andrus. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Hash join on int takes 8..114 seconds
00.24 udevd 6677 root 15 0 1728 572 400 S 0.0 0.0 0:04.99 syslog-ng Andrus. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Hash join on int takes 8..114 seconds
Just a question, what are you doing with the 20.000 result rows ? Those rows represent monthly sales data of one item. They are used as following: 1. Detailed sales report for month. This report can browsed in screen for montly sales and ordering analysis. 2. Total reports. In those reports, sum( sales), sum(quantity) is used to get total sales in day, week, month, time for item and resulting rows are summed. Andrus. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Hash join on int takes 8..114 seconds
Query below seems to use indexes everywhere in most optimal way. dokumnr column is of type int Speed of this query varies rapidly: In live db fastest response I have got is 8 seconds. Re-running same query after 10 seconds may take 60 seconds. Re-running it again after 10 seconds may take 114 seconds. Any idea how to speed it up ? Is it possible to optimize it, will upgrading to 8.3.5 help or should I require to add more RAM, disk or CPU speed ? Real query contains column list instead of sum(1) used in test below. Andrus. explain analyze SELECT sum(1) FROM dok JOIN rid USING (dokumnr) JOIN toode USING (toode) LEFT JOIN artliik using(grupp,liik) WHERE rid.toode='X05' AND dok.kuupaev>='2008-09-01' longest response time: "Aggregate (cost=234278.53..234278.54 rows=1 width=0) (actual time=114479.933..114479.936 rows=1 loops=1)" " -> Hash Left Join (cost=52111.20..234218.21 rows=24126 width=0) (actual time=100435.523..114403.293 rows=20588 loops=1)" "Hash Cond: (("outer".grupp = "inner".grupp) AND ("outer".liik = "inner".liik))" "-> Nested Loop (cost=52103.94..233735.35 rows=24126 width=19) (actual time=100405.258..114207.387 rows=20588 loops=1)" " -> Index Scan using toode_pkey on toode (cost=0.00..6.01 rows=1 width=43) (actual time=18.312..18.325 rows=1 loops=1)" "Index Cond: ('X05'::bpchar = toode)" " -> Hash Join (cost=52103.94..233488.08 rows=24126 width=24) (actual time=100386.921..114037.986 rows=20588 loops=1)" "Hash Cond: ("outer".dokumnr = "inner".dokumnr)" "-> Bitmap Heap Scan on rid (cost=4127.51..175020.84 rows=317003 width=28) (actual time=9.932..76225.918 rows=277294 loops=1)" " Recheck Cond: (toode = 'X05'::bpchar)" " -> Bitmap Index Scan on rid_toode_idx (cost=0.00..4127.51 rows=317003 width=0) (actual time=11105.807..11105.807 rows=280599 loops=1)" "Index Cond: (toode = 'X05'::bpchar)" "-> Hash (cost=47376.82..47376.82 rows=93444 width=4) (actual time=35082.427..35082.427 rows=105202 loops=1)" " -> Index Scan using dok_kuupaev_idx on dok (cost=0.00..47376.82 rows=93444 width=4) (actual time=42.110..34586.331 rows=105202 loops=1)" "Index Cond: (kuupaev >= '2008-09-01'::date)" "-> Hash (cost=6.84..6.84 rows=84 width=19) (actual time=30.220..30.220 rows=84 loops=1)" " -> Seq Scan on artliik (cost=0.00..6.84 rows=84 width=19) (actual time=20.104..29.845 rows=84 loops=1)" "Total runtime: 114480.373 ms" Same query in other runs: "Aggregate (cost=234278.53..234278.54 rows=1 width=0) (actual time=62164.496..62164.500 rows=1 loops=1)" " -> Hash Left Join (cost=52111.20..234218.21 rows=24126 width=0) (actual time=46988.005..62088.379 rows=20588 loops=1)" "Hash Cond: (("outer".grupp = "inner".grupp) AND ("outer".liik = "inner".liik))" "-> Nested Loop (cost=52103.94..233735.35 rows=24126 width=19) (actual time=46957.750..61893.613 rows=20588 loops=1)" " -> Index Scan using toode_pkey on toode (cost=0.00..6.01 rows=1 width=43) (actual time=146.530..146.543 rows=1 loops=1)" "Index Cond: ('X05'::bpchar = toode)" " -> Hash Join (cost=52103.94..233488.08 rows=24126 width=24) (actual time=46811.194..61595.560 rows=20588 loops=1)" "Hash Cond: ("outer".dokumnr = "inner".dokumnr)" "-> Bitmap Heap Scan on rid (cost=4127.51..175020.84 rows=317003 width=28) (actual time=1870.209..55864.237 rows=277294 loops=1)" " Recheck Cond: (toode = 'X05'::bpchar)" " -> Bitmap Index Scan on rid_toode_idx (cost=0.00..4127.51 rows=317003 width=0) (actual time=1863.713..1863.713 rows=280599 loops=1)" "Index Cond: (toode = 'X05'::bpchar)" "-> Hash (cost=47376.82..47376.82 rows=93444 width=4) (actual time=1650.823..1650.823 rows=105202 loops=1)" " -> Index Scan using dok_kuupaev_idx on dok (cost=0.00..47376.82 rows=93444 width=4) (actual time=0.091..1190.962 rows=105202 loops=1)" "Index Cond: (kuupaev >= '2008-09-01'::date)" "
Re: [PERFORM] Increasing select max(datecol) from bilkaib wheredatecol<=date'2008-11-01' and (cr='00' or db='00') speed
Depesz, do you always have this: "(cr='00' or db='00')"? or do the values (00) change? if they don't change, or *most* of the queries have "(cr='00' or db='00')", than the biggest time difference you will get after creating this index: create index test on bilkaib (kuupaev) where cr='00' or db='00'; I have always cr='00' or db='00' clause. Separate values are never tested. I changed by queries back to old values and created this single index. This seems to be even better that Vladimir suggestion. Thank you very much. Andrus. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Increasing select max(datecol) from bilkaib where datecol<=date'2008-11-01' and (cr='00' or db='00') speed
Vladimir, I am afraid PostgreSQL is not smart enough to rewrite query with "or" into two separate index scans. There is no way to improve the query significantly without rewriting it. Note: for this case indices on (datecol), (cr) and (db) are not very helpful. Thank you very much. I added you indexes to db and re-write query. Now it runs fast. Andrus. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Increasing select max(datecol) from bilkaib where datecol<=date'2008-11-01' and (cr='00' or db='00') speed
Matthew, Thank you. bilkaib table contains GL transactions for every day. 00 records are initial balance records and they appear only in start of year or start of month. They may present or may be not present for some month if initial balance is not calculated yet. If 00 records are present, usuallly there are lot of them for single date for db and cr columns. This query finds initial balance date befeore given date. bilkaib table contains several year transactions so it is large. Alternatively if you create an index on (cr, bilkaib) and one on (db, bilkaib) then you will be able to use other values in the query too. I'm sorry I do'nt understand this. What does the (cr, bilkaib) syntax mean? Should I create two functions indexes and re-write query as Vladimir suggests or is there better appoach ? Andrus. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Increasing select max(datecol) from bilkaib where datecol<=date'2008-11-01' and (cr='00' or db='00') speed
There are columns kuupaev date, cr char(10), db char(10) and regular indexes for all those fields. bilkaib table contains large number of rows. The following query takes too much time. How to make it faster ? I think PostgreSql should use multiple indexes as bitmaps to speed it. I can re-write this query in any way or split to multiple statements if this makes it faster. Andrus. explain analyze select max(kuupaev) from bilkaib where kuupaev<=date'2008-11-01' and (cr='00' or db='00') "Result (cost=339.75..339.76 rows=1 width=0) (actual time=52432.256..52432.260 rows=1 loops=1)" " InitPlan" "-> Limit (cost=0.00..339.75 rows=1 width=4) (actual time=52432.232..52432.236 rows=1 loops=1)" " -> Index Scan Backward using bilkaib_kuupaev_idx on bilkaib (cost=0.00..1294464.73 rows=3810 width=4) (actual time=52432.222..52432.222 rows=1 loops=1)" "Index Cond: (kuupaev <= '2008-11-01'::date)" "Filter: ((kuupaev IS NOT NULL) AND ((cr = '00'::bpchar) OR (db = '00'::bpchar)))" "Total runtime: 52432.923 ms" "PostgreSQL 8.1.4 on i686-pc-linux-gnu, compiled by GCC i686-pc-linux-gnu-gcc (GCC) 3.4.6 (Gentoo 3.4.6-r1, ssp-3.4.5-1.0, pie-8.7.9)" -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Using index for IS NULL query
Index is not used for is null condition: create index makse_dokumnr_idx on makse(dokumnr); explain select sum( summa) from MAKSE where dokumnr is null "Aggregate (cost=131927.95..131927.96 rows=1 width=10)" " -> Seq Scan on makse (cost=0.00..131927.94 rows=1 width=10)" "Filter: (dokumnr IS NULL)" Table makse contains 120 rows and about 800 rows with dokumnr is null so using index is much faster that seq scan. How to fix ? Andrus. "PostgreSQL 8.1.4 on i686-pc-linux-gnu, compiled by GCC i686-pc-linux-gnu-gcc (GCC) 3.4.6 (Gentoo 3.4.6-r1, ssp-3.4.5-1.0, pie-8.7.9)" -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Simple indexed IN query takes 40 seconds
Tom, Using something newer than 8.1 would help. Thank you. If CREATE TEMP TABLE ids ( id int ) ON COMMIT DROP; is created, ids are added to this table and ids table is used in inner join insted of IN clause or IN clause is replaced with ... dokumnr IN ( SELECT id FROM ids ) ... , will this fix the issue in 8.1.4 ? Andrus. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Simple indexed IN query takes 40 seconds
I found that simple IN query on indexed tables takes too much time. dok and rid have both indexes on int dokumnr columnr and dokumnr is not null. PostgreSql can use index on dok or event on rid so it can executed fast. How to make this query to run fast ? Andrus. note: list contain a lot of integers, output below is abbreviated in this part. explain analyze select sum(rid.kogus) from dok JOIN rid USING(dokumnr) where dok.dokumnr in (869906,869907,869910,869911,869914,869915,869916,869917,869918,869921,869925,869926,869928,869929,869934,869935,869936,...) "Aggregate (cost=327569.15..327569.16 rows=1 width=9) (actual time=39749.842..39749.846 rows=1 loops=1)" " -> Hash Join (cost=83872.74..327537.74 rows=12563 width=9) (actual time=25221.702..39697.249 rows=11857 loops=1)" "Hash Cond: ("outer".dokumnr = "inner".dokumnr)" "-> Seq Scan on rid (cost=0.00..195342.35 rows=3213135 width=13) (actual time=0.046..26347.959 rows=3243468 loops=1)" "-> Hash (cost=83860.76..83860.76 rows=4792 width=4) (actual time=128.366..128.366 rows=4801 loops=1)" " -> Bitmap Heap Scan on dok (cost=9618.80..83860.76 rows=4792 width=4) (actual time=58.667..108.611 rows=4801 loops=1)" "Recheck Cond: ((dokumnr = 869906) OR (dokumnr = 869907) OR (dokumnr = 869910) OR (dokumnr = 869911) OR (dokumnr = 869914) OR (dokumnr = 869915) OR (dokumnr = 869916) OR (dokumnr = 869917) OR (dokumnr = 869918) OR (dokumnr = 869921) OR (dokumnr = 869925) OR (dokumnr = 869926) OR (dokumnr = 869928) OR (dokumnr = 869929) OR (dokumnr = 869934) OR (dokumnr = 869935) OR (dokumnr = 869936) OR (dokumnr = 869937) OR (dokumnr = 869940) OR (dokumnr = 869941) OR (dokumnr = 869945) OR (dokumnr = 869951) OR (dokumnr = 869964) OR (dokumnr = 869966) OR (dokumnr = 869969) OR (dokumnr = 869974) OR (dokumnr = 869979) OR (dokumnr = 869986) OR (dokumnr = 869992) OR (dokumnr = 869993) OR (dokumnr = 869995) OR (dokumnr = 869997) OR (dokumnr = 870007) OR (dokumnr = 870018) OR (dokumnr = 870021) OR (dokumnr = 870023) OR (dokumnr = 870025) OR (dokumnr = 870033) OR (dokumnr = 870034) OR (dokumnr = 870036) OR (dokumnr = 870038) OR (dokumnr = 870043) OR (dokumnr = 870044) OR (dokumnr = 870046) OR (dokumnr = 870050) OR (dokumnr = 870051) OR (dokumnr = 870053) OR (dokumnr = 870054) OR (dokumnr = 870055) OR (dokumnr = 870064) OR (dokumnr = 870066) OR (dokumnr = 870069) OR (dokumnr = 870077) OR (dokumnr = 870079) OR (dokumnr = 870081) OR (dokumnr = 870084) OR (dokumnr = 870085) OR (dokumnr = 870090) OR (dokumnr = 870096) OR (dokumnr = 870110) OR (dokumnr = 870111) OR (dokumnr = 870117) OR (dokumnr = 870120) OR (dokumnr = 870124) OR (dokumnr = 870130) ... OR (dokumnr = 890907) OR (dokumnr = 890908))" "-> BitmapOr (cost=9618.80..9618.80 rows=4801 width=0) (actual time=58.248..58.248 rows=0 loops=1)" " -> Bitmap Index Scan on dok_dokumnr_idx (cost=0.00..2.00 rows=1 width=0) (actual time=0.052..0.052 rows=3 loops=1)" "Index Cond: (dokumnr = 869906)" " -> Bitmap Index Scan on dok_dokumnr_idx (cost=0.00..2.00 rows=1 width=0) (actual time=0.011..0.011 rows=3 loops=1)" "Index Cond: (dokumnr = 869907)" " -> Bitmap Index Scan on dok_dokumnr_idx (cost=0.00..2.00 rows=1 width=0) (actual time=0.020..0.020 rows=3 loops=1)" "Index Cond: (dokumnr = 869910)" " -> Bitmap Index Scan on dok_dokumnr_idx (cost=0.00..2.00 rows=1 width=0) (actual time=0.010..0.010 rows=3 loops=1)" "Index Cond: (dokumnr = 869911)" " -> Bitmap Index Scan on dok_dokumnr_idx (cost=0.00..2.00 rows=1 width=0) (actual time=0.008..0.008 rows=3 loops=1)" "Index Cond: (dokumnr = 869914)" ... " -> Bitmap Index Scan on dok_dokumnr_idx (cost=0.00..2.00 rows=1 width=0) (actual time=0.008..0.008 rows=1 loops=1)" "Index Cond: (dokumnr = 890908)" "Total runtime: 39771.385 ms" "PostgreSQL 8.1.4 on i686-pc-linux-gnu, compiled by GCC i686-pc-linux-gnu-gcc (GCC) 3.4.6 (Gentoo 3.4.6-r1, ssp-3.4.5-1.0, pie-8.7.9)" -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] why group expressions cause query to run forever
> I think the problem is probably that you're sorting two dozen CHAR > columns, and that in many of the rows all these entries are '' forcing > the sort code to compare all two dozen columns (not so)? Yes, most of columns return empty strings. I changed empty strings to null, casted to varchar and simplyfied the statment. However, this select statement runs forever. Any idea how to speed it up ? Andrus. SELECT bilkaib.DB, CASE WHEN dbkonto.objekt1='+' THEN bilkaib.DBOBJEKT ELSE null END::VARCHAR(10) AS dbobjekt, CASE WHEN dbkonto.objekt2='+' THEN bilkaib.DB2OBJEKT ELSE null END::VARCHAR(10) AS db2objekt, CASE WHEN dbkonto.objekt3='+' THEN bilkaib.DB3OBJEKT ELSE null END::VARCHAR(10) AS db3objekt, CASE WHEN dbkonto.objekt4='+' THEN bilkaib.DB4OBJEKT ELSE null END::VARCHAR(10) AS db4objekt, CASE WHEN dbkonto.objekt5='+' THEN bilkaib.DB5OBJEKT ELSE null END::VARCHAR(10) AS db5objekt, CASE WHEN dbkonto.objekt6='+' THEN bilkaib.DB6OBJEKT ELSE null END::VARCHAR(10) AS db6objekt, CASE WHEN dbkonto.objekt7='+' THEN bilkaib.DB7OBJEKT ELSE null END::VARCHAR(10) AS db7objekt, CASE WHEN dbkonto.objekt8='+' THEN bilkaib.DB8OBJEKT ELSE null END::VARCHAR(10) AS db8objekt, CASE WHEN dbkonto.objekt9='+' THEN bilkaib.DB9OBJEKT ELSE null END::VARCHAR(10) AS db9objekt from BILKAIB join KONTO CRKONTO ON bilkaib.cr=crkonto.kontonr join KONTO DBKONTO ON bilkaib.db=dbkonto.kontonr where bilkaib.kuupaev BETWEEN '2006-01-01' AND '2006-12-31' GROUP BY 1,2,3,4,5,6,7,8,9,10 ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] why group expressions cause query to run forever
Tom, thank you. > I think the problem is probably that you're sorting two dozen CHAR > columns, and that in many of the rows all these entries are '' forcing > the sort code to compare all two dozen columns (not so)? So the sort > ends up doing lots and lots and lots of CHAR comparisons. Which can > be slow, especially in non-C locales. What's your locale setting? show all returns "lc_collate";"en_US.UTF-8" "lc_ctype";"en_US.UTF-8" "lc_messages";"C" "lc_monetary";"et_EE.utf-8" "lc_numeric";"et_EE.utf-8" "lc_time";"et_EE.utf-8" How to speed up this query ? Is it possible to force the binary comparison for grouping ? Should I concatenate all the char columns into single column ? Andrus. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[PERFORM] why group expressions cause query to run forever
a from BILKAIB join KONTO CRKONTO ON bilkaib.cr=crkonto.kontonr AND crkonto.iseloom='A' join KONTO DBKONTO ON bilkaib.db=dbkonto.kontonr AND dbkonto.iseloom='A' where bilkaib.kuupaev BETWEEN '2006-01-01' AND '2006-12-31' GROUP BY 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26 "HashAggregate (cost=22099.33..22099.34 rows=1 width=11) (actual time=4518.820..4518.824 rows=1 loops=1)" " -> Hash Join (cost=41.71..13669.25 rows=124890 width=11) (actual time=4.347..3445.650 rows=167349 loops=1)" "Hash Cond: ("outer".cr = "inner".kontonr)" "-> Hash Join (cost=20.86..11676.02 rows=144696 width=25) (actual time=2.165..2076.951 rows=167349 loops=1)" " Hash Cond: ("outer".db = "inner".kontonr)" " -> Seq Scan on bilkaib (cost=0.00..9369.99 rows=167643 width=39) (actual time=0.012..725.813 rows=167349 loops=1)" "Filter: ((kuupaev >= '2006-01-01'::date) AND (kuupaev <= '2006-12-31'::date))" " -> Hash (cost=20.29..20.29 rows=227 width=14) (actual time=2.112..2.112 rows=227 loops=1)" "-> Seq Scan on konto dbkonto (cost=0.00..20.29 rows=227 width=14) (actual time=0.011..1.126 rows=227 loops=1)" " Filter: (iseloom = 'A'::bpchar)" "-> Hash (cost=20.29..20.29 rows=227 width=14) (actual time=2.149..2.149 rows=227 loops=1)" " -> Seq Scan on konto crkonto (cost=0.00..20.29 rows=227 width=14) (actual time=0.022..1.152 rows=227 loops=1)" "Filter: (iseloom = 'A'::bpchar)" "Total runtime: 4519.063 ms" Postgres 8.1 on Gentoo Linux. Andrus. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Why date index is not used
Tom, > Because it doesn't help --- the system still has to do the sort. It can help a lot in this case. kuupaev is sales date kellaaeg is sales time Postgres can use kuupaev index to fetch first 100 rows plus a number of more rows whose kellaaeg value is equal to kellaaeg in 100 th row. I have 500 sales per day. So it can fetch 600 rows using index on kuupaev column. After that it can sort those 600 rows fast. Currently it sorts blindly all 54000 rows in table. > You'd need a two-column index on both of the ORDER BY columns to avoid > sorting. Thank you. It works. Andrus. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Why date index is not used
> Actually It looks to me like the sorting is the slow part of this query. > Maybe if you did create an index on both kuupaev and kellaaeg it might > make the sorting faster. Thank you. It makes query fast. > Or maybe you could try increasing the server's > work mem. The sort will be much slower if the server can't do the whole > thing in ram. I have W2K server with 0.5 GB RAM there are only 6 connections open ( 6 point of sales) to this server. shared_buffes is 1 I see approx 10 postgres processes in task manager each taking about 30 MB ram Server prefomance is very slow: Windows swap file size is 1 GB For each sale a new row will be inserted to this table. So the file size grows rapidly every day. Changing work_mem by 1 MB increares memory requirment by 10 MB since I may have 10 processes running. Sorting in memory this table requires very large amout of work_mem for each process address space. I think that if I increase work_mem then swap file will became bigger and perfomance will decrease even more. How to increase perfomance ? Andrus. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Why date index is not used
> Btw. have you run ANALYZE on the table recently? I have autovacuum with default statitics settings running so I expect that it is analyzed. > What is the number > of distinct values in the 'kuupaev' column? select count(distinct kuupaev) from makse returns 61 kuupaev is sales date. So this can contain 365 distinct values per year and max 10 year database, total can be 3650 distinct values after 10 years. Andrus ---(end of broadcast)--- TIP 6: explain analyze is your friend
[PERFORM] Why date index is not used
Why Postgres 8.1 does not use makse_kuupaev_idx index in the following query ? How to speed this query up ? explain analyze select * from makse order by kuupaev desc, kellaaeg desc limit 100 "Limit (cost=62907.94..62908.19 rows=100 width=876) (actual time=33699.551..33701.001 rows=100 loops=1)" " -> Sort (cost=62907.94..63040.49 rows=53022 width=876) (actual time=33699.534..33700.129 rows=100 loops=1)" "Sort Key: kuupaev, kellaaeg" "-> Seq Scan on makse (cost=0.00..2717.22 rows=53022 width=876) (actual time=0.020..308.502 rows=53028 loops=1)" "Total runtime: 37857.177 ms" CREATE TABLE makse( kuupaev date, kellaaeg char(6) NOT NULL DEFAULT ''::bpchar, guid char(36) NOT NULL, CONSTRAINT makse_pkey PRIMARY KEY (guid) ) CREATE INDEX makse_kuupaev_idx ON makse USING btree (kuupaev); Andrus. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] How to force Postgres to use index on ILIKE
>> SELECT toode, nimetus >> FROM toode >> WHERE toode ILIKE 'x10%' ESCAPE '!' >> ORDER BY UPPER(toode ),nimetus LIMIT 100 >> >> runs 1 minute in first time for small table size. >> >> Toode field type is CHAR(20) > > 1) why are you using CHAR and not VARCHAR or TEXT? CHAR will give you > problems using an index, period. 1. I haven't seen any example where VARCHAR is better that CHAR for indexing 2. I have a lot of existing code. Changing CHAR to VARCHAR requires probably re-writing a lot of code, a huge work. > 2) You can't use an index on ILIKE. I'ts very sad. I expected that lower(toode) index can be used. > You can, however, use an index on > lower(field) if your query is properly phrased and if you've created an > expression index on lower(field). I tried by Postgres does not use index. Why ? create index nimib2 on firma1.klient(lower(nimi) bpchar_pattern_ops); explain analyze select nimi from firma1.klient where lower(nimi) like 'mokter%' "Seq Scan on klient (cost=0.00..9.79 rows=1 width=74) (actual time=0.740..0.761 rows=1 loops=1)" " Filter: (lower((nimi)::text) ~~ 'mokter%'::text)" "Total runtime: 0.877 ms" ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[PERFORM] How to force Postgres to use index on ILIKE
I have UTF-8 Postgres 8.1 database on W2K3 Query SELECT toode, nimetus FROM toode WHERE toode ILIKE 'x10%' ESCAPE '!' ORDER BY UPPER(toode ),nimetus LIMIT 100 runs 1 minute in first time for small table size. Toode field type is CHAR(20) How to create index on toode field so that query can use it ? ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Query runs 38 seconds for small database!
> The only reason for being so conservative that I'm aware of was that it > was a best guess. Everyone I've talked to cuts the defaults down by at > least a factor of 2, sometimes even more. Can we ask that Tom will change default values to 2 times smaller in 8.1.4 ? > BTW, these parameters are already tweaked from what we started with in > contrib/pg_autovacuum. It would allow a table to grow to 2x larger than > it should be before vacuuming, as opposed to the 40% that the current > settings allow. But even there, is there any real reason you want to > have 40% bloat? To make matters worse, those settings ensure that all > but the smallest databases will suffer runaway bloat unless you bump up recprd> the FSM settings. I created empty table konto and loaded more that 219 records to it during database creation. So it seems that if table grows from zero to more than 219 times larger then it was still not processed. Andrus. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Query runs 38 seconds for small database!
> BTW, you might want to cut all the autovac thresholds in half; that's > what I typically do. I added ANALYZE command to my procedure which creates and loads data to postgres database from other DBMS. This runs only onvce after installing my application. I hope this is sufficient. If default threshold is so conservative values I expect there is some reason for it. Andrus. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Query runs 38 seconds for small database!
> The default autovac thresholds are not very aggressive; this table was > probably not large enough to get selected for analysis. Tom, thank you. Excellent. Andrus. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Query runs 38 seconds for small database!
>> "-> Seq Scan on konto dbkonto (cost=0.00..23.30 >> rows=1 >> width=44) (actual time=0.017..1.390 rows=219 loops=1)" >> " Filter: (iseloom = 'A'::bpchar)" > > Anytime you see a row estimate that far off about a simple single-column > condition, it means your statistics are out-of-date. Than you. I addded ANALYZE command and now query works fast. I see autovacuum: processing database "mydb" messages in log file and I have stats_start_collector = on stats_row_level = on in config file. Why statistics was out-of-date ? Andrus. My postgres.conf file (only uncommented settings are listed): listen_addresses = '*' max_connections = 40 shared_buffers = 1000 log_destination = 'stderr' redirect_stderr = on # Enable capturing of stderr into log log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' # Log file name pattern. log_rotation_age = 1440 # Automatic rotation of logfiles will log_rotation_size = 10240 # Automatic rotation of logfiles will log_min_error_statement = 'warning' # Values in order of increasing severity: silent_mode = on log_line_prefix = "'%t %u %d %h %p %i %l %x %q'" stats_start_collector = on stats_row_level = on autovacuum = on # enable autovacuum subprocess? lc_messages = 'C' # locale for system error message lc_monetary = 'C' # locale for monetary formatting lc_numeric = 'C' # locale for number formatting lc_time = 'C'# locale for time formatting ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[PERFORM] Query runs 38 seconds for small database!
I have small database. However the following query takes 38 (!) seconds to run. How to speed it up (preferably not changing table structures but possibly creating indexes) ? Andrus. set search_path to public,firma1; explain analyze select bilkaib.summa from BILKAIB join KONTO CRKONTO ON bilkaib.cr=crkonto.kontonr AND crkonto.iseloom='A' join KONTO DBKONTO ON bilkaib.db=dbkonto.kontonr AND dbkonto.iseloom='A' left join klient on bilkaib.klient=klient.kood where ( bilkaib.cr LIKE '3'||'%' OR bilkaib.db LIKE '3'||'%' ) AND bilkaib.kuupaev BETWEEN '2006-01-01' AND '2006-03-31' AND ( kuupaev='20060101' OR (cr!='00' and db!='00')) AND ( 3 IN(2,3) or (NOT bilkaib.ratediffer and ( TRIM(bilkaib.masin)='' or bilkaib.masin IS NULL or bilkaib.alusdok not in ('KV', 'DU', 'DJ') or bilkaib.andmik is NULL or bilkaib.alusdok is NULL or substring(andmik from 1 for 9)!='Kursivahe' ))) and (position(bilkaib.laustyyp IN 'x')=0 or bilkaib.laustyyp is null or bilkaib.laustyyp=' ') "Nested Loop Left Join (cost=23.30..1964.10 rows=1 width=10) (actual time=7975.470..38531.724 rows=3151 loops=1)" " -> Nested Loop (cost=23.30..1958.08 rows=1 width=26) (actual time=7975.407..37978.718 rows=3151 loops=1)" "Join Filter: ("inner".cr = "outer".kontonr)" "-> Seq Scan on konto crkonto (cost=0.00..23.30 rows=1 width=44) (actual time=0.135..13.913 rows=219 loops=1)" " Filter: (iseloom = 'A'::bpchar)" "-> Hash Join (cost=23.30..1934.64 rows=11 width=40) (actual time=1.650..155.734 rows=3151 loops=219)" " Hash Cond: ("outer".db = "inner".kontonr)" " -> Index Scan using bilkaib_kuupaev_idx on bilkaib (cost=0.00..1897.10 rows=2826 width=54) (actual time=1.628..111.216 rows=3151 loops=219)" "Index Cond: ((kuupaev >= '2006-01-01'::date) AND (kuupaev <= '2006-03-31'::date))" "Filter: (((cr ~~ '3%'::text) OR (db ~~ '3%'::text)) AND ((kuupaev = '2006-01-01'::date) OR ((cr <> '00'::bpchar) AND (db <> '00'::bpchar))) AND (("position"('x'::text, (laustyyp)::text) = 0) OR (laustyyp IS NULL) OR (laustyyp = ' '::bpc (..)" " -> Hash (cost=23.30..23.30 rows=1 width=44) (actual time=2.278..2.278 rows=219 loops=1)" "-> Seq Scan on konto dbkonto (cost=0.00..23.30 rows=1 width=44) (actual time=0.017..1.390 rows=219 loops=1)" " Filter: (iseloom = 'A'::bpchar)" " -> Index Scan using klient_pkey on klient (cost=0.00..6.01 rows=1 width=52) (actual time=0.138..0.158 rows=1 loops=3151)" "Index Cond: ("outer".klient = klient.kood)" "Total runtime: 38561.745 ms" CREATE TABLE firma1.bilkaib ( id int4 NOT NULL DEFAULT nextval('bilkaib_id_seq'::regclass), kuupaev date NOT NULL, db char(10) NOT NULL, dbobjekt char(10), cr char(10) NOT NULL, crobjekt char(10), summa numeric(14,2) NOT NULL, raha char(3) NOT NULL, masin char(5), klient char(12), alusdok char(2), dokumnr int4 NOT NULL DEFAULT nextval('bilkaib_dokumnr_seq'::regclass), db2objekt char(10), cr2objekt char(10), db3objekt char(10), db4objekt char(10), db5objekt char(10), db6objekt char(10), db7objekt char(10), db8objekt char(10), db9objekt char(10), cr3objekt char(10), cr4objekt char(10), cr5objekt char(10), cr6objekt char(10), cr7objekt char(10), cr8objekt char(10), cr9objekt char(10), exchrate numeric(13,8), doknr char(25), andmik text, laustyyp char(1), ratediffer ebool, adoknr char(25), jarjeknr numeric(7), CONSTRAINT bilkaib_pkey PRIMARY KEY (id), CONSTRAINT bilkaib_alusdok_fkey FOREIGN KEY (alusdok) REFERENCES firma1.alusdok (alusdok) MATCH SIMPLE ON UPDATE CASCADE ON DELETE NO ACTION DEFERRABLE INITIALLY IMMEDIATE, CONSTRAINT bilkaib_cr2objekt_fkey FOREIGN KEY (cr2objekt) REFERENCES firma1.yksus2 (yksus) MATCH SIMPLE ON UPDATE CASCADE ON DELETE NO ACTION DEFERRABLE INITIALLY IMMEDIATE, CONSTRAINT bilkaib_cr3objekt_fkey FOREIGN KEY (cr3objekt) REFERENCES firma1.yksus3 (yksus) MATCH SIMPLE ON UPDATE CASCADE ON DELETE NO ACTION DEFERRABLE INITIALLY IMMEDIATE, CONSTRAINT bilkaib_cr4objekt_fkey FOREIGN KEY (cr4objekt) REFERENCES firma1.yksus4 (yksus) MATCH SIMPLE ON UPDATE CASCADE ON DELETE NO ACTION DEFERRABLE INITIALLY IMMEDIATE, CONSTRAINT bilkaib_cr5objekt_fkey FOREIGN KEY (cr5o
Re: [PERFORM] CPU usage goes to 100%, query seems to ran forever
> Something seems to have truncated your EXPLAIN output, but anyway we > can see where the problem is: I copied it from pgAdmin in 640x480 screen resolution in XP Maybe pgAdmin bug ? > The planner is expecting to get one row from "dok" passing the filter > condition, and hence chooses a plan that is suitable for a small number > of rows ... but in reality there are 7670 rows matching the filter > condition, and that's what blows the runtime out of the water. (Most of > the runtime is actually going into 7670 repeated scans of "rid", which > wouldn't have happened with another plan type.) I added index CREATE INDEX rid_toode_idx ON firma2.rid USING btree (toode); and query start working fast ! > So you need to see about getting that estimate to be more accurate. > First thing is to make sure that "dok" has been ANALYZEd --- just do it > by hand. As I wrote I have autovacuum running. Is'nt this sufficient ? > I can see at least three things you are doing that are > unnecessarily destroying the planner's ability to estimate the number of > matching rows: > > dok.laonr='1'::float8 and > Since laonr apparently isn't float8, this forces a runtime type > conversion as well as interfering with statistics use. (The planner > will have ANALYZE stats about dok.laonr, but the connection to > dok.laonr::float8 escapes it.) Just write the constant with quotes > and no type coercion. I re-wrote it as dok.laonr=1 this query is automatically generated by VFP and ODBC parameter substitution which adds those type conversions. VFP has only float8 type and it probably forces ODBC driver convert numbers to float8 > POSITION( dok.doktyyp IN 'OSIDVGYKIF')!=0 AND > > This is completely unestimatable given the available statistics, and it > doesn't look to me like it is all that great a semantic representation > either. Perhaps the query that's really meant here is "dok.doktypp IN > ('O','S','I', ...)"? If so, you should say what you mean, not play > games with converting the query into some strange string operation. 'OSID ...' is a string parameter substituted to SELECT template. changing this to IN ( 'O', 'S', .. requires re-writing parts of code and I'm not sure it makes code faster. > AND dok.kuupaev||dok.kellaaeg BETWEEN '2006-04-08' AND '2006-04-2723 59' > This is another case where the planner is not going to have any ability > to make a useful estimate, and it's because you are using a crummy > representation of your data. You should merge those two columns into > one timestamp column and just do a simple BETWEEN test. > By and large, unnatural representations of data that you use in WHERE > clauses are going to cost you big-time in SQL queries. It's worth > taking time up front to design a clean table schema, and taking time > to revise it when requirements change. date range test in other part of where clause dok.kuupaev BETWEEN is optimizable. AND dok.kuupaev||dok.kellaaeg adds time range test to date range. There are less that some thousands documents per day. Wasting time to re-engineer database and deployed application seems not reasonable in this case. Andrus. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] CPU usage goes to 100%, query seems to ran forever
> You have ANALYZEd all these tables recently, I hope? The planner > certainly doesn't think this query will take very long. I have autovacuum running so I expect it takes care of ANALYZE, isn't it ? I ran also analyze command before running explain analyze. > To find out what's wrong, you're going to have to be patient enough to > let an EXPLAIN ANALYZE run to completion. Plain EXPLAIN won't tell. Here it is running in my local computer. I'm expecting run time no more 1 second "Nested Loop Left Join (cost=0.00..1829.95 rows=1 width=24) (actual time=492064.990..492064.990 rows=0 loops=1)" " Filter: ((("inner".arttyyp <> 'Teenus'::bpchar) AND ("inner".arttyyp <> 'Komplekt'::bpchar)) OR ("inner".arttyyp IS NULL))" " -> Nested Loop (cost=0.00..1825.01 rows=1 width=43) (actual time=492064.983..492064.983 rows=0 loops=1)" "-> Nested Loop (cost=0.00..1819.04 rows=1 width=24) (actual time=492064.978..492064.978 rows=0 loops=1)" " Join Filter: (("outer".dokumnr = "inner".dokumnr) AND ((("position"('VGYKITDNHMEBARCFJ'::text, ("outer".doktyyp)::text) <> 0) AND (CASE WHEN ((NOT ("outer".objrealt)::boolean) OR ("outer".doktyyp = 'I'::bpchar)) THEN "outer".yksus ELSE "inner (..)" " -> Seq Scan on dok (cost=0.00..787.80 rows=1 width=39) (actual time=0.152..878.198 rows=7670 loops=1)" "Filter: ((kuupaev >= '2006-04-08'::date) AND (kuupaev <= '2006-04-27'::date) AND (NOT (eimuuda)::boolean) AND ((laonr)::double precision = 1::double precision) AND ("position"('OSIDVGYKIF'::text, (doktyyp)::text) <> 0) AND (((kuupaev):: (..)" " -> Seq Scan on rid (cost=0.00..1019.42 rows=315 width=51) (actual time=22.003..62.216 rows=839 loops=7670)" "Filter: ((toode = '1EEKPANT'::bpchar) AND (length(btrim((toode)::text)) > 2) AND (toode IS NOT NULL))" " -> Index Scan using toode_pkey on toode (cost=0.00..5.96 rows=1 width=43) (never executed)" " Index Cond: ('1EEKPANT'::bpchar = toode)" " -> Index Scan using artliik_pkey on artliik (cost=0.00..4.92 rows=1 width=31) (never executed)" "Index Cond: (("outer".grupp = artliik.grupp) AND ("outer".liik = artliik.liik))" "Total runtime: 492065.840 ms" Andrus. ---(end of broadcast)--- TIP 6: explain analyze is your friend
[PERFORM] CPU usage goes to 100%, query seems to ran forever
I have small database running in 8.1.3 in W2K server. The following query causes Postgres process to use 100% CPU and seems to run forever. If I change '1EEKPANT' to less frequently used item code, it runs fast. How to speed it up ? set search_path to public,firma2; select rid.toode FROM dok JOIN rid USING (dokumnr) JOIN toode USING (toode) LEFT JOIN artliik ON toode.grupp=artliik.grupp and toode.liik=artliik.liik WHERE (NOT '0' or dok.kinnitatud) AND dok.kuupaev BETWEEN '2006-04-08' AND '2006-04-27' AND rid.toode='1EEKPANT' AND (NOT dok.eimuuda or '0' ) and dok.laonr='1'::float8 and POSITION( dok.doktyyp IN 'OSIDVGYKIF')!=0 AND ( ( ('1' OR (POSITION(dok.doktyyp IN 'TUNH')=0 and (rid.kogus<0 or ('1' and rid.kogus=0 and POSITION(dok.doktyyp IN 'VGYKITDNHMEBARCFJ' )!=0 AND CASE WHEN NOT dok.objrealt OR dok.doktyyp='I' THEN dok.yksus ELSE rid.kuluobjekt END LIKE 'LADU%' ESCAPE '!' ) OR (POSITION(dok.doktyyp IN 'OSIUDP' )!=0 AND CASE WHEN dok.objrealt THEN rid.kuluobjekt ELSE dok.sihtyksus END LIKE 'LADU%' ESCAPE '!' ) ) AND dok.kuupaev||dok.kellaaeg BETWEEN '2006-04-08' AND '2006-04-2723 59' AND ('0' or ( length(trim(rid.toode))>2 AND rid.toode is NOT NULL)) AND ( LENGTH('' )=0 OR rid.partii='' OR (dok.doktyyp='I' AND rid.kulupartii='' ) ) AND (NOT dok.inventuur or rid.kogus!=0) AND dok.dokumnr!= 0 AND ( artliik.arttyyp NOT IN ('Teenus', 'Komplekt' ) OR artliik.arttyyp IS NULL) explain returns: "Nested Loop Left Join (cost=0.00..1828.18 rows=1 width=24)" " Filter: ((("inner".arttyyp <> 'Teenus'::bpchar) AND ("inner".arttyyp <> 'Komplekt'::bpchar)) OR ("inner".arttyyp IS NULL))" " -> Nested Loop (cost=0.00..1822.51 rows=1 width=43)" "-> Nested Loop (cost=0.00..1816.56 rows=1 width=24)" " Join Filter: (("outer".dokumnr = "inner".dokumnr) AND ((("position"('VGYKITDNHMEBARCFJ'::text, ("outer".doktyyp)::text) <> 0) AND (CASE WHEN ((NOT ("outer".objrealt)::boolean) OR ("outer".doktyyp = 'I'::bpchar)) THEN "outer".yksus ELSE "inner (..)" " -> Seq Scan on dok (cost=0.00..787.80 rows=1 width=39)" "Filter: ((kuupaev >= '2006-04-08'::date) AND (kuupaev <= '2006-04-27'::date) AND (NOT (eimuuda)::boolean) AND ((laonr)::double precision = 1::double precision) AND ("position"('OSIDVGYKIF'::text, (doktyyp)::text) <> 0) AND (((kuupaev):: (..)" " -> Seq Scan on rid (cost=0.00..1019.42 rows=249 width=51)" "Filter: ((toode = '1EEKPANT'::bpchar) AND (length(btrim((toode)::text)) > 2) AND (toode IS NOT NULL))" "-> Index Scan using toode_pkey on toode (cost=0.00..5.94 rows=1 width=43)" " Index Cond: ('1EEKPANT'::bpchar = toode)" " -> Index Scan using artliik_pkey on artliik (cost=0.00..5.65 rows=1 width=88)" "Index Cond: (("outer".grupp = artliik.grupp) AND ("outer".liik = artliik.liik))" Andrus. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Query runs too long for indexed tables
> Why not use a LEFT JOIN for this ? UPDATE firma1.rid SET rid.toode=NULL LEFT join firma1.toode using(toode) WHERE rid.toode IS NOT NULL AND toode.toode IS NULL; Causes: ERROR: syntax error at or near "LEFT" at character 41 outer joins are not supported in Postgres UPDATE command. Andrus. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Query runs too long for indexed tables
> Let me guess, you've updated it a lot and aren't familiar with Vacuum? > > run a vacuum full on your database. schedule a vacuum (plain one) to > run every so often (hours or days are a good interval for most folks) > > If that's NOT your problem, then please, let us know. Scot, thank you. Excellent. If database is created and VACUUM ANALYZE is issued, this query runs fast. However, I need to speed up it during running script. This is a database creation script. Script does the following: 1. CREATE DATABASE foo; 2. START TRANSACTION; 3. Create 145 tables with primary keys. Add data to those tables. 4. Create some additional indexes 5. ANALYZE 6. Clear bad bad foreign keys fields using commands like UPDATE firma1.rid SET toode=NULL WHERE toode IS NOT NULL AND toode NOT IN (SELECT TOODE FROM firma1.TOODE); 7. Create foreign key references 8. COMMIT This script runs about 1 hour in modern server with fsync off. Largest table has 10 records, few other tables have 15000 records and remaining have fewer records. How to speed this up ? Is'nt running ANALYZE sufficient to speed up foreign key clearing ? It seems that ANALYZE does'nt work. Should I isse COMMIT before running ANALYZE or issue more commits? Server has 4 GB RAM postgres.conf file is default from 8.1.3 window zip file except the following settings are added to end: fsync=off shared_buffers = 3 redirect_stderr = on log_min_error_statement = error autovacuum = on ... also 2 stats settings from aurtovacuur max_fsm_pages = 3 Andrus. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[PERFORM] Query runs too long for indexed tables
I have relatively small tables (toode and rid) in fast server. Both tables are indexed on toode field. Following query takes long time to run. toode field type is char(20). It is difficult to change this field type. Any idea how to speed up this query ? UPDATE firma1.rid SET toode=NULL WHERE toode IS NOT NULL AND toode NOT IN (SELECT TOODE FROM firma1.TOODE); Query returned successfully: 0 rows affected, 594813 ms execution time. explain window shows: Seq Scan on rid (cost=2581.07..20862553.77 rows=51848 width=1207) Filter: ((toode IS NOT NULL) AND (NOT (subplan))) SubPlan -> Materialize (cost=2581.07..2944.41 rows=14734 width=84) -> Seq Scan on toode (cost=0.00..2350.34 rows=14734 width=84) Andrus. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings