Re: [PERFORM] Load experimentation
On Tue, Dec 8, 2009 at 12:58 AM, Scott Marlowe <scott.marl...@gmail.com> wrote:
> On Tue, Dec 8, 2009 at 12:22 AM, Ben Brehmer <benbreh...@gmail.com> wrote:
>> Thanks for all the responses. I have one more thought: since my input data is
>> split into about 200 files (3GB each), I could potentially spawn one load
>> command for each file. What would be the maximum number of input connections
>> Postgres can handle without bogging down? When I say 'input connection' I mean
>> "psql -U postgres -d dbname -f one_of_many_sql_files".

This is VERY dependent on your IO capacity and number of cores. My experience is that unless you're running on a decent number of disks, you'll run out of IO capacity first on most machines. n pairs of mirrors in a RAID-10 can handle x input threads, where x has some near-linear relation to n. Have 100 disks in a RAID-10 array? You can surely handle dozens of load threads with no IO wait. Have 4 disks in a RAID-10? Maybe two to four load threads will max you out. Once you're IO bound, adding more threads and more CPUs won't help; it'll hurt.

The only way to really know is to benchmark it, but I'd guess that about half as many import threads as mirror pairs in a RAID-10 (or just drives if you're using RAID-0) would be a good place to start and work from there. Note that if you start running out of CPU horsepower first, the degradation will be less harsh as you go past the knee in the performance curve. IO has a sharper knee than CPU.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
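[A minimal sketch of the parallel-load idea discussed above, assuming the ~200 SQL files sit in one directory and GNU xargs is available; the path, database name, and concurrency level are illustrative, not from the thread:]

```shell
# Run up to 4 concurrent psql load sessions; tune -P to roughly half
# the number of mirror pairs in your RAID-10, as suggested above.
ls /data/load/*.sql | xargs -P 4 -n 1 -I{} \
    psql -U postgres -d dbname -f {}
```

Each file becomes one independent load session, so failures in one file do not abort the others; benchmark with increasing -P until IO wait appears.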
Re: [PERFORM] Load experimentation
Hi,

Ben Brehmer <benbreh...@gmail.com> writes:
> By "loading data" I am implying: psql -U postgres -d somedatabase -f sql_file.sql. The sql_file.sql contains table creates and insert statements. There are no indexes present nor created during the load.
> OS: x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-44)
> PostgreSQL: I will try upgrading to latest version.
> COPY command: Unfortunately I'm stuck with INSERTs due to the nature of how this data was generated (Hadoop/MapReduce).

What I think you could do is the following:
- switch to using 8.4
- load your files into a *local* database
- pg_dump -Fc
- now pg_restore -j X on the Amazon setup

That way you will be using COPY rather than INSERTs and the parallel loading built into pg_restore (and its optimisations of when to add the indexes and constraints). The X is to be chosen depending on the IO power and the number of CPUs...

Regards,
--
dim
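[Spelled out, the workflow Dimitri describes might look like the following; the database names and job count are placeholders, and the -j flag requires pg_restore from 8.4 or later:]

```shell
# 1. Load the INSERT-based files into a scratch database on a local 8.4 box
createdb scratch
for f in /data/load/*.sql; do psql -d scratch -f "$f"; done

# 2. Dump in custom format (compressed, restorable in parallel)
pg_dump -Fc scratch > scratch.dump

# 3. Restore on the target with N parallel jobs; data is sent via COPY,
#    and indexes/constraints are built after the data loads
pg_restore -j 8 -d somedatabase -h target.host scratch.dump
```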
Re: [PERFORM] Load experimentation
On Tue, Dec 8, 2009 at 2:08 AM, Dimitri Fontaine <dfonta...@hi-media.com> wrote:
> [...]
> What I think you could do is the following:
> - switch to using 8.4
> - load your files into a *local* database
> - pg_dump -Fc
> - now pg_restore -j X on the amazon setup
>
> That way you will be using COPY rather than INSERTs and the parallel loading
> built into pg_restore (and its optimisations of when to add the indexes and
> constraints). The X is to be chosen depending on the IO power and the number
> of CPUs...

That's a lot of work to get to COPY. It might be enough to drop all FK relations and indexes on the destination db in the cloud, load the data in a few (or one) transaction(s), then recreate indexes and FK relationships.
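[A hypothetical sketch of Scott's lighter-weight alternative; the table, index, and constraint names are purely illustrative:]

```sql
-- Strip constraints and indexes on the destination before loading
ALTER TABLE orders DROP CONSTRAINT orders_customer_fk;
DROP INDEX orders_customer_idx;

-- Bulk-load the INSERT files inside one transaction to avoid
-- per-statement commit overhead (in psql: \i one_of_many_sql_files.sql)
BEGIN;
-- ... INSERT statements from the generated files ...
COMMIT;

-- Rebuild once the data is in place
CREATE INDEX orders_customer_idx ON orders (customer_id);
ALTER TABLE orders ADD CONSTRAINT orders_customer_fk
    FOREIGN KEY (customer_id) REFERENCES customers (id);
```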
Re: [PERFORM] Load experimentation
Scott Marlowe <scott.marl...@gmail.com> writes:
> That's a lot of work to get to COPY.

Well, yes. I thought about it this way only after having read that the OP is uneasy with producing another format from his source data, and considering it's a one-shot operation. Ah, tradeoffs, how to find the right one!
--
dim
Re: [PERFORM] performance penalty between Postgresql 8.3.8 and 8.4.1
Hi Andres,

This query returns the same result for 8.4.1 and for 8.3.8:

stadistinct = -1
stanullfrac = 0
stawidth = 4
array_upper: nothing

Regards
David

-----Original Message-----
From: Robert Haas [mailto:robertmh...@gmail.com]
Sent: Tuesday, 8 December 2009 05:05
To: Kevin Grittner
Cc: Schmitz, David; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] performance penalty between Postgresql 8.3.8 and 8.4.1

On Mon, Dec 7, 2009 at 5:19 PM, Kevin Grittner <kevin.gritt...@wicourts.gov> wrote:
> Schmitz, David <david.schm...@harman.com> wrote:
>> It is carried out with poor performance on postgresql 8.4.1. However,
>> postgresql 8.3.8 performs just fine. If you take a closer look at the query
>> with EXPLAIN, it becomes obvious that postgresql 8.4 does not consider the
>> primary key at level 3 and instead generates a hash join:
>> Postgresql 8.4.1: Sort (cost=129346.71..129498.64 rows=60772 width=61)
>> Postgresql 8.3.8: Sort (cost=3792.75..3792.95 rows=81 width=61)
>
> It determines the plan based on available statistics, which in this case seem
> to indicate rather different data. Do the two databases have identical data?
> Have they both been recently analyzed? What is the default_statistics_target
> on each? Do any columns in these tables have overrides?

I think Tom made some changes to the join selectivity code which might be relevant here, though I'm not sure exactly what's going on.
Can we see, on the 8.4.1 database:

SELECT SUM(1) FROM rdf_admin_hierarchy;

SELECT s.stadistinct, s.stanullfrac, s.stawidth, array_upper(s.stanumbers1, 1)
FROM pg_statistic s
WHERE s.starelid = 'rdf_admin_hierarchy'::regclass
  AND s.staattnum = (SELECT a.attnum FROM pg_attribute a
                     WHERE a.attname = 'admin_place_id'
                       AND a.attrelid = 'rdf_admin_hierarchy'::regclass);

...Robert

*** innovative systems GmbH Navigation-Multimedia Geschaeftsfuehrung: Edwin Summers - Michael Juergen Mauser Sitz der Gesellschaft: Hamburg - Registergericht: Hamburg HRB 59980 *** This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and delete this e-mail. Any unauthorized copying, disclosure or distribution of the contents in this e-mail is strictly forbidden. ***
Re: [PERFORM] performance penalty between Postgresql 8.3.8 and 8.4.1
Hi Andres,

EXPLAIN ANALYZE
select ah.ORDER8_ID, ah.BUILTUP_ID, rl.LINK_ID, la.SIDE, rl.ROAD_NAME_ID,
       rl.LEFT_ADDRESS_RANGE_ID, rl.RIGHT_ADDRESS_RANGE_ID, rl.IS_EXIT_NAME,
       rl.EXPLICATABLE, rl.IS_JUNCTION_NAME, rl.IS_NAME_ON_ROADSIGN,
       rl.IS_POSTAL_NAME, rl.IS_STALE_NAME, rl.IS_VANITY_NAME,
       rl.ROAD_LINK_ID, rn.STREET_NAME, rn.ROUTE_TYPE
from rdf.xdf_ADMIN_HIERARCHY ah
join xdf.xdf_LINK_ADMIN la on ah.ADMIN_PLACE_ID = la.ADMIN_PLACE_ID
join xdf.xdf_ROAD_LINK rl on la.LINK_ID = rl.LINK_ID
join xdf.xdf_ROAD_NAME rn on rl.ROAD_NAME_ID = rn.ROAD_NAME_ID
where rl.IS_EXIT_NAME = 'N' and rl.IS_JUNCTION_NAME = 'N'
  and rn.ROAD_NAME_ID between 158348561 and 158348660
order by rl.ROAD_NAME_ID, ah.ORDER8_ID, ah.BUILTUP_ID, rl.LINK_ID;

On Postgresql 8.4.1:

Sort (cost=129346.71..129498.64 rows=60772 width=61) (actual time=100.358..100.496 rows=1444 loops=1)
  Sort Key: rl.road_name_id, ah.order8_id, ah.builtup_id, rl.link_id
  Sort Method: quicksort Memory: 252kB
  ->  Hash Join (cost=2603.57..124518.03 rows=60772 width=61) (actual time=62.359..97.268 rows=1444 loops=1)
        Hash Cond: (la.admin_place_id = ah.admin_place_id)
        ->  Nested Loop (cost=6.82..120781.81 rows=60772 width=57) (actual time=0.318..33.600 rows=1444 loops=1)
              ->  Nested Loop (cost=6.82..72383.98 rows=21451 width=51) (actual time=0.232..12.359 rows=722 loops=1)
                    ->  Index Scan using pk_xdf_road_name on xdf_road_name rn (cost=0.00..11.24 rows=97 width=21) (actual time=0.117..0.185 rows=100 loops=1)
                          Index Cond: ((road_name_id >= 158348561) AND (road_name_id <= 158348660))
                    ->  Bitmap Heap Scan on xdf_road_link rl (cost=6.82..743.34 rows=222 width=34) (actual time=0.025..0.115 rows=7 loops=100)
                          Recheck Cond: (rl.road_name_id = rn.road_name_id)
                          Filter: ((rl.is_exit_name = 'N'::bpchar) AND (rl.is_junction_name = 'N'::bpchar))
                          ->  Bitmap Index Scan on nx_xdfroadlink_roadnameid (cost=0.00..6.76 rows=222 width=0) (actual time=0.008..0.008 rows=7 loops=100)
                                Index Cond: (rl.road_name_id = rn.road_name_id)
              ->  Index Scan using nx_xdflinkadmin_linkid on xdf_link_admin la (cost=0.00..2.22 rows=3 width=10) (actual time=0.023..0.028 rows=2 loops=722)
                    Index Cond: (la.link_id = rl.link_id)
        ->  Hash (cost=1544.11..1544.11 rows=84211 width=12) (actual time=61.924..61.924 rows=84211 loops=1)
              ->  Seq Scan on xdf_admin_hierarchy ah (cost=0.00..1544.11 rows=84211 width=12) (actual time=0.017..33.442 rows=84211 loops=1)
Total runtime: 101.446 ms

and on Postgresql 8.3.8:

Sort (cost=3792.75..3792.95 rows=81 width=61) (actual time=28.928..29.074 rows=1444 loops=1)
  Sort Key: rl.road_name_id, ah.order8_id, ah.builtup_id, rl.link_id
  Sort Method: quicksort Memory: 252kB
  ->  Nested Loop (cost=21.00..3790.18 rows=81 width=61) (actual time=0.210..26.098 rows=1444 loops=1)
        ->  Nested Loop (cost=21.00..3766.73 rows=81 width=57) (actual time=0.172..19.148 rows=1444 loops=1)
              ->  Nested Loop (cost=21.00..3733.04 rows=14 width=51) (actual time=0.129..6.126 rows=722 loops=1)
                    ->  Index Scan using pk_xdf_road_name on xdf_road_name rn (cost=0.00..8.32 rows=1 width=21) (actual time=0.059..0.117 rows=100 loops=1)
                          Index Cond: ((road_name_id >= 158348561) AND (road_name_id <= 158348660))
                    ->  Bitmap Heap Scan on xdf_road_link rl (cost=21.00..3711.97 rows=1020 width=34) (actual time=0.015..0.055 rows=7 loops=100)
                          Recheck Cond: (rl.road_name_id = rn.road_name_id)
                          Filter: ((rl.is_exit_name = 'N'::bpchar) AND (rl.is_junction_name = 'N'::bpchar))
                          ->  Bitmap Index Scan on nx_xdfroadlink_roadnameid (cost=0.00..20.75 rows=1020 width=0) (actual time=0.007..0.007 rows=7 loops=100)
                                Index Cond: (rl.road_name_id = rn.road_name_id)
              ->  Index Scan using nx_xdflinkadmin_linkid on xdf_link_admin la (cost=0.00..2.31 rows=8 width=10) (actual time=0.014..0.017 rows=2 loops=722)
                    Index Cond: (la.link_id = rl.link_id)
        ->  Index Scan using pk_xdf_admin_hierarchy on xdf_admin_hierarchy ah (cost=0.00..0.28 rows=1 width=12) (actual time=0.003..0.004 rows=1 loops=1444)
              Index Cond: (ah.admin_place_id = la.admin_place_id)
Total runtime: 29.366 ms

Hope this gives any clue. Or did I misunderstand you?

Regards
David
Re: [PERFORM] Checkpoint spikes
Dear All,

Thanks for all your help so far. This page was particularly helpful: http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm (does the advice for 8.3 apply unchanged to 8.4?)

I'm still hitting issues with this though: sync is taking 7-10 seconds and I need to get it down to nearer 3. We're running a lightly-loaded system which has a realtime response requirement of 0.5 seconds all the time (with a few seconds permissible occasionally, but never exceeding 10).

So far, I've set checkpoint_segments to 128, timeout to 10min, and completion_target to 0.8. This helps, but not as much as I'd hoped. But I haven't touched any of the other WAL or BG Writer settings.

Where should I look next? Should I be looking at the BG Writer settings, or should I look at the Linux VM configuration (e.g. changing /proc/sys/vm/dirty_background_ratio from 5 to 1)? Or would it be most useful to try to move the WAL to a different disk?

Latest messages:

# tail -f /var/log/postgresql/postgresql-8.4-main.log | grep check
2009-12-08 09:12:00 GMT LOG: checkpoint starting: time
2009-12-08 09:20:09 GMT LOG: checkpoint complete: wrote 51151 buffers (8.9%); 0 transaction log file(s) added, 0 removed, 23 recycled; write=479.669 s, sync=9.852 s, total=489.553 s
2009-12-08 09:22:00 GMT LOG: checkpoint starting: time
2009-12-08 09:30:07 GMT LOG: checkpoint complete: wrote 45772 buffers (7.9%); 0 transaction log file(s) added, 0 removed, 24 recycled; write=479.706 s, sync=7.337 s, total=487.120 s
2009-12-08 09:32:00 GMT LOG: checkpoint starting: time
2009-12-08 09:40:09 GMT LOG: checkpoint complete: wrote 47043 buffers (8.2%); 0 transaction log file(s) added, 0 removed, 22 recycled; write=479.744 s, sync=9.300 s, total=489.122 s
2009-12-08 09:42:00 GMT LOG: checkpoint starting: time
2009-12-08 09:50:07 GMT LOG: checkpoint complete: wrote 48210 buffers (8.4%); 0 transaction log file(s) added, 0 removed, 23 recycled; write=479.689 s, sync=7.707 s, total=487.416 s

Thanks a lot,
Richard
--
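[For reference, the settings Richard mentions live in postgresql.conf; a sketch with his stated values, plus comments on the knobs most often tried next. These are starting points under the assumptions in the thread, not recommendations:]

```
# postgresql.conf (8.4) -- checkpoint tuning discussed above
checkpoint_segments          = 128     # WAL segments between checkpoints
checkpoint_timeout           = 10min   # forced checkpoint interval
checkpoint_completion_target = 0.8     # spread writes over the interval;
                                       # try 0.9 to smooth I/O further

# OS-side knob mentioned above (Linux, as root):
#   echo 1 > /proc/sys/vm/dirty_background_ratio
```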
Re: [PERFORM] performance penalty between Postgresql 8.3.8 and 8.4.1
2009/12/8 Schmitz, David <david.schm...@harman.com>:
> Hi Andres,
> [quoted query and EXPLAIN ANALYZE plans snipped; identical to David's mail above]
> Hope this gives any clue. Or did I misunderstand you?
> Regards
> David
Re: [PERFORM] performance penalty between Postgresql 8.3.8 and 8.4.1
Hi Thom,

I did a "select count(*) from xdf.xdf_admin_hierarchy" and it returns 84211 on both databases, postgres 8.3.8 and 8.4.1. The amount of data is exactly the same in both databases, as they are restored from the same dump.

Regards
David

_____

From: Thom Brown [mailto:thombr...@gmail.com]
Sent: Tuesday, 8 December 2009 11:12
To: Schmitz, David
Cc: Andres Freund; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] performance penalty between Postgresql 8.3.8 and 8.4.1

> [quoted query and EXPLAIN ANALYZE plans snipped; identical to David's mail above]
Re: [PERFORM] performance penalty between Postgresql 8.3.8 and 8.4.1
Hi David,

On Tuesday 08 December 2009 10:59:51 Schmitz, David wrote:
> With our data it is a performance difference from 1h16min (8.3.8) to 2h43min (8.4.1).
> On Postgresql 8.4.1: Total runtime: 101.446 ms
> and on Postgresql 8.3.8: Total runtime: 29.366 ms

Hm. There obviously is more going on than these queries?

> Hash Join (cost=2603.57..124518.03 rows=60772 width=61) (actual time=62.359..97.268 rows=1444 loops=1)
> Nested Loop (cost=21.00..3790.18 rows=81 width=61) (actual time=0.210..26.098 rows=1444 loops=1)

Both misestimate the result set quite a bit. It looks like happenstance that the one on 8.3 turns out to be better...

Andres
Re: [PERFORM] performance penalty between Postgresql 8.3.8 and 8.4.1
Hi Andres,

this is just one of many of these queries. There are a lot of jobs calculating stuff for different ranges, which are defined via BETWEEN in the where clause. When I leave out the BETWEEN in the where clause it returns:

On Postgresql 8.4.1:

Sort (cost=5390066.42..5435347.78 rows=18112546 width=61) (actual time=84382.275..91367.983 rows=12742796 loops=1)
  Sort Key: rl.road_name_id, ah.order8_id, ah.builtup_id, rl.link_id
  Sort Method: external merge Disk: 924536kB
  ->  Hash Join (cost=1082249.40..2525563.48 rows=18112546 width=61) (actual time=23367.205..52256.209 rows=12742796 loops=1)
        Hash Cond: (la.admin_place_id = ah.admin_place_id)
        ->  Merge Join (cost=1079652.65..2183356.50 rows=18112546 width=57) (actual time=23306.643..45541.157 rows=12742796 loops=1)
              Merge Cond: (la.link_id = rl.link_id)
              ->  Index Scan using nx_xdflinkadmin_linkid on xdf_link_admin la (cost=0.00..798398.53 rows=16822372 width=10) (actual time=0.098..12622.576 rows=16822399 loops=1)
              ->  Sort (cost=1071304.95..1087287.81 rows=6393147 width=51) (actual time=23302.596..25640.559 rows=12742795 loops=1)
                    Sort Key: rl.link_id
                    Sort Method: external sort Disk: 405896kB
                    ->  Hash Join (cost=15735.91..348620.58 rows=6393147 width=51) (actual time=327.064..9189.938 rows=6371398 loops=1)
                          Hash Cond: (rl.road_name_id = rn.road_name_id)
                          ->  Seq Scan on xdf_road_link rl (cost=0.00..182236.41 rows=7708159 width=34) (actual time=0.028..2689.085 rows=7709085 loops=1)
                                Filter: ((is_exit_name = 'N'::bpchar) AND (is_junction_name = 'N'::bpchar))
                          ->  Hash (cost=9885.96..9885.96 rows=467996 width=21) (actual time=326.740..326.740 rows=467996 loops=1)
                                ->  Seq Scan on xdf_road_name rn (cost=0.00..9885.96 rows=467996 width=21) (actual time=0.019..191.473 rows=467996 loops=1)
        ->  Hash (cost=1544.11..1544.11 rows=84211 width=12) (actual time=60.453..60.453 rows=84211 loops=1)
              ->  Seq Scan on xdf_admin_hierarchy ah (cost=0.00..1544.11 rows=84211 width=12) (actual time=0.019..31.723 rows=84211 loops=1)
Total runtime: 92199.676 ms

On Postgresql 8.3.8:

Sort (cost=9419546.57..9514635.57 rows=38035597 width=61) (actual time=82790.473..88847.963 rows=12742796 loops=1)
  Sort Key: rl.road_name_id, ah.order8_id, ah.builtup_id, rl.link_id
  Sort Method: external merge Disk: 999272kB
  ->  Hash Join (cost=1079404.97..3200652.85 rows=38035597 width=61) (actual time=22583.059..51197.249 rows=12742796 loops=1)
        Hash Cond: (la.admin_place_id = ah.admin_place_id)
        ->  Merge Join (cost=1076808.22..2484888.66 rows=38035597 width=57) (actual time=22524.015..44539.246 rows=12742796 loops=1)
              Merge Cond: (la.link_id = rl.link_id)
              ->  Index Scan using nx_xdflinkadmin_linkid on xdf_link_admin la (cost=0.00..795583.17 rows=16822420 width=10) (actual time=0.086..11725.990 rows=16822399 loops=1)
              ->  Sort (cost=1076734.49..1092821.79 rows=6434920 width=51) (actual time=22514.553..25083.253 rows=12742795 loops=1)
                    Sort Key: rl.link_id
                    Sort Method: external sort Disk: 443264kB
                    ->  Hash Join (cost=15743.47..349025.77 rows=6434920 width=51) (actual time=330.211..9014.353 rows=6371398 loops=1)
                          Hash Cond: (rl.road_name_id = rn.road_name_id)
                          ->  Seq Scan on xdf_road_link rl (cost=0.00..182235.08 rows=7706491 width=34) (actual time=0.018..2565.983 rows=7709085 loops=1)
                                Filter: ((is_exit_name = 'N'::bpchar) AND (is_junction_name = 'N'::bpchar))
                          ->  Hash (cost=9890.43..9890.43 rows=468243 width=21) (actual time=329.906..329.906 rows=467996 loops=1)
                                ->  Seq Scan on xdf_road_name rn (cost=0.00..9890.43 rows=468243 width=21) (actual time=0.018..190.764 rows=467996 loops=1)
        ->  Hash (cost=1544.11..1544.11 rows=84211 width=12) (actual time=58.910..58.910 rows=84211 loops=1)
              ->  Seq Scan on xdf_admin_hierarchy ah (cost=0.00..1544.11 rows=84211 width=12) (actual time=0.009..28.725 rows=84211 loops=1)
Total runtime: 89612.801 ms

Regards
David

-----Original Message-----
From: Andres Freund [mailto:and...@anarazel.de]
Sent: Tuesday, 8 December 2009 11:29
To: pgsql-performance@postgresql.org
Cc: Schmitz, David
Subject: Re: [PERFORM] performance penalty between Postgresql 8.3.8 and 8.4.1

> [quoted mail snipped]
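[An aside on the plans above: both spill their large sorts to disk ("external merge Disk: 924536kB" and similar), so one knob worth experimenting with for batch queries of this size is per-session work_mem. The value below is purely illustrative and not from the thread:]

```sql
-- Raise sort memory for this session only, rerun the query, then reset.
SET work_mem = '512MB';
EXPLAIN ANALYZE
-- ... the full query from above ...
RESET work_mem;
```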
Re: [PERFORM] performance penalty between Postgresql 8.3.8 and 8.4.1
On 8/12/2009 6:11 PM, Thom Brown wrote:
> Your output shows that the xdf_admin_hierarchy tables between versions are
> drastically different. 8.3.8 only contains 1 row, whereas 8.4.1 contains
> 84211 rows.

That's just because one of them is doing a nested loop where it looks up a single row from xdf_admin_hierarchy via its primary key on each iteration. The other plan is doing a hash join on a sequential scan over xdf_admin_hierarchy, so it reports all the rows at once.

--
Craig Ringer
Re: [PERFORM] performance penalty between Postgresql 8.3.8 and 8.4.1
Hi Craig,

that is exactly the problem: postgresql 8.4.1 does not consider the primary key but instead calculates a hash join. This can only result in poorer performance. I think this is a bug.

Regards
David

-----Original Message-----
From: Craig Ringer [mailto:cr...@postnewspapers.com.au]
Sent: Tuesday, 8 December 2009 13:12
To: Thom Brown
Cc: Schmitz, David; Andres Freund; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] performance penalty between Postgresql 8.3.8 and 8.4.1

> [quoted mail snipped]
[PERFORM] Optimizing Bitmap Heap Scan.
Hi All,

I have to optimize the following query:

SELECT r.TopFamilyID AS FamilyID
FROM CMRules r
WHERE r.WorkspaceID = 18512
GROUP BY r.TopFamilyID;

The explain plan is as follows:

Group (cost=509989.19..511518.30 rows=9 width=10) (actual time=1783.102..2362.587 rows=261 loops=1)
  ->  Sort (cost=509989.19..510753.74 rows=305821 width=10) (actual time=1783.097..2121.378 rows=272211 loops=1)
        Sort Key: topfamilyid
        ->  Bitmap Heap Scan on cmrules r (cost=14501.36..476896.34 rows=305821 width=10) (actual time=51.507..351.487 rows=272211 loops=1)
              Recheck Cond: (workspaceid = 18512::numeric)
              ->  Bitmap Index Scan on pk_ws_fea_fam_cmrules (cost=0.00..14424.90 rows=305821 width=0) (actual time=48.097..48.097 rows=272211 loops=1)
                    Index Cond: (workspaceid = 18512::numeric)
Total runtime: 2373.008 ms
(8 rows)

\d CMRules gives the following indexes:

Indexes:
    "pk_ws_fea_fam_cmrules" PRIMARY KEY, btree (workspaceid, featureid, topfamilyid, ruleenddate, gid)
    "idx_cmrules" btree (topfamilyid)
    "idx_gid_ws_cmrules" btree (gid, workspaceid)

SELECT count(distinct r.TopFamilyID) FROM CMRules r WHERE r.WorkspaceID = 18512;  -- gives me 261 rows
SELECT count(r.TopFamilyID) FROM CMRules r WHERE r.WorkspaceID = 18512;  -- gives me 272211 rows
select count(*) from cmrules;  -- gives me 17643532 rows

Please suggest me something to optimize this query.

Thanks
Niraj Patel
Re: [PERFORM] Optimizing Bitmap Heap Scan.
Hi gryzman, I have run vacuum full analyze on the cmrules tables. The version of Postgres is 8.2.13. How should I change stats to 100? Thanks From: Grzegorz Jaśkiewicz gryz...@gmail.com To: niraj patel npa...@gridsolv.com Cc: pgsql-performance@postgresql.org Sent: Tue, 8 December, 2009 7:12:49 PM Subject: Re: [PERFORM] Optimizing Bitmap Heap Scan. it looks like it might choose wrong plan, cos it gets the stats wrong. Try increasing number of stats to 100. Btw, what version it is ?
Re: [PERFORM] Optimizing Bitmap Heap Scan.
On Tue, 8 Dec 2009, niraj patel wrote: Group (cost=509989.19..511518.30 rows=9 width=10) (actual time=1783.102..2362.587 rows=261 loops=1) -> Sort (cost=509989.19..510753.74 rows=305821 width=10) (actual time=1783.097..2121.378 rows=272211 loops=1) Sort Key: topfamilyid -> Bitmap Heap Scan on cmrules r (cost=14501.36..476896.34 rows=305821 width=10) (actual time=51.507..351.487 rows=272211 loops=1) Recheck Cond: (workspaceid = 18512::numeric) -> Bitmap Index Scan on pk_ws_fea_fam_cmrules (cost=0.00..14424.90 rows=305821 width=0) (actual time=48.097..48.097 rows=272211 loops=1) Index Cond: (workspaceid = 18512::numeric) Total runtime: 2373.008 ms (8 rows) select count(*) from cmrules; Gives me 17 643 532 Rows Looks good from here. Think about what you're asking the database to do. It has to select 272211 rows out of a large table with 17643532 rows. That in itself could take a very long time. It is clear that in your EXPLAIN this data is already cached, otherwise it would have to perform nigh on 272211 seeks over the discs, which would take (depending on the disc system) something on the order of twenty minutes. Those 272211 rows then have to be sorted, which takes a couple of seconds, which again is pretty good. The rows are then uniqued, which is really quick, before returning the results. It's hard to think how you would expect the database to do this any faster, really. Indexes: pk_ws_fea_fam_cmrules PRIMARY KEY, btree (workspaceid, featureid, topfamilyid, ruleenddate, gid) idx_cmrules btree (topfamilyid) idx_gid_ws_cmrules btree (gid, workspaceid) You may perhaps benefit from an index on just the workspaceid column, but the benefit may be minor. You may think of clustering the table on the index, but that will only be of benefit if the data is not in the cache. The statistics seem to be pretty accurate, predicting 305821 instead of 272211 rows.
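Matthew's "order of twenty minutes" figure is easy to sanity-check: roughly one random seek per fetched row, at an assumed ~5 ms per seek for a single commodity disk (the 5 ms is my assumption, not a number from the thread):

```python
# Cold-cache cost of the bitmap heap scan: roughly one random seek per row.
rows = 272_211          # rows matching workspaceid = 18512
seek_ms = 5.0           # assumed average random-seek latency of one disk
total_minutes = rows * seek_ms / 1000 / 60
print(round(total_minutes, 1))  # → 22.7, i.e. "on the order of twenty minutes"
```

More spindles divide this number down, which is why the cached and uncached timings differ so wildly.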
The database is not going to easily predict the number of unique results (9 instead of 261), but that doesn't affect the query plan much, so I wouldn't worry about it. I would consider upgrading to Postgres 8.4 if possible, as it does have some considerable performance improvements, especially for bitmap index scans if you are using a RAID array. I'd also try using SELECT DISTINCT rather than GROUP BY and seeing if that helps. Matthew -- Now the reason people powdered their faces back then was to change the values s and n in this equation here. - Computer science lecturer
Re: [PERFORM] Optimizing Bitmap Heap Scan.
Hi Matthew, Thanks very much for the analysis. It does take 17 sec to execute when the data is not in cache. I cannot use DISTINCT as I have aggregate operators in the select clause of the original query. What I would like to ask: would partitioning around workspaceid help? Or would some sort of selective index help me? Thanks. From: Matthew Wakeling matt...@flymine.org To: niraj patel npa...@gridsolv.com Cc: pgsql-performance@postgresql.org Sent: Tue, 8 December, 2009 7:33:38 PM Subject: Re: [PERFORM] Optimizing Bitmap Heap Scan.
Re: [PERFORM] Optimizing Bitmap Heap Scan.
On Tue, 8 Dec 2009, niraj patel wrote: Thanks very much for the analysis. It does take 17 sec to execute when data is not in cache. It sounds like the table is already very much ordered by the workspaceid, otherwise this would have taken much longer. What I would like to ask: can partitioning around workspaceid help? Or any sort of selective index? Depends on how many distinct values of workspaceid there are. I would suggest that given how well ordered your table is, and if you aren't doing too many writes, there would be little benefit, and much hassle. Matthew -- Now, you would have thought these coefficients would be integers, given that we're working out integer results. Using a fraction would seem really stupid. Well, I'm quite willing to be stupid here - in fact, I'm going to use complex numbers. -- Computer Science Lecturer
Re: [PERFORM] performance penalty between Postgresql 8.3.8 and 8.4.1
On Tue, Dec 8, 2009 at 7:12 AM, Craig Ringer cr...@postnewspapers.com.au wrote: On 8/12/2009 6:11 PM, Thom Brown wrote: Your output shows that the xdf_admin_hierarchy tables between versions are drastically different. 8.3.8 only contains 1 row, whereas 8.4.1 contains 84211 rows. That's just because one of them is doing a nested loop where it looks up a single row from xdf_admin_hierarchy via its primary key on each iteration. The other plan is doing a hash join on a sequential scan over xdf_admin_hierarchy so it reports all the rows at once. I've been meaning to write a patch to show the places after the decimal point in that case. Rounding off to an integer is horribly misleading and obscures what is really going on. Although in this case maybe it would come out 1.000 anyway. ...Robert
Re: [PERFORM] Checkpoint spikes
Richard Neill rn...@cam.ac.uk wrote: So far, I've set checkpoint_segments to 128, timeout to 10min, and completion_target to 0.8. This helps, but not as much as I'd hoped. But I haven't touched any of the other WAL or BG Writer settings. Where should I look next? On our web servers, where we had similar issues, we seem to be doing OK using: bgwriter_lru_maxpages = 1000 bgwriter_lru_multiplier = 4.0 The other thing which can help this problem is keeping shared_buffers smaller than most people recommend. We use 512MB on our larger web server and 256MB on other servers. (Be sure to test with your actual load, as this might or might not degrade overall performance.) -Kevin
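Kevin's settings, gathered into postgresql.conf form (the values are exactly the ones he quotes; whether they suit your workload is something only testing with your actual load can tell):

```
bgwriter_lru_maxpages = 1000     # let the background writer flush more buffers per round
bgwriter_lru_multiplier = 4.0    # write ahead of predicted buffer demand
shared_buffers = 512MB           # deliberately small, to shrink the checkpoint write burst
```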
Re: [PERFORM] performance penalty between Postgresql 8.3.8 and 8.4.1
On Tue, Dec 8, 2009 at 8:27 AM, Schmitz, David david.schm...@harman.com wrote: that is exactly the problem postgresql 8.4.1 does not consider the primary key but instead calculates a hash join. This can only result in poorer performance. I think this is a bug. Your statement that this can only result in poorer performance is flat wrong. Just because there's a primary key doesn't mean that an inner-indexscan plan is fastest. Frequently a hash join is faster. I can think of a couple of possible explanations for the behavior you're seeing: - Something could be blocking PostgreSQL from using that index at all. If you do EXPLAIN SELECT * FROM xdf_admin_hierarchy WHERE admin_place_id = some particular value, does it use the index or seq-scan the table? - The index on your 8.4.1 system might be bloated. You could perhaps SELECT reltuples FROM pg_class WHERE oid = 'pk_xdf_admin_hierarchy'::regclass on both systems to see if one index is larger than the other. - You might have changed the value of the work_mem parameter on one system vs. the other. Try show work_mem; on each system and see what you get. If it's none of those things, it could be the result of a code change, but I'm at a loss to think of which one would apply in this case. I suppose we could do a bisection search but that's a lot of work for you. If you could extract a reproducible test case (complete with data) that would allow someone else to try to track it down. ...Robert
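Spelled out as psql commands, Robert's three checks might look like this (the literal 12345 is a placeholder for "some particular value", and relpages is added alongside reltuples since on-disk size is what bloat shows up in):

```sql
-- 1. Can the planner use the index at all?
EXPLAIN SELECT * FROM xdf_admin_hierarchy WHERE admin_place_id = 12345;

-- 2. Compare index size on both servers (run on each, diff the numbers).
SELECT relpages, reltuples FROM pg_class
WHERE oid = 'pk_xdf_admin_hierarchy'::regclass;

-- 3. Is work_mem the same on both?
SHOW work_mem;
```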
Re: [PERFORM] Optimizing Bitmap Heap Scan.
From: niraj patel npa...@gridsolv.com Subject: Re: [PERFORM] Optimizing Bitmap Heap Scan. To: Grzegorz Jaśkiewicz gryz...@gmail.com Cc: pgsql-performance@postgresql.org Date: Tuesday, December 8, 2009, 1:50 PM Hi gryzman, I have run vacuum full analyze on the cmrules tables. The version of pstgres is 8.2.13. How should I change stats to 100 ? Thanks From: Grzegorz Jaśkiewicz gryz...@gmail.com To: niraj patel npa...@gridsolv.com Cc: pgsql-performance@postgresql.org Sent: Tue, 8 December, 2009 7:12:49 PM Subject: Re: [PERFORM] Optimizing Bitmap Heap Scan. it looks like it might choose wrong plan, cos it gets the stats wrong. Try increasing number of stats to 100. Btw, what version it is ? in psql mydb=# set default_statistics_target = 100;
Re: [PERFORM] error occured in dbt2 against with postgresql
On Tue, Dec 8, 2009 at 12:37 AM, Niu Yan vivian@gmail.com wrote: Can't use an undefined value as an ARRAY reference at /usr/lib/perl5/site_perl/5.8.8/Test/Parser/Dbt2.pm line 521. I'm guessing this is intended as a bug report, but this is a PostgreSQL mailing list, and that's a Perl error message. ...Robert -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] performance penalty between Postgresql 8.3.8 and 8.4.1
Robert Haas robertmh...@gmail.com writes: I can think of a couple of possible explanations for the behavior you're seeing: The reason it's switching from a nestloop to something else is pretty obvious: the estimate of the number of rows coming out of the lower join has gone from 81 to 60772. Neither of which is real accurate :-(, but the larger value pretty strongly discourages using a nestloop. The estimates for the individual scans mostly seem to be better than before, in the case of xdf_road_name far better: 97 vs 1, against a true value of 100. So that's good; I suspect though that it just comes from the increase in default stats target and doesn't reflect any logic change. The bottom line though is that it's gone from a considerable underestimate of the join size to a considerable overestimate, and that pushes it to use a different plan that turns out to be inferior. I don't see any fixable bug here. This is just a corner case where the inherent inaccuracies in join size estimation went wrong for us; but for every one of those there's another one where we'd get the right answer for the wrong reason. One thing that might be worth considering is to try to improve the accuracy of this rowcount estimate: -> Bitmap Heap Scan on xdf_road_link rl (cost=6.82..743.34 rows=222 width=34) (actual time=0.025..0.115 rows=7 loops=100) Recheck Cond: (rl.road_name_id = rn.road_name_id) Filter: ((rl.is_exit_name = 'N'::bpchar) AND (rl.is_junction_name = 'N'::bpchar)) -> Bitmap Index Scan on nx_xdfroadlink_roadnameid (cost=0.00..6.76 rows=222 width=0) (actual time=0.008..0.008 rows=7 loops=100) Index Cond: (rl.road_name_id = rn.road_name_id) I think a large part of the inaccuracy here has to do with not having good stats for the joint effect of the is_exit_name and is_junction_name conditions. But to be frank that looks like bad schema design. Consider merging those and any related flags into one entry type column.
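Tom's closing suggestion, folding the two flag columns into one entry type column, could look something like this sketch (the column name entry_type and its one-letter codes are invented for illustration; only is_exit_name and is_junction_name come from the thread):

```sql
ALTER TABLE xdf_road_link ADD COLUMN entry_type char(1);
UPDATE xdf_road_link
   SET entry_type = CASE WHEN is_exit_name = 'Y'     THEN 'E'  -- exit
                         WHEN is_junction_name = 'Y' THEN 'J'  -- junction
                         ELSE 'R'                              -- regular link
                    END;
-- Queries then filter on a single column, for which ANALYZE keeps
-- ordinary per-column statistics:
--   WHERE rl.entry_type = 'R'
-- instead of two correlated Y/N conditions the planner must guess at.
```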
regards, tom lane
Re: [PERFORM] Vacuum running out of memory
Jonathan Foy the...@gmail.com writes: My vacuums have suddenly started to fail, seemingly at random. I am confused. I'm running 8.1.3, with close to a dozen servers, up to 150 databases each. I have 8GB of RAM. Vacuums have started to fail on all servers (though only the occasional vacuum) with the following error: VACUUM,ERROR: out of memory VACUUM,DETAIL: Failed on request of size 268435452 I'd back off maintenance_work_mem if I were you. I think you don't have enough RAM to be running a lot of concurrent VACUUMs all with the same large memory consumption. Also, if it's really 8.1.3, consider an update to 8.1.something-recent. Not only are you exposed to a number of very serious known bugs, but this patch in particular would likely help you: http://archives.postgresql.org/pgsql-committers/2007-09/msg00377.php regards, tom lane
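The size in the error message is no coincidence: 268435452 bytes is just four bytes shy of 256 MB, which points straight at a maintenance_work_mem of 256MB that VACUUM tries to allocate up front for its dead-tuple list (the exact 4-byte difference is an observation, not something the thread explains):

```python
# "Failed on request of size 268435452" decoded: it is essentially 256 MB.
request = 268_435_452
print(round(request / (1024 * 1024), 2))  # → 256.0 (MB)
print(256 * 1024 * 1024 - request)        # → 4 (bytes short of exactly 256 MB)
```

With many databases per server being vacuumed concurrently, several simultaneous 256 MB grabs can exhaust 8 GB once the rest of the system's memory use is counted, which matches Tom's advice to back the setting off.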
Re: [PERFORM] Vacuum running out of memory
On Tue, Dec 8, 2009 at 4:31 PM, Jonathan Foy the...@gmail.com wrote: I was wondering if that was the problem. So I'm correct in thinking that the failure occurred when the vacuum tried to pull its 256 MB as defined in the maintenance_work_mem value, and the system just did not have enough available... Correct any idea why that would suddenly start happening? The indexes I created shouldn't have affected that, should they? Well the 8.1 vacuum was pretty inefficient in how it scanned indexes, so adding lots of indexes will make it take a lot longer. That might mean you're running more vacuums at the same time now. The 8.2 vacuum is much improved on that front, though adding lots of indexes will still make vacuum take longer (along with updates and inserts). And point taken with the update. I'm pushing to get us to 8.4, unsuccessfully so far, but management might be more amenable to minor version upgrades, since as I understand it there shouldn't be any risk of application problems with minor version changes... You're always better off running the most recent minor release. Minor releases fix security holes, data corruption bugs, crashing bugs, etc. Occasionally those fixes do change behaviour, especially early in the release cycle before the next major release is out, but mostly they fix real bugs that you would know about if you had run into them. You should still read all the release notes for them though. -- greg
Re: [PERFORM] Vacuum running out of memory
Jonathan Foy the...@gmail.com writes: I was wondering if that was the problem. So I'm correct in thinking that the failure occurred when the vacuum tried to pull its 256 MB as defined in the maintenance_work_mem value, and the system just did not have enough available...any idea why that would suddenly start happening? The indexes I created shouldn't have affected that, should they? Not directly, AFAICS, but they could stretch out the time required to vacuum their tables, thus possibly leading to vacuums overlapping that didn't overlap before. Just a guess though. Another likely bet is that this is just an effect of the overall system load increasing over time (more backends == more memory needed). regards, tom lane
Re: [PERFORM] performance penalty between Postgresql 8.3.8 and 8.4.1
On Tue, Dec 8, 2009 at 11:07 AM, Schmitz, David david.schm...@harman.com wrote: So how should we proceed with this issue? I think Tom nailed it. ...Robert
Re: [PERFORM] Optimizing Bitmap Heap Scan.
2009/12/8 Lennin Caro lennin.c...@yahoo.com: in psql mydb=# set default_statistics_target = 100; That's only going to affect the current session. To change it permanently, edit postgresql.conf and do pg_ctl reload. ...Robert
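On 8.2 there is also a narrower alternative to raising the server-wide default: bump the statistics target for just the column the planner is misjudging, then re-analyze. This persists without a config edit and only costs extra ANALYZE time on that one column:

```sql
ALTER TABLE cmrules ALTER COLUMN workspaceid SET STATISTICS 100;
ANALYZE cmrules;
```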
Re: [PERFORM] Checkpoint spikes
Richard Neill wrote: (does the advice for 8.3 apply unchanged to 8.4?) Yes; no changes in this area for 8.4. The main performance-related things that changed between 8.3 and 8.4 are: 1) VACUUM free space management was reimplemented so that the max_fsm_* parameters aren't needed anymore 2) default_statistics_target now starts at 100 instead of 10 So far, I've set checkpoint_segments to 128, timeout to 10min, and completion_target to 0.8. This helps, but not as much as I'd hoped. Good, if the problem is moving in the right direction you're making progress. But I haven't touched any of the other WAL or BG Writer settings. Where should I look next? Should I be looking at the BG Writer settings, or should I look at the Linux VM configuration? (eg changing /proc/sys/vm/dirty_background_ratio from 5 to 1) I would start by reducing dirty_background_ratio; as RAM sizes climb, this keeps becoming a bigger issue. The whole disk flushing code finally got a major overhaul in the 2.6.32 Linux kernel; I'm hoping this whole class of problem was improved by the changes made. Changes to the background writer behavior will probably not work as you'd expect. The first thing I'd try in your situation is turning it off altogether; it can be slightly counterproductive for reducing checkpoint issues if they're really bad, which yours are. If that goes in the wrong direction, experimenting with increasing the maximum pages and the multiplier might be useful, though I wouldn't bet on it helping. As Kevin already mentioned, reducing the size of the buffer cache can help too. That's worth trying if you've exhausted the other obvious possibilities. Or would it be most useful to try to move the WAL to a different disk? On Linux, having the WAL on a separate disk can improve things much more than you might expect, simply because of how brain-dead the filesystem fsync implementation is. Reducing the seeks for WAL traffic can help a lot too.
If you've lowered Linux's caching, tried some BGW tweaks, and moved the WAL somewhere else, and latency is still high, you may be facing a hardware upgrade to improve things. Sometimes these problems just require more burst write throughput (regardless of how good average performance looks) and nothing else will substitute. Hopefully you'll find a tuning solution before that though. -- Greg Smith, 2ndQuadrant, Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com
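As a rough sketch of the two knobs discussed above, the VM dirty threshold and a dedicated WAL disk, the steps might look like this (paths and the /wal_disk mount point are examples, the WAL directory is pg_xlog on these releases, and this should be rehearsed on a scratch instance first):

```
# Lower the background-writeback threshold now, and persist it across reboots.
sysctl -w vm.dirty_background_ratio=1
echo 'vm.dirty_background_ratio = 1' >> /etc/sysctl.conf

# Relocate WAL onto its own spindle: stop the server, move pg_xlog, symlink back.
pg_ctl -D /var/lib/pgsql/data stop
mv /var/lib/pgsql/data/pg_xlog /wal_disk/pg_xlog
ln -s /wal_disk/pg_xlog /var/lib/pgsql/data/pg_xlog
pg_ctl -D /var/lib/pgsql/data start
```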
Re: [PERFORM] Checkpoint spikes
On Wednesday 09 December 2009 03:05:40 Greg Smith wrote: On Linux having the WAL on a separate disk can improve things much more than you might expect, simply because of how brain-dead the filesystem fsync implementation is. Reducing the seeks for WAL traffic can help a lot too. Not using ext3's data=ordered helps massively already. Andres