Re: [PERFORM] Load experimentation

2009-12-08 Thread Scott Marlowe
On Tue, Dec 8, 2009 at 12:58 AM, Scott Marlowe scott.marl...@gmail.com wrote:
 On Tue, Dec 8, 2009 at 12:22 AM, Ben Brehmer benbreh...@gmail.com wrote:
 Thanks for all the responses. I have one more thought;

 Since my input data is split into about 200 files (3GB each), I could
 potentially spawn one load command for each file. What would be the maximum
 number of input connections Postgres can handle without bogging down? When I
 say 'input connection' I mean psql -U postgres -d dbname -f
 one_of_many_sql_files.

 This is VERY dependent on your IO capacity and number of cores.  My
 experience is that unless you're running on a decent number of disks,
 you'll run out of IO capacity first in most machines.  n pairs of
 mirrors in a RAID-10 can handle x input threads where x has some near
 linear relation to n.  Have 100 disks in a RAID-10 array?  You can
 surely handle dozens of load threads with no IO wait.  Have 4 disks in
 a RAID-10?  Maybe two to four load threads will max you out.  Once
 you're IO bound, adding more threads and more CPUs won't help, it'll
hurt.  The only way to really know is to benchmark it, but I'd guess
 that about half as many import threads as mirror pairs in a RAID-10
 (or just drives if you're using RAID-0) would be a good place to start
 and work from there.

Note that if you start running out of CPU horsepower first, the
degradation will be less harsh as you go past the knee in the
performance curve. IO has a sharper knee than CPU.
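As a rough sketch of spawning the loads (paths and the parallelism of 8 are
assumptions to tune against your actual disk count):

    # run the ~200 per-file loads, at most 8 at a time
    ls /data/load/*.sql | xargs -P 8 -I{} psql -U postgres -d dbname -f {}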

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Load experimentation

2009-12-08 Thread Dimitri Fontaine
Hi,

Ben Brehmer benbreh...@gmail.com writes:
 By Loading data I am implying: psql -U postgres -d somedatabase -f 
 sql_file.sql.  The sql_file.sql contains table creates and insert 
 statements. There are no
 indexes present nor created during the load.

 OS: x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red 
 Hat 4.1.2-44)

 PostgreSQL: I will try upgrading to latest version.

 COPY command: Unfortunately I'm stuck with INSERTs due to the way
 this data was generated (Hadoop/MapReduce).

What I think you could do is the following:

 - switch to using 8.4
 - load your files in a *local* database
 - pg_dump -Fc
 - now pg_restore -j X on the amazon setup

That way you will be using COPY rather than INSERTs, and the parallel
loading built into pg_restore (along with its optimisations of when to add
the indexes and constraints). Choose X depending on the IO power and the
number of CPUs...
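A minimal sketch of that recipe (database names, file names and the -j value
are placeholders):

    psql -d loaddb -f one_of_many_sql_files.sql    # repeat per file, on the local box
    pg_dump -Fc -f loaddb.dump loaddb
    pg_restore -j 8 -d targetdb loaddb.dump        # on the amazon setup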

Regards,
-- 
dim

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Load experimentation

2009-12-08 Thread Scott Marlowe
On Tue, Dec 8, 2009 at 2:08 AM, Dimitri Fontaine dfonta...@hi-media.com wrote:
 Hi,

 Ben Brehmer benbreh...@gmail.com writes:
 By Loading data I am implying: psql -U postgres -d somedatabase -f 
 sql_file.sql.  The sql_file.sql contains table creates and insert 
 statements. There are no
 indexes present nor created during the load.

 OS: x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red 
 Hat 4.1.2-44)

 PostgreSQL: I will try upgrading to latest version.

 COPY command: Unfortunately I'm stuck with INSERTs due to the way
 this data was generated (Hadoop/MapReduce).

 What I think you could do is the following:

  - switch to using 8.4
  - load your files in a *local* database
  - pg_dump -Fc
  - now pg_restore -j X on the amazon setup

 That way you will be using COPY rather than INSERTs, and the parallel
 loading built into pg_restore (along with its optimisations of when to add
 the indexes and constraints). Choose X depending on the IO power and the
 number of CPUs...

That's a lot of work to get to COPY.  It might be enough to drop all
FK relations and indexes on the destination db in the cloud, load the
data in a few (or one) transaction(s), then recreate indexes and FK
relationships.
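A sketch of that approach (all object names here are hypothetical):

    -- on the destination database, before loading:
    DROP INDEX some_index;
    ALTER TABLE child_table DROP CONSTRAINT some_fk;

    BEGIN;
    \i one_of_many_sql_files.sql
    -- ... remaining files ...
    COMMIT;

    -- afterwards:
    CREATE INDEX some_index ON some_table (some_column);
    ALTER TABLE child_table ADD CONSTRAINT some_fk
        FOREIGN KEY (parent_id) REFERENCES parent_table (id);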

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Load experimentation

2009-12-08 Thread Dimitri Fontaine
Scott Marlowe scott.marl...@gmail.com writes:
 That's a lot of work to get to COPY. 

Well, yes. I thought about it this way only after having read that the OP is
uneasy with producing another format from his source data, and
considering it's a one-shot operation.

Ah, tradeoffs, how to find the right one!
-- 
dim

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] performance penalty between Postgresql 8.3.8 and 8.4.1

2009-12-08 Thread Schmitz, David
Hi Andres,

This query returns the same result for 8.4.1 and for 8.3.8:

stadistinct = -1
stanullfrac = 0
stawidth = 4
array_upper returns nothing

Regards 

David

-----Original Message-----
From: Robert Haas [mailto:robertmh...@gmail.com]
Sent: Tuesday, 8 December 2009 05:05
To: Kevin Grittner
Cc: Schmitz, David; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] performance penalty between Postgresql 8.3.8 and 8.4.1

On Mon, Dec 7, 2009 at 5:19 PM, Kevin Grittner 
kevin.gritt...@wicourts.gov wrote:
 Schmitz, David david.schm...@harman.com wrote:

 It is carried out with poor performance on postgresql 8.4.1 However 
 postgresql 8.3.8 performs just fine.
 If you take a closer look at the query with EXPLAIN, it becomes 
 obvious, that postgresql 8.4 does not consider the primary key at 
 level 3 and instead generates a hash join:

 Postgresql 8.4.1:

 Sort  (cost=129346.71..129498.64 rows=60772 width=61)

 Postgresql 8.3.8:

 Sort  (cost=3792.75..3792.95 rows=81 width=61)

 It determines the plan based on available statistics, which in this
 case seem to indicate rather different data.  Do the two databases
 have identical data?  Have they both been recently analyzed?  What is
 the default_statistics_target on each?  Do any columns in these tables
 have overrides?

I think Tom made some changes to the join selectivity code 
which might be relevant here, though I'm not sure exactly 
what's going on.  Can we see, on the 8.4.1 database:

SELECT SUM(1) FROM rdf_admin_hierarchy;

SELECT s.stadistinct, s.stanullfrac, s.stawidth, array_upper(s.stanumbers1, 1)
FROM pg_statistic s
WHERE s.starelid = 'rdf_admin_hierarchy'::regclass
  AND s.staattnum = (SELECT a.attnum FROM pg_attribute a
                     WHERE a.attname = 'admin_place_id'
                       AND a.attrelid = 'rdf_admin_hierarchy'::regclass);

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] performance penalty between Postgresql 8.3.8 and 8.4.1

2009-12-08 Thread Schmitz, David
Hi Andres,

EXPLAIN ANALYZE 
select ah.ORDER8_ID, ah.BUILTUP_ID, rl.LINK_ID, la.SIDE,  
rl.ROAD_NAME_ID, rl.LEFT_ADDRESS_RANGE_ID, 
rl.RIGHT_ADDRESS_RANGE_ID,  
rl.IS_EXIT_NAME, rl.EXPLICATABLE, rl.IS_JUNCTION_NAME,  
rl.IS_NAME_ON_ROADSIGN, rl.IS_POSTAL_NAME, 
rl.IS_STALE_NAME,  
rl.IS_VANITY_NAME, rl.ROAD_LINK_ID, rn.STREET_NAME,  
rn.ROUTE_TYPE  
from rdf.xdf_ADMIN_HIERARCHY ah  
join xdf.xdf_LINK_ADMIN la  
on ah.ADMIN_PLACE_ID = la.ADMIN_PLACE_ID  
join xdf.xdf_ROAD_LINK rl  
on la.LINK_ID = rl.LINK_ID  
join xdf.xdf_ROAD_NAME rn  
on rl.ROAD_NAME_ID = rn.ROAD_NAME_ID  
where rl.IS_EXIT_NAME = 'N'  
and rl.IS_JUNCTION_NAME = 'N'  
and rn.ROAD_NAME_ID between 158348561  and 158348660 
order by rl.ROAD_NAME_ID, ah.ORDER8_ID, ah.BUILTUP_ID, 
rl.LINK_ID;

On Postgresql 8.4.1

Sort  (cost=129346.71..129498.64 rows=60772 width=61) (actual time=100.358..100.496 rows=1444 loops=1)
  Sort Key: rl.road_name_id, ah.order8_id, ah.builtup_id, rl.link_id
  Sort Method:  quicksort  Memory: 252kB
  ->  Hash Join  (cost=2603.57..124518.03 rows=60772 width=61) (actual time=62.359..97.268 rows=1444 loops=1)
        Hash Cond: (la.admin_place_id = ah.admin_place_id)
        ->  Nested Loop  (cost=6.82..120781.81 rows=60772 width=57) (actual time=0.318..33.600 rows=1444 loops=1)
              ->  Nested Loop  (cost=6.82..72383.98 rows=21451 width=51) (actual time=0.232..12.359 rows=722 loops=1)
                    ->  Index Scan using pk_xdf_road_name on xdf_road_name rn  (cost=0.00..11.24 rows=97 width=21) (actual time=0.117..0.185 rows=100 loops=1)
                          Index Cond: ((road_name_id >= 158348561) AND (road_name_id <= 158348660))
                    ->  Bitmap Heap Scan on xdf_road_link rl  (cost=6.82..743.34 rows=222 width=34) (actual time=0.025..0.115 rows=7 loops=100)
                          Recheck Cond: (rl.road_name_id = rn.road_name_id)
                          Filter: ((rl.is_exit_name = 'N'::bpchar) AND (rl.is_junction_name = 'N'::bpchar))
                          ->  Bitmap Index Scan on nx_xdfroadlink_roadnameid  (cost=0.00..6.76 rows=222 width=0) (actual time=0.008..0.008 rows=7 loops=100)
                                Index Cond: (rl.road_name_id = rn.road_name_id)
              ->  Index Scan using nx_xdflinkadmin_linkid on xdf_link_admin la  (cost=0.00..2.22 rows=3 width=10) (actual time=0.023..0.028 rows=2 loops=722)
                    Index Cond: (la.link_id = rl.link_id)
        ->  Hash  (cost=1544.11..1544.11 rows=84211 width=12) (actual time=61.924..61.924 rows=84211 loops=1)
              ->  Seq Scan on xdf_admin_hierarchy ah  (cost=0.00..1544.11 rows=84211 width=12) (actual time=0.017..33.442 rows=84211 loops=1)
Total runtime: 101.446 ms


and on Postgresql  8.3.8:

Sort  (cost=3792.75..3792.95 rows=81 width=61) (actual time=28.928..29.074 rows=1444 loops=1)
  Sort Key: rl.road_name_id, ah.order8_id, ah.builtup_id, rl.link_id
  Sort Method:  quicksort  Memory: 252kB
  ->  Nested Loop  (cost=21.00..3790.18 rows=81 width=61) (actual time=0.210..26.098 rows=1444 loops=1)
        ->  Nested Loop  (cost=21.00..3766.73 rows=81 width=57) (actual time=0.172..19.148 rows=1444 loops=1)
              ->  Nested Loop  (cost=21.00..3733.04 rows=14 width=51) (actual time=0.129..6.126 rows=722 loops=1)
                    ->  Index Scan using pk_xdf_road_name on xdf_road_name rn  (cost=0.00..8.32 rows=1 width=21) (actual time=0.059..0.117 rows=100 loops=1)
                          Index Cond: ((road_name_id >= 158348561) AND (road_name_id <= 158348660))
                    ->  Bitmap Heap Scan on xdf_road_link rl  (cost=21.00..3711.97 rows=1020 width=34) (actual time=0.015..0.055 rows=7 loops=100)
                          Recheck Cond: (rl.road_name_id = rn.road_name_id)
                          Filter: ((rl.is_exit_name = 'N'::bpchar) AND (rl.is_junction_name = 'N'::bpchar))
                          ->  Bitmap Index Scan on nx_xdfroadlink_roadnameid  (cost=0.00..20.75 rows=1020 width=0) (actual time=0.007..0.007 rows=7 loops=100)
                                Index Cond: (rl.road_name_id = rn.road_name_id)
              ->  Index Scan using nx_xdflinkadmin_linkid on xdf_link_admin la  (cost=0.00..2.31 rows=8 width=10) (actual time=0.014..0.017 rows=2 loops=722)
                    Index Cond: (la.link_id = rl.link_id)
        ->  Index Scan using pk_xdf_admin_hierarchy on xdf_admin_hierarchy ah  (cost=0.00..0.28 rows=1 width=12) (actual time=0.003..0.004 rows=1 loops=1444)
              Index Cond: (ah.admin_place_id = la.admin_place_id)
Total runtime: 29.366 ms

Hope this gives a clue. Or did I misunderstand you?

Regards

David


-----Original Message-----
From: Andres Freund

Re: [PERFORM] Checkpoint spikes

2009-12-08 Thread Richard Neill

Dear All,

Thanks for all your help so far. This page was particularly helpful:
http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm
(does the advice for 8.3 apply unchanged to 8.4?)

I'm still hitting issues with this though: sync is taking 7-10 seconds
and I need to get it down to nearer 3.

We're running a lightly-loaded system which has a realtime response 
requirement of 0.5 seconds all the time (with a few seconds permissible 
occasionally, but never exceeding 10).


So far, I've set checkpoint_segments to 128, timeout to 10min, and 
completion_target to 0.8. This helps, but not as much as I'd hoped.
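In postgresql.conf terms, that is:

    checkpoint_segments = 128
    checkpoint_timeout = 10min
    checkpoint_completion_target = 0.8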


But I haven't touched any of the other WAL or BG Writer settings.

Where should I look next?
  Should I be looking at the BG Writer settings,
  or should I look at the Linux VM configuration?
(eg changing /proc/sys/vm/dirty_background_ratio from 5 to 1)

  Or would it be most useful to try to move the WAL to a different disk?


Latest messages:

# tail -f /var/log/postgresql/postgresql-8.4-main.log | grep check

2009-12-08 09:12:00 GMT LOG:  checkpoint starting: time
2009-12-08 09:20:09 GMT LOG:  checkpoint complete: wrote 51151 buffers 
(8.9%); 0 transaction log file(s) added, 0 removed, 23 recycled; 
write=479.669 s, sync=9.852 s, total=489.553 s


2009-12-08 09:22:00 GMT LOG:  checkpoint starting: time
2009-12-08 09:30:07 GMT LOG:  checkpoint complete: wrote 45772 buffers 
(7.9%); 0 transaction log file(s) added, 0 removed, 24 recycled; 
write=479.706 s, sync=7.337 s, total=487.120 s


2009-12-08 09:32:00 GMT LOG:  checkpoint starting: time
2009-12-08 09:40:09 GMT LOG:  checkpoint complete: wrote 47043 buffers 
(8.2%); 0 transaction log file(s) added, 0 removed, 22 recycled; 
write=479.744 s, sync=9.300 s, total=489.122 s


2009-12-08 09:42:00 GMT LOG:  checkpoint starting: time
2009-12-08 09:50:07 GMT LOG:  checkpoint complete: wrote 48210 buffers 
(8.4%); 0 transaction log file(s) added, 0 removed, 23 recycled; 
write=479.689 s, sync=7.707 s, total=487.416 s



Thanks a lot,

Richard



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] performance penalty between Postgresql 8.3.8 and 8.4.1

2009-12-08 Thread Thom Brown
2009/12/8 Schmitz, David david.schm...@harman.com

 Hi Andres,

 EXPLAIN ANALYZE
 select ah.ORDER8_ID, ah.BUILTUP_ID, rl.LINK_ID, la.SIDE,
        rl.ROAD_NAME_ID, rl.LEFT_ADDRESS_RANGE_ID, rl.RIGHT_ADDRESS_RANGE_ID,
        rl.IS_EXIT_NAME, rl.EXPLICATABLE, rl.IS_JUNCTION_NAME,
        rl.IS_NAME_ON_ROADSIGN, rl.IS_POSTAL_NAME, rl.IS_STALE_NAME,
        rl.IS_VANITY_NAME, rl.ROAD_LINK_ID, rn.STREET_NAME, rn.ROUTE_TYPE
 from rdf.xdf_ADMIN_HIERARCHY ah
 join xdf.xdf_LINK_ADMIN la on ah.ADMIN_PLACE_ID = la.ADMIN_PLACE_ID
 join xdf.xdf_ROAD_LINK rl on la.LINK_ID = rl.LINK_ID
 join xdf.xdf_ROAD_NAME rn on rl.ROAD_NAME_ID = rn.ROAD_NAME_ID
 where rl.IS_EXIT_NAME = 'N'
 and rl.IS_JUNCTION_NAME = 'N'
 and rn.ROAD_NAME_ID between 158348561 and 158348660
 order by rl.ROAD_NAME_ID, ah.ORDER8_ID, ah.BUILTUP_ID, rl.LINK_ID;

 On Postgresql 8.4.1

 Sort  (cost=129346.71..129498.64 rows=60772 width=61) (actual time=100.358..100.496 rows=1444 loops=1)
   Sort Key: rl.road_name_id, ah.order8_id, ah.builtup_id, rl.link_id
   Sort Method:  quicksort  Memory: 252kB
   ->  Hash Join  (cost=2603.57..124518.03 rows=60772 width=61) (actual time=62.359..97.268 rows=1444 loops=1)
         Hash Cond: (la.admin_place_id = ah.admin_place_id)
         ->  Nested Loop  (cost=6.82..120781.81 rows=60772 width=57) (actual time=0.318..33.600 rows=1444 loops=1)
               ->  Nested Loop  (cost=6.82..72383.98 rows=21451 width=51) (actual time=0.232..12.359 rows=722 loops=1)
                     ->  Index Scan using pk_xdf_road_name on xdf_road_name rn  (cost=0.00..11.24 rows=97 width=21) (actual time=0.117..0.185 rows=100 loops=1)
                           Index Cond: ((road_name_id >= 158348561) AND (road_name_id <= 158348660))
                     ->  Bitmap Heap Scan on xdf_road_link rl  (cost=6.82..743.34 rows=222 width=34) (actual time=0.025..0.115 rows=7 loops=100)
                           Recheck Cond: (rl.road_name_id = rn.road_name_id)
                           Filter: ((rl.is_exit_name = 'N'::bpchar) AND (rl.is_junction_name = 'N'::bpchar))
                           ->  Bitmap Index Scan on nx_xdfroadlink_roadnameid  (cost=0.00..6.76 rows=222 width=0) (actual time=0.008..0.008 rows=7 loops=100)
                                 Index Cond: (rl.road_name_id = rn.road_name_id)
               ->  Index Scan using nx_xdflinkadmin_linkid on xdf_link_admin la  (cost=0.00..2.22 rows=3 width=10) (actual time=0.023..0.028 rows=2 loops=722)
                     Index Cond: (la.link_id = rl.link_id)
         ->  Hash  (cost=1544.11..1544.11 rows=84211 width=12) (actual time=61.924..61.924 rows=84211 loops=1)
               ->  Seq Scan on xdf_admin_hierarchy ah  (cost=0.00..1544.11 rows=84211 width=12) (actual time=0.017..33.442 rows=84211 loops=1)
 Total runtime: 101.446 ms

 and on Postgresql 8.3.8:

 Sort  (cost=3792.75..3792.95 rows=81 width=61) (actual time=28.928..29.074 rows=1444 loops=1)
   Sort Key: rl.road_name_id, ah.order8_id, ah.builtup_id, rl.link_id
   Sort Method:  quicksort  Memory: 252kB
   ->  Nested Loop  (cost=21.00..3790.18 rows=81 width=61) (actual time=0.210..26.098 rows=1444 loops=1)
         ->  Nested Loop  (cost=21.00..3766.73 rows=81 width=57) (actual time=0.172..19.148 rows=1444 loops=1)
               ->  Nested Loop  (cost=21.00..3733.04 rows=14 width=51) (actual time=0.129..6.126 rows=722 loops=1)
                     ->  Index Scan using pk_xdf_road_name on xdf_road_name rn  (cost=0.00..8.32 rows=1 width=21) (actual time=0.059..0.117 rows=100 loops=1)
                           Index Cond: ((road_name_id >= 158348561) AND (road_name_id <= 158348660))
                     ->  Bitmap Heap Scan on xdf_road_link rl  (cost=21.00..3711.97 rows=1020 width=34) (actual time=0.015..0.055 rows=7 loops=100)
                           Recheck Cond: (rl.road_name_id = rn.road_name_id)
                           Filter: ((rl.is_exit_name = 'N'::bpchar) AND (rl.is_junction_name = 'N'::bpchar))
                           ->  Bitmap Index Scan on nx_xdfroadlink_roadnameid  (cost=0.00..20.75 rows=1020 width=0) (actual time=0.007..0.007 rows=7 loops=100)
                                 Index Cond: (rl.road_name_id = rn.road_name_id)
               ->  Index Scan using nx_xdflinkadmin_linkid on xdf_link_admin la  (cost=0.00..2.31 rows=8 width=10) (actual time=0.014..0.017 rows=2 loops=722)
                     Index Cond: (la.link_id = rl.link_id)
         ->  Index Scan using pk_xdf_admin_hierarchy on xdf_admin_hierarchy ah  (cost=0.00..0.28 rows=1 width=12) (actual time=0.003..0.004 rows=1 loops=1444)
               Index Cond: (ah.admin_place_id = la.admin_place_id)
 Total runtime: 29.366 ms

 Hope this gives a clue. Or did I misunderstand you?

 Regards

 David


Re: [PERFORM] performance penalty between Postgresql 8.3.8 and 8.4.1

2009-12-08 Thread Schmitz, David
Hi Thom,
 
I did a select count(*) from xdf.xdf_admin_hierarchy and it returns 84211 on
both databases, postgres 8.3.8 and 8.4.1. The amount of data is exactly the
same in both databases, as they are restored from the same dump.
 
Regards
 
David


________________________________

From: Thom Brown [mailto:thombr...@gmail.com]
Sent: Tuesday, 8 December 2009 11:12
To: Schmitz, David
Cc: Andres Freund; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] performance penalty between Postgresql 8.3.8 and 8.4.1


2009/12/8 Schmitz, David david.schm...@harman.com

Hi Andres,

EXPLAIN ANALYZE
select ah.ORDER8_ID, ah.BUILTUP_ID, rl.LINK_ID, la.SIDE,
       rl.ROAD_NAME_ID, rl.LEFT_ADDRESS_RANGE_ID, rl.RIGHT_ADDRESS_RANGE_ID,
       rl.IS_EXIT_NAME, rl.EXPLICATABLE, rl.IS_JUNCTION_NAME,
       rl.IS_NAME_ON_ROADSIGN, rl.IS_POSTAL_NAME, rl.IS_STALE_NAME,
       rl.IS_VANITY_NAME, rl.ROAD_LINK_ID, rn.STREET_NAME, rn.ROUTE_TYPE
from rdf.xdf_ADMIN_HIERARCHY ah
join xdf.xdf_LINK_ADMIN la on ah.ADMIN_PLACE_ID = la.ADMIN_PLACE_ID
join xdf.xdf_ROAD_LINK rl on la.LINK_ID = rl.LINK_ID
join xdf.xdf_ROAD_NAME rn on rl.ROAD_NAME_ID = rn.ROAD_NAME_ID
where rl.IS_EXIT_NAME = 'N'
and rl.IS_JUNCTION_NAME = 'N'
and rn.ROAD_NAME_ID between 158348561 and 158348660
order by rl.ROAD_NAME_ID, ah.ORDER8_ID, ah.BUILTUP_ID, rl.LINK_ID;

On Postgresql 8.4.1

Sort  (cost=129346.71..129498.64 rows=60772 width=61) (actual time=100.358..100.496 rows=1444 loops=1)
  Sort Key: rl.road_name_id, ah.order8_id, ah.builtup_id, rl.link_id
  Sort Method:  quicksort  Memory: 252kB
  ->  Hash Join  (cost=2603.57..124518.03 rows=60772 width=61) (actual time=62.359..97.268 rows=1444 loops=1)
        Hash Cond: (la.admin_place_id = ah.admin_place_id)
        ->  Nested Loop  (cost=6.82..120781.81 rows=60772 width=57) (actual time=0.318..33.600 rows=1444 loops=1)
              ->  Nested Loop  (cost=6.82..72383.98 rows=21451 width=51) (actual time=0.232..12.359 rows=722 loops=1)
                    ->  Index Scan using pk_xdf_road_name on xdf_road_name rn  (cost=0.00..11.24 rows=97 width=21) (actual time=0.117..0.185 rows=100 loops=1)
                          Index Cond: ((road_name_id >= 158348561) AND (road_name_id <= 158348660))
                    ->  Bitmap Heap Scan on xdf_road_link rl  (cost=6.82..743.34 rows=222 width=34) (actual time=0.025..0.115 rows=7 loops=100)
                          Recheck Cond: (rl.road_name_id = rn.road_name_id)
                          Filter: ((rl.is_exit_name = 'N'::bpchar) AND (rl.is_junction_name = 'N'::bpchar))
                          ->  Bitmap Index Scan on nx_xdfroadlink_roadnameid  (cost=0.00..6.76 rows=222 width=0) (actual time=0.008..0.008 rows=7 loops=100)
                                Index Cond: (rl.road_name_id = rn.road_name_id)
              ->  Index Scan using nx_xdflinkadmin_linkid on xdf_link_admin la  (cost=0.00..2.22 rows=3 width=10) (actual time=0.023..0.028 rows=2 loops=722)
                    Index Cond: (la.link_id = rl.link_id)
        ->  Hash  (cost=1544.11..1544.11 rows=84211 width=12) (actual time=61.924..61.924 rows=84211 loops=1)
              ->  Seq Scan on xdf_admin_hierarchy ah  (cost=0.00..1544.11 rows=84211 width=12) (actual time=0.017..33.442 rows=84211 loops=1)
Total runtime: 101.446 ms


and on Postgresql 8.3.8:

Sort  (cost=3792.75..3792.95 rows=81 width=61) (actual time=28.928..29.074 rows=1444 loops=1)
  Sort Key: rl.road_name_id, ah.order8_id, ah.builtup_id, rl.link_id
  Sort Method:  quicksort  Memory: 252kB
  ->  Nested Loop  (cost=21.00..3790.18 rows=81 width=61) (actual time=0.210..26.098 rows=1444 loops=1)
        ->  Nested Loop

Re: [PERFORM] performance penalty between Postgresql 8.3.8 and 8.4.1

2009-12-08 Thread Andres Freund
Hi David,

On Tuesday 08 December 2009 10:59:51 Schmitz, David wrote:
  With our data it is a performance difference from 1h16min
  (8.3.8) to 2h43min (8.4.1)
 On Postgresql 8.4.1
 Total runtime: 101.446 ms
 and on Postgresql  8.3.8:
 Total runtime: 29.366 ms
Hm. There obviously is more going on than these queries?

 Hash Join  (cost=2603.57..124518.03 rows=60772 width=61) (actual time=62.359..97.268 rows=1444 loops=1)
 Nested Loop  (cost=21.00..3790.18 rows=81 width=61) (actual time=0.210..26.098 rows=1444 loops=1)
Both misestimate the resultset quite a bit. It looks like happenstance that 
the one on 8.3 turns out to be better...

Andres

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] performance penalty between Postgresql 8.3.8 and 8.4.1

2009-12-08 Thread Schmitz, David
Hi Andres,

this is just one of many such queries. There are a lot of jobs calculating
stuff for different ranges, which are defined via BETWEEN in the where
clause.

When I leave out the BETWEEN in the where clause, it returns:

On Postgresql 8.4.1:

Sort  (cost=5390066.42..5435347.78 rows=18112546 width=61) (actual time=84382.275..91367.983 rows=12742796 loops=1)
  Sort Key: rl.road_name_id, ah.order8_id, ah.builtup_id, rl.link_id
  Sort Method:  external merge  Disk: 924536kB
  ->  Hash Join  (cost=1082249.40..2525563.48 rows=18112546 width=61) (actual time=23367.205..52256.209 rows=12742796 loops=1)
        Hash Cond: (la.admin_place_id = ah.admin_place_id)
        ->  Merge Join  (cost=1079652.65..2183356.50 rows=18112546 width=57) (actual time=23306.643..45541.157 rows=12742796 loops=1)
              Merge Cond: (la.link_id = rl.link_id)
              ->  Index Scan using nx_xdflinkadmin_linkid on xdf_link_admin la  (cost=0.00..798398.53 rows=16822372 width=10) (actual time=0.098..12622.576 rows=16822399 loops=1)
              ->  Sort  (cost=1071304.95..1087287.81 rows=6393147 width=51) (actual time=23302.596..25640.559 rows=12742795 loops=1)
                    Sort Key: rl.link_id
                    Sort Method:  external sort  Disk: 405896kB
                    ->  Hash Join  (cost=15735.91..348620.58 rows=6393147 width=51) (actual time=327.064..9189.938 rows=6371398 loops=1)
                          Hash Cond: (rl.road_name_id = rn.road_name_id)
                          ->  Seq Scan on xdf_road_link rl  (cost=0.00..182236.41 rows=7708159 width=34) (actual time=0.028..2689.085 rows=7709085 loops=1)
                                Filter: ((is_exit_name = 'N'::bpchar) AND (is_junction_name = 'N'::bpchar))
                          ->  Hash  (cost=9885.96..9885.96 rows=467996 width=21) (actual time=326.740..326.740 rows=467996 loops=1)
                                ->  Seq Scan on xdf_road_name rn  (cost=0.00..9885.96 rows=467996 width=21) (actual time=0.019..191.473 rows=467996 loops=1)
        ->  Hash  (cost=1544.11..1544.11 rows=84211 width=12) (actual time=60.453..60.453 rows=84211 loops=1)
              ->  Seq Scan on xdf_admin_hierarchy ah  (cost=0.00..1544.11 rows=84211 width=12) (actual time=0.019..31.723 rows=84211 loops=1)
Total runtime: 92199.676 ms

On Postgresql 8.3.8:

Sort  (cost=9419546.57..9514635.57 rows=38035597 width=61) (actual time=82790.473..88847.963 rows=12742796 loops=1)
  Sort Key: rl.road_name_id, ah.order8_id, ah.builtup_id, rl.link_id
  Sort Method:  external merge  Disk: 999272kB
  ->  Hash Join  (cost=1079404.97..3200652.85 rows=38035597 width=61) (actual time=22583.059..51197.249 rows=12742796 loops=1)
        Hash Cond: (la.admin_place_id = ah.admin_place_id)
        ->  Merge Join  (cost=1076808.22..2484888.66 rows=38035597 width=57) (actual time=22524.015..44539.246 rows=12742796 loops=1)
              Merge Cond: (la.link_id = rl.link_id)
              ->  Index Scan using nx_xdflinkadmin_linkid on xdf_link_admin la  (cost=0.00..795583.17 rows=16822420 width=10) (actual time=0.086..11725.990 rows=16822399 loops=1)
              ->  Sort  (cost=1076734.49..1092821.79 rows=6434920 width=51) (actual time=22514.553..25083.253 rows=12742795 loops=1)
                    Sort Key: rl.link_id
                    Sort Method:  external sort  Disk: 443264kB
                    ->  Hash Join  (cost=15743.47..349025.77 rows=6434920 width=51) (actual time=330.211..9014.353 rows=6371398 loops=1)
                          Hash Cond: (rl.road_name_id = rn.road_name_id)
                          ->  Seq Scan on xdf_road_link rl  (cost=0.00..182235.08 rows=7706491 width=34) (actual time=0.018..2565.983 rows=7709085 loops=1)
                                Filter: ((is_exit_name = 'N'::bpchar) AND (is_junction_name = 'N'::bpchar))
                          ->  Hash  (cost=9890.43..9890.43 rows=468243 width=21) (actual time=329.906..329.906 rows=467996 loops=1)
                                ->  Seq Scan on xdf_road_name rn  (cost=0.00..9890.43 rows=468243 width=21) (actual time=0.018..190.764 rows=467996 loops=1)
        ->  Hash  (cost=1544.11..1544.11 rows=84211 width=12) (actual time=58.910..58.910 rows=84211 loops=1)
              ->  Seq Scan on xdf_admin_hierarchy ah  (cost=0.00..1544.11 rows=84211 width=12) (actual time=0.009..28.725 rows=84211 loops=1)
Total runtime: 89612.801 ms

Regards 

David 

-----Original Message-----
From: Andres Freund [mailto:and...@anarazel.de]
Sent: Tuesday, 8 December 2009 11:29
To: pgsql-performance@postgresql.org
Cc: Schmitz, David
Subject: Re: [PERFORM] performance penalty between Postgresql 8.3.8 and 8.4.1

Hi David,

On Tuesday 08 December 2009 10:59:51 Schmitz, David wrote:
  With our data it is a performance difference from 1h16min
  (8.3.8) to 2h43min (8.4.1)
 On Postgresql 8.4.1
 Total runtime: 101.446 ms
 and on Postgresql  8.3.8:
 Total runtime: 29.366 ms
Hm. There obviously is more going on than these queries?

 Hash Join  (cost=2603.57..124518.03 rows=60772 width=61) (actual time=62.359..97.268 rows=1444 loops=1)
 

Re: [PERFORM] performance penalty between Postgresql 8.3.8 and 8.4.1

2009-12-08 Thread Craig Ringer

On 8/12/2009 6:11 PM, Thom Brown wrote:


Your output shows that the xdf_admin_hierarchy tables between versions
are drastically different.  8.3.8 only contains 1 row, whereas 8.4.1
contains 84211 rows.


That's just because one of them is doing a nested loop where it looks up 
a single row from xdf_admin_hierarchy via its primary key on each 
iteration. The other plan is doing a hash join on a sequential scan over 
xdf_admin_hierarchy so it reports all the rows at once.


--
Craig Ringer

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] performance penalty between Postgresql 8.3.8 and 8.4.1

2009-12-08 Thread Schmitz, David
Hi Craig,

that is exactly the problem postgresql 8.4.1 does not consider the primary key 
but instead calculates 
a hash join. This can only result in poorer performance. I think this is a bug.

Regards

David 

-----Original Message-----
From: Craig Ringer [mailto:cr...@postnewspapers.com.au]
Sent: Tuesday, 8 December 2009 13:12
To: Thom Brown
Cc: Schmitz, David; Andres Freund; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] performance penalty between Postgresql 8.3.8 and 8.4.1

On 8/12/2009 6:11 PM, Thom Brown wrote:

 Your output shows that the xdf_admin_hierarchy tables between versions
 are drastically different.  8.3.8 only contains 1 row, whereas 8.4.1
 contains 84211 rows.

That's just because one of them is doing a nested loop where 
it looks up a single row from xdf_admin_hierarchy via its 
primary key on each iteration. The other plan is doing a hash 
join on a sequential scan over xdf_admin_hierarchy so it 
reports all the rows at once.

--
Craig Ringer

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Optimizing Bitmap Heap Scan.

2009-12-08 Thread niraj patel
Hi All,

I have to optimize the following query:

SELECT r.TopFamilyID AS FamilyID,  FROM CMRules r
WHERE r.WorkspaceID = 18512
GROUP BY r.TopFamilyID;

The explain plan is as follows:

Group  (cost=509989.19..511518.30 rows=9 width=10) (actual time=1783.102..2362.587 rows=261 loops=1)
  ->  Sort  (cost=509989.19..510753.74 rows=305821 width=10) (actual time=1783.097..2121.378 rows=272211 loops=1)
        Sort Key: topfamilyid
        ->  Bitmap Heap Scan on cmrules r  (cost=14501.36..476896.34 rows=305821 width=10) (actual time=51.507..351.487 rows=272211 loops=1)
              Recheck Cond: (workspaceid = 18512::numeric)
              ->  Bitmap Index Scan on pk_ws_fea_fam_cmrules  (cost=0.00..14424.90 rows=305821 width=0) (actual time=48.097..48.097 rows=272211 loops=1)
                    Index Cond: (workspaceid = 18512::numeric)
Total runtime: 2373.008 ms
(8 rows)
-
\d CMRules gives the following indexes:

Indexes:
    pk_ws_fea_fam_cmrules PRIMARY KEY, btree (workspaceid, featureid, topfamilyid, ruleenddate, gid)
    idx_cmrules btree (topfamilyid)
    idx_gid_ws_cmrules btree (gid, workspaceid)
-
SELECT count(distinct r.TopFamilyID) FROM CMRules r WHERE r.WorkspaceID = 18512;

Gives me 261 Rows

SELECT count(r.TopFamilyID) FROM CMRules r WHERE r.WorkspaceID = 18512;

Gives me 272 211 Rows

select count(*) from cmrules;

Gives me 17 643 532 Rows


Please suggest something to optimize this query.

Thanks 
Niraj Patel


Re: [PERFORM] Optimizing Bitmap Heap Scan.

2009-12-08 Thread niraj patel
Hi gryzman,

I have run vacuum full analyze on the cmrules tables. The version of postgres
is 8.2.13. How should I change stats to 100?

Thanks




From: Grzegorz Jaśkiewicz gryz...@gmail.com
To: niraj patel npa...@gridsolv.com
Cc: pgsql-performance@postgresql.org
Sent: Tue, 8 December, 2009 7:12:49 PM
Subject: Re: [PERFORM] Optimizing Bitmap Heap Scan.

it looks like it might choose the wrong plan, cos it gets the stats wrong.
Try increasing the number of stats to 100.
Btw, what version is it?

Re: [PERFORM] Optimizing Bitmap Heap Scan.

2009-12-08 Thread Matthew Wakeling

On Tue, 8 Dec 2009, niraj patel wrote:

 Group  (cost=509989.19..511518.30 rows=9 width=10) (actual time=1783.102..2362.587 rows=261 loops=1)
   ->  Sort  (cost=509989.19..510753.74 rows=305821 width=10) (actual time=1783.097..2121.378 rows=272211 loops=1)
         Sort Key: topfamilyid
         ->  Bitmap Heap Scan on cmrules r  (cost=14501.36..476896.34 rows=305821 width=10) (actual time=51.507..351.487 rows=272211 loops=1)
               Recheck Cond: (workspaceid = 18512::numeric)
               ->  Bitmap Index Scan on pk_ws_fea_fam_cmrules  (cost=0.00..14424.90 rows=305821 width=0) (actual time=48.097..48.097 rows=272211 loops=1)
                     Index Cond: (workspaceid = 18512::numeric)
 Total runtime: 2373.008 ms
 (8 rows)



select count(*) from  cmrules;

Gives me 17 643 532 Rows


Looks good from here. Think about what you're asking the database to do. 
It has to select 272211 rows out of a large table with 17643532 rows. That 
in itself could take a very long time. It is clear that in your EXPLAIN 
this data is already cached, otherwise it would have to perform nigh on 
272211 seeks over the discs, which would take (depending on the disc
system) something on the order of twenty minutes. Those 272211 rows then 
have to be sorted, which takes a couple of seconds, which again is pretty 
good. The rows are then uniqued, which is really quick, before returning 
the results.


It's hard to think how you would expect the database to do this any 
faster, really.



Indexes:
    pk_ws_fea_fam_cmrules PRIMARY KEY, btree (workspaceid, featureid, topfamilyid, ruleenddate, gid)
    idx_cmrules btree (topfamilyid)
    idx_gid_ws_cmrules btree (gid, workspaceid)


You may perhaps benefit from an index on just the workspaceid column, but 
the benefit may be minor.


You may think of clustering the table on the index, but that will only be 
of benefit if the data is not in the cache.


The statistics seem to be pretty accurate, predicting 305821 instead of 
272211 rows. The database is not going to easily predict the number of 
unique results (9 instead of 261), but that doesn't affect the query plan 
much, so I wouldn't worry about it.


I would consider upgrading to Postgres 8.4 if possible, as it does have 
some considerable performance improvements, especially for bitmap index 
scans if you are using a RAID array. I'd also try using SELECT DISTINCT 
rather than GROUP BY and seeing if that helps.
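For concreteness, those suggestions as statements (the new index name is
invented, and the CLUSTER form shown is the pre-8.3 syntax, since this is
8.2):

    CREATE INDEX idx_ws_cmrules ON cmrules (workspaceid);  -- hypothetical name
    CLUSTER pk_ws_fea_fam_cmrules ON cmrules;              -- cluster on the existing PK index
    SELECT DISTINCT r.TopFamilyID FROM CMRules r WHERE r.WorkspaceID = 18512;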


Matthew

--
Now the reason people powdered their faces back then was to change the values
s and n in this equation here. - Computer science lecturer
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Optimizing Bitmap Heap Scan.

2009-12-08 Thread niraj patel
Hi Matthew,

Thanks very much for the analysis. It does take 17 sec to execute when the
data is not in cache. I cannot use DISTINCT as I have aggregate operators in
the select clause of the original query. What I would like to ask is: would
partitioning around workspaceid help? Or would any sort of selective index
help me?

Thanks.





From: Matthew Wakeling matt...@flymine.org
To: niraj patel npa...@gridsolv.com
Cc: pgsql-performance@postgresql.org
Sent: Tue, 8 December, 2009 7:33:38 PM
Subject: Re: [PERFORM] Optimizing Bitmap Heap Scan.

On Tue, 8 Dec 2009, niraj patel wrote:
 Group  (cost=509989.19..511518.30 rows=9 width=10) (actual time=1783.102..2362.587 rows=261 loops=1)
   ->  Sort  (cost=509989.19..510753.74 rows=305821 width=10) (actual time=1783.097..2121.378 rows=272211 loops=1)
         Sort Key: topfamilyid
         ->  Bitmap Heap Scan on cmrules r  (cost=14501.36..476896.34 rows=305821 width=10) (actual time=51.507..351.487 rows=272211 loops=1)
               Recheck Cond: (workspaceid = 18512::numeric)
               ->  Bitmap Index Scan on pk_ws_fea_fam_cmrules  (cost=0.00..14424.90 rows=305821 width=0) (actual time=48.097..48.097 rows=272211 loops=1)
                     Index Cond: (workspaceid = 18512::numeric)
 Total runtime: 2373.008 ms
 (8 rows)

 select count(*) from  cmrules;
 
 Gives me 17 643 532 Rows

Looks good from here. Think about what you're asking the database to do. It has 
to select 272211 rows out of a large table with 17643532 rows. That in itself 
could take a very long time. It is clear that in your EXPLAIN this data is 
already cached, otherwise it would have to perform nigh on 272211 seeks over
the discs, which would take (depending on the disc system) something on the 
order of twenty minutes. Those 272211 rows then have to be sorted, which takes 
a couple of seconds, which again is pretty good. The rows are then uniqued, 
which is really quick, before returning the results.

It's hard to think how you would expect the database to do this any faster, 
really.

 Indexes:
     pk_ws_fea_fam_cmrules PRIMARY KEY, btree (workspaceid, featureid, topfamilyid, ruleenddate, gid)
     idx_cmrules btree (topfamilyid)
     idx_gid_ws_cmrules btree (gid, workspaceid)

You may perhaps benefit from an index on just the workspaceid column, but the 
benefit may be minor.

You may think of clustering the table on the index, but that will only be of 
benefit if the data is not in the cache.

The statistics seem to be pretty accurate, predicting 305821 instead of 272211 
rows. The database is not going to easily predict the number of unique results 
(9 instead of 261), but that doesn't affect the query plan much, so I wouldn't 
worry about it.

I would consider upgrading to Postgres 8.4 if possible, as it does have some 
considerable performance improvements, especially for bitmap index scans if you 
are using a RAID array. I'd also try using SELECT DISTINCT rather than GROUP 
BY and seeing if that helps.

Matthew

--
Now the reason people powdered their faces back then was to change the values
s and n in this equation here. - Computer science lecturer

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Optimizing Bitmap Heap Scan.

2009-12-08 Thread Matthew Wakeling

On Tue, 8 Dec 2009, niraj patel wrote:
Thanks very much for the analysis. It does take 17 sec to execute when
the data is not in cache.


It sounds like the table is already very much ordered by the workspaceid, 
otherwise this would have taken much longer.


What I would like to ask is: would partitioning around workspaceid help?
Or would any sort of selective index help me?


Depends on how many distinct values of workspaceid there are. I would 
suggest that given how well ordered your table is, and if you aren't doing 
too many writes, then there would be little benefit, and much hassle.


Matthew

--
Now, you would have thought these coefficients would be integers, given that
we're working out integer results. Using a fraction would seem really
stupid. Well, I'm quite willing to be stupid here - in fact, I'm going to
use complex numbers.-- Computer Science Lecturer

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] performance penalty between Postgresql 8.3.8 and 8.4.1

2009-12-08 Thread Robert Haas
On Tue, Dec 8, 2009 at 7:12 AM, Craig Ringer
cr...@postnewspapers.com.au wrote:
 On 8/12/2009 6:11 PM, Thom Brown wrote:

 Your output shows that the xdf_admin_hierarchy tables between versions
 are drastically different.  8.3.8 only contains 1 row, whereas 8.4.1
 contains 84211 rows.

 That's just because one of them is doing a nested loop where it looks up a
 single row from xdf_admin_hierarchy via its primary key on each iteration.
 The other plan is doing a hash join on a sequential scan over
 xdf_admin_hierarchy so it reports all the rows at once.

I've been meaning to write a patch to show the places after the
decimal point in that case.  Rounding off to an integer is horribly
misleading and obscures what is really going on.  Although in this
case maybe it would come out 1.000 anyway.

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Checkpoint spikes

2009-12-08 Thread Kevin Grittner
Richard Neill rn...@cam.ac.uk wrote:
 
 So far, I've set checkpoint_segments to 128, timeout to 10min, and
 completion_target to 0.8. This helps, but not as much as I'd
 hoped.
 
 But I haven't touched any of the other WAL or BG Writer settings.
 
 Where should I look next?
 
On our web servers, where we had similar issues, we seem to be doing
OK using:
 
bgwriter_lru_maxpages = 1000
bgwriter_lru_multiplier = 4.0
 
The other thing which can help this problem is keeping
shared_buffers smaller than most people recommend.  We use 512MB on
our larger web server and 256MB on other servers.  (Be sure to test
with your actual load, as this might or might not degrade overall
performance.)
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] performance penalty between Postgresql 8.3.8 and 8.4.1

2009-12-08 Thread Robert Haas
On Tue, Dec 8, 2009 at 8:27 AM, Schmitz, David david.schm...@harman.com wrote:
 that is exactly the problem postgresql 8.4.1 does not consider the primary 
 key but instead calculates
 a hash join. This can only result in poorer performance. I think this is a 
 bug.

Your statement that this can only result in poorer performance is
flat wrong.  Just because there's a primary key doesn't mean that an
inner-indexscan plan is fastest.  Frequently a hash join is faster.  I
can think of a couple of possible explanations for the behavior you're
seeing:

- Something could be blocking PostgreSQL from using that index at all.
 If you do EXPLAIN SELECT * FROM xdf_admin_hierarchy WHERE
admin_place_id = some particular value, does it use the index or
seq-scan the table?

- The index on your 8.4.1 system might be bloated.  You could perhaps
SELECT reltuples FROM pg_class WHERE oid =
'pk_xdf_admin_hierarchy'::regclass on both systems to see if one index
is larger than the other.

- You might have changed the value of the work_mem parameter on one
system vs. the other.  Try show work_mem; on each system and see
what you get.
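Collected as runnable statements (the admin_place_id value is a placeholder):

    EXPLAIN SELECT * FROM xdf_admin_hierarchy WHERE admin_place_id = 12345;
    SELECT reltuples FROM pg_class WHERE oid = 'pk_xdf_admin_hierarchy'::regclass;
    show work_mem;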

If it's none of those things, it's could be the result of a code
change, but I'm at a loss to think of which one would apply in this
case.  I suppose we could do a bisection search but that's a lot of
work for you.  If you could extract a reproducible test case (complete
with data) that would allow someone else to try to track it down.

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Optimizing Bitmap Heap Scan.

2009-12-08 Thread Lennin Caro
From: niraj patel npa...@gridsolv.com
Subject: Re: [PERFORM] Optimizing Bitmap Heap Scan.
To: Grzegorz Jaśkiewicz gryz...@gmail.com
Cc: pgsql-performance@postgresql.org
Date: Tuesday, December 8, 2009, 1:50 PM

Hi gryzman,

I have run vacuum full analyze on the cmrules tables. The version of postgres
is 8.2.13. How should I change stats to 100?

Thanks
From: Grzegorz Jaśkiewicz gryz...@gmail.com
To: niraj patel npa...@gridsolv.com
Cc: pgsql-performance@postgresql.org
Sent: Tue, 8 December, 2009 7:12:49 PM
Subject:
 Re: [PERFORM] Optimizing Bitmap Heap Scan.

 it looks like it might choose the wrong plan, cos it gets the stats wrong.
Try increasing the number of stats to 100.
Btw, what version is it?


in psql 
mydb=# set default_statistics_target = 100;


Re: [PERFORM] error occured in dbt2 against with postgresql

2009-12-08 Thread Robert Haas
On Tue, Dec 8, 2009 at 12:37 AM, Niu Yan vivian@gmail.com wrote:
 Can't use an undefined value as an ARRAY reference at
 /usr/lib/perl5/site_perl/5.8.8/Test/Parser/Dbt2.pm line 521.

I'm guessing this is intended as a bug report, but this is a
PostgreSQL mailing list, and that's a Perl error message.

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] performance penalty between Postgresql 8.3.8 and 8.4.1

2009-12-08 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 I can think of a couple of possible explanations for the behavior you're
 seeing:

The reason it's switching from a nestloop to something else is pretty
obvious: the estimate of the number of rows coming out of the lower
join has gone from 81 to 60772.  Neither of which is real accurate :-(,
but the larger value pretty strongly discourages using a nestloop.

The estimates for the individual scans mostly seem to be better than
before, in the case of xdf_road_name far better: 97 vs 1, against a true
value of 100.  So that's good; I suspect though that it just comes from
the increase in default stats target and doesn't reflect any logic
change.  The bottom line though is that it's gone from a considerable
underestimate of the join size to a considerable overestimate, and that
pushes it to use a different plan that turns out to be inferior.

I don't see any fixable bug here.  This is just a corner case where
the inherent inaccuracies in join size estimation went wrong for us;
but for every one of those there's another one where we'd get the
right answer for the wrong reason.

One thing that might be worth considering is to try to improve the
accuracy of this rowcount estimate:

   ->  Bitmap Heap Scan on xdf_road_link rl  (cost=6.82..743.34 rows=222 width=34) (actual time=0.025..0.115 rows=7 loops=100)
         Recheck Cond: (rl.road_name_id = rn.road_name_id)
         Filter: ((rl.is_exit_name = 'N'::bpchar) AND (rl.is_junction_name = 'N'::bpchar))
         ->  Bitmap Index Scan on nx_xdfroadlink_roadnameid  (cost=0.00..6.76 rows=222 width=0) (actual time=0.008..0.008 rows=7 loops=100)
               Index Cond: (rl.road_name_id = rn.road_name_id)

I think a large part of the inaccuracy here has to do with not having
good stats for the joint effect of the is_exit_name and is_junction_name
conditions.  But to be frank that looks like bad schema design.
Consider merging those and any related flags into one entry type
column.
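A sketch of that sort of merge (the column name and code values here are
invented for illustration):

    ALTER TABLE xdf.xdf_road_link ADD COLUMN entry_type char(1);
    UPDATE xdf.xdf_road_link
       SET entry_type = CASE WHEN is_exit_name = 'Y'     THEN 'E'
                             WHEN is_junction_name = 'Y' THEN 'J'
                             ELSE 'N' END;
    -- queries then filter on a single column: WHERE entry_type = 'N'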

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Vacuum running out of memory

2009-12-08 Thread Tom Lane
Jonathan Foy the...@gmail.com writes:
 My vacuums have suddenly started to fail, seemingly at random.  I am
 confused.

 I'm running 8.1.3, with close to a dozen servers, up to 150 databases each.
 I have 8GB of RAM.  Vacuums have started to fail on all servers (though only
 the occasional vacuum) with the following error:

 VACUUM,ERROR:  out of memory
 VACUUM,DETAIL:  Failed on request of size 268435452

I'd back off maintenance_work_mem if I were you.  I think you don't have
enough RAM to be running a lot of concurrent VACUUMs all with the same
large memory consumption.
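For instance, in postgresql.conf (the value is an assumption to tune; note
that 8.1 takes this setting in kilobytes, without units):

    maintenance_work_mem = 65536    # 64MB instead of the current 256MB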

Also, if it's really 8.1.3, consider an update to 8.1.something-recent.
Not only are you exposed to a number of very serious known bugs, but
this patch in particular would likely help you:
http://archives.postgresql.org/pgsql-committers/2007-09/msg00377.php

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Vacuum running out of memory

2009-12-08 Thread Greg Stark
On Tue, Dec 8, 2009 at 4:31 PM, Jonathan Foy the...@gmail.com wrote:
 I was wondering if that was the problem.  So I'm correct in thinking that
 the failure occurred when the vacuum tried to pull its 256 MB as defined in
 the maintenance_work_mem value, and the system just did not have enough
 available...

Correct

 any idea why that would suddenly start happening?  The indexes I
 created shouldn't have affected that, should they?

Well the 8.1 vacuum was pretty inefficient in how it scanned indexes
so adding lots of indexes will make it take a lot longer. That might
mean you're running more vacuums at the same time now. The 8.2 vacuum
is much improved on that front, though adding lots of indexes will
still make vacuum take longer (along with updates and inserts).

 And point taken with the update.  I'm pushing to get us to 8.4,
 unsuccessfully so far, but management might be more amenable to minor
 version upgrades, since as I understand it there shouldn't be any risk of
 application problems with minor version changes...

You're always better off running the most recent minor release. Minor
releases fix security holes, data corruption bugs, crashing bugs, etc.
Occasionally those releases do fix behavioural bugs, especially early in
the release cycle before the next major release is out, but mostly they
fix real bugs that you would know about if you had run into them. You
should still read all the release notes for them though.


-- 
greg

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Vacuum running out of memory

2009-12-08 Thread Tom Lane
Jonathan Foy the...@gmail.com writes:
 I was wondering if that was the problem.  So I'm correct in thinking that
 the failure occurred when the vacuum tried to pull its 256 MB as defined in
 the maintenance_work_mem value, and the system just did not have enough
 available...any idea why that would suddenly start happening?  The indexes I
 created shouldn't have affected that, should they?

Not directly, AFAICS, but they could stretch out the time required to
vacuum their tables, thus possibly leading to vacuums overlapping that
didn't overlap before.  Just a guess though.  Another likely bet is
that this is just an effect of the overall system load increasing
over time (more backends == more memory needed).

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] performance penalty between Postgresql 8.3.8 and 8.4.1

2009-12-08 Thread Robert Haas
On Tue, Dec 8, 2009 at 11:07 AM, Schmitz, David
david.schm...@harman.com wrote:
 So how should we proceed with this issue?

I think Tom nailed it.

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Optimizing Bitmap Heap Scan.

2009-12-08 Thread Robert Haas
2009/12/8 Lennin Caro lennin.c...@yahoo.com

 From: niraj patel npa...@gridsolv.com

 Subject: Re: [PERFORM] Optimizing Bitmap Heap Scan.
 To: Grzegorz Jaśkiewicz gryz...@gmail.com
 Cc: pgsql-performance@postgresql.org
 Date: Tuesday, December 8, 2009, 1:50 PM

 Hi gryzman,

 I have run vacuum full analyze on the cmrules tables. The version of postgres
 is 8.2.13. How should I change stats to 100?

 Thanks
 
 From: Grzegorz Jaśkiewicz gryz...@gmail.com
 To: niraj patel npa...@gridsolv.com
 Cc: pgsql-performance@postgresql.org
 Sent: Tue, 8 December, 2009 7:12:49 PM
 Subject: Re: [PERFORM] Optimizing Bitmap Heap Scan.

 it looks like it might choose the wrong plan, cos it gets the stats wrong.
 Try increasing the number of stats to 100.
 Btw, what version is it?


 in psql
 mydb=# set default_statistics_target = 100;

That's only going to affect the current session.  To change it
permanently, edit postgresql.conf and do pg_ctl reload.
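That is (the data directory path is a placeholder):

    # in postgresql.conf:
    default_statistics_target = 100

    # then signal the server to pick it up:
    pg_ctl reload -D /path/to/data

On 8.2 you can also raise it for just the skewed column instead, with
ALTER TABLE cmrules ALTER COLUMN workspaceid SET STATISTICS 100; followed
by ANALYZE cmrules;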

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Checkpoint spikes

2009-12-08 Thread Greg Smith

Richard Neill wrote:
 (does the advice for 8.3 apply unchanged to 8.4?)

Yes; no changes in this area for 8.4.  The main performance-related
things that changed between 8.3 and 8.4 are:

1) VACUUM free space management reimplemented so that the max_fsm_*
parameters aren't needed anymore

2) default_statistics_target now starts at 100 instead of 10


 So far, I've set checkpoint_segments to 128, timeout to 10min, and
 completion_target to 0.8. This helps, but not as much as I'd hoped.

Good; if the problem is moving in the right direction you're making
progress.



 But I haven't touched any of the other WAL or BG Writer settings.
 Where should I look next?
   Should I be looking at the BG Writer settings,
   or should I look at the Linux VM configuration?
 (eg changing /proc/sys/vm/dirty_background_ratio from 5 to 1)

I would start by reducing dirty_background_ratio; as RAM sizes climb,
this keeps becoming a bigger issue.  The whole disk flushing code
finally got a major overhaul in the 2.6.32 Linux kernel, and I'm hoping
this whole class of problem was improved by those changes.
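For example (the target value of 1 is the one floated above; test under your
real load):

    # take effect immediately:
    echo 1 > /proc/sys/vm/dirty_background_ratio
    # or persistently, in /etc/sysctl.conf:
    #   vm.dirty_background_ratio = 1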


Changes to the background writer behavior will probably not work as 
you'd expect.  The first thing I'd try it in your situation turning it 
off altogether; it can be slightly counterproductive for reducing 
checkpoint issues if they're really bad, which yours are.  If that goes 
in the wrong direction, experimenting with increasing the maximum pages 
and the multiplier might be useful; I wouldn't bet on it helping though.


As Kevin already mentioned, reducing the size of the buffer cache can
help too.  That's worth trying if you've exhausted the other obvious
possibilities.



   Or would it be most useful to try to move the WAL to a different disk?

On Linux, having the WAL on a separate disk can improve things much more
than you might expect, simply because of how brain-dead the filesystem
fsync implementation is.  Reducing the seeks for WAL traffic can help a
lot too.
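A common way to do that, sketched with placeholder paths (with the server
stopped first):

    mv /var/lib/postgresql/8.4/main/pg_xlog /wal_disk/pg_xlog
    ln -s /wal_disk/pg_xlog /var/lib/postgresql/8.4/main/pg_xlog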


If you've lowered Linux's caching, tried some BGW tweaks, and moved the
WAL somewhere else, and latency is still high, you may be facing a
hardware upgrade to improve things.  Sometimes these problems just
require more burst write throughput (regardless of how good average 
performance looks) and nothing else will substitute.  Hopefully you'll 
find a tuning solution before that though.



--
Greg Smith    2ndQuadrant    Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Checkpoint spikes

2009-12-08 Thread Andres Freund
On Wednesday 09 December 2009 03:05:40 Greg Smith wrote:
 On Linux having the WAL on a separate disk can improve things much more
 than you might expect, simply because of how brain-dead the filesystem
 fsync implementation is.  Reducing the seeks for WAL traffic can help a
 lot too.
Not using ext3's data=ordered helps massively already. 

Andres

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance