Re: [PERFORM] Kernel Resources and max_connections
Chris Hebrard wrote:
> kern.ipc.shmmax and kern.ipc.shmmin will not stay to what I set them to. What am I doing wrong or not doing at all?

These need to go in /etc/sysctl.conf. You might need to set shmall as well.

(This not-very-clear distinction between what is sysctl'able and what is a kernel tunable is a bit of a downer).

cheers

Mark

---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Kernel Resources and max_connections
Chris Hebrard wrote:
> I set the values in /etc/sysctl.conf:
>
> # $FreeBSD: src/etc/sysctl.conf,v 1.1.2.3 2002/04/15 00:44:13 dougb Exp $
> #
> #  This file is read when going to multi-user and its contents piped thru
> #  ``sysctl'' to adjust kernel values.  ``man 5 sysctl.conf'' for details.
> #
> # Added by IMP 2005-05-04
> net.inet.tcp.rfc1323=1
> kern.ipc.somaxconn=1024
> kern.ipc.maxsockbuf=8388608
> net.inet.tcp.sendspace=3217968
> net.inet.tcp.recvspace=3217968
> kern.ipc.semmns="272"
> kern.ipc.semmni="256"
> kern.ipc.shmmax="66099200"
> kern.ipc.shmmin="256"
>
> After a restart both shmmax and shmmin are now 0 and postgres failed to start.

Hmmm - puzzling. One point to check, did you take them out of /boot/loader.conf ?

Assuming so, maybe don't quote 'em (see below). Finally you need to set shmall, otherwise it will over(under)ride the shmmax setting. So try:

net.inet.tcp.rfc1323=1
kern.ipc.somaxconn=1024
kern.ipc.maxsockbuf=8388608
net.inet.tcp.sendspace=3217968
net.inet.tcp.recvspace=3217968
kern.ipc.semmns=272
kern.ipc.semmni=256
kern.ipc.shmmax=66099200
kern.ipc.shmmin=256
kern.ipc.shmall=32768

---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
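After applying a sysctl.conf like the one above and rebooting, it is worth confirming what the kernel actually ended up with. A minimal check, assuming the corrected values above were applied (your output will obviously reflect whatever you set):

sysctl kern.ipc.shmmax kern.ipc.shmmin kern.ipc.shmall
# expected if sysctl.conf was read at boot:
# kern.ipc.shmmax: 66099200
# kern.ipc.shmmin: 256
# kern.ipc.shmall: 32768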
Re: [PERFORM] tuning
list wrote:
> hi- i would like to see if someone could recommend something to make my query run faster.
>
> Values in postgresql.conf:
> shared_buffers = 1000
> sort_mem is commented out
> effective_cache_size is commented out
> random_page_cost is commented out

I would increase shared_buffers (say 5000 - 10000), and also effective_cache_size (say around 20000 - 50000 - but work out how much memory this box has free or cached and adjust accordingly).

From your explain output, it looks like sorting is not too much of a problem - so you can leave it unchanged (for this query anyway).

> Here is the query in question:
>
> select * from productvendorview
> where (productlistid=3 or productlistid=5 or productlistid=4)
>   and (lower(item) like '9229%' or lower(descrip) like 'toner%')
> order by vendorname,item
> limit 100;

You might want to break this into 2 queries and union them, so you can (potentially) use the indexes on productlistid,lower(item) and productlistid, lower(descrip) separately.

> This query scans 412,457 records.
>
> Here is the EXPLAIN ANALYZE for the query:
>
>  Limit  (cost=45718.83..45719.08 rows=100 width=108) (actual time=39093.636..39093.708 rows=100 loops=1)
>    ->  Sort  (cost=45718.83..45727.48 rows=3458 width=108) (actual time=39093.629..39093.655 rows=100 loops=1)
>          Sort Key: v.vendorname, p.item
>          ->  Hash Join  (cost=22.50..45515.57 rows=3458 width=108) (actual time=95.490..39062.927 rows=2440 loops=1)
>                Hash Cond: ("outer".vendorid = "inner".id)
>                ->  Seq Scan on test p  (cost=0.00..45432.57 rows=3457 width=62) (actual time=89.066..39041.654 rows=2444 loops=1)
>                      Filter: (((productlistid = 3) OR (productlistid = 5) OR (productlistid = 4)) AND ((lower((item)::text) ~~ '9229%'::text) OR (lower((descrip)::text) ~~ 'toner%'::text)))
>                ->  Hash  (cost=20.00..20.00 rows=1000 width=54) (actual time=6.289..6.289 rows=0 loops=1)
>                      ->  Seq Scan on vendor v  (cost=0.00..20.00 rows=1000 width=54) (actual time=0.060..3.653 rows=2797 loops=1)
>  Total runtime: 39094.713 ms
> (10 rows)

I guess the relation 'test' is a copy of product (?)

Cheers

Mark

---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
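A sketch of the UNION rewrite suggested above, assuming indexes on (productlistid, lower(item)) and (productlistid, lower(descrip)) exist or are created - plain UNION (not UNION ALL) so a row matching both conditions is not returned twice:

select * from productvendorview
where (productlistid = 3 or productlistid = 5 or productlistid = 4)
  and lower(item) like '9229%'
union
select * from productvendorview
where (productlistid = 3 or productlistid = 5 or productlistid = 4)
  and lower(descrip) like 'toner%'
order by vendorname, item
limit 100;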
Re: [PERFORM] postgresql-8.0.1 performance tuning
Cosimo Streppone wrote:
>> # Config /etc/sysctl.conf:
>> kernel.shmall = 786432000
>> kernel.shmmax = 786432000
>
> I think you have a problem here.
> kernel.shmmax should *not* be set to an amount of RAM, but to maximum number of shared memory pages, which on a typical linux system is 4kb. Google around:

This is somewhat confusing :

kernel.shmmax is in bytes (max single segment size)
kernel.shmall is in (4k) pages (max system wide allocated segment pages)

cheers

Mark

---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
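To make the units concrete: a hedged example of what that sysctl.conf might look like if the intent really was to allow roughly 750MB of shared memory (the shmall figure is simply the byte value divided by the 4k page size - adjust both to your own target):

kernel.shmmax = 786432000   # bytes (max single segment size)
kernel.shmall = 192000      # 4k pages (786432000 / 4096)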
Re: [PERFORM] postgresql-8.0.1 performance tuning
Cosimo Streppone wrote: Mark Kirkwood ha scritto: Cosimo Streppone wrote: # Config /etc/sysctl.conf: kernel.shmall = 786432000 kernel.shmmax = 786432000 I think you have a problem here. kernel.shmmax should *not* be set to an amount of RAM, but Sorry, I thought "shmall" but written "shmmax". Thanks Mark! Hehe - happens to me all the time! On the shmall front - altho there is *probably* no real performance impact setting it to the same as shmmax (i.e. allowing 4096 allocations of size shmmax!), it is overkill. In addition it does allow for a DOS by a program that allocates thousands of segments (or somehow starts thousands of Pg servers on different ports...)! For a dedicated Pg server I would size shmall using a calculation along the lines of: shmall = (no. of postgresql servers) * (shmmax/4096) If there are other daemons on the box that need to use shared memory, then add their likely requirements to shmall too! cheers Mark ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: SHMMAX / SHMALL Was (Re: [PERFORM] postgresql-8.0.1 performance
Paul McGarry wrote:
> Based on the powerpostgresql.com Performance Checklist [1] and Annotated Postgresql.conf [2] I understand that:
>
> -I should have less than 1/3 of my total memory as shared_buffers
> -For my server 15000 is a fairly reasonable starting point for shared_buffers which is ~120MB
> -I have 100 max_connections.
>
> So I was going to set SHMMAX to 134217728 (ie 128 Meg)
>
> What should SHMALL be?
>
> The current system values are
>
> [EMAIL PROTECTED]:~/data$ cat /proc/sys/kernel/shmmax
> 33554432
> [EMAIL PROTECTED]:~/data$ cat /proc/sys/kernel/shmall
> 2097152
>
> ie SHMALL seems to be 1/16 of SHMMAX

No - shmall is in 4k pages - so this amounts to 8G! This is fine - unless you wish to decrease it in order to prevent too many shared memory applications running.

BTW - the docs have been amended for 8.1 to suggest shmmax=134217728 and shmall=2097152 (was going to point you at them - but I cannot find them on the Postgresql site anymore...).

There seems to be some longstanding confusion in the Linux community about the units for shmall (some incorrect documentation from Oracle on the issue does not help I am sure) - to the point where I downloaded kernel source to check (reproducing here):

linux-2.6.11.1/include/linux/shm.h:13->

#define SHMMAX 0x2000000                /* max shared seg size (bytes) */
#define SHMMIN 1                        /* min shared seg size (bytes) */
#define SHMMNI 4096                     /* max num of segs system wide */
#define SHMALL (SHMMAX/PAGE_SIZE*(SHMMNI/16)) /* max shm system wide (pages) */
#define SHMSEG SHMMNI

Hope that helps

Best wishes

Mark

---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
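Pulling the numbers from this message together, a setup like Paul's (shared_buffers = 15000, SHMMAX of 128MB) would end up with a sysctl.conf along these lines - the shmall value is the 8.1-documentation suggestion quoted above, which at 4k per page is already far more than 128MB worth:

kernel.shmmax = 134217728   # 128MB, in bytes
kernel.shmall = 2097152     # in 4k pages (= 8GB, plenty of headroom)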
Re: SHMMAX / SHMALL Was (Re: [PERFORM] postgresql-8.0.1 performance
Martin Fandel wrote:
> Aah ok :)
>
> I've set my values now as follow (2GB RAM):
>
> SHMMAX=`cat /proc/meminfo | grep MemTotal | cut -d: -f 2 | awk '{print $1*1024/3}'`
> echo kernel.shmmax=${SHMMAX} >> /etc/sysctl.conf
> SHMALL=`expr ${SHMALL} / 4096 \* \( 4096 / 16 \)`
> echo kernel.shmall=${SHMALL} >> /etc/sysctl.conf
>
> sysctl.conf:
> kernel.shmmax=708329472
> kernel.shmall=44270592
>
> postgresql.conf:
> max_connections=500
> shared_buffers=40000       # ~312MB, min. 1000, max ~ 83000

Hmmm - shmall set to 168G... err why?

Apologies for nit picking a little - but shmall seems unreasonably high. I can't see much reason for setting it bigger than (physical RAM in bytes)/4096 myself. So in your case this is 2*(1024*1024*1024)/4096 = 524288

Cheers

Mark

---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
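A sketch of that suggestion in the same style as the original script, assuming a Linux /proc/meminfo that reports MemTotal in kB - for a 2GB box this comes out at roughly the 524288 pages mentioned above:

SHMALL=`cat /proc/meminfo | grep MemTotal | cut -d: -f 2 | awk '{print int($1*1024/4096)}'`
echo kernel.shmall=${SHMALL} >> /etc/sysctl.conf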
Re: [PERFORM] Filesystem
Martin Fandel wrote: Hi @ all, i have only a little question. Which filesystem is preferred for postgresql? I'm plan to use xfs (before i used reiserfs). The reason is the xfs_freeze Tool to make filesystem-snapshots. Is the performance better than reiserfs, is it reliable? I used postgresql with xfs on mandrake 9.0/9.1 a while ago - reliability was great, performance seemed better than ext3. I didn't compare with reiserfs - the only time I have ever lost data from a Linux box has been when I used reiserfs, hence I am not a fan :-( best wishes Mark ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Updates on large tables are extremely slow
Yves Vindevogel wrote:
> I'm trying to update a table that has about 600.000 records. The update query is very simple:
>
>   update mytable set pagesdesc = - pages ;
>
> The query takes about half an hour to an hour to execute. I have tried a lot of things.

Half an hour seems a bit long - I would expect less than 5 minutes on reasonable hardware.

You may have dead tuple bloat - can you post the output of 'ANALYZE VERBOSE mytable' ?

Cheers

Mark

---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Updates on large tables are extremely slow
Apologies - I should have said output of 'VACUUM VERBOSE mytable'.

(been using 8.1, which displays dead tuple info in ANALYZE...).

Mark

Yves Vindevogel wrote:
> rvponp=# analyze verbose tblPrintjobs ;
> INFO:  analyzing "public.tblprintjobs"
> INFO:  "tblprintjobs": 19076 pages, 3000 rows sampled, 588209 estimated total rows
> ANALYZE
>
> On 13 Jun 2005, at 04:43, Mark Kirkwood wrote:
>> Yves Vindevogel wrote:
>>> I'm trying to update a table that has about 600.000 records. The update query is very simple : update mytable set pagesdesc = - pages ; The query takes about half an hour to an hour to execute. I have tried a lot of things.
>> Half an hour seems a bit long - I would expect less than 5 minutes on reasonable hardware. You may have dead tuple bloat - can you post the output of 'ANALYZE VERBOSE mytable' ?

---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] QRY seems not using indexes
Qingqing Zhou wrote:
> <[EMAIL PROTECTED]> writes
>> so, if I do a qry like "EXPLAIN ANALYZE select * from pridecdr where idsede=8977758488" it tooks a lot of time before i get back any result:
>>
>>  Index Scan using prd_id_sede on pridecdr  (cost=0.00..699079.90 rows=181850 width=138) (actual time=51.241..483068.255 rows=150511 loops=1)
>>    Index Cond: (idsede = 8977758488::bigint)
>>  Total runtime: 483355.325 ms
>
> The query plan looks ok. Try to do EXPLAIN ANALYZE twice and see if there is any difference. This could reduce the IO time to read your index/data since you got enough RAM.
> Also, if you haven't done VACUUM FULL for a long time, do so and compare the difference.

Could also be libpq buffering all 150000-odd rows before showing any.

It might be worthwhile using a CURSOR and doing 1 FETCH. If that is quick, then buffering is probably the issue.

BTW - do you really want all the rows?

Cheers

Mark

---(end of broadcast)--- TIP 6: explain analyze is your friend
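A quick way to run the buffering test described above, using the table and condition from the quoted query (the cursor name is arbitrary):

BEGIN;
DECLARE cdr_cur CURSOR FOR
    SELECT * FROM pridecdr WHERE idsede = 8977758488;
FETCH 1 FROM cdr_cur;   -- should come back almost immediately if client-side buffering is the problem
CLOSE cdr_cur;
COMMIT;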
Re: [PERFORM] Limit + group + join
Tobias,

Interesting example: The 'desc' seems to be the guy triggering the sort, e.g:

explain select c.id from c join b on c_id=c.id group by c.id order by c.id limit 5;
                                      QUERY PLAN
---------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.28 rows=5 width=4)
   ->  Group  (cost=0.00..4476.00 rows=80000 width=4)
         ->  Merge Join  (cost=0.00..4276.00 rows=80000 width=4)
               Merge Cond: ("outer".id = "inner".c_id)
               ->  Index Scan using c_pkey on c  (cost=0.00..1518.00 rows=80000 width=4)
               ->  Index Scan using b_on_c on b  (cost=0.00..1558.00 rows=80000 width=4)
(6 rows)

Whereas with it back in again:

explain select c.id from c join b on c_id=c.id group by c.id order by c.id desc limit 5;
                                  QUERY PLAN
----------------------------------------------------------------------------
 Limit  (cost=10741.08..10741.11 rows=5 width=4)
   ->  Group  (cost=10741.08..11141.08 rows=80000 width=4)
         ->  Sort  (cost=10741.08..10941.08 rows=80000 width=4)
               Sort Key: c.id
               ->  Hash Join  (cost=1393.00..4226.00 rows=80000 width=4)
                     Hash Cond: ("outer".c_id = "inner".id)
                     ->  Seq Scan on b  (cost=0.00..1233.00 rows=80000 width=4)
                     ->  Hash  (cost=1193.00..1193.00 rows=80000 width=4)
                           ->  Seq Scan on c  (cost=0.00..1193.00 rows=80000 width=4)
(9 rows)

However being a bit brutal:

set enable_mergejoin=false;
set enable_hashjoin=false;

explain select c.id from c join b on c_id=c.id group by c.id order by c.id desc limit 5;
                                          QUERY PLAN
------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..15.24 rows=5 width=4)
   ->  Group  (cost=0.00..243798.00 rows=80000 width=4)
         ->  Nested Loop  (cost=0.00..243598.00 rows=80000 width=4)
               ->  Index Scan Backward using c_pkey on c  (cost=0.00..1518.00 rows=80000 width=4)
               ->  Index Scan using b_on_c on b  (cost=0.00..3.01 rows=1 width=4)
                     Index Cond: (b.c_id = "outer".id)
(6 rows)

What is interesting is why this plan is being rejected...

Cheers

Mark

Tobias Brox wrote:
> Consider this setup - which is a gross simplification of parts of our production system ;-)
>
>   create table c (id integer primary key);
>   create table b (id integer primary key, c_id integer);
>   create index b_on_c on b(c_id);
>
>   insert into c (select ... lots of IDs ...);
>   insert into b (select id, id from c); /* keep it simple :-) */
>
> Now, I'm just interessted in some few rows. All those gives good plans:
>
>   explain select c.id from c order by c.id limit 1;
>   explain select c.id from c group by c.id order by c.id limit 1;
>   explain select c.id from c join b on c_id=c.id order by c.id limit 1;
>
> ... BUT ... combining join, group and limit makes havoc:
>
>   explain select c.id from c join b on c_id=c.id group by c.id order by c.id desc limit 5;

---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Limit + group + join
Tom Lane wrote: Mark Kirkwood <[EMAIL PROTECTED]> writes: What is interesting is why this plan is being rejected... Which PG version are you using exactly? That mistake looks like an artifact of the 8.0 "fuzzy plan cost" patch, which we fixed recently: http://archives.postgresql.org/pgsql-committers/2005-07/msg00474.php Right on - 8.0.3 (I might look at how CVS tip handles this, could be interesting). But Tobias wasn't happy with 7.4 either, so I'm not sure that the fuzzy cost issue explains his results. As far as the "desc" point goes, the problem is that mergejoins aren't capable of dealing with backward sort order, so a merge plan isn't considered for that case (or at least, it would have to have a sort after it, which pretty much defeats the point for a fast-start plan). I have some ideas about fixing this but it won't happen before 8.2. That doesn't explain why the nested loop is being kicked tho', or have I missed something obvious? - it's been known to happen :-)... Cheers Mark ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Limit + group + join
Interestingly enough, 7.4.8 and 8.1devel-2005-08-23 all behave the same as 8.0.3 for me (tables freshly ANALYZEd):

joinlimit=# SELECT version();
                                           version
----------------------------------------------------------------------------------------------
 PostgreSQL 7.4.8 on i386-unknown-freebsd5.4, compiled by GCC gcc (GCC) 3.4.2 [FreeBSD] 20040728
(1 row)

joinlimit=# EXPLAIN SELECT c.id FROM c JOIN b ON c_id=c.id GROUP BY c.id ORDER BY c.id DESC LIMIT 5;
                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Limit  (cost=10591.36..10591.39 rows=5 width=4)
   ->  Group  (cost=10591.36..10992.02 rows=80131 width=4)
         ->  Sort  (cost=10591.36..10791.69 rows=80131 width=4)
               Sort Key: c.id
               ->  Merge Join  (cost=0.00..4064.66 rows=80131 width=4)
                     Merge Cond: ("outer".id = "inner".c_id)
                     ->  Index Scan using c_pkey on c  (cost=0.00..1411.31 rows=80131 width=4)
                     ->  Index Scan using b_on_c on b  (cost=0.00..1451.72 rows=80172 width=4)
(8 rows)

joinlimit=# EXPLAIN SELECT c.id FROM c JOIN b ON c_id=c.id GROUP BY c.id ORDER BY c.id LIMIT 5;
                                    QUERY PLAN
-------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.27 rows=5 width=4)
   ->  Group  (cost=0.00..4264.99 rows=80131 width=4)
         ->  Merge Join  (cost=0.00..4064.66 rows=80131 width=4)
               Merge Cond: ("outer".id = "inner".c_id)
               ->  Index Scan using c_pkey on c  (cost=0.00..1411.31 rows=80131 width=4)
               ->  Index Scan using b_on_c on b  (cost=0.00..1451.72 rows=80172 width=4)
(6 rows)

joinlimit=# SELECT version();
                                           version
----------------------------------------------------------------------------------------------
 PostgreSQL 8.1devel on i386-unknown-freebsd5.4, compiled by GCC gcc (GCC) 3.4.2 [FreeBSD] 20040728
(1 row)

joinlimit=# EXPLAIN SELECT c.id FROM c JOIN b ON c_id=c.id GROUP BY c.id ORDER BY c.id DESC LIMIT 5;
                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Limit  (cost=10654.53..10654.55 rows=5 width=4)
   ->  Group  (cost=10654.53..11054.53 rows=80000 width=4)
         ->  Sort  (cost=10654.53..10854.53 rows=80000 width=4)
               Sort Key: c.id
               ->  Merge Join  (cost=0.00..4139.44 rows=80000 width=4)
                     Merge Cond: ("outer".id = "inner".c_id)
                     ->  Index Scan using c_pkey on c  (cost=0.00..1450.00 rows=80000 width=4)
                     ->  Index Scan using b_on_c on b  (cost=0.00..1490.00 rows=80000 width=4)
(8 rows)

joinlimit=# EXPLAIN SELECT c.id FROM c JOIN b ON c_id=c.id GROUP BY c.id ORDER BY c.id LIMIT 5;
                                    QUERY PLAN
-------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.27 rows=5 width=4)
   ->  Group  (cost=0.00..4339.44 rows=80000 width=4)
         ->  Merge Join  (cost=0.00..4139.44 rows=80000 width=4)
               Merge Cond: ("outer".id = "inner".c_id)
               ->  Index Scan using c_pkey on c  (cost=0.00..1450.00 rows=80000 width=4)
               ->  Index Scan using b_on_c on b  (cost=0.00..1490.00 rows=80000 width=4)
(6 rows)

The non default server params of relevance are:

shared_buffers = 12000
effective_cache_size = 100000
work_mem/sort_mem = 20480

I did wonder if the highish sort_mem might be a factor, but no, with it set to 1024 I get the same behaviour (just higher sort cost estimates).

Cheers

Mark

Tom Lane wrote:
> Which PG version are you using exactly? That mistake looks like an artifact of the 8.0 "fuzzy plan cost" patch, which we fixed recently:
> http://archives.postgresql.org/pgsql-committers/2005-07/msg00474.php
>
> But Tobias wasn't happy with 7.4 either, so I'm not sure that the fuzzy cost issue explains his results.
>
> As far as the "desc" point goes, the problem is that mergejoins aren't capable of dealing with backward sort order, so a merge plan isn't considered for that case (or at least, it would have to have a sort after it, which pretty much defeats the point for a fast-start plan). I have some ideas about fixing this but it won't happen before 8.2.

---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Limit + group + join
Tom Lane wrote:
> I looked into this and found that indeed the desirable join plan was getting generated, but it wasn't picked because query_planner didn't have an accurate idea of how much of the join needed to be scanned to satisfy the GROUP BY step. I've committed some changes that hopefully will let 8.1 be smarter about GROUP BY ... LIMIT queries.

Very nice :-)

joinlimit=# EXPLAIN SELECT c.id FROM c JOIN b ON c_id=c.id GROUP BY c.id ORDER BY c.id DESC LIMIT 5;
                                          QUERY PLAN
------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..15.23 rows=5 width=4)
   ->  Group  (cost=0.00..243730.00 rows=80000 width=4)
         ->  Nested Loop  (cost=0.00..243530.00 rows=80000 width=4)
               ->  Index Scan Backward using c_pkey on c  (cost=0.00..1450.00 rows=80000 width=4)
               ->  Index Scan using b_on_c on b  (cost=0.00..3.01 rows=1 width=4)
                     Index Cond: (b.c_id = "outer".id)
(6 rows)

This is 8.1devel from today.

regards

Mark

---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Massive performance issues
It would be good to see EXPLAIN ANALYZE output for the three queries below (the real vs. estimated row counts being of interest).

The number of pages in your address table might be interesting to know too.

regards

Mark

Matthew Sackman wrote (with a fair bit of snippage):
> explain select locality_2 from address where locality_2 = 'Manchester';
> gives
>                            QUERY PLAN
> ----------------------------------------------------------------
>  Seq Scan on address  (cost=0.00..80677.16 rows=27923 width=12)
>    Filter: ((locality_2)::text = 'Manchester'::text)
>
> explain select locality_1 from address where locality_1 = 'Manchester';
> gives
>                            QUERY PLAN
> ----------------------------------------------------------------
>  Index Scan using address_locality_1_index on address  (cost=0.00..69882.18 rows=17708 width=13)
>    Index Cond: ((locality_1)::text = 'Manchester'::text)
>
> select street, locality_1, locality_2, city from address
> where (city = 'Nottingham' or locality_2 = 'Nottingham' or locality_1 = 'Nottingham')
>   and upper(substring(street from 1 for 1)) = 'A'
> group by street, locality_1, locality_2, city
> order by street
> limit 20 offset 0

---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Massive performance issues
Matthew Sackman wrote:
> I need to get to the stage where I can run queries such as:
>
>   select street, locality_1, locality_2, city from address
>   where (city = 'Nottingham' or locality_2 = 'Nottingham' or locality_1 = 'Nottingham')
>     and upper(substring(street from 1 for 1)) = 'A'
>   group by street, locality_1, locality_2, city
>   order by street
>   limit 20 offset 0
>
> and have the results very quickly.

This sort of query will be handled nicely in 8.1 - it has bitmap and/or processing to make use of multiple indexes. Note that 8.1 is in beta now.

Cheers

Mark

---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Massive performance issues
Mark Kirkwood wrote:
> Matthew Sackman wrote:
>> I need to get to the stage where I can run queries such as:
>>
>>   select street, locality_1, locality_2, city from address
>>   where (city = 'Nottingham' or locality_2 = 'Nottingham' or locality_1 = 'Nottingham')
>>     and upper(substring(street from 1 for 1)) = 'A'
>>   group by street, locality_1, locality_2, city
>>   order by street
>>   limit 20 offset 0
>>
>> and have the results very quickly.
>
> This sort of query will be handled nicely in 8.1 - it has bitmap and/or processing to make use of multiple indexes. Note that 8.1 is in beta now.

As others have commented, you will probably need better hardware to achieve a factor of 1000 improvement, However I think using 8.1 could by itself give you a factor of 10->100 improvement.

e.g. Using your schema and generating synthetic data:

EXPLAIN
SELECT street, locality_1, locality_2, city
FROM address
WHERE (city = '500TH CITY' OR locality_2 = '50TH LOCALITY' OR locality_1 = '500TH LOCALITY')
  AND upper(substring(street from 1 for 1)) = 'A'
GROUP BY street, locality_1, locality_2, city
ORDER BY street
LIMIT 20 OFFSET 0

                                          QUERY PLAN
------------------------------------------------------------------------------------------------
 Limit  (cost=59559.04..59559.09 rows=20 width=125)
   ->  Sort  (cost=59559.04..59559.09 rows=21 width=125)
         Sort Key: street
         ->  HashAggregate  (cost=59558.37..59558.58 rows=21 width=125)
               ->  Bitmap Heap Scan on address  (cost=323.19..59556.35 rows=202 width=125)
                     Recheck Cond: (((city)::text = '500TH CITY'::text) OR ((locality_2)::text = '50TH LOCALITY'::text) OR ((locality_1)::text = '500TH LOCALITY'::text))
                     Filter: (upper("substring"((street)::text, 1, 1)) = 'A'::text)
                     ->  BitmapOr  (cost=323.19..323.19 rows=40625 width=0)
                           ->  Bitmap Index Scan on address_city_index  (cost=0.00..15.85 rows=1958 width=0)
                                 Index Cond: ((city)::text = '500TH CITY'::text)
                           ->  Bitmap Index Scan on address_locality_2_index  (cost=0.00..143.00 rows=18000 width=0)
                                 Index Cond: ((locality_2)::text = '50TH LOCALITY'::text)
                           ->  Bitmap Index Scan on address_locality_1_index  (cost=0.00..164.33 rows=20667 width=0)
                                 Index Cond: ((locality_1)::text = '500TH LOCALITY'::text)
(14 rows)

This takes 0.5s -> 2s to execute (depending on the frequencies generated for the two localities). So we are a factor of 10 better already, on modest HW (2xPIII 1Ghz 2G running FreeBSD 5.4).
To go better than this you could try a specific summary table:

CREATE TABLE address_summary AS
SELECT street, locality_1, locality_2, city,
       upper(substring(street from 1 for 1)) AS cut_street
FROM address
GROUP BY street, locality_1, locality_2, city
;

CREATE INDEX address_summary_city_index ON address_summary(city);
CREATE INDEX address_summary_locality_1_index ON address_summary(locality_1);
CREATE INDEX address_summary_locality_2_index ON address_summary(locality_2);
CREATE INDEX address_summary_street_index ON address_summary(street);
CREATE INDEX street_summary_prefix ON address_summary(cut_street);

And the query can be rewritten as:

EXPLAIN
SELECT street, locality_1, locality_2, city
FROM address_summary
WHERE (city = '500TH CITY' OR locality_2 = '50TH LOCALITY' OR locality_1 = '500TH LOCALITY')
  AND cut_street = 'A'
ORDER BY street
LIMIT 20 OFFSET 0

                                          QUERY PLAN
------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..2006.05 rows=20 width=125)
   ->  Index Scan using address_summary_street_index on address_summary  (cost=0.00..109028.81 rows=1087 width=125)
         Filter: ((((city)::text = '500TH CITY'::text) OR ((locality_2)::text = '50TH LOCALITY'::text) OR ((locality_1)::text = '500TH LOCALITY'::text)) AND (cut_street = 'A'::text))
(3 rows)

This takes 0.02s - so getting close to the factor of 1000 (a modern machine with 3-5 times the memory access speed will get you there easily).

The effectiveness of the summary table will depend on how much the GROUP BY reduces the cardinality (not much in this synthetic case), so you will probably get better improvement with the real data!

Cheers

Mark

---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] SELECT LIMIT 1 VIEW Performance Issue
K C Lau wrote:
> Thank you all for your suggestions. I've tried, with some variations too, but still no success. The times given are the best of a few repeated tries on an 8.1 beta 2 db freshly migrated from 8.0.3 on Windows.

A small denormalization, where you mark the row with the latest atdate for each playerid may get you the performance you want.

e.g: (8.1beta1)

ALTER TABLE player ADD islastatdate boolean;

UPDATE player SET islastatdate = true
WHERE (playerid,atdate) IN (SELECT playerid, atdate FROM vcurplayer);

CREATE OR REPLACE VIEW vcurplayer AS
SELECT * FROM player a
WHERE islastatdate;

CREATE INDEX player_id_lastatdate ON player(playerid, islastatdate)
WHERE islastatdate;

ANALYZE player;

Generating some test data produced:

EXPLAIN ANALYZE
SELECT playerid,atdate
FROM vcurplayer
WHERE playerid='0';

                                QUERY PLAN
---------------------------------------------------------------------------
 Index Scan using player_id_lastatdate on player a  (cost=0.00..4.33 rows=1 width=13) (actual time=0.142..0.149 rows=1 loops=1)
   Index Cond: ((playerid = '0'::text) AND (lastatdate = true))
   Filter: lastatdate
 Total runtime: 0.272 ms
(4 rows)

Whereas with the original view definition:

CREATE OR REPLACE VIEW vcurplayer AS
SELECT * FROM player a
WHERE a.atdate = ( SELECT max(b.atdate) FROM player b WHERE a.playerid = b.playerid);

EXPLAIN ANALYZE
SELECT playerid,atdate
FROM vcurplayer
WHERE playerid='0';

                                QUERY PLAN
---------------------------------------------------------------------------
 Index Scan using player_id_date on player a  (cost=0.00..7399.23 rows=11 width=13) (actual time=121.738..121.745 rows=1 loops=1)
   Index Cond: (playerid = '0'::text)
   Filter: (atdate = (subplan))
   SubPlan
     ->  Result  (cost=1.72..1.73 rows=1 width=0) (actual time=0.044..0.047 rows=1 loops=2000)
           InitPlan
             ->  Limit  (cost=0.00..1.72 rows=1 width=4) (actual time=0.028..0.031 rows=1 loops=2000)
                   ->  Index Scan Backward using player_id_date on player b  (cost=0.00..3787.94 rows=2198 width=4) (actual time=0.019..0.019 rows=1 loops=2000)
                         Index Cond: ($0 = playerid)
                         Filter: (atdate IS NOT NULL)
 Total runtime: 121.916 ms
(11 rows)

Note that my generated data has too many rows for each playerid, but the difference in performance should illustrate the idea.

Cheers

Mark

---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] SELECT LIMIT 1 VIEW Performance Issue
K C Lau wrote: I'm wondering if this performance issue is common enough for other users to merit a fix in pg, especially as it seems that with MVCC, each of the data records need to be accessed in addition to scanning the index. Yes - there are certainly cases where index only access (or something similar, like b+tree tables) would be highly desirable. From what I have understood from previous discussions, there are difficulties involved with producing a design that does not cause new problems... regards Mark ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Bytea poor performance
NSO wrote: Hello, Yes, I can understand that, but then why the same app on the server machine is done in 4 seconds? (big difference from 20-30 seconds). I tryed to monitor network traffic and it is used only for 1-2% of total 100mbit. Is this a web app? If so, then check you are using the same browser settings on the server and client (or the same browser for that matter). Note that some browsers really suck for large (wide or long) table display! cheers Mark ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Bytea poor performance
NSO wrote:
> Well, no. Delphi isn't better, same time just for downloading data... But as I told before, if for ex. pgAdminIII is running on server machine it is a lot faster, I do not know why, I was monitoring network connection between client and server and it is using only up to 2% of full speed.. is server can't send faster? or client is not accepting data faster?

That difference is suspiciously high - you need to get one of your network boys to check that the NIC in your client box is operating at full speed (and/or does not clash with whatever network device it is plugged into). The other thing to check is that your client box is reasonably spec'ed: e.g. not running out of ram or disk in particular - or suffering from massively fragmented disk (the latter if its win32).

With respect to the Delphi, you can probably narrow down where it has issues by running test versions of your app that have bits of functionality removed:

- retrieves the bytea but does not display it
- retrieves the bytea but displays it unformatted, or truncated
- does not retrieve the bytea at all

The difference between these should tell you where your issue is!

By way of comparison, I have a Php page (no Delphi sorry) that essentially shows 50 rows from your files table over a 100Mbit network. Some experiments with that show:

- takes 2 seconds to display in Firefox
- takes 0.2 seconds to complete a request (i.e. "display") using httperf

This indicates that (in my case) most of the 2 seconds is being used by Firefox (not being very good at) formatting the wide output for display.

The figure of about 2-5 seconds seems about right, so your 20-30 seconds certainly seems high!

cheers

Mark

---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Used Memory
Christian Paul B. Cosinas wrote:
> Here is the result of the "free" command I am talking about.
> What does this result mean?

I seem to recall the Linux man page for 'free' being most unenlightening, so have a look at:

http://gentoo-wiki.com/FAQ_Linux_Memory_Management

(For Gentoo, but should be applicable to RHEL).

The basic idea is that modern operating systems try to make as much use of the memory as possible. Postgresql depends on this behavior - e.g. a page that has previously been fetched from disk, will be cached, so it can be read from memory next time, as this is faster(!)

> I just noticed that as long as the free memory in the first row (which is 55036 as of now) became low, the slower is the response of the database server.

Well, you could be swapping - what does the swap line of 'free' show then?

Also, how about posting your postgresql.conf (or just the non-default parameters) to this list?

Some other stuff that could be relevant:

- Is the machine just a database server, or does it run (say) Apache + Php?
- When the slowdown is noticed, does this coincide with certain activities - e.g, backup, daily maintenance, data load(!) etc.

regards

Mark

> I choose Polesoft Lockspam to fight spam, and you? http://www.polesoft.com/refer.html

Nope, not me either.

---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Used Memory
Christian Paul B. Cosinas wrote: Hi To all those who replied. Thank You. I monitor my database server a while ago and found out that memory is used extensively when I am fetching records from the database. I use the command "fetch all" in my VB Code and put it in a recordset.Also in this command the CPU utilization is used extensively. Is there something wrong with my code or is it just the way postgresql is behaving which I cannot do something about it? I just monitor one workstation connecting to the database server and it is already eating up about 20 % of the CPU of database server. Which I think will not be applicable to our system since we have a target of 25 PC connecting to the database server most of the time. Could you post the query and the output of EXPLAIN ANALYZE? In addition, have you run ANALYZE on all the tables in that database ? (sorry, have to ask :-) ). cheers Mark ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Used Memory
Christian Paul B. Cosinas wrote:
> Hi mark
>
> I have so many functions, more than 100 functions in the database :) And I am dealing about 3 million of records in one database. And about 100 databases :)

LOL - sorry, mis-understood your previous message to mean you had identified *one* query where 'fetch all' was causing the problem!

Having said that, to make much more progress, you probably want to identify those queries that are consuming your resource, pick one or two of the particularly bad ones and post 'em. There are a number of ways to perform said identification, enabling stats collection might be worth a try.

regards

Mark

---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
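One way to do the identification mentioned above - a sketch of postgresql.conf settings (8.0/8.1-era parameter names; the threshold value is purely illustrative) that log slow statements and turn on the statistics collector so pg_stat_activity and the pg_stat_* views get populated:

# log any statement running longer than 1 second (value in ms)
log_min_duration_statement = 1000

# statistics collector
stats_start_collector = true
stats_command_string = true
stats_block_level = true
stats_row_level = true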
Re: [PERFORM] Hardware/OS recommendations for large databases (
Luke Lonergan wrote:
> (mass snippage)
>
> time psql -c "select count(*) from ivp.bigtable1" dgtestdb
> [EMAIL PROTECTED] IVP]$ cat sysout3
>   count
> ----------
>      8000
> (1 row)
>
> real    1m9.875s
> user    0m0.000s
> sys     0m0.004s
>
> [EMAIL PROTECTED] IVP]$ !du
> du -sk dgtestdb/base
> 17021260        dgtestdb/base
>
> Summary:
>
> OK – you can get more I/O bandwidth out of the current I/O path for sequential scan if you tune the filesystem for large readahead. This is a cheap alternative to overhauling the executor to use asynch I/O.
>
> Still, there is a CPU limit here – this is not I/O bound, it is CPU limited as evidenced by the sensitivity to readahead settings. If the filesystem could do 1GB/s, you wouldn’t go any faster than 244MB/s.

Luke,

Interesting - but possibly only representative for a workload consisting entirely of one executor doing "SELECT ... FROM my_single_table".

If you alter this to involve more complex joins (e.g 4. way star) and (maybe add a small number of concurrent executors too) - is it still the case?

Cheers

Mark

---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Hardware/OS recommendations for large databases (
Luke Lonergan wrote: Mark, On 11/18/05 3:46 PM, "Mark Kirkwood" <[EMAIL PROTECTED]> wrote: If you alter this to involve more complex joins (e.g 4. way star) and (maybe add a small number of concurrent executors too) - is it still the case? I may not have listened to you - are you asking about whether the readahead works for these cases? I’ll be running some massive TPC-H benchmarks on these machines soon – we’ll see then. That too, meaning the business of 1 executor random reading a given relation file whilst another is sequentially scanning (some other) part of it Cheers Mark ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Hardware/OS recommendations for large databases (
Luke Lonergan wrote: Mark, On 11/18/05 3:46 PM, "Mark Kirkwood" <[EMAIL PROTECTED]> wrote: If you alter this to involve more complex joins (e.g 4. way star) and (maybe add a small number of concurrent executors too) - is it still the case? 4-way star, same result, that's part of my point. With Bizgres MPP, the 4-way star uses 4 concurrent scanners, though not all are active all the time. And that's per segment instance - we normally use one segment instance per CPU, so our concurrency is NCPUs plus some. Luke - I don't think I was clear enough about what I was asking, sorry. I added the more "complex joins" comment because: - I am happy that seqscan is cpu bound after ~110M/s (It's cpu bound on my old P3 system even earlier than that) - I am curious if the *other* access methods (indexscan, nested loop, hash, merge, bitmap) also suffer then same fate. I'm guessing from your comment that you have tested this too, but I think its worth clarifying! With respect to Bizgres MPP, scan parallelism is a great addition... very nice! (BTW - is that in 0.8, or are we talking a new product variant?) regards Mark ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Hardware/OS recommendations for large databases (
Mark Kirkwood wrote: - I am happy that seqscan is cpu bound after ~110M/s (It's cpu bound on my old P3 system even earlier than that) Ahem - after reading Alan's postings I am not so sure, ISTM that there is some more investigation required here too :-). ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Hardware/OS recommendations for large databases (
Alan Stange wrote:
> Another data point.
>
> We had some down time on our system today to complete some maintenance work. I took the opportunity to rebuild the 700GB file system using XFS instead of Reiser.
>
> One iostat output for 30 seconds is
>
> avg-cpu:  %user   %nice    %sys %iowait   %idle
>            1.58    0.00   19.69   31.94   46.78
>
> Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
> sdd             343.73    175035.73       277.55    5251072       8326
>
> while doing a select count(1) on the same large table as before. Subsequent iostat output all showed that this data rate was being maintained. The system is otherwise mostly idle during this measurement.
>
> The sequential read rate is 175MB/s. The system is the same as earlier, one cpu is idle and the second is ~40% busy doing the scan and ~60% idle. This is postgresql 8.1rc1, 32KB block size. No tuning except for using a 1024KB read ahead.
>
> The peak speed of the attached storage is 200MB/s (a 2Gb/s fiber channel controller). I see no reason why this configuration wouldn't generate higher IO rates if a faster IO connection were available.
>
> Can you explain again why you think there's an IO ceiling of 120MB/s because I really don't understand?

I think what is going on here is that Luke's observation of the 120 Mb/s rate is taken from data using 8K block size - it looks like we can get higher rates with 32K.

A quick test on my P3 system seems to support this (the numbers are a bit feeble, but the difference is interesting):

The test is SELECT 1 FROM table, stopping Pg and unmounting the file system after each test.

8K blocksize:
25 s elapsed
48 % idle from vmstat (dual cpu system)
70 % busy from gstat (Freebsd GEOM io monitor)
181819 pages in relation
56 Mb/s effective IO throughput

32K blocksize:
23 s elapsed
44 % idle from vmstat
80 % busy from gstat
45249 pages in relation
60 Mb/s effective IO throughput

I re-ran these several times - very repeatable (+/- 0.25 seconds).

This is Freebsd 6.0 with the readahead set to 16 blocks, UFS2 filesystem created with 32K blocksize (both cases). It might be interesting to see the effect of using 16K (the default) with the 8K Pg block size, I would expect this to widen the gap.

Cheers

Mark

---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Hardware/OS recommendations for large databases (
Mark Kirkwood wrote: The test is SELECT 1 FROM table That should read "The test is SELECT count(1) FROM table" ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Hardware/OS recommendations for large databases (
Luke Lonergan wrote:
> So that leaves the question - why not more than 64% of the I/O scan rate? And why is it a flat 64% as the I/O subsystem increases in speed from 333-400MB/s?

It might be interesting to see what effect reducing the cpu consumption entailed by the count aggregation has - by (say) writing a little bit of code to heap scan the desired relation (sample attached).

Cheers

Mark

/*
 * fastcount.c
 *
 * Do a count that uses considerably less CPU time than an aggregate.
 */

#include "postgres.h"

#include "funcapi.h"
#include "access/heapam.h"
#include "catalog/namespace.h"
#include "utils/builtins.h"

extern Datum fastcount(PG_FUNCTION_ARGS);

PG_FUNCTION_INFO_V1(fastcount);

Datum
fastcount(PG_FUNCTION_ARGS)
{
    text       *relname = PG_GETARG_TEXT_P(0);
    RangeVar   *relrv;
    Relation    rel;
    HeapScanDesc scan;
    HeapTuple   tuple;
    int64       result = 0;

    /* Use the name to get a suitable range variable and open the relation. */
    relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
    rel = heap_openrv(relrv, AccessShareLock);

    /* Start a heap scan on the relation. */
    scan = heap_beginscan(rel, SnapshotNow, 0, NULL);
    while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
    {
        result++;
    }

    /* End the scan and close up the relation. */
    heap_endscan(scan);
    heap_close(rel, AccessShareLock);

    PG_RETURN_INT64(result);
}

---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
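For anyone wanting to try the attached function: after compiling it into a shared library (e.g. with PGXS, or a couple of gcc -fpic / -shared invocations against the server include directory), registration and use look roughly like this - the library path is just a placeholder:

CREATE FUNCTION fastcount(text) RETURNS bigint
    AS '/path/to/fastcount', 'fastcount'
    LANGUAGE C STRICT;

-- then, for example:
SELECT fastcount('lineitem');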
Re: [PERFORM] Hardware/OS recommendations for large databases (
Luke Lonergan wrote:
> Mark,
>
> This is an excellent idea – unfortunately I’m in Maui right now (Mahalo!) and I’m not getting to testing with this. My first try was with 8.0.3 and it’s an 8.1 function I presume.
>
> Not to be lazy – but any hint as to how to do the same thing for 8.0?

Yeah, it's 8.1 - I didn't think to check against 8.0.

The attached variant works with 8.0.4 (textToQualifiedNameList needs 2 args)

cheers

Mark

P.s. Maui eh, sounds real nice.

/*
 * fastcount.c
 *
 * Do a count that uses considerably less CPU time than an aggregate.
 *
 * (Variant for 8.0.x - textToQualifiedNameList needs 2 args)
 */

#include "postgres.h"

#include "funcapi.h"
#include "access/heapam.h"
#include "catalog/namespace.h"
#include "utils/builtins.h"

extern Datum fastcount(PG_FUNCTION_ARGS);

PG_FUNCTION_INFO_V1(fastcount);

Datum
fastcount(PG_FUNCTION_ARGS)
{
    text       *relname = PG_GETARG_TEXT_P(0);
    RangeVar   *relrv;
    Relation    rel;
    HeapScanDesc scan;
    HeapTuple   tuple;
    int64       result = 0;

    /* Use the name to get a suitable range variable and open the relation. */
    relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname, ""));
    rel = heap_openrv(relrv, AccessShareLock);

    /* Start a heap scan on the relation. */
    scan = heap_beginscan(rel, SnapshotNow, 0, NULL);
    while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
    {
        result++;
    }

    /* End the scan and close up the relation. */
    heap_endscan(scan);
    heap_close(rel, AccessShareLock);

    PG_RETURN_INT64(result);
}

---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Hardware/OS recommendations for large databases (
Luke Lonergan wrote:
> 12.9GB of DBT-3 data from the lineitem table
>
> llonergan=# select relpages from pg_class where relname='lineitem';
>  relpages
> ----------
>   1579270
> (1 row)
>
> 1579270*8192/1000000 = 12937 Million Bytes or 12.9GB
>
> llonergan=# \timing
> Timing is on.
> llonergan=# select count(1) from lineitem;
>   count
> ----------
>  59986052
> (1 row)
> Time: 197870.105 ms

So 198 seconds is the uncached read time with count (Just for clarity, did you clear the Pg and filesystem caches or unmount / remount the filesystem?)

> llonergan=# select count(1) from lineitem;
>   count
> ----------
>  59986052
> (1 row)
> Time: 49912.164 ms
> llonergan=# select count(1) from lineitem;
>   count
> ----------
>  59986052
> (1 row)
> Time: 49218.739 ms

and ~50 seconds is the (partially) cached read time with count

> llonergan=# select fastcount('lineitem');
>  fastcount
> -----------
>   59986052
> (1 row)
> Time: 33752.778 ms
> llonergan=# select fastcount('lineitem');
>  fastcount
> -----------
>   59986052
> (1 row)
> Time: 34543.646 ms
> llonergan=# select fastcount('lineitem');
>  fastcount
> -----------
>   59986052
> (1 row)
> Time: 34528.053 ms

so ~34 seconds is the (partially) cached read time for fastcount - I calculate this to give ~362Mb/s effective IO rate (I'm doing / by 1024*1024 not 1000*1000) FWIW.

While this is interesting, you probably want to stop Pg, unmount the filesystem, and restart Pg to get the uncached time for fastcount too (and how does this compare to uncached read with dd using the same block size?).

But at this stage it certainly looks like the heapscan code is pretty efficient - great!

Oh - and do you want to try out 32K block size, I'm interested to see what level of improvement you get (as my system is hopelessly cpu bound...)!

> Analysis:
>
>             Bandwidth   Percent of max
> dd Read     407MB/s     100%
> Count(1)    263MB/s     64.6%
> HeapScan    383MB/s     94.1%

Cheers

Mark

---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Hardware/OS recommendations for large databases (
Luke Lonergan wrote: Mark, It would be nice to put some tracers into the executor and see where the time is going. I'm also curious about the impact of the new 8.1 virtual tuples in reducing the executor overhead. In this case my bet's on the agg node itself, what do you think? Yeah - it's pretty clear that the count aggregate is fairly expensive wrt cpu - However, I am not sure if all agg nodes suffer this way (guess we could try a trivial aggregate that does nothing for all tuples bar the last and just reports the final value it sees). Cheers Mark ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
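One way to try the "trivial aggregate" idea floated above - a hedged sketch (8.0/8.1-style CREATE AGGREGATE; the function, aggregate, table and column names here are all invented for illustration) of an aggregate whose transition function simply keeps the most recent value, so per-tuple aggregate work is close to nil and any remaining cost is executor/agg-node overhead:

-- transition function: ignore the accumulated state, keep the incoming value
CREATE FUNCTION last_val(anyelement, anyelement) RETURNS anyelement
    AS 'SELECT $2' LANGUAGE sql IMMUTABLE;

CREATE AGGREGATE last_agg (
    BASETYPE = anyelement,
    SFUNC    = last_val,
    STYPE    = anyelement
);

-- compare against count(*) / fastcount() on the same table:
SELECT last_agg(somecol) FROM sometable;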
Re: [PERFORM] Hardware/OS recommendations for large databases (
Luke Lonergan wrote: That says it's something else in the path. As you probably know there is a page lock taken, a copy of the tuple from the page, lock removed, count incremented for every iteration of the agg node on a count(*). Is the same true of a count(1)? Sorry Luke - message 3 - I seem to be suffering from a very small working memory buffer myself right now, I think it's after a day of working with DB2 ... :-) Anyway, as I read src/backend/parser/gram.y:6542 - count(*) is transformed into count(1), so these two are identical. Cheers (last time tonight, promise!) Mark ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Hardware/OS recommendations for large databases (
Luke Lonergan wrote: Mark, Time: 197870.105 ms So 198 seconds is the uncached read time with count (Just for clarity, did you clear the Pg and filesystem caches or unmount / remount the filesystem?) Nope - the longer time is due to the "second write" known issue with Postgres - it writes the data to the table, but all of the pages are marked dirty? So, always on the first scan after loading they are written again. This is clear as you watch vmstat - the pattern on the first seq scan is half read / half write. Ah - indeed - first access after a COPY no? I should have thought of that, sorry! ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Hardware/OS recommendations for large databases (
Tom Lane wrote:
> Greg Stark <[EMAIL PROTECTED]> writes:
>> Last I heard the reason count(*) was so expensive was because its state variable was a bigint. That means it doesn't fit in a Datum and has to be alloced and stored as a pointer. And because of the Aggregate API that means it has to be allocated and freed for every tuple processed.
>
> There's a hack in 8.1 to avoid the palloc overhead (courtesy of Neil Conway IIRC).

It certainly makes quite a difference as I measure it:

doing select count(1) from a 181000 page table (completely uncached) on my PIII:

8.0 : 32 s
8.1 : 25 s

Note that the 'fastcount()' function takes 21 s in both cases - so all the improvement seems to be from the count overhead reduction.

Cheers

Mark

---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Hardware/OS recommendations for large databases (
Merlin Moncure wrote:
>> It certainly makes quite a difference as I measure it:
>>
>> doing select count(1) from a 181000 page table (completely uncached) on my PIII:
>>
>> 8.0 : 32 s
>> 8.1 : 25 s
>>
>> Note that the 'fastcount()' function takes 21 s in both cases - so all the improvement seems to be from the count overhead reduction.
>
> Are you running windows? There is a big performance improvement in count(*) on pg 8.0->8.1 on win32 that is not relevant to this debate...

No - FreeBSD 6.0 on a dual PIII 1 Ghz. The slow cpu means that the 8.1 improvements are very noticeable!

A point of interest - applying Neil's palloc-avoiding changes to NodeAgg.c and int8.c in 8.0 changes those results to:

8.0 + palloc avoiding patch : 27 s

(I am guessing the remaining 2 s could be shaved off if I backported 8.1's virtual tuples - however that looked like a lot of work)

Cheers

Mark

---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] SAN/NAS options
Charles Sprickman wrote:
> Hello all,
>
> It seems that I'm starting to outgrow our current Postgres setup. We've been running a handful of machines as standalone db servers. This is all in a colocation environment, so everything is stuffed into 1U Supermicro boxes. Our standard build looks like this:
>
> Supermicro 1U w/SCA backplane and 4 bays
> 2x2.8 GHz Xeons
> Adaptec 2015S "zero channel" RAID card
> 2 or 4 x 73GB Seagate 10K Ultra 320 drives (mirrored+striped)
> 2GB RAM
> FreeBSD 4.11
> PGSQL data from 5-10GB per box
>
> Recently I started studying what we were running up against in our nightly runs that do a ton of updates/inserts to prep things for the tasks the db does during the business day (light mix of selects/inserts/updates). While we have plenty of disk bandwidth (according to bonnie), we are really dying on IOPS. I'm guessing this is a mix of a rather anemic RAID controller (ever notice how adaptec doesn't publish any real performance specs on raid cards?) and having only two or four spindles (effectively 1 or 2 on writes).
>
> So that's where we are... I'm new to the whole SAN thing, but did recently pick up a few used NetApp shelves and a Fibre Channel RAID HBA (Mylex ExtremeRAID 3000, also used) to toy with. I started wondering if I could put something together to both get our storage on one set of boxes and allow me to get data striped across more drives. Our budget is not huge and we are not adverse to getting used gear where appropriate.
>
> What do you folks recommend? I'm just starting to look at what's out there for SANs and NAS, and from what I've seen, our options are:

Leaving the whole SAN issue for a moment:

It would be interesting to see if moving to FreeBSD 6.0 would help you - the vfs layer is no longer throttled by the (SMP) GIANT lock in this version, and that may make quite a difference (given you have SMP boxes).

Another interesting thing to try is rebuilding the database ufs filesystem(s) with 32K blocks and 4K frags (as opposed to 8K/1K or 16K/2K - can't recall the default on 4.x). I found this to give a factor of 2 speedup on random disk access (specifically queries doing indexed joins).

Is it mainly your 2 disk machines that are IOPS bound? if so, a cheap option may be to buy 2 more cheetahs for them! If it's the 4's, well how about a 2U U320 diskpack from whomever supplies you the Supermicro boxes?

I have just noticed Luke's posting - I would second the advice to avoid SAN - in my experience it's an expensive way to buy storage.

best wishes

Mark

---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Simple Join
Kevin Brown wrote:
> I'll just start by warning that I'm new-ish to postgresql.
>
> I'm running 8.1 installed from source on a Debian Sarge server. I have a simple query that I believe I've placed the indexes correctly for, and I still end up with a seq scan. It makes sense, kinda, but it should be able to use the index to gather the right values. I do have a production set of data inserted into the tables, so this is running realistically:
>
> dli=# explain analyze SELECT ordered_products.product_id
> dli-# FROM to_ship, ordered_products
> dli-# WHERE to_ship.ordered_product_id = ordered_products.id AND
> dli-# ordered_products.paid = TRUE AND
> dli-# ordered_products.suspended_sub = FALSE;

You scan 600000 rows from to_ship to get about 25000 - so some way to cut this down would help.

Try out an explicit INNER JOIN which includes the filter info for paid and suspended_sub in the join condition (you may need indexes on each of id, paid and suspended_sub, so that the 8.1 optimizer can use a bitmap scan):

SELECT ordered_products.product_id
FROM to_ship INNER JOIN ordered_products
ON (to_ship.ordered_product_id = ordered_products.id
    AND ordered_products.paid = TRUE
    AND ordered_products.suspended_sub = FALSE);

---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Simple Join
Kevin Brown wrote:
> I only had two explicit indexes. One was on to_ship.ordered_product_id and the other was on ordered_products.paid. ordered_products.id is a primary key. This is on your query with an index added on suspended_sub:
>
> dli=# explain analyze SELECT ordered_products.product_id
> dli-# FROM to_ship INNER JOIN ordered_products
> dli-# ON (to_ship.ordered_product_id = ordered_products.id
> dli(# AND ordered_products.paid = TRUE AND
> dli(# ordered_products.suspended_sub = FALSE);
>                                      QUERY PLAN
> ----------------------------------------------------------------------------------------
>  Hash Join  (cost=5126.19..31528.40 rows=20591 width=8) (actual time=4554.190..23519.618 rows=14367 loops=1)
>    Hash Cond: ("outer".ordered_product_id = "inner".id)
>    ->  Seq Scan on to_ship  (cost=0.00..11529.12 rows=611612 width=8) (actual time=11.254..15192.042 rows=611612 loops=1)
>    ->  Hash  (cost=4954.79..4954.79 rows=21759 width=16) (actual time=4494.900..4494.900 rows=18042 loops=1)
>          ->  Index Scan using paid_index on ordered_products  (cost=0.00..4954.79 rows=21759 width=16) (actual time=72.431..4414.697 rows=18042 loops=1)
>                Index Cond: (paid = true)
>                Filter: (paid AND (NOT suspended_sub))
>  Total runtime: 23532.785 ms
> (8 rows)

Well - that had no effect at all :-) You don't have an index on to_ship.ordered_product_id do you? - try adding one (ANALYZE again), and let us know what happens (you may want to play with SET enable_seqscan=off as well).

And also, if you are only ever interested in paid = true and suspended_sub = false, then you can recreate these indexes as partials - e.g:

CREATE INDEX paid_index ON ordered_products (paid) WHERE paid = true;
CREATE INDEX suspended_sub_index ON ordered_products (suspended_sub) WHERE suspended_sub = false;

> So what's the best way to performance wiggle this info out of the db? The list of values is only about 30 tuples long out of this query, so I was figuring I could trigger on insert to to_ship to place the value into another table if it didn't already exist. I'd rather the writing be slow than the reading.

Yeah - all sort of horrible denormalizations are possible :-), hopefully we can get the original query to work ok, and avoid the need to add code or triggers to your app.

---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
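If the planner still insists on the seq scan, a quick session-local experiment along these lines (same query as above) shows whether an index-based plan is even being generated and what the planner thinks it would cost:

SET enable_seqscan = off;

EXPLAIN ANALYZE SELECT ordered_products.product_id
FROM to_ship INNER JOIN ordered_products
  ON (to_ship.ordered_product_id = ordered_products.id
      AND ordered_products.paid = TRUE
      AND ordered_products.suspended_sub = FALSE);

SET enable_seqscan = on;   -- don't leave it off for normal use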
Re: [PERFORM] Simple Join
Kevin Brown wrote: On Wednesday 14 December 2005 18:36, you wrote: Well - that had no effect at all :-) You don't have and index on to_ship.ordered_product_id do you? - try adding one (ANALYZE again), and let use know what happens (you may want to play with SET enable_seqscan=off as well). I _DO_ have an index on to_ship.ordered_product_id. It's a btree. Sorry - read right past it! Did you try out enable_seqscan=off? I'm interested to see if we can get 8.1 bitmap anding the three possibly useful columns together on ordered_products and *then* doing the join to to_ship. Cheers Mark ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Simple Join
Mitch Skinner wrote:
> I saw that; what I'm suggesting is that you try creating a 3-column index on ordered_products using the paid, suspended_sub, and id columns. In that order, I think, although you could also try the reverse. It may or may not help, but it's worth a shot--the fact that all of those columns are used together in the query suggests that you might do better with a three-column index on those.
>
> With all three columns indexed individually, you're apparently not getting the bitmap plan that Mark is hoping for. I imagine this has to do with the lack of multi-column statistics in postgres, though you could also try raising the statistics target on the columns of interest.
>
> Setting enable_seqscan to off, as others have suggested, is also a worthwhile experiment, just to see what you get.

Right on. Some of these "coerced" plans may perform much better. If so, we can look at tweaking your runtime config: e.g.

effective_cache_size
random_page_cost
default_statistics_target

to see if said plans can be chosen "naturally".

cheers

Mark

---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
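A hedged sketch of what "tweaking the runtime config" can look like in postgresql.conf - the numbers below are purely illustrative starting points, not recommendations, and each should be tested against the real workload before being kept:

# roughly how much table data the OS is likely to be caching, in 8K pages
effective_cache_size = 50000        # ~400MB

# lower this (default 4) if indexed access is cheaper than the planner assumes
random_page_cost = 3

# per-column statistics detail; raise and re-ANALYZE for skewed columns
default_statistics_target = 100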
Re: [PERFORM] Overriding the optimizer
Craig A. James wrote: I asked a while back if there were any plans to allow developers to override the optimizer's plan and force certain plans, and received a fairly resounding "No". The general feeling I get is that a lot of work has gone into the optimizer, and by God we're going to use it! I think this is just wrong, and I'm curious whether I'm alone in this opinion. Over and over, I see questions posted to this mailing list about execution plans that don't work out well. Many times there are good answers - add an index, refactor the design, etc. - that yield good results. But, all too often the answer comes down to something like this recent one: > Right on. Some of these "coerced" plans may perform > much better. If so, we can look at tweaking your runtime > config: e.g. > > effective_cache_size > random_page_cost > default_statistics_target > > to see if said plans can be chosen "naturally". I see this over and over. Tweak the parameters to "force" a certain plan, because there's no formal way for a developer to say, "I know the best plan." I hear what you are saying, but to use this fine example - I don't know what the best plan is - these experiments are part of an investigation to find *if* there is a better plan, and if so, why Postgres is not finding it. There isn't a database in the world that is as smart as a developer, or that can have insight into things that only a developer can possibly know. That is often true - but the aim is to get Postgres's optimizer closer to developer smartness. After years of using several other database products (some supporting hint type constructs and some not), I have come to believe that hinting (or similar) actually *hinders* the development of a great optimizer. Best wishes Mark ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Overriding the optimizer
Craig A. James wrote: What would be cool would be some way the developer could alter the plan, but the way of doing so would strongly encourage the developer to send the information to this mailing list. Postgres would essentially say, "Ok, you can do that, but we want to know why!" Yeah it would - an implementation I have seen that I like is where the developer can supply the *entire* execution plan with a query. This is complex enough to make casual use unlikely :-), but provides the ability to try out other plans, and also fix that vital query that must run today. cheers Mark ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Simple Join
David Lang wrote: On Fri, 16 Dec 2005, Mark Kirkwood wrote: Right on. Some of these "coerced" plans may perform much better. If so, we can look at tweaking your runtime config: e.g. effective_cache_size random_page_cost default_statistics_target to see if said plans can be chosen "naturally". Mark, I've seen these config options listed as tweaking targets fairly frequently, has anyone put any thought or effort into creating a test program that could analyse the actual system and set the defaults based on the measured performance? I am sure this has been discussed before, I found this thread - http://archives.postgresql.org/pgsql-performance/2004-07/msg00189.php but I seem to recall others (but offhand can't find any of them). I think that the real difficulty here is that the construction of the test program is non-trivial - for instance, the best test program for tuning *my* workload is my application with its collection of data, but it is probably not a good test program for *anyone else's* workload. cheers Mark ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] SAN/NAS options
Jim C. Nasby wrote: On Wed, Dec 14, 2005 at 08:28:56PM +1300, Mark Kirkwood wrote: Another interesting thing to try is rebuilding the database ufs filesystem(s) with 32K blocks and 4K frags (as opposed to 8K/1K or 16K/2K - can't recall the default on 4.x). I found this to give a factor of 2 speedup on random disk access (specifically queries doing indexed joins). Even if you're doing a lot of random IO? I would think that random IO would perform better if you use smaller (8K) blocks, since there's less data being read in and then just thrown away that way. Yeah, that's what I would have expected too! But the particular queries I tested do a ton of random IO (correlation of 0.013 on the join column for the big table). I did wonder if the gain has something to do with the underlying RAID stripe size (64K or 256K in my case), as I have only tested the 32K vs 8K/16K on RAIDed systems. I guess for a system where the number of concurrent users gives rise to memory pressure, it will cause more thrashing of the file buffer cache, and this could well be a net loss. Still worth trying out I think, you will know soon enough if it is a win or lose! Note that I did *not* alter Postgres page/block size (BLCKSZ) from 8K, so no dump/reload is required to test this out. cheers Mark ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Should Oracle outperform PostgreSQL on a complex
Bruce Momjian wrote: How are star joins different from what we do now? --- Recall that a "star" query with n tables means a query where there are (n - 1) supposedly small tables (dimensions) and 1 large table (fact) - which has foreign keys to each dimension. As I understand it, the classic "star join" is planned like this: 1) The result of the restriction clauses on each of the (small) dimension tables is computed. 2) The cartesian product of all the results of 1) is formed. 3) The fact (big) table is joined to the pseudo relation formed by 2). From what I have seen most vendor implementations do not (always) perform the full cartesian product of the dimensions, but do some of them, join to the fact, then join to the remaining dimensions afterwards. There is another join strategy called the "star transformation" where some of the dimension joins get rewritten as subqueries, then the above method is used again! This tends to be useful when the cartesian products would be stupidly large (e.g. "sparse" facts, or few restriction clauses). regards Mark P.s: Luke or Simon might have a better definition... but thought I'd chuck in my 2c... :-) ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
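To make the terminology concrete, here is a tiny illustrative star schema and query (all names are invented for the example, not taken from any particular post):

CREATE TABLE dim_date    (date_key int PRIMARY KEY, dmonth int, dyear int);
CREATE TABLE dim_product (product_key int PRIMARY KEY, category text);
CREATE TABLE fact_sales  (date_key int REFERENCES dim_date,
                          product_key int REFERENCES dim_product,
                          amount numeric);
-- a typical star query: restrict each (small) dimension, join them all to the (big) fact
SELECT d.dyear, p.category, sum(f.amount)
FROM fact_sales f
JOIN dim_date d    ON d.date_key = f.date_key
JOIN dim_product p ON p.product_key = f.product_key
WHERE d.dyear BETWEEN 2004 AND 2005
  AND p.category = 'widgets'
GROUP BY d.dyear, p.category;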
Re: [PERFORM] Should Oracle outperform PostgreSQL on a complex
Tom Lane wrote: Simon Riggs <[EMAIL PROTECTED]> writes: On Fri, 2005-12-16 at 23:28 -0500, Bruce Momjian wrote: How are star joins different from what we do now? Methods: 1. join all N small tables together in a cartesian product, then join to main Large table once (rather than N times) Of course, the reason the current planner does not think of this is that it does not consider clauseless joins unless there is no alternative. However, I submit that it wouldn't pick such a plan anyway, and should not, because the idea is utterly stupid. The plan you currently get for this sort of scenario is typically a nest of hash joins: QUERY PLAN Hash Join (cost=2.25..4652.25 rows=102400 width=16) Hash Cond: ("outer".f1 = "inner".f1) -> Hash Join (cost=1.12..3115.12 rows=102400 width=12) Hash Cond: ("outer".f2 = "inner".f1) -> Seq Scan on fact (cost=0.00..1578.00 rows=102400 width=8) -> Hash (cost=1.10..1.10 rows=10 width=4) -> Seq Scan on d2 (cost=0.00..1.10 rows=10 width=4) -> Hash (cost=1.10..1.10 rows=10 width=4) -> Seq Scan on d1 (cost=0.00..1.10 rows=10 width=4) (9 rows) This involves only one scan of the fact table. As each row is pulled up through the nest of hash joins, we hash one dimension key and join to one small table at each level. This is at worst the same amount of work as hashing all the keys at once and probing a single cartesian-product hashtable, probably less work (fewer wasted key-comparisons). And definitely less memory. You do have to keep your eye on the ball that you don't waste a lot of overhead propagating the row up through multiple join levels, but we got rid of most of the problem there in 8.1 via the introduction of "virtual tuple slots". If this isn't fast enough yet, it'd make more sense to invest effort in further cutting the executor's join overhead (which of course benefits *all* plan types) than in trying to make the planner choose a star join. 2. transform joins into subselects, then return subselect rows via an index bitmap. Joins are performed via a bitmap addition process. This one might be interesting but it's not clear what you are talking about. "Bitmap addition"? Yeah - the quoted method of "make a cartesian product of the dimensions and then join to the fact all at once" is not actually used (as written) in many implementations - probably for the reasons you are pointing out. I found these two papers whilst browsing: http://www.cs.brown.edu/courses/cs227/Papers/Indexing/O'NeilGraefe.pdf http://www.dama.upc.edu/downloads/jaguilar-2005-4.pdf They seem to be describing a more subtle method making use of join indexes and bitmapped indexes. If I understand it correctly, the idea is to successively build up a list (hash / bitmap) of fact RIDS that will satisfy the query, and when complete actually perform the join and construct tuples. The goal being similar in intent to the star join method (i.e. access the fact table as little and as "late" as possible), but avoiding the cost of actually constructing the dimension cartesian product. cheers Mark ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Should Oracle outperform PostgreSQL on a complex
Tom Lane wrote: 2. transform joins into subselects, then return subselect rows via an index bitmap. Joins are performed via a bitmap addition process. Looks like 8.1 pretty much does this right now: First the basic star: EXPLAIN ANALYZE SELECT d0.dmth, d1.dat, count(f.fval ) FROM dim0 AS d0, dim1 AS d1, fact0 AS f WHERE d0.d0key = f.d0key AND d1.d1key = f.d1key AND d0.dyr BETWEEN 2010 AND 2015 AND d1.dattyp BETWEEN '10th measure type' AND '14th measure type' GROUP BY d0.dmth, d1.dat ; QUERY PLAN HashAggregate (cost=334842.41..334846.53 rows=329 width=37) (actual time=144317.960..144318.814 rows=120 loops=1) -> Hash Join (cost=145.72..334636.91 rows=27400 width=37) (actual time=1586.363..142831.025 rows=201600 loops=1) Hash Cond: ("outer".d0key = "inner".d0key) -> Hash Join (cost=89.72..333279.41 rows=137001 width=37) (actual time=1467.322..135585.317 rows=100 loops=1) Hash Cond: ("outer".d1key = "inner".d1key) -> Seq Scan on fact0 f (cost=0.00..281819.45 rows=1045 width=12) (actual time=120.881..70364.473 rows=1000 loops=1) -> Hash (cost=89.38..89.38 rows=137 width=33) (actual time=24.822..24.822 rows=660 loops=1) -> Index Scan using dim1_dattyp on dim1 d1 (cost=0.00..89.38 rows=137 width=33) (actual time=0.502..19.374 rows=660 loops=1) Index Cond: (((dattyp)::text >= '10th measure type'::text) AND ((dattyp)::text <= '14th measure type'::text)) -> Hash (cost=51.00..51.00 rows=2000 width=8) (actual time=31.620..31.620 rows=2016 loops=1) -> Index Scan using dim0_dyr on dim0 d0 (cost=0.00..51.00 rows=2000 width=8) (actual time=0.379..17.377 rows=2016 loops=1) Index Cond: ((dyr >= 2010) AND (dyr <= 2015)) Total runtime: 144320.588 ms (13 rows) Now apply the star transformation: EXPLAIN ANALYZE SELECT d0.dmth, d1.dat, count(f.fval ) FROM dim0 AS d0, dim1 AS d1, fact0 AS f WHERE d0.d0key = f.d0key AND d1.d1key = f.d1key AND d0.dyr BETWEEN 2010 AND 2015 AND d1.dattyp BETWEEN '10th measure type' AND '14th measure type' AND f.d0key IN (SELECT cd0.d0key FROM dim0 cd0 WHERE cd0.dyr BETWEEN 2010 AND 2015) AND f.d1key IN (SELECT cd1.d1key FROM dim1 cd1 WHERE cd1.dattyp BETWEEN '10th measure type' AND '14th measure type') GROUP BY d0.dmth, d1.dat ; QUERY PLAN -- HashAggregate (cost=129230.89..129231.83 rows=75 width=37) (actual time=39798.192..39799.015 rows=120 loops=1) -> Nested Loop IN Join (cost=149.44..129230.33 rows=75 width=37) (actual time=269.919..38125.520 rows=201600 loops=1) -> Hash Join (cost=147.43..128171.03 rows=375 width=45) (actual time=269.516..27342.866 rows=201600 loops=1) Hash Cond: ("outer".d0key = "inner".d0key) -> Nested Loop (cost=91.43..128096.03 rows=2000 width=37) (actual time=152.084..19869.365 rows=100 loops=1) -> Hash Join (cost=91.43..181.52 rows=2 width=37) (actual time=29.931..46.339 rows=660 loops=1) Hash Cond: ("outer".d1key = "inner".d1key) -> Index Scan using dim1_dattyp on dim1 d1 (cost=0.00..89.38 rows=137 width=33) (actual time=0.516..7.683 rows=660 loops=1) Index Cond: (((dattyp)::text >= '10th measure type'::text) AND ((dattyp)::text <= '14th measure type'::text)) -> Hash (cost=91.09..91.09 rows=137 width=4) (actual time=29.238..29.238 rows=660 loops=1) -> HashAggregate (cost=89.72..91.09 rows=137 width=4) (actual time=20.940..24.900 rows=660 loops=1) -> Index Scan using dim1_dattyp on dim1 cd1 (cost=0.00..89.38 rows=137 width=4) (actual time=0.042..14.841 rows=660 loops=1) Index Cond: (((dattyp)::text >= '10th measure type'::text) AND ((dattyp)::text <= '14th measure type'::text)) -> Index Scan using fact0_d1key on fact0 f (cost=0.00..62707.26 rows=10 width=12) (actual time=0.205..12.691 rows=1515 loops=660) Index Cond: ("outer".d1key = f.d1key) -> Hash (cost=51.00..51.00 rows=2000 width=8) (actual time=31.264..31.264 rows=2016 loops=1) -> Index Scan using dim0_dyr on dim0 d0 (cost=0.00..51
Re: [PERFORM] Should Oracle outperform PostgreSQL on a complex
Simon Riggs wrote: On Sun, 2005-12-18 at 17:07 +1300, Mark Kirkwood wrote: Tom Lane wrote: 2. transform joins into subselects, then return subselect rows via an index bitmap. Joins are performed via a bitmap addition process. Looks like 8.1 pretty much does this right now: Good analysis. 8.1 doesn't do: - the transforms sufficiently well (you just performed them manually) Absolutely - I was intending to note that very point, but it got lost somewhere between brain and fingers :-) - doesn't AND together multiple bitmaps to assist with N-way joins Ah yes - I had overlooked that, good point! Cheers Mark ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Should Oracle outperform PostgreSQL on a complex
Simon Riggs wrote: On Sun, 2005-12-18 at 15:02 +1300, Mark Kirkwood wrote: Yeah - the quoted method of "make a cartesian product of the dimensions and then join to the fact all at once" is not actually used (as written) in many implementations But it is used in some, which is why I mentioned it. I gave two implementations, that is just (1) Sorry Simon, didn't mean to imply you shouldn't have mentioned it - was merely opining about its effectiveness - probably for the reasons you are pointing out. I found these two papers whilst browsing: http://www.cs.brown.edu/courses/cs227/Papers/Indexing/O'NeilGraefe.pdf http://www.dama.upc.edu/downloads/jaguilar-2005-4.pdf They seem to be describing a more subtle method making use of join indexes and bitmapped indexes. Which is the option (2) I described. Ok - I misunderstood you on this one, and thought you were describing the "star transformation" - upon re-reading, I see that yes, it's more or less a description of the O'Neil Graefe method. best wishes Mark ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Should Oracle outperform PostgreSQL on a complex
Simon Riggs wrote: On Sat, 2005-12-17 at 13:13 -0500, Tom Lane wrote: Simon Riggs <[EMAIL PROTECTED]> writes: On Fri, 2005-12-16 at 23:28 -0500, Bruce Momjian wrote: How are star joins different from what we do now? Methods: 1. join all N small tables together in a cartesian product, then join to main Large table once (rather than N times) Of course, the reason the current planner does not think of this is that it does not consider clauseless joins unless there is no alternative. Understood The plan you currently get for this sort of scenario is typically a nest of hash joins: QUERY PLAN Hash Join (cost=2.25..4652.25 rows=102400 width=16) Hash Cond: ("outer".f1 = "inner".f1) -> Hash Join (cost=1.12..3115.12 rows=102400 width=12) Hash Cond: ("outer".f2 = "inner".f1) -> Seq Scan on fact (cost=0.00..1578.00 rows=102400 width=8) -> Hash (cost=1.10..1.10 rows=10 width=4) -> Seq Scan on d2 (cost=0.00..1.10 rows=10 width=4) -> Hash (cost=1.10..1.10 rows=10 width=4) -> Seq Scan on d1 (cost=0.00..1.10 rows=10 width=4) (9 rows) This involves only one scan of the fact table. As each row is pulled up through the nest of hash joins, we hash one dimension key and join to one small table at each level. Understood This is at worst the same amount of work as hashing all the keys at once and probing a single cartesian-product hashtable, probably less work (fewer wasted key-comparisons). And definitely less memory. You do have to keep your eye on the ball that you don't waste a lot of overhead propagating the row up through multiple join levels, but we got rid of most of the problem there in 8.1 via the introduction of "virtual tuple slots". If this isn't fast enough yet, it'd make more sense to invest effort in further cutting the executor's join overhead (which of course benefits *all* plan types) than in trying to make the planner choose a star join. That join type is used when an index-organised table is available, so that a SeqScan of the larger table can be avoided. I'd say the plan would make sense if the columns of the cartesian product match a multi-column index on the larger table that would not ever be used unless sufficient columns are restricted in each lookup. That way you are able to avoid the SeqScan that occurs for the multiple nested Hash Join case. (Clearly, normal selectivity rules apply on the use of the index in this way). So I think that plan type still can be effective in some circumstances. Mind you: building an N-way index on a large table isn't such a good idea, unless you can partition the tables and still use a join. Which is why I've not centred on this case as being important before now. My understanding: Teradata and DB2 use this. FWIW - I think DB2 uses the successive fact RID buildup (i.e method 2), unfortunately I haven't got a working copy of DB2 in front of me to test. Cheers Mark ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Simple Join
Mark Kirkwood wrote: Kevin Brown wrote: I'll just start by warning that I'm new-ish to postgresql. I'm running 8.1 installed from source on a Debian Sarge server. I have a simple query that I believe I've placed the indexes correctly for, and I still end up with a seq scan. It makes sense, kinda, but it should be able to use the index to gather the right values. I do have a production set of data inserted into the tables, so this is running realistically: dli=# explain analyze SELECT ordered_products.product_id dli-# FROM to_ship, ordered_products dli-# WHERE to_ship.ordered_product_id = ordered_products.id AND dli-# ordered_products.paid = TRUE AND dli-# ordered_products.suspended_sub = FALSE; You scan 600000 rows from to_ship to get about 25000 - so some way to cut this down would help. Try out an explicit INNER JOIN which includes the filter info for paid and suspended_sub in the join condition (you may need indexes on each of id, paid and suspended_sub, so that the 8.1 optimizer can use a bitmap scan): SELECT ordered_products.product_id FROM to_ship INNER JOIN ordered_products ON (to_ship.ordered_product_id = ordered_products.id AND ordered_products.paid = TRUE AND ordered_products.suspended_sub = FALSE); It has been a quiet day today, so I took another look at this. If the selectivity of clauses : paid = TRUE suspended_sub = FALSE is fairly high, then rewriting as a subquery might help: SELECT o.product_id FROM ordered_products o WHERE o.paid = TRUE AND o.suspended_sub = FALSE AND EXISTS ( SELECT 1 FROM to_ship s WHERE s.ordered_product_id = o.id ); However it depends on you not needing anything from to_ship in the SELECT list... Cheers Mark ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] What's the best hardver for PostgreSQL 8.1?
Vivek Khera wrote: and only the opteron boxes needed to come from sun. add a zero return policy and you wonder how they expect to keep in business sorry, i had to vent. Just out of interest - why did the opterons need to come from Sun? ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: RES: [PERFORM] Priority to a mission critical transaction
Josh Berkus wrote: All, The Bizgres project is working on resource management for PostgreSQL. So far, however, they have been able to come up with schemes that work for BI/DW at the expense of OLTP. Because of O^N lock checking issues, resource management for OLTP which doesn't greatly reduce overall performance seems a near-impossible task. Right - I guess it is probably more correct to say that the implementation used in Bizgres is specifically targeted at BI/DW workloads rather than OLTP. At this point we have not measured its impact on concurrency in anything other than a handwaving manner - e.g. pgbench on an older SMP system showed what looked like about a 10% hit. However the noise level for pgbench is typically >10%, so a better benchmark on better hardware is needed. Cheers Mark ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] RES: Priority to a mission critical transaction
Ron Mayer wrote: Short summary: * Papers studying priority inversion issues with databases including PostgreSQL and realistic workloads conclude setpriority() helps even in the presence of priority inversion issues for TPC-C and TPC-W like workloads. * Avoiding priority inversion with priority inheritance will further help some workloads (TPC-C) more than others (TPC-W) but even without such schedulers priority inversion does not cause as much harm as the benefit you get from indirectly scheduling I/O through setpriority() in any paper I've seen. Andreas Kostyrka wrote: * Carlos H. Reimer <[EMAIL PROTECTED]> [061128 20:02]: Will the setpriority() system call affect i/o queue too? Nope, and in fact the article shows the way not to do it. Actually *YES* setpriority() does have an indirect effect on the I/O queue. While I was at Greenplum a related point was made to me: For a TPC-H/BI type workload on a well configured box the IO subsystem can be fast enough so that CPU is the bottleneck for much of the time - so being able to use setpriority() as a resource controller makes sense. Also, with such a workload being mainly SELECT type queries, the dangers connected with priority inversion are considerably reduced. Cheers Mark ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Bad iostat numbers
Carlos H. Reimer wrote: While collecting performance data I discovered very bad numbers in the I/O subsystem and I would like to know if I'm thinking correctly. Here is a typical iostat -x:
avg-cpu:  %user  %nice  %system  %iowait  %idle
          50.40   0.00     0.50     1.10  48.00
Device:  rrqm/s  wrqm/s   r/s    w/s  rsec/s  wsec/s  rkB/s  wkB/s  avgrq-sz   avgqu-sz  await  svctm  %util
sda        0.00    7.80  0.40   6.40   41.60  113.60  20.80  56.80     22.82  570697.50  10.59 147.06 100.00
sdb        0.20    7.80  0.60   6.40   40.00  113.60  20.00  56.80     21.94  570697.50   9.83 142.86 100.00
md1        0.00    0.00  1.20  13.40   81.60  107.20  40.80  53.60     12.93       0.00   0.00   0.00   0.00
md0        0.00    0.00  0.00   0.00    0.00    0.00   0.00   0.00      0.00       0.00   0.00   0.00   0.00
Are they not saturated? They look it (if I'm reading your typical numbers correctly) - %util 100 and svctime in the region of 100 ms! On the face of it, looks like you need something better than a RAID1 setup - probably RAID10 (RAID5 is probably no good as you are writing more than you are reading it seems). However read on... If this is a sudden change in system behavior, then it is probably worth trying to figure out what is causing it (i.e. which queries) - for instance it might be that you have some new queries that are doing disk based sorts (this would mean you really need more memory rather than better disk...) Cheers Mark ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Bad iostat numbers
David Boreham wrote: These number look a bit strange. I am wondering if there is a hardware problem on one of the drives or on the controller. Check in syslog for messages about disk timeouts etc. 100% util but 6 writes/s is just wrong (unless the drive is a 1980's vintage floppy). Agreed - good call, I was misreading the wkB/s as wMB/s... ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PERFORM] SQL_CALC_FOUND_ROWS in POSTGRESQL / Some one can
Chris wrote: It's the same as doing a select count(*) type query using the same clauses, but all in one query instead of two. It doesn't return any extra rows on top of the limit query so it's better than using pg_numrows which runs the whole query and returns it to php (in this example). Their docs explain it: http://dev.mysql.com/doc/refman/4.1/en/information-functions.html See "FOUND_ROWS()" Note that from the same page: "If you are using SELECT SQL_CALC_FOUND_ROWS, MySQL must calculate how many rows are in the full result set. However, this is faster than running the query again without LIMIT, because the result set need not be sent to the client." So it is not as cost-free as it would seem - the CALC step is essentially doing "SELECT count(*) FROM (your-query)" in addition to your-query-with-the-limit. I don't buy the "it's cheap 'cause nothing is returned to the client" bit, because 'SELECT count(*) ...' returns only 1 tuple of 1 element to the client anyway. On the face of it, it *looks* like you save an extra set of parse, execute, construct (trivially small) resultset calls - but 'SELECT FOUND_ROWS()' involves that set of steps too, so I'm not entirely convinced that doing a 2nd 'SELECT count(*)...' is really any different in impact. Cheers Mark ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
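For comparison, the equivalent two-query pattern on the PostgreSQL side would look something like this (table and WHERE clause are invented for illustration - the point is simply that both statements share the same conditions):

SELECT * FROM items WHERE lower(descrip) LIKE 'toner%' ORDER BY item LIMIT 100;
SELECT count(*) FROM items WHERE lower(descrip) LIKE 'toner%';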
Re: [PERFORM] New to PostgreSQL, performance considerations
Daniel van Ham Colchete wrote: On 12/11/06, Steinar H. Gunderson <[EMAIL PROTECTED]> wrote: On Mon, Dec 11, 2006 at 11:31:48AM -0200, Daniel van Ham Colchete wrote: > What PostgreSQL benchmark software should I use??? Look up the list archives; search for "TPC". > I'll test PostgreSQL 8.1 on a Fedora Core 6 and on a Gentoo. I'll get > the same version FC6 uses and install it at my Gentoo. I'll use the > same hardware (different partitions to each). Why do you want to compare FC6 and Gentoo? Wasn't your point that the -march= was supposed to be the relevant factor here? In that case, you want to keep all other things equal; so use the same distribution, only with -O2 -march=i686 vs. -march=athlon-xp (or whatever). /* Steinar */ Using Gentoo is just an easy way to make cflag optimizations to all the other libs as well: glibc, ... I can also measure performance on Gentoo with cflag optimized PostgreSQL and plain PostgreSQL as well. I can certainly recall that when I switched from Fedora Core (2 or 3 can't recall now) to Gentoo that the machine was "faster" for many activities. Of course I can't recall precisely what now :-(. To actually track down *why* and *what* makes it faster is another story, and custom CFLAGS is only 1 of the possible factors: others could be: - different kernel versions (Gentoo would have possibly been later) - different kernel patches (both RedHat and Gentoo patch 'em) - different versions of glibc (Gentoo possibly later again). - different config options for glibc (not sure if they in fact are, but it's possible...) - kernel and glibc built with different versions of gcc (again I suspect Gentoo may have used a later version) So there are a lot of variables to consider if you want to settle this debate once and for all :-)! Best wishes Mark ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PERFORM] EXPLAIN ANALYZE on 8.2
Kelly Burkhart wrote: I hope this isn't a "crummy mainboard" but I can't seem to affect things by changing clock source (kernel 2.6.16 SLES10). I tried kernel command option clock=XXX where XXX in (cyclone,hpet,pmtmr,tsc,pit), no option was significantly better than the default. Anyone know how this might be improved (short of replacing hardware)? Updating the BIOS might be worth investigating, and then bugging your Linux distro mailing list/support etc for more help. (What sort of motherboard is it?) Cheers Mark ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] slow result
Laurent Manchon wrote: Hi, I have a slow response from my PostgreSQL 7.4 database using this query below on a table with 80 rows: select count(*) from tbl; PostgreSQL returns the result in 28 sec every time. Can you post the results of: analyze verbose tbl; explain analyze select count(*) from tbl; The first will give us some info about how many pages tbl has (in 7.4 ISTR it does not state the # of dead rows... but anyway), the second should help us deduce why it is so slow. Also as others have pointed out, later versions are quite a bit faster for sequential scans... Cheers Mark ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] extract(field from timestamp) vs date dimension
Tobias Brox wrote: I suppose the strongest argument for introducing date dimensions already now is that I probably will benefit from having conformed and well-designed dimensions when I will be introducing more data marts. As for now I have only one fact table and a few dimensions in the system. Another factor to consider is that end-user tools (and end users) may find a date/time dimension helpful. Best wishes Mark ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
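If it helps, a date dimension can be generated entirely inside the database - a minimal sketch (table name, column names and date range are invented for illustration):

CREATE TABLE dim_date AS
SELECT d::date                    AS date_actual,
       extract(year FROM d)::int  AS year,
       extract(month FROM d)::int AS month,
       extract(dow FROM d)::int   AS day_of_week
FROM generate_series('2005-01-01'::timestamp,
                     '2010-12-31'::timestamp,
                     '1 day') AS g(d);
ALTER TABLE dim_date ADD PRIMARY KEY (date_actual);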
Re: [PERFORM] How long should it take to insert 200,000 records?
Karen Hill wrote: The postgres version is 8.2.1 on Windows. The pl/pgsql function is inserting to an updatable view (basically two tables). CREATE TABLE foo1 ( ) ; CREATE TABLE foo2 ( ); CREATE VIEW viewfoo AS ( ); CREATE RULE ruleFoo AS ON INSERT TO viewfoo DO INSTEAD ( ); CREATE OR REPLACE FUNCTION functionFoo() RETURNS VOID AS $$ BEGIN FOR i in 1..200000 LOOP INSERT INTO viewfoo (x) VALUES (x); END LOOP; END; $$ LANGUAGE plpgsql; Sorry - but we probably need *still* more detail! - the definition of viewfoo is likely to be critical. For instance a simplified variant of your setup does 200000 inserts in 5s on my PIII tualatin machine: CREATE TABLE foo1 (x INTEGER); CREATE VIEW viewfoo AS SELECT * FROM foo1; CREATE RULE ruleFoo AS ON INSERT TO viewfoo DO INSTEAD ( INSERT INTO foo1 VALUES (new.x); ); CREATE OR REPLACE FUNCTION functionFoo() RETURNS VOID AS $$ BEGIN FOR i in 1..200000 LOOP INSERT INTO viewfoo (x) VALUES (i); END LOOP; END; $$ LANGUAGE plpgsql; postgres=# \timing postgres=# SELECT functionFoo() ; functionfoo - (1 row) Time: 4659.477 ms postgres=# SELECT count(*) FROM viewfoo; count 200000 (1 row) Cheers Mark ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] quad or dual core Intel CPUs
Kenji Morishige wrote: Please comment on any issues you may see with this box and my assumptions. Also any FreeBSD kernel issues or tweaks you could recommend. I would recommend posting to freebsd-hardware or freebsd-stable and asking if there are any gotchas with the X7DBE+ and 6.2 (for instance X7DBR-8+ suffers from an intermittent hang at boot... so can't hurt to ask!) Cheers Mark ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Postgres performance Linux vs FreeBSD
Jacek Zarêba wrote: Hello, I've set up 2 identical machines, hp server 1ghz p3, 768mb ram, 18gb scsi3 drive. On the first one I've installed Debian/GNU 4.0 Linux, on the second FreeBSD 6.2. On both machines I've installed Postgresql 8.2.3 from sources. Now the point :)) According to my tests postgres on Linux box run much faster then on FreeBSD, here are my results: With respect to 'select count(*) from ...' being slower on FreeBSD, there are a number of things to try to make FreeBSD faster for this sort of query. Two I'm currently using are: - setting sysctl vfs.read_max to 16 or 32 - rebuilding the relevant filesystem with 32K blocks and 4K frags I have two (almost) identical systems - one running Gentoo, one running FreeBSD 6.2. With the indicated changes the FreeBSD system performs pretty much the same as the Gentoo one. With respect to the 'explain analyze' times, FreeBSD has a more accurate and more expensive gettimeofday call - which hammers its 'explain analyze' times compared to Linux. Cheers Mark ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] increasing database connections
Joshua D. Drake wrote: Jonah H. Harris wrote: On 3/1/07, Shiva Sarna <[EMAIL PROTECTED]> wrote: I am sorry if it is a repeat question but I want to know if database performance will decrease if I increase the max-connections to 2000. At present it is 100. Most certainly. Adding connections over 200 will degrade performance dramatically. You should look into pgpool or connection pooling from the application. huh? That is certainly not my experience. I have systems that show no appreciable performance hit on even 1000+ connections. To be fair to the discussion, these are on systems with 4+ cores. Usually 8+ and significant ram 16/32 gig of ram. Yeah - I thought that somewhere closer to 10000 connections is where you get hit with socket management related performance issues. Cheers Mark ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] SCSI vs SATA
Ron wrote: I read them as soon as they were available. Then I shrugged and noted YMMV to myself. 1= Those studies are valid for =those= users under =those= users' circumstances in =those= users' environments. How well do those circumstances and environments mimic anyone else's? Exactly, understanding whether the studies are applicable to you is the critical step - before acting on their conclusions! Thanks Ron, for the thoughtful analysis on this topic! Cheers Mark ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] index structure for 114-dimension vector
Jeff Davis wrote: On Fri, 2007-04-20 at 12:07 -0700, Andrew Lazarus wrote: I have a table with 2.5 million real[] arrays. (They are points in a time series.) Given a new array X, I'd like to find, say, the 25 closest to X in some sense--for simplification, let's just say in the usual vector norm. Speed is critical here, and everything I have tried has been too slow. I imported the cube contrib package, and I tried creating an index on a cube of the last 6 elements, which are the most important. Then I Have you tried just making normal indexes on each of the last 6 elements and see if postgresql will use a BitmapAnd to combine them? I don't think that will work for the vector norm i.e: |x - y| = sqrt(sum over j ((x[j] - y[j])^2)) Cheers Mark ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] index structure for 114-dimension vector
Andrew Lazarus wrote: Because I know the 25 closest are going to be fairly close in each coordinate, I did try a multicolumn index on the last 6 columns and used a +/- 0.1 or 0.2 tolerance on each. (The 25 best are very probably inside that hypercube on the distribution of data in question.) This hypercube tended to have 10-20K records, and took at least 4 seconds to retrieve. I was a little surprised by how long that took. So I'm wondering if my data representation is off the wall. I should mention I also tried a cube index using gist on all 114 elements, but CREATE INDEX hadn't finished in 36 hours, when I killed it, and I wasn't in retrospect sure an index that took something like 6GB by itself would be helpful on a 2GB of RAM box. MK> I don't think that will work for the vector norm i.e: MK> |x - y| = sqrt(sum over j ((x[j] - y[j])^2)) Sorry, in that case it probably *is* worth trying out 6 single column indexes and seeing if they get bitmap and'ed together... Mark ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
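In case it helps, a sketch of that experiment (the table and column names here are invented, since the original posts don't show the schema; only three of the six single-column expression indexes are shown for brevity):

CREATE TABLE pts (id serial PRIMARY KEY, vec real[]);  -- hypothetical stand-in for the real table
CREATE INDEX pts_e109 ON pts ((vec[109]));
CREATE INDEX pts_e110 ON pts ((vec[110]));
CREATE INDEX pts_e111 ON pts ((vec[111]));
ANALYZE pts;
EXPLAIN ANALYZE
SELECT * FROM pts
WHERE vec[109] BETWEEN 0.4 AND 0.6
  AND vec[110] BETWEEN 0.2 AND 0.4
  AND vec[111] BETWEEN -0.1 AND 0.1;
-- the hope is to see a BitmapAnd of the single-column index scans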
Re: [PERFORM] postgres: 100% CPU utilization
Scott Marlowe wrote: (snippage) that's the kinda hardware I was running 7.0.2 on, so you might as well get retro in your hardware department while you're at it. Notice he's running FreeBSD 4.4(!), so it could well be a very old machine... Cheers Mark ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PERFORM] postgres: 100% CPU utilization
Sergey Tsukinovsky wrote: Just for the record - the hardware that was used for the test has the following parameters: AMD Opteron 2GHZ 2GB RAM LSI Logic SCSI And you ran FreeBSD 4.4 on it right? This may be a source of high cpu utilization in itself if the box is SMP or dual core, as multi-cpu support was pretty primitive in that release (4.11 would be better if you are required to stick to the 4.x branch, if not then 6.2 is recommended)! Cheers Mark ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PERFORM] Feature Request --- was: PostgreSQL Performance Tuning
Josh Berkus wrote: Sebastian, Before inventing a hyper tool, we might consider providing 3-5 example scenarios for common hardware configurations. This consumes less time and can be discussed and defined in a couple of days. This is of course not the correct option for a brand-new 20 spindle Sata 10.000 Raid 10 system but these are probably not the target for default configurations. That's been suggested a number of times, but some GUCs are really tied to the *exact* amount of RAM you have available. So I've never seen how "example configurations" could help. I'm not convinced about this objection - having samples gives a bit of a heads up on *what* knobs you should at least look at changing. Also it might be helpful on the -general or -perf lists to be able to say "try config 3 (or whatever we call 'em) and see what changes..." I've certainly found the sample config files supplied with that database whose name begins with 'M' a useful *start* when I want something better than default... Cheers Mark ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Disk Fills Up and fsck "Compresses" it
Jim C. Nasby wrote: No, it's part of FreeBSD's UFS. google FreeBSD softupdates and you should get plenty of info. As I said, it's probably not worth worrying about. On Wed, May 16, 2007 at 08:21:23AM -0700, Y Sidhu wrote: What do you mean by "softupdates?" Is that a parameter in what I am guessing is the conf file? Here is quite a good article on the interaction of fsck and softupdates in FreeBSD: http://www.usenix.org/publications/library/proceedings/bsdcon02/mckusick/mckusick_html/index.html Having said that, it seems to talk about space lost by unreferenced blocks and inodes in the context of panic or power outage, as opposed to normal softupdate operation (unless I'm missing something...) Cheers Mark ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Weird 8.2.4 performance
Kurt Overberg wrote: Explain Outputs: -- 8.2 -> Bitmap Heap Scan on taskinstance (cost=20.71..2143.26 rows=556 width=8) (actual time=421.423..5655.745 rows=98 loops=9) Recheck Cond: (taskinstance.taskid = task.id) -> Bitmap Index Scan on taskid_taskinstance_key (cost=0.00..20.57 rows=556 width=0) (actual time=54.709..54.709 rows=196 loops=9) -- 8.0 -> Index Scan using taskid_taskinstance_key on taskinstance (cost=0.00..2152.28 rows=563 width=8) (actual time=0.012..0.832 rows=145 loops=11) 8.2 is deciding to use a bitmap index scan on taskid_taskinstance_key, which seems to be slower (!) than a plain old index scan that 8.0 is using. A dirty work around is to disable bitmap scans via: SET enable_bitmapscan=off but it is probably worthwhile to try to find out *why* the bitmap scan is 1) slow and 2) chosen at all given 1). One thought that comes to mind - is work_mem smaller on your 8.2 system than the 8.0 one? (or in fact is it very small on both?). Also it might be interesting to see your non-default postgresql.conf settings for both systems. Cheers Mark ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
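One quick way to check the work_mem angle on the 8.2 box (the 32MB value is just an illustrative experiment, not a recommendation):

SHOW work_mem;
SET work_mem = '32MB';          -- session-local experiment only
SET enable_bitmapscan = off;    -- the workaround mentioned above
-- then re-run EXPLAIN ANALYZE on the problem query under each combination
RESET enable_bitmapscan;
RESET work_mem;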
Re: [PERFORM] Weird 8.2.4 performance
Steinar H. Gunderson wrote: On Thu, Jun 07, 2007 at 11:35:27AM +0200, Steinar H. Gunderson wrote: If that table doesn't fit in the cache on your Mac, you pretty much lose. From the EXPLAIN output, it looks like it fits very nicely in cache on your server. Thus, I don't think the difference is between 8.0 and 8.2, but rather your production server and your test machine. That's a good point, however it's not immediately obvious that the production server is *not* running MacOSX Tiger (or has any more memory)... Kurt can you post the relevant specs for the 8.0 and 8.2 boxes? Cheers Mark ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PERFORM] Weird 8.2.4 performance
Kurt Overberg wrote: work_mem = 100MB # when I ran the original query, this was set to 1MB, increased on Mark Kirkwood's advice, seemed to help a bit but not really For future reference, be careful with this parameter, as *every* connection will use this much memory for each sort or hash (i.e. it's not shared and can be allocated several times by each connection!)...yeah, I know I suggested increasing it to see what effect it would have :-). And I'd agree with Steinar and others, looks like caching effects are the cause of the timing difference between production and the mac! Cheers Mark ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] disk filling up
Brandon Shalton wrote: Hello all, My hard disk is filling up in the /base directory to where it has consumed all 200gig of that drive. All the posts that i see keep saying move to a bigger drive, but at some point a bigger drive would just get consumed. How can i keep the disk from filling up other than get like a half TB setup just to hold the ./base/* folder Two things come to mind: - Vacuum as already mentioned by others. - Temporary sort files from queries needing to sort massive amounts of data. But you need to help us out by supplying more info... Cheers Mark ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] When/if to Reindex
Tom Lane wrote: The fly in the ointment is that after collecting the pg_index definition of the index, plancat.c also wants to know how big it is --- it calls RelationGetNumberOfBlocks. And that absolutely does look at the physical storage, which means it absolutely is unsafe to do in parallel with a REINDEX that will be dropping the old physical storage at some point. So maybe we are stuck and we have to say "that doesn't work anymore". But it feels like we might not be too far away from letting it still work. Thoughts, ideas? A suggestion that seems a bit like a leap backwards in time - maybe just use the pg_class.relpages entry for the index size? I'm punting that with autovacuum being enabled by default now, the relpages entries for all relations will be more representative than they used to in previous releases. Cheers Mark ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
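i.e. the kind of information that is already available without touching the physical storage - something along these lines (the index name is illustrative):

SELECT relname, relpages, reltuples
FROM pg_class
WHERE relname = 'some_index';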
[PERFORM] File system choice for Red Hat systems
I'm helping set up a Red Hat 5.5 system for Postgres. I was going to recommend xfs for the filesystem - however it seems that xfs is supported as a technology preview "layered product" for 5.5. This apparently means that the xfs tools are only available via special channels. What are Red Hat using people choosing for a good performing filesystem? regards Mark -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] File system choice for Red Hat systems
On 02/06/10 15:26, Tom Lane wrote: What's your time horizon? RHEL6 will have full support for xfs. On RHEL5 I really wouldn't consider anything except ext3. Yeah, RHEL6 seems like the version we would prefer - unfortunately time frame is the next few days. Awesome - thanks for the quick reply! regards Mark -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] File system choice for Red Hat systems
On 02/06/10 17:17, Devrim GÜNDÜZ wrote: For xfs, you may want to read this: http://blog.2ndquadrant.com/en/2010/04/the-return-of-xfs-on-linux.html Thanks - yes RHEL6 is the version we would have liked to use I suspect! Regards Mark -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] File system choice for Red Hat systems
On 03/06/10 02:53, Alan Hodgson wrote: On Tuesday 01 June 2010, Mark Kirkwood wrote: I'm helping set up a Red Hat 5.5 system for Postgres. I was going to recommend xfs for the filesystem - however it seems that xfs is supported as a technology preview "layered product" for 5.5. This apparently means that the xfs tools are only available via special channels. What are Red Hat using people choosing for a good performing filesystem? I've run PostgreSQL on XFS on CentOS for years. It works well. Make sure you have a good battery-backed RAID controller under it (true for all filesystems). Thanks - yes, left to myself I would consider using Centos instead. However os choice is prescribed in this case I believe. Cheers Mark -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Weird XFS WAL problem
On 03/06/10 11:30, Craig James wrote: I'm testing/tuning a new midsize server and ran into an inexplicable problem. With an RAID10 drive, when I move the WAL to a separate RAID1 drive, TPS drops from over 1200 to less than 90! I've checked everything and can't find a reason. Are the 2 new RAID1 disks the same make and model as the 12 RAID10 ones? Also, are barriers *on* on the RAID1 mount and off on the RAID10 one? Cheers Mark -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL as a local in-memory cache
On 16/06/10 18:30, jgard...@jonathangardner.net wrote: On Jun 15, 4:18 pm, j...@agliodbs.com (Josh Berkus) wrote: On 6/15/10 10:37 AM, Chris Browne wrote: I'd like to see some figures about WAL on RAMfs vs. simply turning off fsync and full_page_writes. Per Gavin's tests, PostgreSQL is already close to TokyoCabinet/MongoDB performance just with those turned off; I wonder if actually having the WAL on a memory partition would make any real difference in throughput. I've seen a lot of call for this recently, especially since PostgreSQL seems to be increasingly in use as a reporting server for Hadoop. Might be worth experimenting with just making wal writing a no-op. We'd also want to disable checkpointing, of course. My back-of-the-envelope experiment: Inserting single integers into a table without indexes using a prepared query via psycopg2. Python Script: import psycopg2 from time import time conn = psycopg2.connect(database='jgardner') cursor = conn.cursor() cursor.execute("CREATE TABLE test (data int not null)") conn.commit() cursor.execute("PREPARE ins AS INSERT INTO test VALUES ($1)") conn.commit() start = time() tx = 0 while time() - start< 1.0: cursor.execute("EXECUTE ins(%s)", (tx,)); conn.commit() tx += 1 print tx cursor.execute("DROP TABLE test"); conn.commit(); Local disk, WAL on same FS: * Default config => 90 * full_page_writes=off => 90 * synchronous_commit=off => 4,500 * fsync=off => 5,100 * fsync=off and synchronous_commit=off => 5,500 * fsync=off and full_page_writes=off => 5,150 * fsync=off, synchronous_commit=off and full_page_writes=off => 5,500 tmpfs, WAL on same tmpfs: * Default config: 5,200 * full_page_writes=off => 5,200 * fsync=off => 5,250 * synchronous_commit=off => 5,200 * fsync=off and synchronous_commit=off => 5,450 * fsync=off and full_page_writes=off => 5,250 * fsync=off, synchronous_commit=off and full_page_writes=off => 5,500 NOTE: If I do one giant commit instead of lots of littler ones, I get much better speeds for the slower cases, but I never exceed 5,500 which appears to be some kind of wall I can't break through. If there's anything else I should tinker with, I'm all ears. Seeing some profiler output (e.g oprofile) for the fastest case (and maybe 'em all later) might be informative about what limit is being hit here. regards Mark -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] wal_synch_method = open_sync safe on RHEL 5.5?
Some more on the RHEL 5.5 system I'm helping to setup. Some benchmarking using pgbench appeared to suggest that wal_sync_method=open_sync was a little faster than fdatasync [1]. Now I recall some discussion about this enabling direct io and the general flakiness of this on Linux, so is the option regarded as safe? [1] The workout: $ pgbench -i -s 1000 bench $ pgbench -c [1,2,4,8,32,64,128] -t 1 Performance peaked around 2500 tps @32 clients using open_sync and 2200 with fdatasync. However the disk arrays are on a SAN and I suspect that when testing with fdatasync later in the day there may have been workload 'leakage' from other hosts hitting the SAN.
Re: [PERFORM] wal_synch_method = open_sync safe on RHEL 5.5?
On 18/06/10 15:29, Greg Smith wrote: P.S. Be wary of expecting pgbench to give you useful numbers on a single run. For the default write-heavy test, I recommend three runs of 10 minutes each (-T 600 on recent PostgreSQL versions) before I trust any results it gives. You can get useful data from the select-only test in only a few seconds, but not the one that writes a bunch. Yeah, I did several runs of each, and a couple with -c 128 and -t 10 to give the setup a good workout (also 2000-2400 tps, nice to see a well behaved SAN). Cheers Mark -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] [Slony1-general] WAL partition overloaded--by autovacuum?
On 07/07/10 13:10, Richard Yen wrote: This leads me to believe that there was a sudden flurry of write activity that occurred, and the process that would flush WAL files to /db/data/ couldn't keep up, thereby filling up the disk. I'm wondering if anyone else out there might be able to give me some insight or comments to my assessment--is it accurate? Any input would be helpful, and I'll try to make necessary architectural changes to keep this from happening again. Do you have wal archiving enabled? (if so let's see your archive_command). Cheers Mark -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Need help in performance tuning.
On 10/07/10 00:56, Brad Nicholson wrote: On Fri, 2010-07-09 at 00:42 -0400, Tom Lane wrote: Perhaps not, but there's no obvious benefit either. Since there's More Than One Way To Do It, it seems more practical to keep that as a separate problem that can be solved by a choice of add-on packages. This sounds similar to the approach to taken with Replication for years before being moved into core. Just like replication, pooling has different approaches. I do think that in both cases, having a solution that works, easily, out of the "box" will meet the needs of most users. There is also the issue of perception/adoption here as well. One of my colleagues mentioned that at PG East that he repeatedly heard people talking (negatively) about the over reliance on add-on packages to deal with core DB functionality. It would be interesting to know more about what they thought an 'over reliance' was and which packages they meant. While clearly in the case of replication something needed to be done to make it better and easier, it is not obvious that the situation with connection pools is analogous. For instance we make extensive use of PgBouncer, and it seems to do the job fine and is ridiculously easy to install and setup. So would having (something like) this in core be an improvement? Clearly if the 'in core' product is better then it is desirable... similarly if the packaged product is better... well let's have that then! I've certainly observed a 'fear of package installation' on the part of some folk, which is often a hangover from the 'Big IT shop' mentality where it requires blood signatures and child sacrifice to get anything new installed. regards Mark P.s Also note that Database Vendors like pooling integrated in the core of *their* product because it is another thing to charge a license for. Unfortunately this can also become an entrenched mentality of 'must be in core' on the part of consultants etc! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Linux Filesystems again - Ubuntu this time
I'm involved with the setup of replacement hardware for one of our systems. It is going to be using Ubuntu Lucid Server (kernel 2.6.32 I believe). The issue of filesystems has raised its head again. I note that ext4 is now the default for Lucid, what do folks think about using it: stable enough now? Also xfs has seen quite a bit of development in these later kernels, any thoughts on that? Cheers Mark P.s: We are quite keen to move away from ext3, as we have encountered its tendency to hit a wall under heavy load and leave us waiting for kjournald and pdflush to catch up -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Advice configuring ServeRAID 8k for performance
On 06/08/10 06:28, Kenneth Cox wrote: I am using PostgreSQL 8.3.7 on a dedicated IBM 3660 with 24GB RAM running CentOS 5.4 x86_64. I have a ServeRAID 8k controller with 6 SATA 7500RPM disks in RAID 6, and for the OLAP workload it *feels* slow. I have 6 more disks to add, and the RAID has to be rebuilt in any case, but first I would like to solicit general advice. I know that's little data to go on, and I believe in the scientific method, but in this case I don't have the time to make many iterations. My questions are simple, but in my reading I have not been able to find definitive answers: 1) Should I switch to RAID 10 for performance? I see things like "RAID 5 is bad for a DB" and "RAID 5 is slow with <= 6 drives" but I see little on RAID 6. RAID 6 was the original choice for more usable space with good redundancy. My current performance is 85MB/s write, 151 MB/s reads (using dd of 2xRAM per http://www.westnet.com/~gsmith/content/postgresql/pg-disktesting.htm). Normally I'd agree with the others and recommend RAID10 - but you say you have an OLAP workload - if it is *heavily* read biased you may get better performance with RAID5 (more effective disks to read from). Having said that, your sequential read performance right now is pretty low (151 MB/s - should be double this), which may point to an issue with this controller. Unfortunately this *may* be important for an OLAP workload (seq scans of big tables). 2) Should I configure the ext3 file system with noatime and/or data=writeback or data=ordered? My controller has a battery, the logical drive has write cache enabled (write-back), and the physical devices have write cache disabled (write-through). Probably wise to use noatime. If you have a heavy write workload (i.e. so what I just wrote above does *not* apply), then you might find adjusting the ext3 commit interval upwards from its default of 5 seconds can help (I'm doing some testing at the moment and commit=20 seemed to improve performance by about 5-10%). 3) Do I just need to spend more time configuring postgresql? My non-default settings were largely generated by pgtune-0.9.3: max_locks_per_transaction = 128 # manual; avoiding "out of shared memory" default_statistics_target = 100 maintenance_work_mem = 1GB constraint_exclusion = on checkpoint_completion_target = 0.9 effective_cache_size = 16GB work_mem = 352MB wal_buffers = 32MB checkpoint_segments = 64 shared_buffers = 2316MB max_connections = 32 Possibly higher checkpoint_segments and lower wal_buffers (I recall someone - maybe Greg - suggesting that there was no benefit in having the latter > 10MB). I wonder about setting shared_buffers higher - how large is the database? Cheers Mark -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Advice configuring ServeRAID 8k for performance
On 06/08/10 11:58, Alan Hodgson wrote: On Thursday, August 05, 2010, Mark Kirkwood wrote: Normally I'd agree with the others and recommend RAID10 - but you say you have an OLAP workload - if it is *heavily* read biased you may get better performance with RAID5 (more effective disks to read from). Having said that, your sequential read performance right now is pretty low (151 MB/s - should be double this), which may point to an issue with this controller. Unfortunately this *may* be important for an OLAP workload (seq scans of big tables). Probably a low (default) readahead limitation. ext3 doesn't help but it can usually get up over 400MB/sec. Doubt it's the controller. Yeah - good suggestion, so cranking up readahead (man blockdev) and retesting is recommended. Cheers Mark -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
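For reference, readahead can be inspected and raised with blockdev - the device name below is a placeholder for whatever the array presents to the OS, and the value is in 512-byte sectors (so 4096 = 2MB) and is only a starting point:

  blockdev --getra /dev/sda        # show current readahead (in sectors)
  blockdev --setra 4096 /dev/sda   # try 2MB readahead, then repeat the dd test

Note the setting is not persistent across reboots, so once a good value is found it needs to go in an init/rc script.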
Re: [PERFORM] Advice configuring ServeRAID 8k for performance
On 06/08/10 12:31, Mark Kirkwood wrote: On 06/08/10 11:58, Alan Hodgson wrote: On Thursday, August 05, 2010, Mark Kirkwood wrote: Normally I'd agree with the others and recommend RAID10 - but you say you have an OLAP workload - if it is *heavily* read biased you may get better performance with RAID5 (more effective disks to read from). Having said that, your sequential read performance right now is pretty low (151 MB/s - should be double this), which may point to an issue with this controller. Unfortunately this *may* be important for an OLAP workload (seq scans of big tables). Probably a low (default) readahead limitation. ext3 doesn't help but it can usually get up over 400MB/sec. Doubt it's the controller. Yeah - good suggestion, so cranking up readahead (man blockdev) and retesting is recommended. ... sorry, it just occurred to me to wonder about the stripe or chunk size used in the array, as making this too small can also severely hamper sequential performance. Cheers Mark -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
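How to check the chunk size depends on the controller - a hardware card like the ServeRAID 8k needs its own management utility - but as an illustration, on Linux software RAID it is reported by mdadm:

  mdadm --detail /dev/md0 | grep -i chunk    # e.g. "Chunk Size : 256K"

Larger chunks are generally friendlier to big sequential scans, as noted above.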
Re: [PERFORM] Very poor performance
On 18/08/10 06:19, Kevin Grittner wrote: Since you haven't set effective_cache_size, you're discouraging some types of plans which might be worth considering. This should normally be set to the sum of your shared_buffers setting and whatever is cached by the OS; try setting effective_cache_size to 15MB. Speaking of shared_buffers, are you really at the default for that, too? If so, try setting it to somewhere between 1GB and 4GB. (I would test at 1, 2, and 4 if possible, since the best setting is dependent on workload.) Kevin - I'm guessing you meant to suggest setting effective_cache_size to 15GB (not 15MB) Cheers Mark -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
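i.e. something along these lines in postgresql.conf - the exact figures are illustrative and depend on how much memory the box actually has free or cached:

  shared_buffers = 2GB           # test at 1GB, 2GB and 4GB as suggested
  effective_cache_size = 15GB    # roughly shared_buffers plus the OS page cache

effective_cache_size is only a planner hint (it allocates nothing), so it is cheap to experiment with; shared_buffers changes need a restart.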
Re: [PERFORM] Memory usage - indexes
On 24/09/10 09:50, Tobias Brox wrote: We've come to a tipping point with one of our database servers, it's generally quite loaded but up until recently it was handling the load well - but now we're seeing that it struggles to process all the selects fast enough. Sometimes we're observing some weird lock-like behaviour (see my other post on that), but most of the time the database server is just not capable of handling the load fast enough (causing the queries to pile up in the pg_stat_activity-view). My main hypothesis is that all the important indexes would fit snugly into the memory before, and now they don't. We'll eventually get the server moved over to new and improved hardware, but while waiting for that to happen we need to focus on reducing the memory footprint of the database. I have some general questions now ... 1) Are there any good ways to verify my hypothesis? Some months ago I thought of running some small memory-gobbling program on the database server just to see how much memory I could remove before we would see indications of the database being overloaded. It seems a bit radical, but I think the information learned from such an experiment would be very useful ... and we never managed to set up any testing environment that faithfully replicates production traffic. Anyway, it's sort of too late now that we're already observing performance problems even without the memory gobbling script running. 2) I've seen it discussed earlier on this list ... shared_buffers vs OS caches. Some claim that it has very little effect to adjust the size of the shared buffers. Anyway, isn't it a risk that memory is wasted because important data is stored both in the OS cache and the shared buffers? What would happen if using almost all the available memory for shared buffers? Or turn it down to a bare minimum and let the OS do almost all the cache handling? 3) We're discussing dropping some overlapping indexes ... i.e. to drop one out of two indexes looking like this: some_table(a) some_table(a,b) Would the query "select * from some_table where a=?" run slower if we drop the first index? Significantly? (in our situation I found that the number of distinct b's for each a is low and that the usage stats on the second index are quite low compared with the first one, so I think we'll drop the second index). 4) We're discussing dropping other indexes. Does it make sense at all as long as we're not experiencing problems with inserts/updates? I suppose that if the index isn't used it will remain on disk and won't affect the memory usage ... but what if the index is rarely used ... wouldn't it be better to do a seqscan on a table that is frequently accessed and mostly in memory than to consult an index that is stored on the disk? Sorry for all the stupid questions ;-) All good questions! Before (or maybe as well as) looking at index sizes vs memory I'd check to see if any of your commonly run queries have suddenly started to use different plans due to data growth, e.g.: - index scan to seq scan (perhaps because effective_cache_size is too small now) - hash agg to sort (work_mem too small now) We had a case of the 1st point happen here a while ago, and the symptoms looked very like what you are describing. Re index size, you could try indexes like: some_table(a) some_table(b) which may occupy less space, and the optimizer can bitmap AND/OR them to work like the compound index some_table(a,b).
regards Mark -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
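To illustrate the index suggestion at the end of that message - a sketch only, reusing the hypothetical some_table(a, b) names from the example:

  -- separate single-column indexes the planner can bitmap-AND/OR together
  CREATE INDEX some_table_a_idx ON some_table (a);
  CREATE INDEX some_table_b_idx ON some_table (b);

  -- before dropping an existing index, check how often it is actually used
  SELECT indexrelname, idx_scan
  FROM pg_stat_user_indexes
  WHERE relname = 'some_table';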
Re: [PERFORM] Memory usage - indexes
On 29/09/10 19:41, Tobias Brox wrote: I just got this crazy, stupid or maybe genius idea :-) Now, my idea is to drop that fat index and replace it with conditional indexes for a dozen heavy users - like those: acc_trans(trans_type, created) where customer_id=224885; acc_trans(trans_type, created) where customer_id=643112; acc_trans(trans_type, created) where customer_id=15; or maybe like this: acc_trans(customer_id, trans_type, created) where customer_id in ( ... ); Any comments? My sysadmin is worried that it would be too big a hit on performance when doing inserts. It may also cause more overhead when planning the queries. Is that significant? Is this idea genius or stupid or just somewhere in between? Yeah, I think the idea of trying to have a few smaller indexes for the 'hot' customers is a good idea. However I am wondering whether it is worth testing if just using single column indexes - and letting the planner bitmap scan/merge the smaller indexes - is actually more efficient, i.e.: acc_trans(trans_type); acc_trans(created); acc_trans(customer_id); It may mean that you have to scrutinize your effective_cache_size and work_mem parameters, but it could possibly be simpler and more flexible. regards Mark -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
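Spelled out as DDL, the two alternatives look roughly like this (the index names are made up; the customer ids are just the ones quoted above):

  -- partial ("conditional") indexes for a few heavy customers
  CREATE INDEX acc_trans_cust_224885_idx ON acc_trans (trans_type, created) WHERE customer_id = 224885;
  CREATE INDEX acc_trans_cust_643112_idx ON acc_trans (trans_type, created) WHERE customer_id = 643112;

  -- or plain single-column indexes, relying on the planner to bitmap-AND them
  CREATE INDEX acc_trans_trans_type_idx ON acc_trans (trans_type);
  CREATE INDEX acc_trans_created_idx ON acc_trans (created);
  CREATE INDEX acc_trans_customer_id_idx ON acc_trans (customer_id);

Remember a partial index is only usable when the query's WHERE clause matches the index predicate (e.g. customer_id = 224885), which is why one is needed per hot customer.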
Re: [PERFORM] Memory usage - indexes
On 30/09/10 01:09, Tobias Brox wrote: With the most popular trans type it chose another plan and it took more than 3s (totally unacceptable): Try tweaking effective_cache_size up a bit and see what happens - I've found these bitmap plans to be sensitive to it sometimes. regards Mark -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
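This can be tried per-session without touching postgresql.conf - the values below and the query itself are placeholders standing in for the real ones from the thread:

  SET effective_cache_size = '12GB';
  EXPLAIN ANALYZE
  SELECT * FROM acc_trans
  WHERE customer_id = 15 AND trans_type = 6
  ORDER BY created DESC LIMIT 100;
  -- then try a larger value and compare the chosen plans
  SET effective_cache_size = '20GB';

Once a value that reliably produces the good plan is found, it can be made permanent in postgresql.conf (it takes effect on reload, no restart needed).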
Re: [PERFORM] Slow count(*) again...
On 13/10/10 19:47, Neil Whelchel wrote: Nope... So, possible conclusions are: 1. Even with VACUUM, database table speed degrades as tables are updated. 2. Time testing on a freshly INSERTed table gives results that are not real-world. 3. Filesystem defragmentation helps (some). 4. Cache only makes a small difference once a table has been UPDATEd. I am going to leave this configuration running for the next day or so. This way I can try any suggestions and play with any more ideas that I have. I will try these same tests on ext4 later, along with any good suggested tests. I will try MySQL with the same data with both XFS and ext4. -Neil- I think the major effect you are seeing here is that the UPDATE has made the table twice as big on disk (even after VACUUM etc), and it has gone from fitting in RAM to not fitting in RAM - so it cannot be effectively cached anymore. This would not normally happen in real life (assuming UPDATEs only modify a small part of a table per transaction). However administrative updates (e.g. 'oh! - ref 1 should now be ref 2, please update everything') *will* cause the table size to double. This is an artifact of Postgres's non-overwriting storage manager - MySQL will update in place and you will not see this. Try VACUUM FULL on the table and retest. regards Mark -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
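e.g. (the table name is a placeholder for the one used in the test):

  SELECT pg_size_pretty(pg_relation_size('the_table'));  -- size after the UPDATE run
  VACUUM FULL the_table;   -- compacts the table and reclaims the dead space (takes an exclusive lock)
  SELECT pg_size_pretty(pg_relation_size('the_table'));  -- should be back near the original size

If the compacted table fits in RAM again, the count(*) timings should return to roughly what they were on the freshly loaded data.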