Re: [PERFORM] postgresql geqo optimization
On Fri, Feb 10, 2006 at 08:46:14PM -0500, uwcssa wrote:
> I am wondering if anyone here ever had complex queries that the GEQO fails
> to work properly, i.e., finds a terrible query plan as compared to one
> found by the DP optimizer (by forcing PostgreSQL to always use DP). This is
> important to me since I am trying to see what type of queries will be worth
> spending a lot of time doing a thorough DP optimization (if it is going to
> be executed again and again).

There have been a few problems earlier on this list which might have been the GEQO's fault; search the list archives for "geqo" or "genetic", and you should be able to find them quite easily.

/* Steinar */
--
Homepage: http://www.sesse.net/
Re: [PERFORM] [HACKERS] What do the Windows pg hackers out there like for dev tools?
Ron wrote:
> Subject line says it all. I'm going to be testing changes under both Linux
> and WinXP, so I'm hoping those of you that do M$ hacking will pass along
> your list of suggestions and/or favorite (and hated so I know what to
> avoid) tools.

Testing only? So you really only need to build and run on Windows... I was doing exactly this about a year ago and used MinGW. The only annoyance was that I could compile everything on Linux in about 3 minutes (P4 2.8GHz), but had to wait about 60-90 minutes for the same thing on Windows 2003 Server (also a P4 2.8GHz...). So I used to build a 'go for coffee' task into the build and test cycle.

Cheers
Mark
[PERFORM] postgresql geqo optimization
I have a question with regard to the GEQO optimizer of PostgreSQL. For complex queries joining more than 12 tables (12 is the default threshold), the PostgreSQL planner will by default not use the dynamic-programming-style optimizer. Instead, it uses a genetic algorithm to compute a sub-optimal query plan. The reason is that GEQO takes a fraction of a second to find a query plan, while the DP-style optimizer can take minutes or even hours to optimize a complex query with a large join degree.

I am wondering if anyone here has ever had complex queries for which GEQO fails to work properly, i.e., finds a terrible query plan compared to the one found by the DP optimizer (by forcing PostgreSQL to always use DP). This is important to me since I am trying to see what types of queries are worth spending a lot of time on a thorough DP optimization (if the query is going to be executed again and again).

Thanks a lot!
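For anyone who wants to run that comparison themselves, the planner's choice can be toggled per session through the GEQO settings; a minimal sketch (the threshold value of 20 is only an example):

    -- force the exhaustive dynamic-programming join planner, regardless of join size
    SET geqo TO off;
    -- ...run EXPLAIN ANALYZE on the large join here and note the plan and cost...

    -- or keep GEQO enabled but raise the point at which it takes over
    SET geqo TO on;
    SET geqo_threshold = 20;   -- joins with fewer than 20 FROM items are planned exhaustively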
Re: [PERFORM] joining two tables slow due to sequential scan
OK, if I'm reading this correctly, it looks like the planner is choosing a sequential scan because it expects 48,000 rows for that patientidentifier, but it's actually only getting 3. The planner has the number of rows right for the sequential scan, so it seems like the stats are up to date. I would try increasing the stats for the patientidentifier column with 'alter table ... set statistics ...' or increasing the default_statistics_target for the whole DB. Once you have changed the stats, I believe you need to run ANALYZE again.

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tim Jones
Sent: Friday, February 10, 2006 4:59 PM
To: Tom Lane
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] joining two tables slow due to sequential scan

ok, here is the real db. The first query I had seems to make no sense, because it is only fast if I limit the rows, since almost all rows have status = 'AC'. For the second query, the tables both have about 10 million rows and it takes a long time, as you can see, but this person only has approx 160 total documents.

QUERY PLAN
---
Hash Join (cost=84813.14..1510711.97 rows=48387 width=555) (actual time=83266.854..91166.315 rows=3 loops=1)
  Hash Cond: ("outer".documentidentifier = "inner".dssdocumentidentifier)
  ->  Seq Scan on documentversions (cost=0.00..269141.98 rows=9677398 width=415) (actual time=0.056..49812.459 rows=9677398 loops=1)
  ->  Hash (cost=83660.05..83660.05 rows=48036 width=140) (actual time=10.833..10.833 rows=3 loops=1)
        ->  Bitmap Heap Scan on clinicaldocuments (cost=301.13..83660.05 rows=48036 width=140) (actual time=0.243..0.258 rows=3 loops=1)
              Recheck Cond: (patientidentifier = 690193)
              ->  Bitmap Index Scan on ix_cdocpid (cost=0.00..301.13 rows=48036 width=0) (actual time=0.201..0.201 rows=3 loops=1)
                    Index Cond: (patientidentifier = 690193)
Total runtime: 91166.540 ms

Tim Jones
Healthcare Project Manager
Optio Software, Inc.
(770) 576-3555

-Original Message-
From: Tom Lane [mailto:[EMAIL PROTECTED]
Sent: Friday, February 10, 2006 5:52 PM
To: Tim Jones
Cc: Scott Marlowe; Dave Dutcher; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] joining two tables slow due to sequential scan

"Tim Jones" <[EMAIL PROTECTED]> writes:
> QUERY PLAN
> 'Hash Join (cost=899.83..4384.17 rows=482 width=1350) (actual
> time=0.203..0.203 rows=0 loops=1)'
> ...
> 'Total runtime: 0.392 ms'

Hardly seems like evidence of a performance problem ...

regards, tom lane
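For reference, a minimal sketch of that statistics change (table and column names taken from the posted plan; the target of 1000 is just an example, and ANALYZE has to be re-run afterwards):

    -- raise the per-column statistics target, then refresh the stats
    ALTER TABLE clinicaldocuments
        ALTER COLUMN patientidentifier SET STATISTICS 1000;
    ANALYZE clinicaldocuments;

    -- alternatively, raise default_statistics_target in postgresql.conf
    -- for the whole database and re-run ANALYZE on the affected tables.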
Re: [PERFORM] joining two tables slow due to sequential scan
ok, here is the real db. The first query I had seems to make no sense, because it is only fast if I limit the rows, since almost all rows have status = 'AC'. For the second query, the tables both have about 10 million rows and it takes a long time, as you can see, but this person only has approx 160 total documents.

QUERY PLAN
---
Hash Join (cost=84813.14..1510711.97 rows=48387 width=555) (actual time=83266.854..91166.315 rows=3 loops=1)
  Hash Cond: ("outer".documentidentifier = "inner".dssdocumentidentifier)
  ->  Seq Scan on documentversions (cost=0.00..269141.98 rows=9677398 width=415) (actual time=0.056..49812.459 rows=9677398 loops=1)
  ->  Hash (cost=83660.05..83660.05 rows=48036 width=140) (actual time=10.833..10.833 rows=3 loops=1)
        ->  Bitmap Heap Scan on clinicaldocuments (cost=301.13..83660.05 rows=48036 width=140) (actual time=0.243..0.258 rows=3 loops=1)
              Recheck Cond: (patientidentifier = 690193)
              ->  Bitmap Index Scan on ix_cdocpid (cost=0.00..301.13 rows=48036 width=0) (actual time=0.201..0.201 rows=3 loops=1)
                    Index Cond: (patientidentifier = 690193)
Total runtime: 91166.540 ms

Tim Jones
Healthcare Project Manager
Optio Software, Inc.
(770) 576-3555

-Original Message-
From: Tom Lane [mailto:[EMAIL PROTECTED]
Sent: Friday, February 10, 2006 5:52 PM
To: Tim Jones
Cc: Scott Marlowe; Dave Dutcher; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] joining two tables slow due to sequential scan

"Tim Jones" <[EMAIL PROTECTED]> writes:
> QUERY PLAN
> 'Hash Join (cost=899.83..4384.17 rows=482 width=1350) (actual
> time=0.203..0.203 rows=0 loops=1)'
> ...
> 'Total runtime: 0.392 ms'

Hardly seems like evidence of a performance problem ...

regards, tom lane
Re: [PERFORM] joining two tables slow due to sequential scan
"Tim Jones" <[EMAIL PROTECTED]> writes: > QUERY PLAN > 'Hash Join (cost=899.83..4384.17 rows=482 width=1350) (actual > time=0.203..0.203 rows=0 loops=1)' > ... > 'Total runtime: 0.392 ms' Hardly seems like evidence of a performance problem ... regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] joining two tables slow due to sequential scan
On Fri, 2006-02-10 at 16:43, Tim Jones wrote: > oops > > QUERY PLAN > 'Hash Join (cost=899.83..4384.17 rows=482 width=1350) (actual > time=0.203..0.203 rows=0 loops=1)' > ' Hash Cond: ("outer".documentidentifier = > "inner".dssdocumentidentifier)' > ' -> Seq Scan on documentversions (cost=0.00..2997.68 rows=96368 > width=996) (actual time=0.007..0.007 rows=1 loops=1)' > ' -> Hash (cost=898.62..898.62 rows=482 width=354) (actual > time=0.161..0.161 rows=0 loops=1)' > '-> Bitmap Heap Scan on clinicaldocuments (cost=4.69..898.62 > rows=482 width=354) (actual time=0.159..0.159 rows=0 loops=1)' > ' Recheck Cond: (patientidentifier = 123)' > ' -> Bitmap Index Scan on ix_cdocpid (cost=0.00..4.69 > rows=482 width=0) (actual time=0.153..0.153 rows=0 loops=1)' > 'Index Cond: (patientidentifier = 123)' > 'Total runtime: 0.392 ms' > > note I have done these on a smaller db than what I am using but the > plans are the same H. We really need to see what's happening on the real database to see what's going wrong. i.e. if the real database thinks it'll get 30 rows and it gets back 5,000,000 that's a problem. The query planner in pgsql is cost based, so until you have real data underneath it, and analyze it, you can't really say how it will behave for you. I.e. small test sets don't work. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] joining two tables slow due to sequential scan
"Tim Jones" <[EMAIL PROTECTED]> writes: > QUERY PLAN > 'Hash Join (cost=899.83..4384.17 rows=482 width=1350)' > ' Hash Cond: ("outer".documentidentifier = > "inner".dssdocumentidentifier)' This is not EXPLAIN ANALYZE output. Also, the rowcount estimates seem far enough off in the other query to make me wonder how long it's been since you ANALYZEd the tables... More generally, though, I don't see anything particularly wrong with this query plan. You're selecting enough of the table that an indexscan isn't necessarily a good plan. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] joining two tables slow due to sequential scan
oops QUERY PLAN 'Hash Join (cost=899.83..4384.17 rows=482 width=1350) (actual time=0.203..0.203 rows=0 loops=1)' ' Hash Cond: ("outer".documentidentifier = "inner".dssdocumentidentifier)' ' -> Seq Scan on documentversions (cost=0.00..2997.68 rows=96368 width=996) (actual time=0.007..0.007 rows=1 loops=1)' ' -> Hash (cost=898.62..898.62 rows=482 width=354) (actual time=0.161..0.161 rows=0 loops=1)' '-> Bitmap Heap Scan on clinicaldocuments (cost=4.69..898.62 rows=482 width=354) (actual time=0.159..0.159 rows=0 loops=1)' ' Recheck Cond: (patientidentifier = 123)' ' -> Bitmap Index Scan on ix_cdocpid (cost=0.00..4.69 rows=482 width=0) (actual time=0.153..0.153 rows=0 loops=1)' 'Index Cond: (patientidentifier = 123)' 'Total runtime: 0.392 ms' note I have done these on a smaller db than what I am using but the plans are the same Tim Jones Healthcare Project Manager Optio Software, Inc. (770) 576-3555 -Original Message- From: Scott Marlowe [mailto:[EMAIL PROTECTED] Sent: Friday, February 10, 2006 5:39 PM To: Tim Jones Cc: Dave Dutcher; pgsql-performance@postgresql.org Subject: Re: [PERFORM] joining two tables slow due to sequential scan On Fri, 2006-02-10 at 16:37, Tim Jones wrote: > for first query > > QUERY PLAN > 'Limit (cost=4.69..88.47 rows=10 width=1350) (actual > time=32.195..32.338 rows=10 loops=1)' > ' -> Nested Loop (cost=4.69..4043.09 rows=482 width=1350) (actual > time=32.190..32.316 rows=10 loops=1)' > '-> Bitmap Heap Scan on documentversions (cost=4.69..1139.40 > rows=482 width=996) (actual time=32.161..32.171 rows=10 loops=1)' > ' Recheck Cond: (documentstatus = ''AC''::bpchar)' > ' -> Bitmap Index Scan on ix_docstatus (cost=0.00..4.69 > rows=482 width=0) (actual time=31.467..31.467 rows=96368 loops=1)' > 'Index Cond: (documentstatus = ''AC''::bpchar)' > '-> Index Scan using ix_cdocdid on clinicaldocuments > (cost=0.00..6.01 rows=1 width=354) (actual time=0.006..0.007 rows=1 > loops=10)' > ' Index Cond: ("outer".documentidentifier = > clinicaldocuments.dssdocumentidentifier)' > > > for second query > > QUERY PLAN > 'Hash Join (cost=899.83..4384.17 rows=482 width=1350)' > ' Hash Cond: ("outer".documentidentifier = > "inner".dssdocumentidentifier)' > ' -> Seq Scan on documentversions (cost=0.00..2997.68 rows=96368 > width=996)' > ' -> Hash (cost=898.62..898.62 rows=482 width=354)' > '-> Bitmap Heap Scan on clinicaldocuments (cost=4.69..898.62 > rows=482 width=354)' > ' Recheck Cond: (patientidentifier = 123)' > ' -> Bitmap Index Scan on ix_cdocpid (cost=0.00..4.69 > rows=482 width=0)' > 'Index Cond: (patientidentifier = 123)' OK, the first one is explain analyze, but the second one is just plain explain... ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Large Database Design Help
On Fri, 2006-02-10 at 16:39, Ragnar wrote: > On Fri, 2006-02-10 at 11:24 +0100, Markus Schaber wrote: > > > For lots non-read-only database workloads, RAID5 is a performance > > killer. Raid 1/0 might be better, or having two mirrors of two disks > > each, the first mirror holding system, swap, and the PostgreSQL WAL > > files, the second one holding the data. > > I was under the impression that it is preferable to keep the WAL on > its own spindles with no other activity there, to take full advantage > of the sequential nature of the WAL writes. > > That would mean one mirror for the WAL, and one for the rest. > This, of course, may sometimes be too much wasted disk space, as the WAL > typically will not use a whole disk, so you might partition this mirror > into a small ext2 filesystem for WAL, and use the rest for files seldom > accessed, such as backups. Well, on most database servers, the actual access to the OS and swap drives should drop to about zero over time, so this is a workable solution if you've only got enough drives / drive slots for two mirrors. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] joining two tables slow due to sequential scan
On Fri, 2006-02-10 at 16:37, Tim Jones wrote: > for first query > > QUERY PLAN > 'Limit (cost=4.69..88.47 rows=10 width=1350) (actual > time=32.195..32.338 rows=10 loops=1)' > ' -> Nested Loop (cost=4.69..4043.09 rows=482 width=1350) (actual > time=32.190..32.316 rows=10 loops=1)' > '-> Bitmap Heap Scan on documentversions (cost=4.69..1139.40 > rows=482 width=996) (actual time=32.161..32.171 rows=10 loops=1)' > ' Recheck Cond: (documentstatus = ''AC''::bpchar)' > ' -> Bitmap Index Scan on ix_docstatus (cost=0.00..4.69 > rows=482 width=0) (actual time=31.467..31.467 rows=96368 loops=1)' > 'Index Cond: (documentstatus = ''AC''::bpchar)' > '-> Index Scan using ix_cdocdid on clinicaldocuments > (cost=0.00..6.01 rows=1 width=354) (actual time=0.006..0.007 rows=1 > loops=10)' > ' Index Cond: ("outer".documentidentifier = > clinicaldocuments.dssdocumentidentifier)' > > > for second query > > QUERY PLAN > 'Hash Join (cost=899.83..4384.17 rows=482 width=1350)' > ' Hash Cond: ("outer".documentidentifier = > "inner".dssdocumentidentifier)' > ' -> Seq Scan on documentversions (cost=0.00..2997.68 rows=96368 > width=996)' > ' -> Hash (cost=898.62..898.62 rows=482 width=354)' > '-> Bitmap Heap Scan on clinicaldocuments (cost=4.69..898.62 > rows=482 width=354)' > ' Recheck Cond: (patientidentifier = 123)' > ' -> Bitmap Index Scan on ix_cdocpid (cost=0.00..4.69 > rows=482 width=0)' > 'Index Cond: (patientidentifier = 123)' OK, the first one is explain analyze, but the second one is just plain explain... ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Large Database Design Help
On Fri, 2006-02-10 at 11:24 +0100, Markus Schaber wrote: > For lots non-read-only database workloads, RAID5 is a performance > killer. Raid 1/0 might be better, or having two mirrors of two disks > each, the first mirror holding system, swap, and the PostgreSQL WAL > files, the second one holding the data. I was under the impression that it is preferable to keep the WAL on its own spindles with no other activity there, to take full advantage of the sequential nature of the WAL writes. That would mean one mirror for the WAL, and one for the rest. This, of course, may sometimes be too much wasted disk space, as the WAL typically will not use a whole disk, so you might partition this mirror into a small ext2 filesystem for WAL, and use the rest for files seldom accessed, such as backups. gnari ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] joining two tables slow due to sequential scan
for first query QUERY PLAN 'Limit (cost=4.69..88.47 rows=10 width=1350) (actual time=32.195..32.338 rows=10 loops=1)' ' -> Nested Loop (cost=4.69..4043.09 rows=482 width=1350) (actual time=32.190..32.316 rows=10 loops=1)' '-> Bitmap Heap Scan on documentversions (cost=4.69..1139.40 rows=482 width=996) (actual time=32.161..32.171 rows=10 loops=1)' ' Recheck Cond: (documentstatus = ''AC''::bpchar)' ' -> Bitmap Index Scan on ix_docstatus (cost=0.00..4.69 rows=482 width=0) (actual time=31.467..31.467 rows=96368 loops=1)' 'Index Cond: (documentstatus = ''AC''::bpchar)' '-> Index Scan using ix_cdocdid on clinicaldocuments (cost=0.00..6.01 rows=1 width=354) (actual time=0.006..0.007 rows=1 loops=10)' ' Index Cond: ("outer".documentidentifier = clinicaldocuments.dssdocumentidentifier)' for second query QUERY PLAN 'Hash Join (cost=899.83..4384.17 rows=482 width=1350)' ' Hash Cond: ("outer".documentidentifier = "inner".dssdocumentidentifier)' ' -> Seq Scan on documentversions (cost=0.00..2997.68 rows=96368 width=996)' ' -> Hash (cost=898.62..898.62 rows=482 width=354)' '-> Bitmap Heap Scan on clinicaldocuments (cost=4.69..898.62 rows=482 width=354)' ' Recheck Cond: (patientidentifier = 123)' ' -> Bitmap Index Scan on ix_cdocpid (cost=0.00..4.69 rows=482 width=0)' 'Index Cond: (patientidentifier = 123)' thnx Tim Jones Healthcare Project Manager Optio Software, Inc. (770) 576-3555 From: Dave Dutcher [mailto:[EMAIL PROTECTED] Sent: Friday, February 10, 2006 5:15 PM To: Tim Jones; pgsql-performance@postgresql.org Subject: RE: [PERFORM] joining two tables slow due to sequential scan What version of postgres are you using? Can you post the output from EXPLAIN ANALYZE? -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tim Jones Sent: Friday, February 10, 2006 4:07 PM To: pgsql-performance@postgresql.org Subject: [PERFORM] joining two tables slow due to sequential scan I am trying to join two tables and keep getting a sequential scan in the plan even though there is an index on the columns I am joining on. Basically this the deal ... I have two tables with docid in them which is what I am using for the join. ClinicalDocs ... (no primary key) though it does not help if I make docid primary key docid integer (index) patientid integer (index) visitid integer (index) ... Documentversions docid integer (index) docversionnumber (index) docversionidentifier (primary key) It seems to do an index scan if I put the primary key as docid. This is what occurs when I link on the patid from ClinicalDocs to patient table. However I can not make the docid primary key because it gets repeated depending on how may versions of a document I have. I have tried using a foreign key on documentversions with no sucess. In addition this query select * from documentversions join clinicaldocuments on documentversions.documentidentifier = clinicaldocuments.dssdocumentidentifier where documentversions.documentstatus = 'AC'; does index scan but if I change the order e.g select * from clinicaldocuments join documentversions on clinicaldocuments.dssdocumentidentifier = documentversions .documentidentifier where clinicaldocuments.patientidentifier= 123; does sequential scan what I need is bottom query it is extremely slow ... Any ideas ? Tim Jones Healthcare Project Manager Optio Software, Inc. (770) 576-3555 ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] joining two tables slow due to sequential scan
On Fri, 2006-02-10 at 16:35, Tim Jones wrote: > OK. I'm gonna make a couple of guesses here: > > 1: clinicaldocuments.patientidentifier is an int8 and you're running > 7.4 or before. > > -- nope int4 and 8.1 > > 2: There are more rows with clinicaldocuments.patientidentifier= 123 > than with documentversions.documentstatus = 'AC'. > > -- nope generally speaking all statuses are 'AC' > > 3: documentversions.documentidentifier and > clinicaldocuments.dssdocumentidentifier are not the same type. > > -- nope both int4 OK then, I guess we'll need to see the explain analyze output of both of those queries. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] joining two tables slow due to sequential scan
OK. I'm gonna make a couple of guesses here: 1: clinicaldocuments.patientidentifier is an int8 and you're running 7.4 or before. -- nope int4 and 8.1 2: There are more rows with clinicaldocuments.patientidentifier= 123 than with documentversions.documentstatus = 'AC'. -- nope generally speaking all statuses are 'AC' 3: documentversions.documentidentifier and clinicaldocuments.dssdocumentidentifier are not the same type. -- nope both int4 Any of those things true? ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] joining two tables slow due to sequential scan
On Fri, 2006-02-10 at 16:06, Tim Jones wrote: > > I am trying to join two tables and keep getting a sequential scan in > the plan even though there is an index on the columns I am joining > on. Basically this the deal ... I have two tables with docid in them > which is what I am using for the join. > SNIP > select * from documentversions join clinicaldocuments on > documentversions.documentidentifier > = clinicaldocuments.dssdocumentidentifier where > documentversions.documentstatus = 'AC'; > > does index scan > but if I change the order e.g > > select * from clinicaldocuments join documentversions on > clinicaldocuments.dssdocumentidentifier > = documentversions .documentidentifier where > clinicaldocuments.patientidentifier= 123; OK. I'm gonna make a couple of guesses here: 1: clinicaldocuments.patientidentifier is an int8 and you're running 7.4 or before. 2: There are more rows with clinicaldocuments.patientidentifier= 123 than with documentversions.documentstatus = 'AC'. 3: documentversions.documentidentifier and clinicaldocuments.dssdocumentidentifier are not the same type. Any of those things true? ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] joining two tables slow due to sequential scan
What version of postgres are you using? Can you post the output from EXPLAIN ANALYZE? -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tim Jones Sent: Friday, February 10, 2006 4:07 PM To: pgsql-performance@postgresql.org Subject: [PERFORM] joining two tables slow due to sequential scan I am trying to join two tables and keep getting a sequential scan in the plan even though there is an index on the columns I am joining on. Basically this the deal ... I have two tables with docid in them which is what I am using for the join. ClinicalDocs ... (no primary key) though it does not help if I make docid primary key docid integer (index) patientid integer (index) visitid integer (index) ... Documentversions docid integer (index) docversionnumber (index) docversionidentifier (primary key) It seems to do an index scan if I put the primary key as docid. This is what occurs when I link on the patid from ClinicalDocs to patient table. However I can not make the docid primary key because it gets repeated depending on how may versions of a document I have. I have tried using a foreign key on documentversions with no sucess. In addition this query select * from documentversions join clinicaldocuments on documentversions.documentidentifier = clinicaldocuments.dssdocumentidentifier where documentversions.documentstatus = 'AC'; does index scan but if I change the order e.g select * from clinicaldocuments join documentversions on clinicaldocuments.dssdocumentidentifier = documentversions .documentidentifier where clinicaldocuments.patientidentifier= 123; does sequential scan what I need is bottom query it is extremely slow ... Any ideas ? Tim Jones Healthcare Project Manager Optio Software, Inc. (770) 576-3555
[PERFORM] joining two tables slow due to sequential scan
I am trying to join two tables and keep getting a sequential scan in the plan even though there is an index on the columns I am joining on. Basically this is the deal ... I have two tables with docid in them, which is what I am using for the join.

ClinicalDocs ... (no primary key; it does not help if I make docid the primary key)
  docid integer (index)
  patientid integer (index)
  visitid integer (index)
  ...

Documentversions
  docid integer (index)
  docversionnumber (index)
  docversionidentifier (primary key)

It seems to do an index scan if I put the primary key as docid. This is what occurs when I link on the patid from ClinicalDocs to the patient table. However, I cannot make docid the primary key because it gets repeated depending on how many versions of a document I have. I have tried using a foreign key on documentversions with no success.

In addition, this query

  select * from documentversions join clinicaldocuments
    on documentversions.documentidentifier = clinicaldocuments.dssdocumentidentifier
    where documentversions.documentstatus = 'AC';

does an index scan, but if I change the order, e.g.

  select * from clinicaldocuments join documentversions
    on clinicaldocuments.dssdocumentidentifier = documentversions.documentidentifier
    where clinicaldocuments.patientidentifier = 123;

it does a sequential scan. What I need is the bottom query, and it is extremely slow ... Any ideas?

Tim Jones
Healthcare Project Manager
Optio Software, Inc.
(770) 576-3555
Re: [PERFORM] help required in design of database
On Fri, Feb 10, 2006 at 12:20:34PM -0800, david drummard wrote: > 1) create a new table every time a new feed file comes in. Create table with > indexes. Use the copy command to dump the data into the table. > 2) rename the current table to some old table name and rename the new table > to current table name so that applications can access them directly. That sounds like a working plan. > Should i create indexes before or after import ? I need to do this in > shortest period of time so that the data is always uptodate. Note that > incremental updates are not possible since almost every row will be changed > in the new file. You should create indexes after the import. Remember to pump up your memory settings (maintenance_work_mem) if you want this to be quick. /* Steinar */ -- Homepage: http://www.sesse.net/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
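To make that concrete, here is a minimal sketch of one load-and-swap round along the lines of the advice above (the staging table name, the file path, and the maintenance_work_mem figure are all placeholders; COPY FROM a path reads a server-side file, so use psql's \copy for a client-side file):

    -- give index builds more sort memory for this session (value in KB)
    SET maintenance_work_mem = 524288;

    -- 1) build the new table: load first, index afterwards
    BEGIN;
    CREATE TABLE datatablenew_load (fe varchar(40), va varchar(60),
        a int, b int, c int, d int, e int, f int, g int, h int, i int,
        j int, k int, l int, m int, n int, o int, p int, q real);
    COPY datatablenew_load FROM '/path/to/feed.csv' WITH CSV;
    CREATE INDEX load_a_idx ON datatablenew_load (a);   -- ...and so on, one per queried column
    COMMIT;
    ANALYZE datatablenew_load;

    -- 2) atomic swap: applications only ever see a fully indexed table
    BEGIN;
    ALTER TABLE datatablenew RENAME TO datatablenew_old;
    ALTER TABLE datatablenew_load RENAME TO datatablenew;
    COMMIT;
    DROP TABLE datatablenew_old;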
[PERFORM] help required in design of database
Hi,

I have a unique requirement. I have a feed of 2.5 - 3 million rows of data which arrives every half an hour. Each row has 2 small string values (about 50 chars each) and 10 int values. I need searchability and the ability to run arbitrary queries on any of these values. This means I have to create an index on every column. The feed comes in as a comma-separated text file.

Here is what I am planning to do:
1) Create a new table every time a new feed file comes in. Create the table with indexes. Use the COPY command to dump the data into the table.
2) Rename the current table to some old table name and rename the new table to the current table name, so that applications can access them directly.

Note that these are read-only tables and it is fine if step 2 takes a small amount of time (it is not a mission-critical table, hence a small downtime of some seconds is fine).

My question is: what is the best way to do step (1) so that after the copy is done, the table is fully indexed and properly balanced and optimized for queries? Should I create indexes before or after the import? I need to do this in the shortest period of time so that the data is always up to date. Note that incremental updates are not possible since almost every row will be changed in the new file.

My table creation script looks like this:

  create table datatablenew(fe varchar(40), va varchar(60), a int, b int, c int, d int, e int, f int, g int, h int, i int, j int, k int, l int, m int, n int, o int, p int, q real);
  create index fe_idx on datatablenew using hash (fe);
  create index va_idx on datatablenew using hash(va);
  create index a_idx on datatablenew (a);
  ..
  create index q_idx on datatablenew(q);

Please advise.

thanks
vijay
Re: [PERFORM] Large Database Design Help
> was originally designed for Postgres 7.0 on a PIII 500MHz and some

Argh.

> 1) The database is very large, the largest table has 40 million tuples.

Is this simple types (like a few ints, text...)? How much space does it use on disk? Can it fit in RAM?

> 2) The database needs to import 10's of thousands of tuples each night
> quickly. The current method is VERY slow.

You bet, COMMIT'ing after each insert or update is about the worst that can be done. It works fine on MySQL/MyISAM (which doesn't know about commit...), so I'd guess the system designer had previous experience with MySQL.

My advice would be:
- Get a decent machine with some RAM (I guess you already knew this)...

Now, the update. I would tend to do this:
- Generate a text file with your update data, using whatever tool you like best (perl, php, python, java...)
- CREATE TEMPORARY TABLE blah ...
- COPY blah FROM your update file.

COPY is super fast. I think temporary tables don't write to the xlog, so they are also very fast. This should not take more than a few seconds for a few tens of thousands of simple rows on modern hardware. It actually takes a fraction of a second on my PC for about 9K rows with 5 INTEGERs in them. You can also add constraints on your temporary table to sanitize your data, in order to be reasonably sure that the following updates will work. The data you feed to COPY should be correct, or it will roll back; it is your script's job to escape everything.

Now you've got your data in the database. You have several options:

- You are confident that the UPDATE will work without being rolled back by some constraint violation. Therefore, you issue a big joined UPDATE to update all the rows in your main table which are also in your temp table. Then you issue an INSERT INTO ... SELECT ... to insert the ones which were not already in the big table. Joined updates can be slow if your RAM is too small and it has to thrash the disk looking for every tuple. You can cheat and CLUSTER your main table (say, once a week), so it is all in index order. Then you arrange your update data so it is in the same order (for instance, you SELECT INTO another temp table with an ORDER BY corresponding to the CLUSTER on the main table). Having both in the same order will help reduce random disk accesses.

- If you don't like this method, you might want to use the same strategy as before (i.e. a zillion queries), but write it in PL/pgSQL instead. PL/pgSQL is a lot faster, because everything is already parsed and planned beforehand. So you could do the following, for each row in the temporary update table:
  - UPDATE the corresponding row in the main table.
  - IF FOUND, then cool, it was updated, nothing more to do. You don't need to SELECT in order to know if the row is there; UPDATE does it for you, without the race condition.
  - IF NOT FOUND, then insert. This has a race condition. You know your application, so you'll know if it matters or not.

What do you think?

> 3) I can't import new records with a COPY or drop my indexes b/c some of
> them are new records (INSERTS) and some are altered records (UPDATES) and
> the only way I can think of to identify these records is to perform a
> select for each record.

Yes and no; if you must do this, then use PL/pgSQL, it's a lot faster. And skip the SELECT. Also, use the latest version. It really rocks. Like many said on the list, put pg_xlog on its own physical disk, with ext2fs.

> 3) Wrap each load into a transaction ( tens of thousands of records per load )

That's the idea. The first strategy here (big update) uses one transaction anyway. For the other one, it's your choice: you can either do it all in 1 transaction, or in bunches of 1000 rows... but 1 row at a time would be horrendously slow.

Regards,
P.F.C
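A minimal sketch of that first option (joined UPDATE, then INSERT ... SELECT), with hypothetical names — main(id, val) as the target table and a tab-delimited update file already produced by the client script; COPY FROM a path is server-side, psql's \copy is the client-side equivalent:

    BEGIN;
    CREATE TEMPORARY TABLE staging (id integer PRIMARY KEY, val text) ON COMMIT DROP;
    COPY staging FROM '/tmp/update_feed.txt';   -- tab-delimited by default

    -- rows present in both tables get the new value
    UPDATE main
       SET val = staging.val
      FROM staging
     WHERE main.id = staging.id;

    -- rows present only in the staging table are new and get inserted
    INSERT INTO main (id, val)
    SELECT s.id, s.val
      FROM staging s
     WHERE NOT EXISTS (SELECT 1 FROM main m WHERE m.id = s.id);
    COMMIT;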
Re: [PERFORM] Basic Database Performance
We are running a prototype of a system running on PHP/Postgresql on an Intel Xeon 2ghz server, 1GB RAM, 40GB hard drive, I think this is a decent server... Now, I guess you are using Apache and PHP like everyone. Know these facts : - A client connection means an apache process (think HTTP 1.1 Keep-Alives...) - The PHP interpreter in mod_php will be active during all the time it takes to receive the request, parse it, generate the dynamic page, and send it to the client to the last byte (because it is sent streaming). So, a php page that might take 10 ms to generate will actually hog an interpreter for between 200 ms and 1 second, depending on client ping time and other network latency figures. - This is actually on-topic for this list, because it will also hog a postgres connection and server process during all that time. Thus, it will most probably be slow and unscalable. The solutions I use are simple : First, use lighttpd instead of apache. Not only is it simpler to use and configure, it uses a lot less RAM and resources, is faster, lighter, etc. It uses an asynchronous model. It's there on my server, a crap Celeron, pushing about 100 hits/s, and it sits at 4% CPU and 18 megabytes of RAM in the top. It's impossible to overload this thing unless you benchmark it on gigabit lan, with 100 bytes files. Then, plug php in, using the fast-cgi protocol. Basically php spawns a process pool, and you chose the size of this pool. Say you spawn 20 PHP interpreters for instance. When a PHP page is requested, lighttpd asks the process pool to generate it. Then, a PHP interpreter from the pool does the job, and hands the page over to lighttpd. This is very fast. lighttpd handles the slow transmission of the data to the client, while the PHP interpreter goes back to the pool to service another request. This gives you database connection pooling for free, actually. The connections are limited to the number of processes in the pool, so you won't get hundreds of them all over the place. You can use php's persistent connections without worries. You don't need to configure a connection pool. It just works (TM). Also you might want to use eaccelerator on your PHP. It precompiles your PHP pages, so you don't lose time on parsing. Page time on my site went from 50-200 ms to 5-20 ms just by installing this. It's free. Try this and you might realize that after all, postgres was fast enough ! ---(end of broadcast)--- TIP 6: explain analyze is your friend
[PERFORM] What do the Windows pg hackers out there like for dev tools?
Subject line says it all. I'm going to be testing changes under both Linux and WinXP, so I'm hoping those of you that do M$ hacking will pass along your list of suggestions and/or favorite (and hated so I know what to avoid) tools. TiA, Ron ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] 10+hrs vs 15min because of just one index
On 2/10/06, Matthew T. O'Connor wrote: > Aaron Turner wrote: > > So I'm trying to figure out how to optimize my PG install (8.0.3) to > > get better performance without dropping one of my indexes. > > What about something like this: > > begin; > drop slow_index_name; > update; > create index slow_index_name; > commit; > vacuum; Right. That's exactly what I'm doing to get the update to occur in 15 minutes. Unfortunately though, I'm basically at the point of every time I insert/update into that table I have to drop the index which is making my life very painful (having to de-dupe records in RAM in my application is a lot faster but also more complicated/error prone). Basically, I need some way to optimize PG so that I don't have to drop that index every time. Suggestions? -- Aaron Turner http://synfin.net/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] 10+hrs vs 15min because of just one index
Aaron Turner wrote: So I'm trying to figure out how to optimize my PG install (8.0.3) to get better performance without dropping one of my indexes. What about something like this: begin; drop slow_index_name; update; create index slow_index_name; commit; vacuum; Matt ---(end of broadcast)--- TIP 6: explain analyze is your friend
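Against the table described at the top of the thread, that would look roughly like the following sketch (the table and index names are made up, since the posts don't give them, and the UPDATE is a stand-in for the real group assignment):

    BEGIN;
    DROP INDEX data_key;                          -- the unique index on the data column
    UPDATE records SET groupid = 0;               -- placeholder for the real bulk update
    CREATE UNIQUE INDEX data_key ON records (data);
    COMMIT;
    VACUUM ANALYZE records;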
Re: [PERFORM] 10+hrs vs 15min because of just one index
On 2/10/06, hubert depesz lubaczewski <[EMAIL PROTECTED]> wrote:
> On 2/10/06, Aaron Turner <[EMAIL PROTECTED]> wrote:
> > So I'm trying to figure out how to optimize my PG install (8.0.3) to
> > get better performance without dropping one of my indexes.
> > Basically, I have a table of 5M records with 3 columns:
> > pri_key (SERIAL)
> > data char(48)
> > groupid integer
> > there is an additional unique index on the data column.
> > The problem is that when I update the groupid column for all the
> > records, the query takes over 10hrs (after that I just canceled the
> > update). Looking at iostat, top, vmstat shows I'm horribly disk IO
> > bound (for data not WAL, CPU 85-90% iowait) and not swapping.
> > Dropping the unique index on data (which isn't used in the query),
>
> for such a large update I would suggest going with a different scenario:
> split the update into packets (1, or 5 rows at a time) and do:
> update packet
> vacuum table
> for all packets. And then reindex the table. Should work much nicer.

The problem is that all 5M records are being updated by a single UPDATE statement, not 5M individual statements. Also, vacuum can't run inside of a transaction.

On a side note, is there any performance information on updating indexes (via insert/update) versus the size of the column? Obviously, char(48) is larger than most for indexing purposes, but I wonder if performance drops linearly or exponentially as the column width increases. Right now my column is hexadecimal... if I stored it as a binary representation it would be smaller.

Thanks,
Aaron
Re: [PERFORM] pgbench output
Hi All,

Here are some of the results I got after benchmarking PostgreSQL 7.4.5 against PostgreSQL 8.1.2 with pgbench, with the parameters in postgresql.conf set to the same values for both.

[EMAIL PROTECTED]:/newdisk/postgres/data> /usr/local/pgsql7.4.5/bin/pgbench -c 10 -t 1 regression
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 10
number of transactions per client: 1
number of transactions actually processed: 10/10
tps = 80.642615 (including connections establishing)
tps = 80.650638 (excluding connections establishing)

[EMAIL PROTECTED]:/newdisk/postgres/data> /usr/local/pgsql/bin/pgbench -c 10 -t 1 regression
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 10
number of transactions per client: 1
number of transactions actually processed: 10/10
tps = 124.134926 (including connections establishing)
tps = 124.148749 (excluding connections establishing)

Conclusion: So please correct me if I am wrong... this result set shows that PostgreSQL 8.1.2 performed better than 7.4.5 in this benchmark, since 8.1.2 was able to complete more transactions per second.

On 2/7/06, Jim C. Nasby <[EMAIL PROTECTED]> wrote:
> Well, it tells you how many transactions per second it was able to do.
> Do you have specific questions?
>
> On Thu, Feb 02, 2006 at 12:39:59PM +0530, Pradeep Parmar wrote:
> > Hi,
> >
> > I'm fairly new to PostgreSQL. I was trying pgbench, but could not
> > understand the output. Can anyone help me out to understand the output of
> > pgbench
> >
> > Pradeep
>
> --
> Jim C. Nasby, Sr. Engineering Consultant   [EMAIL PROTECTED]
> Pervasive Software   http://pervasive.com   work: 512-231-6117
> vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

--
Best,
Gourish Singbal
Re: [PERFORM] Basic Database Performance
Hi, James,

James Dey wrote:
> Apologies if this is a novice question, but I think it is a performance
> one nevertheless. We are running a prototype of a system running on
> PHP/PostgreSQL on an Intel Xeon 2GHz server, 1GB RAM, 40GB hard drive,
> as a test bench. The system will be used for tens of thousands of users,
> and at the moment we are testing on a base of around 400 users
> concurrently during the day.

The first thing that comes into my mind here is "connection pooling / recycling". Try to make sure that connections are reused between HTTP requests. Reopening the connection on every HTTP request will break your system, as the backend startup time is rather high.

> During the day, the system is incredibly slow, to a point where it is
> unusable. The reason we are testing on such a small server is to test
> performance under pressure, and my estimation is that this spec should
> handle thousands of users.

Note that amount of data, concurrent users, hardware and speed don't always scale linearly.

> The server spikes from 5% usage to 95%, up and down. The system is a very
> simple e-learning and management system and has not given us any issues
> to date; only since we've been testing with more users has it done so.
> The fact that 400 users doing inserts and queries every few minutes is
> causing this is very concerning; I would like to know if I could be
> tweaking some config settings.

You should make sure that you run vacuum / analyze regularly (either autovacuum, or vacuum full at night when you have no users on the system). Use statement logging or other profiling means to isolate the slow queries, and EXPLAIN ANALYZE them to see what goes wrong. Create the needed indices, and drop unneeded ones. (Insert usual performance tuning tips here...)

> We are running PG 7.4 on a Debian Sarge server, and will be upgrading to
> pg8.0 on a new server, but have some migration issues (that's for
> another list!)

Ignore 8.0 and go to 8.1 directly.

HTH,
Markus
--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS
Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org
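A small sketch of the profiling and maintenance steps mentioned above (the threshold is just an example; both settings can equally go into postgresql.conf, and changing the logging parameter per session requires superuser rights):

    -- log every statement that takes longer than 250 ms
    SET log_min_duration_statement = 250;

    -- routine maintenance on 7.4, where autovacuum is not built in
    VACUUM ANALYZE;

Queries that show up in the log can then be run under EXPLAIN ANALYZE to see which plans misbehave.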
Re: [PERFORM] Large Database Design Help
Hi, Henry,

Orion Henry wrote:
> 1) The database is very large, the largest table has 40 million tuples.

I'm afraid this doesn't qualify as '_very_ large' yet, but it definitely is large enough to have some deep thoughts about it. :-)

> 1) The data is easily partitionable by client ID. In an attempt to keep
> the indexes small and the inserts fast one table was made per client
> ID. Thus the primary table in the database (the one with 40 million
> tuples) is really 133 tables each ending with a three digit suffix.
> The largest of these client tables has 8 million of the 40 million
> tuples. The system started with around a half dozen clients and is now
> a huge pain to manage with so many tables. I was hoping new hardware
> and new postgres features would allow for this data to be merged safely
> into a single table.

It possibly is a good idea to merge them. If you decide to keep them separated for whatever reason, you might want to use schemas instead of three-digit suffixes. Together with appropriately named users or 'set search_path', this may help you to simplify your software.

In case you want to keep separate tables, but need some reports touching all tables from time to time, table inheritance may help you. Just create a base table, and then inherit all the client-specific tables from that base table. Of course, this can be combined with the schema approach by having the child tables in their appropriate schemas.

> 2) The imports are not done inside of transactions. I'm assuming the
> system designers excluded this for a reason. Will I run into problems
> performing tens of thousands of inserts and updates inside a single
> transaction?

Yes, it should give you a huge boost. Every commit has to flush the WAL out to disk, which takes at least one disk spin. So on a simple 7200 RPM disk, you cannot have more than 120 transactions/second. It may make sense to split such a bulk load into transactions of some tens of thousands of rows, but that depends on how easy it is for your application to resume in the middle of the bulk if the connection aborts, and how much concurrent access you have on the backend.

> 3) The current code that bulk loads data into the database is a loop
> that looks like this:
>
>    $result = exe("INSERT INTO $table ($name_str) SELECT
>        $val_str WHERE NOT EXISTS (SELECT 1 FROM $table WHERE $keys)");
>    if ($result == 0)
>    {
>        $result = exe("UPDATE $table SET $non_keys WHERE $keys");
>    }
>
> Is there a faster way to bulk load data when it's not known ahead of
> time if it's a new record or an updated record?

Perhaps the easiest way might be to issue the update first. UPDATE returns a row count of the updated rows; if it is 0, you have to insert the row. This can even be encapsulated into a "before insert" trigger on the table, which tries the update and ignores the insert if the update succeeded. This way, you can even use COPY on the client side. We're using this approach for one of our databases, where a client-side crash can result in occasional duplicates being COPYed to the table.

> Dual Opteron 246, 4 disk SCSI RAID5, 4GB of RAM

For lots of non-read-only database workloads, RAID5 is a performance killer. RAID 1/0 might be better, or having two mirrors of two disks each, the first mirror holding system, swap, and the PostgreSQL WAL files, the second one holding the data.

Don't forget to tune the postgresql settings appropriately. :-)

> # du -sh /var/lib/postgres/data/
> 16G /var/lib/postgres/data/

Your database seems to be small enough to fit on a single disk, so the two-mirrors approach I described above will be the best IMHO.

> ( the current database is PG 7.4 - I intend to upgrade it to 8.1 if and
> when I do this refactoring )

This is a very good idea, 8.1 is miles ahead of 7.4 in many aspects.

> ( the current OS is Debian Unstable but I intend to be running RHEL 4.0
> if and when I do this refactoring )

This should not make too much difference.

HTH,
Markus
--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS
Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org
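A minimal sketch of that "update first, fall back to insert" trigger, using hypothetical names (client_data(id, val) stands in for the real table) and 8.1-style dollar quoting:

    CREATE TABLE client_data (id integer PRIMARY KEY, val integer);

    CREATE OR REPLACE FUNCTION client_data_upsert() RETURNS trigger AS $$
    BEGIN
        -- try the update first; FOUND tells us whether a row was hit
        UPDATE client_data SET val = NEW.val WHERE id = NEW.id;
        IF FOUND THEN
            RETURN NULL;   -- row existed and was updated: suppress the INSERT
        END IF;
        RETURN NEW;        -- no such row yet: let the INSERT go ahead
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER client_data_upsert_trg
        BEFORE INSERT ON client_data
        FOR EACH ROW EXECUTE PROCEDURE client_data_upsert();

With this in place, a client-side COPY into the table behaves as an upsert at the cost of one trigger call per row, and with the race condition already noted above if two sessions load the same key at once.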
Re: [PERFORM] Large Database Design Help
Hi, Greg, Greg Stark wrote: >>(Aside question: if I were to find a way to use COPY and I were loading >>data on a single client_id, would dropping just the indexes for that client_id >>accelerate the load?) > Dropping indexes would accelerate the load but unless you're loading a large > number of records relative the current size I'm not sure it would be a win > since you would then have to rebuild the index for the entire segment. And, additionally, rebuilding a partial index with "WHERE client_id=42" needs a full table scan, which is very slow, so temporarily dropping the indices will not be useful if you merge the tables. Btw, I don't know whether PostgreSQL can make use of partial indices when building other partial indices. If yes, you could temporarily drop all but one of the partial indices for a specific client. HTH, Markus -- Markus Schaber | Logical Tracking&Tracing International AG Dipl. Inf. | Software Development GIS Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Basic Database Performance
Sorry about that James Dey tel +27 11 704-1945 cell+27 82 785-5102 fax +27 11 388-8907 mail[EMAIL PROTECTED] -Original Message- From: Richard Huxton [mailto:[EMAIL PROTECTED] Sent: 10 February 2006 11:50 AM To: James Dey Cc: 'Postgresql Performance' Subject: Re: [PERFORM] Basic Database Performance Don't forget to cc: the list. James Dey wrote: > Hi Richard, > > Firstly, thanks a million for the reply. > > To answer your questions: > 1. Are you limited by CPU, memory or disk i/o? > I am not limited, but would like to get the most out of the config I have in > order to be able to know what I'll get, when I scale up. But you said: "During the day, the system is incredibly slow to a point where it is unusable". So presumably one or more of cpu,memory or disk i/o is the problem. > 2. Are you happy your config settings are good? How do you know? > I'm not, and would appreciate any help with these. If you have a look here, there is an introduction for 7.4 http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php For 8.x you might find the following more useful. http://www.powerpostgresql.com/PerfList > 3. Are there particular queries that are causing the problem, or lock > contention? > Not that I can see What is the balance between activity on Apache/PHP/PostgreSQL? -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Basic Database Performance
Don't forget to cc: the list. James Dey wrote: Hi Richard, Firstly, thanks a million for the reply. To answer your questions: 1. Are you limited by CPU, memory or disk i/o? I am not limited, but would like to get the most out of the config I have in order to be able to know what I'll get, when I scale up. But you said: "During the day, the system is incredibly slow to a point where it is unusable". So presumably one or more of cpu,memory or disk i/o is the problem. 2. Are you happy your config settings are good? How do you know? I'm not, and would appreciate any help with these. If you have a look here, there is an introduction for 7.4 http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php For 8.x you might find the following more useful. http://www.powerpostgresql.com/PerfList 3. Are there particular queries that are causing the problem, or lock contention? Not that I can see What is the balance between activity on Apache/PHP/PostgreSQL? -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Basic Database Performance
James Dey wrote: Apologies if this is a novice queston, but I think it is a performance one nevertheless. We are running a prototype of a system running on PHP/Postgresql on an Intel Xeon 2ghz server, 1GB RAM, 40GB hard drive, as a test bench. The system will be used for tens of thousands of users, and at the moment we are testing on a base of around 400 users concurrently during the day. OK, that's 400 web-users, so presumably a fraction of that for concurrent database connections. During the day, the system is incredibly slow to a point where it is unusable. The reason we are testing on such as small server is to test performance under pressure, and my estimation is that spec should handle thousands of users. It'll depend on what the users are doing It'll depend on what your code is doing It'll depend on how you've configured PostgreSQL. The server spikes from 5% usage to 95% up and down. Usage? Do you mean CPU? > The system is a very simple e-learning and management system and has not given us any issues to date, only since we've been testing with more users has it done so. The fact that 400 users doing inserts and queries every few minutes is very concerning, I would like to know if I could be tweaking some config settings. You haven't said what config settings you're working with. OK - the main questions have to be: 1. Are you limited by CPU, memory or disk i/o? 2. Are you happy your config settings are good? How do you know? 3. Are there particular queries that are causing the problem, or lock contention? We are running PG 7.4 on a Debian Sarge server, and will be upgrading to pg8.0 on a new server, but have some migration issues (that's for another list!) Go straight to 8.1 - no point in upgrading half-way. If you don't like compiling from source it's in backports.org -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] 10+hrs vs 15min because of just one index
On 2/10/06, Aaron Turner <[EMAIL PROTECTED]> wrote:
> So I'm trying to figure out how to optimize my PG install (8.0.3) to
> get better performance without dropping one of my indexes.
> Basically, I have a table of 5M records with 3 columns:
> pri_key (SERIAL)
> data char(48)
> groupid integer
> there is an additional unique index on the data column.
> The problem is that when I update the groupid column for all the
> records, the query takes over 10hrs (after that I just canceled the
> update). Looking at iostat, top, vmstat shows I'm horribly disk IO
> bound (for data not WAL, CPU 85-90% iowait) and not swapping.
> Dropping the unique index on data (which isn't used in the query),

For such a large update I would suggest going with a different scenario: split the update into packets (1, or 5 rows at a time) and do:

  update packet
  vacuum table

for all packets. And then reindex the table. Should work much nicer.

depesz
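A rough sketch of that batching pattern, with made-up table and value names (the posts don't give them) and an arbitrary packet size; each packet is its own transaction, so VACUUM can run between packets:

    -- one packet: update a slice of the key range, then reclaim the dead tuples
    BEGIN;
    UPDATE records SET groupid = 0              -- placeholder for the real value
     WHERE pri_key BETWEEN 1 AND 50000;
    COMMIT;
    VACUUM records;

    -- ...repeat for the following pri_key ranges, then rebuild the indexes once
    REINDEX TABLE records;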
[PERFORM] Basic Database Performance
Hi Guys,

Apologies if this is a novice question, but I think it is a performance one nevertheless. We are running a prototype of a system running on PHP/PostgreSQL on an Intel Xeon 2GHz server, 1GB RAM, 40GB hard drive, as a test bench. The system will be used for tens of thousands of users, and at the moment we are testing on a base of around 400 users concurrently during the day.

During the day, the system is incredibly slow, to a point where it is unusable. The reason we are testing on such a small server is to test performance under pressure, and my estimation is that this spec should handle thousands of users. The server spikes from 5% usage to 95%, up and down. The system is a very simple e-learning and management system and has not given us any issues to date; only since we've been testing with more users has it done so. The fact that 400 users doing inserts and queries every few minutes is causing this is very concerning; I would like to know if I could be tweaking some config settings.

We are running PG 7.4 on a Debian Sarge server, and will be upgrading to pg8.0 on a new server, but have some migration issues (that's for another list!)

Any help would be greatly appreciated!

All the very best,
James Dey
tel +27 11 704-1945
cell +27 82 785-5102
fax +27 11 388-8907
mail [EMAIL PROTECTED]
[PERFORM] 10+hrs vs 15min because of just one index
So I'm trying to figure out how to optimize my PG install (8.0.3) to get better performance without dropping one of my indexes. Basically, I have a table of 5M records with 3 columns:

  pri_key (SERIAL)
  data char(48)
  groupid integer

There is an additional unique index on the data column.

The problem is that when I update the groupid column for all the records, the query takes over 10hrs (after that I just canceled the update). Looking at iostat, top, vmstat shows I'm horribly disk IO bound (for data, not WAL; CPU 85-90% iowait) and not swapping. Dropping the unique index on data (which isn't used in the query), running the update and recreating the index runs in under 15 min. Hence it's pretty clear to me that the index is the problem and there's really nothing worth optimizing in my query.

As I understand from #postgresql, doing an UPDATE on one column causes all indexes for the affected row to have to be updated, due to the way PG replaces the old row with a new one for updates. This seems to explain why dropping the unique index on data solves the performance problem.

Interesting settings:
  shared_buffers = 32768
  maintenance_work_mem = 262144
  fsync = true
  wal_sync_method = open_sync
  wal_buffers = 512
  checkpoint_segments = 30
  effective_cache_size = 1
  work_mem = (1024 I think?)

Box:
  Linux 2.6.9-11EL (CentOS 4.1)
  2x Xeon 3.4 HT
  2GB of RAM (but Apache and other services are running)
  4 disk RAID 10 (74G Raptor) for data
  4 disk RAID 10 (7200rpm) for WAL

Other than throwing more spindles at the problem, any suggestions?

Thanks,
Aaron
--
Aaron Turner
http://synfin.net/