Re: [PERFORM] Delete query takes exorbitant amount of time
There is clear benefit from forcing them to be the same. In logical data terms, they *should* be the same. I don't check fruit.apple_grade against fruit_type.orange_grade. When would I want to make a check of that nature? If there is a reason, that's great, let's keep the status quo then. I respect the effort and thought that has already gone into the implementation; I seek only to offer a very minor improvement based upon recent list issues. The main problem would be people getting errors when upgrading their databases, or restoring from a backup, say. Chris ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] How to improve db performance with $7K?
Tom, From what I understand, the vendor used ProIV for development; when they attempted to use 7.4.3, they had ODBC issues and something else I honestly don't know, but I was told that data was not coming through properly. They're somewhat at the mercy of the ProIV people to give them the stamp of approval; then the vendor will tell us what they support. Thanks. Steve Poe Tom Lane wrote: Steve Poe <[EMAIL PROTECTED]> writes: Situation: A 24/7 animal hospital (100 employees) runs their business on CentOS 3.3 (RHEL 3) Postgres 7.4.2 (because they have to) [ itch... ] Surely they could at least move to 7.4.7 without pain. There are serious data-loss bugs known in 7.4.2. regards, tom lane
Re: [PERFORM] How to improve db performance with $7K?
Steve Poe wrote: Situation: A 24/7 animal hospital (100 employees) runs their business on CentOS 3.3 (RHEL 3) Postgres 7.4.2 (because they have to) off a 2-CPU Xeon 2.8GHz, 4GB of RAM, and (3) SCSI disks in RAID 0 (zcav value 35MB per sec). The database is 11GB, comprised of over 100 tables and indexes from 1MB to 2GB in size. I recently told the hospital management team that, worst-case scenario, they need to get the database on its own drive array, since the RAID 0 is a disaster waiting to happen. I said a new dual AMD server with a 6/7-disk configuration would be ideal for safety and performance, but they don't have $15K. I said a separate drive array offers a balance of safety and performance. I have been given a budget of $7K to accomplish a safer/faster database through hardware upgrades. The objective is to get a drive array, but I can use the budget any way I see fit to accomplish the goal. You could build a dual Opteron with 4GB of RAM and 12 10K Raptor SATA drives with a battery-backed cache for about $7K or less. Or, if they are not CPU bound but just I/O bound, you could easily just add an external 12-drive array (even if SCSI) for less than $7K. Sincerely, Joshua D. Drake Since I am a DBA novice, I did not physically build this server, nor did I write the application the hospital runs on, but I have the opportunity to make it better, so I thought I should seek some advice from those who have been down this road before. Suggestions/ideas anyone? Thanks. Steve Poe -- Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC Postgresql support, programming, shared hosting and dedicated hosting. +1-503-667-4564 - [EMAIL PROTECTED] - http://www.commandprompt.com PostgreSQL Replicator -- production quality replication for PostgreSQL
Re: [PERFORM] How to improve db performance with $7K?
You can purchase a whole new dual Opteron 740, with 6 gigs of RAM, a case to match, and 6 74-gig Ultra320 SCA drives for about $7K. I know because that's what I bought one for 2 weeks ago, using Tyan's dual board. If you need some details and are willing to go that route, let me know and I'll get you the information. Sincerely, Will LaShell Steve Poe wrote: Situation: A 24/7 animal hospital (100 employees) runs their business on CentOS 3.3 (RHEL 3) Postgres 7.4.2 (because they have to) off a 2-CPU Xeon 2.8GHz, 4GB of RAM, and (3) SCSI disks in RAID 0 (zcav value 35MB per sec). The database is 11GB, comprised of over 100 tables and indexes from 1MB to 2GB in size. I recently told the hospital management team that, worst-case scenario, they need to get the database on its own drive array, since the RAID 0 is a disaster waiting to happen. I said a new dual AMD server with a 6/7-disk configuration would be ideal for safety and performance, but they don't have $15K. I said a separate drive array offers a balance of safety and performance. I have been given a budget of $7K to accomplish a safer/faster database through hardware upgrades. The objective is to get a drive array, but I can use the budget any way I see fit to accomplish the goal. Since I am a DBA novice, I did not physically build this server, nor did I write the application the hospital runs on, but I have the opportunity to make it better, so I thought I should seek some advice from those who have been down this road before. Suggestions/ideas anyone? Thanks. Steve Poe
Re: [PERFORM] How to improve db performance with $7K?
Steve Poe <[EMAIL PROTECTED]> writes: > Situation: A 24/7 animal hospital (100 employees) runs their business > on CentOS 3.3 (RHEL 3) Postgres 7.4.2 (because they have to) [ itch... ] Surely they could at least move to 7.4.7 without pain. There are serious data-loss bugs known in 7.4.2. regards, tom lane
[PERFORM] How to improve db performance with $7K?
Situation: A 24/7 animal hospital (100 employees) runs their business on CentOS 3.3 (RHEL 3) Postgres 7.4.2 (because they have to) off a 2-CPU Xeon 2.8GHz, 4GB of RAM, and (3) SCSI disks in RAID 0 (zcav value 35MB per sec). The database is 11GB, comprised of over 100 tables and indexes from 1MB to 2GB in size. I recently told the hospital management team that, worst-case scenario, they need to get the database on its own drive array, since the RAID 0 is a disaster waiting to happen. I said a new dual AMD server with a 6/7-disk configuration would be ideal for safety and performance, but they don't have $15K. I said a separate drive array offers a balance of safety and performance. I have been given a budget of $7K to accomplish a safer/faster database through hardware upgrades. The objective is to get a drive array, but I can use the budget any way I see fit to accomplish the goal. Since I am a DBA novice, I did not physically build this server, nor did I write the application the hospital runs on, but I have the opportunity to make it better, so I thought I should seek some advice from those who have been down this road before. Suggestions/ideas anyone? Thanks. Steve Poe
Re: [PERFORM] pg_autovacuum not having enough suction ?
ok, thanks a lot for your time, guys! I guess my table is pretty unusual and that's why this problem has not surfaced until now. Better late than never ;) I'll cron a "manual" VACUUM FULL on the table. "Tom Lane" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > > Otto Blomqvist wrote: > >> This table is basically a queue full of records waiting to get transferred > >> over from our 68030 system to the PG database. The records are then moved > >> into folders (using a trigger) like file_92_myy depending on what month the > >> record was created on the 68030. During normal operations there should not > >> be more than 10 records at a time in the table, although during the course > >> of a day a normal system will get about 50k records. I create 5 records > >> to simulate incoming traffic, since we don't have much traffic in the test > >> lab. > > Really the right way to do housekeeping for a table like that is to > VACUUM FULL (or better yet, TRUNCATE, if possible) immediately after > discarding a batch of records. The VACUUM FULL will take very little > time if it only has to repack <10 records. Plain VACUUM is likely to > leave the table nearly empty but physically sizable, which is bad news > from a statistical point of view: as the table fills up again, it won't > get physically larger, thereby giving the planner no clue that it > doesn't still have <10 records. This means the queries that process > the 50K-record batch are going to get horrible plans :-( > > I'm not sure if autovacuum could be taught to do that --- it could > perhaps launch a vacuum as soon as it notices a large fraction of the > table got deleted, but do we really want to authorize it to launch > VACUUM FULL? It'd be better to issue the vacuum synchronously > as part of the batch updating script, I feel. 
> > regards, tom lane
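Tom's "issue the vacuum synchronously as part of the batch updating script" suggestion might look something like the following. This is only a sketch: the table name (file_92) comes from the thread, but the month column and the exact move/delete predicates are hypothetical.

```sql
-- Hypothetical batch step: move this month's records out of the queue
-- table, then repack it immediately while it is nearly empty.
BEGIN;
INSERT INTO file_92_myy SELECT * FROM file_92 WHERE created_month = 'myy';
DELETE FROM file_92 WHERE created_month = 'myy';
COMMIT;

-- Outside any transaction block, right after the batch:
VACUUM FULL ANALYZE file_92;   -- fast, since only ~10 live rows remain
-- or, if the queue is known to be completely drained:
-- TRUNCATE TABLE file_92;
```

The point is that the script, not autovacuum, knows the moment the bulk delete finishes, so it can repack before the table refills.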
Re: [PERFORM] pg_autovacuum not having enough suction ?
Tom Lane wrote: > Bruce Momjian writes: > > Tom Lane wrote: > >> I'm not sure if autovacuum could be taught to do that --- it could > >> perhaps launch a vacuum as soon as it notices a large fraction of the > >> table got deleted, but do we really want to authorize it to launch > >> VACUUM FULL? > > > One problem with VACUUM FULL would be autovacuum waiting for an > > exclusive lock on the table. Anyway, it is documented now as a possible > > issue. > > I don't care too much about autovacuum waiting awhile to get a lock. > I do care about other processes getting queued up behind it, though. > > Perhaps it would be possible to alter the normal lock queuing semantics > for this case, so that autovacuum's request doesn't block later > arrivals, and it can only get the lock when no one is interested in the > table. Of course, that might never happen, or by the time it does there's no point in VACUUM FULL anymore :-( Can we issue a LOCK TABLE with a statement_timeout, and only do the VACUUM FULL if we can get a lock quickly? That seems like a plan. The only problem is that you can't VACUUM FULL in a transaction: test=> create table test (x int); CREATE TABLE test=> insert into test values (1); INSERT 0 1 test=> begin; BEGIN test=> lock table test; LOCK TABLE test=> vacuum full; ERROR: VACUUM cannot run inside a transaction block -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073
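Given the ERROR above, the closest one can get to Bruce's idea from SQL is a two-step probe, sketched below. This is an untested illustration, not a real autovacuum mechanism: the timeout value is arbitrary, and note the race it leaves open.

```sql
-- Probe: can we get the exclusive lock within 2 seconds?
SET statement_timeout = 2000;          -- milliseconds
BEGIN;
LOCK TABLE test IN ACCESS EXCLUSIVE MODE;
COMMIT;                                -- lock is released here
RESET statement_timeout;

-- Only if the LOCK succeeded would the script then issue, outside
-- any transaction block:
VACUUM FULL test;
-- Caveat: the lock was released before VACUUM FULL reacquires it,
-- so this is only a heuristic probe, not a guarantee against queuing.
```

The caveat in the last comment is exactly why Tom's idea of altered lock-queuing semantics inside the server would be needed to do this properly.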
Re: [PERFORM] pg_autovacuum not having enough suction ?
Tom Lane wrote: > > Otto Blomqvist wrote: > >> This table is basically a queue full of records waiting to get transferred > >> over from our 68030 system to the PG database. The records are then moved > >> into folders (using a trigger) like file_92_myy depending on what month the > >> record was created on the 68030. During normal operations there should not > >> be more than 10 records at a time in the table, although during the course > >> of a day a normal system will get about 50k records. I create 5 records > >> to simulate incoming traffic, since we don't have much traffic in the test > >> lab. > > Really the right way to do housekeeping for a table like that is to > VACUUM FULL (or better yet, TRUNCATE, if possible) immediately after > discarding a batch of records. The VACUUM FULL will take very little > time if it only has to repack <10 records. Plain VACUUM is likely to > leave the table nearly empty but physically sizable, which is bad news > from a statistical point of view: as the table fills up again, it won't > get physically larger, thereby giving the planner no clue that it > doesn't still have <10 records. This means the queries that process > the 50K-record batch are going to get horrible plans :-( > > I'm not sure if autovacuum could be taught to do that --- it could > perhaps launch a vacuum as soon as it notices a large fraction of the > table got deleted, but do we really want to authorize it to launch > VACUUM FULL? It'd be better to issue the vacuum synchronously > as part of the batch updating script, I feel. I added this to the TODO section for autovacuum: o Do VACUUM FULL if table is nearly empty? I don't think autovacuum is ever going to be smart enough to recycle during the delete, especially since the rows can't be reused until the transaction completes. One problem with VACUUM FULL would be autovacuum waiting for an exclusive lock on the table. Anyway, it is documented now as a possible issue. 
-- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073
Re: [PERFORM] pg_autovacuum not having enough suction ?
Bruce Momjian writes: > Tom Lane wrote: >> I'm not sure if autovacuum could be taught to do that --- it could >> perhaps launch a vacuum as soon as it notices a large fraction of the >> table got deleted, but do we really want to authorize it to launch >> VACUUM FULL? > One problem with VACUUM FULL would be autovacuum waiting for an > exclusive lock on the table. Anyway, it is documented now as a possible > issue. I don't care too much about autovacuum waiting awhile to get a lock. I do care about other processes getting queued up behind it, though. Perhaps it would be possible to alter the normal lock queuing semantics for this case, so that autovacuum's request doesn't block later arrivals, and it can only get the lock when no one is interested in the table. Of course, that might never happen, or by the time it does there's no point in VACUUM FULL anymore :-( regards, tom lane
Re: [PERFORM] pg_autovacuum not having enough suction ?
> > I'm not sure if autovacuum could be taught to do that --- it could > > perhaps launch a vacuum as soon as it notices a large fraction of the > > table got deleted, but do we really want to authorize it to launch > > VACUUM FULL? It'd be better to issue the vacuum synchronously > > as part of the batch updating script, I feel. > > I added this to the TODO section for autovacuum: > > o Do VACUUM FULL if table is nearly empty? We should never automatically launch a vacuum full. That seems like a really bad idea. Sincerely, Joshua D. Drake > > I don't think autovacuum is ever going to be smart enough to recycle > during the delete, especially since the rows can't be reused until the > transaction completes. > > One problem with VACUUM FULL would be autovacuum waiting for an > exclusive lock on the table. Anyway, it is documented now as a possible > issue. > -- Command Prompt, Inc., Your PostgreSQL solutions company. 503-667-4564 Custom programming, 24x7 support, managed services, and hosting Open Source Authors: plPHP, pgManage, Co-Authors: plPerlNG Reliable replication, Mammoth Replicator - http://www.commandprompt.com/
Re: [PERFORM] pg_autovacuum not having enough suction ?
> Otto Blomqvist wrote: >> This table is basically a queue full of records waiting to get transferred >> over from our 68030 system to the PG database. The records are then moved >> into folders (using a trigger) like file_92_myy depending on what month the >> record was created on the 68030. During normal operations there should not >> be more than 10 records at a time in the table, although during the course >> of a day a normal system will get about 50k records. I create 5 records >> to simulate incoming traffic, since we don't have much traffic in the test >> lab. Really the right way to do housekeeping for a table like that is to VACUUM FULL (or better yet, TRUNCATE, if possible) immediately after discarding a batch of records. The VACUUM FULL will take very little time if it only has to repack <10 records. Plain VACUUM is likely to leave the table nearly empty but physically sizable, which is bad news from a statistical point of view: as the table fills up again, it won't get physically larger, thereby giving the planner no clue that it doesn't still have <10 records. This means the queries that process the 50K-record batch are going to get horrible plans :-( I'm not sure if autovacuum could be taught to do that --- it could perhaps launch a vacuum as soon as it notices a large fraction of the table got deleted, but do we really want to authorize it to launch VACUUM FULL? It'd be better to issue the vacuum synchronously as part of the batch updating script, I feel. regards, tom lane
Re: lazy_update_relstats considered harmful (was Re: [PERFORM] pg_autovacuum not having enough suction ?)
Simon Riggs <[EMAIL PROTECTED]> writes: > On Fri, 2005-03-25 at 15:22 -0500, Tom Lane wrote: >> 2. Dead tuples don't have that much influence on scan costs either, at >> least not once they are marked as known-dead. Certainly they shouldn't >> be charged at full freight. > Yes, minor additional CPU time, but the main issue is when the dead > tuples force additional I/O. I/O costs are mostly estimated off relpages, though, not reltuples. The only time you really pay through the nose for a dead tuple is when an indexscan visits it, but with the known-dead marking we now do in btree indexes, I'm pretty sure that path is seldom taken. >> It's possible that there'd be some value in adding a column to pg_class >> to record dead tuple count, but given what we have now, the calculation >> in lazy_update_relstats is totally wrong. > Yes, that's the way. We can record the (averaged?) dead tuple count, but > also record the actual row count in reltuples. What I'd be inclined to record is the actual number of dead rows removed by the most recent VACUUM. Any math on that is best done in the planner, since we can change the logic more easily than the database contents. It'd probably be reasonable to take half of that number as the estimate of the average number of dead tuples. But in any case, that's for the future; we can't have it in 8.0.*, and right at the moment I'm focusing on what to push out for 8.0.2. > We definitely need to record the physical and logical tuple counts, > since each of them have different contributions to run-times. There isn't any difference, if you are talking about fully dead tuples. It would be possible for VACUUM to also count the number of not-committed-but-not-removable tuples (ie, new from still-open transactions, plus dead-but-still-visible-to-somebody), but I'm not sure that it would be useful to do so, because that sort of count is hugely transient. The stat would be irrelevant moments after it was taken. 
regards, tom lane
Re: lazy_update_relstats considered harmful (was Re: [PERFORM]
On Fri, 2005-03-25 at 15:22 -0500, Tom Lane wrote: > 2. Dead tuples don't have that much influence on scan costs either, at > least not once they are marked as known-dead. Certainly they shouldn't > be charged at full freight. Yes, minor additional CPU time, but the main issue is when the dead tuples force additional I/O. > It's possible that there'd be some value in adding a column to pg_class > to record dead tuple count, but given what we have now, the calculation > in lazy_update_relstats is totally wrong. Yes, that's the way. We can record the (averaged?) dead tuple count, but also record the actual row count in reltuples. We definitely need to record the physical and logical tuple counts, since each of them have different contributions to run-times. For comparing seq scan vs. index, we need to look at the physical tuple count * avg row size, whereas when we calculate the number of rows returned we should look at fractions of the logical row count. > The idea I was trying to capture is that the tuple density is at a > minimum right after VACUUM, and will increase as free space is filled > in until the next VACUUM, so that recording the exact tuple count > underestimates the number of tuples that will be seen on-the-average. > But I'm not sure that idea really holds water. The only way that a > table can be at "steady state" over a long period is if the number of > live tuples remains roughly constant (ie, inserts balance deletes). > What actually increases and decreases over a VACUUM cycle is the density > of *dead* tuples ... but per the above arguments this isn't something > we should adjust reltuples for. > > So I'm thinking lazy_update_relstats should be ripped out and we should > go back to recording just the actual stats. > > Sound reasonable? Or was I right the first time and suffering brain > fade today? Well, I think the original idea had some validity, but clearly lazy_update_relstats isn't the way to do it even though we thought so at the time. 
Best Regards, Simon Riggs
Re: [PERFORM] Delete query takes exorbitant amount of time
On Fri, 2005-03-25 at 16:25 -0500, Tom Lane wrote: > Stephan Szabo <[EMAIL PROTECTED]> writes: > > On Fri, 25 Mar 2005, Simon Riggs wrote: > >> Could it be that because PostgreSQL has a very highly developed sense of > >> datatype comparison that we might be taking this to extremes? Would any > >> other RDBMS consider two different datatypes to be comparable? > > > We do have a broader comparable than the spec. > > However, the set of comparisons that we can presently support *with > indexes* is narrower than the spec, so rejecting nonindexable cases > would be a problem. OK. Can we have a TODO item then? * Ensure that all SQL:2003 comparable datatypes are also indexable when compared ...or something like that > It's worth noting also that the test being discussed checks whether the > PK index is usable for testing the RI constraint. In the problem that > started this thread, the difficulty is lack of a usable index on the FK > column, not the PK (because that's the table that has to be searched to > do a delete in the PK table). We cannot enforce that there be a usable > index on the FK column (since indexes on the FK table may not have been > built yet when the constraint is declared), and shouldn't anyway because > there are reasonable usage patterns where you don't need one. Yes, I agree for CASCADE we wouldn't always want an index. Alright then, time to leave it there. I want to write up some additional comments for performance tips: - Consider defining RI constraints after tables have been loaded - Remember to add an index on the referencing table if the constraint is defined as CASCADEing Have a good Easter, all, wherever you are and whatever you believe in. Best Regards, Simon Riggs
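Simon's two performance tips might be illustrated by a load pattern like the one below. It is only a sketch; the table and column names are borrowed from the schema posted elsewhere in this thread, and the COPY path is hypothetical.

```sql
-- Tip 1: load data first, define RI constraints afterwards.
CREATE TABLE measurement_new (LIKE measurement);     -- no FKs yet
COPY measurement_new FROM '/path/to/measurement.copy';

-- Tip 2: index the referencing (FK) column before declaring a
-- CASCADEing constraint, so the DELETE on the PK table doesn't
-- seq-scan the referencing table.
CREATE INDEX measurement_new_fk_idx
    ON measurement_new (id_int_sensor_meas_type);

ALTER TABLE measurement_new
    ADD FOREIGN KEY (id_int_sensor_meas_type)
    REFERENCES int_sensor_meas_type (id_int_sensor_meas_type)
    ON DELETE CASCADE;
```

Declaring the constraint after the load also means its validation is done once, in bulk, rather than row by row during the COPY.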
Re: [PERFORM] Delete query takes exorbitant amount of time
On Fri, 2005-03-25 at 15:10 +, Simon Riggs wrote: > Karim: Did this happen? If not, can you drop and re-create and confirm > that you get the WARNING? If not, we have problems. No. Nor do I think that I should. SERIAL is a shortcut for INTEGER, no? I think there is some other (TBD) problem causing my big seq scan. orfs=# ALTER TABLE measurement DROP CONSTRAINT measurement_id_int_sensor_meas_type_fkey; ALTER TABLE orfs=# ALTER TABLE ONLY measurement ADD CONSTRAINT measurement_id_int_sensor_meas_type_fkey orfs-# FOREIGN KEY (id_int_sensor_meas_type) REFERENCES int_sensor_meas_type(id_int_sensor_meas_type); ALTER TABLE orfs=# The add constraint statement comes directly from a pg_dump. For clarity, the table/indexes were created as such: CREATE TABLE int_sensor_meas_type( id_int_sensor_meas_type SERIAL PRIMARY KEY, id_sensor integer NOT NULL REFERENCES sensor, id_meas_type integer NOT NULL REFERENCES meas_type UNIQUE); CREATE TABLE measurement ( id_measurement SERIAL PRIMARY KEY, id_int_sensor_meas_type integer NOT NULL REFERENCES int_sensor_meas_type, datetime timestamp WITH TIME ZONE NOT NULL, value numeric(15,5) NOT NULL, created timestamp with time zone NOT NULL DEFAULT now(), created_by TEXT NOT NULL REFERENCES public.person(id_person)); CREATE INDEX measurement__id_int_sensor_meas_type_idx ON measurement(id_int_sensor_meas_type); Regards, -- Karim Nassar Department of Computer Science Box 15600, College of Engineering and Natural Sciences Northern Arizona University, Flagstaff, Arizona 86011 Office: (928) 523-5868 -=- Mobile: (928) 699-9221
Re: [PERFORM] Delete query takes exorbitant amount of time
Stephan Szabo <[EMAIL PROTECTED]> writes: > On Fri, 25 Mar 2005, Simon Riggs wrote: >> Could it be that because PostgreSQL has a very highly developed sense of >> datatype comparison that we might be taking this to extremes? Would any >> other RDBMS consider two different datatypes to be comparable? > We do have a broader comparable than the spec. However, the set of comparisons that we can presently support *with indexes* is narrower than the spec, so rejecting nonindexable cases would be a problem. It's worth noting also that the test being discussed checks whether the PK index is usable for testing the RI constraint. In the problem that started this thread, the difficulty is lack of a usable index on the FK column, not the PK (because that's the table that has to be searched to do a delete in the PK table). We cannot enforce that there be a usable index on the FK column (since indexes on the FK table may not have been built yet when the constraint is declared), and shouldn't anyway because there are reasonable usage patterns where you don't need one. regards, tom lane
Re: [HACKERS] lazy_update_relstats considered harmful (was Re: [PERFORM]
Tom Lane wrote: I wrote: One thing that is possibly relevant here is that in 8.0 a plain VACUUM doesn't set reltuples to the exactly correct number, but to an interpolated value that reflects our estimate of the "steady state" average between vacuums. I wonder if that code is wrong, or if it's operating as designed but is confusing autovac. Now that I think it over, I'm thinking that I must have been suffering severe brain fade the day I wrote lazy_update_relstats() (see vacuumlazy.c). The numbers that that routine is averaging are the pre- and post-vacuum physical tuple counts. But the difference between them consists of known-dead tuples, and we shouldn't be factoring dead tuples into reltuples. The planner has always considered reltuples to count only live tuples, and I think this is correct on two grounds: 1. The numbers of tuples estimated to be returned by scans certainly shouldn't count dead ones. 2. Dead tuples don't have that much influence on scan costs either, at least not once they are marked as known-dead. Certainly they shouldn't be charged at full freight. It's possible that there'd be some value in adding a column to pg_class to record dead tuple count, but given what we have now, the calculation in lazy_update_relstats is totally wrong. The idea I was trying to capture is that the tuple density is at a minimum right after VACUUM, and will increase as free space is filled in until the next VACUUM, so that recording the exact tuple count underestimates the number of tuples that will be seen on-the-average. But I'm not sure that idea really holds water. The only way that a table can be at "steady state" over a long period is if the number of live tuples remains roughly constant (ie, inserts balance deletes). What actually increases and decreases over a VACUUM cycle is the density of *dead* tuples ... but per the above arguments this isn't something we should adjust reltuples for. 
So I'm thinking lazy_update_relstats should be ripped out and we should go back to recording just the actual stats. Sound reasonable? Or was I right the first time and suffering brain fade today?
Re: [PERFORM] Delete query takes exorbitant amount of time
On Fri, 25 Mar 2005, Simon Riggs wrote: > On Fri, 2005-03-25 at 13:47 -0500, Tom Lane wrote: > > Simon Riggs <[EMAIL PROTECTED]> writes: > > > On Fri, 2005-03-25 at 10:17 -0500, Tom Lane wrote: > > >>> Other than spec compliance, you mean? SQL99 says > > >>> > > >>> ... The declared type of each referencing column shall be > > >>> comparable to the declared type of the corresponding referenced > > >>> column. > > > > > Tom had said SQL99 required this; I have pointed out SQL:2003, which > > > supercedes the SQL99 standard, does not require this. > > > > You're reading the wrong part of SQL:2003. 11.8 <referential constraint definition> syntax rule 9 still has the text I quoted. > > So, we have this from SQL:2003 section 11.8 p.550 > - 3a) requires us to have an index > - 9) requires the data types to be "comparable" > > In the name of spec-compliance we wish to accept an interpretation of > the word "comparable" that means we will accept two datatypes that are > not actually the same. > > So we are happy to enforce having the index, but not happy to ensure the > index is actually usable for the task? The index's "usability" only applies to the purpose of guaranteeing uniqueness, which doesn't depend on the referencing type AFAICS. > > > Leading us back to my original point - what is the benefit of continuing > > > with having a WARNING when that leads people into trouble later? > > > > Accepting spec-compliant schemas. > > I definitely want this too - as you know I have worked on documenting > compliance previously. > > Is the word "comparable" defined elsewhere in the standard? Yes. And at least in SQL99, there's a bunch of statements in 4.* about what are comparable. > Currently, datatypes with similar type categories are comparable and yet > (in 8.0) will now use the index. So, we are taking comparable to include > fairly radically different datatypes? Not entirely. 
I believe a referenced column of int, and a referencing column of numeric, currently displays that warning, but appears to be allowed by the spec (as the numeric types are considered mutually comparable). > Could it be that because PostgreSQL has a very highly developed sense of > datatype comparison that we might be taking this to extremes? Would any > other RDBMS consider two different datatypes to be comparable? We do have a broader comparable than the spec. However, if we were to limit it to the spec then many of the implicit casts and cross-type comparison operators we have would be invalid as well, since the comparison between those types would have to fail as well, unless we treat the comparable used by referential constraints differently.
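If I follow Stephan's int-vs-numeric example correctly, the case under discussion can be reproduced with something like the following. This is an untested sketch with made-up table names; the exact WARNING text will vary by version.

```sql
-- Hypothetical reproduction of the cross-type FK case:
CREATE TABLE pk_tab (id integer PRIMARY KEY);

CREATE TABLE fk_tab (
    ref numeric REFERENCES pk_tab (id)
    -- int and numeric are mutually comparable per the spec, so this
    -- is accepted, but 8.0 emits a datatype-mismatch WARNING because
    -- the RI check may not be able to use pk_tab's index efficiently.
);
```

Declaring `ref` as `integer` instead avoids the warning and keeps the RI checks indexable.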
Re: [PERFORM] pg_autovacuum not having enough suction ?
Tom Lane wrote:

> "Matthew T. O'Connor" writes:
> > hmm... the value in reltuples should be accurate after a vacuum (or
> > vacuum analyze); if it's not, it's a vacuum bug or something is going
> > on that isn't understood. If you or pg_autovacuum are running plain
> > analyze commands, that could explain the invalid reltuples numbers.
> > Was reltuples = 113082 correct right after the vacuum?
>
> Another thing to check is whether the reltuples (and relpages!) that
> autovacuum is reporting are the same as what's actually in the pg_class
> row for the relation. I'm wondering if this could be a similar issue
> to the old autovac bug where it wasn't reading the value correctly.

I don't think so, as he did some manual selects from pg_class and
pg_stat_all in one of the emails he sent that were showing similar
numbers to what autovac was reporting.

> If they are the same then it seems like it must be a backend issue.
>
> One thing that is possibly relevant here is that in 8.0 a plain VACUUM
> doesn't set reltuples to the exactly correct number, but to an
> interpolated value that reflects our estimate of the "steady state"
> average between vacuums. I wonder if that code is wrong, or if it's
> operating as designed but is confusing autovac.

Ahh... Now that you mention it, I do remember the discussion during 8.0
development. This sounds very much like the cause of the problem.
Autovac is not vacuuming often enough for this table because reltuples
is telling autovac that there are a lot more tuples in this table than
there really are. Really this is just another case of the more general
problem with autovac as it stands now: you can't set vacuum thresholds
on a per-table basis, and databases like this can't survive with a
one-size-fits-all threshold. I would suggest that Otto perform regular
cron-based vacuums of this one table in addition to autovac; that is
what several people I have heard from in the field are doing.
Come hell or high water I'm gonna get autovac integrated into 8.1, at
which point per-table thresholds would be easy to do.

> Can autovac be told to run the vacuums in VERBOSE mode? It would be
> useful to compare what VERBOSE has to say to the changes in
> reltuples/relpages.

Not as it stands now. That would be an interesting feature for
debugging purposes though.

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if
your joining column's datatypes do not match
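The cron-based workaround suggested above amounts to nothing more than
scheduling a plain VACUUM ANALYZE of the hot table; a minimal sketch
using the database and table names from this thread (the ten-minute
schedule is an assumption, not from the source):

```sql
-- Intended to be run periodically from cron between autovac passes, e.g.:
--   */10 * * * *  psql -d secom -c "VACUUM ANALYZE public.file_92"
VACUUM ANALYZE public.file_92;
```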
Re: [PERFORM] Delete query takes exorbitant amount of time
On Fri, 2005-03-25 at 13:47 -0500, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > On Fri, 2005-03-25 at 10:17 -0500, Tom Lane wrote:
> >>> Other than spec compliance, you mean? SQL99 says
> >>>
> >>> ... The declared type of each referencing column shall be
> >>> comparable to the declared type of the corresponding referenced
> >>> column.
>
> > Tom had said SQL99 required this; I have pointed out SQL:2003, which
> > supersedes the SQL99 standard, does not require this.
>
> You're reading the wrong part of SQL:2003. 11.8 <referential
> constraint definition> syntax rule 9 still has the text I quoted.

So, we have this from SQL:2003 section 11.8 p.550
- 3a) requires us to have an index
- 9) requires the data types to be "comparable"

In the name of spec-compliance we wish to accept an interpretation of
the word "comparable" that means we will accept two datatypes that are
not actually the same.

So we are happy to enforce having the index, but not happy to ensure the
index is actually usable for the task?

> > Leading us back to my original point - what is the benefit of continuing
> > with having a WARNING when that leads people into trouble later?
>
> Accepting spec-compliant schemas.

I definitely want this too - as you know I have worked on documenting
compliance previously.

Is the word "comparable" defined elsewhere in the standard?

Currently, datatypes with similar type categories are comparable and yet
(in 8.0) will now use the index. So, we are taking comparable to include
fairly radically different datatypes?

Could it be that because PostgreSQL has a very highly developed sense of
datatype comparison that we might be taking this to extremes? Would any
other RDBMS consider two different datatypes to be comparable?

Please consider this.

Best Regards, Simon Riggs

---(end of broadcast)---
TIP 8: explain analyze is your friend
Re: [PERFORM] pg_autovacuum not having enough suction ?
> Another thing to check is whether the reltuples (and relpages!) that
> autovacuum is reporting are the same as what's actually in the pg_class
> row for the relation. I'm wondering if this could be a similar issue
> to the old autovac bug where it wasn't reading the value correctly.

These values were extracted at roughly the same time. The pg_class row
for the relation (reformatted for readability):

 relname        | file_92
 relnamespace   | 2200
 reltype        | 9384220
 relowner       | 100
 relam          | 0
 relfilenode    | 9384219
 reltablespace  | 0
 relpages       | 6624
 reltuples      | 113082
 reltoastrelid  | 0
 reltoastidxid  | 0
 relhasindex    | t
 relisshared    | f
 relkind        | r
 relnatts       | 23
 relchecks      | 0
 reltriggers    | 1
 relukeys       | 0
 relfkeys       | 0
 relrefs        | 0
 relhasoids     | t
 relhaspkey     | f
 relhasrules    | f
 relhassubclass | f
(1 row)

secom=# select count(*) from file_92;
 count
-------
 17579
(1 row)

[2005-03-25 12:16:32 EST] INFO: table name: secom."public"."file_92"
[2005-03-25 12:16:32 EST] INFO: relid: 9384219; relisshared: 0
[2005-03-25 12:16:32 EST] INFO: reltuples: 113082.00; relpages: 6624
[2005-03-25 12:16:32 EST] INFO: curr_analyze_count: 993780; curr_vacuum_count: 732470
[2005-03-25 12:16:32 EST] INFO: last_analyze_count: 923820; last_vacuum_count: 662699
[2005-03-25 12:16:32 EST] INFO: analyze_threshold: 113582; vacuum_threshold: 227164

Hope this helps; if there is anything else I can do please let me know.

> If they are the same then it seems like it must be a backend issue.
>
> One thing that is possibly relevant here is that in 8.0 a plain VACUUM
> doesn't set reltuples to the exactly correct number, but to an
> interpolated value that reflects our estimate of the "steady state"
> average between vacuums. I wonder if that code is wrong, or if it's
> operating as designed but is confusing autovac.

This average steady-state value might be hard to interpolate in this
case, since this is only a temporary holding place for the records..?
Normally the table has < 10 records in it at the same time. In the lab
we create "lump-traffic" by sending over 50,000 records. It takes about
20 hours to transfer over all of the 50k records.

---(end of broadcast)---
TIP 8: explain analyze is your friend
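For what it's worth, the thresholds in these logs are consistent with a
base-plus-scaling rule applied to reltuples. The base values (500 and
1000) and scaling factors (1 and 2) below are inferred from the logged
numbers, not taken from the pg_autovacuum source:

```sql
-- Hypothetical reconstruction of the logged thresholds from the
-- inflated reltuples value (bases and factors are inferred):
SELECT 500  + 1 * 113082 AS analyze_threshold,  -- log shows 113582
       1000 + 2 * 113082 AS vacuum_threshold;   -- log shows 227164
```

Since both thresholds scale with reltuples, an inflated reltuples
directly postpones the next vacuum — the vicious circle described
earlier in the thread.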
lazy_update_relstats considered harmful (was Re: [PERFORM] pg_autovacuum not having enough suction ?)
I wrote:
> One thing that is possibly relevant here is that in 8.0 a plain VACUUM
> doesn't set reltuples to the exactly correct number, but to an
> interpolated value that reflects our estimate of the "steady state"
> average between vacuums. I wonder if that code is wrong, or if it's
> operating as designed but is confusing autovac.

Now that I think it over, I'm thinking that I must have been suffering
severe brain fade the day I wrote lazy_update_relstats() (see
vacuumlazy.c). The numbers that that routine is averaging are the pre-
and post-vacuum physical tuple counts. But the difference between them
consists of known-dead tuples, and we shouldn't be factoring dead
tuples into reltuples. The planner has always considered reltuples to
count only live tuples, and I think this is correct on two grounds:

1. The numbers of tuples estimated to be returned by scans certainly
shouldn't count dead ones.

2. Dead tuples don't have that much influence on scan costs either, at
least not once they are marked as known-dead. Certainly they shouldn't
be charged at full freight.

It's possible that there'd be some value in adding a column to pg_class
to record dead tuple count, but given what we have now, the calculation
in lazy_update_relstats is totally wrong.

The idea I was trying to capture is that the tuple density is at a
minimum right after VACUUM, and will increase as free space is filled
in until the next VACUUM, so that recording the exact tuple count
underestimates the number of tuples that will be seen on-the-average.
But I'm not sure that idea really holds water. The only way that a
table can be at "steady state" over a long period is if the number of
live tuples remains roughly constant (ie, inserts balance deletes).
What actually increases and decreases over a VACUUM cycle is the
density of *dead* tuples ... but per the above arguments this isn't
something we should adjust reltuples for.
So I'm thinking lazy_update_relstats should be ripped out and we should go back to recording just the actual stats. Sound reasonable? Or was I right the first time and suffering brain fade today? regards, tom lane ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
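To make the objection concrete with invented numbers (only the averaging
behaviour is from Tom's description; the tuple counts are hypothetical):
if a table holds 31000 live tuples and roughly 164000 dead ones before a
VACUUM, averaging the pre- and post-vacuum physical counts bakes the
dead tuples into reltuples:

```sql
-- Hypothetical pre- and post-vacuum physical tuple counts.
SELECT (195000 + 31000) / 2 AS interpolated_reltuples,  -- 113000, inflated by dead tuples
       31000                AS live_tuples;             -- what the planner actually wants
```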
Re: [PERFORM] pg_autovacuum not having enough suction ?
> Was reltuples = 113082 correct right after the vacuum?

No, there were about 31000 rows after the vacuum. I'm no expert, but
tuples = rows, right? This is not a "normal" table though, in the sense
that it is only a temporary holding ground, as I explained earlier. I
create 50,000 records and these get sent over from our custom 68030
system, to tables like file_92_myy, depending on the date of the record.
A pl/pgsql script is used as a trigger to move the records after they
get data from the 68030. Don't know if that is of interest or not. I
could post the trigger if you'd like.

"Matthew T. O'Connor" wrote in message news:[EMAIL PROTECTED]
> hmm... the value in reltuples should be accurate after a vacuum (or
> vacuum analyze); if it's not, it's a vacuum bug or something is going
> on that isn't understood. If you or pg_autovacuum are running plain
> analyze commands, that could explain the invalid reltuples numbers.
>
> Was reltuples = 113082 correct right after the vacuum?
>
> Matthew
>
> Otto Blomqvist wrote:
>
> >It looks like the reltuples values are screwed up. Even though rows are
> >constantly being removed from the table, the reltuples keep going up. If I
> >understand correctly that also makes the vacuum threshold go up and we end
> >up in a vicious circle. Right after pg_autovacuum performed a vacuum analyze
> >on the table it actually had 31000 records, but reltuples reports over 100k.
> >I'm not sure if this means anything but I thought I would pass it along.
> >
> >PG version 8.0.0, 31MB tarred DB.
> >
> >[2005-03-25 09:16:14 EST] INFO: dbname: testing
> >[2005-03-25 09:16:14 EST] INFO: oid: 9383816
> >[2005-03-25 09:16:14 EST] INFO: username: (null)
> >[2005-03-25 09:16:14 EST] INFO: password: (null)
> >[2005-03-25 09:16:14 EST] INFO: conn is null, (not connected)
> >[2005-03-25 09:16:14 EST] INFO: default_analyze_threshold: 1000
> >[2005-03-25 09:16:14 EST] INFO: default_vacuum_threshold: 500
> >
> >[2005-03-25 09:05:12 EST] INFO: table name: secom."public"."file_92"
> >[2005-03-25 09:05:12 EST] INFO: relid: 9384219; relisshared: 0
> >[2005-03-25 09:05:12 EST] INFO: reltuples: 49185.00; relpages: 8423
> >[2005-03-25 09:05:12 EST] INFO: curr_analyze_count: 919274; curr_vacuum_count: 658176
> >[2005-03-25 09:05:12 EST] INFO: last_analyze_count: 899272; last_vacuum_count: 560541
> >[2005-03-25 09:05:12 EST] INFO: analyze_threshold: 49685; vacuum_threshold: 100674
> >
> >[2005-03-25 09:10:12 EST] DEBUG: Performing: VACUUM ANALYZE "public"."file_92"
> >[2005-03-25 09:10:33 EST] INFO: table name: secom."public"."file_92"
> >[2005-03-25 09:10:33 EST] INFO: relid: 9384219; relisshared: 0
> >[2005-03-25 09:10:33 EST] INFO: reltuples: 113082.00; relpages: 6624
> >[2005-03-25 09:10:33 EST] INFO: curr_analyze_count: 923820; curr_vacuum_count: 662699
> >[2005-03-25 09:10:33 EST] INFO: last_analyze_count: 923820; last_vacuum_count: 662699
> >[2005-03-25 09:10:33 EST] INFO: analyze_threshold: 113582; vacuum_threshold: 227164
> >
> >[2005-03-25 09:16:14 EST] INFO: table name: secom."public"."file_92"
> >[2005-03-25 09:16:14 EST] INFO: relid: 9384219; relisshared: 0
> >[2005-03-25 09:16:14 EST] INFO: reltuples: 113082.00; relpages: 6624  <-- Actually has 31k rows
> >[2005-03-25 09:16:14 EST] INFO: curr_analyze_count: 923820; curr_vacuum_count: 662699
> >[2005-03-25 09:16:14 EST] INFO: last_analyze_count: 923820; last_vacuum_count: 662699
> >[2005-03-25 09:16:14 EST] INFO: analyze_threshold: 113582; vacuum_threshold: 227164
>
> >DETAIL: Allocated FSM size: 1000 relations + 200 pages = 11784 kB
> >shared memory.
> >
> >- Original Message -
> >From: "Matthew T. O'Connor"
> >To: "Otto Blomqvist" <[EMAIL PROTECTED]>;
> >Sent: Thursday, March 24, 2005 3:58 PM
> >Subject: Re: [PERFORM] pg_autovacuum not having enough suction ?
> >
> >>I would rather keep this on list since other people can chime in.
> >>
> >>Otto Blomqvist wrote:
> >>
> >>>It does not seem to be a Stats collector problem.
> >>>
> >>> oid     | relname | relnamespace | relpages | relisshared | reltuples | schemaname | n_tup_ins | n_tup_upd | n_tup_del
> >>>---------+---------+--------------+----------+-------------+-----------+------------+-----------+-----------+-----------
> >>> 9384219 | file_92 |         2200 |     8423 | f           |     49837 | public     |    158176 |    318527 |    158176
> >>>(1 row)
> >>>
> >>>I insert 50,000 records
> >>>
> >>>secom=# select createfile_92records(1, 50000);  <--- this is a pg script
> >>>that inserts records 1 through 50000.
> >>> createfile_92records
> >>>----------------------
> >>>                    0
> >>>
> >>> oid | relname | r
Re: [PERFORM] pg_autovacuum not having enough suction ?
"Matthew T. O'Connor" writes:
> hmm... the value in reltuples should be accurate after a vacuum (or
> vacuum analyze); if it's not, it's a vacuum bug or something is going
> on that isn't understood. If you or pg_autovacuum are running plain
> analyze commands, that could explain the invalid reltuples numbers.

> Was reltuples = 113082 correct right after the vacuum?

Another thing to check is whether the reltuples (and relpages!) that
autovacuum is reporting are the same as what's actually in the pg_class
row for the relation. I'm wondering if this could be a similar issue
to the old autovac bug where it wasn't reading the value correctly.
If they are the same then it seems like it must be a backend issue.

One thing that is possibly relevant here is that in 8.0 a plain VACUUM
doesn't set reltuples to the exactly correct number, but to an
interpolated value that reflects our estimate of the "steady state"
average between vacuums. I wonder if that code is wrong, or if it's
operating as designed but is confusing autovac.

Can autovac be told to run the vacuums in VERBOSE mode? It would be
useful to compare what VERBOSE has to say to the changes in
reltuples/relpages.

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] pg_autovacuum not having enough suction ?
hmm... the value in reltuples should be accurate after a vacuum (or
vacuum analyze); if it's not, it's a vacuum bug or something is going
on that isn't understood. If you or pg_autovacuum are running plain
analyze commands, that could explain the invalid reltuples numbers.

Was reltuples = 113082 correct right after the vacuum?

Matthew

Otto Blomqvist wrote:

> It looks like the reltuples values are screwed up. Even though rows are
> constantly being removed from the table, the reltuples keep going up. If I
> understand correctly that also makes the vacuum threshold go up and we end
> up in a vicious circle. Right after pg_autovacuum performed a vacuum analyze
> on the table it actually had 31000 records, but reltuples reports over 100k.
> I'm not sure if this means anything but I thought I would pass it along.
>
> PG version 8.0.0, 31MB tarred DB.
>
> [2005-03-25 09:16:14 EST] INFO: dbname: testing
> [2005-03-25 09:16:14 EST] INFO: oid: 9383816
> [2005-03-25 09:16:14 EST] INFO: username: (null)
> [2005-03-25 09:16:14 EST] INFO: password: (null)
> [2005-03-25 09:16:14 EST] INFO: conn is null, (not connected)
> [2005-03-25 09:16:14 EST] INFO: default_analyze_threshold: 1000
> [2005-03-25 09:16:14 EST] INFO: default_vacuum_threshold: 500
>
> [2005-03-25 09:05:12 EST] INFO: table name: secom."public"."file_92"
> [2005-03-25 09:05:12 EST] INFO: relid: 9384219; relisshared: 0
> [2005-03-25 09:05:12 EST] INFO: reltuples: 49185.00; relpages: 8423
> [2005-03-25 09:05:12 EST] INFO: curr_analyze_count: 919274; curr_vacuum_count: 658176
> [2005-03-25 09:05:12 EST] INFO: last_analyze_count: 899272; last_vacuum_count: 560541
> [2005-03-25 09:05:12 EST] INFO: analyze_threshold: 49685; vacuum_threshold: 100674
>
> [2005-03-25 09:10:12 EST] DEBUG: Performing: VACUUM ANALYZE "public"."file_92"
> [2005-03-25 09:10:33 EST] INFO: table name: secom."public"."file_92"
> [2005-03-25 09:10:33 EST] INFO: relid: 9384219; relisshared: 0
> [2005-03-25 09:10:33 EST] INFO: reltuples: 113082.00; relpages: 6624
> [2005-03-25 09:10:33 EST] INFO: curr_analyze_count: 923820; curr_vacuum_count: 662699
> [2005-03-25 09:10:33 EST] INFO: last_analyze_count: 923820; last_vacuum_count: 662699
> [2005-03-25 09:10:33 EST] INFO: analyze_threshold: 113582; vacuum_threshold: 227164
>
> [2005-03-25 09:16:14 EST] INFO: table name: secom."public"."file_92"
> [2005-03-25 09:16:14 EST] INFO: relid: 9384219; relisshared: 0
> [2005-03-25 09:16:14 EST] INFO: reltuples: 113082.00; relpages: 6624  <-- Actually has 31k rows
> [2005-03-25 09:16:14 EST] INFO: curr_analyze_count: 923820; curr_vacuum_count: 662699
> [2005-03-25 09:16:14 EST] INFO: last_analyze_count: 923820; last_vacuum_count: 662699
> [2005-03-25 09:16:14 EST] INFO: analyze_threshold: 113582; vacuum_threshold: 227164
>
> DETAIL: Allocated FSM size: 1000 relations + 200 pages = 11784 kB
> shared memory.
>
> - Original Message -
> From: "Matthew T. O'Connor"
> To: "Otto Blomqvist" <[EMAIL PROTECTED]>;
> Sent: Thursday, March 24, 2005 3:58 PM
> Subject: Re: [PERFORM] pg_autovacuum not having enough suction ?
>
> I would rather keep this on list since other people can chime in.
>
> Otto Blomqvist wrote:
>
> >It does not seem to be a Stats collector problem.
> >
> > oid     | relname | relnamespace | relpages | relisshared | reltuples | schemaname | n_tup_ins | n_tup_upd | n_tup_del
> >---------+---------+--------------+----------+-------------+-----------+------------+-----------+-----------+-----------
> > 9384219 | file_92 |         2200 |     8423 | f           |     49837 | public     |    158176 |    318527 |    158176
> >(1 row)
> >
> >I insert 50,000 records
> >
> >secom=# select createfile_92records(1, 50000);  <--- this is a pg script
> >that inserts records 1 through 50000.
> > createfile_92records
> >----------------------
> >                    0
> >
> > oid     | relname | relnamespace | relpages | relisshared | reltuples | schemaname | n_tup_ins | n_tup_upd | n_tup_del
> >---------+---------+--------------+----------+-------------+-----------+------------+-----------+-----------+-----------
> > 9384219 | file_92 |         2200 |     8423 | f           |     49837 | public     |    208179 |    318932 |    158377
> >(1 row)
> >
> >reltuples does not change? Hmm. n_tup_ins looks fine.
>
> That is expected, reltuples only gets updated by a vacuum or an analyze.
>
> >This table is basically a queue full of records waiting to get transferred
> >over from our 68030 system to the PG database. The records are then moved
> >into folders (using a trigger) like file_92_myy depending on what month the
> >record was created on the 68030.
The records are then moved into folders (using a trigger) like file_92_myy depending on what month the record was created on the 68030. During normal operations there should not be more than 10 records at a time in the table, although during the course of a day a normal system will get about 50k records. I create 5 records to simulate incoming traffic, since we don't have much traffic in
Re: [PERFORM] Delete query takes exorbitant amount of time
Simon Riggs <[EMAIL PROTECTED]> writes:
> On Fri, 2005-03-25 at 10:17 -0500, Tom Lane wrote:
>>> Other than spec compliance, you mean? SQL99 says
>>>
>>> ... The declared type of each referencing column shall be
>>> comparable to the declared type of the corresponding referenced
>>> column.

> Tom had said SQL99 required this; I have pointed out SQL:2003, which
> supersedes the SQL99 standard, does not require this.

You're reading the wrong part of SQL:2003. 11.8 <referential constraint
definition> syntax rule 9 still has the text I quoted.

> Leading us back to my original point - what is the benefit of continuing
> with having a WARNING when that leads people into trouble later?

Accepting spec-compliant schemas.

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Delete query takes exorbitant amount of time
On Fri, 2005-03-25 at 08:23 -0800, Stephan Szabo wrote:
> On Fri, 25 Mar 2005, Simon Riggs wrote:
> > On Fri, 2005-03-25 at 10:17 -0500, Tom Lane wrote:
> > > Simon Riggs <[EMAIL PROTECTED]> writes:
> > > > I vote to make this an ERROR in 8.1 - I see little benefit in
> > > > allowing this situation to continue.
> > >
> > > Other than spec compliance, you mean? SQL99 says
> > >
> > > ... The declared type of each referencing column shall be
> > > comparable to the declared type of the corresponding referenced
> > > column.
> > >
> > > It doesn't say that it has to be indexable, and most definitely not
> > > that there has to be an index.
> >
> > specs at dawn, eh?
> >
> > Well, SQL:2003 Foundation, p.550 clause 3a) states that the
> > <referenced table and columns> in the referencing table must match a
> > unique constraint on the referenced table, or the PRIMARY KEY if the
> > columns are not specified. Either way, the referenced columns are a
> > unique constraint (which makes perfect sense from a logical data
> > perspective).
> >
> > We implement unique constraints via an index, so for PostgreSQL the
> > clause implies that it must refer to an index.
>
> IMHO, that reference is irrelevant.

Tom had said SQL99 required this; I have pointed out SQL:2003, which
supersedes the SQL99 standard, does not require this.

Leading us back to my original point - what is the benefit of continuing
with having a WARNING when that leads people into trouble later?

> Yes, there must be an index due to our implementation, however that
> doesn't imply that the types must be the same

No, it doesn't imply it, but what benefit do you see from the
interpretation that they are allowed to differ? That interpretation
currently leads to many mistakes leading to poor performance.

There is clear benefit from forcing them to be the same. In logical
data terms, they *should* be the same. I don't check fruit.apple_grade
against fruit_type.orange_grade. When would I want to make a check of
that nature?
If there is a reason, that's great - let's keep the status quo then. I
respect the effort and thought that has already gone into the
implementation; I seek only to offer a very minor improvement based
upon recent list issues.

> nor even that the index must be usable for the cross table
> comparison.

That's a separate discussion, possibly the next one.

Best Regards, Simon Riggs

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if
your joining column's datatypes do not match
Re: [PERFORM] pg_autovacuum not having enough suction ?
It looks like the reltuples values are screwed up. Even though rows are
constantly being removed from the table, the reltuples keep going up. If
I understand correctly that also makes the vacuum threshold go up and we
end up in a vicious circle. Right after pg_autovacuum performed a vacuum
analyze on the table it actually had 31000 records, but reltuples
reports over 100k. I'm not sure if this means anything but I thought I
would pass it along.

PG version 8.0.0, 31MB tarred DB.

[2005-03-25 09:16:14 EST] INFO: dbname: testing
[2005-03-25 09:16:14 EST] INFO: oid: 9383816
[2005-03-25 09:16:14 EST] INFO: username: (null)
[2005-03-25 09:16:14 EST] INFO: password: (null)
[2005-03-25 09:16:14 EST] INFO: conn is null, (not connected)
[2005-03-25 09:16:14 EST] INFO: default_analyze_threshold: 1000
[2005-03-25 09:16:14 EST] INFO: default_vacuum_threshold: 500

[2005-03-25 09:05:12 EST] INFO: table name: secom."public"."file_92"
[2005-03-25 09:05:12 EST] INFO: relid: 9384219; relisshared: 0
[2005-03-25 09:05:12 EST] INFO: reltuples: 49185.00; relpages: 8423
[2005-03-25 09:05:12 EST] INFO: curr_analyze_count: 919274; curr_vacuum_count: 658176
[2005-03-25 09:05:12 EST] INFO: last_analyze_count: 899272; last_vacuum_count: 560541
[2005-03-25 09:05:12 EST] INFO: analyze_threshold: 49685; vacuum_threshold: 100674

[2005-03-25 09:10:12 EST] DEBUG: Performing: VACUUM ANALYZE "public"."file_92"
[2005-03-25 09:10:33 EST] INFO: table name: secom."public"."file_92"
[2005-03-25 09:10:33 EST] INFO: relid: 9384219; relisshared: 0
[2005-03-25 09:10:33 EST] INFO: reltuples: 113082.00; relpages: 6624
[2005-03-25 09:10:33 EST] INFO: curr_analyze_count: 923820; curr_vacuum_count: 662699
[2005-03-25 09:10:33 EST] INFO: last_analyze_count: 923820; last_vacuum_count: 662699
[2005-03-25 09:10:33 EST] INFO: analyze_threshold: 113582; vacuum_threshold: 227164

[2005-03-25 09:16:14 EST] INFO: table name: secom."public"."file_92"
[2005-03-25 09:16:14 EST] INFO: relid: 9384219; relisshared: 0
[2005-03-25 09:16:14 EST] INFO: reltuples: 113082.00; relpages: 6624  <-- Actually has 31k rows
[2005-03-25 09:16:14 EST] INFO: curr_analyze_count: 923820; curr_vacuum_count: 662699
[2005-03-25 09:16:14 EST] INFO: last_analyze_count: 923820; last_vacuum_count: 662699
[2005-03-25 09:16:14 EST] INFO: analyze_threshold: 113582; vacuum_threshold: 227164

DETAIL: Allocated FSM size: 1000 relations + 200 pages = 11784 kB
shared memory.

- Original Message -
From: "Matthew T. O'Connor"
To: "Otto Blomqvist" <[EMAIL PROTECTED]>;
Sent: Thursday, March 24, 2005 3:58 PM
Subject: Re: [PERFORM] pg_autovacuum not having enough suction ?

> I would rather keep this on list since other people can chime in.
>
> Otto Blomqvist wrote:
>
> >It does not seem to be a Stats collector problem.
> >
> > oid     | relname | relnamespace | relpages | relisshared | reltuples | schemaname | n_tup_ins | n_tup_upd | n_tup_del
> >---------+---------+--------------+----------+-------------+-----------+------------+-----------+-----------+-----------
> > 9384219 | file_92 |         2200 |     8423 | f           |     49837 | public     |    158176 |    318527 |    158176
> >(1 row)
> >
> >I insert 50,000 records
> >
> >secom=# select createfile_92records(1, 50000);  <--- this is a pg script
> >that inserts records 1 through 50000.
> > createfile_92records
> >----------------------
> >                    0
> >
> > oid     | relname | relnamespace | relpages | relisshared | reltuples | schemaname | n_tup_ins | n_tup_upd | n_tup_del
> >---------+---------+--------------+----------+-------------+-----------+------------+-----------+-----------+-----------
> > 9384219 | file_92 |         2200 |     8423 | f           |     49837 | public     |    208179 |    318932 |    158377
> >(1 row)
> >
> >reltuples does not change? Hmm. n_tup_ins looks fine.
>
> That is expected, reltuples only gets updated by a vacuum or an analyze.
>
> >This table is basically a queue full of records waiting to get transferred
> >over from our 68030 system to the PG database. The records are then moved
> >into folders (using a trigger) like file_92_myy depending on what month the
> >record was created on the 68030.
> >During normal operations there should not
> >be more than 10 records at a time in the table, although during the course
> >of a day a normal system will get about 50k records. I create 50,000 records
> >to simulate incoming traffic, since we don't have much traffic in the test
> >lab.
> >
> >After a few hours we have
> >
> >secom=# select count(*) from file_92;
> > count
> >-------
> > 42072
> >
> >So we have sent over approx 8000 Records.
> >
> > oid | relname | relnamespace | relpages | relisshared | reltuples | schemaname | n_tup_ins | n_tup_upd | n_tup_del
>
Re: [PERFORM] Script for getting a table of reponse-time breakdown
Josh,

The description of the Oracle script is:

  This script can be used to focus tuning attention on the most important
  issues. It reports a breakdown of total foreground response time into
  four major categories: CPU usage, disk I/O, resource waits, and routine
  latencies. These categories are broken down further into sub-categories,
  and the component wait events are shown. The 8.1.6 version of the script
  uses the ratio_to_report analytic function to calculate percentages. The
  8.1.5 version can be used if percentages are not required. The 8.1.5
  version of the script should work on Oracle8 also, but has not yet been
  tested.

The printout of this script is:

SQL> @response_time_breakdown

MAJOR    MINOR          WAIT_EVENT                               SECONDS    PCT
-------- -------------- ---------------------------------------- ------- ------
CPU time parsing        n/a                                          497   .57%
         reloads        n/a                                           13   .01%
         execution      n/a                                        52618 59.99%

disk I/O normal I/O     db file sequential read                    21980 25.06%
         full scans     db file scattered read                      9192 10.48%
         direct I/O     direct path read                            1484  1.69%
                        direct path write                            354   .40%
         other I/O      log file sequential read                       9   .01%
                        db file parallel read                          0   .00%
                        control file sequential read                   0   .00%
                        control file parallel write                    0   .00%

waits    DBWn writes    rdbms ipc reply                              143   .16%
                        free buffer waits                             36   .04%
                        checkpoint completed                          31   .04%
         LGWR writes    log file switch completion                   698   .80%
         other locks    latch free                                   496   .57%
                        sort segment request                         108   .12%

latency  commits        log file sync                                  6   .01%
         network        SQL*Net break/reset to client                 18   .02%
                        SQL*Net more data to client                    8   .01%
                        SQL*Net message to client                      7   .01%
                        SQL*Net more data from client                  3   .00%
         file ops       file open                                      4   .01%
                        file identify                                  1   .00%
         misc           instance state change                          0   .00%

The script is pretty long:

------------------------------------------------------------------------------
--
-- Script:    response_time_breakdown.sql
-- Purpose:   to report the components of foreground response time in % terms
-- For:       8.0 to 8.1.5
--
-- Copyright: (c) Ixora Pty Ltd
-- Author:    Steve Adams
--
------------------------------------------------------------------------------
@save_sqlplus_settings

column major format a8
column wait_event format a40 trunc
column seconds format 999
column pct justify right
break on major skip 1 on minor

select
  substr(n_major, 3)  major,
  substr(n_minor, 3)  minor,
  wait_event,
  round(time/100)  seconds
from (
  select /*+ ordered use_hash(b) */
    '1 CPU time'  n_major,
    decode(t.ksusdnam,
      'redo size', '2 reloads',
      'parse time cpu', '1 parsing',
      '3 execution')  n_minor,
    'n/a'  wait_event,
    decode(t.ksusdnam,
      'redo size', nvl(r.time, 0),
      'parse time cpu', t.ksusgstv - nvl(b.time, 0),
      t.ksusgstv - nvl(b.time, 0) - nvl(r.time, 0))  time
  from
    sys.x_$ksusgsta  t,
    (
      select /*+ ordered use_nl(s) */  -- star query: few rows from d and b
        s.ksusestn,                    -- statistic#
        sum(s.ksusestv)  time          -- time used by backgrounds
      from
        sys.x_$ksusd  d,     -- statname
        sys.x_$ksuse  b,     -- session
        sys.x_$ksbdp  p,     -- background process
        sys.x_$ksusesta  s   -- sesstat
      where
        d.ksusdnam in ('parse time cpu', 'CPU used when call started')
        and b.ksspaown = p.ksbdppro
        and s.ksusestn = d.indx
        and s.indx = b.indx
Re: [PERFORM] Script for getting a table of reponse-time breakdown
Jack, > This script can be used to focus tuning attention on the most important > issues. It reports a breakdown of total foreground response time into > four major categories: CPU usage, disk I/O, resource waits, and routine > latencies. These categories are broken down further into sub-categories, > and the component wait events are shown. This would be very nice. And very, very hard to build. No, we don't have anything similar. You can, of course, use profiling tools. -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Script for getting a table of reponse-time breakdown
Jack,

> I am thinking about how to continuously monitor the performance of a
> PostgreSQL 8 database. I am interested in two things: (1) the growth of
> tables with TOAST and indexes;

This is queryable from the system tables, if you don't mind an
approximation.

> and (2) the respond time breakdown for a query.

The what? You mean EXPLAIN ANALYZE?

> In Chapters 23 and 24 of the big manual, I found enough materials to
> teach me how to do the 1st job. But I have difficulty with the 2nd one.
> I found some script for Oracle
> (http://www.ixora.com.au/scripts/waits.htm).

Life's too short for reading Oracle docs. Can you just explain, in
step-by-step detail, what you want?

-- Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 6: Have you searched our list archives? http://archives.postgresql.org
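For the table-growth half of the question, a sketch of the kind of
system-table query being alluded to here (approximate because relpages
and reltuples are only refreshed by VACUUM/ANALYZE; assumes the default
8 kB block size):

```sql
-- Approximate on-disk size of tables ('r'), indexes ('i') and TOAST
-- tables ('t'), from the stats kept in pg_class.
SELECT relname, relkind, relpages, relpages * 8 AS approx_kb
FROM pg_class
WHERE relkind IN ('r', 'i', 't')
ORDER BY relpages DESC;
```

Sampling this periodically (e.g. from cron into a history table) gives
the growth trend over time.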
Re: [PERFORM] Delete query takes exorbitant amount of time
Karim, > I guess I was wondering if there is other general tuning advice for such > large table indexes such as increasing statistics, etc. Well, your index use problem is being explained by Tom, Stephan and Simon; basically your FKed data types are incompatible for index use purposes so the system *can't* use an index while loading. If you're going with the drop/load/recreate option, then I'd suggest increasing work_mem for the duration. Hmmm ... or maintenance_work_mem? What gets used for FK checks? Simon? -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Delete query takes exorbitant amount of time
On Fri, 25 Mar 2005, Simon Riggs wrote:

> On Fri, 2005-03-25 at 10:17 -0500, Tom Lane wrote:
> > Simon Riggs <[EMAIL PROTECTED]> writes:
> > > I vote to make this an ERROR in 8.1 - I see little benefit in allowing
> > > this situation to continue.
> >
> > Other than spec compliance, you mean? SQL99 says
> >
> >     ... The declared type of each referencing column shall be
> >     comparable to the declared type of the corresponding referenced
> >     column.
> >
> > It doesn't say that it has to be indexable, and most definitely not that
> > there has to be an index.
>
> specs at dawn, eh?
>
> Well, SQL:2003 Foundation, p.550 clause 3a) states that the <reference
> column list> in the referencing table must match a unique constraint on
> the referenced table, or the PRIMARY KEY if the columns are not
> specified. Either way, the referenced columns are a unique constraint
> (which makes perfect sense from a logical data perspective).
>
> We implement unique constraints via an index, so for PostgreSQL the
> clause implies that it must refer to an index.

IMHO, that reference is irrelevant. Yes, there must be an index due to our implementation; however, that doesn't imply that the types must be the same, nor even that the index must be usable for the cross-table comparison.

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Delete query takes exorbitant amount of time
On Fri, 2005-03-25 at 10:17 -0500, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > I vote to make this an ERROR in 8.1 - I see little benefit in allowing
> > this situation to continue.
>
> Other than spec compliance, you mean? SQL99 says
>
>     ... The declared type of each referencing column shall be
>     comparable to the declared type of the corresponding referenced
>     column.
>
> It doesn't say that it has to be indexable, and most definitely not that
> there has to be an index.

specs at dawn, eh?

Well, SQL:2003 Foundation, p.550 clause 3a) states that the <reference column list> in the referencing table must match a unique constraint on the referenced table, or the PRIMARY KEY if the columns are not specified. Either way, the referenced columns are a unique constraint (which makes perfect sense from a logical data perspective).

We implement unique constraints via an index, so for PostgreSQL the clause implies that it must refer to an index.

Touché, Monsieur Lane, and Happy Easter :-)

But even without that, there is little benefit in allowing it... WARNING -> ERROR, please.

Best Regards, Simon Riggs

---(end of broadcast)---
TIP 8: explain analyze is your friend
Re: [PERFORM] [BUGS] BUG #1552: massive performance hit
On Fri, 2005-03-25 at 03:50 -0700, Karim Nassar wrote:
> On Fri, 2005-03-25 at 10:18 +, Simon Riggs wrote:
> > > When running with PostgreSQL 7.4 on a dual-CPU Athlon MP2400+ machine
> > > with a gigabyte of RAM, running Debian Linux version 2.6.8.1, we were
> > > able to insert all this data in 5-7 minutes. It's taken a while to
> > > install Postgres 8.0.1 on the same machine, but now I have, and it's
> > > taking 40-45 minutes to run the same insert script.
> >
> > OK. Not-a-bug.
> >
> > Your situation is covered in the manual with some sage advice
> > http://www.postgresql.org/docs/8.0/static/populate.html
> > It doesn't go into great lengths about all the reasons why those
> > recommendations are good ones - but they are clear.
>
> Simon, this begs the question: what changed from 7.4->8.0 to require he
> modify his script?

Good question. Clearly, some combination of stats-plus-index-selection code changed, but I suspect this is a case of more, not less, accuracy affecting us here.

The FK code literally generates SQL statements, then prepares them. AFAICS it should be possible to add more code to src/backend/utils/adt/ri_triggers.c to force the prepare of the FK code to avoid seq scans by executing "SET enable_seqscan = off;" I'll have a play.

But the wider point raised by this is whether Prepare should be more conservative in the plan it generates. When we Execute a single query, it is perfectly OK to go for the "best" plan, since it is being executed only this once and we can tell, right now, which one the "best" is. With a Prepared query, it is clearly going to be executed many times, and so we should consider that the optimal plan may change over time.

Index access has more overhead for small tables, but its cost grows (I think) only as log N as the number of rows in a table, N, increases. Sequential scan cost grows linearly with N.

Thus, as N increases from zero, Seq Scan is at first the best plan - but only marginally better than index access; then this changes at some value of N, after which index access is the best plan. As N increases further, Seq Scan clearly diverges badly from indexed access. The conservative choice for unknown or varying N would be index access, rather than the best plan available when the query is prepared.

I propose a more general TODO item:

* Make Prepared queries always use indexed access, if it is available

Best Regards, Simon Riggs

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
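[Editor's note: the log N vs. N argument above can be sketched numerically. The constants in this toy cost model are invented for illustration — they are not the planner's real cost parameters — but the shape is the point: one crossover, then a linearly widening gap in favour of index access.]

```python
import math

# Toy cost model (illustrative constants, not PostgreSQL's planner costs):
# a sequential scan pays ~1 unit per row; an index scan pays a fixed
# overhead plus a term growing with log2(N) for descending the btree.
def seqscan_cost(n: int) -> float:
    return 1.0 * n

def index_cost(n: int) -> float:
    return 50.0 + 10.0 * math.log2(max(n, 2))

# Below the crossover Seq Scan wins; above it index access wins, and the
# gap then diverges linearly - the case for the conservative (indexed)
# plan when N is unknown or growing at Prepare time.
crossover = next(n for n in range(2, 1_000_000)
                 if index_cost(n) < seqscan_cost(n))
print(crossover)
print(seqscan_cost(1_000_000) - index_cost(1_000_000))
```

With these constants the crossover sits near a hundred rows; with the planner's real parameters it lands elsewhere, but the divergence for large N is the same.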
Re: [PERFORM] Delete query takes exorbitant amount of time
Simon Riggs <[EMAIL PROTECTED]> writes:
> I vote to make this an ERROR in 8.1 - I see little benefit in allowing
> this situation to continue.

Other than spec compliance, you mean? SQL99 says

    ... The declared type of each referencing column shall be
    comparable to the declared type of the corresponding referenced
    column.

It doesn't say that it has to be indexable, and most definitely not that there has to be an index.

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
[PERFORM] Script for getting a table of response-time breakdown
Hi,

I am thinking about how to continuously monitor the performance of a PostgreSQL 8 database. I am interested in two things: (1) the growth of tables, with TOAST and indexes; and (2) the response-time breakdown for a query.

In Chapters 23 and 24 of the big manual, I found enough material to teach me how to do the 1st job. But I have difficulty with the 2nd one. I found a script for Oracle (http://www.ixora.com.au/scripts/waits.htm). Do we have something that can do the same job for PostgreSQL 8?

Thanks.

-Jack

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Delete query takes exorbitant amount of time
On Fri, 2005-03-25 at 01:58 -0500, Tom Lane wrote:
> Christopher Kings-Lynne <[EMAIL PROTECTED]> writes:
> >> In that case there's a datatype mismatch between the referencing and
> >> referenced columns, which prevents the index from being used for the
> >> FK check.
>
> > Is creating such a foreign key a WARNING yet?
>
> I believe so as of 8.0. It's a bit tricky since 8.0 does allow some
> cross-type cases to be indexed, but IIRC we have a test that understands
> about that...

src/backend/commands/tablecmds.c, line 3966 in CVS tip:

	/*
	 * Check that the found operator is compatible with the PK index,
	 * and generate a warning if not, since otherwise costly seqscans
	 * will be incurred to check FK validity.
	 */
	if (!op_in_opclass(oprid(o), opclasses[i]))
		ereport(WARNING,
				(errmsg("foreign key constraint \"%s\" "
						"will require costly sequential scans",
						fkconstraint->constr_name),
				 errdetail("Key columns \"%s\" and \"%s\" "
						   "are of different types: %s and %s.",
						   strVal(list_nth(fkconstraint->fk_attrs, i)),
						   strVal(list_nth(fkconstraint->pk_attrs, i)),
						   format_type_be(fktypoid[i]),
						   format_type_be(pktypoid[i]))));

So, yes to the WARNING. Not sure about the cross-type cases...

Karim: Did this happen? If not, can you drop and re-create the constraint and confirm that you get the WARNING? If not, we have problems.

I vote to make this an ERROR in 8.1 - I see little benefit in allowing this situation to continue. If users do create a FK like this, it just becomes another performance problem on the list...

Best Regards, Simon Riggs

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
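[Editor's note: in pseudologic, the tablecmds.c check quoted above boils down to "warn unless the FK/PK comparison operator is a member of the PK index's operator class". A toy restatement — the opclass membership table below is invented purely for illustration and does not reflect the real catalogs:]

```python
# Toy restatement of the check: a WARNING fires when the comparison
# operator for (fk_type, pk_type) is not in the PK index's opclass.
# Membership data here is invented for illustration only.
OPCLASS_MEMBERS = {
    "int4_ops": {("int4", "int4")},
    "int8_ops": {("int8", "int8"), ("int8", "int4")},  # a pretend cross-type case
    "numeric_ops": {("numeric", "numeric")},
}

def fk_needs_seqscan_warning(fk_type: str, pk_type: str, pk_opclass: str) -> bool:
    """Mirror of op_in_opclass(): True means costly seqscans for RI checks."""
    return (fk_type, pk_type) not in OPCLASS_MEMBERS[pk_opclass]

print(fk_needs_seqscan_warning("int4", "int4", "int4_ops"))    # False: index usable
print(fk_needs_seqscan_warning("numeric", "int4", "int4_ops")) # True: WARNING case
```

The "tricky" cross-type cases Tom mentions are exactly the pairs where a cross-type operator does belong to the opclass, so no warning is due.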
Re: [PERFORM] [BUGS] BUG #1552: massive performance hit between 7.4 and 8.0.1
Simon Riggs <[EMAIL PROTECTED]> writes:
> I think we should spawn a TODO item from this:
> * Coerce FK lookups to always use an available index

No, we aren't doing that. The correct TODO item is "Replan cached plans when table size has changed a lot", which of course depends on having a framework to do replanning at all. I intend to take a look at that once Neil has created such a framework ...

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
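[Editor's sketch of the heuristic such a replan framework might apply — entirely hypothetical; no such code existed at the time of this thread. The idea is simply to compare the row count seen at plan time against the current estimate and replan when they differ by more than some factor:]

```python
def should_replan(rows_at_plan_time: float, rows_now: float,
                  factor: float = 10.0) -> bool:
    """Replan when the table has grown or shrunk by more than `factor`.
    Clamping to 1 avoids division by zero for empty tables."""
    lo = max(rows_at_plan_time, 1.0)
    hi = max(rows_now, 1.0)
    return hi / lo > factor or lo / hi > factor

print(should_replan(100, 250))    # False: within 10x, keep the cached plan
print(should_replan(100, 5_000))  # True: grew ~50x, replan
```

This addresses exactly the bulk-load case in BUG #1552: plans prepared against empty tables stay cached while the tables grow by orders of magnitude.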
Re: [PERFORM] [BUGS] BUG #1552: massive performance hit between 7.4 and 8.0.1
On Wed, 2005-03-23 at 14:22 -0500, Keith Browne wrote:
> Simon Riggs wrote:
>
> > The EXPLAINs you've enclosed are for SELECTs, yet your bug report
> > describes INSERTs as being the things that are slow.
> > [You may find better performance from using COPY]
>
> We're starting with an empty database, creating four tables, and
> populating those tables with a total of 180,000-200,000 rows. Each
> table has a primary key, and several of the tables reference foreign
> keys in other tables. We've written a Python script, using psycopg,
> which executes all the queries to create the tables and insert the rows.
> The database is running on the same machine where the script runs.
>
> I've seen similar performance when issuing a COMMIT after each
> insertion, and also after batching insertions in blocks of 250 per
> COMMIT, so batching the commits is not helping much. I've looked at the
> possibility of using COPY, but in our production environment it will be
> prohibitive to build a flat file with all this data. I'd rather
> generate it on the fly, as we've been able to do with PostgreSQL 7.4.
>
> > Also, your tests have compared two systems, so it might be that the
> > hardware or configuration of one system is different from the other.
>
> When running with PostgreSQL 7.4 on a dual-CPU Athlon MP2400+ machine
> with a gigabyte of RAM, running Debian Linux version 2.6.8.1, we were
> able to insert all this data in 5-7 minutes. It's taken a while to
> install Postgres 8.0.1 on the same machine, but now I have, and it's
> taking 40-45 minutes to run the same insert script. This is similar to
> the performance we saw on another machine, a fast single-CPU AMD64 box
> running Gentoo.
>
> I don't think it's a hardware issue. I dug around a bit, and found
> suggestions that this sort of problem could be worked around by breaking
> the database connection and restarting it after the tables had been
> partially filled. I modified our script to break and re-establish the
> database connection when each table first has 4,000 records inserted,
> and the performance is greatly improved; it now takes only about 3.5
> minutes to insert 180,000+ rows.
>
> I've since modified this script to build and populate a fifth table with
> over 1.3 million rows. The fifth table has no primary key, but lists a
> foreign key into one of the first four tables. With the above
> modification (break and re-build the DB connection after 4,000 rows have
> been inserted), the whole database can be populated in about 15 minutes.
> I wouldn't have dared try to build a one-million-plus-row table until
> I found this speed-up.
>
> > If you could repeat the test on one single system, then this would
> > assist in the diagnosis of this bug report. Also, if you could describe
> > the workload that is giving you a problem more exactly, that would help.
> > Specifically, can you confirm that you have run ANALYZE on the tables,
> > and also give us some idea of numbers of rows in each table at the time
> > you first run your programs.
>
> Just to see if it would help, I tried modifying the script to run an
> ANALYZE against each table after 4,000 insertions, instead of breaking
> and re-establishing the DB connection. I still saw ~45-minute times to
> insert 180,000 rows. I then tried running ANALYZE against each table
> after *each* 4,000 rows inserted, and again, it took about 45 minutes to
> run the insert.
>
> Each table is empty when I first run the program. I am dropping and
> re-creating the database for each test run.
>
> > There is clearly a problem, but it is not yet clearly a bug. If it is a
> > bug, we're interested in solving it as much as you.
>
> I'd be happy to run further tests or provide more details, if they'll
> help. We now have a workaround which is allowing us to proceed with our
> project, but I'd like to know if there's another way to do this. While
> I understand that large or complex databases require careful tuning, I
> was surprised to see a six- or seven-fold increase in run times between
> PostgreSQL 7.4 and 8.0.1 on the same hardware, on an operation which
> seems fairly straightforward: populating an empty table.
>
> One other thing which puzzled me: as a test, I tried modifying our
> script to spit out raw SQL statements instead of connecting to the
> database and performing the inserts itself. Normally, our script
> populates two tables in one pass, and then populates the third and
> fourth tables in a second pass. I massaged the SQL by hand to group the
> inserts together by table, so that the first table would be entirely
> populated, then the second, etc. When I ran this SQL script by piping
> it straight into psql, it finished in about four minutes. This is
> comparable to the time it takes to run my modified script which breaks
> and re-establishes the connection to the database.

OK. Not-a-bug. Your situation is covered in the manual
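[Editor's note: Keith's objection to COPY above — that it would be prohibitive to build a flat file — need not hold, since COPY can read from any file-like object and the data can be generated on the fly. A minimal sketch; `cursor.copy_from` is psycopg2's API, and whether the psycopg version in use in this 2005 thread exposed the same call is an assumption:]

```python
import io

def copy_buffer(rows):
    """Render rows in COPY's default text format: tab-separated columns,
    one newline-terminated line per row."""
    return io.StringIO(
        "".join("\t".join(map(str, row)) + "\n" for row in rows)
    )

# Generated in memory, on the fly - no flat file on disk.
buf = copy_buffer([(1, "apple"), (2, "orange")])
print(buf.getvalue(), end="")

# With a live connection it would be used roughly as (hypothetical names):
#   cur.copy_from(buf, "fruit", columns=("id", "name"))
#   conn.commit()
```

For large batches, streaming chunks of rows through such a buffer keeps memory bounded while still getting COPY's bulk-load speed.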