Re: [HACKERS] Frequently updated tables
[EMAIL PROTECTED] wrote:
>> I have been talking about two types of problems, both based on PostgreSQL's behavior with frequently updated tables.
>>
>> Summary table: in the single-row table system, you have to vacuum very frequently, and this affects performance.
>>
>> Frequently updated tables: think about the session table for a website. Each new user gets a new session row. Every time they refresh or act in the site, the row is updated. When they leave or their session times out, the row is deleted. I wrote a RAM-only session manager for PHP because PostgreSQL couldn't handle the volume (2000 hits a second).
>
> It would be interesting to see if the vacuum delay patch, FSM tuning and vacuum scheduling could have changed this situation. Clearly there is an issue here (hence a patch...), but ISTM that just as significant is the fact that it is difficult to know how to configure the various bits and pieces, and also difficult to know whether it has been done optimally.
>
>> If you have an active site, with hundreds or thousands of hits a second, vacuuming the table constantly is not practical. I don't think anyone who has seriously looked at these issues has concluded that PostgreSQL works fine in these cases. The question is what, if anything, can be done? The frequent-update issue really affects PostgreSQL's acceptance in web applications, and it is one where MySQL seems to do a better job.
>
> As an aside, I have had similar issues with DB2 and high-update tables - lock escalations (locklist tuning needed). It is not just non-overwriting storage managers that need the magic tuning wand :-)

Funny, I've used DB2 for a few projects, but never for a web session system. This is an interesting data point, thanks.

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] Frequently updated tables
[EMAIL PROTECTED] wrote:
> I have been talking about two types of problems, both based on PostgreSQL's behavior with frequently updated tables.
>
> Summary table: in the single-row table system, you have to vacuum very frequently, and this affects performance.
>
> Frequently updated tables: think about the session table for a website. Each new user gets a new session row. Every time they refresh or act in the site, the row is updated. When they leave or their session times out, the row is deleted. I wrote a RAM-only session manager for PHP because PostgreSQL couldn't handle the volume (2000 hits a second).

It would be interesting to see if the vacuum delay patch, FSM tuning and vacuum scheduling could have changed this situation. Clearly there is an issue here (hence a patch...), but ISTM that just as significant is the fact that it is difficult to know how to configure the various bits and pieces, and also difficult to know whether it has been done optimally.

> If you have an active site, with hundreds or thousands of hits a second, vacuuming the table constantly is not practical. I don't think anyone who has seriously looked at these issues has concluded that PostgreSQL works fine in these cases. The question is what, if anything, can be done? The frequent-update issue really affects PostgreSQL's acceptance in web applications, and it is one where MySQL seems to do a better job.

As an aside, I have had similar issues with DB2 and high-update tables - lock escalations (locklist tuning needed). It is not just non-overwriting storage managers that need the magic tuning wand :-)

regards

Mark
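The knobs alluded to above (FSM sizing, the vacuum delay patch) live in postgresql.conf. A sketch with purely illustrative values -- the delay parameter names are as they later appeared in mainline PostgreSQL, and the right numbers depend entirely on the update rate and the vacuum interval:

```
# postgresql.conf -- illustrative values, not recommendations
max_fsm_pages     = 200000   # must cover the dead pages accumulating
                             # between vacuum runs, or free space is lost
max_fsm_relations = 1000

# Only with the vacuum delay patch applied (standard in later releases):
vacuum_cost_delay = 10       # ms to nap each time the cost limit is hit
vacuum_cost_limit = 200
```

The trade-off is the one discussed in this thread: a nonzero delay makes VACUUM slower but nearly invisible to concurrent queries.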
Re: [HACKERS] Frequently updated tables
Mohawksoft wrote:
> I don't think anyone who has seriously looked at these issues has concluded that PostgreSQL works fine in these cases. The question is what, if anything, can be done? The frequent-update issue really affects PostgreSQL's acceptance in web applications, and it is one where MySQL seems to do a better job.

I think that we'd welcome any suggestions that don't break MVCC. Do you have any? MySQL is able to handle this situation -- in MyISAM tables -- precisely because there is no transaction isolation and they regard 97% data integrity as good enough. Essentially, the MyISAM tables are little better than delimited text flatfiles. That's not an approach we can take.

> IMHO, this issue, a two-stage-commit based replication system, and a real and usable setup/configuration system are all that stands between PostgreSQL and serious enterprise deployment.

There I have to disagree with you. The features you mention may be important to your clients, but they are not to mine; instead, we're waiting for clustering and table partitioning in addition to what's in 7.5. Please don't assume that all DB applications have the same needs as yours. The problems you raise are legitimate, but not everyone shares your priorities.

Besides, we already have serious enterprise deployment. Five of my clients are startups which run on PostgreSQL. The .ORG and .INFO domains run on PostgreSQL. There are two commercial-grade, deployed ERP systems for manufacturers which run on PostgreSQL. What is your definition of enterprise deployment, exactly?

--
Josh Berkus
Aglio Database Solutions
San Francisco
Re: [HACKERS] Frequently updated tables
[EMAIL PROTECTED] wrote:
>> The best phrasing would be "the accumulating overhead of deletes and updates". Yes.
>
> Are you using 7.3? I am asking because in 7.3, high update / delete tables could suffer (index and TOAST) bloat that was untamable via (lazy) VACUUM and FSM. I believe this is fixed in 7.4, so it should be possible to achieve on-disk size control of tables / indexes by configuring FSM and (lazy) VACUUM. Did you find this not to be the case?

Interesting, the company is using 7.3.4. One single-row summary table got up to 2 million dead rows. A select from that single row took a quarter of a second. A regular vacuum did not fix it; only a vacuum full did. However, when the test was re-run with constant vacuums, it did not get out of hand.

My concern is performance, and yes, for inserts PostgreSQL is great. For data that is constantly being updated, PostgreSQL is a bit weak. Think about a table with a few million rows that needs to be updated a few thousand times a minute.

I love PG, I've been using it since version 6.x, and it has gotten fantastic over the years, and in many cases I would choose it over Oracle, but for systems that need frequent updates, I have a lot of concerns.
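For a 7.3-era table that has already bloated this badly, lazy VACUUM cannot shrink it after the fact; the recovery path described above, plus an index rebuild, might look like the following sketch (the table name `summary` is hypothetical):

```sql
-- One-time recovery of a bloated single-row summary table (7.3.x).
-- VACUUM FULL rewrites the table compactly but takes an exclusive lock.
VACUUM FULL VERBOSE summary;

-- 7.3 does not reclaim index bloat via VACUUM; rebuild the indexes.
REINDEX TABLE summary;

-- From then on, frequent lazy (non-blocking) vacuums keep it in check.
VACUUM summary;
```

This matches the observation above: VACUUM FULL fixes the damage once, but only constant lazy vacuuming prevents it from recurring.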
Re: [HACKERS] Frequently updated tables
>> I love PG, I've been using it since version 6.x, and it has gotten fantastic over the years, and in many cases I would choose it over Oracle, but for systems that need frequent updates, I have a lot of concerns.
>
> ...that's the price you pay for concurrency man...

I think that's a cop-out. Other databases can handle this case fine, and they have MVCC. Are we not open source free software proponents? Isn't one of our motivations that we can do it better? This *is* a problem with PostgreSQL, and it *is* a concern for a reasonable number of potential deployments.
Re: [HACKERS] Frequently updated tables
> I love PG, I've been using it since version 6.x, and it has gotten fantastic over the years, and in many cases I would choose it over Oracle, but for systems that need frequent updates, I have a lot of concerns.

...that's the price you pay for concurrency man...

Chris
Re: [HACKERS] Frequently updated tables
On Wed, Jun 09, 2004 at 10:49:20PM +0800, Christopher Kings-Lynne wrote:
>> I love PG, I've been using it since version 6.x, and it has gotten fantastic over the years, and in many cases I would choose it over Oracle, but for systems that need frequent updates, I have a lot of concerns.
>
> ...that's the price you pay for concurrency man...

Also, he said that the problem was solved with enough lazy VACUUM scheduling. I don't understand why he doesn't want to use that solution.

--
Alvaro Herrera (alvherre[a]dcc.uchile.cl)
"People get annoyed when you try to debug them." (Larry Wall)
Re: [HACKERS] Frequently updated tables
> On Wed, Jun 09, 2004 at 10:49:20PM +0800, Christopher Kings-Lynne wrote:
>>> I love PG, I've been using it since version 6.x, and it has gotten fantastic over the years, and in many cases I would choose it over Oracle, but for systems that need frequent updates, I have a lot of concerns.
>>
>> ...that's the price you pay for concurrency man...
>
> Also, he said that the problem was solved with enough lazy VACUUM scheduling. I don't understand why he doesn't want to use that solution.

Sigh, because vacuums take away from performance. Imagine a table that has to be updated on the order of a few thousand times a minute. Think about the drop in performance during the vacuum. On a one-row table, vacuum is not so bad, but try some benchmarks on a table with a goodly number of rows.
Re: [HACKERS] Frequently updated tables
On Wed, Jun 09, 2004 at 13:41:27 -0400, [EMAIL PROTECTED] wrote:
> Sigh, because vacuums take away from performance. Imagine a table that has to be updated on the order of a few thousand times a minute. Think about the drop in performance during the vacuum. On a one-row table, vacuum is not so bad, but try some benchmarks on a table with a goodly number of rows.

But you only need to rapidly vacuum the one table that is keeping your totals record. This isn't going to be a big hit in performance relative to the updates that are going on. You don't need to vacuum the tables you are doing the inserts or updates to at that same rate.
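Concretely, Bruno's suggestion amounts to vacuuming only the hot table on a tight schedule while the rest of the database keeps its normal maintenance window. A hypothetical crontab sketch (the database and table names are assumptions):

```
# Lazy-vacuum just the single-row totals table once a minute;
# lazy VACUUM does not take an exclusive lock.
* * * * *  psql -d mydb -c 'VACUUM totals'

# The rest of the database keeps a nightly schedule.
0 3 * * *  vacuumdb --analyze mydb
```

Vacuuming a one-row table is cheap in itself; the cost being debated in this thread is how often it has to happen and what that does to concurrent load.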
Re: [HACKERS] Frequently updated tables
> Also, he said that the problem was solved with enough lazy VACUUM scheduling. I don't understand why he doesn't want to use that solution.

Because even lazy VACUUM is horrendous to performance, but as I said in a further post, this has been pretty much fixed (by Jan, I believe) in 7.5.

Sincerely,

Joshua D. Drake

--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
PostgreSQL support, programming, shared hosting and dedicated hosting.
+1-503-667-4564 - [EMAIL PROTECTED] - http://www.commandprompt.com
Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL
Re: [HACKERS] Frequently updated tables
> Sigh, because vacuums take away from performance.

This is a known issue that has been pretty much resolved for 7.5. Vacuum in 7.5 does not take anywhere near as much I/O resources.

> Imagine a table that has to be updated on the order of a few thousand times a minute. Think about the drop in performance during the vacuum. On a one-row table, vacuum is not so bad, but try some benchmarks on a table with a goodly number of rows.
Re: [HACKERS] Frequently updated tables
On Wed, 2004-06-09 at 11:41, [EMAIL PROTECTED] wrote:
>> On Wed, Jun 09, 2004 at 10:49:20PM +0800, Christopher Kings-Lynne wrote:
>>>> I love PG, I've been using it since version 6.x, and it has gotten fantastic over the years, and in many cases I would choose it over Oracle, but for systems that need frequent updates, I have a lot of concerns.
>>>
>>> ...that's the price you pay for concurrency man...
>>
>> Also, he said that the problem was solved with enough lazy VACUUM scheduling. I don't understand why he doesn't want to use that solution.
>
> Sigh, because vacuums take away from performance. Imagine a table that has to be updated on the order of a few thousand times a minute. Think about the drop in performance during the vacuum. On a one-row table, vacuum is not so bad, but try some benchmarks on a table with a goodly number of rows.

Several points:

All databases pay to clean up the mess they've made, so to speak. In PostgreSQL you get to choose when, instead of always paying the price at the end of the transaction.

Lazy vacuum does not impact performance nearly as much as the old full vacuum. With the sleep / delay patch that's been passed around on -hackers, its impact on the rest of the database is virtually zero.

Properly set up FSM settings, the pg_autovacuum daemon, and an installation of the sleep / delay patch mentioned above make this a non-issue.
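Of the pieces listed above, pg_autovacuum shipped as a contrib module in 7.4 and runs as a separate daemon beside the postmaster. A sketch of an invocation -- the flag letters and thresholds here should be checked against the contrib README for the release in use, and the numbers are assumptions:

```
# contrib/pg_autovacuum (7.4-era), started after the postmaster is up.
# -D daemonize; -s base sleep between checks (seconds);
# -v / -V vacuum base threshold (tuples) / scaling factor.
pg_autovacuum -D -s 60 -v 1000 -V 0.5
```

The daemon watches the statistics collector and issues lazy VACUUMs when a table's update/delete count crosses the threshold, which automates exactly the per-table scheduling this thread keeps returning to.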
Re: [HACKERS] Frequently updated tables
On Wed, Jun 09, 2004 at 01:41:27PM -0400, [EMAIL PROTECTED] wrote:
>> On Wed, Jun 09, 2004 at 10:49:20PM +0800, Christopher Kings-Lynne wrote:
>>> Also, he said that the problem was solved with enough lazy VACUUM scheduling. I don't understand why he doesn't want to use that solution.
>
> Sigh, because vacuums take away from performance. Imagine a table that has to be updated on the order of a few thousand times a minute. Think about the drop in performance during the vacuum. On a one-row table, vacuum is not so bad, but try some benchmarks on a table with a goodly number of rows.

Hmm, this can be a problem if VACUUM pollutes the shared buffer pool. So what about a new buffer replacement policy that takes this into account and is not fooled by VACUUM? This is already implemented in 7.5. Also, how about a background writer process that writes dirty buffers, so that backends don't have to wait for I/O to complete when a dirty buffer has to be written? This is also in current CVS.

Have you tried and measured how the current CVS code performs? Jan Wieck reported a lot of performance improvement some time ago while he was developing this. The code has changed since, and I have not seen any measurements.

--
Alvaro Herrera (alvherre[a]dcc.uchile.cl)
"Oh, the Galacian girls will do it for pearls, and those of Arrakis for water! But if you seek ladies who burn like flames, try a daughter of Caladan!" (Gurney Halleck)
Re: [HACKERS] Frequently updated tables
On Tue, Jun 08, 2004 at 07:16:45PM -0400, [EMAIL PROTECTED] wrote:
>>> I've been down several roads about how to handle data that has to change in a very frequent and rapid manner. Think about summary tables, web session tables, etc. As great as MVCC is for the vast majority of uses, the overhead of updates and deletes can kill a project that needs to constantly update tables.
>>
>> Are you saying that MVCC has *by design* a higher overhead for updates and deletes, or are you referring to the gradual loss of performance as a consequence of many dead tuples? I am guessing you mean the latter, but best to be sure :-)
>
> The best phrasing would be "the accumulating overhead of deletes and updates". Yes.

Doesn't pg_autovacuum largely take care of this issue?

--
Jim C. Nasby, Database Consultant  [EMAIL PROTECTED]
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?
Re: [HACKERS] Frequently updated tables
On Wed, Jun 09, 2004 at 13:41:27 -0400, [EMAIL PROTECTED] wrote:
>> Sigh, because vacuums take away from performance. Imagine a table that has to be updated on the order of a few thousand times a minute. Think about the drop in performance during the vacuum. On a one-row table, vacuum is not so bad, but try some benchmarks on a table with a goodly number of rows.
>
> But you only need to rapidly vacuum the one table that is keeping your totals record. This isn't going to be a big hit in performance relative to the updates that are going on. You don't need to vacuum the tables you are doing the inserts or updates to at that same rate.

I have been talking about two types of problems, both based on PostgreSQL's behavior with frequently updated tables.

Summary table: in the single-row table system, you have to vacuum very frequently, and this affects performance.

Frequently updated tables: think about the session table for a website. Each new user gets a new session row. Every time they refresh or act in the site, the row is updated. When they leave or their session times out, the row is deleted. I wrote a RAM-only session manager for PHP because PostgreSQL couldn't handle the volume (2000 hits a second).

If you have an active site, with hundreds or thousands of hits a second, vacuuming the table constantly is not practical. I don't think anyone who has seriously looked at these issues has concluded that PostgreSQL works fine in these cases. The question is what, if anything, can be done? The frequent-update issue really affects PostgreSQL's acceptance in web applications, and it is one where MySQL seems to do a better job.

IMHO, this issue, a two-stage-commit based replication system, and a real and usable setup/configuration system are all that stands between PostgreSQL and serious enterprise deployment.
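The session-table workload described above maps onto one INSERT per visitor, one UPDATE per hit, and a periodic DELETE sweep; under MVCC every UPDATE and DELETE leaves a dead tuple behind, which is why such a table bloats at thousands of hits per second. A minimal sketch with a hypothetical schema:

```sql
CREATE TABLE web_session (
    session_id text PRIMARY KEY,
    user_data  text,
    last_seen  timestamp NOT NULL DEFAULT now()
);

-- New visitor: one insert.
INSERT INTO web_session (session_id, user_data)
VALUES ('abc123', 'cart=...');

-- Every page hit: one update, i.e. a new tuple version plus a dead one.
UPDATE web_session SET last_seen = now() WHERE session_id = 'abc123';

-- Timeout sweep: the deleted rows also stay dead until vacuumed.
DELETE FROM web_session
WHERE last_seen < now() - interval '30 minutes';
```

At 2000 hits a second, the UPDATE alone produces 2000 dead tuples a second, so the table's live fraction collapses unless VACUUM runs almost continuously.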
[HACKERS] Frequently updated tables
I've been down several roads about how to handle data that has to change in a very frequent and rapid manner. Think about summary tables, web session tables, etc.

As great as MVCC is for the vast majority of uses, the overhead of updates and deletes can kill a project that needs to constantly update tables. In most cases, the answer is just not to use PostgreSQL for that, but then you are stuck with MySQL or something worse.

Would having a special class of table that is marked for frequent updates, which causes the system to lock and update in place, make sense? Is that even possible? There has to be a way of making PostgreSQL able to handle this class of problem. Anyone have a reasonable idea?
Re: [HACKERS] Frequently updated tables
[EMAIL PROTECTED] wrote:
>> I've been down several roads about how to handle data that has to change in a very frequent and rapid manner. Think about summary tables, web session tables, etc. As great as MVCC is for the vast majority of uses, the overhead of updates and deletes can kill a project that needs to constantly update tables.
>
> Are you saying that MVCC has *by design* a higher overhead for updates and deletes, or are you referring to the gradual loss of performance as a consequence of many dead tuples? I am guessing you mean the latter, but best to be sure :-)

The best phrasing would be "the accumulating overhead of deletes and updates". Yes.
Re: [HACKERS] Frequently updated tables
[EMAIL PROTECTED] wrote:
> The best phrasing would be "the accumulating overhead of deletes and updates". Yes.

Are you using 7.3? I am asking because in 7.3, high update / delete tables could suffer (index and TOAST) bloat that was untamable via (lazy) VACUUM and FSM. I believe this is fixed in 7.4, so it should be possible to achieve on-disk size control of tables / indexes by configuring FSM and (lazy) VACUUM. Did you find this not to be the case?

regards

Mark