[PERFORM] Pros and Cons of 8.3.1
Hi, Can anyone who has started using 8.3.1 list the pros and cons? Thanks in advance ~ Gauri
Re: [PERFORM] Pros and Cons of 8.3.1
Can anyone who has started using 8.3.1 list the pros and cons?

I upgraded to 8.3.1 yesterday from 8.3.0. I've used 8.3.0 since it was released and it's working fine. I upgraded from 7.4 (dump/restore) and it worked out of the box. We have somewhat simple SQL queries, so there was no need to change/alter these. The largest table has approx. 85 mill. records (image-table).

One thing I had never seen before was that duplicate rows were inserted into our order-table, but I don't know whether this is due to changes in the web-app or to 8.3.0. Now that I've upgraded to 8.3.1, I will wait a few weeks and see if I get the same error before I alter the column and add a unique constraint.

So from a 7.4 perspective, and with fairly simple queries, I don't see any issues. -- regards Claus

When lenity and cruelty play for a kingdom, the gentlest gamester is the soonest winner. Shakespeare

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
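For the archives, the duplicate check and unique constraint Claus mentions would look something like the following. The table and column names here are hypothetical; the post doesn't give them.

```sql
-- Find duplicated order keys first (orders/order_no are illustrative names):
SELECT order_no, count(*)
FROM orders
GROUP BY order_no
HAVING count(*) > 1;

-- Once the duplicates are cleaned up, enforce uniqueness going forward:
ALTER TABLE orders ADD CONSTRAINT orders_order_no_key UNIQUE (order_no);
```

The ALTER will fail if any duplicates remain, which is why the check comes first.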
Re: [PERFORM] Vacuum statistics
What version of Postgres are you running? If you are using 8.3, you can use pg_stat_all_tables. If not, you can use http://www.postgresql.org/docs/current/static/pgstattuple.html Chirag

On Tue, Apr 29, 2008 at 8:14 AM, Francisco Reyes [EMAIL PROTECTED] wrote: I recall reading posts in the past where one could query the stat tables and see how well autovacuum was performing. Not finding the posts. I found this query: SELECT relname, relkind, reltuples, relpages FROM pg_class WHERE relkind = 'r'; From the output, how can I tell the number of dead tuples? Or how effective autovacuum is on a particular table? I recently inherited several large PostgreSQL DBs (tables in the hundreds of millions of rows, and some tables over a billion rows) and I am just starting to go over them and see how autovacuum has been performing.
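A sketch of the kind of query Chirag is pointing at: on 8.3, pg_stat_user_tables carries dead-tuple counts and autovacuum timestamps directly, so there is no need to infer anything from pg_class.

```sql
-- Tables with the most dead tuples, plus when autovacuum last touched them (8.3+):
SELECT relname, n_live_tup, n_dead_tup,
       last_vacuum, last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;
```

A large, steadily growing n_dead_tup on a table autovacuum rarely visits is the usual sign that autovacuum isn't keeping up.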
Re: [PERFORM] Please ignore ...
Marc G. Fournier wrote: Someone on this list has one of those 'confirm your email' filters on their mailbox, which is bouncing back messages ... this is an attempt to try and narrow down the address that is causing this ... Did you find out? -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Re: [PERFORM] Pros and Cons of 8.3.1
Gauri Kanekar wrote: Hi, Can anyone who has started using 8.3.1 list the pros and cons? Thanks in advance ~ Gauri

I don't know for sure whether it's the Windows-to-Linux move, but we went from 8.2 installed on Windows to 8.3.1 on Ubuntu, using the version compiled by Ubuntu. We had one minor, annoying problem: implicit data conversion no longer happens. We had several pl/pgsql functions calling each other where a programmer had gotten lazy about making sure the variable types matched the parameter types, so we got errors yelling at us that the function can't be found due to a data type mismatch. It was very easy to fix. There are also a lot of changes to how text searches and their indexes work.
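A minimal illustration of the 8.3 casting change Justin ran into; the function and types here are hypothetical, not from his code.

```sql
-- Hypothetical function declared to take text:
--   CREATE FUNCTION log_event(msg text) RETURNS void ...

SELECT log_event(42);
-- 8.2: the integer was implicitly cast to text and this worked.
-- 8.3: ERROR: function log_event(integer) does not exist

SELECT log_event(42::text);
-- Works on 8.3: the cast is now explicit.
```

The fix is mechanical, which matches Justin's "very easy to fix": add an explicit cast at each mismatched call site.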
Re: [PERFORM] Pros and Cons of 8.3.1
On Fri, May 2, 2008 at 2:31 AM, in message [EMAIL PROTECTED], Gauri Kanekar [EMAIL PROTECTED] wrote: Can anyone who have started using 8.3.1 list out the pros and cons. There are bugs in the 8.3.1 release which bit us when we started using it; however, these are fixed in the 8.3 stable branch of cvs. We are running successfully with that. These fixes will be in 8.3.2 when it is released. http://archives.postgresql.org/pgsql-bugs/2008-04/msg00168.php It's generally a good idea to test with a new release before putting it into production, especially a major release. Since you asked on the performance list -- we have found performance to be significantly better under 8.3 than earlier releases. Also, the data takes less space on the disk, and checkpoint disk activity spikes are reduced in 8.3. -Kevin
Re: [PERFORM] Pros and Cons of 8.3.1
On Fri, 02 May 2008 12:11:34 -0500 Justin [EMAIL PROTECTED] wrote: don't know for sure if it is windows to linux but we moved to 8.2 that was install on windows and moved to 8.3.1 on Ubuntu using the compiled version from Ubuntu We had minor annoying problem with implicit data conversion no longer happens It is 8.3.x and the change was documented in the release notes. Joshua D. Drake -- The PostgreSQL Company since 1997: http://www.commandprompt.com/ PostgreSQL Community Conference: http://www.postgresqlconference.org/ United States PostgreSQL Association: http://www.postgresql.us/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
[PERFORM] two memory-consuming postgres processes
Greetings -- I have an UPDATE query updating a 100 million row table, and have allocated enough memory via shared_buffers=1500MB. However, I see two processes in top: the UPDATE process eating about 850 MB and the writer process eating about 750 MB. The box starts paging. Why is the writer taking almost as much space as the UPDATE, and how can I shrink it? Cheers, Alexy
Re: [PERFORM] two memory-consuming postgres processes
On Fri, May 2, 2008 at 1:24 PM, Alexy Khrabrov [EMAIL PROTECTED] wrote: Greetings -- I have an UPDATE query updating a 100 million row table, and allocate enough memory via shared_buffers=1500MB. However, I see two processes in top, the UPDATE process eating about 850 MB and the writer process eating about 750 MB. The box starts paging. Why is there the writer taking almost as much space as the UPDATE, and how can I shrink it? Shared_buffers is NOT the main memory pool for all operations in pgsql, it is simply the buffer pool used to hold data being operated on. Things like sorts etc. use other memory and can exhaust your machine. However, I'd like to see the output of vmstat 1 or top while this is happening. How much memory does this machine have?
Re: [PERFORM] two memory-consuming postgres processes
On May 2, 2008, at 12:30 PM, Scott Marlowe wrote: Shared_buffers is NOT the main memory pool for all operations in pgsql, it is simply the buffer pool used to hold data being operated on. Things like sorts etc. use other memory and can exhaust your machine. However, I'd like to see the output of vmstat 1 or top while this is happening. How much memory does this machine have?

It's a 2GB RAM MacBook. Here's the top output for postgres:

Processes: 117 total, 2 running, 6 stuck, 109 sleeping... 459 threads  12:34:27
Load Avg: 0.27, 0.24, 0.32  CPU usage: 8.41% user, 11.06% sys, 80.53% idle
SharedLibs: num = 15, resident = 40M code, 2172K data, 3172K linkedit.
MemRegions: num = 20719, resident = 265M + 12M private, 1054M shared.
PhysMem: 354M wired, 1117M active, 551M inactive, 2022M used, 19M free.
VM: 26G + 373M  1176145(160) pageins, 1446482(2) pageouts

  PID COMMAND  %CPU TIME    #TH #PRTS #MREGS RPRVT RSHRD RSIZE VSIZE
51775 postgres 6.8% 2:40.16 1   9     39     1504K 896M  859M+ 1562M
51767 postgres 0.0% 0:39.74 1   8     28     752K  896M  752M  1560M

The first is the UPDATE, the second is the writer. The query is very simple:

netflix=# create index movs_mid_idx on movs(mid);
CREATE INDEX
netflix=# update ratings set offset1=avg-rating from movs where mid=movie_id;

where the table ratings has about 100 million rows and movs has about 20,000. I randomly increased values in postgresql.conf to

shared_buffers = 1500MB
max_fsm_pages = 200
max_fsm_relations = 1

Should I set the background writer parameters somehow to decrease the RAM consumed by the writer?
Cheers, Alexy
Re: [PERFORM] two memory-consuming postgres processes
On Fri, May 2, 2008 at 1:38 PM, Alexy Khrabrov [EMAIL PROTECTED] wrote: On May 2, 2008, at 12:30 PM, Scott Marlowe wrote: On Fri, May 2, 2008 at 1:24 PM, Alexy Khrabrov [EMAIL PROTECTED] wrote: Greetings -- I have an UPDATE query updating a 100 million row table, and allocate enough memory via shared_buffers=1500MB. However, I see two processes in top, the UPDATE process eating about 850 MB and the writer process eating about 750 MB. The box starts paging. Why is there the writer taking almost as much space as the UPDATE, and how can I shrink it? Shared_buffers is NOT the main memory pool for all operations in pgsql, it is simply the buffer pool used to hold data being operated on. Things like sorts etc. use other memory and can exhaust your machine. However, I'd like to see the output of vmstat 1 or top while this is happening. How much memory does this machine have? It's a 2GB RAM MacBook. Here's the top for postgres Processes: 117 total, 2 running, 6 stuck, 109 sleeping... 459 threads 12:34:27 Load Avg: 0.27, 0.24, 0.32CPU usage: 8.41% user, 11.06% sys, 80.53% idle SharedLibs: num = 15, resident = 40M code, 2172K data, 3172K linkedit. MemRegions: num = 20719, resident = 265M + 12M private, 1054M shared. PhysMem: 354M wired, 1117M active, 551M inactive, 2022M used, 19M free. VM: 26G + 373M 1176145(160) pageins, 1446482(2) pageouts PID COMMAND %CPU TIME #TH #PRTS #MREGS RPRVT RSHRD RSIZE VSIZE 51775 postgres 6.8% 2:40.16 1 9 39 1504K 896M 859M+ 1562M 51767 postgres 0.0% 0:39.74 1 8 28 752K 896M 752M 1560M SOME snipping here. I randomly increased values in postgresql.conf to shared_buffers = 1500MB max_fsm_pages = 200 max_fsm_relations = 1 On a laptop with 2G ram, 1.5Gig shared buffers is probably WAY too high. Should I set the background writer parameters somehow to decrease the RAM consumed by the writer? No, the background writer reads through the shared buffers for dirty ones and writes them out. 
so, it's not really using MORE memory, it's just showing that it's attached to the ginormous shared_buffer pool you've set up. Lower your shared_buffers to about 512M or so and see how it works.
Re: [PERFORM] two memory-consuming postgres processes
Scott Marlowe [EMAIL PROTECTED] writes: On Fri, May 2, 2008 at 1:38 PM, Alexy Khrabrov [EMAIL PROTECTED] wrote: I randomly increased values in postgresql.conf to shared_buffers = 1500MB max_fsm_pages = 200 max_fsm_relations = 1 On a laptop with 2G ram, 1.5Gig shared buffers is probably WAY too high. s/probably/definitely/, especially seeing that OS X is a bit of a memory hog itself. I don't think you should figure on more than 1GB being usefully available to Postgres, and you can't give all or even most of that space to shared_buffers. No, the background writer reads through the shared buffers for dirty ones and writes them out. so, it's not really using MORE memory, it's just showing that it's attached to the ginormous shared_buffer pool you've set up. Yeah. You have to be aware of top's quirky behavior for shared memory: on most platforms it will count the shared memory against *each* process, but only as much of the shared memory as that process has touched so far. So over time the reported size of any PG process will tend to climb to something over the shared memory size, but most of that isn't real. I haven't directly checked whether OS X's top behaves that way, but given your report I think it does. regards, tom lane
Re: [PERFORM] two memory-consuming postgres processes
On Fri, 2 May 2008, Alexy Khrabrov wrote: I have an UPDATE query updating a 100 million row table, and allocate enough memory via shared_buffers=1500MB. In addition to reducing that as you've been advised, you'll probably need to increase checkpoint_segments significantly from the default (3) in order to get good performance on an update that large. Something like 30 would be a reasonable starting point. I'd suggest doing those two things, seeing how things go, and reporting back if you still think performance is unacceptable. We'd need to know your PostgreSQL version in order to really target future suggestions. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
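Pulling the thread's advice together, a postgresql.conf sketch for this 2 GB machine might be as follows. These are the numbers suggested in the thread, not measured tuning.

```
shared_buffers      = 512MB   # down from 1500MB; leave RAM for the OS and backends
checkpoint_segments = 30      # up from the default of 3, for a bulk update this large
work_mem            = 16MB    # per-sort memory; modest, since many sorts can run at once
```

After editing, the server needs a restart for shared_buffers to take effect; checkpoint_segments and work_mem only need a reload.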
Re: [PERFORM] two memory-consuming postgres processes
On May 2, 2008, at 1:13 PM, Tom Lane wrote: I don't think you should figure on more than 1GB being usefully available to Postgres, and you can't give all or even most of that space to shared_buffers. So how should I divide, say, 512 MB between shared_buffers and, um, what else? (new to pg tuning :) I naively thought that if I have a 100,000,000 row table, of the form (integer,integer,smallint,date), and add a real column to it, it will scroll through the memory reasonably fast. Yet when I had shared_buffers=128MB, it was hanging there for 8 hours before I killed it, and now with 1500MB it has been paging again for several hours with no end in sight. Why can't it just add a column to a row at a time and be done with it soon enough? :) It takes inordinately long compared to a FORTRAN or even python program, and there's no index usage for this table, just a sequential scan -- why all the paging? Cheers, Alexy
Re: [PERFORM] two memory-consuming postgres processes
Interestingly, after shutting down the server with shared_buffers=1500MB in the middle of that UPDATE, I see this:

bash-3.2$ /opt/bin/pg_ctl -D /data/pgsql/ stop
waiting for server to shut down...LOG: received smart shutdown request
LOG: autovacuum launcher shutting down
... failed
pg_ctl: server does not shut down
bash-3.2$ /opt/bin/pg_ctl -D /data/pgsql/ stop
waiting for server to shut down..LOG: shutting down
LOG: database system is shut down
done
server stopped

-- I had to do it twice; the box was paging for a minute or two. Should I do something about autovacuum, e.g. turn it off completely? I thought it wasn't on, as all of it was still commented out in postgresql.conf as shipped; I only tweaked the few numbers reported before. Cheers, Alexy
Re: [PERFORM] two memory-consuming postgres processes
On Fri, May 2, 2008 at 2:26 PM, Alexy Khrabrov [EMAIL PROTECTED] wrote: So how should I divide say a 512 MB between shared_buffers and, um, what else? (new to pg tuning :) Don't worry so much about the rest of the settings. Maybe increase sort_mem (aka work_mem) to something like 16M or so. That's about it. I naively thought that if I have a 100,000,000 row table, of the form (integer,integer,smallint,date), and add a real column to it, it will scroll through the memory reasonably fast. This is a database. It makes changes on disk in such a way that they won't be lost should power be cut off. If you're just gonna be batch processing data that it's ok to lose halfway through, then python / perl / php etc. might be a better choice. Yet when I had shared_buffers=128 MB, it was hanging there 8 hours before I killed it, and now with 1500MB it is paging again for several hours with no end in sight. You went from kinda small to WAY too big. 512M should be a happy medium. Why can't it just add a column to a row at a time and be done with it soon enough? :) Adding a column is instantaneous. Populating it is not. It takes inordinately long compared to a FORTRAN or even python program and there's no index usage for this table, a sequential scan, why all the paging? Again, a database protects your data from getting scrambled should the program updating it quit halfway through etc. Have you been vacuuming between these update attempts? Each one has created millions of dead rows and bloated your data store. vacuum full / cluster / reindex may be needed.
Re: [PERFORM] two memory-consuming postgres processes
On May 2, 2008, at 1:40 PM, Scott Marlowe wrote: Again, a database protects your data from getting scrambled should the program updating it quit halfway through etc... Right -- but this is data-mining work: I add a derived column to a row, and it's computed from that very row and a small second table which should fit in RAM. Have you been vacuuming between these update attempts? Each one has created millions of dead rows and bloated your data store. vacuum full / cluster / reindex may be needed. I've read postgresql.conf more carefully and see that autovacuum = on is commented out, so it's on by default. That explains why shutting down took so long -- it had to shut autovacuum down too. Basically, the derived data is not critical at all -- (1) can I turn off transactional behavior for an UPDATE, and (2) should I care about vacuuming being done on the fly to save RAM, or should I defer it/manage it manually? I wonder what MySQL would do here on MyISAM tables without transactional behavior -- perhaps this is a case more suitable for them? Cheers, Alexy
Re: [PERFORM] two memory-consuming postgres processes
On Fri, May 2, 2008 at 2:26 PM, Alexy Khrabrov [EMAIL PROTECTED] wrote: I naively thought that if I have a 100,000,000 row table, of the form (integer,integer,smallint,date), and add a real coumn to it, it will scroll through the memory reasonably fast. In Postgres, an update is the same as a delete/insert. That means that changing the data in one column rewrites ALL of the columns for that row, and you end up with a table that's 50% dead space, which you then have to vacuum. Sometimes if you have a volatile column that goes with several static columns, you're far better off to create a second table for the volatile data, duplicating the primary key in both tables. In your case, it would mean the difference between 10^8 inserts of (int, float), very fast, compared to what you're doing now, which is 10^8 insert and 10^8 deletes of (int, int, smallint, date, float), followed by a big vacuum/analyze (also slow). The down side of this design is that later on, it requires a join to fetch all the data for each key. You do have a primary key on your data, right? Or some sort of index? Craig
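A sketch of the split Craig describes, using the column names visible in Alexy's UPDATE (update ratings set offset1=avg-rating from movs where mid=movie_id); the full ratings schema is a guess.

```sql
-- Static data stays in the wide table, assumed to look roughly like:
--   ratings(movie_id integer, user_id integer, rating smallint, rated_on date)

-- Volatile derived data lives in its own narrow table, keyed the same way:
CREATE TABLE ratings_offset (
    movie_id integer,
    user_id  integer,
    offset1  real
);

-- 10^8 plain inserts instead of 10^8 delete+insert pairs on the wide table:
INSERT INTO ratings_offset (movie_id, user_id, offset1)
SELECT r.movie_id, r.user_id, m.avg - r.rating
FROM ratings r
JOIN movs m ON m.mid = r.movie_id;
```

The cost, as Craig notes, is a join on (movie_id, user_id) whenever both the static and derived columns are needed together.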
Re: [PERFORM] two memory-consuming postgres processes
On May 2, 2008, at 2:02 PM, Craig James wrote: In Postgres, an update is the same as a delete/insert. That means that changing the data in one column rewrites ALL of the columns for that row, and you end up with a table that's 50% dead space, which you then have to vacuum. Sometimes if you have a volatile column that goes with several static columns, you're far better off to create a second table for the volatile data, duplicating the primary key in both tables. The down side of this design is that later on, it requires a join to fetch all the data for each key. You do have a primary key on your data, right? Or some sort of index? I created several indices for the primary table, yes. Sure, I can do a table for a volatile column, but then I'll have to create a new such table for each derived column -- that's why I tried to add a column to the existing table. Yet seeing this is really slow, and I need to do many derived analyses like this -- which are later scanned in other computations, so should persist -- I indeed see no other way but to create derived tables with the same key, one column each... Cheers, Alexy
Re: [PERFORM] two memory-consuming postgres processes
On Fri, 2 May 2008, Alexy Khrabrov wrote: I created several indices for the primary table, yes. That may be part of your problem. All of the indexes are being updated along with the main data in the row each time you touch a record. There's some optimization there in 8.3, but it doesn't make index overhead go away completely. As mentioned already, the optimal solution to problems in this area is to adjust table normalization as much as feasible to limit what you're updating. Basically, the derived data is not critical at all, -- can I turn (1) off transactional behavior for an UPDATE, What you can do is defer transaction commits to only happen periodically rather than all the time, by turning off synchronous_commit and increasing wal_writer_delay; see http://www.postgresql.com.cn/docs/8.3/static/wal-async-commit.html (2) should I care about vacuuming being done on the fly when saving RAM, or need I defer it/manage it manually? It's hard to speculate from here about what optimal vacuum behavior will be. You might find it more efficient to turn autovacuum off when doing these large updates. The flip side is that you'll be guaranteed to end up with more dead rows in the table, and that has its own impact later. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
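For the deferred-commit approach Greg links to, the 8.3 knobs look like this; the values are illustrative, not recommendations.

```
# postgresql.conf (8.3+):
synchronous_commit = off       # COMMIT returns before the WAL record hits disk
wal_writer_delay   = 1000ms    # WAL writer flushes roughly once a second instead
                               # of the default 200ms, batching the writes
```

synchronous_commit can also be flipped per session (SET synchronous_commit = off) just for the big UPDATE. A crash can lose the last fraction of a second of commits, but it cannot corrupt the database, which is an acceptable trade for non-critical derived data like this.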
Re: [PERFORM] two memory-consuming postgres processes
I created several indices for the primary table, yes. Sure I can do a table for a volatile column, but then I'll have to create a new such table for each derived column -- that's why I tried to add a column to the existing table. Yet seeing this is really slow, and I need to do many derived analyses like this -- which are later scanned in other computations, so should persist -- I indeed see no other way but to create derived tables with the same key, one column each...

OK, so in that case, you could do all of your derived column calculations in one query, like this:

CREATE TABLE derived AS SELECT ... FROM ... (perform all your derived calculations here)

or:

BEGIN; -- this is important to avoid writing xlog
CREATE TABLE derived AS ...
INSERT INTO derived SELECT ... FROM ... (perform all your derived calculations here)
COMMIT;

Basically, updating the entire table several times to add a few simple columns is a bad idea. If you can compute all the data you need in one query, like the above, it will be much faster. Especially if you join one large table to several smaller ones; as long as the huge data set doesn't need to be sorted (check the query plan using EXPLAIN), try to do as much as possible in one query, to scan the large dataset only once.

Note that the above will be faster than updating the entire table, since it needs to write much less data: it doesn't need to delete the old rows, and it doesn't need to write the transaction log, since if the transaction rolls back, the table never existed anyway. Also, since your newly created table doesn't have any indexes, they won't need to be updated.

If you really need to update an entire table multiple times, you will need to:
- use hardware that can handle disk writes at a decent speed (which isn't a characteristic of a laptop drive)
- use MyISAM, yes (but if you need to make complex queries on the data afterwards, it could suck)
[PERFORM] Very slow INFORMATION_SCHEMA
Hi, I'm porting an application written in pretty portable SQL, but tested almost exclusively on MySQL. I'm wondering why this query takes about 90 seconds to return 74 rows:

SELECT
    INFORMATION_SCHEMA.TABLE_CONSTRAINTS.CONSTRAINT_NAME,
    INFORMATION_SCHEMA.TABLE_CONSTRAINTS.TABLE_NAME,
    INFORMATION_SCHEMA.TABLE_CONSTRAINTS.TABLE_NAME,
    INFORMATION_SCHEMA.TABLE_CONSTRAINTS.CONSTRAINT_TYPE,
    INFORMATION_SCHEMA.KEY_COLUMN_USAGE.COLUMN_NAME,
    INFORMATION_SCHEMA.KEY_COLUMN_USAGE.REFERENCED_TABLE_NAME,
    INFORMATION_SCHEMA.KEY_COLUMN_USAGE.REFERENCED_COLUMN_NAME
FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS, INFORMATION_SCHEMA.KEY_COLUMN_USAGE
WHERE INFORMATION_SCHEMA.TABLE_CONSTRAINTS.CONSTRAINT_NAME = INFORMATION_SCHEMA.KEY_COLUMN_USAGE.CONSTRAINT_NAME
  AND INFORMATION_SCHEMA.TABLE_CONSTRAINTS.CONSTRAINT_SCHEMA = INFORMATION_SCHEMA.KEY_COLUMN_USAGE.CONSTRAINT_SCHEMA
  AND INFORMATION_SCHEMA.TABLE_CONSTRAINTS.CONSTRAINT_SCHEMA = 'mydbname'
  AND INFORMATION_SCHEMA.TABLE_CONSTRAINTS.CONSTRAINT_TYPE = 'FOREIGN KEY'
ORDER BY INFORMATION_SCHEMA.TABLE_CONSTRAINTS.TABLE_NAME, INFORMATION_SCHEMA.KEY_COLUMN_USAGE.ORDINAL_POSITION

An equivalent query with the same data set on the same server takes a couple of milliseconds on MySQL 5. Is it something I'm doing wrong, or is it just that PostgreSQL's INFORMATION_SCHEMA is not optimized for speed? BTW, what I'm trying to do is get some info on every FOREIGN KEY in a database. It's PostgreSQL 8.2.7 on Fedora 8 (64-bit), Athlon 64 X2 3600+. Ernesto
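As a Postgres-specific workaround, the same foreign-key inventory can be pulled straight from the system catalog, which is much cheaper than joining the INFORMATION_SCHEMA views. A rough equivalent (without the per-column detail) might be:

```sql
-- Foreign keys, from pg_catalog rather than INFORMATION_SCHEMA:
SELECT c.conname             AS constraint_name,
       c.conrelid::regclass  AS table_name,
       c.confrelid::regclass AS referenced_table
FROM pg_constraint c
WHERE c.contype = 'f'
ORDER BY c.conrelid::regclass::text;
```

Column names on each side are recoverable from pg_constraint's conkey/confkey arrays joined against pg_attribute, at the cost of a more involved query.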
Re: [PERFORM] two memory-consuming postgres processes
On May 2, 2008, at 2:23 PM, Greg Smith wrote: On Fri, 2 May 2008, Alexy Khrabrov wrote: I created several indices for the primary table, yes. That may be part of your problem. All of the indexes are being updated along with the main data in the row each time you touch a record. There's some optimization there in 8.3, but it doesn't make index overhead go away completely. As mentioned already, the optimal solution to problems in this area is to adjust table normalization as much as feasible to limit what you're updating. I was wondering about it, too -- intuitively I'd like to say, stop all indexing until the column is added, then reindex. Is that doable? Or would it take longer anyway? Since I don't index on that new column, I'd assume my old indices would do -- do they change because of row deletions/insertions, with the effective new row addresses? Cheers, Alexy
Re: [PERFORM] two memory-consuming postgres processes
Alexy Khrabrov wrote: Since I don't index on that new column, I'd assume my old indices would do -- do they change because of row deletions/insertions, with the effective new row addresses? Every update is a delete and insert. The new version of the row must be added to the index. Every access through the index then has to look at both versions of the row to see which one is current for its transaction. Vacuum will make the space used by the dead rows available for reuse, as well as removing the old index entries and making that space available for new index entries. -Kevin
Re: [PERFORM] Very slow INFORMATION_SCHEMA
Ernesto [EMAIL PROTECTED] writes: I'm wondering why would this query take about 90 seconds to return 74 rows? EXPLAIN ANALYZE might tell you something. Is this really the query you're running? Because these two columns don't exist: INFORMATION_SCHEMA.KEY_COLUMN_USAGE.REFERENCED_TABLE_NAME, INFORMATION_SCHEMA.KEY_COLUMN_USAGE.REFERENCED_COLUMN_NAME Leaving those out, I get sub-second runtimes for 70-odd foreign key constraints, on much slower hardware than I think you are using. regards, tom lane
Re: [PERFORM] two memory-consuming postgres processes
On May 2, 2008, at 2:43 PM, Kevin Grittner wrote: Alexy Khrabrov wrote: Since I don't index on that new column, I'd assume my old indices would do -- do they change because of row deletions/insertions, with the effective new row addresses? Every update is a delete and insert. The new version of the row must be added to the index. Every access through the index then has to look at both versions of the row to see which one is current for its transaction. Vacuum will make the space used by the dead rows available for reuse, as well as removing the old index entries and making that space available for new index entries. OK. I've cancelled all previous attempts at UPDATE and will now create some derived tables. I see no changes in the previous huge table -- the added column was completely empty. Dropped it. Should I vacuum just in case, or am I guaranteed not to have any extra rows since no UPDATE actually went through and none are showing? Cheers, Alexy
Re: [PERFORM] two memory-consuming postgres processes
Alexy Khrabrov wrote: OK. I've cancelled all previous attempts at UPDATE and will now create some derived tables. See no changes in the previous huge table -- the added column was completely empty. Dropped it. Should I vacuum just in case, or am I guaranteed not to have any extra rows since no UPDATE actually went through and none are showing? The canceled attempts would have left dead space. If you have autovacuum running, it probably made the space available for reuse, but depending on exactly how you got to where you are, you may have bloat. Personally, I would do a VACUUM ANALYZE VERBOSE and capture the output. If bloat is too bad, you may want to CLUSTER the table (if you have the free disk space for a temporary extra copy of the table) or VACUUM FULL followed by REINDEX (if you don't have that much free disk space). Let us know if you need help interpreting the VERBOSE output. -Kevin
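Concretely, the sequence Kevin suggests might look like the following; the index name is hypothetical, since the thread doesn't name one on ratings.

```sql
VACUUM ANALYZE VERBOSE ratings;

-- If the VERBOSE output shows heavy bloat and there's room on disk for a
-- full temporary copy of the table:
CLUSTER ratings_pkey ON ratings;   -- pre-8.4 CLUSTER syntax; ratings_pkey is a guess

-- Or, without that much free disk space:
VACUUM FULL ratings;
REINDEX TABLE ratings;
```

CLUSTER rewrites the table in index order and leaves it compact; VACUUM FULL compacts in place but tends to bloat the indexes, which is why the REINDEX follows it.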