Re: [PERFORM] serious problems with vacuuming databases
> Hi Tomas,
>
> Tomas wrote:
>> We've decided to remove unneeded 'old' data, which means removing about
>> 99.999% of rows from tables A, C and D (about 2 GB of data). At the
>> beginning, the B table (containing data aggregated from A, C and D) was
>> emptied (dropped and created) and filled in with current data. Then,
>> before the deletion, the data from tables A, C, D were backed up using
>> other tables (say A_old, C_old, D_old):
>>
>> 1) drop, create and fill table B (aggregated data from A, C, D)
>> 2) copy 'old' data from A, C and D to A_old, C_old and D_old
>> 3) delete old data from A, C, D
>> 4) dump data from A_old, C_old and D_old
>> 5) truncate tables A, C, D
>> 6) vacuum full analyze tables A, C, D, A_old, C_old and D_old
>
> I think you are doing some difficult database maintenance. Why do that,
> if you just want to keep a small piece of the data from your tables?
> Why don't you try something like:
>
> 1. create table A with no index (don't fill data into this table),
> 2. create a table A_week_year inheriting from table A, with the indexes
>    you want and some condition for insertion (e.g. table A1 is used for
>    one week of data of a year, and so on),
> 3. do the same for tables B, C and D,
> 4. if you have relations, make the relations inherit as well (optional).
>
> I think you should read the PostgreSQL documentation for more
> information about table inheritance.
>
> The impact is that you may have many tables, but each table will hold
> only a small piece of the data, e.g. just one week. And you don't have
> to do difficult database maintenance like you have done. You just need
> to create tables for every week of data, do vacuum/analyze and regular
> backups.
>
> Best regards,
> ahmad fajar

Thanks for your advice, but I've read the sections about inheritance and
I don't see a way to use it in my case, as inheritance deals with the
structure, not with the data. But I've also read the section about
partitioning (using inheritance), and that seems useful. I'll try to
solve the performance issues using that.

Thanks for your advice

Tomas

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] serious problems with vacuuming databases
Hi Tomas,

Tomas wrote:
> We've decided to remove unneeded 'old' data, which means removing about
> 99.999% of rows from tables A, C and D (about 2 GB of data). At the
> beginning, the B table (containing data aggregated from A, C and D) was
> emptied (dropped and created) and filled in with current data. Then,
> before the deletion, the data from tables A, C, D were backed up using
> other tables (say A_old, C_old, D_old):
>
> 1) drop, create and fill table B (aggregated data from A, C, D)
> 2) copy 'old' data from A, C and D to A_old, C_old and D_old
> 3) delete old data from A, C, D
> 4) dump data from A_old, C_old and D_old
> 5) truncate tables A, C, D
> 6) vacuum full analyze tables A, C, D, A_old, C_old and D_old

I think you are doing some difficult database maintenance. Why do that,
if you just want to keep a small piece of the data from your tables?
Why don't you try something like:

1. create table A with no index (don't fill data into this table),
2. create a table A_week_year inheriting from table A, with the indexes
   you want and some condition for insertion (e.g. table A1 is used for
   one week of data of a year, and so on),
3. do the same for tables B, C and D,
4. if you have relations, make the relations inherit as well (optional).

I think you should read the PostgreSQL documentation for more information
about table inheritance.

The impact is that you may have many tables, but each table will hold
only a small piece of the data, e.g. just one week. And you don't have
to do difficult database maintenance like you have done. You just need
to create tables for every week of data, do vacuum/analyze and regular
backups.

Best regards,
ahmad fajar
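The weekly inheritance scheme suggested above could be sketched roughly
like this (table and column names here are hypothetical, and the CHECK
constraints are only used for plan-time exclusion from 8.1 on):

```sql
-- Parent table: no indexes, never holds data directly.
CREATE TABLE access_log (
    ts    timestamp NOT NULL,
    url   text,
    bytes integer
);

-- One child table per week, inheriting the parent's columns.
-- The CHECK constraint documents which rows belong here.
CREATE TABLE access_log_2006w10 (
    CHECK (ts >= '2006-03-06' AND ts < '2006-03-13')
) INHERITS (access_log);

CREATE INDEX access_log_2006w10_ts_idx ON access_log_2006w10 (ts);

-- Queries against the parent automatically scan all children:
--   SELECT count(*) FROM access_log WHERE ts >= '2006-03-06';

-- Dropping a week of old data is then instantaneous and leaves no
-- dead tuples behind, so no VACUUM FULL is needed:
--   DROP TABLE access_log_2006w10;
```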
Re: [PERFORM] serious problems with vacuuming databases
Tomas Vondra wrote:

> > Huh, I didn't suggest a dump/reload. I suggested CLUSTER. You need to
> > apply it only to tables where you have lots of dead tuples, which IIRC
> > are A, C and D.
>
> Sorry, I should read more carefully. Will clustering a table according
> to one index solve problems with all the indexes on the table (if the
> table has, for example, two indexes)?

Yes, it will rebuild all indexes.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
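One way to see the rebuild happen is to compare index sizes in pg_class
before and after; this sketch assumes a table named a with a primary-key
index a_pkey (hypothetical names):

```sql
-- Index sizes (in 8 kB pages) for all indexes on table a.
SELECT c.relname, c.relpages
FROM pg_class c
JOIN pg_index i ON i.indexrelid = c.oid
WHERE i.indrelid = 'a'::regclass;

-- CLUSTER rewrites the table in index order and rebuilds every
-- index on it, not just the clustering index:
CLUSTER a_pkey ON a;

-- Re-running the pg_class query should show relpages dropping
-- sharply for all of the table's indexes.
```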
Re: [PERFORM] serious problems with vacuuming databases
> Huh, I didn't suggest a dump/reload. I suggested CLUSTER. You need to
> apply it only to tables where you have lots of dead tuples, which IIRC
> are A, C and D.

Sorry, I should read more carefully. Will clustering a table according
to one index solve problems with all the indexes on the table (if the
table has, for example, two indexes)?

t.v.
Re: [PERFORM] serious problems with vacuuming databases
> I guess you're right. I forgot to mention there are 12 composite
> indexes on the largest (and not deleted) table B, having about
> 14,000,000 rows and 1 GB of data. I'll try to dump/reload the
> database ...

Aaargh, the problem is probably not caused by the largest table, as it
was dropped, filled in with the data, and only after that were all the
indexes created. The problem could be caused by the tables with deleted
data, of course.

t.v.
Re: [PERFORM] serious problems with vacuuming databases
Tomas Vondra wrote:

> > Probably the indexes are bloated after the vacuum full. I think the
> > best way to get rid of the "fat" is to recreate both the tables and
> > the indexes anew. For this the best tool would be to CLUSTER the
> > tables on some index, probably the primary key. This will be much
> > faster than VACUUMing the tables, and the indexes will be much
> > smaller as a result.
>
> I guess you're right. I forgot to mention there are 12 composite
> indexes on the largest (and not deleted) table B, having about
> 14,000,000 rows and 1 GB of data. I'll try to dump/reload the
> database ...

Huh, I didn't suggest a dump/reload. I suggested CLUSTER. You need to
apply it only to the tables where you have lots of dead tuples, which
IIRC are A, C and D.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Re: [PERFORM] serious problems with vacuuming databases
> Probably the indexes are bloated after the vacuum full. I think the
> best way to get rid of the "fat" is to recreate both the tables and
> the indexes anew. For this the best tool would be to CLUSTER the
> tables on some index, probably the primary key. This will be much
> faster than VACUUMing the tables, and the indexes will be much
> smaller as a result.

I guess you're right. I forgot to mention there are 12 composite indexes
on the largest (and not deleted) table B, having about 14,000,000 rows
and 1 GB of data. I'll try to dump/reload the database ...

t.v.
Re: [PERFORM] serious problems with vacuuming databases
Tom Lane wrote:
> Tomas Vondra <[EMAIL PROTECTED]> writes:
>> 1) drop, create and fill table B (aggregated data from A, C, D)
>> 2) copy 'old' data from A, C and D to A_old, C_old and D_old
>> 3) delete old data from A, C, D
>> 4) dump data from A_old, C_old and D_old
>> 5) truncate tables A, C, D
>> 6) vacuum full analyze tables A, C, D, A_old, C_old and D_old
>
> Steps 3/5/6 make no sense at all to me: why bother deleting data retail
> when you are about to truncate the tables, and why bother vacuuming a
> table you just truncated? Is the above *really* what you did?

Yes, the above is exactly what I did, with the exception that there's an
error in step (5) - it should be a truncation of the _old tables. The
reasons that led me to these particular steps are two:

(a) I don't want to delete all the data, just data older than two days.
Until today we've kept all the data (a two-year access log for one of
our production websites), but now we've decided to remove the data we
don't need and keep just the aggregated version. That's why I used
DELETE rather than TRUNCATE.

(b) I want to create 'incremental' backups, so once I need the data I
can take several packages (dumps of the _old tables) and import them one
after another. Dumping the whole tables A, C and D with pg_dump is not
an option, because I want to leave some of the data in the tables. From
now on, the tables will be cleared on a daily (or maybe weekly) basis,
which means a much smaller amount of data (about 50,000 rows a day).

>> The problem is this - today, we run a scheduled VACUUM FULL ANALYZE
>> for the whole database, and it has been running for about 10 hours
>> already, which is much more than usual (and it is still running).
>
> Is it actually grinding the disk, or is it just blocked waiting for
> someone's lock? If it's actually doing work, which table is it working
> on? (You should be able to figure that out by looking in pg_locks,
> or by strace'ing the process to see which files it's touching.)

Thanks for the hint, I'll try to figure that out in case the dump/reload
recommended by Alvaro Herrera doesn't help. But as far as I know the
disks are not grinding right now, so I guess it's a problem with the
indexes.

t.v.
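The per-period 'incremental' dumps described in (b) could look like this
(database and file names are hypothetical; note that in this era pg_dump
accepted only a single -t switch per invocation, so each table gets its
own run):

```
pg_dump -t a_old mydb > a_old_2006-03-08.sql
pg_dump -t c_old mydb > c_old_2006-03-08.sql
pg_dump -t d_old mydb > d_old_2006-03-08.sql

# Later, restore any period's archive one after another:
psql mydb < a_old_2006-03-08.sql
```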
Re: [PERFORM] serious problems with vacuuming databases
Tomas Vondra <[EMAIL PROTECTED]> writes:
> 1) drop, create and fill table B (aggregated data from A, C, D)
> 2) copy 'old' data from A, C and D to A_old, C_old and D_old
> 3) delete old data from A, C, D
> 4) dump data from A_old, C_old and D_old
> 5) truncate tables A, C, D
> 6) vacuum full analyze tables A, C, D, A_old, C_old and D_old

Steps 3/5/6 make no sense at all to me: why bother deleting data retail
when you are about to truncate the tables, and why bother vacuuming a
table you just truncated? Is the above *really* what you did?

> The problem is this - today, we run a scheduled VACUUM FULL ANALYZE for
> the whole database, and it runs for about 10 hours already, which is
> much more than usual (and it is still running).

Is it actually grinding the disk, or is it just blocked waiting for
someone's lock? If it's actually doing work, which table is it working
on? (You should be able to figure that out by looking in pg_locks,
or by strace'ing the process to see which files it's touching.)

			regards, tom lane
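The pg_locks check suggested here could be sketched like this (column
names as of the 8.0-era catalogs, where pg_stat_activity exposes procpid
and current_query):

```sql
-- Which relation is the VACUUM backend touching, and is it
-- actually holding its locks or waiting for someone else's?
SELECT a.procpid,
       c.relname,
       l.mode,
       l.granted,
       a.current_query
FROM pg_locks l
JOIN pg_stat_activity a ON a.procpid = l.pid
LEFT JOIN pg_class c ON c.oid = l.relation
WHERE a.current_query LIKE 'VACUUM%';

-- From the shell, to see which files the backend is touching:
--   strace -p <pid of the vacuum backend>
```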
Re: [PERFORM] serious problems with vacuuming databases
Hi,

Tomas Vondra wrote:
> Then these data were deleted from A, C, D, the tables A_old, C_old and
> D_old were dumped and truncated, and all the tables were vacuumed (with
> the FULL ANALYZE options). So the procedure was this:
>
> 1) drop, create and fill table B (aggregated data from A, C, D)
> 2) copy 'old' data from A, C and D to A_old, C_old and D_old
> 3) delete old data from A, C, D
> 4) dump data from A_old, C_old and D_old
> 5) truncate tables A, C, D
> 6) vacuum full analyze tables A, C, D, A_old, C_old and D_old
>
> So the dump of the database has about 1.2 GB of data, of which about
> 1 GB is in the B table (the one rebuilt in step 1). This was done
> yesterday.
>
> The problem is this - today, we run a scheduled VACUUM FULL ANALYZE for
> the whole database, and it has been running for about 10 hours already,
> which is much more than usual (and it is still running).

Probably the indexes are bloated after the vacuum full. I think the best
way to get rid of the "fat" is to recreate both the tables and the
indexes anew. For this the best tool would be to CLUSTER the tables on
some index, probably the primary key. This will be much faster than
VACUUMing the tables, and the indexes will be much smaller as a result.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
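Concretely, the CLUSTER approach could look like this (the *_pkey index
names are hypothetical; CLUSTER takes an exclusive lock on each table
while it rewrites it):

```sql
-- Rewrite each bloated table in primary-key order; this also
-- rebuilds every index on the table from scratch.
CLUSTER a_pkey ON a;
CLUSTER c_pkey ON c;
CLUSTER d_pkey ON d;

-- CLUSTER does not refresh planner statistics, so re-analyze:
ANALYZE a;
ANALYZE c;
ANALYZE d;
```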
[PERFORM] serious problems with vacuuming databases
Hello,

we have some performance problems with PostgreSQL 8.0.4, more precisely
with vacuuming a 'large' database with a lot of deleted rows.

We had a 3.2 GB database, consisting mainly of 4 large tables, two of
them (say tables A and B) having about 14,000,000 rows and 1 GB of size
each, and two (say C and D) having about 4,000,000 rows and 500 MB each.
The rest of the database is not important.

We've decided to remove unneeded 'old' data, which means removing about
99.999% of rows from tables A, C and D (about 2 GB of data). At the
beginning, the B table (containing data aggregated from A, C and D) was
emptied (dropped and created) and filled in with current data. Then,
before the deletion, the data from tables A, C, D were backed up into
other tables (say A_old, C_old, D_old) filled in using

   INSERT INTO A_old SELECT * FROM A ...

and fixed so there are no duplicates (rows in both A and A_old). Then
these data were deleted from A, C, D, the tables A_old, C_old and D_old
were dumped and truncated, and all the tables were vacuumed (with the
FULL ANALYZE options). So the procedure was this:

1) drop, create and fill table B (aggregated data from A, C, D)
2) copy 'old' data from A, C and D to A_old, C_old and D_old
3) delete old data from A, C, D
4) dump data from A_old, C_old and D_old
5) truncate tables A, C, D
6) vacuum full analyze tables A, C, D, A_old, C_old and D_old

So the dump of the database has about 1.2 GB of data, of which about
1 GB is in the B table (the one rebuilt in step 1). This was done
yesterday.

The problem is this - today, we run a scheduled VACUUM FULL ANALYZE for
the whole database, and it has been running for about 10 hours already,
which is much more than usual (and it is still running). The hardware is
not too bad - it's a Dell server with 2 x 3.0 GHz P4 HT, 4 GB of RAM,
2 x 15k SCSI drives in hw RAID, etc.

The question is why this happens and how to get around it. I guess it's
caused by the huge amount of data deleted yesterday, but on the other
hand all the modified tables were vacuumed at the end. And I guess
dropping and reloading the whole database would be much faster (at most
1.5 hours, including creating indexes etc.).

thanks for your advice

Tomas
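The cleanup procedure above, sketched for table A (the ts column and the
two-day cutoff are assumptions for illustration; the same pattern
applies to C and D):

```sql
-- 2) back up the old rows into a side table
CREATE TABLE a_old AS
    SELECT * FROM a WHERE ts < now() - interval '2 days';

-- 3) delete them from the live table (this leaves dead tuples
--    behind, which is what later makes VACUUM FULL so expensive)
DELETE FROM a WHERE ts < now() - interval '2 days';

-- 4) dump a_old externally (pg_dump -t a_old), then
-- 5) empty the side table
TRUNCATE a_old;

-- 6) reclaim space and refresh statistics
VACUUM FULL ANALYZE a;
VACUUM FULL ANALYZE a_old;
```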