Re: [PERFORM] 7k records into Sort node, 4.5m out?
Josh Berkus <j...@agliodbs.com> writes:
> SELECT COUNT(*) FROM user
>   INNER JOIN house ON (user.house_id = house.id)
>   LEFT OUTER JOIN district ON (house.district_id = district.id)
> WHERE (user.status = 0
>   AND (district.update_status = 2 OR district.update_status = 3)
>   AND (user.valid = 1 OR user.valid = 3)
>   AND district.is_test = false);
>
> However, since the anonymization above doesn't quite match that used in
> the EXPLAIN plan, I'm not sure what you'll get out of it. And yes, we
> know that the outer join is being invalidated.

Ah, I see where I was confused: in the original query plan I'd been
imagining that charlie.sierra was a unique column, but your gloss on that
as being house.district_id implies that it's highly non-unique. And
looking at the rowcounts in the original plan backs that up: there are
about 600 house rows per district row.

So my thought of having district as the outer side of a nestloop scanning
the index on house.district_id would not really work very well --- maybe
it would end up cheaper than the mergejoin plan, but it's far from a
clear-cut win. On the whole I'm thinking the code is operating as
designed here.

			regards, tom lane

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] High Disk write and space taken by PostgreSQL
On Thu, Aug 16, 2012 at 03:48:57PM +1000, Ondrej Ivanič wrote:
> Hi,
>
> On 16 August 2012 15:40, J Ramesh Kumar <rameshj1...@gmail.com> wrote:
>> As you said, MySQL with MyISAM is the better choice for my app, because
>> I don't need transactions/backup. Maybe I'll try with InnoDB and find
>> the disk write/space difference. Is there any similar method available
>> in PostgreSQL like the MyISAM engine?
>
> You can try unlogged tables:
> http://www.postgresql.org/docs/9.1/static/sql-createtable.html
>
> "If specified, the table is created as an unlogged table. Data written
> to unlogged tables is not written to the write-ahead log (see Chapter
> 29), which makes them considerably faster than ordinary tables.
> However, they are not crash-safe: an unlogged table is automatically
> truncated after a crash or unclean shutdown. The contents of an
> unlogged table are also not replicated to standby servers. Any indexes
> created on an unlogged table are automatically unlogged as well;
> however, unlogged GiST indexes are currently not supported and cannot
> be created on an unlogged table."

I would set full_page_writes = off too.

--
  Bruce Momjian  <br...@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
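For reference, creating an unlogged table only requires the UNLOGGED keyword; a minimal sketch (the table and columns are hypothetical, not from the thread):

```sql
-- Hypothetical high-write-rate table; UNLOGGED skips WAL writes,
-- making writes faster, but the table is truncated after a crash
-- or unclean shutdown and is not replicated to standbys.
CREATE UNLOGGED TABLE sensor_readings (
    device_id   integer      NOT NULL,
    reading_ts  timestamptz  NOT NULL,
    value       double precision
);

-- Indexes on an unlogged table are automatically unlogged as well.
CREATE INDEX ON sensor_readings (device_id, reading_ts);
```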
Re: [PERFORM] High Disk write and space taken by PostgreSQL
On Thu, Aug 16, 2012 at 10:53:21AM -0400, Bruce Momjian wrote:
> On Thu, Aug 16, 2012 at 03:48:57PM +1000, Ondrej Ivanič wrote:
>> You can try unlogged tables:
>> http://www.postgresql.org/docs/9.1/static/sql-createtable.html
>> [...]
>
> I would set full_page_writes = off too.

Better yet, read our documentation about non-durable settings:

http://www.postgresql.org/docs/9.1/static/non-durability.html

--
  Bruce Momjian  <br...@momjian.us>        http://momjian.us
Re: [PERFORM] High Disk write and space taken by PostgreSQL
On Wed, Aug 15, 2012 at 11:40 PM, J Ramesh Kumar <rameshj1...@gmail.com> wrote:
> Dear Scott Marlowe,
>
> Thanks for the details. As you said, MySQL with MyISAM is the better
> choice for my app, because I don't need transactions/backup.

That's not exactly what I said. Remember that if you need to run complex
queries, PostgreSQL is still likely the better candidate.

> Maybe I'll try with InnoDB and find the disk write/space difference. Is
> there any similar method available in PostgreSQL like the MyISAM engine?

Unlogged tables, as mentioned by others.
Re: [PERFORM] cluster on conditional index?
On Wed, Aug 15, 2012 at 5:19 PM, Bosco Rama <postg...@boscorama.com> wrote:
> On 08/15/12 14:05, Josh Berkus wrote:
>>> That actually makes sense to me. Cluster the rows covered by that
>>> index, let the rest fall where they may. I'm typically only accessing
>>> the rows covered by that index, so I'd get the benefit of the cluster
>>> command but wouldn't have to spend cycles doing the cluster for rows
>>> I don't care about.
>>
>> Sure, that's a feature request though. And thinking about it, I'm
>> willing to bet that it's far harder to implement than it sounds.

How/where does one file feature requests?

>> In the meantime, you could ad-hoc this by splitting the table into two
>> partitions and clustering one of the two partitions.
>
> Wouldn't creating a second index on the boolean itself and then
> clustering on that be much easier?

That's what I was looking into doing, actually.

--
Douglas J Hunley (doug.hun...@gmail.com)
Twitter: @hunleyd
Web: douglasjhunley.com
G+: http://goo.gl/sajR3
Re: [PERFORM] High Disk write and space taken by PostgreSQL
Bruce Momjian <br...@momjian.us> schrieb:
> On Thu, Aug 16, 2012 at 03:48:57PM +1000, Ondrej Ivanič wrote:
>> You can try unlogged tables:
>> http://www.postgresql.org/docs/9.1/static/sql-createtable.html
>> [...]
>
> I would set full_page_writes = off too.

Why? There shouldn't be any such writes on unlogged tables.

Andres

Please excuse the brevity and formatting - I am writing this on my mobile
phone.
Re: [PERFORM] cluster on conditional index?
On Wed, Aug 15, 2012 at 2:19 PM, Bosco Rama <postg...@boscorama.com> wrote:
> On 08/15/12 14:05, Josh Berkus wrote:
>>> That actually makes sense to me. Cluster the rows covered by that
>>> index, let the rest fall where they may.
>> [...]
>> In the meantime, you could ad-hoc this by splitting the table into two
>> partitions and clustering one of the two partitions.
>
> Wouldn't creating a second index on the boolean itself and then
> clustering on that be much easier?

I would take an existing useful index, and build a new one on the same
columns but with is_deleted prepended. That way, since you are going
through the effort to rewrite the whole table anyway, the ties are broken
in a way that might be of further use. Once the CLUSTER is done, the
index might even be useful enough to keep around for use with queries, or
even replace the original index altogether.

Cheers,

Jeff
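Jeff's suggestion can be sketched roughly like this (the table, columns, and index names below are hypothetical):

```sql
-- Suppose the existing useful index is:
--   CREATE INDEX orders_customer_idx ON orders (customer_id, created_at);

-- Build the same index with is_deleted prepended, then cluster on it,
-- so rows sharing an is_deleted value are also ordered usefully.
CREATE INDEX orders_cluster_idx
    ON orders (is_deleted, customer_id, created_at);

CLUSTER orders USING orders_cluster_idx;

-- If the new index proves useful for queries, the original may become
-- redundant:
-- DROP INDEX orders_customer_idx;
```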
Re: [PERFORM] High Disk write and space taken by PostgreSQL
On Thu, Aug 16, 2012 at 06:07:26PM +0200, anara...@anarazel.de wrote:
> Bruce Momjian <br...@momjian.us> schrieb:
>> [...]
>> I would set full_page_writes = off too.
>
> Why? There shouldn't be any such writes on unlogged tables.

True. I was thinking more of the logged tables, and the system tables.

--
  Bruce Momjian  <br...@momjian.us>        http://momjian.us
Re: [PERFORM] Increasing WAL usage followed by sudden drop
We are not doing anything to postgres that would cause the rise and drop.
Database activity is pretty consistent, nor are we doing any kind of
purge. This week the drop occurred after 6 days. We are thinking it must
be some kind of internal postgres activity, but we can't track it down.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Increasing-WAL-usage-followed-by-sudden-drop-tp5719567p5720136.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
Re: [PERFORM] Increasing WAL usage followed by sudden drop
delongboy <sdel...@saucontech.com> wrote:
> We are not doing anything to postgres that would cause the rise and
> drop. Database activity is pretty consistent, nor are we doing any kind
> of purge. This week the drop occurred after 6 days. We are thinking it
> must be some kind of internal postgres activity but we can't track it
> down.

Maybe autovacuum freezing tuples (which is a WAL-logged operation) as
tables periodically hit the autovacuum_freeze_max_age limit?

-Kevin
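Kevin's hypothesis can be checked with a standard catalog query (not from the thread): tables whose transaction-ID age approaches autovacuum_freeze_max_age are the ones autovacuum will freeze next, generating WAL.

```sql
-- Oldest unfrozen XIDs per table, compared against the freeze ceiling.
-- Tables near freeze_max_age will trigger a WAL-heavy freeze vacuum.
SELECT c.relname,
       age(c.relfrozenxid) AS xid_age,
       current_setting('autovacuum_freeze_max_age')::int AS freeze_max_age
FROM pg_class c
WHERE c.relkind = 'r'
ORDER BY age(c.relfrozenxid) DESC
LIMIT 10;
```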
Re: [PERFORM] cluster on conditional index?
On Wed, Aug 15, 2012 at 6:43 AM, Doug Hunley <doug.hun...@gmail.com> wrote:
> On Tue, Aug 14, 2012 at 1:29 PM, k...@rice.edu <k...@rice.edu> wrote:
>> It probably has to do with the fact that a conditional index does not
>> include every possible row in the table. Although a cluster of the
>> matching rows, leaving the rest in place, should work. How is that for
>> hand-waving. :)
>
> That actually makes sense to me. Cluster the rows covered by that index,
> let the rest fall where they may. I'm typically only accessing the rows
> covered by that index, so I'd get the benefit of the cluster command but
> wouldn't have to spend cycles doing the cluster for rows I don't care
> about.

IIRC, there isn't currently an in-place version of CLUSTER; it always
rewrites the entire table. So it would still have to do something with
those rows so that they show up in the new table. But it could just treat
them all as equal to one another and have them be in whatever order they
happen to fall in.

Cheers,

Jeff
Re: [PERFORM] Index Bloat Problem
Thanks for the help everyone, and sorry for not replying sooner; I was on
a business trip.

@Hubert: pg_reorg looks really interesting, and from a first read it
looks to be a very good solution for maintenance, but for now I would
rather try to slow down or remove this bloat, so that I have to do as
little maintenance as possible.

@Mark: So basically I should decrease the autovacuum nap time from 60s to
10s and reduce the scale factor from 0.2 to 0.1.
log_autovacuum_min_duration is already set to 0, which means everything
is logged.

@Jeff: I'm not sure I understand what you mean. I know that we never
reuse key ranges. Could you be more clear, or give an example, please?

Thanks in advance,

Strahinja

On Tue, Aug 14, 2012 at 6:14 AM, Jeff Janes <jeff.ja...@gmail.com> wrote:
> On Fri, Aug 10, 2012 at 3:15 PM, Strahinja Kustudić
> <strahin...@nordeus.com> wrote:
>> For example, yesterday when I checked the database size on the
>> production server it was 30GB, and the restored dump of that database
>> was only 17GB. The most interesting thing is that the data wasn't
>> bloated that much, but the indices were. Some of them were a few times
>> bigger than they should be. For example an index on the production db
>> is 440MB, while that same index after dump/restore is 17MB, and there
>> are many indices with that large a difference.
>
> Could your pattern of deletions be leaving sparsely populated, but not
> completely empty, index pages; which your insertions will then never
> reuse because they never again insert values in that key range?
>
> Cheers,
>
> Jeff
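The settings Mark suggested can be applied globally in postgresql.conf, or (a sketch, with a hypothetical table name) per table for just the bloat-prone relation:

```sql
-- postgresql.conf (global):
--   autovacuum_naptime = 10s               -- down from the 1min default
--   autovacuum_vacuum_scale_factor = 0.1   -- down from 0.2
--   log_autovacuum_min_duration = 0        -- log every autovacuum run

-- Or per table, leaving the global defaults alone:
ALTER TABLE event_log SET (
    autovacuum_vacuum_scale_factor = 0.1,
    autovacuum_analyze_scale_factor = 0.05
);
```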
[PERFORM] best practice to avoid table bloat?
Hi,

If I have a table from which about 8 million rows (of a table that maybe
has 9 million) are deleted nightly, is it recommended to do a VACUUM
ANALYZE after the delete completes, or can I leave this job to
autovacuum? This table is very active during the day but less active
during the night.

I think the only area where Postgres is weak is this one (table and index
bloat). For some reason, for the same amount of data, every day postgres
consumes a little more.

Thanks!
Re: [PERFORM] best practice to avoid table bloat?
On 08/16/2012 04:33 PM, Anibal David Acosta wrote:
> Hi, if I have a table that daily at night is deleted about 8 millions
> of rows (table maybe has 9 millions) is recommended to do a vacuum
> analyze after delete completes or can I leave this job to autovacuum?
> [...]

Check out pg_reorg.

cheers

andrew
Re: [PERFORM] best practice to avoid table bloat?
Anibal David Acosta <a...@devshock.com> wrote:
> if I have a table that daily at night is deleted about 8 millions of
> rows (table maybe has 9 millions) is recommended to do a vacuum analyze
> after delete completes or can I leave this job to autovacuum?

Deleting a high percentage of the rows should cause autovacuum to deal
with the table the next time it wakes up, so an explicit VACUUM ANALYZE
shouldn't be needed.

> For some reason for the same amount of data every day postgres consume
> a little more.

How are you measuring the data, and how are you measuring the space? And
what version of PostgreSQL is this?

-Kevin
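A standard way to answer Kevin's "how are you measuring the space?" question is a catalog query like the following (not from the thread, but plain 9.1-era SQL):

```sql
-- Largest tables by total on-disk size (heap + indexes + TOAST),
-- with the heap-only size shown separately so index bloat stands out.
SELECT relname,
       pg_size_pretty(pg_total_relation_size(oid)) AS total_size,
       pg_size_pretty(pg_relation_size(oid))       AS heap_size
FROM pg_class
WHERE relkind = 'r'
ORDER BY pg_total_relation_size(oid) DESC
LIMIT 10;
```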
Re: [PERFORM] best practice to avoid table bloat?
[please don't top-post]

Anibal David Acosta <a...@devshock.com> wrote:
> Kevin Grittner <kevin.gritt...@wicourts.gov> wrote:
>> Anibal David Acosta <a...@devshock.com> wrote:
>>> if I have a table that daily at night is deleted about 8 millions of
>>> rows (table maybe has 9 millions) is recommended to do a vacuum
>>> analyze after delete completes or can I leave this job to autovacuum?
>>
>> Deleting a high percentage of the rows should cause autovacuum to deal
>> with the table the next time it wakes up, so an explicit VACUUM
>> ANALYZE shouldn't be needed.
>
> Every day the table has about 7 millions of new rows. The table hold
> the data for 60 days, so approx. the total rows must be around 420
> millions. Every night a delete process run, and remove rows older than
> 60 days.

Oh, I thought you were saying the table grew to 9 million rows each day
and you deleted 8 million of them each night. That would definitely
trigger autovacuum. Deleting 7 million rows from a table of 420 million
rows would not, so an explicit VACUUM ANALYZE after the delete might be
helpful. Even better, with a pattern like that, you might want to
consider partitioning the table:

http://www.postgresql.org/docs/9.1/static/ddl-partitioning.html

>> How are you measuring the data and how are you measuring the space?

[no answer]

Without knowing what is increasing, it's hard to say why it is
increasing. For all we know you are logging all statements and never
deleting log files. The solution for that would be entirely different
from the solution for some other problem.

> So, the space used by postgres should not be increase drastically
> because every day arrive 7 millions of rows but also same quantity is
> deleted but my disk get out of space every 4 months.

What is getting bigger over time?

> I must copy tables outside the server, delete local table and create
> it again, after this process I got again space for about 4 months.

How do you do that? pg_dump, DROP TABLE, restore the dump? Have you
captured sizes of heap, toast, indexes, etc. before and after this
aggressive maintenance? Is the size going up by orders of magnitude, or
are you running really tight and getting killed by a 10% increase? We
don't know unless you tell us.

> Maybe is a wrong autovacuum config, but is really complicate to
> understand what values are correct to avoid performance penalty but to
> keep table in good fit.

Please show us the entire result from running this query:

http://wiki.postgresql.org/wiki/Server_Configuration

-Kevin
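On 9.1 the partitioning Kevin points to is inheritance-based. A minimal sketch of a 60-day retention scheme (table and column names hypothetical) replaces the large nightly DELETE with a cheap DROP of the oldest child table:

```sql
-- Parent table; each child holds one day's rows.
CREATE TABLE measurements (
    logdate  date NOT NULL,
    value    integer
);

-- One child per day, constrained so the planner can skip it
-- (constraint exclusion) when queries filter on logdate.
CREATE TABLE measurements_2012_08_16 (
    CHECK (logdate = DATE '2012-08-16')
) INHERITS (measurements);

-- Expiring a day's data becomes a metadata operation instead of a
-- multi-million-row DELETE followed by a vacuum:
DROP TABLE measurements_2012_06_17;
```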
Re: [PERFORM] best practice to avoid table bloat?
Thanks Kevin.

The PostgreSQL version is 9.1.4 (latest).

Every day the table gets about 7 million new rows. The table holds the
data for 60 days, so the total must be approximately 420 million rows.
Every night a delete process runs and removes rows older than 60 days.

So, the space used by postgres should not increase drastically, because
every day 7 million rows arrive but the same quantity is deleted; yet my
disk runs out of space every 4 months. I must copy tables outside the
server, delete the local table and create it again; after this process I
again have space for about 4 months.

Maybe it is a wrong autovacuum config, but it is really complicated to
understand which values are correct to avoid a performance penalty while
keeping the table in good shape. I think the autovacuum configuration
should have some kind of auto-config that recalculates every day which
configuration is best for the server's condition.

Thanks!

-----Original Message-----
From: Kevin Grittner [mailto:kevin.gritt...@wicourts.gov]
Sent: Thursday, 16 August 2012, 04:52 p.m.
To: Anibal David Acosta; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] best practice to avoid table bloat?

> Anibal David Acosta <a...@devshock.com> wrote:
>> if I have a table that daily at night is deleted about 8 millions of
>> rows (table maybe has 9 millions) is recommended to do a vacuum
>> analyze after delete completes or can I leave this job to autovacuum?
> [...]
Re: [PERFORM] High Disk write and space taken by PostgreSQL
On Wed, Aug 15, 2012 at 11:30 PM, J Ramesh Kumar <rameshj1...@gmail.com> wrote:
> Hi David Barton,
>
> Please find the information below.
>
>> Are you able to provide a table schema?
>
> There are 109 different types of table. Some tables are daily tables
> and some are ID based, so in total we have created around 350 tables
> and dropped around 350 tables. I drop old tables, and I don't delete
> any records. I maintain only the last 30 days' tables and drop tables
> which are older than 30 days. All the tables have only basic data types
> like int, smallint, bigint, varchar.
>
>> Were you using MyISAM or InnoDB on MySQL?
>
> I am using MyISAM tables in MySQL.

You can't compare a non-MVCC system such as MyISAM with an MVCC one. MVCC
systems have to store extra accounting information in order to manage
transactions and multiple versions of the same record for SQL updates.
MVCC isn't all bad: for example, you get much better performance in the
face of highly concurrent activity. MyISAM does full table locks, which
are not scalable at all. The penalty for MVCC storage may in some cases
seem quite high if your tables have very narrow records.

BTW, I am suspicious that your claim that you 'don't need' transactions
is correct, especially in the long term. Anyway, there are several
techniques to try and mitigate data growth in postgres -- arrays, for
example.

merlin
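Merlin's array technique, sketched with a hypothetical schema: instead of one row per narrow reading (each carrying per-row MVCC overhead), many readings are packed into one array-valued row, amortizing that overhead.

```sql
-- One row per device per hour, with the hour's readings packed into
-- an array, rather than one ~24-byte-header row per reading.
CREATE TABLE readings_packed (
    device_id  integer      NOT NULL,
    hour       timestamptz  NOT NULL,
    readings   smallint[]   NOT NULL
);
```

The trade-off is that individual readings can no longer be updated or indexed cheaply; this fits append-only data that is read back in bulk, which matches the drop-whole-tables usage described above.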