Re: [PERFORM] why does swap not recover?
On 3/26/10 4:57 PM, Richard Yen wrote:
> I'm planning on lowering the shared_buffers to a more sane value, like 25GB (pgtune recommends this for a Mixed-purpose machine) or less (pgtune recommends 14GB for an OLTP machine). However, before I do this (and possibly resolve the issue), I was hoping to see if anyone would have an explanation for the constant reading from swap, but never writing back.

Postgres does not control how swap is used; this would be an operating system issue. Leaving aside the distinct possibility of a bug in handling swap (nobody seems to do it well), there's the distinct possibility that you're actually pinning more memory on the system than it has (through various processes) and it's wisely shifted some read-only files to the swap (as opposed to read-write ones). But that's a fairly handwavy guess.

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] REINDEXing database-wide daily
Hi,

We're using PostgreSQL 8.2. I have a question in connection with this question I posted earlier: http://archives.postgresql.org/pgsql-performance/2010-03/msg00343.php

In our application, DML operations (INSERT/UPDATE/DELETE) are performed heavily throughout the day. I also read about pg_autovacuum and REINDEX at:
http://www.postgresql.org/docs/8.2/interactive/routine-vacuuming.html
http://www.postgresql.org/docs/8.2/static/sql-reindex.html

I do not want to run the pg_autovacuum daemon during busy hours. If I can afford to take my database offline at a low-usage time and REINDEX database-wide manually (via a Linux cron job) to boost index performance, what is the community's answer/suggestion on the following:

1. Is it a good idea to perform this on a daily basis?
2. Any implications of doing this on a daily basis?
3. Is there a way to find out bloated indexes?
4. Any other maintenance command, like ANALYZE, that has to be executed before/after REINDEX?
5. Is there a way to find out when REINDEX was last run on an INDEX/TABLE/DATABASE?

NOTE: I've also seen from my past experience that REINDEXing database-wide greatly improves performance of the application.
Re: [PERFORM] Performance regarding LIKE searches
On Mon, 29 Mar 2010, randa...@bioinfo.wsu.edu wrote:
> WHERE ... lower(n.name) LIKE 'Scaffold:scaffold_163:1000..1199%' ...

I'm sure you noticed that this is never going to return any rows? (The output of lower() can never contain the capital 'S' in that pattern.)

Matthew

--
Me... a skeptic? I trust you have proof?
Re: [PERFORM] Database size growing over time and leads to performance impact
We're using pgpool-II version 2.0.1 for PostgreSQL connection management.

pgpool configurations are:
num_init_children = 450
child_life_time = 300
connection_life_time = 120
child_max_connections = 30

As you recommended, I ran ps -ax|grep postgres at almost a busy transaction time, and I can find idle entries:

[r...@newuser ~]# ps -ax|grep postgres
2664 ? Ss 0:00 postgres: newuser mydb 192.168.0.200(43545) idle
2783 ? Ss 0:00 postgres: newuser mydb 192.168.0.200(43585) idle
2806 ? Ss 0:02 postgres: newuser mydb 192.168.0.200(43588) idle
2807 ? Ss 0:01 postgres: newuser mydb 192.168.0.200(43589) idle
2818 ? Ss 0:00 postgres: newuser mydb 192.168.0.200(43601) idle
2819 ? Ss 0:00 postgres: newuser mydb 192.168.0.200(43602) idle
2833 ? Ss 0:02 postgres: newuser mydb 192.168.0.200(43603) idle
2856 ? Ss 0:03 postgres: newuser mydb 192.168.0.200(43614) idle

Based on the pgpool documentation, and also as far as I know, even though the application layer returns/closes a connection, pgpool will only handle the actual closing of connections based on the connection_life_time parameter defined. And when this times out, the connection goes back to the "wait for connection request" state. Can you throw some light on this? Is there any better way that we need to re-configure our pgpool parameters?

-----Original Message-----
From: Andy Colson [mailto:a...@squeakycode.net]
Sent: Saturday, March 27, 2010 7:06 PM
To: Gnanakumar; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Database size growing over time and leads to performance impact

On 03/27/2010 08:00 AM, Gnanakumar wrote:
> Hi,
> We're using PostgreSQL 8.2. Recently, in our production database, there was a severe performance impact. Even though we're regularly doing both:
> 1. VACUUM FULL ANALYZE once a week during low-usage time and
> 2. ANALYZE every day at low-usage time
> Also, we noticed that the physical database size has grown up to 30 GB. But if I dump the database in the form of SQL and import it locally on my machine, it is only 3.2 GB.
> Then, while searching Google for ways to optimize database size, I found the following useful link: http://www.linuxinsight.com/optimize_postgresql_database_size.html
> It says that even vacuumdb or reindexdb doesn't really compact database size, only dump/restore does, because of the MVCC architecture in PostgreSQL, and this has been proven here.
> So, finally we decided to take our production database offline and performed a dump/restore. After this, the physical database size was reduced from 30 GB to 3.5 GB and the performance was also much better than before.
> Physical database size was found using the following command:
> du -sh /usr/local/pgsql/data/base/database-oid
> I also cross-checked this size using pg_size_pretty(pg_database_size(datname)).
> Questions
> 1. Is there any version/update of PostgreSQL addressing this issue?
> 2. How are these issues handled in real time by other PostgreSQL users without taking downtime?
> 3. Any ideas or links on whether this is addressed in the upcoming PostgreSQL version 9.0 release?

The issue is not with PG. Any newer version of PG will act exactly the same. I don't think you understand: vacuum is not meant to reduce the size of the db, it's meant to mark pages for reuse. VACUUM FULL is almost never needed. The fact it didn't reduce your db size is probably because of something else, like an open transaction. If you have a transaction left open, then your db will never be able to shrink or re-use pages. You'd better fix that issue first. (Run ps -ax|grep postgres and look for "idle in transaction".) You need to vacuum way more often than once a week. Just VACUUM ANALYZE, two, three times a day. Or better yet, let autovacuum do its thing. (If you do have autovacuum enabled, then the only problem is the open transaction thing.) Don't VACUUM FULL, it's not helping you, and it's being removed in newer versions.
-Andy
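Andy's open-transaction check can also be done from inside the database instead of with ps. A minimal sketch against the 8.2-era pg_stat_activity (where an idle-in-transaction backend shows current_query = '<IDLE> in transaction'; this assumes stats_command_string is enabled, otherwise current_query is not populated):

```sql
-- Sketch: list backends sitting "idle in transaction" and how long ago
-- they last ran a statement (PostgreSQL 8.2-era column names).
SELECT procpid,
       usename,
       client_addr,
       now() - query_start AS idle_since_last_query
FROM pg_stat_activity
WHERE current_query = '<IDLE> in transaction'
ORDER BY query_start;
```

Any row that stays in this list for minutes or hours is a transaction pinning old row versions and preventing vacuum from reclaiming space.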
[PERFORM] 3ware vs. MegaRAID
Hello,

I am waiting for an ordered machine dedicated to PostgreSQL. It was expected to have a 3ware 9650SE 16-port controller. However, the vendor wants to replace this controller with a MegaRAID SAS 84016E because, as they say, they have it in stock, while the 3ware would be available in a few weeks.

Is this a good replacement, generally? Will it run on FreeBSD, specifically?

Thanks
Irek.
Re: [PERFORM] Database size growing over time and leads to performance impact
On 3/30/2010 6:17 AM, Gnanakumar wrote:
> We're using pgpool-II version 2.0.1 for PostgreSQL connection management.
> pgpool configurations are:
> num_init_children = 450
> child_life_time = 300
> connection_life_time = 120
> child_max_connections = 30
> As you recommended, I ran ps -ax|grep postgres at almost a busy transaction time, and I can find idle entries:
> [r...@newuser ~]# ps -ax|grep postgres
> 2664 ? Ss 0:00 postgres: newuser mydb 192.168.0.200(43545) idle
> 2783 ? Ss 0:00 postgres: newuser mydb 192.168.0.200(43585) idle
> 2806 ? Ss 0:02 postgres: newuser mydb 192.168.0.200(43588) idle
> 2807 ? Ss 0:01 postgres: newuser mydb 192.168.0.200(43589) idle
> 2818 ? Ss 0:00 postgres: newuser mydb 192.168.0.200(43601) idle
> 2819 ? Ss 0:00 postgres: newuser mydb 192.168.0.200(43602) idle
> 2833 ? Ss 0:02 postgres: newuser mydb 192.168.0.200(43603) idle
> 2856 ? Ss 0:03 postgres: newuser mydb 192.168.0.200(43614) idle
> Based on the pgpool documentation, and also as far as I know, even though the application layer returns/closes a connection, pgpool will only handle the actual closing of connections based on the connection_life_time parameter defined. And when this times out, the connection goes back to the "wait for connection request" state. Can you throw some light on this? Is there any better way that we need to re-configure our pgpool parameters?

Connections are ok. A connection is different from a transaction. The output above looks good; that's what you want to see. (If it had said "idle in transaction" that would be a problem.) I don't think you need to change anything. Hopefully just vacuuming more often will help.

-Andy
Re: [PERFORM] REINDEXing database-wide daily
On 3/30/2010 4:32 AM, Gnanakumar wrote:
> Hi,
> We're using PostgreSQL 8.2. I have a question in connection with this question I posted earlier: http://archives.postgresql.org/pgsql-performance/2010-03/msg00343.php
> In our application, DML operations (INSERT/UPDATE/DELETE) are performed heavily throughout the day. I also read about pg_autovacuum and REINDEX at:
> http://www.postgresql.org/docs/8.2/interactive/routine-vacuuming.html
> http://www.postgresql.org/docs/8.2/static/sql-reindex.html
> I do not want to run the pg_autovacuum daemon during busy hours. If I can afford to take my database offline at a low-usage time and REINDEX database-wide manually (via a Linux cron job) to boost index performance, what is the community's answer/suggestion on the following:
> 1. Is it a good idea to perform this on a daily basis?
> 2. Any implications of doing this on a daily basis?
> 3. Is there a way to find out bloated indexes?
> 4. Any other maintenance command, like ANALYZE, that has to be executed before/after REINDEX?
> 5. Is there a way to find out when REINDEX was last run on an INDEX/TABLE/DATABASE?
> NOTE: I've also seen from my past experience that REINDEXing database-wide greatly improves performance of the application.

I could be way off base here, so I hope others will confirm/deny this: I think the more often you run vacuum, the less you notice it. If you wait too long, then vacuum will have to work harder and you'll notice a speed decrease. But with many small vacuums, which don't have as much work to do, you won't notice. It could be, and I'm guessing again, that because your database grew from 3 to 30 gig (if I recall the numbers right), REINDEX had a lot of effect. But if vacuum can keep up with space reuse, REINDEX may not be needed as often (maybe every few weeks or once a month).

-Andy
Re: [PERFORM] Why Wal_buffer is 64KB
On Mon, Mar 29, 2010 at 2:00 AM, Tadipathri Raghu traghu@gmail.com wrote:
> I have noticed one more thing here: if you turn off fsync and try to run a transaction, it breaks the current filenode and generates another filenode. Is it true that whenever you turn fsync off or on, the filenode will break and create one more on that table?

I don't know what you mean by a filenode. Changing the fsync parameter doesn't cause any additional files to be created or written.

...Robert
Re: [PERFORM] why does swap not recover?
On Fri, Mar 26, 2010 at 7:57 PM, Richard Yen d...@richyen.com wrote:
> Note that it is constantly paging in, but never paging out. This would indicate that it's constantly reading from swap, but never writing out to it. Why would postgres do this? (Postgres is pretty much the only thing running on this machine.)
> I'm planning on lowering the shared_buffers to a more sane value, like 25GB (pgtune recommends this for a Mixed-purpose machine) or less (pgtune recommends 14GB for an OLTP machine). However, before I do this (and possibly resolve the issue), I was hoping to see if anyone would have an explanation for the constant reading from swap, but never writing back.

Reading a page in from swap still leaves that data on the disk. So it may be that you're reading in pages from disk, not modifying them, discarding them (without any need to write them out, since they're still on disk), and then reading them in again when they're accessed again.

...Robert
Re: [PERFORM] REINDEXing database-wide daily
Gnanakumar gna...@zoniac.com wrote:
> We're using PostgreSQL 8.2.

Newer versions have much improved the VACUUM and CLUSTER features. You might want to consider upgrading to a later major version.

> I have a question in connection to this question posted by me earlier: http://archives.postgresql.org/pgsql-performance/2010-03/msg00343.php

I hope that you have stopped using VACUUM FULL on a regular basis, based on the responses to that post. The FULL option is only intended as a means to recover from extreme heap bloat when there is not room for a CLUSTER. Any other use is going to cause problems. If you continue to use it for other purposes, you may not get a lot of sympathy when you inevitably experience those problems.

> I do not want to run pg_autovacuum daemon on a busy hour.

You would probably be surprised to see how much of a performance boost you can get during your busy times by having a properly configured autovacuum running. My initial reaction to seeing performance degradation during autovacuum was to make it less aggressive, which led to increasing bloat between autovacuum runs, which degraded performance between runs and made things that much worse when autovacuum finally kicked in. It was only by using aggressive maintenance to clean up the bloat and then configuring autovacuum to be much more aggressive that I saw performance during peak periods improve, although on some systems I had to introduce a 10 ms vacuum cost delay. This is one of those areas where your initial intuitions can be totally counter-productive.

> In case, if I can afford to take my database offline at low-usage time and perform REINDEX database-wide manually/linux cron, to boost up index performance, what is the community answer/suggestion on the following:
> 1. Is it a good idea to perform this on a daily basis?

No.
It is generally not something to run on a routine basis, and if you're not careful you could make performance worse by making the indexes so tight that most of your inserts or updates will cause index page splits.

> 2. Any implications of doing this on a daily basis?

We haven't found it necessary or useful, but if you have an appropriate fill factor, I suppose it might not actually do any damage. There is some chance, based on your usage pattern, that a daily CLUSTER of some tables might boost performance by reducing random access, but a daily REINDEX is unlikely to be a win.

> 3. Is there a way to find out bloated indexes?

I don't have anything offhand, but you might poke around pg_class looking at reltuples and relpages.

> 4. Any other maintenance command, like ANALYZE, that has to be executed before/after REINDEX?

Not generally, but I seem to remember that there can be exceptions. Indexes on expressions? GIN?

> 5. Is there a way to find out when REINDEX was last run on an INDEX/TABLE/DATABASE?

I don't think so.

> NOTE: I've also seen from my past experience that REINDEX database-wide greatly improves performance of the application.

I don't doubt that; if you've been shooting yourself in the foot by running VACUUM FULL, then REINDEX would be a good bandage to alleviate the pain. My suggestion is to clean up your existing bloat by running CLUSTER on all tables, configure autovacuum to aggressive values similar to what you see in 8.3 or 8.4 and turn it on, run a nightly VACUUM ANALYZE VERBOSE of the database and review the output to make sure your fsm settings are adequate and to monitor bloat, and eliminate all use of VACUUM FULL or REINDEX unless you've somehow slipped up and allowed extreme bloat. This will allow tables and indexes to settle in to an efficient size where they are not constantly giving up disk space to the OS at night and then having to reacquire it from the OS when under heavy load during the day.
-Kevin
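Kevin's pointer at pg_class can be turned into a rough first-pass check for suspicious indexes. This is only a sketch, not a measurement: it lists the largest indexes alongside their tables' page and row counts so disproportion stands out; reltuples/relpages are only as fresh as the last ANALYZE or VACUUM, and the pgstattuple contrib module gives real numbers where available:

```sql
-- Rough sketch: largest indexes with their parent tables' statistics,
-- for eyeballing bloat.  An index far larger than its table (or than
-- sibling indexes on the same table) deserves a closer look.
SELECT c.relname  AS index_name,
       t.relname  AS table_name,
       c.relpages AS index_pages,
       t.relpages AS table_pages,
       t.reltuples::bigint AS table_rows
FROM pg_class c
JOIN pg_index i ON i.indexrelid = c.oid
JOIN pg_class t ON t.oid = i.indrelid
WHERE c.relkind = 'i'
ORDER BY c.relpages DESC
LIMIT 20;
```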
Re: [PERFORM] experiments in query optimization
Faheem Mitha fah...@email.unc.edu wrote:
>> If you're concerned about memory usage, try reducing work_mem; you've probably got it set to something huge.
> work_mem = 1 GB (see diag.{tex/pdf}). The point isn't that I'm using so much memory. Again, my question is, why are these changes affecting memory usage so drastically?

Because the planner looks at a very wide variety of plans, some of which may use many allocations of work_mem size, and some of which don't. The costs are compared and the lowest cost one is chosen. If you are close to the tipping point then even a very small change might affect which is chosen. It pays to keep the work_mem setting sane so that unexpected plan changes don't cause problems.

Look at the plans and their costs to get a feel for what's being chosen and why. Although it's a very bad idea to use these in production, you can often shift the plan to something you *think* would be better using the enable_* settings, to see what the planner thinks such a plan will cost and where it thinks the cost would be; that can help in tuning the settings.

>> You might need to create some indices, too.
> Ok. To what purpose? This query picks up everything from the tables and the planner does table scans, so conventional wisdom, and indeed my experience, says that indexes are not going to be so useful.

There are situations where scanning the entire table to build up a hash table is more expensive than using an index. Why not test it?

-Kevin
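The enable_* experiment Kevin describes looks roughly like this in practice; a sketch for an interactive session only (the SELECT is a placeholder for the query under investigation), never for production configuration:

```sql
-- Sketch: compare planner costs with and without sequential scans,
-- in a throwaway session.
EXPLAIN SELECT ...;        -- note the chosen plan and its estimated cost

SET enable_seqscan = off;  -- session-local; discourages seq scans so
                           -- index-based alternatives surface
EXPLAIN SELECT ...;        -- see what the alternative plan would cost

RESET enable_seqscan;      -- back to normal planning
```

Comparing the two cost estimates shows how close the query is to the "tipping point" between plans, which is what makes large work_mem settings risky.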
Re: [PERFORM] [ADMIN] Database size growing over time and leads to performance impact
You may want to consider performing more frequent vacuums per week, or really consider leveraging autovacuum if it makes sense for your transaction volume.

Regards,
Husam

-----Original Message-----
From: Gnanakumar gna...@zoniac.com
Sent: Saturday, March 27, 2010 6:06 AM
To: pgsql-ad...@postgresql.org; pgsql-performance@postgresql.org
Subject: [ADMIN] Database size growing over time and leads to performance impact

Hi,

We're using PostgreSQL 8.2. Recently, in our production database, there was a severe performance impact. Even though we're regularly doing both:
1. VACUUM FULL ANALYZE once a week during low-usage time and
2. ANALYZE every day at low-usage time

Also, we noticed that the physical database size has grown up to 30 GB. But if I dump the database in the form of SQL and import it locally on my machine, it is only 3.2 GB. Then, while searching Google for ways to optimize database size, I found the following useful link: http://www.linuxinsight.com/optimize_postgresql_database_size.html It says that even vacuumdb or reindexdb doesn't really compact database size, only dump/restore does, because of the MVCC architecture in PostgreSQL, and this has been proven here.

So, finally we decided to take our production database offline and performed a dump/restore. After this, the physical database size was reduced from 30 GB to 3.5 GB and the performance was also much better than before.

Physical database size was found using the following command:
du -sh /usr/local/pgsql/data/base/database-oid

I also cross-checked this size using pg_size_pretty(pg_database_size(datname)).

Questions
1. Is there any version/update of PostgreSQL addressing this issue?
2. How are these issues handled in real time by other PostgreSQL users without taking downtime?
3. Any ideas or links on whether this is addressed in the upcoming PostgreSQL version 9.0 release?
[PERFORM] transaction overhead at on commit delete rows;
We have a Postgres database which is accessed by client apps via PL/PGSQL stored procedures. For some reasons we use about 25 temp tables "on commit delete rows". They are widely used by our SPs. I can see a strange delay at any "begin" and "commit":

2010-03-09 15:14:01 MSK logrus 32102 amber LOG: duration: 20.809 ms statement: BEGIN
2010-03-09 15:14:01 MSK logrus 32102 amber LOG: duration: 0.809 ms statement: SELECT empl.BL_CustomerFreeCLGet('384154676925391', '8189', NULL)
2010-03-09 15:14:01 MSK logrus 32102 amber LOG: duration: 0.283 ms statement: FETCH ALL IN unnamed portal 165; -- +++empl.BL_CustomerFreeCLGet+++21360
2010-03-09 15:14:01 MSK logrus 32102 amber LOG: duration: 19.895 ms statement: COMMIT

The higher the system load, and the more temp tables used in the session, the longer the "begin" and "commit" times. This occurs only with temp tables "on commit delete rows". Test example below:

create database test;
create language plpgsql;

CREATE OR REPLACE FUNCTION test_connectionprepare(in_create bool, in_IsTemp bool, in_DelOnCommit bool, in_TableCount int)
RETURNS boolean AS
$$
declare
  m_count int := 50;
  m_isTemp bool;
begin
  m_count := coalesce(in_TableCount, m_count);
  FOR i IN 0..m_count LOOP
    if in_create then
      execute 'create ' || case when in_IsTemp then ' temp ' else ' ' end
           || ' table tmp_table_' || i::text || '(id int, pid int, name text) '
           || case when in_DelOnCommit then ' on commit delete rows ' else ' ' end || ';';
    else
      execute 'drop table if exists tmp_table_' || i::text || ';';
    end if;
  END LOOP;
  return in_create;
end;
$$ LANGUAGE 'plpgsql' VOLATILE SECURITY DEFINER;

Now run this pgScript:

DECLARE @I;
SET @I = 1;
WHILE @I <= 100
BEGIN
  select now();
  SET @I = @I + 1;
END

It spent about 2200-2300 ms on my server. Let's create 50 temp tables:

select test_connectionprepare(true, true, true, 100);

and run the script again. We can see a 2-3x slowdown!

temp tables number - test run time:
0 - 2157-2187 ms
10 - 2500-2704 ms
50 - 5900-6000 ms
100 - 7900-8000 ms
500 - 43000+ ms

--
Sorry for my english.
My server info:
PostgreSQL 8.4.1 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.2.real (GCC) 4.2.4 (Ubuntu 4.2.4-1ubuntu4), 64-bit
Linux u16 2.6.24-24-server #1 SMP Tue Jul 7 19:39:36 UTC 2009 x86_64 GNU/Linux
4x Opteron, 16 processor cores.
Re: [PERFORM] experiments in query optimization
On Tue, 30 Mar 2010, Kevin Grittner wrote:
>>> If you're concerned about memory usage, try reducing work_mem; you've probably got it set to something huge.
>> work_mem = 1 GB (see diag.{tex/pdf}). The point isn't that I'm using so much memory. Again, my question is, why are these changes affecting memory usage so drastically?
> Because the planner looks at a very wide variety of plans, some of which may use many allocations of work_mem size, and some of which don't. The costs are compared and the lowest cost one is chosen. If you are close to the tipping point then even a very small change might affect which is chosen. It pays to keep the work_mem setting sane so that unexpected plan changes don't cause problems.

Sure, but define "sane setting", please. I guess part of the point is that I'm trying to keep memory usage low, and it seems this is not part of the planner's priorities. That is, it does not take memory usage into consideration when choosing a plan. If that is wrong, let me know, but that is my understanding.

> Look at the plans and their costs to get a feel for what's being chosen and why. Although it's a very bad idea to use these in production, you can often shift the plan to something you *think* would be better using the enable_* settings, to see what the planner thinks such a plan will cost and where it thinks the cost would be; that can help in tuning the settings.

Right. You mean to close off certain options to the planner using 'Planner Method Configuration'. I suppose one can also use 'Planner Cost Constants' to alter plan behaviour. I haven't tried changing these.

>>> You might need to create some indices, too.
>> Ok. To what purpose? This query picks up everything from the tables and the planner does table scans, so conventional wisdom, and indeed my experience, says that indexes are not going to be so useful.
> There are situations where scanning the entire table to build up a hash table is more expensive than using an index.
> Why not test it?

Certainly, but I don't know what you and Robert have in mind, and I'm not experienced enough to make an educated guess. I'm open to specific suggestions.

Regards, Faheem.
Re: [PERFORM] 3ware vs. MegaRAID
Hi,

> I am waiting for an ordered machine dedicated to PostgreSQL. It was expected to have a 3ware 9650SE 16-port controller. However, the vendor wants to replace this controller with a MegaRAID SAS 84016E because, as they say, they have it in stock, while the 3ware would be available in a few weeks. Is this a good replacement, generally? Will it run on FreeBSD, specifically?

Not sure about that specific controller, but I do have a Fujitsu-rebranded "RAID Ctrl SAS onboard 256MB iTBBU" LSI that works pretty well on my FreeBSD 6.2 box with the mfi driver. Getting the megacli tool took some effort, as it involves having Linux emulation running, but it's now working fine. I wouldn't dare to use it for write operations, as I remember it freezing the box just after upgrading to amd64 (it was working well on i386).

Cheers

--
Matteo Beccati
Development & Consulting - http://www.beccati.com/
Re: [PERFORM] 3ware vs. MegaRAID
Ireneusz Pluta wrote:
> I am waiting for an ordered machine dedicated to PostgreSQL. It was expected to have a 3ware 9650SE 16-port controller. However, the vendor wants to replace this controller with a MegaRAID SAS 84016E because, as they say, they have it in stock, while the 3ware would be available in a few weeks. Is this a good replacement, generally? Will it run on FreeBSD, specifically?

The MFI driver needed to support that MegaRAID card has been around since FreeBSD 6.1: http://oldschoolpunx.net/phpMan.php/man/mfi/4

The MegaRAID SAS 84* cards have worked extremely well for me in terms of performance and features for all the systems I've seen them installed in. I'd consider it a modest upgrade from that 3ware card, speed-wise. The main issue with the MegaRAID cards is that you will have to write a lot of your own custom scripts to monitor for failures using their painful MegaCLI utility, and under FreeBSD that also requires using their Linux utility via emulation: http://www.freebsdsoftware.org/sysutils/linux-megacli.html

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.us
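The kind of monitoring script Greg describes can be sketched as below. The embedded sample stands in for real `MegaCli -LDInfo -Lall -aALL` output; the exact field wording varies by firmware version, so the grep pattern is an assumption to adjust against what your controller actually prints:

```shell
#!/bin/sh
# Sketch of a MegaCLI health check.  The here-variable below is a
# fabricated stand-in for `MegaCli -LDInfo -Lall -aALL` output; in a real
# script you would capture the command's output instead.
sample_output='Virtual Drive: 0 (Target Id: 0)
State               : Optimal
Virtual Drive: 1 (Target Id: 1)
State               : Degraded'

# Count virtual drives whose State line is not Optimal.
degraded=$(printf '%s\n' "$sample_output" | grep -c 'State.*Degraded')

if [ "$degraded" -gt 0 ]; then
    echo "RAID WARNING: $degraded degraded virtual drive(s)"
else
    echo "RAID OK"
fi
```

Hooked up to cron and mail, this is enough to learn about a failed drive before the second one goes.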
Re: [PERFORM] experiments in query optimization
One thing which I haven't really mentioned in this thread or in my writeup is that the planner's value for the number of rows in geno is way off base some of the time. It is around 800 million, but it thinks it is 100 million. I don't know if this is significant or not, or what to do about it. E.g., in the ped_bigjoin EXPLAIN ANALYZE VERBOSE:

-> Sort (cost=56855882.72..57144683.54 rows=115520330 width=42) (actual time=23027732.092..37113627.380 rows=823086774 loops=1)
   Output: (CASE WHEN (hapmap.geno.snpval_id = (-1)) THEN '0 0'::text WHEN (hapmap.geno.snpval_id = 0) THEN (((dedup_patient_anno.allelea_id)::text || ' '::text) || (dedup_patient_anno.allelea_id)::text) WHEN (hapmap.geno.snpval_id = 1) THEN (((dedup_patient_anno.allelea_id)::text || ' '::text) || (dedup_patient_anno.alleleb_id)::text) WHEN (hapmap.geno.snpval_id = 2) THEN (((dedup_patient_anno.alleleb_id)::text || ' '::text) || (dedup_patient_anno.alleleb_id)::text) ELSE NULL::text END), hapmap.geno.idlink_id, hapmap.geno.anno_id, pheno.patientid, pheno.phenotype, sex.code

Faheem.

On Tue, 30 Mar 2010, Kevin Grittner wrote:
> Faheem Mitha fah...@email.unc.edu wrote:
>>> If you're concerned about memory usage, try reducing work_mem; you've probably got it set to something huge.
>> work_mem = 1 GB (see diag.{tex/pdf}). The point isn't that I'm using so much memory. Again, my question is, why are these changes affecting memory usage so drastically?
> Because the planner looks at a very wide variety of plans, some of which may use many allocations of work_mem size, and some of which don't. The costs are compared and the lowest cost one is chosen. If you are close to the tipping point then even a very small change might affect which is chosen. It pays to keep the work_mem setting sane so that unexpected plan changes don't cause problems. Look at the plans and their costs to get a feel for what's being chosen and why.
> Although it's a very bad idea to use these in production, you can often shift the plan to something you *think* would be better using the enable_* settings, to see what the planner thinks such a plan will cost and where it thinks the cost would be; that can help in tuning the settings.
>>> You might need to create some indices, too.
>> Ok. To what purpose? This query picks up everything from the tables and the planner does table scans, so conventional wisdom, and indeed my experience, says that indexes are not going to be so useful.
> There are situations where scanning the entire table to build up a hash table is more expensive than using an index. Why not test it?
> -Kevin
Re: [PERFORM] temp table on commit delete rows: transaction overhead
Artiom Makarov artiom.maka...@gmail.com writes:
> When temp tables with "on commit delete rows" exist, I can see a strange delay at any begin and commit.

A delay at commit is hardly surprising, because each such temp table requires filesystem operations at commit (basically an ftruncate). I don't recall any operations at transaction start for such tables, but there may be some.

regards, tom lane
Re: [PERFORM] experiments in query optimization
On Tue, Mar 30, 2010 at 12:30 PM, Faheem Mitha fah...@email.unc.edu wrote:
> Sure, but define sane setting, please. I guess part of the point is that I'm trying to keep memory low, and it seems this is not part of the planner's priorities. That is, it does not take memory usage into consideration when choosing a plan. If that is wrong, let me know, but that is my understanding.

I don't understand quite why you're confused here. We've already explained to you that the planner will not employ a plan that uses more than the amount of memory defined by work_mem for each sort or hash. Typical settings for work_mem are between 1MB and 64MB. 1GB is enormous.

>>>> You might need to create some indices, too.
>>>
>>> Ok. To what purpose? This query picks up everything from the tables and the planner does table scans, so conventional wisdom, and indeed my experience, says that indexes are not going to be so useful.
>>
>> There are situations where scanning the entire table to build up a hash table is more expensive than using an index. Why not test it?
>
> Certainly, but I don't know what you and Robert have in mind, and I'm not experienced enough to make an educated guess. I'm open to specific suggestions.

Try creating an index on geno on the columns that are being used for the join.

...Robert
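Robert's suggestion, as a sketch. The column names here are guesses taken from the plan output earlier in the thread (idlink_id, anno_id) and would need to be replaced with the actual join keys:

```sql
-- Hypothetical index on the columns geno is joined on:
CREATE INDEX geno_idlink_anno_idx ON geno (idlink_id, anno_id);
ANALYZE geno;  -- refresh statistics so the planner can cost the new index
-- Then re-run EXPLAIN ANALYZE on the big join and compare plans.
```

Even if the final plan still scans most of the table, the comparison answers the "why not test it?" question with planner numbers rather than conventional wisdom.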
[PERFORM] query has huge variance in execution times
postgres 8.3.5 on RHEL4 update 6

This query starts executing at 18:41:

cemdb=> select query_start,current_query from pg_stat_activity where procpid=10022;
          query_start          |                                current_query
-------------------------------+------------------------------------------------------------------------------
 2010-03-30 18:41:11.685261-07 | select b.ts_id from ts_stats_tranunit_user_daily b, ts_stats_tranunit_user_interval c where b.ts_transet_incarnation_id = c.ts_transet_incarnation_id and b.ts_tranunit_id = c.ts_tranunit_id and b.ts_user_incarnation_id = c.ts_user_incarnation_id and c.ts_interval_start_time >= $1 and c.ts_interval_start_time < $2 and b.ts_interval_start_time >= $3 and b.ts_interval_start_time < $4
(1 row)

About 5 mins later, I, suspecting problems, do the following (the values are the same as for $1 et al above; EXPLAIN was done on purpose to keep stats [hopefully] the same as when pid 10022 started; there are 80,000 rows in each of the 2 tables at the time of this EXPLAIN and when 10022 started):

cemdb=> explain select b.ts_id from ts_stats_tranunit_user_daily b, ts_stats_tranunit_user_interval c where b.ts_transet_incarnation_id = c.ts_transet_incarnation_id and b.ts_tranunit_id = c.ts_tranunit_id and b.ts_user_incarnation_id = c.ts_user_incarnation_id and c.ts_interval_start_time >= '2010-3-29 01:00' and c.ts_interval_start_time < '2010-3-29 02:00' and b.ts_interval_start_time >= '2010-3-29' and b.ts_interval_start_time < '2010-3-30';
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Merge Join  (cost=33574.89..34369.38 rows=25207 width=8)
   Merge Cond: ((b.ts_transet_incarnation_id = c.ts_transet_incarnation_id) AND (b.ts_tranunit_id = c.ts_tranunit_id) AND (b.ts_user_incarnation_id = c.ts_user_incarnation_id))
   ->  Sort  (cost=13756.68..13854.96 rows=78623 width=32)
         Sort Key: b.ts_transet_incarnation_id, b.ts_tranunit_id, b.ts_user_incarnation_id
         ->  Index Scan using ts_stats_tranunit_user_daily_starttime on ts_stats_tranunit_user_daily b  (cost=0.00..10560.13 rows=78623 width=32)
               Index Cond: ((ts_interval_start_time >= '2010-03-29 00:00:00-07'::timestamp with time zone) AND (ts_interval_start_time < '2010-03-30 00:00:00-07'::timestamp with time zone))
   ->  Sort  (cost=19818.21..19959.72 rows=113207 width=24)
         Sort Key: c.ts_transet_incarnation_id, c.ts_tranunit_id, c.ts_user_incarnation_id
         ->  Index Scan using ts_stats_tranunit_user_interval_starttime on ts_stats_tranunit_user_interval c  (cost=0.00..15066.74 rows=113207 width=24)
               Index Cond: ((ts_interval_start_time >= '2010-03-29 01:00:00-07'::timestamp with time zone) AND (ts_interval_start_time < '2010-03-29 02:00:00-07'::timestamp with time zone))
(10 rows)

cemdb=> \q

I then run the query manually:

[r...@rdl64xeoserv01 log]# time PGPASSWORD=quality psql -U admin -d cemdb -c "select b.ts_id from ts_stats_tranunit_user_daily b, ts_stats_tranunit_user_interval c where b.ts_transet_incarnation_id = c.ts_transet_incarnation_id and b.ts_tranunit_id = c.ts_tranunit_id and b.ts_user_incarnation_id = c.ts_user_incarnation_id and c.ts_interval_start_time >= '2010-3-29 01:00' and c.ts_interval_start_time < '2010-3-29 02:00' and b.ts_interval_start_time >= '2010-3-29' and b.ts_interval_start_time < '2010-3-30'" > /tmp/select.txt 2>&1

real    0m0.813s
user    0m0.116s
sys     0m0.013s

I let process 10022 run for an hour.
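One possibly relevant difference between the two runs: the slow backend executes the query with $1..$4 parameters, so its plan was built without seeing the actual values, while the fast manual run uses literals. A sketch of how to inspect the parameterized plan in psql, assuming the comparisons are >= and < as in the manual run:

```sql
PREPARE q(timestamptz, timestamptz, timestamptz, timestamptz) AS
  SELECT b.ts_id
  FROM ts_stats_tranunit_user_daily b, ts_stats_tranunit_user_interval c
  WHERE b.ts_transet_incarnation_id = c.ts_transet_incarnation_id
    AND b.ts_tranunit_id = c.ts_tranunit_id
    AND b.ts_user_incarnation_id = c.ts_user_incarnation_id
    AND c.ts_interval_start_time >= $1 AND c.ts_interval_start_time < $2
    AND b.ts_interval_start_time >= $3 AND b.ts_interval_start_time < $4;

-- Shows the generic plan built without parameter values; compare it with
-- the plan EXPLAIN produces for the same query written with literals.
EXPLAIN EXECUTE q('2010-3-29 01:00', '2010-3-29 02:00', '2010-3-29', '2010-3-30');
```

If the generic plan differs from the literal-value plan, that alone could account for a large variance in execution time.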
An strace shows lots of I/O:

[r...@rdl64xeoserv01 log]# strace -p 10022
read(18, "\214\2\0\0\374\200#\1\0\0\0\0P\3\0 \4 \0\0\0\0\320\234"..., 8192) = 8192
semop(73007122, 0xbfe0fc20, 1) = 0
_llseek(18, 538451968, [538451968], SEEK_SET) = 0
read(18, "\214\2\0\0\274\347\t#\1\0\0\0\0P\3\0 \4 \0\0\0\0\320\234"..., 8192) = 8192
_llseek(18, 535928832, [535928832], SEEK_SET) = 0
read(18, "\214\2\0\0\310\300\226\\1\0\0\0\0P\3\0 \4 \0\0\0\0\320"..., 8192) = 8192
_llseek(18, 532398080, [532398080], SEEK_SET) = 0
[many more similar lines]

I then kill 10022 and the application retries the same query:

[10022-cemdb-admin-2010-03-30 19:02:37.460 PDT]FATAL:  terminating connection due to administrator command
[10022-cemdb-admin-2010-03-30 19:02:37.460 PDT]STATEMENT:  select b.ts_id from ts_stats_tranunit_user_daily b, ts_stats_tranunit_user_interval c where b.ts_transet_incarnation_id = c.ts_transet_incarnation_id and b.ts_tranunit_id = c.ts_tranunit_id and b.ts_user_incarnation_id = c.ts_user_incarnation_id and c.ts_interval_start_time >= $1 and