Re: [PERFORM] Analyze on temp table taking very long
Thanks for the tip. I'll also check on the lock; it's a customer setup and we don't get access to the box very frequently.

The code was something like this:

    loop
        insert data into tmptbl
        analyze tmptbl
    end loop

If I replace it with

    loop
        insert data into tmptbl
    end loop
    analyze tmptbl

it goes through fine.

-mridula

-----Original Message-----
From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Tom Lane
Sent: Tuesday, March 22, 2011 3:57 PM
To: Mahadevan, Mridula
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Analyze on temp table taking very long

Mahadevan, Mridula mridula.mahade...@ironmountain.com writes:
> This has been running fine for a while on multiple setups, large and small volumes. The setups all have the same hardware configuration. On one particular setup with about 200k records, this analyze runs for 45 min and then times out (statement timeout is set to 45 min); typically it takes a few seconds at best. But when I move the analyze outside the loop everything runs fine.

Is it actually *running*, as in doing something, or is it just blocked? I can't immediately think of any reason for some other process to have a lock on a temp table that belongs to your process; but it seems unlikely that ANALYZE would randomly take much longer than expected unless something was preventing it from making progress. Look into pg_locks and/or watch the backend with strace next time this happens.

			regards, tom lane

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
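Tom's pg_locks suggestion amounts to something like the following, run from a second session while the ANALYZE appears stuck (a minimal sketch; these pg_locks columns exist in the 8.x releases discussed here):

```sql
-- Show every lock request that has not been granted, i.e. blocked sessions.
SELECT pid, locktype, relation::regclass AS rel, mode, granted
FROM pg_locks
WHERE NOT granted;
```

If the ANALYZE backend shows up in this list, something else holds a conflicting lock; if it doesn't, the backend really is doing work, and strace is the next step.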
Re: [PERFORM] Intel SSDs that may not suck
The potential breakthrough here with the 320 is consumer-grade SSD performance and price paired with high reliability.

On Mon, Mar 28, 2011 at 7:54 PM, Andy angelf...@yahoo.com wrote:
> This might be a bit too little too late, though. As you mentioned, there really isn't any real performance improvement for the Intel SSD. Meanwhile, SandForce (the controller that the OCZ Vertex is based on) is releasing its next-generation controller at a reportedly huge performance increase.
>
> Is there any benchmark measuring the performance of these SSDs (the new Intel vs. the new SandForce) running database workloads? The benchmarks I've seen so far are for desktop applications.
>
> Andy
>
> --- On Mon, 3/28/11, Greg Smith g...@2ndquadrant.com wrote:
>
> From: Greg Smith g...@2ndquadrant.com
> Subject: [PERFORM] Intel SSDs that may not suck
> To: pgsql-performance@postgresql.org
> Date: Monday, March 28, 2011, 4:21 PM
>
> Today is the launch of Intel's 3rd generation SSD line, the 320 series. And they've finally produced a cheap consumer product that may be useful for databases, too! They've put 6 small capacitors onto the board and added logic to flush the write cache if the power drops. The cache on these was never very big, so they were able to avoid needing one of the big supercapacitors instead. Having 6 little ones is probably a net reliability win over the single point of failure, too.
>
> Performance is only a little better than earlier generation designs, which means they're still behind the OCZ Vertex controllers that have been recommended on this list. I haven't really been hearing good things about long-term reliability of OCZ's designs anyway, so I'm glad to have an alternative.
>
> *Important*: don't buy SSDs for important data without also having a good redundancy/backup plan. As relatively new technology they do still have a pretty high failure rate. Make sure you budget for two drives and make multiple copies of your data.
>
> Anyway, the new Intel drives are fast enough for most things, and are going to be very inexpensive. See http://www.storagereview.com/intel_ssd_320_review_300gb for some simulated database tests. There's more about the internals at http://www.anandtech.com/show/4244/intel-ssd-320-review and the white paper about the capacitors is at http://newsroom.intel.com/servlet/JiveServlet/download/38-4324/Intel_SSD_320_Series_Enhance_Power_Loss_Technology_Brief.pdf
>
> Some may still find these too cheap for enterprise use, given the use of MLC limits how much activity these drives can handle. But it's great to have a new option for lower-budget systems that can tolerate some risk there.
>
> --
> Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
> PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.us
> PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books
Re: [PERFORM] Intel SSDs that may not suck
Hello Greg, list,

On 2011-03-28 22:21, Greg Smith wrote:
> Today is the launch of Intel's 3rd generation SSD line, the 320 series. [...] Performance is only a little better than earlier generation designs, which means they're still behind the OCZ Vertex controllers that have been recommended on this list. I haven't really been hearing good things about long-term reliability of OCZ's designs anyway, so glad to have an alternative. [...]

While I appreciate the heads-up about these new drives, your posting suggests (though you formulated it in a way that you do not actually say it) that OCZ products lack long-term reliability, with no factual data behind it.
If you have knowledge of SandForce-based OCZ drives failing, that would be interesting, because that is the product line the new Intel SSD ought to be compared with. From my POV I've verified that the SandForce-based OCZ drives operate as they should (w.r.t. barriers/write-through) and I've reported what and how that testing was done (where I really appreciated your help) - http://archives.postgresql.org/pgsql-performance/2010-07/msg00449.php

The three drives we're using in a development environment right now report (with recent SSD firmwares and smartmontools) their health status, including the supercap status as well as reserved blocks and a lot more info that can be used to monitor when a drive is about to be dead. Since none of the drives have failed yet, or are in the vicinity of their end-of-life predictions, it is currently unknown whether this health status is reliable. It may be, but it may as well not be. Therefore I'm very interested in hearing hard facts about failures and the SMART readings right before them.

Below are SMART readings from two Vertex 2 Pros; the first is the same drive I did the testing with earlier. You can see its lifetime reads/writes, as well as its unexpected-power-loss count, are larger than the other, newer one's. The FAILING_NOW of available reserved space is an artefact of the smartmontools db having its threshold wrong: it should be read as GBs of reserved space, and I suspect for a new drive it might be in the order of 18 or 20.

It's hard to compare with spindles: I've seen them fail in all sorts of ways, but as of yet I've seen no SSD failure. I'm inclined to start a perpetual pgbench on one SSD with monitoring of SMART stats to see if what they report is really a good indicator of lifetime. If so, I'm beginning to believe this technology is better in failure predictability than spindles, which pretty much seem to fail at random when you have large arrays.
Model I tested with earlier:

=== START OF INFORMATION SECTION ===
Model Family:     SandForce Driven SSDs
Device Model:     OCZ VERTEX2-PRO
Serial Number:    OCZ-BVW101PBN8Q8H8M5
LU WWN Device Id: 5 e83a97 f88e46007
Firmware Version: 1.32
User Capacity:    50,020,540,416 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 6
Local Time is:    Tue Mar 29 11:25:04 2011 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
Re: [PERFORM] Intel SSDs that may not suck
On Mar 29, 2011, at 12:13 AM, Merlin Moncure wrote: My own experience with MLC drives is that write cycle expectations are more or less as advertised. They do go down (hard), and have to be monitored. If you are writing a lot of data this can get pretty expensive although the cost dynamics are getting better and better for flash. I have no idea what would be precisely prudent, but maybe some good monitoring tools and phased obsolescence at around 80% duty cycle might not be a bad starting point. With hard drives, you can kinda wait for em to pop and swap em in -- this is NOT a good idea for flash raid volumes. we've been running some of our DB's on SSD's (x25m's, we also have a pair of x25e's in another box we use for some super hot tables). They have been in production for well over a year (in some cases, nearly a couple years) under heavy load. We're currently being bit in the ass by performance degradation and we're working out plans to remedy the situation. One box has 8 x25m's in a R10 behind a P400 controller. First, the p400 is not that powerful and we've run experiments with newer (p812) controllers that have been generally positive. The main symptom we've been seeing is write stalls. Writing will go, then come to a complete halt for 0.5-2 seconds, then resume. The fix we're going to do is replace each drive in order with the rebuild occuring between each. Then we do a security erase to reset the drive back to completely empty (including the spare blocks kept around for writes). Now that all sounds awful and horrible until you get to overall performance, especially with reads - you are looking at 20k random reads per second with a few disks. Adding in writes does kick it down a noch, but you're still looking at 10k+ iops. That is the current trade off. In general, i wouldn't recommend the cciss stuff with SSD's at this time because it makes some things such as security erase, smart and other things near impossible. 
(performance seems ok though) We've got some tests planned seeing what we can do with an Areca controller and some ssds to see how it goes. Also note that there is a funky interaction with an MSA70 and SSDs. they do not work together. (I'm not sure if HP's official branded ssd's have the same issue). The write degradation could probably be monitored looking at svctime from sar. We may be implementing that in the near future to detect when this creeps up again. -- Jeff Trout j...@jefftrout.com http://www.stuarthamm.net/ http://www.dellsmartexitin.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Intel SSDs that may not suck
2011/3/29 Jeff thres...@torgo.978.org:
> [...]
> The write degradation could probably be monitored looking at svctime from sar. We may be implementing that in the near future to detect when this creeps up again.

svctime is not trustworthy; according to the sysstat author, this field will be removed in a future version.

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support
Re: [PERFORM] Intel SSDs that may not suck
On Mar 29, 2011, at 10:16 AM, Jeff wrote: Now that all sounds awful and horrible until you get to overall performance, especially with reads - you are looking at 20k random reads per second with a few disks. Adding in writes does kick it down a noch, but you're still looking at 10k+ iops. That is the current trade off. We've been doing a burn in for about 4 days now on an array of 8 x25m's behind a p812 controller: here's a sample of what it is currently doing (I have 10 threads randomly seeking, reading, and 10% of the time writing (then fsync'ing) out, using my pgiosim tool which I need to update on pgfoundry) 10:25:24 AM dev104-2 7652.21 109734.51 12375.22 15.96 8.22 1.07 0.12 88.32 10:25:25 AM dev104-2 7318.52 104948.15 11696.30 15.94 8.62 1.17 0.13 92.50 10:25:26 AM dev104-2 7871.56 112572.48 13034.86 15.96 8.60 1.09 0.12 91.38 10:25:27 AM dev104-2 7869.72 111955.96 13592.66 15.95 8.65 1.10 0.12 91.65 10:25:28 AM dev104-2 7859.41 111920.79 13560.40 15.97 9.32 1.19 0.13 98.91 10:25:29 AM dev104-2 7285.19 104133.33 12000.00 15.94 8.08 1.11 0.13 92.59 10:25:30 AM dev104-2 8017.27 114581.82 13250.91 15.94 8.48 1.06 0.11 90.36 10:25:31 AM dev104-2 8392.45 120030.19 13924.53 15.96 8.90 1.06 0.11 94.34 10:25:32 AM dev104-2 10173.86 145836.36 16409.09 15.95 10.72 1.05 0.11113.52 10:25:33 AM dev104-2 7007.14 100107.94 11688.89 15.95 7.39 1.06 0.11 79.29 10:25:34 AM dev104-2 8043.27 115076.92 13192.31 15.95 9.09 1.13 0.12 96.15 10:25:35 AM dev104-2 7409.09 104290.91 13774.55 15.94 8.62 1.16 0.12 90.55 the 2nd to last column is svctime. first column after dev104-2 is TPS. 
If I kill the writes off, TPS rises quite a bit:

10:26:34 AM  dev104-2  22659.41  361528.71    0.00  15.95  10.57  0.42  0.04  99.01
10:26:35 AM  dev104-2  22479.41  359184.31    7.84  15.98   9.61  0.52  0.04  98.04
10:26:36 AM  dev104-2  21734.29  347230.48    0.00  15.98   9.30  0.43  0.04  95.33
10:26:37 AM  dev104-2  21551.46  344023.30  116.50  15.97   9.56  0.44  0.05  97.09
10:26:38 AM  dev104-2  21964.42  350592.31    0.00  15.96  10.25  0.42  0.04  96.15
10:26:39 AM  dev104-2  22512.75  359294.12    7.84  15.96  10.23  0.50  0.04  98.04
10:26:40 AM  dev104-2  22373.53  357725.49    0.00  15.99   9.52  0.43  0.04  98.04
10:26:41 AM  dev104-2  21436.79  342596.23    0.00  15.98   9.17  0.43  0.04  94.34
10:26:42 AM  dev104-2  22525.49  359749.02   39.22  15.97  10.18  0.45  0.04  98.04

Now to demonstrate write stalls on the problematic box:

10:30:49 AM  dev104-3    0.00  0.00     0.00   0.00   0.38    0.00    0.00   35.85
10:30:50 AM  dev104-3    3.03  8.08   258.59  88.00   2.43  635.00  333.33  101.01
10:30:51 AM  dev104-3    4.00  0.00   128.00  32.00   0.67  391.75   92.75   37.10
10:30:52 AM  dev104-3   10.89  0.00    95.05   8.73   1.45  133.55   12.27   13.37
10:30:53 AM  dev104-3    0.00  0.00     0.00   0.00   0.00    0.00    0.00    0.00
10:30:54 AM  dev104-3  155.00  0.00  1488.00   9.60  10.88   70.23    2.92   45.20
10:30:55 AM  dev104-3   10.00  0.00   536.00  53.60   1.66  100.20   45.80   45.80
10:30:56 AM  dev104-3   46.53  0.00   411.88   8.85   3.01   78.51    4.30   20.00
10:30:57 AM  dev104-3   11.00  0.00    96.00   8.73   0.79   72.91   27.00   29.70
10:30:58 AM  dev104-3   12.00  0.00    96.00   8.00   0.79   65.42   11.17   13.40
10:30:59 AM  dev104-3    7.84  7.84    62.75   9.00   0.67   85.38   32.00   25.10
10:31:00 AM  dev104-3    8.00  0.00   224.00  28.00   0.82  102.00   47.12   37.70
10:31:01 AM  dev104-3   20.00  0.00   184.00   9.20   0.24   11.80    1.10    2.20
10:31:02 AM  dev104-3    4.95  0.00    39.60   8.00   0.23   46.00   13.00    6.44
10:31:03 AM  dev104-3    0.00  0.00     0.00   0.00   0.00    0.00    0.00    0.00

That was from a simple dd, not random writes (since it is in production, I can't really do the random-write test as easily). Theoretically, a nice rotation of disks would remove that problem.
Annoying, but it is the price you need to pay.

--
Jeff Trout j...@jefftrout.com
http://www.stuarthamm.net/
http://www.dellsmartexitin.com/
[PERFORM] very long updates very small tables
Dear list,

we have got a web application, and when people log in some information is written to the user tables. We have around 500 active users, but at most 5 people are logged in at the same time. From time to time people log in and then the application is not responsive any more. What we see in the postgres server logs is that processes are waiting for other transactions to finish, though not because of a deadlock. The log tells me that certain update statements sometimes take about 3-10 minutes. But we are talking about updates on tables with 1000 to 1 rows, and updates that are supposed to update 1 row.

We are running under Windows 2008 and postgres 8.4.7. (Sorry for the Windows, it was not MY first choice.)

My only explanation at the moment would be that there must be some kind of Windows process that stops all other processes until it is finished, or something like that. (Could it even be autovacuum?) Is there a way to find out how long autovacuum took? Has anyone seen anything similar? Or could it really be that we need a bigger machine with more IO? But the one disk in the system still seems not very busy, and response times in the Windows resource monitor are not higher than 28 ms.

Following is an excerpt of our server log.
LOG: process 1660 acquired ShareLock on transaction 74652 after 533354.000 ms
STATEMENT: UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
LOG: process 4984 acquired ShareLock on transaction 74652 after 1523530.000 ms
STATEMENT: UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
LOG: process 956 acquired ExclusiveLock on tuple (4,188) of relation 16412 of database 16384 after 383055.000 ms
STATEMENT: UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
LOG: process 4312 acquired ExclusiveLock on tuple (9,112) of relation 16412 of database 16384 after 1422677.000 ms
STATEMENT: UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
LOG: duration: 1523567.000 ms execute unnamed: UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
DETAIL: parameters: $1 = 't', $2 = '1362'
LOG: duration: 533391.000 ms execute unnamed: UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
DETAIL: parameters: $1 = 't', $2 = '31'
LOG: process 5504 acquired ExclusiveLock on tuple (9,112) of relation 16412 of database 16384 after 183216.000 ms
STATEMENT: UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
LOG: process 1524 acquired ExclusiveLock on tuple (4,188) of relation 16412 of database 16384 after 376370.000 ms
STATEMENT: UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
LOG: duration: 1422688.000 ms execute unnamed: UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
DETAIL: parameters: $1 = 't', $2 = '1362'
LOG: duration: 383067.000 ms execute unnamed: UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
DETAIL: parameters: $1 = 'f', $2 = '31'
LOG: process 4532 acquired ExclusiveLock on tuple (9,112) of relation 16412 of database 16384 after 118851.000 ms
STATEMENT: UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
LOG: process 4448 acquired ExclusiveLock on tuple (4,188) of relation 16412 of database 16384 after 366304.000 ms
STATEMENT: UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
LOG: duration: 183241.000 ms execute unnamed: UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
DETAIL: parameters: $1 = 't', $2 = '1362'
LOG: duration: 376395.000 ms execute unnamed: UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
DETAIL: parameters: $1 = 't', $2 = '31'
LOG: process 4204 acquired ExclusiveLock on tuple (4,188) of relation 16412 of database 16384 after 339893.000 ms
STATEMENT: UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
LOG: duration: 366342.000 ms execute unnamed: UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
DETAIL: parameters: $1 = 't', $2 = '31'
LOG: process 4760 acquired ExclusiveLock on tuple (4,188) of relation 16412 of database 16384 after 205943.000 ms
STATEMENT: UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
LOG: duration: 339923.000 ms execute unnamed: UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
DETAIL: parameters: $1 = 't', $2 = '31'
LOG: duration: 205963.000 ms execute unnamed: UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
DETAIL: parameters: $1 = 't', $2 = '31'
LOG: duration: 124654.000 ms execute unnamed: UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
DETAIL: parameters: $1 = 't', $2 = '1362'
LOG: process 3844 still waiting for ShareLock on transaction 74839 after 8000.000 ms

Thanx in advance.

Lars

--
~~~
Lars Feistner
Kompetenzzentrum für Prüfungen in der Medizin
Medizinische Fakultät Heidelberg,
Im Neuenheimer Feld 346, Raum 013
69120 Heidelberg
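The lock waits above are all on the same two tuples, which suggests sessions queueing behind a long-open transaction that updated those rows. A sketch of what could be run while the application hangs (hedged: this uses the 8.4-era catalog columns, i.e. procpid/waiting/current_query rather than the later names):

```sql
-- Which backends are waiting on ungranted locks, and what they are running.
SELECT a.procpid, a.waiting, a.current_query, l.locktype, l.mode
FROM pg_stat_activity a
JOIN pg_locks l ON l.pid = a.procpid
WHERE NOT l.granted;

-- When autovacuum last touched each table. This shows timing of runs, not
-- duration; for durations, set log_autovacuum_min_duration = 0 in
-- postgresql.conf and check the server log.
SELECT relname, last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables;
```

If the waiting backends are all stuck behind one idle-in-transaction session, the fix is in the application's transaction handling rather than the hardware.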
Re: [PERFORM] Intel SSDs that may not suck
This can be resolved by partitioning the disk with a larger spare write area so that the cells don't have to be recycled so often. There is a lot of misinformation about SSDs; there are some great articles on AnandTech that really explain how the technology works, and some of the differences between the controllers as well. If you do the reading you can find a solution that will work for you. SSDs are probably one of the best technologies to come along for us in a long time, giving us such a performance jump in the IO world. We have gone from completely IO-bound to CPU-bound; it's really worth spending the time to investigate and understand how this can impact your system.

http://www.anandtech.com/show/2614
http://www.anandtech.com/show/2738
http://www.anandtech.com/show/4244/intel-ssd-320-review
http://www.anandtech.com/tag/storage
http://www.anandtech.com/show/3849/micron-announces-realssd-p300-slc-ssd-for-enterprise

-----Original Message-----
From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Jeff
Sent: Tuesday, March 29, 2011 9:33 AM
To: Jeff
Cc: Merlin Moncure; Andy; pgsql-performance@postgresql.org; Greg Smith; Brian Ristuccia
Subject: Re: [PERFORM] Intel SSDs that may not suck

> Now that all sounds awful and horrible until you get to overall performance, especially with reads - you are looking at 20k random reads per second with a few disks. Adding in writes does kick it down a notch, but you're still looking at 10k+ iops. That is the current trade off.
Re: [PERFORM] Intel SSDs that may not suck
On 2011-03-29 16:16, Jeff wrote:
> halt for 0.5-2 seconds, then resume. The fix we're going to do is replace each drive in order with the rebuild occurring between each. Then we do a security erase to reset the drive back to completely empty (including the spare blocks kept around for writes).

Are you replacing the drives with new ones, or just secure-erasing them and putting them back in? What kind of numbers are you drawing out of smartmontools in usage figures? (We are also seeing some write stalls here, on 24 Raid50 volumes of x25m's, and have been planning to cycle drives for quite some time, without actually getting to it.)

> Now that all sounds awful and horrible until you get to overall performance, especially with reads - you are looking at 20k random reads per second with a few disks. Adding in writes does kick it down a notch, but you're still looking at 10k+ iops. That is the current trade off.

That's also my experience.

--
Jesper
Re: [PERFORM] Intel SSDs that may not suck
On Mar 29, 2011, at 12:12 PM, Jesper Krogh wrote: Are you replacing the drives with new once, or just secure-erase and back in? What kind of numbers are you drawing out of smartmontools in usage figures? (Also seeing some write-stalls here, on 24 Raid50 volumes of x25m's, and have been planning to cycle drives for quite some time, without actually getting to it. we have some new drives that we are going to use initially, but eventually it'll be a secure-erase'd one we replace it with (which should perform identical to a new one) What enclosure controller are you using on the 24 disk beast? -- Jeff Trout j...@jefftrout.com http://www.stuarthamm.net/ http://www.dellsmartexitin.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Intel SSDs that may not suck
Both the X25-M and the parts that AnandTech reviews (and pretty thorough reviews they do) are, on a good day, prosumer. Review material for truly enterprise parts, the kind that STEC, Violin, and Texas Memory will spend a year getting qualified at HP or IBM or Oracle, is really hard to come by. Zsolt does keep track of what's going on in the space, although he doesn't test himself, that I've seen. Still, a useful site to visit on occasion: http://www.storagesearch.com/

regards

-------- Original message --------
Date: Tue, 29 Mar 2011 11:32:16 -0400
From: pgsql-performance-ow...@postgresql.org (on behalf of "Strange, John W" john.w.stra...@jpmchase.com)
Subject: Re: [PERFORM] Intel SSDs that may not suck
To: Jeff thres...@torgo.dyndns-server.com
Cc: Merlin Moncure mmonc...@gmail.com, Andy angelf...@yahoo.com, pgsql-performance@postgresql.org, Greg Smith g...@2ndquadrant.com, Brian Ristuccia br...@ristuccia.com

> This can be resolved by partitioning the disk with a larger spare write area so that the cells don't have to be recycled so often. There is a lot of misinformation about SSDs; there are some great articles on AnandTech that really explain how the technology works and some of the differences between the controllers as well. If you do the reading you can find a solution that will work for you. SSDs are probably one of the best technologies to come along for us in a long time, giving us such a performance jump in the IO world. We have gone from completely IO-bound to CPU-bound; it's really worth spending the time to investigate and understand how this can impact your system.
http://www.anandtech.com/show/2614
http://www.anandtech.com/show/2738
http://www.anandtech.com/show/4244/intel-ssd-320-review
http://www.anandtech.com/tag/storage
http://www.anandtech.com/show/3849/micron-announces-realssd-p300-slc-ssd-for-enterprise

-----Original Message-----
From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Jeff
Sent: Tuesday, March 29, 2011 9:33 AM
To: Jeff
Cc: Merlin Moncure; Andy; pgsql-performance@postgresql.org; Greg Smith; Brian Ristuccia
Subject: Re: [PERFORM] Intel SSDs that may not suck

On Mar 29, 2011, at 10:16 AM, Jeff wrote:

> Now that all sounds awful and horrible until you get to overall performance, especially with reads - you are looking at 20k random reads per second with a few disks. Adding in writes does kick it down a notch, but you're still looking at 10k+ iops.

That is the current trade-off. We've been doing a burn-in for about 4 days now on an array of 8 X25-Ms behind a P812 controller. Here's a sample of what it is currently doing: 10 threads randomly seeking, reading, and 10% of the time writing (then fsync'ing), using my pgiosim tool (which I need to update on pgfoundry). The first column after dev104-2 is tps; the second-to-last column is svctm:

(sar -d columns: tps, rd_sec/s, wr_sec/s, avgrq-sz, avgqu-sz, await, svctm, %util)

10:25:24 AM dev104-2   7652.21 109734.51 12375.22 15.96  8.22 1.07 0.12  88.32
10:25:25 AM dev104-2   7318.52 104948.15 11696.30 15.94  8.62 1.17 0.13  92.50
10:25:26 AM dev104-2   7871.56 112572.48 13034.86 15.96  8.60 1.09 0.12  91.38
10:25:27 AM dev104-2   7869.72 111955.96 13592.66 15.95  8.65 1.10 0.12  91.65
10:25:28 AM dev104-2   7859.41 111920.79 13560.40 15.97  9.32 1.19 0.13  98.91
10:25:29 AM dev104-2   7285.19 104133.33 12000.00 15.94  8.08 1.11 0.13  92.59
10:25:30 AM dev104-2   8017.27 114581.82 13250.91 15.94  8.48 1.06 0.11  90.36
10:25:31 AM dev104-2   8392.45 120030.19 13924.53 15.96  8.90 1.06 0.11  94.34
10:25:32 AM dev104-2  10173.86 145836.36 16409.09 15.95 10.72 1.05 0.11 113.52
10:25:33 AM dev104-2   7007.14 100107.94 11688.89 15.95  7.39 1.06 0.11  79.29
10:25:34 AM dev104-2   8043.27 115076.92 13192.31 15.95  9.09 1.13 0.12  96.15
10:25:35 AM dev104-2   7409.09 104290.91 13774.55 15.94  8.62 1.16 0.12  90.55

If I kill the writes off, tps rises quite a bit:

10:26:34 AM dev104-2  22659.41 361528.71     0.00 15.95 10.57 0.42 0.04  99.01
10:26:35 AM dev104-2  22479.41 359184.31     7.84 15.98  9.61 0.52 0.04  98.04
10:26:36 AM dev104-2  21734.29 347230.48     0.00 15.98  9.30 0.43 0.04  95.33
10:26:37 AM dev104-2  21551.46 344023.30   116.50 15.97  9.56 0.44 0.05  97.09
10:26:38 AM dev104-2  21964.42 350592.31     0.00 15.96 10.25 0.42 0.04  96.15
10:26:39 AM dev104-2  22512.75 359294.12     7.84 15.96 10.23 0.50 0.04  98.04
10:26:40 AM dev104-2  22373.53 357725.49     0.00
Re: [PERFORM] Intel SSDs that may not suck
On 03/29/2011 06:34 AM, Yeb Havinga wrote:

> While I appreciate the heads-up about these new drives, your posting suggests (though you formulate it in a way that you do not actually say it) that OCZ products do not have long-term reliability. No factual data. If you have knowledge of SandForce-based OCZ drives failing, that would be interesting, because that's the product line the new Intel SSD ought to be compared with.

I didn't want to say anything too strong until I got to the bottom of the reports I'd been sorting through. It turns out that there is a very wide incompatibility between OCZ drives and some popular Gigabyte motherboards: http://www.ocztechnologyforum.com/forum/showthread.php?76177-do-you-own-a-Gigabyte-motherboard-and-have-the-SMART-error-with-FW1.11...look-inside

(I'm typing this message on a system with one of the impacted combinations, one reason why I don't own a Vertex 2 Pro yet. That I would have to run a beta BIOS does not inspire confidence.)

What happens on the impacted models is that you can't get SMART data from the drive. That means no monitoring for the sort of expected failures we all know can happen with any drive. So far that looks to be at the bottom of all the anecdotal failure reports I'd found: the drives may have been throwing bad sectors or some other early failure, but the owners had no idea, because they thought SMART would warn them--and it wasn't working at all. Thus, they didn't find out there was a problem until the drive just died altogether one day.

More popular doesn't always mean more reliable, but for stuff like this it helps. Intel ships so many more drives than OCZ that I'd be shocked if Gigabyte didn't have reference samples of them for testing. This really looks like more of a warning about why you should be particularly aggressive with checking SMART when running recently introduced drives, which it sounds like you are already doing.
Reliability in this area is so strange... a diversion to older drives gives an idea how annoyed I am about all this. Last year, I gave up on Western Digital's consumer drives (again). Not because the failure rates were bad, but because the one failure I did run into was so terrible from a SMART perspective. The drive lied about the whole problem so aggressively that I couldn't manage the process. I couldn't get the drive to admit it had a problem such that it could turn into an RMA candidate, despite it failing every time I ran an aggressive SMART error check. It would reallocate a few sectors, say "good as new!", and then fail at the next block when I re-tested. It did that at least a dozen times before I threw it in the pathological-drives pile I keep around for torture testing.

Meanwhile, the Seagate drives I switched back to are terrible from a failure-percentage perspective. I just had two start to go bad last week--both halves of an array, which is always fun. But the failure started with very clearly labeled increases in reallocated sectors, and the drive that eventually went really bad (making the bad noises) was kicked back for RMA. If you've got redundancy, I'll take components that fail cleanly over ones that hide what's going on, even if the one that fails cleanly is actually more likely to fail. With a rebuild always a drive swap away, having accurate data makes even a higher failure rate manageable.

--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books
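The monitoring Greg describes--watching reallocated-sector counts climb so a drive can be RMA'd before it dies--can be automated by parsing `smartctl -A` output. Here is a minimal, hypothetical sketch (the attribute-table layout is the common smartmontools format, but real output varies by drive and firmware; the function name and sample excerpt are illustrative, not from the thread):

```python
import re

# Attributes most predictive of impending failure.
WATCHED = {"Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable"}

def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} for watched attributes in `smartctl -A` text."""
    values = {}
    for line in text.splitlines():
        # Typical attribute rows look like:
        #  ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        m = re.match(r"\s*\d+\s+(\S+)\s+\S+\s+\d+\s+\d+\s+\d+\s+\S+\s+\S+\s+\S+\s+(\d+)", line)
        if m and m.group(1) in WATCHED:
            values[m.group(1)] = int(m.group(2))
    return values

# Hypothetical excerpt of `smartctl -A /dev/sda` output for demonstration.
sample = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       12
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       3
"""
attrs = parse_smart_attributes(sample)
print(attrs)
```

A cron job comparing successive snapshots of these raw values would catch exactly the "clearly labeled increases in reallocated sectors" that made the Seagate failures manageable.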
Re: [PERFORM] very long updates very small tables
Lars Feistner feist...@uni-heidelberg.de wrote:

> The log tells me that certain update statements sometimes take about 3-10 minutes. But we are talking about updates on tables with 1000 to 10000 rows, and updates that are supposed to update 1 row.

The top possibilities that come to my mind are:

(1) The tables are horribly bloated. If autovacuum is off or not aggressive enough, things can degenerate to this level.

(2) Memory is over-committed and your machine is thrashing.

(3) There are explicit LOCK commands in the software which are contributing to the blocking.

(4) There is some external delay within the transaction, such as waiting for user input while the transaction is open.

Maybe there's a combination of the above at play. Can you rule any of these out?

-Kevin
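Possibility (1) is cheap to check: PostgreSQL's pg_stat_user_tables view reports live and dead tuple counts (n_live_tup, n_dead_tup) per table. This little sketch shows the arithmetic behind the diagnosis; the function name and the 5x threshold are illustrative choices, not anything from the thread:

```python
# A "small" table that takes minutes to update is often carrying a huge
# dead-tuple overhead that every update must wade through.
def looks_bloated(n_live_tup, n_dead_tup, ratio=5.0):
    """Flag tables where dead tuples dwarf live ones (counts as reported
    by pg_stat_user_tables)."""
    if n_live_tup == 0:
        return n_dead_tup > 0
    return n_dead_tup / n_live_tup > ratio

# A 2,000-row table hiding 1.5 million dead tuples: classic bloat.
print(looks_bloated(2000, 1_500_000))  # -> True
print(looks_bloated(2000, 500))        # -> False
```

If the ratio is high, a VACUUM FULL (or CLUSTER) plus more aggressive autovacuum settings is the usual fix.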
Re: [PERFORM] Intel SSDs that may not suck
On 2011-03-29 18:50, Jeff wrote:

> We have some new drives that we are going to use initially, but eventually it'll be a secure-erased one we replace it with (which should perform identically to a new one). What enclosure controller are you using on the 24-disk beast?

An LSI ELP and an HP D2700 enclosure. It works flawlessly; the only bad thing (which actually is pretty grave) is that the controller mis-numbers the slots in the enclosure, so you'll have to keep the mapping drawn on paper next to the enclosure to replace the correct disk.

--
Jesper
[PERFORM] multiple table scan performance
I've got some functionality that necessarily must scan a relatively large table. Even worse, the total workload is actually 3 similar, but different, queries, each of which requires a table scan. They all have a resultset with the same structure, and all get inserted into a temp table.

Is there any performance benefit to revamping the workload such that it issues a single

insert into (...) select ... UNION select ... UNION select ...

as opposed to 3 separate "insert into () select ..." statements?

I could figure it out empirically, but the queries are really slow on my dev laptop and I don't have access to the staging system at the moment. Also, it requires revamping a fair bit of code, so I figured it never hurts to ask. I don't have a sense of whether Postgres is able to parallelize multiple subqueries via a single scan.
Re: [PERFORM] multiple table scan performance
On Tue, Mar 29, 2011 at 7:16 PM, Samuel Gendler sgend...@ideasculptor.com wrote:

> Is there any performance benefit to revamping the workload such that it issues a single: insert into (...) select ... UNION select ... UNION select ... as opposed to 3 separate insert into () select ... statements.

I wouldn't expect any difference--if you use UNION ALL, that is. Plain UNION would be equivalent to "insert into () select DISTINCT ...".
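Claudio's distinction can be seen in miniature outside SQL. This purely illustrative Python sketch (toy rows, nothing from the thread) shows why plain UNION implies extra deduplication work while UNION ALL is simple concatenation:

```python
# Three "result sets" with one overlapping row each, as (id, value) tuples.
a = [(1, 'x'), (2, 'y')]
b = [(2, 'y'), (3, 'z')]
c = [(3, 'z')]

# UNION ALL: concatenate, keep duplicates -- no extra work per row.
union_all = a + b + c

# UNION: set semantics -- the combined result must be deduplicated,
# which in the database costs a sort or hash over all rows.
union = sorted(set(a) | set(b) | set(c))

print(len(union_all))  # 5 rows
print(len(union))      # 3 rows
```

That dedup pass is why, for an INSERT ... SELECT where duplicates are acceptable (or impossible), UNION ALL is the form to use.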
Re: [PERFORM] multiple table scan performance
On Wed, Mar 30, 2011 at 01:16, Samuel Gendler sgend...@ideasculptor.com wrote:

> I've got some functionality that necessarily must scan a relatively large table [...] Is there any performance benefit to revamping the workload such that it issues a single: insert into (...) select ... UNION select ... UNION select ... as opposed to 3 separate insert into () select ... statements.

Apparently not, as explained by Claudio Freire. This seems like a missed opportunity for the planner, however. If it scanned all three UNION subqueries in parallel, the synchronized seqscans feature would kick in and the physical table would only be read once, instead of 3 times. (I'm assuming that seqscan disk access is your bottleneck.)

You can trick Postgres (8.3.x and newer) into doing it in parallel anyway: open 3 separate database connections and issue each of these INSERT INTO ... SELECT parts separately. This way all the queries should execute in about 1/3 the time, compared to running them in one session or with UNION ALL.

Regards,
Marti
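Marti's trick could be sketched like this: three workers, each owning its own connection, issue one of the three statements concurrently so synchronized seqscans share a single physical pass over the table. The database call is stubbed out here (statements and the run_query stand-in are placeholders); with a real driver such as psycopg2, each worker would open its own connection, execute its statement, and commit:

```python
from concurrent.futures import ThreadPoolExecutor

# The three INSERT ... SELECT statements (placeholders -- the real
# queries and target table come from the application).
STATEMENTS = [
    "INSERT INTO results (...) SELECT ... /* query 1 */",
    "INSERT INTO results (...) SELECT ... /* query 2 */",
    "INSERT INTO results (...) SELECT ... /* query 3 */",
]

def run_query(sql):
    # Stand-in for: conn = connect(dsn); conn.cursor().execute(sql); conn.commit()
    # Each worker must use its OWN connection -- that is what lets the
    # three seqscans synchronize on one physical read of the table.
    return "done: " + sql[:30]

with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(run_query, STATEMENTS))

print(len(results))
```

Threads suffice even in CPython here, since each worker blocks on the database rather than on Python bytecode.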
Re: [PERFORM] multiple table scan performance
On Tue, Mar 29, 2011 at 5:05 PM, Marti Raudsepp ma...@juffo.org wrote:

> You can trick Postgres (8.3.x and newer) into doing it in parallel anyway: open 3 separate database connections and issue each of these 'INSERT INTO ... SELECT' parts separately. This way all the queries should execute in about 1/3 the time, compared to running them in one session or with UNION ALL.

That's a good idea, but it forces a lot of infrastructural change on me. I'm inserting into a temp table, then deleting everything from another table before copying over. I could insert into an ordinary table, but then I've got to deal with ensuring that everything is properly cleaned up, etc. Since nothing is actually blocked waiting for the queries to return, I think I'll just let them churn for now. It won't make much difference in production, where the whole table will fit easily into cache. I just wanted things to be faster in my dev environment.
Re: [PERFORM] multiple table scan performance
On 3/29/11 3:16 PM, Samuel Gendler wrote:

> I've got some functionality that necessarily must scan a relatively large table. Even worse, the total workload is actually 3 similar, but different queries, each of which requires a table scan. They all have a resultset that has the same structure, and all get inserted into a temp table. Is there any performance benefit to revamping the workload such that it issues a single: insert into (...) select ... UNION select ... UNION select ... as opposed to 3 separate insert into () select ... statements.

You don't indicate how complex your queries are. If it's just a single table and the conditions are relatively simple, could you do something like this?

insert into (...) select ... where (...) OR (...) OR (...)

Craig
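Craig's suggestion in miniature: when the three queries differ only in their WHERE clauses, OR'ing the predicates together reads the rows once instead of three times. This toy Python sketch (made-up predicates over a list standing in for the table) shows the shape of the saving, and one caveat:

```python
rows = list(range(100))  # stand-in for the large table

# Three hypothetical WHERE clauses.
p1 = lambda r: r % 7 == 0        # first query's condition
p2 = lambda r: r > 90            # second query's condition
p3 = lambda r: r in (5, 15)      # third query's condition

# Three separate scans: touches `rows` three times.
three_passes = ([r for r in rows if p1(r)] +
                [r for r in rows if p2(r)] +
                [r for r in rows if p3(r)])

# One scan with OR'ed predicates: touches `rows` once. Caveat: this is
# UNION-like -- a row matching two predicates appears only once, so it
# is only equivalent when the three result sets are disjoint (or when
# duplicates don't matter).
one_pass = [r for r in rows if p1(r) or p2(r) or p3(r)]

print(len(three_passes), len(one_pass))
```

Here 91 and 98 satisfy both p1 and p2, so the single pass yields two fewer rows than the concatenation, which is exactly the semantic difference to check before rewriting the real queries this way.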