Re: [PERFORM] RAID card recommendation
On Tue, Nov 24, 2009 at 7:35 PM, Greg Smith wrote: > Scott Marlowe wrote: >> >> As far as drives go we've been really happy with WD of late, they make >> large enterprise class SATA drives that don't pull a lot of power >> (green series) and fast SATA drives that pull a bit more but are >> faster (black series). > > Be careful to note the caveat that you need their *enterprise class* drives. > When you run into an error on their regular consumer drives, they get > distracted for a while trying to cover the whole thing up, in a way that's > exactly the opposite of the behavior you want for a RAID configuration. I > have a regular consumer WD drive that refuses to admit that it has a problem > such that I can RMA it, but that always generates an error if I rewrite the > whole drive. The behavior of the firmware is downright shameful. As cheap > consumer drives go, I feel like WD has pulled ahead of everybody else on > performance and possibly even actual reliability, but the error handling of > their firmware is so bad I'm still using Seagate drives--when those fail, as > least they're honest about it. When I inquired earlier this summer about using the consumer WDs in a new server I was told rather firmly by my sales guy "uhm, no". They put the enterprise drives through the wringer before he said they seemed ok. They have been great, both green and black series. For what they are, big SATA drives in RAID-6 or RAID-10 they're quite good. Moderate to quite good performers at a reasonable price. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] RAID card recommendation
Jochen Erwied wrote: - Promise Technology Supertrak ES4650 + additional BBU - Adaptec RAID 5405 SGL/256 SATA/SAS + additional BBU - Adaptec RAID 5405Z SGL/512 SATA/SAS I've never seen a Promise controller that had a Linux driver you would want to rely on under any circumstances. Adaptec used to have seriously bad Linux drivers too. I've gotten the impression they've cleaned up their act considerably the last few years, but they've been on my list of hardware to shun for so long I haven't bothered investigating. Easier to just buy from a company that has always cared about good Linux support, like 3ware. In any case, driver quality is what you want to research before purchasing any of these; doesn't matter how fast the cards are if they crash or corrupt your data. What I like to do is look at what companies who sell high-quality production servers with Linux preinstalled and see what hardware they include. You can find a list of vendors people here like at http://wiki.postgresql.org/wiki/SCSI_vs._IDE/SATA_Disks#Helpful_vendors_of_SATA_RAID_systems -- Greg Smith2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] RAID card recommendation
Scott Marlowe wrote: As far as drives go we've been really happy with WD of late, they make large enterprise class SATA drives that don't pull a lot of power (green series) and fast SATA drives that pull a bit more but are faster (black series). Be careful to note the caveat that you need their *enterprise class* drives. When you run into an error on their regular consumer drives, they get distracted for a while trying to cover the whole thing up, in a way that's exactly the opposite of the behavior you want for a RAID configuration. I have a regular consumer WD drive that refuses to admit that it has a problem such that I can RMA it, but that always generates an error if I rewrite the whole drive. The behavior of the firmware is downright shameful. As cheap consumer drives go, I feel like WD has pulled ahead of everybody else on performance and possibly even actual reliability, but the error handling of their firmware is so bad I'm still using Seagate drives--when those fail, as least they're honest about it. -- Greg Smith2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] RAID card recommendation
Matthew Wakeling wrote: People have mentioned Areca as making good RAID controllers. We're looking at the "Areca ARC-1220 PCI-Express x8 SATA II" as a possibility. Does anyone have an opinion on whether it is a turkey or a star? Performance should be OK but not great compared with some of the newer alternatives (this design is a few years old now). The main issue I've had with this series of cards is that the command-line tools are very hit or miss. See http://notemagnet.blogspot.com/2008/08/linux-disk-failures-areca-is-not-so.html for a long commentary about the things I was disappointed by on the similar ARC-1210 once I actually ran into a drive failure on one. As Scott points out there, they have other cards with a built-in management NIC that allows an alternate management path, and I believe those have better performance too. Another possibility is a 3-ware card of some description. I've put a fair number of 9690SA cards in systems with little to complain about. Performance was reasonable as long as you make sure to tweak the read-ahead: http://www.3ware.com/kb/article.aspx?id=11050 Ignore most of the rest of their advice on that page though--for example, increasing vm.dirty_background_ratio and vm.dirty_ratio is an awful idea for PostgreSQL use, where if anything you want to decrease the defaults. Also, while they claim you can connect SAS drives to these cards, they don't support sending SMART commands to them and support seemed pretty limited overall for them. Stick with plain on SATA ones. -- Greg Smith2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] DELETE performance problem
Even though the column in question is not unique on t2 could you not index it? That should improve the performance of the inline query. Are dates applicable in any way? In some cases adding a date field, partitioning or indexing on that and adding where date>x days. That can be an effective way to limit records searched. Kris On 24-Nov-09, at 9:59, "Jerry Champlin" > wrote: You may want to consider using partitioning. That way you can drop the appropriate partition and never have the overhead of a delete. Jerry Champlin|Absolute Performance Inc.|Mobile: 303-588-2547 -Original Message- From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Luca Tettamanti Sent: Tuesday, November 24, 2009 6:37 AM To: pgsql-performance@postgresql.org Subject: [PERFORM] DELETE performance problem Hello, I've run in a severe performance problem with the following statement: DELETE FROM t1 WHERE t1.annotation_id IN ( SELECT t2.annotation_id FROM t2) t1 contains about 48M record (table size is 5.8GB), while t2 contains about 60M record (total size 8.6GB). annotation_id is the PK in t1 but not in t2 (it's not even unique, in fact there are duplicates - there are about 20M distinct annotation_id in this table). There are no FKs on either tables. I've killed the query after 14h(!) of runtime... I've reproduced the problem using a only the ids (extracted from the full tables) with the following schemas: test2=# \d t1 Table "public.t1" Column | Type | Modifiers ---++--- annotation_id | bigint | not null Indexes: "t1_pkey" PRIMARY KEY, btree (annotation_id) test2=# \d t2 Table "public.t2" Column | Type | Modifiers ---++--- annotation_id | bigint | Indexes: "t2_idx" btree (annotation_id) The query above takes about 30 minutes to complete. The slowdown is not as severe, but (IMHO) the behaviour is strange. On a win2k8 with 8.3.8 using procexp I see the process churning the disk and using more memory until it hits some limit (at about 1.8GB) then the IO slows down considerably. See this screenshot[1]. This is exactly what happens with the full dataset. This is the output of the explain: test2=> explain analyze delete from t1 where annotation_id in (select annotation _id from t2); QUERY PLAN --- --- -- - Hash Join (cost=1035767.26..2158065.55 rows=181605 width=6) (actual time=64339 5.565..1832056.588 rows=26185953 loops=1) Hash Cond: (t1.annotation_id = t2.annotation_id) -> Seq Scan on t1 (cost=0.00..661734.12 rows=45874812 width=14) (actual tim e=0.291..179119.487 rows=45874812 loops=1) -> Hash (cost=1033497.20..1033497.20 rows=181605 width=8) (actual time=6433 93.742..643393.742 rows=26185953 loops=1) -> HashAggregate (cost=1031681.15..1033497.20 rows=181605 width=8) (a ctual time=571807.575..610178.552 rows=26185953 loops=1) -> Seq Scan on t2 (cost=0.00..879289.12 rows=60956812 width=8) (actual time=2460.595..480446.581 rows=60956812 loops=1) Total runtime: 2271122.474 ms (7 rows) Time: 2274723,284 ms An identital linux machine (with 8.4.1) shows the same issue; with strace I see a lots of seeks: % time seconds usecs/call callserrors syscall -- --- --- - - 90.370.155484 15 10601 read 9.100.0156495216 3 fadvise64 0.390.000668 0 5499 write 0.150.000253 0 10733 lseek 0.000.00 0 3 open 0.000.00 0 3 close 0.000.00 0 3 semop -- --- --- - - 100.000.172054 26845 total (30s sample) Before hitting the memory "limit" (AS on win2k8, unsure about Linux) the trace is the following: % time seconds usecs/call callserrors syscall -- --- --- - - 100.000.063862 0321597 read 0.000.00 0 3 lseek 0.000.00 076 mmap -- --- --- - - 100.000.063862321676 total The machines have 8 cores (2 Xeon E5320), 8GB of RAM. Postgres data directory is on hardware (Dell PERC5) raid mirror, with the log on a separate array. One machine is running linux 64bit (Debian/stable), the other win2k8 (32 bit). shared_buffers = 512MB work_mem = 512MB maintenance_work_mem = 1GB checkpoint_segments = 16 wal_buff
Re: [PERFORM] RAID card recommendation
Jochen Erwied wrote: Tuesday, November 24, 2009, 10:34:00 PM you wrote: Aberdeen is the builder I use. They'll put any card in you want (within reason) including our preference here, Areca. Perhaps you meant Areca? I knew Areca only for their internal arrays (which one of our customers uses for his 19" systems), but did not know they manufacture their own controllers. Added the ARC-1212+BBU to my wishlist :-) For what it's worth I'm using the Adaptec 5445Z on my new server (don't have Postgre running on it) and have been happy with it. For storage on my servers I use http://www.pc-pitstop.com/sas_cables_enclosures/scsase16.asp which has an Areca ARC-8020 expander in it. With the 5445Z I use the 4 internal ports for a fast RAID0 "working array" with 450G Seagate 15k6 drives and the external goes to the 16 drive enclosure through the expander. With 16 drives you have a lot of possibilities for configuring arrays. I have another server with an Adaptec 52445 (don't have Postgre running on it either) connected to two of the 16 drive enclosures and am happy with it. I'm running Postgre on my workstation that has an Adaptec 52445 hooked up to two EnhanceBox-E8MS (http://www.enhance-tech.com/products/desktop/E8_Series.html). I have 8 ST373455SS drives in my tower and 8 in the EnhanceBox so my database is running off 16 drives in RAID5. Everyone complains about RAID5 but it works for me in my situation. Very very rarely am I waiting on the disks when running queries. The other EnhanceBox has 8 ST31000640SS drives in RAID5 just for backup images. All 24 drives run off the 52445 and again, I've been satisfied with it. I've also been happy with the Enhance Technology products. Sorry for being so long but just wanted to put a plug in for the Adaptec cards and let you know about the external options. The 5 series cards are a huge improvement over the 3 series. I had a 3805 and wasn't that impressed. It's actually sitting on my shelf now collecting dust. Bob
Re: [PERFORM] RAID card recommendation
Tuesday, November 24, 2009, 10:34:00 PM you wrote: > Aberdeen is the builder I use. They'll put any card in you want > (within reason) including our preference here, Areca. Perhaps you > meant Areca? I knew Areca only for their internal arrays (which one of our customers uses for his 19" systems), but did not know they manufacture their own controllers. Added the ARC-1212+BBU to my wishlist :-) -- Jochen Erwied | home: joc...@erwied.eu +49-208-38800-18, FAX: -19 Sauerbruchstr. 17 | work: j...@mbs-software.de +49-2151-7294-24, FAX: -50 D-45470 Muelheim | mobile: jochen.erw...@vodafone.de +49-173-5404164 -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] RAID card recommendation
The problem with RAID-5 or RAID-6 is not the normal speed operation, it's the degraded performance when there is a drive failure. This includes read-only scenarios. A DB server getting any kind of real use will effectively appear to be down to client apps if it loses a drive from that RAID set. Basically, think of RAID-5/6 as RAID-0 but with much slower writes, and a way to recover the data without going to backup tapes if there is a disc loss. It is NOT a solution for staying up in case of a failure. Presumably, there is a business reason that you're thinking of using RAID-5/6 with hardware RAID and maybe a hot spare, rather than software RAID-0 which would save you 2-3 spindles of formatted capacity, plus the cost of the RAID card. Whatever that reason is, it's also a reason to use RAID-10. If you absolutely need it to fit in 2U of rack space, you can get a 2U server with a bunch of 2.5" spindles and with 24x 500GB SATA you can get the same formatted size with RAID-10; or you can use an external SAS expander to put additional 3.5" drives in another enclosure. If we're taking rackmount server RAID card votes, I've had good experiences with the LSI under Linux. Cheers Dave On Tue, Nov 24, 2009 at 11:23 AM, Matthew Wakeling wrote: > > We're about to purchase a new server to store some of our old databases, > and I was wondering if someone could advise me on a RAID card. We want to > make a 6-drive SATA RAID array out of 2TB drives, and it will be RAID 5 or 6 > because there will be zero write traffic. The priority is stuffing as much > storage into a small 2U rack as possible, with performance less important. > We will be running Debian Linux. > > People have mentioned Areca as making good RAID controllers. We're looking > at the "Areca ARC-1220 PCI-Express x8 SATA II" as a possibility. Does anyone > have an opinion on whether it is a turkey or a star? > > Another possibility is a 3-ware card of some description. > > Thanks in advance, > > Matthew > > -- > Now you see why I said that the first seven minutes of this section will > have > you looking for the nearest brick wall to beat your head against. This is > why I do it at the end of the lecture - so I can run. > -- Computer Science lecturer > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance >
Re: [PERFORM] RAID card recommendation
On Tue, Nov 24, 2009 at 1:59 PM, Jochen Erwied wrote: > Tuesday, November 24, 2009, 9:05:28 PM you wrote: > >> Have you searched the -performance archives for references to them? >> I'm not that familiar with Adaptec RAID controllers. Not requiring a >> battery check / replacement is nice. > > Either I searched for the wrong terms, or there isn't really that much > reference on RAID-controllers on this list. Aberdeen is menthioned once and > looks interesting, but I didn't find a reseller in Germany. As far as I see > from the list, Promise and Adaptec both seem to be not too bad choices. Aberdeen is the builder I use. They'll put any card in you want (within reason) including our preference here, Areca. Perhaps you meant Areca? >> So, assuming this means an 8 hour work day for ~20M rows, you're >> looking at around 700 per second. > > It's an automated application running 24/7, so I require 'only' about > 200-250 updates per second. Oh, much better. A decent hardware RAID controller with battery backed cache could handle that load with a pair of spinning 15k drives in RAID-1 probably. >> Another option might be a JBOD box attached to the machine that holds >> 12 or so 2.5" 15k like the hitachi ultrastar 147G 2.5" drives. This >> sounds like a problem you need to be able to throw a lot of drives at >> at one time. Is it likely to grow much after this? > > JBOD in an external casing would be an alternative, especially when using > an external case. And no, the database will not grow too much after > reaching its final size. Yeah, if it's not gonna grow a lot more after the 2B rows, then you probably won't need an external case. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] RAID card recommendation
- "Scott Marlowe" escreveu: > On Tue, Nov 24, 2009 at 1:37 PM, Ing. Marcos Ortiz Valmaseda > wrote: > > Do you expose that performance issued caused by RAID 5? Because this > is one > > of our solutions here on my country to save the data of our > PostgreSQL > > database. Which model do you recommend ? RAID 0,RAID 1, RAID 5 or > RAID 10? > > RAID-1 or RAID-10 are the default, mostly safe choices. > > For disposable dbs, RAID-0 is fine. > > For very large dbs with very little writing and mostly reading and on > a budget, RAID-6 is ok. > > In most instances I never recommend RAID-5 anymore. I would never recommend RAID-5 for database customers (any database system), some of the current ones are using it and the worst nightmares in disk performance are related to RAID-5. As Scott said, RAID-1 is safe, RAID-0 is fast (and accept more request load too), RAID-10 is a great combination of both worlds. Flavio Henrique A. Gurgel Consultor -- 4Linux tel. 55-11-2125.4765 fax. 55-11-2125.4777 www.4linux.com.br -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] RAID card recommendation
Tuesday, November 24, 2009, 9:05:28 PM you wrote: > Have you searched the -performance archives for references to them? > I'm not that familiar with Adaptec RAID controllers. Not requiring a > battery check / replacement is nice. Either I searched for the wrong terms, or there isn't really that much reference on RAID-controllers on this list. Aberdeen is menthioned once and looks interesting, but I didn't find a reseller in Germany. As far as I see from the list, Promise and Adaptec both seem to be not too bad choices. > So, you're willing (or forced by economics) to suffer downtime due to > drive failure every so often. I haven't experienced any downtime due to a disk failure for quite a while now (call me lucky), although I had a really catastrophic experience with a RAID-5 some time ago (1 drive crashed, the second one during rebuild :-() But for this application losing one day of updates is not a big deal, and downtime isn't either. It's a long running project of mine, with growing storage needs, but not with 100% of integrity or uptime. > So, assuming this means an 8 hour work day for ~20M rows, you're > looking at around 700 per second. It's an automated application running 24/7, so I require 'only' about 200-250 updates per second. > I'd definitely test the heck out of whatever RAID card you're buying > to make sure it performs well enough. For some loads and against some > HW RAID cards, SW RAID might be the winner. Well, I haven't got so much opportunities to test out different kind of hardware, so I have to rely on experience or reports. > Another option might be a JBOD box attached to the machine that holds > 12 or so 2.5" 15k like the hitachi ultrastar 147G 2.5" drives. This > sounds like a problem you need to be able to throw a lot of drives at > at one time. Is it likely to grow much after this? JBOD in an external casing would be an alternative, especially when using an external case. And no, the database will not grow too much after reaching its final size. But looking at the prices for anything larger than 4+1 drives in an external casing is not funny at all :-( -- Jochen Erwied | home: joc...@erwied.eu +49-208-38800-18, FAX: -19 Sauerbruchstr. 17 | work: j...@mbs-software.de +49-2151-7294-24, FAX: -50 D-45470 Muelheim | mobile: jochen.erw...@vodafone.de +49-173-5404164 -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] RAID card recommendation
On Tue, Nov 24, 2009 at 1:37 PM, Ing. Marcos Ortiz Valmaseda wrote: > Do you expose that performance issued caused by RAID 5? Because this is one > of our solutions here on my country to save the data of our PostgreSQL > database. Which model do you recommend ? RAID 0,RAID 1, RAID 5 or RAID 10? RAID-1 or RAID-10 are the default, mostly safe choices. For disposable dbs, RAID-0 is fine. For very large dbs with very little writing and mostly reading and on a budget, RAID-6 is ok. In most instances I never recommend RAID-5 anymore. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] RAID card recommendation
Gurgel, Flavio escribió: - "Richard Neill" escreveu: Matthew Wakeling wrote: We're about to purchase a new server to store some of our old databases, and I was wondering if someone could advise me on a RAID card. We want to make a 6-drive SATA RAID array out of 2TB drives, and it will be RAID 5 or 6 because there will be zero write traffic. The priority is stuffing as much storage into a small 2U rack as possible, with performance less important. We will be running Debian Linux. People have mentioned Areca as making good RAID controllers. We're looking at the "Areca ARC-1220 PCI-Express x8 SATA II" as a possibility. Does anyone have an opinion on whether it is a turkey or a star? Another possibility is a 3-ware card of some description. Do you actually need a RAID card at all? It's just another point of failure: the Linux software raid (mdadm) is pretty good. Also, be very wary of RAID5 for an array that size. It is highly probable that, if one disk has failed, then during the recovery process, you may lose a second disk. The unrecoverable error rate on standard disks is about 1 in 10^14 bits; your disk array is 10^11 bits in size... We got bitten by this Richard Linux kernel software RAID is fully supported in Debian Lenny, is quite cheap to implement and powerful. I would avoid SATA disks but it's just me. SAS controllers and disks are expensive but worth every penny spent on them. Prefer RAID 1+0 over RAID 5 not only because of the risk of failure of a second disk, but I have 3 cases of performance issues caused by RAID 5. It's said that performance is not the problem but think twice because a good application tends to scale fast to several users. Of course, keep a good continuous backup strategy of your databases and don't trust just the mirroring of disks in a RAID fashion. Flavio Henrique A. Gurgel Consultor -- 4Linux tel. 55-11-2125.4765 fax. 55-11-2125.4777 www.4linux.com.br Do you expose that performance issued caused by RAID 5? Because this is one of our solutions here on my country to save the data of our PostgreSQL database. Which model do you recommend ? RAID 0,RAID 1, RAID 5 or RAID 10? -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] [GENERAL] Strange performance degradation
Lorenzo Allegrucci escribió: Matthew Wakeling wrote: On Mon, 23 Nov 2009, Lorenzo Allegrucci wrote: Anyway, how can I get rid those "idle in transaction" processes? Can I just kill -15 them or is there a less drastic way to do it? Are you crazy? Sure, if you want to destroy all of the changes made to the database in that transaction and thoroughly confuse the client application, you can send a TERM signal to a backend, but the consequences to your data are on your own head. I'm not crazy, it was just a question.. Anyway, problem solved in the Django application. Matthew replied to you of that way because this is not a good manner to do this, not fot thr fact that you are crazy. You can find better ways to do this. Regards -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] RAID card recommendation
On Tue, Nov 24, 2009 at 12:28 PM, Jochen Erwied wrote: > > Since I'm currently looking at upgrading my own database server, maybe some > of the experts can give a comment on one of the following controllers: > > - Promise Technology Supertrak ES4650 + additional BBU > - Adaptec RAID 5405 SGL/256 SATA/SAS + additional BBU > - Adaptec RAID 5405Z SGL/512 SATA/SAS > > My personal favourite currently is the 5405Z, since it does not require > regular battery replacements and because it has 512MB of cache. Have you searched the -performance archives for references to them? I'm not that familiar with Adaptec RAID controllers. Not requiring a battery check / replacement is nice. > Since my server only has room for four disks, I'd choose the following > one: > > - Seagate Cheetah 15K.6 147GB SAS We use the older gen 15k.5 and have been very happy with them. Nowadays it seems the fastest Seagates and Hitachis own the market for super fast drives. > Drives would be organized as RAID-0 for fast access, I do not need > terabytes of storage. So, you're willing (or forced by economics) to suffer downtime due to drive failure every so often. > The database currently is about 150 GB in size (including indexes), the > main table having a bit less than 1 billion rows (maximum will be about 2 > billion) and getting about 10-20 million updates per day, so update speed > is critical. So, assuming this means an 8 hour work day for ~20M rows, you're looking at around 700 per second. > Currently the database is running on a mdadm raid-0 with four S-ATA drives > (7.2k rpm), which was ok when the database was half this size... > > Operating System is Gentoo Linux 2.6.31-r1 on a Fujitsu Siemens Primergy > 200 S2 (2xXEON @ 1.6 GHz) with 4 GB of RAM (which also would be increased > to its maximum of 8 GB during the above update) I'd definitely test the heck out of whatever RAID card you're buying to make sure it performs well enough. For some loads and against some HW RAID cards, SW RAID might be the winner. Another option might be a JBOD box attached to the machine that holds 12 or so 2.5" 15k like the hitachi ultrastar 147G 2.5" drives. This sounds like a problem you need to be able to throw a lot of drives at at one time. Is it likely to grow much after this? -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] RAID card recommendation
Since I'm currently looking at upgrading my own database server, maybe some of the experts can give a comment on one of the following controllers: - Promise Technology Supertrak ES4650 + additional BBU - Adaptec RAID 5405 SGL/256 SATA/SAS + additional BBU - Adaptec RAID 5405Z SGL/512 SATA/SAS My personal favourite currently is the 5405Z, since it does not require regular battery replacements and because it has 512MB of cache. Since my server only has room for four disks, I'd choose the following one: - Seagate Cheetah 15K.6 147GB SAS Drives would be organized as RAID-0 for fast access, I do not need terabytes of storage. The database currently is about 150 GB in size (including indexes), the main table having a bit less than 1 billion rows (maximum will be about 2 billion) and getting about 10-20 million updates per day, so update speed is critical. Currently the database is running on a mdadm raid-0 with four S-ATA drives (7.2k rpm), which was ok when the database was half this size... Operating System is Gentoo Linux 2.6.31-r1 on a Fujitsu Siemens Primergy 200 S2 (2xXEON @ 1.6 GHz) with 4 GB of RAM (which also would be increased to its maximum of 8 GB during the above update) -- Jochen Erwied | home: joc...@erwied.eu +49-208-38800-18, FAX: -19 Sauerbruchstr. 17 | work: j...@mbs-software.de +49-2151-7294-24, FAX: -50 D-45470 Muelheim | mobile: jochen.erw...@vodafone.de +49-173-5404164 -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] RAID card recommendation
On Tue, Nov 24, 2009 at 10:23 AM, Matthew Wakeling wrote: > > We're about to purchase a new server to store some of our old databases, and > I was wondering if someone could advise me on a RAID card. We want to make a > 6-drive SATA RAID array out of 2TB drives, and it will be RAID 5 or 6 > because there will be zero write traffic. The priority is stuffing as much > storage into a small 2U rack as possible, with performance less important. > We will be running Debian Linux. > > People have mentioned Areca as making good RAID controllers. We're looking > at the "Areca ARC-1220 PCI-Express x8 SATA II" as a possibility. Does anyone > have an opinion on whether it is a turkey or a star? We run a 12xx series on our office server in RAID-6 over 8 1TB 7200RPM server class SATA drives. Our production server runs the 1680 on top of 16 15k5 seagates in RAID-10. The performance difference between these two are enormous. Things that take minutes on the production server can take hours on the office server. Production handles 1.5Million users, office handles 20 or 30 users. I've been really happy with the reliability of the 12xx card here at work. 100% uptime for a year, that machine goes down for kernel updates and only that. But it's not worked that hard all day everyday, so I can't compare its reliability with production in RAID-10 which has had one drive fail the week it was delivered and none since in 400+days. We have two hot spares there. > Another possibility is a 3-ware card of some description. They get good reviews as well. Both manufacturers have their "star" performers, and their "utility" or work group class controllers. For what you're doing the areca 12xx or 3ware 95xx series should do fine. As far as drives go we've been really happy with WD of late, they make large enterprise class SATA drives that don't pull a lot of power (green series) and fast SATA drives that pull a bit more but are faster (black series). We've used both and are quite happy with each. We use a pair of blacks to build slony read slaves and they're very fast, with write speeds of ~100MB/second and read speeds double that in linux under sw RAID-1 -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] RAID card recommendation
- "Richard Neill" escreveu: > Matthew Wakeling wrote: > > > > We're about to purchase a new server to store some of our old > databases, > > and I was wondering if someone could advise me on a RAID card. We > want > > to make a 6-drive SATA RAID array out of 2TB drives, and it will be > RAID > > 5 or 6 because there will be zero write traffic. The priority is > > stuffing as much storage into a small 2U rack as possible, with > > performance less important. We will be running Debian Linux. > > > > People have mentioned Areca as making good RAID controllers. We're > > looking at the "Areca ARC-1220 PCI-Express x8 SATA II" as a > possibility. > > Does anyone have an opinion on whether it is a turkey or a star? > > > > Another possibility is a 3-ware card of some description. > > > > Do you actually need a RAID card at all? It's just another point of > failure: the Linux software raid (mdadm) is pretty good. > > Also, be very wary of RAID5 for an array that size. It is highly > probable that, if one disk has failed, then during the recovery > process, > you may lose a second disk. The unrecoverable error rate on standard > disks is about 1 in 10^14 bits; your disk array is 10^11 bits in > size... > > We got bitten by this > > Richard Linux kernel software RAID is fully supported in Debian Lenny, is quite cheap to implement and powerful. I would avoid SATA disks but it's just me. SAS controllers and disks are expensive but worth every penny spent on them. Prefer RAID 1+0 over RAID 5 not only because of the risk of failure of a second disk, but I have 3 cases of performance issues caused by RAID 5. It's said that performance is not the problem but think twice because a good application tends to scale fast to several users. Of course, keep a good continuous backup strategy of your databases and don't trust just the mirroring of disks in a RAID fashion. Flavio Henrique A. Gurgel Consultor -- 4Linux tel. 55-11-2125.4765 fax. 55-11-2125.4777 www.4linux.com.br -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] RAID card recommendation
On Nov 24, 2009, at 9:23 AM, Matthew Wakeling wrote: We're about to purchase a new server to store some of our old databases, and I was wondering if someone could advise me on a RAID card. We want to make a 6-drive SATA RAID array out of 2TB drives, and it will be RAID 5 or 6 because there will be zero write traffic. The priority is stuffing as much storage into a small 2U rack as possible, with performance less important. We will be running Debian Linux. People have mentioned Areca as making good RAID controllers. We're looking at the "Areca ARC-1220 PCI-Express x8 SATA II" as a possibility. Does anyone have an opinion on whether it is a turkey or a star? We've used that card and have been quite happy with it. Looking through the release notes for firmware upgrades can be pretty worrying ("you needed to fix what?!"), but we never experienced any problems ourselves, and its not like 3ware release notes are any different. But the main benefits of a RAID card are a write cache and easy hot swap. It sounds like you don't need a write cache. Can you be happy with the kernel's hotswap ability?
Re: [PERFORM] RAID card recommendation
Matthew Wakeling wrote: We're about to purchase a new server to store some of our old databases, and I was wondering if someone could advise me on a RAID card. We want to make a 6-drive SATA RAID array out of 2TB drives, and it will be RAID 5 or 6 because there will be zero write traffic. The priority is stuffing as much storage into a small 2U rack as possible, with performance less important. We will be running Debian Linux. People have mentioned Areca as making good RAID controllers. We're looking at the "Areca ARC-1220 PCI-Express x8 SATA II" as a possibility. Does anyone have an opinion on whether it is a turkey or a star? Another possibility is a 3-ware card of some description. Do you actually need a RAID card at all? It's just another point of failure: the Linux software raid (mdadm) is pretty good. Also, be very wary of RAID5 for an array that size. It is highly probable that, if one disk has failed, then during the recovery process, you may lose a second disk. The unrecoverable error rate on standard disks is about 1 in 10^14 bits; your disk array is 10^11 bits in size... We got bitten by this Richard -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] RAID card recommendation
We're about to purchase a new server to store some of our old databases, and I was wondering if someone could advise me on a RAID card. We want to make a 6-drive SATA RAID array out of 2TB drives, and it will be RAID 5 or 6 because there will be zero write traffic. The priority is stuffing as much storage into a small 2U rack as possible, with performance less important. We will be running Debian Linux. People have mentioned Areca as making good RAID controllers. We're looking at the "Areca ARC-1220 PCI-Express x8 SATA II" as a possibility. Does anyone have an opinion on whether it is a turkey or a star? Another possibility is a 3-ware card of some description. Thanks in advance, Matthew -- Now you see why I said that the first seven minutes of this section will have you looking for the nearest brick wall to beat your head against. This is why I do it at the end of the lecture - so I can run. -- Computer Science lecturer -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] DELETE performance problem
On Tuesday 24 November 2009, Thom Brown wrote: > > It's a shame there isn't a LIMIT option on DELETE so this can be done in > small batches. delete from table where pk in (select pk from table where delete_condition limit X); -- "No animals were harmed in the recording of this episode. We tried but that damn monkey was just too fast." -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Strange performance degradation
On Tue, 24 Nov 2009, Denis Lussier wrote: IMHO the client application is already confused and it's in Prod. Shouldn't he perhaps terminate/abort the IDLE connections in Prod and work on correcting the problem so it doesn't occur in Dev/Test?? The problem is, the connection isn't just IDLE - it is idle IN TRANSACTION. This means that there is quite possibly some data that has been modified in that transaction. If you kill the backend, then that will automatically roll back the transaction, and all of those changes would be lost. I agree that correcting the problem in dev/test is the priority, but I would be very cautious about killing transactions in production. You don't know what data is uncommitted. The safest thing to do may be to bounce the application, rather than Postgres. Matthew -- All of this sounds mildly turgid and messy and confusing... but what the heck. That's what programming's all about, really -- Computer Science Lecturer -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] DELETE performance problem
On Tue, Nov 24, 2009 at 3:19 PM, Thom Brown wrote: > 2009/11/24 Luca Tettamanti > > On Tue, Nov 24, 2009 at 3:59 PM, Jerry Champlin >> wrote: >> > You may want to consider using partitioning. That way you can drop the >> > appropriate partition and never have the overhead of a delete. >> >> Hum, I don't think it's doable in my case; the partitioning is not >> know a priori. First t1 is fully populated, then the data is loaded >> and manipulated by my application, the result is stored in t2; only >> then I want to remove (part of) the data from t1. >> >> thanks, >> Luca >> >> > It's a shame there isn't a LIMIT option on DELETE so this can be done in > small batches. > you sort of can do it, using PK on table as pointer. DELETE FROM foo USING ... etc. with subquery in using that will limit number of rows ;) > > Thom > -- GJ
Re: [PERFORM] [GENERAL] Strange performance degradation
Matthew Wakeling wrote: On Mon, 23 Nov 2009, Lorenzo Allegrucci wrote: Anyway, how can I get rid those "idle in transaction" processes? Can I just kill -15 them or is there a less drastic way to do it? Are you crazy? Sure, if you want to destroy all of the changes made to the database in that transaction and thoroughly confuse the client application, you can send a TERM signal to a backend, but the consequences to your data are on your own head. I'm not crazy, it was just a question.. Anyway, problem solved in the Django application. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] DELETE performance problem
You may want to consider using partitioning. That way you can drop the appropriate partition and never have the overhead of a delete. Jerry Champlin|Absolute Performance Inc.|Mobile: 303-588-2547 -Original Message- From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Luca Tettamanti Sent: Tuesday, November 24, 2009 6:37 AM To: pgsql-performance@postgresql.org Subject: [PERFORM] DELETE performance problem Hello, I've run in a severe performance problem with the following statement: DELETE FROM t1 WHERE t1.annotation_id IN ( SELECT t2.annotation_id FROM t2) t1 contains about 48M record (table size is 5.8GB), while t2 contains about 60M record (total size 8.6GB). annotation_id is the PK in t1 but not in t2 (it's not even unique, in fact there are duplicates - there are about 20M distinct annotation_id in this table). There are no FKs on either tables. I've killed the query after 14h(!) of runtime... I've reproduced the problem using a only the ids (extracted from the full tables) with the following schemas: test2=# \d t1 Table "public.t1" Column | Type | Modifiers ---++--- annotation_id | bigint | not null Indexes: "t1_pkey" PRIMARY KEY, btree (annotation_id) test2=# \d t2 Table "public.t2" Column | Type | Modifiers ---++--- annotation_id | bigint | Indexes: "t2_idx" btree (annotation_id) The query above takes about 30 minutes to complete. The slowdown is not as severe, but (IMHO) the behaviour is strange. On a win2k8 with 8.3.8 using procexp I see the process churning the disk and using more memory until it hits some limit (at about 1.8GB) then the IO slows down considerably. See this screenshot[1]. This is exactly what happens with the full dataset. This is the output of the explain: test2=> explain analyze delete from t1 where annotation_id in (select annotation _id from t2); QUERY PLAN - Hash Join (cost=1035767.26..2158065.55 rows=181605 width=6) (actual time=64339 5.565..1832056.588 rows=26185953 loops=1) Hash Cond: (t1.annotation_id = t2.annotation_id) -> Seq Scan on t1 (cost=0.00..661734.12 rows=45874812 width=14) (actual tim e=0.291..179119.487 rows=45874812 loops=1) -> Hash (cost=1033497.20..1033497.20 rows=181605 width=8) (actual time=6433 93.742..643393.742 rows=26185953 loops=1) -> HashAggregate (cost=1031681.15..1033497.20 rows=181605 width=8) (a ctual time=571807.575..610178.552 rows=26185953 loops=1) -> Seq Scan on t2 (cost=0.00..879289.12 rows=60956812 width=8) (actual time=2460.595..480446.581 rows=60956812 loops=1) Total runtime: 2271122.474 ms (7 rows) Time: 2274723,284 ms An identital linux machine (with 8.4.1) shows the same issue; with strace I see a lots of seeks: % time seconds usecs/call callserrors syscall -- --- --- - - 90.370.155484 15 10601 read 9.100.0156495216 3 fadvise64 0.390.000668 0 5499 write 0.150.000253 0 10733 lseek 0.000.00 0 3 open 0.000.00 0 3 close 0.000.00 0 3 semop -- --- --- - - 100.000.172054 26845 total (30s sample) Before hitting the memory "limit" (AS on win2k8, unsure about Linux) the trace is the following: % time seconds usecs/call callserrors syscall -- --- --- - - 100.000.063862 0321597 read 0.000.00 0 3 lseek 0.000.00 076 mmap -- --- --- - - 100.000.063862321676 total The machines have 8 cores (2 Xeon E5320), 8GB of RAM. Postgres data directory is on hardware (Dell PERC5) raid mirror, with the log on a separate array. One machine is running linux 64bit (Debian/stable), the other win2k8 (32 bit). shared_buffers = 512MB work_mem = 512MB maintenance_work_mem = 1GB checkpoint_segments = 16 wal_buffers = 8MB fsync = off # Just in case... usually it's enabled effective_cache_size = 4096MB (the machine with win2k8 is running with a smaller shared_buffers - 16MB) Any idea on what's going wrong here? thanks, Luca [1] http://img10.imageshack.us/i/psql2.png/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mai
Re: [PERFORM] DELETE performance problem
2009/11/24 Luca Tettamanti > On Tue, Nov 24, 2009 at 3:59 PM, Jerry Champlin > wrote: > > You may want to consider using partitioning. That way you can drop the > > appropriate partition and never have the overhead of a delete. > > Hum, I don't think it's doable in my case; the partitioning is not > know a priori. First t1 is fully populated, then the data is loaded > and manipulated by my application, the result is stored in t2; only > then I want to remove (part of) the data from t1. > > thanks, > Luca > > It's a shame there isn't a LIMIT option on DELETE so this can be done in small batches. Thom
Re: [PERFORM] DELETE performance problem
On Tue, Nov 24, 2009 at 3:59 PM, Jerry Champlin wrote: > You may want to consider using partitioning. That way you can drop the > appropriate partition and never have the overhead of a delete. Hum, I don't think it's doable in my case; the partitioning is not know a priori. First t1 is fully populated, then the data is loaded and manipulated by my application, the result is stored in t2; only then I want to remove (part of) the data from t1. thanks, Luca -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] DELETE performance problem
Hello, I've run in a severe performance problem with the following statement: DELETE FROM t1 WHERE t1.annotation_id IN ( SELECT t2.annotation_id FROM t2) t1 contains about 48M record (table size is 5.8GB), while t2 contains about 60M record (total size 8.6GB). annotation_id is the PK in t1 but not in t2 (it's not even unique, in fact there are duplicates - there are about 20M distinct annotation_id in this table). There are no FKs on either tables. I've killed the query after 14h(!) of runtime... I've reproduced the problem using a only the ids (extracted from the full tables) with the following schemas: test2=# \d t1 Table "public.t1" Column | Type | Modifiers ---++--- annotation_id | bigint | not null Indexes: "t1_pkey" PRIMARY KEY, btree (annotation_id) test2=# \d t2 Table "public.t2" Column | Type | Modifiers ---++--- annotation_id | bigint | Indexes: "t2_idx" btree (annotation_id) The query above takes about 30 minutes to complete. The slowdown is not as severe, but (IMHO) the behaviour is strange. On a win2k8 with 8.3.8 using procexp I see the process churning the disk and using more memory until it hits some limit (at about 1.8GB) then the IO slows down considerably. See this screenshot[1]. This is exactly what happens with the full dataset. This is the output of the explain: test2=> explain analyze delete from t1 where annotation_id in (select annotation _id from t2); QUERY PLAN - Hash Join (cost=1035767.26..2158065.55 rows=181605 width=6) (actual time=64339 5.565..1832056.588 rows=26185953 loops=1) Hash Cond: (t1.annotation_id = t2.annotation_id) -> Seq Scan on t1 (cost=0.00..661734.12 rows=45874812 width=14) (actual tim e=0.291..179119.487 rows=45874812 loops=1) -> Hash (cost=1033497.20..1033497.20 rows=181605 width=8) (actual time=6433 93.742..643393.742 rows=26185953 loops=1) -> HashAggregate (cost=1031681.15..1033497.20 rows=181605 width=8) (a ctual time=571807.575..610178.552 rows=26185953 loops=1) -> Seq Scan on t2 (cost=0.00..879289.12 rows=60956812 width=8) (actual time=2460.595..480446.581 rows=60956812 loops=1) Total runtime: 2271122.474 ms (7 rows) Time: 2274723,284 ms An identital linux machine (with 8.4.1) shows the same issue; with strace I see a lots of seeks: % time seconds usecs/call callserrors syscall -- --- --- - - 90.370.155484 15 10601 read 9.100.0156495216 3 fadvise64 0.390.000668 0 5499 write 0.150.000253 0 10733 lseek 0.000.00 0 3 open 0.000.00 0 3 close 0.000.00 0 3 semop -- --- --- - - 100.000.172054 26845 total (30s sample) Before hitting the memory "limit" (AS on win2k8, unsure about Linux) the trace is the following: % time seconds usecs/call callserrors syscall -- --- --- - - 100.000.063862 0321597 read 0.000.00 0 3 lseek 0.000.00 076 mmap -- --- --- - - 100.000.063862321676 total The machines have 8 cores (2 Xeon E5320), 8GB of RAM. Postgres data directory is on hardware (Dell PERC5) raid mirror, with the log on a separate array. One machine is running linux 64bit (Debian/stable), the other win2k8 (32 bit). shared_buffers = 512MB work_mem = 512MB maintenance_work_mem = 1GB checkpoint_segments = 16 wal_buffers = 8MB fsync = off # Just in case... usually it's enabled effective_cache_size = 4096MB (the machine with win2k8 is running with a smaller shared_buffers - 16MB) Any idea on what's going wrong here? thanks, Luca [1] http://img10.imageshack.us/i/psql2.png/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Dynamic sql example
2009/11/24 ramasubramanian : > Dear All. > Can any one give me dynamic sql in postgres stored procedure using > "USING CLAUSE" CREATE TABLE tab(a integer); CREATE OR REPLACE FUNCTION foo(_a integer) RETURNS void AS $$ DECLARE r record; BEGIN FOR r IN EXECUTE 'SELECT * FROM tab WHERE a = $1' USING _a LOOP RAISE NOTICE '%', r.a; END LOOP; END; $$ LANGUAGE plpgsql; regards Pavel Stehule > Regards, > Ram -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Dynamic sql example
Dear All. Can any one give me dynamic sql in postgres stored procedure using "USING CLAUSE" Regards, Ram
Re: [PERFORM] [GENERAL] Strange performance degradation
On Mon, 23 Nov 2009, Lorenzo Allegrucci wrote: Anyway, how can I get rid those "idle in transaction" processes? Can I just kill -15 them or is there a less drastic way to do it? Are you crazy? Sure, if you want to destroy all of the changes made to the database in that transaction and thoroughly confuse the client application, you can send a TERM signal to a backend, but the consequences to your data are on your own head. Fix the application, don't tell Postgres to stop being a decent database. Matthew -- I would like to think that in this day and age people would know better than to open executables in an e-mail. I'd also like to be able to flap my arms and fly to the moon.-- Tim Mullen -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance