Re: [PERFORM] SSD + RAID
Laszlo Nagy wrote:
> Hello,
>
> I'm about to buy SSD drive(s) for a database. For decision making, I
> used this tech report:
>
> http://techreport.com/articles.x/16255/9
> http://techreport.com/articles.x/16255/10
>
> Here are my concerns:
>
> * I need at least 32GB disk space, so a DRAM based SSD is not a real
>   option. I would have to buy 8x4GB memory, which costs a fortune.
>   And then it would still not have redundancy.
>
> * I could buy two X25-E drives and have 32GB disk space, and some
>   redundancy. This would cost about $1600, not counting the RAID
>   controller. It is on the edge.

This was the solution I went with (4 drives in a RAID 10, actually). Not a cheap solution, but the performance is amazing.

> * I could also buy many cheaper MLC SSD drives. They cost about $140
>   each, so even with 10 drives I'm at $1400. I could put them in
>   RAID6, have much more disk space (256GB), high redundancy, and
>   POSSIBLY good read/write speed. Of course then I need to buy a good
>   RAID controller.
>
> My question is about the last option. Are there any good RAID cards
> that are optimized (or can be optimized) for SSD drives? Do any of you
> have experience in using many cheaper SSD drives? Is it a bad idea?
>
> Thank you,
>
> Laszlo
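For reference, the 256GB figure is just the standard RAID6 usable capacity, assuming 32GB per MLC drive (two drives' worth of space goes to parity):

    usable = (N - 2) x drive size = (10 - 2) x 32GB = 256GB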
Re: [PERFORM] Shouldn't the planner have a higher cost for reverse index scans?
Josh Berkus wrote:
> Tom,
>
>> Right, because they do. If you think otherwise, demonstrate it.
>> (bonnie tests approximating a reverse seqscan are not relevant to
>> the performance of indexscans.)
>
> Working on it. I *think* I've seen this issue in the field, which is
> why I brought it up in the first place, but getting a good test case
> is, of course, difficult.

I think I may be experiencing this situation now. The query

    select comment_date
    from user_comments
    where user_comments.uid=1
    order by comment_date desc
    limit 1

Explain:

    Limit (cost=0.00..2699.07 rows=1 width=8) (actual time=52848.785..52848.787 rows=1 loops=1)
      -> Index Scan Backward using idx_user_comments_comment_date on user_comments (cost=0.00..5789515.40 rows=2145 width=8) (actual time=52848.781..52848.781 rows=1 loops=1)
           Filter: (uid = 1)
    Total runtime: 52848.840 ms

takes tens of seconds to complete (52 sec on the last run). However

    select comment_date
    from user_comments
    where user_comments.uid=1
    order by comment_date
    limit 1

Explain:

    Limit (cost=0.00..2699.07 rows=1 width=8) (actual time=70.402..70.403 rows=1 loops=1)
      -> Index Scan using idx_user_comments_comment_date on user_comments (cost=0.00..5789515.40 rows=2145 width=8) (actual time=70.398..70.398 rows=1 loops=1)
           Filter: (uid = 1)
    Total runtime: 70.453 ms

takes well under 1 sec.

comment_date is a timestamp with time zone and has the index

    CREATE INDEX idx_user_comments_comment_date
        ON user_comments USING btree (comment_date);

I don't understand why it is so much slower to scan it in reverse. It's a fairly big table: about 4.4 million rows, 888MB. That index is 96MB. I tried dropping and recreating the index, but it doesn't seem to have helped any.

Can I create a reverse index on the dates so it can do a forward scan of the reverse index?
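On that last question: a descending index is the literal version of the idea. A minimal sketch, assuming PostgreSQL 8.3 or later (which added ASC/DESC index columns); the index name is illustrative:

    -- A btree stored in descending comment_date order, so that
    -- "ORDER BY comment_date DESC" can use a forward scan.
    CREATE INDEX idx_user_comments_comment_date_desc
        ON user_comments (comment_date DESC);

Note, though, that per Tom's reply below, the real cost here is how many non-matching rows get skipped before the first uid = 1 row is found, which a descending single-column index alone wouldn't change.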
Re: [PERFORM] Shouldn't the planner have a higher cost for reverse index scans?
Tom Lane wrote:
> Lists <li...@on-track.ca> writes:
>> The query
>>
>>     select comment_date
>>     from user_comments
>>     where user_comments.uid=1
>>     order by comment_date desc
>>     limit 1
>>
>> Explain:
>>
>>     Limit (cost=0.00..2699.07 rows=1 width=8) (actual time=52848.785..52848.787 rows=1 loops=1)
>>       -> Index Scan Backward using idx_user_comments_comment_date on user_comments (cost=0.00..5789515.40 rows=2145 width=8) (actual time=52848.781..52848.781 rows=1 loops=1)
>>            Filter: (uid = 1)
>>     Total runtime: 52848.840 ms
>>
>> takes tens of seconds to complete (52 sec on the last run). However
>>
>>     select comment_date
>>     from user_comments
>>     where user_comments.uid=1
>>     order by comment_date
>>     limit 1
>>
>> Explain:
>>
>>     Limit (cost=0.00..2699.07 rows=1 width=8) (actual time=70.402..70.403 rows=1 loops=1)
>>       -> Index Scan using idx_user_comments_comment_date on user_comments (cost=0.00..5789515.40 rows=2145 width=8) (actual time=70.398..70.398 rows=1 loops=1)
>>            Filter: (uid = 1)
>>     Total runtime: 70.453 ms
>>
>> takes well under 1 sec.
>
> AFAICS this is pure chance --- it is based on when we happen to hit
> the first row with uid = 1 while scanning in forward or reverse
> comment_date order. Unless you have evidence that the number of rows
> skipped over is similar in both cases, there is no reason to suppose
> that this example bears on Josh's concern.
>
> As noted by Merlin, if you're willing to create another index to help
> this type of query, then a two-column index on (uid, comment_date)
> would be ideal.
>
> regards, tom lane

Thank you Tom and Merlin (and Grzegorz for the answer to my other question, which I no longer need). The composite index seems to do the trick; the reverse index scan is now taking about the same time. Rows with uid=1 should be spread throughout the table, but there should be a larger number earlier in the table (based on insert order).

I already had a separate index on uid:

    CREATE INDEX idx_user_comments_uid
        ON user_comments USING btree (uid);

Under the circumstances, shouldn't a bitmap AND of those two indexes be far faster than using just the date index (compared to the old plan, not the new composite index)? Why would the planner not choose that plan?
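For reference, a minimal sketch of the two-column index Tom and Merlin suggested (the index name is illustrative):

    -- With (uid, comment_date), the planner can descend directly to
    -- the uid = 1 range and read the first or last comment_date from
    -- either end of it, instead of skipping non-matching rows.
    CREATE INDEX idx_user_comments_uid_comment_date
        ON user_comments USING btree (uid, comment_date);

    -- Both orderings now become short index scans:
    SELECT comment_date FROM user_comments
     WHERE uid = 1 ORDER BY comment_date DESC LIMIT 1;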
Re: [PERFORM] Best replication solution?
I'm currently running 32-bit FreeBSD, so I can't really add more RAM (PAE doesn't work well under FreeBSD from what I've read), and there are enough writes that more RAM won't solve the problem completely. However, I will add plenty more RAM next time I rebuild it.

Heikki Linnakangas wrote:
> Lists wrote:
>> Server is a dual core xeon, 3GB RAM and 2 mirrors of 15k SAS drives
>> (1 for most data, 1 for WAL and a few tables and indexes). In total,
>> all databases on the server are about 10G on disk (about 2GB in
>> pg_dump format).
>
> I'd suggest buying as much RAM as you can fit into the server. RAM is
> cheap, and with a database of that size more cache could have a
> dramatic effect.
Re: [PERFORM] Best replication solution?
Andrew Sullivan wrote:
> On Sun, Apr 05, 2009 at 11:36:33AM -0700, Lists wrote:
>> *Slony-I* - I've used this in the past, but it's a huge pain to work
>> with, caused serious performance issues under heavy load due to long
>> running transactions (may not be the case anymore, it's been a while
>> since I used it on a large database with many writes), and doesn't
>> seem very reliable (I've had replication break on me multiple times).
>
> It is indeed a pain to work with, but I find it hard to believe that
> it is the actual source of performance issues. What's more likely
> true is that it wasn't tuned to your write load -- that _will_ cause
> performance issues.

Can you point me in the direction of the documentation for tuning it? I don't see anything in the documentation about tuning for write load. Of course, tuning it is a major pain, as mentioned.

> I'm also somewhat puzzled by the claim of unreliability: most of the
> actual replication failures I've ever seen under Slony are due to
> operator error (these are trivial to induce, alas -- aforementioned
> pain to work with again).

Recently I had a problem with duplicate key errors on the slave, which shouldn't be possible since the keys are the same. I've just noticed in the documentation (the "Duplicate Key Violation" FAQ entry, http://www.slony.info/documentation/faq.html#DUPKEY) that the bug "has helped track down a number of rather obscure PostgreSQL race conditions, so that in modern versions of Slony-I and PostgreSQL, there should be little to worry about", so that may no longer be an issue. However, I experienced this with the latest Slony (as of late last year) and PostgreSQL 8.3. Also, the dupe key error linked there appears to be a duplicate key in Slony's metadata, whereas mine was a duplicate key on one of my tables' primary keys.

> Slony is baroque and confusing, but it's specifically designed to
> fail in safe ways (which is not true of some of the other systems:
> several of them have modes in which it's possible to have systems out
> of sync with each other, but with no way to detect as much. IMO,
> that's much worse, so we designed Slony to fail noisily if it was
> going to fail at all).

An error is better than silently failing, but of course neither is optimal. The Slony project could really benefit from a simpler user interface and simpler documentation. Its integration into pgAdmin III is a good step, but even with that it is still a bit of a pain, so I hope it continues to improve in ease of use. Being powerful and flexible is good, but ease of use, with sensible defaults for complex items that can easily be overridden, is even better.

>> *Mammoth Replicator* - This is open source now, is it any good? It
>> sounds like it's trigger based like Slony. Is it based on Slony, or
>> does it simply use a similar approach?
>
> It's completely unrelated, and it doesn't use triggers. I think the
> people programming it are first-rate. Last I looked at it, I felt a
> little uncomfortable with certain design choices, which seemed to me
> to be a little hacky. They were all on the TODO list, though.
>
>> *SkyTools/Londiste* - Don't know anything special about it.
>
> I've been quite impressed by the usability. It's not quite as
> flexible as Slony, but it has the same theory of operation. The
> documentation is not as voluminous, although it's also much handier
> as reference material than Slony's (which is, in my experience, a
> little hard to navigate if you don't already know the system pretty
> well).

Thanks, I'll look into both of those as well.
[PERFORM] Best replication solution?
I am looking to set up replication of my PostgreSQL database, primarily for performance reasons. The searching I've done shows a lot of different options; can anyone give suggestions about which one(s) are best? I've read the archives, but there seem to be more replication solutions since the last thread on this subject, and the landscape changes frequently.

I'd really like a solution that replicates DDL, but very few do, so I think I'm out of luck there. I can live without it. Multi-master support would be nice too, but it also seems to cause too many problems, so it looks like I'll have to do without that as well.

*Slony-I* - I've used this in the past, but it's a huge pain to work with, caused serious performance issues under heavy load due to long running transactions (may not be the case anymore, it's been a while since I used it on a large database with many writes), and doesn't seem very reliable (I've had replication break on me multiple times).

*Mammoth Replicator* - This is open source now, is it any good? It sounds like it's trigger based like Slony. Is it based on Slony, or does it simply use a similar approach?

*pgpool* - Won't work reliably for replication for us, because we have some triggers and stored procedures that write data.

*PGCluster* - Sounds cool, but based on the mailing list traffic, and the last news post on the site being from 2005, development seems to be nearly dead. Also, no release seems to have made it beyond the RC stage -- for multi-master, stability is particularly important for data integrity.

*PGReplicator* - Don't know anything special about it.

*Bucardo* - Don't know anything special about it.

*Postgres-R* - Don't know anything special about it.

*SkyTools/Londiste* - Don't know anything special about it.
Re: [PERFORM] Best replication solution?
I have a high traffic database with high volumes of reads and moderate volumes of writes -- millions of queries a day. Running the latest version of PostgreSQL 8.2.x (I want to upgrade to 8.3, but the dump/reload requires an unacceptable amount of downtime).

Server is a dual core xeon with 3GB RAM and 2 mirrors of 15k SAS drives (1 for most data, 1 for WAL and a few tables and indexes). In total, all databases on the server are about 10G on disk (about 2GB in pg_dump format). The IO on the disks is being maxed out, and I don't have the budget to add more disks at this time. The web server has a RAID 10 of SATA drives with some IO bandwidth to spare, so I would like to replicate all data over and send some read queries to that server -- in particular the very IO intensive FTI based search queries.

ries van Twisk wrote:
> Dear Mr. No Name,
>
> What replication solution is best depends on your requirements. Maybe
> you can tell a bit more about what your situation is? Since you
> didn't give us too much information about your requirements, it's
> hard to give you any advice.
>
> Ries
>
> On Apr 5, 2009, at 1:36 PM, Lists wrote:
>> I am looking to setup replication of my postgresql database,
>> primarily for performance reasons. [rest of the original message,
>> quoted in full above, snipped]
[PERFORM] Optimal tags design?
I am planning to add a tags feature (as in the web 2.0 thing) to my web based application. I would like some feedback from the experts here on what the best database design for that would be. The possibilities I have come up with are:

* A tags table containing the tag and the id number of what it links to:

    select pid from tags where tag='bla'
    select tag from tags where pid=xxx

* A tags table where each tag exists only once, and a table with the tag ID and picture ID to link them together (a schema sketch follows below):

    select pid from tags inner join picture_tags using(tag_id) where tag='bla'
    select tag from tags inner join picture_tags using(tag_id) where pid='xxx'

* A full text index in the picture table containing the tags:

    select pid from pictures where tags @@ to_tsquery('bla')

  (or the non-FTI version)

    select pid from pictures where tags ~* '.*bla.*'
    select tags from pictures where pid=xxx
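For concreteness, a minimal sketch of the second (normalized) design -- table and column names are illustrative, and an existing pictures(pid) table is assumed:

    -- Each tag is stored exactly once.
    CREATE TABLE tags (
        tag_id  serial PRIMARY KEY,
        tag     text NOT NULL UNIQUE
    );

    -- Link table mapping tags to pictures (many-to-many).
    CREATE TABLE picture_tags (
        tag_id  integer NOT NULL REFERENCES tags(tag_id),
        pid     integer NOT NULL REFERENCES pictures(pid),
        PRIMARY KEY (tag_id, pid)
    );

    -- Speeds up the "which tags does picture X have?" direction.
    CREATE INDEX picture_tags_pid_idx ON picture_tags (pid);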
[PERFORM] Adding disks/xlog index
I have a busy postgresql server running on a raid1 of 2 15k rpm scsi drives. I have been running into the problem of maxed out IO bandwidth. I would like to convert my raid1 into a raid10, but that would require a full rebuild, which is more downtime than I want, so I am looking into other alternatives.

The best one I have come up with is moving the xlog/wal (can someone confirm whether these are the same thing?) to another physical drive. I also think it may be beneficial to move some indexes to another drive as well (same one as the xlog). Some questions on this:

1. Can the database survive loss/corruption of the xlog and indexes in a recoverable way? To save money (and because I won't need the throughput as much), I am thinking of making this index/wal/xlog drive a single cheap sata drive (or maybe a non-raided 15k scsi for 60% more money). However, without the redundancy of a mirror, I am concerned about drive failure. Loss of several minutes of recent transactions in a serious crash is acceptable to me, but full/serious database corruption (the likes of fsync off) is not.

2. Is there any point using a high performance (ie scsi) disk for this, or would the mirror containing the majority of the data still be the major bottleneck, causing the disk usage to not exceed sata performance anyway?

3. Is there any easy way to move ALL indexes to another drive? (A tablespace sketch follows below.) Is this a good performance idea, or would they just bottleneck each other seriously?

Other info for reference: running postgresql 8.2 on FreeBSD 6.1; the server is a core2 with 4gb of ram; CPU usage is moderate.

Also, can anyone recommend a good shared_buffers size? The server is dedicated to postgres, except for half a gig used by memcached. Right now I have it set at 51200 (400MB at 8kB per buffer), which may be too high (I've read varying suggestions on this, and I'm not sure how aggressive FreeBSD 6's IO cache is). And any suggestions on what effective_cache_size I should use on this hardware and OS? I've been using 384MB, but I don't know if that is optimal or not.
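On question 3: tablespaces (available since 8.0) are the usual way to put indexes on a separate drive, though there is no single "move all indexes" command -- it is one ALTER INDEX per index. A rough sketch; the path and object names are placeholders:

    -- The directory must already exist and be owned by the postgres user.
    CREATE TABLESPACE indexspace LOCATION '/mnt/indexdrive/pg_indexes';

    -- Move an existing index (the index is locked while it is rewritten).
    ALTER INDEX some_index_name SET TABLESPACE indexspace;

    -- New indexes can be placed there directly.
    CREATE INDEX some_new_idx ON some_table (some_col) TABLESPACE indexspace;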
Re: [PERFORM] Increasing RAM for more than 4 Gb. using postgresql
This must be a Linux'ism, because to my knowledge FreeBSD does not keep the OS cache mapped into the kernel address space unless it has active objects associated with the data. And FreeBSD also has a default split of 3GB userspace and 1GB kernel space when running with a default configuration. Linux people might want to try other OSes to compare the performance.

Best regards,
Nicolai Petri

Ps. Sorry for my lame MS mailer - quoting is not something it knows how to do. :)

----- Original Message ----- From: William Yu [EMAIL PROTECTED]

> I inferred this from reading up on the compressed vm project. It can
> be higher or lower depending on what devices you have in your system
> -- however, I've read messages from kernel hackers saying Linux is
> very aggressive in reserving memory space for devices because it must
> be allocated at boot time.
>
> Josh Berkus wrote:
>> William,
>>
>>> The threshold for using PAE is actually far lower than 4GB. 4GB is
>>> the total memory address space -- split that in half for 2GB for
>>> userspace, 2GB for kernel. The OS cache resides in kernel space --
>>> after you take away the memory allocation for devices, you're left
>>> with a window of roughly 900MB.
>>
>> I'm curious, how do you get 1.1GB for memory allocation for devices?
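Putting the numbers in this subthread together, the 32-bit Linux arithmetic being described is roughly:

    4GB total address space   =  2GB userspace + 2GB kernel space
    2GB kernel space - ~1.1GB reserved for devices  ~=  900MB left for the OS cache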