Re: [PERFORM] Very important choice
Lago, Bruno Almeida do wrote:
> Hello my friends,
>
> I'd like to know (based on your experience and technical details) which
> OS is recommended for running PostgreSQL, keeping in mind 3 indicators:
> 1 - Performance (OS, network and I/O)
> 2 - OS stability
> 3 - File system integrity

The short answer is almost certainly: whichever OS you are most familiar with. If you have a problem, you don't want to be learning new details about your OS while fixing it. That rules out FreeBSD for now.

What hardware you want to use will affect performance and the choice of OS, so you'll need to decide what hardware you're looking to use.

As far as file systems are concerned, ext3 seems to be the slowest, and the general feeling seems to be that XFS is perhaps the fastest. In terms of reliability, avoid cutting-edge releases of any file system - let others test them for you. One thing to consider is how long it takes to recover from a crash: you can run PostgreSQL on ext2, but checking a large disk can take hours after a crash. That's the real benefit of journalling for PG - speed of recovery.

--
Richard Huxton
Archonet Ltd

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] High end server and storage for a PostgreSQL OLTP system
On Tue, Feb 01, 2005 at 07:35:35AM +0100, Cosimo Streppone wrote:
> > You might look at Opterons, which theoretically have a higher data
> > bandwidth. If you're doing anything data intensive, like a sort in
> > memory, this could make a difference.
>
> Would Opteron systems need a 64-bit postgresql (and os, gcc, ...) build
> to have that advantage?

Well, that would give you the most benefit, but the memory bandwidth is still greater than on a Xeon. There's really no issue with 64 bit if you're using open source software; it all compiles for 64 bits and you're good to go. http://stats.distributed.net runs on a dual Opteron box running FreeBSD and I've had no issues.

> > RAID10 will be faster than RAID1.
>
> Sorry Jim, by RAID10 you mean several raid1 arrays mounted on different
> linux partitions? Or several raid1 arrays that build up a raid0 array?
> In the latter case, who decides which data goes in which raid1 array?
> The RAID adapter?

You should take a look around online for a description of RAID types. There's technically RAID 0+1 and RAID 1+0: one is a stripe of mirrored drives (a RAID 0 built out of RAID 1s), the other is a mirror of two RAID 0s. The former is much better; if you're lucky you can lose half your drives without any data loss (if each dead drive is part of a different mirror). Recovery is also faster.

You'll almost certainly be much happier with hardware RAID instead of software RAID. stats.distributed.net runs a 3ware controller and SATA drives.

--
Jim C. Nasby, Database Consultant [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?
[PERFORM] Performance of count(*) on large tables vs SQL Server
Doing some rather crude comparative performance tests between PG 8.0.1 on Windows XP and SQL Server 2000, PG whips SQL Server's ass on "insert into junk (select * from junk)" on a one-column table defined as int. Starting with a 1-row table and repeatedly executing this command, PG can take the table from 500K rows to 1M rows in 20 seconds; SQL Server is at least twice as slow.

BUT... SQL Server can do "select count(*) from junk" in almost no time at all, probably because this query can be optimised to go back and use catalogue statistics. PG, on the other hand, appears to do a full table scan to answer this question, taking nearly 4 seconds to process the query. Doing an ANALYZE on the table, and also a VACUUM, did not seem to affect this.

Can PG find a table's row count more efficiently? This is not an unusual practice in commercial applications, which assume that count(*) with no WHERE clause will be a cheap query - and use it to test if a table is empty, for instance (because for Oracle/Sybase/SQL Server, count(*) is cheap). Sure, I appreciate there are other ways of doing this, but I am curious about the way PG works here.
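For the two use cases mentioned (approximate counts and emptiness tests), PG's own catalogue statistics can be read directly. A sketch, reusing the `junk` table name from the post; note the estimate is only as fresh as the last VACUUM/ANALYZE:

```sql
-- Approximate row count straight from the catalogue.
-- reltuples is maintained by VACUUM/ANALYZE, so it can lag the true count.
SELECT reltuples::bigint AS approx_rows
FROM pg_class
WHERE relname = 'junk';

-- Cheap emptiness test: stops at the first row instead of scanning the table.
SELECT EXISTS (SELECT 1 FROM junk) AS non_empty;
```

Neither replaces an exact count(*) when exactness matters, but both avoid the full table scan for the "is it empty / roughly how big" questions.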
Re: [PERFORM] High end server and storage for a PostgreSQL OLTP system
To be honest, I've used Compaq, Dell and LSI SCSI RAID controllers and got pretty pathetic benchmarks from all of them. The best system I have is the one I just built: 2x Opteron 242, Tyan S2885 MoBo, 4GB RAM, 14x SATA WD Raptor drives (2x RAID 1, 1x 4-disk RAID 10, 1x 6-drive RAID 10) on 2x 3ware (now AMCC) Escalade 9500S-8MI controllers. This system with fsync on has managed 2500 insert transactions/sec (granted, they are simple transactions, but still).

RAID 10 is a stripe of mirrors. RAID 10 gives you the best read and write performance combined. RAID 5 gives very bad write performance but good read performance. With RAID 5 you can only lose a single drive, and rebuild times are slow. RAID 10 can lose up to half the array, depending on which drives fail, without losing data.

I would be interested in starting a site listing RAID benchmarks under Linux. If anyone is interested, let me know. I would be interested in at least some bonnie++ benchmarks, and perhaps others if people would like.

Alex Turner
NetEconomist

On Tue, 1 Feb 2005 05:27:27 -0600, Jim C. Nasby [EMAIL PROTECTED] wrote:
> [snip]
Re: [PERFORM] High end server and storage for a PostgreSQL OLTP system
> Hi all,
>
> 1) What kind of performance gain can I expect switching from 7.1 to
> 7.4 (or 8.0)? Obviously I'm doing my own testing, but I'm not very
> impressed by 8.0 speed; maybe I'm doing the testing on a low-end
> server...

8.0 gives you savepoints. While this may not seem like a big deal at first, the ability to handle exceptions inside pl/pgsql functions gives you much more flexibility to move code into the server. Also, recent versions of pl/pgsql give you more flexibility with cursors, including returning them outside of the function.

Corollary: use pl/pgsql. It can be 10 times or more faster than query-by-query execution.

You also have the parse/bind interface. This may not be so easy to implement in your app, but if you are machine-gunning your server with queries, use parameterized prepared queries and reap 50%+ performance, meaning lower load and quicker transaction turnaround time.

Merlin
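The two points above can be sketched concretely. First, the SQL-level PREPARE/EXECUTE commands give the one-parse, many-executions behaviour of the parse/bind interface; second, 8.0's pl/pgsql exception handling lets logic like upsert live in the server. Table and function names here are made up for illustration:

```sql
-- Parse and plan once...
PREPARE get_order (integer) AS
    SELECT * FROM orders WHERE ordernumber = $1;

-- ...then execute many times with only bind/execute overhead.
EXECUTE get_order(42);
EXECUTE get_order(43);
DEALLOCATE get_order;

-- 8.0-style exception handling inside pl/pgsql (hypothetical table):
CREATE OR REPLACE FUNCTION bump_count(p_user integer) RETURNS void AS $$
BEGIN
    INSERT INTO user_counts (id_user, n) VALUES (p_user, 1);
EXCEPTION
    WHEN unique_violation THEN
        UPDATE user_counts SET n = n + 1 WHERE id_user = p_user;
END;
$$ LANGUAGE plpgsql;
```

Under 7.x neither of these was available in this form, which is part of the upgrade argument.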
[PERFORM] Why the difference in query plan and performance pg 7.4.6?
Hi all,

I have a freshly vacuumed table with 1104379 records and an index on zipcode. Can anyone explain why the queries below go as they go, and why the performance differs so much (1 second versus 64 seconds or, stated differently, 1 record per second versus 1562 records per second), and why the query plan of query 2 ignores the index?

For completeness' sake I also did a "select ordernumber" without any ordering. That only took 98 seconds for 1104379 records (11222 records per second, comparable with the first query, as I would have expected).

Query 1: select a.ordernumer from orders a order by a.zipcode limit 1

Explain:
                              QUERY PLAN
Limit  (cost=0.00..39019.79 rows=1 width=14)
  ->  Index Scan using orders_postcode on orders a  (cost=0.00..4309264.07 rows=1104379 width=14)

Running time: 1 second

Query 2: select a.ordernumer from orders a order by a.zipcode limit 10

Explain:
                              QUERY PLAN
Limit  (cost=207589.75..207839.75 rows=10 width=14)
  ->  Sort  (cost=207589.75..210350.70 rows=1104379 width=14)
        Sort Key: postcode
        ->  Seq Scan on orders a  (cost=0.00..46808.79 rows=1104379 width=14)

Running time: 64 seconds

Query 3: select a.ordernumer from orders a

                              QUERY PLAN
Seq Scan on orders a  (cost=0.00..46808.79 rows=1104379 width=4)

Running time: 98 seconds

Regards,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
e-mail: [EMAIL PROTECTED]
web: www.askesis.nl
Re: [PERFORM] Why the difference in query plan and performance pg
Joost Kraaijeveld wrote:
> Hi all,
>
> I have a freshly vacuumed table with 1104379 records and an index on
> zipcode. Can anyone explain why the queries go as they go, why the
> performance differs so much (1 second versus 64 seconds), and why the
> query plan of query 2 ignores the index?

Indexes are generally only faster if you are grabbing less than about 10% of the table. Otherwise you have the overhead of loading the index into memory and then paging through it looking for the entries. With 100,000 entries a sequential scan is actually likely to be faster than an indexed one.

If you try:

select a.ordernumer from orders a order by a.zipcode

how long does it take?

You can also try disabling sequential scans to see how long query 2 would be if it used the index. Remember, though, that because of caching a repeated index scan may seem faster, but in actual production that index may not be cached, depending on what other queries are run.

John
=:->

> [snip - queries and plans quoted above]
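John's suggestion to disable sequential scans can be tried per session with a planner setting; a sketch using query 2 from the original post (the column name `ordernumer` is kept as posted):

```sql
-- Discourage the planner from seq scans for this session only,
-- then compare the plan and timing against the default.
SET enable_seqscan = off;

EXPLAIN ANALYZE
SELECT a.ordernumer FROM orders a ORDER BY a.zipcode LIMIT 10;

-- Restore the default before doing anything else.
SET enable_seqscan = on;
```

This does not actually disable seq scans (it just makes the planner cost them prohibitively), but it is the standard way to see what the index plan would cost.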
Re: [PERFORM] High end server and storage for a PostgreSQL OLTP system
Merlin Moncure wrote:
> Corollary: use pl/pgsql. It can be 10 times or more faster than
> query-by-query execution.

Merlin, thanks for your good suggestions.

Up to now, our system has never used the stored-procedure approach, due to the fact that we're staying on the minimum common SQL features that are supported by most db engines. I realize though that it would provide a heavy performance boost.

> You also have the parse/bind interface

This is something I have already engineered in our core classes (that use DBI + DBD::Pg), so that switching to 8.0 should automatically enable the single-prepare, multiple-execute behavior, saving a lot of query planner processing, if I understand correctly.

--
Cosimo
Re: [PERFORM] High end server and storage for a PostgreSQL OLTP system
Alex Turner wrote:
> To be honest I've used Compaq, Dell and LSI SCSI RAID controllers and
> got pretty pathetic benchmarks from all of them.

I have also seen average-to-low results for LSI (at least the 1020 card).

> 2x Opteron 242, Tyan S2885 MoBo, 4GB RAM, 14x SATA WD Raptor drives:
> 2x RAID 1, 1x 4-disk RAID 10, 1x 6-drive RAID 10, 2x 3ware (now AMCC)
> Escalade 9500S-8MI.

Thanks, this is precious information.

> I would be interested in starting a site listing RAID benchmarks under
> Linux. If anyone is interested let me know. I would be interested in at
> least some bonnie++ benchmarks, and perhaps others if people would like.

I have also used tiobench [http://tiobench.sourceforge.net/]. Any experience with it?

--
Cosimo
[PERFORM] horizontal partition
Hi all,

I have a big table with ~10 million rows, and it is a real pain to administer, so after years I convinced myself to partition it and replace the table usage (only for reading) with a view. Now my user_logs table is split into 4:

user_logs
user_logs_2002
user_logs_2003
user_logs_2004

and the view v_user_logs is built on top of these tables:

CREATE OR REPLACE VIEW v_user_logs AS
  SELECT * FROM user_logs
  UNION ALL
  SELECT * FROM user_logs_2002
  UNION ALL
  SELECT * FROM user_logs_2003
  UNION ALL
  SELECT * FROM user_logs_2004;

The view is performing really well:

empdb=# explain analyze select * from v_user_logs where id_user = sp_id_user('kalman');

                              QUERY PLAN
Subquery Scan v_user_logs  (cost=0.00..895.45 rows=645 width=88) (actual time=17.039..2345.388 rows=175 loops=1)
  ->  Append  (cost=0.00..892.23 rows=645 width=67) (actual time=17.030..2344.195 rows=175 loops=1)
        ->  Subquery Scan "*SELECT* 1"  (cost=0.00..120.70 rows=60 width=67) (actual time=17.028..17.036 rows=1 loops=1)
              ->  Index Scan using idx_user_user_logs on user_logs  (cost=0.00..120.40 rows=60 width=67) (actual time=17.012..17.018 rows=1 loops=1)
                    Index Cond: (id_user = 4185)
        ->  Subquery Scan "*SELECT* 2"  (cost=0.00..475.44 rows=316 width=67) (actual time=49.406..1220.400 rows=79 loops=1)
              ->  Index Scan using idx_user_user_logs_2004 on user_logs_2004  (cost=0.00..473.86 rows=316 width=67) (actual time=49.388..1219.386 rows=79 loops=1)
                    Index Cond: (id_user = 4185)
        ->  Subquery Scan "*SELECT* 3"  (cost=0.00..204.33 rows=188 width=67) (actual time=59.375..1068.806 rows=95 loops=1)
              ->  Index Scan using idx_user_user_logs_2003 on user_logs_2003  (cost=0.00..203.39 rows=188 width=67) (actual time=59.356..1067.934 rows=95 loops=1)
                    Index Cond: (id_user = 4185)
        ->  Subquery Scan "*SELECT* 4"  (cost=0.00..91.75 rows=81 width=67) (actual time=37.623..37.623 rows=0 loops=1)
              ->  Index Scan using idx_user_user_logs_2002 on user_logs_2002  (cost=0.00..91.35 rows=81 width=67) (actual time=37.618..37.618 rows=0 loops=1)
                    Index Cond: (id_user = 4185)
Total runtime: 2345.917 ms
(15 rows)

The problem is when this view is used in other views, like this:

CREATE OR REPLACE VIEW v_ua_user_login_logout_tmp AS
  SELECT u.login, ul.*
  FROM user_login u, v_user_logs ul
  WHERE u.id_user = ul.id_user;

empdb=# explain analyze select * from v_ua_user_login_logout_tmp where login = 'kalman';

                              QUERY PLAN
Hash Join  (cost=4.01..228669.81 rows=173 width=100) (actual time=1544.784..116490.363 rows=175 loops=1)
  Hash Cond: ("outer".id_user = "inner".id_user)
  ->  Subquery Scan ul  (cost=0.00..193326.71 rows=7067647 width=88) (actual time=5.677..108190.096 rows=7067831 loops=1)
        ->  Append  (cost=0.00..157988.47 rows=7067647 width=67) (actual time=5.669..77109.995 rows=7067831 loops=1)
              ->  Subquery Scan "*SELECT* 1"  (cost=0.00..8158.48 rows=362548 width=67) (actual time=5.666..3379.178 rows=362862 loops=1)
                    ->  Seq Scan on user_logs  (cost=0.00..6345.74 rows=362548 width=67) (actual time=5.645..1395.673 rows=362862 loops=1)
              ->  Subquery Scan "*SELECT* 2"  (cost=0.00..93663.88 rows=4191588 width=67) (actual time=9.149..35094.798 rows=4191580 loops=1)
                    ->  Seq Scan on user_logs_2004  (cost=0.00..72705.94 rows=4191588 width=67) (actual time=9.117..16531.486 rows=4191580 loops=1)
              ->  Subquery Scan "*SELECT* 3"  (cost=0.00..44875.33 rows=2008233 width=67) (actual time=0.562..24017.680 rows=2008190 loops=1)
                    ->  Seq Scan on user_logs_2003  (cost=0.00..34834.17 rows=2008233 width=67) (actual time=0.542..13224.265 rows=2008190 loops=1)
              ->  Subquery Scan "*SELECT* 4"  (cost=0.00..11290.78 rows=505278 width=67) (actual time=7.100..3636.163 rows=505199 loops=1)
                    ->  Seq Scan on user_logs_2002  (cost=0.00..8764.39 rows=505278 width=67) (actual time=6.446..1474.709 rows=505199 loops=1)
  ->  Hash  (cost=4.00..4.00 rows=1 width=16) (actual time=0.083..0.083 rows=0 loops=1)
        ->  Index Scan using user_login_login_key on user_login u  (cost=0.00..4.00 rows=1 width=16) (actual time=0.064..0.066 rows=1 loops=1)
              Index Cond: ((login)::text = 'kalman'::text)
Total runtime: 116491.056 ms
(16 rows)

As you can see, the index scans are not used anymore.
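One possible workaround for the join case described here (a sketch, assuming login is unique in user_login, and untested on 7.4/8.0 planners - worth verifying with EXPLAIN ANALYZE) is to resolve id_user first, so the view sees a comparison against a single value rather than a join condition:

```sql
-- Resolve the user id once; the planner can then treat it as a
-- parameter and consider the per-partition indexes on id_user
-- inside each branch of the UNION ALL view.
SELECT ul.*
FROM v_user_logs ul
WHERE ul.id_user = (SELECT id_user
                    FROM user_login
                    WHERE login = 'kalman');
```

Whether the condition actually gets pushed into each UNION ALL branch depends on the PostgreSQL version, so comparing the plan against the hash-join plan above is the real test.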
Re: [PERFORM] High end server and storage for a PostgreSQL OLTP system
Jim C. Nasby wrote:
> There's really no issue with 64 bit if you're using open source
> software; it all compiles for 64 bits and you're good to go.
> http://stats.distributed.net runs on a dual opteron box running
> FreeBSD and I've had no issues.

You can get 64-bit Xeons also, but they take a hit in the I/O department due to the lack of a hardware I/O MMU, which limits DMA transfers to addresses below 4GB. This has a two-fold impact:

1) Transferring data to addresses above 4GB requires first a DMA transfer to below 4GB and then a copy to the final destination.

2) You must allocate real memory at 2X the address space of the devices to act as bounce buffers. This is especially problematic for workstations, because if you put a 512MB Nvidia card in your computer for graphics work, you've just lost 1GB of memory. (I dunno how much the typical SCSI/NIC/etc. take up.)
Re: [PERFORM] High end server and storage for a PostgreSQL OLTP system
None - but I'll definitely take a look.

Alex Turner
NetEconomist

On Tue, 01 Feb 2005 22:11:30 +0100, Cosimo Streppone [EMAIL PROTECTED] wrote:
> [snip]