Re: [PERFORM] Raid 5 vs Raid 10 Benchmarks Using bonnie++
On 17/08/2011 7:26 PM, Ogden wrote:
> I am using bonnie++ to benchmark our current Postgres system (on RAID 5) with the new one we have, which I have configured with RAID 10. The drives are the same (SAS 15K). I tried the new system with ext3 and then XFS, but the results seem really outrageous compared to the current system, or am I reading things wrong? The benchmark results are here: http://malekkoheavyindustry.com/benchmark.html

The results are not completely outrageous; however, you don't say what drives, how many, and what RAID controller you have in the current and new systems. You might expect that performance from 10/12 disks in RAID 10 with a good controller. I would say that your current system is outrageous in that it is so slow!

Cheers,
Gary.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Raid 5 vs Raid 10 Benchmarks Using bonnie++
On 17/08/2011 7:56 PM, Ogden wrote:
> On Aug 17, 2011, at 1:33 PM, Gary Doades wrote:
>> The results are not completely outrageous; however, you don't say what drives, how many, and what RAID controller you have in the current and new systems. You might expect that performance from 10/12 disks in RAID 10 with a good controller. I would say that your current system is outrageous in that it is so slow!
>
> Yes, under heavy writes the load would shoot right up, which is what caused us to look at upgrading. If it is the RAID 5, it is mind boggling that it could be that much of a difference. I expected a difference, but not that much. The new system has 6 drives, 300GB 15K SAS, and I've put them into a RAID 10 configuration. The current system is ext3 with RAID 5 over 4 disks on a PERC 5/i controller which has half the write cache of the new one (256MB vs 512MB).

Hmm... for only 6 disks in RAID 10 I would say that the figures are a bit higher than I would expect. The PERC 5 controller is pretty poor in my opinion, the PERC 6 a lot better, and the new H700s pretty good. I'm guessing you have an H700 in your new system. I've just got a Dell R515 with an H700 and 8 SAS drives in RAID 10, and I only get around 600 MB/s read using ext4 and Ubuntu 10.04 server.

Like I say, your figures are not outrageous, just unexpectedly good :)

Cheers,
Gary.
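When comparing arrays like this it helps to run bonnie++ the same way on both machines. A typical invocation looks something like the following; the directory and user are assumptions, and the file size should be at least twice the machine's RAM so the OS page cache doesn't flatter the result:

```shell
# -d: directory on the array under test
# -s: total file size (at least 2x RAM, e.g. 16G on an 8GB box)
# -n 0: skip the small-file creation tests
# -u: user to run as when invoked as root
bonnie++ -d /mnt/pgdata/bonnie -s 16G -n 0 -u postgres
```

Running the identical command on both the RAID 5 and RAID 10 boxes removes one variable from the comparison.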
Re: [PERFORM] Compared MS SQL 2000 to Postgresql 9.0 on Windows
On 07/12/2010 7:43 PM, Andy Colson wrote:
> On 12/7/2010 1:22 PM, Justin Pitts wrote:
>>> Also, as a fair warning: mssql doesn't really care about transactions, but PG really does. Make sure all your code is properly starting and committing transactions. -Andy
>>
>> I do not understand that statement. Can you explain it a bit better?
>
> In mssql you can write code that connects to the db, fires off updates and inserts, and then disconnects. I believe mssql will keep all your changes, and the transaction stuff is done for you. In PG the first statement you fire off (an insert, for example) will start a transaction. If you don't commit before you disconnect, that transaction will be rolled back. Even worse, if your program does not commit, but keeps the connection to the db open, the transaction will stay open too.

As far as I know, both MS SQL and Postgres work just the same as regards explicit and implicit (autocommit) transactions; only the underlying storage/logging mechanisms are different. Transactions shouldn't make any real difference to the select/join performance being complained about, though. It's already been stated that the insert performance of Postgres far exceeds SQL Server, which is my experience also.

As already suggested, until we see the exact table definitions, including indexes etc., there's no real way to tell what the problem is. How many rows are in the second table? It really shouldn't take that much time to read 1000 rows unless you have a bizarrely slow hard disk.

It would be nice to eliminate any programmatic or driver influence too. How does the SQL select execute in Enterprise Manager for mssql and psql or pgAdmin for Postgres?

Cheers,
Gary.
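The "disconnect without commit loses your insert" behaviour described above is easy to demonstrate. This sketch uses sqlite3 from Python's standard library purely as a stand-in (so it runs anywhere); the same discipline applies to Postgres through any driver: a DML statement opens a transaction, and disconnecting without COMMIT rolls it back.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")

conn = sqlite3.connect(path)
conn.execute("CREATE TABLE t (x INTEGER)")
conn.commit()                              # keep the table

conn.execute("INSERT INTO t VALUES (1)")   # implicit transaction begins here
conn.close()                               # no commit: pending changes are lost

conn = sqlite3.connect(path)
count = conn.execute("SELECT count(*) FROM t").fetchone()[0]
print(count)  # 0 -- the uncommitted row is gone
conn.close()
```

Code written against mssql's looser habits often "works" only because something in the driver stack is auto-committing; port it to PG and the missing explicit commit shows up immediately.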
Re: [PERFORM] Compared MS SQL 2000 to Postgresql 9.0 on Windows
On 07/12/2010 9:29 PM, Tom Polak wrote:
> From EXPLAIN ANALYZE I can see the query ran much faster.
>
> Nested Loop Left Join  (cost=0.00..138.04 rows=1001 width=1298) (actual time=0.036..4.679 rows=1001 loops=1)
>   Join Filter: (pgtemp1.state = pgtemp2.stateid)
>   ->  Seq Scan on pgtemp1  (cost=0.00..122.01 rows=1001 width=788) (actual time=0.010..0.764 rows=1001 loops=1)
>   ->  Materialize  (cost=0.00..1.01 rows=1 width=510) (actual time=0.000..0.001 rows=1 loops=1001)
>         ->  Seq Scan on pgtemp2  (cost=0.00..1.01 rows=1 width=510) (actual time=0.006..0.008 rows=1 loops=1)
> Total runtime: 5.128 ms
>
> The general question comes down to: can I expect decent performance from Postgresql compared to MSSQL? I was hoping that Postgresql 9.0 would beat MSSQL 2000, since MSSQL 2000 is over 10 years old.

So Postgres actually executed the select in around 5 milliseconds. Pretty good, I would say. The problem therefore lies not with Postgres itself, but with what is done with the results afterwards? Assuming that this is purely local and therefore no network issues, perhaps there is a performance issue in this case with the Npgsql driver? Someone who knows more about this driver could perhaps shed some light on this?

I have used .NET (C#) with Postgres before, but only using the ODBC driver. Perhaps you could try that instead (using OdbcCommand, OdbcDataReader etc.). I mainly use Ruby (JRuby) with Postgres, both under Linux and Windows, and I can certainly process 1000 records of similar structure in well under 1 second.

Cheers,
Gary.
Re: [PERFORM] Optimizer showing wrong rows in plan
On 28/03/2010 10:07 AM, Tadipathri Raghu wrote:
> Hi All,
> I want to throw some more light on this by analysing it further:
> 1. In my example I have created a table with one column of type INT (which occupies 4 bytes).
> 2. Initially it occupies one page of space in the file, that is 8 kB. So is it estimating how many such rows may fit in this page? Please clarify this for me.

Like I said, it's just a guess. With no statistics, all Postgres can do is guess, or in this case use the built-in default for a newly created table. It could guess 1 or it could guess 10,000,000. What it does is produce a reasonable guess in the absence of any other information. You should read the Postgres documentation for further information about statistics and how the optimizer uses them.

Regards,
Gary.
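The default guess isn't magic, either: for a table with data on disk the planner derives a row estimate from the file's size in pages and the declared column widths. Rough back-of-envelope arithmetic for a single-int table, assuming 8 kB pages and typical header/alignment constants (the exact values vary by version and platform):

```python
BLOCK_SIZE = 8192      # default Postgres page size
PAGE_HEADER = 24       # fixed page header
LINE_POINTER = 4       # per-tuple item pointer in the page's line pointer array
TUPLE_HEADER = 24      # heap tuple header, MAXALIGN-padded
DATA_WIDTH = 8         # one 4-byte int, padded out to MAXALIGN(8)

per_tuple = LINE_POINTER + TUPLE_HEADER + DATA_WIDTH
rows_per_page = (BLOCK_SIZE - PAGE_HEADER) // per_tuple
print(rows_per_page)  # 226 -- roughly how many single-int rows fit in one page
```

So "one page" can plausibly mean a couple of hundred rows, which is why the planner's pre-ANALYZE numbers can look so arbitrary.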
Re: [PERFORM] DB is slow until DB is reloaded
On 04/01/2010 7:10 PM, Madison Kelly wrote:
> Hi all,
>
> I've got a fairly small DB (~850MB when pg_dump'ed) running on PgSQL v8.1.11 on a CentOS 5.3 x86_64 Xen-based virtual machine. The filesystem is ext3 on LVM with 32MB extents. It's about the only real resource-hungry VM on the server.
>
> It slows down over time and I can't seem to find a way to get the performance to return without doing a dump and reload of the database. I've tried manually running 'VACUUM FULL' and restarting the postgresql daemon without success.
>
> For example, here is an actual query before the dump and again after the dump (sorry for the large query):
>
> -=] Before the dump/reload [=-
> ser...@iwt= EXPLAIN ANALYZE SELECT lor_order_type, lor_order_date, lor_order_time, lor_isp_agent_id, lor_last_modified_date, lor_isp_order_number, lor_instr_for_bell_rep, lor_type_of_service, lor_local_voice_provider, lor_dry_loop_instr, lor_termination_location, lor_req_line_speed, lor_server_from, lor_rate_band, lor_related_order_nums, lor_related_order_types, lor_activation_date, lor_cust_first_name, lor_cust_last_name, lor_req_activation_date, lor_street_number, lor_street_number_suffix, lor_street_name, lor_street_type, lor_street_direction, lor_location_type_1, lor_location_number_1, lor_location_type_2, lor_location_number_2, lor_postal_code, lor_municipality, lor_province, lor_customer_group, lor_daytime_phone, lor_daytime_phone_ext, lod_value AS circuit_number FROM line_owner_report LEFT JOIN line_owner_data ON (lor_id=lod_lo_id AND lod_variable='ISPCircuitNumber1') WHERE lor_lo_id=514;
>
> QUERY PLAN
> Hash Left Join  (cost=2115.43..112756.81 rows=8198 width=1152) (actual time=1463.311..1463.380 rows=1 loops=1)
>   Hash Cond: ("outer".lor_id = "inner".lod_lo_id)
>   ->  Seq Scan on line_owner_report  (cost=0.00..108509.85 rows=8198 width=1124) (actual time=1462.810..1462.872 rows=1 loops=1)
>         Filter: (lor_lo_id = 514)
>   ->  Hash  (cost=2112.85..2112.85 rows=1033 width=36) (actual time=0.421..0.421 rows=5 loops=1)
>         ->  Bitmap Heap Scan on line_owner_data  (cost=9.61..2112.85 rows=1033 width=36) (actual time=0.274..0.378 rows=5 loops=1)
>               Recheck Cond: (lod_variable = 'ISPCircuitNumber1'::text)
>               ->  Bitmap Index Scan on lod_variable_index  (cost=0.00..9.61 rows=1033 width=0) (actual time=0.218..0.218 rows=5 loops=1)
>                     Index Cond: (lod_variable = 'ISPCircuitNumber1'::text)
> Total runtime: 1463.679 ms
> (10 rows)
>
> -=] After the dump/reload [=-
> ser...@iwt= EXPLAIN ANALYZE SELECT [same query as above];
>
> QUERY PLAN
> Nested Loop Left Join  (cost=10.84..182.57 rows=5 width=1152) (actual time=1.980..2.083 rows=1 loops=1)
>   ->  Seq Scan on line_owner_report  (cost=0.00..70.05 rows=5 width=1124) (actual time=1.388..1.485 rows=1 loops=1)
>         Filter: (lor_lo_id = 514)
>   ->  Bitmap Heap Scan on line_owner_data  (cost=10.84..22.47 rows=3 width=36) (actual time=0.562..0.562 rows=0 loops=1)
>         Recheck Cond: (("outer".lor_id = line_owner_data.lod_lo_id) AND (line_owner_data.lod_variable = 'ISPCircuitNumber1'::text))
>         ->  BitmapAnd  (cost=10.84..10.84 rows=3 width=0) (actual time=0.552..0.552 rows=0 loops=1)
>               ->  Bitmap Index Scan on lod_id_index  (cost=0.00..4.80 rows=514 width=0) (actual time=0.250..0.250 rows=126 loops=1)
>                     Index Cond: ("outer".lor_id = line_owner_data.lod_lo_id)
>               ->  Bitmap Index Scan on lod_variable_index  (cost=0.00..5.80 rows=514 width=0) (actual time=0.262..0.262 rows=5 loops=1)
>                     Index Cond: (lod_variable = 'ISPCircuitNumber1'::text)
> Total runtime: 2.576 ms
> (11 rows)
>
> Any idea on what might be causing the slowdown? Is it likely filesystem related or am I missing for
Re: [PERFORM] DB is slow until DB is reloaded
On 04/01/2010 8:30 PM, Madison Kelly wrote:
> Steve Crawford wrote:
>> Madison Kelly wrote:
>>> Hi all, I've got a fairly small DB... It slows down over time and I can't seem to find a way to get the performance to return without doing a dump and reload of the database...
>>
>> Some questions: Is autovacuum running? This is the most likely suspect. If not, things will bloat and you won't be getting appropriate analyze runs. Speaking of which, what happens if you just run analyze? And as long as you are dumping and reloading anyway, how about version upgrading for bug reduction, performance improvement, and cool new features.
>> Cheers, Steve
>
> Yup, I even tried manually running 'VACUUM FULL' and it didn't help. As for upgrading:

VACUUM FULL is not the same as VACUUM ANALYZE FULL. You shouldn't need the FULL option anyway.

> a) I am trying to find a way around the dump/reload. I am doing it as a last resort only.
> b) I want to keep the version in CentOS's repo.
> I'd not tried simply updating the stats via ANALYZE... I'll keep an eye on performance and if it starts to slip again, I will run ANALYZE and see if that helps. If there is a way to run ANALYZE against a query that I am missing, please let me know.

From your queries it definitely looks like it's your stats that are the problem. When the stats get well out of date, the planner chooses a hash join because it thinks thousands of rows are involved, whereas only a few are actually involved. That's why, with better stats, the second query uses a nested loop join over very few rows and runs much quicker. Therefore it's ANALYZE you need to run, as well as regular VACUUMing.

There should be no need to VACUUM FULL at all as long as you VACUUM and ANALYZE regularly. Once a day may be enough, but you don't say how long it takes your database to become slow. You can VACUUM either the whole database (often easiest) or individual tables if you know in more detail what the problem is and that only certain tables need it. Setting up autovacuum may well be sufficient.

Cheers,
Gary.
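The routine maintenance described above amounts to a couple of one-liners. The table name below is taken from the thread's own plans; an equivalent from the shell is `vacuumdb -z` run from cron:

```sql
VACUUM ANALYZE;                  -- whole database: reclaim space and refresh stats
VACUUM ANALYZE line_owner_data;  -- or just the table that churns the most
ANALYZE;                         -- stats only, when bloat isn't the issue
```

Either of the first two, run regularly, should make the dump/reload cycle unnecessary.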
Re: [PERFORM] suggestions for postgresql setup on Dell 2950 , PERC6i controller
Rajesh Kumar Mallah wrote:
> Hi,
> I am going to get a Dell 2950 with PERC6i with 8 * 73GB 15K SAS drives + 300 GB EMC SATA SAN storage. I seek suggestions from users sharing their experience with similar hardware, if any. I have the following specific concern:
> 1. On the list I read that the RAID10 function in PERC5 is not really striping but spanning, and does not give a performance boost. Is this still true in the case of the PERC6i?

It's long been our policy to buy Dell servers and I agree with most people here that the performance of the PERCs (5 and earlier) has been generally pretty poor. However, they seem to have listened and got it right, or at least a lot better, with the PERC6. I have recently installed Ubuntu server on 2 Dell 2950s with 8GB RAM and six 2.5 inch 15K rpm SAS disks in a single RAID10. I only got the chance to run bonnie++ on them a few times, but I was consistently getting around 200MB/sec for both sequential read and write (16GB file). A similar setup with the older Dell 2850 (PERC5, 6 x 15K rpm 3.5 inch SCSI) gave only around 120MB/sec whatever I did.

Hope this helps.

Cheers,
Gary.
Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index
Tom Lane wrote:
> I increased the size of the test case by 10x (basically s/10/100/), which is enough to push it into the external-sort regime. I get amazingly stable runtimes now --- I didn't have the patience to run 100 trials, but in 30 trials I have slowest 11538 msec and fastest 11144 msec. So this code path is definitely not very sensitive to this data distribution. While these numbers aren't glittering in comparison to the best-case qsort times (~450 msec to sort 10% as much data), they are sure a lot better than the worst-case times. So maybe a workaround for you is to decrease maintenance_work_mem, counterintuitive though that be. (Now, if you *weren't* using maintenance_work_mem of 100MB or more for your problem restore, then I'm not sure I know what's going on...)

Good call. I basically reversed your test by keeping the number of rows the same (20), but reducing maintenance_work_mem. Reducing to 8192 made no real difference. Reducing to 4096 flattened out all the times nicely. Slower overall, but at least predictable. Hopefully this is only a temporary solution until qsort is fixed. My restore now takes 22 minutes :)

I think the reason I wasn't seeing performance issues with normal sort operations is because they use work_mem, not maintenance_work_mem, which was only set to 2048 anyway. Does that sound right?

Regards,
Gary.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index
Gary Doades [EMAIL PROTECTED] writes:
>> I think the reason I wasn't seeing performance issues with normal sort operations is because they use work_mem, not maintenance_work_mem, which was only set to 2048 anyway. Does that sound right?
>
> Very probable. Do you want to test the theory by jacking that up? ;-)

Hmm, played around a bit. I have managed to get it to do a sort on one of the bad columns using a select of two whole tables that results in a sequential scan, sort and merge join. I also tried a simple "select column order by column" for a bad column. I tried varying maintenance_work_mem and work_mem up and down between 2048 and 65536, but I always get similar results. The sort phase always takes 4 to 5 seconds, which seems about right for 900,000 rows. This was on a column that took 12 minutes to create an index on.

I've no idea why it should behave this way, but that probably explains why I (and others) may not have noticed it before.

Regards,
Gary.
Re: [PERFORM] Strange Create Index behaviour
Tom Lane wrote:
> Interesting. I tried your test script and got fairly close times for all the cases on two different machines:
>   old HPUX machine: shortest 5800 msec, longest 7960 msec
>   new Fedora 4 machine: shortest 461 msec, longest 608 msec
> (the HPUX machine was doing other stuff at the same time, so some of its variation is probably only noise). So what this looks like to me is a corner case that FreeBSD's qsort fails to handle well. You might try forcing Postgres to use our private copy of qsort, as we do on Solaris for similar reasons. (The easy way to do this by hand is to configure as normal, then alter the LIBOBJS setting in src/Makefile.global to add qsort.o, then proceed with a normal build.) However, I think that our private copy is descended from *BSD sources, so it might have the same failure mode. It'd be worth finding out.
>
>> The final interesting thing is that as I increase shared buffers to 2000 or 3000 the problem gets *worse*
>
> shared_buffers is unlikely to impact index build time noticeably in recent PG releases. maintenance_work_mem would affect it a lot, though. What setting were you using for that? Can anyone else try these test cases on other platforms?

Thanks for that. I've since tried it on Windows (pg 8.1.2) and the times were all similar, around 1200ms, so it might just be BSD. I'll have to wait until tomorrow to get back to my BSD box. FreeBSD ports makes it easy to install, so I'll have to figure out how to get in and change things manually. I guess the appropriate files are still left around after the ports make command finishes, so I just edit the file and make again?

If it can't be fixed, though, I guess we may have a problem using BSD. I'm surprised this hasn't been brought up before; the case doesn't seem *that* rare. Maybe not that many people are using FreeBSD? I'd certainly be interested if anyone else can repro it on FreeBSD though.

Regards,
Gary.
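For reference, the by-hand procedure Tom describes works out to roughly the following. This is only a sketch: the exact LIBOJBS line contents depend on the configure output for your platform, so the edit is best done manually in an editor rather than scripted:

```shell
./configure
# Now edit src/Makefile.global and append qsort.o to the LIBOBJS
# setting, so the build links Postgres's private qsort instead of
# libc's, e.g.:
#   LIBOBJS = ... qsort.o
make
```

Under the FreeBSD ports tree, the extracted source should still be sitting under the port's work/ directory after the build step, so the same edit-and-remake approach applies there.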
Re: [PERFORM] Strange Create Index behaviour
Tom Lane wrote:
> shared_buffers is unlikely to impact index build time noticeably in recent PG releases. maintenance_work_mem would affect it a lot, though. What setting were you using for that?

Also, I tried upping maintenance_work_mem to 65536 and it didn't make much difference (maybe 10% faster for the normal cases). Upping shared_buffers *definitely* makes the bad cases worse though, but I agree I don't see why...

Regards,
Gary.
Re: [PERFORM] Strange Create Index behaviour
Tom Lane wrote:
> I tried forcing PG to use src/port/qsort.c on the Fedora machine, and lo and behold:
>   new Fedora 4 machine: shortest 434 msec, longest 8530 msec
> So it sure looks like this script does expose a problem on BSD-derived qsorts. Curiously, the case that's much the worst for me is the third in the script, while the shortest time is the first case, which was slow for Gary. So I'd venture that the *BSD code has been tweaked somewhere along the way, in a manner that moves the problem around without really fixing it. (Anyone want to compare the actual FreeBSD source to what we have?)

If I run the script again, it is not always the first case that is slow; it varies from run to run, which is why I repeated it quite a few times for the test. Interestingly, if I don't delete the table after a run, but just drop and re-create the index repeatedly, it stays a pretty consistent time, either repeatedly good or repeatedly bad!

Regards,
Gary.
Re: [PERFORM] Strange Create Index behaviour
Tom Lane wrote:
> [...] It's really interesting to see a case where port/qsort is radically worse than other qsorts ... unless we figure that out and fix it, I think the idea of using port/qsort everywhere has just taken a major hit.

More specifically to BSD, is there any way I can use a non-BSD qsort for building the Postgresql server?

Regards,
Gary.
Re: [PERFORM] PostgreSQL vs. Oracle vs. Microsoft
Dave Cramer wrote:
> I understand that, but I have seen VMs crash. This does bring up another point: since postgresql is not threaded, a .NET pl would require a separate VM for each connection (unless you can share the VM?). One of the Java pl's (pl-j) for postgres has dealt with this issue. For a hundred connections that's a hundred .NET VMs or Java VMs. Is the .NET VM shareable?

In Windows, most certainly. Not sure about Mono.

Cheers,
Gary.
Re: [PERFORM] PostgreSQL vs. Oracle vs. Microsoft
Pierre-Frédéric Caillaud wrote:
> On Mon, 10 Jan 2005 12:46:01 -0500, Alex Turner [EMAIL PROTECTED] wrote:
>> You sir are correct! You can't use perl in MS-SQL or Oracle ;).
>
> Can you benefit from the luminous power of Visual Basic as a pl in MSSQL?

The .NET runtime will be a part of the next MS SQL Server engine. You will be able to have C# as a pl in the database engine with the next version of MSSQL. That certainly will be something to think about.

Cheers,
Gary.
Re: [PERFORM] PostgreSQL vs. Oracle vs. Microsoft
Rosser Schwarz wrote:
> while you weren't looking, Gary Doades wrote:
>> The .NET Runtime will be a part of the next MS SQLServer engine.
>
> It won't be long before someone writes a procedural language binding to PostgreSQL for Parrot [1]. That should offer us a handful or six more languages that can be used, including BASIC, Ruby and Scheme, Perl (5 and 6), Python and TCL for more or less free, and ... wait for it, BrainF***. IIRC, people have talked about porting C# to Parrot, as well.

Or perhaps get the Mono engine in there somewhere to pick up another dozen or so languages supported by .NET and Mono.

Cheers,
Gary.
Re: [PERFORM] PostgreSQL vs. Oracle vs. Microsoft
Dave Cramer wrote:
> I'm curious, why do you think that's serious? What do you really expect to do in the stored procedure? Anything of consequence will seriously degrade performance if you select it in, say, a million rows.

I'm not sure what you mean by "select it in a million rows". I would expect to write a procedure within the database engine to select a million rows, process them and return the result to the client. Very efficient.

Cheers,
Gary.
Re: [PERFORM] PostgreSQL vs. Oracle vs. Microsoft
Dave Cramer wrote:
> Ok, so one use case is to select a large number of rows and do some non-trivial operation on them. I can see where getting the rows inside the server process (i.e. some procedural language), thereby reducing the round-trip overhead, would be beneficial. However, how do you deal with the lack of control? For instance, what happens if you run out of memory while doing this? I'm not sure about other DBs, but if you crash the procedural language inside postgres you will bring the server down. It would seem to me that any non-trivial operation would be better handled outside the server process, even if it costs you the round trip.

Since a .NET language is operating effectively inside a VM, it is pretty much impossible to bring down the server that way. Only a bug in the .NET runtime itself will do that. C#'s try/catch/finally together with the .NET global last-chance exception handlers will ensure the server and your code are well protected.

Cheers,
Gary.
Re: [PERFORM] PostgreSQL vs. Oracle vs. Microsoft
Randolf Richardson wrote:
> I'm looking for recent performance statistics on PostgreSQL vs. Oracle vs. Microsoft SQL Server. Recently someone has been trying to convince my client to switch from Sybase to Microsoft SQL Server (they originally wanted to go with Oracle but have since fallen in love with Microsoft). All this time I've been recommending PostgreSQL for cost and stability (my own testing has shown it to be better at handling abnormal shutdowns and using fewer system resources) in addition to true cross-platform compatibility.

I'm not sure that you are going to get a simple answer to this one. It really depends on what you are trying to do. The only way you will know for sure what the performance of PostgreSQL is like is to try it with samples of your common queries, updates etc.

I have recently ported a moderately complex database from MS SQL Server to Postgres with reasonable success: 70% selects, 20% updates, 10% inserts/deletes. I had to do a fair bit of work to get the best performance out of Postgres, but most of the SQL has as good or better performance than SQL Server. There are still areas where SQL Server outperforms Postgres. For me these tend to be the larger SQL statements with correlated subqueries; SQL Server tends to optimise them better a lot of the time. Updates tend to be a fair bit faster on SQL Server too; this may be MS taking advantage of Windows-specific optimisations in the filesystem.

I did give Oracle a try out of curiosity. I never considered it seriously because of the cost. The majority of my SQL was *slower* under Oracle than SQL Server. I spent some time with it and did get good performance, but it took a *lot* of work tuning to Oracle-specific ways of doing things.

My summary:

SQL Server: A good all-round database, fast, stable. Moderately expensive to buy, cheap and easy to work with and program for (on Windows).

PostgreSQL: A good all-rounder, fast most of the time, stable. Free to acquire, more expensive to work with and program for. Client drivers may be problematic depending on platform and programming language. Needs more work than SQL Server to get the best out of it. Improving all the time and worth serious consideration.

Oracle: A bit of a monstrosity. Can be very fast with a lot of work; can't comment on stability, but I guess it's pretty good. Very expensive to acquire and work with. Well supported server and clients.

Cheers,
Gary.
Re: [PERFORM] Postgres backend using huge amounts of ram
Tom Lane wrote:
> It's also worth noting that work_mem is temporarily set to maintenance_work_mem, which you didn't tell us the value of:

It's left at the default (16384). This would be OK if that is all it used for this type of thing.

> My recollection is that hash join chooses hash table partitions partly on the basis of the estimated number of input rows. Since the estimate was way off, the actual table size got out of hand a bit :-(

A bit!! The really worrying bit is that a normal(ish) query also exhibited the same behaviour. I'm a bit worried that if the stats get a bit out of date, so that the estimate is off as in this case, a few backends trying to grab this much RAM will see the server grind to a halt. Is this a fixable bug? It seems a fairly high priority, "makes the server go away" type of bug to me.

If you need the test data, I could zip the two tables up and send them somewhere.

Thanks,
Gary.
[PERFORM] Postgres backend using huge amounts of ram
How much RAM can a single postgres backend use? I've just loaded a moderately sized dataset into Postgres and was applying RI constraints to the tables (using pgAdmin on Windows). Part way through I noticed the (single) postgres backend had shot up to using 300+ MB of my RAM!

The two tables are:

create table reqt_dates
(
    reqt_date_id   serial,
    reqt_id        integer not null,
    reqt_date      date not null,
    primary key (reqt_date_id)
) without oids;

and

create table booking_plan
(
    booking_plan_id  serial,
    reqt_date_id     integer not null,
    booking_id       integer not null,
    booking_date     date not null,
    datetime_from    timestamp not null,
    datetime_to      timestamp not null,
    primary key (booking_plan_id)
) without oids;

and I was trying to do:

alter table booking_plan add foreign key (reqt_date_id) references reqt_dates (reqt_date_id) on delete cascade;

Since I can't get an explain of what the alter table was doing, I used this:

select count(*) from booking_plan, reqt_dates where booking_plan.reqt_date_id = reqt_dates.reqt_date_id

and sure enough this query caused the backend to use 300MB of RAM. The plan for this was:

QUERY PLAN
Aggregate  (cost=37.00..37.00 rows=1 width=0) (actual time=123968.000..123968.000 rows=1 loops=1)
  ->  Hash Join  (cost=15.50..36.50 rows=1000 width=0) (actual time=10205.000..120683.000 rows=1657709 loops=1)
        Hash Cond: ("outer".reqt_date_id = "inner".reqt_date_id)
        ->  Seq Scan on booking_plan  (cost=0.00..15.00 rows=1000 width=4) (actual time=10.000..4264.000 rows=1657709 loops=1)
        ->  Hash  (cost=15.00..15.00 rows=1000 width=4) (actual time=10195.000..10195.000 rows=0 loops=1)
              ->  Seq Scan on reqt_dates  (cost=0.00..15.00 rows=1000 width=4) (actual time=0.000..6607.000 rows=2142184 loops=1)
Total runtime: 124068.000 ms

I then analysed the database. Note, there are no indexes at this stage except the primary keys. The same query then gave:

QUERY PLAN
Aggregate  (cost=107213.17..107213.17 rows=1 width=0) (actual time=57002.000..57002.000 rows=1 loops=1)
  ->  Hash Join  (cost=35887.01..106384.32 rows=1657709 width=0) (actual time=9774.000..54046.000 rows=1657709 loops=1)
        Hash Cond: ("outer".reqt_date_id = "inner".reqt_date_id)
        ->  Seq Scan on booking_plan  (cost=0.00..22103.55 rows=1657709 width=4) (actual time=10.000..19648.000 rows=1657709 loops=1)
        ->  Hash  (cost=24355.92..24355.92 rows=2142184 width=4) (actual time=9674.000..9674.000 rows=0 loops=1)
              ->  Seq Scan on reqt_dates  (cost=0.00..24355.92 rows=2142184 width=4) (actual time=0.000..4699.000 rows=2142184 loops=1)
Total runtime: 57002.000 ms

This is the same set of hash joins, BUT the backend only used 30MB of private RAM.

Platform is Windows XP, Postgres 8.0 beta 5
shared_buffers = 4000
work_mem = 8192

Any explanations?

Thanks,
Gary.
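Rough arithmetic shows why the stale estimate hurts so much: the hash table is laid out for the number of rows the planner expects, then has to grow to hold the rows that actually arrive. The per-entry overhead figure below is an assumption for illustration, not a measured Postgres constant:

```python
est_rows = 1000           # planner's pre-ANALYZE guess for the hashed input
actual_rows = 2142184     # rows really hashed from reqt_dates (per the plan)
bytes_per_entry = 150     # assumed: 4-byte key plus tuple and bucket overhead

expected_mb = est_rows * bytes_per_entry / 1e6
actual_mb = actual_rows * bytes_per_entry / 1e6
print(f"{expected_mb:.2f} MB planned vs {actual_mb:.0f} MB actual")
```

A three-orders-of-magnitude estimation error turns a hash table the planner budgeted well under work_mem into one in the hundreds of megabytes, which matches the 300+ MB backend observed above.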
Re: [PERFORM] Performance Anomalies in 7.4.5
On 21 Oct 2004 at 15:50, Thomas F. O'Connell wrote: If not, should I be REINDEXing manually, as well as VACUUMing manually after large data imports (whether via COPY or INSERT)? Or will a VACUUM FULL ANALYZE be enough? It's not the vacuuming that's important here, just the analyze. If you import any data into a table, Postgres often does not *know* that until you gather the statistics on the table. You are simply running into the problem of the planner not knowing how much data there is, or how it is distributed, in your tables. If you have large imports it may be faster overall to drop the indexes first, then insert the data, then put the indexes back on, then analyze. Cheers, Gary. ---(end of broadcast)--- TIP 8: explain analyze is your friend
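The drop-then-reload recipe in outline, shown as a runnable sketch (SQLite via Python stands in for PostgreSQL so the sequence can execute; in PostgreSQL the load step would be COPY and the final step ANALYZE as well; table and index names are invented):

```python
import sqlite3

# Sketch of the bulk-load recipe from the post: drop secondary indexes,
# load the data, rebuild the indexes, then gather statistics.

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE reqt_dates (reqt_date_id INTEGER PRIMARY KEY,"
            " reqt_id INTEGER, reqt_date TEXT)")
cur.execute("CREATE INDEX reqt_dates_reqt_id_idx ON reqt_dates (reqt_id)")

cur.execute("DROP INDEX reqt_dates_reqt_id_idx")           # 1. drop indexes
cur.executemany(                                           # 2. bulk insert
    "INSERT INTO reqt_dates (reqt_id, reqt_date) VALUES (?, ?)",
    [(i % 50, "2004-10-%02d" % (i % 28 + 1)) for i in range(10_000)],
)
cur.execute("CREATE INDEX reqt_dates_reqt_id_idx"
            " ON reqt_dates (reqt_id)")                    # 3. rebuild indexes
cur.execute("ANALYZE")                                     # 4. gather statistics
conn.commit()

cur.execute("SELECT count(*) FROM reqt_dates")
print(cur.fetchone()[0])
```

The win comes from building each index once over sorted bulk data instead of maintaining it row by row during the load.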
Re: [PERFORM] Odd planner choice?
Oops, forgot to mention: PostgreSQL 8.0 beta 2 Windows. Thanks, Gary. On 8 Oct 2004 at 20:32, Gary Doades wrote: I'm looking at one of my standard queries and have encountered some strange performance problems. The query below is to search for vacant staff member date/time slots given a series of target date/times. The data contained in the booking_plan/staff_booking tables contains the existing bookings, so I'm looking for clashing bookings to eliminate them from a candidate list. The query is:

select distinct b.staff_id
from staff_booking b, booking_plan bp, t_search_reqt_dates rd
where b.booking_id = bp.booking_id
and rd.datetime_from <= bp.datetime_to
and rd.datetime_to >= bp.datetime_from
and bp.booking_date between rd.reqt_date-1 and rd.reqt_date+1
and rd.search_id = 13
and rd.reqt_date between '2004-09-30' and '2005-12-31'

There are 197877 rows in staff_booking, 573416 rows in booking_plan and 26 rows in t_search_reqt_dates. The t_search_reqt_dates table is a temp table created and populated with the target date/times. The temp table is *not* analyzed, all the others are. The good query plan comes with the criteria on search_id and reqt_date given in the last two lines of the query. Note all the rows in the temp table are search_id = 13 and all the rows are between the two dates, so the whole 26 rows is always pulled out. In this case it is doing exactly what I expect. It is pulling all rows from the t_search_reqt_dates table, then pulling the relevant records from the booking_plan and then hashing with staff_booking. Excellent performance. The problem is I don't need the clauses for search_id and reqt_dates as the whole table is always read anyway. The good plan is because the planner thinks just one row will be read from t_search_reqt_dates. If I remove the redundant clauses, the planner now estimates 1000 rows returned from the table, not unreasonable since it has no statistics.
But *why* in that case, with *more* estimated rows does it choose to materialize that table (26 rows) 573416 times!!! Whenever it estimates more than one row it chooses the bad plan. I really want to remove the redundant clauses, but I can't. If I analyse the table, then it knows there are 26 rows and chooses the bad plan whatever I do. Any ideas??? Cheers, Gary. Plans for above query:

Good QUERY PLAN
Unique  (cost=15440.83..15447.91 rows=462 width=4) (actual time=1342.000..1342.000 rows=110 loops=1)
  ->  Sort  (cost=15440.83..15444.37 rows=7081 width=4) (actual time=1342.000..1342.000 rows=2173 loops=1)
        Sort Key: b.staff_id
        ->  Hash Join  (cost=10784.66..15350.26 rows=7081 width=4) (actual time=601.000..1331.000 rows=2173 loops=1)
              Hash Cond: ("outer".booking_id = "inner".booking_id)
              ->  Seq Scan on staff_booking b  (cost=0.00..4233.39 rows=197877 width=8) (actual time=0.000..400.000 rows=197877 loops=1)
              ->  Hash  (cost=10781.12..10781.12 rows=7080 width=4) (actual time=591.000..591.000 rows=0 loops=1)
                    ->  Nested Loop  (cost=0.00..10781.12 rows=7080 width=4) (actual time=10.000..581.000 rows=2173 loops=1)
                          Join Filter: (("outer".datetime_from <= "inner".datetime_to) AND ("outer".datetime_to >= "inner".datetime_from))
                          ->  Seq Scan on t_search_reqt_dates rd  (cost=0.00..16.50 rows=1 width=20) (actual time=0.000..0.000 rows=26 loops=1)
                                Filter: ((search_id = 13) AND (reqt_date >= '2004-09-30'::date) AND (reqt_date <= '2005-12-31'::date))
                          ->  Index Scan using booking_plan_idx2 on booking_plan bp  (cost=0.00..10254.91 rows=63713 width=24) (actual time=0.000..11.538 rows=5871 loops=26)
                                Index Cond: ((bp.booking_date >= ("outer".reqt_date - 1)) AND (bp.booking_date <= ("outer".reqt_date + 1)))
Total runtime: 1342.000 ms

Bad QUERY PLAN
Unique  (cost=7878387.29..7885466.50 rows=462 width=4) (actual time=41980.000..41980.000 rows=110 loops=1)
  ->  Sort  (cost=7878387.29..7881926.90 rows=7079211 width=4) (actual time=41980.000..41980.000 rows=2173 loops=1)
        Sort Key: b.staff_id
        ->  Nested Loop  (cost=5314.32..7480762.73 rows=7079211 width=4) (actual time=6579.000..41980.000 rows=2173 loops=1)
              Join Filter: (("inner".datetime_from <= "outer".datetime_to) AND ("inner".datetime_to >= "outer".datetime_from) AND ("outer".booking_date >= ("inner".reqt_date - 1)) AND ("outer".booking_date <= ("inner".reqt_date + 1)))
              ->  Hash Join  (cost=5299.32..26339.73 rows=573416 width=24) (actual time=2413.000..7832.000 rows=573416 loops=1)
                    Hash Cond: ("outer".booking_id = "inner".booking_id)
                    ->  Seq Scan on booking_plan bp  (cost=0.00..7646.08 rows=573416 width=24) (actual time=0.000..1201.000 rows=573416 loops=1)
                    ->  Hash  (cost=4233.39..4233.39 rows=197877 width=8) (actual time=811.000..811.000 rows=0 loops=1)
                          ->  Seq Scan on staff_booking b  (cost=0.00..4233.39 rows=197877 width=8) (actual time=0.000..430.000 rows=197877 loops=1)
              ->  Materialize  (cost=15.00..20.00 rows=1000 width=20) (actual time
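The clash test in the WHERE clause above is the standard interval-overlap predicate: two ranges clash exactly when each one starts no later than the other ends. A minimal sketch of that logic:

```python
# The booking-clash condition from the query, in miniature:
# ranges [a_from, a_to] and [b_from, b_to] overlap exactly when
#   a_from <= b_to  AND  a_to >= b_from
# which is what "rd.datetime_from <= bp.datetime_to and
# rd.datetime_to >= bp.datetime_from" expresses in the SQL.

def clashes(a_from, a_to, b_from, b_to):
    return a_from <= b_to and a_to >= b_from

# A requested slot 10:00-12:00 against some existing bookings:
assert clashes(10, 12, 11, 13)      # overlaps the end of the slot
assert clashes(10, 12, 9, 10)       # touches the boundary: still a clash
assert not clashes(10, 12, 13, 15)  # strictly after: no clash
```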
Re: [PERFORM] Odd planner choice?
On 8 Oct 2004 at 16:04, Tom Lane wrote: Gary Doades [EMAIL PROTECTED] writes: If I remove the redundant clauses, the planner now estimates 1000 rows returned from the table, not unreasonable since it has no statistics. But *why* in that case, with *more* estimated rows does it choose to materialize that table (26 rows) 573416 times!!! It isn't. It's materializing that once and scanning it 573416 times, once for each row in the outer relation. And this is not a bad plan given the estimates. If it had stuck to what you call the good plan, and there *had* been 1000 rows in the temp table, that plan would have run 1000 times longer than it did. As a general rule, if your complaint is that you get a bad plan for an unanalyzed table, the response is going to be so analyze the table. The problem in this case is that if I *do* analyse the table I *always* get the bad plan. Bad in this case meaning the query takes a lot longer. I'm still not sure why it can't choose the better plan by just reading the 26 rows once and index scanning the booking_plan table 26 times (as in the good plan). OK, with a 1000-row estimate I can see that index scanning 1000 times into the booking_plan table would take some time, but even if the planner estimates 5 rows it still produces the same slow query. If I analyze the table it then knows there are 26 rows and therefore always goes slow. This is why I am not analyzing this table: to fool the planner into thinking there is only one row and produce a much faster access plan. Not ideal, I know.
Just using one redundant clause I now get:

select distinct b.staff_id
from staff_booking b, booking_plan bp, t_search_reqt_dates rd
where b.booking_id = bp.booking_id
and rd.datetime_from <= bp.datetime_to
and rd.datetime_to >= bp.datetime_from
and bp.booking_date between rd.reqt_date-1 and rd.reqt_date+1
and rd.search_id = 13

QUERY PLAN
Unique  (cost=50885.97..50921.37 rows=462 width=4) (actual time=35231.000..35241.000 rows=110 loops=1)
  ->  Sort  (cost=50885.97..50903.67 rows=35397 width=4) (actual time=35231.000..35241.000 rows=2173 loops=1)
        Sort Key: b.staff_id
        ->  Hash Join  (cost=44951.32..50351.07 rows=35397 width=4) (actual time=34530.000..35231.000 rows=2173 loops=1)
              Hash Cond: ("outer".booking_id = "inner".booking_id)
              ->  Seq Scan on staff_booking b  (cost=0.00..4233.39 rows=197877 width=8) (actual time=0.000..351.000 rows=197877 loops=1)
              ->  Hash  (cost=44933.62..44933.62 rows=35397 width=4) (actual time=34530.000..34530.000 rows=0 loops=1)
                    ->  Nested Loop  (cost=15.50..44933.62 rows=35397 width=4) (actual time=8342.000..34520.000 rows=2173 loops=1)
                          Join Filter: (("inner".datetime_from <= "outer".datetime_to) AND ("inner".datetime_to >= "outer".datetime_from) AND ("outer".booking_date >= ("inner".reqt_date - 1)) AND ("outer".booking_date <= ("inner".reqt_date + 1)))
                          ->  Seq Scan on booking_plan bp  (cost=0.00..7646.08 rows=573416 width=24) (actual time=0.000..1053.000 rows=573416 loops=1)
                          ->  Materialize  (cost=15.50..15.53 rows=5 width=20) (actual time=0.001..0.019 rows=26 loops=573416)
                                ->  Seq Scan on t_search_reqt_dates rd  (cost=0.00..15.50 rows=5 width=20) (actual time=0.000..0.000 rows=26 loops=1)
                                      Filter: (search_id = 13)
Total runtime: 35241.000 ms

If this is the only answer for now, then fair enough, I will just have to do more testing. Regards, Gary. ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Optimizing a request
On 31 Aug 2004 at 20:59, Jean-Max Reymond wrote: hi, I want to optimize the following request and avoid the seq scan on the table article (1000 rows). explain SELECT art_id, art_titre, art_texte, rub_titre FROM article inner join rubrique on article.rub_id = rubrique.rub_id where rub_parent = 8;

Hash Join  (cost=8.27..265637.59 rows=25 width=130)
  Hash Cond: ("outer".rub_id = "inner".rub_id)
  ->  Seq Scan on article  (cost=0.00..215629.00 rows=1000 width=108)
  ->  Hash  (cost=8.26..8.26 rows=3 width=22)
        ->  Index Scan using rubrique_parent on rubrique  (cost=0.00..8.26 rows=3 width=22)
              Index Cond: (rub_parent = 8)

thanks for your answers, -- Have you run ANALYZE on this database after creating the indexes or loading the data? What percentage of rows in the article table are likely to match the keys selected from the rubrique table? If it is likely to fetch a high proportion of the rows from article then it may be best that a seq scan is performed. What are your non-default postgresql.conf settings? It may be better to increase default_statistics_target (to say 100 to 200) before running ANALYZE and then re-run the query. Cheers, Gary.
Re: [PERFORM] Optimizing a request
On 31 Aug 2004 at 21:42, Jean-Max Reymond wrote: - Original Message - From: Gary Doades [EMAIL PROTECTED] Date: Tue, 31 Aug 2004 20:21:49 +0100 Subject: Re: [PERFORM] Optimizing a request To: [EMAIL PROTECTED] Have you run ANALYZE on this database after creating the indexes or loading the data? the indexes are created and the data loaded and then, I run vacuum analyze. What percentage of rows in the article table are likely to match the keys selected from the rubrique table? only 1 record. If it is likely to fetch a high proportion of the rows from article then it may be best that a seq scan is performed. What are your non-default postgresql.conf settings? It may be better to increase the default_statistics_target (to say 100 to 200) before running ANALYZE and then re-run the query. yes, default_statistics_target is set to the default_value. I have just increased shared_buffers and effective_cache_size to give advantage of 1 Mb RAM I can only presume you mean 1 GB RAM. What exactly are your settings for shared_buffers and effective_cache_size? Can you increase default_statistics_target and re-test? It is possible that with such a large table the distribution of values is skewed and postgres does not realise that an index scan would be better. It seems very odd otherwise that only one row out of 10,000,000 could match and postgres does not realise this. Can you post an explain analyse (not just explain) for this query? Cheers, Gary. ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Optimizing a request
On 31 Aug 2004 at 22:24, Jean-Max Reymond wrote: On Tue, 31 Aug 2004 21:16:46 +0100, Gary Doades [EMAIL PROTECTED] wrote: I can only presume you mean 1 GB RAM. What exactly are your settings for shared buffers and effective_cache_size? for 1 GB RAM, shared_buffers = 65536 effective_cache_size = 16384 This seems like the wrong way round also. You might try: shared_buffers = 16384 effective_cache_size = 65536 Cheers, Gary. ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
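For reference, in 7.4/8.0 both of these settings are counted in 8 KB pages, so for a machine with around 1 GB of RAM a plausible shape is something like this (illustrative values only, to be tuned against the real workload):

```
# postgresql.conf sketch for ~1 GB RAM, 7.4/8.0-era units (8 KB pages)
shared_buffers = 16384          # 16384 x 8 KB = 128 MB for the buffer pool
effective_cache_size = 65536    # 65536 x 8 KB = 512 MB; an estimate of the
                                # OS file cache, not an allocation
```

The key asymmetry: shared_buffers actually allocates memory, while effective_cache_size is only a hint to the planner about how much of the database the kernel is likely to be caching, so the latter is normally the larger of the two.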
Re: [PERFORM] PostgreSQL on VMWare vs Windows vs CoLinux
On 2 Jun 2004 at 16:45, Merlin Moncure wrote: 'better' does not mean 'faster'. Win32 has a pretty decent journaling filesystem (ntfs) and a good I/O subsystem which includes IPC. Process management is poor compared to newer linux kernels but this is unimportant except in extreme cases. Right now the win32 native port does not sync() (but does fsync()). So, the performance is somewhere between fsync = off and fsync = on (probably much closer to fsync = on). It is reasonable to assume that the win32 port will outperform the unix versions at many tasks (at the expense of safety) until the new sync() code is put in. If tested on the same source base, 40-60% differences can only be coming from the I/O subsystem. There are other factors which aren't clear from this exchange, like what version of gcc, etc. Hmm, interesting. I've been running the Win32 port for a couple of weeks now, using the same database as a Linux 2.6 system. Same processor and memory but different disks. The Linux system has 10K rpm SCSI disks, Windows has 7200 rpm serial ATA disks. When a lot of IO is involved the performance differences are very mixed, as I would expect. Sometimes Windows wins, sometimes Linux. BUT, very consistently, when NO IO is involved the Win32 port is always around 20% slower than Linux. Cases where the EXPLAIN ANALYZE results are different I have disregarded. In all the cases where the EXPLAIN ANALYZE results are the same and no IO is involved the Win32 port is slower. Currently I am putting this down to the build/gcc differences. I can't see why there should be this difference otherwise. (memory management??) Regards, Gary. ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] PostgreSQL performance in simple queries
Try using

select * from files_t where parent = ::int8

You have declared parent as int8, but the query will assume int4 for a bare integer literal and may not use the index. Also make sure you have ANALYZEd this table. Regards, Gary.

On 20 May 2004 at 0:07, Eugeny Balakhonov wrote: Hello to all! I have PostgreSQL 7.4 under the latest version of Cygwin and have some problems with performance :( It is very strange... I don't remember this problem with the previous versions of Cygwin and PostgreSQL 7.3. I have only two simple tables:

CREATE TABLE public.files_t
(
  id int8 NOT NULL,
  parent int8,
  size int8 NOT NULL,
  dir bool NOT NULL DEFAULT false,
  ctime timestamp NOT NULL,
  ftime timestamp NOT NULL DEFAULT ('now'::text)::timestamp(6) with time zone,
  name text NOT NULL,
  access varchar(10) NOT NULL,
  host int4 NOT NULL,
  uname text NOT NULL,
  CONSTRAINT pk_files_k PRIMARY KEY (id),
  CONSTRAINT fk_files_k FOREIGN KEY (parent) REFERENCES public.files_t (id) ON UPDATE CASCADE ON DELETE CASCADE,
  CONSTRAINT fk_hosts_k FOREIGN KEY (host) REFERENCES public.hosts_t (id) ON UPDATE CASCADE ON DELETE CASCADE
) WITH OIDS;

and

CREATE TABLE public.hosts_t
(
  id int4 NOT NULL,
  ftime timestamp NOT NULL DEFAULT ('now'::text)::timestamp(6) with time zone,
  utime timestamp NOT NULL DEFAULT ('now'::text)::timestamp(6) with time zone,
  name text NOT NULL,
  address inet NOT NULL,
  CONSTRAINT pk_hosts_k PRIMARY KEY (id)
) WITH OIDS;

Table files_t has 249259 records and table hosts_t has only 59 records. I tried to run a simple query:

select * from files_t where parent = 

This query takes 0.256 seconds! It is a very long time for this small table! I have an index for the field parent:

CREATE INDEX files_parent_idx ON public.files_t USING btree (parent);

But if I try to see the query plan then I see the following text:

Seq Scan on files_t  (cost=0.00..6103.89 rows=54 width=102)
  Filter: (parent = )

PostgreSQL does not use index files_parent_idx!
I have enabled all options of QUERY TUNING in postgresql.conf, and I have increased memory sizes for PostgreSQL:

shared_buffers = 2000        # min 16, at least max_connections*2, 8KB each
sort_mem = 32768             # min 64, size in KB
vacuum_mem = 65536           # min 1024, size in KB
fsync = false                # turns forced synchronization on or off
checkpoint_segments = 3      # in logfile segments, min 1, 16MB each
enable_hashagg = true
enable_hashjoin = true
enable_indexscan = true
enable_mergejoin = true
enable_nestloop = true
enable_seqscan = true
enable_sort = true
enable_tidscan = true
geqo = true
geqo_threshold = 22
geqo_effort = 1
geqo_generations = 0
geqo_pool_size = 0           # default based on tables in statement, range 128-1024
geqo_selection_bias = 2.0    # range 1.5-2.0
stats_start_collector = true
stats_command_string = true
stats_block_level = true
stats_row_level = true
stats_reset_on_server_start = false

Please help me! My database has a very small size (only 249259 records) but it works very slowly :( Best regards Eugeny ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
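One way to confirm whether the cast advice above helps is to compare the plan before and after: the fix has worked when the seq scan turns into an index scan. As a runnable stand-in (SQLite via Python; SQLite does not reproduce the PostgreSQL 7.x int4/int8 literal quirk itself, this only demonstrates the check-the-plan workflow on a cut-down files_t with made-up data):

```python
import sqlite3

# Cut-down files_t with an index on parent; SQLite's EXPLAIN QUERY PLAN
# stands in for PostgreSQL's EXPLAIN to confirm the index is actually used.

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE files_t (id INTEGER PRIMARY KEY,"
            " parent INTEGER, name TEXT)")
cur.executemany("INSERT INTO files_t VALUES (?, ?, ?)",
                [(i, i // 10, "f%d" % i) for i in range(1000)])
cur.execute("CREATE INDEX files_parent_idx ON files_t (parent)")
cur.execute("ANALYZE")

rows = cur.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM files_t WHERE parent = 42"
).fetchall()
detail = " ".join(r[-1] for r in rows)
print(detail)  # expect something like: SEARCH ... USING INDEX files_parent_idx
```

In PostgreSQL the same habit applies: run EXPLAIN on the query with the bare literal and again with the `::int8` cast, and check that the Seq Scan has become an Index Scan.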
Re: [PERFORM] planner/optimizer question
On 29 Apr 2004 at 19:17, Tom Lane wrote: Josh Berkus [EMAIL PROTECTED] writes: Certainly the fact that MSSQL is essentially a single-user database makes things easier for them. Our recent testing (cf the Xeon thread) says that the interlocking we do to make the world safe for multiple backends has a fairly high cost (at least on some hardware) compared to the rest of the work in scenarios where you are doing zero-I/O scans of data already in memory. Especially so for index scans. I'm not sure this completely explains the differential that Gary is complaining about, but it could be part of it. Is it really true that MSSQL doesn't support concurrent operations? regards, tom lane As far as I am aware SQLServer supports concurrent operations. It certainly creates more threads for each connection. None of my observations of the system under load (50 ish concurrent users, 150 ish connections) suggest that it is serializing queries. These tests are currently on single processor Athlon XP 2000+ systems. Regards, Gary. ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] planner/optimizer question
On 30 Apr 2004 at 7:26, Dennis Bjorklund wrote: On Fri, 30 Apr 2004, Gary Doades wrote: I should have also pointed out that MSSQL reported that same index scan as taking 65% of the overall query time. It was just faster. The overall query took 103ms in MSSQL. Are your results based on a single client accessing the database and no concurrent updates? Would adding more clients, and maybe having some client that updates/inserts into the tables, still make mssql faster than pg? Maybe it's as simple as pg being optimized for more concurrent users than mssql? I'm just asking, I don't know much about the inner workings of mssql. -- /Dennis Björklund At the moment it is difficult to set up many clients for testing concurrent stuff. In the past I have had several SQLServer clients under test, mainly select queries. MSSQL can certainly execute queries while other queries are still running in the background. Our production app is fairly well biased towards selects. Currently it is about 70% selects, 20% inserts, 6% deletes and 4% updates. Very few updates affect more than one row based on the primary key. Over 90% of the time spent running SQL is in select queries. My limited concurrent testing on Postgres gives very good performance on updates, inserts and deletes, but it is suffering on the selects in certain areas, which is why I have been concentrating my efforts there. Having got similar (or the same) access plans in both Postgres and MSSQL I was getting down to the next level of checking what was going on when executing the already planned query. I do have another database system I could try: Sybase SQLAnywhere. This is not the original Sybase Enterprise which has the same roots as MSSQL. In the past my testing suggested that SQLAnywhere performance was as good as or better than MSSQL. I may try to set it up with the same data in these tests for a more detailed comparison. Regards, Gary. ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] planner/optimizer question
On 30 Apr 2004 at 8:32, Jeff wrote: A better comparison query may be a simple select a from mytable where a between foo and bar to get an index scan. In that case it's a straight-up, vanilla index scan. Nothing else getting in the way. Yes, you're right, and I have done this just to prove to myself that it is the index scan that is the bottleneck. I have some complex SQL that executes very quickly with Postgres, similar to MSSQL, but the index scans in most of those only touch a few rows for a few loops. It seems to be a problem when the index scan is scanning very many rows and for each of these it has to go to the table just to find out if the index entry it just looked at is still valid. Gary. ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] planner/optimizer question
On 30 Apr 2004 at 9:37, Kevin Barnard wrote: I was always under the impression that MSSQL used leaf and row level locking and therefore was not a concurrent database in the same sense that postgres is. It would still allow for concurrent connections and such, but updates will get blocked/delayed. I might be wrong. Ultimately you may be right. I don't know enough about SQLServer internals to say either way. Anyway, most of our system is in selects 70% of the time. I could try and set up a test for this when I get a bit more time. Unfortunately I suspect that this topic won't get taken much further. Testing this would mean modifying quite a bit of code. Whether putting additional info in the index header and not visiting the data row, if all the required data is in the index, would be beneficial would require quite a bit of work by someone who knows more than I do. I reckon that no-one has the time to do this at the moment. Regards, Gary. ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] planner/optimizer question
I guess the real question is, why maintain index flags and not simply drop the index entry altogether? A more interesting case would be to have the backend process record index tuples that it would invalidate (if committed), then on commit send that list to a garbage collection process. It's still vacuum -- just the reaction time for it would be much quicker. This was my original question. I guess the problem is with MVCC. The row may have gone from your current view of the table but not from someone else's. I don't (yet) understand the way it works well enough to say for sure, but I still think it is worth pursuing further for someone who does know the deep stuff. They seem to have concluded that it is not worth it, however. Cheers, Gary. ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] planner/optimizer question
It's also entirely possible your indices are using inaccurate statistical information. Have you ANALYZEd recently? In this example the statistics don't matter. The plans used were the same for MSSQL and Postgres. I was trying to eliminate the difference in plans between the two, which obviously does make a difference, sometimes in MSSQL's favour and sometimes the other way round. Both systems, having decided to do the same index scan, took noticeably different times. The Postgres database was fully vacuumed and analysed anyway. I agree about MSSQL recovery time. It sucks. This is why they are making a big point about the improved recovery time in Yukon. Although recovery time is important, I see this as an exception, whereas at the moment I am interested in the everyday. Cheers, Gary. ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] planner/optimizer question
On 29 Apr 2004 at 13:54, Josh Berkus wrote: Gary, It's also quite possible that MSSQL simply has a more efficient index scanning implementation than we do. They've certainly had incentive; their storage system sucks big time for random lookups and they need those fast indexes. (just try to build a 1GB adjacency list tree on SQL Server. I dare ya). Certainly the fact that MSSQL is essentially a single-user database makes things easier for them. They don't have to maintain multiple copies of the index tuples in memory. I think that may be our main performance loss. Possibly, but MSSQL certainly uses data from indexes and cuts out the subsequent (possibly random seek) data fetch. This is also why the Index Tuning Wizard often recommends multi-column compound indexes in some cases. I've tried these recommendations on occasion and they certainly speed up the selects significantly. If anything the index scan on the new compound index must be slower than the original single-column index, and yet it still gets the data faster. This indicates to me that it is not the scan (or IO) performance that is making the difference, but not having to go and get the data row. Cheers, Gary. ---(end of broadcast)--- TIP 8: explain analyze is your friend
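The compound-index effect described here can be demonstrated with SQLite standing in, since SQLite (unlike PostgreSQL of this era) can answer a query entirely from a covering index; the table, index names and data below are made up for illustration:

```python
import sqlite3

# Sketch of the "compound index avoids the data-row fetch" effect.
# A single-column index forces a table visit for booking_date; a compound
# index holding both queried columns lets the engine skip the table.

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE booking (booking_id INTEGER PRIMARY KEY,"
            " staff_id INTEGER, booking_date TEXT)")
cur.executemany(
    "INSERT INTO booking (staff_id, booking_date) VALUES (?, ?)",
    [(i % 20, "2004-04-%02d" % (i % 28 + 1)) for i in range(2000)],
)
cur.execute("CREATE INDEX booking_staff_idx ON booking (staff_id)")
cur.execute("CREATE INDEX booking_staff_date_idx"
            " ON booking (staff_id, booking_date)")
cur.execute("ANALYZE")

q = "SELECT staff_id, booking_date FROM booking {} WHERE staff_id = 7"

# Forced through the single-column index: the table must still be visited.
narrow = cur.execute("EXPLAIN QUERY PLAN "
                     + q.format("INDEXED BY booking_staff_idx")).fetchall()
# Left to choose, the planner answers from the compound (covering) index.
auto = cur.execute("EXPLAIN QUERY PLAN " + q.format("")).fetchall()

print(narrow[-1][-1])
print(auto[-1][-1])
```

The second plan reports a COVERING INDEX, the stand-in for what the Index Tuning Wizard achieves in MSSQL; in 2004-era PostgreSQL the visibility check forces the heap visit either way, which is the point of this thread.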
Re: [PERFORM] planner/optimizer question
Having picked out an index scan as being the highest time user I concentrated on that in this case and compared the same index scan on MSSQL. At least MSSQL reported it as an index scan on the same index for the same number of rows. I should have also pointed out that MSSQL reported that same index scan as taking 65% of the overall query time. It was just faster. The overall query took 103ms in MSSQL. Gary. ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] planner/optimizer question
I know you will shoot me down, but... Why is there an entry in the index for a row if the row is not valid? Wouldn't it be better for the index entry validity to track the row validity. If a particular data value for a query (join, where etc.) can be satisfied by the index entry itself this would be a big performance gain. Cheers, Gary. On 28 Apr 2004 at 0:27, Tom Lane wrote: [EMAIL PROTECTED] writes: ... Wouldn't the most efficient plan be to scan the index regardless of crm_id because the only columns needed are in the index? No. People coming from other databases often have the misconception that queries can be answered by looking only at an index. That is never true in Postgres because row validity info is only stored in the table; so we must always visit the table entry to make sure the row is still valid/visible for the current query. Accordingly, columns added to the index that aren't constrained by the WHERE clause are not very useful ... regards, tom lane ---(end of broadcast)--- TIP 8: explain analyze is your friend ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
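The rule Tom states can be seen in a toy model: the index stores only key-to-row pointers, so even a query whose answer lies entirely in the index must visit the table to check each candidate row's validity. (A single visible flag stands in for real MVCC, which records xmin/xmax per tuple and compares them against the current snapshot; the data is invented.)

```python
# Toy model of why a PostgreSQL index scan must visit the table ("heap"):
# the index entry carries no visibility information, only a pointer.

heap = [
    {"id": 1, "crm_id": 7, "visible": True},
    {"id": 2, "crm_id": 7, "visible": False},  # deleted row, not yet vacuumed
    {"id": 3, "crm_id": 9, "visible": True},
]

# Index on crm_id: key -> list of heap positions. Note the dead row's
# entry is still present, exactly as discussed above.
index = {}
for pos, row in enumerate(heap):
    index.setdefault(row["crm_id"], []).append(pos)

def count_via_index(key):
    """Index scan: the index finds candidates, the heap decides validity."""
    return sum(1 for pos in index.get(key, []) if heap[pos]["visible"])

print(count_via_index(7))  # the index holds 2 entries but only 1 row is visible
```

Answering from the index alone would have reported 2 here, which is why "columns added to the index" cannot save the heap visit under this design.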
Re: [PERFORM] planner/optimizer question
I can understand the performance loss on non-selects for keeping the index validity state tracking the row validity, but would that outweigh the performance gains on selects? Depends on your mix of selects to non selects I guess, but other database systems seem to imply that keeping the index on track is worth it overall. Cheers, Gary. On 28 Apr 2004 at 15:04, Christopher Kings-Lynne wrote: Why is there an entry in the index for a row if the row is not valid? Wouldn't it be better for the index entry validity to track the row validity. If a particular data value for a query (join, where etc.) can be satisfied by the index entry itself this would be a big performance gain. For SELECTs, yes - but for INSERT, UPDATE and DELETE it would be a big performance loss. Chris ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] PostgreSQL and Linux 2.6 kernel.
On 5 Apr 2004 at 8:36, Josh Berkus wrote: Point taken, though, SQL Server has done a better job in optimizing for dumb queries. This is something that PostgreSQL needs to work on, as is self-referential updates for large tables, which also tend to be really slow. Mind you, in SQL Server 7 I used to be able to crash the server with a big self-referential update, so this is a common database problem. I agree about the dumb queries (I'm not saying mine are *that* dumb :) ) When you can write SQL that looks right, feels right, gives the right answers during testing and SQLServer runs them really fast, you stop there and tend not to tinker with the SQL further. You *can* (I certainly do) achieve comparable performance with PostgreSQL, but you just have to work harder for it. Now that I have learned the characteristics of both servers I can write SQL that is pretty good on both. I suspect that there are people who evaluate PostgreSQL by executing their favorite SQLServer queries against it, see that it is slower and never bother to go further. Cheers, Gary. ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] PostgreSQL and Linux 2.6 kernel.
Possibly. A lot of my queries show comparable performance, some a little slower and a few a little faster. There are a few, however, that really grind on PostgreSQL. I am learning patterns from these to try and target the most likely performance problems to come, and hand tune these types of SQL. I'm not complaining about PostgreSQL or saying that SQLServer is better; in most cases it is not. SQLServer seems to be more predictable and forgiving in performance, which tends to make for lazy SQL programming. It also has implications when the SQL is dynamically created based on user input: there are more chances of PostgreSQL hitting a performance problem than SQLServer. Overall I'm still very impressed with PostgreSQL. Given the $7000 per processor licence for SQLServer, the case for PostgreSQL is even stronger! Cheers, Gary. On 3 Apr 2004 at 17:43, Aaron Werman wrote: Almost any cross dbms migration shows a drop in performance. The engine effectively trains developers and administrators in what works and what doesn't. The initial migration thus compares a tuned to an untuned version. /Aaron - Original Message - From: Josh Berkus [EMAIL PROTECTED] To: Gary Doades [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Saturday, April 03, 2004 1:59 PM Subject: Re: [PERFORM] PostgreSQL and Linux 2.6 kernel. Gary, There are no indexes on the columns involved in the update, they are not required for my usual select statements. This is an attempt to slightly denormalise the design to get the performance up comparable to SQL Server 2000. We hope to move some of our databases over to PostgreSQL later in the year and this is part of the ongoing testing. SQLServer's query optimiser is a bit smarter than PostgreSQL's (yet) so I am hand optimising some of the more frequently used SQL and/or tweaking the database design slightly. Hmmm ... that hasn't been my general experience on complex queries. However, it may be due to a difference in ANALYZE statistics.
I'd love to see you increase your default_stats_target, re-analyze, and see if PostgreSQL gets smarter. -- -Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org -- Incoming mail is certified Virus Free. Checked by AVG Anti-Virus (http://www.grisoft.com). Version: 7.0.230 / Virus Database: 262.6.5 - Release Date: 31/03/2004 ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] PostgreSQL and Linux 2.6 kernel.
Unfortunately I don't understand the question! My background is primarily Win32. The last time I used a *nix OS was about 20 years ago, apart from occasional dips into the Linux OS over the past few years. If you can tell me how to find out what you want I will gladly give you the information. Regards, Gary.

On 3 Apr 2004 at 16:52, Cott Lang wrote: On Sat, 2004-04-03 at 03:50, Gary Doades wrote: On 2 Apr 2004 at 22:36, [EMAIL PROTECTED] wrote: OK, some more detail: Before wiping 2.4 off my test box for the second time: Perhaps I missed it, but which io scheduler are you using under 2.6?
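For reference, Cott's question can be answered on a 2.6 kernel via sysfs, along these lines (a sketch; the bracketed name in the output is the scheduler currently in use, e.g. `noop anticipatory [deadline] cfq`):

```shell
# Print the active I/O scheduler for each block device (2.6 kernels only;
# 2.4 had a single elevator with no sysfs knob).
for f in /sys/block/*/queue/scheduler; do
    [ -r "$f" ] || continue          # skip devices without a readable entry
    printf '%s: %s\n' "$f" "$(cat "$f")"
done
```

The default can also be selected at boot with the kernel parameter `elevator=` (e.g. `elevator=deadline`).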
Re: [PERFORM] PostgreSQL and Linux 2.6 kernel.
Unfortunately I have to try and keep both SQLServer and PostgreSQL compatibility. Our main web application is currently SQLServer, but we want to migrate customers who don't care what the DB server is over to PostgreSQL. Some of our larger customers demand SQLServer, you know how it is! I don't want to maintain two sets of code or SQL, so I am trying to find common ground. The code is not a problem, but the SQL sometimes is. Cheers, Gary.

On 3 Apr 2004 at 17:43, Aaron Werman wrote: Almost any cross-DBMS migration shows a drop in performance. The engine effectively trains developers and administrators in what works and what doesn't. The initial migration thus compares a tuned to an untuned version. /Aaron

- Original Message - From: Josh Berkus [EMAIL PROTECTED] To: Gary Doades [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Saturday, April 03, 2004 1:59 PM Subject: Re: [PERFORM] PostgreSQL and Linux 2.6 kernel. Gary, There are no indexes on the columns involved in the update; they are not required for my usual select statements. This is an attempt to slightly denormalise the design to get the performance up comparable to SQL Server 2000. We hope to move some of our databases over to PostgreSQL later in the year and this is part of the ongoing testing. SQLServer's query optimiser is a bit smarter than PostgreSQL's (yet) so I am hand optimising some of the more frequently used SQL and/or tweaking the database design slightly. Hmmm ... that hasn't been my general experience on complex queries. However, it may be due to a difference in ANALYZE statistics. I'd love to see you increase your default_stats_target, re-analyze, and see if PostgreSQL gets smarter. -- -Josh Berkus Aglio Database Solutions San Francisco
Re: [PERFORM] PostgreSQL and Linux 2.6 kernel.
On 3 Apr 2004 at 21:23, Mike Nolan wrote: Almost any cross dbms migration shows a drop in performance. The engine effectively trains developers and administrators in what works and what doesn't. The initial migration thus compares a tuned to an untuned version. I think it is also possible that Microsoft has more programmers working on tuning issues for SQL Server than PostgreSQL has working on the whole project. -- Mike Nolan

Agreed. Also, considering the high price of SQLServer, it is in their interests to spend a lot of resources on tuning/performance to give it a commercial edge over its rivals and in silly benchmark scores. Cheers, Gary.
Re: [PERFORM] PostgreSQL and Linux 2.6 kernel.
Actually it hasn't been my experience either. Most of my queries against the database, large and small, are either a little quicker or show no real difference. I have only really noticed big differences under stress, when memory (RAM) is being squeezed. The main winner on 2.6 seems to be write performance and memory management. Unfortunately I only have one test machine and I can't really keep switching between 2.4 and 2.6 to do the comparisons. I had written down 27 timings from a set of SQL of varying complexity using the 2.4 kernel. Each SQL statement was executed 10 times and the average of the last 5 was used. I can only really compare those timings against the new installation on 2.6. I know that this is not ideal real-world testing, but it is good enough for me at the moment. Unless anyone has contradictory indications then I will proceed with 2.6.

I did increase the default stats target from 10 to 50 and re-analysed. The explain numbers are slightly different, but the time to run was almost the same. Not surprising since the plan was the same.

QUERY PLAN
Merge Join  (cost=0.00..192636.20 rows=2845920 width=92)
  Merge Cond: ("outer".reqt_id = "inner".reqt_id)
  ->  Index Scan using order_reqt_pkey on order_reqt r  (cost=0.00..52662.40 rows=2206291 width=6)
  ->  Index Scan using staff_book_idx2 on staff_booking  (cost=0.00..102529.28 rows=2845920 width=90)

On 3 Apr 2004 at 10:59, Josh Berkus wrote: Gary, There are no indexes on the columns involved in the update; they are not required for my usual select statements. This is an attempt to slightly denormalise the design to get the performance up comparable to SQL Server 2000. We hope to move some of our databases over to PostgreSQL later in the year and this is part of the ongoing testing. SQLServer's query optimiser is a bit smarter than PostgreSQL's (yet) so I am hand optimising some of the more frequently used SQL and/or tweaking the database design slightly. Hmmm ... that hasn't been my general experience on complex queries. However, it may be due to a difference in ANALYZE statistics. I'd love to see you increase your default_stats_target, re-analyze, and see if PostgreSQL gets smarter. -- -Josh Berkus Aglio Database Solutions San Francisco
Re: [PERFORM] PostgreSQL and Linux 2.6 kernel.
Thanks, I know about set showplan_text, but it is only the equivalent of explain, not explain analyze. The graphical plan gives full statistics: runtime, percentage cost, loop execution counts etc., which is much more useful. I don't know of a way of getting the graphical plan content in text form. Cheers, Gary.

On 3 Apr 2004 at 6:50, @g v t c wrote: Use Set Show_Plan or something of the sort in Query Analyzer. Then run your SQL. This will change the graphical plan to a text plan similar to Postgresql, or at least something close to readable.

Gary Doades wrote: On 2 Apr 2004 at 22:36, [EMAIL PROTECTED] wrote: OK, some more detail: Before wiping 2.4 off my test box for the second time:

SQL Statement for update:

update staff_booking set time_from = r.time_from
from order_reqt r where r.reqt_id = staff_booking.reqt_id;

Explain: (on 2.4)

QUERY PLAN
Merge Join  (cost=0.00..185731.30 rows=2845920 width=92)
  Merge Cond: ("outer".reqt_id = "inner".reqt_id)
  ->  Index Scan using order_reqt_pkey on order_reqt r  (cost=0.00..53068.20 rows=2206291 width=6)
  ->  Index Scan using staff_book_idx2 on staff_booking  (cost=0.00..99579.21 rows=2845920 width=90)

Total execution time: 18 hours 12 minutes
vacuum full analyze: total time 3 hours 22 minutes

Wait 2 hours for re-install of 2.6, set params etc., restore database.
Same SQL Statement Explain: (on 2.6)

QUERY PLAN
Merge Join  (cost=0.00..209740.24 rows=2845920 width=92)
  Merge Cond: ("outer".reqt_id = "inner".reqt_id)
  ->  Index Scan using order_reqt_pkey on order_reqt r  (cost=0.00..50734.20 rows=2206291 width=6)
  ->  Index Scan using staff_book_idx2 on staff_booking  (cost=0.00..117921.92 rows=2845920 width=90)

Total execution time: 2 hours 53 minutes
vacuum full analyze: total time 1 hour 6 minutes

Table definitions for the two tables involved:

CREATE TABLE ORDER_REQT
(
    REQT_ID           SERIAL,
    ORDER_ID          integer NOT NULL,
    DAYOFWEEK         smallint NOT NULL CHECK (DAYOFWEEK BETWEEN 0 AND 6),
    TIME_FROM         smallint NOT NULL CHECK (TIME_FROM BETWEEN 0 AND 1439),
    DURATION          smallint NOT NULL CHECK (DURATION BETWEEN 0 AND 1439),
    PRODUCT_ID        integer NOT NULL,
    NUMBER_REQT       smallint NOT NULL DEFAULT (1),
    WROPTIONS         integer NOT NULL DEFAULT 0,
    UID_REF           integer NOT NULL,
    DT_STAMP          timestamp NOT NULL DEFAULT current_timestamp,
    Sentinel_Priority integer NOT NULL DEFAULT 0,
    PERIOD            smallint NOT NULL DEFAULT 1 CHECK (PERIOD BETWEEN -2 AND 4),
    FREQUENCY         smallint NOT NULL DEFAULT 1,
    PRIMARY KEY (REQT_ID)
);

CREATE TABLE STAFF_BOOKING
(
    BOOKING_ID        SERIAL,
    REQT_ID           integer NOT NULL,
    ENTITY_TYPE       smallint NOT NULL DEFAULT 3 CHECK (ENTITY_TYPE IN (3,4)),
    STAFF_ID          integer NOT NULL,
    CONTRACT_ID       integer NOT NULL,
    TIME_FROM         smallint NOT NULL CHECK (TIME_FROM BETWEEN 0 AND 1439),
    DURATION          smallint NOT NULL CHECK (DURATION BETWEEN 0 AND 1439),
    PERIOD            smallint NOT NULL DEFAULT 1 CHECK (PERIOD BETWEEN -2 AND 4),
    FREQUENCY         smallint NOT NULL DEFAULT 1,
    TRAVEL_TO         smallint NOT NULL DEFAULT 0,
    UID_REF           integer NOT NULL,
    DT_STAMP          timestamp NOT NULL DEFAULT current_timestamp,
    SELL_PRICE        numeric(10,4) NOT NULL DEFAULT 0,
    COST_PRICE        numeric(10,4) NOT NULL DEFAULT 0,
    MIN_SELL_PRICE    numeric(10,4) NOT NULL DEFAULT 0,
    MIN_COST_PRICE    numeric(10,4) NOT NULL DEFAULT 0,
    Sentinel_Priority integer NOT NULL DEFAULT 0,
    CHECK_INTERVAL    smallint NOT NULL DEFAULT 0,
    STATUS            smallint NOT NULL DEFAULT 0,
    WROPTIONS         integer NOT NULL DEFAULT 0,
    PRIMARY KEY (BOOKING_ID)
);

Foreign keys:

ALTER TABLE ORDER_REQT ADD FOREIGN KEY (ORDER_ID) REFERENCES MAIN_ORDER (ORDER_ID) ON DELETE CASCADE;
ALTER TABLE ORDER_REQT ADD FOREIGN KEY (PRODUCT_ID) REFERENCES PRODUCT (PRODUCT_ID);
ALTER TABLE STAFF_BOOKING ADD FOREIGN KEY (CONTRACT_ID) REFERENCES STAFF_CONTRACT (CONTRACT_ID);
ALTER TABLE STAFF_BOOKING ADD FOREIGN KEY (STAFF_ID) REFERENCES STAFF (STAFF_ID);

Indexes:

CREATE INDEX FK_IDX_ORDER_REQT ON ORDER_REQT ( ORDER_ID
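On the PostgreSQL side, the per-node runtime statistics Gary misses from SQLServer's graphical plan are exactly what EXPLAIN ANALYZE reports: actual times, row counts and loop counts for every plan node. A sketch using the update from this thread; note that EXPLAIN ANALYZE really executes the statement, so wrap a destructive one in a transaction you can roll back:

```sql
BEGIN;

EXPLAIN ANALYZE
UPDATE staff_booking
   SET time_from = r.time_from
  FROM order_reqt r
 WHERE r.reqt_id = staff_booking.reqt_id;

ROLLBACK;  -- discard the changes; we only wanted the instrumented plan
```

Comparing the "rows=" estimate against the "actual ... rows=" figure at each node is the quickest way to spot where the planner's statistics are leading it astray.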
Re: [PERFORM] PostgreSQL and Linux 2.6 kernel.
Following on from Josh's response and my previous reply on SQLServer planning. The main problem query is this one:

SELECT VS.*, VL.TEL1, SC.CONTRACT_ID, SC.CONTRACT_REF, SC.MAX_HOURS, SC.MIN_HOURS,
  (SELECT COUNT(*) FROM TIMESHEET_DETAIL
     JOIN MAIN_ORDER ON (MAIN_ORDER.ORDER_ID = TIMESHEET_DETAIL.ORDER_ID AND MAIN_ORDER.CLIENT_ID = 6)
   WHERE TIMESHEET_DETAIL.CONTRACT_ID = SC.CONTRACT_ID) AS VISITS,
  (SELECT (SUM(R.DURATION+1))/60.0 FROM ORDER_REQT R
     JOIN STAFF_BOOKING B ON (B.REQT_ID = R.REQT_ID)
     JOIN BOOKING_PLAN BP ON (BP.BOOKING_ID = B.BOOKING_ID)
   WHERE B.CONTRACT_ID = SC.CONTRACT_ID
     AND BP.BOOKING_DATE BETWEEN '2004-06-12' AND '2004-06-18') AS RHOURS
FROM VSTAFF VS
JOIN STAFF_CONTRACT SC ON (SC.STAFF_ID = VS.STAFF_ID)
JOIN VLOCATION VL ON (VL.LOCATION_ID = VS.LOCATION_ID)
JOIN SEARCH_REQT_RESULT SR ON (SR.STAFF_ID = VS.STAFF_ID)
WHERE SR.SEARCH_ID = 1
AND SC.CONTRACT_ID IN
  (SELECT C.CONTRACT_ID FROM STAFF_PRODUCT P, STAFF_CONTRACT C
   WHERE P.CONTRACT_ID = C.CONTRACT_ID AND C.STAFF_ID = VS.STAFF_ID
     AND P.PRODUCT_ID IN (SELECT PRODUCT_ID FROM SEARCH_ORDER_REQT WHERE SEARCH_ID = 1)
     AND C.AVAIL_DATE_FROM <= '2004-06-12' AND C.AVAIL_DATE_TO >= '2004-06-18'
   GROUP BY C.CONTRACT_ID
   HAVING (COUNT(C.CONTRACT_ID) =
     (SELECT COUNT(DISTINCT PRODUCT_ID) FROM SEARCH_ORDER_REQT WHERE SEARCH_ID = 1)))

The explain analyze is:

QUERY PLAN
Nested Loop  (cost=101.54..1572059.57 rows=135 width=152) (actual time=13749.100..1304586.501 rows=429 loops=1)
  InitPlan
    ->  Index Scan using fk_idx_wruserarea on wruserarea  (cost=3.26..6.52 rows=1 width=4) (actual time=0.944..0.944 rows=1 loops=1)
          Index Cond: (area_id = 1)
          Filter: (uid = $4)
  InitPlan
    ->  Seq Scan on wruser  (cost=0.00..3.26 rows=1 width=4) (actual time=0.686..0.691 rows=1 loops=1)
          Filter: ((username)::name = current_user())
  ->  Hash Join  (cost=95.02..3701.21 rows=215 width=138) (actual time=100.476..1337.392 rows=429 loops=1)
        Hash Cond: ("outer".staff_id = "inner".staff_id)
        Join Filter: (subplan)
        ->  Seq Scan on staff_contract sc  (cost=0.00..33.24 rows=1024 width=37) (actual time=0.114..245.366 rows=1024 loops=1)
        ->  Hash  (cost=93.95..93.95 rows=430 width=109) (actual time=38.563..38.563 rows=0 loops=1)
              ->  Hash Join  (cost=47.47..93.95 rows=430 width=109) (actual time=15.502..36.627 rows=429 loops=1)
                    Hash Cond: ("outer".staff_id = "inner".staff_id)
                    ->  Seq Scan on staff  (cost=34.61..66.48 rows=1030 width=105) (actual time=9.655..15.264 rows=1030 loops=1)
                          Filter: ((hashed subplan) OR $5)
                          SubPlan
                            ->  Seq Scan on staff_area  (cost=10.73..33.38 rows=493 width=4) (actual time=8.452..8.452 rows=0 loops=1)
                                  Filter: ((hashed subplan) OR (area_id = 1))
                                  SubPlan
                                    ->  Seq Scan on wruserarea  (cost=3.26..10.72 rows=5 width=4) (actual time=0.977..1.952 rows=1 loops=1)
                                          Filter: (uid = $1)
                                          InitPlan
                                            ->  Seq Scan on wruser  (cost=0.00..3.26 rows=1 width=4) (actual time=0.921..0.926 rows=1 loops=1)
                                                  Filter: ((username)::name = current_user())
                    ->  Hash  (cost=11.79..11.79 rows=430 width=4) (actual time=5.705..5.705 rows=0 loops=1)
                          ->  Index Scan using fk_idx_search_reqt_result on search_reqt_result sr  (cost=0.00..11.79 rows=430 width=4) (actual time=0.470..4.482 rows=429 loops=1)
                                Index Cond: (search_id = 1)
        SubPlan
          ->  HashAggregate  (cost=8.32..8.32 rows=1 width=4) (actual time=2.157..2.157 rows=1 loops=429)
                Filter: (count(contract_id) = $9)
                InitPlan
                  ->  Aggregate  (cost=1.04..1.04 rows=1 width=4) (actual time=0.172..0.173 rows=1 loops=1)
                        ->  Seq Scan on search_order_reqt  (cost=0.00..1.04 rows=1 width=4) (actual time=0.022..0.038 rows=1 loops=1)
                              Filter: (search_id = 1)
                ->  Hash IN Join  (cost=1.04..7.27 rows=1 width=4) (actual time=2.064..2.117 rows=1 loops=429)
                      Hash Cond: ("outer".product_id = "inner".product_id)
                      ->  Nested Loop  (cost=0.00..6.19 rows=7 width=8) (actual time=1.112..2.081 rows=8 loops=429)
                            ->  Index Scan using fk_idx_staff_contract_2 on staff_contract c  (cost=0.00..3.03 rows=1 width=4) (actual time=0.206..0.245 rows=1 loops=429)
                                  Index Cond: (staff_id = $8)
                                  Filter: ((avail_date_from <= '2004-06-12'::date) AND (avail_date_to >= '2004-06-18'::date))
                            ->  Index Scan using
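One generic technique worth trying on a query like this (a hypothetical rewrite sketch, not something from the thread) is to compute a correlated scalar subquery such as RHOURS once as an aggregated derived table and join it in, instead of letting the planner run the subplan once per output row:

```sql
-- Sketch: pre-aggregate booked hours per contract for the week, then this
-- result can be LEFT JOINed into the main query on CONTRACT_ID rather than
-- evaluating the correlated RHOURS subquery for every row.
SELECT B.CONTRACT_ID,
       SUM(R.DURATION + 1) / 60.0 AS RHOURS
  FROM STAFF_BOOKING B
  JOIN ORDER_REQT   R  ON R.REQT_ID = B.REQT_ID
  JOIN BOOKING_PLAN BP ON BP.BOOKING_ID = B.BOOKING_ID
 WHERE BP.BOOKING_DATE BETWEEN '2004-06-12' AND '2004-06-18'
 GROUP BY B.CONTRACT_ID;
```

Whether this wins depends on how many contracts match overall; it trades 429 subplan executions for one aggregation pass over the week's bookings.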
[PERFORM] PostgreSQL and Linux 2.6 kernel.
As part of my ongoing evaluation of PostgreSQL I have been doing a little stress testing. I thought I would share an interesting result here.

Machine spec: 500 MHz PIII, 256MB RAM, old-ish IDE HD (5400RPM), Linux 2.4.22 kernel (Mandrake 9.2).

I have PostgreSQL 7.4.1 installed and have managed to load up a 1.4 GB database from MS SQLServer. Vacuum analyzed it. As a test in PostgreSQL I issued a statement to update a single column of a table containing 2.8 million rows with the values of a column in a table with a similar rowcount. Using the above spec I had to stop the server after 17 hours. The poor thing was thrashing the hard disk and doing more swapping than useful work.

Having obtained a copy of Mandrake 10.0 with the 2.6 kernel I thought I would give it a go. Same hardware. Same setup. Same database loaded up. Same postgresql.conf file to make sure all the settings were the same. Vacuum analyzed it. The same update statement COMPLETED in 2 hours 50 minutes. I'm impressed.

I could see from vmstat that the system was achieving much greater IO throughput than the 2.4 kernel. Although the system was still swapping, there seems to be a completely different memory management pattern that suits PostgreSQL very well.

Just to see that this wasn't a coincidence I am repeating the test. It is now into the 14th hour using the old 2.4 kernel. I'm going to give up. Has anyone else done any comparative testing with the 2.6 kernel? Cheers, Gary.
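The vmstat observation can be reproduced with something like this while the update runs (a sketch; exact column layout varies between procps versions). The `bi`/`bo` columns show block I/O in/out and `si`/`so` the swap traffic:

```shell
# Sample system activity once per second, two samples; the first line is an
# average since boot, so the second is the interesting one.
if command -v vmstat >/dev/null 2>&1; then
    vmstat 1 2
else
    # vmstat ships with the procps package on most distributions
    echo "vmstat not found: install procps"
fi
```

Sustained high `so` (swap-out) alongside low `bo` during the update is the signature of the thrashing described above.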
Re: [PERFORM] PostgreSQL and Linux 2.6 kernel.
The post was not intended to be content-rich, just my initial feedback after only just switching to 2.6. Since I had largely given up on this particular line of attack using 2.4, I didn't think to do a detailed analysis at the time. I was also hoping that others would add to the discussion. As this could become important I will be doing more analysis, but due to the nature of the issue, and trying to keep as many factors constant as possible, this may take some time. Cheers, Gary.

On 2 Apr 2004 at 1:32, Tom Lane wrote: Gary Doades [EMAIL PROTECTED] writes: As a test in PostgreSQL I issued a statement to update a single column of a table containing 2.8 million rows with the values of a column in a table with similar rowcount. Using the above spec I had to stop the server after 17 hours. The poor thing was thrashing the hard disk and doing more swapping than useful work. This statement is pretty much content-free, since you did not show us the table schemas, the query, or the EXPLAIN output for the query. (I'll forgive you the lack of EXPLAIN ANALYZE, but you could easily have provided all the other hard facts.) There's really no way to tell where the bottleneck is. Maybe it's a kernel-level issue, but I would not bet on that without more evidence. I'd definitely not bet on it without direct confirmation that the same query plan was used in both setups. regards, tom lane