Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
BTW, http://stats.distributed.net/~decibel/base.log is a test I ran; select count(*) was ~6x faster than explain analyze select *.

On Tue, Apr 26, 2005 at 07:46:52PM -0700, Kevin Brown wrote:

Josh Berkus wrote:

Jim, Kevin,

Hrm... I was about to suggest timing just the query (and not output/data transfer time) using explain analyze, but then I remembered that explain analyze can incur some non-trivial overhead with the timing calls. Is there a way to run the query but have psql ignore the output? If so, you could use \timing.

Would timing SELECT COUNT(*) FROM (query) work?

Just \timing would work fine; PostgreSQL doesn't return anything until it has the whole result set.

Hmm... does \timing show the amount of elapsed time between query start and the first results handed to it by the database (even if the database itself has prepared the entire result set for transmission by that time), or between query start and the last result handed to it by the database? Because if it's the latter, then things like server-client network bandwidth are going to affect the results that \timing shows, and it won't necessarily give you a good indicator of how well the database backend is performing.

I would expect that timing SELECT COUNT(*) FROM (query) would give you an idea of how the backend is performing, because the amount of result-set data that has to go over the wire is trivial.

Each is, of course, useful in its own right, and you want to be able to measure both (so, for instance, you can get an idea of just how much your network affects the overall performance of your queries).

That's why MSSQL vs. PostgreSQL timing comparisons are deceptive unless you're careful: MSSQL returns the results one block at a time, and reports execution time as the time required to return the *first* block, as opposed to Postgres, which reports the time required to return the whole dataset.

Interesting. I had no idea MSSQL did that, but I can't exactly say I'm surprised.
:-)

--
Kevin Brown [EMAIL PROTECTED]

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly

--
Jim C. Nasby, Database Consultant [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?
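The timing options discussed in this thread can be sketched as a psql session. This is a sketch, not a definitive recipe; the table name "bigtab" is hypothetical:

```sql
-- \timing reports elapsed time at the psql client, so it includes
-- network transfer of the whole result set.
\timing on

-- Discard the output so client-side rendering doesn't pollute the number:
\o /dev/null
SELECT * FROM bigtab;
\o

-- Wrap the query in COUNT(*) so only one row crosses the wire; this
-- times mostly backend work (plus the cost of counting):
SELECT count(*) FROM (SELECT * FROM bigtab) AS sub;

-- EXPLAIN ANALYZE runs the query with per-node instrumentation; the
-- timing calls themselves can add the non-trivial overhead noted above:
EXPLAIN ANALYZE SELECT * FROM bigtab;
```

Comparing the three numbers separates backend execution time from transfer time and from instrumentation overhead.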
Re: [ODBC] [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
Thanks, we will try that. We are working on a test suite for the way our app gets data (ODBC). We plan to include updates, inserts, and selects, and all three at once, with a log of the results. Then we should use a stress-test tool to see how it works with multiple instances (I used Microsoft's tool last time I did stress testing).

Joel Fradkin
Wazagua, Inc.
2520 Trailmate Dr
Sarasota, Florida 34243
Tel. 941-753-7111 ext 305
[EMAIL PROTECTED]
www.wazagua.com
Powered by Wazagua
Providing you with the latest Web-based technology and advanced tools.
© 2004. WAZAGUA, Inc. All rights reserved. WAZAGUA, Inc
This email message is for the use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and delete and destroy all copies of the original message, including attachments.

-----Original Message-----
From: Marko Ristola [mailto:[EMAIL PROTECTED]
Sent: Sunday, April 24, 2005 2:15 AM
To: Joel Fradkin
Cc: 'Mohan, Ross'; [EMAIL PROTECTED]; pgsql-performance@postgresql.org
Subject: Re: [ODBC] [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon

Here is how you can receive all one billion rows in pieces of 2048 rows. This changes PostgreSQL and ODBC behaviour. Change the ODBC data source configuration in the following way:

Fetch = 2048
UseDeclareFetch = 1

It does not create core dumps on 32-bit computers with billions of rows! This is a bit slower than fetching all rows at once; scalability sometimes means a bit less speed :(

With UseDeclareFetch=1 you might get even 150 thousand rows per second. With UseDeclareFetch=0 the backend might be able to send about 200 thousand rows per second. These high numbers come when all the results are already in memory and no disc accesses are needed. These are about the peak speeds with VARCHAR, without Unicode, on an Athlon64 home computer. With a sequential disc scan, a more typical fetching speed is about 50-100 thousand rows per second. PostgreSQL ODBC row-fetching speed is very good. Perhaps with better discs, with RAID10, the current upper limit of about 200 thousand rows per second could be achieved?? So the in-memory examples show that the hard disc is normally the bottleneck, and it is on the server side.

My experiments were done in Linux; in Windows, the speed might differ by a constant factor. These speeds depend on very many factors, even on sequential scan. ODBC speed is affected by the number of columns fetched and the types of the columns: integers are processed faster than textual or date columns.

The network latency cost is decreased with UseDeclareFetch=1 by increasing the Fetch=2048 parameter: with Fetch=1 you get bad performance with lots of rows, but if you fetch data from the server once per 2048 rows, the network latency is paid only once per 2048-row block.

Regards,
Marko Ristola

Joel Fradkin wrote:

Hate to be dumb, but unfortunately I am. Could you give me an idea what I should be using, or is there a good resource for me to check out? I have been spending so much time with config and moving data, converting etc., I never looked at the ODBC settings (didn't even think about it until Josh brought it up). I did ask him for his advice, but would love a second opinion. Our data is a bit of a mixture; some records have text items, most are varchars and integers with a bit of booleans mixed in. I am running 8.0.2, so not sure if the protocol is ODBC or Postgres? Thanks for responding, I appreciate any help.

Joel Fradkin

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Mohan, Ross
Sent: Thursday, April 21, 2005 10:01 AM
To: [EMAIL PROTECTED]
Subject: Re: [ODBC] [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon

Joel, thanks. A couple of things jump out there for me: not a problem for a routine ODBC connection, but perhaps in the "lotsa stuff" context of your current explorations it might be relevant? I am completely shooting from the hip here, but... if it were my goose to cook, I'd be investigating:

Session(StringConn) = DRIVER={PostgreSQL};DATABASE=wazagua;SERVER=192.168.123.252;PORT=5432;UID=;PWD=;ReadOnly=0;Protocol=6.4;
|| Protocol? Is this related to version? Is the driver way old?
FakeOidIndex=0;ShowOidColumn=0;RowVersioning=0;ShowSystemTables=0;ConnSettings=;Fetch=100;
|| Fetch: great for OLTP, lousy for batch?
Socket=4096;UnknownSizes=0;MaxVarcharSize=254;MaxLongVarcharSize=8190;
|| What ARE the datatypes and sizes in your particular case?
Debug=0;
|| A run with Debug=1 probably would spit up something interesting.
CommLog=0;Optimizer=1;
|| Optimizer? That's a new one on me.
Ksqo=1;UseDeclareFetch=0;TextAsLongVarchar=1;UnknownsAsLongVarchar=0;BoolsAsChar=1;Parse=0;CancelAsFreeStmt=;ExtraSysTablePrefixes=dd_;LFConversion=1;Up
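Marko's suggested settings map onto the ODBC data source like this. This is a hypothetical sketch of an odbc.ini entry; the DSN name and connection values are illustrative, and only the last two lines are the actual change he recommends:

```ini
; Hypothetical odbc.ini entry illustrating the cursor-based fetch settings.
[wazagua]
Driver          = PostgreSQL
Servername      = 192.168.123.252
Port            = 5432
Database        = wazagua
UseDeclareFetch = 1     ; stream rows through a server-side cursor
Fetch           = 2048  ; rows fetched per round trip
```

The same two keys can be appended to an ADO-style connection string (`...;UseDeclareFetch=1;Fetch=2048;`) as in the examples elsewhere in this thread.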
Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
In practice, we have watched Windows evolve in such a fashion with respect to multiuser support, and, in effect, it has never really gotten it. Microsoft started by hacking something on top of MS-DOS, and by the time enough applications had enough dependencies on the way that worked, it had essentially become impossible for them to migrate properly to a multiuser model, since applications are normally designed with the myopic "this is MY computer!" model of the world.

Completely false. NT was a complete rewrite (1993ish) and was inherently multi-user, with even the GDI running as a user-level process (no longer, however). The NT kernel was scalable and portable, running on the Alpha, MIPS, etc. However, you do have a point with applications... many win32 developers have a very bad habit of expecting their apps to install and run as root. However, this is generally not a problem with Microsoft stuff. In short, the problem is really people, not the technology.

Merlin
Re: [ODBC] [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
Tried changing the settings and saw no change in a test using ASP. The test does several selects on views and tables. It actually seemed to take a bit longer.

Joel Fradkin
Wazagua, Inc.
2520 Trailmate Dr
Sarasota, Florida 34243
Tel. 941-753-7111 ext 305
[EMAIL PROTECTED]
www.wazagua.com
Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
I am waiting to hear back from Josh on using cursors and trying to flatten long-running views. I am a little disappointed I have not understood enough to get my analyzer to use the proper plan; we had to set seqscan off to get the select from response_line to work fast, and I had to turn off merge joins to get assoc list to work fast. Once I am up I can try to learn more about it. I am so glad there are so many folks here willing to take time to educate us newbs.

I am not a big fan of tweaking the optimizer, because you are robbing Peter to pay Paul, so to speak. pg 8.1 may come out with new optimizer tweaks and you'll have to do it all over again. If the optimizer is not 'getting' your view, there are a few different approaches to fixing the problem. I am also not a big fan of de-normalizing your database; essentially you are lighting a fuse that may blow up later. Here are some general approaches to planner optimization that can help out in tricky situations.

1. Split up views. Often overlooked, but this can provide good enhancements. If your view is based on 3 or more tables, or has left/right joins, consider breaking it up into two or more views. Views can be based on views, and it is easier to force the planner to pick good plans this way. If you can find other uses for the component views in other queries, so much the better.

2. Materialize your view. Use lazy materialization, i.e. you query the view into a table at scheduled times. Now we are trading disk space and coherence for performance... this may not fit your requirements, but the nice thing about it is that it will help give us the 'ideal plan' running time which we are shooting for.

3. pl/pgsql. Using combinations of loops, refcursors, and queries, you can cut code that should give you comparable performance to the ideal plan. If you can do the actual work here as well (no data returned to client), you get a tremendous win. Also, pl/pgsql works really well for recursive sets and other things that are difficult to run in the context of a single query. Just be aware of the disadvantages:
a. not portable
b. maintenance overhead
c. requires a relatively high developer skill set

I will go out on a limb and say that mastering the above approaches can provide the solution to virtually any performance problem within the limits of your hardware and the problem complexity. Based on your questions, it sounds to me like your #1 problem is your developer skill set relative to your requirements. However, this is easily solvable... just keep attacking the problem, and don't be afraid to bring in outside help (which you've already done; that's a start!).

Merlin
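Approach #2 above can be as simple as rebuilding a table from the view on a schedule. A minimal sketch, assuming a view named response_line_view; the table, index, and column names here are hypothetical:

```sql
-- Lazy materialization: trade disk space and freshness for the
-- "ideal plan" running time.  All object names are hypothetical.
CREATE TABLE response_line_flat AS
SELECT * FROM response_line_view;

-- Rebuild at scheduled times (e.g. from cron); the transaction keeps
-- readers from ever seeing an empty table.
BEGIN;
TRUNCATE response_line_flat;
INSERT INTO response_line_flat SELECT * FROM response_line_view;
COMMIT;

-- Index the flattened table for the queries that will hit it:
CREATE INDEX response_line_flat_case_idx ON response_line_flat (caseid);
```

Queries against the flat table then get simple, stable plans, at the cost of the data being as stale as the last rebuild.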
Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
Martha Stewart called it a Good Thing when [EMAIL PROTECTED] (Merlin Moncure) wrote:

In practice, we have watched Windows evolve in such a fashion with respect to multiuser support, and, in effect, it has never really gotten it. Microsoft started by hacking something on top of MS-DOS, and by the time enough applications had enough dependencies on the way that worked, it had essentially become impossible for them to migrate properly to a multiuser model, since applications are normally designed with the myopic "this is MY computer!" model of the world.

Completely false. NT was a complete rewrite (1993ish) and was inherently multi-user, with even the GDI running as a user-level process (no longer, however). The NT kernel was scalable and portable, running on the Alpha, MIPS, etc.

Completely irrelevant. When Win32 was deployed, the notion that more than a tiny fraction of the users would be running Win32 apps on multiuser platforms was absolutely laughable. It continued to be laughable until well into this century, when Microsoft ceased to sell systems based on MS-DOS.

However, you do have a point with applications... many win32 developers have a very bad habit of expecting their apps to install and run as root. However, this is generally not a problem with Microsoft stuff. In short, the problem is really people, not the technology.

Reality is that it is all about the applications. Microsoft spent _years_ pushing people from MS-DOS to Windows 3.1 to WfW to Windows 95, and had to do a lot of hard pushing. The result was that a lot of vendors built Win32 applications for Windows 95. None of those systems supported multiple users, so the usage of and experience with Win32 pointed everyone toward the construction of single-user applications. At that point, whether Windows NT did or didn't support multiple users became irrelevant. Usage patterns had to be oriented towards single-user operation, because that's all Win32 could be used to support for the vast majority that _weren't_ running Windows NT.

--
let name="cbbrowne" and tld="gmail.com" in String.concat "@" [name;tld];;
http://linuxfinances.info/info/x.html
"But what can you do with it?" -- ubiquitous cry from Linux-user partner.
-- Andy Pearce, [EMAIL PROTECTED]
Re: [ODBC] [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
Joel, thanks. A couple of things jump out there for me: not a problem for a routine ODBC connection, but perhaps in the "lotsa stuff" context of your current explorations it might be relevant? I am completely shooting from the hip here, but... if it were my goose to cook, I'd be investigating:

Session(StringConn) = DRIVER={PostgreSQL};DATABASE=wazagua;SERVER=192.168.123.252;PORT=5432;UID=;PWD=;ReadOnly=0;Protocol=6.4;
|| Protocol? Is this related to version? Is the driver way old?
FakeOidIndex=0;ShowOidColumn=0;RowVersioning=0;ShowSystemTables=0;ConnSettings=;Fetch=100;
|| Fetch: great for OLTP, lousy for batch?
Socket=4096;UnknownSizes=0;MaxVarcharSize=254;MaxLongVarcharSize=8190;
|| What ARE the datatypes and sizes in your particular case?
Debug=0;
|| A run with Debug=1 probably would spit up something interesting.
CommLog=0;Optimizer=1;
|| Optimizer? That's a new one on me.
Ksqo=1;UseDeclareFetch=0;TextAsLongVarchar=1;UnknownsAsLongVarchar=0;BoolsAsChar=1;Parse=0;CancelAsFreeStmt=;ExtraSysTablePrefixes=dd_;LFConversion=1;UpdatableCursors=1;DisallowPremature=0;TrueIsMinus1=0;BI=0;ByteaAsLongVarBinary=0;UseServerSidePrepare=0
|| That's about all I can see, prima facie. I'll be very curious to know if ODBC is any part of your performance equation.

HTH, Ross

-----Original Message-----
From: Joel Fradkin [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 21, 2005 10:54 AM
To: Mohan, Ross
Cc: [EMAIL PROTECTED]; PostgreSQL Perform
Subject: RE: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon

Here is the connect string I am using. It could be horrid, as I cut it from an ODBC program.

Session(StringConn) = DRIVER={PostgreSQL};DATABASE=wazagua;SERVER=192.168.123.252;PORT=5432;UID=;PWD=;ReadOnly=0;Protocol=6.4;FakeOidIndex=0;ShowOidColumn=0;RowVersioning=0;ShowSystemTables=0;ConnSettings=;Fetch=100;Socket=4096;UnknownSizes=0;MaxVarcharSize=254;MaxLongVarcharSize=8190;Debug=0;CommLog=0;Optimizer=1;Ksqo=1;UseDeclareFetch=0;TextAsLongVarchar=1;UnknownsAsLongVarchar=0;BoolsAsChar=1;Parse=0;CancelAsFreeStmt=0;ExtraSysTablePrefixes=dd_;LFConversion=1;UpdatableCursors=1;DisallowPremature=0;TrueIsMinus1=0;BI=0;ByteaAsLongVarBinary=0;UseServerSidePrepare=0

Joel Fradkin

-----Original Message-----
From: Mohan, Ross [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 21, 2005 9:42 AM
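For reference, UseDeclareFetch=1 (recommended earlier in this thread) makes the driver issue roughly the following SQL instead of one monolithic SELECT. A sketch; the table and cursor names are hypothetical:

```sql
-- Roughly what the psqlODBC driver does with UseDeclareFetch=1, Fetch=2048:
BEGIN;
DECLARE odbc_cur CURSOR FOR SELECT * FROM bigtab;
FETCH 2048 FROM odbc_cur;   -- repeated until fewer than 2048 rows return
-- ... the application consumes each 2048-row block ...
CLOSE odbc_cur;
COMMIT;
```

Because only one block is held client-side at a time, memory stays bounded regardless of result-set size, which is why the billion-row fetch stops core-dumping on 32-bit clients, at the cost of one extra round trip per block.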
Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
I would very, very strongly encourage you to run multi-user tests before deciding on MySQL. MySQL is nowhere near as capable when it comes to concurrent operations as PostgreSQL is. From what others have said, it doesn't take many concurrent operations for it to just fall over. I can't speak from experience, because I avoid MySQL like the plague, though. :)

I am just testing the water so to speak; if it can't handle single-user tests, then multiple-user tests are kind of a waste of time. I am trying to de-normalize my view into a table to see if I can get my app to work. It is a good idea anyway, but it raises a ton of questions about dealing with the data after a case is closed, etc.; also about multiple child relationships like merchandise and payments, etc.

I did do a test of all three (MSSQL, MySQL, and Postgres) in Aqua Studio, all on the same machine running the servers, and found Postgres beat out MySQL, but like any other test it may have been an issue with Aqua Studio and MySQL. In any case, I have not made a decision to use MySQL; I am still researching fixes for Postgres.

I am waiting to hear back from Josh on using cursors and trying to flatten long-running views. I am a little disappointed I have not understood enough to get my analyzer to use the proper plan; we had to set seqscan off to get the select from response_line to work fast, and I had to turn off merge joins to get assoc list to work fast. Once I am up I can try to learn more about it. I am so glad there are so many folks here willing to take time to educate us newbs.
Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
Joel Fradkin wrote:

I would very, very strongly encourage you to run multi-user tests before deciding on MySQL. MySQL is nowhere near as capable when it comes to concurrent operations as PostgreSQL is. From what others have said, it doesn't take many concurrent operations for it to just fall over. I can't speak from experience, because I avoid MySQL like the plague, though. :)

I am just testing the water so to speak; if it can't handle single-user tests, then multiple-user tests are kind of a waste of time.

Joel, I think you are missing the point of the above comment. The comment, as I read it, is: o.k., you are having problems with PostgreSQL, BUT MySQL isn't going to help you, and you will see that in multi-user tests. MySQL is known to work very well on small databases without a lot of concurrent sessions; I don't think anybody here would argue that. Where MySQL runs into trouble is larger databases with lots of concurrent connections.

I am a little disappointed I have not understood enough to get my analyzer to use the proper plan; we had to set seqscan off to get the select from response_line to work fast, and I had to turn off merge joins to get assoc list to work fast. Once I am up I can try to learn more about it. I am so glad there are so many folks here willing to take time to educate us newbs.

Sincerely,
Joshua D. Drake
Command Prompt, Inc.
Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
I am just testing the water so to speak; if it can't handle single-user tests, then multiple-user tests are kind of a waste of time.

At the risk of being even more pedantic, let me point out that if you are going to be running your application with multiple users, the reverse is even more true: "If it can't handle multiple-user tests, then single-user tests are kind of a waste of time."

brew

==========================
Strange Brew ([EMAIL PROTECTED])
Check out my Stock Option Covered Call website http://www.callpix.com
and my Musician's Online Database Exchange http://www.TheMode.com
==========================
Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
Centuries ago, Nostradamus foresaw when [EMAIL PROTECTED] (Joel Fradkin) would write:

I am just testing the water so to speak; if it can't handle single-user tests, then multiple-user tests are kind of a waste of time.

I would suggest that if multi-user functionality is needed, then starting with single-user tests is a similar waste of time. There's good reason to look at it this way... It is all too common for people to try to start building things with primitive functionality, and then try to evolve the system into what they need. It is possible for that to work, if the base covers enough of the necessary functionality.

In practice, we have watched Windows evolve in such a fashion with respect to multiuser support, and, in effect, it has never really gotten it. Microsoft started by hacking something on top of MS-DOS, and by the time enough applications had enough dependencies on the way that worked, it had essentially become impossible for them to migrate properly to a multiuser model, since applications are normally designed with the myopic "this is MY computer!" model of the world.

You may not need _total_ functionality in the beginning, but, particularly for multiuser support, which has deep implications for applications, it needs to be there In The Beginning.

--
output = reverse("moc.liamg" @ "enworbbc")
http://linuxdatabases.info/info/lisp.html
"A CONS is an object which cares." -- Bernie Greenberg.
Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Andreas Pflug
Sent: 21 April 2005 14:06
To: Joel Fradkin
Cc: 'John A Meinel'; josh@agliodbs.com; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon

Beware! From the data, I can see that you're probably using pgAdmin3. The time to execute your query, including transfer of all data to the client, is 17s in this example, while displaying it (i.e. pure GUI and memory-alloc stuff) takes 72s. Execute to a file to avoid this.

Perhaps we should add a guru hint there for longer runtimes?

Regards,
Dave
Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
Quoting Alvaro Herrera [EMAIL PROTECTED]: One further question is: is this really a meaningful test? I mean, in production are you going to query 300,000 rows regularly? And is the system always going to be used by only one user? I guess the question is if this big select is representative of the load you expect in production. While there may be some far-out queries that nobody would try, you might be surprised what becomes the norm for queries, as soon as the engine feasibly supports them. SQL is used for warehouse and olap apps, as a data queue, and as the co-ordinator or bridge for (non-SQL) replication apps. In all of these, you see large updates, large result sets and volatile tables (large to me means over 20% of a table and over 1M rows). To answer your specific question: yes, every 30 mins, in a data redistribution app that makes a 1M-row query, and writes ~1000 individual update files, of overlapping sets of rows. It's the kind of operation SQL doesn't do well, so you have to rely on one big query to get the data out. My 2c -- Dreams come true, not free. -- S.Sondheim, ITW
Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
Hrm... I was about to suggest using explain analyze for timing just the query (and not output/data transfer time), but then I remembered that explain analyze can incur some non-trivial overhead with the timing calls. Is there a way to run the query but have psql ignore the output? If so, you could use \timing. In any case, it's not valid to use pgadmin to time things. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what?
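[Editor's sketch] One way to do what Jim describes, timing the query in psql while discarding the output, looks roughly like this against a live server; the view name is hypothetical:

```sql
-- In psql: turn on client-side timing, then discard the rows
-- by sending query output to /dev/null.
\timing
\o /dev/null
SELECT * FROM myview;   -- hypothetical view
\o

-- Alternative: wrap the query so almost no data crosses the wire,
-- which isolates backend execution time from transfer time.
SELECT count(*) FROM (SELECT * FROM myview) AS sub;
```

Note that \timing with \o /dev/null still includes the time for the server to ship every row to psql; only the count(*) form removes the transfer cost from the measurement, which is exactly the distinction discussed later in this thread.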
Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
On Fri, Apr 22, 2005 at 01:51:08PM -0400, Joel Fradkin wrote: I just finished testing Postgres, MYSQL, and MSSQL on my machine (2 gigs internal XP). I have adjusted the postgres config to what I think is an ok place and have mysql default and mssql default. Using Aqua studio, a program that hooks to all three, I have found (returning 331,640 records on all 3 databases):

          Initial exec   Second exec   Returning
MSSQL     468ms          16ms          2 mins 3 secs
MYSQL     14531ms        6625ms        2 mins 42 secs
Postgres  52120ms        11702ms       2 mins 15 secs

One further question is: is this really a meaningful test? I mean, in production are you going to query 300,000 rows regularly? And is the system always going to be used by only one user? I guess the question is if this big select is representative of the load you expect in production. What happens if you execute the query more times? Do the times stay the same as the second run? -- Alvaro Herrera ([EMAIL PROTECTED]) Use it up, wear it out, make it do, or do without
Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
On Fri, Apr 22, 2005 at 05:04:19PM -0400, Joel Fradkin wrote: And is the system always going to be used by only one user? No we have 400+ concurrent users I guess the question is if this big select is representative of the load you expect in production. Yes we see many times on the two processor box running MSSQL large return sets using 100% cpu for 5-30 seconds. What happens if you execute the query more times? Do the times stay the same as the second run? I will definitely have to do pressure testing prior to going live in production. I have not done concurrent tests as honestly single user tests are failing, so multiple user testing is not something I need yet. I would very, very strongly encourage you to run multi-user tests before deciding on mysql. Mysql is nowhere near as capable when it comes to concurrent operations as PostgreSQL is. From what others have said, it doesn't take many concurrent operations for it to just fall over. I can't speak from experience because I avoid mysql like the plague, though. :) Likewise, MSSQL will probably look better single-user than it will multi-user. Unless you're going to only access the database single-user, it's just not a valid test case (and by the way, this is true no matter what database you're looking at. Multiuser access is where you uncover your real bottlenecks.) -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what?
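[Editor's sketch] For the multi-user testing Jim recommends, pgbench (shipped in PostgreSQL's contrib) is one rough starting point against a live cluster; the scale factor, client count, transaction count, and database name below are arbitrary assumptions, not tuned values:

```shell
# Initialize a test database at scale factor 10 (about 1M account rows)
pgbench -i -s 10 testdb

# Hammer it with 50 concurrent clients, 1000 transactions each
pgbench -c 50 -t 1000 testdb
```

pgbench's canned TPC-B-style transactions won't match this application's reporting views, but even the canned workload exposes concurrency bottlenecks that no single-user test will show.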
Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
One further question is: is this really a meaningful test? I mean, in production are you going to query 300,000 rows regularly? It is a query snippet if you will, as the view I posted for audit, and cases where tables are joined, are more likely to be run. Josh and I worked over this until we got explain analyze on the linux box to 1 sec. I was just using this as a test as I don't have my views set up on MYSQL. So many of my reports pull huge data sets (comprised of normalized joins). I am thinking I probably have to modify to using a non-normalized table, and Josh is sending me information on using cursors instead of selects. And is the system always going to be used by only one user? No we have 400+ concurrent users I guess the question is if this big select is representative of the load you expect in production. Yes we see many times on the two processor box running MSSQL large return sets using 100% cpu for 5-30 seconds. What happens if you execute the query more times? Do the times stay the same as the second run? I will definitely have to do pressure testing prior to going live in production. I have not done concurrent tests as honestly single user tests are failing, so multiple user testing is not something I need yet. Joel -- Alvaro Herrera ([EMAIL PROTECTED]) Use it up, wear it out, make it do, or do without
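[Editor's sketch] The cursors-instead-of-selects idea Josh mentions would look roughly like this; the view name and batch size are made up for illustration:

```sql
BEGIN;

-- Declare a cursor over the big result set instead of pulling
-- all rows into the client at once.
DECLARE report_cur CURSOR FOR
    SELECT * FROM myview;       -- hypothetical reporting view

-- Fetch and process rows in batches; the application loops on this
-- until a FETCH returns no rows.
FETCH 1000 FROM report_cur;

CLOSE report_cur;
COMMIT;
```

The payoff is that the client's memory use stays flat and the application can start rendering the first batch while the server is still producing the rest, which is essentially what makes MSSQL's first-block timing look so fast in the comparisons above.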
Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
Are you using 8.0.2? I hope so, because there were some Win32 performance changes related to fsync() in that release. --- Joel Fradkin wrote: I just finished testing Postgres, MYSQL, and MSSQL on my machine (2 gigs internal XP). I have adjusted the postgres config to what I think is an ok place and have mysql default and mssql default. Using Aqua studio, a program that hooks to all three, I have found (returning 331,640 records on all 3 databases):

          Initial exec   Second exec   Returning
MSSQL     468ms          16ms          2 mins 3 secs
MYSQL     14531ms        6625ms        2 mins 42 secs
Postgres  52120ms        11702ms       2 mins 15 secs

Not sure if this proves your point on PGadmin versus MYSQL query tool versus MSSQL Query tool, but it certainly seems encouraging. I am going to visit Josh's tests he wanted me to run on the LINUX server. Joel Fradkin -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
I just finished testing Postgres, MYSQL, and MSSQL on my machine (2 gigs internal XP). I have adjusted the postgres config to what I think is an ok place and have mysql default and mssql default. Using Aqua studio, a program that hooks to all three, I have found (returning 331,640 records on all 3 databases):

          Initial exec   Second exec   Returning
MSSQL     468ms          16ms          2 mins 3 secs
MYSQL     14531ms        6625ms        2 mins 42 secs
Postgres  52120ms        11702ms       2 mins 15 secs

Not sure if this proves your point on PGadmin versus MYSQL query tool versus MSSQL Query tool, but it certainly seems encouraging. I am going to visit Josh's tests he wanted me to run on the LINUX server. Joel Fradkin
Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
John A Meinel [EMAIL PROTECTED] writes: Joel Fradkin wrote: Postgres was on the second run Total query runtime: 17109 ms. Data retrieval runtime: 72188 ms. 331640 rows retrieved. How were you measuring data retrieval time? I suspect he's using pgadmin. We've seen reports before suggesting that pgadmin can be amazingly slow, eg here http://archives.postgresql.org/pgsql-performance/2004-10/msg00427.php where the *actual* data retrieval time as shown by EXPLAIN ANALYZE was under three seconds, but pgadmin claimed the query runtime was 22 sec and data retrieval runtime was 72 sec. The problem is that pgAdmin takes your query results and puts them in a grid. The grid is not designed to be used in that way for large datasets. The time complexity is not linear and really breaks down around 10k-100k rows depending on various factors. pgAdmin users just have to become used to it and use limit or the filter feature at appropriate times. The ms sql enterprise manager uses cursors, which has its own set of nasty issues (no mvcc). In fairness, unless you are running with the \a switch, psql adds a fair amount of time to the query too. Joel: Postgres was on the second run Total query runtime: 17109 ms. Data retrieval runtime: 72188 ms. 331640 rows retrieved. The Data retrieval runtime is time spent by pgAdmin formatting, etc. The query runtime is the actual timing figure you should be concerned with (you are not comparing apples to apples). I can send you a utility I wrote in Delphi which adds only a few seconds overhead for a 360k result set. Or, go into psql, throw the \a switch, and run the query. Or: psql -A -c "select * from myview where x ..." > output.txt. It should finish the above in 16-17 sec plus the time to write out the file. Joel, I have a lot of experience with all three databases you are evaluating and you are making a huge mistake switching to mysql. You can make a decent case for ms sql, but it's quite expensive at your level of play as you know. 
Merlin
Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
Joel Fradkin wrote: I did think of something similar just loading the data tables with junk records and I may visit that idea with Josh. I did just do some comparisons on timing of a plain select * from tbl where indexed column = x and it was considerably slower than both MSSQL and MYSQL, so I am still a bit confused. This still might be a configuration issue (I ran on my 2 gig desktop and the 8 gig Linux box; comparisons were all run on the same machines as far as MSSQL, MYSQL, and Postgres go). I turned off postgres when running MYSQL and turned off MYSQL when running postgres; MSSQL had one of the two running while I tested it. For the 360,000 records returned MYSQL did it in 40 seconds first run and 17 seconds second run. MSSQL did it in 56 seconds first run and 16 seconds second run. Postgres was on the second run Total query runtime: 17109 ms. Data retrieval runtime: 72188 ms. 331640 rows retrieved. Beware! From the data, I can see that you're probably using pgAdmin3. The time to execute your query including transfer of all data to the client is 17s in this example, while displaying it (i.e. pure GUI and memory alloc stuff) takes 72s. Execute to a file to avoid this. Regards, Andreas
Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
Why is MYSQL returning 360,000 rows, while Postgres is only returning 330,000? This may not be important at all, though. I also assume you are selecting from a plain table, not a view. Yes plain table. Difference in rows is one of the datasets had sears data in it. It (speed differences found) is much worse on some of my views, which is what forced me to start looking at other options. I suppose knowing your work_mem, and shared_buffers settings would be useful. I have posted my configs, but will add the Tampa to the bottom again. My desktop has # - Memory - shared_buffers = 8000 # min 16, at least max_connections*2, 8KB each work_mem = 8000 #1024 # min 64, size in KB maintenance_work_mem = 16384 # min 1024, size in KB #max_stack_depth = 2048 # min 100, size in KB # - Free Space Map - max_fsm_pages = 3 #2 # min max_fsm_relations*16, 6 bytes each max_fsm_relations = 1000 # min 100, ~50 bytes each # - Planner Cost Constants - effective_cache_size = 8 #1000 # typically 8KB each random_page_cost = 2 # units are one sequential page fetch cost How were you measuring data retrieval time? And how does this compare to what you were measuring on the other machines? It might be possible that what you are really measuring is just the time it takes psql to load up all the data into memory, and then print it out. And since psql defaults to measuring entry lengths for each column, this may not be truly comparable. It *looks* like it only takes 18s for postgres to get the data, but then it is taking 72s to transfer the data to you. That would be network latency, or something like that, not database latency. And I would say that 18s is very close to 16 or 17 seconds. This was run on the machine with the database (as were MYSQL and MSSQL). The PG timing was from PGADMIN and the 18 secs was the second run; the first run was the same time to return the data plus 70 secs to do the first part, about 147 secs all told, compared to the 40 seconds first run of MYSQL and 56 seconds MSSQL. 
MYSQL was done in their query tool, it returns the rows as well, and MSSQL was done in their query analyzer. All three tools appear to use a similar approach. Just an FYI: doing an explain analyze of my problem view took much longer than actually returning the data in MSSQL and MYSQL. I have done extensive testing with MYSQL (just this table and two of my problem views). I am not using the transactional version, because I need the best speed. I don't know what commands you were issuing, or how you measured, though. You might be using some other interface (like ODBC), which I can't say too much about. John =:- This is the Linux box config. # - # PostgreSQL configuration file # - # # This file consists of lines of the form: # # name = value # # (The '=' is optional.) White space may be used. Comments are introduced # with '#' anywhere on a line. The complete list of option names and # allowed values can be found in the PostgreSQL documentation. The # commented-out settings shown in this file represent the default values. # # Please note that re-commenting a setting is NOT sufficient to revert it # to the default value, unless you restart the postmaster. # # Any option can also be given as a command line switch to the # postmaster, e.g. 'postmaster -c log_connections=on'. Some options # can be changed at run-time with the 'SET' SQL command. # # This file is read on postmaster startup and when the postmaster # receives a SIGHUP. If you edit the file on a running system, you have # to SIGHUP the postmaster for the changes to take effect, or use # pg_ctl reload. Some settings, such as listen_address, require # a postmaster shutdown and restart to take effect. #--- # FILE LOCATIONS #--- # The default values of these variables are driven from the -D command line # switch or PGDATA environment variable, represented here as ConfigDir. 
# data_directory = 'ConfigDir' # use data in another directory #data_directory = '/pgdata/data' # hba_file = 'ConfigDir/pg_hba.conf'# the host-based authentication file # ident_file = 'ConfigDir/pg_ident.conf' # the IDENT configuration file # If external_pid_file is not explicitly set, no extra pid file is written. # external_pid_file = '(none)' # write an extra pid file #--- # CONNECTIONS AND AUTHENTICATION #--- # - Connection Settings - #listen_addresses = 'localhost' # what IP interface(s) to listen on; # defaults to localhost, '*' = any listen_addresses = '*' port = 5432 max_connections = 100 # note: increasing max_connections costs about 500
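[Editor's sketch] As a reference point for the memory settings being discussed, a commonly suggested starting shape for an 8.0.x server with several GB of RAM dedicated to PostgreSQL looks like the fragment below; every number is an assumption to be validated against the actual machine and workload, not a recommendation for this specific box:

```
# Hypothetical postgresql.conf starting point, 8 GB dedicated box, 8.0.x units
shared_buffers = 25000          # 8KB pages, ~200 MB
work_mem = 16384                # KB, per sort/hash operation, per backend
maintenance_work_mem = 131072   # KB, for VACUUM and CREATE INDEX
effective_cache_size = 750000   # 8KB pages, ~6 GB of expected OS file cache
random_page_cost = 2            # below the default 4 for well-cached data
max_fsm_pages = 200000          # raise if VACUUM VERBOSE reports a shortage
```

Note work_mem is per sort or hash node per backend, so with 400+ concurrent users it multiplies quickly; that interaction is exactly why these values need testing under the multi-user load discussed earlier in the thread.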
Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
FWIW, ODBC has variables to tweak, as well. fetch/buffer sizes, and the like. Maybe one of the ODBC cognoscenti here can chime in more concretely -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Joel Fradkin Sent: Thursday, April 21, 2005 10:36 AM To: 'Tom Lane'; 'John A Meinel' Cc: 'Postgresql Performance' Subject: Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon I suspect he's using pgadmin. Yup I was, but I did try running on the linux box in psql, but it was running to the screen and took forever because of that. The real issue is returning to my app using ODBC is very slow (have not tested the ODBC for MYSQL; MSSQL is ok (the two proc dell is running out of steam but been good until this year when we about doubled our demand by adding sears as a client)). Using odbc to postgres on some of the views (Josh from Command is having me do some very specific testing) is timing out with a 10 minute time limit. These are pages that still respond using MSSQL (this is where production is using the dual proc and the test is using the 4 proc). I have a tool that hooks to all three databases so I can try it with that and see if I get different responses. Joel
Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
I suspect he's using pgadmin. Yup I was, but I did try running on the linux box in psql, but it was running to the screen and took forever because of that. The real issue is returning to my app using ODBC is very slow (have not tested the ODBC for MYSQL; MSSQL is ok (the two proc dell is running out of steam but been good until this year when we about doubled our demand by adding sears as a client)). Using odbc to postgres on some of the views (Josh from Command is having me do some very specific testing) is timing out with a 10 minute time limit. These are pages that still respond using MSSQL (this is where production is using the dual proc and the test is using the 4 proc). I have a tool that hooks to all three databases so I can try it with that and see if I get different responses. Joel
Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
Here is the connect string I am using. It could be horrid as I cut it from an ODBC program. Session(StringConn) = DRIVER={PostgreSQL};DATABASE=wazagua;SERVER=192.168.123.252;PORT=5432;UID=;PWD=;ReadOnly=0;Protocol=6.4;FakeOidIndex=0;ShowOidColumn=0;RowVersioning=0;ShowSystemTables=0;ConnSettings=;Fetch=100;Socket=4096;UnknownSizes=0;MaxVarcharSize=254;MaxLongVarcharSize=8190;Debug=0;CommLog=0;Optimizer=1;Ksqo=1;UseDeclareFetch=0;TextAsLongVarchar=1;UnknownsAsLongVarchar=0;BoolsAsChar=1;Parse=0;CancelAsFreeStmt=0;ExtraSysTablePrefixes=dd_;LFConversion=1;UpdatableCursors=1;DisallowPremature=0;TrueIsMinus1=0;BI=0;ByteaAsLongVarBinary=0;UseServerSidePrepare=0 Joel Fradkin -Original Message- From: Mohan, Ross [mailto:[EMAIL PROTECTED] Sent: Thursday, April 21, 2005 9:42 AM To: [EMAIL PROTECTED] Subject: RE: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon FWIW, ODBC has variables to tweak, as well. fetch/buffer sizes, and the like. Maybe one of the ODBC cognoscenti here can chime in more concretely
Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
Joel, thanks. A couple of things jump out there for me, not a problem for a routine ODBC connection, but perhaps in the lotsa stuff context of your current explorations, it might be relevant? I am completely shooting from the hip, here, but...if it were my goose to cook, I'd be investigating Session(StringConn) = DRIVER={PostgreSQL};DATABASE=wazagua;SERVER=192.168.123.252;PORT=5432;UID=; PWD=;ReadOnly=0;Protocol=6.4; || Protocol? Is this related to version? is the driver way old? FakeOidIndex=0;ShowOidColumn=0;RowVersioning=0; ShowSystemTables=0;ConnSettings=;Fetch=100; || Fetch great for OLTP, lousy for batch? Socket=4096;UnknownSizes=0;MaxVarcharSize=254;MaxLongVarcharSize=8190; || what ARE the datatypes and sizes in your particular case? Debug=0; || a run with debug=1 probably would spit up something interesting CommLog=0;Optimizer=1; || Optimizer? that's a new one on me Ksqo=1;UseDeclareFetch=0;TextAsLongVarchar=1;UnknownsAsLongVarchar=0;BoolsAsChar=1;Parse=0;CancelAsFreeStmt=;ExtraSysTablePrefixes=dd_;LFConversion=1;UpdatableCursors=1;DisallowPremature=0;TrueIsMinus1=0;BI=0;ByteaAsLongVarBinary=0;UseServerSidePrepare=0 || that's about all I can see, prima facie. I'll be very curious to know if ODBC is any part of your performance equation. HTH, Ross -Original Message- From: Joel Fradkin [mailto:[EMAIL PROTECTED] Sent: Thursday, April 21, 2005 10:54 AM To: Mohan, Ross Cc: [EMAIL PROTECTED]; PostgreSQL Perform Subject: RE: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon Here is the connect string I am using. It could be horrid as I cut it from an ODBC program. 
Session(StringConn) = DRIVER={PostgreSQL};DATABASE=wazagua;SERVER=192.168.123.252;PORT=5432;UID=;PWD=;ReadOnly=0;Protocol=6.4;FakeOidIndex=0;ShowOidColumn=0;RowVersioning=0;ShowSystemTables=0;ConnSettings=;Fetch=100;Socket=4096;UnknownSizes=0;MaxVarcharSize=254;MaxLongVarcharSize=8190;Debug=0;CommLog=0;Optimizer=1;Ksqo=1;UseDeclareFetch=0;TextAsLongVarchar=1;UnknownsAsLongVarchar=0;BoolsAsChar=1;Parse=0;CancelAsFreeStmt=0;ExtraSysTablePrefixes=dd_;LFConversion=1;UpdatableCursors=1;DisallowPremature=0;TrueIsMinus1=0;BI=0;ByteaAsLongVarBinary=0;UseServerSidePrepare=0 Joel Fradkin -Original Message- From: Mohan, Ross [mailto:[EMAIL PROTECTED] Sent: Thursday, April 21, 2005 9:42 AM To: [EMAIL PROTECTED] Subject: RE: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon FWIW, ODBC has variables to tweak, as well. fetch/buffer sizes, and the like. Maybe one of the ODBC cognoscenti here can chime in more concretely
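[Editor's sketch] Of the psqlODBC settings Ross annotates, the two most likely to matter for these large result sets are UseDeclareFetch and Fetch. A batch-friendly variant of the connection string might change just these; the values are illustrative guesses, not tested recommendations:

```
; Hypothetical psqlODBC settings for large result sets
UseDeclareFetch=1   ; stream rows through a server-side cursor instead of
                    ; materializing the entire result set in the driver
Fetch=2000          ; rows per FETCH when UseDeclareFetch=1; larger batches
                    ; mean fewer round trips than the small default
```

With UseDeclareFetch=0 (as in the string above), the driver pulls the whole 300k+ row result into client memory before the application sees the first row, which matches the timeout symptoms Joel describes.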
Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
Joel, I did not see any marked improvement, but I don't think my issues are related to the hardware. If you won't believe it, then we certainly can't convince you. AFAIK your bad view is a bad query plan made worse by the Dell's hardware problems. I am giving up on postgres and three developers two months of work and trying MYSQL. I'd suggest testing your *whole* application and not just this one query. And remember that you need to test InnoDB tables if you want transactions. I have posted several items and not got a response (not that I expect folks to drop everything). I want to thank everyone who has been of help and there are several. Hmmm ... I see about 25 responses to some of your posts on this list. Including ones by some of our head developers. That's more than you'd get out of a paid MSSQL support contract, I know from experience. If you want anything more, then you'll need a do-or-die contract with a support company. If your frustration is because you can't find this kind of help then I completely understand ... I have a waiting list for performance contracts myself. (and, if you hired me the first thing I'd tell you is to junk the Dell) I really like the environment and feel I have learned a lot in the past few months, but bottom line for me is speed. We bought a 30K Dell 6650 to get better performance. Would have been smart to ask on this list *before* buying the Dell, hey? Even a Google of this mailing list would have been informative. I chose PG because MSSQL was 70K to license. I believe the MYSQL will be 250.00 to license for us, but I may choose the 4k platinum support just to feel safe about having someone to touch base with in the event of an issue. Hmmm ... you're willing to pay MySQL $4k but expect the PG community to solve all your problems with free advice and a couple $100 with CMD? I sense an apples vs. barca loungers comparison here ... 
I am not sure I am walking away feeling real good about postgres, because it just should not take a rocket scientist to get it to work, and I used to think I was fairly smart and could figure stuff out and I hate admitting defeat (especially since we have everything working with postgres now). While I understand your frustration (I've been frustrated more than a few times with issues that stump me on Linux, for example) it's extremely unfair to lash out at a community that has provided you a lot of free advice because the advice hasn't fixed everything yet. By my reading, you first raised your query issue 6 days ago. 6 days is not a lot of time for getting *free* troubleshooting help by e-mail. Certainly it's going to take more than 6 days to port to MySQL. -- --Josh Josh Berkus Aglio Database Solutions San Francisco
Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
Sorry if you feel I am lashing out at a community. Just to say it again, I am very appreciative of all the help everyone has supplied. I am running on more than just the 4 proc Dell (in fact my tests have been mostly on desktops). I have MSSQL running on a 2 proc dell which until my load increased (over aprx 2 years) was just fine. I totally agree that there are better solutions based on this list's comments, but I have all Dell hardware now and resist trying different vendors just to suit Postgres. I was under the impression there were still issues with 64bit postgres and Linux (or at least were when I purchased). I believed I could make my next acquisition Opteron-based hardware. Again I am not at all trying to criticize anyone, so please accept my apology if I somehow came across with that attitude. I am very disappointed at this point. My views may not be that great (although I am not saying that either), but they run ok on MSSQL and appear to run ok on MYSQL. I wish I did understand what I am doing wrong because I do not wish to revisit engineering our application for MYSQL. I would have spent more $ with Command, but he does need my database to help me and I am not able to do that. I agree testing the whole app is the only way to see and unfortunately it is a time consuming bit. I do not have to spend 4k on MYSQL, that is if I want to have their premium support. I can spend $250.00 a server for the commercial license if I find the whole app does run well. I just loaded the data last night and only had time to convert one view this morning. I am sure it is something I do not understand and not a problem with postgres. I also am willing to take time to get more knowledgeable, but my time is running out and I feel honestly stupid. I have been in the process of converting for over two months and have said several times these lists are a godsend. It was never my intention to make you feel like I was flaming anyone involved. 
On the contrary, I feel many have taken time to look at my questions and given excellent advice. I know I check the archives so hopefully that time will help others after me. I may yet find that MYSQL is not a good fit as well. I have my whole app converted at this point and find pg works well for a lot of my usage. There are some key reporting views that need to retrieve many rows with many joins that just take too long to pull the data. I told my boss just now that if I de-normalize many of these data sets (like the 6 main groups of data) the reporting may work, but as is, many of my web pages are timing out (these are pages that still work on MSSQL and the 2 proc machine). Thanks again for all the help and know I truly appreciate what time everyone has spent on my issues. I may find that revisiting the datasets is a way to make PG work, or as you mentioned maybe I can get someone with more knowledge to step in locally. I did ask Tom if he knew of anyone; maybe someone else on the list is aware of a professional in the Tampa FL area. Realistically I don't think a $30k Dell is something that needs to be junked. I am pretty sure if I got MSSQL running on it, it would outperform my two proc box. I can agree it may not have been the optimal platform. My decision is not based solely on the performance on the 4 proc box. Joel Fradkin -Original Message- From: Josh Berkus [mailto:[EMAIL PROTECTED] Sent: Wednesday, April 20, 2005 1:54 PM To: Joel Fradkin Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon Joel, I did not see any marked improvement, but I don't think my issues are related to the hardware. If you won't believe it, then we certainly can't convince you. AFAIK your bad view is a bad query plan made worse by the Dell's hardware problems. I am giving up on postgres and three developers two months of work and trying MYSQL. I'd suggest testing your *whole* application and not just this one query. 
And remember that you need to test InnoDB tables if you want transactions. I have posted several items and not got a response (not that I expect folks to drop everything). I want to thank everyone who has been of help and there are several. Hmmm ... I see about 25 responses to some of your posts on this list. Including ones by some of our head developers. That's more than you'd get out of a paid MSSQL support contract, I know from experience. If you want anything more, then you'll need a do-or-die contract with a support company. If your frustration is because you can't find this kind of help than I completely understand ... I have a waiting list for performance contracts myself. (and, if you hired me the first thing I'd tell you is to junk the Dell) I really like the environment and feel I have learned a lot in the past few months, but bottom line for me is speed. We bought a 30K Dell 6650 to get better performance. Would have been smart to ask
Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
Joel, I have MSSQL running on a 2 proc dell which until my load increased (over aprx 2 years) was just fine. I totally agree that there are better solutions based on this list's comments, but I have all Dell hardware now and resist trying different vendors just to suit Postgres. I was under the impression there were still issues with 64bit postgres and Linux (or at least were when I purchased). I believed I could make my next acquisition Opteron-based hardware. Yeah, sorry, the Dell stuff is a sore point with me. You can't imagine the number of conversations I have that go like this: We're having a severe performance problem with PostgreSQL What hardware/OS are you using? Dell *650 with RHAS 3.0 BTW, which Update version is your RHAS? If you're on Update3, you can grab more performance right there by upgrading to Update4. Again I am not at all trying to criticize anyone, so please accept my apology if I somehow came across with that attitude. I am very disappointed at this point. My views may not be that great (although I am not saying that either), but they run ok on MSSQL and appear to run ok on MYSQL. Yeah. I think you'll find a few things that are vice-versa. For that matter, I can point to a number of queries we run better than Oracle, and a number we don't. Your particular query problem seems to stem from some bad estimates. Can you post an EXPLAIN ANALYZE based on all the advice people have given you so far? I wish I did understand what I am doing wrong because I do not wish to revisit engineering our application for MYSQL. I can imagine. I would have spent more $ with Command, but he does need my database to help me and I am not able to do that. Yes. For that matter, it'll take longer to troubleshoot on this list because of your security concerns. I agree testing the whole app is the only way to see and unfortunately it is a time consuming bit. I do not have to spend 4k on MYSQL, that is if I want to have their premium support. 
I can spend $250.00 a server for the commercial license if I find the whole app does run well. I just loaded the data last night and only had time to convert one view this morning. I am sure it is something I do not understand and not a problem with Postgres. I am also willing to take time to get more knowledgeable, but my time is running out and I feel honestly stupid.

You're not. You have a real query problem, and it will require further troubleshooting to solve. Some of us make a pretty handsome living solving these kinds of problems; it takes a lot of expert knowledge.

It was never my intention to make you feel like I was flaming anyone involved. On the contrary, I feel many have taken time to look at my questions and given excellent advice. I know I check the archives, so hopefully that time will help others after me.

Well, I overreacted too. Sorry!

I may find that revisiting the datasets is a way to make PG work, or, as you mentioned, maybe I can get someone with more knowledge to step in locally. I did ask Tom if he knew of anyone; maybe someone else on the list is aware of a professional in the Tampa, FL area.

Well, Robert Treat is in Florida, but I'm pretty sure he's busy full-time.

Realistically, I don't think a $30k Dell is something that needs to be junked. I am pretty sure if I got MSSQL running on it, it would outperform my two-proc box. I can agree it may not have been the optimal platform. My decision is not based solely on the performance of the 4-proc box.

Oh, certainly it's too late to buy a Sunfire or eServer instead. You just could have gotten far more bang for the buck with some expert advice, that's all. But don't bother with Dell support any further; they don't really have the knowledge to help you. So ... new EXPLAIN ANALYZE?

-- --Josh Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
He is running RHAS4, which is the latest 2.6.x kernel from RH. I believe it should have done away with the RHAS3.0 Update 3 I/O issue. anjan

-Original Message- From: Josh Berkus [mailto:[EMAIL PROTECTED] Sent: Wednesday, April 20, 2005 4:23 PM To: Joel Fradkin Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
Joel Fradkin wrote: ... I would have spent more $ with Command, but he does need my database to help me and I am not able to do that. ...

What if someone were to write an anonymization script? Something that changes the data in the database but leaves all of the relational information. It could turn all strings into some sort of hashed version, so you don't give out any identifiable information. It could even modify relational entries, as long as it updated both ends, and this wouldn't affect the actual performance at all. I don't think this would be very hard to write, especially if you can give a list of the tables and which columns need to be modified. This would probably be a generally useful script to have for cases like this, where databases are confidential but need to be tuned by someone else.

Would that be reasonable? I would think that by renaming columns, hashing the data in the columns, and renaming tables, most of the proprietary information is removed without removing the database information.

John =:-
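Such a script could be done almost entirely inside PostgreSQL, since the built-in md5() function (available from 7.4 on) is deterministic: equal inputs hash to equal outputs, so foreign-key relationships still match after the rewrite. A minimal sketch; the table and column names (tblclient, clientname, contactname) are made-up placeholders, not Joel's actual schema:

```sql
BEGIN;

-- Hash identifiable strings; md5() is deterministic, so any join or
-- WHERE clause that matched before the rewrite still matches after it.
UPDATE tblclient
   SET clientname  = md5(clientname),
       contactname = md5(contactname);

-- Renaming tables hides the schema's meaning without changing its
-- shape; indexes and constraints follow the rename automatically.
ALTER TABLE tblclient RENAME TO t1;

COMMIT;
```

One caveat: ANALYZE should be re-run afterwards, and since hashed values have a different distribution than the originals, the planner's estimates (and thus the plans being debugged) may shift somewhat.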
Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
On Apr 20, 2005, at 4:22 PM, Josh Berkus wrote: ... But don't bother with Dell support any further, they don't really have the knowledge to help you.

FWIW, I have a $20k Dell box (PE2650 with 14-disk external PowerVault RAID enclosure) which I'm phasing out for a dual Opteron box because it can't handle the load. It will be re-purposed as a backup system. Damn waste of money, but complaining customers can cost more... Trust me, it is likely your Dell hardware, as moving to the Opteron system has improved performance tremendously with fewer disks. Same amount of RAM and other basic configurations. Both have LSI-based RAID cards, even.
Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
I did think of something similar, just loading the data tables with junk records, and I may visit that idea with Josh.

I did just do some comparisons on timing of a plain select * from tbl where indexed column = x, and it was considerably slower than both MSSQL and MySQL, so I am still a bit confused. This still might be a configuration issue (I ran on my 2 GB desktop and the 8 GB Linux box; comparisons were all run on the same machines for MSSQL, MySQL, and Postgres). I turned off Postgres when running MySQL and turned off MySQL when running Postgres; MSSQL had one of the two running while I tested it.

For the 360,000 records returned, MySQL did it in 40 seconds first run and 17 seconds second run. MSSQL did it in 56 seconds first run and 16 seconds second run. Postgres on the second run was: Total query runtime: 17109 ms. Data retrieval runtime: 72188 ms. 331640 rows retrieved. So, like 89 seconds on the second run; the first run was 147 secs all told. These are all on my 2 GB desktop running XP. I can post the config. I noticed that Postgres was using 70% of the CPU while MSSQL was at 100%.

Joel Fradkin
Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
Joel Fradkin wrote: ... I noticed the postgres was using 70% of the cpu while MSSQL was 100%. ...

Why is MySQL returning 360,000 rows while Postgres is only returning 331,640? This may not be important at all, though. I also assume you are selecting from a plain table, not a view. Knowing your work_mem and shared_buffers settings would be useful, too.

How were you measuring data retrieval time? And how does this compare to what you were measuring on the other machines? It might be possible that what you are really measuring is just the time it takes psql to load all the data into memory and then print it out. And since psql defaults to measuring entry lengths for each column, this may not be truly comparable. It *looks* like it only takes 18s for Postgres to get the data, but then it is taking 72s to transfer the data to you. That would be network latency, or something like that, not database latency.
And I would say that 18s is very close to 16 or 17 seconds. I don't know what commands you were issuing, or how you measured, though. You might be using some other interface (like ODBC), which I can't say too much about. John =:-
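One way to separate backend execution time from transfer-and-rendering overhead, as suggested earlier in the thread, is to wrap the query in a count(*) so only a single number crosses the wire. A sketch of a psql session; tblresponse and the clientnum filter are placeholder names, not the actual schema:

```sql
\timing

-- Full result set: the elapsed time includes shipping all ~330,000
-- rows to the client plus the client's own formatting work.
SELECT * FROM tblresponse WHERE clientnum = 'abc';

-- Same scan and filter, but only one count crosses the wire, so this
-- is close to pure backend execution time.
SELECT count(*)
  FROM (SELECT * FROM tblresponse WHERE clientnum = 'abc') AS t;
```

If the two timings differ by tens of seconds, the gap is in the network or the client tool, not in the database engine.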
Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
Joel, Ok, please try this:

ALTER TABLE tblresponseheader ALTER COLUMN clientnum SET STATISTICS 1000;
ALTER TABLE tblresponseheader ALTER COLUMN locationid SET STATISTICS 1000;
ALTER TABLE tbllocation ALTER COLUMN clientnum SET STATISTICS 1000;
ALTER TABLE tbllocation ALTER COLUMN divisionid SET STATISTICS 1000;
ALTER TABLE tbllocation ALTER COLUMN regionid SET STATISTICS 1000;
ANALYZE tblresponseheader;
ANALYZE tbllocation;

Then run the EXPLAIN ANALYZE again. (on Linux)

-- --Josh Josh Berkus Aglio Database Solutions San Francisco
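Whether the per-column targets actually took effect can be checked in the system catalogs; a sketch (pg_attribute.attstattarget holds the override set by SET STATISTICS, and is -1 for columns still using the default):

```sql
-- List the columns of tbllocation that carry an explicit statistics
-- target, i.e. the ones changed by the ALTER TABLE commands above.
SELECT attname, attstattarget
  FROM pg_attribute
 WHERE attrelid = 'tbllocation'::regclass
   AND attstattarget > 0;
```

The larger target only influences plans after the ANALYZE runs, since that is when the expanded statistics are actually gathered.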
Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon
John A Meinel [EMAIL PROTECTED] writes: Joel Fradkin wrote: Postgres was on the second run Total query runtime: 17109 ms. Data retrieval runtime: 72188 ms. 331640 rows retrieved. How were you measuring data retrieval time? I suspect he's using pgadmin. We've seen reports before suggesting that pgadmin can be amazingly slow, eg here http://archives.postgresql.org/pgsql-performance/2004-10/msg00427.php where the *actual* data retrieval time as shown by EXPLAIN ANALYZE was under three seconds, but pgadmin claimed the query runtime was 22 sec and data retrieval runtime was 72 sec. I wouldn't be too surprised if that time was being spent formatting the data into a table for display inside pgadmin. It is a GUI after all, not a tool for pushing vast volumes of data around. It'd be interesting to check the runtimes for the same query with LIMIT 3000, ie, see if a tenth as much data takes a tenth as much processing time or not. The backend code should be pretty darn linear in this regard, but maybe pgadmin isn't. regards, tom lane
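Tom's linearity check can be run directly from psql: time the full query, then the same query with a LIMIT, and see whether elapsed time shrinks roughly in proportion to the row count. A sketch with the same placeholder names as above (not the actual schema):

```sql
\timing

-- Full run: the client must receive and render all ~330,000 rows.
SELECT * FROM tblresponse WHERE clientnum = 'abc';

-- Reduced run: if the total time does not drop roughly in proportion
-- to the row count, the extra time lives in the client tool's
-- per-row handling, not in the backend.
SELECT * FROM tblresponse WHERE clientnum = 'abc' LIMIT 3000;
```

Running both in psql rather than pgadmin also removes the GUI's table-rendering cost from the comparison entirely.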