Re: [GENERAL] Performance question
Thanks Laurenz, very good point! Luckily (phew!) the business scenario is such that race conditions cannot occur (and the transaction table is append-only). There is a business workflow to address duplicates, but 1) it occurs extremely rarely (it would take deliberate sabotage) and 2) there is no impact on the business.

Yours
Anil

On Fri, Nov 21, 2014 at 5:16 PM, Albe Laurenz laurenz.a...@wien.gv.at wrote:
> Anil Menon wrote:
>> I would like to ask, from your experience, which would be the best generic method for
>> checking if row sets of a certain condition exist in a PL/pgSQL function. I know of 4
>> methods so far (please feel free to add if I missed out any others) [...]
>
> Are you aware that all of these methods have a race condition unless you use isolation
> level REPEATABLE READ or better? It may be that rows are added or removed between the
> check and the corresponding action.
>
> Yours,
> Laurenz Albe
Re: [GENERAL] Performance question
Anil Menon wrote:
> I would like to ask, from your experience, which would be the best generic method for
> checking if row sets of a certain condition exist in a PL/pgSQL function. I know of 4
> methods so far (please feel free to add if I missed out any others) [...]

Are you aware that all of these methods have a race condition unless you use isolation level REPEATABLE READ or better? It may be that rows are added or removed between the check and the corresponding action.

Yours,
Laurenz Albe
Re: [GENERAL] Performance question
Thanks Adrian

On Thu, Nov 20, 2014 at 3:46 AM, Adrian Klaver adrian.kla...@aklaver.com wrote:
> On 11/19/2014 08:26 AM, Anil Menon wrote:
>> Hello,
>> I would like to ask, from your experience, which would be the best generic method for
>> checking if rows matching a certain condition exist in a PL/pgSQL function. I know of 4
>> methods so far (please feel free to add if I missed out any others)
>>
>> 1) get a count (my previous experience with ORCL shaped this option)
>>
>>     select count(*) into vcnt from table where condition
>>     if vcnt > 0 then do X else do Y end if
>>
>> Cons: It seems doing a count(*) is not the best option for PG
>
> Well, that would depend on the table size, whether it was 100 rows vs 1,000,000 rows.

The table is estimated/guesstimated to be ~900 million rows (~30M a day with 90 days of history, though initially it would be ~30M), while the WHERE part of the query would return between 0 and ~2 rows.

>> 2) Use a non-count option
>>
>>     select primary_key_col into vcnt from table where condition
>>     if found then do X else do Y end if
>>
>> Cons: Some people seem not to prefer this as (AFAIU) it causes plpgsql-to-SQL-to-plpgsql switches
>
> plpgsql is fairly tightly coupled to SQL, so I have not really seen any problems. But then
> I am not working on large datasets.

I think ~900M rows would most likely constitute a large data set.

>> 3) using perform
>>
>>     perform primary_key_col into vcnt from table where condition
>>     if found then do X else do Y end if
>>
>> Seems to remove the above (item 2) issues (if any)
>
> AFAIK, you cannot do the above as written. PERFORM does not return a result:
> http://www.postgresql.org/docs/9.3/interactive/plpgsql-statements.html#PLPGSQL-STATEMENTS-SQL-NORESULT
> It would have to be more like:
>
>     perform primary_key_col from table where condition

You are absolutely right - my bad.

>> 4) using exists
>>
>>     if exists (select 1 from table where condition) then do X else do Y end if
>>
>> My question is what would be the best (in terms of performance) method to use? My gut
>> feeling is to use option 4 for PG. Am I right, or is there any other method?
>
> All of the above is context specific. To know for sure you will need to test on actual data.

Absolutely right - I just want to ensure that I follow the most optimal method before the DB goes into production, after which priorities change on what can be changed.

I guess the best answer is that it's very context specific, but picking the brains of experienced people helps :-)

Thanks again
Anil
[GENERAL] Performance question
Hello,

I would like to ask, from your experience, which would be the best generic method for checking if rows matching a certain condition exist in a PL/pgSQL function. I know of 4 methods so far (please feel free to add if I missed out any others):

1) Get a count (my previous experience with ORCL shaped this option):

    select count(*) into vcnt from table where condition
    if vcnt > 0 then
        do X
    else
        do Y
    end if

Cons: It seems doing a count(*) is not the best option for PG.

2) Use a non-count option:

    select primary_key_col into vcnt from table where condition
    if found then
        do X
    else
        do Y
    end if

Cons: Some people seem not to prefer this as (AFAIU) it causes plpgsql-to-SQL-to-plpgsql switches.

3) Using PERFORM:

    perform primary_key_col into vcnt from table where condition
    if found then
        do X
    else
        do Y
    end if

Seems to remove the above (item 2) issues (if any).

4) Using EXISTS:

    if exists (select 1 from table where condition) then
        do X
    else
        do Y
    end if

My question is what would be the best (in terms of performance) method to use? My gut feeling is to use option 4 for PG. Am I right, or is there any other method?

Thanks in advance
Anil
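A minimal, runnable sketch of option 4; the table and column names (orders, order_id) are illustrative assumptions, not taken from the thread:

    -- Sketch of option 4 (EXISTS); table/column names are hypothetical.
    CREATE OR REPLACE FUNCTION order_exists(p_id bigint) RETURNS boolean AS $$
    BEGIN
        IF EXISTS (SELECT 1 FROM orders WHERE order_id = p_id) THEN
            RETURN true;   -- "do X"
        ELSE
            RETURN false;  -- "do Y"
        END IF;
    END;
    $$ LANGUAGE plpgsql;

EXISTS stops at the first matching row, which is why it is usually the cheapest of the four patterns when all you need is a yes/no answer.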
Re: [GENERAL] Performance question
Hi Anil:

On Wed, Nov 19, 2014 at 5:26 PM, Anil Menon gakme...@gmail.com wrote:
> Cons: It seems doing a count(*) is not the best option for PG

For this and some of the following options, if you are just going to test for existence, I would consider adding a LIMIT 1 somewhere in the query, to let the optimizer know you only need one row; it can then abort the scan on the first hit. Probably not needed if your query uses a unique index, but it shouldn't hurt.

Francisco Olarte.
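A sketch of the LIMIT 1 idea against a hypothetical table; unlike count(*), this form lets the scan stop at the first matching row (EXISTS achieves the same effect implicitly):

    -- Hypothetical existence test: the scan can stop at the first match.
    SELECT 1 FROM orders WHERE status = 'open' LIMIT 1;

    -- Inside plpgsql, the same idea with PERFORM plus the FOUND variable:
    -- PERFORM 1 FROM orders WHERE status = 'open' LIMIT 1;
    -- IF FOUND THEN ... END IF;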
Re: [GENERAL] Performance question
On 11/19/2014 08:26 AM, Anil Menon wrote:
> Hello,
> I would like to ask, from your experience, which would be the best generic method for
> checking if rows matching a certain condition exist in a PL/pgSQL function. I know of 4
> methods so far (please feel free to add if I missed out any others)
>
> 1) get a count (my previous experience with ORCL shaped this option)
>
>     select count(*) into vcnt from table where condition
>     if vcnt > 0 then do X else do Y end if
>
> Cons: It seems doing a count(*) is not the best option for PG

Well, that would depend on the table size, whether it was 100 rows vs 1,000,000 rows.

> 2) Use a non-count option
>
>     select primary_key_col into vcnt from table where condition
>     if found then do X else do Y end if
>
> Cons: Some people seem not to prefer this as (AFAIU) it causes plpgsql-to-SQL-to-plpgsql switches

plpgsql is fairly tightly coupled to SQL, so I have not really seen any problems. But then I am not working on large datasets.

> 3) using perform
>
>     perform primary_key_col into vcnt from table where condition
>     if found then do X else do Y end if
>
> Seems to remove the above (item 2) issues (if any)

AFAIK, you cannot do the above as written. PERFORM does not return a result:

http://www.postgresql.org/docs/9.3/interactive/plpgsql-statements.html#PLPGSQL-STATEMENTS-SQL-NORESULT

It would have to be more like:

    perform primary_key_col from table where condition

> 4) using exists
>
>     if exists (select 1 from table where condition) then do X else do Y end if
>
> My question is what would be the best (in terms of performance) method to use? My gut
> feeling is to use option 4 for PG. Am I right, or is there any other method?

All of the above is context specific. To know for sure you will need to test on actual data.

> Thanks in advance
> Anil

-- 
Adrian Klaver
adrian.kla...@aklaver.com
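A corrected sketch of option 3 as described here: PERFORM discards the query result, and the FOUND variable carries the outcome (table and column names are hypothetical):

    -- Fragment inside a plpgsql function body; names are illustrative only.
    PERFORM order_id FROM orders WHERE status = 'open';
    IF FOUND THEN
        RAISE NOTICE 'at least one match';  -- "do X"
    ELSE
        RAISE NOTICE 'no match';            -- "do Y"
    END IF;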
Re: [GENERAL] Performance question: Commit or rollback?
On Sun, 2011-12-18 at 14:05 +1100, Chris Angelico wrote:
> On Sun, Dec 18, 2011 at 2:02 PM, Chris Travers chris.trav...@gmail.com wrote:
>> I do not believe there are performance penalties for either. All commit or rollback does
>> is determine visibility of changes made.
>
> Thanks. (And thanks for the incredibly quick response!)
>
> My framework has a read-only mode (determined by user-level access), in which it begins a
> read-only transaction. At the end of it, I currently have it rolling the transaction back
> (to make absolutely sure that no changes will be made), but was concerned that this might
> place unnecessary load on the system. I'll stick with rolling back, since it's not going
> to hurt!
>
> Chris Angelico

The actual rollback won't hurt as long as you have not made any modifications to any records. But opening the transaction could have side effects for other processes that want to modify the records that you want to protect in your read-only transaction.

How about using a database user that has its create/update/delete rights revoked? That will cause an error if the supposedly read-only routine does try to change data.
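A sketch of that approach with hypothetical role and schema names; the role can read everything, but any attempted write fails with a permission error:

    -- Hypothetical read-only role (names are illustrative).
    CREATE ROLE app_readonly LOGIN PASSWORD 'secret';
    GRANT USAGE ON SCHEMA public TO app_readonly;
    GRANT SELECT ON ALL TABLES IN SCHEMA public TO app_readonly;
    -- No INSERT/UPDATE/DELETE granted: writes now raise "permission denied".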
Re: [GENERAL] Performance question: Commit or rollback?
On Sat, Dec 24, 2011 at 11:46 PM, vinny vi...@xs4all.nl wrote:
> The actual rollback won't hurt as long as you have not made any modifications to any
> records. But opening the transaction could have side effects for other processes that
> want to modify the records that you want to protect in your read-only transaction.
>
> How about using a database user that has its create/update/delete rights revoked? That
> will cause an error if the supposedly read-only routine does try to change data.

The read-only-ness of the session is defined based on information stored in the database, so that would entail the cost of re-authenticating. Also, we want to minimize debugging time by having both read-only and read-write access use almost exactly the same code and DB access, meaning that we should not need to test every module in every mode.

ChrisA
Re: [GENERAL] Performance question: Commit or rollback?
On Sat, 2011-12-24 at 23:49 +1100, Chris Angelico wrote:
> The read-only-ness of the session is defined based on information stored in the database,
> so that would entail the cost of re-authenticating.

Yes, you would have to re-authenticate; you'd have to weigh the time cost of that against any performance hits the transaction might cause.

> Also, we want to minimize debugging time by having both read-only and read-write access
> use almost exactly the same code and DB access, meaning that we should not need to test
> every module in every mode.

So, your read-only mode is basically a flag that forces your code to always issue a rollback at the end, instead of a commit for read/write mode. I find that a bit scary. :-)

Regards,
Vincent.
Re: [GENERAL] Performance question: Commit or rollback?
On Sun, Dec 25, 2011 at 12:00 AM, vinny vi...@xs4all.nl wrote:
> So, your read-only mode is basically a flag that forces your code to always issue a
> rollback at the end, instead of a commit for read/write mode. I find that a bit scary. :-)

It's three things:

1) BEGIN TRANSACTION READ ONLY instead of BEGIN TRANSACTION
2) A high-level flag that tells the PHP code that it ought not to change things
3) ROLLBACK instead of COMMIT

ChrisA
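A sketch of the server-side half of that scheme; inside a READ ONLY transaction any write fails immediately, and the final ROLLBACK costs essentially nothing (table name is hypothetical):

    BEGIN TRANSACTION READ ONLY;
    SELECT * FROM accounts WHERE id = 42;  -- reads are fine
    -- UPDATE accounts SET balance = 0 WHERE id = 42;
    --   would fail: "cannot execute UPDATE in a read-only transaction"
    ROLLBACK;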
Re: [GENERAL] Performance question: Commit or rollback?
Chris Angelico wrote:
> On Sat, Dec 24, 2011 at 11:46 PM, vinny vi...@xs4all.nl wrote:
>> How about using a database user that has its create/update/delete rights revoked? That
>> will cause an error if the supposedly read-only routine does try to change data.
>
> Also, we want to minimize debugging time by having both read-only and read-write access
> use almost exactly the same code and DB access, meaning that we should not need to test
> every module in every mode.

You can do it correctly while reusing all of your code; you just have different arguments at connect time, and otherwise your code uses the connection handle in the same way afterwards. It's fine to have flags in the app so the app only tries acceptable things, but privileges in the database are the only way to actually be safe and resilient against accidental changes.

-- Darren Duncan
Re: [GENERAL] Performance question: Commit or rollback?
On Sat, Dec 24, 2011 at 11:34 AM, Darren Duncan dar...@darrenduncan.net wrote:
> You can do it correctly while reusing all of your code; you just have different arguments
> at connect time, and otherwise your code uses the connection handle in the same way
> afterwards. It's fine to have flags in the app so the app only tries acceptable things,
> but privileges in the database are the only way to actually be safe and resilient
> against accidental changes.

Agreed. Roles in the db are important (we use them extensively in LedgerSMB). However, here is a case where it doesn't work so well: suppose I want to run read-write test cases against a production database to look for things that are wrong, but we want to be mathematically sure that the test cases do not commit data to the database.

I don't entirely know the best way to do this in other languages, but here is what we did in Perl:

1) Our application normally uses DBI (and DBD::Pg).

2) We have a special module (LedgerSMB::DBTest) which basically exposes the portions of the DBI/DBD::Pg interface we are using, but lies to the higher levels about committing. Basically it returns true, but does nothing database-wise, providing the appearance of a consistent set of changes while in fact the changes are still to be rolled back.

3) In our test case scripts, we switch out the DBI database handles with instances of LedgerSMB::DBTest.

This works well because it fails safely. If we omit something from the DBTest module, our code will error. If we have test cases that depend on some transactions committing and others rolling back, we have to order the test cases appropriately or the test cases fail. In other words, the worst we can get are test case failures, not spurious commits, and the changes necessary to make this happen are in the test case files themselves. This makes it easy to verify.

Best Wishes,
Chris Travers
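The same guarantee can be sketched in plain SQL: run the whole test inside a single transaction and always roll it back, so the test's writes are visible to its own assertions but can never commit (table and values are hypothetical):

    BEGIN;
    INSERT INTO invoices (id, total) VALUES (1, 100.00);  -- hypothetical test write
    SELECT total FROM invoices WHERE id = 1;              -- assertions see the uncommitted row
    ROLLBACK;                                             -- nothing ever reaches the database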
[GENERAL] Performance question: Commit or rollback?
Some of my code involves transactions which will not have significant effect on the database. It might be a read-only transaction (possibly declared as one, but possibly not), or perhaps a completely empty transaction - I have a framework that will always open a transaction, then call on other code, and then complete the transaction at the bottom. In these instances, is it better to commit or to rollback? Are there performance penalties to either option?

Similarly, what about when the first action in a transaction puts it in an error state? Is it better to commit (which should have no effect - nothing succeeded) or to roll back?

I could test these things experimentally, but am afraid I'll skew my results based on the data I use. Hoping that somebody here knows for certain - there does seem to be a wealth of expertise here.

Chris Angelico
Re: [GENERAL] Performance question: Commit or rollback?
On Sat, Dec 17, 2011 at 6:53 PM, Chris Angelico ros...@gmail.com wrote:
> Some of my code involves transactions which will not have significant effect on the
> database. It might be a read-only transaction (possibly declared as one, but possibly
> not), or perhaps a completely empty transaction - I have a framework that will always
> open a transaction, then call on other code, and then complete the transaction at the
> bottom. In these instances, is it better to commit or to rollback? Are there performance
> penalties to either option?

I do not believe there are performance penalties for either. All commit or rollback does is determine visibility of changes made.

Best Wishes,
Chris Travers
Re: [GENERAL] Performance question: Commit or rollback?
On Sun, Dec 18, 2011 at 2:02 PM, Chris Travers chris.trav...@gmail.com wrote:
> I do not believe there are performance penalties for either. All commit or rollback does
> is determine visibility of changes made.

Thanks. (And thanks for the incredibly quick response!)

My framework has a read-only mode (determined by user-level access), in which it begins a read-only transaction. At the end of it, I currently have it rolling the transaction back (to make absolutely sure that no changes will be made), but was concerned that this might place unnecessary load on the system. I'll stick with rolling back, since it's not going to hurt!

Chris Angelico
[GENERAL] Performance question
Hello list,

If I've got a trigger that calls a function each time there is a DELETE or UPDATE operation on a table in my system, and in that function I retrieve some boolean information from another table which decides whether additional code will be run in the function: could the mere fact of calling the function and selecting the data from another table (with the trigger firing on each UPDATE and DELETE on any table) affect the overall DB performance in a noticeable manner?
Re: [GENERAL] Performance question
On Thu, Aug 16, 2007 at 08:52:02AM -0300, Marcelo de Moraes Serpa wrote:
> Hello list,
>
> If I've got a trigger that calls a function each time there is a DELETE or UPDATE
> operation on a table in my system, and in that function I retrieve some boolean
> information from another table which decides whether additional code will be run in the
> function: could the mere fact of calling the function and selecting the data from
> another table (with the trigger firing on each UPDATE and DELETE on any table) affect
> the overall DB performance in a noticeable manner?

Of course; you're adding at least one extra query to each UPDATE and DELETE, plus the overhead of the trigger itself. The real question is: so what? If you need that logic to happen, you need it to happen. Unless you'll be updating or deleting scores of rows a second, I wouldn't worry too much about it. Remember the first rule of performance tuning: don't. :)

-- 
Decibel!, aka Jim Nasby [EMAIL PROTECTED]
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
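A minimal sketch of the setup being discussed, with hypothetical table and flag names; the SELECT inside the trigger function is the "one extra query" per affected row:

    -- Hypothetical flag table consulted by the trigger function.
    CREATE TABLE audit_config (enabled boolean NOT NULL);

    CREATE OR REPLACE FUNCTION on_change() RETURNS trigger AS $$
    DECLARE
        flag boolean;
    BEGIN
        SELECT enabled INTO flag FROM audit_config LIMIT 1;  -- extra query per row
        IF flag THEN
            RAISE NOTICE 'audited change on %', TG_TABLE_NAME;  -- stand-in for the real work
        END IF;
        RETURN NULL;  -- return value is ignored for AFTER triggers
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER t_audit AFTER UPDATE OR DELETE ON some_table
        FOR EACH ROW EXECUTE PROCEDURE on_change();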
Re: [GENERAL] Performance Question - Table Row Size
I see. Thank you for the elaborate response. I have a clearer idea of what is going on now.

In designing my application I was thinking of storing pieces of my data as serialized Python data structures in a binary field (no more than 15KB), while a friend was arguing I should store the data in other tables and relate the tables together. He was arguing that storing binary data in a table, even though it is not queried, slows down other queries on the table.

Thanks again,
Mike
Re: [GENERAL] Performance Question - Table Row Size
Mike [EMAIL PROTECTED] writes:
> In designing my application I was thinking of storing pieces of my data as serialized
> Python data structures in a binary field (no more than 15KB), while a friend was arguing
> I should store the data in other tables and relate the tables together. He was arguing
> that storing binary data in a table, even though it is not queried, slows down other
> queries on the table.

A 15KB column value is going to be stored out-of-line in the TOAST table anyway, so your table tuple will just contain a pointer to it, which isn't very big. If you don't use that column value in a given query its effect will be minimal.

-Doug
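A quick way to see this from SQL, with a hypothetical table holding one serialized blob per row; pg_column_size() reports the stored (possibly compressed/TOASTed) size of a value:

    -- Hypothetical table: one ~15KB serialized blob per row.
    CREATE TABLE sessions (id serial PRIMARY KEY, payload bytea);
    -- The blob is TOASTed out of line, so the main tuple stays small and
    -- queries that do not touch "payload" barely pay for it.
    SELECT id, pg_column_size(payload) AS stored_bytes FROM sessions;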
[GENERAL] Performance Question - Table Row Size
Hi,

I am designing my database and I was wondering whether my table row size affects the performance of querying my table. Please note that my table is being designed to hold a high volume of records, and I do not plan to do (select *) to retrieve them. That is, I plan to query only a few of those fields at a given time, but each row contains significantly more data that is not being queried at the time.

Thanks,
Mike
Re: [GENERAL] Performance Question - Table Row Size
On 7/9/07, Mike [EMAIL PROTECTED] wrote:
> I am designing my database and I was wondering whether my table row size affects the
> performance of querying my table. Please note that my table is being designed to hold a
> high volume of records, and I do not plan to do (select *) to retrieve them. That is, I
> plan to query only a few of those fields at a given time, but each row contains
> significantly more data that is not being queried at the time.

Obvious observation: since PostgreSQL's unit of data storage is the page, selects - even on single attributes - result in entire pages being loaded into memory and then read. Since the cache (PostgreSQL's shared buffers plus the OS file system cache) holds pages, not individual attributes, more data per tuple (row) means fewer tuples fit in the cache.

As far as the CPU cache goes, as I understand it, the fact that you're reading just a few attributes from each tuple (maybe even just a few from each page) is inefficient - you will be forcing lots of data into the cache that is never used.

In general, you might be better off normalizing your table, if possible, or partitioning it into subtables. But these are the broad strokes - how many columns are we talking about exactly, and of what data types?

Alexander.
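One concrete form of that advice is vertical partitioning: keep the frequently queried attributes in a narrow table and move the wide, rarely read columns into a companion table joined by key (all names here are hypothetical):

    -- Narrow "hot" table: small rows, so many fit per page.
    CREATE TABLE items (id bigint PRIMARY KEY, status text, updated_at timestamp);

    -- Wide "cold" companion table for rarely read data.
    CREATE TABLE item_details (
        id   bigint PRIMARY KEY REFERENCES items(id),
        blob bytea
    );

    -- The hot path touches only the narrow table:
    SELECT id, status FROM items WHERE status = 'active';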
Re: [GENERAL] Performance Question - Table Row Size
Mike [EMAIL PROTECTED] writes:
> I am designing my database and I was wondering whether my table row size affects the
> performance of querying my table.

Yes. If your table is large and you're reading all the rows then you'll be limited by the I/O rate. If your rows are twice as big it will take twice as much I/O to read them, and it will take twice as long.

-- 
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Re: [GENERAL] Performance Question
In article [EMAIL PROTECTED], Terry Lee Tucker [EMAIL PROTECTED] wrote:
> elements of 50 thousand records on 8 structurally identical databases. We threw together
> the script and decided to just delete the record and re-insert it with the data that was
> brought into sync. Now the question: Is it just as fast to do it this way, or is there
> some hidden advantage to performing an update?

If you have foreign key relationships to the table being updated, then deleting from that table will often be slower than updating.

-- 
Patrick TJ McPhee
North York Canada
[EMAIL PROTECTED]
[GENERAL] Performance Question
Hello List:

I've been told that an update to a record is equivalent to a delete and insert operation. We have a utility written in Perl that brings into sync certain elements of 50 thousand records on 8 structurally identical databases. We threw together the script and decided to just delete the record and re-insert it with the data that was brought into sync. Now the question: is it just as fast to do it this way, or is there some hidden advantage to performing an update?

Just curious. TIA

    master=# select version();
                              version
    ------------------------------------------------------------
     PostgreSQL 7.4.6 on i686-redhat-linux-gnu, compiled by GCC
     gcc (GCC) 3.2.3 20030502 (Red Hat Linux 3.2.3-49)
Re: [GENERAL] Performance Question
Terry Lee Tucker [EMAIL PROTECTED] writes:
> Hello List:
>
> I've been told that an update to a record is equivalent to a delete and insert operation.
> We have a utility written in Perl that brings into sync certain elements of 50 thousand
> records on 8 structurally identical databases. We threw together the script and decided
> to just delete the record and re-insert it with the data that was brought into sync. Now
> the question: is it just as fast to do it this way, or is there some hidden advantage to
> performing an update?

UPDATE will probably be somewhat faster because it's only one SQL statement to parse, plan and execute.

-Doug
Re: [GENERAL] Performance Question
Terry Lee Tucker [EMAIL PROTECTED] writes:
> Now the question: is it just as fast to do it this way, or is there some hidden advantage
> to performing an update?

If you're doing the whole DELETE/INSERT as a single transaction then it should be roughly comparable. The UPDATE operation tries to keep the records on the same page, which makes it a faster operation all else being equal - but all else is rarely equal.

One way it would be unequal is if you can do your DELETE as a single query and the insert operation as a single large COPY FROM. Even if you issue 50 thousand INSERTs and a single big DELETE, that would be better than issuing 50 thousand separate UPDATEs that have to use index lookups to track down the tuples being updated.

Just be sure not to issue 50 thousand separate transactions; that will be *much* slower.

-- greg
Re: [GENERAL] Performance Question
On Wednesday 14 June 2006 13:24, Greg Stark [EMAIL PROTECTED] wrote:
> One way it would be unequal is if you can do your DELETE as a single query and the insert
> operation as a single large COPY FROM.

This is definitely the fastest way to update tens of thousands of rows if you know they all need to be replaced. It saves on index lookups and also network latency to the feeding app. I have also had measurable success COPYing data into a temp table and then using joins against that to delete, update, or insert only the rows that actually need to be processed in the real table (saving unnecessary index updates).

-- 
In a truly free society, Alcohol, Tobacco and Firearms would be a convenience store chain.
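A sketch of that staging pattern with hypothetical names: COPY the incoming rows into a temp table, then touch only the real rows that differ:

    -- Stage incoming data (all names hypothetical).
    CREATE TEMP TABLE staging (id bigint, val text);
    COPY staging FROM '/tmp/sync.dat';  -- or COPY ... FROM STDIN from the feeding app

    -- Update only rows that actually changed
    -- (plain <> skips NULLs; use IS DISTINCT FROM if NULLs matter).
    UPDATE target t
       SET val = s.val
      FROM staging s
     WHERE t.id = s.id AND t.val <> s.val;

    -- Insert rows that are missing from the real table.
    INSERT INTO target (id, val)
    SELECT s.id, s.val
      FROM staging s
     WHERE NOT EXISTS (SELECT 1 FROM target t WHERE t.id = s.id);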
Re: [GENERAL] Performance Question
On Wednesday 14 June 2006 03:57 pm, Terry Lee Tucker [EMAIL PROTECTED] thus communicated:
> Hello List:
>
> I've been told that an update to a record is equivalent to a delete and insert operation.
> [...] Now the question: is it just as fast to do it this way, or is there some hidden
> advantage to performing an update?
>
> Just curious. TIA

Thanks for the answers. This list is a BIG help to us all :o]
[GENERAL] Performance question (FOR loop)
hi,

I have this performance question.

    create view test_v as
        select 'text'::varchar as field1, 'text'::varchar as field2;

    create or replace function test() returns setof test_v as $$
    declare
        res test_v%ROWTYPE;
    begin
        for res in select t1.field1, t1.field2 from table1 t1 loop
            return next res;
        end loop;
        return;
    end;
    $$ language plpgsql;

where table1 has fields other than field1 and field2. I can run this query at the prompt, but I do not want the application layer to know my database schema. The only way I know to hide the database architecture is to give them the stored procedure name to call (in this case: test()).

The query I am actually trying to optimize is long and has a few joins (for normalization) and hence I didn't copy it here. The function structure is similar to the one above.

(a) Am I right in thinking that if I eliminate the for loop, some performance gain can be achieved?
(b) Is there a way to eliminate this for loop?
(c) Is there any other way anyone has implemented where they have an application-layer API accessing the database with its schema hidden?

thanks,
vish
Re: [GENERAL] Performance question (FOR loop)
vishal saberwal [EMAIL PROTECTED] writes:
> The query I am actually trying to optimize is long and has a few joins (for
> normalization) and hence I didn't copy it here. The function structure is similar to the
> one above.
> (a) Am I right in thinking that if I eliminate the for loop, some performance gain can be
> achieved?
> (b) Is there a way to eliminate this for loop?

Is the plpgsql layer actually doing anything useful, or just passing back the results of the SQL command? If the latter, use a SQL function instead. Or perhaps even better, replace the function by a view.

regards, tom lane
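A sketch of that suggestion applied to the function from the question: a SQL-language function returns the set directly, with no plpgsql loop (table1 is the hypothetical table from the original post):

    create or replace function test() returns setof test_v as $$
        select t1.field1, t1.field2 from table1 t1;
    $$ language sql stable;

This keeps the schema hidden behind the function name while letting the planner run the query as a single set-returning statement.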
[GENERAL] Performance question
Hello,

Has anyone tried to install Postgres on an AMD Opteron (dual core) box with 4 CPUs, 64GB main memory and 3TB RAID 10 (24 disks), running Linux (2.6), for a data warehouse of 1TB? Any pointers?

Kind regards
Rainer
Re: [GENERAL] Performance question
On Mon, 2005-05-23 at 14:07, LiSim: Rainer Mokros wrote:
> Hello,
> Has anyone tried to install Postgres on an AMD Opteron (dual core) box with 4 CPUs, 64GB
> main memory and 3TB RAID 10 (24 disks), running Linux (2.6), for a data warehouse of
> 1TB? Any pointers?

No, but if you buy my ticket I'd fly out and install it for free just to play on such a machine for an afternoon. Seriously.

For that many CPUs, you might want to try running the latest snapshot or daily CVS tip, as Tom made some changes that seem to really increase performance on larger SMP systems.

For performance tuning, look at the tuning guide on varlena first:

http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html

I'd also make sure you're getting the best RAID controller you can afford, with a reliable and fast driver for your OS.
Re: [GENERAL] Performance question
Oleg Bartunov [EMAIL PROTECTED] writes:
> On Wed, 2 Jul 2003, Tom Lane wrote:
>> You might find it useful to read the slides from my talk at last year's O'Reilly
>> conference about this and related concurrency problems:
>> http://conferences.oreillynet.com/cs/os2002/view/e_sess/2681
>
> I'd like to see all presentations in one collection. It'd be a nice addition to the
> documentation.

Yes. Last year I asked Vince to put those slides up somewhere on the postgresql.org website, but he never got around to it (I think he got stuck wondering where they should go). Bruce has materials for several different talks he's given that should be there somewhere, too.

Perhaps someone in the new webmastering crew would like to give the idea some thought. Or would techdocs be the right place to go?

regards, tom lane
Re: [GENERAL] Performance question
Tom Lane wrote:
> Yes. Last year I asked Vince to put those slides up somewhere on the postgresql.org
> website, but he never got around to it (I think he got stuck wondering where they should
> go). Bruce has materials for several different talks he's given that should be there
> somewhere, too.
> Perhaps someone in the new webmastering crew would like to give the idea some thought.
> Or would techdocs be the right place to go?

I've always thought what the PHP group does with presentation materials is nice (simple but sufficient). See:

http://conf.php.net/

Could we have a conf.postgresql.org with links from the home page?

Joe
Re: [webmaster] [GENERAL] Performance question
Tom Lane wrote:
> Oleg Bartunov [EMAIL PROTECTED] writes:
>> I'd like to see all presentations in one collection. It'd be a nice addition to the
>> documentation.
>
> Yes. Last year I asked Vince to put those slides up somewhere on the postgresql.org
> website, but he never got around to it (I think he got stuck wondering where they should
> go). Bruce has materials for several different talks he's given that should be there
> somewhere, too.
> Perhaps someone in the new webmastering crew would like to give the idea some thought.
> Or would techdocs be the right place to go?

All my stuff is on my home page under Writings/Computer.

-- 
Bruce Momjian | http://candle.pha.pa.us
[EMAIL PROTECTED] | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
[GENERAL] Performance question
I'm trying to convince another open-source project (phpOpenTracker) to modify their current INSERT SQL queries. Currently they just do an INSERT into a table without first checking whether there might be a record with the same primary key. The reason for this is that they need fast inserts, and most users, I assume, are using MySQL, which silently drops INSERT queries that violate primary key constraints. Postgres, on the other hand (and rightly so), raises and logs an error.

I have suggested that their current INSERT INTO t VALUES (...) be changed to:

    INSERT INTO t
    SELECT 'v1', 'v2'
    WHERE NOT EXISTS (
        SELECT NULL FROM t WHERE pk = 'v1'
    );

However, one of the developers is worried that this would cause a performance drop for MySQL users because of the extra SELECT in my version of the INSERT query.

I had thought that the 'extra' SELECT isn't really extra at all, since *any* DB still has to check whether there is a record with the primary key that we are trying to insert. Whereas in my query the SELECT is explicitly stated, in the regular version of a simple INSERT the select is still there, but implicit, since a primary key was defined on the table. So there really shouldn't be much, if any, of a performance drop.

Is there any truth to my logic in the previous paragraph? :) I'd like to test my theory on a real MySQL database, but I don't have access to one right now, and I am not sure how to go about testing a MySQL db, or even what kind of testing. If I can get a spare machine I will give it a go though.

Thanks,
Jean-Christian Imbeault
Re: [GENERAL] Performance question (stripped down the problem)
On Thu, Sep 20, 2001 at 11:10:02AM +0200, Herbert Liechti wrote:
> I tried it. See my actions below. The main performance boost is reached by creating an
> index and disabling the sequential scan:
>
> Without any index:
>     real 0m18.128s
>     user 0m0.010s
>     sys  0m0.010s
>
> Same statement with index:
>     real 0m18.259s
>     user 0m0.020s
>     sys  0m0.010s

no difference

> now disable seqscan:
>
> time psql tt <<END
> set enable_seqscan = off;
> END
>
>     real 0m3.701s
>     user 0m0.010s
>     sys  0m0.000s

same here (dual PIII-866, Debian, 512 MB, raid1+0):

    real 0m6.472s
    user 0m0.000s
    sys  0m0.010s

    real 0m6.195s
    user 0m0.010s
    sys  0m0.000s

    real 0m2.885s
    user 0m0.010s
    sys  0m0.000s

tinus
Re: [GENERAL] Performance question (stripped down the problem)
On Wed, 19 Sep 2001, Tom Lane wrote:
> No. In the first place, there's no extra sort: the planner is well aware that our current
> GROUP BY implementation produces ordered output. In the second place, there's no
> guarantee that GROUP BY will always produce ordered output in the future --- we are
> thinking about changing over to a hash-table-based implementation of grouping. If you
> want ordered output, you should say ORDER BY, not try to outsmart the system.

And even if I did so - sorting 51 items takes next to nothing. So the problem is elsewhere. I did further tests and got:

    SELECT Count(Hauptdaten_Fall.ID) AS Anz FROM Hauptdaten_Fall
    WHERE (((Hauptdaten_Fall.IstAktuell)=20));

      anz
    --------
     177458
    (1 row)

- 2 seconds ... that means longer than the MS SQL server takes with the additional GROUP BY. And

    SELECT Hauptdaten_Fall.MeldeKategorie, Count(Hauptdaten_Fall.ID) AS Anz
    FROM Hauptdaten_Fall
    WHERE (((Hauptdaten_Fall.IstAktuell)=20))
    GROUP BY Hauptdaten_Fall.MeldeKategorie;

... without ORDER BY - which doesn't matter regarding the result set (it is as ordered as it would be with the ORDER BY clause) - stays at 18 seconds (after I increased sort_mem to 2048 it was *slightly* faster than with the default sort_mem).

So the real bottleneck seems to be the GROUP BY. Any chance that this could be changed in future PostgreSQL versions? This slowness makes the use of PostgreSQL impossible for our application.

Kind regards
Andreas.
Re: [GENERAL] Performance question (stripped down the problem)
On Thu, 20 Sep 2001, Herbert Liechti wrote:
> I tried it. See my actions below. The main performance boost is reached by creating an
> index and disabling the sequential scan:

Thanks. I tried this and it helps indeed (see below).

> create index ix_1 on hauptdaten_fall(meldekategorie);

I did so before in other tests.

> Same statement:
>     real 0m18.259s
>     user 0m0.020s
>     sys  0m0.010s

The same on my machine:

    real 0m18.128s
    user 0m0.070s
    sys  0m0.010s

> now disable seqscan:
>
> time psql tt <<END
> set enable_seqscan = off;
> SELECT Hauptdaten_Fall.MeldeKategorie, Count(Hauptdaten_Fall.ID) AS Anz
> FROM Hauptdaten_Fall WHERE Hauptdaten_Fall.IstAktuell = 20
> GROUP BY Hauptdaten_Fall.MeldeKategorie
> ORDER BY Hauptdaten_Fall.MeldeKategorie;
> END
>
>     real 0m3.701s
>     user 0m0.010s
>     sys  0m0.000s

    real 0m5.905s
    user 0m0.060s
    sys  0m0.030s

I have no real explanation why I get the same result in the first case but significantly more time in the second, but it helps for the first step. On the other hand, if I test my *real* database:

    real 0m20.539s
    user 0m0.060s
    sys  0m0.060s

and with set enable_seqscan = off:

    real 0m10.133s
    user 0m0.040s
    sys  0m0.020s

I get in fact an increase of speed by factor 2, but it is still far too slow for our application. If I start a slightly more complex query (not to mention that we are far from the amount of data we will get after a year), I get the following stats:

                   default           enable_seqscan = off   MS SQL server
    query 1        real 0m30.891s    real 0m27.165s         about 1s
                   user 0m0.050s     user 0m0.080s
                   sys  0m0.070s     sys  0m0.050s

    other query    real 0m53.698s    real 0m54.481s         about 2.5s
                   user 0m0.190s     user 0m0.180s
                   sys  0m0.040s     sys  0m0.040s

This is about factor 20 compared to the MS SQL server, and I have a real hard time arguing for PostgreSQL. In fact the MS SQL server times are estimated from inside Access - the plain server would be even faster. By the way, the last example shows that enforcing an index scan doesn't necessarily make things faster - perhaps it could even slow down other queries?

I would be happy to forward the exact queries which led to these measurements if someone is interested.

Kind regards
Andreas.
Re: [GENERAL] Performance question (stripped down the problem)
On Thu, 20 Sep 2001, Justin Clift wrote:
> Sorry, I haven't seen the history of this thread. One question which might be relevant
> is, have you adjusted the postgresql.conf file from the default memory settings to be
> something better?

I adjusted two parameters:

shared_buffers = 2048
(When I tried 4096 I got a connection error. I don't know what this means, but anyway, increasing this value did not change anything.)

sort_mem = 2048
(After increasing this value (from the default 512) to 1024 I got an increase in speed from 20s to 18s - not much, but better than nothing. Further increase to 2048 did not change anything further, so I stopped there.)

> If these are the times you're getting from a default configuration, you might be able to
> get far better results by doing performance tuning of PostgreSQL and/or the server.

Are there any other values which might help here?

Kind regards
Andreas.
Re: [GENERAL] Performance question (stripped down the problem)
Hi Andreas,

Sorry, I haven't seen the history of this thread. One question which might be relevant is, have you adjusted the postgresql.conf file from the default memory settings to be something better?

If these are the times you're getting from a default configuration, you might be able to get far better results by doing performance tuning of PostgreSQL and/or the server.

What do you think?

Regards and best wishes,

Justin Clift

Tille, Andreas wrote:
> On Thu, 20 Sep 2001, Herbert Liechti wrote:
>> I tried it. See my actions below. The main performance boost is reached by creating an
>> index and disabling the sequential scan:
>
> Thanks. I tried this and it helps indeed (see below). [...]
>
> I would be happy to forward the exact queries which led to these measurements if someone
> is interested.
>
> Kind regards
> Andreas.
Re: [GENERAL] Performance question (stripped down the problem)
On Thu, 20 Sep 2001, Einar Karttunen asked me for query plans for both M$ SQL and PostgreSQL:

M$ SQL:

    |--Compute Scalar(DEFINE:([Expr1002]=Convert([Expr1005])))
         |--Stream Aggregate(GROUP BY:([Hauptdaten_Fall].[MeldeKategorie]) DEFINE:([Expr1005]=Count(*)))
              |--Index Scan(OBJECT:([IfSG].[dbo].[Hauptdaten_Fall].[IX_MeldeKategorie]), ORDERED FORWARD)

PostgreSQL:

    time psql ifsg <<END
    explain SELECT Hauptdaten_Fall.MeldeKategorie, Count(Hauptdaten_Fall.ID) AS Anz
    FROM Hauptdaten_Fall WHERE (((Hauptdaten_Fall.IstAktuell)=20))
    GROUP BY Hauptdaten_Fall.MeldeKategorie
    ORDER BY Hauptdaten_Fall.MeldeKategorie;
    END

    NOTICE: QUERY PLAN:

    Aggregate (cost=32881.62..33768.91 rows=17746 width=16)
      -> Group (cost=32881.62..33325.27 rows=177458 width=16)
           -> Sort (cost=32881.62..32881.62 rows=177458 width=16)
                -> Seq Scan on hauptdaten_fall (cost=0.00..15024.12 rows=177458 width=16)

    real 0m1.382s
    user 0m0.040s
    sys  0m0.020s

And the other case, enforcing an index scan:

    time psql ifsg <<END
    set enable_seqscan = off;
    explain SELECT Hauptdaten_Fall.MeldeKategorie, Count(Hauptdaten_Fall.ID) AS Anz
    FROM Hauptdaten_Fall WHERE (((Hauptdaten_Fall.IstAktuell)=20))
    GROUP BY Hauptdaten_Fall.MeldeKategorie
    ORDER BY Hauptdaten_Fall.MeldeKategorie;
    END

    NOTICE: QUERY PLAN:

    Aggregate (cost=0.00..146770.97 rows=17746 width=16)
      -> Group (cost=0.00..146327.32 rows=177458 width=16)
           -> Index Scan using ix_meldekategorie_hauptdaten_fa on hauptdaten_fall
              (cost=0.00..145883.68 rows=177458 width=16)

    real 0m0.102s    (for sure it's faster to have a plan if enforced ...)
    user 0m0.030s
    sys  0m0.020s

Does this help in any way? If I'm not completely wrong, the M$ SQL server also prefers to use the index ix_meldekategorie.

Kind regards
Andreas.
Re: [GENERAL] Performance question (stripped down the problem)
Hi Andreas,

I'm running PostgreSQL 7.1.3 here on a PC with nearly a gig of RAM, running Linux Mandrake 8.0.

First thing I did was to increase the amount of shared memory and stuff which Linux allows things to use:

    echo "kernel.shmall = 134217728" >> /etc/sysctl.conf
    echo "kernel.shmmax = 134217728" >> /etc/sysctl.conf

For my system, that'll raise the shared memory limits to 128MB at system boot time. BTW, the 134217728 figure = 128MB (128 * 1024 * 1024).

Then I changed the limits for the running system (so no reboot is necessary):

    echo 134217728 > /proc/sys/kernel/shmall
    echo 134217728 > /proc/sys/kernel/shmmax

Then adjusted the postgresql.conf file with these values:

    sort_mem = 32768
    shared_buffers = 220

Now, that's a bunch of shared_buffers, but at the same time I also raised max_connections to 110. This seems to have dropped my execution times, but I haven't seriously gotten around to tuning this system.

The key thing I think you've missed is to update the shared memory, etc. More info about it can be found at:

http://www.postgresql.org/idocs/index.php?kernel-resources.html

Bruce Momjian also put together some information about optimising things with PostgreSQL at:

http://www.ca.postgresql.org/docs/hw_performance/

If you want to be able to benchmark things on your system, I use the Open Source Database Benchmark (Linux only at present), running the latest CVS version of it, tweaked to not use hash indices. A tarball of working source code is available at:

http://techdocs.postgresql.org/techdocs/perftuningfigures.php

Hope this is of assistance Andreas.

Regards and best wishes,

Justin Clift

Tille, Andreas wrote:
> On Thu, 20 Sep 2001, Justin Clift wrote:
>> Sorry, I haven't seen the history of this thread. One question which might be relevant
>> is, have you adjusted the postgresql.conf file from the default memory settings to be
>> something better?
>
> I adjusted two parameters:
>
> shared_buffers = 2048 [...]
> sort_mem = 2048 [...]
>
> Are there any other values which might help here?
>
> Kind regards
> Andreas.
Re: [GENERAL] Performance question (stripped down the problem)
Tille, Andreas writes:
> SELECT Hauptdaten_Fall.MeldeKategorie, Count(Hauptdaten_Fall.ID) AS Anz
> FROM Hauptdaten_Fall WHERE (((Hauptdaten_Fall.IstAktuell)=20))
> GROUP BY Hauptdaten_Fall.MeldeKategorie
> ORDER BY Hauptdaten_Fall.MeldeKategorie;
>
> (which should just measure the time needed for this task). It took my E250 (2x400MHz,
> 2GB) server 20 seconds and this is definitely much too long for our application.

I loaded this into 7.2 development sources and it ran 35 seconds wall-clock time on a much smaller machine. After I ran what in 7.1 would be VACUUM ANALYZE it took about 22 seconds. The difference was that it was using the index on hauptdaten_fall.istaktuell when it shouldn't. (Try EXPLAIN to see what it does in your case. If it's still using the index you might want to force enable_indexscan = off.)

I also got a minuscule speed-up by replacing Count(Hauptdaten_Fall.ID) with Count(*), which acts differently with respect to nulls, so it depends on whether you want to use it.

Besides that, I don't see anything blatantly obvious to speed this up.

-- 
Peter Eisentraut [EMAIL PROTECTED] http://funkturm.homeip.net/~peter
[GENERAL] Performance question
Hello,

I have ported a database from MS SQL Server to PostgreSQL. The database has 40 tables and 117 indexes, which I defined on the same fields as in MS SQL. I converted the data using some SQL output from MS SQL Server and inserted it with psql.

Now I started some performance comparisons and ran the following statement:

    SELECT Hauptdaten_Fall.MeldeKategorie, Count(Hauptdaten_Fall.ID) AS Anz
    FROM Hauptdaten_Fall
    WHERE (((Hauptdaten_Fall.IstAktuell)=20))
    GROUP BY Hauptdaten_Fall.MeldeKategorie
    ORDER BY Hauptdaten_Fall.MeldeKategorie;

(sorry for the German names used here). The MS SQL server returns the result immediately - you just don't notice any delay. If I do it on the PostgreSQL server it takes 30s on comparable hardware :-(((.

I really have no idea what caused this big difference in speed, which would forbid the use of PostgreSQL for our application. How can I check whether the indexes I created are really used? What could be other reasons for such a big difference in speed?

I'm using PostgreSQL 7.1.2 on Debian GNU/Linux (testing).

Any help appreciated
Andreas.
Re: [GENERAL] Performance question
On Mon, 10 Sep 2001 [EMAIL PROTECTED] wrote:
> Use explain. Explain tells you the query plan of the optimizer.
> explain SELECT ...;

Thanks. I just found the thread "Index usage question" and tried to make some profit from it:

    explain SELECT Hauptdaten_Fall.MeldeKategorie, Count(Hauptdaten_Fall.ID) AS Anz
    FROM Hauptdaten_Fall WHERE (((Hauptdaten_Fall.IstAktuell)=20))
    GROUP BY Hauptdaten_Fall.MeldeKategorie
    ORDER BY Hauptdaten_Fall.MeldeKategorie;

    NOTICE: QUERY PLAN:

    Aggregate (cost=35267.33..36154.62 rows=17746 width=16)
      -> Group (cost=35267.33..35710.98 rows=177458 width=16)
           -> Sort (cost=35267.33..35267.33 rows=177458 width=16)
                -> Seq Scan on hauptdaten_fall (cost=0.00..15024.12 rows=177458 width=16)

Now I tried VACUUM ANALYZE, but nothing changed :-(

    Aggregate (cost=35267.33..36154.62 rows=17746 width=16)
      -> Group (cost=35267.33..35710.98 rows=177458 width=16)
           -> Sort (cost=35267.33..35267.33 rows=177458 width=16)
                -> Seq Scan on hauptdaten_fall (cost=0.00..15024.12 rows=177458 width=16)

I have nearly no experience with query optimizing, but the great difference in speed tells me that something is wrong here. There were some hints in the "Index usage question" thread about some fields which might be interpreted as strings. Could this be a reason, and how would I check this?

Kind regards
Andreas.
Re: [GENERAL] Performance question
On Mon, 10 Sep 2001, Tille, Andreas wrote:
> Hello,
> Now I started some performance comparisons and ran the following statement:
> [...]
> The MS SQL server returns the result immediately - you just don't notice any delay. If I
> do it on the PostgreSQL server it takes 30s on comparable hardware :-(((.

Did you do a VACUUM ANALYZE after loading the data?

Rod

-- 
A small clue and no money will get you further than lots of money and no clue.
Re: [GENERAL] Performance question
On Mon, 10 Sep 2001, Tille, Andreas wrote:
> explain SELECT Hauptdaten_Fall.MeldeKategorie, Count(Hauptdaten_Fall.ID) AS Anz
> FROM Hauptdaten_Fall WHERE (((Hauptdaten_Fall.IstAktuell)=20))
> GROUP BY Hauptdaten_Fall.MeldeKategorie
> ORDER BY Hauptdaten_Fall.MeldeKategorie;
>
> Aggregate (cost=35267.33..36154.62 rows=17746 width=16)
>   -> Group (cost=35267.33..35710.98 rows=177458 width=16)
>        -> Sort (cost=35267.33..35267.33 rows=177458 width=16)
>             -> Seq Scan on hauptdaten_fall (cost=0.00..15024.12 rows=177458 width=16)
>
> I have nearly no experience with query optimizing, but the great difference in speed
> tells me that something is wrong here. There were some hints in the "Index usage
> question" thread about some fields which might be interpreted as strings. Could this be
> a reason, and how would I check this?

What's the schema for the table? How many rows are in the table? How many rows actually have IstAktuell=20 (is 177458 a reasonable estimate?)? If not, is there a common, non-NULL value that is much more common than other values?
Re: [GENERAL] Performance question
On Mon, 10 Sep 2001, Tille, Andreas wrote:
> Hello,
>
> I have ported a database from MS SQL Server to PostgreSQL. The database has 40 tables
> and 117 indexes, which I defined on the same fields as in MS SQL. I converted the data
> using some SQL output from MS SQL Server and inserted it with psql. Now I started some
> performance comparisons and ran the following statement:
>
>     SELECT Hauptdaten_Fall.MeldeKategorie, Count(Hauptdaten_Fall.ID) AS Anz
>     FROM Hauptdaten_Fall
>     WHERE (((Hauptdaten_Fall.IstAktuell)=20))
>     GROUP BY Hauptdaten_Fall.MeldeKategorie
>     ORDER BY Hauptdaten_Fall.MeldeKategorie;
>
> (sorry for the German names used here). The MS SQL server returns the result immediately
> - you just don't notice any delay. If I do it on the PostgreSQL server it takes 30s on
> comparable hardware :-(((. I really have no idea what caused this big difference in
> speed. How can I check whether the indexes I created are really used? What could be
> other reasons for such a big difference in speed?

Use explain. Explain tells you the query plan of the optimizer:

    explain SELECT ...;

Best regards
Herbie

-- 
Herbert Liechti http://www.thinx.ch
ThinX networked business services    Adlergasse 5, CH-4500 Solothurn
Re: [GENERAL] Performance question
On Mon, Sep 10, 2001 at 02:34:25PM +0200, Tille, Andreas wrote:

> On Mon, 10 Sep 2001 [EMAIL PROTECTED] wrote:
>
> > Use explain. Explain tells you the query plan of the optimizer.
> > explain SELECT ...;
>
> Thanks. I just found the thread "Index usage question" and tried to make
> some profit from it:
>
> explain SELECT Hauptdaten_Fall.MeldeKategorie, Count(Hauptdaten_Fall.ID) AS Anz
> FROM Hauptdaten_Fall
> WHERE (((Hauptdaten_Fall.IstAktuell)=20))
> GROUP BY Hauptdaten_Fall.MeldeKategorie
> ORDER BY Hauptdaten_Fall.MeldeKategorie;
>
> NOTICE:  QUERY PLAN:
>
> Aggregate  (cost=35267.33..36154.62 rows=17746 width=16)
>   ->  Group  (cost=35267.33..35710.98 rows=177458 width=16)
>         ->  Sort  (cost=35267.33..35267.33 rows=177458 width=16)
>               ->  Seq Scan on hauptdaten_fall  (cost=0.00..15024.12 rows=177458 width=16)

There must be a problem with your indexes. I tried the following:

temp=# CREATE TABLE Hauptdaten_Fall (
temp(#     MeldeKategorie text,
temp(#     ID integer,
temp(#     IstAktuell integer);
CREATE
temp=# explain SELECT MeldeKategorie,count(ID) FROM Hauptdaten_Fall
temp-# WHERE IstAktuell=20 GROUP BY MeldeKategorie
temp-# ORDER BY MeldeKategorie;
NOTICE:  QUERY PLAN:

Aggregate  (cost=22.67..22.72 rows=1 width=16)
  ->  Group  (cost=22.67..22.69 rows=10 width=16)
        ->  Sort  (cost=22.67..22.67 rows=10 width=16)
              ->  Seq Scan on hauptdaten_fall  (cost=0.00..22.50 rows=10 width=16)

EXPLAIN
temp=# CREATE INDEX hfia ON Hauptdaten_Fall (IstAktuell);
CREATE
temp=# explain SELECT MeldeKategorie,count(ID) FROM Hauptdaten_Fall
temp-# WHERE IstAktuell=20 GROUP BY MeldeKategorie
temp-# ORDER BY MeldeKategorie;
NOTICE:  QUERY PLAN:

Aggregate  (cost=8.30..8.35 rows=1 width=16)
  ->  Group  (cost=8.30..8.33 rows=10 width=16)
        ->  Sort  (cost=8.30..8.30 rows=10 width=16)
              ->  Index Scan using hfia on hauptdaten_fall  (cost=0.00..8.14 rows=10 width=16)

EXPLAIN
temp=#

Which shows quite clearly that an index scan will improve the situation drastically - even more so for you, because the table seems to have very many rows in it.

- Einar Karttunen
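A related diagnostic, not from the thread but useful in this situation: if the index exists and EXPLAIN still chooses a sequential scan, you can disable sequential scans for the session to see whether the planner is able to use the index at all.

SET enable_seqscan = off;   -- for testing only, never in production
EXPLAIN SELECT MeldeKategorie, count(ID) FROM Hauptdaten_Fall
WHERE IstAktuell = 20
GROUP BY MeldeKategorie
ORDER BY MeldeKategorie;
SET enable_seqscan = on;
-- If the plan still shows a Seq Scan, the index cannot be used for this
-- query at all (for example, because of a type mismatch between the
-- column and the constant it is compared against).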
Re: [GENERAL] performance question
Tom Lane wrote:

> ernie cline [EMAIL PROTECTED] writes:
> > Quick question. I am using postgres 7.0. When running a select query,
> > does postgres lock the row (or table) while doing a select? Only on an
> > insert/update/delete, right?
>
> SELECTs certainly don't block each other (except the SELECT FOR UPDATE
> variety).

I kind of didn't think so, but I just wanted to make sure I wasn't insane. ;)

> > We are experiencing some performance problems when running 2 programs
> > that access the database at once (only doing queries, and not even
> > complex ones).
>
> Need more detail ...

What else do you need to know? Postgres 7.0 compiled with default options, running on a VA Linux Fullon box, kernel 2.2.13, running VA Linux/Red Hat 6.1. The box is a single-processor Pentium III 500 MHz with 128 MB of RAM and Apache 1.3.9, using the Perl DBI interface to access Postgres. A very small HTML page is built (about 4.5 kB), and it takes about 2.6 seconds to execute and build the HTML. Running 2 of the pages at once, it takes about 5 seconds for them to complete - not in total, but 5 seconds for each one. This is using just the Unix 'time' command, and moving my mouse to another rxvt window and hitting enter after first hitting it in another window ;). Is using Perl DBI slow? Would writing the CGI in C speed things up? (Please no holy wars over Perl vs. C performance! <g>) Any help would be appreciated ...

-ernie

You could take your query and run it in psql to see if the performance problem is with Postgres or with your Perl stuff. If it's Postgres, redesign your query. Of course, this can be quite difficult ...

BTW, does a reasonably complicated SELECT run faster if you implement it using a view or a stored procedure? If I'm not entirely incorrect, high-end commercial DBs like Oracle can perform some kind of statistical optimization on views and stored procedures. Is this also true for Postgres?

If your Perl stuff is slow, then rewriting it in C won't help that much - unless you have made some gross programming error which gets magically fixed in the C version, that is. Usually the culprit is the braindead way in which CGIs work, i.e.:

1. CGI request goes to Apache.
2. Apache fires up the Perl interpreter, which byte-code compiles your script
   and runs it.
3. The script initiates a connection to Postgres, which forks off a new
   backend to handle the request.
4. After the script has completed the request, it closes the db connection
   and gets unloaded. The forked Postgres backend also dies, of course.
5. Phew! What a lot of work!

Rewriting in C fixes only point 2, and only to a certain extent. The buzzwords to fix the CGI problem are 1) an in-process script interpreter and 2) persistent database connections. 1) can be solved by using, for example, mod_perl, and 2) hmm ... should be doable. I don't know if you have to do this yourself or if some friendly guy has made a library to implement it. Or you can use AOLserver, which does all this for you (my personal choice) - but then you have to rewrite your Perl stuff in Tcl or some other language implemented by AOLserver modules (Python, Java and PHP at least, I think). But considering that your box seems quite fast, I suspect the problem is with your SQL anyway ...

You could of course also try the Postgres tuning tips at http://www.openacs.org/doc/openacs/html/openacs-install-5.html#ss5.4 but this applies mostly to writes, I think.

If you are planning on running a production server, I also recommend you upgrade your kernel to 2.2.16, which fixes a few serious bugs:

1) The disk trashing bug. (Could be a serious problem! OK, this was fixed in
   2.2.15 .. or was it 2.2.14 already .. I don't remember.)
2) The much-hallooed root exploit bug. (Applies only if you have untrusted
   accounts, if I remember correctly.)

Janne Blomqvist [EMAIL PROTECTED]
Re: [GENERAL] performance question
Arg! Thanks to everyone for their help. I followed a few suggestions I got from here, and installed Time::HiRes on my box (that part was my idea actually ;)) to monitor how long the queries take. I know all the stats show that MySQL is faster, but in my situation, Postgres is really kicking ass. None of the queries took longer than .03 seconds ;). Then again, the database is small ... but anyway, in my script, we also make a call to an Oracle database. THAT is slow. Of the 2.6 seconds it takes to execute the script, 2.3 is that Oracle call. Granted, it's a huge ~8 line query, but it's still Oracle's "fault". Thanks for all your help guys and gals!

-ernie

P.S. Anyone know of a script to convert an Oracle db to Postgres? :) Or if I dump the Oracle tables, will Postgres be able to read them?

Tom Lane wrote:

> ernie cline [EMAIL PROTECTED] writes:
> > Need more detail ... What else do you need to know?
>
> What queries are you issuing, *exactly*? You might find it easiest to
> restart the postmaster with -d2 and collect the queries in the postmaster's
> log ...
>
> regards, tom lane
Re: [GENERAL] performance question
> ... I know all the stats show that MySQL is faster ...

A story which, if and when true, stays true only for cases with one or a very few users. We used to just accept MySQL's claims in this regard without question, but it is finally dawning on us that they are not doing representative tests for a multi-client environment :)

- Thomas
[GENERAL] Performance Question ODBC vs C
Hi there,

I have been doing some performance testing with ODBC and have found that ODBC is much slower than using C to call the Psql API. I don't understand why the results are like this; I thought ODBC would slow things down a bit, but not by much - all it's doing is sending the SQL straight to the server, right?

Test
Server: 486 50 MHz, 16 MB RAM
Client: P266, 32 MB RAM
SQL: 1000 INSERT INTOs with quite a lot of fields, read from a text input file.

ODBC test: Windows 98 running VB6 and the freeware ODBC driver (I forget the name), using a TCP/IP network. The VB program will do about 30 records per minute.

C test: Linux C program, reads the input file, calls the APIs. Run on the client PC, so it still communicates over the network. This setup will do 340 records per minute.

My only guess is that one of the following bottlenecks is in the system:
- VB6 code that reads the text file - unlikely, it runs very quickly if you
  remove the db.Execute call
- ODBC parsing - maybe
- Something crappy about Windows - more likely

Any ideas?
Re: [GENERAL] Performance Question ODBC vs C
On Sun, 19 Mar 2000, martin wrote:

> Hi there,
>
> I have been doing some performance testing with ODBC and have found that
> ODBC is much slower than using C to call the Psql API. I don't understand
> why the results are like this; I thought ODBC would slow things down a bit,
> but not by much - all it's doing is sending the SQL straight to the server,
> right?
>
> Test
> Server: 486 50 MHz, 16 MB RAM
> Client: P266, 32 MB RAM
> SQL: 1000 INSERT INTOs with quite a lot of fields, read from a text input
> file.
>
> ODBC test: Windows 98 running VB6 and the freeware ODBC driver (I forget
> the name), using a TCP/IP network. The VB program will do about 30 records
> per minute.
>
> C test: Linux C program, reads the input file, calls the APIs. Run on the
> client PC, so it still communicates over the network. This setup will do
> 340 records per minute.
>
> My only guess is that one of the following bottlenecks is in the system:
> - VB6 code that reads the text file - unlikely, it runs very quickly if
>   you remove the db.Execute call
> - ODBC parsing - maybe
> - Something crappy about Windows - more likely

ODBC sucks on its own, but I'd bet the difference is that you open a transaction for the inserts on Linux and don't do that on Windows (how do you open a transaction in ODBC?). The difference is usually about 1:10 when you do the inserts in a transaction and the server doesn't have to fsync after each insert ...

-alex
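For reference, a minimal sketch of the batching alex describes, in plain SQL (the table and column names are invented for illustration; an ODBC client gets the same effect by turning off auto-commit):

BEGIN;
INSERT INTO import_test (id, txt) VALUES (1, 'first row');
INSERT INTO import_test (id, txt) VALUES (2, 'second row');
-- ... the remaining rows of the batch ...
COMMIT;   -- one fsync here instead of one per INSERT

For one-off bulk loads, PostgreSQL's COPY command is usually faster still, since it skips per-statement parsing and planning entirely.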