Re: [PERFORM] New to PostgreSQL, performance considerations
Hi,

Did someone try '-mfpmath=sse -msse3'? It would be interesting to know whether -mfpmath=sse boosts performance. I guess the difference in the generated code between i686 and prescott isn't that large; the bigger step is i386 to i686. '-mfpmath=sse -msse3' will also use the SSE unit, which the classic i686 doesn't have.

CFLAGS=-O2 -march=prescott -mfpmath=sse -msse3

Best regards
Sven.

Daniel van Ham Colchete schrieb:
> I just made another test with a second Gentoo machine:
>
> Pentium 4 3.0GHz Prescott
> GCC 4.1.1
> Glibc 2.4
> PostgreSQL 8.1.5
> Kernel 2.6.17
>
> Same postgresql.conf as yesterday's.
>
> First test
> ==========
> GLIBC: -O2 -march=i686
> PostgreSQL: -O2 -march=i686
> Results: 974.638731 975.602142 975.882051 969.142503 992.914167
>          983.467131 983.231575 994.901330 970.375221 978.377467
> Average (error): 980 tps (13 tps)
>
> Second test
> ===========
> GLIBC: -O2 -march=i686
> PostgreSQL: -O2 -march=prescott
> Results: 988.319643 976.152973 1006.482553 992.431322 983.090838
>          992.674065 989.216746 990.897615 987.129802 975.907955
> Average (error): 988 tps (15 tps)
>
> Third test
> ==========
> GLIBC: -O2 -march=prescott
> PostgreSQL: -O2 -march=i686
> Results: 969.085400 966.187309 994.882325 968.715150 956.766771
>          970.151542 960.090571 967.680628 986.568462 991.756520
> Average (error): 973 tps (19 tps)
>
> Fourth test
> ===========
> GLIBC: -O2 -march=prescott
> PostgreSQL: -O2 -march=prescott
> Results: 980.888371 978.128269 969.344669 978.021509 979.256603
>          993.236457 984.078399 981.654834 976.295925 969.796277
> Average (error): 979 tps (11 tps)
>
> The results showed no significant change. Today's conclusion would be that there is no improvement in PostgreSQL when using -march=prescott. I see only 3 differences between yesterday's server and today's: the kernel version (y: 2.6.18, t: 2.6.17), the hard drive (today's server uses IDE, yesterday's used SATA), and the gcc version (3.4.6 vs 4.1.1). I don't know why we saw an improvement yesterday and not today.
Best,
Daniel

On 12/12/06, Daniel van Ham Colchete [EMAIL PROTECTED] wrote:
> I'm making some other tests here on another machine (also Gentoo). I found
> out that PostgreSQL stalls for a while if I change the -t parameter on
> pgbench from 600 to 1000, and I get ~150 tps instead of ~950 tps.

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at
http://www.postgresql.org/about/donate
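[Editor's sketch of the kind of test run discussed in this thread; the install prefix, database name, and pgbench scale factor are assumptions, since the thread does not state them:]

```shell
# Build PostgreSQL with arch-specific flags (adjust --prefix and CFLAGS to taste)
CFLAGS="-O2 -march=prescott -mfpmath=sse -msse3" ./configure --prefix=$HOME/pg-prescott
make && make install

# Initialize a pgbench database and run 10 timed batches, as in the tests above
createdb bench
pgbench -i -s 10 bench        # -s: scale factor (assumed; not given in the thread)
for i in $(seq 1 10); do
    pgbench -c 10 -t 600 bench | grep tps
done
```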
[PERFORM] Slow update with simple query
Hi list!

I am running a query to update the boolean field of a table based on another table's fields. The query is (names changed for readability):

UPDATE t1
SET booleanfield = (t2.field1 IN ('some', 'other') AND t2.field2 = 'Y')
FROM t2
WHERE t1.uid = t2.uid

t2.uid is the PRIMARY KEY. t2 only has ~1000 rows, so I think it fits fully in memory. t1 has ~2,000,000 rows. There is an index on t1.uid also. The explain (sorry, no explain analyze available yet) is:

Hash Join  (cost=112.75..307410.10 rows=2019448 width=357)
  Hash Cond: ("outer".uid = "inner".uid)
  ->  Seq Scan on t1  (cost=0.00..261792.01 rows=2033001 width=340)
  ->  Hash  (cost=110.20..110.20 rows=1020 width=53)
        ->  Seq Scan on t2  (cost=0.00..110.20 rows=1020 width=53)

My query has been running for more than 1.5 hours now, and it is still running. Nothing else is running on the server. There are two multicolumn indexes on this column (both are 3-column indexes). One of them has a functional column (date_trunc('month', datefield)). Do you think the problem is with the indexes?

The hardware is not great, but the database is on a RAID1 array, so it's not bad either. I am surprised that it takes more than 3 seconds per row to be updated.

Thanks for your opinion on this!
--
Arnaud
Re: [PERFORM] Slow update with simple query
On mið, 2006-12-13 at 11:51 +0100, Arnaud Lesauvage wrote:
> Hi list ! I am running a query to update the boolean field of a table
> based on another table's fields. The query is (changed names for
> readability):
>
> UPDATE t1
> SET booleanfield = (t2.field1 IN ('some', 'other') AND t2.field2 = 'Y')
> FROM t2
> WHERE t1.uid = t2.uid
>
> t2.uid is the PRIMARY KEY. t2 only has ~1000 rows, so I think it fits
> fully in memory. t1 has ~2,000,000 rows. There is an index on t1.uid
> also. The explain (sorry, no explain analyze available yet) is :
>
> Hash Join  (cost=112.75..307410.10 rows=2019448 width=357)
>   Hash Cond: ("outer".uid = "inner".uid)
>   ->  Seq Scan on t1  (cost=0.00..261792.01 rows=2033001 width=340)
>   ->  Hash  (cost=110.20..110.20 rows=1020 width=53)
>         ->  Seq Scan on t2  (cost=0.00..110.20 rows=1020 width=53)
>
> My query has been running for more than 1.5 hours now, and it is still
> running. Nothing else is running on the server. There are two
> multicolumn indexes on this column (both are 3-column indexes). One of
> them has a functional column (date_trunc('month', datefield)). Do you
> think the problem is with the indexes ?

I guess so. Are you sure about the index on t1.uid? What are the column definitions for t1.uid and t2.uid? Are they the same? You should be able to get a plan similar to:

Merge Join  (cost=0.00..43.56 rows=1000 width=11)
  Merge Cond: ("outer".uid = "inner".uid)
  ->  Index Scan using t1i on t1  (cost=0.00..38298.39 rows=235 width=10)
  ->  Index Scan using t2i on t2  (cost=0.00..26.73 rows=1000 width=5)

What postgres version are you using?

gnari

> The hardware is not great, but the database is on a RAID1 array, so
> it's not bad either. I am surprised that it takes more than 3 seconds
> per row to be updated. Thanks for your opinion on this !
> --
> Arnaud
Re: [PERFORM] Slow update with simple query
Ragnar a écrit :
>> Do you think the problem is with the indexes ?
>
> I guess so. are you sure about the index on t1.uid? what are the
> column definitions for t1.uid and t2.uid ? are they the same ?

Man, no!!! I just checked and indeed, there is no index on this column. I probably dropped it lately. Thanks Ragnar. (t1.uid and t2.uid were the same, character(32) columns.)

> you should be able to get a plan similar to:
>
> Merge Join  (cost=0.00..43.56 rows=1000 width=11)
>   Merge Cond: ("outer".uid = "inner".uid)
>   ->  Index Scan using t1i on t1  (cost=0.00..38298.39 rows=235 width=10)
>   ->  Index Scan using t2i on t2  (cost=0.00..26.73 rows=1000 width=5)
>
> what postgres version are you using ?

Ooops, forgot that too: 8.1.4. I am creating the index right now, I'll tell you if this fixes the problem.

Thanks for your help!
--
Arnaud
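[Editor's sketch of the fix being applied here, using the thread's placeholder names; the index name is made up:]

```sql
-- Recreate the missing index on the join column, then refresh statistics
CREATE INDEX t1_uid_idx ON t1 (uid);
ANALYZE t1;

-- Re-check the plan: with matching character(32) columns and indexes on
-- both sides, the planner can now consider an index-based join
EXPLAIN
UPDATE t1
SET booleanfield = (t2.field1 IN ('some', 'other') AND t2.field2 = 'Y')
FROM t2
WHERE t1.uid = t2.uid;
```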
Re: [PERFORM] Slow update with simple query
Hi,

the problem is a combination of badly formed SQL and maybe missing indexes.

Try this:

UPDATE t1
SET booleanfield = foo.bar
FROM (SELECT uid,
             (field IN ('some','other') AND field2 = 'Y') AS bar
      FROM t2) AS foo
WHERE t1.uid = foo.uid;

and index t1.uid, t2.uid, t2.field, t2.field2.

regards,
Jens Schipkowski

On Wed, 13 Dec 2006 11:51:10 +0100, Arnaud Lesauvage [EMAIL PROTECTED] wrote:
> Hi list ! I am running a query to update the boolean field of a table
> based on another table's fields. The query is (changed names for
> readability):
>
> UPDATE t1 SET booleanfield = (t2.field1 IN ('some', 'other') AND
> t2.field2 = 'Y') FROM t2 WHERE t1.uid = t2.uid
>
> t2.uid is the PRIMARY KEY. t2 only has ~1000 rows, so I think it fits
> fully in memory. t1 has ~2,000,000 rows. There is an index on t1.uid
> also. The explain (sorry, no explain analyze available yet) is :
>
> Hash Join  (cost=112.75..307410.10 rows=2019448 width=357)
>   Hash Cond: ("outer".uid = "inner".uid)
>   ->  Seq Scan on t1  (cost=0.00..261792.01 rows=2033001 width=340)
>   ->  Hash  (cost=110.20..110.20 rows=1020 width=53)
>         ->  Seq Scan on t2  (cost=0.00..110.20 rows=1020 width=53)
>
> My query has been running for more than 1.5 hours now, and it is still
> running. Nothing else is running on the server. There are two
> multicolumn indexes on this column (both are 3-column indexes). One of
> them has a functional column (date_trunc('month', datefield)). Do you
> think the problem is with the indexes ?
>
> The hardware is not great, but the database is on a RAID1 array, so
> it's not bad either. I am surprised that it takes more than 3 seconds
> per row to be updated. Thanks for your opinion on this !
> --
> Arnaud

--
**
APUS Software GmbH
Re: [PERFORM] Slow update with simple query
Jens Schipkowski a écrit :
> the problem is a combination of badly formed SQL and maybe missing
> indexes.
> try this:
> UPDATE t1 SET booleanfield = foo.bar
> FROM (SELECT uid, (field IN ('some','other') AND field2 = 'Y') AS bar
>       FROM t2) AS foo
> WHERE t1.uid = foo.uid;

Hi Jens,

Why is this query better than the other one? Because it runs the (field IN ('some','other') AND field2 = 'Y') once and then executes the join with the resulting set?

> and index t1.uid, t2.uid, t2.field, t2.field2

t1.field can only take 3 or 4 values (I don't remember exactly), and field2 only 2 ('Y' or 'N'), so these fields have a very low cardinality. Won't the planner choose to do a table scan in such a case?

Thanks for your advice!
--
Arnaud
Re: [PERFORM] Slow update with simple query
On Wed, 13 Dec 2006 13:23:41 +0100, Arnaud Lesauvage [EMAIL PROTECTED] wrote:
> Jens Schipkowski a écrit :
>> the problem is a combination of badly formed SQL and maybe missing
>> indexes.
>> try this:
>> UPDATE t1 SET booleanfield = foo.bar
>> FROM (SELECT uid, (field IN ('some','other') AND field2 = 'Y') AS bar
>>       FROM t2) AS foo
>> WHERE t1.uid = foo.uid;
>
> Hi Jens,
> Why is this query better than the other one ? Because it runs the
> (field IN ('some','other') AND field2 = 'Y') once and then executes
> the join with the resulting set ?

True. The subselect in the FROM clause will be executed once and will be joined using the condition in the WHERE clause. So your condition on t2 is not executed for each row in t1 (2 million records) but for each row in t2 (1k records), and the boolean value is already set during the update.

regards,
Jens

> and index t1.uid, t2.uid, t2.field, t2.field2
>
> t1.field can only take 3 or 4 values (don't remember exactly), and
> field2 only 2 ('Y' or 'N'). So these fields have a very low
> cardinality. Won't the planner choose to do a table scan in such a
> case ?
>
> Thanks for your advice !
> --
> Arnaud

--
**
APUS Software GmbH
Re: [PERFORM] Slow update with simple query
Jens Schipkowski a écrit :
> On Wed, 13 Dec 2006 13:23:41 +0100, Arnaud Lesauvage [EMAIL PROTECTED] wrote:
>> Why is this query better than the other one ? Because it runs the
>> (field IN ('some','other') AND field2 = 'Y') once and then executes
>> the join with the resulting set ?
>
> True. The Subselect in FROM clause will be executed once and will be
> joined using the condition at where clause. So your condition at t2
> is not executed for each row in t1 (2mio records) but for each row in
> t2 (1k records). And the boolean value is already set during update.

OK Jens, thanks for clarifying this. I thought the planner could guess what to do in such cases.

Regards
--
Arnaud
Re: [PERFORM] Slow update with simple query
On mið, 2006-12-13 at 14:38 +0100, Arnaud Lesauvage wrote:
> Jens Schipkowski a écrit :
>> True. The Subselect in FROM clause will be executed once and will be
>> joined using the condition at where clause. So your condition at t2
>> is not executed for each row in t1 (2mio records) but for each row
>> in t2 (1k records). And the boolean value is already set during
>> update.
>
> OK Jens, thanks for clarifying this. I thought the planner could
> guess what to do in such cases.

don't worry, it will. this is not your problem

gnari
Re: [PERFORM] Slow update with simple query
Ragnar a écrit :
> On mið, 2006-12-13 at 14:38 +0100, Arnaud Lesauvage wrote:
>> OK Jens, thanks for clarifying this. I thought the planner could
>> guess what to do in such cases.
>
> don't worry, it will. this is not your problem

Indeed, the new query does not perform that well:

Hash Join  (cost=112.75..307504.97 rows=2024869 width=355) (actual time=53.995..246443.811 rows=2020061 loops=1)
  Hash Cond: ("outer".uid = "inner".uid)
  ->  Seq Scan on t1  (cost=0.00..261792.01 rows=2033001 width=338) (actual time=19.342..234304.499 rows=2033001 loops=1)
  ->  Hash  (cost=110.20..110.20 rows=1020 width=53) (actual time=4.853..4.853 rows=1020 loops=1)
        ->  Seq Scan on t2  (cost=0.00..110.20 rows=1020 width=53) (actual time=0.017..2.586 rows=1020 loops=1)
Total runtime: 2777844.892 ms

I removed all unnecessary indexes on t1 before running the query (I left the index on uid and the multicolumn index containing the updated field). I believe the multicolumn-functional-index computation is taking some time here, isn't it?

Regards
--
Arnaud
Re: [PERFORM] New to PostgreSQL, performance considerations
Michael Stone wrote:
> On Tue, Dec 12, 2006 at 01:42:06PM +0100, Cosimo Streppone wrote:
>> -O0                                 ~  957 tps
>> -O1 -mcpu=pentium4 -mtune=pentium4  ~ 1186 tps
>> -O2 -mcpu=pentium4 -mtune=pentium4  ~ 1229 tps
>> -O3 -mcpu=pentium4 -mtune=pentium4  ~ 1257 tps
>> -O6 -mcpu=pentium4 -mtune=pentium4  ~ 1254 tps
>>
>> I'm curious now to get the same tests run with a
>> custom-cflags-compiled glibc.
>
> I'd be curious to see -O2 with and without the arch-specific flags,
> since that's mostly what the discussion is about.

I ran the same tests only for:

1) '-O2'
2) '-O2 -march=pentium4 -mtune=pentium4 -mcpu=pentium4'

(so no more doubts here, and thanks for the gcc hints :-) and I obtained respectively an average of 1238 (plain -O2) vs. 1229 tps on 9 runs. The disk subsystem is a standard desktop SATA drive, no more than that.

I also tried recompiling *only* pgbench with various options, but as I expected (and hoped) nothing changed.

Interesting, eh?

--
Cosimo
Re: [PERFORM] Slow update with simple query
Arnaud,

Have you run ANALYZE on the table after creating the index? Also make sure that effective_cache_size is set properly: a higher value makes it more likely that index scans will be used.

Thanks
asif ali

Arnaud Lesauvage [EMAIL PROTECTED] wrote:
> Ragnar a écrit :
>> don't worry, it will. this is not your problem
>
> Indeed, the new query does not perform that well :
>
> Hash Join  (cost=112.75..307504.97 rows=2024869 width=355) (actual time=53.995..246443.811 rows=2020061 loops=1)
>   Hash Cond: ("outer".uid = "inner".uid)
>   ->  Seq Scan on t1  (cost=0.00..261792.01 rows=2033001 width=338) (actual time=19.342..234304.499 rows=2033001 loops=1)
>   ->  Hash  (cost=110.20..110.20 rows=1020 width=53) (actual time=4.853..4.853 rows=1020 loops=1)
>         ->  Seq Scan on t2  (cost=0.00..110.20 rows=1020 width=53) (actual time=0.017..2.586 rows=1020 loops=1)
> Total runtime: 2777844.892 ms
>
> I removed all unnecessary indexes on t1 before running the query (I
> left the index on uid and the multicolumn index containing the updated
> field). I believe the multicolumn-functional-index computation is
> taking some time here, isn't it ?
>
> Regards
> --
> Arnaud
[PERFORM] Insertion to temp table deteriorating over time
Hi,

Our application is using Postgres 7.4 and I'd like to understand the root cause of this problem:

To speed up overall insert time, our application will write thousands of rows, one by one, into a temp table (same structure as a permanent table), then do a bulk insert from the temp table to the permanent table. After this bulk insert is done, the temp table is truncated and the process is repeated. We do this because Postgres can do many individual inserts to a temp table much faster than to a permanent table.

The problem we are seeing is that over time, the cost of a single insert to the temp table seems to grow. After a restart of postgres, a single insert to the temp table takes about 3ms. Over a few days, this grows to about 60ms per insert. Restarting postgres drops this insert time back to 3ms, supposedly because the temp table is re-created.

Our workaround right now is to restart the database every few days, but we don't like this solution much. Any idea where the bloat is happening? I believe that if we were dropping and re-creating the temp table over and over, that could result in pg_class bloat (among other catalog tables), but what is going wrong if we use the same table over and over and truncate it?

Thanks,
Steve
Re: [PERFORM] Slow update with simple query
Arnaud Lesauvage [EMAIL PROTECTED] writes:
> Indeed, the new query does not perform that well :
>
> Hash Join  (cost=112.75..307504.97 rows=2024869 width=355) (actual time=53.995..246443.811 rows=2020061 loops=1)
> ...
> Total runtime: 2777844.892 ms
>
> I removed all unnecessary indexes on t1 before running the query (I
> left the index on uid and the multicolumn index containing the updated
> field). I believe the multicolumn-functional-index computation is
> taking some time here, isn't it ?

Given that the plan itself only takes 246 sec, there's *something* associated with row insertion that's eating the other 2500+ seconds. Either index entry computation or constraint checking ...

regards, tom lane
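[Editor's note: since most of the time appears to go into maintaining index entries for every updated row, one common workaround is to drop the expensive index for the duration of the bulk update and rebuild it once afterwards. A sketch, with a hypothetical index name and column list since the real definitions are not shown in the thread:]

```sql
BEGIN;
-- Hypothetical name for the 3-column functional index on t1
DROP INDEX t1_func_idx;

UPDATE t1
SET booleanfield = (t2.field1 IN ('some', 'other') AND t2.field2 = 'Y')
FROM t2
WHERE t1.uid = t2.uid;

-- Rebuild once, instead of recomputing the index entry 2 million times
CREATE INDEX t1_func_idx
    ON t1 (booleanfield, somecol, date_trunc('month', datefield));
COMMIT;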
Re: [PERFORM] Slow update with simple query
asif ali a écrit :
> Arnaud,
> Have you run ANALYZE on the table after creating index?

Yes, I have!

> Also make sure that effective_cache_size is set properly. A higher
> value makes it more likely to use index scans.

It is set to 50000. I thought this would be enough, and maybe even too much!

Thanks for your advice!
--
Arnaud
Re: [PERFORM] Insertion to temp table deteriorating over time
On 12/13/06, Steven Flatt [EMAIL PROTECTED] wrote:
> Hi,
> Our application is using Postgres 7.4 and I'd like to understand the
> root cause of this problem:
> To speed up overall insert time, our application will write thousands
> of rows, one by one, into a temp table

1. How frequently are you committing the transaction? If you commit less frequently it will be faster.
2. If you use COPY instead of INSERT it will be faster. Using COPY is easy with DBD::Pg (perl). In the 8.x versions I think there have been major speed improvements in COPY.

I do not know the root cause of the slowdown though.

Regds
mallah.
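[Editor's sketch of the COPY-based load suggested above; the table and file names are made up for illustration, and the default text format is used since CSV mode only arrived in 8.0:]

```sql
-- Instead of thousands of individually parsed statements:
--   INSERT INTO staging_temp VALUES (...);   -- repeated per row

-- COPY streams the whole batch in one command:
COPY staging_temp FROM '/tmp/batch.dat';

-- Then move the batch into the permanent table and reset the temp table
INSERT INTO permanent_table SELECT * FROM staging_temp;
TRUNCATE staging_temp;
```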
Re: [PERFORM] New to PostgreSQL, performance considerations
At 11:11 AM 12/13/2006, Cosimo Streppone wrote:
> Interesting, eh?

What I find interesting is that so far Guido's C2D Mac laptop has gotten the highest values by far in this set of experiments, and no one else is even close.

The slowest results, Michael's, are on the system with what appears to be the slowest CPU of the bunch; and the ranking of the rest of the results seems to similarly depend on relative CPU performance. This is not what one would naively expect when benching an IO-intensive app like a DBMS.

Given that the typical laptop usually has 1 HD, and a relatively modest one at that (the fastest available are SATA 7200rpm or Seagate's perpendicular-recording 5400rpm drives), this feels very much like other factors are bottlenecking the experiments to the point where Daniel's results regarding compiler options are not actually being tested. Anyone got a 2.33 GHz C2D box with a decent HD IO subsystem, more representative of a typical DB server, hooked up to it?

Again, the best way to confirm/deny Daniel's results is to duplicate the environment he obtained those results with as closely as possible (preferably exactly) and then have someone else try to duplicate his results.

Also, I think the warnings regarding proper configuration of pgbench, and which version of pgbench to use, are worthy of note. Do we have guidance yet as to what checkpoint_segments should be set to? Should we be considering using something other than pgbench for such experiments?

Ron Peacetree
Re: [PERFORM] Slow update with simple query
Mark Lewis [EMAIL PROTECTED] writes:
> But he's using 8.1.4-- in that version, an explain analyze would list
> the time taken to go through triggers, so the fact that we don't see
> any of those lines means that it can't be constraint checking, so
> wouldn't it have to be the index update overhead?

Well, it can't be foreign key checking. Could have been an expensive function in a CHECK constraint, maybe ...

regards, tom lane
Re: [PERFORM] New to PostgreSQL, performance considerations
> What I find interesting is that so far Guido's C2D Mac laptop has
> gotten the highest values by far in this set of experiments, and no
> one else is even close. The slowest results, Michael's, are on the
> system with what appears to be the slowest CPU of the bunch; and the
> ranking of the rest of the results seems to similarly depend on
> relative CPU performance. This is not what one would naively expect
> when benching an IO-intensive app like a DBMS.
>
> Given that the typical laptop usually has 1 HD, and a relatively
> modest one at that (the fastest available are SATA 7200rpm or
> Seagate's perpendicular-recording 5400rpm drives), this feels very
> much like other factors are bottlenecking the experiments to the
> point where Daniel's results regarding compiler options are not
> actually being tested. Anyone got a 2.33 GHz C2D box with a decent
> HD IO subsystem more representative of a typical DB server hooked up
> to it?

I've only seen pgbench numbers above 2,000 tps on either really large hardware (none of the above mentioned comes close) or where the results are in memory due to a small database size (aka measuring update contention).

Just a guess, but these tests (compiler opts.) seem like they sometimes show a benefit where the database is mostly in RAM (which I'd guess many people have), since that would put more of the work on the CPU/memory subsystems. Other people on the list hinted at this, but I share their hypothesis that once you get IO involved as a bottleneck (which is the more typical DB situation) you won't notice compiler options.

I've got a 2-socket x 2-core Woodcrest PowerEdge 2950 with a BBC 6-disk RAID that I'll run some tests on as soon as I get a chance. I'm also thinking that for this test there's no need to tweak the default config other than maybe checkpoint_segments, since I don't really want postgres using large amounts of RAM (all that does is require me to build a larger test DB).

Thoughts?
Thanks, Bucky ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
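[Editor's sketch of a run along the lines Bucky describes; the scale factor and client counts are assumptions, chosen so the database exceeds RAM rather than fitting in it:]

```shell
# Initialize a pgbench database; scale 1000 is roughly 100M accounts rows,
# around 15 GB of data, large enough to keep the working set off an
# 8 GB machine's RAM and make the test IO-bound
pgbench -i -s 1000 bench

# Run with multiple clients; -t is transactions per client
pgbench -c 8 -t 10000 bench
```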
[PERFORM] strange query behavior
Hi,

I have a query that uses an IN clause and it seems to perform great when there are more than two values in it, but if there is only one value it is really slow. Also, if I change the query to use an = instead of IN in the case of only one value, it is still slow. Possibly I need to reindex this particular index?

thanks

Tim Jones
Healthcare Project Manager
Optio Software, Inc.
(770) 576-3555
Re: [PERFORM] strange query behavior
On mið, 2006-12-13 at 13:42 -0500, Tim Jones wrote:
> I have a query that uses an IN clause and it seems to perform great
> when there are more than two values in it but if there is only one it
> is really slow. Also if I change the query to use an = instead of IN
> in the case of only one value it is still slow. Possibly I need to
> reindex this particular index?

Can you provide us with an EXPLAIN ANALYZE for these 2 cases? What version of pg is this? Does this happen only for a particular single value, or for any value? I assume you have ANALYZEd the table in question.

gnari
Re: [PERFORM] strange query behavior
Tim Jones [EMAIL PROTECTED] writes:
> I have a query that uses an IN clause and it seems to perform great
> when there are more than two values in it but if there is only one it
> is really slow. Also if I change the query to use an = instead of IN
> in the case of only one value it is still slow.

Please provide EXPLAIN ANALYZE output for both cases.

> Possibly I need to reindex this particular index?

More likely you need to ANALYZE the table so that the planner has up-to-date stats ...

regards, tom lane
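[Editor's sketch of the diagnostics being requested; the table and identifier names are taken from the plans posted later in this thread:]

```sql
-- Refresh planner statistics first, as Tom suggests
ANALYZE battery;

-- Then capture both plans for comparison
EXPLAIN ANALYZE
SELECT * FROM battery WHERE batteryidentifier IN (1177470, 1177469);

EXPLAIN ANALYZE
SELECT * FROM battery WHERE batteryidentifier = 1177470;
```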
[PERFORM] Optimizing a query
I've currently got this table:

n=# \d nanpa
        Table "public.nanpa"
   Column   |     Type     | Modifiers
------------+--------------+-----------
 state      | character(2) |
 npa        | character(3) | not null
 nxx        | character(3) | not null
 ocn        | character(4) |
 company    | text         |
 ratecenter | text         |
 switch     | text         |
 effective  | date         |
 use        | character(2) | not null
 assign     | date         |
 ig         | character(1) |
Indexes:
    "nanpa_pkey" PRIMARY KEY, btree (npa, nxx) CLUSTER

and was doing queries of the form:

    select * from nanpa where npa=775 and nxx=413;

which were quite slow. Explain showed that it was doing sequential scans even though the primary key contained the two terms I was selecting on.

Today, looking at it again in prep for this post, I noticed that the numbers were being converted to ::text, and a quick test showed that queries of the form:

    select * from nanpa where npa=775::bpchar and nxx=413::bpchar;

used the index.

I specified char(3) when I created the table simply because npa and nxx are defined as three-character strings. Tagging the queries is a pain, especially as I often do queries of that form in psql(1). (Incidentally, there are multiple similar tables, also keyed on (npa,nxx), which show the same problem. The nanpa table above is just a good example.)

Should I convert the columns to text? Or create an additional index that expects ::text args? (If so, how?) Or is there some other way to ensure the indices get used w/o having to tag data in the queries?

Thanks,
-JimC
--
James Cloos [EMAIL PROTECTED] OpenPGP: 1024D/ED7DAEA6
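[Editor's note: one low-friction workaround, offered as a suggestion rather than something from the thread, is to write the values as quoted string literals. An unadorned quoted literal is initially of unknown type and resolves to the column's bpchar type, so the (npa, nxx) primary-key index can be used without explicit casts:]

```sql
-- Unquoted numbers are integer literals; comparing character(3) to integer
-- forces a text-level comparison that defeats the (npa, nxx) index:
--   select * from nanpa where npa=775 and nxx=413;

-- Quoted literals take on the bpchar column type, so the index applies:
SELECT * FROM nanpa WHERE npa = '775' AND nxx = '413';
```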
Re: [PERFORM] Optimizing a query
Have you run vacuum/analyze on the table?

--
Husam

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of James Cloos
Sent: Wednesday, December 13, 2006 10:48 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Optimizing a query

> I've currently got this table:
>
> n=# \d nanpa
>         Table "public.nanpa"
>    Column   |     Type     | Modifiers
> ------------+--------------+-----------
>  state      | character(2) |
>  npa        | character(3) | not null
>  nxx        | character(3) | not null
>  ocn        | character(4) |
>  company    | text         |
>  ratecenter | text         |
>  switch     | text         |
>  effective  | date         |
>  use        | character(2) | not null
>  assign     | date         |
>  ig         | character(1) |
> Indexes:
>     "nanpa_pkey" PRIMARY KEY, btree (npa, nxx) CLUSTER
>
> and was doing queries of the form:
>
>     select * from nanpa where npa=775 and nxx=413;
>
> which were quite slow. Explain showed that it was doing sequential
> scans even though the primary key contained the two terms I was
> selecting on. Today, looking at it again in prep for this post, I
> noticed that the numbers were being converted to ::text, and a quick
> test showed that queries of the form:
>
>     select * from nanpa where npa=775::bpchar and nxx=413::bpchar;
>
> used the index.
>
> I specified char(3) when I created the table simply because npa and
> nxx are defined as three-character strings. Tagging the queries is a
> pain, especially as I often do queries of that form in psql(1).
> (Incidentally, there are multiple similar tables, also keyed on
> (npa,nxx), which show the same problem. The nanpa table above is just
> a good example.)
>
> Should I convert the columns to text? Or create an additional index
> that expects ::text args? (If so, how?) Or is there some other way to
> ensure the indices get used w/o having to tag data in the queries?
> Thanks,
> -JimC
> --
> James Cloos [EMAIL PROTECTED] OpenPGP: 1024D/ED7DAEA6
Re: [PERFORM] strange query behavior
The tables for these queries are vacuumed and analyzed regularly. I just did an analyze to be sure, and here are the results.

explain analyze
select * from battery
  join observationresults on battery.batteryidentifier = observationresults.batteryidentifier
  left outer join observationcomment on observationresults.observationidentifier = observationcomment.observationidentifier
  left outer join batterycomment on battery.batteryidentifier = batterycomment.batteryidentifier
where battery.batteryidentifier in (1177470, 1177469);

QUERY PLAN
-----------------------------------------------------------------------
Nested Loop Left Join  (cost=5.03..12553.00 rows=13 width=248) (actual time=0.362..1.345 rows=30 loops=1)
  ->  Nested Loop Left Join  (cost=4.01..12424.13 rows=13 width=208) (actual time=0.307..0.927 rows=30 loops=1)
        ->  Nested Loop  (cost=4.01..9410.49 rows=13 width=145) (actual time=0.227..0.416 rows=30 loops=1)
              ->  Bitmap Heap Scan on battery  (cost=4.01..11.64 rows=2 width=69) (actual time=0.135..0.138 rows=2 loops=1)
                    Recheck Cond: ((batteryidentifier = 1177470) OR (batteryidentifier = 1177469))
                    ->  BitmapOr  (cost=4.01..4.01 rows=2 width=0) (actual time=0.106..0.106 rows=0 loops=1)
                          ->  Bitmap Index Scan on ix_battery_id  (cost=0.00..2.00 rows=1 width=0) (actual time=0.091..0.091 rows=1 loops=1)
                                Index Cond: (batteryidentifier = 1177470)
                          ->  Bitmap Index Scan on ix_battery_id  (cost=0.00..2.00 rows=1 width=0) (actual time=0.011..0.011 rows=1 loops=1)
                                Index Cond: (batteryidentifier = 1177469)
              ->  Index Scan using ix_obresults_bat on observationresults  (cost=0.00..4682.40 rows=1362 width=76) (actual time=0.047..0.091 rows=15 loops=2)
                    Index Cond: ("outer".batteryidentifier = observationresults.batteryidentifier)
        ->  Index Scan using ix_obcomment_obid on observationcomment  (cost=0.00..227.73 rows=327 width=63) (actual time=0.013..0.013 rows=0 loops=30)
              Index Cond: ("outer".observationidentifier = observationcomment.observationidentifier)
  ->  Bitmap Heap Scan on batterycomment  (cost=1.02..9.84 rows=6 width=40) (actual time=0.007..0.007 rows=0 loops=30)
        Recheck Cond: ("outer".batteryidentifier = batterycomment.batteryidentifier)
        ->  Bitmap Index Scan on ix_batcomment  (cost=0.00..1.02 rows=6 width=0) (actual time=0.005..0.005 rows=0 loops=30)
              Index Cond: ("outer".batteryidentifier = batterycomment.batteryidentifier)
Total runtime: 1.585 ms

explain analyze
select * from battery
  join observationresults on battery.batteryidentifier = observationresults.batteryidentifier
  left outer join observationcomment on observationresults.observationidentifier = observationcomment.observationidentifier
  left outer join batterycomment on battery.batteryidentifier = batterycomment.batteryidentifier
where battery.batteryidentifier = 1177470;

QUERY PLAN
-----------------------------------------------------------------------
Hash Left Join  (cost=4733.62..269304.43 rows=1348 width=248) (actual time=19275.506..19275.568 rows=9 loops=1)
  Hash Cond: ("outer".batteryidentifier = "inner".batteryidentifier)
  ->  Merge Right Join  (cost=4723.75..269287.81 rows=1348 width=208) (actual time=19275.432..19275.473 rows=9 loops=1)
        Merge Cond: ("outer".observationidentifier = "inner".observationidentifier)
        ->  Index Scan using ix_obcomment_obid on observationcomment  (cost=0.00..245841.14 rows=7484253 width=63) (actual time=0.094..13403.300 rows=4361601 loops=1)
        ->  Sort  (cost=4723.75..4727.12 rows=1348 width=145) (actual time=0.270..0.278 rows=9 loops=1)
              Sort Key: observationresults.observationidentifier
              ->  Nested Loop  (cost=0.00..4653.67 rows=1348 width=145) (actual time=0.166..0.215 rows=9 loops=1)
                    ->  Index Scan using ix_battery_id on battery  (cost=0.00..5.81 rows=1 width=69) (actual time=0.079..0.082 rows=1 loops=1)
                          Index Cond: (batteryidentifier = 1177470)
                    ->  Index Scan using ix_obresults_bat on observationresults  (cost=0.00..4634.38 rows=1348 width=76) (actual time=0.079..0.102 rows=9 loops=1)
                          Index Cond: (1177470 = batteryidentifier)
  ->  Hash  (cost=9.85..9.85 rows=6 width=40) (actual time=0.039..0.039 rows=0 loops=1)
        ->  Bitmap Heap Scan on batterycomment  (cost=1.02..9.85 rows=6 width=40) (actual time=0.037..0.037 rows=0 loops=1)
              Recheck Cond: (batteryidentifier = 1177470)
              ->  Bitmap Index Scan on ix_batcomment  (cost=0.00..1.02 rows=6 width=0) (actual time=0.032..0.032 rows=0 loops=1)
Re: [PERFORM] New to PostgreSQL, performance considerations
At 01:49 PM 12/13/2006, Bucky Jordan wrote:

> I've only seen pg_bench numbers over 2,000 tps on either really large hardware (none of the above mentioned comes close) or the results are in memory due to a small database size (aka measuring update contention).

Which makes a laptop achieving such numbers all the more interesting IMHO.

> Just a guess, but these tests (compiler opts.) seem like they sometimes show a benefit where the database is mostly in RAM (which I'd guess many people have) since that would cause more work to be put on the CPU/Memory subsystems.

The cases where the working set, or the performance critical part of the working set, of the DB is RAM resident are very important ones ITRW.

> Other people on the list hinted at this, but I share their hypothesis that once you get IO involved as a bottleneck (which is a more typical DB situation) you won't notice compiler options.

Certainly makes intuitive sense. OTOH, this list has seen discussion of what should be IO bound operations being CPU bound. Evidently due to the expense of processing pg datastructures. Only objective benches are going to tell us where the various limitations on pg performance really are.

> I've got a 2 socket x 2 core woodcrest poweredge 2950 with a BBC 6 disk RAID I'll run some tests on as soon as I get a chance. I'm also thinking for this test, there's no need to tweak the default config other than maybe checkpoint_segments, since I don't really want postgres using large amounts of RAM (all that does is require me to build a larger test DB).

Daniel's original system had 512MB RAM. This suggests to me that tests involving 256MB of pg memory should be plenty big enough.

Thoughts? Hope they are useful.

Ron Peacetree

---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] New to PostgreSQL, performance considerations
On 13.12.2006, at 19:03, Ron wrote: What I find interesting is that so far Guido's C2D Mac laptop has gotten the highest values by far in this set of experiments, and no one else is even close. This might be the case because I have tested with fsync=off as my internal harddrive would be a limiting factor and the results wouldn't be really helpful. Perhaps it's still the IO system, I don't know. I can try to reproduce the tests as close as possible again. Perhaps I had different settings on something but I doubt that. The new Core * CPUs from Intel are extremely fast with PostgreSQL. Anyone got a 2.33 GHz C2D box with a decent HD IO subsystem more representative of a typical DB server hooked up to it? I have also now an Xserve with two Dual-Core Xeons and two SAS drives (15k Seagates) in a mirrored RAID here. Will do some testing tomorrow. Btw: I always compare only to my own results to have something comparable - same test, same scripts, same db version, same operating system and so on. The rest is just pure interest. cug ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Optimizing a query
James Cloos [EMAIL PROTECTED] writes: ... and was doing queries of the form: | select * from nanpa where npa=775 and nxx=413; If those are char(3) columns, shouldn't you be quoting the constants? select * from nanpa where npa='775' and nxx='413'; Any non-numeric input will fail entirely without the quotes, and I'm also not too confident that inputs of less than three digits will work as you expect (the blank-padding might not match what's in the table). Leading zeroes would be troublesome too. OTOH, if the keys are and always will be small integers, it's just stupid not to be storing them as integers --- integer comparison is far faster than string. Postgres' data type capabilities are exceptional. Learn to work with them, not against them --- that means thinking about what the data really is and declaring it appropriately. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
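[A minimal sketch of the migration Tom suggests, using the column names from the thread; the conversion itself is hypothetical and untested against the poster's schema. Note Tom's caveat: if leading zeroes ever matter, integer columns lose them, and the char(3) type with quoted constants is the right choice instead.]

```sql
-- Hedged sketch: convert npa/nxx from char(3) to integer so lookups
-- compare integers rather than blank-padded strings.  The USING clause
-- assumes every existing value is an all-digit string.
ALTER TABLE nanpa ALTER COLUMN npa TYPE integer USING npa::integer;
ALTER TABLE nanpa ALTER COLUMN nxx TYPE integer USING nxx::integer;

-- After the change, the original unquoted query matches the index types
-- exactly:
--   SELECT * FROM nanpa WHERE npa = 775 AND nxx = 413;
```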
Re: [PERFORM] strange query behavior
Version 8.1

Here are the planner constraints. I believe we changed effective_cache_size and random_page_cost. BTW this is AIX 5.2.

#---
# QUERY TUNING
#---

# - Planner Method Configuration -

#enable_bitmapscan = on
#enable_hashagg = on
#enable_hashjoin = on
#enable_indexscan = on
#enable_mergejoin = on
#enable_nestloop = on
#enable_seqscan = on
#enable_sort = on
#enable_tidscan = on

# - Planner Cost Constants -

#effective_cache_size = 1       # typically 8KB each
effective_cache_size = 40
random_page_cost = 3.8          # units are one sequential page fetch cost
#cpu_tuple_cost = 0.01          # (same)
#cpu_index_tuple_cost = 0.001   # (same)
#cpu_operator_cost = 0.0025     # (same)

# - Genetic Query Optimizer -

#geqo = on
#geqo_threshold = 12
#geqo_effort = 5                # range 1-10
#geqo_pool_size = 0             # selects default based on effort
#geqo_generations = 0           # selects default based on effort
#geqo_selection_bias = 2.0      # range 1.5-2.0

# - Other Planner Options -

#default_statistics_target = 10 # range 1-1000
#constraint_exclusion = off
#from_collapse_limit = 8
#join_collapse_limit = 8        # 1 disables collapsing of explicit JOINs

Thanks

Tim Jones
Healthcare Project Manager
Optio Software, Inc.
(770) 576-3555

-----Original Message-----
From: Tom Lane [mailto:[EMAIL PROTECTED]
Sent: Wednesday, December 13, 2006 4:59 PM
To: Tim Jones
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] strange query behavior

Tim Jones [EMAIL PROTECTED] writes:
> The tables for these queries are vacuumed and analyzed regularly. I just did an analyze to be sure and here are the results ...

There's something pretty wacko about the choice of plan in the slow case --- I don't see why it'd not have used the same plan structure as for the IN case. It's coming up with a cost a lot higher than for the other, so it certainly knows this isn't a great plan ... Which PG version is this exactly? Are you running with any nondefault planner parameters?

regards, tom lane

---(end of broadcast)--- TIP 6: explain analyze is your friend
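[When reporting planner trouble like this, it is easier to show exactly which settings differ from the defaults by asking the server rather than hand-copying the conf file. A hedged sketch; the pg_settings view and its source column exist in 8.1:]

```sql
-- Report the exact server version and every setting that has been
-- changed away from its built-in default.
SELECT version();

SELECT name, setting, source
  FROM pg_settings
 WHERE source <> 'default'
 ORDER BY name;
```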
Re: [PERFORM] Optimizing a query
>>>>> "Husam" == Tomeh, Husam [EMAIL PROTECTED] writes:

Husam> Have you run vacuum/analyze on the table?

Yes, back when I first noticed how slow it was. It did not make any difference.

explain analyze says:

,
| n=# explain analyse select * from nanpa where npa=775 and nxx=473;
|                       QUERY PLAN
| ----------------------------------------------------------------------
| Seq Scan on nanpa  (cost=0.00..5344.60 rows=4 width=105) (actual time=371.718..516.816 rows=1 loops=1)
|   Filter: (((npa)::text = '775'::text) AND ((nxx)::text = '473'::text))
| Total runtime: 516.909 ms
| (3 rows)
`

vs:

,
| n=# explain analyse select * from nanpa where npa=775::char and nxx=473::char;
|                       QUERY PLAN
| ----------------------------------------------------------------------
| Index Scan using nanpa_pkey on nanpa  (cost=0.00..4.33 rows=1 width=105) (actual time=64.831..64.831 rows=0 loops=1)
|   Index Cond: ((npa = '7'::bpchar) AND (nxx = '4'::bpchar))
| Total runtime: 64.927 ms
| (3 rows)
`

BTW, I forgot to mention I'm at 8.1.4 on that box.

-JimC
--
James Cloos [EMAIL PROTECTED]  OpenPGP: 1024D/ED7DAEA6

---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
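[The second plan above looks faster but is actually wrong: a bare `::char` cast means char(1), so the constants are truncated before the comparison, which is why the index condition probes for '7' and '4' and finds zero rows. A small illustration:]

```sql
-- A bare ::char is char(1): the integer is rendered as text and then
-- truncated to one character.  This is what the plan's
-- (npa = '7'::bpchar) condition reveals.
SELECT 775::char;      -- yields '7'
SELECT 775::char(3);   -- yields '775', the intended key
```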
Re: [PERFORM] strange query behavior
Tim Jones [EMAIL PROTECTED] writes: Which PG version is this exactly? Are you running with any nondefault planner parameters? Version 8.1 8.1.what? Here are the planner constraints I believe we changed effective_cache_size and random_page_cost Those look reasonably harmless. My best bet at the moment is that you've got a pretty early 8.1.x release and are hitting one of the planner bugs that we fixed earlier this year. Not enough info to say for sure though. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] strange query behavior
Looks like 8.1.2 Tim Jones Healthcare Project Manager Optio Software, Inc. (770) 576-3555 -Original Message- From: Tom Lane [mailto:[EMAIL PROTECTED] Sent: Wednesday, December 13, 2006 5:37 PM To: Tim Jones Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] strange query behavior Tim Jones [EMAIL PROTECTED] writes: Which PG version is this exactly? Are you running with any nondefault planner parameters? Version 8.1 8.1.what? Here are the planner constraints I believe we changed effective_cache_size and random_page_cost Those look reasonably harmless. My best bet at the moment is that you've got a pretty early 8.1.x release and are hitting one of the planner bugs that we fixed earlier this year. Not enough info to say for sure though. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] strange query behavior
That's what I did and got 8.1.2 ... do you want gcc version etc 3.3.2 powerpc aix5.2 Tim Jones Healthcare Project Manager Optio Software, Inc. (770) 576-3555 -Original Message- From: Matthew O'Connor [mailto:[EMAIL PROTECTED] Sent: Wednesday, December 13, 2006 5:51 PM To: Tim Jones Subject: Re: [PERFORM] strange query behavior From psql perform: select version(); and send us that output. Tim Jones wrote: Looks like 8.1.2 Tim Jones Healthcare Project Manager Optio Software, Inc. (770) 576-3555 -Original Message- From: Tom Lane [mailto:[EMAIL PROTECTED] Sent: Wednesday, December 13, 2006 5:37 PM To: Tim Jones Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] strange query behavior Tim Jones [EMAIL PROTECTED] writes: Which PG version is this exactly? Are you running with any nondefault planner parameters? Version 8.1 8.1.what? Here are the planner constraints I believe we changed effective_cache_size and random_page_cost Those look reasonably harmless. My best bet at the moment is that you've got a pretty early 8.1.x release and are hitting one of the planner bugs that we fixed earlier this year. Not enough info to say for sure though. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Optimizing a query
Your npa and nxx columns have character datatype, so you should use quotes. Try:

  explain analyze select * from nanpa where npa='775' and nxx='473';

If that does not work, you could try to influence the planner's execution plan to favor index scans over sequential scans by tweaking a couple of the postgres parameters, particularly effective_cache_size. This parameter primarily sets the planner's estimate of the relative likelihood of a particular table or index being in memory, and will thus have a significant effect on whether the planner chooses indexes over seqscans. Tweaking such parameters is usually done as a last resort.

--
 Husam

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of James Cloos
Sent: Wednesday, December 13, 2006 2:35 PM
To: Tomeh, Husam
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Optimizing a query

>>>>> "Husam" == Tomeh, Husam [EMAIL PROTECTED] writes:

Husam> Have you run vacuum/analyze on the table?

Yes, back when I first noticed how slow it was. It did not make any difference.

explain analyze says:

,
| n=# explain analyse select * from nanpa where npa=775 and nxx=473;
|                       QUERY PLAN
| ----------------------------------------------------------------------
| Seq Scan on nanpa  (cost=0.00..5344.60 rows=4 width=105) (actual time=371.718..516.816 rows=1 loops=1)
|   Filter: (((npa)::text = '775'::text) AND ((nxx)::text = '473'::text))
| Total runtime: 516.909 ms
| (3 rows)
`

vs:

,
| n=# explain analyse select * from nanpa where npa=775::char and nxx=473::char;
|                       QUERY PLAN
| ----------------------------------------------------------------------
| Index Scan using nanpa_pkey on nanpa  (cost=0.00..4.33 rows=1 width=105) (actual time=64.831..64.831 rows=0 loops=1)
|   Index Cond: ((npa = '7'::bpchar) AND (nxx = '4'::bpchar))
| Total runtime: 64.927 ms
| (3 rows)
`

BTW, I forgot to mention I'm at 8.1.4 on that box.

-JimC
--
James Cloos [EMAIL PROTECTED]  OpenPGP: 1024D/ED7DAEA6

---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq

---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
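[Husam's effective_cache_size suggestion can be tried per-session before touching postgresql.conf; a hypothetical sketch, with an illustrative value (on 8.1 the unit is 8KB pages):]

```sql
-- Raise the planner's cache-size assumption for this session only and
-- see whether the plan flips from the seqscan to the index scan.
SET effective_cache_size = 50000;   -- ~400MB assumed cached, in 8KB pages
EXPLAIN ANALYZE SELECT * FROM nanpa WHERE npa = '775' AND nxx = '473';
RESET effective_cache_size;
```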
Re: [PERFORM] Insertion to temp table deteriorating over time
After running some further standalone tests using temp tables, I'm not convinced the problem is specific to temp table usage. In fact it looks like generic SQL activity degrades over time. Having said that, what kinds of things should I be looking for that could deteriorate/bloat over time? Ordinarily the culprit might be infrequent vacuuming or analyzing, but that wouldn't be corrected by a restart of Postgres. In our case, restarting Postgres gives us a huge performance improvement (for a short while, anyways). By the way, we are using PG 7.4.6 on FreeBSD 5.30.0170. The temp table has 15 columns: a timestamp, a double, and the rest integers. It has no indexes. Thanks, Steve On 12/13/06, Tom Lane [EMAIL PROTECTED] wrote: Steven Flatt [EMAIL PROTECTED] writes: Any idea where the bloat is happening? I believe that if we were dropping and re-creating the temp table over and over, that could result in pg_class bloat (among other catalog tables), but what is going wrong if we use the same table over and over and truncate it? That seems very strange --- I too would have expected a TRUNCATE to bring you back to ground zero performance-wise. I wonder whether the issue is not directly related to the temp table but is just some generic resource leakage problem in a very long-running backend. Have you checked to see if the backend process bloats memory-wise, or perhaps has a huge number of files open (I wonder if it could be leaking open file handles to the successive generations of the temp table)? Are you sure that the slowdown is specific to inserts into the temp table, as opposed to generic SQL activity? Also, which PG version is this exactly (7.4 is not specific enough)? On what platform? Can you show us the full schema definition for the temp table and any indexes on it? regards, tom lane
Re: [PERFORM] strange query behavior
Tim Jones [EMAIL PROTECTED] writes: [ explain results ]

As best I can see, the problem is with the estimate of the size of the inner join: for two keys we have

   ->  Nested Loop  (cost=4.01..9410.49 rows=13 width=145) (actual time=0.227..0.416 rows=30 loops=1)
         ->  Bitmap Heap Scan on battery  (cost=4.01..11.64 rows=2 width=69) (actual time=0.135..0.138 rows=2 loops=1)
               Recheck Cond: ((batteryidentifier = 1177470) OR (batteryidentifier = 1177469))
               ->  BitmapOr  (cost=4.01..4.01 rows=2 width=0) (actual time=0.106..0.106 rows=0 loops=1)
                     ->  Bitmap Index Scan on ix_battery_id  (cost=0.00..2.00 rows=1 width=0) (actual time=0.091..0.091 rows=1 loops=1)
                           Index Cond: (batteryidentifier = 1177470)
                     ->  Bitmap Index Scan on ix_battery_id  (cost=0.00..2.00 rows=1 width=0) (actual time=0.011..0.011 rows=1 loops=1)
                           Index Cond: (batteryidentifier = 1177469)
         ->  Index Scan using ix_obresults_bat on observationresults  (cost=0.00..4682.40 rows=1362 width=76) (actual time=0.047..0.091 rows=15 loops=2)
               Index Cond: ("outer".batteryidentifier = observationresults.batteryidentifier)

but for one key we have

   ->  Nested Loop  (cost=0.00..4653.67 rows=1348 width=145) (actual time=0.166..0.215 rows=9 loops=1)
         ->  Index Scan using ix_battery_id on battery  (cost=0.00..5.81 rows=1 width=69) (actual time=0.079..0.082 rows=1 loops=1)
               Index Cond: (batteryidentifier = 1177470)
         ->  Index Scan using ix_obresults_bat on observationresults  (cost=0.00..4634.38 rows=1348 width=76) (actual time=0.079..0.102 rows=9 loops=1)
               Index Cond: (1177470 = batteryidentifier)

The large rowcount estimate makes it back off to a non-nestloop plan for the outer joins, and in this situation that's a loser. I'm actually not sure why they're not both too high --- with the rowcount estimate of 1362 for the inner scan in the first example, you'd expect about twice that for the join result.
But the immediate problem is that in the case where it knows exactly what batteryidentifier is being probed for, it's still off by more than a factor of 100 on the rowcount estimate for observationresults. How many rows in observationresults, and may we see the pg_stats entry for observationresults.batteryidentifier? It's likely that the answer for you will be raise the statistics target for observationresults and re-ANALYZE, but I'd like to gather more info about what's going wrong first. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
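[The information Tom asks for, and the fix he anticipates, can be sketched like this; the target value 100 is an illustrative guess, not a recommendation from the thread:]

```sql
-- What the planner currently believes about the join column:
SELECT null_frac, n_distinct, most_common_vals, most_common_freqs
  FROM pg_stats
 WHERE tablename = 'observationresults'
   AND attname   = 'batteryidentifier';

-- Row count of the table, for scale:
SELECT count(*) FROM observationresults;

-- The anticipated remedy: sample the column more densely, then re-analyze.
ALTER TABLE observationresults
  ALTER COLUMN batteryidentifier SET STATISTICS 100;
ANALYZE observationresults;
```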
Re: [PERFORM] Insertion to temp table deteriorating over time
Steven Flatt [EMAIL PROTECTED] writes: Having said that, what kinds of things should I be looking for that could deteriorate/bloat over time? Ordinarily the culprit might be infrequent vacuuming or analyzing, but that wouldn't be corrected by a restart of Postgres. In our case, restarting Postgres gives us a huge performance improvement (for a short while, anyways). Do you actually need to restart the postmaster, or is just starting a fresh session (fresh backend) sufficient? And again, have you monitored the backend process to see if it's bloating memory-wise or open-file-wise? regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Insertion to temp table deteriorating over time
Steven Flatt [EMAIL PROTECTED] writes: Having said that, what kinds of things should I be looking for that could deteriorate/bloat over time? Ordinarily the culprit might be infrequent vacuuming or analyzing, but that wouldn't be corrected by a restart of Postgres. In our case, restarting Postgres gives us a huge performance improvement (for a short while, anyways). By the way, we are using PG 7.4.6 on FreeBSD 5.30.0170. The temp table has 15 columns: a timestamp, a double, and the rest integers. It has no indexes. Hm, *are* you vacuuming only infrequently? In particular, what is your maintenance policy for pg_class? Some experimentation with TRUNCATE and VACUUM VERBOSE shows that in 7.4, a TRUNCATE of a temp table with no indexes and no toast table generates three dead row versions in pg_class. (I'm surprised that it's as many as three, but in any case the TRUNCATE would certainly have to do one update of the table's pg_class entry and thereby generate one dead row version.) If you're being sloppy about vacuuming pg_class, then over time the repeated-truncate pattern would build up a huge number of dead rows in pg_class, *all with the same OID*. It's unsurprising that this would create some slowness in looking up the temp table's pg_class entry. If this theory is correct, the reason that starting a fresh backend makes it fast again is that the new backend creates a whole new temp table with a new OID assigned, and so the adjacent litter in pg_class doesn't matter anymore (or not so much anyway). Solution would be to institute regular vacuuming of the system catalogs... regards, tom lane ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
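[Tom's theory is straightforward to check and to treat; a hedged sketch, run as a superuser:]

```sql
-- Each TRUNCATE of the temp table updates its pg_class row, leaving dead
-- row versions behind; without vacuuming, they accumulate.  VERBOSE
-- reports how many removable dead rows the catalog is carrying:
VACUUM VERBOSE pg_class;

-- ...and the proposed fix is simply to do this regularly (in 7.4,
-- autovacuum was the separate contrib/pg_autovacuum daemon):
VACUUM ANALYZE pg_class;
```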
Re: [PERFORM] New to PostgreSQL, performance considerations
Tom Lane wrote:
> Alvaro Herrera [EMAIL PROTECTED] writes:
>> While skimming over the pgbench source it has looked to me like it's necessary to pass the -s switch (scale factor) to both the initialization (-i) and the subsequent (non -i) runs.
> No, it's not supposed to be, and I've never found it needed in practice. The code seems able to pull the scale out of the database (I forget how it figures it out exactly).

pgbench is designed to be a general benchmark, meaning it exercises all parts of the system. I am thinking just reexecuting a single SELECT over and over again would be a better test of the CPU optimizations.

--
 Bruce Momjian [EMAIL PROTECTED]
 EnterpriseDB    http://www.enterprisedb.com

 + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
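[Bruce's single-SELECT idea fits pgbench's custom-script mode. A hypothetical sketch; the file name is illustrative, the table name matches the pre-8.4 pgbench schema, and it would be run with something like `pgbench -n -f select.sql -c 4 -t 10000 bench`:]

```sql
-- select.sql: one trivial, mostly-CPU statement per "transaction".
-- \setrandom is pgbench's script syntax for drawing a random id.
\setrandom aid 1 100000
SELECT abalance FROM accounts WHERE aid = :aid;
```

With -n (skip vacuum) and a fully cached accounts table, this keeps the disks out of the picture and exercises mainly parse/plan/execute CPU paths.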
Re: [PERFORM] New to PostgreSQL, performance considerations
[offtopic] Hmm, quite a long thread. Below are the posting stats.

Total Messages: 87    Total Participants: 27
--------------------------------------------
 19  Daniel van Ham Colchete
 12  Michael Stone
  9  Ron
  5  Steinar H. Gunderson
  5  Alexander Staubo
  4  Tom Lane
  4  Greg Smith
  3  Luke Lonergan
  3  Christopher Browne
  2  Merlin Moncure
  2  Guido Neitzer
  2  Dave Cramer
  2  Cosimo Streppone
  2  Bucky Jordan
  1  Tatsuo Ishii
  1  Sven Geisler
  1  Shane Ambler
  1  Michael Glaesemann
  1  Mark Kirkwood
  1  Gene
  1  Florian Weimer
  1  David Boreham
  1  Craig A. James
  1  Chris Browne
  1  Brad Nicholson
  1  Bill Moran
  1  Alvaro Herrera
--------------------------------------------

---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] New to PostgreSQL, performance considerations
Bruce, pgbench is designed to be a general benchmark, meanining it exercises all parts of the system. I am thinking just reexecuting a single SELECT over and over again would be a better test of the CPU optimizations. Mostly, though, pgbench just gives the I/O system a workout. It's not a really good general workload. -- Josh Berkus PostgreSQL @ Sun San Francisco ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] New to PostgreSQL, performance considerations
On Wed, 2006-12-13 at 18:36 -0800, Josh Berkus wrote: Bruce, pgbench is designed to be a general benchmark, meanining it exercises all parts of the system. I am thinking just reexecuting a single SELECT over and over again would be a better test of the CPU optimizations. Mostly, though, pgbench just gives the I/O system a workout. It's not a really good general workload. It also will not utilize all cpus on a many cpu machine. We recently found that the only way to *really* test with pgbench was to actually run 4+ copies of pgbench at the same time. J -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] New to PostgreSQL, performance considerations
Joshua D. Drake [EMAIL PROTECTED] writes: On Wed, 2006-12-13 at 18:36 -0800, Josh Berkus wrote: Mostly, though, pgbench just gives the I/O system a workout. It's not a really good general workload. It also will not utilize all cpus on a many cpu machine. We recently found that the only way to *really* test with pgbench was to actually run 4+ copies of pgbench at the same time. The pgbench app itself becomes the bottleneck at high transaction rates. Awhile back I rewrote it to improve its ability to issue commands concurrently, but then desisted from submitting the changes --- if we change the app like that, future numbers would be incomparable to past ones, which sort of defeats the purpose of a benchmark no? regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] New to PostgreSQL, performance considerations
On Dec 14, 2006, at 14:44 , Tom Lane wrote: The pgbench app itself becomes the bottleneck at high transaction rates. Awhile back I rewrote it to improve its ability to issue commands concurrently, but then desisted from submitting the changes --- if we change the app like that, future numbers would be incomparable to past ones, which sort of defeats the purpose of a benchmark no? At the same time, if the current pgbench isn't the tool we want to use, is this kind of backward comparison going to hinder any move to improve it? It sounds like there's quite a bit of room for improvement in pg_bench, and in my opinion we should move forward to make an improved tool, one that measures what we want to measure. And while comparison with past results might not be possible, there remains the possibility of rerunning the improved pgbench on previous systems, I should think. Michael Glaesemann grzm seespotcode net ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] File Systems Compared
On Dec 11, 2006, at 12:54 PM, Bruno Wolff III wrote: On Wed, Dec 06, 2006 at 08:55:14 -0800, Mark Lewis [EMAIL PROTECTED] wrote: Anyone run their RAIDs with disk caches enabled, or is this akin to having fsync off? Disk write caches are basically always akin to having fsync off. The only time a write-cache is (more or less) safe to enable is when it is backed by a battery or in some other way made non-volatile. So a RAID controller with a battery-backed write cache can enable its own write cache, but can't safely enable the write-caches on the disk drives it manages. This appears to be changing under Linux. Recent kernels have write barriers implemented using cache flush commands (which some drives ignore, so you need to be careful). In very recent kernels, software raid using raid 1 will also handle write barriers. To get this feature, you are supposed to mount ext3 file systems with the barrier=1 option. For other file systems, the parameter may need to be different. But would that actually provide a meaningful benefit? When you COMMIT, the WAL data must hit non-volatile storage of some kind, which without a BBU or something similar, means hitting the platter. So I don't see how enabling the disk cache will help, unless of course it's ignoring fsync. Now, I have heard something about drives using their stored rotational energy to flush out the cache... but I tend to suspect urban legend there... -- Jim Nasby[EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell) ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
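[For reference, the ext3 barrier option Bruno mentions is set at mount time. A hypothetical /etc/fstab line; the device and mount point are placeholders:]

```
# Enable write barriers on the filesystem holding the data directory,
# so the kernel issues cache-flush commands around journal commits.
/dev/md0  /var/lib/pgsql  ext3  defaults,barrier=1  1 2
```

As noted above, this only helps if the drives honor the flush commands; drives that ignore them leave the write cache as unsafe as before.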
Re: [PERFORM] New to PostgreSQL, performance considerations
Benchmarks, like any other SW, need modernizing and updating from time to time. Given the multi-core CPU approach to higher performance as the current fad in CPU architecture, we need a benchmark that is appropriate. If SPEC feels it is appropriate to rev their benchmark suite regularly, we probably should as well. Ron Peacetree At 12:44 AM 12/14/2006, Tom Lane wrote: Joshua D. Drake [EMAIL PROTECTED] writes: On Wed, 2006-12-13 at 18:36 -0800, Josh Berkus wrote: Mostly, though, pgbench just gives the I/O system a workout. It's not a really good general workload. It also will not utilize all cpus on a many cpu machine. We recently found that the only way to *really* test with pgbench was to actually run 4+ copies of pgbench at the same time. The pgbench app itself becomes the bottleneck at high transaction rates. Awhile back I rewrote it to improve its ability to issue commands concurrently, but then desisted from submitting the changes --- if we change the app like that, future numbers would be incomparable to past ones, which sort of defeats the purpose of a benchmark no? ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] New to PostgreSQL, performance considerations
(Re)-Design it to do both, unless there's reason to believe that doing one after the other would skew the results. Then old results are available, new results are also visible and useful for future comparisons. And seeing them side by side might be an interesting exercise as well, at least for a while.

(sorry for top-posting -- web based interface that doesn't do proper quoting)

Greg Williamson
DBA
GlobeXplorer LLC

-----Original Message-----
From: [EMAIL PROTECTED] on behalf of Michael Glaesemann
Sent: Wed 12/13/2006 10:11 PM
To: Tom Lane
Cc: Joshua D. Drake; Josh Berkus; pgsql-performance@postgresql.org; Bruce Momjian; Alvaro Herrera; Alexander Staubo; Michael Stone
Subject: Re: [PERFORM] New to PostgreSQL, performance considerations

On Dec 14, 2006, at 14:44 , Tom Lane wrote:
> The pgbench app itself becomes the bottleneck at high transaction rates. Awhile back I rewrote it to improve its ability to issue commands concurrently, but then desisted from submitting the changes --- if we change the app like that, future numbers would be incomparable to past ones, which sort of defeats the purpose of a benchmark no?

At the same time, if the current pgbench isn't the tool we want to use, is this kind of backward comparison going to hinder any move to improve it? It sounds like there's quite a bit of room for improvement in pg_bench, and in my opinion we should move forward to make an improved tool, one that measures what we want to measure. And while comparison with past results might not be possible, there remains the possibility of rerunning the improved pgbench on previous systems, I should think.

Michael Glaesemann
grzm seespotcode net

---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster

---(end of broadcast)--- TIP 6: explain analyze is your friend