[PERFORM] lowering priority automatically at connection
Hi,

I find this very helpful: "Lowering the priority of a PostgreSQL query"
http://weblog.bignerdranch.com/?p=11

Now I was wondering whether one could have a

    SELECT pg_setpriority(10);

executed automatically each time a certain user connects (not necessarily
using psql)? Any ideas if and how this might be possible?

Regards :)
Chris.
Re: [PERFORM] lowering priority automatically at connection
Chris Mair <[EMAIL PROTECTED]> writes:
> I find this very helpful: "Lowering the priority of a PostgreSQL query"
> http://weblog.bignerdranch.com/?p=11

That guy doesn't actually have the foggiest idea what he's doing. The
reason there is no built-in capability to do that is that it *does not
work well*. Search the list archives for "priority inversion" to find
out why not.

			regards, tom lane
Re: [PERFORM] Optimizing a huge_table/tiny_table join
On 5/24/06, Tom Lane <[EMAIL PROTECTED]> wrote:
> [EMAIL PROTECTED] writes:
>>  Limit  (cost=19676.75..21327.99 rows=6000 width=84)
>>    ->  Hash Join  (cost=19676.75..1062244.81 rows=3788315 width=84)
>>          Hash Cond: (upper(("outer".id)::text) = upper(("inner".id)::text))
>>          ->  Seq Scan on huge_table h  (cost=0.00..51292.43 rows=2525543 width=46)
>>          ->  Hash  (cost=19676.00..19676.00 rows=300 width=38)
>>                ->  Seq Scan on tiny_table t  (cost=0.00..19676.00 rows=300 width=38)
>
> Um, if huge_table is so much bigger than tiny_table, why are the cost
> estimates for seqscanning them only about 2.5x different? There's
> something wacko about your statistics, methinks.

Well, they're not my statistics; they're explain's. You mean there's a
bug in explain? I agree that it makes no sense that the costs don't
differ as much as one would expect, but you can see right there the
numbers of rows for the two tables. At any rate, how would one go about
finding an explanation for these strange stats?

More bewildering still (and infuriating as hell--because it means that
all of my work for yesterday has been wasted) is that I can no longer
reproduce the best query plan I posted earlier, even though the tables
have not changed at all. (Hence I can't post the explain analyze for
the best query plan, which Josh Drake asked for.) No matter what value
I use for LIMIT, the query planner now insists on sequentially scanning
huge_table and ignoring the available index. (If I turn off
enable_seqscan, I get the second worst query plan I posted yesterday.)

Anyway, I take it that there is no way to bypass the optimizer and
instruct PostgreSQL exactly how one wants the search performed?

Thanks!

kj
Re: [PERFORM] lowering priority automatically at connection
>> I find this very helpful: "Lowering the priority of a PostgreSQL query"
>> http://weblog.bignerdranch.com/?p=11
>
> That guy doesn't actually have the foggiest idea what he's doing. The
> reason there is no built-in capability to do that is that it *does not
> work well*. Search the list archives for "priority inversion" to find
> out why not.
>
> 			regards, tom lane

Ok, I've learned something new (*). I'll drop that idea :)

Bye,
Chris.

(*) http://en.wikipedia.org/wiki/Priority_inversion
Re: [PERFORM] Optimizing a huge_table/tiny_table join
On Thu, May 25, 2006 at 12:31:04PM -0400, [EMAIL PROTECTED] wrote:
> Well, they're not my statistics; they're explain's. You mean there's

Explain doesn't get them from nowhere. How often is the table being
ANALYSEd?

> More bewildering still (and infuriating as hell--because it means that
> all of my work for yesterday has been wasted) is that I can no longer
> reproduce the best query plan I posted earlier, even though the tables
> have not changed at all. (Hence I can't post the explain analyze for

I find that very hard to believe. Didn't change _at all_? Are you sure
no VACUUMs or anything are happening automatically?

> Anyway, I take it that there is no way to bypass the optimizer and
> instruct PostgreSQL exactly how one wants the search performed?

No, there isn't.

A

-- 
Andrew Sullivan | [EMAIL PROTECTED]
The fact that technology doesn't work is no bar to success in the
marketplace. --Philip Greenspun
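A minimal sketch of refreshing those statistics by hand, using the table
names from the thread (how often to run it depends on how quickly the
data changes):

    ANALYZE huge_table;
    ANALYZE tiny_table;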
Re: [PERFORM] lowering priority automatically at connection
On Thu, May 25, 2006 at 06:16:24PM +0200, Chris Mair wrote:
> I find this very helpful: "Lowering the priority of a PostgreSQL query"
> http://weblog.bignerdranch.com/?p=11
>
> Now I was wondering whether one could have a SELECT pg_setpriority(10);
> executed automatically each time a certain user connects (not
> necessarily using psql)?

Beware that setting priorities can have unintended, adverse effects.
Use a search engine to find information about "priority inversion"
before deciding that query priorities are a good idea.

-- 
Michael Fuhr
Re: [PERFORM] Optimizing a huge_table/tiny_table join
On 5/25/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> Well, they're not my statistics; they're explain's. You mean there's a
> bug in explain? I agree that it makes no sense that the costs don't
> differ as much as one would expect, but you can see right there the
> numbers of rows for the two tables. At any rate, how would one go about
> finding an explanation for these strange stats?

Well, the query planner uses statistics to deduce the best plan
possible. Explain includes this statistical data in its output. See:

http://www.postgresql.org/docs/8.1/interactive/planner-stats.html

...for information about what it is all about. The idea is that your
statistics are probably not detailed enough to help the planner. See
ALTER TABLE ... SET STATISTICS to change that.

> More bewildering still (and infuriating as hell--because it means that
> all of my work for yesterday has been wasted) is that I can no longer
> reproduce the best query plan I posted earlier, even though the tables
> have not changed at all. (Hence I can't post the explain analyze for
> the best query plan, which Josh Drake asked for.) No matter what value
> I use for LIMIT, the query planner now insists on sequentially scanning
> huge_table and ignoring the available index. (If I turn off
> enable_seqscan, I get the second worst query plan I posted yesterday.)
>
> Anyway, I take it that there is no way to bypass the optimizer and
> instruct PostgreSQL exactly how one wants the search performed?

There is no way to bypass it, but there are many ways to tune it.

Hmm, there is also a possibility (though a statistics problem is more
likely) that you're using some older version of PostgreSQL and hitting
the same problem I did:

http://archives.postgresql.org/pgsql-performance/2005-07/msg00345.php

Tom provided a patch back then which fixed it:

http://archives.postgresql.org/pgsql-performance/2005-07/msg00352.php

...but I don't remember when it made it into a release.

Regards,
Dawid
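A sketch of the suggested statistics-target change; the column name "id"
is taken from the posted plan, and the target value 200 is just an
illustrative number (the default at the time was 10):

    ALTER TABLE huge_table ALTER COLUMN id SET STATISTICS 200;
    ANALYZE huge_table;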
[PERFORM] is it possible to make this faster?
been doing a lot of pgsql/mysql performance testing lately, and there
is one query that mysql does much better than pgsql...and I see it a
lot in normal development:

select a,b,max(c) from t group by a,b;

t has an index on a,b,c.

in my sample case with cardinality of 1000 for a, 2000 for b, and 300k
records in t, pgsql does a seq. scan on dev box in about a second
(returning 2000 records). recent versions of mysql do much better,
returning same set in 20ms. mysql explain says it uses an index to
optimize the group by somehow. is there a faster way to write this
query?

Merlin
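For reference, a minimal reproduction of the shape being discussed (Tom
Lane posts essentially this schema later in the thread; the data
distribution is left unspecified here):

    create table t (a int, b int, c int);
    create index ti on t (a, b, c);

    select a, b, max(c) from t group by a, b;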
Re: [PERFORM] is it possible to make this faster?
On Thu, May 25, 2006 at 16:07:19 -0400, Merlin Moncure <[EMAIL PROTECTED]> wrote:
> been doing a lot of pgsql/mysql performance testing lately, and there
> is one query that mysql does much better than pgsql...and I see it a
> lot in normal development:
>
> select a,b,max(c) from t group by a,b;
>
> t has an index on a,b,c. in my sample case with cardinality of 1000
> for a, 2000 for b, and 300k records in t, pgsql does a seq. scan on
> dev box in about a second (returning 2000 records). recent versions
> of mysql do much better, returning same set in 20ms. mysql explain
> says it uses an index to optimize the group by somehow. is there a
> faster way to write this query?

SELECT DISTINCT ON (a, b) a, b, c FROM t ORDER BY a DESC, b DESC, c DESC;
Re: [PERFORM] is it possible to make this faster?
On 5/25/06, Bruno Wolff III <[EMAIL PROTECTED]> wrote:
> On Thu, May 25, 2006 at 16:07:19 -0400, Merlin Moncure <[EMAIL PROTECTED]> wrote:
>> been doing a lot of pgsql/mysql performance testing lately, and there
>> is one query that mysql does much better than pgsql...and I see it a
>> lot in normal development:
>>
>> select a,b,max(c) from t group by a,b;
>
> SELECT DISTINCT ON (a, b) a, b, c FROM t ORDER BY a DESC, b DESC, c DESC;

that is actually slower than group by in my case...am i missing
something? (both essentially resolved to seq_scan)

merlin
Re: [PERFORM] is it possible to make this faster?
On Thu, May 25, 2006 at 04:07:19PM -0400, Merlin Moncure wrote:
> been doing a lot of pgsql/mysql performance testing lately, and there
> is one query that mysql does much better than pgsql...and I see it a
> lot in normal development:
>
> select a,b,max(c) from t group by a,b;
>
> t has an index on a,b,c.

The planner _should_ TTBOMK be able to do it by itself in 8.1, but have
you tried something along the following lines?

select a,b,(select c from t t2 order by c desc where t1.a=t2.a and
t1.b=t2.b) from t t1 group by a,b;

/* Steinar */
-- 
Homepage: http://www.sesse.net/
Re: [PERFORM] is it possible to make this faster?
On May 25, 2006 01:31 pm, Merlin Moncure <[EMAIL PROTECTED]> wrote:
>> SELECT DISTINCT ON (a, b) a, b, c FROM t ORDER BY a DESC, b DESC, c DESC;
>
> that is actually slower than group by in my case...am i missing
> something? (both essentially resolved to seq_scan)

Try it with an index on a,b,c.

-- 
Alan
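One way to check whether the index changes anything, as a sketch using
the thread's table and index names (the plan you actually get depends on
your data and settings):

    create index ti on t (a, b, c);
    explain select distinct on (a, b) a, b, c
      from t order by a desc, b desc, c desc;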
Re: [PERFORM] is it possible to make this faster?
On Thu, May 25, 2006 at 16:31:40 -0400, Merlin Moncure <[EMAIL PROTECTED]> wrote:
> On 5/25/06, Bruno Wolff III <[EMAIL PROTECTED]> wrote:
>> SELECT DISTINCT ON (a, b) a, b, c FROM t ORDER BY a DESC, b DESC, c DESC;
>
> that is actually slower than group by in my case...am i missing
> something? (both essentially resolved to seq_scan)

If there aren't many c's for each (a,b), then a sort might be the best
way to do this. I don't remember if skip scanning ever got done, but if
it did, it would have been 8.1 or later.
Re: [PERFORM] is it possible to make this faster?
Merlin Moncure <[EMAIL PROTECTED]> writes:
> been doing a lot of pgsql/mysql performance testing lately, and there
> is one query that mysql does much better than pgsql...and I see it a
> lot in normal development:
>
> select a,b,max(c) from t group by a,b;
>
> t has an index on a,b,c.

The index won't help, as per this comment from planagg.c:

 * We don't handle GROUP BY, because our current implementations of
 * grouping require looking at all the rows anyway, and so there's not
 * much point in optimizing MIN/MAX.

Given the numbers you mention (300k rows in 2000 groups) I'm not
convinced that an index-based implementation would help much; we'd
still need to fetch at least one record out of every 150, which is
going to cost near as much as seqscanning all of them.

> recent versions of mysql do much better, returning same set in 20ms.

Well, since they don't do MVCC they can answer this query from the
index without going to the heap at all. But that still seems remarkably
fast for something that has to grovel through 300k index entries.

			regards, tom lane
Re: [PERFORM] is it possible to make this faster?
On 5/25/06, Steinar H. Gunderson <[EMAIL PROTECTED]> wrote:
> On Thu, May 25, 2006 at 04:07:19PM -0400, Merlin Moncure wrote:
>> been doing a lot of pgsql/mysql performance testing lately, and there
>> is one query that mysql does much better than pgsql...and I see it a
>> lot in normal development:
>>
>> select a,b,max(c) from t group by a,b;
>
> select a,b,(select c from t t2 order by c desc where t1.a=t2.a and
> t1.b=t2.b) from t t1 group by a,b;

this came out to a tie with the group by approach, although it produced
a different (but similar) plan. we are still orders of magnitude behind
mysql here.

Interestingly, if I extract out the distinct values of a,b to a temp
table and rejoin to t using your approach, I get competitive times with
mysql. this means the essential problem is:

select a,b from t group by a,b

is slow. This feels like the same penalty for mvcc we pay with
count(*)...hm.

merlin
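A sketch of the temp-table rejoin Merlin describes (the exact SQL he
used isn't in the thread, so the details here are an assumption):

    -- materialize the distinct (a,b) pairs once
    create temp table ab as select distinct a, b from t;

    -- then fetch the max c per pair via the index on (a,b,c)
    select ab.a, ab.b,
           (select c from t t2
             where t2.a = ab.a and t2.b = ab.b
             order by c desc limit 1) as max_c
      from ab;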
Re: [PERFORM] is it possible to make this faster?
On Thu, May 25, 2006 at 04:54:09PM -0400, Merlin Moncure wrote:
>> select a,b,(select c from t t2 order by c desc where t1.a=t2.a and
>> t1.b=t2.b) from t t1 group by a,b;
>
> this came out to a tie with the group by approach, although it
> produced a different (but similar) plan. we are still orders of
> magnitude behind mysql here.

Actually, it _should_ produce a syntax error -- it's missing a LIMIT 1
in the subquery.

/* Steinar */
-- 
Homepage: http://www.sesse.net/
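For reference, a corrected form of that subquery (the WHERE clause moved
before ORDER BY and the LIMIT 1 added), as a sketch against the thread's
table t:

    select a, b,
           (select c from t t2
             where t1.a = t2.a and t1.b = t2.b
             order by c desc
             limit 1) as max_c
      from t t1
     group by a, b;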
Re: [PERFORM] is it possible to make this faster?
Tom Lane <[EMAIL PROTECTED]> writes:
> Merlin Moncure <[EMAIL PROTECTED]> writes:
>> recent versions of mysql do much better, returning same set in 20ms.

Are you sure you measured that right? I tried to duplicate this using
mysql 5.0.21, and I see runtimes of 0.45 sec without an index and 0.15
sec with. This compares to psql times around 0.175 sec. Doesn't look to
me like we're hurting all that badly, even without using the index.

			regards, tom lane
Re: [PERFORM] is it possible to make this faster?
On Thu, 2006-05-25 at 15:52, Tom Lane wrote:
> Merlin Moncure <[EMAIL PROTECTED]> writes:
>> been doing a lot of pgsql/mysql performance testing lately, and there
>> is one query that mysql does much better than pgsql...and I see it a
>> lot in normal development:
>>
>> select a,b,max(c) from t group by a,b;
>>
>> t has an index on a,b,c.
>
> The index won't help, as per this comment from planagg.c:
>
>  * We don't handle GROUP BY, because our current implementations of
>  * grouping require looking at all the rows anyway, and so there's not
>  * much point in optimizing MIN/MAX.
>
> Given the numbers you mention (300k rows in 2000 groups) I'm not
> convinced that an index-based implementation would help much; we'd
> still need to fetch at least one record out of every 150, which is
> going to cost near as much as seqscanning all of them.
>
>> recent versions of mysql do much better, returning same set in 20ms.
>
> Well, since they don't do MVCC they can answer this query from the
> index without going to the heap at all. But that still seems remarkably
> fast for something that has to grovel through 300k index entries.

Well, they do, just with innodb tables. Merlin, have you tried this
against innodb tables to see what you get?
Re: [PERFORM] is it possible to make this faster?
On Thu, 2006-05-25 at 16:52 -0400, Tom Lane wrote:
> Merlin Moncure <[EMAIL PROTECTED]> writes:
>> been doing a lot of pgsql/mysql performance testing lately, and there
>> is one query that mysql does much better than pgsql...and I see it a
>> lot in normal development:
>>
>> select a,b,max(c) from t group by a,b;
>>
>> t has an index on a,b,c.
>
> The index won't help, as per this comment from planagg.c:
>
>  * We don't handle GROUP BY, because our current implementations of
>  * grouping require looking at all the rows anyway, and so there's not
>  * much point in optimizing MIN/MAX.
>
> Given the numbers you mention (300k rows in 2000 groups) I'm not
> convinced that an index-based implementation would help much; we'd
> still need to fetch at least one record out of every 150, which is
> going to cost near as much as seqscanning all of them.

Well, if the MySQL server has enough RAM that the index is cached (or
index + relevant chunks of the data file if using InnoDB?) then that
would explain how MySQL can use an index to get fast results.

-- Mark Lewis
Re: [PERFORM] Optimizing a huge_table/tiny_table join
On May 25, 2006, at 12:07 PM, Dawid Kuroczko wrote:
> Well, the query planner uses statistics to deduce the best plan
> possible. Explain includes this statistical data in its output. See:
>
> http://www.postgresql.org/docs/8.1/interactive/planner-stats.html
>
> ...for information about what it is all about. The idea is that your
> statistics are probably not detailed enough to help the planner. See
> ALTER TABLE ... SET STATISTICS to change that.

http://www.pervasive-postgres.com/lp/newsletters/2006/Insights_postgres_Mar.asp#4
might also be worth your time to read.

> Hmm, there is also a possibility (though a statistics problem is more
> likely) that you're using some older version of PostgreSQL and hitting
> the same problem I did:
>
> http://archives.postgresql.org/pgsql-performance/2005-07/msg00345.php
>
> Tom provided a patch back then which fixed it:
>
> http://archives.postgresql.org/pgsql-performance/2005-07/msg00352.php
>
> ...but I don't remember when it made it into a release.

According to cvs, it's been in since 8.1 and 8.0.4:

Revision 1.111.4.2 -- Fri Jul 22 19:12:33 2005 UTC (10 months ago) by tgl
Branches: REL8_0_STABLE
CVS tags: REL8_0_8, REL8_0_7, REL8_0_6, REL8_0_5, REL8_0_4
Changes since revision 1.111.4.1: +18 -37 lines

    Fix compare_fuzzy_path_costs() to behave a bit more sanely. The
    original coding would ignore startup cost differences of less than 1%
    of the estimated total cost; which was OK for normal planning but
    highly not OK if a very small LIMIT was applied afterwards, so that
    startup cost becomes the name of the game. Instead, compare startup
    and total costs fuzzily but independently. This changes the plan
    selected for two queries in the regression tests; adjust
    expected-output files for resulting changes in row order. Per reports
    from Dawid Kuroczko and Sam Mason.

Revision 1.124 -- Fri Jul 22 19:12:01 2005 UTC (10 months ago) by tgl
Branches: MAIN
CVS tags: REL8_1_0BETA3, REL8_1_0BETA2, REL8_1_0BETA1
Changes since revision 1.123: +18 -37 lines

    [same commit message as above]

--
Jim C. Nasby, Database Architect                [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
Re: [PERFORM] is it possible to make this faster?
Jim Nasby <[EMAIL PROTECTED]> writes:
> On May 25, 2006, at 4:11 PM, Tom Lane wrote:
>> Are you sure you measured that right? I tried to duplicate this using
>> mysql 5.0.21, and I see runtimes of 0.45 sec without an index and 0.15
>> sec with. This compares to psql times around 0.175 sec. Doesn't look
>> to me like we're hurting all that badly, even without using the index.
>
> Well, that would depend greatly on how wide the rows were, and I don't
> believe the OP ever mentioned that. If he's got a nice, fat
> varchar(1024) in that table, then it's not surprising that an index
> would help things.

Wide rows might slow down the psql side of things somewhat (though
probably not as much as you think). That doesn't account for the
discrepancy in our mysql results though.

For the record, I was testing with a table like

	create table t(a int, b int, c int);
	create index ti on t(a,b,c);

			regards, tom lane
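A hypothetical way to fill that test table at roughly the row count and
cardinalities mentioned earlier in the thread (the exact distribution
Merlin used is not known, so this is only an approximation):

    insert into t
    select (random() * 999)::int,
           (random() * 1999)::int,
           (random() * 100000)::int
      from generate_series(1, 300000);
    analyze t;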
Re: [PERFORM] lowering priority automatically at connection
> That guy doesn't actually have the foggiest idea what he's doing. The
> reason there is no built-in capability to do that is that it *does not
> work well*. Search the list archives for "priority inversion" to find
> out why not.

http://en.wikipedia.org/wiki/Priority_inversion
Re: [PERFORM] is it possible to make this faster?
Also, are you sure your numbers are not coming out of the mysql query
cache? That might explain some of it - also with Tom seeing comparable
numbers in his test.

--
Jeff Trout <[EMAIL PROTECTED]>
http://www.jefftrout.com/
http://www.stuarthamm.net/
Re: [PERFORM] is it possible to make this faster?
Jeff - <[EMAIL PROTECTED]> writes:
> Also, are you sure your numbers are not coming out of the mysql query
> cache? That might explain some of it - also with Tom seeing comparable
> numbers in his test.

Indeed, enabling the mysql query cache makes the timings drop to nil
... as long as I present a query that's strcmp-equal to the last one
(not different in whitespace for instance).

			regards, tom lane
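For anyone repeating these benchmarks, a sketch of ruling the MySQL
query cache out for a session (syntax as of MySQL 5.0; check your
version's docs):

    SET SESSION query_cache_type = OFF;
    SELECT SQL_NO_CACHE a, b, MAX(c) FROM t GROUP BY a, b;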
Re: [PERFORM] is it possible to make this faster?
On 5/25/06, Tom Lane <[EMAIL PROTECTED]> wrote:
>> recent versions of mysql do much better, returning same set in 20ms.
>
> Are you sure you measured that right? I tried to duplicate this using
> mysql 5.0.21, and I see runtimes of 0.45 sec without an index and 0.15
> sec with. This compares to psql times around 0.175 sec. Doesn't look
> to me like we're hurting all that badly, even without using the index.

Well, my numbers were approximate, but I tested on a few different
machines. The times got closer as the cpu speed got faster. pg really
loves a quick cpu. On a 600 MHz P3 I got 70ms on mysql and 1050ms on
pg. The mysql query cache is always off for my performance testing.

My a and b columns were ID columns from another table, so I rewrote the
join and now pg is smoking mysql (again). To quickly answer the other
questions:

1. no, not testing innodb
2. rows are narrow

Merlin