Re: [SQL] LEAST and GREATEST functions?
Greg Stark wrote:
> MAX and MIN are single-parameter aggregate functions. LEAST and
> GREATEST are two-parameter (though in postgres they could be defined
> for 3 and more parameters) scalar functions.

If LEAST and GREATEST can accept any number of parameters, wouldn't it
make sense to code them the way COALESCE works, rather than defining a
function for each? That way we don't need to define all the various
functions for different types. e.g.

  SELECT greatest(a, b, c) FROM bar;

becomes

  SELECT greatest(a, greatest(b, c)) FROM bar;

becomes

  SELECT CASE WHEN b < c
              THEN CASE WHEN c < a THEN a ELSE c END
              ELSE CASE WHEN b < a THEN a ELSE b END
         END
  FROM bar;

From the docs:

  COALESCE and NULLIF are just shorthand for CASE expressions. They are
  actually converted into CASE expressions at a very early stage of
  processing, and subsequent processing thinks it is dealing with CASE.
  Thus an incorrect COALESCE or NULLIF usage may draw an error message
  that refers to CASE.
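For comparison, the conversion the docs describe works out like this
for COALESCE itself (a sketch, using the same hypothetical columns
a, b, c):

  SELECT COALESCE(a, b, c) FROM bar;

is treated as

  SELECT CASE WHEN a IS NOT NULL THEN a
              ELSE CASE WHEN b IS NOT NULL THEN b ELSE c END
         END
  FROM bar;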
Re: [SQL] LEAST and GREATEST functions?
Tom Lane wrote:
> But COALESCE is a special feature hard-wired into the parser. There's
> no free lunch --- you pay for your extensibility somewhere.

That's what I'm suggesting: hard-wiring LEAST and GREATEST into the
parser. 7.5, maybe? The question is: is it worth hard-wiring vs. plain
functions?

(time passes)

Doesn't seem to be in SQL92. It's in Oracle, Interbase and MySQL,
though, says google. I'd say we need to have LEAST and GREATEST at
least somewhere in contrib (as functions), if not in core, to make the
transition from other RDBMSs to postgresql easier.

A brief test shows that we would incur quite a performance penalty if
it isn't hard-wired (I compared COALESCE against an equivalent SQL
function, coalesce_sql_function).
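For reference, the SQL-function stand-in used in that comparison would
look something like this (a sketch of the two-argument text case only;
coalesce_sql_function is just what the test called it):

  CREATE FUNCTION coalesce_sql_function(text, text) RETURNS text AS
    'SELECT CASE WHEN $1 IS NOT NULL THEN $1 ELSE $2 END'
  LANGUAGE 'sql';

Every call goes through the full function-call machinery, whereas
COALESCE is rewritten into a bare CASE expression before planning,
which is where the penalty comes from.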
Re: [SQL] LEAST and GREATEST functions?
Tom Lane wrote:
> create function greatest(anyelement, anyelement) returns anyelement as
> 'select case when $1>$2 then $1 else $2 end' language sql;

Any chance of this making it into 7.4's contrib? Maybe with enough
documentation to make it a tutorial for PostgreSQL's user functions?

> You do have to create several greatest() functions for different
> numbers of arguments, but not one for each datatype you want to
> handle.

Insignificant, compared with the flexibility.

> I have not seen enough requests for a native LEAST/GREATEST
> implementation to make me think we need to do more than this...
> certainly I'd rather spend development effort on general facilities
> like polymorphism and inlining than on creating one-use facilities
> like built-in LEAST/GREATEST.

Nice. It would speed up our current functions too. Thanks, developers,
esp. Tom and Joe, for this! Wow, fewer reasons to use VIEWs when CREATE
FUNCTION can be just as fast, but more flexible.

Can't wait. IMHO, the advocacy people can and should be promoting this
(functions returning sets, and how they can be used) as a killer
feature for 7.3 and 7.4. I know I was pretty happy to discover that gem
lurking in the documentation in 7.3.
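To flesh out the "several functions for different numbers of
arguments" point, the extra arities can just stack on the polymorphic
two-argument version (a sketch, assuming 7.4's anyelement support as in
Tom's example):

  CREATE FUNCTION greatest(anyelement, anyelement, anyelement)
  RETURNS anyelement AS
    'SELECT greatest(greatest($1, $2), $3)' LANGUAGE sql;

  SELECT greatest(1, 3, 2);              -- 3
  SELECT greatest('a'::text, 'c', 'b');  -- c

One definition per argument count, and every type with a ">" operator
is covered.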
Re: [SQL] Non-Blocking Locks (i.e. Oracle NOWAIT)
Christoph Haller wrote:
> PostgreSQL 7.3.2
>
> T1: begin;
>     select * from foo for update;
>
> T2: set STATEMENT_TIMEOUT = 1000; -- milliseconds

Seems like setting it to 1 (ms) emulates the NOWAIT condition better.
Can't set it to 0, though.

> T2: select * from foo for update;
> T2: ERROR: Query was cancelled.

How picky are we in borrowing syntax and idiosyncrasies (?) from other
DBs? Since we can closely emulate Oracle's behaviour on SELECT ... FOR
UPDATE NOWAIT, should we just add NOWAIT to the next (non-feature-
frozen) version of postgres?
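Spelled out, the proposal would mirror Oracle's syntax (hypothetical
here; this is not in 7.3.2):

  SELECT * FROM foo FOR UPDATE NOWAIT;
  -- would fail immediately with an error if another transaction
  -- already holds the row lock, instead of blocking

as opposed to the STATEMENT_TIMEOUT trick, which still waits for up to
the timeout before cancelling.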
[SQL] Hash Join not using hashed index?
I'm using PostgreSQL 7.0.2.

==
# explain select city.name, country.name from country, city
    where city.country_id = country.country_id;
NOTICE:  QUERY PLAN:

Hash Join  (cost=8.85..16.76 rows=75 width=18)
  ->  Seq Scan on city  (cost=0.00..1.75 rows=75 width=16)
  ->  Hash  (cost=5.53..5.53 rows=253 width=2)
        ->  Seq Scan on country  (cost=0.00..5.53 rows=253 width=2)

EXPLAIN
# create index country_id_idx on country using hash (country_id);
CREATE
# vacuum analyze;
VACUUM
# explain select city.name, country.name from country, city
    where city.country_id = country.country_id;
NOTICE:  QUERY PLAN:

Hash Join  (cost=8.85..16.76 rows=75 width=18)
  ->  Seq Scan on city  (cost=0.00..1.75 rows=75 width=16)
  ->  Hash  (cost=5.53..5.53 rows=253 width=2)
        ->  Seq Scan on country  (cost=0.00..5.53 rows=253 width=2)

EXPLAIN
==

Why doesn't PostgreSQL use country_id_idx, rather than rehashing
country_id?
Re: [SQL] Hash Join not using hashed index?
On Wed, Jun 28, 2000 at 10:56:17AM -0400, Tom Lane wrote:
> Ang Chin Han <[EMAIL PROTECTED]> writes:
> > Probably the clause wasn't restrictive enough (" and region < n").
>
> If it was like that then a hash index wouldn't have been applicable
> anyway; hashes are only good for strict equality checks. If you want
> something that can do ordering checks you need a btree index.
>
> (There are good reasons why btree is the default index type ;-))

There _was_ a btree index, before I added the extra hash index:

pintoo=# \d country_pkey
Index "country_pkey"
 Attribute  |   Type
------------+----------
 country_id | smallint
unique btree (primary key)

> > Original cost est:
> > Hash Join  (cost=8.85..16.76 rows=75 width=18)
> >   ->  Seq Scan on city  (cost=0.00..1.75 rows=75 width=16)
> >   ->  Hash  (cost=5.53..5.53 rows=253 width=2)
> >         ->  Seq Scan on country  (cost=0.00..5.53 rows=253 width=2)
>
> > I guess the problem is that country-city is a one-to-many relation,
> > BUT I've more countries than cities (note the # of rows above), thus
> > throwing the planner off...
>
> Off what? This looks like a pretty reasonable plan to me, given the
> fairly small table sizes. Do you have evidence that another plan
> type would be quicker for this problem?

No evidence, but I was hoping that having a prehashed country_id would
speed things up a bit, since the seq scan on country could be
redundant, requiring only a seq scan on city and an index (hash) lookup
on country.

Or maybe this is a related question (just curious):

pintoo=# explain select country_id from country order by country_id;
NOTICE:  QUERY PLAN:

Sort  (cost=15.63..15.63 rows=253 width=2)
  ->  Seq Scan on country  (cost=0.00..5.53 rows=253 width=2)

pintoo=# explain select name from country order by name;
NOTICE:  QUERY PLAN:

Sort  (cost=15.63..15.63 rows=253 width=12)
  ->  Seq Scan on country  (cost=0.00..5.53 rows=253 width=12)

If there is already a btree index on country_id, why bother re-sorting,
when the rows could be output by traversing the tree? Comparing with an
unindexed column, we can see that the index is not used at all.
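One way to probe whether the btree *could* serve the ORDER BY (a
sketch; this assumes your version supports the enable_seqscan planner
switch, which I haven't verified for 7.0):

pintoo=# set enable_seqscan to 'off';
pintoo=# explain select country_id from country order by country_id;
-- if the planner then picks an Index Scan on country_pkey, the
-- seq scan + sort above was simply estimated to be cheaper for
-- a 253-row table, not impossible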
Re: [SQL] Hash Join not using hashed index?
On Wed, Jun 28, 2000 at 03:00:04AM -0400, Tom Lane wrote:
> Hash joins don't have anything to do with hash indexes.
> A hash join is a join that makes use of a temporary hashtable
> built on-the-fly *in memory* for that join.

Oh, I see.

> The planner could choose to use an indexscan on a hash index
> as an input for the join, but it'd only be likely to do so
> if there is a restriction clause matching the index. In your
> example you have only a join WHERE clause.

Well, in my original query there was one, but the plan's the same.
Probably the clause wasn't restrictive enough (" and region < n").

Original cost est:

Hash Join  (cost=8.85..16.76 rows=75 width=18)
  ->  Seq Scan on city  (cost=0.00..1.75 rows=75 width=16)
  ->  Hash  (cost=5.53..5.53 rows=253 width=2)
        ->  Seq Scan on country  (cost=0.00..5.53 rows=253 width=2)

I guess the problem is that country-city is a one-to-many relation, BUT
I've more countries than cities (note the # of rows above), thus
throwing the planner off...

OTOH, what's bugging me is that PostgreSQL could have used the
pre-generated hash index rather than rebuilding it on the fly again.

> Plain btree indexes on city.country_id and country.country_id
> might work better --- at least they'd offer the option of
> a merge join without doing an explicit sort.

I tried, and it did worse. Hmmm... I think I'm better off creating a
temporary table to store the results, since the table is seldom updated
but that query is run often. Rules to update that temp table, too, of
course. (cost is now 1.75, if anyone cares)
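A minimal sketch of that precomputed-results approach (table and column
names hypothetical; the maintenance rules are left out):

  CREATE TABLE city_country AS
    SELECT city.name AS city_name, country.name AS country_name
    FROM country, city
    WHERE city.country_id = country.country_id;

  -- the frequent query then becomes a single small seq scan:
  SELECT city_name, country_name FROM city_country;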
[SQL] Functions too slow, even with iscachable?
I have a query which runs fast:

  SELECT passwd FROM ticket
  WHERE name = ticket2name('test-006kdt')
    AND survey_id = ticket2survey_id('test-006kdt');

But it slows down to a crawl when I wrap it in a function:

  CREATE FUNCTION ticket2passwd(text) RETURNS text AS
    'SELECT passwd FROM ticket
     WHERE name = ticket2name($1) AND ticket2survey_id($1)'
  LANGUAGE 'sql' WITH (iscachable);

which should be a short form for the first query:

  SELECT ticket2passwd('test-006kdt');

Any ideas? Thanks in advance.

ticket2name and ticket2survey_id are both iscachable.

BTW, pg_dump doesn't seem to preserve the iscachable attribute. Bug?
Re: [SQL] Functions too slow, even with iscachable?
On Mon, Aug 07, 2000 at 10:58:27AM -0400, Tom Lane wrote:
> (I assume the lack of "survey_id =" here is just a cut-and-paste
> error?)

Yup. :)

> I think what you're getting bit by is that the optimizer doesn't
> recognize "var = function(param)" as being a potential indexscan
> clause. Does EXPLAIN show that the first query is producing an
> indexscan plan?

It did. I'll try to make up a reproducible test case, if you need it.

> I have not tried it, but I think you could get around this problem in
> plpgsql, along the lines of
>     tmp1 = ticket2name($1);
>     tmp2 = ticket2survey_id($1);
>     SELECT passwd FROM ticket WHERE name = tmp1 AND survey_id = tmp2;
> since the tmp vars will look like params to the optimizer and
> "var = param" is indexable.

Yup, it did work. Thanks!

> Looks like we need to teach the optimizer that expressions involving
> params can be treated like simple params for the purposes of
> optimization.

That'll be good. Anything to speed up stored procedures is good:
encourage people to put logic processing into the RDBMS, where it
should be.
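Fleshing Tom's outline into a complete function might look like this (a
sketch, keeping the names from the thread; it assumes survey_id is an
integer, and that the old SQL-language version has been dropped first):

  CREATE FUNCTION ticket2passwd(text) RETURNS text AS '
  DECLARE
      tmp1   text;
      tmp2   integer;  -- assumed type of survey_id
      result text;
  BEGIN
      tmp1 := ticket2name($1);
      tmp2 := ticket2survey_id($1);
      -- plain "var = param" comparisons, so the indexscan applies
      SELECT passwd INTO result FROM ticket
          WHERE name = tmp1 AND survey_id = tmp2;
      RETURN result;
  END;' LANGUAGE 'plpgsql' WITH (iscachable);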
[SQL] Aggregate functions, fast! (long)
Apologies in advance for the length of this post, but this has been
bugging me for a week or so.

Consider a table with a primary key pk and a list of attributes a, b
and c:

  Table t
    pk  a b c
  ----- - - -
      1 1 1 1
      2 1 2 3
      :  etc :
   9998 1 1 1
   9999 2 1 2

If the table is large but the range of possible values for the
attributes small, aggregate queries such as the following will be very
slow:

  1. select min(a) from t;
  2. select count(*) from t where a = 1 and b = 3;
  3. select sum(b) from t where a = 1 and c > 3;
  4. select avg(b) from t where c = 1;

One way of speeding up these types of queries is to have a table
summarizing table t:

  Table t_sum
  a b c  cnt
  - - -  ---
  1 1 1    2   (this is from pk 1 and 9998 having the same a, b, c)
  1 2 3    1
  2 1 2    1
  :  etc   :

with a primary key of (a, b, c) and an integer cnt counting the number
of times a particular combination of (a, b, c) occurs in table t.

The queries making use of this might be rewritten as:

  1. select min(a) from t_sum;  -- same as above,
                                -- but we've fewer rows to scan
  2. select cnt from t_sum where a = 1 and b = 3;
  3. select sum(b * cnt) as sum from t_sum where a = 1 and c > 3;
  4. select sum(b * cnt) / sum(cnt) as avg from t_sum where c = 1;

(CAVEAT/BUG:
  1. query 2 must return cnt = 0 if the combination doesn't exist
     in t_sum
  2. rows with cnt = 0 will have to be deleted immediately, or
     select min(foo) might return the wrong result)

Now, t_sum can be automatically updated by triggers/rules (I'll get
into this later).

It might seem a bit pointless, but I've made the example very generic
to show that this _is_ a generic class of problem. Specific examples
might include:

  select count(*) from books where category = x;
  select count(*) from articles where category = x and author = y;

My point is that it would be nice if there were an easier mechanism, a
la CREATE VIEW, as a shortcut (and more!) for 'select x from y where
z = a;'. The syntax might look like:

  CREATE AGGREGATE INDEX t1_sum ON t1 (a, b, c);

which would create the implicit triggers and table. The hard part is
for postgres to have a rule rewrite system capable of converting the
queries. (Perhaps we'll get this when the views-with-aggregate-columns
bug is fixed?)

Of course, the performance gain can be achieved if you manually rewrite
your queries to take advantage of the summary table (or aggregate
index, similar to a normal index speeding up ranged lookups).

And, oh, the rules/triggers: (I used these, and they work great for
some of the tests I did, but I haven't fully debugged them for all
cases, and they definitely have bugs 1 & 2 described above.)
-- 8<------------------------- Cut here -------------------------

CREATE TABLE t_sum (
    a   INTEGER,
    b   INTEGER,
    c   INTEGER,
    cnt INTEGER,
    PRIMARY KEY (a, b, c)
);

-- if the combination already exists, bump its count instead of
-- inserting a duplicate key
CREATE RULE t_sum_add_rule AS
    ON INSERT TO t_sum
    WHERE EXISTS (SELECT cnt FROM t_sum
                  WHERE a = new.a AND b = new.b AND c = new.c)
    DO INSTEAD
        UPDATE t_sum SET cnt = cnt + 1
        WHERE a = new.a AND b = new.b AND c = new.c;

CREATE RULE t_insert_rule AS
    ON INSERT TO t
    DO INSERT INTO t_sum VALUES (new.a, new.b, new.c, 1);

CREATE FUNCTION "t_sum_upd" () RETURNS opaque AS '
    begin
        update t_sum set cnt = cnt - 1
            where t_sum.a = old.a and t_sum.b = old.b
              and t_sum.c = old.c;
        -- count of 1; t_sum_add_rule turns this into an update
        -- if the new combination already exists
        insert into t_sum values (new.a, new.b, new.c, 1);
        return new;
    end;' LANGUAGE 'plpgsql';

CREATE TRIGGER "t_upd" BEFORE UPDATE ON "t"
    FOR EACH ROW EXECUTE PROCEDURE "t_sum_upd" ();

CREATE FUNCTION "t_sum_del" () RETURNS opaque AS '
    begin
        update t_sum set cnt = cnt - 1
            where t_sum.a = old.a and t_sum.b = old.b
              and t_sum.c = old.c;
        return old;
    end;' LANGUAGE 'plpgsql';

CREATE TRIGGER "t_del" BEFORE DELETE ON "t"
    FOR EACH ROW EXECUTE PROCEDURE "t_sum_del" ();

-- 8<------------------------- Cut here -------------------------

P.S. This post was inspired when someone mentioned on the list that a
separate counter might be kept by postgres to speed up some aggregate
functions like select count(*) from t;

P.P.S. Curious: how do the commercial RDBMSs handle this:

  select count(*) from people where gender = 'm';

when people contains one million rows and the gender distribution is
NEARLY 50% male/female?
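For the P.P.S. case, the same technique applies directly (a sketch;
people_sum and its columns are hypothetical, and the maintenance
triggers would mirror the ones above):

  CREATE TABLE people_sum (
      gender CHAR(1) PRIMARY KEY,
      cnt    INTEGER
  );

  -- the million-row scan
  --   select count(*) from people where gender = 'm';
  -- then collapses to a single-row primary-key lookup:
  SELECT cnt FROM people_sum WHERE gender = 'm';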
Re: [SQL] Aggregate functions, fast! (long)
On Wed, Aug 09, 2000 at 02:53:45PM +0800, Ang Chin Han wrote:
> The queries making use of this might be rewritten as:
> 1. select min(a) from t_sum;  -- same as above,
>                               -- but we've fewer rows to scan
> 2. select cnt from t_sum where a = 1 and b = 3;

Sorry, bugfix:

  select sum(cnt) from t_sum where a = 1 and b = 3;

since c is not referenced.

> 3. select sum(b * cnt) as sum from t_sum where a = 1 and c > 3;
> 4. select sum(b * cnt) / sum(cnt) as avg from t_sum where c = 1;
Re: [SQL] Rules aren't doing what I expect
On Wed, Aug 09, 2000 at 12:04:13PM -0400, Mark Volpe wrote:
> I have a table with a trigger that can potentially modify a row
> before it gets inserted or updated:

[snip]

> I have another table that tracks changes in the first table with
> rules:

AFAIK, rules are rewritten first, before triggers are invoked, so your
rules are getting the values before your trigger changes them.

> The t1_log table doesn't show what was actually inserted into t1!
> Are there any changes I can make to the logic above so that t1_log
> can show the correct value?

Either somehow rewrite your trigger as a rule, or stick another trigger
on the log table to change the value before it gets into your log
table. You might be able to reuse your trigger function, I think: just
point the second trigger at that function. A bit inefficient, since it
gets called twice.
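A sketch of that second-trigger workaround (names hypothetical;
t1_fixup_func stands for whatever trigger function t1 already uses):

  -- apply the same transformation to rows entering the log table,
  -- so t1_log ends up matching what the t1 trigger actually stored
  CREATE TRIGGER t1_log_fixup BEFORE INSERT ON t1_log
      FOR EACH ROW EXECUTE PROCEDURE t1_fixup_func();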