Re: [SQL] LEAST and GREATEST functions?

2003-07-01 Thread Ang Chin Han
Greg Stark wrote:

> MAX and MIN are single-parameter aggregate functions. LEAST and
> GREATEST are two-parameter (though in postgres they could be defined
> for 3 and more parameters) scalar functions.
If LEAST and GREATEST can accept any number of parameters, wouldn't
it make more sense to implement them the way COALESCE works, rather
than defining a function for them? That way we wouldn't need to
define all the various functions for different types.

e.g.

SELECT greatest(a, b, c) FROM bar;

becomes

SELECT greatest(a, greatest(b, c)) from bar;

becomes

SELECT
  CASE WHEN b < c
THEN
  CASE WHEN c < a
THEN a
ELSE c
  END
ELSE
  CASE WHEN b < a
THEN a
ELSE b
  END
  END
FROM bar;
From the docs:

COALESCE and NULLIF are just shorthand for CASE expressions. They are 
actually converted into CASE expressions at a very early stage of 
processing, and subsequent processing thinks it is dealing with CASE. 
Thus an incorrect COALESCE or NULLIF usage may draw an error message 
that refers to CASE.
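
In other words (just to illustrate the doc passage above, not the
parser's literal output), a two-argument COALESCE behaves like:

SELECT
  CASE WHEN a IS NOT NULL
    THEN a
    ELSE b
  END
FROM bar;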



Re: [SQL] LEAST and GREATEST functions?

2003-07-01 Thread Ang Chin Han
Tom Lane wrote:

> But COALESCE is a special feature hard-wired into the parser.  There's
> no free lunch --- you pay for your extensibility somewhere.
That's what I'm suggesting: hard-wiring LEAST and GREATEST into the 
parser. 7.5, maybe?

The question is: is it worth hard-wiring them vs. providing
functions? (time passes) They don't seem to be in SQL92. They're in
Oracle, Interbase and MySQL, though, according to Google.

I'd say we need to have LEAST and GREATEST at least somewhere in
contrib (as functions), if not in core, to make the transition from
other RDBMSes to PostgreSQL easier.

A brief test shows that we would incur quite a performance penalty if
it isn't hard-wired (I compared COALESCE with an equivalent SQL
function, coalesce_sql_function).
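
For reference, the SQL function I compared against looked roughly
like this (from memory, so treat it as a sketch):

CREATE FUNCTION coalesce_sql_function(text, text) RETURNS text AS
'SELECT CASE WHEN $1 IS NOT NULL THEN $1 ELSE $2 END'
LANGUAGE 'sql';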



Re: [SQL] LEAST and GREATEST functions?

2003-07-02 Thread Ang Chin Han
Tom Lane wrote:

> create function greatest(anyelement, anyelement) returns anyelement as
> 'select case when $1>$2 then $1 else $2 end' language sql;
Any chance of this making it into 7.4's contrib? Maybe with enough
documentation to make it a tutorial for PostgreSQL's user-defined
functions?

> You do have to create several greatest() functions for different
> numbers of arguments, but not one for each datatype you want to
> handle.
Insignificant, compared with the flexibility.
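
For instance, the three-argument version would presumably just build
on the two-argument one quoted above (a sketch, untested):

create function greatest(anyelement, anyelement, anyelement)
returns anyelement as
'select greatest(greatest($1, $2), $3)' language sql;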

> I have not seen enough requests for a native LEAST/GREATEST
> implementation to make me think we need to do more than this...
> certainly I'd rather spend development effort on general facilities
> like polymorphism and inlining than on creating one-use facilities
> like built-in LEAST/GREATEST.
Nice. It would speed up our current functions too. Thanks, developers, 
esp. Tom and Joe for this!

Wow, fewer reasons to use VIEWs when CREATE FUNCTION can be just as
fast but more flexible. Can't wait. IMHO, the advocacy people can and
should be promoting this (functions returning sets, and how they can
be used) as a killer feature of 7.3 and 7.4. I know I was pretty
happy to discover that gem lurking in the documentation in 7.3.



Re: [SQL] Non-Blocking Locks (i.e. Oracle NOWAIT)

2003-07-15 Thread Ang Chin Han
Christoph Haller wrote:

> PostgreSQL 7.3.2
> T1: begin; select * from foo for update;
> T2: set STATEMENT_TIMEOUT = 1000; -- milliseconds
Setting it to 1 (ms) seems to emulate the NOWAIT condition better.
It can't be set to 0, though, since 0 disables the timeout entirely.

> T2: select * from foo for update;
> T2: ERROR:  Query was cancelled.
How picky are we about borrowing syntax and idiosyncrasies from other
DBs? Since we can closely emulate Oracle's behaviour on SELECT ... FOR
UPDATE NOWAIT, should we just add NOWAIT to the next
non-feature-frozen version of PostgreSQL?
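
Put together, the emulation would look something like this (a sketch;
statement_timeout is per-session, so remember to reset it):

SET STATEMENT_TIMEOUT = 1;    -- 1 ms is the minimum; 0 disables it
SELECT * FROM foo FOR UPDATE; -- fails quickly if the rows are locked
SET STATEMENT_TIMEOUT = 0;    -- back to no timeout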



[SQL] Hash Join not using hashed index?

2000-06-27 Thread Ang Chin Han

I'm using PostgreSQL 7.0.2.

==
# explain select city.name, country.name from country, city
where city.country_id = country.country_id;
NOTICE:  QUERY PLAN:
 
Hash Join  (cost=8.85..16.76 rows=75 width=18)
  ->  Seq Scan on city  (cost=0.00..1.75 rows=75 width=16)
  ->  Hash  (cost=5.53..5.53 rows=253 width=2)
->  Seq Scan on country  (cost=0.00..5.53 rows=253 width=2) 

EXPLAIN 
# create index country_id_idx on country using hash (country_id);
CREATE
# vacuum analyze;
VACUUM
# explain select city.name, country.name from country, city
where city.country_id = country.country_id;
NOTICE:  QUERY PLAN:
 
Hash Join  (cost=8.85..16.76 rows=75 width=18)
  ->  Seq Scan on city  (cost=0.00..1.75 rows=75 width=16)
  ->  Hash  (cost=5.53..5.53 rows=253 width=2)
->  Seq Scan on country  (cost=0.00..5.53 rows=253 width=2)
 
EXPLAIN
==

Why doesn't PostgreSQL use country_id_idx, instead of rehashing
country_id on the fly?



Re: [SQL] Hash Join not using hashed index?

2000-07-03 Thread Ang Chin Han

On Wed, Jun 28, 2000 at 10:56:17AM -0400, Tom Lane wrote:
> Ang Chin Han <[EMAIL PROTECTED]> writes:
> If it was like that then a hash index wouldn't have been applicable
> anyway; hashes are only good for strict equality checks.  If you want
> something that can do ordering checks you need a btree index.
> 
> (There are good reasons why btree is the default index type ;-))

There _was_ a btree index, before I added the extra hash index:

pintoo=# \d country_pkey
     Index "country_pkey"
 Attribute  |   Type
------------+----------
 country_id | smallint
unique btree (primary key)

> > Original cost est:
> > Hash Join  (cost=8.85..16.76 rows=75 width=18)
> >   -> Seq Scan on city  (cost=0.00..1.75 rows=75 width=16) 
> >   -> Hash  (cost=5.53..5.53 rows=253 width=2)
> >-> Seq Scan on country  (cost=0.00..5.53 rows=253 width=2) 
> 
> > I guess the problem is that country-city is a one-to-many relation,
> > BUT I've more countries than cities (note the # of rows above), thus
> > throwing the planner off...
> 
> Off what?  This looks like a pretty reasonable plan to me, given the
> fairly small table sizes.  Do you have evidence that another plan
> type would be quicker for this problem?

No evidence, but I was hoping that having a prehashed country_id
would speed things up a bit, since the seq scan on country could
be redundant, requiring only a seq scan on city and an index (hash)
lookup on country.


Or maybe this is a related question (just curious):

pintoo=# explain select country_id from country order by country_id;
NOTICE:  QUERY PLAN:
 
Sort  (cost=15.63..15.63 rows=253 width=2)
  ->  Seq Scan on country  (cost=0.00..5.53 rows=253 width=2)

pintoo=# explain select name from country order by name;
NOTICE:  QUERY PLAN:
 
Sort  (cost=15.63..15.63 rows=253 width=12)
  ->  Seq Scan on country  (cost=0.00..5.53 rows=253 width=12)

If there is already a btree index on country_id, why bother
re-sorting, when the rows could be output by traversing the tree?
Comparing with an unindexed column (the plans are identical), we can
see that the index is not used at all.
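
(If I remember the planner toggles correctly, one way to check
whether the index *can* be used here at all:

SET enable_seqscan TO off;
EXPLAIN SELECT country_id FROM country ORDER BY country_id;
SET enable_seqscan TO on;

If an index scan shows up then, the planner simply thought the
explicit sort was cheaper for 253 rows.)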



Re: [SQL] Hash Join not using hashed index?

2000-07-03 Thread Ang Chin Han

On Wed, Jun 28, 2000 at 03:00:04AM -0400, Tom Lane wrote:
> Hash joins don't have anything to do with hash indexes.

> A hash join is a join that makes use of a temporary hashtable
> built on-the-fly *in memory* for that join.

Oh, I see.

> The planner could choose to use an indexscan on a hash index
> as an input for the join, but it'd only be likely to do so
> if there is a restriction clause matching the index.  In your
> example you have only a join WHERE clause.

Well, in my original query there was, but the plan's the same.
Probably the clause wasn't restrictive enough ("and region < n").

Original cost est:
Hash Join  (cost=8.85..16.76 rows=75 width=18)
  ->  Seq Scan on city  (cost=0.00..1.75 rows=75 width=16) 
  ->  Hash  (cost=5.53..5.53 rows=253 width=2)
->  Seq Scan on country  (cost=0.00..5.53 rows=253 width=2) 

I guess the problem is that country-city is a one-to-many relation,
BUT I have more countries than cities (note the # of rows above),
thus throwing the planner off...

OTOH, what's bugging me is that PostgreSQL could have used the
pre-generated hash index rather than rebuilding it on the fly again.

> Plain btree indexes on city.country_id and country.country_id
> might work better --- at least they'd offer the option of
> a merge join without doing explicit sort.

I tried, and it did worse.


Hmmm... I think I'm better off creating a separate table to store the
results, since the underlying tables are seldom updated but the query
is run often. With rules to keep that table updated, too, of course.
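
Roughly this, I mean (table name made up; the maintenance rules are
left as an exercise):

CREATE TABLE city_country AS
SELECT city.name AS city_name, country.name AS country_name
FROM country, city
WHERE city.country_id = country.country_id;

SELECT city_name, country_name FROM city_country;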

(cost is now 1.75, if anyone cares)



[SQL] Functions too slow, even with iscachable?

2000-08-07 Thread Ang Chin Han

I have a query which runs fast:

SELECT passwd FROM ticket WHERE name = ticket2name('test-006kdt') AND
survey_id = ticket2survey_id('test-006kdt');

But it slows down to a crawl when I wrap it in a function:

CREATE FUNCTION ticket2passwd(text) RETURNS text AS
'SELECT passwd FROM ticket WHERE name = ticket2name($1) AND
ticket2survey_id($1)' LANGUAGE 'sql' WITH (iscachable);

which should be shorthand for the first query:

SELECT ticket2passwd('test-006kdt');

Any ideas? Thanks in advance.


ticket2name and ticket2survey_id are both iscachable.


BTW, pg_dump doesn't seem to preserve the iscachable attribute. Bug?



Re: [SQL] Functions too slow, even with iscachable?

2000-08-08 Thread Ang Chin Han

On Mon, Aug 07, 2000 at 10:58:27AM -0400, Tom Lane wrote:

> (I assume the lack of "survey_id =" here is just a cut-and-paste error?)

Yup. :)

> I think what you're getting bit by is that the optimizer doesn't
> recognize "var = function(param)" as being a potential indexscan clause.
> Does EXPLAIN show that the first query is producing an indexscan plan?

It did. I'll try to make up a reproducible test case, if you need
it.

> I have not tried it, but I think you could get around this problem in
> plpgsql, along the lines of
>   tmp1 = ticket2name($1);
>   tmp2 = ticket2survey_id($1);
>   SELECT passwd FROM ticket WHERE name = tmp1 AND survey_id = tmp2;
> since the tmp vars will look like params to the optimizer and "var = param"
> is indexable.

Yup, it did work. Thanks!
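
For the archives, the working version looks roughly like this (a
sketch; I'm guessing survey_id is an integer):

CREATE FUNCTION ticket2passwd(text) RETURNS text AS '
DECLARE
    tmp1 text;
    tmp2 integer;
    result text;
BEGIN
    tmp1 := ticket2name($1);
    tmp2 := ticket2survey_id($1);
    SELECT passwd INTO result FROM ticket
        WHERE name = tmp1 AND survey_id = tmp2;
    RETURN result;
END;' LANGUAGE 'plpgsql' WITH (iscachable);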

> Looks like we need to teach the optimizer that expressions involving
> params can be treated like simple params for the purposes of
> optimization.

That'll be good. Anything that speeds up stored procedures is good:
it encourages people to put processing logic into the RDBMS, where it
should be.



[SQL] Aggregate functions, fast! (long)

2000-08-09 Thread Ang Chin Han

Apologies in advance for the length of this post, but this has
been bugging me for a week or so.

Consider a table with a primary key pk and a list of attributes a, b
and c:

      Table t
  pk  a  b  c
--------------
   1  1  1  1
   2  1  2  3
   :    etc :
9998  1  1  1
9999  2  1  2

If the table is large but the range of possible values for the
attributes is small, aggregate queries such as the following will be
very slow:

  1.  select min(a) from t;
  2.  select count(*) from t where a = 1 and b = 3;
  3.  select sum(b) from t where a = 1 and c > 3;
  4.  select avg(b) from t where c = 1;
  
One way of speeding up these types of queries is to keep a table
summarizing table t:

  Table t_sum
  a  b  c  cnt
--------------
  1  1  1   2   (This is from pk 1 and 9998 having the same a, b, c)
  1  2  3   1
  2  1  2   1
  :  :  :   :

  with a primary key of (a, b, c) and an integer cnt counting
  the number of times a particular combination of (a, b, c)
  occurs in table t
  
The queries making use of these might be rewritten as:
  1.  select min(a) from t_sum; -- same as above,
                                -- but we've fewer rows to scan
  2.  select cnt from t_sum where a = 1 and b = 3;
  3.  select sum(b * cnt) as sum from t_sum where a = 1 and c > 3;
  4.  select sum(b * cnt) / sum(cnt) as avg from t_sum where c = 1;

  (CAVEAT/BUG: 1. query 2 must return cnt = 0 if the combination
                  doesn't exist in t_sum
               2. rows with cnt = 0 will have to be deleted
                  immediately, or select min(foo) might return
                  the wrong result)
  
Now, t_sum can be automatically updated by triggers/rules (I'll
get into this later)

It might seem a bit pointless but I've made the example very
generic to show that this _is_ a generic class of problem.
Specific examples might include:
select count(*) from books where category=x;
select count(*) from articles where category=x and author=y;

My point is that it would be nice to have an easier mechanism, a la
CREATE VIEW, as a shortcut (and more!) for
'select x from y where z = a;'

The syntax might look like:
  CREATE AGGREGATE INDEX t1_sum on t1 (a, b, c);
which would create the implicit triggers and table.

The hard part is for postgres to have a rule rewrite system capable
of converting the queries. (Perhaps we'll get this when the
views-with-aggregate-columns bug is fixed?)

Of course, the performance gain can be achieved if you manually
rewrite your queries to take advantage of the summary table (or
aggregate index, similar to the normal index speeding up ranged
lookups).


And, oh, the rules/triggers: I used these, and they worked great for
some of the tests I did, but I haven't fully debugged them for all
cases; they definitely have bugs 1 & 2 described above.

-- 8<- Cut here -
CREATE TABLE t_sum (
a INTEGER,
b INTEGER,
c INTEGER,
cnt INTEGER,
PRIMARY KEY (a, b, c)
);

CREATE RULE t_sum_add_rule AS ON INSERT TO t_sum WHERE
EXISTS (SELECT cnt FROM t_sum WHERE
a = new.a and b = new.b and c = new.c)
  DO INSTEAD
UPDATE t_sum set cnt = cnt + 1 WHERE
a = new.a and b = new.b and c = new.c;

CREATE RULE t_insert_rule AS ON INSERT TO t
  DO
INSERT INTO t_sum values (new.a, new.b, new.c, 1);

CREATE FUNCTION "t_sum_upd" ( ) RETURNS opaque AS '
begin
update t_sum set cnt = cnt - 1
where t_sum.a = old.a
  and t_sum.b = old.b
  and t_sum.c = old.c;
insert into t_sum values (new.a, new.b, new.c);
return new;
end;' LANGUAGE 'plpgsql';

CREATE TRIGGER "t_upd" BEFORE UPDATE ON "t"
FOR EACH ROW EXECUTE PROCEDURE "t_sum_upd" ();

CREATE FUNCTION "t_sum_del" ( ) RETURNS opaque AS '
begin
update t_sum set cnt = cnt - 1
where t_sum.a = old.a
  and t_sum.b = old.b
  and t_sum.c = old.c;
return old;
end;' LANGUAGE 'plpgsql';
  
CREATE TRIGGER "t_del" BEFORE DELETE ON "t"
FOR EACH ROW EXECUTE PROCEDURE "t_sum_del" ();
-- 8<- Cut here -

P.S. This post was inspired by someone mentioning on the list that
a separate counter might be kept by postgres to speed up some
aggregate functions like select count(*) from t;

P.P.S. I'm curious how the commercial RDBMSes handle this:
select count(*) from people where gender='m';
when people contains one million rows and the gender distribution is
nearly 50% male/female.




Re: [SQL] Aggregate functions, fast! (long)

2000-08-09 Thread Ang Chin Han

On Wed, Aug 09, 2000 at 02:53:45PM +0800, Ang Chin Han wrote:

> The queries making use of these might be rewritten as:
>   1.  select min(a) from t_sum; -- same as above,
>                                 -- but we've fewer rows to scan
>   2.  select cnt from t_sum where a = 1 and b = 3;

Sorry, bugfix:
  select sum(cnt) from t_sum where a = 1 and b = 3;
since c is not referenced.

>   3.  select sum(b * cnt) as sum from t_sum where a = 1 and c > 3;
>   4.  select sum(b * cnt) / sum(cnt) as avg from t_sum where c = 1;



Re: [SQL] Rules aren't doing what I expect

2000-08-09 Thread Ang Chin Han

On Wed, Aug 09, 2000 at 12:04:13PM -0400, Mark Volpe wrote:

> I have a table with a trigger that can potentially modify a row before it gets
> inserted or updated:

[snip]

> I have another table that tracks changes in the first table with rules:

AFAIK, rules are applied first, during query rewriting, before
triggers are invoked, so your rules see the values before your
trigger changes them.
> The t1_log table doesn't show what was actually inserted into t1!
> Are there any changes I can make to the logic above so that t1_log can
> show the correct value?

Either somehow rewrite your trigger as a rule, or attach another
trigger to fix up the value before it gets into your log table. You
might be able to reuse your trigger function, I think: just point a
trigger on the log table at that function. A bit inefficient, since
it gets called twice.
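
Something like this, perhaps (a sketch; t1_modify_row stands in for
whatever your existing trigger function is called):

CREATE TRIGGER t1_log_fixup BEFORE INSERT ON t1_log
    FOR EACH ROW EXECUTE PROCEDURE t1_modify_row();

This only works as-is if t1_log's columns line up with what the
trigger function expects, of course.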