Re: [HACKERS] [PERFORM] not using index for select min(...)

2003-02-02 Thread Kevin Brown
Tom Lane wrote:
 Josh Berkus [EMAIL PROTECTED] writes:
  For example, the following query is not possible to
  work around in PostgreSQL:
 
  select teams_desc.team_id, team_name, team_code, notes,
  min(teams_tree.treeno) as lnode, max(teams_tree.treeno) as rnode,
  parent.team_id as parent_id, count(*)/2 as tlevel
  from teams_desc JOIN teams_tree USING (team_id)
  join teams_tree parent ON parent.treeno < teams_tree.treeno
  join teams_tree parents on parents.treeno < teams_tree.treeno
  WHERE parent.treeno = (SELECT max(p1.treeno) from teams_tree p1
  where p1.treeno < teams_tree.treeno
  and exists (select treeno from teams_tree p2
  where p2.treeno > teams_tree.treeno
  and p2.team_id = p1.team_id))
  AND EXISTS (select parents2.team_id from teams_tree parents2
  where parents2.treeno > teams_tree.treeno
  AND parents2.team_id = parents.team_id)
  group by teams_desc.team_id, team_name, team_code, notes, parent.team_id;
 
  While one would hardly expect the above query to be fast, it is disappointing
  that it takes about 8-10 times as long to execute on PostgreSQL as on MSSQL, 
  since MSSQL seems to be able to use indexes to evaluate all three MIN() and 
  MAX() expressions.
 
 I think you are leaping to conclusions about why there's a speed
 difference.  Or maybe I'm too dumb to see how an index could be used
 to speed these min/max operations --- but I don't see that one would
 be useful.  Certainly not an index on treeno alone.  Would you care to
 explain exactly how it's done?

Intuitively, it seems that an index on treeno is exactly what would
make the difference -- but min() and max() have to be smart enough to
use it when necessary.

I have a strong suspicion that min() and max() in MSSQL and other
databases are integrated into the parser, planner, and executor
directly.  It's the only way I can think of that would make it
possible for those functions to make use of indexes and other
advantages to the fullest extent possible.  For instance, in the above
query, the max() operation in the subselect would tell the planner and
executor to use the index on p1.treeno for two comparisons
simultaneously: p1.treeno < teams_tree.treeno and max(p1.treeno).
That means that the executor would descend the tree of the (btree)
index and instead of just comparing whether a branch is less than
teams_tree.treeno and following *all* of the branches that qualify, it
would follow the *largest* branch that qualified and nothing else.
That's a very significant optimization of the search, because instead
of eliminating an average of 50% of the branches to follow at each
node, it eliminates all but one.  But it's not something that a naive
aggregate function would be able to do: the min() and max() aggregates
would (I expect) have to become first class objects in the parser,
planner, and executor just as the WHERE clause and its conditions are.
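
To make that concrete: the plan shape I'm describing is the same one
you can already coax out of PostgreSQL by hand with the ORDER BY/LIMIT
idiom.  A sketch, with the literal 42 standing in for the correlated
teams_tree.treeno value:

-- equivalent of SELECT max(p1.treeno) FROM teams_tree p1
--   WHERE p1.treeno < 42, but planned as a backward scan of
-- the treeno btree that stops after the first matching row:
SELECT p1.treeno
FROM teams_tree p1
WHERE p1.treeno < 42
ORDER BY p1.treeno DESC
LIMIT 1;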

There may be other aggregate functions that can make use of indexes to
the same extent that min() and max() should be able to, but I don't
know what they are offhand, and I certainly doubt that they would be
used nearly as often as min() and max().


Even with our type system, I'd think that min() and max() would be
relatively straightforward as first class objects (well, as
straightforward as any first class object that gets implemented in all
three stages, at any rate!): they work efficiently (that is, can use
an index scan) when the column in question has a btree index on it,
and fall back to sequential scans (and use the appropriate operator,
< or >) when the column in question doesn't.  It might even be
reasonable to allow a type to overload these functions so that the
planner and executor use the type-provided functions when available
(with the limitation that such type-provided functions would always
require a sequential scan as they do now) and fall back to the builtin
ones when the type doesn't provide them.  I imagine this might
complicate the parser, planner, and executor quite a bit, however.
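
For reference, min() and max() today are ordinary extensible
aggregates built out of transition functions, which is exactly why
the planner treats them as black boxes.  A hand-rolled equivalent
over int4 (my_min is a made-up name; int4smaller is the built-in
smaller-of-two function):

CREATE AGGREGATE my_min (
    BASETYPE = int4,
    SFUNC = int4smaller,
    STYPE = int4
);

-- goes through the same one-row-at-a-time machinery as min():
SELECT my_min(treeno) FROM teams_tree;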

So the interesting question that arises from the above is: are there
any types that define a min() and max() but which *do not* define <
and >?  I can't think of such types myself but can imagine that some
esoteric data types might qualify.  For the purposes of optimizing the
common case, however, such esoteric types could easily be ignored, but
it's for their sake that it would be useful to be able to use a
type-defined function in place of min() or max().



-- 
Kevin Brown   [EMAIL PROTECTED]




Re: [HACKERS] [PERFORM] not using index for select min(...)

2003-02-02 Thread Josh Berkus
Tom,

 In the end, the only reasonable way to handle this kind of thing is
 to teach the query planner about it.  Considering the small number
 of cases that are usefully optimizable (basically only MIN and MAX
 on a single table without any WHERE or GROUP clauses), and the ready
 availability of a SQL-level workaround, it strikes me as a very
 low-priority TODO item.

Low priority for you, Tom.  For some of us, it's one of the three 
highest-priority bugs in PostgreSQL.

I constantly try to sell my clients, and potential clients, on PostgreSQL.  
And the two things that trip me up the most frequently are lack of 
replication and our dog-slow aggregates.  I can usually sell Postgres on our 
strong points, but the aggregate issue is *always* a problem.   And the slow 
aggregate problem comes up about twice a week on Performance and three times 
a week on SQL.

Regardless of the technical reason, among MSSQL, Oracle, MySQL and PostgreSQL, 
we have the slowest performing simple aggregates.  It's all very well to 
explain that this is due to our system of extensible aggregates, but if a 
potential Postgres developer doesn't want to create custom aggregates, yet 
does want to use MIN() in a correlated subquery, then they will go to a 
different RDBMS.

As I said before, I'm absolutely thrilled that you came up with a solution for 
COUNT(*) ... GROUP BY queries through Hash Aggregates.   That's half the 
picture, now we need a way to speed up MIN() and MAX() for simple one-column 
expressions.   While there is a workaround using ORDER BY/LIMIT, this 
doesn't work for correlated subqueries or if one wants to evaluate the result 
of MAX() in the query.  For example, the following query is not possible to 
workaround in PostgreSQL:

select teams_desc.team_id, team_name, team_code, notes,
min(teams_tree.treeno) as lnode, max(teams_tree.treeno) as rnode,
parent.team_id as parent_id, count(*)/2 as tlevel
from teams_desc JOIN teams_tree USING (team_id)
join teams_tree parent ON parent.treeno < teams_tree.treeno
join teams_tree parents on parents.treeno < teams_tree.treeno
WHERE parent.treeno = (SELECT max(p1.treeno) from teams_tree p1
where p1.treeno < teams_tree.treeno
and exists (select treeno from teams_tree p2
where p2.treeno > teams_tree.treeno
and p2.team_id = p1.team_id))
AND EXISTS (select parents2.team_id from teams_tree parents2
where parents2.treeno > teams_tree.treeno
AND parents2.team_id = parents.team_id)
group by teams_desc.team_id, team_name, team_code, notes, parent.team_id;

While one would hardly expect the above query to be fast, it is disappointing 
that it takes about 8-10 times as long to execute on PostgreSQL as on MSSQL, 
since MSSQL seems to be able to use indexes to evaluate all three MIN() and 
MAX() expressions.

Further, relegating such a common query function to a Postgres-specific 
workaround hardly upholds our project's dedication to standards.   The fact 
that we are telling new users to use non-SQL-compliant code for a query 
type present in 90% of databases bothers me every single time I give a newbie 
that advice.

It still seems to me that if a query's WHERE expression can be evaluated using 
an index, then any related MIN() or MAX() expression should be evaluable 
using an index.   That is, if you are selecting:
SELECT MAX(team_id) FROM teams WHERE team_id BETWEEN 100 and 200;
... with an index on team_id, then this entire query should be able to return 
through an index scan.   We've discussed the particular planner problems this 
presents for PostgreSQL, but I still believe that these are solvable ... and 
moreover, that we *need* to solve them if we're going to be competitive with 
other SQL RDBMSes.
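
Just to spell out the workaround form I keep having to recommend
instead -- index-friendly today, but exactly the non-standard idiom I
object to teaching:

SELECT team_id FROM teams
WHERE team_id BETWEEN 100 AND 200
ORDER BY team_id DESC
LIMIT 1;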

I do realize that it's my job to find something to do about this issue since 
I'm the one so worked up about it.   What I'm concerned about is the 
possibility of having any idea or fix I come up with dismissed out of hand 
because it's a low-priority todo.   Please add up the questions and 
complaints of the users on SQL, NOVICE, and PERFORMANCE ... I know you read 
them.

Thanks for reading, Tom.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco




Re: [HACKERS] [PERFORM] not using index for select min(...)

2003-02-02 Thread Tom Lane
Josh Berkus [EMAIL PROTECTED] writes:
 For example, the following query is not possible to
 work around in PostgreSQL:

 select teams_desc.team_id, team_name, team_code, notes,
 min(teams_tree.treeno) as lnode, max(teams_tree.treeno) as rnode,
 parent.team_id as parent_id, count(*)/2 as tlevel
 from teams_desc JOIN teams_tree USING (team_id)
 join teams_tree parent ON parent.treeno < teams_tree.treeno
 join teams_tree parents on parents.treeno < teams_tree.treeno
 WHERE parent.treeno = (SELECT max(p1.treeno) from teams_tree p1
   where p1.treeno < teams_tree.treeno
   and exists (select treeno from teams_tree p2
   where p2.treeno > teams_tree.treeno
   and p2.team_id = p1.team_id))
 AND EXISTS (select parents2.team_id from teams_tree parents2
   where parents2.treeno > teams_tree.treeno
   AND parents2.team_id = parents.team_id)
 group by teams_desc.team_id, team_name, team_code, notes, parent.team_id;

 While one would hardly expect the above query to be fast, it is disappointing
 that it takes about 8-10 times as long to execute on PostgreSQL as on MSSQL, 
 since MSSQL seems to be able to use indexes to evaluate all three MIN() and 
 MAX() expressions.

I think you are leaping to conclusions about why there's a speed
difference.  Or maybe I'm too dumb to see how an index could be used
to speed these min/max operations --- but I don't see that one would
be useful.  Certainly not an index on treeno alone.  Would you care to
explain exactly how it's done?

regards, tom lane




Re: [HACKERS] [PERFORM] not using index for select min(...)

2003-02-02 Thread Josh Berkus
Tom,

 I think you are leaping to conclusions about why there's a speed
 difference.  Or maybe I'm too dumb to see how an index could be used
 to speed these min/max operations --- but I don't see that one would
 be useful.  Certainly not an index on treeno alone.  Would you care to
 explain exactly how it's done?

If I knew that, I'd have proposed a patch already, yes?

I'm working on it.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco




Re: [HACKERS] [PERFORM] not using index for select min(...)

2003-02-01 Thread Bruno Wolff III
On Sat, Feb 01, 2003 at 15:21:24 -0500,
  Greg Stark [EMAIL PROTECTED] wrote:
 Tom Lane [EMAIL PROTECTED] writes:
 
 That just means you need some way for aggregates to declare which records they
 need.  The only values that seem like they would be useful would be first
 record, last record, and all records.  Possibly something like all-nonnull
 records for things like count(), but that might be harder.

I don't see how this is going to be all that useful for aggregates in general.
min and max are special and it is unlikely that you are going to get much
speed up for general aggregate functions. For the case where you really
only need to scan a part of the data (say, skipping nulls when nearly all
of the entries are null), a DBA can add an appropriate partial index and
where clause. This will probably happen infrequently enough that adding
special checks for it isn't going to pay off.
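
A sketch of what I mean, assuming a hypothetical table tab1 whose
col2 is almost always null:

CREATE INDEX tab1_col2_part ON tab1 (col2) WHERE col2 IS NOT NULL;

-- the matching where clause lets the planner use the small partial
-- index instead of scanning the whole table:
SELECT sum(col2) FROM tab1 WHERE col2 IS NOT NULL;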

For min and max, it seems to me that putting special code to detect these
functions and replace them with equivalent subselects in the case where
an index exists (since a sort is worse than a linear scan) is a possible
long term solution to make porting easier.
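
The rewrite would turn, say,

SELECT min(treeno) FROM teams_tree;

into something like the following (min and max ignore nulls, hence
the extra condition):

SELECT treeno FROM teams_tree
WHERE treeno IS NOT NULL
ORDER BY treeno ASC
LIMIT 1;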

In the short term education is the answer. At least the documentation of the
min and max functions, the FAQ, and the performance-tips section should
recommend the alternative form when there is an appropriate index.




Re: [HACKERS] [PERFORM] not using index for select min(...)

2003-01-31 Thread Sean Chittenden
  I have a table which is very large (~65K rows). I have
  a column in it which is indexed, and I wish to use for
  a join. I'm finding that I'm using a sequential scan
  for this when selecting a MIN.
 
 Due to Postgres' system of extensible aggregates (i.e. you can write
 your own aggregates), all aggregates will trigger a Seq Scan in a
 query.  It's a known drawback that nobody has yet found a good way
 around.

I've spent some time in the past thinking about this, and here's the
best idea that I can come up with:

Part one: set up an ALTER TABLE directive that allows for the
addition/removal of cached aggregates.  Ex:

  ALTER TABLE tab1 ADD AGGREGATE CACHE ON count(*);
  ALTER TABLE tab1 ADD AGGREGATE CACHE ON sum(col2);
  ALTER TABLE tab1 ADD AGGREGATE CACHE ON sum(col2) WHERE col2 > 100;
  ALTER TABLE tab1 ADD AGGREGATE CACHE ON sum(col2) WHERE col2 <= 100;


Which would translate into some kind of action on a pg_aggregate_cache
catalog:

aggregate_cache_oid   OID   -- OID for the aggregate cache
aggregate_table_oid   OID   -- table OID
ins_aggfn_oid OID   -- aggregate function id for inserts
upd_aggfn_oid OID   -- aggregate function id for updates
del_aggfn_oid OID   -- aggregate function id for deletes
cache_value   INT   -- the value of the cache
private_data  INT[4]  -- temporary data space for data needed
                      -- to calculate cache_value (four is just
                      -- a guesstimate for how much space would
                      -- be necessary to calculate the most
                      -- complex of aggregates)
where_clause  ???     -- I haven't the faintest idea how to
                      -- express some kind of conditional like this


Part two: set up a RULE or TRIGGER that runs on INSERT, UPDATE, or
DELETE.  For the count(*) exercise, the ON UPDATE would be a no-op.
For ON INSERT, the count(*) rule would have to do something like:

UPDATE pg_catalog.pg_aggregate_cache SET cache_value = (cache_value + 1)
   WHERE aggregate_cache_oid = 111;


For an avg(col2) aggregate cache (a plain sum(col2) cache only needs an
addition), the math is a little more complex, but I think it's quite
reasonable given that it obviates a full table scan.  For an insert, with
the current row count kept in private_data[0] (which would be incremented
at the same time):

UPDATE pg_catalog.pg_aggregate_cache SET cache_value =
   ((cache_value * private_data[0] + NEW.col2) / (private_data[0] + 1))
   WHERE aggregate_cache_oid = 112;
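
As a rough proof of concept, the count(*) case can be approximated
today with an ordinary table and a plpgsql trigger, no new catalog
required (all of the names below are made up; tab1 is assumed to
exist and the plpgsql language to be installed):

CREATE TABLE tab1_agg_cache (
    agg_name    text PRIMARY KEY,
    cache_value bigint NOT NULL
);
INSERT INTO tab1_agg_cache VALUES ('count_star', 0);

CREATE FUNCTION tab1_count_cache() RETURNS trigger AS '
BEGIN
    IF TG_OP = ''INSERT'' THEN
        UPDATE tab1_agg_cache SET cache_value = cache_value + 1
            WHERE agg_name = ''count_star'';
    ELSIF TG_OP = ''DELETE'' THEN
        UPDATE tab1_agg_cache SET cache_value = cache_value - 1
            WHERE agg_name = ''count_star'';
    END IF;
    RETURN NULL;  -- return value is ignored for AFTER triggers
END;
' LANGUAGE 'plpgsql';

CREATE TRIGGER tab1_count_trig AFTER INSERT OR DELETE ON tab1
    FOR EACH ROW EXECUTE PROCEDURE tab1_count_cache();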


Now, there are some obvious problems:

1) avg requires a floating point return value, therefore an INT may
   not be an appropriate data type for cache_value or private_data.

2) aggregate caching wouldn't speed up anything but full table
   aggregates or regions of a column that are frequently needed.

3) all of the existing aggregates would have to be updated to include
   an insert, update, delete procedure (total of 60 aggregates, but
   only 7 by name).

4) the planner would have to be taught how to use/return values from
   the cache.

5) Each aggregate type makes use of the private_data column
   differently.  It's up to the cached aggregate function authors to
   not jumble up their private data space.

6) I don't know of a way to handle mixing of floating point numbers
   and integers.  That said, there's some margin of error that could
   creep into the floating point calculations such as avg.


And some benefits:

1) You only get caching for aggregates that you frequently use
   (sum(col2), count(*), etc.).

2) Aggregate function authors can write their own caching routines.

3) For tens of millions of rows, it can be very time consuming to
   sum() fifty million rows, but it's easy to amortize the cost of
   updating the cache on insert, update, delete over the course of a
   month.

4) If an aggregate cache definition isn't set up, it should be easy for
   the planner to fall back to a full table scan, as it does now.


This definitely would be a performance boost and something that would
only be taken advantage of by DBAs who are intentionally performance
tuning their databases, but for those who do, it could be a massive
win.  Thoughts?  -sc

-- 
Sean Chittenden




Re: [HACKERS] [PERFORM] not using index for select min(...)

2003-01-31 Thread Tom Lane
Sean Chittenden [EMAIL PROTECTED] writes:
 Now, there are some obvious problems:

You missed the real reason why this will never happen: it completely
kills any prospect of concurrent updates.  If transaction A has issued
an update on some row, and gone and modified the relevant aggregate
cache entries, what happens when transaction B wants to update another
row?  It has to wait for A to commit or not, so it knows whether to
believe A's changes to the aggregate cache entries.

For some aggregates you could imagine an 'undo' operator to allow
A's updates to be retroactively removed even after B has applied its
changes.  But that doesn't work very well in general.  And in any case,
you'd have to provide serialization interlocks on physical access to
each of the aggregate cache entries.  That bottleneck applied to every
update would be likely to negate any possible benefit from using the
cached values.
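
Concretely, in terms of the proposal's own (hypothetical) catalog,
two plain sessions show the bottleneck:

-- session A, after updating some row of tab1:
BEGIN;
UPDATE pg_aggregate_cache SET cache_value = cache_value + 1
    WHERE aggregate_cache_oid = 111;

-- session B updates a *different* row of tab1, but its cache
-- maintenance hits the same cache row and blocks here until A
-- commits or rolls back:
BEGIN;
UPDATE pg_aggregate_cache SET cache_value = cache_value + 1
    WHERE aggregate_cache_oid = 111;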

regards, tom lane




Re: [HACKERS] [PERFORM] not using index for select min(...)

2003-01-31 Thread Sean Chittenden
  Now, there are some obvious problems:
 
 You missed the real reason why this will never happen: it completely
 kills any prospect of concurrent updates.  If transaction A has
 issued an update on some row, and gone and modified the relevant
 aggregate cache entries, what happens when transaction B wants to
 update another row?  It has to wait for A to commit or not, so it
 knows whether to believe A's changes to the aggregate cache entries.

I never claimed it was perfect, :) but it'd be no worse than a
table lock.  For the types of applications that this would be of
biggest use to, there would likely be more reads than writes, and it
wouldn't be as bad as one could imagine.  A few examples:

# No contention
Transaction A begins
Transaction A updates tab1
Transaction B begins
Transaction B updates tab1
Transaction B commits
Transaction A commits

# Contention
Transaction A begins
Transaction A updates tab1
Transaction B begins
Transaction B updates tab1
Transaction A commits
Transaction B commits

This is just about the only case that I can see where there would be
contention.  In this case, transaction B would have to re-run its
trigger serially.  In the worst-case scenario:

Transaction A begins
Transaction A updates tab1
Transaction B begins
Transaction B updates tab1
Transaction A commits
Transaction B selects
Transaction B updates tab1 again
Transaction B commits

In my journals or books I haven't found any examples of a
transaction-based cache that'd work any better than this.  It ain't
perfect, but, AFAICT, it's as good as it's going to get.  The only
thing that I could think of that would add some efficiency in this
case would be to have transaction B read through the committed changes
from a log file.  After a threshold, it could be more efficient than
having transaction B re-run its queries.

Like I said, it ain't perfect, but what would be a better solution?
::shrug:: Even OODB's with stats agents have this problem (though
their overhead for doing this kind of work is much much lower).  -sc

-- 
Sean Chittenden
