Re: [SQL] iceberg queries

2003-02-04 Thread Christoph Haller

 Does PostgreSQL optimizer handle iceberg queries well?

What do you mean by iceberg query ?
I've never heard this term.

Regards, Christoph



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [SQL] iceberg queries

2003-02-04 Thread Wei Weng
It is a query that looks like

SELECT target1, target2... targetn, SUN(t.qty)
FROM Table t
GROUP BY target1
HAVING SUM(t.qty)=10

You can replace SUM(t.qty)=10 with other aggregate constraints.




- Original Message - 
From: Christoph Haller 
To: [EMAIL PROTECTED] 
Cc: [EMAIL PROTECTED] 
Sent: Tuesday, February 04, 2003 3:39 AM
Subject: Re: [SQL] iceberg queries



 Does PostgreSQL optimizer handle iceberg queries well?

What do you mean by iceberg query ?
I've never heard this term.

Regards, Christoph


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [SQL] iceberg queries

2003-02-04 Thread Bruno Wolff III
On Tue, Feb 04, 2003 at 09:08:56 -0500,
  Wei Weng [EMAIL PROTECTED] wrote:
 It is a query that looks like
 
 SELECT target1, target2... targetn, SUN(t.qty)
 FROM Table t
 GROUP BY target1
 HAVING SUM(t.qty)=10
 
 You can replace SUM(t.qty)=10 with other aggregate constraints.

There were some recent changes to allow groups to use hashes that may
help for queries like this. This can save a sort step or using an
index scan.

In theory you might be able to make some other speed ups by taking advantage
of properties of specific aggregate functions (in particular that several
common ones monotonicly increase or decrease as they are being calculated)
and if doing an index scan on the fields used for group you might be able to
skip a lot of rows. I expect that this situation would come up pretty rarely
though.

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Re: [SQL] iceberg queries

2003-02-04 Thread Jan Wieck
Christoph Haller wrote:
 
 
  Does PostgreSQL optimizer handle iceberg queries well?
 
 What do you mean by iceberg query ?
 I've never heard this term.

Iceberg queries compute one or more aggregate functions to find
aggregate values above a specified threshold. A typical iceberg query
would be

SELECT a, count(a)
FROM tab
GROUP BY a
HAVING count(a) = 100;

This base form can easily be made more complicated by doing self joins
and the like. This type of query is often found in market research, data
warehousing and search engines.

As to the original question, if an index is available that returns the
rows in the sort order of the GROUP BY clause, PostgreSQL defaults to an
index scan, otherwise it will do a sort of the rows matching an optional
WHERE clause. This sorted set is then grouped and aggregated and
filtered by the HAVING clause after aggregation.

It is well known that this approach does not scale well for large data
sets. But in contrast to a specialized statistical software, PostgreSQL
has to answer the query precisely. So sampling or bucket methods aren't
options.


Jan

-- 
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [SQL] iceberg queries

2003-02-04 Thread Tom Lane
Jan Wieck [EMAIL PROTECTED] writes:
 As to the original question, if an index is available that returns the
 rows in the sort order of the GROUP BY clause, PostgreSQL defaults to an
 index scan, otherwise it will do a sort of the rows matching an optional
 WHERE clause. This sorted set is then grouped and aggregated and
 filtered by the HAVING clause after aggregation.

Note that as of 7.4, the planner will probably pick hashed aggregation
rather than sort-based aggregation, if it can predict that the number
of groups will not be too large for a hash table to fit in memory.
This means we can do a seqscan (or, perhaps, an indexscan to match
WHERE conditions) and avoid sorting.  So I expect performance on this
type of query to be a good deal better in 7.4.  There are a few
benchmark comparisons in the pghackers archives a couple months back.

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [SQL] iceberg queries

2003-02-04 Thread Jan Wieck
Tom Lane wrote:
 
 Jan Wieck [EMAIL PROTECTED] writes:
  As to the original question, if an index is available that returns the
  rows in the sort order of the GROUP BY clause, PostgreSQL defaults to an
  index scan, otherwise it will do a sort of the rows matching an optional
  WHERE clause. This sorted set is then grouped and aggregated and
  filtered by the HAVING clause after aggregation.
 
 Note that as of 7.4, the planner will probably pick hashed aggregation
 rather than sort-based aggregation, if it can predict that the number
 of groups will not be too large for a hash table to fit in memory.
 This means we can do a seqscan (or, perhaps, an indexscan to match
 WHERE conditions) and avoid sorting.  So I expect performance on this
 type of query to be a good deal better in 7.4.  There are a few
 benchmark comparisons in the pghackers archives a couple months back.
 
 regards, tom lane

If it can predict.

I guess the question was asked because one expects performance problems.
From that I conclude that the amount of data is significant in this
particular case. That does not necessarily but usually mean a large
number of unique groups.


Jan

-- 
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html