Re: No index maximum? (was Re: [HACKERS] No merge sort?)

2003-03-23 Thread Taral
On Mon, Mar 17, 2003 at 11:23:47AM -0600, Taral wrote:
 Yes, that's exactly it. It's an index _scan_. It should simply be able
 to read the maximum straight from the btree.

Still doesn't work, even with rewritten query. It sort a
Limit(Sort(Index Scan)), with 1333 rows being pulled from the index.

-- 
Taral [EMAIL PROTECTED]
This message is digitally signed. Please PGP encrypt mail to me.
Most parents have better things to do with their time than take care of
their children. -- Me


pgp0.pgp
Description: PGP signature


Re: No index maximum? (was Re: [HACKERS] No merge sort?)

2003-03-17 Thread Taral
On Sat, Mar 15, 2003 at 09:23:28AM -0600, Bruno Wolff III wrote:
 On Fri, Mar 14, 2003 at 14:19:46 -0600,
   Taral [EMAIL PROTECTED] wrote:
  Same setup, different query:
  
  test= explain select max(time) from test where id = '1';
  NOTICE:  QUERY PLAN:
  
  Aggregate  (cost=5084.67..5084.67 rows=1 width=0)
-  Index Scan using idx on test  (cost=0.00..5081.33 rows=1333 width=0)
  
  Since the index is (id, time), why isn't the index being used to
  retrieve the maximum value?
 
 It looks like an index scan is being done.
 
 If the index was on (time, id) instead of (id, time), then you could get
 a further speed up by rewriting the query as:
 select time from test where id = '1' order by time desc limit 1;

Yes, that's exactly it. It's an index _scan_. It should simply be able
to read the maximum straight from the btree.

-- 
Taral [EMAIL PROTECTED]
This message is digitally signed. Please PGP encrypt mail to me.
Most parents have better things to do with their time than take care of
their children. -- Me


pgp0.pgp
Description: PGP signature


Re: No index maximum? (was Re: [HACKERS] No merge sort?)

2003-03-17 Thread Bruno Wolff III
On Mon, Mar 17, 2003 at 11:23:47 -0600,
  Taral [EMAIL PROTECTED] wrote:
 On Sat, Mar 15, 2003 at 09:23:28AM -0600, Bruno Wolff III wrote:
  On Fri, Mar 14, 2003 at 14:19:46 -0600,
Taral [EMAIL PROTECTED] wrote:
   Same setup, different query:
   
   test= explain select max(time) from test where id = '1';
   NOTICE:  QUERY PLAN:
   
   Aggregate  (cost=5084.67..5084.67 rows=1 width=0)
 -  Index Scan using idx on test  (cost=0.00..5081.33 rows=1333 width=0)
   
   Since the index is (id, time), why isn't the index being used to
   retrieve the maximum value?
  
  It looks like an index scan is being done.
  
  If the index was on (time, id) instead of (id, time), then you could get
  a further speed up by rewriting the query as:
  select time from test where id = '1' order by time desc limit 1;
 
 Yes, that's exactly it. It's an index _scan_. It should simply be able
 to read the maximum straight from the btree.

max and min don't use indexes. They are generic aggregate functions and
postgres doesn't have the special knowledge to know that for those
aggregate functions and index can be used. You can get around this
by rewriting the query as I previously indicated.

For more details on why things are this way, search the archives. This
topic comes up a lot.

I was also mistaken about have to switch the index around for this case.
It should work the way you have it (if you rewrite the query).

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly


Re: No index maximum? (was Re: [HACKERS] No merge sort?)

2003-03-15 Thread Bruno Wolff III
On Fri, Mar 14, 2003 at 14:19:46 -0600,
  Taral [EMAIL PROTECTED] wrote:
 Same setup, different query:
 
 test= explain select max(time) from test where id = '1';
 NOTICE:  QUERY PLAN:
 
 Aggregate  (cost=5084.67..5084.67 rows=1 width=0)
   -  Index Scan using idx on test  (cost=0.00..5081.33 rows=1333 width=0)
 
 Since the index is (id, time), why isn't the index being used to
 retrieve the maximum value?

It looks like an index scan is being done.

If the index was on (time, id) instead of (id, time), then you could get
a further speed up by rewriting the query as:
select time from test where id = '1' order by time desc limit 1;

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html


No index maximum? (was Re: [HACKERS] No merge sort?)

2003-03-14 Thread Taral
Same setup, different query:

test= explain select max(time) from test where id = '1';
NOTICE:  QUERY PLAN:

Aggregate  (cost=5084.67..5084.67 rows=1 width=0)
  -  Index Scan using idx on test  (cost=0.00..5081.33 rows=1333 width=0)

Since the index is (id, time), why isn't the index being used to
retrieve the maximum value?

On Thu, Mar 13, 2003 at 03:10:49PM -0600, Taral wrote:
 I have a table test that looks like this:
 
 CREATE TABLE test (
 id BIGINT,
 time INTEGER
 );
 
 There is an index:
 
 CREATE INDEX idx ON test(id, time);
 
 The table has been loaded with 2M rows, where time ranges sequentially
 from 0 to 199 and id is random values from 0 to 4.

-- 
Taral [EMAIL PROTECTED]
This message is digitally signed. Please PGP encrypt mail to me.
Most parents have better things to do with their time than take care of
their children. -- Me


pgp0.pgp
Description: PGP signature


Re: [HACKERS] No merge sort?

2003-03-14 Thread Tom Lane
Taral [EMAIL PROTECTED] writes:
 On Thu, Mar 13, 2003 at 10:30:27PM -0500, Tom Lane wrote:
 The idea is you look at the index to make a list of main-table tuple
 positions you are interested in, which you represent compactly as a
 compressed bitmap.  [snip]

 And it loses bigtime in the case of LIMIT. If the unlimited query
 returns 4,000 records and I only want 20, you're retrieving 200x too
 much data from disk.

Sure.  That's why we have a planner that distinguishes between startup
cost and total cost, and interpolates when a LIMIT is involved.  But
if this mergesort idea only helps for small-limit cases, that's another
restriction on its scope of usefulness...

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly


Re: [HACKERS] No merge sort?

2003-03-14 Thread Taral
On Fri, Mar 14, 2003 at 10:43:30PM -0500, Tom Lane wrote:
 Sure.  That's why we have a planner that distinguishes between startup
 cost and total cost, and interpolates when a LIMIT is involved.  But
 if this mergesort idea only helps for small-limit cases, that's another
 restriction on its scope of usefulness...

I don't think so, since even in the non-limit case it avoids having to
do a full sort if the number of initial streams is finite and small (as
in the case I demonstrated), reducing time complexity to O(N).

-- 
Taral [EMAIL PROTECTED]
This message is digitally signed. Please PGP encrypt mail to me.
Most parents have better things to do with their time than take care of
their children. -- Me


pgp0.pgp
Description: PGP signature


[HACKERS] No merge sort?

2003-03-13 Thread Taral
I tried general, but no response. Anyone here can shed some light on the
issue? Do I need to code merge sort into postgresql?

- Forwarded message from Taral [EMAIL PROTECTED] -

From: Taral [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Date: Wed, 12 Mar 2003 17:54:35 -0600
Subject: [GENERAL] No merge sort?
Message-ID: [EMAIL PROTECTED]

I have a table test that looks like this:

CREATE TABLE test (
id BIGINT,
time INTEGER
);

There is an index:

CREATE INDEX idx ON test(id, time);

The table has been loaded with 2M rows, where time ranges sequentially
from 0 to 199 and id is random values from 0 to 4.

This query:

SELECT * FROM idx WHERE id IN (...) AND time  198000 AND time  199800
ORDER BY time DESC LIMIT 20;

has an EXPLAIN ANALYZE of:

Limit  (cost=3635.28..3635.28 rows=20 width=12) (actual time=22.94...22.96 rows=14 
loops=1)
  -  Sort  (cost=3635.28..3635.28 rows=23 width=12) (actual time=22.93..22.93 rows=14 
loops=1)
-  Index Scan using idx, idx, ..., idx, idx on test  (cost=0.00...3634.77 
rows=23 width=12) (actual time=1.01..22.10 rows=14 loops=1)
Total runtime: 29.12 msec

This query:

SELECT * FROM idx WHERE id IN (...) AND time  199800 ORDER BY time DESC
LIMIT 20;

has an EXPLAIN ANALYZE of:

Limit  (cost=14516.46..14516.46 rows=20 width=12) (actual time=1448..83..1448.86 
rows=20 loops=1)
  -  Sort  (cost=14516.46..14516.46 rows=2527 width=12) (actual time=1448.82..1448.83 
rows=21 loops=1)
-  Index Scan using idx, idx, ..., idx, idx on test  (cost=0.00...14373.67 
rows=2527 width=12) (actual time=0.14..1437.33 rows=2048 loops=1)
Total runtime: 1454.62 msec

Since the index will output 'time' sorted data for each 'id', why isn't
a merge sort being used here? A merge sort would reduce the execution
time back to 30 ms.

-- 
Taral [EMAIL PROTECTED]
This message is digitally signed. Please PGP encrypt mail to me.
Most parents have better things to do with their time than take care of
their children. -- Me


pgp0.pgp
Description: PGP signature


Re: [HACKERS] No merge sort?

2003-03-13 Thread Tom Lane
Taral [EMAIL PROTECTED] writes:
 Do I need to code merge sort into postgresql?

Seems like a waste of effort to me.  I find this example less than
compelling --- the case that could be sped up is quite narrow,
and the potential performance gain not all that large.  (A sort
is a sort however you slice it, with O(N log N) runtime...)

Also, the direction we'd likely be going in in future is to merge
multiple indexscans using bitmap techniques, so that the output
ordering of the scans couldn't be counted on anyway.

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] No merge sort?

2003-03-13 Thread Taral
On Thu, Mar 13, 2003 at 04:28:34PM -0500, Tom Lane wrote:
 Seems like a waste of effort to me.  I find this example less than
 compelling --- the case that could be sped up is quite narrow,
 and the potential performance gain not all that large.  (A sort
 is a sort however you slice it, with O(N log N) runtime...)

Actually, it's O(N) time. The index can produce time sorted data for
each id in linear time, and the merge sort can merge them in linear
time. Also, the existing system insists on loading _all_ candidate rows
whereas this method can benefit from the limit.

If you don't want to code it, I will. I need it for the livejournal
mysql-postgresql transition. (No, mysql doesn't do it right either.)
But a few pointers to the right places to look in the code would be
helpful.

 Also, the direction we'd likely be going in in future is to merge
 multiple indexscans using bitmap techniques, so that the output
 ordering of the scans couldn't be counted on anyway.

I don't understand this. What do these bitmap techniques do?

-- 
Taral [EMAIL PROTECTED]
This message is digitally signed. Please PGP encrypt mail to me.
Most parents have better things to do with their time than take care of
their children. -- Me


pgp0.pgp
Description: PGP signature


Re: [HACKERS] No merge sort?

2003-03-13 Thread Tom Lane
Taral [EMAIL PROTECTED] writes:
 On Thu, Mar 13, 2003 at 04:28:34PM -0500, Tom Lane wrote:
 Seems like a waste of effort to me.  I find this example less than
 compelling --- the case that could be sped up is quite narrow,
 and the potential performance gain not all that large.  (A sort
 is a sort however you slice it, with O(N log N) runtime...)

 Actually, it's O(N) time.

Only if you assume a fixed number of input streams.

 Also, the direction we'd likely be going in in future is to merge
 multiple indexscans using bitmap techniques, so that the output
 ordering of the scans couldn't be counted on anyway.

 I don't understand this. What do these bitmap techniques do?

The idea is you look at the index to make a list of main-table tuple
positions you are interested in, which you represent compactly as a
compressed bitmap.  (There is some finagling needed because PG actually
uses block/line number rather than a pure tuple number to identify
tuples, but you can fake it with a reasonably small amount of overhead.)
Then you can combine multiple index inputs by ANDing or ORing bitmaps
(the OR case applies to your example).  Finally, you traverse the heap,
accessing the desired rows in heap-location order.  This loses in terms
of producing presorted output --- but it often saves enough in I/O costs
to more than justify doing the sort in memory.

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org


Re: [HACKERS] No merge sort?

2003-03-13 Thread Taral
On Thu, Mar 13, 2003 at 10:30:27PM -0500, Tom Lane wrote:
 The idea is you look at the index to make a list of main-table tuple
 positions you are interested in, which you represent compactly as a
 compressed bitmap.  (There is some finagling needed because PG actually
 uses block/line number rather than a pure tuple number to identify
 tuples, but you can fake it with a reasonably small amount of overhead.)
 Then you can combine multiple index inputs by ANDing or ORing bitmaps
 (the OR case applies to your example).  Finally, you traverse the heap,
 accessing the desired rows in heap-location order.  This loses in terms
 of producing presorted output --- but it often saves enough in I/O costs
 to more than justify doing the sort in memory.

And it loses bigtime in the case of LIMIT. If the unlimited query
returns 4,000 records and I only want 20, you're retrieving 200x too
much data from disk.

-- 
Taral [EMAIL PROTECTED]
This message is digitally signed. Please PGP encrypt mail to me.
Most parents have better things to do with their time than take care of
their children. -- Me


pgp0.pgp
Description: PGP signature