subject:"\"\\\[HACKERS\\\] CLUSTER and indisclustered\""

Re: [HACKERS] CLUSTER and indisclustered

2002-08-12 Thread Hannu Krosing


On Tue, 2002-08-13 at 09:25, Bruce Momjian wrote:
> 
> There is a web page about "star joins" used a lot in data warehousing,
> where you don't know what queries are going to be required and what
> indexes to create:
> 
>   http://www.dbdomain.com/a100397.htm
> 
> They show some sample queries, which is good.  Here is some
> interesting text:
> 
>   Star Transformation
> 
>   If there are bitmap indexes on SALES_REP_ID, PRODUCT_ID, and
>   DEPARTMENT_ID in the SALES table, then Oracle can resolve the query
>   using merges of the bitmap indexes.
>   
>   Because Oracle can efficiently merge multiple bitmap indexes, you can 
>   create a single bitmap index on each of the foreign-key columns in the
>   fact table rather than on every possible combination of columns.

Another way to achive the similar result would be using segmented hash
indexes, where each column maps directly to some part of hash value.

> This
>   lets you support all possible combinations of dimensions without
>   creating an unreasonable number of indexes.

---
Hannu


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Re: [HACKERS] CLUSTER and indisclustered

2002-08-12 Thread Bruce Momjian



I wanted to comment on this bitmapped index discussion because I am
hearing a lot about star joins, data warehousing, and bitmapped indexes
recently.

It seems we have several uses for bitmapped indexes:

Do index lookups in sequential heap order
Allow joining of bitmapped indexes to construct arbitrary indexes

There is a web page about "star joins" used a lot in data warehousing,
where you don't know what queries are going to be required and what
indexes to create:

http://www.dbdomain.com/a100397.htm

They show some sample queries, which is good.  Here is some
interesting text:

Star Transformation

If there are bitmap indexes on SALES_REP_ID, PRODUCT_ID, and
DEPARTMENT_ID in the SALES table, then Oracle can resolve the query
using merges of the bitmap indexes.

Because Oracle can efficiently merge multiple bitmap indexes, you can 
create a single bitmap index on each of the foreign-key columns in the
fact table rather than on every possible combination of columns. This
lets you support all possible combinations of dimensions without
creating an unreasonable number of indexes.

Added to TODO:

* Use bitmaps to fetch heap pages in sequential order [performance] 
* Use bitmaps to combine existing indexes [performance]

and I will add some of these emails to TODO.detail/performance.

---

Tom Lane wrote:
> Curt Sampson <[EMAIL PROTECTED]> writes:
> > But after doing some benchmarking of various sorts of random reads
> > and writes, it occurred to me that there might be optimizations
> > that could help a lot with this sort of thing. What if, when we've
> > got an index block with a bunch of entries, instead of doing the
> > reads in the order of the entries, we do them in the order of the
> > blocks the entries point to?
> 
> I thought to myself "didn't I just post something about that?"
> and then realized it was on a different mailing list.  Here ya go
> (and no, this is not the first time around on this list either...)
> 
> 
> I am currently thinking that bitmap indexes per se are not all that
> interesting.  What does interest me is bitmapped index lookup, which
> came back into mind after hearing Ann Harrison describe how FireBird/
> InterBase does it.
> 
> The idea is that you don't scan the index and base table concurrently
> as we presently do it.  Instead, you scan the index and make a list
> of the TIDs of the table tuples you need to visit.  This list can
> be conveniently represented as a sparse bitmap.  After you've finished
> looking at the index, you visit all the required table tuples *in
> physical order* using the bitmap.  This eliminates multiple fetches
> of the same heap page, and can possibly let you get some win from
> sequential access.
> 
> Once you have built this mechanism, you can then move on to using
> multiple indexes in interesting ways: you can do several indexscans
> in one query and then AND or OR their bitmaps before doing the heap
> scan.  This would allow, for example, "WHERE a = foo and b = bar"
> to be handled by ANDing results from separate indexes on the a and b
> columns, rather than having to choose only one index to use as we do
> now.
> 
> Some thoughts about implementation: FireBird's implementation seems
> to depend on an assumption about a fixed number of tuple pointers
> per page.  We don't have that, but we could probably get away with
> just allocating BLCKSZ/sizeof(HeapTupleHeaderData) bits per page.
> Also, the main downside of this approach is that the bitmap could
> get large --- but you could have some logic that causes you to fall
> back to plain sequential scan if you get too many index hits.  (It's
> interesting to think of this as lossy compression of the bitmap...
> which leads to the idea of only being fuzzy in limited areas of the
> bitmap, rather than losing all the information you have.)
> 
> A possibly nasty issue is that lazy VACUUM has some assumptions in it
> about indexscans holding pins on index pages --- that's what prevents
> it from removing heap tuples that a concurrent indexscan is just about
> to visit.  It might be that there is no problem: even if lazy VACUUM
> removes a heap tuple and someone else then installs a new tuple in that
> same TID slot, you should be okay because the new tuple is too new to
> pass your visibility test.  But I'm not convinced this is safe.
> 
>   regards, tom lane
> 
> ---(end of broadcast)---
> TIP 6: Have you searched our list archives?
> 
> http://archives.postgresql.org
> 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

Re: [HACKERS] CLUSTER and indisclustered

2002-08-08 Thread Dave Page




> -Original Message-
> From: Christopher Kings-Lynne [mailto:[EMAIL PROTECTED]] 
> Sent: 09 August 2002 03:57
> To: Alvaro Herrera
> Cc: [EMAIL PROTECTED]
> Subject: Re: [HACKERS] CLUSTER and indisclustered
> 
> 
> If you're looking for something very useful to work on, see if Gavin
> Sherry(?) can post his old CREATE OR REPLACE VIEW code.  I'm 
> pretty sure he (or someone) said that he had an old patch, 
> that needed to be synced with HEAD...  This functionality is 
> pretty essential for 7.3...
> 

I'll second that...

Dave.

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

Re: [HACKERS] CLUSTER and indisclustered

2002-08-08 Thread Tom Lane


Neil Conway <[EMAIL PROTECTED]> writes:
> Alvaro Herrera <[EMAIL PROTECTED]> writes:
>> What if I [try to] extend the grammar to support an additional ANALYZE
>> in CLUSTER, so that it analyzes the table automatically?

> I don't like this -- it seems like bloat.

My reaction exactly.

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Re: [HACKERS] CLUSTER and indisclustered

2002-08-08 Thread Christopher Kings-Lynne


If you're looking for something very useful to work on, see if Gavin
Sherry(?) can post his old CREATE OR REPLACE VIEW code.  I'm pretty sure he
(or someone) said that he had an old patch, that needed to be synced with
HEAD...  This functionality is pretty essential for 7.3...

Chris

> -Original Message-
> From: Alvaro Herrera [mailto:[EMAIL PROTECTED]]
> Sent: Friday, 9 August 2002 10:21 AM
> To: Christopher Kings-Lynne
> Cc: Neil Conway; Tom Lane; Bruce Momjian; [EMAIL PROTECTED]
> Subject: Re: [HACKERS] CLUSTER and indisclustered
>
>
> Christopher Kings-Lynne dijo:
>
> > > > > Or maybe just do an analyze of the table automatically after the
> > > > > CLUSTERing.
> >
> > Well we have previously had discussions on the topic of adding
> analyze to
> > the end of dumps, etc. and the result has always been in favour
> of keeping
> > the command set orthogonal and not doing an automatic analyze...
>
> Oh.  Sorry for the noise.
>
> I'm trying to look at other things in the TODO so I stop pestering about
> CLUSTER.
>
> --
> Alvaro Herrera ()
> "Pensar que el espectro que vemos es ilusorio no lo despoja de espanto,
> sólo le suma el nuevo terror de la locura" (Perelandra, CSLewis)
>


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster

Re: [HACKERS] CLUSTER and indisclustered

2002-08-08 Thread Christopher Kings-Lynne


> > Well we have previously had discussions on the topic of adding
> analyze to
> > the end of dumps, etc. and the result has always been in favour
> of keeping
> > the command set orthogonal and not doing an automatic analyze...
>
> Oh.  Sorry for the noise.
>
> I'm trying to look at other things in the TODO so I stop pestering about
> CLUSTER.

All I can say is - thanks for fixing CLUSTER.  As soon as we upgrade to 7.3
I'm going on a CLUSTERing spree :)

Chris


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Re: [HACKERS] CLUSTER and indisclustered

2002-08-08 Thread Alvaro Herrera


Christopher Kings-Lynne dijo: 

> > > > Or maybe just do an analyze of the table automatically after the
> > > > CLUSTERing.
> 
> Well we have previously had discussions on the topic of adding analyze to
> the end of dumps, etc. and the result has always been in favour of keeping
> the command set orthogonal and not doing an automatic analyze...

Oh.  Sorry for the noise.

I'm trying to look at other things in the TODO so I stop pestering about
CLUSTER.

-- 
Alvaro Herrera ()
"Pensar que el espectro que vemos es ilusorio no lo despoja de espanto,
sólo le suma el nuevo terror de la locura" (Perelandra, CSLewis)


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Re: [HACKERS] CLUSTER and indisclustered

2002-08-08 Thread Christopher Kings-Lynne


> > > Or maybe just do an analyze of the table automatically after the
> > > CLUSTERing.
> >
> > Hmmm... I don't really see the problem with adding a note in the docs
> > suggesting that users following a CLUSTER with an ANALYZE (...).
>
> ANALYZE is an inexpensive operation (compared to CLUSTER, anyway), so it
> can't hurt to have it done automatically.

Well we have previously had discussions on the topic of adding analyze to
the end of dumps, etc. and the result has always been in favour of keeping
the command set orthogonal and not doing an automatic analyze...

Chris


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

Re: [HACKERS] CLUSTER and indisclustered

2002-08-08 Thread Alvaro Herrera


Neil Conway dijo: 

> Alvaro Herrera <[EMAIL PROTECTED]> writes:
> > What if I [try to] extend the grammar to support an additional ANALYZE
> > in CLUSTER, so that it analyzes the table automatically?
> 
> I don't like this -- it seems like bloat.

Maybe you are right.


> > Or maybe just do an analyze of the table automatically after the
> > CLUSTERing.
> 
> Hmmm... I don't really see the problem with adding a note in the docs
> suggesting that users following a CLUSTER with an ANALYZE (...).

ANALYZE is an inexpensive operation (compared to CLUSTER, anyway), so it
can't hurt to have it done automatically.

-- 
Alvaro Herrera ()
"Linux transformó mi computadora, de una `máquina para hacer cosas',
en un aparato realmente entretenido, sobre el cual cada día aprendo
algo nuevo" (Jaime Salinas)



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

Re: [HACKERS] CLUSTER and indisclustered

2002-08-08 Thread Neil Conway

Alvaro Herrera <[EMAIL PROTECTED]> writes:
> What if I [try to] extend the grammar to support an additional ANALYZE
> in CLUSTER, so that it analyzes the table automatically?

I don't like this -- it seems like bloat. What's the advantage of

CLUSTER foo ON bar ANALYZE;

over

CLUSTER foo ON bar;
ANALYZE;

> Or maybe just do an analyze of the table automatically after the
> CLUSTERing.

Hmmm... I don't really see the problem with adding a note in the docs
suggesting that users following a CLUSTER with an ANALYZE (of course,
that assumes that the CLUSTER will significantly change the ordering
of the data in the table, which isn't always the case -- which is
another reason why make this automatic seems unwarranted, IMHO). It
seems like you're looking for a solution to a non-existent problem.

Cheers,

Neil

-- 
Neil Conway <[EMAIL PROTECTED]>
PGP Key ID: DB3C29FC

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly

Re: [HACKERS] CLUSTER and indisclustered

2002-08-08 Thread Alvaro Herrera


Tom Lane dijo: 

> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > Tom, should we be updating that flag after we CLUSTER instead of
> > requiring an ANALYZE after the CLUSTER?
> 
> Could do that I suppose, but I'm not super-excited about it.  ANALYZE is
> quite cheap these days (especially in comparison to CLUSTER ;-)).  I'd
> settle for a note in the CLUSTER docs that recommends a subsequent
> ANALYZE --- this seems no different from recommending ANALYZE after bulk
> data load or other major update of a table.

What if I [try to] extend the grammar to support an additional ANALYZE
in CLUSTER, so that it analyzes the table automatically? Say

CLUSTER  ON  [ANALYZE];

Or maybe just do an analyze of the table automatically after the
CLUSTERing.

What does everybody think?

-- 
Alvaro Herrera ()
"Para tener mas hay que desear menos"


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

Re: [HACKERS] CLUSTER and indisclustered

2002-08-03 Thread Tom Lane

Bruce Momjian <[EMAIL PROTECTED]> writes:
> Tom, should we be updating that flag after we CLUSTER instead of
> requiring an ANALYZE after the CLUSTER?

Could do that I suppose, but I'm not super-excited about it.  ANALYZE is
quite cheap these days (especially in comparison to CLUSTER ;-)).  I'd
settle for a note in the CLUSTER docs that recommends a subsequent
ANALYZE --- this seems no different from recommending ANALYZE after bulk
data load or other major update of a table.

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Re: [HACKERS] CLUSTER and indisclustered

2002-08-03 Thread Tom Lane

Gavin Sherry <[EMAIL PROTECTED]> writes:
> On Sat, 3 Aug 2002, Tom Lane wrote:
>> AFAICT you're assuming that the table is *exactly* ordered by the
>> clustered attribute.  While this is true at the instant CLUSTER
>> completes, the exact ordering will be destroyed by the first insert or
>> update :-(.  I can't see much value in creating a whole new scan type

> Sorry, I meant to say that heap_insert() etc would need to set
> indisclustered to false.

<>  You could do that, but only if you are prepared to invent
a mechanism that will instantly invalidate any existing query plans
that assume the clustered ordering is good.

Up to now we've only allowed the planner to make decisions that impact
performace, not correctness of the result.  I'm uncomfortable with the
idea that a "clusterscan" plan could silently return wrong answers after
someone else updates the table and doesn't tell us they did.

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

Re: [HACKERS] CLUSTER and indisclustered

2002-08-03 Thread Bruce Momjian


Gavin Sherry wrote:
> > Gavin, is that a big win compared to just using the index and looping
> > through the entries, knowing that the index matches are on the same
> > page, and the heap matches are on the same page.
> 
> Bruce,
> 
> It would cut out the index over head. Besides at (1) (above) we would have
> determined that an index scan was too expensive and we would be using a
> SeqScan instead. This would just be faster, since a) we would locate the
> tuples more intelligently b) we wouldn't need to scan the whole heap once
> we'd found all tuples matching the scan key.

Yes, but in a clustered table, an index scan is _never_ (?) more
expensive than a sequential scan, at least if the optimizer is working
correctly.  Index scans are slower only because they assume random heap
access, but with a clustered table, there is no random heap access.  The
index takes to right to the spot to start.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Re: [HACKERS] CLUSTER and indisclustered

2002-08-03 Thread Bruce Momjian


Tom Lane wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > Tom, should we be updating that flag after we CLUSTER instead of
> > requiring an ANALYZE after the CLUSTER?
> 
> Could do that I suppose, but I'm not super-excited about it.  ANALYZE is
> quite cheap these days (especially in comparison to CLUSTER ;-)).  I'd
> settle for a note in the CLUSTER docs that recommends a subsequent
> ANALYZE --- this seems no different from recommending ANALYZE after bulk
> data load or other major update of a table.

OK.  I am sure it is not obvious to people to ANALYZE because the data
in their table hasn't changed, just the ordering.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Re: [HACKERS] CLUSTER and indisclustered

2002-08-03 Thread Gavin Sherry


On Sat, 3 Aug 2002, Bruce Momjian wrote:

> Gavin Sherry wrote:
> > Hi all,
> > 
> > It occured to me on the plane home that now that CLUSTER is fixed we may
> > be able to put pg_index.indisclustered to use. If CLUSTER was to set
> > indisclustered to true when it clusters a heap according to the given
> > index, we could speed up sequantial scans. There are two possible ways.
> > 
> > 1) Planner determines that a seqscan is appropriate *and* the retrieval is
> > qualified by the key(s) of one of the relation's indexes
> > 2) Planner determines that the relation is clustered on disk according to
> > the index over the key(s) used to qualify the retrieval
> > 3) Planner sets an appropriate nodeTag for the retrieval (SeqScanCluster?)
> > 4) ExecProcNode() calls some new scan routine, ExecSeqScanCluster() ?
> > 5) ExecSeqScanCluster() calls ExecScan() with a new ExecScanAccessMtd (ie,
> > different from SeqNext) called SeqClusterNext
> > 6) SeqClusterNext() has all the heapgettup() logic with two
> > exceptions: a) we find the first tuple more intelligently (instead of
> > scanning from the first page) b) if we have found tuple(s) matching the
> > ScanKey when we encounter an non-matching tuple (via
> > HeapTupleSatisfies() ?) we return a NULL'ed out tuple, terminating the
> > scan
> 
> Gavin, is that a big win compared to just using the index and looping
> through the entries, knowing that the index matches are on the same
> page, and the heap matches are on the same page.

Bruce,

It would cut out the index over head. Besides at (1) (above) we would have
determined that an index scan was too expensive and we would be using a
SeqScan instead. This would just be faster, since a) we would locate the
tuples more intelligently b) we wouldn't need to scan the whole heap once
we'd found all tuples matching the scan key.

Gavin


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster

Re: [HACKERS] CLUSTER and indisclustered

2002-08-03 Thread Bruce Momjian


Gavin Sherry wrote:
> Hi all,
> 
> It occured to me on the plane home that now that CLUSTER is fixed we may
> be able to put pg_index.indisclustered to use. If CLUSTER was to set
> indisclustered to true when it clusters a heap according to the given
> index, we could speed up sequantial scans. There are two possible ways.
> 
> 1) Planner determines that a seqscan is appropriate *and* the retrieval is
> qualified by the key(s) of one of the relation's indexes
> 2) Planner determines that the relation is clustered on disk according to
> the index over the key(s) used to qualify the retrieval
> 3) Planner sets an appropriate nodeTag for the retrieval (SeqScanCluster?)
> 4) ExecProcNode() calls some new scan routine, ExecSeqScanCluster() ?
> 5) ExecSeqScanCluster() calls ExecScan() with a new ExecScanAccessMtd (ie,
> different from SeqNext) called SeqClusterNext
> 6) SeqClusterNext() has all the heapgettup() logic with two
> exceptions: a) we find the first tuple more intelligently (instead of
> scanning from the first page) b) if we have found tuple(s) matching the
> ScanKey when we encounter an non-matching tuple (via
> HeapTupleSatisfies() ?) we return a NULL'ed out tuple, terminating the
> scan

Gavin, is that a big win compared to just using the index and looping
through the entries, knowing that the index matches are on the same
page, and the heap matches are on the same page.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster

Re: [HACKERS] CLUSTER and indisclustered

2002-08-03 Thread Bruce Momjian


Tom Lane wrote:
> Gavin Sherry <[EMAIL PROTECTED]> writes:
> > It occured to me on the plane home that now that CLUSTER is fixed we may
> > be able to put pg_index.indisclustered to use. If CLUSTER was to set
> > indisclustered to true when it clusters a heap according to the given
> > index, we could speed up sequantial scans.
> 
> AFAICT you're assuming that the table is *exactly* ordered by the
> clustered attribute.  While this is true at the instant CLUSTER
> completes, the exact ordering will be destroyed by the first insert or
> update :-(.  I can't see much value in creating a whole new scan type
> that's only usable on a perfectly-clustered table.
> 
> The existing approach to making the planner smart about clustered tables
> is to compute a physical-vs-logical-order-correlation statistic and use
> that to adjust the estimated cost of indexscans.  I believe this is a
> more robust approach than considering a table to be "clustered" or "not
> clustered", since it can deal with the gradual degradation of clustered
> order over time.  However, I will not make any great claims for the
> specific equations currently used for this purpose --- they're surely in
> need of improvement.  Feel free to take a look and see if you have any
> ideas.  The collection of the statistic is in commands/analyze.c and the
> use of it is in optimizer/path/costsize.c.

Tom, should we be updating that flag after we CLUSTER instead of
requiring an ANALYZE after the CLUSTER?

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

Re: [HACKERS] CLUSTER and indisclustered

2002-08-03 Thread Gavin Sherry

On Sat, 3 Aug 2002, Tom Lane wrote:

> Gavin Sherry <[EMAIL PROTECTED]> writes:
> > It occured to me on the plane home that now that CLUSTER is fixed we may
> > be able to put pg_index.indisclustered to use. If CLUSTER was to set
> > indisclustered to true when it clusters a heap according to the given
> > index, we could speed up sequantial scans.
> 
> AFAICT you're assuming that the table is *exactly* ordered by the
> clustered attribute.  While this is true at the instant CLUSTER
> completes, the exact ordering will be destroyed by the first insert or
> update :-(.  I can't see much value in creating a whole new scan type

Sorry, I meant to say that heap_insert() etc would need to set
indisclustered to false.

I do see some worth in this however. Naturally, in a situation where a
database is being modified very often this is of little value. However,
for applications focussed on analysing large amounts of static data this
could increase performance significantly. Once I get some time I will
attempt to explore this further in `diff -c` format :-).

Gavin

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Re: [HACKERS] CLUSTER and indisclustered

2002-08-03 Thread Tom Lane

Gavin Sherry <[EMAIL PROTECTED]> writes:
> It occured to me on the plane home that now that CLUSTER is fixed we may
> be able to put pg_index.indisclustered to use. If CLUSTER was to set
> indisclustered to true when it clusters a heap according to the given
> index, we could speed up sequantial scans.

AFAICT you're assuming that the table is *exactly* ordered by the
clustered attribute.  While this is true at the instant CLUSTER
completes, the exact ordering will be destroyed by the first insert or
update :-(.  I can't see much value in creating a whole new scan type
that's only usable on a perfectly-clustered table.

The existing approach to making the planner smart about clustered tables
is to compute a physical-vs-logical-order-correlation statistic and use
that to adjust the estimated cost of indexscans.  I believe this is a
more robust approach than considering a table to be "clustered" or "not
clustered", since it can deal with the gradual degradation of clustered
order over time.  However, I will not make any great claims for the
specific equations currently used for this purpose --- they're surely in
need of improvement.  Feel free to take a look and see if you have any
ideas.  The collection of the statistic is in commands/analyze.c and the
use of it is in optimizer/path/costsize.c.

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

[HACKERS] CLUSTER and indisclustered

2002-08-03 Thread Gavin Sherry


Hi all,

It occured to me on the plane home that now that CLUSTER is fixed we may
be able to put pg_index.indisclustered to use. If CLUSTER was to set
indisclustered to true when it clusters a heap according to the given
index, we could speed up sequantial scans. There are two possible ways.

1) Planner determines that a seqscan is appropriate *and* the retrieval is
qualified by the key(s) of one of the relation's indexes
2) Planner determines that the relation is clustered on disk according to
the index over the key(s) used to qualify the retrieval
3) Planner sets an appropriate nodeTag for the retrieval (SeqScanCluster?)
4) ExecProcNode() calls some new scan routine, ExecSeqScanCluster() ?
5) ExecSeqScanCluster() calls ExecScan() with a new ExecScanAccessMtd (ie,
different from SeqNext) called SeqClusterNext
6) SeqClusterNext() has all the heapgettup() logic with two
exceptions: a) we find the first tuple more intelligently (instead of
scanning from the first page) b) if we have found tuple(s) matching the
ScanKey when we encounter an non-matching tuple (via
HeapTupleSatisfies() ?) we return a NULL'ed out tuple, terminating the
scan

Any reason this isn't possible? Any reason it couldn't dramatically speed
up the performance of the type of query i've mentioned?

Gavin


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Re: [HACKERS] CLUSTER and indisclustered

Re: [HACKERS] CLUSTER and indisclustered

Re: [HACKERS] CLUSTER and indisclustered

Re: [HACKERS] CLUSTER and indisclustered

Re: [HACKERS] CLUSTER and indisclustered

Re: [HACKERS] CLUSTER and indisclustered

Re: [HACKERS] CLUSTER and indisclustered

Re: [HACKERS] CLUSTER and indisclustered

Re: [HACKERS] CLUSTER and indisclustered

Re: [HACKERS] CLUSTER and indisclustered

Re: [HACKERS] CLUSTER and indisclustered

Re: [HACKERS] CLUSTER and indisclustered

Re: [HACKERS] CLUSTER and indisclustered

Re: [HACKERS] CLUSTER and indisclustered

Re: [HACKERS] CLUSTER and indisclustered

Re: [HACKERS] CLUSTER and indisclustered

Re: [HACKERS] CLUSTER and indisclustered

Re: [HACKERS] CLUSTER and indisclustered

Re: [HACKERS] CLUSTER and indisclustered

Re: [HACKERS] CLUSTER and indisclustered

[HACKERS] CLUSTER and indisclustered

21 matches

Site Navigation

Mail list logo

Footer information