subject:"\[HACKERS\] plans for bitmap indexes\?"

Re: [HACKERS] plans for bitmap indexes?

2004-11-04 Thread Bruce Momjian

Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  Updated TODO:
 
  * Allow the creation of bitmap indexes which can be quickly combined
with other bitmap indexes
 
 This TODO item description is fundamentally misleading.
 
 The point was *not* about making bitmap indexes, which on its face
 suggests a persistent on-disk data structure comparable to our existing
 index types.  The point was about using transient in-memory bitmaps as
 an interface between the on-disk indexes and accessing the table proper.

There are two separate issues --- on-disk bitmap indexes and on-the-fly
in-memory created ones.  I tried to mention both but obviously it wasn't
clear.  Here is new wording:

* Allow non-bitmap indexes to be combined by creating bitmaps in memory

  Bitmap indexes index single columns that can be combined with other bitmap
  indexes to dynamically create a composite index to match a specific query.
  Each index is a bitmap, and the bitmaps are bitwise AND'ed or OR'ed to be
  combined.  They can index by tid or can be lossy requiring a scan of the
  heap page to find matching rows.

* Allow the creation of on-disk bitmap indexes which can be quickly
  combined with other bitmap indexes

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])

Re: [HACKERS] plans for bitmap indexes?

2004-11-03 Thread Bruce Momjian


Updated TODO:

* Allow the creation of bitmap indexes which can be quickly combined
  with other bitmap indexes

  Bitmap indexes index single columns that can be combined with other bitmap
  indexes to dynamically create a composite index to match a specific query.
  Each index is a bitmap, and the bitmaps are bitwise AND'ed or OR'ed to be
  combined.  Such indexes could be more compact if there are few unique
  value.  Also, perhaps they can be lossy requiring a scan of the heap page
  to find matching rows.

* Allow non-bitmap indexes to be combined

  Do lookups on non-bitmap indexes and create bitmaps in memory that can be
  combined with other indexes.


---

Simon Riggs wrote:
  Tom Lane
  Simon Riggs [EMAIL PROTECTED] writes:
 
   How would you dynamically build the bit maps from the indexes?
 
   Or would you:
   - copy aside and sort the indexes on CTID
   - merge join them all to find matching CTIDs
   - probe into the main table
 
  I've been taking bitmap to be a rather handwavy way of saying a
  compact representation of sets of CTIDs that is readily amenable to
  being ANDed and ORed with other sets.  I don't think it'll be a pure
  bitmap with no other superstructure; at the very least you'd want to
  apply some sort of sparse-bitmap and/or compression techniques.  I do
  suspect a bitmappy kind of representation will be more effective than
  sorting arrays of CTIDs per se, although in principle you could do it
  that way too.
 
 
 OK. You seemed to be implying that.
 
  (What you lose is the ability to retrieve data in
  index order, so this isn't a replacement for existing indexscan methods,
  just another plan type to consider.)
 
 Never seen an application that required a bitmap plan and sorted output.
 Have you? Mostly count(*), often sum() or avg(), but never sorted, surely.
 
 Considering there would always be 1 index, which index order did we want
 anyhow?
 
  One interesting thought is that the bitmappy representation could be
  lossy.  For instance, once you get to the point of needing to examine
  most of the rows on a particular page, it's probably not worth
  remembering exactly which rows; you could just remember that that whole
  page is a target, and sequentially scan all the rows on it when you do
  visit the heap.  (ANDing and ORing still works.)  This can scale up to
  visiting consecutive ranges of pages; in the limit the operation
  degenerates to a seqscan.  With this idea you can guarantee that the
  in-memory bitmaps never get impracticably large.  (Obviously if they get
  so large as to push the system into swapping, or even run the backend
  out of memory completely, you lose, so this is a real nice guarantee to
  be able to make.)  The whole thing starts to look like a self-adaptive
  interpolation between our present indexscan and seqscan techniques,
  which takes a lot of pressure off the planner to correctly guess the
  number of matching rows in advance.
 
 Well, thats the best one yet. That's the solution, if ever I heard it.
 
 The reduction in bitmap size makes their use much safer. Size matters, since
 we're likely to start using these techniques on very large databases, which
 imply obviously have very large CTID lists. The problem with guessing the
 number of rows is you're never too sure whether its worth the startup
 overhead of using the bitmap technique. my next question was going to
 be, so how will you know when to use the technique?
 
 Hmmmthinkyou'd need to be clear that the cost of scanning a block
 didn't make the whole thing impractical. Generally, since we're using this
 technique to access infrequent row combinations, we'd be looking at no more
 than one row per block usually anyway. So the technique is still I/O bound -
 a bit extra post I/O cpu work won't hurt much. OK, cool.
 
  I remember batting these ideas around with people at the 2001 OSDB
  summit conference ... I didn't think it would take us this long to get
  around to doing it ...
 
 ...as if you haven't been busy... ;-)
 
 Best Regards, Simon Riggs
 
 
 ---(end of broadcast)---
 TIP 8: explain analyze is your friend
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Re: [HACKERS] plans for bitmap indexes?

2004-11-03 Thread Tom Lane

Bruce Momjian [EMAIL PROTECTED] writes:
 Updated TODO:

 * Allow the creation of bitmap indexes which can be quickly combined
   with other bitmap indexes

This TODO item description is fundamentally misleading.

The point was *not* about making bitmap indexes, which on its face
suggests a persistent on-disk data structure comparable to our existing
index types.  The point was about using transient in-memory bitmaps as
an interface between the on-disk indexes and accessing the table proper.

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] plans for bitmap indexes?

2004-10-27 Thread Hannu Krosing

On K, 2004-10-27 at 00:58, Andre Maasikas wrote:
 Hannu Krosing wrote:
  the per-page clustering would make sure that all the tuples would be on
  1 (or on a few) pages.
 
 I understand that You can cluster on one column, but how do you do it for
 indexes on other columns?

Thanks to PostgreSQL's MVCC each update inserts a complete new tuple -
you just have to insert in the right page.

so if I change foo=1 to foo=2 on a tuple that has bar=2 and baz=3 then
the updated tuple will go to a page for which foo=2, bar=2 and baz=3.

if no such page has enough free space left (found by anding bitmaps for
foo=2, bar=2 and baz=3 and FSM) then a new page is inserted and the
three corresponding indexes are updated to include that page.

 BTW, lossy variants also lose count(), group by only from index

PostgreSQL has never been able to do these from index only, as the
visibility info is stored in the main relation, and not in index.

Someone brings up adding visibility info to index about once in 6 months
and is referred to previous discussions as to why it is a bad idea. The
only thing that as been added to index is a bit telling if a tuple is
definitely invisible (i.e. older than any pending transaction) which is
updated when such tuple is accessed using this index.


  and what comes to updating the index, you have to do it only once per
  100 pages ;)
 
 Sorry, how does that work, if I update foo = 'bar'-'baz' - I can flip 
 the 'baz' bit
 on right away but I have to check every other row to see
 if I can turn the 'bar' bit off

You don't touch indexes, instead you select the right page for new
tuple. The only times you touch indexes is when you insert a new page
(or when the page becomes empty during vacuum)


Hannu



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Re: [HACKERS] plans for bitmap indexes?

2004-10-27 Thread Greg Stark


Hannu Krosing [EMAIL PROTECTED] writes:

 so if I change foo=1 to foo=2 on a tuple that has bar=2 and baz=3 then
 the updated tuple will go to a page for which foo=2, bar=2 and baz=3.
 
 if no such page has enough free space left (found by anding bitmaps for
 foo=2, bar=2 and baz=3 and FSM) then a new page is inserted and the
 three corresponding indexes are updated to include that page.

This is all thinking in terms of a single index though. What do you do if I
have a dozen bitmap indexes? Each could have a 10 distinct values. You would
need 100,000 pages, each of which might only have a few tuples in them.

In any case the user may prefer to have the data clustered around a btree
index using the existing CLUSTER command.

There's a logical separation between the idea of index methods and table
storage mechanisms. Trying to implement something like this that breaks that
abstraction will only make things far more confusing.

I think what you're trying to accomplish is better accomplished through
partitioned tables. Then the user can decide which keys to use to partition
the data and the optimizer can use the data to completely exclude some
partitions from consideration. And it wouldn't interfere with indexes to
access the data within a partition.

-- 
greg


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])

Re: [HACKERS] plans for bitmap indexes?

2004-10-27 Thread Yann Michel

Hi,

On Wed, Oct 27, 2004 at 10:13:56AM -0400, Greg Stark wrote:
 
 There's a logical separation between the idea of index methods and table
 storage mechanisms. Trying to implement something like this that breaks that
 abstraction will only make things far more confusing.
 
 I think what you're trying to accomplish is better accomplished through
 partitioned tables. Then the user can decide which keys to use to partition
 the data and the optimizer can use the data to completely exclude some
 partitions from consideration. And it wouldn't interfere with indexes to
 access the data within a partition.

this is not always the truth. In datawarehouosing applications you often
use data paritioning (time based) and bitmap indexes for fast
star-transformations. A very efficient way to solve that ist using
bitmap indexes.

Regards,
Yann

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings

Re: [HACKERS] plans for bitmap indexes?

2004-10-27 Thread Mark Kirkwood

Greg Stark wrote:
I think what you're trying to accomplish is better accomplished through
partitioned tables. Then the user can decide which keys to use to partition
the data and the optimizer can use the data to completely exclude some
partitions from consideration. And it wouldn't interfere with indexes to
access the data within a partition.
 

Though partitioning will help, you can only partition on one key (I 
guess the ability to partition *indexes* might help here).

I think that bitmap indexes provide a flexible may to get fact access to 
the result set for multiple low cardinality conditions - something that 
partitioning will generally not do.

regards
Mark
---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org

Re: [HACKERS] plans for bitmap indexes?

2004-10-27 Thread Mark Kirkwood

Mark Kirkwood wrote:
I think that bitmap indexes provide a flexible may to get fact access 
to the result set
that should be *fast* access tosorry
---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org

Re: [HACKERS] plans for bitmap indexes?

2004-10-26 Thread Hannu Krosing

On K, 2004-10-20 at 03:03, Simon Riggs wrote:

 Well, thats the best one yet. That's the solution, if ever I heard it.
 
 The reduction in bitmap size makes their use much safer. Size matters, since
 we're likely to start using these techniques on very large databases, which
 imply obviously have very large CTID lists. The problem with guessing the
 number of rows is you're never too sure whether its worth the startup
 overhead of using the bitmap technique. my next question was going to
 be, so how will you know when to use the technique?
 
 Hmmmthinkyou'd need to be clear that the cost of scanning a block
 didn't make the whole thing impractical. Generally, since we're using this
 technique to access infrequent row combinations, we'd be looking at no more
 than one row per block usually anyway. So the technique is still I/O bound -
 a bit extra post I/O cpu work won't hurt much. OK, cool.

I still think that an initial implementation could be done with a plain
bitmap kind of bitmap, if we are content with storing one bit per page
only - a simple page bitmap for 1TB table with 8kB pages takes only 16
MB, and that's backends private memory not more scarce shared memory.

It takes only 4mb for 32kb pages.

I guess that anyone working with terabyte size tables can afford a few
megabytes of RAM for query processing.


Hannu


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings

Re: [HACKERS] plans for bitmap indexes?

2004-10-26 Thread Hannu Krosing

On K, 2004-10-20 at 01:52, Mark Kirkwood wrote:

 I don't believe that read only is required. The update/insert 
 performance impact of bimap indexes is however very high (in Oracle's 
 implementation anyway) - to the point where many sites drop them before 
 adding in new data, and recreated 'em afterwards!
 
 In the advent that there is a benefit for the small on-disk footprint, 
 the insert/update throughput implications will need to be taken into 
 account.

I repeat here my earlier proposal of making the bitmap indexes
page-level and clustering data automatically on AND of all defined
bitmap indexes. 

This would mostly solve this problem too, as there will be only one
insert per page per index (when the first tuple is inserted) and one
delete (when the page gets empty).

This has a downside of suboptimal space usage but this should not (tm)
be an issue for large tables, where most combinations of bits will get
enough hits to fill several pages.

Such clustering would also help (probably a lot) all queries actually
using these indexes.

-
Hannu

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])

Re: [HACKERS] plans for bitmap indexes?

2004-10-26 Thread Greg Stark


Hannu Krosing [EMAIL PROTECTED] writes:

 I repeat here my earlier proposal of making the bitmap indexes
 page-level and clustering data automatically on AND of all defined
 bitmap indexes. 

The problem with page-level bitmaps is that they could be much less effective.
Consider a query like 'WHERE foo = ? AND bar = ? AND baz = ? where each of
those matches about 1% of your tuples. If you have 100 tuples per page then
each of those bitmaps will find a tuple in about half the pages. So the
resulting AND will find about 1/8th of the pages as candidates. In reality the
number of pages it should have to fetch should be more like 1 in a million.

The other problem is that for persist on-disk indexes they require more work
to update. You would have to recheck every other tuple in the page to
recalculate the bit value instead of just being able to flip one bit.

-- 
greg


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] plans for bitmap indexes?

2004-10-26 Thread Hannu Krosing

On T, 2004-10-26 at 18:42, Greg Stark wrote:
 Hannu Krosing [EMAIL PROTECTED] writes:
 
  I repeat here my earlier proposal of making the bitmap indexes
  page-level and clustering data automatically on AND of all defined
  bitmap indexes. 
 
 The problem with page-level bitmaps is that they could be much less effective.
 Consider a query like 'WHERE foo = ? AND bar = ? AND baz = ? where each of
 those matches about 1% of your tuples. If you have 100 tuples per page then
 each of those bitmaps will find a tuple in about half the pages. So the
 resulting AND will find about 1/8th of the pages as candidates. In reality the
 number of pages it should have to fetch should be more like 1 in a million.
 
 The other problem is that for persist on-disk indexes they require more work
 to update. You would have to recheck every other tuple in the page to
 recalculate the bit value instead of just being able to flip one bit.

Read again ;)

the per-page clustering would make sure that all the tuples would be on
1 (or on a few) pages.

and what comes to updating the index, you have to do it only once per
100 pages ;)


Hannu


---(end of broadcast)---
TIP 8: explain analyze is your friend

Re: [HACKERS] plans for bitmap indexes?

2004-10-26 Thread Hannu Krosing

On T, 2004-10-26 at 23:53, Hannu Krosing wrote:
 On T, 2004-10-26 at 18:42, Greg Stark wrote:
  Hannu Krosing [EMAIL PROTECTED] writes:
  
   I repeat here my earlier proposal of making the bitmap indexes
   page-level and clustering data automatically on AND of all defined
   bitmap indexes. 
  
  The problem with page-level bitmaps is that they could be much less effective.
  Consider a query like 'WHERE foo = ? AND bar = ? AND baz = ? where each of
  those matches about 1% of your tuples. If you have 100 tuples per page then
  each of those bitmaps will find a tuple in about half the pages. So the
  resulting AND will find about 1/8th of the pages as candidates. In reality the
  number of pages it should have to fetch should be more like 1 in a million.

Another way to solve the unequal number of tuples per page problem was
to have a fixed length bitmap with rollower (mod length) for each page. 

So your 100 tuples per page on average table should get a 32 (or 64)
bits per page bitmap where bits 1, 33, 65 and 97 would all be in the
same position (for 32 bits), but one could still do fast ANDS and ORS
with high degree of accuracy.

I guess the per-page clustering idea described in my previous mail can
even be extended inside the pages (i.e. cluster on same bits in
2/4/8/16/32bit page bitmap) if simple per/page bitmaps would waste too
much space (many different values, few actual rows - i.e. not a good
candidate for real bitmap indexes ;-p )

  The other problem is that for persist on-disk indexes they require more work
  to update. You would have to recheck every other tuple in the page to
  recalculate the bit value instead of just being able to flip one bit.
 
 Read again ;)
 
 the per-page clustering would make sure that all the tuples would be on
 1 (or on a few) pages.
 
 and what comes to updating the index, you have to do it only once per
 100 pages ;)

This kind of clustering index works best when created on an empty table,
so all tuples can be inserted on their rightful pages.

If this kind of BM index is created on a table with some data, we need
an additional bitmap for gray pages - that is pages containing tuples
matching several combinations of index bits. 

The way to sharpen a table with gray pages would be either a CLUSTER
command or VACUUM (which could check for same-bit-combination-ness.

At least an empty page would be initially (or after becoming empty
during vacuum) marked non-gray and it should also never become gray
unless a new bitmap index is added.

-
Hannu



---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings

Re: [HACKERS] plans for bitmap indexes?

2004-10-26 Thread Andre Maasikas

Hannu Krosing wrote:
the per-page clustering would make sure that all the tuples would be on
1 (or on a few) pages.
I understand that You can cluster on one column, but how do you do it for
indexes on other columns?
BTW, lossy variants also lose count(), group by only from index
and what comes to updating the index, you have to do it only once per
100 pages ;)
Sorry, how does that work, if I update foo = 'bar'-'baz' - I can flip 
the 'baz' bit
on right away but I have to check every other row to see
if I can turn the 'bar' bit off

Andre
---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html

Re: [HACKERS] plans for bitmap indexes?

2004-10-21 Thread Jim C. Nasby

On Tue, Oct 19, 2004 at 07:38:49PM -0300, Alvaro Herrera wrote:
 Huh, you are wrong.  At least btree does index null values, and one
 other index method does too.  The other two index methods don't.  What
 doesn't work is using an index with the IS NULL construct, because it's
 not an operator.  Maybe that can be fixed by some other means ... some
 parser magic perhaps.

Wow, that's news to me, and important to remember! Where is it
documented? I don't see it in the index chapter...

I think it would be very helpful to have a chapter that gives
information like this for experienced DBAs who are trying to learn
PostgreSQL and don't want to read through all the documentation but need
to know important differences in how PostgreSQL works that would be easy
to miss. Not indexing NULLs (ala Oracle), or not using an index for IS
NULL would certainly be examples. Things listed at
http://sql-info.de/postgresql/postgres-gotchas.html would also be likely
candidates.
-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?

---(end of broadcast)---
TIP 8: explain analyze is your friend

Re: [HACKERS] plans for bitmap indexes?

2004-10-20 Thread Sailesh Krishnamurthy

 Gavin == Gavin Sherry [EMAIL PROTECTED] writes:

Gavin I'm uncertain about the potential benefit of
Gavin this. Isn't/shouldn't the effects of caching be assisting
Gavin us here?

It all depends on how large your table is, and how busy the system is
(other pressures on the cache). While it's difficult to plan for the
latter you can plan for the former. For the latter you could assume
that the effective cache size is some fraction of the real size to
account for the effects of other queries. Unless you are real sure
that you will never kick out a page from the buffer cache .. 

I believe that for large enough tables this can certainly help .. it
sure is something that many other systems have implemented. 

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster

Re: [HACKERS] plans for bitmap indexes?

2004-10-19 Thread Simon Riggs

Mark Kirkwood wrote
  Tom Lane wrote:
 
 I believe that the term bitmap index is also used with a different
 meaning wherein it actually does describe a particular kind of on-disk
 index structure, with one bit per table row.
 
 IMHO building in-memory bitmaps (the first idea) is a very good idea to
 pursue for Postgres.  I'm not at all sold on on-disk bitmap indexes,
 though ... those I suspect *are* sufficiently replaced by partial
 indexes.
 

Well, if we could cache the bitmap after it was created the first time then
that might offer almost the same thing. :-)

I was thinking about this recently, then realised that building the bitmap
would not be as easily, since PostgreSQL doesn't index null values. That
would mean that the sets of CTIDs in each index would be disjoint. My
thinking about dynamic bitmaps came from Teradata, which does index null
values.

How would you dynamically build the bit maps from the indexes?

Or would you:
- copy aside and sort the indexes on CTID
- merge join them all to find matching CTIDs
- probe into the main table

Hopefully, I've missed something that you've thought of !

 I believe that the benefit of on-disk bitmap indexes is supposed to be
 reduced storage size (compared to btree).

 In the cases where I have put them to use, they certainly occupy
 considerably less disk than a comparable btree index - provided there
 are not too many district values in the indexed column.


The main problem is the need for the table to be read-only. Until we have
partitioning, we wouldn't be able to easily guarantee parts of a table as
being (effectively) read-only.


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])

Re: [HACKERS] plans for bitmap indexes?

2004-10-19 Thread Alvaro Herrera

On Tue, Oct 19, 2004 at 11:22:31PM +0100, Simon Riggs wrote:

 I was thinking about this recently, then realised that building the bitmap
 would not be as easily, since PostgreSQL doesn't index null values. That
 would mean that the sets of CTIDs in each index would be disjoint. My
 thinking about dynamic bitmaps came from Teradata, which does index null
 values.

Huh, you are wrong.  At least btree does index null values, and one
other index method does too.  The other two index methods don't.  What
doesn't work is using an index with the IS NULL construct, because it's
not an operator.  Maybe that can be fixed by some other means ... some
parser magic perhaps.

 Or would you:
 - copy aside and sort the indexes on CTID
 - merge join them all to find matching CTIDs
 - probe into the main table

IIRC part of the trick was to build bitmaps to apply bitwise-AND/OR
operators.  This allows to use multiple indexes for one scan, for
example.


I don't understand your comment about read only tables ...

-- 
Alvaro Herrera (alvherre[a]dcc.uchile.cl)
I call it GNU/Linux. Except the GNU/ is silent. (Ben Reiter)


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings

Re: [HACKERS] plans for bitmap indexes?

2004-10-19 Thread Mark Kirkwood

Simon Riggs wrote:

I believe that the benefit of on-disk bitmap indexes is supposed to be
reduced storage size (compared to btree).
   

The main problem is the need for the table to be read-only. Until we have
partitioning, we wouldn't be able to easily guarantee parts of a table as
being (effectively) read-only.
 

I don't believe that read only is required. The update/insert 
performance impact of bimap indexes is however very high (in Oracle's 
implementation anyway) - to the point where many sites drop them before 
adding in new data, and recreated 'em afterwards!

In the advent that there is a benefit for the small on-disk footprint, 
the insert/update throughput implications will need to be taken into 
account.

cheers
Mark
---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match

Re: [HACKERS] plans for bitmap indexes?

2004-10-19 Thread Tom Lane

Simon Riggs [EMAIL PROTECTED] writes:
 I was thinking about this recently, then realised that building the bitmap
 would not be as easily, since PostgreSQL doesn't index null values.

As Alvaro already pointed out, this statement is bogus; and I'm not sure
what it has to do with the topic anyway.  All you care about is the rows
that the index fingers as matching your scan condition.  If the scan
condition is strict (which it usually is) it does not matter whether the
index stores entries for nulls or not.

 How would you dynamically build the bit maps from the indexes?

 Or would you:
 - copy aside and sort the indexes on CTID
 - merge join them all to find matching CTIDs
 - probe into the main table

I've been taking bitmap to be a rather handwavy way of saying a
compact representation of sets of CTIDs that is readily amenable to
being ANDed and ORed with other sets.  I don't think it'll be a pure
bitmap with no other superstructure; at the very least you'd want to
apply some sort of sparse-bitmap and/or compression techniques.  I do
suspect a bitmappy kind of representation will be more effective than
sorting arrays of CTIDs per se, although in principle you could do it
that way too.

But yeah, the basic idea is to scan an index and build some sort of
in-memory set of CTIDs of selected rows; possibly AND or OR this with
other sets built from other indexes; and then scan the set and probe
into the heap at the indicated places.  One huge advantage is that the
actual heap visiting becomes efficient, eg you never visit the same page
more than once.  (What you lose is the ability to retrieve data in
index order, so this isn't a replacement for existing indexscan methods,
just another plan type to consider.)

One interesting thought is that the bitmappy representation could be
lossy.  For instance, once you get to the point of needing to examine
most of the rows on a particular page, it's probably not worth
remembering exactly which rows; you could just remember that that whole
page is a target, and sequentially scan all the rows on it when you do
visit the heap.  (ANDing and ORing still works.)  This can scale up to
visiting consecutive ranges of pages; in the limit the operation
degenerates to a seqscan.  With this idea you can guarantee that the
in-memory bitmaps never get impracticably large.  (Obviously if they get
so large as to push the system into swapping, or even run the backend
out of memory completely, you lose, so this is a real nice guarantee to
be able to make.)  The whole thing starts to look like a self-adaptive
interpolation between our present indexscan and seqscan techniques,
which takes a lot of pressure off the planner to correctly guess the
number of matching rows in advance.

I remember batting these ideas around with people at the 2001 OSDB
summit conference ... I didn't think it would take us this long to get
around to doing it ...

regards, tom lane

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match

Re: [HACKERS] plans for bitmap indexes?

2004-10-19 Thread Josh Berkus

Tom,

 I've been taking bitmap to be a rather handwavy way of saying a
 compact representation of sets of CTIDs that is readily amenable to
 being ANDed and ORed with other sets.  

Well, actually I think we're talking about two different features:

1) a way to use more than one index per operation;
2) a more compact and thus faster index representation

The fact that Oracle solved both problems with the same code doesn't, 
obviously mean that we have to.   There's been a lot of discussion around 
problem (2) on this thread, but I don't want to lose sight of problem 
(1)  especially since that's the problem faced by several active 
community members right now.

You gave the impression that (1) could be implemented with regular BTree 
indexes in an earlier e-mail.   Would that be very hard to do?

 The whole thing starts to look like a self-adaptive 
 interpolation between our present indexscan and seqscan techniques,
 which takes a lot of pressure off the planner to correctly guess the
 number of matching rows in advance.

This would be way cool.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings

Re: [HACKERS] plans for bitmap indexes?

2004-10-19 Thread Gavin Sherry

On Tue, 19 Oct 2004, Josh Berkus wrote:

 Tom,

  I've been taking bitmap to be a rather handwavy way of saying a
  compact representation of sets of CTIDs that is readily amenable to
  being ANDed and ORed with other sets.

 Well, actually I think we're talking about two different features:

 1) a way to use more than one index per operation;
 2) a more compact and thus faster index representation

For those interested, how this generally works is that for every distinct
value in the column being indexed, a bitmap of unique row identifiers (ie,
tids) is created. With compression, this can greatly reduce the size of
indexes on a large number of rows with a small number of distinct values
(a situation in which we're highly likely to use seq scan index of index
in Postgres).

For qualifications like: bitmapcol1 AND/OR bitmapcol2, we can use bitmap
and/or respectively. Of course, this is all in theory.

Bitmap indexes can suffer concurrency issues, depending on the granularity
of locking.

 You gave the impression that (1) could be implemented with regular BTree
 indexes in an earlier e-mail.   Would that be very hard to do?

  The whole thing starts to look like a self-adaptive
  interpolation between our present indexscan and seqscan techniques,
  which takes a lot of pressure off the planner to correctly guess the
  number of matching rows in advance.

 This would be way cool.

I think there's a lot of be gained by the technique above as an
alternative to our current access methods. Its just a feeling however, I
haven't prototyped this.

Thanks,

Gavin

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [HACKERS] plans for bitmap indexes?

2004-10-19 Thread Simon Riggs

 Tom Lane
 Simon Riggs [EMAIL PROTECTED] writes:

  How would you dynamically build the bit maps from the indexes?

  Or would you:
  - copy aside and sort the indexes on CTID
  - merge join them all to find matching CTIDs
  - probe into the main table

 I've been taking bitmap to be a rather handwavy way of saying a
 compact representation of sets of CTIDs that is readily amenable to
 being ANDed and ORed with other sets.  I don't think it'll be a pure
 bitmap with no other superstructure; at the very least you'd want to
 apply some sort of sparse-bitmap and/or compression techniques.  I do
 suspect a bitmappy kind of representation will be more effective than
 sorting arrays of CTIDs per se, although in principle you could do it
 that way too.


OK. You seemed to be implying that.

 (What you lose is the ability to retrieve data in
 index order, so this isn't a replacement for existing indexscan methods,
 just another plan type to consider.)

Never seen an application that required a bitmap plan and sorted output.
Have you? Mostly count(*), often sum() or avg(), but never sorted, surely.

Considering there would always be 1 index, which index order did we want
anyhow?

 One interesting thought is that the bitmappy representation could be
 lossy.  For instance, once you get to the point of needing to examine
 most of the rows on a particular page, it's probably not worth
 remembering exactly which rows; you could just remember that that whole
 page is a target, and sequentially scan all the rows on it when you do
 visit the heap.  (ANDing and ORing still works.)  This can scale up to
 visiting consecutive ranges of pages; in the limit the operation
 degenerates to a seqscan.  With this idea you can guarantee that the
 in-memory bitmaps never get impracticably large.  (Obviously if they get
 so large as to push the system into swapping, or even run the backend
 out of memory completely, you lose, so this is a real nice guarantee to
 be able to make.)  The whole thing starts to look like a self-adaptive
 interpolation between our present indexscan and seqscan techniques,
 which takes a lot of pressure off the planner to correctly guess the
 number of matching rows in advance.

Well, thats the best one yet. That's the solution, if ever I heard it.

The reduction in bitmap size makes their use much safer. Size matters, since
we're likely to start using these techniques on very large databases, which
imply obviously have very large CTID lists. The problem with guessing the
number of rows is you're never too sure whether its worth the startup
overhead of using the bitmap technique. my next question was going to
be, so how will you know when to use the technique?

Hmmmthinkyou'd need to be clear that the cost of scanning a block
didn't make the whole thing impractical. Generally, since we're using this
technique to access infrequent row combinations, we'd be looking at no more
than one row per block usually anyway. So the technique is still I/O bound -
a bit extra post I/O cpu work won't hurt much. OK, cool.

 I remember batting these ideas around with people at the 2001 OSDB
 summit conference ... I didn't think it would take us this long to get
 around to doing it ...

...as if you haven't been busy... ;-)

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 8: explain analyze is your friend

Re: [HACKERS] plans for bitmap indexes?

2004-10-19 Thread Simon Riggs

Alvaro Herrera
 On Tue, Oct 19, 2004 at 11:22:31PM +0100, Simon Riggs wrote:

  I was thinking about this recently, then realised that building the
bitmap
  would not be as easily, since PostgreSQL doesn't index null values. That
  would mean that the sets of CTIDs in each index would be disjoint. My
  thinking about dynamic bitmaps came from Teradata, which does index null
  values.

 Huh, you are wrong.

Always happy to learn. Thanks for letting me know.

 At least btree does index null values, and one
 other index method does too.  The other two index methods don't.  What
 doesn't work is using an index with the IS NULL construct, because it's
 not an operator.  Maybe that can be fixed by some other means ... some
 parser magic perhaps.

The manual says this (CREATE INDEX)
Indexes are not used for IS NULL clauses by default. The best way to use
indexes in such cases is to create a partial index using an IS NULL
comparison. 

Perhaps we can find a better way of wording this to explain what actually
occurs, which after your comments, I'm less clear on than I was before.
Could you clarify further, so we can update the documentation to be very
specific, or at least clearer.

  Or would you:
  - copy aside and sort the indexes on CTID
  - merge join them all to find matching CTIDs
  - probe into the main table

 IIRC part of the trick was to build bitmaps to apply bitwise-AND/OR
 operators.  This allows to use multiple indexes for one scan, for
 example.

Yes, an implication of my question was and would that then give greater
overhead for 2 indexes...

 I don't understand your comment about read only tables ...

These are restrictions on the Oracle implementation.

If you had a larger data warehouse table that grew over time, then typically
the older data wouldn't change much and so a read-only technique could be
sensibly applied.

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Re: [HACKERS] plans for bitmap indexes?

2004-10-19 Thread Sailesh Krishnamurthy

 Tom == Tom Lane [EMAIL PROTECTED] writes:

Tom One huge advantage is that the actual heap visiting becomes
Tom efficient, eg you never visit the same page more than once.
Tom (What you lose is the ability to retrieve data in index
Tom order, so this isn't a replacement for existing indexscan
Tom methods, just another plan type to consider.)

Even without bitmap indexes, without trying to use multiple indexes
etc. this (visiting a page only once) is useful. 

In other words, I'd like to see the indexscan broken up into: (1) an
operator that returns a list of TIDs, (2) Sort the TIDs and (3) an
operator that fetches heap tuples from the sorted TID list. 

Of course the resulting data from the heap will be out of order but
that often times is less important than unnecessarily visiting (and
possibly even re-fetching from disk) the same heap page twice for a
given index scan. 

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



---(end of broadcast)---
TIP 8: explain analyze is your friend

Re: [HACKERS] plans for bitmap indexes?

2004-10-19 Thread Gavin Sherry

On Tue, 19 Oct 2004, Sailesh Krishnamurthy wrote:

  Tom == Tom Lane [EMAIL PROTECTED] writes:

 Tom One huge advantage is that the actual heap visiting becomes
 Tom efficient, eg you never visit the same page more than once.
 Tom (What you lose is the ability to retrieve data in index
 Tom order, so this isn't a replacement for existing indexscan
 Tom methods, just another plan type to consider.)

 Even without bitmap indexes, without trying to use multiple indexes
 etc. this (visiting a page only once) is useful.

 In other words, I'd like to see the indexscan broken up into: (1) an
 operator that returns a list of TIDs, (2) Sort the TIDs and (3) an
 operator that fetches heap tuples from the sorted TID list.

I'm uncertain about the potential benefit of this. Isn't/shouldn't the
effects of caching be assisting us here?

Gavin

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster

Re: [HACKERS] plans for bitmap indexes?

2004-10-17 Thread Mark Kirkwood

Tom Lane wrote:
I believe that the term bitmap index is also used with a different
meaning wherein it actually does describe a particular kind of on-disk
index structure, with one bit per table row.
IMHO building in-memory bitmaps (the first idea) is a very good idea to
pursue for Postgres.  I'm not at all sold on on-disk bitmap indexes,
though ... those I suspect *are* sufficiently replaced by partial
indexes.
 

I believe that the benefit of on-disk bitmap indexes is supposed to be 
reduced storage size (compared to btree).

In the cases where I have put them to use, they certainly occupy 
considerably less disk than a comparable btree index - provided there 
are not too many district values in the indexed column.

regards
Mark
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster

Re: [HACKERS] plans for bitmap indexes?

2004-10-15 Thread Tom Lane

Alvaro Herrera [EMAIL PROTECTED] writes:
 On Thu, Oct 14, 2004 at 11:08:54PM +0200, Yann Michel wrote:
 BTW: Is there any more documented CVS-version available? I mean it would
 be really nice to read some comments from time to time or at least more
 comments about each function/method's purpose or functionality.

 Huh, the code is reasonably commented.  Certainly not following
 Javadoc-like standards, but it can be read.

Also, have you read the Internals volume of the SGML docs?  Mostly
pretty high-level stuff, but that's what you need to get started.
The developer's FAQ is also required reading for newbies.
There are also README files in various parts of the source tree that
provide information about various sub-modules.

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])

Re: [HACKERS] plans for bitmap indexes?

2004-10-15 Thread Yann Michel

Hi Tom,

On Fri, Oct 15, 2004 at 11:27:05AM -0400, Tom Lane wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:
  On Thu, Oct 14, 2004 at 11:08:54PM +0200, Yann Michel wrote:
  BTW: Is there any more documented CVS-version available? I mean it would
  be really nice to read some comments from time to time or at least more
  comments about each function/method's purpose or functionality.
 
  Huh, the code is reasonably commented.  Certainly not following
  Javadoc-like standards, but it can be read.
 
 Also, have you read the Internals volume of the SGML docs?  Mostly
 pretty high-level stuff, but that's what you need to get started.
 The developer's FAQ is also required reading for newbies.
 There are also README files in various parts of the source tree that
 provide information about various sub-modules.

I have not jet been reading all of it but some of the README files. I
will keep that hint in mind but first of all I'll read something about
bitmap compression and other relevant techniques before starting to
discover the index internals of postgresql... ;-) I've been using all
kinds of functions in oracle for a long time but never had the
experience to implement any indexing strategies. The only thing I did
were some operating system extensions for minix during my os-studies
(scheduling, driver, acl etc.)

If there is anything additional/special to know further, I apreciate any
hints.

Regards and thanks,
Yann

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings

Re: [HACKERS] plans for bitmap indexes?

2004-10-14 Thread Zeugswetter Andreas DAZ SD


  create index people_male_gay_ix on people (city) where gender = 'male' and
  orientation = 'gay';
 
 You've forgotten part of my premise (based on a real case I discussed on IRC) 
 that there are EIGHTEEN criteria columns.

That is why I said maybe :-) Whether it helps depends on the number of 
actually (often) used access patterns.

   That would require, by the method 
 you have above, roughly 18(3rd factorial) indexes, times the number of values 
 allowed by each column, which if it averaged, say 5 values, would be 24,480 

Well, an index only needs to reduce the cost enough so that you can afford your 
workload and have reasonable response times, so you might only need to create a 
few of them.

I was actually only trying to help optimize this example without the bitmap index
feature, not trying to say that for this example partial indexes are better.
Especially since the first example, that mentioned partial indexes was not 
the way to do it for a value that represents a large part of your table (here approx 
50%).
(don't do: create index people_male on people (gender) where gender='male';)

Andreas

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings

Re: [HACKERS] plans for bitmap indexes?

2004-10-14 Thread Yann Michel

Hi,

On Sat, Oct 09, 2004 at 01:31:36PM -0400, Chris Browne wrote:
 
 The most nearly comparable thing is be the notion of partial
 indexes, where, supposing you had 60 region codes (e.g. - 50 US
 states, 10 Canadian provinces), you might set up indices thus:
 
[...]
 
 The partial indexes will not ALWAYS be useful; in cases where they
 aren't, it is not inconceivable that there are improvements to be made
 in the query optimizer...

So what you are suggesting here is the tree-fashioned-static way of
real bitmap indexes. I.E. each time a new value is inserted vor any kind
of thus indexes column you have to create a new index which is not very
usefull as you can think of. In addition nothing about the real
granularity is known to the optimizer to let it guess the best execution
plan, i.e. to do a full table scan or use an index. That means if one
attributes value is representative for 80 percent it is usefull to do a
full table scan whereas if its value is representative for only 5
percent the index might be better. But as I understood the partial index
concept, no statistics for value representation are available.

Therefore I started to do read some articles and books about bitmap
index implementations to maby contribute... we will see...

BTW: Is there any more documented CVS-version available? I mean it would
be really nice to read some comments from time to time or at least more
comments about each function/method's purpose or functionality.

Regards,
Yann

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] plans for bitmap indexes?

2004-10-14 Thread Alvaro Herrera

On Thu, Oct 14, 2004 at 11:08:54PM +0200, Yann Michel wrote:

 BTW: Is there any more documented CVS-version available? I mean it would
 be really nice to read some comments from time to time or at least more
 comments about each function/method's purpose or functionality.

Huh, the code is reasonably commented.  Certainly not following
Javadoc-like standards, but it can be read.

-- 
Alvaro Herrera (alvherre[a]dcc.uchile.cl)
La rebeldía es la virtud original del hombre (Arthur Schopenhauer)


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match

Re: [HACKERS] plans for bitmap indexes?

2004-10-13 Thread Neil Conway

On Sun, 2004-10-10 at 03:36, Chris Browne wrote:
 There are doubtless cases where the optimizer won't use them where it
 would be plausible to do so; that suggests, to me, possibilities for
 enhancing the optimizer.

Speaking of which, if anyone has any examples of queries for which we
ought to be able to use a partial index but currently cannot, please
speak up (or mail me privately).

-Neil



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])

Re: [HACKERS] plans for bitmap indexes?

2004-10-13 Thread Zeugswetter Andreas DAZ SD


  The most nearly comparable thing is be the notion of partial
  indexes, where, supposing you had 60 region codes (e.g. - 50 US
  states, 10 Canadian provinces), you might set up indices thus:

 For example, imagine you have a table on a dating website with 18 columns 
 representing 18 different characteristics for matching.  Imagine that you 
 index each of those columns seperately. If you do:
 
 SELECT * FROM people WHERE orientation = 'gay' AND gender = 'male' AND city = 
 'San Francisco';

I think bitmap indexes do have valid use cases, but partitioned indexes
are really a wonderful feature with a lot of use cases, maybe including this one.

Workable examples for useful partitioned indexes, that help here are:

create index people_male_ix on people (city) where gender = 'male';
create index people_gay_ix on people (city) where orientation = 'gay';
create index people_male_gay_ix on people (city) where gender = 'male' and orientation 
= 'gay';

Note, that the indexed column differs from the partitioning clause.
Note also, that the last index will perform way better than a combo of bitmap indexes.

Andreas

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster

Re: [HACKERS] plans for bitmap indexes?

2004-10-13 Thread Hannu Krosing

On K, 2004-10-13 at 00:09, Greg Stark wrote:
 Josh Berkus [EMAIL PROTECTED] writes:
 
  SELECT * FROM people WHERE orientation = 'gay' AND gender = 'male' AND city = 
  'San Francisco';
 
 There are actually two TODOs here.
 
 1) a bitmap scan that would be usable with any type of index. The tuple
locations can be read in for each criteria and sorted by location and built
into bitmaps. The results can be combined using bitmap operations and the
tuples scanned in physical order.
 
 2) A persistent bitmap index that would enable skipping the first step of the
above.
 
 In the case if all the columns were btree indexes it might still make sense to
 scan through all the indexes and combine the results before reading in the
 actual tuples. Especially if the tuples are very wide and each column
 individually very unselective, but the combination very selective.
 
 However it would work even better if gender and orientation could be stored on
 disk in a bitmap representation. They're very low cardinality and could be
 stored quite compactly. The result would read the data faster, skip the sort,
 and be able to start returning tuples even before it finished reading the
 entire index.

We could go even further and use the same bm indexes for selecting the
page where the tuple is stored (found by AND of all bitmap indexes plus 
fsm) and achieve natural clustering 

If this is done consistently we need only 1 bit/page in our index 
(straight bitmap for 1GB fits in  16384 kb or 4 database pages)

This approach may result in poor utilisation of database pages for small
tables, but one would not use bitmap indexes for small tables in the 
first place.

--
Hannu





---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])

Re: [HACKERS] plans for bitmap indexes?

2004-10-13 Thread Reini Urban

Josh Berkus schrieb:
The most nearly comparable thing is be the notion of partial
indexes, where, supposing you had 60 region codes (e.g. - 50 US
states, 10 Canadian provinces), you might set up indices thus:
I'm afraid that you're mistaken about the functionality of bitmap indexes.   
The purpose of a bitmap index is not to partition an index, but to allow 
multiple indexes to be used in the same operation.
uh. sorry! In my first harsh replay I didn't know that. I thought you 
wanted a new index type for binary images in BLOBS.
(just some hash, maybe optimized for image similarity)

For example, imagine you have a table on a dating website with 18 columns 
representing 18 different characteristics for matching.  Imagine that you 
index each of those columns seperately. If you do:

SELECT * FROM people WHERE orientation = 'gay' AND gender = 'male' AND city = 
'San Francisco';

... then the planner can use an index on orientation OR on gender OR on city, 
but not all three.   Multicolumn indexes are no solution for this use case 
because you'd have to create a multicolumn index for each possible combo of 
two or three columns ( 18! ).   

The Bitmap index allows the query executor to use several indexes on the same 
operation, comparing them and selecting rows where they overlap like a Venn 
diagram.
--
Reini Urban
http://xarch.tu-graz.ac.at/home/rurban/
---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match

Re: [HACKERS] plans for bitmap indexes?

2004-10-13 Thread Tom Lane

Zeugswetter Andreas DAZ SD [EMAIL PROTECTED] writes:
 Workable examples for useful partitioned indexes, that help here are:

 create index people_male_ix on people (city) where gender = 'male';
 create index people_gay_ix on people (city) where orientation = 'gay';
 create index people_male_gay_ix on people (city) where gender = 'male' and 
 orientation = 'gay';

 Note, that the indexed column differs from the partitioning clause.
 Note also, that the last index will perform way better than a combo of bitmap 
 indexes.

This is definitely a useful technique in some cases, but it's got its
limits.  You have to have only a fairly small number of interesting
conditions (else the number of indexes gets out of hand) and those
conditions have to be spelled out explicitly in the query.  That is,
the last index will indeed work for
SELECT * FROM people WHERE orientation = 'gay'
AND gender = 'male' AND city = 'San Francisco';
but it will not work for
SELECT * FROM people WHERE orientation = $1
AND gender = $2 AND city = $3;
which is the sort of thing that the planner is increasingly going to
have to deal with.

Combining bitmaps at runtime is certainly somewhat more expensive to
execute, but it can deal with cases where the specific values being
searched for are not known until runtime.

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [HACKERS] plans for bitmap indexes?

2004-10-13 Thread Josh Berkus

Andreas,

 I think bitmap indexes do have valid use cases, but partitioned indexes
 are really a wonderful feature with a lot of use cases, 

Sure, no question.  That's why we have them.

 maybe including 
 this one.

Nope, not at all.

 Workable examples for useful partitioned indexes, that help here are:

 create index people_male_ix on people (city) where gender = 'male';
 create index people_gay_ix on people (city) where orientation = 'gay';
 create index people_male_gay_ix on people (city) where gender = 'male' and
 orientation = 'gay';

You've forgotten part of my premise (based on a real case I discussed on IRC) 
that there are EIGHTEEN criteria columns.   That would require, by the method 
you have above, roughly 18(3rd factorial) indexes, times the number of values 
allowed by each column, which if it averaged, say 5 values, would be 24,480 
indexes.   A little impractical, hmmm?   I think that might even break a 
system limit somewhere.

Tom,

 Note that what Josh is describing is not really a distinct index type,
 but a different way of using an index: that is, you pull candidate tuple
 locations from several indexes and intersect or union those sets before
 you go to the heap.  In principle this works whatever the index access
 methods are. 

Yes, exactly.They're known as bitmap indexes because that's how Oracle 
implemented them, and AFAIK only Oracle currently has anything analogous.   
I'd personally be interested in any scheme that allowed the relatively 
efficient usage of multiple indexes on a single operation.

BTW, Tom, I was talking to Sean last night and he was saying that our current 
planner cost calculations assume that a 2-column index fetch will be twice as 
expensive as a 1-column index fetch.   Is this right?

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] plans for bitmap indexes?

2004-10-13 Thread Tom Lane

Josh Berkus [EMAIL PROTECTED] writes:
 BTW, Tom, I was talking to Sean last night and he was saying that our
 current planner cost calculations assume that a 2-column index fetch
 will be twice as expensive as a 1-column index fetch.  Is this right?

No.

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Re: [HACKERS] plans for bitmap indexes?

2004-10-12 Thread Chris Browne

[EMAIL PROTECTED] (Josh Berkus) writes:
  Lots of people have talked about it but I don't know anyone coding it.

 I would love to have bitmap indexes in Postgres, as would a lot of other 
 community members.   However, they are far from trivial to code.  Are you 
 offering to help?

I'm curious as to whether partial indexes might suffice as an
alternative.

There are doubtless cases where the optimizer won't use them where it
would be plausible to do so; that suggests, to me, possibilities for
enhancing the optimizer.
-- 
let name=cbbrowne and tld=cbbrowne.com in String.concat @ [name;tld];;
http://www.ntlug.org/~cbbrowne/linuxxian.html
A VAX is virtually a computer, but not quite.

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match

Re: [HACKERS] plans for bitmap indexes?

2004-10-12 Thread Chris Browne

[EMAIL PROTECTED] (Yann Michel) writes:
 On Fri, Oct 08, 2004 at 10:09:18AM +0100, Dave Page wrote:
 I think what Reini was asking was why do you think you need bitmap
 indexes as opposed to any existing type? 

 due to I'm developing a datawarehousing application we have lots of
 fact-data in our central fact-table. As I know how to improve
 performance on Oracle based datawarehouses, I'm used to add bitmap
 indexes for atributes having only a few distinct values. So I was
 looking for any comparable indexing technology but didn't find any
 so far.

The most nearly comparable thing is be the notion of partial
indexes, where, supposing you had 60 region codes (e.g. - 50 US
states, 10 Canadian provinces), you might set up indices thus:

TABLE=my_table
FIELD=stateprov
FILE=$HOME/regionlist.txt
for region in `cat $FILE`; do
  query=create index ${TABLE}_partial_on_${region} on $TABLE($FIELD) where $FIELD = 
'$region';
  echo $query | psql -d datawarehouse
done

That would set up 60 (or whatever $HOME/regionlist.txt indicated)
partial indices on the table on that field.

By the way, I thought ahead a little, in that; doing the same thing
for country codes might be as easy as replacing:

FIELD=country
FILE=$HOME/countrylist.txt

The partial indexes will not ALWAYS be useful; in cases where they
aren't, it is not inconceivable that there are improvements to be made
in the query optimizer...
-- 
let name=cbbrowne and tld=cbbrowne.com in String.concat @ [name;tld];;
http://www.ntlug.org/~cbbrowne/linuxxian.html
A VAX is virtually a computer, but not quite.

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html

Re: [HACKERS] plans for bitmap indexes?

2004-10-12 Thread Greg Stark


Josh Berkus [EMAIL PROTECTED] writes:

 SELECT * FROM people WHERE orientation = 'gay' AND gender = 'male' AND city = 
 'San Francisco';

There are actually two TODOs here.

1) a bitmap scan that would be usable with any type of index. The tuple
   locations can be read in for each criteria and sorted by location and built
   into bitmaps. The results can be combined using bitmap operations and the
   tuples scanned in physical order.

2) A persistent bitmap index that would enable skipping the first step of the
   above.

In the case if all the columns were btree indexes it might still make sense to
scan through all the indexes and combine the results before reading in the
actual tuples. Especially if the tuples are very wide and each column
individually very unselective, but the combination very selective.

However it would work even better if gender and orientation could be stored on
disk in a bitmap representation. They're very low cardinality and could be
stored quite compactly. The result would read the data faster, skip the sort,
and be able to start returning tuples even before it finished reading the
entire index.

-- 
greg


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] plans for bitmap indexes?

2004-10-12 Thread Tom Lane

Josh Berkus [EMAIL PROTECTED] writes:
 The Bitmap index allows the query executor to use several indexes on
 the same operation, comparing them and selecting rows where they
 overlap like a Venn diagram.

Note that what Josh is describing is not really a distinct index type,
but a different way of using an index: that is, you pull candidate tuple
locations from several indexes and intersect or union those sets before
you go to the heap.  In principle this works whatever the index access
methods are.

I believe that the term bitmap index is also used with a different
meaning wherein it actually does describe a particular kind of on-disk
index structure, with one bit per table row.

IMHO building in-memory bitmaps (the first idea) is a very good idea to
pursue for Postgres.  I'm not at all sold on on-disk bitmap indexes,
though ... those I suspect *are* sufficiently replaced by partial
indexes.

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] plans for bitmap indexes?

2004-10-12 Thread Gaetano Mendola

Josh Berkus wrote:
Chris,

The most nearly comparable thing is be the notion of partial
indexes, where, supposing you had 60 region codes (e.g. - 50 US
states, 10 Canadian provinces), you might set up indices thus:

I'm afraid that you're mistaken about the functionality of bitmap indexes.   
The purpose of a bitmap index is not to partition an index, but to allow 
multiple indexes to be used in the same operation.

For example, imagine you have a table on a dating website with 18 columns 
representing 18 different characteristics for matching.  Imagine that you 
index each of those columns seperately. If you do:

SELECT * FROM people WHERE orientation = 'gay' AND gender = 'male' AND city = 
'San Francisco';

... then the planner can use an index on orientation OR on gender OR on city, 
but not all three.   Multicolumn indexes are no solution for this use case 
because you'd have to create a multicolumn index for each possible combo of 
two or three columns ( 18! ).
I'm wondering if in some cases the following query could be more efficient
select * from people where orientation = 'gay'
intersect
select * from people where gender = 'male'
intersect
select * from people where city =


Regards
Gaetano Mendola
---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html

Re: [HACKERS] plans for bitmap indexes?

2004-10-08 Thread Yann Michel

Hi,

On Thu, Oct 07, 2004 at 06:54:15PM -0400, Bruce Momjian wrote:
  I'd like to know if there are any plans on introducing bitmap indexes
  into postgresql. I think this could mean a big performance improvement
  especially for datawarehousing applications. I know that there is an
  index type hash but I don't know how both types are comparable due to
  they are both best usable for equality expressions.
 
 Lots of people have talked about it but I don't know anyone coding it.

have you ever discussed if bitmap indexes lead to better query
performance than hash indexes will do?

Regards,
Yann

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match

Re: [HACKERS] plans for bitmap indexes?

2004-10-08 Thread Dave Page

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Yann Michel
 Sent: 08 October 2004 08:28
 To: Reini Urban
 Cc: [EMAIL PROTECTED]
 Subject: Re: [HACKERS] plans for bitmap indexes?

 I don't know what this means to my questin, but anyway...

  If you don't want to code that in the app, the database 
 backend is the 
  solution. The database is the golden hammer.

 Therefore I asked wheather there are any thoughts of 
 implementing them.

I think what Reini was asking was why do you think you need bitmap
indexes as opposed to any existing type? 

Regards, Dave.

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] plans for bitmap indexes?

2004-10-08 Thread Yann Michel

Hi,

On Fri, Oct 08, 2004 at 10:09:18AM +0100, Dave Page wrote:
 I think what Reini was asking was why do you think you need bitmap
 indexes as opposed to any existing type? 

due to I'm developing a datawarehousing application we have lots of
fact-data in our central fact-table. As I know how to improve
performance on Oracle based datawarehouses, I'm used to add bitmap
indexes for atributes having only a few distinct values. 
So I was looking for any comparable indexing technology but didn't find
any so far.

Regards,
Yann

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])

Re: [HACKERS] plans for bitmap indexes?

2004-10-08 Thread Hannu Krosing

On R, 2004-10-08 at 12:45, Yann Michel wrote:
 Hi,
 
 On Fri, Oct 08, 2004 at 10:09:18AM +0100, Dave Page wrote:
  I think what Reini was asking was why do you think you need bitmap
  indexes as opposed to any existing type? 
 
 due to I'm developing a datawarehousing application we have lots of
 fact-data in our central fact-table. As I know how to improve
 performance on Oracle based datawarehouses, I'm used to add bitmap
 indexes for atributes having only a few distinct values. 
 So I was looking for any comparable indexing technology but didn't find
 any so far.

There is currently no suitable index type for this type of queries (huge
tables with a few distinct values).

You may try to optimise performance by partitioning your fact tables on
these few dimension values by using table inheritance or union all
views.

There was a discussion on partitioning postgres tables on
pgsql-performance list a few days ago.

-
Hannu


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match

Re: [HACKERS] plans for bitmap indexes?

2004-10-08 Thread Oleg Bartunov

On Fri, 8 Oct 2004, Yann Michel wrote:
Hi,
On Fri, Oct 08, 2004 at 10:09:18AM +0100, Dave Page wrote:
I think what Reini was asking was why do you think you need bitmap
indexes as opposed to any existing type?
due to I'm developing a datawarehousing application we have lots of
fact-data in our central fact-table. As I know how to improve
performance on Oracle based datawarehouses, I'm used to add bitmap
indexes for atributes having only a few distinct values.
So I was looking for any comparable indexing technology but didn't find
any so far.
Yann, you may try our contrib/intarray module
Regards,
Yann
---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Regards,
Oleg
_
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings

Re: [HACKERS] plans for bitmap indexes?

2004-10-08 Thread Josh Berkus

Yann,

  Lots of people have talked about it but I don't know anyone coding it.

I would love to have bitmap indexes in Postgres, as would a lot of other 
community members.   However, they are far from trivial to code.  Are you 
offering to help?

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html

Re: [HACKERS] plans for bitmap indexes?

2004-10-08 Thread Yann Michel

Hi Josh,

On Fri, Oct 08, 2004 at 09:59:41AM -0700, Josh Berkus wrote:
 
   Lots of people have talked about it but I don't know anyone coding it.
 
 I would love to have bitmap indexes in Postgres, as would a lot of other 
 community members.   However, they are far from trivial to code.  Are you 
 offering to help?

I'd like to help you, but I think, that my C-Experience is not good
enough for beeing able to. I mean, I coded some C-stuff and I know how
bitmap indexes (should) work but I guess that this won't be enough. In
addidtion I never took a look into postgresql's sources. 

Regards,
Yann

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] plans for bitmap indexes?

2004-10-08 Thread Josh Berkus

Yann,

 I'd like to help you, but I think, that my C-Experience is not good
 enough for beeing able to. I mean, I coded some C-stuff and I know how
 bitmap indexes (should) work but I guess that this won't be enough. In
 addidtion I never took a look into postgresql's sources.

Well, there's no time like the present to get started!  ;-)

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html

Re: [HACKERS] plans for bitmap indexes?

2004-10-08 Thread Yann Michel

Hi Josh,

On Fri, Oct 08, 2004 at 10:18:27AM -0700, Josh Berkus wrote:
 
  I'd like to help you, but I think, that my C-Experience is not good
  enough for beeing able to. I mean, I coded some C-stuff and I know how
  bitmap indexes (should) work but I guess that this won't be enough. In
  addidtion I never took a look into postgresql's sources.
 
 Well, there's no time like the present to get started!  ;-)

O.K. I downloaded it :-) 
We will see if and how I can help 

Regards,
Yann

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Re: [HACKERS] plans for bitmap indexes?

2004-10-08 Thread Sailesh Krishnamurthy

 Yann == Yann Michel [EMAIL PROTECTED] writes:

Yann O.K. I downloaded it :-) We will see if and how I can
Yann help

FYI .. in case you aren't aware already:

http://portal.acm.org/citation.cfm?id=98720

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings

[HACKERS] plans for bitmap indexes?

2004-10-07 Thread Yann Michel

Hi,

I'd like to know if there are any plans on introducing bitmap indexes
into postgresql. I think this could mean a big performance improvement
especially for datawarehousing applications. I know that there is an
index type hash but I don't know how both types are comparable due to
they are both best usable for equality expressions.

Thanks in advance!

Regards,
Yann

---(end of broadcast)---
TIP 8: explain analyze is your friend

Re: [HACKERS] plans for bitmap indexes?

2004-10-07 Thread Reini Urban

Yann Michel schrieb:
I'd like to know if there are any plans on introducing bitmap indexes
into postgresql. I think this could mean a big performance improvement
especially for datawarehousing applications. I know that there is an
index type hash but I don't know how both types are comparable due to
they are both best usable for equality expressions.
Sure, and the next will come with musicbrainz.
Why not :)
If you don't want to code that in the app, the database backend is the 
solution. The database is the golden hammer. 
http://c2.com/cgi/wiki?GoldenHammer
--
Reini Urban
http://xarch.tu-graz.ac.at/home/rurban/

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster

Re: [HACKERS] plans for bitmap indexes?

2004-10-07 Thread Bruce Momjian

Yann Michel wrote:
 Hi,
 
 I'd like to know if there are any plans on introducing bitmap indexes
 into postgresql. I think this could mean a big performance improvement
 especially for datawarehousing applications. I know that there is an
 index type hash but I don't know how both types are comparable due to
 they are both best usable for equality expressions.

Lots of people have talked about it but I don't know anyone coding it.
-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

57 matches

Mail list logo