Re: [PATCHES] On-disk bitmap index implementation

2006-12-27 Thread Heikki Linnakangas

Gavin Sherry wrote:

There are still some things Jie and I have not gotten to yet:
...
o Test WAL replay more thoroughly.


Found one WAL related bug:

postgres=# CREATE TABLE test (i int);
CREATE TABLE
postgres=# INSERT INTO test SELECT a FROM generate_series(1,10) a;
INSERT 0 10
postgres=# CREATE INDEX mdx ON test USING bitmap(i);
CREATE INDEX
postgres=# INSERT INTO test VALUES (11);
INSERT 0 1
postgres=# \q

killall -9 postgres, and restart. Redo fails with:

PANIC:  bm_insert_redo: LOV item is not inserted in pos 2(requested 12)
CONTEXT:  xlog redo insert a new LOV item: rel 1663/10817/16388

I haven't dug deeper yet.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PATCHES] On-disk bitmap index implementation

2006-12-27 Thread Gavin Sherry
On Wed, 27 Dec 2006, Heikki Linnakangas wrote:

 Gavin Sherry wrote:
  There are still some things Jie and I have not gotten to yet:
  ...
  o Test WAL replay more thoroughly.

 Found one WAL related bug:

 postgres=# CREATE TABLE test (i int);
 CREATE TABLE
 postgres=# INSERT INTO test SELECT a FROM generate_series(1,10) a;
 INSERT 0 10
 postgres=# CREATE INDEX mdx ON test USING bitmap(i);
 CREATE INDEX
 postgres=# INSERT INTO test VALUES (11);
 INSERT 0 1
 postgres=# \q

 killall -9 postgres, and restart. Redo fails with:

 PANIC:  bm_insert_redo: LOV item is not inserted in pos 2(requested 12)
 CONTEXT:  xlog redo insert a new LOV item: rel 1663/10817/16388

 I haven't dug deeper yet.

Yes, there were a bunch of WAL issues we wanted to address. Jie has been
working on this too. Thanks for the feedback, we can use this as a test.

Thanks,

Gavin

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] On-disk bitmap index implementation

2006-12-05 Thread Heikki Linnakangas

Gavin Sherry wrote:

o Improving VACUUM support -- currently, VACUUM FULL means REINDEX for
  bitmaps. Heikki Linnakangas offered to work on this. Heikki, are you
  still interested?


BTW vacuuming seems quite broken as it is:

[EMAIL PROTECTED]:~/pgsql.bitmap$ ~/pgsql.bitmap/bin/psql -a 
postgres  vacuumtest.sql

drop table if exists test;
DROP TABLE
create table test (key int);
CREATE TABLE
create index test_bm on test using bitmap (key);
CREATE INDEX
insert into test values (1);
INSERT 0 1
delete from test;
DELETE 1
vacuum test;
VACUUM
insert into test values (2);
INSERT 0 1
select * from test where key = 1;
 key
-
   2
(1 row)

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PATCHES] On-disk bitmap index implementation

2006-12-05 Thread Gavin Sherry
On Tue, 5 Dec 2006, Heikki Linnakangas wrote:

 Gavin Sherry wrote:
  o Improving VACUUM support -- currently, VACUUM FULL means REINDEX for
bitmaps. Heikki Linnakangas offered to work on this. Heikki, are you
still interested?

 BTW vacuuming seems quite broken as it is:

 [EMAIL PROTECTED]:~/pgsql.bitmap$ ~/pgsql.bitmap/bin/psql -a
 postgres  vacuumtest.sql
 drop table if exists test;
 DROP TABLE
 create table test (key int);
 CREATE TABLE
 create index test_bm on test using bitmap (key);
 CREATE INDEX
 insert into test values (1);
 INSERT 0 1
 delete from test;
 DELETE 1
 vacuum test;
 VACUUM
 insert into test values (2);
 INSERT 0 1
 select * from test where key = 1;
   key
 -
 2
 (1 row)

Oops :-).

Thanks for pointing it out. I think I might have busted something merging
with HEAD. Don't you hate that?

Thanks,

Gavin

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PATCHES] On-disk bitmap index implementation

2006-12-04 Thread Simon Riggs
On Tue, 2006-12-05 at 00:18 +1100, Gavin Sherry wrote:

 o Determine if we need to provide anything for rm_startup, rm_cleanup,
   rm_safe_restartpoint RmgrData function pointers.

safe_restartpoint gives true/false based upon whether there are
multi-record WAL states that have only been partially received. For
example, a btree index split needs multiple WAL records as it recurses
up the index tree. If you've got one record but not the others yet you
have an incomplete state and so cannot safely write a restartpoint.

I'll document that if you/anyone might suggest where the best place is?

 o Look into adding an AM option such that the user can determine word size
   at index creation time. For higher-cardinality data (above 1000 distinct
   values), 16 bit word sizes can really help with performance. Although
   the word size is not just assumed to be a certain size across the code,
   macros are used extensively to interact with the word size. Making it
   different for each index might be a little messy.

...and is is it a typical case to have a bitmap with less than 1000
distinct values?? Surely we want that as the sole assumption?

Nearly unique bitmaps can suffer a little I think, if it makes the most
common case faster. But I'd like to see the perf results first, I guess.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PATCHES] On-disk bitmap index implementation

2006-12-04 Thread Heikki Linnakangas

Gavin Sherry wrote:

Hi all,

Attached is a patch implementing bitmap indexes. It includes major
enhancements on the patch submitted during feature freeze for 8.2 here[1].

In particular: much better integration with the existing bitmap scan code
with the internals of the bitmap streaming pushed down into the AM and
hidden from the executor code; completely new index creation algorithm
which reduced creation time by 20-75% depending on the data; modifications
to the encoding mechanism to suit the integration with bitmap index scans;
work on memory management; lots of code rewriting; range query support.
The code is also much cleaner now.


Thanks! I'll take a look at it.

We need to give the indexam API some further thought. As you know, I've
been working on the Grouped Index Tuples stuff, which also requires
changes to the API to get full benefit. There's a bunch of functionality
I'd like to see:

* Support for streamed bitmaps, like you have implemented.

* Support for candidate matches. This is needed for GIT, as well as
range-encoded bitmap indexes if/when we add them.

* Support for returning tuples in partial order. This is again needed 
for GIT, because grouped tuples don't keep track of the ordering within 
the group, so they need to be sorted if ordering necessary. And again 
it's also useful to return tuples in order from range-encoded bitmaps.


* Support for kill_prior_tuple on bitmap scans.

* A bulk insert API. When inserting a lot of tuples with similar keys, 
we could a considerable amount of CPU with a bulk insert API. A bulk 
insert to a B-tree for example would only need to descend the tree once, 
find the insert location, lock the target page just once and insert all 
the tuples that belong to that page. That would potentially also reduce 
WAL traffic.



There are still some things Jie and I have not gotten to yet:

o Improving VACUUM support -- currently, VACUUM FULL means REINDEX for
  bitmaps. Heikki Linnakangas offered to work on this. Heikki, are you
  still interested?


Yeah, I can look into that.


o Test WAL replay more thoroughly.


I've had that problem too with a lot of things I've hacked. I've used a 
shell script that does the operation under test, runs a select, kills 
and restarts postmaster, and reruns the select. If the select after 
crash returns the same result as before, presumably WAL code works. But 
you need to watch out for full page writes that might mask bugs in the 
redo code.


Anyone have a more sophisticated method?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PATCHES] On-disk bitmap index implementation

2006-12-04 Thread Heikki Linnakangas

Heikki Linnakangas wrote:

We need to give the indexam API some further thought. As you know, I've
been working on the Grouped Index Tuples stuff, which also requires
changes to the API to get full benefit. There's a bunch of functionality
I'd like to see:

* Support for streamed bitmaps, like you have implemented.

* Support for candidate matches. This is needed for GIT, as well as
range-encoded bitmap indexes if/when we add them.

* Support for returning tuples in partial order. This is again needed 
for GIT, because grouped tuples don't keep track of the ordering within 
the group, so they need to be sorted if ordering necessary. And again 
it's also useful to return tuples in order from range-encoded bitmaps.


* Support for kill_prior_tuple on bitmap scans.

* A bulk insert API. When inserting a lot of tuples with similar keys, 
we could a considerable amount of CPU with a bulk insert API. A bulk 
insert to a B-tree for example would only need to descend the tree once, 
find the insert location, lock the target page just once and insert all 
the tuples that belong to that page. That would potentially also reduce 
WAL traffic.


Forgot one:

* Ability return index tuple contents, not just pointers to heap, to 
allow the executor to use the values stored in the index, see 
http://archives.postgresql.org/pgsql-performance/2006-09/msg00080.php



--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PATCHES] On-disk bitmap index implementation

2006-12-04 Thread Gavin Sherry
On Mon, 4 Dec 2006, Simon Riggs wrote:

 On Tue, 2006-12-05 at 00:18 +1100, Gavin Sherry wrote:

  o Determine if we need to provide anything for rm_startup, rm_cleanup,
rm_safe_restartpoint RmgrData function pointers.

 safe_restartpoint gives true/false based upon whether there are
 multi-record WAL states that have only been partially received. For
 example, a btree index split needs multiple WAL records as it recurses
 up the index tree. If you've got one record but not the others yet you
 have an incomplete state and so cannot safely write a restartpoint.

 I'll document that if you/anyone might suggest where the best place is?

transam/README ?


  o Look into adding an AM option such that the user can determine word size
at index creation time. For higher-cardinality data (above 1000 distinct
values), 16 bit word sizes can really help with performance. Although
the word size is not just assumed to be a certain size across the code,
macros are used extensively to interact with the word size. Making it
different for each index might be a little messy.

 ...and is is it a typical case to have a bitmap with less than 1000
 distinct values?? Surely we want that as the sole assumption?

 Nearly unique bitmaps can suffer a little I think, if it makes the most
 common case faster. But I'd like to see the perf results first, I guess.

I'll put together some performance data on different word sizes.

Thanks,

Gavin

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PATCHES] On-disk bitmap index implementation

2006-12-04 Thread Gavin Sherry
On Mon, 4 Dec 2006, Heikki Linnakangas wrote:

  o Test WAL replay more thoroughly.

 I've had that problem too with a lot of things I've hacked. I've used a
 shell script that does the operation under test, runs a select, kills
 and restarts postmaster, and reruns the select. If the select after
 crash returns the same result as before, presumably WAL code works. But
 you need to watch out for full page writes that might mask bugs in the
 redo code.

 Anyone have a more sophisticated method?

Well, I've done a combination of what you did and replaying a bunch of
operations using PITR.

Thanks,

Gavin

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match