On Wed, 2006-12-27 at 22:16 +1100, Gavin Sherry wrote: > > But actually I'm not convinced we need to worry about efficient storage > > of small bitmaps at all. The typical use case for bitmap indexes is > > large tables with small number of distinct values, and the problem > > doesn't really arise in that scenario. Let's keep it simple for now, we > > can enhance it in later releases. > > The scenario I'm concerned about is where a sales data base, say, has > 100,000 products. However, only 500 or 1000 products are popular. They > dominate, say >99% of the sales. The other 99,900 products consume a > little bit over 8K each for very little benefit :-(. > > This is pretty contrived but it seem real world enough...
Well, that seems to me to be the typical case. It's called a Zipf Distribution and applies to categorical data everywhere. The main question is how you design your database. We might think of indexing the product group (a higher level of the Product Dimension), since these tend to have a low number of values but these may have been normalised (or "placed into a Dimension table"). This might leave you with only the ProductId values in the large Fact table, which as Gavin points out, may be sparsely populated. Another idea might to store the first N heap pointers in the auxiliary heap, rather than allocating them a whole page for the first value. That would be even more space efficient than allocating a fixed size part of a page. At least that would provide some utility for that part of the storage mechanism. However, I think Heikki's KISG approach seems good for now. The infrequent values will probably be infrequently accessed, so we may never need to read them at all (wishful thinking?). If we can get something ready for commit soon, that will leave lots of time before 8.3 freeze to measure where things can be further improved in terms of space and performance. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate