Tom Lane wrote:


I'm surprised no one caught me making this bogus computation.  I
realized this morning it's wrong: if there are 10000 distinct values
then on the average the 1-bits would be about 10000 bits apart, not 100.

Right - I didn't think 10000 was *that* bad, but was too sleepy to try working it out :-).



I don't believe the 100x numbers that have been
bandied around in this discussion, but 10x is plenty enough to be
interesting.


Yep - I have not managed to get 100x in any of my tests. However, I do see some about half that for the TPCH scale 10 dataset:

tpch=# \i relsizes.sql            (BTREE)
        relname         | relpages
------------------------+----------
 customer               |    41019
 customer_c_custkey     |     3288
 customer_c_mktsegment  |     5779
 customer_c_nationkey   |     3288
 lineitem               |  1535724
 lineitem_l_linenumber  |   131347
 lineitem_l_orderkey    |   131347
 orders                 |   307567
 orders_o_custkey       |    32847
 orders_o_orderpriority |    65876
 orders_o_orderstatus   |    41131


tpch=# \i relsizes.sql            (MAINLY BITMAP)
        relname         | relpages
------------------------+----------
 customer               |    41019
 customer_c_custkey     |     3288
 customer_c_mktsegment  |      157
 customer_c_nationkey   |      336
 lineitem               |  1535724
 lineitem_l_linenumber  |     7571
 lineitem_l_orderkey    |   131347
 orders                 |   307567
 orders_o_custkey       |    32847
 orders_o_orderpriority |     1427
 orders_o_orderstatus   |      717

The orders_o_orderpriority and orders_o_orderstatus bitmap indexes are 46 and 57 times smaller than their btree counterparts (hmm...might we see even better compression for larger scale factors?).

An obvious deduction is that the TPCH dataset is much more amenable to run compression than my synthetic Zipfian data was. The interesting question is how well "real" datasets are run compressable, I suspect "better than my Zipfian data" is a safe assumption!

Cheers

Mark

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Reply via email to