Re: [HACKERS] Reducing the overhead of NUMERIC data

Simon Riggs Tue, 01 Nov 2005 15:05:16 -0800

On Tue, 2005-11-01 at 23:16 +0100, Martijn van Oosterhout wrote:

lots of useful things, thank you.


> > So, assuming I have this all correct, means we could reduce the on disk
> > storage for NUMERIC datatypes to the following struct. This gives an
> > overhead of just 2.5 bytes, plus the loss of the optimization of
> > trailing zeroes, which I assess as having almost no value anyway in
> > 99.9999% of data values (literally...).
> 
> Actually, I have a table with a column declared as numeric(12,4)
> because there has to be 4 decimal places. As it turns out, the decimal
> places are mostly zero so the optimisation works for me.

Of course it fits some data. The point is whether it is useful for most
people's data.

My contention is that *most* (but definitely nowhere near all) NUMERIC
data is either financial or measured data. That usually means it has
digits that follow Benfold's Law - which for this discussion is a
variant on a uniform random distribution.

Optimizing for trailing zeroes just isn't worth the very minimal
benefits, in most cases. It doesn't really matter that it saves on
storage and processing time in those cases - Amdahl's Law says we can
ignore that saving because the optimized case is not prevalent enough
for us to care.

Anybody like to work out a piece of SQL to perform data profiling and
derive the distribution of values with trailing zeroes? I'd be happy to
be proved wrong with an analysis of real data tables.

Best Regards, Simon Riggs


---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match

Re: [HACKERS] Reducing the overhead of NUMERIC data

Reply via email to