Re: RFC : Correcting some problems in rounding/number handling

Christopher Browne Wed, 05 Jul 2000 18:57:22 -0700
On 05 Jul 2000 10:48:18 EST, the world broke into rejoicing as
Bill Gribble <[EMAIL PROTECTED]>  said:
> Christopher Browne <[EMAIL PROTECTED]> writes:
> > Come on, people.  The issue is _not_ what "object system" is being
> > used, or what language is being used, but rather _what the numeric
> > representation should be_.
> 
> I think most of your comments here are right on point.  However, I
> disagree with you on a couple of things.  I'd appreciate hearing your
> responses.

I'm overstating the case in a couple of places, but I'm not sure we're
_all_ that far apart.

> At the top level, I think the first thing to nail down is a numeric
> representation rather than a monetary-value representation.  There
> needs to be a layer that enforces restrictions such as "can't add
> dollars to pounds", but I believe that we must first solve the
> lower-level problem; it *is* possible to add a number denominated in
> hundredths to a number denominated in thousandths.
> 
> The lowest level of arithmetic operations should be agnostic about
> currencies, and so (I think) the data structure representing numeric
> values should have no information about currency in it.  The most
> primitive level of financial information, including currency, is ATM
> the 'split', and I believe that currency restrictions should remain in
> the split and in operations on splits.  

I'm of two minds here; there are two reasonable directions to take:

a) Provide a "generic" numeric representation, where I _do_ rather
like your:
         struct gnc_numeric {
           int64  numerator;
           int32  denominator;
         };
 
The other, that I'm growing to like a _lot_ better, is...

b) One that is strongly tied to the "commodity."

Thus, you have:

struct gnc_commodity_value {
   gint64 quantity;
   creference commodity_id;
};

along with

struct commodity_info {
   creference commodity_id;
   char *commodity_name;
   gint32 denominator;
};

(Note: I'm being a bit canny here, and not specifying the real type of
"creferences."  To Be Determined...)

The thing is, when you look at transactions, they are all going to be
denominated based on some commodity.  For instance, many of my accounts
will mostly use $USD transactions; others will use $CDN transactions.

What would be rather nice is to uniformly know that _all_ the USD
transactions are presented in the basic unit of _pennies_.  I can take
some transactions, thus:

gnc_commodity_value v1 = { 200, USD }, v2 = { 300, USD }, 
                    v3 = { 25000245, USD};
/* Representing $2.00, $3.00, and $250,002.45 */

Then I have commodity info:
commodity_info c1 = { USD, "USD $", 100}, c2 = { CDN, "CDN $", 100};

And the denominator of 100, for v1, v2, and v3, is _implicit_ in the
fact that they are all USD amounts.

If I'm storing the values in a relational database, and _don't_ have
the ability to create customized operators, it's a _very good thing_
that the denominator isn't part of the value, because it means I can
have the RDBMS do useful computations for me.

An identifiable reason to prefer this is that it is, in relational terms,
"more nearly normalized" than representation a).  If the denominator
can be computed, and always _will_ be, then it makes little sense to
replicate it in every transaction.
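
A minimal C sketch of representation b), under some loud assumptions:
creference is taken to be a plain int (its real type is left open
above), the commodity ids and the in-memory table are made up, and the
function names commodity_denominator and sum_commodity_values are
illustrative rather than anything in the engine.  It shows the
denominator being looked up once per commodity while the values
themselves are summed with ordinary integer addition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int creference;          /* hypothetical: the real type is still TBD */
enum { USD = 1, CDN = 2 };       /* hypothetical commodity ids */

struct commodity_info {
    creference  commodity_id;
    const char *commodity_name;
    int32_t     denominator;     /* smallest units per whole unit */
};

struct gnc_commodity_value {
    int64_t    quantity;         /* amount in smallest units, e.g. pennies */
    creference commodity_id;
};

static const struct commodity_info commodity_table[] = {
    { USD, "USD $", 100 },
    { CDN, "CDN $", 100 },
};

/* Return the denominator stored once per commodity, or 0 if the id is unknown. */
int32_t commodity_denominator(creference id)
{
    size_t i;
    for (i = 0; i < sizeof commodity_table / sizeof commodity_table[0]; i++) {
        if (commodity_table[i].commodity_id == id)
            return commodity_table[i].denominator;
    }
    return 0;
}

/* Add up n values that must all be in commodity `id`.
 * Returns 0 and stores the total on success, -1 on a commodity mismatch. */
int sum_commodity_values(const struct gnc_commodity_value *v, size_t n,
                         creference id, int64_t *total)
{
    int64_t sum = 0;
    size_t i;
    assert(v != NULL && total != NULL);
    for (i = 0; i < n; i++) {
        if (v[i].commodity_id != id)
            return -1;
        sum += v[i].quantity;    /* plain integer addition, no denominator math */
    }
    *total = sum;
    return 0;
}
```

Summing the three USD values above this way gives 25000745 pennies;
the denominator of 100 is only needed when the total is displayed.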

> Side note: In trying to diagram the architecture of gnucash as it
> exists now, we have discovered that there are several types of
> "financial restrictions" that are implemented and enforced in a
> variety of places in gnucash.  For example, subaccounts can only have
> a set of account-types that are related to the parent's type;
> transactions can't be entered if the splits don't have a common
> currency; and so on.  It may make sense to put all such restrictions
> on the values of particular objects/operations into a single "module"
> which lives at the engine level.

I strongly agree.   It would be a Very Good Thing to have the "Engine"
provide some group of "data validation" functions, so that the validation
is done by the _engine_ and not by the _GUI_.
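
As one illustration of what an engine-level validation function might
look like, here is a sketch of the "splits must share a common
currency" rule.  The struct split shown is a hypothetical, much-reduced
stand-in for the real split, creference is assumed to be an int, and
engine_splits_share_commodity is an invented name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int creference;      /* hypothetical: the real type is still TBD */

struct split {               /* hypothetical, much-reduced split */
    int64_t    quantity;
    creference commodity_id;
};

/* Returns 1 if every split uses the same commodity (an empty list passes),
 * otherwise 0.  The GUI would call this instead of checking by itself. */
int engine_splits_share_commodity(const struct split *splits, size_t n)
{
    size_t i;
    assert(splits != NULL || n == 0);
    for (i = 1; i < n; i++) {
        if (splits[i].commodity_id != splits[0].commodity_id)
            return 0;
    }
    return 1;
}
```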

> > -> The numeric amount should involve a "big integer," and a radix.
> 
> This may be a terminological misunderstanding, but I hear "radix" and
> I think "position of the decimal point."  If that's what you mean
> (which interpretation is borne out by your choice of uint8 for the
> radix) I disagree.  I think we need a denominator rather than a radix,
> and the denominator needs to be at least uint32.  In keeping with my
> preamble above, I want a representation that we can use for *all* the
> variables in financial formulas, including prices and
> fractional-share-amounts, and for that we need non-decimal
> denominators.

I'm thinking denominator here.  And modulo the consideration that I'd
rather have the denominator stored _ONCE_, as part of the "commodity,"
rather than in each transaction, I generally agree with your reasoning.

> Yes, most of the exchanges are going decimal (but not all -- see Jon
> Trowbridge's recent post), but the historical data will be around
> forever.

The "moving away from powers of 2" thing irritates me; the old style of
having 8ths, 16ths, and even 32nds allows doing _precise_ arithmetic
readily using binary numbers, and the move to decimal moves away from
that.
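
To make the point concrete, a small sketch assuming IEEE 754 doubles
(the helper name repeated_sum_is_exact is made up): an eighth is
exactly representable in binary, so adding it repeatedly stays exact,
while a tenth is not.

```c
#include <assert.h>

/* Returns 1 if adding `step` to itself `count` times gives exactly
 * `expected`, and 0 if rounding error has crept in. */
int repeated_sum_is_exact(double step, int count, double expected)
{
    double sum = 0.0;
    int i;
    assert(count >= 0);
    for (i = 0; i < count; i++)
        sum += step;
    return sum == expected;
}
```

With this, repeated_sum_is_exact(0.125, 8, 1.0) holds, whereas
repeated_sum_is_exact(0.1, 10, 1.0) does not.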

> Jon has suggested using a small number of bits of the "big integer" as
> an index into a table of the (finite) possible values for the
> denominator.  This lets us use just one 64-bit int as the number.  I
> don't think I like this (why restrict values, and is the C compiler
> smart enough to enforce type safety?) but it's a possibility.

My preference is for the denominator to come as a 'reference;' from the
"currency" or "commodity" info, you would cross-reference to the
denominator.  Which means that it only gets stored _once_, and makes
it perfectly reasonable to use all 64 bits to represent numerator.

> > -> There should be some indication of the currency that is involved.
> 
> I disagree with this.  I think the currency information belongs at the
> level where currency actually comes in to play; that is, at the level
> of the single financial event or journal entry.  "You can't add pounds
> to dollars" is a policy statement from a particular domain; I think it
> makes sense to provide the mechanism at the lowest level and enforce
> policy from above.  This is in keeping with the philosophy that has
> been used throughout the engine.

Any amount that we work with _will_ represent some commodity.  There
will _always_ be _at least_ one, if not two, commodities involved.
Two commodities if we're talking Inventory, Stocks, or Currency
Exchange.

> > If others have suggestions for _data structures_, feel free to 
> > suggest reasons why your _data structure_ is preferable to one
> > of the above.
> 
> Well, it's not so much a question of data structures as it is of data
> semantics and the functional properties of the API.
> 
> The data structure I prefer is 
> 
> struct gnc_numeric {
>   int64  numerator;
>   int32  denominator;
> };
> 
> This looks like a rational number, and it is, but the key thing is the
> way these structures are manipulated by the arithmetic routines that
> use them.  I don't think traditional rational-number semantics are
> appropriate (i.e. finding a relatively-prime numerator and denominator
> after every operation) and we won't always be performing operations
> that result in exact answers... one of the main reasons for this whole
> design exercise is to have a representation that can handle the
> *inexact* nature of financial transactions, which operate in whole
> numbers of smallest-transaction-units even if the actual computed cost
> of the transaction is an exact value which is not a whole number of
> smallest-transaction-units.

I have _no_ disagreement with the "don't reduce the fraction" side of
things; if the amount is $1.20 USD, that should _always_ look like {120,
100}, and should _not_ get reduced to the "theoretically equivalent"
{6, 5}.

However, I think I'd rather have {120, USD}, and then look up that 
USD indicates {"USD $", 100}...

> The API should include a number of features that you don't discuss:
> 
> - Control over denominator-conversion policy.
> 
> Somehow (either by pre-setting the denominator of the result-struct or
> passing in an extra argument) we need to be able to specify what
> happens when we operate on arguments with different denominators.  For
> example, multiplying a number of shares(1000ths) by a price (64ths) to
> get a total value (pennies), we need to use ceil() to get the
> next-highest whole penny.  We should be able to do this without a
> bunch of other nonsense.  

There needs to be a way of having policy; if we have:

commodity_info cusd = { USD, "USD $", 100 }, cstock = { ABCINC, "ABC Inc", 64 };
gnc_commodity_value vabc = { 172, ABCINC };  /* 2 44/64 shares of ABC */
and then try to value that stock in dollars, when the price is $2.55 USD
per share, I'd hope to see something like:

mass_value = ( vabc.quantity * 255 ) / 64
         (or 685)
and then pass that back looking like:

gnc_commodity_value value_of_abc_in_usd;
value_of_abc_in_usd.quantity = 685;
value_of_abc_in_usd.commodity_id = USD;

Note that _all_ of this stuff winds up using integer operations.
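
The hand calculation above can be sketched as a single integer-only
function.  The name value_in_minor_units and its parameters are
illustrative assumptions; it truncates toward zero, which is what the
685 above reflects (the exact figure is 685.3125), and it does not
check for overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Value `quantity` units of a commodity, held in 1/unit_denominator units,
 * at `price_minor` minor currency units (e.g. pennies) per whole unit.
 * Integer division truncates toward zero.  Overflow is not checked. */
int64_t value_in_minor_units(int64_t quantity, int32_t unit_denominator,
                             int64_t price_minor)
{
    assert(unit_denominator > 0);
    return (quantity * price_minor) / unit_denominator;
}
```

Here value_in_minor_units(172, 64, 255) yields 685, i.e. $6.85 USD.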

> this example assumes a pointer-based API, but we could just as easily
> hand around the entire structure and not have a return-argument.
> 
> In this example, the price is 63/64 and we are buying 1 share: 
> 
>   gnc_numeric  price      = { 63, 64 };
>   gnc_numeric  num_shares = { 1000, 1000 };
>   gnc_numeric  result     = { 0, 100 };
> 
>   gnc_numeric_multiply(&price, &num_shares, &result, GNC_CEIL);
> 
> result should contain { 99, 100 } because it costs 99 cents to buy a 
> share that's priced at .984375 .  
>
> Under other circumstances, we may want the operation to be carried out
> *exactly*, that is, to select a denominator for result so that it
> can be represented exactly.
> 
>   gnc_numeric value_1 = { 11, 13 };
>   gnc_numeric value_2 = { 9, 11 };
>   gnc_numeric result; 
> 
>   gnc_numeric_add(& value_1, & value_2, & result, GNC_EXACT);
> 
> .. with result ending up with { 238, 143 }.  
> 
> We may not need this ability right now, but it's not a big leap in
> complexity and more generality is better, IMO, as long as generality
> doesn't interfere with the specified design goals.

This generality interferes with the possibility of having an SQL
engine do the calculation for you, via:
  select sum( value ) from txntable where currency = 'USD'
   and date = something;

> - An interface to get information about truncation/promotion errors.  
> 
> When returning a result that does not exactly represent the result of
> a computation, we need to return information about the difference
> between the exact value and the returned value.  This may be through
> an alternate API that takes an extra return-argument pointer which
> holds an exact representation of the difference between the returned
> value and the exact result of the computation (which should be easily
> computed given the remainder of the API I discuss, at a little extra
> computational cost).

I suspect it would be more useful, in the long run, to have an optional
argument to the computation functions that allows requesting a 
particular rounding policy on demand.
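
A rough sketch of what such a policy argument could look like, kept to
integer operations.  The enum and function names here are invented for
illustration (they are not the GNC_CEIL / GNC_EXACT flags proposed
above), and the sketch only handles non-negative numerators.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical policy names, for illustration only. */
enum rounding_policy { ROUND_TRUNCATE, ROUND_CEILING, ROUND_HALF_UP };

/* Divide numerator by denominator under the requested policy.
 * Restricted to numerator >= 0 and denominator > 0 to keep the sketch short. */
int64_t divide_with_policy(int64_t numerator, int64_t denominator,
                           enum rounding_policy policy)
{
    int64_t quotient, remainder;
    assert(numerator >= 0 && denominator > 0);
    quotient  = numerator / denominator;
    remainder = numerator % denominator;
    switch (policy) {
    case ROUND_CEILING:
        if (remainder != 0)
            quotient++;
        break;
    case ROUND_HALF_UP:
        /* same as 2 * remainder >= denominator, without the overflow risk */
        if (remainder >= denominator - remainder)
            quotient++;
        break;
    case ROUND_TRUNCATE:
    default:
        break;
    }
    return quotient;
}
```

For the 63/64 price example quoted earlier, one share in 1000ths priced
in pennies is 6300000 / 64000: ROUND_CEILING gives 99, while
ROUND_TRUNCATE and ROUND_HALF_UP both give 98.
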
--
[EMAIL PROTECTED] - <http://www.ntlug.org/~cbbrowne/>
Rules of the Evil Overlord #103. "I will make it clear that I do know
the meaning of the word "mercy"; I simply choose not show them any."
<http://www.eviloverlord.com/>

--
Gnucash Developer's List
To unsubscribe send empty email to: [EMAIL PROTECTED]