Hi all, CouchDB has always had a somewhat complicated relationship with 
numbers. I’d like to dig into that a little bit and see if any changes are 
warranted, or if we can at least be really clear about exactly how they’re 
handled going forward.

Most of you are likely aware that JavaScript represents *all* numbers as IEEE 
754 double precision floats. This means that any number in a JSON document with 
more than 15 significant digits (or any integer beyond ±2^53) is at risk of 
being silently altered when it passes through the JS engine, for example during 
a view build. Our current behavior is to let that silent corruption occur and 
put whatever number comes out of the JS engine into the view, encoded as a 
double, int64, or bignum depending on how jiffy decodes the JSON output from 
the JS code.
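
To make the corruption concrete, here is a minimal Python sketch. It is not 
CouchDB code: `through_js_engine` is a hypothetical stand-in that mimics the JS 
round trip by parsing every JSON number as a double.

```python
import json

def through_js_engine(json_text):
    """Hypothetical stand-in for a JS round trip: every number becomes a double."""
    # parse_int=float makes integer literals take the same path as floats,
    # which is what a JS engine does with every JSON number.
    return json.loads(json_text, parse_int=float)

# 9007199254740993 is 2**53 + 1, the first integer a double cannot represent.
doc = '{"id": 9007199254740993, "small": 42}'
result = through_js_engine(doc)

original_id = 9007199254740993
id_after_round_trip = int(result["id"])

print(original_id)            # the value the user stored
print(id_after_round_trip)    # the value after coercion to a double
print(original_id == id_after_round_trip)
```

The small integer survives untouched, while the large one comes back as a 
neighboring value with no error raised anywhere.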

On the other hand, FoundationDB’s tuple layer encoding is quite a bit more 
specific. It has a whole bunch of typecodes for integers of practically 
arbitrary size (up to 255 bytes), along with codes for 32 bit and 64 bit 
floating point numbers. The typecodes control the sorting; i.e., integers sort 
separately from floats.
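
As a rough illustration of what "sort separately" means, here is a toy Python 
sketch. It does not use the FoundationDB bindings, and the encoding is heavily 
simplified: as far as I understand the tuple layer spec, the real integer 
typecodes vary with sign and byte length, so the single `INT_TYPECODE` below is 
an assumption made only to keep the example short.

```python
# Toy model of typecode-first ordering; NOT the real tuple layer encoding.
INT_TYPECODE = 0x14     # assumption: one code standing in for all integer codes
DOUBLE_TYPECODE = 0x21  # typecode for 64 bit floats

def typed_key(value):
    """Return a (typecode, value) pair so that the typecode dominates the sort."""
    if isinstance(value, bool):
        raise TypeError("booleans are not numbers in this sketch")
    if isinstance(value, int):
        return (INT_TYPECODE, value)
    return (DOUBLE_TYPECODE, float(value))

emitted = [1.5, 2, 0.5, 1]

typed_order = sorted(emitted, key=typed_key)  # integers first, then floats
numeric_order = sorted(emitted)               # plain numeric comparison

print(typed_order)
print(numeric_order)
```

With a typecode-first ordering the integer 2 sorts ahead of the float 0.5, 
which is not the order a purely numeric comparison produces.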

We also have the ever-popular Lucene indexes for folks who build CouchDB with 
the search extension. I don’t have all the details for the number handling in 
that one handy, but it is another one to keep in mind.

One question that comes up fairly quickly: when a user emits a number as a key 
in a view, what do we store in FoundationDB? In order to respect CouchDB’s 
existing collation rules we need to use the same typecode for all numbers. Do 
we simply treat every number as a double, since they were all coerced into that 
representation anyway in JS?

But now let’s consider Mango indexes, which don’t suffer from any of 
JavaScript’s sloppiness around number handling. If we’re to respect CouchDB’s 
current collation rules we still need a common typecode and sortable binary 
representation across integers and floats. Do we end up using the IEEE 754 
float representation of each number as a “sort key” and storing the original 
number alongside it?
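
A minimal sketch of that "sort key plus original" idea, in Python. The names 
`double_sort_key` and `index_entry` are hypothetical, and the bit-flipping 
trick is the usual way to make IEEE 754 doubles compare correctly as raw bytes; 
I am not claiming this is how we would actually lay out the index.

```python
import struct

def double_sort_key(number):
    """Encode float(number) as 8 bytes whose bytewise order matches numeric order."""
    # Note: float() raises OverflowError for integers outside the double range.
    bits = struct.unpack(">Q", struct.pack(">d", float(number)))[0]
    if bits & (1 << 63):
        bits ^= 0xFFFFFFFFFFFFFFFF  # negative: flip every bit to reverse the order
    else:
        bits ^= 1 << 63             # non-negative: flip only the sign bit
    return struct.pack(">Q", bits)

def index_entry(number):
    """Pair the lossy sort key with the exact original number."""
    return (double_sort_key(number), number)

values = [3, -1.5, 2.5, 9007199254740993, -7, 0]
entries = sorted((index_entry(v) for v in values), key=lambda entry: entry[0])
ordered = [original for _, original in entries]

print(ordered)
```

Integers and floats interleave in numeric order, and the original value 
survives even where the sort key is lossy. The catch is that distinct large 
integers such as 2^53 and 2^53 + 1 share a sort key, so their relative order 
would need a tie-breaker.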

I feel like this ends up being a rabbit hole, but one that we owe it to our 
users to explore thoroughly, producing a definitive guide at the end :)

Cheers, Adam