Hi all,

CouchDB has always had a somewhat complicated relationship with numbers. I’d like to dig into that a little bit and see if any changes are warranted, or if we can at least be really clear about exactly how they’re handled going forward.
Most of you are likely aware that JS represents *all* numbers as IEEE 754 double precision floats. This means that any number in a JSON document with more than 15 significant digits is at risk of being corrupted when it passes through the JS engine during a view build, for example. Our current behavior is to let that silent corruption occur and put whatever number comes out of the JS engine into the view, formatting as a double, int64, or bignum based on jiffy’s decoding of the JSON output from the JS code.

On the other hand, FoundationDB’s tuple layer encoding is quite a bit more specific. It has a whole bunch of typecodes for integers of practically arbitrary size (up to 255 bytes), along with codes for 32 bit and 64 bit floating point numbers. The typecodes control the sorting; i.e., integers sort separately from floats.

We also have the ever-popular Lucene indexes for folks who build CouchDB with the search extension. I don’t have all the details for the number handling in that one handy, but it is another one to keep in mind.

One question that comes up fairly quickly: when a user emits a number as a key in a view, what do we store in FoundationDB? In order to respect CouchDB’s existing collation rules we need to use the same typecode for all numbers. Do we simply treat every number as a double, since they were all coerced into that representation anyway in JS?

But now let’s consider Mango indexes, which don’t suffer from any of JavaScript’s sloppiness around number handling. If we’re to respect CouchDB’s current collation rules we still need a common typecode and sortable binary representation across integers and floats. Do we end up using the IEEE 754 float representation of each number as a “sort key” and storing the original number alongside it?

I feel like this ends up being a rabbit hole, but one where we owe it to our users to thoroughly explore and produce a definitive guide :)

Cheers,
Adam
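
P.S. To make the "sort key plus original number" question concrete, here is a minimal Python sketch. Python floats are IEEE 754 doubles, so they lose precision the same way JS numbers do. The names `double_sort_key` and `index_entry` are hypothetical, and the bit-flipping step is just a common way to make doubles compare correctly as raw bytes; it is an illustration under those assumptions, not a proposal for the exact bytes we would store in FoundationDB. It also ignores integers too large to convert to a double at all.

```python
import struct


def double_sort_key(n):
    """Return 8 bytes whose bytewise order matches the numeric order of float(n).

    Hypothetical helper: takes the big-endian IEEE 754 bits, then flips all
    bits for negative values and only the sign bit for non-negative values.
    """
    bits = struct.unpack(">Q", struct.pack(">d", float(n)))[0]
    if bits & (1 << 63):
        bits ^= 0xFFFFFFFFFFFFFFFF  # negative: flip every bit
    else:
        bits ^= 1 << 63  # non-negative: flip only the sign bit
    return struct.pack(">Q", bits)


def index_entry(n):
    """Hypothetical index entry: (lossy sort key, original number)."""
    return (double_sort_key(n), n)


# An integer with 16 significant digits that a double cannot represent.
big = 2**53 + 1
lossy = int(float(big))  # the value after a round trip through a double

# Mixed integers and floats sort together under one key type,
# while the original numbers are carried along untouched.
values = [3, -1.5, big, 0, 2.5, -7]
entries = sorted((index_entry(v) for v in values), key=lambda e: e[0])
ordered = [original for _, original in entries]

print(big, "->", lossy)
print(ordered)
```

One thing the sketch makes visible: `2**53` and `2**53 + 1` get the same sort key, so their relative order in the index would have to be decided by something other than the key, such as the stored original.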
