Re: Numbers in JavaScript, Lucene, and FoundationDB
It's late, so just a few quick notes here.

Jiffy decodes numbers based on their encoding, i.e., any number that includes a decimal point or exponent is decoded as a double, while any integer is decoded as an integer or bignum depending on size. When encoding, jiffy will also encode 1.0 as "1.0" and 1 as "1". Generally speaking, this seems to be the least surprising behavior for users.

That said, one aspect of JSON and numbers that has always come up is money math. Things like "$1 / 3" follow a different set of rules than arbitrary floating-point arithmetic. CouchDB has a long history of telling users that numbers mostly behave like doubles, given our JavaScript default. Given that, I would expect anyone who needs a JSON-oriented database and has fancy numerical needs to already be paying special attention to their numeric data.

The FoundationDB collation definitely presents new questions, given that we're forced to implement a strict byte ordering. On the face of it, I'm more than fine forcing everything to doubles and providing the mentioned warning label. I do know that FoundationDB's tuple layer has some ¯\_(ツ)_/¯ semantics for "invalid" doubles (-NaN, NaN, -0, and other oddities I'd never heard of), so there may be caveats to mention there as well. However, for the most part I'd say our standard reply of "if you care about your numbers down to the actual bit representation, use a string representation", while maybe not officially official, is still the best advice given JSON.

That of course ignores the fact that `emit(1, 2)` returns a view row of `("1.0", "2.0")`, which Adam noted as another whole big thing. On that I don't have any amazing thoughts this late at night.

On Thu, May 16, 2019 at 9:39 PM Adam Kocoloski wrote:
>
> Hi all, CouchDB has always had a somewhat complicated relationship with
> numbers.
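To make the decoding rule from this reply concrete, here is a minimal JavaScript sketch. It classifies a JSON number by its text alone: a decimal point or exponent means double; otherwise it is an integer, reported as int64 when it fits in 64 bits and as bignum beyond that (the int64 cut-off is an assumption taken from Adam's "double, int64, or bignum" wording). `classifyJsonNumber` is a hypothetical name and not jiffy's implementation. The last line shows why the 1 vs 1.0 distinction cannot survive a trip through the JS engine.

```javascript
// Hypothetical sketch of the decoding rule described above: the kind of a
// JSON number is chosen from its text. This is not jiffy's implementation.
const INT64_MIN = -(2n ** 63n);
const INT64_MAX = 2n ** 63n - 1n;

function classifyJsonNumber(text) {
  // A decimal point or an exponent marks the value as a double.
  if (/[.eE]/.test(text)) {
    return { kind: "double", value: Number(text) };
  }
  // Otherwise it is an integer: int64 when it fits, bignum when it does not.
  const n = BigInt(text);
  const kind = n >= INT64_MIN && n <= INT64_MAX ? "int64" : "bignum";
  return { kind, value: n };
}

console.log(classifyJsonNumber("1").kind); // int64
console.log(classifyJsonNumber("1.0").kind); // double
console.log(classifyJsonNumber("1e3").kind); // double
console.log(classifyJsonNumber("9223372036854775808").kind); // bignum (2^63)

// JavaScript keeps no such distinction: every number is a double, so once a
// value has passed through the JS engine, 1.0 and 1 serialize identically.
console.log(JSON.stringify(1.0) === JSON.stringify(1)); // true
```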
Numbers in JavaScript, Lucene, and FoundationDB
Hi all,

CouchDB has always had a somewhat complicated relationship with numbers. I’d like to dig into that a little bit and see if any changes are warranted, or if we can at least be really clear about exactly how they’re handled going forward.

Most of you are likely aware that JS represents *all* numbers as IEEE 754 double-precision floats. This means that any number in a JSON document with more than 15 significant digits is at risk of being corrupted when it passes through the JS engine during a view build, for example. Our current behavior is to let that silent corruption occur and put whatever number comes out of the JS engine into the view, formatted as a double, int64, or bignum based on jiffy’s decoding of the JSON output from the JS code.

On the other hand, FoundationDB’s tuple layer encoding is quite a bit more specific. It has a whole bunch of typecodes for integers of practically arbitrary size (up to 255 bytes), along with codes for 32-bit and 64-bit floating-point numbers. The typecodes control the sorting; i.e., integers sort separately from floats.

We also have the ever-popular Lucene indexes for folks who build CouchDB with the search extension. I don’t have all the details of the number handling in that one handy, but it is another one to keep in mind.

One question that comes up fairly quickly: when a user emits a number as a key in a view, what do we store in FoundationDB? In order to respect CouchDB’s existing collation rules, we need to use the same typecode for all numbers. Do we simply treat every number as a double, since they were all coerced into that representation anyway in JS?

But now let’s consider Mango indexes, which don’t suffer from any of JavaScript’s sloppiness around number handling. If we’re to respect CouchDB’s current collation rules, we still need a common typecode and sortable binary representation across integers and floats. Do we end up using the IEEE 754 float representation of each number as a “sort key” and storing the original number alongside it?

I feel like this ends up being a rabbit hole, but one where we owe it to our users to thoroughly explore and produce a definitive guide :)

Cheers,
Adam
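One possible answer to the "sort key plus original number" question, sketched in JavaScript under stated assumptions: every number is converted to a double, the key uses the order-preserving double encoding from FoundationDB's tuple layer (typecode 0x21, then big-endian IEEE 754 bits with the sign bit flipped for non-negative values and all bits flipped for negative ones), and the exact JSON text is kept beside the key. The function names are hypothetical, and this is only one candidate design, not how CouchDB stores view keys.

```javascript
// Hypothetical sketch: one bytewise-sortable key for every JSON number, built
// from its IEEE 754 double, with the original JSON text stored next to it.
// The key layout follows the FoundationDB tuple layer's double encoding
// (typecode 0x21, then big-endian bits with a sign-dependent bit flip).

function doubleSortKey(x) {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, x); // DataView writes big-endian by default
  const bytes = new Uint8Array(view.buffer);
  if (bytes[0] & 0x80) {
    // Negative (sign bit set): flip every bit, so larger magnitudes sort first.
    for (let i = 0; i < bytes.length; i++) bytes[i] ^= 0xff;
  } else {
    // Non-negative: flip only the sign bit, so these sort after all negatives.
    bytes[0] ^= 0x80;
  }
  return Uint8Array.from([0x21, ...bytes]);
}

// Unsigned lexicographic comparison, i.e. the order a byte-ordered store uses.
function compareBytes(a, b) {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return a.length - b.length;
}

// An index entry: sort on the double, but keep the exact original number.
function indexEntry(jsonText) {
  return { key: doubleSortKey(Number(jsonText)), original: jsonText };
}

// 2^53 + 1 has no exact double, so it collapses onto 2^53 ...
console.log(JSON.stringify(JSON.parse("9007199254740993"))); // 9007199254740992

// ... and therefore shares its sort key; only `original` tells them apart.
const sorted = ["9007199254740993", "-2.5", "0", "9007199254740992", "-10", "1e300"]
  .map(indexEntry)
  .sort((a, b) => compareBytes(a.key, b.key));
console.log(sorted.map((e) => e.original).join(" "));
// -10 -2.5 0 9007199254740993 9007199254740992 1e300
```

Because 2^53 + 1 and 2^53 share a double, they also share a sort key, which is exactly the case where the stored original matters. Under this encoding -0 and 0 also get distinct keys (-0 sorts just before 0), one example of the tuple-layer oddities mentioned in the reply above.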
Re: Design doc index switching
Looks great, but how would that shadow ddoc replicate? What happens when the target node receives the shadow ddoc, rebuilds the shadow index, and then updates the original ddoc?

ermouth

On Fri, 17 May 2019 at 00:03, Robert Samuel Newson wrote:
> I suggest an alternative; the new design document could include the _id of
> design document it’s replacing (“_replaces”:”_design/foo”). On completion
> of the view build of the new design document, CouchDB itself updates the
> named _id to the same content as the new design document (strictly, only
> the parts needed to make the view sig match) (perhaps it also deletes the
> new document).
Re: Design doc index switching
I suggest an alternative: the new design document could include the _id of the design document it’s replacing ("_replaces": "_design/foo"). On completion of the view build of the new design document, CouchDB itself updates the named _id to the same content as the new design document (strictly, only the parts needed to make the view sig match), and perhaps also deletes the new document.

The advantage of this is that queries to the original design document continue to work throughout, and at no point is there a discrepancy between the design document’s contents (the map and reduce functions, etc.) and the results you get from it.

B.

> On 16 May 2019, at 14:55, Jan Lehnardt wrote:
>
> +1 on solving this for all users, and same caveats as Stefan raises :)
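A minimal JavaScript sketch of the promotion step in the `_replaces` proposal, as it might run once the new design document's views have finished building. Assumptions: `promoteReplacement` is a hypothetical name, the revision strings are made up, and the sketch copies every field except `_id`, `_rev`, and `_replaces`, which is broader than the "only the parts needed to make the view sig match" variant. The target keeps its own `_id`, and its current `_rev` is carried along because CouchDB requires it to update an existing document.

```javascript
// Hypothetical sketch of the "_replaces" promotion step proposed above,
// run once the new design document's views have finished building.
// Field handling is an assumption: everything except identity fields is
// copied, which is broader than "only the parts needed for the view sig".
const IDENTITY_FIELDS = new Set(["_id", "_rev", "_replaces"]);

function promoteReplacement(newDdoc, targetDdoc) {
  if (newDdoc._replaces !== targetDdoc._id) {
    throw new Error(`${newDdoc._id} does not replace ${targetDdoc._id}`);
  }
  // Keep the target's id (so its URL is unchanged) and its current _rev,
  // which CouchDB needs in order to accept the update.
  const updated = { _id: targetDdoc._id, _rev: targetDdoc._rev };
  for (const [field, value] of Object.entries(newDdoc)) {
    if (!IDENTITY_FIELDS.has(field)) updated[field] = value;
  }
  return updated;
}

// Made-up example documents.
const current = {
  _id: "_design/foo",
  _rev: "3-abc",
  views: { a: { map: "function (doc) { emit(doc.type, 1); }" } },
  language: "javascript",
};
const staged = {
  _id: "_design/foo-next",
  _rev: "1-def",
  _replaces: "_design/foo",
  views: {
    a: { map: "function (doc) { emit([doc.type, doc.date], 1); }" },
    b: { map: "function (doc) { emit(doc.owner, null); }" },
  },
  language: "javascript",
};

console.log(promoteReplacement(staged, current));
```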
Re: Design doc index switching
+1 on solving this for all users, and same caveats as Stefan raises :)

> On 16. May 2019, at 09:38, Stefan du Fresne wrote:
>
> Hey Garren,
>
> Having this a native part of CouchDB seems like a really cool idea: we have
> automated the manual dance you're talking about with our deployment tooling,
> but it would be really nice not to have to!

--
Professional Support for Apache CouchDB:
https://neighbourhood.ie/couchdb-support/
Re: Design doc index switching
Hi Garren,

+1. I actually went hunting in GitHub for an issue on this, and couldn't find one. It probably goes back to JIRA, and I don't have the energy to dig through that now.

The closest issue that captures this is the same thing, but for *databases*, and it is on our official roadmap from the last summit:

https://github.com/apache/couchdb/issues/1502

Could we consider this use case at the same time?

-Joan

On 2019-05-16 2:51 a.m., Garren Smith wrote:
> Hi Everyone,
>
> A common pattern we see for updating large indexes that can take a few days to build is to create a new design doc with the updated views. Then, once the new design doc is built, a user changes the new design doc’s id to the old design doc’s id. That way the CouchDB URL for the views remains the same, and any requests to the design doc URL automatically get the latest views, but only once they are built.
>
> This is an effective way of managing the building of large indexes, but the process is quite complicated and users often get it wrong. I would like to propose that we move this process into CouchDB and let CouchDB handle it. From a user’s perspective, they would add a field to the options of a design document that lets CouchDB know that this index needs to be built in the background and should only replace the current index once it’s built:
>
> ```
> {
>   "_id": "_design/design-doc-id",
>   "_rev": "2-8d361a23b4cb8e213f0868ea3d2742c2",
>   "views": {
>     "map-view": {
>       "map": "function (doc) {\n emit(doc._id, 1);\n}"
>     }
>   },
>   "language": "javascript",
>   "options": {
>     "build_and_replace": true
>   }
> }
> ```
>
> I think this is something we could build quite effectively once we have CouchDB running on top of FoundationDB. I don’t want to implement it for version 1 of CouchDB on FDB, but it would be nice to keep this in mind as we build out the map/reduce indexes.
>
> What do you think? Any issues we might have by doing this internally?
>
> Cheers
> Garren
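The manual pattern described in Garren's message (build the new views under a separate design document, then move them to the old id) can be sketched as the HTTP requests a deployment script would send. This JavaScript sketch only builds the request list and sends nothing; the database name, design doc ids, and view name are placeholders, and `swapPlan` is a hypothetical helper. It relies on CouchDB's `COPY` method with a `Destination` header that carries the target's current revision, and on the index being tied to the view definitions rather than to the design document's id.

```javascript
// Hypothetical sketch: the manual "build under a staging id, then swap" pattern
// as a list of CouchDB HTTP requests. Ids, revisions and the view name are
// placeholders supplied by the caller; nothing here is sent over the network.
function swapPlan({ db, liveId, stagingId, ddocBody, liveRev, viewName }) {
  const base = `/${encodeURIComponent(db)}`;
  return [
    // 1. Store the new view definitions under a staging id; the live design
    //    document (and its index) keeps serving queries.
    { method: "PUT", path: `${base}/${stagingId}`, body: JSON.stringify(ddocBody) },
    // 2. Query one view without stale/update=false so CouchDB brings the
    //    index up to date before answering.
    { method: "GET", path: `${base}/${stagingId}/_view/${encodeURIComponent(viewName)}?limit=0` },
    // 3. Copy the staging document over the live id. The view definitions are
    //    identical, so the live document maps to the index just built.
    {
      method: "COPY",
      path: `${base}/${stagingId}`,
      headers: { Destination: `${liveId}?rev=${liveRev}` },
    },
  ];
}

const plan = swapPlan({
  db: "mydb",
  liveId: "_design/app",
  stagingId: "_design/app_new",
  ddocBody: {
    language: "javascript",
    views: { "map-view": { map: "function (doc) {\n  emit(doc._id, 1);\n}" } },
  },
  liveRev: "2-8d361a23b4cb8e213f0868ea3d2742c2",
  viewName: "map-view",
});
for (const req of plan) console.log(req.method, req.path);
```

After the copy, the staging design document can be deleted; since the live document now has identical view definitions, its index should survive that deletion. For large indexes the view query in step 2 may time out before the build finishes, so a real script would likely need to retry or poll.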
Re: Design doc index switching
Hey Garren,

Having this as a native part of CouchDB seems like a really cool idea: we have automated the manual dance you're talking about with our deployment tooling, but it would be really nice not to have to!

I'm not clear how it would work though, at least in terms of coherent deployments. View changes are, like SQL migrations, often non-backwards-compatible changes that have to occur as your new code deploys.

Currently the naive approach is that you deploy your new code alongside design doc changes, which then block view queries on first request until they're ready to go.

The better approach is what you describe, which is what we do now: we extract our design documents out of our deployment bundle and place them in a "staging" location to allow them to warm, then rename them and do the actual code deployment once that's complete (managed by an external deployment service we built). Importantly, this lets us split the "warming" bit from the deployment bit: we only deploy new code once the design documents that ship with that code are ready to go.

How would you foresee this kind of flow happening here? Would there be a way to query the design doc to know if it had flipped to the new version yet? Would you be able to control when this flip occurs? Or would the expectation be that your code handles both versions gracefully?

As an example to mull over, let's say you have design doc v1, which has view a. You push design doc v2, which has added view b, but has also changed view a in some backwards-incompatible way. While v2 is still building and is not yet the active doc:

- If you queried view a, you'd get the v1 version; that's clear.
- If you queried view b, you'd get... a 404? Some other custom code?
- If you GET the design document, what doc would you see? Presumably v2?
- Could you query something to determine which version is currently active? Or perhaps just whether there is a background version building at all?
Cheers,
Stefan

> On 16 May 2019, at 07:51, Garren Smith wrote:
>
> Hi Everyone,
>
> A common pattern we see for updating large indexes that can take a few days
> to build, is create a new design docs with the new updated views.