Re: Numbers in JavaScript, Lucene, and FoundationDB

2019-05-16 Thread Paul Davis
It's late so just a few quick notes here:

Jiffy decodes numbers based on their encoding. I.e., any number that
includes a decimal point or exponent is decoded as a double while any
integer is decoded as an integer or bignum depending on size. When
encoding, jiffy likewise encodes 1.0 as "1.0" and 1 as "1". Generally
speaking this seems to be the least surprising behavior for users.
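
To make the rule concrete, here is a rough JavaScript sketch of it. This is
not jiffy's code (jiffy is an Erlang library); `classifyJsonNumber` is a
made-up helper, and treating the int64 range as the integer/bignum cutoff is
an assumption on my part.

```javascript
// Hypothetical helper: classify a JSON number token following the rule
// described above. Not jiffy's implementation.
function classifyJsonNumber(token) {
  if (!/^-?(0|[1-9]\d*)(\.\d+)?([eE][+-]?\d+)?$/.test(token)) {
    throw new SyntaxError("not a JSON number: " + token);
  }
  // A decimal point or an exponent means "double".
  if (/[.eE]/.test(token)) return "double";
  // Otherwise it is an integer; assume int64 is the bignum cutoff.
  const n = BigInt(token);
  const INT64_MIN = -(2n ** 63n);
  const INT64_MAX = 2n ** 63n - 1n;
  return n >= INT64_MIN && n <= INT64_MAX ? "integer" : "bignum";
}

console.log(classifyJsonNumber("1"));                    // integer
console.log(classifyJsonNumber("1.0"));                  // double
console.log(classifyJsonNumber("1e3"));                  // double
console.log(classifyJsonNumber("18446744073709551616")); // bignum
```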

That said, one recurring concern with JSON numbers has always been
money math. Things like "$1 / 3" follow a different set of rules than
arbitrary floating point arithmetic. CouchDB has a
long history of telling users that numbers mostly behave like doubles
given our JavaScript default. Given that, I would expect anyone that
needs a JSON oriented database that has fancy numerical needs to
already be paying special attention to their numeric data.
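
As a small illustration of why money math is its own thing, here is a sketch
that keeps amounts as integer cents instead of fractional doubles.
`splitCents` is a hypothetical helper and assumes non-negative integer
inputs; it is one way to handle "$1 / 3", not the only one.

```javascript
// Binary doubles cannot represent most decimal fractions exactly.
console.log(0.1 + 0.2 === 0.3); // false

// Hypothetical helper: split an amount given in integer cents into `parts`
// shares that add up to exactly the original amount.
function splitCents(totalCents, parts) {
  const base = Math.floor(totalCents / parts);
  const remainder = totalCents % parts;
  // The first `remainder` shares each get one extra cent.
  return Array.from({ length: parts }, (_, i) => base + (i < remainder ? 1 : 0));
}

console.log(splitCents(100, 3)); // [ 34, 33, 33 ]
```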

The FoundationDB collation does definitely present new questions given
that we're forced to implement a strict byte ordering. On the face of
it I'm more than fine forcing everything to doubles and providing the
mentioned warning label. I do know that FoundationDB's tuple layer has
some ¯\_(ツ)_/¯ semantics for "invalid" doubles (-NaN, NaN, -0, other
oddities I'd never heard of). So there may be caveats to mention there
as well. However, for the most part I'd say our standard reply of "if
you care about your numbers to the actual bit representation level, use
a string representation" is, while maybe not officially official, still
the best advice given JSON.

That of course ignores the fact that `emit(1, 2)` returns a view row
of `("1.0", "2.0")` which Adam noted as another whole big thing. On
that I don't have any amazing thoughts this late at night.

On Thu, May 16, 2019 at 9:39 PM Adam Kocoloski  wrote:
>
> Hi all, CouchDB has always had a somewhat complicated relationship with 
> numbers. I’d like to dig into that a little bit and see if any changes are 
> warranted, or if we can at least be really clear about exactly how they’re 
> handled going forward.
>
> Most of you are likely aware that JS represents *all* numbers as IEEE 754 
> double precision floats. This means that any number in a JSON document with 
> more than 15 significant digits is at risk of being corrupted when it passes 
> through the JS engine during a view build, for example. Our current behavior 
> is to let that silent corruption occur and put whatever number comes out of 
> the JS engine into the view, formatting as a double, int64, or bignum based 
> on jiffy’s decoding of the JSON output from the JS code.
>
> On the other hand, FoundationDB’s tuple layer encoding is quite a bit more 
> specific. It has a whole bunch of typecodes for integers of practically 
> arbitrary size (up to 255 bytes), along with codes for 32 bit and 64 bit 
> floating point numbers. The typecodes control the sorting; i.e., integers 
> sort separately from floats.
>
> We also have the ever-popular Lucene indexes for folks who build CouchDB with 
> the search extension. I don’t have all the details for the number handling in 
> that one handy, but it is another one to keep in mind.
>
> One question that comes up fairly quickly — when a user emits a number as a 
> key in a view, what do we store in FoundationDB? In order to respect 
> CouchDB’s existing collation rules we need to use the same typecode for all 
> numbers. Do we simply treat every number as a double, since they were all 
> coerced into that representation anyway in JS?
>
> But now let’s consider Mango indexes, which don’t suffer from any of 
> JavaScript’s sloppiness around number handling. If we’re to respect CouchDB’s 
> current collation rules we still need a common typecode and sortable binary 
> representation across integers and floats. Do we end up using the IEEE 754 
> float representation of each number as a “sort key” and storing the original 
> number alongside it?
>
> I feel like this ends up being a rabbit hole, but one where we owe it to our 
> users to thoroughly explore and produce a definitive guide :)
>
> Cheers, Adam


Numbers in JavaScript, Lucene, and FoundationDB

2019-05-16 Thread Adam Kocoloski
Hi all, CouchDB has always had a somewhat complicated relationship with 
numbers. I’d like to dig into that a little bit and see if any changes are 
warranted, or if we can at least be really clear about exactly how they’re 
handled going forward.

Most of you are likely aware that JS represents *all* numbers as IEEE 754 
double precision floats. This means that any number in a JSON document with 
more than 15 significant digits is at risk of being corrupted when it passes 
through the JS engine during a view build, for example. Our current behavior is 
to let that silent corruption occur and put whatever number comes out of the JS 
engine into the view, formatting as a double, int64, or bignum based on jiffy’s 
decoding of the JSON output from the JS code.
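
A quick way to see the corruption outside of CouchDB is to round-trip a
16-digit integer through any JS engine. The value below (2^53 + 1) is just a
convenient example; the string variant at the end is only there for contrast.

```javascript
// 2^53 + 1 has 16 significant digits and is not representable as a double.
const doc = JSON.parse('{"id": 9007199254740993}');

console.log(doc.id);                       // 9007199254740992
console.log(Number.isSafeInteger(doc.id)); // false
console.log(JSON.stringify(doc));          // {"id":9007199254740992}

// Carrying the value as a string survives the round trip unchanged.
const safe = JSON.parse('{"id": "9007199254740993"}');
console.log(BigInt(safe.id) === 9007199254740993n); // true
```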

On the other hand, FoundationDB’s tuple layer encoding is quite a bit more 
specific. It has a whole bunch of typecodes for integers of practically 
arbitrary size (up to 255 bytes), along with codes for 32 bit and 64 bit 
floating point numbers. The typecodes control the sorting; i.e., integers sort 
separately from floats.

We also have the ever-popular Lucene indexes for folks who build CouchDB with 
the search extension. I don’t have all the details for the number handling in 
that one handy, but it is another one to keep in mind.

One question that comes up fairly quickly — when a user emits a number as a key 
in a view, what do we store in FoundationDB? In order to respect CouchDB’s 
existing collation rules we need to use the same typecode for all numbers. Do 
we simply treat every number as a double, since they were all coerced into that 
representation anyway in JS?

But now let’s consider Mango indexes, which don’t suffer from any of 
JavaScript’s sloppiness around number handling. If we’re to respect CouchDB’s 
current collation rules we still need a common typecode and sortable binary 
representation across integers and floats. Do we end up using the IEEE 754 
float representation of each number as a “sort key” and storing the original 
number alongside it?
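
For what it's worth, here is a sketch of one common way to turn a double into
a byte string whose plain byte order matches numeric order: take the
big-endian IEEE 754 bits, flip all bits for negatives, and flip only the sign
bit otherwise. `doubleSortKey` and `compareBytes` are made-up names, and I am
not claiming this is exactly what any particular tuple layer does.

```javascript
// Hypothetical helper: encode a JS number as 8 bytes whose unsigned
// lexicographic order matches numeric order.
function doubleSortKey(value) {
  const bytes = new Uint8Array(8);
  new DataView(bytes.buffer).setFloat64(0, value, false); // big-endian
  if (bytes[0] & 0x80) {
    // Negative: flip every bit so larger magnitudes sort first.
    for (let i = 0; i < 8; i++) bytes[i] ^= 0xff;
  } else {
    // Non-negative: flip only the sign bit so it sorts after negatives.
    bytes[0] ^= 0x80;
  }
  return bytes;
}

// Compare two keys byte by byte, as a byte-ordered store would.
function compareBytes(a, b) {
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

const values = [3, -1.5, 1e10, 0, -2, 2.5, 1];
const sorted = values
  .map((v) => ({ v, key: doubleSortKey(v) }))
  .sort((x, y) => compareBytes(x.key, y.key))
  .map((e) => e.v);

console.log(sorted); // [ -2, -1.5, 0, 1, 2.5, 3, 10000000000 ]
```

Note that under this scheme -0 and 0 get different keys (-0 sorts first),
and any integer beyond 2^53 has already lost precision by the time it is a
double, which is where storing the original number alongside the key would
come in.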

I feel like this ends up being a rabbit hole, but one where we owe it to our 
users to thoroughly explore and produce a definitive guide :)

Cheers, Adam














Re: Design doc index switching

2019-05-16 Thread ermouth
Looks great, but how would that shadow ddoc replicate?

What happens when the target node receives the shadow ddoc, rebuilds the
shadow index, and then updates the original ddoc?

ermouth


On Fri, 17 May 2019 at 00:03, Robert Samuel Newson wrote:

> I suggest an alternative; the new design document could include the _id of
> the design document it’s replacing ("_replaces": "_design/foo"). On completion
> of the view build of the new design document, CouchDB itself updates the
> named _id to the same content as the new design document (strictly, only
> the parts needed to make the view sig match) (perhaps it also deletes the
> new document).
>
>


Re: Design doc index switching

2019-05-16 Thread Robert Samuel Newson
I suggest an alternative; the new design document could include the _id of the 
design document it’s replacing ("_replaces": "_design/foo"). On completion of 
the view build of the new design document, CouchDB itself updates the named _id 
to the same content as the new design document (strictly, only the parts needed 
to make the view sig match) (perhaps it also deletes the new document).

The advantage to this is that queries to the original design document continue 
to work throughout and at no point is there a discrepancy between the design 
document's contents (the map and reduce functions, etc) and the results you get 
from it.
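
To illustrate the idea, here is a purely in-memory sketch. Only the
`_replaces` field comes from the suggestion above; the `Map` standing in for
a database, the `onBuildComplete` hook, and the `deleteNew` flag are all
hypothetical, and revisions are ignored entirely.

```javascript
// In-memory stand-in for a database: _id -> document. Purely illustrative.
const db = new Map();
db.set("_design/foo", {
  _id: "_design/foo",
  views: { a: { map: "function (doc) { emit(doc._id, 1); }" } },
});
db.set("_design/foo-next", {
  _id: "_design/foo-next",
  _replaces: "_design/foo",
  views: { a: { map: "function (doc) { emit(doc.type, 1); }" } },
});

// Hypothetical hook, called once the index for `newId` has finished building.
function onBuildComplete(db, newId, { deleteNew = true } = {}) {
  const next = db.get(newId);
  if (!next || !next._replaces) return;
  const { _id, _replaces, ...content } = next;
  // Overwrite the named document with the new content, keeping its own _id.
  db.set(_replaces, { _id: _replaces, ...content });
  if (deleteNew) db.delete(newId);
}

onBuildComplete(db, "_design/foo-next");
console.log(db.get("_design/foo").views.a.map); // the new map function
console.log(db.has("_design/foo-next"));        // false
```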

B.

> On 16 May 2019, at 14:55, Jan Lehnardt  wrote:
> 
> +1 on solving this for all users, and same caveats as Stefan raises :)
> 
>> On 16. May 2019, at 09:38, Stefan du Fresne  wrote:
>> 
>> Hey Garren,
>> 
>> Having this as a native part of CouchDB seems like a really cool idea: we have 
>> automated the manual dance you're talking about with our deployment tooling, 
>> but it would be really nice not to have to!
>> 
>> I'm not clear how it would work though, at least in terms of coherent 
>> deployments. View changes are, like SQL migrations, an often non-backwards 
>> compatible change that has to occur as your new code deploys.
>> 
>> Currently the naive approach is you deploy your new code alongside design 
>> doc changes, which then block view queries on first request until they're 
>> ready to go.
>> 
>> The better approach is what you describe, which is what we do now, where we 
>> extract our design documents out of our deployment bundle and place them in 
>> a "staging" location to allow them to warm, then rename them and do the 
>> actual code deployment once that's complete (managed by an external 
>> deployment service we built). This importantly lets us split the "warming" 
>> bit from the deployment bit: we only deploy new code once the design 
>> documents that are shipped with that code are ready to go.
>> 
>> How would you foresee this kind of flow happening here? Would there be a way 
>> to query the design doc to know if it had flipped to the new version yet? 
>> Would you be able to control when this flip occurs? Or would the expectation 
>> be that your code handles both versions gracefully?
>> 
>> As an example to mull over, let's say you have design doc v1, which has view 
>> a. You push design doc v2, which has added view b, but has also changed view 
>> a in some backwards incompatible way. While v2 is still building and is not 
>> yet the active doc:
>> - If you queried view a you'd get the v1 version, that's clear
>> - If you queried view b you'd get... a 404? Some other custom code?
>> - If you GET the design document what doc would you see? Presumably v2?
>> - Could you query something to determine which version is currently active? 
>> Or perhaps just whether there is a background version building at all?
>> 
>> Cheers,
>> Stefan
>> 
>>> On 16 May 2019, at 07:51, Garren Smith  wrote:
>>> 
>>> Hi Everyone,
>>> 
>>> A common pattern we see for updating large indexes that can take a few days
>>> to build is to create a new design doc with the updated views. Then, once
>>> the new design doc is built, a user changes the new design doc’s id to that
>>> of the old design doc. That way the CouchDB URL for the views remains the
>>> same, and any requests to the design doc URL automatically get the latest
>>> views, but only once they are built.
>>> 
>>> This is an effective way of managing building large indexes, but the
>>> process is quite complicated and users often get it wrong. I would like to
>>> propose that we move this process into CouchDB and let CouchDB handle the
>>> actual process. From a user’s perspective, they would add a field to the
>>> options of a design document that lets CouchDB know that this index needs
>>> to be built in the background and only replace the current index once it’s
>>> built:
>>> 
>>> ```
>>> {
>>>   "_id": "_design/design-doc-id",
>>>   "_rev": "2-8d361a23b4cb8e213f0868ea3d2742c2",
>>>   "views": {
>>>     "map-view": {
>>>       "map": "function (doc) {\n  emit(doc._id, 1);\n}"
>>>     }
>>>   },
>>>   "language": "javascript",
>>>   "options": {
>>>     "build_and_replace": true
>>>   }
>>> }
>>> ```
>>> 
>>> I think this is something we could build quite effectively once we have
>>> CouchDB running on top of FoundationDB. I don’t want to implement it for
>>> version 1 of CouchDB on FDB, but it would be nice to keep this in mind as
>>> we build out the map/reduce indexes.
>>> 
>>> What do you think? Any issues we might have by doing this internally?
>>> 
>>> Cheers
>>> Garren
>> 
> 
> -- 
> Professional Support for Apache CouchDB:
> https://neighbourhood.ie/couchdb-support/
> 



Re: Design doc index switching

2019-05-16 Thread Jan Lehnardt
+1 on solving this for all users, and same caveats as Stefan raises :)

> On 16. May 2019, at 09:38, Stefan du Fresne  wrote:
> 
> Hey Garren,
> 
> Having this as a native part of CouchDB seems like a really cool idea: we have 
> automated the manual dance you're talking about with our deployment tooling, 
> but it would be really nice not to have to!
> 
> I'm not clear how it would work though, at least in terms of coherent 
> deployments. View changes are, like SQL migrations, an often non-backwards 
> compatible change that has to occur as your new code deploys.
> 
> Currently the naive approach is you deploy your new code alongside design doc 
> changes, which then block view queries on first request until they're ready 
> to go.
> 
> The better approach is what you describe, which is what we do now, where we 
> extract our design documents out of our deployment bundle and place them in a 
> "staging" location to allow them to warm, then rename them and do the actual 
> code deployment once that's complete (managed by an external deployment 
> service we built). This importantly lets us split the "warming" bit from the 
> deployment bit: we only deploy new code once the design documents that are 
> shipped with that code are ready to go.
> 
> How would you foresee this kind of flow happening here? Would there be a way 
> to query the design doc to know if it had flipped to the new version yet? 
> Would you be able to control when this flip occurs? Or would the expectation 
> be that your code handles both versions gracefully?
> 
> As an example to mull over, let's say you have design doc v1, which has view 
> a. You push design doc v2, which has added view b, but has also changed view 
> a in some backwards incompatible way. While v2 is still building and is not 
> yet the active doc:
> - If you queried view a you'd get the v1 version, that's clear
> - If you queried view b you'd get... a 404? Some other custom code?
> - If you GET the design document what doc would you see? Presumably v2?
> - Could you query something to determine which version is currently active? 
> Or perhaps just whether there is a background version building at all?
> 
> Cheers,
> Stefan
> 
>> On 16 May 2019, at 07:51, Garren Smith  wrote:
>> 
>> Hi Everyone,
>> 
>> A common pattern we see for updating large indexes that can take a few days
>> to build is to create a new design doc with the updated views. Then, once
>> the new design doc is built, a user changes the new design doc’s id to that
>> of the old design doc. That way the CouchDB URL for the views remains the
>> same, and any requests to the design doc URL automatically get the latest
>> views, but only once they are built.
>> 
>> This is an effective way of managing building large indexes, but the
>> process is quite complicated and users often get it wrong. I would like to
>> propose that we move this process into CouchDB and let CouchDB handle the
>> actual process. From a user’s perspective, they would add a field to the
>> options of a design document that lets CouchDB know that this index needs
>> to be built in the background and only replace the current index once it’s
>> built:
>> 
>> ```
>> {
>>   "_id": "_design/design-doc-id",
>>   "_rev": "2-8d361a23b4cb8e213f0868ea3d2742c2",
>>   "views": {
>>     "map-view": {
>>       "map": "function (doc) {\n  emit(doc._id, 1);\n}"
>>     }
>>   },
>>   "language": "javascript",
>>   "options": {
>>     "build_and_replace": true
>>   }
>> }
>> ```
>> 
>> I think this is something we could build quite effectively once we have
>> CouchDB running on top of FoundationDB. I don’t want to implement it for
>> version 1 of CouchDB on FDB, but it would be nice to keep this in mind as
>> we build out the map/reduce indexes.
>> 
>> What do you think? Any issues we might have by doing this internally?
>> 
>> Cheers
>> Garren
> 

-- 
Professional Support for Apache CouchDB:
https://neighbourhood.ie/couchdb-support/



Re: Design doc index switching

2019-05-16 Thread Joan Touzet

Hi Garren,

+1. I actually went hunting in GitHub for an issue on this, and can't 
find one. It probably goes back to JIRA, and I don't have the energy to 
dig through that now.


The closest issue that captures this is the same thing - but for 
*databases* - and is on our official roadmap from the last summit:


  https://github.com/apache/couchdb/issues/1502

Could we consider this use case at the same time?

-Joan

On 2019-05-16 2:51 a.m., Garren Smith wrote:

Hi Everyone,

A common pattern we see for updating large indexes that can take a few days
to build is to create a new design doc with the updated views. Then, once
the new design doc is built, a user changes the new design doc’s id to that
of the old design doc. That way the CouchDB URL for the views remains the
same, and any requests to the design doc URL automatically get the latest
views, but only once they are built.

This is an effective way of managing building large indexes, but the
process is quite complicated and users often get it wrong. I would like to
propose that we move this process into CouchDB and let CouchDB handle the
actual process. From a user’s perspective, they would add a field to the
options of a design document that lets CouchDB know that this index needs
to be built in the background and only replace the current index once it’s
built:

```
{
  "_id": "_design/design-doc-id",
  "_rev": "2-8d361a23b4cb8e213f0868ea3d2742c2",
  "views": {
    "map-view": {
      "map": "function (doc) {\n  emit(doc._id, 1);\n}"
    }
  },
  "language": "javascript",
  "options": {
    "build_and_replace": true
  }
}
```

I think this is something we could build quite effectively once we have
CouchDB running on top of FoundationDB. I don’t want to implement it for
version 1 of CouchDB on FDB, but it would be nice to keep this in mind as
we build out the map/reduce indexes.

What do you think? Any issues we might have by doing this internally?

Cheers
Garren



Re: Design doc index switching

2019-05-16 Thread Stefan du Fresne
Hey Garren,

Having this as a native part of CouchDB seems like a really cool idea: we have 
automated the manual dance you're talking about with our deployment tooling, 
but it would be really nice not to have to!

I'm not clear how it would work though, at least in terms of coherent 
deployments. View changes are, like SQL migrations, an often non-backwards 
compatible change that has to occur as your new code deploys.

Currently the naive approach is you deploy your new code alongside design doc 
changes, which then block view queries on first request until they're ready to 
go.

The better approach is what you describe, which is what we do now, where we 
extract our design documents out of our deployment bundle and place them in a 
"staging" location to allow them to warm, then rename them and do the actual 
code deployment once that's complete (managed by an external deployment service 
we built). This importantly lets us split the "warming" bit from the deployment 
bit: we only deploy new code once the design documents that are shipped with 
that code are ready to go.
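
For anyone unfamiliar with the dance, a minimal sketch of the "warm, then
swap" flow looks roughly like this. It is not our actual tooling:
`warmThenSwap`, the `request(method, path, body)` helper, and the in-memory
fake are all hypothetical, and it assumes the live design doc already exists
and that a plain view query only returns once the index is built.

```javascript
// `request(method, path, body)` is an assumed async HTTP helper returning
// parsed JSON; swap in fetch or your client of choice.
async function warmThenSwap(request, db, liveId, stagingId, ddoc, viewName) {
  // 1. Upload the new design doc under a staging _id.
  await request("PUT", `/${db}/${stagingId}`, { ...ddoc, _id: stagingId });

  // 2. Query one of its views so the index gets built ("warmed").
  await request("GET", `/${db}/${stagingId}/_view/${viewName}?limit=0`);

  // 3. Write the same content under the live _id, using its current _rev.
  const live = await request("GET", `/${db}/${liveId}`);
  await request("PUT", `/${db}/${liveId}`, {
    ...ddoc,
    _id: liveId,
    _rev: live._rev,
  });
}

// Minimal in-memory fake so the sketch runs without a server.
function makeFakeRequest(store, log) {
  let rev = 0;
  return async (method, path, body) => {
    log.push(`${method} ${path}`);
    const id = path.split("/").slice(2).join("/");
    if (method === "PUT") {
      store.set(id, { ...body, _rev: `${++rev}-fake` });
      return { ok: true };
    }
    if (path.includes("/_view/")) return { total_rows: 0, rows: [] };
    return store.get(id);
  };
}

const store = new Map([
  ["_design/foo", { _id: "_design/foo", _rev: "1-old", views: {} }],
]);
const log = [];
const ddoc = {
  language: "javascript",
  views: { a: { map: "function (doc) { emit(doc._id, 1); }" } },
};

warmThenSwap(makeFakeRequest(store, log), "mydb", "_design/foo",
  "_design/foo-staging", ddoc, "a")
  .then(() => console.log(log.join("\n")));
```

Against a real server a multi-day build would outlive a single HTTP request,
so step 2 would more likely be a polling loop; the code deployment itself
only starts after the function resolves.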

How would you foresee this kind of flow happening here? Would there be a way to 
query the design doc to know if it had flipped to the new version yet? Would 
you be able to control when this flip occurs? Or would the expectation be that 
your code handles both versions gracefully?

As an example to mull over, let's say you have design doc v1, which has view a. 
You push design doc v2, which has added view b, but has also changed view a in 
some backwards incompatible way. While v2 is still building and is not yet the 
active doc:
 - If you queried view a you'd get the v1 version, that's clear
 - If you queried view b you'd get... a 404? Some other custom code?
 - If you GET the design document what doc would you see? Presumably v2?
 - Could you query something to determine which version is currently active? Or 
perhaps just whether there is a background version building at all?

Cheers,
Stefan

> On 16 May 2019, at 07:51, Garren Smith  wrote:
> 
> Hi Everyone,
> 
> A common pattern we see for updating large indexes that can take a few days
> to build is to create a new design doc with the updated views. Then, once
> the new design doc is built, a user changes the new design doc’s id to that
> of the old design doc. That way the CouchDB URL for the views remains the
> same, and any requests to the design doc URL automatically get the latest
> views, but only once they are built.
> 
> This is an effective way of managing building large indexes, but the
> process is quite complicated and users often get it wrong. I would like to
> propose that we move this process into CouchDB and let CouchDB handle the
> actual process. From a user’s perspective, they would add a field to the
> options of a design document that lets CouchDB know that this index needs
> to be built in the background and only replace the current index once it’s
> built:
> 
> ```
> {
>   "_id": "_design/design-doc-id",
>   "_rev": "2-8d361a23b4cb8e213f0868ea3d2742c2",
>   "views": {
>     "map-view": {
>       "map": "function (doc) {\n  emit(doc._id, 1);\n}"
>     }
>   },
>   "language": "javascript",
>   "options": {
>     "build_and_replace": true
>   }
> }
> ```
> 
> I think this is something we could build quite effectively once we have
> CouchDB running on top of FoundationDB. I don’t want to implement it for
> version 1 of CouchDB on FDB, but it would be nice to keep this in mind as
> we build out the map/reduce indexes.
> 
> What do you think? Any issues we might have by doing this internally?
> 
> Cheers
> Garren