Re: [DISCUSS] New Reduce design for FDB

2020-06-24 Thread Joan Touzet




On 2020-06-24 1:32 p.m., Garren Smith wrote:
> On Wed, Jun 24, 2020 at 6:47 PM Joan Touzet  wrote:
>
>> Hi Garren,
>>
>> If the "options" field is left out, what is the default behaviour?
>>
>
> All group_levels will be indexed. I imagine this is what most CouchDB users
> will want.

Great!

>> Is there no way to specify multiple group_levels to get results that
>> match the original CouchDB behaviour? Your changed behaviour would be
>> acceptable if I could do something like `?group_level=2,3,4,5`.
>>
>
> I imagine we could, but it would make the code a lot more complex. What is
> the reason for that?
> I find the fact that we return multiple group_levels for a set group_level
> very confusing. To me it feels like the reason we return extra group_levels
> is because of how B-trees work rather than it being a useful thing for a
> user.

This is the canonical example (and the previous 2-3 slides):

https://speakerdeck.com/wohali/10-common-misconceptions-about-apache-couchdb?slide=25

There are ways to do this with your approach, but they'll require retooling.

-Joan


Re: [DISCUSS] New Reduce design for FDB

2020-06-24 Thread Garren Smith
On Wed, Jun 24, 2020 at 6:47 PM Joan Touzet  wrote:

> Hi Garren,
>
> If the "options" field is left out, what is the default behaviour?
>

All group_levels will be indexed. I imagine this is what most CouchDB users
will want.


> Is there no way to specify multiple group_levels to get results that
> match the original CouchDB behaviour? Your changed behaviour would be
> acceptable if I could do something like `?group_level=2,3,4,5`.
>

I imagine we could, but it would make the code a lot more complex. What is
the reason for that?
I find the fact that we return multiple group_levels for a set group_level
very confusing. To me it feels like the reason we return extra group_levels
is because of how B-trees work rather than it being a useful thing for a
user.



Re: [DISCUSS] New Reduce design for FDB

2020-06-24 Thread Joan Touzet

Hi Garren,

If the "options" field is left out, what is the default behaviour?

Is there no way to specify multiple group_levels to get results that 
match the original CouchDB behaviour? Your changed behaviour would be 
acceptable if I could do something like `?group_level=2,3,4,5`.


-Joan


Re: CouchDB and Rust blogs

2020-06-24 Thread Garren Smith
Hi Jan,

Thanks! One of my first attempts, https://github.com/garrensmith/fortuna, was
embedding V8 as a NIF. It was my first NIF and the implementation is wrong,
but it did prove it was possible and something we could consider going
forward. The one thing I'm not 100% sure of is moving V8 across threads.
Paul and I have had a discussion around that and we can't find good
evidence on whether moving V8 across threads is a good or bad idea. I
happened to speak to Ryan Dahl about it in the Deno Discord channel and he
recommended against it. I think a good starting point would be to use
https://github.com/rusterlium/rustler to create a NIF and then use some of
the code from fortuna.

Also, I'm not sure embedding Deno would work. The Deno isolate is designed
to run in an async environment, so I think using standard V8 would be
better.

Cheers
Garren



Re: CouchDB and Rust blogs

2020-06-24 Thread Jan Lehnardt
Congrats Garren, this is really cool! :)

One related question: have you pondered embedding a JS engine into Erlang 
itself as well?

Best
Jan
—




Re: CouchDB and Rust blogs

2020-06-24 Thread Alessio 'Blaster' Biancalana
Wow that's very cool man, I'll read it and share it with my fellow
rustaceans!



CouchDB and Rust blogs

2020-06-24 Thread Garren Smith
Hi All,

I've been playing around with the Rust language quite a bit recently and
using it to write some Rust-related side projects. I've recently finished a
CouchDB View Server written in Rust using V8. Here is a blog post that
details the new View Server protocol for CouchDB 4.x and my Rust
implementation:
https://www.garrensmith.com/blogs/fortuna-rs-couchdb-view-server

A few weeks back I also wrote about a miniCouchDB implementation I wrote in
Rust during a Cloudant Hack week:
https://www.garrensmith.com/blogs/mini-couch-hack-week

Cheers
Garren


Re: Newsfeed IFRAME in Fauxton and IP collection

2020-06-24 Thread Robert Samuel Newson
Hi,

I share the discomfort with Fauxton making a remote connection without
warning and agree with Jan that some confirmation screen should be added.

It's also fine for this to be on master while it develops; master is not a
release and is not guaranteed to be releasable either. Anyone deploying
master directly does so at their own risk.

Finally, we kindly ask that all security related issues are responsibly 
disclosed to secur...@couchdb.apache.org.

B.




Re: Newsfeed IFRAME in Fauxton and IP collection

2020-06-24 Thread Jan Lehnardt



> On 24. Jun 2020, at 14:31, ermouth  wrote:
> 
>> My PR was meant to start this discussion
> 
> Unfortunately it was instead merged to master, which is unbearable imho.
> Shouldn’t that PR be rolled back and removed from the master branch
> immediately then?


As long as we make sure we don’t cut a release from this, which is currently
not planned, there is no need to rush a revert.

> As a proposal it’s ok, but to achieve intended goal I think it’s enough to
> add blogs to Documentation section. Btw making that section look like a
> grid of tiles with appropriate icons might greatly increase both its
> attractiveness and UX quality.

People don’t usually click through to the blog. There is tons of good
information there that folks in support channels ask questions about time
and time again. I wanted to give all this a more prominent spot, so folks
can learn about all the good stuff on their own.

Best
Jan
—

Re: Newsfeed IFRAME in Fauxton and IP collection

2020-06-24 Thread ermouth
> My PR was meant to start this discussion

Unfortunately it was instead merged to master, which is unbearable imho.
Shouldn’t that PR be rolled back and removed from the master branch
immediately then?

As a proposal it’s ok, but to achieve the intended goal I think it’s enough
to add blogs to the Documentation section. Btw, making that section look like
a grid of tiles with appropriate icons might greatly increase both its
attractiveness and UX quality.

ermouth


[DISCUSS] New Reduce design for FDB

2020-06-24 Thread Garren Smith
Quick note: I have a Markdown version of this in a gist that might be easier
to read: https://gist.github.com/garrensmith/1ad1176e007af9c389301b1b6b00f180

Hi Everyone,

The team at Cloudant have been revisiting reduce indexes for CouchDB on
FDB, and we want to simplify what we had initially planned and change some
of the reduce behaviour compared to CouchDB 3.x.

Our initial design was to use a skip list. However, this hasn’t proven to be
a particularly useful approach: it takes a very long time to update, and I
can’t find a good algorithm to query the skip list effectively.

So instead, I would like to propose a much simpler reduce implementation. I
would like to use this as the base for reduce, and we can look at adding
more functionality later if we need to.

For the new reduce design, instead of creating a skip list, we will
create group_level indexes for a key. For example, say we have the following
keys we want to add to a reduce index:

```
([2019, 6, 1], 1)
([2019, 6, 20], 1)
([2019, 7, 3], 1)
```

We would then create the following group_level indexes:

```
Level 0:
(null, 3)

Level 1:
([2019], 3)

Level 2:
([2019, 6], 2)
([2019, 7], 1)

Level 3:
([2019, 6, 1], 1)
([2019, 6, 20], 1)
([2019, 7, 3], 1)
```

All of these group_level indexes would form part of the reduce index and
would be updated at the same time. We don’t need to know the actual
`group_levels` ahead of time: we would take any key we need to index, look
at its length and add it to each group_level it belongs to.
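A minimal Python sketch of how the group_level rows above could be derived from emitted key/value pairs, assuming array keys and a `_sum` reduce. The function name `build_reduce_index`, its `levels` parameter (a stand-in for a `group_levels` option) and the `{level: {prefix: total}}` layout are illustrative assumptions, not the real FDB key layout; the level 0 prefix `()` plays the role of `null`.

```python
from collections import defaultdict


def build_reduce_index(rows, levels=None):
    """Build per-group_level `_sum` totals from emitted (key, value) pairs.

    rows   -- iterable of (key, value) pairs; every key is assumed to be a list
    levels -- group_levels to index (hypothetical stand-in for the
              `group_levels` option); None means index every level
    Returns {level: {prefix_tuple: total}} plus an "exact" entry holding
    the group=true rows.
    """
    index = defaultdict(lambda: defaultdict(int))
    for key, value in rows:
        # A key of length n belongs to group_levels 0..n.
        for level in range(len(key) + 1):
            if levels is None or level in levels:
                index[level][tuple(key[:level])] += value
        # The group=true (exact key) rows are always built.
        index["exact"][tuple(key)] += value
    return {level: dict(groups) for level, groups in index.items()}


rows = [([2019, 6, 1], 1), ([2019, 6, 20], 1), ([2019, 7, 3], 1)]
idx = build_reduce_index(rows)
print(idx[0])  # {(): 3}
print(idx[2])  # {(2019, 6): 2, (2019, 7): 1}
```

Passing `levels=[1, 3]` skips the level 0 and level 2 rows entirely, which is the kind of build-speed and storage trade-off a `group_levels` option would offer.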

Another nice optimization this enables is that when a user creates a view,
they can specify which group levels to index, e.g.:

```
{
  "_id": "_design/my-ddoc",
  "views": {
    "one": {
      "map": "function (doc) { emit(doc.val, 1); }",
      "reduce": "_sum"
    },
    "two": {
      "map": "function (doc) { emit(doc.age, 1); }",
      "reduce": "_count"
    }
  },
  "options": {"group_levels": [1, 3, 5]}
}
```
This gives the user the ability to trade off index build speed, storage
overhead and query performance.

One caveat, for now, is that if a user changes the `group_levels` to be
indexed, the index is invalidated and we would have to build it from
scratch again. Later we could look at doing some work around that so that
this isn’t the case.
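One way this invalidation could be wired up, sketched in Python under the assumption that the reduce index is identified by a hash of the parts of the design document that affect it. CouchDB already names view indexes by a signature computed from the design document, but the fields hashed here and the `index_signature` helper are assumptions. Because `group_levels` is part of the hashed input, changing it yields a new signature, i.e. a new index that is built from scratch.

```python
import hashlib
import json


def index_signature(ddoc):
    """Hypothetical signature for a design doc's reduce indexes.

    Only the view definitions and options.group_levels are hashed here;
    that selection is an assumption made for illustration.
    """
    relevant = {
        "views": ddoc.get("views", {}),
        "group_levels": ddoc.get("options", {}).get("group_levels"),
    }
    # Canonical JSON so that key order does not change the signature.
    canonical = json.dumps(relevant, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


views = {"one": {"map": "function (doc) { emit(doc.val, 1); }", "reduce": "_sum"}}
old = {"_id": "_design/my-ddoc", "views": views, "options": {"group_levels": [1, 3, 5]}}
new = {"_id": "_design/my-ddoc", "views": views, "options": {"group_levels": [1, 3]}}
print(index_signature(old) != index_signature(new))  # True: rebuild needed
```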

This design will result in a behaviour change. Currently, if you set
`group_level=2` on a reduce query, it returns all results with
`group_level=2` and below. E.g. given the following reduce key/values:

```
# group = true
{"key":1,"value":2},
{"key":2,"value":2},
{"key":3,"value":2},
{"key":[1,1],"value":1},
{"key":[1,2,6],"value":1},
{"key":[2,1],"value":1},
{"key":[2,3,6],"value":1},
{"key":[3,1],"value":1},
{"key":[3,1,5],"value":1},
{"key":[3,4,5],"value":1}
```

Then a query with `group_level=2` returns:

```
# group_level = 2
{"rows":[
{"key":1,"value":2},
{"key":2,"value":2},
{"key":3,"value":2},
{"key":[1,1],"value":1},
{"key":[1,2],"value":1},
{"key":[2,1],"value":1},
{"key":[2,3],"value":1},
{"key":[3,1],"value":2},
{"key":[3,4],"value":1}
]}
```
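To make the current behaviour concrete, here is a small Python sketch (not CouchDB's actual implementation) that reproduces the `group_level=2` result above from the `group=true` rows. It assumes a `_sum`/`_count`-style reduce, so re-reducing is just addition, and `group_level_3x` is a hypothetical name. Array keys longer than the group_level are truncated, while shorter and non-array keys pass through unchanged, which is where the extra `1`, `2` and `3` rows come from.

```python
from collections import defaultdict


def group_level_3x(rows, group_level):
    """Mimic CouchDB 3.x group_level semantics for a summing reduce."""
    groups = defaultdict(int)
    for key, value in rows:
        if isinstance(key, list):
            # Longer array keys are truncated; shorter ones are unchanged.
            key = tuple(key[:group_level])
        groups[key] += value
    return dict(groups)


# The group=true rows from the example above.
rows = [
    (1, 2), (2, 2), (3, 2),
    ([1, 1], 1), ([1, 2, 6], 1), ([2, 1], 1), ([2, 3, 6], 1),
    ([3, 1], 1), ([3, 1, 5], 1), ([3, 4, 5], 1),
]
for key, value in group_level_3x(rows, 2).items():
    print(key, value)
```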

I want to **CHANGE** this behaviour, so that if a query specifies
`group_level=2`, **only** `group_level=2` results are returned.
E.g. from the example above, the results would be:

```
# group_level = 2
{"rows":[
{"key":[1,1],"value":1},
{"key":[1,2],"value":1},
{"key":[2,1],"value":1},
{"key":[2,3],"value":1},
{"key":[3,1],"value":2},
{"key":[3,4],"value":1}
]}
```
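The proposed behaviour can be sketched the same way, again assuming a summing reduce and using a hypothetical function name: only array keys at least `group_level` long take part, truncated to exactly that length, so scalar keys and shorter arrays drop out. At `group_level=0` everything rolls up into the single `null` row, written here as `()`.

```python
from collections import defaultdict


def group_level_proposed(rows, group_level):
    """Return only the rows that belong to exactly `group_level`."""
    groups = defaultdict(int)
    for key, value in rows:
        if group_level == 0:
            groups[()] += value  # level 0: one total for the whole index
        elif isinstance(key, list) and len(key) >= group_level:
            groups[tuple(key[:group_level])] += value
        # Scalar keys and arrays shorter than group_level are skipped.
    return dict(groups)


rows = [
    (1, 2), (2, 2), (3, 2),
    ([1, 1], 1), ([1, 2, 6], 1), ([2, 1], 1), ([2, 3, 6], 1),
    ([3, 1], 1), ([3, 1, 5], 1), ([3, 4, 5], 1),
]
print(group_level_proposed(rows, 2))
```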


## Group_level=0
`group_level=0` queries would work as follows:
1. For a `group_level=0` query without a startkey/endkey, the `group_level=0`
index is used.
2. For a `group_level=0` query with a startkey/endkey, or where `group_level=0`
is not indexed, the query will look for the smallest `group_level` and use
that to calculate the `group_level=0` result.
3. `group_level=0` queries with a startkey/endkey could be slow and time out
in some cases because we have to do quite a lot of aggregation when
reading keys. But I don’t think that is much different from how it is done
now.
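Cases 1 and 2 above (without a startkey/endkey) could be sketched in Python as follows, using a hypothetical `{level: {prefix: total}}` layout as a stand-in for the real index. The fallback is only exact if every emitted key is at least as long as the smallest indexed level, since shorter keys never reach that level. The startkey/endkey handling from case 3 is left out of this sketch.

```python
def reduce_group_level_0(index):
    """Answer a group_level=0 query with no startkey/endkey.

    index -- {level: {prefix_tuple: total}} for the indexed group_levels
    """
    if 0 in index:
        # Case 1: the level 0 index already holds the single total.
        return index[0][()]
    # Case 2: aggregate every row of the smallest indexed group_level.
    # Assumes no emitted key is shorter than that level.
    smallest = min(index)
    return sum(index[smallest].values())


with_level_0 = {0: {(): 3}, 1: {(2019,): 3}}
levels_1_and_3 = {
    1: {(2019,): 3},
    3: {(2019, 6, 1): 1, (2019, 6, 20): 1, (2019, 7, 3): 1},
}
print(reduce_group_level_0(with_level_0))  # 3
print(reduce_group_level_0(levels_1_and_3))  # 3
```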

## Group=true
We will always build the `group=true` index.

## Querying non-indexed group_level
If a query has a `group_level` that is not indexed. We can do two things
here, CouchDB could use the nearest  `group_level` to service the query or
it could return an error that this `group_level` is not available to query.
I would like to make this configurable so that an admin can choose how
reduce indexes behave.
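A Python sketch of the configurable behaviour, with a hypothetical `on_missing` setting and exception name. The proposal suggests serving the query from the nearest `group_level`; this sketch instead re-aggregates from the `group=true` index, which is always built. The reason: an indexed level only contains keys at least that long, so re-aggregating from a higher level would miss keys that are shorter than it but long enough for the requested level. Using the nearest higher level gives the same answer whenever no such keys exist.

```python
from collections import defaultdict


class GroupLevelNotIndexed(Exception):
    """Hypothetical error raised when a group_level has no index."""


def query_group_level(index, exact, group_level, on_missing="fallback"):
    """Serve a group_level query, with configurable handling when that
    level is not indexed.

    index      -- {level: {prefix_tuple: total}} for the indexed levels
    exact      -- {key_tuple: total}, the always-built group=true index
    on_missing -- hypothetical admin setting: "fallback" or "error"
    """
    if group_level in index:
        return index[group_level]
    if on_missing == "error":
        raise GroupLevelNotIndexed(f"group_level={group_level} is not indexed")
    # Re-aggregate from group=true, which holds every key; a higher indexed
    # level would miss keys shorter than it but long enough for this query.
    groups = defaultdict(int)
    for key, value in exact.items():
        if len(key) >= group_level:  # proposed semantics: exactly this level
            groups[key[:group_level]] += value
    return dict(groups)


exact = {(2019, 6, 1): 1, (2019, 6, 20): 1, (2019, 7, 3): 1}
index = {1: {(2019,): 3}, 3: dict(exact)}  # e.g. group_levels: [1, 3]
print(query_group_level(index, exact, 2))  # {(2019, 6): 2, (2019, 7): 1}
```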

## Supported Builtin Reduces
Initially, we would support reduces that can be updated by calculating a
delta change and applying it to all the group_levels. That means we can
support `_sum` and `_count` quite easily. Initially, we won’t implement
`max` and `min`; however, I would like to add them soon after.
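Why `_sum` and `_count` are the easy cases: when a document's emitted row changes, every affected group_level row can be fixed up by adding a delta (subtracting the old value and adding the new one) without reading any other rows, whereas removing the current maximum or minimum requires recomputing from the remaining rows. A minimal Python sketch, with a hypothetical `apply_delta` helper over an illustrative `{level: {prefix: total}}` layout; FoundationDB's atomic add mutation looks like a natural fit for this in a real index, but that is an assumption here.

```python
def apply_delta(index, key, delta):
    """Apply a `_sum`/`_count` delta for one emitted key to every level.

    index -- {level: {prefix_tuple: total}}; the "exact" entry holds the
             group=true rows. Totals that drop to zero are kept in this sketch.
    """
    for level, groups in index.items():
        if level == "exact":
            prefix = tuple(key)
        elif level <= len(key):
            prefix = tuple(key[:level])
        else:
            continue  # the key is too short to belong to this level
        groups[prefix] = groups.get(prefix, 0) + delta


index = {
    0: {(): 3},
    1: {(2019,): 3},
    2: {(2019, 6): 2, (2019, 7): 1},
    "exact": {(2019, 6, 1): 1, (2019, 6, 20): 1, (2019, 7, 3): 1},
}
# A document's emitted key moves from [2019, 6, 20] to [2019, 7, 4].
apply_delta(index, [2019, 6, 20], -1)
apply_delta(index, [2019, 7, 4], +1)
print(index[2])  # {(2019, 6): 1, (2019, 7): 2}
```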

I would also like to add support for the `_stats` reducer later. The best
option I can think of is breaking up each field in `_stats` into its own
k/v row in FDB.
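Splitting `_stats` into one k/v row per field could look roughly like this Python sketch. The `(level, prefix, field)` key layout and the helper names are hypothetical; the field names are those of CouchDB's `_stats` output. It also shows which fields fit the delta approach: `sum`, `count` and `sumsqr` can be adjusted by adding a delta, but `min` and `max` cannot be corrected that way when a value is removed.

```python
STATS_FIELDS = ("sum", "count", "min", "max", "sumsqr")
DELTA_FIELDS = {"sum", "count", "sumsqr"}  # updatable by adding a delta


def stats_of(values):
    """Compute the `_stats` fields for a list of numbers."""
    return {
        "sum": sum(values),
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "sumsqr": sum(v * v for v in values),
    }


def stats_rows(level, prefix, stats):
    """Split one `_stats` value into one hypothetical k/v row per field."""
    return {(level, prefix, field): stats[field] for field in STATS_FIELDS}


rows = stats_rows(2, (2019, 6), stats_of([1, 2, 3]))
for key, value in rows.items():
    note = "delta" if key[2] in DELTA_FIELDS else "recompute on removal"
    print(key, value, note)
```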

I’m not sure how to handle the `_approx_count_distinct`. At the moment I
don’t understand the algorithm well enough to know if we could 

Re: Newsfeed IFRAME in Fauxton and IP collection

2020-06-24 Thread Jan Lehnardt
Thanks ermouth,

I’m surprised my proposal made it through without discussion. I have the
same question ;D

FWIW, this “leaks” the browser connection to the internet, not necessarily
CouchDB instance data.

For a production version of this, I would at least expect an opt-in button
on that page, before loading remote content.

My PR was meant to start this discussion :)

Best
Jan
—




Newsfeed IFRAME in Fauxton and IP collection

2020-06-24 Thread ermouth
Since I hadn’t received any answer on GitHub, I’d like to raise an
important CouchDB Fauxton security question publicly.

One of the latest Fauxton PRs
(https://github.com/apache/couchdb-fauxton/pull/1284) adds a remote newsfeed
to Fauxton. Emitting a newsfeed in the admin panel in that way may lead to
the IP addresses of CouchDB instances (or subnets, which is even worse)
being collected somewhere.

Where is this ‘somewhere’ located? Pinging blog.couchdb.org shows it points
to lb.wordpress.com, which seems a bit ridiculous. CouchDB instances are
not uncommon in very critical parts of infrastructure and security
projects, and I doubt anyone wants to expose node IPs to _whatever_ logs,
especially wordpress.com's.

So I’d like to ask devs and users: does anyone think adding news to the
admin panel is worth creating such a security hole?

ermouth