Re: [DISCUSS] _changes feed on database partitions

Robert Samuel Newson Thu, 14 May 2020 14:14:25 -0700

Per-shard will not mean anything from 4.0 onward so at least I was not thinking 
of that.


per-partition has meaning from 3.0 onward as partitions are user defined.

B.

> On 14 May 2020, at 22:11, Joan Touzet <woh...@apache.org> wrote:
> 
> 
> 
> On 2020-05-13 10:07 a.m., Robert Samuel Newson wrote:
>> Hi,
>> Yes, I think this would be a good addition for 3.0. I think we didn't add it 
>> before because of concerns of accidental misuse (attempting to replicate 
>> with it but forgetting a range, etc)?
> 
> This was definitely a concern, but it makes me wonder: are we also 
> potentially discussing adding per-shard _changes feeds? I wonder if the work 
> is roughly analogous.
> 
>> Whatever the reasons, I think exposing the per-partition _changes feed 
>> exactly as you've described will be useful. We should state explicitly in 
>> the accompanying docs that the replicator does not use this endpoint 
>> (though, of course, it might be enhanced to do so in a future release).
>> From 4.0 onward, there's a discussion elsewhere on whether any of the 
>> _partition endpoints continue to exist (leaning towards keeping them just to 
>> avoid unnecessary upgrade pain?), so a note in that thread would be good 
>> too. It does seem odd to enhance an endpoint in 3.0 to then remove it 
>> entirely in 4.0. The reasons for removing _partition are compelling however, 
>> as the motivating (internal) reason for introducing _partition is gone.
>> B.
>>> On 12 May 2020, at 22:59, Adam Kocoloski <kocol...@apache.org> wrote:
>>> 
>>> Hi all,
>>> 
>>> When we introduced partitioned databases in 3.0 we declined to add a 
>>> partition-specific _changes endpoint, because we didn’t have a prebuilt 
>>> index that could support it. It sounds like the lack of that endpoint is a 
>>> bit of a drag. I wanted to start this thread to consider adding it.
>>> 
>>> Note: this isn’t a fully-formed proposal coming from my team with a plan to 
>>> staff the development of it. Just a discussion :)
>>> 
>>> In the simplest case, a _changes feed could be implemented by scanning the 
>>> by_seq index of the shard that hosts the named partition. We already get 
>>> some efficiencies here: we don’t need to touch any of the other shards of 
>>> the database, and we have enough information in the by_seq btree to filter 
>>> out documents from other partitions without actually retrieving them from 
>>> disk, so we can push the filter down quite nicely without a lot of extra 
>>> processing. It’s just a very cheap binary prefix pattern match on the docid.
>>> 
>>> Most consumers of the _changes feed work incrementally, and we can support 
>>> that here as well. It’s not like we need to do a full table scan on every 
>>> incremental request.
>>> 
>>> If the shard is hosting so many partitions that this filter is becoming a 
>>> bottleneck, resharding (also new in 3.0) is probably a good option. 
>>> Partitioned databases are particularly amenable to increasing the shard 
>>> count. Global indexes on the database become more expensive to query, but 
>>> those ought to be a smaller percentage of queries in this data model.
>>> 
>>> Finally, if the overhead of filtering out non-matching partitions is just 
>>> too high, we could support the use of user-created indexes, e.g. by having 
>>> a user create a Mango index on _local_seq. If such an index exists, our 
>>> “query planner” uses it for the partitioned _changes feed. If not, resort 
>>> to the scan on the shard’s by_seq index as above.
>>> 
>>> I’d like to do some basic benchmarking, but I have a feeling the by_seq 
>>> work quite well in the majority of cases, and the user-defined index is a 
>>> good "escape valve” if we need it. WDYT?
>>> 
>>> Adam

Re: [DISCUSS] _changes feed on database partitions

Reply via email to