Per-shard will not mean anything from 4.0 onward so at least I was not thinking of that.
Per-partition has meaning from 3.0 onward, as partitions are user defined.

B.

> On 14 May 2020, at 22:11, Joan Touzet <woh...@apache.org> wrote:
>
> On 2020-05-13 10:07 a.m., Robert Samuel Newson wrote:
>> Hi,
>> Yes, I think this would be a good addition for 3.0. I think we didn't add it
>> before because of concerns of accidental misuse (attempting to replicate
>> with it but forgetting a range, etc)?
>
> This was definitely a concern, but it makes me wonder: are we also
> potentially discussing adding per-shard _changes feeds? I wonder if the work
> is roughly analogous.
>
>> Whatever the reasons, I think exposing the per-partition _changes feed
>> exactly as you've described will be useful. We should state explicitly in
>> the accompanying docs that the replicator does not use this endpoint
>> (though, of course, it might be enhanced to do so in a future release).
>> From 4.0 onward, there's a discussion elsewhere on whether any of the
>> _partition endpoints continue to exist (leaning towards keeping them just to
>> avoid unnecessary upgrade pain?), so a note in that thread would be good
>> too. It does seem odd to enhance an endpoint in 3.0 to then remove it
>> entirely in 4.0. The reasons for removing _partition are compelling however,
>> as the motivating (internal) reason for introducing _partition is gone.
>> B.
>>> On 12 May 2020, at 22:59, Adam Kocoloski <kocol...@apache.org> wrote:
>>>
>>> Hi all,
>>>
>>> When we introduced partitioned databases in 3.0 we declined to add a
>>> partition-specific _changes endpoint, because we didn’t have a prebuilt
>>> index that could support it. It sounds like the lack of that endpoint is a
>>> bit of a drag. I wanted to start this thread to consider adding it.
>>>
>>> Note: this isn’t a fully-formed proposal coming from my team with a plan to
>>> staff the development of it. Just a discussion :)
>>>
>>> In the simplest case, a _changes feed could be implemented by scanning the
>>> by_seq index of the shard that hosts the named partition. We already get
>>> some efficiencies here: we don’t need to touch any of the other shards of
>>> the database, and we have enough information in the by_seq btree to filter
>>> out documents from other partitions without actually retrieving them from
>>> disk, so we can push the filter down quite nicely without a lot of extra
>>> processing. It’s just a very cheap binary prefix pattern match on the docid.
>>>
>>> Most consumers of the _changes feed work incrementally, and we can support
>>> that here as well. It’s not like we need to do a full table scan on every
>>> incremental request.
>>>
>>> If the shard is hosting so many partitions that this filter is becoming a
>>> bottleneck, resharding (also new in 3.0) is probably a good option.
>>> Partitioned databases are particularly amenable to increasing the shard
>>> count. Global indexes on the database become more expensive to query, but
>>> those ought to be a smaller percentage of queries in this data model.
>>>
>>> Finally, if the overhead of filtering out non-matching partitions is just
>>> too high, we could support the use of user-created indexes, e.g. by having
>>> a user create a Mango index on _local_seq. If such an index exists, our
>>> “query planner” uses it for the partitioned _changes feed. If not, resort
>>> to the scan on the shard’s by_seq index as above.
>>>
>>> I’d like to do some basic benchmarking, but I have a feeling the by_seq
>>> scan will work quite well in the majority of cases, and the user-defined
>>> index is a good “escape valve” if we need it. WDYT?
>>>
>>> Adam
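
For anyone following along, the by_seq scan described in the original proposal can be sketched roughly as below. This is only an illustration in Python, not CouchDB code: the list of (update_seq, docid) pairs is a hypothetical stand-in for one shard's by_seq btree, `partition_changes` is a made-up name, and it assumes partitioned docids have the form "partition:key". It shows the two ideas from the thread: resume from `since` without rescanning earlier entries, and filter with a prefix match on the docid alone, without loading document bodies.

```python
from bisect import bisect_left
from typing import Iterator, List, Tuple

# Hypothetical stand-in for one shard's by_seq index: (update_seq, docid)
# pairs sorted by update_seq. This is NOT CouchDB's real btree layout.
BySeq = List[Tuple[int, str]]


def partition_changes(
    by_seq: BySeq, partition: str, since: int = 0
) -> Iterator[Tuple[int, str]]:
    """Yield (seq, docid) for changes in `partition` with seq > since."""
    # Assumption: docids in a partitioned database look like "partition:key".
    prefix = partition + ":"
    # Incremental read: jump to the first entry with seq > since.
    # The 1-tuple (since + 1,) sorts before any (since + 1, docid) pair.
    start = bisect_left(by_seq, (since + 1,))
    for seq, docid in by_seq[start:]:
        # Cheap prefix match on the docid; no document body is needed.
        if docid.startswith(prefix):
            yield seq, docid


if __name__ == "__main__":
    shard = [(1, "a:x"), (2, "b:y"), (3, "a:z"), (4, "ab:w")]
    for change in partition_changes(shard, "a", since=1):
        print(change)
```

The cost of each incremental request in this sketch is proportional to the number of entries written to the shard after `since`, regardless of partition, which is why resharding or a user-defined index would be the fallback if one shard hosts very many busy partitions.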