Re: Shard level querying in CouchDB Proposal

Mike Rhodes Thu, 02 Aug 2018 01:51:03 -0700

Joan,

I'm in agreement that this feature isn't "more important" -- but neither is it 
"less important". They're both vital for different sets of users (app devs vs. 
admins I think).

Disclosure: this URL scheme was mostly my thinking and is partly based around a 
future that is more partition aware.

For users with large databases -- think tens/hundreds of shards -- partitioning 
is vitally important and front-and-centre in the data model and development 
mindset. So my argument is in part the same as yours: partitions are, for a 
developer, a new but totally first-class aspect of the data model, in the same 
way that shards are from a more admin-based perspective. So I think both 
_shards and _partitions make sense, and both concepts are important to 
developers and admins -- different people, or the same people wearing 
different hats at different times.

This is because it's the primary data that's partitioned; it's not just a view 
index partition or a Mango partition. From this I see a partition as a further 
logical subdivision of documents within a CouchDB instance -- in the same way 
that a database is a logical subdivision of documents.

Therefore having partition as a first-class part of the URL rather than a 
secondary part makes sense to me. The CouchDB path hierarchy currently is of 
the form /<data subdivision>/<index within data subdivision> which in the world 
view above implies the logical /<data subdivision>/<further data 
subdivision>/<index within data subdivision> to maintain consistency.
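
To make the two layouts under discussion concrete, here is a minimal Python 
sketch that builds each path. The function names are hypothetical, the 
_partition segment is only the proposal from this thread, and the sketch uses 
CouchDB's existing _design segment where the quoted message writes 
_designdoc.

```python
from urllib.parse import quote


def _seg(value: str) -> str:
    """Percent-encode one path segment, including any '/' it contains."""
    return quote(value, safe="")


def partition_first(db: str, partition: str, ddoc: str, view: str) -> str:
    # /<data subdivision>/<further data subdivision>/<index within data subdivision>
    # (_partition is the proposed, not yet existing, namespace)
    return (f"/{_seg(db)}/_partition/{_seg(partition)}"
            f"/_design/{_seg(ddoc)}/_view/{_seg(view)}")


def view_first(db: str, partition: str, ddoc: str, view: str) -> str:
    # Alternative layout from the quoted message: partition below the view.
    return (f"/{_seg(db)}/_design/{_seg(ddoc)}/_view/{_seg(view)}"
            f"/_partition/{_seg(partition)}")


print(partition_first("db", "foo", "name", "viewname"))
print(view_first("db", "foo", "name", "viewname"))
```

In the first form everything after /db/_partition/foo keeps the shape of the 
existing database-level path; in the second the partition key trails the 
index.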

I will admit that there is a certain awkwardness in shoe-horning this new 
concept onto an existing API (e.g., should /db/_partition/foo/dockey do a 
document GET? should POST /db/_partition/foo auto-generate a dockey?) but I 
feel that having the defined namespace allows us to make those choices without 
radically changing the API and would allow future expansion of the first-class 
nature of this API.

In addition, when developing I think it makes sense to a user (well, it does to 
me anyway) that the notion of "requests made to endpoints under the _partition 
namespace are more performant and preferred for large scale databases" is 
easier to consume than "you can use endpoints X, Y, Z in a scalable manner if 
you also provide this bit of path on the end". As new endpoints become 
partition aware -- if that makes sense, which I suspect they will end up 
being, not least something like /db/_partition/foo/_info for partition size, 
doc count etc. -- they have a natural place to live within the existing path 
hierarchy.
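
One consequence of a dedicated namespace is that "is this request partition 
scoped?" becomes a check on the path prefix alone. A rough Python sketch, 
assuming the proposed /<db>/_partition/<key>/... layout and a path with no 
query string (partition_of is a hypothetical helper, not CouchDB code):

```python
from typing import Optional
from urllib.parse import unquote


def partition_of(path: str) -> Optional[str]:
    """Return the partition key if `path` is under /<db>/_partition/<key>, else None."""
    parts = path.lstrip("/").split("/")
    # parts[0] is the database; a partition-scoped request has "_partition"
    # as the next segment, followed by a non-empty key.
    if len(parts) >= 3 and parts[1] == "_partition" and parts[2]:
        return unquote(parts[2])
    return None


print(partition_of("/db/_partition/foo/_info"))
print(partition_of("/db/_design/name/_view/viewname"))
```

With the partition at the end of the path instead, the same check would need 
to know the shape of every endpoint that can carry one.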

I do agree with the confusion aspect of shards and partitions, and I'm unsure 
exactly the way forward here yet :(

-- 
Mike.

On Wed, 1 Aug 2018, at 14:07, Joan Touzet wrote:
> Hi everyone,
> 
> Recently, Garren and Robert started making progress on this proposal via a PR:
> 
> https://github.com/apache/couchdb/pull/1480
> 
> and specifically:
> 
> https://github.com/apache/couchdb/pull/1480#issuecomment-409565736
> 
> which has led me to a strong -0.75 on the proposed endpoint of:
> 
>     /db/_partition/:partitionkey/_designdoc/name/_view/viewname 
> 
> Here's why.
> 
> We absolutely must get rid of port 5986, which is currently the only way 
> to get to actual disk-level shards in CouchDB today. The route for that 
> will probably look something like:
> 
>     /db/_shard/00000000-1fffffff/...
> 
> This is critical for cluster-level administration, health checks, etc. 
> and to fully remove the old couch_httpd code from the codebase (which is 
> desperately overdue for happening, and must happen prior to a 3.0 
> release). I'm sad we don't have this code yet, especially since like 
> children in a well-stocked larder we're rushing to the jams and pies 
> before having our main courses, but such is the nature of shiny things.
> 
> Now I see we are introducing view partitions, which to me really should 
> be below the view portion here:
> 
>     /db/_designdoc/name/_view/viewname/_partition/:partitionkey/
> 
> End users who are new to CouchDB 2.x are still just learning about 
> shards. Partitions are only going to further muddy the waters.
> 
> As I said on IRC, "i 100% guarantee you that people will not understand 
> the difference between a db shard and a db partition if we introduce 
> both concepts without careful thought :)" To me, this is not carefully 
> thought out.
> 
> Garren mentions this will also surface for find/index, and thus makes a 
> case for it being farther up. But I argue that with /db/_shard and /db/
> _partition people will have no idea what they are doing.
> 
> Please help me disentangle this ball of yarn. And don't make the new 
> feature "more important" than shard-level access, it's not.
> 
> -Joan
