All,
Just to be clear before starting: this is a proposal from Cloudant not just
myself :D
## What?
Introduce a user-specified shard key per document and a way for the user to
scope queries to a single shard using this key, thereby reducing query latency
by allowing the database to consult only the shard required to answer the
user's query. Documents can share a shard key and all documents with a given
shard key are written to the same shard. The shard key therefore overrides
using the document ID as the key to shard by.
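To make the placement rule concrete, here is a minimal sketch of hashing only the shard key rather than the whole document ID. The CRC32 hash, the equal split of the hash space over Q shards, and the name `shard_for` are all assumptions made for illustration; nothing here is meant to describe the eventual implementation.

```python
import zlib


def shard_for(doc_id: str, q: int, use_shard_key: bool) -> int:
    """Return a shard index in [0, q) for doc_id.

    Assumption: placement is CRC32 of the hashed string, mapped onto q
    equal ranges of the 32-bit hash space.
    """
    # With shard keys enabled, only the part before the first ':' is hashed.
    hashed = doc_id.split(":", 1)[0] if use_shard_key else doc_id
    h = zlib.crc32(hashed.encode("utf-8"))  # unsigned 32-bit value in Python 3
    return h * q // 2**32


ids = ["customer42:order-1", "customer42:order-2", "customer42:invoice-9"]

# Documents sharing a shard key always map to a single shard.
print({shard_for(i, 8, use_shard_key=True) for i in ids})

# Hashing the whole ID may spread the same documents over several shards.
print({shard_for(i, 8, use_shard_key=False) for i in ids})
```

Note that two different shard keys can still hash to the same shard; the sketch only guarantees that one shard key never spans shards.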
## What problem does this proposal address?
This proposal aims to improve the performance and throughput of queries
generally and, specifically, to significantly decrease the effect of data size
on query latency and cluster load, giving the database better data scaling
characteristics.
CouchDB's current queries -- whether to views, search indexes or Mango --
perform badly when the shard count (Q) for a database is high. This is because
the coordinator node can't know in advance which shards hold results for the
query, so must make a scatter-gather request to all shards, even those that in
the end return no results. In turn this generates more load on all servers in
the cluster because of many possibly redundant file reads.
We did some hacky performance testing of this idea by modifying the current
view API so that a query can name a specific view shard. The results were very
promising: queries started scaling with characteristics similar to lookups, as
you'd hope, in that Q ceased to have such a large effect and load on the
cluster decreased.
## API
There are a few general considerations which act as constraints on the API. To
get these out in the open:
• The combined <shardkey>:<dockey> must be a valid document id. This enforces
uniqueness of shardkey:dockey across the database and keeps the shard key
unchanged across document updates.
• Aside from calculation of shard location, we treat the combined
<shardkey>:<dockey> as we do any other document _id.
• All existing tooling should continue to work as-is (e.g., changes feed
doesn't have to change to specify a shardkey:dockey pair and the replicator
will "just work").
• For Cloudant, it's useful to be able to differentiate queries by path for our
internal accounting.
For this, a document ID becomes a composite value:
`<shardkey>:<documentkey>`
• shardkey can contain any characters valid in a document ID, but must not
start with an _ or contain a : or /.
• documentkey can contain any characters valid in a document ID. Any further :
characters are treated as part of the key and have no special meaning.
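The splitting rule above could be sketched roughly as follows. The function name `split_doc_id` and the rejection of empty keys are assumptions on my part; the proposal itself only states the rules about `_`, `:` and `/`.

```python
def split_doc_id(doc_id: str) -> tuple[str, str]:
    """Split '<shardkey>:<documentkey>' at the first ':' and validate it."""
    # Only the first ':' separates the two parts; later ones stay in the key.
    shard_key, sep, doc_key = doc_id.partition(":")
    if not sep:
        raise ValueError("document ID has no shard key")
    # Assumption: empty keys are rejected (the proposal does not say).
    if not shard_key or not doc_key:
        raise ValueError("shard key and document key must be non-empty")
    # The shard key cannot contain ':' by construction, so only '_' and '/'
    # need checking here.
    if shard_key.startswith("_") or "/" in shard_key:
        raise ValueError("shard key must not start with '_' or contain '/'")
    return shard_key, doc_key


print(split_doc_id("customer42:order:2017:1"))
# prints ('customer42', 'order:2017:1')
```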
From this, our proposed API is as follows, which should also help clarify how
things work:
• Enable shard keys using database create: `PUT /db?use_shard_key=true`.
  Default false.
  • For databases where this is enabled, every document needs a shard key.
• Upload a document: `PUT /db/<shardkey>:<dockey>`
  • In these databases, documents need user-specified document IDs, as the
    shard key needs to be set by the user.
  • Therefore reject POSTing documents to /db, for simplicity (rather than
    validating an _id field).
• Other things that create documents will need the same validation of the
  document ID:
  • Copy a document: same as now, but reject when the destination document ID
    does not have a shard key.
  • Calls like `POST /db/_bulk_docs`,
    `POST /db/_design/mydoc/_update/myupdatefunction` and others that accept
    documents via POST bodies will need to inspect the body for shard keys.
• Get a document: `GET /db/<shardkey>:<dockey>`.
• Special documents:
  • Design documents and local documents have no shardkey in their document
    ID. This is enforced.
  • Reject documents with IDs of the form `<shardkey>:_design/foo`.
• Query requests introduce a shardkey into the path, following an explicit
  `_shardkey` path part:
  • Mango: `POST /db/_shardkey/<shardkey>/_find`.
  • Views:
    • `GET /db/_shardkey/<shardkey>/_design/mydoc/_view/myview?include_docs=true`
    • `POST /db/_shardkey/<shardkey>/_design/mydoc/_view/myview?include_docs=true`
  • Query results are restricted to documents with the shard key specified.
    This makes things harder but leaves the door open for future things like
    shard-splitting without changing result sets. It also seems like what one
    would expect!
• The above is a whitelist of APIs, so here are a few notes on some cases I've
  left out, with reasons:
  • _all_docs (can see uses, but would rather keep the API small to start
    with). Note that Mango depends on the internal _all_docs API.
  • _changes (use cases better served by an explicit shard API).
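To illustrate how the ID rules and the `_shardkey` path part fit together, here is a rough sketch of the checks and path building. All function names are hypothetical, and rejecting `<shardkey>:_local/...` alongside `<shardkey>:_design/...` is my reading of "this is enforced" rather than something spelled out in the list.

```python
from urllib.parse import quote

SPECIAL_PREFIXES = ("_design/", "_local/")


def check_doc_id(doc_id: str) -> None:
    """Raise ValueError if doc_id is not acceptable in a shard-key database."""
    # Design and local documents carry no shard key at all.
    if doc_id.startswith(SPECIAL_PREFIXES):
        return
    shard_key, sep, doc_key = doc_id.partition(":")
    if not sep:
        raise ValueError(f"{doc_id!r}: missing shard key")
    if shard_key.startswith("_") or "/" in shard_key:
        raise ValueError(f"{doc_id!r}: invalid shard key")
    # Assumption: '_local/' is rejected here as well as '_design/'.
    if doc_key.startswith(SPECIAL_PREFIXES):
        raise ValueError(f"{doc_id!r}: special documents take no shard key")


def check_bulk_docs(body: dict) -> None:
    """Inspect a _bulk_docs request body for shard keys."""
    for doc in body.get("docs", []):
        if "_id" not in doc:
            raise ValueError("documents need a user-specified _id")
        check_doc_id(doc["_id"])


def find_path(db: str, shard_key: str) -> str:
    """Path for a Mango query scoped to one shard key."""
    return f"/{quote(db, safe='')}/_shardkey/{quote(shard_key, safe='')}/_find"


def view_path(db: str, shard_key: str, ddoc: str, view: str) -> str:
    """Path for a view query scoped to one shard key."""
    return (
        f"/{quote(db, safe='')}/_shardkey/{quote(shard_key, safe='')}"
        f"/_design/{quote(ddoc, safe='')}/_view/{quote(view, safe='')}"
    )


check_bulk_docs({"docs": [{"_id": "customer42:order-1"}]})
print(find_path("db", "customer42"))
print(view_path("db", "customer42", "mydoc", "myview"))
```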
## What are we NOT addressing here?
This proposal only provides indirect shard addressing -- it's specifically not
covering things discussed previously [1], such as per-shard changes feeds,
where one needs to directly address each shard. The shard key only controls
the bucketing
of documents into shards and doesn't say anything about, for example, whether
two shard keys end up on the same shard -- that's an internal detail.
I'd really appreciate thoughts and comments before Cloudant get started on
implementation of this.
Mike.
[1]: https://issues.apache.org/jira/browse/COUCHDB-2791