Since CouchDB 2.0, clustered databases have had a fixed Q value defined at
creation. This often requires users to predict database usage ahead of time,
which can be hard to do. Too low a value might result in large shards,
slower performance, and more disk space needed for compactions.



It would be nice to start with a low Q initially, for example Q=1, and as
usage grows be able to split shards that become too big. Especially after
the partitioned query work
(https://github.com/apache/couchdb/pull/1789), there will be a higher chance
of unevenly sized shards, so it will be beneficial to split the larger ones
to even out the size distribution across the cluster.



This proposal is to introduce such a feature to Apache CouchDB 2.x.



From the user's perspective, there would be a new HTTP API endpoint. A POST
request to it with a node and a shard path would start a shard split job.
Users would be able to monitor the state of this job and see when it
completed. In the future this opens the possibility of writing an
auto-splitting service that splits shards automatically when they reach a
particular size or based on other parameters.
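To make this concrete, here is a rough Python sketch of how a client or an
auto-splitting service might start and monitor a split job. The endpoint
name, request fields, and response shape are just placeholders, since the
actual HTTP API still needs to be designed:

    # Hypothetical sketch only: "_shard_splits", the request body fields and
    # the response keys are placeholders, not a final API.
    import requests

    BASE = "http://adm:pass@localhost:15984"

    # Start a split job for one shard file on one node (names are made up).
    resp = requests.post(f"{BASE}/_shard_splits", json={
        "node": "couchdb@node1.example.com",
        "shard": "shards/00000000-ffffffff/mydb.1549037343",
    })
    job_id = resp.json().get("job_id")

    # Poll the job until it reports completion.
    state = requests.get(f"{BASE}/_shard_splits/{job_id}").json().get("state")
    print(state)  # e.g. "running" or "completed"

An auto-splitter would simply wrap the POST above in a loop that checks
shard sizes and submits jobs for the ones over a threshold.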



Paul Davis and I have been experimenting over the last few months to see if
it is possible to do this. The progress so far is here:



https://github.com/cloudant/couchdb/commits/shard-splitting



Most of the bits are in the mem3_shard_* and couch_db_split modules.



There is an initial bulk copy of data from the source shard to the target
shards. So a shard covering the 00-ff range would be split into two shards
with ranges 00-7f and 80-ff. While copying, each document ID is hashed and,
depending on which side of the range it falls in, it ends up in either the
00-7f shard or the 80-ff one. Then, when that is done, indices are rebuilt
for each target shard range. Finally, the cluster-wide shard map is updated
and the source shard is deleted.
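For illustration, here is a small Python sketch of that routing decision,
assuming the split targets keep the same crc32-based placement hash that
mem3 uses today (the ranges are the full 32-bit forms of the 00-7f / 80-ff
shorthand above):

    # Sketch of picking a target shard for each copied document, assuming a
    # crc32 placement hash over the document ID, as in mem3 today.
    import zlib

    TARGETS = [
        ("00000000-7fffffff", 0x00000000, 0x7fffffff),
        ("80000000-ffffffff", 0x80000000, 0xffffffff),
    ]

    def pick_target(doc_id: str) -> str:
        """Return the target shard range this document ID hashes into."""
        h = zlib.crc32(doc_id.encode("utf-8")) & 0xFFFFFFFF
        for name, lo, hi in TARGETS:
            if lo <= h <= hi:
                return name
        raise AssertionError("hash outside ring")  # unreachable for 32-bit crc32

    print(pick_target("mydoc-001"))  # e.g. "00000000-7fffffff"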



There are other details, such as the internal replicator needing to know how
to replicate to a target that was split, and fabric coordinators needing to
handle uneven shard copies. The HTTP API would also need to be designed and
implemented, along with many other bits.



What does the community at large think about this? If we like it, I can
move that work to an ASF CouchDB branch and open a PR to finalize the
design and continue the discussion there.
