That looks really slick. I like it!

Adam
> On Feb 12, 2019, at 12:08 PM, Nick Vatamaniuc <vatam...@gmail.com> wrote:
>
> Shard Splitting API Proposal
>
> I'd like to thank everyone who contributed to the API discussion. As a
> result we have a much better and more consistent API than what we started
> with.
>
> Before continuing, I wanted to summarize what we ended up with. The main
> changes since the initial proposal were switching to /_reshard as the
> main endpoint and having a detailed state transition history for jobs.
>
> * GET /_reshard
>
> Top-level summary. Besides the new _reshard endpoint, there is now a
> `reason` field and the stats are more detailed.
>
> Returns
>
> {
>     "completed": 3,
>     "failed": 4,
>     "running": 0,
>     "state": "stopped",
>     "state_reason": "Manual rebalancing",
>     "stopped": 0,
>     "total": 7
> }
>
> * PUT /_reshard/state
>
> Start or stop global rebalancing.
>
> Body
>
> {
>     "state": "stopped",
>     "reason": "Manual rebalancing"
> }
>
> Returns
>
> {
>     "ok": true
> }
>
> * GET /_reshard/state
>
> Return the global resharding state and reason.
>
> {
>     "reason": "Manual rebalancing",
>     "state": "stopped"
> }
>
> * GET /_reshard/jobs
>
> Get the state of all the resharding jobs on the cluster. We now have a
> detailed state transition history, which looks similar to what
> _scheduler/jobs has.
>
> {
>     "jobs": [
>         {
>             "history": [
>                 {
>                     "detail": null,
>                     "timestamp": "2019-02-06T22:28:06Z",
>                     "type": "new"
>                 },
>                 ...
>                 {
>                     "detail": null,
>                     "timestamp": "2019-02-06T22:28:10Z",
>                     "type": "completed"
>                 }
>             ],
>             "id": "001-0a308ef9f7bd24bd4887d6e619682a6d3bb3d0fd94625866c5216ec1167b4e23",
>             "job_state": "completed",
>             "node": "node1@127.0.0.1",
>             "source": "shards/00000000-ffffffff/db1.1549492084",
>             "split_state": "completed",
>             "start_time": "2019-02-06T22:28:06Z",
>             "state_info": {},
>             "targets": [
>                 "shards/00000000-7fffffff/db1.1549492084",
>                 "shards/80000000-ffffffff/db1.1549492084"
>             ],
>             "type": "split",
>             "update_time": "2019-02-06T22:28:10Z"
>         },
>         {
>             ....
>         }
>     ],
>     "offset": 0,
>     "total_rows": 7
> }
>
> * POST /_reshard/jobs
>
> Create a new resharding job. This can now take other parameters and can
> split multiple ranges.
>
> To split one shard on a particular node:
>
> {
>     "type": "split",
>     "shard": "shards/80000000-bfffffff/db1.1549492084",
>     "node": "node1@127.0.0.1"
> }
>
> To split a particular range on all nodes:
>
> {
>     "type": "split",
>     "db": "db1",
>     "range": "80000000-bfffffff"
> }
>
> To split a range on just one node:
>
> {
>     "type": "split",
>     "db": "db1",
>     "range": "80000000-bfffffff",
>     "node": "node1@127.0.0.1"
> }
>
> To split all ranges of a db on one node:
>
> {
>     "type": "split",
>     "db": "db1",
>     "node": "node1@127.0.0.1"
> }
>
> The result may now contain multiple job IDs:
>
> [
>     {
>         "id": "001-d457a4ea82877a26abbcbcc0e01c4b0070027e72b5bf0c4ff9c89eec2da9e790",
>         "node": "node1@127.0.0.1",
>         "ok": true,
>         "shard": "shards/80000000-bfffffff/db1.1549986514"
>     },
>     {
>         "id": "001-7c1d20d2f7ef89f6416448379696a2cc98420e3e7855fdb21537d394dbc9b35f",
>         "node": "node1@127.0.0.1",
>         "ok": true,
>         "shard": "shards/c0000000-ffffffff/db1.1549986514"
>     }
> ]
>
> * GET /_reshard/jobs/$jobid
>
> Get just one job by ID.
>
> Returns something like this (notice it was stopped, since resharding is
> stopped on the cluster):
> {
>     "history": [
>         {
>             "detail": null,
>             "timestamp": "2019-02-12T16:55:41Z",
>             "type": "new"
>         },
>         {
>             "detail": "Shard splitting disabled",
>             "timestamp": "2019-02-12T16:55:41Z",
>             "type": "stopped"
>         }
>     ],
>     "id": "001-d457a4ea82877a26abbcbcc0e01c4b0070027e72b5bf0c4ff9c89eec2da9e790",
>     "job_state": "stopped",
>     "node": "node1@127.0.0.1",
>     "source": "shards/80000000-bfffffff/db1.1549986514",
>     "split_state": "new",
>     "start_time": "1970-01-01T00:00:00Z",
>     "state_info": {
>         "reason": "Shard splitting disabled"
>     },
>     "targets": [
>         "shards/80000000-9fffffff/db1.1549986514",
>         "shards/a0000000-bfffffff/db1.1549986514"
>     ],
>     "type": "split",
>     "update_time": "2019-02-12T16:55:41Z"
> }
>
> * GET /_reshard/jobs/$jobid/state
>
> Get the running state of a particular job only.
>
> {
>     "reason": "Shard splitting disabled",
>     "state": "stopped"
> }
>
> * PUT /_reshard/jobs/$jobid/state
>
> Stop or resume a particular job.
>
> Body
>
> {
>     "state": "stopped",
>     "reason": "Pause this job for now"
> }
>
> What do we think? Hopefully I captured everyone's input.
>
> I have started to test most of this behavior in API tests:
>
> https://github.com/apache/couchdb/blob/shard-split/src/mem3/test/mem3_reshard_api_test.erl
>
> Cheers,
>
> -Nick
>
> On Thu, Jan 31, 2019 at 3:43 PM Nick Vatamaniuc <vatam...@gmail.com> wrote:
>
>> Hi Joan,
>>
>> Thank you for taking a look!
>>
>>>> * `GET /_shard_splits`
>>>
>>> As a result I'm concerned: would we then have duplicate endpoints
>>> for /_shard_merges? Or would a unified /_reshard endpoint make
>>> more sense here?
>>
>> Good idea. Let's go with _reshard; it's more general and allows for
>> adding shard merging later.
>>
>>> I presume that if you've disabled shard manipulation on the
>>> cluster, the status changes to "disabled" and the value is the
>>> reason provided by the operator?
>>
>> Currently it's PUT /_reshard/state with body
>> {"state": "running" | "stopped", "reason": ...}.
>> This will be shown at the top level in the GET /_reshard response.
>>
>>>> Get a summary of shard splitting for the whole cluster.
>>>
>>> What happens if every node in the cluster is restarted while a shard
>>> split operation is occurring? Is the job persisted somewhere, i.e. in
>>> special docs in _dbs, or would this kill the entire operation? I'm
>>> considering rolling cluster upgrades here.
>>
>> The job checkpoints as it goes through its various steps, and that
>> state is saved in a _local document in the shard's db. So if a node is
>> restarted, the job will resume from the last checkpoint it stopped at.
>>
>>>> * `PUT /_shard_splits`
>>>
>>> Same comment as above about whether this is /_shard_splits or something
>>> that could expand to shard merging in the future as well.
>>>
>>> If you persist the state of the shard splitting operation when disabling,
>>> this could be used as a prerequisite to a rolling cluster upgrade
>>> (i.e., an important documentation update).
>>
>> I think after discussing with other participants this became PUT
>> /_shard_splits/state (now PUT /_reshard/state). The disabled state is
>> also persisted on a per-node basis.
>>
>> An interesting thing to think about: if a node is down when shard
>> splitting is stopped or started, it won't find out about it. So I think
>> we might have to do some kind of querying of neighboring nodes to detect
>> whether a node that just joined missed a recent change to the global state.
>>
>>>> * `POST /_shard_splits/jobs`
>>>>
>>>> Start a shard splitting job.
>>>>
>>>> Request body:
>>>>
>>>> {
>>>>     "node": "dbc...@db1.sandbox001.cloudant.net",
>>>>     "shard": "shards/00000000-FFFFFFFF/username/dbname.$timestamp"
>>>> }
>>>
>>> 1. Agree with earlier comments that having to specify this per-node is
>>> a nice-to-have, but really an end user wants to specify a *database*,
>>> and have the API create the q*n jobs needed.
>>> It would then return an array of jobs in the format you describe.
>>
>> OK, I think that's doable if we switch the response to be an array of
>> job IDs. Then we might also have to think about various failure modes,
>> such as: what if one of the nodes where a copy lives is not up? Should
>> that be a failure, or do we continue splitting just the 2 available
>> copies?
>>
>>> 2. Same comment as above; why not add a new field for "type": "split"
>>> or "merge" to make this expandable in the future?
>>
>> That makes sense. I can add a type field if we have _reshard as the
>> top-level endpoint.
>>
>>> -Joan
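[Editor's note] The target shards in the examples above always cover the two halves of the source shard's hex range (e.g. `80000000-bfffffff` splits into `80000000-9fffffff` and `a0000000-bfffffff`). A minimal sketch of that arithmetic, to help readers check the examples; this is an illustration, not CouchDB's actual implementation, and `split_range` is a hypothetical name:

```python
def split_range(rng: str, parts: int = 2) -> list[str]:
    """Split a CouchDB-style hex shard range like '80000000-bfffffff'
    into `parts` contiguous sub-ranges, as a shard split would."""
    lo_s, hi_s = rng.split("-")
    lo, hi = int(lo_s, 16), int(hi_s, 16)
    width = (hi - lo + 1) // parts
    out = []
    for i in range(parts):
        a = lo + i * width
        # Last part absorbs any remainder so the whole range is covered.
        b = lo + (i + 1) * width - 1 if i < parts - 1 else hi
        out.append(f"{a:08x}-{b:08x}")
    return out

# Matches the targets in the GET /_reshard/jobs/$jobid example:
# split_range("80000000-bfffffff")
#   -> ["80000000-9fffffff", "a0000000-bfffffff"]
```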
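[Editor's note] Joan's q*n point can be pictured concretely: a db-level POST would fan out to one split job per shard copy, i.e. q ranges times n replicas. A hedged sketch under assumed inputs; `fanout_jobs` and the `shard_map` shape are illustrative, not an actual mem3 API:

```python
def fanout_jobs(db: str, shard_map: dict[str, list[str]]) -> list[dict]:
    """Given a db name and a (hypothetical) mapping of range -> nodes
    holding a copy, return one split-job spec per shard copy: q * n jobs."""
    jobs = []
    for rng, nodes in sorted(shard_map.items()):
        for node in nodes:
            jobs.append({"type": "split", "db": db, "range": rng, "node": node})
    return jobs

# Example: q = 2 ranges, n = 3 copies -> 6 jobs in total.
shard_map = {
    "00000000-7fffffff": ["node1@127.0.0.1", "node2@127.0.0.1", "node3@127.0.0.1"],
    "80000000-ffffffff": ["node1@127.0.0.1", "node2@127.0.0.1", "node3@127.0.0.1"],
}
```

This also makes Nick's failure-mode question concrete: if one node in a range's node list is down, the API must decide whether to reject the whole fan-out or create jobs only for the reachable copies.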