Nice API. A few remarks:

* `GET /_shard_splits`
  What would `#/states/running` look like in the response to `GET /_shard_splits` when many jobs are running? My guess is that the list would contain every node in the cluster. How useful is that to know? Should we return the number of jobs per node instead? Or maybe omit the list of nodes entirely?

* `PUT /_shard_splits`

  The naming is inconsistent: the description talks about enabling or disabling the feature globally for the cluster, while the request body uses the verbs `start` and `stop`. Do we support pausing the shard splitting process?

best regards,
iilyak

On 2019/01/23 19:08:41, Nick Vatamaniuc <vatam...@apache.org> wrote:
> In a previous thread
> https://mail-archives.apache.org/mod_mbox/couchdb-dev/201901.mbox/%3CCAJd%3D5Hbs%2BNwrt0%3Dz%2BGN68JPU5yHUea0xGRFtyow79TmjGN-_Sg%40mail.gmail.com%3E
> we discussed the possibility of adding shard splitting to CouchDB.
>
> There was some interest in the IRC channel and on the mailing list, and I
> would like some help refining the new API so that the whole community has
> a say in it. I would also like to propose continuing this work in the ASF
> repo for visibility and continued feedback from the whole community.
>
> To recap the previous thread, this is about having the basics necessary to
> split database shards. It would happen without having to stop the cluster
> or put nodes in maintenance mode.
>
> The API was partially inspired by _scheduler/jobs, another similar API
> introduced along with the scheduling replicator functionality.
>
>
> ## API Spec
>
> * `GET /_shard_splits`
>
> Get a summary of shard splitting for the whole cluster. This would return
> the total number of shard splitting jobs and the number of active ones,
> that is, the ones doing work at that very moment. Another piece of
> information is the global state of shard splitting: whether it is stopped
> or running.
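As a rough Python sketch (purely illustrative, not part of the proposal) of the summary shape just described: the grouping of nodes under `#/states` by their per-node splitting state is my reading of the example, and `summarize` is an invented name.

```python
# Illustrative only: assemble the GET /_shard_splits summary shape from a
# list of job records plus a per-node splitting state. The interpretation
# that "states" groups nodes by their shard-splitting state (not by
# running jobs) is an assumption based on the example response.

def summarize(jobs, node_states):
    """jobs: list of dicts with a "job_state" key.
    node_states: mapping of node name -> "running" or "stopped"."""
    states = {}
    for node, state in node_states.items():
        states.setdefault(state, []).append(node)
    for nodes in states.values():
        nodes.sort()
    return {
        "jobs_total": len(jobs),
        "jobs_running": sum(1 for j in jobs if j["job_state"] == "running"),
        "states": states,
    }

print(summarize(
    [{"job_state": "running"}, {"job_state": "running"},
     {"job_state": "completed"}],
    {"node1@127.0.0.1": "running", "node2@127.0.0.1": "running",
     "node3@127.0.0.1": "running"},
))
```

Note that with this reading, a node can appear under `"running"` even when it has no active jobs, which matches the example (3 nodes listed, only 2 jobs running).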
>
>     {
>         "jobs_total": 10,
>         "jobs_running": 2,
>         "states": {
>             "running": [
>                 "node1@127.0.0.1",
>                 "node2@127.0.0.1",
>                 "node3@127.0.0.1"
>             ]
>         }
>     }
>
>
> * `PUT /_shard_splits`
>
> Enable or disable shard splitting on the cluster. The ability to disable
> or enable shard splitting globally is a helpful feature on its own. It can
> also function as a feature flag, used in cases where existing tooling
> might, for example, manipulate shards and it's desirable for shard
> splitting to not interfere.
>
> To disable, the request body would be:
>
>     {
>         "stop": "<reason>"
>     }
>
> To (re)-enable:
>
>     {
>         "start": true
>     }
>
> An alternative would be to have another underscore path like
> `_shard_splits/_state`, but I feel it is better to minimize the use of
> underscore paths; they feel less REST-ful.
>
>
> * `GET /_shard_splits/jobs`
>
> Get all shard splitting jobs.
>
> Response body:
>
>     {
>         "jobs": [
>             {
>                 "id": "001-e41e8751873b56e4beafa373823604d26a2f11ba434a040f865b48df835ccb0b",
>                 "job_state": "completed",
>                 "node": "node1@127.0.0.1",
>                 "source": "shards/00000000-1fffffff/db.1548175503",
>                 "split_state": "completed",
>                 "state_info": {},
>                 "targets": [
>                     "shards/00000000-0fffffff/db.1548175503",
>                     "shards/10000000-1fffffff/db.1548175503"
>                 ],
>                 "time_created": "2019-01-23T18:36:17.951228Z",
>                 "time_started": "2019-01-23T18:36:18.457231Z",
>                 "time_updated": "2019-01-23T18:49:19.174453Z"
>             }
>         ],
>         "offset": 0,
>         "total_rows": 1
>     }
>
> The offset and total_rows fields are here to keep a view-like shape for
> the response and to have it look more like _scheduler/jobs.
>
>
> * `POST /_shard_splits/jobs`
>
> Start a shard splitting job.
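Regarding the `start`/`stop` toggle quoted above, a minimal Python sketch of how a handler might validate the `PUT /_shard_splits` body; the function name and error messages are invented, not part of the proposal.

```python
# Sketch of validating the PUT /_shard_splits request body described in
# the quoted proposal. Only the two body shapes from the spec are
# accepted; everything else (names, errors) is illustrative.

def handle_state_update(body):
    """Return (new_state, reason) or raise ValueError for a bad body."""
    if not isinstance(body, dict):
        raise ValueError("request body must be a JSON object")
    if "stop" in body:
        reason = body["stop"]
        if not isinstance(reason, str):
            raise ValueError('"stop" takes a reason string')
        return ("stopped", reason)
    if body.get("start") is True:
        return ("running", None)
    raise ValueError('expected {"start": true} or {"stop": "<reason>"}')

print(handle_state_update({"stop": "planned maintenance"}))
print(handle_state_update({"start": True}))
```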
>
> Request body:
>
>     {
>         "node": "dbc...@db1.sandbox001.cloudant.net",
>         "shard": "shards/00000000-FFFFFFFF/username/dbname.$timestamp"
>     }
>
> Response body:
>
>     {
>         "id": "001-e41e8751873b56e4beafa373823604d26a2f11ba434a040f865b48df835ccb0b",
>         "ok": true
>     }
>
> Or, if there are too many shard splitting jobs (a limit inspired by the
> scheduling replicator as well), it might return an error:
>
>     {
>         "error": "max_jobs_exceeded",
>         "reason": "There are $N jobs currently running"
>     }
>
> If shard splitting is disabled globally, the user gets an error and a
> reason. The reason here would be the one sent in the `PUT /_shard_splits`
> body.
>
>     {
>         "error": "stopped",
>         "reason": "Shard splitting is disabled on the cluster currently"
>     }
>
> * `GET /_shard_splits/jobs/$jobid`
>
> Access information about a particular shard splitting job by its ID. This
> is the ID returned from the `POST /_shard_splits/jobs` request.
>
>     {
>         "id": "001-5f553fd2d9180c74aa39c35377fe3e1731d09ec39bbd0f02541f55148e48d888",
>         "job_state": "completed",
>         "node": "node1@127.0.0.1",
>         "source": "shards/00000000-1fffffff/db.1548186810",
>         "split_state": "completed",
>         "state_info": {},
>         "targets": [
>             "shards/00000000-0fffffff/db.1548186810",
>             "shards/10000000-1fffffff/db.1548186810"
>         ],
>         "time_created": "2019-01-23T18:36:17.951228Z",
>         "time_started": "2019-01-23T18:36:18.457231Z",
>         "time_updated": "2019-01-23T18:49:19.174453Z"
>     }
>
>
> * `DELETE /_shard_splits/jobs/$jobid`
>
> Remove a job. After a job completes or fails, it is not automatically
> removed; it stays around to allow the user to retrieve its status. After
> inspecting its status, the user should use the DELETE method to remove
> the job. If the job is still running, it is cancelled and then removed
> from the system.
>
> Response body:
>
>     {
>         "ok": true
>     }
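To make the source/targets relationship in the quoted job documents concrete, here is a Python sketch of how a source shard range could be halved into the two target ranges; `split_shard_name` is my own helper name, not something from the proposal.

```python
# Illustrative helper: split a shard path such as
# "shards/00000000-1fffffff/db.1548175503" into two target paths with
# halved hash ranges, matching the "targets" field in the quoted examples.

def split_shard_name(shard):
    prefix, rng, suffix = shard.split("/", 2)
    lo_s, hi_s = rng.split("-")
    lo, hi = int(lo_s, 16), int(hi_s, 16)
    mid = lo + (hi - lo) // 2          # top of the lower half-range
    width = len(lo_s)                  # keep the zero-padded hex width
    fmt = lambda n: format(n, "0{}x".format(width))
    return [
        "{}/{}-{}/{}".format(prefix, fmt(lo), fmt(mid), suffix),
        "{}/{}-{}/{}".format(prefix, fmt(mid + 1), fmt(hi), suffix),
    ]

print(split_shard_name("shards/00000000-1fffffff/db.1548175503"))
# → ['shards/00000000-0fffffff/db.1548175503',
#    'shards/10000000-1fffffff/db.1548175503']
```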