[ https://issues.apache.org/jira/browse/SOLR-5606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643229#comment-14643229 ]
Timothy Potter commented on SOLR-5606:
--------------------------------------
I think striving for a REST solution is not practical at this point, given the
maturing of the bulk-style Schema and Config APIs. Rather than continuing to
argue about REST (see the discussion on SOLR-7312), we should embrace this bulk
approach and try to be consistent. In other words, having the same feel across
all admin APIs seems more productive than having a REST-based Collections API
alongside bulk-style Schema / Config APIs. Specifically, here are some additional
ideas I have for this effort:
1) Adapt the current Collections API to use the bulk-style API used by the Schema
and Config APIs, at the {{/solr/<collection>/admin}} endpoint. For instance, to
add a replica to shard1 of the *gettingstarted* collection, I would do:
{code}
curl -X POST -H 'Content-type:application/json' --data-binary '{
"add-replica":{
"shard":"shard1",
"node":"192.168.0.1_solr"}
}' http://localhost:8983/solr/gettingstarted/admin
{code}
The other collection admin actions would work similarly. This also has the
benefit of allowing multiple collection admin actions to be applied in a single
request, such as adding a replica to each shard.
In general, no operation that changes the state of Solr should accept GET
requests; see SOLR-1523.
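To make the multi-action case concrete, here is a minimal Python sketch (Python is used only for illustration) that builds one request body adding a replica to every shard. It assumes the proposed {{/solr/<collection>/admin}} endpoint would accept an array of command bodies under a single command name, modeled on the Schema API's bulk format; that array form and the function name are hypothetical.

```python
import json


def build_add_replica_payload(shards, node):
    """Build a hypothetical bulk request body that adds one replica per shard.

    The {"add-replica": [...]} array form is an assumption for this sketch;
    the proposed /solr/<collection>/admin endpoint does not exist yet.
    """
    commands = [{"shard": shard, "node": node} for shard in shards]
    return json.dumps({"add-replica": commands}, indent=2)


if __name__ == "__main__":
    # A single request adds a replica to both shards of the collection.
    print(build_add_replica_payload(["shard1", "shard2"], "192.168.0.1_solr"))
```

The resulting body would be sent with the same curl invocation shown above, only with a different {{--data-binary}} payload.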
2) Move all of the cluster-level API actions currently in the collections API
to a cluster API. Specifically, the following actions should not be in the
"collections" API:
/admin/collections?action=CLUSTERPROP: Add/edit/delete a cluster-wide property
/admin/collections?action=ADDROLE: Add a specific role to a node in the cluster
/admin/collections?action=REMOVEROLE: Remove an assigned role
/admin/collections?action=OVERSEERSTATUS: Get status and statistics of the overseer
/admin/collections?action=CLUSTERSTATUS: Get cluster status
In addition, we should add a node status API endpoint that reports information
similar to what {{bin/solr status}} prints, e.g.:
{code}
Solr process 81705 running on port 7574
{
"solr_home":"/Users/timpotter/dev/lw/projects/br5x/solr/example/cloud/node2/solr/",
"version":"5.3.0-SNAPSHOT 1689511 - timpotter - 2015-07-06 16:00:47",
"startTime":"2015-07-06T22:36:38.322Z",
"uptime":"0 days, 21 hours, 39 minutes, 50 seconds",
"memory":"88 MB (%17.9) of 490.7 MB",
"cloud":{
"ZooKeeper":"localhost:9983",
"liveNodes":"2",
"collections":"3"}}
{code}
NOTE: The JSON returned by the node status API should use a consistent naming
style for keys: the Schema / Config APIs use lowercase words joined by dashes
(e.g. add-field), whereas the output above uses camel case (e.g. startTime).
Whichever we choose, it needs to be consistent across all requests / responses
returned by Solr.
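As an illustration of the naming NOTE, a small Python sketch that rewrites camelCase and snake_case keys into the dashed style used by the Schema / Config APIs. Choosing the dashed style here is only for the example (the NOTE leaves the choice open), and the sample values are made up.

```python
import re


def to_dashed(name):
    """Convert a camelCase or snake_case key to lowercase words joined by dashes."""
    # Put a dash before any uppercase letter that follows a lowercase letter or digit.
    dashed = re.sub(r"(?<=[a-z0-9])([A-Z])", r"-\1", name)
    return dashed.replace("_", "-").lower()


def normalize_keys(value):
    """Recursively rewrite every key in nested dicts / lists to the dashed style."""
    if isinstance(value, dict):
        return {to_dashed(k): normalize_keys(v) for k, v in value.items()}
    if isinstance(value, list):
        return [normalize_keys(v) for v in value]
    return value


if __name__ == "__main__":
    status = {"solr_home": "/var/solr", "startTime": "2015-07-06T22:36:38.322Z",
              "cloud": {"ZooKeeper": "localhost:9983", "liveNodes": "2"}}
    print(normalize_keys(status))
```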
3) The CLUSTERSTATUS action takes optional collection / shard parameters; that
functionality should be migrated to a collection-specific endpoint, such as:
{{/solr/<collection>/status}}
We should also integrate the healthcheck code from SolrCLI with the
{{/solr/<collection>/status}} action, so that the healthcheck is available to
all clients and not just from the command line.
4) Sending a GET request to the {{/solr/<collection>}} endpoint should return
200 (exists) or 404 (not found). The response body could also contain basic
metadata (as JSON) about the collection if it exists. This also solves the
problem of determining whether a collection already exists. Currently, users
have to either iterate over the list of collections or use the CLUSTERSTATUS
command with the collection parameter, neither of which is as intuitive as
sending a GET request to a collection resource.
Alternatively, rather than having a separate status endpoint for a collection,
we could return the collection's status information in response to a GET
request to {{/solr/<collection>}}. A query string parameter would let users
control how much status information is returned, since operations like the
healthcheck are not free to execute and should only run when requested. For
instance:
{{GET /solr/<collection>}} returns 200 or 404
{{GET /solr/<collection>?status=true}} returns status information in the
response body
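The GET semantics from item 4, including the optional status flag, could be sketched in Python as below. The cluster-state dict, the metadata fields in the body, and the {{run_healthcheck}} callback are hypothetical stand-ins for whatever Solr would actually use; the point is only the 200 / 404 split and that the expensive healthcheck runs solely when {{status=true}} is passed.

```python
def handle_get_collection(cluster_state, name, params, run_healthcheck):
    """Sketch of the proposed GET /solr/<collection> semantics.

    cluster_state maps collection names to metadata dicts and run_healthcheck
    is a caller-supplied function; both are hypothetical, not Solr APIs.
    Returns an (HTTP status code, JSON-serializable body) pair.
    """
    collection = cluster_state.get(name)
    if collection is None:
        return 404, {"error": "collection '%s' not found" % name}
    body = {"name": name, "config": collection["config"],
            "shards": sorted(collection["shards"])}
    # The healthcheck is not free, so it only runs when explicitly requested.
    if params.get("status") == "true":
        body["status"] = run_healthcheck(name)
    return 200, body


if __name__ == "__main__":
    state = {"gettingstarted": {"config": "gettingstarted",
                                "shards": ["shard1", "shard2"]}}
    print(handle_get_collection(state, "gettingstarted", {"status": "true"},
                                lambda n: "healthy"))
    print(handle_get_collection(state, "missing", {}, lambda n: "healthy"))
```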
5) Ability to filter collections from the API based on the following criteria
(similar to what the cloud panel enables in the UI):
{{GET /solr/collections}} returns a list of all collection names
or
{{GET /solr/collections?params}} returns a list of collections matching the
criteria specified in the additional params.
Filtering criteria could include:
+ name prefix matching (tj*)
+ config name (to show me all the collections that use config xyz)
+ activity level (to show me my busiest collections in the past X time range)
+ replica status (to show me all the collections that have replicas that are
down, recovering, etc.)
+ by node (to show me all the collections that have replicas on a specific node
in my cluster)
+ creation date (to show me all the collections created since some date or
before some other date)
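A rough Python sketch of the filtering logic, applied to collection metadata that is assumed to have already been read from the cluster state. The {{CollectionInfo}} fields and parameter names are hypothetical, name matching uses shell-style globs (so {{tj*}} works), and activity level is left out because it would depend on the collection-level metrics proposed at the end of this comment. A collection is returned only if it satisfies every criterion that is set.

```python
from dataclasses import dataclass, field
from datetime import datetime
from fnmatch import fnmatchcase


@dataclass
class CollectionInfo:
    """Hypothetical per-collection metadata read from the cluster state."""
    name: str
    config_name: str
    created: datetime
    replicas: list = field(default_factory=list)  # (node, state) pairs


def filter_collections(collections, name=None, config_name=None,
                       replica_state=None, node=None,
                       created_since=None, created_before=None):
    """Return names of collections that satisfy every criterion that is set."""
    matches = []
    for c in collections:
        if name is not None and not fnmatchcase(c.name, name):
            continue  # name pattern, e.g. "tj*"
        if config_name is not None and c.config_name != config_name:
            continue
        if replica_state is not None and all(s != replica_state for _, s in c.replicas):
            continue  # no replica in the requested state (down, recovering, ...)
        if node is not None and all(n != node for n, _ in c.replicas):
            continue  # no replica hosted on that node
        if created_since is not None and c.created < created_since:
            continue
        if created_before is not None and c.created >= created_before:
            continue
        matches.append(c.name)
    return matches


if __name__ == "__main__":
    cols = [
        CollectionInfo("tj_logs", "xyz", datetime(2015, 7, 1),
                       [("192.168.0.1_solr", "active")]),
        CollectionInfo("golf", "foo", datetime(2015, 7, 5),
                       [("192.168.0.2_solr", "down")]),
    ]
    print(filter_collections(cols, name="tj*"))            # ['tj_logs']
    print(filter_collections(cols, replica_state="down"))  # ['golf']
```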
6) Deleting a collection should use the DELETE HTTP verb, i.e.
{{DELETE /solr/<collection>}}
This makes it easier to secure.
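One way to see why verbs help: when state changes are expressed as HTTP methods, an access-control layer can decide from the method alone, instead of parsing an {{action}} query parameter on a GET request. The Python sketch below shows such a verb-based policy; the role names are made up, and this is not an existing Solr security API.

```python
# Methods that must never change state (see SOLR-1523).
SAFE_METHODS = frozenset({"GET", "HEAD", "OPTIONS"})


def is_allowed(method, roles):
    """Hypothetical policy keyed only on the HTTP verb of a collection request.

    Read-only verbs are open to any authenticated user; DELETE /solr/<collection>
    requires the "admin" role; other state-changing verbs (POST, PUT) require
    "admin" or a made-up "collection-manager" role.
    """
    verb = method.upper()
    if verb in SAFE_METHODS:
        return True
    if verb == "DELETE":
        return "admin" in roles
    return bool({"admin", "collection-manager"} & set(roles))


if __name__ == "__main__":
    print(is_allowed("DELETE", ["collection-manager"]))
    print(is_allowed("POST", ["collection-manager"]))
```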
7) Creating a collection needs to be overhauled. Currently, a user sends a GET
request to {{/solr/admin/collections?action=CREATE&args …}}. There are several
issues with this:
- GET requests should not be used to change the state of the system; this
should be a PUT or a POST (supporting both is fine too).
- A long list of query parameters is needed to specify the collection settings
- XML is returned by default, using confusing nested {{<lst>}} elements
- collection.configName parameter: numerous issues
- The response (besides being XML) makes no sense to a new user; to me it looks
like a bunch of mumbo jumbo:
{code}
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1672</int>
  </lst>
  <lst name="success">
    <lst>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">1497</int>
      </lst>
      <str name="core">foo_shard2_replica1</str>
    </lst>
    <lst>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">1557</int>
      </lst>
      <str name="core">foo_shard1_replica1</str>
    </lst>
  </lst>
</response>
{code}
Instead, introduce an approach that will appeal to the REST lovers among us:
accept a JSON definition of a collection POSTed to {{/solr/collections/}}, such
as:
{code}
curl -XPOST -H 'Content-type: application/json' \
  -d '{"name":"golf","numShards":2,"configName":"foo"}' \
  http://localhost:8080/solr/collections/
{code}
POST should be used instead of PUT, as PUT requests are intended / expected to
be idempotent. With this change, the {{/solr/collections}} endpoint is used
solely to handle creation and list / find collection requests. The API should
use sensible defaults for numShards and replicationFactor, as the
{{bin/solr create -c}} command currently does, since a new user may not really
understand these settings the first time they use Solr. The response is either
201 (Created) or an error code with an explanation (in JSON).
There are obviously more issues to be dealt with around collection configs, but
I'll address those in another ticket. The point here is to clean up how create
works.
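A minimal Python sketch of the proposed create handler: parse the JSON definition, fill in defaults, and answer with 201 or an error code plus a JSON explanation. The default values (one shard, one replica) are an assumption about what {{bin/solr create -c}} uses and should be checked against the script; returning 409 for a duplicate name is likewise just one reasonable choice, and the handler itself is hypothetical.

```python
import json

# Assumed defaults (one shard, one replica); verify against bin/solr create -c.
DEFAULTS = {"numShards": 1, "replicationFactor": 1}


def create_collection(raw_body, existing):
    """Sketch of a hypothetical POST /solr/collections/ handler.

    raw_body is the JSON request body and existing maps names of known
    collections to their definitions. Returns (HTTP status, JSON string).
    """
    try:
        spec = json.loads(raw_body)
    except json.JSONDecodeError as exc:
        return 400, json.dumps({"error": "invalid JSON: " + exc.msg})
    name = spec.get("name") if isinstance(spec, dict) else None
    if not isinstance(name, str) or not name:
        return 400, json.dumps({"error": "'name' is required"})
    if name in existing:
        # 409 Conflict is one reasonable choice for a duplicate name.
        return 409, json.dumps({"error": "collection '%s' already exists" % name})
    collection = {**DEFAULTS, **spec}  # values in the request override defaults
    for key in DEFAULTS:
        value = collection[key]
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            return 400, json.dumps({"error": "'%s' must be a positive integer" % key})
    existing[name] = collection
    return 201, json.dumps(collection)


if __name__ == "__main__":
    known = {}
    print(create_collection('{"name":"golf","numShards":2,"configName":"foo"}', known))
    print(create_collection('{"name":"golf"}', known))
```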
8) We need a collection-level metrics API endpoint. SolrCloud doesn't provide
any aggregate stats about the cluster or a collection. Very common questions,
such as document counts per shard, index sizes, and request rates, cannot be
answered easily without figuring out the cluster state, invoking multiple core
admin APIs, and aggregating the results manually; see SOLR-6325.
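The manual aggregation described above might look roughly like this Python sketch, once the per-core numbers have been collected (for example, from the core admin APIs of every node hosting a replica). The input field names are hypothetical and request rates are left out. The one design choice worth noting: document counts are taken from each shard's leader, because summing every replica would count replicated documents more than once, whereas index sizes are summed across all replicas since each one holds its own copy on disk.

```python
from collections import defaultdict


def aggregate_core_stats(core_stats):
    """Roll per-core stats up to per-shard and per-collection numbers.

    core_stats is a list of dicts with hypothetical keys: shard, leader,
    num_docs, index_bytes (one dict per replica core of a single collection).
    """
    docs_per_shard = {}
    bytes_per_shard = defaultdict(int)
    for core in core_stats:
        # Every replica stores its own copy of the index, so sizes add up.
        bytes_per_shard[core["shard"]] += core["index_bytes"]
        # Documents are replicated, so count each shard once, via its leader.
        if core["leader"]:
            docs_per_shard[core["shard"]] = core["num_docs"]
    return {
        "docs_per_shard": docs_per_shard,
        "total_docs": sum(docs_per_shard.values()),
        "index_bytes_per_shard": dict(bytes_per_shard),
        "total_index_bytes": sum(bytes_per_shard.values()),
    }


if __name__ == "__main__":
    stats = [
        {"shard": "shard1", "leader": True, "num_docs": 100, "index_bytes": 1000},
        {"shard": "shard1", "leader": False, "num_docs": 100, "index_bytes": 1100},
        {"shard": "shard2", "leader": True, "num_docs": 50, "index_bytes": 600},
    ]
    print(aggregate_core_stats(stats))
```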
> REST based Collections API
> --------------------------
>
> Key: SOLR-5606
> URL: https://issues.apache.org/jira/browse/SOLR-5606
> Project: Solr
> Issue Type: Improvement
> Components: SolrCloud
> Reporter: Jan Høydahl
> Priority: Minor
> Fix For: Trunk
>
>
> For consistency reasons, the collections API (and other admin APIs) should be
> REST based. Spinoff from SOLR-1523