[jira] [Commented] (SOLR-5606) REST based Collections API

Timothy Potter (JIRA) Mon, 27 Jul 2015 12:12:44 -0700

    [ https://issues.apache.org/jira/browse/SOLR-5606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643229#comment-14643229 ]


Timothy Potter commented on SOLR-5606:
--------------------------------------

I think striving for a REST solution is not practical at this point, given the 
maturing of the bulk-style Schema and Config APIs. Rather than continuing to 
argue about REST (see discussion on SOLR-7312), we should embrace this bulk 
approach and try to be consistent. In other words, having the same feel across 
all admin APIs seems more productive than having a REST-based Collection API 
and a bulk-style Schema / Config API. Specifically, here are some additional 
ideas I have for this effort:

1) Adapt the current collections API to use the bulk-style API used by Schema 
and Config API at the {{/solr/<collection>/admin}} endpoint. For instance, to 
add a replica to shard1 of the *gettingstarted* collection, I would do:

{code}
curl -X POST -H 'Content-type:application/json' --data-binary '{
  "add-replica":{
     "shard":"shard1",
     "node":"192.168.0.1_solr"}
}' http://localhost:8983/solr/gettingstarted/admin
{code}

The other collection admin actions would work similarly. This also has the 
benefit of allowing multiple collection admin actions to be applied at the same 
time, such as to add a replica for each shard at the same time.
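
A minimal sketch of how such a batched request body could be built, assuming the proposed endpoint accepts a list of objects under a single command name, the way the bulk Schema API batches repeated commands. The helper name is hypothetical and the endpoint itself is only a proposal; nothing here is an existing Solr API.

```python
import json


def add_replica_per_shard(shards, node):
    """Build one bulk-style body that adds a replica to every listed shard.

    Assumption: repeated commands are sent as a list under a single
    "add-replica" key, mirroring how the bulk Schema API batches commands.
    """
    commands = [{"shard": shard, "node": node} for shard in shards]
    return json.dumps({"add-replica": commands}, indent=2)


# Body that would be POSTed to the proposed /solr/gettingstarted/admin endpoint.
body = add_replica_per_shard(["shard1", "shard2"], "192.168.0.1_solr")
print(body)
```

Sending both commands in one body is what lets the server apply them together instead of requiring one request per shard.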

In general, no operation that changes the state of Solr should accept GET 
requests; see SOLR-1523.

2) Move all of the cluster-level API actions currently in the collections API 
to a cluster API. Specifically, the following actions should not be in the 
"collections" API:

/admin/collections?action=CLUSTERPROP: Add/edit/delete a cluster-wide property 
/admin/collections?action=ADDROLE: Add a specific role to a node in the cluster 
/admin/collections?action=REMOVEROLE: Remove an assigned role 
/admin/collections?action=OVERSEERSTATUS: Get status and statistics of the 
overseer 
/admin/collections?action=CLUSTERSTATUS: Get cluster status 

In addition, we should add a node status API endpoint similar to what is 
reported by {{bin/solr status}}, i.e.

{code}
Solr process 81705 running on port 7574
{
  "solr_home":"/Users/timpotter/dev/lw/projects/br5x/solr/example/cloud/node2/solr/",
  "version":"5.3.0-SNAPSHOT 1689511 - timpotter - 2015-07-06 16:00:47",
  "startTime":"2015-07-06T22:36:38.322Z",
  "uptime":"0 days, 21 hours, 39 minutes, 50 seconds",
  "memory":"88 MB (%17.9) of 490.7 MB",
  "cloud":{
    "ZooKeeper":"localhost:9983",
    "liveNodes":"2",
    "collections":"3"}}
{code}

NOTE: The JSON returned by the node status API should use a consistent naming 
style; the Schema / Config APIs use lowercase names separated by dashes (e.g. 
add-replica), whereas the output above mostly uses camel case. Whichever we 
choose, it needs to be consistent across all requests / responses returned by 
Solr.
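
To illustrate what a single naming style would mean for the status output, here is a rough sketch that rewrites keys into the dashed style. It assumes the dashed style is the one chosen, which the comment leaves open; the function names and the placeholder path are hypothetical.

```python
import re


def to_dashed(name):
    """Convert camelCase or under_scored names to lowercase-with-dashes."""
    with_dashes = re.sub(r"(?<=[a-z0-9])([A-Z])", r"-\1", name)
    return with_dashes.replace("_", "-").lower()


def normalize_keys(value):
    """Recursively rewrite every dict key; values are left untouched."""
    if isinstance(value, dict):
        return {to_dashed(k): normalize_keys(v) for k, v in value.items()}
    if isinstance(value, list):
        return [normalize_keys(item) for item in value]
    return value


# Trimmed-down version of the status output; the path is a placeholder.
status = {
    "solr_home": "/path/to/solr",
    "startTime": "2015-07-06T22:36:38.322Z",
    "cloud": {"liveNodes": "2", "collections": "3"},
}
normalized = normalize_keys(status)
print(normalized)
```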

3) The CLUSTERSTATUS action takes optional collection / shard parameters, 
which should be migrated under a specific collection endpoint, such as:

{{/solr/<collection>/status}}

Integrate the healthcheck code in the SolrCLI with the 
{{/solr/<collection>/status}} action so that the healthcheck is available to 
all clients and not just from the command-line.

4) Sending a GET request to the {{/solr/<collection>}} endpoint should return 
200 (exists) or 404 (not found). The body could also return basic metadata (as 
JSON) about the specified collection if it exists. This also helps fix the 
issue of determining if a collection already exists. Currently, users have to 
either iterate over the list of collections to determine existence or use the 
CLUSTERSTATUS command with the collection parameter, neither of which is as 
intuitive as sending a GET request to a collection resource.

Alternatively, rather than having a separate status endpoint for a collection, 
we could just return the status information for the collection for a GET 
request to {{/solr/<collection>}}. We can use a query string parameter to allow 
users to control how much status information should be returned as things like 
the healthcheck are not free to execute so should only be done when requested. 
For instance:

GET /solr/<collection> returns 200 or 404
GET /solr/<collection>?status=true returns status information in the response 
body
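
A client-side sketch of the existence check, assuming the endpoint behaves as proposed (200 when the collection exists, 404 when it does not). The function names are hypothetical, and the {{status=true}} parameter is taken from the proposal above rather than from any existing API.

```python
import urllib.error
import urllib.request


def collection_url(base_url, collection, with_status=False):
    """Build the proposed collection URL, optionally asking for status info."""
    url = f"{base_url}/{collection}"
    return f"{url}?status=true" if with_status else url


def exists_from_status(status_code):
    """Map the proposed response codes to a boolean."""
    if status_code == 200:
        return True
    if status_code == 404:
        return False
    raise ValueError(f"unexpected HTTP status {status_code}")


def collection_exists(base_url, collection):
    """Issue the GET and interpret the result (requires a running server)."""
    try:
        with urllib.request.urlopen(collection_url(base_url, collection)) as resp:
            return exists_from_status(resp.status)
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx, so 404 arrives here rather than above.
        return exists_from_status(err.code)
```

Against a server implementing the proposal, {{collection_exists("http://localhost:8983/solr", "gettingstarted")}} would replace iterating over the full collection list.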

5) Ability to filter collections from the API based on the following criteria 
(similar to what the cloud panel enables in the UI):

{{GET /solr/collections}} returns a list of all collection names

or

{{GET /solr/collections?params}} returns a list of collections matching the 
criteria specified in the additional params. 

Filtering criteria could include:
+ name prefix matching (tj*)
+ config name (to show me all the collections that use config xyz)
+ activity level (to show my busiest collections in the past X time range)
+ replica status (to show me all the collections that have replicas that are 
down | recovering | etc)
+ by node (to show me all the collections that have replicas on a specific node 
in my cluster)
+ creation date (to show me all the collections created since some date or 
before some other date)
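
As a sketch of how a client might express these filters, the snippet below builds the query string for the proposed endpoint. The parameter names ({{namePrefix}}, {{replicaStatus}}, and so on) are invented for illustration; the proposal does not define them.

```python
from urllib.parse import urlencode


def collections_query(base_url, **filters):
    """Build a URL for the proposed /solr/collections listing.

    Filters set to None are dropped; with no filters the plain listing
    URL is returned. Parameter names are hypothetical.
    """
    params = {key: value for key, value in filters.items() if value is not None}
    url = f"{base_url}/collections"
    if not params:
        return url
    # Sort for a stable, cache-friendly parameter order.
    return f"{url}?{urlencode(sorted(params.items()))}"


all_collections = collections_query("http://localhost:8983/solr")
filtered = collections_query(
    "http://localhost:8983/solr", namePrefix="tj*", replicaStatus="down"
)
print(all_collections)
print(filtered)
```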

6) Deleting a collection should use the DELETE HTTP verb, i.e.

{{DELETE /solr/<collection>}}

This makes it easier to secure.

7) Creating a collection needs to be overhauled. Currently, a user sends a GET 
request to {{/solr/admin/collections?action=CREATE&args …}}. There are several 
issues with this:

- GET requests should not be used to change the state of the system; creation 
should use a PUT or a POST (supporting both is fine too).
- Long list of query parameters to specify collection parameters
- XML is returned by default using embedded {{<lst>}} elements (confusing)
- collection.configName parameter: numerous issues
- response (besides being XML) makes no sense to a new user (see below, looks 
like a bunch of mumbo jumbo to me):
{code}
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1672</int>
  </lst>
  <lst name="success">
    <lst>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">1497</int>
      </lst>
      <str name="core">foo_shard2_replica1</str>
    </lst>
    <lst>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">1557</int>
      </lst>
      <str name="core">foo_shard1_replica1</str>
    </lst>
  </lst>
</response>
{code}

Introduce an approach that will appeal to the REST lovers among us that accepts 
a JSON definition of a collection where you can POST to {{/solr/collections/}}, 
such as:

{code}
curl -XPOST -H 'Content-type: application/json' \
  -d '{"name":"golf","numShards":2,"configName":"foo"}' \
  http://localhost:8080/solr/collections/
{code}

POST should be used instead of PUT as PUT requests are intended / expected to 
be idempotent. At this point, the {{/collections}} endpoint is solely used to 
handle creation and list/find collection requests. The API should use a 
sensible default for numShards and replicationFactor as a new user may not 
really understand these the first time they use Solr, as is done currently by 
the {{bin/solr create -c}} command. The response is either 201 (created) or an 
error code and explanation (in JSON).
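
A rough sketch of the defaulting behaviour described here: the caller supplies only what they care about and the remaining fields are filled in. The default values of 1 and 1 are an assumption for illustration, and the helper names are hypothetical.

```python
import json

# Assumed defaults for illustration; the proposal only asks that they be sensible.
DEFAULTS = {"numShards": 1, "replicationFactor": 1}


def create_collection_body(name, config_name=None, **overrides):
    """Build the JSON body to POST to the proposed /solr/collections/ endpoint."""
    body = {"name": name, **DEFAULTS, **overrides}
    if config_name is not None:
        body["configName"] = config_name
    return json.dumps(body)


def describe_create_response(status_code):
    """Interpret the proposed response codes: 201 means created."""
    return "created" if status_code == 201 else f"error ({status_code})"


minimal = create_collection_body("golf")
explicit = create_collection_body("golf", config_name="foo", numShards=2)
print(minimal)
print(explicit)
```

A new user could then create a collection with nothing but a name, while the explicit form matches the curl example above.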

There are obviously more issues to be dealt with around collection configs, but 
I'll address those in another ticket. The point here is to clean up how create 
works.

8) We need a collection-level metrics API endpoint. SolrCloud doesn't provide 
any aggregate stats about the cluster or a collection. Very common questions 
such as document counts per shard, index sizes, request rates, etc. cannot be 
answered easily without figuring out the cluster state, invoking multiple core 
admin APIs, and aggregating the results manually; see SOLR-6325.
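
The manual aggregation that clients do today looks roughly like the sketch below. The input shape (one dict per core with {{shard}}, {{numDocs}}, and {{indexSizeBytes}}) and the sample numbers are hypothetical stand-ins for whatever the per-core admin calls return.

```python
from collections import defaultdict


def aggregate_by_shard(core_stats):
    """Roll per-core stats up to per-shard totals.

    Replicas of one shard hold the same documents, so the document count
    is the maximum across replicas, while index size is summed because
    every replica occupies its own disk space.
    """
    totals = defaultdict(lambda: {"numDocs": 0, "indexSizeBytes": 0, "replicas": 0})
    for core in core_stats:
        shard = totals[core["shard"]]
        shard["numDocs"] = max(shard["numDocs"], core["numDocs"])
        shard["indexSizeBytes"] += core["indexSizeBytes"]
        shard["replicas"] += 1
    return dict(totals)


# Made-up sample data for two shards of one collection.
sample = [
    {"shard": "shard1", "core": "foo_shard1_replica1", "numDocs": 100, "indexSizeBytes": 1000},
    {"shard": "shard1", "core": "foo_shard1_replica2", "numDocs": 100, "indexSizeBytes": 1100},
    {"shard": "shard2", "core": "foo_shard2_replica1", "numDocs": 50, "indexSizeBytes": 500},
]
summary = aggregate_by_shard(sample)
total_docs = sum(shard["numDocs"] for shard in summary.values())
print(summary, total_docs)
```

A server-side metrics endpoint would do this roll-up once instead of leaving every client to reimplement it.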





> REST based Collections API
> --------------------------
>
>                 Key: SOLR-5606
>                 URL: https://issues.apache.org/jira/browse/SOLR-5606
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrCloud
>            Reporter: Jan Høydahl
>            Priority: Minor
>             Fix For: Trunk
>
>
> For consistency reasons, the collections API (and other admin APIs) should be 
> REST based. Spinoff from SOLR-1523



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
