[ https://issues.apache.org/jira/browse/SOLR-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15429990#comment-15429990 ]
Steve Rowe commented on SOLR-8029: ---------------------------------- I reviewed the changes on the apiv2 branch, and found many small problems, which I've listed below. I didn't attempt to prioritize them - I think they all should be addressed. h3. Miscellaneous # I couldn't find any V2 version of these old collections api actions: CLUSTERPROP, BALANCESHARDUNIQUE, MIGRATESTATEFORMAT, BACKUP, RESTORE, ADDROLE, REMOVEROLE # Some plugins, e.g. /replication, don't have _introspect support - is this intentional? # Question: I can't find it now, but I saw somewhere that only one command is allowed when POSTing to V2 APIs. ** Is this true? ** What about bulk schema and config APIs? Manual V2 schema api testing with multiple commands in a single request resulted in all but the first command being silently ignored. h3. API spec files # There are many many many missing descriptions & documentation links, and many top-level CWIKI documentation links ** I think it's very important to make these as complete as possible # Default value documentation is missing almost everywhere ** I think these should be filled in where possible # /url/path appears to be an orphan in many *.json files - I can't find any use of it in code, and when it appears in *.json files, it's always a subset of /url/paths (plural). # These orphaned spec files should be removed: ** cores.core.Commands.requestRecovery.json (directly specified in cores.core.Commands.json) ** cores.json (handled by cores.Status.json) # Empty command properties maps, should be removed from: ** cluster.security.BasicAuth.Commands.json: /commands/set-user ** cluster.security.RuleBasedAuthorization.json: /commands/set-permission/params # cluster.config.Commands.json: ** "delete" command should be removed, since that functionality is covered by cluster.config.delete.json # cluster.security.RuleBasedAuthorization.json: ** /commands/delete-permission: *** "type" key is present twice - "type":"object" should be removed (since it should be "int") ** /commands/set-user-role: *** description should state that multiple roles should be comma-separated. # cluster.security.authentication.Commands.json ** Why does it contain no commands? # cluster.security.authorization.Commands.json ** Why does it contain no commands? # cluster.security.RuleBasedAuthorization.json: ** "set-permission" and "update-permission" commands: required "role" property is never defined. ** "set-permission" command: "method" should be restricted to the allowed values (GET, POST, etc.) *** JsonSchemaValidator will have to be modified to support the "enum" JSON schema feature ** "update-permission" command: "index" property is required, but shouldn't be, since "before" could be specified instead. # collections.Commands.json: ** description is spelled "Description" (capital "D") but should be lowercase ** description is a copy-paste-o:"The add-field command adds a new field definition to your schema." # core.config.Commands.json: ** type is object, no validation on the object?: most commands have this problem *** e.g.: add-requesthandler should have required "name" and "class" (as should all update-* and add-* commands?) # core.RealtimeGet.json: ** Missing "fq" param ** How do /get/update and /get/versions work, and what do they do? I don't see any code to handle them. ** /v2/c/\{collection\}/_introspect doesn't return /get/... in availableSubPaths, but it should *** same problem for cores ** /v2/c/\{collection\}/get/_introspect says that the paths are /get/..., but that's wrong: they're /v2/c/\{collection\}/get/... *** same problem for cores # core.SchemaEdit.json: ** Non-supported properties that should be removed: "tokenized", "binary" ** Bulk mode doesn't work - any command after the first appears to be silently ignored *** Why isn't testing failing? ** /v2/c/\{collection\}/_introspect doesn't return /schema/... in availableSubPaths, but it should *** same problem for cores ** /v2/c/\{collection\}/schema/_introspect says that the paths are /schema/..., but that's wrong: they're /v2/c/\{collection\}/schema/... *** same problem for cores # core.SchemaRead.fields.json: ** includeDynamic only applies to /schema/fields, but /schema/dynamicfields and /schema/fieldtypes are included in /url/paths # cluster.json ** /v2/cluster/nodes endpoint returns same results as /v2/cluster - is /v2/cluster/nodes supposed to do something different? If not, why does it need to be here? # cluster.commandstatus.json ** There is no way to delete a command status (old collections api: DELETESTATUS action) - you can only get info on one via: GET /v2/cluster/command-status/\{id\}. ** Maybe this endpoint could renamed from /v2/cluster/command-status to /v2/cluster/async-command, and then have DELETE available on /v2/cluster/async-command/\{id\}? # cluster.config.*.json ** should be renamed to "cluster.configs.*.json" (plural) - the endpoint is /cluster/configs (plural) # collection.json ** should be renamed to collections.collection.json ** CollectionHandlerApi.EndPoint.COLLECTION_STATE will need to have its spec name updated # collections.collection.Commands.json ** Missing command: migrate-docs # collections.collection.Commands.modify.json: ** "maxShardsPerNode" property is missing # collections.collection.shards.Commands.json ** "async" key is missing for "split", "create" and "add-replica" commands ** "create" command: "createNodeSet" description should say it's a comma-separated list ** "add-replica" command: "shard" property description is incomplete: "The name of the shard where " # collections.collection.shards.shard.Commands.json: ** force-leader <- additionalProperties should be false instead of true, since none are supported ** Missing command: "synch-shard" <- I think this should be spelled "sync-shard" # collections.collection.shards.shard.replica.Commands.json ** "set-property" command should take an array ** /commands/set-property/properties/value/description is wrong (copy-paste-o): "The property name" # collections.collection.shards.shard.replica.delete.json ** "async" param is missing # collections.Commands.json ** "async" key is missing for "create", "create-alias" and "delete-alias" commands. ** /commands/create: *** "createNodeSet.shuffle" key is missing *** "maxShardsPerNode" key is missing # core.config.Params.Commands.json ** /commands/update/description: "update one or more *configsets*", <- should be parameter sets # core.SchemaEdit.addFieldType.json: ** "queryAnalyzer" and "indexAnalyzer" should be added as possible keys ** analyzer objects ("analyzer" and the above two) should not include the "type" key ** analyzer objects should include "class", "charFilters", "tokenizer" and "filters" keys, each with appropriate sub-config. # core.SchemaEdit.replaceFieldType.json ** should be the same schema as core.SchemaEdit.addFieldType, but is currently almost empty. # core.SchemaRead.json: ** "/schema/similarity" is present 3 times ** "/schema/zkversion" and "/schema/uniquekey" are present 2 times ** Maybe there should be a test for the multiply-specified-path problem (e.g. in JsonSchemaValidator)? # core.SchemaRead.fields.json ** params only apply to the "fields" endpoint, so why are dynamicfields and fieldtypes included here? # core.Update.json ** /update/json/commands <- is this path new? seems to be rewritten/forwarded to /update/json in UpdateRequestHandlerApi. Is adding this endpoint necessary? ** Missing path: /update/json/docs ** (maybe?) missing path: /update/bin # cores.Commands.json: ** The description on "schema" and "dataDir" are copy/paste-o's ("The core name") ** "numShards" description starts "NO:of", which may not be understood by some, should be "number of" instead. ** /commands/create: *** "async" key is missing *** "configset" should not be required - it's not included in CoreDescriptor.requiredProperties **** or if I'm wrong and it is required, then it's misspelled ("configSet" is the correct spelling) # cores.core.Commands.json ** missing command: "request-status" (corresponding to REQUESTSTATUS) ** missing command: "pre-recovery" (corresponding to PRERECOVERY) ** missing command: "request-sync-shard" (corresponding to REQUESTSYNCSHARD) ** missing command: "request-buffer-updates" (corresponding to REQUESTBUFFERUPDATES) ** missing command: "request-apply-updates" (corresponding to REQUESTAPPLYUPDATES) ** "async" key is missing from "swap", "rename", "unload", "merge-indexes" commands ** /commands/reload *** "additionalProperties" should be false, not true ** /commands/merge-indexes *** Missing properties: "indexDir", "srcCore", "async" ** /commands/request-recovery *** "additionalProperties" should be false, not true ** /commands/forceprepareforleadership: *** I think it should be spelled "force-prepare-for-leadership" *** "additionalProperties" should be false, not true # cores.core.Commands.split.json ** "path" and "targetCore" are singular but take arrays - the descriptions should be plural but are now singular. ** "ranges" is plural but takes a single string - why is treated differently from "path" and "targetCore"? # cores.Status.json ** "indexInfo" param default is wrong: it's true, not false ** I think core-specific status should be at /v2/cores/\{core\} (currently 404's), rather than at /v2/cores/\{core\}/status - that would be consistent with the V2 Collections API per-collection status at /v2/c/\{collection\} # node.Commands.json ** Missing commands: add-role, remove-role ** /commands/overseerop *** I think it should be spelled "overseer-op" *** Missing properties: "op" and "electionNode" *** "additionalProperties" should be false, not true ** /commands/rejoinleaderelection *** I think it should be spelled "rejoin-leader-election" *** "additionalProperties" should be false, not true # node.invoke.json: ** Missing param: "class" h3. JSON schema validation # "int" is used instead of "integer" in schemas (json-schema requires "integer") # JsonSchemaValidator has no "integer" support, only "number" (e.g. no INTEGER in JsonSchemaValidator.Type, ApiBag.KNOWN_TYPES) # JsonSchemaValidator doesn't validate type values - using an invalid type gives an NPE instead of a useful error message ** JsonValidatorTest is incomplete: it should test all supported schema validation aspects ** Did you explore existing JSON validation libraries (instead of implementing from scratch)? # JsonSchemaValidator ** Class javadocs should fully describe the limitations it has compared to full JSON schema support ** Type.valdateData() is misspelled, should be validateData() ** Attribute.validateData is unused, should be removed ** Attribute class name is too vague - it could be named SchemaNode instead, since it represents a node in a JSON schema graph. ** ObjectAttribute class name is too vague - it could be named SchemaAttribute instead, since it is a set of schema object attribute validators h3. Code/resources # solr/core/src/resources/implicitPlugins.json ** references "update_json_docs" as paramsets to be used by the "update" and "/update/json/docs" endpoints, but the definition for these is only in techproducts example - shouldn't a copy of it be in *all* example configsets? Or maybe instead in solr/core/src/resources/? # Map2: ** there are no direct tests of this class, but there should be ** generic <K,V> should be dropped (the delegate map should probably be <String,Object> though) *** values can be of any type *** no generic uses in the code ** Map2 name is too vague - maybe call it ValidatingJsonMap? *** Differences from Java's Map: **** value validation **** value type coersion to JSON types **** JSON deserialization **** deep copy ** NOT_NULL_OF_TYPE lambda is never used (and has a bug: first of pair's type is checked against itself instead of against second of pair's) ** getDeepCopy(Collection,int,boolean) can be removed in favor of the exact duplicate in o.a.s.Utils # o.a.s.common.util.Predicate looks very similar to Java8's Predicate, except that the instead of returning a boolean, the test() method returns either null for success or a String containing a failure explanation. I think it's bad to have a same-named class that behaves slightly differently. Maybe name it ExplanatoryPredicate? # o.a.s.common.util.Lookup interface is never used, so should be removed # ApiBag ** registerLazy() doesn't use its generic type parameter <T>, so it should be removed ** validateAndRegister(): this looks like orphaned code - there is no /url/parts in any *.json file: {code:java} Map2 parts = url.getMap("parts", null); if (parts != null) { Set<String> wildCardNames = getWildCardNames(paths); for (Object o : parts.keySet()) { if (!wildCardNames.contains(o.toString())) throw new RuntimeException("" + o + " is not a valid part name"); Map2 pathMeta = parts.getMap(o.toString(), NOT_NULL); pathMeta.get("type", ENUM_OF, ImmutableSet.of("enum", "string", "int", "number", "boolean")); } } {code} ** getWildCardNames() isn't called from anywhere but the above dead code, so can it also be removed? ** IntrospectApi.call() makes a deep copy of the base api spec map on every request, apparently to exclude description of commands that were not called. This seems inefficient - maybe there should be a per-command cache or something? # URL templates are used in various places, but names in code are inconsistent: ** In DumpRequestHandler.handleRequestBody(), "urlPart" should be renamed, e.g. to "urlTemplateValues"/"templates" : {code:java} String[] parts = req.getParams().getParams("urlPart"); if (parts != null && parts.length > 0) { Map map = new LinkedHashMap<>(); rsp.getValues().add("urlPart", map); for (String part : parts) { map.put(part, req.getPathValues().get(part)); } } {code} ** In PathTrie: *** "parts" is used to mean both "url path segment sequence" and "path segment template mappings". E.g. in lookup(Map parts) <- parts is a map from path segment template name to concrete value, e.g "\{collection\}"->mycollection. Maybe rename the "path segment template mappings" ones to (path)(segment)templates(map)? *** wildCardName() and Node.varName should be renamed, e.g. to (get)templateName ** SolrQueryRequest.getPathValues() should be renamed, e.g. to getPathSegmentTemplateMappings() or getTemplates() # URL path segment sequences are used in various places, but names in code are inconsistent: ** In PathTrie and V2HttpCall, List<String> getParts()/"parts"/"pieces"/"path" usages would be more easily identifiable if referred to consistently as "segments" (since we're talking URL path segments here). # TestPathTrie.testPathTrie() should clear the "parts" template map between lookup calls; otherwise previous paths' template values will linger. # PathTrie: ** findValidChildren() <- why is "valid" part of this method name? There doesn't seem to be any filtering for validity here - it's more like "recursively enumerate all descendant paths". Maybe rename to "getAvailableSubPaths" (same as "availableSubPaths" result set)? ** Node.lookup(List,int,Map,Set) really really needs comments. It took me 10 minutes of staring to figure out this 13 line method. Specifically, the role of the "i" path segment counter should be thoroughly explained, including its relationship to the current node in the trie. # ApiCommand ** commented out code should be removed. ** generic type param is never used, so should be removed. # V2HttpCall: ** getApiInfo(): typo: s/4044/404/ in: {{// this is to return the user with all the subpaths for a given 4044 request}} ** execute(): //todo remove. for debugging only # CollectionHandlerApi ** prefixSubStitutes() should be prefixSubstitutes() (second "S" should be lowercase) # UpdateRequestHandler ** shouldn't /update/bin be "application/javabin" instead of "application/csv"?: *** createDefaultLoaders(): {code:java} pathVsLoaders.put(BIN_PATH,registry.get("application/csv")); ... public static final String BIN_PATH = "/update/bin"; {code} ** Is /update/bin supported? It's not listed in core.Update.json or in ImplicitPlugins.json. # ReplicationHandler ** LOCATION is unused and should be removed. > Modernize and standardize Solr APIs > ----------------------------------- > > Key: SOLR-8029 > URL: https://issues.apache.org/jira/browse/SOLR-8029 > Project: Solr > Issue Type: Improvement > Affects Versions: 6.0 > Reporter: Noble Paul > Assignee: Noble Paul > Labels: API, EaseOfUse > Fix For: 6.0 > > Attachments: SOLR-8029.patch, SOLR-8029.patch, SOLR-8029.patch, > SOLR-8029.patch > > > Solr APIs have organically evolved and they are sometimes inconsistent with > each other or not in sync with the widely followed conventions of HTTP > protocol. Trying to make incremental changes to make them modern is like > applying band-aid. So, we have done a complete rethink of what the APIs > should be. The most notable aspects of the API are as follows: > The new set of APIs will be placed under a new path {{/solr2}}. The legacy > APIs will continue to work under the {{/solr}} path as they used to and they > will be eventually deprecated. > There are 4 types of requests in the new API > * {{/v2/<collection-name>/*}} : Hit a collection directly or manage > collections/shards/replicas > * {{/v2/<core>/*}} : Hit a core directly or manage cores > * {{/v2/cluster/*}} : Operations on cluster not pertaining to any collection > or core. e.g: security, overseer ops etc > This will be released as part of a major release. Check the link given below > for the full specification. Your comments are welcome > [Solr API version 2 Specification | http://bit.ly/1JYsBMQ] -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org