[jira] [Comment Edited] (SOLR-10574) Choose a default configset for Solr 7

JIRA Thu, 08 Jun 2017 14:43:15 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-10574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043491#comment-16043491
 ]


Jan Høydahl edited comment on SOLR-10574 at 6/8/17 9:41 PM:
------------------------------------------------------------

It is important to remember that even in data-driven mode, nothing stops you 
from creating a collection, using the Schema API to add a few fields whose 
fieldType you want to force, and then starting to index docs. I've used this 
approach in POC settings many times: it makes sure the fields you define up 
front behave well, while still letting you explore, search and facet on new, 
unknown fields. Then, before going to production, you clean up the schema and 
flip the data-driven switch.
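
As a rough illustration of the "define a few fields up front" step, here is a minimal Python sketch that builds (but does not send) a Schema API {{add-field}} request. The host, the collection name {{foo}}, the field name {{sku}} and the helper name {{build_add_field_request}} are assumptions made for the example, not something taken from this issue.

```python
import json
import urllib.request


def build_add_field_request(base_url, collection, name, field_type, **props):
    """Build (but do not send) a Schema API request that adds one field.

    base_url, collection and the field details are supplied by the caller;
    nothing here is specific to any particular configset.
    """
    field = {"name": name, "type": field_type}
    field.update(props)  # optional properties such as stored=True
    body = json.dumps({"add-field": field}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{collection}/schema",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical host, collection and field; adjust to your own setup.
    req = build_add_field_request(
        "http://localhost:8983/solr", "foo", "sku", "string", stored=True
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To send it to a running Solr instance:
    # urllib.request.urlopen(req)
```

Fields declared this way keep their explicit type, while any other field in the indexed docs would still go through the data-driven guessing.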

So for 7.x I tend to agree with Ishan: let's make it possible to add a 
collection from the Admin UI as the first thing you try after install, and 
have it behave exactly like {{bin/solr create -c foo}}.

Then let us make data-driven mode more mature. Here are some rough thoughts:
* Implement SOLR-9526, indexing text as both tokenized and string
* For {{update/json}}, perhaps get better at guessing primitive types from 
the JSON type (not possible from XML or CSV)
* Add a {{add-unknown-fields-to-the-schema-dryrun}} update chain which buffers 
N docs before guessing, and does no indexing
* Add a {{<data-driven>true|false</data-driven>}} tag/API to the schema, and 
let DirectUpdateHandler? enable/disable the update chain based on this
* Make it possible to choose data-driven or not when creating a collection? 
{{bin/solr create -c foo -data-driven false}}
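
To make the JSON-type and dry-run ideas above a bit more concrete, here is a small Python sketch of buffering the first N docs and guessing one type per field from the JSON value types, without indexing anything. This is only an illustration of the idea, not how any Solr update chain is implemented; the function names and the type labels ({{boolean}}, {{long}}, {{double}}, {{string}}) are assumptions chosen for the example.

```python
from itertools import islice


def json_type_label(value):
    """Map one parsed JSON value to an illustrative type label."""
    # bool must be tested before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    # Strings, nested objects and anything else fall back to string.
    return "string"


def widen(a, b):
    """Combine two guesses for the same field into one that fits both."""
    if a == b:
        return a
    if {a, b} == {"long", "double"}:
        return "double"
    return "string"


def guess_field_types(docs, buffer_size):
    """Look at the first buffer_size docs only and guess a type per field."""
    guesses = {}
    for doc in islice(docs, buffer_size):
        for field, value in doc.items():
            # Treat a JSON array as several values of the same field.
            values = value if isinstance(value, list) else [value]
            for v in values:
                if v is None:
                    continue  # null carries no type information
                label = json_type_label(v)
                if field in guesses:
                    guesses[field] = widen(guesses[field], label)
                else:
                    guesses[field] = label
    return guesses


if __name__ == "__main__":
    sample = [
        {"id": "1", "price": 10, "in_stock": True},
        {"id": "2", "price": 12.5, "tags": ["a", "b"]},
        {"id": "3", "price": "n/a"},
    ]
    print(guess_field_types(sample, buffer_size=2))
    print(guess_field_types(sample, buffer_size=3))
```

The sample shows why buffering helps: a guess made from the first doc alone would type {{price}} as an integer, and the later docs force it to be widened.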

Wrt schema.xml vs managed-schema, I'm +0 on renaming to {{managed-schema.xml}}; 
the "managed" part of the name and the comments in the file give warning 
enough. What if we add an API, {{POST /collection/schema/xml}}, which takes the 
complete XML file as its body, as a safe way to continue hand-editing the XML 
schema? It would also be easy to add an Admin UI edit textbox if we had this...


was (Author: janhoy):
It is important to remember that even in data-driven mode, nothing stops you 
from creating a core, adding a few fields whose fieldType you want to force, 
and then starting to index docs. I've used this approach in POC settings many 
times: it makes sure the fields you define up front behave well, while still 
letting you explore, search and facet on new, unknown fields. Then, before 
going to production, you clean up the schema and flip the data-driven switch.

So for 7.x I tend to agree with Ishan: let's make it possible to add a 
collection from the Admin UI as the first thing you try after install, and 
have it behave exactly like {{bin/solr create -c foo}}.

Then let us make data-driven mode more mature. Here are some rough thoughts:
* Implement SOLR-9526, indexing text as both tokenized and string
* For {{update/json}}, perhaps get better at guessing primitive types from 
the JSON type (not possible from XML or CSV)
* Add a {{add-unknown-fields-to-the-schema-dryrun}} update chain which buffers 
N docs before guessing, and does no indexing
* Add a {{<data-driven>true|false</data-driven>}} tag/API to the schema, and 
let DirectUpdateHandler? enable/disable the update chain based on this
* Make it possible to choose data-driven or not when creating a collection? 
{{bin/solr create -c foo -data-driven false}}

Wrt schema.xml vs managed-schema, I'm +0 on renaming to {{managed-schema.xml}}; 
the "managed" part of the name and the comments in the file give warning 
enough. What if we add an API, {{POST /collection/schema/xml}}, which takes the 
complete XML file as its body, as a safe way to continue hand-editing the XML 
schema? It would also be easy to add an Admin UI edit textbox if we had this...

> Choose a default configset for Solr 7
> -------------------------------------
>
>                 Key: SOLR-10574
>                 URL: https://issues.apache.org/jira/browse/SOLR-10574
>             Project: Solr
>          Issue Type: Task
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Ishan Chattopadhyaya
>            Assignee: Ishan Chattopadhyaya
>            Priority: Blocker
>             Fix For: master (7.0)
>
>         Attachments: SOLR-10574.patch, SOLR-10574.patch, SOLR-10574.patch
>
>
> Currently, the data_driven_schema_configs is the default configset when 
> collections are created using the bin/solr script and no configset is 
> specified.
> However, that may not be the best choice. We need to decide which is the best 
> choice, out of the box, considering many users might create collections 
> without knowing about the concept of a configset going forward.
> (See also SOLR-10272)
> Proposed changes:
> # Remove data_driven_schema_configs and basic_configs
> # Introduce a combined configset, {{_default}} based on the above two 
> configsets.
> # Build a "toggleable" data driven functionality into {{_default}}
> Usage:
> # Create a collection (using _default configset)
> # Data driven / schemaless functionality is enabled by default; so just start 
> indexing your documents.
> # If you don't want data driven / schemaless, disable this behaviour: {code}
> curl http://host:8983/solr/coll1/config -d '{"set-user-property": {"update.autoCreateFields":"false"}}'
> {code}
> {code}
> # Create schema fields using schema API, and index documents



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
