[jira] [Commented] (SOLR-10574) Choose a default configset for Solr 7

Erick Erickson (JIRA) Wed, 14 Jun 2017 08:36:15 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-10574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16049300#comment-16049300
 ]


Erick Erickson commented on SOLR-10574:
---------------------------------------

I'll add a yes to the managed schema having an .xml extension. I agree it 
should be a separate issue.

Catch-all _text_ field: yes. Enabled by default: yes with warning.

Since this is not for production anyway, might as well make it as easy as 
possible to get started. If we're going to enable data_driven, we should have a 
catch-all field enabled by default. Neither one is something I'd recommend 
going to production with without close examination.

So to me it's a "both or neither" preference. The point of having data_driven 
as the default is to lower first-time barriers to entry. If the catch-all field 
is there and it's the pre-configured "df" for the request handlers, people get 
results the first time they index and search, without even knowing what fields 
their documents have. Otherwise they're left scratching their heads because 
they indexed stuff but didn't find anything.

So we'd then tell them "Examine your index to see what fields were actually 
defined, and do fielded search ('cause they don't even necessarily know what 
the docs look like!). Or enable a catch-all field and re-index". That is only a 
minimal improvement in first-time experience over what we have now: at least 
they were able to index docs, even if they couldn't successfully search them 
the first time they tried.
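
For illustration, the two options in that advice could look roughly like the 
sketch below. The host, port, collection name "coll1" and the field 
"title_txt" are assumptions made up for the example; the /schema/fields and 
/select endpoints and the "df" parameter are standard Solr. The script only 
prints the commands unless RUN=1 is set.

```shell
#!/bin/sh
# Assumptions (not from this thread): Solr on localhost:8983, a collection
# named "coll1", and a field "title_txt" used to illustrate fielded search.
SOLR="http://localhost:8983/solr"
COLL="coll1"

# 1. List the fields that schemaless indexing actually created.
list_fields_url="$SOLR/$COLL/schema/fields"

# 2. Fielded search: name one of those fields explicitly in the query.
fielded_query_url="$SOLR/$COLL/select?q=title_txt:solr"

# 3. Catch-all search: use _text_ as the default field (df).
catchall_query_url="$SOLR/$COLL/select?q=solr&df=_text_"

# Print the commands by default; set RUN=1 to run them against a live Solr.
for url in "$list_fields_url" "$fielded_query_url" "$catchall_query_url"; do
  if [ "${RUN:-0}" = 1 ]; then
    curl -s "$url"
  else
    echo "curl -s '$url'"
  fi
done
```

With the catch-all field preconfigured as "df", the third request needs no 
knowledge of the field names, which is the first-time experience argued for 
above.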

Perhaps the warning (in the schema file and in startup guides or maybe "taking 
Solr to production") is something akin to "add-unknown-fields-to-the-schema and 
the default behavior of copying all fields to _text_ are options intended for 
getting started. Production systems rarely enable either of these two options. 
See solrconfig.xml and managed-schema(.xml) for the text 'RARELY ENABLED FOR 
PRODUCTION'". Or something like that.

> Choose a default configset for Solr 7
> -------------------------------------
>
>                 Key: SOLR-10574
>                 URL: https://issues.apache.org/jira/browse/SOLR-10574
>             Project: Solr
>          Issue Type: Task
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Ishan Chattopadhyaya
>            Assignee: Ishan Chattopadhyaya
>            Priority: Blocker
>             Fix For: master (7.0)
>
>         Attachments: SOLR-10574.patch, SOLR-10574.patch, SOLR-10574.patch
>
>
> Currently, the data_driven_schema_configs is the default configset when 
> collections are created using the bin/solr script and no configset is 
> specified.
> However, that may not be the best choice. We need to decide which is the best 
> choice, out of the box, considering many users might create collections 
> without knowing about the concept of a configset going forward.
> (See also SOLR-10272)
> Proposed changes:
> # Remove data_driven_schema_configs and basic_configs
> # Introduce a combined configset, {{_default}} based on the above two 
> configsets.
> # Build a "toggleable" data driven functionality into {{_default}}
> Usage:
> # Create a collection (using _default configset)
> # Data driven / schemaless functionality is enabled by default; so just start 
> indexing your documents.
> # If you don't want data driven / schemaless, disable this behaviour: {code}
> curl http://host:8983/solr/coll1/config -d '{"set-user-property": {"update.autoCreateFields":"false"}}'
> {code}
> # Create schema fields using schema API, and index documents
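
The usage steps in the issue description above could be sketched end to end as 
follows. This is only an illustration: the host, port, collection name and the 
example field ("title" of type "text_general") are assumptions, and the 
Content-type header is added here although the original command omits it. The 
script prints the commands rather than running them.

```shell
#!/bin/sh
# Assumptions: host, port, collection name and the example field are
# hypothetical; only the set-user-property payload comes from the issue.
SOLR="http://localhost:8983/solr"
COLL="coll1"

# Payload from the issue description: turn off automatic field creation.
disable_payload='{"set-user-property": {"update.autoCreateFields":"false"}}'

# Schema API payload that defines one field explicitly (example name and type).
add_field_payload='{"add-field": {"name":"title", "type":"text_general", "stored":true}}'

# Print each step; run the printed lines against a live Solr to apply them.
echo "bin/solr create -c $COLL"
echo "curl $SOLR/$COLL/config -H 'Content-type:application/json' -d '$disable_payload'"
echo "curl $SOLR/$COLL/schema -H 'Content-type:application/json' -d '$add_field_payload'"
```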



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
