[ 
https://issues.apache.org/jira/browse/SOLR-10574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16044672#comment-16044672
 ] 

Erick Erickson commented on SOLR-10574:
---------------------------------------

If we have a {{_text_}} field but don't copy anything to it, what good is it? 
The user has to get in there and change the schema to search. If they have to 
intervene before using it that would defeat the purpose of making it zero-touch 
to start. And unless we do something with wildcard fields being searched by 
default or the {{_all}} query type, their first queries would all get zero hits 
since they'd inevitably just do a {{q=some terms}}, which would search against 
{{_text_}}.
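
For illustration, the kind of rule that keeps {{_text_}} useful is a catch-all copyField. The sketch below only builds the Schema API command as JSON; it assumes the destination field is named {{_text_}} and leaves the collection name and the actual POST to the reader.

```python
import json

# Assumption: the catch-all field is named _text_, as in the comment above.
CATCH_ALL_FIELD = "_text_"


def copy_everything_command(dest=CATCH_ALL_FIELD):
    """Build a Schema API command that copies every source field into `dest`."""
    return {"add-copy-field": {"source": "*", "dest": dest}}


payload = json.dumps(copy_everything_command())
# This JSON would be POSTed to /solr/<collection>/schema; nothing is sent here.
print(payload)
```

Without a rule like this, a bare {{q=some terms}} against {{_text_}} has nothing to match.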

------------

Why not make everything multi-valued by default with data_driven? What 
functionality would be lost? Then switching to single-valued becomes an 
optimization if they need to tune things.

[~janhoy] bq:  update the field to multi valued before passing on the update 
request.

I don't think that would work, at least not without a lot of work. I tried a 
quick experiment just changing multiValued from false to true then updating a 
couple of docs. When I tried grouping and faceting got 
"org.apache.solr.common.SolrException: can not use FieldCache on multivalued 
field: eoe". Don't particularly know whether it was the grouping or the 
faceting that caused it. Maybe we could fix this up but I don't think it would 
be simple.

I just used the default techproducts schema from master, string type. DocValues is false. 
But DocValues MV fields are SORTED_SET so they'd have their own issues I'd 
guess.
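
A rough sketch of what such an experiment could look like through the Schema API, assuming the field is the {{eoe}} field from the error message and has type string. The helper name is hypothetical, and the code only builds the command; it does not contact Solr.

```python
import json


def make_multivalued_command(name, field_type):
    """Build a Schema API replace-field command that marks a field multiValued.

    replace-field swaps in a whole new definition, so any property not
    listed here falls back to the field type's defaults.
    """
    return {
        "replace-field": {
            "name": name,
            "type": field_type,
            "multiValued": True,
        }
    }


# Field name taken from the error message above; type from the schema used.
command = make_multivalued_command("eoe", "string")
# This JSON would be POSTed to /solr/<collection>/schema; nothing is sent here.
print(json.dumps(command))
```

Changing the definition this way does not rewrite documents that are already indexed, which may be part of why queries against the existing index misbehave afterwards.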

------------------
The friction here, I think, is that the zero-touch startup requires us to 
make some decisions that we _know_ aren't valid for production systems. At 
least not at scale. Throwing everything into a {{_text_}} field won't scale. 
Searching all text fields all the time won't scale, at least not at the scale I 
often see. At that scale users _must_ hand-tune the schema. Or at least 
understand the tradeoffs. But _not_ doing one of those things requires that new 
users struggle with schema definitions before doing anything.

Maybe we can resolve this tension by using one of the not-for-production 
solutions but raising some flags to the user that they're, well, not for 
production use at scale? Take the {{_all}} query type suggestion for instance. 
If we go that route then provide a request handler called "gettingstarted" or 
"demo" or "DONOTUSETHISINPRODUCTION". Well, maybe not the last one. Then direct 
users _there_ in the getting started guides and the like, perhaps with 
notifications that once they're comfortable they need to dive into the schema 
definitions when they set up "for real".
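
As a hedged sketch of the handler idea, the Config API could register a separate search handler for the getting-started path. The handler name {{/gettingstarted}} comes from the suggestion above; using {{df=_text_}} as its default is an assumption standing in for whatever "search everything" behaviour is finally chosen, since no {{_all}} query type exists yet. The code only builds the command.

```python
import json


def demo_handler_command(name="/gettingstarted", default_field="_text_"):
    """Build a Config API command that adds a demo-only search handler.

    Both defaults are illustrative: the name follows the suggestion above,
    and df=_text_ is a placeholder for the eventual catch-all behaviour.
    """
    return {
        "add-requesthandler": {
            "name": name,
            "class": "solr.SearchHandler",
            "defaults": {"df": default_field},
        }
    }


handler = demo_handler_command()
# This JSON would be POSTed to /solr/<collection>/config; nothing is sent here.
print(json.dumps(handler))
```

Keeping the convenience defaults on a separate handler leaves {{/select}} untouched, so moving to a hand-tuned setup means dropping one handler rather than unpicking defaults.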

[~varunthacker] The JIRA has already been created I think: SOLR-5917

> Choose a default configset for Solr 7
> -------------------------------------
>
>                 Key: SOLR-10574
>                 URL: https://issues.apache.org/jira/browse/SOLR-10574
>             Project: Solr
>          Issue Type: Task
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Ishan Chattopadhyaya
>            Assignee: Ishan Chattopadhyaya
>            Priority: Blocker
>             Fix For: master (7.0)
>
>         Attachments: SOLR-10574.patch, SOLR-10574.patch, SOLR-10574.patch
>
>
> Currently, the data_driven_schema_configs is the default configset when 
> collections are created using the bin/solr script and no configset is 
> specified.
> However, that may not be the best choice. We need to decide which is the best 
> choice, out of the box, considering many users might create collections 
> without knowing about the concept of a configset going forward.
> (See also SOLR-10272)
> Proposed changes:
> # Remove data_driven_schema_configs and basic_configs
> # Introduce a combined configset, {{_default}} based on the above two 
> configsets.
> # Build a "toggleable" data driven functionality into {{_default}}
> Usage:
> # Create a collection (using _default configset)
> # Data driven / schemaless functionality is enabled by default; so just start 
> indexing your documents.
> # If you don't want data driven / schemaless, disable this behaviour: {code}
> curl http://host:8983/solr/coll1/config -d '{"set-user-property": 
> {"update.autoCreateFields":"false"}}'
> {code}
> # Create schema fields using schema API, and index documents


