[ 
https://issues.apache.org/jira/browse/SOLR-10574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishan Chattopadhyaya updated SOLR-10574:
----------------------------------------
    Attachment: SOLR-10574.patch

Apologies for the radio silence; here is an update. I had offline discussions 
with [~noblepaul], [~hossman], and [~shalinmangar].

There were various approaches that I considered:
# An initParams based mechanism for enabling/disabling the data driven nature. 
I discarded this, considering Noble's concern that initParams, with its 
globbing/wildcard support, is a risky tool that lets users shoot themselves in 
the foot (if they get the wildcards wrong), and hence it is possible that we 
may want to remove initParams support going forward.
# Creating the chain programmatically. This was not easy, since the 
AddSchemaFieldsUpdateProcessorFactory needs field type names as defined in the 
managed-schema/schema.xml. Hence, if the chain is created programmatically, the 
user would not be able to switch it to, for example, point fields instead of 
trie fields or vice versa.
# Adding "update.chain=add-unknown-fields-to-the-schema" to every paramset in 
ImplicitPlugins.json and then letting the user change the "update.chain" 
parameter's value through the config API to enable/disable the data driven 
nature. This approach exposed too much of the internals (the notion of an 
"update chain", the name of the chain, etc.) in the enable/disable command and 
was hence potentially confusing.

A very important consideration in setting up this enable/disable feature: if 
we use the "add-unknown-fields-to-the-schema" update chain exactly as it is 
defined in data_driven_schema_configs today, it would be impossible for the 
user to modify the update chain (or parts of the chain) using the config API, 
because the config API cannot edit URPs that are within an update chain, and 
it also doesn't support creating/editing update chains.

So, the solution (as in the patch) was to break out the individual URPs in the 
add-unknown-fields-to-the-schema chain into top-level named URPs (so that they 
are editable using the config API) and to create a functionally similar chain 
from those named URPs. Update chains have a nice, not well documented, 
default=true|false attribute, which is now used (and should have been used all 
along) to enable/disable the data driven nature, based on a variable.
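
To illustrate what "editable using the config API" could look like for one of 
the broken-out URPs, here is a rough sketch built on the Config API's 
"update-updateprocessor" command. The URP name {{field-name-mutating}}, the 
pattern, and the helper function names are assumptions for illustration only 
and are not taken from the patch. The script only prints the curl command so 
that it can be reviewed before being run.

```shell
#!/bin/sh
# Sketch only: the URP name and pattern used below are illustrative,
# not taken from the patch.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# Build an "update-updateprocessor" payload that redefines a named
# FieldNameMutatingUpdateProcessorFactory with a new pattern/replacement.
# Args: <urp-name> <pattern> <replacement>
urp_update_payload() {
  printf '{"update-updateprocessor": {"name": "%s", "class": "solr.FieldNameMutatingUpdateProcessorFactory", "pattern": "%s", "replacement": "%s"}}' "$1" "$2" "$3"
}

# Print (do not execute) the curl command for a collection.
# Args: <collection> <urp-name> <pattern> <replacement>
print_urp_update() {
  printf '%s\n' "curl $SOLR_URL/$1/config -H 'Content-type:application/json' -d '$(urp_update_payload "$2" "$3" "$4")'"
}

print_urp_update mycollection field-name-mutating '[^a-zA-Z0-9_]' '_'
```

Whether a given URP from the configset can be redefined this way depends on it 
being declared at the top level with a name, which is the point of the 
refactoring described above.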

So, *TL;DR*: check out the new {{_default}} configset in the patch. It has the 
data driven nature enabled by default. The data driven nature can be 
enabled/disabled using the following:

{code}
# Disable schemaless/data driven nature:
curl http://localhost:8983/solr/mycollection/config -d '{"set-user-property": {"update.autoCreateFields":"false"}}'
# Enable schemaless/data driven nature:
curl http://localhost:8983/solr/mycollection/config -d '{"set-user-property": {"update.autoCreateFields":"true"}}'
{code}
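
Since the two commands differ only in the property value, they could be 
wrapped in a small helper. This is a minimal sketch, not part of the patch: 
the function names and the {{SOLR_URL}} and {{DRY_RUN}} variables are 
hypothetical, and only the endpoint and payload come from the commands above.

```shell
#!/bin/sh
# Hypothetical helper: toggle the data driven nature for one collection.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# Build the Config API payload for the given value ("true" or "false").
auto_create_payload() {
  case "$1" in
    true|false)
      printf '{"set-user-property": {"update.autoCreateFields":"%s"}}' "$1" ;;
    *)
      echo "value must be true or false" >&2
      return 1 ;;
  esac
}

# POST the payload to the collection's /config endpoint.
# With DRY_RUN set, only print the command that would be run.
# Args: <collection> <true|false>
set_auto_create_fields() {
  collection="$1"
  body="$(auto_create_payload "$2")" || return 1
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "curl $SOLR_URL/$collection/config -d '$body'"
  else
    curl "$SOLR_URL/$collection/config" -d "$body"
  fi
}
```

For example, {{DRY_RUN=1 set_auto_create_fields mycollection false}} prints 
the disable command without contacting Solr. User properties set this way are 
stored in the config overlay, so the current value can be checked at the 
collection's {{/config/overlay}} endpoint.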

Would appreciate a review.

Note: the patch contains only the new default configset. We still need to 
remove the existing data_driven_schema_configs and basic_configs and update 
the script. Also, I haven't yet consolidated the managed-schema differences 
between basic_configs and data_driven_schema_configs into this {{_default}} 
configset.

> Choose a default configset for Solr 7
> -------------------------------------
>
>                 Key: SOLR-10574
>                 URL: https://issues.apache.org/jira/browse/SOLR-10574
>             Project: Solr
>          Issue Type: Task
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Ishan Chattopadhyaya
>            Assignee: Ishan Chattopadhyaya
>            Priority: Blocker
>             Fix For: master (7.0)
>
>         Attachments: SOLR-10574.patch
>
>
> Currently, the data_driven_schema_configs is the default configset when 
> collections are created using the bin/solr script and no configset is 
> specified.
> However, that may not be the best choice. We need to decide which is the best 
> choice, out of the box, considering many users might create collections 
> without knowing about the concept of a configset going forward.
> (See also SOLR-10272)
> Proposed changes:
> # Let's deprecate what we know as data_driven_schema_configs
> # Build a "toggleable" data driven functionality into the basic_configs 
> configset (and make it the default)



