[
https://issues.apache.org/jira/browse/SOLR-11855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16326341#comment-16326341
]
Christine Poerschke commented on SOLR-11855:
--------------------------------------------
We encountered a puzzling scenario as follows:
* create a new collection using the _default configset,
* add a string field and make it a required field ...
* ... but otherwise (unwittingly) continue in schemaless mode.
* Add a document that has the required field,
* add another document that has a single blank space for the required field,
* add a third document that has an empty string for the required field.
The third document unexpectedly fails to be indexed with a "missing required
field" error.
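The steps above could be sketched with Solr's Schema API and JSON update endpoint roughly as follows. This is not the attached run-demo.sh. The field name, document ids and base URL are illustrative assumptions, and the collection is assumed to already exist and to have been created from the _default configset.

```shell
#!/usr/bin/env bash
# Sketch of the three-document scenario. FIELD, the document ids and the
# SOLR_URL default are assumptions for illustration, not taken from the patch.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"
FIELD="${FIELD:-myRequiredField}"

# Schema API payload: add a required string field with the given name.
add_field_payload() {
  printf '{"add-field":{"name":"%s","type":"string","required":true}}' "$1"
}

# Update payload: a single document with the given id, field name and value.
doc_payload() {
  printf '[{"id":"%s","%s":"%s"}]' "$1" "$2" "$3"
}

# POST a JSON body ($2) to a Solr endpoint ($1).
post_json() {
  curl -sS -H 'Content-Type: application/json' "$1" -d "$2"
}

# Run the scenario against an existing collection ($1).
run_scenario() {
  collection="$1"
  post_json "$SOLR_URL/$collection/schema" "$(add_field_payload "$FIELD")"
  # Document 1: ordinary value.
  post_json "$SOLR_URL/$collection/update?commit=true" "$(doc_payload 1 "$FIELD" value)"
  # Document 2: a single blank space.
  post_json "$SOLR_URL/$collection/update?commit=true" "$(doc_payload 2 "$FIELD" ' ')"
  # Document 3: an empty string.
  post_json "$SOLR_URL/$collection/update?commit=true" "$(doc_payload 3 "$FIELD" '')"
}
```

Nothing is sent until run_scenario is called, e.g. `run_scenario collectionA` against a locally running Solr.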
If we then
* decide that the new field is not required after all, and
* index the third document again,
then this time it indexes successfully, but the 'empty string' value for the
field is discarded, i.e. the document has no value for that field.
Lastly, disabling field guessing as per
https://lucene.apache.org/solr/guide/7_2/schemaless-mode.html#disabling-automatic-field-guessing
resolves the issue, i.e. the third document is indexed successfully with an
empty string as the required field value.
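One way to picture the difference between the second and third documents, assuming the blank-removal step named in the issue description drops only zero-length values, is the small simulation below. It is not Solr code: remove_blank is a hypothetical helper and the field name is an assumption.

```shell
# Simulation only: mimics the effect of dropping zero-length field values.
# remove_blank is a hypothetical helper, not part of Solr.
# Input and output: one "field=value" pair per line.
remove_blank() {
  while IFS= read -r pair; do
    value="${pair#*=}"
    # Keep the pair unless its value has length 0; a single space survives.
    [ -n "$value" ] && printf '%s\n' "$pair"
  done
  return 0
}

printf '%s\n' 'id=1' 'myRequiredField=value' | remove_blank
printf '%s\n' 'id=2' 'myRequiredField= ' | remove_blank
printf '%s\n' 'id=3' 'myRequiredField=' | remove_blank
```

Under this assumption the first two documents keep both fields, while the third is reduced to its id alone, which would explain both the "missing required field" error and the discarded value once the field is no longer required.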
----
Attached patch contains a script to reproduce and demonstrate the scenario:
{code}
cd solr
ant dist server
./run-demo.sh collectionA
./run-demo.sh collectionC
./run-demo.sh collectionB
{code}
> known field's empty value can vanish via add-unknown-fields-to-the-schema
> updateRequestProcessorChain
> -----------------------------------------------------------------------------------------------------
>
> Key: SOLR-11855
> URL: https://issues.apache.org/jira/browse/SOLR-11855
> Project: Solr
> Issue Type: Bug
> Environment: *known field's empty value can vanish via
> add-unknown-fields-to-the-schema updateRequestProcessorChain*
> This appears to be due to the "remove-blank" processor in the
> "add-unknown-fields-to-the-schema" processor chain, i.e. remove-blank applies
> to known as well as unknown fields (and when field guessing is disabled, the
> remove-blank logic is no longer applied either).
> Technically there's nothing broken here but I'm wondering if remove-blank
> might be removed (no pun intended) from the _default configuration or if the
> {code}
> WARNING: Using _default configset. Data driven schema functionality is
> enabled by default, which is NOT RECOMMENDED for production use.
> To turn it off:
> curl http://localhost:8983/solr/collectionA/config -d '{"set-user-property":
> {"update.autoCreateFields":"false"}}'
> {code}
> wording might be revised somehow to better account for the remove-blank
> functionality?
> Reporter: Christine Poerschke
> Priority: Trivial
> Attachments: SOLR-11855.patch
>
>