[ 
https://issues.apache.org/jira/browse/SOLR-14701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17433074#comment-17433074
 ] 

Timothy Potter commented on SOLR-14701:
---------------------------------------

Schema Designer in the UI gives us a nice interactive getting-started 
experience and supports an API endpoint that users can push a bunch of 
arbitrary docs into, then tweak / refine the schema from the UI. Here's an 
example in the ref guide that illustrates this using the techproducts docs: 
https://solr.apache.org/guide/8_10/schema-designer.html#iteratively-post-sample-documents
 (it's basic for now and can be enhanced to support richer document types such 
as PDF).
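
For illustration, here is a rough sketch of what pushing sample docs at the 
Schema Designer could look like from a client. The 
{{/api/schema-designer/analyze}} path and the {{configSet}} parameter are my 
reading of the ref guide example and should be treated as assumptions; the 
helper name, the base URL, and the {{MySchema}} name are hypothetical. The 
function only builds the request so it can be inspected without a running Solr.

```python
import json
import urllib.parse
import urllib.request


def build_analyze_request(docs, config_set, base_url="http://localhost:8983"):
    """Build (but do not send) a POST of sample docs to the Schema Designer.

    ASSUMPTION: the endpoint path and the configSet parameter follow the
    ref guide example; verify them against your Solr version.
    """
    query = urllib.parse.urlencode({"configSet": config_set})
    url = f"{base_url}/api/schema-designer/analyze?{query}"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical sample documents; any JSON docs would do.
sample_docs = [{"id": "1", "name": "example product", "price": 9.99}]
request = build_analyze_request(sample_docs, "MySchema")
```

Sending it is then a matter of {{urllib.request.urlopen(request)}} against a 
running Solr, repeated for each batch of sample docs.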

So I'm all for deprecating the {{add-unknown-fields-to-the-schema}} URP chain 
from the {{_default}} schema in 8.11 and removing it in 9. Any "guessing" features 
should be moved into the Schema Designer backend ... that should make it clear 
that Solr has tools to help users build an initial schema but they're part of a 
design process and aren't supported for indexing into live production 
collections.

The main open issue is the URPs in the {{add-unknown-fields-to-the-schema}} 
chain that do transformation / smart parsing on the input data, e.g. 
{{parse-date}}; Alexandre brought this issue up previously (see above). These 
transformational URPs can be useful b/c they allow for some flexibility in the 
format of incoming fields, e.g. you send text that looks like a timestamp into 
Solr and end up with a {{pdate}} field. 

I'm actually fine with removing these in 9 too and requiring well-formed input 
data, as most modern indexing solutions require a lot more transformation / 
parsing / enrichment on data destined for Solr, so removing basic 
transformations from the URP chain is probably not a huge loss for most users. 
In other words, most indexing clients are probably already doing some other, 
possibly more complicated, transformation on the data before Solr sees it, so 
these apps don't really need Solr to try to parse dates for them, etc.
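
To make that concrete, here is a minimal sketch of the kind of client-side 
normalization I have in mind, so that Solr only ever sees well-formed date 
values. The function name and the list of accepted formats are hypothetical 
(this is not the list {{parse-date}} uses), and treating values without an 
offset as UTC is an assumption a real client would need to revisit.

```python
from datetime import datetime, timezone

# Hypothetical formats a client might accept; NOT Solr's parse-date list.
CANDIDATE_FORMATS = (
    "%Y-%m-%dT%H:%M:%S%z",
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%d",
)


def to_solr_date(text):
    """Return text as an ISO-8601 UTC timestamp (YYYY-MM-DDThh:mm:ssZ).

    Raises ValueError if none of the candidate formats match.
    """
    cleaned = text.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            parsed = datetime.strptime(cleaned, fmt)
        except ValueError:
            continue  # try the next candidate format
        if parsed.tzinfo is None:
            # ASSUMPTION: values without an offset are already in UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    raise ValueError(f"unrecognized date: {text!r}")


doc = {"id": "1", "released": to_solr_date("2021-10-22 14:05:00")}
```

Rejecting unrecognized values in the client, rather than letting them through 
as strings, surfaces bad data before it reaches the index.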

Moreover, keeping the transformational URPs in the chain can certainly be an 
option that the Schema Designer offers via a toggle: flexible date parsing? 
check ... and so on.

So the tl;dr here for Solr 9 is:
* Deprecate the {{add-unknown-fields-to-the-schema}} URP chain from the 
{{_default}} schema in 8.11, as well as the 
{{solr.AddSchemaFieldsUpdateProcessorFactory}}. Remove them in 9.0.
* Keep the "transformational" URP stages like {{parse-date}} and wire them into 
the Schema Designer UI to let users toggle these features on/off for their URP chain
* Continually improve the Schema Designer experience throughout Solr 9.x, such 
as adding support for PDFs and other common rich document types
* Update the ref guide to point users to the Schema Designer as a getting 
started tool; also remove all the field guessing content

> Deprecate Schemaless Mode (Discussion)
> --------------------------------------
>
>                 Key: SOLR-14701
>                 URL: https://issues.apache.org/jira/browse/SOLR-14701
>             Project: Solr
>          Issue Type: Improvement
>          Components: Schema and Analysis
>            Reporter: Marcus Eagan
>            Priority: Blocker
>             Fix For: main (9.0)
>
>         Attachments: image-2020-08-04-01-35-03-075.png
>
>          Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> I know this won't be the most popular ticket out there, but I am growing more 
> and more sympathetic to the idea that we should rip many of the freedoms out 
> that cause users more harm than not. One of the freedoms I saw time and time 
> again to cause issues was schemaless mode. It doesn't work as named or 
> documented, so I think it should be deprecated. 
> If you use it in production reliably and in a way that cannot be accomplished 
> another way, I am happy to hear from more knowledgeable folks as to why 
> deprecation is a bad idea. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
