[jira] [Commented] (SOLR-14701) Deprecate Schemaless Mode (Discussion)

Jira Mon, 03 Aug 2020 05:53:03 -0700


    [ https://issues.apache.org/jira/browse/SOLR-14701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17170013#comment-17170013 ]


Jan Høydahl commented on SOLR-14701:
------------------------------------

{quote}When we guess wrong, you can't index some documents
{quote}
Sure. But when we guess right, you can :)
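The following is a deliberately simplified Python sketch of first-value type guessing, meant to make the trade-off concrete. It is not Solr's implementation: the real work happens in update processors such as AddSchemaFieldsUpdateProcessorFactory. The helper names are hypothetical, and the type names only mirror those in the _default configset. The first value seen for a field fixes its type, so a later value that does not fit causes the document to be rejected:

```python
from datetime import datetime


def guess_field_type(value: str) -> str:
    """Guess a field type from one string value (hypothetical helper)."""
    text = value.strip()
    if text.lower() in ("true", "false"):
        return "booleans"
    for parse, field_type in ((int, "plongs"), (float, "pdoubles")):
        try:
            parse(text)
            return field_type
        except ValueError:
            pass
    try:
        # Accept ISO-8601 timestamps, including a trailing "Z"
        datetime.fromisoformat(text.replace("Z", "+00:00"))
        return "pdates"
    except ValueError:
        return "text_general"


def fits(field_type: str, value: str) -> bool:
    """Check whether a value can be indexed into an already-guessed type."""
    guessed = guess_field_type(value)
    if field_type == "text_general":
        return True  # any value can be indexed as text
    if field_type == "pdoubles":
        return guessed in ("plongs", "pdoubles")  # integers parse as doubles
    return guessed == field_type


def simulate_indexing(docs):
    """Return the guessed schema and the ids of documents that fail to fit it."""
    schema, rejected = {}, []
    for doc in docs:
        ok = True
        for name, value in doc.items():
            if name == "id":
                continue  # assume "id" is predefined in the schema
            if name not in schema:
                schema[name] = guess_field_type(value)  # first value wins
            elif not fits(schema[name], value):
                ok = False
        if not ok:
            rejected.append(doc["id"])
    return schema, rejected


docs = [
    {"id": "doc1", "price": "42", "in_stock": "true"},
    {"id": "doc2", "price": "42.5", "in_stock": "false"},
]
schema, rejected = simulate_indexing(docs)
print(schema)    # {'price': 'plongs', 'in_stock': 'booleans'}
print(rejected)  # ['doc2']
```

Here "42" makes price a long field, so "42.5" in the next document no longer fits. Had the documents arrived in the opposite order, both would have been accepted.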
{quote}The mechanism for updating the schema is fragile
{quote}
It doesn't matter, since this is NOT a production feature, so we don't expect 
heavy load or a large number of servers/shards to be involved.
{quote}It's another instance of complex code that we have to maintain.
{quote}
True. We always have to weigh benefit against complexity. Since this is mostly 
contained to *one* URP (update request processor), I'm not overly worried. It 
would be interesting to hear what the user community says about it. Perhaps 
they love it, perhaps they hate it; probably a good chunk of both.
{quote}We don't really deliver "schemaless". What we deliver is something that 
doesn't (and can't) work correctly.
{quote}
For some use cases with well-formatted, typed data it can work really well. 
Other times, not so much. What I tend to do is a first run, identify the 1-3 
problematic fields that get mixed up, then write {{add-field}} Schema API 
commands for those in my script that runs before ingestion, and let the 
system guess the rest. If you have used Elastic, this is exactly what you need 
to do there as well.
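Below is a minimal Python sketch of that pre-ingestion script. It assumes a collection built on the _default configset, so that field types such as pdouble and string exist. The collection name, field names, and helper functions are hypothetical. The script batches the pinned fields into a single Schema API request by using the array form of add-field:

```python
import json
import urllib.request


def add_field_commands(field_types: dict) -> dict:
    """Build one Schema API request that adds every listed field."""
    return {
        "add-field": [
            {"name": name, "type": field_type, "stored": True}
            for name, field_type in sorted(field_types.items())
        ]
    }


def post_schema_commands(base_url: str, collection: str, commands: dict) -> dict:
    """POST the commands to /solr/<collection>/schema and return the parsed reply."""
    request = urllib.request.Request(
        f"{base_url}/solr/{collection}/schema",
        data=json.dumps(commands).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Pin only the fields the first run got wrong; guessing handles the rest.
commands = add_field_commands({"price": "pdouble", "sku": "string"})
print(json.dumps(commands))
# Against a running node (not executed here):
# post_schema_commands("http://localhost:8983", "products", commands)
```

The Schema API applies all commands in a single POST together, so a typo in one field definition leaves none of them half-applied.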
{quote}and the users _still_ have to go in and tweak the schema
{quote}
Of course they do. ALL search apps need to tweak the schema: for Solr, for 
Elastic, for MySQL. And we must tell them so clearly. This feature is only an 
aid very early on when exploring your data, so you avoid having to hand-edit 142 
{{<field>}} tags in a schema before you can even look at your data.
{quote}Version control is another hidden gotcha.
{quote}
It's not hidden, is it? We recommend AGAINST using this feature in production, 
i.e. turn it off once you reach a stable schema, keep your schema in version 
control alongside your application, and manage it with the Schema API or 
whatever else you prefer. Perhaps that can be documented even better.
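The "turn it off, then version the schema" step can be scripted. The following is a minimal Python sketch. It assumes a collection created from the _default configset, where field guessing is governed by the update.autoCreateFields user property; that is the same property the reference guide's {{bin/solr config ... -action set-user-property}} example sets. The collection name and helper functions are hypothetical:

```python
import json
import urllib.request


def disable_guessing_command() -> dict:
    # Config API command; in the _default configset the field-guessing
    # update chain is only the default while this property is "true".
    return {"set-user-property": {"update.autoCreateFields": "false"}}


def call_solr(url: str, payload=None) -> dict:
    """GET the URL, or POST the payload as JSON if one is given."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="GET" if data is None else "POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def schema_snapshot(schema_response: dict) -> str:
    """Render the "schema" part of a Schema API GET reply as stable JSON."""
    # Sorted keys and fixed indentation keep diffs in version control small.
    return json.dumps(schema_response["schema"], indent=2, sort_keys=True) + "\n"


# Against a running node ("products" is a hypothetical collection):
# base = "http://localhost:8983/solr/products"
# call_solr(f"{base}/config", disable_guessing_command())
# with open("schema.json", "w", encoding="utf-8") as out:
#     out.write(schema_snapshot(call_solr(f"{base}/schema")))
```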
{quote}Hmmm, though if we wanted to help them make a real schema, we could 
write something that processed an existing index
{quote}
Or could we just add a schema wizard page to the Admin UI schema tab? Users 
could paste N documents, similar to what they can do in the "Documents" tab, 
and we would detect the most likely schema from those documents and output JSON 
that can be sent to the Schema API to bootstrap that schema.
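Below is a rough Python sketch of the detection step such a wizard could use. It is a hypothetical design, not an existing Solr feature. Unlike first-value guessing, it looks at every pasted document and widens conflicting guesses: long and double become double, and any other conflict falls back to text. It handles single-valued fields only and assumes type names from the _default configset:

```python
import json
from datetime import datetime


def guess_field_type(value) -> str:
    """Guess a type for one JSON value (single-valued fields only)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "plong"
    if isinstance(value, float):
        return "pdouble"
    if isinstance(value, str):
        try:
            datetime.fromisoformat(value.replace("Z", "+00:00"))
            return "pdate"
        except ValueError:
            pass
    return "text_general"


def widen(current: str, guessed: str) -> str:
    """Pick a type that can hold values of both guesses."""
    if current == guessed:
        return current
    if {current, guessed} == {"plong", "pdouble"}:
        return "pdouble"
    return "text_general"


def infer_schema(docs: list) -> dict:
    """Turn sample documents into a Schema API add-field request."""
    types = {}
    for doc in docs:
        for name, value in doc.items():
            if name == "id":
                continue  # assume "id" is predefined in the schema
            guessed = guess_field_type(value)
            types[name] = widen(types[name], guessed) if name in types else guessed
    return {
        "add-field": [
            {"name": name, "type": field_type, "stored": True}
            for name, field_type in sorted(types.items())
        ]
    }


pasted = json.loads("""[
  {"id": "1", "price": 42, "released": "2020-08-03T00:00:00Z", "in_stock": true},
  {"id": "2", "price": 42.5, "released": "2020-08-04T00:00:00Z", "in_stock": false}
]""")
schema_json = infer_schema(pasted)
print(json.dumps(schema_json, indent=2))
```

Looking at all N documents, rather than only the first, is what lets price come out as a double here instead of a long.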

> Deprecate Schemaless Mode (Discussion)
> --------------------------------------
>
>                 Key: SOLR-14701
>                 URL: https://issues.apache.org/jira/browse/SOLR-14701
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: Schema and Analysis
>            Reporter: Marcus Eagan
>            Priority: Major
>
> I know this won't be the most popular ticket out there, but I am growing more 
> and more sympathetic to the idea that we should rip out many of the freedoms 
> that cause users more harm than good. One of the freedoms I saw cause issues 
> time and time again was schemaless mode. It doesn't work as named or 
> documented, so I think it should be deprecated. 
> If you use it reliably in production, in a way that cannot be accomplished 
> another way, I am happy to hear from more knowledgeable folks why 
> deprecation is a bad idea. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

