[ https://issues.apache.org/jira/browse/SOLR-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Abhishek Kumar Singh updated SOLR-11741:
----------------------------------------
    Attachment: RuleForMostAccomodatingField.png

> Offline training mode for schema guessing
> -----------------------------------------
>
>                 Key: SOLR-11741
>                 URL: https://issues.apache.org/jira/browse/SOLR-11741
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: Ishan Chattopadhyaya
>         Attachments: RuleForMostAccomodatingField.png,
> RuleForMostAccomodatingField.png, SOLR-11741-temp.patch, screenshot-1.png,
> screenshot-3.png
>
>
> Our data-driven schema guessing fails in many situations. For example, if
> the first document has a field with the value "0", the field is guessed as
> Long, and subsequent documents with "0.0" in that field are rejected.
> Similarly, if the same field has alphanumeric contents in a later document,
> that document is rejected. Also, single- vs. multi-valued field guessing is
> not ideal.
> Proposing an offline training mode where Solr accepts a batch of documents
> and returns a guessed schema (without indexing). This schema can then be
> used for actual indexing. I think the original idea is from Hoss.
> I think the initial implementation can be based on an
> UpdateRequestProcessor. We can hash out the API as we go along.
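
To illustrate the proposal, here is a minimal sketch of what guessing a schema from a whole batch of documents could look like, as opposed to committing to a type on the first value seen. It is not Solr code and not the attached patch: the function names (`guess_value_type`, `widen`, `guess_schema`), the type names ("long", "double", "string"), and the widening order are all assumptions made for illustration.

```python
from typing import Any, Dict, Iterable, Optional

# Hypothetical widening order (an assumption, not Solr's actual rule set):
# each later type can hold every value of the types before it.
WIDENING_ORDER = ["long", "double", "string"]


def guess_value_type(value: str) -> str:
    """Guess the narrowest type that can represent a single raw value."""
    try:
        int(value)
        return "long"
    except ValueError:
        pass
    try:
        float(value)
        return "double"
    except ValueError:
        return "string"


def widen(current: Optional[str], candidate: str) -> str:
    """Return the wider of two types according to WIDENING_ORDER."""
    if current is None:
        return candidate
    return max(current, candidate, key=WIDENING_ORDER.index)


def guess_schema(docs: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Scan every document (without indexing) and return a guessed schema."""
    schema: Dict[str, Dict[str, Any]] = {}
    for doc in docs:
        for name, raw in doc.items():
            field = schema.setdefault(name, {"type": None, "multiValued": False})
            values = raw if isinstance(raw, list) else [raw]
            # A field is multi-valued if any document carries more than one value.
            if len(values) > 1:
                field["multiValued"] = True
            for value in values:
                field["type"] = widen(field["type"], guess_value_type(str(value)))
    return schema


if __name__ == "__main__":
    training_docs = [
        {"price": "0", "tags": "a"},
        {"price": "0.0", "tags": ["b", "c"]},
        {"price": "12", "code": "7"},
        {"code": "A7"},
    ]
    for field_name, guess in guess_schema(training_docs).items():
        print(field_name, guess)
```

Because the whole batch is scanned before any type is fixed, a field that starts with "0" and later contains "0.0" ends up as "double" rather than causing the later document to be rejected, and a field with alphanumeric contents in a later document falls back to "string".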