[ https://issues.apache.org/jira/browse/SOLR-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16315462#comment-16315462 ]
Ishan Chattopadhyaya commented on SOLR-11741: --------------------------------------------- bq. This will be the mapping of field -> supported types. At every point in time, every field will be mapped to only *one* possible (most granular) field type, isn't it? > Offline training mode for schema guessing > ----------------------------------------- > > Key: SOLR-11741 > URL: https://issues.apache.org/jira/browse/SOLR-11741 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Reporter: Ishan Chattopadhyaya > Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png > > > Our data driven schema guessing doesn't work under many situations. For > example, if the first document has a field with value "0", it is guessed as > Long and subsequent fields with "0.0" are rejected. Similarly, if the same > field had alphanumeric contents for a latter document, those documents are > rejected. Also, single vs. multi valued field guessing is not ideal. > Proposing an offline training mode where Solr accepts bunch of documents and > returns a guessed schema (without indexing). This schema can then be used for > actual indexing. I think the original idea is from Hoss. > I think initial implementation can be based on an UpdateRequestProcessor. We > can hash out the API soon, as we go along. -- This message was sent by Atlassian JIRA (v6.4.14#64029) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org