Ishan Chattopadhyaya created SOLR-11741:
-------------------------------------------

             Summary: Offline training mode for schema guessing
                 Key: SOLR-11741
                 URL: https://issues.apache.org/jira/browse/SOLR-11741
             Project: Solr
          Issue Type: Improvement
      Security Level: Public (Default Security Level. Issues are Public)
            Reporter: Ishan Chattopadhyaya


Our data-driven schema guessing fails in many situations. For example, if
the first document has a field with value "0", the field is guessed as Long,
and subsequent documents with "0.0" in that field are rejected. Similarly, if
the same field has alphanumeric contents in a later document, that document
is rejected. Also, single- vs. multi-valued field guessing is not ideal.

Proposing an offline training mode where Solr accepts a batch of documents
and returns a guessed schema (without indexing them). This schema can then be
used for actual indexing. I think the original idea is from Hoss.
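
To make the idea concrete, here is a rough sketch (in Python, purely for
illustration; it is not Solr code and not a proposed API). It scans a whole
batch before deciding anything, widening each field's type as needed. The
names guess_schema, classify and widen, the three-level long/double/string
ordering, and the "more than one value means multi-valued" rule are all
assumptions made for this example.

```python
from typing import Any, Dict, Iterable, Optional

# Illustrative type ordering, narrowest to widest. These labels are
# assumptions for the sketch, not actual Solr field type names.
ORDER = ["long", "double", "string"]


def classify(value: Any) -> str:
    """Return the narrowest type in ORDER that can hold one raw value.

    Note: Python's int()/float() are lenient (e.g. "nan" parses as a
    float), so a real implementation would need stricter parsing.
    """
    text = str(value)
    try:
        int(text)
        return "long"
    except ValueError:
        pass
    try:
        float(text)
        return "double"
    except ValueError:
        return "string"


def widen(current: Optional[str], seen: str) -> str:
    """Return the wider of the two types; None means nothing seen yet."""
    if current is None:
        return seen
    return current if ORDER.index(current) >= ORDER.index(seen) else seen


def guess_schema(docs: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Guess a type and multi-valuedness per field from a batch of docs.

    Nothing is indexed; the result is only a description of the fields.
    """
    schema: Dict[str, Dict[str, Any]] = {}
    for doc in docs:
        for name, raw in doc.items():
            values = raw if isinstance(raw, list) else [raw]
            entry = schema.setdefault(
                name, {"type": None, "multiValued": False}
            )
            for value in values:
                entry["type"] = widen(entry["type"], classify(value))
            # Assumption: any document with more than one value for a
            # field makes that field multi-valued.
            if len(values) > 1:
                entry["multiValued"] = True
    return schema
```

With this approach, a batch containing "0" and then "0.0" for the same field
yields a double field instead of rejecting the second document, and a later
alphanumeric value widens the field to string.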

I think the initial implementation can be based on an
UpdateRequestProcessor. We can hash out the API as we go along.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
