[ https://issues.apache.org/jira/browse/SOLR-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17399533#comment-17399533 ]
David Smiley commented on SOLR-11741:
-------------------------------------

Cassandra: [I sort of asked this|https://issues.apache.org/jira/browse/SOLR-15277?focusedCommentId=17314616&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17314616] and Tim thought maybe. WDYT [~tpot]?

I just played with the Schema Designer on main to get a bit more familiar with it. I uploaded a CSV file. It did seem to guess the schema, but I didn't look thoroughly. Tim's comments on that issue said it didn't guess, so maybe I'm wrong?

If Schema Designer does guess based on the data, then I think there is much less value in the issue being discussed here, but it still has some. If it were to be committed, I could imagine the Schema Designer being adapted to use it. But I wouldn't want two competing systems to maintain.

Whatever happens, it's a shame to see some promising work by a contributor get forgotten after a few years. [~abhidemon] feel free to explicitly ask if you need more attention/feedback.

> Offline training mode for schema guessing
> -----------------------------------------
>
>                 Key: SOLR-11741
>                 URL: https://issues.apache.org/jira/browse/SOLR-11741
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Ishan Chattopadhyaya
>            Assignee: Ishan Chattopadhyaya
>            Priority: Major
>         Attachments: RuleForMostAccomodatingField.png, SOLR-11741-temp.patch,
> SOLR-11741.patch, SOLR-11741.patch, SOLR-11741.patch, screenshot-1.png,
> screenshot-3.png
>
>
> Our data driven schema guessing doesn't work under many situations. For
> example, if the first document has a field with value "0", it is guessed as
> Long and subsequent documents with "0.0" in that field are rejected.
> Similarly, if the same field has alphanumeric contents in a later document,
> that document is rejected. Also, single vs. multi valued field guessing is
> not ideal.
> Proposing an offline training mode where Solr accepts a bunch of documents
> and returns a guessed schema (without indexing). This schema can then be
> used for actual indexing. I think the original idea is from Hoss.
> I think the initial implementation can be based on an
> UpdateRequestProcessor. We can hash out the API soon, as we go along.
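
For illustration, the failure mode in the issue description (a field first seen as "0" is guessed as Long, then "0.0" or alphanumeric values are rejected) and the proposed offline pass over a batch of documents can be sketched as below. This is a minimal Python sketch under stated assumptions, not Solr code and not the attached patch: the function names, the type labels "long", "double" and "text", and the widening order are all hypothetical, chosen only to show the idea of picking the most accommodating type after seeing every document.

```python
from typing import Any, Dict, Iterable

# Hypothetical type labels, ordered from narrowest to widest.
# These are illustrative names, not Solr field type names.
WIDENING_ORDER = ["long", "double", "text"]


def guess_value_type(value: Any) -> str:
    """Return the narrowest hypothetical label that can hold one raw value."""
    text = str(value)
    try:
        int(text)
        return "long"
    except ValueError:
        pass
    try:
        float(text)
        return "double"
    except ValueError:
        return "text"


def widen(current: str, observed: str) -> str:
    """Return whichever of the two labels is wider in WIDENING_ORDER."""
    return max(current, observed, key=WIDENING_ORDER.index)


def guess_schema(docs: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Scan all documents first, then report one type per field (no indexing)."""
    schema: Dict[str, Dict[str, Any]] = {}
    for doc in docs:
        for name, raw in doc.items():
            values = raw if isinstance(raw, list) else [raw]
            # Start each field at the narrowest type and only ever widen it.
            field = schema.setdefault(name, {"type": "long", "multiValued": False})
            if len(values) > 1:
                field["multiValued"] = True
            for value in values:
                field["type"] = widen(field["type"], guess_value_type(value))
    return schema


if __name__ == "__main__":
    sample = [
        {"price": "0"},
        {"price": "0.0"},
        {"code": "42", "tags": ["a", "b"]},
        {"code": "A42"},
    ]
    for field_name, guess in guess_schema(sample).items():
        print(field_name, guess)
```

Because the guess is made only after the whole batch has been read, the order of documents no longer matters: "0" followed by "0.0" widens the field instead of rejecting the second document.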