Varun Thacker created SOLR-6327:
-----------------------------------

Summary: An UpdateProcessor to generate a best fit schema
Key: SOLR-6327
URL: https://issues.apache.org/jira/browse/SOLR-6327
Project: Solr
Issue Type: Improvement
Reporter: Varun Thacker
Priority: Minor
We should have an UpdateProcessor that takes in documents, learns the field types from them, and automatically generates a best-fit schema.

Quoting Hoss:

"You wouldn't need/want a handler for this; you'd just need an UpdateProcessorFactory to use in place of RunUpdateProcessorFactory that would look at the datatypes of the fields in each document w/o doing any indexing and pick the least common denominator. So then you'd have a chain with all of your normal update processors, including the TypeMapping processors configured with the precedence orders and locales and format strings you want, and at the end you'd have your BestFitSchemaGeneratorUpdateProcessorFactory that would look at all those docs, study their values, and throw them away, until a commit comes along, at which point it does all the under-the-hood schema field addition calls. So to learn, you'd send docs using whatever handler/format you want (json, xml, extraction, etc...) with an update.chain=my.datatype.learning.processor.chain request param ... and once you've sent a bunch and given it a lot of variety to see, then you send a commit so it creates the schema, and then you re-index your docs for real w/o that special chain."

That discussion took place in SOLR-6016.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
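The quoted design comes down to two steps: while documents stream through the chain, record a type for each field and widen it whenever a later document disagrees; then, on commit, emit one field definition per learned type. The Java sketch below illustrates only the type-learning step, under several assumptions. `FieldTypeLearner`, `FieldKind`, `observe` and `bestFitSchema` are hypothetical names. Documents are modeled as plain `Map<String, Object>` rather than Solr's document class. Values are assumed to have already been converted to Java types by the earlier type-mapping processors in the chain. The widening rules (long plus double becomes double, any other mix falls back to string) are one illustrative reading of "least common denominator", not something the issue specifies.

```java
import java.time.Instant;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch of "learn each field's type, keep the least common
 * denominator". Not a Solr API: documents are plain maps, and the learned
 * schema is returned as field name -> FieldKind instead of being written
 * through Solr's schema machinery.
 */
class FieldTypeLearner {

    /** Illustrative candidate kinds; a real version would map these to schema field types. */
    enum FieldKind { BOOLEAN, LONG, DOUBLE, DATE, STRING }

    private final Map<String, FieldKind> learned = new LinkedHashMap<>();

    /** Observe one document and widen each field's kind; nothing is indexed. */
    void observe(Map<String, Object> doc) {
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            Object value = e.getValue();
            // Multi-valued fields: every value must fit the chosen kind.
            Collection<?> values = (value instanceof Collection)
                    ? (Collection<?>) value
                    : Collections.singletonList(value);
            for (Object v : values) {
                if (v == null) {
                    continue; // a missing value says nothing about the type
                }
                learned.merge(e.getKey(), kindOf(v), FieldTypeLearner::widen);
            }
        }
    }

    /** Intended for commit time: a copy of the best-fit kind for every field seen so far. */
    Map<String, FieldKind> bestFitSchema() {
        return new LinkedHashMap<>(learned);
    }

    /** Classify a value that earlier processors already converted to a Java type. */
    static FieldKind kindOf(Object v) {
        if (v instanceof Boolean) return FieldKind.BOOLEAN;
        if (v instanceof Integer || v instanceof Long) return FieldKind.LONG;
        if (v instanceof Float || v instanceof Double) return FieldKind.DOUBLE;
        if (v instanceof Instant) return FieldKind.DATE;
        return FieldKind.STRING;
    }

    /** Least common denominator of two kinds. */
    static FieldKind widen(FieldKind a, FieldKind b) {
        if (a == b) return a;
        // Integers and floating-point values can share a double field.
        if ((a == FieldKind.LONG && b == FieldKind.DOUBLE)
                || (a == FieldKind.DOUBLE && b == FieldKind.LONG)) {
            return FieldKind.DOUBLE;
        }
        // Any other mix only fits in a string field.
        return FieldKind.STRING;
    }
}
```

In an actual processor, `observe` would presumably be called from `processAdd` and `bestFitSchema` from `processCommit`, with the result turned into schema field additions instead of being returned. Because the processor never passes documents further down the chain, they are discarded after being studied, matching the "look at all those docs, study their values, and throw them away" behavior described above.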