[jira] [Commented] (SOLR-6939) UpdateProcessor to buffer & sample documents and then batch create neccessary fields

Alexandre Rafalovitch (JIRA) Fri, 09 Jan 2015 07:29:57 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-6939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14271152#comment-14271152
 ]


Alexandre Rafalovitch commented on SOLR-6939:
---------------------------------------------

So the interesting question is how the URP will know the upgrade path between 
types, e.g. that Int should upgrade to Float.

It may need a *Type Tree* of some sort, with String at the top. 
{quote}
In the beginning, L*ne created String type. And it was good!
But then the numbers had to be stored and they did not sort or facet well.
And then two numbers looked at each other and realised that they were 
different. 
One of them was straight and precise and another was imprecise and always 
floating.
And they saw each other, different as they were, next to each other in the bad 
sort and got embarrassed.
And L*ne got annoyed and cast them out of the uniform String type and created 
individual types, and packers.
And L*ne made some of the types special and more unique by letting them be 
stored as DocValues, but kept others individual and stored one-by-one on disk.
And then the flood came and cast some of the older types out to the legacy hell.
{quote}
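
To make the Type Tree idea a bit more concrete, here is a minimal sketch of how an upgrade path could be looked up. Everything in it is an assumption for illustration: the SketchType enum, its parent links, and the commonType method are hypothetical names, not existing Solr or Lucene API, and the set of types is deliberately tiny. Each type points at the wider type it upgrades to, with STRING as the root, and the common type of two values is their lowest common ancestor in the tree.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical type tree: every type names the wider type it upgrades to. */
enum SketchType {
    STRING(null),      // root: any value can be kept as a string
    FLOAT(STRING),
    INT(FLOAT),        // "Int should upgrade to Float"
    DATE(STRING),
    BOOLEAN(STRING);

    final SketchType parent;

    SketchType(SketchType parent) {
        this.parent = parent;
    }

    /** Path from this type up to the root, starting with this type itself. */
    List<SketchType> pathToRoot() {
        List<SketchType> path = new ArrayList<>();
        for (SketchType t = this; t != null; t = t.parent) {
            path.add(t);
        }
        return path;
    }

    /** Narrowest type able to hold values of both a and b (lowest common ancestor). */
    static SketchType commonType(SketchType a, SketchType b) {
        List<SketchType> ancestorsOfB = b.pathToRoot();
        for (SketchType t : a.pathToRoot()) {
            if (ancestorsOfB.contains(t)) {
                return t;
            }
        }
        return STRING; // not reached in practice: every path ends at STRING
    }
}
```

With this tree, a field that has shown both INT and FLOAT values would settle on FLOAT, while a field that has shown INT and DATE values would fall back to STRING. A real URP would presumably also need to account for the configured TypeMapping processors, which this sketch ignores.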


> UpdateProcessor to buffer & sample documents and then batch create neccessary 
> fields
> ------------------------------------------------------------------------------------
>
>                 Key: SOLR-6939
>                 URL: https://issues.apache.org/jira/browse/SOLR-6939
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Hoss Man
>
> spun off of an idea in SOLR-6016...
> {quote}
> bq. We could add a SchemaGeneratorHandler which would generate the "best" 
> schema.
> You wouldn't need/want a handler for this – you'd just need an 
> UpdateProcessorFactory to use in place of RunUpdateProcessorFactory that 
> would look at the datatypes of the fields in each document w/o doing any 
> indexing and pick the least common denominator.
> So then you'd have a chain with all of your normal update processors 
> including the TypeMapping processors configured with the precedence orders 
> and locales and format strings you want – and at the end you'd have your 
> BestFitSchemaGeneratorUpdateProcessorFactory that would look at all those 
> docs, study their values, and throw them away – until a commit comes along, 
> at which point it does all the under-the-hood schema field addition calls.
> So to learn, you'd send docs using whatever handler/format you want (json, 
> xml, extraction, etc...) with an 
> update.chain=my.datatype.learning.processor.chain request param ... and once 
> you've sent a bunch and given it a lot of variety to see, then you send a 
> commit so it creates the schema and then you re-index your docs for real w/o 
> that special chain.
> {quote}
> ...not mentioned originally: this factory could also default to assuming 
> fields should be single valued, unless/until it sees multiple values in a doc 
> that it samples.
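
The sample-then-commit flow described above could look roughly like the sketch below. It is plain Java with no Solr dependency, and all names in it (FieldSampler, Guess, sample, commit) are hypothetical; a real implementation would live in an UpdateProcessorFactory and would make schema field addition calls on commit rather than return a map. The three type names and the widening rule are simplifying assumptions. The sketch studies each document's values without indexing anything, keeps a per-field guess, and treats a field as single valued until a sampled document carries more than one value for it.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sampler: studies documents, indexes nothing, reports guesses on commit. */
class FieldSampler {
    /** What has been learned about one field so far. */
    static final class Guess {
        String type;          // "int", "float" or "string" in this sketch; null if no value seen
        boolean multiValued;  // stays false until one doc carries several values

        @Override
        public String toString() {
            return type + (multiValued ? " multiValued" : " singleValued");
        }
    }

    private final Map<String, Guess> guesses = new LinkedHashMap<>();

    /** Study one document's values; the caller can then throw the document away. */
    void sample(Map<String, ?> doc) {
        for (Map.Entry<String, ?> field : doc.entrySet()) {
            Guess guess = guesses.computeIfAbsent(field.getKey(), k -> new Guess());
            Object value = field.getValue();
            if (value instanceof Collection) {
                Collection<?> values = (Collection<?>) value;
                if (values.size() > 1) {
                    guess.multiValued = true;
                }
                for (Object v : values) {
                    guess.type = widen(guess.type, typeOf(v));
                }
            } else {
                guess.type = widen(guess.type, typeOf(value));
            }
        }
    }

    /** On commit, hand back the guesses; a real processor would add schema fields here. */
    Map<String, Guess> commit() {
        return guesses;
    }

    private static String typeOf(Object v) {
        if (v instanceof Integer || v instanceof Long) return "int";
        if (v instanceof Float || v instanceof Double) return "float";
        return "string";
    }

    /** Least common denominator of two type names; null means nothing seen yet. */
    private static String widen(String seen, String next) {
        if (seen == null || seen.equals(next)) return next;
        boolean bothNumeric = !seen.equals("string") && !next.equals("string");
        return bothNumeric ? "float" : "string";
    }
}
```

Feeding it one document with an integer price and a single tag, then another with a decimal price and two tags, would leave price guessed as a single-valued float and tags as a multi-valued string.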



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
