Hi Erick,

The numerics are in fact "analyzed". The data is read using a Tokenizer that
works on top of oal.analysis.NumericTokenStream from Lucene. It produces the
tokens from the numeric value, which is passed to the TokenStream as a native
data type. Those tokens are indexed (in fact, they are binary data at different
precisions, according to the precision step).
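
To make the "precision step" remark concrete, here is a simplified sketch of the
idea behind trie-encoded numeric tokens. It is not Lucene's code and does not
reproduce its binary encoding. The class TrieTokensSketch, the method
trieTokens and the "shift:hex" token format are made up for illustration. It
only shows that one int becomes several tokens, one per precision level, each
keeping fewer of the low-order bits.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified, hypothetical sketch of trie-style numeric tokenization in the
 * spirit of NumericTokenStream. NOT Lucene's real encoding: each token is
 * rendered as "shift:hexPrefix" only to make the precision levels visible.
 */
public class TrieTokensSketch {

    /** Returns one token per precision level for the given int value. */
    static List<String> trieTokens(int value, int precisionStep) {
        if (precisionStep < 1 || precisionStep > 32) {
            throw new IllegalArgumentException("precisionStep must be in 1..32");
        }
        // Flip the sign bit so negative values sort before positive ones
        // when the bits are compared as unsigned.
        int sortable = value ^ 0x80000000;
        List<String> tokens = new ArrayList<>();
        for (int shift = 0; shift < 32; shift += precisionStep) {
            // Drop the lowest 'shift' bits: larger shifts give coarser
            // prefixes that many neighbouring values share.
            int prefix = sortable >>> shift;
            tokens.add(shift + ":" + Integer.toHexString(prefix));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // precisionStep = 8 gives one token per shift: 0, 8, 16, 24.
        for (String t : trieTokens(30, 8)) {
            System.out.println(t);
        }
    }
}
```

With precisionStep = 8 the value 30 yields four tokens (shifts 0, 8, 16 and
24). A smaller step produces more tokens per value and a larger index, in
exchange for range queries that have to visit fewer terms.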
Additional analysis on top of that is not easily possible, because the
Tokenizer does all the work and there is no way to inject a TokenFilter.
Theoretically, the only possibility would be to add a CharFilter before the
numeric tokenizer. But the field type does not currently allow that, because
the "analysis" is hardcoded in the field type.
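
A rough sketch of what such a CharFilter-style step could do, using Erick's
"30asdf" case from the quoted mail. The class name, the method names and the
filter rule are hypothetical; the field type offers no such hook today. The
point is the ordering: the filter can only rewrite characters before the
numeric parse, because afterwards the tokens are binary and no TokenFilter
can sensibly be applied.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: a CharFilter-like stage in front of a numeric
 * tokenizer. The numeric stage only sees the filtered text; the stored
 * value would still be the raw input.
 */
public class NumericCharFilterSketch {

    // Hypothetical rule: drop everything except digits and the minus sign.
    private static final Pattern NON_NUMERIC = Pattern.compile("[^0-9-]");

    /** Char-filter stage: rewrites the raw characters before tokenization. */
    static String charFilter(String raw) {
        return NON_NUMERIC.matcher(raw).replaceAll("");
    }

    /** Stand-in for the hardcoded numeric "analysis": parse to a native int. */
    static int parseNumeric(String filtered) {
        return Integer.parseInt(filtered);
    }

    public static void main(String[] args) {
        String raw = "30asdf";
        int indexed = parseNumeric(charFilter(raw));
        // The index would see 30, while the stored field keeps "30asdf".
        System.out.println("stored=" + raw + " indexed=" + indexed);
    }
}
```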


Uwe
-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [email protected]


> -----Original Message-----
> From: Erick Erickson [mailto:[email protected]]
> Sent: Thursday, March 20, 2014 6:52 PM
> To: [email protected]
> Subject: Analyzing primitive types, why can't we do this in Solr?
> 
> It's bugged me for a while that we can't define any analysis on primitive
> types. This is especially acute with date types: we require a very exact
> format and have to tell people "transform it correctly on the ingestion
> side", or "create a custom update processor that transforms it".
> 
> I thought I remembered something about being able to do this, but can't find
> it. I suspect I was confusing it with DIH.
> 
> What's the reason for primitive types being unanalyzed? Just "it's always
> been that way", or "it would lead to a very sticky wicket we never wanted to
> get stuck in"? Both are perfectly valid, I'm just sayin'.
> 
> I realize this would provide some "interesting" output. Say you defined a
> regex for an int type that removed all non-numerics. If the input was
> "30asdf" and it was transformed correctly into 30 for the underlying int
> field, it would still come back as 30asdf from the stored data, but that's
> true of all analysis steps.
> 
> Or perhaps you'd like to have a string of integers as input to a
> multiValued int field. Or...
> 
> Musings sparked by seeing this crop up again in another context.
> 
> Erick
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
