Untokenized lowercase string
I am new to Solr. I'm just getting my feet wet, trying to set it up and migrate our in-house search to it.

Is it possible to define a field type that is not tokenized but still applies lowercase filtering? I'm sure I can do it in Java code, but I am looking for a solution in the XML configuration.

Basically, "Foo Bar" and "foo bar" should be treated as the same value when stored in that field, but otherwise the field should behave like StrField, not TextField.

Andrew.

--
View this message in context: http://lucene.472066.n3.nabble.com/Untokenized-lowercase-string-tp4010296.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Untokenized lowercase string
Alexandre Rafalovitch wrote:
> Each field has a type. Each type defines what happens with the text. You can certainly select to do one thing but not another.

Understood. But it seemed to me that only TextField allows filters to be added, and filters go together with tokenizers. I could not find a way to add a filter without also adding a tokenizer, and there is nothing like a ready-to-use null tokenizer. I am using Solr 4.0.0-BETA.

Alexandre Rafalovitch wrote:
> Just look towards the bottom of the schema.xml and compare the field type definitions for string and text; it should be fairly obvious. You'll most probably make up a new type and use the definition from the string type, but add the lowercase filter. Just make sure that it is added for both indexing and query time if there are two sections in there (I don't have my config right here).

I'm probably missing something. Do you mean the schema.xml from the example? The string type there is basically an empty reference to StrField, and all of the text_* types use one tokenizer or another. I can't find any without a tokenizer.

Andrew.
Re: Untokenized lowercase string
That sounds right, thanks! I missed KeywordTokenizerFactory; with a name like that, it did not sound like what I wanted. I expected NullTokenizerFactory or something that stands out like that :)

Jack Krupansky-2 wrote:
> Use the KeywordTokenizerFactory as your text field tokenizer to keep the text from being tokenized, and then use the LowerCaseFilterFactory token filter to do the lowercasing. Unfortunately, string (StrField) does not support analysis.
>
> -- Jack Krupansky
Re: Untokenized lowercase string
Just wanted to confirm that this:

    <fieldtype name="string_lc" class="solr.TextField" sortMissingLast="true" omitNorms="true">
      <analyzer>
        <filter class="solr.LowerCaseFilterFactory"/>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
      </analyzer>
    </fieldtype>

...works beautifully for untokenized lowercase values. Leading spaces and spaces in the middle work fine.
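For anyone finding this thread later, here is a minimal sketch of how a field might be declared against the string_lc type in schema.xml. The field name category_lc is hypothetical and does not come from the thread; only the type name does.

```xml
<!-- Hypothetical field name (assumption); the type string_lc is the one defined in this thread. -->
<field name="category_lc" type="string_lc" indexed="true" stored="true"/>
```

Because the type uses a single analyzer element with no type attribute, the same chain should apply at both index time and query time, so "Foo Bar" and "foo bar" match each other. Note that the stored value returned in results keeps its original case; the lowercasing affects only the indexed term used for matching.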