Untokenized lowercase string

2012-09-25 Thread am
I am new to Solr. Just getting my feet wet, trying to set it up and to migrate
our in-house search to it.

Is it possible to define a field type that is not tokenized but still has
lowercase filtering? I'm sure I can do it in Java code, but I am looking for
an XML-only solution. Basically, Foo Bar and foo bar should be treated as the
same value when stored in that field, but otherwise the field should behave
like StrField, not TextField.


Andrew.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Untokenized-lowercase-string-tp4010296.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Untokenized lowercase string

2012-09-25 Thread am
Alexandre Rafalovitch wrote
 Each field has a type. Each type defines what happens with the text.
 You can certainly select to do one thing but not another.

Understood. But it seemed to me that only TextField allows adding filters,
and that filters only work in conjunction with a tokenizer. I could not find a
way to add a filter without also adding a tokenizer, and there is nothing like
a ready-to-use null tokenizer.

I am using Solr-4.0.0-BETA.


Alexandre Rafalovitch wrote
 Just look towards the bottom of the schema.xml and compare field types
 definition for string and text, it should be fairly obvious. You'll
 most probably make up a new type and use the definition from the
 String, but add the lower-case filter. Just make sure that it is added
 both for indexing and query time if there are two sections in there
 (don't have my config right here).

I'm probably missing something. You mean schema.xml from the example? The
string type is basically an empty reference to StrField, and all of the text_*
types use one tokenizer or another. I can't find any without a tokenizer.


Andrew.





Re: Untokenized lowercase string

2012-09-25 Thread am
That sounds right, thanks! I missed KeywordTokenizerFactory; with a name like
that, it did not sound like what I wanted. I expected NullTokenizerFactory or
something that stands out like that :)


Jack Krupansky-2 wrote
 Use the KeywordTokenizerFactory for your text field tokenizer to keep the 
 text from being tokenized, and then use the LowerCaseFilterFactory token 
 filter to do the lowercasing. Unfortunately, string (StrField) does not 
 support analysis.
 
 -- Jack Krupansky
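
For reference, the advice above, combined with the earlier tip about covering both index and query time, could be written out roughly as follows. This is only a sketch: the type name string_lc_split is made up for illustration, and the attributes are assumptions rather than anything taken from a real schema. When both chains are identical, a single analyzer element without a type attribute applies to indexing and querying alike, so the split form is only needed if the two ever have to differ.

```xml
<!-- Sketch only: "string_lc_split" is a hypothetical type name. -->
<fieldType name="string_lc_split" class="solr.TextField"
           sortMissingLast="true" omitNorms="true">
  <!-- Index time: KeywordTokenizer emits the whole value as one token,
       then the filter lowercases that single token. -->
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- Query time: the same chain, so query text is normalized the same way. -->
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```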







Re: Untokenized lowercase string

2012-09-25 Thread am
Just wanted to confirm that this:

<fieldtype name="string_lc" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <filter class="solr.LowerCaseFilterFactory"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldtype>

...works beautifully for untokenized lowercase values. Leading spaces and
spaces in the middle work fine.
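
For anyone landing here later, a field using this type might be declared roughly as below. The field name title_lc is hypothetical; only the type name string_lc comes from the definition above. As far as I understand it, the analysis chain only affects the indexed term, so the stored value returned in results keeps its original case, while a quoted query such as title_lc:"FOO BAR" should still match a document indexed with Foo Bar.

```xml
<!-- Sketch only: "title_lc" is a hypothetical field name. -->
<!-- indexed="true" makes the lowercased, untokenized term searchable;
     stored="true" returns the value in its original case. -->
<field name="title_lc" type="string_lc" indexed="true" stored="true"/>
```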


