Jack / Michael: Thanks, but the query parser still seems to be tokenizing the query?

public class StringPhraseAnalyzer extends Analyzer {
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        Tokenizer tok = new KeywordTokenizer(reader);
        TokenFilter filter = new LowerCaseFilter(Version.LUCENE_41, tok);
        filter = new TrimFilter(filter, true);
        return new TokenStreamComponents(tok, filter);
    }
}

...

Analyzer analyzer = new StringPhraseAnalyzer();
// using this analyzer, add document to index with city TextField (value "NEW YORK")
QueryParser qp = new QueryParser(Version.LUCENE_41, "city", analyzer);
Query q = qp.parse("new york");
System.out.println("Query: " + q);

results in...

Query: city:new city:york    // I expected "city:new york"

...and no matches. Is a QueryParser the wrong way to generate the query for this type of analyzer?

Thanks again!

________________________________
From: Jack Krupansky <j...@basetechnology.com>
To: java-user@lucene.apache.org
Sent: Tuesday, May 21, 2013 10:22 AM
Subject: Re: Case insensitive StringField?

To be clear, analysis is not supported on StringField (or any non-tokenized field). But the good news is that by using the keyword tokenizer (KeywordTokenizer) on a TextField, you can get the same effect. That will preserve the entire input as a single token. You may want to include filters to trim exterior white space and normalize interior white space.

-- Jack Krupansky

-----Original Message-----
From: Shahak Nagiel
Sent: Tuesday, May 21, 2013 10:06 AM
To: java-user@lucene.apache.org
Subject: Case insensitive StringField?

It appears that StringField instances are treated as literals, even though my analyzer lower-cases (on both write and read sides). So, for example, I can match with a term query (e.g. "NEW YORK"), but only if the case matches. If I use a QueryParser (or MultiFieldQueryParser), it never works because these query values are lowercased and don't match.
I've found that using a TextField instead works, presumably because it's tokenized and processed correctly by the write analyzer. However, I would prefer that queries match against the entire/exact phrase ("NEW YORK"), rather than among the tokens ("NEW" or "YORK"). What's the solution here?

Thanks in advance.
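
On the question at the top of the thread (why qp.parse("new york") comes back as city:new city:york): the classic QueryParser splits the query string on whitespace before the analyzer ever sees it, so a keyword-style analyzer only receives "new" and "york" separately. Two common workarounds are to backslash-escape the whitespace or to wrap the value in double quotes, so the whole string reaches the analyzer in one piece. The sketch below is plain Java with no Lucene dependency; the class and method names (KeywordQuerySketch, escapeWhitespace, asQuotedPhrase) are hypothetical helpers, not part of Lucene.

```java
/**
 * Hypothetical helpers (not part of Lucene) that prepare a user-supplied value
 * so the classic QueryParser hands it to the analyzer as one chunk instead of
 * splitting it on whitespace.
 */
final class KeywordQuerySketch {

    private KeywordQuerySketch() {
    }

    /**
     * Puts a backslash in front of every whitespace character.
     * Only whitespace is handled here; other query-syntax characters
     * (':', '*', '(' and so on) would need escaping first, for example
     * with Lucene's static QueryParser.escape(String).
     */
    static String escapeWhitespace(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 4);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /**
     * Wraps the value in double quotes, escaping embedded backslashes
     * and double quotes so the quoted string stays well-formed.
     */
    static String asQuotedPhrase(String text) {
        String escaped = text.replace("\\", "\\\\").replace("\"", "\\\"");
        return "\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        System.out.println(escapeWhitespace("new york")); // prints: new\ york
        System.out.println(asQuotedPhrase("new york"));   // prints: "new york"
    }
}
```

With the analyzer from the message above, something like qp.parse(KeywordQuerySketch.asQuotedPhrase("new york")) should yield a single term on the city field, since KeywordTokenizer emits the whole quoted value as one token. Another option is to skip the parser for this field and build a TermQuery directly from an already lower-cased, trimmed value.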