It's the QueryParser, not the Analyzer. When the query parser sees multiple tokens from what looks like a single word, it puts them in a phrase query.
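
To make the difference concrete, here is a rough plain-Java sketch of that behavior. It is not Lucene code: the class FieldQuerySketch, its methods, and the phraseOnMultipleTokens flag are all made up for illustration, and the output strings only imitate how parsed queries print. The point is that the query grammar splits on spaces before the analyzer runs, so "john a" arrives as two chunks while "john--a" arrives as one chunk that analyzes to two tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the parser/analyzer interaction; not Lucene API.
public class FieldQuerySketch {

    // Stand-in for the custom analyzer: splits on whitespace plus any
    // extra characters the caller wants treated as whitespace.
    static List<String> analyze(String text, Set<Character> extraWhitespace) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isWhitespace(c) || extraWhitespace.contains(c)) {
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Builds the query for ONE chunk handed over by the grammar.
    // With phraseOnMultipleTokens == true this imitates the behavior
    // described above; false shows the alternative of separate term clauses.
    static String fieldQuery(String field, String chunk,
                             Set<Character> extraWhitespace,
                             boolean phraseOnMultipleTokens) {
        List<String> tokens = analyze(chunk, extraWhitespace);
        if (tokens.isEmpty()) {
            return "";
        }
        if (tokens.size() == 1) {
            return field + ":" + tokens.get(0);
        }
        if (phraseOnMultipleTokens) {
            return field + ":\"" + String.join(" ", tokens) + "\"";
        }
        List<String> clauses = new ArrayList<>();
        for (String token : tokens) {
            clauses.add(field + ":" + token);
        }
        return String.join(" ", clauses);
    }

    // Imitates the grammar: split the raw query on spaces first, then
    // analyze each chunk on its own.
    static String parse(String field, String queryText,
                        Set<Character> extraWhitespace,
                        boolean phraseOnMultipleTokens) {
        List<String> clauses = new ArrayList<>();
        for (String chunk : queryText.trim().split("\\s+")) {
            String clause = fieldQuery(field, chunk, extraWhitespace,
                                       phraseOnMultipleTokens);
            if (!clause.isEmpty()) {
                clauses.add(clause);
            }
        }
        return String.join(" ", clauses);
    }
}
```

With '-' in the extra-whitespace set, parse("title", "john a", ws, true) gives title:john title:a, while parse("title", "john--a", ws, true) gives title:"john a". Passing false for the last argument gives title:john title:a in both cases.
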
I think the only way to change that behavior would be to modify the QueryParser.

-Yonik

On 8/23/05, Dan Armbrust <[EMAIL PROTECTED]> wrote:
> I wrote a slightly modified version of the WhiteSpaceTokenizer that
> allows me to treat other characters as whitespace. My thought was that
> this would be an easy way to make it tokenize on characters such as "-".
>
> My tokenizer looks like this:
>
> public class CustomWhiteSpaceTokenizer extends CharTokenizer
> {
>
>     protected boolean isTokenChar(char c)
>     {
>         if (Character.isWhitespace(c) || whiteSpaceChars_.contains(new Character(c)))
>         {
>             return false;
>         }
>         else
>         {
>             return true;
>         }
>     }
>
>     <snip other stuff>
> }
>
> When I use my Analyzer which uses this tokenizer in the QueryParser with
> the character "-" defined as whitespace, the following query gets parsed
> like this:
>
> "title:(john a) body:(john a) " -> (title:john title:a) (body:john body:a)
>
> which is what I expect. But then the following query:
>
> "title:(john--a) body:(john--a) " -> title:"john a" body:"john a"
>
> Isn't what I want. I can't seem to figure out why it is behaving
> differently on these characters (space vs hyphen) when I am specifying
> them both as a non-token.
>
> This is with the svn trunk as of yesterday.
> Any help appreciated,
>
> Thanks,
>
> Dan
>
> --
> ****************************
> Daniel Armbrust
> Biomedical Informatics
> Mayo Clinic Rochester
> daniel.armbrust(at)mayo.edu
> http://informatics.mayo.edu/