It's the QueryParser, not the Analyzer.
When the query parser sees multiple tokens from what looks like a
single word, it puts them in a phrase query.
I think the only way to change that behavior would be to modify the
QueryParser.
-Yonik
On 8/23/05, Dan Armbrust <[EMAIL PROTECTED]> wrote:
> I wrote a slightly modified version of the WhiteSpaceTokenizer that
> allows me to treat other characters as whitespace. My thought was that
> this would be an easy way to make it tokenize on characters such as "-".
>
> My tokenizer looks like this:
>
> public class CustomWhiteSpaceTokenizer extends CharTokenizer
> {
>
> protected boolean isTokenChar(char c)
> {
> if (Character.isWhitespace(c) || whiteSpaceChars_.contains(new
> Character(c)))
> {
> return false;
> }
> else
> {
> return true;
> }
> }
>
> <snip other stuff>
> }
>
> When I use my Analyzer which uses this tokenizer in the QueryParser with
> the character "-" defined as whitespace, the following query gets parsed
> like this:
>
> "title:(john a) body:(john a) " -> (title:john title:a) (body:john body:a)
>
> which is what I expect. But then the following query:
>
> "title:(john--a) body:(john--a) " -> title:"john a" body:"john a"
>
> Isn't what I want. I can't seem to figure out why it is behaving
> differently on these characters (space vs hyphen) when I am specifying
> them both as a non-token.
>
> This is with the svn trunk as of yesterday.
> Any help appreciated,
>
> Thanks,
>
> Dan
>
> --
> ****************************
> Daniel Armbrust
> Biomedical Informatics
> Mayo Clinic Rochester
> daniel.armbrust(at)mayo.edu
> http://informatics.mayo.edu/
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]