Re: jaspq: dashed numerical values tokenized differently

Erik Hatcher Wed, 03 Nov 2004 04:39:04 -0800

On Nov 3, 2004, at 5:03 AM, Daniel Taurat wrote:

Query parser was changed to treat '-' within words as part of the word. Before that change a query 'dash-test' was parsed as 'dash AND NOT

test'.

Now QP reads one word 'dash-test' which is analyzed. If the analyzer
splits that to more than one token (standard analyzer does) a phrase
query is created.
The difference you see comes from standard analyzer which tokenizes
dash-test dash-123 to tokens dash, test and dash-123.
Prefix queries aren't analyzed.


So you say that dash-123 is a prefix query whereas dash-test is not?
I found also (with Luke) that dash-anystring123 is not tokenized as
well.
What exactly are the criteria for Lucene to decide what a prefix is and
what not?

Anything that ends with an asterisk is parsed as a PrefixQuery, as long as it does not have other wildcard characters. If it has other wildcard characters or the asterisk is not at the end, then it is parsed as a WildcardQuery.

        Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: jaspq: dashed numerical values tokenized differently

Reply via email to