Daniel Taurat writes: > Hi, > I have just another stupid parser question: > There seems to be a special handling of the dash sign "-" different from > Lucene 1.2 at least in Lucene 1.4.RC3 > StandardAnalyzer. > > Examples (1.4RC3): > > A document containing the string "dash-test" is matched by the following > search expressions: > dash > test > dash* > dash-test > It is _not_ matched by the following search expressions: > dash-* > dash-t* > > If the string after the dash consists of digits, the behavior is > different. > E.g., a document containing the string "dash-123" is matched by: > dash* > dash-* > dash-123 > It is not matched by: > dash > 123 > > Question: > Is this, esp. the different behavior when parsing digits and characters, > intentional and how can it be explained? > Regards, > Query parser was changed to treat '-' within words as part of the word. Before that change a query 'dash-test' was parsed as 'dash AND NOT test'. Now QP reads one word 'dash-test' which is analyzed. If the analyzer splits that to more than one token (standard analyzer does) a phrase query is created. The difference you see comes from standard analyzer which tokenizes dash-test dash-123 to tokens dash, test and dash-123. Prefix queries aren't analyzed.
Morus --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]