Re: jaspq: dashed numerical values tokenized differently

sergiu gordea Mon, 01 Nov 2004 22:56:53 -0800

Daniel Taurat wrote:

Hi, I have just another stupid parser question: at least in the Lucene 1.4RC3 StandardAnalyzer, there seems to be special handling of the dash sign "-" that differs from Lucene 1.2.

From the behaviour you describe, I think the analyzer removes the dash sign from the text. This is quite correct, because the dash is used to separate two words. If it were not removed, a search for dash and/or test would not return "dash-test" in the results.

I suggest you use Luke (see the contributions page) to check exactly what you have in the index; then you will understand why the search works like that.
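As a rough illustration of what the index probably contains, here is a small model of the behaviour. It is not Lucene code and does not call any Lucene API. It assumes what the StandardTokenizer documentation describes: words are split at hyphens unless the token contains a number, in which case the whole token is kept as one term (like a product number). It also assumes that wildcard terms such as dash-* are not passed through the analyzer and are compared directly against the indexed terms. The class and method names (DashTokenSketch, tokenize, prefixMatches) are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

public class DashTokenSketch {

    // Simplified model (an assumption, NOT Lucene's implementation):
    // a whitespace-separated chunk is split at '-' unless it contains a digit.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String chunk : text.toLowerCase().split("\\s+")) {
            if (chunk.isEmpty()) {
                continue;
            }
            boolean hasDigit = false;
            for (char c : chunk.toCharArray()) {
                if (Character.isDigit(c)) {
                    hasDigit = true;
                    break;
                }
            }
            if (hasDigit) {
                // Kept whole, e.g. "dash-123" stays a single term.
                terms.add(chunk);
            } else {
                // Split at the dash, e.g. "dash-test" becomes "dash" and "test".
                for (String part : chunk.split("-")) {
                    if (!part.isEmpty()) {
                        terms.add(part);
                    }
                }
            }
        }
        return terms;
    }

    // Models a prefix query such as dash-* : the raw prefix is compared
    // against the indexed terms without being analyzed first.
    static boolean prefixMatches(List<String> terms, String prefix) {
        for (String term : terms) {
            if (term.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> words = tokenize("dash-test");
        List<String> digits = tokenize("dash-123");
        System.out.println(words);                          // [dash, test]
        System.out.println(digits);                         // [dash-123]
        System.out.println(prefixMatches(words, "dash-"));  // false
        System.out.println(prefixMatches(digits, "dash-")); // true
    }
}
```

If this model is right, it accounts for both lists in your examples: "dash-test" is indexed as the two terms dash and test, so no term starts with "dash-", while "dash-123" is indexed as one term, so dash and 123 alone find nothing but dash-* does. Luke will show whether the real index agrees.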

Sergiu

Examples (1.4RC3):

A document containing the string "dash-test" is matched by the following
search expressions:
dash
test
dash*
dash-test
It is _not_ matched by the following search expressions:
dash-*
dash-t*

If the string after the dash consists of digits, the behavior is
different.
E.g., a document containing the string "dash-123" is matched by:
dash*
dash-*
dash-123
It is not matched by:
dash
123

Question:
Is this, esp. the different behavior when parsing digits and characters,
intentional and how can it be explained?
Regards,

Daniel

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
