This isn't ideal, but if you have a defined list of such terms, you may find it easier to filter these terms out into a separate field for indexing.
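
One way to picture the separate-field idea is the sketch below. It is plain Java with no Lucene calls; the class name, the term list, and the extract method are all hypothetical and only illustrate pulling known terms out of the text before analysis. The returned terms could then be indexed untokenized in their own field, alongside the normally analyzed text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class SpecialTermExtractor {
    // Hypothetical list of terms that ordinary analysis would mangle.
    static final Set<String> SPECIAL_TERMS = Set.of("c++", "c#", ".net");

    // Returns the special terms found in the text, in order of appearance.
    static List<String> extract(String text) {
        List<String> found = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            // Strip trailing sentence punctuation only, so "c++," still matches "c++".
            String candidate = raw.replaceAll("[.,;:!?]+$", "");
            if (SPECIAL_TERMS.contains(candidate)) {
                found.add(candidate);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // These values would go into the extra field; the full text is indexed as usual.
        System.out.println(extract("Senior C++ and C# developer, some .NET."));
    }
}
```

At query time the same extraction would have to be applied to the user's input, so that a search for "C++" is routed to the extra field.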

-h
----------------------------------------------------------------------
Hira, N.R.
Solutions Architect
Cognocys, Inc.
(773) 251-7453

On 24-Jun-2008, at 11:03 AM, John Byrne wrote:

I don't think there is a simpler way. I think you will have to modify the tokenizer. Once you go beyond basic human-readable text, you always end up having to do that. I have modified the JavaCC version of StandardTokenizer to allow symbols to pass through, but I've never used the JFlex version, and I don't know anything about JFlex, I'm afraid!

A good strategy might be to make a new type of lexical token called "SYMBOL" and try to catch as many symbols as you can think of; then maybe create new token types which are ALPHANUM types that can have pre-fixed or post-fixed symbols.

That way, you'll be able to catch things like "c++" in a TokenFilter, and you can choose to pass it through as a single token, or split it up into two tokens, or whatever you want.
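
As a rough illustration of that strategy, here is a minimal sketch in plain Java. It does not use the Lucene Tokenizer or TokenFilter classes and is not the StandardTokenizer grammar; the class, the token types, and the choice of "+" and "#" as the only symbols are assumptions made for the example. It handles post-fixed symbols only, so a pre-fixed case such as ".net" would need another rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SymbolAwareTokenizer {
    enum Type { ALPHANUM, ALPHANUM_WITH_SYMBOL, SYMBOL }

    static final class Token {
        final String word;     // alphanumeric part, may be empty for SYMBOL
        final String symbols;  // trailing symbols, may be empty for ALPHANUM
        final Type type;

        Token(String word, String symbols, Type type) {
            this.word = word;
            this.symbols = symbols;
            this.type = type;
        }
    }

    // Either letters/digits followed by optional symbols, or a run of symbols alone.
    private static final Pattern TOKEN =
            Pattern.compile("([\\p{L}\\p{N}]+)([+#]*)|([+#]+)");

    // Tokenizer stage: classify each match into one of the three token types.
    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            if (m.group(3) != null) {
                tokens.add(new Token("", m.group(3), Type.SYMBOL));
            } else {
                String word = m.group(1).toLowerCase(Locale.ROOT);
                String symbols = m.group(2);
                Type type = symbols.isEmpty() ? Type.ALPHANUM : Type.ALPHANUM_WITH_SYMBOL;
                tokens.add(new Token(word, symbols, type));
            }
        }
        return tokens;
    }

    // Filter stage: pass "c++" through whole, or split it into "c" and "++".
    static List<String> filter(List<Token> tokens, boolean split) {
        List<String> out = new ArrayList<>();
        for (Token t : tokens) {
            if (split && t.type == Type.ALPHANUM_WITH_SYMBOL) {
                out.add(t.word);
                out.add(t.symbols);
            } else {
                out.add(t.word + t.symbols);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Token> tokens = tokenize("I use C++ and C# daily");
        System.out.println(filter(tokens, false));
        System.out.println(filter(tokens, true));
    }
}
```

Keeping the classification in the tokenizer and the keep-or-split decision in a later stage mirrors the tokenizer/TokenFilter division described above.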

Hope that helps.

Regards,
JB

Alex Soto wrote:
Hello:

I have a problem where I need to search for the term "C++".
If I use StandardAnalyzer, the "+" characters are removed and the
search is done on just the "c" character, which is not what is
intended.
Yet, I need to use standard analyzer for the other benefits it provides.

I think I need to write a specialized tokenizer (and accompanying
analyzer) that lets the "+" characters pass.
I would use the JFlex provided one, modify it and add it to my project.

My question is:

Is there any simpler way to accomplish the same?


Best regards,
Alex Soto
[EMAIL PROTECTED]

-
Amicus Plato, sed magis amica veritas.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]









