This isn't ideal, but if you have a defined list of such terms, you
may find it easier to filter these terms out into a separate field
for indexing.
-h
----------------------------------------------------------------------
Hira, N.R.
Solutions Architect
Cognocys, Inc.
(773) 251-7453
On 24-Jun-2008, at 11:03 AM, John Byrne wrote:
I don't think there is a simpler way. I think you will have to
modify the tokenizer. Once you go beyond basic human-readable text,
you always end up having to do that. I have modified the JavaCC
version of StandardTokenizer for allowing symbols to pass through,
but I've never used the JFlex version - don't know anything about
JFlex I'm afraid!
A good strategy might be to make a new type of lexical token called
"SYMBOL" and try to catch as many symbols as you can think of; then
maybe create new token types which are ALPHANUM types that can have
pre-fixed or post-fixed symbols.
That way, you'll be able to catch things like "c++" in a
TokenFilter, and you can choose to pass it through as a single
token, or split it up into two tokens, or whatever you want.
Hope that helps.
Regards,
JB
Alex Soto wrote:
Hello:
I have a problem where I need to search for the term "C++".
If I use StandardAnalyzer, the "+" characters are removed and the
search is done on just the "c" character which is not what is
intended.
Yet, I need to use standard analyzer for the other benefits it
provides.
I think I need to write a specialized tokenizer (and accompanying
analyzer) that let the "+" characters pass.
I would use the JFlex provided one, modify it and add it to my
project.
My question is:
Is there any simpler way to accomplish the same?
Best regards,
Alex Soto
[EMAIL PROTECTED]
-
Amicus Plato, sed magis amica veritas.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]