Thanks everyone. I appreciate the help. I think I will write my own tokenizer, because I do not have a predefined list of words with symbols. I will modify the grammar by defining a SYMBOL token as John suggested and redefine ALPHANUM to include it.
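A minimal sketch of the SYMBOL idea, written in plain Java rather than as a change to the JFlex grammar. The class name SymbolTokenizerSketch, the set of symbol characters ("+" and "#"), and the rule that symbols attach only when they end a word are all assumptions for illustration; this is not the StandardTokenizer code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SymbolTokenizerSketch {

    // Assumption: only these characters count as SYMBOL; extend as needed.
    private static final String SYMBOL_CHARS = "+#";

    private static boolean isSymbol(char c) {
        return SYMBOL_CHARS.indexOf(c) >= 0;
    }

    /**
     * Splits text into lower-cased tokens. A token is a run of letters and
     * digits (the ALPHANUM part), optionally followed by post-fixed symbols.
     */
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int n = text.length();
        int i = 0;
        while (i < n) {
            // Skip anything that cannot start a token.
            if (!Character.isLetterOrDigit(text.charAt(i))) {
                i++;
                continue;
            }
            int start = i;
            // ALPHANUM part: letters and digits.
            while (i < n && Character.isLetterOrDigit(text.charAt(i))) {
                i++;
            }
            // Look ahead over a possible post-fixed SYMBOL run, e.g. the "++" in "c++".
            int j = i;
            while (j < n && isSymbol(text.charAt(j))) {
                j++;
            }
            // Attach the symbols only if they end the word, so "a+b" still splits.
            if (j == n || !Character.isLetterOrDigit(text.charAt(j))) {
                i = j;
            }
            tokens.add(text.substring(start, i).toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [i, use, c++, and, c#, daily, not, c]
        System.out.println(tokenize("I use C++ and C# daily, not C."));
    }
}
```

In the real grammar, the same rule would be expressed as a token type rather than a hand-written loop, and a TokenFilter could then decide whether to keep "c++" as one token or split it.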
Regards,
Alex Soto

On Tue, Jun 24, 2008 at 12:12 PM, N. Hira <[EMAIL PROTECTED]> wrote:
> This isn't ideal, but if you have a defined list of such terms, you may find
> it easier to filter these terms out into a separate field for indexing.
>
> -h
> ----------------------------------------------------------------------
> Hira, N.R.
> Solutions Architect
> Cognocys, Inc.
> (773) 251-7453
>
> On 24-Jun-2008, at 11:03 AM, John Byrne wrote:
>
>> I don't think there is a simpler way. I think you will have to modify the
>> tokenizer. Once you go beyond basic human-readable text, you always end up
>> having to do that. I have modified the JavaCC version of StandardTokenizer
>> for allowing symbols to pass through, but I've never used the JFlex version
>> - don't know anything about JFlex I'm afraid!
>>
>> A good strategy might be to make a new type of lexical token called
>> "SYMBOL" and try to catch as many symbols as you can think of; then maybe
>> create new token types which are ALPHANUM types that can have pre-fixed or
>> post-fixed symbols.
>>
>> That way, you'll be able to catch things like "c++" in a TokenFilter, and
>> you can choose to pass it through as a single token, or split it up into two
>> tokens, or whatever you want.
>>
>> Hope that helps.
>>
>> Regards,
>> JB
>>
>> Alex Soto wrote:
>>>
>>> Hello:
>>>
>>> I have a problem where I need to search for the term "C++".
>>> If I use StandardAnalyzer, the "+" characters are removed and the
>>> search is done on just the "c" character which is not what is
>>> intended.
>>> Yet, I need to use standard analyzer for the other benefits it provides.
>>>
>>> I think I need to write a specialized tokenizer (and accompanying
>>> analyzer) that let the "+" characters pass.
>>> I would use the JFlex provided one, modify it and add it to my project.
>>>
>>> My question is:
>>>
>>> Is there any simpler way to accomplish the same?
>>>
>>> Best regards,
>>> Alex Soto
>>> [EMAIL PROTECTED]
>>>
>>> -
>>> Amicus Plato, sed magis amica veritas.
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>>> For additional commands, e-mail: [EMAIL PROTECTED]