Dmitry Goldenberg wrote:

Hi,
I'm trying to figure out a way to locate tokens that include special characters. The actual text in the file being indexed is something like "function() { statement1; statement2; }". The query I'm using is "function\()", since I want to locate precisely "function()". The query succeeds, but what it finds is actually "function", not "function()". If I run the same query against "function { statement1, statement2 }", it still succeeds and I get "function" in best fragments. How can I enforce that the () be included?

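I think you're going to have to write your own Analyzer subclass that keeps special characters in the terms, and then use that Analyzer during indexing. Most of the included Analyzers (StandardAnalyzer, for example) drop parentheses and the like, so "function()" ends up indexed as the bare term "function".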
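To make the difference concrete, here is a rough sketch in plain Java. It does not use Lucene's Analyzer API at all; the class TokenizerSketch and both of its methods are hypothetical names invented for illustration. One method mimics a tokenizer that throws away every non-letter character, the other splits on whitespace only and therefore keeps the parentheses. A real custom Analyzer would need to be written against whatever Lucene version you are running.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: this is NOT Lucene code.
final class TokenizerSketch {

    // Mimics a letters-only tokenizer: any non-letter character ends the
    // current token and is discarded, so "function()" becomes "function".
    static List<String> lettersOnly(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Splits on whitespace only, so punctuation stays inside the tokens
    // and "function()" survives as a single term.
    static List<String> whitespaceOnly(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String source = "function() { statement1; statement2; }";
        System.out.println(lettersOnly(source));
        System.out.println(whitespaceOnly(source));
    }
}
```

With the second approach the indexed term is literally "function()", so a document containing only "function { ... }" would no longer match a search for it. The trade-off is that trailing punctuation sticks to other terms too (for instance "statement1;"), which may or may not be what you want.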

If you're using Lucene's QueryParser, then use your new Analyzer there, too, and escape things like parentheses in the query text you submit to parse().
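As a sketch of the escaping step, the plain-Java helper below puts a backslash in front of each character that the query syntax treats as special. QueryEscapeSketch and its escape method are hypothetical names, and the character list is my assumption about the query syntax, so check it against the documentation for your Lucene version. I believe QueryParser also ships a static escape(String) helper that does much the same job; if your version has it, prefer that.

```java
// Hypothetical illustration only: this is NOT Lucene code.
final class QueryEscapeSketch {

    // Characters assumed to be special in the query syntax.
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\/";

    // Returns a copy of raw with a backslash before every special character.
    static String escape(String raw) {
        StringBuilder out = new StringBuilder(raw.length() * 2);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Both parentheses get a backslash, not just the opening one.
        System.out.println(escape("function()"));
    }
}
```

Note that this escapes both the opening and the closing parenthesis, whereas the query in the original message escapes only the first.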

I think there's a discussion of custom Analyzers in the Lucene book, but I don't know where. Maybe somebody else on this list knows?

Good luck!

--MDC
