Hey all,

I also posted this question to the JFlex list [1], as it seems a more appropriate place, but I thought I should raise it here as well.

I'm looking for ways to use Lucene's tokenizers while preserving some custom tokens defined by the user. For example, use StandardTokenizer but preserve C++, C# and i-phone as whole tokens. The gotcha is that I want that list to be loaded at runtime rather than compiled into the tokenizer, mainly because it will change over time.

The problem is that there is currently no real way of doing this. Although I have implemented this myself, JFlex doesn't seem to support it (other than by defining new macros, regenerating the Java pieces, recompiling, etc.).

I discussed this with Rob Muir a couple of months back and he seemed interested. I would be happy to hear whether there's interest in pursuing this, or to get any new ideas on how to enable it more easily, on the JFlex layer or otherwise. I'll be happy to take this on, but every approach I'm looking at currently has some significant flaws.

Cheers,

[1] http://sourceforge.net/p/jflex/mailman/jflex-users/?viewmonth=201411

--
Itamar Syn-Hershko
http://code972.com | @synhershko <https://twitter.com/synhershko>
Freelance Developer & Consultant
Author of RavenDB in Action <http://manning.com/synhershko/>
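
P.S. To make the desired behaviour concrete, here is a rough sketch in plain Java. It is not Lucene or JFlex code and not a proposed implementation: the class name ProtectedTermTokenizer and its tokenize method are made up for illustration, and the splitting rule (break on anything that is not a letter or digit) is only a crude stand-in for what StandardTokenizer really does. It just shows a term list supplied at runtime overriding the normal splitting.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Lucene or JFlex.
public class ProtectedTermTokenizer {

    // Splits text on non-alphanumeric characters, but emits any protected
    // term (supplied at runtime) as a single token. Matching is
    // case-insensitive and only attempted at the start of a word.
    public static List<String> tokenize(String text, Iterable<String> protectedTerms) {
        // Try longer terms first so the longest protected term wins.
        List<String> terms = new ArrayList<>();
        for (String t : protectedTerms) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        terms.sort((a, b) -> Integer.compare(b.length(), a.length()));

        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            String match = null;
            if (current.length() == 0) {
                for (String term : terms) {
                    // The term must match here and be followed by a word boundary.
                    if (text.regionMatches(true, i, term, 0, term.length())
                            && isBoundary(text, i + term.length())) {
                        match = text.substring(i, i + term.length());
                        break;
                    }
                }
            }
            if (match != null) {
                tokens.add(match);
                i += match.length();
            } else {
                char c = text.charAt(i);
                if (Character.isLetterOrDigit(c)) {
                    current.append(c);
                } else if (current.length() > 0) {
                    // A separator ends the token being accumulated.
                    tokens.add(current.toString());
                    current.setLength(0);
                }
                i++;
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // True at end of input or when the next character cannot continue a word.
    private static boolean isBoundary(String text, int pos) {
        return pos >= text.length() || !Character.isLetterOrDigit(text.charAt(pos));
    }
}
```

With the list {"C++", "C#", "i-phone"}, the input "I use C++ and C# on my i-phone." comes out with C++, C# and i-phone intact; with an empty list, the same code breaks them apart at the punctuation. The hard part, and the reason for this email, is getting the same effect inside a JFlex-generated scanner without regenerating it.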