Generating a modified StandardTokenizerImpl ...

James Crowley Wed, 03 Feb 2010 15:00:38 -0800

Hey guys,

Hoping you can help me with this! I'm looking to get Lucene to pay attention
to keywords like C#, .NET and C++, but still need the benefits that the
StandardTokenizer brings (as opposed to the more basic WhitespaceTokenizer
and suchlike).


From reading the various previous discussions, I think my best bet is to
modify the tokenizer itself. However, I'm not sure what the best way to do
this is, given that its definition is specified in a JFlex file, which when
re-generated will produce Java code that I'd then have to port again. Have
you guys had a nicer process for this when porting to .NET, or did you just
manually convert the StandardTokenizerImpl?

Am I going to be better off starting from scratch with another tool like
ANTLR? (I'm relatively inexperienced in creating my own grammars, so I'm not
sure how easy it will be to rewrite the original JFlex grammar in ANTLR
either.)

Many thanks in advance,

James
