Hey guys,

Hoping you can help me with this! I'm looking to get Lucene to pay attention to keywords like C#, .NET and C++, but I still need the benefits that the StandardTokenizer brings (as opposed to more basic tokenizers such as the WhitespaceTokenizer).
From reading the various previous discussions, I think my best bet is to modify the tokenizer itself. However, I'm not sure what the best way to do this is, given that its definition is specified in a JFlex file, which when regenerated will produce Java code that I'd then have to port again.

Have you guys had a nicer process for this when porting to .NET, or did you just manually convert the StandardTokenizerImpl? Am I going to be better off starting from scratch with another tool like ANTLR? (I'm relatively inexperienced in creating my own grammars, so I'm not sure how easy it would be to rewrite the original JFlex grammar in ANTLR either.)

Many thanks in advance,
James