On 18/10/2011 06:19, Steven A Rowe wrote:
Hi Paul,
You could add a rule to the StandardTokenizer JFlex grammar to handle
this case, bypassing its other rules.
Hmm, I don't really understand JFlex. That is a possibility, but I would
prefer to do it in Java code unless JFlex is easy to use.
Another option is to create a char filter that substitutes
PUNCT-EXCLAMATION for exclamation points, PUNCT-PERIOD for periods, etc.,
Yes, that is how I first did it
but only when the entire input consists exclusively of whitespace and
punctuation.
but I couldn't work out how to do it only when the input is exclusively
whitespace and punctuation. Any ideas on how to solve that?
These symbols would then be left intact by StandardTokenizer.
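One way to approach the "only when exclusively whitespace and punctuation"
condition is to buffer the whole input before deciding, since a filter that
sees the text one character at a time cannot know what comes later. The
following is a minimal sketch in plain java.io, not an actual Lucene
CharFilter subclass. The class name, the method names, and the rule that
"punctuation" means any character that is neither whitespace nor a letter or
digit are all assumptions made for illustration. Only the two substitutions
named above are shown, and offset correction is omitted.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper, not part of Lucene.
final class PunctuationOnlySubstituter {

    // True if the text contains at least one non-whitespace character and
    // no letters or digits. Assumption: everything that is neither
    // whitespace nor a letter/digit counts as punctuation.
    static boolean isOnlyWhitespaceAndPunctuation(String text) {
        boolean sawPunctuation = false;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.isLetterOrDigit(cp)) {
                return false;
            }
            if (!Character.isWhitespace(cp)) {
                sawPunctuation = true;
            }
            i += Character.charCount(cp);
        }
        return sawPunctuation;
    }

    // Replaces '!' and '.' with symbolic names, padded with spaces so that
    // adjacent symbols stay separate. Any other input is returned unchanged.
    static String substitute(String text) {
        if (!isOnlyWhitespaceAndPunctuation(text)) {
            return text;
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '!': out.append(" PUNCT-EXCLAMATION "); break;
                case '.': out.append(" PUNCT-PERIOD "); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    // Reads the entire input into memory, then returns a Reader over the
    // (possibly substituted) text for the tokenizer to consume.
    static Reader wrap(Reader input) throws IOException {
        StringBuilder buffer = new StringBuilder();
        char[] chunk = new char[1024];
        int n;
        while ((n = input.read(chunk)) != -1) {
            buffer.append(chunk, 0, n);
        }
        return new StringReader(substitute(buffer.toString()));
    }
}
```

The Reader returned by wrap() would be handed to the tokenizer in place of
the original one. Buffering the whole value is probably acceptable for short
fields, but it may be too costly for very large documents.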
Steve
Paul