On 18/10/2011 06:19, Steven A Rowe wrote:
Hi Paul,

You could add a rule to the StandardTokenizer JFlex grammar to handle this case, bypassing its other rules.
Hmm, I don't really understand JFlex. That is a possibility, but I would prefer to do it in Java code unless JFlex is easy to use.
Another option is to create a char filter that substitutes PUNCT-EXCLAMATION for exclamation points, PUNCT-PERIOD for periods, etc.,

Yes, that is how I first did it,
but only when the entire input consists exclusively of whitespace and punctuation.

but I couldn't work out how to do it only when the input is exclusively whitespace and punctuation. Any ideas on how to solve that?
  These symbols would then be left intact by StandardTokenizer.
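
For illustration, the "only when the entire input is whitespace and punctuation" check could be sketched in plain Java roughly as below. This is a minimal sketch under stated assumptions, not code from Lucene or from either poster: the class name PunctuationOnlySubstituter, its method names, and the PUNCT-QUESTION and PUNCT-COMMA token names are hypothetical (only PUNCT-EXCLAMATION and PUNCT-PERIOD appear in the thread). It also assumes the whole field value is available as a String, so a real char filter would first have to read its entire input before deciding whether to substitute.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not part of Lucene. */
final class PunctuationOnlySubstituter {

    // Illustrative mapping from punctuation characters to placeholder words.
    private static final Map<Character, String> NAMES = new HashMap<>();
    static {
        NAMES.put('!', "PUNCT-EXCLAMATION"); // name taken from the thread
        NAMES.put('.', "PUNCT-PERIOD");      // name taken from the thread
        NAMES.put('?', "PUNCT-QUESTION");    // assumed name
        NAMES.put(',', "PUNCT-COMMA");       // assumed name
    }

    private PunctuationOnlySubstituter() {}

    /**
     * Returns true if the input has at least one mapped punctuation
     * character and otherwise contains nothing but whitespace.
     */
    static boolean isOnlyWhitespaceAndPunctuation(String input) {
        boolean sawPunctuation = false;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (NAMES.containsKey(c)) {
                sawPunctuation = true;
            } else if (!Character.isWhitespace(c)) {
                return false; // a letter, digit, or unmapped symbol: leave input alone
            }
        }
        return sawPunctuation;
    }

    /**
     * Substitutes placeholder words for punctuation, but only when the
     * whole input passes the check above; otherwise returns it unchanged.
     */
    static String substitute(String input) {
        if (!isOnlyWhitespaceAndPunctuation(input)) {
            return input;
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            String name = NAMES.get(c);
            if (name != null) {
                // Pad with spaces so adjacent symbols become separate words.
                out.append(' ').append(name).append(' ');
            } else {
                out.append(c); // keep whitespace as-is
            }
        }
        return out.toString();
    }
}
```

With this sketch, substitute("Hello!") returns the input untouched because it contains letters, while an input such as "! ." is rewritten into placeholder words separated by whitespace. The key design choice is that the decision needs the complete input, so the check runs once over the whole string before any substitution happens.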

Steve

Paul

