On 1/19/12 6:52 PM, Marvin Humphrey wrote:
> It's rare that we need to optimize for performance. Most of the time we
> should be optimizing for maintainability.
+1
> I suspect that at some point we will want to expose sentence boundary
> detection via a public API, because people who subclass Highlighter may want
> to use it.
+1 here too.
I have been putting some work into sentence boundary detection in
Search::Tools, and I would love to see some thinking amongst the bright
people here about how best to do it.
> It seems to me that publishing UAX #29 sentence boundary detection via an
> Analyzer is a conservative API extension, since it's so closely related to the
> UAX #29 word boundary detection we expose via StandardTokenizer.
> So that explains what I was thinking. But of course refactoring sentence
> boundary detection into a string utility function also achieves the end of
> cleaning up Highlighter.c just as effectively, and might be more elegant --
> who knows?
> Until we actually expose this capability via a public API, either approach
> should work fine.
Agreed here too.
--
Peter Karman . http://peknet.com/