On 1/19/12 6:52 PM, Marvin Humphrey wrote:

> It's rare that we need to optimize for performance.  Most of the time we
> should be optimizing for maintainability.

+1

> I suspect that at some point we will want to expose sentence boundary
> detection via a public API, because people who subclass Highlighter may want
> to use it.

+1 here too.

I have been putting some work into sentence boundary detection in Search::Tools, and I would love to see some thinking amongst the bright people here about how best to do it.


> It seems to me that publishing UAX #29 sentence boundary detection via an
> Analyzer is a conservative API extension, since it's so closely related to the
> UAX #29 word boundary detection we expose via StandardTokenizer.

> So that explains what I was thinking.  But of course refactoring sentence
> boundary detection into a string utility function also achieves the end of
> cleaning up Highlighter.c just as effectively, and might be more elegant --
> who knows?

> Until we actually expose this capability via a public API, either approach
> should work fine.

Agreed here too.



--
Peter Karman  .  http://peknet.com/  .  [email protected]
