Wrong repost; it should have been:

Hi,

> > Matthew Strawbridge wrote:
> > > I agree that determining the ends of sentences is non-trivial.
> > > However, I think that this is a good reason to do it once in OOo
> > > instead of each grammar checker having to figure it out manually. OOo
> > > already maintains a list of abbreviations (ending with .), so
> > > presumably this could be used. If the user adds custom abbreviations,
> > > these would then automatically be picked up by the sentence splitter,
> > > which wouldn't happen if each grammar checker implemented its own.
> >
> > Maybe this is true for English, but not for Polish. I believe that most
> > languages are not covered as of now. Moreover, in some languages,
> > segmentation cannot simply be punctuation-based (Asian languages like
> > Japanese are very hard to segment meaningfully). In the future, it would
> > be ideal to implement the SRX standard, which is an emerging segmentation
> > standard in the translation industry. For the specification, see
> > http://www.lisa.org/standards/srx/
> >
> > Now, SRX could be implemented on the grammar checker level or on the
> > OOo level; using SRX would help grammar checker developers get exactly
> > what they want. So if you want OOo-level segmentation, the only option
> > is to start implementing SRX, which would include the abbreviations we
> > already have. Otherwise, this mechanism would still be bad for languages
> > with quite different punctuation schemes.
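
To make the trade-off concrete, here is a minimal, purely illustrative sketch (in Python) of the kind of punctuation-based, abbreviation-aware splitter discussed above. The function name and the abbreviation set are hypothetical; this is not OOo's actual abbreviation list or code. It shows both sides of the argument: a user-added abbreviation is picked up automatically, but text that ends sentences with different punctuation (Japanese "。") gets no boundaries at all.

```python
import re

# Hypothetical abbreviation list for illustration only; the idea in the
# thread is to reuse OOo's existing, user-extensible abbreviation list.
DEFAULT_ABBREVIATIONS = {"e.g.", "i.e.", "etc.", "Dr.", "Mr."}

# A run of sentence-final punctuation followed by whitespace or end of text.
SENTENCE_END = re.compile(r"[.!?]+(?=\s|$)")

def split_sentences(text, abbreviations=DEFAULT_ABBREVIATIONS):
    """Split text after '.', '!' or '?' unless the word ending there
    is a known abbreviation."""
    sentences = []
    start = 0
    for match in SENTENCE_END.finditer(text):
        end = match.end()
        # The whitespace-delimited token that ends at this mark, e.g. "Dr."
        last_token = text[start:end].split()[-1]
        if last_token in abbreviations:
            continue  # abbreviation, not a sentence boundary
        sentences.append(text[start:end].strip())
        start = end
    rest = text[start:].strip()
    if rest:
        sentences.append(rest)
    return sentences
```

For example, `split_sentences("See Fig. 3.")` yields `["See Fig.", "3."]`, while passing `DEFAULT_ABBREVIATIONS | {"Fig."}` yields `["See Fig. 3."]`; `split_sentences("今日は晴れ。明日は雨。")` returns the whole string as one "sentence".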

So how about my idea of obtaining a suggested end of sentence from the
i18n breakiterator and providing it in the API call to the grammar
checker?
Later on, the breakiterator could be required to implement the SRX
standard (or a grammar checker could implement it by itself).
I think this allows for maximum flexibility, since the grammar checker
can ignore the suggestion.
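
A minimal sketch of that interaction, in Python for brevity. All names here (`suggest_sentence_end`, `GrammarChecker.check`, `run_checker`) are hypothetical and are not the actual OOo/UNO API; `suggest_sentence_end` is only a naive stand-in for the i18n breakiterator. The point is that the checker receives the suggested end as a hint and reports back the end it actually used, so it is free to ignore the hint.

```python
import re
from typing import List, Protocol

# Naive stand-in for the i18n breakiterator (hypothetical helper):
# offset just past the first '.', '!' or '?' at or after `start` that is
# followed by whitespace or end of text; the paragraph end if none.
_END_MARK = re.compile(r"[.!?](?=\s|$)")

def suggest_sentence_end(paragraph: str, start: int) -> int:
    match = _END_MARK.search(paragraph, start)
    return match.end() if match else len(paragraph)

class GrammarChecker(Protocol):
    def check(self, paragraph: str, start: int, suggested_end: int) -> int:
        """Check the sentence starting at `start`; return the end offset
        the checker actually used (it may differ from `suggested_end`)."""
        ...

class TrustingChecker:
    """Simply accepts the suggested end of sentence."""
    def check(self, paragraph: str, start: int, suggested_end: int) -> int:
        return suggested_end

class OwnSegmentationChecker:
    """Ignores the suggestion and applies its own rule; here, purely for
    illustration, only the Japanese full stop '。' ends a sentence."""
    def check(self, paragraph: str, start: int, suggested_end: int) -> int:
        pos = paragraph.find("。", start)
        return pos + 1 if pos != -1 else len(paragraph)

def run_checker(checker: GrammarChecker, paragraph: str) -> List[str]:
    """Drive a checker over a paragraph, one sentence at a time."""
    sentences = []
    start = 0
    while start < len(paragraph):
        suggested = suggest_sentence_end(paragraph, start)
        end = checker.check(paragraph, start, suggested)
        end = max(end, start + 1)  # guard against a checker that makes no progress
        sentence = paragraph[start:end].strip()
        if sentence:
            sentences.append(sentence)
        start = end
    return sentences
```

Having `check` return the end it used (an assumption of this sketch) lets the caller continue from wherever the checker stopped, so a checker with its own segmentation, for instance an SRX-based one, works without any change to the caller.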


Regards,
Thomas