Re: [lingu-dev] About grammar checker

Thomas Lange Wed, 27 Jul 2005 02:56:38 -0700


Hi all,

One problem is that you often don't know where a sentence starts and ends. Just looking for "." is not enough (e.g. because of abbreviations that end with a dot, and these differ between languages). So the API shouldn't know about sentences and needs to leave the sentence boundary detection to the grammar checker.
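
To illustrate why splitting on "." alone is not enough, here is a
minimal Python sketch. The abbreviation list is a small, hypothetical,
English-only sample; every language would need its own list, which is
exactly the problem described above.

```python
import re

# Hypothetical, English-only sample; other languages need their own lists.
ENGLISH_ABBREVIATIONS = {"e.g.", "i.e.", "etc.", "Dr.", "Mr."}


def naive_split(text):
    """Split after every '.' that is followed by whitespace."""
    return [s for s in re.split(r"(?<=\.)\s+", text) if s]


def abbreviation_aware_split(text, abbreviations):
    """Split after a token ending in '.', unless that token is a known abbreviation."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        if token.endswith(".") and token not in abbreviations:
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences


text = "Ask Dr. Smith first. He knows e.g. the rules."
print(naive_split(text))
# ['Ask Dr.', 'Smith first.', 'He knows e.g.', 'the rules.']
print(abbreviation_aware_split(text, ENGLISH_ABBREVIATIONS))
# ['Ask Dr. Smith first.', 'He knows e.g. the rules.']
```

Even the second version only works as long as the list matches the
language of the text.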


I wonder if there should be an option that states whether the grammar
checker should do end-of-sentence detection on its own.
Viewed from the other side, it seems a good idea to have a way for the
grammar checker to state whether it wants to do that on its own (and
thus, of course, is capable of it) or not.
In the latter case the application probably has to use the break
iterator to retrieve the sentences and pass them on to the grammar
checker.
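
A minimal Python sketch of such a capability flag might look like this.
All names (`handles_sentence_boundaries`, `check_paragraph`,
`split_sentences`, `RecordingChecker`) are made up for illustration and
are not part of any existing OOo API; `split_sentences` only stands in
for the application's break iterator.

```python
import re
from abc import ABC, abstractmethod


class GrammarChecker(ABC):
    # Hypothetical capability flag: True means "give me whole paragraphs,
    # I find the sentence boundaries myself".
    handles_sentence_boundaries = False

    @abstractmethod
    def check(self, text):
        """Return a list of error descriptions found in `text`."""


def split_sentences(paragraph):
    # Deliberately simple stand-in for the application's break iterator.
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]


def check_paragraph(checker, paragraph):
    """Let the checker split the text if it says it can; otherwise split it for the checker."""
    if checker.handles_sentence_boundaries:
        return checker.check(paragraph)
    errors = []
    for sentence in split_sentences(paragraph):
        errors.extend(checker.check(sentence))
    return errors


class RecordingChecker(GrammarChecker):
    """Toy checker that reports no errors and just records what it was given."""

    def __init__(self, handles_boundaries):
        self.handles_sentence_boundaries = handles_boundaries
        self.seen = []

    def check(self, text):
        self.seen.append(text)
        return []
```

With `RecordingChecker(False)` the paragraph "One. Two." arrives as two
separate calls; with `RecordingChecker(True)` it arrives in one piece.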

The problem I have in mind when the grammar checker is required to do
the end-of-sentence detection is that the sentence may contain words
from more than one language, and thus the checker will have trouble
finding the end.
And mixed-language sentences are quite common.
AFAIK the break iterator will probably never be as good at this as a
specialised grammar checker (simply because it is required to be quick
and thus is rather simple), but it should always use the context of the
word and its language to do so.

Just for example, consider a dictionary or schoolbook that teaches
Chinese: the sentence as a whole may be considered English, but some
words will be Chinese. If the last word in the sentence is Chinese,
does the English grammar checker have a chance to detect the end of
the sentence? The trouble may even be as simple as the letters being
used not coming from any Western character set.

Because of that I'm not sure one can rely on the grammar checker alone
for this.

So if the caller has to call grammar() more than once, we probably
need to remember some of the state information, such as previous
errors/corrections or the position of the error, to make this more
efficient.



I think having this kind of state in the API would be bad design.


Of course it should not be reflected in the API; at most, something
like the position in the sentence to start at should reflect it.

But if you don't have a complex API that returns all errors at once,
you may have one that states up to which position the sentence has
been handled (probably where the first error is). If that is the 20th
word of a 25-word sentence, it would be nice not to have to parse the
sentence from the start again after that word has been fixed. Maybe it
is possible to keep some of the data that was built while parsing the
sentence.
At least this might be possible if the start for the next call to the
grammar checker is the position where it stopped in the previous call.

Of course this is purely an implementation matter and thus specific
to the grammar checker being developed.

But in order to at least allow such grammar checkers to be
implemented, there must be a means for it in the API. If the
implementor finds it too troublesome or not useful for his
implementation, he may just ignore that value and always start from
the beginning.
But if the API does not allow for such a thing, we may already be
limiting future implementations just by the way we define the API.
That's what is on my mind here.
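
To make the start-position idea concrete, here is a small Python
sketch under my own assumptions: `check(sentence, start)` returns the
first problem found at or after `start`, and the offset of that problem
doubles as "handled up to here". The repeated-word rule, `Finding`,
`RepeatedWordChecker` and `fix_all` are all hypothetical. The `else`
branch shows that a checker which ignores `start` and rescans from the
beginning still fits the same API.

```python
import re
from dataclasses import dataclass

WORD = re.compile(r"\w+")


@dataclass
class Finding:
    start: int  # sentence is known to be fine up to here; resume from this offset
    end: int    # end of the problem span (here: the duplicated word)


class RepeatedWordChecker:
    """Toy checker (hypothetical) that flags an immediately repeated word."""

    def __init__(self):
        self._cached_prefix = ""      # sentence[:start] as it was when we last stopped
        self._cached_previous = None  # last word seen before that offset
        self.words_examined = 0       # only here to make the saving visible

    def check(self, sentence, start=0):
        # Reuse the saved state only if the text before `start` is unchanged;
        # otherwise fall back to scanning from the beginning, which is always legal.
        if start > 0 and sentence[:start] == self._cached_prefix:
            previous, prev_end = self._cached_previous, start
        else:
            start, previous, prev_end = 0, None, 0
        for match in WORD.finditer(sentence, start):
            self.words_examined += 1
            word = match.group()
            if previous is not None and word.lower() == previous.lower():
                # Keep what we already know so the next call can skip the prefix.
                self._cached_prefix = sentence[:prev_end]
                self._cached_previous = previous
                return Finding(prev_end, match.end())
            previous, prev_end = word, match.end()
        return None


def fix_all(checker, sentence, resume=True):
    """Caller loop: apply each suggested fix, then continue where the checker stopped."""
    start = 0
    while (finding := checker.check(sentence, start)) is not None:
        sentence = sentence[:finding.start] + sentence[finding.end:]  # drop the duplicate
        start = finding.start if resume else 0
    return sentence
```

For "It is the the best of of all" both `fix_all(c, s)` and
`fix_all(c, s, resume=False)` return "It is the best of all", but the
resuming variant examines 8 words instead of 16.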

Be aware that my points are purely theoretical, since I do not have
any knowledge of grammar checkers. I just want to point out some
issues that come to mind when thinking about it.
So if you tell me my points are nonsense, I can't object. ;-)

Also there are a few user options I can think of that can affect the
API, such as custom dictionaries and whether or not to check quoted
sentences (should this one just affect the caller?).
Each grammar checker will have its own set of settings and it will be rather complex. So hard-coding settings in OOo doesn't seem to make sense anyway.


That's true.
But the problem with each grammar checker having its own set of
options is that none of them can be configured by the user, at least
not via the UI, since one does not know which ones the specific
implementation will have.
And I see no way for the implementation to present a dialog that
allows the user to set all those options as he likes.

So what would be nice is to have a set of options that is common to all implementations (but that's probably too optimistic).

So the next best thing I can think of is to at least define a set of
options that is useful for the most commonly used languages and make
it available. Options that do not apply can then be ignored.
But at least we would have a set of options and could write a UI for
them that allows the user to modify them.
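
Such a shared option set could be sketched in Python roughly as
follows. `CommonGrammarOptions`, `option_names_for_ui` and
`ExampleChecker` are hypothetical names; the two options are simply the
ones mentioned earlier in this thread (quoted sentences, custom
dictionaries).

```python
from dataclasses import dataclass, field, fields


@dataclass
class CommonGrammarOptions:
    """Hypothetical shared option set; both fields are options named in this thread."""
    check_quoted_sentences: bool = True
    custom_dictionaries: list = field(default_factory=list)


def option_names_for_ui():
    """The UI can list every common option without knowing any implementation."""
    return [f.name for f in fields(CommonGrammarOptions)]


class ExampleChecker:
    """Hypothetical checker that understands only one of the common options."""
    SUPPORTED = {"check_quoted_sentences"}

    def __init__(self):
        self.settings = {}

    def configure(self, options):
        # Keep the options this implementation understands; silently ignore the rest.
        self.settings = {
            f.name: getattr(options, f.name)
            for f in fields(options)
            if f.name in self.SUPPORTED
        }
        return self.settings
```

The UI is written once against the common set, and each
implementation picks out only what it can use.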


Regards,
Thomas



