[lingu-dev] About grammar checker

Keli Hu Thu, 21 Jul 2005 07:23:54 -0700

Hi guys,

I'm now working on a grammar checker for OOo, as an accepted project
in the Google Summer of Code program. Thomas Lange is my mentor for
this project. Since I'm a student and have no previous experience with
open source projects, I rely largely on him for OOo-specific
information, and we have exchanged a couple of mails discussing
various aspects of lingucomponent. I'll try to post an excerpt of
those mails later. As a brief self-introduction: I'm Keli Hu, a
first-year PhD student majoring in Language Engineering. My mother
tongue is Chinese, so if I'm not being clear, please tell me and I'll
try to explain better :-)


Now an important issue on our agenda is the interface. I'm not really
familiar with interface design, so what I have in mind for the grammar
checker API is roughly modeled on the spell checker API:

Interface: XGrammarChecker

Methods: isValid(), grammar()

isValid() parameters:

a sentence, locale of the sentence, other properties

isValid() returns:

true if the sentence is grammatically correct in the specified
language, false otherwise.

grammar() parameters:

a sentence, locale of the sentence, other properties

grammar() returns:

NULL if the sentence is correct. Otherwise, an XGrammarSuggestions
object with the start/end position of an error in the sentence, the
reason for failure, and if possible, error correction proposals.
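
To make the proposal concrete, here is a minimal sketch in Python. It
is only an illustration, not the OOo API: the field names of
GrammarSuggestions, the RepeatedWordChecker class and its single
"repeated word" rule are assumptions made up for this example, and a
real checker would of course need real grammar rules.

```python
import re
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GrammarSuggestions:
    """One detected error (field names are hypothetical)."""
    start: int                      # offset of the first character of the error
    end: int                        # offset just past the last character
    reason: str                     # human-readable reason for failure
    proposals: List[str] = field(default_factory=list)


class XGrammarChecker(ABC):
    """The two methods from the proposal, written as an abstract class."""

    @abstractmethod
    def isValid(self, sentence: str, locale: str, properties=None) -> bool:
        ...

    @abstractmethod
    def grammar(self, sentence: str, locale: str,
                properties=None) -> Optional[GrammarSuggestions]:
        ...


class RepeatedWordChecker(XGrammarChecker):
    """Toy stand-in: the only 'grammar rule' is a doubled word."""

    _pattern = re.compile(r"\b(\w+)\s+\1\b")

    def isValid(self, sentence, locale, properties=None):
        return self.grammar(sentence, locale, properties) is None

    def grammar(self, sentence, locale, properties=None):
        match = self._pattern.search(sentence)
        if match is None:
            return None             # plays the role of NULL in the proposal
        return GrammarSuggestions(match.start(), match.end(),
                                  "repeated word", [match.group(1)])
```

A caller would first ask isValid() and only call grammar() when it
returns false, much like the spell checker is used today.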

There are other issues that we may need to take into consideration
for the interface design. For example, how do we deal with multiple
errors efficiently? Unlike a spelling error in a word, a sentence
might contain multiple grammar errors. This means that either the
caller calls grammar() multiple times, or we return a complex
structure that can hold all the errors/corrections. Returning a
complex structure with all the errors does not seem wise to me: once
the user notices a grammar error indication, he/she might edit the
sentence immediately, and it's quite possible that all the errors go
away, either because the user corrects the other errors as well or
because the detected errors are correlated, so that correcting one
also removes the rest. That would make returning all
errors/corrections almost pointless.
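
A small Python sketch of one way an edit can invalidate a batch
result. The helper names all_errors() and first_error() and the toy
"repeated word" rule are assumptions for this example only. After the
user fixes the first error, the stored position of the second error no
longer points at it, so the sentence has to be re-checked anyway.

```python
import re

# Toy rule standing in for a real grammar check: a doubled word.
_REPEAT = re.compile(r"\b(\w+)\s+\1\b")


def all_errors(sentence):
    """Option A: return (start, end, proposal) for every error at once."""
    return [(m.start(), m.end(), m.group(1))
            for m in _REPEAT.finditer(sentence)]


def first_error(sentence):
    """Option B: return only the first error, or None if there is none."""
    m = _REPEAT.search(sentence)
    return None if m is None else (m.start(), m.end(), m.group(1))


sentence = "I think that that it it works."
batch = all_errors(sentence)

# The user accepts the proposal for the first error right away.
start, end, proposal = batch[0]
edited = sentence[:start] + proposal + sentence[end:]

# The second entry of the batch was computed for the old text; its
# offsets now select a different piece of the edited sentence.
stale_start, stale_end, _ = batch[1]
stale_text = edited[stale_start:stale_end]

# Re-checking the edited sentence gives the up-to-date position.
fresh = first_error(edited)
```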

So if the caller has to call grammar() more than once, we probably
need to remember some state information, such as previous
errors/corrections or the position of the last error, to make this
more efficient. The question is: should the burden of remembering the
state fall on the caller side or on the grammar checker side? The
solution we choose here could make a difference in both the interface
and the implementation.
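
The two choices could look roughly like this in Python. Both classes,
the extra "start" parameter and the toy "repeated word" rule are
assumptions for illustration, not an existing API. In the first
variant the caller passes the offset to resume from; in the second the
checker remembers the last sentence and offset itself.

```python
import re

# Toy rule standing in for a real grammar check: a doubled word.
_REPEAT = re.compile(r"\b(\w+)\s+\1\b")


class CallerStateChecker:
    """State lives on the caller side: grammar() takes a start offset."""

    def grammar(self, sentence, locale, start=0):
        m = _REPEAT.search(sentence, start)
        return None if m is None else (m.start(), m.end())


class CheckerStateChecker:
    """State lives on the checker side: it resumes where it stopped."""

    def __init__(self):
        self._sentence = None
        self._offset = 0

    def grammar(self, sentence, locale):
        if sentence != self._sentence:
            # A new or edited sentence resets the remembered position.
            self._sentence = sentence
            self._offset = 0
        m = _REPEAT.search(sentence, self._offset)
        if m is None:
            self._offset = len(sentence)
            return None
        self._offset = m.end()
        return (m.start(), m.end())
```

With caller-side state the interface gains a parameter but the checker
stays stateless; with checker-side state the interface stays as
proposed, but the checker must detect when the sentence has changed.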

Also there are a few user options that I can think of that can affect
the API, such as custom dictionaries, and whether to check quoted
sentences or not (should this one just affect the caller?).
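
If the quoted-sentence option only affects the caller, it could be a
simple filter applied before the checker is called at all. A minimal
Python sketch, where the function should_check() and the option key
"check_quoted" are hypothetical names chosen for this example:

```python
def should_check(sentence, options):
    """Caller-side filter: skip fully quoted sentences if the option is off."""
    text = sentence.strip()
    # Only plain double quotes are recognized in this sketch.
    quoted = len(text) >= 2 and text[0] == '"' and text[-1] == '"'
    # Quoted sentences are checked by default (option missing means True).
    return options.get("check_quoted", True) or not quoted
```

In this variant the grammar checker never sees the option, so the API
itself would not need to change for it.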

Any comments or suggestions are welcome. Thanks!


Regards,

Keli

