Re: [lingu-dev] [SoC] Grammar checker API

Bruno Sant'Anna Sat, 27 May 2006 09:13:51 -0700

On 5/26/06, Thomas Lange <[EMAIL PROTECTED]> wrote:

Hello Bruno,

Well, first things first:
Congratulations on being accepted as one of the projects for the Google
Summer of Code! :-)

Hi Thomas,

Haha, thanks a lot. As I said before, I'll put a great effort into this project.
It will be a great experience for me.

> 1. Grammar Checker API, now:
>
> 1. It makes sense working with just one language now; so, foreign
> words in the text should be ignored.

From the API view agreed!

From the UI view I'm a bit unsure here. Since spell checking different
languages in one sentence currently works, it would look a bit like a
regression from the user's point of view if that text were just
skipped.

For the user interface, we can create something visual that shows the user a message like "Words in other languages are not being checked yet." I think that will certainly give users confidence.

> 2. The grammar checker should run in a different thread to not block
> OpenOffice.

You mean when grammar checking is done automatically (in the background
like automatic spell checking) only?

No, not just in the background. I was planning to implement both automatic checking and interactive checking, and we can create threads for both of them. My idea in creating threads is to not block OOo. When a user wants interactive checking it doesn't matter, but when automatic checking starts it must run in the background and the main process (OOo) must continue.

One thing I want to add here: I'm planning to implement both modes (automatic and interactive). When a user requests interactive checking (e.g. by clicking a "Check Grammar" button), the API provides the current text, I mean everything, not just a block. I think that is safe; it can be slow, but the user is prepared to wait since he asked for the check. In automatic checking, after every change to a paragraph the API sends it to the checker. I was also thinking about setting a time limit, for example 60 seconds. What do you think?
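A minimal sketch of the background check with a time limit, written in Python only for illustration. The names start_check, result_within and slow_checker are hypothetical and not part of any OOo API; the checker is assumed to be a plain function that takes a paragraph and returns a list of mistakes.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as CheckTimeout
import time

def start_check(executor, check, paragraph):
    """Submit the check to a worker thread and return at once with a Future."""
    return executor.submit(check, paragraph)

def result_within(future, time_limit_s):
    """Return the checker's result, or None if the time limit passes first."""
    try:
        return future.result(timeout=time_limit_s)
    except CheckTimeout:
        return None

def slow_checker(paragraph):
    """Hypothetical checker that takes too long for the limit used below."""
    time.sleep(0.3)
    return ["late result"]
```

The caller keeps running after start_check returns; it only waits (for at most the time limit) when it asks for the result. A real application would more likely be notified through a callback than wait at all.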

>    3. The grammar checker should be able to check inside table cells,
>       text headers and footers, enumerations and text boxes (Drawing
>       Objects).

Sure.
The question is should it be able to do so because it knows of the
existence of such objects and is able to retrieve/modify those on its
own?

I think in this case the rules change a bit. The safe method here is to check while the user is editing. Just an example: when a user stops typing for a period of 4 seconds, the API sends that little piece of text to the grammar checker.
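The "stops typing for 4 seconds" idea could be sketched as follows. This is only an illustration: IdleTrigger and its methods are hypothetical names, and timestamps are passed in explicitly (in seconds) so the logic does not depend on a real clock.

```python
class IdleTrigger:
    """Signals when no keystroke has arrived for `idle_s` seconds."""

    def __init__(self, idle_s=4.0):
        self.idle_s = idle_s
        self.last_keystroke = None  # time of the latest unsent edit, or None

    def on_keystroke(self, now):
        """Record an edit; every new keystroke restarts the waiting period."""
        self.last_keystroke = now

    def should_send(self, now):
        """Return True once per pause, when the pause is long enough."""
        if self.last_keystroke is None:
            return False
        if now - self.last_keystroke >= self.idle_s:
            self.last_keystroke = None  # the pending text has been handed off
            return True
        return False
```

The editor would call on_keystroke for every edit and poll should_send from a timer; when it returns True, the edited piece of text goes to the grammar checker.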

Or should the existence of such objects be completely hidden to the
grammar checker?

As I said before, we have to figure out how to deal with it. But I think text inside table cells and enumerations should be checked for sure.

For example by means of an abstract API to iterate
through and modify the text of a document.
And pushing that question one step further:
Is it the grammar checker's implementation that iterates through the text, or
should there be a different object that iterates through the text and
calls the grammar checker to process it?

For me the second one. It can handle details like formatting; letting the grammar checkers act on the text directly would be dangerous for text formatting.
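A rough sketch of that second option, where a separate object walks the document and the checker only ever sees plain text. Everything here is an assumption for illustration: iterate_and_check is a hypothetical name, document parts are modelled as (location, plain_text) pairs, and the checker is any function returning a list of mistakes.

```python
def iterate_and_check(text_portions, check):
    """Walk the document's text portions and call the checker on each one.

    text_portions: iterable of (location, plain_text) pairs, e.g. a
                   paragraph, a table cell or a header (hypothetical model)
    check:         callable taking plain text, returning a list of mistakes
    """
    report = {}
    for location, plain_text in text_portions:
        mistakes = check(plain_text)
        if mistakes:  # keep only the portions where something was found
            report[location] = mistakes
    return report
```

Because the checker never touches the document itself, formatting stays under the control of the iterating object.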

>    4. The grammar checker should determine end of the sentences, because
>       it is not so trivial ( e.g., abbreviations). So, OpenOffice should
>       just provide to the grammar checker an entire block of text, like
>       a paragraph.

Doing it this way would of course be easiest from the application's view.
First it does not need to determine the end of a sentence and secondly
paragraphs are the easiest units to access.

But I somewhat doubt the ability of a grammar checker to identify the end of
a sentence in a mixed language text. For example if an English grammar
checker encounters the upside-down question-mark following the Spanish
word at the end. Thus I'm wondering if the API should allow for a
suggested-end-of-sentence when calling the grammar checker. Thus if the
implementation encounters unknown characters it has at least a hint.

BTW: The I18N break-iterator is not that bad with abbreviations. I think
it has a list of those. But citations and similar things might pose a
huge problem to it.

Question: Can grammar checkers use I18N break-iterator?

And another question would be:
Having the grammar checker being called with sentences, does it mean
when an error is found the whole paragraph is presented to the user
(could be really large!) or does the UI only display the sentence
where the error occurred?

The sentence for sure. For this we will have a list with start position and end position indexes. =)
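To illustrate how a list of start and end indexes lets the UI show only the affected sentence, here is a small sketch. The function name and the convention that the end index is exclusive are assumptions made for this example.

```python
def sentence_for_error(paragraph, sentence_spans, error_pos):
    """Return the sentence that contains error_pos, or None if there is none.

    sentence_spans: list of (start, end) index pairs into the paragraph,
                    with `end` exclusive (an assumption of this sketch)
    """
    for start, end in sentence_spans:
        if start <= error_pos < end:
            return paragraph[start:end]
    return None
```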

Displaying less than a sentence seems somewhat bad to me because
sometimes the user will possibly like to solve an error by rearranging
the sentence.

Yes, I agree with you. I think grammar checkers should work in sentences: if a part of a sentence is wrong, the whole sentence is considered wrong.

And quitting the UI because only the wrong word was
displayed seems to be annoying. And allowing the original document to be
modified in parallel while the dialog is being displayed may be somewhat
troublesome to implement.

I think the safe way to implement changes is by showing dialogs. Even in automatic checking, it just shows the mistakes; the user has to right-click on one and a dialog appears. Have you figured out another way to do it?

>    5. OpenOffice should be able to replace the wrong sentences.

;-)

>    6. I think we should create a unified User Interface that any
>       grammar checker can use.

+1.

Of course this will not prevent someone's grammar checker from coming along
with its own UI.
It only makes the implementation easier if the UI is already there, and
to the user all the grammar checkers will look the same, thus avoiding a
possible source of confusion.

>    7. Automatic checking should run in the background and mark the wrong
>       sentences with a wavy line. It could be enabled and disabled, like
>       Spell Checker.

+1.
Someone once mentioned the idea of at least two different kinds of lines.
One for what the grammar checker knows for sure is wrong. And the other
one for "this is probably wrong" (e.g. outdated words like "thy" or
"thee" in English). This would of course go along with an option that
allows the user to specify if he likes to have both types displayed or
only the I'm-100%-sure-it-is-wrong parts.
The reasoning was AFAIR that it is most annoying to the user to get
errors reported that are no errors.
I found that idea quite compelling...

It can be done, but I'm not sure every grammar checker will implement it.
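The two kinds of lines plus the user option could look roughly like this. The Certainty enum and marks_to_display are hypothetical names; a checker that does not distinguish the two levels could simply report everything as CERTAIN.

```python
from enum import Enum

class Certainty(Enum):
    CERTAIN = "certain"    # the checker is sure this is wrong
    PROBABLE = "probable"  # "this is probably wrong", e.g. outdated words

def marks_to_display(marks, show_probable):
    """Filter marks according to the user's option.

    marks: list of (start, end, Certainty) tuples
    """
    return [m for m in marks if m[2] is Certainty.CERTAIN or show_probable]
```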

>    8. The API should provide a paragraph (for example) to grammar
>       checker and this one should return a list. If there is no mistake
>       in this paragraph, the list should be empty,  else the list should
>       contain:

A list of what?

A list containing objects, for example:

object mistake
{
    int startpos;            // start position of the sentence
    int endpos;              // end position of the sentence
    string guessed_sentence; // the corrected sentence guessed by the grammar checker
    string rule_tip;         // a comment explaining the grammar rule
    boolean checked;         // flag: whether the user wants to ignore it or not
}
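As a runnable illustration of the object above and of the "empty list when there is no mistake" rule, here is a sketch in Python. The field names follow the pseudocode; check_paragraph and its single toy rule (an immediately repeated word) are invented for the example, and for brevity the positions mark the repeated words rather than the whole sentence.

```python
from dataclasses import dataclass
import re

@dataclass
class Mistake:
    startpos: int          # start position of the flagged text
    endpos: int            # end position (exclusive) of the flagged text
    guessed_sentence: str  # corrected text guessed by the checker
    rule_tip: str          # comment explaining the grammar rule
    checked: bool = False  # whether the user chose to ignore it

def check_paragraph(paragraph):
    """Toy checker: flag an immediately repeated word such as "the the"."""
    mistakes = []
    for m in re.finditer(r"\b(\w+) \1\b", paragraph):
        fixed = paragraph[:m.start()] + m.group(1) + paragraph[m.end():]
        mistakes.append(Mistake(m.start(), m.end(), fixed, "Repeated word"))
    return mistakes
```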

Suggestions on how to correct the first encountered error?

Or did you mean a list of all errors? Or even something else?

>          1. Where is the mistake in the paragraph (initial index + final
>             index).
>          2. A list of suggestions to correct that mistake (this list can
>             be empty if checker is not prepared to guess).
>          3. A comment about mistake, e.g. what a grammar book should say
>             about it.

Having listed point 1. here as part of the list seems to suggest that a
list of all errors was meant to be returned...
When I talked about this to people implementing grammar checkers last
year all of them said to stop at the first error, since once that error
is corrected the whole sentence will have to be checked again.
Thus there would be no need for further errors.
Also (as sometimes happens with compilers) consider one single error
triggering reports of several errors following it. If that one gets fixed
all the other ones will vanish as well. Thus the list may already be
obsolete when the first error gets fixed.

That's true. Yes, stopping at the first error would be better.
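The stop-at-the-first-error approach amounts to a check, fix, re-check loop, sketched below. All names are hypothetical, the toy checker only knows two misspellings, and the fix is applied automatically here, whereas in the real UI the user would choose it.

```python
TOY_FIXES = {"teh": "the", "recieve": "receive"}

def first_typo(sentence):
    """Toy checker: return (start, end, replacement) for the first error only."""
    hits = [(sentence.find(word), word) for word in TOY_FIXES if word in sentence]
    if not hits:
        return None
    start, word = min(hits)  # earliest position in the sentence
    return start, start + len(word), TOY_FIXES[word]

def correct_sentence(sentence, find_first_error, max_rounds=10):
    """Fix the first error, then check the whole sentence again."""
    for _ in range(max_rounds):
        error = find_first_error(sentence)
        if error is None:
            break
        start, end, replacement = error
        sentence = sentence[:start] + replacement + sentence[end:]
    return sentence
```

The max_rounds limit is a safety net for a checker whose suggestions keep producing new errors.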

> 2. Grammar Checker API, future:
>
>    1. Let's suppose it's possible to manage several languages in a text
>       and there is a Language Guessing API. Then, when OpenOffice
>       discovers the language of a sentence, it automatically loads the grammar
>       checker for the corresponding language.

Here it is a bit like the snake biting its tail:
How is the language guessing to be presented with a sentence to operate
on (in order to define which grammar checker is to be used), when the
grammar checker is already required to identify the end of the sentence?

Either it is only guessing the language of the paragraph, which may
consist of several complete-sentences-in-various-languages. Or we
still need the I18N breakiterator (or something similar) to identify the sentence.
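One way out of the circle, sketched under assumptions: a separate sentence splitter (standing in for the I18N breakiterator) runs first, then the language is guessed per sentence, and only then is a grammar checker chosen. All function and parameter names are hypothetical.

```python
def check_mixed_paragraph(paragraph, split_sentences, guess_language, checkers):
    """Split first, guess the language per sentence, then pick a checker.

    split_sentences: callable returning the sentences of a paragraph
    guess_language:  callable returning a language code for a sentence
    checkers:        dict mapping language code -> checker callable
    """
    results = []
    for sentence in split_sentences(paragraph):
        language = guess_language(sentence)
        checker = checkers.get(language)
        # None means "no checker for this language, sentence left unchecked"
        mistakes = checker(sentence) if checker else None
        results.append((sentence, language, mistakes))
    return results
```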

Regards,
Thomas

