[lingu-dev] About proofreader and spell checker interaction

Thomas Lange - Sun Germany - ham02 - Hamburg Tue, 21 Apr 2009 06:40:18 -0700

Hi all,

Here I'm going to post some e-mail conversation about spell checkers and
proofreaders with spell checking support, so that anyone who is
interested can participate and comment.

The first one is my reply to Marcin's initial mail:

Hi Marcin,

Marcin Miłkowski wrote:

> > Hi Thomas,
> >
> > I tried my own suggestion - replace TextMarkup.PROOFING with SPELLCHECK in 
> > the error returned with LanguageTool. However, this seems to have no effect 
> > - the curly underline is still blue and not red. Moreover, even if I return 
> > true for isSpellChecker(), I still get blue. Is this still unimplemented? 
> > (I'm using 3.0.1 right now).
> >
> > Actually, I wanted to return red underlines for some of the errors that LT 
> > catches - there are quite sure cases of context-dependent serious spelling 
> > mistakes and some of the language maintainers would like to mark them up in 
> > red. Is this possible at all?
> >   
>   
Not yet.
Not possible because of the lines below from gciterator.cxx. For the
time being, just report them as spelling errors (since that is correct)
and accept that they are treated as results from the grammar checker,
with no explicit hint that they are about spelling only.

> >     // the proofreader may return SPELLING but right now our core
> >     // does only handle PROOFREADING if the result is from the
> >     // proofreader...
> >     // (later on we may wish to color spelling errors found by the
> >     // proofreader differently for example. But no special handling
> >     // right now.)
> >     if (rDesc.nType == text::TextMarkupType::SPELLCHECK)
> >         rDesc.nType = text::TextMarkupType::PROOFREADING;
>   
Currently, while spell checking is still a separate process and there is
no coordination between it and proofreading, this is explicitly
disabled. The reason is that it may be bad to have two different and
independent components spell check the same text. There is no mechanism
to prevent or resolve inconsistencies.

In the longer run we would like to move spell checking to the
gciterator as well. Then it should be possible to solve the related
problems nicely in some way.

The main question arises from the idea that a proofreader might have a
better understanding of the text. If it is also a spell checker, should
it be trusted more? That is, should we even go so far as to not use
other spell checkers if the proofreader for that language is also a
spell checker?

Currently spell checkers are chained (that is up for discussion as well
though, since without chaining the route to take seems to be rather
obvious). That means if any one of several spell checkers for a given
language says a word is correct, then no error will be reported. That
allows spell checker A to check normal English text, and spell checker
B to know only about English medical words. Those two spell checkers
can easily be chained, and you will get a result that is better than
using just a single one. Without chaining you would need a single spell
checker that takes care of both tasks in one sweep.
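
To make the chaining rule concrete, here is a minimal C++ sketch. It is
not the actual OOo lingu code: SpellChecker, makeDictionaryChecker and
isCorrectChained are hypothetical names made up for this illustration,
and a "spell checker" is reduced to a plain word-list lookup.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical stand-in for a spell checker: true if it accepts the word.
using SpellChecker = std::function<bool(const std::string&)>;

// Builds a checker from a plain word list (illustration only).
SpellChecker makeDictionaryChecker(std::unordered_set<std::string> words) {
    return [words = std::move(words)](const std::string& word) {
        return words.count(word) > 0;
    };
}

// Chaining as described above: a word is reported as an error only if
// every checker in the chain rejects it.
bool isCorrectChained(const std::vector<SpellChecker>& chain,
                      const std::string& word) {
    for (const auto& checker : chain) {
        if (checker(word)) return true;  // one acceptance is enough
    }
    return false;
}

// Usage sketch: the general checker alone rejects "myocarditis",
// the chain accepts it, and "teh" is rejected by both.
void demoChaining() {
    const SpellChecker general =
        makeDictionaryChecker({"this", "text", "is", "correct"});
    const SpellChecker medical =
        makeDictionaryChecker({"myocarditis", "stenosis"});
    const std::vector<SpellChecker> chain{general, medical};
    assert(!general("myocarditis"));
    assert(isCorrectChained(chain, "myocarditis"));
    assert(!isCorrectChained(chain, "teh"));
}
```

The point of the sketch is only that acceptance is a logical OR over
the chain, which is what makes the medical word list useful next to the
general one.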

But having only a spell checker will usually result in incorrectly
capitalized words within a sentence going unnoticed. E.g. in
    This text is not Correct.
This happens because the spell checker does not have the information
that 'Correct' is not at the start of a sentence. A spell checker that
is also a proofreader, however, can easily notice that 'Correct' should
not be capitalized. But at least in this case, if chaining were still
allowed, the result would still be no error, since the other spell
checker says the word is fine.
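
The masking effect can be shown with a small C++ sketch. Everything in
it is an assumption for illustration (the tiny word list kWords and the
functions plainAccepts, proofreaderAccepts and chainedAccepts are made
up here), and the capitalization rule is deliberately naive: it ignores
proper nouns.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <unordered_set>

// Hypothetical tiny dictionary shared by both checkers.
const std::unordered_set<std::string> kWords{
    "this", "text", "is", "not", "correct"};

std::string toLower(std::string s) {
    for (char& c : s)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}

bool startsUpper(const std::string& s) {
    return !s.empty() && std::isupper(static_cast<unsigned char>(s[0])) != 0;
}

// Word-only checker: it has to accept "Correct", because for all it
// knows the word might begin a sentence.
bool plainAccepts(const std::string& word) {
    return kWords.count(toLower(word)) > 0;
}

// Proofreader-style checker: it also knows the position in the sentence
// and rejects a capitalized dictionary word in mid-sentence.
bool proofreaderAccepts(const std::string& word, bool atSentenceStart) {
    if (!plainAccepts(word)) return false;
    return atSentenceStart || !startsUpper(word);
}

// Chained verdict: accepted if either checker accepts.
bool chainedAccepts(const std::string& word, bool atSentenceStart) {
    return plainAccepts(word) || proofreaderAccepts(word, atSentenceStart);
}

// Usage sketch for "This text is not Correct."
void demoMasking() {
    assert(proofreaderAccepts("This", true));      // sentence start: fine
    assert(!proofreaderAccepts("Correct", false)); // proofreader sees it
    assert(chainedAccepts("Correct", false));      // chaining hides it
}
```

With chaining, the verdict of the word-only checker wins for 'Correct',
so the proofreader's extra knowledge has no visible effect.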

Thus the problems at hand and to be discussed are:

a) should we give up on chained spell checkers even though there are
good uses for them? The simple fact that vanilla OOo has only one spell
checker does not mean there aren't other spell checkers around that
already make use of that chaining... Or that someone would like to make
use of it in the future.

b) The easy case is having no spell-checker-only component for that
language but a grammar checker that also does spell checking. Nothing
much to think about here.
But even if we give up on chaining and still have a grammar checker that
is also a spell checker AND a second, spell-checker-only component, we
still have to decide if we want to make use of the second one. If we
want to make use of that one as well, how do we merge the results?
Should the grammar checker's spell checker only be allowed to mark
errors where the second one has found none? That is, to introduce
additional ones? (See the capitalization problem mentioned above.) Or
should it also be allowed to overrule errors found by the second one
and mark them as not-to-be-reported?
Or do we need even more complex handling for this problem?
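
The options discussed so far can be written down as merge policies over
the two verdicts for a single word. This is only a sketch to compare the
alternatives, not a proposal for the actual implementation; MergePolicy
and mergedIsError are hypothetical names.

```cpp
#include <cassert>

// Hypothetical merge policies for one word checked by two components.
enum class MergePolicy {
    Chained,   // current behavior: error only if both report one
    AddOnly,   // proofreader may add errors, but not remove any
    Overrule   // proofreader's verdict replaces the other one
};

// Returns true if the word should be reported as a spelling error.
bool mergedIsError(bool proofreaderError, bool spellCheckerError,
                   MergePolicy policy) {
    switch (policy) {
        case MergePolicy::Chained:
            return proofreaderError && spellCheckerError;
        case MergePolicy::AddOnly:
            return proofreaderError || spellCheckerError;
        case MergePolicy::Overrule:
            return proofreaderError;
    }
    return spellCheckerError;  // not reached for valid policies
}

// Usage sketch: 'Correct' in mid-sentence is an error for the
// proofreader only.
void demoMerge() {
    assert(!mergedIsError(true, false, MergePolicy::Chained));
    assert(mergedIsError(true, false, MergePolicy::AddOnly));
    assert(mergedIsError(true, false, MergePolicy::Overrule));
}
```

Under AddOnly and Overrule the capitalization error from the example
above is reported; under Chained it is not. The two non-chained policies
differ only when the second spell checker reports an error that the
proofreader does not.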

On short notice however a) can be treated as a special case of b) as
well. ^_-
Thus we probably do not need to change the current behavior of that.

Thomas
