On 5/31/06, Jonathon Blake <[EMAIL PROTECTED]> wrote:
Thomas wrote:
> But consider this example:
> "An inyanga is a traditional healer."
> If the language attributes were set correctly and the sentence were broken up accordingly, we would have:
> a) "An"
> b) "inyanga"
> c) "is a traditional healer."
> Since each of those fragments is an incomplete sentence, the grammar checker would have to mark all of them as *grammatically* wrong and would have a hard time giving suggestions.
That is why my earlier message suggested throwing the paragraph at the
grammar checker for each language used in the paragraph, and flagging
the errors returned by the grammar checkers in different colors.
In this example, the English grammar checker would return "inyanga" as
"unknown word". The Zulu grammar checker would kick out "inyanga" as
"correct grammar", and flag the rest as "unknown words".
People who mix two or more languages in the same phrase should expect
to have to proofread their material, because grammar checkers won't
trap polylingual errors.
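To make this concrete, here is a minimal sketch of the run-everything-through-every-checker approach. The checker functions and their tiny vocabularies are made up for illustration; real checkers would be full grammar engines, and the color tags stand in for the different underline colors:

```python
# Sketch: run one paragraph through a checker per language and tag
# each reported error with a per-language color. All names here
# (check_english, check_zulu, CHECKERS) are hypothetical.

def _flag_unknown_words(text, known):
    # Toy "checker": flag every word not in the vocabulary,
    # returning (start, end, message) tuples in text coordinates.
    errors = []
    pos = 0
    for word in text.split():
        start = text.index(word, pos)
        if word.strip(".,").lower() not in known:
            errors.append((start, start + len(word), "unknown word"))
        pos = start + len(word)
    return errors

def check_english(text):
    return _flag_unknown_words(text, {"an", "is", "a", "traditional", "healer"})

def check_zulu(text):
    return _flag_unknown_words(text, {"inyanga"})

CHECKERS = {"en": (check_english, "red"), "zu": (check_zulu, "blue")}

def check_paragraph(text):
    # Every checker sees the whole paragraph; each one flags the
    # words of the *other* language, which is exactly the merging
    # problem discussed below.
    flagged = []
    for lang, (checker, color) in CHECKERS.items():
        for start, end, msg in checker(text):
            flagged.append((lang, color, text[start:end], msg))
    return flagged

paragraph = "An inyanga is a traditional healer."
for lang, color, word, msg in check_paragraph(paragraph):
    print(f"[{lang}/{color}] {word}: {msg}")
```

Note how the English checker flags only "inyanga" while the Zulu checker flags everything else, so the UI has to reconcile overlapping, contradictory results.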
Underlining with different colors is certainly possible, but I was discussing multiple languages with Thomas yesterday, and maybe passing a full paragraph to every grammar checker is not the best way. It can get complicated to merge the indexes coming back from two or more grammar checkers when we point at an error in the paragraph. An example:
Suppose we pass a paragraph containing two languages: several sentences in English and one sentence in Portuguese. When we hand this paragraph to the English grammar checker, it will inevitably mark the Portuguese sentence as an error.
The Portuguese checker will do the same for the English sentences, so between them everything gets marked as wrong. Managing the API to deal with this, for example by ignoring some of the returned error objects, is pretty difficult, because we will treat the sentences returned by the grammar checkers as objects with a start and an end index. In the worst case the checkers will not divide the sentences uniformly, so a single word could end up with, say, two overlapping underlinings...
What I was thinking about is letting the API divide the text into sentences, guess the language of each one (for a first implementation we can skip this), and pass each sentence to its proper grammar checker. That would avoid the situation I mentioned above, help with interactive grammar checking (since it will check sentence by sentence), and help with automatic checking too (faster than waiting for the user to finish a paragraph).
I have another argument to support this idea: checking single sentences will be by far faster. Imagine you have three languages in a paragraph: you have to pass it to three different grammar checkers, and each of them works on the whole paragraph... Do you agree that a lot of work is wasted? (We know that some sentences are in a different language, but the API doesn't.) If instead we find a way to let the API divide the text block into sentences, we could flag each sentence with its language and send only that sentence to the proper grammar checker, avoiding a lot of work. What do you think? I know splitting a paragraph into sentences is not trivial, but I sincerely think this is better than sending the full paragraph when we are dealing with more than one language.
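A rough sketch of what I mean (all names are hypothetical, not the real OOo API, and the splitter is deliberately naive): split the paragraph into sentences, tag each sentence with a language, and dispatch each one only to the matching checker, re-anchoring the sentence-local error indexes back to paragraph coordinates:

```python
# Sketch of the per-sentence dispatch proposal. The function names
# and the regex-based splitter are assumptions for illustration.
import re

def split_sentences(paragraph):
    # Naive splitter: break after '.', '!' or '?' followed by
    # whitespace. Real sentence boundary detection (abbreviations,
    # numbers, ...) is harder; this just shows the dispatch idea.
    parts = re.split(r'(?<=[.!?])\s+', paragraph.strip())
    return [p for p in parts if p]

def dispatch(paragraph, sentence_langs, checkers):
    # sentence_langs gives a language tag per sentence, e.g.
    # ["en", "pt"]; in a later version the API could guess these.
    # Each checker returns sentence-local (start, end, message)
    # tuples; we shift them into paragraph coordinates so the
    # underlines land in the right place in the document.
    results = []
    offset = 0
    for sentence, lang in zip(split_sentences(paragraph), sentence_langs):
        start = paragraph.index(sentence, offset)
        for s, e, msg in checkers[lang](sentence):
            results.append((start + s, start + e, lang, msg))
        offset = start + len(sentence)
    return results

# Toy checkers, just to show the wiring: the Portuguese one flags
# the first four characters of its sentence as a fake error.
checkers = {"en": lambda s: [], "pt": lambda s: [(0, 4, "exemplo")]}
paragraph = "This is English. Isto é português."
print(dispatch(paragraph, ["en", "pt"], checkers))
```

Each checker only ever sees one sentence in its own language, so nothing outside its vocabulary gets spuriously flagged, and the index-merging problem between checkers disappears.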
Bruno Sant'Anna
