Re: [lingu-dev] hunspell dictionary extension by Google

R.J. Baars Tue, 17 Feb 2009 07:50:15 -0800

Laszlo, all,

The use of the 'all'-dictionary will require validation by all language
teams.  And, as mentioned before, the inflections could be quite different.


We might need a way to validate these words for all languages. Or a way
for each language team to decide whether to adopt the 'all'-dictionary or not.

Limiting it to only proper names will reduce complexity.
But still, how will we check validity of the words for all languages, or
will we add a 'validity for' flag to the word specifying the language(s)
it is for?
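
To make the 'validity for' idea concrete, here is a minimal sketch in Python.
It is not Hunspell's format or API; the names ALL_DICT and valid_in_all_dict
and the language sets are assumptions made up for illustration only.

```python
from typing import Dict, FrozenSet, Optional

# Hypothetical shared word list: each entry carries a "valid for" set of
# language codes; None stands for "validated for every language".
ALL_DICT: Dict[str, Optional[FrozenSet[str]]] = {
    "OpenOffice.org": None,                    # accepted everywhere
    "Edison": frozenset({"en", "de", "nl"}),   # cleared by these teams only
}


def valid_in_all_dict(word: str, lang: str) -> bool:
    """Return True if `word` is in the shared list and cleared for `lang`."""
    if word not in ALL_DICT:
        return False
    langs = ALL_DICT[word]
    return langs is None or lang in langs
```

With such a flag, a language team that has not validated a word simply does
not see it, which would also give each language a way to opt out.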


> Hi,
>
> I just noticed the new user dictionary UI of OpenOffice.org 3. I
> have an old issue about solving the user dictionary problems
> (http://qa.openoffice.org/issues/show_bug.cgi?id=61525), so I'm
> interested in it. Hunspell will be able to handle the exception
> format and its suggestions, as well as the Language All option. I will
> write about the tasks and their possible solutions with Hunspell.
>
> Regards,
> László
>
> 2009/2/16  <[email protected]>:
>> Thomas,
>>
>> Thanks for the very clear explanations.
>>
>>> - First: exception dictionaries usually consist of correct words that
>>> you don't like to use in your text or context for some reason.
>>>
>>> a) please consider writing a fairy tale for children to read, there are
>>> a lot of words in regular English that you don't want to appear in
>>> there. (Though for that we may better have an English-Child-Safe
>>> dictionary). But it could also be done by a larger exception
>>> dictionary.
>>>
>>> b) You (or your company) may have a list of words that you are not to
>>> use in your public documents.
>>> Or maybe of two possible and valid choices you still want to use only
>>> one. For example in German, according to the latest spelling reform, we
>>> can write dolphin either as 'Delfin' or as 'Delphin'. Both are valid, but
>>> you don't want them both to appear in a single text. One way to solve
>>> this is to declare one of them as an exception (and to provide the
>>> other as a suggestion).
>>> Those words can then be added to an exception dictionary, and
>>> henceforth the spell checker should complain about them.
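
The exception-dictionary behaviour described above can be sketched as
follows. This is plain Python, not Hunspell or OOo code; check_word and the
two tiny word lists are hypothetical and exist only to illustrate the idea.

```python
from typing import Dict, List, Set, Tuple


def check_word(word: str,
               base_words: Set[str],
               exceptions: Dict[str, List[str]]) -> Tuple[bool, List[str]]:
    """Return (accepted, suggestions).

    A word listed in `exceptions` is rejected even if the base
    dictionary knows it, and its stored replacements are offered.
    """
    if word in exceptions:
        return False, list(exceptions[word])
    return word in base_words, []


# Both spellings are valid German, but this text should only use 'Delfin'.
base = {"Delfin", "Delphin"}
exceptions = {"Delphin": ["Delfin"]}
```

Here check_word("Delphin", base, exceptions) rejects the word and offers
'Delfin', while 'Delfin' itself is still accepted.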
>>
>> I think that could be solved if hunspell were able to read
>> in more than one dictionary at speller class initialization
>> time or, even better, at run time. Then arbitrary user
>> dictionaries could be enabled that inhibit certain words
>> from being accepted as good ones, or add certain words as good ones.
>> Then it would be up to the dictionary provider's imagination
>> what to add.
>>
>> I do not know how László sees this; he might have some comments
>> about this.
>>
>>> - Second: It allows the user to customize the spelling suggestions.
>>>
>>> If for example you tend to make the typo 'rigth' then you could add
>>> that word to an exception dictionary, and by providing only a single
>>> suggestion ('right') one would expect the spell checker to return only
>>> that one (and none from its dictionary base) or at least to put that
>>> single word at the top of the suggestion list.
>>>
>>> And of course you should be allowed to make more than one suggestion
>>> (OOo currently does not allow for that though), and again the list
>>> should replace the list returned by hunspell, or hunspell should add
>>> that word list at the top of the words it has found itself.
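
Both variants mentioned above (replace the speller's list, or put the user's
suggestions on top of it) can be sketched like this. The function
merge_suggestions and its `replace` switch are assumptions for illustration,
not an existing Hunspell or OOo interface.

```python
from typing import Dict, List


def merge_suggestions(word: str,
                      user_suggestions: Dict[str, List[str]],
                      speller_suggestions: List[str],
                      replace: bool = False) -> List[str]:
    """Combine user-defined suggestions with those of the speller.

    replace=True  -> only the user's suggestions are returned (if any).
    replace=False -> the user's suggestions come first, followed by the
                     speller's own suggestions without duplicates.
    """
    preferred = user_suggestions.get(word, [])
    if replace and preferred:
        return list(preferred)
    merged = list(preferred)
    for candidate in speller_suggestions:
        if candidate not in merged:
            merged.append(candidate)
    return merged
```

For the typo 'rigth' with the single user suggestion 'right', the second
variant keeps the speller's other candidates but moves 'right' to the top.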
>>
>> Understood, no idea here. László knows this very well; he might want
>> to comment on this as well.
>>
>>> >>If we then can also have means for a 'Language All'
>>> >> dictionary then we could replace the user-dictionaries by hunspell
>>> >> compatible ones, and that would be a nice thing to do I believe.
>>> >
>>> > Please explain what you mean by a "language all" dictionary,
>>> > ideally with some examples.
>>>
>>> A 'Language All' dictionary will be a list of words that are correct
>>> that way in ALL languages (usually because they won't get translated).
>>> Common examples are people's or company names.
>>> E.g.
>>>   OpenOffice.org
>>>   ASCII
>>>   HTML
>>>   Thomas
>>>   Alva
>>>   Edison
>>> If you are writing multilingual documents or if you have a server
>>> installation with a number of multilingual users, you can add all
>>> those words that would be spelled the same regardless of the text's
>>> language in a single dictionary instead of creating a dictionary for
>>> each of those languages.
>>> And then, for every language and word, the spell checker always has to
>>> look up those 'Language All' dictionaries as well before
>>> deciding to declare a word as misspelled.
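
The lookup order described above can be sketched as follows. The function
is_misspelled and the in-memory sets are hypothetical stand-ins for real
dictionaries, not an existing spell checker API.

```python
from typing import Dict, Set


def is_misspelled(word: str,
                  lang: str,
                  per_language: Dict[str, Set[str]],
                  language_all: Set[str]) -> bool:
    """A word is misspelled only if neither the dictionary of the text's
    language nor the shared 'Language All' list contains it."""
    if word in per_language.get(lang, set()):
        return False
    return word not in language_all


per_language = {"en": {"light", "bulb"}, "de": {"Licht", "Birne"}}
language_all = {"OpenOffice.org", "ASCII", "HTML", "Thomas", "Alva", "Edison"}
```

The shared list is consulted for every language, so 'Edison' is accepted in
both English and German text without being listed twice.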
>>
>> Yes, that is also a nice suggestion, and could be added to
>> the first request, since an additional dictionary would solve it.
>>
>> For this, however, please consider that even German inflects
>> words, so for example Edison should also be recognized
>> as Edisons in German.
>>
>> For Hungarian (or Turkish, Finnish, Estonian, Basque, Persian, etc.)
>> the situation is more acute, because Edison has roughly
>> 2500 derived forms in Hungarian. Therefore, if Edison needs to be
>> recognized as a correct word in Hungarian, it is far more productive to
>> add that word to the Hungarian .dic list with the proper affix list.
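
The inflection problem can be illustrated with a toy sketch. The suffix
lists below are made-up, minimal examples and not real Hunspell affix rules;
SUFFIXES and accepts are hypothetical names. A language with thousands of
derived forms per stem would need a far larger table, which is the point
made above.

```python
from typing import Dict, Set, Tuple

# Toy per-language suffix lists (assumption for illustration only).
SUFFIXES: Dict[str, Tuple[str, ...]] = {
    "en": ("", "'s"),
    "de": ("", "s"),
}


def accepts(word: str, lang: str, shared_names: Set[str]) -> bool:
    """Accept `word` if stripping one suffix allowed in `lang` yields a
    stem found in the shared proper-name list."""
    for suffix in SUFFIXES.get(lang, ("",)):
        if not word.endswith(suffix):
            continue
        stem = word[: len(word) - len(suffix)] if suffix else word
        if stem in shared_names:
            return True
    return False
```

A bare shared list would reject 'Edisons' in German; only the language's own
suffix rules make the inflected form acceptable.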
>>
>> Also, for historical reasons, the Hungarian names of some German or
>> Danish cities differ from their German or Danish pronunciation.
>> Therefore a list of German city names is not usable
>> in Hungarian.
>>
>> Regards: eleonora
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>>
>>
>
