Re: [lingu-dev] hunspell dictionary extension by Google

Németh László Tue, 17 Feb 2009 03:25:22 -0800

Hi,

I just noticed the new user dictionary UI of OpenOffice.org 3. I
have an old issue about solving the user dictionary problems
(http://qa.openoffice.org/issues/show_bug.cgi?id=61525), so I'm
interested in it. Hunspell will be able to handle the exception
format and its suggestions, as well as the Language All option. I will
write about the tasks and their possible solutions with Hunspell.
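The three tasks discussed in this thread (exception words, user-preferred suggestions, and a shared 'Language All' word list) can be modelled as a layered lookup. What follows is a minimal Python sketch under stated assumptions: `LayeredChecker`, `transposition_suggest` and the lookup order are hypothetical illustrations, not Hunspell's API, and the base dictionary is a plain word set with a toy suggestion function rather than real .aff/.dic data.

```python
from typing import Callable, Dict, Iterable, List, Set


def transposition_suggest(words: Set[str]) -> Callable[[str], List[str]]:
    """Toy stand-in for a speller's own suggestions (hypothetical):
    propose dictionary words reachable by swapping two adjacent letters."""
    def suggest(word: str) -> List[str]:
        out: List[str] = []
        for i in range(len(word) - 1):
            cand = word[:i] + word[i + 1] + word[i] + word[i + 2:]
            if cand in words and cand not in out:
                out.append(cand)
        return out
    return suggest


class LayeredChecker:
    """Hypothetical speller with extra user dictionaries (not Hunspell's API).

    Assumed lookup order:
      1. exception dictionary: listed words are always rejected,
         even if the base dictionary would accept them;
      2. base (per-language) dictionary;
      3. 'Language All' dictionary shared by every language.
    """

    def __init__(self, base_words: Iterable[str],
                 exceptions: Dict[str, List[str]],
                 language_all: Iterable[str]) -> None:
        self.base_words: Set[str] = set(base_words)
        self.base_suggest = transposition_suggest(self.base_words)
        # forbidden word -> user-preferred replacements (zero or more)
        self.exceptions = exceptions
        self.language_all: Set[str] = set(language_all)

    def spell(self, word: str) -> bool:
        if word in self.exceptions:
            return False
        return word in self.base_words or word in self.language_all

    def suggest(self, word: str, replace_base: bool = False) -> List[str]:
        preferred = self.exceptions.get(word, [])
        if preferred and replace_base:
            # Variant A: the user's list replaces the speller's own list.
            return list(preferred)
        # Variant B: user suggestions first, then the speller's own ones.
        result: List[str] = []
        for cand in preferred + self.base_suggest(word):
            # A forbidden candidate is swapped for its preferred forms.
            for r in self.exceptions.get(cand, [cand]):
                if r not in result and r not in self.exceptions:
                    result.append(r)
        return result


# Toy data; the word lists are illustrative only.
checker = LayeredChecker(
    base_words={"the", "eth", "Delfin", "Delphin"},
    exceptions={"teh": ["the"], "Delphin": ["Delfin"]},
    language_all={"OpenOffice.org", "ASCII", "HTML", "Edison"},
)

print(checker.spell("Delphin"))      # False: declared an exception
print(checker.suggest("teh"))        # ['the', 'eth']
print(checker.suggest("Delpihn"))    # ['Delfin']
```

Variant B keeps the speller's own suggestions as a fallback below the user's list; variant A (`replace_base=True`) matches the stricter expectation that only the user's list is returned. Note that the 'Language All' set only matches exact forms: `checker.spell("Edisons")` is False, which is the inflection problem raised later in the thread for German and Hungarian.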


Regards,
László

2009/2/16  <[email protected]>:
> Thomas,
>
> Thanks for the very clear explanations.
>
>> - First: exception dictionaries usually consist of correct words that
>> you don't want to use in your text or context for some reason.
>>
>> a) Consider writing a fairy tale for children to read: there are
>> a lot of words in regular English that you don't want to appear in
>> it. (Though for that we might be better off with an English-Child-Safe
>> dictionary.) But it could also be done with a larger exception dictionary.
>>
>> b) You (or your company) may have a list of words that you are not to
>> use in your public documents.
>> Or, of two possible and valid spellings, you may still want to use only
>> one. For example, in German, according to the latest spelling reform,
>> dolphin can be written either as 'Delfin' or 'Delphin'; both are valid, but
>> you don't want both to appear in a single text. One way to solve
>> this is to declare one of them as an exception (and to provide the other
>> as a suggestion).
>> Those words can then be added to an exception dictionary, and henceforth
>> the spell checker should flag them.
>
> I think that could be fixed if hunspell were able to read
> more than one dictionary when the speller class is initialized,
> or even better, at run time. Then arbitrary user
> dictionaries could be enabled that either prevent certain words
> from being accepted as correct or add certain words as correct.
> What gets added would then be up to the dictionary provider's
> imagination.
>
> I do not know how László sees this; he might have some comments
> about it.
>
>> - Second: It allows the user to customize the spelling suggestions.
>>
>> If, for example, you tend to make the typo 'rigth', then you could add that
>> word to an exception dictionary, and by providing only a single
>> suggestion ('right'), one would expect the spell checker to return only
>> that one (and none from its base dictionary) or at least to put that
>> single word at the top of the suggestion list.
>>
>> And of course you should be allowed to give more than one suggestion
>> (OOo currently does not allow for that, though), and again the list
>> should either replace the list returned by hunspell, or hunspell should put
>> that word list at the top of the words it has found itself.
>
> Understood, but I have no idea here. László knows this very well; he might want
> to comment on this, too.
>
>> >> If we can then also have a means for a 'Language All'
>> >> dictionary, then we could replace the user dictionaries with
>> >> hunspell-compatible ones, and that would be a nice thing to do, I believe.
>> >
>> > Please explain what you mean by a "language all" dictionary,
>> > ideally with some examples.
>>
>> A 'Language All' dictionary would be a list of words that are spelled
>> correctly the same way in ALL languages (usually because they don't get
>> translated). Common examples are names of people or companies.
>> E.g.
>>   OpenOffice.org
>>   ASCII
>>   HTML
>>   Thomas
>>   Alva
>>   Edison
>> If you are writing multilingual documents, or if you have a server
>> installation with a number of multilingual users, you can add all those
>> words that are spelled the same regardless of the text's language to
>> a single dictionary instead of creating a dictionary for each of those
>> languages.
>> Then, for every language and word, the spell checker always has to
>> look in those 'Language All' dictionaries as well before
>> declaring a word misspelled.
>
> Yes, that is also a nice suggestion, and could be added to
> the first request, since an additional dictionary would solve it.
>
> For this, however, please consider that even German inflects
> words, so, for example, Edison should also be recognized
> as Edisons in German.
>
> For Hungarian (or Turkish, Finnish, Estonian, Basque, Persian, etc.)
> the situation is more acute, because Edison has roughly
> 2500 derived forms in Hungarian. Therefore, if Edison needs to be recognized
> as a correct word in Hungarian, it is far more productive to add
> that word to the Hungarian .dic list with the proper affix list.
>
> Also, for historical reasons, the Hungarian names of some German
> or Danish cities differ from their German or Danish forms.
> Therefore a list of German city names is not usable
> in Hungarian.
>
> Regards: eleonora
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>
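Eleonora's point about inflection can be made concrete with a toy model: a plain 'Language All' list only matches the exact spelling, while a per-language entry carrying affix flags also covers its inflected forms. The sketch below is hypothetical and only loosely mirrors Hunspell's split between a .dic word list with flags and suffix rules in the .aff file; the flag name `S`, the helper `expand`, and the German genitive suffix `-s` are assumptions for illustration.

```python
from typing import Dict, List, Set

# Hypothetical suffix rules: flag -> suffixes that may be appended to a stem.
# "S" adding "-s" is an assumed stand-in for the German genitive (Edisons).
SUFFIX_RULES: Dict[str, List[str]] = {"S": ["s"]}


def expand(stem: str, flags: str, rules: Dict[str, List[str]]) -> Set[str]:
    """Return the stem plus every form produced by its flagged suffix rules."""
    forms = {stem}
    for flag in flags:
        for suffix in rules.get(flag, []):
            forms.add(stem + suffix)
    return forms


# A 'Language All' list is just words, with no affix information.
language_all: Set[str] = {"Edison"}

# A per-language word list: stem -> affix flags (toy German entry).
german_dic: Dict[str, str] = {"Edison": "S"}

german_forms: Set[str] = set()
for stem, flags in german_dic.items():
    german_forms |= expand(stem, flags, SUFFIX_RULES)

for word in ("Edison", "Edisons"):
    print(word, word in language_all, word in german_forms)
# Edison True True
# Edisons False True
```

For a language with roughly 2500 derived forms per name, as described above for Hungarian, listing forms one by one in a shared list does not scale, which is why adding the stem to that language's own .dic with the proper affixes is the more productive route.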