Re: spell checker issue

2012-10-26 Thread Németh László
Hi,

2012/10/25 Caolán McNamara :
> On Mon, 2012-10-15 at 09:37 +0200, Németh László wrote:
>> Hi,
>>
>> Adding a simple new item to the en_US.dic, like
>>
>> men's
>>
>> will extend the dictionary. The biggest plus in the American English
>> dictionary of LibreOffice is the morphological data (also based on
>> Kevin's data and maybe WordNet) for stemming and morphological
>> generation in thesaurus suggestions, see the attached conversion
>> script in https://issues.apache.org/ooo/show_bug.cgi?id=19563.
>
> So basically one attractive route to go would be to build our dictionary
> at LibreOffice build time ourselves from wordnet +
> custom-libreoffice-words patch + that script. Which would give us
> something we can easily sync whenever wordnet gets updated without
> losing the extra morphological data. Or are there any gotchas with doing
> that?

Only a small part of WordNet (the list of irregular forms) is used
by the script. But the thesaurus of LibreOffice is based on the full
WordNet, so it would be fine to add the thesaurus generation to the
build process. We would be able to add some attractive thesaurus
improvements, too, like Unicode symbols as synonyms, e.g. alpha -> α,
skull -> ☠, as in the Hungarian thesaurus.

Gotchas: there were some manual fixes (documented in the
README_en_US.txt) to handle Unicode apostrophes and ligatures.
Adding a small list with the most urgent words would be easier for me.
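
One plausible reading of those manual fixes is a normalization step on
each entry. The sketch below is an assumption about the general approach,
not the content of README_en_US.txt, and normalize_entry is a
hypothetical name:

```python
import unicodedata


def normalize_entry(word: str) -> str:
    """Normalize one word (hypothetical helper, illustration only)."""
    # Map the typographic apostrophe U+2019 to the ASCII apostrophe.
    word = word.replace("\u2019", "'")
    # NFKC expands compatibility ligatures such as U+FB01 to plain letters.
    return unicodedata.normalize("NFKC", word)


print(normalize_entry("men\u2019s"))
print(normalize_entry("\ufb01sh"))
```

Note that NFKC also rewrites other compatibility characters, so a real
conversion might prefer an explicit ligature table.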

I also tried to find an old OpenOffice.org issue about the quality
analysis/extension of the (American) English dictionary, but I
found only the en-GB-oed dictionary for international organizations, see
https://issues.apache.org/ooo/show_bug.cgi?id=51093 and
http://ftp.nluug.nl/office/openoffice/contrib/dictionaries/README_en_GB-oed.txt.

Best regards,
László


>
> C.
>
___
LibreOffice mailing list
LibreOffice@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/libreoffice


Re: spell checker issue

2012-10-25 Thread Caolán McNamara
On Mon, 2012-10-15 at 09:37 +0200, Németh László wrote:
> Hi,
> 
> Adding a simple new item to the en_US.dic, like
> 
> men's
> 
> will extend the dictionary. The biggest plus in the American English
> dictionary of LibreOffice is the morphological data (also based on
> Kevin's data and maybe WordNet) for stemming and morphological
> generation in thesaurus suggestions, see the attached conversion
> script in https://issues.apache.org/ooo/show_bug.cgi?id=19563.

So basically one attractive route to go would be to build our dictionary
at LibreOffice build time ourselves from wordnet +
custom-libreoffice-words patch + that script. Which would give us
something we can easily sync whenever wordnet gets updated without
losing the extra morphological data. Or are there any gotchas with doing
that?

C.



Re: spell checker issue

2012-10-15 Thread Németh László
Hi,

Adding a simple new item to the en_US.dic, like

men's

will extend the dictionary. The biggest plus in the American English
dictionary of LibreOffice is the morphological data (also based on
Kevin's data and maybe WordNet) for stemming and morphological
generation in thesaurus suggestions, see the attached conversion
script in https://issues.apache.org/ooo/show_bug.cgi?id=19563.
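
A minimal sketch of such an edit, assuming the usual Hunspell .dic layout
(first line is an approximate entry count, then one entry per line,
optionally followed by /FLAGS). The helper add_word is hypothetical and
not part of any LibreOffice tooling:

```python
def add_word(dic_text: str, word: str) -> str:
    """Return dic_text with `word` appended and the count header updated.

    Assumes line 1 holds the entry count and every later line is
    either `word` or `word/FLAGS`.
    """
    lines = dic_text.splitlines()
    count = int(lines[0].strip())
    # Compare only the part before any "/FLAGS" suffix.
    existing = {line.split("/", 1)[0] for line in lines[1:]}
    if word in existing:
        return dic_text  # already present, nothing to do
    lines[0] = str(count + 1)
    lines.append(word)
    return "\n".join(lines) + "\n"


sample = "3\nman/M\nmen\nwoman/M\n"
updated = add_word(sample, "men's")
print(updated)
```

The sketch ignores morphological fields and escaped slashes, which a real
dictionary may contain.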

By the way, Firefox and Google Chrome
(http://src.chromium.org/viewvc/chrome/trunk/deps/third_party/hunspell_dictionaries/en_US.dic_delta?revision=138928&view=markup)
have also added some new words as patches.

Regards,
László

2012/10/11 Caolán McNamara :
> On Sun, 2012-09-30 at 12:47 -0700, Steven Howe wrote:
>> Who deals with spell checker dictionary issues?
>>
>> I'm using the word " men's "; the spell checker thinks this is wrong,
>> although spell checker for gmail does not. I've visited webster's
>> dictionary online. "men's" appears to be the correct spelling.
>
> English - US, right ? Best in general to submit a bug about these
> things. But it does bring up the general case as to what's the
> "canonical" upstream for the English dictionaries.
>
> e.g. for Fedora I consider Kevin's wordlist at
> http://wordlist.sourceforge.net/ as the upstream of the en-US dictionary
> and in that light I've submitted
> https://sourceforge.net/tracker/?func=detail&aid=3576342&group_id=10079&atid=1014602
> which would allow men's, women's and other possessives of irregular
> plural nouns.
>
> I'm not entirely sure of the provenance of the en-US dictionaries we
> have in LibreOffice. I mean, IIRC they are derived ultimately from
> Kevin's list, but I don't know if they are resynced occasionally or if
> Nemeth is maintaining them in some source format somewhere else. Or if
> they have accidentally forked themselves over time.
>
> They definitely appear to be affix-compressed, or otherwise turned into
> something sufficiently unreadable that I can't trivially see the right
> way to add men's, women's to the copies we have in our tree :-)
>
> C.
>


Re: spell checker issue

2012-10-11 Thread Caolán McNamara
On Sun, 2012-09-30 at 12:47 -0700, Steven Howe wrote:
> Who deals with spell checker dictionary issues?
> 
> I'm using the word " men's "; the spell checker thinks this is wrong,
> although spell checker for gmail does not. I've visited webster's
> dictionary online. "men's" appears to be the correct spelling.

English - US, right ? Best in general to submit a bug about these
things. But it does bring up the general case as to what's the
"canonical" upstream for the English dictionaries.

e.g. for Fedora I consider Kevin's wordlist at
http://wordlist.sourceforge.net/ as the upstream of the en-US dictionary
and in that light I've submitted
https://sourceforge.net/tracker/?func=detail&aid=3576342&group_id=10079&atid=1014602
which would allow men's, women's and other possessives of irregular
plural nouns.
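
To illustrate the forms in question, here is a small sketch. The plural
list is a hypothetical sample, not the contents of Kevin's wordlist or of
the tracker submission:

```python
# Hypothetical sample of irregular plural nouns (illustration only).
IRREGULAR_PLURALS = ["men", "women", "children", "people"]


def possessive(plural: str) -> str:
    """Form the possessive of a plural noun.

    Plurals ending in "s" take a bare apostrophe (dogs'); irregular
    plurals that do not end in "s" take "'s" (men's).
    """
    return plural + "'" if plural.endswith("s") else plural + "'s"


forms = [possessive(p) for p in IRREGULAR_PLURALS]
print(forms)
```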

I'm not entirely sure of the provenance of the en-US dictionaries we
have in LibreOffice. I mean, IIRC they are derived ultimately from
Kevin's list, but I don't know if they are resynced occasionally or if
Nemeth is maintaining them in some source format somewhere else. Or if
they have accidentally forked themselves over time.

They definitely appear to be affix-compressed, or otherwise turned into
something sufficiently unreadable that I can't trivially see the right
way to add men's, women's to the copies we have in our tree :-)

C.



spell checker issue

2012-09-30 Thread Steven Howe
Who deals with spell checker dictionary issues?

I'm using the word " men's "; the spell checker thinks this is wrong,
although spell checker for gmail does not. I've visited webster's
dictionary online. "men's" appears to be the correct spelling.