Re: spell checker issue
Hi,

2012/10/25 Caolán McNamara:
> On Mon, 2012-10-15 at 09:37 +0200, Németh László wrote:
>> Hi,
>>
>> Adding a simple new item to the en_US.dic, like
>>
>> men's
>>
>> will extend the dictionary. The biggest plus in the American English
>> dictionary of LibreOffice is the morphological data (also based on
>> Kevin's data and maybe WordNet) for stemming and morphological
>> generation in thesaurus suggestions, see the attached conversion
>> script in https://issues.apache.org/ooo/show_bug.cgi?id=19563.
>
> So basically one attractive route to go would be to build our dictionary
> at LibreOffice build time ourselves from wordnet +
> custom-libreoffice-words patch + that script. Which would give us
> something we can easily sync whenever wordnet gets updated without
> losing the extra morphological data. Or are there any gotchas with
> doing that?

Only a small part of WordNet, the list of the irregular forms, is used by the script. But the thesaurus of LibreOffice is based on the full WordNet, so it would be fine to add the thesaurus generation to the build process. We would be able to add some attractive thesaurus improvements, too, like Unicode symbols as synonyms, e.g. alpha -> α, skull -> ☠, as in the Hungarian thesaurus.

Gotchas: there were some manual fixes (documented in the README_en_US.txt) to handle Unicode apostrophes and ligatures.

Adding a small list with the most urgent words would be easier for me.

I also tried to find an old OpenOffice.org issue about the quality analysis/extension of the (American) English dictionary, but I found only the en-GB-oed dictionary for international organizations, see
https://issues.apache.org/ooo/show_bug.cgi?id=51093 and
http://ftp.nluug.nl/office/openoffice/contrib/dictionaries/README_en_GB-oed.txt.

Best regards,
László

>
> C.
>
___
LibreOffice mailing list
LibreOffice@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/libreoffice
Re: spell checker issue
On Mon, 2012-10-15 at 09:37 +0200, Németh László wrote:
> Hi,
>
> Adding a simple new item to the en_US.dic, like
>
> men's
>
> will extend the dictionary. The biggest plus in the American English
> dictionary of LibreOffice is the morphological data (also based on
> Kevin's data and maybe WordNet) for stemming and morphological
> generation in thesaurus suggestions, see the attached conversion
> script in https://issues.apache.org/ooo/show_bug.cgi?id=19563.

So basically one attractive route to go would be to build our dictionary at LibreOffice build time ourselves from wordnet + custom-libreoffice-words patch + that script. That would give us something we can easily sync whenever wordnet gets updated without losing the extra morphological data. Or are there any gotchas with doing that?

C.
Re: spell checker issue
Hi,

Adding a simple new item to the en_US.dic, like

men's

will extend the dictionary. The biggest plus in the American English dictionary of LibreOffice is the morphological data (also based on Kevin's data and maybe WordNet) for stemming and morphological generation in thesaurus suggestions, see the attached conversion script in https://issues.apache.org/ooo/show_bug.cgi?id=19563.

By the way, Firefox and Google Chrome
(http://src.chromium.org/viewvc/chrome/trunk/deps/third_party/hunspell_dictionaries/en_US.dic_delta?revision=138928&view=markup)
have got some new words, too, as patches.

Regards,
László

2012/10/11 Caolán McNamara:
> On Sun, 2012-09-30 at 12:47 -0700, Steven Howe wrote:
>> Who deals with spell checker dictionary issues?
>>
>> I'm using the word "men's"; the spell checker thinks this is wrong,
>> although spell checker for gmail does not. I've visited webster's
>> dictionary online. "men's" appears to be the correct spelling.
>
> English - US, right? Best in general to submit a bug about these
> things. But it does bring up the general case as to what's the
> "canonical" upstream for the English dictionaries.
>
> e.g. for Fedora I consider Kevin's wordlist at
> http://wordlist.sourceforge.net/ as the upstream of the en-US dictionary
> and in that light I've submitted
> https://sourceforge.net/tracker/?func=detail&aid=3576342&group_id=10079&atid=1014602
> which would allow men's, women's and other possessives of irregular
> plural nouns.
>
> I'm not entirely sure of the provenance of the en-US dictionaries we
> have in LibreOffice. I mean, IIRC they are derived ultimately from
> Kevin's list, but I don't know if they are resynced occasionally or if
> Nemeth is maintaining them in some source format somewhere else. Or if
> they have accidentally forked themselves over time.
>
> They definitely appear to be at least affix-compressed, or otherwise
> packed into something sufficiently unreadable that I can't trivially
> see the right way to add men's, women's to the copies we have in our
> tree :-)
>
> C.
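The step László describes in the reply above, appending a plain entry such as men's to en_US.dic, can be sketched in Python. This is only an illustration under stated assumptions: the function name add_words, the sample entries and the /M flag are made up here, and the real en_US.dic also carries morphological data that this sketch does not generate. It relies only on the Hunspell .dic convention that the first line holds the approximate entry count and each following line starts with one word.

```python
import re


def add_words(dic_text, new_words):
    """Return .dic content with new_words appended and the count line updated.

    Hypothetical helper: it treats the text before the first "/", tab or
    space as the bare word, which ignores escaped slashes and multi-word
    entries.
    """
    lines = dic_text.splitlines()
    # The first line of a Hunspell .dic file is the (approximate) entry count.
    entries = [line for line in lines[1:] if line.strip()]
    known = {re.split(r"[/\t ]", entry, maxsplit=1)[0] for entry in entries}
    for word in new_words:
        if word not in known:
            entries.append(word)  # plain entry, no affix flags
            known.add(word)
    return "\n".join([str(len(entries))] + entries) + "\n"


if __name__ == "__main__":
    sample = "2\nman/M\nwoman/M\n"  # made-up two-entry dictionary
    print(add_words(sample, ["men's", "women's"]), end="")
```

A plain entry like this only makes the word pass spell checking; it would not take part in stemming or in morphological generation for thesaurus suggestions unless the morphological fields were added as well.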
Re: spell checker issue
On Sun, 2012-09-30 at 12:47 -0700, Steven Howe wrote:
> Who deals with spell checker dictionary issues?
>
> I'm using the word "men's"; the spell checker thinks this is wrong,
> although spell checker for gmail does not. I've visited webster's
> dictionary online. "men's" appears to be the correct spelling.

English - US, right? Best in general to submit a bug about these things. But it does bring up the general case as to what's the "canonical" upstream for the English dictionaries.

e.g. for Fedora I consider Kevin's wordlist at http://wordlist.sourceforge.net/ as the upstream of the en-US dictionary, and in that light I've submitted
https://sourceforge.net/tracker/?func=detail&aid=3576342&group_id=10079&atid=1014602
which would allow men's, women's and other possessives of irregular plural nouns.

I'm not entirely sure of the provenance of the en-US dictionaries we have in LibreOffice. I mean, IIRC they are derived ultimately from Kevin's list, but I don't know if they are resynced occasionally or if Nemeth is maintaining them in some source format somewhere else. Or if they have accidentally forked themselves over time.

They definitely appear to be at least affix-compressed, or otherwise packed into something sufficiently unreadable that I can't trivially see the right way to add men's, women's to the copies we have in our tree :-)

C.
spell checker issue
Who deals with spell checker dictionary issues?

I'm using the word "men's"; the spell checker thinks this is wrong, although spell checker for gmail does not. I've visited webster's dictionary online. "men's" appears to be the correct spelling.