Hi,

2012/10/25 Caolán McNamara <caol...@redhat.com>:
> On Mon, 2012-10-15 at 09:37 +0200, Németh László wrote:
>> Hi,
>>
>> Adding a simple new item to the en_US.dic, like
>>
>> men's
>>
>> will extend the dictionary. The biggest plus in the American English
>> dictionary of LibreOffice is the morphological data (also based on
>> Kevin's data and maybe WordNet) for stemming and morphological
>> generation in thesaurus suggestions, see the attached conversion
>> script in https://issues.apache.org/ooo/show_bug.cgi?id=19563.
>
> So basically one attractive route to go would be to build our dictionary
> at LibreOffice build time ourselves from wordnet +
> custom-libreoffice-words patch + that script. Which would give us
> something we can easily sync whenever wordnet gets updated without
> losing the extra morphological data. Or is there any gotchas with doing
> that ?
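A minimal sketch of the "custom words" half of the build-time merge discussed above, assuming the extra words arrive as a plain list. The function names (`dic_stem`, `merge_custom_words`) are hypothetical and not part of any LibreOffice build script. The sketch relies only on the Hunspell .dic layout: a first line giving the approximate entry count, then one entry per line, optionally followed by `/` affix flags and whitespace-separated morphological fields. It adds words without affix flags, generates no morphological data (the conversion script referenced above covers that), and does not normalize Unicode apostrophes or handle escaped `\/` in entries.

```python
from typing import Iterable, List


def dic_stem(entry: str) -> str:
    """Return the word part of a .dic entry (text before '/flags' or morph fields)."""
    fields = entry.split()
    return fields[0].split("/", 1)[0] if fields else ""


def merge_custom_words(dic_lines: List[str], custom_words: Iterable[str]) -> List[str]:
    """Append custom words missing from a Hunspell .dic and refresh the count line.

    Assumes dic_lines[0] is the word-count header and the remaining lines are entries.
    """
    entries = [line for line in dic_lines[1:] if line.strip()]
    known = {dic_stem(e) for e in entries}
    for word in custom_words:
        word = word.strip()
        if word and word not in known:
            entries.append(word)  # plain entry: no affix flags, no morphology
            known.add(word)
    # Hunspell treats the first line as a size hint; recompute it anyway.
    return [str(len(entries))] + entries


base = ["3", "man/M", "men", "woman/M"]
merged = merge_custom_words(base, ["men's", "men"])
print("\n".join(merged))
```

Here "men" is skipped because it is already present, so only "men's" is appended and the header becomes 4.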
Only a small part of WordNet, the list of irregular forms, is used by the script. However, the LibreOffice thesaurus is based on the full WordNet, so it would make sense to add thesaurus generation to the build process as well. This would also let us add some attractive thesaurus improvements, like Unicode symbols as synonyms (e.g. alpha -> α, skull -> ☠), as in the Hungarian thesaurus.

Gotchas: there were some manual fixes (documented in README_en_US.txt) to handle Unicode apostrophes and ligatures. Adding a small list of the most urgent words would be easier for me.

I also tried to find an old OpenOffice.org issue about the quality analysis/extension of the (American) English dictionary, but I found only the en-GB-oed dictionary for international organizations, see https://issues.apache.org/ooo/show_bug.cgi?id=51093 and http://ftp.nluug.nl/office/openoffice/contrib/dictionaries/README_en_GB-oed.txt.

Best regards,
László

>
> C.
>
_______________________________________________
LibreOffice mailing list
LibreOffice@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/libreoffice