[lingucomponent-issues] [Issue 92383] submit new en_US.dic without the errors
To comment on the following update, log in, then open the issue: http://www.openoffice.org/issues/show_bug.cgi?id=92383

User kevina changed the following:

What | Old value | New value
CC   | ''        | 'kevina'

- Please do not reply to this automatically generated notification from Issue Tracker. Please log onto the website and enter your comments. http://qa.openoffice.org/issue_handling/project_issues.html#notification
- To unsubscribe, e-mail: issues-unsubscr...@lingucomponent.openoffice.org
For additional commands, e-mail: issues-h...@lingucomponent.openoffice.org
- To unsubscribe, e-mail: allbugs-unsubscr...@openoffice.org
For additional commands, e-mail: allbugs-h...@openoffice.org
[lingucomponent-issues] [Issue 92383] submit new en_US.dic without the errors
--- Additional comments from aardvar...@openoffice.org Sun May 3 12:46:49 + 2009 ---
I have uploaded the enhanced dictionary, dd_2009_04_en_US.dic. It works in Open Office when used as en_US.dic. There is now a 63,000-word difference between this and the original Open Office en_US.dic.

I received an e-mail from the original dictionary maintainer saying that he will block any effort to replace HIS dictionary. It is regrettable that Open Office won't allow improvements. However, I have filed issue #101500 to handle hyphenated words (Hunspell does not work with hyphenated words). If you ever allow people to work on the dictionary, let me know.
[lingucomponent-issues] [Issue 92383] submit new en_US.dic without the errors
--- Additional comments from aardvar...@openoffice.org Mon Apr 27 18:36:18 + 2009 ---
Created an attachment (id=61855): updated, enhanced en_US.dic
[lingucomponent-issues] [Issue 92383] submit new en_US.dic without the errors
--- Additional comments from aardvar...@openoffice.org Tue Apr 14 15:22:22 + 2009 ---
UPGRADE DICTIONARY. I have checked articles in the New York Times, the Wall Street Journal, and similar publications, and am adding a number of new words, such as Facebook, MySpace, Wikipedia, Geithner, and cyberspy. I am also adding a number of possessives to the dictionary, as there seems to be some confusion among writers about adjectives and nouns. I also spell-checked some computer books. This will modernize the dictionary with thousands of additional words. I expect to release the updated version in two weeks. It will be the same size as the original Open Office en_US.dic, though since I use real words, there will be a 50,000-word difference between my version and the dictionary packaged with Open Office.
[lingucomponent-issues] [Issue 92383] submit new en_US.dic without the errors
--- Additional comments from aardvar...@openoffice.org Sat Feb 21 17:44:01 + 2009 ---
I began to read through my submitted spelling dictionary and noticed a couple of omissions. The words "antiquark" and "antilepton" both lack a plural entry. This is easily solved by adding /S to their entries in the word list. I am prepared to spend 7-10 days going through the word list looking for such omissions, but I wasn't sure what the status of the word list is. Is this something you intend to use, and if so, how much proofreading and checking is being done by others? If most of it isn't being used, then there is no rush for me to do anything.

Second, I wanted to mention that the spelling checker can be enhanced by the AutoCorrect feature in Open Office Writer. It is very important for a published writer not to make mistakes. I read the "Wasteland" series by Stephen King, and in the third book he uses "for awhile" five times on facing pages. This is the sort of thing that makes one sit up: I can remember nothing else that was on those two pages, but years later I still remember those five errors. "Awhile" is an adverb, so it cannot be the object of a preposition. Also, "awhile" means "for a time," so saying "for awhile" is equivalent to saying "for for a time." The correct usage is the noun form, which is two words, "a while," hence "for a while." Such mistakes are easily caught with Open Office AutoCorrect entries and replacements:

  "for awhile"    ->  "for a while"
  "after awhile"  ->  "after a while"
  "pointblank"    ->  "point-blank"
  "antisemitism"  ->  "anti-Semitism"

This helps, since otherwise students may think that the absence of "pointblank" and "antisemitism" from the dictionary is a mistake. Note that Microsoft uses these two wrong entries, and WordNet usually goes along with Microsoft. This is typical Microsoft disregard for language, and professional writers do not endorse it.
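To illustrate how an entry-and-replacement table like the one above behaves, here is a minimal Python sketch. It is not OpenOffice.org's AutoCorrect code or file format; the function name `autocorrect`, the table name `REPLACEMENTS`, and the case-sensitive whole-word matching are assumptions made for the example.

```python
import re

# Hypothetical replacement table built from the entries suggested above.
REPLACEMENTS: dict[str, str] = {
    "for awhile": "for a while",
    "after awhile": "after a while",
    "pointblank": "point-blank",
    "antisemitism": "anti-Semitism",
}


def autocorrect(text: str, table: dict[str, str] = REPLACEMENTS) -> str:
    """Replace whole-word (or whole-phrase) matches from `table`, case-sensitively."""
    # Longest keys first, so a multi-word phrase wins over any shorter key it contains.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(k) for k in keys) + r")\b")
    # Look up the exact matched text to find its replacement.
    return pattern.sub(lambda m: table[m.group(0)], text)
```

Because only the listed phrases are keys, a correct use such as "stay awhile" is left alone, which is the point of matching "for awhile" as a phrase rather than "awhile" on its own.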
There are many other hyphenated words that could be included in AutoCorrect. I was often frustrated by not being able to include hyphenated words in the word list (though there are entries for "AK-47" and "al-Qaeda"). I also noticed that Hunspell does not handle very short accented words such as "eclair" or "elan," which take an acute accent over the E: the correctly accented word is in the dictionary, but Hunspell does not offer it as a spelling suggestion. So use AutoCorrect to make sure that such short accented words are handled correctly, and add an entry that turns "deja vu" into "déjà vu" with all its accents. As it is, the AutoCorrect feature is wasted. It functions exactly as in Microsoft Word, catching a few misspelled words, a job better left to the spelling dictionary. Used this way instead, AutoCorrect could become quite useful.
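The missing suggestion can be approximated by comparing words with their accents removed. The Python sketch below is a hypothetical helper, not part of Hunspell: it decomposes text with Unicode NFD, drops the combining accent marks, and offers dictionary words whose accent-free form matches the input.

```python
import unicodedata


def strip_accents(word: str) -> str:
    # NFD splits "é" into "e" plus a combining acute accent; drop the combining marks.
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def accent_suggestions(word: str, dictionary: list[str]) -> list[str]:
    """Return dictionary words that differ from `word` only in accents (and case)."""
    key = strip_accents(word).lower()
    return [w for w in dictionary if w != word and strip_accents(w).lower() == key]
```

For example, `accent_suggestions("eclair", ["éclair", "élan", "eclipse"])` offers only "éclair", since "eclipse" does not fold to the same key.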
[lingucomponent-issues] [Issue 92383] submit new en_US.dic without the errors
--- Additional comments from aardvar...@openoffice.org Fri Dec 12 00:01:53 + 2008 ---
You asked me to subscribe to wordlist-de...@lists.sourceforge.net (https://lists.sourceforge.net/lists/listinfo/wordlist-devel). I have subscribed and received a confirmation e-mail.

I do not see a problem with licenses other than GPL 3. The dictionary represents years of hard work. My main concern was that I did not want a corporation to take my word list, encrypt it, and pass it off as their own spelling checker, sold for their profit. To that end I wanted to work within the open source community, with projects such as AbiWord, OpenOffice.org, and Mozilla.

I used MUNCH under Puppy Linux to compile the word list into dictionary format and submitted it as an attachment. If you would prefer to see the word list before it was compiled, I can send it in zip format; the word choices are clearer there.
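MUNCH compresses a plain word list by folding related forms into one stem plus affix flags. The toy Python version below only illustrates that idea for the /S plural flag mentioned above for "antiquark" and "antilepton"; it is not MUNCH's actual algorithm, and it assumes a simplified S rule that just appends "s" (the real en_US affix file also covers endings such as -y to -ies). The helper names `munch_plurals` and `to_dic` are hypothetical. The output starts with the entry count, as Hunspell .dic files do.

```python
def munch_plurals(words: list[str]) -> list[str]:
    """Toy 'munch': fold 'word' + 'words' into a single 'word/S' entry.

    Assumes a simplified S flag that only appends "s".
    """
    wordset = set(words)
    entries = []
    for word in sorted(wordset):
        # A plural whose singular is present is covered by "singular/S".
        if word.endswith("s") and word[:-1] in wordset:
            continue
        entries.append(f"{word}/S" if word + "s" in wordset else word)
    return entries


def to_dic(entries: list[str]) -> str:
    # Hunspell .dic files begin with a line giving the (approximate) entry count.
    return "\n".join([str(len(entries)), *entries]) + "\n"
```

With the input `["antiquark", "antiquarks", "antilepton", "quark"]`, the sketch emits `antilepton`, `antiquark/S`, and `quark`: only the word whose plural is in the list gains the flag, which is exactly the omission described for "antilepton".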
[lingucomponent-issues] [Issue 92383] submit new en_US.dic without the errors
--- Additional comments from [EMAIL PROTECTED] Sun Dec 7 04:58:54 + 2008 ---
David, please subscribe to [EMAIL PROTECTED]: https://lists.sourceforge.net/lists/listinfo/wordlist-devel

We are working on the en_US dictionary for OpenOffice.org and Mozilla. Unfortunately, Mozilla has a stricter license policy: it needs a GPL/LGPL/MPL tri-license, and GPL 3 alone is not enough. The same applies to the dictionaries pre-bundled with OpenOffice.org. You have made a lot of nice improvements that we could integrate into the wordlist distribution or the generated Firefox/OpenOffice.org dictionaries under your name, so that we can work together on a better, up-to-date American English spelling dictionary. But you can also make your own dictionary version for OpenOffice.org using the extension support (http://extensions.services.openoffice.org/).

> The regular en_US.dic has a number of lines with numerals at the beginning
> (about 20 lines). Those can be inserted into this dictionary. I wasn't quite
> sure what those lines meant.

They are for ordinal number checking (1st is accepted, *11st is rejected, etc.).
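The rule that those numeral lines enforce can be stated compactly. The Python sketch below expresses it directly instead of through dictionary and affix data, so the function names are hypothetical: numbers ending in 11, 12, or 13 (including 111, 212, and so on) take "th", and otherwise the last digit selects "st", "nd", "rd", or "th".

```python
import re


def ordinal_suffix(n: int) -> str:
    # 11, 12, 13 (and 111, 212, ...) always take "th".
    if n % 100 in (11, 12, 13):
        return "th"
    # Otherwise the last digit decides; 0 and 4-9 fall through to "th".
    return {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")


def is_valid_ordinal(token: str) -> bool:
    """Accept digit ordinals like "1st" or "113th"; reject "11st" or "3th"."""
    m = re.fullmatch(r"(\d+)(st|nd|rd|th)", token)
    return bool(m) and ordinal_suffix(int(m.group(1))) == m.group(2)
```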
[lingucomponent-issues] [Issue 92383] submit new en_US.dic without the errors
--- Additional comments from [EMAIL PROTECTED] Mon Sep 8 18:58:21 + 2008 ---
Here is the integrated US English dictionary for Open Office. You will find many thousands of new words beyond the existing dictionary. All words were checked against the American Heritage Dictionary or http://dictionary.reference.com. In some cases, such as words that begin with the prefix "un," these sources failed me, and I instead used http://www.merriam-webster.com for a full list of "un" words from an unabridged dictionary. I went through the word list and added possessives manually.

This dictionary is released under the GNU GPL version 3: en_US.dic by David M. Dibble, copyright September 2008. Standard terms apply: "This is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version."

I compiled the dictionary using MUNCH under Puppy Linux. The regular en_US.dic has a number of lines with numerals at the beginning (about 20 lines). Those can be inserted into this dictionary. I wasn't quite sure what those lines meant.

Some quick explanations follow. Most dictionaries use common conventions. In a word entry, "OR" means that the words have equal weight, as in "burned or burnt" (though the first listing may have a slight edge); in such cases both words are present in this dictionary. Dictionaries use "ALSO" to indicate a second-rate or inferior alternative, so in such cases only the first listing should be used in a spell checker, to encourage people to use the best choice. For instance, "papoose also pappoose." Microsoft Word uses "pappoose," but that word isn't even listed in the American Heritage Dictionary, and in the Random House Unabridged Dictionary "pappoose" is given as an "ALSO."
So "papoose" is the best choice. However, dictionaries can flat out disagree. Some list "facade" (unaccented) as best; others list "façade" (with a c cedilla). In that case both words are in the spell checker. And words change as time passes. "Sea bird" has always been two words (though "seawater" is one), but I now think that "seabird" is acceptable.

Then there are problems of capitalization. My word list has "leno, leno's, slough, slough's." Jay Leno is a TV personality; Slough is a municipality in England. So maybe the word list should be "leno, Leno's, slough, Slough's." I just wasn't sure.

English is used internationally, as one sees on forums, so it seems odd to list every tiny town in the United States but ignore the major metropolitan centers in the rest of the world. So I added the names of many major cities, whether in Japan, Brazil, or Pakistan. All these names should be correctly accented. And since I use Linux, I added names like Ubuntu, Xubuntu, Mandriva, AbiWord, Gnumeric, and so forth.

Many place names have accents, but people often write common names like Yucatan, Guantanamo, or Galapagos without accents and may not even be aware of the accented form. So I decided to include both the unaccented and the accented words, though often the possessive form is given only for the accented (correct) word.

I removed words that could cause problems. I previously commented on "Lindberg" and "Lichtenstein." I also took out "corespondent," as many students will drop an R when they mean "correspondent," with humorous results. Besides, "corespondent" seems an outdated word; one very rarely hears it anymore. And I took out "nob," since students are sure to spell "knob" without the K. But if there is strong opinion that "corespondent" should be in the word list, I would not object to seeing it put back in.
A few months ago I used the word "stelar" in a review of a Tomb Raider custom level, but that word would just confuse people who want "stellar," so "stelar" isn't in the word list either. In other words, a lot of judgment calls had to be made. I also took out most of the hardcore profanity and offensive racial epithets. People can still use these words as freely as they want; the words just aren't in the dictionary.
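Two of the selection rules in this message can be sketched in Python: keep every "OR" variant but only the listing before "ALSO", and accept unaccented place names while giving the possessive only to the accented spelling. This is an illustration, not how the word list was actually built. The one-line headword format (such as "burned or burnt") and all function names are hypothetical, and the sketch assumes that M is the possessive ('s) flag, as on the name entries like Andeee/M in the en_US.dic excerpt in this thread.

```python
import unicodedata


def preferred_forms(entry: str) -> list[str]:
    """Variants to keep from a hypothetical headword line like "burned or burnt".

    "also" introduces an inferior variant, so everything after it is dropped;
    "or" joins equal variants, so all of them are kept.
    """
    preferred = entry.split(" also ", 1)[0]
    return [form.strip() for form in preferred.split(" or ")]


def strip_accents(word: str) -> str:
    # NFD separates base letters from combining accents, which are then dropped.
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def place_name_entries(name: str) -> list[str]:
    """Accented name gets the (assumed) possessive flag M; the unaccented twin stays bare."""
    entries = [f"{name}/M"]
    plain = strip_accents(name)
    if plain != name:
        entries.append(plain)
    return entries
```

So "papoose also pappoose" yields only "papoose", while "Yucatán" yields both "Yucatán/M" and a bare "Yucatan", matching the policy of accepting the common unaccented spelling without endorsing its possessive.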
[lingucomponent-issues] [Issue 92383] submit new en_US.dic without the errors
--- Additional comments from [EMAIL PROTECTED] Mon Sep 8 18:53:04 + 2008 ---
Created an attachment (id=56328): revised en_US.dic
[lingucomponent-issues] [Issue 92383] submit new en_US.dic without the errors
User nemeth changed the following:

What             | Old value         | New value
Assigned to      | [EMAIL PROTECTED] | nemeth
Status           | STARTED           | NEW
Target milestone | ---               | OOo 3.1
Version          | OOo 2.4.1         | OOo 3.0 Beta 2

--- Additional comments from [EMAIL PROTECTED] Mon Aug 4 16:26:03 + 2008 ---
Target: 3.1
[lingucomponent-issues] [Issue 92383] submit new en_US.dic without the errors
User nemeth changed the following:

What           | Old value   | New value
Ever confirmed |             | 1
Status         | UNCONFIRMED | STARTED

--- Additional comments from [EMAIL PROTECTED] Mon Aug 4 16:24:43 + 2008 ---
David,

Thanks in advance for your great contribution. I have just started to make a new version for morphological analysis and generation, based on the old en_US dictionary and WordNet data. There is an effort by Kevin Atkinson to make a maintained version of the OpenOffice.org en_US dictionary; see the result in recent versions of Mozilla Firefox (also here: https://bugzilla.mozilla.org/show_bug.cgi?id=397150 and http://wordlist.sourceforge.net). Unfortunately, it contains the same errors:

    $ grep '\(.\)\1\1' en_US.dic
    AAA
    Andeee/M
    Annnora/M
    BBB
    Diannne/M
    Harwilll/M
    KKK/M
    Lilllie/M
    Minnnie/M
    Rafaellle/M
    SSS
    Sonnnie/M
    WWW/M
    iii
    viii
    ...

I'd also like to examine corpus-based methods for improving the dictionary data. I will use this issue for the discussion of the planned dictionary improvements.

Best regards,
László
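The same triple-letter check can be run outside the shell. The Python sketch below applies the equivalent backreference pattern `(.)\1\1` to the part of each entry before the affix flags. Filtering out all-caps acronyms and lowercase Roman numerals is an assumption added here, not part of the original grep, to separate likely typos such as "Andeee" from legitimate entries such as "AAA" or "viii"; the function name is hypothetical.

```python
import re

# Same character three times in a row, like grep '\(.\)\1\1'.
TRIPLE = re.compile(r"(.)\1\1")
# Heuristic (assumption): lowercase Roman numerals such as "iii" or "viii" are legitimate.
ROMAN = re.compile(r"[ivxlcdm]+")


def suspicious_entries(dic_lines: list[str]) -> list[str]:
    flagged = []
    for line in dic_lines:
        word = line.split("/", 1)[0]  # drop affix flags such as /M
        if not TRIPLE.search(word):
            continue
        # Heuristic (assumption): all-caps acronyms like "AAA" or "KKK" are legitimate.
        if word.isupper() or ROMAN.fullmatch(word):
            continue
        flagged.append(word)
    return flagged
```

Run on part of the grep output above plus a normal name, it flags only the misspelled names: `Andeee`, `Annnora`, and `Minnnie`.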
[lingucomponent-issues] [Issue 92383] submit new en_US.dic without the errors
User mru changed the following:

What         | Old value         | New value
Assigned to  | mru               | [EMAIL PROTECTED]
Component    | Word processor    | lingucomponent
QA contact   | [EMAIL PROTECTED] | [EMAIL PROTECTED]
Subcomponent | programming       | spell checking

--- Additional comments from [EMAIL PROTECTED] Mon Aug 4 13:11:01 + 2008 ---
Reassigned to lingucomponent.