[lingucomponent-issues] [Issue 92383] submit new en_US.dic without the errors

2011-01-10 Thread kevina
To comment on the following update, log in, then open the issue:
http://www.openoffice.org/issues/show_bug.cgi?id=92383


User kevina changed the following:

What            |Old value         |New value
----------------+------------------+------------------
CC              |''                |'kevina'





-
Please do not reply to this automatically generated notification from
Issue Tracker. Please log onto the website and enter your comments.
http://qa.openoffice.org/issue_handling/project_issues.html#notification

-
To unsubscribe, e-mail: issues-unsubscr...@lingucomponent.openoffice.org
For additional commands, e-mail: issues-h...@lingucomponent.openoffice.org


-
To unsubscribe, e-mail: allbugs-unsubscr...@openoffice.org
For additional commands, e-mail: allbugs-h...@openoffice.org



[lingucomponent-issues] [Issue 92383] submit new en_US.dic without the errors

2009-05-03 Thread aardvark12
--- Additional comments from aardvar...@openoffice.org Sun May  3 12:46:49 + 2009 ---
I have uploaded the enhanced dictionary, dd_2009_04_en_US.dic.  It works in Open
Office as en_US.dic. There is now a 63,000-word difference between this and the
original Open Office en_US.dic.

I received an e-mail from the original dictionary maintainer saying that he will
block any effort to replace HIS dictionary. 

It is regrettable Open Office won't allow improvements. However, I have filed
issue #101500 in order to handle hyphenated words (Hunspell does not work with
hyphenated words).

If you ever allow people to work on the dictionary let me know.




[lingucomponent-issues] [Issue 92383] submit new en_US.dic without the errors

2009-04-27 Thread aardvark12
--- Additional comments from aardvar...@openoffice.org Mon Apr 27 18:36:18 + 2009 ---
Created an attachment (id=61855)
updated, enhanced en_US.dic





[lingucomponent-issues] [Issue 92383] submit new en_US.dic without the errors

2009-04-14 Thread aardvark12
--- Additional comments from aardvar...@openoffice.org Tue Apr 14 15:22:22 + 2009 ---
UPGRADE  DICTIONARY.
I have checked articles in the New York Times and the Wall Street Journal, and
so forth, and am adding a number of new words, such as: Facebook, MySpace,
Wikipedia, Geithner, cyberspy, etc. I am also adding a number of possessives to
the dictionary, as there seems to be some confusion among writers about
adjectives and nouns. I also spell-checked some computer books. This will
modernize the dictionary with thousands of additional words.

I expect to release the updated version in two weeks. It will be the same size as
the original Open Office en_US.dic, though since I use real words, there will be
a 50,000-word difference between my version and the dictionary packaged with
Open Office.




[lingucomponent-issues] [Issue 92383] submit new en_US.dic without the errors

2009-02-21 Thread aardvark12
--- Additional comments from aardvar...@openoffice.org Sat Feb 21 17:44:01 + 2009 ---
I began to read through my submitted spelling dictionary and noticed a couple of
omissions. The words "antiquark" and "antilepton" both lack a plural entry. This
is easily solved by adding /S to their entry in the word list. I am prepared to
spend 7-10 days going through the word list, looking for such omissions, but
wasn't sure what the status of the word list is. Is this something you intend to
use, and if so, how much proofreading and checking is being done by others? If
most of it isn't being used, then there is no rush for me to do anything.
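
As a rough illustration of the "/S" fix described above, here is a minimal Python sketch. It assumes single-character affix flags and that "S" is the plural flag in the accompanying affix file, as the comment implies; the helper name add_flag is hypothetical and not part of any Hunspell tool.

```python
def add_flag(entry, flag):
    """Add a one-character affix flag to a 'word' or 'word/FLAGS' entry.

    Assumes single-character flags; returns the entry unchanged if the
    flag is already present.
    """
    word, _sep, flags = entry.partition("/")
    if flag in flags:
        return entry
    return f"{word}/{flags}{flag}"


# Entries without a plural flag gain one; existing flags are preserved.
print(add_flag("antiquark", "S"))
print(add_flag("antilepton/M", "S"))
```

A pass like this only adds the flag; whether a plural is actually appropriate for a given word still needs a human decision.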

Second, I wanted to mention that the spelling checker can be enhanced by the
Auto Correction feature in Open Office writer.

It is very important for a published writer not to make mistakes. I read the
"Wasteland" series by Stephen King, and in the third book he uses "for awhile"
five times on facing pages. This is the sort of thing that makes one sit up. I
can remember nothing else that was on those two pages, but years later still
remember those five errors. "Awhile" is an adverb, so it cannot be the object of
a preposition. Also "awhile" means "for a time" so saying "for awhile" is
equivalent to saying "for for a time." The correct usage is the noun form, which
is two words "a while," hence "for a while."

Such mistakes are easily caught using Open Office Auto Correction, entry and
replacement:

"for awhile"  "for a while"
"after awhile" "after a while"
"pointblank"  "point-blank"
"antisemitism" "anti-Semitism"
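
The entry/replacement pairs above behave like a simple lookup table. The Python sketch below only illustrates that idea; it is not how OpenOffice implements Auto Correction, and the names REPLACEMENTS and autocorrect are made up for the example. Matching is case-sensitive and limited to whole words.

```python
import re

# Entry -> replacement pairs taken from the list above.
REPLACEMENTS = {
    "for awhile": "for a while",
    "after awhile": "after a while",
    "pointblank": "point-blank",
    "antisemitism": "anti-Semitism",
}

# One alternation of all entries, anchored on word boundaries.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in REPLACEMENTS) + r")\b"
)


def autocorrect(text):
    """Replace every known entry in text with its correction."""
    return _PATTERN.sub(lambda m: REPLACEMENTS[m.group(1)], text)


print(autocorrect("He waited for awhile, then fired at pointblank range."))
```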

This helps, since otherwise students may think that the omission of "pointblank"
and "antisemitism" is a mistake. Note that Microsoft uses these two wrong
entries, and WordNet usually goes along with Microsoft. This is typical
Microsoft disregard of language, and professional writers do not endorse this.
There are many other hyphenated words that can be included in the Auto
Correction feature. I was often frustrated by not being able to include
hyphenated words in the word list (though there are entries for "AK-47" and for
"al-Qaeda").

Also I noticed that Hunspell does not catch very short accented words, such as
"eclair" or "elan," which have acute accents over the E. The correctly accented
word is in the dictionary, but Hunspell does not give it as a spelling
suggestion. So use Auto Correction to make sure that such short accented words
will be handled correctly. And have an entry for "deja vu" with all its accents.
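
One way such Auto Correction entries could be generated, sketched in Python under the assumption that the accented words are available as a plain list: strip the accents from each dictionary word and map the bare form back to the accented one. The function names are hypothetical. A real list would also need to skip cases where the unaccented form is itself a valid word.

```python
import unicodedata


def strip_accents(word):
    """Return word with combining accent marks removed."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def accent_corrections(dictionary_words):
    """Map each unaccented spelling to its accented dictionary form."""
    table = {}
    for word in dictionary_words:
        plain = strip_accents(word)
        if plain != word:  # only words that actually carry accents
            table[plain] = word
    return table


print(accent_corrections(["éclair", "élan", "déjà vu", "seawater"]))
```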

As it is, the Auto Correction feature is wasted. It functions exactly as in
Microsoft Word, catching a few misspelled words, and this is better left to the
spelling dictionary. Instead, the Auto Correction feature could become quite 
useful.




[lingucomponent-issues] [Issue 92383] submit new en_US.dic without the errors

2008-12-11 Thread aardvark12
--- Additional comments from aardvar...@openoffice.org Fri Dec 12 00:01:53 + 2008 ---
You requested that I subscribe to wordlist-de...@lists.sourceforge.net:
 https://lists.sourceforge.net/lists/listinfo/wordlist-devel

I have subscribed, and gotten a confirmation e-mail.

I do not see a problem with other licenses beyond GPL 3. The dictionary
represents years of hard work. My main concern was that I did not want a
corporation to take my word list, encrypt it, and pass it off as their own
spelling checker, sold for their profit. To that end I wanted to work in the
open source community, such as with AbiWord, OpenOffice.org, and Mozilla.

I used MUNCH under Puppy Linux to compile the word list, in dictionary format,
and submitted it as an attachment. If you would prefer to view the word list
before it was compiled, it can be sent in zip format. Then the word choices are
clearer.




[lingucomponent-issues] [Issue 92383] submit new en_US.dic without the errors

2008-12-06 Thread nemeth
--- Additional comments from [EMAIL PROTECTED] Sun Dec  7 04:58:54 + 2008 ---
David, please subscribe to [EMAIL PROTECTED]:
 https://lists.sourceforge.net/lists/listinfo/wordlist-devel

We are working on the en_US dictionary for OpenOffice.org and Mozilla.
Unfortunately, Mozilla has a stricter license policy and needs a GPL/LGPL/MPL
tri-license; GPL 3 alone is not enough, and the same applies to dictionaries
pre-bundled with OpenOffice.org. You have made a lot of nice developments that
we can integrate into the wordlist distribution or the generated
Firefox/OpenOffice.org dictionaries under your name, and we can work together on
a better and up-to-date American English spelling dictionary. But you can also
make your own dictionary version for OpenOffice.org using the Extension support
(http://extensions.services.openoffice.org/).

> The regular en_US.dic has a number of lines with numerals at the beginning
> (about 20 lines). Those can be inserted into this dictionary. I wasn't quite
> sure what those lines meant.

It is for ordinal number checking (1st, *11st etc.)
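
The logic behind accepting "1st" but rejecting "11st" can be sketched in Python as follows. This illustrates the rule only; it is not the mechanism the numeral lines in en_US.dic use, and both function names are hypothetical.

```python
import re


def ordinal_suffix(n):
    """Return the English ordinal suffix for a non-negative integer."""
    if n % 100 in (11, 12, 13):  # 11th, 12th, 13th, 111th, ...
        return "th"
    return {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")


def is_valid_ordinal(token):
    """Check that a token such as '21st' carries the right suffix."""
    match = re.fullmatch(r"(\d+)(st|nd|rd|th)", token)
    return bool(match) and ordinal_suffix(int(match.group(1))) == match.group(2)


for token in ["1st", "11st", "11th", "22nd", "113rd"]:
    print(token, is_valid_ordinal(token))
```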





[lingucomponent-issues] [Issue 92383] submit new en_US.dic without the errors

2008-09-08 Thread aardvark12
--- Additional comments from [EMAIL PROTECTED] Mon Sep  8 18:58:21 + 2008 ---
Here is the integrated US English dictionary for Open Office. You will find many
thousands of new words beyond the existing dictionary. All words were checked
against the American Heritage Dictionary or http://dictionary.reference.com. In
some cases, such as words that begin with the prefix "un," these sources failed
me, and I instead used http://www.merriam-webster.com for a full list of words
with the "un" prefix from an unabridged dictionary. I went through the word list
and added possessives manually.
   
This dictionary is released under the GNU GPL version 3: en_US.dic by David M.
Dibble, copyright September, 2008 (Standard terms apply--This is free software:
you can redistribute it and/or modify it under the terms of the GNU General
Public License as published by the Free Software Foundation, either version 3 of
the License, or (at your option) any later version.)

I compiled the dictionary using MUNCH under Puppy Linux. The regular en_US.dic
has a number of lines with numerals at the beginning (about 20 lines). Those can
be inserted into this dictionary. I wasn't quite sure what those lines meant.

Some quick explanations. Most dictionaries use common conventions. In a word
entry, "OR" means that words have equal weight, as in "burned or burnt" (though
the first listing may have a slight edge). In such cases both words are present
in this dictionary. Dictionaries use "ALSO" to indicate a second-rate or
inferior alternative, so in such cases the first listing should be used in a
spell checker to encourage people to use the best choice. For instance, "papoose
also pappoose." Microsoft Word uses "pappoose," but that word isn't even listed
in the American Heritage Dictionary, and in the Random House Unabridged
Dictionary the word "pappoose" is given as an "ALSO." So "papoose" is the best
choice. However, dictionaries can flat out disagree. Some list "facade"
[unaccented] as the best. Some list "facade" [c cedilla] as best. In that case
both words are in the spell checker. And words change as time passes. "Sea bird"
has always been two words, ("seawater" is one word), but I now think that
"seabird" is acceptable.

Then there are problems of capitalization. My word list has "leno, leno's,
slough, slough's." Jay Leno is a TV personality. Slough is a municipality in
England. So maybe the word list should be "leno, Leno's, slough, Slough's." I
just wasn't sure.

English is used internationally, as one sees on forums. So it seems odd to list
every tiny town in the United States, but ignore the major metropolitan centers
in the rest of the world. So I added many names for major cities, whether in
Japan, or Brazil, or Pakistan. All these names should be correctly accented. And
since I use Linux, I added names like Ubuntu, Xubuntu, Mandriva, AbiWord,
Gnumeric, and so forth.

Many place names have accents. But people often use common names like Yucatan or
Guantanamo or Galapagos without accents, and may not even be aware of the
accented form. So I decided to include both the unaccented words and the
accented words, though often the possessive form is only given for the accented
(correct) word.

I removed words that could cause problems. I previously commented on "Lindberg"
and "Lichtenstein." I also took out "corespondent," as many students will drop
the R when they mean "correspondent," with humorous results. Besides,
"corespondent" seems an outdated word; one very rarely hears it anymore. And I
took out "nob" since students are sure to spell "knob" without the K. But if
there is strong opinion that "corespondent" should be in the word list I would
not object to seeing it put back in. A few months ago I used the word "stelar"
in a review of a Tomb Raider custom level, but that word would just confuse
people who want "stellar," so "stelar" isn't in the word list, either. In other
words, a lot of judgment calls had to be made. Also I took out most of the
hardcore profanity and offensive racial epithets. People can still freely use
these words all they want; the words just aren't in the dictionary. 




[lingucomponent-issues] [Issue 92383] submit new en_US.dic without the errors

2008-09-08 Thread aardvark12
--- Additional comments from [EMAIL PROTECTED] Mon Sep  8 18:53:04 + 2008 ---
Created an attachment (id=56328)
revised en_US.dic





[lingucomponent-issues] [Issue 92383] submit new en_US.dic without the errors

2008-08-04 Thread nemeth

User nemeth changed the following:

What            |Old value         |New value
----------------+------------------+------------------
Assigned to     |[EMAIL PROTECTED] |nemeth
Status          |STARTED           |NEW
Target milestone|---               |OOo 3.1
Version         |OOo 2.4.1         |OOo 3.0 Beta 2





--- Additional comments from [EMAIL PROTECTED] Mon Aug  4 16:26:03 + 2008 ---
Target: 3.1




[lingucomponent-issues] [Issue 92383] submit new en_US.dic without the errors

2008-08-04 Thread nemeth

User nemeth changed the following:

What            |Old value         |New value
----------------+------------------+------------------
Ever confirmed  |                  |1
Status          |UNCONFIRMED       |STARTED





--- Additional comments from [EMAIL PROTECTED] Mon Aug  4 16:24:43 + 2008 ---
David,

Thanks in advance for your great contribution. I have just started making a new
version for morphological analysis and generation based on the old en_US
dictionary and WordNet data. Kevin Atkinson is also working on a maintained
version of the OpenOffice.org en_US dictionary; see the result in recent
Mozilla Firefox releases (also here:
https://bugzilla.mozilla.org/show_bug.cgi?id=397150 and
http://wordlist.sourceforge.net). Unfortunately, it contains the same errors:

$ grep '\(.\)\1\1' en_US.dic
AAA
Andeee/M
Annnora/M
BBB
Diannne/M
Harwilll/M
KKK/M
Lilllie/M
Minnnie/M
Rafaellle/M
SSS
Sonnnie/M
WWW/M
iii
viii
...
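
For anyone without grep at hand, the same check can be sketched in Python. The regular expression mirrors the backreference pattern in the command above; the function name and the sample entries are only illustrative. Hits such as acronyms or Roman numerals may be intentional, so the output is best treated as a list of candidates for review.

```python
import re

# Any character followed by two more copies of itself,
# the same pattern as grep '\(.\)\1\1'.
TRIPLE = re.compile(r"(.)\1\1")


def find_triples(lines):
    """Return the entries that contain a character repeated three times."""
    return [line.rstrip("\n") for line in lines if TRIPLE.search(line)]


# Illustrative sample; in practice, pass an open en_US.dic file object.
sample = ["Andeee/M\n", "Annie/M\n", "WWW/M\n", "viii\n", "apple/S\n"]
print(find_triples(sample))
```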

I'd also like to examine corpus-based methods for improving the dictionary
data. I will use this issue to discuss the planned dictionary
improvements.

Best regards,
László





[lingucomponent-issues] [Issue 92383] submit new en_US.dic without the errors

2008-08-04 Thread mru

User mru changed the following:

What            |Old value         |New value
----------------+------------------+------------------
Assigned to     |mru               |[EMAIL PROTECTED]
Component       |Word processor    |lingucomponent
QA contact      |[EMAIL PROTECTED] |[EMAIL PROTECTED]
Subcomponent    |programming       |spell checking





--- Additional comments from [EMAIL PROTECTED] Mon Aug  4 13:11:01 + 2008 ---
Reassigned to lingucomponent.
