Added BSD / MIT like licensed dictionaries from SCOWL (http://wordlist.aspell.net + https://github.com/kevina/wordlist)
Project: http://git-wip-us.apache.org/repos/asf/flex-utilities/repo
Commit: http://git-wip-us.apache.org/repos/asf/flex-utilities/commit/54451db4
Tree: http://git-wip-us.apache.org/repos/asf/flex-utilities/tree/54451db4
Diff: http://git-wip-us.apache.org/repos/asf/flex-utilities/diff/54451db4

Branch: refs/heads/master
Commit: 54451db476bf792a506d7e288f176123c5285efd
Parents: 94ea50d
Author: Justin Mclean <jmcl...@apache.org>
Authored: Thu Sep 4 18:01:13 2014 +1000
Committer: Justin Mclean <jmcl...@apache.org>
Committed: Thu Sep 4 18:01:13 2014 +1000

----------------------------------------------------------------------
 Squiggly/dictionaries/en_GB/README    |   309 +
 Squiggly/dictionaries/en_GB/en_GB.aff |   201 +
 Squiggly/dictionaries/en_GB/en_GB.dic | 48651 +++++++++++++++++++++++++++
 Squiggly/dictionaries/en_US/README    |   309 +
 Squiggly/dictionaries/en_US/en_US.aff |   201 +
 Squiggly/dictionaries/en_US/en_US.dic | 48437 ++++++++++++++++++++++++++
 6 files changed, 98108 insertions(+)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/flex-utilities/blob/54451db4/Squiggly/dictionaries/en_GB/README
----------------------------------------------------------------------
diff --git a/Squiggly/dictionaries/en_GB/README b/Squiggly/dictionaries/en_GB/README
new file mode 100644
index 0000000..a3a6270
--- /dev/null
+++ b/Squiggly/dictionaries/en_GB/README
@@ -0,0 +1,309 @@
+en_GB-ise Hunspell Dictionary
+Version 2014.08.11
+Mon Aug 11 18:23:56 2014 +0200 [be45e88]
+http://wordlist.sourceforge.net
+
+README file for English Hunspell dictionaries derived from SCOWL.
+
+These dictionaries are created using the speller/make-hunspell-dict
+script in SCOWL.
+
+The following dictionaries are available:
+
+  en_US (American)
+  en_CA (Canadian)
+  en_GB-ise (British with "ize" spelling)
+  en_GB-ize (British with "ize" spelling)
+
+  en_US-large
+  en_CA-large
+  en_GB-large (with both "ize" and "ise" spelling)
+
+The normal (non-large) dictionaries correspond to SCOWL size 60 and,
+to encourage consistent spelling, generally only include one spelling
+variant for a word. The large dictionaries correspond to SCOWL size
+70 and may include multiple spelling for a word when both variants are
+considered almost equal. Also, the general quality of the larger
+dictionaries may also be less as they are not as carefully checked for
+errors as the normal dictionaries.
+
+To get an idea of the difference in size, here are 25 random words
+only found in the large dictionary for American English:
+
+  Bermejo Freyr's Guenevere Hatshepsut Nottinghamshire arrestment
+  crassitudes crural dogwatches errorless fetial flaxseeds godroon
+  incretion jalapeño's kelpie kishkes neuroglias pietisms pullulation
+  stemwinder stenoses syce thalassic zees
+
+The en_US and en_CA are the official dictionaries for Hunspell. The
+en_GB and large dictionaries are made available on an experimental
+basis. If you find them useful please send me a quick email at
+kev...@gnu.org.
+
+If none of these dictionaries suite you (for example, maybe you want
+the larger dictionary but only use spelling of a word) additional
+dictionaries can be generated at http://app.aspell.net/create or by
+modifying speller/make-hunspell-dict in SCOWL. Please do let me know
+if you end up publishing a customized dictionary.
+
+If a word is not found in the dictionary or a word is there you think
+shouldn't be, you can lookup the word up at http://app.aspell.net/lookup
+to help determine why that is.
+
+General comments on these list can be sent directly to me at
+kev...@gnu.org or to the wordlist-devel mailing lists
+(https://lists.sourceforge.net/lists/listinfo/wordlist-devel). If you
+have specific issues with any of these dictionaries please file a bug
+report at https://github.com/kevina/wordlist/issues.
+
+ADDITIONAL NOTES:
+
+The NOSUGGEST flag was added to certain taboo words. While I made an
+honest attempt to flag the strongest taboo words with the NOSUGGEST
+flag, I MAKE NO GUARANTEE THAT I FLAGGED EVERY POSSIBLE TABOO WORD.
+The list was originally derived from Németh László, however I removed
+some words which, while being considered taboo by some dictionaries,
+are not really considered swear words in today's society.
+
+COPYRIGHT, SOURCES, and CREDITS:
+
+The English dictionaries come directly from SCOWL
+and is thus under the same copyright of SCOWL. The affix file is
+a heavily modified version of the original english.aff file which was
+released as part of Geoff Kuenning's Ispell and as such is covered by
+his BSD license. Part of SCOWL is also based on Ispell thus the
+Ispell copyright is included with the SCOWL copyright.
+
+The collective work is Copyright 2000-2014 by Kevin Atkinson as well
+as any of the copyrights mentioned below:
+
+  Copyright 2000-2014 by Kevin Atkinson
+
+  Permission to use, copy, modify, distribute and sell these word
+  lists, the associated scripts, the output created from the scripts,
+  and its documentation for any purpose is hereby granted without fee,
+  provided that the above copyright notice appears in all copies and
+  that both that copyright notice and this permission notice appear in
+  supporting documentation. Kevin Atkinson makes no representations
+  about the suitability of this array for any purpose. It is provided
+  "as is" without express or implied warranty.
+
+Alan Beale <bil...@pobox.com> also deserves special credit as he has,
+in addition to providing the 12Dicts package and being a major
+contributor to the ENABLE word list, given me an incredible amount of
+feedback and created a number of special lists (those found in the
+Supplement) in order to help improve the overall quality of SCOWL.
+
+The 10 level includes the 1000 most common English words (according to
+the Moby (TM) Words II [MWords] package), a subset of the 1000 most
+common words on the Internet (again, according to Moby Words II), and
+frequently class 16 from Brian Kelk's "UK English Wordlist
+with Frequency Classification".
+
+The MWords package was explicitly placed in the public domain:
+
+  The Moby lexicon project is complete and has
+  been place into the public domain. Use, sell,
+  rework, excerpt and use in any way on any platform.
+
+  Placing this material on internal or public servers is
+  also encouraged. The compiler is not aware of any
+  export restrictions so freely distribute world-wide.
+
+  You can verify the public domain status by contacting
+
+  Grady Ward
+  3449 Martha Ct.
+  Arcata, CA 95521-4884
+
+  gr...@netcom.com
+  gr...@northcoast.com
+
+The "UK English Wordlist With Frequency Classification" is also in the
+Public Domain:
+
+  Date: Sat, 08 Jul 2000 20:27:21 +0100
+  From: Brian Kelk <brian.k...@cl.cam.ac.uk>
+
+  > I was wondering what the copyright status of your "UK English
+  > Wordlist With Frequency Classification" word list as it seems to
+  > be lacking any copyright notice.
+
+  There were many many sources in total, but any text marked
+  "copyright" was avoided. Locally-written documentation was one
+  source. An earlier version of the list resided in a filespace called
+  PUBLIC on the University mainframe, because it was considered public
+  domain.
+
+  Date: Tue, 11 Jul 2000 19:31:34 +0100
+
+  > So are you saying your word list is also in the public domain?
+
+  That is the intention.
+
+The 20 level includes frequency classes 7-15 from Brian's word list.
+
+The 35 level includes frequency classes 2-6 and words appearing in at
+least 11 of 12 dictionaries as indicated in the 12Dicts package. All
+words from the 12Dicts package have had likely inflections added via
+my inflection database.
+
+The 12Dicts package and Supplement is in the Public Domain.
+
+The WordNet database, which was used in the creation of the
+Inflections database, is under the following copyright:
+
+  This software and database is being provided to you, the LICENSEE,
+  by Princeton University under the following license. By obtaining,
+  using and/or copying this software and database, you agree that you
+  have read, understood, and will comply with these terms and
+  conditions.:
+
+  Permission to use, copy, modify and distribute this software and
+  database and its documentation for any purpose and without fee or
+  royalty is hereby granted, provided that you agree to comply with
+  the following copyright notice and statements, including the
+  disclaimer, and that the same appear on ALL copies of the software,
+  database and documentation, including modifications that you make
+  for internal use or for distribution.
+
+  WordNet 1.6 Copyright 1997 by Princeton University. All rights
+  reserved.
+
+  THIS SOFTWARE AND DATABASE IS PROVIDED "AS IS" AND PRINCETON
+  UNIVERSITY MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR
+  IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, PRINCETON
+  UNIVERSITY MAKES NO REPRESENTATIONS OR WARRANTIES OF MERCHANT-
+  ABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF THE
+  LICENSED SOFTWARE, DATABASE OR DOCUMENTATION WILL NOT INFRINGE ANY
+  THIRD PARTY PATENTS, COPYRIGHTS, TRADEMARKS OR OTHER RIGHTS.
+
+  The name of Princeton University or Princeton may not be used in
+  advertising or publicity pertaining to distribution of the software
+  and/or database. Title to copyright in this software, database and
+  any associated documentation shall at all times remain with
+  Princeton University and LICENSEE agrees to preserve same.
+
+The 40 level includes words from Alan's 3esl list found in version 4.0
+of his 12dicts package. Like his other stuff the 3esl list is also in the
+public domain.
+
+The 50 level includes Brian's frequency class 1, words appearing
+in at least 5 of 12 of the dictionaries as indicated in the 12Dicts
+package, and uppercase words in at least 4 of the previous 12
+dictionaries. A decent number of proper names is also included: The
+top 1000 male, female, and Last names from the 1990 Census report; a
+list of names sent to me by Alan Beale; and a few names that I added
+myself. Finally a small list of abbreviations not commonly found in
+other word lists is included.
+
+The name files form the Census report is a government document which I
+don't think can be copyrighted.
+
+The file special-jargon.50 uses common.lst and word.lst from the
+"Unofficial Jargon File Word Lists" which is derived from "The Jargon
+File". All of which is in the Public Domain. This file also contain
+a few extra UNIX terms which are found in the file "unix-terms" in the
+special/ directory.
+
+The 55 level includes words from Alan's 2of4brif list found in version
+4.0 of his 12dicts package. Like his other stuff the 2of4brif is also
+in the public domain.
+
+The 60 level includes all words appearing in at least 2 of the 12
+dictionaries as indicated by the 12Dicts package.
+
+The 70 level includes Brian's frequency class 0 and the 74,550 common
+dictionary words from the MWords package. The common dictionary words,
+like those from the 12Dicts package, have had all likely inflections
+added. The 70 level also included the 5desk list from version 4.0 of
+the 12Dics package which is in the public domain.
+
+The 80 level includes the ENABLE word list, all the lists in the
+ENABLE supplement package (except for ABLE), the "UK Advanced Cryptics
+Dictionary" (UKACD), the list of signature words from the YAWL package,
+and the 10,196 places list from the MWords package.
+
+The ENABLE package, mainted by M\Cooper <thegren...@theriver.com>,
+is in the Public Domain:
+
+  The ENABLE master word list, WORD.LST, is herewith formally released
+  into the Public Domain. Anyone is free to use it or distribute it in
+  any manner they see fit. No fee or registration is required for its
+  use nor are "contributions" solicited (if you feel you absolutely
+  must contribute something for your own peace of mind, the authors of
+  the ENABLE list ask that you make a donation on their behalf to your
+  favorite charity). This word list is our gift to the Scrabble
+  community, as an alternate to "official" word lists. Game designers
+  may feel free to incorporate the WORD.LST into their games. Please
+  mention the source and credit us as originators of the list. Note
+  that if you, as a game designer, use the WORD.LST in your product,
+  you may still copyright and protect your product, but you may *not*
+  legally copyright or in any way restrict redistribution of the
+  WORD.LST portion of your product. This *may* under law restrict your
+  rights to restrict your users' rights, but that is only fair.
+
+UKACD, by J Ross Beresford <r...@bryson.demon.co.uk>, is under the
+following copyright:
+
+  Copyright (c) J Ross Beresford 1993-1999. All Rights Reserved.
+
+  The following restriction is placed on the use of this publication:
+  if The UK Advanced Cryptics Dictionary is used in a software package
+  or redistributed in any form, the copyright notice must be
+  prominently displayed and the text of this document must be included
+  verbatim.
+
+  There are no other restrictions: I would like to see the list
+  distributed as widely as possible.
+
+The 95 level includes the 354,984 single words, 256,772 compound
+words, 4,946 female names and the 3,897 male names, and 21,986 names
+from the MWords package, ABLE.LST from the ENABLE Supplement, and some
+additional words found in my part-of-speech database that were not
+found anywhere else.
+
+Accent information was taken from UKACD.
+
+My VARCON package was used to create the American, British, and
+Canadian word list.
+
+Since the original word lists used in the VARCON package came
+from the Ispell distribution they are under the Ispell copyright:
+
+  Copyright 1993, Geoff Kuenning, Granada Hills, CA
+  All rights reserved.
+
+  Redistribution and use in source and binary forms, with or without
+  modification, are permitted provided that the following conditions
+  are met:
+
+  1. Redistributions of source code must retain the above copyright
+     notice, this list of conditions and the following disclaimer.
+  2. Redistributions in binary form must reproduce the above copyright
+     notice, this list of conditions and the following disclaimer in the
+     documentation and/or other materials provided with the distribution.
+  3. All modifications to the source code must be clearly marked as
+     such. Binary redistributions based on modified source code
+     must be clearly marked as modified versions in the documentation
+     and/or other materials provided with the distribution.
+  (clause 4 removed with permission from Geoff Kuenning)
+  5. The name of Geoff Kuenning may not be used to endorse or promote
+     products derived from this software without specific prior
+     written permission.
+
+  THIS SOFTWARE IS PROVIDED BY GEOFF KUENNING AND CONTRIBUTORS ``AS
+  IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+  LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+  FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL GEOFF
+  KUENNING OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+  INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+  BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+  LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+  CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+  LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+  ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+  POSSIBILITY OF SUCH DAMAGE.
+
+Build Date: Mon Aug 11 18:27:26 CEST 2014
+Wordlist Command: mk-list en_GB-ise 60 | deaccent

http://git-wip-us.apache.org/repos/asf/flex-utilities/blob/54451db4/Squiggly/dictionaries/en_GB/en_GB.aff
----------------------------------------------------------------------
diff --git a/Squiggly/dictionaries/en_GB/en_GB.aff b/Squiggly/dictionaries/en_GB/en_GB.aff
new file mode 100644
index 0000000..2ddd985
--- /dev/null
+++ b/Squiggly/dictionaries/en_GB/en_GB.aff
@@ -0,0 +1,201 @@
+SET ISO8859-1
+TRY esianrtolcdugmphbyfvkwzESIANRTOLCDUGMPHBYFVKWZ'
+NOSUGGEST !
+
+# ordinal numbers
+COMPOUNDMIN 1
+# only in compounds: 1th, 2th, 3th
+ONLYINCOMPOUND c
+# compound rules:
+# 1. [0-9]*1[0-9]th (10th, 11th, 12th, 56714th, etc.)
+# 2. [0-9]*[02-9](1st|2nd|3rd|[4-9]th) (21st, 22nd, 123rd, 1234th, etc.)
+COMPOUNDRULE 2
+COMPOUNDRULE n*1t
+COMPOUNDRULE n*mp
+WORDCHARS 0123456789
+
+PFX A Y 1
+PFX A 0 re .
+
+PFX I Y 1
+PFX I 0 in .
+
+PFX U Y 1
+PFX U 0 un .
+
+PFX C Y 1
+PFX C 0 de .
+
+PFX E Y 1
+PFX E 0 dis .
+
+PFX F Y 1
+PFX F 0 con .
+
+PFX K Y 1
+PFX K 0 pro .
+
+SFX V N 2
+SFX V e ive e
+SFX V 0 ive [^e]
+
+SFX N Y 3
+SFX N e ion e
+SFX N y ication y
+SFX N 0 en [^ey]
+
+SFX X Y 3
+SFX X e ions e
+SFX X y ications y
+SFX X 0 ens [^ey]
+
+SFX H N 2
+SFX H y ieth y
+SFX H 0 th [^y]
+
+SFX Y Y 1
+SFX Y 0 ly .
+
+SFX G Y 2
+SFX G e ing e
+SFX G 0 ing [^e]
+
+SFX J Y 2
+SFX J e ings e
+SFX J 0 ings [^e]
+
+SFX D Y 4
+SFX D 0 d e
+SFX D y ied [^aeiou]y
+SFX D 0 ed [^ey]
+SFX D 0 ed [aeiou]y
+
+SFX T N 4
+SFX T 0 st e
+SFX T y iest [^aeiou]y
+SFX T 0 est [aeiou]y
+SFX T 0 est [^ey]
+
+SFX R Y 4
+SFX R 0 r e
+SFX R y ier [^aeiou]y
+SFX R 0 er [aeiou]y
+SFX R 0 er [^ey]
+
+SFX Z Y 4
+SFX Z 0 rs e
+SFX Z y iers [^aeiou]y
+SFX Z 0 ers [aeiou]y
+SFX Z 0 ers [^ey]
+
+SFX S Y 4
+SFX S y ies [^aeiou]y
+SFX S 0 s [aeiou]y
+SFX S 0 es [sxzh]
+SFX S 0 s [^sxzhy]
+
+SFX P Y 3
+SFX P y iness [^aeiou]y
+SFX P 0 ness [aeiou]y
+SFX P 0 ness [^y]
+
+SFX M Y 1
+SFX M 0 's .
+
+SFX B Y 3
+SFX B 0 able [^aeiou]
+SFX B 0 able ee
+SFX B e able [^aeiou]e
+
+SFX L Y 1
+SFX L 0 ment .
+
+REP 88
+REP a ei
+REP ei a
+REP a ey
+REP ey a
+REP ai ie
+REP ie ai
+REP are air
+REP are ear
+REP are eir
+REP air are
+REP air ere
+REP ere air
+REP ere ear
+REP ere eir
+REP ear are
+REP ear air
+REP ear ere
+REP eir are
+REP eir ere
+REP ch te
+REP te ch
+REP ch ti
+REP ti ch
+REP ch tu
+REP tu ch
+REP ch s
+REP s ch
+REP ch k
+REP k ch
+REP f ph
+REP ph f
+REP gh f
+REP f gh
+REP i igh
+REP igh i
+REP i uy
+REP uy i
+REP i ee
+REP ee i
+REP j di
+REP di j
+REP j gg
+REP gg j
+REP j ge
+REP ge j
+REP s ti
+REP ti s
+REP s ci
+REP ci s
+REP k cc
+REP cc k
+REP k qu
+REP qu k
+REP kw qu
+REP o eau
+REP eau o
+REP o ew
+REP ew o
+REP oo ew
+REP ew oo
+REP ew ui
+REP ui ew
+REP oo ui
+REP ui oo
+REP ew u
+REP u ew
+REP oo u
+REP u oo
+REP u oe
+REP oe u
+REP u ieu
+REP ieu u
+REP ue ew
+REP ew ue
+REP uff ough
+REP oo ieu
+REP ieu oo
+REP ier ear
+REP ear ier
+REP ear air
+REP air ear
+REP w qu
+REP qu w
+REP z ss
+REP ss z
+REP shun tion
+REP shun sion
+REP shun cion
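
Reviewer note (not part of the commit): the SFX entries in en_GB.aff each read "flag, characters to strip, characters to add, condition on the end of the stem". The Python sketch below shows how the four `SFX D` rules from the diff would turn a stem into its "-ed" form. It is a rough illustration only: `SFX_D` and `apply_suffix` are hypothetical names, the conditions are treated as ordinary regular expressions, and real Hunspell additionally handles flags on .dic entries, prefix/suffix cross-products, and compounding, none of which is modelled here.

```python
import re

# The four "SFX D" rules copied from en_GB.aff in this commit.
# Each tuple is (strip, add, condition); "0" means "strip nothing".
SFX_D = [
    ("0", "d", "e"),
    ("y", "ied", "[^aeiou]y"),
    ("0", "ed", "[^ey]"),
    ("0", "ed", "[aeiou]y"),
]


def apply_suffix(stem, rules):
    """Return every form produced by the rules whose condition matches the stem.

    Simplifying assumption: a condition is used as a regex anchored at the
    end of the stem, and "." is treated as "always matches".
    """
    forms = []
    for strip, add, condition in rules:
        if condition != "." and not re.search(condition + "$", stem):
            continue
        if strip != "0":
            if not stem.endswith(strip):
                continue
            base = stem[: len(stem) - len(strip)]
        else:
            base = stem
        forms.append(base + add)
    return forms


if __name__ == "__main__":
    for word in ("create", "try", "play", "walk"):
        print(word, "->", apply_suffix(word, SFX_D))
```

Because the four conditions are mutually exclusive, each stem matches exactly one rule, which is why the rule set can be applied without any priority ordering.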