Re: missing words, lots of them

2001-11-08 Thread Joseph Jacobson

> These words, 830 of them, were obtained by intersecting the words
> in a number of lexicons and then subtracting the words in
> /usr/share/dict/web2. This was all done with words that contain only
> lowercase letters.
>
> You'll find those words at the end of this message. Should you
> take even a cursory look at this list, I expect you'll be appalled
> at the words that are not in the lexicon.
>
> The point is *not* that these words should be added. The point is
> that a cursory, in-my-sleep check of the word list shows glaring
> deficiencies. A serious audit of the list will find many more
> missing words (I did a preliminary one: think ~50,000-100,000
> missing words if it is supposed to approximate the contents of an
> unabridged dictionary).
>
> Anyway, I'm willing to create a replacement list, if it's likely
> to actually get used.

Wondering if anything became of this. It would be nice to have a
relatively complete word list. [URL missing] contains a good
summary of publicly available word lists. IMHO, the ENABLE list
mentioned there ([URL missing]) seems like a good candidate for a
drop-in replacement.


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-hackers in the body of the message

missing words, lots of them

2001-09-25 Thread admin

These words, 830 of them, were obtained by intersecting the words
in a number of lexicons and then subtracting the words in
/usr/share/dict/web2. This was all done with words that contain only
lowercase letters.
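
The procedure described above (keep lowercase-only words, intersect
several lexicons, subtract web2) can be sketched in Python roughly as
follows. This is a minimal sketch, not the script actually used for the
list in this message: the function names are hypothetical, and it
assumes each lexicon is a plain-text file with one word per line.

```python
import re

# Assumption: "only lowercase letters" means the ASCII range a-z.
LOWERCASE_ONLY = re.compile(r"[a-z]+")


def lowercase_words(lines):
    """Return the set of words that consist solely of lowercase a-z.

    `lines` is any iterable of strings, e.g. an open file with one
    word per line. Surrounding whitespace and newlines are ignored.
    """
    words = set()
    for line in lines:
        word = line.strip()
        if LOWERCASE_ONLY.fullmatch(word):
            words.add(word)
    return words


def missing_words(lexicons, reference):
    """Words present in every lexicon but absent from the reference list.

    `lexicons` is a list of sets; `reference` is a set (for example the
    words loaded from /usr/share/dict/web2).
    """
    if not lexicons:
        return set()
    common = set.intersection(*lexicons)  # words all lexicons agree on
    return common - reference             # drop what the reference has
```

To run it against real files, one could open each lexicon and
/usr/share/dict/web2, pass the file objects to lowercase_words(), and
print sorted(missing_words(...)). Intersecting first keeps only words
that every lexicon agrees on, which makes the result a conservative
lower bound on what the reference list lacks.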

You'll find those words at the end of this message. Should you
take even a cursory look at this list, I expect you'll be appalled
at the words that are not in the lexicon.

The point is *not* that these words should be added. The point is
that a cursory, in-my-sleep check of the word list shows glaring
deficiencies. A serious audit of the list will find many more
missing words (I did a preliminary one: think ~50,000-100,000
missing words if it is supposed to approximate the contents of an
unabridged dictionary).

Anyway, I'm willing to create a replacement list, if it's likely
to actually get used.

Missing words:

abbreviated aborning absentminded academia acknowledgment actuate addictive
adios aerospace affairs affianced aficionado aforementioned agonized agonizing
agribusiness airborne airflow airline airs airspace airspeed alleged allowed
aloes aloha alphanumeric ammoniac analog anechoic anomie antebellum
antecedents antidisestablishmentarianism antimatter antimicrobial antipasto
antiperspirant anytime appetizing apples appointed arbitrage archives arrears
artwork ashtray assembled asserted attested audiovisual authors
autocorrelation automate automation auxiliaries avionics awed

backpack backstairs bags baklava balls bananas bandwagon bandwidth bans
baptistry barbell barracks barrens baseline bawdy beatnik becalmed became
bedraggled beep began begot belongings biased bidden bifocals bijection
bimetallic binoculars biofeedback biomass biomedicine blabbermouth blacklist
bleachers blew blinders blinkers blond bloodstream boatyard bobsledding
boldface boogie boson botulinus bouncy bounds boutique box boyfriend
braggadocio brainchild brainstorm bratwurst breathtaking briefcase brindle
brindled brinkmanship britches brouhaha buckskins bugs bullshit bullyboy
bumbling bunkmate burdened burger busboy butterflies buttocks byte

cacciatore cannonball capita cards careerism cartwheel cassette castanets
catalog caulk caveman challenging chambers changeover charged checklist
checkpoint children chipboard chops chorale chromaticness ciao circuitry
clannish classics classify classless clericals clipboard clippers clomp
clueless cm coastline combo computerize coney confer cons contend
contrabassoon contrived conveyor cookie cooperate cooperation cooperative
coordinate coordination copywriter cordless cords corduroys corned
corticosteroid councillor counterproductive courses coven coverall cowgirl
cowpoke credits creeps critter crocked crud cuffs curia curtsey

damaging damnedest danged dated dateline deadweight debug decencies decoder
decor decrypt dedicated deli demo demodulate demur demythologize denims
deprived desegregate despatch destined destruct diminished directions
disadvantaged disembodied dishonesty distended disulfide disused divertimento
dividers dogleg dominations dominions donnybrook doubleheader doubles downs
downwards drily droppings drops dryer duce ducks duds dues dumbfound dumps

eastwards eatables ecstatics ecumenicist edibles eggbeater einsteinium elan
electromyography elevenses emaciated emancipated endgame enervated epoxy
equities ergo erstwhile esprit esthete estranged evenings expecting expertise
extramarital eyeglasses

fallout falsity famed famished fantasize farmland fatigued fatso feathers feet
feints femme fermion fermium ferroelectric fete fewer fiberglass fief
fieldstone filigreed finicky fireworks fisticuffs fjord flamethrower flashback
flashbulb flats floats floorboard fluorocarbon foiled fond footloose forensics
forgave forgiven forklift forsook foxed frag frazzled fresher freshwater
frostbitten frustrated fumed futures

gadgetry geese gigahertz gills gimbals girlfriend gizmo glim glob globetrotter
goddamn goddamned goggles goodbye gooey grapheme greatest greenbelt groceries
grooved groundhog gunfight gunslinger gutsy

hadron haiku hairstyle hammered handcrafted handlebar hands handyman hang
hangover harassed hardboard hardened hards hardtop hardworking harken harrumph
has haunted haunting heads headwaters heated heaves heist held hellfire
helluva hereabouts heres heroics hibachi hid hideout hightail hijacker hipster
histrionics hoagy holography homecoming honky hooray hooves hopping horrified
horseshoes hubris hype

icosahedron idiolect idyll immersed implied impoverished improvised incised
inclined including incorporating industrials inherited innards instructions
integrated interdisciplinary interfaith interferon intransigence intro ironic
irons irritated isometrics itemized ivories

jackboot jackpot jacks jaundiced jaws jeepers jetliner jiggered jigging jigsaw
jockstrap jumbled junkie

karat kcal keystroke kg kid kidding killjoy kilohertz kitsch klutz km knives
knockwurst krill

Re: missing words, lots of them

2001-09-25 Thread Christoph Sold


> These words, 830 of them, were obtained by intersecting the words
> in a number of lexicons and then subtracting the words in
> /usr/share/dict/web2. [snip]
>
> The point is *not* that these words should be added. The point is
> that a cursory, in-my-sleep check of the word list shows glaring
> deficiencies. A serious audit of the list will find many more
> missing words (I did a preliminary one: think ~50,000-100,000
> missing words if it is supposed to approximate the contents of an
> unabridged dictionary).

This is to be expected. The word list was created from a very old
Webster dictionary, because the copyright had to expire before it could
be used in an open source dictionary.

> Anyway, I'm willing to create a replacement list, if it's likely
> to actually get used.

That would be a welcome contribution to the project.

Missing words: [Lots of them deleted]

Just my EUR.02
-Christoph Sold


Re: missing words, lots of them

2001-09-25 Thread admin

Christoph Sold [EMAIL PROTECTED] wrote:
> This is to be expected. The word list was created from a very old
> Webster dictionary, because the copyright had to expire before it could
> be used in an open source dictionary.

I know. :) However, there are several lexicons that have
acceptable copyrights. E.g., the Moby list, though I have some
reservations about it, is public domain. So there's no good reason
to live with an archaic list.
