Dilwyn Jones wrote:
...
One thing I need help with is what appears to be a simple ASCII
compression scheme in some of the Unix-sourced word lists. I'm
assuming they're from Unix systems because each line ends with a
linefeed only, with no carriage return. I need to find out whether the
following is a known standard or not:
After many words there's a forward slash followed by single letters
indicating various word endings, e.g. Abbreviate/DGNSX or ABBEY/MS.
In some cases it's quite obvious: /S indicates that the plural or the
present-tense (-s) form is valid, e.g. Abbreviates, and /D implies
Abbreviated. There is a certain amount of grammar dependency, though:
PLAY/S would mean that both PLAY and PLAYS are valid, but ABNORMALITY/S
is less easy because the plural is ABNORMALITIES.
I can of course go through the word list until I find all the
permutations, but if anyone already knows the scheme used, it would
help me enormously. I hate reinventing wheels!
Found it. It is ispell (http://www.lasr.cs.ucla.edu/geoff/ispell.html).
Each letter is a code for the prefixes and suffixes that are permissible in
the given language; every language can have its own affix file to define
what each letter means!
The file is a munched (i.e. condensed) dictionary used by the spell checker.
Expanding some of your entries gives the full word forms, e.g.:
abate/DGRS
abbey/MS
abbreviate/DGNSX
abnormality/S
expand to:
abate
abated # D flag
abating # G flag
abater # R flag
abates # S flag
abbey
abbey's # M
abbeys # S
abbreviate
abbreviates # S
abbreviated # D
abbreviating # G
abbreviation # N
abbreviations # X
abnormality
abnormalities # S
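As a first step, each line of the list has to be split at the slash into the
root word and its flag letters. A minimal Python sketch, assuming one entry
per line with LF endings (as in your Unix-sourced files); read_munched is a
hypothetical name and "able" a made-up flagless entry:

```python
def read_munched(text):
    """Split ispell-style 'word/FLAGS' lines into (word, flags) pairs.

    Assumes LF line endings; any stray carriage returns are stripped too.
    Words with no slash get an empty flag string.
    """
    entries = []
    for line in text.split("\n"):
        line = line.rstrip("\r").strip()
        if not line:
            continue  # skip blank lines, e.g. after the final linefeed
        word, _, flags = line.partition("/")
        entries.append((word, flags))
    return entries


print(read_munched("abate/DGRS\nabbey/MS\nable\n"))
```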
Each letter is like a macro. The man page summarises them:
In the following list, an asterisk indicates that a flag
participates in cross-product formation (see ispell(4)).
...
Prefixes:
*A - re
*I - in
*U - un
Suffixes:
V - ive
*N - ion, tion, en
*X - ions, ications, ens
H - th, ieth
*Y - ly, ily
*G - ing
*J - ings
*D - ed
T - est
*R - er
*Z - ers
*S - s, es, ies
*P - ness, iness
*M - 's
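The summary can be turned into a lookup table for checking what an entry's
flags mean. A rough Python sketch; FLAGS and describe() are illustrative
names, and the endings are only the man page's summary, not the actual
rules (which follow):

```python
# Flag summary from the ispell man page excerpt (English affix file).
# Each entry: flag -> (kind, typical endings, takes part in cross-products).
FLAGS = {
    "A": ("prefix", "re", True),
    "I": ("prefix", "in", True),
    "U": ("prefix", "un", True),
    "V": ("suffix", "ive", False),
    "N": ("suffix", "ion, tion, en", True),
    "X": ("suffix", "ions, ications, ens", True),
    "H": ("suffix", "th, ieth", False),
    "Y": ("suffix", "ly, ily", True),
    "G": ("suffix", "ing", True),
    "J": ("suffix", "ings", True),
    "D": ("suffix", "ed", True),
    "T": ("suffix", "est", False),
    "R": ("suffix", "er", True),
    "Z": ("suffix", "ers", True),
    "S": ("suffix", "s, es, ies", True),
    "P": ("suffix", "ness, iness", True),
    "M": ("suffix", "'s", True),
}


def describe(entry):
    """List what each flag on a 'word/FLAGS' entry stands for."""
    word, _, flags = entry.partition("/")
    notes = []
    for flag in flags:
        kind, endings, starred = FLAGS.get(flag, ("unknown", "?", False))
        # A leading '*' marks flags that participate in cross-products.
        notes.append(f"{'*' if starred else ' '}{flag}: {kind} {endings}")
    return word, notes


print(describe("abbreviate/DGNSX"))
```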
The actual rules use regular-expression-style conditions to decide which
letters to strip from the word and which ending to add, which is how the
spelling changes (e.g. Y to IES) are handled. Here are the rules from the
english.aff file:
# Here's a record of flags used, in case you want to add new ones.
# Right now, we fit within the minimal MASKBITS definition.
#
#            ABCDEFGHIJKLMNOPQRSTUVWXYZ
# Used:      *  *  ****  ** * ***** ***
#            A  D  GHIJ  MN P RSTUV XYZ
# Available:  -- --    --  - -     -
#             BC EF    KL  O Q     W
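The Used/Available record can be recomputed from the set of defined flags,
which is handy when picking letters for new ones. A small Python sketch;
USED simply lists the flags defined in this file, and the function names are
made up for illustration:

```python
import string

# Flag letters defined in this english.aff excerpt: prefixes, then suffixes.
USED = set("AIU") | set("VNXHYGJDTRZSPM")


def used_row(used):
    """One character per letter A-Z: '*' if the flag is taken, else blank."""
    return "".join("*" if c in used else " " for c in string.ascii_uppercase)


def available(used):
    """Letters still free for new flags."""
    return "".join(c for c in string.ascii_uppercase if c not in used)


print(string.ascii_uppercase)
print(used_row(USED))
print(available(USED))
```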
# Now the prefix table. There are only three prefixes that are truly
# frequent in English, and none of them seem to need conditional variations.
#
prefixes
flag *A:
. > RE # As in enter > reenter
flag *I:
. > IN # As in disposed > indisposed
flag *U:
. > UN # As in natural > unnatural
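A prefix whose condition is "." applies to any word. The Python sketch below
adds prefixes and, as a simplified model of cross-product formation, also
prefixes forms already produced by starred suffix flags. cross_product() and
the entry "enter/AD" are hypothetical, and the suffixed forms are passed in
rather than generated:

```python
import re

# Prefix rules from english.aff as (flag, starred, condition, prefix).
# All three are starred, i.e. they take part in cross-products.
PREFIXES = [
    ("A", True, ".", "RE"),   # As in enter > reenter
    ("I", True, ".", "IN"),   # As in disposed > indisposed
    ("U", True, ".", "UN"),   # As in natural > unnatural
]


def apply_prefixes(word, flags, starred_only=False):
    """Return word plus each prefixed form its flags allow."""
    forms = [word]
    for flag, starred, cond, prefix in PREFIXES:
        if starred_only and not starred:
            continue  # only starred prefixes may combine with suffixes
        # Prefix conditions are matched against the START of the word;
        # "." accepts any first letter.
        if flag in flags and re.match(cond, word.upper()):
            forms.append(prefix.lower() + word)
    return forms


def cross_product(root, suffixed_forms, flags):
    """Prefix the root, then prefix each form that came from a starred
    suffix flag (assumed to be the caller's job to supply)."""
    result = apply_prefixes(root, flags)
    for form in suffixed_forms:
        result.extend(apply_prefixes(form, flags, starred_only=True))
    return result


print(cross_product("enter", ["entered"], "AD"))
```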
# Finally, the suffixes. These are exactly the suffixes that came out
# with the original "ispell"; I haven't tried to improve them. The only
# thing I did besides translate them was to add selected cross-product
# flags.
#
suffixes
flag V:
E > -E,IVE # As in create > creative
[^E] > IVE # As in prevent > preventive
flag *N:
E > -E,ION # As in create > creation
Y > -Y,ICATION # As in multiply > multiplication
[^EY] > EN # As in fall > fallen
flag *X:
E > -E,IONS # As in create > creations
Y > -Y,ICATIONS # As in multiply > multiplications
[^EY] > ENS # As in weak > weakens
flag H:
Y > -Y,IETH # As in twenty > twentieth
[^Y] > TH # As in hundred > hundredth
flag *Y:
Y > -Y,ILY # As in messy > messily
[^Y] > LY # As in quick > quickly
flag *G:
E > -E,ING # As in file > filing
[^E] > ING # As in cross > crossing
flag *J:
E > -E,INGS # As in file > filings
[^E] > INGS # As in cross > crossings
flag *D:
E > D # As in create > created
[^AEIOU]Y > -Y,IED # As in imply > implied
[^EY] > ED # As in cross > crossed
[AEIOU]Y > ED # As in convey > conveyed
flag T:
E > ST # As in late > latest
[^AEIOU]Y > -Y,IEST # As in dirty > dirtiest
[AEIOU]Y > EST # As in gray > grayest
[^EY] > EST # As in small > smallest
flag *R:
E > R # As in skate > skater
[^AEIOU]Y > -Y,IER # As in multiply > multiplier
[AEIOU]Y > ER # As in convey > conveyer
[^EY] > ER # As in build > builder
flag *Z:
E > RS # As in skate > skaters
[^AEIOU]Y > -Y,IERS # As in multiply > multipliers
[AEIOU]Y > ERS # As in convey > conveyers
[^EY] > ERS # As in build > builders
flag *S:
[^AEIOU]Y > -Y,IES # As in imply > implies
[AEIOU]Y > S # As in convey > conveys
[CST]H > ES # As in lash > lashes
# (some TH's...)
[^CST]H > S # As in cough > coughs
[SXZ] > ES # As in fix > fixes
[^SXZHY] > S # As in bat > bats
flag *P:
[^AEIOU]Y > -Y,INESS # As in cloudy > cloudiness
[AEIOU]Y > NESS # As in gray > grayness
[^Y] > NESS # As in late > lateness
flag *M:
. > 'S # As in dog > dog's
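To expand a whole list mechanically, the rule text can be parsed and applied
directly. A rough Python approximation, not ispell's own code: it carries only
the flags needed for the examples earlier, matches each condition against the
end of the upper-cased word, applies every rule whose condition matches, and
ignores prefixes and cross-products. SUFFIX_RULES, parse_suffixes() and
expand() are illustrative names:

```python
import re

# A subset of the english.aff suffix table, kept in the same
# "CONDITION > -STRIP,APPEND" syntax so the parser below can read it.
SUFFIX_RULES = """
flag *N:
    E           > -E,ION        # As in create > creation
    Y           > -Y,ICATION    # As in multiply > multiplication
    [^EY]       > EN            # As in fall > fallen
flag *X:
    E           > -E,IONS       # As in create > creations
    Y           > -Y,ICATIONS   # As in multiply > multiplications
    [^EY]       > ENS           # As in weak > weakens
flag *G:
    E           > -E,ING        # As in file > filing
    [^E]        > ING           # As in cross > crossing
flag *D:
    E           > D             # As in create > created
    [^AEIOU]Y   > -Y,IED        # As in imply > implied
    [^EY]       > ED            # As in cross > crossed
    [AEIOU]Y    > ED            # As in convey > conveyed
flag *R:
    E           > R             # As in skate > skater
    [^AEIOU]Y   > -Y,IER        # As in multiply > multiplier
    [AEIOU]Y    > ER            # As in convey > conveyer
    [^EY]       > ER            # As in build > builder
flag *S:
    [^AEIOU]Y   > -Y,IES        # As in imply > implies
    [AEIOU]Y    > S             # As in convey > conveys
    [CST]H      > ES            # As in lash > lashes
    [^CST]H     > S             # As in cough > coughs
    [SXZ]       > ES            # As in fix > fixes
    [^SXZHY]    > S             # As in bat > bats
flag *M:
    .           > 'S            # As in dog > dog's
"""


def parse_suffixes(text):
    """Turn the rule text into {flag: [(condition_regex, strip, append)]}."""
    rules, current = {}, None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop comments and blanks
        if not line:
            continue
        header = re.match(r"flag\s+\*?([A-Z]):$", line)
        if header:
            current = rules.setdefault(header.group(1), [])
            continue
        cond, action = (part.strip() for part in line.split(">"))
        if action.startswith("-"):               # "-Y,IES": strip Y, add IES
            strip, append = action[1:].split(",")
        else:                                    # "ES": just add ES
            strip, append = "", action
        # The condition is tested against the END of the word, hence "$".
        current.append((re.compile(cond + "$"), strip, append))
    return rules


def expand(entry, rules):
    """Expand one munched 'word/FLAGS' entry into the root plus its forms."""
    word, _, flags = entry.partition("/")
    upper = word.upper()
    forms = [word]
    for flag in flags:
        for cond, strip, append in rules.get(flag, []):
            if cond.search(upper) and upper.endswith(strip):
                stem = word[: len(word) - len(strip)]
                forms.append(stem + append.lower())
    return forms


RULES = parse_suffixes(SUFFIX_RULES)
for entry in ["abate/DGRS", "abbey/MS", "abbreviate/DGNSX", "abnormality/S"]:
    print(expand(entry, RULES))
```

It prints one list per entry in flag order, e.g. ['abbey', "abbey's", 'abbeys']
for abbey/MS. Adding the remaining flags is just a matter of pasting more of
the table into SUFFIX_RULES.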
Any help,
Robert
PS My dictionaries obviously have limited space, with no room for every
derivation.
_______________________________________________
QL-Users Mailing List
http://www.q-v-d.demon.co.uk/smsqe.htm