Dilwyn Jones wrote:

...
One thing I need help with is what appears to be a simple ASCII
compression scheme in some of the Unix-sourced word lists. I'm
assuming they're from Unix systems because the end-of-line character
is only a linefeed, with no carriage return. I need to find out whether
the following is a known standard or not:

After many words, there's a forward slash followed by single letters
indicating various word endings. Example: Abbreviate/DGNSX or ABBEY/MS

In some cases it's quite obvious that /S indicates that the plural or
present-tense form is valid, e.g. /S implies Abbreviates, /D implies
Abbreviated, although there's a certain amount of grammar dependency,
e.g. PLAY/S would mean that both PLAY and PLAYS are valid, but
ABNORMALITY/S is less easy because the plural is ABNORMALITIES.

I can of course go through the word list until I find all the
permutations, but if anyone already knows the scheme used, it would
help me enormously. I hate reinventing wheels!

Found it. It is ispell (http://www.lasr.cs.ucla.edu/geoff/ispell.html). Each letter is a code for a prefix or suffix that is permissible for that word. Every language can have its own affix file to define what each letter means!

The file is a munched (i.e. condensed) word list that is used by the spell checker. Expanding some of your entries would give the full dictionary, e.g.:

   abate/DGRS
   abbey/MS
   abbreviate/DGNSX
   abnormality/S

become

   abate
   abated              # D flag
   abating             # G flag
   abater              # R flag
   abates              # S flag
   abbey
   abbey's             # M
   abbeys              # S
   abbreviate
   abbreviates         # S
   abbreviated         # D
   abbreviating        # G
   abbreviation        # N
   abbreviations       # X
   abnormality
   abnormalities       # S
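
Splitting a munched entry into its root word and flag letters is the easy part. A minimal Python sketch follows; the function name split_entry is made up for illustration and is not part of ispell:

```python
def split_entry(line):
    """Split a munched entry such as 'abbey/MS' into (root, [flags])."""
    # partition() returns an empty flag string when there is no slash,
    # so plain words without flags are handled too.
    root, _, flags = line.strip().partition("/")
    return root, list(flags)


for entry in ["abate/DGRS", "abbey/MS", "play"]:
    print(split_entry(entry))
```

Each flag letter then has to be looked up in the affix file for the language to find out which endings it stands for.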

Each letter is like a macro.  The man page summarises them:

       In the following list, an asterisk indicates that  a  flag
       participates in cross-product formation (see ispell(4)).
...
       Prefixes:
              *A - re
              *I - in
              *U - un

       Suffixes:
              V - ive
              *N - ion, tion, en
              *X - ions, ications, ens
              H - th, ieth
              *Y - ly, ily
              *G - ing
              *J - ings
              *D - ed
              T - est
              *R - er
              *Z - ers
              *S - s, es, ies
              *P - ness, iness
              *M - 's
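
Cross-product formation, as described in ispell(4), means that a starred prefix and a starred suffix may be applied to the same root at once. The Python sketch below illustrates the idea only: the suffix table is a simplification that just appends letters and ignores the ending-dependent rules, and all names in it are hypothetical.

```python
PREFIXES = {"A": "re", "I": "in", "U": "un"}
# Simplified on purpose: plain appends only. The real suffix rules
# depend on how the root ends.
SUFFIXES = {"D": "ed", "G": "ing", "S": "s"}
# All six of these flags carry an asterisk in the man page list above.
STARRED = set("AIU") | set("DGS")


def expand_with_cross_product(root, flags):
    """List the root, its prefixed and suffixed forms, then combined forms."""
    prefixes = [f for f in flags if f in PREFIXES]
    suffixes = [f for f in flags if f in SUFFIXES]
    forms = [root]
    forms += [PREFIXES[p] + root for p in prefixes]
    forms += [root + SUFFIXES[s] for s in suffixes]
    # Cross product: only pairs where both flags are starred combine.
    forms += [PREFIXES[p] + root + SUFFIXES[s]
              for p in prefixes for s in suffixes
              if p in STARRED and s in STARRED]
    return forms


print(expand_with_cross_product("lock", "UDGS"))
```

So an entry such as lock/UDGS would cover unlocked, unlocking and unlocks as well as the single-affix forms.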

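The actual rules use regular-expression-style conditions on the ending of the root word to pick the right modification. In each rule, the text after the ">" is appended to the root; a leading "-E," means the final E is stripped first. Here are the rules from the english.aff file: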

# Here's a record of flags used, in case you want to add new ones.
# Right now, we fit within the minimal MASKBITS definition.
#
#            ABCDEFGHIJKLMNOPQRSTUVWXYZ
# Used:      *  *  ****  ** * ***** ***
#            A  D  GHIJ  MN P RSTUV XYZ
# Available:  -- --    --  - -     -
#             BC EF    KL  O Q     W

# Now the prefix table.  There are only three prefixes that are truly
# frequent in English, and none of them seem to need conditional variations.
#
prefixes

flag *A:
    .           >       RE              # As in enter > reenter

flag *I:
    .           >       IN              # As in disposed > indisposed

flag *U:
    .           >       UN              # As in natural > unnatural

# Finally, the suffixes.  These are exactly the suffixes that came out
# with the original "ispell";  I haven't tried to improve them.  The only
# thing I did besides translate them was to add selected cross-product
# flags.
#
suffixes

flag V:
    E           >       -E,IVE          # As in create > creative
    [^E]        >       IVE             # As in prevent > preventive

flag *N:
    E           >       -E,ION          # As in create > creation
    Y           >       -Y,ICATION      # As in multiply > multiplication
    [^EY]       >       EN              # As in fall > fallen

flag *X:
    E           >       -E,IONS         # As in create > creations
    Y           >       -Y,ICATIONS     # As in multiply > multiplications
    [^EY]       >       ENS             # As in weak > weakens

flag H:
    Y           >       -Y,IETH         # As in twenty > twentieth
    [^Y]        >       TH              # As in hundred > hundredth

flag *Y:
    Y           >       -Y,ILY          # As in messy > messily
    [^Y]        >       LY              # As in quick > quickly

flag *G:
    E           >       -E,ING          # As in file > filing
    [^E]        >       ING             # As in cross > crossing

flag *J:
    E           >       -E,INGS         # As in file > filings
    [^E]        >       INGS            # As in cross > crossings

flag *D:
    E           >       D               # As in create > created
    [^AEIOU]Y   >       -Y,IED          # As in imply > implied
    [^EY]       >       ED              # As in cross > crossed
    [AEIOU]Y    >       ED              # As in convey > conveyed

flag T:
    E           >       ST              # As in late > latest
    [^AEIOU]Y   >       -Y,IEST         # As in dirty > dirtiest
    [AEIOU]Y    >       EST             # As in gray > grayest
    [^EY]       >       EST             # As in small > smallest

flag *R:
    E           >       R               # As in skate > skater
    [^AEIOU]Y   >       -Y,IER          # As in multiply > multiplier
    [AEIOU]Y    >       ER              # As in convey > conveyer
    [^EY]       >       ER              # As in build > builder

flag *Z:
    E           >       RS              # As in skate > skaters
    [^AEIOU]Y   >       -Y,IERS         # As in multiply > multipliers
    [AEIOU]Y    >       ERS             # As in convey > conveyers
    [^EY]       >       ERS             # As in build > builders

flag *S:
    [^AEIOU]Y   >       -Y,IES          # As in imply > implies
    [AEIOU]Y    >       S               # As in convey > conveys
    [CST]H      >       ES              # As in lash > lashes
                                        #      (some TH's...)
    [^CST]H     >       S               # As in cough > coughs
    [SXZ]       >       ES              # As in fix > fixes
    [^SXZHY]    >       S               # As in bat > bats

flag *P:
    [^AEIOU]Y   >       -Y,INESS        # As in cloudy > cloudiness
    [AEIOU]Y    >       NESS            # As in gray > grayness
    [^Y]        >       NESS            # As in late > lateness

flag *M:
    .           >       'S              # As in dog > dog's
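
The suffix rules above can be applied mechanically. What follows is a minimal Python sketch, not ispell's own code: it transcribes only the D, G, N, X, R, S and M flags from the listing, ignores prefixes and cross products, and lower-cases everything. The names SUFFIX_RULES, apply_flag and expand are invented for this example.

```python
import re

# (condition, text to strip, text to append), transcribed from the
# english.aff listing above. Only some flags are included.
SUFFIX_RULES = {
    "D": [("E", "", "D"), ("[^AEIOU]Y", "Y", "IED"),
          ("[^EY]", "", "ED"), ("[AEIOU]Y", "", "ED")],
    "G": [("E", "E", "ING"), ("[^E]", "", "ING")],
    "N": [("E", "E", "ION"), ("Y", "Y", "ICATION"), ("[^EY]", "", "EN")],
    "X": [("E", "E", "IONS"), ("Y", "Y", "ICATIONS"), ("[^EY]", "", "ENS")],
    "R": [("E", "", "R"), ("[^AEIOU]Y", "Y", "IER"),
          ("[AEIOU]Y", "", "ER"), ("[^EY]", "", "ER")],
    "S": [("[^AEIOU]Y", "Y", "IES"), ("[AEIOU]Y", "", "S"),
          ("[CST]H", "", "ES"), ("[^CST]H", "", "S"),
          ("[SXZ]", "", "ES"), ("[^SXZHY]", "", "S")],
    "M": [(".", "", "'S")],
}


def apply_flag(root, flag):
    """Return the derived forms of root for one suffix flag."""
    word = root.upper()
    results = []
    for condition, strip, append in SUFFIX_RULES[flag]:
        # The condition must match the end of the root word.
        if re.search(condition + "$", word):
            stem = word[:len(word) - len(strip)] if strip else word
            results.append((stem + append).lower())
    return results


def expand(entry):
    """Expand 'root/FLAGS' into the root plus every derived word."""
    root, _, flags = entry.partition("/")
    words = [root]
    for flag in flags:
        words.extend(apply_flag(root, flag))
    return words


for entry in ["abate/DGRS", "abbey/MS", "abnormality/S"]:
    print(expand(entry))
```

Run over a whole word list, a loop like this would reproduce the kind of expansion shown earlier for abate and abbey, at least for the flags covered.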

I hope that's some help,

Robert

PS: My dictionaries obviously have limited space, with no room for every derivation.

_______________________________________________
QL-Users Mailing List
http://www.q-v-d.demon.co.uk/smsqe.htm
