I've no real idea, but I love a challenge...
It's obviously a code of some sort, but I don't think it's quite as
you suspect:
...
abandon/DGS
abandonment
abase/DGS
...
abbey/MS
abbot/MS
Abbott
abbreviate/DGNSX
...
aberrate/NX
...
ability/MS
abject/PY
abjection/S
abjure/DGS
ablate/DGNSV
ablaze
able/RT
ablute/N
...
It's the 'R', 'M', 'X' etc that lead me to suspect that. I would
guess that each letter means something, just nothing as direct as
'S' meaning pluralise with 'S'.
Some of the listed words (eg aberrate, ablute) don't exist in my
dictionaries, the Concise Oxford Dictionary and my American
Merriam-Webster, though aberration and ablution both do.
From where did you get this list, and what's the file called? And
do you have any idea of the original purpose of the list - ie what
sort of app might have processed it?
That may give us something more to search for if nobody here knows
(which I suspect).
This particular one came from a CD-ROM called Bibliotech, a PD
collection of all sorts of texts from classic books to those uniquely
internet-style UFO conspiracy texts and all sorts of text files I
haven't even gone near! The other, more useful, lists came from some
internet sites I discovered via a simple Google search for
dictionaries and wordlists.
This file was described as 'ASCII word list' and that's all the info I
had!
It might be more orientated towards human users, since the letters
don't really imply a FIXED rule. It's also possible that combinations
of letters might imply a different way of handling a word (exceptions
to a rule).
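One way to test the idea that combinations of letters matter is to
tally the letters, and the combinations, before guessing at meanings.
Below is a minimal sketch in Python (not SuperBASIC, purely for
brevity). The names split_entry and flag_stats are made up for this
example, and the sample is just the sixteen entries quoted earlier in
this thread, so the counts say nothing about the full list.

```python
from collections import Counter


def split_entry(line):
    """Split 'root/FLAGS' into (root, flags); flags is '' if there is no slash."""
    root, _, flags = line.strip().partition("/")
    return root, flags


def flag_stats(lines):
    """Count individual flag letters and whole flag combinations."""
    letters = Counter()
    combos = Counter()
    for line in lines:
        root, flags = split_entry(line)
        if not root:
            continue  # skip blank lines
        letters.update(flags)  # one count per letter
        if flags:
            combos[flags] += 1  # one count per combination
    return letters, combos


# Sample entries copied from the excerpt quoted in this thread.
sample = [
    "abandon/DGS", "abandonment", "abase/DGS", "abbey/MS", "abbot/MS",
    "Abbott", "abbreviate/DGNSX", "aberrate/NX", "ability/MS",
    "abject/PY", "abjection/S", "abjure/DGS", "ablate/DGNSV", "ablaze",
    "able/RT", "ablute/N",
]

letters, combos = flag_stats(sample)
for letter, count in sorted(letters.items()):
    print(letter, count)
for combo, count in sorted(combos.items()):
    print(combo, count)
```

On that sample S turns up 9 times and D and G 5 times each, with DGS
and MS the commonest combinations (3 entries each). Run over the whole
file, a tally like this would show which letters are worth decoding
first.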
aberrate is probably an example of how these sorts of word lists
work - since there is a noun aberration there must be a verb to
aberrate! Dictionaries don't usually list them but English varies so
much worldwide that you may find people use the verb aberrated as
shorthand for suffered an aberration or whatever. Not strictly
correct or recognised officially but nonetheless used in some places.
Languages evolve to survive...I sometimes struggle to understand some
of the English words my son uses (took me ages to pick up 'bling' and
'ming') and a few months later I find that the dictionaries start to
include them.
Looking at the dozen or so English word lists (and being careful which
is American and which British!) I find most of them have words like
the example you gave. About 40K to 60K words seems to be the average
and preferred size for a spellchecker, but there are some
more specialised word lists around. Look at Geoff Wicks's Advanced
Cryptics Dictionary (well, it's sold by Geoff) with 239,000 words, or
the half million word list from Rich Mellor and Paul Merdinian. Lists
of that size are a bit silly to use as typing checkers since they
often contain names or contrived words.
QTYP has a dictionary editor so you can remove any words you object to
and save the revised list. You can also merge new words into the list.
As long as you have a base word list of reasonable length to work
from, with a bit of time, patience and determination you can develop
it.
I might give this particular list a miss unless someone fancies a
challenge! It has a base count of about 25,000 entries, but obviously
if you add plurals, verb tenses etc it expands and I just fancied
finding out what it would expand to if I could write a little filter
to do so. I may just do so with the fairly obvious notes which imply
'add s' or 'add ing' or add 'able' or whatever. With a bit of help
from a grammar book I might be able to write a little filter which
makes a good guess at when to add a double letter, drop one of a
double letter, drop a vowel from the end before adding a plural etc
etc, as some of these grammar rules are straightforward, or at least
they are to someone whose first language is English.
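As a rough illustration of the "good guess" idea, here is a sketch of
the doubling, silent 'e' and 'y' rules for adding 'ed' and 'ing',
again in Python rather than SuperBASIC. Everything here is an
assumption made for the example: the names vowel_groups, should_double
and add_suffix are invented, and the doubling test only fires for
one-syllable consonant-vowel-consonant roots. Longer words depend on
stress (abet, abetted) and on British versus American spelling
(travelled, traveled), which spelling alone cannot settle, so they are
left undoubled.

```python
VOWELS = "aeiou"


def vowel_groups(word):
    """Rough syllable estimate: count runs of consecutive vowels."""
    groups, in_group = 0, False
    for ch in word.lower():
        is_vowel = ch in VOWELS
        if is_vowel and not in_group:
            groups += 1
        in_group = is_vowel
    return groups


def should_double(root):
    """Guess whether the final consonant doubles before 'ed' or 'ing'."""
    w = root.lower()
    if len(w) < 3:
        return False
    c1, v, c2 = w[-3], w[-2], w[-1]
    return (c1 not in VOWELS and v in VOWELS
            and c2 not in VOWELS and c2 not in "wxy"
            and vowel_groups(w) == 1)


def add_suffix(root, suffix):
    """Attach 'ed' or 'ing' to root using the heuristics above."""
    if suffix not in ("ed", "ing"):
        raise ValueError("only 'ed' and 'ing' are handled")
    low = root.lower()
    if low.endswith("e"):
        if suffix == "ed":
            return root + "d"            # abase -> abased
        if low.endswith("ie"):
            return root[:-2] + "ying"    # die -> dying
        if len(low) <= 2 or low.endswith(("ee", "oe", "ye")):
            return root + "ing"          # see -> seeing
        return root[:-1] + "ing"         # abase -> abasing
    if (suffix == "ed" and len(low) > 1 and low.endswith("y")
            and low[-2] not in VOWELS):
        return root[:-1] + "ied"         # carry -> carried
    if should_double(root):
        return root + root[-1] + suffix  # stop -> stopping
    return root + suffix                 # abandon -> abandoned


for word, suffix in [("stop", "ing"), ("abandon", "ed"), ("abase", "ing"),
                     ("carry", "ed"), ("play", "ed")]:
    print(word, "+", suffix, "=", add_suffix(word, suffix))
```

A filter built this way will still get some words wrong, so the output
is best treated as a candidate list to be checked against a
dictionary, not as a finished word list.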
If anyone fancies a challenge, I could send them a copy of the word
list to go with this little first attempt at a filter! (Filter in
terms of going through data, not a QDOS filter per se).
100 CLS: CLS#0:INPUT #0,'Input file > ';ip$
110 INPUT #0,'Output file > ';op$
120 OPEN_IN #3,ip$
130 OPEN_NEW #4,op$
140 no = 0 : REMark number of words
150 CLS : CLS #0
160 REPeat loop
170 IF INKEY$ = CHR$(27) THEN EXIT loop : REMark ESC
180 IF EOF(#3):EXIT loop
190 INPUT #3,word$
200 IF word$ = '' THEN NEXT loop
210 tab = '/' INSTR word$
220 IF tab = 0 THEN
230 PRINT #4,word$ : REMark no change to word
240 no = no + 1
250 ELSE
260 REMark extract 'root' word
270 root$ = word$(1 TO tab-1)
280 PRINT #4,root$ : PRINT root$; : no = no + 1 : REMark count the root word too
290 :
300 REMark get switch letters, currently only handles /DGSY, others ignored
310 sw$ = word$(tab+1 TO LEN(word$))
320 :
330 IF 'd' INSTR sw$ THEN
340 REMark add 'ed' to root, dropping final silent 'e' if present
350 IF root$(LEN(root$)) == 'e' THEN
360 PRINT #4,root$&'d' : PRINT ! root$&'d';
370 ELSE
380 PRINT #4,root$&'ed' : PRINT ! root$&'ed';
390 END IF
400 no = no + 1
410 END IF
420 :
430 IF 'g' INSTR sw$ THEN
440 REMark add 'ing', drop final silent e at end of word
450 IF root$(LEN(root$)) == 'e' THEN
460 PRINT #4,root$(1 TO LEN(root$)-1)&'ing' : PRINT ! root$(1 TO LEN(root$)-1)&'ing';
470 ELSE
480 PRINT #4,root$&'ing' : PRINT ! root$ & 'ing';
490 END IF
500 no = no + 1
510 END IF
520 :
530 IF 's' INSTR sw$ THEN
540 REMark plural or third person singular 's' or 'es'
550 IF root$(LEN(root$)) == 's' OR root$(LEN(root$)) == 'x' OR root$(LEN(root$)) == 'z' OR root$(LEN(root$)-1 TO) == 'ch' OR root$(LEN(root$)-1 TO) == 'sh' THEN
560 PRINT #4,root$&'es' : PRINT ! root$&'es';
570 ELSE
580 REMark special case is word ending with 'y'
590 REMark which depends if 'y' is preceded by vowel or not
600 IF root$(LEN(root$)) == 'y' THEN
610 IF LEN(root$) > 1 THEN
620 IF root$(LEN(root$)-1) INSTR 'aeiou' THEN
630 PRINT #4,root$&'s' : PRINT !root$&'s';
640 ELSE
650 PRINT #4,root$(1 TO LEN(root$)-1)&'ies' : PRINT ! root$(1 TO LEN(root$)-1)&'ies';
660 END IF
670 ELSE
680 PRINT #4,root$&'s' : PRINT ! root$&'s';
690 END IF
700 ELSE
710 PRINT #4,root$&'s' : PRINT ! root$&'s';
720 END IF
730 END IF
740 no = no + 1
750 END IF
760 :
770 IF 'y' INSTR sw$ THEN PRINT #4,root$&"ly" : PRINT ! root$&"ly"; : no = no + 1
780 :
790 PRINT : REMark new line after all permutations done
800 IF (no MOD 1000) = 0 THEN AT #0,0,0 : PRINT #0,no
810 END IF
820 PAUSE 5 : REMark vary speed as required for viewing
830 END REPeat loop
840 PRINT #0,no;' Words Total'
850 CLOSE #3 : CLOSE #4
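
For anyone wanting to check the output of the listing without a QL to
hand, the same D, G, S and Y rules can be sketched in a few lines of
Python. This is only meant as a cross-check, not a replacement: the
function name expand is made up for the example, and like the listing
it silently ignores every other letter.

```python
def expand(entry):
    """Return the words the D/G/S/Y rules give for one 'root/FLAGS' entry."""
    root, _, flags = entry.strip().partition("/")
    if not root:
        return []
    flags = flags.upper()
    low = root.lower()
    words = [root]
    if "D" in flags:
        # add 'ed', or just 'd' after a final 'e'
        words.append(root + ("d" if low.endswith("e") else "ed"))
    if "G" in flags:
        # add 'ing', dropping a final 'e'
        words.append((root[:-1] if low.endswith("e") else root) + "ing")
    if "S" in flags:
        if low.endswith(("s", "x", "z", "ch", "sh")):
            words.append(root + "es")
        elif low.endswith("y") and len(low) > 1 and low[-2] not in "aeiou":
            words.append(root[:-1] + "ies")  # consonant + y -> ies
        else:
            words.append(root + "s")
    if "Y" in flags:
        words.append(root + "ly")
    return words


for entry in ["abandon/DGS", "ability/MS", "abject/PY", "able/RT"]:
    print(entry, "->", " ".join(expand(entry)))
```

For abandon/DGS this gives abandon, abandoned, abandoning and
abandons, while able/RT gives only able because R and T are not
handled.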
_______________________________________________
QL-Users Mailing List
http://www.q-v-d.demon.co.uk/smsqe.htm