Hi Pablo,

Check out "The Data Compression Book" by Mark Nelson, ISBN
1-55851-216-0.  It covers most of the major data compression methods,
with the exception of fractal and wavelet compression, which apply
mostly to graphics.  More importantly, it's chock-full of C code, not
just equations.  The code is explained in the book and is readily
tweakable through #define constants.

Since you're looking for pattern induction as opposed to data
compression, I would think you could increase some of the #define
constants to the point where the code finds all repeated patterns in
your input stream, and then track the likelihood that each pattern will
recur on an ongoing basis.  I believe some of the new Internet security
programs do this type of thing to identify unusual data patterns, such
as hacking probes or denial-of-service activity, and alert management.
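
Here is a rough sketch of the kind of tracking I mean, in Python rather
than the book's C.  It is only one way to read the idea: it counts which
symbol follows each short context in the stream and turns the counts
into probabilities.  The function names and the `order` parameter are
made up for illustration and do not come from the book.

```python
from collections import Counter, defaultdict

def next_symbol_stats(stream, order=3):
    """Count which symbol follows each context of `order` symbols."""
    follow = defaultdict(Counter)
    for i in range(len(stream) - order):
        context = stream[i:i + order]
        follow[context][stream[i + order]] += 1
    return follow

def predict(follow, context):
    """Return (symbol, probability) pairs for a context, most likely first."""
    counts = follow.get(context)
    if not counts:
        return []  # context never seen, so no prediction
    total = sum(counts.values())
    return [(sym, n / total) for sym, n in counts.most_common()]

stats = next_symbol_stats("abcabcabdabc", order=2)
print(predict(stats, "ab"))  # → [('c', 0.75), ('d', 0.25)]
```

Raising `order` plays roughly the same role as raising the #define
limits: longer contexts catch longer patterns but need more data and
memory.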

I have done some thinking along these lines myself.  A group in Germany
that I know of supposedly has code that takes a string of characters
and compares its bit patterns to the dictionaries in zip files built by
compressing large amounts of text in different languages.  Based on
the bit patterns, it can tell you the language of a 20-byte string with
a high degree of probability.  They had indicated that they would
probably release the code as open source once they had written their
papers and whatever.
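
I have not seen their code, so the following is only my guess at the
general idea, sketched in Python with the standard zlib module.  Each
language gets a reference text that is handed to the compressor as a
preset dictionary, and the language whose dictionary shrinks the sample
the most wins.  The reference texts here are tiny, made up, and
ASCII-only; a real system would need large corpora.

```python
import zlib

# Hypothetical reference texts, far too small for real use.
REFERENCE = {
    "english": b"the weather is nice today and we are going to the "
               b"market to buy some bread and cheese",
    "french": b"il fait beau aujourd'hui et nous allons au marche "
              b"pour acheter du pain et du fromage",
    "german": b"das wetter ist heute schoen und wir gehen zum markt "
              b"um brot und kaese zu kaufen",
}

def compressed_size(sample, dictionary):
    """Size of `sample` after deflate with `dictionary` preloaded."""
    comp = zlib.compressobj(level=9, zdict=dictionary)
    return len(comp.compress(sample) + comp.flush())

def guess_language(sample):
    """Pick the language whose reference text compresses the sample best."""
    return min(REFERENCE,
               key=lambda lang: compressed_size(sample, REFERENCE[lang]))

print(guess_language(b"we are going to the market"))
```

With reference texts this small the guess is only reliable when the
sample shares whole phrases with one of them, so treat it as a
demonstration of the mechanism and nothing more.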

I wanted to use this in my bot so that if a user types in French,
German, etc., my bot could say "Sorry, I don't speak French" instead of
resorting to a bluff.

I also do a spell check in my bot against a dictionary sorted by word
usage frequency, and get a list of possible replacement words based on
Levenshtein distance.  Soundex does badly if the first character of the
word is wrong, or if the user transposes two letters in a way that gives
the word a different phonetic sound.  For each possible correction, I
have to resubmit the potentially corrected input through my pattern
matcher.  To minimize response time and improve scalability, it would be
ideal to know the probability of a word occurring given that the
previous word or words were correct.  Then I could sort the replacement
words by the probability that they occur after the prior correctly
spelled word.  This would allow me to get the correct word most of the
time on the first or second try instead of the several tries it takes me
now.
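
To make that concrete, here is a minimal Python sketch of the ranking I
have in mind.  It is not the code in my bot: the word list and the
word-pair counts are invented for illustration, and `rank_corrections`
and `max_distance` are hypothetical names.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute
        prev = curr
    return prev[-1]

# Invented data: a small vocabulary and (previous word, word) pair counts.
VOCABULARY = ["weather", "whether", "wither", "leather"]
BIGRAM_COUNTS = {
    ("the", "weather"): 40,
    ("the", "leather"): 5,
    ("know", "whether"): 30,
}

def rank_corrections(prev_word, misspelled, max_distance=2):
    """Candidates within max_distance, most likely after prev_word first."""
    scored = [(w, levenshtein(misspelled, w)) for w in VOCABULARY]
    near = [(w, d) for w, d in scored if d <= max_distance]
    # Sort by pair count (descending), then by edit distance (ascending).
    near.sort(key=lambda wd: (-BIGRAM_COUNTS.get((prev_word, wd[0]), 0), wd[1]))
    return [w for w, _ in near]

print(rank_corrections("the", "wheather"))
# → ['weather', 'leather', 'whether']
print(rank_corrections("know", "wheather"))
# → ['whether', 'weather', 'leather']
```

The same misspelling gets a different first suggestion depending on the
word before it, which is exactly what would save me the extra trips
through the pattern matcher.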

Your project probably involves doing the prediction on a character
rather than a word basis, but if you happen to be thinking along the
lines of words instead of characters, I would be interested in hearing
more about your work.
 


-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On
Behalf Of Pablo
Sent: Sunday, December 08, 2002 9:34 PM
To: [EMAIL PROTECTED]
Subject: [agi] general patterns induction



Hi Everyone,

I'm looking for information about "pattern induction" or "general
patterns" or anything that sounds like that... 

What I want to do is, having a stream of data, predict what may come.
(yes, and then take over the world... sorry if it sounds like Pinky and
The Brain!!!!!!!!!!)

I guess general patterns induction is related to data compression,
because if we find a pattern in a string, then we don't have to write
all the characters every time the pattern appears. Surely someone has
already been working on that (who?)

Would anyone please give me a clue? Is there any book I should read? Is
there any book like "AI basics", "introduction to AI", or "AI for
dummies" that might help first?

Thanks a lot!

Pablo Carbonell

PS: thanks Ben, Kevin and Eliezer for the previous help

-------
To unsubscribe, change your address, or temporarily deactivate your
subscription, 
please go to http://v2.listbox.com/member/?[EMAIL PROTECTED]

Gary A. Miller
Principal Consultant, New Millennium Consulting
