Hi Pablo,

Check out "The Data Compression Book" by Mark Nelson (ISBN 1-55851-216-0). It covers most of the major data compression methods, with the exception of fractal and wavelet compression, which apply mostly to graphics. More importantly, it's chock-full of C code, not just equations; the code is explained in the book and is readily tweakable through its #define values.
Since you're looking for pattern induction rather than data compression, I would think you could increase some of the #define values to the point where the code finds all repeated patterns in your input stream and tracks, on an ongoing basis, the likelihood that each pattern will recur. I believe some of the new Internet security programs do this kind of thing to identify unusual data patterns, such as hacking probes or denial-of-service activity, and alert management.

I have done some thinking along these lines myself. A group in Germany I know of supposedly has code that takes a string of characters and compares its bit patterns to the dictionaries in zip files built by compressing large amounts of text in different languages. Based on the bit patterns, it can tell you the language of a 20-byte string with a high degree of probability. They indicated that they would probably release the code as open source once they had written their papers. I wanted to use this in my bot so that if a user types in French, German, etc., my bot could say "Sorry, I don't speak French" instead of resorting to a bluff.

I also do a spell check in my bot against a dictionary sorted by word usage frequency, and get a list of possible replacement words based on Levenshtein distance. Soundex works badly if the first character of the word is wrong, or if the user transposes two letters in a way that gives the word a different phonetic sound. For each possible correction I have to resubmit the potentially corrected input through my pattern matcher. To minimize response time and improve scalability, it would be ideal to know the probability of a word occurring given that the previous word or words were correct. Then I could sort the replacement words by the probability that they occur after the prior correctly spelled word.
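To make the ranking idea concrete, here is a minimal sketch in Python. It is not the code from my bot; the function names (`train_bigrams`, `rank_candidates`), the `max_distance` cutoff, and the toy corpus are all made up for illustration. It assumes the candidate list comes from plain Levenshtein distance and that word-pair counts from some training text stand in for the "probability after the previous word".

```python
from collections import Counter, defaultdict


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def train_bigrams(text: str):
    """Count how often each word follows each other word (hypothetical helper)."""
    follows = defaultdict(Counter)
    words = text.lower().split()
    for first, second in zip(words, words[1:]):
        follows[first][second] += 1
    return follows


def rank_candidates(prev_word, misspelled, vocabulary, follows, max_distance=2):
    """Return words within max_distance edits, most likely successor first."""
    seen_after = follows.get(prev_word, Counter())
    total = sum(seen_after.values())
    scored = []
    for word in vocabulary:
        distance = levenshtein(misspelled, word)
        if distance <= max_distance:
            # Estimated P(word | prev_word); 0.0 if the pair was never seen.
            probability = seen_after[word] / total if total else 0.0
            scored.append((word, distance, probability))
    # Highest conditional probability first, then smallest edit distance,
    # then alphabetical order so that ties are deterministic.
    scored.sort(key=lambda item: (-item[2], item[1], item[0]))
    return [word for word, _, _ in scored]


# Toy data, for illustration only.
corpus = "the cat sat on the mat the cat ate the rat"
follows = train_bigrams(corpus)
vocabulary = sorted(set(corpus.split()))
print(rank_candidates("the", "cst", vocabulary, follows))
```

With real data the counts would come from a large corpus, and unseen word pairs would need some smoothing rather than a flat zero, but the ordering step would look much the same.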
This would allow me to get the correct word most of the time on the first or second try, instead of the several tries it takes me now. Your project probably involves doing the prediction on a character rather than a word basis, but if you happen to be thinking along the lines of words instead of characters, I would be interested in hearing more about your work.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Pablo
Sent: Sunday, December 08, 2002 9:34 PM
To: [EMAIL PROTECTED]
Subject: [agi] general patterns induction

Hi Everyone,

I'm looking for information about "pattern induction" or "general patterns" or anything that sounds like that...

What I want to do is, given a stream of data, predict what may come next. (Yes, and then take over the world... sorry if it sounds like Pinky and the Brain!)

I guess general pattern induction is related to data compression, because if we find a pattern in a string, then we don't have to write all the characters every time the pattern appears.

Surely someone has already been working on that (who?). Could anyone please give me a clue? Is there any book I should read? Is there any book like "AI basics", "introduction to AI", or "AI for dummies" that might help first?

Thanks a lot!
Pablo Carbonell

PS: thanks Ben, Kevin and Eliezer for the previous help

-------
To unsubscribe, change your address, or temporarily deactivate your subscription, please go to http://v2.listbox.com/member/?[EMAIL PROTECTED]
Gary A. Miller
Principal Consultant, New Millennium Consulting