2008/8/21 Raymond Wan <[EMAIL PROTECTED]>:
>
> Hi Dermot,
>
> Off-topic, so I hope no one minds if I reply.
>
> Perl is good at manipulating text strings, but that doesn't usually help
> search engine implementations. A search engine (or information retrieval
> system) has to be fast, and after it has tokenized the document collection
> or query, you're basically comparing integers (i.e., a lookup table that
> maps an integer to a word in a dictionary). Actually, even during the
> initial mapping, a C-style strcmp would be sufficient. I doubt a fast
> search engine would actually perform string matching using regular
> expressions.
>
> Of course, a Perl implementation might be interesting as a learning tool
> for students. But as an IR system that is supposed to be run in the "real
> world" and not in the classroom...I don't think you will see a Perl system
> anytime soon. I think if you wrote quick Perl and C/C++ implementations
> that merely tokenize a collection (let's say in the range of GBs), you'll
> know what I am talking about. Of course, in the classroom, a lecturer
> might just want the students to play with something that is a MB or
> less...if so, I think Perl would be good and students might even prefer
> it... :-)
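
To make the "comparing integers" point above concrete, here is a minimal sketch of a toy inverted index. It is only an illustration under stated assumptions, not how any particular search engine is implemented: the function names (tokenize, build_index, search), the lower-casing rule, and the AND-only query semantics are all made up for this example, and Python is used purely for brevity. A regular expression is used for the one-off tokenizing step; after that, lookups work on integer term IDs and document numbers.

```python
import re
from collections import defaultdict

# Hypothetical tokenizing rule: runs of ASCII letters and digits, lower-cased.
TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return TOKEN.findall(text.lower())


def build_index(docs):
    """Build a toy inverted index from a list of document strings.

    Returns (term_ids, postings):
      term_ids maps each distinct word to a small integer ID;
      postings maps a term ID to the ascending list of document
      numbers that contain that word.
    """
    term_ids = {}
    postings = defaultdict(list)
    for doc_no, text in enumerate(docs):
        # dict.fromkeys removes duplicate words but keeps first-seen order,
        # so each document is recorded at most once per term.
        for word in dict.fromkeys(tokenize(text)):
            term_id = term_ids.setdefault(word, len(term_ids))
            postings[term_id].append(doc_no)
    return term_ids, postings


def search(query, term_ids, postings):
    """Return the document numbers that contain every word in the query."""
    result = None
    for word in tokenize(query):
        term_id = term_ids.get(word)
        if term_id is None:
            return []  # an unknown word cannot match any document
        docs = set(postings[term_id])
        result = docs if result is None else result & docs
    return sorted(result) if result else []
```

For example, after indexing the three strings "Perl is good at text", "C is fast" and "Perl and C are both fast", a query for "perl fast" is answered by intersecting two lists of document numbers. The strings themselves are only touched once, when the collection and the query are tokenized.
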

Thanks for all the suggestions. I am also very grateful for this heads-up
on how a text search engine is actually implemented. I now understand that
the engine is actually an indexed DB, and I have done a bit more digging
around. I guess I will have to try a couple to see what fits and what
looks well supported/documented.

Thanks for the replies.
Dp.