2008/8/21 Raymond Wan <[EMAIL PROTECTED]>:
>
> Hi Dermot,
>
> Off-topic, so I hope no one minds if I reply.
>
> Perl is good at manipulating text strings, but that doesn't usually help
> search engine implementations.  A search engine (or information retrieval
> system) has to be fast and after it has tokenized the document collection or
> query, you're basically comparing integers (i.e., a lookup table that maps
> an integer to a word in a dictionary).  Actually, even during the initial
> mapping, a C-style strcmp would be sufficient.  I doubt a fast search engine
> would actually perform string matching using regular expressions.
>
> Of course, a Perl implementation might be interesting as a learning tool for
> students.  But as an IR system that is supposed to be run in the "real world"
> and not in the classroom...I don't think you will see a Perl system anytime
> soon.  I think if you wrote quick Perl and C/C++ implementations that merely
> tokenize a collection (let's say in the range of GBs), you'll know what I am
> talking about.  Of course, in the classroom, a lecturer might just want the
> students to play with something that is a MB or less...if so, I think Perl
> would be good and students might even prefer it...  :-)
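
To make the point above concrete, here is a minimal sketch of the idea
as I understand it: words are mapped to integers once, at tokenizing
time, and after that a lookup only touches integers. It is in Python
purely for illustration and is not taken from any real engine. The toy
collection and the names DOCS, tokenize, build_index and search are all
made up for this example.

```python
import re
from collections import defaultdict

# Hypothetical toy collection; the document IDs and texts are made up.
DOCS = {
    0: "Perl is good at manipulating text strings",
    1: "A search engine has to be fast",
    2: "Perl makes a good learning tool for a search engine",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Return (term_ids, postings).

    term_ids maps each word to an integer ID; postings maps each
    integer ID to the set of document IDs containing that word.
    """
    term_ids = {}
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            # String handling happens once here; new words get the next ID.
            tid = term_ids.setdefault(word, len(term_ids))
            postings[tid].add(doc_id)
    return term_ids, postings


def search(query, term_ids, postings):
    """Return the sorted IDs of documents containing every query word."""
    result = None
    for word in tokenize(query):
        tid = term_ids.get(word)
        if tid is None:
            return []  # a word outside the dictionary matches nothing
        docs = postings[tid]
        # From here on we only intersect sets of integers.
        result = set(docs) if result is None else result & docs
    return sorted(result) if result else []


term_ids, postings = build_index(DOCS)
print(search("perl search", term_ids, postings))  # prints [2]
```

A real engine would keep the postings compressed on disk rather than
in an in-memory dict, so treat this only as a picture of the lookup
step, not of the performance.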

Thanks for all the suggestions. I am also very grateful for this
heads-up on how a text search engine is actually implemented. Now that
I understand that the engine is actually an indexed DB, I have done
a bit more digging around. I guess I will have to try a couple to see
what fits and what looks well supported and documented. Thanks for the
replies.
Dp.

-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
http://learn.perl.org/

