Re: Text search engine [OT]

Raymond Wan Wed, 20 Aug 2008 19:12:31 -0700


Hi Dermot,


Off-topic, so I hope no one minds if I reply.

Perl is good at manipulating text strings, but that doesn't usually help search engine implementations. A search engine (or information retrieval system) has to be fast, and after it has tokenized the document collection or query, you're basically comparing integers (i.e., a lookup table that maps an integer to a word in a dictionary). Actually, even during the initial mapping, a C-style strcmp would be sufficient. I doubt a fast search engine would actually perform string matching using regular expressions.
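
To make the "comparing integers" point concrete, here is a rough sketch, in Python only for brevity. The function names (tokenize, build_index, search) and the whitespace tokenizer are assumptions made up for illustration, not taken from any real engine. Each distinct word gets an integer ID once, and everything after that works on the IDs:

```python
from collections import defaultdict


def tokenize(text):
    # Naive whitespace tokenizer (an assumption for this sketch);
    # real systems also handle punctuation, stemming, and so on.
    return text.lower().split()


def build_index(docs):
    term_ids = {}                 # word -> integer term ID
    postings = defaultdict(list)  # term ID -> sorted list of document IDs
    for doc_id, text in enumerate(docs):
        for word in tokenize(text):
            # Assign the next free integer the first time a word is seen.
            term_id = term_ids.setdefault(word, len(term_ids))
            plist = postings[term_id]
            # Record each document at most once per term.
            if not plist or plist[-1] != doc_id:
                plist.append(doc_id)
    return term_ids, postings


def search(query, term_ids, postings):
    # Conjunctive (AND) query: a document must contain every query word.
    result = None
    for word in tokenize(query):
        term_id = term_ids.get(word)
        if term_id is None:
            return []             # unknown word, so nothing can match
        docs = set(postings[term_id])
        result = docs if result is None else result & docs
    return sorted(result) if result else []


docs = [
    "Perl is good at text",
    "search engine in C",
    "Perl search engine",
]
term_ids, postings = build_index(docs)
print(search("perl search", term_ids, postings))
```

The only string work is the one-time lookup of each word in term_ids. After that, answering a query is set intersection over integer document IDs, with no pattern matching involved.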

Of course, a Perl implementation might be interesting as a learning tool for students. But as an IR system that is supposed to be run in the "real world" and not in the classroom...I don't think you will see a Perl system anytime soon. I think if you wrote quick Perl and C/C++ implementations that merely tokenize a collection (let's say in the range of GBs), you'll know what I am talking about. Of course, in the classroom, a lecturer might just want the students to play with something that is a MB or less...if so, I think Perl would be good and students might even prefer it... :-)


Ray



Dermot wrote:

I am looking for a text search engine that has a Perl interface. I
have found a few: Lucene, OpenFTS and Swish-E. OpenFTS hasn't had a
release in the last 3 years. That makes me nervous about using it.
Lucene is Java based. I have zero Java experience, but there is a Perl
module for a 'C++ port API of Lucene'. There is also a thread on
perlmonks about the performance penalty of tying Perl to Java. I am a
bit surprised that there isn't a more native Perl text search
engine given Perl's agility with text strings.

Could anyone recommend any of the above or suggest an alternative?



