Hi Dermot,

Off-topic, so I hope no one minds if I reply.

Perl is good at manipulating text strings, but that doesn't usually help search engine implementations. A search engine (or information retrieval system) has to be fast, and once it has tokenized the document collection or the query, it is basically comparing integers (i.e., a lookup table maps each word in the dictionary to an integer, and everything after that works on the integers). Actually, even during the initial mapping, a C-style strcmp would be sufficient. I doubt a fast search engine would actually perform string matching using regular expressions.
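
To make the "comparing integers" point concrete, here is a minimal sketch. It is in Python only for brevity, and the names (tokenize, build_index, search) are made up for illustration; they are not taken from Lucene, Swish-E or any other real engine. It assumes whitespace tokenization and a tiny in-memory collection, which a real system would not get away with.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; no regular expressions involved."""
    return text.lower().split()


def build_index(docs):
    """Assign each distinct word an integer ID and record which documents contain it."""
    term_ids = {}                # word -> integer ID (the "dictionary")
    postings = defaultdict(set)  # integer ID -> set of document numbers
    for doc_no, text in enumerate(docs):
        for word in tokenize(text):
            # A new word gets the next unused integer as its ID.
            term_id = term_ids.setdefault(word, len(term_ids))
            postings[term_id].add(doc_no)
    return term_ids, postings


def search(query, term_ids, postings):
    """Return the sorted document numbers that contain every query word."""
    result = None
    for word in tokenize(query):
        term_id = term_ids.get(word)
        if term_id is None:      # word never seen: nothing can match
            return []
        docs = postings[term_id]
        result = set(docs) if result is None else result & docs
    return sorted(result) if result is not None else []


docs = [
    "Perl is good at text",
    "search engines compare integers",
    "Perl search",
]
term_ids, postings = build_index(docs)
print(search("perl search", term_ids, postings))  # [2]
```

The only string work happens once per token, in the dictionary lookup. After that, matching a query is set intersection over integer IDs and document numbers, which is why string-handling features matter so little for query speed.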

Of course, a Perl implementation might be interesting as a learning tool for students. But as an IR system that is supposed to be run in the "real world" and not in the classroom... I don't think you will see a Perl system anytime soon. I think if you wrote quick Perl and C/C++ implementations that merely tokenize a collection (let's say one in the range of GBs) and timed them, you'll know what I am talking about. Of course, in the classroom, a lecturer might just want the students to play with something that is a MB or less... if so, I think Perl would be good and students might even prefer it... :-)

Ray



Dermot wrote:
I am looking for a text search engine that has a Perl interface. I
have found a few: Lucene, OpenFTS and Swish-E. OpenFTS hasn't had a
release in the last 3 years. That makes me nervous about using it.
Lucene is Java-based. I have zero Java experience, but there is a Perl
module for a 'C++ port API of Lucene'. There is also a thread on
perlmonks about the performance penalty of tying Perl to Java. I am a
bit surprised that there isn't a more native Perl text search
engine given Perl's agility with text strings.

Could anyone recommend any of the above or suggest an alternative?

