on 2/1/01 4:05 PM, Sander Pilon at [EMAIL PROTECTED] wrote:

> 
>> on 2/1/01 1:36 PM, Gonzalo Aguilar at [EMAIL PROTECTED] wrote:
>> 
>>> Making search engines is not a trivial thing, but this may be an
>>> approach...
>> 
>> I appreciate all the replies. In my original post I also asked about the
>> FULLTEXT index type. If I may ask again, has anyone had any experience
>> with this? Can you comment on its stability (it's listed as "beta" on
>> the site)? In general, how long does something stay in that state? I
>> don't suppose this particular site will go live for another couple of
>> months (it's running fine in FileMaker now). Do FULLTEXT searches
>> perform well? Are the results good?
>> 
>> In order to get it working, I'd have to build 3.23 and I had trouble with
>> that when I first tried, but I'm sure I could get it working with a little
>> effort...
>> 
> 
> First of all, realize that fulltext search, as implemented in MySQL, is
> WEIRD. If you intend to use it as a search engine and expect it to behave
> like modern search engines do (Google, AltaVista), think again: it does
> not. (For that you want boolean search, or perhaps ranked boolean search,
> not what MySQL has now; that is 'coming soon' to a MySQL version near you.)
> I have evaluated it for search engine use (see my post in the list
> archives) and found that it didn't suit my needs, because its ranking order
> is very weird: a document containing two of the query words can rank
> below (!) one containing just one of them, depending on what MySQL thinks
> is the information value of each word.
> 
> As for stability - It has had its share of bugs, but I think most are fixed.
> 
> FULLTEXT indexes are about 95% of the size of the table, and they can
> perform quite well or very slowly, depending on what you do.

I did notice in the explanation of the FULLTEXT indexing/searching scheme
that the weight of a word decreased as it became more common in the search
pool, which might be problematic for me. These are resumes of "tech
industry" people and often a client will search for "HTML" or "C++". Some
things like that would appear in the text of _many_ resumes. At any rate,
it's probably worth playing with. I don't need boolean search capability
(the users in this case probably can't spell boolean, let alone understand a
boolean search query...)
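A toy illustration of why common words like "HTML" can be a problem under frequency-based weighting. This is not MySQL's actual FULLTEXT formula: it assumes a generic inverse-document-frequency (IDF) weight of log(N/df), a naive whitespace tokenizer, and a made-up six-resume corpus, and the `idf` and `score` helpers are hypothetical. It also reproduces the oddity Sander describes, where a resume matching two common query words ranks below one matching a single rare word.

```python
import math

# Hypothetical six-resume corpus (made-up data, for illustration only).
docs = {
    1: "html css javascript",
    2: "html css php",
    3: "html css",
    4: "html perl",
    5: "cobol mainframe",
    6: "java",
}

def tokenize(text):
    # Naive lowercase whitespace tokenizer; MySQL's real parser differs.
    return set(text.lower().split())

def idf(word, corpus):
    # Generic IDF weight log(N / df): the more resumes contain the word,
    # the smaller its weight. (An assumption, not MySQL's formula.)
    n = len(corpus)
    df = sum(1 for text in corpus.values() if word in tokenize(text))
    return math.log(n / df) if df else 0.0

def score(query, text, corpus):
    # Relevance = sum of the IDF weights of the query words present in the text.
    words = tokenize(text)
    return sum(idf(w, corpus) for w in tokenize(query) if w in words)

query = "html css cobol"
ranked = sorted(docs, key=lambda d: score(query, docs[d], docs), reverse=True)
for d in ranked:
    print(d, round(score(query, docs[d], docs), 3))
```

Resume 5 ("cobol mainframe") comes out on top at about 1.792, while resume 3 ("html css") scores only about 1.099 even though it matches two of the three query words, because "html" and "css" appear in so many resumes.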

A completely different possibility is to keep the resume and cover letter
text in individual files somewhere (perhaps named by candidate id) and use
some other tool to index and search them. If I did this, I would still need
to coordinate the search with mysql. For instance, they would want to be
able to limit the search to a certain state, or a certain date range (date
of submission)...has anybody done anything like this? Since these tools rank
things by relevance, and limit the quantity of results, I think I would need
to limit the search set with a mysql query _before_ doing the full text
search...
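A minimal sketch of the "filter in the database first, then search the text" order of operations, assuming each resume is stored as `<candidate id>.txt` in one directory. Python's standard `sqlite3` module stands in for the MySQL connection only so the example is self-contained; the `candidates` table, its `state` and `submitted` columns, and the `prefilter_ids`/`search_files` helpers are all hypothetical, and the substring count is a placeholder for whatever relevance ranking the external search tool would actually provide.

```python
import os
import sqlite3
import tempfile

def prefilter_ids(conn, state, start, end):
    # Step 1: let the database narrow the candidate set by state and
    # submission date before any text searching happens.
    rows = conn.execute(
        "SELECT id FROM candidates WHERE state = ? AND submitted BETWEEN ? AND ? "
        "ORDER BY id",
        (state, start, end),
    )
    return [row[0] for row in rows]

def search_files(resume_dir, ids, term):
    # Step 2: search only the resume files (named <id>.txt) for those ids.
    # Crude placeholder ranking: count case-insensitive substring hits.
    hits = []
    for cid in ids:
        path = os.path.join(resume_dir, f"{cid}.txt")
        if not os.path.exists(path):
            continue
        with open(path, encoding="utf-8") as f:
            count = f.read().lower().count(term.lower())
        if count:
            hits.append((cid, count))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

# Demo with made-up data; sqlite3 stands in for the MySQL connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE candidates (id INTEGER PRIMARY KEY, state TEXT, submitted TEXT)")
conn.executemany(
    "INSERT INTO candidates VALUES (?, ?, ?)",
    [(1, "CA", "2001-01-15"), (2, "CA", "2000-06-01"),
     (3, "NY", "2001-01-20"), (4, "CA", "2001-01-28")],
)

resumes = {1: "HTML, C++, HTML templates", 2: "HTML", 3: "HTML and Perl", 4: "Java, HTML"}
with tempfile.TemporaryDirectory() as resume_dir:
    for cid, text in resumes.items():
        with open(os.path.join(resume_dir, f"{cid}.txt"), "w", encoding="utf-8") as f:
            f.write(text)
    ids = prefilter_ids(conn, "CA", "2001-01-01", "2001-01-31")
    results = search_files(resume_dir, ids, "html")

print(ids, results)
```

Because the database filter runs first, any relevance ranking or result limit the search tool applies only ever sees candidates in the right state and date range, so a limit cannot crowd out matching candidates with ones that would later be filtered away.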

I saw one recommendation to use HTDig, or something like that. I could also
use the Apple Information Access Toolkit, as this is on Mac OS X. Any
feedback on either of these?

Also, I don't consider disk usage an issue at all. As long as performance is
good, they can put as much disk space into it as needed.

Thanks,

Geoff


