Tatu,

I agree 100% with everything you've said.

Let's look at MySQL for example.  Great database.  No doubt about it.

BUT, looking at the full-text indexing/searching part... it's not up to snuff.

Currently, I'm using MySQL's full-text search support. I have a database of
3-5 million rows. Each row is unique, let's say a product. Each row has
several columns, but the two I search on are title and description. I
created a full text index on title and description. Title has approximately
100 characters, and description has 255 characters.

At the moment, MySQL is taking more than 50 seconds to return results on
simple one-word searches. My dedicated server is a P4, 2.0 GHz, 1.5 GB RAM
RedHat Linux 7.3 platform, with nothing else running on it, i.e. another
server is handling HTTP requests. It is a dedicated MySQL box.  In addition,
I'm the only person making queries.

Obviously, the above performance is unacceptable for real world web
applications.

I'd love to try Lucene with the above, but the Lucene install fails because
of JavaCC issues.  Surprised more people haven't encountered this problem,
as the install instructions are out of date.

Regards,

John



-----Original Message-----
From: Tatu Saloranta [mailto:[EMAIL PROTECTED]
Sent: Wednesday, June 25, 2003 12:26 PM
To: Lucene Users List
Subject: Re: commercial websites powered by Lucene?


On Tuesday 24 June 2003 07:36, Ulrich Mayring wrote:
> Chris Miller wrote:
...
> Well, nothing against Lucene, but it doesn't solve your problem, which
> is an overloaded DB-Server. It may temporarily alleviate the effects,
> but you'll soon be at the same load again. So I'd recommend to install

I don't think that would necessarily be the case. Like you mention later on,
indexing data stored in DB does flatten it to allow faster indexing (and
retrieval), and faster in this context means more efficient, not only sharing
the load between DB and search engine, but potentially lowering total load?
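
To make the "flattening" point concrete, here is a minimal sketch of an
inverted index, assuming rows of (id, title, description) like the product
table described above. It is a toy illustration and not how Lucene itself is
implemented; the names tokenize, build_index and search are hypothetical, and
there is no stemming or ranking.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(rows):
    """Flatten (id, title, description) rows into a term -> row ids map."""
    index = defaultdict(set)
    for row_id, title, description in rows:
        for term in tokenize(title + " " + description):
            index[term].add(row_id)
    return index


def search(index, word):
    """One-word search: a single dictionary lookup, no table scan."""
    return sorted(index.get(word.lower(), set()))


# Hypothetical sample rows, standing in for rows fetched from the DB.
rows = [
    (1, "USB cable", "Two metre USB cable for printers"),
    (2, "Laser printer", "Monochrome laser printer with USB port"),
    (3, "Desk lamp", "Adjustable LED desk lamp"),
]
index = build_index(rows)
```

With the index built once, search(index, "usb") only touches the entry for
that term, so the cost of a one-word query does not grow with the number of
rows that do not contain the word.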

The alternative, data-warehouse-like preprocessing of data for faster
search, would likely be doable too, but it's usually more useful for running
reports. For actual searches Lucene does its job nicely and efficiently; the
biggest problems I've seen are more related to relevancy questions. But
that's where tuning of Lucene ranking should be easier than trying to build
your own ranking from raw database hits (except if one uses OracleText or
such that's pretty much a search engine on top of DB itself).

So, to me it all comes down to the "right tool for the job" aspect; DBs are
good at mass retrieval of data, or using aggregate functions (on the
read-only side), whereas dedicated search engines are better for, well,
searching.

...
> Of course, in real life there may be political obstacles which will
> prevent you from doing the right thing as detailed above for example,
> and your only chance is to circumvent in some way - and then Lucene is a
> great way to do that. But keep in mind that you are basically
> reinventing the functionality that is already built-in in a database :)

It depends on the type of queries, but Lucene certainly has much more
advanced text searching functionality, even if indexed content comes from a
rigid structure like an RDBMS. I'm not sure using a ready product like Lucene
is reinventing much functionality, even considering synchronization issues?

So I would go as far as saying that for searching purposes, plain vanilla
RDBMSs are not all that great in the first place. Even if queries need not
use advanced search features (advanced as in not just using % and _ in
addition to exact matches) Lucene may well offer better search performance
and functionality.
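
As a rough illustration of the functional gap, the sketch below emulates
SQL LIKE matching (% for any run of characters, _ for exactly one) and
compares it with a whole-word match. It only shows the difference in what
gets matched, not performance. The helper names are hypothetical, and
case-insensitive matching is an assumption here, since in a real database
that depends on the collation.

```python
import re


def like_to_regex(pattern):
    """Translate a SQL LIKE pattern into an anchored regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # % matches any run of characters
        elif ch == "_":
            parts.append(".")    # _ matches exactly one character
        else:
            parts.append(re.escape(ch))
    # Assumption: case-insensitive, which in a real DB is collation-dependent.
    return re.compile("^" + "".join(parts) + "$", re.IGNORECASE | re.DOTALL)


def like_scan(titles, pattern):
    """Emulate WHERE title LIKE pattern: every row has to be tested."""
    regex = like_to_regex(pattern)
    return [t for t in titles if regex.match(t)]


def word_search(titles, word):
    """Match whole words only, as a tokenizing search engine would."""
    word = word.lower()
    return [t for t in titles if word in re.findall(r"[a-z0-9]+", t.lower())]


# Hypothetical sample titles.
titles = ["Hard disk 40GB", "Diskette box", "Compact disc player"]
```

Here like_scan(titles, "%disk%") also returns "Diskette box", because LIKE
matches substrings, while word_search(titles, "disk") returns only
"Hard disk 40GB".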

-+ Tatu +-


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


