Tatu, I agree 100% with everything you've said.
Let's look at MySQL, for example. Great database, no doubt about it. BUT the full-text indexing/searching part is not up to snuff. I'm currently using MySQL's full-text search support on a database of 3-5 million rows. Each row is unique; let's say it's a product. Each row has several columns, but the two I search on are title and description, and I created a full-text index on those two columns. Title has approximately 100 characters, and description has 255 characters.

At the moment, MySQL takes 50+ seconds to return results for simple one-word searches. My dedicated server is a P4 2.0 GHz with 1.5 GB RAM running Red Hat Linux 7.3, with nothing else running on it (another server handles the HTTP requests). It is a dedicated MySQL box. In addition, I'm the only person making queries. Obviously, that performance is unacceptable for real-world web applications.

I'd love to try Lucene on the above, but the Lucene install fails because of JavaCC issues. I'm surprised more people haven't run into this problem, as the install instructions are out of date.

Regards,
John

-----Original Message-----
From: Tatu Saloranta [mailto:[EMAIL PROTECTED]
Sent: Wednesday, June 25, 2003 12:26 PM
To: Lucene Users List
Subject: Re: commercial websites powered by Lucene?

On Tuesday 24 June 2003 07:36, Ulrich Mayring wrote:
> Chris Miller wrote:
...
> Well, nothing against Lucene, but it doesn't solve your problem, which
> is an overloaded DB-Server. It may temporarily alleviate the effects,
> but you'll soon be at the same load again. So I'd recommend to install

I don't think that would necessarily be the case. Like you mention later on, indexing data stored in a DB does flatten it to allow faster indexing (and retrieval), and faster in this context means more efficient: not only sharing the load between the DB and the search engine, but potentially lowering the total load?
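To make the "flattening" idea a bit more concrete, here is a minimal sketch, assuming a hypothetical `Product` row with `id`, `title` and `description` (the two columns John searches on). Each row's title and description are copied out of the database into one searchable text and put into an inverted index (term -> ids of rows containing it). This is only a toy illustration of the kind of data structure search engines such as Lucene are built around; it is not Lucene's API, and the class and method names are made up for the example.

```java
import java.util.*;

public class FlattenAndIndex {
    // Hypothetical stand-in for one database row; field names are illustrative only.
    record Product(int id, String title, String description) {}

    // Inverted index: term -> sorted set of product ids whose text contains that term.
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Lower-case the text and split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // "Flatten" a row: title and description become one searchable text.
    void add(Product p) {
        for (String term : tokenize(p.title() + " " + p.description())) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(p.id());
        }
    }

    // A one-word query is a single map lookup; no rows are scanned at query time.
    SortedSet<Integer> search(String word) {
        List<String> terms = tokenize(word);
        if (terms.isEmpty()) return Collections.emptySortedSet();
        return Collections.unmodifiableSortedSet(
                postings.getOrDefault(terms.get(0), Collections.emptySortedSet()));
    }

    public static void main(String[] args) {
        FlattenAndIndex index = new FlattenAndIndex();
        // In a real setup these rows would be read from the database (e.g. over JDBC).
        index.add(new Product(1, "Red Wagon", "A sturdy steel wagon for kids"));
        index.add(new Product(2, "Garden Hose", "50 ft hose, kink resistant"));
        index.add(new Product(3, "Toy Wagon", "Wooden wagon with red wheels"));
        System.out.println(index.search("wagon")); // [1, 3]
        System.out.println(index.search("hose"));  // [2]
    }
}
```

The index is built once (and kept in sync as rows change), so the DB only pays for the bulk read during indexing, while searches hit the index instead of the database.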
The alternative, data-warehouse-like preprocessing of data for faster search, would likely be doable too, but it's usually more useful for running reports. For actual searches Lucene does its job nicely and efficiently; the biggest problems I've seen are more related to relevancy questions. But that's where tuning Lucene's ranking should be easier than trying to build your own ranking from raw database hits (unless one uses Oracle Text or such, which is pretty much a search engine on top of the DB itself). So, to me it all comes down to the "right tool for the job" aspect: DBs are good at mass retrieval of data, or using aggregate functions (on the read-only side), whereas dedicated search engines are better for, well, searching.

...
> Of course, in real life there may be political obstacles which will
> prevent you from doing the right thing as detailed above for example,
> and your only chance is to circumvent in some way - and then Lucene is a
> great way to do that. But keep in mind that you are basically
> reinventing the functionality that is already built-in in a database :)

It depends on the type of queries, but Lucene certainly has much more advanced text-searching functionality, even if the indexed content comes from a rigid structure like an RDBMS. I'm not sure using a ready-made product like Lucene is reinventing much functionality, even considering synchronization issues? So I would go as far as saying that, for searching purposes, plain vanilla RDBMSs are not all that great in the first place. Even if queries don't need advanced search features (advanced as in anything beyond exact matches and the % and _ wildcards), Lucene may well offer better search performance and functionality.
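As a rough idea of the ranking you would otherwise have to build yourself on top of raw database hits, here is a minimal sketch of plain tf-idf scoring: a document scores higher the more often it contains a query term, and rare terms count for more than common ones. This is a textbook formula chosen for illustration, not Lucene's actual scoring (Lucene's default scoring is tf-idf-based but adds further factors such as length normalization); `TfIdfRanking` and `rank` are hypothetical names.

```java
import java.util.*;

public class TfIdfRanking {
    // Lower-case the text and split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Returns indexes of matching documents, best first, scored with plain tf-idf:
    // score(d) = sum over distinct query terms t of tf(t, d) * ln(N / df(t)).
    static List<Integer> rank(List<String> docs, String query) {
        int n = docs.size();
        List<Map<String, Integer>> termFreqs = new ArrayList<>();
        Map<String, Integer> docFreq = new HashMap<>();
        for (String doc : docs) {
            Map<String, Integer> tf = new HashMap<>();
            for (String t : tokenize(doc)) tf.merge(t, 1, Integer::sum);
            termFreqs.add(tf);
            // Each document counts once toward a term's document frequency.
            for (String t : tf.keySet()) docFreq.merge(t, 1, Integer::sum);
        }

        Map<Integer, Double> scores = new HashMap<>();
        for (String term : new LinkedHashSet<>(tokenize(query))) {
            Integer df = docFreq.get(term);
            if (df == null) continue; // term occurs in no document
            double idf = Math.log((double) n / df); // rarer terms weigh more
            for (int i = 0; i < n; i++) {
                int tf = termFreqs.get(i).getOrDefault(term, 0);
                if (tf > 0) scores.merge(i, tf * idf, Double::sum);
            }
        }

        List<Integer> ranked = new ArrayList<>(scores.keySet());
        // Highest score first; ties broken by lower document index.
        ranked.sort((a, b) -> {
            int byScore = Double.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : Integer.compare(a, b);
        });
        return ranked;
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
                "Wagon wheel replacement kit",
                "Red wagon with red handle, a classic red toy",
                "Garden hose, 50 ft");
        System.out.println(rank(docs, "red wagon")); // [1, 0]
    }
}
```

Note that matching here is on whole tokens, so a query for "car" would not match "cartoon" the way a `LIKE '%car%'` pattern would.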
-+ Tatu +-

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]