I've added some user-defined Lucene functions to HSQLDB and I've been able to run queries like the following one:

    select top 10 lucene_highlight(adText) from ads where pricePounds < 200 and lucene_query('bass guitar drums', id) > 0 order by lucene_score(id) DESC

I've had similar success with Derby (Cloudscape). This approach has some appeal, and I've been able to use the same class as a UDF in both databases, but it does have issues: it looks like this UDF-based integration won't scale. The above query took 80 milliseconds using 10,000 records. Another index/database with 50,000 records was taking a matter of seconds. I think a scalable integration is likely to require modification of the core RDBMS code.

I think it is worth considering developing such a tight RDBMS integration if you consider the issues commonly associated with using Lucene:

1) Sorting on float/date fields and the associated memory consumption
2) Representing numbers/dates in Lucene (e.g. having to pad with sufficient leading zeros and add to the index's list of terms)
3) Retrieving only certain stored fields from a document (all storage can be done in the db)
4) Issues to do with updating volatile data, e.g. price data used in sorts
5) Manually coding joins with RDBMS content as custom filters
6) Too-many-terms exceptions produced by range queries
7) Grouping results, e.g. by website
8) Boosting docs based on stored content, e.g. date

I'm not saying there aren't answers to the above using Lucene. However, I do wonder if these can be addressed more effectively in a project which seeks tighter integration with an RDBMS and leverages its capabilities.

Anyone else been down this route?
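
To make issue 2 above concrete, here is a minimal sketch of the zero-padding workaround. It assumes non-negative whole numbers and a fixed width chosen up front; the class and method names (PaddedNumber, pad, sortedTerms) are hypothetical and not part of Lucene or of my UDF code. The point is only that terms are compared as strings, so a number has to be padded before string order matches numeric order.

```java
import java.util.Arrays;

// Hypothetical helper, not a Lucene API: encodes non-negative numbers as
// fixed-width strings so that string (term) order matches numeric order.
final class PaddedNumber {
    private PaddedNumber() {}

    // Left-pads the decimal form of value with zeros up to the given width.
    static String pad(long value, int width) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values are not handled in this sketch");
        }
        String digits = Long.toString(value);
        if (digits.length() > width) {
            throw new IllegalArgumentException("value does not fit in " + width + " digits");
        }
        StringBuilder sb = new StringBuilder(width);
        for (int i = digits.length(); i < width; i++) {
            sb.append('0');
        }
        return sb.append(digits).toString();
    }

    // Pads every value and sorts the results as plain strings, which is how
    // terms would be ordered when compared lexicographically.
    static String[] sortedTerms(long[] values, int width) {
        String[] terms = new String[values.length];
        for (int i = 0; i < values.length; i++) {
            terms[i] = pad(values[i], width);
        }
        Arrays.sort(terms);
        return terms;
    }
}
```

Without padding, "200" sorts after "1000" as a string, so a condition like pricePounds < 200 cannot be expressed as a simple term range. With a column in the database the comparison is done on the numeric type and none of this is needed.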