I've added some user-defined Lucene functions to
HSQLDB and I've been able to run queries like the
following one:

select top 10 lucene_highlight(adText) from ads
where pricePounds < 200
and lucene_query('bass guitar drums', id) > 0
order by lucene_score(id) DESC

I've had similar success with Derby (Cloudscape).
This approach has some appeal, and I've been able to
use the same class as a UDF in both databases, but it
does have issues: it looks like this UDF-based
integration won't scale. The above query took 80
milliseconds against 10,000 records. Another
index/database with 50,000 records was taking a matter
of seconds. I think a scalable integration is likely
to require modification of the core RDBMS code.
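
To make the shape of the UDF approach concrete, here is a
rough sketch of what such a class could look like. It is not
the code behind the numbers above: the class name, the
Searcher interface and the caching scheme are all
illustrative assumptions, and the Lucene search itself is
replaced by a stand-in so the example stays self-contained.
It assumes the database calls lucene_query once per
candidate row, so the search is run once per distinct query
string and the hits are then looked up by row id.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only; names and structure are assumptions.
// A class registered with a database would normally need to be public.
final class LuceneUdfSketch {

    /** Stand-in for a real Lucene search: query string -> (row id -> score). */
    public interface Searcher {
        Map<Integer, Float> search(String query);
    }

    private static Searcher searcher;
    private static String cachedQuery;
    private static Map<Integer, Float> cachedScores = new HashMap<>();

    /** Number of times the underlying search actually ran. */
    static int searchCount = 0;

    /** Installs the search backend and clears any cached hits. */
    public static synchronized void setSearcher(Searcher s) {
        searcher = s;
        cachedQuery = null;
        cachedScores = new HashMap<>();
        searchCount = 0;
    }

    /** Called per row: returns 1 if the row id is a hit for the query, else 0. */
    public static synchronized int luceneQuery(String query, int id) {
        if (!query.equals(cachedQuery)) {
            // Run the search once and remember the hits for later rows.
            cachedScores = searcher.search(query);
            cachedQuery = query;
            searchCount++;
        }
        return cachedScores.containsKey(id) ? 1 : 0;
    }

    /** Called per row in ORDER BY: score of the row for the last query run. */
    public static synchronized float luceneScore(int id) {
        Float score = cachedScores.get(id);
        return score == null ? 0f : score;
    }
}
```

Even with the search cached like this, the database still
invokes the function for every row it considers, since it
has no way of knowing up front which ids Lucene matched.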

I think it is worth considering developing such a
tight RDBMS integration if you consider the issues
commonly associated with using Lucene:
1) Sorting on float/date fields and associated memory
consumption
2) Representing numbers/dates in Lucene (e.g. having to
pad with sufficient leading zeros and add to the index's
list of terms)
3) Retrieving only certain stored fields from a
document (all storage can be done in the db)
4) Issues to do with updating volatile data, e.g. price
data used in sorts
5) Manually coding joins with RDBMS content as custom
filters
6) Too-many terms exceptions produced by range queries
7) Grouping results, e.g. by website
8) Boosting docs based on stored content, e.g. date
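
As an illustration of point 2: Lucene compares terms as
strings, so numbers have to be padded to a fixed width
before indexing if range queries and sorts are to follow
numeric order. A minimal sketch, where the class name and
the width of 10 digits are arbitrary choices of mine:

```java
// Hypothetical helper; the name and the fixed width are assumptions.
final class PaddedNumberSketch {

    /** Fixed term width; must cover the largest value ever indexed. */
    static final int WIDTH = 10;

    /** Left-pads a non-negative number with zeros to WIDTH characters. */
    static String pad(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values need a different encoding");
        }
        String digits = Long.toString(value);
        if (digits.length() > WIDTH) {
            throw new IllegalArgumentException("value too large for width " + WIDTH);
        }
        StringBuilder sb = new StringBuilder(WIDTH);
        for (int i = digits.length(); i < WIDTH; i++) {
            sb.append('0');
        }
        return sb.append(digits).toString();
    }
}
```

Unpadded, "200" sorts after "1000" as a string; padded,
pad(200) sorts before pad(1000), as it should. A numeric
column in the database needs none of this.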

I'm not saying there aren't answers to the above using
Lucene. However, I do wonder if these can be addressed
more effectively in a project which seeks tighter
integration with an RDBMS and leverages its
capabilities.

Has anyone else been down this route?