RE: Lucene vs. in-DB-full-text-searching

Mike Rose Fri, 18 Feb 2005 13:45:57 -0800

I can comment on this since I'm in the middle of excising Oracle text
searching and replacing it with Lucene in one of my projects.


Oracle does provide mechanisms for creating fuzzy indexes of text and
doing word stemming, and it has a scoring mechanism, etc.  However,
this requires additional licensing (or an enterprise license, big $$$),
and index creation is slow.  Unlike other indexes in Oracle, this one
needs to be explicitly dropped and recreated in order to pick up changes
to the content.  You can't update a single entry in the index; you have
to do the whole thing in one shot.  That being said, it has been
successful for me so far.  You just have to use some non-standard, funky
SQL operators to make use of it.

So why am I switching to Lucene on this project?

        Speed: Lucene is faster at indexing and searching.

        Price: I don't think I need to explain this one.

        Size: The Lucene index is tiny, which makes it easier to
        deploy to the servers that search it.

        Flexibility: If I want to change how I index or search, I
        don't need to worry about db schema evolution across multiple
        environments on the way to production.

All in all, I don't think that a JDBC wrapper is going to do what you
want.  The material you want to index is application-specific, as are
the mechanics of searching the index.  A JDBC driver isn't going to know
which of the fields you are updating you might care to index and search
later.  In the end, the approach that worked for me was to create a
config-driven wrapper that knows how to index specific properties of
POJOs.  The same config also drives the formation of the query
expressions.  This way I don't care if the content was
instantiated from a db or xml (I need to do both), or some other source.
I think one of the great benefits of Lucene is that it allows me to
embed sophisticated search functionality into my apps without being
dependent upon any particular persistence mechanism.
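
A minimal sketch of what such a config-driven wrapper might look like,
in plain Java.  The names here (PojoIndexConfig, extractFields,
buildQuery, Article) are hypothetical and are not taken from the project
described above.  The property list is hard-coded where a real wrapper
would presumably read a config file, and the step that copies the
extracted fields into a Lucene document is left out so the sketch has no
dependencies.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch: none of these names come from the post or from Lucene.
class PojoIndexConfig {
    // Property names to index; assumed to be loaded from a config file in practice.
    private final List<String> properties;

    PojoIndexConfig(List<String> properties) {
        this.properties = properties;
    }

    // Reads each configured property through its JavaBean getter and returns
    // field name -> text. Properties that are not configured are ignored.
    Map<String, String> extractFields(Object pojo) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String property : properties) {
            String getter = "get" + Character.toUpperCase(property.charAt(0))
                    + property.substring(1);
            try {
                Method method = pojo.getClass().getMethod(getter);
                Object value = method.invoke(pojo);
                if (value != null) {
                    fields.put(property, value.toString());
                }
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException("No readable property: " + property, e);
            }
        }
        return fields;
    }

    // Builds a query expression that looks for the term in every configured
    // field, in field:term form. The term is not escaped in this sketch.
    String buildQuery(String term) {
        StringJoiner query = new StringJoiner(" OR ");
        for (String property : properties) {
            query.add(property + ":" + term);
        }
        return query.toString();
    }

    public static void main(String[] args) {
        PojoIndexConfig config = new PojoIndexConfig(Arrays.asList("title", "body"));
        Article article = new Article("Lucene", "fast full-text search", "internal only");
        System.out.println(config.extractFields(article));
        System.out.println(config.buildQuery("lucene"));
    }
}

// Example POJO; it could equally have been loaded from a db or from xml.
class Article {
    private final String title;
    private final String body;
    private final String notes;

    Article(String title, String body, String notes) {
        this.title = title;
        this.body = body;
        this.notes = notes;
    }

    public String getTitle() { return title; }
    public String getBody() { return body; }
    public String getNotes() { return notes; }
}
```

Because the same property list drives both extractFields and buildQuery,
changing what gets indexed and what gets searched is a config change
rather than a schema change.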

Mike
