RE: storing index in third party database.

David Elworthy Wed, 03 Apr 2002 06:43:07 -0800

> -----Original Message-----
> From: Karl Øie [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, April 03, 2002 10:00 AM
> To: Lucene Users List
> Subject: Re: storing index in third party database.
> 
> 
> Without having investigated the problem much, I would think that a SQL
> database would be a very bad match for Lucene, as most of Lucene's work is
> creating keys for words and documents and then creating indexes of these
> keys. For these purposes a SQL database is unnecessary overhead, not to
> mention the overhead of the SQL language parser.
> 
> For these kinds of indexes a lower-level database would be better suited. I
> have had good experiences with BerkeleyDB (http://www.sleepycat.com), and a
> friend of mine uses gdbm successfully for such key-pair indexing tasks. The
> advantage of these low-level database systems is that they are really more
> or less persistent b-tree/hashtable implementations, and thus built for
> key-pairing.
> 
> They have no SQL layer, so you will have to program against them directly,
> as they are more subroutine libraries than applications. But for key-pair
> indexes I have found that BerkeleyDB runs circles around any SQL database
> (including DB2 and Oracle!!!).


I would agree with this based on my experience implementing the
ANVIL system at Canon. SQL server was far too slow for simple term
lookup. We started with gdbm and subsequently moved to Berkeley DB. BDB
was faster in general and, more importantly, supports multi-threading.
Analysis with Purify suggested that gdbm has some "uninitialized memory
read" problems. The folks at Sleepycat were also very helpful in getting
us going.
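
To make "simple term lookup" against a key-pair store concrete, here is a
minimal sketch in Python. It is not the ANVIL code and not Lucene's index
format. It uses the standard-library dbm.dumb module purely as a portable
stand-in for gdbm or Berkeley DB, and the function names build_term_index
and lookup, as well as the comma-separated posting format, are assumptions
made up for this illustration.

```python
import dbm.dumb
import os
import tempfile


def build_term_index(path, docs):
    """Store a term -> document-id list mapping in a key-value database.

    `docs` maps an integer document id to its text. Each term becomes one
    key; the value is a comma-separated list of the ids containing it.
    (Hypothetical layout, chosen only to keep the sketch short.)
    """
    postings = {}
    for doc_id, text in docs.items():
        # Naive whitespace tokenization; a real analyzer does much more.
        for term in set(text.lower().split()):
            postings.setdefault(term, []).append(doc_id)

    # "n" always creates a new, empty database at `path`.
    with dbm.dumb.open(path, "n") as db:
        for term, ids in postings.items():
            value = ",".join(str(i) for i in sorted(ids))
            db[term.encode("utf-8")] = value.encode("ascii")


def lookup(path, term):
    """Return the sorted document ids for `term`, or [] if it is absent."""
    with dbm.dumb.open(path, "r") as db:
        raw = db.get(term.lower().encode("utf-8"))
    if not raw:
        return []
    return [int(x) for x in raw.decode("ascii").split(",")]


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    index_path = os.path.join(workdir, "terms")
    build_term_index(index_path, {1: "Lucene index storage",
                                  2: "Berkeley DB index"})
    print(lookup(index_path, "index"))  # prints [1, 2]
```

Each lookup is a single keyed read with no query parsing or planning, which
is the property the quoted message is pointing at. Whether a given store
actually beats a SQL database for a particular workload still has to be
measured.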

-- David

