This sounds like an excellent start and would certainly be useful in a
number of scenarios, but it is not quite as generally useful as it could be
given its asynchronous nature.  Generally expected database behavior is that
when a change is committed (and not before) it is immediately viewable in
all new transactions (i.e. new readers).
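In database terms, this is read-committed visibility. A minimal Java sketch of that expectation (plain in-memory maps, not Lucene code; the class and method names here are invented for illustration):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of "read-committed" visibility: a write becomes visible to every
// reader opened after commit(), and never before. Hypothetical names.
class CommittedStore {
    // The last committed snapshot; published atomically via a volatile write.
    private volatile Map<String, String> committed = new HashMap<>();
    // Uncommitted writes, invisible to all readers.
    private final Map<String, String> pending = new HashMap<>();

    public void put(String key, String value) {
        pending.put(key, value);
    }

    public void commit() {
        Map<String, String> next = new HashMap<>(committed);
        next.putAll(pending);
        pending.clear();
        committed = next; // atomic publish: readers opened from now on see it
    }

    // A "new transaction": a snapshot taken at open time.
    public Map<String, String> openReader() {
        return committed;
    }
}
```

A reader here is just a snapshot taken at open time, so a reader opened before commit() keeps seeing the old state, while any reader opened afterwards sees the change immediately.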

Would it be difficult to modify your design to act more like a traditional
database?  If such changes were made, would it still efficiently and
effectively solve the problems you mentioned below?

Scott

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, October 16, 2002 5:45 PM
> To: [EMAIL PROTECTED]
> Subject: Concurrency in Lucene
> 
> 
> My company, Epiphany, has decided to integrate our products with Lucene.
> I'm leading this effort, and for it I have developed a solution around
> Lucene that allows concurrent processes to search, insert, update and
> delete documents.
> This solution solves the following:
>       - concurrent writing (insert, update, delete) to the index (see
> http://nagoya.apache.org/bugzilla/show_bug.cgi?id=12588 and
> http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg01795.html)
>       - the non-transactional nature of Lucene. The solution puts a
> transaction around every insert, update and delete. All writes are
> guaranteed to be in the index eventually.
>       - running out of file handles.
>       - the solution does all of the book-keeping; clients do not worry
> about when to open and close IndexReader/Writer. Technically one can do
> this after every operation, but creating/deleting of the .lock file
> slows things down.
> 
> 
> In summary, every write (update, delete, insert) is made to a log file
> first. There is a worker thread that wakes up every so often, examines
> the logs, and decides whether to propagate changes (this is
> configurable). If the decision is to propagate, the thread creates new
> log files, locks the current log files, makes a copy of the index,
> merges the changes from the logs into the copy, and then hot-swaps in
> the newly created index and deletes the old logs and index. At any given
> time, search results will not contain deleted documents, but newly
> created/updated documents will not appear in search results until the
> merge is finished. The worker thread also keeps the state of the
> logs/index in case of a crash.
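The log-then-merge cycle described above can be sketched roughly as follows, with in-memory maps standing in for the log files and the index. This illustrates only the visibility rules (deletes hidden immediately, inserts and updates visible after the next merge), not the actual Epiphany code; all names are invented:

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the design described in the mail: writers append to a log and
// return at once; a worker thread periodically merges the log into a copy
// of the index and hot-swaps the result in. Hypothetical names throughout.
class LoggedIndex {
    // The live, searchable snapshot (replaced wholesale on each merge).
    private volatile Map<String, String> index = new HashMap<>();
    // Pending writes in arrival order; a null value marks a deletion.
    private final Map<String, String> log = new LinkedHashMap<>();

    public synchronized void insert(String id, String doc) { log.put(id, doc); }
    public synchronized void delete(String id) { log.put(id, null); }

    // Worker thread: copy the index, apply the log, hot-swap, drop the log.
    public synchronized void merge() {
        Map<String, String> copy = new HashMap<>(index);
        for (Map.Entry<String, String> e : log.entrySet()) {
            if (e.getValue() == null) copy.remove(e.getKey());
            else copy.put(e.getKey(), e.getValue());
        }
        index = copy; // hot-swap: new searches see the merged snapshot
        log.clear();  // merged log entries are no longer needed
    }

    // Deleted documents disappear from results immediately; new or updated
    // documents appear only after the next merge, as the mail describes.
    public synchronized String search(String id) {
        if (log.containsKey(id) && log.get(id) == null) return null;
        return index.get(id);
    }
}
```

A search between insert() and merge() still misses the buffered document (or returns the old version of an updated one), while a delete() takes effect in search() right away.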
> 
> Here are the driving factors behind this solution:
>       Need for concurrent, non-blocking writes (insert/update/delete)
>       Need for deleted documents not to show up in query results (Hits)
> once deleted
>       Lucene does not handle crashes well. The mentality is "if in
> doubt, redo the index", which does not work in some cases. Rebuilding
> the index is fast, but in our case a) it takes too many non-Lucene
> resources (documents can be stored in a database), and b) high
> availability of search is a requirement
>               - Lucene can leave .lock files behind.
>               - Lucene keeps state (documents) in memory
> 
> 
> I wanted to see how much interest is out there for such a solution and
> whether Lucene developers feel that this should be part of Lucene. If
> there is enough interest, I would like to donate this code to Lucene.
> 
> Thanks,
> 
> Kiril Zack
> 
