This sounds like an excellent start and would certainly be useful in a number of scenarios, but given its asynchronous nature it is not quite as generally useful as it could be. The generally expected database behavior is that once a change is committed (and not before), it is immediately visible to all new transactions (i.e., new readers).
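To make the expected behavior concrete, here is a minimal sketch of commit-then-visible semantics. It assumes in-memory maps stand in for the index, and `SnapshotStore`, `Txn`, `begin`, `openReader` and `commit` are hypothetical names, not Lucene API. Writes are buffered in a transaction, and `commit()` atomically publishes a new immutable snapshot, so readers opened afterwards see the change while readers opened earlier keep their old view.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

public class CommitVisibilityDemo {
    /** Hypothetical store: each reader holds an immutable snapshot; commit publishes a new one. */
    static final class SnapshotStore {
        private final AtomicReference<Map<String, String>> committed =
                new AtomicReference<>(Collections.<String, String>emptyMap());

        /** A new reader sees exactly what was last committed, nothing more. */
        Map<String, String> openReader() {
            return committed.get();
        }

        Txn begin() {
            return new Txn();
        }

        /** Buffered writes, invisible to every reader until commit(). */
        final class Txn {
            private final Map<String, String> puts = new HashMap<>();
            private final Set<String> deletes = new HashSet<>();

            void put(String id, String doc) { puts.put(id, doc); deletes.remove(id); }

            void delete(String id) { deletes.add(id); puts.remove(id); }

            void commit() {
                // Copy-on-write, then publish atomically. The function has no side
                // effects, so a retry inside updateAndGet under contention is harmless.
                committed.updateAndGet(old -> {
                    Map<String, String> next = new HashMap<>(old);
                    for (String id : deletes) next.remove(id);
                    next.putAll(puts);
                    return Collections.unmodifiableMap(next);
                });
            }
        }
    }

    public static void main(String[] args) {
        SnapshotStore store = new SnapshotStore();
        SnapshotStore.Txn txn = store.begin();
        txn.put("doc1", "hello");
        Map<String, String> oldReader = store.openReader();           // opened before commit
        System.out.println(oldReader.containsKey("doc1"));            // false: not committed yet
        txn.commit();
        System.out.println(store.openReader().containsKey("doc1"));   // true: new reader sees it
        System.out.println(oldReader.containsKey("doc1"));            // false: old snapshot unchanged
    }
}
```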
Would it be difficult to modify your design to act more like a traditional database? If such changes were made, would it still efficiently and effectively solve the problems you mentioned below?

Scott

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, October 16, 2002 5:45 PM
> To: [EMAIL PROTECTED]
> Subject: Concurency in Lucene
>
> My company, Epiphany, has decided to integrate our products with Lucene.
> I'm leading this effort, and for this I have developed a solution around
> Lucene that allows concurrent processes to search, insert, update and
> delete documents.
> This solution solves the following:
> - concurrent writing (insert, update, delete) to the index (see
>   http://nagoya.apache.org/bugzilla/show_bug.cgi?id=12588 and
>   http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg01795.html)
> - the non-transactional nature of Lucene. The solution puts a transaction
>   around every insert, update and delete. All writes are guaranteed to be
>   in the index eventually.
> - running out of file handles.
> - the solution does all of the bookkeeping, so clients do not have to
>   worry about when to open and close IndexReader/IndexWriter. Technically
>   one can do this after every operation, but creating/deleting the .lock
>   file slows things down.
>
> In summary, every write (update, delete, insert) is made to a log file
> first. A worker thread wakes up every so often, examines the logs, and
> decides whether or not to propagate the changes (this is configurable).
> If it decides to propagate them, the thread creates new log files, locks
> the current log files, makes a new copy of the index, merges the changes
> from the logs into it, and then hot-swaps the newly created index and
> deletes the old logs and index. At any given time, search results will
> not contain deleted documents, but newly created/updated documents will
> not appear in search results until the merge is finished. The worker
> thread also keeps the state of the logs/index in case of a crash.
>
> These were the driving factors behind this solution:
> - Need for concurrent, non-blocking writes (insert/update/delete).
> - Need for deleted documents not to show up in the query result (Hits)
>   once deleted.
> - Lucene does not handle crashes well. The mentality is "if in doubt,
>   redo the index", which does not work in some cases. Rebuilding the
>   index is fast, but in our case a) it takes too many non-Lucene-related
>   resources (documents can be stored in a database), and b) high
>   availability of search is a requirement.
> - Lucene can leave .lock files behind.
> - Lucene keeps state (documents) in memory.
>
> I wanted to see how much interest there is in such a solution and
> whether Lucene developers feel that it should be part of Lucene. If there
> is enough interest, I would like to donate this code to Lucene.
>
> Thanks,
>
> Kiril Zack