Hi Mark,

Unfortunately, indexing performance *is* of concern; otherwise I'd already be committing on every post.

If your guess is correct, you are basically saying that adding a document to an index in Solr/Lucene is just as fast as writing that file directly to disk, because, obviously, if we want guaranteed delivery, that's what we'd have to do. But I think this is worth the experiment - Solr/Lucene may be fast, but I have doubts that it can perform as well as raw disk I/O and still manage to do anything in the way of document analysis or (heaven forbid) text extraction.

-----Original Message-----
From: ext Mark Miller [mailto:markrmil...@gmail.com]
Sent: Monday, May 24, 2010 3:33 PM
To: dev@lucene.apache.org
Subject: Re: Solr updateRequestHandler and performance vs. atomicity

On 5/24/10 3:10 PM, karl.wri...@nokia.com wrote:
> Hi all,
>
> It seems to me that the "commit" logic in the Solr updateRequestHandler
> (or wherever the logic is actually located) conflates two different
> semantics. One semantic is what you need to do to make the index process
> perform well. The other semantic is guaranteed atomicity of document
> reception by Solr.
>
> In particular, it would be nice to be able to post documents in such a
> way that you can guarantee that the document is permanently in Solr's
> queue, safe in the event of a Solr restart, etc., even if the document
> has not yet been "committed".
>
> This issue came up in the LCF talk that I gave, and I initially thought
> that separating the two kinds of events would necessarily be an LCF
> change, but the more I thought about it the more I realized that other
> Solr indexing clients may also benefit from such a separation.
>
> Does anyone agree? Where should this logic properly live?
>
> Thanks,
> Karl

It's an interesting idea - but I think you would likely pay a similar cost to guarantee reception as you would to commit (also, I'm not sure Lucene guarantees it - it works for consistency, but I'm not so sure it achieves durability).
I can think of two things offhand. Perhaps store the text and use fsync to quasi-guarantee acceptance, then index from the store on the commit.

Another, simpler idea, if only the separation is important and not the performance: index to a separate side index, taking advantage of Lucene's current commit functionality, and then use addIndexes to merge it into the main index on commit.

Just spitballing, though. I think this would obviously need to be an optional mode.

--
- Mark

http://www.lucidimagination.com

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
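
For illustration, the first suggestion in this thread (store the text, fsync it, then index from the store on commit) might look roughly like the Python sketch below. It is not Solr or Lucene code. The names `DurableSpool`, `accept`, `pending`, `commit`, and the `index_batch` callback are hypothetical, and the newline-delimited JSON log is an assumption made only to keep the example short.

```python
import json
import os


class DurableSpool:
    """Append-only document log, fsynced before a post is acknowledged.

    Hypothetical sketch: the class and method names are illustrative
    and do not correspond to any Solr or Lucene API.
    """

    def __init__(self, path):
        self.path = path
        # Append mode keeps documents that were spooled before a restart.
        self._log = open(path, "ab")

    def accept(self, doc):
        """Write one document and force it to disk before returning."""
        line = json.dumps(doc).encode("utf-8") + b"\n"
        self._log.write(line)
        self._log.flush()              # move Python's buffer to the OS
        os.fsync(self._log.fileno())   # ask the OS to write it to the device

    def pending(self):
        """Return every spooled document, oldest first."""
        with open(self.path, "rb") as f:
            return [json.loads(line) for line in f if line.strip()]

    def commit(self, index_batch):
        """Hand all spooled documents to the indexer, then empty the spool.

        index_batch is a hypothetical callback; it is assumed to add the
        documents to the index and durably commit them before returning.
        """
        docs = self.pending()
        index_batch(docs)
        # Only drop the spooled copies once the indexer has returned.
        self._log.truncate(0)
        self._log.flush()
        os.fsync(self._log.fileno())
        return len(docs)

    def close(self):
        self._log.close()
```

In this sketch a post would be acknowledged only after `accept` returns, and after a restart `pending` still lists everything that was accepted but not yet committed. If the process dies after `index_batch` returns but before the truncate, the same documents are replayed on the next commit, so the indexer would need to tolerate duplicates (for example by replacing on a unique key). The guarantee is also only as strong as the platform's `fsync`, which is presumably why it was described as a quasi-guarantee.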