Hi Karl,

what you are describing seems to be a good use case for something like a message queue: you push a document or record to a queue that guarantees its persistence. I look at this from a slightly different perspective: in a distributed environment you would have to guarantee delivery not to a single Solr instance but to several, or at least n, instances. But that is a different story.

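A minimal Java sketch can make the durable-acceptance idea concrete (Simon's persistent queue / write-ahead log, and Mark's "store the text and use fsync" suggestion). A document counts as accepted only after its bytes have been forced to disk, and the documents are fed to the index only at commit time. `DurableSpool`, `accept`, `commit` and the `indexer` callback are hypothetical names, not Solr or Lucene APIs. The sketch also assumes one single-line document per record and a single writer.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical durable spool (not a Solr/Lucene API): acceptance is decoupled
 * from index commit. Assumes single-line documents and a single writer.
 */
final class DurableSpool {
    private final Path log;

    DurableSpool(Path log) {
        this.log = log;
    }

    /** Appends one document and fsyncs before returning, so it survives a restart. */
    void accept(String doc) {
        if (doc.indexOf('\n') >= 0) {
            throw new IllegalArgumentException("sketch assumes single-line documents");
        }
        ByteBuffer buf = ByteBuffer.wrap((doc + "\n").getBytes(StandardCharsets.UTF_8));
        try (FileChannel ch = FileChannel.open(log,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            ch.force(true); // fsync: force file content and metadata to the storage device
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Replays every spooled document into the indexer, then truncates the spool.
     * If the process dies before truncation, the documents are replayed again on
     * the next commit, so the indexer must tolerate duplicates (e.g. keyed updates).
     */
    int commit(Consumer<String> indexer) {
        try {
            if (!Files.exists(log)) {
                return 0;
            }
            List<String> docs = Files.readAllLines(log, StandardCharsets.UTF_8);
            docs.forEach(indexer);
            // A real system would commit the index here, before truncating the spool.
            try (FileChannel ch = FileChannel.open(log, StandardOpenOption.WRITE)) {
                ch.truncate(0);
                ch.force(true);
            }
            return docs.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that each `accept` pays one fsync per document, which is roughly the cost Mark expects to be comparable to a commit. Batching several documents per `force` call would be the obvious way to amortize it, at the price of acknowledging them together.
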
From a Solr point of view, this sounds like a need for a write-ahead log that guarantees durability and atomicity. I like this idea, as it might also solve many problems in distributed environments (SolrCloud). It's a very interesting topic; we should investigate further in this direction.

simon

On Mon, May 24, 2010 at 10:03 PM, <karl.wri...@nokia.com> wrote:
> Hi Mark,
>
> Unfortunately, indexing performance *is* a concern; otherwise I'd already be
> committing on every post.
>
> If your guess is correct, you are basically saying that adding a document to
> an index in Solr/Lucene is just as fast as writing that file directly to
> disk, because, obviously, if we want guaranteed delivery, that's what we'd
> have to do. But I think this is worth the experiment. Solr/Lucene may be
> fast, but I doubt that it can perform as well as raw disk I/O and still
> manage to do anything in the way of document analysis or (heaven forbid)
> text extraction.
>
> -----Original Message-----
> From: ext Mark Miller [mailto:markrmil...@gmail.com]
> Sent: Monday, May 24, 2010 3:33 PM
> To: dev@lucene.apache.org
> Subject: Re: Solr updateRequestHandler and performance vs. atomicity
>
> On 5/24/10 3:10 PM, karl.wri...@nokia.com wrote:
>> Hi all,
>> It seems to me that the "commit" logic in the Solr updateRequestHandler
>> (or wherever the logic is actually located) conflates two different
>> semantics. One semantic is what you need to do to make the indexing
>> process perform well. The other semantic is guaranteed atomicity of
>> document reception by Solr.
>> In particular, it would be nice to be able to post documents in such a
>> way that you can guarantee that the document is permanently in Solr's
>> queue, safe in the event of a Solr restart, etc., even if the document
>> has not yet been "committed".
>> This issue came up in the LCF talk that I gave, and I initially thought
>> that separating the two kinds of events would necessarily be an LCF
>> change, but the more I thought about it, the more I realized that other
>> Solr indexing clients may also benefit from such a separation.
>> Does anyone agree? Where should this logic properly live?
>> Thanks,
>> Karl
>
> It's an interesting idea, but I think you would likely pay a similar
> cost to guarantee reception as you would to commit (also, I'm not sure
> Lucene guarantees it: it works for consistency, but I'm not so sure it
> achieves durability).
>
> I can think of two things offhand:
>
> Perhaps store the text and use fsync to quasi-guarantee acceptance,
> then index from the store on commit.
>
> Another, simpler idea, if only the separation is important and not the
> performance: index to a separate side index, taking advantage of Lucene's
> current commit functionality, and then use addIndexes to merge it into the
> main index on commit.
>
> Just spitballing, though.
>
> I think this would obviously need to be an optional mode.
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org