We do have a way to recover partially, using a version number for each transaction. The same version is maintained in Lucene as one document. During startup, these numbers determine what has to be synced up. Unfortunately, Lucene is used in a webapp, so this happens "only" during a Jetty restart.
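To illustrate the startup sync described above, here is a minimal sketch in plain Java. It makes no Lucene or database calls and is not the actual implementation: the `WatermarkRecovery` class, the `versionsToReplay` method, and the in-memory map standing in for the system of record are all hypothetical names chosen for illustration. It assumes versions increase monotonically with each transaction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy model of version-based recovery; nothing here touches Lucene or a real DB. */
final class WatermarkRecovery {

    /**
     * Returns the transaction versions that would need re-indexing at startup:
     * every version in the system of record strictly above the highest version
     * known to be committed in the index (the watermark).
     */
    static List<Long> versionsToReplay(NavigableMap<Long, String> systemOfRecord,
                                       long committedWatermark) {
        // tailMap(key, false) is a view of the entries with version > watermark.
        return new ArrayList<>(systemOfRecord.tailMap(committedWatermark, false).keySet());
    }

    public static void main(String[] args) {
        // Hypothetical system of record holding transactions 1..5.
        NavigableMap<Long, String> sor = new TreeMap<>();
        for (long v = 1; v <= 5; v++) {
            sor.put(v, "record-" + v);
        }
        // Hypothetical value: the version read back from the index at startup.
        long committedWatermark = 3L;
        System.out.println(versionsToReplay(sor, committedWatermark)); // prints [4, 5]
    }
}
```

In this model, only the transactions above the stored version are replayed, so the cost of a restart depends on how much was written since the last commit rather than on the size of the whole index.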
- Vidhya

> On 21-Jun-2014, at 11:08 am, "Vitaly Funstein" <vfunst...@gmail.com> wrote:
>
> This is a better idea than what you had before, but I don't think there's
> any point in doing any commits manually at all unless you have a way of
> detecting and recovering exactly the data that hasn't been committed. In
> other words, what difference does it make whether you lost 1 index record
> or 1M, if you can't determine which records were lost and need to reindex
> everything from the start anyway, to ensure consistency between SOR and
> Lucene?
>
> On Fri, Jun 20, 2014 at 10:20 PM, Umashanker, Srividhya <
> srividhya.umashan...@hp.com> wrote:
>
>> Let me try with the NRT reader and a periodic commit, say every 5 mins,
>> in a committer thread on a need basis.
>>
>> Is there a threshold limit on how long we can go without committing? I
>> think the buffers get flushed to disk, but not in a crash-proof way. So
>> we should be good on memory.
>>
>> I should also verify whether commit() takes longer when more data has
>> piled up to commit. But it should definitely be better than committing
>> from every thread.
>>
>> Will post back after tests.
>>
>> - Vidhya
>>
>>
>>> On 21-Jun-2014, at 10:28 am, "Vitaly Funstein" <vfunst...@gmail.com>
>>> wrote:
>>>
>>> Hmm, I might have actually given you a slightly incorrect explanation wrt
>>> what happens when internal buffers fill up. There will definitely be a
>>> flush of the buffer, and segment files will be written to, but it's not
>>> actually considered a full commit, i.e. an external reader will not see
>>> these changes (yet). The exact details elude me but there are quite a few
>>> threads here on what happens during a commit (vs a flush). However, when
>>> you call IndexWriter.close() a commit will definitely happen.
>>>
>>> But in any event, if you use an NRT reader to search, then it shouldn't
>>> matter to you when the commit actually takes place. Such readers search
>>> uncommitted changes as well as those already on disk. If data durability
>>> is not a requirement for you, i.e. if you can (and probably do) reindex
>>> your data from SOR on startup, then not doing commits yourself may be the
>>> way to go. Or perhaps you could reduce the amount of data you need to
>>> reindex and still call commit() yourself periodically, though not for
>>> every write transaction, and introduce some watermarking logic whereby
>>> you detect the highest watermark committed to Lucene. Then reindex only
>>> the data from the DB from that point onward (meaning only uncommitted
>>> data is lost and needs to be recovered, but you can figure out exactly
>>> where that point is).
>>>
>>>
>>> On Fri, Jun 20, 2014 at 8:02 PM, Umashanker, Srividhya <
>>> srividhya.umashan...@hp.com> wrote:
>>>
>>>> It is non-transactional. We first write the same data to the database
>>>> in a transaction and then call the writer's addDocument. If Lucene
>>>> fails, we still hold the data to recover.
>>>>
>>>> I can avoid the commit if we use an NRT reader. We do need this to be
>>>> searchable immediately.
>>>>
>>>> Another question. I did try removing commit() in each thread and waiting
>>>> for Lucene to auto commit, with maxBufferedDocs set to 100 and
>>>> ramBufferSizeMB set to a high value so that the doc count triggers
>>>> first. But I did not see the first 100 docs in Lucene even after 500
>>>> docs.
>>>>
>>>> Is there a way for me to see when Lucene auto commits?
>>>>
>>>> If we tune the auto commit parameters appropriately, do I still need the
>>>> committer thread? Its only job is to call commit. Anyway,
>>>> add/updateDocument is already done in my writer threads.
>>>>
>>>> Thanks for your time and your suggestions!
>>>>
>>>> - Vidhya
>>>>
>>>>
>>>>> On 21-Jun-2014, at 12:09 am, "Vitaly Funstein" <vfunst...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> You could just avoid calling commit() altogether if your application's
>>>>> semantics allow this (i.e. it's non-transactional in nature). This way,
>>>>> Lucene will do commits when appropriate, based on the buffering
>>>>> settings you chose. It's generally unnecessary and undesirable to call
>>>>> commit at the end of each write, unless you seek to provide strict
>>>>> durability guarantees in your system.
>>>>>
>>>>> If you must acknowledge every write after it's been committed, set up a
>>>>> single committer thread that does this when there are any work tasks in
>>>>> the queue. Then add to that queue from your writer threads...
>>>>>
>>>>>
>>>>> On Fri, Jun 20, 2014 at 8:47 AM, Umashanker, Srividhya <
>>>>> srividhya.umashan...@hp.com> wrote:
>>>>>
>>>>>> Lucene Experts -
>>>>>>
>>>>>> Recently we upgraded to Lucene 4. We want to make use of the
>>>>>> concurrent flushing feature of Lucene.
>>>>>>
>>>>>> Indexing for us includes certain db operations and writing to Lucene,
>>>>>> ended by a commit. There may be multiple concurrent calls to the
>>>>>> Indexer to publish single/multiple records.
>>>>>>
>>>>>> So far, with the older version of Lucene, we had our indexing
>>>>>> synchronized (1 thread indexing), which means the waiting time grows
>>>>>> with concurrency and execution time.
>>>>>>
>>>>>> We are moving away from synchronized indexing to cut down the waiting
>>>>>> period. We are trying to find out if we have to limit the number of
>>>>>> threads that add documents and commit.
>>>>>>
>>>>>> Below are the tests - to publish just 1000 records with 3 text fields.
>>>>>>
>>>>>> Java 7, JVM config: -XX:MaxPermSize=384M
>>>>>> -XX:+HeapDumpOnOutOfMemoryError -Xmx400m -Xms50m -XX:MaxNewSize=100m
>>>>>> -Xss256k -XX:-UseParallelOldGC -XX:-UseSplitVerifier
>>>>>> -Djsse.enableSNIExtension=false
>>>>>>
>>>>>> IndexConfiguration being default: We also tried with changes in
>>>>>> maxThreadStates, maxBufferedDocs, ramBufferSizeMB - no impact.
>>>>>>
>>>>>> Operation                    Min time (ms)  Max time (ms)  Avg time (ms)
>>>>>> 1 thread   - commit                     65            267             85
>>>>>> 1 thread   - updateDocument              0             40              1
>>>>>> 6 threads  - commit                     83           1449         552.42
>>>>>> 6 threads  - updateDocument              0            175            1.5
>>>>>> 10 threads - commit                    154           2429            874
>>>>>> 10 threads - updateDocument              0            243            1.9
>>>>>> 20 threads - commit                     76           4351           1622
>>>>>> 20 threads - updateDocument              0            326            2.1
>>>>>>
>>>>>> The more threads that try to write to Lucene, the more updateDocument
>>>>>> and commit() become bottlenecks. In the above table, 10 and 20
>>>>>> threads have an average of 1.5 sec for 1000 commits.
>>>>>>
>>>>>> Is there some configuration or suggestions to tune the performance of
>>>>>> the 2 methods, so that our service performs better, with more
>>>>>> concurrency?
>>>>>>
>>>>>> -vidhya

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org