The fourth workload:
  - Upsert. Every operation is a delete followed by an insert.
    75% of the deletes do not match any document already inserted.
    25% of the deletes match some document inserted.
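
To make the shape of this workload concrete, here is a toy sketch in Java. It is only a model: a plain HashMap stands in for the index, and it does not use Lucene's IndexWriter or IndexModifier. The class name UpsertWorkloadSketch, its methods, and the "doc-N" key scheme are all made up for illustration. Every fourth operation reuses a key that was inserted earlier, so one delete in four matches.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the upsert workload. A HashMap stands in for the index (not Lucene). */
class UpsertWorkloadSketch {
    // Hypothetical "primary key" -> document body.
    private final Map<String, String> docsByKey = new HashMap<>();
    private int matchedDeletes = 0;
    private int unmatchedDeletes = 0;

    /** One upsert: delete whatever is stored under the key, then insert. */
    void upsert(String key, String body) {
        if (docsByKey.remove(key) != null) {
            matchedDeletes++;      // the delete found an older document
        } else {
            unmatchedDeletes++;    // the delete matched nothing
        }
        docsByKey.put(key, body);
    }

    int matchedDeletes()   { return matchedDeletes; }
    int unmatchedDeletes() { return unmatchedDeletes; }
    int size()             { return docsByKey.size(); }

    /** Runs the workload: three fresh keys, then one reused key, repeated. */
    static UpsertWorkloadSketch run(int operations) {
        UpsertWorkloadSketch index = new UpsertWorkloadSketch();
        int nextNewKey = 0;
        for (int i = 0; i < operations; i++) {
            String key;
            if (i % 4 == 3) {
                // Reuse a key that already exists, so the delete matches.
                key = "doc-" + (i % nextNewKey);
            } else {
                // Fresh key, so the delete matches nothing.
                key = "doc-" + nextNewKey++;
            }
            index.upsert(key, "body " + i);
        }
        return index;
    }

    public static void main(String[] args) {
        UpsertWorkloadSketch r = run(1000);
        System.out.println("matched=" + r.matchedDeletes()
                + " unmatched=" + r.unmatchedDeletes()
                + " docs=" + r.size());
    }
}
```

With 1000 operations this gives 250 matching deletes and 750 non-matching ones, which is the 25%/75% split described above. It says nothing about timing; it only shows the operation mix.
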
The new IndexWriter took 136 min. The current IndexModifier has been
running for 18 hours and hasn't finished...

For your convenience, here are the performance results for the first
three workloads again.

                                 current       current         new
Workload                         IndexWriter   IndexModifier   IndexWriter
-----------------------------------------------------------------------
Insert only                      116 min       119 min         116 min
Insert/delete (big batches)      --            135 min         125 min
Insert/delete (small batches)    --            338 min         134 min

Regards,
Ning

Ning Li
Search Technologies
IBM Almaden Research Center
650 Harry Road
San Jose, CA 95120


From:    Ning Li/Almaden/[EMAIL PROTECTED]
Date:    05/09/2006 04:54 PM
Please respond to java-dev
To:      java-dev@lucene.apache.org
cc:
Subject: Re: Supporting deleteDocuments in IndexWriter (Code and
         Performance Results Provided)

The machine is swamped with tests. I will run the experiment when the
machine is free.

Regards,
Ning


From:    Otis Gospodnetic <otis_gospodnetic@yahoo.com>
Date:    05/09/2006 07:30 AM
Please respond to java-dev
To:      java-dev@lucene.apache.org
cc:
Subject: Re: Supporting deleteDocuments in IndexWriter (Code and
         Performance Results Provided)

I agree - a delete (typically for a Term that represents a "primary
key" for a Document in an index) followed by re-add of a Document is a
very common scenario, and I'd love to see the numbers for that.

Thanks,
Otis

> We experimented with three workloads:
>   - Insert only. 1.6M documents were inserted and the final
>     index size was 2.3GB.
>   - Insert/delete (big batches). The same documents were
>     inserted, but 25% were deleted. 1000 documents were
>     deleted for every 4000 inserted.
>   - Insert/delete (small batches). In this case, 5 documents
>     were deleted for every 20 inserted.

Thanks, these benchmarks are very important.

If you can do it, I'd love to see the results of a fourth benchmark,
which represents a typical situation (which you also mentioned) of
document updates: every single insert is preceded by a delete, 25% of
which actually delete (the updated document existed previously) and the
rest end up not finding an old document and not deleting anything.

I expect this benchmark to show an even greater improvement of your
approach over the naive IndexModifier.

--
Nadav Har'El