Re: Updates during Optimize
> The current limitation or pause is when the ram buffer is flushing to disk

So when an optimize starts and runs for ~4 hours, are you saying that DIH flushes the docs into the index during this pause?

-----
System: One server, 12 GB RAM, 2 Solr instances, 7 cores; 1 core with 31 million documents, other cores 100,000.
- Solr1 for search requests - commit every minute - 5GB Xmx
- Solr2 for update requests - delta every minute - 4GB Xmx
Re: Updates during Optimize
Not cleanly currently. SOLR-2193 (Re-architect Update Handler) should take care of this, though.

- Mark

On Apr 12, 2011, at 8:21 AM, stockii wrote:

> Hello.
>
> When I start an optimize (which takes more than 4 hours), no updates from DIH are possible. I thought Solr copies the whole index and then runs the optimize on the copy, rather than locking the index and optimizing it in place ... =(
>
> Is there any way to do both at the same time?

- Mark Miller
lucidimagination.com
Re: Updates during Optimize
On 4/12/2011 6:21 AM, stockii wrote:

> Hello.
>
> When I start an optimize (which takes more than 4 hours), no updates from DIH are possible. I thought Solr copies the whole index and then runs the optimize on the copy, rather than locking the index and optimizing it in place ... =(
>
> Is there any way to do both at the same time?

You can't index and optimize at the same time, and I'm pretty sure that there isn't any way to make it possible that wouldn't involve a major rewrite of Lucene, and possibly Solr. The devs would have to say differently if my understanding is wrong.

The optimize takes place at the Lucene level. I can't give you much in-depth information, but I can give you some high-level stuff. What it's doing is equivalent to a merge, down to one segment. This is not the same as a straight file copy. It must read the entire Lucene data structure and build a new one from scratch. The process removes deleted documents and will also upgrade the version number of the index if it was written with an older version of Lucene.

It's very likely that the reading side of the process is nearly as comprehensive as the CheckIndex program, but it also has to write out a new index segment. The net result: the process gives your CPU and especially your I/O subsystem a workout at the same time. If you were to make your I/O subsystem faster, you would probably see a major improvement in your optimize times.

On my installation, it takes about 11 minutes to optimize one of my 16GB shards, each of which holds 9 million docs. These live in virtual machines that are stored on a six-drive RAID10 array using 7200RPM SATA disks. One of my pie-in-the-sky upgrade dreams is to replace that with a four-drive RAID10 array using SSDs; the other two drives would be regular SATA disks holding a mirrored OS partition.

Thanks,
Shawn
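As a rough illustration of the merge described above, here is a toy Python model of "merge down to one segment, dropping deleted documents". It is not Lucene code: the `Segment` class and the `optimize` function are hypothetical names invented for this sketch, and real segments hold inverted-index structures rather than dictionaries. The point is only that every live document has to be read and rewritten into a new segment, which is why the operation is I/O-heavy and not a plain file copy.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Toy stand-in for an index segment (hypothetical, not a Lucene class)."""
    docs: dict                                  # doc id -> document content
    deleted: set = field(default_factory=set)   # ids marked as deleted

    def live_docs(self):
        # Deleted docs still occupy space in the segment until a merge drops them.
        return {doc_id: doc for doc_id, doc in self.docs.items()
                if doc_id not in self.deleted}


def optimize(segments):
    """Read every segment and write one new segment containing only live docs."""
    merged = {}
    for segment in segments:
        merged.update(segment.live_docs())
    # The result is a single fresh segment with no pending deletes.
    return [Segment(docs=merged)]


segments = [
    Segment({1: "a", 2: "b"}, deleted={2}),
    Segment({3: "c"}),
]
optimized = optimize(segments)
print(len(optimized), optimized[0].docs)
```

In this model the cost grows with the total number of live documents across all segments, which matches the observation that faster storage shortens optimize times.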
Re: Updates during Optimize
You can index and optimize at the same time. The current limitation, or pause, occurs when the RAM buffer is flushing to disk; however, that's changing with the DocumentsWriterPerThread implementation, e.g. LUCENE-2324.

On Tue, Apr 12, 2011 at 8:34 AM, Shawn Heisey s...@elyograg.org wrote:

> You can't index and optimize at the same time, and I'm pretty sure that there isn't any way to make it possible that wouldn't involve a major rewrite of Lucene, and possibly Solr.
>
> [...]