Re: Autocommit blocking adds? AutoCommit Speedup?
Thanks Mike, I'm running it in a few environments that do not have post-commit hooks and so far have not seen any issues. A white-box review will help catch things that may rarely occur, or any misuse of internal data structures that I do not know well enough to measure. --j

Mike Klaas wrote: Hi Jayson, It is on my list of things to do. I've been having a very busy week and am also working all weekend. I hope to get to it next week sometime, if no-one else has taken it. cheers, -mike

On 8-May-09, at 10:15 PM, jayson.minard wrote: First cut of updated handler now in: https://issues.apache.org/jira/browse/SOLR-1155 Needs review from those that know Lucene better, and a double check for errors in locking or other areas of the code. Thanks. --j

-- View this message in context: http://www.nabble.com/Autocommit-blocking-adds---AutoCommit-Speedup--tp23435224p23587440.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Autocommit blocking adds? AutoCommit Speedup?
Siddharth,

The settings you have in your solrconfig for ramBufferSizeMB and maxBufferedDocs control how much memory may be used during indexing, besides any overhead from documents being in flight at a given moment (deserialized into memory but not yet handed to Lucene). There are streaming versions of the client/server that help with that as well, by processing documents as they arrive.

The SOLR-1155 patch does not add more memory use; rather, it lets threads proceed through to Lucene without blocking within Solr as often. Instead of stuck threads holding documents in memory, you have moving threads holding the same documents. So the buffer sizes mentioned above, along with the number of documents you send at a time, drive your memory footprint. Send smaller batches (less efficient), stream, or make sure you have enough memory for the number of docs you send at a time.

For indexing, I slow my commits down if there is no need for the documents to become available for query right away. For pure indexing, a long autoCommit time and a large max document count before auto-committing helps. Committing isn't what flushes documents out of memory; it is what makes the on-disk version part of the overall index. Over-committing will slow you way down, especially if you have any listeners on the commits doing a lot of work (i.e. Solr distribution).

Also, if you are querying on the indexer, that can eat memory and compete with the memory you are trying to reserve for indexing. A split model, with indexing and querying on different instances, lets you tune each the best; the trade-off is a gap in time between indexing and querying.

It is hard to say what is going on with GC without knowing what garbage collection settings you are passing to the VM, and what version of the Java VM you are using. Which garbage collector are you using and what tuning parameters?
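For concreteness, the knobs mentioned above live in solrconfig.xml. The values below are only illustrative placeholders for the pure-indexing case described (commit rarely, let the RAM buffer do the flushing), not recommendations:

```xml
<indexDefaults>
  <!-- flush in-memory index data to disk once it reaches this size -->
  <ramBufferSizeMB>64</ramBufferSizeMB>
  <mergeFactor>10</mergeFactor>
</indexDefaults>

<updateHandler class="solr.DirectUpdateHandler2">
  <!-- pure indexing: a large max document count and a long max time
       before auto-committing keeps commit overhead low -->
  <autoCommit>
    <maxDocs>100000</maxDocs>
    <maxTime>300000</maxTime> <!-- milliseconds, i.e. 5 minutes -->
  </autoCommit>
</updateHandler>
```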
I tend to use the parallel GC on my indexers, with the GC overhead limit turned off, allowing for some pauses (which users don't see on a back-end indexer) but good throughput with lower heap fragmentation. I tend to use the concurrent mark-and-sweep GC on my query slaves, with tuned incremental mode and pacing; it is a low-pause collector that takes advantage of the cores on my servers and can incrementally keep up with the needs of a query slave. -- Jayson

Gargate, Siddharth wrote: Hi all, I am also facing the same issue where autocommit blocks all other requests. I have around 100,000 documents with an average size of 100K each. It took more than 20 hours to index. I have currently set autocommit maxtime to 7 seconds and mergeFactor to 25. Do I need more configuration changes? Also I see that memory usage goes to the peak level of the heap specified (6 GB in my case). Looks like Solr spends most of the time in GC. According to my understanding, the fix for SOLR-1155 would be that commit will run in the background and new documents will be queued in memory. But I am afraid of the memory consumption by this queue if a commit takes much longer to complete. Thanks, Siddharth
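As a rough sketch, that GC setup corresponds to HotSpot flags along these lines (Sun JDK 5/6 era flag names; the heap sizes and everything else here are placeholders to be tuned by measurement, not drop-in values):

```
# back-end indexer: throughput (parallel) collector, overhead limit off
java -Xmx6g -XX:+UseParallelGC -XX:-UseGCOverheadLimit -jar start.jar

# query slave: low-pause CMS with incremental mode and pacing
java -Xmx4g -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode \
     -XX:+CMSIncrementalPacing -jar start.jar
```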
Re: Autocommit blocking adds? AutoCommit Speedup?
Indexing speed comes down to a lot of factors: the settings talked about above, VM settings, the size of the documents, how many are sent at a time, how active you can keep the indexer (i.e. one thread sending documents lets the indexer relax, whereas N threads keep pressure on it), how often you commit, and of course the hardware you are running on. Disk I/O is a big factor, along with having enough cores and memory to buffer and process the documents. Comparing two sets of numbers is tough. We have indexes that range from indexing a few million documents an hour up through 18-20M per hour in an indexing cluster for distributed search. --j

Jack Godwin wrote: 20+ hours? I index 3 million records in 3 hours. Is your auto commit causing a snapshot? What do you have listed in the events? Jack

On 5/14/09, Gargate, Siddharth sgarg...@ptc.com wrote: [...]
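The point about keeping pressure on the indexer with N sender threads can be sketched with a plain thread pool. Note that indexBatch's body below is a hypothetical stand-in for whatever client call actually posts a batch of documents to Solr (e.g. a SolrJ add); it is not a real Solr API:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class FeederSketch {

    // Feed `batches` batches of `batchSize` docs using `threads` parallel senders.
    // The addAndGet call stands in for the real "post a batch to Solr" call.
    static int run(int threads, int batches, int batchSize) {
        AtomicInteger indexed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < batches; i++) {
            pool.submit(() -> indexed.addAndGet(batchSize));
        }
        pool.shutdown(); // no new batches; already-submitted ones still run
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return indexed.get();
    }

    public static void main(String[] args) {
        // four sender threads keep the indexer busy; one would let it idle
        System.out.println(run(4, 100, 250)); // prints 25000
    }
}
```

With one sender thread the indexer sits idle between batches; with several, it always has work queued, which is where the throughput difference comes from.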
Re: Autocommit blocking adds? AutoCommit Speedup?
Created issue: https://issues.apache.org/jira/browse/SOLR-1155

Jim Murphy wrote: Any pointers to this newer, more concurrent behavior in Lucene? I can try an experiment where I downgrade the iwCommit lock to the iwAccess lock to allow updates to happen during commit. Would you expect that to work? Thanks for bootstrapping me on this. Jim
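For background on the lock pair being discussed: iwAccess and iwCommit behave like the read and write sides of a single ReentrantReadWriteLock, so many add threads can hold the shared side concurrently while a commit takes the exclusive side and blocks them all. A minimal sketch of that pattern (class and field names here are illustrative, not Solr's actual code):

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockSketch {
    // read side = Solr's iwAccess (shared by adds),
    // write side = Solr's iwCommit (exclusive for commits)
    private static final ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
    private static final AtomicInteger docs = new AtomicInteger();

    static void addDocument() {
        rwl.readLock().lock();       // many adders may hold this at once
        try {
            docs.incrementAndGet();  // stand-in for writer.addDocument(...)
        } finally {
            rwl.readLock().unlock();
        }
    }

    static int commit() {
        rwl.writeLock().lock();      // excludes all adders while committing
        try {
            return docs.get();       // stand-in for writer.commit()
        } finally {
            rwl.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        addDocument();
        addDocument();
        System.out.println(commit()); // prints 2
    }
}
```

The experiment Jim describes amounts to holding only the read side during the Lucene commit call, so adds keep flowing; whether that is safe depends on what invariants the exclusive section was protecting.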
Re: Autocommit blocking adds? AutoCommit Speedup?
Can we move this to patch files within the JIRA issue please? It will make it easier to review and help out as a patch to current trunk. --j

Jim Murphy wrote: Yonik Seeley-2 wrote: ...your code snippet elided and edited below... Don't take this code as correct (or even compiling), but is this the essence? I moved shared access to the writer inside the read lock and kept the other non-commit bits in the write lock. I'd need to rethink the locking in a more fundamental way, but is this close to the idea?

public void commit(CommitUpdateCommand cmd) throws IOException {

  if (cmd.optimize) {
    optimizeCommands.incrementAndGet();
  } else {
    commitCommands.incrementAndGet();
  }

  Future[] waitSearcher = null;
  if (cmd.waitSearcher) {
    waitSearcher = new Future[1];
  }

  boolean error = true;
  iwCommit.lock();
  try {
    log.info("start " + cmd);
    if (cmd.optimize) {
      closeSearcher();
      openWriter();
      writer.optimize(cmd.maxOptimizeSegments);
    }
  } finally {
    iwCommit.unlock();
  }

  iwAccess.lock();
  try {
    writer.commit();
  } finally {
    iwAccess.unlock();
  }

  iwCommit.lock();
  try {
    callPostCommitCallbacks();
    if (cmd.optimize) {
      callPostOptimizeCallbacks();
    }
    // open a new searcher in the sync block to avoid opening it
    // after a deleteByQuery changed the index, or in between deletes
    // and adds of another commit being done.
    core.getSearcher(true, false, waitSearcher);

    // reset commit tracking
    tracker.didCommit();

    log.info("end_commit_flush");
    error = false;
  } finally {
    iwCommit.unlock();
    addCommands.set(0);
    deleteByIdCommands.set(0);
    deleteByQueryCommands.set(0);
    numErrors.set(error ? 1 : 0);
  }

  // if we are supposed to wait for the searcher to be registered, then we should do it
  // outside of the synchronized block so that other update operations can proceed.
  if (waitSearcher != null && waitSearcher[0] != null) {
    try {
      waitSearcher[0].get();
    } catch (InterruptedException e) {
      SolrException.log(log, e);
    } catch (ExecutionException e) {
      SolrException.log(log, e);
    }
  }
}
Re: Autocommit blocking adds? AutoCommit Speedup?
First cut of updated handler now in: https://issues.apache.org/jira/browse/SOLR-1155 Needs review from those that know Lucene better, and a double check for errors in locking or other areas of the code. Thanks. --j

jayson.minard wrote: Can we move this to patch files within the JIRA issue please? It will make it easier to review and help out as a patch to current trunk. --j

Jim Murphy wrote: [...]
Odd q.op=AND and fq interactions in Solr 1.3.0
I am seeing odd behavior where a query such as:

http://localhost:8983/solr/select/?q=moss&version=2.2&start=0&rows=10&indent=on&fq=docType%3AFancy+Doc

works until I add q.op=AND:

http://localhost:8983/solr/select/?q=moss&q.op=AND&version=2.2&start=0&rows=10&indent=on&fq=docType%3AFancy+Doc

which then returns 0 results. There is only one term in the q parameter, and I would have thought the fq parameter would be unaffected; both of its terms are present anyway, although in a String field that is not tokenized (so maybe it is inserting an AND between Fancy AND Doc, which no longer matches the untokenized string?). Is there a way to apply q.op to q and not fq, if that is indeed the problem? Cheers! -- Jayson
Re: Odd q.op=AND and fq interactions in Solr 1.3.0
By the way, the fq parameter is being used to apply a facet value as a refinement, which is why it is a string and not tokenized.

jayson.minard wrote: [...]
Re: Odd q.op=AND and fq interactions in Solr 1.3.0
Thinking about this, I could work around it by quoting the facet value so that the AND isn't inserted between the tokens in the fq parameter.

jayson.minard wrote: [...]
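For what it's worth, with the standard (Lucene) query parser a field prefix binds only to the next bare token, so the two forms parse quite differently; quoting makes the whole facet value a single phrase against the string field (assuming the standard parser is handling fq here):

```
fq=docType%3AFancy+Doc          parses as  docType:Fancy <op> Doc   (Doc hits the default field)
fq=docType%3A%22Fancy+Doc%22    parses as  docType:"Fancy Doc"      (one term against the string field)
```

With q.op=AND the first form requires both clauses to match, which is why results drop to zero; the quoted form sidesteps the operator entirely.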