Re: effect of continuous deletes on index's read performance
Your continuous deletes won't affect read performance noticeably, that's true. But you're really doing bad things with the commit after every add or delete. You haven't said whether you have a master/slave setup or not, but assuming you're searching on the same machine you're indexing to, each time you commit you're forcing the underlying searcher to close and re-open, and any attendant autowarming to occur. All to get a single document searchable. 20 times a second.

If you have a master/slave setup, you're forcing the slave to fetch the changed parts of the index every time it polls, which is better than what's happening on the master, but still rather often.

400K documents isn't very big by Solr standards, so unless you can show performance problems I wouldn't be concerned about index size. As Otis says, your per-document commit is probably hurting you far more than any index-size savings.

I'd actually think carefully about whether you need even 10-second commits. If you can stretch that out to minutes, so much the better. But it all depends upon your problem space.

Best
Erick

On Mon, Feb 6, 2012 at 2:59 AM, prasenjit mukherjee <prasen@gmail.com> wrote:
> Thanks Otis. commitWithin will definitely work for me (as I am currently
> using version 3.4, which doesn't have NRT yet). Assuming that I use
> commitWithin=10secs, are you saying that the continuous deletes (without
> commit) won't have any effect on performance? I was under the impression
> that deletes just mark the doc-ids (which essentially means the index size
> remains the same), but don't actually do the compaction until someone
> calls optimize/commit. Is my assumption not true?
>
> -Thanks, Prasenjit
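As an aside, the commitWithin approach discussed above is set per update request rather than per commit call. A minimal sketch of an XML update message (the document fields and values here are made up for illustration):

```xml
<!-- POSTed to /solr/update; commitWithin is in milliseconds,
     so Solr guarantees a commit within 10 seconds of this add -->
<add commitWithin="10000">
  <doc>
    <field name="id">doc-12345</field>
    <field name="body">some example text</field>
  </doc>
</add>
```

Many adds sent this way get folded into one commit, instead of one searcher reopen per document.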
Re: effect of continuous deletes on index's read performance
You could also try Solr 3.4 with RankingAlgorithm, as this offers NRT. You can get more information about NRT for Solr 3.4 from here: http://solr-ra.tgels.org/wiki/en/Near_Real_Time_Search_ver_3.x

Regards,
- Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org

On 2/5/2012 11:59 PM, prasenjit mukherjee wrote:
> Thanks Otis. commitWithin will definitely work for me (as I am currently
> using version 3.4, which doesn't have NRT yet). Assuming that I use
> commitWithin=10secs, are you saying that the continuous deletes (without
> commit) won't have any effect on performance? I was under the impression
> that deletes just mark the doc-ids (which essentially means the index size
> remains the same), but don't actually do the compaction until someone
> calls optimize/commit. Is my assumption not true?
>
> -Thanks, Prasenjit
Re: effect of continuous deletes on index's read performance
Pardon my ignorance: why can't the IndexWriter and IndexSearcher share the same underlying in-memory data structure, so that the IndexSearcher need not be reopened with every commit?

On 2/6/12, Erick Erickson <erickerick...@gmail.com> wrote:
> Your continuous deletes won't affect performance noticeably, that's true.
> But you're really doing bad things with the commit after every add or
> delete. Each time you commit, you're forcing the underlying searcher to
> close and re-open and any attendant autowarming to occur. All to get a
> single document searchable. 20 times a second.
> [...]

--
Sent from my mobile device
Re: effect of continuous deletes on index's read performance
On Mon, Feb 6, 2012 at 8:20 AM, prasenjit mukherjee <prasen@gmail.com> wrote:
> Pardon my ignorance: why can't the IndexWriter and IndexSearcher share
> the same underlying in-memory data structure, so that the IndexSearcher
> need not be reopened with every commit?

Because the semantics of an IndexReader in Lucene guarantee an unchanging, point-in-time view of the index, as of when that IndexReader was opened.

That said, Lucene has near-real-time readers, which keep point-in-time semantics but are very fast to open after adding/deleting docs, and do not require a (costly) commit. E.g., see my blog post: http://blog.mikemccandless.com/2011/06/lucenes-near-real-time-search-is-fast.html

The tests I ran there indexed at a highish rate (~1000 1 KB docs per second, or 1 MB of plain text per second, or ~2X Twitter's peak rate, at least as of last July), and the reopen latency was fast (~60 msec). Admittedly this was a fast machine, the index was on a good SSD, and I used NRTCachingDir and MemoryCodec for the id field. But net/net, Lucene's NRT search is very fast. It should easily handle your 20 docs/second rate, unless your docs are enormous.

Solr trunk has finally cut over to using these APIs, but unfortunately this has not been backported to Solr 3.x. You might want to check out ElasticSearch, an alternative to Solr, which does use Lucene's NRT APIs.

Mike McCandless
http://blog.mikemccandless.com
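The NRT readers Mike describes can be sketched with Lucene's writer-attached reader API. This is an editor-written illustration, not code from the thread; it assumes a recent lucene-core on the classpath, and the field name and values are made up:

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.ByteBuffersDirectory;

public class NrtDemo {
    public static void main(String[] args) throws Exception {
        ByteBuffersDirectory dir = new ByteBuffersDirectory();
        IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig());

        Document doc = new Document();
        doc.add(new StringField("id", "1", Field.Store.NO));
        writer.addDocument(doc);

        // NRT reader opened from the writer: sees the add immediately,
        // with no (costly) commit/fsync required
        DirectoryReader reader = DirectoryReader.open(writer);
        IndexSearcher searcher = new IndexSearcher(reader);
        System.out.println(searcher.count(new TermQuery(new Term("id", "1"))));

        writer.deleteDocuments(new Term("id", "1"));

        // Cheap reopen: unchanged segments are shared with the old reader,
        // only the new/changed parts are loaded
        DirectoryReader newer = DirectoryReader.openIfChanged(reader, writer);
        System.out.println(new IndexSearcher(newer).count(new TermQuery(new Term("id", "1"))));

        reader.close();
        newer.close();
        writer.close();
    }
}
```

Each reader is still a point-in-time snapshot; reopening just produces a newer snapshot cheaply, which is why per-document hard commits are unnecessary.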
effect of continuous deletes on index's read performance
I have a use case where documents are continuously added at 20 docs/sec (each doc add also doing a commit) and docs are continuously deleted at the same rate. So the searchable index size remains the same: ~400K docs (docs for the last 6 hours ~ 20*3600*6). Will there be pauses when the deletes trigger compaction, or with every commit (during adds)? How badly will they affect search response time?

-Thanks, Prasenjit
Re: effect of continuous deletes on index's read performance
Hi Prasenjit,

It sounds like at this point your main enemy might be those per-doc-add commits. Don't commit until you need to see your new docs in results. And if you need NRT, then use the softCommit option with Solr trunk (http://search-lucene.com/?q=softcommit&fc_project=Solr) or use commitWithin to limit a commit's performance damage.

Otis
Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html

From: prasenjit mukherjee <prasen@gmail.com>
To: solr-user <solr-user@lucene.apache.org>
Sent: Monday, February 6, 2012 1:17 AM
Subject: effect of continuous deletes on index's read performance

> I have a use case where documents are continuously added at 20 docs/sec
> (each doc add also doing a commit) and docs are continuously deleted at
> the same rate. So the searchable index size remains the same: ~400K docs.
> Will there be pauses when the deletes trigger compaction, or with every
> commit (during adds)? How badly will they affect search response time?
>
> -Thanks, Prasenjit
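The commit policy Otis recommends can also be set server-side instead of per request. An illustrative solrconfig.xml fragment (the maxTime values are arbitrary; autoSoftCommit and openSearcher are trunk/4.x features, not available in 3.4):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- hard commit: flush to disk at most every 60 seconds -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <!-- trunk/4.x: don't reopen the searcher on hard commit -->
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- trunk/4.x: cheap soft commit for near-real-time visibility -->
  <autoSoftCommit>
    <maxTime>10000</maxTime>
  </autoSoftCommit>
</updateHandler>
```

With this in place, clients just add and delete documents and never issue explicit commits at all.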
Re: effect of continuous deletes on index's read performance
Thanks Otis. commitWithin will definitely work for me (as I am currently using version 3.4, which doesn't have NRT yet). Assuming that I use commitWithin=10secs, are you saying that the continuous deletes (without commit) won't have any effect on performance? I was under the impression that deletes just mark the doc-ids (which essentially means the index size remains the same), but don't actually do the compaction until someone calls optimize/commit. Is my assumption not true?

-Thanks, Prasenjit

On Mon, Feb 6, 2012 at 1:13 PM, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:
> Hi Prasenjit,
>
> It sounds like at this point your main enemy might be those per-doc-add
> commits. Don't commit until you need to see your new docs in results.
> And if you need NRT, then use the softCommit option with Solr trunk or
> use commitWithin to limit a commit's performance damage.
>
> Otis
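Prasenjit's assumption about deletes only marking documents can be checked directly at the Lucene level: after a delete, numDocs drops but maxDoc still counts the deleted document until its segment is merged away. An editor-written sketch, assuming a recent lucene-core on the classpath (field names are made up):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.ByteBuffersDirectory;

public class DeleteMarkDemo {
    public static void main(String[] args) throws Exception {
        ByteBuffersDirectory dir = new ByteBuffersDirectory();
        IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig());

        for (int i = 0; i < 2; i++) {
            Document doc = new Document();
            doc.add(new StringField("id", Integer.toString(i), Field.Store.NO));
            writer.addDocument(doc);
        }
        writer.commit();

        // The delete only marks doc "0" as dead; no compaction happens here
        writer.deleteDocuments(new Term("id", "0"));
        writer.commit();

        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            System.out.println(reader.numDocs()); // live documents
            System.out.println(reader.maxDoc());  // includes the marked-deleted doc
        }
        writer.close();
    }
}
```

The deleted document's space is only reclaimed when its segment participates in a merge (or an explicit optimize/forceMerge), which is why continuous deletes by themselves don't cause commit-like pauses.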