Re: effect of continuous deletes on index's read performance

2012-02-06 Thread Erick Erickson
Your continuous deletes won't affect performance
noticeably; that's true.

But you're really doing bad things with the commit after every
add or delete. You haven't said whether you have a master/
slave setup or not, but assuming you're searching on
the same machine you're indexing to, each time you commit,
you're forcing the underlying searcher to close and re-open and
any attendant autowarming to occur. All to get a single
document searchable. 20 times a second. If you have a master/
slave setup, you're forcing the slave to fetch the changed
parts of the index every time it polls, which is better than
what's happening on the master, but still rather often.

400K documents isn't very big by Solr standards, so unless
you can show performance problems, I wouldn't be concerned
about index size. As Otis says, your per-document commit
is probably hurting you far more than any index size
savings.

I'd actually think carefully about whether you need even
10 second commits. If you can stretch that out to minutes,
so much the better. But it all depends upon your problem
space.
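If you go the commitWithin route Otis mentioned, a minimal SolrJ
sketch looks roughly like this (a sketch only: the URL and field
names are placeholders, and I'm assuming a 3.x-era SolrJ client):

    // Add documents with commitWithin instead of an explicit commit per add.
    // Solr then commits once within the window, batching many updates.
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.request.UpdateRequest;
    import org.apache.solr.common.SolrInputDocument;

    public class CommitWithinExample {
        public static void main(String[] args) throws Exception {
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-1");          // placeholder fields
            doc.addField("text", "hello world");

            UpdateRequest req = new UpdateRequest();
            req.add(doc);
            req.setCommitWithin(10000);  // let Solr commit within 10s
            req.process(server);         // note: no explicit commit() here
        }
    }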

Best
Erick


On Mon, Feb 6, 2012 at 2:59 AM, prasenjit mukherjee
prasen@gmail.com wrote:
 Thanks Otis. commitWithin will definitely work for me (as I am
 currently using version 3.4, which doesn't have NRT yet).

 Assuming that I use commitWithin=10secs, are you saying that the
 continuous deletes (without a commit) won't have any effect on
 performance?
 I was under the impression that deletes just mark the doc-ids
 (which essentially means that the index size will remain the same), but
 won't actually do the compaction until someone calls optimize/commit. Is
 my assumption not true?

 -Thanks,
 Prasenjit

 On Mon, Feb 6, 2012 at 1:13 PM, Otis Gospodnetic
 otis_gospodne...@yahoo.com wrote:
 Hi Prasenjit,

 It sounds like at this point your main enemy might be those per-doc-add
 commits.  Don't commit until you need to see your new docs in results.  And
 if you need NRT, then use the softCommit option with Solr trunk
 (http://search-lucene.com/?q=softcommit&fc_project=Solr) or use commitWithin
 to limit the commits' performance damage.


  Otis

 
 Performance Monitoring SaaS for Solr - 
 http://sematext.com/spm/solr-performance-monitoring/index.html




 From: prasenjit mukherjee prasen@gmail.com
To: solr-user solr-user@lucene.apache.org
Sent: Monday, February 6, 2012 1:17 AM
Subject: effect of continuous deletes on index's read performance

I have a use case where documents are continuously added at 20 docs/sec
(each doc add also does a commit) and docs are continuously deleted
at the same rate, so the searchable index size remains roughly the
same: ~400K docs (docs for the last 6 hours ~ 20*3600*6).

Will there be pauses when the deletes trigger compaction, or with every
commit (while adding)? How badly will they affect search response
time?

-Thanks,
Prasenjit





Re: effect of continuous deletes on index's read performance

2012-02-06 Thread Nagendra Nagarajayya
You could also try Solr 3.4 with RankingAlgorithm as this offers NRT.  
You can get more information about NRT for Solr 3.4 from here:


http://solr-ra.tgels.org/wiki/en/Near_Real_Time_Search_ver_3.x

Regards,

- Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org

On 2/5/2012 11:59 PM, prasenjit mukherjee wrote:

Thanks Otis. commitWithin will definitely work for me (as I am
currently using version 3.4, which doesn't have NRT yet).

Assuming that I use commitWithin=10secs, are you saying that the
continuous deletes (without a commit) won't have any effect on
performance?
I was under the impression that deletes just mark the doc-ids
(which essentially means that the index size will remain the same), but
won't actually do the compaction until someone calls optimize/commit. Is
my assumption not true?

-Thanks,
Prasenjit

On Mon, Feb 6, 2012 at 1:13 PM, Otis Gospodnetic
otis_gospodne...@yahoo.com  wrote:

Hi Prasenjit,

It sounds like at this point your main enemy might be those per-doc-add commits.  Don't
commit until you need to see your new docs in results.  And if you need NRT, then use
the softCommit option with Solr trunk
(http://search-lucene.com/?q=softcommit&fc_project=Solr) or use commitWithin to limit
the commits' performance damage.


  Otis


Performance Monitoring SaaS for Solr - 
http://sematext.com/spm/solr-performance-monitoring/index.html





From: prasenjit mukherjee prasen@gmail.com
To: solr-user solr-user@lucene.apache.org
Sent: Monday, February 6, 2012 1:17 AM
Subject: effect of continuous deletes on index's read performance

I have a use case where documents are continuously added at 20 docs/sec
(each doc add also does a commit) and docs are continuously deleted
at the same rate, so the searchable index size remains roughly the
same: ~400K docs (docs for the last 6 hours ~ 20*3600*6).

Will there be pauses when the deletes trigger compaction, or with every
commit (while adding)? How badly will they affect search response
time?

-Thanks,
Prasenjit









Re: effect of continuous deletes on index's read performance

2012-02-06 Thread prasenjit mukherjee
Pardon my ignorance: why can't the IndexWriter and IndexSearcher share
the same underlying in-memory data structure, so that the IndexSearcher need
not be reopened with every commit?


On 2/6/12, Erick Erickson erickerick...@gmail.com wrote:
 Your continuous deletes won't affect performance
 noticeably; that's true.

 But you're really doing bad things with the commit after every
 add or delete. You haven't said whether you have a master/
 slave setup or not, but assuming you're searching on
 the same machine you're indexing to, each time you commit,
 you're forcing the underlying searcher to close and re-open and
 any attendant autowarming to occur. All to get a single
 document searchable. 20 times a second. If you have a master/
 slave setup, you're forcing the slave to fetch the changed
 parts of the index every time it polls, which is better than
 what's happening on the master, but still rather often.

 400K documents isn't very big by Solr standards, so unless
 you can show performance problems, I wouldn't be concerned
 about index size. As Otis says, your per-document commit
 is probably hurting you far more than any index size
 savings.

 I'd actually think carefully about whether you need even
 10 second commits. If you can stretch that out to minutes,
 so much the better. But it all depends upon your problem
 space.

 Best
 Erick


 On Mon, Feb 6, 2012 at 2:59 AM, prasenjit mukherjee
 prasen@gmail.com wrote:
 Thanks Otis. commitWithin will definitely work for me (as I am
 currently using version 3.4, which doesn't have NRT yet).

 Assuming that I use commitWithin=10secs, are you saying that the
 continuous deletes (without a commit) won't have any effect on
 performance?
 I was under the impression that deletes just mark the doc-ids
 (which essentially means that the index size will remain the same), but
 won't actually do the compaction until someone calls optimize/commit. Is
 my assumption not true?

 -Thanks,
 Prasenjit

 On Mon, Feb 6, 2012 at 1:13 PM, Otis Gospodnetic
 otis_gospodne...@yahoo.com wrote:
 Hi Prasenjit,

 It sounds like at this point your main enemy might be those per-doc-add
 commits.  Don't commit until you need to see your new docs in results.
 And if you need NRT, then use the softCommit option with Solr trunk
 (http://search-lucene.com/?q=softcommit&fc_project=Solr) or use
 commitWithin to limit the commits' performance damage.


  Otis

 
 Performance Monitoring SaaS for Solr -
 http://sematext.com/spm/solr-performance-monitoring/index.html




 From: prasenjit mukherjee prasen@gmail.com
To: solr-user solr-user@lucene.apache.org
Sent: Monday, February 6, 2012 1:17 AM
Subject: effect of continuous deletes on index's read performance

I have a use case where documents are continuously added at 20 docs/sec
(each doc add also does a commit) and docs are continuously deleted
at the same rate, so the searchable index size remains roughly the
same: ~400K docs (docs for the last 6 hours ~ 20*3600*6).

Will there be pauses when the deletes trigger compaction, or with every
commit (while adding)? How badly will they affect search response
time?

-Thanks,
Prasenjit





-- 
Sent from my mobile device


Re: effect of continuous deletes on index's read performance

2012-02-06 Thread Michael McCandless
On Mon, Feb 6, 2012 at 8:20 AM, prasenjit mukherjee
prasen@gmail.com wrote:

 Pardon my ignorance: why can't the IndexWriter and IndexSearcher share
 the same underlying in-memory data structure, so that the IndexSearcher need
 not be reopened with every commit?

Because the semantics of an IndexReader in Lucene guarantee an
unchanging point-in-time view of the index, as of when that
IndexReader was opened.

That said, Lucene has near-real-time readers, which keep point-in-time
semantics but are very fast to open after adding/deleting docs, and do
not require a (costly) commit.  EG see my blog post:


http://blog.mikemccandless.com/2011/06/lucenes-near-real-time-search-is-fast.html

The tests I ran there indexed at a highish rate (~1000 1 KB docs
per second, or 1 MB of plain text per second, or ~2X Twitter's peak rate,
at least as of last July), and the reopen latency was fast (~60
msec).  Admittedly this was a fast machine, and the index was on a
good SSD, and I used NRTCachingDir and MemoryCodec for the id field.

But net/net Lucene's NRT search is very fast.  It should easily handle
your 20 docs/second rate, unless your docs are enormous.
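For the curious, the Lucene-level usage looks roughly like this (a
sketch assuming 3.5-era signatures; earlier 3.x releases use
writer.getReader() instead of openIfChanged):

    // Rough sketch of Lucene NRT readers (Lucene 3.5-era APIs assumed).
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.store.RAMDirectory;
    import org.apache.lucene.util.Version;

    public class NrtExample {
        public static void main(String[] args) throws Exception {
            RAMDirectory dir = new RAMDirectory();
            IndexWriter writer = new IndexWriter(dir,
                new IndexWriterConfig(Version.LUCENE_35,
                    new StandardAnalyzer(Version.LUCENE_35)));

            Document doc = new Document();
            doc.add(new Field("id", "1", Field.Store.YES, Field.Index.NOT_ANALYZED));
            writer.addDocument(doc);

            // NRT reader: sees the add without a (costly) commit.
            IndexReader reader = IndexReader.open(writer, true);
            IndexSearcher searcher = new IndexSearcher(reader);

            // ... more adds/deletes via the writer ...

            // Cheap reopen: only newly flushed segments are opened.
            IndexReader newReader = IndexReader.openIfChanged(reader, writer, true);
            if (newReader != null) {
                searcher.close();
                reader.close();
                reader = newReader;
                searcher = new IndexSearcher(reader);
            }

            searcher.close();
            reader.close();
            writer.close();
        }
    }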

Solr trunk has finally cut over to using these APIs, but unfortunately
this has not been backported to Solr 3.x.  You might want to check out
ElasticSearch, an alternative to Solr, which does use Lucene's NRT
APIs.

Mike McCandless

http://blog.mikemccandless.com


effect of continuous deletes on index's read performance

2012-02-05 Thread prasenjit mukherjee
I have a use case where documents are continuously added at 20 docs/sec
(each doc add also does a commit) and docs are continuously deleted
at the same rate, so the searchable index size remains roughly the
same: ~400K docs (docs for the last 6 hours ~ 20*3600*6).

Will there be pauses when the deletes trigger compaction, or with every
commit (while adding)? How badly will they affect search response
time?

-Thanks,
Prasenjit


Re: effect of continuous deletes on index's read performance

2012-02-05 Thread Otis Gospodnetic
Hi Prasenjit,

It sounds like at this point your main enemy might be those per-doc-add
commits.  Don't commit until you need to see your new docs in results.  And if
you need NRT, then use the softCommit option with Solr trunk
(http://search-lucene.com/?q=softcommit&fc_project=Solr) or use commitWithin to
limit the commits' performance damage.
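For reference, a soft commit from SolrJ looks roughly like this. This
is a sketch, not gospel: I'm assuming the trunk-era
commit(waitFlush, waitSearcher, softCommit) overload, which 3.4 does
not have, and the client class name and field are placeholders:

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class SoftCommitExample {
        public static void main(String[] args) throws Exception {
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-1");      // placeholder field
            server.add(doc);
            // Assumed trunk-era overload: waitFlush, waitSearcher, softCommit.
            server.commit(true, true, true);  // searcher reopens cheaply; no hard flush
        }
    }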


 Otis


Performance Monitoring SaaS for Solr - 
http://sematext.com/spm/solr-performance-monitoring/index.html 




 From: prasenjit mukherjee prasen@gmail.com
To: solr-user solr-user@lucene.apache.org 
Sent: Monday, February 6, 2012 1:17 AM
Subject: effect of continuous deletes on index's read performance
 
I have a use case where documents are continuously added at 20 docs/sec
(each doc add also does a commit) and docs are continuously deleted
at the same rate, so the searchable index size remains roughly the
same: ~400K docs (docs for the last 6 hours ~ 20*3600*6).

Will there be pauses when the deletes trigger compaction, or with every
commit (while adding)? How badly will they affect search response
time?

-Thanks,
Prasenjit




Re: effect of continuous deletes on index's read performance

2012-02-05 Thread prasenjit mukherjee
Thanks Otis. commitWithin will definitely work for me (as I am
currently using version 3.4, which doesn't have NRT yet).

Assuming that I use commitWithin=10secs, are you saying that the
continuous deletes (without a commit) won't have any effect on
performance?
I was under the impression that deletes just mark the doc-ids
(which essentially means that the index size will remain the same), but
won't actually do the compaction until someone calls optimize/commit. Is
my assumption not true?
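To make my assumption concrete, this is roughly what I mean (a Lucene
3.x sketch; the field names are made up):

    // Sketch: a delete + commit only marks the doc as deleted;
    // maxDoc() stays the same until the segment is merged/optimized.
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.store.RAMDirectory;
    import org.apache.lucene.util.Version;

    public class DeleteMarkExample {
        public static void main(String[] args) throws Exception {
            RAMDirectory dir = new RAMDirectory();
            IndexWriter writer = new IndexWriter(dir,
                new IndexWriterConfig(Version.LUCENE_34,
                    new StandardAnalyzer(Version.LUCENE_34)));
            for (int i = 0; i < 100; i++) {
                Document d = new Document();
                d.add(new Field("id", "" + i, Field.Store.NO, Field.Index.NOT_ANALYZED));
                writer.addDocument(d);
            }
            writer.commit();
            writer.deleteDocuments(new Term("id", "42"));
            writer.commit();   // the delete is visible, but nothing is compacted

            IndexReader r = IndexReader.open(dir);
            System.out.println(r.numDocs());  // 99 live docs
            System.out.println(r.maxDoc());   // still 100 until segments merge
            r.close();
            writer.close();
        }
    }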

-Thanks,
Prasenjit

On Mon, Feb 6, 2012 at 1:13 PM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:
 Hi Prasenjit,

 It sounds like at this point your main enemy might be those per-doc-add
 commits.  Don't commit until you need to see your new docs in results.  And
 if you need NRT, then use the softCommit option with Solr trunk
 (http://search-lucene.com/?q=softcommit&fc_project=Solr) or use commitWithin
 to limit the commits' performance damage.


  Otis

 
 Performance Monitoring SaaS for Solr - 
 http://sematext.com/spm/solr-performance-monitoring/index.html




 From: prasenjit mukherjee prasen@gmail.com
To: solr-user solr-user@lucene.apache.org
Sent: Monday, February 6, 2012 1:17 AM
Subject: effect of continuous deletes on index's read performance

I have a use case where documents are continuously added at 20 docs/sec
(each doc add also does a commit) and docs are continuously deleted
at the same rate, so the searchable index size remains roughly the
same: ~400K docs (docs for the last 6 hours ~ 20*3600*6).

Will there be pauses when the deletes trigger compaction, or with every
commit (while adding)? How badly will they affect search response
time?

-Thanks,
Prasenjit