Solr on netty

2012-02-22 Thread prasenjit mukherjee
Is anybody aware of any effort to port Solr to Netty (or any other
async-IO-based framework)?

Even at medium load (10 parallel clients) with 16 shards, performance
seems to deteriorate quite sharply as load increases, compared to an
alternative async-IO-based solution.

-Prasenjit

-- 
Sent from my mobile device


Re: Solr on netty

2012-02-22 Thread prasenjit mukherjee
Thanks for the response.

Yes, we have 16 shards/partitions, each on one of 16 different nodes,
and a separate master Solr receiving continuous parallel requests from
10 client threads running on a single separate machine.
Our observation was that performance degraded non-linearly as the load
(number of concurrent clients) increased.

Some follow-up questions:

1. What is the default max number of threads configured when a Solr
instance makes calls to the other 16 partitions?

2. How do I increase the max number of connections for Solr-to-Solr
interactions, as you mentioned in your mail?



On 2/22/12, Yonik Seeley yo...@lucidimagination.com wrote:
 On Wed, Feb 22, 2012 at 9:27 AM, prasenjit mukherjee
 prasen@gmail.com wrote:
 Is anybody aware of any effort to port Solr to Netty (or any other
 async-IO-based framework)?

 Even at medium load (10 parallel clients) with 16 shards, performance
 seems to deteriorate quite sharply as load increases, compared to an
 alternative async-IO-based solution.

 By 16 shards do you mean you have 16 nodes and each single client
 request causes a distributed search across all of them?  How many
 concurrent requests are your 10 clients making to each node?

 NIO works well when there are many clients but servicing those
 client requests needs only intermittent CPU.  That's not the pattern
 we see for search.
 You *can* easily configure Solr's Jetty to use NIO when accepting
 client connections, but it won't do you any good, just as switching to
 Netty wouldn't do anything here.

 Where NIO could help a little is with the requests that Solr makes to
 other Solr instances.  Solr is already architected for async
 request-response to other nodes, but the current underlying
 implementation uses HttpClient 3 (which doesn't have NIO).

 Anyway, it's unlikely that NIO vs BIO will make much of a difference
 with the numbers you're talking about (16 shards).

 Someone else reported that we have the number of connections per host
 set too low, and they saw big gains by increasing this.  There's an
 issue open to make this configurable in 3x:
 https://issues.apache.org/jira/browse/SOLR-3079
 We should probably up the max connections per host by default.
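
 For reference, a sketch of what that configuration might look like
 once SOLR-3079 makes the limit settable. The element and parameter
 names below are assumptions based on the shard-handler code, not
 final syntax; check the issue and your Solr version before using it:

 ```xml
 <!-- solrconfig.xml (hypothetical sketch, not confirmed syntax):
      raise the per-host connection limit used for the Solr-to-Solr
      shard requests made during a distributed search -->
 <shardHandlerFactory class="HttpShardHandlerFactory">
   <int name="maxConnectionsPerHost">20</int>
 </shardHandlerFactory>
 ```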

 -Yonik
 lucidimagination.com


-- 
Sent from my mobile device


Re: effect of continuous deletes on index's read performance

2012-02-06 Thread prasenjit mukherjee
Pardon my ignorance: why can't the IndexWriter and IndexSearcher share
the same underlying in-memory data structure, so that the
IndexSearcher need not be reopened with every commit?


On 2/6/12, Erick Erickson erickerick...@gmail.com wrote:
 Your continuous deletes won't affect performance
 noticeably, that's true.

 But you're really doing bad things with the commit after every
 add or delete. You haven't said whether you have a master/
 slave setup or not, but assuming you're searching on
 the same machine you're indexing to, each time you commit,
 you're forcing the underlying searcher to close and re-open and
 any attendant autowarming to occur. All to get a single
 document searchable. 20 times a second. If you have a master/
 slave setup, you're forcing the slave to fetch the changed
 parts of the index every time it polls, which is better than
 what's happening on the master, but still rather often.

 400K documents isn't very big by Solr standards, so unless
 you can show performance problems, I wouldn't be concerned
 about index size; as Otis says, your per-document commit
 is probably hurting you far more than any index-size
 savings.

 I'd actually think carefully about whether you need even
 10 second commits. If you can stretch that out to minutes,
 so much the better. But it all depends upon your problem
 space.
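
 To put numbers on the advice above, a back-of-envelope sketch using
 the 20 docs/sec add rate from the original post:

 ```python
 # Searcher reopens per hour implied by each commit policy,
 # at the 20 adds/sec rate described in this thread.
 adds_per_sec = 20

 per_doc_commit = adds_per_sec * 3600   # commit on every add
 commit_within_10s = 3600 // 10         # commitWithin=10000 (10s)
 commit_every_5min = 3600 // 300        # stretched out to minutes

 print(per_doc_commit)     # 72000 reopens/hour
 print(commit_within_10s)  # 360 reopens/hour
 print(commit_every_5min)  # 12 reopens/hour
 ```

 Each reopen also triggers any configured autowarming, so the gap
 between these policies is larger than the raw counts suggest.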

 Best
 Erick


 On Mon, Feb 6, 2012 at 2:59 AM, prasenjit mukherjee
 prasen@gmail.com wrote:
 Thanks Otis. commitWithin will definitely work for me (as I am
 currently using version 3.4, which doesn't have NRT yet).

 Assuming that I use commitWithin=10secs, are you saying that the
 continuous deletes (without commit) won't have any effect on
 performance?
 I was under the impression that deletes just mark the doc-ids
 (essentially meaning that the index size will remain the same) but
 won't actually do the compaction until someone calls optimize/commit;
 is my assumption not true?

 -Thanks,
 Prasenjit

 On Mon, Feb 6, 2012 at 1:13 PM, Otis Gospodnetic
 otis_gospodne...@yahoo.com wrote:
 Hi Prasenjit,

 It sounds like at this point your main enemy might be those per-doc-add
 commits.  Don't commit until you need to see your new docs in results.
 And if you need NRT then use softCommit option with Solr trunk
 (http://search-lucene.com/?q=softcommit&fc_project=Solr) or use
 commitWithin to limit commit's performance damage.


  Otis

 
 Performance Monitoring SaaS for Solr -
 http://sematext.com/spm/solr-performance-monitoring/index.html




 From: prasenjit mukherjee prasen@gmail.com
To: solr-user solr-user@lucene.apache.org
Sent: Monday, February 6, 2012 1:17 AM
Subject: effect of continuous deletes on index's read performance

I have a use case where documents are continuously added @ 20 docs/sec
(each doc add also does a commit) and docs are continuously deleted
at the same rate, so the searchable index size remains the same:
~400K docs (docs for the last 6 hours ~ 20*3600*6).

Will there be pauses when deletes trigger compaction, or with every
commit (during adds)? How badly will they affect search response time?

-Thanks,
Prasenjit





-- 
Sent from my mobile device


effect of continuous deletes on index's read performance

2012-02-05 Thread prasenjit mukherjee
I have a use case where documents are continuously added @ 20 docs/sec
(each doc add also does a commit) and docs are continuously deleted
at the same rate, so the searchable index size remains the same:
~400K docs (docs for the last 6 hours ~ 20*3600*6).
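
The steady-state size above can be checked with quick arithmetic:

```python
# Steady-state searchable docs: adds and deletes balance, so the index
# holds only the last 6 hours of documents at 20 docs/sec.
docs_per_sec = 20
retention_hours = 6

searchable_docs = docs_per_sec * 3600 * retention_hours
print(searchable_docs)  # 432000, i.e. the ~400K figure above
```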

Will there be pauses when deletes trigger compaction, or with every
commit (during adds)? How badly will they affect search response time?

-Thanks,
Prasenjit


Re: effect of continuous deletes on index's read performance

2012-02-05 Thread prasenjit mukherjee
Thanks Otis. commitWithin will definitely work for me (as I am
currently using version 3.4, which doesn't have NRT yet).

Assuming that I use commitWithin=10secs, are you saying that the
continuous deletes (without commit) won't have any effect on
performance?
I was under the impression that deletes just mark the doc-ids
(essentially meaning that the index size will remain the same) but
won't actually do the compaction until someone calls optimize/commit;
is my assumption not true?

-Thanks,
Prasenjit

On Mon, Feb 6, 2012 at 1:13 PM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:
 Hi Prasenjit,

 It sounds like at this point your main enemy might be those per-doc-add 
 commits.  Don't commit until you need to see your new docs in results.  And 
 if you need NRT then use softCommit option with Solr trunk 
 (http://search-lucene.com/?q=softcommit&fc_project=Solr) or use commitWithin
 to limit commit's performance damage.


  Otis

 
 Performance Monitoring SaaS for Solr - 
 http://sematext.com/spm/solr-performance-monitoring/index.html




 From: prasenjit mukherjee prasen@gmail.com
To: solr-user solr-user@lucene.apache.org
Sent: Monday, February 6, 2012 1:17 AM
Subject: effect of continuous deletes on index's read performance

I have a use case where documents are continuously added @ 20 docs/sec
(each doc add also does a commit) and docs are continuously deleted
at the same rate, so the searchable index size remains the same:
~400K docs (docs for the last 6 hours ~ 20*3600*6).

Will there be pauses when deletes trigger compaction, or with every
commit (during adds)? How badly will they affect search response time?

-Thanks,
Prasenjit





SolrReplication configuration with frequent deletes and updates

2012-02-01 Thread prasenjit mukherjee
I have the following requirements :

1. Adds: 20 docs/sec
2. Searches: 100 searches/sec
3. Deletes: ~12 million (20*3600*24*7) docs/week (basically a cron
job which deletes all documents more than 7 days old)

I am thinking of having 6 shards (each holding 2 million docs),
with 1 master and 2 slaves using SolrReplication. I have the
following questions:

1. With 50 searches/sec per shard on 2 million docs, what would be
the tentative response time? I am hoping to keep it under 100 ms.
2. What would be a reasonable latency (pollInterval) on the slaves for
SolrReplication (all slaves connected with a single backplane)? Is a
1-minute pollInterval reasonable?
3. Is NRT a better/viable option compared to SolrReplication ?
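
A quick sanity check of the sizing implied by the numbers above
(a sketch; the 2-slave and 6-shard figures are from the proposed
setup):

```python
# Sizing implied by the requirements and the proposed 6-shard layout.
adds_per_sec = 20
retention_days = 7
shards = 6
slaves_per_shard = 2
searches_per_sec = 100

docs_retained = adds_per_sec * 3600 * 24 * retention_days
docs_per_shard = docs_retained // shards

# Every distributed search fans out to all shards, so each shard sees
# the full query rate; with 2 slaves, each replica serves half of it.
qps_per_replica = searches_per_sec / slaves_per_shard

print(docs_retained)    # 12096000 (~12 million/week, as stated)
print(docs_per_shard)   # 2016000 (~2 million per shard)
print(qps_per_replica)  # 50.0 (the "50 searches/sec per shard" in Q1)
```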

-Thanks,
Prasenjit


Re: SolrReplication configuration with frequent deletes and updates

2012-02-01 Thread prasenjit mukherjee
Appreciate your reply. Have some more follow up questions inline.

On Thu, Feb 2, 2012 at 12:35 AM, Emmanuel Espina
espinaemman...@gmail.com wrote:
 1. Adds: 20 docs/sec
 2. Searches: 100 searches/sec
 3. Deletes: ~12 million (20*3600*24*7) docs/week (basically a cron
 job which deletes all documents more than 7 days old)

 I am thinking of having 6 shards (each holding 2 million docs),
 with 1 master and 2 slaves using SolrReplication. I have the
 following questions:

 1. With 50 searches/sec per shard on 2 million docs, what would be
 the tentative response time? I am hoping to keep it under 100 ms.

 Those are quite a lot of searches per second, considering that each
 one has to search all 6 shards (coordination and network latency
 affect the results). The components you use and the complexity of
 the query (as well as the number of segments in each shard) also
 affect the time. 100 ms is probably a low threshold for your
 requirements; you will probably need to add more replicas.

Adding slaves (using SolrReplication) is fine as long as it scales
linearly. I do understand that shards may not scale linearly, mostly
because of merging/network overhead, but I think they will help in
reducing response time (please correct me if I am wrong). I am more
worried about response time (even on a lightly loaded slave); the main
intention of sharding was to reduce response time. Would it be better
to have a 2 shards x 6 slaves configuration compared to 6 shards x
2 slaves? Considering my total number of docs is 12 million, will
Solr be OK with 6 million docs/shard?
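
The trade-off in the question above can be made concrete (a sketch;
both layouts use the same 12 nodes):

```python
# Two ways to arrange 12 nodes for 12M docs at 100 searches/sec.
total_docs = 12_000_000
searches_per_sec = 100

for shards, slaves in [(6, 2), (2, 6)]:
    docs_per_shard = total_docs // shards
    # every query fans out to all shards; replicas split the load
    qps_per_replica = searches_per_sec / slaves
    print(f"{shards} shards x {slaves} slaves: "
          f"{docs_per_shard} docs/shard, {qps_per_replica:.1f} qps/replica")
# 6x2: 2,000,000 docs/shard but 50 qps hitting each replica
# 2x6: 6,000,000 docs/shard but only ~16.7 qps per replica
```

Fewer, larger shards trade per-query latency (more docs searched per
node) for per-replica headroom; more, smaller shards do the opposite.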



 2. What would be a reasonable latency (pollInterval) on the slaves for
 SolrReplication (all slaves connected with a single backplane)? Is a
 1-minute pollInterval reasonable?

 Yes, but it is not reasonable to get updates every time you poll.
 That is, you shouldn't perform commits more than once every 10
 minutes; otherwise we would be talking about near-real-time indexing,
 something that is in development in trunk:
 http://wiki.apache.org/solr/NearRealtimeSearch

Hmm, 10 minutes of latency is definitely too high for me (especially
as this is a streaming use case, i.e. show the latest stuff first). In
that case I can probably get rid of master-slave and update all the
replicated shards myself, but then I will have to do a lot of legwork
(what if one of the slaves is down, etc.), which I was trying to
avoid. Just curious: how stable is NRT?



 3. Is NRT a better/viable option compared to SolrReplication ?

 That is something in development. AFAIK it works with shards (because
 NRT refers to indexing, and with shards there isn't anything
 particular about the indexing), but with replication something
 different will be needed. SolrCloud, I think, covers these NRT
 aspects due to its different architecture (not master-slave
 replication, but all peers replicating).

So it seems SolrReplication is out (if my pollInterval < 5 minutes),
right? Let me look into SolrCloud. Any suggestions on which one is
more stable, SolrCloud or NRT?

-Thanks,
Prasenjit