Re: Solr performance issue
Hello,

The problem turned out to be some sort of sharding/searching weirdness. We had modified some code in sharding, but I don't think that is related. In any case, we just added a new server that only shards (it doesn't do any searching and doesn't contain any index), and performance is now very, very good.

Thanks for all the help.

On Tue, Mar 22, 2011 at 14:30, Alexey Serba wrote:
> > Btw, I am monitoring output via jconsole with 8gb of ram and it still
> > goes up to 8gb every 20 seconds or so; gc runs and it falls back down
> > to 1gb.
>
> Hmm, the JVM eating 8Gb every 20 seconds sounds like a lot.
>
> Do you return all results (ids) for your queries? Any tricky
> faceting/sorting/function queries?

--
Doğacan Güney
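For illustration, here is a minimal SolrJ sketch of the topology that fixed this: one "aggregator" Solr instance that holds no index and only fans queries out to the nodes that do. This is a hedged sketch, not code from the thread; the hostnames are hypothetical, and it assumes the SolrJ API of that era (CommonsHttpSolrServer).

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CoordinatorQuery {
    public static void main(String[] args) throws Exception {
        // The aggregator node: no documents of its own, it only merges
        // responses coming back from the shards.
        SolrServer coordinator = new CommonsHttpSolrServer("http://aggregator:8983/solr");

        SolrQuery q = new SolrQuery("*:*");
        // Fan the query out to the nodes that actually hold the index.
        q.set("shards", "slave1:8983/solr,slave2:8983/solr");
        q.setRows(10);

        QueryResponse rsp = coordinator.query(q);
        System.out.println("hits: " + rsp.getResults().getNumFound());
    }
}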
Re: Solr performance issue
2011/3/14 Markus Jelsma:
> Mmm. SearchHandler.handleRequestBody takes care of sharding. Could your
> system suffer from
> http://wiki.apache.org/solr/DistributedSearch#Distributed_Deadlock ?

We increased the thread limit (which was 1 before) but it did not help.
Anyway, we will try to disable sharding tomorrow; maybe that will give us a
better picture.

Thanks for the help, everyone.

> I'm not sure, i haven't seen a similar issue in a sharded environment,
> probably because it was a controlled environment.
>
> [...]
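The Distributed_Deadlock page linked above describes the trap: every container thread is busy with a top-level request, and each of those requests blocks on shard sub-requests that need a thread from the same pool. A minimal stand-alone sketch of the mechanism, using plain java.util.concurrent and nothing Solr-specific:

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class DistributedDeadlockDemo {
    public static void main(String[] args) {
        // A bounded pool standing in for the servlet container's thread limit.
        final ExecutorService pool = Executors.newFixedThreadPool(2);

        // Two concurrent "top-level" requests, each blocking on a "shard
        // sub-request" that must run on the very same pool.
        for (int i = 0; i < 2; i++) {
            pool.submit(new Runnable() {
                public void run() {
                    Future<String> sub = pool.submit(new Callable<String>() {
                        public String call() { return "shard response"; }
                    });
                    try {
                        sub.get(); // blocks forever: both pool threads wait here
                    } catch (Exception ignored) {
                    }
                }
            });
        }
        // Both workers are now parked in get() and the queued sub-tasks can
        // never run, so the (non-daemon) pool keeps the JVM hung.
        System.out.println("submitted; the pool is now deadlocked");
    }
}

A node that only aggregates (or separate pools for top-level and shard requests) sidesteps this, which is consistent with the fix reported at the top of this thread.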
Re: Solr performance issue
Hello,

2011/3/14 Markus Jelsma:
> That depends on your GC settings and generation sizes. And, instead of
> UseParallelGC you'd better use UseParNewGC in combination with CMS.

JConsole now shows a different profile output, but load is still high and
performance is still bad.

Btw, here is the thread profile from newrelic:

https://skitch.com/meralan/rwscm/thread-profiler-solr-new-relic-rpm

Note that we do use a form of sharding, so maybe all the time spent waiting
in handleRequestBody results from sharding?

> See 22: http://java.sun.com/docs/hotspot/gc1.4.2/faq.html
>
> > It's actually, as I understand it, expected JVM behavior to see the
> > heap rise close to its limit before it gets GC'd; that's how Java GC
> > works. Whether that should happen every 20 seconds or what, I don't
> > know.
> >
> > Another option is setting better JVM garbage collection arguments, so
> > GC doesn't "stop the world" so often. I have had good luck with my
> > Solr using this: -XX:+UseParallelGC
> >
> > On 3/14/2011 4:15 PM, Doğacan Güney wrote:
> > [...]
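As a quick way to confirm which collectors a running JVM actually ended up with after changing those flags, here is a small sketch using the standard management beans. The printed names in the comment are the usual HotSpot ones; exact names can vary by JVM version.

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class WhichCollectors {
    public static void main(String[] args) {
        // With -XX:+UseParNewGC -XX:+UseConcMarkSweepGC this typically prints
        // "ParNew" and "ConcurrentMarkSweep"; with -XX:+UseParallelGC it
        // typically prints "PS Scavenge" and "PS MarkSweep".
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName());
        }
    }
}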
Re: Solr performance issue
Hello again,

2011/3/14 Markus Jelsma:
> > [...]
> > How much of a heap size would be enough? Our index size is growing
> > slowly, but we did not have this problem a couple of weeks ago, when
> > the index was maybe 100mb smaller.
>
> Telling how much heap space is needed isn't easy. It usually needs to be
> increased when you run out of memory and get those nasty OOM errors; are
> you getting them?
> Replication events will increase heap usage due to cache-warming queries
> and autowarming.

Nope, no OOM errors.

> > We left most of the caches in solrconfig at their defaults and only
> > increased filterCache to 1024. We only ask for "id"s (which are
> > unique) and no other fields during queries (though we do faceting).
> > Btw, 1.6gb of our index is stored fields (we store everything for
> > now, even though we do not fetch them during queries), and about 1gb
> > is index.
>
> Hmm, it seems 4000 would be enough indeed. What about the fieldCache:
> are there a lot of entries? Is there an insanity count? Do you use
> boost functions?

Insanity count is 0 and fieldCache has 12 entries. We do use some boosting
functions.

Btw, I am monitoring output via jconsole with 8gb of ram and it still goes
up to 8gb every 20 seconds or so; gc runs and it falls back down to 1gb.

Btw, our current revision was just a random choice, but up until two weeks
ago it had been rock-solid, so we have been reluctant to update to another
version. Would you recommend upgrading to latest trunk?

> It might not have anything to do with memory at all, but i'm just
> asking. There may be a bug in your revision causing this.
>
> > Anyway, Xmx was 4000m; we tried increasing it to 8000m but did not
> > get any improvement in load. I can try monitoring with JConsole with
> > 8 gigs of heap to see if it helps.
>
> [...]

--
Doğacan Güney
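For reference, the "insanity count" mentioned above comes from Lucene's FieldCacheSanityChecker. A hedged sketch of how to read it programmatically, assuming the Lucene 2.9/3.x API and that the code runs in the same JVM as the searcher (for example, from a custom request handler):

import org.apache.lucene.search.FieldCache;
import org.apache.lucene.util.FieldCacheSanityChecker;
import org.apache.lucene.util.FieldCacheSanityChecker.Insanity;

public class FieldCacheReport {
    public static void main(String[] args) {
        // Entries accumulate as sorting/faceting touches fields.
        FieldCache.CacheEntry[] entries = FieldCache.DEFAULT.getCacheEntries();
        System.out.println("fieldCache entries: " + entries.length);

        // "Insanity" means the same field is cached at both the segment and
        // top-level reader, silently doubling memory use.
        Insanity[] insanity = FieldCacheSanityChecker.checkSanity(entries);
        System.out.println("insanity count: " + insanity.length);
        for (Insanity i : insanity) {
            System.out.println(i); // which fields are double-cached and why
        }
    }
}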
Re: Solr performance issue
Hello,

2011/3/14 Markus Jelsma:
> Hi Doğacan,
>
> Are you, at some point, running out of heap space? In my experience,
> that's the common cause of increased load and excessively high response
> times (or timeouts).

How much of a heap size would be enough? Our index size is growing slowly,
but we did not have this problem a couple of weeks ago, when the index was
maybe 100mb smaller.

We left most of the caches in solrconfig at their defaults and only
increased filterCache to 1024. We only ask for "id"s (which are unique) and
no other fields during queries (though we do faceting). Btw, 1.6gb of our
index is stored fields (we store everything for now, even though we do not
fetch them during queries), and about 1gb is index.

Anyway, Xmx was 4000m; we tried increasing it to 8000m but did not get any
improvement in load. I can try monitoring with JConsole with 8 gigs of heap
to see if it helps.

> Cheers,
>
> > Hello everyone,
> >
> > First of all, here is our Solr setup:
> > [...]

--
Doğacan Güney
Solr performance issue
Hello everyone,

First of all, here is our Solr setup:

- Solr nightly build 986158
- Running solr inside the default jetty that comes with the solr build
- 1 write-only master, 4 read-only slaves (quad core 5640 with 24gb of RAM)
- Index replicated (on optimize) to slaves via Solr Replication
- Size of index is around 2.5gb
- No incremental writes; the index is created from scratch (delete old
  documents -> commit new documents -> optimize) every 6 hours
- Avg # of requests per second is around 60 (for a single slave)
- Avg time per request is around 25ms (before having problems)
- Load on each slave is around 2

We have been using this set-up for months without any problem. However,
last week we started to experience very weird performance problems:

- Avg time per request increased from 25ms to 200-300ms (even higher if we
  don't restart the slaves)
- Load on each slave increased from 2 to 15-20 (solr uses 400%-600% CPU)

When we profile solr we see two very strange things:

1 - This is the jconsole output:

https://skitch.com/meralan/rwwcf/mail-886x691

As you can see, gc runs every 10-15 seconds and collects more than 1gb of
memory. (Actually, if you watch for more than 10 minutes, you see spikes up
to 4gb consistently.)

2 - This is the newrelic output:

https://skitch.com/meralan/rwwci/solr-requests-solr-new-relic-rpm

As you can see, solr spends a ridiculously long time in the
SolrDispatchFilter.doFilter() method.

Apart from these, when we clean the index directory, re-replicate, and
restart each slave one by one, we see some relief in the system, but after
some time the servers start to melt down again. Although deleting the index
and re-replicating doesn't solve the problem, we think these problems are
somehow related to replication: the symptoms started after a replication,
and the system briefly heals itself after a replication. I also see
lucene-write.lock files on the slaves (we don't have write.lock files on
the master), which I think we shouldn't see.

If anyone can give any sort of ideas, we will appreciate it.

Regards,
Dogacan Guney
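To watch the GC churn shown in the jconsole screenshot without a GUI, something like the following JMX sketch can log collector activity and heap use over time. It is illustrative only; as written it reports on its own JVM, so it would have to run inside the Solr container or be adapted to a remote JMX connection:

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

public class GcWatch {
    public static void main(String[] args) throws InterruptedException {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        while (true) {
            // Cumulative collection counts and total pause time per collector.
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.printf("%s: collections=%d total-time=%dms%n",
                        gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
            }
            long usedMb = mem.getHeapMemoryUsage().getUsed() / (1024 * 1024);
            System.out.println("heap used: " + usedMb + "mb");
            Thread.sleep(5000); // sample every 5 seconds
        }
    }
}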
Re: Nutch with SOLR
On 9/26/07, Brian Whitman <[EMAIL PROTECTED]> wrote:
> > Sami has a patch in there which used an older version of the solr
> > client. With the current solr client in the SVN tree, his patch
> > becomes much easier.
> > Your job would be to upgrade the patch and mail it back to him so he
> > can update his blog, or post it as a patch for inclusion in
> > nutch/contrib (if sami is ok with that). If you have issues with how
> > to use the solr client api, solr-user is here to help.
>
> I've done this. Apparently someone else has taken on the solr-nutch job
> and made it a bit more complicated (which is good for the long term)
> than sami's original patch --
> https://issues.apache.org/jira/browse/NUTCH-442

That someone else is me :) NUTCH-442 is one of the issues that I really
want to see resolved. Unfortunately, I haven't received many (as in, none)
comments, so I haven't made further progress on it.

The patch at NUTCH-442 tries to integrate SOLR so that it is a
"first-class" citizen (so to speak): you can index to solr or to lucene
within the same Indexer job (or both), and retrieve search results from a
solr server, from nutch's home-grown index servers, or a combination of
both in nutch's web UI. I also think the patch lays the groundwork for
generating summaries from solr.

> But we still use a version of Sami's patch that works on both trunk
> nutch and trunk solr (solrj.) I sent my changes to sami when we did it;
> if you need it let me know...
>
> -b

--
Doğacan Güney
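For readers following along, the "index to solr within the same Indexer job" part boils down to pushing documents through SolrJ instead of writing a local Lucene index. A hedged sketch of that path (the URL and field names are invented for illustration; Nutch's real field mapping is defined by the NUTCH-442 patch itself):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class SolrIndexSketch {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "http://example.com/"); // Nutch would use the page URL
        doc.addField("title", "Example page");
        doc.addField("content", "fetched and parsed page text");

        solr.add(doc);
        solr.commit(); // make the new batch searchable
    }
}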
Re: Passing arguments to analyzers
On 7/17/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:

On 7/17/07, Doğacan Güney <[EMAIL PROTECTED]> wrote:
> Hi,
>
> On 7/17/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> > On 7/17/07, Doğacan Güney <[EMAIL PROTECTED]> wrote:
> > > [...]
> >
> > You can pass extra args to a factory in the field-type definition,
> > but that means you would need a separate field-type per language.
>
> Thanks for the answer.
>
> Your suggestion would work for this particular use case, but IMHO there
> are other use cases out there that can benefit from this (for example,
> one may process the whole document and add parameters for each field
> based on document-level analysis).
>
> Would this be a useful feature for Solr? I would actually like to work
> on it if others consider it a useful add-on. It seems simple to
> accomplish, and it would probably be a good introduction to Solr
> internals.

wrt passing more info to the analyzer at runtime to alter its behavior:
analyzers are singletons per field-type, and
Analyzer.tokenStream(String fieldName, Reader reader) is called to analyze
a particular value. There isn't really a good place to pass in extra info.

During XML parsing, we *could* build up a Map of the parameters we don't
know about, but then the question is what to do with them. One hackish
solution would be to store them in a thread-local where your analyzer
could check for it. Perhaps a custom request processor could do that task.

It seems there does need to be some kind of framework more aligned with
parsing documents (word docs, pdf, etc), for adding metadata to fields at
runtime (how does UIMA or Tika fit into this?), and for mapping the
fields+metadata to Solr/Lucene document fields. I opened SOLR-313 for
this.

-Yonik

--
Doğacan Güney
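A minimal sketch of the thread-local hack described above, assuming the Lucene Analyzer API of that era. The class names here are hypothetical, and the language-specific stemming is stubbed out with LowerCaseFilter so the sketch stays self-contained; a real implementation would pick a stemming filter based on the stashed language.

import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;

public class PerDocArgs {
    // Set by whatever sees the raw document first (e.g. a custom request
    // processor running on the same thread), read back during analysis.
    public static final ThreadLocal<String> LANG = new ThreadLocal<String>();

    public static class LangAwareAnalyzer extends Analyzer {
        public TokenStream tokenStream(String fieldName, Reader reader) {
            String lang = LANG.get() == null ? "en" : LANG.get();
            TokenStream ts = new WhitespaceTokenizer(reader);
            // A real implementation would choose a stemmer using `lang`;
            // LowerCaseFilter is just a stand-in to keep this compilable.
            return new LowerCaseFilter(ts);
        }
    }
}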
Re: Passing arguments to analyzers
Hi,

On 7/17/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> On 7/17/07, Doğacan Güney <[EMAIL PROTECTED]> wrote:
> > Hi all,
> >
> > Is there a way to pass arguments to analyzers per document?
> > [...]
>
> You can pass extra args to a factory in the field-type definition, but
> that means you would need a separate field-type per language.
>
> -Yonik

Thanks for the answer.

Your suggestion would work for this particular use case, but IMHO there are
other use cases out there that can benefit from this (for example, one may
process the whole document and add parameters for each field based on
document-level analysis). Also, again IMHO, per-field parameters are more
flexible.

Would this be a useful feature for Solr? I would actually like to work on
it if others consider it a useful add-on. It seems simple to accomplish,
and it would probably be a good introduction to Solr internals.

--
Doğacan Güney
Passing arguments to analyzers
Hi all,

Is there a way to pass arguments to analyzers per document? Let's say that
I have a field "foo" which is tokenized by WhitespaceTokenizer and then
filtered by MyCustomStemmingFilter. MyCustomStemmingFilter can stem more
than one language, but (obviously) it needs to know the language of the
document it is working on. So what I need is a way to specify the language
per document (actually, per field).

Here is an example:

  <field name="foo" lang="en">My spam egg bars baz.</field>

Is something like this possible with Solr?

--
Doğacan Güney