How do I send multiple user version parameter values for a delete-by-id request with multiple IDs?
http://solr:port/collection/update?version_field=1234582.0 works for the payload {"delete":[{"id":"51"},{"id":"5"}]} with multiple IDs, and the version parameter is applied to both deletes. Is it possible to send separate version numbers for the IDs in the parameter? -- Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
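As far as I know, the version parameter applies to the whole request, so one workaround is to send a separate delete request per ID, each carrying its own version. A minimal sketch (the host, collection, and `version_field` parameter name come from the example above; the helper name is mine):

```python
import json

def per_id_delete_requests(base_url, id_versions):
    """Build one (url, json_payload) pair per document so that each
    delete can carry its own version parameter instead of sharing a
    single version across the whole batch."""
    requests = []
    for doc_id, version in id_versions.items():
        url = f"{base_url}/update?version_field={version}"
        payload = json.dumps({"delete": [{"id": doc_id}]})
        requests.append((url, payload))
    return requests

# Each ID gets its own version; send these with any HTTP client.
reqs = per_id_delete_requests("http://solr:8983/solr/collection",
                              {"51": 1234582, "5": 1234590})
```

This trades one round trip for N, but keeps the per-document optimistic-concurrency check intact.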
Re: One complex wildcard query leads Solr to OOM
Hi Jack, Thanks! Do you know how to disable wildcards? What I want is: if the input contains wildcards, just treat them as normal characters. In other words, I just want to disable wildcard search. Thanks, Jian

On Fri, Jan 22, 2016 at 1:55 PM, Jack Krupansky <jack.krupan...@gmail.com> wrote:
> The Lucene WildcardQuery class does have an additional constructor that has a maxDeterminizedStates parameter to limit the size of the FSM generated by a wildcard query, and the QueryParserBase class does have a method to set that parameter, setMaxDeterminizedStates, but there is no Solr support for invoking that method.
>
> It is probably worth a Jira to get such support. Even then, the question is how Solr should respond to the exception that gets thrown when that limit is reached.
>
> Even if Solr had an option to disable complex wildcards, the question is what you want to happen when a complex wildcard is used - should an exception be thrown, or... what?
>
> I suppose it might be simplest to have a Solr option to limit the number of wildcard characters used in a term, like to 4 or 8 or something like that. IOW, have Solr check the term before the WildcardQuery is generated.
>
> -- Jack Krupansky
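Until Solr exposes something like setMaxDeterminizedStates, treating wildcards as normal characters has to happen client-side: backslash-escape `*` and `?` before building the query so the parser matches them as literal characters. A minimal sketch (the function name is mine):

```python
def escape_wildcards(term):
    """Backslash-escape Lucene wildcard characters so user input is
    searched as literal text rather than expanded as a pattern."""
    out = []
    for ch in term:
        if ch in "*?":
            out.append("\\")
        out.append(ch)
    return "".join(out)

# "fo*o?" becomes "fo\*o\?" -- the parser now sees plain characters.
```

Validating or rejecting input with many wildcard characters, as Jack suggests, is a complementary guard on the same code path.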
One complex wildcard query leads Solr to OOM
We are using Solr as our search engine, and recently noticed that some user-input wildcard queries can send Solr into an infinite loop in org.apache.lucene.util.automaton.Operations.determinize(). It also eats memory and finally OOMs. The wildcard query looks like **?-???o·???è??**. Although we can validate the input parameter, I also wonder whether there is any configuration that can disable complex wildcard queries like this, which lead to severe performance problems. The related stack trace was attached as an inline image. Thanks, Jian
ERROR CommitTracker auto commit error...:org.apache.solr.common.SolrException: Error opening new searcher
I am using SolrCloud 4.10.0 and I have been seeing this for a while now. Does anyone have similar experience, or a clue what's happening?

auto commit error...:org.apache.solr.common.SolrException: Error opening new searcher
	at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1565)
	at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1677)
	at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:607)
	at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
	at java.util.concurrent.FutureTask.run(FutureTask.java:166)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:724)
Caused by: java.lang.IllegalArgumentException: maxValue must be non-negative (got: -3)
	at org.apache.lucene.util.packed.PackedInts.bitsRequired(PackedInts.java:1141)
	at org.apache.lucene.codecs.lucene41.ForUtil.bitsRequired(ForUtil.java:253)
	at org.apache.lucene.codecs.lucene41.ForUtil.writeBlock(ForUtil.java:174)
	at org.apache.lucene.codecs.lucene41.Lucene41PostingsWriter.addPosition(Lucene41PostingsWriter.java:377)
	at org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:486)
	at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:80)
	at org.apache.lucene.index.DefaultIndexingChain.flush(DefaultIndexingChain.java:114)
	at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:441)
	at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:510)
	at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:621)
	at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:414)
	at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:292)
	at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:277)
	at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:251)
	at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1476)
	... 11 more
Storing termVectors for PreAnalyzed type field
Can anyone please confirm whether this is supported in the current version? I am trying to use a PreAnalyzed field for MLT, and when the MLT query is created it does not get anything from the index. I think even if I set termVectors=true in the PreAnalyzed field definition, it is being ignored.
Solr3.4 on tomcat 7.0.23 - hung with error threw exception java.lang.IllegalStateException: Cannot call sendError() after the response has been committed
We are getting the following error intermittently (twice in a two-week interval). The load on the server seems usual. I see in the log that just before the failure (4-5 mins) qtime was very high; those queries are normally processed within 300 ms, but before the failure they took more than 100 secs, so many requests timed out. I would really appreciate any pointer to find the root cause of this problem.

Aug 27, 2013 3:51:27 PM org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet [default] in context with path [/solrSearch] threw exception
java.lang.IllegalStateException: Cannot call sendError() after the response has been committed
	at org.apache.catalina.connector.ResponseFacade.sendError(ResponseFacade.java:451)
	at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:380)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:283)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
	at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
	at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:928)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
	at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
	at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:539)
	at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:298)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
Re: DocValues with docValuesFormat=Disk
Hi, If you use a codec which is not the default, you need to download/build the Lucene codec jars, put them in the solr_home/lib directory, and add the codec factory in the Solr config file. Look here for detailed instructions: http://wiki.apache.org/solr/SimpleTextCodecExample Best, Mou
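For the Disk docValues format specifically, the usual Solr 4.x wiring is a schema-driven codec factory in solrconfig.xml plus a per-field-type format in schema.xml. A sketch (the field type name is an assumption, and the codec classes must actually be on the classpath as described above):

```xml
<!-- solrconfig.xml: delegate per-field format choices to the schema -->
<codecFactory class="solr.SchemaCodecFactory"/>

<!-- schema.xml: a field type whose docValues live on disk;
     requires the codec jar providing "Disk" in solr_home/lib -->
<fieldType name="string_dv_disk" class="solr.StrField"
           docValues="true" docValuesFormat="Disk"/>
```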
Re: long QTime for big index
Just to close this discussion: we solved the problem by splitting the index. It turned out that distributed search across 12 cores is faster than searching two cores. All queries, Tomcat configuration, and JVM configuration remained the same. Now queries are served in milliseconds.
Re: long QTime for big index
We have two boxes; they are really nice servers: 32-core CPU, 192 G memory, with both RAID arrays and Fusion-io cards. But each of them runs two instances of Solr, one for indexing and the other for searching. The search index is on the Fusion-io card. Each instance has 11 cores, plus a small core for making indexing almost realtime. We have around 300 million documents and 250G on disk; they are all metadata. Search queries are very diverse and do not repeat very frequently, 40-60 qps. Before, we had two cores, each 125 G on disk, and Solr was taking a long time to get results from those two cores. CPU use was 90%. We never had a problem with indexing. 50% of all our docs get updated every day, so the indexing rate is very high.

On Thu, Feb 14, 2013 at 4:20 PM, alxsss [via Lucene] ml-node+s472066n4040545...@n3.nabble.com wrote: Hi, It is curious to know how many Linux boxes you have and how many cores in each of them. It was my understanding that Solr puts in memory all documents found for a keyword, not the whole index. So why must it be faster with more cores, when the number of selected documents from many separate cores is the same as from one core? Thanks. Alex.
long QTime for big index
I am running Solr 3.4 on Tomcat 7. Our index is very big: two cores, each 120G. We are searching the slaves, which are replicated every 30 min. I am using the filtercache only, and we have more than 90% cache hits. We use a lot of filter queries; queries are usually pretty big, with 10-20 fq parameters. Not all filters are cached. We are searching three shards, and a query looks like this: shards=core1,core2,core3&q=*:*&fq=field1:some value&fq=-field2:some value&sort=date. But some queries take more than 30 sec to return results, and the behavior is intermittent. I cannot find a correlation with replication. We are using the Zing JVM, which reduced our GC pauses to milliseconds, so GC is not a problem. How can I improve the qtime? Is it at all possible to get a better qtime given our index size? Thank you for your suggestions.
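Since multiple filters are expressed simply by repeating the fq parameter, a small helper that assembles such select URLs can make the shape of these queries explicit (a sketch; host and field names are placeholders):

```python
from urllib.parse import urlencode

def build_select_url(base, q, fqs, sort=None, shards=None):
    """Assemble a Solr select URL; each filter in fqs becomes its own
    fq parameter, which is how multiple filters are expressed."""
    params = [("q", q)]
    if shards:
        params.append(("shards", ",".join(shards)))
    params.extend(("fq", fq) for fq in fqs)
    if sort:
        params.append(("sort", sort))
    return base + "/select?" + urlencode(params)

url = build_select_url("http://solr:8983/solr/core1", "*:*",
                       ["field1:some_value", "-field2:some_value"],
                       sort="date desc",
                       shards=["core1", "core2", "core3"])
```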
Re: long QTime for big index
Thanks for your reply. No, there is no eviction yet. The time is spent mostly in org.apache.solr.handler.component.QueryComponent to process the request. Again, the time varies widely for the same query.
Re: long QTime for big index
Thank you Shawn for reading all of my previous entries and for a detailed answer. To clarify, the third shard is used to store the recently added/updated data. The two main big cores take very long to replicate (when a full replication is required), so the third one helps us return the newly indexed documents quickly. It gets deleted every hour, after we replicate the two other cores with the last hour's new/changed data. This third core is very small. As you said, with that big index and distributed queries, searches were too slow, so we tried to use the filtercache to speed up the queries. The filtercache was big, as we have thousands of different filters; other caches were not very helpful, as queries are not repetitive and there is heavy add/update traffic to the index. So we have to use a bigger heap size. Now, with that big heap size, GC pauses were horrible, so we moved to the Zing JVM. The Zing JVM is now using 134 G of heap and does not have those big pauses, but it also does not leave much memory for the OS. I am now testing with a small heap, a small filtercache (just the basic filters), and a lot of memory available for the OS disk cache. If that does not work, I am thinking of breaking my index down into small pieces.
Re: long QTime for big index
Thank you again. Unfortunately the index files will not fit in the RAM; I have to try using the document cache. I am also moving my index to SSD again; we took our index off when the Fusion-io cards failed twice during indexing and the index was corrupted. Now, with the BIOS upgrade and a new driver, it is supposed to be more reliable. Also, I am going to look into the client app to verify that it is making proper query requests. Surprisingly, when I used a much lower value than the default for defaultConnectionPerHost and maxConnectionPerHost in solrmeter, it performed very well; the same queries returned in less than one sec. I am not sure yet; I need to run solrmeter with different heap sizes, with cache and without cache, etc.
Re: Solr Score threshold 'reasonably', independent of results returned
Hi, I think this totally depends on your requirements, and is thus applicable per use case. A score does not have any absolute meaning; it is always relative to the query. If you want to watch some particular queries and show only results with a score above a previously set threshold, you can use this. But if I always have that x% threshold in place, there may be many queries which would not return anything, and I certainly do not want that.
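One way to get a threshold without starving low-scoring queries is a per-query relative cutoff applied client-side to the returned documents, so scores are only ever compared with the top score of that same query. A sketch (assumes each result dict carries the score, as when requesting fl=*,score):

```python
def above_relative_threshold(docs, fraction):
    """Keep documents scoring at least `fraction` of this query's top
    score; the cutoff is relative, so queries whose absolute scores
    are low still return their best matches."""
    if not docs:
        return []
    top = max(d["score"] for d in docs)
    return [d for d in docs if d["score"] >= fraction * top]

# With a 50% threshold only the near-top documents survive.
hits = above_relative_threshold(
    [{"id": "a", "score": 10.0}, {"id": "b", "score": 3.0}], 0.5)
```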
Re: Using Solr 3.4 running on tomcat7 - very slow search
Hi Eric, I totally agree; that's what I also figured ultimately. One thing I am not clear on: the replication is supposed to be incremental? But it looks like it is trying to replicate the whole index. Maybe I am changing the index so frequently that it is triggering an auto merge and a full replication? Am I thinking in the right direction?

I see that when I start the Solr search instance before I start feeding the Solr index, my searches are fine, BUT it is using the old searcher, so I am not seeing the updates in the results. So now I am trying to change my architecture. I am going to have a core dedicated to receiving daily updates, which is going to be 5 million docs with a size a little less than 5 G; that is small, so replication will be faster. I will search both cores, i.e. the old data and the daily updates, and do field collapsing on my unique id so that I do not return duplicate results. I haven't tried grouping results, so I am not sure about the performance. Any suggestions? Eventually I have to use Solr trunk like you suggested. Thank you for your help,

On Wed, Jul 18, 2012 at 10:28 AM, Erick Erickson [via Lucene] ml-node+s472066n3995754...@n3.nabble.com wrote: bq: This index is only used for searching and being replicated every 7 sec from the master. This is a red flag. 7-second replication times are likely forcing your app to spend all its time opening new searchers. Your cached filter queries are likely rarely being re-used because they're being thrown away every 7 seconds. This assumes you're changing your master index frequently. If you need near real time, consider Solr trunk and SolrCloud, but trying to simulate NRT with very short replication intervals is usually a bad idea. A quick test would be to disable replication for a bit (or lengthen it to, say, 10 minutes). Best, Erick

On Tue, Jul 17, 2012 at 10:47 PM, Fuad Efendi [hidden email] wrote: FWIW, when asked at what point one would want to split JVMs and shard, on the same machine, Grant Ingersoll mentioned 16GB, and precisely for GC cost reasons. You're way above that. - his index is 75G, and Grant mentioned RAM heap size; we can use terabytes of index with 16Gb memory.
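The field-collapsing idea above (search the old core plus the daily-update core and keep only the most recent document per unique id) can also be done client-side after merging the two result sets. A sketch (the id and timestamp field names are placeholders):

```python
def dedup_latest(docs, id_field="uid", ts_field="last_modified"):
    """Collapse documents sharing a unique id, keeping the one with
    the newest timestamp, so a fresh copy from the daily-update core
    wins over a stale copy from the big core."""
    best = {}
    for doc in docs:
        key = doc[id_field]
        if key not in best or doc[ts_field] > best[key][ts_field]:
            best[key] = doc
    return list(best.values())
```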
Re: Using Solr 3.4 running on tomcat7 - very slow search
Increasing the polling interval does help, but the requirement is to get a document indexed and searchable almost instantly (sounds like real-time search); 30 sec is acceptable. I need to look at Solr NRT and SolrCloud. I created a new core to accept daily updates and replicate every 10 sec. The two other cores, with 234 million documents, are configured to replicate only once a day. I am feeding all three cores, but the two big cores are not replicating. While searching, I am running group.field on my unique id and taking the most updated one. Right now it looks fine. Every day I am going to delete the last day's records from the daily-update core. I am planning to use rsync for replication; it will be Fusion-io to Fusion-io, so hopefully it will be very fast. What do you think?

We use a Windows service (written in .NET C#) to feed the data using REST calls. That is really fast; we can feed more than 15 million documents a day to two cores easily. I am using a Solr autocommit of 5 sec. I could not figure out how I was able to achieve those numbers in my test environment; all configuration was the same except I had a lot less memory in test! I am trying to find out what I am missing in the other configuration. My SLES kernel version is different in production (3.0.*, test was 2.6.*), but I do not think that can cause a problem. Thank you again, Mou

On Wed, Jul 18, 2012 at 6:26 PM, Erick Erickson [via Lucene] ml-node+s472066n3995861...@n3.nabble.com wrote: Replication will indeed be incremental. But if you commit too often (and committing too often is a common mistake) then merging will eventually merge everything into new segments and the whole thing will be replicated. Additionally, optimizing (or forceMerge in 4.x) will make a single segment and force the entire index to replicate. You should emphatically _not_ have to have two cores. Solr is built to handle replication etc. I suspect you're committing too often, or some other mis-configuration, and you're creating a problem for yourself.
Here's what I'd do: 1) increase the polling interval to, say, 10 minutes (or however long you can live with stale data) on the slave; 2) decrease the commits you're doing. This could involve the autocommit options you might have set in solrconfig.xml. It could be your client (I don't know how you're indexing; SolrJ?) and the commitWithin parameter. It could be that you're optimizing (if you are, stop it!). Note that ramBufferSizeMB has no influence on how often things are _committed_. When this limit is exceeded, the accumulated indexing data is written to the currently-open segment. Multiple flushes can go to the _same_ segment. The write-once nature of segments means that after a segment is closed (through a commit), it is not changed. But a segment that is not closed may be written to multiple times until it is closed. HTH, Erick
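Erick's second point, committing far less often, corresponds to the autoCommit block in solrconfig.xml. A sketch with an illustrative 10-minute interval in place of a 5-second one (the value is an example, not a recommendation for every workload):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- commit at most every 10 minutes instead of every few seconds -->
  <autoCommit>
    <maxTime>600000</maxTime>
  </autoCommit>
</updateHandler>
```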
Re: Using Solr 3.4 running on tomcat7 - very slow search
Bryan, Thanks again. Swappiness is set to 60, and from vmstat I can see no swapping is going on. Also, I am using a Fusion-io SSD for storing my index. I also used VisualVM, and it shows me that it is blocked on lock=org.apache.lucene.index.SegmentCoreReaders@299172a7. Any clue?

On Mon, Jul 16, 2012 at 10:38 PM, Bryan Loofbourrow [via Lucene] ml-node+s472066n3995452...@n3.nabble.com wrote: Another thing you may wish to ponder is this blog entry from Mike McCandless: http://blog.mikemccandless.com/2011/04/just-say-no-to-swapping.html In it, he discusses the poor interaction between OS swapping and long-neglected allocations in a JVM. You're on Linux, which has decent control over swapping decisions, so you may find that a tweak is in order, especially if you can discover evidence that the hard drive is being worked hard during GC. If the problem exists, it might be especially pronounced in your large JVM. I have no direct evidence of thrashing during GC (I am not sure how to go about gathering such evidence), but I have seen, on a Windows machine, a Tomcat running Solr refuse to shut down for many minutes while a Resource Monitor session reported that that same Tomcat process was frantically reading from the page file the whole time. So there is something besides plausibility to the idea. -- Bryan
Using Solr 3.4 running on tomcat7 - very slow search
Hi, Our index is divided into two shards, each with 120M docs, total size 75G per core. The server is a pretty good one; the JVM is given 70G of memory, and about the same is left for the OS (SLES 11). We use all dynamic fields except the unique id, and we run long queries, but almost all of them are filter queries; each query may have 10-30 fq parameters. When I tested the index (same size) but with a max heap size of 40 G, queries were blazing fast. I used solrmeter to load test, and it was happily serving 12000 queries or more per min with an avg 65 ms qtime. We had an excellent filtercache hit ratio. This index is only used for searching and is replicated every 7 sec from the master. But now, on the production server, it is horribly slow, taking 5 mins (qtime) to return the same query. What could go wrong? I would really appreciate your suggestions on debugging this.
Re: Using Solr 3.4 running on tomcat7 - very slow search
Thanks Bryan, excellent suggestion. I haven't used VisualVM before, but I am going to use it to see where the CPU is going. I did see that CPU usage was very high; I hadn't seen that much CPU use in testing. Although I think GC is not the problem, splitting the JVM per shard sounds like a good idea.

On Mon, Jul 16, 2012 at 9:44 PM, Bryan Loofbourrow [via Lucene] ml-node+s472066n3995446...@n3.nabble.com wrote:

> 5 min is ridiculously long for a query that used to take 65 ms. That ought to be a great clue. The only two things I've seen that could cause that are thrashing or GC. Hard to see how it could be thrashing, given your hardware, so I'd initially suspect GC.
>
> Aim VisualVM at the JVM. It shows how much CPU goes to GC over time, in a nice blue line. And if it's not GC, try out its Sampler tab and see where the CPU is spending its time.
>
> FWIW, when asked at what point one would want to split JVMs and shard on the same machine, Grant Ingersoll mentioned 16GB, and precisely for GC cost reasons. You're way above that. Maybe multiple JVMs and sharding, even on the same machine, would serve you better than a monster 70GB JVM.
>
> -- Bryan
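If VisualVM cannot be attached to the production JVM, a rough alternative for a JVM of that era is to enable GC logging (e.g. `-verbose:gc -XX:+PrintGCDetails -Xloggc:gc.log`) and total the reported pause times. A minimal sketch, using made-up sample lines in the classic PrintGCDetails format (the log lines below are fabricated for illustration, not from the original thread); a single multi-minute full GC on a 70G heap would account for a 5-minute qtime:

```python
import re

# Hypothetical sample records in the old -XX:+PrintGCDetails format;
# a real analysis would read the file passed to -Xloggc.
gc_log = """\
[GC [PSYoungGen: 1048576K->65536K(2097152K)] 5242880K->4325376K(73400320K), 0.2104520 secs]
[Full GC [PSYoungGen: 65536K->0K(2097152K)] [ParOldGen: 4259840K->2097152K(71303168K)] 4325376K->2097152K(73400320K), 284.7313450 secs]
"""

# Sum the stop-the-world pause times reported at the end of each record.
pauses = [float(m) for m in re.findall(r"([0-9.]+) secs\]", gc_log)]
total_pause = sum(pauses)
print(f"{len(pauses)} collections, {total_pause:.1f} s paused")
```

Long pauses concentrated in "Full GC" records would point at exactly the kind of GC cost Bryan describes, and would argue for smaller heaps per JVM.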
Preferred file system for Solr
We are using a VeloDrive (SSD) to store and search our Solr index. The system is running on SLES 11. Right now we are using ext3, but I'm wondering if anyone has experience using XFS or ext3 on SSD or FusionIO for Solr. Does Solr have any preference for the underlying file system? Our index will be big: around 250M docs to start with, adding 5M docs every week, of which 50 to 60% will be updates.
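As a back-of-envelope check on that growth rate (assuming, as a rough reading of the numbers above, that 50-60% of each week's 5M docs replace existing documents and only the remainder is net-new; the 55% midpoint is an assumption, not from the message):

```python
# Figures from the message above; the 55% update fraction is an
# assumed midpoint of the stated 50-60% range.
start_docs = 250_000_000
weekly_batch = 5_000_000
update_fraction = 0.55

# Updates replace existing docs, so only the non-update share grows
# the live document count (deleted versions are purged on merge).
net_new_per_week = weekly_batch * (1 - update_fraction)
docs_after_year = start_docs + 52 * net_new_per_week
print(f"~{docs_after_year / 1e6:.0f}M docs after one year")
```

So the live index would grow to roughly 367M docs in a year, though the on-disk size would fluctuate more than that, since updates leave deleted documents behind until segment merges reclaim the space.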