Re: Solr Search Inconsistent result
Hi Ankit, So you are using solr.UUIDUpdateProcessorFactory to populate unique keys? Ahmet On Tuesday, December 23, 2014 7:13 AM, Ankit Jain ankitjainc...@gmail.com wrote: Hi Ahmet, Thanks for the response. The document ID is unique because we are using *UUID* to generate the document ID. Thanks, Ankit On Tue, Dec 23, 2014 at 12:16 AM, Ahmet Arslan iori...@yahoo.com.invalid wrote: Hi, Do you happen to have documents with the same unique id in different shards? When unique ids are not unique across shards, people see inconsistent results. Please see: http://find.searchhub.org/document/2814183511b5a52 Ahmet On Monday, December 22, 2014 8:06 PM, Ankit Jain ankitjainc...@gmail.com wrote: Hi Ahmet, Thanks for the response. I am running this query from the Solr Search UI. The number of shards for the collection is two. Thanks, Ankit On Mon, Dec 22, 2014 at 8:34 PM, Ahmet Arslan iori...@yahoo.com.invalid wrote: Hi, Is this a sharded query? Ahmet On Monday, December 22, 2014 4:47 PM, Ankit Jain ankitjainc...@gmail.com wrote: Hi All, We are getting inconsistent search results when searching on a *multivalued* field: *Input Query:* ( t : [ 0 TO 1419245069253 ] )AND(_all:impetus-i0111.impetus.co.in) The _all field is a multivalued field. The above query sometimes returns 11 records and sometimes 12471 records. Please help. Thanks, Ankit -- Thanks, Ankit Jain -- Thanks, Ankit Jain
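For reference, a minimal sketch of the processor chain Ahmet is referring to, which makes Solr itself generate a UUID for the uniqueKey at index time (the field name id is an assumption; it must match the uniqueKey declared in schema.xml):

  <updateRequestProcessorChain name="uuid">
    <processor class="solr.UUIDUpdateProcessorFactory">
      <str name="fieldName">id</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

With this chain referenced from the /update handler via the update.chain parameter, documents indexed without an id get one generated server-side, which rules out client-side ID generation bugs.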
Loading data to FieldValueCache
Hello, The wiki states that http://wiki.apache.org/solr/SolrCaching#fieldValueCache is mostly used for faceting. Can someone please shed some light on how data gets loaded into this cache, i.e. which query options cause data to be loaded into it? My requirement is that I have 10 facet fields (with facet.limit=5) to be shown in my UI. I want to speed this up by using this cache. Is there a way I can specify only the list of fields to be loaded into the FieldValueCache? Thanks, Manohar
RE: Loading data to FieldValueCache
Manohar Sripada [manohar...@gmail.com] wrote: The wiki states that http://wiki.apache.org/solr/SolrCaching#fieldValueCache is mostly used for faceting. Can someone please shed some light on how data gets loaded into this cache, i.e. which query options cause data to be loaded into it? The values are loaded on the first facet call with facet.method=fc. http://wiki.apache.org/solr/SimpleFacetParameters#facet.method My requirement is that I have 10 facet fields (with facet.limit=5) to be shown in my UI. I want to speed this up by using this cache. Is there a way I can specify only the list of fields to be loaded into the FieldValueCache? Add a facet query as an explicit warmup in your solrconfig.xml. You might want to consider DocValues for your facet fields. https://cwiki.apache.org/confluence/display/solr/DocValues - Toke Eskildsen
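A minimal sketch of such a warmup entry in solrconfig.xml, assuming a facet field named state (the field name and limit are illustrative; add one <lst> per facet field you want warmed):

  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst>
        <str name="q">*:*</str>
        <str name="rows">0</str>
        <str name="facet">true</str>
        <str name="facet.method">fc</str>
        <str name="facet.field">state</str>
        <str name="facet.limit">5</str>
      </lst>
    </arr>
  </listener>

A matching firstSearcher listener covers the very first searcher after startup, before any commit has happened.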
Re: Endless 100% CPU usage on searcherExecutor thread
We do not use dates here, at least not too often. Usually it's something like type:Profile (we use it from the Rails application, so type describes model names), opted_in:true, etc. Solr hasn't been running for long though, so this may not show the real state. Currently the hit ratio is 1 for the filter cache and 0.84 for the query result cache. I also increased the cache sizes to autowarm: 512, initial: 1024 and size: 4096, which is actually never reached because of commits. Best, Alex
Re: SolrCloud Paging on large indexes
On 12/22/2014 04:27 PM, Erick Erickson wrote: Have you read Hossman's blog here? https://lucidworks.com/blog/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/#referrer=solr.pl Oh thanks, that's a pretty interesting read. The scale we're investigating is several orders of magnitude larger than what was tested there, so I'm still a bit worried. Because if you're trying this and _still_ getting bad performance we need to know. I'll definitely keep you posted when our test results on larger indexes (~50 billion documents) come in, but this sadly won't be any time soon (infrastructure sucks). The largest index I currently have access to is about a billion documents in size. Paging there is a nightmare, but the Solr version is too old to support cursors so I'm afraid I can't offer any useful data. Does anyone have any performance data on multi-billion-document indexes? With or without SolrCloud? Bram: One minor pedantic clarification: the first round-trip only returns the id and sort criteria (score by default), not the whole document, although the effect is the same; as you page N pages into the corpus, the default implementation returns rows * (pageNum + 1) entries per shard. Even worse, each node itself has to _sort_ that many entries. Then a second call is made to get the page's worth of docs... I was trying to keep it short and sweet, but yes, that's the way I think it works ;-) That said, though, it's pretty easy to argue that the 500th page is pretty useless; nobody will ever hit the next page button 499 times. Nobody will hit next 499 times, but a lot of our users skip to the last page quite often. Maybe I should make *that* as hard as possible. Hmm. Thanks for the tips! - Bram
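For the curious, cursor iteration (Solr 4.7+) looks like this; the sort must include the uniqueKey field as a tie-breaker, and the field names here are illustrative:

  /select?q=*:*&rows=100&sort=timestamp+asc,id+asc&cursorMark=*

Each response contains a nextCursorMark value, which is sent back as the cursorMark parameter of the next request, repeating until nextCursorMark stops changing.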
Re: How to define Json list in schema in xml
A multiValued=true string field is what you're after here. Erik On Dec 22, 2014, at 23:19, Xin Cai xincai2...@gmail.com wrote: hi guys, I am looking to parse a JSON file that contains a field holding a list of schools. So for example I would have {"Schools": [{"name": "Seirra High School"}, {"name": "Walnut elementary School"}]}. I want to be able to index all the different schools so I can quickly look up people that went to a certain school; what is the best way for me to define the schema file? I have looked around and I don't think Solr has native support for lists, but I could be wrong because lists are used so often. Any advice would be appreciated. Thanks Xin Cai
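A sketch of what Erik's suggestion looks like in schema.xml, plus the flattened document shape it accepts (the field name schools is illustrative; the name values from the JSON objects are copied into the multivalued field):

  <field name="schools" type="string" indexed="true" stored="true" multiValued="true"/>

  {"id": "1", "schools": ["Seirra High School", "Walnut elementary School"]}

Since string is an exact-match (non-analyzed) type, a lookup such as schools:"Seirra High School" matches everyone who attended that school.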
'Illegal character in query' on Solr cloud 4.10.1
Hi All, I am using SolrCloud 4.10.1 and I have 3 shards with a replication factor of 2, i.e. 6 nodes altogether. When I query server1 of the 6 nodes in the cluster with the below query, it works fine, but when any other node in the cluster is queried with the same query, it results in an *HTTP Status 500 - {msg=Illegal character in query at index 181:* error. The character at index 181 is the boost character ^. I have seen Jira SOLR-5971 https://issues.apache.org/jira/browse/SOLR-5971 for a similar issue; how can I overcome this issue? The query I use is below. Thanks in advance! http://xx2..com:8081/solr/dyCollection1_shard2_replica1/?q=x+x+xx&sort=score+desc&wt=json&indent=true&debugQuery=true&defType=edismax&qf=productName^1.5+productDescription&mm=1&pf=productName+productDescription&ps=1&pf2=productName+productDescription&pf3=productName+productDescription&stopwords=true&lowercaseOperators=true
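For what it's worth, a common workaround while the Jira issue is open is to percent-encode the ^ character as %5E in the request URL, since ^ is not a legal character in a URI query string per RFC 3986 and strict containers reject it, e.g.:

  qf=productName%5E1.5+productDescription

Whether that also fixes the failing inter-node case depends on where the illegal character is introduced, which the follow-up later in this thread tries to pin down.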
Solr server becomes non-responsive.
Hi, I have a setup of a 4 shard Solr cluster with embedded ZooKeeper on one of them. The zkClient timeout is set to 30 seconds, -Xms is 20g and -Xmx is 24g. When executing a huge query with many wildcards in it, the server crashes and becomes non-responsive. Even the dashboard does not respond and shows a connection lost error. This requires me to restart the servers. I have set the query *timeAllowed* to 5 minutes but it does not seem to be honored and the query hangs around. Kindly help me debug and fix the issue, or suggest how timeAllowed can be made to be honored or how a query that has been hanging for some time can be stopped. *Following are a few exceptions.* *org.apache.zookeeper.server.NIOServerCnxn doIO* WARNING: caught end of stream exception EndOfStreamException: Unable to read additional data from client sessionid session id, likely client has closed socket at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) at java.lang.Thread.run(Thread.java:745) *org.apache.zookeeper.server.NIOServerCnxn sendBuffer* SEVERE: Unexpected Exception: java.nio.channels.CancelledKeyException at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73) at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77) at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151) at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1081) at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:404) at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:200) at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:131) *org.apache.zookeeper.server.persistence.FileTxnLog commit* WARNING: fsync-ing the write ahead log in SyncThread:0 took 28346ms which will adversely effect operation latency. 
See the ZooKeeper troubleshooting guide org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: no servers hosting shard: at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:149) at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:119) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) *Caused by: java.lang.OutOfMemoryError: Java heap space* at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsAndPositionsEnum.init(Lucene41PostingsReader.java:640) at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.docsAndPositions(Lucene41PostingsReader.java:278) at org.apache.lucene.codecs.blocktree.SegmentTermsEnum.docsAndPositions(SegmentTermsEnum.java:1011) at org.apache.lucene.search.spans.SpanTermQuery.getSpans(SpanTermQuery.java:123) at org.apache.lucene.search.spans.SpanOrQuery$1.initSpanQueue(SpanOrQuery.java:180) at org.apache.lucene.search.spans.SpanOrQuery$1.next(SpanOrQuery.java:193) at org.apache.lucene.search.spans.SpanOrQuery$1.initSpanQueue(SpanOrQuery.java:182) at org.apache.lucene.search.spans.SpanOrQuery$1.next(SpanOrQuery.java:193) at org.apache.lucene.search.spans.NearSpansUnordered$SpansCell.next(NearSpansUnordered.java:88) at org.apache.lucene.search.spans.NearSpansUnordered.initList(NearSpansUnordered.java:295) at org.apache.lucene.search.spans.NearSpansUnordered.next(NearSpansUnordered.java:164) at org.apache.lucene.search.spans.SpanScorer.init(SpanScorer.java:46) at org.apache.lucene.search.spans.SpanWeight.scorer(SpanWeight.java:88) at org.apache.lucene.search.DisjunctionMaxQuery$DisjunctionMaxWeight.scorer(DisjunctionMaxQuery.java:160) at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:356) at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:356) at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:356) at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:356) at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:356) at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:356) at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:356) at
Re: Solr server becomes non-responsive.
To add to the details of the above issue: as soon as the query is executed, even before the OutOfMemory error, the Solr servers become non-responsive. On Tue, Dec 23, 2014 at 5:04 PM, Modassar Ather modather1...@gmail.com wrote: Hi, I have a setup of a 4 shard Solr cluster with embedded ZooKeeper on one of them. The zkClient timeout is set to 30 seconds, -Xms is 20g and -Xmx is 24g. When executing a huge query with many wildcards in it, the server crashes and becomes non-responsive. Even the dashboard does not respond and shows a connection lost error. This requires me to restart the servers. I have set the query *timeAllowed* to 5 minutes but it does not seem to be honored and the query hangs around. Kindly help me debug and fix the issue, or suggest how timeAllowed can be made to be honored or how a query that has been hanging for some time can be stopped. *Following are a few exceptions.* *org.apache.zookeeper.server.NIOServerCnxn doIO* WARNING: caught end of stream exception EndOfStreamException: Unable to read additional data from client sessionid session id, likely client has closed socket at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) at java.lang.Thread.run(Thread.java:745) *org.apache.zookeeper.server.NIOServerCnxn sendBuffer* SEVERE: Unexpected Exception: java.nio.channels.CancelledKeyException at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73) at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77) at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151) at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1081) at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:404) at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:200) at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:131) *org.apache.zookeeper.server.persistence.FileTxnLog commit* WARNING: fsync-ing the write ahead log in SyncThread:0 took 28346ms which will adversely effect operation latency. 
See the ZooKeeper troubleshooting guide org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: no servers hosting shard: at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:149) at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:119) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) *Caused by: java.lang.OutOfMemoryError: Java heap space* at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsAndPositionsEnum.init(Lucene41PostingsReader.java:640) at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.docsAndPositions(Lucene41PostingsReader.java:278) at org.apache.lucene.codecs.blocktree.SegmentTermsEnum.docsAndPositions(SegmentTermsEnum.java:1011) at org.apache.lucene.search.spans.SpanTermQuery.getSpans(SpanTermQuery.java:123) at org.apache.lucene.search.spans.SpanOrQuery$1.initSpanQueue(SpanOrQuery.java:180) at org.apache.lucene.search.spans.SpanOrQuery$1.next(SpanOrQuery.java:193) at org.apache.lucene.search.spans.SpanOrQuery$1.initSpanQueue(SpanOrQuery.java:182) at org.apache.lucene.search.spans.SpanOrQuery$1.next(SpanOrQuery.java:193) at org.apache.lucene.search.spans.NearSpansUnordered$SpansCell.next(NearSpansUnordered.java:88) at org.apache.lucene.search.spans.NearSpansUnordered.initList(NearSpansUnordered.java:295) at org.apache.lucene.search.spans.NearSpansUnordered.next(NearSpansUnordered.java:164) at org.apache.lucene.search.spans.SpanScorer.init(SpanScorer.java:46) at org.apache.lucene.search.spans.SpanWeight.scorer(SpanWeight.java:88) at org.apache.lucene.search.DisjunctionMaxQuery$DisjunctionMaxWeight.scorer(DisjunctionMaxQuery.java:160) at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:356) at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:356) at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:356) at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:356) at
How best to fork Solr for enhancement
Hi, I've (hopefully) made some time to do some work on the Solr Admin UI (convert it to AngularJS). I plan to do it on a clone of the lucene-solr project at GitHub. Before I dive too thoroughly into this, I wanted to see if there were any best practices that would make it easier to back-port these changes into SVN should I actually succeed at producing something useful. Is it enough just to make a branch called SOLR-5507 and start committing my changes there? Periodically, I'll zip up the relevant bits and attach them to the JIRA ticket. TIA Upayavira
Re: How best to fork Solr for enhancement
You can make github play well with Apache Infra. See https://wiki.apache.org/lucene-java/BensonMarguliesGitWorkflow On Tue, Dec 23, 2014 at 11:52 AM, Upayavira u...@odoko.co.uk wrote: Hi, I've (hopefully) made some time to do some work on the Solr Admin UI (convert it to AngularJS). I plan to do it on a clone of the lucene-solr project at GitHub. Before I dive too thoroughly into this, I wanted to see if there were any best practices that would make it easier to back-port these changes into SVN should I actually succeed at producing something useful. Is it enough just to make a branch called SOLR-5507 and start committing my changes there? Periodically, I'll zip up the relevant bits and attach them to the JIRA ticket. TIA Upayavira -- Regards, Shalin Shekhar Mangar.
Re: How best to fork Solr for enhancement
Perfect, thanks! On Tue, Dec 23, 2014, at 07:10 AM, Shalin Shekhar Mangar wrote: You can make github play well with Apache Infra. See https://wiki.apache.org/lucene-java/BensonMarguliesGitWorkflow On Tue, Dec 23, 2014 at 11:52 AM, Upayavira u...@odoko.co.uk wrote: Hi, I've (hopefully) made some time to do some work on the Solr Admin UI (convert it to AngularJS). I plan to do it on a clone of the lucene-solr project at GitHub. Before I dive too thoroughly into this, I wanted to see if there were any best practices that would make it easier to back-port these changes into SVN should I actually succeed at producing something useful. Is it enough just to make a branch called SOLR-5507 and start committing my changes there? Periodically, I'll zip up the relevant bits and attach them to the JIRA ticket. TIA Upayavira -- Regards, Shalin Shekhar Mangar.
Re: what does this write.lock does not exist mean??
Haven't seen this particular problem before, but it sounds like it could be a problem with permissions or data size limits - it may be worth looking into. The write.lock file is used when an index is being modified - it is how Lucene handles concurrent attempts to modify the index - a writer obtains the lock on the index (http://wiki.apache.org/lucene-java/LuceneFAQ#What_is_the_purpose_of_write.lock_file.2C_when_is_it_used.2C_and_by_which_classes.3F). Did you check if the file is actually there (in the data/index folder of your Solr core)? If it is, then maybe the app has permission to create the file, but the created file lacks read and/or write permissions, so the app sees that it exists but cannot access it. If it's not there, maybe it is being deleted unexpectedly by some other process/person, or alternatively, maybe it can't even be created - either the app doesn't have permissions for that directory or there is no more free space. I've seen this issue several times before, where running out of allowed disk space prevented index files from being created. It's kind of like the index is locked in its current state - or at least can't be updated all the way. Are you able to add a large number of documents to the index and then confirm that they have actually been added (search for them by ID, for instance)?
RE: SolrCloud Paging on large indexes
Bram Van Dam [bram.van...@intix.eu] wrote: [Solr cursors] Oh thanks, that's a pretty interesting read. The scale we're investigating is several orders of magnitude larger than what was tested there, so I'm still a bit worried. The beauty of the cursor is that it has little to no overhead relative to a standard top-X sorted search. A standard search uses a sliding window over the full result set, as does a cursor-search. Same amount of work. It is just a question of the limits for the window. The largest index I currently have access to is about a billion documents in size. Paging there is a nightmare, but the Solr version is too old to support cursors so I'm afraid I can't offer any useful data. Non-cursor paging in Solr uses a sliding window sort with a heap that contains all documents up to the paging number. A heap is a very fine thing for a sliding window sort, as long as it is small. But performance drops to horrible levels when it gets large, as it is extremely RAM-cache unfriendly. Does anyone have any performance data on multi-billion-document indexes? Sorry, no. I could do a test on our 7-billion-document index, but it would have to wait until the end of January. Nobody will hit next 499 times, but a lot of our users skip to the last page quite often. Maybe I should make *that* as hard as possible. Hmm. Issue a search with the sort in reverse order, then reverse the returned list of documents? - Toke Eskildsen
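In other words, if the normal listing sorts ascending, the last page can be served as the first page of the reversed sort and flipped client-side (field names are illustrative):

  /select?q=*:*&rows=20&sort=date+desc,id+desc

The 20 documents that come back are the final 20 of the ascending ordering; reversing them before display yields the last page at first-page cost.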
Re: How best to fork Solr for enhancement
Semi off-topic, but is AngularJS the best next choice, given that version 2 is so different from version 1? Regards, Alex. Sign up for my Solr resources newsletter at http://www.solr-start.com/ On 23 December 2014 at 06:52, Upayavira u...@odoko.co.uk wrote: Hi, I've (hopefully) made some time to do some work on the Solr Admin UI (convert it to AngularJS). I plan to do it on a clone of the lucene-solr project at GitHub. Before I dive too thoroughly into this, I wanted to see if there were any best practices that would make it easier to back-port these changes into SVN should I actually succeed at producing something useful. Is it enough just to make a branch called SOLR-5507 and start committing my changes there? Periodically, I'll zip up the relevant bits and attach them to the JIRA ticket. TIA Upayavira
Re: How best to fork Solr for enhancement
There are other options like Ember or Backbone; either way, AngularJS is widely adopted. Alexandre, is your question about the radical change between versions? In a way this shows progress and ongoing support for the framework. Another good reason is that AngularJS has a ton of components ready to use. — /Yago Riveiro On Tuesday, Dec 23, 2014 at 3:10 pm, Alexandre Rafalovitch arafa...@gmail.com, wrote: Semi off-topic, but is AngularJS the best next choice, given that version 2 is so different from version 1? Regards, Alex. Sign up for my Solr resources newsletter at http://www.solr-start.com/ On 23 December 2014 at 06:52, Upayavira u...@odoko.co.uk wrote: Hi, I've (hopefully) made some time to do some work on the Solr Admin UI (convert it to AngularJS). I plan to do it on a clone of the lucene-solr project at GitHub. Before I dive too thoroughly into this, I wanted to see if there were any best practices that would make it easier to back-port these changes into SVN should I actually succeed at producing something useful. Is it enough just to make a branch called SOLR-5507 and start committing my changes there? Periodically, I'll zip up the relevant bits and attach them to the JIRA ticket. TIA Upayavira
UI for Solr
Hi, I would like to build a user interface on top of Solr for PC and mobile. I am wondering if there is a framework or best practice commonly used. I want Solr features such as suggestions, autocomplete, and facets to be available in the UI. Any suggestion is welcome. Thank you. Regards Olivier
Re: UI for Solr
You don't expose Solr directly to the user; it is not set up for fool-proof security out of the box. So you would need a client to talk to Solr. Something like Spring.io's Spring Data Solr could be one of the things to check. You can see an auto-complete example for it at: https://github.com/arafalov/Solr-Javadoc/tree/master/SearchServer/src/main and embedded in action at http://www.solr-start.com/javadoc/solr-lucene/index.html (search box at the top) Regards, Alex. Sign up for my Solr resources newsletter at http://www.solr-start.com/ On 23 December 2014 at 10:45, Olivier Austina olivier.aust...@gmail.com wrote: Hi, I would like to build a user interface on top of Solr for PC and mobile. I am wondering if there is a framework or best practice commonly used. I want Solr features such as suggestions, autocomplete, and facets to be available in the UI. Any suggestion is welcome. Thank you. Regards Olivier
Re: Solr Search Inconsistent result
Hi Ahmet, We are using *java.util.UUID* to generate the unique id for each document. Thanks, Ankit Jain On Tue, Dec 23, 2014 at 1:32 PM, Ahmet Arslan iori...@yahoo.com.invalid wrote: Hi Ankit, So you are using solr.UUIDUpdateProcessorFactory to populate unique keys? Ahmet On Tuesday, December 23, 2014 7:13 AM, Ankit Jain ankitjainc...@gmail.com wrote: Hi Ahmet, Thanks for the response. The document ID is unique because we are using *UUID* to generate the document ID. Thanks, Ankit On Tue, Dec 23, 2014 at 12:16 AM, Ahmet Arslan iori...@yahoo.com.invalid wrote: Hi, Do you happen to have documents with the same unique id in different shards? When unique ids are not unique across shards, people see inconsistent results. Please see: http://find.searchhub.org/document/2814183511b5a52 Ahmet On Monday, December 22, 2014 8:06 PM, Ankit Jain ankitjainc...@gmail.com wrote: Hi Ahmet, Thanks for the response. I am running this query from the Solr Search UI. The number of shards for the collection is two. Thanks, Ankit On Mon, Dec 22, 2014 at 8:34 PM, Ahmet Arslan iori...@yahoo.com.invalid wrote: Hi, Is this a sharded query? Ahmet On Monday, December 22, 2014 4:47 PM, Ankit Jain ankitjainc...@gmail.com wrote: Hi All, We are getting inconsistent search results when searching on a *multivalued* field: *Input Query:* ( t : [ 0 TO 1419245069253 ] )AND(_all:impetus-i0111.impetus.co.in) The _all field is a multivalued field. The above query sometimes returns 11 records and sometimes 12471 records. Please help. Thanks, Ankit -- Thanks, Ankit Jain -- Thanks, Ankit Jain -- Thanks, Ankit Jain
Re: Endless 100% CPU usage on searcherExecutor thread
On 12/23/2014 2:31 AM, heaven wrote: We do not use dates here, at least not too often. Usually it's something like type:Profile (we use it from the Rails application, so type describes model names), opted_in:true, etc. Solr hasn't been running for long though, so this may not show the real state. Currently the hit ratio is 1 for the filter cache and 0.84 for the query result cache. I also increased the cache sizes to autowarm: 512, initial: 1024 and size: 4096, which is actually never reached because of commits. Warming the filter cache *can* be very slow. It all depends on exactly what your filters are. I had to reduce the autowarmCount on my filterCache to *four* because if it was any higher, a commit would take up to a minute. We have some really complex filters. Thanks, Shawn
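For reference, these knobs live in solrconfig.xml; a sketch with Alex's sizes but the much smaller autowarmCount Shawn describes (all values illustrative and workload-dependent):

  <filterCache class="solr.FastLRUCache"
               size="4096"
               initialSize="1024"
               autowarmCount="4"/>

autowarmCount is the number of old cache entries re-executed against the new searcher on commit, so it directly trades commit latency for post-commit query speed.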
Re: Solr server becomes non-responsive.
On 12/23/2014 4:34 AM, Modassar Ather wrote: Hi, I have a setup of a 4 shard Solr cluster with embedded ZooKeeper on one of them. The zkClient timeout is set to 30 seconds, -Xms is 20g and -Xmx is 24g. When executing a huge query with many wildcards in it, the server crashes and becomes non-responsive. Even the dashboard does not respond and shows a connection lost error. This requires me to restart the servers. Here's the important part of your message: *Caused by: java.lang.OutOfMemoryError: Java heap space* Your heap is not big enough for what Solr has been asked to do. You need to either increase your heap size or change your configuration so that it uses less memory. http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap Most programs have pretty much undefined behavior when an OOME occurs. Lucene's IndexWriter has been hardened so that it tries extremely hard to avoid index corruption when OOME strikes, and I believe that works well enough that we can call it nearly bulletproof ... but the rest of Lucene and Solr will make no guarantees. It's very difficult to have definable program behavior when an OOME happens, because you simply cannot know the precise point during program execution where it will happen, or what isn't going to work because Java did not have memory space to create an object. Going unresponsive is not surprising. If you can solve your heap problem, note that you may run into other performance issues discussed on the wiki page that I linked. Thanks, Shawn
Re: Loading data to FieldValueCache
Or just not worry about it. The cache will be filled up automatically as you query for facets etc.; the benefit of trying to fill it up as Toke outlines is just that the first few user queries that call for faceting will be somewhat faster. But after the first few user queries have gone through, it won't matter whether you've pre-loaded the cache or not. My point is that you'll get the benefit of the cache no matter what; it's just a matter of whether it's important that the first few users don't have to wait while it's loaded. And with DocValues, as Toke recommends, even that may be unimportant. Best, Erick On Tue, Dec 23, 2014 at 1:03 AM, Toke Eskildsen t...@statsbiblioteket.dk wrote: Manohar Sripada [manohar...@gmail.com] wrote: The wiki states that http://wiki.apache.org/solr/SolrCaching#fieldValueCache is mostly used for faceting. Can someone please shed some light on how data gets loaded into this cache, i.e. which query options cause data to be loaded into it? The values are loaded on the first facet call with facet.method=fc. http://wiki.apache.org/solr/SimpleFacetParameters#facet.method My requirement is that I have 10 facet fields (with facet.limit=5) to be shown in my UI. I want to speed this up by using this cache. Is there a way I can specify only the list of fields to be loaded into the FieldValueCache? Add a facet query as an explicit warmup in your solrconfig.xml. You might want to consider DocValues for your facet fields. https://cwiki.apache.org/confluence/display/solr/DocValues - Toke Eskildsen
Re: 'Illegal character in query' on Solr cloud 4.10.1
Hmmm, so you are pinging the servers directly, right? Here's a couple of things to try: 1) Add distrib=false to the query and try each of the 6 servers. What I'm wondering is if this is happening on the sub-query sent out or on the primary server. Adding distrib=false will just execute on the node you're sending it to, and will NOT send sub-queries out to any other node, so you'll get partial results back. If one server continues to work but the other 5 fail, then your servlet container is probably not set up with the right character sets. Although why that would manifest itself on the ^ character mystifies me. 2) Let's assume that all 6 servers handle the raw query. The next thing that would be really helpful is to see the sub-queries. Take distrib=false off and tail the logs on all the servers. What we're looking for here is whether the sub-queries even make it to Solr or whether the problem is in your container. 3) If the sub-queries do NOT make it to the Solr logs, what is the query that the container sees? Is it recognizable or has the sub-query somehow been munged? What is your environment like? Tomcat? Jetty? Other? What JVM etc? Best, Erick On Tue, Dec 23, 2014 at 3:23 AM, S.L simpleliving...@gmail.com wrote: Hi All, I am using SolrCloud 4.10.1 and I have 3 shards with a replication factor of 2, i.e. 6 nodes altogether. When I query server1 of the 6 nodes in the cluster with the below query, it works fine, but when any other node in the cluster is queried with the same query, it results in an *HTTP Status 500 - {msg=Illegal character in query at index 181:* error. The character at index 181 is the boost character ^. I have seen Jira SOLR-5971 https://issues.apache.org/jira/browse/SOLR-5971 for a similar issue; how can I overcome this issue? The query I use is below. Thanks in Advance! http://xx2..com:8081/solr/dyCollection1_shard2_replica1/?q=x+x+xx&sort=score+desc&wt=json&indent=true&debugQuery=true&defType=edismax&qf=productName^1.5+productDescription&mm=1&pf=productName+productDescription&ps=1&pf2=productName+productDescription&pf3=productName+productDescription&stopwords=true&lowercaseOperators=true
Re: Solr server becomes non-responsive.
Second most important part of your message: When executing a huge query with many wildcards inside it the server This is usually an anti-pattern. The very first thing I'd be doing is trying to not do this. See ngrams for infix queries, or shingles or ReversedWildcardFilterFactory or... And if your corpus is very large with many unique terms it's even worse, but you haven't really told us about that yet. Best, Erick On Tue, Dec 23, 2014 at 8:30 AM, Shawn Heisey apa...@elyograg.org wrote: On 12/23/2014 4:34 AM, Modassar Ather wrote: Hi, I have a setup of a 4 shard Solr cluster with embedded ZooKeeper on one of them. The zkClient timeout is set to 30 seconds, -Xms is 20g and -Xmx is 24g. When executing a huge query with many wildcards in it, the server crashes and becomes non-responsive. Even the dashboard does not respond and shows a connection lost error. This requires me to restart the servers. Here's the important part of your message: *Caused by: java.lang.OutOfMemoryError: Java heap space* Your heap is not big enough for what Solr has been asked to do. You need to either increase your heap size or change your configuration so that it uses less memory. http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap Most programs have pretty much undefined behavior when an OOME occurs. Lucene's IndexWriter has been hardened so that it tries extremely hard to avoid index corruption when OOME strikes, and I believe that works well enough that we can call it nearly bulletproof ... but the rest of Lucene and Solr will make no guarantees. It's very difficult to have definable program behavior when an OOME happens, because you simply cannot know the precise point during program execution where it will happen, or what isn't going to work because Java did not have memory space to create an object. Going unresponsive is not surprising. If you can solve your heap problem, note that you may run into other performance issues discussed on the wiki page that I linked. Thanks, Shawn
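As an illustration of that last suggestion, a field type using solr.ReversedWildcardFilterFactory at index time, along the lines of the stock example schema (attribute values are the usual defaults):

  <fieldType name="text_general_rev" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
              maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

This lets leading-wildcard queries like *foo run as cheap prefix queries against the reversed terms, at the cost of a larger index.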
Re: SolrCloud Paging on large indexes
Nobody will hit next 499 times, but a lot of our users skip to the last page quite often. Maybe I should make *that* as hard as possible. Hmm Right. I'd actually argue that providing a last page link in this situation is 1) useless to the user - I mean, what's the point? Curiosity? If it really _must_ be supported, Toke's approach is sneaky and elegant: sort in reverse order and give them the first page ;). 2) dangerous, as you well know... several orders of magnitude larger than what was tested there, so I'm still a bit worried. I sympathize, but somebody has to be first ;). Besides, the current situation is untenable from what you're saying... Good luck! Erick On Tue, Dec 23, 2014 at 7:07 AM, Toke Eskildsen t...@statsbiblioteket.dk wrote: Bram Van Dam [bram.van...@intix.eu] wrote: [Solr cursors] Oh thanks, that's a pretty interesting read. The scale we're investigating is several orders of magnitude larger than what was tested there, so I'm still a bit worried. The beauty of the cursor is that it has little to no overhead relative to a standard top-X sorted search. A standard search uses a sliding window over the full result set, as does a cursor-search. Same amount of work. It is just a question of the limits for the window. The largest index I currently have access to is about a billion documents in size. Paging there is a nightmare, but the Solr version is too old to support cursors so I'm afraid I can't offer any useful data. Non-cursor paging in Solr uses a sliding window sort with a heap that contains all documents up to the paging number. A heap is a very fine thing for a sliding window sort, as long as it is small. But performance drops to horrible levels when it gets large, as it is extremely RAM-cache unfriendly. Does anyone have any performance data on multi-billion-document indexes? Sorry, no. I could do a test on our 7-billion-document index, but it would have to wait until the end of January. Nobody will hit next 499 times, but a lot of our users skip to the last page quite often. Maybe I should make *that* as hard as possible. Hmm. Issue a search with the sort in reverse order, then reverse the returned list of documents? - Toke Eskildsen
Re: Solr Search Inconsistent result
This really sounds like you _think_ you have two shards in a single collection, but really you don't. I admit I'm not quite sure how it got that way, but... So try this: add distrib=false to the query and ping each of your servers separately. My bet is that you'll find they have wildly varying numbers of docs, specifically 11 and 12,471. Next, 'tail -f' the logs and send the query again. You should see each shard get a sub-query; if you do NOT see the sub-query at one node for each shard, you don't really have a sharded collection (or there's some other problem). But this is not supposed to happen in SolrCloud. You haven't told us anything about your setup: what version of Solr? Old-style master/slave or SolrCloud? How did you create your collection? How are you indexing to that collection? In short, please review: http://wiki.apache.org/solr/UsingMailingLists Best, Erick On Tue, Dec 23, 2014 at 7:57 AM, Ankit Jain ankitjainc...@gmail.com wrote: Hi Ahmet, We are using *java.util.UUID* to generate the unique id for each document. Thanks, Ankit Jain On Tue, Dec 23, 2014 at 1:32 PM, Ahmet Arslan iori...@yahoo.com.invalid wrote: Hi Ankit, So you are using solr.UUIDUpdateProcessorFactory to populate unique keys? Ahmet On Tuesday, December 23, 2014 7:13 AM, Ankit Jain ankitjainc...@gmail.com wrote: Hi Ahmet, Thanks for the response. The document ID is unique because we are using *UUID* to generate the document ID. Thanks, Ankit On Tue, Dec 23, 2014 at 12:16 AM, Ahmet Arslan iori...@yahoo.com.invalid wrote: Hi, Do you happen to have documents with the same unique id in different shards? When unique ids are not unique across shards, people see inconsistent results. Please see: http://find.searchhub.org/document/2814183511b5a52 Ahmet On Monday, December 22, 2014 8:06 PM, Ankit Jain ankitjainc...@gmail.com wrote: Hi Ahmet, Thanks for the response. I am running this query from the Solr Search UI. The number of shards for the collection is two. Thanks, Ankit On Mon, Dec 22, 2014 at 8:34 PM, Ahmet Arslan iori...@yahoo.com.invalid wrote: Hi, Is this a sharded query? Ahmet On Monday, December 22, 2014 4:47 PM, Ankit Jain ankitjainc...@gmail.com wrote: Hi All, We are getting inconsistent search results when searching on a *multivalued* field: *Input Query:* ( t : [ 0 TO 1419245069253 ] )AND(_all:impetus-i0111.impetus.co.in) The _all field is a multivalued field. The above query sometimes returns 11 records and sometimes 12471 records. Please help. Thanks, Ankit -- Thanks, Ankit Jain -- Thanks, Ankit Jain -- Thanks, Ankit Jain
Re: How best to fork Solr for enhancement
I'm somewhat open to other suggestions, as I'm right at the beginning of the project. I know Angular, and like it. I've looked at a couple of others, but have found them to be more of a collection of disparate components and not as integrated as Angular. However, if folks want to have a discussion on competing frameworks, I'm at least prepared to listen!! Note - the design goal is to make it as easy as possible for *Java* developers to work with: folks who are typically back-end developers. Thus the framework must isolate the developer from UI quirks as much as possible and have some form of design abstraction. Upayavira On Tue, Dec 23, 2014, at 10:09 AM, Alexandre Rafalovitch wrote: Semi off-topic, but is AngularJS the best next choice, given that version 2 is so different from version 1? Regards, Alex. Sign up for my Solr resources newsletter at http://www.solr-start.com/ On 23 December 2014 at 06:52, Upayavira u...@odoko.co.uk wrote: Hi, I've (hopefully) made some time to do some work on the Solr Admin UI (convert it to AngularJS). I plan to do it on a clone of the lucene-solr project at GitHub. Before I dive too thoroughly into this, I wanted to see if there were any best practices that would make it easier to back-port these changes into SVN should I actually succeed at producing something useful. Is it enough just to make a branch called SOLR-5507 and start committing my changes there? Periodically, I'll zip up the relevant bits and attach them to the JIRA ticket. TIA Upayavira
Re: UI for Solr
Hi Alex, Thank you for the prompt reply. I was not aware of Spring.io's Spring Data Solr. Regards Olivier 2014-12-23 16:50 GMT+01:00 Alexandre Rafalovitch arafa...@gmail.com: You don't expose Solr directly to the user; it is not set up for fool-proof security out of the box. So you would need a client to talk to Solr. Something like Spring.io's Spring Data Solr could be one of the things to check. You can see an auto-complete example for it at: https://github.com/arafalov/Solr-Javadoc/tree/master/SearchServer/src/main and embedded in action at http://www.solr-start.com/javadoc/solr-lucene/index.html (search box at the top) Regards, Alex. Sign up for my Solr resources newsletter at http://www.solr-start.com/ On 23 December 2014 at 10:45, Olivier Austina olivier.aust...@gmail.com wrote: Hi, I would like to build a user interface on top of Solr for PC and mobile. I am wondering if there is a framework or best practice commonly used. I want Solr features such as suggestions, autocomplete, and facets to be available in the UI. Any suggestion is welcome. Thank you. Regards Olivier
Solr Cloud and relative paths in solrconfig.xml lib directives
Hi all, I seek some advice on the use of lib directives in solrconfig.xml in Solr Cloud. My project has been tested with Solr 4.10.2 and runs nicely on a single node with the included Jetty. The setup adds a DataImportHandler request handler to solrconfig.xml. It also adds a lib directive to solrconfig.xml to pick up the dataimporthandler jars from «../../../dist». Now, in migrating this setup to Solr Cloud I upconfig the configuration to ZooKeeper and create the collection with the collections API’s CREATE action. The problem with this approach is that the relative path to dist in the lib directive does not resolve correctly. failure: { : org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:Error CREATEing SolrCore 'cloudcollection1_shard2_replica2': Unable to create core [cloudcollection1_shard2_replica2] Caused by: org.apache.solr.handler.dataimport.DataImportHandler } and the logs reveal that the class org.apache.solr.handler.dataimport.DataImportHandler could not be found. Then, after revamping my lib directive with an absolute path to the dist directory that includes the dataimporthandler jars, another upconfig and a fresh collection creation successfully create the collection. Is this intentional behavior forcing the use of absolute paths, or is it possible to use relative paths to the dist and contrib directories in solrconfig.xml in Cloud mode? -- Sincerely, Jens Ivar Jørdre about.me/jijordre http://about.me/jijordre
Re: Solr server becomes non-responsive.
Hi, I agree with Erick; it would be a good thing to have more details about your configuration and collection. Your max heap size is 24Gb. How much RAM is on each server? By « 4 shard Solr cluster », do you mean 4 Solr server nodes or a collection with 4 shards? So, how many nodes are in the cluster? How many shards and replicas for the collection? How many items in the collection? What is the size of the index? How is the collection updated (frequency, how many items per day, what is your hard commit strategy)? How are the caches configured in solrconfig.xml? Can you provide all other JVM parameters? Regards Dominique 2014-12-23 17:50 GMT+01:00 Erick Erickson erickerick...@gmail.com: Second most important part of your message: When executing a huge query with many wildcards inside it the server This is usually an anti-pattern. The very first thing I'd be doing is trying to not do this. See ngrams for infix queries, or shingles or ReversedWildcardFilterFactory or... And if your corpus is very large with many unique terms it's even worse, but you haven't really told us about that yet. Best, Erick On Tue, Dec 23, 2014 at 8:30 AM, Shawn Heisey apa...@elyograg.org wrote: On 12/23/2014 4:34 AM, Modassar Ather wrote: Hi, I have a setup of a 4 shard Solr cluster with embedded ZooKeeper on one of them. The zkClient timeout is set to 30 seconds, -Xms is 20g and -Xmx is 24g. When executing a huge query with many wildcards in it, the server crashes and becomes non-responsive. Even the dashboard does not respond and shows a connection lost error. This requires me to restart the servers. Here's the important part of your message: *Caused by: java.lang.OutOfMemoryError: Java heap space* Your heap is not big enough for what Solr has been asked to do. You need to either increase your heap size or change your configuration so that it uses less memory. http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap Most programs have pretty much undefined behavior when an OOME occurs. Lucene's IndexWriter has been hardened so that it tries extremely hard to avoid index corruption when OOME strikes, and I believe that works well enough that we can call it nearly bulletproof ... but the rest of Lucene and Solr will make no guarantees. It's very difficult to have definable program behavior when an OOME happens, because you simply cannot know the precise point during program execution where it will happen, or what isn't going to work because Java did not have memory space to create an object. Going unresponsive is not surprising. If you can solve your heap problem, note that you may run into other performance issues discussed on the wiki page that I linked. Thanks, Shawn
Re: Solr Cloud and relative paths in solrconfig.xml lib directives
Hi, I usually put all dependency jar files (DIH, JDBC driver, …) in a lib directory in the Solr home directory where your shards are created, something like this:
solr/
  solr.xml
  cloudcollection1_shard2_replica2/
  lib/
In solrconfig.xml, I remove all the lib … directives except this one: <lib dir="../lib" /> You may need to restart your nodes after creating your lib directory. Regards Dominique 2014-12-23 21:25 GMT+01:00 Jens Ivar Jørdre jijor...@gmail.com: Hi all, I seek some advice on the use of lib directives in solrconfig.xml in Solr Cloud. My project has been tested with Solr 4.10.2 and runs nicely on a single node with the included Jetty. The setup adds a DataImportHandler request handler to solrconfig.xml. It also adds a lib directive to solrconfig.xml to pick up the dataimporthandler jars from «../../../dist». Now, in migrating this setup to Solr Cloud I upconfig the configuration to ZooKeeper and create the collection with the collections API’s CREATE action. The problem with this approach is that the relative path to dist in the lib directive does not resolve correctly. failure: { : org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:Error CREATEing SolrCore 'cloudcollection1_shard2_replica2': Unable to create core [cloudcollection1_shard2_replica2] Caused by: org.apache.solr.handler.dataimport.DataImportHandler } and the logs reveal that the class org.apache.solr.handler.dataimport.DataImportHandler could not be found. Then, after revamping my lib directive with an absolute path to the dist directory that includes the dataimporthandler jars, another upconfig and a fresh collection creation successfully create the collection. Is this intentional behavior forcing the use of absolute paths, or is it possible to use relative paths to the dist and contrib directories in solrconfig.xml in Cloud mode? -- Sincerely, Jens Ivar Jørdre about.me/jijordre http://about.me/jijordre
Re: Solr Cloud and relative paths in solrconfig.xml lib directives
I think you may be running into a bug which was reported an hour back. See https://issues.apache.org/jira/browse/SOLR-6887 On Tue, Dec 23, 2014 at 8:25 PM, Jens Ivar Jørdre jijor...@gmail.com wrote: Hi all, I seek some advice on the use of lib directives in solrconfig.xml in Solr Cloud. My project has been tested with Solr 4.10.2 and runs nicely on a single node with the included Jetty. The setup adds a DataImportHandler request handler to solrconfig.xml. It also adds a lib directive to solrconfig.xml to pick up the dataimporthandler jars from «../../../dist». Now, in migrating this setup to Solr Cloud I upconfig the configuration to ZooKeeper and create the collection with the collections API’s CREATE action. The problem with this approach is that the relative path to dist in the lib directive does not resolve correctly. failure: { : org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:Error CREATEing SolrCore 'cloudcollection1_shard2_replica2': Unable to create core [cloudcollection1_shard2_replica2] Caused by: org.apache.solr.handler.dataimport.DataImportHandler } and the logs reveal that the class org.apache.solr.handler.dataimport.DataImportHandler could not be found. Then, after revamping my lib directive with an absolute path to the dist directory that includes the dataimporthandler jars, another upconfig and a fresh collection creation successfully create the collection. Is this intentional behavior forcing the use of absolute paths, or is it possible to use relative paths to the dist and contrib directories in solrconfig.xml in Cloud mode? -- Sincerely, Jens Ivar Jørdre about.me/jijordre http://about.me/jijordre -- Regards, Shalin Shekhar Mangar.
Re: Solr Cloud and relative paths in solrconfig.xml lib directives
On 12/23/2014 1:25 PM, Jens Ivar Jørdre wrote: I seek some advice on the use of lib directives in solrconfig.xml in Solr Cloud. My project has been tested with Solr 4.10.2 and runs nicely on a single node with the included Jetty. The setup adds a DataImportHandler request handler to solrconfig.xml. It also adds a lib directive to solrconfig.xml to pick up the dataimporthandler jars from «../../../dist». Dominique's answer is the best approach ... but you should remove *all* lib directives from solrconfig.xml. You don't even need the directive that he mentioned with ../lib. Just create a lib directory in the same place as your solr.xml and put all the extra jars needed by all your collections in that directory. Make sure that all other copies of those jars are not on your classpath. As of Solr 4.3 (from what I remember, that's the right version), ${solr.solr.home}/lib is automatically included by the resource loader. Prior to that version, you had to include sharedLib=lib in solr.xml. I ran into a problem related to this, a problem that was declared to NOT be a bug: https://issues.apache.org/jira/browse/SOLR-4852?focusedCommentId=13820197&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13820197 Thanks, Shawn
Re: Solr Cloud and relative paths in solrconfig.xml lib directives
You ought to be able to specify a system property as well. So you have something like this in your solrconfig.xml file: <lib dir="${solr.install.dir:../../..}/contrib/clustering/lib/" regex=".*\.jar" /> and define solr.install.dir as a system property when you invoke Solr. Best, Erick On Tue, Dec 23, 2014 at 2:05 PM, Shawn Heisey apa...@elyograg.org wrote: On 12/23/2014 1:25 PM, Jens Ivar Jørdre wrote: I seek some advice on the use of lib directives in solrconfig.xml in Solr Cloud. My project has been tested with Solr 4.10.2 and runs nicely on a single node with the included Jetty. The setup adds a DataImportHandler request handler to solrconfig.xml. It also adds a lib directive to solrconfig.xml to pick up the dataimporthandler jars from «../../../dist». Dominique's answer is the best approach ... but you should remove *all* lib directives from solrconfig.xml. You don't even need the directive that he mentioned with ../lib. Just create a lib directory in the same place as your solr.xml and put all the extra jars needed by all your collections in that directory. Make sure that all other copies of those jars are not on your classpath. As of Solr 4.3 (from what I remember, that's the right version), ${solr.solr.home}/lib is automatically included by the resource loader. Prior to that version, you had to include sharedLib=lib in solr.xml. I ran into a problem related to this, a problem that was declared to NOT be a bug: https://issues.apache.org/jira/browse/SOLR-4852?focusedCommentId=13820197&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13820197 Thanks, Shawn
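With the bundled Jetty, the property can be supplied on the command line when starting Solr, e.g. (the install path is illustrative):

  java -Dsolr.install.dir=/opt/solr-4.10.2 -jar start.jar

The ${solr.install.dir:../../..} syntax falls back to the relative default when the property is not set, so the same solrconfig.xml keeps working for a stock single-node install.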
Re: Solr server becomes non-responsive.
Thanks for your suggestions. I will look into the link provided. http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap This is usually an anti-pattern. The very first thing I'd be doing is trying to not do this. See ngrams for infix queries, or shingles or ReversedWildcardFilterFactory or... We cannot avoid multiple wildcards since that is our users' requirement. We try to discourage it but the users insist on firing such queries. Also, ngrams etc. can be tried but our index is already huge and ngrams may add a lot more to it. We are OK with such queries failing as long as other queries are not affected. Please find the details below. So, how many nodes are in the cluster? There are 4 nodes in total in the cluster. How many shards and replicas for the collection? There are 4 shards and no replica for any of them. How many items in the collection? If I understand the question correctly, there are two collections on each node and their sizes on each node are approximately 190GB and 130GB. What is the size of the index? There are two collections on each node and their sizes on each node are approximately 190GB and 130GB. How is the collection updated (frequency, how many items per day, what is your hard commit strategy)? It is an optimized, read-only index. There are no intermediate updates. How are the caches configured in solrconfig.xml? The filter cache, query result cache and document cache are enabled. Auto-warming is also done. Can you provide all other JVM parameters? -Xms20g -Xmx24g -XX:+UseConcMarkSweepGC Thanks again, Modassar On Wed, Dec 24, 2014 at 2:29 AM, Dominique Bejean dominique.bej...@eolya.fr wrote: Hi, I agree with Erick; it would be a good thing to have more details about your configuration and collection. Your max heap size is 24Gb. How much RAM is on each server? By « 4 shard Solr cluster », do you mean 4 Solr server nodes or a collection with 4 shards? So, how many nodes are in the cluster? How many shards and replicas for the collection? How many items in the collection? What is the size of the index? How is the collection updated (frequency, how many items per day, what is your hard commit strategy)? How are the caches configured in solrconfig.xml? Can you provide all other JVM parameters? Regards Dominique 2014-12-23 17:50 GMT+01:00 Erick Erickson erickerick...@gmail.com: Second most important part of your message: When executing a huge query with many wildcards inside it the server This is usually an anti-pattern. The very first thing I'd be doing is trying to not do this. See ngrams for infix queries, or shingles or ReversedWildcardFilterFactory or... And if your corpus is very large with many unique terms it's even worse, but you haven't really told us about that yet. Best, Erick On Tue, Dec 23, 2014 at 8:30 AM, Shawn Heisey apa...@elyograg.org wrote: On 12/23/2014 4:34 AM, Modassar Ather wrote: Hi, I have a setup of a 4 shard Solr cluster with embedded ZooKeeper on one of them. The zkClient timeout is set to 30 seconds, -Xms is 20g and -Xmx is 24g. When executing a huge query with many wildcards in it, the server crashes and becomes non-responsive. Even the dashboard does not respond and shows a connection lost error. This requires me to restart the servers. Here's the important part of your message: *Caused by: java.lang.OutOfMemoryError: Java heap space* Your heap is not big enough for what Solr has been asked to do. You need to either increase your heap size or change your configuration so that it uses less memory. 
http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap Most programs have pretty much undefined behavior when an OOME occurs. Lucene's IndexWriter has been hardened so that it tries extremely hard to avoid index corruption when OOME strikes, and I believe that works well enough that we can call it nearly bulletproof ... but the rest of Lucene and Solr will make no guarantees. It's very difficult to have definable program behavior when an OOME happens, because you simply cannot know the precise point during program execution where it will happen, or what isn't going to work because Java did not have memory space to create an object. Going unresponsive is not surprising. If you can solve your heap problem, note that you may run into other performance issues discussed on the wiki page that I linked. Thanks, Shawn On Wed, Dec 24, 2014 at 2:29 AM, Dominique Bejean dominique.bej...@eolya.fr wrote: Hi, I agree with Erick; it would be a good thing to have more details about your configuration and collection. Your max heap size is 24Gb. How much RAM is on each server? By « 4 shard Solr cluster », do you mean 4 Solr server nodes or a collection with 4 shards? So, how many nodes are in the cluster? How many shards and replicas for the
Re: Loading data to FieldValueCache
Thanks Erick and Toke, Also, I read here https://wiki.apache.org/solr/SolrCaching#filterCache that the filterCache can also be used for faceting with facet.method=enum. So, I am a bit confused here on which one to use for faceting. One more thing here is I have different types of facets (for example - Product List, States). The Product List facet has a lot of unique values (around 10 million), whereas the States list will be in the hundreds. So, I want to come up with the numbers for the size of the fieldValueCache/filterCache and pre-populate them. Thanks, Manohar On Tue, Dec 23, 2014 at 10:07 PM, Erick Erickson erickerick...@gmail.com wrote: Or just not worry about it. The cache will be filled up automatically as you query for facets etc., the benefit to trying to fill it up as Toke outlines is just that the first few user queries that call for faceting will be somewhat faster. But after the first few user queries have gone through, it won't matter whether you've pre-loaded the cache or not. My point is that you'll get the benefit of the cache no matter what, it's just a matter of whether it's important that the first few users don't have to wait while they're loaded. And with DocValues, as Toke recommends, even that may be unimportant. Best, Erick On Tue, Dec 23, 2014 at 1:03 AM, Toke Eskildsen t...@statsbiblioteket.dk wrote: Manohar Sripada [manohar...@gmail.com] wrote: From the wiki, it states that http://wiki.apache.org/solr/SolrCaching#fieldValueCache is mostly used for faceting. Can someone please throw some light on how to load data into this cache, i.e. what solrquery option causes data to be loaded into this cache. The values are loaded on the first facet call with facet.method=fc. http://wiki.apache.org/solr/SimpleFacetParameters#facet.method My requirement is I have 10 facet fields (with facet.limit 5) to be shown in my UI. I want to speed this up by using this cache. Is there a way where I can specify only the list of fields to be loaded into the FieldValueCache? Add a facet call as explicit warmup in your solrconfig.xml. You might want to consider DocValues for your facet fields. https://cwiki.apache.org/confluence/display/solr/DocValues - Toke Eskildsen
Solr Date Range not returning results for last 1 month
So my Solr date range query is as follows: facet.range=date&facet.range.start=NOW/DAY-36MONTH&facet.range.end=NOW/DAY&facet.range.gap=%2B1MONTH I need facets for the past 36 months (3 years), and everything is fine except that no data is being returned for the last month. The facets I am getting stop at last month; say today is 24th December, I am getting them only till 24th November. How should I modify my query to obtain results till today? Tried a few options using hit and trial :) but could not arrive at a solution. Appreciate the help in this regard.
Re: Loading data to FieldValueCache
By and large, don't use the enum method unless there are _very_ few unique values. It forms a filter (size roughly maxDoc/8 bytes) for _every_ unique value in the field, i.e. if you have 10,000 unique values it'll try to form 10,000 filterCache entries. Let the system do this for you automatically if appropriate. Best, Erick On Tue, Dec 23, 2014 at 9:37 PM, Manohar Sripada manohar...@gmail.com wrote: Thanks Erick and Toke, Also, I read here https://wiki.apache.org/solr/SolrCaching#filterCache that the filterCache can also be used for faceting with facet.method=enum. So, I am a bit confused here on which one to use for faceting. One more thing here is I have different types of facets (for example - Product List, States). The Product List facet has a lot of unique values (around 10 million), whereas the States list will be in the hundreds. So, I want to come up with the numbers for the size of the fieldValueCache/filterCache and pre-populate them. Thanks, Manohar On Tue, Dec 23, 2014 at 10:07 PM, Erick Erickson erickerick...@gmail.com wrote: Or just not worry about it. The cache will be filled up automatically as you query for facets etc., the benefit to trying to fill it up as Toke outlines is just that the first few user queries that call for faceting will be somewhat faster. But after the first few user queries have gone through, it won't matter whether you've pre-loaded the cache or not. My point is that you'll get the benefit of the cache no matter what, it's just a matter of whether it's important that the first few users don't have to wait while they're loaded. And with DocValues, as Toke recommends, even that may be unimportant. Best, Erick On Tue, Dec 23, 2014 at 1:03 AM, Toke Eskildsen t...@statsbiblioteket.dk wrote: Manohar Sripada [manohar...@gmail.com] wrote: From the wiki, it states that http://wiki.apache.org/solr/SolrCaching#fieldValueCache is mostly used for faceting. Can someone please throw some light on how to load data into this cache, i.e. what solrquery option causes data to be loaded into this cache. The values are loaded on the first facet call with facet.method=fc. http://wiki.apache.org/solr/SimpleFacetParameters#facet.method My requirement is I have 10 facet fields (with facet.limit 5) to be shown in my UI. I want to speed this up by using this cache. Is there a way where I can specify only the list of fields to be loaded into the FieldValueCache? Add a facet call as explicit warmup in your solrconfig.xml. You might want to consider DocValues for your facet fields. https://cwiki.apache.org/confluence/display/solr/DocValues - Toke Eskildsen
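To put rough numbers on Erick's warning: each filterCache entry is a bitset of about maxDoc/8 bytes, so on an index of, say, 100 million documents (a figure chosen purely for illustration) each entry costs ~12.5 MB, and enum faceting over Manohar's ~10 million product values is out of the question, while enum over a few hundred states is harmless. Since facet parameters can generally be overridden per field, one way to express that split in a single request is:

f.state.facet.method=enum
f.products.facet.method=fc

Here state and products stand in for whatever the real field names are; the f.<fieldname>.facet.* override syntax is described on the SimpleFacetParameters wiki page already linked in this thread.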
Re: Solr Date Range not returning results for last 1 month
Hmmm, not quite sure what's going on here, but try an end time of NOW/MONTH+1MONTH with the usual escaping of the plus sign... Best, Erick On Tue, Dec 23, 2014 at 9:55 PM, Yavar Husain yavarhus...@gmail.com wrote: So my Solr date range query is as follows: facet.range=date&facet.range.start=NOW/DAY-36MONTH&facet.range.end=NOW/DAY&facet.range.gap=%2B1MONTH I need facets for the past 36 months (3 years), and everything is fine except that no data is being returned for the last month. The facets I am getting stop at last month; say today is 24th December, I am getting them only till 24th November. How should I modify my query to obtain results till today? Tried a few options using hit and trial :) but could not arrive at a solution. Appreciate the help in this regard.
Re: Solr Date Range not returning results for last 1 month
Thanks Erick. That works! Will check some other time as to why NOW/DAY does not work. Regards, Yavar On Wed, Dec 24, 2014 at 11:39 AM, Erick Erickson erickerick...@gmail.com wrote: Hmmm, not quite sure what's going on here, but try an end time of NOW/MONTH+1MONTH with the usual escaping of the plus sign... Best, Erick On Tue, Dec 23, 2014 at 9:55 PM, Yavar Husain yavarhus...@gmail.com wrote: So my Solr date range query is as follows: facet.range=date&facet.range.start=NOW/DAY-36MONTH&facet.range.end=NOW/DAY&facet.range.gap=%2B1MONTH I need facets for the past 36 months (3 years), and everything is fine except that no data is being returned for the last month. The facets I am getting stop at last month; say today is 24th December, I am getting them only till 24th November. How should I modify my query to obtain results till today? Tried a few options using hit and trial :) but could not arrive at a solution. Appreciate the help in this regard.
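For anyone landing on this thread later, the corrected parameter set implied by Erick's fix (keeping the original start and gap, with %2B as the URL-escaped plus sign) would be:

facet.range=date&facet.range.start=NOW/DAY-36MONTH&facet.range.end=NOW/MONTH%2B1MONTH&facet.range.gap=%2B1MONTH

NOW/MONTH+1MONTH rounds to the first instant of next month, so the final bucket can no longer end before today; why an end of NOW/DAY dropped the last month is left unexplained in the thread.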
Re: Loading data to FieldValueCache
Okay. Let me try like this, as mine is a read-only index. I will have some queries in the firstSearcher event listener: 1) q=*:*&facet=true&facet.method=enum&facet.field=state -- to load all the state-related unique values into the filterCache. Will it use the filterCache when I send a query with a filter, e.g. fq=state:CA? Once it is loaded, do I need to send a query with facet.method=enum every time along with facet.field=state to get state-related facet data from the filterCache? 2) q=*:*&facet=true&facet.method=fc&facet.field=products -- to load the values related to products into the fieldCache. Again, while querying for this facet do I need to send facet.method=fc every time? Thanks, Manohar On Wed, Dec 24, 2014 at 11:36 AM, Erick Erickson erickerick...@gmail.com wrote: By and large, don't use the enum method unless there are _very_ few unique values. It forms a filter (size roughly maxDoc/8 bytes) for _every_ unique value in the field, i.e. if you have 10,000 unique values it'll try to form 10,000 filterCache entries. Let the system do this for you automatically if appropriate. Best, Erick On Tue, Dec 23, 2014 at 9:37 PM, Manohar Sripada manohar...@gmail.com wrote: Thanks Erick and Toke, Also, I read here https://wiki.apache.org/solr/SolrCaching#filterCache that the filterCache can also be used for faceting with facet.method=enum. So, I am a bit confused here on which one to use for faceting. One more thing here is I have different types of facets (for example - Product List, States). The Product List facet has a lot of unique values (around 10 million), whereas the States list will be in the hundreds. So, I want to come up with the numbers for the size of the fieldValueCache/filterCache and pre-populate them. Thanks, Manohar On Tue, Dec 23, 2014 at 10:07 PM, Erick Erickson erickerick...@gmail.com wrote: Or just not worry about it. The cache will be filled up automatically as you query for facets etc., the benefit to trying to fill it up as Toke outlines is just that the first few user queries that call for faceting will be somewhat faster. But after the first few user queries have gone through, it won't matter whether you've pre-loaded the cache or not. My point is that you'll get the benefit of the cache no matter what, it's just a matter of whether it's important that the first few users don't have to wait while they're loaded. And with DocValues, as Toke recommends, even that may be unimportant. Best, Erick On Tue, Dec 23, 2014 at 1:03 AM, Toke Eskildsen t...@statsbiblioteket.dk wrote: Manohar Sripada [manohar...@gmail.com] wrote: From the wiki, it states that http://wiki.apache.org/solr/SolrCaching#fieldValueCache is mostly used for faceting. Can someone please throw some light on how to load data into this cache, i.e. what solrquery option causes data to be loaded into this cache. The values are loaded on the first facet call with facet.method=fc. http://wiki.apache.org/solr/SimpleFacetParameters#facet.method My requirement is I have 10 facet fields (with facet.limit 5) to be shown in my UI. I want to speed this up by using this cache. Is there a way where I can specify only the list of fields to be loaded into the FieldValueCache? Add a facet call as explicit warmup in your solrconfig.xml. You might want to consider DocValues for your facet fields. https://cwiki.apache.org/confluence/display/solr/DocValues - Toke Eskildsen
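For reference, a firstSearcher warming entry along the lines Manohar describes might look like the following in solrconfig.xml. The field names state and products are taken from his mails; the rest is a sketch, not a tested configuration.

<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- low-cardinality field: enum primes the filterCache -->
    <lst>
      <str name="q">*:*</str>
      <str name="facet">true</str>
      <str name="facet.field">state</str>
      <str name="facet.method">enum</str>
    </lst>
    <!-- high-cardinality field: fc primes the fieldValueCache -->
    <lst>
      <str name="q">*:*</str>
      <str name="facet">true</str>
      <str name="facet.field">products</str>
      <str name="facet.method">fc</str>
    </lst>
  </arr>
</listener>

On the question of repeating facet.method: the method in effect at query time decides which cache is consulted, so a warmed cache only helps requests that use the same facet.method; putting the per-field settings in the request handler's defaults section is one way to keep that consistent without sending them on every request.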
Re: Solr server becomes non-responsive.
Modassar, How many items in the collection ? I mean how many documents per collection ? 1 million, 10 million, …? How are the caches configured in solrconfig.xml ? What are the size attribute values for each cache ? Can you provide a sample of the query ? Does it fail immediately after SolrCloud startup or after several hours ? Dominique 2014-12-24 6:20 GMT+01:00 Modassar Ather modather1...@gmail.com: Thanks for your suggestions. I will look into the link provided. http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap This is usually an anti-pattern. The very first thing I'd be doing is trying to not do this. See ngrams for infix queries, or shingles or ReverseWildcardFilterFactory or... We cannot avoid multiple wildcards since that is our users' requirement. We try to discourage it but the users insist on firing such queries. Also, ngrams etc. can be tried, but our index is already huge and ngrams may add a lot more to it. We are OK with such queries failing as long as other queries are not affected. Please find the details below. So, how many nodes in the cluster ? There are a total of 4 nodes in the cluster. How many shards and replicas for the collection ? There are 4 shards and no replicas for any of them. How many items in the collection ? If I understand the question correctly, there are two collections on each node and their sizes on each node are approximately 190GB and 130GB. What is the size of the index ? There are two collections on each node and their sizes on each node are approximately 190GB and 130GB. How is the collection updated (frequency, how many items per day, what is your hard commit strategy) ? It is an optimized, read-only index. There are no intermediate updates. How are the caches configured in solrconfig.xml ? The filter cache, query result cache and document cache are enabled. Auto-warming is also done. Can you provide all other JVM parameters ? -Xms20g -Xmx24g -XX:+UseConcMarkSweepGC Thanks again, Modassar On Wed, Dec 24, 2014 at 2:29 AM, Dominique Bejean dominique.bej...@eolya.fr wrote: Hi, I agree with Erick, it could be a good thing to have more details about your configuration and collection. Your heap size is 32Gb. How much RAM on each server ? By « 4 shard Solr cluster », do you mean 4 Solr server nodes or a collection with 4 shards ? So, how many nodes in the cluster ? How many shards and replicas for the collection ? How many items in the collection ? What is the size of the index ? How is the collection updated (frequency, how many items per day, what is your hard commit strategy) ? How are the caches configured in solrconfig.xml ? Can you provide all other JVM parameters ? Regards Dominique 2014-12-23 17:50 GMT+01:00 Erick Erickson erickerick...@gmail.com: Second most important part of your message: "When executing a huge query with many wildcards inside it the server ..." This is usually an anti-pattern. The very first thing I'd be doing is trying to not do this. See ngrams for infix queries, or shingles or ReverseWildcardFilterFactory or... And if your corpus is very large with many unique terms it's even worse, but you haven't really told us about that yet. Best, Erick On Tue, Dec 23, 2014 at 8:30 AM, Shawn Heisey apa...@elyograg.org wrote: On 12/23/2014 4:34 AM, Modassar Ather wrote: Hi, I have a setup of a 4 shard Solr cluster with embedded zookeeper on one of them. The zkClient timeout is set to 30 seconds, -Xms is 20g and -Xmx is 24g. When executing a huge query with many wildcards inside it the server crashes and becomes non-responsive. 
Even the dashboard does not respond and shows a connection lost error. This requires me to restart the servers. Here's the important part of your message: *Caused by: java.lang.OutOfMemoryError: Java heap space* Your heap is not big enough for what Solr has been asked to do. You need to either increase your heap size or change your configuration so that it uses less memory. http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap Most programs have pretty much undefined behavior when an OOME occurs. Lucene's IndexWriter has been hardened so that it tries extremely hard to avoid index corruption when OOME strikes, and I believe that works well enough that we can call it nearly bulletproof ... but the rest of Lucene and Solr will make no guarantees. It's very difficult to have definable program behavior when an OOME happens, because you simply cannot know the precise point during program execution where it will happen, or what isn't going to work because Java did not have memory space to create an object. Going unresponsive is not surprising. If you can solve your heap problem, note that you may run into other performance issues discussed on the wiki page that I linked.