Re: Problem regarding queries enclosed in double quotes in Solr 3.4
Upayavira, thanks for replying. When we run the quoted query in edismax, we get correct results. The only problem is that the quoted queries are very slow. Can you please point me to a link which talks about quoted queries in the edismax parser?
How to warm up filter queries for a category field with 1000 possible values?
What's the way to warm up filter queries for a category field with 1000 possible values? Would I need to write 1000 lines manually in solrconfig.xml, or what is the format?
Re: How to warm up filter queries for a category field with 1000 possible values?
If you are asking about reading from a file for warm-up: if there is no such capability for what you want, I can open a Jira issue and send a patch. 2013/10/7 user 01 user...@gmail.com: What's the way to warm up filter queries for a category field with 1000 possible values? Would I need to write 1000 lines manually in solrconfig.xml, or what is the format?
Re: How to warm up filter queries for a category field with 1000 possible values?
Sorry, didn't get you exactly. I need to warm up my queries after the newSearcher/firstSearcher are initialized. I am trying to warm up the filter caches for a category field, but I have almost 1000 categories (changing with time), which makes it impossible to list them in solrconfig.xml. Is there any way to iterate over all categories and warm up the query for each? On Mon, Oct 7, 2013 at 12:10 PM, Furkan KAMACI furkankam...@gmail.com wrote: If you are asking about reading from a file for warm-up: if there is no such capability for what you want, I can open a Jira issue and send a patch.
Re: Soft commit and flush
I understand the bottom line: soft commits are about visibility, hard commits are about durability. I am just trying to gain a better understanding of what happens under the hood... Two more related questions you made me think of: 1. Is the NRTCachingDirectoryFactory relevant for both types of commit, or just for hard commits? 2. If a soft commit does not flush, does all data exist in RAM until we call a hard commit? If so, could using soft commits without calling hard commit cause an OOE...?
Re: Shard split issue
I think what is happening here is that the sub-shard replicas are taking time to recover. We use a core admin command to wait for the replicas to become active before the shard states are switched. The timeout value for that command is just 120 seconds. We should wait for more than that. I'll open an issue.

On Mon, Oct 7, 2013 at 2:47 AM, Yago Riveiro yago.rive...@gmail.com wrote: Seems the issue occurs when the shard has more than one replica. I unloaded all replicas of the shard (all but 1, to do the split) and the SPLITSHARD finished as expected: the parent went to inactive and the children to active. If the parent has more than 1 replica, the process apparently finishes and the total number of documents in the children is the same as in the parent, but the parent never goes to the inactive state and the children are stuck in the construction state. -- Yago Riveiro

On Sunday, October 6, 2013 at 12:23 AM, Yago Riveiro wrote: I can attach the full log of the process if you want.

On Sunday, October 6, 2013 at 12:12 AM, Yago Riveiro wrote: The errors in the log are:

ERROR - 2013-10-05 21:06:22.997; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: splitshard the collection time out:300s
ERROR - 2013-10-05 21:06:22.997; org.apache.solr.common.SolrException; null:org.apache.solr.common.SolrException: splitshard the collection time out:300s
INFO - 2013-10-05 22:48:54.083; org.apache.solr.cloud.OverseerCollectionProcessor; Overseer Collection Processor: Message id:/overseer/collection-queue-work/qn-000138 complete, response:{success={null={responseHeader={status=0,QTime=1901},core=statistics-13_shard17_0_replica1},null={responseHeader={status=0,QTime=1903},core=statistics-13_shard17_1_replica1},null={responseHeader={status=0,QTime=2000}},null={responseHeader={status=0,QTime=2000}},null={responseHeader={status=0,QTime=6324147}},null={responseHeader={status=0,QTime=0},core=statistics-13_shard17_1_replica1,status=EMPTY_BUFFER},null={responseHeader={status=0,QTime=0},core=statistics-13_shard17_0_replica1,status=EMPTY_BUFFER},null={responseHeader={status=0,QTime=1127},core=statistics-13_shard17_0_replica2},null={responseHeader={status=0,QTime=2109},core=statistics-13_shard17_1_replica2}},failure={null=org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:I was asked to wait on state active for 192.168.20.105:8983_solr but I still do not see the requested state. I see state: recovering live:true},Operation splitshard caused exception:=org.apache.solr.common.SolrException: SPLTSHARD failed to create subshard replicas or timed out waiting for them to come up,exception={msg=SPLTSHARD failed to create subshard replicas or timed out waiting for them to come up,rspCode=500}}

On Saturday, October 5, 2013 at 5:03 PM, Yago Riveiro wrote: I don't have the log; log rotation is configured to keep only 5 small files. I will reconfigure it to a higher value and retry the split.

On Saturday, October 5, 2013 at 4:54 PM, Shalin Shekhar Mangar wrote: On Sat, Oct 5, 2013 at 8:37 PM, Yago Riveiro yago.rive...@gmail.com wrote: How can I see the logs of the parent? Are they stored in solr.log? Yes. -- Regards, Shalin Shekhar Mangar.
Difference Between Query Time and Elapsed Time at Solrj Query Response
The QueryResponse object in SolrJ has two different methods for the time required by a given query: one for QTime (query time) and one for elapsedTime. What is the difference between them, and what exactly is elapsedTime for?
Re: Shard split issue
If the replica has 20GB, most probably the recovery will take more than 120 seconds. In my case I have SSDs, and 120 seconds is not enough. -- Yago Riveiro

On Monday, October 7, 2013 at 9:19 AM, Shalin Shekhar Mangar wrote: I think what is happening here is that the sub-shard replicas are taking time to recover. We use a core admin command to wait for the replicas to become active before the shard states are switched. The timeout value for that command is just 120 seconds. We should wait for more than that. I'll open an issue.
[SolrJ] HttpSolrServer - maxRetries
Hi folks, Long story short: I'm occasionally getting exceptions under heavy load (SocketException: Connection reset). I would expect HttpSolrServer to try again up to maxRetries times, but it doesn't. For reasons I don't entirely understand, the call to httpClient.execute(method) is not inside the retry block (and thus will never be retried). Is this a bug in HttpSolrServer? Or is this intended behaviour? I'd rather not wrap my code in a retry mechanism if HttpSolrServer provides one. Thx, - Bram
Re: [SolrJ] HttpSolrServer - maxRetries
Hi Bram, could you send your error logs? 2013/10/7 Bram Van Dam bram.van...@intix.eu: Long story short: I'm occasionally getting exceptions under heavy load (SocketException: Connection reset). I would expect HttpSolrServer to try again up to maxRetries times, but it doesn't.
Re: [SolrJ] HttpSolrServer - maxRetries
On 10/07/2013 11:51 AM, Furkan KAMACI wrote: Could you send your error logs? Whoops, forgot to paste:

Caused by: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://localhost:8080/solr/fooIndex
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:416) ~[solr-solrj-4.2.1.jar:4.2.1 1461071 - mark - 2013-03-26 08:26:57]
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181) ~[solr-solrj-4.2.1.jar:4.2.1 1461071 - mark - 2013-03-26 08:26:57]
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117) ~[solr-solrj-4.2.1.jar:4.2.1 1461071 - mark - 2013-03-26 08:26:57]
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116) ~[solr-solrj-4.2.1.jar:4.2.1 1461071 - mark - 2013-03-26 08:26:57]
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:102) ~[solr-solrj-4.2.1.jar:4.2.1 1461071 - mark - 2013-03-26 08:26:57]
at org.violet.search.service.IndexingService.addDocument(IndexingService.java:79) ~[Violet-Search-1.06.003.jar:na]
... 8 common frames omitted
Caused by: java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:185) ~[na:1.6.0_24]
at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:166) ~[httpcore-4.2.2.jar:4.2.2]
at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:90) ~[httpcore-4.2.2.jar:4.2.2]
at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:281) ~[httpcore-4.2.2.jar:4.2.2]
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:92) ~[httpclient-4.2.3.jar:4.2.3]
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:62) ~[httpclient-4.2.3.jar:4.2.3]
at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:254) ~[httpcore-4.2.2.jar:4.2.2]
at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:289) ~[httpcore-4.2.2.jar:4.2.2]
at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:252) ~[httpclient-4.2.3.jar:4.2.3]
at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:191) ~[httpclient-4.2.3.jar:4.2.3]
at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:300) ~[httpcore-4.2.2.jar:4.2.2]
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:127) ~[httpcore-4.2.2.jar:4.2.2]
at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:717) ~[httpclient-4.2.3.jar:4.2.3]
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:522) ~[httpclient-4.2.3.jar:4.2.3]
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906) ~[httpclient-4.2.3.jar:4.2.3]
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805) ~[httpclient-4.2.3.jar:4.2.3]
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784) ~[httpclient-4.2.3.jar:4.2.3]
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:353) ~[solr-solrj-4.2.1.jar:4.2.1 1461071 - mark - 2013-03-26 08:26:57]
... 13 common frames omitted
Re: [SolrJ] HttpSolrServer - maxRetries
One more thing: could you say which version of Solr you are using? 2013/10/7 Bram Van Dam bram.van...@intix.eu: Whoops, forgot to paste: Caused by: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://localhost:8080/solr/fooIndex [...] Caused by: java.net.SocketException: Connection reset [full stack trace quoted above]
Re: [SolrJ] HttpSolrServer - maxRetries
On 10/07/2013 12:55 PM, Furkan KAMACI wrote: One more thing: could you say which version of Solr you are using? The stack trace comes from 4.2.1, but I suspect this could occur on 4.4 as well. I've not been able to reproduce it consistently: it has happened twice (!) after indexing around 100 million documents.
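In the meantime, a client-side workaround is straightforward. Below is a minimal sketch (assuming SolrJ 4.x; the retry count, backoff, and helper name are illustrative, not part of SolrJ) that retries an add() when the root cause is an I/O error such as the connection reset above:

import java.io.IOException;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.common.SolrInputDocument;

/** Hypothetical helper: retries an add() on transient I/O failures. */
public class RetryingAdd {
    public static void addWithRetries(SolrServer server, SolrInputDocument doc,
                                      int maxAttempts) throws SolrServerException, IOException {
        for (int attempt = 1; ; attempt++) {
            try {
                server.add(doc);
                return; // success
            } catch (SolrServerException e) {
                // Only retry when the root cause is I/O-level, e.g. "Connection reset"
                if (attempt >= maxAttempts || !(e.getRootCause() instanceof IOException)) {
                    throw e;
                }
                try {
                    Thread.sleep(1000L * attempt); // simple linear backoff
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw e;
                }
            }
        }
    }
}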
feedback on Solr 4.x LotsOfCores feature
Hello, In my company, we use Solr in production to offer full-text search on mailboxes. We host tens of millions of mailboxes, but only webmail users have this feature (a few million). We have the following use case:
- non-static indexes, with more updates (indexing and deleting) than select requests (ratio 7:1)
- homogeneous configuration for all indexes
- not many users at the same time

We started to index mailboxes with Solr 1.4 in 2010, on a subset of 400,000 users.
- we had a cluster of 50 servers, 4 Solr instances per server, 2000 users per Solr instance
- we grew to 6000 users per Solr instance, 8 Solr instances per server, 60GB per index (~2 million users)
- we upgraded to Solr 3.5 in 2012

As indexes grew, IOPS and response times increased more and more. The index size was mainly due to stored fields (large .fdt files). Retrieving these fields from the index was costly, because of many seeks in large files, and no limit usage was possible. There is also an overhead on queries: too many results are filtered to find only the results concerning one user. For these reasons and others (not pooled users, hardware savings, better scoring, some requests that do not support filtering), we decided to use the LotsOfCores feature. Our goal was to change the current I/O usage: from lots of random I/O access on huge segments to mostly sequential I/O access on small segments. For our use case, it's not a big deal that the first query to a not-yet-loaded core will be slow. And we don't need to fit all the cores into memory at once.

We started from the SOLR-1293 issue and the LotsOfCores wiki page to finally use a patched Solr 4.2.1 LotsOfCores in production (1 user = 1 core). We no longer need to run so many Solr instances per node. We are now able to have around 50,000 cores per Solr instance and we plan to grow to 100,000 cores per instance. At first, we used the solr.xml persistence. All cores have the loadOnStartup=false and transient=true attributes, so a cold start is very quick. The response times were better than ever, in comparison with the poor response times we had before using LotsOfCores.

We added 2 core options:
- numBuckets: creates a subdirectory based on a hash of the core name % numBuckets in the core dataDir, because all cores cannot live in the same directory
- Auto, with 3 different values: 1) false: default behaviour; 2) createLoad: create, if it does not exist, and load the core on the fly on the first incoming request (update, select); 3) onlyLoad: load the core on the fly on the first incoming request (update, select), if it exists on disk

Then, to improve performance and avoid synchronization in the solr.xml persistence, we disabled it. The drawback is that we can no longer see the full list of available cores with the admin core status command, only those warmed up.

Finally, we can achieve very good performance with Solr LotsOfCores:
- Index 5 emails (avg) + commit + search: x4.9 faster response time (mean), x5.4 faster (95th pct)
- Delete 5 documents (avg): x8.4 faster response time (mean), x7.4 faster (95th pct)
- Search: x3.7 faster response time (mean), x4 faster (95th pct)

In fact, the better performance is mainly due to the small size of each index, but also thanks to the isolation between cores (updates and queries on many mailboxes don't have side effects on each other).
One important thing with the LotsOfCores feature is to take care of:
- the number of file descriptors: it uses a lot (need to increase the global max and the per-process fd limit)
- the value of transientCacheSize, depending on the RAM size and the allocated PermGen size
- the ClassLoader leak that increases minor GC times when the CMS GC is enabled (use -XX:+CMSClassUnloadingEnabled)
- the overhead of parsing solrconfig.xml and loading dependencies to open each core
- LotsOfCores doesn't work with SolrCloud, so we store index locations outside of Solr; we have Solr proxies to route requests to the right instance

Outside of production, we tried the core discovery feature in Solr 4.4 with lots of cores. When you start, it spends a lot of time discovering cores due to the big number of cores; meanwhile, all requests fail (SolrDispatchFilter.init() not done yet). It would be great to have, for example, an option to run core discovery in the background, or just to be able to disable it, like we do in our use case. If someone is interested in these new options for the LotsOfCores feature, just tell me.
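For illustration, the numBuckets idea described above amounts to something like the following sketch (purely illustrative; the actual patch is not shown here, and the hashing scheme is an assumption):

import java.io.File;

/** Illustrative sketch of the numBuckets option described above:
 *  spread core data directories across N subdirectories so that
 *  tens of thousands of cores do not share a single directory. */
public class CoreBuckets {
    public static File dataDir(File baseDir, String coreName, int numBuckets) {
        // Mask the sign bit rather than Math.abs(), which overflows for Integer.MIN_VALUE.
        int bucket = (coreName.hashCode() & 0x7fffffff) % numBuckets;
        return new File(baseDir, bucket + File.separator + coreName);
    }
}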
Re: Does the queryResultCache contain all the results returned by main query or after filtering out
No, the queryResultCache contains the top N for the query, _including_ the filters. The idea is that you should be able to get the next page of results without going to any searching code. You couldn't do that in the scenario you describe.

If your filters are truly unique, you'll gain a little bit of performance by specifying the local param {!cache=false} for your fq clauses; that will just bypass adding them to the filterCache.

Try thinking about it backwards, especially if you don't care about scoring, say you are sorting by distance. Don't put the terms in the main query; put everything in fq clauses, so your query becomes something like:

q=*:*&fq=field:term AND field:term&fq={!cache=false}unique clause1&fq={!cache=false}unique clause2

The beauty of this is that you can assign a cost to the cache=false clauses so they will only be calculated for docs that make it through your lower-cost fq clauses and the main query. See: http://solr.pl/en/2012/03/05/use-of-cachefalse-and-cost-parameters/ Best, Erick

On Sun, Oct 6, 2013 at 9:56 AM, Ertio Lew ertio...@gmail.com wrote: Background: I need to find items matching keywords provided by the user, filtered by availability within a certain radius from his location and by other user-specific params. So I think this may be very relevant for me, because my filter queries may be unique all the time (since I am filtering by geospatial search, people find items nearest to them), plus some additional user-specific filters. So filter queries will always be unique, but people may use common keywords to look up, so the main query (q param) may be common most of the time. So only if the queryResultCache contains all the results returned by the main query (q param), before filtering, do I think this queryResultCache may be helpful for me. Isn't it?

On Sun, Oct 6, 2013 at 7:13 PM, Erick Erickson erickerick...@gmail.com wrote: First, why is it important to you? General background or a specific problem you're trying to address? But to answer, no. The queryResultCache contains the top N ids for the query. You control N by setting queryResultWindowSize in solrconfig.xml. It's often set to 2x the usual rows parameter, on the theory that people rarely page past the second page. Best, Erick

On Sun, Oct 6, 2013 at 5:22 AM, Ertio Lew ertio...@gmail.com wrote: Does the queryResultCache contain all the results returned by the main query (q param), or does it contain results prepared after all filter queries?
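For SolrJ users, the pattern Erick describes might look like the following sketch (field names and values are made up; cache and cost are standard filter-query local params):

import org.apache.solr.client.solrj.SolrQuery;

public class FilterCostExample {
    public static SolrQuery build() {
        SolrQuery q = new SolrQuery("*:*");
        // Common, reusable filter: let it be cached in the filterCache.
        q.addFilterQuery("category:electronics");
        // Per-user geospatial filter: unique per request, so skip the cache
        // and give it a high cost so it runs after the cheaper filters.
        q.addFilterQuery("{!geofilt cache=false cost=100 sfield=location pt=45.15,-93.85 d=5}");
        return q;
    }
}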
How to share Schema between multicore on Solr 4.4
I am using Solr 4.4 with SolrCloud on a Windows machine. Somehow I am not able to share the schema between multiple cores. My solr.xml file looks like:

<solr>
  <str name="shareSchema">${shareSchema:true}</str>
  <solrcloud>
    <str name="hostContext">${hostContext:SolrEngine}</str>
    <int name="hostPort">${tomcat.port:8080}</int>
    <int name="zkClientTimeout">${zkClientTimeout:15000}</int>
  </solrcloud>
</solr>

I have used a core.properties file for each core. One of the cores (say collection1) contains the schema.xml file, and the rest have all the config files excluding schema.xml. Each core.properties file contains name=corename. After deployment I am getting the following error:

collection2: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Error loading schema resource schema.xml

Please note that I have provided shareSchema=true in the solr.xml file. Please let me know if anything is missing. Any pointer will be helpful. Thanks, Dharmendra Jaiswal
Re: Improving indexing performance
Just skimmed, but the usual reason you can't max out the server is that the client can't go fast enough. Very quick experiment: comment out the server.add line in your client and run it again; does the client speed up substantially? If not, then the time is being spent on the client. Or split your CSV file into, say, 5 parts and run it from 5 different PCs in parallel.

bq: I can't rely on auto commit, otherwise I get an OutOfMemory error
This shouldn't be happening; I'd get to the bottom of this. Perhaps simply allocate more memory to the JVM running Solr.

bq: committing every 100k docs gives worse performance
It'll be best to specify openSearcher=false for max indexing throughput, BTW. You should be able to do this quite frequently; 15 seconds seems quite reasonable. Best, Erick

On Sun, Oct 6, 2013 at 12:19 PM, Matteo Grolla matteo.gro...@gmail.com wrote: I'd like some suggestions on how to improve the indexing performance in the following scenario. I'm uploading 1M docs to Solr; every doc has id: sequential number, title: small string, date: date, body: 1kb of text. Here are my benchmarks (they are all single executions, not averages of multiple executions):

1) using the update request handler and streaming docs from a CSV file on the same disk as Solr; auto commit every 15s with openSearcher=false and commit after the last document. Total time: 143035ms

1.1) same as 1), plus <ramBufferSizeMB>500</ramBufferSizeMB> and <maxBufferedDocs>10</maxBufferedDocs>. Total time: 134493ms

1.2) same as 1), plus <mergeFactor>30</mergeFactor>. Total time: 143134ms

2) using a SolrJ client from another PC in the LAN (100Mbps) with HttpSolrServer and the javabin format; adding documents to the server in batches of 1k docs (server.add(collection)); auto commit every 15s with openSearcher=false and commit after the last document. Total time: 139022ms

3) using a SolrJ client from another PC in the LAN (100Mbps) with ConcurrentUpdateSolrServer and the javabin format; adding documents to the server in batches of 1k docs (server.add(collection)); server queue size=20k, server threads=4; no auto-commit and commit every 100k docs. Total time: 167301ms

-- On the Solr server --
CPU averages 25%, at best 100% for 1 core. IO is still far from being saturated; iostat gives a pattern like this (every 5 s):

time(s)  %util
100      45.20
105      1.68
110      17.44
115      76.32
120      2.64
125      68
130      1.28

I thought that using ConcurrentUpdateSolrServer I would be able to max out CPU or IO, but I wasn't. With ConcurrentUpdateSolrServer I can't rely on auto commit, otherwise I get an OutOfMemory error, and I found that committing every 100k docs gives worse performance than auto commit every 15s (benchmark 3 with HttpSolrServer took 193515ms). I'd really like to understand why I can't max out the resources on the server hosting Solr (disk above all), and I'd really like to understand what I'm doing wrong with ConcurrentUpdateSolrServer. Thanks
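For reference, a minimal SolrJ sketch of the benchmark-3-style client described above (queue size, thread count and batch size mirror the numbers in the thread; the URL and field names are assumptions):

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BulkIndexer {
    public static void main(String[] args) throws Exception {
        // 20k queue, 4 background threads, as in benchmark 3
        ConcurrentUpdateSolrServer server = new ConcurrentUpdateSolrServer(
                "http://localhost:8983/solr/collection1", 20000, 4);
        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < 1000000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", i);
            doc.addField("title", "doc " + i);
            batch.add(doc);
            if (batch.size() == 1000) { // add in batches of 1k docs
                server.add(batch);
                batch.clear();
            }
        }
        if (!batch.isEmpty()) server.add(batch);
        server.blockUntilFinished(); // drain the internal queue
        server.commit();             // prefer autoCommit with openSearcher=false in production
        server.shutdown();
    }
}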
Re: Doing time sensitive search in solr
Wait, are you saying you have fields like 2013-12-01T00:00:00Z_entryDate? So you have some wildcard definition in your schema like <dynamicField name="*_entryDate" type="tdate" .../>? If so, I think your model is just wrong and you should have some field(s) that you store dates in. That aside, and assuming you have wildcards like I'm guessing, you could have a copyField like <copyField source="*_entryDate" dest="bag_of_dates"/> and do your ranges on bag_of_dates. Which would be the same as putting your dates in a single field with a fixed name in the first place. Best, Erick

On Sun, Oct 6, 2013 at 4:34 PM, Darniz rnizamud...@edmunds.com wrote: Thanks Erick. I hope I understood correctly, but my main concern is that I have to tie specific indexed content to a specific time range, and make that document come up in search results only for that time. As I mentioned in my previous example, we have multiple date-string structures, which makes it a bit more complicated; on top of that, I don't know what the exact date will be. Hence if someone searches for toyota and today is 6-OCT-2013, this doc should not come up in search results, since the keyword toyota should be searched only after 1-DEC-2013.

<date name="2013-09-01T00:00:00Z_entryDate">2013-09-01T00:00:00Z</date>
<str name="2013-09-01T00:00:00Z_entryText">Sept content: Honda is releasing the car this month</str>
<date name="2013-12-01T00:00:00Z_entryDate">2013-12-01T00:00:00Z</date>
<str name="2013-12-01T00:00:00Z_entryText">Dec content: Toyota is releasing the car this month</str>

I don't know whether using a copy field might solve this; correct me if I am wrong. Maybe we are pursuing something which is not meant for Solr. Thanks, Rashid
Re: How to warm up filter queries for a category field with 1000 possible values?
That's what the autowarm number for the filterCache is about. It re-executes the last N fq clauses and caches them. Similarly for some of the other autowarm settings. But don't go wild here. Measure, _then_ fix. Usually autowarming just a few (< 32) is sufficient. And remember that autowarming is done whenever you open a new searcher, so if you have your soft commits configured to be, say, 5 seconds, you'll get minimal benefit here.

Are you saying you have almost 1,000 differently-named fields in your documents? Or 1,000 _values_ in your category field? In either case, please measure your query performance before assuming you need to autowarm excessively. If you don't autowarm at all and your first few queries are acceptable after you open a new searcher, then don't worry about it. Often the biggest win is just filling the lower-level Lucene caches, which you can do with a very few queries. Best, Erick

On Mon, Oct 7, 2013 at 2:49 AM, user 01 user...@gmail.com wrote: Sorry, didn't get you exactly. I need to warm up my queries after the newSearcher/firstSearcher are initialized. I am trying to warm up the filter caches for a category field, but I have almost 1000 categories (changing with time), which makes it impossible to list them in solrConfig.xml. Is there any way to iterate over all categories and warm up the query for each?
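If external warming is acceptable, one way to avoid listing 1000 entries in solrconfig.xml is to discover the category values by faceting and fire one fq per value from a client. A sketch follows (the field name, core URL, and the approach itself are assumptions, not a built-in Solr feature):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CategoryWarmer {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
        // Discover all current category values via faceting.
        SolrQuery facetQuery = new SolrQuery("*:*");
        facetQuery.setRows(0);
        facetQuery.setFacet(true);
        facetQuery.addFacetField("category");
        facetQuery.setFacetLimit(-1); // all values, however many exist today
        QueryResponse rsp = solr.query(facetQuery);
        FacetField categories = rsp.getFacetField("category");
        // Fire one fq per category to populate the filterCache.
        for (FacetField.Count c : categories.getValues()) {
            SolrQuery warm = new SolrQuery("*:*");
            warm.setRows(0);
            warm.addFilterQuery("category:\"" + c.getName() + "\"");
            solr.query(warm);
        }
    }
}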
Re: Soft commit and flush
bq: Is the NRTCachingDirectoryFactory relevant for both types of commit, or just for hard commits
Don't know the code deeply, but NRT == Near Real Time == soft commit, I'd guess.

bq: If soft commit does not flush...
Soft commit flushes the transaction log. On restart, if the content of the tlog isn't in the index, it's replayed to catch the index up. OOE? Out Of Energy? You can optionally set up soft commits to fsync the tlog if you want to eliminate the remote possibility of an op-system (not JVM) crash between the time the JVM passes the write off to the op system and the op system writes the bits to disk. Best, Erick

On Mon, Oct 7, 2013 at 2:57 AM, adfel70 adfe...@gmail.com wrote: I understand the bottom line: soft commits are about visibility, hard commits are about durability. I am just trying to gain a better understanding of what happens under the hood... Two more related questions you made me think of: 1. Is the NRTCachingDirectoryFactory relevant for both types of commit, or just for hard commits? 2. If a soft commit does not flush, does all data exist in RAM until we call a hard commit? If so, could using soft commits without calling hard commit cause an OOE...?
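For completeness, SolrJ exposes the two commit types directly; a minimal sketch (SolrJ 4.x signatures; the core URL is an assumption):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class CommitTypes {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
        // soft commit: make recent updates visible without fsyncing segment files
        solr.commit(true /* waitFlush */, true /* waitSearcher */, true /* softCommit */);
        // hard commit: flush and fsync index files for durability
        solr.commit(true, true, false);
    }
}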
Re: Difference Between Query Time and Elapsed Time at Solrj Query Response
Query time (QTime) is the time spent in Solr getting the search results. It does NOT include reading the bits off disk to assemble the response, etc. Elapsed time is the time from when the query was sent to the time it gets back. It includes QTime, reading the bits off disk to assemble the response, transmission time, etc. Best, Erick

On Mon, Oct 7, 2013 at 4:49 AM, Furkan KAMACI furkankam...@gmail.com wrote: The QueryResponse object in SolrJ has two different methods for the time required by a given query: one for QTime (query time) and one for elapsedTime. What is the difference between them, and what exactly is elapsedTime for?
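A small sketch showing both numbers side by side (SolrJ 4.x; the core URL is an assumption):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class Timings {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
        QueryResponse rsp = solr.query(new SolrQuery("*:*"));
        System.out.println("QTime (server-side search): " + rsp.getQTime() + " ms");
        System.out.println("Elapsed (client round trip): " + rsp.getElapsedTime() + " ms");
    }
}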
Regarding edismax parsing
Hi, I have a question regarding the parsing of tokens in the edismax parser, and subsequently a follow-up question related to it.
- Each field has a list of analyzers and tokenizers as configured in schema.xml (index and query time). Now, say I search for the query red shoes. Is it the case that, to form the disjunction query on each field, edismax first applies the analyzers configured for that field and then forms the query? E.g. if field1 changes red to rd and field2 changes red to re, will the query be like (field1:rd) | (field2:re)?
- If the above holds true, then when I changed the ordering of analyzers and put SynonymFilterFactory at the top of all analyzers (in schema.xml), edismax still tokenized the query first with respect to whitespace and only then applied the synonym filter factory, which leads me to think this is not happening.
My use case is: before applying any tokenizer, I want to support phrase-level synonym replacement, and then do the rest of the analysis. Thanks, Prashant Golash
Re: feedback on Solr 4.x LotsOfCores feature
Thanks for the great writeup! It's always interesting to see how a feature plays out in the real world. A couple of questions though:

bq: We added 2 core options
Do you mean you patched Solr? If so, are you willing to share the code back? If both are yes, please open a JIRA, attach the patch and assign it to me.

bq: the number of file descriptors: it uses a lot (need to increase the global max and the per-process fd limit)
Right, this makes sense since you have a bunch of cores all with their own descriptors open. I'm assuming that you hit a rather high max number and it stays pretty steady.

bq: the overhead of parsing solrconfig.xml and loading dependencies to open each core
Right, I tried to look at sharing the underlying solrconfig object but it seemed pretty hairy. There are some extensive comments in the JIRA about the problems I foresaw. There may be some action on this in the future.

bq: LotsOfCores doesn't work with SolrCloud
Right, we haven't concentrated on that; it's an interesting problem. In particular it's not clear what happens when nodes go up/down, replicate, resync, all that.

bq: When you start, it spends a lot of time discovering cores due to the big number of cores
How long? I tried 15K cores on my laptop and I think I was getting 15-second delays, or roughly 1K cores discovered/second. Is your delay on the order of 50 seconds with 50K cores? I'm not sure how you could do that in the background, but I haven't thought about it much. I tried multi-threading core discovery and that didn't help (SSD disk); I assumed that the problem was mostly I/O contention (but didn't prove it). What if a request came in for a core before you'd found it? I'm not sure what the right behavior would be, except perhaps to block on that request until core discovery was complete. Hm. How would that work for your case? That seems do-able.

BTW, so far you get the prize for the most cores on a node, I think. Thanks again for the great feedback! Erick

On Mon, Oct 7, 2013 at 3:53 AM, Soyez Olivier olivier.so...@worldline.com wrote: Hello, In my company, we use Solr in production to offer full text search on mailboxes. [...]
no such field error:smaller big block size details while indexing doc files
I'm trying to index .doc, .docx and .pdf files, using this URL:

curl "http://localhost:8080/solr/document/update/extract?literal.id=12&commit=true" -F "myfile=@complex.doc"

This is the error I get:

Oct 07, 2013 5:02:18 PM org.apache.solr.common.SolrException log
SEVERE: null:java.lang.RuntimeException: java.lang.NoSuchFieldError: SMALLER_BIG_BLOCK_SIZE_DETAILS
at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:651)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:364)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:928)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:539)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:298)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.NoSuchFieldError: SMALLER_BIG_BLOCK_SIZE_DETAILS
at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:93)
at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:190)
at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:184)
at org.apache.tika.parser.microsoft.POIFSContainerDetector.getTopLevelNames(POIFSContainerDetector.java:376)
at org.apache.tika.parser.microsoft.POIFSContainerDetector.detect(POIFSContainerDetector.java:165)
at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:61)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:113)
at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
... 16 more

Also, using the same type of URL, txt, mp3 and pdf files are indexed successfully
(curl "http://localhost:8080/solr/document/update/extract?literal.id=12&commit=true" -F "myfile=@abc.txt").

My schema.xml is:

<schema name="documents">
  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
    <field name="author" type="string" indexed="true" stored="true" multiValued="true"/>
    <field name="comments" type="text" indexed="true" stored="true" multiValued="false"/>
    <field name="keywords" type="text" indexed="true" stored="true" multiValued="false"/>
    <field name="contents" type="text" indexed="true" stored="true" multiValued="false"/>
    <field name="title" type="text" indexed="true" stored="true" multiValued="false"/>
    <field name="revision_number" type="string" indexed="true" stored="true" multiValued="false"/>
    <field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>
    <dynamicField name="ignored_*" type="string" indexed="false" stored="true" multiValued="true"/>
    <copyfield source="id" dest="text"/>
    <copyfield source="author" dest="text"/>
  </fields>
  <types>
    <fieldType name="integer" class="solr.IntField"/>
    <fieldType name="long" class="solr.LongField"/>
    <fieldType name="string" class="solr.StrField"/>
    <fieldType name="text" class="solr.TextField"/>
    <fieldtype name="ignored" stored="false" indexed="false" multiValued="true" class="solr.StrField"/>
  </types>
  <uniqueKey>id</uniqueKey>
</schema>

I'm not able to understand what kind of error this is; please help me.
Re: Soft commit and flush
Sorry, by OOE I meant Out of memory exception...
Re: Search for non empty fields in a index with denormalized tables
Okay, let me specify my question a little more. I have a denormalized index of two SQL tables, patient and image. If I add a patient with two images to the Solr index, my index contains 3 documents:

Pat_ID | Patient_Lastname | Image_ID | Image_Name
1      | Miller           | EMPTY    | EMPTY
1      | Miller           | 1        | dog.jpg
1      | Miller           | 2        | cat.jpg

When I now add another patient without any images, the Solr index contains 4 documents:

Pat_ID | Patient_Lastname | Image_ID | Image_Name
1      | Miller           | EMPTY    | EMPTY
1      | Miller           | 1        | dog.jpg
1      | Miller           | 2        | cat.jpg
2      | Smith            | EMPTY    | EMPTY

Now I want to select all patients that have no image (Image_ID is empty). If I query this with the following Solr query, the result would be Miller and Smith, but I need a query that will return Smith only:

select?q=-Image_ID:[0 TO *] and then group by pat_id

What I would need is something like the HAVING clause in SQL. Then I could group by Pat_ID and filter for the ones where the count is less than 2. Bests, Sandro
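For what it's worth, one possible workaround is sketched below. It assumes Solr 4.x's join query parser is available and that EMPTY means the Image_ID field is simply absent from the document: select the placeholder rows, then exclude, via a self-join on Pat_ID, every patient that has at least one image row.

import org.apache.solr.client.solrj.SolrQuery;

public class PatientsWithoutImages {
    public static SolrQuery build() {
        // Select the placeholder rows (no Image_ID field at all)...
        SolrQuery q = new SolrQuery("-Image_ID:[* TO *]");
        // ...and exclude patients that have at least one image row,
        // via a self-join on Pat_ID (Solr 4.x {!join} query parser).
        q.addFilterQuery("-_query_:\"{!join from=Pat_ID to=Pat_ID}Image_ID:[* TO *]\"");
        return q; // should match only Smith's row in the example above
    }
}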
Re: Soft commit and flush
Out of Memory Exception is well known as OOM. Guido. On 07/10/13 14:11, adfel70 wrote: Sorry, by OOE I meant Out of memory exception...
Re: feedback on Solr 4.x LotsOfCores feature
I assume that the LotsOfCores feature doesn't use ZooKeeper. I tried to simulate the cores as collections, but the size of clusterstate.json grew bigger than 1MB, and -Djute.maxbuffer is needed to increase the 1MB limitation. A naive question: why isn't clusterstate.json kept per collection? -- Yago Riveiro

On Monday, October 7, 2013 at 1:33 PM, Erick Erickson wrote: Thanks for the great writeup! It's always interesting to see how a feature plays out in the real world. [...]
Re: feedback on Solr 4.x LotsOfCores feature
I think we'd all love to see those improvements land in Solr. I was involved in the work at AOL WebMail where the LotsOfCores idea originated. We had many of the problems that you've had to solve yourself. I remember that we switched to compound file format to reduce file descriptors. Also we had to switch back to the Log Merge Policy from TieredMergePolicy because TieredMergePolicy increased the overall random disk i/o and we had latency issues because of it. On Mon, Oct 7, 2013 at 1:23 PM, Soyez Olivier olivier.so...@worldline.comwrote: Hello, In my company, we use Solr in production to offer full text search on mailboxes. We host dozens million of mailboxes, but only webmail users have such feature (few millions). We have the following use case : - non static indexes with more update (indexing and deleting), than select requests (ratio 7:1) - homogeneous configuration for all indexes - not so much user at the same time We started to index mailboxes with Solr 1.4 in 2010, on a subset of 400,000 users. - we had a cluster of 50 servers, 4 Solr per server, 2000 users per Solr instance - we grow to 6000 users per Solr instance, 8 Solr per server, 60Go per index (~2 million users) - we upgraded to Solr 3.5 in 2012 As indexes grew, IOPS and the response times have increased more and more. The index size was mainly due to stored fields (large .fdt files) Retrieving these fields from the index was costly, because of many seek in large files, and no limit usage possible. There is also an overhead on queries : too many results are filtered to find only results concerning user. For these reason and others, like not pooled users, hardware savings, better scoring, some requests that do not support filtering, we have decided to use the LotsOfCores feature. Our goal was to change the current I/O usage : from lots of random I/O access on huge segments to mostly sequential I/O access on small segments. For our use case, it's not a big deal, that the first query to one not yet loaded core will be slow. And, we don’t need to fit all the cores into memory at once. We started from the SOLR-1293 issue and the LotsOfCores wiki page to finally use a patched Solr 4.2.1 LotsOfCores in production (1 user = 1 core). We don't need anymore to run so many Solr per node. We are now able to have around 5 cores per Solr and we plan to grow to 100,000 cores per instance. In a first time, we used the solr.xml persistence. All cores have loadOnStartup=false and transient=true attributes, so a cold start is very quick. The response times were better than ever, in comparaison with poor response times, we had before using LotsOfCores. We added 2 Cores options : - numBuckets to create a subdirectory based on a hash on the corename % numBuckets in the core Datadir, because all cores cannot live in the same directory - Auto with 3 differents values : 1) false : default behaviour 2) createLoad : create, if not exist, and load the core on the fly on the first incoming request (update, select). 3) onlyLoad : load the core on the fly on the first incoming request (update, select), if exist on disk Then, to improve performance and avoid synchronization in the solr.xml persistence : we disabled it. The drawback is we cannot see anymore all the availables cores list with the admin core status command, only those warmed up. 
Finally, we achieve very good performance with Solr LotsOfCores:
- Index 5 emails (avg) + commit + search: x4.9 faster response time (mean), x5.4 faster (95th pct)
- Delete 5 documents (avg): x8.4 faster response time (mean), x7.4 faster (95th pct)
- Search: x3.7 faster response time (mean), x4 faster (95th pct)
In fact, the better performance is mainly due to the small size of each index, but also thanks to the isolation between cores (updates and queries on many mailboxes don't have side effects on each other). Important things to take care of with the LotsOfCores feature:
- the number of file descriptors: it uses a lot (you need to increase the global max and per-process fd limits)
- the value of transientCacheSize, depending on the RAM size and the allocated PermGen size
- ClassLoader leaks that increase minor GC times when the CMS GC is enabled (use -XX:+CMSClassUnloadingEnabled)
- the overhead of parsing solrconfig.xml and loading dependencies to open each core
- LotsOfCores doesn't work with SolrCloud, so we store index locations outside of Solr; we have Solr proxies to route requests to the right instance.
Outside of production, we tried the core discovery feature in Solr 4.4 with lots of cores. At startup, it spends a lot of time discovering cores because there are so many of them, and meanwhile all requests fail (SolrDispatchFilter.init() not done yet). It would be great to have, for example, an option to run core discovery in the background, or just to be able to disable it, as we do in our use case. If someone is interested in these
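(Purely to illustrate the numBuckets layout described above -- a hypothetical sketch, not the actual patch code: the core name is hashed modulo numBuckets to pick a bucket subdirectory, so tens of thousands of core data dirs never share a single parent directory.)

  import java.io.File;

  // Hypothetical sketch: bucket a core's data dir by hash(coreName) % numBuckets.
  static File bucketedDataDir(File baseDataDir, String coreName, int numBuckets) {
      int bucket = (coreName.hashCode() & 0x7fffffff) % numBuckets;  // non-negative hash
      return new File(baseDataDir, bucket + File.separator + coreName);
  }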
Re: Among LatLonType & SpatialRecursivePrefixTreeFieldType, which one for filtering outside of bounding box?
Use the location_rpt field type in the example schema.xml -- it has good performance and uses less memory (what you asked for) compared to LatLonType. To learn how to tweak some of the settings to get better performance at the expense of some accuracy, see http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4 ~ David On 10/5/13 8:53 AM, user 01 user...@gmail.com wrote: For geospatial search, I need to filter out all points outside of a certain radius from a certain point. No need for precise results; approximation will work for me! No sorting is required either. I see there are two spatial implementations: LatLonType and SpatialRecursivePrefixTreeFieldType. But I am not sure which one I should choose for good performance and less memory? (As I said, approximations are OK for me. If it gives me approximate bounding-box results instead of a circle, I will be fine too.)
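(A hedged illustration of a radius filter on such a field -- the field name geoloc is hypothetical, and d is the radius in kilometers. The {!bbox} parser takes the same parameters but filters on the enclosing bounding box instead of the circle, which matches the "approximate results are fine" requirement:)

  fq={!geofilt sfield=geoloc pt=41.7882,-71.9498 d=10}
  fq={!bbox sfield=geoloc pt=41.7882,-71.9498 d=10}    (cheaper bounding-box approximation)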
Web App Engineer at Harvard-Smithsonian Astrophysical Observatory, full time, indefinite contract
Dear all, We are looking for a new member to join our team. This position requires solid knowledge of Python, plus experience with web development, HTML5, XSLT, JSON, CSS3, relational databases and NoSQL -- but search (and SOLR) is the central point of everything we do here. So, if you love SOLR/Lucene as we do, then I'm sure there will be plenty of opportunities for search-related development for you too. About the project: http://labs.adsabs.harvard.edu/adsabs/ The ADS is the central discovery engine for astronomical information, used nearly every day by nearly every astronomer. Conceived 20 years ago and moving into its third decade, the ADS continues to serve the research community worldwide. The ADS is currently developing the next-generation web-based platform supporting current and future services. The project is committed to developing and using open-source software. The main components of the system architecture are: Apache SOLR/Lucene (search), CERN Invenio and MongoDB (storage), Python+Flask+Bootstrap (frontend). We are looking for a highly motivated full-stack developer interested in joining a dynamic team of talented individuals architecting and implementing the new platform. Your primary responsibility will be the design, development, and support of the ADS front-end applications (including the new search interface), as well as the implementation of the user database, login system and personalization features. For more information, please see the full posting online at: http://www.cfa.harvard.edu/hr/postings/13-32.html Thank you, Roman -- Dr. Roman Chyla ADS, Harvard-Smithsonian Center for Astrophysics roman.ch...@gmail.com
Adding OR operator in querystring and grouping fields?
This query returns the correct results: http://localhost:8983/solr/tt/select/?indent=on&fq={!geofilt}&pt=41.7882,-71.9498&sfield=geolocation&d=2000&q=*:*&start=0&rows=12&fl=id,title&facet.mincount=1&fl=_dist_:geodist() However, I want to add an OR select on a field city as well: fq=city:(brooklyn) But when I add that to my querystring: http://localhost:8983/solr/tt/select/?indent=on&fq=city:(brooklyn)&fq={!geofilt}&pt=41.7882,-71.9498&sfield=geolocation&d=2000&q=*:*&start=0&rows=12&fl=id,title&facet.mincount=1&fl=_dist_:geodist() Then I get 0 results. I have this in my schema.xml: <solrQueryParser defaultOperator="OR"/> How can I add an OR operator in my querystring and group the fields city and my geodist parameters? -- View this message in context: http://lucene.472066.n3.nabble.com/Adding-OR-operator-in-querystring-and-grouping-fields-tp4093942.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Adding OR operator in querystring and grouping fields?
The default query operator applies only within a single query parameter. If you want to OR two filter queries, you must combine them into one filter query parameter. -- Jack Krupansky -Original Message- From: PeterKerk Sent: Monday, October 07, 2013 1:08 PM To: solr-user@lucene.apache.org Subject: Adding OR operator in querystring and grouping fields? This query returns the correct results: http://localhost:8983/solr/tt/select/?indent=on&fq={!geofilt}&pt=41.7882,-71.9498&sfield=geolocation&d=2000&q=*:*&start=0&rows=12&fl=id,title&facet.mincount=1&fl=_dist_:geodist() However, I want to add an OR select on a field city as well: fq=city:(brooklyn) But when I add that to my querystring: http://localhost:8983/solr/tt/select/?indent=on&fq=city:(brooklyn)&fq={!geofilt}&pt=41.7882,-71.9498&sfield=geolocation&d=2000&q=*:*&start=0&rows=12&fl=id,title&facet.mincount=1&fl=_dist_:geodist() Then I get 0 results. I have this in my schema.xml: <solrQueryParser defaultOperator="OR"/> How can I add an OR operator in my querystring and group the fields city and my geodist parameters? -- View this message in context: http://lucene.472066.n3.nabble.com/Adding-OR-operator-in-querystring-and-grouping-fields-tp4093942.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Adding OR operator in querystring and grouping fields?
Ok, thanks. Regarding "you must combine them into one filter query parameter": how would I do that? Can I simply change the URL structure, or must I change my schema.xml and/or data-config.xml? -- View this message in context: http://lucene.472066.n3.nabble.com/Adding-OR-operator-in-querystring-and-grouping-fields-tp4093942p4093947.html Sent from the Solr - User mailing list archive at Nabble.com.
Delete a field - Atomic updates (SOLR 4.1.0) without using null=true
I am using SOLR 4.1.0 and performing atomic updates on SOLR documents. Unfortunately there is a bug in 4.1.0 (https://issues.apache.org/jira/browse/SOLR-4297) that blocks me from using null=true to delete a field through the atomic update functionality. Is there any other way to delete a field besides this syntax? FYI, I won't be able to migrate to the latest version now due to a company code freeze, hence I'm trying to figure out a temporary workaround. -- View this message in context: http://lucene.472066.n3.nabble.com/Delete-a-field-Atomic-updates-SOLR-4-1-0-without-using-null-true-tp4093951.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: {soft}Commit and cache flushing
Is there a way to make autoCommit only commit if there are pending changes, ie: if there are 0 adds pending commit, don't autoCommit (open-a-searcher and wipe the caches)? Cheers, Tim On 2 October 2013 00:52, Dmitry Kan solrexp...@gmail.com wrote: right. We've got the autoHard commit configured only atm. The soft-commits are controlled on the client. It was just easier to implement the first version of our internal commit policy that will commit to all solr instances at once. This is where we have noticed the reported behavior. On Wed, Oct 2, 2013 at 9:32 AM, Bram Van Dam bram.van...@intix.eu wrote: if there are no modifications to an index and a softCommit or hardCommit issued, then solr flushes the cache. Indeed. The easiest way to work around this is by disabling auto commits and only commit when you have to.
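(For what it's worth: the hard-commit timer in solrconfig.xml is only scheduled when uncommitted updates arrive, so an idle index should not keep committing on its own; and with openSearcher=false a hard commit is durability-only -- it does not open a new searcher or wipe the caches. A typical configuration, values illustrative:)

  <autoCommit>
    <maxTime>60000</maxTime>             <!-- commit at most 60s after the first pending update -->
    <openSearcher>false</openSearcher>   <!-- durability only: no new searcher, caches untouched -->
  </autoCommit>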
Re: solr cpu usage
Fantastic article! Tim On 5 October 2013 18:14, Erick Erickson erickerick...@gmail.com wrote: From my perspective, your question is almost impossible to answer; there are too many variables. See: http://searchhub.org/dev/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/ Best, Erick On Thu, Oct 3, 2013 at 9:38 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, More CPU cores means more concurrency. This is good if you need to handle high query rates. Faster cores mean lower query latency, assuming you are not bottlenecked by memory, disk IO or network IO. So what is ideal for you depends on your concurrency and latency needs. Otis Solr & ElasticSearch Support http://sematext.com/ On Oct 1, 2013 9:33 AM, adfel70 adfe...@gmail.com wrote: hi We're building a spec for a machine to purchase. We're going to buy 10 machines; we aren't sure yet how many processes we will run per machine. The question is: should we buy a faster CPU with fewer cores, or a slower CPU with more cores? In any case we will have 2 CPUs in each machine. Should we buy a 2.6GHz CPU with 8 cores or a 3.5GHz CPU with 4 cores? What would we gain by having many cores? What kinds of usages would make the CPU the bottleneck? -- View this message in context: http://lucene.472066.n3.nabble.com/solr-cpu-usage-tp4092938.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Delete a field - Atomic updates (SOLR 4.1.0) without using null=true
I don't know if there's a way to accomplish your goal directly, but as a pure workaround, you can write a routine to fetch all the stored values and resubmit the document without the field in question. This is what atomic updates do, minus the overhead of the transmission. On Oct 7, 2013, at 11:15 AM, SolrLover bbar...@gmail.com wrote: I am using SOLR 4.1.0 and perform atomic updates on SOLR documents. Unfortunately there is a bug in 4.1.0 (https://issues.apache.org/jira/browse/SOLR-4297) that blocks me from using null=true for deleting a field through atomic update functionality. Is there any other way to delete a field other than using this syntax? FYI..I wont be able to migrate to latest version now due to company code freeze hence trying to figure out a temporary work around. -- View this message in context: http://lucene.472066.n3.nabble.com/Delete-a-field-Atomic-updates-SOLR-4-1-0-without-using-null-true-tp4093951.html Sent from the Solr - User mailing list archive at Nabble.com.
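(A minimal SolrJ sketch of that read-and-resubmit workaround, assuming Solr 4.x SolrJ, that every field is stored, and that no stored field is a copyField target (or it would be duplicated on re-add). The core URL and "fieldToDelete" are hypothetical; exception handling is omitted:)

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.common.SolrDocument;
  import org.apache.solr.common.SolrInputDocument;

  HttpSolrServer server = new HttpSolrServer("http://localhost:8080/solr/collection1");
  SolrDocument old = server.query(new SolrQuery("id:12")).getResults().get(0);
  SolrInputDocument doc = new SolrInputDocument();
  for (String name : old.getFieldNames()) {
      // Copy every stored field except the one being "deleted";
      // _version_ is skipped so the re-add is not treated as an optimistic-locking update.
      if (!name.equals("fieldToDelete") && !name.equals("_version_")) {
          doc.addField(name, old.getFieldValue(name));
      }
  }
  server.add(doc);    // same uniqueKey -> full overwrite of the old document
  server.commit();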
Re: Doing time sensitive search in solr
Thanks Erick. OK, if we go by that proposal of copying all date fields into one bag_of_dates field, we now have fields that will look something like this:

<arr name="bag_of_dates">
  <str>2013-09-01T00:00:00Z</str>
  <str>2013-12-01T00:00:00Z</str>
</arr>
<arr name="text">
  <str>Sept content : Honda is releasing the car this month</str>
  <str>Dec content : Toyota is releasing the car this month</str>
</arr>

And I also agree that we can now make a range query like bag_of_dates:[* TO NOW] AND text:Toyota. But how are we going to make sure the document does not get returned, given that Toyota should only be searchable from 1-DEC-2013? I hope I am explaining it properly. On our website, when we render the data we don't show the line "Dec content : Toyota is releasing the car this month" on the page, since today's date is not 1-DEC-2013 yet; hence we don't want this doc to be shown in the search results either when we query Solr. -- View this message in context: http://lucene.472066.n3.nabble.com/Doing-time-sensitive-search-in-solr-tp4092273p4093961.html Sent from the Solr - User mailing list archive at Nabble.com.
Gracefully stopping jetty server - LockObtainFailedException
Hi, I have a SolrCloud (4.1) setup with an embedded Jetty server. I use the commands below to start and stop the server. Start server: nohup java -DSTOP.PORT=8085 -DSTOP.KEY=key -DnumShards=2 -Dbootstrap_confdir=./solr/nlp/conf -Dcollection.configName=myconf -DzkHost=10.88.139.206:2181,10.88.139.206:2182,10.88.139.206:2183 -jar start.jar > output.log 2>&1 Stop server: java -DSTOP.PORT=8085 -DSTOP.KEY=key -jar start.jar --stop What I have observed is that once I stop the server and start it again, indexing gives me 'org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out' on 'NativeFSLock@solr/nlp/data/index.20130924205253479/write.lock'. After I delete the lock file manually and start the server, indexing works fine. Please let me know how we can resolve this. If this issue was answered earlier, I would appreciate a pointer to the URL; I tried finding it but could not. Thanks in advance, Ashwin
SolrJ best practices
Are there any links describing best practices for interacting with SolrJ? I've checked the wiki and it seems woefully incomplete: (http://wiki.apache.org/solr/Solrj) Some specific questions: - When working with HttpSolrServer, should we keep instances around forever, or should we create a singleton that can/should be used over and over? - Is there a way to change the collection after creating the server, or do we need to create a new server for each collection? -..
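(Not an authoritative answer, but a common pattern with 4.x SolrJ -- a minimal sketch; the core names and URLs are hypothetical. HttpSolrServer is thread-safe, so the usual practice is to create one instance per core/collection base URL and reuse it for the life of the application. Since the collection is part of the base URL, you keep one instance per collection rather than switching a single instance between them:)

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServerException;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;

  public class SolrClients {
      // One long-lived, thread-safe instance per collection, reused everywhere.
      private static final HttpSolrServer USERS =
          new HttpSolrServer("http://localhost:8983/solr/users");
      private static final HttpSolrServer ORDERS =
          new HttpSolrServer("http://localhost:8983/solr/orders");

      public static long countUsers() throws SolrServerException {
          // Reuses the instance (and its pooled HTTP connections) on every call.
          return USERS.query(new SolrQuery("*:*")).getResults().getNumFound();
      }
  }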
Re: Adding OR operator in querystring and grouping fields?
Combine the two filter queries with an explicit OR operator. -- Jack Krupansky -Original Message- From: PeterKerk Sent: Monday, October 07, 2013 1:50 PM To: solr-user@lucene.apache.org Subject: Re: Adding OR operator in querystring and grouping fields? Ok, thanks. Regarding "you must combine them into one filter query parameter": how would I do that? Can I simply change the URL structure, or must I change my schema.xml and/or data-config.xml? -- View this message in context: http://lucene.472066.n3.nabble.com/Adding-OR-operator-in-querystring-and-grouping-fields-tp4093942p4093947.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Adding OR operator in querystring and grouping fields?
fq=here:there OR this:that For the lurker: an AND should be: fq=here:there&fq=this:that While you can, technically, pass: fq=here:there AND this:that Solr will cache the separate fq= parameters and reuse them in any context, whereas the AND(ed) filter will be cached as a single entry and only reused when the same AND construct is sent. Perhaps useful, but not as generally desirable. On Oct 7, 2013, at 2:10 PM, Jack Krupansky j...@basetechnology.com wrote: Combine the two filter queries with an explicit OR operator. -- Jack Krupansky -Original Message- From: PeterKerk Sent: Monday, October 07, 2013 1:50 PM To: solr-user@lucene.apache.org Subject: Re: Adding OR operator in querystring and grouping fields? Ok, thanks. Regarding "you must combine them into one filter query parameter": how would I do that? Can I simply change the URL structure, or must I change my schema.xml and/or data-config.xml? -- View this message in context: http://lucene.472066.n3.nabble.com/Adding-OR-operator-in-querystring-and-grouping-fields-tp4093942p4093947.html Sent from the Solr - User mailing list archive at Nabble.com.
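(One hedged way to apply this to the earlier city-plus-geofilt question is the _query_ nested-query syntax, which lets a local-params query sit inside an OR -- an untested sketch reusing that thread's parameters:)

  fq=city:brooklyn OR _query_:"{!geofilt pt=41.7882,-71.9498 sfield=geolocation d=2000}"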
Split shard doesn't persist data correctly on solr.xml
I notice that when a SPLITSHARD operation finishes, solr.xml is not updated properly.

# Parent solr.xml:
<core numShards="2" name="test_shard1_replica1" instanceDir="test_shard1_replica1" shard="shard1" collection="test"/>

# Children solr.xml:
<core name="test_shard1_0_replica1" shardState="construction" instanceDir="test_shard1_0_replica1" shard="shard1_0" collection="test">
  <property name="shardRange" value="8000-bfff"/>
</core>
<core name="test_shard1_1_replica1" shardState="construction" instanceDir="test_shard1_1_replica1" shard="shard1_1" collection="test">
  <property name="shardRange" value="c000-"/>
</core>

# Parent clusterstate:
"shard1": {
  "range": "8000-",
  "state": "inactive",
  "replicas": {
    "192.168.2.18:8983_solr_test_shard1_replica1": {
      "state": "active",
      "base_url": "http://192.168.2.18:8983/solr",
      "core": "test_shard1_replica1",
      "node_name": "192.168.2.18:8983_solr",
      "leader": "true"}}},

# Children clusterstate:
"shard1_0": {
  "range": "8000-bfff",
  "state": "active",
  "replicas": {
    "192.168.2.18:8983_solr_test_shard1_0_replica1": {
      "state": "active",
      "base_url": "http://192.168.2.18:8983/solr",
      "core": "statistics-11_shard1_0_replica1",
      "node_name": "192.168.2.18:8983_solr",
      "leader": "true"}}},
"shard1_1": {
  "range": "c000-",
  "state": "active",
  "replicas": {
    "192.168.2.18:8983_solr_test_shard1_1_replica1": {
      "state": "active",
      "base_url": "http://192.168.2.18:8983/solr",
      "core": "statistics-11_shard1_1_replica1",
      "node_name": "192.168.2.18:8983_solr",
      "leader": "true"}}},

I only noticed this because I did a restart and the nodes were shown as down on the cloud graph. The shards where I did a manual replication were written to the solr.xml file as expected, but not at the time that I executed the CREATE command. Command: curl 'http://192.168.2.18:8983/solr/admin/cores?action=CREATE&name=test_shard2_0_replicaX&collection=test&shard=shard2_0' Create replicaA -> solr.xml writes nothing about replicaA. Create replicaB -> solr.xml writes nothing about replicaB, but registers the data about replicaA. It's as if I have a lag of one operation; is this normal? /Yago - Best regards -- View this message in context: http://lucene.472066.n3.nabble.com/Split-shard-doesn-t-persist-data-correctly-on-solr-xml-tp4093996.html Sent from the Solr - User mailing list archive at Nabble.com.
How to achieve distributed spelling check in SolrCloud?
Hi, We are in the process of transitioning to SolrCloud (4.4) from a Master-Slave architecture (4.2). One of the issues I'm facing now is making spell check work. It only seems to work if I explicitly set distrib=false. I'm using a custom request handler and included the spell check options:

<str name="spellcheck">on</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.onlyMorePopular">false</str>
<str name="spellcheck.extendedResults">false</str>
<str name="spellcheck.count">1</str>
<str name="spellcheck.dictionary">default</str>
</lst>
<!-- append spellchecking to our list of components -->
<arr name="last-components">
  <str>spellcheck</str>
</arr>

The spellcheck component has the usual configuration. The spell check is part of the request handler which is being used to execute a distributed query; I can't possibly add distrib=false. Just wondering if there's a way to address this. Any pointers will be appreciated. -Thanks, Shamik
Re: Adding OR operator in querystring and grouping fields?
@Jason: your example worked perfectly! -- View this message in context: http://lucene.472066.n3.nabble.com/Adding-OR-operator-in-querystring-and-grouping-fields-tp4093942p4093999.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr 4.5 - CoreAPI issue with CREATE
Hi, I'm creating replicas for my shards manually, and solr.xml doesn't save the changes (solr.xml attribute persist=true). The command used is: curl 'http://192.168.2.18:8983/solr/admin/cores?action=CREATE&name=test_shard1_replica2&collection=test&shard=shard1' Is someone else seeing the same behaviour? /Yago - Best regards -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-5-CoreAPI-issue-with-CREATE-tp4094001.html Sent from the Solr - User mailing list archive at Nabble.com.
Fix sort order within an index?
Is there any way to store documents in a fixed sort order within the indexes of certain fields (either the arrival order, or sorted by the int ids that also serve as my unique key), so that I could store them optimized for browsing lists of items? The order for browsing is always fixed and there are no further filter queries. I just need to fetch the top 20 (most recently added) documents with field value topic=x1. I came across this article and a JIRA issue which encouraged me that something like this may be possible: http://shaierera.blogspot.com/2013/04/index-sorting-with-lucene.html https://issues.apache.org/jira/browse/LUCENE-4752
Issue with distributed spelling check in Solr 4.4
Hi, We are in the process of transitioning to SolrCloud (4.4) from a Master-Slave architecture (4.2). One of the issues I'm facing now is making spell check work. It only seems to work if I explicitly set distrib=false. I'm using a custom request handler and included the spell check options:

<str name="spellcheck">on</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.onlyMorePopular">false</str>
<str name="spellcheck.extendedResults">false</str>
<str name="spellcheck.count">1</str>
<str name="spellcheck.dictionary">default</str>
</lst>
<arr name="last-components">
  <str>spellcheck</str>
</arr>

The spellcheck component has the usual configuration:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">spell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">text</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.5</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">1</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">4</int>
    <float name="maxQueryFrequency">0.01</float>
  </lst>
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">text</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">10</int>
  </lst>
</searchComponent>

The spell check is part of the request handler which is being used to execute a distributed query; I can't possibly add distrib=false. Just wondering if there's a way to address this. Any pointers will be appreciated. -Thanks, Shamik -- View this message in context: http://lucene.472066.n3.nabble.com/Issue-with-distributed-spelling-check-in-Solr-4-4-tp4094009.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Regarding edismax parsing
You're probably running into the distinction between query parsing and analysis, which has been discussed many times. The issue is that the query parser breaks things up into individual tokens and _then_ sends them to the analysis chain as individual tokens (usually). Try escaping your spaces. Best, Erick On Mon, Oct 7, 2013 at 8:28 AM, Prashant Golash prashant.gol...@gmail.com wrote: Hi, I have a question regarding the parsing of tokens in the edismax parser, and a follow-up question related to it. - Each field has a list of analyzers and tokenizers as configured in schema.xml (index and query time). Now, say I search for the query red shoes. Is it the case that, to form the disjunction query on each field, edismax first applies the analyzers configured for that field and then forms the query? For example, if field1 changes red to rd and field2 changes red to re, will the query be like (field1:rd) | (field2:re)? - If the above holds true, then when I changed the ordering of analyzers and put SynonymFilterFactory at the top of all analyzers (in schema.xml), edismax still tokenized the query first on whitespace and only then applied the synonym filter factory, which leads me to think that this is not happening. My use case is: before applying any tokenizer, I want to support phrase-level synonym replacement and then do the rest of the analysis. Thanks, Prashant Golash
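(To illustrate the escaping suggestion: a backslash-escaped space survives the query parser's whitespace splitting, so the whole phrase reaches the field analyzer -- and thus a synonym filter -- as one string. A hedged sketch:)

  q=red\ shoes      (instead of q=red shoes, which is split into "red" and "shoes" before analysis)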
Re: no such field error:smaller big block size details while indexing doc files
Well, one of the attributes parsed out of (probably the meta-information associated with) one of your structured docs is SMALLER_BIG_BLOCK_SIZE_DETAILS, and Solr Cell is faithfully sending it to your index. If you want to throw all of these in the bit bucket, try defining a true catch-all field that ignores things, like this: <dynamicField name="*" type="ignored" multiValued="true"/> Best, Erick On Mon, Oct 7, 2013 at 8:03 AM, sweety sweetyshind...@yahoo.com wrote: I'm trying to index .doc, .docx and .pdf files using this URL: curl "http://localhost:8080/solr/document/update/extract?literal.id=12&commit=true" -F "myfile=@complex.doc" This is the error I get:

Oct 07, 2013 5:02:18 PM org.apache.solr.common.SolrException log
SEVERE: null:java.lang.RuntimeException: java.lang.NoSuchFieldError: SMALLER_BIG_BLOCK_SIZE_DETAILS
  at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:651)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:364)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
  at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:928)
  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
  at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
  at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:539)
  at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:298)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
  at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.NoSuchFieldError: SMALLER_BIG_BLOCK_SIZE_DETAILS
  at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:93)
  at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:190)
  at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:184)
  at org.apache.tika.parser.microsoft.POIFSContainerDetector.getTopLevelNames(POIFSContainerDetector.java:376)
  at org.apache.tika.parser.microsoft.POIFSContainerDetector.detect(POIFSContainerDetector.java:165)
  at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:61)
  at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:113)
  at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
  at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
  ... 16 more

Also, using the same type of URL, .txt, .mp3 and .pdf files are indexed successfully (curl "http://localhost:8080/solr/document/update/extract?literal.id=12&commit=true" -F "myfile=@abc.txt"). Schema.xml is:

<schema name="documents">
  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
    <field name="author" type="string" indexed="true" stored="true" multiValued="true"/>
    <field name="comments" type="text" indexed="true" stored="true" multiValued="false"/>
    <field name="keywords" type="text" indexed="true" stored="true" multiValued="false"/>
    <field name="contents" type="text" indexed="true" stored="true" multiValued="false"/>
    <field name="title" type="text" indexed="true" stored="true" multiValued="false"/>
    <field name="revision_number" type="string" indexed="true" stored="true" multiValued="false"/>
    <field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>
    <dynamicField name="ignored_*" type="string" indexed="false" stored="true" multiValued="true"/>
    <copyField source="id" dest="text"/>
    <copyField source="author" dest="text"/>
  </fields>
  <types>
    <fieldType name="integer" class="solr.IntField"/>
    <fieldType name="long" class="solr.LongField"/>
    <fieldType name="string"
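(For Erick's catch-all dynamicField suggestion above to work, the schema also needs an "ignored" field type. The stock Solr 4.x example schema defines one roughly like this -- quoted from memory, so treat it as a sketch:)

  <fieldType name="ignored" class="solr.StrField" indexed="false" stored="false" multiValued="true"/>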
Re: Search for non empty fields in a index with denormalized tables
I don't think your model fits well into Solr. What I'd do is make my uniqueKey the patient ID, and put the image names (or links or whatever) in a multiValued field. Then you can do what you want with a simple q=*:* -image_name:[* TO *] Best, Erick On Mon, Oct 7, 2013 at 9:20 AM, SandroZbinden zbin...@imagic.ch wrote: Okay, I'll try to specify my question a little more. I have a denormalized index of two SQL tables, patient and image. If I add a patient with two images to the Solr index, my index contains 3 documents:

 Pat_ID | Patient_Lastname | Image_ID | Image_Name
 -------+------------------+----------+-----------
 1      | Miller           | EMPTY    | EMPTY
 1      | Miller           | 1        | dog.jpg
 1      | Miller           | 2        | cat.jpg

When I then add another patient without any images, the Solr index contains 4 documents:

 Pat_ID | Patient_Lastname | Image_ID | Image_Name
 -------+------------------+----------+-----------
 1      | Miller           | EMPTY    | EMPTY
 1      | Miller           | 1        | dog.jpg
 1      | Miller           | 2        | cat.jpg
 2      | Smith            | EMPTY    | EMPTY

Now I want to select all patients that have no image (Image_ID is empty). If I query this with the following Solr query, the result would be Miller and Smith, but I need a query that returns Smith only: select?q=-Image_ID:[0 TO *] and then group by Pat_ID What I would need is something like the HAVING clause in SQL, so I could group by Pat_ID and filter for the ones where the count is less than 2. Bests, Sandro -- View this message in context: http://lucene.472066.n3.nabble.com/Search-for-non-empty-fields-in-a-index-with-denormalized-tables-tp4093287p4093903.html Sent from the Solr - User mailing list archive at Nabble.com.
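(A hedged sketch of Erick's suggested model, with hypothetical field names -- one document per patient, images collapsed into a multiValued field:)

  <field name="pat_id" type="string" indexed="true" stored="true" required="true"/>  <!-- the uniqueKey -->
  <field name="patient_lastname" type="string" indexed="true" stored="true"/>
  <field name="image_name" type="string" indexed="true" stored="true" multiValued="true"/>

Patients without images are then simply the documents missing the field entirely:

  q=*:* -image_name:[* TO *]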
Re: Soft commit and flush
bq: If so, using soft commit without calling hard commit could cause OOM No. Aside from anything you have configured for auto (hard) commit, the ramBufferSizeMB setting in solrconfig.xml will flush the in-memory structures out to segments when their size reaches this limit. It won't _close_ the current segment, so it won't be permanent, but it will limit memory consumption. Best, Erick On Mon, Oct 7, 2013 at 9:40 AM, Guido Medina guido.med...@temetra.com wrote: Out of Memory Exception is well known as OOM. Guido. On 07/10/13 14:11, adfel70 wrote: Sorry, by OOE I meant Out of Memory Exception... -- View this message in context: http://lucene.472066.n3.nabble.com/Soft-commit-and-flush-tp4091726p4093902.html Sent from the Solr - User mailing list archive at Nabble.com.
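(For reference, the setting Erick mentions lives in solrconfig.xml; 100 is the Solr 4.x default, shown here for illustration:)

  <ramBufferSizeMB>100</ramBufferSizeMB>  <!-- flush in-memory index structures to disk past this size -->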
Re: solr cpu usage
Tim: Thanks! Mostly I wrote it to have something official-looking to hide behind when I didn't have a good answer to the hardware-sizing question :). On Mon, Oct 7, 2013 at 2:48 PM, Tim Vaillancourt t...@elementspace.com wrote: Fantastic article! Tim On 5 October 2013 18:14, Erick Erickson erickerick...@gmail.com wrote: From my perspective, your question is almost impossible to answer; there are too many variables. See: http://searchhub.org/dev/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/ Best, Erick On Thu, Oct 3, 2013 at 9:38 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, More CPU cores means more concurrency. This is good if you need to handle high query rates. Faster cores mean lower query latency, assuming you are not bottlenecked by memory, disk IO or network IO. So what is ideal for you depends on your concurrency and latency needs. Otis Solr & ElasticSearch Support http://sematext.com/ On Oct 1, 2013 9:33 AM, adfel70 adfe...@gmail.com wrote: hi We're building a spec for a machine to purchase. We're going to buy 10 machines; we aren't sure yet how many processes we will run per machine. The question is: should we buy a faster CPU with fewer cores, or a slower CPU with more cores? In any case we will have 2 CPUs in each machine. Should we buy a 2.6GHz CPU with 8 cores or a 3.5GHz CPU with 4 cores? What would we gain by having many cores? What kinds of usages would make the CPU the bottleneck? -- View this message in context: http://lucene.472066.n3.nabble.com/solr-cpu-usage-tp4092938.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Doing time sensitive search in solr
I'd index them as separate documents. Best, Erick On Mon, Oct 7, 2013 at 2:59 PM, Darniz rnizamud...@edmunds.com wrote: Thanks Erick. OK, if we go by that proposal of copying all date fields into one bag_of_dates field, we now have fields that will look something like this:

<arr name="bag_of_dates">
  <str>2013-09-01T00:00:00Z</str>
  <str>2013-12-01T00:00:00Z</str>
</arr>
<arr name="text">
  <str>Sept content : Honda is releasing the car this month</str>
  <str>Dec content : Toyota is releasing the car this month</str>
</arr>

And I also agree that we can now make a range query like bag_of_dates:[* TO NOW] AND text:Toyota. But how are we going to make sure the document does not get returned, given that Toyota should only be searchable from 1-DEC-2013? I hope I am explaining it properly. On our website, when we render the data we don't show the line "Dec content : Toyota is releasing the car this month" on the page, since today's date is not 1-DEC-2013 yet; hence we don't want this doc to be shown in the search results either when we query Solr. -- View this message in context: http://lucene.472066.n3.nabble.com/Doing-time-sensitive-search-in-solr-tp4092273p4093961.html Sent from the Solr - User mailing list archive at Nabble.com.
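(A hedged sketch of the separate-documents approach Erick suggests -- the id and field names are hypothetical. Each dated blurb becomes its own document carrying its own publish date, and a standard range filter does the rest:)

  <doc>
    <field name="id">car-123-sept</field>
    <field name="publish_date">2013-09-01T00:00:00Z</field>
    <field name="text">Sept content : Honda is releasing the car this month</field>
  </doc>
  <doc>
    <field name="id">car-123-dec</field>
    <field name="publish_date">2013-12-01T00:00:00Z</field>
    <field name="text">Dec content : Toyota is releasing the car this month</field>
  </doc>

  q=text:Toyota&fq=publish_date:[* TO NOW]

Until 1-DEC-2013 the December document simply fails the publish_date filter, so it neither renders on the page nor matches the search.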